WHAT:
GTM added for partnered satellites sign up pages
csp values were extended to make GTM work at all:
1. googletagmanager.com for GTM script
2. google-analytics.com for GA script
3. hash was added to avoid using 'unsafe-inline' value in 'script-src' directive
Also config flag for GTM id was added
WHY:
Marketing team needs GTM and GA for their campaigns
Change-Id: Ibb2ace737feb971dda6c191599d479fe4a7af332
When a request comes in on the satellite api and we validate the
macaroon, we now also check if any of the macaroon's tails have been
revoked.
Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497
As the tables that get cleaned up by this job get a lot of inserts and deletes over the course of a day, the autovacuum process on PostgreSQL struggles fairly easily/quickly.
Due to its limitation, it can only delete 180,000,000 tuples in one go, before it has to rescan the entire table/index.
With the current load, the most busy satellites accumulate about 1,000,000,000 tuples per day (consumed_serials). With our current 24h interval that results in ~6-7 scans, slowing the entire database down for a quite long time.
This PR reduces the interval to 4 hours, which under a constant load, results in less than 180,000,000 entries per run.
That way, we do not scan twice for only a small gain over said amount. Reducing the interval further would also increase the DB load unnecessary, as each run scans the entire tables at least once.
For future reference, we might need to adjust the interval, if the load is significantly changing.
Change-Id: I18fdd45d93d468cff126e719c8380c29a49f43dd
also remove the continuation support from the queue, otherwise
we may end up sequential scanning the entire table to get
a few rows at the end.
then, in the core, instead of looping both to get a big enough
batch inside of the queue, as well as outside of it to ensure
we consume the whole queue, just get a single batch at a time.
also, make the queue size configurable because we'll need to
do some tuning in production.
Change-Id: If1a997c6012898056ace89366a847c4cb141a025
WHAT:
1. updated verification page URL in config
2. added list of partnered satellites to config
3. added logic for satellites dropdown on new signup/login pages
WHY:
1. signup/login flow was reworked in tardigrade.io repo (iframe removed, new pages etc.)
2. new config flag was added to check if satellite name matches at least one member of partnered satellites list to redirect user to verification page
3. new pages will have dropdown with partnered satellites list. Appropriate logic was added.
Change-Id: I33399ab66ca31f07b297a433f6b1f41da4cb6e66
The current satellite config lock code relies on bash scripts and
gnu diff, it must be run as root and hence it typically requires
docker. The old version will be removed at a later date..
I tried for several hours to run directly against cmdSetup() in
cmd/satellite/main.go, to avoid the ctx.Compile() call. I had no
luck.
Change-Id: I0a4888421e743b436d32b6af69d04759d7816751
Also distinguish the purpose for selecting nodes to avoid potential
confusion, what should allow caching and what shouldn't.
Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61
Add a flag that allows us to easily switch disqualification from
suspension mode on or off. A node will only be disqualified from
suspension mode if it has been suspended for longer than the grace
period AND the SuspensionDQEnabled flag is true.
Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792
Previously we are using tracing.sampled to be the switch for turning on/off tracing.
However we would like to separate sampling rate from being the switch,
so we can set sampling rate to be 0 but still intialize tracing for
satellite and storagenodes
Change-Id: I27e6ba25ea6f6b612b4e1a57cf1301889ded41ec
Added a per IP rate limiter to the console web.
Cleaned up password check to leak less bcyrpt info.
Change-Id: I3c882978bd8de3ee9428cb6434a41ab2fc405fb2
If a node is suspended and receives an unknown or failing audit,
disqualify them if the grace period (default 1w in production) has
passed.
Migrate the nodes table so any node that is currently suspended gets
unsuspended when the satellite starts up.
Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82
* Delete expired segments in expired segments service using metainfo
loop
* Add test to verify expired segments service deletes expired segments
* Ignore expired segments in checker observer
* Modify checker tests to verify that expired segments are ignored
* Ignore expired segments in segment repairer and drop from repair queue
* Add repair test to verify that a segment that expires after being
added to the repair queue is ignored and dropped from the repair queue
Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d
Alpha=1 and beta=0 are the expected first values for any alpha/beta
reputation system we are using in the codebase. So we are removing the
configurability of these values.
Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3
We want to increase our throughput for downtime estimation. This commit
adds the ability to reach out to multiple nodes concurrently for downtime
estimation. The number of concurrent routines is determined by a new config
flag, EstimationConcurrencyLimit. It also increases the default
EstimationBatchSize to 1000.
Change-Id: I800ce7ec1035885afa194c3c3f64eedd4f6f61eb
BeginObject response
We want to control inline segment size and segment size on satellite
side. We need to return such information to uplink like with redundancy
scheme.
Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c
Add flag to satellite repairer, "InMemoryRepair" that allows the
satellite to decide whether to download the entire segment being
repaired into memory (this is what the satellite already does), or to
download it into temporary files on disk that will be read from in the
upload phase of repair.
This should help with handling high repair traffic on satellites that
cannot afford to spend 64mb of memory per repair worker.
Updates tests to test repair for both in memory and to disk.
Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.
This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.
To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console
The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.
Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
This new repair timeout (configured as TotalTimeout) will include both
the time to download pieces and the time to upload pieces, as well as
the time to pop the segment from the repair queue.
This is a move from Github PR #3645.
Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue rather than database driven.
For more details, see the added fast billing changes blueprint.
It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.
Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35