storj

Author	SHA1	Message	Date
JT Olio	6bce907cb0	satellite: try to stream rollups to aggregation function to use less memory this change tries really hard to never have all of the storage node rollups in memory at the same time, up until the rollups are actually getting summed together. Change-Id: If67f49e7d71106798d996a6850b3e48671bd9e18	2020-11-29 10:26:32 -07:00
JT Olio	6aae21541f	satellitedb: do saverollup in batches Change-Id: I78278a192cba60541eee2986f54a88d5a479bd3e	2020-11-28 19:26:46 -07:00
Moby von Briesen	575f50df84	satellite/repair: Update repair override config to support multiple RS schemes. Rather than having a single repair override value, we will now support repair override values based on a particular segment's RS scheme. The new format for RS override values is "k/o/n-override,k/o/n-override..." Change-Id: Ieb422638446ef3a9357d59b2d279ee941367604d	2020-11-23 18:01:15 +00:00
Ethan	2b92bba563	satellite/satellitedb/orders: Handle serial_numbers deletes in smaller increments on CRDB CRDB doesn't like large deletes. While testing in the POC environment we found that deletes on the serial_numbers table could take hours. This change limits deletes to 1000 at a time (configurable) to avoid blocking other queries. Change-Id: I08455e25db1574579dd4d7b7125a08e9c913dff1	2020-11-20 13:44:52 +00:00
Moby von Briesen	0ec685b173	satellite/{satellitedb, repair/{queue, checker}}: Use new column "segmentHealth" instead of "numHealthy" in injured segments queue We plan to add support for a new Reed-Solomon scheme soon, but our repair queue orders segments by least number of healthy pieces first. With a second RS scheme, fewer healthy pieces will not necessarily correlate to lower health. This change just adds the new column in a migration. A separate change will add the new health function. Right now, since we only support one RS scheme, behavior will not change. Number of healthy pieces is being inserted as "segment health" until the new health function is merged. Segment health is calculated with a new priority function created in commit `3e5640359`. In order to use the function, a new config value is added, called NodeFailureRate, representing the approximate probability of any individual node going down in the duration of one checker run. Change-Id: I51c4202203faf52528d923befbe886dbf86d02f2	2020-11-16 21:18:09 +00:00
Moby von Briesen	db6bc6503d	satellite/metainfo: Update metainfo RS config to more easily support multiple RS schemes. Make metainfo.RSConfig a valid pflag config value. This allows us to configure the RSConfig as a string like k/m/o/n-shareSize, which makes having multiple supported RS schemes easier in the future. RS-related config values that are no longer needed have been removed (MinTotalThreshold, MaxTotalThreshold, MaxBufferMem, Verify). Change-Id: I0178ae467dcf4375c504e7202f31443d627c15e1	2020-11-09 22:16:13 +00:00
littleskunk	ed1f6d7973	satellite/config: move repair override from config to default (#3958) Co-authored-by: Igor <38665104+ihaid@users.noreply.github.com>	2020-10-28 17:24:39 +02:00
Jessica Grebenschikov	f5880f6833	satellite/orders: rollout phase3 of SettlementWithWindow endpoint Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0	2020-10-26 14:56:28 +00:00
Moby von Briesen	7c3afe164b	satellite/overlay: uncomment dq for offline and disable with feature flag Change-Id: Ib39e2be32e880b822a94eddfb81af99a38843a27	2020-10-16 12:55:16 +00:00
Jessica Grebenschikov	205c39d404	satellite/orders: upgrade to phase 2 rollout ordersWithWindow We are moving an error into rejectErr since it's preventing storage nodes from being able to settle other orders. Change-Id: I3ac97c340e491b127f5e0024c5e8bd9f4df8d5c3	2020-10-15 21:20:19 +00:00
Jeff Wendling	0f0faf0a9f	satellite/orders: do a better job limiting concurrent requests Doing it at the ProcessOrders level was insufficient: the endpoints make multiple database calls. It was a misguided attempt to only have one spot enter the semaphore. By putting it in the endpoint we can not only be sure that the concurrency is correctly limited but it can be configurable easily. Change-Id: I937149dd077adf9eb87fce52a1a17dc0afe96f64	2020-10-09 16:27:15 -04:00
Jessica Grebenschikov	4a2c66fa06	satellite/accounting: add cache for getting project storage and bw limits This PR adds the following items: 1) an in-memory read-only cache that stores project limit info for projectIDs This cache is stored in-memory since this is expected to be a small amount of data. In this implementation we are only storing in the cache projects that have been accessed. Currently for the largest Satellite (eu-west) there are about 4500 total projects. So storing the storage limit (int64) and the bandwidth limit (int64), this would end up being about 200kb (including the 32 byte project ID) if all 4500 projectIDs were in the cache. So this all fits in memory for the time being. At some point it may not as usage grows, but that seems years out. The cache is a read only cache. When requests come in to upload/download a file, we will read from the cache what the current limits are for that project. If the cache does not contain the projectID, it will get the info from the database (satellitedb project table), then add it to the cache. The only time the values in the cache are modified is when either a) the project ID is not in the cache, or b) the item in the cache has expired (default 10mins), then the data gets refreshed out of the database. This occurs by default every 10 mins. This means that if we update the usage limits in the database, that change might not show up in the cache for 10 mins, which means it will not be reflected to limit end users uploading/downloading files for that time period. Change-Id: I3fd7056cf963676009834fcbcf9c4a0922ca4a8f	2020-09-25 16:28:49 +00:00
Egon Elbre	888bfaae4b	cmd/satellite: only add google profiler to satellite Previously uplink, storagenode etc. included google cloud profiler, however they don't need it. Change-Id: Ibc95cb03d667a3844672eecd49fa455a6acc3866	2020-09-25 18:56:59 +03:00
Stefan Benten	9d0d0ad728	satellite/console: enable multiple projects all users Change-Id: I42cc9f48cac387e1a67d21c1dd394f28cc5ff399	2020-09-23 16:18:28 +00:00
VitaliiShpital	c4d6f472fc	web/satellite: notification bar for reaching projects count limit WHAT: notification bar added to project dashboard page. It is shown when projects count limit is reached. Create project button is removed after creating last available project WHY: inform user that their projects count limit was reached Change-Id: If0d67148003be40cc9eb4d8b25cc17f8204008d4	2020-09-08 15:48:27 +00:00
Egon Elbre	dc48197bd8	satellite/orders: add bucket id to order limit Change-Id: I9019ec77d692e62ac17b67a1da71dc3535cde50c	2020-09-03 10:50:11 +03:00
Egon Elbre	61b17f1214	satellite/orders: add encryption keys flag to Service Change-Id: Ie96e75bc96241b799d04654ef5e05b82e6a899bb	2020-09-02 05:02:14 +00:00
Natalie Villasana	95ff29cce1	satellite/metainfo: reduce lookupLimit default to 2500 Change-Id: I6569c6d1f145b127a9e8e1a65e4344dd62c989bb	2020-09-01 12:04:48 -04:00
stefanbenten	4645805b18	private/dbutil: set connMaxLifetime to 30 minutes To prevent long-lived unused connections, set the maximum time to 30 minutes to prevent proxies and load balancers forcefully cutting the connection. This helps in scenarios with low load/requests to a DB. Change-Id: I7dba15ef97f6f6541e872a6fb1d3a9bbbfe5bb50	2020-08-28 18:00:41 +00:00
Qweder93	88ff8829a1	satellite/gracefulexit: RecvTimeout increased to 2h, so slow nodes stop receiving a lot of failures and, as a result, DQ Change-Id: Id4c8a394162ba368aeb573a927f825bf7250aa52	2020-08-24 18:59:24 +03:00
Yingrong Zhao	14ad7a4f1c	satellite/metainfo: add limiter for objectdeletion and piecedeletion services This PR adds a limiter on the number of concurrent object deletions that can be handled so we don't run out of memory. Change-Id: Id2ce368af6f86845fcdfd34cb2f5e460efe9b272	2020-08-19 16:08:29 +00:00
Qweder93	4ee1b2d45a	storagenode/console: added list of all audits per satellite to sno dashboard/satellites Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2	2020-08-10 12:55:07 +00:00
Moby von Briesen	e02adfe5e9	satellite/overlay/config.go: Add AuditHistoryConfig to overlay Adds AuditHistory{WindowSize, TrackingPeriod, GracePeriod, OfflineThreshold}. These values will be used to track offline audits over time, and to suspend/disqualify nodes for being offline for too long. Change-Id: I05f7dbc3c034bdc53c4fbd7719c71a44f37ec6a5	2020-08-04 18:18:56 +00:00
Jeff Wendling	85a74b47e7	satellite/orders: 3-phase rollout This adds a config flag orders.window-endpoint-rollout-phase that can take on the values phase1, phase2 or phase3. In phase1, the current orders endpoint continues to work as usual, and the windowed orders endpoint uses the same backend as the current one (but also does a bit extra). In phase2, the current orders endpoint is disabled and the windowed orders endpoint continues to use the same backend. In phase3, the current orders endpoint is still disabled and the windowed orders endpoint uses the new backend that requires much less database traffic and state. The intention is to deploy in phase1, roll out code to nodes to have them use the windowed endpoint, switch to phase2, wait a couple days for all existing orders to expire, then switch to phase3. Additionally, it fixes a bug where a node could submit a bunch of orders and rack up charges for a bucket. Change-Id: Ifdc10e09ae1645159cbec7ace687dcb2d594c76d	2020-08-03 17:01:42 +00:00
Rafael Gomes	935f44ddb7	satellite/metainfo: Add Delete Service config Change-Id: I0a6e3ce1adfe1488eb23da9dda92877af1834599	2020-08-03 14:28:02 +00:00
Bill Thorp	b265b7f555	satellite/console: make paywall optional Add a config so that some percent of users require credit cards / account balances in order to create a project or have a promotional coupon applied. UI was updated to match needed paywall status. At this point we decided not to use a field to store if a user is in an A/B test, and instead just use math to see if they're in a test. We decided to use MD5 (because it's in Postgres too) and User UUID for that math. Change-Id: I0fcd80707dc29afc668632d078e1b5a7a24f3bb3	2020-07-28 10:57:49 +00:00
Egon Elbre	ba4c3d9986	satellite/orders: remove unused node status logging flag Change-Id: I24da78a11cc5d3d88cdf6aca85c4238e4086e59c	2020-07-24 16:35:59 +03:00
Ethan	cfca021839	satellite/accounting: Add chore to cleanup old project bandwidth rollups data Removes old project_bandwidth_rollups records that are no longer used. Uses a retain months configuration to determine how many months to save. Current month cannot be removed. Tests retainMonths=-1, 0, 2 Change-Id: Ia4be2546cdb28802427acf41ecd85ad66df3e62c	2020-07-22 18:56:49 +00:00
stefanbenten	0209a2095f	satellite/{console,satellitedb}: add project_limit column to users table Change-Id: I603f085f17ca5b413dd1c6837c2081f9e7e791a1	2020-07-15 17:27:31 +00:00
Jennifer Johnson	784a156eea	satellite: prevents uplink from creating a bucket once it exceeds the max bucket allocation. Change-Id: I4b3822ed723c03dbbc0df136b2201027e19ba0cd	2020-07-15 17:27:05 +00:00
Stefan Benten	9dbd511396	private/dbutil: reduce db connection defaults (#3920)	2020-07-08 19:59:42 +02:00
VitaliiShpital	5b3c8b2f1a	web/satellite: google tag manager for signup pages WHAT: GTM added for partnered satellites sign up pages csp values were extended to make GTM work at all: 1. googletagmanager.com for GTM script 2. google-analytics.com for GA script 3. hash was added to avoid using 'unsafe-inline' value in 'script-src' directive Also config flag for GTM id was added WHY: Marketing team needs GTM and GA for their campaigns Change-Id: Ibb2ace737feb971dda6c191599d479fe4a7af332	2020-06-23 10:45:04 +00:00
Isaac Hess	2d727bb14e	satellite: Check macaroon revocation When a request comes in on the satellite api and we validate the macaroon, we now also check if any of the macaroon's tails have been revoked. Change-Id: I80ce4312602baf431cfa1b1285f79bed88bb4497	2020-06-22 13:50:07 -06:00
Rafael Gomes	958ea1b9df	satellite/accounting: add download limit cache Change-Id: I722930cab8bd5d240f4878dc6997e9bc7637311f	2020-06-12 16:33:46 -03:00
stefanbenten	c6c8b923af	satellite/dbcleanup: run cleanup more frequently As the tables that get cleaned up by this job get a lot of inserts and deletes over the course of a day, the autovacuum process on PostgreSQL struggles fairly easily/quickly. Due to its limitation, it can only delete 180,000,000 tuples in one go, before it has to rescan the entire table/index. With the current load, the most busy satellites accumulate about 1,000,000,000 tuples per day (consumed_serials). With our current 24h interval that results in ~6-7 scans, slowing the entire database down for quite a long time. This PR reduces the interval to 4 hours, which under a constant load, results in less than 180,000,000 entries per run. That way, we do not scan twice for only a small gain over said amount. Reducing the interval further would also increase the DB load unnecessarily, as each run scans the entire tables at least once. For future reference, we might need to adjust the interval, if the load is significantly changing. Change-Id: I18fdd45d93d468cff126e719c8380c29a49f43dd	2020-06-10 18:32:15 +00:00
Yingrong Zhao	9d7713cdd0	script/testdata: update tracing agent default address Change-Id: I730994f16f135c4b8643a52f4cf499487e4af326	2020-06-03 23:46:18 +00:00
Moby von Briesen	b82d04e618	satellite/metainfo: limit size of uplink-provided metadata to 2KiB Change-Id: Id44a46046ddb4a12102525531f4502fcff2b6252	2020-06-01 16:51:29 -04:00
Jeff Wendling	44433f38be	satellite/satellitedb: remove ORDER BY when reading from queue also remove the continuation support from the queue, otherwise we may end up sequential scanning the entire table to get a few rows at the end. then, in the core, instead of looping both to get a big enough batch inside of the queue, as well as outside of it to ensure we consume the whole queue, just get a single batch at a time. also, make the queue size configurable because we'll need to do some tuning in production. Change-Id: If1a997c6012898056ace89366a847c4cb141a025	2020-06-01 18:31:14 +00:00
littleskunk	801a3ab90d	satellite/coinpayments: Reduce update interval to 2 minutes (#3897) * satellite/coinpayments: Reduce update interval to 2 minutes * satellite/coinpayments: Reduce balance update Co-authored-by: paul cannon <thepaul@users.noreply.github.com>	2020-05-29 22:21:27 +02:00
VitaliiShpital	c9b9c686fc	web/satellite: logic for new signup/login flow WHAT: 1. updated verification page URL in config 2. added list of partnered satellites to config 3. added logic for satellites dropdown on new signup/login pages WHY: 1. signup/login flow was reworked in tardigrade.io repo (iframe removed, new pages etc.) 2. new config flag was added to check if satellite name matches at least one member of partnered satellites list to redirect user to verification page 3. new pages will have dropdown with partnered satellites list. Appropriate logic was added. Change-Id: I33399ab66ca31f07b297a433f6b1f41da4cb6e66	2020-05-29 17:11:44 +00:00
littleskunk	2fbb34c3ea	nodeselection: Increase minimum free space to 500MB (#3898)	2020-05-25 12:13:28 +02:00
littleskunk	8ec64f3daf	satellite/overlay: enable node selection cache on all satellites (#3895)	2020-05-19 19:25:53 +02:00
Bill Thorp	5a7a4d2e98	satellite: add Go test version of satellite-config-lock tests The current satellite config lock code relies on bash scripts and gnu diff, it must be run as root and hence it typically requires docker. The old version will be removed at a later date. I tried for several hours to run directly against cmdSetup() in cmd/satellite/main.go, to avoid the ctx.Compile() call. I had no luck. Change-Id: I0a4888421e743b436d32b6af69d04759d7816751	2020-05-13 08:14:24 +00:00
igor gaidaienko	1eab5e2980	satellite/console: Increase default webUI rate limit to 5 Previous limit is annoying for normal users Change-Id: I7cb783e0b2515f415b2a055d5e811efab3810654	2020-05-12 16:12:17 +00:00
Stefan Benten	e23bd806b4	satellite/accounting: separate usage and bandwidth limit (#3878)	2020-05-12 15:01:15 +02:00
Stefan Benten	65f3e26f80	satellite: Change Default Project Limits and minimum STORJ Payment (#3877)	2020-05-12 14:18:58 +03:00
Egon Elbre	ec589a8289	all: fix comments about grpc Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd	2020-05-11 13:05:34 +03:00
Egon Elbre	4e94da3fda	satellite/overlay: add feature flag for node selection cache Also distinguish the purpose for selecting nodes to avoid potential confusion, what should allow caching and what shouldn't. Change-Id: Iee2451c1f10d0f1c81feb1641507400d89918d61	2020-05-06 16:13:47 +03:00
Jennifer Johnson	18078bf7ee	satellite/audit: increases audit worker concurrency to 2 Change-Id: Ibe3e3801b79accffbcfe9e2e02c96fc963894a7f	2020-05-05 11:31:55 +00:00
Egon Elbre	d98b8f6e23	satellite/metainfo,storage: use different limit for metainfo loop Change-Id: I5ef7233930679b977b33f7b3e1dda45c907dcfad	2020-05-05 10:37:20 +00:00
Moby von Briesen	8f60cfc4fb	satellite/overlay: Add flag for enabling/disabling disqualification from suspension mode Add a flag that allows us to easily switch disqualification from suspension mode on or off. A node will only be disqualified from suspension mode if it has been suspended for longer than the grace period AND the SuspensionDQEnabled flag is true. Change-Id: I9e67caa727183cd52ab2042b0a370a1bcaebe792	2020-05-04 17:25:09 +00:00
Yingrong Zhao	9b4a3f8fcc	cmd/uplink: use tracing.enabled flag Previously we were using tracing.sampled to be the switch for turning on/off tracing. However we would like to separate sampling rate from being the switch, so we can set sampling rate to be 0 but still initialize tracing for satellite and storagenodes Change-Id: I27e6ba25ea6f6b612b4e1a57cf1301889ded41ec	2020-04-27 17:54:57 +00:00
Bill Thorp	341aecfe0f	satellite/console: add rate limiter to login, register, password recovery Added a per IP rate limiter to the console web. Cleaned up password check to leak less bcrypt info. Change-Id: I3c882978bd8de3ee9428cb6434a41ab2fc405fb2	2020-04-24 17:15:49 +00:00
Jess G	825226c98e	satellite/overlay: use node selection cache for uploads (#3859) * satellite/overlay: use node selection cache for uploads Change-Id: Ibd16cccee979d0544f2f4a01749af9f36f02a6ad * fix config lock Change-Id: Idd307e4dee8ab92749f1ec3f996419ea0af829fd * start fixing tests Change-Id: I207d373a3b2a2d9312c9e72fe9bd0b01e06ad6cf * fix test, add some more Change-Id: I82b99c2004fca2510965f9b389f87dd4474bc722 * change config name Change-Id: I0c0f7fc726b2565dc3828cb723f5459a940f2a0b * add benchmarks Change-Id: I05fa25bff8d5b65f94d918556855b95163d002e9 * revert bench to put in different PR Change-Id: I0f6942296895594768f19614bd7b2e3b9b106ade * add staleness to benchmark Change-Id: Ia80a310623d5a342afa6d835402170b531b0f870 * add cache config to testplanet Change-Id: I39abdab8cc442694da543115a9e470b2a8a25dff * have repair select old way Change-Id: I25a938457d7d1bcf89fd15130cb6b0ac19585252 * lower testplanet config time Change-Id: Ib56a2ed086c06bc6061388d15a10a2526a663af7 * fix test Change-Id: I3868e9cacde2dfbf9c407afab04dc5fc2f286f69	2020-04-24 09:11:04 -07:00
Moby von Briesen	72b93f3120	satellite/satellitedb: disqualify suspended nodes when the grace period passes If a node is suspended and receives an unknown or failing audit, disqualify them if the grace period (default 1w in production) has passed. Migrate the nodes table so any node that is currently suspended gets unsuspended when the satellite starts up. Change-Id: I7b81c68026f823417faa0bf5e5cb5e67c7156b82	2020-04-22 15:45:00 -04:00
Yingrong Zhao	0bdcf123cf	bump monkit, monkit-jaeger, and private to latest Also bump storj.io/common and sync repo Change-Id: If8e60db6bdf0af8077b7befcb1da304c3c4dcae4	2020-04-22 12:30:37 -04:00
Moby von Briesen	178aa8b5e0	satellite/{metainfo,repair}: Delete expired segments from metainfo * Delete expired segments in expired segments service using metainfo loop * Add test to verify expired segments service deletes expired segments * Ignore expired segments in checker observer * Modify checker tests to verify that expired segments are ignored * Ignore expired segments in segment repairer and drop from repair queue * Add repair test to verify that a segment that expires after being added to the repair queue is ignored and dropped from the repair queue Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d	2020-04-22 13:02:31 +00:00
Yingrong Zhao	8375a09c89	cmd: remove InitTracing from satellite and storagenode main.go file Change-Id: I4addbe7d0645f66abfb3e98d74d17035e9624e69	2020-04-20 14:06:26 -04:00
VitaliiShpital	2dce4c232c	web/satellite: redirect to verification page on sign up if inside iframe Change-Id: I606b63fd27bef46597697b491970523e8a3a0cae	2020-04-16 13:35:49 +00:00
VitaliiShpital	158013a866	satellite/console: redirect on account activation Change-Id: I2506ce0fd3832bf46fbcdcc5a42bb83dc926e99a	2020-04-15 11:49:50 +00:00
Moby von Briesen	d7794a4851	satellite/overlay: hardcode default values for audit alpha/beta Alpha=1 and beta=0 are the expected first values for any alpha/beta reputation system we are using in the codebase. So we are removing the configurability of these values. Change-Id: Ic61861b8ea5047fa1438ea6609b1d0048bf0abc3	2020-04-14 19:12:40 +00:00
Cameron Ayer	3ee6c14f54	satellite/downtime: add concurrency to downtime estimation We want to increase our throughput for downtime estimation. This commit adds the ability to reach out to multiple nodes concurrently for downtime estimation. The number of concurrent routines is determined by a new config flag, EstimationConcurrencyLimit. It also increases the default EstimationBatchSize to 1000. Change-Id: I800ce7ec1035885afa194c3c3f64eedd4f6f61eb	2020-04-14 14:39:13 +00:00
Qweder93	3a9422cc9a	satellite/nodestats: add pricing model to endpoint Change-Id: Iddace8e437216a343458f440b543cee61164f233	2020-04-08 14:29:51 +00:00
Yingrong Zhao	96e58d21b4	cmd;pkg/server: init tracing collector in all processes Add tracing handler in drpc server. Initializing tracing collector in admin, satellite api, garbage collection, satellite core, repairer, storagenode. Change-Id: Ie98420e35dfc6913836ebd82b517d9d12877aefc Change-Id: I91057b6265a4ac8bde033dfde692b8a28acca99f	2020-04-07 17:20:59 -04:00
Cameron Ayer	42be4bdc0f	satellite/contact: add timeout to PingBack method Change-Id: I2ec2f82e2e10d8be16f82e9de13ce42358e47c98	2020-04-04 18:26:30 +00:00
Michal Niewrzal	c178a08cb8	satellite/metainfo: add max segment size and max inline size to BeginObject response We want to control inline segment size and segment size on satellite side. We need to return such information to uplink like with redundancy scheme. Change-Id: If04b0a45a2757a01c0cc046432c115f475e9323c	2020-04-02 12:41:28 +00:00
Jeff Wendling	e2ff2ce672	satellite: compensation package and commands Change-Id: I7fd6399837e45ff48e5f3d47a95192a01d58e125	2020-03-30 14:08:14 -06:00
JT Olio	f28100b73f	bump storj.io/private Change-Id: I4ddd5c34521602967b89bd18e2a71a6f1e29f436	2020-03-27 21:57:35 +00:00
Moby von Briesen	a933bcc99a	satellite/repair/repairer/ec.go: add option for downloading pieces onto disk instead of in memory during repair Add flag to satellite repairer, "InMemoryRepair" that allows the satellite to decide whether to download the entire segment being repaired into memory (this is what the satellite already does), or to download it into temporary files on disk that will be read from in the upload phase of repair. This should help with handling high repair traffic on satellites that cannot afford to spend 64mb of memory per repair worker. Updates tests to test repair for both in memory and to disk. Change-Id: Iddf591e165621497c98533d45bfea3c28b08a194	2020-03-27 16:41:00 +00:00
Natalie Villasana	8e0ca0e6f5	satellite/gc: update release default for gc to run separately (#3830)	2020-03-26 14:44:18 -04:00
Jennifer Johnson	699b635e5d	satellite/overlay: rename newNodePercentage to newNodeFraction Change-Id: Ie66de91f88183b44de0773589e83e4ade9aa997a	2020-03-19 20:09:32 +00:00
Jessica Grebenschikov	5142874144	satellite/gc: move garbage collection to its own process Change-Id: I7235aa83f7c641e31c62ba9d42192b2232dca4a5	2020-03-18 16:44:01 +00:00
Egon Elbre	09e0f3de63	satellite/metainfo/piecedeletion: add Service Change-Id: Id7e32ed569701fa0be66f9527c43a67052994570	2020-03-18 14:50:08 +00:00
Stefan Benten	49a30ce4a7	satellite/payments: Set proper defaults for the release (#3806) * Slight adjustments to the migration Change-Id: I68ae81c010c3414fde2845df16ab124f8d17834b * Change Coupon Value Change-Id: I0f241d09e5f716f1d1b3f0688643ba7f614d83c4 * Change AlphaUsage to 5GB Change-Id: I5d25c6b5750684510cda8b14a27f38d5b2b07408 * change config lock Change-Id: Ib7c7a54555ba2387c9aa8dd60a0501b0ee6491dd * Use Scan properly Change-Id: Ie39cf4644e3ddd703a254e2f5e616763dd805235 * Fix Config Lock Change-Id: I558ecc1c1becfaaefc7aea5ad2fe83fd6bf6b561	2020-03-16 22:53:12 +01:00
Stefan Benten	52590197c2	satellite/payments: More Cleanup and Satellite command to ensure we have stripe customers (#3805)	2020-03-16 20:34:15 +01:00
Stefan Benten	bd603c0751	satellite/payments: Improve Invoice Generation (#3800)	2020-03-13 17:07:39 +01:00
JT Olio	051569c69f	satellite: enable open registration (and add flag that disables it) SM-441 Change-Id: I47bfedb312089f6d2bfbab013bd74ad4b8aa5f5e	2020-03-11 03:53:34 +01:00
Jessica Grebenschikov	e19e3c1101	pkg/process: Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so let's set that up and see if it is helpful to gather more data. This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console. To enable the continuous profiling for a storj component, do the following: - prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions - provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api". - once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console The current implementation assumes the workload is running in GKE, however if we find it useful we can add support to enable this from anywhere. But for simplicity, it's configured this way assuming the main goal is to enable in production systems. Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47	2020-02-25 09:04:23 -08:00
paul cannon	92d86fa044	satellite/repair: fix repair concurrency This new repair timeout (configured as TotalTimeout) will include both the time to download pieces and the time to upload pieces, as well as the time to pop the segment from the repair queue. This is a move from Github PR #3645. Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8	2020-02-24 19:57:09 +00:00
Jeff Wendling	f671eb2beb	satellite/satellitedb: use queue for orders to get back fast billing This change adds two new tables to process orders as fast as we used to but in an asynchronous manner and with hopefully less storage usage. This should help scale on cockroach, but limits us to one worker. It lays the groundwork for the order processing pipeline to be queue rather than database driven. For more details, see the added fast billing changes blueprint. It also fixes the orders db so that all the timestamps that are passed to columns that do not contain a time zone are converted to UTC at the last possible opportunity, making it less likely to use the APIs incorrectly. We really should migrate to include timezones on all of our timestamp columns. Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35	2020-02-24 17:07:07 +00:00
Yingrong Zhao	77f67a8086	satellite/metainfo: add timeout for delete request Change-Id: I9cad6d7ea185fc2c0ed4e58b42e4e3a78178a79f	2020-02-20 09:10:16 +00:00
JT Olio	2ae9978304	satellite/gc: skip first gc run rationale: if GC kills the satellite, it would be nice to make it through a repair checker sweep first Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed	2020-02-13 13:41:15 +02:00
littleskunk	76849558cb	satellite/gracefulexit: increase performance and tolerate higher error rate Graceful exit is very slow at the moment. Over the last couple days we increased the batch size on Stefan's satellite to 1000 but as a side effect the error rate was increased. With a batch size of 500 the error rate looks stable. This PR will increase the default batch size to 300. Graceful exit will still be painfully slow but at least it will be a bit faster. At the same time this PR also increases the number of errors we tolerate. We don't want to DQ slow storage nodes just because they didn't finish all 300 transfers in time. We want to give them more retries. Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7	2020-02-12 11:40:15 +00:00
Egon Elbre	dbf46c4aa7	satellite/admin: administrative endpoint Admin server allows creating basic REST and html APIs for different administrative tasks. Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf	2020-02-12 12:12:50 +02:00
Cameron Ayer	b22bf16b35	satellite/overlay: add config flag for node selection free disk requirement Currently SNs report their free disk space once per hour. If a node becomes full, it has to wait until the next contact cycle begins to report; all the while receiving and failing upload requests. By increasing the minimum required disk space, we can give the storage nodes more time to report their space before they completely fill up. This change goes hand-in-hand with another change we want to implement: trigger capacity report on SN immediately upon falling below threshold. Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27	2020-02-11 18:08:25 +00:00
Qweder93	dc075eaa96	satellite/payments : deposit bonuses (credits) added Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0	2020-02-11 11:11:42 +00:00
Egon Elbre	a2b2bc676b	pkg/debug: implement control panel Control Panel allows controlling different chores and services. Currently this adds controlling of cycles. Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21	2020-01-29 16:30:31 -05:00
littleskunk	e0cb8037c1	satellite/projectusage: reduce usage limit from 5GB to 0GB Change-Id: Ie3d2509613e7a4336e2a8d2b136b32f5f308aafc	2020-01-29 20:38:39 +00:00
Ethan	149273c63f	satellite/metainfo: add cache expiration for project level rate limiting Allow rate limit project cache to expire so we can make project level rate limit changes without restarting the satellite process. Change-Id: I159ea22edff5de7cbfcd13bfe70898dcef770e42	2020-01-29 16:14:10 +00:00
Yaroslav Vorobiov	083b396c16	satellite/payments: allow floating point numbers for pricing Change-Id: I78b60134cf043746efef5371b761939a10f75aaf	2020-01-28 22:52:13 -05:00
littleskunk	a0c9f7f3b0	satellite/projectusage: reduce usage limit from 25GB to 5GB Change-Id: I2819012b520fd687ab8058000aa38d76b8208158	2020-01-29 04:01:09 +01:00
littleskunk	a6c6440ab7	satellite/order: decrease expire time from 7 days to 2 days For the last few month we had no issues with order submission. I would call it stable and now it is time to risk a lower expire time. This will increase the database performance on the satellite and it will reduce the delay for billing. The long term goal is 6h but for that step we need to change graceful exit first. At the moment storage nodes would get disuqlaified for not transfering alle pieces in less than 6 hours. Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c	2020-01-24 23:37:39 +00:00
Michal Niewrzal	6502454947	satellite/metainfo: move RS configuration to satellite With this change RS configuration will be set on the satellite. Uplink will get RS values with the BeginObject request and will use them. For backward compatibility, and to avoid a very large change, the redundancy scheme stored with the bucket is not touched. This can be done in the future. Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff	2020-01-22 09:33:53 +00:00
Ethan	21a5d70a83	satellite/metainfo: Rate limiting - API requests Limits how many times metainfo APIs can be called per second by project ID. If the limit is exceeded, the API will return Unauthorized/Too Many requests. Limit per second and the size of the limiter cache per project are configurable, as well as whether the limiter is enabled. Tests added/updated for the new rate_limit field in projects table. Tests added for exceeding limits and disabling the limiter. Change-Id: Ic8ad102de3b690a475809d4f684156d5715f20fa	2020-01-21 14:25:04 +00:00
stefanbenten	f4097d518c	satellite: reduce logging of node status Change-Id: I6618cf4bf31b856acd7a28b54011a943c03ab22a	2020-01-18 17:47:59 +00:00
Cameron Ayer	4424697d7f	satellite/accounting: refactor live accounting to hold current estimated totals Live accounting used to be a cache to store writes before they are picked up during the tally iteration, after which the cache is cleared. This created a window in which users could potentially exceed the storage limit. This PR refactors live accounting to hold current estimations of space used per project. This should also reduce DB load since we no longer need to query the satellite DB when checking space used for limiting. The mechanism by which the new live accounting system works is as follows: During the upload of any segment, the size of that segment is added to its respective project total in live accounting. At the beginning of the tally iteration we record the current values in live accounting as `initialLiveTotals`. At the end of the tally iteration we again record the current totals in live accounting as `latestLiveTotals`. The metainfo loop observer in tally allows us to get the project totals from what it observed in metainfo DB which are stored in `tallyProjectTotals`. However, for any particular segment uploaded during the metainfo loop, the observer may or may not have seen it. Thus, we take half of the difference between `latestLiveTotals` and `initialLiveTotals`, add that to the total that was found during tally, and set the result as the new live accounting total. Initially, live accounting was storing the total stored amount across all nodes rather than the segment size, which is inconsistent with how we record amounts stored in the project accounting DB, so we have refactored live accounting to record segment size. Change-Id: Ie48bfdef453428fcdc180b2d781a69d58fd927fb	2020-01-16 10:26:49 -05:00
Jeff Wendling	78c6d5bb32	satellite/satellitedb: reported_serials table for processing orders this commit introduces the reported_serials table. its purpose is to allow for blind writes into it as nodes report in so that we have minimal contention. in order to continue to accurately account for used bandwidth, though, we cannot immediately add the settled amount. if we did, we would have to give up on blind writes. the table's primary key is structured precisely so that we can quickly find expired orders and so that we maximally benefit from rocksdb path prefix compression. we do this by rounding the expires at time forward to the next day, effectively giving us storagenode petnames for free. and since there's no secondary index or foreign key constraints, this design should use significantly less space than the current used_serials table while also reducing contention. after inserting the orders into the table, we have a chore that periodically consumes all of the expired orders in it and inserts them into the existing rollups tables. this is as if we changed the nodes to report as the order expired rather than as soon as possible, so the belief in correctness of the refactor is higher. since we are able to process large batches of orders (typically a day's worth), we can use the code to maximally batch inserts into the rollup tables to make inserts as friendly as possible to cockroach. Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6	2020-01-15 19:21:21 -07:00
Isaac Hess	4950d7106a	satellite/orders: Add write cache for bw rollups Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc	2020-01-13 22:40:51 +00:00
Jeff Wendling	77fd41a02e	satellite: add an expiring lru cache around api keys Change-Id: I995429c66affd33da59b091f28f09ca122070b5e	2020-01-09 22:13:41 -07:00
Natalie Ventura Villasana	6b1829f3c3	satellite/downtime: new chore estimates downtime Adds EstimationChore to the downtime package, which is an independent chore that finds offline nodes given a configurable limit, then uptime checks those nodes, and sets a last contact success or failure given a response. For failed nodes, the chore updates the amount of downtime the node has been offline in the DowntimeTracking table. Design doc section: https://github.com/storj/storj/blob/master/docs/blueprints/storage-node-downtime-tracking.md#estimating-offline-time Jira: https://storjlabs.atlassian.net/browse/V3-2545 Change-Id: I60af95803930bf9b33232b248bb20cca6f0e0b5f	2020-01-09 15:05:13 -05:00

242 Commits
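
The estimate described in commit 4424697d7f (tally total plus half of the growth in live accounting during the tally run) can be sketched in Go as below. This is a minimal illustration, not the satellite's actual code: the function name `estimateLiveTotal` and its single-project, `int64`-byte signature are assumptions made here; only the three quantities it takes (`tallyProjectTotals`, `initialLiveTotals`, `latestLiveTotals`) come from the commit message.

```go
package main

import "fmt"

// estimateLiveTotal is a hypothetical helper (not from the storj codebase).
// It returns the new live accounting total for one project, in bytes:
// the total observed by tally plus half of what live accounting gained
// while the tally iteration was running.
func estimateLiveTotal(tallyTotal, initialLive, latestLive int64) int64 {
	// Bytes uploaded during the metainfo loop; the observer may or may
	// not have seen each of these segments, so only half is counted.
	delta := latestLive - initialLive
	return tallyTotal + delta/2
}

func main() {
	// Tally observed 1000 bytes; live accounting grew from 400 to 600
	// during the run, so half of the 200-byte difference is added.
	fmt.Println(estimateLiveTotal(1000, 400, 600)) // prints 1100
}
```

Counting half of the difference splits the uncertainty evenly: counting all of it would double-count segments the observer did see, and counting none would miss the segments it did not.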