storj

Author	SHA1	Message	Date
Jessica Grebenschikov	f5880f6833	satellite/orders: rollout phase3 of SettlementWithWindow endpoint Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0	2020-10-26 14:56:28 +00:00
Jessica Grebenschikov	89bdb20a62	storagenodedb/orders: select unsent satellite with expiration In production we are seeing ~115 storage nodes (out of ~6,500) are not using the new SettlementWithWindow endpoint (but they are upgraded to > v1.12). We analyzed data being reported by monkit for the nodes who were above version 1.11 but were not successfully submitting orders to the new endpoint. The nodes fell into a few categories: 1. Always fail to list orders from the db; never get to try sending orders from the filestore 2. Successfully list/send orders from the db; never get to calling satellite endpoint for submitting filestore orders 3. Successfully list/send orders from the db; successfully list filestore orders, but satellite endpoint fails (with "unauthenticated" drpc error) The code change here adds the following to address these issues: - modify the query for ordersDB.listUnsentBySatellite so that we no longer select expired orders from the unsent_orders table - always process any orders that are in the ordersDB and also any orders stored in the filestore - add monkit monitoring to filestore.ListUnsentBySatellite so that we can see the failures/successes Change-Id: I0b473e5d75252e7ab5fa6b5c204ed260ab5094ec	2020-10-21 15:02:23 +00:00
paul cannon	360ab17869	satellite/audit: use LastIPAndPort preferentially This preserves the last_ip_and_port field from node lookups through CreateAuditOrderLimits() and CreateAuditOrderLimit(), so that later calls to (Verifier).GetShare() can try to use that IP and port. If a connection to the given IP and port cannot be made, or the connection cannot be verified and secured with the target node identity, an attempt is made to connect to the original node address instead. A similar change is not necessary to the other CreateOrderLimits functions, because they already replace node addresses with the cached IP and port as appropriate. We might want to consider making a similar change to CreateGetRepairOrderLimits(), though. The audit situation is unique because the ramifications are especially powerful when we get the address wrong. Failing a single audit can have a heavy cost to a storage node. We need to make extra effort in order to avoid imposing that cost unfairly. Situation 1: If an audit fails because the repair worker failed to make a DNS query (which might well be the fault on the satellite side), and we have last_ip_and_port information available for the target node, it would be unfair not to try connecting to that last_ip_and_port address. Situation 2: If a node has changed addresses recently and the operator correctly changed its DNS entry, but we don't bother querying DNS, it would be unfair to penalize the node for our failure to connect to it. So the audit worker must try both last_ip_and_port _and_ the node address as supplied by the SNO. We elect here to try last_ip_and_port first, on the grounds that (a) it is expected to work in the large majority of cases, and (b) there should not be any security concerns with connecting to an out-of-date address, and (c) avoiding DNS queries on the satellite side helps alleviate satellite operational load. Change-Id: I9bf6c6c79866d879adecac6144a6c346f4f61200	2020-10-21 13:34:40 +00:00
Jessica Grebenschikov	205c39d404	satellite/orders: upgrade to phase 2 rollout ordersWithWindow We are moving an error into rejectErr since it's preventing storage nodes from being able to settle other orders. Change-Id: I3ac97c340e491b127f5e0024c5e8bd9f4df8d5c3	2020-10-15 21:20:19 +00:00
Egon Elbre	0bdb952269	all: use keyed special comment Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b	2020-10-13 15:13:41 +03:00
Stefan Benten	1d3b728766	satellite/{console/payments/satellitedb}: add validation for deletion of account and project The same way that our Admin API handles project and account deletions currently, we would like to have the same checks on the user-facing API. This PR adds the same checks to the console service. More generally applicable checks have been moved directly into the payments service. In addition it adds the BucketsDB to the console DB, for easier access and to avoid import cycles with the metainfo package. A small cleanup around our unnecessary monkit imports made it in as well. Change-Id: I8769b01c2271c1687fbd2269a738a41764216e51	2020-10-13 07:55:26 +00:00
Jeff Wendling	4cbd4d52a9	satellite/orders: only hold the orders semaphore during database calls holding it during node i/o means slow nodes can hold up order processing for everyone else. this dramatically increases the amount of time spent handling orders. Change-Id: Iec999b7ed0817c921a0fd039097a75bdd3c70ea2	2020-10-10 15:40:50 -04:00
Jeff Wendling	0f0faf0a9f	satellite/orders: do a better job limiting concurrent requests Doing it at the ProcessOrders level was insufficient: the endpoints make multiple database calls. It was a misguided attempt to only have one spot enter the semaphore. By putting it in the endpoint we can not only be sure that the concurrency is correctly limited but it can be configurable easily. Change-Id: I937149dd077adf9eb87fce52a1a17dc0afe96f64	2020-10-09 16:27:15 -04:00
Jeff Wendling	1fecaed7df	satellite/orders: don't version check old endpoint nodes are submitting using both the legacy and windowed endpoints and thus having their legacy submissions rejected. it is legal to use both the legacy and windowed endpoints in phase1 since they use the same backend. the legacy endpoint is disabled in phase2 and phase3. therefore, if we wait an order expiration period (2 days) after we determine enough nodes have started using the windowed endpoint, we can be sure that any orders they did have to submit with the legacy endpoint will have expired. Change-Id: I4418a881bf8bb9377efaef4c651e6103a5dc6ed0	2020-09-09 10:23:48 -04:00
Egon Elbre	dc48197bd8	satellite/orders: add bucket id to order limit Change-Id: I9019ec77d692e62ac17b67a1da71dc3535cde50c	2020-09-03 10:50:11 +03:00
Egon Elbre	61b17f1214	satellite/orders: add encryption keys flag to Service Change-Id: Ie96e75bc96241b799d04654ef5e05b82e6a899bb	2020-09-02 05:02:14 +00:00
Egon Elbre	c86c732fc0	satellite: simplify tests satellite.DB.Console().Projects().GetAll database query can be replaced with planet.Uplinks[0].Projects[0].ID Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831	2020-08-28 22:28:04 +00:00
Egon Elbre	3ca405aa97	satellite/orders: use metabase types as arguments Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef	2020-08-28 15:52:37 +03:00
Jeff Wendling	91698207cf	storagenode: live tracking of order window usage This change accomplishes multiple things: 1. Instead of having a max in flight time, which means we effectively have a minimum bandwidth for uploads and downloads, we keep track of what windows have active requests happening in them. 2. We don't double check when we save the order to see if it is too old: by then, it's too late. A malicious uplink could just submit orders outside of the grace window and receive all the data, but the node would just not commit it, so the uplink gets free traffic. Because the endpoints also check for the order being too old, this would be a very tight race that depends on knowledge of the node system clock, but best to not have the race exist. Instead, we piggy back off of the in flight tracking and do the check when we start to handle the order, and commit at the end. 3. Change the functions that send orders and list unsent orders to accept a time at which that operation is happening. This way, in tests, we can pretend we're listing or sending far into the future after the windows are available to send, rather than exposing test functions to modify internal state about the grace period to get the desired effect. This brings tests closer to actual usage in production. 4. Change the calculation for if an order is allowed to be enqueued due to the grace period to just look at the order creation time, rather than some computation involving the window it will be in. In this way, you can easily answer the question of "will this order be accepted?" by asking "is it older than X?" where X is the grace period. 5. Increases the frequency we check to send up orders to once every 5 minutes instead of once every hour because we already have hour-long buffering due to the windows. This decreases the maximum latency that an order will be reported back to the satellite by 55 minutes. Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036	2020-08-19 19:42:33 +00:00
Moby von Briesen	708cb48aa6	storagenode/orders: implement orders filestore on storagenode * Add all new orders to the orders filestore instead of the database. * Submit orders from the filestore to the new satellite SettleWindow endpoint. The orders filestore will eventually replace the orders DB completely. For now, we will still be checking the orders DB and submitting those orders if they exist. In a later release, we will completely remove the orders DB, but we need both the DB and filestore for the transitionary period. Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc	2020-08-19 15:00:35 +00:00
Egon Elbre	b4c8e219c7	satellite/orders: calculate order expiration inside signer Change-Id: I07f79eeb1ab41b061a1f3146f684bd21291cffb0	2020-08-18 13:21:16 +03:00
Egon Elbre	189ab07846	satellite/orders: use Signer in CreateGetOrderLimits Change-Id: Icb7ed4f1af1dabbbb68cb6f6e1f86d93a9b5faa3	2020-08-18 13:20:00 +03:00
Egon Elbre	cd5e99ea6b	satellite/orders: Signer for simplifying signing logic Create a separate struct for signing order limits. Change-Id: I8f8f5245040efa8c03138512be9248d4834f3f36	2020-08-18 13:19:16 +03:00
Qweder93	01bb2bd17d	satellite/audit: verifier checks if node made successful GE before auditing Change-Id: Ia6cde4e9fcf11020a5301d38065f7159f276eb80	2020-08-17 23:37:57 +03:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Jeff Wendling	85a74b47e7	satellite/orders: 3-phase rollout This adds a config flag orders.window-endpoint-rollout-phase that can take on the values phase1, phase2 or phase3. In phase1, the current orders endpoint continues to work as usual, and the windowed orders endpoint uses the same backend as the current one (but also does a bit extra). In phase2, the current orders endpoint is disabled and the windowed orders endpoint continues to use the same backend. In phase3, the current orders endpoint is still disabled and the windowed orders endpoint uses the new backend that requires much less database traffic and state. The intention is to deploy in phase1, roll out code to nodes to have them use the windowed endpoint, switch to phase2, wait a couple days for all existing orders to expire, then switch to phase3. Additionally, it fixes a bug where a node could submit a bunch of orders and rack up charges for a bucket. Change-Id: Ifdc10e09ae1645159cbec7ace687dcb2d594c76d	2020-08-03 17:01:42 +00:00
Egon Elbre	36ed939b89	satellite/orders: add buckets db to service We need to add bucket UUID into the order limit, hence we need access to the buckets table. Change-Id: I348ce1f709c9fcdec5c4034acaab59805b33da9f	2020-07-24 17:36:49 +03:00
Egon Elbre	44f9193404	satellite/orders: make optimal threshold multiplier into an argument It feels weird having a repairer configuration part of order services. Let's have a single source of truth for it. Change-Id: I24f7c897aec80f3293f8af24876cbb6733d85a0b	2020-07-24 16:35:59 +03:00
Egon Elbre	ba4c3d9986	satellite/orders: remove unused node status logging flag Change-Id: I24da78a11cc5d3d88cdf6aca85c4238e4086e59c	2020-07-24 16:35:59 +03:00
Egon Elbre	b84923558b	satellite: fix scoping, formatting Change-Id: I21ef9edc2d449d75ad74891df7f966fb150d80fd	2020-07-16 19:13:14 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Jeff Wendling	1944d734ef	satellite/orders: check and enforce node api version Change-Id: Ibdeb1a85dfed8b534bfed32a7cdaae5c3dc8b420	2020-07-16 10:38:12 +00:00
Jeff Wendling	3a8766936b	satellite/orders: remove race condition in new endpoint tests the flush batch size was set to 1 which means that a flush was async scheduled after the first write. the explicit trigger wait was then always flushing nothing, and the test would only pass if the async flush was scheduled before the read. remove that async flush and pause the flush loop so that we are in full control of when the flushes happen so there are no races. the tests are still disabled but that's because the endpoint is still disabled. Change-Id: I2b7b07fd5525388c30be8efbf4af7105087228da	2020-07-14 15:12:33 -04:00
stefanbenten	257855b5de	all: replace == comparison with errors.Is Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714	2020-07-14 15:50:25 +00:00
Jessica Grebenschikov	1f1e3f0604	satellite/orders: disable settlementwithwindow, skip flaky tests Change-Id: Ia60d7e0f2d383919650cdc736ba4569bb26ff2d7	2020-07-14 07:17:18 -07:00
Jessica Grebenschikov	8abb907010	satellite/orders: add settle orders with window Why: We need a way to cut down on database traffic due to bandwidth measurement and tracking. What: This changeset is the Satellite side of settling orders in 1 hr windows. See design doc for more details: https://review.dev.storj.io/c/storj/storj/+/1732 Change-Id: I2e1c151e2e65516ebe1b7f47b7c5f83a3a220b31	2020-07-13 15:41:29 -07:00
Egon Elbre	06a3510ae4	satellite/orders: add EncryptionKey encryption Add encryption using nacl/secretbox. Change-Id: I0add31a9fd2359be1bf9d3e43b6c4b9ff3b6fb03	2020-07-13 16:10:19 +00:00
Egon Elbre	5bdcd86fa7	ci: test benchmarks This runs each benchmark for one iteration to ensure that they are valid. Unfortunately, it does not give any useful metrics as output. Change-Id: I68940398c8dd849aed656bd12656f48d5df10128	2020-07-10 13:26:49 +00:00
Jessica Grebenschikov	41497569cd	satellite/orders: add settled amount to rollups write cache https://storjlabs.atlassian.net/browse/SM-1109 Change-Id: Ic5859b141c1384157b33df0d7fb6c8b43cc8a6b1	2020-07-07 14:46:06 +00:00
Egon Elbre	735dc6e163	satellite/orders: add encryption keys This adds EncryptionKey definition that can be used as a flag. These order.EncryptionKey-s will be used to encrypt data in order limits. This helps to avoid storing lots of transient data in the main database. This code doesn't yet contain encryption itself. Change-Id: I2efae102a89b851d33342a0106f8d8b3f35119bb	2020-06-30 15:03:14 +00:00
Egon Elbre	df8cf8f58a	satellite/orders: delete unused code Change-Id: I431c8cc2f23e538c676d6f742fb1faef7cc1d73e	2020-06-25 16:48:26 +03:00
Egon Elbre	f6301612ac	satellite/orders: use serial SerialNumber By ensuring that they have less randomness it means that they can be compressed better. Using a timestamp should be a good improvement here. Change-Id: Ic4dabb53335a744ff1c332dd279f37ae2cd79357	2020-05-15 11:33:53 +00:00
Egon Elbre	7d29f2e0d3	all: remove drpc wrappers Change-Id: I45016f7d2a771dc00776196c1f531f3343e93b40	2020-05-11 08:20:34 +03:00
Egon Elbre	e6d5ce6b77	all: remove grpc It seems everyone has migrated to drpc. Change-Id: Ica6b2d0bdef68c6603083f2963458843eca71e9e	2020-05-10 06:36:09 +00:00
Egon Elbre	9052085f70	private/testplanet: simplify uplink usage Change-Id: I3e488dc296f1094ce95e6d6597ca6d3f8da90a76	2020-04-16 16:45:55 +00:00
Jeff Wendling	a409bd5dec	satellite/orders: check for expired orders first there is a subset of storagenodes hammering the satellite with expired orders. if we check for expiration first, we don't have to do a bunch of pointless signature verification. since a && b is equal to b && a, we can order these checks in any way we want and have it still be correct. Change-Id: I6ffc8025c8b0d54949a1daf5f5ea1fed9e213372	2020-04-02 12:35:11 -06:00
Egon Elbre	6492b13d81	all: remove old uuid Change-Id: I3a137f73456f010c37d3933dbe12cbbb840b809f	2020-04-02 19:30:36 +03:00
Egon Elbre	1024bf9ce1	all: simplify uuid usage Instead of uuid.Parse, use uuid.FromString. This removes a bunch of pointer management logic. Change-Id: Id25bd174eb43c71d00b450158a198abafd8958f2	2020-04-02 13:45:19 +00:00
Egon Elbre	0a69da4ff1	all: switch to storj.io/common/uuid Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f	2020-03-31 19:16:41 +03:00
Egon Elbre	439aba922a	satellite/overlay: reduce overhead of GetNodes Instead of filtering on the client side it's better to filter on the database side. Change-Id: I845fbbe5ed28c2ffdb0b8a3f789b59c094fd1069	2020-03-30 18:36:23 +03:00
Egon Elbre	cb781d66c7	satellite/overlay: optimize FindStorageNodes Reduce the number of fields returned from the query. Benchmark results in `satellite/overlay`: benchstat before.txt after2.txt name old time/op new time/op delta SelectStorageNodes-32 7.85ms ± 1% 6.27ms ± 1% -20.18% (p=0.002 n=10+4) SelectNewStorageNodes-32 8.21ms ± 1% 6.61ms ± 0% -19.53% (p=0.002 n=10+4) SelectStorageNodesExclusion-32 17.2ms ± 1% 15.9ms ± 1% -7.55% (p=0.002 n=10+4) SelectNewStorageNodesExclusion-32 17.8ms ± 2% 16.1ms ± 0% -9.38% (p=0.002 n=10+4) FindStorageNodes-32 48.4ms ± 1% 45.1ms ± 0% -6.69% (p=0.002 n=10+4) FindStorageNodesExclusion-32 79.2ms ± 1% 76.1ms ± 1% -3.89% (p=0.002 n=10+4) Benchmark results from `satellite/overlay` after making them parallel: benchstat before-parallel.txt after2-parallel.txt name old time/op new time/op delta SelectStorageNodes-32 548µs ± 1% 353µs ± 1% -35.60% (p=0.029 n=4+4) SelectNewStorageNodes-32 562µs ± 0% 368µs ± 0% -34.51% (p=0.029 n=4+4) SelectStorageNodesExclusion-32 1.02ms ± 1% 0.84ms ± 0% -18.08% (p=0.029 n=4+4) SelectNewStorageNodesExclusion-32 1.03ms ± 1% 0.86ms ± 2% -16.22% (p=0.029 n=4+4) FindStorageNodes-32 3.11ms ± 0% 2.79ms ± 1% -10.27% (p=0.029 n=4+4) FindStorageNodesExclusion-32 4.75ms ± 0% 4.43ms ± 1% -6.56% (p=0.029 n=4+4) Change-Id: I1d85e2764eb270f4c2b1998303ccfc1179d65b26	2020-03-30 18:36:23 +03:00
JT Olio	5511827662	satellite/orders: don't log expired order limits we still need to come up with a better plan to get storage nodes to stop doing this, but in the meantime, we know this is happening, just stop logging it and keep some stats instead. Change-Id: Icb6bcba275e0e955c54b1a90da2b37219fff2349	2020-03-26 22:31:10 -06:00
Jeff Wendling	115f4559e5	satellite/orders: more efficient processing of orders by doing an indexed anti-join we're able to reduce the time to select the pending orders by over 10x on postgres. this should help us process pending orders much more quickly. it probably won't do as good a job on cockroach because it does not do an indexed anti-join and instead does a hash join after scanning the entire consumed serials table. we should either remove orders entirely or try to make that more efficient when necessary. Change-Id: I8ca0535acd21c51e74955b24c9b86d20e4f2ff9c	2020-03-18 09:03:30 +00:00
Jeff Wendling	7baa59753a	satellite/orders: add tests for double sending the same order Change-Id: If2fa7f035257df3b04f506f81aa8b2e0916f5033	2020-03-17 14:18:03 +00:00
Ethan	bdbf764b86	satellite/orders;overlay: Consolidate order limit storage node lookups into 1 query. https://storjlabs.atlassian.net/browse/SM-449 Change-Id: Idc62cc2978fba67cf48f7c98b27b0f996f9c58ac	2020-03-16 23:15:47 +00:00

144 Commits