storj

Author	SHA1	Message	Date
Michal Niewrzal	7dde184cb5	Merge 'master' branch Change-Id: I6070089128a150a4dd501bbc62a1f8b394aa643e	2020-11-10 11:58:59 +00:00
Moby von Briesen	2fbb4095b2	storagenode/orders/ordersfile: Handle remaining pb.Unmarshal errors Missed one case of Unmarshal in the previous commit for V0 files (0f4e4969b7) In V1, unmarshalling was being attempted before the checksum was verified, so this commit moves those calls to the end of the V1 ReadOne function. Change-Id: Ic0b49f0bbc91fb61fb28af6003060994d0af22ed	2020-10-26 20:27:05 +00:00
Moby von Briesen	53ba01b1f1	storagenode/orders/ordersfile/v0.go: Return ErrEntryCorrupt on pb.Unmarshal failure In V0 orders files, unexpected EOF is correctly treated as a file corruption, but pb.Unmarshal can also fail, and this is not treated as a file corruption. This commit fixes that. Change-Id: I6b446a10f4b1a5a44e832cbcc9bf8b2548cfcfeb	2020-10-26 17:38:22 +00:00
Jessica Grebenschikov	f5880f6833	satellite/orders: rollout phase3 of SettlementWithWindow endpoint Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0	2020-10-26 14:56:28 +00:00
Jessica Grebenschikov	89bdb20a62	storagenodedb/orders: select unsent satellite with expiration In production we are seeing that ~115 storage nodes (out of ~6,500) are not using the new SettlementWithWindow endpoint (but they are upgraded to > v1.12). We analyzed data being reported by monkit for the nodes that were above version 1.11 but were not successfully submitting orders to the new endpoint. The nodes fell into a few categories: 1. Always fail to list orders from the db; never get to try sending orders from the filestore 2. Successfully list/send orders from the db; never get to calling satellite endpoint for submitting filestore orders 3. Successfully list/send orders from the db; successfully list filestore orders, but satellite endpoint fails (with "unauthenticated" drpc error) The code change here adds the following to address these issues: - modify the query for ordersDB.listUnsentBySatellite so that we no longer select expired orders from the unsent_orders table - always process any orders that are in the ordersDB and also any orders stored in the filestore - add monkit monitoring to filestore.ListUnsentBySatellite so that we can see the failures/successes Change-Id: I0b473e5d75252e7ab5fa6b5c204ed260ab5094ec	2020-10-21 15:02:23 +00:00
Moby von Briesen	02cbf1e72a	storagenode/orders: Add V1 orders file V1 allows the storagenode to continue reading orders from an unsent/archived orders file, even if orders in the middle are corrupted. Change-Id: Iea4117d55c05ceeb77f47d5c973e5ba95da46c66	2020-10-14 15:04:33 +00:00
Stefan Benten	c1ca470e7e	storagenode/orders: fix import and cleanup go.mod and go.sum Accidentally we imported the wrong monkit package with a previous commit and made our go.mod and go.sum file unclean. This should fix it. Change-Id: I4c3c8b696f59cfd06dc2d5436bb7aea2805936ce	2020-10-09 00:04:57 +02:00
Moby von Briesen	3209effeb6	storagenode/orders: Increase order sending interval from 5m to 1h Since storage nodes check to see if any order files can be sent every 5 minutes, every storage node attempts to send orders to the satellite within 5 minutes of each hour since this is when the files become "available" to send. It is placing a lot of load on our satellite and storage nodes are not being paid out properly due to timeouts during order sending due to the increased satellite load. Change-Id: I44d991b5884b8c11e8a3856d39aee8323f086b51	2020-10-08 12:51:21 -04:00
Moby von Briesen	fbf2c0b242	storagenode/orders: Refactor orders store Abstract details of writing and reading data to/from orders files so that adding V1 and future maintenance are easier. Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9	2020-10-06 15:28:07 -04:00
Moby von Briesen	8287e3a32d	storagenode/orders/store.go: combine writeLimit/writeOrder operations Combine store.writeLimit and store.writeOrder into store.writeLimitAndOrder, which only requires a single call to file.Write(). This simplifies code, but it also reduces the likelihood of multiple calls to Write() increasing the likelihood of file corruption. Also combine the corresponding readLimit/readOrder functions for consistency. Change-Id: I62ed406fa2c02708465a678d18293f510f666440	2020-09-22 17:53:12 +00:00
Moby von Briesen	7db5794c16	storagenode/orders/store: Do not lock order enqueues for entire duration of ListUnsentBySatellite We only need to acquire mutexes inside ListUnsentBySatellite when we want to determine whether a file has an active enqueue in progress. On some nodes, ListUnsentBySatellite can take a particularly long time, having undesired side-effects, so if we can minimize locking time, those nodes will be better off. Also, lock archive mu during ListUnsentBySatellite so files cannot be archived and listed at the same time. Change-Id: Ieb7e2a759c20c724a74dd8315728c873ccab14a3	2020-09-15 15:15:30 +00:00
Moby von Briesen	789b07e226	storagenode/orders/store.go: Do not return error from ListUnsentBySatellite when order files are corrupted. If we see an UnexpectedEOF error when attempting to read orders, return the orders we have been able to read successfully and do not return an error. This behavior ensures that the storagenode orders service attempts to archive corrupted files and does not retry them repeatedly and get stuck. Change-Id: I0d00d1e174f968af6e99ca861eddad190f1339e2	2020-09-10 23:36:05 +00:00
Stefan Benten	179b5adad4	storagenode/orders: add missing mon.Task parameter Change-Id: If98cf347a81f29698a6bdb0907520d60f71db433	2020-09-06 00:05:53 +00:00
Egon Elbre	c86c732fc0	satellite: simplify tests satellite.DB.Console().Projects().GetAll database query can be replaced with planet.Uplinks[0].Projects[0].ID Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831	2020-08-28 22:28:04 +00:00
Egon Elbre	3ca405aa97	satellite/orders: use metabase types as arguments Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef	2020-08-28 15:52:37 +03:00
Moby von Briesen	68b67c83a7	storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order. When we call ordersStore.BeginEnqueue, the unsent orders file for that satellite and hour is prevented from being sent. It is freed when the commit callback returned by BeginEnqueue is used. This change ensures that we always call the commit callback, even when we have an empty order or an order with Amount <= 0. Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93	2020-08-21 11:35:31 -04:00
Jeff Wendling	91698207cf	storagenode: live tracking of order window usage This change accomplishes multiple things: 1. Instead of having a max in flight time, which means we effectively have a minimum bandwidth for uploads and downloads, we keep track of what windows have active requests happening in them. 2. We don't double check when we save the order to see if it is too old: by then, it's too late. A malicious uplink could just submit orders outside of the grace window and receive all the data, but the node would just not commit it, so the uplink gets free traffic. Because the endpoints also check for the order being too old, this would be a very tight race that depends on knowledge of the node system clock, but best to not have the race exist. Instead, we piggy back off of the in flight tracking and do the check when we start to handle the order, and commit at the end. 3. Change the functions that send orders and list unsent orders to accept a time at which that operation is happening. This way, in tests, we can pretend we're listing or sending far into the future after the windows are available to send, rather than exposing test functions to modify internal state about the grace period to get the desired effect. This brings tests closer to actual usage in production. 4. Change the calculation for if an order is allowed to be enqueued due to the grace period to just look at the order creation time, rather than some computation involving the window it will be in. In this way, you can easily answer the question of "will this order be accepted?" by asking "is it older than X?" where X is the grace period. 5. Increases the frequency we check to send up orders to once every 5 minutes instead of once every hour because we already have hour-long buffering due to the windows. This decreases the maximum latency that an order will be reported back to the satellite by 55 minutes. Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036	2020-08-19 19:42:33 +00:00
Moby von Briesen	708cb48aa6	storagenode/orders: implement orders filestore on storagenode * Add all new orders to the orders filestore instead of the database. * Submit orders from the filestore to the new satellite SettleWindow endpoint. The orders filestore will eventually replace the orders DB completely. For now, we will still be checking the orders DB and submitting those orders if they exist. In a later release, we will completely remove the orders DB, but we need both the DB and filestore for the transitionary period. Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc	2020-08-19 15:00:35 +00:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Egon Elbre	e70da5cd4e	all: fix comments Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a	2020-07-16 19:13:14 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Moby von Briesen	1b807761bd	storagenode/orders: Update orders filestore to be compatible with new satellite endpoint * Instead of archiving a list of orders and deleting an "unsent" file in separate steps, archival simply moves the old unsent file to a new archived file * Add maxInFlightTime to be used along with grace period for sending buffer * Create unsent/archival directories in constructor * Code cleanup Change-Id: Ia3bc2aaf60cced6c6d413465423d78c7d5151188	2020-07-15 14:21:56 -04:00
stefanbenten	257855b5de	all: replace == comparison with errors.Is Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714	2020-07-14 15:50:25 +00:00
Moby von Briesen	be59727790	storagenode/orders: Add archival functionality to orders filestore * Allow orders to be archived after being settled successfully with the satellite. * Allow for cleanup of orders that were archived before a certain time. * Rewrite other parts of the orders file store to work better with new design. Change-Id: I39bea96d80e66a324ec522745169bd6d8b351751	2020-06-11 08:47:37 +00:00
Moby von Briesen	c8c0a42269	storagenode/orders: begin implementation of file store for order limits * Will replace order limits database. * This change adds functionality for storing and listing unsent orders. * The next change will add functionality for order archival after submission. Change-Id: Ic5e2abc63991513245b6851a968ff2f2e18ce48d	2020-06-03 22:47:04 +00:00
Egon Elbre	94b2b315f7	storagenode/trust: refactor GetAddress to GetNodeURL Most places now need the NodeURL rather than the ID and Address separately. This simplifies code in multiple places. Change-Id: I52621d8ca52296a8b5bf7afbc1001cf8bfb44239	2020-05-20 11:05:15 +00:00
Egon Elbre	ed627144ed	all: use DialNodeURL throughout the codebase Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c	2020-05-20 10:36:30 +00:00
Kaloyan Raev	e0cf9ae888	storagenode/orders: set devDefault sender interval to 30s Change-Id: Ic2877da12b5985b73d1565e8ccd65bc817a76448	2020-05-11 18:55:56 +03:00
Egon Elbre	00b2dfa313	storagenode/orders: fix TestDB and TestCleanArchive Tests were relying on too exact time values. Change-Id: I83aa0e815b06372a65ec2d62b23df86051050d19	2020-04-30 15:00:06 +03:00
Yingrong Zhao	b7b19289d1	bump storj.io/common to latest Change-Id: I16e337660ce8e1ef332cc842dbf4cfa067b9b98b	2020-03-25 09:08:40 -04:00
Bill Thorp	94c11c5212	satellite: remove some unnecessary UTC() calls Fixes some easy cases of extraneous UTC() calls Change-Id: I3f4c287ae622a455b9a492a8892a699e0710ca9a	2020-03-13 13:49:44 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Egon Elbre	10be538602	storagenode: add pkg/debug support Change-Id: If941095b886c28a0d53fff4c9bf9fa0ce7471dea	2020-01-29 16:30:31 -05:00
nerdatwork	9ea32016c2	storagenode/orders: fix typos in log messages (#3760)	2020-01-26 13:45:57 -05:00
Egon Elbre	21f53e38da	storagenode/storagenodedb/storagenodedbtest: pass ctx as an argument Change-Id: I10b0a8ef3a7d5001e7d361f1873ad5987af1f9c2	2020-01-20 16:56:12 +02:00
Egon Elbre	d80cfeb4ab	storagenode: ensure we don't eat the underlying error When error is formatted using %v it's not possible to check whether the error was caused by a context cancellation. Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee	2020-01-14 20:26:23 +00:00
Stefan Benten	758fe35aba	storagenode/orders: adding jitter to sending (#3725)	2019-12-30 21:35:26 +01:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Egon Elbre	d55288cf68	pkg/rpc: replace methods with direct calls to pb Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc	2019-12-22 17:12:43 +02:00
littleskunk	c2ea75208f	storagenode/orderdb: fix db lock Change-Id: Id1add0ba7ae1b20bd98099bd4d3aff0fcfdd90c9	2019-12-15 23:41:22 +01:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
JT Olio	2c6fa3c5f8	pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335) libuplink was incorrectly setting timeouts to 10 seconds still, but should have been at least 10 minutes. the order sender was setting them to 1 hour. we don't want timeouts in uplink-side logic as it establishes a minimum rate on tcp streams. instead of all of this, just use tcp keep alive. tcp keep alive packets are sent every 15 seconds and if the peer stops responding the connection dies. this is enabled by default with go. this will kill tcp connections when they stop working. Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b	2019-10-22 17:57:24 -06:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Egon Elbre	ca058e606f	storagenode/orders: fix data race in settle (#3042)	2019-09-13 15:50:39 +03:00
Natalie Villasana	aa3567187e	satellite/audit: worker now verifies and reverifies (#2965)	2019-09-11 18:37:01 -04:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964)	2019-09-10 16:24:16 +03:00
Michal Niewrzal	ee614bf032	storagenode: add custom dial timeout for orders sending (#2939)	2019-09-03 17:32:28 +02:00
Michal Niewrzal	3fbe31aada	storagenode: Increase order sending request timeout (#2930)	2019-09-02 13:24:02 +02:00
littleskunk	9d1910cb2b	storagenode/orderarchive: Reduce TTL from 45 to 7 days (#2915)	2019-08-29 22:38:09 +02:00
Ivan Fraixedes	537769d7fa	storagenode/orders: Don't return error Archiving unsent (#2903) Don't return error when archiving errors which aren't found in the DB because it causes the Storage Node send orders cycle to stop. This was applied in the commit `e47b8ed131` but the last call to the orders.Archive function was missed, so the errors weren't returned for not-found orders in the first call but they were returned in the second call. This commit addresses the second call so that the handleBatches function never returns an error on not-found orders.	2019-08-29 20:22:22 +02:00

79 Commits