storj

Author	SHA1	Message	Date
Egon Elbre	0bdb952269	all: use keyed special comment Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b	2020-10-13 15:13:41 +03:00
Moby von Briesen	fbf2c0b242	storagenode/orders: Refactor orders store. Abstract details of writing and reading data to/from orders files so that adding V1 and future maintenance are easier. Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9	2020-10-06 15:28:07 -04:00
JT Olio	1f711523d5	satellite/repair: switch to piecestore.UploadReader part 2 Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3	2020-09-01 12:40:54 -06:00
JT Olio	b872fe52a1	satellite/repair: switch to piecestore.UploadReader Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d	2020-09-01 09:26:54 -06:00
Cameron Ayer	ca0c1a5f0c	storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability periodically. Create and delete a temp file in the storage directory to verify writability. If this check fails, shut the node down. Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574	2020-08-31 21:20:49 +00:00
Moby von Briesen	68b67c83a7	storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order. When we call ordersStore.BeginEnqueue, the unsent orders file for that satellite and hour is prevented from being sent. It is freed when the commit callback returned by BeginEnqueue is used. This change ensures that we always call the commit callback, even when we have an empty order or an order with Amount <= 0. Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93	2020-08-21 11:35:31 -04:00
Jeff Wendling	91698207cf	storagenode: live tracking of order window usage. This change accomplishes multiple things: 1. Instead of having a max in-flight time, which means we effectively have a minimum bandwidth for uploads and downloads, we keep track of which windows have active requests happening in them. 2. We don't double check when we save the order to see if it is too old: by then, it's too late. A malicious uplink could just submit orders outside of the grace window and receive all the data, but the node would just not commit it, so the uplink gets free traffic. Because the endpoints also check for the order being too old, this would be a very tight race that depends on knowledge of the node system clock, but best to not have the race exist. Instead, we piggyback off of the in-flight tracking and do the check when we start to handle the order, and commit at the end. 3. Change the functions that send orders and list unsent orders to accept a time at which that operation is happening. This way, in tests, we can pretend we're listing or sending far into the future after the windows are available to send, rather than exposing test functions to modify internal state about the grace period to get the desired effect. This brings tests closer to actual usage in production. 4. Change the calculation of whether an order is allowed to be enqueued due to the grace period to just look at the order creation time, rather than some computation involving the window it will be in. In this way, you can easily answer the question of "will this order be accepted?" by asking "is it older than X?" where X is the grace period. 5. Increase the frequency with which we check to send up orders to once every 5 minutes instead of once every hour, because we already have hour-long buffering due to the windows. This decreases the maximum latency with which an order will be reported back to the satellite by 55 minutes. Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036	2020-08-19 19:42:33 +00:00
Cameron Ayer	0155c21b44	private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop. On run, write the storage directory verification file. Every time the node runs it will write the file, even if it already exists. We do this because if the verification file is missing, the SN doesn't know whether it is an incorrect directory or it simply hasn't written the file yet, and we want to keep nodes running without needing operator intervention. Once this change has been a part of the minimum version for several releases, we will move the file creation from the run command to the setup command. Run will only verify its existence. Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb	2020-08-19 19:12:21 +00:00
Moby von Briesen	708cb48aa6	storagenode/orders: implement orders filestore on storagenode. * Add all new orders to the orders filestore instead of the database. * Submit orders from the filestore to the new satellite SettleWindow endpoint. The orders filestore will eventually replace the orders DB completely. For now, we will still be checking the orders DB and submitting those orders if they exist. In a later release, we will completely remove the orders DB, but we need both the DB and filestore for the transitional period. Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc	2020-08-19 15:00:35 +00:00
Cameron Ayer	1f5d5235a6	storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk. We only sync free disk and available space, if necessary, on startup. If the SN's disk fills up with non-storj data, we will not know about it when reporting available space to the satellite. Solution: whenever we check the node's capacity, double check free disk. If free disk < expected available space, return free disk. Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455	2020-07-22 15:08:37 +00:00
Egon Elbre	e70da5cd4e	all: fix comments Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a	2020-07-16 19:13:14 +03:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Moby von Briesen	e9dd5b2845	storagenode/piecestore: Properly log/send metrics for all successful pieces. When an uplink or repair worker finishes uploading a piece to a storagenode, it has no reason to wait another round trip after the piece is committed to gracefully close the connection; in many (most?) cases, the connection is simply canceled once the upload is complete. This has the unintended side effect of producing a lot of "piece canceled" logs and metrics on the storagenode side, when in reality the piece uploads were successful, not canceled. This commit fixes that. Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0	2020-07-07 15:19:17 +00:00
Moby von Briesen	0b109c32e4	storagenode/piecestore/usedserials: add monkit metric for serials that are randomly deleted. This will give storagenode operators a better idea of whether the memory allocated to the usedserials store is sufficient. Change-Id: I5c30f2e39473a573f43409511ad9e2e32680479c	2020-06-09 17:04:37 -04:00
Egon Elbre	4d0cb1af7e	storagenode/piecestore: verify before checking free disk Change-Id: I3fe0383f9b13b99ef9d63ff235616ff204cf6d76	2020-06-02 17:49:14 +03:00
Qweder93	89c9672ce0	storagenode/piecestore: available storage check added in Upload Change-Id: I71e9e5f335d4320d5de8b374fe747fec43179f78	2020-06-01 16:55:22 +00:00
Moby von Briesen	dc57640d9c	storagenode/piecestore: switch usedserials db for in-memory usedserials store. Part 2 of moving usedserials in memory: * Drop usedserials table in storagenodedb * Use in-memory usedserials store in place of db for order limit verification * Update order limit grace period to be only one hour; this means uplinks must send their order limits to storagenodes within an hour of receiving them. Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03	2020-05-28 12:52:52 -04:00
Moby von Briesen	909d6d9668	storagenode/piecestore/usedserials: add in-memory store for used serials. Implement an in-memory store for keeping track of order limit serial numbers. It automatically deletes items if its size exceeds a configured limit. This change is part 1: it creates the store. In part 2, the in-memory store will replace the usedserials database. Change-Id: I36f540ed809f034a27c1d7cede8a0a8b080af818	2020-05-28 12:52:52 -04:00
littleskunk	2fbb34c3ea	nodeselection: Increase minimum free space to 500MB (#3898)	2020-05-25 12:13:28 +02:00
Egon Elbre	ed627144ed	all: use DialNodeURL throughout the codebase Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c	2020-05-20 10:36:30 +00:00
littleskunk	ef2671927d	storagenode/piecestore: move queue size defaults (#3881)	2020-05-15 19:10:26 +02:00
Egon Elbre	ec589a8289	all: fix comments about grpc Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd	2020-05-11 13:05:34 +03:00
Egon Elbre	7d29f2e0d3	all: remove drpc wrappers Change-Id: I45016f7d2a771dc00776196c1f531f3343e93b40	2020-05-11 08:20:34 +03:00
Egon Elbre	e6d5ce6b77	all: remove grpc. It seems everyone has migrated to drpc. Change-Id: Ica6b2d0bdef68c6603083f2963458843eca71e9e	2020-05-10 06:36:09 +00:00
Egon Elbre	2fad150952	storagenode/piecestore: remove piecestore.Client.Delete. This test was the last place using it. Replace it with a direct call so we can remove the method from uplink piecestore. Change-Id: I62e13028663a7e67aa2495f90ecc02d0d8657fbd	2020-05-04 10:14:55 +03:00
Jeff Wendling	57eb8a17e2	storagenode: allow configuring database path independently Fixes #3852 Change-Id: I021c29c4dd7c393399f6abef41d8457514032833	2020-05-04 06:04:31 +00:00
Matt Robinson	4a95ad0b2f	storagenode/piecestore: remove error from info log for download canceled (#3869)	2020-04-30 18:42:16 +03:00
Isaac Hess	4f492a1ca9	storagenode/piecestore: add wait to test Change-Id: I05efeffe2845f6b816bf44e50b79c115a86d4b60	2020-04-29 12:38:22 -06:00
Egon Elbre	131654b080	storagenode/piecestore: make tests use DeletePieces directly. This test was the last place that was using piecestore.Client.DeletePieces. This way we can remove it from uplink to reduce the code. Change-Id: I72fda8888d05181f95eeb544d067c031ec3e36a0	2020-04-29 15:52:22 +00:00
Isaac Hess	db0371703f	storagenode/pieces: Return UnhandledCount to satellite. When we receive a piece deletion request, include the number of piece IDs we couldn't add to the queue in the response. Change-Id: Ibebbe92ac50105bb5c74b18211ed38d468eb33f3	2020-04-27 08:56:56 -06:00
Isaac Hess	a785d37157	storagenode/pieces: Process deletes asynchronously. To improve delete performance, we want to process deletes asynchronously once the message has been received from the satellite. This change makes it so that storagenodes will send the delete request to a piece Deleter, which will process a "best-effort" delete asynchronously and return a success message to the satellite. There is a configurable number of max delete workers and a max delete queue size. Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5	2020-04-23 11:51:19 -06:00
littleskunk	1336070fec	storagenode/piecestore: add missing log message about audit errors (#3861) * storagenode/piecestore: add missing log message about audit errors * storagenode/piecestore: add monkit data for order limit verification errors	2020-04-23 13:20:47 -04:00
Jess G	dc78cd9634	storagenode/piecestore: fix annoying info log for upload canceled (#3857) * storagenode/piecestore: fix annoying info upload canceled log Change-Id: I3dd0f44226e7b946a2b30c3d0f30f28749ca6e88 * keep as info Change-Id: I7ebb5c19e4865e3030a8a6bb7f6279d316853e89 Co-authored-by: Fadila <Fadila82@users.noreply.github.com>	2020-04-15 14:15:07 -07:00
Jess G	75b9a5971e	satellite: update log levels (#3851) * satellite: update log levels Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f * log version on startup for every service Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de * Use monkit for tracking failed IP resolutions Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2 * fix compile errors Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf * add request limit value to storage node rpc err Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5 * we can't track storage node IDs in monkit metrics, so let's use logging to track that for expired orders Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887 * fix build errs Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47	2020-04-15 12:32:22 -07:00
Egon Elbre	e8f18a2cfe	private/testplanet: expose storagenode and satellite Config Change-Id: I80fe7ed8ef7356948879afcc6ecb984c5d1a6b9d	2020-03-27 17:01:25 +02:00
Egon Elbre	1b6ab173a8	private/context2: moved to storj.io/common/context2 Change-Id: Ic1dd1ed645ff3e1057c9b2b143e2c3ddf29d678e	2020-03-20 14:39:46 +00:00
Egon Elbre	f4d5d89b68	private/testplanet: add WaitForStorageNodeEndpoints. After calling uplink.Upload, it is not guaranteed that the storage node has saved all the orders yet, since this happens asynchronously. Hence we need a separate func to wait for them to complete. Change-Id: I0c34b3ea6c98dbcf37f80493c0e10a8bdbbb2aaf	2020-03-05 10:33:56 +00:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting. On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. The protobuf message NodeCapacity is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Cameron Ayer	7244a6a84e	storagenode/{contact, piecestore}: implement low disk notification with cooldown. When a storagenode begins to run low on capacity, we want to notify the satellite before completely running out of space. To achieve this, at the end of an upload request, the SN checks if its available space has fallen below a certain threshold. If so, it triggers a notification to the satellites. The new NotifyLowDisk method on the monitor chore is implemented using the common/sync2.Cooldown type, which allows us to execute contact only once within a given timeframe, avoiding hammering the satellites with requests. This PR contains changes to the storagenode/contact package, namely moving methods involving the actual satellite communication out of Chore and into Service. This allows us to ping satellites from the monitor chore. Change-Id: I668455748cdc6741291b61130d8ef9feece86458	2020-03-03 10:45:37 -05:00
Egon Elbre	64330c55b3	all: use pbgrpc. common/pb moved grpc to a separate package, common/pb/pbgrpc. This updates this repository to use it. Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e	2020-02-26 21:27:47 +02:00
Cameron Ayer	d578102672	storagenode/piecestore: add workgroup to endpoint to prevent stray goroutine after shutdown Change-Id: Ie8444c3c8f870745b73342de2e9a93027fcad371	2020-02-24 21:38:52 +00:00
Cameron Ayer	f22bddf122	{storagenode/contact, private/testplanet}: remove ErrFailureToStart and panic in testplanet.Start Change-Id: I252e8c9407400af7bda95a7657c8154660c3c801	2020-02-24 18:24:23 +00:00
Yingrong Zhao	5011e78311	storagenode/piecestore: remove unused DeletePiece endpoint. With commit `3331b443e7`, the satellite will start calling `DeletePieces`. Therefore, we can remove the old endpoint once the above commit is deployed to all satellites. Change-Id: I0124bc00a7cb808d119eb59f8fcd7fadf68158bb	2020-02-21 21:03:49 +00:00
Egon Elbre	5342dd9fe6	go.mod: update uplink Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811	2020-02-21 16:08:12 +02:00
Cameron Ayer	3e70a893dd	storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold. Currently, storage nodes only report their capacity to satellites once per hour. If a node fills up, it will fail all uploads until the next contact cycle begins. With these changes, at the end of an upload we check whether the MinimumDiskSpace threshold has been passed. If so, trigger the monitor chore to update the node's capacity, then trigger the contact chore to report the new capacity to the satellites. Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0	2020-02-18 15:42:48 -05:00
Egon Elbre	8f20085683	storagenode/piecestore: clearer client cancellation error message Change-Id: Ia0595f71eb3eb1c0f091e615652e2de376d5609d	2020-02-14 09:36:03 +00:00
Michal Niewrzal	426c8eb31a	private/testplanet: add DeleteBucket method for uplink. New method added to make it easy to delete a bucket during tests. Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2	2020-02-11 15:58:13 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3. This commit updates our monkit dependency to the v3 version, where it outputs in an influx style. This makes discovery much easier, as many tools are built to look at it this way. Graphite and rothko will suffer some due to no longer being a tree based on dots. Hopefully time will exist to update rothko to index based on the new metric format. It also adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
Jeff Wendling	03166d6be3	storagenode/piecestore: log available bandwidth and space on uploads Change-Id: Ia92228cb2a178da45f4f123b48c476e5ec821fe8	2020-01-31 19:47:14 +00:00
Jeff Wendling	16bb374deb	storagenode/piecestore: add large timeouts to read/write operations. This is to help protect against intentional or unintentional slowloris-style problems where a client keeps a TCP connection alive but never sends any data. Because grpc is great, we have to spawn a separate goroutine for every read/write to the stream so that we can return from the server handler to cancel it if necessary. Yep. Really. Additionally, we update the rpcstatus package to do some stack trace capture and add a Wrap method for the times where we want to just use the existing error. This also fixes a number of TODOs where we attach status codes to the returned errors in the endpoints. Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1	2020-01-23 19:20:49 +00:00
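Commits 909d6d9668 and 0b109c32e4 describe an in-memory used-serials store. It automatically deletes random entries once its size exceeds a configured limit, and it counts those deletions so operators can judge whether the allocated memory is sufficient. The Go sketch below shows one way to implement that eviction policy. The names (`SerialStore`, `Add`, `RandomDeleted`) are hypothetical, and the real store's layout and API are not shown in this log.

```go
package main

import (
	"fmt"
	"math/rand"
)

// SerialStore is a hypothetical, simplified bounded set of order limit
// serials with random eviction. It is not the storj implementation.
type SerialStore struct {
	maxItems      int
	serials       []string       // dense slice allows O(1) uniform random pick
	index         map[string]int // serial -> position in serials
	RandomDeleted int            // number of entries evicted to respect maxItems
}

func NewSerialStore(maxItems int) *SerialStore {
	return &SerialStore{maxItems: maxItems, index: make(map[string]int)}
}

// Add records a serial and returns false if it was already present,
// which would indicate a replayed order limit.
func (s *SerialStore) Add(serial string) bool {
	if _, ok := s.index[serial]; ok {
		return false
	}
	s.index[serial] = len(s.serials)
	s.serials = append(s.serials, serial)
	for len(s.serials) > s.maxItems {
		s.deleteRandom()
	}
	return true
}

// Exists reports whether the serial is currently tracked.
func (s *SerialStore) Exists(serial string) bool {
	_, ok := s.index[serial]
	return ok
}

// Len returns the number of tracked serials.
func (s *SerialStore) Len() int { return len(s.serials) }

// deleteRandom removes a uniformly random entry by moving the last element
// into its slot and truncating the slice.
func (s *SerialStore) deleteRandom() {
	i := rand.Intn(len(s.serials))
	victim := s.serials[i]
	last := len(s.serials) - 1
	s.serials[i] = s.serials[last]
	s.index[s.serials[i]] = i
	s.serials = s.serials[:last]
	delete(s.index, victim)
	s.RandomDeleted++
}

func main() {
	store := NewSerialStore(3)
	for _, serial := range []string{"a", "b", "c", "d", "e"} {
		store.Add(serial)
	}
	// Five distinct serials with room for three: two random evictions.
	fmt.Println(store.Len(), store.RandomDeleted) // 3 2

	fresh := NewSerialStore(10)
	fmt.Println(fresh.Add("x"), fresh.Add("x")) // true false
}
```

Random eviction bounds memory without tracking insertion order. The cost is that an evicted serial can no longer be caught as a replay, which is presumably why the deletion count is worth exposing as a metric.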


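Commits a785d37157 and db0371703f describe best-effort asynchronous deletion. Delete requests go into a bounded queue drained by a configurable number of workers, and the number of piece IDs that did not fit in the queue is returned to the satellite. The minimal sketch below uses hypothetical names (`Deleter`, `Enqueue`), and an in-memory slice stands in for actual file removal.

```go
package main

import (
	"fmt"
	"sync"
)

// Deleter is a hypothetical sketch of a bounded delete queue drained by a
// fixed pool of workers. Names and structure are illustrative only.
type Deleter struct {
	queue chan string
	wg    sync.WaitGroup
	mu    sync.Mutex
	done  []string // IDs "deleted"; a real node would remove piece files
}

// NewDeleter starts `workers` goroutines reading from a queue of size queueSize.
func NewDeleter(workers, queueSize int) *Deleter {
	d := &Deleter{queue: make(chan string, queueSize)}
	for i := 0; i < workers; i++ {
		d.wg.Add(1)
		go d.work()
	}
	return d
}

func (d *Deleter) work() {
	defer d.wg.Done()
	for id := range d.queue {
		d.mu.Lock()
		d.done = append(d.done, id) // placeholder for the actual delete
		d.mu.Unlock()
	}
}

// Enqueue adds IDs without blocking. When the queue is full it stops and
// returns how many IDs were not queued (the "unhandled" count).
// It must not be called after Close.
func (d *Deleter) Enqueue(ids []string) (unhandled int) {
	for i, id := range ids {
		select {
		case d.queue <- id:
		default:
			return len(ids) - i
		}
	}
	return 0
}

// Close stops accepting work, waits for queued deletes to finish, and
// returns the IDs that were processed.
func (d *Deleter) Close() []string {
	close(d.queue)
	d.wg.Wait()
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.done
}

func main() {
	d := NewDeleter(2, 10)
	fmt.Println("unhandled:", d.Enqueue([]string{"p1", "p2", "p3"})) // unhandled: 0
	fmt.Println("deleted:", len(d.Close()))                          // deleted: 3
}
```

Because `Enqueue` never blocks, the endpoint can answer the satellite immediately. The unhandled count tells the satellite which share of the request was dropped rather than silently losing it.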
152 Commits