storj

Author	SHA1	Message	Date
paul cannon	556250911c	storagenode/monitor: add option to log only when verification check fails This is not recommended for most nodes; leaving your node running when it can't handle requests fast enough is a good way to fail audits and get disqualified, which may happen before you even know about the problem. But some Windows users are finding that this is being triggered regularly on their nodes, and that it apparently causes the whole system to lock up occasionally. We are adding this option as a way to mitigate that problem until we can collect more information. Change-Id: I7a652b0f9f970bbb9ed9f2cb3ad1cb89d90db8d7	2023-04-04 12:41:33 +00:00
Clement Sam	c3d5965ef2	storagenode/monitor: add timeout to storage dir verification Resolves https://github.com/storj/storj/issues/4567 Change-Id: Ia071c476bcd1f5c99a9874801c94db86d1e105c6	2023-03-14 13:43:14 +00:00
Clement Sam	0d58172c38	storagenode: add doc.go files for sno packages Change-Id: I23d4b8b462e1b03718d0c4801cc2aaff520e7356	2021-09-29 08:24:56 +00:00
Egon Elbre	1aec831d98	satellite/audit,storage: increase sleep delay in TestMaxVerifyCount Currently TestMaxVerifyCount flakes in some tests; try increasing the sleep time to ensure that things are slow enough to trigger the error condition. Also pass ctx to all the funcs so we can handle sleep better. Change-Id: I605b6ea8b14a0a66d81a605ce3251f57a1669c00	2021-09-10 15:30:37 +00:00
Yaroslav Vorobiov	fb00d099cf	multinode/console: list node infos Change-Id: I5cac49feff2bac6fbd7ac61dfccffd672da8e8c0	2021-01-05 14:49:11 +00:00
Stefan Benten	494bd5db81	all: golangci-lint v1.33.0 fixes (#3985)	2020-12-05 17:01:42 +01:00
Cameron Ayer	5a337c48ec	{cmd,private,storagenode}: create storage dir verification during setup Previously, we created a new file to use for directory verification every time the storage node starts. This is not helpful if the storage node points to the wrong directory when restarting. Now we will only create the file on setup. Now the file should be created only once and will be verified at runtime. Change-Id: Id529f681469138d368e5ea3c63159befe62b1a5b	2020-11-11 11:01:36 -05:00
Egon Elbre	cda67a659a	storagenode/internalpb: move inspector.proto Change-Id: I951379c3b2ff00d1bc09d6a49c026a7e723432d6	2020-10-30 14:51:26 +02:00
Cameron Ayer	ca0c1a5f0c	storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability periodically create and delete a temp file in the storage directory to verify writability. If this check fails, shut the node down. Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574	2020-08-31 21:20:49 +00:00
nerdatwork	e072febbcc	Fixed typo in log for allocated space (#3934)	2020-08-29 16:36:37 +02:00
Cameron Ayer	0155c21b44	private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop On run, write the storage directory verification file. Every time the node runs it will write the file even if it already exists. The reason we do this is because if the verification file is missing, the SN doesn't know whether it is an incorrect directory, or it simply hasn't written the file yet, and we want to keep nodes running without needing operator intervention. Once this change has been a part of the minimum version for several releases, we will move the file creation from the run command to the setup command. Run will only verify its existence. Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb	2020-08-19 19:12:21 +00:00
Cameron Ayer	586e6f2f13	private/testblobs, storage, storage/filestore: add storage dir verification to filestore Sometimes SNOs fail to properly configure or lose connection to their storage directory which can result in DQ. This causes unnecessary repair and is unfortunate for all parties. This change introduces the creation of a special file in the storage directory at runtime containing the node ID. While the storage node runs, it periodically verifies that it can find said file with the correct contents in the correct location. If not, the node will shut down with an error message. This change will solve the issue of nodes losing access to the storage directory, but it will not solve the issue of nodes pointing to the wrong directory, as the identifying file is created each time the node starts up. After this change has been the minimum version for a few releases, we will remove the creation of the directory-identifying file from the storage node run command and add it to the setup command. Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4	2020-08-19 17:18:14 +00:00
Egon Elbre	94a09ce20b	all: add missing dots Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a	2020-08-11 17:50:01 +03:00
Cameron Ayer	1f5d5235a6	storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk We only sync free disk and available space, if necessary, on startup. If the SN's disk fills up with non-storj data, we will not know about it when reporting available space to the satellite. Solution: whenever we check the node's capacity, double check free disk. If free disk < expected available space, return free disk. Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455	2020-07-22 15:08:37 +00:00
Egon Elbre	080ba47a06	all: fix dots Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2	2020-07-16 14:58:28 +00:00
Jennifer Johnson	1c1750e6be	removes bandwidth limiting On satellite, remove all references to free_bandwidth column in nodes table. On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated. Protobuf message, NodeCapacity, is left intact for backwards compatibility. Once this is released to all satellites, we can drop the column from the DB. Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838	2020-03-04 14:04:00 +00:00
Cameron Ayer	7244a6a84e	storagenode/{contact, piecestore}: implement low disk notification with cooldown When a storagenode begins to run low on capacity, we want to notify the satellite before completely running out of space. To achieve this, at the end of an upload request, the SN checks if its available space has fallen below a certain threshold. If so, trigger a notification to the satellites. The new NotifyLowDisk method on the monitor chore is implemented using the common/sync2.Cooldown type, which allows us to execute contact only once within a given timeframe, avoiding hammering the satellites with requests. This PR contains changes to the storagenode/contact package, namely moving methods involving the actual satellite communication out of Chore and into Service. This allows us to ping satellites from the monitor chore. Change-Id: I668455748cdc6741291b61130d8ef9feece86458	2020-03-03 10:45:37 -05:00
Jeff Wendling	05a240050e	storagenode: monitor available space and bandwidth Change-Id: I5763597327c5b32982faab8910c136c6c8dc18c5	2020-02-13 07:07:29 +00:00
Jeff Wendling	7999d24f81	all: use monkit v3 this commit updates our monkit dependency to the v3 version where it outputs in an influx style. this makes discovery much easier as many tools are built to look at it this way. graphite and rothko will suffer some due to no longer being a tree based on dots. hopefully time will exist to update rothko to index based on the new metric format. it adds an influx output for the statreceiver so that we can write to influxdb v1 or v2 directly. Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff	2020-02-05 23:53:17 +00:00
igor gaidaienko	efa0f6d443	storagenode/monitor: set MinimumDiskSpace default to 500GB. As a workaround it was set to 0 in the previous release. Now, according to the TOC, it must be set to 500GB. Change-Id: Ia2743d49e86683396958aff51b95df743af4f872	2020-02-04 15:55:42 +00:00
Jeff Wendling	71ff044edb	storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month Change-Id: I390569a8702164c42edddd3be020e93782227c2e	2020-01-31 16:25:52 -07:00
littleskunk	81eddaa2c1	storagenode/monitor: reduce space requirement to 0 We introduced a bug with v0.31.7, and deploying it would kick out all the storage nodes that are full. The easy fix is setting the requirement to 0. That will allow them to still start up even if they are full. Change-Id: Ie66f369952d929fcfd47f44f6e5e57eea8f51ff6	2020-01-30 01:44:45 +01:00
Egon Elbre	10be538602	storagenode: add pkg/debug support Change-Id: If941095b886c28a0d53fff4c9bf9fa0ce7471dea	2020-01-29 16:30:31 -05:00
Egon Elbre	6615ecc9b6	common: separate repository Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a	2019-12-27 14:11:15 +02:00
Isaac Hess	7d1e28ea30	storagenode: Include trash space when calculating space used This commit adds functionality to include the space used in the trash directory when calculating available space on the node. It also includes this trash value in the space used cache, with methods to keep the cache up-to-date as files are trashed, restored, and emptied. As part of the commit, the RestoreTrash and EmptyTrash methods have slightly changed signatures. RestoreTrash now also returns the keys that were restored, while EmptyTrash also returns the total disk space recovered. Each of these changes makes it possible to keep the cache up-to-date and know how much space is being used/recovered. Also changed is the signature of the PieceStoreAccess.ContentSize method. Previously this method returned only the content size of the blob, removing the size of any header data. This method has been renamed `Size` and returns both the full disk size and content size of the blob. This allows us to only stat the file once, and in some instances (i.e. cache) knowing the full file size is useful. Note: This commit simply adds the trash size data to the piece size data we were already collecting. The piece size data is not accurate for all use-cases (e.g. because it does not contain piece header data); however, this commit does not fix that problem. Now that the ContentSize (Size) method returns the full size of the file, it should be easier to fix this problem in a future commit. Change-Id: I4a6cae09e262c8452a618116d1dc66b687f59f85	2019-12-23 19:07:03 -07:00
Rafael Antonio Ribeiro Gomes	2739771761	storagenode: add bandwidth metrics (#3623) * storagenode: add bandwidth metrics * remove unnecessary metric	2019-11-21 16:51:40 -03:00
Egon Elbre	ee6c1cac8a	private: rename internal to private (#3573)	2019-11-14 21:46:15 +02:00
Jennifer Li Johnson	b185dbbee2	satellite/discovery: remove discovery related code (#3175)	2019-10-14 10:57:01 -04:00
Jennifer Li Johnson	7ceaabb18e	Delete Bootstrap and Kademlia (#2974)	2019-10-04 16:48:41 -04:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplane/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964)	2019-09-10 16:24:16 +03:00
Kaloyan Raev	9dccf59e8e	Restrict node info only for trusted satellites (#2737)	2019-08-09 12:21:41 +03:00
paul cannon	17bdb5e9e5	move piece info into files (#2629) Deprecate the pieceinfo database, and start storing piece info as a header to piece files. Institute a "storage format version" concept allowing us to handle pieces stored under multiple different types of storage. Add a piece_expirations table which will still be used to track expiration times, so we can query it, but which should be much smaller than the pieceinfo database would be for the same number of pieces. (Only pieces with expiration times need to be stored in piece_expirations, and we don't need to store large byte blobs like the serialized order limit, etc.) Use specialized names for accessing any functionality related only to dealing with V0 pieces (e.g., `store.V0PieceInfo()`). Move SpaceUsed- type functionality under the purview of the piece store. Add some generic interfaces for traversing all blobs or all pieces. Add lots of tests.	2019-08-07 20:47:30 -05:00
ethanadams	4faed2098d	method name changer per PR 2469 (#2494)	2019-07-09 16:05:14 -04:00
ethanadams	ff6f1d1b32	storagenode: add in-memory tracking for bandwidth and disk usage (#2469) * Add in-memory cache for bandwidth and space usage monitoring * moved some structs around and added error handling for get piece size query * added to existing bandwidth test. fixed typo * added test, updates from PR review, added monkit for new methods * PR review updates. renamed space used methods * changed bw cache so that only Add updates the cache and it only overwrites when the date moves forward * moved bandwidth usage to bw and space usage to pieceinfodb * fixed interface comment * removed pointer from sync.Once	2019-07-08 20:33:50 -04:00
Egon Elbre	b6ad3e9c9f	internal/testrand: new package for random data (#2282)	2019-06-26 13:38:51 +03:00
Stefan Benten	74484fc57e	Enforce our Minimum Requirements for Node Operators and sanity check them (#2155)	2019-06-10 12:14:50 +02:00
JT Olio	c4bb84f209	storagenode: add monkit task to missing places (#2107)	2019-06-04 14:31:38 +02:00
Egon Elbre	a2b61fd67c	storage node collector (#1913)	2019-05-08 14:11:59 +03:00
Stefan Benten	eeb4f6541e	Update Calculation to include the used space for the allocation (#1899)	2019-05-06 14:59:30 -04:00
Kaloyan Raev	8fc5fe1d6f	Refactor pb.Node protobuf (#1785)	2019-04-22 12:07:50 +03:00
Michal Niewrzal	e922f71f61	Piecestore space and bandwidth allocation check (#1527) * Space and bandwidth allocation check * use proper config flag * one more check + tests * use monitor to check space and bandwidth * remove unused field * check during read/write * fix linter * fix pieceid * remove unused methods * revert unneeded change	2019-04-15 12:12:22 +02:00
Michal Niewrzal	b05cf05649	Restrict slash in bucket name (#1524)	2019-03-19 15:37:28 +01:00
Egon Elbre	7961bcbc92	remove free disk check since it's unreliable (#1516)	2019-03-18 16:18:44 -04:00
Egon Elbre	05d148aeb5	Storage node and upload/download protocol refactor (#1422) refactor storage node server refactor upload and download protocol	2019-03-18 12:55:06 +02:00

45 Commits