storj

Author	SHA1	Message	Date
Isaac Hess	2c5e169888	storagenode/storagenodedb: Vacuum info.db to prepare for splitting storagenodedbs (#3134 )	2019-09-27 07:55:51 -06:00
Jeff Wendling	098cbc9c67	all: use pkg/rpc instead of pkg/transport all of the packages and tests work with both grpc and drpc. we'll probably need to do some jenkins pipelines to run the tests with drpc as well. most of the changes are really due to a bit of cleanup of the pkg/transport.Client api into an rpc.Dialer in the spirit of a net.Dialer. now that we don't need observers, we can pass around stateless configuration to everything rather than stateful things that issue observations. it also adds a DialAddressID for the case where we don't have a pb.Node, but we do have an address and want to assert some ID. this happened pretty frequently, and now there's no more weird contortions creating custom tls options, etc. a lot of the other changes are being consistent/using the abstractions in the rpc package to do rpc style things like finding peer information, or checking status codes. Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412	2019-09-25 15:37:06 -06:00
Isaac Hess	580e511b4c	storagenode/storagenodedb: Migrate to separate dbs (#3081 ) * storagenode/storagenodedb: Migrate to separate dbs * storagenode/storagenodedb: Add migration to drop versions tables * Put drop table statements into a transaction. * Fix CI errors. * Fix CI errors. * Changes requested from PR feedback. * storagenode/storagenodedb: fix tx commit	2019-09-23 12:36:46 -07:00
Jennifer Li Johnson	d2502bb51b	Adds tests for kad replacement and restores kad operator configs (#3094 ) * test that all nodes can check in with all satellites * keep kademlia config * add untrusted satellite test * use getversion * remove kademlia config changes in test-sim-backwards.sh * add kademlia flags back to storj-sim storagenode * reset kademlia flags in storagenode entrypoint	2019-09-20 16:02:23 -04:00
Jennifer Li Johnson	724bb44723	Remove Kademlia dependencies from Satellite and Storagenode (#2966 ) What: cmd/inspector/main.go: removes kad commands internal/testplanet/planet.go: Waits for contact chore to finish satellite/contact/nodesservice.go: creates an empty nodes service implementation satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover() satellite/peer.go: sets up contact service and endpoints storagenode/console/service.go: replaces nodeID with contact.Local() storagenode/contact/chore.go: replaces routing table with contact service storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method storagenode/contact/service.go: creates a service to return the local node and update its own capacity storagenode/monitor/monitor.go: uses contact service in place of routing table storagenode/operator.go: moves operatorconfig from kad into its own setup storagenode/peer.go: sets up contact service, chore, pingstats and endpoints satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection Removes kademlia setups in: cmd/storagenode/main.go cmd/storj-sim/network.go internal/testplane/planet.go internal/testplanet/satellite.go internal/testplanet/storagenode.go satellite/peer.go scripts/test-sim-backwards.sh scripts/testdata/satellite-config.yaml.lock storagenode/inspector/inspector.go storagenode/peer.go storagenode/storagenodedb/database.go Why: Replacing Kademlia Please describe the tests: • internal/testplanet/planet_test.go: TestBasic: assert that the storagenode can check in with the satellite without any errors TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup • satellite/contact/contact_test.go: TestFetchInfo: Tests that the FetchInfo method returns the correct info • storagenode/contact/contact_test.go: TestNodeInfoUpdated: tests that the contact chore updates the node information TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).	2019-09-19 15:56:34 -04:00
Jess G	93788e5218	remove kademlia: create upsert query to update uptime (#2999 ) * create upsert query for check-in method * add tests * fix lint err * add benchmark test for db query * fix lint and tests * add a unit test, fix lint * add address to tests * replace print w/ b.Fatal * refactor query per CR comments * fix disqualified, only set if null * fix query * add version to updatecheckin query * fix version * fix tests * change version for tests * add version to tests * add IP, add transport, mv unit test * use node.address as arg * add last ip * fix lint	2019-09-19 11:37:31 -07:00
Simon Guindon	a2b1e9fa95	storagenode/storagenodedb: refactor both data access objects and migrations to support multiple DB connections (#3057 ) * Split the info.db database into multiple DBs using Backup API. * Remove location. Prev refactor assumed we would need this but don't. * Added VACUUM to reclaim space after splitting storage node databases. * Added unique names to SQLite3 connection hooks to fix testplanet. * Moving DB closing to the migration step. * Removing the closing of the versions DB. It's already getting closed. * Swapping the database connection references on reconnect. * Moved sqlite closing logic away from the boltdb closing logic. * Moved sqlite closing logic away from the boltdb closing logic. * Remove certificate and vouchers from DB split migration. * Removed vouchers and bumped up the migration version. * Use same constructor in tests for storage node databases. * Use same constructor in tests for storage node databases. * Adding method to access underlying SQL database connections and cleanup * Adding logging for migration diagnostics. * Moved migration closing database logic to minimize disk usage. * Cleaning up error handling. * Fix missing copyright. * Fix linting error. * Add test for migration 21 (#3012) * Refactoring migration code into a nicer to use object. * Refactoring migration code into a nicer to use object. * Fixing broken migration test. * Removed unnecessary code that is no longer needed now that we close DBs. * Removed unnecessary code that is no longer needed now that we close DBs. * Fixed bug where an invalid database path was being opened. * Fixed linting errors. * Renamed VersionsDB to LegacyInfoDB and refactored DB lookup keys. * Renamed VersionsDB to LegacyInfoDB and refactored DB lookup keys. * Fix migration test. NOTE: This change does not address new tables satellites and satellite_exit_progress * Removing v22 migration to move into its own PR. * Removing v22 migration to move into its own PR. * Refactored schema, rebind and configure functions to be reusable. * Renamed LegacyInfoDB to DeprecatedInfoDB. * Cleaned up closeDatabase function. * Renamed storageNodeSQLDB to migratableDB. * Switched from using errs.Combine() to errs.Group in closeDatabases func. * Removed constructors from storage node data access objects. * Reformatted usage of const. * Fixed broken test snapshots. * Fixed linting error.	2019-09-18 12:17:28 -04:00
Simon Guindon	91d54af705	Add satellites database business objects. (#3055 ) * Add satellites database business objects. * Fixed linting error.	2019-09-16 13:54:53 -04:00
Jess G	d3ef574b20	pkg/pb: minor changes to contact.proto (#3048 ) * minor fixes to contact proto * simplify and rm nodeAddr object from client	2019-09-13 19:37:32 -05:00
Egon Elbre	ca058e606f	storagenode/orders: fix data race in settle (#3042 )	2019-09-13 15:50:39 +03:00
Cameron	44bcdd222f	storagenode/contact: test node info is updated (#3039 )	2019-09-13 07:53:48 -04:00
Jeff Wendling	0dcbd3dc08	bootstrap/satellite/certificate/storagenode: register drpc services Change-Id: Id29f14b76a8c9cb2be31001b9a7a4356a4bda183	2019-09-12 15:09:46 -06:00
Cameron	ab1147afb6	storagenode/pieces: fix race condition in cache service (#2972 ) * add mutex lock to PersistCacheTotals, move lock around copyCacheTotals to inside function	2019-09-12 12:42:39 -04:00
Egon Elbre	e5ac95b6e9	storagenode/inspector: fix TestInspectorStats flakiness by waiting for requests to be handled (#3018 )	2019-09-12 02:26:55 -07:00
Yingrong Zhao	9f2f1527c5	storagenode/storagenodedb: add new tables for graceful exit (#3008 ) * add database schema * add migration * change table name and update blueprint	2019-09-11 18:57:53 -04:00
Natalie Villasana	aa3567187e	satellite/audit: worker now verifies and reverifies (#2965 )	2019-09-11 18:37:01 -04:00
paul cannon	c139ed8ea1	storagenode/console: remove kademlia (#2942 ) this is a trivial operation for storagenode/console, as it doesn't really need or use kademlia in the first place. What: Removes kademlia from storagenode/console Why: We are in the process of getting rid of kademlia, and this is one place where it's particularly easy. Please describe the tests: Existing tests exercise storagenode/console behavior; if they continue to work, everything here should be tested satisfactorily. Please describe the performance impact: None	2019-09-11 16:41:43 -04:00
Isaac Hess	7718802f0c	storagenode/storagenodedb: prepare for multiple databases (#3005 ) * Migrate test: prepare for multiple databases * Add copyright * Fix unused variables * Move data to testdata, split MultiDBSnapshot from MultiDBState	2019-09-11 14:31:46 -06:00
Isaac Hess	0b32572ae6	migrate: Allow work on separate dbs (#2996 )	2019-09-10 13:42:23 -06:00
Jess G	2fc4d61610	implement contact.checkin method (#2952 ) * implement contact.checkin method * add batching to update uptime checks * rm batching * rm other unneeded things * fix lint * fix unit test * changes per CR comments * couple more CR changes * add identity check into grpcOpt * fix lint * why do you fix the test * revert test change * stop contact chore for repair test * put node in cache * comment out contact chore. See what happens * Revert "comment out contact chore. See what happens" This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa. * try stopping contact earlier * stop contact chore in uplink_test * replace self on chore with RoutingTable for access to latest node info Revert "stop contact chore in uplink_test" This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723. * Revert "try stopping contact earlier" This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c. * Revert "stop contact chore for repair test" This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.	2019-09-10 09:05:07 -07:00
Egon Elbre	a801fab66a	all: add archview annotations (#2964 )	2019-09-10 16:24:16 +03:00
Jennifer Li Johnson	3387750280	storagenode/contact: create chore for nodes to ping satellites (#2877 ) Creates a chore for nodes to announce themselves to their trusted satellites. Runs on startup and every hour thereafter	2019-09-06 12:14:03 -04:00
Yaroslav Vorobiov	c35ad5cbfc	storagenode/console: update api (#2969 )	2019-09-06 15:01:03 +03:00
paul cannon	9821a21e5c	satellite,storagenode,bootstrap: add contact service to peer (#2951 ) * satellite,storagenode,bootstrap: add contact service to peer	2019-09-04 15:04:18 -04:00
paul cannon	adfa16188b	pkg/contact: bare-bones service and endpoint (#2941 ) * pkg/contact: bare-bones service and endpoint * split contact package into satellite and node * use new contact protobuf types	2019-09-04 11:29:34 -07:00
Yaroslav Vorobiov	f7403f97b0	storagenode/storageusage: add summary, rename timestamp to interval_start (#2911 )	2019-09-04 17:13:43 +03:00
Yaroslav Vorobiov	758f7cb3dd	storagenode/bandwidth: remove bandwidth concerns from console, add satellite summary (#2923 )	2019-09-04 17:01:55 +03:00
Michal Niewrzal	ee614bf032	storagenode: add custom dial timeout for orders sending (#2939 )	2019-09-03 17:32:28 +02:00
Michal Niewrzal	3fbe31aada	storagenode: Increase order sending request timeout (#2930 )	2019-09-02 13:24:02 +02:00
littleskunk	9d1910cb2b	storagenode/orderarchive: Reduce TTL from 45 to 7 days (#2915 )	2019-08-29 22:38:09 +02:00
Ivan Fraixedes	537769d7fa	storagenode/orders: Don't return error Archiving unsent (#2903 ) Don't return an error when archiving orders which aren't found in the DB, because it causes the Storage Node send-orders cycle to stop. This was applied in the commit `e47b8ed131`, but the last call to the orders.Archive function was missed, so the errors weren't returned for not-found orders in the first call but they were returned in the second call. This commit addresses the second call so that the handleBatches function never returns an error on not-found orders.	2019-08-29 20:22:22 +02:00
Egon Elbre	8a5db77e04	storagenode/retain: add comment (#2910 )	2019-08-29 19:42:17 +03:00
Yaroslav Vorobiov	b4d7d6778f	storagenode/reputation: add disqualified flag (#2862 )	2019-08-28 23:54:12 +03:00
Egon Elbre	62e3bf5b34	storagenode/retain: fix concurrency issues (#2828 ) * nicer flags * fix concurrency * add concurrent workers * initialize things * fix tests * close retain service * ensure we don't have workers working on the same satellite * ensure things compile * fix other compilation issues: * concurrency changes ran this with `go test -count=1000` and it passed all of them. - we add a closed channel so that we can select on it with context cancellation. - we put a once in so we only close the channel once. - every time the queue/running state changes, we have to broadcast because we may want to wake up N pending Wait calls or other concurrent workers. - because we broadcast, we don't need to do the polling in Wait anymore. - ensure Run doesn't start multiple times so that we don't have to worry about concurrent Close with multiple Runs. - hold the lock while we start workers so that a concurrent Close with Run can't decide that there's nothing started and exit and then have Run start things. - make sure to poll the closed/context channels through loops or at the start of Run calls in case Close happens first. - these polls should be under a mutex because they have a default case which makes it possible to schedule such that Close hasn't executed the channel close so it starts more work. - cancel a local Run context when it's going to exit to make sure that any retainPieces calls have a canceled context. - hopefully enough comments to both check my work and help readers digest what's going on. Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7 * use the retain error class Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc * concurrency fixes again - forgot to update the gc test to use the old Wait api. - we need to drop the lock while we wait for the workers to exit, because they may be blocked on the condition variable - additionally, we need to broadcast when we close the signal channel because the state changed: they want to wake up and exit. Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5 * undo my misguided rename Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881 * remove pollInterval * format paragraph more nicely * move skew calculation into retain pieces	2019-08-28 16:35:25 -04:00
Bill Thorp	a250551b6d	storagenode/piecestore + uplink/piecestore: return `PieceHash` and original `OrderLimit` during GET_REPAIR (#2775 )	2019-08-26 14:57:41 -04:00
Egon Elbre	977472ed32	all: use fewer storage nodes to improve test memory usage (#2875 ) * storagenode/inspector: use fewer storage nodes * lib/uplinkc: use fewer storage nodes	2019-08-26 14:40:44 -04:00
Cameron	1f3537d4a9	storagenode/vouchers: remove storagenode vouchers (#2873 )	2019-08-26 19:35:19 +03:00
Maximillian von Briesen	65e2d2e711	storagenode/piecestore: ignore canceled errors on download (#2822 ) * ignore canceled errors on piecestore endpoint download	2019-08-23 11:16:43 -04:00
Cameron	3d9441999a	storagenode/orders: add archive cleanup to orders service (#2821 ) This PR introduces functionality for routine deletion of archived orders. The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted. This archive cleanup job is combined with the order sender into a new combined orders service.	2019-08-22 10:33:14 -04:00
Egon Elbre	00b2e1a7d7	all: enable staticcheck (#2849 ) * by having megacheck in disable it also disabled staticcheck * fix closing body * keep interfacer disabled * hide bodies * don't use deprecated func * fix dead code * fix potential overrun * keep stylecheck disabled * don't pass nil as context * fix infinite recursion * remove extraneous return * fix data race * use correct func * ignore unused var * remove unused consts	2019-08-22 13:40:15 +02:00
Jeff Wendling	14e36e4d60	storagenode/nodestats: fix issue on 32 bit platforms (#2841 ) * storagenode/nodestats: fix issue on 32 bit platforms time.Duration is an int64, so casting it down to an int can cause it to become negative, causing a panic. Change-Id: I33da7c29ddd59be60d8deec944a25f4a025902c7 * storagenode/nodestats: fix lint issue in test Change-Id: Ie68598d724d2cae0dc959d4877098a08f4eb9af7	2019-08-21 18:57:44 -06:00
Egon Elbre	2d69d47655	all: fix Error.New formatting (#2840 )	2019-08-21 19:30:29 +03:00
Simon Guindon	476fbf919a	storagenode/storagenodedb: refactor SQLite3 database connection initialization. (#2732 ) * Rebasing changes against master. * Added back withTx(). * Fix using new error type. * Moving back database initialization back into the struct. * Fix failing migration tests. * Fix linting errors. * Renamed database object names to be consistent. * Fixing linting error in imports. * Rebasing changes against master. * Added back withTx(). * Fix using new error type. * Moving back database initialization back into the struct. * Fix failing migration tests. * Fix linting errors. * Renamed database object names to be consistent. * Fixing linting error in imports. * Adding missing change from merge. * Fix error name.	2019-08-21 10:32:25 -04:00
Egon Elbre	9ec0ceddf3	pkg/revocation: ensure we close revocation databases (#2825 )	2019-08-20 18:04:17 +03:00
littleskunk	6615350188	initialize used space table with sum over pieceinfo (#2818 )	2019-08-20 08:13:18 -04:00
Isaac Hess	25154720bd	lib/uplink: remove redis and bolt dependencies (#2812 ) * identity: remove redis and bolt dependencies * identity: move revDB creation to main files	2019-08-19 16:10:38 -06:00
Maximillian von Briesen	d83a965139	storagenode/piecestore: Add retain service on storagenode (#2785 ) Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.	2019-08-19 14:52:47 -04:00
Ivan Fraixedes	546d099cf5	storagenode/orders: An invalid one don't have to stop all (#2804 ) When an unsent order stored in the DB cannot be unmarshalled due to an unmarshal error, the rest of the unsent orders must be processed as usual. This change prevents a Storage Node whose unsent orders include invalid protobuf-serialized values from being blocked, unable to send orders, until the invalid ones are removed from the DB.	2019-08-16 17:33:51 +02:00
Ivan Fraixedes	e47b8ed131	storagenode: No FATAL error when unsent orders aren't found (#2801 ) * pkg/process: Fatal show complete error information Change the general process execution function to not use the sugared logger, so that the full error information is output. Delete some unreachable code because the Zap logger Fatal method calls exit 1 internally. * storagenode/storagenodedb: Add info to error Add more information to an error returned due to some data inconsistency. * storagenode/orders: Don't use sugared logger Don't use the sugared logger and provide better contextualized error messages in the settle method. * storagenode/orders: Add some log fields to error msgs Add some relevant log fields to some logged errors of the sender settle method. * satellite/orders: Remove always nil error from debug Remove an error which was logged at debug level and was always nil, making the logic that used this variable clearer. * storagenode/orders: Don't return error Archiving unsent Don't stop the process which archives unsent orders if some of them aren't found in the DB, because it causes the Storage Node to stop with a fatal error.	2019-08-16 16:53:22 +02:00
Cameron	497f10d7b1	add method CleanArchive to delete archived orders (#2796 )	2019-08-15 12:56:33 -04:00


224 Commits