Commit Graph

389 Commits

Author SHA1 Message Date
Natalie Villasana
855fca003d satellite/metrics: create a metrics chore (#3263)
* add metrics counter and chore

* updates metrics observer interval release default and dev default to 15min

* add more specific check for remote pointers

* add Counter field to metrics chore, add counter tests

* rm redundant ObjectCount suffix

* make pointer check easier to read

* change metrics.Config.Interval to ChoreInterval

* rm unneeded var

* fix comment

* update satellite config lock
2019-10-16 14:08:33 -04:00
Cameron
76ad83f12c
satellite/accounting: add redis support to live accounting (#3213)
* set up redis support in live accounting

* move live.Service interface into accounting package and rename to Cache, pass into satellite

* refactor Cache to store one int64 total, add IncrBy method to redis client implementation

* add monkit tracing to live accounting
2019-10-16 12:50:29 -04:00
Bryan White
951c2891b9
{versioncontrol,internal/version}: add rollout to versioncontrol server (#3176) 2019-10-16 10:16:59 +02:00
Ethan Adams
1ad2ba7e3e
storagenode/gracefulexit: Add graceful exit chore and worker. (#3262)
Adds graceful exit chore and worker for V3-2614
2019-10-15 11:29:47 -04:00
Jess G
87a426f228 internal/testplanet: add satellite.API to testplanet (#3237) 2019-10-14 16:01:53 -04:00
Jennifer Li Johnson
b185dbbee2
satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
Ethan Adams
a1275746b4
satellite/gracefulexit: Implement the 'process' endpoint on the satellite (#3223) 2019-10-11 17:18:05 -04:00
Cameron
d17be58237 remove random sleep in storagenode contact (#3243) 2019-10-11 16:44:18 -04:00
Egon Elbre
e9c36d560f
satellite: make PointerDB an argument to satellite.New (#3233) 2019-10-10 21:06:26 +03:00
Egon Elbre
f60a7baf17
internal/testplanet: ensure that metainfo schema gets dropped (#3229) 2019-10-10 17:04:08 +03:00
Ethan Adams
4c4519f0be
satellite/gracefulexit: add transfer queue for pieces (#3174)
initial impl of transfer queue
updated docs represent the new design how we handle durability during exit
2019-10-07 16:38:05 -04:00
Jennifer Li Johnson
7ceaabb18e
Delete Bootstrap and Kademlia (#2974) 2019-10-04 16:48:41 -04:00
Michal Niewrzal
b25e0154c9
internal/testplanet: use postgres for pointerDB (#3139) 2019-10-04 07:12:21 -07:00
Maximillian von Briesen
08ed50bcaa
satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149) 2019-10-01 12:55:02 -04:00
Jennifer Li Johnson
755cbd4dce
storagenode/main: map aliases for kademlia config values (#3118) 2019-09-30 19:33:00 -04:00
Jennifer Li Johnson
29b96a666b
internal/testplanet: fix conn leak (#3132) 2019-09-27 09:47:57 -06:00
Cameron
c874dae596 internal/testplanet: ensure monitor chore is finished before contacting satellite (#3124) 2019-09-26 16:14:39 +03:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Stefan Benten
c71f3a3f4a internal/version: Change default endpoint to query (#3126)
* change default domain name

change default domain name to point to the new version control

* Update satellite-config.yaml.lock
2019-09-25 22:55:38 +02:00
Egon Elbre
94bbb9563d
internal/testplanet: set intervals to 15s by default (#3103) 2019-09-25 18:41:24 +03:00
Isaac Hess
580e511b4c
storagenode/storagenodedb: Migrate to separate dbs (#3081)
* storagenode/storagenodedb: Migrate to separate dbs

* storagenode/storagenodedb: Add migration to drop versions tables

* Put drop table statements into a transaction.

* Fix CI errors.

* Fix CI errors.

* Changes requested from PR feedback.

* storagenode/storagenodedb: fix tx commit
2019-09-23 12:36:46 -07:00
Jennifer Li Johnson
d2502bb51b Adds tests for kad replacement and restores kad operator configs (#3094)
* test that all nodes can check in with all satellites

* keep kademlia config

* add untrusted satellite test

* use getversion

* remove kademlia config changes in test-sim-backwards.sh

* add kademlia flags back to storj-sim storagenode

* reset kademlia flags in storagenode entrypoint
2019-09-20 16:02:23 -04:00
Egon Elbre
1ed724b7a6
internal/migrate: make rebind optional (#3071) 2019-09-20 19:26:07 +03:00
Egon Elbre
02fb891d27 internal/testplanet: reenable TestUplinksParallel (#2338) 2019-09-20 10:45:04 +03:00
Isaac Hess
8b358ef365
internal/dbutil/sqliteutil: Fix error handling, ensure connections are closed (#3078)
* internal/dbutil/sqliteutil: Fix error handling, ensure connections are closed

* internal/dbutil/sqliteutil: Separate function to handle conn

* internal/dbutil/sqliteutil: Fix names
2019-09-19 15:21:03 -06:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00
Jess G
93788e5218
remove kademlia: create upsert query to update uptime (#2999)
* create upsert query for check-in method

* add tests

* fix lint err

* add benchmark test for db query

* fix lint and tests

* add a unit test, fix lint

* add address to tests

* replace print w/ b.Fatal

* refactor query per CR comments

* fix disqualified, only set if null

* fix query

* add version to updatecheckin query

* fix version

* fix tests

* change version for tests

* add version to tests

* add IP, add transport, mv unit test

* use node.address as arg

* add last ip

* fix lint
2019-09-19 11:37:31 -07:00
JT Olio
946ec201e2
metainfo: move api keys to part of the request (#3069)
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.

Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.

One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.

Please describe the tests: all existing tests should pass

Please describe the performance impact: none
2019-09-19 10:19:29 -06:00
Isaac Hess
fd20fa38c6
internal/dbutil/sqliteutil: add MigrateTablesToDatabase (#3064)
* internal/dbutil/sqliteutil: add migrator

* internal/dbutil/sqliteutil: Fix errors and tablename
2019-09-17 15:42:40 -06:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
littleskunk
1d8cd526e0 storj-sim: correct storagenode dashboard config (#3010) 2019-09-12 15:20:52 +03:00
Egon Elbre
8b668ab1f8
satellite/metainfo.Loop: use a parsed path for observers (#3003) 2019-09-12 13:38:49 +03:00
Natalie Villasana
aa3567187e
satellite/audit: worker now verifies and reverifies (#2965) 2019-09-11 18:37:01 -04:00
Natalie Villasana
dbe90926ca
internal/testplanet: reduce coalesce duration (#3009) 2019-09-11 18:15:14 -04:00
Michal Niewrzal
40ff56f6c7
versioncontrol: add new version schema (#2991) 2019-09-11 07:01:36 -07:00
Egon Elbre
b9e5331193
internal/migrate: remove duplicate logging (#3000) 2019-09-11 16:34:48 +03:00
Isaac Hess
0b32572ae6
migrate: Allow work on separate dbs (#2996) 2019-09-10 13:42:23 -06:00
Egon Elbre
a801fab66a
all: add archview annotations (#2964) 2019-09-10 16:24:16 +03:00
Jennifer Li Johnson
3387750280 storagenode/contact: create chore for nodes to ping satellites (#2877)
Creates a chore for nodes to announce themselves to their trusted satellites. Runs on startup and every hour thereafter
2019-09-06 12:14:03 -04:00
Michal Niewrzal
587be8f206
auto-updater: download versions and archive with binary (#2922) 2019-09-05 23:10:05 +02:00
Natalie Villasana
6d363fb756
satellite/audit: create the audit queue, chore, and worker (#2888) 2019-09-05 11:40:52 -04:00
Michal Niewrzal
61168493dc
uplink: don't stop deleting segments on first error (#2943) 2019-09-05 14:25:30 +02:00
Cameron
af5fb8e9c5
satellite/vouchers: deprecate voucher endpoint, return 'please upgrade' error (#2940)
* voucher endpoint returns 'please upgrade' error, test
2019-09-04 13:21:02 -04:00
Yaroslav Vorobiov
dadd7327df
satellite/nodestats: return storage usage in Byte*hours (#2858) 2019-09-04 19:05:34 +03:00
Yaroslav Vorobiov
758f7cb3dd
storagenode/bandwidth: remove bandwidth concerns from console, add satellite summary (#2923) 2019-09-04 17:01:55 +03:00
Michal Niewrzal
a6721ba92f
satellite/metainfo: Improve metainfo ListSegments (#2882) 2019-08-30 23:30:18 +02:00
Egon Elbre
62e3bf5b34 storagenode/retain: fix concurrency issues (#2828)
* nicer flags

* fix concurrency

* add concurrent workers

* initialize things

* fix tests

* close retain service

* ensure we don't have workers working on the same satellite

* ensure things compile

* fix other compilation issues:

* concurrency changes

ran this with `go test -count=1000` and it passed all of them.

- we add a closed channel so that we can select on it with
  context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
  because we may want to wake up N pending Wait calls or other
  concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
  anymore.
- ensure Run doesn't start multiple times so that we don't have
  to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
  with Run can't decide that there's nothing started and exit
  and then have Run start things.
- make sure to poll the closed/context channels through loops
  or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
  case which makes it possible to schedule such that Close hasn't
  executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
  that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
  digest what's going on.

Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7

* use the retain error class

Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc

* concurrency fixes again

- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
  to exit, because they may be blocked on the condition
  variable
- additionally, we need to broadcast when we close the
  signal channel because the state changed: they want
  to wake up and exit.

Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5

* undo my misguided rename

Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881

* remove pollInterval

* format paragraph more nicely

* move skew calculation into retain pieces
2019-08-28 16:35:25 -04:00
Cameron
599324c364
satellite/dbcleanup: delete expired serials from satellite (#2867)
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
2019-08-27 13:12:38 -04:00
Egon Elbre
c309bd3fec
lint: add linting for errs package (#2881) 2019-08-27 19:07:12 +03:00
Cameron
1f3537d4a9 storagenode/vouchers: remove storagenode vouchers (#2873) 2019-08-26 19:35:19 +03:00