Commit Graph

415 Commits

Author SHA1 Message Date
Egon Elbre
4b85d3d739
internal/testplanet: better error message when postgres is not defined (#3539) 2019-11-11 17:05:21 +02:00
Natalie Villasana
68a7790069 satellite/gracefulexit: select new node filtered by Distinct IP (#3435) 2019-11-06 16:38:51 -05:00
Caleb Case
3b78addb2d
Metadata Access from Uplink CLI (#3310) 2019-11-06 11:04:12 -05:00
Egon Elbre
df923c2a8f internal/dbutil/pgutil: retry create schema on a concurrent call (#3510) 2019-11-06 13:12:20 +01:00
Maximillian von Briesen
257d3946d5
storagenode/gracefulexit: allow storagenodes to concurrently transfer pieces for graceful exit (#3478) 2019-11-05 10:33:44 -05:00
Jennifer Li Johnson
aa7d15a365
storagenode/contact: exponential backoff retries for pinging Satellites (#3372) 2019-11-04 16:03:21 -05:00
Jess G
5abb91afcf
satellite: change the Peer name to Core (#3472)
* change satellite.Peer name to Core

* change to Core in testplanet

* missed a few places

* keep shared stuff in peer.go to stay consistent with storj/docs
2019-11-04 11:01:02 -08:00
Isaac Hess
4d26d0a6a6 storagenode/pieces: Add migration from v0 piece to v1 piece (#3401) 2019-11-04 17:59:45 +01:00
Jess G
8d92c288e2
satellitedb: separate migration into subcommand (#3436)
* separate sadb migration, add version check

* update checkversion to do same validation as migration

* changes per CR

* add sa migration to storj-sim

* add different debug port in storj-sim for migration

* add wait for exit for storj-sim migration

* update sa docker entrypoint to support migration

* storj-sim satellite parts all wait for migration

* upgrade golang-migrate/migrate to v4 because bug

* fix go mod tidy
2019-11-02 13:09:07 -07:00
JT Olio
41c0093e5b drpc: enable by default (#3452) 2019-11-01 22:43:24 +01:00
Maximillian von Briesen
590312970d satellite/gracefulexit: add flag for enabling/disabling graceful exit on the satellite (#3437) 2019-11-01 16:21:24 +02:00
Maximillian von Briesen
d9bb25b4b9 satellite/metainfo: support a wider range of values for RS.Total in satellite metainfo validation (#3431)
change uplink RS default configuration from 130 to 95
2019-10-31 15:04:33 -04:00
Michal Niewrzal
015350e230
storagenode-updater: add autoupdating (#3422) 2019-10-31 05:27:53 -07:00
Jeff Wendling
59f81a4a0d groupcancel/ec delete: add a timeout based on completion times
we used to do something similar for puts, but that ended up hurting
more than it helped. since deletes are best effort, we can do it
here to kill long tails or unresponsive nodes.

Change-Id: I89fd2d9dcf519d76c78ddad70bc419d1868d2df1
2019-10-30 16:18:39 -06:00
Yingrong Zhao
bfa6699e2c
satellite/repair: add timeout for repair download from a single node(#3418) 2019-10-30 16:31:08 -04:00
Natalie Villasana
4878135068
satellite/gracefulexit, storagenode/gracefulexit: add timeouts (#3407) 2019-10-30 13:40:57 -04:00
Cameron
b2ff13f1fa
{cmd/satellite, storj/satellite}: create command to run repair process in isolation (#3341)
* set up satellite repair run command

* add separated repair process to storj-sim

* add repairer peer to satellite in testplanet

* move api run cmd into api.go

* add satellite run repair to entrypoint
2019-10-29 10:55:57 -04:00
Ivan Fraixedes
016be4525a
internal/textcontext: Give proper name compiled bin (#3395)
Give a proper name when a Go binary is compiled but the passed package
is the current working directory which is passed as an empty string.
2019-10-28 17:54:55 +01:00
Egon Elbre
93353df4d6
internal/sync2: make Fence accept context (#3393) 2019-10-28 16:04:31 +02:00
Bryan White
d61b3688f7 internal/version: fix OldMinimum to use old semver type (#3373)
* fix version checker type assertion

* more fixing
2019-10-25 13:24:23 +02:00
Bryan White
0b678c23c7 {internal/version,versioncontrol}: fix old versions (#3359)
* fix old semver

* go jenkins!
2019-10-24 22:24:28 +02:00
Yingrong Zhao
fa1ac24e19
satellite/gracefulexit: add failure threshold check (#3329)
* add overall failure percentage check and inactive time frame check before sending a response to sno

* update comment

* delete node from transfer queue if it has been inactive for too long

* fix linting error

* add test config value

* fix nil pointer

* add config value into testplanet

* add unit test for overall failure threshold

* move timeframe threshold to chore

* update protolock

* add chore test

* add per peiece failure count logic

* change config name from EndpointMaxFailures to MaxFailuresPerPiece

* address comments

* fix linting error

* add error handling for no row returned from progress table

* fix test for graceful exit chore on storagenode

* fix typo InActive -> Inactive

* improve readability for failure threshold calculation

* update config lock

* change error handling for GetProgress in graceful exit endpoint on the satellite side

* return proper rpc error in endpoint

* add check in chore test for checking finish timestamp and queue
2019-10-24 12:24:42 -04:00
JT Olio
2c6fa3c5f8
pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335)
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.

instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.

Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
2019-10-22 17:57:24 -06:00
Bryan White
f468816f13
{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276) 2019-10-21 12:50:59 +02:00
Bryan White
243ba1cb17
{versioncontrol,internal/version,cmd/*}: refactor version control (#3253) 2019-10-20 09:56:23 +02:00
Egon Elbre
89ed997706
satellite/satellitedb: switch to postgres only (#3320) 2019-10-18 22:03:10 +03:00
Natalie Villasana
855fca003d satellite/metrics: create a metrics chore (#3263)
* add metrics counter and chore

* updates metrics observer interval release default and dev default to 15min

* add more specific check for remote pointers

* add Counter field to metrics chore, add counter tests

* rm redundant ObjectCount suffix

* make pointer check easier to read

* change metrics.Config.Interval to ChoreInterval

* rm unneeded var

* fix comment

* update satellite config lock
2019-10-16 14:08:33 -04:00
Cameron
76ad83f12c
satellite/accounting: add redis support to live accounting (#3213)
* set up redis support in live accounting

* move live.Service interface into accounting package and rename to Cache, pass into satellite

* refactor Cache to store one int64 total, add IncrBy method to redis client implementation

* add monkit tracing to live accounting
2019-10-16 12:50:29 -04:00
Bryan White
951c2891b9
{versioncontrol,internal/version}: add rollout to versioncontrol server (#3176) 2019-10-16 10:16:59 +02:00
Ethan Adams
1ad2ba7e3e
storagenode/gracefulexit: Add graceful exit chore and worker. (#3262)
Adds graceful exit chore and worker for V3-2614
2019-10-15 11:29:47 -04:00
Jess G
87a426f228 internal/testplanet: add satellite.API to testplanet (#3237) 2019-10-14 16:01:53 -04:00
Jennifer Li Johnson
b185dbbee2
satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
Ethan Adams
a1275746b4
satellite/gracefulexit: Implement the 'process' endpoint on the satellite (#3223) 2019-10-11 17:18:05 -04:00
Cameron
d17be58237 remove random sleep in storagenode contact (#3243) 2019-10-11 16:44:18 -04:00
Egon Elbre
e9c36d560f
satellite: make PointerDB an argument to satellite.New (#3233) 2019-10-10 21:06:26 +03:00
Egon Elbre
f60a7baf17
internal/testplanet: ensure that metainfo schema gets dropped (#3229) 2019-10-10 17:04:08 +03:00
Ethan Adams
4c4519f0be
satellite/gracefulexit: add transfer queue for pieces (#3174)
initial impl of transfer queue
updated docs represent the new design how we handle durability during exit
2019-10-07 16:38:05 -04:00
Jennifer Li Johnson
7ceaabb18e
Delete Bootstrap and Kademlia (#2974) 2019-10-04 16:48:41 -04:00
Michal Niewrzal
b25e0154c9
internal/testplanet: use postgres for pointerDB (#3139) 2019-10-04 07:12:21 -07:00
Maximillian von Briesen
08ed50bcaa
satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149) 2019-10-01 12:55:02 -04:00
Jennifer Li Johnson
755cbd4dce
storagenode/main: map aliases for kademlia config values (#3118) 2019-09-30 19:33:00 -04:00
Jennifer Li Johnson
29b96a666b
internal/testplanet: fix conn leak (#3132) 2019-09-27 09:47:57 -06:00
Cameron
c874dae596 internal/testplanet: ensure monitor chore is finished before contacting satellite (#3124) 2019-09-26 16:14:39 +03:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Stefan Benten
c71f3a3f4a internal/version: Change default endpoint to query (#3126)
* change default domain name

change default domain name to point to the new version control

* Update satellite-config.yaml.lock
2019-09-25 22:55:38 +02:00
Egon Elbre
94bbb9563d
internal/testplanet: set intervals to 15s by default (#3103) 2019-09-25 18:41:24 +03:00
Isaac Hess
580e511b4c
storagenode/storagenodedb: Migrate to separate dbs (#3081)
* storagenode/storagenodedb: Migrate to separate dbs

* storagenode/storagenodedb: Add migration to drop versions tables

* Put drop table statements into a transaction.

* Fix CI errors.

* Fix CI errors.

* Changes requested from PR feedback.

* storagenode/storagenodedb: fix tx commit
2019-09-23 12:36:46 -07:00
Jennifer Li Johnson
d2502bb51b Adds tests for kad replacement and restores kad operator configs (#3094)
* test that all nodes can check in with all satellites

* keep kademlia config

* add untrusted satellite test

* use getversion

* remove kademlia config changes in test-sim-backwards.sh

* add kademlia flags back to storj-sim storagenode

* reset kademlia flags in storagenode entrypoint
2019-09-20 16:02:23 -04:00
Egon Elbre
1ed724b7a6
internal/migrate: make rebind optional (#3071) 2019-09-20 19:26:07 +03:00
Egon Elbre
02fb891d27 internal/testplanet: reenable TestUplinksParallel (#2338) 2019-09-20 10:45:04 +03:00