Commit Graph

653 Commits

Author SHA1 Message Date
Maximillian von Briesen
abb567f6ae
cmd/satellite: add graceful exit reports command to satellite CLI (#3300)
* update lock file and add comment

* add created at and bytes transferred

* cleanup

* rename db func to GetGracefulExitNodesByTimeFrame

* fix flag

* split into two overlay functions

* := to =

* fix test

* add node not found error class

* fix overlay test

* suggested test changes

* review suggestions

* get exit status from overlay.Get()

* check rows.Err

* fix panic when ExitFinishedAt is nil

* fix comments in cmdGracefulExit
2019-10-22 21:06:01 -04:00
JT Olio
2c6fa3c5f8
pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335)
libuplink was still incorrectly setting timeouts to 10 seconds, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic, as they establish
a minimum rate on tcp streams.

instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.

Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
2019-10-22 17:57:24 -06:00
Ethan Adams
3e0d12354a
storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322) 2019-10-22 16:42:21 -04:00
Nikolai Siedov
3eec4e9070
satellite/console: add REST delete API endpoint (#3337) 2019-10-22 19:17:09 +03:00
Vitalii Shpital
867771787f
web/satellite: project owner status added on team page (#3160) 2019-10-22 14:12:49 +03:00
Michal Niewrzal
04c2454c71
satellite/metainfo: pass streamID/segmentID between Batch request/response (#3311) 2019-10-22 03:23:22 -07:00
Nikolai Siedov
6dd478e43c
satellite/console: forgot password, resend email endpoints added, default http route replaced with gorilla mux (#3327) 2019-10-21 19:42:49 +03:00
Nikolai Siedov
1814fbfa89
satellite/console: new passwordChange API endpoint (#3308) 2019-10-21 15:48:29 +03:00
Bryan White
f468816f13
{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276) 2019-10-21 12:50:59 +02:00
Bryan White
243ba1cb17
{versioncontrol,internal/version,cmd/*}: refactor version control (#3253) 2019-10-20 09:56:23 +02:00
Egon Elbre
3c438f31bd
satellite/satellitedb: remove sqlite support (#3296) 2019-10-19 00:27:57 +03:00
Egon Elbre
89ed997706
satellite/satellitedb: switch to postgres only (#3320) 2019-10-18 22:03:10 +03:00
littleskunk
2a5526fcc4
satellite/repair: reduce timeout (#3302) 2019-10-18 13:43:24 +02:00
Ivan Fraixedes
071d1c4313
upload: Add more info to returned error response & to logs (#3218)
* uplink/storage/segments: return error if optimal threshold is not met
  Return an error if the store gets fewer uploaded pieces than indicated
  by the optimal threshold.

* satellite/metainfo: Fix gRPC status error & add reason
  This commit fixes the CommitSegment endpoint method to return an
  "Invalid Argument" status code when the uplink submits invalid data, as
  detected by the filterInvalidPieces endpoint method when filtering
  invalid pieces.

  Because filterInvalidPieces is also used by CommitSegmentOld, that
  method has been changed accordingly.

  * Add an initial check in CommitSegment to detect early if the uplink sends
    an invalid number of upload pieces.
  * Add more information to some log messages.
  * Return more information to the uplink when it sends a number of invalid
    pieces that makes it impossible to finish the operation successfully.

* satellite/metainfo: Swap some "sugar" loggers to normal ones
  Swap "sugar" loggers to normal ones because they hurt performance
  in production systems and should only be used in specific
  circumstances, none of which apply to the changed call sites.
2019-10-17 20:01:40 +02:00
Nikolai Siedov
f4162bd33f
satellite/console: add REST registration API endpoint (#3303) 2019-10-17 19:34:27 +03:00
Natalie Villasana
45c35d7c3f
satellite/satellitedb: add exit_status column to nodes table (#3301) 2019-10-17 11:01:39 -04:00
Yehor Butko
26cc625dc6
satellite/console: payments api (#3297) 2019-10-17 17:42:18 +03:00
Natalie Villasana
22d0f89941
satellite/gracefulexit: use zap.Stringer instead of zap.String (#3299) 2019-10-17 10:29:35 -04:00
Yaroslav Vorobiov
24e72f35d3
satellite/payments: token deposit (#3283) 2019-10-17 17:04:50 +03:00
Natalie Villasana
855fca003d satellite/metrics: create a metrics chore (#3263)
* add metrics counter and chore

* update metrics observer interval release and dev defaults to 15min

* add more specific check for remote pointers

* add Counter field to metrics chore, add counter tests

* rm redundant ObjectCount suffix

* make pointer check easier to read

* change metrics.Config.Interval to ChoreInterval

* rm unneeded var

* fix comment

* update satellite config lock
2019-10-16 14:08:33 -04:00
Cameron
76ad83f12c
satellite/accounting: add redis support to live accounting (#3213)
* set up redis support in live accounting

* move live.Service interface into accounting package and rename to Cache, pass into satellite

* refactor Cache to store one int64 total, add IncrBy method to redis client implementation

* add monkit tracing to live accounting
2019-10-16 12:50:29 -04:00
littleskunk
6e7607239c
satellite/repair: improve logging (#3287)
* satellite/repair: improve logging

* use Stringer wherever possible
2019-10-16 17:28:56 +02:00
Nikolai Siedov
c1033fc1ed
satellite/console: move token to separate endpoint (#3292) 2019-10-16 18:01:15 +03:00
littleskunk
2301a8287f Satellite/PieceHashValidation: Increase time window from 2h to 24h to avoid timezone issues (#3291) 2019-10-16 06:47:08 -06:00
littleskunk
eeb38245ff
satellite/audit: improve logging (#3285) 2019-10-16 13:48:05 +02:00
Ethan Adams
37ab84355f
satellite/gracefulexit: protobuf field name updates (#3284)
rename piece_id to original_piece_id
2019-10-15 15:59:12 -04:00
Yehor Butko
a5f4bbee22
satellite/payments: dbx scheme renamed, userID placed on Account level (#3281) 2019-10-15 21:05:45 +03:00
Natalie Villasana
cf430d2d73
scripts: add check-monitoring script to detect changes to monkit calls (#3114) 2019-10-15 13:00:14 -04:00
Yehor Butko
4d43b67ff9
add credit card to payment account (#3279) 2019-10-15 17:50:28 +03:00
Vitalii Shpital
f1867a954b
web/satellite: project members sorting fixed (#3231) 2019-10-15 15:24:53 +03:00
Yehor Butko
501816770b
satellite/payments: receive credit cards (#3249) 2019-10-15 14:23:54 +03:00
littleskunk
5b20c716e6
satellite/repair: don't error on deleted segments (#3252) 2019-10-15 05:39:28 +02:00
Jess G
87a426f228 internal/testplanet: add satellite.API to testplanet (#3237) 2019-10-14 16:01:53 -04:00
JT Olio
1b66517664 contact: small typo
Change-Id: If9126ae518b5672bfd9163a8c9fc518727d5138b
2019-10-14 13:21:05 -06:00
Jennifer Li Johnson
b185dbbee2
satellite/discovery: remove discovery related code (#3175) 2019-10-14 10:57:01 -04:00
littleskunk
96aeedcdee
OrderLimit/GracePeriod: Increase time window from 1h to 24h (#3255)
* OrderLimit/GracePeriod: Increase time window from 1h to 24h

* update satellite config lock
2019-10-13 17:40:24 +02:00
Ethan Adams
78ccf14837 fix interface and EOF check (#3251) 2019-10-12 07:06:20 -06:00
Ethan Adams
a1275746b4
satellite/gracefulexit: Implement the 'process' endpoint on the satellite (#3223) 2019-10-11 17:18:05 -04:00
JT Olio
a5d1776539 audits: missing continue
Change-Id: Ifcac8e61ebd8c59407e01c791adc60d9f88ff1b7
2019-10-11 13:55:27 -06:00
Yaroslav Vorobiov
535742d37e
satellite/payments: add coinpayments HTTP client (#3181) 2019-10-11 18:51:14 +03:00
Yehor Butko
451909b3ec
satellite/payments: account balance (#3242) 2019-10-11 18:00:35 +03:00
Egon Elbre
e9c36d560f
satellite: make PointerDB an argument to satellite.New (#3233) 2019-10-10 21:06:26 +03:00
Yehor Butko
0cc23add5b
satellite/payments - payment account setup (#3187) 2019-10-10 20:12:23 +03:00
Bogdan Artemenko
5f775b9e46
satellite/console: Added error when attempting to add an api key with an existing name. (#3185) 2019-10-10 16:28:35 +03:00
Maximillian von Briesen
784ca1582a
satellite/audit: fix audit panic (#3217) 2019-10-09 10:06:58 -04:00
Bogdan Artemenko
ed3a424858
satellite/console: Removed 'user not found' message on password reset request (#3184) 2019-10-09 13:18:58 +03:00
Maximillian von Briesen
3a3d576d9b satellite/audit: add mutex to pieceHashesVerified map (#3214)
* add mutex

* remove double send to ch

* lock mutex inside defer

* import sync
2019-10-08 17:01:32 -04:00
Maximillian von Briesen
f75893c1ba
satellite/overlay: do not include gracefully exiting nodes in node selection (#3211) 2019-10-08 15:03:38 -04:00
Egon Elbre
a744fdfef0 satellite/metainfo: remove Iterate from service (#3196)
* satellite/metainfo: remove Iterate from service

* fix test
2019-10-08 07:39:23 -07:00
Maximillian von Briesen
0ea0d8c3da
satellite/overlay: remove overlay.IsVetted (#3203) 2019-10-08 09:25:41 -04:00
littleskunk
c009543236
satellite/audit: Add piece hash verified to log messages (#3204) 2019-10-08 12:51:57 +02:00
Egon Elbre
73d4d83467 satellite/accounting: implement tally as an observer (#2992) 2019-10-07 16:55:20 -04:00
Ethan Adams
4c4519f0be
satellite/gracefulexit: add transfer queue for pieces (#3174)
initial impl of transfer queue
updated docs to represent the new design for how we handle durability during exit
2019-10-07 16:38:05 -04:00
Maximillian von Briesen
e1b7d01160
satellite/audit: do not fail or contain nodes for audited segments that are not piece-hash-verified (#3161) 2019-10-07 16:06:10 -04:00
Maximillian von Briesen
a75e3e6b81
satellite/repairer: fix segment_time_until_repair metric (#3199) 2019-10-07 13:54:12 -04:00
littleskunk
b908a09c8e
satellite/repair: remove deprecated error message (#3193) 2019-10-06 20:54:20 +02:00
Cameron
eb5413ae5e
defer close piecestore in downloadAndVerifyPiece (#3192) 2019-10-06 13:41:53 -04:00
Jennifer Li Johnson
7ceaabb18e
Delete Bootstrap and Kademlia (#2974) 2019-10-04 16:48:41 -04:00
Egon Elbre
6afa4dd9cf
satellite/accounting: refactor code and remove unused fields (#3178) 2019-10-04 22:09:52 +03:00
Michal Niewrzal
b25e0154c9
internal/testplanet: use postgres for pointerDB (#3139) 2019-10-04 07:12:21 -07:00
Nikolay Yurchenko
51003dcad2
web/satellite: added missing alt attribute to img tags (#3172) 2019-10-04 16:22:26 +03:00
Egon Elbre
394a9c82c3
satellite/{accounting/tally,repair/checker}: create a valid test pointer (#3167) 2019-10-04 13:05:25 +03:00
Yaroslav Vorobiov
f05a2eea5b satellite/console: remove domain prefix from token cookie key (#3170) 2019-10-04 10:23:52 +03:00
Jess G
6e6d0ad9b8 split satellite: create separate API process (#3152) 2019-10-02 19:02:47 -04:00
Stefan Benten
1db4251234 Satellite/repair: Add Repair Threshold Override to allow earlier repair (#3151) 2019-10-02 14:58:37 +02:00
Natalie Villasana
4f2f8ae11b satellite/overlay: add UpdateExitStatus and GetExitingNodes for graceful exit (#3087) 2019-10-01 18:18:21 -04:00
Maximillian von Briesen
08ed50bcaa
satellite/metainfo: add commit interval to prevent long delays between order limit creation and segment commit (#3149) 2019-10-01 12:55:02 -04:00
Maximillian von Briesen
edadf46009 satellite/audit: delete nodes from containment when segment has changed (#3115) 2019-09-29 04:03:15 +02:00
Bogdan Artemenko
423d35fb3f
satellite/console: Added support URLs and other fields to config file (#3090) 2019-09-27 10:48:53 -06:00
Yaroslav Vorobiov
acbe449435 satellite/console: remove payments (#3074) 2019-09-27 12:46:37 +03:00
Cameron
fd72de211c satellite/satellitedb: update node version columns in UpdateCheckIn (#3129)
* update node version columns in UpdateCheckIn

* tests and fix sqlite implementation

* check timestamps

* edit timestamp check
2019-09-26 02:07:39 +02:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there are no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Ethan Adams
9edfb6efe0
satellite/satellitedb: Initial GE Satellite DB Implementation (#3049)
Initial GE Satellite DB impl
Add basic CRUD operations for graceful_exit_progress and graceful_exit_transfer_queue tables.
2019-09-25 11:12:44 -06:00
Egon Elbre
c1fb791a49
satellite/accounting/rollup: pause in tests (#3122) 2019-09-25 19:54:34 +03:00
Egon Elbre
e82666c769
satellite/accounting/tally: ensure we have a root piece id (#3123) 2019-09-25 18:51:12 +03:00
Egon Elbre
ab3e3f827c
satellite/repair/checker: set erasure share size in tests (#3101) 2019-09-24 10:01:48 +03:00
Michal Niewrzal
607da4ab4a
metainfo: move FinishDeleteSegment logic to BeginDeleteSegment (#3104) 2019-09-23 14:41:58 -07:00
Egon Elbre
4bd1ce868f
satellite/metainfo: close loop separately to avoid logical races (#3100) 2019-09-23 22:14:39 +03:00
Egon Elbre
9ceff9f9c6 satellite/overlay: move CheckIn benchmark to overlay (#3095) 2019-09-20 16:35:52 -04:00
Bogdan Artemenko
69aa0c6cc4
satellite/console: GraphQL input length limitation. (#3045) 2019-09-20 20:40:26 +03:00
paul cannon
53db517154 satellite/overlay: don't use transport observers (#2989) 2019-09-19 16:22:50 -04:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplanet/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
2019-09-19 15:56:34 -04:00
Jess G
93788e5218
remove kademlia: create upsert query to update uptime (#2999)
* create upsert query for check-in method

* add tests

* fix lint err

* add benchmark test for db query

* fix lint and tests

* add a unit test, fix lint

* add address to tests

* replace print w/ b.Fatal

* refactor query per CR comments

* fix disqualified, only set if null

* fix query

* add version to updatecheckin query

* fix version

* fix tests

* change version for tests

* add version to tests

* add IP, add transport, mv unit test

* use node.address as arg

* add last ip

* fix lint
2019-09-19 11:37:31 -07:00
JT Olio
946ec201e2
metainfo: move api keys to part of the request (#3069)
What: we move api keys out of the grpc connection-level metadata on the client side and into the request protobufs directly. the server side still supports both mechanisms for backwards compatibility.

Why: dRPC won't support connection-level metadata. the only thing we currently use connection-level metadata for is api keys. we need to move all information needed by a request into the request protobuf itself for drpc support. check out the .proto changes for the main details.

One fun side-fact: Did you know that protobuf fields 1-15 are special and only use one byte for both the field number and type? Additionally did you know we don't use field 15 anywhere yet? So the new request header will use field 15, and should use field 15 on all protobufs going forward.

Please describe the tests: all existing tests should pass

Please describe the performance impact: none
2019-09-19 10:19:29 -06:00
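The one-byte claim follows from how protobuf encodes a field key: the key is `(field_number << 3) | wire_type`, written as a base-128 varint with 7 payload bits per byte. The sketch below uses Go's `encoding/binary` uvarint encoder, which produces the same varint format; `keySize` is a hypothetical helper, not part of any protobuf library. Field 15 with wire type 2 (length-delimited, as used for an embedded header message) gives key 122, which fits in one byte; field 16 gives 130, which needs two.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Protobuf wire types that appear in a field key.
const (
	wireVarint          = 0
	wireLengthDelimited = 2
)

// keySize returns how many bytes the varint-encoded field key occupies.
// Any key below 128 (fields 1 through 15, any wire type) fits in one byte.
func keySize(fieldNumber, wireType uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, fieldNumber<<3|wireType)
}

func main() {
	fmt.Println(keySize(15, wireLengthDelimited)) // 1: (15<<3)|2 = 122
	fmt.Println(keySize(16, wireLengthDelimited)) // 2: (16<<3)|2 = 130
}
```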
Maximillian von Briesen
d22987ea1d
satellite/audit: Fix flakiness in TestReverifyDifferentShare 2019-09-19 10:50:16 -04:00
Michal Niewrzal
b07a490f95 satellite/metainfo: fix GetObject method (#3088) 2019-09-19 11:13:57 +02:00
Jess G
fae2c2c9f5 satellite/contact: return status codes from endpoint (#3086) 2019-09-19 11:01:34 +03:00
Maximillian von Briesen
a4048fd529 satellite/audit: fix containment mode (#3085)
* add test to make sure we will reverify the share in the containment db rather than in the pointer passed into reverify

* use pending audit information only when running reverify
2019-09-19 01:45:15 +02:00
Michal Niewrzal
1c72e80e40 uplink/satellite: fix for case when inline segment is last one (#3062)
* uplink/satellite: fix when inline seg is last one

* review comments
2019-09-19 01:18:14 +02:00
Cameron
ccdd435610
defer client.close() (#3084) 2019-09-18 16:17:04 -04:00
Jennifer Li Johnson
ce3203e910
update NodeSelectionConfig.OnlineWindow to 4hr default (#3082) 2019-09-18 14:57:57 -04:00
Maximillian von Briesen
574c96c350
satellite/metainfo: Verify storagenode signature on satellite upload (#2985) 2019-09-18 09:50:33 -04:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
Natalie Villasana
cc70cd2329
satellite/repair: add metric trackers for segment age before repair (#3056) 2019-09-17 15:18:48 -04:00
Natalie Villasana
4aaf525bd3
satellite/audit: set devDefaults for ChoreInterval and QueueInterval to 1m (#3058) 2019-09-16 16:36:33 -04:00
Yingrong Zhao
b37ea864b1
satellite/repair: delete pieces that failed piece hashes verification from pointer (#3051)
* add test

* add implementation

* remove todo comments

* modify comment

* fix linting

* typo oops
2019-09-16 13:13:24 -04:00
Maximillian von Briesen
7ada5d4152
satellite/metainfo: make piece hashes nil before storing pointer in metainfo.UpdatePieces() (#3050) 2019-09-16 12:11:12 -04:00
Jess G
d3ef574b20 pkg/pb: minor changes to contact.proto (#3048)
* minor fixes to contact proto

* simplify and rm nodeAddr object from client
2019-09-13 19:37:32 -05:00
Ethan Adams
886041e0ba
satellite/satellitedb: add new graceful exit tables and add graceful exit fields to nodes (#3033)
DB schema changes for satellite to support Graceful Exit
2019-09-13 12:57:32 -04:00
Yingrong Zhao
95aa33c964
satellite/repair/repairer: update audit status as failed after failing piece hash verification (#2997)
* update audit status as failed for nodes that failed piece hash verification

* remove comment

* fix lint error

* add test

* fix format

* use named return value for Get

* add comments

* add a better comment

* format
2019-09-13 12:21:20 -04:00