40 Commits

Author SHA1 Message Date
Jeff Wendling
17b057b33e satellite/audit: monitor worker function
Change-Id: I94d1161deffe4ea9782abee1afbb5735f18aab44
2019-11-25 17:58:13 +00:00
Maximillian von Briesen
8653dda2b1 satellite/audit: do not contain nodes for unknown errors (#3592)
* skip unknown errors (wip)

* add tests to make sure nodes that time out are added to containment

* add bad blobs store

* call "Skipped" "Unknown"

* add tests to ensure unknown errors do not trigger containment

* add monkit stats to lockfile

* typo

* add periods to end of bad blobs comments
2019-11-19 17:30:28 +01:00
littleskunk
8b3444e088
satellite/nodeselection: don't select nodes that haven't checked in for a while (#3567)
* satellite/nodeselection: don't select nodes that haven't checked in for a while

* change testplanet online window to one minute

* remove satellite reconfigure online window = 0 in repair tests

* pass timestamp into UpdateCheckIn

* change timestamp to timestamptz

* edit tests to set last_contact_success to 4 hours ago

* fix syntax error

* remove check for last_contact_success > last_contact_failure in IsOnline
2019-11-15 23:43:06 +01:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Egon Elbre
cc032d3151 satellite/metainfo: fix some uses of metainfo.Delete (#3513)
* satellite/metainfo: rename Delete to UnsynchronizedDelete

* fix deletes

* make db private

* fix typos

* also verify on commit object
2019-11-06 18:02:14 +01:00
Maximillian von Briesen
7cdc1b351a satellite/audit: do not audit expired segments (#3497)
* during audit Verify, return error and delete segment if segment is expired

* delete "main" reverify segment and return error if expired

* delete contained nodes and pointers when pointers to audit are expired

* update testplanet.Upload and testplanet.UploadWithConfig to use an expiration time of an hour from now

* Revert "update testplanet.Upload and testplanet.UploadWithConfig to use an expiration time of an hour from now"

This reverts commit e9066151cf84afbff0929a6007e641711a56b6e5.

* do not count ExpirationDate=time.Time{} as expired
2019-11-05 20:41:48 +01:00
littleskunk
def3dcbaa9
satellite/audit: increase timeout to 5 minutes (#3480)
* satellite/audit: increase timeout to 5 minutes

* fix lint error
2019-11-05 11:21:25 +01:00
JT Olio
2c6fa3c5f8
pkg/rpc: remove read/write deadlines as a mechanism for request timeouts (#3335)
libuplink was incorrectly setting timeouts to 10 seconds still, but
should have been at least 10 minutes. the order sender was setting them
to 1 hour. we don't want timeouts in uplink-side logic as it establishes
a minimum rate on tcp streams.

instead of all of this, just use tcp keep alive. tcp keep alive packets are
sent every 15 seconds and if the peer stops responding the connection
dies. this is enabled by default with go. this will kill tcp connections
when they stop working.

Change-Id: I3d7ad49f71950b3eb43044eedf4b17993116045b
2019-10-22 17:57:24 -06:00
littleskunk
eeb38245ff
satellite/audit: improve logging (#3285) 2019-10-16 13:48:05 +02:00
JT Olio
a5d1776539 audits: missing continue
Change-Id: Ifcac8e61ebd8c59407e01c791adc60d9f88ff1b7
2019-10-11 13:55:27 -06:00
Maximillian von Briesen
784ca1582a
satellite/audit: fix audit panic (#3217) 2019-10-09 10:06:58 -04:00
Maximillian von Briesen
3a3d576d9b satellite/audit: add mutex to pieceHashesVerified map (#3214)
* add mutex

* remove double send to ch

* lock mutex inside defer

* import sync
2019-10-08 17:01:32 -04:00
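
The fix in commit 3a3d576d9b above guards a shared map with a mutex. A minimal sketch of that general pattern is given below; `verifiedMap` and its methods are hypothetical names, and the real `pieceHashesVerified` map and its surrounding code are not shown in this log.

```go
package main

import (
	"fmt"
	"sync"
)

// verifiedMap is a hypothetical wrapper: a map guarded by a mutex so that
// several goroutines can record results without a data race.
type verifiedMap struct {
	mu      sync.Mutex
	entries map[string]bool
}

func newVerifiedMap() *verifiedMap {
	return &verifiedMap{entries: make(map[string]bool)}
}

// set records whether the given key was verified.
func (m *verifiedMap) set(key string, verified bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.entries[key] = verified
}

// len returns the number of recorded entries.
func (m *verifiedMap) len() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.entries)
}

// recordConcurrently writes n distinct keys from n goroutines and waits for
// all of them to finish.
func recordConcurrently(m *verifiedMap, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.set(fmt.Sprintf("segment-%d", i), i%2 == 0)
		}(i)
	}
	wg.Wait()
}

func main() {
	m := newVerifiedMap()
	recordConcurrently(m, 100)
	fmt.Println(m.len()) // 100
}
```

Without the lock, concurrent writes to a plain Go map are a data race and can abort the program with a "concurrent map writes" fatal error.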
littleskunk
c009543236
satellite/audit: Add piece hash verified to log messages (#3204) 2019-10-08 12:51:57 +02:00
Maximillian von Briesen
e1b7d01160
satellite/audit: do not fail or contain nodes for audited segments that are not piece-hash-verified (#3161) 2019-10-07 16:06:10 -04:00
Maximillian von Briesen
edadf46009 satellite/audit: delete nodes from containment when segment has changed (#3115) 2019-09-29 04:03:15 +02:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Jennifer Li Johnson
724bb44723
Remove Kademlia dependencies from Satellite and Storagenode (#2966)
What:

cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:

cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplanet/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia

Please describe the tests:
• internal/testplanet/planet_test.go:

TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:

TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:

TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant, since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in wall-clock time at startup, since each node waits a random amount of time (jitter, less than 1 hr) before initializing its first connection.
2019-09-19 15:56:34 -04:00
Jess G
93788e5218
remove kademlia: create upsert query to update uptime (#2999)
* create upsert query for check-in method

* add tests

* fix lint err

* add benchmark test for db query

* fix lint and tests

* add a unit test, fix lint

* add address to tests

* replace print w/ b.Fatal

* refactor query per CR comments

* fix disqualified, only set if null

* fix query

* add version to updatecheckin query

* fix version

* fix tests

* change version for tests

* add version to tests

* add IP, add transport, mv unit test

* use node.address as arg

* add last ip

* fix lint
2019-09-19 11:37:31 -07:00
Maximillian von Briesen
d22987ea1d
satellite/audit: Fix flakiness in TestReverifyDifferentShare 2019-09-19 10:50:16 -04:00
Maximillian von Briesen
a4048fd529 satellite/audit: fix containment mode (#3085)
* add test to make sure we will reverify the share in the containment db rather than in the pointer passed into reverify

* use pending audit information only when running reverify
2019-09-19 01:45:15 +02:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
Natalie Villasana
4aaf525bd3
satellite/audit: set devDefaults for ChoreInterval and QueueInterval to 1m (#3058) 2019-09-16 16:36:33 -04:00
Egon Elbre
7240e6cbb2
satellite: remove remote/inline file from BucketTally (#3041) 2019-09-13 16:51:41 +03:00
Egon Elbre
8b668ab1f8
satellite/metainfo.Loop: use a parsed path for observers (#3003) 2019-09-12 13:38:49 +03:00
Natalie Villasana
aa3567187e
satellite/audit: worker now verifies and reverifies (#2965) 2019-09-11 18:37:01 -04:00
Egon Elbre
a801fab66a
all: add archview annotations (#2964) 2019-09-10 16:24:16 +03:00
Natalie Villasana
6d363fb756
satellite/audit: create the audit queue, chore, and worker (#2888) 2019-09-05 11:40:52 -04:00
Yingrong Zhao
8eda360ad3
add segment path into logs (#2898) 2019-08-29 08:38:26 -04:00
Natalie Villasana
49303ea3ac
satellite/audit: mv ReservoirService into its own file (#2886) 2019-08-27 13:39:51 -04:00
Ivan Fraixedes
df29699641
satellite/audit: Improve code comment in reporter (#2838) 2019-08-22 14:13:43 +02:00
Egon Elbre
00b2e1a7d7 all: enable staticcheck (#2849)
* by having megacheck in disable it also disabled staticcheck

* fix closing body

* keep interfacer disabled

* hide bodies

* don't use deprecated func

* fix dead code

* fix potential overrun

* keep stylecheck disabled

* don't pass nil as context

* fix infinite recursion

* remove extraneous return

* fix data race

* use correct func

* ignore unused var

* remove unused consts
2019-08-22 13:40:15 +02:00
Egon Elbre
2d69d47655
all: fix Error.New formatting (#2840) 2019-08-21 19:30:29 +03:00
Natalie Villasana
243cedb628
satellite/audit: implement reservoir struct and RemoteSegment observer method (#2744) 2019-08-21 11:49:27 -04:00
Ivan Fraixedes
87f3b6c708 satellite/audit: Improve comments in verifier (#2829)
Improve some source code comments in the verifier.
2019-08-20 10:23:14 -04:00
Isaac Hess
25154720bd
lib/uplink: remove redis and bolt dependencies (#2812)
* identity: remove redis and bolt dependencies

* identity: move revDB creation to main files
2019-08-19 16:10:38 -06:00
Egon Elbre
c8edeb0257
satellite/overlay: rename overlay.Cache to overlay.Service (#2717) 2019-08-06 19:35:59 +03:00
ethanadams
c9b46f2fe2
V3-1987: Optimize audits stats persistence (#2632)
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
2019-07-31 13:21:06 -04:00
aligeti
6dee5a3918
added more logging info (#2667) 2019-07-30 12:38:10 -04:00
aligeti
55e3a37d19
Adds more detailed error logging information V3-2239 (#2654)
* added more logging info
2019-07-30 12:03:25 -04:00
Egon Elbre
5d0816430f
rename all the things (#2531)
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
2019-07-28 08:55:36 +03:00