this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts
requires https://review.dev.storj.io/c/storj/dbx/+/514
Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
With the new storage node downtime tracking feature, we need remove current uptime reputation configs: UptimeReputationAlpha, UptimeReputationBeta, and
UptimeReputationDQ. This is the first step of removing the uptime
reputation columns from satellitedb
Change-Id: Ie8fab13295dbf545e33aeda0c4306cda4ba54e36
Disqualifies a node when the node fails to complete a graceful
exit.
Adds a new DisqualifyNode method to the overlay cache, since there
wasn't an existing method to disqualify a node but do nothing else
to its stats.
Adds checks to existing tests to make sure that a storage node that
fails a graceful exit is marked as disqualified in the overlay
cache.
https: //storjlabs.atlassian.net/browse/V3-3342
Change-Id: I4d554a519ab59db31ad3b8e28764c8683a6e3888
overlay.GetOfflineNodesLimited
We only care about node ID, address, and last contact success/failure
from the downtime service, so the overlay should only return these
values for the downtime-specific queries.
Change-Id: I08a6ecfdd2a12b82cae62e87d6adeab53975bfce
Transactions in our code that might need to work against CockroachDB
need to be retried in the event of a retryable error. The transaction
helper functions in dbutil do that automatically. I am changing this
code to use those helpers instead.
Change-Id: Icd3da71448a84c582c6afdc6b52d1f345fe9469f
Adds the KnownReliable method to Overlay Service that filters all nodes
from the given list to be only reliable nodes (online and qualified).
The method return []*pb.Node of reliable nodes. The pb.Node values are
ready for dialing.
The first use case is when deleting an object to efficiently dial all
reliable nodes holding a piece of that object and send them a delete
request.
Change-Id: I13e0a8666f3807c5c31ef1a1087476018a5d3acb
Backstory: I needed a better way to pass around information about the
underlying driver and implementation to all the various db-using things
in satellitedb (at least until some new "cockroach driver" support makes
it to DBX). After hitting a few dead ends, I decided I wanted to have a
type that could act like a *dbx.DB but which would also carry
information about the implementation, etc. Then I could pass around that
type to all the things in satellitedb that previously wanted *dbx.DB.
But then I realized that *satellitedb.DB was, essentially, exactly that
already.
One thing that might have kept *satellitedb.DB from being directly
usable was that embedding a *dbx.DB inside it would make a lot of dbx
methods publicly available on a *satellitedb.DB instance that previously
were nicely encapsulated and hidden. But after a quick look, I realized
that _nothing_ outside of satellite/satellitedb even needs to use
satellitedb.DB at all. It didn't even need to be exported, except for
some trivially-replaceable code in migrate_postgres_test.go. And once
I made it unexported, any concerns about exposing new methods on it were
entirely moot.
So I have here changed the exported *satellitedb.DB type into the
unexported *satellitedb.satelliteDB type, and I have changed all the
places here that wanted raw dbx.DB handles to use this new type instead.
Now they can just take a gander at the implementation member on it and
know all they need to know about the underlying database.
This will make it possible for some other pending code here to
differentiate between postgres and cockroach backends.
Change-Id: I27af99f8ae23b50782333da5277b553b34634edc
* satellite/nodeselection: dont select nodes that havent checked in for a while
* change testplanet online window to one minute
* remove satellite reconfigure online window = 0 in repair tests
* pass timestamp into UpdateCheckIn
* change timestamp to timestamptz
* edit tests to set last_contact_success to 4 hours ago
* fix syntax error
* remove check for last_contact_success > last_contact_failure in IsOnline
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue
* update lock file and add comment
* add created at and bytes transferred
* cleanup
* rename db func to GetGracefulExitNodesByTimeFrame
* fix flag
* split into two overlay functions
* := to =
* fix test
* add node not found error class
* fix overlay test
* suggested test changes
* review suggestions
* get exit status from overlay.Get()
* check rows.Err
* fix panic when ExitFinishedAt is nil
* fix comments in cmdGracefulExit
* create upsert query for check-in method
* add tests
* fix lint err
* add benchmark test for db query
* fix lint and tests
* add a unit test, fix lint
* add address to tests
* replace print w/ b.Fatal
* refactor query per CR comments
* fix disqualified, only set if null
* fix query
* add version to updatecheckin query
* fix version
* fix tests
* change version for tests
* add version to tests
* add IP, add transport, mv unit test
* use node.address as arg
* add last ip
* fix lint
* implement contact.checkin method
* add batching to update uptime checks
* rm batching
* rm other unneeded things
* fix lint
* fix unit test
* changes per CR comments
* couple more CR changes
* add identity check into grpcOpt
* fix lint
* why do you fix the test
* revert test change
* stop contact chore for repair test
* put node in cache
* comment out contact chore. See what happens
* Revert "comment out contact chore. See what happens"
This reverts commit 2e45008e36a50e0a842ae455ac83de77093d4daa.
* try stopping contact earlier
* stop contact chore in uplink_test
* replace self on chore with *RoutingTable for access to latest node info
* Revert "stop contact chore in uplink_test"
This reverts commit 302db70f4071112d1b9f7ee0279225ea12757723.
* Revert "try stopping contact earlier"
This reverts commit 806cc3b82f9d598899dafd83da9315a1cb0cb43c.
* Revert "stop contact chore for repair test"
This reverts commit dd34de1cfdfc09b972186c9ab9a4f1e822446b79.
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
* Update overlaycache.go
Removes one select statement and columns gets filtered in first query.
Needs to be tested agains real database that this query is working and faster!
* Correct linting
reorder scans that this fit to new sql result order
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
* add counters for nodes that have/have not been seen in the past 24 hours/week
* add additional uptime counters
* add monkit stats for containment mode
* satellite/satellitedb: Alter nodes disqualification column
Change the type of the 'disqualification' column of the nodes table from
boolean to timestamp.
* overlay/cache: Change Disqualified field type
Change the Disqualified field type the NodeDossier struct type from bool
to time.Time to match with the disqualified type used by the DB layer.
* satellite/satellitedb: Update queries uses disqualified
Update the queries which uses the disqualified column due to the column
type has been changed from boolean to nullable timestamp.
* docs/design: Update disqualification due impl changes
Update the disqualification design document to contain the architectural
change required to be able to restore unfair disqualified nodes in case
of an unexpected cause (bug, mistake, hard network disconnection, etc.).
* set up voucher service skeleton, basic test
* add VetNode db method
* basic test for VetNode
* encode and sign voucher functions
* fill out and sign vouchers
* test pass/fail voucher request
* match EncodeVoucher to other Encode functions
* add last_ip field to dbx model node, generate dbx
* add last_ip to node proto, generate pb
* migrate
* resolve address in transport.DialNode, update lastIp in cache.UpdateAddress
* use net.SplitHostPort to isolate host address from port
* define DistinctIPs flag
* add test for GetIP
* select last_ip when querying for nodes
* if distinctIPs flag == true, query for nodes with distinct IPs
* some basic tests
* change last_ip to field 14 in proto
* remove comments
* check err
* change distinctIPs to distinctIP
* exclude IPs from newNodes in query for reputable nodes
* add index on last_ip
* only add to excludedIPs if flag is true
* test half new nodes returns distinct IPs
* fix alignment
* add test
* rework ip filter query, add retry logic, add switch for database driver
* add retry to SelectNewNodes
* change discovery intervals so IPs don't get overwritten
* remove TestGetIP
* edit updating node stats in test
* split exclude into nodeIDs and IPs
* separate non-distinct IP query into other function
* trigger checks
* remove else block
* Initial Webserver Draft for Version Controlling
* Rename type to avoid confusion
* Move Function Calls into Version Package
* Fix Linting and Language Typos
* Fix Linting and Spelling Mistakes
* Include Copyright
* Include Copyright
* Adjust Version-Control Server to return list of Versions
* Linting
* Improve Request Handling and Readability
* Add Configuration File Option
Add Systemd Service file
* Add Logging to File
* Smaller Changes
* Add Semantic Versioning and refuses outdated Software from Startup (#1612)
* implements internal Semantic Version library
* adds version logging + reporting to process
* Advance SemVer struct for easier handling
* Add Accepted Version Store
* Fix Function
* Restructure
* Type Conversion
* Handle Version String properly
* Add Note about array index
* Set temporary Default Version
* Add Copyright
* Adding Version to Dashboard
* Adding Version Info Log
* Renaming and adding CheckerProcess
* Iteration Sync
* Iteration V2
* linting
* made LogAndReportVersion a go routine
* Refactor to Go Routine
* Add Context to Go Routine and allow Operation if Lookup to Control Server fails
* Handle Unmarshal properly
* Linting
* Relocate Version Checks
* Relocating Version Check and specified default Version for now
* Linting Error Prevention
* Refuse Startup on outdated Version
* Add Startup Check Function
* Straighten Logging
* Dont force Shutdown if --dev flag is set
* Create full Service/Peer Structure for ControlServer
* Linting
* Straighting Naming
* Finish VersionControl Service Layout
* Improve Error Handling
* Change Listening Address
* Move Checker Function
* Remove VersionControl Peer
* Linting
* Linting
* Create VersionClient Service
* Renaming
* Add Version Client to Peer Definitions
* Linting and Renaming
* Linting
* Remove Transport Checks for now
* Move to Client Side Flag
* Remove check
* Linting
* Transport Client Version Intro
* Adding Version Client to Transport Client
* Add missing parameter
* Adding Version Check, to set Allowed = true
* Set Default to true, testing
* Restructuring Code
* Uplink Changes
* Add more proper Defaults
* Renaming of Version struct
* Dont pass Service use Pointer
* Set Defaults for Versioning Checks
* Put HTTP Server in go routine
* Add Versioncontrol to Storj-Sim
* Testplanet Fixes
* Linting
* Add Error Handling and new Server Struct
* Move Lock slightly
* Reduce Race Potentials
* Remove unnecessary files
* Linting
* Add Proper Transport Handling
* small fixes
* add fence for allowed check
* Add Startup Version Check and Service Naming
* make errormessage private
* Add Comments about VersionedClient
* Linting
* Remove Checks that refuse outgoing connections
* Remove release cmd
* Add Release Script
* Linting
* Update to use correct Values
* Change Timestamp handling
* Adding Protobuf changes back in
* Adding SatelliteDB Changes and adding Storj Node Version to PB
* Add Migration Table
* Add Default Stats for Creation
* Move to BigInt
* Proper SQL Migration
* Ensure minimum Version is passed to the node selection
* Linting...
* Remove VersionedClient and adjust smaller changes from prior merge
* Linting
* Fix PB Message Handling and Query for Node Selection
* some future-proofing type changes
Change-Id: I3cb5018dcccdbc9739fe004d859065992720caaf
* fix a compiler error
Change-Id: If66bb92d8b98e31cd618ecec9c6448ab9b037fa5
* Comment on Constant for Overlay
* Remove NOT NULL and add epoch call as function
* add versions to bootstrap and satellites
Change-Id: I436944589ea5f21600cdd997742a84fe0b16e47b
* Change Update Migration
* Fix DB Migration
* Increase Timeout temporarily, to see whats going on
* Remove unnecessary const and vars
Cleanup Function calls from deprecated NodeVersion struct
* Updated Protopuf, removed depcreated Code from Inspector
* Implement NodeVersion into InfoResponse
* Regenerated locked.go
* Linting
* Fix Tests
* Remove unnecessary constant
* Update Function and Flag Description
* Remove Empty Stat Creation
* return properly with error
* Remove unnecessary struct
* simplify migration step
* Update Inspector to return Version Info
* Update local Endpoint Version Handling
* Reset Travis Timeout
* Add Default for CommitHash
* single quotes
This changes semantics slightly! with this change,
CreateEntryIfNotExists() will do a cache Update with every node passed
in, whether it exists or not. Update() already does a race-free upsert
operation, so that change removes the problematic race in
CreateEntryIfNotExists(). As far as I can tell, this semantic change
doesn't break any expectations of callers, and shouldn't affect
performance in a significant way, as we already have an awful lot of
round-trips to the db either way. But if I've misunderstood the
intention of the method, someone ought to catch it during review.
* got tests passed
* wire up paginate function for cache node retrieval
* Add tests for paginate but they're failing
* fix the test arguments
* Updates paginate function to return more variable
* Updates
* Some test and logic tweaks
* improves config handling in discovery
* adds refresh offset to discovery struct