* reputation: Add configuration parameters
Add the configuration parameters which will be used by the algorithm
which will calculate the storage node reputation.
Because the reputation calculation is based on audit and uptime check
results some configuration parameters are in pkg/audit, others in
pkg/discovery and other in the satellite which will combine the both
reputation results to obtain the storage node reputation for repair and
uplink.
* satellite-config: Refresh lock file with new params
Refresh the Satellite configuration yaml lock file with the new
parameters added in this branch.
* Disabled discovery service by changiing from Stop() to Pause()
Paused to solve race condition. If discovery is running, it may mark a node "up" after they've been manually marked "down" in this test.
* Extend to the repair timeout
Fixes intermittent test failures when repairs were taking more than 2 seconds.
* Re-enabled test. Disabled discovery service by changiing from Stop() to Pause()
* Changed back to Stop.
* Revert "Changed back to Stop."
This reverts commit 46d410e72dfae63e0c44915be42784cc9a7b5abf.
* re-enabling TestIdentifyInjuredSegments
* Changed Pause to Stop. Commented on timeout change
* testing...
* temporarily skipping audit tests
* changing back to discover Stop for testing via jenkins
* Revert "changing back to discover Stop for testing via jenkins"
This reverts commit 6aa8558b11a0053c30e0c8b2dbf0d6c0cb34ee6c.
* Changing back to Stop(). Depends on PR 2137
* Revert "temporarily skipping audit tests"
This reverts commit 1940ed9b315d663a0eb6c95521780cbcb48cb121.
* Removed reference to Graveyard since its been removed
Fix a bug in a range loop which used the reference of the variable assigned in the same loop to be appended in a slice, turning out that the slice will contain the same reference rather the a reference for each value of the ranged items.
What: add monkit.Task to a bunch of functions that are missing it
Why: this will significantly help our instrumentation, data collection, and tracing about what's going on in the network
What: this will make it so release binaries default to whatever-release instead of whatever-dev in metrics collection
Why: So we can monitor release binaries with default configuration without getting drowned out by dev binaries
* fix bug for setting flag only values in process setup
when the code was changed to directly load values into the config
structs, it was missed that some configuration is only defined
through flags, but can be loaded from config files still.
so, we need to propogate the settings to the flag only values.
* add test for setting propagation
* fix linting error
* set up voucher service skeleton, basic test
* add VetNode db method
* basic test for VetNode
* encode and sign voucher functions
* fill out and sign vouchers
* test pass/fail voucher request
* match EncodeVoucher to other Encode functions
* change BindSetup to be an option to Bind
* add process.Bind to allow composite structures
* hack fix for noprefix flags
* used tagged version of structs
Before this PR, some flags were created by calling `cfgstruct.Bind` and having their fields create a flag. Once the flags were parsed, `viper` was used to acquire all the values from them and config files, and the fields in the struct were set through the flag interface.
This doesn't work for slices of things on config structs very well, since it can only set strings, and for a string slice, it turns out that the implementation in `pflag` appends an entry rather than setting it.
This changes three things:
1. Only have a `Bind` call instead of `Bind` and `BindSetup`, and make `BindSetup` an option instead.
2. Add a `process.Bind` call that takes in a `*cobra.Cmd`, binds the struct to the command's flags, and keeps track of that struct in a global map keyed by the command.
3. Use `viper` to get the values and load them into the bound configuration structs instead of using the flags to propagate the changes.
In this way, we can support whatever rich configuration we want in the config yaml files, while still getting command like flags when important.
* added scopelint and correcte issues found
* corrected scopelint issue
* made updates based on Ivan's suggestions
Most were around naming conventions
Some were false positives, but I kept them since the test.Run could eventually be changed to run in parallel, which could cause a bug
Others were false positives. Added // nolint: scopelint
* first round cleanup based on go-critic
* more issues resolved for ifelsechain and unlambda checks
* updated from master and gocritic found a new ifElseChain issue
* disable appendAssign. i reports false positives
* re-enabled go-critic appendAssign and disabled lint check at code line level
* fixed go-critic lint error
* fixed // nolint add gocritic specifically
What: Changes to support custom usage limit for the project. With this implementation by default project usage limit is taken from configuration flag. If project DB field usage_limit will be set to value larger than 0 it will become custom usage limit and we will be used to verify is limit was exceeded.
Whats changed:
usage_limit (bigint) field added to projects table (with migration)
things related to project usage moved from metainfo endpoint to project usage type
accounting.ProjectAccounting extended with GetProjectUsageLimits() method
Why: We need to have different usage limits per project. https://storjlabs.atlassian.net/browse/V3-1814
* add repair monkit stats
* rename values, use meter instead of counter, use success threshold instead of repair threshold
* Counter -> Meter
* add repair segment size
* update names and use ratios for healthy before/after repair
* restart jenkins
* add last_ip field to dbx model node, generate dbx
* add last_ip to node proto, generate pb
* migrate
* resolve address in transport.DialNode, update lastIp in cache.UpdateAddress
* use net.SplitHostPort to isolate host address from port
* define DistinctIPs flag
* add test for GetIP
* select last_ip when querying for nodes
* if distinctIPs flag == true, query for nodes with distinct IPs
* some basic tests
* change last_ip to field 14 in proto
* remove comments
* check err
* change distinctIPs to distinctIP
* exclude IPs from newNodes in query for reputable nodes
* add index on last_ip
* only add to excludedIPs if flag is true
* test half new nodes returns distinct IPs
* fix alignment
* add test
* rework ip filter query, add retry logic, add switch for database driver
* add retry to SelectNewNodes
* change discovery intervals so IPs don't get overwritten
* remove TestGetIP
* edit updating node stats in test
* split exclude into nodeIDs and IPs
* separate non-distinct IP query into other function
* trigger checks
* remove else block
* uplink: Add a new flag to set the filepath of the file which is used for
saving the encryption key and rename the one that hold the encryption key and
establish that it has priority over the key stored in the file to make the
configuration usable without having a huge refactoring in test-sim.
* cmd/uplink: Adapt the setup subcommand for storing the user input key to a file
and adapt the rest of the subcommands for reading the key from the key-file when
the key isn't explicitly set with a command line flag.
* cmd/gateway: Adapt it to read the encryption key from the key-file or use the
one passed by a command line flag.
* pkg/process: Export the default configuration filename so other packages which
use the same value can reference to it rather than having it hardcoded.
* Adapt several integrations (scripts, etc.) to consider the changes applied in uplink and cmd packages.
* repair no cutoff longtail
* commit repair pieces even if not hitting success threshold
* commit repair pieces even if not hitting success threshold
* remove useless condition
* better error message
* cmd/uplink: add share command to restrict an api key
This commit is an early bit of work to just implement restricting
macaroon api keys from the command line. It does not convert
api keys to be macaroons in general.
It also does not apply the path restriction caveats appropriately
yet because it does not encrypt them.
* cmd/uplink: fix path encryption for shares
It should now properly encrypt the path prefixes when adding
caveats to a macaroon.
* fix up linting problems
* print summary of caveat and require iso8601
* make clone part more clear
A previous change reused the same timeout for dialing as well as
requesting in order to speed up some tests. This change introduces
a distinct timeout so that the different operations can have different
timeouts.
Ran into difficulties trying to find the ideal solution for sharing
these counts between multiple satellite servers, so for now this is a
dumb solution storing recent space-usage changes in a big dumb in-memory
map with a big dumb lock around it. The interface used, though, should
allow us to swap out the implementation without much difficulty
elsewhere once we know what we want it to be.
The timeout tests were configured to use very short timeouts but
for some reason they took many seconds to complete. This commit
fixes two issues:
1. The transports were always using the default timeout rather than
the timeout specified.
2. The tests were possibly calling t.FailNow inside of the non-test
goroutine which causes it to exit, possibly losing an error from
the error group. Additionally, it didn't seem to be testing that
the error came back as a deadline wrapped in a transport error.
The tests run in ~3s instead of ~60s now.
* set all intervals to UTC in rollupStats map, only delete latest day after both rollups
* clean up usage of interval, use intervalEndTime rather than createdAt
* change some variable names, add comments
* add flag for tally deletion
* adds deletetallies flag to testplanet
* space
* Removes println:
* adds test for deletes false
* tie defaults to releases
this change makes it so that by default, the flag defaults are
chosen based on whether the build was built as a release build or
an ordinary build. release builds by default get release defaults,
whereas ordinary builds by default get dev defaults.
any binary can have its defaults changed by specifying
--defaults=dev
or
--defaults=release
Change-Id: I6d216aa345d211c69ad913159d492fac77b12c64
* make release defaults more clear
this change extends cfgstruct structs to support either
a 'default' tag, or a pair of 'devDefault' and 'releaseDefault'
tags, but not both, for added clarity
Change-Id: Ia098be1fa84b932fdfe90a4a4d027ffb95e249c6
* clarify cfgstruct.DefaultsFlag
Change-Id: I55f2ff9080ebbc0ce83abf956e085242a92f883e
this adds a flag to all pkg/process binaries where if set,
will write a process-level trace for all monkit-instrumented
functions as an svg to the target path.
usage like:
uplink cp --debug.trace-out=trace.svg file sj://bucket/file
Change-Id: I8f6bbfc488f0f1ac270cb28a61347c6b9698cea7
What: This change moves project-level bucket metadata encryption information to the volatile section, because it is unlikely to remain in future releases
Why: Ultimately, the web user interface will allow bucket management (creation, removal, etc), but not object management as that requires an encryption key for sure and we don't want to have users give the satellite their encryption keys.
At a high level, a (*Project) type should map to all of the things you can do inside the web user interface within a project, which by necessity cannot have an encryption key. So, we really don't want an encryption key in the non-volatile section of this library.
* Initial Webserver Draft for Version Controlling
* Rename type to avoid confusion
* Move Function Calls into Version Package
* Fix Linting and Language Typos
* Fix Linting and Spelling Mistakes
* Include Copyright
* Include Copyright
* Adjust Version-Control Server to return list of Versions
* Linting
* Improve Request Handling and Readability
* Add Configuration File Option
Add Systemd Service file
* Add Logging to File
* Smaller Changes
* Add Semantic Versioning and refuses outdated Software from Startup (#1612)
* implements internal Semantic Version library
* adds version logging + reporting to process
* Advance SemVer struct for easier handling
* Add Accepted Version Store
* Fix Function
* Restructure
* Type Conversion
* Handle Version String properly
* Add Note about array index
* Set temporary Default Version
* Add Copyright
* Adding Version to Dashboard
* Adding Version Info Log
* Renaming and adding CheckerProcess
* Iteration Sync
* Iteration V2
* linting
* made LogAndReportVersion a go routine
* Refactor to Go Routine
* Add Context to Go Routine and allow Operation if Lookup to Control Server fails
* Handle Unmarshal properly
* Linting
* Relocate Version Checks
* Relocating Version Check and specified default Version for now
* Linting Error Prevention
* Refuse Startup on outdated Version
* Add Startup Check Function
* Straighten Logging
* Dont force Shutdown if --dev flag is set
* Create full Service/Peer Structure for ControlServer
* Linting
* Straighting Naming
* Finish VersionControl Service Layout
* Improve Error Handling
* Change Listening Address
* Move Checker Function
* Remove VersionControl Peer
* Linting
* Linting
* Create VersionClient Service
* Renaming
* Add Version Client to Peer Definitions
* Linting and Renaming
* Linting
* Remove Transport Checks for now
* Move to Client Side Flag
* Remove check
* Linting
* Transport Client Version Intro
* Adding Version Client to Transport Client
* Add missing parameter
* Adding Version Check, to set Allowed = true
* Set Default to true, testing
* Restructuring Code
* Uplink Changes
* Add more proper Defaults
* Renaming of Version struct
* Dont pass Service use Pointer
* Set Defaults for Versioning Checks
* Put HTTP Server in go routine
* Add Versioncontrol to Storj-Sim
* Testplanet Fixes
* Linting
* Add Error Handling and new Server Struct
* Move Lock slightly
* Reduce Race Potentials
* Remove unnecessary files
* Linting
* Add Proper Transport Handling
* small fixes
* add fence for allowed check
* Add Startup Version Check and Service Naming
* make errormessage private
* Add Comments about VersionedClient
* Linting
* Remove Checks that refuse outgoing connections
* Remove release cmd
* Add Release Script
* Linting
* Update to use correct Values
* Change Timestamp handling
* Adding Protobuf changes back in
* Adding SatelliteDB Changes and adding Storj Node Version to PB
* Add Migration Table
* Add Default Stats for Creation
* Move to BigInt
* Proper SQL Migration
* Ensure minimum Version is passed to the node selection
* Linting...
* Remove VersionedClient and adjust smaller changes from prior merge
* Linting
* Fix PB Message Handling and Query for Node Selection
* some future-proofing type changes
Change-Id: I3cb5018dcccdbc9739fe004d859065992720caaf
* fix a compiler error
Change-Id: If66bb92d8b98e31cd618ecec9c6448ab9b037fa5
* Comment on Constant for Overlay
* Remove NOT NULL and add epoch call as function
* add versions to bootstrap and satellites
Change-Id: I436944589ea5f21600cdd997742a84fe0b16e47b
* Change Update Migration
* Fix DB Migration
* Increase Timeout temporarily, to see whats going on
* Remove unnecessary const and vars
Cleanup Function calls from deprecated NodeVersion struct
* Updated Protopuf, removed depcreated Code from Inspector
* Implement NodeVersion into InfoResponse
* Regenerated locked.go
* Linting
* Fix Tests
* Remove unnecessary constant
* Update Function and Flag Description
* Remove Empty Stat Creation
* return properly with error
* Remove unnecessary struct
* simplify migration step
* Update Inspector to return Version Info
* Update local Endpoint Version Handling
* Reset Travis Timeout
* Add Default for CommitHash
* single quotes
* update ProjectStorageTotals func to get all records for project
* fix var name
* update test to catch bug
* fix spelling
* modify query so we only have to make 1
Make separate "CreateCertificate" and "CreateSelfSignedCertificate"
functions to take the two roles of NewCert. These names should help
clarify that they actually make certificates and not just allocate new
"Cert" or "Certificate" objects.
Secondly, in the case of non-self-signed certs, require a public and a
private key to be passed in instead of two private keys, because it's
pretty hard to tell when reading code which one is meant to be the
signer and which one is the signee. With a public and private key, you
know.
(These are some changes I made in the course of the openssl port,
because the NewCert function kept being confusing to me. It's possible
I'm just being ridiculous, and this doesn't help improve readability for
anyone else, but if I'm not being ridiculous let's get this in)
* Initial Webserver Draft for Version Controlling
* Rename type to avoid confusion
* Move Function Calls into Version Package
* Fix Linting and Language Typos
* Fix Linting and Spelling Mistakes
* Include Copyright
* Include Copyright
* Adjust Version-Control Server to return list of Versions
* Linting
* Improve Request Handling and Readability
* Add Configuration File Option
Add Systemd Service file
* Add Logging to File
* Smaller Changes
* Add Semantic Versioning and refuses outdated Software from Startup (#1612)
* implements internal Semantic Version library
* adds version logging + reporting to process
* Advance SemVer struct for easier handling
* Add Accepted Version Store
* Fix Function
* Restructure
* Type Conversion
* Handle Version String properly
* Add Note about array index
* Set temporary Default Version
* Add Copyright
* Adding Version to Dashboard
* Adding Version Info Log
* Renaming and adding CheckerProcess
* Iteration Sync
* Iteration V2
* linting
* made LogAndReportVersion a go routine
* Refactor to Go Routine
* Add Context to Go Routine and allow Operation if Lookup to Control Server fails
* Handle Unmarshal properly
* Linting
* Relocate Version Checks
* Relocating Version Check and specified default Version for now
* Linting Error Prevention
* Refuse Startup on outdated Version
* Add Startup Check Function
* Straighten Logging
* Dont force Shutdown if --dev flag is set
* Create full Service/Peer Structure for ControlServer
* Linting
* Straighting Naming
* Finish VersionControl Service Layout
* Improve Error Handling
* Change Listening Address
* Move Checker Function
* Remove VersionControl Peer
* Linting
* Linting
* Create VersionClient Service
* Renaming
* Add Version Client to Peer Definitions
* Linting and Renaming
* Linting
* Remove Transport Checks for now
* Move to Client Side Flag
* Remove check
* Linting
* Transport Client Version Intro
* Adding Version Client to Transport Client
* Add missing parameter
* Adding Version Check, to set Allowed = true
* Set Default to true, testing
* Restructuring Code
* Uplink Changes
* Add more proper Defaults
* Renaming of Version struct
* Dont pass Service use Pointer
* Set Defaults for Versioning Checks
* Put HTTP Server in go routine
* Add Versioncontrol to Storj-Sim
* Testplanet Fixes
* Linting
* Add Error Handling and new Server Struct
* Move Lock slightly
* Reduce Race Potentials
* Remove unnecessary files
* Linting
* Add Proper Transport Handling
* small fixes
* add fence for allowed check
* Add Startup Version Check and Service Naming
* make errormessage private
* Add Comments about VersionedClient
* Linting
* Remove Checks that refuse outgoing connections
* Remove release cmd
* Add Release Script
* Linting
* Update to use correct Values
* Move vars private and set minimum default versions for testing builds
* Remove VersionedClient
* Better Error Handling and naked return removal
* Straighten the Regex and string conversion
* Change Check to allows testplanet and storj-sim to run without the
need to pass an LDFlag
* Cosmetic Change to Dashboard
* Cleanup Returns and remove commented code
* Remove Version Check if no build options are passed in
* Pass in Config Values instead of Pointers
* Handle missed Error
* Update Endpoint URL
* Change Type of Release Flag
* Add additional Logging
* Remove Versions Logging of other Services
* minor fixes
Change-Id: I5cc04a410ea6b2008d14dffd63eb5f36dd348a8b
* timeout calculation
* Initial Migration Draft to wipe the network gracefully (#1578)
* Initial Migration Draft to wipe the network gracefully
* Change from DATE to UNIX Timestamp (INT)
* SQL Derps
* Remove Satellite Migration - gets done manually
* Move from possible STRING to INT Vars
* Add TestTable for Migration Tests
* Adjust Test data
* Update TestData Part 2
* Update Wipeout to 28th 00:00
* merge spelling
We want to use those fields in the bucket-level Pointer objects as
bucket defaults, but we need to be able to get at them first.
I don't see any strong reason not to make these available, except
that it was kind of a pain.
* reorg uplink cmd files for consistency
* init implementation of usage limiting
* Revert "reorg uplink cmd files for consistency"
This reverts commit 91ced7639bf36fc8af1db237b01e233ca92f1890.
* add changes per CR comments
* fix custom query to use rebind
* updates per convo about what to limit on
* changes per comments
* fix syntax and comments
* add integration test, add db methods for test
* update migration, add rebind to query
* update testdata for psql
* remove unneeded drop index statement
* fix migrations, fix calculate usage limit
* fix comment
* add audit test back
* change methods to use bucketName/projectID, fix tests
* add changes per CR comments
* add test for uplink upload and err ssg
* changes per CR comments
* check get/put limit separately
* add MetadataSize to stats
* add logic for accumulating bucket stats in calculateAtRestData
* rename stats to BucketTally, move to accounting package
* define method on accountingDB for inserting bucketTallies
* insert bucketTallies into bucket_storage_tally table
This change adds satellite endpoint for receiving OrderLimits sent by storage node.
Change includes:
* wire up orders sender in storage node (also in testplanet)
* saving serial number for OrderLimit in serial_numbers table
* satellite endpoint for receiving, verifying and storing OrderLimit and Order serial number
* initial implementation for Orders DB
* basic test for sending orders to satellite
* define irreparable inspector protobuf
* add IrreparableDB method GetLimited
* fill out irreparable inspector API
* add IrreparableInspector server to satellite, fix small error
* refactor IrreparableDB to use pb.IrreparableSegment instead of irreparable.RemoteSegmentInfo
* Removes concept of Email from Kademlia
* Removes kad email
* adds emails back to operator config for satellite
* replace operator configs in testplanet
* Warn about permissions when creating identity
* Function to determine if directory is writeable
* Check if writable before authorizing
* Remove redeclatarion
* remove windows specific utils
* Nat nits
* Actually test if directory is writeable with file creation
* add private listener to grpc server
* add changes per init CR
* fix server.close
* add insecure grpc connection, update logs msg
* fix tests, move insecure client
* add private ports to storj-sim, add insecure client to other inspectors
* add ports to test so there arent conflicts
* fix lint err
* fix node started log msg, close public listener
* remove commented out line
* small identity refactor:
+ Optimize? iterative cert chain methods to use array instead of slice
+ Add `ToChain` helper for converting 1d to 2d cert chain
TODO: replace literal declarations with this
+ rename `ChainRaw/RestChainRaw` to `RawChain/RawRestChain`
(adjective noun, instead of nound adjective)
* add regression tests for V3-1320
* fix V3-1320
* separate `DialUnverifiedIDOption` from `DialOption`
* separate `PingNode` and `DialNode` from `PingAddress` and `DialAddress`
* update node ID while bootstrapping
* goimports & fix comment
* add test case
This change ensures that the upload timer of ECClient is always stopped after no more status is expected from uploaded pieces. It also ensures that the "Timer expired" message will be logged only if the context is not already cancelled.
This is to avoid confusing logs where a "Timer expired" message is logged significantly later and mixes with similar messages logged from the upload of the next file segments.
* Wiring up DumpNodes response for Inspector
* Finalize everything and test that it works
* Get Count and DumpNodes working for Overlay Cache
* WIP updating payment rollup to check statDB instead of overlay
* FIrst pass at updating statDB to take wallet and email
* Passing tests
* use pb.NodeOperator instead of Meta struct
* remove TODO
* revert go.mod
* Get SQL migration working correctly
* Changes Meta to Operator in NodeStats struct
* Adds update operator logic for statDB
* Fix db migrate tests - added v5 snapshot
* User friendly msg for missing snapshot version
* Passing tests
* Change node update to happen in discovery instead of in overlay
* Fix logic and update function calls
* Update comment on UpdateOperator interface method
* Update name of parameter
* Change type of argument to UpdateOperator
* Updates statDB tests
* replace direct reference with an interface in various places
* hide piecePath
* ensure psserver tests don't use path
* ensure psserver tests don't use sql queries directly
* Wiring up DumpNodes response for Inspector
* Finalize everything and test that it works
* remove changes to overlay inspector
* Replace list with iterate method in DumpNodes
* generate proto
* Merge in nodeinfo changes
* Update iterate function in DumpNodes
* Fix import order
Adds a new `Info` method to the Kademlia endpoint that returns the following local node info:
* ID
* Type
* Metadata (email and wallet)
* Restrictions (free storage and bandwidth)
The new endpoint is exposed as `inspector kad node-info` command too.
* psclient receives storage node hash and compare it to own hash for verification
* uplink sends delete request when hashes don't match
* valid hashes are propagated up to segments.Store for future sending to satellite
This PR includes a new package called testrouting, which implements a very algorithmically slow but hopefully easy-to-keep-operationally-correct in-memory routing table. The routing table also supports writing out its current structure as a DOT graph for visualization. testrouting is primarily meant to help in coming up with generic routing table integration tests.
This PR also includes a new routing table integration test suite that runs against all current routing table implementations. Our existing routing table passes a lot of the tests, but not all of them, still debugging why. I have confirmed the tests should pass with the visualization graphs though.
Removes most instances of pb.SignedMessage (there's more to take out but they shouldn't hurt anyone as is).
There used to be places in psserver where a PieceID was hmac'd with the SatelliteID, which was gotten from a SignedMessage. This PR makes it so some functions access the SatelliteID from the Payer Bandwidth Allocation instead.
This requires passing a SatelliteID into psserver functions where they weren't before, so the following proto messages have been changed:
* PieceId - satellite_id field added
This is so the psserver.Piece function has access to the SatelliteID when it needs to get the namespaced pieceID.
This proto message should probably be renamed to PieceRequest, or a new PieceRequest message should be created so this isn't misnamed.
* PieceDelete - satellite_id field added
This is so the psserver.Delete function has access to the SatelliteID when receiving a request to Delete.
We realized that the Kademlia FindNear() function was
1. not using XOR distance (AKA _totally broken_)
2. largely a duplicate of the RoutingTable FindNear() function
Changes in this PR:
1. upgraded RoutingTable FindNear() to use iterator and restrictions
2. removed unneeded RoutingTable interface
3. made Kademlia wrap methods that were previously accessed via RoutingTable
4. fixed the tests