Create a new variable rather than reusing the existing one because the
name of the existing one is confusing when reading the logic and it
requires more time that the logic doesn't have a bug.
* Added the ability to pass timeout settings from cmd/uplink to libuplink.
* Removed commented out code.
* Updated 2min timeouts for the uplink CLI.
* Removed comment.
* Made transport defaultDialTimeout and defaultRequestTimeout public
* Added comments to describe where these defaults apply.
* Added a new defaults to libuplink and added tests.
* Added a new defaults to libuplink and added tests.
* pkg/datarepair: Add test to check num upload pieces
Add a new test for ensuring the number of pieces that the repair process
upload when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
Update the test which checks the number of pieces which are uploaded
during a repair for using the same excess over the success threshold
value than the implementation.
* satellites/orders: Limit repair put order for not being total
Limit the number of put orders to be used by repair for only uploading
pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
Make some changes in the DataRepair test to make pass again after the
repair upload repaired pieces only until a % excess over success
threshold.
Also update the steps description of the DataRepair test after it has been
changed, to match on what's now, besides to leave it more generic for
avoiding having to update it on minimal future refactorings.
* satellite: Make repair excess optimal threshold configurable
Add a new configuration parameter to the satellite for being able to
configure the percentage excess over the optimal threshold, used for
determining how many pieces should be repaired/uploaded, rather than
having the value hard coded.
* repairer: Add configurable param to segments/repairer
Add a new parameters to the segment/repairer to calculate the maximum
number of excess nodes, based on the optimal threshold, that repaired
pieces can be uploaded.
This new parameter has been added for not returning more nodes than the
number of upload orders for data repair satellite service calculate for
repairing pieces.
* pkg/storage/ec: Update log message in clien.Repair
* satellite: Update configuration lock file
checker_segment_total_count - Number of total segments in pointer during checker iteration
checker_segment_healthy_count - Number of healthy segments in pointer during checker iterationn
time_since_checker_queue - Seconds elapsed between checker queue and beginning repair
time_for_repair - Seconds elapsed between beginning repair and ending repair/dequeueing
* add db interface and methods, add sa metainfo endpoints and svc
* add bucket metainfo svc funcs
* add sadb bucekts
* bucket list gets all buckets
* filter buckets list on macaroon restrictions
* update pb cipher suite to be enum
* add conversion funcs
* updates per comments
* bucket settings should say default
* add direction to list buckets, add tests
* fix test bucket names
* lint err
* only support forward direction
* add comments
* minor refactoring
* make sure list up to limit
* update test
* update protolock file
* fix lint
* change per PR
* Fix some log message to actually report the number of pieces needed to
repaired for reaching the successful/optimal threshold.
* Remove some unneeded `nil` check conditional.
* monitor optimal wait fraction
Change-Id: I1c76da5e8031237cf78ce5a0774732dd5e558ea1
* monitor other times about the upload
Change-Id: Iae81c80fb1446fbf4b3dd04fc6b238f2ede96545
* fix orderdDB methods to take correct args
* update tally to save projectID in correct format
* update var names in splitBucket test
* changes per CR comments
* pkg/process/metrics: add an instance prefix
the distinction between which satellite is sending which
data should go in the instance field, not the suffix or application
fields. (un)fortunately, the instance id is deliberately not
configurable because we don't want it to be easy to accidentally
have multiple applications collide with the same instance id.
so we're currently stuffing the human readable instance in the
suffix. :(
perhaps a reasonable tradeoff would be an optional instance
prefix that allows operators to put their domain name in
the instance
Change-Id: I6fcc8498be908c5740439cc00f77474ad151febd
* linting
Change-Id: I9f9a44fa9a2634ef5e4f89548d42d57ce9e4450e
* add path implementation
This commit adds a pkg/paths package which contains two types,
Encrypted and Unencrypted, to statically enforce what is contained
in a path. It's part of a refactoring of the code base to be more
clear about what is contained in a storj.Path at all the layers.
Change-Id: Ifc4d4932da26a97ea99749b8356b4543496a8864
* add encryption store
This change adds an encryption.Store type to keep a collection
of root keys for arbitrary locations in some buckets. It allows
one to look up all of the necessary information to encrypt paths,
decrypt paths and decrypt list operations.
It adds some exported functions to perform encryption on paths
using a Store.
Change-Id: I1a3d230c521d65f0ede727f93e1cb389f8be9497
* add shim around streams store
This commit changes no functionality, but just reorganizes the code
so that changes can be made directly to the streams store
implementation without affecting callers.
It also adds a Path type that will be used at the interface boundary
for the streams store so that it can be sure that it's getting well
formed paths that it expects.
Change-Id: I50bd682995b185beb653b00562fab62ef11f1ab5
* refactor streams to use encryption store
This commit changes the streams store to use the path type as
well as the encryption store to handle all of it's encryption
and decryption.
Some changes were made to how the default key is returned in
the encryption store to have it include the case when the bucket
exists but no paths matched. The path iterator could also be
simplified to not report if a consume was valid: that information
is no longer necessary.
The kvmetainfo tests were changed to appropriately pass the
subtests *testing.T rather than having the closure it executes
use the parent one. The test framework now correctly reports
which test did the failing.
There are still some latent issues with listing in that listing
for "a/" and listing for "a" are not the same operation, but we
treat them as such. I suspect that there are also issues with
paths like "/" or "//foo", but that's for another time.
Change-Id: I81cad4ba2850c3d14ba7e632777c4cac93db9472
* use an encryption store at the upper layers
Change-Id: Id9b4dd5f27b3ecac863de586e9ae076f4f927f6f
* fix linting failures
Change-Id: Ifb8378879ad308d4d047a0483850156371a41280
* fix linting in encryption test
Change-Id: Ia35647dfe18b0f20fe13763b28e53294f75c38fa
* get rid of kvmetainfo rootKey
Change-Id: Id795ca03d9417e3fe9634365a121430eb678d6d5
* Fix linting failure for return with else
Change-Id: I0b9ffd92be42ffcd8fef7ea735c5fc114a55d3b5
* fix some bugs adding enc store to kvmetainfo
Change-Id: I8e765970ba817289c65ec62971ae3bfa2c53a1ba
* respond to review feedback
Change-Id: I43e2ce29ce2fb6677b1cd6b9469838d80ec92c86
* add voucher service on storage node
* config field tag syntax, go routines for requests
* hook up voucher service in storagenode/peer.go
* add voucher config to testplanet
* add voucher config to testplanet
* add voucher response status INVALID, ACCEPTED, REJECTED
* add a test for vouchers service
* handle no row from GetValid, test it
* add trust pool to voucher service
* use trusted list to get satellites
* verify vouchers upon receipt
* test VerifyVoucher
This commit adds two functions that implement the algorithms
described in the password key derivation design document. They
will be used during setup to derive bucket level root keys or
default passwords to use when buckets do not have their own
independent key.
Change-Id: Ie7fb2d8d549ba7465d0722716a2c1ac0ad907286
* pkg/audit: Add DQ test for too many failed audits
Add an integration test which checks that a node which fails several
audits gets disqualified but not before it reaches the audit reputation
disqualification cut-off.
* internal/testplanet: Set DQ cut-off config values
Set the values of the Overlay cache DQ cut-off configuration parameters
used by testplanet.
Move 2 helper function used for test which relay on testplanet from the
test file where they were created to separated file to contain them
because they are not only used in the test file were initially they were
created.
* add counters for nodes that have/have not been seen in the past 24 hours/week
* add additional uptime counters
* add monkit stats for containment mode
* satellite/satellitedb: Alter nodes disqualification column
Change the type of the 'disqualification' column of the nodes table from
boolean to timestamp.
* overlay/cache: Change Disqualified field type
Change the Disqualified field type the NodeDossier struct type from bool
to time.Time to match with the disqualified type used by the DB layer.
* satellite/satellitedb: Update queries uses disqualified
Update the queries which uses the disqualified column due to the column
type has been changed from boolean to nullable timestamp.
* docs/design: Update disqualification due impl changes
Update the disqualification design document to contain the architectural
change required to be able to restore unfair disqualified nodes in case
of an unexpected cause (bug, mistake, hard network disconnection, etc.).
* reputation: Add configuration parameters
Add the configuration parameters which will be used by the algorithm
which will calculate the storage node reputation.
Because the reputation calculation is based on audit and uptime check
results some configuration parameters are in pkg/audit, others in
pkg/discovery and other in the satellite which will combine the both
reputation results to obtain the storage node reputation for repair and
uplink.
* satellite-config: Refresh lock file with new params
Refresh the Satellite configuration yaml lock file with the new
parameters added in this branch.
* Disabled discovery service by changiing from Stop() to Pause()
Paused to solve race condition. If discovery is running, it may mark a node "up" after they've been manually marked "down" in this test.
* Extend to the repair timeout
Fixes intermittent test failures when repairs were taking more than 2 seconds.
* Re-enabled test. Disabled discovery service by changiing from Stop() to Pause()
* Changed back to Stop.
* Revert "Changed back to Stop."
This reverts commit 46d410e72dfae63e0c44915be42784cc9a7b5abf.
* re-enabling TestIdentifyInjuredSegments
* Changed Pause to Stop. Commented on timeout change
* testing...
* temporarily skipping audit tests
* changing back to discover Stop for testing via jenkins
* Revert "changing back to discover Stop for testing via jenkins"
This reverts commit 6aa8558b11a0053c30e0c8b2dbf0d6c0cb34ee6c.
* Changing back to Stop(). Depends on PR 2137
* Revert "temporarily skipping audit tests"
This reverts commit 1940ed9b315d663a0eb6c95521780cbcb48cb121.
* Removed reference to Graveyard since its been removed
Fix a bug in a range loop which used the reference of the variable assigned in the same loop to be appended in a slice, turning out that the slice will contain the same reference rather the a reference for each value of the ranged items.
What: add monkit.Task to a bunch of functions that are missing it
Why: this will significantly help our instrumentation, data collection, and tracing about what's going on in the network
What: this will make it so release binaries default to whatever-release instead of whatever-dev in metrics collection
Why: So we can monitor release binaries with default configuration without getting drowned out by dev binaries
* fix bug for setting flag only values in process setup
when the code was changed to directly load values into the config
structs, it was missed that some configuration is only defined
through flags, but can be loaded from config files still.
so, we need to propogate the settings to the flag only values.
* add test for setting propagation
* fix linting error
* set up voucher service skeleton, basic test
* add VetNode db method
* basic test for VetNode
* encode and sign voucher functions
* fill out and sign vouchers
* test pass/fail voucher request
* match EncodeVoucher to other Encode functions
* change BindSetup to be an option to Bind
* add process.Bind to allow composite structures
* hack fix for noprefix flags
* used tagged version of structs
Before this PR, some flags were created by calling `cfgstruct.Bind` and having their fields create a flag. Once the flags were parsed, `viper` was used to acquire all the values from them and config files, and the fields in the struct were set through the flag interface.
This doesn't work for slices of things on config structs very well, since it can only set strings, and for a string slice, it turns out that the implementation in `pflag` appends an entry rather than setting it.
This changes three things:
1. Only have a `Bind` call instead of `Bind` and `BindSetup`, and make `BindSetup` an option instead.
2. Add a `process.Bind` call that takes in a `*cobra.Cmd`, binds the struct to the command's flags, and keeps track of that struct in a global map keyed by the command.
3. Use `viper` to get the values and load them into the bound configuration structs instead of using the flags to propagate the changes.
In this way, we can support whatever rich configuration we want in the config yaml files, while still getting command like flags when important.
* added scopelint and correcte issues found
* corrected scopelint issue
* made updates based on Ivan's suggestions
Most were around naming conventions
Some were false positives, but I kept them since the test.Run could eventually be changed to run in parallel, which could cause a bug
Others were false positives. Added // nolint: scopelint
* first round cleanup based on go-critic
* more issues resolved for ifelsechain and unlambda checks
* updated from master and gocritic found a new ifElseChain issue
* disable appendAssign. i reports false positives
* re-enabled go-critic appendAssign and disabled lint check at code line level
* fixed go-critic lint error
* fixed // nolint add gocritic specifically
What: Changes to support custom usage limit for the project. With this implementation by default project usage limit is taken from configuration flag. If project DB field usage_limit will be set to value larger than 0 it will become custom usage limit and we will be used to verify is limit was exceeded.
Whats changed:
usage_limit (bigint) field added to projects table (with migration)
things related to project usage moved from metainfo endpoint to project usage type
accounting.ProjectAccounting extended with GetProjectUsageLimits() method
Why: We need to have different usage limits per project. https://storjlabs.atlassian.net/browse/V3-1814
* add repair monkit stats
* rename values, use meter instead of counter, use success threshold instead of repair threshold
* Counter -> Meter
* add repair segment size
* update names and use ratios for healthy before/after repair
* restart jenkins
* add last_ip field to dbx model node, generate dbx
* add last_ip to node proto, generate pb
* migrate
* resolve address in transport.DialNode, update lastIp in cache.UpdateAddress
* use net.SplitHostPort to isolate host address from port
* define DistinctIPs flag
* add test for GetIP
* select last_ip when querying for nodes
* if distinctIPs flag == true, query for nodes with distinct IPs
* some basic tests
* change last_ip to field 14 in proto
* remove comments
* check err
* change distinctIPs to distinctIP
* exclude IPs from newNodes in query for reputable nodes
* add index on last_ip
* only add to excludedIPs if flag is true
* test half new nodes returns distinct IPs
* fix alignment
* add test
* rework ip filter query, add retry logic, add switch for database driver
* add retry to SelectNewNodes
* change discovery intervals so IPs don't get overwritten
* remove TestGetIP
* edit updating node stats in test
* split exclude into nodeIDs and IPs
* separate non-distinct IP query into other function
* trigger checks
* remove else block
* uplink: Add a new flag to set the filepath of the file which is used for
saving the encryption key and rename the one that hold the encryption key and
establish that it has priority over the key stored in the file to make the
configuration usable without having a huge refactoring in test-sim.
* cmd/uplink: Adapt the setup subcommand for storing the user input key to a file
and adapt the rest of the subcommands for reading the key from the key-file when
the key isn't explicitly set with a command line flag.
* cmd/gateway: Adapt it to read the encryption key from the key-file or use the
one passed by a command line flag.
* pkg/process: Export the default configuration filename so other packages which
use the same value can reference to it rather than having it hardcoded.
* Adapt several integrations (scripts, etc.) to consider the changes applied in uplink and cmd packages.
* repair no cutoff longtail
* commit repair pieces even if not hitting success threshold
* commit repair pieces even if not hitting success threshold
* remove useless condition
* better error message
* cmd/uplink: add share command to restrict an api key
This commit is an early bit of work to just implement restricting
macaroon api keys from the command line. It does not convert
api keys to be macaroons in general.
It also does not apply the path restriction caveats appropriately
yet because it does not encrypt them.
* cmd/uplink: fix path encryption for shares
It should now properly encrypt the path prefixes when adding
caveats to a macaroon.
* fix up linting problems
* print summary of caveat and require iso8601
* make clone part more clear
A previous change reused the same timeout for dialing as well as
requesting in order to speed up some tests. This change introduces
a distinct timeout so that the different operations can have different
timeouts.
Ran into difficulties trying to find the ideal solution for sharing
these counts between multiple satellite servers, so for now this is a
dumb solution storing recent space-usage changes in a big dumb in-memory
map with a big dumb lock around it. The interface used, though, should
allow us to swap out the implementation without much difficulty
elsewhere once we know what we want it to be.
The timeout tests were configured to use very short timeouts but
for some reason they took many seconds to complete. This commit
fixes two issues:
1. The transports were always using the default timeout rather than
the timeout specified.
2. The tests were possibly calling t.FailNow inside of the non-test
goroutine which causes it to exit, possibly losing an error from
the error group. Additionally, it didn't seem to be testing that
the error came back as a deadline wrapped in a transport error.
The tests run in ~3s instead of ~60s now.