stage
The test-versions script no longer uses the `testfiles` directory, which
the final upload for the rolling-upgrade script depended on. This change
creates and populates a `testfiles` diirectory during the final upload
stage of the rolling upgrade test.
Change-Id: Iabeccbadc55a8c85a1febbd5eb4e7d889a57a8dc
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.
Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
As per discussed we decided to rate limit how fast we iterate through
the metainfo database in the metainfo loop. This puts in place a
mechanism for rate limiting and burst limiting if need be in the future.
The default for this rate limiting is still no limits so it stays the
same as our previous functionality.
Change-Id: I950f7192962b0e49f082d2c4284e2d52b0a925c7
Adds check to see if storage nodes are eligible to initiate
graceful exit, by checking their CreatedAt date and seeing if
their "age" is greater than the new config value:
NodeMinAgeInMonths
The default for this value is 6 months for now.
https://storjlabs.atlassian.net/browse/V3-3357
Change-Id: Ib807ab8987ddb5a38a27a83886490f73fe8c5816
test-sim-versions.sh tests upgrading the satellite, storagenodes, and uplinks from the most recent release to master, and ensures that compatibility across all uplink versions since v0.15 is maintained.
Change-Id: I80a54236d0eb2d681716caf4b825a883bdc25ef1
- also updated ping chore to pick up trust changes
- fixed small typo in blueprint
- fixed flags for storj-sim
- wired up changes to testplanet
Change-Id: I02982f3a63a1b4150b82a009ee126b25ed51917d
* add integration tests to jenksin
* have jenkins run storj-sim integration tests w/crdb
Change-Id: I696d55c5894aaf630dcd7a566e1dd705ee88486b
* rm crdb integration tests to see if postgres passes
Change-Id: I1727a027ff802acbff5fc55961a0d605faefcf2d
* comment out aws tests to see if that is the error
Change-Id: I456c3d36f6a4ce7760ea0b6c402b6ea16cfe77e3
* add aws profile to integration tests
Change-Id: Ic01185dbc7b84ac48dfb846f8f272b34b50379b6
* add tmp path for aws profile and creds
Change-Id: I7b82ee5a99937edd3f66ae01bfb5cb21028a62cf
* change linux KiB syntax to bytes to support osx
Change-Id: Ia1f1027ba8da64a6ba537062deb9b3519973621f
* satellite/console: Add X-Frame-Options and Referrer-Policy security headers
* Update to use CSP instead of XFO and include tardigrade.io
* Make FrameAncestors a config option
* Update satellite-config lock
* Make help text for FrameAncestors better
* change satellite.Peer name to Core
* change to Core in testplanet
* missed a few places
* keep shared stuff in peer.go to stay consistent with storj/docs
* rm dup api code from sa peer, update storj-sim
* fix for backwards compat tests
* use env var instead of localhost
* changes per CR
* fix env var name
* skip peer for setup
* add overall failure percentage check and inactive time frame check before sending a response to sno
* update comment
* delete node from transfer queue if it has been inactive for too long
* fix linting error
* add test config value
* fix nil pointer
* add config value into testplanet
* add unit test for overall failure threshold
* move timeframe threshold to chore
* update protolock
* add chore test
* add per peiece failure count logic
* change config name from EndpointMaxFailures to MaxFailuresPerPiece
* address comments
* fix linting error
* add error handling for no row returned from progress table
* fix test for graceful exit chore on storagenode
* fix typo InActive -> Inactive
* improve readability for failure threshold calculation
* update config lock
* change error handling for GetProgress in graceful exit endpoint on the satellite side
* return proper rpc error in endpoint
* add check in chore test for checking finish timestamp and queue
* add metrics counter and chore
* updates metrics observer interval release default and dev default to 15min
* add more specific check for remote pointers
* add Counter field to metrics chore, add counter tests
* rm redundant ObjectCount suffix
* make pointer check easier to read
* change metrics.Config.Interval to ChoreInterval
* rm unneeded var
* fix comment
* update satellite config lock
* set up redis support in live accounting
* move live.Service interface into accounting package and rename to Cache, pass into satellite
* refactor Cache to store one int64 total, add IncrBy method to redis client implementation
* add monkit tracing to live accounting
* test that all nodes can check in with all satellites
* keep kademlia config
* add untrusted satellite test
* use getversion
* remove kademlia config changes in test-sim-backwards.sh
* add kademlia flags back to storj-sim storagenode
* reset kademlia flags in storagenode entrypoint
What:
cmd/inspector/main.go: removes kad commands
internal/testplanet/planet.go: Waits for contact chore to finish
satellite/contact/nodesservice.go: creates an empty nodes service implementation
satellite/contact/service.go: implements Local and FetchInfo methods & adds external address config value
satellite/discovery/service.go: replaces kad.FetchInfo with contact.FetchInfo in Refresh() & removes Discover()
satellite/peer.go: sets up contact service and endpoints
storagenode/console/service.go: replaces nodeID with contact.Local()
storagenode/contact/chore.go: replaces routing table with contact service
storagenode/contact/nodesservice.go: creates empty implementation for ping and request info nodes service & implements RequestInfo method
storagenode/contact/service.go: creates a service to return the local node and update its own capacity
storagenode/monitor/monitor.go: uses contact service in place of routing table
storagenode/operator.go: moves operatorconfig from kad into its own setup
storagenode/peer.go: sets up contact service, chore, pingstats and endpoints
satellite/overlay/config.go: changes NodeSelectionConfig.OnlineWindow default to 4hr to allow for accurate repair selection
Removes kademlia setups in:
cmd/storagenode/main.go
cmd/storj-sim/network.go
internal/testplane/planet.go
internal/testplanet/satellite.go
internal/testplanet/storagenode.go
satellite/peer.go
scripts/test-sim-backwards.sh
scripts/testdata/satellite-config.yaml.lock
storagenode/inspector/inspector.go
storagenode/peer.go
storagenode/storagenodedb/database.go
Why: Replacing Kademlia
Please describe the tests:
• internal/testplanet/planet_test.go:
TestBasic: assert that the storagenode can check in with the satellite without any errors
TestContact: test that all nodes get inserted into both satellites' overlay cache during testplanet setup
• satellite/contact/contact_test.go:
TestFetchInfo: Tests that the FetchInfo method returns the correct info
• storagenode/contact/contact_test.go:
TestNodeInfoUpdated: tests that the contact chore updates the node information
TestRequestInfoEndpoint: tests that the Request info endpoint returns the correct info
Please describe the performance impact: Node discovery should be at least slightly more performant since each node connects directly to each satellite and no longer needs to wait for bootstrapping. It probably won't be faster in real time on start up since each node waits a random amount of time (less than 1 hr) to initialize its first connection (jitter).
* Adjust arm32v6 and aarch64 docker images to match the hello-world image
* Update from master, fix potential bug in push-images target, and update storagenode deploy to handle arm64 image
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
* + add git worktree cleanup
+ specify commit in git worktree add
* fail backwards compatibility test if no postgres
* fix typo
* rm dir after worktree rm
* add jenkins runs backwards compat test, fix sa config issue
* try to use current branch
* add jenkins stage condition of branch name
* try when with branch env var
* only run test on master branch
* fix != to ==
* add comment, fix indentation
* run for all branches
* make comment more specific
* add stage to jenkins, add script for backwards compat tests
* debug backwards compat tests
* add setup in a script
* set env vars
* add more env ars
* mv to one script to debug
* debug api key problem
* add tmp dir
* rm print statements
What: Change cmd/uplink to use scopes
It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.
Why: So that it can read scopes in from files and stuff
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
* Added a gc package at satellite/gc, which contains the gc.Service, which runs garbage collection integrated with the metainfoloop, and the gc PieceTracker, which implements the metainfo loop Observer interface and stores all of the filters (about which pieces are good) for each node.
* Added a gc config located at satellite/gc/service.go (loop disabled by default in release)
* Creates bloom filters with pieces to be retained inside the metainfo loop
* Sends RetainRequests (or filters with good piece ids) to all storage nodes.
* pkg/datarepair: Add test to check num upload pieces
Add a new test for ensuring the number of pieces that the repair process
upload when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
Update the test which checks the number of pieces which are uploaded
during a repair for using the same excess over the success threshold
value than the implementation.
* satellites/orders: Limit repair put order for not being total
Limit the number of put orders to be used by repair for only uploading
pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
Make some changes in the DataRepair test to make pass again after the
repair upload repaired pieces only until a % excess over success
threshold.
Also update the steps description of the DataRepair test after it has been
changed, to match on what's now, besides to leave it more generic for
avoiding having to update it on minimal future refactorings.
* satellite: Make repair excess optimal threshold configurable
Add a new configuration parameter to the satellite for being able to
configure the percentage excess over the optimal threshold, used for
determining how many pieces should be repaired/uploaded, rather than
having the value hard coded.
* repairer: Add configurable param to segments/repairer
Add a new parameters to the segment/repairer to calculate the maximum
number of excess nodes, based on the optimal threshold, that repaired
pieces can be uploaded.
This new parameter has been added for not returning more nodes than the
number of upload orders for data repair satellite service calculate for
repairing pieces.
* pkg/storage/ec: Update log message in clien.Repair
* satellite: Update configuration lock file
* pkg/process/metrics: add an instance prefix
the distinction between which satellite is sending which
data should go in the instance field, not the suffix or application
fields. (un)fortunately, the instance id is deliberately not
configurable because we don't want it to be easy to accidentally
have multiple applications collide with the same instance id.
so we're currently stuffing the human readable instance in the
suffix. :(
perhaps a reasonable tradeoff would be an optional instance
prefix that allows operators to put their domain name in
the instance
Change-Id: I6fcc8498be908c5740439cc00f77474ad151febd
* linting
Change-Id: I9f9a44fa9a2634ef5e4f89548d42d57ce9e4450e
* move offer out of marketing package and remove marketing package
* fix imports
* fix rename errors
* remove offer service
* change package name from offers to rewards
* fix linting
* remove unused code and use appropriate comment
* scripts/tag-release.sh: libuplink release tagging
a couple of arch review meetings ago we discussed how to make sure
that it is much easier to get the release defaults for binaries,
libraries, and so on. we already imperfectly solved the binary
problem with the release.sh script, but (until now!) have not solved
the problem of getting release defaults for people building from
source.
the solution we seemed to all prefer was to make sure our tagged
version commits check the release state into the source code.
this script aides in tagging commits with version tags and
updating the source defaults. it still plays nicely with
release.sh and our other build processes.
after this is merged we should configure github/go modules to
prefer people use one of our tags instead of master (which will
keep dev defaults).
Change-Id: I36c5c33a1bc90ec1685f59b05dde779090e252b6
* gofmt release.go
Change-Id: I6e968eff86230496e9cbddecd767ca8d8ff36ba4
* regex for version tag
Change-Id: Icaa6d753ffc962115d961bcabe9daed89b16430c
* added some docs
Change-Id: Ide624fab794ce849e3a3e7254fb038251bba0c71
* add voucher service on storage node
* config field tag syntax, go routines for requests
* hook up voucher service in storagenode/peer.go
* add voucher config to testplanet
* add voucher config to testplanet
* add voucher response status INVALID, ACCEPTED, REJECTED
* add a test for vouchers service
* handle no row from GetValid, test it
* add trust pool to voucher service
* use trusted list to get satellites
* verify vouchers upon receipt
* test VerifyVoucher
* set to only listen on 127.0.0.1, move static files to same location, better template handling
* handle error
* fix path in storj-sim
* revert template handling changes
* code shouldn't panic on invalid tempalte
* do not rewrite once writing has started
* write correct error code
* use filepath for path handling
* revert change
* fix
* fix mod tidy
* use correct error code for not found, avoid infinite loop on failure
* Set up new port 8090 for in offers
Clean up commented code
Rename offers to offersweb
Remove unused code
Add todos for adding front-end templates
Add middleware for only allow local access
Add comment
Fix linting error
Remove commented code
Update storj-sim
Check request IP against Host IP
Use net pakcage to retrieve IP address
Rename service to marketing
* Add wrapper for all errors
* fix conflicts
* update the config file
* fix linting error
* remove unused packages
* remove global runtime var and add flag to storj-sim for mar static dir
* remove debugging lines
* add new config for test data and check if static dir flag is set before passing to mux
* change 'console' to 'marketing' for test data config
* fix linting errors
* update config flag
* Trigger Jenkins
* Trigger CLA
* reputation: Add configuration parameters
Add the configuration parameters which will be used by the algorithm
which will calculate the storage node reputation.
Because the reputation calculation is based on audit and uptime check
results some configuration parameters are in pkg/audit, others in
pkg/discovery and other in the satellite which will combine the both
reputation results to obtain the storage node reputation for repair and
uplink.
* satellite-config: Refresh lock file with new params
Refresh the Satellite configuration yaml lock file with the new
parameters added in this branch.
* uplink: Mark encryption key config field for setup
Set the "setup" property to the `EncryptionConfig.EncrptionKey` for
avoiding to save it in the configuration file.
This field is only meant for using in the command line parameters which
need to use a different encryption key than the one present in the key
file or use it when there is not set any encryption key file path.
* cmd/uplink: Setup non-interactive accept enc key
Change the uplink CLI setup command non-interactive to save the
encryption key into a file when it's passed through the flag
--enc.encryption-key
Previous to this change it wasn't possible to create an key file despite
of that the flag was provided, so it was useless on the setup command.
* cmd/uplink: Reuse logic to read pwd from terminal
Reuse the logic which is already implemented in the pkg/cfgstruct for
reading a password from the terminal on interactive mode, rather than
duplicating it in the setup command.
* cmd/gateway: Use encryption key file flags
The cmd/gateway was still using the `enc.key` configuration field which
doesn't exist anymore and its setup command wasn't using the
`enc.key-filepath` with combination of the `enc.encryption-key` for
generating a file with the encryption key.
This commit update the cmd/gateway appropriately and move to the uplink
package the function used by cmd/uplink to save the encryption key for
allowing to also be used by the cmd/gateway without duplicating the
logic.
* cmd/storj-sim: Adapt gateway config cmd changes
Adapt the cmd/storj-sim to correctly pass the parameters to the
cmd/gateway setup and run command.
* scripts: Don't pass the --enc.encryption-key flag
uplink configuration has changed to only support the
`--enc.encryption-key` flag for setup commands and consequently the
cmd/uplink and cmd/gateway don't accept this flag over other commands,
hence the test for the uplink had to be updated for no passing the
flag on the multiples calls that the test do to cmd/uplink.
* uplink: Remove func which aren't useful anymore
Remove the function which allows to user or load an encryption key
because it isn't needed anymore since the `--enc.encryption-key` flag is
only available for the setup command.
Consequently remove its usage from cmd/uplink and cmd/gateway, because
such flag will always be empty because in case that's passed Cobra will
return an error due to a "unknown flag".