For the last few month we had no issues with order submission. I would
call it stable and now it is time to risk a lower expire time. This will
increase the database performance on the satellite and it will reduce
the delay for billing.
The long term goal is 6h but for that step we need to change graceful
exit first. At the moment storage nodes would get disuqlaified for not
transfering alle pieces in less than 6 hours.
Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c
paths are organized as follows:
project_id/segment_index/bucket_name/encrypted_key
so by picking parts[0] and parts[1], we were using the segment
index instead of the bucket name, causing bandwidth to be
accounted for incorrectly. additionally, we were using the
PUT action instead of the PUT_GRACEFUL_EXIT action, causing
the data to be charged incorrectly. we use PUT_REPAIR for
now because nodes won't accept uploads with PUT_GRACEFUL_EXIT
and our tables need migrations to handle rollups with it.
Change-Id: Ife2aff541222bac930c35df8fcf76e8bac5d60b2
A uuid.UUID is an array of bytes, and slicing it refers to the
underlying value, much like taking the address. Because range
in Go reuses the same value for every loop iteration, this means
that later iterations would overwrite earlier stored project
ids. We fix that by making a copy of the value before slicing it
for every loop iteration.
Change-Id: Iae3f11138d11a176ce360bd5af2244307c74fdad
Currently we risk losing pending bandwidth rollup writes even on a clean
shutdown. This change ensures that all pending writes are actually
written to the db when shutting down the satellite.
Change-Id: Ideab62fa9808937d3dce9585c52405d8c8a0e703
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.
Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
warning: databases migrated to version 77 before this commit
is merged must be manually re-migrated. this should not be a
problem for anything but staging databases.
Change-Id: Ie1631c48379472352014183ee43f1465e22200f7
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.
the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.
after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.
since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.
Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
the hope is that it is mostly interfering with itself, so this
will make it not do that (well, N api servers, but hopefully
that's not enough to cause it to have issues).
Change-Id: Ifd0c9e6617457785ab25fe5b714d8556cdc8e2d3
When an uplink requests an upload or download from the satellite we are trackig the
allocated bandwidth twice. The value in bucket_bandwidth_rollups is used
for project limits but the value in storagenode_bandwidth_rollups is not
used at all. We can increase the performance by removing it. Uplinks
will get a faster response from the satellite.
Change-Id: Icccd41f94107ef34668f30f99bf5f728c384b07e
Remove direct dependency on uplink.RSConfig, this simplifies
moving the config file without introducing weird dependencies.
Change-Id: I7fd2a145401e0205d7047631df9d2810241efeec
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
* V3-2529: Add DB savepoint to fix issue with postgres. Add test force a rejected order
Co-Authored-By: Ivan Fraixedes <ivan@fraixed.es>
* Update satellite/satellitedb/orders.go
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
This PR introduces functionality for routine deletion of archived orders.
The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted
This archive cleanup job is combined with the order sender into a new combined orders service
* pkg/process: Fatal show complete error information
Change the general process execution function to not using the sugared
logger for outputting the full error information.
Delete some unreachable code because Zap logger Fatal method calls exit
1 internally.
* storagenode/storagenodedb: Add info to error
Add more information to an error returned due to some data
inconsistency.
* storagenode/orders: Don't use sugared logger
Don't use sugar logger and provide better contextualized error messages
in settle method.
* storagenode/orders: Add some log fields to error msgs
Add some relevant log fields to some logged errors of the sender settle
method.
* satellite/orders: Remove always nil error from debug
Remove an error which as logged in debug level which was always nil and
makes the logic that used this variable clear.
* storagenode/orders: Don't return error Archiving unsent
Don't stop the process which archive unsent orders if some of them
aren't found the DB because it cause the Storage Node to stop with a
fatal error.
* satellitedb/certDB: refactors of the node certificate storage DB table
The existing implementation doesnt allow to store the complete certificate chain of uplinkIDs or storagenodeIDs, so the current table is dropped and new table will be added which addresses the storage and retrieval of certificates
pkg/identity: fixes spelling mistakes that I missed on PR#2754
Fixes V3-1992/V3-2388
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
* pkg/datarepair: Add test to check num upload pieces
Add a new test for ensuring the number of pieces that the repair process
upload when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
Update the test which checks the number of pieces which are uploaded
during a repair for using the same excess over the success threshold
value than the implementation.
* satellites/orders: Limit repair put order for not being total
Limit the number of put orders to be used by repair for only uploading
pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
Make some changes in the DataRepair test to make pass again after the
repair upload repaired pieces only until a % excess over success
threshold.
Also update the steps description of the DataRepair test after it has been
changed, to match on what's now, besides to leave it more generic for
avoiding having to update it on minimal future refactorings.
* satellite: Make repair excess optimal threshold configurable
Add a new configuration parameter to the satellite for being able to
configure the percentage excess over the optimal threshold, used for
determining how many pieces should be repaired/uploaded, rather than
having the value hard coded.
* repairer: Add configurable param to segments/repairer
Add a new parameters to the segment/repairer to calculate the maximum
number of excess nodes, based on the optimal threshold, that repaired
pieces can be uploaded.
This new parameter has been added for not returning more nodes than the
number of upload orders for data repair satellite service calculate for
repairing pieces.
* pkg/storage/ec: Update log message in clien.Repair
* satellite: Update configuration lock file