Commit Graph

225 Commits

Author SHA1 Message Date
Jeff Wendling
32f683fe9d satellite/orders: filter nodes based on segment placement
this change adds code to CreateGetOrderLimits to filter
out any nodes that are not in the placement specified
by the segment. notably, it does not change the audit
or repair order limits. the list segments code had to be
changed to include getting the placement field from the
database.

Change-Id: Ice3e42a327811bb20928c619a72ed94e0c1464ac
2023-06-05 13:56:22 -04:00
Michal Niewrzal
eabd9dd994 satellite/orders: remove unsed argument
Change-Id: I6c5221fc19f97ae6db5627d7239795ff663289e0
2023-05-22 14:35:08 +00:00
paul cannon
915f3952af satellite/repair: repair pieces on the same last_net
We avoid putting more than one piece of a segment on the same /24
network (or /64 for ipv6). However, it is possible for multiple pieces
of the same segment to move to the same network over time. Nodes can
change addresses, or segments could be uploaded with dev settings, etc.
We will call such pieces "clumped", as they are clumped into the same
net, and are much more likely to be lost or preserved together.

This change teaches the repair checker to recognize segments which have
clumped pieces, and put them in the repair queue. It also teaches the
repair worker to repair such segments (treating clumped pieces as
"retrievable but unhealthy"; i.e., they will be replaced on new nodes if
possible).

Refs: https://github.com/storj/storj/issues/5391
Change-Id: Iaa9e339fee8f80f4ad39895438e9f18606338908
2023-04-06 17:34:25 +00:00
Michal Niewrzal
bc8f8f62b5 satellite/orders: cleanup after altering primary key
We changed primary key for bucket_bandwidth_rollups table. Now we
need to do some cleanup in places like structs, sorting methods or SQL
queries.

Change-Id: Ida4f874f161356df193379a53507602e04db1668
2023-03-06 16:03:11 +00:00
Michal Niewrzal
b46c0fb78f satellite/orders: don't cancel flushing bandwidth orders
Earlier we made a change to not cancel flushing orders when flushing
was triggered by orders endpoint method but we missed a case
where it can be also triggered (and canceled) by metainfo endpoints
method. This change moves ignoring context cancellation deeper.

Change-Id: Id43176f552efc3167345783f73aab885411ac247
2023-03-01 17:10:05 +00:00
Michal Niewrzal
16b7901fde satellite/metabase: add piece size calculation to segment
This code is essentially replacement for eestream.CalcPieceSize. To call
eestream.CalcPieceSize we need eestream.RedundancyStrategy which is not
trivial to get as it requires infectious.FEC. For example infectious.FEC
creation is visible on GE loop observer CPU profile because we were
doing this for each segment in DB.

New method was added to storj.Redundancy and here we are just wiring it
with metabase Segment.

BenchmarkSegmentPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize
BenchmarkSegmentPieceSize/eestream.CalcPieceSize-8         	    5822	    189189 ns/op	    9776 B/op	       8 allocs/op
BenchmarkSegmentPieceSize/segment.PieceSize
BenchmarkSegmentPieceSize/segment.PieceSize-8              	94721329	        11.49 ns/op	       0 B/op	       0 allocs/op

Change-Id: I5a8b4237aedd1424c54ed0af448061a236b00295
2023-02-22 11:04:02 +00:00
Márton Elek
8f8e97de23 satellite/metainfo: support desired node number for download object/segment
This modification introduce support of the new "desired node" field of download segment/object.

This can be used to request more nodes than the suggested minimum. It can be used to achieve better performance in exchange of using more bandwidth. (more parallel downloads).

Change-Id: Ia167d6979e6d70a597c85070a4ccd1c3a573e406
2023-02-13 13:57:48 +00:00
Márton Elek
ca6e3a9e88 satellite/orders: create mock based unit test
Most of our (~integration) tests based on testplanet runner.

However running testplanet for each test make the testing process slow.
It seems to be better to use real unit tests (without db dependency) when it's possible.

This patch makes small modification to make it possible to test orders.Service with real unit test.

As the existing unit test of `service.go` is isolated with `_test` package name, it's moved to an `_integration_test.go` file to make place for the unit test.

Change-Id: Ia69f26a34e2c48d230d8d36c2040dd02a60455a6
2023-02-13 13:24:30 +00:00
Michal Niewrzal
252c437b0e satellite/orders: ignore context canceled when updating bucket bandwidth
Orders from storage nodes are received by SettlementWithWindowFinal method. There is a stream which receives all orders and after getting
all orders we are inserting into DB storagenode and bucket bandwidth. Problem is with bucket bandwidth which is stored through cache which is often using context from SettlementWithWindowFinal stream to perform DB inserts and its doing this in separate goroutine. Because of that is possible that  SettlementWithWindowFinal is finished before flushing was finished and context is canceled while doing insert into DB

Change-Id: I3a72c86390e9aedc060f6b082bb059f1406231ee
2023-02-08 13:21:42 +00:00
JT Olio
686faeedbd satellite/overlay: return noise info with selected nodes
we have two more fields in the database (noise_proto and
noise_public_key) that now need to go into pb.NodeAddress when
returning AddressedOrderLimits.

the only real complication is making sure type conversions between
database types and NodeURLs and so on don't lose this new
pb.NodeAddress field (NoiseInfo). otherwise this is a relatively
straightforward commit

Change-Id: I45b59d7b2d3ae21c2e6eb95497f07cd388d454b3
2023-02-02 15:46:27 +00:00
Michal Niewrzal
15508d270c satellite/orders: don't store non user bandwidth actions for bucket
For bucket_bandwidth_rollups we are trying to insert lots of entries
with empty bucket name and project id. Those are inserts from orders
created by repair, audit and GE. High load on the same primary key
(the same range) is causing many retries and that's affect all inserts
as we are putting 1000 entries into DB a once.

This change solves this problem by not storing into
bucket_bandwidth_rollups other actions then GET and PUT. Those actions
are only important from bucket bandwidth usage perspective because
those are actions performed by users. Other actions (repair, audit or
GE) are also stored in storagenode_bandwdith_rollups so we will still
have access to them e.g. for statistic purposes.

https://github.com/storj/storj/issues/5332

Change-Id: Ibb5bf0a4c869b0439dc65da1c9342a38ca2890ba
2023-02-01 15:38:48 +00:00
Michal Niewrzal
3b6e1123b8 satellite/orders: fix sorting rollups before inserting
Sorting by primary key before inserting data into DB is fixed.
Earlier we were sorting input slice of BucketBandwidthRollup but then
we were putting all entries into map to rollup input data. Iteration
over map with a range loop doesn't guarantee any specific order so we
were loosing sorted order when we were creating with this map slices to
use with DB insert.

New code is also using map but when map is full its sorting map keys
separately and iterates over them to get data from map.

https://github.com/storj/storj/issues/5332

Change-Id: I5bf09489b0eecb6858bf854ab387b660124bf53f
2023-02-01 12:17:25 +00:00
Andrew Harding
abd0ad92dc satellite/metainfo: RetryBeginSegmentPieces RPC implementation
Part of:
https://github.com/storj/uplink/issues/120

Change-Id: I2a2873455f7498ffd31f50ade16c173fe1d18157
2023-01-27 15:04:59 +00:00
Michal Niewrzal
4cbb1ed296 satellite/orders: log bandwidth values we are dropping
When we have problem with inserting bandwidth amounts into cache
or DB we are logging information about it but log entries are not
very detailed. This change adds bandwidth amounts to the log entry.

https://github.com/storj/storj/issues/5470

Change-Id: I55ccad837d17b141501d3def1dec7ad5f3acdb0b
2023-01-20 09:28:25 +00:00
Michal Niewrzal
0185bba90a cmd: cleanup segment verify/repair tools
* use the same DB application name for satellite and metabase
* use noop orders DB implementation to avoid storing allocated bandwidth
in DB

Change-Id: I20e88c694d38240fe1a20c45719e210cfb76402c
2023-01-12 15:27:07 +00:00
Michal Niewrzal
a2a9dafa33 satellite/orders: don't store allocated bandwidth in
bucket_bandwidth_rollups table

We have performance problems with updating bucket_bandwidth_rollups. To
improve situation we can stop storing allocated bandwidth in this table.
This should reduce large number of updates which are comming from
metainfo endpoints, repair workers and audit.

Next step will be to drop `allocated` column completely from
bucket_bandwidth_rollups.

Allocated GET bandwidth is all we need and we are keeping it in
bucket_bandwidth_rollups table.

Change-Id: Ifdd26a89ba8262acbca6d794a6c02883ad0c0c9b
2023-01-12 13:21:02 +00:00
Clement Sam
3378215adf satellite/orders: decrease order expiration time to 24hours
Closes https://github.com/storj/storj/issues/5202

Change-Id: I55d1a84c46dd610eeb00dd79df8f4f7e699499a0
2022-11-21 14:52:32 +00:00
Ivan Fraixedes
567557abc3 satellite/orders: Remove period logs messages
Remove the final period of two log messages to be consistent with the
other logs messages.

Change-Id: I9253a4d5fb293c95d3baf8e093dc5744387c1516
2022-11-21 13:19:13 +00:00
Stefan Benten
70d42ead1c
satellite/orders/endpoint.go: order rollups by bucket name first
Currently the primary key of the underlying rollup table has the
primary key being the bucket name, but we used to sort by projectID.
This caused dead locks due to the contention during updates/inserts.

We should reevalute if bucket name being the primary key is the right
way for this table, this should stop the long running and failing attempts tho.

Change-Id: Ie7d0f86944da48ad9cbd92eb162226882a2fb954
2022-11-16 19:48:43 +01:00
paul cannon
c54c45c9c7 satellite/audit: new ReverifyPiece implementation
ReverifyPiece() is not currently hooked up to anything, but is planned
to take the place of audit.(*Verifier).Reverify().

ReverifyPiece() works by downloading one piece in its entirety, rather
than pulling an entire stripe across many nodes.

Change-Id: Ie2c680f4d3c3b65273a72466a3f9f55c115b0311
2022-10-27 16:06:21 +00:00
Michal Niewrzal
a97cd97789 satellite/orders: remove unused service dependency
Orders service doesn't need buckets service anymore.

Change-Id: I27853cda87e82b528f53667e4b4866801f7bfb62
2022-09-28 08:56:36 +00:00
Egon Elbre
51d4e5c275 satellite/{orders,overlay}: use cache for downloads
Use DownloadSelectionCache to avoid querying database for every
download.
This change only addresses downloads from users. The download selection
cache is not currently used for audit and repair.

Change-Id: I96a49e121dac0b4204f97592a63131edabd73fb5
2022-07-12 11:04:34 +00:00
Michał Niewrzał
456aea727e satellite: use PieceIDDeriver for derivation
We can use PieceIDDeriver in all places where we are deriving id from
the same id multiple times. We have serveral such places: gc, segment
deletion, segment validation, order limit creation. Using it should
save some resources.

Change-Id: I24668d516c0f7cea4aec6470614067734149501d
2022-05-19 06:31:42 +00:00
Fadila Khadar
29fd36a20e satellite/repairer: handle excluded countries
For nodes in excluded areas, we don't necessarily want to remove them
from the pointer, but we do want to increase the number of pieces in the
segment in case those excluded area nodes go down. To do that, we
increase the number of pieces repaired by the number of pieces in
excluded areas.

Change-Id: I0424f1bcd7e93f33eb3eeeec79dbada3b3ea1f3a
2022-03-14 10:59:36 -04:00
Yingrong Zhao
1f8f7ebf06 satellite/{audit, reputation}: fix potential nodes reputation status
inconsistency

The original design had a flaw which can potentially cause discrepancy
for nodes reputation status between reputations table and nodes table.
In the event of a failure(network issue, db failure, satellite failure, etc.)
happens between update to reputations table and update to nodes table, data
can be out of sync.
This PR tries to fix above issue by passing through node's reputation from
the beginning of an audit/repair(this data is from nodes table) to the next
update in reputation service. If the updated reputation status from the service
is different from the existing node status, the service will try to update nodes
table. In the case of a failure, the service will be able to try update nodes
table again since it can see the discrepancy of the data. This will allow
both tables to be in-sync eventually.

Change-Id: Ic22130b4503a594b7177237b18f7e68305c2f122
2022-01-06 21:05:59 +00:00
littleskunk
b21cbc85f1
satellite/orders: log level warn for remaining "bucketName or projectID not set" (#4326)
Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2022-01-06 16:31:26 +01:00
dlamarmorgan
ab37b65cfc satellite/{accounting,orders,satellitedb}: group bucket bandwidth rollups by time window
Batching of the order submissions can lead to combining the allocated
traffic totals for two completely different time windows, resulting
in incorrect customer accounting. This change will group the batched
order submissions by projectID as well as time window, leading to
distinct updates of a buckets bandwidth rollup based on the hour
window in which the order was created.

Change-Id: Ifb4d67923eec8a533b9758379914f17ff7abea32
2022-01-05 20:24:48 +00:00
Stefan Benten
a6139f5a6b satellite/orders: log less decrypt order messages
Currently most of the satellite log is made up by this specific log
message and thus we should increase the log level. We already have
a monkit event tracking this case. In case we need to look into this
more we should just increase the satellite log level.

Change-Id: I27ebed3e6745701c66c83e8c52ddc836ad9d5f4e
2021-12-21 00:39:29 +01:00
Michał Niewrzał
911540e76e satellite/orders: avoid logging "bucketName or projectID not set"
When we switched to segments loop we stopped
adding bucket name and project ID to order limit
metadata. This is causing lots of logging that we don't need.

Change-Id: Id3f878a170b7a6b801e8a838ee69165715985d60
2021-12-14 10:31:48 +00:00
Fadila Khadar
fb0d055a41 satellites/orders: populate egress_dead in project_bandwidth_daily_rollups
Populate the egress_dead column for taking into account allocated bandwidth that can be removed because orders have been sent by the storage nodes. The bandwidth not used in these orders can be allocated again.

Change-Id: I78c333a03945cd7330aec052edd3562ec671118e
2021-10-06 16:54:49 +00:00
Michał Niewrzał
1ed5db1467 satellite/metainfo: simplifying limits code
Its a very simple change to reduct code duplication.

Change-Id: Ia135232e3aefd094f76c6988e82e297be028e174
2021-09-28 06:22:13 +00:00
Jeff Wendling
b160ec4c1b satellite/orders: bound RollupsWriteCache flushes
In the situation where the flushes take longer than the incoming
rate of writes, the RollupsWriteCache will take every connection
in the database pool and use them forever. Instead of doing that
and taking down satellite availability, bound the number of flush
operations that it will perform and drop incoming writes earlier
to keep memory usage constant.

Adds monitoring events for if any flushes or updates are lost.

Change-Id: I81b169b73501ee9b999f4b03d1e79645fc56f167
2021-09-15 19:14:39 +00:00
Michał Niewrzał
c258f4bbac private/testplanet: move Metabase outside Metainfo for satellite
At some point we moved metabase package outside Metainfo
but we didn't do that for satellite structure. This change
refactors only tests.
When uplink will be adjusted we can remove old entries in
Metainfo struct.

Change-Id: I2b66ed29f539b0ec0f490cad42c72840e0351bcb
2021-09-09 07:15:51 +00:00
Cameron Ayer
26f839a445 satellite/repair/repairer: if not enough nodes for repair order limits, increment metric and log as irreparable segment
Change-Id: I4bd46f28d64278c8d463e885ad221aafb6ce7cf3
2021-08-27 13:42:28 +00:00
Cameron Ayer
24e02b6352 satellite/{audit,orders}: if not enough nodes for audit order limits, increment metric and wrap error with ErrNotEnoughShares
Increment a metric so we can get alerts. Wrap the error so we can search
the logs for it.

Change-Id: I3827aa306c431009828014d9d9afff8dfc057ee6
2021-08-26 20:14:05 +00:00
Clement Sam
d73b9fff9a satellite/orders: set the expirationDate in CreatePutRepairOrderLimits
In the past ExpirationDate was available inside CreatePutRepairOrderLimits
but this was removed since the metabase segment was missing the ExpiresAt field.
Now ExpiresAt field is available in the metabase segment
and can be set correctly while executing NewSignerRepairPut.

Change-Id: I068c07492ab27bde2c44477bbd32c5872edd024a
2021-07-27 12:44:40 +00:00
Cameron Ayer
449c873681 satellite/repair/repairer: attempt repair GETs using nodes' last IP and port first
Sometimes we see timeouts from DNS lookups when trying to do
repair GETs. Solution: try using node's last IP and port first.
If we can't connect, retry with DNS lookup.

Change-Id: I59e223aebb436118779fb18378f6e09d072f12be
2021-07-21 13:13:06 +00:00
Michał Niewrzał
70e6cdfd06 satellite/audit: move to segmentloop
Change-Id: I10e63a1e4b6b62f5cd3098f5922ad3de1ec5af51
2021-06-28 11:32:00 +00:00
Jeff Wendling
8a6efa1f58 satellite/orders: query for node first before upsert/replace
the very common case is that the node api version is
indeed at least the requested version, so query for
that first to avoid write traffic.

Change-Id: Ib047d93078205bc07fee75d1f635503b792307f0
2021-06-22 15:16:12 -04:00
JT Olio
da9ca0c650 testplanet/satellite: reduce the number of places default values need to be configured
Satellites set their configuration values to default values using
cfgstruct, however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.

As is to be expected, these values have drifted, and it appears
in a few cases test planet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.

This change makes it so all values are configured the same,
systematic way, so it's easy to see when test values are different
than dev values or release values, and it's less hard to forget
to enable features in testplanet.

In terms of reviewing, this change should be actually fairly
easy to review, considering private/testplanet/satellite.go keeps
the current config system and the new one and confirms that they
result in identical configurations, so you can be certain that
nothing was missed and the config is all correct.
You can also check the config lock to see what actual config
values changed.

Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
2021-06-01 22:14:17 +00:00
Egon Elbre
10a0216af5 satellite/metainfo: use range for specifying download limit
Previously the object range was not used for calculating order limit.
This meant that even if you were downloading only a small range it would
account bandwidth based on the full segment.

This doesn't fully address the accounting since the lazy segment
downloads do not send their requested range nor requested limit.

Change-Id: Ic811e570c889be87bac4293547d6537a255078da
2021-06-01 09:36:55 +00:00
Michał Niewrzał
59eabcca24 satellite/orders: populate
project_bandwidth_daily_rollups table

We want to calculate used bandwidth better so we need to calculate it
from allocated and settled bandwidth. To do this we need first populate
this new table.

https://storjlabs.atlassian.net/browse/PG-56

Change-Id: I308b737bf08ee48ce4e46a3605697ab2095f7257
2021-05-25 18:07:22 +00:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Egon Elbre
267506bb20 satellite/metabase: move package one level higher
metabase has become a central concept and it's more suitable for it to
be directly nested under satellite rather than being part of metainfo.

metainfo is going to be the "endpoint" logic for handling requests.

Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198
2021-04-21 15:54:22 +03:00
Egon Elbre
86e698f572 pb: use *UnimplementedServer to avoid breaking API changes
Change-Id: I99a34eeb37ac4453411f273511710562a519f57a
2021-03-29 12:26:10 +03:00
Michał Niewrzał
67e26aafcd Merge remote-tracking branch 'origin/main' into multipart-upload
Change-Id: I9b183323cb470185be22f7c648bb76917d2e6fca
2021-03-10 08:53:38 +01:00
Natalie Villasana
c290e5ac9a satellite/orders: decrease FlushBatchSize default to 1000
The previous default FlushBatchSize of 10000 was causing major
slow down in select and insert statements on bucket_bandwidth_rollups.
We saw on the saltlake satellite that a FlushBatchSize of 1000 helped
reduce contention and query latency.

Change-Id: Ib95e73482219bc5aedc11925b1849fa5999774ba
2021-03-02 14:00:48 +00:00
Michał Niewrzał
9a60011774 Merge remote-tracking branch 'origin/main' into multipart-upload
Change-Id: Ia90f29be432e207c4125f7f955c912978eabe59a
2021-02-04 09:38:08 +01:00
Ivan Fraixedes
9c9f481469 satellite/orders: Remove deprecated endpoint
Remove the orders Settlement endpoint because it isn't used and it was
already always returning an error.

Change-Id: I81486fbe7044a1444182173bc0693698ee7cfe7e
2021-02-03 23:47:07 +00:00
Kaloyan Raev
6f3d0c4ad5 Merge remote-tracking branch 'origin/main' into multipart-upload
Conflicts:
	go.mod
	go.sum
	satellite/repair/repair_test.go
	satellite/repair/repairer/segments.go

Change-Id: Ie51a56878bee84ad9f2d31135f984881a882e906
2021-02-02 19:19:04 +02:00