Commit Graph

4544 Commits

Author SHA1 Message Date
Yingrong Zhao
af773ec8a6 cmd/uplink: use DeleteBucketWithObjects for bucket force deletion
This PR updates `uplink rb --force` command to use the new libuplink API
`DeleteBucketWithObjects`.
It also updates `DeleteBucket` endpoint to return a specific error
message when a given bucket has concurrent writes while being deleted.

Change-Id: Ic9593d55b0c27b26cd8966dd1bc8cd1e02a6666e
2020-09-02 16:39:20 +00:00
Qweder93
36d752e92d storagenode/reputation: offline_under_review_at added
Change-Id: Ia7ec79b2d6f20fe29de0c36223f9485380d2845c
2020-09-02 18:48:28 +03:00
Qweder93
7d9897b7af storagenode/nodestats: online_score added
Change-Id: I84b50a6cace306e5f10d53a2073fe8810d4d2960
2020-09-02 17:45:01 +03:00
NickolaiYurchenko
5ed0038ace web/satellite: estimation table surge added
Change-Id: Ied7d737174d357076f0735091cc0a18c9e95b464
2020-09-02 11:35:56 +00:00
VitaliiShpital
3fbbe1847e web/satellite: scrolling with no paywall banner fixed
WHAT:
dashboard and billing pages' scrolling fixed with no paywall banner

WHY:
content was slightly clipped

Change-Id: I4d663775456980887af7bc41c72bf3a8da87d301
2020-09-02 09:51:04 +00:00
Yingrong Zhao
7c7334e9d5 satellite/metainfo/piecedeletion: fix deadlock when waiting for threshold to reach
This PR fixes a deadlock that can happen when the number of piece
deletion requests is different from the distinct node count from those
requests. The success threshold should be based on the number of nodes
instead of the amount of requests

Change-Id: I83073a22eb1e111be1e27641cebcefecdc16afcb
2020-09-02 07:07:49 +00:00
Yaroslav Vorobiov
4783c3e1d3 storagenode-updater/linux: restart systemd service
Change-Id: I8ac25ecc41323ec0d5abf8ee65276c1d7a59f74d
2020-09-02 06:22:19 +00:00
Egon Elbre
61b17f1214 satellite/orders: add encryption keys flag to Service
Change-Id: Ie96e75bc96241b799d04654ef5e05b82e6a899bb
2020-09-02 05:02:14 +00:00
JT Olio
1f711523d5 satellite/repair: switch to piecestore.UploadReader part 2
Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3
2020-09-01 12:40:54 -06:00
Natalie Villasana
95ff29cce1 satellite/metainfo: reduce lookupLimit default to 2500
Change-Id: I6569c6d1f145b127a9e8e1a65e4344dd62c989bb
2020-09-01 12:04:48 -04:00
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Michal Niewrzal
084e280a6d satellite/admin: fix time issues in tests
With day -1 test was creating tallies for two different months.

Change-Id: I10201f4e787e73bae40bf7120856fb7cfdcd66f7
2020-09-01 16:03:52 +02:00
Michal Niewrzal
0604a672c1 satellite/metainfo: use metabase in loop
Change-Id: I1bb0c6fe0a762895fde950690b06f7dd9d77e178
2020-09-01 10:06:16 +00:00
Isaac Hess
2d54447088 satellite/metainfo: Fix GetObjectIPs loop and test
This change forces the test of GetObjectIPs to use multiple remote
segments (earlier versions of the test were accidentally using inline
segments). This change also revealed a small bug in the for loop code,
which is fixed.

Change-Id: Ic486b079d221952ba13553acf0ca41a8873f3f21
2020-08-31 15:56:12 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
Moby von Briesen
5d21e85529 satellite/audit/queue: Separate audit queue into two separate structs.
* The audit worker wants to get items from the queue and process them.
* The audit chore wants to create new queues and swap them in when the
old queue has been processed.

This change adds a "Queues" struct which handles the concurrency
issues around the worker fetching a queue and the chore swapping a new
queue in. It simplifies the logic of the "Queue" struct to its bare
bones, so that it behaves like a normal queue with no need to understand
the details of swapping and worker/chore interactions.

Change-Id: Ic3689ede97a528e7590e98338cedddfa51794e1b
2020-08-31 20:51:25 +00:00
Isaac Hess
ba1a113e2e satellite/metainfo: Add test for GetObjectIPs
This PR adds the integration test for GetObjectIPs, testing all repos.

Change-Id: Id4d4f19c427a2e320b2a82efa68150fa7f5f86fb
2020-08-31 10:27:56 -06:00
Isaac Hess
351aa70eb7 satellite/metainfo: Implement GetObjectIPs
Change-Id: Ibabbe7c555b790498d28a6ac4c95fcf2f7376978
2020-08-31 10:27:56 -06:00
Moby von Briesen
2d01dd9732 satellite/satellitedb: Add online_score column to nodes table
Add online score used for the new audit history offline tracking system
to the nodes table. This allows us easy access to the node's online
score for the storagenode dashboard as well as for data analysis.

Change-Id: Ie99be1192e5236862a5b3dbed2e5ef03b9169410
2020-08-31 15:07:07 +00:00
Bill Thorp
328004c0ef satellite/accounting: fix build - time rounding
We were seeing error on the last day of the month with TestProjectAllocatedBandwidthRetainTwo.
This is due to AddDate normalizes its result in the same way that Date does, so, for example,
adding one month to October 31 yields December 1, the normalized form for November 31."

I also fixed a minor UTC issue with this test as well.

Change-Id: I0157873e7befa57810e5f264a922b188890fa46a
2020-08-31 09:37:13 -04:00
nerdatwork
e072febbcc
Fixed typo in log for allocated space (#3934) 2020-08-29 16:36:37 +02:00
Egon Elbre
c86c732fc0 satellite: simplify tests
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID

Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
2020-08-28 22:28:04 +00:00
stefanbenten
4645805b18 private/dbutil: set connMaxLifetime to 30 minutes
To prevent longlived unused connections, set the maximum time to 30 minutes to
prevent proxies and loadbalancers forcefully cutting the connection.
This helps in scenarios with low load/requests to a DB.

Change-Id: I7dba15ef97f6f6541e872a6fb1d3a9bbbfe5bb50
2020-08-28 18:00:41 +00:00
Moby von Briesen
60a95d0dc9 satellite/{satellitedb,overlay}: Enable offline suspension and review period
When a node's audit history "online score" passes below a configured
threshold, the node goes into "offline suspension" mode and begins a
review period, where the operator is given an opportunity to bring their
node back online.
After the review period passes, offline suspension is turned off for the
node.

In the future, if a node still has a bad online score at the end of the
review period, it will be disqualified. This is disabled right now.
In the future, if a node is in offline suspension, it will be treated as
"unhealthy". Right now, there are no consequences for being in offline
suspension.

Minor changes:
* Moves AuditHistoryConfig out of UpdateStats/BatchUpdateStats args and
into UpdateRequest.
* Adds "now" argument to UpdateStats/BatchUpdateStats args for easy
testing.
* Changes formatting strings inside buildUpdateStatement to use specific
types.

Change-Id: I032b60298840fc16e6ef831da750f2d57619a397
2020-08-28 16:35:48 +00:00
Egon Elbre
9225fc5aef satellite/accounting: simplify ExceedsBandwidthUsage call
Change-Id: I5376da2329e44da8f060226d2a76432df0acdaa0
2020-08-28 18:10:02 +03:00
Egon Elbre
3ca405aa97 satellite/orders: use metabase types as arguments
Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef
2020-08-28 15:52:37 +03:00
Brandon Iglesias
3bfb0a5246
Adding Kesque, MSP360, Innovoedge, Taloflow, Restic partners (#3933)
* Adding Kesque, MSP360, Innovoedge, Taloflow, Restic partners

* ABC order
2020-08-27 14:29:54 -04:00
Egon Elbre
d7c6ca6013 satellite/metainfo/metabase: add package for metainfo database
Currently there is confusion between responsibilities of
metainfo.Endpoint, metainfo.Service, PointerDB.

By separating database "service" into a separate package and
its types allows to disentagle them.

This gives us responsibilities:

1. metainfo.Endpoint - translates requests and permissions
2. metainfo.Service - handles requests and coordinates with
   objectdeletion, piecedeletion, metabase
3. metabase.Service - communication with the database interface and invariants

Currently metabase will contain the types necessary to coordinate
information.

Change-Id: If8c992b4b9d9e70a56bbd8a378a5af6b1a2ec34e
2020-08-27 17:31:11 +00:00
stefanbenten
086a3d5348 satellite/{payments,admin}: add deletion of user creditcards on account deletion
Change-Id: I38bf7e3995846150268f7b88a70f75b0ac871b62
2020-08-27 10:18:19 +00:00
Bill Thorp
729079965f satellite/satellitedb : remove migation steps 69-102
Jenkins has been failing a lot lately due to test timeouts with CockroachDB.
TestMigrateCockroach previously took around 5 minutes, now it takes 2.

Why 103?  I couldn't get 100 to work due to an error w/ NOT NULL and PKs.

Change-Id: Iec95d4e25f9d6cd36920e7f43272c486a17fa879
2020-08-27 07:36:05 +00:00
Cameron Ayer
3e343b683b cmd/segment-reaper: add metrics for zombie segments count
Change-Id: I106c6795946283165ba3de8465e5898346da1a3f
2020-08-26 18:42:59 +00:00
Bill Thorp
dbb53151f0 private/testplanet: Decrease metainfo MaxBuckets test value to speed testing.
TestMaxOutBuckets is one of our slower tests (50-90s).
This change seems to make it 2-12s.

It reduces the number of buckets that need to be created.
It also removes unnecessary storage nodes.

Change-Id: I1012fc6e9258b2f7674b16da4e8b418741c93eea
2020-08-26 17:31:31 +00:00
Moby von Briesen
4f28bf0720 satellite/audit: Do not return errors from Verify or Reverify on segment modified, expired, or deleted
If a segment is deleted, is modified, or expires during an audit, this
is not problematic, so we should not return errors. Functionally,
nothing changes, but our metrics around audit success rate will be
improved after this change.

Change-Id: Ic11df056b2c73894b67a55894bd4d58c00470606
2020-08-26 13:24:00 +00:00
Qweder93
c4a4745dd8 storagenode/console: audit per satellite now uses satelliteName instead of satelliteID
Change-Id: I8221ec840f654a62aedfb62a4194616db890f539
2020-08-25 12:52:47 +00:00
Qweder93
f16cf5cccf storagenode/console & /inspector: added recalculation of disk space info
Change-Id: Id003d031a6464ec095c31290fd6a756ead644261
2020-08-25 14:19:10 +03:00
Qweder93
88ff8829a1 satellite/gracefulexit: RecvTimeout increased to 2h, so slow nodes stop receiving lot of fails and as a result DQ
Change-Id: Id4c8a394162ba368aeb573a927f825bf7250aa52
2020-08-24 18:59:24 +03:00
Yingrong Zhao
bd5213f68b satellite/metainfo: implement batch delete for DeleteBucket
This PR changes DeleteBucket to be able to delete all objects within a
bucket if `DeleteAll` is set in `BucketDeleteRequest`.
It also changes `DeleteBucket` API to treat `ErrBucketNotFound` as a
successful delete operation instead of returning an error back to the
client.

Change-Id: I3a22c16224c7894f2d0c2a40ba1ae8717fa1005f
2020-08-24 13:28:09 +00:00
Egon Elbre
f0ef01de5b storagenode/gracefulexit: retry workers faster
Change-Id: Ica20a691ff117a2b36a6362ee1fed21ce49a9ac1
2020-08-24 12:27:27 +03:00
Egon Elbre
e6bea41083 Revert "gracefulexit: reconnect added"
This reverts commit cff44fbd19.

Change-Id: I6590f483493e308b8244151e1df7570fd32ca2f8
2020-08-23 18:11:24 +03:00
Qweder93
cff44fbd19 gracefulexit: reconnect added
Change-Id: I236689af944effe3e79ef92e852ae264d3b372e5
2020-08-22 14:59:46 +03:00
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
VitaliiShpital
5729d087b0 web/satellite: dashboard template simplified, project selection moved to nav bar
WHAT:
project selection moved to navigation panel

WHY:
preparing for multiple project state

Change-Id: I434c73c25b3fec85fc7226a8400cf280b379b537
2020-08-21 17:33:14 +03:00
VitaliiShpital
e5012fcb3d web/satellite: info bars for accounts with no paywall
WHAT:
info bars for accounts with no paywall implemented, USR-976

WHY:
we should notify users with no paywall that available coupon value is running low or coupon is used

Change-Id: I1a84afce890515b3aaedf1f0b8d359499af05471
2020-08-21 09:39:01 +00:00
littleskunk
db57d76ee9
storagenode/gracefulexit: fix wrong error handling for corrupted pieces (#3930) 2020-08-21 11:35:03 +02:00
Moby von Briesen
959cd5cd83 satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats
Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7
2020-08-20 22:46:28 +00:00
Moby von Briesen
5f0477ebe9 satellite/{overlay,satellitedb}: Create database functionality for updating audit history
Add a function to the overlay cache called UpdateAuditHistory, which
allows us to add online or offline audits to a particular node's audit
history, and get that node's "online score" for the configured tracking
period.

The next step will be to use UpdateAuditHistory from inside
BatchUpdateStats/UpdateStats, so that audit history is actually updated
when nodes get audited, and we can suspend nodes based on their online
score.

Change-Id: I2289105e6961e68e829a987ff756b0e576fab120
2020-08-20 17:34:27 +00:00
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs it will write the file even if it already exists.
The reason we do this is because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Cameron Ayer
586e6f2f13 private/testblobs, storage, storage/filestore: add storage dir verification to filestore
Sometimes SNOs fail to properly configure or lose connection to their storage directory
which can result in DQ. This causes unnecessary repair and is unfortunate for all parties.

This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.

This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.

Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
2020-08-19 17:18:14 +00:00
Yingrong Zhao
14ad7a4f1c satellite/metainfo: add limiter for objectdeletion and piecedeletion
services

This PR adds a limiter on the amount of concurrent objects deletion can be handled so
we don't run out of memory.

Change-Id: Id2ce368af6f86845fcdfd34cb2f5e460efe9b272
2020-08-19 16:08:29 +00:00