610 Commits

Author SHA1 Message Date
Egon Elbre
7183dca6cb all: fix defers in loop
defer should not be called in a loop.

Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59
2020-11-02 15:06:38 +02:00
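The pitfall behind commit 7183dca6cb: a defer inside a loop body does not run until the enclosing function returns, so resources pile up across iterations. A minimal sketch of one common fix, wrapping the loop body in a function call, follows. The resource type and the open and processAll helpers are hypothetical and are not taken from the commit.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a file, connection, or row set.
type resource struct {
	name   string
	events *[]string
}

// Close records that the resource was released.
func (r *resource) Close() { *r.events = append(*r.events, "close "+r.name) }

// open records that the resource was acquired.
func open(name string, events *[]string) *resource {
	*events = append(*events, "open "+name)
	return &resource{name: name, events: events}
}

// processAll handles each name inside its own function call, so the deferred
// Close runs at the end of every iteration instead of accumulating until
// processAll returns.
func processAll(names []string) []string {
	var events []string
	for _, name := range names {
		func() {
			r := open(name, &events)
			defer r.Close()
			events = append(events, "use "+r.name)
		}()
	}
	return events
}

func main() {
	for _, event := range processAll([]string{"a", "b"}) {
		fmt.Println(event)
	}
}
```

Each resource is closed before the next one is opened, which is the property the loop needs.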
Egon Elbre
fd8e697ab2 {satellite,storagenode}/internalpb: use specific package name
Ensure we don't register types with the same name into protobuf.

Change-Id: I53d025863fff8c91a067ca5819befa87eb5e35bb
2020-10-30 17:31:08 +02:00
Egon Elbre
1903b15474 storagenode/internalpb: move gracefulexit.proto
Change-Id: Ia3614846ed49a39c8f39331516d16d45a695240b
2020-10-30 15:24:56 +02:00
Egon Elbre
cda67a659a storagenode/internalpb: move inspector.proto
Change-Id: I951379c3b2ff00d1bc09d6a49c026a7e723432d6
2020-10-30 14:51:26 +02:00
Qweder93
f5ba8b8009 storagenode/suspensions: added offline-suspension notification chore + tests
Change-Id: I2521cd2e7d08a1dd379e717a554a026c7508c18f
2020-10-29 19:44:22 +02:00
Egon Elbre
e0dca4042d all: add pprof labels for debugger
By using pprof.Labels debugger is able to show service/peer names in
goroutine names.

Change-Id: I5f55253470f7cc7e556f8e8b87f746394e41675f
2020-10-29 15:10:07 +00:00
Qweder93
624255e8ba storagenode/secret: db tests added, small renaming fixes added
Change-Id: I7eae1a9a64c20a39c97e81fa741cfc9b9e1e615a
2020-10-29 14:23:04 +02:00
Egon Elbre
caefde6b32 private/{dbutil,tagsql}: pass ctx to database opening
Opening a database usually dials a connection, so we should pass ctx to the opening functions.

Change-Id: Iaa2875981570d83e65be3710f841cf30349f807b
2020-10-29 10:51:29 +00:00
Egon Elbre
89ce1fe626 storagenode/storagenodedb: add ctx to OpenNew and OpenExisting
Opening a database usually dials a connection, so we should pass ctx to the opening functions.

Change-Id: I9160ae95829f22f347bd525904898a47279a7427
2020-10-29 09:52:37 +02:00
Egon Elbre
76f4619a9c {satellite,storagenode}/gracefulexit: ensure client is closed
Change-Id: I576a955a5578caf7fcbee832beca28cef2b0c83e
2020-10-27 23:27:07 +02:00
Moby von Briesen
2fbb4095b2 storagenode/orders/ordersfile: Handle remaining pb.Unmarshal errors
Missed one case of Unmarshal in the previous commit for V0 files (0f4e4969b7)
In V1, unmarshalling was being attempted before the checksum was
verified, so this commit moves those calls to the end of the V1 ReadOne
function.

Change-Id: Ic0b49f0bbc91fb61fb28af6003060994d0af22ed
2020-10-26 20:27:05 +00:00
Moby von Briesen
53ba01b1f1 storagenode/orders/ordersfile/v0.go: Return ErrEntryCorrupt on pb.Unmarshal failure
In V0 orders files, unexpected EOF is correctly treated as a file
corruption, but pb.Unmarshal can also fail, and this is not treated as a
file corruption. This commit fixes that.

Change-Id: I6b446a10f4b1a5a44e832cbcc9bf8b2548cfcfeb
2020-10-26 17:38:22 +00:00
Jessica Grebenschikov
f5880f6833 satellite/orders: rollout phase3 of SettlementWithWindow endpoint
Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0
2020-10-26 14:56:28 +00:00
paul cannon
76d4977b6a storagenode/gracefulexit: logic moved from worker to service
Change-Id: I8b12606a96b712050bf40d587664fb1b2c578fbc
2020-10-22 23:19:30 +00:00
Jessica Grebenschikov
89bdb20a62 storagenodedb/orders: select unsent satellite with expiration
In production we are seeing that ~115 storage nodes (out of ~6,500) are not using the new SettlementWithWindow endpoint, even though they are upgraded to > v1.12.

We analyzed data being reported by monkit for the nodes who were above version 1.11 but were not successfully submitting orders to the new endpoint.
The nodes fell into a few categories:
1. Always fail to list orders from the db; never get to try sending orders from the filestore
2. Successfully list/send orders from the db; never get to calling satellite endpoint for submitting filestore orders
3. Successfully list/send orders from the db; successfully list filestore orders, but satellite endpoint fails (with "unauthenticated" drpc error)

The code change here adds the following to address these issues:
- modify the query for ordersDB.listUnsentBySatellite so that we no longer select expired orders from the unsent_orders table
- always process any orders that are in the ordersDB and also any orders stored in the filestore
- add monkit monitoring to filestore.ListUnsentBySatellite so that we can see the failures/successes

Change-Id: I0b473e5d75252e7ab5fa6b5c204ed260ab5094ec
2020-10-21 15:02:23 +00:00
littleskunk
77d54ff0ac storagenode/bandwidthdb: Use existing indexes (#3949)
* storagenode/bandwidthdb: Use existing indexes
2020-10-20 22:48:40 +02:00
Qweder93
9df74338a8 storagenode: secret db and service added
Change-Id: I91257e5adc4fc6711653f30c118e476ed1c95b6b
2020-10-16 13:24:33 +00:00
NickolaiYurchenko
7c275830a1 web/storagenode: gross total added to historical data, with surge moved
WHAT:
changed estimation table row order.

WHY:
to show gross total for selected period to avoid misunderstanding
when held amount is bigger than paid multiple times.

Change-Id: I03881c8af682372139a378030acf04f199d3260b
2020-10-16 13:26:28 +03:00
Yaroslav Vorobiov
139a7ee959 private/migrate: add ability to create dbs during migration
Use tagsql.DB pointer as step database, to propagate changes
back and forth between actual database and migration.
Adds CreateDB operation to the migration step to be able to
create new dbs before executing migration action.
Adjusts storagenode database migration to use inner tagsql.DB
pointer of each database as step.DB.
Adjusts satellite database migration, adds proxy migrationDB field
to satellite db that wraps itself as tagsql.DB, pointer of which
is used as step.DB.

Change-Id: Ifed4de5b01a356cf7b37db64d2eaeb7b61982c5c
2020-10-15 15:28:04 +03:00
Moby von Briesen
aa86c0889c storagenode/console: Add current storage used per satellite to storagenode api
Right now, the best way for a storage node operator to get the current
space used for each satellite is to run the `storagenode exit-satellite`
command for graceful exit, and cancel at the second confirmation prompt.
This is convoluted and the data is readily available from the Blobs
Usage Cache.

This change adds the current space used by each satellite to the
endpoints `/api/sno` and `/api/sno/satellite/<Satellite ID>`

Change-Id: I2173005bb016fc76db96fd598d26b485e5b2aa0b
2020-10-14 21:30:28 +00:00
Moby von Briesen
02cbf1e72a storagenode/orders: Add V1 orders file
V1 allows the storagenode to continue reading orders from an
unsent/archived orders file, even if orders in the middle are corrupted.

Change-Id: Iea4117d55c05ceeb77f47d5c973e5ba95da46c66
2020-10-14 15:04:33 +00:00
Egon Elbre
cf2dd76db7 cmd/satellite: proper log usage
log.Fatal immediately terminates the program without running any defers.
We should properly close all the services and databases.

Change-Id: I5e959cef3eafedeacb3a2062e3da47e8d04e8e75
2020-10-13 16:56:35 +03:00
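The issue named in commit cf2dd76db7 is that log.Fatal calls os.Exit, which skips deferred cleanup. A common pattern, sketched here under assumptions (the run function, its closed and fail parameters, and the error text are hypothetical), keeps all logic in a function that returns an error and exits only from main after that function has returned.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run holds the program logic. Its defers execute before it returns, so
// services and databases get closed even on failure.
func run(closed *bool, fail bool) error {
	defer func() { *closed = true }() // stand-in for closing services and databases
	if fail {
		return errors.New("startup failed")
	}
	return nil
}

func main() {
	var closed bool
	if err := run(&closed, false); err != nil {
		// Exiting here is safe: run has already returned and cleaned up.
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("cleanup ran:", closed)
}
```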
Egon Elbre
2268cc1df3 all: fix linter complaints
Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95
2020-10-13 15:59:01 +03:00
Egon Elbre
0bdb952269 all: use keyed special comment
Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b
2020-10-13 15:13:41 +03:00
Stefan Benten
c1ca470e7e storagenode/orders: fix import and cleanup go.mod and go.sum
Accidentally we imported the wrong monkit package with a previous
commit and made our go.mod and go.sum file unclean.
This should fix it.

Change-Id: I4c3c8b696f59cfd06dc2d5436bb7aea2805936ce
2020-10-09 00:04:57 +02:00
Moby von Briesen
3209effeb6 storagenode/orders: Increase order sending interval from 5m to 1h
Since storage nodes check to see if any order files can be sent every 5
minutes, every storage node attempts to send orders to the satellite
within 5 minutes of each hour since this is when the files become
"available" to send. It is placing a lot of load on our satellite and
storage nodes are not being paid out properly due to timeouts during
order sending due to the increased satellite load.

Change-Id: I44d991b5884b8c11e8a3856d39aee8323f086b51
2020-10-08 12:51:21 -04:00
Moby von Briesen
fbf2c0b242 storagenode/orders: Refactor orders store
Abstract details of writing and reading data to/from orders files so
that adding V1 and future maintenance are easier.

Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9
2020-10-06 15:28:07 -04:00
Qweder93
664b8f6821 storagenode/payout: estimation payout values switched from int64 to float64 to avoid incorrect rounding.
float64 values are rounded to two decimal places.

Change-Id: Ice49f6a0944231ea6adb3343545bf1a62ff6dbc1
2020-10-02 11:33:43 +00:00
Qweder93
245986d528 negative space calculations fix removed
Change-Id: I342c61856fce6d02dc99fd27fd3d563540f22b64
2020-09-30 14:08:24 +00:00
Yaroslav Vorobiov
a840cb71e7 storagenode: check db version before run
Change-Id: I912f63fd62f2bff10341346c28dfb92fcd683806
2020-09-30 10:58:09 +00:00
Michal Niewrzal
cd2a5484f3 storagenode/console: ignore untrusted satellite while returning
dashboard data and calculating satellites data

Change-Id: I71d596891477e0839863e007689b6e2e6e420a22
2020-09-29 18:27:49 +00:00
Yaroslav Vorobiov
8786e55a78 storagenode/storagenodedb: allow existing dbs on setup
Allow existing storagenode dbs on setup to be able to reinstall
the node with existing data.

Change-Id: Ib42ab585432e61dfecc10640b6cd755ce83f0c46
2020-09-28 16:31:48 +03:00
nerdatwork
870abd8676 storagenode/pieces: tidying trash log
2020-09-24 11:55:06 +03:00
Moby von Briesen
8287e3a32d storagenode/orders/store.go: combine writeLimit/writeOrder operations
Combine store.writeLimit and store.writeOrder into
store.writeLimitAndOrder, which only requires a single call to
file.Write(). This simplifies the code and avoids multiple calls to
Write(), which increase the likelihood of file corruption.

Also combine the corresponding readLimit/readOrder functions for
consistency.

Change-Id: I62ed406fa2c02708465a678d18293f510f666440
2020-09-22 17:53:12 +00:00
nerdatwork
54dd430048 storagenode/pieces: fix typo for satellite id and piece id
2020-09-22 08:19:12 +03:00
nerdatwork
96ec44ff1b storagenode/pieces: make log more legible
2020-09-18 15:10:13 +03:00
Qweder93
8182fdad0b storagenode: heldamount renamed to payouts, renamed some methods and structs to more meaningful names. grouped estimated payout with payouts
satellite: heldamount renamed to SNOpayouts.

Change-Id: I244b4d2454e0621f4b8e22d3c0d3e602c0bbcb02
2020-09-16 14:57:35 +00:00
Moby von Briesen
7db5794c16 storagenode/orders/store: Do not lock order enqueues for entire duration of ListUnsentBySatellite
We only need to acquire mutexes inside ListUnsentBySatellite when we
want to determine whether a file has an active enqueue in progress.
On some nodes, ListUnsentBySatellite can take a particularly long time, having
undesired side-effects, so if we can minimize locking time, those nodes
will be better off.

Also, lock archive mu during ListUnsentBySatellite so files cannot be
archived and listed at the same time.

Change-Id: Ieb7e2a759c20c724a74dd8315728c873ccab14a3
2020-09-15 15:15:30 +00:00
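The narrower locking described in commit 7db5794c16 can be sketched like this: the mutex is held only while checking whether a file has an enqueue in progress, and the slow per-file read happens without it. The store type, its fields, and the read callback are hypothetical simplifications, and the archive mutex mentioned in the commit is not modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical, simplified orders store.
type store struct {
	mu     sync.Mutex
	active map[string]int // file name -> enqueues in progress
}

// hasActiveEnqueue holds the mutex only for the map lookup.
func (s *store) hasActiveEnqueue(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.active[name] > 0
}

// listUnsent skips files that have an enqueue in progress. The potentially
// slow read of each remaining file runs without holding the mutex, so new
// enqueues are not blocked for the whole listing.
func (s *store) listUnsent(files []string, read func(string) string) []string {
	var out []string
	for _, name := range files {
		if s.hasActiveEnqueue(name) {
			continue
		}
		out = append(out, read(name))
	}
	return out
}

func main() {
	s := &store{active: map[string]int{"b": 1}}
	got := s.listUnsent([]string{"a", "b", "c"}, func(name string) string {
		return "orders from " + name
	})
	fmt.Println(got)
}
```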
Qweder93
528aa76ae6 storagenode/payouts: payoutHistoryMonthly surge reworked, empty receipt now won't return error
Change-Id: If99f8aec102550cd30e5906f986a4417903100be
2020-09-14 18:19:17 +03:00
Moby von Briesen
789b07e226 storagenode/orders/store.go: Do not return error from ListUnsentBySatellite when order files are corrupted.
If we see an UnexpectedEOF error when attempting to read orders, return
the orders we have been able to read successfully and do not return an
error. This behavior ensures that the storagenode orders service
attempts to archive corrupted files and does not retry them repeatedly
and get stuck.

Change-Id: I0d00d1e174f968af6e99ca861eddad190f1339e2
2020-09-10 23:36:05 +00:00
Qweder93
ac29d80495 storagenode: heldamount GetPaystub refactored, estimationPayouts logic separated from console to separate service, storagenodeapi tests fixed.
Change-Id: I902823ef40a62861ce32799e9fb7a67a1e14710d
2020-09-09 15:31:16 +00:00
Stefan Benten
179b5adad4 storagenode/orders: add missing mon.Task parameter
Change-Id: If98cf347a81f29698a6bdb0907520d60f71db433
2020-09-06 00:05:53 +00:00
Jennifer Johnson
4e2413a99d satellite/satellitedb: uses vetted_at field to select for reputable nodes
Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1.
This is because many tests relied on selecting nodes that were reputable based on audit and uptime
counts of 0, in effect, selecting new nodes as reputable ones.
However, since reputation is now indicated by a vetted_at db field that is explicitly set
rather than implied by audit and uptime counts, it would be more complicated to try to
update all of the nodes' reputations before selecting nodes for tests.
Now we just allow all test nodes to be new if needed.

Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d
2020-09-04 16:45:32 +00:00
Michal Niewrzal
aa47e70f03 satellite/metainfo: use metabase.SegmentKey with metainfo.Service
Instead of using string or []byte we will be using dedicated type
SegmentKey.

Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220
2020-09-03 15:11:32 +00:00
Qweder93
36d752e92d storagenode/reputation: offline_under_review_at added
Change-Id: Ia7ec79b2d6f20fe29de0c36223f9485380d2845c
2020-09-02 18:48:28 +03:00
Qweder93
7d9897b7af storagenode/nodestats: online_score added
Change-Id: I84b50a6cace306e5f10d53a2073fe8810d4d2960
2020-09-02 17:45:01 +03:00
JT Olio
1f711523d5 satellite/repair: switch to piecestore.UploadReader part 2
Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3
2020-09-01 12:40:54 -06:00
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
nerdatwork
e072febbcc Fixed typo in log for allocated space (#3934)
2020-08-29 16:36:37 +02:00