Commit Graph

823 Commits

Author SHA1 Message Date
Jessica Grebenschikov
b261110352 satellite/orders: get bucketID from encrypted metadata in order instead of serial_numbers table
We want to stop using the serial_numbers table in satelliteDB. One of the last places using the serial_numbers table is when storagenodes settle orders, we look up the bucket name and project ID from the serial number from the serial_numbers table.

Now that we have support to add encrypted metadata into the OrderLimit, this PR makes use of that and now attempts to read the project ID and bucket name from the encrypted orderLimit metadata instead of from the serial_numbers table. For backwards compatibility and to ensure no errors, we will still fallback to the old way of getting that info from the serial_numbers table, but this will be removed in the next release as long as there are no errors.

All processes that create orderLimits must have an orders.encryption-keys set. The services that create orderLimits (and thus need to encrypt the order metadata) are the satellite apiProcess, the repair process, audit service (core process), and graceful exit (core process). Only the satellite api process decrypts the order metadata when storagenodes settle orders. This means that the same encryption key needs to be provided in the config for the satellite api process, repair process, and the core process like so:
orders.include-encrypted-metadata=true
orders.encryption-keys="<"encryptionKeyID>=<encryptionKey>"

Change-Id: Ie2c037971713d6fbf69d697bfad7f8b672eedd66
2020-12-01 15:29:32 +00:00
Egon Elbre
aeb801604e {satellite,storagenode}/orders: fix flaky tests
Before manipulating order information on storagenodes we need to wait
for the orders to propagate to the database. Some of that happens
async with uplink.

Change-Id: Iaacfd7db0909ab5d2831d06388e5fb27b6d4778f
2020-11-18 13:44:02 +00:00
Moby von Briesen
41d86c0985 storagenode/orders/ordersfile: Add reasonable size caps for orders/limits when detecting file corruption.
Define constants of 32 KiB as the upper limit of the marshalled order
and limit protobuf sizes. This value gives lots of buffer in case the
protobufs ever change, but is not as extreme as what we were doing
before in V0 files, which was to use the Uint32 max value.

Change-Id: I0914d17dde3b044b2611af33f931d46d55f81e98
2020-11-18 12:33:26 +00:00
Qweder93
a17cd9aa3e storageode/apikey: added service, CLI issue api key
Change-Id: I840cd0fdbd8dca884eefbd111f21fd3990c11e68
2020-11-18 10:40:17 +00:00
Ivan Fraixedes
fa95c6bbb9
storagenode/orders/ordersfile: Fix error message wrong var
Fix the error message reported by a wrong order size due to passing the
wrong variable to the interpolation pattern.

Change-Id: Ic0059615c60cfa33a26d4aeb0ebda5e586f0df05
2020-11-17 15:22:27 +01:00
Ivan Fraixedes
9740da6508 storagenode/orders: Don't panic if size is over MaxInt32
`make` built function to build a new slice with a negative
length panics.
`make` length parameter is of `int` type.

These changes avoid that `make` panics on 32 bits architecture due to
the fact that `int` type is a `int32` an uint32 value can be over the
maximum `int32`, and when that happens the length parameter value
becomes negative and makes `make` to panic.

Change-Id: Ife9ab5993916d6dcf5584b37c208272269cb2b45
2020-11-17 10:35:21 +00:00
Qweder93
c409194d43 storagenode/payouts: estimation payout heldamount rounding removed
Change-Id: I9fdc7cda15de0df8875436b0b376f0e6479d3aeb
2020-11-17 10:06:11 +00:00
Cameron Ayer
48d8114b3f satellite/contact: treat pingback failure as error
If the satellite fails to pingback the storage node during CheckIn
an error message is returned to the node in the response, but the actual
error value returned is nil. We are only checking the error. This means
the node has no feedback about the failure, and the node also does not
attempt to retry the connection.

Change-Id: Iaed00e422ba91af573e72255cc6671ea97928eae
2020-11-16 18:26:37 +00:00
Moby von Briesen
db480e6e1b storagenode/orders: Improve performance of handling corrupt orders.
This change fixes two things which can make reading from a corrupted
orders file inefficient.
* When a corrupted order is detected, but the underlying error is an
UnexpectedEOF (as opposed to a pb.Unmarshal error, for instance), there
is no point in attempting to read from the file another time to find an
additional uncorrupted order - we will continue to get UnexpectedEOF
errors until we seek to the very end of the file and get a normal EOF.
Instead, when UnexpectedEOF occurs, log and send metrics as with other
types of corruption, but do not attempt to read again.
* When a corrupted order is detected, instead of seeking forward only
one byte for the next attempt, seek forward by the size of entryHeader.
This cuts down on the number of iterations needed to find an uncorrupted
order after detecting a corrupted one.

Change-Id: Ie1a613127e29d29318584ec7f60e8f7554f73487
2020-11-16 14:08:36 +00:00
Cameron Ayer
5a337c48ec {cmd,private,storagenode}: create storage dir verification during setup
Previously, we created a new file to use for directory verification
every time the storage node starts. This is not helpful if the storage node
points to the wrong directory when restarting. Now we will only create the file
on setup. Now the file should be created only once and will be verified at
runtime.

Change-Id: Id529f681469138d368e5ea3c63159befe62b1a5b
2020-11-11 11:01:36 -05:00
Egon Elbre
b892a00143 mod: bump dependencies and reenable test
We shouldn't have any EOF issues with recent drpc fix, let's reenable
and see whether it's still flaky.

Change-Id: I0de312bcb087c7f70ec9d3281d73d86f971845d5
2020-11-10 10:32:21 +00:00
Moby von Briesen
db6bc6503d satellite/metainfo: Update metainfo RS config to more easily support multiple RS schemes.
Make metainfo.RSConfig a valid pflag config value. This allows us to
configure the RSConfig as a string like k/m/o/n-shareSize, which makes
having multiple supported RS schemes easier in the future.

RS-related config values that are no longer needed have been removed
(MinTotalThreshold, MaxTotalThreshold, MaxBufferMem, Verify).

Change-Id: I0178ae467dcf4375c504e7202f31443d627c15e1
2020-11-09 22:16:13 +00:00
Qweder93
8dc10e32ad stefan benten satellited added to historical payout data
Change-Id: I1177b2d2ef10d514f7d401e29891fa7dd964e9ac
2020-11-09 15:43:41 +00:00
Egon Elbre
7183dca6cb all: fix defers in loop
defer should not be called in a loop.

Change-Id: Ifa5a25a56402814b974bcdfb0c2fce56df8e7e59
2020-11-02 15:06:38 +02:00
Egon Elbre
fd8e697ab2 {satellite,storagenode}/internalpb: use specific package name
Ensure we don't register types with the same name into protobuf.

Change-Id: I53d025863fff8c91a067ca5819befa87eb5e35bb
2020-10-30 17:31:08 +02:00
Egon Elbre
1903b15474 storagenode/internalpb: move gracefulexit.proto
Change-Id: Ia3614846ed49a39c8f39331516d16d45a695240b
2020-10-30 15:24:56 +02:00
Egon Elbre
cda67a659a storagenode/internalpb: move inspector.proto
Change-Id: I951379c3b2ff00d1bc09d6a49c026a7e723432d6
2020-10-30 14:51:26 +02:00
Qweder93
f5ba8b8009 storagenode/suspensions: added offline-suspension notificatio chore + tests
Change-Id: I2521cd2e7d08a1dd379e717a554a026c7508c18f
2020-10-29 19:44:22 +02:00
Egon Elbre
e0dca4042d all: add pprof labels for debugger
By using pprof.Labels debugger is able to show service/peer names in
goroutine names.

Change-Id: I5f55253470f7cc7e556f8e8b87f746394e41675f
2020-10-29 15:10:07 +00:00
Qweder93
624255e8ba storagenode/secret: db tests added, small renaming fixes added
Change-Id: I7eae1a9a64c20a39c97e81fa741cfc9b9e1e615a
2020-10-29 14:23:04 +02:00
Egon Elbre
caefde6b32 private/{dbutil,tagsql}: pass ctx to database opening
Database opening usually dial and hence we should pass ctx to them.

Change-Id: Iaa2875981570d83e65be3710f841cf30349f807b
2020-10-29 10:51:29 +00:00
Egon Elbre
89ce1fe626 storagenode/storagenodedb: add ctx to OpenNew and OpenExisting
Database opening usually dial and hence we should pass ctx to them.

Change-Id: I9160ae95829f22f347bd525904898a47279a7427
2020-10-29 09:52:37 +02:00
Egon Elbre
76f4619a9c {satellite,storagenode}/gracefulexit: ensure client is closed
Change-Id: I576a955a5578caf7fcbee832beca28cef2b0c83e
2020-10-27 23:27:07 +02:00
Moby von Briesen
2fbb4095b2 storagenode/orders/ordersfile: Handle remaining pb.Unmarshal errors
Missed one case of Unmarshal in the previous commit for V0 files (0f4e4969b7)
In V1, unmarshalling was being attempted before the checksum was
verified, so this commit moves those calls to the end of the V1 ReadOne
function.

Change-Id: Ic0b49f0bbc91fb61fb28af6003060994d0af22ed
2020-10-26 20:27:05 +00:00
Moby von Briesen
53ba01b1f1 storagenode/orders/ordersfile/v0.go: Return ErrEntryCorrupt on pb.Unmarshal failure
In V0 orders files, unexpected EOF is correctly treated as a file
corruption, but pb.Unmarshal can also fail, and this is not treated as a
file corruption. This commit fixes that.

Change-Id: I6b446a10f4b1a5a44e832cbcc9bf8b2548cfcfeb
2020-10-26 17:38:22 +00:00
Jessica Grebenschikov
f5880f6833 satellite/orders: rollout phase3 of SettlementWithWindow endpoint
Change-Id: Id19fae4f444c83157ce58c933a18be1898430ad0
2020-10-26 14:56:28 +00:00
paul cannon
76d4977b6a storagenode/gracefulexit: logic moved from worker to service
Change-Id: I8b12606a96b712050bf40d587664fb1b2c578fbc
2020-10-22 23:19:30 +00:00
Jessica Grebenschikov
89bdb20a62 storagenodedb/orders: select unsent satellite with expiration
In production we are seeing ~115 storage nodes (out of ~6,500) are not using the new SettlementWithWindow endpoint (but they are upgraded to > v1.12).

We analyzed data being reported by monkit for the nodes who were above version 1.11 but were not successfully submitting orders to the new endpoint.
The nodes fell into a few categories:
1. Always fail to list orders from the db; never get to try sending orders from the filestore
2. Successfully list/send orders from the db; never get to calling satellite endpoint for submitting filestore orders
3. Successfully list/send orders from the db; successfully list filestore orders, but satellite endpoint fails (with "unauthenticated" drpc error)

The code change here add the following to address these issues:
- modify the query for ordersDB.listUnsentBySatellite so that we no longer select expired orders from the unsent_orders table
- always process any orders that are in the ordersDB and also any orders stored in the filestore
- add monkit monitoring to filestore.ListUnsentBySatellite so that we can see the failures/successes

Change-Id: I0b473e5d75252e7ab5fa6b5c204ed260ab5094ec
2020-10-21 15:02:23 +00:00
littleskunk
77d54ff0ac
storagenode/bandwidthdb: Use existing indexes (#3949)
* storagenode/bandwidthdb: Use existing indexes
2020-10-20 22:48:40 +02:00
Qweder93
9df74338a8 storagenode: secret db and service added
Change-Id: I91257e5adc4fc6711653f30c118e476ed1c95b6b
2020-10-16 13:24:33 +00:00
NickolaiYurchenko
7c275830a1 web/storagenode: gross total added to historical data, with surge moved
WHAT:
changed estimation table row order.

WHY:
to show gross total for selected period to avoid misunderstanding
when held amount is bigger than paid multiple times.

Change-Id: I03881c8af682372139a378030acf04f199d3260b
2020-10-16 13:26:28 +03:00
Yaroslav Vorobiov
139a7ee959 private/migrate: add ablity to create dbs during migration
Use tagsql.DB pointer as step database, to propagate changes
back and forth between actual database and migration.
Adds CreateDB operation to the migration step to be able to
create new dbs before executing migration action.
Adjusts storagenode database migration to use inner tagsql.DB
pointer of each database as step.DB.
Adjusts satellite dabase migration, adds proxy migrationDB field
to satellite db that wraps itself as tagsql.DB, pointer of which
is used as step.DB.

Change-Id: Ifed4de5b01a356cf7b37db64d2eaeb7b61982c5c
2020-10-15 15:28:04 +03:00
Moby von Briesen
aa86c0889c storagenode/console: Add current storage used per satellite to storagenode api
Right now, the best way for a storage node operator to get the current
space used for each satellite is to run the `storagenode exit-satellite`
command for graceful exit, and cancel at the second confirmation prompt.
This is convoluted and the data is readily available from the Blobs
Usage Cache.

This change adds the current space used by each satellite to the
endpoints `/api/sno` and `/api/sno/satellite/<Satellite ID>`

Change-Id: I2173005bb016fc76db96fd598d26b485e5b2aa0b
2020-10-14 21:30:28 +00:00
Moby von Briesen
02cbf1e72a storagenode/orders: Add V1 orders file
V1 allows the storagenode to continue reading orders from an
unsent/archived orders file, even if orders in the middle are corrupted.

Change-Id: Iea4117d55c05ceeb77f47d5c973e5ba95da46c66
2020-10-14 15:04:33 +00:00
Egon Elbre
cf2dd76db7 cmd/satellite: proper log usage
log.Fatal immediately terminates the program without running any defers.
We should properly close all the services and databases.

Change-Id: I5e959cef3eafedeacb3a2062e3da47e8d04e8e75
2020-10-13 16:56:35 +03:00
Egon Elbre
2268cc1df3 all: fix linter complaints
Change-Id: Ia01404dbb6bdd19a146fa10ff7302e08f87a8c95
2020-10-13 15:59:01 +03:00
Egon Elbre
0bdb952269 all: use keyed special comment
Change-Id: I57f6af053382c638026b64c5ff77b169bd3c6c8b
2020-10-13 15:13:41 +03:00
Stefan Benten
c1ca470e7e storagenode/orders: fix import and cleanup go.mod and go.sum
Accidentally we imported the wrong monkit package with a previous
commit and made our go.mod and go.sum file unclean.
This should fix it.

Change-Id: I4c3c8b696f59cfd06dc2d5436bb7aea2805936ce
2020-10-09 00:04:57 +02:00
Moby von Briesen
3209effeb6 storagenode/orders: Increase order sending interval from 5m to 1h
Since storage nodes check to see if any order files can be sent every 5
minutes, every storage node attempts to send orders to the satellite
within 5 minutes of each hour since this is when the files become
"available" to send. It is placing a lot of load on our satellite and
storage nodes are not being paid out properly due to timeouts during
order sending due to the increased satellite load.

Change-Id: I44d991b5884b8c11e8a3856d39aee8323f086b51
2020-10-08 12:51:21 -04:00
Moby von Briesen
fbf2c0b242 storagenode/orders: Refactor orders store
Abstract details of writing and reading data to/from orders files so
that adding V1 and future maintenance are easier.

Change-Id: I85f4a91761293de1a782e197bc9e09db228933c9
2020-10-06 15:28:07 -04:00
Qweder93
664b8f6821 storagenode/payout: estimation payout values switched from int64 to float64 to avoid incorrect rounding.
float64 values rounding to 2nd sign after dot.

Change-Id: Ice49f6a0944231ea6adb3343545bf1a62ff6dbc1
2020-10-02 11:33:43 +00:00
Qweder93
245986d528 negative space calculations fix removed
Change-Id: I342c61856fce6d02dc99fd27fd3d563540f22b64
2020-09-30 14:08:24 +00:00
Yaroslav Vorobiov
a840cb71e7 storagenode: check db version before run
Change-Id: I912f63fd62f2bff10341346c28dfb92fcd683806
2020-09-30 10:58:09 +00:00
Michal Niewrzal
cd2a5484f3 storagenode/console: ignore untrusted satellite while returning
dashboard data and calculating satellites data

Change-Id: I71d596891477e0839863e007689b6e2e6e420a22
2020-09-29 18:27:49 +00:00
Yaroslav Vorobiov
8786e55a78 storagenode/storagenodedb: allow existing dbs on setup
Allow existing storagenode dbs on setup to be able to reinstall
the node with existing data.

Change-Id: Ib42ab585432e61dfecc10640b6cd755ce83f0c46
2020-09-28 16:31:48 +03:00
nerdatwork
870abd8676
storagenode/pieces: tidying trash log 2020-09-24 11:55:06 +03:00
Moby von Briesen
8287e3a32d storagenode/orders/store.go: combine writeLimit/writeOrder operations
Combine store.writeLimit and store.writeOrder into
store.writeLimitAndOrder, which only requires a single call to
file.Write(). This simplifies code, but it also reduces the likelihood
of multiple calls to Write() increasing the likelihood of file
corruption.

Also combine the corresponding readLimit/readOrder functions for
consistency.

Change-Id: I62ed406fa2c02708465a678d18293f510f666440
2020-09-22 17:53:12 +00:00
nerdatwork
54dd430048
storagenode/pieces: fix typo for satellite id and piece id 2020-09-22 08:19:12 +03:00
nerdatwork
96ec44ff1b
storagenode/pieces: make log more legible 2020-09-18 15:10:13 +03:00
Qweder93
8182fdad0b storagenode: heldamount renamed to payouts, renamed some methods and structs to more meaningful names. grouped estimated payout with pathouts
satellite: heldamount renamed to SNOpayouts.

Change-Id: I244b4d2454e0621f4b8e22d3c0d3e602c0bbcb02
2020-09-16 14:57:35 +00:00
Moby von Briesen
7db5794c16 storagenode/orders/store: Do not lock order enqueues for entire duration of ListUnsentBySatellite
We only need to lock aquire mutexes inside ListUnsentBySatellite when we
want to determine whether a file has an active enqueue in progress.
On some nodes, ListUnsentBySatellite can take a particularly long time, having
undesired side-effects, so if we can minimize locking time, those nodes
will be better off.

Also, lock archive mu during ListUnsentBySatellite so files cannot be
archived and listed at the same time.

Change-Id: Ieb7e2a759c20c724a74dd8315728c873ccab14a3
2020-09-15 15:15:30 +00:00
Qweder93
528aa76ae6 storagenode/payouts: payoutHistoryMonthly surge reworked, empty receipt now won't return error
Change-Id: If99f8aec102550cd30e5906f986a4417903100be
2020-09-14 18:19:17 +03:00
Moby von Briesen
789b07e226 storagenode/orders/store.go: Do not return error from ListUnsentBySatellite when order files are corrupted.
If we see an UnexpectedEOF error when attempting to read orders, return
the orders we have been able to read successfully and do not return an
error. This behavior ensures that the storagenode orders service
attempts to archive corrupted files and does not retry them repeatedly
and get stuck.

Change-Id: I0d00d1e174f968af6e99ca861eddad190f1339e2
2020-09-10 23:36:05 +00:00
Qweder93
ac29d80495 storagenode: heldamount GetPaystub refactored, estimationPayouts logic separated form console to separate service, storagenodeapi tests fixed.
Change-Id: I902823ef40a62861ce32799e9fb7a67a1e14710d
2020-09-09 15:31:16 +00:00
Stefan Benten
179b5adad4 storagenode/orders: add missing mon.Task parameter
Change-Id: If98cf347a81f29698a6bdb0907520d60f71db433
2020-09-06 00:05:53 +00:00
Jennifer Johnson
4e2413a99d satellite/satellitedb: uses vetted_at field to select for reputable nodes
Additionally, this PR changes NewNodeFraction devDefault and testplanet config from 0.05 to 1.
This is because many tests relied on selecting nodes that were reputable based on audit and uptime
counts of 0, in effect, selecting new nodes as reputable ones.
However, since reputation is now indicated by a vetted_at db field that is explicitly set
rather than implied by audit and uptime counts, it would be more complicated to try to
update all of the nodes' reputations before selecting nodes for tests.
Now we just allow all test nodes to be new if needed.

Change-Id: Ib9531be77408662315b948fd029cee925ed2ca1d
2020-09-04 16:45:32 +00:00
Michal Niewrzal
aa47e70f03 satellite/metainfo: use metabase.SegmentKey with metainfo.Service
Instead of using string or []byte we will be using dedicated type
SegmentKey.

Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220
2020-09-03 15:11:32 +00:00
Qweder93
36d752e92d storagenode/reputation: offline_under_review_at added
Change-Id: Ia7ec79b2d6f20fe29de0c36223f9485380d2845c
2020-09-02 18:48:28 +03:00
Qweder93
7d9897b7af storagenode/nodestats: online_score added
Change-Id: I84b50a6cace306e5f10d53a2073fe8810d4d2960
2020-09-02 17:45:01 +03:00
JT Olio
1f711523d5 satellite/repair: switch to piecestore.UploadReader part 2
Change-Id: I5a91d2960b037c7a3c96d01bc40404316ba028e3
2020-09-01 12:40:54 -06:00
JT Olio
b872fe52a1 satellite/repair: switch to piecestore.UploadReader
Change-Id: Ia99ad2cf5422e6ba1d98b32946740f9cadba7b6d
2020-09-01 09:26:54 -06:00
Cameron Ayer
ca0c1a5f0c storagenode/{monitor,pieces}, storage/filestore: add loop to check storage directory writability
periodically create and delete a temp file in the storage directory
to verify writability. If this check fails, shut the node down.

Change-Id: I433e3a8d1d775fc779ae78e7cf3144a05ffd0574
2020-08-31 21:20:49 +00:00
nerdatwork
e072febbcc
Fixed typo in log for allocated space (#3934) 2020-08-29 16:36:37 +02:00
Egon Elbre
c86c732fc0 satellite: simplify tests
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID

Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
2020-08-28 22:28:04 +00:00
Egon Elbre
3ca405aa97 satellite/orders: use metabase types as arguments
Change-Id: I7ddaad207c20572a5ea762667531770a56fd54ef
2020-08-28 15:52:37 +03:00
Qweder93
c4a4745dd8 storagenode/console: audit per satellite now uses satelliteName instead of satelliteID
Change-Id: I8221ec840f654a62aedfb62a4194616db890f539
2020-08-25 12:52:47 +00:00
Qweder93
f16cf5cccf storagenode/console & /inspector: added recalculation of disk space info
Change-Id: Id003d031a6464ec095c31290fd6a756ead644261
2020-08-25 14:19:10 +03:00
Egon Elbre
f0ef01de5b storagenode/gracefulexit: retry workers faster
Change-Id: Ica20a691ff117a2b36a6362ee1fed21ce49a9ac1
2020-08-24 12:27:27 +03:00
Egon Elbre
e6bea41083 Revert "gracefulexit: reconnect added"
This reverts commit cff44fbd19.

Change-Id: I6590f483493e308b8244151e1df7570fd32ca2f8
2020-08-23 18:11:24 +03:00
Qweder93
cff44fbd19 gracefulexit: reconnect added
Change-Id: I236689af944effe3e79ef92e852ae264d3b372e5
2020-08-22 14:59:46 +03:00
Moby von Briesen
68b67c83a7 storagenode/{orders,piecestore}: Always unlock unsent orders file, even with an empty order.
When we call ordersStore.BeginEnqueue, the unsent orders file for that
satellite and hour is prevented from being sent. It is freed when the
commit callback returned by BeginEnqueue is used. This change ensures
that we always call the commit callback, even when we have an empty
order or an order with Amount <= 0.

Change-Id: Ic4678f7eaa1e6957dd77d4bb5a23bb35d25b1e93
2020-08-21 11:35:31 -04:00
littleskunk
db57d76ee9
storagenode/gracefulexit: fix wrong error handling for corrupted pieces (#3930) 2020-08-21 11:35:03 +02:00
Jeff Wendling
91698207cf storagenode: live tracking of order window usage
This change accomplishes multiple things:

1. Instead of having a max in flight time, which means
   we effectively have a minimum bandwidth for uploads
   and downloads, we keep track of what windows have
   active requests happening in them.

2. We don't double check when we save the order to see if it
   is too old: by then, it's too late. A malicious uplink
   could just submit orders outside of the grace window and
   receive all the data, but the node would just not commit
   it, so the uplink gets free traffic. Because the endpoints
   also check for the order being too old, this would be a
   very tight race that depends on knowledge of the node system
   clock, but best to not have the race exist. Instead, we piggy
   back off of the in flight tracking and do the check when
   we start to handle the order, and commit at the end.

3. Change the functions that send orders and list unsent
   orders to accept a time at which that operation is
   happening. This way, in tests, we can pretend we're
   listing or sending far into the future after the windows
   are available to send, rather than exposing test functions
   to modify internal state about the grace period to get
   the desired effect. This brings tests closer to actual
   usage in production.

4. Change the calculation for if an order is allowed to be
   enqueued due to the grace period to just look at the
   order creation time, rather than some computation involving
   the window it will be in. In this way, you can easily
   answer the question of "will this order be accepted?" by
   asking "is it older than X?" where X is the grace period.

5. Increases the frequency we check to send up orders to once
   every 5 minutes instead of once every hour because we already
   have hour-long buffering due to the windows. This decreases
   the maximum latency that an order will be reported back to
   the satellite by 55 minutes.

Change-Id: Ie08b90d139d45ee89b82347e191a2f8db1b88036
2020-08-19 19:42:33 +00:00
Cameron Ayer
0155c21b44 private/testplanet, storagenode/{monitor,pieces}: write storage dir verification file on run and verify on loop
On run, write the storage directory verification file.

Every time the node runs it will write the file even if it already exists.
The reason we do this is because if the verification file is missing, the SN
doesn't know whether it is an incorrect directory, or it simply hasn't written
the file yet, and we want to keep nodes running without needing operator intervention.

Once this change has been a part of the minimum version for several releases,
we will move the file creation from the run command to the setup
command. Run will only verify its existence.

Change-Id: Ib7d20e78e711c63817db0ab3036a50af0e8f49cb
2020-08-19 19:12:21 +00:00
Cameron Ayer
586e6f2f13 private/testblobs, storage, storage/filestore: add storage dir verification to filestore
Sometimes SNOs fail to properly configure or lose connection to their storage directory
which can result in DQ. This causes unnecessary repair and is unfortunate for all parties.

This change introduces the creation of a special file in the storage directory at runtime
containing the node ID. While the storage node runs, it periodically verifies that it can
find said file with the correct contents in the correct location. If not, the node will
shut down with an error message.

This change will solve the issue of nodes losing access to the storage directory, but it will not
solve the issue of nodes pointing to the wrong directory, as the identifying file is created each
time the node starts up. After this change has been the minimum version for a few releases, we will
remove the creation of the directory-identifying file from the storage node run command and add it
to the setup command.

Change-Id: Ib7b10e96ac07373219835e39239e93957e7667a4
2020-08-19 17:18:14 +00:00
Moby von Briesen
708cb48aa6 storagenode/orders: implement orders filestore on storagenode
* Add all new orders to the orders filestore instead of the database.
* Submit orders from the filestore to the new satellite SettleWindow
endpoint.

The orders filestore will eventually replace the orders DB completely.
For now, we will still be checking the orders DB and submitting those
orders if they exist. In a later release, we will completely remove the
orders DB, but we need both the DB and filestore for the transitionary
period.

Change-Id: Iac8780fd5ab770296181bbd313e1d335f072d4dc
2020-08-19 15:00:35 +00:00
Ethan
5445d595c0 storagenode/gracefulexit: Wait for the worker delete and transfer goroutines to finish before completing the exit
A failed test showed the same piece being deleted twice. This happens if the graceful exit completes before a previous piece deletion finishes. This change adds a "wait" on the limiter before executing the delete all step when GE is done.

Change-Id: I1c8c49d1e501c2728c80d4224a4854e742be27da
2020-08-19 14:20:26 +00:00
Egon Elbre
be3fd0147e storagenode/storagenodedb: database name in all preflight errors
Shorten the error strings and include database name in all potential
preflight errors.

Change-Id: Ic92ca1ec6e14ffbddb0a0cf89e357eec9532d27e
2020-08-18 16:31:19 +03:00
NickolaiYurchenko
4cdba365ef web/storagenode: payout history table
Change-Id: I448ea8424baf31400d9868ef9ca2b8002caa7bbd
2020-08-13 12:05:56 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Qweder93
4ee1b2d45a storagenode/console: added list of all audits per satellite to sno dashboard/satellites
Change-Id: I52e58748d6467f372d9a308347fc77e400d137e2
2020-08-10 12:55:07 +00:00
Qweder93
373934efb2 storagenode/heldamount: payout history: removed extra doubling with surge percent, added held percent
Change-Id: Idd3927c3130bff771e5437b9b18b4a4907f787e4
2020-08-10 15:29:34 +03:00
Qweder93
f804f03b1f storagenode/heldamount: payuout history updated
Change-Id: I6dc91e9eed51f9b81af3e47a45168c43d254356a
2020-08-05 09:58:34 +00:00
Qweder93
53a5d18e1a storagenode: fixed logging about piece being moved to trash, and added logging when piece was actually deleted
Change-Id: I46f6a141b27033c2087b5c4681506d80b90f4a18
2020-08-02 20:00:05 +03:00
Qweder93
b4c9badab1 storagenode/console: estimation payout fix
Change-Id: I5d9f11fffd74978f3ca684fd08aac44a27a83c71
2020-07-27 21:41:07 +03:00
Qweder93
5988ad6646 storagenode/heldamount: payout history extended with satellite's id, url, surge percent
Change-Id: I669b7b6073ded48fd5686c587357b6c86a970fc7
2020-07-25 14:30:07 +03:00
Qweder93
123aebd79f storagenode/version: version chore test fix
Change-Id: I61537ea325779cefbb1f8d7c5d373dc4bf80a7aa
2020-07-24 20:17:35 +03:00
Qweder93
f531bc8638 storagenode/heldamount payout-history rout fix, usage_at_rest in estimation payout calculations fixed
Change-Id: I6f819a404a45b2a96c1aae33c67ebea1ab83aef0
2020-07-24 15:19:45 +00:00
Yaroslav Vorobiov
4d2a505788 storagenode/db: explicitly open and create dbs
To prevent storagenode from implicitly recreating missing dbs and storage,
as such behaviour leads to audit failures. Do not allow storagenode to
start if any of dbs or storage is missing, corrupted, or dedicated storage disk is
unmounted, to get downtime instead.

Change-Id: Ic64e1f0ff4d8ef5b2fddbe7a7e53df4f4bd8652e
2020-07-24 14:08:47 +03:00
Qweder93
92efffb48a storagenode/version: notification flow now based on cursor, chore_test added, versioncontrol added to reconfigure.
Change-Id: I70713def8d585228270ec5a8c586ecc5b4d510c4
2020-07-23 14:13:24 +00:00
Cameron Ayer
1f5d5235a6 storagenode/{monitor,piecestore}: if free disk < expected available space, return free disk
We only sync free disk and available space, if necessary, on startup. If the
SNs disk fills up with non-storj data, we will not know about it when reporting
available space to the satellite.

Solution: whenever we check the node's capacity, double check free disk.
If free disk < than expected available space, return free disk.

Change-Id: I66265c16e03be45b6e1f5817c70df7eac0a76455
2020-07-22 15:08:37 +00:00
Qweder93
aa6afc3879 error handling in heldamount cash and collector delete fixed
Change-Id: I8fe58c50f844a6b819eacc14a40bc5c67268ed5c
2020-07-22 12:26:13 +00:00
Qweder93
0949731caa storagenode/console: estimation payout held split from total payout, calculations fixed
Change-Id: I064f473ffeb3a3051c9228d1dd84fe0fc86dd3ef
2020-07-21 15:31:51 +03:00
Egon Elbre
d8dcae3075 all: fix error checking
Change-Id: Ia0da1bbd6ce695139922f94096c2419281905e32
2020-07-16 19:13:14 +03:00
Egon Elbre
e70da5cd4e all: fix comments
Change-Id: I2d2307e3fab87de47a72b3595d051e2c95ff4f8a
2020-07-16 19:13:14 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Qweder93
dfdf73282d storagenode/heldamount: db tests updated with payout.Receipt
Change-Id: I17699b923c5a4d7decbd446c382f0c886c36d5e1
2020-07-16 12:24:22 +00:00
Moby von Briesen
1b807761bd storagenode/orders: Update orders filestore to be compatible with new satellite endpoint
* Instead of archiving a list of orders and deleting an "unsent" file in
separate steps, archival simply moves the old unsent file to a new
archived file
* Add maxInFlightTime to be used along with grace period for sending
buffer
* Create unsent/archival directories in constructor
* Code cleanup

Change-Id: Ia3bc2aaf60cced6c6d413465423d78c7d5151188
2020-07-15 14:21:56 -04:00
Qweder93
62fec25104 storagenode/heldamount: returns usage_at_rest in tbm instead of tbh
Change-Id: I183a56460ea76a53680ca6861d02cecebe3576ec
2020-07-15 15:46:13 +03:00
stefanbenten
257855b5de all: replace == comparison with errors.Is
Change-Id: I05d9a369c7c6f144b94a4c524e8aea18eb9cb714
2020-07-14 15:50:25 +00:00
Qweder93
7b4a8c4d6d storagenode/heldamount: payoutHistory added
Change-Id: I93dd3d024085d19ecff76075e52bf66796207fd6
2020-07-14 17:35:03 +03:00
Qweder93
7d6973b5a2 satellite: heldamount and nodestats not returning error node not found by rpc
Change-Id: Ifb00b16a4a04603251de60da6a6612fd5e98d597
2020-07-14 16:31:02 +03:00
Egon Elbre
262da14359 storagenode/console/consoleapi: disable flaky TestStorageNodeApi
Change-Id: I076c9a46fece86d34eae117ab84f94f99e7e64e0
2020-07-13 18:35:38 +03:00
Isaac Hess
78f5755d46 storagenode/nodestats: Add sat to heldamount error
Nodes are receiving an error that heldamount rpc doesn't exist on a
satellite. This simply adds which satellite to the error.

Change-Id: I7708e0511b55fdd2425969db2a545645339bad81
2020-07-13 14:29:09 +00:00
Qweder93
facde770de storagenode/heldadmount: removed logging errors node not found during getAllPaystubs/Payments from trusted satellites
Change-Id: I87f6c697d98546812450fcfb090623c76dec4bbc
2020-07-13 16:45:23 +03:00
Qweder93
f73e92c268 storagenode/gracefulexit: added blobs clean
on node's start checks if any of trusted satellites has GE status "Exited successfully"
if so - trying to delete blobs/satellite folder, so no trash left on SNO.

Change-Id: I566266c84f2a872df54cd01bc2f15a9934f138ed
2020-07-13 11:49:18 +00:00
Qweder93
e17243fcd7 storagenode/console: estimation payour for current and previous month reworked
Change-Id: I937d5d8f7c17949b539dcd6e36af27400a5043e2
2020-07-10 12:18:53 +00:00
Qweder93
0521435e08 storagenode/gracefulexit: added deletion of all files left in storage/blobs/satellite after successful GE
https://storjlabs.atlassian.net/browse/SG-368

Change-Id: I29a978fe0d0153aedf2be91dc7f45b4ef386d447
2020-07-08 14:38:31 +03:00
Bill Thorp
a3c902ab84 storagenode/pieces: hours in a month should be 720
Per https://documentation.tardigrade.io/pricing/billing-and-payment:
"The calculation of per object fees is based on a standard 720-hour month."

On most years, the average value is 730 (365*24/12), except leap years.
However, we want to have ours be 720 (30*24) so its lines up with days.

Change-Id: Ifb9691878f1a7ea81ed36c92b37985493295fe31
2020-07-07 15:26:15 -04:00
Moby von Briesen
e9dd5b2845 storagenode/piecestore: Properly log/send metrics for all successful pieces
When an uplink or repair work finishes uploading a piece to a
storagenode, it has no reason to wait another round trip after the piece
is committed to gracefully close the connection - in many (most?) cases,
the connection is simply canceled once the upload is complete. This has
the unintended side effect of producing a lot of "piece canceled" logs
and metrics on the storagenode side, when the reality is that the piece
uploads were successful, and not really canceled. This commit fixes
that.

Change-Id: Icbc1f7857d380134560219c1c19c186df2783cd0
2020-07-07 15:19:17 +00:00
Qweder93
ac716e1514 storagenode/heldamount: payment receipt added to monthly paystub, heldamount.service separated for service and endpoint
Change-Id: Id759586c6362edbef34c230d4f0d2585c11c9b47
2020-07-06 09:51:52 +00:00
Cameron Ayer
35b709ba18 storagenode/storagenodedb: check if db is nil before closing
In the event of an error in storj.io/storj/storagenode/storagenodedb.(*DB).openDatabases
the caller will attempt to close all databases. However, the error prevents the DB from
being opened and set in the proper place. Attempting to close results in a nil pointer
dereference

https://forum.storj.io/t/node-wont-start-after-update-to-v1-6-4-runtime-error-invalid-memory-address-or-nil-pointer-dereference/7889

Change-Id: Ibfe6f3e13c36d9d15a0cb46e384f0120afdab60b
2020-07-02 15:02:38 +00:00
Qweder93
577f72cb92 storagenode/version: notifications added
Change-Id: Ib9720d8124d8e078354a292b644e2db1f5fffe67
2020-07-01 19:35:46 +03:00
NickolaiYurchenko
b878fcc4b2 storagenode/heldamount: id removed from satellite name
Change-Id: Ic524a40930a5fe7673ccce817d6f68c3538e5208
2020-07-01 15:38:05 +03:00
Qweder93
9b90712aa0 storagenode/heldamount: payents added to db
Change-Id: Ib6c486251ca08d34003c35379d10314127edf103
2020-06-30 17:24:35 +03:00
Qweder93
9a02149654 storagenode/heladamount: held history extended with joined_at date, total_held and total_disposed amounts
Change-Id: I41fe9ab8c5667aa988257a94848ea70225305d79
2020-06-30 13:33:25 +00:00
Yingrong Zhao
51dfc6bf4f storagenode/gracefulexit: make minimum transfer speed to be 5KiB
with 128B/sec, a satellite with 10min default timeout could already
closed its connection to a node even though the node was able to
compelete the transfer.

Change-Id: I6173d6473a62c6d0b0e0a8765c1ae0a5e57b0a08
2020-06-23 21:14:18 +00:00
Ivan Fraixedes
98d477effb
storagenode/collector: Fix comment doc
Change-Id: I703dce7d1b7d7653bbea901c798266a0108b9eec
2020-06-19 13:51:23 +02:00
Jeff Wendling
3842118bab storagenode/heldamount: use correct field for repair usage
Change-Id: I1e0d0bd4c416a21d6900fb723185599f58391d8a
2020-06-12 10:55:35 -06:00
Qweder93
2c3fe5597d storagenode/nodestats: unknown_audit_score added to Service
Change-Id: I1f97f6f0eace9858e466a53d4d4eeabe8059e4eb
2020-06-12 14:00:41 +00:00
Qweder93
0826c8d87f satellite/heldamount: fix dimension of usage_at_rest
Change-Id: If1518ad41736912d15fb2c882c9e236c16f85a51
2020-06-11 13:07:51 +00:00
Moby von Briesen
be59727790 storagenode/orders: Add archival functionality to orders filestore
* Allow orders to be archived after being settled successfully with the
satellite.
* Allow for cleanup of orders that were archived before a certain time.
* Rewrite other parts of the orders file store to work better with new
design.

Change-Id: I39bea96d80e66a324ec522745169bd6d8b351751
2020-06-11 08:47:37 +00:00
Qweder93
e52809d53e cmd/storagenode: add check if satellites available to gracefulexit
Change-Id: I8747507593d810bbdec0d140de0600ee147011c3
2020-06-10 13:38:36 +00:00
Moby von Briesen
0b109c32e4 storagenode/piecestore/usedserials: add monkit metric for serials that are randomly deleted
This will give storagenode operators a better idea of whether the memory
allocated to the usedserials store is sufficient.

Change-Id: I5c30f2e39473a573f43409511ad9e2e32680479c
2020-06-09 17:04:37 -04:00
Yaroslav Vorobiov
09ca382abf storagenode/db: preflight improve index discovery
Change-Id: I876b321f6cd4e91dfced87aa4d39f2cf9a8e63d0
2020-06-05 14:03:25 +03:00
Qweder93
7f8e553022 console/dashboard: added pieces headers size to calculations
Change-Id: I0ee8d6bcb9ce9f69d49ebac2b95579166389668e
2020-06-04 16:39:02 +00:00
Moby von Briesen
c8c0a42269 storagenode/orders: begin implementation of file store for order limits
* Will replace order limits database.
* This change adds functionality for storing and listing unsent orders.
* The next change will add functionality for order archival after
submission.

Change-Id: Ic5e2abc63991513245b6851a968ff2f2e18ce48d
2020-06-03 22:47:04 +00:00
Egon Elbre
4d0cb1af7e storagenode/piecestore: verify before checking free disk
Change-Id: I3fe0383f9b13b99ef9d63ff235616ff204cf6d76
2020-06-02 17:49:14 +03:00
Qweder93
89c9672ce0 storagenode/piecestore: available storage check added in Upload
Change-Id: I71e9e5f335d4320d5de8b374fe747fec43179f78
2020-06-01 16:55:22 +00:00
Egon Elbre
07050eea26 all: use common/storj
Change-Id: Id1e36d52f9807b5ffbb72ce73f4b60cb21b68a78
2020-05-29 11:57:32 +03:00
Moby von Briesen
dc57640d9c storagenode/piecestore: switch usedserials db for in-memory usedserials store
Part 2 of moving usedserials in memory
* Drop usedserials table in storagenodedb
* Use in-memory usedserials store in place of db for order limit
verification
* Update order limit grace period to be only one hour - this means
uplinks must send their order limits to storagenodes within an hour of
receiving them

Change-Id: I37a0e1d2ca6cb80854a3ef495af2d1d1f92e9f03
2020-05-28 12:52:52 -04:00
Moby von Briesen
909d6d9668 storagenode/piecestore/usedserials: add in-memory store for used serials
Implement an in-memory store for keeping track of order limit serial
numbers. It automatically deletes items if its size exceeds a configured
limit.
This change is part 1 - it creates the store
In part 2, the in-memory store will replace the usedserials database

Change-Id: I36f540ed809f034a27c1d7cede8a0a8b080af818
2020-05-28 12:52:52 -04:00
paul cannon
7395dd1e6e storagenode/gracefulexit: revalidate existing pieces
..before they are transferred to another node and submitted to the
satellite as successful piece transfers, because if we submit an invalid
signature, the node will be marked as a cheater and disqualified
immediately.

These signatures should have been validated when the piece was
originally stored, but bitrot does happen and needn't be cause for an
immediate DQ.

Change-Id: I8b0ebd5812ea8a2e60766005b7251fbb74ef7857
2020-05-28 09:50:14 -05:00
Qweder93
73214c6d1c storagenode/heldamount: heldhistory reworked to all satellites
Change-Id: I8d7707fddfbdc52d29951a8a002978c7fbb07049
2020-05-28 11:44:26 +00:00
Qweder93
f2a0c64425 storage/filestore: log potential disk corruption
In walkNamespaceWithPrefix log in case of "lstat" error, because this may indicate an underlying disk corruption.

SG-50

Change-Id: I867c3ffc47cfac325ae90658ec4780d213ff3e63
2020-05-27 12:12:55 +00:00
Qweder93
8db848791f storagenode/console: added estimated payout for current month and estimated pay stub for previous month (until there's real data in satellite's table) + heldback percentage rate for previous month.
Change-Id: I9346f6d22ed6fbb7e5346b102fc898467678f384
2020-05-27 14:51:23 +03:00
littleskunk
2fbb34c3ea
nodeselection: Increase minimum free space to 500MB (#3898) 2020-05-25 12:13:28 +02:00
crawter
f5ac678b0a storagenode/satellitesdb: added FK constraint to satelliteID
Change-Id: If5adf2b92627fcf80850670ba672b346320ddd87
2020-05-21 13:01:20 +00:00
Egon Elbre
bef84a5f9d storagenode: remove dependency to overlay.NodeDossier
This is the last dependency from storage node to satellite.

Change-Id: I12f7abb91e84f823ba5af126c6e2979519838612
2020-05-21 08:37:13 +03:00
crawter
2c9afe7f17 storagenode/console/api/helamount: periods with heldamount data endpoint added
Change-Id: Ie893f56f02c7a76bcfc21c32c10bd1f1d05660e7
2020-05-20 11:45:06 +00:00
Qweder93
49ad90dcd8 storagenode/reputation: unknown_score (unknown_alpha / unknown_alpha+unknown_beta) added to reputation stats, https://storjlabs.atlassian.net/browse/SG-326
Change-Id: I0b29ad736f7a11c7e57a846b6891f4b40755aa48
2020-05-20 11:25:14 +00:00
Egon Elbre
941d10cbc3 private/testplanet: remove Peer.Local()
Currently storagenode depends on overlay.NodeDossier, this is the first
step in removing it.

Change-Id: I034a3f1601835f8349bd41752455022e19bcc707
2020-05-20 11:05:34 +00:00
Egon Elbre
94b2b315f7 storagenode/trust: refactor GetAddress to GetNodeURL
Most places now need the NodeURL rather than the ID and Address
separately. This simplifies code in multiple places.

Change-Id: I52621d8ca52296a8b5bf7afbc1001cf8bfb44239
2020-05-20 11:05:15 +00:00
Egon Elbre
ed627144ed all: use DialNodeURL throughout the codebase
Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c
2020-05-20 10:36:30 +00:00
littleskunk
ef2671927d
storagenode/piecestore: move queue size defaults (#3881) 2020-05-15 19:10:26 +02:00
Qweder93
ef87192120 storagenode/notifications: tests fixed: added time interval between inserts so created_at fields are different when running tests on Windows
Change-Id: I26ba059ab58d0216122ab2f49ae85f7ce7cfced4
2020-05-14 05:26:18 +00:00
Ethan
159df8b2e4 Add logging listener for retrieving and setting log levels
See https://storjlabs.atlassian.net/browse/SM-752

These changes allow us to change the log level at runtime through a handler off of the debug endpoint.

Examples of changing the log level on storj-sim

To get the current level for the satellite api process:
curl -XGET 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain'

To change the log level:
curl -XPUT 'http://127.0.0.1:10009/logging' --header 'Content-Type: text/plain' --data-raw '{"level":"error"}'

Change-Id: I05d164b290929fa06b6d78c01075ee41f8238044
2020-05-12 16:38:06 -04:00
Kaloyan Raev
e0cf9ae888 storagenode/orders: set devDefault sender interval to 30s
Change-Id: Ic2877da12b5985b73d1565e8ccd65bc817a76448
2020-05-11 18:55:56 +03:00
Egon Elbre
ec589a8289 all: fix comments about grpc
Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd
2020-05-11 13:05:34 +03:00
Egon Elbre
7d29f2e0d3 all: remove drpc wrappers
Change-Id: I45016f7d2a771dc00776196c1f531f3343e93b40
2020-05-11 08:20:34 +03:00
Egon Elbre
e6d5ce6b77 all: remove grpc
It seems everyone has migrated to drpc.

Change-Id: Ica6b2d0bdef68c6603083f2963458843eca71e9e
2020-05-10 06:36:09 +00:00
Yaroslav Vorobiov
d8bdc60e19 storagenode/console: remove last ping id and address
Change-Id: I001e7d31d50aee7b9a07608158f8bd5c407c9c45
2020-05-07 13:34:29 +03:00
Jeff Wendling
efee3563a8 storagenode/storagenodedb: fix joined_at migration
Change-Id: I9091741bbec6e7a2fc6addc23249a40176ec040a
2020-05-05 15:01:07 -06:00
Qweder93
b754806b23 storagenode/reputation: unknown_audit_reputation_alpha and beta added to db, and reputation endpoint
Change-Id: I5d92268851bb9a202cc5d6b4d403467e6f692726
2020-05-05 15:48:04 +00:00
Qweder93
8a443bc9f5 storagenode/console: api notifications tests added
Change-Id: I472abc22746ea95841dcb0ae72dacee06fd98a2e
2020-05-05 17:53:19 +03:00
Qweder93
16cd9b06ec storagenode/heldamount: added api for heldamount history separated by periods
Change-Id: I170010364269822848bc6cd051e0e0fb3df95d91
2020-05-05 12:29:44 +00:00
Qweder93
7a83473f00 storagenode/console: API /dashboard disk-space data separated for UsedForPieces, Trash, Available
Change-Id: I7411da2c92c72a24af98e007efe476d8a023db82
2020-05-04 12:36:18 +00:00
Egon Elbre
2fad150952 storagenode/piecestore: remove piecestore.Client.Delete
This test was the last place using it. Replace it with a direct call so
we can remove the method from uplink piecestore.

Change-Id: I62e13028663a7e67aa2495f90ecc02d0d8657fbd
2020-05-04 10:14:55 +03:00
Jeff Wendling
57eb8a17e2 storagenode: allow configuring database path independently
Fixes #3852

Change-Id: I021c29c4dd7c393399f6abef41d8457514032833
2020-05-04 06:04:31 +00:00
Egon Elbre
c630cf2490 storagenode/pieces: implement buffering for writing
Currently uploads can cause a lot of IOPS, reduce this by introducing a
in-memory buffer on-top of the file.

Change-Id: I5f4e3e01c0a36258271d180b922107de447bcb59
2020-05-04 06:01:32 +00:00
Jeff Wendling
4ad01e8170 storagenode/storagenodedb: backfill reputation.joined_at
it was being used in ways that implied it should be NOT NULL
even though it was possibly null. we used to get this data
from the satellite db's added_at column as seen in 30369b02,
so backfill it using that data where joined_at is NULL, and
then alter the table to constrain the column to be NOT NULL.

Fixes #3866.

Change-Id: If2d856189209740d985f71dada7b93525e625ef3
2020-05-01 17:59:53 +00:00
Qweder93
6c4d3f133f storagenode/dashboard: trash added to avaliable space calculations
Change-Id: Ia6f3af20dc98f569b86796ffa68428065d662c78
2020-05-01 15:26:02 +00:00
Jeff Wendling
202e65e408 storagenode/storagenodedb: do correct generalized alter table procedure
According to the docs at https://www.sqlite.org/lang_altertable.html
doing the steps

    1. Rename old table
    2. Create new table
    3. Copy data
    4. Drop old table

is incorrect and should be

    1. Create new table
    2. Copy data
    3. Drop old table
    4. Rename new into old

Additionally, each step was being run in a different transaction,
which could cause permanent failures if a problem happened during
the migration.

Avoid both of those problems by changing up some previous migrations
that ran in this way. Since they are semantically identical, it's
fine to change up these old migrations. It will help make newer
nodes coming up for the first time more robust.

Change-Id: I43fb004fa1b6cb2fe2554f9920925420da28fb4a
2020-05-01 13:53:26 +00:00
Matt Robinson
4a95ad0b2f
storagenode/piecestore: remove error from info log for download canceled (#3869) 2020-04-30 18:42:16 +03:00
Egon Elbre
00b2dfa313 storagenode/orders: fix TestDB and TestCleanArchive
Tests were relying on too exact time values.

Change-Id: I83aa0e815b06372a65ec2d62b23df86051050d19
2020-04-30 15:00:06 +03:00
Egon Elbre
8928399d02 all: rename CreateTables to MigrateToLatest
CreateTables hasn't been quite true for a while now, rename to
MigrateToLatest to be clearer in it's behavior.

Change-Id: Ida48e95122a5d9b7a814e922d3698e00024a2ba7
2020-04-30 07:21:17 +00:00
Isaac Hess
4f492a1ca9 storagenode/piecestore: add wait to test
Change-Id: I05efeffe2845f6b816bf44e50b79c115a86d4b60
2020-04-29 12:38:22 -06:00
Egon Elbre
d225e2de48 all: add missing ctx.Cleanup calls in tests
Change-Id: Iaa65f90b9731d721691322bb92fc3da736aa10fe
2020-04-29 17:58:40 +00:00
Isaac Hess
237d9da477 storagenode/pieces: Deleter can handle multiple tests
Before the deleter would close its done channel once, so if additional
tests shared a storagenode, even if not in parallel, the later waits
would not work properly. This fixes that problem.

Change-Id: I7dcacf6699cef7c2c2948ba0f4369ef520601bf5
2020-04-29 11:26:56 -06:00
Egon Elbre
131654b080 storagenode/piecestore: make tests use DeletePieces directly
Currently this test was the last place that was using
piecestore.Client.DeletePieces. This way we can remove it from uplink to
reduce the code.

Change-Id: I72fda8888d05181f95eeb544d067c031ec3e36a0
2020-04-29 15:52:22 +00:00
Michal Niewrzal
b2acd93a78 storagenode/inspector: adjust Uptime type change in protobuf
Change-Id: I4f5110b534ac4f419b74f1a3dd72f8600e0a53a8
2020-04-29 09:28:37 +00:00
Egon Elbre
85c45cd56f private/dbutil/pgtest: support multiple databases for testing
Currently Cockroach isn't performant for concurrent database setup and
tear-down. Instead of a single instance allow setting multiple potential
connection strings and let the tests pick one connection string
randomly.

This improves test duration by ~10 minutes.

While we are at significantly changing how pgtest works, introduce
helper PickPostgres and PickCockroach for selecting the database to
reduce code duplications in multiple places.

Change-Id: I8ad171d5c4c8a4fc081ec2ae9bdd0cc948a80619
2020-04-28 21:55:49 +03:00
Isaac Hess
13bf0c62ab satellite/pieces: Fix race in piece deleter
There was a race in the test code for piece deleter, which made it
possible to broadcast on the condition variable before anyone was
waiting. This change fixes that and has Wait take a context so it times
out with the context.

Change-Id: Ia4f77a7b7d2287d5ab1d7ba541caeb1ba036dba3
2020-04-28 10:50:20 -06:00
Isaac Hess
db0371703f storagenode/pieces: Return UnhandledCount to satellite
When we receive a piece deletion request, include the number of piece
IDs we couldn't add to the queue in the reponse

Change-Id: Ibebbe92ac50105bb5c74b18211ed38d468eb33f3
2020-04-27 08:56:56 -06:00
Isaac Hess
edda8d73bd storagenode/pieces: Piece deleter monitor queue
Each time we process a piece deletion on the storagenode, monitor how
long the item was in the queue and the size of the queue.

Change-Id: I23f1a44f8b9cecb901bdf4739d55c005ffed4bef
2020-04-27 08:55:43 -06:00
Isaac Hess
a785d37157 storagenode/pieces: Process deletes asynchronously
To improve delete performance, we want to process deletes asynchronously
once the message has been received from the satellite. This change makes
it so that storagenodes will send the delete request to a piece Deleter,
which will process a "best-effort" delete asynchronously and return a
success message to the satellite.

There is a configurable number of max delete workers and a max delete
queue size.

Change-Id: I016b68031f9065a9b09224f161b6783e18cf21e5
2020-04-23 11:51:19 -06:00
littleskunk
1336070fec
storagenode/piecestore: add missing log message about audit errors (#3861)
* storagenode/piecestore: add missing log message about audit errors
* storagenode/piecestore: add monkit data for oder limit verification errors
2020-04-23 13:20:47 -04:00
Qweder93
805e328c47 storagenode/heldamount payments removed
Change-Id: I87cc04f43d182a4190a571ef417be85d02db9d34
2020-04-21 17:15:31 +00:00
Qweder93
3d56efc82d storagenode/console/service: Satellites EarliestJoinDate calculation ignores empty date
Change-Id: Ic528467dbf0a47a7779fd7ae054856744298a39c
2020-04-21 17:50:21 +03:00
Qweder93
e999f24e54 storagenode/nodestats/cache: storagenodeDB/heldamount sync with satelliteDB/storagenode_paystub
Change-Id: If894166809bee8a5e036e618005d8141c2a0c594
2020-04-20 19:12:17 +00:00
Qweder93
1af70703ef storagenode/console/service: SatellitePayStubMonthly returns array of objects
Change-Id: I06d86087c81acd2eb3acd73c1997bab9734bae9e
2020-04-20 14:26:07 +00:00
Qweder93
30369b027c storagenode/storagenodedb/reputation: add joined_at
Change-Id: Ic471fac97bf54b537f2c34f24b4069b0641c746d
2020-04-17 12:12:09 +00:00
Egon Elbre
a129a8bd35 all: separate err check for http
We want to avoid net/http dependency in errs2 package, hence we removed
http.ErrServerClosed from IgnoreCanceled and IsCanceled check. Now we
need to add that check explicitly to every http endpoint.

Change-Id: I62b1cc0a0a2d3b43301d713a7951e5022145f88f
2020-04-16 18:50:24 +03:00
Jess G
dc78cd9634
storagenode/piecestore: fix annoying info log for upload canceled (#3857)
* storagenode/piecestore: fix annoying info upload canceled log

Change-Id: I3dd0f44226e7b946a2b30c3d0f30f28749ca6e88

* keep as info

Change-Id: I7ebb5c19e4865e3030a8a6bb7f6279d316853e89

Co-authored-by: Fadila <Fadila82@users.noreply.github.com>
2020-04-15 14:15:07 -07:00
Jess G
75b9a5971e
satellite: update log levels (#3851)
* satellite: update log levels

Change-Id: I86bc32e042d742af6dbc469a294291a2e667e81f

* log version on start up for every service

Change-Id: Ic128bb9c5ac52d4dc6d6c4cb3059fbad73f5d3de

* Use monkit for tracking failed ip resolutions

Change-Id: Ia5aa71d315515e0c5f62c98d9d115ef984cd50c2

* fix compile errors

Change-Id: Ia33c8b6e34e780bd1115120dc347a439d99e83bf

* add request limit value to storage node rpc err

Change-Id: I1ad6706a60237928e29da300d96a1bafa94156e5

* we cant track storage node ids in monkit metrics so lets use logging to track that for expired orders

Change-Id: I1cc1d240b29019ae2f8c774792765df3cbeac887

* fix build errs

Change-Id: I6d0ffe058e9a38b7ed031c85a29440f3d68e8d47
2020-04-15 12:32:22 -07:00
Egon Elbre
d3ce845f82 satellite: revert log lines used to figure out node id
Currently storj-sim relies on the log lines to be exactly the same,
when they change it cannot find the necessary information from log.

Change-Id: Ia039915ef3375a7cf60f107b2c05c958de15b6d5
2020-04-15 17:07:56 +03:00
Kaloyan Raev
a2ce836761 remove sugar logging
Change-Id: I6b6ca9704837cb3f5f5449ba7f55661487814d9f
2020-04-15 12:37:47 +00:00
Qweder93
bb4b7a919e storagenode/console/service satellites extends with data of oldest join to satellite
Change-Id: I413d5d649a0f331bda8fb4b72c4d43cbc8152361
2020-04-14 19:30:19 +03:00
Qweder93
743b3fb226 storagenode/nodestats: add pricing model, storagenode/cache: add paystub history storing
Change-Id: I9bc104a1407c8f286a964c796656d89b122bf752
2020-04-14 19:04:00 +03:00
Moby von Briesen
14b3704f56 storagenode: add suspended status to storagenode dashboard/api
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on

Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
2020-04-09 13:36:23 +00:00
Egon Elbre
11a44cdd88 all: don't depend on gogo/proto directly
Change-Id: I8822dea0d1b7b99e0b828e0373a0308a42dde2be
2020-04-08 17:32:15 +00:00
Qweder93
0f71e60a53 storanode/version/chore notifications temporary disabled
Change-Id: Ifacdede772bc7b678288410d215e665b014f566f
2020-04-06 16:57:12 +00:00
Stefan Benten
0a4d253990
storagenode/storagenodedb: Improve preflight schema error message (#3844) 2020-04-03 11:20:24 +02:00
Egon Elbre
6492b13d81 all: remove old uuid
Change-Id: I3a137f73456f010c37d3933dbe12cbbb840b809f
2020-04-02 19:30:36 +03:00
Egon Elbre
1024bf9ce1 all: simplify uuid usage
Instead of uuid.Parse, use uuid.FromString.
This removes a bunch of pointer management logic.

Change-Id: Id25bd174eb43c71d00b450158a198abafd8958f2
2020-04-02 13:45:19 +00:00
Michal Niewrzal
fe2340285f storagenode/console/consoleserver: fix TestConsole tests
https://storjlabs.atlassian.net/browse/SG-145

Change-Id: Idee6050cea5fd53ec631a7c64fb18f1ad7f16af0
2020-04-02 13:17:04 +00:00
Egon Elbre
8f73fb7a32 all: simplify uuid usage
uuid.UUID implements driver.Value so it can be directly used as a
scannable result.

Replace uses of dbutil.BytesToUUID with uuid.FromBytes.

Change-Id: I51a670185ceb3cc2199d5aa2b76bc3fc191ca8fe
2020-04-02 05:48:58 +00:00
Yingrong Zhao
f663906357 storagenode/contact: call return value from mon.Task() on function finish
Change-Id: I5e6462acb99ac1d28b5d2518d5db8a4afe593d11
2020-04-01 23:26:14 +00:00
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
Qweder93
1fb37915c3 heldamount date picking hot fix
Change-Id: I0cd0787e20e8af35a44bf05fa5c177434f563572
2020-03-31 15:26:26 +03:00
Qweder93
dc32f1da55 storagenode/cache/heldamount added, errNoRows ignored
Change-Id: If6b675e622d6c1324c0893c43cca93dc5323cd78
2020-03-31 11:35:58 +00:00
Jess G
a73416e208
revert to use the config for storagesync (#3837)
Change-Id: I11920b54736e7e548f5638a9ba5f6380a94d99c4
2020-03-30 13:43:38 -04:00
Egon Elbre
e1a443b04a private/testplanet: allow modifying created database
Instead of providing the database from outside to testplanet create it
inside and then allow wrapping and modifying it. This is more convenient
to use.

Change-Id: I9b8f69e6e0a19ff984b4e2bfe927c9100c77bc6c
2020-03-27 19:14:48 +00:00
Egon Elbre
e8f18a2cfe private/testplanet: expose storagenode and satellite Config
Change-Id: I80fe7ed8ef7356948879afcc6ecb984c5d1a6b9d
2020-03-27 17:01:25 +02:00
Jeff Wendling
97e980cd8a private/dbutil: add database name to configure as a tag
storagenodes have like 10 or more databases. without this
tag they all get sent as the same value, stomping on each
other.

Change-Id: Ib12019684d6ea8f2a5b83df584056dfa79e3c4b3
2020-03-26 16:50:15 +00:00
crawter
f879bfcf70 storagenode/console/server/heldamount - endpoint tests added
Change-Id: I4a324e808fd18e55bad28c1cce2ea90c41dc659e
2020-03-26 15:50:33 +00:00
Jennifer Johnson
b75cbc8e24 satellite,storagenode: remove references to free bandwidth
Change-Id: I42a6597544804fa9235e89ec656ebc365eb522e5
2020-03-25 22:28:34 +00:00
Yingrong Zhao
b7b19289d1 bump storj.io/common to latest
Change-Id: I16e337660ce8e1ef332cc842dbf4cfa067b9b98b
2020-03-25 09:08:40 -04:00
Yingrong Zhao
a731472496 bump storj.io/common to latest and storj.io/drpc to v0.0.11
Change-Id: I7a6e823b441eeff4621dfdf2d6577be76c9761c8
2020-03-24 15:17:10 -04:00
Michal Niewrzal
fdf40a7526 storj: remove storj/private/version package which was moved to
`storj/private` repo

Change-Id: I81c3f5b9d5e4fe7bca760999eb045ee9734e5e2e
2020-03-24 14:31:33 +00:00
Egon Elbre
a9fb7b7694 storagenode/trust: fix go 1.14 failures
Change-Id: I6d147f8e5907666ffb3a3e487fa179e377c70841
2020-03-24 11:04:30 +00:00
Michal Niewrzal
f0aeda3091 storj: remove from storj/pkg packages moved to storj/private repo
* debug
* traces
* cfgstruct
* process

Package `storj/private/version` will be removed as a separate change.

Change-Id: Iadc40faa782e6225513b28218952f02d9c240a9f
2020-03-24 09:56:29 +01:00
Egon Elbre
1b6ab173a8 private/context2: moved to storj.io/common/context2
Change-Id: Ic1dd1ed645ff3e1057c9b2b143e2c3ddf29d678e
2020-03-20 14:39:46 +00:00
Qweder93
8597e6b512 storagenode/console/api period payment api extended
Change-Id: I18ec331c6a684e3a9351e3c917bacdb8b8f18c28
2020-03-19 16:51:31 +02:00
Qweder93
0df586c3a8 satellitedb/heldamount updated, tests added + storagenode console updated
Change-Id: I10f568a426d0fc42069d025de2accbef5b26dc0c
2020-03-19 15:37:45 +02:00
crawter
fde5c3542b storagenode/console/api: period payStub api extended
Change-Id: I624bbf7a9640f9df97789bea109201cbfb556753
2020-03-19 14:42:02 +02:00
Qweder93
5dc1b7db90 storagenode/cache heldamount disabled
Change-Id: If81a19ba196b10d2bcac6e2850c8d07edb9a85b5
2020-03-16 15:20:56 +00:00
crawter
f12b6dc27e storagenode/console/api: monthly payStub api extended
Change-Id: I816aeb01ea1338372e3a5f07a11dfdc0ffe20393
2020-03-16 03:39:08 +02:00
crawter
89374e260d storagenode/console/consoleapi: using cached data in heldamount api
Change-Id: I0efca320eaf722ade1146100bbb0e70d75a5dca3
2020-03-16 01:39:11 +02:00
Qweder93
9f84261c36 storagenode/cache heldamount added
Change-Id: I7fc807789de63e8a9b8ca2018fd73bdb9e01ad0d
2020-03-16 00:28:35 +02:00
crawter
0c18ecf32e storagenode/api refactored
Change-Id: Icfd6ded7a21b3803411688a0a34b0c80b44e756f
2020-03-15 20:30:23 +02:00
crawter
45507d285b storagenode/storagenodedb: heldamount tests added
Change-Id: I862b0b38349a11254426855ffafb1f2f0845cb4c
2020-03-15 14:55:11 +00:00
Qweder93
988bb52855 storagenode/heldamount GetPayment added, console/server updated
Change-Id: I51ebe69f9dc7920e59e91d7ba26617ee61889f78
2020-03-13 17:37:44 +00:00
Qweder93
7b0371e9e2 storagenode/heldamount/service added, console/heldamountapi added, console/server updated
Change-Id: I6290a6ea1b07b222908440defbbd7aec5f2a4cdf
2020-03-13 19:18:03 +02:00
Qweder93
5ccce04338 storagenode/storagenodedb: heldamount added
Change-Id: I213e3abffd7356bbfccb3f33bcbafa558674b8d9
2020-03-13 16:23:59 +00:00
Bill Thorp
94c11c5212 satellite: remove some unnecessary UTC() calls
Fixes some easy cases of extraneous UTC() calls

Change-Id: I3f4c287ae622a455b9a492a8892a699e0710ca9a
2020-03-13 13:49:44 +00:00
Moby von Briesen
178dbb4683 storagenode/storagenodedb: allow storagenodes to start test_table exists
In many cases when a storagenode fails the preflight check, it is due to
test_table existing, which is used to determine read/write capabilities
after the initial schema verification. If preflight ends early due to a
failure or stopped storagenode, it may not get the chance to drop this
table.

This change excludes test_table from the schema comparison to ensure
that it never prevents a storagenode from starting up.

It also adds Preflight DB test for storagenode.

Change-Id: Ib8e71df2e42fda3b2a364fbf7a801891c5831d39
2020-03-09 14:29:46 -04:00
Egon Elbre
f4d5d89b68 private/testplanet: add WaitForStorageNodeEndpoints
After calling uplink.Upload it is not guaranteed that the
storage node has yet saved all the orders since it happens
asynchronously. Hence we need a separate func to wait
for them to complete.

Change-Id: I0c34b3ea6c98dbcf37f80493c0e10a8bdbbb2aaf
2020-03-05 10:33:56 +00:00
Jennifer Johnson
1c1750e6be removes bandwidth limiting
On satellite, remove all references to free_bandwidth column in nodes table.
On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated.

Protobuf message, NodeCapacity, is left intact for backwards compatibility.
Once this is released to all satellites, we can drop the column from the DB.

Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838
2020-03-04 14:04:00 +00:00
Cameron Ayer
7244a6a84e storagenode/{contact, piecestore}: implement low disk notification with cooldown
When a storagenode begins to run low on capacity, we want to notify
the satellite before completely running out of space. To achieve this,
at the end of an upload request, the SN checks if its available space has
fallen below a certain threshold. If so, trigger a notification to the
satellites.

The new NotifyLowDisk method on the monitor chore is implemented using the
common/syn2.Cooldown type, which allows us to execute contact only once
within a given timeframe; avoiding hammering the satellites with requests.
This PR contains changes to the storagenode/contact package, namely moving
methods involving the actual satellite communication out of Chore and into
Service. This allows us to ping satellites from the monitor chore

Change-Id: I668455748cdc6741291b61130d8ef9feece86458
2020-03-03 10:45:37 -05:00
Qweder93
484ec7463a storagenode: notifications on outdated software version
Change-Id: If19b075c78a7b2c441e11b783c3c09fed55060c7
2020-03-02 16:48:02 +00:00
Egon Elbre
64330c55b3 all: use pbgrpc
common/pb moved grpc to a separate package common/pb/pbgrpc.
This updates this repository to use it.

Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e
2020-02-26 21:27:47 +02:00
Cameron Ayer
d578102672 storagenode/piecestore: add workgroup to endpoint to prevent stray goroutine after shutdown
Change-Id: Ie8444c3c8f870745b73342de2e9a93027fcad371
2020-02-24 21:38:52 +00:00
Cameron Ayer
f22bddf122 {storagenode/contact, private/testplanet}: remove ErrFailureToStart and panic in testplanet.Start
Change-Id: I252e8c9407400af7bda95a7657c8154660c3c801
2020-02-24 18:24:23 +00:00
Yingrong Zhao
5011e78311 storagenode/piecestore: remove unused DeletePiece endpoint
With commit: 3331b443e7, satellite will
start calling `DeletePieces`. Therefore, we can remove the old endpoint
once the above commit is deployed with all satellites

Change-Id: I0124bc00a7cb808d119eb59f8fcd7fadf68158bb
2020-02-21 21:03:49 +00:00
NikolaiYurchenko
2601f25c98 web/storagenode: notification logic implementation
Change-Id: Iec741997312203117213674ef85125fa8a976249
2020-02-21 15:49:27 +00:00
Egon Elbre
5342dd9fe6 go.mod: update uplink
Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811
2020-02-21 16:08:12 +02:00
Egon Elbre
4044b8eeea storagenode/pieces: ensure chore is stopped before test ends
Change-Id: Ibc26e156d13011bf0f91b4206980200a24d348fe
2020-02-21 10:14:44 +02:00
Cameron Ayer
3e70a893dd storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold
Curently, storage nodes only report their capacity to satellites
once per hour. If a node fills up, it will fail all uploads until
the next contact cycle begins. With these changes, at the end of an
upload we check whether the MinimumDiskSpace threshold has been
passed. If so, trigger the monitor chore to update the node's
capacity, then trigger the contact chore to report the new
capacity to the satellites

Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0
2020-02-18 15:42:48 -05:00
Egon Elbre
8f20085683 storagenode/piecestore: clearer client cancellation error message
Change-Id: Ia0595f71eb3eb1c0f091e615652e2de376d5609d
2020-02-14 09:36:03 +00:00
Jeff Wendling
05a240050e storagenode: monitor available space and bandwidth
Change-Id: I5763597327c5b32982faab8910c136c6c8dc18c5
2020-02-13 07:07:29 +00:00
Michal Niewrzal
426c8eb31a private/testplanet: add DeleteBucket method for uplink
New method added to be able to delete easily bucket during tests.

Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2
2020-02-11 15:58:13 +00:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Isaac Hess
17580fdf57 storagenode/pieces: Add test to cache store
This test checks that we are actually walking over the pieces when
starting the cache, and that it is returning expected values.

A recent outage was partially caused by the fact that this cache was
accidentally reading itself (via the pieces store, which has the cache
embedded). This test ensures that does not happen, and checks that when
the cache's `Run` method is called, the space used values are read from
disk and accurately update the cache.

Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e
2020-02-05 11:39:06 -07:00
igor gaidaienko
efa0f6d443 storagenode/monitor: set MinimumDiskSpace default to 500GB.
As a workaround it was set to 0 in previous release. Now according to the TOC must be set to 500GB.

Change-Id: Ia2743d49e86683396958aff51b95df743af4f872
2020-02-04 15:55:42 +00:00
Egon Elbre
9e5679fdaa storagenode/console/consoleserver: set content-type manually
http.FileServer relies on mime types defined in the operating system.
These values may be misconfigured, so a javascript file might
end up being served as "plain/text".

Change-Id: I3c13c8a9ac484bd765a4de0f8253bfe40dde7513
2020-02-03 15:37:47 +02:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
Jeff Wendling
71ff044edb storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month
Change-Id: I390569a8702164c42edddd3be020e93782227c2e
2020-01-31 16:25:52 -07:00
Jeff Wendling
03166d6be3 storagenode/piecestore: log available bandwidth and space on uploads
Change-Id: Ia92228cb2a178da45f4f123b48c476e5ec821fe8
2020-01-31 19:47:14 +00:00
Isaac Hess
78d0868bc9 storagenode/pieces: Log error if cannot calculate piece size
Change-Id: I33b49315a0f6044a801a8b118e6b61dbcd751bfe
2020-01-31 09:57:44 -05:00