instead of aborting on the first error, so that we can hit all
satellites and get the best numbers we can
Change-Id: I21d5163884940612d7d39eaf73a6fac07235cd9e
We introduced a bug in v0.31.7, and deploying it would kick out all the
storage nodes that are full. The easy fix is to set the requirement to 0.
That allows them to start up even if they are full.
Change-Id: Ie66f369952d929fcfd47f44f6e5e57eea8f51ff6
By default our server address listens on all IP addresses on the machine.
This caused our preflight check to fail, as it did not have a hostname to look up.
With this change, we accept such an address and go ahead.
Change-Id: I9eb5c891c099eb35f679d6d7e79ec38bb43b619f
1 transfer with a minimum speed of 128 Bytes was a nice try but it is
way too low. Even a pi3 was able to handle 7 grpc transfers. We have 4
satellites and with 5 concurrent transfers that should be a total of 20
concurrent transfers. Each transfer will have a minimum speed of 5KB/s.
That should give us better throughput and still be OK on a pi3.
Change-Id: I650a7baf890080901ef70ea3b5636d93009b4e60
With the v0.30.5 release we asked the storage node operators to manually
enable the preflight check while they are in front of their machine. We
didn't want to risk taking too many storage nodes offline at the same
time because of some unknown bug. The preflight check worked. We have no
negative feedback. We can now enable it by default.
Change-Id: Ic670ee52becd0b35eca84af7a0841ea983d7b19d
clock sync
Change 24h and 1h to 30m and 10m respectively for clock sync. If a
storagenode's clock is off by more than 30m for every trusted satellite,
it will not start. If it is off by more than 10m for any trusted
satellite, a warning is displayed.
Change-Id: I05ef611a30a49c1783e3b68b513745922c2f7e28
this is to help protect against intentional or unintentional
slowloris style problems where a client keeps a tcp connection
alive but never sends any data. because grpc is great, we have
to spawn a separate goroutine for every read/write to the stream
so that we can return from the server handler to cancel it if
necessary. yep. really.
additionally, we update the rpcstatus package to do some stack
trace capture and add a Wrap method for the times where we want
to just use the existing error.
also fixes a number of TODOs where we attach status codes to the
returned errors in the endpoints.
Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1
A few variables were not renamed to the new standard piecesTotal and
piecesContentSize, so it was unclear which value was being used. These
have been updated, and some comments made more thorough.
Change-Id: I363bad4dec2a8e5c54d22c3c4cd85fc3d2b3096c
This change updates the storagenode piecestore apis to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.
We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.
Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55
With this change the RS configuration will be set on the satellite. Uplink
will get the RS values from the BeginObject request and use them. For backward
compatibility, and to avoid a very large change, the redundancy scheme stored
with the bucket is not touched. This can be done in the future.
Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
Replace all the remaining uses of sql.DB with tagsql.DB to
fix issues with context cancellation.
Introduce tagsql.Open which helps to get rid of all tagsql.Wrap-s.
Use tagsql in cockroachkv and postgreskv.
Change-Id: I8946d203341cb85a25976896fc7881e1f704e779
The migration step was closing a database that was still in use by
the migration itself: there is an active transaction
over the database.
Instead of closing in the same transaction we can wait
until restart for the database cleanup.
Change-Id: Ic971d8cea81a3ab783f4a1bdc6357009c8b31386
Also added temporary types withRebind and withTagTx,
which will be removed later. For now they help to avoid
changing the whole codebase at the same time.
Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f
storagenode database preflight check.
Disable preflight database check by default, and have the option to
enable it. This will allow us to enable it once it is definitely
working.
Also change the name of the config flag for preflight time sync.
Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99
On pieces usage cache init we now load the trash info from the db. Also
fixes a test that was masking the failure here.
Change-Id: I9ff7da5bc6c0f74cf0942e20931b40e0c88d70fa
for storagenode
Ensure that database schema matches latest test migration schema before
allowing the node to start up.
Ensure minimal read/write functionality for each storagenode database
before allowing the node to start up.
This will eliminate many unhandled audit errors we are seeing.
Change-Id: Ic0e628b04a9c35b7a8243f6a81d4683918170ba9
This reverts commit 8e242cd012.
Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.
Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts
requires https://review.dev.storj.io/c/storj/dbx/+/514
Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
When an error is formatted using %v, it's not possible to check
whether the error was caused by a context cancellation.
Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee