Commit Graph

372 Commits

Author SHA1 Message Date
Isaac Hess
2968857e21 storagenode/pieces: Prevent recalculate from having negative numbers
Change-Id: Iafd2bcb9963e85508cb5e2bd69f229d89c589a6c
2020-01-30 17:47:54 -05:00
paul cannon
157b8c4d71 storagenode/pieces: accumulate errors in traversal
instead of aborting on the first error, so that we can hit all
satellites and get the best numbers we can

Change-Id: I21d5163884940612d7d39eaf73a6fac07235cd9e
2020-01-30 19:31:29 +00:00
Isaac Hess
5a053483b7 storagenode/pieces: read trash from blobstore
Change-Id: Ib134e63a13b8a5dda5d6a9ead42013ce18411227
2020-01-30 13:30:48 -05:00
Isaac Hess
4dafd03f11 storagenode: Prevent negative values in piece_space_used, migrate negatives to 0
Change-Id: Ibd663db087058c928190aa52c520f22e9338dd04
2020-01-30 13:03:18 -05:00
Isaac Hess
00fc192f6b storagenode/pieces: Explicitly walk satellite pieces in SpaceUsedTotalAndBySatellite
Change-Id: I7ff9a1120d4ced0b5cba7d7765ef8aed7a1edae0
2020-01-30 12:01:50 -06:00
Jeff Wendling
21b65ca3b0 storagenode/storagenodedb: migrate to set total to content_size
Change-Id: I4906c2fe9cdb3a32c045c98039d4bde6b8b809e3
2020-01-30 08:53:12 -07:00
Egon Elbre
4e2bf81719 pkg/debug: add better title
Change-Id: Icc6114f4e7523cfe6c7984ef1f6eec664ae4ee65
2020-01-30 07:49:40 -05:00
littleskunk
81eddaa2c1
storagenode/monitor: reduce space requirement to 0
We have added a bug with v0.31.7 and deploying it would kick out all the
storage nodes that are full. Easy fix is setting the requirment to 0.
That will allow them to still start up even if they are full.

Change-Id: Ie66f369952d929fcfd47f44f6e5e57eea8f51ff6
2020-01-30 01:44:45 +01:00
Egon Elbre
d10d6fd153 storagenode,satellite: ignore error on listening debug port
Change-Id: Id3a6d153535776ce41f8edf2bd6f6dad5e2a60bf
2020-01-29 18:06:02 -05:00
Egon Elbre
10be538602 storagenode: add pkg/debug support
Change-Id: If941095b886c28a0d53fff4c9bf9fa0ce7471dea
2020-01-29 16:30:31 -05:00
Egon Elbre
f237d70098 storagenode,satellite: use pkg/debug
Use debug.Server in storage node and satellite for customizing debug server.

Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99
2020-01-29 16:30:31 -05:00
Egon Elbre
e319660f7a private/lifecycle: implement Group
lifecycle.Group implements controlling multiple items such
that their startup and close works.

Change-Id: Idb4f4a6c3a1f07cdcf44d3147a6c959686df0007
2020-01-29 00:37:33 +00:00
Yingrong Zhao
d8e3556a22 storagenode/preflight: wait for server to shutdown when tests are
finished

Change-Id: Ie3ede9f285cb61bb6bc6b0158e41d8ea10b2497e
2020-01-28 17:54:19 +00:00
Stefan Benten
3abb8c8ed7 Dont require an IP address being set
Per default our server address is listening on all IP addresses on the machine.
This caused our preflight check to fail, as it did not have an hostname to lookup.
With this change, we are fine with this and go ahead.

Change-Id: I9eb5c891c099eb35f679d6d7e79ec38bb43b619f
2020-01-28 15:25:17 +01:00
nerdatwork
9ea32016c2 storagenode/orders: fix typos in log messages (#3760) 2020-01-26 13:45:57 -05:00
littleskunk
5c68f4fc7c storagenode/gracefulexit: higher concurrency and shorter timeouts
1 transfer with a minimum speed of 128 Bytes was a nice try but it is
way too low. Even a pi3 was able to handle 7 grpc transfers. We have 4
satellites and with 5 concurrent transfers that should be a total of 20
concurrent transfers. Each transfer will have a minimum speed of 5KB/s.
That should give us a better througput and still be Ok on a pi3.

Change-Id: I650a7baf890080901ef70ea3b5636d93009b4e60
2020-01-24 23:51:39 +00:00
littleskunk
226bc4de36 storagenode/preflightcheck: enable database check by default
With the v0.30.5 release we asked the storage node operators to manually
enable the preflight check while they are in front of their machine. We
didn't want to risk taking too many storage nodes offline at the same
time because of some unknow bug. The preflight check worked. We have no
negative feedback. We can now enable it by default.

Change-Id: Ic670ee52becd0b35eca84af7a0841ea983d7b19d
2020-01-24 23:23:35 +00:00
Moby von Briesen
e4cff1c938 storagenode/preflight: update allowed time difference for preflight
clock sync

Change 24h and 1h to 30m and 10m respectively for clock sync. If a
storagenode's clock is off by more than 30m for every trusted satellite,
it will not start. If it is off by more than 10m for any trusted
satellite, a warning is displayed.

Change-Id: I05ef611a30a49c1783e3b68b513745922c2f7e28
2020-01-24 22:57:13 +00:00
Jeff Wendling
16bb374deb storagenode/piecestore: add large timeouts to read/write operations
this is to help protect against intentional or unintentional
slowloris style problems where a client keeps a tcp connection
alive but never sends any data. because grpc is great, we have
to spawn a separate goroutine for every read/write to the stream
so that we can return from the server handler to cancel it if
necessary. yep. really.

additionally, we update the rpcstatus package to do some stack
trace capture and add a Wrap method for the times where we want
to just use the existing error.

also fixes a number of TODOs where we attach status codes to the
returned errors in the endpoints.

Change-Id: Id8bb8ff84aa34e0f711b0cf9bce3908b36a1d3c1
2020-01-23 19:20:49 +00:00
Isaac Hess
44de90ecc8 storagenode/pieces: Rename vars and update comments
A few variables were not renamed to the new standard piecesTotal and
piecesContentSize, so it was unclear which value was being used. These
have been updated, and some comments made more thorough.

Change-Id: I363bad4dec2a8e5c54d22c3c4cd85fc3d2b3096c
2020-01-23 11:00:24 -07:00
Isaac Hess
14fd6a9ef0 storagenode/pieces: Track total piece size
This change updates the storagenode piecestore apis to expose access to
the full piece size stored on disk. Previously we only had access to
(and only kept a cache of) the content size used for all pieces. This
was inaccurate when reporting the amount of disk space used by nodes.

We now have access to the total content size, as well as the total disk
usage, of all pieces. The pieces cache also keeps a cache of the total
piece size along with the content size.

Change-Id: I4fffe7e1257e04c46021a2e37c5adc6fe69bee55
2020-01-23 11:00:24 -07:00
stefanbenten
62d3783928 storagenode/peer: ensure contact.external-address and server.address is valid
Change-Id: I634f0d355b0be18ba419726ace746921adda3ac0
2020-01-23 15:51:46 +00:00
Egon Elbre
5a4745eddb all: remove usages of testplanet.New
Ensure that tests use testplanet.Run, so we always require running
against all database backends.

Change-Id: I6b0209e6a4912cf3328bd35b2c31bb8598930acb
2020-01-22 22:42:57 +02:00
Michal Niewrzal
6502454947 satellite/metainfo: move RS configuration to satellite
With this change RS configuration will be set on satellite. Uplink with
get RS values with BeginObject request and will use it. For backward
compatibility and to avoid super large change redundancy scheme stored
with bucket is not touched. This can be done in future.

Change-Id: Ia5f76fc10c37e2c44e4f7b8754f28eafe1f97eff
2020-01-22 09:33:53 +00:00
Egon Elbre
c1c878efcf all: fix import groupings
check-imports was broken and didn't complain about things.

Change-Id: I38adafd16b4aba86f0eb4f53427b4393f9a6c710
2020-01-20 17:47:44 +00:00
Egon Elbre
21f53e38da storagenode/storagenodedb/storagenodedbtest: pass ctx as an argument
Change-Id: I10b0a8ef3a7d5001e7d361f1873ad5987af1f9c2
2020-01-20 16:56:12 +02:00
Egon Elbre
f3b4bf2b7c satellite/satellitedb/satellitedbtest: pass ctx as an argument
ctx is created in most tests, instead pass in as argument
to reduce code duplication.

Change-Id: I466c51c008392001129c8b007c9d6b3619935ac4
2020-01-20 16:35:42 +02:00
Egon Elbre
1279eeae39 private/tagsql,storage: fixes to context cancellation
Replace all the remaining uses of sql.DB with tagsql.DB to
fix issues with context cancellation.

Introduce tagsql.Open which helps to get rid of all tagsql.Wrap-s.
Use tagsql in cockroachkv and postgreskv.

Change-Id: I8946d203341cb85a25976896fc7881e1f704e779
2020-01-20 15:44:39 +02:00
Egon Elbre
d5438036b5 {satellite,storagnode}/gracefulexit: reduce logging
Change-Id: I9f274ede77a582fc43ef14a47bf9341d4e3083df
2020-01-19 22:36:13 +02:00
Egon Elbre
3cd584c007 storagenode/gracefulexit: move database test
Database tests belong to the interface, not the implementation.

Change-Id: I5d76fdc7df0b0f32391ebad1b595ef26b062a9cb
2020-01-19 18:12:01 +00:00
Egon Elbre
7bc76624cf storagenode/storagenodedb: fix closing in-use database
Migration step was closing a database that was used by
the migration itself. There is an active tranasction
over the database.

Instead of closing in the same transaction we can wait
until restart for the database cleanup.

Change-Id: Ic971d8cea81a3ab783f4a1bdc6357009c8b31386
2020-01-19 16:18:46 +02:00
Egon Elbre
25b76fe63f storagenode/storagenodedb: use tagsql
Change-Id: Iba3b34a97b982deb4f72ce55517a294f249b6b55
2020-01-19 14:39:16 +02:00
Egon Elbre
59d06644b9 private/migrate: switch to tagsql
Also added temporary types withRebind and withTagTx,
which will be later removed. Currently they help to avoid
changing the whole codebase at the same time.

Change-Id: I7f07ba8f4709a23a463bfa67464628665a05808f
2020-01-19 14:39:16 +02:00
Moby von Briesen
273eb66fae cmd/storagenode,storagenode/preflight: add config flag to disable
storagenode database preflight check.

Disable preflight database check by default, and have the option to
enable it. This will allow us to enable it once it is definitely
working.

Also change the name of the config flag for preflight  time sync.

Change-Id: Ie2e20f9e25dcb38794eafa7e1505e7c6ff287c99
2020-01-17 17:53:17 +00:00
Isaac Hess
614e04d055 storagenode/pieces: Cache inits trash info from db
On pieces usage cache init we now load the trash info from the db. Also
fixes a test that was masking the failure here.

Change-Id: I9ff7da5bc6c0f74cf0942e20931b40e0c88d70fa
2020-01-17 09:33:05 -07:00
Bill Thorp
6f2f97b313 storagenode\gracefulexit: broke worker deleteOnePieceOrAll into deleteOnePiece and deleteAllPieces and deletePiece
Change-Id: Ic3bd21e89fa71e962c2bb1c4943f4696bc4f83e5
2020-01-17 15:07:34 +00:00
Moby von Briesen
e115bc1903 cmd/storagenode;storagenode/storagenodedb: add preflight database check
for storagenode

Ensure that database schema matches latest test migration schema before
allowing the node to start up.

Ensure minimal read/write functionality for each storagenode database
before allowing the node to start up.

This will eliminate many unhandled audit errors we are seeing.

Change-Id: Ic0e628b04a9c35b7a8243f6a81d4683918170ba9
2020-01-16 18:44:46 +00:00
Egon Elbre
81d53b8097 storagenode/storagenodedb: fixes to row handling
Change-Id: I3813310b48337428f13678a9fcba5c8a0e0b2b2a
2020-01-16 15:08:37 +00:00
Yingrong Zhao
db8aee0806 satellite/contact; storagenode/preflight: add clock check on startup for storagenode
add config preflight.enabled-local-time

Change-Id: I7b942c9bee063aae409ee6721ae9d079dff0144f
2020-01-15 15:35:26 +00:00
Yingrong Zhao
07c2824d94 storagenode/gracefulexit: fix exit-status command output
When exit succeeded, cli should display `Y` in Successful column and `100%` in PercentComplete.

Change-Id: I6093eca207ecd618bb332af12e5e455bc8224dde
2020-01-15 14:58:15 +00:00
Egon Elbre
08f63614be private/context2: add WithoutCancellation
Change-Id: I38557c16f41b8983886f256353cc6afb7634d9e6
2020-01-15 14:23:46 +02:00
Egon Elbre
64fb2d3d2f Revert "dbutil: statically require all databases accesses to use contexts"
This reverts commit 8e242cd012.

Revert because lib/pq has known issues with context cancellation.
These issues need to be resolved before these changes can be merged.

Change-Id: I160af51dbc2d67c5449aafa406a403e5367bb555
2020-01-15 07:28:00 +00:00
JT Olio
8e242cd012 dbutil: statically require all databases accesses to use contexts
this will allow for some nice runtime analysis down the road.
also, this allows for wrapping database handles in a way that
can interact with these contexts

requires https://review.dev.storj.io/c/storj/dbx/+/514

Change-Id: Ib087b7cd73296dd2c1e0331314da34d861f61d2b
2020-01-14 18:20:47 -05:00
Egon Elbre
5af1f9e6d1 storagenode/{piecestore,storagenodedb}: use context in queries
In endpoint.saveOrder, ensure we always try to save orders such
that they can be settled.

Change-Id: Ic9ac8f4bf684d8493282912ca97f386c1762e364
2020-01-14 20:27:26 +00:00
Egon Elbre
d80cfeb4ab storagenode: ensure we don't eat the underlying error
When error is formatted using %v it's not possible to check
whether the error was caused by a context cancellation.

Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee
2020-01-14 20:26:23 +00:00
Egon Elbre
23e2664327 storagenode/inspector: return rpcstatus
Change-Id: I7e13b6dc8c9c3f4550f77885b1ef99662f5a5727
2020-01-14 20:24:46 +00:00
Egon Elbre
ff267168c5 private/migrate: add ctx argument
Change-Id: I3d65912d89261386413c494c7ed1576fed4dcaf4
2020-01-13 15:52:26 +02:00
Egon Elbre
c7b846589e private/dbutil/sqliteutil: add ctx argument
Change-Id: If1caa9cde746817e62cae32a152eeec81959129c
2020-01-13 15:03:30 +02:00
Qweder93
cf19e141e0 storagenode/notifications: return unread count and fix json id, list-notifications method fix
Change-Id: Ic56beac1f388d91a29c9e8266161715d09364520
2020-01-09 17:56:00 +00:00
Yingrong Zhao
ebeee58001 storagenode/gracefulexit: remove satellite entry when node fail precondition
Change-Id: I3c215170f10f0053e4f8718ee31d64d93f52ec80
2020-01-08 18:11:58 +00:00