Commit Graph

57 Commits

Author SHA1 Message Date
Egon Elbre
64c8de6ea5 mod: use vendored base58
Change-Id: I5aa29515928848c862500330218cc094618638d7
2022-01-31 15:54:33 +02:00
Michał Niewrzał
c258f4bbac private/testplanet: move Metabase outside Metainfo for satellite
At some point we moved metabase package outside Metainfo
but we didn't do that for satellite structure. This change
refactors only tests.
When uplink will be adjusted we can remove old entries in
Metainfo struct.

Change-Id: I2b66ed29f539b0ec0f490cad42c72840e0351bcb
2021-09-09 07:15:51 +00:00
Egon Elbre
ca64e55281 satellite/gc: remove skip first
We used this to reduce initial load on the core to avoid OOM. However,
this is not a problem anymore with garbage collection running
separately.

Change-Id: Ifd62c822a74974bc21a5913199334469a4bc0130
2021-06-21 18:30:38 +00:00
Egon Elbre
9b2607d6ba satellite: remove garbage collection option from core
We don't run it anywhere in this configuration, so it's not worthwhile
to keep it that way.

Change-Id: I88afb8bb3eb3843801b15454408f10d1353596cb
2021-06-15 21:07:02 +03:00
JT Olio
da9ca0c650 testplanet/satellite: reduce the number of places default values need to be configured
Satellites set their configuration values to default values using
cfgstruct, however, it turns out our tests don't test these values
at all! Instead, they have a completely separate definition system
that is easy to forget about.

As is to be expected, these values have drifted, and it appears
in a few cases test planet is testing unreasonable values that we
won't see in production, or perhaps worse, features enabled in
production were missed and weren't enabled in testplanet.

This change makes it so all values are configured the same,
systematic way, so it's easy to see when test values are different
than dev values or release values, and it's less hard to forget
to enable features in testplanet.

In terms of reviewing, this change should be actually fairly
easy to review, considering private/testplanet/satellite.go keeps
the current config system and the new one and confirms that they
result in identical configurations, so you can be certain that
nothing was missed and the config is all correct.
You can also check the config lock to see what actual config
values changed.

Change-Id: I6715d0794887f577e21742afcf56fd2b9d12170e
2021-06-01 22:14:17 +00:00
Michał Niewrzał
e76cbc9bd5 satellite/gc: move GC to segments loop
This change is refactor to move GC from metainfo loop
(objects/segments)  to segments loop.

Change-Id: I21f1ff7cb0b6f98c41aa8930447b8d9bea227975
2021-06-01 20:36:02 +00:00
Egon Elbre
69b149a66f mod: bump uplink
uplink stopped using zap, hence some of the private methods needed to be
changed.

Change-Id: Iac1fae45a40cd3f1649b9f672bf8c250344986d5
2021-05-06 14:48:36 +00:00
Egon Elbre
961e841bd7 all: fix error naming
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:

    ERROR node stats service error: satellitedbs error: node stats database error: no rows

Whereas something like:

    ERROR nodestats service: satellitedbs: nodestatsdb: no rows

Would contain all the necessary information without the stutter.

Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
2021-04-29 15:38:21 +03:00
Michał Niewrzał
7944df20d6 storj: use multipart API
Change-Id: I10b401434e3e77468d12ecd225b41689568fd197
2021-04-26 13:15:09 +00:00
Egon Elbre
4c9ed64f75 satellite/metabase/metaloop: move loop under metabase
Currently the loop handling is heavily related to the metabase rather
than metainfo.

metainfo over time has become related to the "public API" for accessing
the metabase data.

Currently updates monkit.lock, because monkit monitoring does not handle
ScopeNamed correctly. Needs a followup change to monitoring check.

Change-Id: Ie50519991d718dfb872ec9a0176a82e732c97584
2021-04-22 12:58:09 +03:00
Egon Elbre
267506bb20 satellite/metabase: move package one level higher
metabase has become a central concept and it's more suitable for it to
be directly nested under satellite rather than being part of metainfo.

metainfo is going to be the "endpoint" logic for handling requests.

Change-Id: I53770d6761ac1e9a1283b5aa68f471b21e784198
2021-04-21 15:54:22 +03:00
Fadila Khadar
bde367ae73 satellite/gc: check on bloom filter creation date
Check that the bloom filter creation date is earlier than the
metainfo loop system time used for db scanning.

Change-Id: Ib0f47c124f5651deae0fd7e7996abcdcaac98fb4
2021-04-14 16:40:37 +00:00
Kaloyan Raev
035c393da0 satellite: update tests to pass etag.Reader to multipart.PutObjectPart
Change-Id: Ibe99357945ae7a91f5b5d4f87b83d425c9fa84a5
2021-03-29 13:18:11 +00:00
Egon Elbre
f19ef4afe5 satellite/metainfo/metaloop: move loop to a separate package
Change-Id: I94c931a27c1af6062185ec62688624ec02050f11
2021-03-23 15:37:34 +00:00
Michał Niewrzał
9a60011774 Merge remote-tracking branch 'origin/main' into multipart-upload
Change-Id: Ia90f29be432e207c4125f7f955c912978eabe59a
2021-02-04 09:38:08 +01:00
Egon Elbre
c4578eb3ec satellite/gc: add test for pending object
Change-Id: Ifb076ab38442f88f94a3e0c2ae1b19528a55f724
2020-12-22 09:42:32 +00:00
Fadila Khadar
724b0f91eb satellite/gc: update tests to use metabase
Change-Id: I13c6c02a46254ea1d7176c0c6045fd24dd117a58
2020-12-16 10:38:24 +00:00
Kaloyan Raev
fc85179a19 satellite/metainfo: refactor SegmentLocation.Index to SegmentPosition
Change-Id: Ic9403c8126712693326dd83d6ba4f3b84be3e0c7
2020-12-14 13:35:53 +02:00
Stefan Benten
494bd5db81
all: golangci-lint v1.33.0 fixes (#3985) 2020-12-05 17:01:42 +01:00
Ivan Fraixedes
7eb3b2d6d0
satellite/gc: Init map with an aprox size
Because the PieceTracker receives a piece count per nodes which is an
approximation of the number of nodes that they are going to be reported
by the metainfo loop so we can use as a good guess of the map's size and
initialized with it.

Change-Id: I644db40926c03e4c457457fb41d2ec1da059cea6
2020-11-27 10:44:19 +01:00
Kaloyan Raev
92a2be2abd satellite/metainfo: get away from using pb.Pointer in Metainfo Loop
As part of the Metainfo Refactoring, we need to make the Metainfo Loop
working with both the current PointerDB and the new Metabase. Thus, the
Metainfo Loop should pass to the Observer interface more specific Object
and Segment types instead of pb.Pointer.

After this change, there are still a couple of use cases that require
access to the pb.Pointer (hence we have it as a field in the
metainfo.Segment type):
1. Expired Deletion Service
2. Repair Service

It would require additional refactoring in these two services before we
are able to clean this.

Change-Id: Ib3eb6b7507ed89d5ba745ffbb6b37524ef10ed9f
2020-10-27 13:06:47 +00:00
Michal Niewrzal
9202295348 satellite/metainfo: replace ScopedPath with metabase.SegmentLocation
Change-Id: I7e89c9e8eaeae58be828a32ad47ed3028501f4c7
2020-09-04 10:06:52 +00:00
Michal Niewrzal
aa47e70f03 satellite/metainfo: use metabase.SegmentKey with metainfo.Service
Instead of using string or []byte we will be using dedicated type
SegmentKey.

Change-Id: I6ca8039f0741f6f9837c69a6d070228ed10f2220
2020-09-03 15:11:32 +00:00
Egon Elbre
c86c732fc0 satellite: simplify tests
satellite.DB.Console().Projects().GetAll database query
can be replaced with planet.Uplinks[0].Projects[0].ID

Change-Id: I73b82b91afb2dde7b690917345b798f9d81f6831
2020-08-28 22:28:04 +00:00
Egon Elbre
94a09ce20b all: add missing dots
Change-Id: I93b86c9fb3398c5d3c9121b8859dad1c615fa23a
2020-08-11 17:50:01 +03:00
Egon Elbre
080ba47a06 all: fix dots
Change-Id: I6a419c62700c568254ff67ae5b73efed2fc98aa2
2020-07-16 14:58:28 +00:00
Michal Niewrzal
84892631c8 private/testplanet: remove old libuplink from testplanet
Change-Id: Ib1553f84d0b3ae12a5b00382f0f53357b6a273e2
2020-05-28 13:50:23 +00:00
Egon Elbre
ed627144ed all: use DialNodeURL throughout the codebase
Change-Id: Iaf9ae3aeef7305c937f2660c929744db2d88776c
2020-05-20 10:36:30 +00:00
Egon Elbre
ec589a8289 all: fix comments about grpc
Change-Id: Id830fbe2d44f083c88765561b6c07c5689afe5bd
2020-05-11 13:05:34 +03:00
Egon Elbre
e8f18a2cfe private/testplanet: expose storagenode and satellite Config
Change-Id: I80fe7ed8ef7356948879afcc6ecb984c5d1a6b9d
2020-03-27 17:01:25 +02:00
Natalie Villasana
8e0ca0e6f5
satellite/gc: update release default for gc to run separately (#3830) 2020-03-26 14:44:18 -04:00
Stefan Benten
173cb1e484
Changing LogLevel to Warn (#3822)
This is not a process error and can cause false alarm for monitoring systems
2020-03-24 13:46:28 +01:00
Jessica Grebenschikov
5142874144 satellite/gc: move garbage collection to its own process
Change-Id: I7235aa83f7c641e31c62ba9d42192b2232dca4a5
2020-03-18 16:44:01 +00:00
Egon Elbre
5342dd9fe6 go.mod: update uplink
Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811
2020-02-21 16:08:12 +02:00
JT Olio
2ae9978304 satellite/gc: skip first gc run
rationale: if GC kills the satellite, it would be nice to make
it through a repair checker sweep first

Change-Id: Id56171dc8e13940cfb6481e36a910bad077a01ed
2020-02-13 13:41:15 +02:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Egon Elbre
8dea4f52db satellite: add control panel
Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b
2020-01-30 08:06:43 -05:00
Egon Elbre
10d932fd65 lib/uplinkc: fix test flakiness by setting MaxTimeSkew
Not having a skew caused an issue where:

1. Uplink calls "begin segment", where segment isn't committed to the
database.
2. Uplink stores piece X to the storage node A with timestamp 1.
3. Satellite runs garbage collection with timestamp 2.
4. Satellite sends retain request to storage node A with timestamp 2.
5. Storage node A deletes piece X, because 1 < 2.
6. Uplink calls "commit segment" with storage node A in it.
7. Download of segment fails, because A doesn't have piece X.

In production this is not an issue since the MaxTimeSkew is 72h by
default.

Change-Id: Id87ca3ddc44103dcd85d031b1367168c014b8e7b
2020-01-20 12:44:42 +00:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Jeff Wendling
29fe206b9a satellite/gc: add timeout to retain requests
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.

Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
2020-01-03 21:46:46 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Egon Elbre
cc032d3151 satellite/metainfo: fix some uses of metainfo.Delete (#3513)
* satellite/metainfo: rename Delete to UnsynchronizedDelete

* fix deletes

* make db private

* fix typos

* also verify on commit object
2019-11-06 18:02:14 +01:00
littleskunk
7eb6724c92
logging: unify logging around satellite ID, node ID and piece ID (#3491)
* logging: unify logging around satellite ID, node ID and piece ID

* unify segment index
2019-11-05 22:04:07 +01:00
Jeff Wendling
098cbc9c67 all: use pkg/rpc instead of pkg/transport
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.

most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.

a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.

Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
2019-09-25 15:37:06 -06:00
Jess G
7c203b4884
add satelliteSystem to testplanet and update tests (#3066) 2019-09-17 13:14:49 -07:00
Egon Elbre
7240e6cbb2
satellite: remove remote/inline file from BucketTally (#3041) 2019-09-13 16:51:41 +03:00
Egon Elbre
8b668ab1f8
satellite/metainfo.Loop: use a parsed path for observers (#3003) 2019-09-12 13:38:49 +03:00
Egon Elbre
a801fab66a
all: add archview annotations (#2964) 2019-09-10 16:24:16 +03:00
Egon Elbre
62e3bf5b34 storagenode/retain: fix concurrency issues (#2828)
* nicer flags

* fix concurrency

* add concurrent workers

* initialize things

* fix tests

* close retain service

* ensure we don't have workers working on the same satellite

* ensure things compile

* fix other compilation issues:

* concurrency changes

ran this with `go test -count=1000` and it passed all of them.

- we add a closed channel so that we can select on it with
  context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
  because we may want to wake up N pending Wait calls or other
  concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
  anymore.
- ensure Run doesn't start multiple times so that we don't have
  to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
  with Run can't decide that there's nothing started and exit
  and then have Run start things.
- make sure to poll the closed/context channels through loops
  or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
  case which makes it possible to schedule such that Close hasn't
  executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
  that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
  digest what's going on.

Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7

* use the retain error class

Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc

* concurrency fixes again

- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
  to exit, because they may be blocked on the condition
  variable
- additionally, we need to broadcast when we close the
  signal channel because the state changed: they want
  to wake up and exit.

Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5

* undo my misguided rename

Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881

* remove pollInterval

* format paragraph more nicely

* move skew calculation into retain pieces
2019-08-28 16:35:25 -04:00