this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
Not having a skew caused an issue where:
1. Uplink calls "begin segment", where segment isn't committed to the
database.
2. Uplink stores piece X to the storage node A with timestamp 1.
3. Satellite runs garbage collection with timestamp 2.
4. Satellite sends retain request to storage node A with timestamp 2.
5. Storage node A deletes piece X, because 1 < 2.
6. Uplink calls "commit segment" with storage node A in it.
7. Download of segment fails, because A doesn't have piece X.
In production this is not an issue since the MaxTimeSkew is 72h by
default.
Change-Id: Id87ca3ddc44103dcd85d031b1367168c014b8e7b
We don't want slowloris nodes to be able to indefinitely block
up the satellite, so add a timeout. Some monitoring inspection
showed the largest success times being on the order of 30s, so
a 1min timeout should be sufficient to kill the misbehaving nodes.
Change-Id: I5e2c3480a15f6304e37262d0a4d30d07eae99bb3
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there's no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
* nicer flags
* fix concurrency
* add concurrent workers
* initialize things
* fix tests
* close retain service
* ensure we don't have workers working on the same satellite
* ensure things compile
* fix other compilation issues:
* concurrency changes
ran this with `go test -count=1000` and it passed all of them.
- we add a closed channel so that we can select on it with
context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
because we may want to wake up N pending Wait calls or other
concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
anymore.
- ensure Run doesn't start multiple times so that we don't have
to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
with Run can't decide that there's nothing started and exit
and then have Run start things.
- make sure to poll the closed/context channels through loops
or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
case which makes it possible to schedule such that Close hasn't
executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
digest what's going on.
Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
* use the retain error class
Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc
* concurrency fixes again
- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
to exit, because they may be blocked on the condition
variable
- additionally, we need to broadcast when we close the
signal channel because the state changed: they want
to wake up and exit.
Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5
* undo my misguided rename
Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881
* remove pollInterval
* format paragraph more nicely
* move skew calculation into retain pieces
The call to monkit for functions which mostly run from the beginning to
the end of the satellite process must be done because it only causes a
little overhead.
Add retain service on storagenode. This service runs retain jobs that have been queued by the storagenodes. Rather than running retain jobs during the grpc Retain() call, the grpc call queues a retain job to the retain service and returns immediately afterwards, removing a significant bottleneck in garbage collection.
Deprecate the pieceinfo database, and start storing piece info as a header to
piece files. Institute a "storage format version" concept allowing us to handle
pieces stored under multiple different types of storage. Add a piece_expirations
table which will still be used to track expiration times, so we can query it, but
which should be much smaller than the pieceinfo database would be for the
same number of pieces. (Only pieces with expiration times need to be stored in piece_expirations, and we don't need to store large byte blobs like the serialized
order limit, etc.) Use specialized names for accessing any functionality related
only to dealing with V0 pieces (e.g., `store.V0PieceInfo()`). Move SpaceUsed-
type functionality under the purview of the piece store. Add some generic
interfaces for traversing all blobs or all pieces. Add lots of tests.
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
* Added a gc package at satellite/gc, which contains the gc.Service, which runs garbage collection integrated with the metainfoloop, and the gc PieceTracker, which implements the metainfo loop Observer interface and stores all of the filters (about which pieces are good) for each node.
* Added a gc config located at satellite/gc/service.go (loop disabled by default in release)
* Creates bloom filters with pieces to be retained inside the metainfo loop
* Sends RetainRequests (or filters with good piece ids) to all storage nodes.