Commit Graph

203 Commits

Author SHA1 Message Date
Michal Niewrzal
a6721ba92f
satellite/metainfo: Improve metainfo ListSegments (#2882) 2019-08-30 23:30:18 +02:00
Egon Elbre
62e3bf5b34 storagenode/retain: fix concurrency issues (#2828)
* nicer flags

* fix concurrency

* add concurrent workers

* initialize things

* fix tests

* close retain service

* ensure we don't have workers working on the same satellite

* ensure things compile

* fix other compilation issues:

* concurrency changes

ran this with `go test -count=1000` and it passed all of them.

- we add a closed channel so that we can select on it with
  context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
  because we may want to wake up N pending Wait calls or other
  concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
  anymore.
- ensure Run doesn't start multiple times so that we don't have
  to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
  with Run can't decide that there's nothing started and exit
  and then have Run start things.
- make sure to poll the closed/context channels through loops
  or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
  case which makes it possible to schedule such that Close hasn't
  executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
  that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
  digest what's going on.

Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7
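
The notes above (a closed channel, a once guarding the close, and a broadcast on every state change) might look roughly like the following. This is a minimal sketch, not the retain service itself: `Service`, `Queue`, `Wait`, `worker` and `runDemo` are hypothetical names chosen for illustration, and the real service also handles context cancellation and per-satellite exclusion, which are left out here.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a hypothetical stand-in for a queue-backed service.
type Service struct {
	mu      sync.Mutex
	cond    *sync.Cond
	queue   []string
	running int
	done    []string
	closed  chan struct{} // closed exactly once to signal shutdown
	once    sync.Once
}

func NewService() *Service {
	s := &Service{closed: make(chan struct{})}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// isClosed polls the closed channel; callers must hold s.mu so that a
// concurrent Close cannot slip in between the poll and the next step.
func (s *Service) isClosed() bool {
	select {
	case <-s.closed:
		return true
	default:
		return false
	}
}

// Queue adds a job and broadcasts because the queue state changed.
func (s *Service) Queue(job string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.isClosed() {
		return false
	}
	s.queue = append(s.queue, job)
	s.cond.Broadcast()
	return true
}

// worker sleeps on the condition variable until there is work or a close.
func (s *Service) worker(wg *sync.WaitGroup) {
	defer wg.Done()
	s.mu.Lock()
	defer s.mu.Unlock()
	for {
		for len(s.queue) == 0 && !s.isClosed() {
			s.cond.Wait()
		}
		if s.isClosed() {
			return
		}
		job := s.queue[0]
		s.queue = s.queue[1:]
		s.running++
		s.mu.Unlock()
		// Real work would run here, outside the lock.
		s.mu.Lock()
		s.running--
		s.done = append(s.done, job)
		s.cond.Broadcast() // running state changed: wake Wait callers
	}
}

// Wait blocks without polling until nothing is queued or running.
func (s *Service) Wait() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for (len(s.queue) > 0 || s.running > 0) && !s.isClosed() {
		s.cond.Wait()
	}
}

// Close is safe to call many times; the once guards the channel close.
func (s *Service) Close() {
	s.once.Do(func() {
		s.mu.Lock()
		close(s.closed)
		s.cond.Broadcast() // state changed: blocked workers must wake and exit
		s.mu.Unlock()
	})
}

func runDemo() int {
	s := NewService()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go s.worker(&wg)
	}
	for _, job := range []string{"a", "b", "c"} {
		s.Queue(job)
	}
	s.Wait()
	s.Close()
	s.Close() // second call is a no-op
	wg.Wait() // done without holding s.mu, so workers can reacquire it and exit
	return len(s.done)
}

func main() {
	fmt.Println("processed:", runDemo())
}
```
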

* use the retain error class

Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc

* concurrency fixes again

- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
  to exit, because they may be blocked on the condition
  variable
- additionally, we need to broadcast when we close the
  signal channel because the state changed: they want
  to wake up and exit.

Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5

* undo my misguided rename

Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881

* remove pollInterval

* format paragraph more nicely

* move skew calculation into retain pieces
2019-08-28 16:35:25 -04:00
Cameron
599324c364
satellite/dbcleanup: delete expired serials from satellite (#2867)
Creates a new chore, dbcleanup, for routine deletion of items from the satellite database, and adds deletion of expired serial numbers.
2019-08-27 13:12:38 -04:00
Cameron
1f3537d4a9 storagenode/vouchers: remove storagenode vouchers (#2873) 2019-08-26 19:35:19 +03:00
Cameron
3d9441999a
storagenode/orders: add archive cleanup to orders service (#2821)
This PR introduces functionality for routine deletion of archived orders.

The user may specify an interval at which to run archive cleanup and a TTL for archived items. During each cleanup, all items that have reached the TTL are deleted.

The archive cleanup job and the order sender are combined into a new orders service.
2019-08-22 10:33:14 -04:00
Egon Elbre
00b2e1a7d7 all: enable staticcheck (#2849)
* by having megacheck in disable it also disabled staticcheck

* fix closing body

* keep interfacer disabled

* hide bodies

* don't use deprecated func

* fix dead code

* fix potential overrun

* keep stylecheck disabled

* don't pass nil as context

* fix infinite recursion

* remove extraneous return

* fix data race

* use correct func

* ignore unused var

* remove unused consts
2019-08-22 13:40:15 +02:00
Egon Elbre
9ec0ceddf3
pkg/revocation: ensure we close revocation databases (#2825) 2019-08-20 18:04:17 +03:00
Isaac Hess
25154720bd
lib/uplink: remove redis and bolt dependencies (#2812)
* identity: remove redis and bolt dependencies

* identity: move revDB creation to main files
2019-08-19 16:10:38 -06:00
Maximillian von Briesen
d83a965139
storagenode/piecestore: Add retain service on storagenode (#2785)
Add a retain service on the storagenode. The service runs retain jobs that have been queued by the storagenodes. Rather than running a retain job during the gRPC Retain() call, the call queues the job with the retain service and returns immediately, removing a significant bottleneck in garbage collection.
2019-08-19 14:52:47 -04:00
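
The queue-and-return pattern from #2785 could be sketched as below. `Endpoint`, `RetainRequest` and `errQueueFull` are hypothetical, and rejecting a request when the buffer is full is an assumption of this sketch rather than behavior stated in the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// RetainRequest is a hypothetical, simplified request.
type RetainRequest struct{ Satellite string }

// Endpoint hands requests to a background worker through a buffered channel.
type Endpoint struct{ jobs chan RetainRequest }

var errQueueFull = errors.New("retain queue full")

// Retain never does the expensive work itself: it enqueues and returns.
func (e *Endpoint) Retain(req RetainRequest) error {
	select {
	case e.jobs <- req:
		return nil
	default:
		return errQueueFull // assumption: reject instead of blocking the caller
	}
}

func main() {
	e := &Endpoint{jobs: make(chan RetainRequest, 2)}
	fmt.Println(e.Retain(RetainRequest{"sat-1"}))
	fmt.Println(e.Retain(RetainRequest{"sat-2"}))
	close(e.jobs)
	// A worker would normally drain the channel concurrently.
	for job := range e.jobs {
		fmt.Println("running retain job for", job.Satellite)
	}
}
```
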
Egon Elbre
9eba5ac631
lib/uplink: remove Seek method (#2768) 2019-08-13 20:29:02 +03:00
Jess G
022f5d2e14
storagenode: add space used cache for pieces (#2753)
* add cache, update cache w/piece create/delete

* add service w/loop to cache to recalculate space used cache

* add piecestore cache to other sn svcs to use

* add table to persist the total space used

* rm cache where not needed

* rm stuff from sn svcs

* start fixing tests, changes per comments

* update commits

* add unit tests

* fix committing before we write header bytes

* fix cache create test

* copy cache map, add started back to recalc

* fix test

* add test, update comments
2019-08-12 14:43:05 -07:00
Yaroslav Vorobiov
4cf2b96731
storagenode/nodestats: fix test duration (#2748) 2019-08-09 14:12:32 +03:00
Yaroslav Vorobiov
28a7778e9e
storagenode/nodestats: cache node stats (#2543) 2019-08-08 16:47:04 +03:00
Simon Guindon
f236fc3d91
Remove the use of in memory SQLite3 tables for storage nodes. (#2726) 2019-08-07 10:52:00 -04:00
Egon Elbre
c8edeb0257
satellite/overlay: rename overlay.Cache to overlay.Service (#2717) 2019-08-06 19:35:59 +03:00
Jeff Wendling
21a3bf89ee cmd/uplink: use scopes to open (#2501)
What: Change cmd/uplink to use scopes

It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.

Why: So that it can read scopes in from files and stuff
2019-08-05 11:01:20 -06:00
Egon Elbre
369a51ed00 lib/uplink: ensure it's silent by default (#2676) 2019-08-01 07:14:09 -04:00
ethanadams
c9b46f2fe2
V3-1987: Optimize audits stats persistence (#2632)
* Added batch update stats for recordAuditSuccessStatus
* Added batch update stats to recordAuditFailStatus
* added configurable batch size
* build individual update/delete statements so the statements can be batched into 1 call to the DB
* notified #config-changes channel and ran make update-satellite-config-lock
* updated tests to use batch update stats
2019-07-31 13:21:06 -04:00
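
The batching idea in #2632 (build individual statements, then send them in groups of a configurable size) might be sketched as follows. `batch` is a hypothetical helper and the statements are placeholder strings; no real database API is involved.

```go
package main

import (
	"fmt"
	"strings"
)

// batch groups statements into chunks of at most size and joins each chunk
// so that it could be sent to the database in a single call.
func batch(stmts []string, size int) []string {
	if size < 1 {
		size = 1
	}
	var out []string
	for start := 0; start < len(stmts); start += size {
		end := start + size
		if end > len(stmts) {
			end = len(stmts)
		}
		out = append(out, strings.Join(stmts[start:end], "; "))
	}
	return out
}

func main() {
	stmts := []string{"s1", "s2", "s3", "s4", "s5"}
	for i, b := range batch(stmts, 2) {
		fmt.Printf("call %d: %s\n", i+1, b)
	}
}
```
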
Egon Elbre
ec3d5c0bdd
don't use global loggers (#2671)
* pkg/server: don't use global logger
* satellite/overlay: use correct logger
* pkg/kademlia: use correct logger
* linksharing: use conventional way to pass in logger
* use zaptest in tests
2019-07-31 15:09:45 +03:00
ethanadams
8f8b13abb9
Re-enable SN bandwidth rollups. Fix SN bandwidth rollup unique constraint issue. Re-organize service code (#2617)
* re-organizing into bandwidth service. re-enable rollup loop
* Prevent uniqueness failure in bandwidth rollup
* Add test to make sure the rollup select date range works correctly
* add bandwidth config for rollup interval
2019-07-29 10:07:52 -04:00
Egon Elbre
5d0816430f
rename all the things (#2531)
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
2019-07-28 08:55:36 +03:00
Maximillian von Briesen
906c77b55a
Add RetainStatus to storagenode config (#2633)
--storage2.retain-status = "disabled" (default), "debug", or "enabled"
2019-07-26 16:49:08 -04:00
Egon Elbre
0cdeae1922 add missing error handling (#2630) 2019-07-25 17:01:44 +02:00
Natalie Villasana
f11413bc8e Implement garbage collection on satellite (#2577)
* Added a gc package at satellite/gc, which contains the gc.Service, which runs garbage collection integrated with the metainfoloop, and the gc PieceTracker, which implements the metainfo loop Observer interface and stores all of the filters (about which pieces are good) for each node.
* Added a gc config located at satellite/gc/service.go (loop disabled by default in release)
* Creates bloom filters with pieces to be retained inside the metainfo loop
* Sends RetainRequests (or filters with good piece ids) to all storage nodes.
2019-07-24 13:26:43 -04:00
Jess G
353b089927
update testplanet with libuplink (#2618)
* update testplanet uplink upload with libuplink

* add libuplink to testplanet download

* update createbucket and delete obj with libuplink

* update downloadStream, fix tests

* fix test

* updates for CR comments
2019-07-23 07:58:45 -07:00
Jennifer Li Johnson
53d96be44a
Stylistic Go Cleanup (#2524) 2019-07-22 15:10:04 -04:00
Maximillian von Briesen
6c1c3fb4a7
Add metainfo loop service (#2563)
Add a metainfo loop service on the satellite that can be subscribed to by various services that need to make use of metainfo information
2019-07-22 09:34:12 -04:00
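
The subscription model described in #2563 could look something like this sketch, in which several services observe a single pass over the segments. `Observer`, `Loop`, `Join`, `RunOnce` and `counter` are hypothetical names; the real interface is not shown in the commit message.

```go
package main

import "fmt"

// Observer is notified of every segment the loop visits.
type Observer interface {
	Segment(path string)
}

// Loop iterates the segments once and fans each one out to all observers.
type Loop struct{ observers []Observer }

func (l *Loop) Join(o Observer) { l.observers = append(l.observers, o) }

func (l *Loop) RunOnce(segments []string) {
	for _, seg := range segments {
		for _, o := range l.observers {
			o.Segment(seg)
		}
	}
}

// counter is a trivial observer that counts what it sees.
type counter struct{ seen int }

func (c *counter) Segment(string) { c.seen++ }

func main() {
	loop := &Loop{}
	a, b := &counter{}, &counter{}
	loop.Join(a)
	loop.Join(b)
	loop.RunOnce([]string{"seg-1", "seg-2", "seg-3"})
	fmt.Println(a.seen, b.seen)
}
```
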
Egon Elbre
13dd501042
storagenode/storagenodedb: move tests near the interface rather than the implementation (#2596) 2019-07-19 20:40:27 +03:00
Egon Elbre
f6f65a80d7
storagenode/trust: implement fetching peer identity without kademlia and endpoint (#2584) 2019-07-17 21:14:44 +03:00
Alexander Leitner
64b2769de3
discovery: parallelize refresh (#2535)
* parallelize discovery refresh

* add paginateQualifiedtest, address pr comments

* Remove duplicate uptime update

* Lower concurrency in Testplanet for discovery
2019-07-12 10:35:48 -04:00
Ivan Fraixedes
f420b29d35
[V3-1927] Repairer uploads to max threshold instead of success… (#2423)
* pkg/datarepair: Add test to check num upload pieces
  Add a new test that checks the number of pieces the repair process
  uploads when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
  Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
  Update the test that checks the number of pieces uploaded during a
  repair so that it uses the same excess over the success threshold as
  the implementation.
* satellite/orders: Limit repair put orders to fewer than the total
  Limit the number of put orders used by repair so that pieces are uploaded
  only up to a % excess over the success threshold.
* pkg/datarepair: Change DataRepair test to pass again
  Change the DataRepair test so that it passes again now that repair
  uploads repaired pieces only up to a % excess over the success
  threshold.
  Also update the step descriptions of the DataRepair test to match its
  current behavior, and make them more generic so that they do not need
  updating after small future refactorings.
* satellite: Make repair excess optimal threshold configurable
  Add a new satellite configuration parameter for the percentage excess
  over the optimal threshold, used to determine how many pieces should be
  repaired/uploaded, rather than hard coding the value.
* repairer: Add configurable param to segments/repairer
  Add a new parameter to segments/repairer to calculate the maximum
  number of excess nodes, based on the optimal threshold, to which repaired
  pieces can be uploaded.
  This parameter ensures that no more nodes are returned than the number
  of upload orders that the satellite's data repair service calculates for
  repairing pieces.
* pkg/storage/ec: Update log message in client.Repair
* satellite: Update configuration lock file
2019-07-12 00:44:47 +02:00
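
The arithmetic behind the repair limit in #2423 can be sketched as below: upload up to a configurable excess over the optimal threshold, but never more than the total piece count. `repairUploadCount` is a hypothetical helper, and rounding up with `math.Ceil` is an assumption; the commit message does not say how the real code rounds.

```go
package main

import (
	"fmt"
	"math"
)

// repairUploadCount returns how many pieces a repair may upload: the optimal
// threshold plus a fractional excess, capped at the total number of pieces.
func repairUploadCount(optimal, total int, excess float64) int {
	n := int(math.Ceil(float64(optimal) * (1 + excess)))
	if n > total {
		n = total
	}
	return n
}

func main() {
	fmt.Println(repairUploadCount(29, 95, 0.05)) // 29 * 1.05 = 30.45, rounded up
	fmt.Println(repairUploadCount(90, 95, 0.20)) // would exceed the total, so capped
}
```
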
Egon Elbre
d52f764e54
protocol: implement new piece signing and verification (#2525) 2019-07-11 16:51:40 -04:00
Maximillian von Briesen
8b507f3d73 Address concerns with storagenode Retain endpoint (#2527) 2019-07-11 16:04:21 -04:00
Bill Thorp
0e463dccfd
7 day validity window for order limits (#2520)
* 7 day limit
2019-07-10 17:17:00 -04:00
JT Olio
a79c7d77f3 overlay cache: slight modification of node-is-online rules (#2490) 2019-07-09 22:36:09 -04:00
Jeff Wendling
d616be8ae0 storagenode: use minimum time in the order for expiration (#2504) 2019-07-09 17:16:30 -04:00
Stefan Benten
16156e3b3d
Ensure we force a segment size and account storage before committing them (#2473) 2019-07-08 18:24:38 -04:00
Egon Elbre
674742d1a7
satellite/datarepair: use reliability cache (#1976) 2019-07-09 01:04:35 +03:00
Kaloyan Raev
f9ed0dc1a8
Improve stability of TestDownloadSharesDownloadTimeout (#2210) 2019-07-05 19:04:15 +03:00
aligeti
ae8b9698f9
Rename/remove EncryptionScheme -> EncryptionParame… (#2363)
* rename/remove EncryptionScheme -> EncryptionParameters
2019-07-03 14:07:44 -04:00
Cameron
d499d162f4
implement storj.NodeURL in trusted satellites (#2388)
* implement storj.NodeURL in trusted satellites
2019-07-03 13:29:18 -04:00
Kaloyan Raev
ca0058c9f1
Set MinDownloadTimeout to 5s in testplanet (#2447) 2019-07-03 17:49:08 +03:00
Egon Elbre
38f3d860a4
storagenode: decline uploads when there are too many live requests (#2397) 2019-07-03 16:47:55 +03:00
Alexander Leitner
6d55bbdb57
OrderLimit creation date time limit (#2412)
* Limit by order creation
2019-07-02 12:06:12 -04:00
Stefan Benten
3583c65f5b
Move from allowed range to minimum for Version Control (#2421) 2019-07-02 17:28:06 +02:00
Maximillian von Briesen
52e5a4eee3 pass logger into repairer and ecclient (#2365) 2019-07-02 13:08:02 +03:00
Natalie Villasana
3f643551e7 remove flakiness in TestDataRepair and TestSegmentStoreRepair (#2335)
* stop audit loop in repair tests to prevent possible timeout
2019-07-01 11:15:45 -04:00
Egon Elbre
2b68a72428
internal/testplanet: ensure that metainfo connections get closed (#2381) 2019-07-01 17:35:10 +03:00
Jeff Wendling
efcdaa43a3
lib/uplink: encryption context (#2349)
* lib/uplink: encryption context

Change-Id: I5c23dca3286a46b713b30c4997e9ae6e630b2280

* lib/uplink: bucket operation examples

Change-Id: Ia0f6e69f365dcff0cf11c731f51b30842bce053b

* lib/uplink: encryption key sharing test cases

Change-Id: I3a172d565f33f4e591402cdcb9460664a7cc7fbe

* fix encrypted path prefix restriction issue

Change-Id: I8f3921f9d52aaf4b84039de608b8cbbc88769554

* implement panics in libuplink encryption code

todo on cipher suite selection as well as an api concern

Change-Id: Ifa39eb3cc4b3443f7d96f9304df9b2ac4ec4085d

* implement GetProjectInfo api call to get salt

Change-Id: Ic5f6b3be9ea35df48c1aa214ab5d355fb328e2cf

* some fixes and accessors for encryption store

Change-Id: I3bb61f6712a037900e2a96e72ad4029ec1d3f718

* general fixes to builds/tests/etc

Change-Id: I9930fa96acb3b221d9a001f8e274af5729cc8a47

* java bindings changes

Change-Id: Ia2bd4c9c69739c8d3154d79616cff1f36fb403b6

* get libuplink examples passing

Change-Id: I828f09a144160e0a5dd932324f78491ae2ec8a07

* fix proto.lock file

Change-Id: I2fbbf4d0976a7d0473c2645e6dcb21aaa3be7651

* fix proto.lock again

Change-Id: I92702cf49e1a340eef6379c2be4f7c4a268112a9

* fix golint issues

Change-Id: I631ff9f43307a58e3b25a58cbb4a4cc2495f5eb6

* more linting fixes

Change-Id: I51f8f30b367b5bca14c94b15417b9a4c9e7aa0ce

* bug fixed by structs bump

Change-Id: Ibb03c691fce7606c35c08721b3ef0781ab48a38a

* retrigger

Change-Id: Ieee0470b6a2d07168a1578552e8e7f271ae93a13

* retrigger

Change-Id: I753d63853171e6a436c104ce176048892eb974c5

* semantic merge conflict

Change-Id: I9419448496de90340569047a6a16a1b858a7978a

* update total to match prod defaults

Change-Id: I693d55c1ebb28b5803ee1d26e9e198decf82308b

* retrigger

Change-Id: I28b74d5d6202f61aa3866fe407d423f6a0a14b9e

* retrigger

Change-Id: I6fd054885c715f602e2cef623fd464c42e88742c

* retrigger

Change-Id: I6a01bae88c72406d4ed5a8f13bf8a2b3c650bd2d
2019-06-27 17:36:51 +00:00
Egon Elbre
7b66e0cd7c Use dial to clarify that it's internally closing the connection. (#2347) 2019-06-26 15:14:48 +03:00