Commit Graph

2461 Commits

Author SHA1 Message Date
littleskunk
9d1910cb2b
storagenode/orderarchive: Reduce TTL from 45 to 7 days (#2915) 2019-08-29 22:38:09 +02:00
JT Olio
b3f9a8813d pkg/process: remove prometheus help (#2914)
the current prometheus help messages have enough unexpected
characters that they are breaking prometheus parsing. they
may also be triggering prometheus to expect more from us (type
annotations) than we have to offer.

we're really not adding a lot of value with these help messages,
so just take them out

Change-Id: I9b723447a294bb492a6292480e9f88634346a80b
2019-08-29 12:42:11 -07:00
Ivan Fraixedes
537769d7fa
storagenode/orders: Don't return error Archiving unsent (#2903)
Don't return an error when archiving orders which aren't found in the DB
because it causes the Storage Node send orders cycle to stop.

This was applied in the commit e47b8ed131
but the last call to the orders.Archive function was missed, so the errors
weren't returned for not-found orders in the first call but they were
returned in the second call.

This commit addresses the second call so that the handleBatches function
never returns an error on not-found orders.
2019-08-29 20:22:22 +02:00
Matt Robinson
243d280591
Clean up jenkins message to slack (#2912) 2019-08-29 13:25:49 -04:00
Egon Elbre
8a5db77e04
storagenode/retain: add comment (#2910) 2019-08-29 19:42:17 +03:00
littleskunk
7e37452abb open port for storage node dashboard API (#2899) 2019-08-29 11:47:43 -04:00
Matt Robinson
aa37334e7a
Send failures to slack for added visibility (#2909) 2019-08-29 11:35:02 -04:00
ethanadams
4ede12a2ab
satellite/orders: Fix for V3-2529: Release v0.19.0 storage nodes can't submit orders, duplicate key value violates unique constraint (#2900)
* V3-2529: Add DB savepoint to fix issue with postgres. Add test to force a rejected order

Co-Authored-By: Ivan Fraixedes <ivan@fraixed.es>

* Update satellite/satellitedb/orders.go
2019-08-29 11:14:10 -04:00
Yehor Butko
24a36999ba
Revert "web/satellite: navigation, button and project members unit tests (#2904)" (#2905) 2019-08-29 18:06:17 +03:00
Yehor Butko
5bb51c9876
web/satellite: navigation, button and project members unit tests (#2904) 2019-08-29 17:49:34 +03:00
Nikolay Yurchenko
368f6cc320
web/satellite: account route redirect fix (#2895) 2019-08-29 17:41:27 +03:00
Yingrong Zhao
8eda360ad3
add segment path into logs (#2898) 2019-08-29 08:38:26 -04:00
Vitalii Shpital
07d6019a13
web/satellite: project members UI slightly reworked, bugs and tests fixed (#2896) 2019-08-29 13:05:22 +03:00
Michal Niewrzal
5fb823843f
Fix downloading non encrypted segments (#2870) 2019-08-29 10:00:20 +02:00
Yaroslav Vorobiov
b4d7d6778f
storagenode/reputation: add disqualified flag (#2862) 2019-08-28 23:54:12 +03:00
Egon Elbre
62e3bf5b34 storagenode/retain: fix concurrency issues (#2828)
* nicer flags

* fix concurrency

* add concurrent workers

* initialize things

* fix tests

* close retain service

* ensure we don't have workers working on the same satellite

* ensure things compile

* fix other compilation issues:

* concurrency changes

ran this with `go test -count=1000` and it passed all of them.

- we add a closed channel so that we can select on it with
  context cancellation.
- we put a once in so we only close the channel once.
- every time the queue/running state changes, we have to broadcast
  because we may want to wake up N pending Wait calls or other
  concurrent workers.
- because we broadcast, we don't need to do the polling in Wait
  anymore.
- ensure Run doesn't start multiple times so that we don't have
  to worry about concurrent Close with multiple Runs.
- hold the lock while we start workers so that a concurrent Close
  with Run can't decide that there's nothing started and exit
  and then have Run start things.
- make sure to poll the closed/context channels through loops
  or at the start of Run calls in case Close happens first.
- these polls should be under a mutex because they have a default
  case which makes it possible to schedule such that Close hasn't
  executed the channel close so it starts more work.
- cancel a local Run context when it's going to exit to make sure
  that any retainPieces calls have a canceled context.
- hopefully enough comments to both check my work and help readers
  digest what's going on.

Change-Id: Ida0e226a7e01e8ae64fa2c59dd5a84b04bccfbd7

* use the retain error class

Change-Id: I1511eaef135f98afd57b878e997e4c8a0d11cafc

* concurrency fixes again

- forgot to update the gc test to use the old Wait api.
- we need to drop the lock while we wait for the workers
  to exit, because they may be blocked on the condition
  variable
- additionally, we need to broadcast when we close the
  signal channel because the state changed: they want
  to wake up and exit.

Change-Id: I4204699792275260cd912f29aa73720f7d9b14b5

* undo my misguided rename

Change-Id: I6baffe1eb0434e260212c485bbcc01bed3250881

* remove pollInterval

* format paragraph more nicely

* move skew calculation into retain pieces
2019-08-28 16:35:25 -04:00
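The concurrency notes in 62e3bf5b34 (a closed channel guarded by a once, a broadcast on every queue/running state change, and polls of the closed channel done under the mutex) can be sketched as below. This is an illustrative reduction under stated assumptions, not the retain service itself: `Service`, `Queue`, `Run`, `Wait`, and `isClosed` are hypothetical names, jobs are plain strings, and context cancellation is left out to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a hypothetical stand-in for the retain service.
type Service struct {
	mu         sync.Mutex
	cond       *sync.Cond // signalled on every queue/running state change
	queued     []string
	working    int
	closed     chan struct{}
	closedOnce sync.Once
}

func NewService() *Service {
	s := &Service{closed: make(chan struct{})}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// isClosed polls the closed channel; callers must hold s.mu.
func (s *Service) isClosed() bool {
	select {
	case <-s.closed:
		return true
	default:
		return false
	}
}

// Queue adds a job and broadcasts, because the queue state changed.
func (s *Service) Queue(job string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.queued = append(s.queued, job)
	s.cond.Broadcast()
}

// Close closes the signal channel exactly once and wakes every waiter.
func (s *Service) Close() {
	s.closedOnce.Do(func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		close(s.closed)
		s.cond.Broadcast()
	})
}

// Wait blocks until nothing is queued or running, or the service is closed.
func (s *Service) Wait() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for (len(s.queued) > 0 || s.working > 0) && !s.isClosed() {
		s.cond.Wait()
	}
}

// Run starts workers and returns once the service is closed.
func (s *Service) Run(workers int, process func(string)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.mu.Lock()
			defer s.mu.Unlock()
			for !s.isClosed() {
				if len(s.queued) == 0 {
					s.cond.Wait()
					continue
				}
				job := s.queued[0]
				s.queued = s.queued[1:]
				s.working++
				s.mu.Unlock() // do the work without holding the lock
				process(job)
				s.mu.Lock()
				s.working--
				s.cond.Broadcast() // running state changed
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := NewService()
	done := make(chan string, 3) // buffered so workers never block
	for _, job := range []string{"a", "b", "c"} {
		s.Queue(job)
	}
	go s.Run(2, func(job string) { done <- job })
	s.Wait()
	s.Close()
	fmt.Println("processed:", len(done))
}
```

Because every state change broadcasts, `Wait` needs no polling interval, and because `Close` goes through the once, calling it repeatedly is safe. `Run` waits for its workers without holding the mutex, since workers blocked on the condition variable need the lock to wake up and exit.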
Egon Elbre
842c7118c5
docs/design: update audit v2 (#2901) 2019-08-28 22:47:43 +03:00
Ivan Fraixedes
c7bd75b032
docs/design: Storage Node downtime tracking (#2857)
Create the Storage Node downtime tracking design document, using the current template and revision workflow approval.
2019-08-28 21:05:18 +02:00
Jess G
3e121b13e0
docs/design: satellite service separation (#2815)
* add md extension to dd process doc

* rm old format of dd process doc

* wip design doc

* add diagram, add implementation

* adjust format

* fix format

* add updates per CR and arch review meeting

* update diagram sizes

* make image smaller

* try using svg instead of png

* replace png with svg, mv to docs/images dir

* updates per CR comments

* more CR comment changes

* update sa design img

* grammar fixes

* a few more updates

* fixing nits

* fix spelling err

* fix spelling, change binary to process, rm process doc

* add changes to design images per CR comments
2019-08-28 09:28:53 -07:00
Yehor Butko
7b874db8ce
web/satellite project related bugs fixed (#2894) 2019-08-28 17:23:37 +03:00
Egon Elbre
66ec727a37 jenkins: backwards-compatibility, don't overwrite installed binaries (#2892) 2019-08-28 09:57:34 -04:00
Nikolay Yurchenko
499c4d0c26
web/satellite: navigation bugs fixed (#2893) 2019-08-28 16:08:19 +03:00
Ivan Fraixedes
46a495fbaf docs/design: Fix typos & remove not applicable title (#2879) 2019-08-28 15:55:36 +03:00
Ivan Fraixedes
ec715cba9e docs/design: Remove subtitle part not applicable (#2880) 2019-08-28 15:38:08 +03:00
Matt Robinson
1b2a2d045f
[build]: Adjust arm32v6 and aarch64 images to match convention (#2845)
* Adjust arm32v6 and aarch64 docker images to match the hello-world image

* Update from master, fix potential bug in push-images target, and update storagenode deploy to handle arm64 image
2019-08-28 08:18:56 -04:00
Nikolay Yurchenko
8c24399438
web/satellite: usage api refactored (#2864) 2019-08-28 12:53:53 +03:00
Bogdan Artemenko
8fbb25f3b5
web/satellite: ProjectMembers unit-tests refactoring. (#2865) 2019-08-28 11:29:40 +03:00
Matt Robinson
f404aad878
Add deploy script for storagenodes (#2889) 2019-08-27 16:27:34 -04:00
Jennifer Li Johnson
c8405f6c2b
docs/design/archive: moves archive directory (#2885) 2019-08-27 14:59:52 -04:00
Ivan Fraixedes
b587c93f43
satellite/gc: Service run must call mon.Task (#2887)
The call to monkit for functions which mostly run from the beginning to
the end of the satellite process must be done because it only causes a
little overhead.
2019-08-27 20:20:27 +02:00
Natalie Villasana
49303ea3ac
satellite/audit: mv ReservoirService into its own file (#2886) 2019-08-27 13:39:51 -04:00
Cameron
599324c364
satellite/dbcleanup: delete expired serials from satellite (#2867)
Creates a new chore, dbcleanup, which can be used for routine deletion of items from the satellite database and adds functionality for deletion of expired serial numbers
2019-08-27 13:12:38 -04:00
Egon Elbre
c309bd3fec
lint: add linting for errs package (#2881) 2019-08-27 19:07:12 +03:00
Jennifer Li Johnson
9abc3e9d69
moves audit gating design draft to doc archive (#2883) 2019-08-27 10:58:02 -04:00
Kaloyan Raev
106a21ebe0
docs/design: Use WiX toolset directly instead of go-msi for Windows installer (#2851) 2019-08-27 16:06:00 +03:00
Bryan White
a33106df1c
satellite/satellitedb: persist piece counts to/from db (#2803) 2019-08-27 14:37:42 +02:00
Stefan Benten
d0ab3c03ec cmd/*: Change loglevel from error to warn (#2876) 2019-08-27 11:24:47 +02:00
Bill Thorp
a250551b6d storagenode/piecestore + uplink/piecestore: return PieceHash and original OrderLimit during GET_REPAIR (#2775) 2019-08-26 14:57:41 -04:00
Egon Elbre
977472ed32 all: use fewer storage nodes to improve test memory usage (#2875)
* storagenode/inspector: use fewer storage nodes

* lib/uplinkc: use fewer storage nodes
2019-08-26 14:40:44 -04:00
aligeti
33aff71959 satellitedb/overlay: add database for storing peer identities (#2764) 2019-08-26 19:49:42 +03:00
Cameron
1f3537d4a9 storagenode/vouchers: remove storagenode vouchers (#2873) 2019-08-26 19:35:19 +03:00
Alexander Leitner
462640a9fe docs/design: automatic updater (#2789) 2019-08-26 17:45:18 +03:00
Egon Elbre
d5667fbe35
lib/uplinkc: ensure it compiles on 32bit arch (#2835) 2019-08-26 16:12:26 +03:00
Egon Elbre
36c9d569ff
design/docs: add successful pingback to kademlia removal document (#2837) 2019-08-26 15:34:04 +03:00
Yingrong Zhao
051052307d satellite/rewards: add mongodb into partner info (#2800) 2019-08-26 15:19:19 +03:00
Egon Elbre
6ff94caf22
satellite/satellitedb: move tests near the interface (#2863) 2019-08-26 13:19:02 +03:00
Yingrong Zhao
4e16a5c598
satellite/marketingweb: fix broken pipe error (#2853)
* add a writer wrapper

* remove unused code

* read out the rest of the connection in client

* remove unused code

* no panic

* check response status code
2019-08-23 14:33:21 -04:00
Maximillian von Briesen
65e2d2e711
storagenode/piecestore: ignore canceled errors on download (#2822)
* ignore canceled errors on piecestore endpoint download
2019-08-23 11:16:43 -04:00
Yaroslav Vorobiov
2ae4129d06
satellite/nodestats: add disqualified flag (#2856) 2019-08-23 13:58:20 +03:00
JT Olio
12d50ebb99
streams: don't encrypt segment count (#2859)
What: this change makes sure the count of segments is not encrypted.

Why: having the segment count encrypted just makes things hard for no reason - a satellite operator can figure out how many segments an object has by looking at the other segments in the database. but if a user has access but has lost their encryption key, they now can't clean up or delete old segments because they can't know how many there are without just guessing until they get errors. :(

Backwards compatibility: clients will still understand old pointers and will still write old pointers. at some point in the future perhaps we can do a migration for remaining old pointers so we can delete the old code.

Please describe the tests: covered by existing tests

Please describe the performance impact: none
2019-08-22 15:15:58 -06:00