Commit Graph

109 Commits

Author SHA1 Message Date
Egon Elbre
0a69da4ff1 all: switch to storj.io/common/uuid
Change-Id: I178a0a8dac691e57bce317b91411292fb3c40c9f
2020-03-31 19:16:41 +03:00
Egon Elbre
439aba922a satellite/overlay: reduce overhead of GetNodes
Instead of filtering on the client side it's better to filter on the
database side.

Change-Id: I845fbbe5ed28c2ffdb0b8a3f789b59c094fd1069
2020-03-30 18:36:23 +03:00
Egon Elbre
cb781d66c7 satellite/overlay: optimize FindStorageNodes
Reduce the number of fields returned from the query.

Benchmark results in `satellite/overlay`:

benchstat before.txt after2.txt
name                               old time/op  new time/op  delta
SelectStorageNodes-32              7.85ms ± 1%  6.27ms ± 1%  -20.18%  (p=0.002 n=10+4)
SelectNewStorageNodes-32           8.21ms ± 1%  6.61ms ± 0%  -19.53%  (p=0.002 n=10+4)
SelectStorageNodesExclusion-32     17.2ms ± 1%  15.9ms ± 1%   -7.55%  (p=0.002 n=10+4)
SelectNewStorageNodesExclusion-32  17.8ms ± 2%  16.1ms ± 0%   -9.38%  (p=0.002 n=10+4)
FindStorageNodes-32                48.4ms ± 1%  45.1ms ± 0%   -6.69%  (p=0.002 n=10+4)
FindStorageNodesExclusion-32       79.2ms ± 1%  76.1ms ± 1%   -3.89%  (p=0.002 n=10+4)

Benchmark results from `satellite/overlay` after making them parallel:

benchstat before-parallel.txt after2-parallel.txt
name                               old time/op  new time/op  delta
SelectStorageNodes-32               548µs ± 1%   353µs ± 1%  -35.60%  (p=0.029 n=4+4)
SelectNewStorageNodes-32            562µs ± 0%   368µs ± 0%  -34.51%  (p=0.029 n=4+4)
SelectStorageNodesExclusion-32     1.02ms ± 1%  0.84ms ± 0%  -18.08%  (p=0.029 n=4+4)
SelectNewStorageNodesExclusion-32  1.03ms ± 1%  0.86ms ± 2%  -16.22%  (p=0.029 n=4+4)
FindStorageNodes-32                3.11ms ± 0%  2.79ms ± 1%  -10.27%  (p=0.029 n=4+4)
FindStorageNodesExclusion-32       4.75ms ± 0%  4.43ms ± 1%   -6.56%  (p=0.029 n=4+4)
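
For readers unfamiliar with the "parallel" variant mentioned above, the following is a minimal sketch of how a Go benchmark can be run from several goroutines with `b.RunParallel`. The function `selectNodes` is a hypothetical stand-in for the code under test and is not taken from `satellite/overlay`; the actual benchmarks may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// selectNodes is a hypothetical workload standing in for a node selection query.
func selectNodes(n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = n - i
	}
	sort.Ints(out)
	return out
}

// benchmarkSelectParallel spreads the b.N iterations over multiple goroutines.
func benchmarkSelectParallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		// Each goroutine pulls iterations until the shared budget is used up.
		for pb.Next() {
			_ = selectNodes(100)
		}
	})
}

// runBenchmark executes the benchmark without `go test` and returns the iteration count.
func runBenchmark() int {
	return testing.Benchmark(benchmarkSelectParallel).N
}

func main() {
	fmt.Println("iterations:", runBenchmark())
}
```

Because the iterations are shared across goroutines, the reported time per operation reflects throughput under concurrency rather than single-call latency, which would be consistent with the much smaller absolute numbers in the second table.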

Change-Id: I1d85e2764eb270f4c2b1998303ccfc1179d65b26
2020-03-30 18:36:23 +03:00
Ethan
bdbf764b86 satellite/orders;overlay: Consolidate order limit storage node lookups into 1 query.
https://storjlabs.atlassian.net/browse/SM-449
Change-Id: Idc62cc2978fba67cf48f7c98b27b0f996f9c58ac
2020-03-16 23:15:47 +00:00
Jess G
39cb821196
satellite/overlay: rm combinedcache, fix IP naming to be network (#3798)
* rm combinedcache, rm dns node lookup

Change-Id: I239f07211764b097d851230d8c81900a47756e9e

* excludeIPs -> excludedNetworks

Change-Id: Ifa6f44ab17457cdd5aff4cd5694296867c18b179

* use lowercase var name

Change-Id: I825aad2b718c71f455e747be18f8cabd02aabe55

* update Getnetwork name

Change-Id: I002a1b7bc6b4ef40159c0cd2b0ef209f80a9c503

* fix comments

Change-Id: Ibddf5b9ffa9d685af6c392d893db063ef18e45fa

* update comments with ipv6

Change-Id: I31758b7d4979e7c27d014668f4fb532ad838cda2

Co-authored-by: Stefan Benten <mail@stefan-benten.de>
2020-03-12 11:37:57 -07:00
Jessica Grebenschikov
803e2930f4 satellite: use IP for all uplink operations, use hostname for audit and repairs
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address

This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP that the satellite sends to the uplink for the storage nodes could be approx 1 hr stale. In the short term this is fine, next we will be adding changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base
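
As an illustration of the distinction between `last_ip` and `last_net` described above, here is a minimal sketch that derives the /24 network from an IPv4 address. The helper name `lastNet` is hypothetical and the sketch handles IPv4 only; how the satellite actually computes and stores the value is not shown in this message.

```go
package main

import (
	"fmt"
	"net"
)

// lastNet is a hypothetical helper that returns the /24 network of an IPv4 address.
func lastNet(ipStr string) (string, error) {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return "", fmt.Errorf("invalid IP %q", ipStr)
	}
	ip4 := ip.To4()
	if ip4 == nil {
		// Assumption: IPv6 is out of scope for this sketch.
		return "", fmt.Errorf("not an IPv4 address: %q", ipStr)
	}
	// Keep the first 24 bits and zero the host part.
	return ip4.Mask(net.CIDRMask(24, 32)).String(), nil
}

func main() {
	network, err := lastNet("203.0.113.57")
	if err != nil {
		panic(err)
	}
	fmt.Println(network) // prints 203.0.113.0
}
```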

Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
2020-03-11 09:11:40 -07:00
Jessica Grebenschikov
2af71f3460 satellite/orders: add monkit to looking up node addr
Change-Id: Ia0eb0ffc343879a6ef9827d46e936e1fbc2e198a
2020-03-04 23:15:18 +00:00
Egon Elbre
5342dd9fe6 go.mod: update uplink
Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811
2020-02-21 16:08:12 +02:00
Simon Guindon
961944f24d satellite/orders: Resolve storage node addresses to IP addresses.
This change resolves all the storage node addresses to their IP addresses
before giving them to the uplink so that the uplink doesn't have to resolve
a hundred hosts and can immediately connect to improve uplink performance.
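
The idea can be sketched roughly as follows, assuming node addresses have the form `host:port`. The names `resolveAddress`, `lookupFunc`, and `fakeLookup` are hypothetical; the lookup is injected so that the example runs without network access, and `net.LookupIP` has a matching signature for real use.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc matches the signature of net.LookupIP so it can be swapped in.
type lookupFunc func(host string) ([]net.IP, error)

// resolveAddress replaces the host part of "host:port" with a resolved IP.
func resolveAddress(address string, lookup lookupFunc) (string, error) {
	host, port, err := net.SplitHostPort(address)
	if err != nil {
		return "", err
	}
	// Already an IP: nothing to resolve.
	if net.ParseIP(host) != nil {
		return address, nil
	}
	ips, err := lookup(host)
	if err != nil {
		return "", err
	}
	if len(ips) == 0 {
		return "", fmt.Errorf("no IP found for %q", host)
	}
	// Assumption: the first returned IP is good enough for this sketch.
	return net.JoinHostPort(ips[0].String(), port), nil
}

// fakeLookup is a deterministic stand-in for DNS used only in this example.
func fakeLookup(host string) ([]net.IP, error) {
	if host == "node.example.test" {
		return []net.IP{net.ParseIP("198.51.100.7")}, nil
	}
	return nil, fmt.Errorf("unknown host %q", host)
}

func main() {
	resolved, err := resolveAddress("node.example.test:28967", fakeLookup)
	if err != nil {
		panic(err)
	}
	fmt.Println(resolved) // prints 198.51.100.7:28967
}
```

Passing `net.LookupIP` instead of `fakeLookup` would perform a real DNS query, which is the work this change moves from the uplink to the satellite.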

Change-Id: Idb834351e0fece409d74c8a1c29b0b8c9b09c9ff
2020-02-11 18:44:45 +02:00
Jessica Grebenschikov
a1948ed338 satellite/orders: add old method for CreateGetOrderLimitsOld to maintain compatibility with old versions of the uplink
Change-Id: I7ce1f4fbc6217f1d340cf778c4b010d40961b3f0
2020-01-28 18:54:24 -05:00
Jessica Grebenschikov
54dbaaece2 satellite/orders: create as many orderLimits as needed to download a file
Change-Id: I2a39483d35037d9940913c035a78a93ea692ce9f
2020-01-28 20:04:11 +00:00
littleskunk
a6c6440ab7 satellite/order: decrease expire time from 7 days to 2 days
For the last few months we have had no issues with order submission. I would
call it stable and now it is time to risk a lower expire time. This will
increase the database performance on the satellite and it will reduce
the delay for billing.

The long term goal is 6h but for that step we need to change graceful
exit first. At the moment storage nodes would get disqualified for not
transferring all pieces in less than 6 hours.

Change-Id: I421a2c2421c5374c4e706e2338f1c2161fedc14c
2020-01-24 23:37:39 +00:00
Jeff Wendling
26e33e7e07 satellite/gracefulexit: make orders with right bucket id and action
paths are organized as follows:

    project_id/segment_index/bucket_name/encrypted_key

so by picking parts[0] and parts[1], we were using the segment
index instead of the bucket name, causing bandwidth to be
accounted for incorrectly. additionally, we were using the
PUT action instead of the PUT_GRACEFUL_EXIT action, causing
the data to be charged incorrectly. we use PUT_REPAIR for
now because nodes won't accept uploads with PUT_GRACEFUL_EXIT
and our tables need migrations to handle rollups with it.
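
The indexing mistake described above can be illustrated with a small sketch. The helper `bucketID` and the `project_id/bucket_name` output format are assumptions made for this example; only the path layout comes from the message.

```go
package main

import (
	"fmt"
	"strings"
)

// bucketID is a hypothetical helper that extracts project ID and bucket name
// from a path laid out as project_id/segment_index/bucket_name/encrypted_key.
func bucketID(path string) (string, error) {
	// Limit to 4 parts so slashes inside the encrypted key stay intact.
	parts := strings.SplitN(path, "/", 4)
	if len(parts) < 3 {
		return "", fmt.Errorf("path %q has no bucket name", path)
	}
	// parts[1] is the segment index; the bucket name is parts[2].
	return parts[0] + "/" + parts[2], nil
}

func main() {
	id, err := bucketID("project1/l/bucket1/some/encrypted/key")
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // prints project1/bucket1
}
```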

Change-Id: Ife2aff541222bac930c35df8fcf76e8bac5d60b2
2020-01-24 19:27:38 +00:00
stefanbenten
f4097d518c satellite: reduce logging of node status
Change-Id: I6618cf4bf31b856acd7a28b54011a943c03ab22a
2020-01-18 17:47:59 +00:00
Jeff Wendling
78c6d5bb32 satellite/satellitedb: reported_serials table for processing orders
this commit introduces the reported_serials table. its purpose is
to allow for blind writes into it as nodes report in so that we have
minimal contention. in order to continue to accurately account for
used bandwidth, though, we cannot immediately add the settled amount.
if we did, we would have to give up on blind writes.

the table's primary key is structured precisely so that we can quickly
find expired orders and so that we maximally benefit from rocksdb
path prefix compression. we do this by rounding the expires at time
forward to the next day, effectively giving us storagenode petnames
for free. and since there's no secondary index or foreign key
constraints, this design should use significantly less space than
the current used_serials table while also reducing contention.

after inserting the orders into the table, we have a chore that
periodically consumes all of the expired orders in it and inserts
them into the existing rollups tables. this is as if we changed
the nodes to report as the order expired rather than as soon as
possible, so the belief in correctness of the refactor is higher.

since we are able to process large batches of orders (typically
a day's worth), we can use the code to maximally batch inserts into
the rollup tables to make inserts as friendly as possible to
cockroach.
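
The rounding step mentioned above could look roughly like the sketch below. The helper names are hypothetical, and two details are assumptions: the rounding is done in UTC, and a timestamp exactly at midnight is still moved forward to the following day.

```go
package main

import (
	"fmt"
	"time"
)

// roundUpToNextDay returns the start of the UTC day following t.
// Assumption: a time exactly at midnight also moves to the next day.
func roundUpToNextDay(t time.Time) time.Time {
	t = t.UTC()
	day := time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC)
	return day.AddDate(0, 0, 1)
}

// expiryDay parses an RFC 3339 timestamp and returns the rounded day as YYYY-MM-DD.
func expiryDay(rfc3339 string) (string, error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return "", err
	}
	return roundUpToNextDay(t).Format("2006-01-02"), nil
}

func main() {
	day, err := expiryDay("2020-01-15T19:21:21Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(day) // prints 2020-01-16
}
```

With this rounding, all orders expiring on the same day share the same leading key value, which is what would let a chore scan expired rows as one contiguous batch.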

Change-Id: I25d609ca2679b8331979184f16c6d46d4f74c1a6
2020-01-15 19:21:21 -07:00
Isaac Hess
4950d7106a satellite/orders: Add write cache for bw rollups
Change-Id: I8ba454cb2ab4742cafd6ed09120e4240874831fc
2020-01-13 22:40:51 +00:00
littleskunk
bcc23f6869
Satellite/orders: remove allocated bandwidth from storagenode_bandwidth_rollups
When an uplink requests an upload or download from the satellite we are tracking the
allocated bandwidth twice. The value in bucket_bandwidth_rollups is used
for project limits but the value in storagenode_bandwidth_rollups is not
used at all. We can increase the performance by removing it. Uplinks
will get a faster response from the satellite.

Change-Id: Icccd41f94107ef34668f30f99bf5f728c384b07e
2020-01-12 16:20:47 +01:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Yingrong Zhao
7af42e3c10
satellite/metainfo, satellite/repair, uplink/eestream: add metric for download failed due to not enough pieces available (#3665) 2019-12-04 16:24:36 -05:00
Ethan Adams
3e0d12354a
storagenode/gracefulexit: Implement storage node graceful exit worker - part 1 (#3322) 2019-10-22 16:42:21 -04:00
Ethan Adams
a1275746b4
satellite/gracefulexit: Implement the 'process' endpoint on the satellite (#3223) 2019-10-11 17:18:05 -04:00
Egon Elbre
a801fab66a
all: add archview annotations (#2964) 2019-09-10 16:24:16 +03:00
Egon Elbre
2d69d47655
all: fix Error.New formatting (#2840) 2019-08-21 19:30:29 +03:00
ethanadams
1a69ec8318
satellite/orders: document protocol and fix typos (#2813)
* Addressing comments from PR 2762
* Rebuild of orders.pb.go after comments added to proto file
* run update-satellite-config-lock for spelling fix.
2019-08-19 09:36:11 -04:00
ethanadams
8df683a265
Update satellite settlement endpoint to batch order processing into transactions. (#2762)
2019-08-15 15:05:43 -04:00
Egon Elbre
c8edeb0257
satellite/overlay: rename overlay.Cache to overlay.Service (#2717) 2019-08-06 19:35:59 +03:00
Egon Elbre
5d0816430f
rename all the things (#2531)
* rename pkg/linksharing to linksharing
* rename pkg/httpserver to linksharing/httpserver
* rename pkg/eestream to uplink/eestream
* rename pkg/stream to uplink/stream
* rename pkg/metainfo/kvmetainfo to uplink/metainfo/kvmetainfo
* rename pkg/auth/signing to pkg/signing
* rename pkg/storage to uplink/storage
* rename pkg/accounting to satellite/accounting
* rename pkg/audit to satellite/audit
* rename pkg/certdb to satellite/certdb
* rename pkg/discovery to satellite/discovery
* rename pkg/overlay to satellite/overlay
* rename pkg/datarepair to satellite/repair
2019-07-28 08:55:36 +03:00
Ivan Fraixedes
f420b29d35
[V3-1927] Repairer uploads to max threshold instead of success… (#2423)
* pkg/datarepair: Add test to check num upload pieces
  Add a new test for ensuring the number of pieces that the repair process
  uploads when a segment is injured.
* satellite/orders: Don't create "put order limits" over total
  Repair must not create "put order limits" more than the total count.
* pkg/datarepair: Update upload repair pieces test
  Update the test which checks the number of pieces which are uploaded
  during a repair so that it uses the same excess over the success threshold
  value as the implementation.
* satellite/orders: Limit repair put order for not being total
  Limit the number of put orders to be used by repair for only uploading
  pieces to a % excess over the successful threshold.
* pkg/datarepair: Change DataRepair test to pass again
  Make some changes in the DataRepair test to make it pass again now that the
  repair uploads repaired pieces only until a % excess over the success
  threshold.
  Also update the steps description of the DataRepair test after it has been
  changed, to match the current behavior and to keep it generic enough that
  it does not need updating after minor future refactorings.
* satellite: Make repair excess optimal threshold configurable
  Add a new configuration parameter to the satellite for being able to
  configure the percentage excess over the optimal threshold, used for
  determining how many pieces should be repaired/uploaded, rather than
  having the value hard coded.
* repairer: Add configurable param to segments/repairer
  Add a new parameter to the segment/repairer to calculate the maximum
  number of excess nodes, based on the optimal threshold, that repaired
  pieces can be uploaded to.
  This new parameter has been added so that no more nodes are returned than
  the number of upload orders that the data repair satellite service
  calculates for repairing pieces.
* pkg/storage/ec: Update log message in client.Repair
* satellite: Update configuration lock file
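
The limit described in the bullets above can be sketched as a small calculation. The function name, the integer percentage parameter, and the choice to round the excess up are assumptions for illustration; the actual configuration parameter and rounding in the satellite may differ.

```go
package main

import "fmt"

// repairPutLimitCount is a hypothetical helper that returns how many
// "put order limits" a repair would create: enough to reach the success
// threshold plus a percentage excess, but never beyond the total count.
func repairPutLimitCount(healthy, successThreshold, total, excessPercent int) int {
	// Excess over the success threshold, rounded up (an assumption).
	excess := (successThreshold*excessPercent + 99) / 100
	target := successThreshold + excess
	if target > total {
		target = total
	}
	count := target - healthy
	if count < 0 {
		return 0
	}
	return count
}

func main() {
	// 35 healthy pieces, success threshold 80, total 130, 5% excess.
	fmt.Println(repairPutLimitCount(35, 80, 130, 5)) // prints 49
}
```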
2019-07-12 00:44:47 +02:00
Egon Elbre
d52f764e54
protocol: implement new piece signing and verification (#2525) 2019-07-11 16:51:40 -04:00
Bill Thorp
0e463dccfd
7 day validity window for order limits (#2520)
* 7 day limit
2019-07-10 17:17:00 -04:00
Alexander Leitner
1c5db71faf
Change protobuf expirations to use time.Time (#2509)
* Change protobuf expirations to use time.Time instead of timestamp.Timestamp
2019-07-09 17:54:00 -04:00
Michal Niewrzal
2ee5bada2c
Add pieceNum to PieceID derivation function (#2193) 2019-07-03 18:53:15 +02:00
Alexander Leitner
6d55bbdb57
OrderLimit creation date time limit (#2412)
* Limit by order creation
2019-07-02 12:06:12 -04:00
Egon Elbre
385c046723
pkg/pb: rename Order2 to Order, OrderLimit2 to OrderLimit (#2406) 2019-07-01 18:54:11 +03:00
Egon Elbre
caa2fcf62b satellite/orders: don't panic (#2331) 2019-06-26 09:26:33 +02:00
Jess G
e5c48fab74
fix ordersDB methods to take correct args (#2314)
* fix ordersDB methods to take correct args

* update tally to save projectID in correct format

* update var names in splitBucket test

* changes per CR comments
2019-06-25 08:58:42 -07:00
Kaloyan Raev
7bf5a09638 Use ErrNodeOffline when returning error for offline node (#2312) 2019-06-25 00:15:38 +02:00
Fadila
8226024ca8
Do not use disqualified nodes when asking for get order limits (#2303)
Add checks about disqualification when creating order limits
2019-06-24 16:46:10 +02:00
JT Olio
568b000e9b satellite: make order expiration configurable (#2251) 2019-06-21 13:38:40 +03:00
JT Olio
76b54458e9 satellite: send external address in order limits (#2278) 2019-06-21 12:19:52 +03:00
Natalie Villasana
9386187fe6
add disqualification and new reputation system into overlay cache (#2227) 2019-06-20 09:56:04 -04:00
Kaloyan Raev
8e29ef8a6b Use zap.Stringer instead of zap.String (#2223) 2019-06-18 01:37:43 +02:00
JT Olio
14486b1885 satellite/orders: TODOs about transactions (#2134) 2019-06-12 17:00:29 +02:00
Egon Elbre
61c8c3fb49
satellite/orders: batch update storage node allocations (#2146) 2019-06-10 17:58:28 +03:00
JT Olio
ccb158c99b
pkg/auth: add monkit task to missing places (#2123)
What: add monkit.Task to a bunch of functions that are missing it

Why: this will significantly help our instrumentation, data collection, and tracing about what's going on in the network
2019-06-05 07:47:01 -06:00
JT Olio
29d16b4d68 satellite: add monkit task to missing places (#2108) 2019-06-04 13:55:37 +02:00
Natalie Villasana
aa6ff17b70 add Reverify to auditing (#2041)
Co-authored-by: Maximillian von Briesen <mobyvb@gmail.com>
Co-authored-by: Kaloyan Raev <kaloyan@storj.io>
2019-05-27 14:13:47 +03:00
Bill Thorp
cd4a3e06d8
wired up IsHealthy to config (#1820)
* wired up IsHealthy to config
2019-04-23 18:45:50 -04:00
Kaloyan Raev
8fc5fe1d6f
Refactor pb.Node protobuf (#1785) 2019-04-22 12:07:50 +03:00
Cameron
32192aca10
convert times to UTC before entering as utimestamp (#1708) 2019-04-09 15:12:58 -04:00
Maximillian von Briesen
bb3b4e4816 Data repair integration test (#1582) 2019-04-08 13:33:47 -04:00
Michal Niewrzal
5ea797889b
Store inline bandwidth (#1619) 2019-04-05 09:42:56 +02:00
Jennifer Li Johnson
8549421385
Remove bw from tally service + query bandwidth in rollup service (#1618) 2019-04-04 11:20:59 -04:00
Maximillian von Briesen
6028d8c3de
Fix repair order limit creation (#1650)
* fix repair order limit creation (use piece size instead of share size)
* fix counter in CreateGetRepairOrderLimits
2019-04-03 09:17:32 -04:00
Michal Niewrzal
f80750693c Store bandwidth from orders on satellite (#1586) 2019-04-01 16:14:58 -04:00
Kaloyan Raev
9fb99c8484
Ensure serial number is not saved if error during order limit creation (#1602) 2019-03-29 11:53:53 +02:00
Kaloyan Raev
f9ba935286
Merge overlay_cache_nodes into nodes table (#1581) 2019-03-29 10:53:43 +02:00
Egon Elbre
be06fdfd6c Create orders.Service (#1593) 2019-03-28 22:09:23 +02:00