3619 Commits

Author SHA1 Message Date
Jeff Wendling
05a240050e storagenode: monitor available space and bandwidth
Change-Id: I5763597327c5b32982faab8910c136c6c8dc18c5
2020-02-13 07:07:29 +00:00
Isaac Hess
ab660ccc80 statreceiver: Add http scheme to influx url
Change-Id: I3e6506a4efda19f7a5de949fd0e47e92edbd8db6
2020-02-12 16:26:36 -07:00
Isaac Hess
ddbaaf451e statreceiver: Add influx v3 output
Change-Id: Iffb07ae7304addd94f6c4b172f4fbb2f9ac42d8d
2020-02-12 14:26:38 -07:00
Isaac Hess
4e6a01f797 statreceiver: Rename metric_handlers to v2_metric_handlers
Change-Id: Ieb8bb6af55b23b82564050fb63df72c469d7644d
2020-02-12 14:10:40 -07:00
Isaac Hess
33e74f97de statreceiver: Add prod lua file to source control
Change-Id: Ife7f8b31a04e88e9604ea2f6a9dc95879d759ed6
2020-02-12 14:10:40 -07:00
Qweder93
eeaaa8aa98 satellite/payments/stripecoinpayments: added ApplyInvoiceCredits
Change-Id: I7ed9d8397c0aa59d4ce0d40d1e50d13929e0fe5f
2020-02-12 20:06:08 +02:00
Ivan Fraixedes
c4fd84ad3e satellite/metainfo: Add metrics and traces DeletePieces
Trace the calls to the DeletePiecesService.DeletePieces method and add
metrics that record how often a specific storage node is dialed and how
much time is spent dialing storage nodes.

These statistics will help us find out whether we should implement
connection queues to storage nodes to reduce the deletion time, in case
we see that we're spending too much time dialing frequently used storage
nodes.

Ticket: https://storjlabs.atlassian.net/browse/SM-85
Change-Id: I9601676c3a8ad96c73c93833145929e4817755e2
2020-02-12 15:38:50 +00:00
Michal Niewrzal
cea4c25f53 mod: bump common and uplink version
Change-Id: Ia063d33c087dd91a46c008e154b078f11fa21527
2020-02-12 14:33:54 +00:00
Michal Niewrzal
2472554826 uplinkc: add ability to not set encryption restrictions with
restrict_scope

Change-Id: I445a945c9dad9d6b9ac5d973619914f50d973185
2020-02-12 11:56:13 +00:00
littleskunk
76849558cb satellite/gracefulexit: increase performance and tolerate higher error
rate

Graceful exit is very slow at the moment. Over the last couple of days we
increased the batch size on Stefan's satellite to 1000, but as a side
effect the error rate increased. With a batch size of 500 the error
rate looks stable.
This PR will increase the default batch size to 300. Graceful exit
will still be painfully slow, but at least it will be a bit faster. At the
same time this PR also increases the number of errors we tolerate. We
don't want to DQ slow storage nodes just because they didn't finish all
300 transfers in time. We want to give them more retries.

Change-Id: I92e3f99e116d4988457d8b902a88e85ed1bcc1a7
2020-02-12 11:40:15 +00:00
Kaloyan Raev
37cf42a9ae satellite/metainfo: overwrite zombie segments
Fixes https://storjlabs.atlassian.net/browse/USER-240

- Adds UnsynchronizedPut method to metainfo service that overwrites any
existing pointer under the same path
- Uses UnsynchronizedPut in the metainfo endpoint for committing the
segments

Change-Id: Icb43f31ea33f14066ca9dfdcf226eb3079b90948
2020-02-12 11:10:38 +00:00
Jeff Wendling
5d6cb68cd7 storage/{cockroachkv,postgreskv}: detailed monitoring for list
Change-Id: Iedba10776367233e59f3a6523efdb303b836b241
2020-02-12 10:55:07 +00:00
Egon Elbre
dbf46c4aa7 satellite/admin: administrative endpoint
Admin server allows creating basic REST and HTML APIs
for different administrative tasks.

Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf
2020-02-12 12:12:50 +02:00
Jeff Wendling
2d2f5e1a7f satellite/satellitedb/dbx: remove typo in dbx file and format it
Change-Id: I756315d6228ac9edd35cad8b496d36ecf2b5d26f
2020-02-11 14:15:13 -07:00
Cameron Ayer
f10b22eae9 accounting/tally: if delta < 0, delta = 0
If Redis crashed in the middle of a tally, we could have a situation
where we erroneously subtract from a project total. Currently,
`latest` should never be less than `initial`.

Change-Id: Ibb5ab724ac0ad4d684f7954fad7a9e061104b7df
2020-02-11 19:48:55 +00:00
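The clamp described in commit f10b22eae9 can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name tallyDelta and the use of int64 totals are assumptions; only the names `initial` and `latest` and the rule "if delta < 0, delta = 0" come from the message.

```go
package main

import "fmt"

// tallyDelta is a hypothetical helper. It returns how much a project total
// grew between the start (initial) and the end (latest) of a tally run.
// A negative difference, which the commit message attributes to a Redis
// crash mid-tally, is clamped to zero so nothing is subtracted.
func tallyDelta(initial, latest int64) int64 {
	delta := latest - initial
	if delta < 0 {
		delta = 0
	}
	return delta
}

func main() {
	fmt.Println(tallyDelta(100, 150)) // prints 50
	fmt.Println(tallyDelta(100, 40))  // prints 0
}
```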
Cameron Ayer
33d696b096 storage/redis/redisserver: simplify redisserver creation
Change-Id: I881576a7881db671b5abeeca7120a022987cc47f
2020-02-11 19:11:57 +00:00
Cameron Ayer
b22bf16b35 satellite/overlay: add config flag for node selection free disk requirement
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before they completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.

Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
2020-02-11 18:08:25 +00:00
VitaliiShpital
dba647199a web/satellite: multiple storj coin transactions bug fix
Change-Id: If69c5ff65741de1d4ed1d555816df3710d6ee721
2020-02-11 17:30:07 +00:00
Michal Niewrzal
c59938479c Adding Monty Anderson to the CLA
Change-Id: I21ea8babb65637ecd57cb995d6848e8f34fe5cee
2020-02-11 17:11:24 +00:00
Simon Guindon
961944f24d satellite/orders: Resolve storage node addresses to IP addresses.
This change resolves all the storage node addresses to their IP addresses
before giving them to the uplink so that the uplink doesn't have to resolve
a hundred hosts and can immediately connect to improve uplink performance.

Change-Id: Idb834351e0fece409d74c8a1c29b0b8c9b09c9ff
2020-02-11 18:44:45 +02:00
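The idea in commit 961944f24d, resolving a node's host name once on the satellite so the uplink receives an IP address, might look roughly like this. It is a sketch under assumptions: resolveAddress and lookupFunc are hypothetical names, and falling back to the original address when resolution fails is an assumed policy, not something the commit message states. The lookup is injected so the example runs without network access; net.LookupHost has the same signature and could be passed in instead.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc has the same signature as net.LookupHost, so a fake resolver
// can be injected for testing (hypothetical type, not from the commit).
type lookupFunc func(host string) ([]string, error)

// resolveAddress replaces the host part of "host:port" with the first IP
// address the lookup returns. The address is returned unchanged when it has
// no port, already contains an IP, or cannot be resolved (assumed fallback).
func resolveAddress(address string, lookup lookupFunc) string {
	host, port, err := net.SplitHostPort(address)
	if err != nil {
		return address
	}
	if net.ParseIP(host) != nil {
		return address // already an IP address, nothing to resolve
	}
	addrs, err := lookup(host)
	if err != nil || len(addrs) == 0 {
		return address
	}
	return net.JoinHostPort(addrs[0], port)
}

func main() {
	// A fake resolver keeps the example independent of real DNS.
	fake := func(host string) ([]string, error) {
		return []string{"10.0.0.7"}, nil
	}
	fmt.Println(resolveAddress("node.example.test:28967", fake)) // prints 10.0.0.7:28967
}
```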
Egon Elbre
429f08b4f0 satellite: add Admin peer
This peer will contain our administrative panels.
It's completely separated from our other satellite
processes because it allows better control for restricting
access to it.

Change-Id: Ifca473bee82ff6c680b346918ba32b835a7a6847
2020-02-11 16:15:33 +00:00
Egon Elbre
7d62b1cf3b cmd/storj-sim: don't wait for process to start indefinitely
In case the endpoint doesn't start, it might end up indefinitely
waiting for it to come up stalling jenkins.

Change-Id: Ib10bf1a25461e7532ec56ca705178bc9a7f85d12
2020-02-11 16:15:18 +00:00
Michal Niewrzal
426c8eb31a private/testplanet: add DeleteBucket method for uplink
New method added to make it easy to delete a bucket during tests.

Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2
2020-02-11 15:58:13 +00:00
Yaroslav Vorobiov
bd9cebda5b satellite/payments: fix transaction list pagination
Change-Id: I533f637e5cb12b47d7f7248f8bf7de93bd8be000
2020-02-11 16:22:53 +02:00
Ethan
208c05e3db Add metrics to track rate limit.
Add monkit metric for the rate-limit when the rate limit is hit
Logs warning with projectID

https://storjlabs.atlassian.net/browse/SM-165

Change-Id: I352dc40006021990d1bc66a999f62bbf8deb54db
2020-02-11 14:02:12 +00:00
Egon Elbre
ccd8b7f107 satellite/satellitedb: add benchmark for satellitedb setup and close
Change-Id: Ifb561f2eb81e439ea7cfa2ca2dad6b15aa50417e
2020-02-11 13:30:23 +00:00
Egon Elbre
d2fca76146 cmd/uplink/cmd: set exact argument counts
It was possible to call

   uplink cp a b c d e sj://bucket/something

Change-Id: I731da0da4530a3b3f8fbc569f363ba40cf84853a
2020-02-11 13:09:38 +00:00
NickolaiYurchenko
6cd86b2145 web/storagenode: npm audit fix
Change-Id: I09cb16ca8196d36b931e0a148cf73a2ce7ab5be0
2020-02-11 12:53:29 +00:00
Yaroslav Vorobiov
984ed26737 satellite/payments: fix invoice project records pagination
Change-Id: I68de69de78256280a6bbf0b744963b9c8c813007
2020-02-11 14:31:55 +02:00
Qweder93
dc075eaa96 satellite/payments: deposit bonuses (credits) added
Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0
2020-02-11 11:11:42 +00:00
Yingrong Zhao
3331b443e7 satellite/metainfo: Delete all the pieces of a storage node in one single
request

Change-Id: Ia8758d36f1a113b545e4f746d74d172421f14b24
2020-02-11 00:28:30 +00:00
Natalie Ventura Villasana
3900dadafd satellite/overlay: find new nodes with ExcludedIPs
Adds ExcludedIPs to the NodeCriteria for selecting new storage
nodes. Previously, ExcludedIPs was only added to the NodeCriteria
for selecting reputable storage nodes. Now that both are included
in the FindStorageNodesWithPreferences call, it should no longer
be possible to repair pieces to nodes that are on the same IP as
nodes already storing pieces from that segment.
Adds TestSelectNewStorageNodesExcludedIPs to make sure that
SelectNewStorageNodes returns nodes with different IP addresses.

https://storjlabs.atlassian.net/browse/V3-3011

Change-Id: Ic2d5e607cadeba6e8d5c40f9717149cb30880335
2020-02-10 23:45:17 +00:00
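The effect of ExcludedIPs described in commit 3900dadafd can be illustrated with an in-memory filter. This is only a sketch: the node type and selectNodes function are hypothetical, and the actual selection in the satellite is presumably done in the overlay database rather than over a slice. The sketch also assumes that an IP chosen during selection becomes excluded for the rest of that selection, which matches the test's stated goal of returning nodes with different IP addresses.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a storage node record.
type node struct {
	ID string
	IP string
}

// selectNodes picks up to count candidates whose IP is neither in
// excludedIPs nor already used by an earlier pick in this call.
func selectNodes(candidates []node, excludedIPs []string, count int) []node {
	seen := make(map[string]bool, len(excludedIPs))
	for _, ip := range excludedIPs {
		seen[ip] = true
	}
	var selected []node
	for _, n := range candidates {
		if len(selected) == count {
			break
		}
		if seen[n.IP] {
			continue // IP already holds a piece or was just selected
		}
		seen[n.IP] = true
		selected = append(selected, n)
	}
	return selected
}

func main() {
	candidates := []node{
		{ID: "a", IP: "1.1.1.1"},
		{ID: "b", IP: "2.2.2.2"},
		{ID: "c", IP: "2.2.2.2"},
		{ID: "d", IP: "3.3.3.3"},
	}
	// 1.1.1.1 already stores a piece of the segment being repaired.
	for _, n := range selectNodes(candidates, []string{"1.1.1.1"}, 2) {
		fmt.Println(n.ID, n.IP)
	}
}
```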
Moby von Briesen
c4a9a5d48b satellite/downtime: update detection and estimation downtime chores for
more trustworthy downtime tracking

Detection chore: Do not update downtime at all from the detection chore.
We only want to include downtime between two explicitly failed ping attempts
(the duration between last contact success and the first failed ping is no longer
included in downtime calculation)

Estimation chore: If the satellite started after the last failed ping for a node,
do not include offline time since the last failed ping time - only
estimate based on two failed pings with no satellite downtime in
between.
This protects us from including satellite downtime in our storagenode downtime calculations.

Change-Id: I1fddc9f7255a7023e02474255d70c64faae75b8a
2020-02-10 22:37:01 +00:00
Jeff Wendling
99c3ba5bbf testplanet: log stack trace for error during creation
Change-Id: Ifcd2cba4195413a7213ba4d113c43f9fb3cbc3e5
2020-02-10 21:59:20 +00:00
Jeff Wendling
bde302fdb8 jenkins: disable npm audit as it's failing everything
Change-Id: I399e73b86144588758e50bdebbd0e65e8052c8b6
2020-02-10 13:32:40 -07:00
VitaliiShpital
a90955eced web/satellite: billing history refreshing bug fix
Change-Id: I221e2dc13179b9368dea1c7e1a46b8b49b79c729
2020-02-10 18:56:43 +02:00
littleskunk
68d5b1d6ec build/jenkins: allow -rc release tags
Change-Id: Ib1942a8029313debe672eaae6b4de835b74d5035
2020-02-10 12:01:18 +00:00
VitaliiShpital
55a3a90391 web/satellite: uplink CLI docs link behaviour on API keys page reworked
Change-Id: I5565c3c8e6e55720c2cbf527aa37b6d881047818
2020-02-10 11:35:54 +00:00
NikolaiYurchenko
6679036ace web/satellite: unauthorized error handled
Change-Id: I12c6937ed1660af097d6930fe2a90fac5f298311
2020-02-10 11:14:51 +00:00
Cameron Ayer
13903449c7 satellite/accounting: fix flaky TestProjectUsageStorage
Sometimes the upload that is supposed to fail due to excess usage
would pass. This looks to be because it's overwriting another object
uploaded earlier in the test and deleting the old pointer. If tally
happened to run after the pointer is deleted but before the current
upload reaches the live accounting check, it might pass through.
The solution is to upload to a different path each time.

Change-Id: Ie6c825b9c6eab9ed53426ae262e7997bcb6beb7f
2020-02-07 20:58:24 -05:00
Yingrong Zhao
f151a0b9e1 installer/windows: add service dependency for storagenode
When the system restarts, the local DNS resolver may not be ready before our
application starts up. Adding a dependency on the DNS service will help
prevent "DNS lookup not available" errors for storagenode on system reboot.

Change-Id: Ie4be2813736e377df551fd8190f2247d3ae05ccd
2020-02-07 17:05:30 +00:00
Cameron Ayer
75355547c2 satellite/satellitedb: don't include GET_AUDIT and GET_REPAIR with chargeable BW
In the methods we use to retrieve a user's chargeable BW, we were summing GET, GET_AUDIT,
and GET_REPAIR. We only want to charge for GET.

Change-Id: Icead7695494b22c7c835482cf8b1512a980d59f1
2020-02-07 12:02:44 +00:00
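The billing rule from commit 75355547c2 can be shown with a small in-memory sum. This is an illustrative sketch, not the satellitedb query: the action and bandwidthRecord types and the chargeableBandwidth function are hypothetical; only the action names GET, GET_AUDIT, and GET_REPAIR and the rule "charge for GET only" come from the message.

```go
package main

import "fmt"

// action is a hypothetical string type naming the bandwidth action.
type action string

const (
	actionGet       action = "GET"
	actionGetAudit  action = "GET_AUDIT"
	actionGetRepair action = "GET_REPAIR"
)

// bandwidthRecord is a hypothetical, simplified usage row.
type bandwidthRecord struct {
	Action action
	Bytes  int64
}

// chargeableBandwidth sums only GET traffic. Audit and repair downloads are
// initiated by the satellite, so they are left out of the user's bill.
func chargeableBandwidth(records []bandwidthRecord) int64 {
	var total int64
	for _, r := range records {
		if r.Action == actionGet {
			total += r.Bytes
		}
	}
	return total
}

func main() {
	records := []bandwidthRecord{
		{Action: actionGet, Bytes: 1000},
		{Action: actionGetAudit, Bytes: 200},
		{Action: actionGetRepair, Bytes: 300},
		{Action: actionGet, Bytes: 500},
	}
	fmt.Println(chargeableBandwidth(records)) // prints 1500
}
```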
Egon Elbre
34f38bf6ce mod: upgrade miniredis to latest
miniredis 2.5.0 had a bug with matching keys with newlines.

Change-Id: I9bcf998459be6d7d4e03bca3589e989e5ed2304d
2020-02-06 13:31:17 +00:00
NikolaiYurchenko
8147d6ccce web/storagenode: chart date bug fixed
Change-Id: Id781bb8973958510e390286a8e6d4d79e3a36725
2020-02-06 13:53:19 +02:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Isaac Hess
17580fdf57 storagenode/pieces: Add test to cache store
This test checks that we are actually walking over the pieces when
starting the cache, and that it is returning expected values.

A recent outage was partially caused by the fact that this cache was
accidentally reading itself (via the pieces store, which has the cache
embedded). This test ensures that does not happen, and checks that when
the cache's `Run` method is called, the space used values are read from
disk and accurately update the cache.

Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e
2020-02-05 11:39:06 -07:00
igor gaidaienko
efa0f6d443 storagenode/monitor: set MinimumDiskSpace default to 500GB.
As a workaround, it was set to 0 in the previous release. Now, according to the TOC, it must be set to 500GB.

Change-Id: Ia2743d49e86683396958aff51b95df743af4f872
2020-02-04 15:55:42 +00:00
NikolaiYurchenko
384bdf1f58 web/storagenode: code refactoring pt.1
Change-Id: I948261c61e7ed7f9703a85314d7a14ca9a59b16d
2020-02-04 12:06:29 +00:00
Egon Elbre
91a480f5a0 Jenkins: add storagenode npm checks
Change-Id: I93e3cc009c628e3c97a24541e7b01c75a342bda6
2020-02-03 20:35:33 +02:00
NikolaiYurchenko
109d733dde web/storagenode: npm dependencies updated
Change-Id: I660d65cc38171d94d21225da8d1b82a08850eb27
2020-02-03 18:15:17 +00:00