Commit Graph

4258 Commits

Author SHA1 Message Date
Jeff Wendling
5d6cb68cd7 storage/{cockroachkv,postgreskv}: detailed monitoring for list
Change-Id: Iedba10776367233e59f3a6523efdb303b836b241
2020-02-12 10:55:07 +00:00
Egon Elbre
dbf46c4aa7 satellite/admin: administrative endpoint
Admin server allows creating basic REST and html API-s
for different administrative tasks.

Change-Id: I3dc1786abe1c87350eed60ec90e48130f44e63cf
2020-02-12 12:12:50 +02:00
Jeff Wendling
2d2f5e1a7f satellite/satellitedb/dbx: remove typo in dbx file and format it
Change-Id: I756315d6228ac9edd35cad8b496d36ecf2b5d26f
2020-02-11 14:15:13 -07:00
Cameron Ayer
f10b22eae9 accounting/tally: if delta < 0, delta = 0
if redis crashed in the middle of tally we could have a situation
where we erroneously subtract from a project total. Currently,
`latest` should never be less than `initial`

Change-Id: Ibb5ab724ac0ad4d684f7954fad7a9e061104b7df
2020-02-11 19:48:55 +00:00
Cameron Ayer
33d696b096 storage/redis/redisserver: simplify redisserver creation
Change-Id: I881576a7881db671b5abeeca7120a022987cc47f
2020-02-11 19:11:57 +00:00
Cameron Ayer
b22bf16b35 satellite/overlay: add config flag for node selection free disk requirement
Currently SNs report their free disk space once per hour. If a node
becomes full, it has to wait until the next contact cycle begins to
report; all the while receiving and failing upload requests. By increasing
the minimum required disk space, we can give the storage nodes more time
to report their space before the completely fill up. This change goes
hand-in-hand with another change we want to implement: trigger capacity
report on SN immediately upon falling below threshold.

Change-Id: I12f778286c6c3f582438b0e2949765ac43325e27
2020-02-11 18:08:25 +00:00
VitaliiShpital
dba647199a web/satellite: multiple storj coin transactions bug fix
Change-Id: If69c5ff65741de1d4ed1d555816df3710d6ee721
2020-02-11 17:30:07 +00:00
Michal Niewrzal
c59938479c Adding Monty Anderson to the CLA
Change-Id: I21ea8babb65637ecd57cb995d6848e8f34fe5cee
2020-02-11 17:11:24 +00:00
Simon Guindon
961944f24d satellite/orders: Resolve storage node addresses to IP addresses.
This change resolves all the storage node addresses to their IP addresses
before giving them to the uplink so that the uplink doesn't have to resolve
a hundred hosts and can immediately connect to improve uplink performance.

Change-Id: Idb834351e0fece409d74c8a1c29b0b8c9b09c9ff
2020-02-11 18:44:45 +02:00
Egon Elbre
429f08b4f0 satellite: add Admin peer
This peer will contain our administrative panels.
It's completely separated from our other satellite
processes because it allows better control for restricting
access to it.

Change-Id: Ifca473bee82ff6c680b346918ba32b835a7a6847
2020-02-11 16:15:33 +00:00
Egon Elbre
7d62b1cf3b cmd/storj-sim: don't wait for process to start indefinitely
In case the endpoint doesn't start, it might end up indefinitely
waiting for it to come up stalling jenkins.

Change-Id: Ib10bf1a25461e7532ec56ca705178bc9a7f85d12
2020-02-11 16:15:18 +00:00
Michal Niewrzal
426c8eb31a private/testplanet: add DeleteBucket method for uplink
New method added to be able to delete easily bucket during tests.

Change-Id: Iaae89618cc676ddbbbd4b0df2eeacd143ea6f3c2
2020-02-11 15:58:13 +00:00
Yaroslav Vorobiov
bd9cebda5b satellite/payments: fix transaction list pagination
Change-Id: I533f637e5cb12b47d7f7248f8bf7de93bd8be000
2020-02-11 16:22:53 +02:00
Ethan
208c05e3db Add metrics to track rate limit.
Add monkit metric for the rate-limit when the rate limit is hit
Logs warning with projectID

https://storjlabs.atlassian.net/browse/SM-165

Change-Id: I352dc40006021990d1bc66a999f62bbf8deb54db
2020-02-11 14:02:12 +00:00
Egon Elbre
ccd8b7f107 satellite/satellitedb: add benchmark for satellitedb setup and close
Change-Id: Ifb561f2eb81e439ea7cfa2ca2dad6b15aa50417e
2020-02-11 13:30:23 +00:00
Egon Elbre
d2fca76146 cmd/uplink/cmd: set exact argument counts
It was possible to call

   uplink cp a b c d e sj://bucket/something

Change-Id: I731da0da4530a3b3f8fbc569f363ba40cf84853a
2020-02-11 13:09:38 +00:00
NickolaiYurchenko
6cd86b2145 web/storagenode: npm audit fix
Change-Id: I09cb16ca8196d36b931e0a148cf73a2ce7ab5be0
2020-02-11 12:53:29 +00:00
Yaroslav Vorobiov
984ed26737 satellite/payments: fix invoice project records pagination
Change-Id: I68de69de78256280a6bbf0b744963b9c8c813007
2020-02-11 14:31:55 +02:00
Qweder93
dc075eaa96 satellite/payments : deposit bonuses (credits) added
Change-Id: Ib151bbb9b02d655fa619c53bfbc04ed6f3bb39e0
2020-02-11 11:11:42 +00:00
Yingrong Zhao
3331b443e7 satellite/metainfo: Delete all the piece of a storage node in one single
request

Change-Id: Ia8758d36f1a113b545e4f746d74d172421f14b24
2020-02-11 00:28:30 +00:00
Natalie Ventura Villasana
3900dadafd satellite/overlay: find new nodes with ExcludedIPs
Adds ExcludedIPs to the NodeCriteria for selecting new storage
nodes. Previously, ExcludedIPs was only added to the NodeCriteria
for selecting reputable storage nodes. Now that both are included
in the FindStorageNodesWithPreferences call, it should no longer
be possible to repair pieces to nodes that are on the same IP as
nodes already storing pieces from that segment.
Adds TestSelectNewStorageNodesExcludedIPs to make sure that
SelectNewStorageNodes returns nodes with different IP addresses.

https://storjlabs.atlassian.net/browse/V3-3011

Change-Id: Ic2d5e607cadeba6e8d5c40f9717149cb30880335
2020-02-10 23:45:17 +00:00
Moby von Briesen
c4a9a5d48b satellite/downtime: update detection and estimation downtime chores for
more trustworthy downtime tracking

Detection chore: Do not update downtime at all from the detection chore.
We only want to include downtime between two explicitly failed ping attempts
(the duration between last contact success and the first failed ping is no longer
included in downtime calculation)

Estimation chore: If the satellite started after the last failed ping for a node,
do not include offline time since the last failed ping time - only
estimate based on two failed pings with no satellite downtime in
between.
This protects us from including satellite downtime in our storagenode downtime calculations.

Change-Id: I1fddc9f7255a7023e02474255d70c64faae75b8a
2020-02-10 22:37:01 +00:00
Jeff Wendling
99c3ba5bbf testplanet: log stack trace for error during creation
Change-Id: Ifcd2cba4195413a7213ba4d113c43f9fb3cbc3e5
2020-02-10 21:59:20 +00:00
Jeff Wendling
bde302fdb8 jenkins: disable npm audit as its failing everything
Change-Id: I399e73b86144588758e50bdebbd0e65e8052c8b6
2020-02-10 13:32:40 -07:00
VitaliiShpital
a90955eced web/satellite: billing history refreshing bug fix
Change-Id: I221e2dc13179b9368dea1c7e1a46b8b49b79c729
2020-02-10 18:56:43 +02:00
littleskunk
68d5b1d6ec build/jenkins: allow -rc release tags
Change-Id: Ib1942a8029313debe672eaae6b4de835b74d5035
2020-02-10 12:01:18 +00:00
VitaliiShpital
55a3a90391 web/satellite: uplink CLI docs link behaviour on API keys page reworked
Change-Id: I5565c3c8e6e55720c2cbf527aa37b6d881047818
2020-02-10 11:35:54 +00:00
NikolaiYurchenko
6679036ace web/satellite: unauthorize error handled
Change-Id: I12c6937ed1660af097d6930fe2a90fac5f298311
2020-02-10 11:14:51 +00:00
Cameron Ayer
13903449c7 satellite/accounting: fix flaky TestProjectUsageStorage
Sometimes the upload that is supposed to fail due to excess usage
would pass. This looks to be because it's overwriting another object
uploaded earlier in the test and deleting the old pointer. If tally
happened to run after the pointer is deleted but before the current
upload reaches the live accounting check, it might pass through.
The solution is to upload to a different path each time.

Change-Id: Ie6c825b9c6eab9ed53426ae262e7997bcb6beb7f
2020-02-07 20:58:24 -05:00
Yingrong Zhao
f151a0b9e1 installer/windows: add service dependency for storagenode
When system restarts, local dns resolver may not be ready before our
application starts up. Adding a dependency for dns service will help
prevent dns lookup not available error for storagenode on system reboot.

Change-Id: Ie4be2813736e377df551fd8190f2247d3ae05ccd
2020-02-07 17:05:30 +00:00
Cameron Ayer
75355547c2 satellite/satellitedb: don't include GET_AUDIT and GET_REPAIR with chargeable BW
In the methods we use to retrieve a user's chargeable BW, we were summing GET, GET_AUDIT,
and GET_REPAIR. We only want to charge for GET

Change-Id: Icead7695494b22c7c835482cf8b1512a980d59f1
2020-02-07 12:02:44 +00:00
Egon Elbre
34f38bf6ce mod: upgrade miniredis to latest
miniredis 2.5.0 had a bug with matching keys with newlines.

Change-Id: I9bcf998459be6d7d4e03bca3589e989e5ed2304d
2020-02-06 13:31:17 +00:00
NikolaiYurchenko
8147d6ccce web/storagenode: chart date bug fixed
Change-Id: Id781bb8973958510e390286a8e6d4d79e3a36725
2020-02-06 13:53:19 +02:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Isaac Hess
17580fdf57 storagenode/pieces: Add test to cache store
This test checks that we are actually walking over the pieces when
starting the cache, and that it is returning expected values.

A recent outage was partially caused by the fact that this cache was
accidentally reading itself (via the pieces store, which has the cache
embedded). This test ensures that does not happen, and checks that when
the cache's `Run` method is called, the space used values are read from
disk and accurately update the cache.

Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e
2020-02-05 11:39:06 -07:00
igor gaidaienko
efa0f6d443 storagenode/monitor: set MinimumDiskSpace default to 500GB.
As a workaround it was set to 0 in previous release. Now according to the TOC must be set to 500GB.

Change-Id: Ia2743d49e86683396958aff51b95df743af4f872
2020-02-04 15:55:42 +00:00
NikolaiYurchenko
384bdf1f58 web/storagenode: code refactoring pt.1
Change-Id: I948261c61e7ed7f9703a85314d7a14ca9a59b16d
2020-02-04 12:06:29 +00:00
Egon Elbre
91a480f5a0 Jenkins: add storagenode npm checks
Change-Id: I93e3cc009c628e3c97a24541e7b01c75a342bda6
2020-02-03 20:35:33 +02:00
NikolaiYurchenko
109d733dde web/storagenode: npm dependencies updated
Change-Id: I660d65cc38171d94d21225da8d1b82a08850eb27
2020-02-03 18:15:17 +00:00
Jessica Grebenschikov
dd9d18f152 upgrade drpc so that we have the monkit metric capability
Change-Id: Icdd08478aeff4fbd7148975eca8a21fac41289d7
2020-02-03 17:05:54 +00:00
Egon Elbre
9e5679fdaa storagenode/console/consoleserver: set content-type manually
http.FileServer relies on mime types defined in the operating system.
These values may be misconfigured, so a javascript file might
end up being served as "plain/text".

Change-Id: I3c13c8a9ac484bd765a4de0f8253bfe40dde7513
2020-02-03 15:37:47 +02:00
Jeff Wendling
bd78945116 statreceiver: add v2/v3 splitter and downgrade
this allows for us to roll out monkit v3 while keeping the
old v2 code and dependencies working.

Change-Id: I0758ee2bb66685b0819502368fc2c20cb35b958a
2020-02-02 22:56:14 +00:00
Egon Elbre
eaf3318a58 diagrams: satellite graph per process
Currently the whole satellite diagram can be quite overwhelming.
This change makes graphs for api, core and repair processes separately.

Change-Id: Iea906f51c3bcc46c71d7c8f6d8964034b317b3b4
2020-02-01 19:56:15 +00:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic becaue we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's more preferrable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
Jeff Wendling
71ff044edb storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month
Change-Id: I390569a8702164c42edddd3be020e93782227c2e
2020-01-31 16:25:52 -07:00
Jeff Wendling
03166d6be3 storagenode/piecestore: log available bandwidth and space on uploads
Change-Id: Ia92228cb2a178da45f4f123b48c476e5ec821fe8
2020-01-31 19:47:14 +00:00
Egon Elbre
97d360afd1 satellite/satellitedb: use correct type
Array was using a smaller type integer.

Change-Id: I025d61b6cea9869efa0b4ac1d24265356491f6dc
2020-01-31 13:00:14 -05:00
VitaliiShpital
ae3eae31dc web/storagenode: chart point's hit radius extended
Change-Id: Ib231d7970b4fe9e196638d3a09d2ae96e6c4efc5
2020-01-31 17:37:37 +00:00
VitaliiShpital
d3fe122d9b web/satellite: project/billing dropdown bug fix
Change-Id: I1276ad20bd4efc011a1e705c5401260779b7b610
2020-01-31 17:09:54 +00:00
VitaliiShpital
2cd233e1f3 web/storagenode: dashboard content header reworked
Change-Id: I725efffc8aecf0b8b53c9ef86c59fbcb065ecc04
2020-01-31 16:54:49 +00:00