Commit Graph

3737 Commits

Author SHA1 Message Date
Moby von Briesen
c4a9a5d48b satellite/downtime: update detection and estimation downtime chores for more trustworthy downtime tracking

Detection chore: Do not update downtime at all from the detection chore.
We only want to include downtime between two explicitly failed ping attempts.
The duration between the last successful contact and the first failed ping is
no longer included in the downtime calculation.

Estimation chore: If the satellite started after the last failed ping for a node,
do not include offline time since the last failed ping time - only
estimate based on two failed pings with no satellite downtime in
between.
This protects us from including satellite downtime in our storagenode downtime calculations.

Change-Id: I1fddc9f7255a7023e02474255d70c64faae75b8a
2020-02-10 22:37:01 +00:00
Jeff Wendling
99c3ba5bbf testplanet: log stack trace for error during creation
Change-Id: Ifcd2cba4195413a7213ba4d113c43f9fb3cbc3e5
2020-02-10 21:59:20 +00:00
Jeff Wendling
bde302fdb8 jenkins: disable npm audit as it's failing everything
Change-Id: I399e73b86144588758e50bdebbd0e65e8052c8b6
2020-02-10 13:32:40 -07:00
VitaliiShpital
a90955eced web/satellite: billing history refreshing bug fix
Change-Id: I221e2dc13179b9368dea1c7e1a46b8b49b79c729
2020-02-10 18:56:43 +02:00
littleskunk
68d5b1d6ec build/jenkins: allow -rc release tags
Change-Id: Ib1942a8029313debe672eaae6b4de835b74d5035
2020-02-10 12:01:18 +00:00
VitaliiShpital
55a3a90391 web/satellite: uplink CLI docs link behaviour on API keys page reworked
Change-Id: I5565c3c8e6e55720c2cbf527aa37b6d881047818
2020-02-10 11:35:54 +00:00
NikolaiYurchenko
6679036ace web/satellite: unauthorized error handled
Change-Id: I12c6937ed1660af097d6930fe2a90fac5f298311
2020-02-10 11:14:51 +00:00
Cameron Ayer
13903449c7 satellite/accounting: fix flaky TestProjectUsageStorage
Sometimes the upload that is supposed to fail due to excess usage
would pass. This looks to be because it's overwriting another object
uploaded earlier in the test and deleting the old pointer. If tally
happened to run after the pointer is deleted but before the current
upload reaches the live accounting check, it might pass through.
The solution is to upload to a different path each time.

Change-Id: Ie6c825b9c6eab9ed53426ae262e7997bcb6beb7f
2020-02-07 20:58:24 -05:00
Yingrong Zhao
f151a0b9e1 installer/windows: add service dependency for storagenode
When the system restarts, the local DNS resolver may not be ready before our
application starts up. Adding a dependency on the DNS service helps prevent
"DNS lookup not available" errors for storagenode on system reboot.

Change-Id: Ie4be2813736e377df551fd8190f2247d3ae05ccd
2020-02-07 17:05:30 +00:00
Cameron Ayer
75355547c2 satellite/satellitedb: don't include GET_AUDIT and GET_REPAIR with chargeable BW
In the methods we use to retrieve a user's chargeable BW, we were summing GET, GET_AUDIT,
and GET_REPAIR. We only want to charge for GET.
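
The change amounts to filtering by action before summing. The commit makes this change in the database query methods; the in-memory sketch below only illustrates the rule, and the Action and Usage types are assumptions made for the example.

```go
package main

// Action is a stand-in for the bandwidth action types named in the commit.
type Action int

const (
	Get Action = iota
	GetAudit
	GetRepair
)

// Usage is a hypothetical record of settled bandwidth for one action.
type Usage struct {
	Action Action
	Bytes  int64
}

// chargeableBytes sums only GET traffic; audit and repair traffic is
// deliberately excluded because the user did not initiate it.
func chargeableBytes(rows []Usage) int64 {
	var total int64
	for _, r := range rows {
		if r.Action == Get {
			total += r.Bytes
		}
	}
	return total
}
```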

Change-Id: Icead7695494b22c7c835482cf8b1512a980d59f1
2020-02-07 12:02:44 +00:00
Egon Elbre
34f38bf6ce mod: upgrade miniredis to latest
miniredis 2.5.0 had a bug with matching keys with newlines.

Change-Id: I9bcf998459be6d7d4e03bca3589e989e5ed2304d
2020-02-06 13:31:17 +00:00
NikolaiYurchenko
8147d6ccce web/storagenode: chart date bug fixed
Change-Id: Id781bb8973958510e390286a8e6d4d79e3a36725
2020-02-06 13:53:19 +02:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Isaac Hess
17580fdf57 storagenode/pieces: Add test to cache store
This test checks that we are actually walking over the pieces when
starting the cache, and that it is returning expected values.

A recent outage was partially caused by the fact that this cache was
accidentally reading itself (via the pieces store, which has the cache
embedded). This test ensures that does not happen, and checks that when
the cache's `Run` method is called, the space used values are read from
disk and accurately update the cache.

Change-Id: I9ec61c4299ed06c90f79b17de3ffdbbb06bc502e
2020-02-05 11:39:06 -07:00
igor gaidaienko
efa0f6d443 storagenode/monitor: set MinimumDiskSpace default to 500GB.
As a workaround, it was set to 0 in the previous release. According to the TOC, it must now be set to 500GB.

Change-Id: Ia2743d49e86683396958aff51b95df743af4f872
2020-02-04 15:55:42 +00:00
NikolaiYurchenko
384bdf1f58 web/storagenode: code refactoring pt.1
Change-Id: I948261c61e7ed7f9703a85314d7a14ca9a59b16d
2020-02-04 12:06:29 +00:00
Egon Elbre
91a480f5a0 Jenkins: add storagenode npm checks
Change-Id: I93e3cc009c628e3c97a24541e7b01c75a342bda6
2020-02-03 20:35:33 +02:00
NikolaiYurchenko
109d733dde web/storagenode: npm dependencies updated
Change-Id: I660d65cc38171d94d21225da8d1b82a08850eb27
2020-02-03 18:15:17 +00:00
Jessica Grebenschikov
dd9d18f152 upgrade drpc so that we have the monkit metric capability
Change-Id: Icdd08478aeff4fbd7148975eca8a21fac41289d7
2020-02-03 17:05:54 +00:00
Egon Elbre
9e5679fdaa storagenode/console/consoleserver: set content-type manually
http.FileServer relies on mime types defined in the operating system.
These values may be misconfigured, so a javascript file might
end up being served as "text/plain".
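
One way to set the content type manually is a small wrapper around the file handler. The sketch below is an assumption about the general approach, not the code from this commit; the extension table and the names contentTypeFor and withContentType are hypothetical. It relies on the documented net/http behavior that a Content-Type header set before serving is not overridden.

```go
package main

import (
	"net/http"
	"path"
)

// contentTypes is an illustrative, hard-coded table; the extensions chosen
// here are assumptions, not the list used by the storage node console.
var contentTypes = map[string]string{
	".js":   "application/javascript",
	".css":  "text/css",
	".html": "text/html; charset=utf-8",
	".svg":  "image/svg+xml",
}

// contentTypeFor returns the hard-coded type for a URL path, or "" when the
// extension is unknown and detection should fall back to the default.
func contentTypeFor(urlPath string) string {
	return contentTypes[path.Ext(urlPath)]
}

// withContentType sets Content-Type before delegating, so the wrapped
// handler does not consult the operating system's MIME table.
func withContentType(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if ct := contentTypeFor(r.URL.Path); ct != "" {
			w.Header().Set("Content-Type", ct)
		}
		next.ServeHTTP(w, r)
	})
}
```

A server could wrap its static handler as withContentType(http.FileServer(http.Dir("static"))), where the directory name is again only an example.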

Change-Id: I3c13c8a9ac484bd765a4de0f8253bfe40dde7513
2020-02-03 15:37:47 +02:00
Jeff Wendling
bd78945116 statreceiver: add v2/v3 splitter and downgrade
this allows for us to roll out monkit v3 while keeping the
old v2 code and dependencies working.

Change-Id: I0758ee2bb66685b0819502368fc2c20cb35b958a
2020-02-02 22:56:14 +00:00
Egon Elbre
eaf3318a58 diagrams: satellite graph per process
Currently the whole satellite diagram can be quite overwhelming.
This change makes graphs for api, core and repair processes separately.

Change-Id: Iea906f51c3bcc46c71d7c8f6d8964034b317b3b4
2020-02-01 19:56:15 +00:00
Jeff Wendling
d20db90cff private/dbutil/txutil: create new transactions for retries
it was noticed that if you had a long lived transaction A that
was blocking some other transaction B and A was being aborted
due to retriable errors, then transaction B was never given
priority. this was due to using savepoints to do lightweight
retries.

this behavior was problematic because we had some queries blocked
for over 16 hours, so this commit addresses the issue with two
prongs:

    1. bound the amount of time we will retry a transaction
    2. create new transactions when a retry is needed

the first ensures that we never wait for 16 hours, and the value
chosen is 10 minutes. that should be long enough for an ample
amount of retries for small queries, and huge queries probably
shouldn't be retried, even if possible: it's preferable to
find a way to make them smaller.

the second ensures that even in the case of retries, queries that
are blocked on the aborted transaction gain priority to run.

between those two changes, the maximum stall time due to retries
should be bounded to around 10 minutes.

Change-Id: Icf898501ef505a89738820a3fae2580988f9f5f4
2020-02-01 18:34:28 +00:00
Jeff Wendling
71ff044edb storagenode/bandwidth: fix tests to not fail for 10 hours near the end of the month
Change-Id: I390569a8702164c42edddd3be020e93782227c2e
2020-01-31 16:25:52 -07:00
Jeff Wendling
03166d6be3 storagenode/piecestore: log available bandwidth and space on uploads
Change-Id: Ia92228cb2a178da45f4f123b48c476e5ec821fe8
2020-01-31 19:47:14 +00:00
Egon Elbre
97d360afd1 satellite/satellitedb: use correct type
The array was using a smaller integer type.

Change-Id: I025d61b6cea9869efa0b4ac1d24265356491f6dc
2020-01-31 13:00:14 -05:00
VitaliiShpital
ae3eae31dc web/storagenode: chart point's hit radius extended
Change-Id: Ib231d7970b4fe9e196638d3a09d2ae96e6c4efc5
2020-01-31 17:37:37 +00:00
VitaliiShpital
d3fe122d9b web/satellite: project/billing dropdown bug fix
Change-Id: I1276ad20bd4efc011a1e705c5401260779b7b610
2020-01-31 17:09:54 +00:00
VitaliiShpital
2cd233e1f3 web/storagenode: dashboard content header reworked
Change-Id: I725efffc8aecf0b8b53c9ef86c59fbcb065ecc04
2020-01-31 16:54:49 +00:00
Isaac Hess
78d0868bc9 storagenode/pieces: Log error if cannot calculate piece size
Change-Id: I33b49315a0f6044a801a8b118e6b61dbcd751bfe
2020-01-31 09:57:44 -05:00
Egon Elbre
d0b4272467 storagenode: fix global logger in tests
https://github.com/storj/storj/wiki/Testing#logging

Change-Id: Ic6a31360bcfedae3f37f6b2536a345f00e33cd78
2020-01-31 14:09:28 +00:00
Isaac Hess
2968857e21 storagenode/pieces: Prevent recalculate from having negative numbers
Change-Id: Iafd2bcb9963e85508cb5e2bd69f229d89c589a6c
2020-01-30 17:47:54 -05:00
Michal Niewrzal
a181e0b627 libuplink: adjust tests to changes in encryption store
PathCipher moves to encryption.Store, so storj/uplink needs to be adjusted
for that change. The uplink repo also uses libuplink to run its tests, so we
first need to adjust libuplink in storj/storj and then storj/uplink.

Change-Id: I84f23e6bad18ac139f72c19939dc526f9f46d88b
2020-01-30 22:00:24 +00:00
Egon Elbre
81d44f19ee storage/filestore: ensure we bail on deleted folder without error
Change-Id: Iecf5f9ac5bc278489b433923c526d60611d356a4
2020-01-30 16:32:10 -05:00
paul cannon
157b8c4d71 storagenode/pieces: accumulate errors in traversal
instead of aborting on the first error, so that we can hit all
satellites and get the best numbers we can

Change-Id: I21d5163884940612d7d39eaf73a6fac07235cd9e
2020-01-30 19:31:29 +00:00
Isaac Hess
5a053483b7 storagenode/pieces: read trash from blobstore
Change-Id: Ib134e63a13b8a5dda5d6a9ead42013ce18411227
2020-01-30 13:30:48 -05:00
Isaac Hess
4dafd03f11 storagenode: Prevent negative values in piece_space_used, migrate negatives to 0
Change-Id: Ibd663db087058c928190aa52c520f22e9338dd04
2020-01-30 13:03:18 -05:00
Isaac Hess
00fc192f6b storagenode/pieces: Explicitly walk satellite pieces in SpaceUsedTotalAndBySatellite
Change-Id: I7ff9a1120d4ced0b5cba7d7765ef8aed7a1edae0
2020-01-30 12:01:50 -06:00
Jeff Wendling
21b65ca3b0 storagenode/storagenodedb: migrate to set total to content_size
Change-Id: I4906c2fe9cdb3a32c045c98039d4bde6b8b809e3
2020-01-30 08:53:12 -07:00
Moby von Briesen
006a2824ba satellite/repair: lock monkit stats in checker and repairer
Change-Id: Ia10fc8da0177389a500359ce51d21a5806f3f7b1
2020-01-30 14:09:56 +00:00
Egon Elbre
8dea4f52db satellite: add control panel
Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b
2020-01-30 08:06:43 -05:00
Egon Elbre
4e2bf81719 pkg/debug: add better title
Change-Id: Icc6114f4e7523cfe6c7984ef1f6eec664ae4ee65
2020-01-30 07:49:40 -05:00
littleskunk
81eddaa2c1 storagenode/monitor: reduce space requirement to 0
We introduced a bug in v0.31.7, and deploying it would kick out all the
storage nodes that are full. The easy fix is setting the requirement to 0.
That will allow them to still start up even if they are full.

Change-Id: Ie66f369952d929fcfd47f44f6e5e57eea8f51ff6
2020-01-30 01:44:45 +01:00
Moby von Briesen
8c19855871 scripts/tests/rollingupgrade: explicitly set debug port for old satellite api during rolling upgrade test

The old api is using the same config file as the new satellite in the
rolling upgrade test, so we need to set it to something different so
that there is no conflict when we spin up a new storj-sim instance while
the old api is running concurrently.

Change-Id: Ia4ec2db4953f36f43275495710992831ad3916a2
2020-01-29 18:32:03 -05:00
Egon Elbre
d10d6fd153 storagenode,satellite: ignore error on listening debug port
Change-Id: Id3a6d153535776ce41f8edf2bd6f6dad5e2a60bf
2020-01-29 18:06:02 -05:00
crawter
0b898c48a4 satellite/payments: coupons expiration logic fix
Change-Id: Ic8cc4e117957a75a3eb057075204a5b592e62ff4
2020-01-30 00:22:38 +02:00
Egon Elbre
10be538602 storagenode: add pkg/debug support
Change-Id: If941095b886c28a0d53fff4c9bf9fa0ce7471dea
2020-01-29 16:30:31 -05:00
Egon Elbre
a2b2bc676b pkg/debug: implement control panel
The control panel allows controlling different chores and services.
Currently this adds controlling of cycles.

Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21
2020-01-29 16:30:31 -05:00
Egon Elbre
f237d70098 storagenode,satellite: use pkg/debug
Use debug.Server in storage node and satellite for customizing debug server.

Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99
2020-01-29 16:30:31 -05:00
Egon Elbre
f833289e1b pkg/debug: separate debug endpoints to a server
Separating the Server allows Peers to embed it directly
and provide customizations and hooks into the rest of the services.

Change-Id: Ic1d68740fd494d2f82c1739bd990849c561b912b
2020-01-29 16:30:31 -05:00