Commit Graph

22 Commits

Author SHA1 Message Date
Cameron Ayer
3e343b683b cmd/segment-reaper: add metrics for zombie segments count
Change-Id: I106c6795946283165ba3de8465e5898346da1a3f
2020-08-26 18:42:59 +00:00
Moby von Briesen
959cd5cd83 satellite/satellitedb: Update audit history from overlay.UpdateStats and overlay.BatchUpdateStats
Change-Id: Ib530b61895ca4a8b12ba022c408a416b237b56d7
2020-08-20 22:46:28 +00:00
Isaac Hess
67a292d135 satellite/satellitedb: Monitor node tallies
We are adding a monkit evaluation for the total sum of data stored on
the nodes before it is inserted into the database. This will give us a
time-series history of total data stored so we can see it change over
time.

Change-Id: I41145a2d7a09c8e63b42ae578bd081035b60e529
2020-07-17 10:21:42 -06:00
Moby von Briesen
0b109c32e4 storagenode/piecestore/usedserials: add monkit metric for serials that are randomly deleted
This will give storagenode operators a better idea of whether the memory
allocated to the usedserials store is sufficient.

Change-Id: I5c30f2e39473a573f43409511ad9e2e32680479c
2020-06-09 17:04:37 -04:00
Moby von Briesen
290c006a10 satellite/repair/{checker,queue}: add metric for new segments added to repair queue
* add monkit stat new_remote_segments_needing_repair, which reports the
number of new unhealthy segments in the repair queue since the previous
checker iteration

Change-Id: I2f10266006fdd6406ece50f4759b91382059dcc3
2020-05-27 06:23:47 +00:00
Moby von Briesen
178aa8b5e0 satellite/{metainfo,repair}: Delete expired segments from metainfo
* Delete expired segments in expired segments service using metainfo
loop
* Add test to verify expired segments service deletes expired segments
* Ignore expired segments in checker observer
* Modify checker tests to verify that expired segments are ignored
* Ignore expired segments in segment repairer and drop from repair queue
* Add repair test to verify that a segment that expires after being
added to the repair queue is ignored and dropped from the repair queue

Change-Id: Ib2b0934db525fef58325583d2a7ca859b88ea60d
2020-04-22 13:02:31 +00:00
paul cannon
ba5991dc86 satellite/repair: add monitoring for remote_segments_healthy_percentage
Change-Id: I6ad29fe1a947ac19d15e40ea33164a510eb33d4f
2020-03-17 17:45:59 +00:00
Moby von Briesen
8b72181a1f satellite/{audit,overlay,satellitedb}: implement unknown audit reputation and suspension
* change overlay.UpdateStats to allow a third audit outcome. Now it can
handle successful, failed, and unknown audits.
* when "unknown audit reputation"
(unknownAuditAlpha/(unknownAuditAlpha+unknownAuditBeta)) falls below the
DQ threshold, put node into suspension.
* when unknown audit reputation goes above the DQ threshold, remove node
from suspension.
* record unknown audits from audit reporter.
* add basic tests around unknown audits and suspension.

Change-Id: I125f06f3af52e8a29ba48dc19361821a9ff1daa1
2020-03-16 20:29:26 +00:00
paul cannon
79553059cb satellite/repair: put irreparable segments in irreparableDB
Previously, we were simply discarding rows from the repair queue when
they couldn't be repaired (either because the overlay said too many
nodes were down, or because we failed to download enough pieces).

Now, such segments will be put into the irreparableDB for further
and (hopefully) more focused attention.

This change also better differentiates some error cases from Repair()
for monitoring purposes.

Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c
2020-03-09 21:45:16 +00:00
Jennifer Johnson
1c1750e6be removes bandwidth limiting
On satellite, remove all references to free_bandwidth column in nodes table.
On storage node, remove references to AllocatedBandwidth and MinimumBandwidth and mark as deprecated.

Protobuf message, NodeCapacity, is left intact for backwards compatibility.
Once this is released to all satellites, we can drop the column from the DB.

Change-Id: I2ff6c6537fc9008a0c5588e951afea58ede85838
2020-03-04 14:04:00 +00:00
Moby von Briesen
6043d01c90 satellite/audit/verifier: add metric for number of successfully downloaded shares
Change-Id: Ia4f1dc6e088db802e340aaecf80cc7ef6dc237a4
2020-02-27 14:33:59 +00:00
Moby von Briesen
d5540c89a1 satellite/repair/checker: add monkit metrics for segments immediately above repair threshold
Record counts for segments at health=rt+1 through health=rt+5 for every checker
iteration.

Change-Id: I2a00c0bc34d17beb21cacdeab4dac77f755faefe
2020-02-26 20:27:15 +00:00
Ethan
208c05e3db Add metrics to track rate limit.
Add monkit metric for the rate-limit when the rate limit is hit
Logs warning with projectID

https://storjlabs.atlassian.net/browse/SM-165

Change-Id: I352dc40006021990d1bc66a999f62bbf8deb54db
2020-02-11 14:02:12 +00:00
Moby von Briesen
006a2824ba satellite/repair: lock monkit stats in checker and repairer
Change-Id: Ia10fc8da0177389a500359ce51d21a5806f3f7b1
2020-01-30 14:09:56 +00:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Yingrong Zhao
7af42e3c10
satellite/metainfo, satellite/repair, uplink/eestream: add metric for download failed due to not enough pieces available (#3665) 2019-12-04 16:24:36 -05:00
Isaac Hess
a6235d3962 storage/filestore: Monitor when we open files in trash
Change-Id: I817bf8349c2e1ba55e1490f06162af1099bebdb0
2019-11-26 14:38:49 -07:00
Rafael Antonio Ribeiro Gomes
2739771761
storagenode: add bandwidth metrics (#3623)
* storagenode: add bandwidth metrics

* remove unecessary metric
2019-11-21 16:51:40 -03:00
Rafael Antonio Ribeiro Gomes
da39c71d35
storagenode: add new metric satellite.request (#3610)
* storagenode: add new metric satellite.request

* storagenode: metrics fixed

* switch from Counter to Meter
2019-11-19 18:11:31 -03:00
Maximillian von Briesen
8653dda2b1 satellite/audit: do not contain nodes for unknown errors (#3592)
* skip unknown errors (wip)

* add tests to make sure nodes that time out are added to containment

* add bad blobs store

* call "Skipped" "Unknown"

* add tests to ensure unknown errors do not trigger containment

* add monkit stats to lockfile

* typo

* add periods to end of bad blobs comments
2019-11-19 17:30:28 +01:00
Yingrong Zhao
69b0ae02bf
satellite/gracefulexit: separate functional code in endpoint (#3476) 2019-11-08 13:57:51 -05:00
Natalie Villasana
cf430d2d73
scripts: add check-monitoring script to detect changes to monkit calls (#3114) 2019-10-15 13:00:14 -04:00