The storagenode usage graph currently has two issues:
1. the graph is still likely to contain spikes on the first day of
the month for a newly set up storagenode because there's no previous
interval_end_time to subtract.
2. if a node stays offline for too long, the cache can miss usage
reports. say the last usage report in the storageusage cache has an
interval_end_time of 2020-12-30 00:00:00+00:00 and the node comes
back online a few days later. it then requests storage usage from
the satellite starting from the current month, say
2021-01-01 00:00:00+00:00. the calculated hours for the first day
would be 48 hours, which is wrong because the cache is missing one
day's usage report.
This PR addresses the second issue on the storagenode side: instead
of requesting storage usage data starting at the month boundary, it
requests an interval starting from the last day of the previous month
and ending on the current day of the current month.
The first issue is an accepted tradeoff and doesn't really matter,
since it only affects the first day after the storagenode joins
the satellite.
Updates https://github.com/storj/storj/issues/4178
Change-Id: I041c56c412030ce013dd77dce11b0b5d6550927b
The satellite now returns the last interval_end_time for each
daily storage usage.
Store the interval_end_time in the storage usage cache.
Also rename interval_start to timestamp to avoid ambiguity, since
that column only stores the date (day) returned by the
satellite.
Updates https://github.com/storj/storj/issues/4178
Change-Id: I94138ba8a506eeedd6703787ee03ab3e072efa32
errs.Class should not contain "error" in the name, since that causes a
lot of stutter in the error logs. As an example a log line could end up
looking like:
ERROR node stats service error: satellitedbs error: node stats database error: no rows
Whereas something like:
ERROR nodestats service: satellitedbs: nodestatsdb: no rows
Would contain all the necessary information without the stutter.
Change-Id: I7b7cb7e592ebab4bcfadc1eef11122584d2b20e0
Full scope:
storagenode/{console,nodestats,notifications,reputation,storagenodedb},
web/storagenode
These columns are deprecated. They used to be for the uptime reputation
system which has been replaced by downtime tracking with audits.
Change-Id: I151d6569577d89733ac97af21a1d885323522b21
Full prefix: satellite/{overlay,nodestats},storagenode/{reputation,nodestats}
Allow the storagenode to receive its audit history data from the
satellite via the satellite's GetStats endpoint.
The storagenode does not save this data for use in the API yet.
Change-Id: I9488f4d7a4ccb4ccf8336b8e4aeb3e5beee54979
Nodes are receiving an error that the heldamount rpc doesn't exist on a
satellite. This change adds the satellite in question to the error.
Change-Id: I7708e0511b55fdd2425969db2a545645339bad81
Most places now need the NodeURL rather than the ID and Address
separately. This simplifies code in multiple places.
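
As a rough illustration of the simplification, the sketch below passes one value instead of separate ID and address arguments. The NodeURL type here is a simplified, hypothetical stand-in with string fields, and describeDial is invented for the example; neither is the real type or API.

```go
package main

import "fmt"

// NodeURL is a simplified stand-in that bundles a node's ID and address.
type NodeURL struct {
	ID      string
	Address string
}

// String renders the URL as id@address.
func (u NodeURL) String() string {
	return u.ID + "@" + u.Address
}

// describeDial takes a single NodeURL rather than an ID and an address,
// so callers cannot accidentally pair an ID with the wrong address.
func describeDial(u NodeURL) string {
	return fmt.Sprintf("dialing %s", u)
}

func main() {
	sat := NodeURL{ID: "satellite-id", Address: "satellite.example:7777"}
	fmt.Println(describeDial(sat))
}
```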
Change-Id: I52621d8ca52296a8b5bf7afbc1001cf8bfb44239
* Add migration to storagenode reputation table to add suspended
timestamp
* Send suspended info to storagenode from satellite nodestats endpoint
* Add suspended status to storagenode api
* Add an indicator on the storagenode dashboard informing operator of
the satellites the node is suspended on
Change-Id: Ie3669f6069cc0258ba76ec99d17006e1b5fd9c8a
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
When an error is formatted using %v, it's not possible to check
whether the error was caused by a context cancellation.
Change-Id: Ia77dfb0817e49d9a7b168c12a6300d131007d0ee
all of the packages and tests work with both grpc and
drpc. we'll probably need to do some jenkins pipelines
to run the tests with drpc as well.
most of the changes are really due to a bit of cleanup
of the pkg/transport.Client api into an rpc.Dialer in
the spirit of a net.Dialer. now that we don't need
observers, we can pass around stateless configuration
to everything rather than stateful things that issue
observations. it also adds a DialAddressID for the
case where we don't have a pb.Node, but we do have an
address and want to assert some ID. this happened
pretty frequently, and now there are no more weird
contortions creating custom tls options, etc.
a lot of the other changes are being consistent/using
the abstractions in the rpc package to do rpc style
things like finding peer information, or checking
status codes.
Change-Id: Ief62875e21d80a21b3c56a5a37f45887679f9412
* storagenode/nodestats: fix issue on 32 bit platforms
time.Duration is an int64, so casting it down to an int
can cause it to become negative, causing a panic.
Change-Id: I33da7c29ddd59be60d8deec944a25f4a025902c7
* storagenode/nodestats: fix lint issue in test
Change-Id: Ie68598d724d2cae0dc959d4877098a08f4eb9af7