The admission/v3 protocol now supports including arbitrary key/value
headers in each packet of metrics. This commit adds support for these
headers, so the Lua config file can declare a filter based on the
key/value headers.
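To illustrate the idea, here is a minimal Go sketch of a filter that accepts a packet only when it carries a given set of header values. The real filter is declared in the Lua config, so `Packet`, `HeaderFilter`, the exact-match semantics and the header names used below are all hypothetical.

```go
package main

import "fmt"

// Packet is a hypothetical stand-in for a metrics packet that carries
// arbitrary key/value headers.
type Packet struct {
	Headers map[string]string
}

// HeaderFilter is a hypothetical filter. It accepts a packet only if every
// required header is present with exactly the expected value.
type HeaderFilter struct {
	Required map[string]string
}

// Match reports whether p carries all of the required headers.
func (f HeaderFilter) Match(p Packet) bool {
	for key, want := range f.Required {
		got, ok := p.Headers[key] // lookups on a nil map are safe in Go
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	filter := HeaderFilter{Required: map[string]string{"satellite": "example-sat"}}
	fmt.Println(filter.Match(Packet{Headers: map[string]string{"satellite": "example-sat", "host": "a"}})) // true
	fmt.Println(filter.Match(Packet{Headers: map[string]string{"host": "a"}}))                             // false
}
```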
Change-Id: I41de8c018d33304ccf46ec221ae689d55c5fb1ee
This is now done from the new storj/gateway repo by another Jenkins job.
Ideally, we should remove everything related to the gateway from this
Makefile, but we cannot do it at the moment due to an all-in-one test
that depends on the gateway.
Change-Id: Ibc722bcee20c9f58b4864583ee3245455b45c7ba
My understanding is that the nodes table has the following fields:
- `address` field which can be a hostname or an IP
- `last_net` field that is the /24 subnet of the IP resolved from the address
This PR does the following:
1) add back the `last_ip` field to the nodes table
2) for uplink operations, remove the calls that the satellite makes to `lookupNodeAddress` (which makes the DNS calls to resolve the IP from the hostname) and instead use the data stored in the nodes table `last_ip` field. This means that the IP the satellite sends to the uplink for a storage node could be approximately 1 hour stale. This is fine in the short term; next, we will add changes so that the storage node pushes any IP changes to the satellite in real time.
3) use the address field for repair and audit since we want them to still make DNS calls to confirm the IP is up to date
4) try to reduce confusion about hostname, ip, subnet, and address in the code base
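The split between the stored IP and the address can be sketched in Go as follows. `NodeRow` and the two `dialAddressFor*` helpers are hypothetical names, not satellite code. `lastNet` shows how a /24 subnet is derived from an IPv4 address; IPv6 is rejected here only to keep the example short.

```go
package main

import (
	"fmt"
	"net"
)

// NodeRow is a hypothetical in-memory view of the nodes table columns
// discussed above (the real table has more columns).
type NodeRow struct {
	Address string // hostname or IP, as registered by the node
	LastNet string // /24 subnet of the IP resolved from Address
	LastIP  string // IP resolved from Address, refreshed periodically
}

// lastNet returns the /24 subnet of an IPv4 address, e.g. "203.0.113.0".
func lastNet(ip string) (string, error) {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return "", fmt.Errorf("invalid IP %q", ip)
	}
	v4 := parsed.To4()
	if v4 == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", ip)
	}
	return v4.Mask(net.CIDRMask(24, 32)).String(), nil
}

// dialAddressForUplink uses the stored last_ip, so no DNS lookup happens;
// the value may be somewhat stale.
func dialAddressForUplink(n NodeRow) string { return n.LastIP }

// dialAddressForAudit returns the address (possibly a hostname) so the
// caller resolves it through DNS and gets an up-to-date IP.
func dialAddressForAudit(n NodeRow) string { return n.Address }

func main() {
	n := NodeRow{Address: "node.example.com", LastIP: "203.0.113.7"}
	subnet, err := lastNet(n.LastIP)
	if err != nil {
		panic(err)
	}
	n.LastNet = subnet
	fmt.Println(n.LastNet)               // 203.0.113.0
	fmt.Println(dialAddressForUplink(n)) // 203.0.113.7
	fmt.Println(dialAddressForAudit(n))  // node.example.com
}
```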
Change-Id: I96ce0d8bb78303f82483d0701bc79544b74057ac
We missed this in the migration that added the num_healthy_pieces
column. It exists in dbx, but not on the actual satellite table.
Change-Id: If16b5ec2325d56406250298531b3285215188bf3
The Metainfo method validateAuth checks things like the API key, user
permissions, and rate limits, but in the end all errors were returned as
rpcstatus.Unauthenticated.
The old Metainfo is left untouched to avoid backward compatibility issues.
Change-Id: I78eb276210fc50151da58a5c84e13ecd0961da29
Previously, we were simply discarding rows from the repair queue when
they couldn't be repaired (either because the overlay said too many
nodes were down, or because we failed to download enough pieces).
Now, such segments will be put into the irreparableDB for further
and (hopefully) more focused attention.
This change also better differentiates some error cases from Repair()
for monitoring purposes.
Change-Id: I82a52a6da50c948ddd651048e2a39cb4b1e6df5c
In many cases when a storagenode fails the preflight check, it is because
test_table exists. This table is used to determine read/write capabilities
after the initial schema verification. If preflight ends early due to a
failure or a stopped storagenode, it may not get the chance to drop this
table.
This change excludes test_table from the schema comparison to ensure
that it never prevents a storagenode from starting up.
It also adds a preflight DB test for the storagenode.
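Here is a minimal Go sketch of excluding test_table from the comparison, assuming a hypothetical simplified schema representation (table name to column names). The real preflight check compares richer schema objects.

```go
package main

import (
	"fmt"
	"reflect"
)

// Schema is a hypothetical simplified schema: table name -> column names.
type Schema map[string][]string

// withoutTestTable returns a copy of s without the test_table entry, so a
// leftover test_table never causes a schema mismatch.
func withoutTestTable(s Schema) Schema {
	out := make(Schema, len(s))
	for name, cols := range s {
		if name == "test_table" {
			continue
		}
		out[name] = cols
	}
	return out
}

// schemasMatch compares two schemas while ignoring test_table.
func schemasMatch(expected, actual Schema) bool {
	return reflect.DeepEqual(withoutTestTable(expected), withoutTestTable(actual))
}

func main() {
	expected := Schema{"pieces": {"id", "size"}}
	actual := Schema{"pieces": {"id", "size"}, "test_table": {"id"}}
	fmt.Println(schemasMatch(expected, actual)) // true
}
```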
Change-Id: Ib8e71df2e42fda3b2a364fbf7a801891c5831d39
The new API has a limited number of options to configure at the moment. We
should remove unused flags from the Uplink CLI and add them back if needed
in the future.
Change-Id: Icf3f3dadd43cb61a3b408b02d0762aef34425dbf
In production, the satellite is overriding the default repair threshold
(35) to a higher value (52). In some places in the checker and
irreparable processes, the repair threshold on the redundancy scheme is
used in place of the override value. This fixes those cases.
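The fix amounts to consistently using one "effective threshold" helper. Here is a Go sketch under stated assumptions: `RedundancyScheme`, `repairThreshold`, treating an override of 0 as "not set", and the `<=` comparison in `needsRepair` are all illustrative, not the checker's actual code.

```go
package main

import "fmt"

// RedundancyScheme is a hypothetical subset of a segment's redundancy settings.
type RedundancyScheme struct {
	RepairThreshold int
}

// repairThreshold returns the configured override when one is set (> 0),
// falling back to the threshold stored in the redundancy scheme.
func repairThreshold(rs RedundancyScheme, override int) int {
	if override > 0 {
		return override
	}
	return rs.RepairThreshold
}

// needsRepair reports whether a segment with the given number of healthy
// pieces is at or below the effective repair threshold.
func needsRepair(healthy int, rs RedundancyScheme, override int) bool {
	return healthy <= repairThreshold(rs, override)
}

func main() {
	rs := RedundancyScheme{RepairThreshold: 35}
	fmt.Println(needsRepair(40, rs, 52)) // true: the override (52) applies
	fmt.Println(needsRepair(40, rs, 0))  // false: falls back to 35
}
```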
Change-Id: Ie7387217d9fb3886f050b5e5b67be51f276196de
The migration was broken into one migration per table to reduce table locking and the
chances of failure due to SQL timeouts.
Of the 14 fields that lacked time zones, only the 3 named `interval_start` seemed to have non-UTC data in them.
These fields are fixed in the migration by removing the +00 and adding AT TIME ZONE current_setting('TIMEZONE').
Fields with good data are migrated by adding AT TIME ZONE 'UTC'.
Note that PostgreSQL's timezone() differs from CockroachDB's timezone(), so AT TIME ZONE is used.
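The Go sketch below builds the per-table statements and shows the two `USING ... AT TIME ZONE` forms. The `Column` type, the helper names and any table names passed in are hypothetical. The sketch does not reproduce the "+00" removal step described above.

```go
package main

import "fmt"

// Column is a hypothetical description of a timestamp column lacking a time zone.
type Column struct {
	Table  string
	Name   string
	NonUTC bool // true for the interval_start columns holding non-UTC data
}

// migrationFor builds an ALTER TABLE statement for one column. Good data is
// interpreted as UTC; non-UTC data is interpreted in the session time zone.
func migrationFor(c Column) string {
	zone := "'UTC'"
	if c.NonUTC {
		zone = "current_setting('TIMEZONE')"
	}
	return fmt.Sprintf(
		"ALTER TABLE %s ALTER COLUMN %s TYPE TIMESTAMP WITH TIME ZONE USING %s AT TIME ZONE %s;",
		c.Table, c.Name, c.Name, zone)
}

// migrationsPerTable groups statements by table, so each table can be
// migrated in its own step.
func migrationsPerTable(cols []Column) map[string][]string {
	out := map[string][]string{}
	for _, c := range cols {
		out[c.Table] = append(out[c.Table], migrationFor(c))
	}
	return out
}

func main() {
	steps := migrationsPerTable([]Column{
		{Table: "example_rollups", Name: "interval_start", NonUTC: true},
		{Table: "example_rollups", Name: "created_at"},
	})
	for _, stmt := range steps["example_rollups"] {
		fmt.Println(stmt)
	}
}
```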
https://storjlabs.atlassian.net/browse/SM-104
Change-Id: I410f2f1d7c11b143f17844347f37e6f4b1e70fce
- Previously, checkSegmentAltered only checked for segments that were replaced,
but we want to detect all changes to a segment that occurred while an audit was being conducted.
- Fixed a bug where nodes failing audits during reverify for non-piece-hash-verified
segments were not being removed from containment mode.
- Filled in gaps in reverify testing to ensure nodes are properly removed from containment.
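One way to detect any change, not only replacement, is to compare the full serialized segment captured at audit start with what is stored now. In the Go sketch below, the map-based store and the byte comparison are assumptions for illustration, not the verifier's actual code.

```go
package main

import (
	"bytes"
	"fmt"
)

// segmentAltered reports whether the segment at path changed in any way
// while the audit ran: deleted, replaced, or otherwise modified. store is a
// hypothetical map from segment path to the serialized segment.
func segmentAltered(store map[string][]byte, path string, before []byte) bool {
	current, ok := store[path]
	if !ok {
		return true // deleted during the audit
	}
	return !bytes.Equal(before, current) // any byte-level change counts
}

func main() {
	store := map[string][]byte{"segment/1": []byte("pointer-v1")}
	before := store["segment/1"]
	fmt.Println(segmentAltered(store, "segment/1", before)) // false
	store["segment/1"] = []byte("pointer-v2")               // e.g. modified mid-audit
	fmt.Println(segmentAltered(store, "segment/1", before)) // true
	delete(store, "segment/1")
	fmt.Println(segmentAltered(store, "segment/1", before)) // true
}
```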
Change-Id: Icd96d369278987200fd28581395725438972b292
The billing tests were flaky because some assertions ran before the
storage nodes finished their work.
A new helper function has been added to testplanet that waits for
storage node endpoints to finish their work. This function is now
used in the billing tests to avoid their flakiness.
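The Go sketch below shows the waiting pattern with a `sync.WaitGroup` that tracks in-flight background work. `Endpoint`, `Handle` and `TestingWait` are hypothetical names and do not reflect the actual testplanet helper's signature.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Endpoint is a hypothetical storage node endpoint that finishes some work
// in the background after responding.
type Endpoint struct {
	inflight sync.WaitGroup
	mu       sync.Mutex
	settled  int
}

// Handle starts asynchronous work, as the endpoint might after a request.
func (e *Endpoint) Handle() {
	e.inflight.Add(1)
	go func() {
		defer e.inflight.Done()
		time.Sleep(10 * time.Millisecond) // simulated background work
		e.mu.Lock()
		e.settled++
		e.mu.Unlock()
	}()
}

// TestingWait blocks until all work started by Handle has finished.
func (e *Endpoint) TestingWait() { e.inflight.Wait() }

// Settled returns how many units of background work have completed.
func (e *Endpoint) Settled() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.settled
}

func main() {
	e := &Endpoint{}
	for i := 0; i < 3; i++ {
		e.Handle()
	}
	e.TestingWait()          // without this, checking Settled() would race
	fmt.Println(e.Settled()) // 3
}
```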
This commit closes the ticket:
https://storjlabs.atlassian.net/browse/SM-403
This is part of fixing flakiness in other billing tests.
Change-Id: Iacb750af435f515c04b1e1d3510a218d184c9abc
After calling uplink.Upload, it is not guaranteed that the
storage node has already saved all the orders, since saving happens
asynchronously. Hence, we need a separate func to wait
for them to complete.
Change-Id: I0c34b3ea6c98dbcf37f80493c0e10a8bdbbb2aaf
The new libuplink does not store encryption values with the bucket, but old
uplinks use those values for configuration. If a bucket was created
with the new libuplink, we send back the satellite defaults.
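The fallback logic can be sketched in Go as follows. `EncryptionParams`, its fields and the default values in `main` are hypothetical placeholders, not the satellite's real types or defaults.

```go
package main

import "fmt"

// EncryptionParams is a hypothetical subset of a bucket's encryption settings.
type EncryptionParams struct {
	CipherSuite string
	BlockSize   int32
}

// IsZero reports whether no encryption values were stored with the bucket.
func (p EncryptionParams) IsZero() bool { return p == EncryptionParams{} }

// bucketEncryptionForOldUplink returns the stored values when present, and
// the satellite defaults otherwise (e.g. for buckets made by the new libuplink).
func bucketEncryptionForOldUplink(stored, defaults EncryptionParams) EncryptionParams {
	if stored.IsZero() {
		return defaults
	}
	return stored
}

func main() {
	defaults := EncryptionParams{CipherSuite: "example-cipher", BlockSize: 1024}
	fmt.Println(bucketEncryptionForOldUplink(EncryptionParams{}, defaults)) // {example-cipher 1024}
}
```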
Change-Id: Ie1bf3682847e07b302270b4c4bf1a7219f4bf011
Submit an order limit with a high amount but an order with a low amount of traffic.
Make sure the order amount, not the order limit amount, is used for billing.
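The property this test checks can be sketched in Go as follows. `OrderLimit`, `Order` and `billableBytes` are hypothetical, and capping an order at its limit is an assumption added for illustration.

```go
package main

import "fmt"

// OrderLimit and Order are hypothetical minimal versions of the limit issued
// by the satellite and the order describing the traffic actually used.
type OrderLimit struct{ Limit int64 }
type Order struct{ Amount int64 }

// billableBytes bills the order amount, not the (possibly much larger) order
// limit. Capping at the limit is an assumption for this sketch.
func billableBytes(limit OrderLimit, order Order) int64 {
	if order.Amount > limit.Limit {
		return limit.Limit
	}
	return order.Amount
}

func main() {
	fmt.Println(billableBytes(OrderLimit{Limit: 1 << 30}, Order{Amount: 1024})) // 1024
}
```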
Change-Id: I6b6ae26e9b8896f4a3acf530b2f48510b6df89cc