Commit Graph

3733 Commits

Author SHA1 Message Date
Egon Elbre
46228fee92 cmd/gateway: use proper module name
Using a require for storj.io/storj makes the import unambiguous.
This makes it possible to use the module name
storj.io/storj/cmd/gateway.

Change-Id: I98439cbbaf433ae31309b7f80a19ced896018f65
2020-02-26 21:44:40 +02:00
Egon Elbre
64330c55b3 all: use pbgrpc
common/pb moved grpc to a separate package common/pb/pbgrpc.
This updates this repository to use it.

Change-Id: I2de2a190688871cf9cb61f7ea511f8a01e264e4e
2020-02-26 21:27:47 +02:00
Egon Elbre
8822e98c1f cmd/gateway: simplify module handling
Change-Id: If6ed158a6c9568fa33f69ca2d52e231ee4fcb0cb
2020-02-26 17:59:45 +00:00
Egon Elbre
89e5c77d83 satellite/metainfo: track observer timing
Measure total time spent in each observer and distribution of handling
pointers by pointer type.
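
A minimal sketch of per-observer timing, assuming a simplified observer interface; `Observer`, `PointerType`, `timedObserver` and `noopObserver` are hypothetical names, not the metainfo loop's actual types. The wrapper accumulates total time and records each handling duration per pointer type so a distribution can be computed later:

```go
package main

import "time"

// PointerType is a hypothetical label for the kind of pointer being handled
// (for example "inline" or "remote").
type PointerType string

// Observer is a simplified, hypothetical stand-in for a metainfo loop observer.
type Observer interface {
	Name() string
	Handle(pt PointerType)
}

// timedObserver wraps an Observer and records the total time spent in it
// plus every individual handling duration, grouped by pointer type.
type timedObserver struct {
	Observer
	total  time.Duration
	byType map[PointerType][]time.Duration
}

func newTimedObserver(o Observer) *timedObserver {
	return &timedObserver{Observer: o, byType: map[PointerType][]time.Duration{}}
}

// Handle times the wrapped observer's Handle call.
func (t *timedObserver) Handle(pt PointerType) {
	start := time.Now()
	t.Observer.Handle(pt)
	elapsed := time.Since(start)
	t.total += elapsed
	t.byType[pt] = append(t.byType[pt], elapsed)
}

// noopObserver is a trivial observer used to exercise the wrapper.
type noopObserver struct{ name string }

func (n noopObserver) Name() string          { return n.name }
func (n noopObserver) Handle(pt PointerType) {}
```

In a real service the recorded durations would more likely feed a metrics library than an in-memory slice.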

Change-Id: I2d125dfce8dbbb17225029fa35557bc106491151
2020-02-26 17:42:56 +00:00
Moby von Briesen
4e5a7f13c7 satellite/repair/queue: Prioritize selection of items off repair queue by segment health
Add a column to the repair queue table in the satellite db for healthy
piece count. When an item is selected from the repair queue, the least
durable segment that has not been attempted in the past six hours should be
selected first. This prevents our repairer from getting stuck doing work
on segments that are close to the repair threshold while allowing
segments that are more unhealthy to degrade further.

The migration also clears the repair queue so that the migration runs
quickly and we can properly account for segment health in future repair
work.

We do not select items off the repair queue that have been attempted in
the past six hours. This was changed from one hour to give us time to
try a wider variety of segments when the repair queue is very large.

Change-Id: Iaf183f1e5fd45cd792a52e3563a3e43a2b9f410b
2020-02-26 09:54:16 -05:00
Yingrong Zhao
ac34485f5d scripts/tests: install correct version of gateway
1. only run release tags that don't contain 'rc'
2. install gateway version that's the same as satellite
3. update gateway access to contain satellite id
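
Step 1 is done in the test scripts themselves; purely to illustrate the filtering rule, here is the same check sketched in Go. `releaseTags` is a hypothetical helper and the tag names in any usage are made up:

```go
package main

import "strings"

// releaseTags keeps only the release tags that do not contain "rc",
// mirroring step 1 of the commit message.
func releaseTags(tags []string) []string {
	var out []string
	for _, tag := range tags {
		if strings.Contains(tag, "rc") {
			continue // skip release candidates
		}
		out = append(out, tag)
	}
	return out
}
```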

Change-Id: I8ca1418302c3aafdf0c4eaaf8361422a1eec2bd4
2020-02-26 13:12:31 +00:00
VitaliiShpital
9a8db05836 web/satellite: updating billing history after render added
Change-Id: Ic7f3d4734d010759ed31bbae330c84f56057f370
2020-02-26 12:18:57 +00:00
NikolaiYurchenko
fc105af0e5 web/satellite: user select text restricted
Change-Id: If3692d55e48255c95b7722c5a574060c84fdf502
2020-02-26 11:13:56 +00:00
Simon Guindon
594d6e03aa docs/blueprints: Add design doc for distributed tracing.
Change-Id: I98f76f857d1a6ccd384adc6287137b46e37b9904
2020-02-25 20:29:05 +00:00
Isaac Hess
e486a073cb docs: Add uplink telemetry doc
Change-Id: I6f47ef4af80d0c76a32dc360f8809a526a4e948f
2020-02-25 17:52:34 +00:00
Jessica Grebenschikov
e19e3c1101 pkg/process:
Now that we are trying to identify the root cause of the satellite load limitations (currently the satellite can handle at most 400 rps for uploads, and we need this to be higher), we are using the Go diagnostic tools to find out where the bottlenecks are. We currently have a debug endpoint to gather some CPU and memory data, but continuous profiling could also be useful. GCP Stackdriver supports continuous profiling, so let's set that up and see whether the extra data helps.

This PR adds support for the [GCP continuous profiler](https://cloud.google.com/profiler), which enables continuous CPU/memory profiling and sends the stats to Stackdriver in the Google Cloud console.

To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (e.g. the satellite API process). The profile name should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler is initialized and profiling stats are automatically sent to the GCP project where the workload is running; they are viewable on the Stackdriver Profile page in the console

The current implementation assumes the workload is running in GKE; however, if we find it useful, we can add support for enabling this from anywhere. For simplicity, it's configured this way on the assumption that the main goal is to enable it in production systems.
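
A minimal sketch of the opt-in behaviour described above: profiling starts only when a profile name is configured. The start call is injected so the sketch has no external dependencies; in practice it would wrap something like `profiler.Start` from `cloud.google.com/go/profiler`. `ProfilerConfig`, `startFunc` and `initProfiler` are hypothetical names:

```go
package main

import "fmt"

// ProfilerConfig mirrors the single option described above; in the real
// change the value comes from the `debug.profilename` config flag.
type ProfilerConfig struct {
	ProfileName string
}

// startFunc stands in for the call that actually starts the cloud profiler.
type startFunc func(service string) error

// initProfiler starts continuous profiling only when a profile name is set
// and reports whether profiling was started.
func initProfiler(cfg ProfilerConfig, start startFunc) (bool, error) {
	if cfg.ProfileName == "" {
		return false, nil // profiling stays disabled by default
	}
	if err := start(cfg.ProfileName); err != nil {
		return false, fmt.Errorf("starting profiler for %q: %w", cfg.ProfileName, err)
	}
	return true, nil
}
```

Keeping the empty name as the "off" value means existing deployments are unaffected until the flag is set.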

Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
2020-02-25 09:04:23 -08:00
VitaliiShpital
b387c6b90c web/satellite: overview page implemented
Change-Id: I66a8e17635040730906bd6f9c20924abd0db0744
2020-02-25 14:28:00 +00:00
Egon Elbre
29452d82a5 go.mod: unlock graphql dependency and bump to latest
Change-Id: I40026f6c8de155e024f5fbb51105546065393034
2020-02-25 13:17:49 +02:00
Egon Elbre
9752d01884 private/prompt: remove dependency to go-prompt
Change-Id: Ida8ef731ce806cec076343dc77d72a3b0d7736b4
2020-02-25 13:09:41 +02:00
JT Olio
50a21de9dc traces: fix memory leak for long running traces that aren't being collected
for real this time. i'm so ashamed

Change-Id: Ib05bb50d8e947dec2d872fd53e71eec561c2d0e8
2020-02-24 15:02:26 -07:00
Cameron Ayer
d578102672 storagenode/piecestore: add workgroup to endpoint to prevent stray goroutine after shutdown
Change-Id: Ie8444c3c8f870745b73342de2e9a93027fcad371
2020-02-24 21:38:52 +00:00
paul cannon
92d86fa044 satellite/repair: fix repair concurrency
This new repair timeout (configured as TotalTimeout) will include both
the time to download pieces and the time to upload pieces, as well as
the time to pop the segment from the repair queue.

This is a move from Github PR #3645.

Change-Id: I47d618f57285845d8473fcd285f7d9be9b4318c8
2020-02-24 19:57:09 +00:00
Cameron Ayer
f22bddf122 {storagenode/contact, private/testplanet}: remove ErrFailureToStart and panic in testplanet.Start
Change-Id: I252e8c9407400af7bda95a7657c8154660c3c801
2020-02-24 18:24:23 +00:00
VitaliiShpital
8ea620b3c4 satellite/console: redirecting to login after activation implemented
Change-Id: Ibcf65f5d4664ac41c795f5ceb0a94bcd42673004
2020-02-24 19:52:28 +02:00
Jeff Wendling
f671eb2beb satellite/satellitedb: use queue for orders to get back fast billing
This change adds two new tables to process orders as fast as we used
to but in an asynchronous manner and with hopefully less storage
usage. This should help scale on cockroach, but limits us to one
worker. It lays the groundwork for the order processing pipeline to
be queue-driven rather than database-driven.
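
The change itself uses database tables as the queue; to show the shape of the design in isolation, here is a sketch that swaps in a buffered channel. Accepting an order returns immediately, and a single worker aggregates totals in the background, matching the one-worker limitation. `Order`, `OrderQueue` and its methods are hypothetical:

```go
package main

import "sync"

// Order is a hypothetical, simplified settled order.
type Order struct {
	BucketID string
	Bytes    int64
}

// OrderQueue decouples accepting orders from billing them.
type OrderQueue struct {
	ch     chan Order
	wg     sync.WaitGroup
	mu     sync.Mutex
	totals map[string]int64
}

func NewOrderQueue(size int) *OrderQueue {
	q := &OrderQueue{ch: make(chan Order, size), totals: map[string]int64{}}
	q.wg.Add(1)
	go q.worker() // exactly one consumer
	return q
}

// Enqueue hands an order to the background worker.
func (q *OrderQueue) Enqueue(o Order) { q.ch <- o }

func (q *OrderQueue) worker() {
	defer q.wg.Done()
	for o := range q.ch {
		q.mu.Lock()
		q.totals[o.BucketID] += o.Bytes
		q.mu.Unlock()
	}
}

// Close stops accepting orders and waits for the worker to drain the queue.
func (q *OrderQueue) Close() {
	close(q.ch)
	q.wg.Wait()
}

// Total returns the bytes billed so far for a bucket.
func (q *OrderQueue) Total(bucket string) int64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.totals[bucket]
}
```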

For more details, see the added fast billing changes blueprint.

It also fixes the orders db so that all the timestamps that are
passed to columns that do not contain a time zone are converted to
UTC at the last possible opportunity, making it less likely to use
the APIs incorrectly. We really should migrate to include timezones
on all of our timestamp columns.

Change-Id: Ibfda8e7a3d5972b7798fb61b31ff56419c64ea35
2020-02-24 17:07:07 +00:00
Qweder93
dca6fcbe28 satellite/payments/stripecoinpayments: credits added to invoice calculations
Change-Id: I6d3f5244a46f8945d2703af39ced333940db34e9
2020-02-24 16:48:27 +00:00
VitaliiShpital
985c3ef897 satellite/console: handling graphql errors bug fix
Change-Id: Ib20786485b0ea448e388912bb8406030d4fae1f7
2020-02-24 16:22:09 +00:00
Egon Elbre
e30f7b35b6 cmd/gateway: use a separate repository
Change-Id: Idbb0b2b6cf0e60c6d5d91218c24524d72285cf26
2020-02-24 10:03:03 +02:00
Yingrong Zhao
5011e78311 storagenode/piecestore: remove unused DeletePiece endpoint
With commit 3331b443e7, the satellite will
start calling `DeletePieces`. Therefore, we can remove the old endpoint
once that commit is deployed to all satellites.

Change-Id: I0124bc00a7cb808d119eb59f8fcd7fadf68158bb
2020-02-21 21:03:49 +00:00
Yingrong Zhao
a645e52ed9 satellite/metainfo: remove DeletePieces_node_id metric
Change-Id: I2cb10d411aa2912b256754a24d5c150e9536b4d3
2020-02-21 20:33:33 +00:00
Rafael Gomes
5132d285db cmd/statreceiver: Add instance tag to influx metric
Change-Id: I6545915c5cb93f6349c7b9d90f39e7d67c29038c
2020-02-21 16:33:00 -03:00
Yaroslav Vorobiov
f185adcf7c satellite/payments: fix projects list pagination
Change-Id: I342e69a17be34a503c1e0cef18ee009f1921fcd4
2020-02-21 19:37:11 +02:00
NikolaiYurchenko
2601f25c98 web/storagenode: notification logic implementation
Change-Id: Iec741997312203117213674ef85125fa8a976249
2020-02-21 15:49:27 +00:00
Michal Niewrzal
54e38b8986 pkg/miniogw: gateway implementation with new libuplink
Change-Id: I170c3a68cfeea33b528eeb27e6aecb126ecb0365
2020-02-21 16:20:38 +01:00
Egon Elbre
5342dd9fe6 go.mod: update uplink
Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811
2020-02-21 16:08:12 +02:00
Ivan Fraixedes
0a8f268a7e go.mod: Update golang.org/x/crypto to fix vulnerability
Update the golang.org/x/crypto package to fix the vulnerability
CVE-2020-9283.

See https://groups.google.com/forum/#!topic/golang-nuts/XDqhhjZViNk

Change-Id: I7c841c0bae0f55dad0c7de19ac70c730d11733f0
2020-02-21 11:30:05 +01:00
Egon Elbre
4044b8eeea storagenode/pieces: ensure chore is stopped before test ends
Change-Id: Ibc26e156d13011bf0f91b4206980200a24d348fe
2020-02-21 10:14:44 +02:00
Egon Elbre
fd5611fb5e private/testplanet: ensure server is closed in test
Change-Id: I12eafadfb1794cd84a288e39740f703919a9ddc6
2020-02-21 10:10:51 +02:00
Yaroslav Vorobiov
ea970e45ce satellite/payments: remove unused code
Change-Id: I2daaf5089bec000a6e995b8396d55528256aca6c
2020-02-20 16:04:19 +02:00
Yingrong Zhao
77f67a8086 satellite/metainfo: add timeout for delete request
Change-Id: I9cad6d7ea185fc2c0ed4e58b42e4e3a78178a79f
2020-02-20 09:10:16 +00:00
Maximillian von Briesen
a5afa4834a docs/blueprints: Add blueprint for creating a new state for storagenodes, suspension (#3778)
This document describes a new state, suspension, which nodes will
be placed in if they respond to too many audits with "unknown" errors.
When in this state, node operators will have to fix their node within a
certain amount of time, or they will be disqualified. While nodes are
suspended, they will not be able to receive new data.

Change-Id: I543f27ada66d8b40fc52ca8ead8ecc8479326bad
2020-02-19 14:29:48 -05:00
Ethan
154ebdea69 satellite/api;repair: Have monkit pick the hostname as its unique identifier
https://storjlabs.atlassian.net/browse/SM-157

Change-Id: If91c5a41234fac5ee93b151fc6799c07e0250239
2020-02-19 17:11:09 +00:00
Yingrong Zhao
e6da8d0249 satellite/metainfo: use global limiter for DeletePieces Service
We want to return to the user as quickly as possible, but also keep
deleting the remaining pieces on the storage nodes.

Change-Id: I04e9e7a80b17a8c474c841cceae02bb21d2e796f
2020-02-19 12:17:36 +00:00
VitaliiShpital
dccca518da web/satellite: shortened token payments dropdown bug fix
Change-Id: Ie4a77cf88fcdc5a730a23e878481277ddf6cdb79
2020-02-19 11:51:04 +00:00
Michal Niewrzal
4fb70b4383 lib/uplink: use empty encryption access for restricting
Change-Id: Idc0c5ceb577bc2321e419b90dee83c4264f324e3
2020-02-19 11:30:51 +00:00
VitaliiShpital
5c508d1438 web/satellite: new project button removed after creating one
Change-Id: I7e3d063e5defd59926029c6d37f25655fc788eb2
2020-02-19 12:42:31 +02:00
VitaliiShpital
f4472b0b8c web/satellite: iframe checking added for login/register/forgotpass views
Change-Id: I2256c50498afc815f57fe0d87af1d2f60a1a5d60
2020-02-19 10:20:19 +00:00
VitaliiShpital
07da9d17c4 web/satellite: successful project creation popup reworked
Change-Id: I6d94a25a6507d1ab27fa223f042db91cc2cfd4e5
2020-02-19 10:04:23 +00:00
Ethan
4e1c00d374 jenkins: Update Jenkins to use -e POSTGRES_HOST_AUTH_METHOD=trust when running postgres.
Change-Id: If3fe75fcf7869bd1a8ea04f8b806b699af338547
2020-02-19 00:22:04 +00:00
Isaac Hess
47a52d3ce4 cmd/statreceiver: Change v2 known metrics to toml config file
Change-Id: Ib5e303210b2c0a59fad23ab4fd6333ea1aee9364
2020-02-18 16:52:42 -07:00
Jeff Wendling
c3c69a088a cmd/statreceiver: downgrade tally metrics correctly
The code to generate monkit.lock has a bug where it doesn't take
ScopeNamed into account and assumes the package. Since the downgrade
file was created from monkit.lock, we also assumed the package, so
we were downgrading to the wrong metric.

No other places call ScopeNamed that would cause a problem.

Change-Id: If9fbbd971a7d755f5de33ed20b8a6bcc95670ee3
2020-02-18 15:41:32 -07:00
Cameron Ayer
3e70a893dd storagenode/{piecestore, contact}: report capacity to satellites if below specific threshold
Currently, storage nodes only report their capacity to satellites
once per hour. If a node fills up, it will fail all uploads until
the next contact cycle begins. With these changes, at the end of an
upload we check whether the MinimumDiskSpace threshold has been
passed. If so, trigger the monitor chore to update the node's
capacity, then trigger the contact chore to report the new
capacity to the satellites.
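
The post-upload check can be sketched as follows, assuming the two chores are exposed as trigger callbacks; `afterUpload`, `triggerMonitor` and `triggerContact` are hypothetical names. The order matters: the monitor refreshes the capacity first, so the contact chore reports the updated value:

```go
package main

// afterUpload checks free space against minimumDiskSpace after an upload.
// When the threshold is crossed it refreshes the node's capacity and then
// reports it to the satellites. It returns whether the chores were triggered.
func afterUpload(freeBytes, minimumDiskSpace int64, triggerMonitor, triggerContact func()) bool {
	if freeBytes >= minimumDiskSpace {
		return false // plenty of space; wait for the regular hourly contact
	}
	triggerMonitor() // recompute available space first...
	triggerContact() // ...then report the updated capacity
	return true
}
```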

Change-Id: Ie6aadaade1e2c12c87e03f8ff9059a50121380a0
2020-02-18 15:42:48 -05:00
Yingrong Zhao
85a2c7fdac cmd/statreceiver: downgrade repair metrics from v3 to v2
Change-Id: I98abde5c069171a542e5f63a37c6395f91076ed7
2020-02-18 11:41:33 -05:00
Ivan Fraixedes
1a84a00cc9 satellite/orders: Fix doc comments
Enhance the documentation of the UseSerialNumber method (interface and
implementation) and add several missing dots in doc comments of the
methods of the same interface and implementation.

Change-Id: I792cd344f0d2542e060fa2ec288b71231cae69de
2020-02-18 13:03:23 +01:00
Michal Niewrzal
dbe8428f9f satellite/metainfo: return NotFound when deleting a non-existent bucket
Change-Id: I7f466b5f824eab7b5146c2792f40cb2bcd7976a5
2020-02-18 09:05:30 +00:00