Commit Graph

1346 Commits

Author SHA1 Message Date
Egon Elbre
f85606b5a7 private/grpctlsopts: grpc related tlsopts
This moves grpc related tlsopts methods to private/grpctlsopts.
This allows to remove grpc dependency from tlsopts.

Change-Id: I25090b82b1e7a0633417ad600f8587b0c30ace73
2020-02-26 22:46:06 +02:00
Jessica Grebenschikov
e19e3c1101 pkg/process:
Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are.  We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.

This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.

To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console

The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.

Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
2020-02-25 09:04:23 -08:00
Egon Elbre
e30f7b35b6 cmd/gateway: use a separate repository
Change-Id: Idbb0b2b6cf0e60c6d5d91218c24524d72285cf26
2020-02-24 10:03:03 +02:00
Michal Niewrzal
54e38b8986 pkg/miniogw: gateway implementation with new libuplink
Change-Id: I170c3a68cfeea33b528eeb27e6aecb126ecb0365
2020-02-21 16:20:38 +01:00
Egon Elbre
5342dd9fe6 go.mod: update uplink
Change-Id: I867a6a1eef8aa5d60bb676e5112b98c4192ce811
2020-02-21 16:08:12 +02:00
Ethan
154ebdea69 satellite/api;repair: Have monkit pick the hostname as it's unique identifier
https://storjlabs.atlassian.net/browse/SM-157

Change-Id: If91c5a41234fac5ee93b151fc6799c07e0250239
2020-02-19 17:11:09 +00:00
JT Olio
900cc47772 traces: fix memory leak for long running traces that aren't being collected
Change-Id: I7576e5f420e83c52c5bdb65a30f68aa8ee3d3cc8
2020-02-13 03:16:31 -07:00
Michal Niewrzal
cea4c25f53 mod: bump common and uplink version
Change-Id: Ia063d33c087dd91a46c008e154b078f11fa21527
2020-02-12 14:33:54 +00:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Egon Elbre
8dea4f52db satellite: add control panel
Change-Id: Id48246e9bcd4c6ec643277fe740937b2e42ad85b
2020-01-30 08:06:43 -05:00
Egon Elbre
4e2bf81719 pkg/debug: add better title
Change-Id: Icc6114f4e7523cfe6c7984ef1f6eec664ae4ee65
2020-01-30 07:49:40 -05:00
Egon Elbre
d10d6fd153 storagenode,satellite: ignore error on listening debug port
Change-Id: Id3a6d153535776ce41f8edf2bd6f6dad5e2a60bf
2020-01-29 18:06:02 -05:00
Egon Elbre
a2b2bc676b pkg/debug: implement control panel
Control Panel allows to control different chores and services.
Currently this adds controlling of cycles.

Change-Id: I734f1676b2a0d883b8f5ba937e93c45ac1a9ce21
2020-01-29 16:30:31 -05:00
Egon Elbre
f237d70098 storagenode,satellite: use pkg/debug
Use debug.Server in storage node and satellite for customizing debug server.

Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99
2020-01-29 16:30:31 -05:00
Egon Elbre
f833289e1b pkg/debug: separate debug endpoints to a server
By separating Server it allows Peers to directly embed the server
and provide customizations and hooks into rest of the services.

Change-Id: Ic1d68740fd494d2f82c1739bd990849c561b912b
2020-01-29 16:30:31 -05:00
Bryan White
fab58e9c12 cmd/uplink: hide advanced flags from output
Change-Id: I536af267c38e153aeea682fca4a74dc0ea2c42f0
2020-01-23 13:24:30 +00:00
Egon Elbre
5a4745eddb all: remove usages of testplanet.New
Ensure that tests use testplanet.Run, so we always require running
against all database backends.

Change-Id: I6b0209e6a4912cf3328bd35b2c31bb8598930acb
2020-01-22 22:42:57 +02:00
Fadila Khadar
b2c454dbd6 cmd/uplink: print user-friendly FATAL errors.
Change-Id: I9147a83bec2defa854e52aecbd9658e56e2aa123
2020-01-22 16:21:48 +00:00
Michal Niewrzal
86f194769f uplink: adjust to changes in storj/uplink
This change is adjusting code base to changes in storj/uplink.

https://review.dev.storj.io/c/storj/uplink/+/643

Change-Id: Ieca87f9f5983e391bf4b4fec8b9d5491fd32bfa1
2020-01-20 22:06:19 +00:00
Egon Elbre
ca8ceff5e2 pkg/cache: fix expiration test
Multiple time.Now calls in a short amount of time
may return the exact same time.

This adds a time.Sleep() to ensure we get different internal
timing.

Change-Id: I533b5eedfdcab13a0197976ccf4b9fa96d4c3bba
2020-01-19 19:31:31 +02:00
JT Olio
c01cbe0130 satellitedb: save out all db-touching traces
Change-Id: Ib1e192221f9da813fd9cbb55f620a047b82c9523
2020-01-14 18:47:45 -05:00
Michal Niewrzal
c8ccd26e04 cmd/uplink: import imports 'access' into existing configuration
https://storjlabs.atlassian.net/browse/V3-3491

Change-Id: I9c5f649ded314bb3a2235588c746913a3ec2d203
2020-01-14 13:18:48 +00:00
Michal Niewrzal
36db00b2bf cmd/uplink: don't require setup or import if --access is set
We want to make using uplink as easy as possible. That's why we wan't to
avoid requiring setup or import command before normal usage if user
specified --access flag. If this flag is set then rest flags should be
set as defaults.

https://storjlabs.atlassian.net/browse/V3-3490

Change-Id: I95a7bd77a3f00b8d9981fee513e9e77aef298bca
2020-01-11 07:47:53 +00:00
Jeff Wendling
77fd41a02e satellite: add an expiring lru cache around api keys
Change-Id: I995429c66affd33da59b091f28f09ca122070b5e
2020-01-09 22:13:41 -07:00
Egon Elbre
082ec81714
uplink: move to storj.io/uplink (#3746) 2020-01-08 15:40:19 +02:00
Jeff Wendling
828d0b9984 pkg/server: set TCP_USER_TIMEOUT and monitor leaked conns
Go will, by default, set tcp keep alives on sockets. But
the kernel does not send keep alives to sockets that have
a non-empty send queue. That can cause connections that
hang forever.

So we set TCP_USER_TIMEOUT on all of the sockets as well.
That option will close any connection that has not received
an ack for any sent data (keep alive or otherwise) in the
configured time period. This places an upper bound on the
amount of time a socket can be stuck due to a client not
acknowleding data.

See https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
for more information on what these options do and how they
interact.

Additionally, make sure that we close every connection coming
from the listeners by wrapping them in a type with a finalizer
that closes the connection, much like the os package does for
file handles. It monitors if a connection was closed due to a
finalizer so that we can go and look for the bug if we ever
see a non-zero value.

Change-Id: Idc6c0564224b8dc2e4c9d769e80374ed1fe8cce0
2020-01-03 21:31:09 +00:00
Egon Elbre
e03d3fb577 uplink: move configs to cmd/uplink/cmd
Change-Id: Ifc1d3440dcef429c2a6142c16f3e991abf49f1d2
2020-01-02 09:40:57 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Fadila
115b8b0fc8 storagenode/piecestore: delete several pieces in a single request
This is part of the deletion performance improvement.
See https://storjlabs.atlassian.net/browse/V3-3349

Change-Id: Idcd83a302f2bd5cc3299e1a4195a7e177f452599
2019-12-27 10:58:04 +00:00
Egon Elbre
3849b9a21a pkr/rpc: remove RawConn
Change-Id: I61bc10b82f178a16f27279b85b627553d122c174
2019-12-22 18:01:01 +02:00
Egon Elbre
d55288cf68 pkg/rpc: replace methods with direct calls to pb
Change-Id: I8bd015d8d316a2c12c1daceca1d9fd257f6f57bc
2019-12-22 17:12:43 +02:00
Egon Elbre
006baa9ca6 pkg/rpc: remove drpc aliases
We need to split up pb package, which means we cannot have a core package
that depends on them.

Change-Id: I7f4f6fd82f89a51a9b2ad08bf2b1207253b8a215
2019-12-22 16:58:08 +02:00
Egon Elbre
31fbdcc8f7 pkg/encryption: better EncodingDecodingStress
The test case wasn't testing all the combinations.

But, testing all 3 byte combinations would take too long,
instead test special values and +- 1 of them, with some
additional noise characters.

Change-Id: If53ff25863a1f27c534922bd399fbbbdfefda441
2019-12-21 12:45:24 +02:00
Egon Elbre
6fc009f6e4 uplink/eestream: move Pad to encryption package to break dependency to eestream
Change-Id: I0c9bc3c65f161d79812196ac8285405e6be04c9e
2019-12-20 19:32:12 +00:00
Egon Elbre
ea455b6df0 all: remove code to default to grpc
We have moved to drpc so we don't need to have code for building
with grpc only.

Change-Id: I55732314dca0d5b4ce1132b68de4186a15d91b21
2019-12-20 20:12:04 +02:00
JT Olio
389d1821ea uplink/paths/encryption: support commandline argument to override path cipher to be urlsafe base64 for lists and deletes (#2855) 2019-12-19 12:29:00 +01:00
Egon Elbre
9ed9d3516f pkg/peertls: move tests requiring redis or bolt
Change-Id: Ib9de8d5ac1123d109b0209de4174fb4da7c67078
2019-12-18 17:11:36 +02:00
Egon Elbre
1eaf9e9ed7 pkg/storj: move non-common types
Change-Id: I2dd15c95cd334660f29a528dfb3dec9c30b29cab
2019-12-18 14:14:53 +00:00
Egon Elbre
71d6eef0d7 pkg/storj: remove unnecessary interfaces
Change-Id: Ifb308d2654d61a5b9bb3e28ff20282d6a690f0ca
2019-12-18 14:14:43 +00:00
Egon Elbre
7a36507a0a private/testcontext: ensure we call cleanup everywhere
Change-Id: Icb921144b651611d78f3736629430d05c3b8a7d3
2019-12-17 14:16:09 +00:00
Egon Elbre
7455ab771b pkg/peertls/tlsopts: move test that requires testplanet
For splitting core repository we need it not to pull in testplanet
even in tests.

Change-Id: I04d46b418e6e908185a4da694cf47dc3c5cc65f0
2019-12-17 13:45:51 +00:00
Egon Elbre
b04f9996c5 pkg/rpc: move test that needs testplanet
Move rpc test that uses testplanet into private/testplanet.

This ensures that rpc doesn't have the whole system as a dependency
making it easier to separate.

This unfortunately leaves pkg/rpc without specific tests, but
we would need to write new tests that only use the core packages.

Change-Id: I402ab3c2d50282af159c2ef3371d23b0997fef0a
2019-12-17 13:31:12 +00:00
Cameron Ayer
a4f9865b47 satellite: adds and enables cockroachdb compatibility for tests
Change-Id: I85a3ad8c3b9d7e15ea8675b6c55af0002933db57
2019-12-16 22:29:25 +00:00
Isaac Hess
0008aebf80 pkg/rpc: Change drpcheader to save a packet
This changes when we write the drpcheader. Rather than making it its own
write to the connection, it now prepends the drpc header to the first
write on the connection (typically the tls handshake). This results in
one less packet being sent at the beginning of each drpc connection.

For an operation like uploading a file from uplink, this results in many
packets being dropped: one when communicating with the satellite, and
one for each communication with the storage nodes.

Change-Id: I7644b46e90ffa7acea73ac56831396307352ed7a
2019-12-16 13:33:39 -07:00
Kaloyan Raev
958772467a cmd/storagenode-updater, pkg/process: Fix logging timestamp
After changing how we execute the storagenode-updater process we lost
timestamps in the log.

The fix is to start using zap logging.

The Windows Installer is changed to register the storagenode-updater
service in a way that the Windows Service Manager passes the
--log.output flag instead of the old --log.

The old --log flag is deprecated, but not removed. We will support it
for backward compatibility. This is required as the storagenode-updater
can auto-updated itself, but the Windows Service Manager of this old
installtion will continue passing the old --log flag when starting it.

Change-Id: I690dff27e01335e617aa314032ecbadc4ea8cbd5
Signed-off-by: Kaloyan Raev <kaloyan@storj.io>
2019-12-11 10:05:48 +00:00
Jeff Wendling
23df647a15 pkg/rpc/rpcpool: add idle expiration to connections
long lived uplinks could just hold on to connections forever
if their client to the storagenode or satellite isn't closed.
this will prevent that from happening on the client. more
changes will be necessary to add appropriate prevention on
the servers.

Change-Id: Ib36d85e70cbafb315664ad7657bb70b936b3828c
2019-12-10 20:32:11 +00:00
Cameron Ayer
6fae361c31 replace planet.Start in tests with planet.Run
planet.Start starts a testplanet system, whereas planet.Run starts a testplanet
and runs a test against it with each DB backend (cockroach compat).

Change-Id: I39c9da26d9619ee69a2b718d24ab00271f9e9bc2
2019-12-10 16:55:54 +00:00
Michal Niewrzal
ffd570eb04 uplink/metainfo: remove additional GetObject from download
This change removes one additional metainfo.GetObject call during
download.

Change-Id: I86b4e5165f299069e3e18944fe79c697c6a514d3
2019-12-09 11:48:48 +00:00
Jennifer Johnson
ecb960f506 private/dbutil: distinguishes between db drivers and implementations to allow for different implementations of SQL queries.
Change-Id: I2dc8d1d371139aa8bc805e92a2b80b71f580fd64
2019-12-04 18:31:26 +00:00
Andrew Harding
2461ccd469 pkg/private/fpath: subsume AtomicWriteFile
AtomicWriteFile is useful primitive to use throughout the codebase

Change-Id: I338fc4505ba20d5aece09ddc257286f46298e083
2019-12-03 18:14:08 +00:00