95 Commits

Author SHA1 Message Date
Jessica Grebenschikov
e19e3c1101 pkg/process:
We are trying to identify the root cause of the satellite load limitations (currently the satellite handles at most 400 rps for uploads, and we need this to be higher), and we are using the Go diagnostic tools to gain insight into the bottlenecks. We currently have a debug endpoint that gathers some CPU and memory data, but continuous profiling could also be useful. GCP Stackdriver supports continuous profiling, so let's set that up and see whether it helps gather more data.

This PR adds support for the [GCP continuous profiler](https://cloud.google.com/profiler), which enables continuous CPU/memory profiling and sends the stats to Stackdriver in the Google Cloud console.

To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have the Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (e.g. the satellite api process). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler is initialized and profiling stats are automatically sent to the GCP project where the workload is running, where they are viewable on the Stackdriver Profiler page in the console
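
The gating in the steps above can be sketched in Go as follows. This is only an illustration under stated assumptions, not the code in this commit: `DebugConfig`, `startFunc`, and `initProfiler` are hypothetical names, and the start call is injected so the sketch runs without GCP credentials. In a real build the injected function would presumably wrap `profiler.Start` from `cloud.google.com/go/profiler`.

```go
package main

import (
	"errors"
	"fmt"
)

// DebugConfig is a hypothetical stand-in for the debug section of config.yaml.
type DebugConfig struct {
	ProfileName string // corresponds to the debug.profilename flag
}

// startFunc abstracts the profiler start call (an assumption for this sketch)
// so that the gating logic can run without any cloud dependency.
type startFunc func(service string) error

// initProfiler starts continuous profiling only when a profile name is set.
// It reports whether the profiler was started.
func initProfiler(cfg DebugConfig, start startFunc) (bool, error) {
	if cfg.ProfileName == "" {
		// No profile name configured: profiling stays disabled.
		return false, nil
	}
	if err := start(cfg.ProfileName); err != nil {
		return false, fmt.Errorf("start profiler for %q: %w", cfg.ProfileName, err)
	}
	return true, nil
}

func main() {
	// fake stands in for the real profiler start call.
	fake := func(service string) error {
		if service == "" {
			return errors.New("empty service name")
		}
		fmt.Println("profiling enabled for", service)
		return nil
	}
	started, err := initProfiler(DebugConfig{ProfileName: "satellite-api"}, fake)
	fmt.Println(started, err)
}
```

Leaving `ProfileName` empty keeps the profiler off, which matches the description that profiling starts only once the flag is provided.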

The current implementation assumes the workload is running in GKE; however, if we find it useful we can add support for enabling this from anywhere. For simplicity, it's configured this way on the assumption that the main goal is to enable it in production systems.

Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
2020-02-25 09:04:23 -08:00
Ethan
154ebdea69 satellite/api;repair: Have monkit pick the hostname as its unique identifier
https://storjlabs.atlassian.net/browse/SM-157

Change-Id: If91c5a41234fac5ee93b151fc6799c07e0250239
2020-02-19 17:11:09 +00:00
Jeff Wendling
7999d24f81 all: use monkit v3
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.

graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.

it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.

Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
2020-02-05 23:53:17 +00:00
Egon Elbre
d10d6fd153 storagenode,satellite: ignore error on listening debug port
Change-Id: Id3a6d153535776ce41f8edf2bd6f6dad5e2a60bf
2020-01-29 18:06:02 -05:00
Egon Elbre
f237d70098 storagenode,satellite: use pkg/debug
Use debug.Server in storage node and satellite for customizing debug server.

Change-Id: I7979412376d028cadf29656d838ab94f18e2aa99
2020-01-29 16:30:31 -05:00
Egon Elbre
f833289e1b pkg/debug: separate debug endpoints to a server
By separating Server it allows Peers to directly embed the server
and provide customizations and hooks into rest of the services.

Change-Id: Ic1d68740fd494d2f82c1739bd990849c561b912b
2020-01-29 16:30:31 -05:00
Fadila Khadar
b2c454dbd6 cmd/uplink: print user-friendly FATAL errors.
Change-Id: I9147a83bec2defa854e52aecbd9658e56e2aa123
2020-01-22 16:21:48 +00:00
JT Olio
c01cbe0130 satellitedb: save out all db-touching traces
Change-Id: Ib1e192221f9da813fd9cbb55f620a047b82c9523
2020-01-14 18:47:45 -05:00
Michal Niewrzal
c8ccd26e04 cmd/uplink: import imports 'access' into existing configuration
https://storjlabs.atlassian.net/browse/V3-3491

Change-Id: I9c5f649ded314bb3a2235588c746913a3ec2d203
2020-01-14 13:18:48 +00:00
Michal Niewrzal
36db00b2bf cmd/uplink: don't require setup or import if --access is set
We want to make using uplink as easy as possible. That's why we want to
avoid requiring the setup or import command before normal usage if the user
specified the --access flag. If this flag is set, then the rest of the flags
should be set to their defaults.

https://storjlabs.atlassian.net/browse/V3-3490

Change-Id: I95a7bd77a3f00b8d9981fee513e9e77aef298bca
2020-01-11 07:47:53 +00:00
Egon Elbre
6615ecc9b6 common: separate repository
Change-Id: Ibb89c42060450e3839481a7e495bbe3ad940610a
2019-12-27 14:11:15 +02:00
Egon Elbre
7a36507a0a private/testcontext: ensure we call cleanup everywhere
Change-Id: Icb921144b651611d78f3736629430d05c3b8a7d3
2019-12-17 14:16:09 +00:00
Kaloyan Raev
958772467a cmd/storagenode-updater, pkg/process: Fix logging timestamp
After changing how we execute the storagenode-updater process we lost
timestamps in the log.

The fix is to start using zap logging.

The Windows Installer is changed to register the storagenode-updater
service in a way that the Windows Service Manager passes the
--log.output flag instead of the old --log.

The old --log flag is deprecated, but not removed. We will support it
for backward compatibility. This is required as the storagenode-updater
can auto-updated itself, but the Windows Service Manager of this old
installtion will continue passing the old --log flag when starting it.
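
One way to keep a deprecated flag working next to its replacement is to bind both names to the same variable. The sketch below uses the standard library `flag` package as a stand-in; the actual commit may do this differently, and the function name `parseLogOutput` and the `"stderr"` default are assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// parseLogOutput parses args and returns the configured log destination.
// Both the new --log.output flag and the deprecated --log flag write to the
// same variable, so older service registrations keep working.
func parseLogOutput(args []string) (string, error) {
	fs := flag.NewFlagSet("storagenode-updater", flag.ContinueOnError)
	var output string
	fs.StringVar(&output, "log.output", "stderr", "log destination")
	fs.StringVar(&output, "log", "stderr", "deprecated: use --log.output")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return output, nil
}

func main() {
	// An old installation still passes --log.
	old, _ := parseLogOutput([]string{"--log=updater.log"})
	// A new installation passes --log.output.
	cur, _ := parseLogOutput([]string{"--log.output=updater.log"})
	fmt.Println(old == cur)
}
```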

Change-Id: I690dff27e01335e617aa314032ecbadc4ea8cbd5
Signed-off-by: Kaloyan Raev <kaloyan@storj.io>
2019-12-11 10:05:48 +00:00
Andrew Harding
2461ccd469 pkg/private/fpath: subsume AtomicWriteFile
AtomicWriteFile is a useful primitive to use throughout the codebase

Change-Id: I338fc4505ba20d5aece09ddc257286f46298e083
2019-12-03 18:14:08 +00:00
Egon Elbre
ee6c1cac8a
private: rename internal to private (#3573) 2019-11-14 21:46:15 +02:00
Andrew Harding
168f72d371
Initialize math/rand default source (#3552) 2019-11-12 10:03:41 -07:00
Michal Niewrzal
ab5c623ac7
cli: should return non-zero code for error (#3469) 2019-11-05 06:01:26 -08:00
Bryan White
f468816f13
{internal/version,versioncontrol,cmd/storagenode-updater}: add rollout to storagenode updater (#3276) 2019-10-21 12:50:59 +02:00
Bryan White
243ba1cb17
{versioncontrol,internal/version,cmd/*}: refactor version control (#3253) 2019-10-20 09:56:23 +02:00
Kaloyan Raev
45df0c5340
storagenode/process: respond to Windows Service events (#3025) 2019-09-19 19:37:40 +03:00
Andrew Harding
f550ab5d1c
Uplink "import" command (#2981)
* uplink import cmd

* pkg/process: fix import order

* fix golangci-lint failures

* remove "help" from the satellite config lock file
2019-09-13 12:33:30 -06:00
Michal Niewrzal
0ccae6b061 cmd: windows log file workaround (#2979) 2019-09-10 12:35:59 +03:00
Egon Elbre
e03e6844f8
pkg/process: reduce noise in storj-sim (#2988)
* Don't display stack for failure to start debug endpoints
* Don't display stack for disabled telemetry
2019-09-10 10:32:53 +03:00
JT Olio
48ba45eb55
pkg/process prometheus: forgot digits (#2920)
Change-Id: I1825d31dedcce45a80663d27261e2db0cb14c0f1
2019-08-29 17:51:03 -06:00
Jess G
b20dcfd64c add type to prometheus metrics (#2916) 2019-08-29 23:42:38 +02:00
JT Olio
b3f9a8813d pkg/process: remove prometheus help (#2914)
the current prometheus help messages have enough unexpected
characters that they are breaking prometheus parsing. they
may also be triggering prometheus to expect more from us (type
annotations) than we have to offer.

we're really not adding a lot of value with these help messages,
so just take them out

Change-Id: I9b723447a294bb492a6292480e9f88634346a80b
2019-08-29 12:42:11 -07:00
Ivan Fraixedes
e47b8ed131
storagenode: No FATAL error when unsent orders aren't found (#2801)
* pkg/process: Fatal show complete error information
  Change the general process execution function to not use the sugared
  logger, so that the full error information is output.
  Delete some unreachable code because Zap logger Fatal method calls exit
  1 internally.
* storagenode/storagenodedb: Add info to error
  Add more information to an error returned due to some data
  inconsistency.
* storagenode/orders: Don't use sugared logger
  Don't use sugar logger and provide better contextualized error messages
  in settle method.
* storagenode/orders: Add some log fields to error msgs
  Add some relevant log fields to some logged errors of the sender settle
  method.
* satellite/orders: Remove always nil error from debug
  Remove an error that was logged at debug level but was always nil, and
  make the logic that used this variable clearer.
* storagenode/orders: Don't return error Archiving unsent
  Don't stop the process that archives unsent orders if some of them
  aren't found in the DB, because doing so causes the Storage Node to stop
  with a fatal error.
2019-08-16 16:53:22 +02:00
Jeff Wendling
2dab0ab466 pkg/process: only propagate missing keys to flags (#2784)
this avoids a problem where setting on a flag isn't sufficient
to express complex data structures like []string.

Change-Id: I06f13656996d658b4c7a957451cb253728a67eda
2019-08-14 14:41:41 -06:00
JT Olio
538e8168a5
pkg/process: support json trace output (#2713)
Change-Id: Id5a83a8fe23d19d92f5c2622acb0642b4e8f72a2
2019-08-05 12:28:41 -06:00
Jeff Wendling
21a3bf89ee cmd/uplink: use scopes to open (#2501)
What: Change cmd/uplink to use scopes

It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.

Why: So that it can read scopes in from files and stuff
2019-08-05 11:01:20 -06:00
JT Olio
28156d3573
storagenode: more live request tracking (#2699)
* storagenode/piecestore: track live requests together

Change-Id: I9ed44e4484b97bcbe076c222450c3449fe8b1075

* show grpc status codes in monkit failures

Change-Id: I68bc3a8d24a372e8147ef2a74636fc3e40fa799a

* small nit

Change-Id: I722b09345377b079e41c5a3dc86d7fd6232c9d24
2019-08-02 16:49:39 -06:00
Egon Elbre
4f0d39cc64
don't use global loggers (#2675) 2019-07-31 17:38:44 +03:00
Kaloyan Raev
0931f0e71b Log fatal errors with respective severity (#2581) 2019-07-17 14:39:10 -04:00
Stefan Benten
ccef5eee46
Add proper Version Handling to Identity, Gateway and Uplink Binary (#2471) 2019-07-08 10:45:20 -04:00
Michal Niewrzal
46b5c30f35
Fix test-sim debug-addr error (#2420) 2019-07-03 16:10:51 +02:00
Egon Elbre
6502143e79
fix import ordering (#2322) 2019-06-25 12:46:29 +03:00
JT Olio
8c57434ded
pkg/process/metrics: add an instance prefix (#2190)
* pkg/process/metrics: add an instance prefix

the distinction between which satellite is sending which
data should go in the instance field, not the suffix or application
fields. (un)fortunately, the instance id is deliberately not
configurable because we don't want it to be easy to accidentally
have multiple applications collide with the same instance id.

so we're currently stuffing the human readable instance in the
suffix. :(

perhaps a reasonable tradeoff would be an optional instance
prefix that allows operators to put their domain name in
the instance

Change-Id: I6fcc8498be908c5740439cc00f77474ad151febd

* linting

Change-Id: I9f9a44fa9a2634ef5e4f89548d42d57ce9e4450e
2019-06-24 16:45:37 -06:00
Jess G
ddcf4fc2a3
add support to hide config settings (#2241)
* add hide support for config settings

* updates per CR to unit test

* check err for lint
2019-06-19 07:27:44 -07:00
JT Olio
df2fad15d8
pkg/process/logging: different defaults for release/dev (#2191)
* pkg/process/logging: different defaults for release/dev

Change-Id: I55be80430a31668fededf479b052e106ab18d9ce

* linting

Change-Id: I4e50d4c9569b7324c4704c14df7dd3228dbb7dd5

* Trigger Jenkins

* fix lock file

* use dev=debug and prod=info
2019-06-13 10:43:39 -06:00
JT Olio
fa09a7894b pkg/process: prometheus support (#2176) 2019-06-13 18:29:35 +02:00
JT Olio
e60ff9dcbb
process/metrics: have metrics suffix default to dev/release status (#2073)
What: this will make it so release binaries default to whatever-release instead of whatever-dev in metrics collection

Why: So we can monitor release binaries with default configuration without getting drowned out by dev binaries
2019-05-31 16:47:48 -06:00
Jeff Wendling
140251882e
fix bug for setting flag only values in process setup (#2089)
* fix bug for setting flag only values in process setup

when the code was changed to directly load values into the config
structs, it was missed that some configuration is only defined
through flags, but can be loaded from config files still.

so, we need to propagate the settings to the flag only values.

* add test for setting propagation

* fix linting error
2019-05-31 21:15:50 +00:00
Jeff Wendling
e74cac52ab
Command line flags features and cleanup (#2068)
* change BindSetup to be an option to Bind
* add process.Bind to allow composite structures
* hack fix for noprefix flags
* used tagged version of structs

Before this PR, some flags were created by calling `cfgstruct.Bind` and having their fields create a flag. Once the flags were parsed, `viper` was used to acquire all the values from them and config files, and the fields in the struct were set through the flag interface.

This doesn't work for slices of things on config structs very well, since it can only set strings, and for a string slice, it turns out that the implementation in `pflag` appends an entry rather than setting it.
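
The problem can be illustrated with a small sketch. `appendingSlice` below is a hypothetical flag value whose `Set` appends, a simplified stand-in for the behavior described above rather than `pflag`'s actual code; `Config`, `viaFlag`, and `direct` are likewise invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// appendingSlice is a hypothetical flag value whose Set appends to the
// existing entries instead of replacing them.
type appendingSlice []string

func (s *appendingSlice) String() string { return strings.Join(*s, ",") }

func (s *appendingSlice) Set(v string) error {
	*s = append(*s, strings.Split(v, ",")...)
	return nil
}

// Config is a hypothetical config struct with a slice field.
type Config struct {
	Peers []string
}

// viaFlag pushes the config-file value through the flag interface, which
// can only accept a string and appends it to what is already there.
func viaFlag(existing, fromFile []string) []string {
	val := appendingSlice(append([]string(nil), existing...))
	_ = val.Set(strings.Join(fromFile, ","))
	return val
}

// direct loads the config-file value straight into the struct field.
func direct(cfg *Config, fromFile []string) {
	cfg.Peers = append([]string(nil), fromFile...)
}

func main() {
	fmt.Println(viaFlag([]string{"a"}, []string{"b", "c"})) // [a b c]

	cfg := Config{Peers: []string{"a"}}
	direct(&cfg, []string{"b", "c"})
	fmt.Println(cfg.Peers) // [b c]
}
```

Going through the flag leaves the stale entry `a` in place, while loading directly into the struct replaces the slice as intended.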

This changes three things:

1. Only have a `Bind` call instead of `Bind` and `BindSetup`, and make `BindSetup` an option instead.
2. Add a `process.Bind` call that takes in a `*cobra.Cmd`, binds the struct to the command's flags, and keeps track of that struct in a global map keyed by the command.
3. Use `viper` to get the values and load them into the bound configuration structs instead of using the flags to propagate the changes.

In this way, we can support whatever rich configuration we want in the config yaml files, while still getting command-line flags when important.
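
The registry described in item 2 might look roughly like the sketch below. It is a simplified illustration, not the `process.Bind` implementation: `Command` is a hypothetical stand-in for `*cobra.Command`, and the flag binding and `viper` loading steps are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Command is a hypothetical stand-in for *cobra.Command.
type Command struct{ Name string }

var (
	mu    sync.Mutex
	bound = map[*Command][]interface{}{} // config structs keyed by command
)

// Bind records cfg as a configuration struct bound to cmd, so that a later
// loading step can find every struct that belongs to the command.
func Bind(cmd *Command, cfg interface{}) {
	mu.Lock()
	defer mu.Unlock()
	bound[cmd] = append(bound[cmd], cfg)
}

// Bound returns a copy of the configuration structs registered for cmd.
func Bound(cmd *Command) []interface{} {
	mu.Lock()
	defer mu.Unlock()
	return append([]interface{}(nil), bound[cmd]...)
}

func main() {
	type runConfig struct{ Addr string }

	run := &Command{Name: "run"}
	Bind(run, &runConfig{})

	// A loader would iterate over these and fill in values from config files.
	for _, cfg := range Bound(run) {
		fmt.Printf("%s has bound config %T\n", run.Name, cfg)
	}
}
```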
2019-05-29 17:56:22 +00:00
Ivan Fraixedes
69d8b9f828
Change where the encryption key is being stored for uplink (#1967)
* uplink: Add a new flag to set the filepath of the file which is used for
  saving the encryption key, rename the flag that holds the encryption key, and
  establish that it has priority over the key stored in the file, to keep the
  configuration usable without a huge refactoring in test-sim.
* cmd/uplink: Adapt the setup subcommand for storing the user input key to a file 
  and adapt the rest of the subcommands for reading the key from the key-file when 
  the key isn't explicitly set with a command line flag.
* cmd/gateway: Adapt it to read the encryption key from the key-file or use the 
  one passed by a command line flag.
* pkg/process: Export the default configuration filename so other packages which
  use the same value can reference it rather than having it hardcoded.
* Adapt several integrations (scripts, etc.) to consider the changes applied in uplink and cmd packages.
2019-05-22 15:57:12 +02:00
Jeff Wendling
15e74c8c3d uplink share subcommand (#1924)
* cmd/uplink: add share command to restrict an api key

This commit is an early bit of work to just implement restricting
macaroon api keys from the command line. It does not convert
api keys to be macaroons in general.

It also does not apply the path restriction caveats appropriately
yet because it does not encrypt them.

* cmd/uplink: fix path encryption for shares

It should now properly encrypt the path prefixes when adding
caveats to a macaroon.

* fix up linting problems

* print summary of caveat and require iso8601

* make clone part more clear
2019-05-14 12:15:12 -06:00
Bill Thorp
89c5e70003
defaults now commented out (#1878)
* defaults now commented out, unless custom / user / override
2019-05-08 08:14:00 -04:00
Michal Niewrzal
dcea59205d
Uplink CLI setup welcome message (#1735) 2019-04-24 15:17:32 +02:00
JT Olio
2f9aa7e7be pkg/process: add --debug.trace-out option to binaries (#1788)
this adds a flag to all pkg/process binaries where if set,
will write a process-level trace for all monkit-instrumented
functions as an svg to the target path.

usage like:

  uplink cp --debug.trace-out=trace.svg file sj://bucket/file

Change-Id: I8f6bbfc488f0f1ac270cb28a61347c6b9698cea7
2019-04-19 11:34:29 -04:00
JT Olio
27ec69eb79 add ability to inspect version to all binaries (#1685)
* add ability to inspect version to all binaries

Change-Id: I09d83b0e6ddd145dc02ae6619f842fcc7c3a0469

* add monitoring

Change-Id: I14ceca4cd7706f4c33fa67f7b34ebb8297fc76ad

* review comments

Change-Id: I255c5559be78b92cba4b23e22cf92f459bbd16b7

* switch to atomic

Change-Id: Ibdab950c59b974f64b129d783b283abe025ab0ff

* linting

Change-Id: Ia2407172e64c2cc92e78ad8b48c371e61ae67278
2019-04-10 08:38:26 +02:00
Michal Niewrzal
07afba521d
Replace localhost with 127.0.0.1 in defaults (#1678) 2019-04-05 14:09:16 +02:00