Now that we are trying to identify the root cause of the satellite load limitations (i.e. currently the satellite has a max ability of 400 rps for uploads and we need this to be higher), we are using the golang diagnostic tools to collect insight into what the bottlenecks are. We currently have a debug endpoint to gather some cpu and mem data, but it could be useful to have continuous profiling. GCP stackdriver has support for continuous profiling so lets set that up and see if it is helpful to gather more data.
This PR adds support for [GCP continuous profiler](https://cloud.google.com/profiler) which allows enabling continuous cpu/mem profiling and the stats are sent to stackdriver in google cloud console.
To enable the continuous profiling for a storj component, do the following:
- prereq: the workload must be running in GKE and have Stackdriver Profiling IAM role permissions
- provide the config flag `debug.profilename` in the config.yaml file for the workload (i.e. satellite api process, etc). The profilename should be the workload name, for example "satellite-api".
- once the above config flag is provided, the profiler will be initialized and profiling stats will automatically be sent to GCP project where the workload is running and viewable in the Stackdriver Profile page in the console
The current implementation assumes the workload is running in GKE, however if we find if useful we can add support to enable this from anywhere. But for simplicity, its configured this way assuming the main goal is to enable in production systems.
Change-Id: Ibf8ebe2df7bf06fdd4951ee6a1e48854dd36ad47
this commit updates our monkit dependency to the v3 version where
it outputs in an influx style. this makes discovery much easier
as many tools are built to look at it this way.
graphite and rothko will suffer some due to no longer being a tree
based on dots. hopefully time will exist to update rothko to
index based on the new metric format.
it adds an influx output for the statreceiver so that we can
write to influxdb v1 or v2 directly.
Change-Id: Iae9f9494a6d29cfbd1f932a5e71a891b490415ff
By separating Server it allows Peers to directly embed the server
and provide customizations and hooks into rest of the services.
Change-Id: Ic1d68740fd494d2f82c1739bd990849c561b912b
We want to make using uplink as easy as possible. That's why we wan't to
avoid requiring setup or import command before normal usage if user
specified --access flag. If this flag is set then rest flags should be
set as defaults.
https://storjlabs.atlassian.net/browse/V3-3490
Change-Id: I95a7bd77a3f00b8d9981fee513e9e77aef298bca
After changing how we execute the storagenode-updater process we lost
timestamps in the log.
The fix is to start using zap logging.
The Windows Installer is changed to register the storagenode-updater
service in a way that the Windows Service Manager passes the
--log.output flag instead of the old --log.
The old --log flag is deprecated, but not removed. We will support it
for backward compatibility. This is required as the storagenode-updater
can auto-updated itself, but the Windows Service Manager of this old
installtion will continue passing the old --log flag when starting it.
Change-Id: I690dff27e01335e617aa314032ecbadc4ea8cbd5
Signed-off-by: Kaloyan Raev <kaloyan@storj.io>
the current prometheus help messages have enough unexpected
characters that they are breaking prometheus parsing. they
may also be triggering prometheus to expect more from us (type
annotations) than we have to offer.
we're really not adding a lot of value with these help messages,
so just take them out
Change-Id: I9b723447a294bb492a6292480e9f88634346a80b
* pkg/process: Fatal show complete error information
Change the general process execution function to not using the sugared
logger for outputting the full error information.
Delete some unreachable code because Zap logger Fatal method calls exit
1 internally.
* storagenode/storagenodedb: Add info to error
Add more information to an error returned due to some data
inconsistency.
* storagenode/orders: Don't use sugared logger
Don't use sugar logger and provide better contextualized error messages
in settle method.
* storagenode/orders: Add some log fields to error msgs
Add some relevant log fields to some logged errors of the sender settle
method.
* satellite/orders: Remove always nil error from debug
Remove an error which as logged in debug level which was always nil and
makes the logic that used this variable clear.
* storagenode/orders: Don't return error Archiving unsent
Don't stop the process which archive unsent orders if some of them
aren't found the DB because it cause the Storage Node to stop with a
fatal error.
this avoids a problem where setting on a flag isn't sufficient
to express complex data structures like []string.
Change-Id: I06f13656996d658b4c7a957451cb253728a67eda
What: Change cmd/uplink to use scopes
It moves the fields that will be subsumed by scopes into an explicit legacy section and hides their configuration flags.
Why: So that it can read scopes in from files and stuff
* storagenode/piecestore: track live requests together
Change-Id: I9ed44e4484b97bcbe076c222450c3449fe8b1075
* show grpc status codes in monkit failures
Change-Id: I68bc3a8d24a372e8147ef2a74636fc3e40fa799a
* small nit
Change-Id: I722b09345377b079e41c5a3dc86d7fd6232c9d24
* pkg/process/metrics: add an instance prefix
the distinction between which satellite is sending which
data should go in the instance field, not the suffix or application
fields. (un)fortunately, the instance id is deliberately not
configurable because we don't want it to be easy to accidentally
have multiple applications collide with the same instance id.
so we're currently stuffing the human readable instance in the
suffix. :(
perhaps a reasonable tradeoff would be an optional instance
prefix that allows operators to put their domain name in
the instance
Change-Id: I6fcc8498be908c5740439cc00f77474ad151febd
* linting
Change-Id: I9f9a44fa9a2634ef5e4f89548d42d57ce9e4450e
What: this will make it so release binaries default to whatever-release instead of whatever-dev in metrics collection
Why: So we can monitor release binaries with default configuration without getting drowned out by dev binaries
* fix bug for setting flag only values in process setup
when the code was changed to directly load values into the config
structs, it was missed that some configuration is only defined
through flags, but can be loaded from config files still.
so, we need to propogate the settings to the flag only values.
* add test for setting propagation
* fix linting error
* change BindSetup to be an option to Bind
* add process.Bind to allow composite structures
* hack fix for noprefix flags
* used tagged version of structs
Before this PR, some flags were created by calling `cfgstruct.Bind` and having their fields create a flag. Once the flags were parsed, `viper` was used to acquire all the values from them and config files, and the fields in the struct were set through the flag interface.
This doesn't work for slices of things on config structs very well, since it can only set strings, and for a string slice, it turns out that the implementation in `pflag` appends an entry rather than setting it.
This changes three things:
1. Only have a `Bind` call instead of `Bind` and `BindSetup`, and make `BindSetup` an option instead.
2. Add a `process.Bind` call that takes in a `*cobra.Cmd`, binds the struct to the command's flags, and keeps track of that struct in a global map keyed by the command.
3. Use `viper` to get the values and load them into the bound configuration structs instead of using the flags to propagate the changes.
In this way, we can support whatever rich configuration we want in the config yaml files, while still getting command like flags when important.
* uplink: Add a new flag to set the filepath of the file which is used for
saving the encryption key and rename the one that hold the encryption key and
establish that it has priority over the key stored in the file to make the
configuration usable without having a huge refactoring in test-sim.
* cmd/uplink: Adapt the setup subcommand for storing the user input key to a file
and adapt the rest of the subcommands for reading the key from the key-file when
the key isn't explicitly set with a command line flag.
* cmd/gateway: Adapt it to read the encryption key from the key-file or use the
one passed by a command line flag.
* pkg/process: Export the default configuration filename so other packages which
use the same value can reference to it rather than having it hardcoded.
* Adapt several integrations (scripts, etc.) to consider the changes applied in uplink and cmd packages.
* cmd/uplink: add share command to restrict an api key
This commit is an early bit of work to just implement restricting
macaroon api keys from the command line. It does not convert
api keys to be macaroons in general.
It also does not apply the path restriction caveats appropriately
yet because it does not encrypt them.
* cmd/uplink: fix path encryption for shares
It should now properly encrypt the path prefixes when adding
caveats to a macaroon.
* fix up linting problems
* print summary of caveat and require iso8601
* make clone part more clear
this adds a flag to all pkg/process binaries where if set,
will write a process-level trace for all monkit-instrumented
functions as an svg to the target path.
usage like:
uplink cp --debug.trace-out=trace.svg file sj://bucket/file
Change-Id: I8f6bbfc488f0f1ac270cb28a61347c6b9698cea7