The current default value of listenAddress = null blows up:
$ nixos-rebuild build
error: cannot coerce null to a string, at
.../nixpkgs/nixos/modules/services/monitoring/prometheus/alertmanager.nix:97:16
With listenAddress = "" we use the same default as upstream and there is
no blow up :-)
...by providing a default value of "no labels" (an empty attrset).
Without this change we get
$ nixos-rebuild test -I nixpkgs=.
building Nix...
building the system configuration...
error: The option `services.prometheus.scrapeConfigs.[definition 1-entry 1].static_configs.[definition 1-entry 1].labels' is used but not defined.
which is unneeded, because labels _are_ optional.
The structured options are incomplete compared to upstream and I think
it will be a maintenance burden to try to keep up. Instead, provide an
option for the raw config file contents (prometheus.yml).
The collectd service runs as an unprivileged user by default, so it does
not leak more information to its data directory than any user can obtain
elsewhere by other means.
If people are running it as root and are worried about information leak,
we can add collectd group and set perms to 750.
CC @offlinehacker.
Fixes#21198.
* influxdb module: add postStart
* cadvisor module: increase TimeoutStartSec
Under high load, the cadvisor module can take longer than the default 90
seconds to start. This change should hopefully fix the test on Hydra.
Systemd upstream provides targets for networking. This also includes a target network-online.target.
In this PR I remove / replace most occurrences since some of them were even wrong and could delay startup.
Every period, sa1 collects and stores data.
Every 24 hours, sa2 aggregates the previous day's data in to a
report.
Timers and unit configurations were lifted from Fedora's default
units.
This commit implements the changes necessary to start up a graphite carbon Cache
with twisted and start the corresponding graphiteWeb service.
Dependencies need to be included via python buildEnv to include all recursive
implicit dependencies.
Additionally cairo is a requirement of graphiteWeb and pycairo is not a standard
python package (buildPythonPackage) and therefore cannot be included via
buildEnv. It also needs cairo in the Library PATH.
Accidentally broken by 4fede53c09
("nixos manuals: bring back package references").
Without this fix, grafana won't start:
$ systemctl status grafana
...
systemd[1]: Starting Grafana Service Daemon...
systemd[1]: Started Grafana Service Daemon.
grafana[666]: 2016/03/06 19:57:32 [log.go:75 Fatal()] [E] Failed to detect generated css or javascript files in static root (%!s(MISSING)), have you executed default grunt task?
systemd[1]: grafana.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: grafana.service: Unit entered failed state.
systemd[1]: grafana.service: Failed with result 'exit-code'.
This reverts most of 89e983786a, as those references are sanitized now.
Fixes#10039, at least most of it.
The `sane` case wasn't fixed, as it calls a *function* in pkgs to get
the default value.
- add missing types in module definitions
- add missing 'defaultText' in module definitions
- wrap example with 'literalExample' where necessary in module definitions
This reverts most of 89e983786a, as those references are sanitized now.
Fixes#10039, at least most of it.
The `sane` case wasn't fixed, as it calls a *function* in pkgs to get
the default value.
The advantage of putting the PID file under the ephemeral /run is that
when the machine crashes /run gets cleared allowing graphite to start
once the machine is rebooted.
We also set the PIDFile systemd option so that systemd knows the correct
PID and enables systemd to remove the file after service shut down.
* package statsd node packages separatly since they actually require
nodejs-0.10 or nodejs-0.12 to work (which is ... well old)
* remove statsd packages and its backends from "global" node-packages.json.
i did not rebuild it since for some reason npm2nix command fails. next time
somebody will rerun npm2nix statsd packages are going to be removed.
* statsd service: backends are now provided as strings and not anymore as
packages.
The most complex problems were from dealing with switches reverted in
the meantime (gcc5, gmp6, ncurses6).
It's likely that darwin is (still) broken nontrivially.
Now it generates notifications for auto-detected devices as well as
for explicitly configured ones, sends well formed e-mails and supports
immediate `wall` and `xmessage` notifications.
Better replace the double quotes in 'echo "${commands}"' with single
quotes, to prevent the shell from doing command substitution etc. at
configuration build time.
I'm not sure what exactly this user is needed for, i.e. under what circumstances
it must exist or not, but creating it unconditionally seems like the wrong thing
to do. I complained to @offlinehacker about this on Github, but got no response
for a week or so. I'm disabling the extraUsers bit to put out the fire, and now
hope that someone who actually knows about Graphite implements a proper solution
later.
Should bring most of the examples into a better consistency regarding
syntactic representation in the manual.
Thanks to @devhell for reporting.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
All activation scripts run in serial upon boot and nixos-rebuild switch
etc., in contrast to preStart which run before a service starts, and can
run in parallel with other services.
The munin(-node) activation script is particularly slow. Change it to a
preStart script so that it can run in parallel with other services and
not slow down boot (or nixos-rebuild switch).
This reduces (repeated) "nixos-rebuild test" time from ~16 seconds to ~8
on my (old) laptop.
- Upgrade Nagios Core to 4.x
- Expose mainConfigFile and cgiConfigFile in module for finer
configuration control.
- Upgrade Plugins to 2.x
- Remove default objectDefs, which users probably want to customize.
- Systemd-ify Nagios module and simplify directory structure
- Upgrade Nagios package with more modern patch, and ensure the
statedir is set to /var/lib/nagios
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Currently, the restartTriggers are abusing the systemd unit file in that
the cfg.carbon.config/storageAggregation/... option text is pasted into
the unit file. Even though this sort-of works (the service is restarted
if the config changes) this causes systemd to print error messages about
invalid sections (rightfully so!).
The correct use of restartTriggers is to list storage paths, which is
what this change does. If any of the
cfg.carbon/config/storageAggregation/... options change, configDir will
get a new hash. It is not as "fine grained" as the current version, but
it is not abusing the interface.
Also, remove unneeded 'waitress' in one of the restartTriggers, because
it is already listed as part of the service config.
graphitePort must point to the port that carbon-cache listens on, not
the graphite webUI port.
With this change I finally got data from statsd to graphite.
It's "aggregation" with two 'g's.
Fixes this:
carbon-cache[9363]: [console] /nix/store/drxq4jj92sjk3cjik2l4hnsndbray3i4-graphite-config/storage-aggregation.conf not found, ignoring.
This overhauls the Datadog module a bit to be much more useful. In
particular, it adds support for nginx and postgresql monitoring
integrations to dd-agent. These have to exist in separate files under
/etc/dd-agent, so the module just exposes then as separate options. In
the future, more integrations could be added this way.
In the process of doing this, I also had to rename the dd-agent user to
datadog. Note the UIDs did not change, so this is strictly backwards
compatible. The reason for this is to make it easier to create a
'datadog' postgres user with access to pg_stats, as 'dd-agent' typically
isn't a valid username. This allows the out of the box configurations to
be used.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
mkdir -m will only set the permissions if it *creates* the directory.
Existing directories, with possibly wrong permissions, will not be
updated.
Use explicit chmod so permissions will always be correct.
The preStart snippets (graphite, carbon) try to create directories under
/var/db/. That currently fails because the code is run as user
"graphite". Fix by setting "PermissionsStartOnly = true" so that the
preStart stuff is run as 'root'.
Further:
* graphite-web-0.9.12/bin/build-index.sh needs perl, so add it to PATH.
* Now that preStart runs as root, we must wait with "chown graphite"
until we're done creating files/directories.
* Drop needless check for root (uid 0) before running chown.
Using pkgs.lib on the spine of module evaluation is problematic
because the pkgs argument depends on the result of module
evaluation. To prevent an infinite recursion, pkgs and some of the
modules are evaluated twice, which is inefficient. Using ‘with lib’
prevents this problem.
To be compatible with eb2f44c18c (Generate
/etc/passwd and /etc/group at build time). Without this you'll get this:
$ nixos-rebuild build
[...]
user-thrown exception: The option `users.extraGroups.unnamed-9.1.gid' is used but not defined.
(systemd service descriptions that is, not service descriptions in "man
configuration.nix".)
Capitalizing each word in the description seems to be the accepted
standard.
Also shorten these descriptions:
* "Munin node, the agent process" => "Munin Node"
* "Planet Venus, an awesome ‘river of news’ feed reader" => "Planet Venus Feed Reader"
Twisted provides option to log with syslog, this enables nicer logging.
Imagine what happens in a case of exception. If logs are written to stdout,
traceback won't be merged thus giving ugly logs. This commit fixes that.
This is also one of the official ways of starting carbon, so no worries.