If the host network stack is slow to start, the alertmanager fails to
start with this error message:
caller=main.go:256 msg="unable to initialize gossip mesh" err="create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
This bug can be reproduced by shutting down the network stack and
restarting the alertmanager.
Note I don't know why I didn't hit this issue with previous
alertmanager releases.
A centralized list for these renames is not good because:
- It breaks disabledModules for modules that have a rename defined
- Adding/removing renames for a module means having to find them in the
central file
- Merge conflicts due to multiple people editing the central file
This results in a simpler service unit which doesn't first have to
start a shell:
> cat /nix/store/s95nsr8zbkblklanqpkiap49mkwbaq45-unit-alertmanager.service/alertmanager.service
...
ExecStart=/nix/store/4g784lwcy7kp69hg0z2hfwkhjp2914lr-alertmanager-0.16.2-bin/bin/alertmanager \
--config.file /nix/store/p2c7fyi2jkkwq04z2flk84q4wyj2ggry-checked-config \
--web.listen-address [::1]:9093 \
--log.level warn
...
https://github.com/golang/go/issues/9334 describes how net.Listen (as used by Alertmanager):
* listens on 127.0.0.1 if the listenAddress is "localhost"
* listens on all interfaces if the listenAddress is ""
This commit adds an assertion that checks that either `configFile` or
`configuration` is configured for alertmanager. The alertmanager config
can not be an empty attributeset. The check executed with `amtool` fails
before the service even has the chance to start. We should probably not
allow a broken alertmanager configuration anyway.
This also introduces a test for alertmanager configuration that piggy
backs on the existing prometheus tests.
Alertmanager 0.13.0 doesn't support single dash long options, so '-config.file'
for example is parsed as '-c', which leads to the service not starting.
The reason being less mental overhead when reading upstream
documentation. Examples can be pasted right into the configuration
instead of translating to Nix attrset first.
The current default value of listenAddress = null blows up:
$ nixos-rebuild build
error: cannot coerce null to a string, at
.../nixpkgs/nixos/modules/services/monitoring/prometheus/alertmanager.nix:97:16
With listenAddress = "" we use the same default as upstream and there is
no blow up :-)