Alertmanager Telegram integration in Kubernetes
Chris Cowley
Some more Prometheus content now; this time I will share how I do my alerting. I want to receive alerts from my homelab, but I do not care about them too much. If something goes wrong at 2am, I really could not care less. In fact, there is no reason why I would want to be disturbed by them under any circumstances.
The whole point of my homelab is that I can use the same techniques I would use in production, so I do want to set up alerts; I just want to keep them as non-intrusive as possible. A good solution for my alerts, then, is Telegram, as I can have them sent to a channel that I mute. I can still see them, but I am never actually disturbed by them.
As previously mentioned on these pages, I run Kubernetes in my homelab and use the (now classic) Prometheus/Grafana stack for monitoring everything. The basic kube-prometheus-stack Helm chart is pretty much the first thing I installed when I originally built my cluster. Obviously I will build on that by adding Alertmanager, which I had previously disabled.
Prepare Telegram
The first step is to set up Telegram, so open the app and search for “@BotFather”.
Start a chat and enter two commands:
/start
/newbot
BotFather will ask you a couple of questions (name and username) and return a token, which you need to hang on to for later. Next you need to create a channel for it to talk to you on. I use a private chat, so I just chat directly with the bot. You need to get the ID of that chat though. I used the Telegram API:
curl https://api.telegram.org/bot<BOT_TOKEN>/getUpdates
That will return something like:
"chat": {
  "id": -123456789,
  "title": "K3s Alerts"
}
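If you would rather not eyeball the JSON, the chat ID can be fished out with standard shell tools. A small sketch, using a trimmed, hypothetical getUpdates response in place of the real curl output:

```shell
# Hypothetical, trimmed getUpdates response; in practice, pipe curl's output in instead
RESPONSE='{"ok":true,"result":[{"message":{"chat":{"id":-123456789,"title":"K3s Alerts"}}}]}'

# Grab the first "id" field; chat IDs for groups and channels are negative numbers
CHAT_ID=$(printf '%s' "$RESPONSE" | grep -Eo '"id":-?[0-9]+' | head -n 1 | cut -d: -f2)
echo "$CHAT_ID"   # → -123456789
```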
We need to give the chat ID and token to Alertmanager.
Prometheus Alertmanager
Alertmanager is essentially a router for messages. Based on the contents of the message sent by Prometheus, it decides where to route them. In our case, we only have a single route (telegram), but you could have multiple. For example, during the day, perhaps you send informational messages (CPU/RAM usage, disk space warnings, etc) to Telegram. However, after hours these are simply logged, but critical alerts get sent to something like Pushover or Pagerduty.
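To make that day/night idea concrete, here is a rough sketch in Alertmanager's plain config-file syntax. The `pushover` receiver and the `working-hours` interval are made up for illustration, and `active_time_intervals` needs Alertmanager v0.24 or newer:

```yaml
route:
  receiver: telegram            # default: anything unmatched lands here
  routes:
    - receiver: pushover        # hypothetical paging receiver for critical alerts
      matchers:
        - severity = "critical"
    - receiver: telegram
      matchers:
        - severity =~ "info|warning"
      active_time_intervals:
        - working-hours         # only route informational noise during the day

time_intervals:
  - name: working-hours
    time_intervals:
      - weekdays: ["monday:friday"]
        times:
          - start_time: "09:00"
            end_time: "18:00"
```

Anything not caught by a child route falls through to the top-level `telegram` receiver.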
The first thing we need to do is create a secret with that Bot token.
kubectl -n monitoring create secret generic alertmanager-telegram \
  --from-literal 'bot_token=<BOT_TOKEN>'
Making sure that secret is both secure AND versioned is an exercise for the reader. As a reminder, I use Sealed Secrets, which works well enough for my use case. One thing to watch: the Prometheus Operator resolves secret references from the namespace of the AlertmanagerConfig, so the secret needs to live in that same namespace.
There are a couple of things you need to add. The first is an AlertmanagerConfig:
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-config-telegram
  namespace: prometheus
  labels:
    release: prometheus # must match the release label of the Prometheus Helm release
    alertmanagerConfig: alertmanager-telegram # custom label; matched by the selector in the chart's values.yaml
spec:
  route:
    groupBy: ["alertname", "job", "namespace"] # group by whichever labels make sense for you
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 1h
    receiver: "telegram"
    routes:
      - receiver: "telegram"
        matchers:
          - name: severity
            matchType: "=~"
            value: "warning|critical"
  receivers:
    - name: telegram
      telegramConfigs:
        - botToken:
            name: alertmanager-telegram # the secret created earlier
            key: bot_token # the key inside that secret
          chatID: <CHAT_ID>
          parseMode: 'MarkdownV2' # the template I provide uses Markdown
          disableNotifications: false
          sendResolved: true
This is a resource that will be picked up by the Prometheus Operator when we enable Alertmanager. A couple of things to point out are the labels, which are used to make that association, and <CHAT_ID>, which you need to replace with your own. Notice also that, under spec.route.routes, I match on both warning and critical. Normally I would create a second receiver for Pushover/PagerDuty for critical messages.
The next step is to enable Alertmanager in our Prometheus config. Of course this is in our values.yaml (or whichever file Helm gets that info from).
defaultRules:
  create: false
alertmanager:
  enabled: true
  ingress:
    enabled: false
  config:
    global:
      resolve_timeout: 5m
  templateFiles:
    telegram.tmpl: |
      {{ define "telegram.default.message" }}
      {{- if eq .Status "firing" -}}
      {{- if eq .CommonLabels.severity "critical" -}}
      🔴 Alert: {{ .CommonLabels.alertname }}
      {{- else if eq .CommonLabels.severity "warning" -}}
      🟠 Alert: {{ .CommonLabels.alertname }}
      {{- else -}}
      ⚪️ Alert: {{ .CommonLabels.alertname }}
      {{- end }}
      Status: 🔥 FIRING
      Severity: {{ if eq .CommonLabels.severity "critical" }}🔴 {{ .CommonLabels.severity | title }}{{ else if eq .CommonLabels.severity "warning" }}🟠 {{ .CommonLabels.severity | title }}{{ else }}⚪️ {{ .CommonLabels.severity | title }}{{ end }}
      {{- else if eq .Status "resolved" -}}
      ⚪️ Alert: {{ .CommonLabels.alertname }}
      Status: ✅ RESOLVED
      Severity: {{ if eq .CommonLabels.severity "critical" }}🟢 {{ .CommonLabels.severity | title }}{{ else if eq .CommonLabels.severity "warning" }}🟢 {{ .CommonLabels.severity | title }}{{ else }}⚪️ {{ .CommonLabels.severity | title }}{{ end }}
      {{- end }}
      {{- range .Alerts -}}
      {{- if .Labels.job }}
      Job: `{{ .Labels.job }}`
      {{- end }}
      {{- if .Labels.namespace }}
      Namespace: `{{ .Labels.namespace }}`
      {{- end }}
      {{- if .Labels.instance }}
      Instance: `{{ .Labels.instance }}`
      {{- end }}
      {{- if .Annotations.runbook_url }}
      [RunbookURL]({{ .Annotations.runbook_url }})
      {{- end }}
      {{- end }}
      {{ end }}
  alertmanagerSpec:
    alertmanagerConfigMatcherStrategy:
      type: None
    alertmanagerConfigSelector:
      matchLabels:
        alertmanagerConfig: alertmanager-telegram # the same label set on the AlertmanagerConfig
Right, there is a lot going on here so let’s break it down.
Obviously the first thing you need is to set alertmanager.enabled to true so that it actually gets deployed. After that, the lion's share is the template used to format the alerts, in alertmanager.templateFiles. I wanted to use Markdown, and ChatGPT makes that easy.
The other important bit is alertmanager.alertmanagerSpec. This is where we tell it to pick up the AlertmanagerConfig we defined previously.
I also set defaultRules.create to false. Leave it as true and it will create a bunch of alert rules based on Awesome Prometheus Alerts, which are all very nice, but you should write your own based on your needs.
An Alert Rule
Of course, none of this is very useful unless we actually have something to alert on. Let’s create one to get started.
Thanks to the Prometheus Operator, these are just Kubernetes resources:
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  namespace: prometheus
  name: targets
  labels:
    release: prometheus # must match the Helm release name of the Prometheus stack
spec:
  groups:
    - name: targets
      rules:
        - alert: PrometheusTargetMissing
          expr: up == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Prometheus target missing \\(instance {{ $labels.instance }}\\)"
            description: "A Prometheus target has disappeared. An exporter might have crashed. \n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
That is our first rule, and the first thing to notice is that it is associated with Prometheus, not Alertmanager (metadata.labels.release), as the alert is actually sent by Prometheus, with Alertmanager only dealing with routing it. The actual rule that generates the alert is expr. In this case, if up == 0 for 2 minutes, it fires a critical alert.
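As an aside, if you want to sanity-check a rule like this before Telegram is wired up, promtool can unit-test it. This is only a sketch: it assumes the group above is also saved as a plain Prometheus rule file named rules.yaml, since promtool works on rule files rather than on the PrometheusRule CRD:

```yaml
# rules-test.yaml — run with: promtool test rules rules-test.yaml
rule_files:
  - rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # A target that stays down for the whole test window
      - series: 'up{job="node-exporter", instance="node1:9100"}'
        values: "0 0 0 0 0"
    alert_rule_test:
      - eval_time: 4m
        alertname: PrometheusTargetMissing
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node-exporter
              instance: node1:9100
```

With for: 2m, the alert goes pending at the start and should be firing well before the 4-minute evaluation point.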
That is an incredibly basic alert, which simply tells us an exporter has gone down. Let’s create one that does an actual PromQL query. Simply add this to the rules list under spec.groups:
        # node_exporter related metrics for alerting
        - alert: HostOutOfMemory
          expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < .10)
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Host out of memory (instance {{ $labels.instance }})
            description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
This one goes a little further by performing some simple arithmetic on a couple of metrics (node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes) to see if the node is at risk of running out of memory. Here we set the severity label to warning instead of critical.
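To make the arithmetic concrete, here is a quick back-of-the-envelope check with made-up numbers (about 1.2 GiB available out of 16 GiB total):

```shell
# Hypothetical readings, in bytes, as node_exporter would report them
MemAvailable=1288490188   # ~1.2 GiB
MemTotal=17179869184      # 16 GiB

# Same expression as the alert: available / total
ratio=$(awk -v a="$MemAvailable" -v t="$MemTotal" 'BEGIN { printf "%.3f", a/t }')
echo "$ratio"   # → 0.075, below the 0.10 threshold, so the alert would fire
```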
Conclusion
With a Telegram bot, an AlertmanagerConfig route, and a few alert rules, your homelab can now emit Prometheus alerts to a channel you can mute and glance at later. The flow is simple: Prometheus detects issues, Alertmanager routes them by severity, and Telegram delivers concise, Markdown‑formatted messages without interrupting your night. Because the setup uses the same production‑grade primitives (Prometheus Operator resources and Helm values) you can grow it at your own pace: add more rules, tweak group intervals, or introduce additional receivers for daytime chatter or pager‑escalation. In short, you get useful visibility without the noise.