Configure Alert Receivers

APPUiO Cloud allows you to configure custom alert receivers to notify you when an alert is triggered.

This how-to guide shows you how to configure alert receivers and routing, with examples for Pushover, Telegram, Slack, and email.

Prerequisites

  • Command Line Access to APPUiO Cloud
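
As a quick check that your command-line access works and that you're allowed to create AlertmanagerConfig resources, you can run the following (my-namespace is a placeholder for your namespace):

kubectl auth can-i create alertmanagerconfigs.monitoring.coreos.com -n my-namespace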

Configure Alertmanager

OpenShift uses Alertmanager to manage alerts and send notifications to the correct receiver.

Alertmanager supports a number of integrations, which can be listed by running kubectl explain alertmanagerconfigs.spec.receivers.
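
The command's output lists one field per supported integration. An abbreviated, illustrative excerpt (the exact field list depends on the CRD version installed on the cluster):

FIELDS:
  emailConfigs      <[]Object>
  name              <string> -required-
  opsgenieConfigs   <[]Object>
  pagerdutyConfigs  <[]Object>
  pushoverConfigs   <[]Object>
  slackConfigs      <[]Object>
  telegramConfigs   <[]Object>
  webhookConfigs    <[]Object>
  ...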

The basic structure of the Alertmanager configuration is as follows. Click the links below to see a configuration example for each section:

apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: alert-config
spec:
  receivers:
  - name: telegram
    telegramConfigs: {} (1)
  - name: slack
    slackConfigs: {} (2)
  - name: email
    emailConfigs: {} (3)
  route: (4)
    receiver: default
    routes: []
1 Configuration for Telegram. See Configure Telegram Receiver for more information.
2 Configuration for Slack. See Configure Slack Receiver for more information.
3 Configuration for email, set up to only send alerts outside office hours. See Configure Email Receiver for more information.
4 Route configuration. All alerts are routed to the default receiver. See Routing Alerts for more information.

Configure Pushover Receiver

  1. Create a new application on the Pushover console.

  2. Set the namespace the alerts are created in, as well as the two Pushover credentials: the user key and the API key.

    APP_NAMESPACE=my-namespace
    PUSHOVER_USER_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    PUSHOVER_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  3. Create secrets for your user key and the API key of your newly created application:

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: v1
    kind: Secret
    metadata:
      name: pushover-user-key
    type: Opaque
    stringData:
      value: ${PUSHOVER_USER_KEY}
    YAML
    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: v1
    kind: Secret
    metadata:
      name: pushover-api-key
    type: Opaque
    stringData:
      value: ${PUSHOVER_API_KEY}
    YAML
  4. Create an AlertmanagerConfig using the secrets created above:

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: ${APP_NAMESPACE}-alert-config
    spec:
      receivers:
      - name: pushover
        pushoverConfigs:
        - userKey:
            key: value
            name: pushover-user-key
          token:
            key: value
            name: pushover-api-key
      route:
        receiver: pushover
        routes:
        - match:
            severity: critical
          continue: true
    YAML

Configure Telegram Receiver

  1. Set the namespace the alerts are created in.

    APP_NAMESPACE=my-namespace
  2. Create a Telegram bot by messaging the @BotFather. See Telegram documentation for more information.

  3. Create a secret with the bot token returned by the BotFather.

    TOKEN=<TOKEN> (1)
    kubectl -n "${APP_NAMESPACE}" create secret generic telegram-alert-bot "--from-literal=token=${TOKEN}"
    1 Looks like "123456789:ABCDEF1234567890ABCDEF1234567890ABC"
  4. Find the Chat ID.

    Add the bot to a group chat and send a message to the group, or message the bot directly.

    Query the Telegram API to get the Chat ID.

    curl -X GET "https://api.telegram.org/bot${TOKEN}/getUpdates" | jq '.result[].message | select(. != null)'

    There should be an object with a text field that contains the message you sent, and a chat.id field. Save the chat.id value for the next step.

    CHAT_ID=<CHAT_ID>
  5. Create the AlertmanagerConfig.

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: telegram-alert-config
    spec:
      receivers:
      - name: default
        telegramConfigs:
        - apiURL: https://api.telegram.org
          botToken: (1)
            key: token
            name: telegram-alert-bot
          chatID: ${CHAT_ID}
          parseMode: HTML (2)
      route:
        groupBy: [alertname, instance]
        receiver: default
    YAML
    1 The secret created earlier.
    2 The default parse mode is MarkdownV2 which doesn’t work with the default message template. Telegram wants every special character to be escaped with a backslash. HTML works well with the default Alertmanager template.
  6. Test the receiver. See Test Alerting.

Configure Slack Receiver

  1. Set the namespace the alerts are created in.

    APP_NAMESPACE=my-namespace
  2. Get the webhook URL from Slack.

    Go to my.slack.com/services/new/incoming-webhook/, choose the workspace in the upper-right corner, select a channel, and click Add Incoming webhooks integration.

    Copy the webhook URL.

    (Screenshot: Slack Webhook configuration page)
  3. Create a secret with the webhook URL.

    URL=<URL>
    kubectl -n "${APP_NAMESPACE}" create secret generic slack-alert-webhook "--from-literal=url=${URL}"
  4. Create the AlertmanagerConfig.

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: slack-alert-config
    spec:
      receivers:
      - name: default
        slackConfigs:
        - apiURL: (1)
            key: url
            name: slack-alert-webhook
          channel: '#alerts' (2)
      route:
        groupBy: [alertname, instance]
        receiver: default
    YAML
    1 The secret created earlier.
    2 The channel to send alerts to.
  5. Test the receiver. See Test Alerting.

Configure Email Receiver

  1. Set the namespace the alerts are created in.

    APP_NAMESPACE=my-namespace
  2. Create a secret with the email authentication password.

    PASSWORD=<PASSWORD>
    kubectl -n "${APP_NAMESPACE}" create secret generic email-alert-auth "--from-literal=password=${PASSWORD}"

    This tutorial sets up SMTP authentication using PLAIN authentication. CRAM-MD5 and LOGIN are also supported. See the Prometheus documentation for a description of the supported authentication fields.

  3. Create the AlertmanagerConfig.

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: alert-config
    spec:
      receivers:
      - name: default
        emailConfigs:
        - to: alerts@example.com (1)
          from: my-app@example.com (2)
          smarthost: smtp.example.com:587 (3)
          authUsername: my-app@example.com (4)
          authIdentity: my-app@example.com (5)
          authPassword: (6)
            key: password
            name: email-alert-auth
      route:
        groupBy: [alertname, instance]
        receiver: default
    YAML
    1 The email address to send alerts to.
    2 The email address to send alerts from.
    3 The SMTP server to send alerts through. Alertmanager doesn’t support unencrypted connections to remote SMTP endpoints.
    4 The username to authenticate with. If empty, Alertmanager doesn’t authenticate to the SMTP server.
    5 The PLAIN identity to authenticate with. Can be the same as the username. Might be omitted for LOGIN protocol.
    6 The secret created earlier.
  4. Test the receiver. See Test Alerting.

Routing Alerts

Alerts can be routed to different receivers based on the labels attached to the alerts.

An AlertmanagerConfig resource defines a routing tree with a root route and child routes. The root route is the default route and is used for all alerts that don’t match any of the child routes. The matcher of the root route is set by APPUiO Cloud to match the namespace of the AlertmanagerConfig. All other matchers are ignored.

The routing tree
route:
  receiver: default
  routes:
  - match: { team: foo }
    continue: true
  - match: { team: bar }
    routes:
    - match: { severity: warning }
    - match: { severity: critical }

The child routes are evaluated in order and the first matching route is used to route the alert. The match field of a route defines the labels that must match for the route to match. The match field can be omitted to match all alerts.

Setting continue to true on a route will cause the evaluation to continue to the next route. This can be used to send alerts to multiple receivers.
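
As a minimal sketch, assuming two hypothetical receivers named oncall-pager and team-slack, a critical alert is delivered to both because the first route sets continue: true:

route:
  receiver: default
  routes:
  - match: { severity: critical }
    receiver: oncall-pager
    continue: true
  - match: { severity: critical }
    receiver: team-slack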

It’s possible to mute alerts during certain times of the day. This can be done by defining a timeInterval and referencing it in the muteTimeIntervals field of a route.

Time intervals
route:
  muteTimeIntervals: [weekends]
timeIntervals:
- name: weekends
  timeIntervals:
  - weekdays: [saturday, sunday]

Alerts can be grouped by labels using the groupBy field. Grouping combines similar alerts into a single notification.

Alerts grouped by name
route:
  groupBy: [alertname]

See Prometheus documentation for more information about routing.

Keys need to be mapped from the Prometheus style (snake case) to the Kubernetes CRD style (camel case).
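
For example, group_by and mute_time_intervals from the Prometheus Alertmanager configuration become groupBy and muteTimeIntervals in the CRD:

# Prometheus Alertmanager configuration (alertmanager.yml)
route:
  group_by: [alertname]
  mute_time_intervals: [weekends]

# Equivalent AlertmanagerConfig CRD fields
route:
  groupBy: [alertname]
  muteTimeIntervals: [weekends]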

Some fields might not be supported yet by the AlertmanagerConfig CRD.

Example

This example shows how to route alerts to different receivers based on the labels attached to the alerts.

apiVersion: monitoring.coreos.com/v1beta1
kind: AlertmanagerConfig
metadata:
  name: alert-config
spec:
  receivers:
  - name: default
  - name: database-team-slack
  [ ... ]
  route:
    receiver: default (1)
    routes:
    - match:
        team: database
      continue: false (2)
      receiver: database-team-slack
      routes:
      - match: (3)
          severity: critical
        receiver: database-team-pagerduty
        continue: true (4)
    - match:
        team: backend
      continue: false (2)
      receiver: backend-team-slack
      muteTimeIntervals: (5)
      - non-office-hours
      routes:
      - match:
          severity: critical
        receiver: backend-team-pagerduty
        continue: true
    groupBy: [alertname, instance] (6)
  timeIntervals:
  # Monday to Friday, midnight to 9am and 5pm to midnight (UTC) and weekends
  - name: non-office-hours (7)
    timeIntervals:
    - times:
      - startTime: "00:00" (8)
        endTime: "09:00"
      - startTime: "17:00"
        endTime: "24:00"
      weekdays:
      - "monday:friday"
    - weekdays:
      - "saturday"
      - "sunday"
1 The default receiver. All alerts that don’t match any team will be routed to this receiver.
2 Don’t further evaluate the remaining routes. The alert will be routed to the given team and won’t appear in the default "catch-all" receiver. false is the default value and only set for demonstration purposes.
3 Matches alerts with the labels team=database, inherited from the parent route, and severity=critical.
4 Sets this route as an additional receiver for the alert. The alert will be routed to the database-team-pagerduty receiver and the parent Slack receiver.
5 Mutes the Slack alerts for the back-end team during non-office hours. A muted route doesn't send any notifications but otherwise acts normally; this includes ending the route-matching process if the continue option isn't set.
6 The labels to group alerts by. Grouping combines similar alerts into a single notification.
7 Defines a time interval that can be referenced in the muteTimeIntervals field of a route.
8 All times are in UTC, no other time zones are supported.

Test Alerting

All changes to alerting routes and receivers should be tested to ensure that alerts actually reach the configured receivers.

  1. Set the namespace you created the AlertmanagerConfig in.

    APP_NAMESPACE=my-namespace
  2. Create a test alert that’s always firing.

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: test-alert
    spec:
      groups:
      - name: test-alert
        rules:
        - alert: TestAlert
          expr: vector(1)
          for: 10s
          labels: {} (1)
          annotations:
            summary: "Test Alert"
            description: "This is a test alert"
    YAML
    1 Add labels to the alert to route it to the correct receiver. Usually the default receiver is used if no labels are set.
  3. Check your configured receivers to see if the alert was received.

    See Troubleshooting for debugging tips.

  4. Delete the test alert.

    kubectl -n "${APP_NAMESPACE}" delete prometheusrule test-alert
  5. Repeat the test for all receivers by adding the appropriate labels to the alert.
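
    For example, to exercise the database-team route from the routing example above, the labels block of the test alert could be set as follows (the team and severity values are taken from that example and need to match your own routing tree):

    labels:
      team: database
      severity: critical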

Troubleshooting

OpenShift currently doesn't expose any logs or status information for Alertmanager, which makes debugging configuration issues difficult. The following describes some common issues and how to debug them.

Check all secret references.

One missing secret reference blocks the creation of all the receivers in the AlertmanagerConfig.

Check the secret values.

Alertmanager may fail to send alerts when the routing configuration has wrong login credentials or missing certificates.
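
For example, to verify that the referenced secrets exist and contain the expected values (the secret names below are the ones created earlier in this guide; adjust them to your configuration):

kubectl -n "${APP_NAMESPACE}" get secret telegram-alert-bot slack-alert-webhook email-alert-auth
kubectl -n "${APP_NAMESPACE}" get secret email-alert-auth -o jsonpath='{.data.password}' | base64 -d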

Create a Webhook debug server.

A webhook receiver can be used to debug issues with the routing configuration or to check secret values.

There are webhook debuggers available online, such as webhook.site or requestbin.com.

It’s not recommended to send sensitive information to these services.

This example creates its own basic webhook receiver on APPUiO Cloud.

  1. Create a Webhook Receiver

    APP_NAMESPACE=my-namespace
    APPS_DOMAIN=apps.ZONE # Replace with your cluster's default app domain from https://portal.appuio.cloud/zones
    
    kubectl -n "${APP_NAMESPACE}" create -f - <<YAML
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logging-webhook
    data:
      logging_server.go: |
        package main
    
        import (
          "fmt"
          "io"
          "net/http"
          "os"
        )
    
        func main() {
          http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            defer w.WriteHeader(http.StatusNoContent)
            defer r.Body.Close()
    
            fmt.Println(r.Method, r.URL)
            r.Header.Write(os.Stdout)
            fmt.Print("\n")
            io.Copy(os.Stdout, r.Body)
            fmt.Print("\n\n")
          })
          panic(http.ListenAndServe(":8080", nil))
        }
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: logging-webhook
      name: logging-webhook
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: logging-webhook
      strategy: {}
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: logging-webhook
        spec:
          containers:
          - image: golang
            name: golang
            env: [{ name: 'HOME', value: '/tmp' }]
            ports:
            - containerPort: 8080
            resources: {}
            command: [go, run, "/s/logging_server.go"]
            volumeMounts:
            - mountPath: /s
              name: logging-webhook
          volumes:
          - configMap:
              name: logging-webhook
            name: logging-webhook
    YAML
    
    kubectl -n "${APP_NAMESPACE}" expose deployment logging-webhook
    kubectl -n "${APP_NAMESPACE}" create ingress logging-webhook "--rule=logging-webhook-${APP_NAMESPACE}.${APPS_DOMAIN}/*=logging-webhook:8080,tls=logging-webhook-tls"
    kubectl -n "${APP_NAMESPACE}" annotate ingress logging-webhook "cert-manager.io/cluster-issuer=letsencrypt-production"
  2. Convert or create an AlertmanagerConfig to use the webhook receiver.

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: debug-alert-config
    spec:
      receivers:
      - name: default
        webhookConfigs:
        - url: https://logging-webhook-${APP_NAMESPACE}.${APPS_DOMAIN}/
          sendResolved: true
          httpConfig:
            authorization:
              credentials:
                name: email-alert-auth (1)
                key: password
      route:
        groupBy: [alertname, instance]
        receiver: default
    YAML
    1 Can be used to check whether secret values are correctly resolved and sent to the webhook receiver. The value will show up in the logs as Authorization: Bearer <secret value>.
  3. Check the server is up and running.

    Getting the TLS certificate may take a few minutes.

    curl -v https://logging-webhook-${APP_NAMESPACE}.${APPS_DOMAIN}/
  4. Check the logs of the webhook receiver.

    kubectl -n "${APP_NAMESPACE}" logs deployment/logging-webhook -f

    A log message similar to the following should appear:

    POST /
    Authorization: Bearer xxxxxx
    Content-Length: 1206
    Content-Type: application/json
    Forwarded: for=172.18.200.242;host=logging-webhook-my-namespace.apps.lab-cloudscale-rma-0.appuio.cloud;proto=https
    User-Agent: Alertmanager/0.24.0
    X-Forwarded-For: 172.18.200.242
    X-Forwarded-Host: logging-webhook-my-namespace.apps.lab-cloudscale-rma-0.appuio.cloud
    X-Forwarded-Port: 443
    X-Forwarded-Proto: https
    
    {"receiver":"my-namespace/debug-alert-config/default","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"EndpointDown","instance":"my-app:8080","job":"probe/my-namespace/my-app-ingress","namespace":"my-namespace","prometheus":"openshift-user-workload-monitoring/user-workload","severity":"critical"},"annotations":{"summary":"Endpoint my-app:8080 down"},"startsAt":"2022-11-09T13:41:43.399Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"https://thanos-querier-openshift-monitoring.apps.lab-cloudscale-rma-0.appuio.cloud/api/graph?g0.expr=probe_success%7Bnamespace%3D%22my-namespace%22%7D+%3D%3D+0\u0026g0.tab=1","fingerprint":"972537dc10d0155f"}],"groupLabels":{"alertname":"EndpointDown","instance":"my-app:8080"},"commonLabels":{"alertname":"EndpointDown","instance":"my-app:8080","job":"probe/my-namespace/my-app-ingress","namespace":"my-namespace","prometheus":"openshift-user-workload-monitoring/user-workload","severity":"critical"},"commonAnnotations":{"summary":"Endpoint my-app:8080 down"},"externalURL":"https:///console.lab-cloudscale-rma-0.appuio.cloud/monitoring","version":"4","groupKey":"{}/{namespace=\"my-namespace\"}:{alertname=\"EndpointDown\", instance=\"my-app:8080\"}","truncatedAlerts":0}
  5. Delete the manifests.

    kubectl -n "${APP_NAMESPACE}" delete alertmanagerconfig debug-alert-config
    kubectl -n "${APP_NAMESPACE}" delete ingress logging-webhook
    kubectl -n "${APP_NAMESPACE}" delete service logging-webhook
    kubectl -n "${APP_NAMESPACE}" delete deployment logging-webhook
    kubectl -n "${APP_NAMESPACE}" delete configmap logging-webhook
    kubectl -n "${APP_NAMESPACE}" delete secret logging-webhook-tls