Monitor HTTP(S) Endpoints Using Blackbox Exporter

To get notified about firing alerts you need to setup an alert receiver.

See Configure Alert Receivers for more information.

Prerequisites

Command Line Access to APPUiO Cloud
Helm 3, helm.sh

Install Blackbox Exporter

Set the namespace for the Blackbox Exporter

APP_NAMESPACE="my-app" (1)
EXPORTER_NAMESPACE="${APP_NAMESPACE}" (2)

1	The namespace of your application.
2	The Exporter can be installed in the same namespace as your application or in a separate namespace. If you deploy the Blackbox Exporter in a separate namespace and want to use in-cluster probes, you’ll have to configure additional network policies as the default network policies configured by APPUiO Cloud isolate namespaces from each other. See the Tips & Tricks section for more information.

Check if signed in
```
kubectl get projects
```

Install the Blackbox Exporter

# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the Blackbox Exporter
helm install blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  --namespace "${EXPORTER_NAMESPACE}" \
  --set pspEnabled=false \
  --set securityContext.runAsUser=null,securityContext.runAsGroup=null

Configure a Blackbox Exporter Probe

Configure a probe

kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: my-app
spec:
  prober:
    url: blackbox-exporter-prometheus-blackbox-exporter.${EXPORTER_NAMESPACE}.svc.cluster.local:9115 (1)
  targets:
    staticConfig:
      static:
      - my-app.${APP_NAMESPACE}.svc:8080 (2)
      - https://my-app.apps.appuio-cluster.cloud (3)
YAML

1	The URL of the Blackbox Exporter may be different if you’re using different Helm values or a different release name. Check with `kubectl get svc -n ${EXPORTER_NAMESPACE}`.
2	You can configure a cluster-local probe by providing the name and port of the application’s in-cluster service. The suffix `.${APP_NAMESPACE}.svc` can be omitted if you’ve installed the Blackbox Exporter in the same namespace as your application.
3	You can configure a probe to the public address of your application. This is only relevant if you’re setting up a probe for an application which is exposed through an `Ingress` or `Route`. You may want to explicitly probe both the `https://` and `http://` URLs to test the HTTPS redirect.

APPUiO Cloud currently doesn’t support configuring probes with Ingress discovery (.spec.targets.ingress).

Verify the configuration

Navigate to the OpenShift Web Console console.CLUSTER.cloud/dev-monitoring/ns/<APP_NAMESPACE>/metrics.

Select Custom Query and enter up{job=~"probe/.*"}. There should be an entry for each target with a value of 1 if the probe was successful.

If the probe doesn’t show up in the list or the value is 0 see the Troubleshooting section.
Verify the probe

Select Custom Query and enter probe_success. There should be an entry for each target with a value of 1 if the probe was successful.

If the probe doesn’t show up in the list or the value is 0 see the Troubleshooting section.

Create an Alert for the Probe

Create the alert

kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-blackbox-alerts
spec:
  groups:
  - name: my-app-blackbox.rules
    rules:
    - alert: EndpointDown
      expr: probe_success == 0 (1)
      for: 5m (2)
      labels:
        severity: critical
      annotations:
        summary: "Endpoint {{ \$labels.instance }} down"
YAML

1	The `probe_success` metric is created by the Blackbox Exporter as an overall status of the probe. You can also alert on the individual metrics like `probe_http_status_code` or `probe_duration_seconds`. Query all available metrics with `{job=~"probe/.*"}`.
2	The `for` clause specifies how long the probe has to fail before the alert is triggered. Using a low value here will trigger the alert quicker, but may generate false positives.

Verify that the alert was picked up by Prometheus

Navigate to the OpenShift Web Console console.<CLUSTER>.cloud/dev-monitoring/ns/<APP_NAMESPACE>/alerts.

The alert should be displayed in the list of alerts.

It may take a few minutes for the alert to show up. If the alert doesn’t appear after a few minutes, see the Troubleshooting section.

Troubleshooting

No Entry is Shown in the List if Querying the `up` Metric `up{job=~"probe/.*"}`

The probe CRD isn’t well-formed or Prometheus hasn’t yet scraped the probe.

Prometheus hasn’t yet scraped the probe. Wait for three more minutes and check again.

If no entry is shown in the list, the Probe CRD most likely is misconfigured. Especially check the spec.prober.url. The format is HOST:PORT. HOST isn’t valid.

The Value of the `up` Metric is `0`

The probe CRD is well-formed but the Blackbox Exporter isn’t reachable.

If the value is 0, the Probe CRD is valid, but the Blackbox Exporter can’t be reached. Double check the .spec.prober.url of the Probe CRD.

Check if the Blackbox Exporter deployment is running and ready.

`probe_success` shows `0`

The probe CRD is well-formed and the Blackbox Exporter is reachable. The target isn’t reachable, the certificate is bad, or the target returns an unwanted status code.

Check the probe_http_status_code for the job. Only 2XX is considered a success after the final redirect. If the status code is 0, the problem is earlier in the chain.

No Alerts Shows Up in the Alert List

The PrometheusRule CRD might not be well-formed and might have failed to apply.

Check if the PrometheusRule CRD was created successfully.
```
kubectl -n "${APP_NAMESPACE}" get prometheusrules
```

Debug Failing Probes

Connect to the Blackbox Exporter

kubectl --namespace "${EXPORTER_NAMESPACE}" port-forward svc/blackbox-exporter-prometheus-blackbox-exporter 9115

Open the Blackbox Exporter in a browser

Navigate to localhost:9115.
Choose the failed target from the list

Click logs on the failed target.
Check the logs

The logs should show the reason for the failure. In this example an expired certificate is the cause of the error.

Tips & Tricks

Public (Ingress) vs. Local Probes

Local probes have a higher chance of just monitoring the application. They don’t have to go through the ingress so you won’t be notified if there are problems with the ingress. However, it isn’t possible to detect expired certificates or misconfigured ingresses.

Public probes have a higher chance of detecting false positives. For example, a public probe will fail if the application works, but the ingress is down. However, they allow you to setup alerts for expiring certificates configured on the application’s ingress and misconfigurations of the ingress itself.

Cross Cluster Probes

The Blackbox Exporter can be installed on a different zone or cluster than the application and probe the public endpoint of the application.

There is a big unknown (the internet) in between clusters. There is a high chance that the probe will fail because of unrelated network issues between the two clusters.

Install Blackbox Exporter in a Different Namespace

The Blackbox Exporter can be installed in a different namespace than the application. This can be useful if you want to monitor multiple applications with only one Blackbox Exporter.

APPUiO Cloud isolates namespaces from each other by default.

Blackbox Exporter probes can be allowed by configuring a NetworkPolicy in the namespace of the application.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: blackbox-exporter-ingress
spec:
  podSelector: {} (1)
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${EXPORTER_NAMESPACE} (2)
      ports: (3)
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 80

1	Allow all pods in the namespace to be probed. Can be restricted to a specific label selector.
2	The name of the namespace where the Blackbox Exporter is installed.
3	Allow probes on port 80 and 443. If left out, all ports are allowed.

See the Kubernetes documentation for more information about NetworkPolices.

TLS Checks

The Blackbox Exporter returns metrics for the TLS certificate validity period. Using this information, you can alert on expired certificates by checking the probe_ssl_earliest_cert_expiry metric.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-blackbox-alerts
spec:
  groups:
  - name: my-app-blackbox.rules
    rules:
    - alert: TLSCertExpiringSoon
      expr: probe_ssl_earliest_cert_expiry - time() < 5 * 86400 (1)
      for: 1m
      labels:
        severity: warning
      annotations:
        description: "TLS certificate for {{ $labels.instance }} will expire in {{ $value | humanizeDuration }}"
    - alert: EndpointDown
      expr: probe_success == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Endpoint {{ $labels.instance }} down"

1	The `probe_ssl_earliest_cert_expiry` metric is the earliest expiration date of the certificate chain. 5 days are chosen to give enough time to renew the certificate even through extended weekends.

If you’re using Let’s Encrypt certificates, you may want to alert for certificates expiring in 20 days or less, as the default configuration for APPUiO Cloud renews Let’s Encrypt certificates when their remaining validity is 30 days.

Monitor HTTP(S) Endpoints Using Blackbox Exporter

Prerequisites

Install Blackbox Exporter

Configure a Blackbox Exporter Probe

Create an Alert for the Probe

Troubleshooting

No Entry is Shown in the List if Querying the up Metric up{job=~"probe/.*"}

The Value of the up Metric is 0

probe_success shows 0

No Alerts Shows Up in the Alert List

Debug Failing Probes

Tips & Tricks

Public (Ingress) vs. Local Probes

Cross Cluster Probes

Install Blackbox Exporter in a Different Namespace

TLS Checks

No Entry is Shown in the List if Querying the `up` Metric `up{job=~"probe/.*"}`

The Value of the `up` Metric is `0`

`probe_success` shows `0`