Monitor PVC Disk Usage

To get notified about firing alerts you need to setup an alert receiver.

See Configure Alert Receivers for more information.

Overview

APPUiO Cloud exposes kubelet metrics that can be used to monitor PVC disk usage.

The following volume metrics are available:

kubelet_volume_stats_available_bytes
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_used_bytes

kubelet_volume_stats_inodes
kubelet_volume_stats_inodes_free
kubelet_volume_stats_inodes_used

Prerequisites

  • Command Line Access to APPUiO Cloud

Create a Disk Usage Alert

The following instructions create two alerts that trigger when the disk usage or used inodes of any PVC in the given namespace exceed 90%.

  1. Set the namespace your application is running in

    APP_NAMESPACE="my-app"
  2. Create the alert

    kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: my-app-volume-alerts
    spec:
      groups:
      - name: my-app-volume.rules
        rules:
        - alert: VolumeAvailableSpaceLow
          expr: |
            (
              sum(kubelet_volume_stats_available_bytes{persistentvolumeclaim!=""}) by (persistentvolumeclaim)
            /
              sum(kubelet_volume_stats_capacity_bytes{persistentvolumeclaim!=""}) by (persistentvolumeclaim)
            ) < 0.1 (1)
          for: 5m
          labels:
            severity: critical (2)
          annotations:
            summary: "PVC available space low"
            description: |
              PVC {{ \$labels.persistentvolumeclaim }} available space {{ \$value | humanizePercentage }} is low.
    
              Your application may become unable to write data to disk.
    
              Either increase the PVC size or delete files to free up space.
        - alert: VolumeFreeInodesLow
          expr: |
            (
              sum(
                kubelet_volume_stats_inodes_free{persistentvolumeclaim!=""} * on(persistentvolumeclaim) kube_persistentvolumeclaim_info{storageclass!="cephfs-fspool-cluster"} (3)
              ) by (persistentvolumeclaim)
            /
              sum(
                kubelet_volume_stats_inodes{persistentvolumeclaim!=""} * on(persistentvolumeclaim) kube_persistentvolumeclaim_info{storageclass!="cephfs-fspool-cluster"}
              ) by (persistentvolumeclaim)
            ) < 0.1 (4)
          for: 5m
          labels:
            severity: critical (2)
          annotations:
            summary: "PVC free inodes low"
            description: |
              PVC {{ \$labels.persistentvolumeclaim }} free inodes {{ \$value | humanizePercentage }} is low.
    
              Your application may become unable to create new files and directories.
    
              Inodes are used to track files and directories. When the number of inodes is low, you may not be able to create new files or directories.
              A high inode usage may stem from many small files. Increasing the PVC size will scale up inodes proportionally.
    
              Either increase the PVC size or delete files to free up inodes.
    YAML
    1 Alert if available space is less than 10%.
    2 You might want to add a higher threshold with a warning severity.
    3 CephFS doesn’t store metadata in inode structures and always reports zero free inodes. This filter excludes CephFS PVCs from the alert.
    4 Alert if free inodes are less than 10%.

    It’s possible to alert on absolute values, for example if the available space is less than 10GiB.

    expr: |
      max by(persistentvolumeclaim) (kubelet_volume_stats_available_bytes{persistentvolumeclaim!=""}) < 10 * 1024 * 1024 * 1024
    annotations:
      description: |
        PVC {{ $labels.persistentvolumeclaim }} available space {{ $value | humanize1024 }} is low.
    
        ...

Predicting Disk Usage

Prometheus can be used to predict disk usage based on the current rate of change.

The following alert triggers when the available space is predicted to be full in less than 4 hours.

It can be difficult to predict disk usage for a given application. The following alert is only a starting point and should be adjusted to your specific use case.

Both a shorter historical time frame and a longer prediction time can lead to false positives. False positives for short spikes can be minimized using the for clause.

A longer historical time frame or a shorter prediction time can delay alerting until the volume is full or almost full if spikes or sudden, unusual fill patterns occur.

kubectl -n "${APP_NAMESPACE}" apply -f - <<YAML
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-volume-prediction-alerts
spec:
  groups:
  - name: my-app-volume-prediction.rules
    rules:
    - alert: VolumeAvailableSpaceFullIn4Hours
      expr: |
        min by(persistentvolumeclaim) (
          predict_linear(kubelet_volume_stats_inodes_free{persistentvolumeclaim!=""}[1h], 4 * 3600) (1)
        )
        < 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PVC available space predicted to run out in 4 hours"
        description: |
          PVC {{ \$labels.persistentvolumeclaim }} available space is predicted to run out within the next 4 hours.

          Your application may become unable to write data to disk.

          Either increase the PVC size or delete files to free up space.
    - alert: VolumeFreeInodesFullIn4Hours
      expr: |
        min by(persistentvolumeclaim) (
          predict_linear(kubelet_volume_stats_inodes_free{persistentvolumeclaim!=""}[1h], 4 * 3600) (1)
        )
        * on(persistentvolumeclaim) (2)
          min by(persistentvolumeclaim) (kube_persistentvolumeclaim_info{storageclass!="cephfs-fspool-cluster"})
        < 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "PVC free inodes predicted to run out in 4 hours"
        description: |
          PVC {{ \$labels.persistentvolumeclaim }} free inodes are predicted to run out within the next 4 hours.

          Your application may become unable to create new files and directories.

          Inodes are used to track files and directories. When the number of inodes is low, you may not be able to create new files or directories.
          A high inode usage may stem from many small files. Increasing the PVC size will scale up inodes proportionally.

          Either increase the PVC size or delete files to free up inodes.
YAML
1 Use the last 1 hour of data to predict the next 4 hours.
2 CephFS doesn’t store metadata in inode structures and always reports zero free inodes. This filter excludes CephFS PVCs from the alert.