
Load Average

Load Average is a good first indication that a Virtual Machine or Server has too much work to do. It is roughly the number of CPUs needed to handle the amount of work currently requested from the CPUs (not including IO wait, ...). A temporarily high load (several times higher than the number of CPUs) for a short duration may not be critical.

Configure

load:

  myload:
    enable: yes
    enable_pager: no
    severity_max: warning
    threshold1: 6.0
    threshold5: 4.0
    threshold15: 2.0


threshold1: raise an alert if the 1-minute load is above this value multiplied by the number of available CPUs
threshold5: raise an alert if the 5-minute load is above this value multiplied by the number of available CPUs
threshold15: raise an alert if the 15-minute load is above this value multiplied by the number of available CPUs

Default values: 6, 4 and 2.
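
The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison (load against threshold multiplied by the CPU count), not the module's actual code. The function name `breached_windows` and the `DEFAULT_THRESHOLDS` table are hypothetical, and `os.cpu_count()` is assumed to be a reasonable stand-in for the number of available CPUs.

```python
import os

# Per-CPU thresholds, using the documented defaults (6, 4 and 2).
# The dict layout is an assumption made for this sketch.
DEFAULT_THRESHOLDS = {1: 6.0, 5: 4.0, 15: 2.0}


def breached_windows(loads, cpus, thresholds=DEFAULT_THRESHOLDS):
    """Return the windows (in minutes) whose load exceeds threshold * cpus.

    loads is a (load1, load5, load15) tuple, as returned by os.getloadavg().
    """
    breached = []
    for window, load in zip((1, 5, 15), loads):
        if load > thresholds[window] * cpus:
            breached.append(window)
    return breached


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    cpus = os.cpu_count() or 1
    print(breached_windows(os.getloadavg(), cpus))
```

With 2 CPUs and loads of 0.3, 6.11 and 7.78, only the 15-minute window is reported, because 7.78 is above 2 x 2 while 6.11 stays below 4 x 2.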

Alerts and severity

By default, severity is either CRITICAL or NONE.

Severity can be adjusted with the common severity_max option.

Output to Metrology

This module sends one message with the following fields:

cmt_module:    load
cmt_check:     name of the check
cmt_load_cpu:  int - number of available CPUs detected
cmt_load1:     float - value of the 1-minute Load Average
cmt_load5:     float - value of the 5-minute Load Average
cmt_load15:    float - value of the 15-minute Load Average
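
As a rough illustration of the message shape, the fields above could be assembled as follows. This is a sketch under stated assumptions, not the module's implementation: the helper name `build_load_message` and its parameters are hypothetical, and `os.cpu_count()` (logical CPUs) may differ from what the module detects as available CPUs.

```python
import os


def build_load_message(check_name, loads=None, cpus=None):
    """Build a dict carrying the documented fields (hypothetical helper).

    loads and cpus can be passed explicitly, for instance in tests;
    otherwise they are read from the operating system.
    """
    if loads is None:
        loads = os.getloadavg()  # Unix-like systems only
    if cpus is None:
        cpus = os.cpu_count() or 1
    load1, load5, load15 = loads
    return {
        "cmt_module": "load",
        "cmt_check": check_name,
        "cmt_load_cpu": int(cpus),
        "cmt_load1": float(load1),
        "cmt_load5": float(load5),
        "cmt_load15": float(load15),
    }


if __name__ == "__main__":
    print(build_load_message("myload"))
```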

CLI Output

$ cmt load

------------------------------------------------------------------
load myload
------------------------------------------------------------------
cmt_load_cpu             2 - Available CPUs
cmt_load1                0.3 - CPU Load Average, one minute
cmt_load5                6.11 - CPU Load Average, 5 minutes
cmt_load15               7.78 - CPU Load Average, 15 minutes
CRITICAL ( )  : load-15/cpu is above threshold : 7.78 > 2 (x 2 cpus)

2021/12/05 - 19:55:58 : SEVERITY=CRITICAL - 0/1 OK (0 %) - 1 NOK : 1 criticial - 0 error - 0 warning - 0 notice.