Remstats was written by Thomas Erskine at the CRC in Canada, who is now looking for work.


alert-monitor - a status evaluator and alert trigger


alert-monitor version 1.27 from remstats 1.0.13a
usage: ../alert-monitor [options]
where options are:
  -d nnn  enable debugging output at level 'nnn'
  -f fff  use config-dir 'fff'[/home/groups/r/re/remstats/etc/config]
  -h      show this help
  -G GGG  only do hosts in groups 'GGG' (a comma-separated list)
  -H HHH  only do hosts 'HHH' (a comma-separated list)
  -K KKK  only do hosts with keys 'KKK' (a comma-separated list)
  -s sss  search 'sss' data samples for values [5]
  -u      generate alerts for hosts unreachable via a down host


The alert-monitor compares the current value of variables specified in the alerts file in the configuration directory with threshold values and sets the status of those variables accordingly. It saves the current status of variables in /home/groups/r/re/remstats/data/ALERTS.

Which value corresponds to which status level is set in the rrd definition or, sometimes, in the host definition. This way an rrd definition specifies generally reasonable levels, and they can be overridden for hosts where they aren't reasonable.

For an rrd definition, an alert line looks like:

	alert varname relation oklevel [warnlevel [errorlevel]]

or

	alert varname nodata status

The latter says that missing data for variable varname will set its status to level status.

For a host-specified alert level, the line looks like:

	alert rrdname varname relation oklevel [warnlevel [errorlevel]]

or

	alert rrdname varname nodata status

The interpretation is the same, except that you have to say which rrd the alert refers to.

The available relations are:

	< (value is less than threshold)
	> (value is greater than threshold)
	= (value is equal to threshold)
	|< (absolute value of value is less than threshold)
	|> (absolute value of value is greater than threshold)
	delta< (difference between last two values is less than threshold)
	delta> (difference between last two values is greater than threshold)
	<daystddev (value is outside threshold * the past day's standard-deviation)
	<weekstddev (value is outside threshold * the past week's standard-deviation)
	<monthstddev (value is outside threshold * the past month's standard-deviation)
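
As an illustration only, the simpler relations in this list can be sketched in Python. This is not remstats code: the names RELATIONS and relation_holds are hypothetical, the sign convention for the delta relations (newest sample minus the previous one) is an assumption, and the three stddev relations are left out because the text does not say what the deviation is measured around.

```python
# Hypothetical sketch; only the relation symbols come from the list above.
# `samples` is a list of numeric values, oldest first, newest last.
RELATIONS = {
    "<": lambda samples, t: samples[-1] < t,
    ">": lambda samples, t: samples[-1] > t,
    "=": lambda samples, t: samples[-1] == t,
    "|<": lambda samples, t: abs(samples[-1]) < t,
    "|>": lambda samples, t: abs(samples[-1]) > t,
    # Assumption: "difference" means newest sample minus the one before it.
    "delta<": lambda samples, t: samples[-1] - samples[-2] < t,
    "delta>": lambda samples, t: samples[-1] - samples[-2] > t,
}


def relation_holds(relation, samples, threshold):
    """Return True if `relation` holds between the samples and the threshold."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation!r}")
    return RELATIONS[relation](samples, threshold)
```

For example, relation_holds("|>", [-8], 5) is true because the absolute value 8 is greater than 5, and relation_holds("delta>", [10, 14], 3) is true because the value rose by 4.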


To make things more concrete for the first (normal) case, here's a real example, from the load rrd supplied in config-base:

	alert load5 < 3 7 10

This means that if the load5 variable is less than 3, the status is set to OK. Otherwise, if it's less than 7, the status is WARN; if it's less than 10, it's ERROR; and anything higher is CRITICAL.

Since the first match is taken, it's possible to leave out the upper levels if you don't want them to occur. For example, if you wanted load5 to go no higher than WARN level, you could use:

	alert load5 < 3

and then the only possible status levels are OK and WARN.
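
The first-match rule described above can be sketched in Python as follows. This is a minimal illustration, not the alert-monitor implementation: the function name evaluate_alert is hypothetical, it handles only the rrd-definition form of the line, and it supports only the <, > and = relations.

```python
import operator

# Status names in escalation order, as used in the load5 example.
STATUS_LEVELS = ["OK", "WARN", "ERROR", "CRITICAL"]

# Only the three plain comparisons are handled in this sketch.
SIMPLE_RELATIONS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def evaluate_alert(line, value):
    """Return the status for `value` under an 'alert varname relation ...' line."""
    fields = line.split()
    if not 4 <= len(fields) <= 6 or fields[0] != "alert":
        raise ValueError(f"not an rrd-style alert line: {line!r}")
    relation = fields[2]
    thresholds = [float(field) for field in fields[3:]]
    compare = SIMPLE_RELATIONS[relation]
    # The first threshold that satisfies the relation decides the status.
    for level, threshold in zip(STATUS_LEVELS, thresholds):
        if compare(value, threshold):
            return level
    # No threshold matched: the status is the level just past the last one given.
    return STATUS_LEVELS[len(thresholds)]
```

With the line "alert load5 < 3 7 10", a value of 5 gives WARN and a value of 12 gives CRITICAL. With the shortened line "alert load5 < 3", a value of 12 gives WARN, since no higher level was specified.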

The basic relations are <, =, >, |<, |>, delta< and delta>. The first three are plain comparisons. The next two compare the absolute value of the variable's current value to the threshold. The last two compare the change in value between the last two samples.

Causing alerts

Depending on the lines in the alerts file, the status may also trigger alerts. A matching line in the alerts config-file will cause alert-monitor to run the alerter for each of the specified recipients. The alerter is passed the following arguments, in order:

  • recipient - the recipient; for alert-email it will be an email address
  • hostname - the name of the host that the alert applies to
  • ip - the IP number for that host, in case it's not in DNS
  • rrdname - the name of the RRD
  • wildpart - the wild part of a wildcard RRD. E.g., for an RRD of port-ftp (using the wildcard RRD port-*) the wildpart would be ftp.
  • variable - the name of the variable
  • status - the current status, as decided by alert-monitor
  • old_status - the previous status
  • value - the current value of the variable
  • relation - the relation used to compare the variable to the threshold, mostly for creating informative messages
  • threshold - the threshold value that was exceeded
  • start - timestamp of when the alert started
  • duration - number of seconds that the alert has been active
  • host-description - the description field from the host config-file
  • rrd-description - the description tag on this rrd (desc="xxx")
  • webmaster - the email address of the remstats person
  • template - the name of the template file to generate the message from.
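
To show how a custom alerter might consume this argument list, here is a minimal Python sketch. It assumes the arguments arrive as positional command-line arguments in exactly the order listed; the names ALERTER_ARGS, parse_alerter_args and summary_line are hypothetical, and the hyphenated names are written with underscores.

```python
# Argument names in the order given in the list above (hyphens -> underscores).
ALERTER_ARGS = (
    "recipient", "hostname", "ip", "rrdname", "wildpart", "variable",
    "status", "old_status", "value", "relation", "threshold",
    "start", "duration", "host_description", "rrd_description",
    "webmaster", "template",
)


def parse_alerter_args(argv):
    """Map the positional arguments to their names; values stay as strings."""
    if len(argv) != len(ALERTER_ARGS):
        raise ValueError(
            f"expected {len(ALERTER_ARGS)} arguments, got {len(argv)}"
        )
    return dict(zip(ALERTER_ARGS, argv))


def summary_line(args):
    """Build a one-line summary of the status change for a message subject."""
    return (
        f"{args['hostname']} {args['rrdname']}:{args['variable']} "
        f"{args['old_status']} -> {args['status']}"
    )
```

A script built on this sketch would call parse_alerter_args(sys.argv[1:]) and then fill in its message from the resulting dictionary.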

Last updated Fri May 30 13:50:41 PDT 2003.