At work, we use Icinga (a fork of
Nagios) for monitoring. We have a few services
which are restarted or otherwise poked by event handlers, but the
recovery takes a while, so we often get paged for problems that
recover within a few minutes. I wrote a small Perl script …