Tag Archives: cacti

Why I Enthusiastically Switched from Cacti to Zabbix for System Monitoring

Cacti is a “complete network graphing solution” according to their website. It has also been a thorn in my side for a long time.

See what I did there? Thorn… because it’s a cactus… never mind.

When Cacti is in a steady state–when I could get it to a steady state–it was good. Not great, because there was a lot of effort to get it into what I consider “steady state”, but good. The rest of the time… thorny.

There are five major things that have driven me up the wall.

Monitoring Postfix

The goal here is to monitor servers running Postfix to determine the number of email messages delivered locally and abroad and to graph that data. The data will be made available via Net-SNMP for collection using your NMS of choice.

Basis for this Work

The methods outlined below are based on the work of Craig Sanders <http://taz.net.au/postfix/mrtg/>. Some things that are done differently here:

  • Only one database file is used for storing the data instead of two.
  • Instead of recording two sets of numbers (the current reading and the previous reading), subtracting them, and then returning the difference, the method below lets MRTG/Cacti/whatever you’re using perform that operation itself (as they’ve been designed to). This simplifies the process a fair bit. [On a side note, the reason Craig’s method still works is because he’s using the gauge option in his mrtg.cfg file]
  • The script that pulls the data out of the statistics database has been rewritten in C in order to eke out a bit more performance and reduce the overhead of feeding the data to Net-SNMP.
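
To make the second point concrete, here is a minimal Python sketch of the subtraction an NMS performs on a cumulative counter. It is not taken from MRTG or Cacti; the function name and the reset handling (treating a counter that goes backwards as a restart from zero) are my own assumptions for illustration.

```python
def per_second_rate(prev_count, prev_time, curr_count, curr_time):
    """Rate between two samples of an ever-increasing counter.

    Hypothetical helper: this is the kind of arithmetic the NMS does
    for you when the data source is treated as a counter.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumption: the counter restarted from zero (for example, the
        # statistics database was recreated), so count from zero.
        delta = curr_count
    return delta / elapsed


# Two hypothetical polls taken 300 seconds apart.
print(per_second_rate(1200, 0, 1500, 300))  # 300 messages in 300 s
```

Because the script only ever reports the running total, all of the rate logic stays in the NMS where it belongs.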

Code

update-mailstats.pl
Watches the Postfix maillog file and stores a tally of sent and received email in a small database. This script requires the File::Tail module from CPAN.
postfix-stats-get.c
Retrieves data out of the database file.

Compile postfix-stats-get.c and move it to /var/net-snmp.

# gcc -o postfix-stats-get postfix-stats-get.c
# mv postfix-stats-get /var/net-snmp

Linux users: Make sure you have a copy of Berkeley DB installed (tested up to and including v5.3) and then compile using this command:

# gcc -DLINUX -o postfix-stats-get postfix-stats-get.c -ldb

Next, add update-mailstats.pl to the system start-up scripts or to a cronjob so that it runs when the system boots. The script will automatically daemonize itself and begin tailing the /var/log/maillog file.

Getting Stats from Postfix

Unfortunately, Postfix doesn’t store internal statistics due to the way it’s designed. Because of this, statistics will have to be culled from the log file.

The update-mailstats.pl script runs in the background watching the /var/log/maillog file. It records the number of sent and received email messages based on whether the message is to or from a local address or a remote one. It also records the number of 4XX and 5XX error codes Postfix returns for incoming mail and the number seen when trying to deliver mail. The data is recorded in a hash table, /tmp/postfix-mailstats.db.
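
As a rough illustration of log-based tallying, the Python sketch below counts delivered messages per delivery agent and rejected incoming mail per response class. It is not the logic of update-mailstats.pl: the regular expressions, the function name, and the sample log lines are assumptions, and it deliberately skips the received counts and the outgoing 4XX/5XX counts.

```python
import re
from collections import Counter

# Assumed patterns for typical Postfix log lines; real logs vary.
SENT_RE = re.compile(
    r"postfix/(\w+)\[\d+\]: \w+: to=<[^>]*>,.*\bstatus=sent\b")
REJECT_RE = re.compile(
    r"postfix/smtpd\[\d+\]: NOQUEUE: reject: \w+ from \S+ ([45])\d\d ")


def tally(lines):
    """Return a Counter keyed like 'sent:smtp' or 'smtpd:5xx'."""
    counts = Counter()
    for line in lines:
        match = SENT_RE.search(line)
        if match:
            # The delivery agent name (smtp, local, or a custom one)
            # becomes part of the key.
            counts["sent:" + match.group(1)] += 1
            continue
        match = REJECT_RE.search(line)
        if match:
            counts["smtpd:%sxx" % match.group(1)] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE = [
    "Dec 26 21:05:05 mx postfix/smtp[2211]: 4F1A2: to=<a@example.org>, "
    "relay=mail.example.org[192.0.2.1]:25, delay=1.2, status=sent (250 Ok)",
    "Dec 26 21:05:06 mx postfix/local[2212]: 4F1A3: to=<joe@example.com>, "
    "relay=local, delay=0.1, status=sent (delivered to mailbox)",
    "Dec 26 21:05:07 mx postfix/smtpd[2213]: NOQUEUE: reject: RCPT from "
    "unknown[192.0.2.9]: 554 5.7.1 Relay access denied",
    "Dec 26 21:05:08 mx postfix/smtp[2211]: 4F1A4: to=<b@example.org>, "
    "relay=none, delay=30, status=deferred (connection timed out)",
]
print(dict(tally(SAMPLE)))
```

Keying on the delivery agent name is one plausible way a tool could pick up custom transports without any extra configuration.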

The update-mailstats.pl script should be added to the system start-up scripts or to a cronjob that fires when the system boots.

Serving Stats via Net-SNMP

Since the goal is to use SNMP to monitor the mail server, the data in the statistics file must be made available via SNMP. The Net-SNMP daemon allows for data to be retrieved using local shell scripts or programs. The data retrieved from these scripts is made available under the special table .1.3.6.1.4.1.2021.8.1. More information on how this works is available in the snmpd.conf manpage (look for the exec keyword).

The postfix-stats-get program will retrieve data from the database and pass it back to Net-SNMP. The program takes one command line argument which indicates the datapoint to retrieve:

  • sent:smtp – Number of messages sent via SMTP
  • sent:local – Number of messages delivered locally
  • recv:smtp – Number of messages received via SMTP
  • recv:local – Number of messages received locally
  • smtp:4xx – Number of 4XX responses seen when trying to deliver mail
  • smtp:5xx – Number of 5XX responses seen when trying to deliver mail
  • smtpd:4xx – Number of 4XX responses sent to remote servers when they tried to deliver mail to us
  • smtpd:5xx – Number of 5XX responses sent to remote servers when they tried to deliver mail to us

If you have defined custom delivery methods in Postfix’s master.cf file (for example, I’ve defined a method called “legal” which inserts a legal disclaimer at the bottom of outgoing emails) then these tools should automatically create statistics for them too. In my case I can query “sent:legal” and it just works.

Edit snmpd.conf and add exec statements for each data point you want to query. Note that the argument after “exec” is arbitrary; it’s just displayed in the extNames oid.

exec postfix-sent-smtp /var/net-snmp/postfix-stats-get sent:smtp
exec postfix-recv-smtp /var/net-snmp/postfix-stats-get recv:smtp
exec postfix-sent-local /var/net-snmp/postfix-stats-get sent:local
exec postfix-recv-local /var/net-snmp/postfix-stats-get recv:local
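
If you have a lot of data points, the exec lines can be generated rather than typed. This is a small Python sketch under the assumption that you want labels in the same postfix-<type>-<method> style used above; the function name is made up, and the label is arbitrary as far as snmpd is concerned.

```python
# Path to the helper as used in this post.
HELPER = "/var/net-snmp/postfix-stats-get"


def exec_line(datapoint):
    """Build one snmpd.conf exec line for a datapoint such as 'sent:smtp'."""
    # The label only shows up in extNames, so any naming scheme works.
    label = "postfix-" + datapoint.replace(":", "-")
    return "exec %s %s %s" % (label, HELPER, datapoint)


# 'sent:legal' stands in for a custom delivery method from master.cf.
for datapoint in ["sent:smtp", "recv:smtp", "sent:local", "recv:local",
                  "sent:legal"]:
    print(exec_line(datapoint))
```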

Once snmpd is restarted, a walk of the .1.3.6.1.4.1.2021.8.1 MIB will show the data from the hash table.

enterprises.ucdavis.extTable.extEntry.extNames.1 = postfix-sent-smtp
enterprises.ucdavis.extTable.extEntry.extNames.2 = postfix-recv-smtp
enterprises.ucdavis.extTable.extEntry.extNames.3 = postfix-sent-local
enterprises.ucdavis.extTable.extEntry.extNames.4 = postfix-recv-local
enterprises.ucdavis.extTable.extEntry.extCommand.1 = /var/net-snmp/postfix-stats-get sent:smtp
enterprises.ucdavis.extTable.extEntry.extCommand.2 = /var/net-snmp/postfix-stats-get recv:smtp
enterprises.ucdavis.extTable.extEntry.extCommand.3 = /var/net-snmp/postfix-stats-get sent:local
enterprises.ucdavis.extTable.extEntry.extCommand.4 = /var/net-snmp/postfix-stats-get recv:local
enterprises.ucdavis.extTable.extEntry.extResult.1 = 0
enterprises.ucdavis.extTable.extEntry.extResult.2 = 0
enterprises.ucdavis.extTable.extEntry.extResult.3 = 0
enterprises.ucdavis.extTable.extEntry.extResult.4 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.1 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.2 = 215
enterprises.ucdavis.extTable.extEntry.extOutput.3 = 219
enterprises.ucdavis.extTable.extEntry.extOutput.4 = 4

Of interest are the extOutput lines, which correspond to messages sent via SMTP, received via SMTP, sent locally, and received locally, respectively. Don’t be confused by the extResult lines; these are the exit status of the postfix-stats-get command, not the email message counts.
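
To show how the rows of the table line up, here is a Python sketch that groups walk output by its trailing index and pairs each extNames label with its extOutput value. It assumes the textual "name.index = value" format shown above and that every extOutput is a plain integer; the function names are hypothetical.

```python
def parse_ext_table(walk_lines):
    """Group 'prefix.column.index = value' lines into {index: {column: value}}."""
    rows = {}
    for line in walk_lines:
        oid, sep, value = line.partition(" =")
        if not sep:
            continue  # not a 'name = value' line
        parts = oid.strip().split(".")
        if len(parts) < 2 or not parts[-1].isdigit():
            continue
        column, index = parts[-2], int(parts[-1])
        rows.setdefault(index, {})[column] = value.strip()
    return rows


def outputs_by_name(rows):
    """Map each extNames label to its extOutput value as an integer."""
    return {row["extNames"]: int(row["extOutput"])
            for row in rows.values()
            if "extNames" in row and "extOutput" in row}


WALK = """\
enterprises.ucdavis.extTable.extEntry.extNames.1 = postfix-sent-smtp
enterprises.ucdavis.extTable.extEntry.extNames.2 = postfix-recv-smtp
enterprises.ucdavis.extTable.extEntry.extResult.1 = 0
enterprises.ucdavis.extTable.extEntry.extResult.2 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.1 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.2 = 215
""".splitlines()

print(outputs_by_name(parse_ext_table(WALK)))
```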

Graphing Postfix Stats

Now that Postfix’s statistics are available via SNMP, they can be graphed the same as any other.

Postfix SMTP Messages

Point your NMS at the appropriate extOutput oid(s) and begin graphing. Note that if you don’t have UCD-SNMP-MIB loaded then you won’t be able to refer to any of the extOutput oids by name. Instead use .1.3.6.1.4.1.2021.8.1.101.X where X is the specific output you want.
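
For reference, the numeric OIDs can be built like this. The sketch is Python; the function name is made up, column 101 for extOutput comes from the text above, and column 100 for extResult is taken from UCD-SNMP-MIB.

```python
# Base of the extEntry row, as given in this post.
EXT_ENTRY = ".1.3.6.1.4.1.2021.8.1"

# Column numbers within extEntry (from UCD-SNMP-MIB).
COLUMNS = {"extResult": 100, "extOutput": 101}


def ext_oid(column, index):
    """Return the numeric OID for one column of one exec entry."""
    if index < 1:
        raise ValueError("exec entries are numbered from 1")
    return "%s.%d.%d" % (EXT_ENTRY, COLUMNS[column], index)


# The second exec line in snmpd.conf (postfix-recv-smtp above).
print(ext_oid("extOutput", 2))
```

The index follows the order of the exec lines in snmpd.conf, so reordering those lines changes which OID returns which statistic.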

Notes

The hash file /tmp/postfix-stats.db has a fixed size; it won’t increase in size over time. If the file is deleted for some reason (e.g., if the system reloads and clears out /tmp on start-up), update-mailstats.pl will recreate it.

As explained in the snmpd.conf manpage, when snmpd runs external commands such as postfix-stats-get, it caches the results in the file /var/net-snmp/.snmp-exec-cache. This file must be writeable by the user that snmpd is running as or else it will not return the output from the external script being run.

The File::Tail perl module does not read the maillog in real time; therefore, the database is not updated in real time. There may be up to 60 seconds between database updates.

Monitoring BIND9

The goal here is to monitor DNS servers running BIND version 9 and graph the various statistics that it records about itself. The statistics will be made available to the Net-SNMP daemon by a script. From there, the data can be polled by whatever NMS you choose to use.

Getting Stats from BIND

BIND stores a number of statistics internally. In earlier versions of BIND there were only a handful of statistics available. In more recent versions (9.6 and newer), there are some very detailed statistics present. To retrieve stats, issue the rndc stats command. This will instruct BIND to dump the stats to the statistics-file as configured in named.conf.

A quick look at the statistics file from BIND version 9.4:

+++ Statistics Dump +++ (1293359406)
success 21553
referral 0
nxrrset 864
nxdomain 10869
recursion 15348
failure 65
--- Statistics Dump --- (1293359406)

In BIND version 9.6 the format of the statistics file has been changed.

+++ Statistics Dump +++ (1293402423)
++ Incoming Requests ++
              407104 QUERY
                  29 NOTIFY
++ Incoming Queries ++
               57244 A
                  15 SOA
               10504 PTR
                 664 MX
              314258 TXT
               24068 AAAA
                 351 SRV
++ Outgoing Queries ++
[View: default]
               60597 A
                  11 NS
                  28 SOA
                 244 PTR
                   3 MX
                1008 TXT
               36861 AAAA
                 190 SRV
                  ...... (output trimmed)

I’ve created one set of scripts for each file format. There is a single script for BIND 9.4. It gets called directly by Net-SNMP where it will tell BIND to write its stats to disk (by using rndc). The script will then parse out the specific statistic that Net-SNMP asked it for. For BIND 9.6 and newer, there are two scripts. The first script is meant to run from cron where it will dig through the statistics file for the most recently saved statistics. It writes those stats to a second file in a machine parsable format. The second script is then called by Net-SNMP where it will fetch the requested statistic from the machine parsable file.

The crontab entry for the BIND 9.6+ script should look something like this:

*/5 * * * * /usr/sbin/rndc stats; bind96-stats-parse.pl < /var/named/tmp/named.stats > /tmp/named.stats

The script reads the statistics file on stdin and will write its output to stdout. Make sure that wherever you direct the output of the script to is readable by the user that the Net-SNMP daemon runs as. Also make sure you adjust the bind96-stats-get.sh script to read from this same file.
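
To give a feel for what the parsing step involves, here is a Python sketch that pulls the most recent dump out of a BIND 9.6-style statistics file and flattens it into section:counter keys. It is not bind96-stats-parse.pl; the regular expressions, function names, and the rule that the view named "default" gets no suffix are assumptions modelled on the sample output in this post.

```python
import re

SECTION_RE = re.compile(r"^\+\+ (.+?) \+\+$")
VIEW_RE = re.compile(r"^\[View: (.+?)\]$")
COUNTER_RE = re.compile(r"^\s*(\d+) (.+)$")
DUMP_START = "+++ Statistics Dump +++"


def slug(text):
    """'Incoming Requests' -> 'incoming_requests'."""
    return re.sub(r"\W+", "_", text.strip().lower()).strip("_")


def parse_stats(text):
    """Return {key: value} for the last dump in the file contents."""
    start = text.rfind(DUMP_START)  # the file is appended to, so take the last
    if start < 0:
        return {}
    stats, section, view = {}, None, None
    for line in text[start:].splitlines():
        stripped = line.strip()
        match = SECTION_RE.match(stripped)
        if match:
            section, view = slug(match.group(1)), None
            continue
        match = VIEW_RE.match(stripped)
        if match:
            # Assumption: the default view is written without a suffix.
            view = None if match.group(1) == "default" else match.group(1)
            continue
        match = COUNTER_RE.match(line)
        if match and section:
            prefix = section if view is None else section + "+" + view
            stats[prefix + ":" + slug(match.group(2))] = int(match.group(1))
    return stats


SAMPLE = """\
+++ Statistics Dump +++ (1293402123)
++ Incoming Requests ++
              400000 QUERY
--- Statistics Dump --- (1293402123)
+++ Statistics Dump +++ (1293402423)
++ Incoming Requests ++
              407104 QUERY
                  29 NOTIFY
++ Outgoing Queries ++
[View: default]
               60597 A
[View: internet_view]
              584033 A
                5593 NS
--- Statistics Dump --- (1293402423)
"""

for key, value in parse_stats(SAMPLE).items():
    print("%s=%d" % (key, value))
```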

Serving Stats via SNMP

Since the goal is to use SNMP to monitor the DNS server, the data in the statistics file must be made available via SNMP. The Net-SNMP daemon allows for data to be retrieved using local shell scripts or programs. The data retrieved from these scripts is made available under the .1.3.6.1.4.1.2021.8.1 MIB table. More information on how this works is available in the snmpd.conf manual page (look for the exec keyword).

The following lines are added to snmpd.conf:

# for bind 9.4
exec bind9-ok /var/net-snmp/bind94.sh ok
exec bind9-fail /var/net-snmp/bind94.sh fail

# for bind 9.6+
exec bind9-query /var/net-snmp/bind96-stats-get.sh incoming_requests:query

The first bind94.sh line returns the number of successful queries; the second returns the number of failed queries. The /var/net-snmp/bind94.sh shell script has the task of taking data from BIND’s statistics file and passing it to the SNMP daemon. The script is available here: bind94.sh.

A fellow network person who read this page contributed a second version of the script which will also return stats for “referral”, “nxrrset”, “nxdomain”, etc, queries. That script is here: bind94v2.sh. This script is called with the desired query type as its argument, e.g.:

exec bind9-success /var/net-snmp/bind94v2.sh success
exec bind9-failure /var/net-snmp/bind94v2.sh failure
exec bind9-nxdomain /var/net-snmp/bind94v2.sh nxdomain
exec bind9-recursion /var/net-snmp/bind94v2.sh recursion
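
The lookup these scripts perform on the 9.4 file format is simple enough to sketch in a few lines of Python. This is not the code of bind94.sh or bind94v2.sh; the function name is made up, and skipping any line that is not exactly "name value" is my own assumption.

```python
DUMP_START = "+++ Statistics Dump +++"


def bind94_counter(text, name):
    """Return one counter from the most recent dump, or 0 if it is absent."""
    start = text.rfind(DUMP_START)  # the newest dump is the last one
    if start < 0:
        return 0
    for line in text[start:].splitlines():
        fields = line.split()
        # Assumption: global counters are exactly 'name value'; anything
        # else (dump markers, lines with extra fields) is ignored.
        if len(fields) == 2 and fields[0] == name and fields[1].isdigit():
            return int(fields[1])
    return 0


SAMPLE = """\
+++ Statistics Dump +++ (1293359406)
success 21553
referral 0
nxrrset 864
nxdomain 10869
recursion 15348
failure 65
--- Statistics Dump --- (1293359406)
"""

print(bind94_counter(SAMPLE, "nxdomain"))
```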

The possible arguments to bind96-stats-get.sh can be seen by looking at the output of the bind96-stats-parse.pl script.

Sun Dec 26 21:05:05 2010
incoming_requests:query=407104
incoming_requests:notify=29

outgoing_queries:a=60597
outgoing_queries:ns=11
outgoing_queries:soa=28

outgoing_queries+internet_view:a=584033
outgoing_queries+internet_view:ns=5593
outgoing_queries+internet_view:soa=483

Statistics such as outgoing_queries+internet_view:a are related to a particular view (in this case, a view called internet_view). Everything else is either related to the default view or is a global statistic.
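
The key format can be taken apart mechanically. The Python sketch below reads the machine parsable format shown above and splits a key into its section, view, and counter parts; the function names are hypothetical and this is not how bind96-stats-get.sh is implemented.

```python
def read_parsable(text):
    """Read 'key=value' lines, ignoring the timestamp and blank lines."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def split_key(key):
    """Split 'section+view:counter' into (section, view, counter).

    view is None when the key has no '+view' part, meaning the default
    view or a global statistic.
    """
    prefix, _, counter = key.rpartition(":")
    section, _, view = prefix.partition("+")
    return section, (view or None), counter


SAMPLE = """\
Sun Dec 26 21:05:05 2010
incoming_requests:query=407104
incoming_requests:notify=29

outgoing_queries:a=60597
outgoing_queries+internet_view:a=584033
"""

stats = read_parsable(SAMPLE)
print(stats["outgoing_queries+internet_view:a"])
print(split_key("outgoing_queries+internet_view:a"))
```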

Once snmpd.conf is updated and snmpd is restarted, a walk of the .1.3.6.1.4.1.2021.8.1 MIB will show the script in action.

enterprises.ucdavis.extTable.extEntry.extIndex.1 = 1
enterprises.ucdavis.extTable.extEntry.extIndex.2 = 2
enterprises.ucdavis.extTable.extEntry.extNames.1 = bind9-ok
enterprises.ucdavis.extTable.extEntry.extNames.2 = bind9-fail
enterprises.ucdavis.extTable.extEntry.extCommand.1 = /var/net-snmp/bind94.sh ok
enterprises.ucdavis.extTable.extEntry.extCommand.2 = /var/net-snmp/bind94.sh fail
enterprises.ucdavis.extTable.extEntry.extResult.1 = 0
enterprises.ucdavis.extTable.extEntry.extResult.2 = 0
enterprises.ucdavis.extTable.extEntry.extOutput.1 = 268814
enterprises.ucdavis.extTable.extEntry.extOutput.2 = 78
enterprises.ucdavis.extTable.extEntry.extErrFix.1 = 0
enterprises.ucdavis.extTable.extEntry.extErrFix.2 = 0
enterprises.ucdavis.extTable.extEntry.extErrFixCmd.1 =
enterprises.ucdavis.extTable.extEntry.extErrFixCmd.2 =

Of interest are the extOutput lines which return the output from the scripts.

Download for BIND 9.4

bind94.sh
Author: Joel Knight
Returns number of successful or failed queries.

$ bind94.sh

bind94v2.sh (recommended)
Author: Evgeny Zislis (aka Kesor) <evgeny.zislis..gmail.com>
Returns stats for all query types.

$ bind94v2.sh <success|referral|nxrrset|nxdomain|recursion|failure>

Download for BIND 9.6 and Newer

bind96-stats-parse.pl
Author: Joel Knight
Parses the BIND statistics file and outputs them in a machine parsable format.

$ bind96-stats-parse.pl < /path/to/named.stats > /tmp/parsable.stats

bind96-stats-get.sh
Author: Joel Knight
Pulls out a certain statistic from the machine parsable file. Make sure you edit this script and tell it where that parsable file is located.

$ bind96-stats-get.sh <statistic>

Notes

Be aware that when rndc stats is run, the statistics file isn’t overwritten; it’s appended to. This means the file will keep growing once you start using these scripts. A good idea may be to add a weekly cron job to delete the file so that its size can be kept in check, or to use your system’s log rotation utility to rotate or trim the file.

As explained in the snmpd.conf manpage, when snmpd runs external commands such as bind94.sh, it caches the results in the file /var/net-snmp/.snmp-exec-cache. This file must be writeable by the user that snmpd is running as or else it will not return the output from the external script being run.