Nagios
Nagios is the software used on Sysnews to monitor ITD services. Operations currently monitors Nagios 24x7x365.
Method
Nagios can run checks on remote clients either over an ssh connection authenticated with public/private key files, or with its own remote protocol, NRPE (Nagios Remote Plugin Executor). We've chosen NRPE because it lets you define exactly which commands Nagios is allowed to execute on the client, which narrows what the Nagios server can do there.
To implement the NRPE protocol on Windows hosts, we're using NSClient++, one of the many excellent add-ons available from the Nagios Exchange.
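To make the access restriction concrete, here is a minimal conceptual sketch in Python. It is not NSClient++ or NRPE source code: it only models the idea that the client looks up the requested command name in its own handler table and refuses anything it has not defined. The handler names mirror the [NRPE Handlers] entries in the Configuration Details section; the `resolve` helper is hypothetical, and the sketch ignores argument passing (allow_arguments).

```python
from typing import Optional

# Conceptual model only, NOT NSClient++/NRPE source code.
# The client keeps its own table of named commands; the Nagios server
# can only ask for a name, never send an arbitrary command line.
# Names and command lines mirror the [NRPE Handlers] entries on this page.
HANDLERS = {
    "nrpe_cpu": "inject checkCPU warn=80 crit=90 5 10 15",
    "nrpe_CheckDriveSize": "inject CheckDriveSize MinWarn=25% MinCrit=15% CheckAll FilterType=FIXED",
    "nrpe_CheckService": "inject checkServiceState CheckAll exclude=SysmonLog",
}


def resolve(command_name: str) -> Optional[str]:
    """Return the locally defined command line, or None if the name is unknown.

    Hypothetical helper: None stands in for the client refusing the request.
    """
    return HANDLERS.get(command_name)


if __name__ == "__main__":
    print(resolve("nrpe_cpu"))      # a defined handler is resolved
    print(resolve("format C: /q"))  # anything not in the table is refused
```

The point of the design is that the list of things Nagios can trigger is fixed on the client, not chosen by whoever controls the server.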
Windows Client Configuration
There is an MSI package of the NSClient++ files located at \\unity.ad.ncsu.edu\dfs\ITD\Applications\Freeware\NSCLient\NSClientpp-0.2.7-ncsu.msi. A Group Policy assigned to the Microsys Member Servers installs the software and opens port 5666 in the Windows Firewall.
Normally, no other configuration is necessary. We restrict connections to the NSClient++ software in two places, so if the Nagios servers ever move to a new IP subnet, you will have to change both the Windows Firewall scope and the "allowed_hosts" entry in C:\Program Files\NC State University\NSClient\NSC.INI.
At the time this document was last edited, we allow connections from 152.1.227.0/24 and 152.1.226.0/24.
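Before moving the Nagios servers to a new subnet, it can help to confirm whether a new address is covered by the current allowed_hosts value. The Python sketch below uses the standard `ipaddress` module. It assumes allowed_hosts is a plain comma-separated list of CIDR networks, as shown above, and it does not check the Windows Firewall scope, which has to be verified separately. The candidate addresses are hypothetical.

```python
import ipaddress

# Value of allowed_hosts as documented on this page.
ALLOWED_HOSTS = "152.1.227.0/24,152.1.226.0/24"


def parse_allowed(spec: str) -> list:
    """Turn a comma-separated allowed_hosts string into ip_network objects."""
    return [ipaddress.ip_network(part.strip()) for part in spec.split(",") if part.strip()]


def is_allowed(address: str, spec: str = ALLOWED_HOSTS) -> bool:
    """True if the address falls inside any network listed in spec."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in parse_allowed(spec))


if __name__ == "__main__":
    # Hypothetical candidate addresses for a relocated Nagios server.
    for candidate in ("152.1.227.40", "152.1.228.40"):
        print(candidate, "allowed" if is_allowed(candidate) else "NOT allowed")
```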
Services Monitored
In Sysnews, you can specify the following server "functions" to invoke various tests.
| Function | Checks | Thresholds |
|---|---|---|
| windows | Average CPU utilization across all CPUs over the last 5, 10 and 15 minute intervals | Warn at 80%, Critical at 90% |
| | Drive space on all "fixed disk" storage, checked every 15 minutes | Warn at 25% free, Critical at 15% free |
| | All services set to automatic startup are running, checked every 15 minutes | Critical if a service has stopped |
| | Responds to pings, checked every 15 minutes | Critical if ping fails |
| windows-printsync | All services checked under "windows" | Same as for "windows" |
| | Has the Pcounter lpquota job run recently? | Critical if lpquota has not run in 30 minutes |
| windows-web | All services checked under "windows" | Same as for "windows" |
| | Serves web pages on port 80, checked every 15 minutes | Critical if the web page is not successfully read |
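The thresholds in the table map onto the standard Nagios plugin return codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The Python sketch below restates that logic for the CPU, drive-space, and lpquota checks. It is an illustration, not the code NSClient++ or Sysnews runs: treating a value exactly at a threshold as crossing it, and evaluating a single CPU average rather than each interval separately, are assumptions.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def cpu_state(avg_load_percent: float, warn: float = 80, crit: float = 90) -> int:
    """CPU: higher is worse. Defaults follow the "windows" row (80% / 90%)."""
    if avg_load_percent >= crit:
        return CRITICAL
    if avg_load_percent >= warn:
        return WARNING
    return OK


def disk_state(free_percent: float, warn: float = 25, crit: float = 15) -> int:
    """Drive space: less free space is worse (25% free / 15% free)."""
    if free_percent <= crit:
        return CRITICAL
    if free_percent <= warn:
        return WARNING
    return OK


def lpquota_state(last_run: datetime, now: datetime,
                  max_age: timedelta = timedelta(minutes=30)) -> int:
    """windows-printsync: CRITICAL if lpquota has not run within max_age."""
    return CRITICAL if now - last_run > max_age else OK


if __name__ == "__main__":
    now = datetime.now()
    print(cpu_state(85), disk_state(10), lpquota_state(now - timedelta(minutes=45), now))
    # prints: 1 2 2
```

Note the direction of each comparison: CPU alarms as the value rises, drive space alarms as free space falls.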
New Sysnews server "functions" are coming soon. We plan to check KMS license counts, FRS replication, Group Policy synchronization, and anything else that could bite us.
Configuration Details
This information should not be necessary to install or use the NSClient++ software, but documents our settings.
The software can be found at http://trac.nakednuns.org/nscp/
Open port 5666 in the Windows Firewall. The GPO ITD-Test-JAK-Firewall-Exception-Nagios will do this for you, allowing access only from the VLANs that contain the Nagios servers.
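Once the firewall exception is in place, a quick way to confirm that port 5666 is reachable is a plain TCP connection attempt from one of the Nagios servers. This Python sketch (the `port_open` name and the placeholder host are hypothetical) only shows that the firewall lets the connection through and something is listening. It does not send an NRPE request, and NSClient++ may still reject a caller that is not in allowed_hosts, so check_nrpe remains the real end-to-end test.

```python
import socket


def port_open(host: str, port: int = 5666, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Only proves that the firewall lets the connection through and something
    is listening; it does not send an NRPE request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False


if __name__ == "__main__":
    # "winhost.example.edu" is a placeholder for a monitored Windows server.
    print(port_open("winhost.example.edu"))
```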
Edit the NSC.ini so that it contains the following entries:
[modules]
CheckSystem.dll
CheckDisk.dll
NRPEListener.dll
CheckHelpers.dll
[Settings]
allowed_hosts=152.1.227.0/24,152.1.226.0/24
[NRPE]
port=5666
command_timeout=60
allow_arguments=1
use_ssl=0
script_dir=scripts\
[NRPE Handlers]
nrpe_cpu=inject checkCPU warn=80 crit=90 5 10 15
nrpe_ok=scripts\ok.ba
nrpe_CheckDriveSize=inject CheckDriveSize MinWarn=25% MinCrit=15% CheckAll FilterType=FIXED
nrpe_CheckService=inject checkServiceState CheckAll exclude=SysmonLog
Install and start the service with the commands:
"C:\Program Files\NSClient\NSClient++.exe" /install
"C:\Program Files\NSClient\NSClient++.exe" /start
Make sure the NSClient service's startup type is set to Automatic in the Services control panel, so that it starts again after a reboot.
On the Nagios server, the Windows host looks like a standard NRPE client, and Nagios probes it with check_nrpe.
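Each check from the Nagios side is a check_nrpe call naming one of the [NRPE Handlers] entries. The Python sketch below builds such a call; because NSC.ini sets use_ssl=0, it passes check_nrpe's -n (no SSL) option. The placeholder host name, the assumption that check_nrpe is on the PATH, and the 60-second -t timeout (chosen to match command_timeout=60) are illustrative. The return code is read with the standard Nagios plugin convention.

```python
import subprocess

# Standard Nagios plugin return codes; check_nrpe passes the remote result through.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nrpe_argv(host: str, command: str, plugin: str = "check_nrpe", timeout: int = 60) -> list:
    """Build a check_nrpe command line for one [NRPE Handlers] entry.

    -n: no SSL, because NSC.ini sets use_ssl=0.
    -t: seconds to wait; 60 is an assumption chosen to match command_timeout=60.
    The port is left at check_nrpe's default, 5666.
    """
    return [plugin, "-H", host, "-c", command, "-n", "-t", str(timeout)]


def run_check(host: str, command: str) -> tuple:
    """Run check_nrpe (assumed to be on PATH) and return (state, first output line)."""
    result = subprocess.run(nrpe_argv(host, command), capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return STATES.get(result.returncode, "UNKNOWN"), (lines[0] if lines else "")


if __name__ == "__main__":
    # "winhost.example.edu" is a placeholder host name.
    print(run_check("winhost.example.edu", "nrpe_cpu"))
```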