Commercial solutions are available; however, they are very expensive, costing many thousands of dollars in additional hardware and software. More distressing, though, is their obvious complexity: most require several hundred man-hours of consulting time just to set up. Finally, they do not use the Web as their interface, which not only makes truly remote monitoring impossible but also makes data sharing extremely difficult.
Big Brother is a simple, effective solution to the Systems Monitoring problem, and is presented here for your comments and suggestions.
Big Brother is a loosely coupled, distributed set of tools for monitoring and displaying the current status of an entire Unix network, and for notifying the administrator when necessary. It grew out of automating the day-to-day tasks encountered while actively administering Unix systems.
It consists of five major parts:
Big Brother was designed to provide instant information about the health of a Unix network to anyone, anywhere, with Web access to the site.
Network information is now instantly available to those who need it most: managers, systems administrators, and people on the help desk can actively and simply monitor the health of the network.
If any condition is severe, the administrator will have been paged, can use Big Brother to get additional information, and can proceed to fix the problem. Problem verification, data sharing and correction should improve immediately, since everyone has implicit access to the same information.
Finally, since warnings are displayed, corrective action can be taken even before users notice that there is a problem.
The display matrix shows a status of green (ok), yellow (warning), red (severe), and blue (no contact) for each system/area combination. Furthermore, the entire screen changes color to reflect the most serious condition on the network. In order of increasing severity these conditions are: green, yellow, blue, red.
Therefore, a single warning anywhere on the network turns the entire display yellow (provided nothing more serious is present), which is highly visible, even from a distance.
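The "worst color wins" rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Big Brother's own code: the severity table follows the ordering given above, while the `overall_color` function, the dictionary-of-colors representation, and the host names `www1` and `db1` are hypothetical.

```python
# Severity ranking taken from the text: green < yellow < blue < red.
SEVERITY = {"green": 0, "yellow": 1, "blue": 2, "red": 3}


def overall_color(statuses):
    """Return the most serious color in the display matrix.

    `statuses` maps (system, area) pairs to a color string.
    An empty matrix is treated as all clear (green).
    """
    worst = "green"
    for color in statuses.values():
        if SEVERITY[color] > SEVERITY[worst]:
            worst = color
    return worst


# Hypothetical matrix: one warning and nothing worse.
matrix = {
    ("www1", "disk"): "green",
    ("www1", "cpu"): "yellow",
    ("db1", "conn"): "green",
}
print(overall_color(matrix))  # yellow
```

Because blue ranks above yellow, a stale report somewhere on the network outweighs any number of warnings, and a single red outweighs everything.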
Each of the elements in the display matrix can then be clicked on to provide additional information, including the code, time, and specific information about the area being monitored.
Additionally, Big Brother now makes detailed information about every server in the network instantly accessible, just by clicking on the server name.
Every machine, firewall, and router is pinged every 5 minutes. Any loss of contact results in a red (severe) condition and a page to the administrator.
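One way to approximate this connectivity check is to run the system `ping` command once per host and look at its exit status. This is a hedged sketch: the `ping_status` function is hypothetical, and `-c 1` (send one echo request) is the Linux/BSD spelling; other Unix variants use different syntax, which is why the command is a parameter.

```python
import subprocess

# Assumed default: Linux/BSD ping, one echo request. Adjust per platform.
DEFAULT_PING = ("ping", "-c", "1")


def ping_status(host, ping_cmd=DEFAULT_PING, timeout=10):
    """Return "green" if the host answers one ping, otherwise "red"."""
    try:
        result = subprocess.run(
            [*ping_cmd, host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # a hung ping counts as loss of contact
        )
    except subprocess.TimeoutExpired:
        return "red"
    # ping exits with status 0 only when a reply was received.
    return "green" if result.returncode == 0 else "red"
```

A scheduler such as cron could call `ping_status` for each host every 5 minutes; passing `ping_cmd=("true",)` or `("false",)` exercises both outcomes without touching the network.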
Every registered Web server is accessed every 15 minutes using bbnet. Loss of contact with a Web server is a severe condition. Inability to access a page because of a "Server Error" results in a warning condition.
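The Web check separates "no answer at all" (red) from "the server answered with an error" (yellow). The sketch below is not bbnet; it uses Python's standard `urllib` instead, and the `classify_http` and `check_web` names, the 15-second timeout, and the choice to treat every HTTP error code (not only 5xx) as a warning are assumptions.

```python
import urllib.error
import urllib.request


def classify_http(status_code):
    """Map an HTTP status code (None = no contact) to a display color."""
    if status_code is None:
        return "red"  # server could not be reached at all
    if status_code >= 400:
        return "yellow"  # assumption: any error page, e.g. 500 Server Error
    return "green"


def check_web(url, timeout=15):
    """Fetch `url` once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_http(resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return classify_http(err.code)
    except OSError:
        # URLError, refused connections and timeouts all mean no contact.
        return classify_http(None)
```

`HTTPError` must be caught before the broader `OSError`, since it is a subclass of `URLError`, which in turn derives from `OSError`.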
All systems are monitored for disk usage. Any disk over 90% full is considered a warning condition. Disks over 95% full are marked as a severe condition, since this situation can quickly result in a system crash or hang.
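The disk thresholds translate directly into a small classification function. This is a minimal sketch assuming Python's `shutil.disk_usage`; the `disk_color` and `check_disk` names are hypothetical, and the 90/95 defaults are the values given above.

```python
import shutil


def disk_color(percent_used, warn=90.0, panic=95.0):
    """Over `panic`% full is severe, over `warn`% is a warning."""
    if percent_used > panic:
        return "red"
    if percent_used > warn:
        return "yellow"
    return "green"


def check_disk(path="/"):
    usage = shutil.disk_usage(path)
    # On Unix, `free` excludes root-reserved blocks, so this ratio is
    # close to the Use% column reported by df.
    percent = 100.0 * usage.used / (usage.used + usage.free)
    return disk_color(percent)
```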
All systems are monitored for CPU usage. A load average over 1.50 is a warning condition, while a load average of 3.00 merits a severe condition.
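A similar sketch works for CPU load, assuming a Unix system where Python's `os.getloadavg()` is available. The `load_color` and `check_cpu` names are hypothetical, as is the choice of the 5-minute average; the text does not say which load average is used.

```python
import os


def load_color(load, warn=1.50, panic=3.00):
    """Classify a load average against the thresholds from the text."""
    if load >= panic:
        return "red"
    if load > warn:
        return "yellow"
    return "green"


def check_cpu():
    # getloadavg() returns the 1-, 5- and 15-minute averages;
    # using the 5-minute figure is an assumption.
    _, five_min, _ = os.getloadavg()
    return load_color(five_min)
```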
Processes are monitored on each system as well. The choice of what to monitor depends on what each system actually does. A warning condition results if any of these important processes dies.
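Process monitoring reduces to comparing a per-host list of required commands against what `ps` reports. In this hedged sketch the `running_commands` and `process_color` names and the example process list are hypothetical, and `ps -e -o comm=` is the POSIX-style invocation, which may need adjusting on older systems.

```python
import os
import subprocess


def running_commands():
    """Return the base names of all running commands, as listed by ps."""
    out = subprocess.run(
        ["ps", "-e", "-o", "comm="],  # comm= suppresses the header line
        capture_output=True, text=True, check=True,
    ).stdout
    return {os.path.basename(line.strip()) for line in out.splitlines() if line.strip()}


def process_color(required, running):
    """Return ("yellow", missing) if any required process is gone."""
    missing = sorted(set(required) - set(running))
    return ("yellow" if missing else "green"), missing


# Hypothetical list for a mail server:
# process_color(["sendmail", "inetd", "syslogd"], running_commands())
```

Keeping the comparison in its own function means the per-host list can change without touching the code that talks to `ps`.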
System messages are monitored. Big Brother watches /var/adm/messages for NOTICE and WARNING conditions. NOTICE conditions result in the administrator being paged immediately. WARNING conditions cause a yellow dot to appear. Clicking on the corresponding dot displays the message that triggered it.
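The message scan can be sketched as a pass over the log lines with a regular expression. This is an illustration, not Big Brother's implementation: the `scan_messages` name is hypothetical, and showing NOTICE (the paging case) as red is an assumption, since the text only says that NOTICE triggers a page.

```python
import re

# Match the two keywords as whole words anywhere in a syslog line.
PATTERN = re.compile(r"\b(NOTICE|WARNING)\b")


def scan_messages(lines):
    """Return (color, matching_lines) for an iterable of log lines."""
    color = "green"
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue
        hits.append(line.rstrip("\n"))
        if match.group(1) == "NOTICE":
            color = "red"  # assumption: paged conditions are shown red
        elif color == "green":
            color = "yellow"
    return color, hits


# Typical use:
# with open("/var/adm/messages") as f:
#     color, hits = scan_messages(f)
```

The matching lines are returned alongside the color so they can be shown when the dot is clicked.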
Finally, the messaging system itself is monitored by the central monitoring station. Any report more than 30 minutes old causes that report, and the entire screen, to be marked blue, indicating a possible loss of contact within the Big Brother system itself.
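The staleness rule needs only the time each report arrived. In this minimal sketch the `report_color` name and the use of Unix timestamps in seconds are assumptions; the 30-minute limit comes from the text.

```python
import time

STALE_AFTER = 30 * 60  # seconds; reports older than this turn blue


def report_color(reported_color, report_time, now=None):
    """Return the reported color, or "blue" if the report is stale."""
    if now is None:
        now = time.time()
    if now - report_time > STALE_AFTER:
        return "blue"
    return reported_color
```

Passing `now` explicitly makes the rule easy to check without waiting half an hour.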
Note that all of the above are configurable parameters.
Some of the guidelines involved in the design of Big Brother are the following:
Big Brother is not a replacement for a qualified and experienced Systems Administrator. On the contrary, it is a big brother to the Sys Admin. It does not shut down machines or terminate processes, although it could be programmed to do so. It just identifies and notifies.
Big Brother does not explicitly monitor individual hardware components. However, failure of a hardware component is very likely to cause a severe condition through loss of service.
Big Brother does not monitor the performance of the network, servers, databases or any individual application. It does, however, provide information about CPU loads and implicit information about response time; for example, telnet connections have 15 seconds to answer.
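A service that fails to answer within a fixed deadline can be detected with a plain TCP connection attempt. This sketch is hypothetical: `tcp_answers` is not part of Big Brother, and it only checks that the port accepts a connection within the timeout, not that a telnet login banner appears.

```python
import socket


def tcp_answers(host, port=23, timeout=15.0):
    """True if host:port accepts a TCP connection within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as no answer.
        return False
```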
Big Brother isn't complicated. Once the methodology and underlying tools are understood, changes and enhancements are very simple to make.
Big Brother isn't expensive. In fact, it's free.
Big Brother isn't finished. But it keeps growing every day, doing more and more watching. Big Brother is watching...
Errors in /var/adm/messages should be handled better.
Big Brother should support alphanumeric paging and enhanced messages.
Big Brother should be enhanced to work on black-and-white screens.
Big Brother should log critical and warning situations.
Big Brother should learn something about Oracle databases.
Big Brother should automatically determine what processes to monitor.
Big Brother should probably try to monitor security.
Your Comments and Suggestions are needed!
Send them via e-mail to sean@iti.qc.ca
Big Brother was created by Sean MacGuire, while a consultant with the Tactik Infrastructure Group of Bell Sygma in Montreal. After fifteen years of Unix Systems Administration, Systems Design, and User Interface Design, this system was created to make his life easier.
Sean MacGuire currently has two patents pending, and is the publisher of It's a Bunny, a literary Web magazine which I-Way magazine ranked as one of the top 500 sites on the Internet, and the 10th best magazine.
The photo that adorns the Big Brother display is of Sean MacGuire. He is Big Brother... it is meant to be reminiscent of George Orwell's novel 1984. That's why it's not a pretty picture.