Where does a failure manifest itself first?

A network monitoring tool periodically makes a request to a system end point and records the result in a database of some kind.

Whether the polling interval is a few seconds, one minute, ten minutes or longer, there is an awful lot of time when the network monitor has nothing meaningful to say about the state of the end point.

The network monitor is unlikely to be the first system to spot a problem. If the network monitor won’t be the first to spot a problem, what will?

In our systems, the first place where a problem will manifest itself is in a log of some kind, be that a text based log or something like the Windows event log.

If your website returns a 5XX status code, then your log file will record the fault long before your network monitor makes a request that returns a 5XX code.

What time difference am I talking here?

Depends a lot on how often you poll the end point. If you are polling every minute or faster then the difference is likely to be pretty insignificant. If you are polling every five minutes then the log could show the fault up to five minutes before the monitor does.
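To put rough numbers on that, here is a minimal sketch. It assumes a failure starts at a uniformly random moment between two polls and that the poll itself takes no time, so the poller notices after half an interval on average and a full interval at worst. The function name `detection_delay` is made up for illustration.

```python
def detection_delay(poll_interval_s: float) -> tuple:
    """Return (average, worst-case) seconds before a poller sees a failure.

    Assumes the failure begins at a uniformly random point between polls
    and that the poll itself takes no time.
    """
    return poll_interval_s / 2, poll_interval_s


for minutes in (1, 5, 10):
    average, worst = detection_delay(minutes * 60)
    print(f"poll every {minutes:>2} min: average {average:.0f}s, worst case {worst:.0f}s")
```

The log entry, by contrast, is written at the moment the failure happens.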

But it isn’t just that you will be informed more quickly by going to the source of the failure; you will also get better information.

Monitoring an end point will only tell you so much: whether it is working, the response time and maybe, if you’re lucky, a response code.

In the case of our logs when a 5XX status code is returned, we’ll probably get the full exception message plus stack trace. Altogether a lot more useful.
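As a rough illustration of watching the primary source, the sketch below picks 5XX responses out of web server access log lines. It assumes the lines follow the Common Log Format layout, where the status code comes straight after the quoted request. The sample lines and the `server_errors` name are invented for the example.

```python
import re

# Assumption: access log lines look like Common Log Format, where the
# three-digit status code follows the quoted request line.
STATUS_AFTER_REQUEST = re.compile(r'"\s(\d{3})\s')


def server_errors(lines):
    """Yield every log line whose status code is in the 5XX range."""
    for line in lines:
        match = STATUS_AFTER_REQUEST.search(line)
        if match and match.group(1).startswith("5"):
            yield line


# Made-up sample lines for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2010:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2010:10:00:07 +0000] "GET /cart HTTP/1.1" 500 87',
]
for line in server_errors(sample):
    print("ALERT:", line)
```

A real setup would follow the log file as it grows and would probably watch the application's exception log too, since that is where the stack trace ends up.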

tl;dr monitor your primary sources, don’t rely on secondary sources.

Ipswitch acquires Dorian Software Creations Inc

Ipswitch, the people responsible for creating What’s Up Gold, have acquired Dorian Software Creations. Dorian Software are publishers of event log management software.

Dorian’s event log management solutions for Windows and Syslog environments include:

  • Event Archiver for automated collection, centralization and secure storage of log data;
  • Event Analyst for event examination, correlation and comprehensive reporting for audit and compliance;
  • Event Alarm for monitoring, alerting and notification on key defined events;
  • Event Rover for on-the-fly forensics and log data mining.

Dorian products are scheduled to be available from Ipswitch in March.

Network Computing rates PRTG #1

The German magazine Network Computing has done a comparison of four well known network monitoring packages: PRTG 7.2, What’s Up Gold 14, Solarwinds Orion Network Performance Manager 9.5 and ManageEngine OpManager 8.

From all these points of view we can only advise those who are looking for a good monitoring product to write PRTG Network Monitor right at the top of the list of products to look at. For Network Computing this product is, as previously, still the reference.

PRTG 7.2 came out on top!

Windows based dynamic systems management update

As a follow up to the Windows based structured systems management post, I have found a network monitor that does have some dynamic abilities.

PolyMon is an open source network monitor written for the .NET environment. Steven Murawski has written PoSHMon, a series of PowerShell cmdlets for interacting with PolyMon dynamically. Whilst neither PolyMon nor PoShMon is particularly full featured or mature, they do at least show what is possible.

If anybody knows of any commercial network management tools with PowerShell support for dynamic structured systems management, I’d love to hear about it. 😉

A real world example of the problems with open core software

A real world example of what Tarus Balog from OpenNMS has been banging on about recently with his critique of open core or fauxpen source.

A product manager who has an open product and a closed product plainly has a decision to make over which features go into which product. Give too much away and the value add of the closed enterprise product is insufficient to warrant the licence fees. Put too many features into the enterprise product and the open source offering becomes useless.

Have Hyperic’s and Zenoss’s feature selections leaned too far towards their closed enterprise versions? Alemic Boiling would seem to think so…

Distributed network monitoring interview with Robert Aronsson

Robert Aronsson is the CEO of Intellipool AB, a company with over ten years’ experience of the network management market. Intellipool introduced a distributed network monitor over four years ago. I interviewed Robert with a view to getting some insight into Intellipool’s experience of implementing distributed network monitoring solutions with their customers.

The interview was conducted via email. My questions are marked “Q:” with Robert’s answers underneath.

Q: What key factors determine whether you need a distributed solution?

The most obvious factor is network location. Doing direct network monitoring over, for example, a branch office VPN is not something I would recommend. Depending on what you are monitoring, it can be a huge resource drain on a VPN that should be used for “normal” office work.

By placing a remote gateway at a branch office, INM can monitor all aspects of the remote servers at just a fraction of the “bandwidth cost” compared to direct monitoring.

There is a second factor that might not be that obvious: Intellipool’s take on distributed monitoring can be used as a “clustering” solution, splitting the monitoring workload over several machines if you have a very large network to monitor. Since all management is done from the central INM server, it’s very easy to move monitored objects between different gateways when you, for example, need to take down a machine for maintenance.

Q: What are the biggest barriers to successfully deploying a distributed network monitoring solution?

The two biggest practical problems that you will encounter are deciding whether you need a dedicated machine to host each remote gateway and re-configuring firewall rules at both ends.

In Intellipool we have implemented the server/gateway system to make it as cost effective as possible for the customer. This means that:

  1. Remote gateways always connect to the central server, meaning that you have fewer firewall rules and also avoid the risk of deal breakers such as having to open a customer firewall for incoming traffic;
  2. Since the gateway connects to the server, the remote gateway can be placed on a dynamic IP;
  3. A typical Intellipool gateway consumes around 30 MB of memory, stores nothing but temporary work files on disk and can be remotely updated once it’s installed. Installation of a new gateway is done in a matter of minutes. The low resource footprint makes it possible to co-locate the remote gateway running on an already installed server.

Q: What is the best management structure for managing a distributed solution? Should each office/department be responsible for its part or should it be controlled centrally?

Control of the actual software (i.e. the remote gateway) should be handled by the same group/entity that manages the Intellipool server. Management of monitored objects is another story; it’s quite possible to hand over control of the objects to a local sys admin. But in reality it’s best that everything is managed by the same group of people, because the risk of responsibility grey zones is reduced if you have appointed one group of people to control the whole chain.

Q: What are your top 3 do’s and don’ts?

(I’ll formulate it in a slightly different way here.) There are a number of things to consider before going ahead with distributed monitoring.

  1. When considering distributed monitoring, make sure that the traffic to and from servers is encrypted and (something that is often forgotten) protected from man-in-the-middle attacks;
  2. Make sure that you select a system where installation and upgrade of remote gateways can be done remotely;
  3. Distributed monitoring is not for everyone, even if your organization is scattered around the globe and connected with VPNs. If you are just interested in basic monitoring (ping, simple HTTP monitoring etc.) you are likely better off doing it from one central location, since in this case bandwidth is not an issue.
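The first point, encryption plus protection from man-in-the-middle attacks, can be sketched with Python's standard `ssl` module. This is only an illustration of the general idea; the interview does not describe how Intellipool implements it. The helper `open_gateway_channel` and its parameters are hypothetical.

```python
import socket
import ssl


def build_verified_context() -> ssl.SSLContext:
    """Create a client-side TLS context that refuses unverified servers."""
    # create_default_context() turns on certificate validation and
    # hostname checking, which is what defeats a man in the middle.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_gateway_channel(server_host: str, server_port: int) -> ssl.SSLSocket:
    """Hypothetical helper: a gateway dialling out to its central server."""
    raw = socket.create_connection((server_host, server_port), timeout=10)
    return build_verified_context().wrap_socket(raw, server_hostname=server_host)
```

Encryption alone is not enough: a connection that skips certificate and hostname checks is encrypted but will happily talk to an impostor.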

Thanks Robert. If you have any questions, please leave a comment.

Distributed network monitoring introduction

A number of mid-level network monitoring products, such as What’s Up Gold and Intellipool, have recently implemented distributed monitoring features, which puts distributed monitoring within reach of a lot more companies.

Single Poller Monitoring

With regular network monitoring you have a single poller measuring network and server performance from a single location on your network.

Architecture of a central poller in a distributed network

Single poller monitoring works well when the network is small or only has a single site. Every request is made from a single location to each of the resources being measured.

Whilst single poller network monitoring is well suited to single site performance monitoring, it does not scale well on larger, multi-campus networks.

What is Distributed Network Monitoring

Distributed network monitoring involves multiple pollers distributed around your network, measuring performance from multiple locations on your network.

Architecture of Distributed Polling in a Distributed Network

Multi-campus networks typically have WAN links interlinking the various sites. WAN links are usually much slower and more expensive than LAN links. By placing your network monitoring probe in a single central location you are inevitably going to send more traffic over your WAN links.

Distributed network monitors permit you to locate your probes locally to the resources being monitored, with only the statistics being synchronised en masse back to a central Network Operations Centre (NOC).
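A minimal sketch of that idea: the remote poller keeps its individual measurements on the LAN and ships only a summary record to the NOC. The field names and sample values are invented for illustration. Real products will have their own formats.

```python
from statistics import mean


def summarise(samples_ms):
    """Collapse one batch of local response times into a single record."""
    return {
        "count": len(samples_ms),
        "min_ms": min(samples_ms),
        "avg_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
    }


# Made-up response times gathered over the LAN by a remote poller.
local_samples = [12.0, 15.0, 11.0, 40.0, 13.0]

# Only this one summary record would cross the WAN to the NOC.
print(summarise(local_samples))
```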

Advantages of Distributed Network Monitoring

  • Real user view of network performance — with single point network monitoring you see the network from a single perspective. With distributed network monitoring you see the network from a number of different views across your network;
  • Helps with network troubleshooting — distributed network monitoring gives you multiple performance profiles, making it easier to detect outages and bottlenecks;
  • Reduce bandwidth requirements over WANs — a central poller will send requests over your precious WAN links. A distributed network monitor will usually be configured to send requests to local resources and appropriate global resources;
  • Single consolidated NOC view — rather than have a number of separate network monitoring systems situated inside each campus, distributed network monitors allow you the best of both worlds. Monitor resources locally but consolidate all stats into a single NOC for analysis and storage.

Disadvantages of Distributed Network Monitoring

  • More expensive and complex — distributed monitors are more expensive than single poller monitors, sometimes quite a lot more. You also need to find the hardware upon which to deploy the remote pollers and the time for installation and configuration;
  • Unless carefully designed, you may end up using more WAN bandwidth than a central network monitor — if you are not selective about which services you monitor, and from where, you will find no savings in bandwidth usage with a distributed network monitor. Unless polling a resource from a remote site is going to buy you some insight into your system’s performance, monitoring it from there is a waste of bandwidth.
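The bandwidth point is easy to check with back-of-an-envelope arithmetic. The sketch below compares hourly WAN traffic for one remote site under three setups. Every figure in it (number of checks, bytes per check, sync size) is made up. Plug in your own.

```python
def wan_bytes_per_hour(checks_over_wan, bytes_per_check, polls_per_hour,
                       sync_bytes=0, syncs_per_hour=0):
    """Estimate hourly WAN traffic generated by monitoring one remote site."""
    polling = checks_over_wan * bytes_per_check * polls_per_hour
    syncing = sync_bytes * syncs_per_hour
    return polling + syncing


# Central poller: 50 branch resources checked across the WAN every minute.
central = wan_bytes_per_hour(50, 2_000, 60)

# Selective remote poller: only 5 head-office resources checked across the
# WAN, plus a 20 KB statistics sync every five minutes.
selective = wan_bytes_per_hour(5, 2_000, 60, sync_bytes=20_000, syncs_per_hour=12)

# Unselective remote poller: 60 head-office resources checked across the WAN.
unselective = wan_bytes_per_hour(60, 2_000, 60, sync_bytes=20_000, syncs_per_hour=12)

for name, total in [("central", central), ("selective", selective),
                    ("unselective", unselective)]:
    print(f"{name:<12} {total / 1_000_000:.2f} MB/hour")
```

With these invented numbers the selective setup uses far less WAN bandwidth than the central poller, while the unselective one uses more.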

Recommendations

  • Multiple single poller monitors, one for each remote office, may be more appropriate if each office runs its IT systems autonomously with few shared systems. Distributed network monitoring comes into its own when a single NOC view of the entire network is required. If you are happy with multiple autonomous point tools then a distributed system may be overkill;
  • Only monitor resources remotely that are genuinely used remotely. This will not only save you the bandwidth required to periodically test the resource but also mean that you do not need to weaken your carefully designed security policy by making a resource more publicly available than is entirely necessary. In addition, monitoring a resource remotely won’t tell you anything meaningful if none of your users use it remotely;
  • When remotely monitoring a resource, do not set up a separate comms channel for the monitoring system to use. For a performance monitor to be of any use it needs to use the same infrastructure that your users utilise. If you’re not careful the network monitor just ends up effectively monitoring itself.

I’ll be investigating your open source distributed network monitoring options soon. In the meantime, if you’ve got any feedback, please leave a comment!