Network monitoring, or the process of measuring network status, is one of the many professional tasks in the service provider business process that are undergoing change. These changes are driven by "the usual suspects": changes in technology, service makeup, and service provider business practices.
One challenge for monitoring is containing its scope. The general network management model, FCAPS (Fault, Configuration, Accounting, Performance, and Security management), suggests that monitoring might be involved in each of these areas, an approach that is in fact taken by some vendors. Others view monitoring as purely fault management, or as fault and performance management. The scope of monitoring's role in the FCAPS process is likely determined by the capabilities of the network management system overall.
Most network equipment vendors in the service provider space offer network monitoring tools through their network management systems (NMSs). These tools link to management information bases (MIBs) in the individual devices, and they can obtain device status by reading the MIB variables. This process creates an "atomic" view of network state that is unlikely to directly uncover network problems. However, because NMSs have overall topology knowledge, they can often translate this atomic status information into a more useful form. Where network monitoring is available through the NMS, it is highly desirable to use it fully because of its more "systemic" viewpoint.
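As a rough illustration of how topology knowledge can turn atomic device status into something more useful, the following Python sketch walks outward from the management station and reports only the first failed device on each path. The topology, the device names, and the function root_cause_candidates are hypothetical and not drawn from any particular NMS; the sketch also assumes that a device behind a failed device shows as down only because it cannot be reached.

```python
from collections import deque


def root_cause_candidates(topology, root, down):
    """Return the failed devices that sit directly behind healthy ones.

    topology: dict mapping a device to the devices reached through it.
    root:     the device the management station polls from.
    down:     set of devices whose status polls failed.
    """
    if root in down:
        return {root}
    seen = {root}
    queue = deque([root])
    candidates = set()
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor in down:
                # First failed device on this path; do not search behind it.
                candidates.add(neighbor)
            else:
                queue.append(neighbor)
    return candidates


# Hypothetical topology as seen from the management station.
topology = {
    "nms": ["core1"],
    "core1": ["agg1", "agg2"],
    "agg1": ["edge1", "edge2"],
    "agg2": ["edge3"],
}

# Atomic view: three separate devices report as down.
down = {"agg1", "edge1", "edge2"}

# Systemic view: only agg1 needs attention first.
print(sorted(root_cause_candidates(topology, "nms", down)))
```

In this example the three individual alarms collapse to a single candidate, agg1, because edge1 and edge2 are reachable only through it.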
Business activity in the network management and monitoring market has included mergers and acquisitions, for example, Computer Associates' acquisition of Concord. These acquisitions are often aimed at creating effective combinations of management and monitoring tools.
A popular example of a systemic monitoring tool is Cisco's NetFlow, which provides a full set of tools for application and user monitoring, accounting management, traffic analysis, etc. These tools are essentially mission-focused, meaning that they are designed to directly support activities of professionals involved in network, service, and customer support.
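To make the accounting and traffic-analysis idea concrete, here is a minimal Python sketch that totals bytes per user and per application from a list of flow records. The FlowRecord type, its field names, the port-to-application table, and the sample values are simplified assumptions for illustration; they are not the NetFlow export format or any Cisco API.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Simplified, hypothetical flow record (not the NetFlow wire format)."""
    src_addr: str
    dst_addr: str
    dst_port: int
    protocol: int  # IP protocol number: 6 = TCP, 17 = UDP
    octets: int


# Assumed mapping from well-known destination ports to application names.
PORT_NAMES = {80: "http", 443: "https", 53: "dns"}


def bytes_by_key(flows, key):
    """Sum the octets of all flows that share the same key."""
    totals = defaultdict(int)
    for flow in flows:
        totals[key(flow)] += flow.octets
    return dict(totals)


# Invented sample data.
flows = [
    FlowRecord("10.0.0.5", "192.0.2.10", 443, 6, 12000),
    FlowRecord("10.0.0.5", "192.0.2.53", 53, 17, 300),
    FlowRecord("10.0.0.9", "192.0.2.10", 443, 6, 8000),
    FlowRecord("10.0.0.9", "192.0.2.20", 80, 6, 1500),
]

# Accounting view: bytes per user (source address).
per_user = bytes_by_key(flows, lambda f: f.src_addr)

# Application view: bytes per application (by destination port).
per_app = bytes_by_key(flows, lambda f: PORT_NAMES.get(f.dst_port, "other"))

print(per_user)
print(per_app)
```

The same records serve both missions simply by changing the grouping key, which is what makes flow-level data useful to several kinds of professionals at once.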
While not as "systemic" an approach as NetFlow, the modern trend toward end-to-end monitoring and measurement is a step in the right direction in terms of providing a linkage between network monitoring and service or customer experience. More and more service standards are being enhanced to include end-to-end path status monitoring.
In the IP/MPLS space, RFC 3429 (which assigns an OAM alert label for MPLS) and the Ethernet specification IEEE 802.3ah (now 802.3 Clause 57) provide for end-to-end monitoring and fault localization. These capabilities are not yet fully supported in networks, but as they roll out they will likely change the focus of network monitoring toward something more customer-experience-based. This shift will be welcomed by the networking professional, who is increasingly forced to infer service experience from network status.
Device-level monitoring (SNMP, for example) is still a valuable tool for network professionals, but is more and more likely to be associated with problem resolution or the late stages of problem isolation. Device-level monitoring tools (and even interface or board-level MIBs) are valuable in developing specific information on the performance of devices, but are cumbersome as tools in fault isolation because they can't easily link device conditions to network or service behavior. In part, this is because they don't distinguish between traffic for various services and users.
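The kind of device-specific performance figure that such counters yield can be sketched in Python. The example below derives interface utilization from two samples of an octet counter such as ifInOctets, allowing for a single wrap of a 32-bit counter. The function names and sample values are assumptions for illustration, and actually retrieving the counters over SNMP is outside the sketch.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, assuming at most one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_seconds, speed_bps):
    """Average utilization over the polling interval, as a percentage."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_seconds * speed_bps)


# Invented samples taken 60 seconds apart on a 100 Mbit/s interface.
# The counter wrapped past 2**32 between the two polls.
previous_sample = 4_294_000_000
current_sample = 107_032_704

print(utilization_percent(previous_sample, current_sample, 60, 100_000_000))
```

The result is a single aggregate number for the interface. Nothing in it says which service or customer generated the traffic, which is the limitation described above.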
Lower-level diagnosis may require a more refined tool set, and this is ironically available both as an extremely low-level and technical tool and as a high-level system tool. Application-aware monitoring is clearly the most significant trend in network monitoring simply because it focuses performance and status analysis on the relationships the network is committed to support, and leads from there to identification of resources. There is some collision between the low- and high-level approaches in a marketing sense, but in fact they are often complementary for service providers.
Low-level monitoring is normally associated with the use of smart probes that can be parameterized to detect traffic based on packet inspection. The IETF RMON specification offers a standard means of providing this, but proprietary strategies are also available from vendors such as NetScout or Network General. The purpose of remote monitoring in any form is to give a network professional a level of access similar to that offered by a local protocol analyzer. Network General's approach is in fact derived from its "Sniffer" analyzer product, now supported as a remote tool.
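A parameterized packet-inspection filter of the sort a probe might apply can be sketched in a few lines of Python. The example below reads the protocol and port fields from a raw IPv4 packet and tests them against the filter parameters. It is a simplified illustration rather than RMON or any vendor's implementation: the function names are hypothetical, link-layer framing and fragmentation are ignored, and the test packets are built by hand instead of being captured.

```python
import struct

TCP, UDP = 6, 17  # IP protocol numbers


def parse_ipv4_ports(packet: bytes):
    """Return (protocol, src_port, dst_port) for an IPv4 TCP/UDP packet.

    Returns None for anything else or for a packet too short to parse.
    """
    if len(packet) < 20:
        return None
    version_ihl, protocol = packet[0], packet[9]
    if version_ihl >> 4 != 4:
        return None
    header_len = (version_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or protocol not in (TCP, UDP):
        return None
    if len(packet) < header_len + 4:
        return None
    # TCP and UDP both start with source port, then destination port.
    src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
    return protocol, src_port, dst_port


def matches(packet, protocol, dst_port):
    """True if the packet carries the given protocol to the given port."""
    parsed = parse_ipv4_ports(packet)
    return parsed is not None and parsed[0] == protocol and parsed[2] == dst_port


def build_test_packet(protocol, src_port, dst_port):
    """Hand-built packet for the example: bare IPv4 header plus port fields."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 24, 0, 0, 64, protocol, 0,  # checksum left at zero
        bytes([10, 0, 0, 5]), bytes([192, 0, 2, 10]),
    )
    return header + struct.pack("!HH", src_port, dst_port)


https_packet = build_test_packet(TCP, 40000, 443)
dns_packet = build_test_packet(UDP, 40000, 53)

print(matches(https_packet, TCP, 443))
print(matches(dns_packet, TCP, 443))
```

The filter parameters (protocol and destination port here) are what an operator would set remotely, much as a capture filter would be set on a local protocol analyzer.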
High-level application awareness seeks to accomplish much the same thing by obtaining data from applications at their point of network connection. This process is easier for an enterprise, where the boundary between IT and networking is vague, than it is for service providers where that boundary is absolute and where crossing it may generate customer concerns about security. However, the use of "managed services" is expanding worldwide, and application-level data is increasingly available in that context. As service providers offer higher-layer services, including hosting and software-as-a-service, the application components are inside the network and fully available for obtaining application statistics.
The best monitoring strategy is normally set by the mission of the professionals involved, but will also depend on the service and network architecture. Cisco's NetFlow is clearly targeted at IP/Internet providers, for example. Monitoring associated with capacity and performance management can focus on device health and resource congestion, largely fault and performance issues. Monitoring associated with customer care must be directed at service, application, or end-to-end behavior. The older MIB-based tools, including SNMP, are becoming less relevant as these service-based missions increase their hold on service provider agendas.
It is important to note that all fault management and much of performance management must ultimately end in some remediation of problems. This may be something that can be handled remotely in some cases (changing parameters on a device, etc.) but in many cases it will involve dispatching field personnel to complete a repair. In the former case the need for NMS integration to facilitate making network parameter changes is obvious, but in the latter case it is often necessary to manage failure-mode operation of the network, which is also an NMS function. Thus, integration of monitoring and management systems for convenient use by network operations center personnel and other craft professionals is likely to increase productivity and improve customer experience.
About the author: Tom Nolle is president of CIMI Corporation, a strategic consulting firm specializing in telecommunications and data communications since 1982. He is a member of the IEEE, ACM and the IPsphere Forum, and the publisher of Netwatcher, a journal in advanced telecommunications strategy issues. Tom is actively involved in LAN, MAN and WAN issues for both enterprises and service providers and also provides technical consultation to equipment vendors on standards, markets and emerging technologies.