Integrated SNMP Management with HP OpenView ITO and NNM

Integrated SNMP Management with HP OpenView ITO and NNM.

By Mike Peckar

Fognet Consulting

Written for OpenView Forum, February 15, 1999

Integrated Network and Systems Management (INSM) via HP (Hewlett-Packard) OpenView is made possible through the integration of IT/Operations (ITO) with Network Node Manager (NNM). The keystone of this integration is the addition of Simple Network Management Protocol (SNMP) network-oriented events to the system management-oriented messaging infrastructure of ITO.

This paper will focus in general on event management in HP OpenView and specifically on the flow of SNMP events through the integrated NNM and ITO environment. It should appeal in particular to ITO Administrators deploying INSM, as well as those interested in learning event management's context in the ITO product and overall IT management. The paper will cover event management architecture and definitions, details of the SNMP subsystem in NNM and ITO, as well as daemon interactions, configuration issues, ongoing maintenance issues and administrative best practices.

Most of the technical information applies to ITO version 4, but is applicable to version 5 as well. The paper will not go into detail on SNMP internals or other NNM/ITO integration issues unrelated to the event subsystems such as application integration, ITO filtering schemes, NNM map issues, API and developer issues, 3rd party integration and distribution and scalability issues.

What is INSM?

INSM is the combination of network management and systems management. But what, exactly, does this mean? While seemingly obvious, in practice INSM is actually very difficult to achieve because of the way in which INSM historically evolved in relation to the changing way in which organizations have managed information technology.

Today, many IT shops still remain divided in the way they separately manage their systems and their networks. This is not necessarily a bad thing, because the nature of network management is indeed very different from the nature of managing systems. Still, the historic co-mingling of systems and network management become necessary through the need to manage increasingly distributed applications. This drove the demand for vendors to integrate these formerly separate islands of IT management.

As applications became separated from the systems upon which they formerly ran, the need to change the way they were managed evolved too. The most obvious way to satisfy this need was to combine network management tools with systems management tools. Unfortunately, this proved very difficult because network management tools were, by their nature, different from systems management tools.

Network management tools, which evolved around the very popular SNMP protocol, focus on obtaining information from the network and using that information to build a picture of the environment. Topology, trend data, and up/down status are obtained through SNMP "gets" or other types of polls from the management station. Maps and databases are then created which become a central tool for troubleshooting and launching drill-down tools to allow network administrators to solve problems and learn more about their environment.

In general, then, network management's emphasis is less on receiving and manipulating events than it is on pulling information from the environment. SNMP's popularity lies not only in its simplicity, but also in its power to provide information on demand to network managers via the SNMP MIB. The event generation aspect of SNMP via SNMP traps is a less important piece of the whole network management picture; UDP as a protocol does not guarantee the delivery of SNMP traps.

Systems management tools, on the other hand, rely more on events "pulled" by agents and sent directly to the management station. Messages are solicited; their transports must be reliable, their flexibility must be great, and their display is the central focus of systems management tools in general.

Thus the event infrastructure in systems management tools grew complex to accommodate the need for reliability as well as to provide an increase in agent functionality to include automatic actions, remote control, and other extensibility tools for application management. On top of this fundamental difference in basic architecture, the first systems management products also were designed to accommodate the industry's demand for standards compliance in distributed computing. This caused a divergence from architecture and standards in network management.

Leaving aside the organizational differences in managing networks and systems, from a strict product perspective, the integration of network and systems management tools is a daunting task. Today, industry vendors talk about platforms and umbrellas, but very few products can demonstrate any meaningful level of INSM. Network and systems management products from the same vendor often have only very simples levels of integration.

HP OpenView and INSM

HP OpenView is no exception to the aforementioned generalities about IT management tools. The development of HP OpenView products roughly follows the trends in IT management demands for tools. Although it was built using the OpenView Windows API’s as was NNM, Operations Center was first released as a distributed systems management tool with little integration into Network Node Manager. At the time, there was no demand for INSM solutions.

Demand for integration followed very shortly, and by the time Operations Center was renamed IT/Operations (ITO), some basic integration between NNM and ITO became available. This included integrating SNMP traps into ITO's messaging infrastructure, allowing ITO operators access to NNM maps and menu-bar pull-downs, and highlighting objects in NNM Maps from an ITO message. To this day, regrettably, INSM in OpenView has not progressed much further.

By far the greatest benefit of INSM is the ability to combine network device events with systems events in order to raise the IT management bar. The integration of SNMP events into the ITO messaging system, then, becomes the true defining point for INSM, and the critical piece of functionality which determines the level of systems and network management integration in any particular deployment of INSM.

The remainder of this paper will delve into the specific aspects of integrating SNMP events into ITO. It will be demonstrated that while ITO offers a tremendous amount of flexibility in the level of INSM that can be deployed, the choices are limited by a general lack of functionality in the tools supplied specifically to assist the ITO administrator in achieving INSM. While the focus of this discussion will be on ITO version 4 and its slaved version of NNM 5, new features which speak specifically to the integrated management of events in ITO 5 and NNM 6 will also be discussed.

INSM Event Management Overview: Basic Terms

Before going into specifics too deeply, it is important to define some basic terms. The definitions below are presented as basic generalizations to distinguish different IT management areas of focus and in order to place INSM correctly in its IT Management context. It is likely that these terms have very different meanings in theoretical contexts and in standards such as Information Technology Infrastructure Library (ITIL), but in practice, the terms have come to have certain meanings in the workplace. Other terms commonly associated with INSM include event management, fault management, performance management and problem management.

Event management as a term generally means the process of receiving events into the IT management organization, the tools for presenting those events to the IT management staff, and the infrastructure in place for the transportation of the events to the management stations.

Event management is the central focus of this paper, and while there exist tools whose sole function is to provide an event management infrastructure, this paper's focus is on event management within the separate and combined contexts of HP OpenView's NNM and ITO. These products provide event management interfaces that will be discussed in more detail below, and can be utilized as event interfaces when other event management interfaces are not otherwise available.

Fault management generally refers to the pre- and post-processing of events and is mostly associated with built-in and separate tools known as event correlation engines. Problem management generally refers to the processes and tools associated with help desks and help desk tools. Performance management is generally the collection of real-time and historical system performance metrics and the generation of events based on trends and pre-set thresholds.

In best practice, at least these mentioned tools and processes are at interplay in IT management of INSM. But without successful integration of network management events with systems management events, fault management, performance management, problem management, other levels of IT management and INSM efforts cannot be fully successful. Finally, these tools, when combined, provide the best infrastructure for distributed application management.

INSM Event Management Overview: Events and Traps

What exactly is an event, then? This is actually not an easy question to answer. As a general term, an event is used loosely to describe a unit exception or fault. Within HP OpenView, however, an event has a very specific technical meaning. Similarly, a trap has a very specific technical meaning and a more general usage as well. This can be a source of confusion when talking amongst the OpenView-literate in particular. Within this paper, both terms are used in their general sense, mostly because the IT/Operation product itself also uses the term trap interchangeably with event when technically they are distinct.

The technical definition of trap is: an unsolicited SNMP notification sent from an SNMP agent to its internal list of SNMP managers via UDP. Traps are received by SNMP managers on port 162. Trap formats are defined in Internet Request for Comments 1157 and 1901.

The technical definition of an event within the context of OpenView Network Node Manager is: Also an unsolicited SNMP notification, but one that is sent internally to NNM. Transport is TCP (also port 162) and can also occur between copies of NNM and SNMP-API registered applications. All SNMP traps become events once received by the NNM event subsystem. Internally generated events include Node_ down/Node_ up events, and some NNM daemons use internal events to signal each other. Generally, the largest supplier of SNMP events is NNM itself. NNM generates events resulting from polls, topology changes and inter-process communications. Many of these are unseen in the events browser, but can be seen in NNM’s underlying logfile, trapd.log.

Within the context of ITO, then, the terms are used more generically. The ITO trap interceptor, for example, actually receives events, the superset of traps, from the NNM event subsystem. This will be discussed at length below. ITO uses a different nomenclature for events; ITO uses the term message. A trap becomes an event and that can become a message. In practice, the ITO messages received from NNM which were internally generated SNMP traps are very often still called traps, though technically they are events.

An ITO message is received at the ITO agent level from a source such as a logfile, SNMP event, console (MPE), threshold monitor, ITO itself, or via an API. It is filtered and can be completely restructured. It is buffered and transported reliably to the ITO server where it can again be restructured. The ITO message is stored in an Oracle RDBMS on the management server. It has predefined formats and is accessible via API’s at several levels.

INSM Event Management Overview: How It Works

As discussed above, the main benefit of integrating network events into the systems management space is to combine the events. This allows the network events to be related to the system events. This also offers the opportunity for the rudimentary event management system of NNM to take advantage of the robust event management system of ITO. The NNM event management interface is the foreground process xnmevents. In versions of NNM before Version 6.0, xnmevents offers little in the way of features to justify its use as a stand-alone event management interface when the ITO event infrastructure is available. xnmevents has the ability to mark events as acknowledged and predefined and user-customizable categories can be configured, but the GUI is, overall, a simple graphical front-end for the ASCII logfile for NNM events, trapd.log.

It is important to note here that in Version 6 of NNM, the event subsystem is completely different from previous versions. The events can now be logged to an embedded relational database. In addition, they can be rerouted through a fault management subsystem (ECS) that can perform root-cause analysis on NNM events. In Version 6, the xnmevents GUI has been reworked to display events relationally: a set of events related to a root cause will now be displayed as indented entries under the root cause event.

This sort of relational display will not be impossible in the corresponding version of ITO. Those trying to best integrate networks and systems events will likely want to proceed with the integration, but also retain the xnmevents GUI if they choose to implement the root-cause analysis features in NNM 6. In short – ITO 4 offers some difficult choices for INSM. These will be discussed at length below. Under ITO 5, these choices are made much harder by a divergent increase in functionality in the NNM events GUI. NNM 6 new features will only be discussed briefly below.

Even with the addition of an RDBMS and event correlation to the event subsystem of NNM 6, ITO still provides a superior event management infrastructure. There is:

More flexibility in selecting node sources for events
An ability to create customized operator views based on event sources and

message groupings

Propagation of the most critical event status through icons
A robust set of linkages and API's for feeding and diverting messages to notification, fault, performance, problem and other management systems

Event management architecture in NNM

In order to decide how best to deploy INSM with ITO and NNM, a clear understanding of the management product’s event architecture is necessary.

Under NNM, SNMP traps are received at UDP port 162; ovtrapd is the daemon that reads the traps off the socket and, if necessary, provides buffering. The traps are then sent to pmd, the Postmaster daemon. As clearinghouse for events, pmd is responsible for logging events and communicating events to the processes that register with it to receive events of interest.

OV_EVENT is the operative stack within pmd. pmd supports multiple stacks; however only the OV_EVENT stack ships with NNM and ITO.

SNMP trap formats are translated to human-readable form via the trapd.conf configuration file. Event severity categories and actions are defined in trapd.conf. While the xnmtrap foreground process is NNM’s graphical front-end to the trapd.conf file, the file is an ASCII text file and can be manipulated directly using your favorite editor. (For more information on the format of the trapd.conf file, see the man page for ov_event.)

Once pmd parses SNMP traps and other OpenView events according to the rules defined in trapd.log, the events are logged to the ASCII trapd.log file and forwarded to the xnmevents GUI. Any automated actions associated with specific events are forwarded from pmd to the ovactiond daemon that buffers and performs those actions.

Event management architecture in ITO

The ITO messaging architecture is, as expected, a bit more complicated than this. Designed to handle input from multiple sources in multiple forms on multiple levels, the message architecture of ITO assures transport of messages, provides an infrastructure for distributed actions, and provides API hooks along the way to feed other IT management tools. Though HP complicated the client and server architecture of ITO in Version 4.0 by adding the Open Agent and Open Agent Manager layers, do not be intimidated; these daemons simply pass the messages straight through and only exist to provide an infrastructure for other OpenView products such as IT/Administration.

ITO is a two-tiered client-server implementation. While the architecture is complex, flexible, and extensible, the agent’s underlying instrumentation is very simple. The agent provides daemons for control, communications, message encapsulation, threshold monitoring and local and remote actions.

All buffering is handled through message queues. Local filtering and logging of messages is performed based on templates assigned and distributed from the management server. The templates, that ship with ITO for logfile encapsulation and threshold monitoring are very simple and provide only rudimentary systems management functionality. The agent runs on over a dozen operating system’s platforms. The agent daemons are organized as follows:

ITO Agent daemons: opcctla control agent

opcmsga message agent

ITO Sub-agent daemons: opcacta action agent opcle logfile encapsulator

opcmona monitor agent opcmsgi message interceptor

opceca ECS agent(optional) opctrapi trap interceptor (optional)

The server gathers messages sent by the multiple agents (in practice up to 2000 per server) and runs on HP-UX and SUN Solaris. Both DCE- and NCS-based RPCs are supported as the message transport mechanism. DCE security at many levels is also supported, and agent-server communication through firewalls is also supported. Messages received by the management server are buffered in message queues. They are filtered, forwarded to appropriate display receivers, and logged in the RDBMS. Operator actions are initiated from the display receiver through the server and can be executed by any ITO agent system’s action agent daemon. The ITO management server daemons are as follows:

ITO management server daemons: OpC registered object in ovsuf file opcctlm control manager

opcactm action manager opcmsgsm message manager

opcttnsm trouble ticket/notification manager opcsm session manger

opcforwm ITO-to-ITO forwarding manager opcdispm display manager

opcdistm distribution manager

opcdispr display receiver opcecm event correlation system manager (optional)

ITO Open Agent Management daemons:ovoacomm registered object in ovsuf fileovoareqsdr request sender

opcmsgr message receiver and dispatcherovoareqhdlr request handler

opcmsgrd DCE message receiver

Event management architecture integrated

Bringing SNMP events into the ITO fold requires the configuration and distribution of the management server’s own agent. This does not occur by default upon product installation. All ITO agent software message templates are pushed from the management server, including the management server’s own agent. When the ITO SNMP trap template is distributed, the ITO agent’s opctrapi daemon reads the template and registers for events with pmd. By default, the SNMP trap template reflects the same event configuration as is configured in a standalone NNM 5.x installation, as contained the trapd.conf file.

Once ITO is set up to receive SNMP traps, the flow of events is as follows:

Traps are received on UDP port 162; events are received on TCP port 162 or directly by pmd.

ovtrapd buffers SNMP traps and sends them to pmd.

pmd then logs the messages in trapd.log according to the rules in trapd.conf.
Events are simultaneously forwarded to the opctrapi daemon.

opctrapi formats the event as an ITO message and queues it for opcmsga.

The opcmsga daemon forwards the message to the message receiver/dispatcher, opcmsgr (using RPC’s, even though the agent and server are on the same machine.)
Within the management server, the message dispatcher informs the request handler, ovoareqhdlr, which then passes the message on to the message manager, opcmsgm.

opcmsgm logs the message and passes it on to the display receiver, opcdispr, and other interested daemons.

The important message-related ITO agent and server daemon processes again are:

opcmsg(1|3) opcmsg API and command line tool. Note that

opcmsg requires that the opcmsg message template

is distributed.

opcmsgi Message filtering and parsing daemon for opcmsg

source messages generated by opcmsg and opcmon.

opctrapi Message filtering and parsing daemon for traps.

registers directly with pmd for event feed.

opcmsga Interface between subagents and server; notified by a

pipe or event from the subagent which placed the message in the queue. The opcmsga can forward local action requests to actagtq.

opcmsgr/d Message Dispatcher. Receives data from agents and places

it in the message queue according to server registration and informs the request handler.

opcmsgm Stores the message in the RDBMS, forwards to user GUIs,

copy/diverts message to the message stream API

Under ITO, the message source templates hold the definitions and filtering rules for event input. The SNMP trap template contains a base definition and a set of conditions for the input that is read from the opctrapi daemon. The filters are called message conditions. There is a one-to-one relationship between an NNM event definition and an ITO SNMP trap template message condition. At this point, the similarity ends, though, as the message attributes of a trap definition do not match those of a message condition in many important respects.

Utilities to support the integration of SNMP events into ITO are provided to translate the SNMP events from NNM to the format of the ITO message source template. ovtrap2opc reads the trapd.conf file and creates a set of files that can be read by the ITO configuration upload utility, opccfgupld. ovtrap2opc invokes opccfgupld, which reads the ASCII file template definition directory structure and feeds it into the oracle RDBMS. These utilities will be discussed more at length below. The opccfgdwn utility, which is the only one of these utilities that can be invoked from the ITO administrator GUI, reads the configuration data from oracle and downloads it into the ASCII file structure.

ITO and NNM integrated event architecture

INSM and SNMP Management: Set-up

The initial configuration of ITO to receive SNMP traps assumes the same set of traps that NNM receives, but it will only accept traps for those nodes which are placed in the ITO Node Bank. The default node source is just the management server itself, which represents a very limited set of node sources.

There are two ways to configure additional node sources. One way is to add the icons for the nodes for which SNMP traps are to be received into the Node Bank. This option is fine if the set of node sources is to be very limited: a couple of important routers, for example. But too many nodes in the Node Bank can cause ITO administrators and users confusion because the main purpose of the Node Bank is to graphically represent the set of nodes in the environment which have ITO agents.

Fortunately, there is another option. Icons representing external sources of events can be added which represent multiple message sources. The Node Bank menu bar option "Actions -> Add Node For External Events" was specifically designed for SNMP events. Multiple sources can be specified using IP address wildcards. The syntax for the IP address wildcard to accept every SNMP event into the ITO message browser would be: <*>.<*>.<*>.<*>

Curiously, the default ITO 4.0 installation ships with an ITO message group dedicated to SNMP traps and a default ITO user that is configured to see these messages, but no external node source configured for this message group. When configuring the node for external events, use the default message group: net_devices (note that message groups can be arbitrary and you can configure multiple message groups for SNMP traps based on different node sources. Similarly, the SNMP trap template itself can be broken up so that particular events go into particular message groups. This will be discussed later.)

INSM and SNMP Management: Strategies.

Given the flexibility of configuration of SNMP traps and reconciling this with the availability of the simpler NNM event management interface, what is the best strategy for implementing INSM? There are several strategies to consider, and ultimately the one chosen will have to take into account the level of investment required to achieve it, the organizational structure, the existing processes and procedures for IT management, and the ultimate goal of mixing SNMP traps with systems management feeds.

As we have seen above, the default trap handling mechanism is very limited in scope of nodes for which SNMP traps are received. This, however, may be a perfectly viable strategy, though it really can’t be called INSM, since many important network events would not be visible in the systems event management interface.

In essence, the default SNMP trap handling strategy represent a maintenance of separate IT management domains because very few (if any) SNMP traps are conveyed into the ITO infrastructure. This is because the typical target ITO agent is a computer system running some operating system, and operating systems in general send no SNMP traps by default. In fact, most require add-in applications to even provide meaningful data in their SNMP MIBs. Even when this is done, no SNMP traps are generated unless the SNMP agents are extended programmatically to do so. Network events are either not seen, or must be viewed via the NNM xnmevents browser or from trapd.log directly.

The default SNMP trap template handling is desirable in shops where systems management and network management are largely separate functions. In such a case, the recommended best practice is for the systems management staff to use ITO exclusively and not assign the ipmap application in the user’s ITO Application Banks. The network management staff would launch the NNM product separately, which will provide event visibility to only the SNMP traps via xnmevents.

There is another possibility that does allow some level of INSM with this strategy. This implies giving systems management visibility into the network management arena. All that is needed is for the xnmevents application to be added as an OV Service in the ITO Application Bank. Both NNM and ITO browsers will be launched for the network management user, though, and network management users will have maps with status propagation set to "Propagate Most Critical"…an annoyance at best. This alternative may work well if NNM 6’s root cause analysis is enabled and important in the overall INSM context.

A problem with this strategy, though, is that SNMP events aren’t logged to the RDBMS. But an advantage is that SNMP trap configuration and maintenance is very simple. Still, while network managers may gain visibility into systems events, having systems management-oriented visibility to network events is much more powerful from an IT management perspective. This is because so many other IT management areas of focus take their feeds from the system manager. Problem management, application management, performance management and fault management would be more difficult to integrate under this strategy.

An alternate strategy is to receive all SNMP traps into ITO. This is also very easy to configure: simply add an external node source for all nodes. This commonly configured strategy provides complete INSM. The ITO SNMP trap template becomes a better interface for administering, customizing, and increasing the ability to perform automated actions based on SNMP events. All events are logged into the RDBMS and there is a single event management interface for network and systems management. While this strategy is easy to implement, it does not represent a best practice, however. In fact, it can cause serious problems, even leading to failure of the management platform.

The major problem with turning on the SNMP trap template to all node sources is that message floods are quite possible and may go undetected because the way the traps are "translated" from the NNM event interface to ITO. Under NNM, SNMP traps are configured into severity categories, including a "log-only" option, which will log the trap to the trapd.log file, but will not display it in the xnmevents GUI.

Under ITO, all log-only traps are translated as "put directly into history", which means that the messages are logged into the RDBMS into tables which are not seen from the ITO user’s displays unless they select the "view history" menu pull-down from their active message browser. The ITO history message table is the repository for acknowledged messages under ITO and are often not watched by ITO administrators.

Many log-only SNMP events are generated by NNM. Most are generated by NNM itself as the result of netmon polls. Interface-up and interface- down events are examples of OpenView traps, which are "log-only." The default snmpt trap template contain over 150 "log-only" OpenView event definitions.

These "log-only" SNMP events can flood the ITO RDBMS without alerting the administrator, but may contain important information to retain for future reporting purposes. In general though, turning on SNMP for all node sources generates too many ITO messages that are unwanted, both in active and in history message tables.

In addition to this problem, the issue of trap template maintenance crops up. What happens when new or updated SNMP MIBs (which have trap definitions associated with them) are uploaded under NNM? If changes are made to either the trapd.conf file or the SNMP trap template, they are not cross-pollinated. Indeed, consistent maintenance of the SNMP trap template and trapd.conf is a very difficult issue, and will be discussed in detail below.

The two strategies suggested above represent the most common deployments of INSM with ITO. Clearly, both leave much to be desired. There is obviously a middle ground between accepting very few SNMP events and accepting all SNMP events into ITO. The remainder of this paper will be dedicated to fleshing out the best practices in configuring ITO such that only the most important SNMP events are integrated into ITO for only those nodes of interest. This will be done in the context of providing a methodology for ongoing maintenance of this configuration.

INSM and SNMP Management: Issues

As discussed above, ITO message floods that may result from integrating SNMP events from all or many node sources represents the most pressing issue with deploying INSM with ITO. There are several solutions, but from maintenance perspective, the best solution is to completely delete unwanted traps from the SNMP trap template (after making a backup copy of it). A detailed procedure appears below.

Another solution would be to mark all unwanted SNMP traps as suppress conditions, but this is tedious as there are upwards of two hundred "log-only" conditions. Also, manually editing the SNMP trap template download file and uploading it is not an option because the format doesn’t permit simple switching of a match condition to a suppress condition and it would be a very non-trivial task to write a script to do that.

Once the potential for message floods are curtailed, the issue with maintaining the trapd.conf file in sync with the SNMP trap template crops up. Again, this problem arises when, for example, a third-party product such as Optivity or CiscoWorks is installed, and changes are made to trapd.conf, but not updated in the ITO SNMP trap template. This could result in unformatted traps being received into ITO.

One would think these updates could be automated using ovtrap2opc, the tool for exporting the trapd.conf file to an ITO template file, but this tool is woefully inadequate because any template customizations are overwritten. One option would be to manually update the SNMP trap template after changes are made to trapd.conf, but sometimes many trap definitions need to be updated and manual updates would be too time consuming. A procedure for updating many trapd.conf changes to a customized SNMP trap template is given below.

The ovtrap2opc utility works, but in a very limited fashion. When using ovtrap2opc in an attempt to update an existing SNMP trap template, all previous customizations are overwritten, even when using the -subentity option of the upload procedure (more on that below). One can imagine that the utility could be used for ongoing maintenance by maintaining all SNMP trap customizations in trapd.conf and rewriting a new SNMP trap template into ITO in order to update changes made to trapd.conf. But that won’t work because many trap customizations are not preserved by ovtrap2opc.

Some of the problems with ovtrap2opc include:

Actions with special characters, such as "/", "@" and "`" do not translate well.
Actions will attempt to launch on target node by default.
Traps with specific node sources are translated to node source: all.
CiscoWorks trap definitions break ovtrap2opc (fixed in PHSS_14695).

So, while ovtrap2opc is useful for uploading new trap definitions that have not yet been customized, its usefulness ends there. In addition, other limitations muddle the ability of trap customizations to be maintained in either one or the other place. Trap forwarding can only be set up in trapd.conf. Also, ITO is relatively restrictive and even buggy when multiple node sources are specified for a particular trap condition, and traps.conf’s support of a file-based list of node sources is not supported under ITO.

The solution is to maintain both trapd.conf and a truncated SNMP trap template, and use the strengths of ITO for customizing certain SNMP traps while using xnmevents as troubleshooting and tunnel-down tool for further investigating network faults. For major changes in trap definitions, use the procedure provided below for updating the ITO SNMP trap template without compromising its customizations.

Similar to the dilemma of having two event management interfaces to maintain and configure, ITO 4 introduced server- and agent-based event correlation, and with ITO 5/NNM 6, there will be an addition layer of event correlation built into the SNMP trap mechanism. So, where does one best correlate SNMP traps? For pre-ITO 5.0, the answer is to correlate closest to the source. This would imply using the NNM ECS runtime if available or the ITO ECS runtime for the ITO agent.

Under ITO 5.0, the Central ECS Designer (V 3.0+) will operate at any of the three entry points for ECS. This will make it more desirable to correlate at multiple levels, though, the same general rule applies: correlate closest to the source.

INSM and SNMP Management: Deployment
Paring Down the default SNMP trap template:

As a general administration best practice, few or no messages should be logged directly to history under ITO. As mentioned above, SNMP traps are important to see under ITO as indicators of possible problems with distributed systems and applications. But SNMP message floods can bury other important messages or cause the ITO database tables to fill up. Rely instead on xnmevents to browse SNMP events when necessary for troubleshooting or other management puposes.

Here are the steps for minimizing unneeded OpenView traps in the default trap template:

Copy the default SNMP trap template that is listed under the Management Server template group and give it a new name.
Delete the original trap template from the Management Server template group and note that it will be moved to the Default template group.
Open up the trap template conditions window on the new trap template and delete unwanted trap conditions. These traps are examples of log-only OpenView traps which typically be assigned some severity and not deleted: Authentication_fail, Node_up, Interface_down, Interface_up. Try to catch all log-only traps and either delete or log under a specific severity. It may be useful to start up the xnmevents GUI to more easily study the individual trap definitions.
Modify the template and select the suppress unmatched messages option. If this step is skipped, all deleted trap conditions will be passed through as unformatted traps.
Under advanced options, choose to suppress identical messages for particularly repetitive, yet important traps such as Authentication_fail.
Freely customize the remaining SNMP traps, but do not run ovtrap2opc unless using the update procedure below.
To recreate a trap accidentally deleted from the trap template, refer to its corresponding condition definitions under the original trap template that is now under the default template group. When recreating a deleted trap condition be sure to remember to copy the event description information from the instructions section of the original trap condition. This could also be copied from the event description in the trapd.conf file.
Manually update any previously customized SNMP traps under older installations of NNM or traps subsequently added via the installation of 3^rd party software products or through the loading of SNMP MIBS.
Use the utilities menu pull-down to download the trap template configuration to save as a backup of your trap template.

Bulk Updates to Customized SNMP trap template

Once the SNMP trap template has been heavily customized, great care must be taken in any attempts to upload new SNMP trap definitions from trapd.conf via ovtrap2opc or opccfgupld. The following procedure is designed to allow many new trap definitions that might appear in the trapd.conf file after MIB uploads or third- party products have been integrated with NNM/ITO:

Make a backup of the trapd.conf file.
Load new SNMP Trap Definitions. Remember that many SNMP MIBs contain trap definitions that can be automatically appended to the trapd.conf file. The xnmloadmib front-end GUI for loading MIBs will prompt the user to inform if there are trap definitions associated with the MIB. Remember that installation of some third-party products may invoke xnmloadmib.
Check the trapd.conf file for the new trap definitions. If there are only a handful of traps, seriously consider updating the ITO SNMP trap template manually. Use the UNIX diff command if available.
Use the UNIX diff command or some other editing tool to manually separate out the new trap definitions from the trapd.conf file.
Create a file with the new trap definitions, give it a name, say trapd.opc.

Add a line to the top of this file with the text: "VERSION 3" just like the first line in trapd.conf.
Run the command:

$OV_BIN/OpC/utils/ovtrap2opc $OV_CONF/trapd.opc "My SNMP Traps" mytraps.

The script will ask whether the trap definitions are to be uploaded; answer: "no." The trap template name "My SNMP Traps" should be the name of the customized SNMP trap template as listed in the message source templates window. The next argument is the name of template download directory that will be created (or overwritten) in the template download directory path, which is: /var/opt/OV/share/tmp/OpC_appl. If this directory exists, you will be prompted to overwrite it.

Run the command:

$OV_BIN/OpC/opccfgupld –subentity –add mytraps.

Check the message source templates and see if the new trap definitions appeared properly. There should be no need to restart the ITO GUI.

Some final considerations:

ITO Version 4 introduced support for multiple trap templates. This means that if you have a group of trap definitions, you can place then in a separate template. This would be useful for breaking up a large number of trap definitions or setting global message attributes such as Message Group. HP’s OmniBack II product is an example of a product that uses SNMP traps to convey messages. There are about 60 trap definitions which OmniBack II adds during its integration with NNM and/or ITO.

Separating them out into their own templates improves management of SNMP events in ITO.

Another integration point between NNM and ITO that is related to SNMP events is the MIB Object Monitor. While the MIB Object Monitor is an excellent way to take advantage of the very robust threshold monitoring capabilities of ITO, it is somewhat limited in it’s ability to handle multiple node sources. Threshold monitors in ITO also support multiple objects, so multiple thresholds can more easily be established for a single MIB object. For more information on object monitors, see the ITO documentation.

Lastly, here’s a simple trick to automatically acknowledge a node down event upon receipt of a node up event. (It’s also useful for the interface down and interface up events, particularly for router interfaces.) The basic idea is to run an automatic action with each down event that writes the ITO Message ID out to temporary file. The up event then reads the ITO Message ID from the temporary file into the command line tool for external ITO message acknowledgement, opcmack. Here is an example automated action command for an interface down event:

echo <$MSG_ID> >/tmp/if<$2>.<$7>.tmp

The interface up command would then be:

/opt/OV/bin/OpC/opcmack `cat /tmp/if<$2>.<$7>.tmp` ; rm /tmp/if<$2>.<$7>.tmp

$2 is the SNMP variable binding for the node name and $7 is the SNMP variable binding for the interface name. Under advanced options for the node up event, you will need to unset the options for messages on action failure as the down message might have been acknowledged by an ITO user before the action is launched.

Recap

Integrating network and systems management is a chore – even with products like OpenView IT/Operations which ships with ‘Built-in" integration. Like every other aspect of IT Management these days, the tools provided require a heavy investment in order to return decent payback on the IT Management investment dollar.

Integrating event feeds from separate management domains has always been particularly difficult. ITO as a product is very powerful, but there is little help in the way of documentation or in the ITO courses offered by HP or third parties to address the specific difficulties with integrating network oriented events into the ITO message infrastructure. Hopefully, this paper fills a gap and helps build a clearer understanding of INSM with ITO.

Once again, integrated network and systems management is a very important aspect of application management. Other IT management areas are also very important to achieving an ability to manage today’s distributed computing environments, like fault management, problem management, and performance management. Without the foundations of these more established areas of IT management, higher levels of IT management, such as service management cannot be successfully deployed. Unfortunately, in today’s world of buzzwords and fads, INSM is considered "old hat". It is thought to be conquered territory in so far as many vendors offer both network and systems management products within their "frameworks". It is generally assumed these products are tightly integrated, but the opposite is true.

Why? The basic architecture of network management tools is hard to reconcile with the architecture of systems management tools. Indeed, just in terms of event management within these management domains, there exist compelling incompatibilities. There are good reasons for this though, as hopefully demonstrated above. Less discussed in this paper was the organizational differences that have evolved in the IT management world in the areas of network and systems management. To this day, every IT shop struggles with the evolving need to integrate disparate IT management functions. The future holds only greater demand for integration, hold on tight and enjoy the ride.