Adding New Metrics to the Metrics Service
The Metrics Package has helper classes for collecting raw sensor values, converting them into metrics, adding them into the path model, and displaying them in a servlet. In this section, we explain how these helper classes can be used to add new metrics to the Metrics Service.
The Metrics Service gets its data from several sources and integrates them into an up-to-date picture of the environment in which the Node finds itself. There are three ways of originating data into the Metrics Service:
- Sensor data from Cougaar code measures the internal consumption of resources by Agents and Nodes, for example the number of messages sent by an Agent or the amount of CPU or memory consumed by an Agent.
- Network Management Configuration Files are used to get information about network characteristics from external Network Management Systems.
- Host Level probes are used to measure the characteristics of the Node’s host.
All these sources of data use the same “patterns” for capturing and processing the data. The processing of Metrics can be described as a data flow, which is processed by different components. Each component does a different job in converting raw sensor data into usable metrics. The figure below represents the processing of data as it flows through the Metrics Service:
- Sensors instrument the system and store raw statistics records. The records can be accessed via a service, which takes a snapshot of the record.
- Rate-Converters gather snapshots of sensor records and convert them into metrics. The metrics are published into the Metrics Service.
- Metrics Service models the environment. Internal to the model, formulas can depend on the raw metrics that are published into the Metrics Service. The formulas can be accessed from the Metrics Service using paths.
- QoS Adaptive Components use the Metrics Service environment model to adapt the Node’s processing to changes in the environment.
- Servlets are used to view both Metrics and raw sensor records.
The patterns used to create these components are described in the following sections. Some explanations of this process are more easily understood when referencing specific pieces of the code, which will be highlighted in green for emphasis.
Sensors have probes in the critical path of the basic operation of the Node. The probes must have very little overhead, so they have limited functionality. Probes only have time to detect that an event has happened and tally the results in a statistics record. The record holds many raw sensor values. There are three kinds of raw sensor values, which can reside in either the code or host probes:
- Gauge: An instantaneous value, such as utilization or current resource capacity.
- Counter: Continuously increments some value, such as MsgsIn/Out or BytesIn/Out. org.cougaar.mts.std.AgentStatusAspect is an example of such a Sensor, continuously counting the MsgIn/Out & BytesIn/Out for local agents of a node, and the sums for the node itself. org.cougaar.core.qos.metrics.AgentLoadServlet is one GUI where the user can observe this data.
- Integrator: A counter that accumulates a time-weighted sum, adding (delta t * the value during that period) for each period. Integrators are used to find the average of a gauge value over a time period. For example, an Agent’s CPU consumption is measured by the average number of threads used to process the Agent’s code (this is called the Agent’s Load Average). org.cougaar.core.thread.AgentLoadSensorPlugin is an example of such a Sensor.
External clients of the Sensor component can sample the statistics record using a service offered by the Sensor. When a client samples data, it takes a Snapshot at a point in time. Repeated sampling yields a collection of snapshots, which need to be processed to create usable Metrics. Notice that the processing of the raw statistics is not done as part of the critical path, but by an independent component with its own threads. The processing of snapshots is the subject of the next section.
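The three kinds of raw sensor value and the snapshot service can be sketched as follows. This is a minimal illustration under stated assumptions, not Cougaar code: the class SensorRecordSketch, its fields, and its method names are all hypothetical.

```java
// Hypothetical statistics record; illustrative only, not a Cougaar class.
final class SensorRecordSketch {
    private long msgOut;           // Counter: only ever increments
    private int runningThreads;    // Gauge: instantaneous value
    private double loadIntegral;   // Integrator: sum of (gauge value * delta t), in thread-seconds
    private long lastUpdateMillis; // time of the last integrator update

    SensorRecordSketch(long startMillis) {
        this.lastUpdateMillis = startMillis;
    }

    /** Probe on the critical path: only tallies the event. */
    synchronized void messageSent() {
        msgOut++;
    }

    /** Probe: credits the old gauge value for the time it was held, then stores the new one. */
    synchronized void setRunningThreads(int threads, long nowMillis) {
        integrate(nowMillis);
        runningThreads = threads;
    }

    /** Service call for clients: copies the record at one point in time. */
    synchronized Snapshot snapshot(long nowMillis) {
        integrate(nowMillis);
        return new Snapshot(nowMillis, msgOut, runningThreads, loadIntegral);
    }

    private void integrate(long nowMillis) {
        loadIntegral += runningThreads * (nowMillis - lastUpdateMillis) / 1000.0;
        lastUpdateMillis = nowMillis;
    }

    /** Immutable copy of the record. */
    static final class Snapshot {
        final long timeMillis;
        final long msgOut;
        final int runningThreads;
        final double loadIntegral;

        Snapshot(long timeMillis, long msgOut, int runningThreads, double loadIntegral) {
            this.timeMillis = timeMillis;
            this.msgOut = msgOut;
            this.runningThreads = runningThreads;
            this.loadIntegral = loadIntegral;
        }
    }
}
```

The probes do no arithmetic beyond a tally or a single multiply-add; turning the integral into an average is left to whoever consumes the snapshots.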
Rate converters are clients of a sensor service and periodically take snapshots of the sensor’s record. Two snapshots are processed together to create meaningful metrics, and the time between them defines the averaging interval for the metric. The processing is different for each field in the record, depending on the kind of sensor value. For example, to calculate the rate of messages being sent by an agent, the sensor value for the count of messages from the agent is needed. The change in count is found by subtracting the counts in the two snapshots, and the change in time by subtracting the snapshot timestamps. The message rate is calculated by dividing the change in count by the change in time.
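The arithmetic described above can be sketched as two small functions. The class and method names are hypothetical and the timestamps are assumed to be in milliseconds; the second method applies the same idea to an Integrator value to recover the average of the underlying gauge.

```java
// Hypothetical helper; illustrative only, not a Cougaar class.
final class RateSketch {
    /** Rate of a Counter, in units per second, between two snapshots. */
    static double counterRate(long t0Millis, long count0, long t1Millis, long count1) {
        long deltaMillis = t1Millis - t0Millis;
        if (deltaMillis <= 0) {
            throw new IllegalArgumentException("second snapshot must be later than the first");
        }
        return (count1 - count0) * 1000.0 / deltaMillis;
    }

    /** Average gauge value between two snapshots of an Integrator (integral in value-seconds). */
    static double integratorAverage(long t0Millis, double integral0, long t1Millis, double integral1) {
        long deltaMillis = t1Millis - t0Millis;
        if (deltaMillis <= 0) {
            throw new IllegalArgumentException("second snapshot must be later than the first");
        }
        return (integral1 - integral0) * 1000.0 / deltaMillis;
    }
}
```

For instance, a message count that rises from 100 to 150 over 10 seconds gives a rate of 5 messages per second.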
Rate converters have the following internal structure:
- Record Processing: Data snapshots are collated and computed as a rolling average of the changes in time & value. A Decaying History object is used to store the snapshots and call the record processing.
- Decaying History: An object which coordinates the storage and processing of snapshots. It maintains a decaying store of snapshots and calls the processing of data snapshots for several averaging periods (10, 100, and 1000-second averages are common).
is one plugin which implements the DecayingHistory’s newAddition(), which updates the Metrics Service with the processed value.
- Key generation: Once the snapshots have been processed into a value per unit of time, a Key is generated that identifies the averaging period, and the Key is used to insert the Metric into the Metrics Service.
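The structure above can be sketched as a small history object that pairs each new snapshot with an earlier one for every averaging period. This is a simplified illustration, not the Cougaar DecayingHistory class: DecayingHistorySketch, its Listener interface, and the signature of newAddition() here are assumptions, and the sketch keeps every snapshot inside the longest period rather than thinning out older ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified history; illustrative only, not the Cougaar DecayingHistory.
final class DecayingHistorySketch {
    /** One timestamped sample of a Counter value. */
    static final class Snapshot {
        final long timeMillis;
        final long count;

        Snapshot(long timeMillis, long count) {
            this.timeMillis = timeMillis;
            this.count = count;
        }
    }

    /** Record processing hook, called once per averaging period that has data. */
    interface Listener {
        void newAddition(int periodSeconds, Snapshot now, Snapshot earlier);
    }

    private final int[] periodsSeconds;
    private final int longestPeriodSeconds;
    private final Listener listener;
    private final Deque<Snapshot> history = new ArrayDeque<>(); // oldest first

    DecayingHistorySketch(int[] periodsSeconds, Listener listener) {
        this.periodsSeconds = periodsSeconds.clone();
        this.listener = listener;
        int longest = 0;
        for (int period : periodsSeconds) {
            longest = Math.max(longest, period);
        }
        this.longestPeriodSeconds = longest;
    }

    /** Stores a snapshot and triggers processing for each averaging period. */
    void add(Snapshot now) {
        // Drop snapshots that are older than the longest averaging period.
        while (!history.isEmpty()
                && now.timeMillis - history.peekFirst().timeMillis > longestPeriodSeconds * 1000L) {
            history.removeFirst();
        }
        for (int period : periodsSeconds) {
            Snapshot earlier = oldestWithin(now, period);
            if (earlier != null) {
                listener.newAddition(period, now, earlier);
            }
        }
        history.addLast(now);
    }

    /** Oldest stored snapshot that is no more than periodSeconds older than now, or null. */
    private Snapshot oldestWithin(Snapshot now, int periodSeconds) {
        for (Snapshot candidate : history) {
            if (now.timeMillis - candidate.timeMillis <= periodSeconds * 1000L) {
                return candidate;
            }
        }
        return null;
    }
}
```

A listener would typically compute the rate from the two snapshots it receives, build a key that includes the period, and publish the result; those steps are left out here because they depend on the Metrics Service API.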
Metrics Service Models
The Metrics Service maintains a model of the environment for the Cougaar society. The model includes contexts for the Hosts, Nodes, and Agents in the society. Each Node maintains its own instance of the model and updates it using Metrics published into the Metrics Updater Service.
- Resource Context: Models a specific entity in the environment. Resource Contexts can contain child contexts. The children inherit all the attributes of their parent. For example, an Agent context is a child of Node, which is a child of Host. The Agent has a CPU count attribute that it inherits from its Host. org.cougaar.core.qos.rss.AgentDS is an example of a Resource Context.
- Formula: Calculates attributes of a context. Formulas may depend on other formulas or on raw Metrics published to the Metrics Service. The org.cougaar.core.qos.rss.DecayingHistoryFormula is a helper class which creates formulas that get their values directly from Metrics published by a Decaying History object.
- IntegratorDS: Integrates published metrics from many DataFeeds into a single value. Resource Context Formulas get the values for published Metrics by subscribing to Integrator formulas, instead of getting the Metrics directly from the feeds themselves. The IntegratorDS is part of the QuO RSS, which is outside of the Cougaar code.
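The inheritance of attributes from parent contexts can be sketched as a lookup that walks up the Host, Node, Agent chain. This is only an illustration: ContextSketch, its methods, and the attribute name "CpuCount" are hypothetical and do not reflect the RSS classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical context tree; illustrative only, not an RSS ResourceContext.
final class ContextSketch {
    private final String name;
    private final ContextSketch parent; // null for the root (Host) context
    private final Map<String, Double> attributes = new HashMap<>();

    ContextSketch(String name, ContextSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    String name() {
        return name;
    }

    void set(String attribute, double value) {
        attributes.put(attribute, value);
    }

    /** Returns the attribute from this context, else from the nearest ancestor, else null. */
    Double lookup(String attribute) {
        Double local = attributes.get(attribute);
        if (local != null) {
            return local;
        }
        return parent == null ? null : parent.lookup(attribute);
    }
}
```

With a Host context that sets "CpuCount" to 4, an Agent context two levels below it resolves the same attribute without storing its own copy.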
Evaluating the Metrics Service and observing the system is usually done through GUIs in the Cougaar system, using Cougaar Servlets. The Metrics Service includes many different servlets pertaining to communications, resource constraints, and so on. Please refer to Operation Using Servlets for the different kinds of servlets currently available.
All Metrics Servlets extend a Metrics Servlet template, which uses tables, background colors, and text colors to make the Metrics easy to interpret. Here is a quick guide to interpreting a Metrics Servlet (refer to the servlet page):
- Query: A servlet is a window into the Metrics Service. When a servlet is queried, it reveals a snapshot of the current Metrics Service values.
- Value: Displayed as the base text; this is the Metric’s name (e.g. AgentA Load).
- Credibility: Displayed as the color of the text: light gray to indicate that the metric value was only determined as a compile-time default; gray to indicate that the metric value was obtained from a configuration file; black to indicate that the metric value was obtained from a run-time measurement. A metric’s credibility is an approximate measure of how much to believe that the value is true. Credibility takes into account several factors, including when, how, and by whom a measurement was made.
- Threshold: Displayed as the color of the background. When a Metric’s value crosses a threshold, it may be interesting enough to warrant attention. A green background indicates that the value is in the normal range. A yellow background indicates that the value is in a typical ground state, i.e. nothing is happening. A red background indicates that the value has crossed a metric-specific threshold and may be interesting.
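The color conventions above can be summarized in a small sketch. The names here are hypothetical and not taken from the servlet template, and two simplifications are assumed: credibility is reduced to the three sources named above, and the threshold is treated as an upper bound.

```java
// Hypothetical color mapping; illustrative only, not the Metrics Servlet template.
final class ServletColorSketch {
    /** Where a metric value came from, per the credibility description. */
    enum Source { COMPILE_TIME_DEFAULT, CONFIG_FILE, RUNTIME_MEASUREMENT }

    /** Text color encodes credibility. */
    static String textColor(Source source) {
        switch (source) {
            case COMPILE_TIME_DEFAULT:
                return "light gray";
            case CONFIG_FILE:
                return "gray";
            case RUNTIME_MEASUREMENT:
                return "black";
            default:
                throw new IllegalArgumentException("unknown source: " + source);
        }
    }

    /** Background color encodes the threshold state (assumes an upper threshold). */
    static String backgroundColor(double value, double groundValue, double threshold) {
        if (value >= threshold) {
            return "red";    // crossed the metric-specific threshold
        }
        if (value == groundValue) {
            return "yellow"; // ground state: nothing is happening
        }
        return "green";      // normal range
    }
}
```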
Sensor Servlets allow viewing of raw sensor records. The main use of sensor servlets is for debugging.
QoS Adaptive Components
QoS Adaptive components subscribe to Metrics using the Metrics Service. How to create QoS Adaptive Components is out of scope for this section.
The same name for a metric must be used throughout all components of the Metrics Service. To aid consistency, we recommend using the metrics constants. The list of constants is documented in org.cougaar.core.qos.metrics.Constants.
- Metrics Name: The name of the Metric, as it applies to its measurement & key generation. Usually one of CPU_LOAD_AVG, CPU_LOAD_MJIPS, MSG_IN, MSG_OUT, BYTES_IN, BYTES_OUT.
- Averaging intervals: Used in the Key of a Metric to identify the interval over which the value was averaged. The current intervals are: 1_SEC_AVG, 10_SEC_AVG, 100_SEC_AVG, 1000_SEC_AVG.
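The reason for sharing constants is that the component that publishes a Metric and the component that reads it must build exactly the same key. A minimal sketch of that idea follows. The key layout, the "Agent_" prefix, and the string values are assumptions made for illustration; the real values are defined in org.cougaar.core.qos.metrics.Constants.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical key builder; illustrative only. Real names live in
// org.cougaar.core.qos.metrics.Constants and may have different values.
final class MetricKeySketch {
    static final String MSG_IN = "MSG_IN"; // assumed string value
    static final List<Integer> INTERVALS_SECONDS = Arrays.asList(1, 10, 100, 1000);

    /** Builds the averaging-interval part of a key, rejecting unsupported intervals. */
    static String intervalSuffix(int seconds) {
        if (!INTERVALS_SECONDS.contains(seconds)) {
            throw new IllegalArgumentException("unsupported averaging interval: " + seconds);
        }
        return seconds + "_SEC_AVG";
    }

    /** Assumed key layout: Agent_<agent name>_<metric name>_<interval>. */
    static String key(String agentName, String metricName, int intervalSeconds) {
        return "Agent_" + agentName + "_" + metricName + "_" + intervalSuffix(intervalSeconds);
    }
}
```

If both the rate converter and the formula call the same builder with the same constants, a misspelled metric name cannot silently produce a key that nobody subscribes to.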