Programmer's guide to the Metrics Service
Use Cases and Examples
Example Metrics Reader
An example Metrics Service subscriber can be found in the core module.
package org.cougaar.core.qos.metrics;

import java.util.Observable;
import java.util.Observer;

import org.cougaar.core.component.ServiceBroker;

/**
 * Basic Metric Service Client subscribes to a Metric path given in
 * the Agent's .ini file and prints the value if it ever changes
 */
public class MetricsClientPlugin
    extends ParameterizedPlugin
    implements Constants
{
    protected MetricsService metricsService;
    private String paramPath = null;
    private VariableEvaluator evaluator;

    /**
     * Metric CallBack object
     */
    private class MetricsCallback implements Observer {
        /**
         * Call back implementation for Observer
         */
        public void update(Observable obs, Object arg) {
            if (arg instanceof Metric) {
                Metric metric = (Metric) arg;
                double value = metric.doubleValue();
                System.out.println("Metric " + paramPath + "=" + metric);
            }
        }
    }

    public void load() {
        super.load();
        ServiceBroker sb = getServiceBroker();
        evaluator = new StandardVariableEvaluator(sb);
        metricsService = (MetricsService)
            sb.getService(this, MetricsService.class, null);

        MetricsCallback cb = new MetricsCallback();
        paramPath = getParameter("path");
        // If no path is given, default to the load average of the
        // current Agent.
        if (paramPath == null)
            paramPath = "$(localagent)" + PATH_SEPR + "LoadAverage";

        // Keep the returned handle if you later need to unsubscribe.
        Object subscriptionKey =
            metricsService.subscribeToValue(paramPath, cb, evaluator);
    }
}
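The default path in the reader above contains a shell-style variable, $(localagent), which the VariableEvaluator resolves before the subscription is made. The following is a minimal sketch of that kind of substitution and is not the Cougaar implementation: the class PathVariables, its expand method, and the variable table are hypothetical, and the real StandardVariableEvaluator may behave differently (for example when a variable is unknown).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cougaar: replaces $(name) references in a path.
final class PathVariables {
    // Matches $(name) and captures the name between the parentheses.
    private static final Pattern VAR = Pattern.compile("\\$\\(([^)]+)\\)");

    static String expand(String path, Map<String, String> vars) {
        Matcher m = VAR.matcher(path);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            // Assumption: an unknown variable is left in the path untouched.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a table that maps localagent to the made-up value "Agent(AgentA)", expanding "$(localagent):LoadAverage" gives "Agent(AgentA):LoadAverage". What $(localagent) actually expands to in a running society is decided by the evaluator, not by this sketch.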
Example Metrics Writer
An example Metric Service writer has the following form:
package org.cougaar.core.qos.metrics;

import org.cougaar.core.component.ServiceBroker;

public class MetricsTestComponent
    extends SimplePlugin
    implements Constants
{
    public void load() {
        super.load();
        ServiceBroker sb = getServiceBroker();
        MetricsUpdateService updater = (MetricsUpdateService)
            sb.getService(this, MetricsUpdateService.class, null);
        String key = "Current" + KEY_SEPR + "Time" + KEY_SEPR + "Millis";
        // Publish the current time
        Metric m = new MetricImpl(Long.valueOf(System.currentTimeMillis()),
                                  SECOND_MEAS_CREDIBILITY,
                                  "ms",
                                  "MetricsTestComponent");
        updater.updateValue(key, m);
    }
}
Metric Keys
This section discusses the structure of the Metrics keys name-space and defines common keys.
Metrics are written into the MetricsUpdateService as key-value pairs. While the key itself is just a string, we impose a structure to help organize the information, loosely modeled after SNMP MIBs. The value is a Metric, a structured object with slots for value, credibility, time-stamp, units, etc.
At runtime, each Node has a store of Key-Value pairs. When the MetricsUpdateService receives a new Key-Value pair, it looks up the Key in the store. If the value has changed, all the Metrics Formulas that have subscribed to this Key are called back with the new Value. The Metrics Service in each Node has one and only one effective Value for each Key, which is derived through a complicated set of integration rules based on credibility and timestamps.
Normally, Key-Value pairs are not read from the Metrics Service, but are only processed by internal formulas. One debugging trick for viewing the effective value of a key is to use the Path Integrater, which takes a Key as a parameter and returns the effective metric. Currently, there is no way to list all the Keys available to a Node.
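The store-and-notify behaviour described above can be pictured with a deliberately simplified sketch. Everything in it is hypothetical (the class KeyValueStoreSketch and its methods are not Cougaar APIs), and it leaves out the credibility and timestamp integration rules entirely: here the latest write simply wins, and subscribers are called back only when the stored value actually changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical, simplified stand-in for a Node's key-value store.
final class KeyValueStoreSketch {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, List<BiConsumer<String, Object>>> listeners = new HashMap<>();

    // Register a callback for one key.
    void subscribe(String key, BiConsumer<String, Object> listener) {
        listeners.computeIfAbsent(key, k -> new ArrayList<>()).add(listener);
    }

    // Store the value; notify subscribers of this key only if it changed.
    // Returns true when a change was recorded.
    boolean update(String key, Object newValue) {
        Object old = values.put(key, newValue);
        if (newValue.equals(old)) {
            return false;
        }
        List<BiConsumer<String, Object>> subs =
            listeners.getOrDefault(key, Collections.emptyList());
        for (BiConsumer<String, Object> sub : subs) {
            sub.accept(key, newValue);
        }
        return true;
    }

    // The single value currently held for a key, or null if none.
    Object effectiveValue(String key) {
        return values.get(key);
    }
}
```

Writing the same value twice triggers one callback, not two, and a write to a different key does not disturb subscribers of the first key.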
Key Syntax
A Metrics Key is a string divided into fields using the Key Separator, e.g.
Host_128.33.15.114_CPU_Jips
The current Key Separator is "_", but your code should use the string constant KEY_SEPR defined in org.cougaar.core.qos.metrics.Constants:
"Host" +KEY_SEPR+ "128.33.15.114" +KEY_SEPR+ "CPU" +KEY_SEPR+ "Jips"
Notice that some of the fields are types (e.g. Host, CPU, Jips) and others are identifiers (e.g. 128.33.15.114). The types are fixed and case sensitive. Identifiers are variable and define branches in the naming hierarchy.
In the following sections we will use the convention of writing identifier fields in brackets [] and using "_" as the Key Separator:
Host_[Host IP]_CPU_Jips
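As an illustration of building keys from fields rather than hard-coding the separator, here is a small sketch. The class MetricKeys and its key method are hypothetical. KEY_SEPR is redefined locally with the value the text gives ("_") only so that the sketch compiles without Cougaar on the classpath; real code should take it from org.cougaar.core.qos.metrics.Constants.

```java
// Hypothetical helper, not part of Cougaar.
final class MetricKeys {
    // Local copy of the separator named in the text; in real code use
    // org.cougaar.core.qos.metrics.Constants.KEY_SEPR instead.
    static final String KEY_SEPR = "_";

    // Joins type and identifier fields into one key string.
    static String key(String... fields) {
        return String.join(KEY_SEPR, fields);
    }
}
```

For example, MetricKeys.key("Host", "128.33.15.114", "CPU", "Jips") produces Host_128.33.15.114_CPU_Jips, and the call site keeps working if the separator ever changes.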
Host
Host Keys start by identifying the host and then have several optional fields for different host characteristics. Host identifiers are raw IP V4 addresses, not domain names.
- Host_[Host IP]_CPU_Jips
- CPU capacity in Java Instructions Per Second. JIPS is determined through a benchmark.
- Host_[Host IP]_CPU_loadavg
- CPU load average, i.e. the average number of processes that are ready to run.
- Host_[Host IP]_CPU_count
- The number of CPUs in this host
- Host_[Host IP]_CPU_cache
- Size of CPU level 2 cache
- Host_[Host IP]_Memory_Physical_Total
- Total Physical Memory
- Host_[Host IP]_Memory_Physical_Free
- Free Physical Memory
- Host_[Host IP]_Network_TCP_sockets_inuse
- TCP sockets in use
- Host_[Host IP]_Network_UDP_sockets_inuse
- UDP sockets in use
IP Flow
Network Keys start by identifying the IP Flow and then have several optional fields for different network characteristics. End-point addresses are raw IP V4 addresses, not domain names.
- Ip_Flow_[src IP]_[dst IP]_Capacity_Max
- The maximum bandwidth (kbps) for a path across the network, i.e. with no competing traffic.
- Ip_Flow_[src IP]_[dst IP]_Capacity_Unused
- The expected available bandwidth (kbps) for a path across the network, i.e. the max minus the competing traffic.
Site Flow
A Site is an IP subnetwork which can be represented with a simple mask (the number of leading 1 bits). Site_Flows between Sites can be used as defaults, instead of specifying individual Ip_Flows. For example:
Site_Flow_128.89.0.0/16_128.33.15.0/28_Capacity_Max
- Site_Flow_[src IP/mask]_[dst IP/mask]_Capacity_Max
- Maximum bandwidth (kbps) between Sites. The current formulas can model asymmetric bandwidth between Sites by publishing Site_Flows for both directions. If only one direction is published, the formulas will assume the bandwidth is the same in both directions.
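The one-direction fallback described above can be sketched as a lookup that tries the requested direction first and then the reverse. This is only an illustration of the rule as stated, not the actual formula code: the class SiteFlowSketch, its in-memory map, and its method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical illustration of the symmetric-default rule for Site_Flows.
final class SiteFlowSketch {
    private final Map<String, Double> capacityMaxKbps = new HashMap<>();

    private static String key(String srcSite, String dstSite) {
        return "Site_Flow_" + srcSite + "_" + dstSite + "_Capacity_Max";
    }

    // Record a published Site_Flow capacity for one direction.
    void publish(String srcSite, String dstSite, double kbps) {
        capacityMaxKbps.put(key(srcSite, dstSite), kbps);
    }

    // Prefer the requested direction; fall back to the reverse direction
    // when only that one has been published.
    OptionalDouble capacityMax(String srcSite, String dstSite) {
        Double forward = capacityMaxKbps.get(key(srcSite, dstSite));
        if (forward != null) {
            return OptionalDouble.of(forward);
        }
        Double reverse = capacityMaxKbps.get(key(dstSite, srcSite));
        return (reverse != null) ? OptionalDouble.of(reverse) : OptionalDouble.empty();
    }
}
```

Publishing only 128.89.0.0/16 to 128.33.15.0/28 makes both directions report the same capacity. Publishing the reverse direction afterwards makes the two directions differ.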
Agent
Agents are identified by their message Id.
- Agent_[Message ID]_HeardTime
- System.currentTimeMillis() time-stamp for the last time some component has heard from this agent
- Agent_[Message ID]_SpokeTime
- System.currentTimeMillis() time-stamp for the last time some component has attempted to speak to this agent
Node
Nodes are identified by their message Id. Currently, there are no Node-specific Keys.
Metric Paths
This section discusses the structure of the Metrics Paths and defines common Paths.
Metrics read from the MetricsService are the values from a real-time model of the status of the Cougaar Society and system resources. The MetricsService allows access to formulas in the model. Formulas are relative to a Resource Context, which defines instances of modeled entities and their relationships to each other.
Containment is a major relationship which is modeled. A child Context inherits all the formulas of its parent. The useful containment relationship in Cougaar is Host->Node->Agent. For example, Agents have all the formulas of their Hosts, and when an Agent moves to a new Host its metrics track the new Host's values.
Path Syntax
A Path is a series of Contexts followed by a Formula. If there is more than one child Context of the same type in the parent Context, then the child Context is narrowed by Parameters. Each Context has a fixed number of parameters and parameter order is important. The Context type is used as the name of the context. Context and Formula names are case sensitive. For example:
IpFlow(128.33.15.114,128.33.15.113):CapacityMax
The separator between Contexts and Formulas is ":" and the separator between parameters is ",". Your code should use the constant PATH_SEPR defined in the interface org.cougaar.core.qos.metrics.Constants:
"IpFlow(128.33.15.114,128.33.15.113)" +PATH_SEPR+ "CapacityMax"
The following sections use the convention of writing variable parameter fields in brackets [], with ":" as the Path Separator and "," as the parameter separator. The syntax is as follows:
ContextType1([parameter1],[parameter2]...):ContextType2([parameter]):Formula(parameter)
The Averaging Period is a parameter for some Formulas. The Averaging Periods are "10", "100", and "1000". This section uses the convention 1xxxSecAvg to represent the averaging period.
Also, please use the constants interface org.cougaar.core.qos.metrics.Constants in the core module whenever possible. This allows compile-time detection of errors when the Metrics Service interface inevitably changes in the future.
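To make the Path syntax concrete, the sketch below assembles paths from Context and Formula elements. The class MetricPaths and its methods are hypothetical, and PATH_SEPR is a local copy of the value the text gives (":"), defined here only so the sketch is self-contained; real code should use the constant from org.cougaar.core.qos.metrics.Constants. The agent name AgentA in the usage note is made up.

```java
// Hypothetical helper, not part of Cougaar.
final class MetricPaths {
    // Local copy of the separator named in the text; in real code use
    // org.cougaar.core.qos.metrics.Constants.PATH_SEPR instead.
    static final String PATH_SEPR = ":";
    // The text gives "," as the parameter separator; this constant name is an assumption.
    static final String PARAM_SEPR = ",";

    // Formats one Context or Formula, e.g. element("IpFlow", "a", "b") -> "IpFlow(a,b)".
    static String element(String name, String... params) {
        if (params.length == 0) {
            return name;
        }
        return name + "(" + String.join(PARAM_SEPR, params) + ")";
    }

    // Joins Contexts and the trailing Formula into a full Path.
    static String path(String... elements) {
        return String.join(PATH_SEPR, elements);
    }
}
```

For example, path(element("IpFlow", "128.33.15.114", "128.33.15.113"), element("CapacityMax")) yields IpFlow(128.33.15.114,128.33.15.113):CapacityMax, and path(element("Agent", "AgentA"), element("CPULoadAvg", "10")) yields Agent(AgentA):CPULoadAvg(10).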
Host Context
Host Contexts can be accessed at root level and take one parameter, which is the Host address. The Host address can be an IP V4 address or a domain name. Host Contexts are not contained in other Contexts.
- Host([Host Addr]):EffectiveMJips
- The expected per-thread Millions of Java Instructions Per Second. The formula takes into account the base CPU JIPS, the number of CPUs, and the Load Average on the Host.
- Host([Host Addr]):Jips
- CPU capacity in Java Instructions Per Second. JIPS is determined through a benchmark.
- Host([Host Addr]):LoadAverage
- CPU load average, i.e. the average number of processes that are ready to run.
- Host([Host Addr]):Count
- The number of CPUs in this host
- Host([Host Addr]):Cache
- Size of CPU level 2 cache
- Host([Host Addr]):TotalMemory
- Total Physical Memory
- Host([Host Addr]):FreeMemory
- Free Physical Memory
- Host([Host Addr]):TcpInUse
- TCP sockets in use
- Host([Host Addr]):UdpInUse
- UDP sockets in use
Node Context
Node Contexts can be accessed at root level and take one parameter, which is the Node's Message address. The Node is also contained within a Host Context and inherits all the Host Context's formulas.
- Node([Node ID]):CPULoadAvg(1xxxSecAvg)
- The average number of threads working for all components on this Node during the Averaging Period.
- Node([Node ID]):CPULoadMJips(1xxxSecAvg)
- The average MJIPS used by all the threads for all components on this Node during the Averaging Period.
- Node([Node ID]):MsgIn(1xxxSecAvg)
- The average messages into all Agents on this Node during the Averaging Period.
- Node([Node ID]):MsgOut(1xxxSecAvg)
- The average messages out of all Agents on this Node during the Averaging Period.
- Node([Node ID]):BytesIn(1xxxSecAvg)
- The average number of bytes (for all messages) into all Agents on this Node during the Averaging Period.
- Node([Node ID]):BytesOut(1xxxSecAvg)
- The average number of bytes (for all messages) out of all Agents on this Node during the Averaging Period.
- Node([Node ID]):VMSize
- Size in Bytes of the Node's VM
- Node([node Addr]):Destination([Agent ID]):MsgTo(1xxxSecAvg)
- Messages per second from all agents on the Node, to the destination Agent
- Node([node Addr]):Destination([Agent ID]):MsgFrom(1xxxSecAvg)
- Messages per second to any agent on the Node, from the destination Agent
- Node([node Addr]):Destination([Agent ID]):BytesTo(1xxxSecAvg)
- Bytes per second from all agents on the Node, to the destination Agent
- Node([node Addr]):Destination([Agent ID]):BytesFrom(1xxxSecAvg)
- Bytes per second to any agent on the Node, from the destination Agent
- Node([node Addr]):Destination([Agent ID]):AgentIpAddress
- IP address of destination Agent
- Node([node Addr]):Destination([Agent ID]):CapacityMax
- Maximum capacity of the network between Node and destination Agent.
- Node([node Addr]):Destination([Agent ID]):OnSameSecureLAN
- True if Node and Destination are on the same secure LAN. The current formula just checks if the network capacity between the Node and Agent is greater than or equal to 10 Mbps.
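The OnSameSecureLAN check described above amounts to a single threshold comparison. The sketch below shows it under one stated assumption: that the capacity is expressed in kbps, as it is for the capacity Keys, so that 10 Mbps corresponds to 10000 kbps. The class and method names are hypothetical and this is not the formula's actual source.

```java
// Hypothetical illustration of the OnSameSecureLAN threshold test.
final class SecureLanSketch {
    // 10 Mbps expressed in kbps (assumes capacity values are in kbps).
    static final double THRESHOLD_KBPS = 10_000.0;

    // True when the Node-to-Agent capacity is at least 10 Mbps.
    static boolean onSameSecureLan(double capacityMaxKbps) {
        return capacityMaxKbps >= THRESHOLD_KBPS;
    }
}
```

Because the test is purely capacity-based, any sufficiently fast link is reported as a secure LAN, which is worth keeping in mind when interpreting the value.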
Agent Context
Agent Contexts can be accessed at root level and take one parameter, which is the Agent's Message address. The Agent Context is also contained within a Node Context and inherits all the Node Context's formulas. When an Agent moves to a new Node, the Agent Context is moved to the corresponding Node, i.e. all the re-wiring is automatic, but may be slightly delayed due to discovery issues.
- Agent([Agent ID]):LastHeard
- Seconds since some component has heard from this agent. This can be from any source such as successful communication, acknowledgment, or gossip.
- Agent([Agent ID]):LastSpoke
- Seconds since some component attempted to speak to the Agent. The attempt does not have to be successful. For example, if a failed message is retried, LastSpoke will be the time of the last retry.
- Agent([Agent ID]):SpokeErrorTime
- Seconds since last error in communications. This is a large number with 0.0 credibility, if no error has occurred.
- Agent([Agent ID]):CPULoadAvg(1xxxSecAvg)
- The average number of threads working for this Agent during the Averaging Period.
- Agent([Agent ID]):CPULoadMJips(1xxxSecAvg)
- The average MJIPS used by all the threads for this Agent during the Averaging Period.
- Agent([Agent ID]):MsgIn(1xxxSecAvg)
- The average messages into this Agent during the Averaging Period.
- Agent([Agent ID]):MsgOut(1xxxSecAvg)
- The average messages out of this Agent during the Averaging Period.
- Agent([Agent ID]):BytesIn(1xxxSecAvg)
- The average number of bytes (for all messages) into this Agent during the Averaging Period.
- Agent([Agent ID]):BytesOut(1xxxSecAvg)
- The average number of bytes (for all messages) out of this Agent during the Averaging Period.
- Agent([Agent ID]):PersistSizeLast
- Size in Bytes of the last persistence file for this agent
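LastHeard and LastSpoke are expressed in seconds, whereas the Agent Keys HeardTime and SpokeTime are System.currentTimeMillis() time-stamps. Assuming the formulas are derived from those time-stamps (the text does not say so explicitly), the conversion would look like the sketch below. The class and method names are hypothetical.

```java
// Hypothetical illustration: elapsed seconds from a millisecond time-stamp.
final class ElapsedSketch {
    // eventMillis and nowMillis are System.currentTimeMillis()-style values.
    static double secondsSince(long eventMillis, long nowMillis) {
        return (nowMillis - eventMillis) / 1000.0;
    }
}
```

A caller would typically pass System.currentTimeMillis() as nowMillis. Taking it as a parameter keeps the calculation easy to test.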
IpFlow Context
The IpFlow Context can be accessed at root level and takes two parameters, which are the source and destination host addresses. The Host address can be the IP V4 address or the domain name.
- IpFlow([SrcAddr],[DstAddr]):CapacityMax
- Maximum Bandwidth between two hosts
- IpFlow([SrcAddr],[DstAddr]):CapacityUnused
- Unused Bandwidth between two hosts
AgentFlow Context
The AgentFlow Context can be accessed at root level and takes two parameters, which are the source and destination agent message addresses.
Since 11.2 the AgentFlow context is no longer used, because it added too many formulas to the metrics service when societies are large.
APIs
org.cougaar.core.qos.metrics.MetricsService
- Metric getValue(String path, VariableEvaluator evaluator, Properties qos_tags)
- The most general query function in the Metric Service. The path specifies the metric to be returned. For a description of the syntax, see the Path section. The evaluator is used to handle shell-variable-style references in generic paths. For more details on this, see VariableEvaluator. The qos_tags are for future use.
- Metric getValue(String path, Properties qos_tags)
- As above but with no variable replacement.
- Metric getValue(String path, VariableEvaluator evaluator)
- As above but with no QoS tags.
- Metric getValue(String path)
- As above but with neither variable replacement nor QoS tags.
- Object subscribeToValue(String path, Observer observer, VariableEvaluator evaluator, MetricNotificationQualifier qualifier)
- The most general subscription function in the Metric Service. The path specifies the metric to subscribe to. For a description of the syntax, see the Path section. The observer is the entity which will receive a callback when the specified metric changes. The evaluator is used to handle shell-variable-style references in generic paths. For more details on this, see VariableEvaluator. The qualifier is used to restrict the frequency of callbacks, for example by specifying the smallest change that should trigger one. For more details, see MetricNotificationQualifier. The value returned by this method is a subscription handle and should only be used for a subsequent call to unsubscribeToValue.
- Object subscribeToValue(String path, Observer observer, VariableEvaluator evaluator)
- As above but without callback qualification.
- Object subscribeToValue(String path, Observer observer, MetricNotificationQualifier qualifier)
- As above but without variable replacement.
- Object subscribeToValue(String path, Observer observer)
- As above but with neither variable replacement nor callback qualification.
- void unsubscribeToValue(Object handle)
- Terminates a previously established subscription. The handle is as returned by one of the subscribeToValue calls.
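The idea behind callback qualification, "the smallest change that should trigger one", can be illustrated with a small stand-alone filter. This is a sketch only: DeltaQualifierSketch and its shouldNotify method are hypothetical and do not reproduce the MetricNotificationQualifier interface, whose actual methods are not described here. The sketch compares each new value with the last value that was delivered, which is one possible interpretation of "smallest change".

```java
// Hypothetical illustration of a minimum-change callback filter.
final class DeltaQualifierSketch {
    private final double minDelta;
    private boolean hasLast = false;
    private double last;

    DeltaQualifierSketch(double minDelta) {
        this.minDelta = minDelta;
    }

    // Returns true when newValue differs from the last delivered value
    // by at least minDelta; the first value is always delivered.
    boolean shouldNotify(double newValue) {
        if (!hasLast || Math.abs(newValue - last) >= minDelta) {
            hasLast = true;
            last = newValue;
            return true;
        }
        return false;
    }
}
```

With a minimum change of 0.5, the sequence 1.0, 1.2, 1.4, 1.6 delivers 1.0 and 1.6 only: the two middle values are each within 0.5 of the last delivered value.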