Programmer's guide to the Metrics Service

Use Cases and Examples

Example Metrics Reader

An example Metrics Service subscriber can be found in the core module.

package org.cougaar.core.qos.metrics;

import java.util.Observable;
import java.util.Observer;

import org.cougaar.core.component.ServiceBroker;

/**
 * Basic Metric Service Client subscribes to a Metric path given in
 * the Agent's .ini file and prints the value if it ever changes
 */
public class MetricsClientPlugin
    extends ParameterizedPlugin
    implements Constants
{
    protected MetricsService metricsService;
    private String paramPath = null;
    private VariableEvaluator evaluator;

    /**
     * Metric CallBack object
     */
    private class MetricsCallback implements Observer {
        /**
         * Call back implementation for Observer
         */
        public void update(Observable obs, Object arg) {
            if (arg instanceof Metric) {
                Metric metric = (Metric) arg;
                double value = metric.doubleValue();
                System.out.println("Metric " + paramPath + "=" + metric);
            }
        }
    }

    public void load() {
        super.load();
        ServiceBroker sb = getServiceBroker();
        evaluator = new StandardVariableEvaluator(sb);
        metricsService = (MetricsService)
            sb.getService(this, MetricsService.class, null);

        MetricsCallback cb = new MetricsCallback();
        paramPath = getParameter("path");
        // If no path is given, default to the load average of the
        // current Agent.
        if (paramPath == null)
            paramPath = "$(localagent)" + PATH_SEPR + "LoadAverage";

        // The returned handle is only needed for a later unsubscribeToValue call.
        Object subscriptionKey =
            metricsService.subscribeToValue(paramPath, cb, evaluator);
    }
}

Example Metrics Writer

An example Metric Service writer has the following form:

package org.cougaar.core.qos.metrics;

import org.cougaar.core.component.ServiceBroker;

public class MetricsTestComponent
    extends SimplePlugin
    implements Constants
{
    public void load() {
        super.load();
        ServiceBroker sb = getServiceBroker();
        MetricsUpdateService updater = (MetricsUpdateService)
            sb.getService(this, MetricsUpdateService.class, null);

        String key = "Current" + KEY_SEPR + "Time" + KEY_SEPR + "Millis";

        // Publish the current time
        Metric m = new MetricImpl(Long.valueOf(System.currentTimeMillis()),
                                  SECOND_MEAS_CREDIBILITY,
                                  "ms",
                                  "MetricsTestComponent");
        updater.updateValue(key, m);
    }
}

Metric Keys

This section discusses the structure of the Metrics keys name-space and defines common keys.

Metrics are written into the MetricsUpdateService as key-value pairs. While the key itself is just a string, we impose a structure to help organize the information, loosely modeled after SNMP MIBs. The value is a Metric, a structured object with slots for value, credibility, time-stamp, units, etc.

At runtime, each Node has a store of Key-Value pairs. When the MetricsUpdateService receives a new Key-Value pair, it looks up the Key in the store. If the value has changed, all the Metrics Formulas that have subscribed to this Key will be called back with the new Value. The Metrics Service in each Node has one and only one effective Value for each Key, which is derived through a complicated set of integration rules based on credibility and timestamps.
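The update-and-notify behavior described above can be pictured with a small stand-alone sketch. It is not the Metrics Service implementation: the class KeyStoreSketch, its subscribe and update methods, and the use of a plain Double in place of a Metric are all hypothetical, and the credibility and timestamp integration rules are deliberately left out (here the newest value simply wins).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a Node's key-value store; not Cougaar code.
class KeyStoreSketch {
    private final Map<String, Double> values = new HashMap<>();
    private final Map<String, List<Consumer<Double>>> subscribers = new HashMap<>();

    // Register a callback (standing in for a Metrics Formula) for one key.
    void subscribe(String key, Consumer<Double> callback) {
        subscribers.computeIfAbsent(key, k -> new ArrayList<>()).add(callback);
    }

    // Store the value and call subscribers back only if it changed.
    // Returns true when a change was recorded.
    boolean update(String key, double value) {
        Double old = values.put(key, value);
        if (old != null && old == value) {
            return false; // same value as before: nobody is called back
        }
        List<Consumer<Double>> callbacks =
            subscribers.getOrDefault(key, Collections.emptyList());
        for (Consumer<Double> cb : callbacks) {
            cb.accept(value);
        }
        return true;
    }

    public static void main(String[] args) {
        KeyStoreSketch store = new KeyStoreSketch();
        String key = "Host_128.33.15.114_CPU_loadavg";
        store.subscribe(key, v -> System.out.println(key + " changed to " + v));
        store.update(key, 0.5);  // first value: callback fires
        store.update(key, 0.5);  // unchanged: no callback
        store.update(key, 1.25); // changed: callback fires
    }
}
```

The point of the sketch is only the "notify on change" step; a real store would compare and merge Metric objects rather than raw numbers.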

Normally, Key-Value pairs are not read from the Metrics Service, but are only processed by internal formulas. One debugging trick to view the effective value of a Key is to use the Path Integrater, which takes a Key as a parameter and returns the effective metric. Currently, there is no way to list all the Keys available to a Node.

Key Syntax

A Metrics Key is a string divided into fields using the Key Separator, e.g.

Host_128.33.15.114_CPU_Jips

The current Key Separator is "_". In your code, however, you should use the string constant KEY_SEPR defined in org.cougaar.core.qos.metrics.Constants:

"Host" +KEY_SEPR+ "128.33.15.114" +KEY_SEPR+ "CPU" +KEY_SEPR+ "Jips"

Notice that some of the fields are types (e.g. Host, CPU, Jips) and others are identifiers (e.g. 128.33.15.114). The types are fixed and case sensitive. Identifiers are variable and define branches in the naming hierarchy.

In the following sections we will use the convention of writing identifier fields in brackets [] and using "_" as the Key Separator:

Host_[Host IP]_CPU_Jips
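As a minimal sketch of the key syntax, the helper below joins type and identifier fields with a separator constant. The class MetricKeySketch and its key method are hypothetical, and KEY_SEPR is declared locally only so the example runs on its own; real code would use the constant from org.cougaar.core.qos.metrics.Constants.

```java
// Sketch only: KEY_SEPR is a local stand-in for the constant that this
// guide says is defined in org.cougaar.core.qos.metrics.Constants.
class MetricKeySketch {
    static final String KEY_SEPR = "_";

    // Join type fields and identifier fields, in order, into one key.
    static String key(String... fields) {
        return String.join(KEY_SEPR, fields);
    }

    public static void main(String[] args) {
        // Types: Host, CPU, Jips. Identifier: the host's IP V4 address.
        System.out.println(key("Host", "128.33.15.114", "CPU", "Jips"));
        // prints Host_128.33.15.114_CPU_Jips
    }
}
```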

Host

Host Keys start by identifying the host and then have several optional fields for different host characteristics. Host identifiers are raw IP V4 addresses, not domain names.

Host_[Host IP]_CPU_Jips
CPU capacity in Java Instructions Per Second. JIPS is determined through a benchmark.

Host_[Host IP]_CPU_loadavg
CPU load average, i.e. the average number of processes that are ready to run.

Host_[Host IP]_CPU_count
The number of CPUs in this host

Host_[Host IP]_CPU_cache
Size of CPU level 2 cache

Host_[Host IP]_Memory_Physical_Total
Total Physical Memory

Host_[Host IP]_Memory_Physical_Free
Free Physical Memory

Host_[Host IP]_Network_TCP_sockets_inuse
TCP sockets in use

Host_[Host IP]_Network_UDP_sockets_inuse
UDP sockets in use

IP Flow

Network Keys start by identifying the IP Flow and then have several optional fields for different network characteristics. End-point addresses are raw IP V4 addresses, not domain names.

Ip_Flow_[src IP]_[dst IP]_Capacity_Max
The maximum bandwidth (kbps) for a path across the network, i.e. with no competing traffic.

Ip_Flow_[src IP]_[dst IP]_Capacity_Unused
The expected available bandwidth (kbps) for a path across the network, i.e. the max minus the competing traffic.

Site Flow

A Site is an IP subnetwork that can be represented with a simple mask (the number of leading 1 bits). Site_Flows between Sites can be used as defaults, instead of specifying individual Ip_Flows. For example:

Site_Flow_128.89.0.0/16_128.33.15.0/28_Capacity_Max

Site_Flow_[src IP/mask]_[dst IP/mask]_Capacity_Max
Maximum bandwidth (kbps) between Sites. The current formulas can model asymmetric bandwidth between Sites by publishing Site_Flows for both directions. If only one direction is published, the formulas will assume the bandwidth is the same in both directions.
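The "same in both directions" default can be illustrated with a small lookup sketch. This is an assumption about how such a fallback could be written, not the actual formula code: SiteFlowSketch, publish, and capacityMax are hypothetical names, and KEY_SEPR is a local stand-in for the Constants value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical illustration of the Site_Flow direction fallback.
class SiteFlowSketch {
    static final String KEY_SEPR = "_"; // stand-in for Constants.KEY_SEPR

    private final Map<String, Double> published = new HashMap<>();

    // Build Site_Flow_[src IP/mask]_[dst IP/mask]_Capacity_Max.
    static String capacityMaxKey(String srcSite, String dstSite) {
        return String.join(KEY_SEPR,
            "Site", "Flow", srcSite, dstSite, "Capacity", "Max");
    }

    void publish(String srcSite, String dstSite, double kbps) {
        published.put(capacityMaxKey(srcSite, dstSite), kbps);
    }

    // Prefer the published direction; otherwise reuse the reverse one.
    OptionalDouble capacityMax(String srcSite, String dstSite) {
        Double direct = published.get(capacityMaxKey(srcSite, dstSite));
        if (direct != null) {
            return OptionalDouble.of(direct);
        }
        Double reverse = published.get(capacityMaxKey(dstSite, srcSite));
        return reverse != null ? OptionalDouble.of(reverse)
                               : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        SiteFlowSketch flows = new SiteFlowSketch();
        flows.publish("128.89.0.0/16", "128.33.15.0/28", 1500.0);
        // Only one direction was published, so both lookups agree.
        System.out.println(flows.capacityMax("128.89.0.0/16", "128.33.15.0/28"));
        System.out.println(flows.capacityMax("128.33.15.0/28", "128.89.0.0/16"));
    }
}
```

Publishing the second direction explicitly would override the fallback, which is how asymmetric links would be modeled in this sketch.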

Agent

Agents are identified by their message ID.

Agent_[Message ID]_HeardTime
System.currentTimeMillis() time-stamp for the last time some component has heard from this agent

Agent_[Message ID]_SpokeTime
System.currentTimeMillis() time-stamp for the last time some component has attempted to speak to this agent

Node

Nodes are identified by their message ID. Currently, there are no Node-specific Keys.

Metric Paths

This section discusses the structure of the Metrics Paths and defines common Paths.

Metrics read from the MetricsService are the values from a real-time model of the status of the Cougaar Society and system resources. The MetricsService allows access to formulas in the model. Formulas are relative to a Resource Context, which defines instances of modeled entities and their relationships to each other.

Containment is a major relationship which is modeled. A child Context inherits all the formulas of its parent. The useful containment relationship in Cougaar is Host->Node->Agent. For example, Agents have all the formulas of their Hosts, and when an Agent moves to a new Host the metrics track the new Host's values.
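The inheritance along Host->Node->Agent can be sketched as a lookup that walks up a parent chain. This is an illustrative model only: ContextSketch and its methods are hypothetical, formulas are reduced to stored numbers, and the values are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Context containment; not the Cougaar classes.
class ContextSketch {
    private ContextSketch parent; // changes when an Agent moves
    private final Map<String, Double> formulas = new HashMap<>();

    ContextSketch(ContextSketch parent) {
        this.parent = parent;
    }

    void setParent(ContextSketch newParent) {
        this.parent = newParent;
    }

    void setFormula(String name, double value) {
        formulas.put(name, value);
    }

    // Look in this Context first, then in each containing Context.
    Double lookup(String name) {
        for (ContextSketch c = this; c != null; c = c.parent) {
            Double value = c.formulas.get(name);
            if (value != null) {
                return value;
            }
        }
        return null; // no Context in the chain defines this formula
    }

    public static void main(String[] args) {
        ContextSketch hostA = new ContextSketch(null);
        ContextSketch hostB = new ContextSketch(null);
        hostA.setFormula("LoadAverage", 0.5);
        hostB.setFormula("LoadAverage", 2.0);
        ContextSketch nodeA = new ContextSketch(hostA);
        ContextSketch nodeB = new ContextSketch(hostB);
        ContextSketch agent = new ContextSketch(nodeA);

        System.out.println(agent.lookup("LoadAverage")); // prints 0.5
        agent.setParent(nodeB);                          // the Agent moves
        System.out.println(agent.lookup("LoadAverage")); // prints 2.0
    }
}
```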

Path Syntax

A Path is a series of Contexts followed by a Formula. If there is more than one child Context of the same type in the parent Context, then the child Context is narrowed by Parameters. Each Context has a fixed number of parameters and parameter order is important. The Context type is used as the name of the context. Context and Formula names are case sensitive. For example:

IpFlow(128.33.15.114,128.33.15.113):CapacityMax

The separator between Contexts and Formulas is ":" and the separator between parameters is ",". In your code, however, you should use the constant PATH_SEPR defined in the interface org.cougaar.core.qos.metrics.Constants:

"IpFlow(128.33.15.114,128.33.15.113)" +PATH_SEPR+ "CapacityMax"

The following sections will use the convention of writing variable parameter fields in brackets [], using ":" as the Path Separator and "," as the parameter separator. The syntax is as follows:

ContextType1([parameter1],[parameter2]...):ContextType2([parameter]):Formula(parameter)

An Averaging Period is a parameter of some Formulas. The Averaging Periods are: "10", "100", and "1000". This section will use the convention 1xxxSecAvg to represent the Averaging Period.

Also, please use the constants interface org.cougaar.core.qos.metrics.Constants in the core module whenever possible. This will allow compile-time detection of errors when the Metrics Service interface inevitably changes in the future.
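A minimal sketch of assembling Paths from the pieces described above follows. The class MetricPathSketch, its helpers, the name PARAM_SEPR, and the agent name "AgentA" are all hypothetical; PATH_SEPR is redeclared locally as a stand-in for the Constants value. The second example assumes the Averaging Period string is passed literally as the Formula parameter.

```java
// Sketch only: PATH_SEPR stands in for the constant in
// org.cougaar.core.qos.metrics.Constants; PARAM_SEPR is a made-up name.
class MetricPathSketch {
    static final String PATH_SEPR = ":";
    static final String PARAM_SEPR = ",";

    // Render a Context or Formula with ordered parameters: Type(p1,p2).
    static String withParams(String type, String... params) {
        return type + "(" + String.join(PARAM_SEPR, params) + ")";
    }

    // Join Contexts and the trailing Formula into one Path.
    static String path(String... elements) {
        return String.join(PATH_SEPR, elements);
    }

    public static void main(String[] args) {
        String capacity = path(
            withParams("IpFlow", "128.33.15.114", "128.33.15.113"),
            "CapacityMax");
        System.out.println(capacity);
        // prints IpFlow(128.33.15.114,128.33.15.113):CapacityMax

        String load = path(
            withParams("Agent", "AgentA"),
            withParams("CPULoadAvg", "10"));
        System.out.println(load);
        // prints Agent(AgentA):CPULoadAvg(10)
    }
}
```

Because parameter order matters, withParams keeps the arguments in the order given rather than sorting or de-duplicating them.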

Host Context

Host Contexts can be accessed at root level and take one parameter, the Host address. The Host address can be an IP V4 address or a domain name. Host Contexts are not contained in other Contexts.

Host([Host Addr]):EffectiveMJips
The expected per-thread Millions of Java Instructions Per Second. The formula takes into account the base CPU JIPS, the number of CPUs, and the Load Average on the Host.

Host([Host Addr]):Jips
CPU capacity in Java Instructions Per Second. JIPS is determined through a benchmark.

Host([Host Addr]):LoadAverage
CPU load average, i.e. the average number of processes that are ready to run.

Host([Host Addr]):Count
The number of CPUs in this host

Host([Host Addr]):Cache
Size of CPU level 2 cache

Host([Host Addr]):TotalMemory
Total Physical Memory

Host([Host Addr]):FreeMemory
Free Physical Memory

Host([Host Addr]):TcpInUse
TCP sockets in use

Host([Host Addr]):UdpInUse
UDP sockets in use

Node Context

Node Contexts can be accessed at root level and take one parameter, the Node's Message address. The Node is also contained within a Host Context and inherits all the Host Context's formulas.

Node([Node ID]):CPULoadAvg(1xxxSecAvg)
The average number of threads working for all components on this Node during the Averaging Period.

Node([Node ID]):CPULoadMJips(1xxxSecAvg)
The average MJIPS used by all the threads for all components on this Node during the Averaging Period.

Node([Node ID]):MsgIn(1xxxSecAvg)
The average messages into all Agents on this Node during the Averaging Period.

Node([Node ID]):MsgOut(1xxxSecAvg)
The average messages out of all Agents on this Node during the Averaging Period.

Node([Node ID]):BytesIn(1xxxSecAvg)
The average number of bytes (for all messages) into all Agents on this Node during the Averaging Period.

Node([Node ID]):BytesOut(1xxxSecAvg)
The average number of bytes (for all messages) out of all Agents on this Node during the Averaging Period.

Node([Node ID]):VMSize
Size in Bytes of the Node's VM

Node([Node Addr]):Destination([Agent ID]):MsgTo(1xxxSecAvg)
Messages per second from all Agents on the Node to the destination Agent.

Node([Node Addr]):Destination([Agent ID]):MsgFrom(1xxxSecAvg)
Messages per second to any Agent on the Node from the destination Agent.

Node([Node Addr]):Destination([Agent ID]):BytesTo(1xxxSecAvg)
Bytes per second from all Agents on the Node to the destination Agent.

Node([Node Addr]):Destination([Agent ID]):BytesFrom(1xxxSecAvg)
Bytes per second to any Agent on the Node from the destination Agent.

Node([Node Addr]):Destination([Agent ID]):AgentIpAddress
IP address of the destination Agent.

Node([Node Addr]):Destination([Agent ID]):CapacityMax
Maximum capacity of the network between the Node and the destination Agent.

Node([Node Addr]):Destination([Agent ID]):OnSameSecureLAN
True if the Node and the Destination are on the same secure LAN. The current formula just checks whether the network capacity between the Node and the Agent is greater than or equal to 10 Mbps.

Agent Context

Agent Contexts can be accessed at root level and take one parameter, the Agent's Message address. The Agent Context is also contained within a Node Context and inherits all the Node Context's formulas. When an Agent moves to a new Node, the Agent Context is moved to the corresponding Node, i.e. all the re-wiring is automatic, but may be slightly delayed due to discovery issues.

Agent([Agent ID]):LastHeard
Seconds since some component has heard from this agent. This can be from any source such as successful communication, acknowledgment, or gossip.

Agent([Agent ID]):LastSpoke
Seconds since some component attempted to speak to the Agent. The attempt does not have to be successful. For example, if a failed message is retried, LastSpoke will be the time of the last retry.

Agent([Agent ID]):SpokeErrorTime
Seconds since last error in communications. This is a large number with 0.0 credibility, if no error has occurred.

Agent([Agent ID]):CPULoadAvg(1xxxSecAvg)
The average number of threads working for this Agent during the Averaging Period.

Agent([Agent ID]):CPULoadMJips(1xxxSecAvg)
The average MJIPS used by all the threads for this Agent during the Averaging Period.

Agent([Agent ID]):MsgIn(1xxxSecAvg)
The average messages into this Agent during the Averaging Period.

Agent([Agent ID]):MsgOut(1xxxSecAvg)
The average messages out of this Agent during the Averaging Period.

Agent([Agent ID]):BytesIn(1xxxSecAvg)
The average number of bytes (for all messages) into this Agent during the Averaging Period.

Agent([Agent ID]):BytesOut(1xxxSecAvg)
The average number of bytes (for all messages) out of this Agent during the Averaging Period.

Agent([Agent ID]):PersistSizeLast
Size in Bytes of the last persistence file for this agent
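The Agent_[Message ID]_HeardTime and Agent_[Message ID]_SpokeTime Keys hold System.currentTimeMillis() time-stamps, while LastHeard and LastSpoke report elapsed seconds. This guide does not show the formula itself, so the following is only a sketch that assumes a plain difference between "now" and the time-stamp; LastHeardSketch and secondsSince are hypothetical names.

```java
// Hypothetical conversion from a millisecond time-stamp to elapsed
// seconds; the real LastHeard/LastSpoke formulas may differ.
class LastHeardSketch {
    // Seconds between a System.currentTimeMillis() stamp and nowMillis.
    static double secondsSince(long stampMillis, long nowMillis) {
        return (nowMillis - stampMillis) / 1000.0;
    }

    public static void main(String[] args) {
        long heardTime = System.currentTimeMillis(); // value of a HeardTime key
        long now = heardTime + 2500;                 // pretend 2.5 s have passed
        System.out.println(secondsSince(heardTime, now)); // prints 2.5
    }
}
```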

IpFlow Context

The IpFlow Context can be accessed at root level and takes two parameters, the source and destination Host addresses. Each Host address can be an IP V4 address or a domain name.

IpFlow([SrcAddr],[DstAddr]):CapacityMax
Maximum Bandwidth between two hosts

IpFlow([SrcAddr],[DstAddr]):CapacityUnused
Unused Bandwidth between two hosts

AgentFlow Context

The AgentFlow Context can be accessed at root level and takes two parameters, the source and destination Agent message addresses.

Since 11.2 the AgentFlow Context is no longer used, because it added too many formulas to the Metrics Service when societies are large.

APIs

org.cougaar.core.qos.metrics.MetricsService

Metric getValue(String path, VariableEvaluator evaluator, Properties qos_tags)
The most general query function in the Metric Service. The path specifies the metric to be returned. For a description of the syntax, see the Path section. The evaluator is used to handle shell-variable-style references in generic paths. For more details on this, see VariableEvaluator. The qos_tags are for future use.

Metric getValue(String path, Properties qos_tags)
As above but with no variable replacement.

Metric getValue(String path, VariableEvaluator evaluator)
As above but with no QoS tags.

Metric getValue(String path)
As above but with neither variable replacement nor QoS tags.

Object subscribeToValue(String path, Observer observer, VariableEvaluator evaluator, MetricNotificationQualifier qualifier)
The most general subscription function in the Metric Service. The path specifies the metric to subscribe to. For a description of the syntax, see the Path section. The observer is the entity which will receive a callback when the specified metric changes. The evaluator is used to handle shell-variable-style references in generic paths. For more details on this, see VariableEvaluator. The qualifier is used to restrict the frequency of callbacks, for example by specifying the smallest change that should trigger one. For more details, see MetricNotificationQualifier. The value returned by this method is a subscription handle and should only be used for a subsequent call to unsubscribeToValue.

Object subscribeToValue(String path, Observer observer, VariableEvaluator evaluator)
As above but without callback qualification.

Object subscribeToValue(String path, Observer observer, MetricNotificationQualifier qualifier)
As above but without variable replacement.

Object subscribeToValue(String path, Observer observer)
As above but with neither variable replacement nor callback qualification.

void unsubscribeToValue(Object handle)
Terminates a previously established subscription. The handle is as returned by one of the subscribeToValue calls.
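The subscribe and unsubscribe calls are meant to be paired through the returned handle. The sketch below shows that lifecycle without depending on Cougaar: MetricsServiceLike is a stand-in that copies only two of the signatures listed above, FakeMetricsService is a hypothetical in-memory fake that ignores paths, and the path string and agent name are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Observable;
import java.util.Observer;

// Stand-in mirroring two MetricsService signatures from this guide.
interface MetricsServiceLike {
    Object subscribeToValue(String path, Observer observer);
    void unsubscribeToValue(Object handle);
}

// Hypothetical fake so the sketch runs on its own; paths are ignored.
class FakeMetricsService implements MetricsServiceLike {
    private final Map<Object, Observer> observers = new HashMap<>();

    public Object subscribeToValue(String path, Observer observer) {
        Object handle = new Object(); // opaque to the caller
        observers.put(handle, observer);
        return handle;
    }

    public void unsubscribeToValue(Object handle) {
        observers.remove(handle);
    }

    // Deliver a value to every current subscriber.
    void publish(Object value) {
        for (Observer o : new ArrayList<>(observers.values())) {
            o.update(null, value);
        }
    }
}

class SubscriptionLifecycleSketch {
    private final MetricsServiceLike service;
    private Object handle; // kept only so we can unsubscribe later
    int callbacks = 0;

    SubscriptionLifecycleSketch(MetricsServiceLike service) {
        this.service = service;
    }

    void start(String path) {
        handle = service.subscribeToValue(
            path, (Observable obs, Object arg) -> callbacks++);
    }

    void stop() {
        if (handle != null) {
            service.unsubscribeToValue(handle);
            handle = null;
        }
    }

    public static void main(String[] args) {
        FakeMetricsService fake = new FakeMetricsService();
        SubscriptionLifecycleSketch client = new SubscriptionLifecycleSketch(fake);
        client.start("Agent(AgentA):LoadAverage");
        fake.publish(0.5); // delivered to the callback
        client.stop();
        fake.publish(0.7); // not delivered: subscription was terminated
        System.out.println(client.callbacks); // prints 1
    }
}
```

In a plugin, start would typically correspond to load() and stop to whatever teardown hook the component provides; that mapping is a suggestion, not something this guide specifies.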
Topic revision: r4 - 18 Aug 2009 - 00:23:48 - GordonVidaver
 