Cougaar Memory Profiler
NOTE: As of January 2007, the Cougaar memory profiler should be considered
deprecated . The Cougaar memory profiler was written to overcome limitations in Sun's
JVMPI implementation, as used by
hprof,
OptimizeIt(tm),
JProbe(tm), etc. Most of these problems have been fixed in Java 1.6.
Instead of using the Cougaar memory profiler, we recommend that developers first try Sun's new, free `jhat` profiler:
http://java.sun.com/javase/6/docs/technotes/tools/index.html#troubleshoot See the above link's
Developer's Guide for notes on how to use
jhat and interpret the results.
For future reference, there may be interest in extending jhat to better identify non-leak memory problems and potential solutions. For example, a developer might want to estimate the potential memory savings of adding caching (e.g. intern the most common Strings) or changing a data structure (e.g. change an "Integer" field to a primitive "int"). These "identify potential memory saving" analysis methods would best be implemented as enhancements to the post-processing jhat dump reader:
j2se/src/share/classes/com/sun/tools/hat/Main.java The modified code could be a new Cougaar project (e.g. "jhat_extensions") and offered back to Sun.
The remainder of this page describes the Cougaar memory profiler, which still works but is more difficult to configure and use than jhat. Some of the concepts may be of interest for use in future projects, e.g. bytecode instrumentation and weak reference tables.
Overview
The Cougaar memory profiler is a tool for debugging memory usage and leaks in any Java application. It features a scalable 100% Java design that is lighter weight than existing
JVMPI-based profilers (
hprof,
OptimizeIt(tm),
JProbe(tm), etc).
Memory leaks can arise in Java applications due to unintentional object references. For example, an application may fail to remove callback listeners or close streams, preventing these objects from being garbage collected. Even minor leaks can grow to major problems in deployed systems.
The profiler tracks memory usage within the application by using tables of
WeakReferences. The developer selects which classes should be tracked and runs an automated classfile editor (using
BCEL) to add profiling instructions to the constructors. The profiler maintains pointers to the live instances and can display useful debugging information:
- Total number of allocations of a profiled class, including the number of live and GC'ed instances
- Free/used/max VM heap size and force GC
- Show instances in order of:
- allocation timestamp
- compareTo
- hashCode
- toString
- "size" (e.g. ArrayList? .size() & String.length())
- "capacity" (e.g. ArrayList? .elementData.length)
For example, find the largest
HashMaps? , ten newest "com.Foo", oldest "com.Foo", etc.
- Count unique occurences of:
- allocation stacktrace
- equals
- toString
- hashCode
- allocation timestamp (& mod second/minute/hour)
- "size" (& mod 100/1000)
For example, what's the most common allocation point for
ArrayLists? , which are the most common "equals" values (for potential caching), etc.
- Sample instances for the above order/count views. For example, randomly select 100 of the 1000+ Strings and show the 10 most common "equals" values.
- The profiler can be configured to randomly sample the allocations to further reduce runtime overhead on a per-class basis. For example, track 5% of allocated Strings. The UI will adjust the displayed data to match the sampling ratio(s).
An interactive servlet-based UI is included that displays all the above profiling data. A servlet-based UI was developed instead of a Swing UI since it matched the Cougaar Agent Architecture's
embedded servlet server. It'd be straight-forward to port the servlet to Swing or create a
JEditorPane? /HTMLEditorKit adaptor. The servlet has minimal Cougaar dependencies and can be modified to run in any servlet server.
Jump to snapshots
The developer controls the profiling overhead by selecting which classes to profile and reducing the level of profiling (e.g. turning off allocation stacktrace data). The runtime overhead is less than JVMPI tools, partially because the JVMPI tools must maintain similar heap tables (see the hprof source in
$JDK/j2se/src/share/tools/hprof for details). We've found the JVMI overhead to be prohibitively expensive when debugging large applications. Long-term
JSR-163 (Java
(tm) Platform Profiling Architecture) may address some of these performance issues, however the draft specification suggests that bytecode-based object profiling may be the recommended approach.
There are a couple noteworthy limitations of this approach vs JVMPI:
- The profiler can't trace the VM's roots, so one can't tell why an object hasn't been GC'ed. The profiler can help figure out which classes are being leaked and when/where they were allocated, which is often sufficient for most debugging.
- Arrays can't be profiled, since they're native to the VM and lack a constructor method. In the future this could be supported by tracking "newarray" bytecodes in all classes. Some VM-internal arrays can't be tracked, e.g. thread stacks. For now the profiler's "+Capacity Bytes" can track all arrays referenced by class fields (e.g. an ArrayList? 's "Object[] elementData").
- A handful of "java.lang" classes can't be profiled due to VM loading constraints:
- java.lang.Class
- java.lang.Object
- java.lang.String$CaseInsensitiveComparator
- java.lang.ThreadGroup
- java.lang.Throwable
- java.lang.ref.Reference
- java.lang.ref.SoftReference
- java.lang.ref.WeakReference
Constant strings in class/interface APIs are also hidden within the VM, as are the Classes themselves and some dynamic proxies. Some odd JDK-internal classes may also be excluded (e.g. "sun.misc.Ref"). The advantages are noted above: scalability and Java-based APIs. Also, since the profiler is 100% Java, it's easier to extend and can be used from within the application itself.
The main
CVS Repo has the latest code. The download contains a "run" script that runs a simple example application.
Please send questions and comments to the
projects forum.
Snapshots
Here's an example HTML snapshot:
The above snapshot shows:
- VM heap memory used/free/total/max
- Rows of profiled classes, where the "Live" and "+Capacity Bytes" columns are the most important. In Mozilla the column links will sort the column using JavaScript? .
Here's a key for each table row:
-
-
- Type
- The class name
- Sample%
- The percent of allocated instances that were tracked. This is an option specified by the developer in the profiler filter configuration.
- Live
- The number of live instances of the class, multiplied by 1/sample
- GC'ed
- The number of garbage collected instances of the class, multiplied by 1/sample
- Total
- "Live" + "GC'ed"
- Bytes Each
- The estimated size of each instance, based upon the object header size and number of fields in the object
- *Live
- "Live" * "Bytes Each"
- +Capacity Bytes
- "*Live" + "Capacity Bytes", where "Capacity Bytes" is the sum of array lengths multipled by array element size in all the live instances (e.g. for ArrayList? , for each "Object[] elementData", the sum of:
(elementData == null ? 0 : 12 + 4*elementData.length) Off to the right, out of view in the snapshot, are columns for:
-
-
- Size Sum
- The sum of all "size()" values of the instances (e.g. for Vector, the sum of all "int elementCount" values)
- Size Max
- The maximum "size()" of the live instances
- Size Max Ever
- The maximum "Size Max" observed since the start of the run
- Size Mean
- "Size Sum" / "Live"
- Capacity Sum
- The sum of all array lengths in all the live instances (e.g. for ArrayList? , for each "Object[] elementData", the sum of:
(elementData == null ? 0 : elementData.length)
- Capacity Max
- The maximum array length of the live instances
- Capacity Max Ever
- The maximum "Capacity Max" observed since the start of the run
- Capacity Mean
- "Capacity Sum" / "Live"
For example, 1% of String allocations were tracked. Currently 56 of 28042 tracked Strings are live, so we can estimate that 5600 are live out of 2804200 allocated (0.2%). Each String costs 24 bytes excluding the char[] data:
header + fields = header + "char[] value" + "int offset" + "int count" + "int hash" = 8 + 4 + 4 + 4 + 4 = 24
So there's about 24*5600 = 134400 bytes in Strings, excluding the "char[] value". The "+Capacity Bytes" is 474800, so the sum of all the "char[]" headers (12 bytes each) and chars (2 bytes each) is 474800-134400 = 340400 bytes. This is an estimate, since some of the Strings may point to the same "char[]" (e.g. substrings and
StringBuffers? ). The max "length()" of a String is currently 49, and the max "value.length" is 98, where the discrepency is likely due to substrings.
If we click on the "java.lang.String" row a new page is generated:
This simply repeats the above data and allows the user to submit a query on instances of "java.lang.String". Here we select a decreasing view by "length()" and see:
In this run we disabled "java.*" allocation timestamps and stacktraces, to minimize the performance overhead, but we can see the "toString()" and other useful information. In this example shows the largest 20 of all 30 Strings (of the 1% we tracked).
Additional snapshots and notes can be found
here.
Usage
The release contains a "run" script that will run a simple example application. A large
HashMap? is allocated, serialized, and deserialized, and at each point the profiler is queried to track this map.
BCEL 5.1 is required to modify the classes but not required when running the profiler.
The "profiler_transform/bin/transform_jar" script can be used to instrument complete jars (e.g. "lib/core.jar", "sys/tomcat_40.jar", etc) or JDK packages (e.g. "java.lang", "java.util", etc). For options run:
./profiler_impl/bin/transform_jar --help
The "transform_jar" script allows the developer to specify an "OptionsFactory" class that will define the profiling level at runtime, e.g.:
./profiler_impl/bin/transform_jar --config=foo.MyConfig xerces
The default implementation is
DefaultOptionsFactory, which enables full profiling for all classes. The class must contain a method matching this signature:
public static final Options getOptions( String module, String classname) {..}
The recommended usage is to specify a custom implementation that you'll modify between runs. This will allow you to run the "transform_jar" script on your jars only once (to instrument the bytecode and specify your
OptionsFactory? ) and in subsequent runs you'll only need to change your
OptionsFactory? to change the profiling config.
Here's an example:
package foo; import org.cougaar.profiler.Options; public class MyConfig { /** * Option that will track 100% of allocations, * record allocation timestamp, * record allocation stacktrace, * record size/capacity/context. * <p> * This is the best detail but the most overhead. */ private static final Options FULL_DETAIL = new Options( true, // time true, // stack true, // size true, // capacity true, // context 1.00 // sampleRatio ); /** * Track only 1% of allocations, * record only the instance itself, * allow size/capacity. * <p> * This is very low overhead but only samples a small * subset of the instances. This is ideal for the * initial profiling runs, to get a rough idea of the * allocation counts. */ private static final Options MIN_DETAIL = new Options( false, // time false, // stack true, // size true, // capacity false, // context 0.01 // sampleRatio ); /* * The profiled class will call this method to get its options. * * @param module optional module name, e.g. "xerces" or null * @param classname non-null class name, e.g. * "org.apache.xerces.dom.DocumentImpl$LEntry" * @return profiling options for the class, or null if the * class shouldn't be tracked. */ public static final Options getOptions( String module, String classname) { if ("core".equals(module)) { // we're interested in all classes in "lib/core.jar". // we want to track every allocation, timestamp, and stack return FULL_DETAIL; } else if (classname.startsWith("java.")) { // we're a little interested in "java.*", but // not enough to suffer a big overhead. This is a good // "scatter-shot" approach. return MIN_DETAIL; } else { // everything else will be ignored return null; } } }
This should be compiled, jar'ed (e.g. "profiler_config.jar"), and placed at the head of the VM's bootclasspath along with the "profiler_impl.jar" and other VM jar overrides, e.g.:
#!/bin/sh .. BOOTCLASSPATH="\ -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/profiler_config.jar\ -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/profiler_impl.jar\ -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/javalangpatch.jar\ -Xbootclasspath/p:$COUGAAR_INSTALL_PATH/lib/javaiopatch.jar" .. java $BOOTCLASSPATH ..
For details see the example "run" script included in the release.
Custom "OptionsFactory" classes should minimize imports to avoid VM loading errors. For example, a call to "new
HashSet? ()" will fail if
HashSet? is profiled, due to a stack overflow caused by the circular reference. Similarily, if all of "java.lang" will be profiled, then Strings should be carefully handled to avoid string allocations, including any calls to "System.out"!
The class modifier can also be used within a
ClassLoader? at class definition time. The "SelfProfiler" defines a standard BCEL
ClassLoader? method:
public JavaClass modifyClass(JavaClass clazz) {..} See BCEL's "org.apache.bcel.util.ClassLoader" for details. This approach would avoid the "transform_jar" script but would add runtime overhead.
ACME Usage
Here are usage notes specific to ACME, which is Cougaar's automation scripting environment (soon to be open-sourced). This is primarily of interest to
Cougaar and
Ultra*Log developers.
Download the
latest profiler install and unzip it into your $CIP. It'll overwrite:
$CIP/build/data/default.build
to specify the compile dependencies of UL jars (e.g. "ar_mic.jar").
To verify the basic install, run the "
$CIP/run" script. It should generate output similar to "
$CIP/run.log". After this test, delete the temporary files:
rm -rf
$CIP/tmp
rm
$CIP/lib/javautilpatch*
Also delete the example jar and config, which you won't need for testing:
rm
$CIP/lib/profiler_examples.jar
rm
$CIP/lib/profiler_config.jar
First you must create an "OptionsFactory" Java class to control the level of profiling. An example is
shown above named "foo.MyConfig". A modified zip containing the source, jar, and compile script can be found
here. This implementation will enable minimal profiling for all classes:
Track 1% of instance allocations at minimal detail
You may want to modify the "profiler_config/src/foo/MyConfig.java" to profile more/less classes or capture more/less detail; see the above
usage section for details.
Next you must transform the jars that you will profile. You can transform all the jars in sys and lib using:
#!/bin/sh CONFIG="--config=foo.MyConfig" cd $CIP || exit -1 # profile everything! This may take a while... # # The following errors are expected: # icu4j: two class files are corrupt # jena: missing jars (sesame, antlr, sleepycat) # juddi: missing jars (axis) # log4j: missing jars (jms, jmx) # openmap: missing jars (j3d) # tomcat_40: missing jars (jmx) # vishnuServer: missing jars (vishnu, util) # These errors and related NPEs can be ignored, since the classes # that depend upon these missing jars will never be loaded at # runtime. JAVA_PKGS="applet awt beans io lang math net nio rmi security sql text util" for x in $JAVA_PKGS; do ./transform_jar $CONFIG java.$x; done for x in $CIP/lib/*.jar; do ./transform_jar $CONFIG $x; done for x in $CIP/sys/*.jar; do ./transform_jar $CONFIG $x; done
This may take up to an hour on a slow machine (~500mhz). This is a one-time step that is only required when/if you upgrade your jars. After the transform you can delete the temporary files in:
rm -rf $CIP/tmp
In a profiling session you'll only need to modify the "profiler_config" source and recompile the tiny "
$CIP/lib/profiler_config.jar" to change the profiling level.
At this point your
$CIP/lib should look something like:
aggagent.jar -> /mnt/shared/socC/lib/aggagent.jar.new
aggagent.jar.new
aggagent.jar.old
...
The "*.jar.new" contains the instrumented classes and is signed by the operator "sign" script.
Next you must modify your:
$CIP/csmart/config/rules/isat/tic_env.rule
by commenting out line 55 (the Xbootclaspath). This allows ACME to use the profiler's VM "-Xbootclasspath" rule:
$CIP/csmart/config/rules/isat/bootclasspath.rule
I'll submit an ISAT bug to split this create the basic "bootclasspath.rule" in the CSMART module, so in the future you won't need to modify your "tic_env.rule".
You should also modify the above "bootclasspath.rule" to uncomment all the jars (e.g. uncomment the "javarmipatch", etc).
Lastly, you can optionally enable the "wget" action:
$CIP/csmart/acme_scripting/src/lib/cougaar/wget.rb
by modifying the very end of:
$CIP/csmart/acme_scripting/src/lib/cougaar/scripting.rb
to append:
require 'cougaar/wget'
Then add "wget" actions in your "MOAS" script, e.g. just before stage-2 add:
do_action 'WGetAllNodes', '/profiler', 'profiler_beforeStage2'
to capture "/profiler" servlet snapshots to the archives. I'll submit an ISAT bug to move this action into the CSMART module.
The config is now ready to run. You can run any "MOAS" script and it'll profile the classes as defined in your "foo.MyConfig" class, which in this case will track 1% of the instance allocations. This is fine for the first run, since it'll provide a rough idea of where to focus.
You should now have "/profiler" HTML snapshots to analyze. See the above
snapshots and notes for details on the "/profiler" servlet output and example analysis. To enable offline javascript-based column sorting in the snapshots you must save
this link as "tablesort.js" and run the following script on "*.html":
#!/bin/sh # replace second line: # <script LANGUAGE="JavaScript" src="/$~/profiler?action=script"></script> # with: # <script LANGUAGE="JavaScript" src="tablesort.js"></script> for x in $*; do cat <<"EOF" | ed $x 2c <script LANGUAGE="JavaScript" src="tablesort.js"></script> . w EOF done
Based upon your analysis, you may choose to:
- Modify your "foo.MyConfig" to increase/decrease the profiling level (e.g. enable allocation stacktraces for specific classes, track more than 1% of specific classes, etc). You'll need to modify the "profiler_config/src/foo/MyConfig.java" source, run the "compile_config" script, and run the society again.
- Fix your code to correct a memory leak, create a new jar, run "./transform_jar" on your jar, and run again.
- Stop profiling...
Design Details
The bytecode modification is done by the
SelfProfiler class.
The basic idea is to transform a class like:
public class Bar extends Foo { public Bar() { System.out.println("Hello, world!"); } } into:
import foo.MyConfig; // your config class import org.cougaar.profiler.MemoryTracker; public class Bar extends Foo { public Bar() { System.out.println("Hello, world!"); MY_PROFILER.add(this); } private static final MemoryTracker MY_PROFILER = MemoryTracker.getInstance( "Bar", // classname 8, // object header + fields size MyConfig.getOptions("the_module", "Bar")); }
The full transform is more complex, since it must call call the profiler in all constructors, "clone()", and "readObject()". The class must also be tagged with the original serialVersionUID to preserve serialization compatibility.
The "run()" method outlines the basic steps:
private void run() { // make sure the serialVersionUID field exists, so we don't // break serialization addSerialVersionUID(); if (ENABLE_SIZE_AND_CAPACITY) { // add "$get_size()" and "$get_capacity()" methods defineSize(); defineCapacity(); } // add static profiler field defineProfilerField(); // init profiler field in class init method initProfilerField(); // override "profile_<super>" method to disable super's profiler disableSuperProfiler(); // add our "profile_<class>" method addClassProfiler(); // update constructors with "profile_<class>" calls // (excluding constructors that call "this(..)") // // note that the default constructor always exists callProfilerInConstructors(); // catch "readObject", which is a hidden constructor. // // we can ignore "readExternal" since it calls the no-arg // constructor. callProfilerInReadObject(); // handle "clone", which is yet another hidden constructor. callProfilerInClone(); }
The note that "the default constructor always exists" refers to javac's creation of a default constructor if one is not specified, e.g.:
class Test {} Javac will create a "public Test() { super(); }" constructor. The VM requires at least one constructor for any class that will be instantiated, so the bytecode instrumentor doesn't need to add constructors.
"clone" creates an object through "super.clone()", which doesn't call the constructor. The result of "super.clone()" is tracked. If clone doesn't call "super.clone()" then it presumable returns either an existing object or uses a constructor to create a new instance.
"readObject" creates the object without calling the constructor. Here's an example test-case:
import java.io.*; class Test implements Serializable { public static void main(String[] args) throws Exception { Test t1 = new Test(); System.out.println("t1: "+t1); Test t2 = (Test) testSerial(t1); System.out.println("t2: "+t2); } public Test() { System.out.println("constructor"); } private void readObject(ObjectInputStream s) throws Exception { System.out.println("readObject"); s.defaultReadObject(); } private static Object testSerial(Object o) throws Exception { ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream os = new ObjectOutputStream(bos); os.writeObject(o); os.flush(); o = null; byte[] b = bos.toByteArray(); ByteArrayInputStream bis = new ByteArrayInputStream(b); ObjectInputStream is = new ObjectInputStream(bis); Object newO = is.readObject(); return newO; } } It generates output similar to:
constructor t1: Main@1a16869 readObject t2: Main@19efb05
ObjectInputStream? uses
ObjectStreamClass? "newInstance()", which uses:
import sun.reflect.ReflectionFactory; private static final ReflectionFactory reflFactory = ..; .. Constructor cons = reflFactory.newConstructorForSerialization(cl, cons); .. return cons.newInstance(..);
From there it's muddy, since we're in Sun's private code, but the end result is that "readObject" never calls the constructor.
Externalizable will call the zero-arg constructor. If there's no zero-arg constructor then it'll throw a runtime
InvalidClassException? . Note that non-static inner classes always pass the outer class as an implicit first parameter, so they can't be Externalizable.
The profiler can't track primitive arrays since they're not created through a class constructor. A possible workaround is to track arrays created by a profiled class by tracking the result of "newarray" bytecodes. This is different than the profiler's normal operation, since arrays constructed by non-profiled classes won't be tracked, as opposed to the current behavior where profiled classes are tracked regardless of which code constructed them. A synthetic "java.lang.Array" class would be ideal but would require VM changes.
A possible enhancement is to create a transform that tracks "new" bytecodes to find allocations
by a given class, e.g. track all objects allocated by Xerces classes.
CVS
Anonymous CVS Access
This project's CVS repository can be checked out through anonymous (pserver) CVS with the following instruction set. The module you wish to check out must be specified as the
modulename. When prompted for a password for
anonymous, simply press the Enter key.
cvs -d :pserver:anonymous@cougaar.org:/cvsroot/profiler login
cvs -d :pserver:anonymous@cougaar.org:/cvsroot/profiler checkout
modulename
Developer CVS Access via SSH
Only project developers can access the CVS tree via this method. SSH must be installed on your client machine. Substitute
modulename and
developername with the proper values. Enter your site password when prompted.
export CVS_RSH=ssh
cvs -d :ext:
developername@cougaar.org:/cvsroot/profiler checkout
modulename
Browse the CVS Tree
Browsing the CVS tree gives you a great view into the current status of this project's code. You may also view the complete histories of any file in the repository.
[
Browse CVS Repository]