Errata for Cougaar 9.4.1
The following infos/problems/workarounds were detected/posted after Cougaar 9.4.1 was released.
Issues:
New Bugs:
Persistence/Rehydration bugs:
The following bugs were observed while persistence was enabled and a supply point agent (47-FSB) was killed and then restarted. Although the behavior of all agents was not characterized, we expect most "supply thread" agents to exhibit similar behavior as they contain a similar complement of plugins. Upon rehydration the following known bugs occur:
- Bug #1847 - After persistance, no servlet response (fixed in 9.4.1.1)
- Bug #1850 - During rehydration - Illegal rescind of a task that
is connected to a workflow (fixed in 9.4.1.1)
- Bug #1938 - Upon 47-FSB rehydration, negative days_spanned ERROR
at 123-MSB (fixed in 9.4.1.1)
These bugs occured after quiescence, but 1847 and 1850 were also observed while restarting in the midst of processing.
Restart reconciliation bug:
This bug occurs if an agent runs with persistence, is killed, and then restarted and rehydrated with its prior state. The restarted agent only reconciles its blackboard with the agents that exist at the restart time, and fails to reconcile with agents that are created afterwards.
This is bug #1936, which will be fixed in the next release (fixed in 9.4.1.1).
Windows AppServer kill doesn't automatically kill child processes:
Bug #1662 is due to a Windows JVM change from JDK 1.3 to 1.4. If the user "CTRL-C"s an AppServer that is currently running nodes, these child nodes are not automatically killed by the dying AppServer JVM process.
Before killing an AppServer, Windows users should be careful to either make sure that CSMART has killed all the nodes on that AppServer. The Windows TaskManager can also be used to check for running "zombie" nodes and kill them.
CSMART Performance issues and runtime bugs:
CSMART Experiment saving turns out to be quite slow, particularly when
many recipes are included. We are actively addressing this on several
fronts. In addition, a number of issues have been identified, in
particular with the new Component Collection recipes.
- Bug 2002: It is currently very slow to save Experiments in
CSMART. We expect multiple changes to address this in the very near
future. In post 9.4.1.1 releases, these include:
- We have changed the way experiment recipes are saved. If you
have a recipe target query that looks at Agent relationships,
Property Groups, or Plugins and their arguments, you must now edit
your CSMART startup script to set a new -D argument to allow these
recipes to find their targets correctly. The change is to not save
to the database the changes that each recipe makes until all
recipes have been applied. This is a significant performance
improvement, and will not impede most users. Those users who
require more complex recipe target queries however must disable
this imrovement.
- We have added a time that logs at SHOUT level when starting to
save an experiment, and then the total number of seconds taken when
the save is complete. (bug 2015).
- We have made multiple changes to the internals of how CSMART
traverses and manipulates data structures. These changes should
have no visible impact for users, beyond a speed improvement. See
the release notes for details.
- The CSMART 1AD society definitions have been slightly
modified. This change requires re-running
csmart/data/database/scripts/mysql/load_1ad_mysql.[sh/bat] (bug
1988)
- Component Collection recipes failed on save with a "Duplicate
Entry" warning. (Bug 1977) The same warning appeared occasionally
when you changed the name of an agent in a Complete Agent
recipe. (Bug 2031). These and related errors (due to bug 1810) have
been fixed.
- Copied Component Collection recipes lost some parameters (bug 2039).
- Renaming Component Collection recipes lost some components (bug 2040).
- Delete Complete Agent or Component Collection recipes failed to
completely delete the recipe in some cases. This resulted in errors
when later importing recipes. (Bug 2032, 2025). Recipes are now
fully deleted.
- You cannot however import the same recipe twice into a database in order
to correct its definition. You should instead delete the recipe
(through CSMART) from your database before re-importing it. (Bug
2004)
- You may not name an Agent in a Complete Agent or Component
Collection recipe the same as the recipe name. You are now forbidden
from doing so. (Bug 2021)
- If you try to add more than 11 parameters to a Component, the
additional parameter values simply over-write the 11th
parameter. Again, this is fixed in post-9.4.1.1 builds. (bug
2009)
- In rare circumstances, Null Pointer exceptions were possible
when loading recipes from the database, which have been
addressed. (bug 2000).
- Parsing of the HomeLocation in the MilitaryOrgPG was failing for
certain values. This too has been addressed (bug 2012).
- Avoid some additional very rary Null Pointer Exceptions when
your ApplicationServer dies unexpectedly. (bug 2008)
- The CSMART startup script csmart/bin/CSMART erroneously ran
CSMART through jdb. (bug 2022).
- Fixed a small typo in a comment in dumped INI files. (bug
1958)
- Loading communities from old-style csv files would fail when an
Assembly ID was provided. The scripts have been corrected. (bug
1995)
Various CSMART Usability issues:
A number of relatively minor CSMART usability issues were discovered. Unless
otherwise indicated, these will be fixed in the next patch release. For details
on these bugs, see the CSMART Release Notes for that release.
- Bug 1944: The Message Transport Aspects should now be loaded through recipes, and not by Global Command Line Arguments. Although this -D property is not harmful, it is no longer the preferred mechanism. It should be removed from Experiment's Global Command Line Arguments, and from your AppServer server.props file.
- Bug 1885: If you set a property to null in the Configuration Builder, then change it to non-null, you get a NullPointerException.
- Bug 1883: You should not be able to delete the AssetData folder on an Agent,
since there is no way to add it back.
- Bug 1902: If you specify a bad community.xml file to load, you may get an NPE.
- Bugs 1882, 1890, 1894, 1900, 1902, 1895, 1899: Trim white space off of agent name, node names, host names, community names, and command line arguments.
- Bug 1905: Do not allow empty community Attibute IDs.
- Bugs 1888, 1916, 1906, 1899, 1895, 1893, 1894, 1926, 1928, 1930, 1912, 1923, 1928: Treat closing a dialog box like hitting Cancel.
- Bugs 1894, 1895, 1899, 1906, 1916: When asking for a file to read, complain about un-readable or nonexistent file.
- Bugs 1900, 1882, 1890, 1894: Treat an empty agent, host, node, community, or file name like the user entered nothing.
- Bugs 1880, 1913: Clean up some help files.
- Bugs 1916, 1919, 1918, 1907, 1881, 1896: Put reasonable titles on dialogs.
- Bug 1904: Correct an erroneous error-check when editing communities.
- Bug 1874: The table of -D arguments in the Console's Node Info screen should not be editable.
- Bugs 1925, 1910: Fix some help menu items.
- Bugs 1921, 1922: Limit some debug output in the Society Monitor.
- Bug 1919: The Society Monitor Save and Save As do not work consistently.
- Bug 1878, 1914: Society Monitor Experiment or URL to monitor dialogs need cleanup.
- Bugs 1888, 1892: If an experiment has no OrgGroups (none currently do), do not show the OrgGroups section of the Thread dialog.
- Bugs 1912, 1923, 1928: Provide defaults in input dialogs, and error messages on bad input.
RSS thread deadlock issue
A thread deadlock problem in the RSS was recently detected and
corrected. See Bug 2029. The fix is checked in as part of the
quoSumo.jar, in jars/lib. You must get the newest version of
this jar to prevent this deadlock from locking up the threads of
the MetricService. The command
jar tf quoSumo.jar | grep VERSION
should print VERSION_3-0-10_02-08-13_15-49-22 (or later).
Following bugs in 9.4.1 were fixed in 9.4.1.1:
- Bug 1803: Query/DetailRequest can result in cross agent references to the same object. This primarily affected the propagation of OrgActivity changes.
- Bug 1847: After persist, no servlet response
- Bug 1850: During rehydration - Illegal rescind of unconnected task
- Bug 1874: Make -D args not editable
- Bug 1875: (see bugzilla)
- Bug 1878,1914: Society monitor needs cleanup
- Bug 1879: (see bugzilla)
- Bug 1880,1913: helpfile cleanup
- Bug 1882,1890,1894,1900,1902,1895,1899: whitespace problems
- Bug 1883: deletable AssetData folder in an Agent
- Bug 1885: null property in config builder
- Bug 1888,1916,1906,1899,1895,1893,1894,1926,1928,1930,1912,1923: close dialog should imply cancel
- Bug 1892: show orggroups only when meaningful
- Bug 1901: (see bugzilla)
- Bug 1902: npe in bad community.xml
- Bug 1904: Correct bad error check
- Bug 1905: disallow empty community attribute ids
- Bug 1909: (see bugzilla)
- Bug 1916: complain about unreadable/nonexistant files
- Bug 1917: (see bugzilla)
- Bug 1919,1918,1907,1881,1896: reasonable dialog titles
- Bug 1921,1922: Limit debug output in some circumstances
- Bug 1925,1910: fix help menu items
- Bug 1929: (see bugzilla)
- Bug 1936: (see bugzilla)
- Bug 1938: negative days_spanned error
- Bug 1944: MTS aspects
- Bug 1945: (see bugzilla)
- Bug 1955: (see bugzilla)
- Bug 1964: (see bugzilla)
- Bug 1969: (see bugzilla)