wp recovery
Todd Wright
twright at bbn.com
Tue Jun 24 12:54:37 EDT 2008
Morgan Duchène wrote:
> Hi all,
>
> Great you are back, we were missing you!
>
> I am making small demos of the different features of Cougaar, here
> playing around with the white pages and its replication feature:
> - Node 1 : PingSender
> - Node 2 : PingReceiver
> - Node 3 : NameServer1
> - Node 4 : NameServer2
> - Node 5 : NameServer3
>
> NameServer 1, 2 and 3 are configured to replicate their content.
>
> I start every node, everything works fine (/wp).
>
> Then I stop one name server. First there are some caught exceptions and
> messages, then only lines like:
> 2008-06-19 20:08:39,968 INFO [DestinationQueueImpl] - Node3: No valid
> links to NameServer2
>
> Then I restart the stopped name server, but the white pages don't seem
> to recover well from there. The name server agents seem to communicate,
> but there still seems to be a problem communicating with the node: the
> same INFO message keeps appearing, and the white pages servlet confirms
> that it has not recovered. Everything returns to normal only if I
> restart the other two name servers.
>
> Is there any configuration or something else I missed? I don't see why
> the agents don't recover from this stop and restart, which can happen in
> the real world. (My configuration is below.)
This might be a problem with restarting a node under an old node's name.
Instead of restarting Node4, try starting a new node with NameServer2, e.g.:
<node name="Node6">
  <agent name="NameServer2">
    <component class='org.cougaar.core.wp.server.Server'/>
  </agent>
</node>
>
> On the other hand, it brings me to a question about white pages. In a
> very big society with millions of nodes,
At that scale you should probably use DNS or LDAP.
The local-only white pages implementation's "submit" method is only about 100
lines:
http://cougaar.org/cgi-bin/viewcvs.cgi/core/src/org/cougaar/core/wp/LoopbackWhitePages.java?annotate=1.2&cvsroot=core
So, you could write your own implementation and modify
configs/common/NodeAgent.xsl
to load it instead of the standard Cougaar implementation.
> how would the white pages behave and isn't it a single point of failure with
> the default implementation?
No, it should work so long as any white pages server is running. However, the
performance will be O(N^2) for N servers. For more info see:
http://cougaar.org/docman/view.php/17/176/Scalable_Naming_KIMAS_2005.ppt
http://cougaar.org/docman/view.php/17/175/scalability_paper_kimas_05.pdf
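
To give a feel for the O(N^2) growth, here is a rough sketch in Java. It assumes
a full-mesh scheme in which every server forwards updates to every other server;
that model, and the names ReplicationCost and directedLinks, are illustrative
assumptions and are not taken from the Cougaar code.

```java
public class ReplicationCost {

    // Assumption: each of the N servers replicates to each of the other N-1
    // servers, so the number of directed replication links is N * (N - 1).
    static long directedLinks(int servers) {
        if (servers < 0) {
            throw new IllegalArgumentException("servers must be >= 0");
        }
        return (long) servers * (servers - 1);
    }

    public static void main(String[] args) {
        // Print the link count for a few society sizes to show the growth.
        for (int n : new int[] {3, 10, 100}) {
            System.out.println(n + " servers -> " + directedLinks(n) + " replication links");
        }
    }
}
```

Under that assumption the three-server setup above needs only a handful of
links, but the count grows quadratically as servers are added, which is why a
very large society would likely want a different naming backend.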
Todd
>
> Thanks,
> Morgan
>
>
>
>
>
> Logging props:
> log4j.rootCategory=WARN,A1
> log4j.appender.A1.layout=org.apache.log4j.PatternLayout
> log4j.appender.A1.layout.ConversionPattern=%d{ISO8601} %-5p [%c{1}] - %m%n
> log4j.appender.A1=org.apache.log4j.ConsoleAppender
>
> log4j.category.org.cougaar.mts=INFO
>
>
> Runtime config file:
> <?xml version='1.0'?>
>
> <!--
> Two-node "ping" runtime System Properties.
> -->
> <runtime>
>
> <!--
> NameServers
> -->
> <vm_parameter name="-Dorg.cougaar.name.server"
> value="NameServer1 at localhost:8888"/>
> <vm_parameter name="-Dorg.cougaar.name.server.WP-2"
> value="NameServer2 at localhost:8888"/>
> <vm_parameter name="-Dorg.cougaar.name.server.WP-3"
> value="NameServer3 at localhost:8888"/>
>
> <vm_parameter
> name="-Dorg.cougaar.society.xsl.param.template"
> value="lan"/>
>
> <!-- Optional tuning to reduce naming service startup time -->
> <vm_parameter name="-Dorg.cougaar.core.wp.server.successTTD"
> value="30000"/>
> <vm_parameter name="-Dorg.cougaar.core.wp.server.failTTD" value="1000"/>
> <vm_parameter name="-Dorg.cougaar.core.wp.resolver.rmi.minLookup"
> value="500"/>
> <vm_parameter name="-Dorg.cougaar.core.wp.resolver.rmi.maxLookup"
> value="2000"/>
> <vm_parameter name="-Dorg.cougaar.core.mts.destq.retry.initialTimeout"
> value="250"/>
> <vm_parameter name="-Dorg.cougaar.core.mts.destq.retry.maxTimeout"
> value="500"/>
>
> <!-- Optional log4j config file -->
> <vm_parameter
> name="-Dorg.cougaar.core.logging.config.filename"
> value="logging.props"/>
> </runtime>
>
>
> Society config file:
> <?xml version='1.0'?>
>
> <!--
> Two-node "ping" society definition.
> -->
> <society>
>
> <node name="Node1">
>
> <agent name="A">
> <component class="org.cougaar.demo.wp.PingSender">
> <argument name="target" value="B"/>
> </component>
> <component class="org.cougaar.demo.wp.PingServlet">
> <argument name="path" value="/ping"/>
> </component>
> </agent>
> </node>
>
>
> <node name="Node2">
>
> <agent name="B">
> <component class="org.cougaar.demo.wp.PingReceiver"/>
> </agent>
> </node>
>
> <!-- First wp -->
> <node name="Node3">
>
> <!-- Agent "NameServer" will be our society-wide naming service -->
> <agent name="NameServer1">
> <component class='org.cougaar.core.wp.server.Server'/>
> </agent>
> </node>
>
> <!-- Another wp -->
> <node name="Node4">
>
> <!-- Agent "NameServer" will be our society-wide naming service -->
> <agent name="NameServer2">
> <component class='org.cougaar.core.wp.server.Server'/>
> </agent>
> </node>
>
> <!-- Another wp -->
> <node name="Node5">
>
> <!-- Agent "NameServer" will be our society-wide naming service -->
> <agent name="NameServer3">
> <component class='org.cougaar.core.wp.server.Server'/>
> </agent>
> </node>
>
> </society>
>
>