wp recovery

Todd Wright twright at bbn.com
Tue Jun 24 12:54:37 EDT 2008


Morgan Duchène wrote:
> Hi all,
> 
> Great you are back, we were missing you!
> 
> I am making small demos of the different features of Cougaar. Here I am
> playing around with the white pages and its replication feature:
> - Node 1 : PingSender
> - Node 2 : PingReceiver
> - Node 3 : NameServer1
> - Node 4 : NameServer2
> - Node 5 : NameServer3
> 
> NameServer 1, 2 and 3 are configured to replicate their content.
> 
> I start every node, everything works fine (/wp).
> 
> Then I stop one name server. First there are some caught exceptions and
> messages, then only lines like:
> 2008-06-19 20:08:39,968 INFO  [DestinationQueueImpl] - Node3: No valid 
> links to NameServer2
> 
> Then I restart the stopped name server, but the white pages do not seem
> to recover from there. The name server agents seem to communicate with
> each other, but there still seems to be a problem communicating with the
> node: the same INFO message keeps appearing, and the white pages servlet
> confirms that it has not recovered. Everything returns to normal if I
> restart the other two name servers.
> 
> Is there some configuration I missed? I don't see why the agents don't
> recover from this stop and restart, which can happen in the real world.
> (My configuration is below.)

This might be a problem with restarting a node under a name that an earlier
node already used.

Instead of restarting the node that hosted NameServer2 (Node4 in your
config), try starting a new node with NameServer2, e.g.:
    <node name="Node6">
      <agent name="NameServer2">
        <component class='org.cougaar.core.wp.server.Server'/>
      </agent>
    </node>
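
If it helps, here is a minimal sketch of that fragment as a standalone
society file. The layout mirrors the society config quoted below. The node
name "Node6" is only a placeholder for any name not yet used in the society,
and keeping the node in a separate file is an assumption, not a requirement.

```xml
<?xml version='1.0'?>

<!--
Hypothetical replacement node for the stopped name server.
"Node6" is a placeholder: any node name not yet used in the society.
-->
<society>

  <node name="Node6">

    <!-- Same agent name and component as the original NameServer2 -->
    <agent name="NameServer2">
      <component class='org.cougaar.core.wp.server.Server'/>
    </agent>
  </node>

</society>
```

Only the node name changes here; the agent name stays NameServer2, which is
the name the runtime config refers to.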

> 
> This also brings me to a question about the white pages. In a
> very big society with millions of nodes, 

At that scale you should probably use DNS or LDAP.

The local-only white pages implementation's "submit" method is only about 100 
lines:
http://cougaar.org/cgi-bin/viewcvs.cgi/core/src/org/cougaar/core/wp/LoopbackWhitePages.java?annotate=1.2&cvsroot=core

So, you could write your own implementation and modify
   configs/common/NodeAgent.xsl
to load it instead of the standard Cougaar implementation.

> how would the white pages behave and isn't it a single point of failure with
> the default implementation?

No, it should work so long as any white pages server is running.  However, the 
performance will be O(N^2) for N servers.  For more info see:
   http://cougaar.org/docman/view.php/17/176/Scalable_Naming_KIMAS_2005.ppt
   http://cougaar.org/docman/view.php/17/175/scalability_paper_kimas_05.pdf
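
To get a feel for the O(N^2) figure, here is a rough worked example. It
assumes, purely for illustration, that the cost is dominated by every server
exchanging replication traffic with every other server; the slides and paper
linked above describe the actual protocol.

```latex
\[
  \text{server pairs}(N) = \frac{N(N-1)}{2} = O(N^2),
  \qquad
  N = 3 \Rightarrow 3,\quad
  N = 10 \Rightarrow 45,\quad
  N = 100 \Rightarrow 4950 .
\]
```

Under that assumption, going from 3 to 100 servers multiplies the number of
server pairs by 1650, which is why a handful of replicas is cheap but a large
server set is not.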

Todd

> 
> Thanks,
> Morgan
> 
> 
> 
> 
> 
> Logging props:
> log4j.rootCategory=WARN,A1
> log4j.appender.A1.layout=org.apache.log4j.PatternLayout
> log4j.appender.A1.layout.ConversionPattern=%d{ISO8601} %-5p [%c{1}] - %m%n
> log4j.appender.A1=org.apache.log4j.ConsoleAppender
> 
> log4j.category.org.cougaar.mts=INFO
> 
> 
> Runtime config file:
> <?xml version='1.0'?>
> 
> <!--
> Two-node "ping" runtime System Properties.
> -->
> <runtime>
> 
>   <!--
> NameServers
>   -->
>   <vm_parameter name="-Dorg.cougaar.name.server" 
> value="NameServer1@localhost:8888"/>
>   <vm_parameter name="-Dorg.cougaar.name.server.WP-2" 
> value="NameServer2@localhost:8888"/>
>   <vm_parameter name="-Dorg.cougaar.name.server.WP-3" 
> value="NameServer3@localhost:8888"/>
>  
>   <vm_parameter
>     name="-Dorg.cougaar.society.xsl.param.template"
>     value="lan"/>
> 
>   <!-- Optional tuning to reduce naming service startup time -->
>   <vm_parameter name="-Dorg.cougaar.core.wp.server.successTTD" 
> value="30000"/>
>   <vm_parameter name="-Dorg.cougaar.core.wp.server.failTTD" value="1000"/>
>   <vm_parameter name="-Dorg.cougaar.core.wp.resolver.rmi.minLookup" 
> value="500"/>
>   <vm_parameter name="-Dorg.cougaar.core.wp.resolver.rmi.maxLookup" 
> value="2000"/>
>   <vm_parameter name="-Dorg.cougaar.core.mts.destq.retry.initialTimeout" 
> value="250"/>
>   <vm_parameter name="-Dorg.cougaar.core.mts.destq.retry.maxTimeout" 
> value="500"/>
> 
>   <!-- Optional log4j config file -->
>   <vm_parameter
>      name="-Dorg.cougaar.core.logging.config.filename"
>      value="logging.props"/>
> </runtime>
> 
> 
> Society config file:
> <?xml version='1.0'?>
> 
> <!--
> Two-node "ping" society definition.
> -->
> <society>
> 
>   <node name="Node1">
> 
>     <agent name="A">
>       <component class="org.cougaar.demo.wp.PingSender">
>         <argument name="target" value="B"/>
>       </component>
>       <component class="org.cougaar.demo.wp.PingServlet">
>         <argument name="path" value="/ping"/>
>       </component>
>     </agent>
>   </node>
> 
>  
>   <node name="Node2">
>    
>     <agent name="B">
>       <component class="org.cougaar.demo.wp.PingReceiver"/>
>     </agent>
>   </node>
> 
>   <!-- First wp -->
>   <node name="Node3">
> 
>     <!-- Agent "NameServer" will be our society-wide naming service -->
>     <agent name="NameServer1">
>       <component class='org.cougaar.core.wp.server.Server'/>
>     </agent>
>   </node>
> 
>   <!-- Another wp -->
>   <node name="Node4">
> 
>     <!-- Agent "NameServer" will be our society-wide naming service -->
>     <agent name="NameServer2">
>       <component class='org.cougaar.core.wp.server.Server'/>
>     </agent>
>   </node>
> 
>   <!-- Another wp -->
>   <node name="Node5">
> 
>     <!-- Agent "NameServer" will be our society-wide naming service -->
>     <agent name="NameServer3">
>       <component class='org.cougaar.core.wp.server.Server'/>
>     </agent>
>   </node>
>  
> </society>
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Cougaar-developers mailing list
> Cougaar-developers at cougaar.org
> http://cougaar.org/mailman/listinfo/cougaar-developers


