Monday, February 20, 2012

Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_JOINING)

Problem

When starting the Coherence server via the WebLogic Server Administration Console, the server fails to start and is left in the FAILED_NOT_RESTARTABLE state.

Observing the AdminServer log ($DOMAIN_HOME/servers/AdminServer/logs/AdminServer.log) reveals the following:
####<Feb 20, 2012 6:06:26 PM EST> <Error> <NodeManager> <soahost1> <AdminServer> <[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <cb680017c6a0acfe:-1d3ef2f7:1359ce5a448:-8000-0000000000000105> <1329779186314> <BEA-300048> <Unable to start the server coh_server1 : Exception while starting server 'coh_server1'>
Observing the Coherence Server output file ($DOMAIN_HOME/servers_coherence/coh_server1/logs/coh_server1.out) reveals the following:
2012-02-20 18:06:20.223/2.992 Oracle Coherence 3.7.1.0 <Info> (thread=Main Thread, member=n/a): Loaded operational configuration from "jar:file:/u01/app/oracle/Middleware/coherence_3.7/lib/coherence.jar!/tangosol-coherence.xml"
2012-02-20 18:06:20.273/3.041 Oracle Coherence 3.7.1.0 <Info> (thread=Main Thread, member=n/a): Loaded operational overrides from "jar:file:/u01/app/oracle/Middleware/coherence_3.7/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2012-02-20 18:06:20.274/3.042 Oracle Coherence 3.7.1.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
2012-02-20 18:06:20.282/3.050 Oracle Coherence 3.7.1.0 <D5> (thread=Main Thread, member=n/a): Optional configuration override "/custom-mbeans.xml" is not specified

Oracle Coherence Version 3.7.1.0 Build 27797
 Grid Edition: Development mode
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

2012-02-20 18:06:24.606/7.374 Oracle Coherence GE 3.7.1.0 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.97.111:8888 using SystemSocketProvider
2012-02-20 18:06:25.237/8.005 Oracle Coherence GE 3.7.1.0 <Info> (thread=Cluster, member=n/a): Failed to satisfy the variance: allowed=16, actual=68
2012-02-20 18:06:25.238/8.006 Oracle Coherence GE 3.7.1.0 <Info> (thread=Cluster, member=n/a): Increasing allowable variance to 22
2012-02-20 18:06:25.455/8.223 Oracle Coherence GE 3.7.1.0 <Error> (thread=Cluster, member=n/a): This member could not join the cluster because of a configuration mismatch between this member and the configuration being used by the rest of the cluster. This member specified a cluster name of "cluster:0x75CB" which did not match the name of the running cluster. This indicates that there are multiple clusters on this network attempting to use overlapping network configurations. Rejected by Member(Id=1, Timestamp=2012-02-20 18:02:34.354, Address=192.168.97.111:8088, MachineId=16555, Location=site:,machine:soahost1,process:31076, Role=TangosolCoherenceQueryPlus).
2012-02-20 18:06:25.471/8.239 Oracle Coherence GE 3.7.1.0 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
2012-02-20 18:06:25.511/8.279 Oracle Coherence GE 3.7.1.0 <Error> (thread=Main Thread, member=n/a): Error while starting cluster: java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_JOINING)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
        at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
        at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
        at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
        at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
        at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
        at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
        at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:968)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:937)
        at com.tangosol.net.DefaultCacheServer.startServices(DefaultCacheServer.java:81)
        at com.tangosol.net.DefaultCacheServer.intialStartServices(DefaultCacheServer.java:250)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.intialStartServices(WeblogicCacheServer.java:84)
        at com.tangosol.net.DefaultCacheServer.startAndMonitor(DefaultCacheServer.java:55)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.startAndMonitor(WeblogicCacheServer.java:77)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.main(WeblogicCacheServer.java:70)

Exception in thread "Main Thread" java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_JOINING)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
        at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
        at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
        at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
        at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
        at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
        at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
        at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:968)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:937)
        at com.tangosol.net.DefaultCacheServer.startServices(DefaultCacheServer.java:81)
        at com.tangosol.net.DefaultCacheServer.intialStartServices(DefaultCacheServer.java:250)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.intialStartServices(WeblogicCacheServer.java:84)
        at com.tangosol.net.DefaultCacheServer.startAndMonitor(DefaultCacheServer.java:55)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.startAndMonitor(WeblogicCacheServer.java:77)
        at weblogic.nodemanager.server.provider.WeblogicCacheServer.main(WeblogicCacheServer.java:70)
<Feb 20, 2012 6:06:25 PM> <FINEST> <NodeManager> <Waiting for the process to die: 31311>
<Feb 20, 2012 6:06:25 PM> <INFO> <NodeManager> <Server failed during startup so will not be restarted>
<Feb 20, 2012 6:06:26 PM> <FINEST> <NodeManager> <runMonitor returned, setting finished=true and notifying waiters>

Solution

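The error shows the join was rejected by a member on the same host (soahost1) with Role=TangosolCoherenceQueryPlus. That was a Sample Cache Client Application (query.sh) session I had left open, which was already running a cluster under a different name on the same network configuration. Kill that session and the Coherence server will start up fine.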
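
If it is not obvious which process is in the way, the "Rejected by Member(...)" clause in the error line names it. The following is a minimal Python sketch, assuming the log format shown above for Coherence 3.7.1; the function name parse_rejecting_member and the "Location." key prefix are hypothetical helpers for illustration, not part of Coherence. The process value in Location appears to be the operating system process ID, but treat that as an assumption and confirm it with ps before killing anything.

```python
import re

# Matches the "Rejected by Member(...)" clause as it appears in the log above.
REJECTED_BY = re.compile(r"Rejected by Member\(([^)]*)\)")

# Tail of the error line from coh_server1.out, copied from the log above.
SAMPLE = (
    "Rejected by Member(Id=1, Timestamp=2012-02-20 18:02:34.354, "
    "Address=192.168.97.111:8088, MachineId=16555, "
    "Location=site:,machine:soahost1,process:31076, "
    "Role=TangosolCoherenceQueryPlus)."
)


def parse_rejecting_member(log_line):
    """Return the rejecting member's fields as a dict, or None if absent."""
    match = REJECTED_BY.search(log_line)
    if match is None:
        return None
    member = {}
    # Top-level fields are separated by ", " (comma plus space).
    for part in match.group(1).split(", "):
        key, _, value = part.partition("=")
        member[key] = value
    # Location is a nested list separated by bare commas, e.g.
    # "site:,machine:soahost1,process:31076".
    for item in member.get("Location", "").split(","):
        name, _, value = item.partition(":")
        if name:
            member["Location." + name] = value
    return member


if __name__ == "__main__":
    member = parse_rejecting_member(SAMPLE)
    print(member["Role"], member["Location.machine"], member["Location.process"])
```

With the role, machine, and process value in hand, the offending session can be located on that host and stopped.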


Applicable Versions:
  • Oracle Coherence 3.7.1


Ahmed Aboulnaga

1 comment:

Anonymous said...

Thanks for your help. I did ps -eaf | grep coherence and killed the process. It worked for me!