Monday, February 20, 2012

Member specified cluster name which did not match name of running cluster

Problem

We have a Cache Server started up on Oracle WebLogic Server 11. When using the Sample Cache Client Application (query.sh) and trying to create a cache:
CohQL> create cache "products"
We get the following error:
2012-02-20 17:42:42.115/9.052 Oracle Coherence GE 3.7.1.0 <Info> (thread=Main Thread, member=n/a): Loaded cache configuration from "jar:file:/u01/app/oracle/Middleware/coherence_3.7/lib/coherence.jar!/coherence-cache-config.xml"
2012-02-20 17:42:44.257/11.195 Oracle Coherence GE 3.7.1.0 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.97.111:8088 using SystemSocketProvider
2012-02-20 17:42:45.031/11.968 Oracle Coherence GE 3.7.1.0 <Info> (thread=Cluster, member=n/a): Failed to satisfy the variance: allowed=16, actual=52
2012-02-20 17:42:45.032/11.969 Oracle Coherence GE 3.7.1.0 <Info> (thread=Cluster, member=n/a): Increasing allowable variance to 20
2012-02-20 17:42:45.282/12.219 Oracle Coherence GE 3.7.1.0 <Error> (thread=Cluster, member=n/a): This member could not join the cluster because of a configuration mismatch between this member and the configuration being used by the rest of the cluster. This member specified a cluster name of "coh_cluster" which did not match the name of the running cluster. This indicates that there are multiple clusters on this network attempting to use overlapping network configurations. Rejected by Member(Id=1, Timestamp=2012-02-20 17:37:41.584, Address=192.168.97.111:8888, MachineId=16555, Location=site:,machine:soahost1,process:29313,member:coh_server1, Role=WeblogicWeblogicCacheServer).
2012-02-20 17:42:45.301/12.238 Oracle Coherence GE 3.7.1.0 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster
2012-02-20 17:42:45.356/12.293 Oracle Coherence GE 3.7.1.0 <Error> (thread=Main Thread, member=n/a): Error while starting cluster: java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_JOINING)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
        at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
        at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:56)
        at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)
        at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:3)
        at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
        at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:26)
        at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:2)
        at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:427)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureServiceInternal(DefaultConfigurableCacheFactory.java:968)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureService(DefaultConfigurableCacheFactory.java:937)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:919)
        at com.tangosol.net.DefaultConfigurableCacheFactory.configureCache(DefaultConfigurableCacheFactory.java:1296)
        at com.tangosol.net.DefaultConfigurableCacheFactory.ensureCache(DefaultConfigurableCacheFactory.java:297)
        at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:204)
        at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:181)
        at com.tangosol.coherence.dslquery.CoherenceQuery.execute(CoherenceQuery.java:574)
        at com.tangosol.coherence.dslquery.QueryPlus.query(QueryPlus.java:199)
        at com.tangosol.coherence.dslquery.QueryPlus.evalLine(QueryPlus.java:539)
        at com.tangosol.coherence.dslquery.QueryPlus.jlineREPL(QueryPlus.java:103)
        at com.tangosol.coherence.dslquery.QueryPlus.main(QueryPlus.java:960)
From the logs above, the address (192.168.97.111:8888), machine (soahost1), and member (coh_server1) settings appear to be correct. The cluster name coh_cluster also appears to be correct, because that is the name of the Coherence cluster shown on the WebLogic Administration Console.

It is also exactly the cluster name set by -Dtangosol.coherence.cluster=coh_cluster in JAVA_OPTS in the query.sh script.
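To double-check which cluster name the client is really passing, the value can be pulled straight out of the launcher script. This is only a minimal sketch: the function name client_cluster_name is hypothetical (it is not part of Coherence), and it assumes the option appears on a single line of the script, as it does in JAVA_OPTS here.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Coherence): print the cluster name
# that a launcher script passes via -Dtangosol.coherence.cluster=...
client_cluster_name() {
    # $1: path to the launcher script, e.g. query.sh
    # The "=" right after "cluster" keeps tangosol.coherence.clusterport
    # from matching; the value ends at the next space or double quote.
    sed -n 's/.*-Dtangosol\.coherence\.cluster=\([^ "]*\).*/\1/p' "$1" | head -n 1
}
```

Calling client_cluster_name with the path to query.sh prints the name the client will try to join, which can then be compared with the name the cache server reports in its own log.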


Solution

For some odd reason, the cluster name shown on the WebLogic Administration Console is not the name the Cache Server actually starts with. The client has to use the name and port that the running server reports in its log.

1. Open the Coherence Server log file: /u01/app/oracle/Middleware/user_projects/domains/soa_domain/servers_coherence/coh_server1/logs/coh_server1.out

2. Get the cluster name and port, which are shown when the Coherence server starts up:
Oracle Coherence Version 3.7.1.0 Build 27797
 Grid Edition: Development mode
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

2012-02-20 18:08:02.935/7.473 Oracle Coherence GE 3.7.1.0 <D4> (thread=Main Thread, member=n/a): TCMP bound to /192.168.97.111:8888 using SystemSocketProvider
2012-02-20 18:08:07.379/11.918 Oracle Coherence GE 3.7.1.0 <Info> (thread=Cluster, member=n/a): Created a new cluster "cluster:0x75CB" with Member(Id=1, Timestamp=2012-02-20 18:08:03.344, Address=192.168.97.111:8888, MachineId=16555, Location=site:,machine:soahost1,process:31409,member:coh_server1, Role=WeblogicWeblogicCacheServer, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=2) UID=0xC0A8616F000001359D05959040AB22B8
2012-02-20 18:08:07.435/11.973 Oracle Coherence GE 3.7.1.0 <Info> (thread=Main Thread, member=n/a): Started cluster Name=cluster:0x75CB

Group{Address=231.1.1.1, Port=7777, TTL=4}
3. Edit your query.sh file, and ensure that the cluster name and cluster port match the values from the server log:
JAVA_OPTS="-Xms64m -Xmx64m -Dtangosol.coherence.distributed.localstorage=false -Dtangosol.coherence.cluster=cluster:0x75CB -Dtangosol.coherence.clusterport=7777"
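
The lookup in steps 1 and 2 can be scripted. The sketch below assumes the log contains a "Started cluster Name=..." line and a "Group{Address=..., Port=..., TTL=...}" line in the format shown above; the function names are hypothetical, and other Coherence versions may format these lines differently.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Coherence): read the running
# cluster's name and multicast port from a server log such as coh_server1.out.

running_cluster_name() {
    # $1: path to the server log
    # Matches lines like: ... Started cluster Name=cluster:0x75CB
    # tail -n 1 keeps the most recent startup if the log holds several.
    sed -n 's/.*Started cluster Name=\(.*\)$/\1/p' "$1" | tail -n 1
}

running_cluster_port() {
    # $1: path to the server log
    # Matches lines like: Group{Address=231.1.1.1, Port=7777, TTL=4}
    sed -n 's/.*Group{Address=[^,]*, Port=\([0-9][0-9]*\),.*/\1/p' "$1" | tail -n 1
}
```

The two values printed by these functions are the ones to place after -Dtangosol.coherence.cluster= and -Dtangosol.coherence.clusterport= in query.sh.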

Applicable Versions:
  • Oracle Coherence 3.7.1


Ahmed Aboulnaga

2 comments:

Todd Beets said...

I have experienced similar issues. You might find the following bug and patch useful for your issue:

ACTIVECACHE: CUSTOM OPERATIONAL CONFIG FILE NOT WORKING [Bug ID 13732966]

ACTIVECACHE: CUSTOM OPERATIONAL CONFIG FILE NOT WORKING [Patch ID 14932332]

Anonymous said...

I checked out the bug and patch and though I haven't applied the patch to confirm, it appears that you're right in that it is related to the issue at hand.

Thanks Todd!