Friday, January 28, 2011

SOA Suite 11g PS3 (11.1.1.4) fixes a lot of issues!

If you are running Oracle SOA Suite 11g PS2 (11.1.1.3) in production, I salute you. The bad news is that 11.1.1.3 is quite buggy and unstable, and you will run into endless errors. The good news is that 11.1.1.4 resolves most of these issues.

If you haven't upgraded to 11.1.1.4, then do so immediately to take advantage of its benefits.

Below you will find a list of errors and exceptions you may encounter in 11.1.1.3 that are resolved in 11.1.1.4. Please note that you may still experience these issues on 11.1.1.4 for other reasons, so don't hesitate to engage Oracle Support when needed. But hopefully most of your issues will be resolved by this patchset, as they were for us.

Good luck!

________________________________________


Issue #1: No free JVM heap space and java.lang.OutOfMemoryError

This error is probably the root cause of most of the issues below. As you deploy more composites to the server, the free JVM heap space continues to shrink until you completely run out of memory.
<Jan 19, 2011 3:15:03 PM EST> <Warning> <RMI> <BEA-080003> <RuntimeException thrown by rmi server: javax.management.remote.rmi.RMIConnectionImpl.invoke(Ljavax.management.ObjectName;Ljava.lang.String;Ljava.rmi.MarshalledObject;[Ljava.lang.String;Ljavax.security.auth.Subject;)
 javax.management.RuntimeErrorException: java.lang.OutOfMemoryError: GC overhead limit exceeded.
javax.management.RuntimeErrorException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:858)

The WebLogic Administration Console Monitoring Dashboard shows the amount of free Java heap space gradually decreasing until it reaches zero.
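If you want to put a rough number on that decline, here is a minimal Python sketch. It assumes you have noted free-heap readings from the dashboard as (elapsed seconds, free MB) pairs; the function name and the sample values are hypothetical. It fits a least-squares line through the readings and estimates how long until free heap reaches zero, which only makes sense while the decline is roughly linear.

```python
from typing import Optional, Sequence, Tuple


def seconds_until_heap_exhausted(
    samples: Sequence[Tuple[float, float]],
) -> Optional[float]:
    """Estimate seconds (after the last sample) until free heap hits zero.

    samples: (elapsed_seconds, free_heap_mb) pairs, oldest first.
    Returns None when free heap is flat or growing (no exhaustion trend).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    # Ordinary least-squares slope of free heap over time (MB per second).
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    slope = cov / var
    if slope >= 0:
        return None
    intercept = mean_f - slope * mean_t
    # Time at which the fitted line crosses zero free heap.
    t_zero = -intercept / slope
    return max(0.0, t_zero - samples[-1][0])
```

For example, readings of 1000, 800 and 600 MB taken ten minutes apart, `seconds_until_heap_exhausted([(0, 1000), (600, 800), (1200, 600)])`, give a slope of about -0.33 MB/s and an estimate of roughly 1800 seconds (30 minutes) after the last reading.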

Issue #2: soa_server1 using 99% of the CPU

The soa_server1 Java process pegs the CPU constantly at 99%. This is due to continuous garbage collection, which shows up in the soa_server1.out output file (if GC logging is enabled).

In our case, the output of the "top" command showed the java process occupying 99.6% of CPU and 59% of memory.

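The logs show back-to-back full garbage collections that keep the CPU busy. Note that the old generation (PSOldGen) is still full after every collection (3145727K of 3145728K), so each 12 to 16 second full GC reclaims almost nothing: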
1837.993: [Full GC [PSYoungGen: 873856K->800795K(961216K)] [PSOldGen: 3145727K->3145727K(3145728K)] 4019583K->3946523K(4106944K) [PSPermGen: 395967K->395967K(398016K)], 12.8336760 secs] [Times: user=12.59 sys=0.00, real=12.84 secs]
1851.327: [Full GC [PSYoungGen: 873856K->809876K(961216K)] [PSOldGen: 3145727K->3145727K(3145728K)] 4019583K->3955604K(4106944K) [PSPermGen: 395986K->395986K(397504K)], 12.7042890 secs] [Times: user=12.59 sys=0.00, real=12.71 secs]
1864.476: [Full GC [PSYoungGen: 873856K->823074K(961216K)] [PSOldGen: 3145727K->3145727K(3145728K)] 4019583K->3968802K(4106944K) [PSPermGen: 396182K->396182K(397440K)], 12.7887330 secs] [Times: user=12.63 sys=0.00, real=12.79 secs]
1877.642: [Full GC [PSYoungGen: 873856K->783438K(961216K)] [PSOldGen: 3145727K->3145721K(3145728K)] 4019583K->3929159K(4106944K) [PSPermGen: 396351K->395549K(396864K)], 16.7096860 secs] [Times: user=16.57 sys=0.00, real=16.71 secs]
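To quantify what these lines show, here is a minimal Python sketch that parses full GC lines and reports old-generation occupancy and the share of time spent in full GC pauses. The regex is written for the Parallel collector format shown above (PSYoungGen/PSOldGen/PSPermGen); other collectors and JVM versions log differently and will not match. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Matches the Parallel collector "Full GC" lines shown above.
FULL_GC = re.compile(
    r"(?P<ts>\d+\.\d+): \[Full GC .*?"
    r"\[PSOldGen: (?P<old_before>\d+)K->(?P<old_after>\d+)K\((?P<old_cap>\d+)K\)\]"
    r".*?, (?P<pause>\d+\.\d+) secs\]"
)


@dataclass
class FullGC:
    timestamp: float      # seconds since JVM start
    old_before_kb: int    # old generation used before the collection
    old_after_kb: int     # old generation used after the collection
    old_capacity_kb: int  # old generation capacity
    pause_secs: float     # duration of the full GC pause

    @property
    def old_after_pct(self) -> float:
        """Old generation occupancy after the collection, in percent."""
        return 100.0 * self.old_after_kb / self.old_capacity_kb


def parse_full_gcs(lines: Iterable[str]) -> List[FullGC]:
    """Extract full GC events, silently skipping non-matching lines."""
    events = []
    for line in lines:
        m = FULL_GC.search(line)
        if m:
            events.append(FullGC(
                timestamp=float(m["ts"]),
                old_before_kb=int(m["old_before"]),
                old_after_kb=int(m["old_after"]),
                old_capacity_kb=int(m["old_cap"]),
                pause_secs=float(m["pause"]),
            ))
    return events


def gc_time_share(events: List[FullGC]) -> float:
    """Fraction of wall-clock time, from the first full GC starting to
    the last one ending, that was spent inside full GC pauses."""
    if not events:
        return 0.0
    start = events[0].timestamp
    end = events[-1].timestamp + events[-1].pause_secs
    return sum(e.pause_secs for e in events) / (end - start)
```

Run against the four lines above, every collection leaves the old generation more than 99.99% full, and full GC pauses cover roughly 97.7% of the elapsed time. For comparison, HotSpot's parallel collector throws java.lang.OutOfMemoryError: GC overhead limit exceeded (issue #1) when more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered.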

Issue #3: Composites in unknown status

When you bring up the "soa_server1" managed server, composites start loading. Once the server runs out of Java heap space, all composites go into an "unknown" status, particularly as the number of composites increases.

Issue #4: Unable to retrieve composite details

After logging in to the EM console, you may receive the following error when you click on a composite:

Unable to retrieve composite details.
The composite HelloWorld (1.0) is not available. This could happen because either the composite has been undeployed or soa-infra has not yet loaded this composite.

This error is usually caused by one of two issues:
    a. Issue #1 above.
    b. The use of concrete instead of abstract WSDLs when referring to other composites.
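To find which composites reference others through concrete WSDL URLs, a rough Python sketch like the one below can scan a composite.xml. It assumes that WSDL references appear in attributes named location or wsdlLocation (for instance on import elements), that concrete runtime WSDLs are http(s) URLs containing /soa-infra/services/, and that the oramds: prefix marks a shared MDS reference. Both helper names are hypothetical, and this is a text-level heuristic, not Oracle's own validation.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse


def classify_wsdl_location(location: str) -> str:
    """Classify a WSDL reference string (hypothetical helper).

    "concrete" : http(s) URL served by a running soa-infra, which only
                 resolves while the target composite is up
    "remote"   : any other http(s) URL
    "mds"      : an oramds: reference to a shared artifact
    "local"    : anything else, e.g. a WSDL packaged with the composite
    """
    loc = location.strip()
    if loc.lower().startswith("oramds:"):
        return "mds"
    parsed = urlparse(loc)
    if parsed.scheme in ("http", "https"):
        if "/soa-infra/services/" in parsed.path:
            return "concrete"
        return "remote"
    return "local"


def concrete_wsdl_references(composite_xml: str) -> List[str]:
    """Return every location/wsdlLocation attribute value that points
    at a concrete soa-infra WSDL URL (assumed attribute names)."""
    root = ET.fromstring(composite_xml)
    hits = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            # Strip any "{namespace}" prefix from the attribute name.
            local_name = name.rsplit("}", 1)[-1]
            if local_name in ("location", "wsdlLocation") and \
                    classify_wsdl_location(value) == "concrete":
                hits.append(value)
    return hits
```

Any hit is a composite that will only load cleanly if the composite it references is already up, so those references are the candidates for switching to an abstract WSDL.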

Issue #5: WSDL URL is invalid

Composites are inaccessible, and clicking the 'Test' button returns an HTTP 503 error:

Either the WSDL URL is invalid or the WSDL file is not valid or incorrect. - WSDLException: faultCode=OTHER_ERROR: Failed to read WSDL from http://oradev:8001/soa-infra/services/default/HelloWorldComposite/client?WSDL: HTTP connection error code is 503
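To check a composite's WSDL endpoint outside of EM (for example, from a script you run after startup), here is a minimal standard-library Python sketch; the function name is hypothetical. It just reports the HTTP status code, which lets you tell a 503 from soa-infra apart from a connection failure.

```python
import urllib.error
import urllib.request


def wsdl_http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET on a WSDL URL.

    urlopen raises HTTPError for 4xx/5xx responses; the status code is
    still available on the exception, so it is returned instead.
    Connection-level failures (refused, DNS, timeout) still raise.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For the composite above you would call `wsdl_http_status("http://oradev:8001/soa-infra/services/default/HelloWorldComposite/client?WSDL")` and expect 503 while the problem persists, and 200 once the composite is available.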

Issue #6: SOA-20003 due to an invalid WSDL when loading composites

When starting up the server, the soa_server1.out log shows the exception below while loading composites (indicating that the server is unable to register the service), and the composite ends up in an unknown state.
<Jan 11, 2011 6:45:59 PM EST> <Error> <oracle.integration.platform> <SOA-20003> <Unable to register service.

oracle.fabric.common.FabricException: oracle.j2ee.ws.wsdl.LocalizedWSDLException: WSDLException: faultCode=INVALID_WSDL: The document: http://oradev:8001/soa-infra/services/default/HelloWorld/HelloWorld.wsdl is not a wsdl file or does not have a root element of "definitions" in the "http://schemas.xmlsoap.org/wsdl/" namespace or the "http://www.w3.org/2004/08/wsdl" namespace.: WSDLException: faultCode=INVALID_WSDL: The document: http://oradev:8001/soa-infra/services/default/HelloWorld/HelloWorld.wsdl is not a wsdl file or does not have a root element of "definitions" in the "http://schemas.xmlsoap.org/wsdl/" namespace or the "http://www.w3.org/2004/08/wsdl" namespace.

    at oracle.fabric.composite.model.CompositeModel.loadImports(CompositeModel.java:272)
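The check described in that message is easy to reproduce if you want to see whether what the WSDL URL actually returns would pass it. Here is a minimal Python sketch that mirrors the message: the root element must be definitions in one of the two namespaces it names. The function name is hypothetical. Anything else, such as an HTML error page, fails the check.

```python
import xml.etree.ElementTree as ET

# The two namespaces named in the INVALID_WSDL message above.
WSDL_NAMESPACES = {
    "http://schemas.xmlsoap.org/wsdl/",  # WSDL 1.1
    "http://www.w3.org/2004/08/wsdl",    # namespace quoted in the message
}


def has_wsdl_root(document: str) -> bool:
    """True if the root element is 'definitions' in a WSDL namespace.

    Unparseable input and documents without a namespaced root element
    (an HTML error page, for instance) return False.
    """
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    if not root.tag.startswith("{"):
        return False
    # ElementTree encodes namespaced tags as "{namespace}localname".
    namespace, _, local_name = root.tag[1:].partition("}")
    return local_name == "definitions" and namespace in WSDL_NAMESPACES
```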

Issue #7: SOA-20003 and HTTP 503 service unavailable when loading composites

When starting up the server, the soa_server1.out log shows the exception below while loading composites (indicating that the server is unable to register the service), and the composite becomes unavailable.
<Jan 19, 2011 2:57:56 PM EST> <Error> <oracle.integration.platform> <SOA-20003> <Unable to register service.

oracle.fabric.common.FabricException: Error in getting XML input stream: http://oradev:8001/soa-infra/services/default/HelloWorld/HelloWorld?WSDL: Response: '503: Service Unavailable' for url: 'http://oradev:8001/soa-infra/services/default/HelloWorld/HelloWorld?WSDL'

    at oracle.fabric.common.metadata.MetadataManagerImpl.getInputStreamFromAbsoluteURL(MetadataManagerImpl.java:276)

Issue #8: java.sql.SQLException: The Network Adapter could not establish the connection

The logs may show this error occasionally, particularly when the server becomes unstable.
####<Jan 11, 2011 6:35:39 PM EST> <Info> <JDBC> <oradev> <soa_server1> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1294788939135> <BEA-001156> <Stack trace associated with message 001129 follows:


java.sql.SQLException: The Network Adapter could not establish the connection
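The Oracle JDBC driver generally reports this message when it cannot open a network connection to the database listener at all, so a quick reachability test from the SOA host can separate network or listener problems from problems inside WebLogic. A minimal Python sketch follows; the function name is hypothetical, and 1521 is just the conventional listener port, so take the real host and port from your data source's JDBC URL.

```python
import socket


def can_reach_listener(host: str, port: int = 1521, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks the network path and that something is listening;
    it does not validate credentials or the database service name.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failures and timeouts.
        return False
```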

Issue #9: BEA-149265 due to weblogic.application.ModuleException

The weblogic.application.ModuleException exception is primarily caused by the network adapter not being able to establish a database connection (the same error as issue #8), as the "Caused By" line at the end of the trace shows.
<Jan 19, 2011 3:05:26 PM EST> <Error> <Deployer> <BEA-149265> <Failure occurred in the execution of deployment request with ID '1295467469460' for task 'weblogic.deploy.configChangeTask.2'. Error is: 'weblogic.application.ModuleException: '
weblogic.application.ModuleException:
    at weblogic.jdbc.module.JDBCModule.prepare(JDBCModule.java:290)
    at weblogic.application.internal.flow.ModuleListenerInvoker.prepare(ModuleListenerInvoker.java:199)
    at weblogic.application.internal.flow.DeploymentCallbackFlow$1.next(DeploymentCallbackFlow.java:507)
    at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:41)
    at weblogic.application.internal.flow.DeploymentCallbackFlow.prepare(DeploymentCallbackFlow.java:149)
    Truncated. see log file for complete stacktrace

Caused By: weblogic.common.ResourceException: weblogic.common.ResourceException: Could not create pool connection. The DBMS driver exception was: The Network Adapter could not establish the connection

Issue #10: BEA-000000 due to timeout

You may get numerous transaction timeouts after 28 or 48 seconds.
<Jan 19, 2011 3:15:03 PM EST> <Warning> <oracle.soa.mediator.common> <BEA-000000> <Transaction commit failed due to excetipn

weblogic.transaction.RollbackException: Transaction timed out after 48 seconds

BEA1-0BF2AA3AE988E50C6CCB

    at weblogic.transaction.internal.TransactionImpl.throwRollbackException(TransactionImpl.java:1871)

Issue #11: BEA-149515 for numerous data sources

The BEA-149515 error appears for numerous datasources, including many of the product datasources (for example, the AIA datasources).
<Jan 19, 2011 3:15:21 PM EST> <Warning> <JMX> <BEA-149515> <An error was encountered getting the attribute DatabaseProductName on the MBean com.bea:ServerRuntime=soa_server1,Name=AIADataSource,Type=JDBCDataSourceRuntime during a call to getAttributes>

Issue #12: Multiple WSM errors

There is a one-off patch that resolves these issues, but you are better off upgrading to 11.1.1.4, which also includes the fix.
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.wsm.resources.security> <WSM-00006> <Error in receiving the request: oracle.wsm.security.SecurityException: WSM-00008 : Web service authentication failed..>
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.wsm.resources.enforcement> <WSM-07607> <Failure in execution of assertion {http://schemas.oracle.com/ws/2006/01/securitypolicy}wss-username-token executor class oracle.wsm.security.policy.scenario.executor.WssUsernameTokenScenarioExecutor.>
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.wsm.resources.enforcement> <WSM-07602> <Failure in WS-Policy Execution due to exception.>
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.wsm.resources.enforcement> <WSM-07501> <Failure in Oracle WSM Agent processRequest, category=security, function=agent.function.service, application=soa-infra, composite=HelloWorldComposite, modelObj=client, policy=oracle/wss_username_token_service_policy, policyVersion=1, assertionName={http://schemas.oracle.com/ws/2006/01/securitypolicy}wss-username-token.>
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.webservices.service> <OWS-04115> <An error occurred for port: FabricProvider: oracle.fabric.common.PolicyEnforcementException: FailedAuthentication : The security token cannot be authenticated..>
<Jan 19, 2011 3:15:24 PM EST> <Error> <oracle.wsm.resources.security> <WSM-00069> <The security header is missing.>

Issue #13: XAResource.XAER_RMERR start() failed on resource 'EDNDataSource_soa_domain'

This exception is related to an out-of-the-box datasource (EDNDataSource_soa_domain) and is caused by a cryptic timeout error reported by the resource manager.
<Jan 18, 2011 4:53:49 PM EST> <Warning> <oracle.integration.platform.blocks.event.saq> <SOA-31013> <Error handling message (rolling back).java.sql.SQLException: Unexpected exception while enlisting XAConnection java.sql.SQLException: XA error: XAResource.XAER_RMERR start() failed on resource 'EDNDataSource_soa_domain': XAER_RMERR : A resource manager error has occured in the transaction branchweblogic.transaction.internal.ResourceAccessException: Transaction has timed out when making request to XAResource 'EDNDataSource_soa_domain'.

Issue #14: Numerous BEA-000000 errors

This is primarily due to lack of java heap space, which is directly related to issue #1 above.
<Jan 13, 2011 1:59:27 PM EST> <Warning> <oracle.soa.adapter> <BEA-000000> <JMSAdapter MyAdapterPoll JMSMessageConsumer_init: Retrying connection; attempt #1415>
<Jan 13, 2011 1:59:27 PM EST> <Warning> <oracle.soa.adapter> <BEA-000000> <JMSAdapter MyAdapterPoll
BINDING.JCA-12135
ERRJMS_ERR_CR_QUEUE_CONS.
ERRJMS_ERR_CR_QUEUE_CONS.
Unable to create Queue consumer due to JMSException.
Please examine the log file to determine the problem.
    at oracle.tip.adapter.jms.JMS.JMSConnection.createConsumer(JMSConnection.java:620)


Applicable Versions:
  • Oracle SOA Suite 11g (11.1.1.4)

18 comments:

Charandeep said...

Hi,

I read your blog and it was quite interesting. I have a couple of questions and would appreciate it if you could answer them:

How many of the 14 issues that you say are resolved did you face with 11.1.1.3?

Did you experience any broken code due to the upgrade?

Ahmed Aboulnaga said...

Hello Charandeep,

As far as your second question goes, no code was broken after upgrading from SOA Suite 11.1.1.3 to 11.1.1.4.

I believe the PL/SQL purge scripts that purge the instance data (under the FABRIC database schema) have changed slightly, so if you are using them, double-check them.

Pedro said...

Hello Ahmed.

My project team is facing most of these issues in our development environment (SOA Suite 11.1.1.3 running on Windows Server 2008 R2), especially issues #1 and #2. We have reviewed the JVM parameters (mainly -Xmx2048m -Xms2048m -Xmn1228m -XX:MaxPermSize=512m -XX:PermSize=512m -XX:+AggressiveHeap -XX:NewRatio=2 -XX:+AggressiveOpts -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -XX:InitialSurvivorRatio=10 -XX:SurvivorRatio=10) in order to obviate the problem, but after a few hours of work we run into the same problems again.

In the release notes for Oracle 11g Release 1 (11.1.1.4.0), we could not find bug fixes directly related to these issues. We are considering upgrading, but we could not find more information about the source of the problem (even through Oracle Support).

Do you think that the upgrade will be the only solution?

Best Regards.

Pedro.

Ahmed Aboulnaga said...

Hey Pedro,

The short answer is, upgrading may be your only option.

Like yours, our SOA Suite 11.1.1.3 dev environment was unstable for many months, requiring restarts almost daily, to the point that it was hindering development efforts.

Please note the following.

* Issue #2 is directly related to #1.

If you have very little heap space available, the JVM running WebLogic Server will keep trying (unsuccessfully) to garbage collect to free up space, and repeat that cycle endlessly. This causes the high CPU utilization.

* Code deployments cause a massive drop in heap space.

For example, if you deploy 10 projects in a row, expect the available heap space to drop by anywhere from 100M to 500M. After some time, this space will be reclaimed. The point I am trying to make is that code deployments (whether through Ant or JDeveloper) have a direct impact on the available heap space. I also recommend at least doubling your PermSize and MaxPermSize.
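Doubling PermSize and MaxPermSize is simple arithmetic, but if your JVM arguments live in one long string (like Pedro's above), a small Python sketch can apply the change without touching the other flags. The function name is hypothetical. Keep in mind that these flags only exist on pre-Java 8 HotSpot JVMs (Java 8 removed the permanent generation), and that the permanent generation sits outside -Xmx, so doubling it raises the process's total memory footprint.

```python
import re

# -XX:PermSize=<n>[unit] and -XX:MaxPermSize=<n>[unit], unit optional.
_PERM_FLAG = re.compile(r"-XX:(PermSize|MaxPermSize)=(\d+)([kKmMgG]?)")


def double_perm_size(jvm_args: str) -> str:
    """Return jvm_args with -XX:PermSize and -XX:MaxPermSize doubled.

    Units (k/m/g) are preserved; all other flags are left untouched.
    """
    def _double(match):
        flag, value, unit = match.groups()
        return "-XX:%s=%d%s" % (flag, int(value) * 2, unit)

    return _PERM_FLAG.sub(_double, jvm_args)
```

Applied to Pedro's flags, -XX:MaxPermSize=512m -XX:PermSize=512m becomes -XX:MaxPermSize=1024m -XX:PermSize=1024m.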

* Unfortunately, you will have no choice but to upgrade to PS3 (11.1.1.4).

Like you, I faced similar issues with no resolution (even while working with Oracle Support). I cannot stress enough the importance of upgrading to 11.1.1.4. I have talked to other customers who have gone live with 11.1.1.3, and all I can say is that they're brave.

Check this post out, as it quickly walks through the upgrade process (for Linux, though). You will want to read through the documentation specific to your OS to see whether anything is not covered in this post:

http://blog.ipnweb.com/2011/01/upgrading-to-soa-suite-11gr1-ps3-111140.html


Good luck.
~Ahmed

Pedro said...

Hello Ahmed.

Thanks for your reply. It was very useful to us.
We will have to start upgrading our composites, checking for code breakage.

Best Regards,

Pedro

Ahmed Aboulnaga said...

Pedro,

I strongly recommend downloading the IPN Web whitepaper on upgrading from SOA Suite 10g to 11g.

1. Navigate to www.ipnweb.com

2. Click on 'Library'

3. Download the 'Upgrading Oracle SOA Suite 10g to 11g' whitepaper

Good luck.

sridhar said...

Hi,

We faced a strange issue in our prod instance.
We are on 11.1.1.3.
Some of the BPEL composites went into an unknown state for a fraction of a second and then returned to a running state.
Could you please tell us what the reason could be?
We could not find any information in the log files.

Ahmed Aboulnaga said...

Hi Sridhar,

Can you provide a little more detail? So if I understand this correctly, your BPEL processes are instantiated and running, then they go into an 'unknown' state briefly, then continue running fine after that? Can you provide some details about this particular BPEL process (Is it async or sync? Does it do polling?)

If that is the case, how do you know that it enters 'unknown' state for a fraction of a second?

sridhar said...

Hi Ahmed,

Our issues are as follows :

1. The composites are getting into an unknown state and their states are fluctuating.
Sometimes we observe that even while the composites are in an unknown state, instances are still being created for them.
2. The SOA servers (the 2 clusters) are going down and their states are fluctuating.

As you mentioned, 11.1.1.3 is buggy. Is that why we are facing these issues?

Please clarify.

Ahmed Aboulnaga said...

Hey Sridhar,

Issue #3 above "Composites in unknown status" definitely does occur in 11.1.1.3 due to the reasons cited above. But I have never experienced a composite going in-and-out of the unknown state before.

My recommendations are:

1. Upgrade to 11.1.1.4 (even Oracle Product Management will strongly recommend it).

2. Have the admins look at the soa_server1.out and soa_server.log logs for any additional information.

3. Another theory could involve a coding problem. If composite A references composite B using its concrete WSDL, and composite B is down, composite A will be in an unknown state. If you have a cluster (which you appear to have), then further analysis may be required to understand the behavior.

Immediately after starting up the soa_server, are all composites 100% loaded and available? Or are there composites in an unknown state at the time?

Anonymous said...

Be careful upgrading to 11.1.1.4.0. There are a number of regressions (things that worked in 11.1.1.3 that no longer work). Some of these bugs have been logged and accepted by Oracle, but the process of logging defects with Oracle is very onerous and the support staff are very slow to pick up on the issues.

Anonymous said...

I am getting Issue #4: Unable to retrieve composite details, after doing load tests. After the load tests, I restart the soa_server, the composite deployment fails, and issue #4 is reported on the EM console.
I developed my composite using JDeveloper 11.1.1.3, but I am now using SOA Suite 11.1.1.5.0, and the problem still exists. What might be wrong? Please help me.
Madhu

Sudheer said...

Hi Ahmed,

Wonderful blog about SOA suite.

We are using version 11.1.1.5.0, but most of the issues (1, 2, 4, 8, 10 and 14) persist in our case.

Any idea/help would be greatly appreciated.

Thanks in advance,
Sudheer

Ahmed Aboulnaga said...

Hi Sudheer,

Please contact me through the 'Contact Us' form above, and we'll see what's going on.

André said...

Hey Ahmed.

Congrats on a great blog.
Regarding this specific post, I wonder what your source was. I was trying to find the bug fix release notes for 11.1.1.4, but the link seems to be broken:
https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1289147.1

Do you have the document or any other source? I need to prove that this is a version-related bug.

Thanks for your help,
André
