Sunday, October 12, 2008

Behavior of BPEL processes in a BPEL Process Manager cluster

This post will be moved permanently to:

For one of my customers, I was involved in extensive high availability testing of an Oracle SOA Suite 10g cluster to determine the behavior of BPEL processes during failover of the BPEL Process Manager.

Essentially, several scenarios were designed, configured, and tested. The failover tests included:

  1. Behavior of Synchronous Transactions
  2. Behavior of Asynchronous Transactions
  3. Behavior of Queue-based Transactions
  4. Behavior of Asynchronous Callback Transactions
  5. Behavior when External Asynchronous Callback Services are invoked from an Internal Client
  6. Behavior when Internal Asynchronous Callback Services are invoked from an Internal Client
  7. Behavior when Internal Asynchronous Callback Services are invoked from an External Client
  8. Behavior during OHS failover
Today, I will briefly describe test cases 2 and 7.

Test Case 2: Asynchronous Transactions

This scenario is simple: An external client invokes an asynchronous BPEL process.

If the asynchronous BPEL process is in-flight on the first node and that node crashes (e.g., OC4J_SOA dies), the process is automatically resumed on the second container and runs successfully to completion. Since no response is expected by the client, operations resume normally.
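The failover above works because an in-flight instance's state lives outside the container that crashed. The toy model below is plain Python, not Oracle code, and every class, container, and activity name in it is hypothetical: each instance's progress sits in a shared store, so a surviving container can pick up where the dead one stopped. For simplicity it checkpoints after every activity; the real engine persists state at dehydration points, so a resumed instance may repeat work done after the last one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Instance:
    instance_id: int
    activities: List[str]       # ordered activity names (hypothetical)
    next_activity: int = 0      # index of the next activity to execute


@dataclass
class Cluster:
    alive: Set[str]                                          # running containers
    store: Dict[int, Instance] = field(default_factory=dict)  # shared "dehydration store"
    log: List[Tuple[str, int, str]] = field(default_factory=list)

    def run(self, container: str, inst_id: int, crash_before: Optional[int] = None) -> None:
        """Execute remaining activities; optionally crash before a given index."""
        inst = self.store[inst_id]
        while inst.next_activity < len(inst.activities):
            if inst.next_activity == crash_before:
                self.alive.discard(container)  # container dies; progress is already stored
                return
            self.log.append((container, inst_id, inst.activities[inst.next_activity]))
            inst.next_activity += 1  # checkpoint after every activity (a simplification)

    def recover(self) -> None:
        """A surviving container resumes every unfinished instance from the store."""
        survivor = sorted(self.alive)[0]
        for inst in self.store.values():
            if inst.next_activity < len(inst.activities):
                self.run(survivor, inst.instance_id)


cluster = Cluster(alive={"OC4J_SOA_1", "OC4J_SOA_2"})
cluster.store[1] = Instance(1, ["receiveInput", "transform", "invokeBackend", "updateDatabase"])
cluster.run("OC4J_SOA_1", 1, crash_before=2)  # first container dies mid-flight
cluster.recover()                             # second container finishes the instance
print(cluster.log)
```

Because the client of an asynchronous process is not holding a connection open, nothing is lost from its point of view when the second container completes the work.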

Test Case 7: Invoke Internal Asynchronous Callback Service from External Client

This scenario is somewhat more complicated: An external client synchronously invokes a BPEL process, which in turn invokes another, asynchronous BPEL process.

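In normal operations, the second process always completes and its callback is received by the first BPEL process, but what the client sees depends on which OC4J container the callback lands on:

  • If it lands on the container still holding the client's synchronous connection, a synchronous response is returned to the client.
  • If it lands on any other container, no response is sent to the client, in which case the client eventually times out.
Therefore, since the callback may arrive at any of the N OC4J containers in the cluster, only about 1 in N callbacks reaches the right one, and this scenario has a success rate of roughly (100 / N)% during normal operations (50% in a two-node cluster).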
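The arithmetic can be checked with a small Monte Carlo sketch. Its model is an assumption, not measured data: the client's request lands on a random container, which keeps the connection open, and the callback is spread round-robin across all containers, unless it is "pinned" back to the originating container (the fix discussed in the comments). The function name and parameters are hypothetical.

```python
import random


def reply_success_rate(num_containers: int, trials: int, pinned: bool, seed: int = 0) -> float:
    """Fraction of synchronous clients that receive their reply (toy model)."""
    rng = random.Random(seed)
    next_rr = 0     # round-robin cursor used to route callbacks
    delivered = 0
    for _ in range(trials):
        origin = rng.randrange(num_containers)  # container holding the client's connection
        if pinned:
            target = origin                     # callback addressed to the originating container
        else:
            target = next_rr % num_containers   # callback routed round-robin
            next_rr += 1
        if target == origin:                    # only this container can answer the client
            delivered += 1
    return delivered / trials


for n in (2, 3, 4):
    print(n, round(reply_success_rate(n, 100_000, pinned=False), 3))
```

The unpinned rates come out near 1/N, while `pinned=True` always returns 1.0, which is why routing the callback back to the waiting node resolves the problem.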

The diagrams below use the following color coding:

  • Red lines --> Request made by client.
  • Blue lines --> Callback from 2nd BPEL process.
  • Green lines --> Synchronous response to client.


gg said...

Hi Ahmed,

We have a process that is almost identical to Test Case 7. When it was deployed on a cluster, consumers sometimes did not get a response and their requests timed out. We decided not to deploy this process in a clustered environment and raised an SR with Oracle for a solution. I am just wondering if you have found a solution for this. I would appreciate your input.


Ahmed said...


The process will finish execution in all cases, but the client will either receive a response or it will time out, as you have been experiencing.

Unfortunately, the only recommendation is not to use asynchronous callbacks when developing processes for a clustered environment.

Liviu Florin said...

We faced the same issue, but managed to solve it by sending the callback to the actual node where the OHS thread is still waiting.

There are a few steps:

- declare variable

<variable name="replyTo" messageType="ns1:WSAReplyToHeader"/>

where ns1 is the namespace of an asynchronous partner link. It should not matter which one, as the type is the same.

- assign values to the variable. Make sure the Address contains the partner link and role of the requester

- use the variable when doing the invoke

invoke name="Invoke_NoFeeProvider"

- modify bpel.xml to contain

property name="optSoapShortcut"

for the partner link of the async process.

- modify mod_oc4j.conf to enable local affinity when Apache calls OC4J:

Oc4jSet StatusUri /oc4j-status
Oc4jSelectMethod roundrobin:local
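To illustrate the "assign" step, the sketch below builds a ReplyTo header whose Address names one specific node, so the callback cannot be load-balanced elsewhere. Several details are assumptions to verify on your own server: the WS-Addressing namespace (the 2003/03 draft), the URL layout that appends partner link and role to the process path, and the host, process, partner link, and role names, which are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace; check the ws-addressing WSDL that defines WSAReplyToHeader.
WSA_NS = "http://schemas.xmlsoap.org/ws/2003/03/addressing"


def build_reply_to(node_url: str, process_path: str, partner_link: str, role: str) -> str:
    """Build a wsa:ReplyTo element whose Address points at one specific node.

    The "<process_path>/<partner_link>/<role>" layout is hypothetical; copy the
    real layout from a ReplyTo header captured on your own server.
    """
    ET.register_namespace("wsa", WSA_NS)
    reply_to = ET.Element(f"{{{WSA_NS}}}ReplyTo")
    address = ET.SubElement(reply_to, f"{{{WSA_NS}}}Address")
    address.text = f"{node_url.rstrip('/')}/{process_path.strip('/')}/{partner_link}/{role}"
    return ET.tostring(reply_to, encoding="unicode")


header = build_reply_to("http://soa-node1:7777", "/orabpel/default/CallerProcess/1.0",
                        "NoFeeProvider", "NoFeeProviderRequester")
print(header)
```

In the BPEL process itself, the same XML would be copied into the replyTo variable with an assign and sent along with the invoke; the local-affinity setting in mod_oc4j.conf then keeps the request on the node whose address was advertised.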

Ahmed said...

That should definitely work, as long as OHS is routing to a single OC4J.

Thanks Liviu!

Anonymous said...

Thanks for this interesting and helpful post, though I am only reading it now (in the 11g era). I'm interested in the test results for scenarios 5 and 6. Could you please post them?

Anonymous said...

This is a great thread. We have the same issue in our clustered 10g environment. We got around it by setting the soapCallbackUrl to the local server name in each node's bpel\...\collaxa-config.xml. This ensures that the right node receives the callback. Obviously, this limits the HA benefits in pure Async flows.
We were told that this will not work in 11g, and that if you want a Sync service to call Async services, you'll need to do it in OSB. Is anyone aware of any other solutions? This is a very popular pattern in our environment and one of the key reasons we purchased SOA 10g.
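The per-node workaround above amounts to giving each node a different soapCallbackUrl. A minimal sketch of scripting that change follows; it assumes an unnamespaced layout of `<property id="..."><value>...</value></property>` inside collaxa-config.xml, and the root element, host names, and port in the sample are hypothetical, so compare them with the real file first.

```python
import xml.etree.ElementTree as ET


def pin_callback_url(config_xml: str, local_url: str) -> str:
    """Point soapCallbackUrl at this node instead of the load balancer.

    Assumes <property id="soapCallbackUrl"><value>...</value></property> with no
    XML namespace; adjust the lookup if the real file differs.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.get("id") == "soapCallbackUrl":
            value = prop.find("value")
            if value is None:  # create the <value> element if it is missing
                value = ET.SubElement(prop, "value")
            value.text = local_url
            return ET.tostring(root, encoding="unicode")
    raise KeyError("no soapCallbackUrl property found")


sample = (
    "<BPELConfig>"
    '<property id="soapServerUrl"><value>http://soa-lb:80</value></property>'
    '<property id="soapCallbackUrl"><value>http://soa-lb:80</value></property>'
    "</BPELConfig>"
)
print(pin_callback_url(sample, "http://soa-node1:7777"))
```

On each node, the local URL could be derived from `socket.getfqdn()`. ElementTree drops XML comments and the XML declaration when it rewrites a document, so keep a backup of the original file.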

sambasiva sura said...

Hi team,
I am facing the Test Case 2 issue (Asynchronous Transactions). I am calling one asynchronous process from another asynchronous process, and the response is not coming back to the same node (node1). Could you please tell me how to ensure the callback goes to the same node?

sambasiva sura said...

I am using 11g environment

Ahmed Aboulnaga said...

Please note that this blog post has been permanently moved to:

I posted my response there.