Saturday, July 21, 2012

Understanding Retry Count in OSB 11g


This post describes how the Load Balancing Algorithm and Retry Count settings of a Business Service behave, and how they interact when a target server is down.


My OSB Project Design

I have a simple OSB project that calls a SOA composite. This SOA composite is deployed onto a 2-node cluster, so I configured my Business Service with two endpoint URIs, one per node, to route requests across the two target services.



Expected Behavior

I chose a Load Balancing Algorithm of "round-robin". So I expect the first request from my Business Service to invoke the first server "n01", the next request "n02", then "n01" again, and so on. With both servers up, it works exactly as expected.


Unexpected Behavior? The Failure Scenario

But what if "n02" is down?

If "n02" is down, the Business Service continues to round-robin requests across both the UP server and the DOWN server. So 50% of my requests will fail.
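The 50% figure follows directly from plain round-robin with no retry. Here is a minimal sketch of that behavior. It is a toy model, not OSB code: the function name `route_without_retry` and the `down` set are hypothetical, and a request is simply counted as failed when its endpoint is in `down`.

```python
from itertools import cycle


def route_without_retry(endpoints, down, num_requests):
    """Toy model: send each request to the next endpoint in round-robin
    order, with no retry. Returns a list of (endpoint, succeeded) tuples."""
    rotation = cycle(endpoints)
    results = []
    for _ in range(num_requests):
        endpoint = next(rotation)
        # The request fails if the chosen endpoint is down.
        results.append((endpoint, endpoint not in down))
    return results


results = route_without_retry(["n01", "n02"], down={"n02"}, num_requests=10)
failures = sum(1 for _, ok in results if not ok)
print(f"{failures} of {len(results)} requests failed")  # 5 of 10 requests failed
```

Every second request lands on "n02" and fails, because nothing in this model redirects it to "n01".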

The failed requests return the following SOAP fault:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <soapenv:Fault>
         <faultcode>soapenv:Server</faultcode>
         <faultstring>BEA-380002: Tried all: '1' addresses, but could not connect over HTTP to server: 'n02', port: '8001'</faultstring>
         <detail>
            <con:fault xmlns:con="http://www.bea.com/wli/sb/context">
               <con:errorCode>BEA-380002</con:errorCode>
               <con:reason>Tried all: '1' addresses, but could not connect over HTTP to server: 'n02', port: '8001'</con:reason>
               <con:location>
                  <con:node>RouteTo_HelloWorldBS</con:node>
                  <con:path>request-pipeline</con:path>
               </con:location>
            </con:fault>
         </detail>
      </soapenv:Fault>
   </soapenv:Body>
</soapenv:Envelope>

Workaround

In my example, simply set Retry Count to a value of 1. Afterwards, if either of the two servers is down, every request still succeeds: when the Business Service fails to reach one target service, it retries the request on the other one.

The Oracle documentation states:
"You can define the retry option for business services. The retry option specifies the maximum number of times a business service can attempt to access endpoint URIs after an initial failure. For example, consider the behavior of a business service B with endpoint URIs eu1, eu2, and eu3..."

When the Retry Count is set to 1, the documentation continues: "if business service B fails to process a request or is unable to access the endpoint URI eu1, it tries to process the request with eu2 (retry 1). If the retry fails then the business service returns failure. The business service does not retry the third endpoint URI eu3."
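The quoted behavior can be sketched as a small simulation. This is a toy model under stated assumptions, not OSB's implementation: the function `invoke` and its parameters are hypothetical, and each retry is assumed to move on to the next endpoint URI in the list, as in the documentation's eu1/eu2/eu3 example.

```python
def invoke(endpoints, down, start, retry_count):
    """Toy model of one request: try endpoints[start], then move to the
    next endpoint for each retry, up to retry_count retries.
    Returns (succeeded, endpoints_tried)."""
    tried = []
    for attempt in range(retry_count + 1):  # initial attempt + retries
        endpoint = endpoints[(start + attempt) % len(endpoints)]
        tried.append(endpoint)
        if endpoint not in down:
            return True, tried
    return False, tried


endpoints = ["eu1", "eu2", "eu3"]

# eu1 and eu2 are down, Retry Count = 1: eu3 is never attempted.
print(invoke(endpoints, down={"eu1", "eu2"}, start=0, retry_count=1))
# (False, ['eu1', 'eu2'])

# Only eu1 is down, Retry Count = 1: the retry on eu2 succeeds.
print(invoke(endpoints, down={"eu1"}, start=0, retry_count=1))
# (True, ['eu1', 'eu2'])
```

In the first call the request fails even though eu3 is up, which matches the documentation's statement that the third endpoint URI is not retried.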


Summary

What you want to do is the following:
  1. If you have multiple Endpoint URIs in your Business Service, set the Retry Count to a value that is 1 less than the number of your Endpoint URIs.
That way, you will ensure that the request is retried across all endpoint URIs if needed.
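To see why "one less than the number of Endpoint URIs" is the smallest value that covers every endpoint, the rule can be checked exhaustively in a toy model. The sketch below is not OSB code: the helper names `succeeds` and `always_succeeds` are hypothetical, and each retry is assumed to move on to the next endpoint URI in the list.

```python
from itertools import combinations


def succeeds(endpoints, down, start, retry_count):
    """Toy model: True if the initial attempt or any retry hits an UP endpoint."""
    for attempt in range(retry_count + 1):
        if endpoints[(start + attempt) % len(endpoints)] not in down:
            return True
    return False


def always_succeeds(endpoints, retry_count):
    """True if every request succeeds whenever at least one endpoint is up."""
    n = len(endpoints)
    for k in range(n):  # k endpoints down, so at least one stays up
        for down in combinations(endpoints, k):
            for start in range(n):
                if not succeeds(endpoints, set(down), start, retry_count):
                    return False
    return True


endpoints = ["eu1", "eu2", "eu3"]
for retry_count in range(len(endpoints)):
    print(retry_count, always_succeeds(endpoints, retry_count))
# 0 False
# 1 False
# 2 True
```

With three endpoint URIs, only a Retry Count of 2 guarantees that a request reaches the one surviving endpoint regardless of where the round-robin rotation starts.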


Applicable Versions:
  • Oracle Service Bus (OSB) 11g (11.1.1.5)

References:
  • http://docs.oracle.com/cd/E13159_01/osb/docs10gr3/operations/endpointurimgmt.html#wp1075284


Ahmed Aboulnaga
