Monday, July 16, 2012

5 Reasons Why Oracle WebLogic Server Sucks

"Oracle WebLogic Server 12c is the industry's best application server."

Okay, so Oracle WebLogic Server 11g doesn't suck. The marketplace tends to agree that WebLogic Server is one of the leading Java application servers on the market. In fact, Gartner ranks Oracle as the leader in Enterprise Application Servers, so for once it's not all hype.

I've used many Java application servers over the years, including WebSphere and JBoss, but I'm certified in Oracle Application Server and have 10 years of experience with it, so that is the basis of my comparison.

For those who don't know, when Oracle acquired BEA, it ended up with two enterprise Java application servers: Oracle Application Server and BEA WebLogic Server. After much analysis, Oracle decided to stick with the superior WebLogic Server and effectively kill off Oracle Application Server.

Oracle WebLogic Server is the primary underlying Java application server for the majority of Oracle Fusion Middleware products, and I've had a solid 2 years of extensive experience working with it as a result. There are definite improvements compared to Oracle Application Server, but I also have some major gripes with it.

In this article, I list out the 5 biggest issues with Oracle WebLogic Server 11g.

1. It is not possible to have truly highly available JMS destinations.

A JMS destination can be either a JMS queue or a JMS topic. These can be created in WebLogic Server very easily, and their reliability is excellent. JMS destinations are used extensively for integration development, particularly in point-to-point and publish-subscribe models.

Let's say that I have a 4-node cluster. I have some Java (or SOA) code deployed to all nodes of the cluster. I also have a JMS topic in this cluster.

I expect the following:
  1. If I have 1 consumer of this topic, I expect that, despite the code being deployed to all 4 nodes of the cluster, each message is consumed only once.
  2. I expect that the message is equally available to all 4 nodes of the cluster, so if any 3 of the nodes fail, the message is still available and can be consumed without manual intervention.

Point #1 is not possible if you set the forwarding policy to "Replicated". Point #2 is not possible if the destination's forwarding policy is set to "Partitioned". Given that these are the only two options available to me, I pretty much can't satisfy the two basic and necessary requirements for highly available queues and topics.

In a clustered environment, there is no simple way to create a simple JMS destination that is targeted to all managed servers and accessible via a single JNDI name. Yes, I'm aware of all the little workarounds and tricks people use, but this issue is WebLogic Server's biggest failing.
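
To illustrate what I'm talking about, here's a rough WLST sketch of creating a uniform distributed topic targeted to a cluster and picking one of the two forwarding policies. The module, topic, and cluster names (MyJmsModule, MyTopic, soa_cluster), the URL, and the credentials are all made up for illustration, and subdeployment/JMS server targeting is omitted to keep it short.

  # WLST (online) sketch -- illustrative names only, no error handling
  connect('weblogic', 'welcome1', 't3://adminhost:7001')
  edit()
  startEdit()

  # JMS system module targeted to the whole cluster
  jmsModule = create('MyJmsModule', 'JMSSystemResource')
  jmsModule.addTarget(getMBean('/Clusters/soa_cluster'))

  # Uniform distributed topic inside the module, looked up via a single JNDI name
  cd('/JMSSystemResources/MyJmsModule/JMSResource/MyJmsModule')
  udt = create('MyTopic', 'UniformDistributedTopic')
  udt.setJNDIName('jms/MyTopic')

  # The only two choices:
  #   'Replicated'  -- every member gets a copy, so a single logical message
  #                    can end up being processed more than once (breaks #1)
  #   'Partitioned' -- a message lives on one member only, so it is unavailable
  #                    if that member is down (breaks #2)
  udt.setForwardingPolicy('Partitioned')

  save()
  activate()

Whichever value you pick, one of my two expectations above goes out the window.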

2. The AdminServer must use a floating IP address in a cluster.

In a standard Oracle Application Server 10g installation, a default OC4J "home" J2EE container is created, which hosts Application Server Control, the primary administration console. Typically, additional OC4J containers are created to host your applications. In a clustered installation, this "home" container should only be started up on a single node. In plain English, you can have the admin console running on only a single node in your cluster; you cannot start it up on all nodes of the cluster.

Fortunately, if that node is down, you can simply start it up on any of the other active nodes. In fact, you can configure it so that if the "home" container is down, it is automatically started up on an alternate node. Simple and sweet!

In theory, the WebLogic Server Administration Console behaves the same way... it can only be running on a single node in the cluster. But the Oracle documentation states that "the Administration Server must be configured to listen on a floating IP Address to enable it to seamlessly failover from one host to another." What this means is that if the AdminServer is running on host1 and that host crashes, the only way for it to be started up on host2 is if a floating IP address is moved to that second host. Why?! Why add annoying and unnecessary manual failover steps?
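
For context, the configuration side of this is trivial; it's the manual failover that bugs me. Here's a rough WLST sketch of binding the AdminServer to a virtual hostname (ADMINVHN, along with the URL and credentials, is a placeholder that would resolve to the floating IP):

  # WLST sketch -- hostname, port, and credentials are placeholders
  connect('weblogic', 'welcome1', 't3://host1:7001')
  edit()
  startEdit()

  # Bind the AdminServer to a virtual hostname instead of a physical host
  cd('/Servers/AdminServer')
  cmo.setListenAddress('ADMINVHN')
  cmo.setListenPort(7001)

  save()
  activate()

The failover itself is still manual: move the floating IP to the surviving host with your OS tooling, then start the AdminServer there from a shared or replicated domain home.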

3. A shared filesystem is required to set up a WebLogic Server cluster.

Basically, there is absolutely no way to set up a valid WebLogic Server cluster except with a shared file system. This was not the case with Oracle Application Server.

At minimum, this shared file system is required to maintain the transaction logs (TLOGs), along with any other file-based persistent stores you may create, in a clustered setup. The Oracle documentation states that "you must set up the default persistent store so that it stores records in a shared storage system that is accessible to any potential machine to which a failed migratable server might be migrated."
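
In practice that boils down to pointing each managed server's default file store at the shared mount, roughly like this in WLST (the server name, URL, credentials, and NFS path are made up for illustration):

  # WLST sketch -- server name and shared path are placeholders
  connect('weblogic', 'welcome1', 't3://adminhost:7001')
  edit()
  startEdit()

  # The default persistent store (which holds the TLOGs) is a per-server file store;
  # its directory must live on storage visible to every node the server could migrate to
  cd('/Servers/soa_server1/DefaultFileStore/soa_server1')
  cmo.setDirectory('/u01/shared/tlogs/soa_server1')

  save()
  activate()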

I understand that there may be technical reasons why the shared file system is needed, but that's only because it was designed that way. It annoys me to no end to know that there is absolutely no way I can install a cluster without a decently performing NFS share.

4. The AdminServer consumes all CPU resources when a managed server has issues.

If you have your applications deployed to several managed servers, you technically don't have to have the AdminServer running. Essentially, your applications can continue to function just fine with the AdminServer down. This is typically the case with most Java application servers out there.

The AdminServer constantly pings the managed servers to obtain real-time metrics, health information, and statistics. Yet it baffles me that when certain issues happen on one of the managed servers, the AdminServer spins out of control and starts occupying over 95% of the CPU.

So you're telling me that if I have 3 separate managed servers, each hosting completely separate applications, and one of them behaves erratically, that the AdminServer will consume the entire CPU, thus adversely affecting my other independent applications? Yes! That's what happens!

5. It is not possible to handle stuck threads, which occur quite regularly.

The Oracle documentation states that "WebLogic Server checks for stuck threads periodically." So what is a stuck thread really? All I know is that when I have a stuck thread, my managed server's health state changes to "Warning" in the Administration Console.

So what am I supposed to do now exactly?

Get this: I tried deploying some code and it hung. I saw a similar warning and, after some investigation, confirmed that there was a stuck thread due to the deployment. Ummm... okay? Now what? My deployment is hung, all subsequent deployments are also hanging, and I have no choice but to restart the managed server. So is that the solution every time?

I'm glad that WebLogic Server (compared to Oracle Application Server) gives me the ability to track down what code is associated with the stuck thread, but in almost all cases, I end up having to restart the managed server. I'll admit that I can't speak to the specifics of how the engine manages threads behind the scenes, but as an administrator, I frankly am at a loss as to what to do in this case.
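
For what it's worth, the tracking-down part usually starts with a thread dump. Here's a rough WLST sketch of how I'd grab one for the affected managed server (the server name, URL, credentials, and file path are placeholders):

  # WLST sketch -- credentials, URL, server name, and file path are placeholders
  connect('weblogic', 'welcome1', 't3://adminhost:7001')

  # Write a thread dump for the affected managed server to a file,
  # then search it for threads marked "STUCK"
  threadDump(writeToFile='true', fileName='/tmp/soa_server1_threads.txt', serverName='soa_server1')

  disconnect()

That tells me which code the thread is stuck in, but it still doesn't give me a way to clear the thread short of a restart.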

-----

I'm not trying to bash Oracle WebLogic Server here, but these issues need to be looked at in future releases. Over the years, the Oracle Database has been continuously improved to simplify administration for the DBA by introducing features that automate database management. I hope the WebLogic Server product development team follows that lead.


Ahmed Aboulnaga

4 comments:

Pavan Dev said...

Indeed a great post... I am a great follower of your blog.

I can absolutely agree to a little extent, but the thing is, the issues you have posted can be overcome with a few workarounds:

1. It is not possible to have truly highly available JMS destinations.

We can use UDDs (uniform distributed destinations); this is similar to clustering.


2. The AdminServer must use a floating IP address in a cluster.

This totally depends on the way you design your environment; you don't necessarily need a floating IP.
In our environment we have primary and secondary admin hosts, which are two different hosts.



3. A shared filesystem is required to setup a WebLogic Server cluster.

I feel this is a nice feature provided by the product for mission-critical projects.

Most companies use a database as the persistence store for TLOGs, which is much more secure.


4. The AdminServer consumes all CPU resources when a managed server has issues.

I am totally with you on this, but most of the time restarting the admin alone should resolve the issue. This won't affect the applications as such.



5. It is not possible to handle stuck threads, which occurs quite regularly.

There are different ways to handle this and get to the root of the issue (the advanced diagnostic repository feature in WebLogic provides granular server logs). There can be two cases, recoverable and non-recoverable threads, and if a thread is not recovered, a restart is the only option.


On the whole, there is a lot of scope for improvement in the product, which Oracle should work towards.

Anonymous said...

Hi Pavan!

In regards to #3...

In WebLogic Server 11g, the "default persistent store can only be a file store" and the "transaction log (TLOG) can only be stored in a default store" (see http://docs.oracle.com/cd/E17904_01/web.1111/e13701/store.htm).

On the other hand, in WebLogic Server 12c, the "TLOGs can be stored in the default persistent store or a JDBC TLOG store" (see http://docs.oracle.com/cd/E24329_01/web.1211/e24432/store.htm).

So I stand corrected. Seems that Oracle has indeed made advancements in this area with 12c.

Thanks for the feedback!

Anonymous said...

Regarding the two items below:
2. The AdminServer must use a floating IP address in a cluster.

This is only true if you want to use an address like myadminserver:port/console to get to the AdminServer. Otherwise you can simply go to node2:port/console, assuming you started the AdminServer on node2 instead of node1. Where an address like myadminserver comes in handy is for developers going to /em (for SOA or OSB).

4. The AdminServer consumes all CPU resources when a managed server has issues.

This is because the AdminServer is trying to communicate with the unstable or unavailable managed server. The default timeout is 300 seconds; try lowering it to 30 seconds. The setting is under Preferences > Shared Preferences > Management Operation Timeout.

Nagendra Reddy said...

I like your honesty in disclosing important facts. Thank you.
Regards,
weblogic training in hyderabad.
