How to broadcast data to all Google App Engine instances?

How to broadcast data to all Google App Engine instances? - java

For the sake of simplicity, let's say my app needs to allow thousands of users to see a real-time read-only stream of a chat room. The host can type messages, but no other users can—they just see what's being typed by the hosts, in real time. Imagine users are following a textual play-by-play of a sporting event.
Each user checks for new messages by polling once per second using a simple /get-recent-messages call to the GAE server. (Before you ask, I believe using Google's Channels API would be far too expensive.)
Considering that this app is used by many thousands of users simultaneously, which means dozens to hundreds of GAE instances are running, how can I get these /get-recent-messages calls to return the latest chat room messages with less than 1000 ms latency, while minimizing server load and GAE costs?
Some ideas I had:
Store chat messages in datastore entities.
Obviously this is way too slow and expensive, especially if using queries/indexes
Store chat messages in memcache keys. I imagine I'd use one key to store the list of keys for the last 50 messages, and then 50 more keys, one for each message.
This would cause huge bottlenecks because App Engine's memcache shards by key and thus all 500 instances would be constantly reading from the same memcache keys and thus the same memcache server.
Store chat messages in instance memory, backed by memcache. And pull from memcache (like in #2) when instance memory is stale.
This would probably result in an expensive race condition when multiple requests see stale instance memory cache and pull from memcache simultaneously.
Use a background thread to update instance memory from memcache. This could run once per second per instance using a thread started in the warmup request. It would work like #3 but with only one thread pulling instead of random requests triggering memcache reads.
I don't think this is how background threads work on App Engine. I also don't really know if this is an appropriate use of warmup requests.
Use Google's Pub/Sub service.
I have no idea how this works and it seems like it could be overkill for this use case.
Run a once-per-second cron job to pull from memcache. This would be like #4 except without relying on background threads.
I'd need this to run on every instance every second. I don't believe the cron/taskqueue API has a way to run a job or task on all active instances.
Thoughts?

You should check this video.
I would go for the memcache/datastore version and a small amount of cache (1-2 sec) so you can reduce the amount of instances you need to serve the traffic.
If you still need like 100-500 instances to serve your traffic, i would still go for memcache/datastore version. If memcache is a bottleneck for you, shard it in like 10 keys.
Another solution is to use Compute Engine and a web server that you can connect your users via sockets. You can talk to your compute instances either via HTTP and store the value in memory or using pull queues.
If you really need to communicate to all the instances, take a look at communicating between modules
Pub/sub might be a a good option for you to communicate between the instance that publishes new messages and the instances that read the new messages. From what i read in the docs, you should be able to subscribe your users directly to Pub/Sub too (pull only tho).

Related

Update vs. Request-Reply in Storm DRPC

I'm building a real-time API handling 2 types of calls:
Updates,
Computation requests.
Internally, the updates are broadcasted among workers. The workers keep working data structures (such as hash-tables) in their RAM, and modify the contents as the updates are coming.
When a computation request comes, exactly one idle worker handles it, using multiple threads, working with the local copy in RAM.
I'm wondering whether I could migrate my current implementation to Storm. As I understand it, Storm is pretty real-time and could help me a lot with scalability and fault-tolerance.
Currently, I'm using UWSGI/Python to handle the API requests, and Java workers to do the computation. I'm thinking of putting the Java workers into the Storm topology as bolts. However, I'm not quite sure about the spouts.
As I understand it, I could use DRPC to handle the computation requests, just by connecting to a DRPC server from python. It is clearly written in the docs that DRPC can handle the whole life-cycle of the request-reply paradigm. But what about updates?
My question is: Is it a good idea (or is it even possible?) to use DRCP to only submit updates in non-blocking manner, not waiting for replies (because there are no results)?

For Non blocking, asynchronous Updates you should use a Job Server like Gearman
This will enable you to submit and need not to wait for any response. Gearman is used by Instagram to share photos to Facebook/Twitter whenever a user uploads a photo using Instagram app.

Architectural issue with Tomcat cluster environment

I am working on project in which we have an authentication mechanism. We are following the below steps in the authentication mechanism.
The user opens a browser and enter his/her email in a text box and click the login button.
The request goes to a server. We generate a random string (for example, 123456) and send a notification to the user's Android/iPhone and makes the the current thread wait with the help of the wait() method.
The user enters a password on his/her phone and clicks the submit button on his/her phone.
Once the user clicks the submit button, we are making a webservice hit the server and passing the previously generated string (for example, 123456) and password.
If the password is correct against the previously entered email, we call the notify() method to the previously waiting thread and send success as the response and the user gets entered into our system.
If the password is incorrect against the previously entered email, we call the notify() method to the previously waiting thread and send failed as the response and display an invalid credential message to the user.
Everything is working fine, but recently we moved to a clustered environment. We found that some threads are not notified even after replied by the user and for an unlimited waiting time.
For the server, we are using Tomcat 5.5, and we are following The Apache Tomcat 5.5 Servlet/JSP Container for making tomcat cluster environment.
Answer :: Possible problem and solution
The possible problem is the multiple JVMs in a clustered environment. Now we are also sending the clustered Tomcat URL to the user Android application along with generated string.
And when the user clicks on the reply button, we are sending the generated string along with the clustered Tomcat URL so in this case both requests are going to the same JVM, and it works fine.
But I am wondering if there is a single solution for the above issue.
There is a problem in this solution. What happens if the clustered Tomcat crashes? The load balancer will send a request to the second clustered Tomcat and again the same problem will arise.

The underlying reason for your problems is that Java EE was designed to work in a different way - attempting to block/wait on a service thread is one of the important no-no's. I'll give the reason for this first, and how to solve the issue after that.
Java EE (both the web and EJB tier) is designed to be able to scale to very large size (hundreds of computers in a cluster). However, in order to do that, the designers had to make the following assumptions, which are specific limitations on how to code:
Transactions are:
Short lived (eg don't block or wait for periods greater than a second or so)
Independent of each other (eg no communication between threads)
For EJBs, managed by the container
All user state is maintained in specific data storage containers, including:
A data store accessed through, eg, JDBC. You can use a traditional SQL database or a NoSQL backend
Stateful session beans, if you use EJBs. Think of these as Java Bean that persists its fields to a database. Stateful session beans are managed by the container
Web session This is a key-value store (kinda like a NoSQL database but without the scale or search capabilities) that persists data for a specific user over their session. It's managed by the Java EE container and has the following properties:
It will automatically relocate if the node crashes in a cluster
Users can have more than one current web session (i.e. on two different browsers)
Web sessions end when the user ends their session by logging out, or when the session is inactive for longer than the configurable timeout.
All values that are stored must be serializable for them to be persisted or transfered between nodes in a cluster.
If we follow those rules, the Java EE container can successfully manage a cluster, including shutting down nodes, starting new ones and migrating user sessions, without any specific developer code. Developers write the graphical interface and the business logic - all the 'plumbing' is managed by configurable container features.
Also, at run time, the Java EE container can be monitored and managed by some pretty sophisticated software that can trace application performance and behavioural issues on a live system.
< snark >Well, that was the theory. Practice suggests there are pretty important limitations that were missed, which lead to AOSP and code injection techniques, but that's another story < /snark >
[There are many discussions around the 'net on this. One which focuses on EJBs is here: Why is spawning threads in Java EE container discouraged? Exactly the same is true for web containers such as Tomcat]
Sorry for the essay - but this is important to your problem. Because of the limitations on threads, you should not block on the web request waiting for another, later request.
Another problem with the current design is what should happen if the user becomes disconnected from the network, runs out of power, or simply decides to give up? Presumably you will time out, but after how long? Just too soon for some customers, perhaps, which will cause satisfaction problems. If the timeout is too long, you could end up blocking all worker threads in Tomcat and the server will freeze. This opens your organisation up for a denial of service attack.
EDIT : Improved suggestions after a more detailed description of the algorithm was published.
Notwithstanding the discussion above on the bad practice of blocking a web worker thread and also the possible denial of service, it's clear that the user is presented with a small time window in which to react to the the notification on the Android phone, and this can be kept reasonably small to enhance security. This time window can also be kept below Tomcat's timeout for responses as well. So the thread blocking approach could be used.
There are two ways this problem can be resolved:
Change the focus of the solution to the client end - polling the server using Javascript on the browser
Communication between nodes in the cluster allowing the node receiving the authorization response from the Android App to unblock the node blocking the servlet's response.
For approach 1, the browser polls the server via Javascript with an AJAX call to a web service on Tomcat; the AJAX call returns True if the Android app authenticated. Advantage: client side, minimal implementation on the server, no thread blocking on the server. Disadvantages: During the waiting period, you have to make frequent calls (maybe one a second - the user will not notice this latency) which amounts to a lot of calls and some additional load on the server.
For approach 2, there is again choice:
Block the thread with an Object.wait() optionally storing the node ID, IP or other identifier in a shared data store: If so, the node receiving the Android app authorization needs to:
Either find the node that is currently blocking or broadcast to all nodes in the cluster
For each node in 1. above, send a message that identifies the user session to unblock. The message could be sent via:
Have an internal-only servlet on each node - this is called by the servlet performing the Android app authorization. The internal servlet will call Object.notify on the correct thread
Use a JMS pub-sub message queue to broadcast to all members of the cluster. Each node is a subscriber that, on receipt of a notification will call Object.notify() on the correct thread.
Poll a data store until the thread is authorized to continue: In this case, all the Android app needs to do is save the state in a SQL DB

Using wait/notify can be tricky. Remember that any thread can be suspended at any time. So it's possible for notify to be called before wait, in which case wait will then block for ever.
I wouldn't expect this in your case, as you have user interaction involved. But for the type of synchronisation you are doing, try using a Semaphore. Create a Semaphore with 0 (zero) quantity. The waiting thread calls acquire() and it will block until another thread calls release().
Using Semaphore in this way is much more robust that wait/notify for the task you described.

Consider using an in-memory grid so that the instances in the cluster can share state. We used Hazelcast to share data between instances so in case a response reaches a different instance it still can handle it.
E.g. you could use distributed countdown latch with value of 1 to set the thread waiting after sending the message, and when the response arrives from the client to a separate instance it can decrease, that instance can decrease the latch to 0 letting to run the first thread.

Your clustered deployment means that any node in the cluster could receive any response.
Using wait/notify using threads for a web app risks accumulating a lot of threads that may not be notified which could leak memory or create a lot of blocked threads. This could eventually affect the reliability of your server.
A more robust solution would be to send the request to the android app and store the current state of the users request for later processing and complete the HTTP request. To store the state you could consider:
A database that all tomcat nodes connect to
A java cache solution that will work across tomcat nodes like hazelcast
This state would be visible to all nodes in your tomcat cluster.
When the reply from the android app arrives on a different node, restore the state of what your thread was doing and continue processing on that node.
If the UI of the application is waiting on a response from the server, you might consider using an ajax request to poll for the response state from the server. The node processing the android app response does not need to be the same one handling UI requests.

Using Thread.wait in a web service environment is a colossal mistake. Instead, maintain a database of user/token pairs and expire them at intervals.
If you want a cluster, then use a database that is clusterable. I would recommend something like memcached since it's in-memory (and fast) and low on overhead (key/value pairs are dead simple, so you don't need RDBMS, etc.). memcached handles expiration of tokens for you already, so it seems like a perfect fit.
I think the username -> token -> password strategy is unnecessary, especially because you have two different components sharing the same 2-factor authentication responsibility. I think you can further reduce your complexity, reduce confusion for your users, and save yourself some money in SMS-send fees.
The interaction with your web service is simple:
User logs into your website using username + password
If primary authentication (username/password) is successful, generate a token and insert userid=token into memcached
Send the token to the user's phone
Present "enter token" page to the user
User receives token via phone and enters it into the form
Fetch the token value from memcached based upon the user's id. If it matches, expire the token in memcached and consider the second-factor successful
Tokens will auto-expire after whatever amount of time you want to set in memcached
There are no threading problems with the above solution and it will scale across as many JVMs as you need to support your own software.

After analysing your question, I came to the conclusion that the exact problem is of multiple JVMs in a clustered environment.

The exact problem is because of the cluster environment. Both requests are not going to the same JVM. But we know that a normal/simple notify works on the same JVM when the previous thread is waiting.
You should try to execute both requests (first request, second request when the user replies from an Android application).

I'm afraid, but threads cannot migrate over classic Java EE clusters.
You have to rethink your architecture to implement the wait/notify differently (connection-less).
Or, you may give it a try with terracotta.org. It looks like this allows to cluster an entire JVM process over multiple machines. Maybe it's your only solution.
Read a quick introduction in Introduction to OpenTerracotta.

I guess the problem is, your first thread sends a notification to the user's Android application in JVM 1 and when the user reply back, the control goes to JVM 2. And that's the main problem.
Somehow, both threads can access the same JVM to apply wait and notify logic.

Solution:
Create a single point of contact for all waiting threads. Hence in a clustered environment, all the threads will wait on a third JVM (single point of contact), so in this way all the requests (any clustered Tomcat) will contact the same JVM for waiting and notify logic and hence no thread will wait for an unlimited time. If there is a reply, then the thread will be notified if the same object has waited and is being notified the second time.

Write data fast to a remote database

I have an app which will generate 5 - 10 new database records in one host each second.
The records don't need any checks. They just have to be recorded in a remote database.
I'm using Java for the client app.
The database is behind a server.
The sending data can't make the app wait. So probably sending each single record to the remote server, at least synchronously, it's not good.
Sending data must not fail. My app doesn't need an answer from the server, but it has to be 100% secure that it arrives at the server correctly (which should be guaranteed using for example http url connection (TCP) ...?).
I thought about few approaches for this:
Run the send data code in separate thread.
Store the data only in memory and send to database after certain count.
Store the data in a local database and send / pulled by the server by request.
All of this makes sense, but I'm a noob on this, and maybe there's some standard approach which I'm missing and makes things easier. Not sure about way to go.

Your requirements aren't very clear. My best answer is to go through your question, and try to point you in the right direction on a point-by-point basis.
"The records don't need any checks," and "My app doesn't need an answer, but it has to be 100% secure that it arrives at the server correctly."
How exactly are you planning on the client knowing that the data was received without sending a response? You should always plan to write exception handling into your app, and deal with a situation where the client's connection, or the data it sends, is dropped for some reason. These two statements you've made seem to be in conflict with one another; you don't need a response, but you need to know that the data arrives? Is your app going to use a crystal ball to devine confirmation of the data being received (if so, please send me such a crystal ball - I'd like to use it to short the stock market).
"Run the send data code in a separate thread," and "store the data in memory and send later," and "store the data locally and have it pulled by the server", and "sending data can't make my app wait".
Ok, so it sounds like you want non-blocking I/O. But the reality is, even with non-blocking I/O it still takes some amount of time to actually send the data. My question is, why are you asking for non-blocking and/or fast I/O? If data transfers were simply extremely fast, would it really matter if it wasn't also non-blocking? This is a design decision on your part, but it's not clear from your question why you need this, so I'm just throwing it out there.
As far as putting the data in memory and sending it later, that's not really non-blocking, or multi-tasking; that's just putting off the work until some future time. I consider that software procrastination. This method doesn't reduce the amount of time or work your app needs to do in order to process that data, it just puts it off to some future date. This doesn't gain you anything unless there's some benefit to "batching" data sending into large chunks.
The in-memory idea also sounds like a temporary buffer. Many of the I/O stream implementations are going to have a buffer built in, as well as the buffer on your network card, as well as the buffer on your router, etc., etc. Adding another buffer in your code doesn't seem to make any sense on the surface, unless you can justify why you think this will help. That is, what actual, experienced problem are you trying to solve by introducing a buffer? Also, depending on how you're sending this data (i.e. which network I/O classes you choose) you might get non-blocking I/O included as part of the class implementation.
Next, as for sending the data on a separate thread, that's fine if you need non-blocking I/O, but (1) you need to justify why that's a good idea in terms of the design of your software before you go down that route, because it adds complication to your app, so unless it solves a specific, real problem (i.e. you have a UI in your app that shouldn't get frozen/unresponsive due to pending I/O operations), then it's just added complication and you won't get any added performance out of it. (2) There's a common temptation to use threads to, again, basically procrastinate work. Putting the work off onto another thread doesn't reduce the total amount of work needing to be done, or the total amount of I/O your app will consume in order to accomplish its function - it just puts it off on another thread. There are times when this is highly beneficial, and maybe it's the right decision for your app, but from your description I see a lot of requested features, but not the justification (or explanation of the problem you're trying to solve) that backup these feature/design choices, which is what should ultimately drive the direction you choose to go.
Finally, as far as having the server "pull" it instead of it being pushed to the server, well, all you're doing here is flipping the roles, and making the server act as a client, and the client the server. Realize that "client" and "server" are relative terms, and the server is the thing that's providing the service. Simply flipping the roles around doesn't really change anything - it just flips the client/server roles from one part of the software to the other. The labels themselves are just that - labels - a convenient way to know which piece is providing the service, and which piece is consuming the service (the client).
"I have an app which will generate 5 - 10 new database records in one host each second."
This shouldn't be a problem. Any decent DB server will treat this sort of work as extremely low load. The bigger concern in terms of speed/responsiveness from the server will be things like network latency (assuming you're transferring this data over a network) and other factors regarding your I/O choices that will affect whether or not you can write 5-10 records per second - that is, your overall throughput.

The canonical, if unfortunately enterprisey, answer to this is to use a durable message queue. Your app would send messages to the queue, and a backend app would receiver and store them in a database. Once the queue has accepted a message, it guarantees that it will be made available to the receiver, even if the sender, receiver, or the queue broker itself crash.
On my machine, using HornetQ, it takes ~1 ms to construct and send a short text message to a durable queue. That's quick enough that you can do it as part of handling a web request without adding any noticeable additional delay. Any good message queue will support your 10 messages per second throughput. HornetQ has been benchmarked as handling 8.2 million messages per second.
I should add that message queues are not that hard to set up and use. I downloaded HornetQ, and had it up and running in a few minutes. The code needed to create a queue (using the native HornetQ API) and send and receive messages (using the JMS API) is less than a hundred lines.

If you queue the data and send it in a thread, it should be fine if your rate is 5-10 per second and there's only one client. If you have multiple clients, to the point where your database inserts begin to get slow, you could have a problem; given your requirement of "sending data must not fail." Which is a much more difficult requirement, especially in the face of machine or network failure.
Consider the following scenario. You have more clients than your database can handle efficiently, and one of your users is a fast typist. Inserts begin to back up in-memory in their app. They finish their work and shut it down before the last ones are actually uploaded to the database. Or, the machine crashes before the data is sent - or while its sending; or worse yet, the database crashes while its sending, and due to network issues the client can't really tell that its transaction has not completed.
The easy way avoid these problems (most of them anyway), is to make the user wait until the data is committed somewhere before allowing them to continue. If you can make the database inserts fast enough then you can stick with a simpler scheme. If not, then you have to be more creative.
For example, you could locally write the data to disk when the user hits submit, and then upload it from another thread. This scenario needs to be smart enough to mark something that is persisted as sent (deleting it would work); and have the ability to re-scan at startup and look for unsent work to send. It also needs the ability to keep trying in the case of network or centralized server failure.
There also needs to be a way for the server side to detect duplicates. Because the client machine could send the data and crash before it can mark it as sent; and then upon restart it would send it again. The same situation could occur if there is a bad network connection. The client could send it and never receive confirmation from the server; time out and then end up retrying it.

If you don't want the client app to block, then yes, you need to send the data from a different thread.
Once you've done that, then the only thing that matters is whether you're able to send records to the database at least as fast as you're generating them. I'd start off by getting it working sending them one-by-one, then if that isn't sufficient, put them into an in-memory queue and update in batches. It's hard to say more, since you don't give us any idea what is determining the rate at which records are generated.
You don't say how you're writing to the database... JDBC? ORM like Hibernate? But the principles are the same.

How to optimize number of database connections?

We have a Java (Spring) web application with Tomcat servlet container.
We have a something like blog.
But the blog must load its posts dynamically with Ajax.
The client's ajax script checks for new posts every second.
I.e. Ajax must ask the server for new posts every second and it will be very heavy for database.
But what if we have hundreds of thousands connects simultaneously?
I think that we must retrieve all posts with cron every second and after that save it somewhere. But where? The main idea is to unload the database.
Any ideas about architecture?
Thanks in advance!

There is other architecture for polling that could be more optimal, depending on the case:
Long polling
Long polling is a variation of the
traditional polling technique and
allows emulation of an information
push from a server to a client. With
long polling, the client requests
information from the server in a
similar way to a normal poll. However,
if the server does not have any
information available for the client,
instead of sending an empty response,
the server holds the request and waits
for some information to be available.
Once the information becomes available
(or after a suitable timeout), a
complete response is sent to the
client. The client will normally then
immediately re-request information
from the server, so that the server
will almost always have an available
waiting request that it can use to
deliver data in response to an event.
In a web/AJAX context, long polling is
also known as Comet programming.
Long Polling
Example of Implementations of this technology:
Push Server
You could also use the observer pattern to register the requests, and notify them when an update is done.

Hundreds of thousands of concurrent users all polling our site every second makes for a huge amount of traffic. If you truly expect this load you are going to have to design your platform accordingly, probably by clustering multiple web, application and DB servers.
Remember that with a database connection pool you don't need a DB connection for every user.

I'm not as familiar with Tomcat, but in WebSphere we can set up connection pools to prepare a certain number of connections.
Also, are you mainly worried about reads or the same number of writes?
Plus, you may also want to have the database "split" depending on region etc. This way there is no single heavy load across the entire database, but it can then be split and even load balanced.
There is also the "NoSQL" databases to look into as well. Maybe something to consider. Just ideas to help out.

Ensuring serial processing of JMS messages in an OC4J cluster

We have an application that processes JMS message using a message driven bean. This application is deployed on an OC4J application server. (10.1.3)
We are planning to deploy this application on multiple OC4J application servers that will be configured to run in a cluster.
The problem is with JMS message processing in this cluster. We must ensure, that only a single message is being processed in the entire OC4J cluster at a single time. This is required, since the messages have to be processed in chronological order.
Do you know of a configuration parameter, that would control message processing across an OC4J cluster?
Or do you think we have to implement our own synchronisation code that will synchronise the message driven beans across the cluster?

I've done sequential processing of messages in a cluster on a pretty large scale - 1.5 million+ message/day, using a combination of the Competing Consumers pattern and a Lease pattern.
Here's the kicker, though - your requirement that you can only process one trans at a time is going to keep you from achieving your goals. We had the same basic requirement - messages had to be processed in order. At least, we thought we did. Then we had an epiphany - as we gave the problem more thought, we realized that we didn't require total ordering. We actually required ordering only within each account. Therefore, we could distribute the load across the servers in a cluster by assigning ranges of accounts to different servers in the cluster. Then, each server was responsible to process messages for a given account in order.
Here's the second clever part - we used a Lease pattern do dynamically assign account ranges to various servers in the cluster. If one server in the cluster went down, another would grab the lease and take over the first server's responsibility.
This worked for us, and the process lived in production for about 4 years before being replaced due to a company merger.
Edit:
I explain this solution in more detail here: http://coders-log.blogspot.com/2008/12/favorite-projects-series-installment-2.html
Edit:
Okay, gotcha. You're already doing the processing at the level you need, but since you're being deployed to a cluster, you need to make sure that only one instance of your MDB is actively pulling messages from the queue. Plus, you need the simplest workable solution.
You don't need to abandon your MDB mechanism that you have now, I don't think. Essentially what we're talking about here is a requirement for a distributed lock mechanism, not to put too fancy a phrase to it.
So, let me suggest this. At the point where your MDB registers to receive messages from the queue, it should check the distributed lock, and see if it can grab it. The first MDB to grab the lock wins, and only it will register to receive messages. So, now you have your serialization. What form should this lock take? There are many possibilities. Well, how about this. If you have access to a database, its transactional locking already provides some of what you need. Create a table with a single row. In the row is the identifier of the server that currently holds the lock, and an expiration time. This is the server's lease. Each server needs to have a way to generate its unique identifier, perhaps the server name plus a thread ID, for example.
If a server can get update access to the row, and the lease is expired, it should grab it. Otherwise, it gives up. If it grabs the lease, it needs to update the row with a time in the near future, like five minutes or so, and commit the update. The active server should update the lease before it expires. I recommend updating it when there's half the time remaining, so, every 2-1/2 minutes if the lease expires in five. With this, you now have failover. If the active MDB dies, another MDB (and only one) will take over.
That should be pretty straightforward, I think. Now, you want to have the dormant MDBs check the lock occasionally to see if it's freed up.
So, the active MDB and the dormant MDBs all have to do something periodically. You might have them spawn a separate thread to do this. Many application engine vendors won't be happy if you do this, but adding one thread is no big deal, especially since it spends most of its time sleeping. Another option would be to tie into the timer mechanism that many engines provide, and have it wake up your MDB periodically to check the lease.
Oh, and by the way - make sure the server admins employ NTP to keep the clocks reasonably synced.

First point: this is a pretty crappy design and you'll seriously limit performance only being able to process a single message at a time. I assume you are clustering only for fault tolerance, because you won't get performance improvements?
Are you using the default JMS implementation with OC4J or another one?
I've used IBM's MQ in the past and that had a feature that a queue could be marked as exclusive, which meant only one client could connect to it. This would appear to offer what you want.
An alternative would be to introduce a sequence ID (as simple as an incrementing counter) and the client processing the message would check that the sequence ID is the next expected value, if not then the message put back. This approach requires the different clients to persist the last valid sequence ID they've seen in some centrally shared data store, such as a database.

I agree with stevendick: May be you're off track with the design. Regarding sequence ID or similar approachs I suggest you get insight on messaging architectures with Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions (by Gregor Hohpe y Bobby Woolf). It's a great book, plenty of useful patterns... I'm sure the forces and the problem you are facing are well described there.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.