Creating a Java job pool

My Java EE application is the backend service for mobile clients, so the clients must register with the backend service. There is a lot of database processing and there are different kinds of jobs in the registration process. To improve performance I am planning to create a job pool: for example, when a client registers with the backend service, its jobs are pushed to the pool until the pool is full. Once the pool is full, the jobs are processed... Is there a suitable way to implement this idea?
thanks,

What is the reason for waiting until you accumulate a big block instead of quickly processing small chunks? Performance-wise the latter is almost always better, not even speaking of transactions and such. Plus, your clients wait longer than necessary.
If you really want to do it, I'd go for storing all incoming requests in a List, the database, or a queue (whatever you prefer, depending on whether it needs to be persistent), and have a periodic job that checks for new requests and processes them, if needed only once a certain threshold is exceeded.
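A minimal sketch of that second approach, using an in-memory BlockingQueue and a scheduled drain task. The class name and threshold are made up for illustration, and in a Java EE container you would typically obtain the scheduler from the container (e.g. a ManagedScheduledExecutorService) rather than spawn your own threads:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class RegistrationJobPool {

        private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final int threshold;

        public RegistrationJobPool(int threshold) {
            this.threshold = threshold;
            // Check once per second; process only when enough jobs have piled up.
            scheduler.scheduleAtFixedRate(this::drainIfFull, 1, 1, TimeUnit.SECONDS);
        }

        /** Called from the registration code path; never blocks the client. */
        public void submit(Runnable job) {
            pending.add(job);
        }

        private void drainIfFull() {
            if (pending.size() < threshold) {
                return;
            }
            List<Runnable> batch = new ArrayList<>();
            pending.drainTo(batch);   // atomically empties the queue
            for (Runnable job : batch) {
                job.run();            // or hand off to a worker pool
            }
        }

        public void shutdown() {
            scheduler.shutdown();
        }
    }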


Detecting when an EC2 instance will be shutting down (Java SDK)

I posted this on the AWS support forums but haven't received a response, so hoping you guys have an idea...
We have an auto-scaling group which boots up or terminates an instance based on current load. What I'd like to be able to do is detect, on my current EC2 instance, that it's about to be shut down, and finish my work before that happens.
To describe the situation in more detail: we have an auto-scaling group, and each instance reads content from a single SQS queue. Each instance runs multiple threads, each reading from the same SQS queue and processing the data as needed.
I need to know when this instance is about to shut down, so I can stop new threads from reading data and block the shutdown until the remaining data has finished processing.
I'm not sure how I can do this in the Java SDK, and I'm worried my instances will be terminated without my data being processed correctly.
Thanks
Lee
When it wants to scale down, AWS Auto Scaling will terminate your EC2 instances without warning.
There's no way to allow any queue workers to drain before terminating.
If your workers are processing messages transactionally, and you're not deleting messages from SQS until after they have been successfully processed, then this shouldn't be a problem. The processing will stop when the instance is terminated, and the transaction will not commit. The message won't be deleted from the SQS queue, and can be picked up and processed by another worker later on.
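To illustrate that transactional pattern, here is a minimal worker-loop sketch using the AWS SDK for Java (v1); the queue URL and the process method are placeholders. The key point is that DeleteMessage is only called after processing succeeds, so a terminated instance simply lets the visibility timeout expire and another worker retries the message:

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class QueueWorker {

        private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        private final String queueUrl; // placeholder, e.g. from GetQueueUrl

        public QueueWorker(String queueUrl) {
            this.queueUrl = queueUrl;
        }

        public void poll() {
            ReceiveMessageRequest request = new ReceiveMessageRequest(queueUrl)
                    .withMaxNumberOfMessages(10)
                    .withWaitTimeSeconds(20); // long polling

            for (Message message : sqs.receiveMessage(request).getMessages()) {
                process(message);
                // Delete only AFTER successful processing. If the instance is
                // terminated mid-flight, the message becomes visible again when
                // its visibility timeout expires and another worker retries it.
                sqs.deleteMessage(queueUrl, message.getReceiptHandle());
            }
        }

        private void process(Message message) {
            // hypothetical business logic operating on message.getBody()
        }
    }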
The only kind of draining behavior it supports is HTTP connection draining from an ELB: "If connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances".

Passing around DB connection objects between multiple map-reduce jobs

Fundamentally, this question is about: can the same DB connection be used across multiple processes (as different map-reduce jobs really are different, independent processes)?
I know this is a somewhat trivial question, but it would be great if somebody could answer this as well: what happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection? Does it wait for some time, and if so, is there a way to set a timeout for this wait period? I am talking about a Postgres DB in this particular case, and the language used to talk to the DB is Java.
To give you some context: I have multiple map-reduce jobs (about 40 reducers) running in parallel, each wanting to update a Postgres DB. How do I efficiently manage these DB reads/writes from these processes? Note: the DB is hosted on a separate machine, independent of where the map-reduce jobs are running.
Connection pooling is one option, but it can be very inefficient at times, especially at several reads/writes per second.
Can the same DB connection be used across multiple processes
No, not in any sane or reliable way. You could use a broker process, but then you'd be one step away from inventing a connection pool anyway.
What happens if the maximum number of connections to the DB (which is preconfigured on the server hosting the DB) has been exhausted and a new process tries to get a new connection?
The connection attempt fails with SQLSTATE 53300 too_many_connections. If it waited, the server could exhaust other limits and begin to have issues servicing existing clients.
For a problem like this you'd usually use tools like C3P0 or DBCP that do in-JVM pooling, but this won't work when you have multiple JVMs.
What you need to do is to use an external connection pool like PgBouncer or PgPool-II to maintain a set of lightweight connections from your workers. The pooler then has a smaller number of real server connections and shares those between the lightweight connections from clients.
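From the Java side, connecting through such a pooler is just a JDBC URL change; a tiny sketch, assuming PgBouncer on its default listen_port 6432 and a hypothetical host name:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class PoolerConnect {
        public static Connection open() throws SQLException {
            // Point the JDBC URL at PgBouncer instead of PostgreSQL itself.
            // "pooler-host" is a placeholder; 6432 is PgBouncer's default listen_port.
            return DriverManager.getConnection(
                    "jdbc:postgresql://pooler-host:6432/mydb", "user", "secret");
        }
    }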
Connection pooling is typically more efficient than not pooling, because it allows you to optimise the number of active PostgreSQL worker processes to the hardware and workload, providing admission control for work.
An alternative is to have a writer process with one or more threads (one connection per thread) that takes finished work from the reduce workers and writes to the DB, so the reduce workers can get on to their next unit of work. You'd need to have a way to tell the reduce workers to wait if the writer got too far behind. There are several Java queueing system implementations that would be suitable for this, or you could use JMS.
See IPC Suggestion for lots of small data
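A rough sketch of that writer-process idea, with a bounded in-memory queue standing in for the queueing system: reduce workers block in submit() when the writer falls behind, which provides the back-pressure mentioned above. Class, table, and host names are invented for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class DbWriter implements Runnable {

        // Bounded queue: reduce workers block in submit() if the writer
        // falls behind, giving you the back-pressure mentioned above.
        private final BlockingQueue<Result> queue = new ArrayBlockingQueue<>(10_000);

        /** Called by reduce workers; blocks when the writer is too far behind. */
        public void submit(Result r) throws InterruptedException {
            queue.put(r);
        }

        @Override
        public void run() {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://db-host/mydb", "user", "secret");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO results(key, value) VALUES (?, ?)")) {
                while (!Thread.currentThread().isInterrupted()) {
                    Result r = queue.take();
                    ps.setString(1, r.key);
                    ps.setString(2, r.value);
                    ps.executeUpdate();
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        public static class Result {
            final String key, value;
            public Result(String key, String value) { this.key = key; this.value = value; }
        }
    }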
It's also worth optimizing how you write to PostgreSQL as much as possible, using the following (a batching sketch appears after this list):
Prepared statements
A commit_delay
synchronous_commit = 'off' if you can afford to lose a few transactions if the server crashes
Batching work into bigger transactions
COPY or multi-valued INSERTs to insert blocks of data
Decent hardware with a useful disk subsystem, not some Amazon EC2 instance with awful I/O or a RAID 5 box with 5400rpm disks
A proper RAID controller with battery backed write-back cache to reduce the cost of fsync(). Most important if you can't do big batches of work or use a commit delay; has less impact if your fsync rate is lower because of batching and group commit.
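As a sketch of the prepared-statement and batching points above (the table name is invented; the PostgreSQL JDBC driver can also rewrite such batches into multi-valued INSERTs if you set reWriteBatchedInserts=true on the connection URL):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchInsert {
        /** Inserts many rows with one prepared statement in one transaction. */
        public static void insertBatch(Connection conn, List<String[]> rows)
                throws SQLException {
            conn.setAutoCommit(false); // one transaction for the whole batch
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO results(key, value) VALUES (?, ?)")) {
                for (String[] row : rows) {
                    ps.setString(1, row[0]);
                    ps.setString(2, row[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }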
See:
http://www.postgresql.org/docs/current/interactive/populate.html
http://www.depesz.com/index.php/2007/07/05/how-to-insert-data-to-database-as-fast-as-possible/

JMS Multiple threads processing

We have a web application that spawns some 3-5 parallel threads every five seconds to connect to a JMS/JNDI connection pool. We wait for the first batch of parallel threads to complete before creating the next batch. During this process we generate a lot of network traffic, and the connection threads just hang. Eventually we manually call the operations team to kill the connection threads to free up connections.
The questions I wanted to ask are:
Obviously we are doing something wrong, as we are holding up connection resources.
Does waiting for one batch of parallel threads to respond before sending the next batch of requests not align with industry best practice?
Finally, what options and recommendations do you have for this scenario, i.e. multiple threads connecting to JMS/JNDI?
Thanks for your inputs
You need to adjust your connection pool parameters. It sounds like you're using up only 3-5 connections for your service, which seems very reasonable to me. A JMS service should be able to handle thousands of connections. Either your pool's default limit is too low, or your JMS server is configured with too few allowed connections.
Are you sure that's what the other users are blocking on? It seems strange to me.
I'm almost sure you would be all right with a single connection factory. Just make sure you clean up/close sessions properly. We use Spring's SingleConnectionFactory.
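For illustration, a minimal programmatic setup along those lines; the underlying ActiveMQ factory and broker URL are assumptions, substitute whatever your JNDI lookup returns:

    import javax.jms.ConnectionFactory;
    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.springframework.jms.connection.SingleConnectionFactory;
    import org.springframework.jms.core.JmsTemplate;

    public class JmsSetup {
        public static JmsTemplate jmsTemplate() {
            // The provider factory is an assumption (ActiveMQ here); in a
            // container you would use whatever the JNDI lookup returns.
            ConnectionFactory target = new ActiveMQConnectionFactory("tcp://broker:61616");

            // One shared JMS Connection for the whole application; sessions
            // are still created and closed per operation.
            SingleConnectionFactory single = new SingleConnectionFactory(target);
            single.setReconnectOnException(true);

            // JmsTemplate closes sessions and producers after each send.
            return new JmsTemplate(single);
        }
    }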

How could I quickly know if a server is online?

I'm developing a Java client/server application in which there will be a great number of servers to which the clients will have to connect. The problem is that the vast majority of them will probably not be serving at any given time. The client needs to find at least one available server in the list, so it will iterate over it, looking for an available server (when it finds the first one it stops; one is enough).
The problem is that the list will probably be long: tens of thousands of entries, possibly even hundreds of thousands... and it may happen that only 1% of them are connected (i.e. running the server). That's why I need a clever and fast way to know whether a server is connected, without waiting for timeouts or the like. I'm open to all kinds of suggestions.
I have thought about ordering the server list statistically, so that the servers that are available most often are attempted first. But this is not enough.
Perhaps multicast UDP datagrams? The connections between clients/servers are TCP, but perhaps to find a server it's better to do a UDP multicast first and wait for the answer, for example... what do you think?
:)
EDIT:
Both the server and client use thread pools.
The server pool handles 200 threads concurrently, and when the pool is full it queues the rest, until the queue is 200 runnables long. Then it blocks and stops accepting connections until there is free room in the queue again.
The client has a cached thread pool; it can make as many concurrent requests to the server as you want (within reason, obviously...).
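For reference, that server-side behavior (200 workers, a 200-slot queue, then block) can be expressed with a plain ThreadPoolExecutor and a rejection handler that blocks the submitter; a sketch with the sizes taken from the description above:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedServerPool {
        public static ThreadPoolExecutor create() {
            return new ThreadPoolExecutor(
                    200, 200,                      // 200 concurrent workers
                    0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<>(200), // queue up to 200 runnables
                    // Pool and queue both full: block the submitting thread
                    // instead of throwing RejectedExecutionException.
                    (runnable, executor) -> {
                        try {
                            executor.getQueue().put(runnable);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
        }
    }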
This is just an initial thought and would add some overhead, but you could have the servers periodically ping some centralized server through which the clients would connect. Then, if a server doesn't ping for some set time, it gets removed.
You might want to use a peer-to-peer network.
Have a look at JXTA/JXSE:
http://jxse.kenai.com/index.html
If it is your own code which is running on each of these servers, could you send a keep-alive to a central server (which is controlled by you and is guaranteed to be up at all times)? The central server can then maintain an updated list of all servers which are active. The client just needs a copy of this list from the central server and can then start whatever communication it needs.
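A minimal sketch of that keep-alive idea; the registry host, port, and interval are made up, and a real version would need authentication and failure handling:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class Heartbeat {
        public static void start(String serverId) {
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
                try (DatagramSocket socket = new DatagramSocket()) {
                    byte[] payload = ("ALIVE " + serverId).getBytes(StandardCharsets.UTF_8);
                    socket.send(new DatagramPacket(payload, payload.length,
                            InetAddress.getByName("registry.example.com"), 9999));
                } catch (Exception e) {
                    // registry unreachable; the next heartbeat will retry
                }
            }, 0, 10, TimeUnit.SECONDS); // registry drops servers silent for, say, 30s
        }
    }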
Sounds like a job for threads. You cannot speed up the connection; it takes time to contact the server.
IMHO, the best way is to have a few hundred threads march through the list of servers. The first one to find a live server wins; then signal the other threads to die out.
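A sketch of that approach using ExecutorService.invokeAny, which returns the first probe to succeed and cancels the rest (the pool size, port, and timeout are placeholders):

    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ServerFinder {

        /** Probes all hosts in parallel; returns the first that accepts a TCP connect. */
        public static String findAlive(List<String> hosts, int port) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(200);
            try {
                List<Callable<String>> probes = new ArrayList<>();
                for (String host : hosts) {
                    probes.add(() -> {
                        try (Socket s = new Socket()) {
                            s.connect(new InetSocketAddress(host, port), 500); // short timeout
                            return host; // this probe "wins"
                        }
                    });
                }
                // invokeAny returns the first task to succeed and cancels the
                // rest, i.e. it signals the other threads to die out. It throws
                // if every probe fails.
                return pool.invokeAny(probes);
            } finally {
                pool.shutdownNow();
            }
        }
    }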
Btw, did you really mean to order the server list "sadistically"? :)

JMS queue is full

My Java EE application sends JMS messages to a queue continuously, but sometimes the JMS consumer application stops receiving them. This causes the JMS queue to grow very large, even completely full, which brings down the server.
My server is JBoss or WebSphere. Do the application servers provide a strategy for removing "timed-out" JMS messages?
What is a good strategy for handling a large JMS queue? Thanks!
With any asynchronous messaging you must deal with the "fast producer/slow consumer" problem. There are a number of ways to deal with this.
Add consumers. With WebSphere MQ you can trigger a queue based on depth. Some shops use this to add new consumer instances as queue depth grows. Then as queue depth begins to decline, the extra consumers die off. In this way, consumers can be made to automatically scale to accommodate changing loads. Other brokers generally have similar functionality.
Make the queue and underlying file system really large. This method attempts to absorb peaks in workload entirely in the queue. This is after all what queuing was designed to do in the first place. Problem is, it doesn't scale well and you must allocate disk that 99% of the time will be almost empty.
Expire old messages. If the messages have an expiry set then you can cause them to be cleaned up. Some JMS brokers will do this automatically while on others you may need to browse the queue in order to cause the expired messages to be deleted. Problem with this is that not all messages lose their business value and become eligible for expiry. Most fire-and-forget messages (audit logs, etc.) fall into this category.
Throttle back the producer. When the queue fills, nothing can put new messages to it. In WebSphere MQ the producing application then receives a return code indicating that the queue is full. If the application distinguishes between fatal and transient errors, it can stop and retry (see the retry sketch below).
The key to successfully implementing any of these is that your system be allowed to provide "soft" errors that the application will respond to. For example, many shops will raise the MAXDEPTH parameter of a queue the first time they get a QFULL condition. If the queue depth exceeds the size of the underlying file system the result is that instead of a "soft" error that impacts a single queue the file system fills and the entire node is affected. You are MUCH better off tuning the system so that the queue hits MAXDEPTH well before the file system fills but then also instrumenting the app or other processes to react to the full queue in some way.
But no matter what else you do, option #4 above is mandatory. No matter how much disk you allocate, how many consumer instances you deploy, or how quickly you expire messages, there is always a possibility that your consumer(s) won't keep up with message production. When this happens your producer app should throttle back, or raise an alarm and stop, or do anything other than hang or die. Asynchronous messaging is only asynchronous up to the point that you run out of space to queue messages. After that your apps are synchronous and must gracefully handle that situation, even if that means a (graceful) shutdown.
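As a sketch of option #4, here is a producer-side send wrapper that backs off and retries instead of hanging or dying. Deciding which JMSExceptions are transient (e.g. queue full) is provider-specific, so this naive version simply retries everything up to a cap:

    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageProducer;

    public class ThrottlingSender {

        /** Retries a send with exponential backoff instead of hanging or dying. */
        public static void sendWithBackoff(MessageProducer producer, Message message)
                throws JMSException, InterruptedException {
            long delayMs = 100;
            final long maxDelayMs = 60_000;
            while (true) {
                try {
                    producer.send(message);
                    return;
                } catch (JMSException e) {
                    // Deciding transient vs. fatal is provider-specific; this
                    // naive version retries everything, backing off so the
                    // consumers get a chance to catch up.
                    if (delayMs >= maxDelayMs) {
                        throw e; // give up and raise the alarm
                    }
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
    }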
Sure!
http://download.oracle.com/docs/cd/E17802_01/products/products/jms/javadoc-102a/index.html
MessageProducer#setTimeToLive(long) does exactly what you want: the provider then computes each sent message's JMSExpiration from it. (Note that Message#setJMSExpiration itself is reserved for JMS providers; values set by clients before sending are overwritten.)
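A minimal sketch (connection and destination setup omitted): set the time-to-live on the producer, and the provider stamps each message's JMSExpiration at send time:

    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class ExpiringSend {
        public static void send(Session session, MessageProducer producer)
                throws JMSException {
            // Messages not consumed within 5 minutes become eligible for removal.
            producer.setTimeToLive(5 * 60 * 1000L);
            TextMessage msg = session.createTextMessage("payload");
            producer.send(msg); // provider computes JMSExpiration = send time + TTL
        }
    }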
