I'm new to threading, so I want to understand what is happening behind the scenes when you create a bunch of Threads in a loop and the implications/better ways of doing it.
Here's an example:
for (Page page : book) {
    Thread t = new Thread(new Runnable() {
        public void run() {
            // http request to get page and put into concurrent data structure
        }
    });
    t.start();
    threads.add(t);
}
//wait for threads
As you can probably see, in my specific use case right now, I am paging through objects that I request via HTTP. I know there don't necessarily need to be threads here and that I could instead make async requests, but I'd like to understand (with explanations) how this could be improved.
In your example you are creating and starting a new thread for each Page object you have in your book. This is not useful if you have many more pages than cores on your system.
It's also rather low-level these days to directly create, start, and keep track of threads yourself.
A better solution would be to use an ExecutorService and create a number of threads close to the number of cores on the system (for I/O-bound tasks you may want to create more threads than that; see the comments below this answer).
For example:
final ExecutorService e =
    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (Page page : book) {
    e.submit(new Runnable() {
        public void run() {
            // http request to get page and put into concurrent data structure
        }
    });
}
You'd then wait for your ExecutorService to finish its jobs.
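For example, a minimal sketch of that wait step, reusing the e executor from above (awaitTermination throws InterruptedException, so the enclosing method has to handle or declare it; the 5-minute timeout is an arbitrary choice):
e.shutdown();                                   // stop accepting new tasks
if (!e.awaitTermination(5, TimeUnit.MINUTES)) { // block until the submitted tasks finish
    e.shutdownNow();                            // timed out: interrupt whatever is still running
}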
Note that depending on the server you're fetching your information from, you may need to deliberately add delays so as not to "hammer" the server too much.
Certain websites will tell you how often you can query them (for example the Bitstamp bitcoin exchange allows one query per second) and will ban your IP if you don't respect the delay. Others won't tell you anything and will simply ban your IP if they detect that you're leeching too fast.
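A minimal sketch of one way to add such a delay, using a ScheduledExecutorService (from java.util.concurrent) to space the requests roughly a second apart; the one-second interval and the fetchPage method are placeholders for your own rate limit and HTTP call:
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
long delaySeconds = 0;
for (final Page page : book) {
    // schedule each request one second after the previous one
    scheduler.schedule(new Runnable() {
        public void run() {
            fetchPage(page); // placeholder for the HTTP request + storing the result
        }
    }, delaySeconds++, TimeUnit.SECONDS);
}
scheduler.shutdown(); // already-scheduled tasks still run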
Related
I'm new to SO/Java software development, and I've been searching for this without much success.
My question is --in Java-- is it possible to run one main statement across many threads at once? I am writing a native Java application in order to load test a server. The process for this is to have a bunch of threads running at once to simulate users. These threads read from a certain file, get various UIDs, manipulate some standard data, and send this to a queue on the server. After the thread sends the data, we start pulling data from the response queue, and each of the threads that have already sent their data start checking against the UID of the newly returned data, and if it matches, the process outputs the round trip time and terminates.
Algorithmically, that is what I plan to implement; however, I don't have much experience with concurrency and using multiple threads, so I'm not sure how I would be able to make the threads run this process. I've seen other work where an array of WorkerThreads is used, and I've read the API for Thread and various tutorials on concurrency. Any guidance would be helpful.
Thank you!
The recommended way to implement concurrent workers is to use an Executor service. The pattern is something like this:
ExecutorService pool = Executors.newFixedThreadPool(poolSize);
...
while (...) {
    final int someParameter = ...
    pool.submit(new Runnable() {
        public void run() {
            // do something using 'someParameter'
        }
    });
}
This approach saves you the complicated business of creating and managing a thread pool by hand.
There are numerous variations; see the javadocs for Executors and ExecutorService.
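Applied to your load-test scenario, a rough sketch: each simulated user could be a Callable<Long> that returns its round-trip time, and the Futures collect the results. Here sendAndAwaitResponse, uids and numberOfUsers are hypothetical placeholders for your own logic, and Future.get() throws checked exceptions that the enclosing method would need to handle:
ExecutorService pool = Executors.newFixedThreadPool(numberOfUsers);
List<Future<Long>> results = new ArrayList<Future<Long>>();
for (final String uid : uids) {
    results.add(pool.submit(new Callable<Long>() {
        public Long call() throws Exception {
            long start = System.nanoTime();
            sendAndAwaitResponse(uid);        // hypothetical: send the data, wait for the matching UID
            return System.nanoTime() - start; // round-trip time in nanoseconds
        }
    }));
}
for (Future<Long> f : results) {
    System.out.println("round trip (ms): " + f.get() / 1000000L); // get() blocks until that task is done
}
pool.shutdown();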
I would like to send an email and update activity logs after a profile is updated successfully in my web application. For sending mails and updating activity logs, I would like to use threads so that the profile-update response can be sent back to the client immediately and the subsequent operations can be taken care of by threads. Please suggest an implementation.
There are numerous ways to achieve this, the fact that it's a Spring MVC application is almost irrelevant.
If you're using Java 8 then you can simply call upon the executor service to give you a thread from its pool:
String emailAddress = //get email address...
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(() -> {
    emailService.sendNotification(emailAddress);
});
Pre-Java 8:
final String emailAddress = "";
Thread thread = new Thread(new Runnable() {
    @Override
    public void run() {
        emailService.sendNotification(emailAddress);
    }
});
thread.start();
If you're creating a more complex application then you should look into possibly using a message queue (ActiveMQ is good). This allows you more control and visibility and scales well as you add more asynchronous tasks. It also means you won't starve your application server of threads if there are lots of registrations at the same time.
You can use a BlockingQueue and implement a producer-consumer model to solve the problem. Your existing program acts as the producer, adding a token to the BlockingQueue, and an executor (created via Executors.newFixedThreadPool) can do all your subsequent operations. You can refer to the Javadocs and create your Spring context (as XML or annotations).
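A minimal sketch of that producer-consumer idea, assuming a simple String token carrying the email address (activityLogService and updatedEmailAddress are hypothetical names; emailService is from the snippets above):
// shared queue: the web request thread is the producer, the pool threads are the consumers
final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
ExecutorService consumers = Executors.newFixedThreadPool(2);
for (int i = 0; i < 2; i++) {
    consumers.submit(new Runnable() {
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String emailAddress = queue.take();   // blocks until a token is available
                    emailService.sendNotification(emailAddress);
                    activityLogService.log(emailAddress); // hypothetical activity-log call
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();       // restore the flag and exit
            }
        }
    });
}
// ... in the profile-update handler, after a successful save:
queue.offer(updatedEmailAddress); // returns immediately; a consumer thread picks it up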
You can also look at CompletionService.
Spawning a new thread each time a profile is saved is not a good idea, because it might result in too many threads, and the context switching might delay completion; hence the suggestion to use a fixed thread pool.
A JMS queue could be used, but it looks like overkill for the given scenario; hence the suggestion to use a BlockingQueue.
What's the recommended way of starting a thread from a servlet?
Example: One user posts a new chat message to a game room. I want to send a push notification to all other players connected to the room, but it doesn't have to happen synchronously. Something like:
public class MyChatServlet extends HttpServlet {
    protected void doPost(HttpServletRequest request,
                          HttpServletResponse response)
    {
        // Update the database with the new chat message.
        final String msg = ...;
        putMsgInDatabaseForGameroom(msg);
        // Now spawn a thread which will deal with communicating
        // with apple's apns service, this can be done async.
        new Thread() {
            public void run() {
                talkToApple(msg);
                someOtherUnimportantStuff(msg);
            }
        }.start();
        // We can send a reply back to the caller now.
        // ...
    }
}
I'm using Jetty, but I don't know if the web container really matters in this case.
Thanks
What's the recommended way of starting a thread from a servlet?
You should be very careful when writing threading code in a servlet, because errors (like memory leaks or missing synchronization) can cause bugs that are very hard to reproduce, or can bring down the whole server.
You can start a thread with the start() method, but as far as I know the recommended approach is startAsync (Servlet 3.0).
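A minimal sketch of the startAsync approach, assuming a Servlet 3.0 container and a servlet declared with asyncSupported = true (talkToApple and putMsgInDatabaseForGameroom are the placeholders from the question):
protected void doPost(HttpServletRequest request, HttpServletResponse response) {
    final String msg = request.getParameter("msg"); // stand-in for the real message
    putMsgInDatabaseForGameroom(msg);

    final AsyncContext async = request.startAsync();
    async.start(new Runnable() {        // runs on a container-managed thread
        public void run() {
            talkToApple(msg);           // slow work happens off the request thread
            async.complete();           // the response is committed here
        }
    });
    // doPost returns immediately and its thread goes back to the container's pool
}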
but I don't know if the web container really matters in this case.
Yes, it matters. Most web servers (Java and otherwise, including JBoss) follow a "one thread per request" model, i.e. each HTTP request is fully processed by exactly one thread.
This thread will often spend most of its time waiting for things like DB requests. The web container will create new threads as necessary.
Hope it will help you.
I would use a ThreadPoolExecutor and submit the tasks to it; see the sketch after the list below. The executor can be configured with a fixed/varying number of threads, and with a work queue that can be bounded or not.
The advantages:
The total number of threads (as well as the queue size) can be bounded, so you have good control on resource consumption.
Threads are pooled, eliminating the overhead of thread starting per request
You can choose a task rejection policy (Occurs when the pool is at full capacity)
You can easily monitor the load on the pool
The executor mechanism supports convenient ways of tracking the asynchronous operation (using Future)
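For instance, a minimal sketch of such a bounded pool (the pool sizes, queue capacity and rejection policy are arbitrary choices for illustration; msg and talkToApple come from the question's snippet):
// 2 core threads, at most 4, and a work queue bounded at 100 tasks
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        2, 4,
        60L, TimeUnit.SECONDS,                       // idle threads above the core size die after 60s
        new ArrayBlockingQueue<Runnable>(100),
        new ThreadPoolExecutor.CallerRunsPolicy());  // when full, the submitting thread runs the task itself

Future<?> result = executor.submit(new Runnable() {
    public void run() {
        talkToApple(msg); // the async work from the question
    }
});
// result.isDone() / result.get() can be used later to track the task if needed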
In general that is the way; you can start a thread anywhere in a servlet web application.
In particular, though, you should protect your JVM from starting too many threads per HTTP request. Someone may send a lot (or a very large number) of requests, and at some point your JVM will probably stop with an out-of-memory error or something similar.
So a better choice is to use one of the queues found in the java.util.concurrent package.
One option would be to use ExecutorService and its implementations such as ThreadPoolExecutor, to reuse the pooled threads and thus reduce the creation overhead.
You can also use JMS for queuing your tasks to be executed later.
We have an application which processes a queue of documents (basically all the documents found in an input directory). The documents are read in one by one and are then processed. The application is an obvious candidate for threading since the results from processing one document are completely independent from the results of processing any other document. The question I have is how to divide the work.
One obvious way to split the work is to count the number of documents in the queue, divide by the number of available processors and split the work accordingly (for example, the queue has 100 documents and I have 4 available processors; I create 4 threads and feed 25 documents from the queue to each thread).
However, a coworker suggests that I can just spawn a thread for each document in the queue and let the JVM sort it out. I don't understand how this could work. I do get that the second method results in cleaner code, but is it as efficient as (or even more efficient than) the first method?
Any thoughts would be appreciated.
Elliott
We have an application which processes a queue of documents ... how to divide the work?
You should use the great ExecutorService classes. Something like the following would work. You would submit each of your files to the thread-pool and they will be processed by the 10 working threads.
// create a pool with 10 threads
ExecutorService threadPool = Executors.newFixedThreadPool(10);
for (String file : files) {
    threadPool.submit(new MyFileProcessor(file));
}
// shutdown the pool once you've submitted your last job
threadPool.shutdown();
...
public class MyFileProcessor implements Runnable {
    private final String file;
    public MyFileProcessor(String file) {
        this.file = file;
    }
    public void run() {
        // process the file
    }
}
In general, there are three ways to do work-splitting among threads.
First, static partitioning. This is where you count and divide the documents statically (i.e., without taking into account how long it will take to process each document). This approach is very efficient (and often easy to code); however, it can result in poor performance if documents take different amounts of time to process. One thread can accidentally get stuck with all the long documents, which means it will run the longest and your parallelism will be limited.
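For illustration, a minimal sketch of static partitioning that splits the documents (assumed to be in a List<String> named documents) into contiguous chunks, one per thread; processDocuments is a hypothetical stand-in for your per-document work, and Thread.join() throws InterruptedException, which the enclosing method would need to handle:
int numThreads = Runtime.getRuntime().availableProcessors();
int chunkSize = (documents.size() + numThreads - 1) / numThreads; // round up
List<Thread> workers = new ArrayList<Thread>();
for (int i = 0; i < documents.size(); i += chunkSize) {
    final List<String> chunk =
            documents.subList(i, Math.min(i + chunkSize, documents.size()));
    Thread t = new Thread(new Runnable() {
        public void run() {
            processDocuments(chunk); // hypothetical: process this thread's share
        }
    });
    t.start();
    workers.add(t);
}
for (Thread t : workers) {
    t.join(); // wait for all chunks to finish
}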
Second, dynamic partitioning (you did not mention this). Spawn a fixed number of threads and let each thread work in a simple loop:
While not done:
Dequeue a document
Process document
In this manner you avoid the load imbalance. You incur the overhead of accessing the queue after processing each document, but that will be negligible as long as each document's processing is substantially longer than a queue access (which, I think, it will be in your case).
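A minimal sketch of that worker loop, assuming the documents (a List<String> named documents) are pre-loaded into a BlockingQueue and using poll() to stop once the queue is drained; processDocument is a hypothetical placeholder for your per-document work:
final BlockingQueue<String> queue = new LinkedBlockingQueue<String>(documents);
int numThreads = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(numThreads);
for (int i = 0; i < numThreads; i++) {
    pool.submit(new Runnable() {
        public void run() {
            String doc;
            while ((doc = queue.poll()) != null) { // null means the queue is empty, so this worker stops
                processDocument(doc);              // hypothetical per-document work
            }
        }
    });
}
pool.shutdown(); // the pool winds down once the queue is drained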
Third, let the JVM do your work-scheduling. This is where you spawn N threads and let them fight it out. This approach is rather simple, but its downside is that you will rely heavily on the JVM's thread scheduling, and it can be very slow if the JVM doesn't do a great job at it. Having too many threads that thrash each other can be very slow. I hope the JVM is better than that, so this may be worth a try.
Hope this helps.
Don't spawn a thread for each document; instead, schedule a Runnable task on a thread pool that has, e.g., as many threads as processors.
You don't need to split the documents that way. Just create a fixed number of worker threads (i.e. create two worker threads using Executors.newFixedThreadPool(2)), and each can only process one document at a time. When it has finished processing one document, it grabs a new document from a shared list.
I have a Spring MVC, Hibernate (Postgres 9 DB) web app. An admin user can send in a request to process nearly 200,000 records (each record collected from various tables via joins). Such an operation is requested on a weekly or monthly basis (or whenever the data reaches a limit of around 200,000/100,000 records). On the database end, I am correctly implementing batching.
PROBLEM: Such a long-running request holds up the server thread and that causes the normal users to suffer.
REQUIREMENT: The high response time of this request is not an issue. What's required is to not make other users suffer because of this time-consuming process.
MY SOLUTION:
Implementing a thread pool using Spring's taskExecutor abstraction. So I can initialize my thread pool with say 5 or 6 threads and break the 200,000 records into smaller chunks, say of size 1000 each. I can queue in these chunks. To further allow the normal users to have faster DB access, maybe I can make every runnable thread sleep for 2 or 3 seconds. A sketch of what I have in mind is below.
The advantage of this approach as I see it: instead of executing a huge DB-interacting request in one go, we have an asynchronous design spanning over a longer time, thus behaving like multiple normal user requests.
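Roughly, a sketch of the setup I have in mind using Spring's ThreadPoolTaskExecutor (partition, processChunk, Record and the pool sizes are placeholders, not working code from my app):
// Spring's ThreadPoolTaskExecutor wraps a java.util.concurrent thread pool
ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
taskExecutor.setCorePoolSize(5);
taskExecutor.setMaxPoolSize(6);
taskExecutor.setQueueCapacity(250); // 200,000 records in chunks of 1000 = 200 queued tasks
taskExecutor.initialize();

for (final List<Record> chunk : partition(allRecords, 1000)) { // hypothetical partition helper
    taskExecutor.execute(new Runnable() {
        public void run() {
            processChunk(chunk);       // hypothetical batched DB work
            try {
                Thread.sleep(2000);    // back off so normal users get DB time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    });
}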
Can some experienced people please give their opinion on this?
I have also read about implementing the same behaviour with message-oriented middleware like JMS/AMQP, or with Quartz scheduling. But frankly speaking, I think internally they are also going to do the same thing, i.e. make a thread pool and queue in the jobs. So why not go with Spring's taskExecutor instead of adding a completely new infrastructure in my web app just for this feature?
Please share your views on this and let me know if there are other, better ways to do this.
Once again: the time to completely process all the records is not a concern; what's required is that normal users accessing the web app during that time should not suffer in any way.
You can parallelize the tasks and wait for all of them to finish before returning from the call. For this, you want to use ExecutorCompletionService, which has been available in the Java standard library since 5.0.
In short, you use your container's service locator to create an instance of ExecutorCompletionService
ExecutorCompletionService<List<MyResult>> queue =
        new ExecutorCompletionService<List<MyResult>>(executor);
// do this in a loop
queue.submit(aCallable);
// after looping: call take() once per submitted task;
// each take() blocks until the next task finishes
queue.take().get();
If you do not want to wait, you can process the jobs in the background without blocking the current thread, but then you will need some mechanism to inform the client when the job has finished. That can be through JMS, or, if you have an Ajax client, it can poll for updates.
Quartz also has a job-scheduling mechanism, but Java provides a standard way.
EDIT:
I might have misunderstood the question. If you do not want a faster response but rather want to throttle the CPU, use this approach instead.
You can make an inner class like the PollingThread below, where the batches (a java.util.UUID per job) and the number of PollingThreads are defined in the outer class. It will keep running forever and can be tuned to keep your CPUs free to handle other requests.
class PollingThread implements Runnable {
    @SuppressWarnings("unchecked")
    public void run() {
        Thread.currentThread().setName("MyPollingThread");
        while (!Thread.interrupted()) {
            try {
                LinkedHashSet<UUID> list = null;
                synchronized (incomingList) {      // incomingList is defined in the outer class
                    if (incomingList.size() == 0) {
                        // incoming is empty, wait for some time
                    } else {
                        // copy the batch and clear the original
                        list = (LinkedHashSet<UUID>) incomingList.clone();
                        incomingList.clear();
                    }
                }
                if (list != null && list.size() > 0) {
                    processJobs(list);
                }
                // Sleep for some time
                try {
                    Thread.sleep(seconds * 1000);  // 'seconds' is defined in the outer class
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            } catch (Throwable e) {
                // ignore and keep polling
            }
        }
    }
}
Huge DB operations are usually triggered in the wee hours, when user traffic is pretty low (say something like 1 AM to 2 AM). Once you find that out, you can simply schedule a job to run at that time. Quartz can come in handy here, with time-based triggers. (Note: manually triggering a job is also possible.)
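For instance, a minimal Quartz 2.x sketch of such a time-based trigger (HugeDbJob is a hypothetical org.quartz.Job implementation that runs the ETL; the scheduler calls throw SchedulerException, which the caller would handle):
JobDetail job = JobBuilder.newJob(HugeDbJob.class)
        .withIdentity("hugeDbJob")
        .build();
Trigger trigger = TriggerBuilder.newTrigger()
        .withIdentity("nightlyTrigger")
        .withSchedule(CronScheduleBuilder.cronSchedule("0 0 1 * * ?")) // every day at 1 AM
        .build();

Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
scheduler.scheduleJob(job, trigger);
scheduler.start();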
The processed results could now be stored in different table(s) (I'll refer to them as result tables). Later, when a user wants these results, the DB operations would be against these result tables, which have minimal records and hardly any joins involved.
instead of adding a completely new infrastructure in my web app just for this feature?
Quartz.jar is ~350 kB and adding this dependency shouldn't be a problem. Also note that there's no reason this needs to be part of the web app. The few classes that do the ETL could be placed in a standalone module; the web app's request then only needs to fetch from the result tables.
All this apart, if you already have a master-slave DB model (discuss that with your DBA), then you could run the huge DB operations against the slave DB rather than the master, which the normal users would be pointed to.