Need help with Java multithreading
I have the following case:
There are many records. Each record has about 250 fields. Each field needs to be validated against a predefined rule.
So I defined a class, FieldInfo, to represent each field:
public class FieldInfo {
private String name;
private String value;
private String error_code;
private String error_message;
// ignore getters and setters
}
a class Record to represent a record:
public class Record {
List<FieldInfo> fields;
// omit getter and setter here
}
and the rule interface and class:
public interface BusinessRule {
// validating one field needs some other fields' value in the same record. So the list of all fields for a certain record passed in as parameter
public FieldInfo validate(List<FieldInfo> fields);
}
public class FieldName_Rule implements BusinessRule {
public FieldInfo validate(List<FieldInfo> fields) {
// will do
// 1. pickup those fields required for validating this target field, including this target field
// 2. performs validation logics A, B, C...
// note: all rules only read data from a database, no update/insert operations.
}
}
A user can submit 5000 records or more at a time for processing, and the performance requirement is high. My idea was to process the submitted records on multiple threads (each thread handles several records) and, inside each of those threads, to fork another set of threads that run the rules for one record.
Unfortunately, this nested ("embedded") multithreading always died in my case.
Here are some key parts from the above solution:
public class BusinessRuleService {
    @Autowired
    private ValidationHandler handler;
    public String process(String xmlRequest) {
        List<Record> records = XmlConverter.unmarshall(xmlRequest).toList();
        ExecutorService es = Executors.newFixedThreadPool(100);
        List<CompletableFuture<Integer>> futures =
            records.stream().map(r -> CompletableFuture.supplyAsync(() -> handler.invoke(r), es)).collect(Collectors.toList());
        List<Integer> result = futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        System.out.printf("total records %d processed.%n", result.size());
        es.shutdown();
        return XmlConverter.marshallObject(records);
    }
}
@Component
public class ValidationHandlerImpl implements ValidationHandler {
    @Autowired
    private List<BusinessRule> rules;
    @Override
    public int invoke(Record record) {
        ExecutorService es = Executors.newFixedThreadPool(250);
        List<CompletableFuture<FieldInfo>> futures =
            rules.stream().map(r -> CompletableFuture.supplyAsync(() -> r.validate(record.getFields()), es)).collect(Collectors.toList());
        List<FieldInfo> result = futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        System.out.printf("total records %d processed.%n", result.size());
        es.shutdown();
        return 0;
    }
}
The workflow is:
The user submits a list of records as an XML string. One of the application's endpoints calls the process method of a BusinessRuleService object. The process method uses CompletableFuture to compose tasks and submits them to an ExecutorService that has a thread pool of size 100. Each task in the CompletableFuture list then calls a ValidationHandler object. The ValidationHandler composes another list of CompletableFuture tasks and submits them to a second ExecutorService whose pool size equals the size of the rule list.
Is the above solution appropriate?
Note: in my current solution the submitted records are processed in sequence, and the 250 rules are processed in parallel for each record. With this solution it takes more than 2 hours to process 5000 records. Such poor performance is not acceptable to the business.
I am very new to concurrent/multithreaded programming.
Any kind of help is much appreciated!
This is the well-known "single producer, multiple consumers" pattern. The classic solution is to create a BlockingQueue<Record> queue and put records into it as they are read. On the other end of the queue, a number of worker threads take records from the queue and process them (in our case, validate the fields):
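class ValidatingThread extends Thread {
    private final BlockingQueue<Record> queue;
    private final FieldName_Rule validator = new FieldName_Rule();
    public ValidatingThread(BlockingQueue<Record> queue) {
        this.queue = queue;
    }
    @Override
    public void run() {
        try {
            while (true) { // keep consuming; a stop condition is still needed
                Record record = queue.take(); // blocks until a record is available
                validator.validate(record.getFields());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and let the thread end
        }
    }
}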
A good starting number of threads is Runtime.getRuntime().availableProcessors(); that is optimal for CPU-bound work, while rules that spend their time waiting on the database may benefit from more.
Start them all at the beginning, and do not use "embedded multi-threading".
How to stop the threads after all the records have been processed is left as a learning assignment.
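The advice above (one flat set of workers, no nested pools) can also be followed without hand-written threads, because a fixed-size ExecutorService already is a queue plus a set of workers. The following is only a sketch under assumptions, not the code from the question: a record is modelled as a Map<String, String>, a rule as a Function that returns an error message or null, and FlatValidation and validateAll are hypothetical names. Each record becomes one task and its rules run sequentially inside that task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FlatValidation {

    /** Runs every rule against every record; returns one error list per record, in input order. */
    static List<List<String>> validateAll(
            List<Map<String, String>> records,
            List<Function<Map<String, String>, String>> rules) throws Exception {

        // One pool for the whole request; no pool is created per record.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Map<String, String> record : records) {
                tasks.add(() -> {
                    // All rules of one record run sequentially on one worker thread.
                    List<String> errors = new ArrayList<>();
                    for (Function<Map<String, String>, String> rule : rules) {
                        String error = rule.apply(record);
                        if (error != null) {
                            errors.add(error);
                        }
                    }
                    return errors;
                });
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // invokeAll returns futures in task order
            }
            return results;
        } finally {
            pool.shutdown(); // workers end once the submitted tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        Function<Map<String, String>, String> nameRequired =
                r -> r.getOrDefault("name", "").isEmpty() ? "name is required" : null;
        List<List<String>> errors = validateAll(
                List.of(Map.of("name", "Ann"), Map.of("name", "")),
                List.of(nameRequired));
        System.out.println(errors);
    }
}
```

With this shape, shutting down is handled by the executor: shutdown() lets queued tasks finish and then ends the worker threads.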
Related
Is it possible to modify the runnable object after it has been submitted to the executor service (single thread with unbounded queue) ?
For example:
public class Test {
@Autowired
private Runner userRunner;
@Autowired
private ExecutorService executorService;
public void init() {
for (int i = 0; i < 100; ++i) {
userRunner.add("Temp" + i);
Future runnerFuture = executorService.submit(userRunner);
}
}
}
public class Runner implements Runnable {
private List<String> users = new ArrayList<>();
public void add(String user) {
users.add(user);
}
public void run() {
/* Something here to do with users*/
}
}
As you can see in the above example, if we submit a runnable object and also modify its contents inside the loop, will the first submit to the executor service use the newly added users? Assume that the run method is doing something really intensive and that subsequent submits are queued.
if we submit a runnable object and also modify its contents inside the loop, will the first submit to the executor service use the newly added users?
Only if the users ArrayList is properly synchronized. What you are doing is modifying the users field from two different threads, which can cause exceptions and other unpredictable results. Synchronization provides mutual exclusion, so multiple threads aren't changing the ArrayList at the same time, as well as memory synchronization, which ensures that one thread's modifications are seen by the other.
What you could do is to add synchronization to your example:
public void add(String user) {
synchronized (users) {
users.add(user);
}
}
...
public void run() {
synchronized (users) {
/* Something here to do with users*/
}
}
Another option would be to synchronize the list:
// individual calls are thread-safe, but iterating (for, etc.) still needs manual synchronization
private List<String> users = Collections.synchronizedList(new ArrayList<>());
However, you'll need to manually synchronize if you are using a for loop on the list or otherwise iterating across it.
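The manual synchronization mentioned above could look like the following sketch. SynchronizedIteration and joinUsers are hypothetical names; the only point is that a traversal of a Collections.synchronizedList has to happen inside a synchronized block on the list itself, which is what its Javadoc requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedIteration {
    private final List<String> users = Collections.synchronizedList(new ArrayList<>());

    void add(String user) {
        users.add(user); // a single call is already thread-safe
    }

    String joinUsers() {
        StringBuilder sb = new StringBuilder();
        synchronized (users) { // hold the list's own lock for the whole traversal
            for (String user : users) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(user);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SynchronizedIteration demo = new SynchronizedIteration();
        demo.add("Temp0");
        demo.add("Temp1");
        System.out.println(demo.joinUsers());
    }
}
```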
The cleanest, most straightforward approach would be to call cancel on the Future, then submit a new task with the updated user list. Otherwise not only do you face visibility issues from tampering with the list across threads, but there's no way to know if you're modifying a task that's already running.
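A related way to avoid touching a task after it has been submitted is to give every submission its own immutable snapshot of the user list. The sketch below is an illustration, not the poster's code: SnapshotSubmit and run are hypothetical names, and each task simply reports how many users it saw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SnapshotSubmit {

    /** Submits count tasks; task i sees exactly the users added before it was submitted. */
    static List<Integer> run(int count) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            List<String> users = new ArrayList<>();
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                users.add("Temp" + i);
                // Immutable copy owned by this task; later add() calls cannot affect it.
                List<String> snapshot = List.copyOf(users);
                Callable<Integer> task = snapshot::size;
                futures.add(executor.submit(task));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> future : futures) {
                sizes.add(future.get());
            }
            return sizes;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(3)); // prints [1, 2, 3]
    }
}
```

Because no task shares mutable state with the submitting thread, neither synchronization nor cancellation is needed in this variant.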
I have a database which contains e-mails to be sent. I'm using multiple threads to send out these e-mails. The approach I'm using is that each thread will query the database, get N e-mails in memory and mark those as being sent. Another thread will see those N e-mails as marked and move on and fetch the next N entries.
This isn't working: before thread1 can mark its entries as being sent, thread2 queries for e-mails, so both threads end up getting the same set of e-mails.
Each thread has its own connection to the database. Is that the root cause of this behaviour? Should I be just sharing one connection object across all the threads?
Or is there any better approach that I could use?
My recommendation is to have a single thread take care of querying the database, placing the retrieved emails in a thread-safe queue (e.g. an ArrayBlockingQueue, which has the advantage of being bounded); you can then have any number of threads removing and processing emails from this queue. The synchronization overhead on the ArrayBlockingQueue is fairly lightweight, and this way you don't need to use database transactions or anything like that.
class EmailChunk {
    Email[] emails;
}
// only instantiate one of these
class DatabaseThread implements Runnable {
    final BlockingQueue<EmailChunk> emailQueue;
    public DatabaseThread(BlockingQueue<EmailChunk> emailQueue) {
        this.emailQueue = emailQueue;
    }
    public void run() {
        EmailChunk newChunk = new EmailChunk(); // placeholder: query database, fill the chunk
        try {
            // add newChunk to queue, wait up to 30 seconds if it's full
            emailQueue.offer(newChunk, 30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
// instantiate as many of these as makes sense
class EmailThread implements Runnable {
    final BlockingQueue<EmailChunk> emailQueue;
    public EmailThread(BlockingQueue<EmailChunk> emailQueue) {
        this.emailQueue = emailQueue;
    }
    public void run() {
        try {
            // take next chunk from queue, wait up to 30 seconds if queue is empty
            EmailChunk nextChunk = emailQueue.poll(30, TimeUnit.SECONDS);
            // nextChunk is null if the wait timed out; otherwise send its emails
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
class Main {
    static final int QUEUE_SIZE = 5;
    public static void main(String[] args) {
        BlockingQueue<EmailChunk> emailQueue = new ArrayBlockingQueue<>(QUEUE_SIZE);
        // instantiate DatabaseThread and EmailThread objects with this queue
    }
}
You need a shared method (or block of code) that controls the concurrency. Synchronize the statements that get the e-mails and mark them, then send the e-mails. Something like this:
public void processMails(){
List<String> mails;
synchronized(this){
mails = getMails();
markMails(mails);
}
sendMails(mails);
}
This method could live in your DAO facade, where all threads can access it.
EDIT:
If you have multiple instances of the DAO class:
public void processMails(){
List<String> mails;
synchronized(DAO.class){
mails = getMails();
markMails(mails);
}
sendMails(mails);
}
Another alternative:
private static final Object LOCK = new Object();
public void processMails(){
List<String> mails;
synchronized(LOCK){
mails = getMails();
markMails(mails);
}
sendMails(mails);
}
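To see why the lock has to cover both the fetch and the mark step, here is a small self-contained sketch. It is an illustration only: MailClaimer and claim are hypothetical names, and an in-memory Deque stands in for the database table. Removing an entry from the deque plays the role of "get and mark as being sent". Keep in mind that a Java lock only coordinates threads inside one JVM.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MailClaimer {
    private static final Object LOCK = new Object();
    private final Deque<String> pending = new ArrayDeque<>(); // stand-in for the e-mail table

    MailClaimer(List<String> mails) {
        pending.addAll(mails);
    }

    /** Fetches and marks up to n e-mails as one atomic step, so no two callers get the same one. */
    List<String> claim(int n) {
        synchronized (LOCK) {
            List<String> claimed = new ArrayList<>();
            while (claimed.size() < n && !pending.isEmpty()) {
                claimed.add(pending.poll()); // "get" and "mark" happen under the same lock
            }
            return claimed;
        }
    }

    public static void main(String[] args) {
        MailClaimer claimer = new MailClaimer(List.of("m0", "m1", "m2", "m3", "m4"));
        System.out.println(claimer.claim(2)); // prints [m0, m1]
        System.out.println(claimer.claim(2)); // prints [m2, m3]
        System.out.println(claimer.claim(2)); // prints [m4]
    }
}
```

Sending happens outside the lock, as in the answer's processMails, so the slow part still runs in parallel.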
Basically, I want to be able to run multiple threads - these threads will use sleep commands for a given period. I want to be able to manipulate the duration of these sleep threads based on user input after the thread has already been running for a period.
For example:
Starting the thread from classA...
private final ExecutorService scheduler = Executors.newCachedThreadPool();
public void startJob(Job job, List <Object> objectList) {
//Store the results of this in a map using a future and the id of job??
scheduler.submit(jobThreadInterface.create(job, objectList));
}
JobThreadInterface starts classB...
public class ClassB implements Runnable {
    private Job job;
    private List<Object> objectList;
    // volatile so a value set from another thread is visible to the running loop
    private volatile int changeSleepDuration;
    public ClassB(Job job, List<Object> objectList) {
        this.job = job;
        this.objectList = objectList;
    }
    public void run() {
        // It will keep looping through this sleep command until there are no more objects left...
        for (Object object : objectList) {
            if (object.getSleepNumber() > 0) {
                try {
                    Thread.sleep(object.getSleepNumber() + changeSleepDuration);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
    public void setChangeSleepDuration(int i) {
        changeSleepDuration = i;
    }
}
So basically, what I want to do is access the setChangeSleepDuration method in ClassB from classA for any thread that I want to access. Is this possible and if so what is the best way?
Thanks,
I suppose that jobThreadInterface.create(job, objectList) does create an instance of ClassB. In that method, you could store the reference to ClassB in a collection that you can access later.
So something like:
ClassB runnable = jobThreadInterface.create(job, objectList);
list.add(runnable);
scheduler.submit(runnable);
And later in your code:
list.get(0).setChangeSleepDuration(1000);
Or you could store the runnables in a map to associate them with some keys that will help you retrieve them later on.
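The map variant could be sketched as follows. This is an illustration under assumptions rather than the poster's classes: JobRegistry, AdjustableTask and the String job id are hypothetical stand-ins for classA, ClassB and whatever key identifies a Job. The adjustable field is volatile so that the running loop sees a value written by another thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class JobRegistry {

    /** Stand-in for ClassB: a task whose extra sleep can be changed while it runs. */
    static final class AdjustableTask implements Runnable {
        private volatile int extraSleepMillis; // volatile: written by callers, read by the worker
        private final int steps;

        AdjustableTask(int steps) {
            this.steps = steps;
        }

        void setExtraSleepMillis(int millis) {
            extraSleepMillis = millis;
        }

        int getExtraSleepMillis() {
            return extraSleepMillis;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < steps; i++) {
                    Thread.sleep(10 + extraSleepMillis); // the field is re-read on every iteration
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private final Map<String, AdjustableTask> tasks = new ConcurrentHashMap<>();
    private final ExecutorService scheduler = Executors.newCachedThreadPool();

    void start(String jobId, AdjustableTask task) {
        tasks.put(jobId, task); // remember the runnable before handing it to the pool
        scheduler.submit(task);
    }

    /** Returns false when no task is registered under the id. */
    boolean changeSleep(String jobId, int millis) {
        AdjustableTask task = tasks.get(jobId);
        if (task == null) {
            return false;
        }
        task.setExtraSleepMillis(millis);
        return true;
    }

    void shutdown() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        JobRegistry registry = new JobRegistry();
        registry.start("job-1", new AdjustableTask(3));
        System.out.println(registry.changeSleep("job-1", 20));
        registry.shutdown();
    }
}
```

A change only takes effect from the next Thread.sleep call onward; a sleep that is already in progress is not shortened or extended.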
You could keep a hold of the Job instance. Then when the thread starts working on the job, save the thread as an attribute of your job. Then classA already knows the Job, so it can access the thread.
Alternatively, you might simply want to save that changeSleepDuration value in the job itself. It comes down to semantics. What does changeSleepDuration represent (a thread control, or is it part of the job?). Sounds like the latter. I'd go for option two.
I'm trying to get the progress percentage of an EJB asynchronous process. Is this possible?
Does anyone have an idea how I could do this?
To get to know the progress of asynchronous processes is always tricky, especially if you don't know if they have actually started yet.
The best way I have found is to write another function that just returns the progress: if you have some unique id for each call, have the worker update a hash map with its current progress under that id. You may want to look at ConcurrentHashMap (http://download-llnw.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html)
Then this other lookup function will just take the unique id, and return the progress back to the client.
If it hasn't been started, you can also return that, and ideally you may want to also be able to return any error messages that came up in the processing.
Then, when it has finished and you have returned the error message or success, delete the entry from the hash map: the client has the information and it won't change, so there is no point in keeping it around.
UPDATE:
In your interface make a new function
String progressDone(String id);
You will then call that synchronously, as it just goes out and comes right back: it looks up the id in the hash map and returns either the percentage done or an error message.
But this means that your actual worker function needs to put information about where it is into the hash map every so often, which is why I suggested the concurrent hash map: you don't have to worry about concurrent writes or locking.
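A minimal sketch of such a registry is shown below. It is an assumption-laden illustration, not an EJB implementation: ProgressRegistry, update and fail are hypothetical names, the status is kept as a plain String, and only progressDone comes from the answer. In a real application the worker would call update from the asynchronous method and the client would call progressDone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ProgressRegistry {
    private final Map<String, String> statusById = new ConcurrentHashMap<>();

    /** Called by the worker every so often. */
    void update(String id, int percent) {
        statusById.put(id, percent + "%");
    }

    /** Called by the worker when processing fails. */
    void fail(String id, String message) {
        statusById.put(id, "ERROR: " + message);
    }

    /** Synchronous lookup for the client; unknown ids are reported as not started. */
    String progressDone(String id) {
        String status = statusById.getOrDefault(id, "NOT_STARTED");
        if (status.equals("100%") || status.startsWith("ERROR")) {
            statusById.remove(id); // final state was delivered, no need to keep it
        }
        return status;
    }

    public static void main(String[] args) {
        ProgressRegistry registry = new ProgressRegistry();
        System.out.println(registry.progressDone("job-1")); // prints NOT_STARTED
        registry.update("job-1", 40);
        System.out.println(registry.progressDone("job-1")); // prints 40%
    }
}
```

One consequence of deleting finished entries is that a second lookup after completion looks the same as "not started"; keeping a separate finished marker would avoid that at the cost of more bookkeeping.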
The solution I have found is a context object shared between the asynchronous method and the main thread. Here is an example:
Asynchronous job itself:
@Stateless
public class AsyncRunner implements AsyncRunnerLocal {
@Asynchronous
public Future<ResultObject> doWorkAsynchronous(WorkContext context) {
context.setRunning(true);
for (int i = 0; i < 100; i++) {
//Do the next iteration of your work here
context.setProgress(i);
}
context.setRunning(false);
return new AsyncResult<>(new ResultObject());
}
}
The shared context object. The important thing here is the volatile keyword: without it there is no guarantee that values written by the worker thread become visible to the main thread, so the progress might never appear to change:
public class WorkContext {
//volatile is important!
private volatile Integer progress = 0;
private volatile boolean running = false;
//getters and setters are omitted
}
Usage example:
public class ProgressChecker {
@EJB
private AsyncRunnerLocal asyncRunner;
private WorkContext context;
private Future<ResultObject> future;
public void startJob() {
this.context = new WorkContext();
future = asyncRunner.doWorkAsynchronous(this.context);
//the job is running now
while (!future.isDone()) {
System.out.println("Progress: " + this.context.getProgress());
Thread.sleep(1000); //try catch is omitted
}
}
}
In EJB 3.1, @Asynchronous method calls can return a java.util.concurrent.Future. This interface provides boolean isCancelled() and boolean isDone(), but no information about whether execution has started. From my point of view, there is no standard way to find out from the EJB container whether the process has started executing.
I have a Java Thread like the following:
public class MyThread extends Thread {
MyService service;
String id;
public MyThread(String id) {
this.id = id;
}
public void run() {
User user = service.getUser(id);
}
}
I have about 300 ids, and every couple of seconds I fire up threads to make a call for each id. E.g.:
for(String id: ids) {
MyThread thread = new MyThread(id);
thread.start();
}
Now, I would like to collect the results from each threads, and do a batch insert to the database, instead of making 300 database inserts every 2 seconds.
Any idea how I can accomplish this?
The canonical approach is to use a Callable and an ExecutorService. Submitting a Callable to an ExecutorService returns a (typesafe) Future from which you can get the result.
class TaskAsCallable implements Callable<Result> {
@Override
public Result call() {
return new Result(); // this is where the work is done.
}
}
ExecutorService executor = Executors.newFixedThreadPool(300);
Future<Result> task = executor.submit(new TaskAsCallable());
Result result = task.get(); // this blocks until result is ready
In your case, you probably want to use invokeAll which returns a List of Futures, or create that list yourself as you add tasks to the executor. To collect results, simply call get on each one.
If you want to collect all of the results before doing the database update, you can use the invokeAll method. This takes care of the bookkeeping that would be required if you submit tasks one at a time, like daveb suggests.
private static final ExecutorService workers = Executors.newCachedThreadPool();
...
Collection<Callable<User>> tasks = new ArrayList<Callable<User>>();
for (final String id : ids) {
tasks.add(new Callable<User>()
{
public User call()
throws Exception
{
return svc.getUser(id);
}
});
}
/* invokeAll blocks until all service requests complete,
* or a max of 10 seconds. */
List<Future<User>> results = workers.invokeAll(tasks, 10, TimeUnit.SECONDS);
for (Future<User> f : results) {
User user = f.get();
/* Add user to batch update. */
...
}
/* Commit batch. */
...
Store your result in your object. When it completes, have it drop itself into a synchronized collection (a synchronized queue comes to mind).
When you wish to collect your results to submit, grab everything from the queue and read your results from the objects. You might even have each object know how to "post" its own results to the database; this way different classes can be submitted and all handled with the exact same tiny, elegant loop.
There are lots of tools in the JDK to help with this, but it is really easy once you start thinking of your thread as a true object and not just a bunch of crap around a "run" method. Once you start thinking of objects this way programming becomes much simpler and more satisfying.
In Java 8 there is a better way of doing this using CompletableFuture. Say we have a class that gets an id from the database; for simplicity we can just return a number, as below:
static class GenerateNumber implements Supplier<Integer>{
private final int number;
GenerateNumber(int number){
this.number = number;
}
@Override
public Integer get() {
try {
TimeUnit.SECONDS.sleep(1);
}catch (InterruptedException e){
e.printStackTrace();
}
return this.number;
}
}
Now we can add each result to a concurrent collection as soon as its future completes.
Collection<Integer> results = new ConcurrentLinkedQueue<>();
int tasks = 10;
CompletableFuture<?>[] allFutures = new CompletableFuture[tasks];
for (int i = 0; i < tasks; i++) {
int temp = i;
CompletableFuture<Integer> future = CompletableFuture.supplyAsync(()-> new GenerateNumber(temp).get(), executor);
allFutures[i] = future.thenAccept(results::add);
}
Now we can add a callback when all the futures are ready,
CompletableFuture.allOf(allFutures).thenAccept(c->{
System.out.println(results); // do something with result
});
You need to store the results in something like a singleton. This has to be properly synchronized.
This is not the best advice, as it is not a good idea to handle raw threads.
You could create a queue or list which you pass to the threads you create, the threads add their result to the list which gets emptied by a consumer which performs the batch insert.
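As a rough sketch of that idea, the following example passes one shared BlockingQueue to every worker and empties it with drainTo once the workers have finished. BatchCollector and collect are hypothetical names, and the string "user-" + id stands in for the result of service.getUser(id); the actual batch insert is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class BatchCollector {

    /** Starts one thread per id, waits for all of them, and returns the collected results. */
    static List<String> collect(List<String> ids) throws InterruptedException {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (String id : ids) {
            // Each worker adds its result to the shared, thread-safe queue.
            Thread worker = new Thread(() -> results.add("user-" + id));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait until every result has been queued
        }
        List<String> batch = new ArrayList<>();
        results.drainTo(batch); // move everything out of the queue in one call
        return batch;           // hand this list to a single batch insert
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> batch = collect(List.of("1", "2", "3"));
        System.out.println(batch.size() + " results ready for one batch insert");
    }
}
```

The order of the results depends on thread scheduling, so the consumer should not rely on it.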
The simplest approach is to pass an object to each thread (one object per thread) that will contain the result later. The main thread should keep a reference to each result object. When all threads are joined, you can use the results.
public class TopClass {
List<User> users = new ArrayList<User>();
void addUser(User user) {
synchronized(users) {
users.add(user);
}
}
void store() throws SQLException {
//storing code goes here
}
class MyThread extends Thread {
MyService service;
String id;
public MyThread(String id) {
this.id = id;
}
public void run() {
User user = service.getUser(id);
addUser(user);
}
}
}
You could make a class which extends Observable. Then your thread can call a method in the Observable class which would notify any classes that registered in that observer by calling Observable.notifyObservers(Object).
The observing class would implement Observer and register itself with the Observable. You would then implement an update(Observable, Object) method that gets called when Observable.notifyObservers(Object) is called.
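A minimal sketch of this approach follows. ResultHub and BatchingObserver are hypothetical names. Two details are easy to miss: notifyObservers does nothing unless setChanged() was called first, and update runs on the thread that published the result, so the observer has to be thread-safe. Note also that java.util.Observable and java.util.Observer have been deprecated since Java 9, so this is shown mainly to illustrate the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Observable;
import java.util.Observer;

@SuppressWarnings("deprecation") // Observable and Observer are deprecated since Java 9
final class ResultHub extends Observable {

    /** Called by worker threads when a result is ready. */
    synchronized void publish(Object result) {
        // setChanged and notifyObservers are kept together under one lock so that
        // a concurrent publish cannot clear the "changed" flag in between.
        setChanged();
        notifyObservers(result);
    }
}

@SuppressWarnings("deprecation")
final class BatchingObserver implements Observer {
    private final List<Object> batch = new ArrayList<>();

    @Override
    public synchronized void update(Observable source, Object result) {
        batch.add(result); // runs on the thread that called publish
    }

    /** Returns a copy of everything collected so far, e.g. for one batch insert. */
    synchronized List<Object> snapshot() {
        return new ArrayList<>(batch);
    }
}

final class ObserverDemo {
    public static void main(String[] args) throws InterruptedException {
        ResultHub hub = new ResultHub();
        BatchingObserver observer = new BatchingObserver();
        hub.addObserver(observer);

        Thread first = new Thread(() -> hub.publish("user-1"));
        Thread second = new Thread(() -> hub.publish("user-2"));
        first.start();
        second.start();
        first.join();
        second.join();

        System.out.println(observer.snapshot().size() + " results collected");
    }
}
```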