I have the following problem, and I am not sure how to design parts of the solution:
I have a large text file that I read line by line.
I need to process each line and update a HashMap.
AFAIK I need one producer thread to read the lines from the file, and dispatch the lines to a pool of consumer threads. The consumer threads should update the ConcurrentHashMap and then get new lines.
My questions are:
How can the consumer threads access the ConcurrentHashMap?
If I use a fixed thread pool, does the producer need to add the line to a queue first, or can it simply submit or execute a new consumer?
EDIT:
Zim-Zam is correct; I want the consumers to dump their results into the ConcurrentHashMap when they finish.
I create the ConcurrentHashMap in the main thread, and pass references to it to the Consumers in their constructors. The Consumers should either add or increment an AtomicInteger in their run methods. How can I tell in the main thread when all of the lines are read and the consumers are finished?
Thanks again.
You can either have all of the consumers share the same queue that the producer adds to, or else you can give each consumer its own queue that the producer accesses via a circular linked list or a similar data structure so that each consumer's queue receives more or less the same amount of data (e.g. if you have 3 consumers, then the producer would add data to queue1, then queue2, then queue3, then queue1, etc).
You can give each consumer a reference to the same ConcurrentHashMap (e.g. in the consumer's constructor), or else you can make the ConcurrentHashMap accessible via a static getter method.
I don't think you really need a producer-consumer queue in the way you suggested.
Simply have the main thread read the file and, for each line you read, create a corresponding Runnable object (treat it as a command) and submit it to the thread pool executor. The Runnable just contains the logic for handling that line and putting the result into the ConcurrentHashMap.
The ThreadPoolExecutor can be created with a bounded or unbounded blocking queue, depending on the behavior you want.
In pseudo code it is something like this:
class LineHandler implements Runnable {
    private final String line;
    private final ConcurrentHashMap resultMap;

    public LineHandler(String line, ConcurrentHashMap resultMap) {
        this.line = line;
        this.resultMap = resultMap;
    }

    @Override
    public void run() {
        // work on line
        // update resultMap
    }
}
// logic in your file reader thread, supposed to be in a loop:
while (moreLinesInFile()) {
    String line = readFromFile();
    threadPoolExecutor.submit(new LineHandler(line, concurrentHashMap));
}
threadPoolExecutor.shutdown();
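The pseudo code above can be fleshed out into a runnable sketch. It assumes the per-line work is counting words (the question does not say what the processing is) and that Java 8 or later is available. The names LineCounter and countWords and the one-hour timeout are illustrative only. Calling awaitTermination after shutdown is one way for the main thread to know that every submitted line has been handled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LineCounter {

    // Hypothetical per-line task: counts the words of one line into the shared map.
    static class LineHandler implements Runnable {
        private final String line;
        private final ConcurrentHashMap<String, AtomicInteger> resultMap;

        LineHandler(String line, ConcurrentHashMap<String, AtomicInteger> resultMap) {
            this.line = line;
            this.resultMap = resultMap;
        }

        @Override
        public void run() {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line
                }
                // Atomically create the counter if it is missing, then increment it.
                resultMap.computeIfAbsent(word, k -> new AtomicInteger()).incrementAndGet();
            }
        }
    }

    // The caller plays the role of the file-reading thread.
    static ConcurrentHashMap<String, AtomicInteger> countWords(Iterable<String> lines, int threads) {
        ConcurrentHashMap<String, AtomicInteger> resultMap = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.submit(new LineHandler(line, resultMap));
        }
        pool.shutdown(); // no new tasks; already queued tasks still run
        try {
            // Blocks until all handlers have finished (or the timeout expires).
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("timed out waiting for line handlers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return resultMap;
    }
}
```

With this shape the executor's internal queue does the dispatching, so no hand-written queue is needed.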
Use a CountDownLatch.
// in main thread
// assume consumers are in some kind of container
List<MyConsumer> consumers...
CountDownLatch latch = new CountDownLatch( consumers.size() );
for (MyConsumer c : consumers) {
    c.setLatch(latch);
    c.start(); // starts asynchronously, or submit to an executor, whatever you're doing
}
// block main thread, optionally timing out
latch.await();
// Then in the consumer, when it's done its work:
latch.countDown();
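For reference, here is a self-contained sketch of the same latch idea. The class name LatchDemo and the trivial "work" (incrementing a counter) are placeholders, not part of the answer above. Counting down in a finally block is a defensive choice so the main thread is not left waiting if a consumer throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LatchDemo {

    // Starts the given number of consumer threads and returns how many
    // had completed their work by the time await() returned.
    static int runConsumers(int consumers) {
        CountDownLatch latch = new CountDownLatch(consumers);
        AtomicInteger finished = new AtomicInteger();

        for (int i = 0; i < consumers; i++) {
            new Thread(() -> {
                try {
                    finished.incrementAndGet(); // stand-in for the real work
                } finally {
                    latch.countDown(); // always signal, even if the work fails
                }
            }).start();
        }

        try {
            latch.await(); // blocks until every consumer has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished.get();
    }
}
```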
I would suggest you use a BlockingQueue to store the lines to be processed.
After the main thread has finished parsing the file, it puts a poison object into the queue as the last object and waits with awaitTermination(...) for the consumers to finish.
The poison object is handled in a special way in a consumer thread. The consumer thread that processes the poison object attempts to shutdown() the ExecutorService while the main thread is waiting.
As for the results of the consumers, just add them to some thread-safe container. The producer/consumer problem is handled by the queue: poll(...), put(...).
Hope I could help.
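A minimal sketch of the poison-object idea follows. It deliberately uses a variation rather than the exact scheme described above: each consumer that takes the poison object puts it back so the remaining consumers also see it, and the main thread itself calls shutdown() before awaitTermination(...). The reason for the variation is that shutdown() does not interrupt tasks that are already running, so consumers blocked in take() would otherwise not wake up. All names here (PoisonPillDemo, process, the upper-casing "work") are made up for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class PoisonPillDemo {

    // A distinct String instance, compared by identity, so no real line can match it.
    private static final String POISON = new String("POISON");

    static Set<String> process(List<String> lines, int consumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Set<String> results = ConcurrentHashMap.newKeySet(); // thread-safe container
        ExecutorService pool = Executors.newFixedThreadPool(consumers);

        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String line = queue.take();
                        if (line == POISON) {
                            queue.put(POISON); // leave it for the other consumers
                            return;
                        }
                        results.add(line.toUpperCase()); // stand-in for real processing
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (String line : lines) {
                queue.put(line);
            }
            queue.put(POISON); // last object in the queue
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```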
Related
I am trying to use two threads: the first gets the input and the second processes it.
The problem is that I cannot find a way to return a value from a thread without using a callback.
A callback does not act like a thread (I think), so does anyone have a good idea how to do this? Thanks.
Thread t1 = new Thread() {
    public void input() {
        while (true) {
            while (true) {
                /*
                 * get input using Scanner
                 */
            }
        }
    }
};
t1.start();
Thread t2 = new Thread() {
    public void input() {
        while (true) {
            while (true) {
                /* get the input from above, then
                 * switch on it or do something
                 */
            }
        }
    }
};
t2.start();
Use a shared BlockingQueue. The first thread (producer) adds the inputs to the queue, and the second one (consumer) gets them from the queue. A BlockingQueue, as its name indicates, is blocking. So the consumer getting the next element from the queue will block until the queue actually contains an element.
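As a rough sketch of this suggestion, the example below wires two threads together with a LinkedBlockingQueue. It assumes the inputs come from a list rather than a Scanner so that it can run unattended, and it uses an END marker (an invention of this sketch) to tell the consumer when to stop. The names InputPipeline and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class InputPipeline {

    // Identity-compared marker meaning "no more input" (an assumption of this sketch).
    private static final String END = new String("END");

    static List<String> run(List<String> inputs) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>(); // written only by the consumer, read after join()

        Thread producer = new Thread(() -> {
            try {
                for (String input : inputs) {
                    queue.put(input); // hand the value to the other thread
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String input = queue.take(); // blocks until an element is available
                    if (input == END) {
                        return;
                    }
                    processed.add("got " + input); // stand-in for the real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }
}
```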
Your second thread should raise an event when it has something available for the first thread. When creating the second thread, have the first thread add itself as a listener, then the second thread uses that to signal the event.
This talks about swing but you can use generic events and listeners for anything.
http://docs.oracle.com/javase/tutorial/uiswing/events/
If you only have one thread you may want to use a java.util.concurrent.FutureTask (provided you use Java 1.5 or later).
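For illustration, a FutureTask can be run on a thread and then asked for its result. The sketch below uses a lambda for brevity, which needs Java 8; on older versions an anonymous Callable would do the same job. The class name and the computed value are placeholders.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class FutureTaskDemo {

    static int compute() {
        // The Callable's return value becomes the result of the task.
        FutureTask<Integer> task = new FutureTask<>(() -> 6 * 7);
        new Thread(task).start(); // FutureTask is also a Runnable

        try {
            return task.get(); // blocks until the thread has produced the value
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```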
The blocking queue is the design of choice in such scenarios.
Spawn your producer threads and let them insert their data into the queue. Another thread (or the main thread), the consumer, will "spin-lock" watching for data on the queue: as soon as some data arrives, the consumer will grab it and use it.
So your threads "return" data by inserting it into this common data structure (the queue).
Don't forget to protect your queue using a mutex (it's a critical section), or multiple threads could use (read/write) the queue data structure at the same time, causing all sorts of weird behavior, a plain SIGSEGV signal if you are lucky :-) (Java's BlockingQueue implementations already do this locking internally.)
I'm working on a producer-consumer pattern that should work with a queue. As usual, there is a consumer thread and a producer thread: the producer adds an item to the queue at certain time intervals (every 3 to 5 seconds), and the consumer waits to process an item as soon as the queue isn't empty.
As a requirement, the producer should and will produce items non-stop, which means that if the queue is full it will keep producing. That's why I can't use BlockingQueue implementations, as they either wait for the queue to have available space or throw an exception.
My current implementation is the following
// consumer's Runnable
public void run() {
    while (true) {
        if (!queue.isEmpty()) {
            currentItem = queue.poll();
            process(currentItem);
        }
    }
}
This thread will keep looping even if no item has been produced by the producer thread.
How can the consumer wait until the producer adds an item to the queue, and what is a good Queue implementation with no capacity limit?
I have a while loop that checks whether an ArrayList containing commands for the program to execute is empty. Obviously it does things if the list is not empty, but if it is empty, right now I just have a Thread.sleep(1000) in the else branch. That leaves anything that interacts with it rather sluggish. Is there any way to get the thread it runs on to block until a new command is added? (It runs in its own thread, so that seems to be the best solution to me.) Or is there a better solution to this?
You can use wait() and notify() to have the threads that add something to the list inform the consumer thread that there is something to be read. However, this requires proper synchronization, etc..
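To make the synchronization requirement concrete, here is a small sketch of a hand-rolled command buffer built on wait() and notifyAll(). The class and method names are invented for this example. The two details that are easy to get wrong are that both methods must synchronize on the same monitor, and that the wait must sit in a loop that re-checks the condition.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CommandBuffer {
    private final Deque<String> commands = new ArrayDeque<>();

    // Called by any thread that wants to queue a command.
    synchronized void add(String command) {
        commands.addLast(command);
        notifyAll(); // wake up consumers waiting in take()
    }

    // Blocks the calling thread until a command is available.
    synchronized String take() throws InterruptedException {
        while (commands.isEmpty()) { // loop guards against spurious wakeups
            wait();                  // releases the monitor while waiting
        }
        return commands.removeFirst();
    }

    // Tiny demonstration: another thread adds a command, this thread waits for it.
    static String demo() {
        CommandBuffer buffer = new CommandBuffer();
        new Thread(() -> buffer.add("reload")).start();
        try {
            return buffer.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```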
But a better way to solve your problem is to use a BlockingQueue instead. By definition, they are synchronized classes, and dequeuing will block appropriately and wake up when items are added. LinkedBlockingQueue is a good class to use if you want your queue to be unbounded. ArrayBlockingQueue can be used when you want a limited number of items to be stored in the queue (or LinkedBlockingQueue with an integer passed to the constructor). With a bounded queue, queue.put(...) blocks if the queue is full, whereas queue.add(...) throws an IllegalStateException.
BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();
...
// producer thread(s) add a message to the queue
queue.add(message);
...
// consumer(s) wait for a message to be added to the queue and then removes it
Message message = queue.take();
...
// you can also wait for certain amount of time, returns null on timeout
Message message = queue.poll(10, TimeUnit.MINUTES);
Use a BlockingQueue<E> for your commands.
There's a very good example of how to use it in the link above.
A better solution is to use an ExecutorService. This combines a queue and a pool of threads.
// or use a thread pool with multiple threads.
ExecutorService executor = Executors.newSingleThreadExecutor();
// call as often as you like.
executor.submit(new Runnable() {
    @Override
    public void run() {
        process(string);
    }
});
// when finished
executor.shutdown();
n threads produce to a BlockingQueue.
When the queue is full, the consumer drains the queue and does some processing.
How should I decide between the following 2 choices of implementation?
Choice A :
The consumer regularly polls the queue to check if it is full, all writers waiting (it is a blocking queue after all : ).
Choice B :
I implement my own queue with a synchronized "put" method. Before putting the provided element, I test whether the queue is nearly full (full minus 1 element). I then put the element and notify my consumer (which was waiting).
The first solution is the easiest but relies on polling, which annoys me.
The second solution is, in my opinion, more error-prone and requires more coding.
I would suggest writing a proxy queue that wraps a queue instance internally, along with an Exchanger instance. Your proxy methods delegate calls to the internal queue. Check whether the internal queue is full when you add, and when it is full, exchange the internal queue with the consumer thread. The consumer thread will hand over an empty queue in return for the filled queue. Your proxy queue will continue filling the empty queue while the consumer processes the filled one. Both activities can run in parallel. They can exchange again when both parties are ready.
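class MyQueue<E> implements BlockingQueue<E> {
    Queue<E> internalQueue = ...
    Exchanger<Queue<E>> exchanger;

    MyQueue(Exchanger<Queue<E>> ex) {
        this.exchanger = ex;
    }
    .
    .
    .
    public boolean add(E e) {
        try {
            return internalQueue.add(e);
        } catch (IllegalStateException ise) {
            // internal queue is full: swap it for the consumer's empty queue
            // (exchange() can throw InterruptedException; handling omitted here)
            internalQueue = exchanger.exchange(internalQueue);
        }
        return internalQueue.add(e);
    }
}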
class Consumer implements Runnable {
    public void run() {
        Queue currentQueue = new empty queue;
        while (...) {
            // poll() returns null on an empty queue; remove() would throw instead
            Object o = currentQueue.poll();
            if (o == null) {
                currentQueue = exchanger.exchange(currentQueue);
                continue;
            }
            // cast and process the element
        }
    }
}
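The buffer-swapping idea can be tried out in isolation with a simplified sketch like the one below. It does not implement BlockingQueue; it only shows a producer and a consumer trading a full list for an empty one through an Exchanger. Treating an empty buffer as the end-of-data signal, and the names ExchangerBatching and run, are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

class ExchangerBatching {

    // Produces 1..itemCount in batches of `capacity` (must be >= 1) and
    // returns the sizes of the batches the consumer received.
    static List<Integer> run(int itemCount, int capacity) {
        Exchanger<List<Integer>> exchanger = new Exchanger<>();
        List<Integer> batchSizes = new ArrayList<>(); // consumer-only until join()

        Thread consumer = new Thread(() -> {
            List<Integer> buffer = new ArrayList<>();
            try {
                while (true) {
                    // Hand over an empty buffer, receive a filled one.
                    buffer = exchanger.exchange(buffer);
                    if (buffer.isEmpty()) {
                        return; // empty buffer means the producer is done
                    }
                    batchSizes.add(buffer.size()); // stand-in for real processing
                    buffer.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            List<Integer> filling = new ArrayList<>();
            for (int i = 1; i <= itemCount; i++) {
                filling.add(i);
                if (filling.size() == capacity) {
                    filling = exchanger.exchange(filling); // swap full for empty
                }
            }
            if (!filling.isEmpty()) {
                filling = exchanger.exchange(filling); // flush the last partial batch
            }
            exchanger.exchange(filling); // empty buffer tells the consumer to stop
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return batchSizes;
    }
}
```

While the consumer works through one batch, the producer is free to fill the other buffer, which is the parallelism described above.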
The second solution is obviously better, and it is not so complicated. You can inherit from or wrap any other BlockingQueue and override its offer() method as follows: call the "real" offer(). If it returns true, exit. Otherwise, trigger the worker thread to work and immediately call offer() with a timeout.
Here is the code, which is almost pseudo code:
public boolean offer(E e) {
    if (queue.offer(e)) {
        return true;
    }
    // queue is full: trigger the worker first, then retry with a timeout
    worker.doYourJob();
    // e.g. 20 sec. - enough for the worker to dequeue at least one task, so space will be available
    return queue.offer(e, timeout, unit);
}
I don't know whether an implementation of the queue you need exists: one where consumers wait while the queue fills up and only drain it and start processing once it is full.
Your queue should block consumers until it becomes full. I think you need to override the drain method (drainTo() in BlockingQueue) to make it wait until the queue is full. Then your consumers just call the drain method and wait. No notification from producer to consumer is needed.
Use an observer pattern. Have your consumers register with the queue notifier. When a producer does a put the queue would then decide whether to notify any listeners.
I used the CountDownLatch, which is simple and works great.
Thanks for the other ideas :)
I have a scientific application which I usually run in parallel with xargs, but this scheme incurs repeated JVM start costs and neglects cached file I/O and the JIT compiler. I've already adapted the code to use a thread pool, but I'm stuck on how to save my output.
The program (i.e. one thread of the new program) reads two files, does some processing and then prints the result to standard output. Currently, I've dealt with output by having each thread add its result string to a BlockingQueue. Another thread takes from the queue and writes to a file, as long as a Boolean flag is true. Then I awaitTermination and set the flag to false, triggering the file to close and the program to exit.
My solution seems a little kludgey; what is the simplest and best way to accomplish this?
How should I write primary result data from many threads to a single file?
The answer doesn't need to be Java-specific if it is, for example, a broadly applicable method.
Update
I'm using "STOP" as the poison pill.
while (true) {
    String line = queue.take();
    if (line.equals("STOP")) {
        break;
    } else {
        output.write(line);
    }
}
output.close();
I manually start the queue-consuming thread, then add the jobs to the thread pool, wait for the jobs to finish and finally poison the queue and join the consumer thread.
That's really the way you want to do it: have the threads put their output on the queue and then have the writer exhaust it.
The only thing you might want to do to make things a little cleaner is, rather than checking a flag, simply put an "all done" token on the queue that the writer can use to know that it's finished. That way there's no out-of-band signaling necessary.
That's trivial to do: you can use a well-known string, an enum, or simply a shared object.
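One possible shape for this, sketched under a few assumptions: a StringWriter stands in for the output file, each job produces a single line, and the "all done" token is a dedicated String instance compared by identity, so a real result that happens to contain the same text cannot end the writer early. The names SingleWriterDemo and run are hypothetical.

```java
import java.io.StringWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterDemo {

    // Shared "all done" token; checked with == rather than equals().
    private static final String DONE = new String("DONE");

    static String run(int jobs, int threads) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StringWriter output = new StringWriter(); // stands in for the output file

        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();
                    if (line == DONE) {
                        break;
                    }
                    output.write(line + "\n");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < jobs; i++) {
            final int id = i;
            pool.execute(() -> queue.add("result " + id)); // stand-in for real work
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // all results are queued after this
            queue.put(DONE);                            // so the token is the last element
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output.toString();
    }
}
```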
You could use an ExecutorService.
Submit a Callable that would perform the task and return the string after completion.
When Submitting the Callable you get hold of a Future, store these references e.g. in a List.
Then simply iterate through the Futures and get the Strings by calling Future#get.
This will block until the task is completed if it has not yet finished; otherwise it returns the value immediately.
Example:
ExecutorService exec = Executors.newFixedThreadPool(10);
List<Future<String>> tasks = new ArrayList<Future<String>>();
tasks.add(exec.submit(new Callable<String>() {
    public String call() {
        // do stuff
        return <yourString>;
    }
}));
// and so on for the other tasks
for (Future<String> task : tasks) {
    // get() throws InterruptedException and ExecutionException
    String result = task.get();
    // write to output
}
Many threads processing, one thread writing, and a message queue between them is a good strategy. The only issue left to solve is knowing when all work is finished. One way to do that is to count how many worker threads you started, and then count how many responses you got. Something like this pseudo code:
int workers = 0
for each work item {
    workers++
    start the item's worker in a separate thread
}
while workers > 0 {
    take worker's response from a queue
    write response to file
    workers--
}
This approach also works if the workers can find more work items while they are executing. Just include any additional not-yet-processed work in the worker responses, and then increment the workers count and start worker threads as usual.
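A direct Java rendering of the counting pseudo code might look like the sketch below. It assumes each worker produces exactly one response, uses a list in place of the output file, and upper-cases the work item as a placeholder for real processing. The names CountingCollector and collect are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CountingCollector {

    static List<String> collect(List<String> workItems) {
        BlockingQueue<String> responses = new LinkedBlockingQueue<>();

        int workers = 0;
        for (String item : workItems) {
            workers++;
            // Each worker puts exactly one response on the queue.
            new Thread(() -> responses.add(item.toUpperCase())).start();
        }

        List<String> written = new ArrayList<>(); // stands in for the output file
        try {
            while (workers > 0) {
                written.add(responses.take()); // blocks until some worker responds
                workers--;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }
}
```

Responses arrive in completion order, so the order of the returned list is not fixed.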
If each of the workers returns just one message, you can use Java's ExecutorService to execute Callable instances which return the result. ExecutorService's methods give access to Future instances from which you can get the result when the Callable has finished its work.
So you would first submit all the tasks to the ExecutorService and then loop over all the Futures and get their responses. That way you would write the responses in the order in which you check the futures, which can be different from the order in which they finish their work. If latency is not important, that shouldn't be a problem. Otherwise, a message queue (as mentioned above) might be more suitable.
It's not clear if your output file has some defined order or if you just dump your data there. I assume it has no order.
I don't see why you need an extra thread for writing the output. Just synchronize the method that writes to the file and call it at the end of each thread.
If you have many threads writing to the same file the simplest thing to do is to write to that file in the task.
final PrintWriter out =
ExecutorService es =
for (int i = 0; i < tasks; i++)
    es.submit(new Runnable() {
        public void run() {
            performCalculations();
            // so only one thread can write to the file at a time.
            synchronized (out) {
                writeResults(out);
            }
        }
    });
es.shutdown();
es.awaitTermination(1, TimeUnit.HOURS);
out.close();