I want to introduce my problem first.
I have several WorkingThreads that receive a string, process it, and afterwards append the processed string to a global Queue, like this:
class Main {
    public static Queue<String> Q;

    public static void main(String[] args) {
        // start working threads
    }
}
WorkingThread.java:
class WorkingThread extends Thread {
    public void run() {
        String input;
        // do something with input
        Main.Q.add(processedString);
    }
}
Now, every 800 ms, another thread called Inserter dequeues all the entries to formulate some SQL, but that's not important.
class Inserter extends Thread {
    public void run() {
        while (!Main.Q.isEmpty()) {
            System.out.print(".");
            // dequeue and formulate some SQL
        }
    }
}
Everything works for about 5 to 10 minutes, but then suddenly I cannot see any dots being printed (which is basically a heartbeat for the Inserter). The queue is not empty, I can assure that, but the Inserter just won't work any more, even though it gets started regularly.
I have a suspicion that there is a problem when a worker wants to insert something while the Inserter is dequeuing the queue; could this possibly be some kind of "deadlock"?
I really hope somebody has an explanation for this behaviour. I am looking forward to learning ;).
EDIT: I am using
Queue<String> Q = new LinkedList<String>();
You are not using a synchronized or thread-safe Queue, therefore you have a race hazard. Your use of a LinkedList shows a (slightly scary) lack of awareness of this fact. You may want to read more about threading and thread safety before you try to tackle any more threaded code.
You must either synchronize manually or use one of the existing implementations provided by the JDK. Producer/consumer patterns are usually implemented using one of the BlockingQueue implementations.
A BlockingQueue of a bounded size will block producers trying to put if the queue is full. A BlockingQueue will always block consumers if the queue is empty.
This allows you to remove all of your custom logic that spins on the queue and waits for items.
A simple example using Java 8 lambdas would look like:
public static void main(String[] args) throws Exception {
    final BlockingQueue<String> q = new LinkedBlockingQueue<>();
    final ExecutorService executorService = Executors.newFixedThreadPool(4);

    // consumer: blocks on take() until an item is available
    final Runnable consumer = () -> {
        while (true) {
            try {
                System.out.println(q.take());
            } catch (InterruptedException e) {
                return;
            }
        }
    };
    executorService.submit(consumer);

    // five producers, each putting a random value at random intervals
    final Stream<Runnable> producers = IntStream.range(0, 5).mapToObj(i -> () -> {
        final Random random = ThreadLocalRandom.current();
        while (true) {
            q.add("Producer " + i + " putting " + random.nextDouble());
            try {
                TimeUnit.MILLISECONDS.sleep(random.nextInt(2000));
            } catch (InterruptedException e) {
                // ignore
            }
        }
    });
    producers.forEach(executorService::submit);
}
The consumer blocks on the BlockingQueue.take method; as soon as an item becomes available, it will be woken and will print the item. If there are no items, the thread is suspended - allowing the physical CPU to do something else.
The producers each push a String onto the queue using add. As the queue is unbounded, add will always return true. In the case where there is likely to be a backlog of work for the consumer, you can bound the queue and use the put method (which throws an InterruptedException and so requires a try..catch, which is why it's easier to use add here) - this will automatically create flow control.
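To illustrate that last point, here is a minimal sketch of a bounded queue with put (the capacity of 100 and the class name are my own example values, not from the answer above):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedProducerSketch {
    // capacity of 100 is an arbitrary example value
    private final BlockingQueue<String> q = new ArrayBlockingQueue<>(100);

    void produce(String processedString) throws InterruptedException {
        // put() blocks while the queue is full, so a slow consumer
        // automatically throttles fast producers (flow control)
        q.put(processedString);
    }
}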
Seems more like a synchronization issue. You are trying to simulate the producer-consumer problem. You need to synchronize your Queue or use a BlockingQueue; you probably have a race condition.
You are going to need to synchronize access to your Queue, or
use ConcurrentLinkedQueue, see http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html
or, as also suggested, use a BlockingQueue (depending on your requirements): http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html
For a more detailed explanation of the BlockingQueue see
http://tutorials.jenkov.com/java-util-concurrent/blockingqueue.html
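To make this concrete for the code in the question, here is a minimal sketch using a BlockingQueue (the class and field names mirror the question; the choice of LinkedBlockingQueue and poll() is mine):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class Main {
    // thread-safe queue shared by the workers and the Inserter
    public static final BlockingQueue<String> Q = new LinkedBlockingQueue<>();
}

class WorkingThread extends Thread {
    public void run() {
        String processedString = "...";  // do something with the input
        Main.Q.add(processedString);     // safe to call from many threads
    }
}

class Inserter extends Thread {
    public void run() {
        String entry;
        // poll() never blocks; it returns null once the queue is drained
        while ((entry = Main.Q.poll()) != null) {
            System.out.print(".");
            // formulate some SQL with 'entry'
        }
    }
}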
Related
I am consuming from a certain source (say Kafka) and periodically dumping the collected messages (to, say, S3). My class definition is as follows:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ConsumeAndDump {
    private List<String> messages;

    public ConsumeAndDump() {
        messages = new ArrayList<>();
        // initialize required resources
    }

    public void consume() {
        // this runs continuously and keeps consuming from the source.
        while (true) {
            final String message = ... // consume from Kafka
            messages.add(message);
        }
    }

    public void dump() throws InterruptedException {
        while (true) {
            final String allMessages = String.join("\n", messages);
            messages.clear(); // shown here simply, but I am synchronising this to avoid race conditions
            // dump to destination (file, or S3, or whatever)
            TimeUnit.SECONDS.sleep(60); // sleep for a minute
        }
    }

    public void run() {
        // This is where I don't know how to proceed.
        // How do I start consume() and dump() as separate threads?
        // Is it even possible in Java?

        // start consume() as thread
        // start dump() as thread
        // wait for those to finish
    }
}
I want to have two threads - consume and dump. consume should run continuously whereas dump wakes up periodically, dumps the messages, clears the buffer and then goes back to sleep again.
I am having trouble starting consume() and dump() as threads. Honestly, I don't know how to do that. Can we even run member methods as threads? Or do I have to make separate Runnable classes for consume and dump? If so, how would I share messages between those?
First of all, you can't really use an ArrayList for this; ArrayList is not thread-safe. Check out BlockingQueue instead. You will also have to deal with things like back pressure, so don't use an unbounded queue.
Starting a thread is pretty simple; you can use lambdas for it.
public void run() {
    new Thread(this::consume).start();
    new Thread(this::dump).start();
}
This should work, but it gives you little to no control over when those processes should end.
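A rough sketch of how the two methods could share a bounded BlockingQueue (the capacity, the drainTo batching and the pollSource() placeholder are my own illustration, not part of the answer above):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class ConsumeAndDump {
    // bounded queue: consume() blocks when dump() falls behind (back pressure)
    private final BlockingQueue<String> messages = new ArrayBlockingQueue<>(10_000);

    public void consume() throws InterruptedException {
        while (true) {
            String message = pollSource();   // hypothetical stand-in for the Kafka poll
            messages.put(message);           // blocks if the queue is full
        }
    }

    public void dump() throws InterruptedException {
        List<String> batch = new ArrayList<>();
        while (true) {
            TimeUnit.SECONDS.sleep(60);
            messages.drainTo(batch);         // atomically empties the queue into the batch
            String allMessages = String.join("\n", batch);
            // dump allMessages to the destination here
            batch.clear();
        }
    }

    public void run() throws InterruptedException {
        Thread consumer = new Thread(() -> { try { consume(); } catch (InterruptedException e) { } });
        Thread dumper   = new Thread(() -> { try { dump(); }    catch (InterruptedException e) { } });
        consumer.start();
        dumper.start();
        consumer.join();   // wait for both threads to finish
        dumper.join();
    }

    private String pollSource() { return "message"; } // placeholder for the real source
}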
I have the following method:
void store(SomeObject o) {
}
The idea of this method is to store o to permanent storage, but the function should not block. I.e., I cannot/must not do the actual storage in the same thread that called store.
I also cannot start a new thread and store the object from that thread, because store might be called a "huge" number of times and I don't want to keep spawning threads.
So I have two options, neither of which I see working well:
1) Use a thread pool (Executor family)
2) In store, add the object to an array list and return. When the array list reaches e.g. 1000 entries (arbitrary number), start another thread to "flush" the array list to storage. But I would still possibly have the problem of too many threads (thread pool?)
In both cases the only requirement I have is that the objects are stored persistently in exactly the same order they were passed to store, and using multiple threads mixes things up.
How can this be solved?
How can I ensure:
1) A non-blocking store
2) Accurate insertion order
3) I don't care about any storage guarantees: if e.g. something crashes, I don't care about losing the data that was cached in the array list before it was stored.
I would use a SingleThreadExecutor and a BlockingQueue.
A SingleThreadExecutor, as the name says, has one single thread. Use it to poll from the queue and persist the objects, blocking if the queue is empty.
You can add to the queue in your store method without blocking.
EDIT
Actually, you do not even need that extra queue - the JavaDoc of newSingleThreadExecutor says:
Creates an Executor that uses a single worker thread operating off an unbounded queue. (Note however that if this single thread terminates due to a failure during execution prior to shutdown, a new one will take its place if needed to execute subsequent tasks.) Tasks are guaranteed to execute sequentially, and no more than one task will be active at any given time. Unlike the otherwise equivalent newFixedThreadPool(1) the returned executor is guaranteed not to be reconfigurable to use additional threads.
So I think it's exactly what you need.
private final ExecutorService persistor = Executors.newSingleThreadExecutor();

public void store(final SomeObject o) {
    persistor.submit(new Runnable() {
        @Override public void run() {
            // your persist-code here.
        }
    });
}
The advantage of using a Runnable with a quasi-endless loop together with an extra queue would be the possibility of coding some "burst" functionality. For example, you could make it persist only when 10 elements are in the queue, or when the oldest element was added at least 1 minute ago ...
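A minimal sketch of that burst idea, assuming an unbounded LinkedBlockingQueue and the thresholds from the sentence above (a batch of 10 elements or a one-minute wait); SomeObject is the type from the question:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class BatchingPersistor implements Runnable {
    private final BlockingQueue<SomeObject> queue = new LinkedBlockingQueue<>();

    // called from any thread; never blocks on an unbounded queue
    void store(SomeObject o) {
        queue.add(o);
    }

    @Override public void run() {
        List<SomeObject> batch = new ArrayList<>();
        try {
            while (true) {
                // block until the first element of a batch arrives
                batch.add(queue.take());
                long deadline = System.nanoTime() + TimeUnit.MINUTES.toNanos(1);
                // keep collecting until we have 10 elements or the oldest is a minute old
                while (batch.size() < 10) {
                    SomeObject next = queue.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                    if (next == null) {
                        break; // timed out
                    }
                    batch.add(next);
                }
                persist(batch); // single writer thread, so insertion order is preserved
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void persist(List<SomeObject> batch) {
        // write the batch to storage here
    }
}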
I suggest using a Chronicle-Queue which is a library I designed.
It allows you to write in the current thread without blocking. It was originally designed for low latency trading systems. For small messages it takes around 300 ns to write a message.
You don't need to use a background thread or an on-heap queue, and it doesn't wait for the data to be written to disk by default. It also ensures a consistent order for all readers. If the program dies at any point after you call finish(), the message is not lost (unless the OS crashes/loses power). It also supports replication to avoid data loss.
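For orientation only, a rough sketch of what writing and reading could look like with a recent Chronicle Queue release; the builder and appender method names are my assumption based on the current open-source API, not something taken from this answer:

import net.openhft.chronicle.queue.ChronicleQueue;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.ExcerptTailer;

public class ChronicleSketch {
    public static void main(String[] args) {
        try (ChronicleQueue queue = ChronicleQueue.singleBuilder("store-dir").build()) {
            // the write happens in the calling thread and is not blocked by disk I/O
            ExcerptAppender appender = queue.acquireAppender();
            appender.writeText("some object, serialized as text");

            // any number of readers can replay the queue in the same order
            ExcerptTailer tailer = queue.createTailer();
            System.out.println(tailer.readText()); // null if nothing is available
        }
    }
}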
Have one separate thread that takes items from the head of a queue (blocking on an empty queue) and writes them to disk. Your main thread's store() function just adds items to the tail of the queue.
Here's a rough idea (though I assume there will be cleaner or faster ways for doing this in production code, depending on how fast you need things to be):
import java.util.*;
import java.io.*;
import java.util.concurrent.*;

class ObjectWriter implements Runnable {
    private final Object END = new Object();
    BlockingQueue<Object> queue = new LinkedBlockingQueue<>();

    public void store(Object o) throws InterruptedException {
        queue.put(o);
    }

    public ObjectWriter() {
        new Thread(this).start();
    }

    public void close() throws InterruptedException {
        queue.put(END);
    }

    public void run() {
        while (true) {
            try {
                Object o = queue.take();
                if (o == END) {
                    // close output file.
                    return;
                }
                System.out.println(o.toString()); // serialize as appropriate
            } catch (InterruptedException e) {
            }
        }
    }
}

public class Test {
    public static void main(String[] args) throws Exception {
        ObjectWriter w = new ObjectWriter();
        w.store("hello");
        w.store("world");
        w.close();
    }
}
The comments in your question make it sound like you are unfamiliar with multi-threading, but it's really not that difficult.
You simply need another thread, responsible for writing to the storage, which picks items off a queue - your store function just adds the objects to the in-memory queue and continues on its way.
Some pseudo-ish code:
final List<SomeObject> queue = new LinkedList<SomeObject>();

void store(SomeObject o) {
    // add it to the queue - note that modifying o after this will also alter the
    // instance in the queue
    synchronized (queue) {
        queue.add(o);
        queue.notify(); // tell the storage thread there's something in the queue
    }
}

void storageThread() {
    SomeObject item;
    while (notFinished) {
        synchronized (queue) {
            if (queue.size() > 0) {
                item = queue.remove(0); // take from the head to preserve insertion order
            } else {
                // wait for something to arrive
                try {
                    queue.wait();
                } catch (InterruptedException e) {
                    return;
                }
                continue;
            }
        }
        writeToStorage(item);
    }
}
So my goal is to measure the performance of a Streaming Engine. It's basically a library to which I can send data packages. The idea for measuring this is to generate data, put it into a queue and let the Streaming Engine grab the data and process it.
I thought of implementing it like this: the Data Generator runs in a thread and generates data packages in an endless loop, with a certain Thread.sleep(X) at the end. When doing the tests, the idea is to minimize this Thread.sleep(X) to see whether it has an impact on the Streaming Engine's performance. The Data Generator writes the created packages into a queue, that is, a ConcurrentLinkedQueue, which at the same time is a singleton.
In another thread I instantiate the Streaming Engine, which continuously removes the packages from the queue by doing queue.remove(). This is done in an endless loop without any sleeping, because it should just be done as fast as possible.
In a first try to implement this I ran into a problem. It seems as if the Data Generator is not able to put the packages into the queue as it should; it is doing so too slowly. My suspicion is that the endless loop of the Streaming Engine thread is eating up all the resources and therefore slows everything else down.
I would be happy about suggestions on how to approach this issue, or about other design patterns which could solve it elegantly.
The requirements are: two threads which basically run in parallel. One is putting data into a queue, the other one is reading/removing from the queue. And I want to measure the size of the queue regularly, in order to know whether the engine which is reading/removing from the queue is fast enough to process the generated packages.
You can use a BlockingQueue, for example an ArrayBlockingQueue. You can initialize these to a certain size, so the number of queued items will never exceed a certain number, as in this example:
// create queue, max size 100
final ArrayBlockingQueue<String> strings = new ArrayBlockingQueue<>(100);
final String stop = "STOP";

// start producing
Runnable producer = new Runnable() {
    @Override
    public void run() {
        try {
            for (int i = 0; i < 1000; i++) {
                strings.put(Integer.toHexString(i));
            }
            strings.put(stop);
        } catch (InterruptedException ignore) {
        }
    }
};
Thread producerThread = new Thread(producer);
producerThread.start();

// start monitoring
Runnable monitor = new Runnable() {
    @Override
    public void run() {
        try {
            while (true) {
                System.out.println("Queue size: " + strings.size());
                Thread.sleep(5);
            }
        } catch (InterruptedException ignore) {
        }
    }
};
Thread monitorThread = new Thread(monitor);
monitorThread.start();

// start consuming
Runnable consumer = new Runnable() {
    @Override
    public void run() {
        // infinite loop, returns when the stop sentinel is seen
        try {
            while (true) {
                String value = strings.take();
                if (value.equals(stop)) {
                    return;
                }
                System.out.println(value);
            }
        } catch (InterruptedException ignore) {
        }
    }
};
Thread consumerThread = new Thread(consumer);
consumerThread.start();

// wait for producer and consumer to finish
producerThread.join();
consumerThread.join();

// interrupt the monitor
monitorThread.interrupt();
You can also have a third thread monitoring the size of the queue (as in the example above), to give you an idea of which thread is outpacing the other.
Also, you can use the timed and untimed offer methods and the timed poll method, which give you more control over what to do when the queue is full or empty. In the above example, execution stops until there is space for the next element (put) or until an element becomes available (take).
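A small sketch of the timed variants (the 50 ms timeouts are arbitrary example values):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

class TimedOfferPollSketch {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        // wait at most 50 ms for free space; false means the consumer is falling behind
        boolean accepted = queue.offer("package", 50, TimeUnit.MILLISECONDS);
        System.out.println("accepted: " + accepted);

        // wait at most 50 ms for an element; null means the producer is falling behind
        String value = queue.poll(50, TimeUnit.MILLISECONDS);
        System.out.println("value: " + value);
    }
}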
I have an ArrayBlockingQueue upon which a single-threaded, fixed-rate scheduled executor works.
I may have a failed task. I want to re-run it, or re-insert it into the queue at high priority / at the top.
Some thoughts here -
Why are you using an ArrayBlockingQueue and not a PriorityBlockingQueue? It sounds like exactly what you need to me. At first, set all your elements to equal priority.
In case you receive an exception, re-insert the task into the queue with a higher priority.
The simplest thing might be a priority queue. Attach a retry number to the task; it starts at zero. After an unsuccessful run, throw away all the ones and increment the zeroes, then put them back in the queue at a high priority. With this method you can easily decide later to run everything three times, or more, if you want. The downside is that you have to modify the task class.
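A minimal sketch of that retry-count idea (the RetryableTask wrapper, the comparator and the limit of three attempts are my illustration, not from the answer):

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

class RetryableTask {
    final Runnable work;
    final int retries; // 0 on first submission

    RetryableTask(Runnable work, int retries) {
        this.work = work;
        this.retries = retries;
    }
}

class RetryingScheduler {
    // higher retry count sorts first, so re-inserted tasks jump the line
    private final PriorityBlockingQueue<RetryableTask> queue =
            new PriorityBlockingQueue<>(11, Comparator.comparingInt((RetryableTask t) -> t.retries).reversed());

    void submit(Runnable work) {
        queue.add(new RetryableTask(work, 0));
    }

    void runOne() throws InterruptedException {
        RetryableTask task = queue.take();
        try {
            task.work.run();
        } catch (RuntimeException failure) {
            if (task.retries < 3) { // give up after three attempts
                queue.add(new RetryableTask(task.work, task.retries + 1));
            }
        }
    }
}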
The other idea would be to set up another, non-blocking, thread-safe, high-priority queue. When looking for a new task, you check the non-blocking queue first and run what's there. Otherwise, go to the blocking queue. This might work for you as is, and so far it's the simplest solution. The problem is the high priority queue might fill up while the scheduler is blocked on the blocking queue.
To get around this, you'd have to do your own blocking. Both queues should be non-blocking. (Suggestion: java.util.concurrent.ConcurrentLinkedQueue.) After polling both queues with no results, wait() on a monitor. When anything puts something in a queue, it should call notifyAll() and the scheduler can start up again. Great care is needed lest the notification occur after the scheduler has checked both queues but before it calls wait().
Addition:
Prototype code for the third solution, with manual blocking. Some threading is suggested, but readers will know their own situation best. Which bits of code are apt to block waiting for a lock, which are apt to tie up their thread (and core) for minutes doing extensive work, and which cannot afford to sit around waiting for other code to finish, all need to be considered. For instance, if a failed run can immediately be rerun on the same thread with no time-consuming cleanup, most of this code can be junked.
private final ConcurrentLinkedQueue<Runnable> mainQueue = new ConcurrentLinkedQueue<>();
private final ConcurrentLinkedQueue<Runnable> prioQueue = new ConcurrentLinkedQueue<>();
private final Object entryWatch = new Object();

/** Adds a new job to the queue. */
public void addjob(Runnable runjob) {
    mainQueue.add(runjob);
    synchronized (entryWatch) { entryWatch.notifyAll(); }
}

/** The endless loop that does the work. */
public void schedule() {
    for (;;) {
        Runnable run = getOne(); // Avoids lock if successful.
        if (run == null) {
            // Both queues are empty.
            synchronized (entryWatch) {
                // Need to check again. Someone might have added and notifiedAll
                // since the last check. From this point until the wait, we can be
                // sure entryWatch is not notified.
                run = getOne();
                if (run == null) {
                    // Both queues are REALLY empty.
                    try { entryWatch.wait(); }
                    catch (InterruptedException ie) {}
                }
            }
        }
        if (run == null) continue; // woken up; go back and poll the queues again
        runit(run);
    }
}

/** Helper method for the endless loop. */
private Runnable getOne() {
    Runnable run = prioQueue.poll();
    if (run != null) return run;
    return mainQueue.poll();
}

/** Runs a new job. */
public void runit(final Runnable runjob) {
    // Do everything in another thread. (Optional)
    new Thread() {
        @Override public void run() {
            // Run the job. (Possibly in its own thread?)
            // (Perhaps best in a thread from a thread pool.)
            runjob.run();
            // Handle failure (runit only, NOT in runitLast).
            boolean failure = false; // defining "failure" is left as an exercise for the reader
            if (failure) {
                // Put code here to handle the failure.
                // Put the job back in the queue.
                prioQueue.add(runjob);
                synchronized (entryWatch) { entryWatch.notifyAll(); }
            }
        }
    }.start();
}

/** Reruns a job. */
public void runitLast(final Runnable runjob) {
    // Same code as "runit", but don't put "runjob" in "prioQueue" on failure.
}
I have a queue that contains work items and I want to have multiple threads work in parallel on those items. When a work item is processed, it may result in new work items. The problem I have is that I can't find a solution for how to determine whether I'm done. The worker looks like this:
public class Worker implements Runnable {
    public void run() {
        while (true) {
            WorkItem item = queue.nextItem();
            if (item != null) {
                processItem(item);
            }
            else {
                // the queue is empty, but there may still be other workers
                // processing items which may result in new work items
                // how to determine if the work is completely done?
            }
        }
    }
}
This actually seems like a pretty simple problem, but I'm at a loss. What would be the best way to implement it?
thanks
clarification:
The worker threads have to terminate once none of them is processing an item; but as long as at least one of them is still working, they have to wait, because its work may result in new work items.
What about using an ExecutorService which will allow you to wait for all tasks to finish: ExecutorService, how to wait for all tasks to finish
I'd suggest wait/notify calls. In the else case, your worker threads would wait on an object until notified by the queue that there is more work to do. When a worker creates a new item, it adds it to the queue, and the queue calls notify on the object the workers are waiting on. One of them will wake up to consume the new item.
The methods wait, notify, and notifyAll of class Object support an efficient transfer of control from one thread to another. Rather than simply "spinning" (repeatedly locking and unlocking an object to see whether some internal state has changed), which consumes computational effort, a thread can suspend itself using wait until such time as another thread awakens it using notify. This is especially appropriate in situations where threads have a producer-consumer relationship (actively cooperating on a common goal) rather than a mutual exclusion relationship (trying to avoid conflicts while sharing a common resource).
Source: Threads and Locks
I'd look at something higher level than wait/notify. It's very difficult to get right and avoid deadlocks. Have you looked at java.util.concurrent.CompletionService<V>? You could have a simpler manager thread that polls the service and take()s the results, which may or may not contain a new work item.
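A rough sketch of that manager-thread idea; the assumption (mine, not the answer's) is that each task returns the list of new work items it produced:

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionServiceManager {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<List<String>> service = new ExecutorCompletionService<>(pool);

        int pending = 0;
        service.submit(() -> process("initial item"));
        pending++;

        // only the manager thread tracks outstanding work,
        // so "done" simply means no submitted task is still pending
        while (pending > 0) {
            List<String> newItems = service.take().get(); // blocks for the next finished task
            pending--;
            for (String item : newItems) {
                service.submit(() -> process(item));
                pending++;
            }
        }
        pool.shutdown();
    }

    // each task returns the (possibly empty) list of follow-up work items it produced
    private static List<String> process(String item) {
        System.out.println("processing " + item);
        return Collections.emptyList(); // no new work in this stub
    }
}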
Use a BlockingQueue containing the items to process, along with a synchronized set that keeps track of all the elements currently being processed:

BlockingQueue<WorkItem> bQueue;
Set<WorkItem> beingProcessed = Collections.synchronizedSet(new HashSet<WorkItem>());
bQueue.put(workItem);
...

// the following runs over many threads in parallel
while (!(bQueue.isEmpty() && beingProcessed.isEmpty())) {
    WorkItem currentItem = bQueue.poll(50L, TimeUnit.MILLISECONDS); // null for empty queue
    if (currentItem != null) {
        beingProcessed.add(currentItem);
        processItem(currentItem); // possibly bQueue.add(newItem) is called from processItem
        beingProcessed.remove(currentItem);
    }
}
EDIT: as @Hovercraft Full Of Eels suggested, an ExecutorService is probably what you should really use. You can add new tasks as you go along. You can semi-busy wait for the termination of all tasks at regular intervals with executorService.awaitTermination(time, timeUnits) and kill all your threads after that.
Here's the beginnings of a queue to solve your problem. Basically, you need to track both new work and in-process work.
import java.util.LinkedList;
import java.util.List;

public class WorkQueue<T> {
    private final List<T> _newWork = new LinkedList<T>();
    private int _inProcessWork;

    public synchronized void addWork(T work) {
        _newWork.add(work);
        notifyAll();
    }

    public synchronized T startWork() throws InterruptedException {
        // wait while there is no new work but other workers are still busy
        // (they may yet produce new work items)
        while (_newWork.isEmpty() && (_inProcessWork > 0)) {
            wait();
        }
        if (!_newWork.isEmpty()) {
            _inProcessWork++;
            return _newWork.remove(0);
        }
        // everything is done
        return null;
    }

    public synchronized void finishWork() {
        _inProcessWork--;
        if ((_inProcessWork == 0) && _newWork.isEmpty()) {
            notifyAll();
        }
    }
}
your workers will look roughly like:
public class Worker<T> implements Runnable {
    private final WorkQueue<T> _queue;

    public Worker(WorkQueue<T> queue) {
        _queue = queue;
    }

    public void run() {
        T work = null;
        try {
            while ((work = _queue.startWork()) != null) {
                try {
                    // do work here...
                } finally {
                    _queue.finishWork();
                }
            }
        } catch (InterruptedException e) {
            // interrupted while waiting for work; let the worker exit
        }
    }
}
The one trick is that you need to add the first work item before you start any workers (otherwise they will all immediately exit).
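A short usage sketch putting the two classes together (the thread count and the seed item are arbitrary):

public class WorkQueueDemo {
    public static void main(String[] args) {
        WorkQueue<String> queue = new WorkQueue<>();
        queue.addWork("first item"); // seed the queue before starting any workers

        for (int i = 0; i < 4; i++) {
            new Thread(new Worker<>(queue)).start();
        }
        // every worker returns from run() once the queue is empty
        // and no worker is still processing an item
    }
}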