I am working on a multi-threaded application and need help with some pseudo-code. To keep it simple, I will explain the problem with a small test case.
Here is the scenario -
I have an array list of strings (say 100 strings)
I have a Reader Class that reads the strings and passes them to a Writer Class that prints the strings to the console. Right now this runs in a Single Thread Model.
I wanted to make this multi-threaded but with the following features -
Ability to set MAX_READERS
Ability to set MAX_WRITERS
Ability to set BATCH_SIZE
So basically the code should instantiate that many Readers and Writers and do the work in parallel.
Any pseudo code will really be helpful to keep me going!
This sounds like the classic consumer-producer problem. Have a look at Wikipedia's article about it. They have plenty of pseudo code there.
Aside from using the producer-consumer pattern that has been suggested, I would recommend that you use the CopyOnWriteArrayList so you can have lock-free reads and iteration of your list (each write copies the underlying array, so writes are the expensive part). Since you're only working with a couple of hundred strings you will probably not have any performance issues with the CopyOnWriteArrayList.
If you're concerned about performance then I actually think it might be better if you use the BlockingQueue or a ConcurrentHashMap. They will allow you to maximize throughput with your multithreaded application.
The recommended option:
A BlockingQueue works very well with multiple producers and consumers, but of course it implies an order of data processing (FIFO). If you're OK with FIFO ordering, then you will probably find that the BlockingQueue is a faster and more robust option.
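As a rough illustration of the BlockingQueue approach applied to the original question, here is a minimal sketch with configurable MAX_READERS, MAX_WRITERS and BATCH_SIZE. The class name BatchPipeline, the shared cursor, and the "poison" sentinel used to stop the writers are assumptions made for this example, not something from the question; the writers also record what they print so the result can be checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; a sketch, not a definitive design.
class BatchPipeline {
    static final int MAX_READERS = 3;
    static final int MAX_WRITERS = 2;
    static final int BATCH_SIZE = 10;
    // Sentinel batch that tells a writer to stop; compared by identity.
    private static final List<String> POISON = new ArrayList<>();

    /** Runs readers and writers in parallel; returns every string the writers handled. */
    static List<String> run(List<String> source) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(MAX_READERS * 2);
        ConcurrentLinkedQueue<String> written = new ConcurrentLinkedQueue<>();
        AtomicInteger cursor = new AtomicInteger(); // next unread index in source
        ExecutorService readers = Executors.newFixedThreadPool(MAX_READERS);
        ExecutorService writers = Executors.newFixedThreadPool(MAX_WRITERS);
        try {
            for (int w = 0; w < MAX_WRITERS; w++) {
                writers.execute(() -> {
                    try {
                        // Take batches until the sentinel arrives.
                        for (List<String> b = queue.take(); b != POISON; b = queue.take()) {
                            for (String s : b) {
                                System.out.println(s);
                                written.add(s);
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            for (int r = 0; r < MAX_READERS; r++) {
                readers.execute(() -> {
                    try {
                        int start;
                        // Each reader claims the next BATCH_SIZE indices atomically.
                        while ((start = cursor.getAndAdd(BATCH_SIZE)) < source.size()) {
                            int end = Math.min(start + BATCH_SIZE, source.size());
                            queue.put(new ArrayList<>(source.subList(start, end)));
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            readers.shutdown();
            readers.awaitTermination(1, TimeUnit.MINUTES);
            // All real batches are queued; now send one stop signal per writer.
            for (int w = 0; w < MAX_WRITERS; w++) {
                queue.put(POISON);
            }
            writers.shutdown();
            writers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            source.add("string-" + i);
        }
        System.out.println("handled " + run(source).size() + " strings");
    }
}
```

With more than one writer the output order is not guaranteed to match the list order; FIFO only applies to the order in which batches leave the queue.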
I think that the Wikipedia article has sufficient pseudo code for you to use, but you can also check out some of the following SO questions:
https://stackoverflow.com/search?q=java+producer+consumer
Java Producer-Consumer Designs:
Producer/Consumer threads using a Queue
design of a Producer/Consumer app
I want to parse multiple files to extract the required data and then write the output into an XML file. I have used the Callable interface to implement this. My colleague asked me to use a Java 8 feature that does this job easily. I am really confused about which one I should use now.
list.parallelStream().forEach(a -> {
    System.out.println(a);
});
Using concurrency or a parallel stream only helps if you have independent tasks to work on. A good example of when you wouldn't do this is when you are locking on a shared resource, e.g.
// makes no sense to use parallel here.
list.parallelStream().forEach(a -> {
    // locks System.out so only one thread at a time can do any work.
    System.out.println(a);
});
However, as a general question, I would use parallelStream for processing data, instead of the concurrency libraries directly, because:
a functional style of coding discourages shared mutable state. (Actually, you are not supposed to have any mutable state in functional programming, but Java is not really a functional language.)
it's easier to write and understand for processing data.
it's easier to test whether using parallel helps or not. Most likely it won't, and you can just as easily change it back to being serial.
IMHO, given that the chance that parallel code will really help is low, the best feature of parallelStream is not how simple it is to add, but how simple it is to take out.
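To make the "easy to take out" point concrete, here is a small sketch in which one flag switches the same pipeline between serial and parallel. The class and method names (StreamToggle, lengths) are made up for the example, and taking the string length merely stands in for whatever independent per-item work you have.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example: the per-item work is independent and touches no shared state.
class StreamToggle {
    static List<Integer> lengths(List<String> lines, boolean parallel) {
        // The only difference between the two modes is how the stream is obtained.
        Stream<String> stream = parallel ? lines.parallelStream() : lines.stream();
        // collect() keeps the encounter order of the list in both modes.
        return stream.map(String::length).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "bb", "ccc");
        System.out.println(lengths(lines, false));
        System.out.println(lengths(lines, true));
    }
}
```

Because both modes return the same result, you can time each one on your real data and keep whichever is faster.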
The concurrency library is better if you have ad hoc work which is difficult to model as a stream of data, e.g. a worker pool for client requests might be simpler to implement using an ExecutorService.
I want to implement a queue, that is hit by multiple threads.
This queue is in a singleton class.
Now, a simple solution is to synchronize this? I assume it would need this as standard?
However, I want to prioritize writing to it.
So, write is high priority, read is low priority.
Is this possible?
Ideally writing by multiple threads without synchronizing would be great, if possible.
Why do you want to avoid synchronizing? It's possible to write "lock-free" structures, but it's quite tricky and easy to get wrong.
If I were you, I'd use ArrayBlockingQueue or ConcurrentLinkedQueue (or one of the other structures from java.util.concurrent) and make your life easy!
Oh, and I missed the bit about prioritising writes over reads. You can do that with the ReentrantReadWriteLock class. Then you don't need a thread-safe queue - you just lock externally using the read-write lock depending on whether you're reading or writing.
Can anybody suggest how I can show the statistical difference between normal multithreading and multithreading with Executors, in terms of e.g. CPU time, total thread user time, memory usage, and so on?
Any suggestions will be helpful.
I am not sure I understand the term "statistical difference". I believe that you are asking about using executors versus the plain thread API, and what the difference between them is.
First, executors are based on threads; they are just another layer on top of them. No magic. The plain threading API lets you create and manage multithreaded applications but requires dealing with the gory details of thread synchronization, pooling, transferring data between threads, etc.
The Executors framework solves some of these problems. You can define the thread pool policy, choose the queue type according to your needs, and just put new tasks on the incoming queue. The thread pool will execute the tasks according to its configuration.
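A minimal sketch of what that looks like in practice: a ThreadPoolExecutor with a fixed pool size and an explicitly chosen queue type, with tasks simply submitted to it. The pool size of 4, the unbounded LinkedBlockingQueue, and the names PoolDemo and sumOfSquares are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical example: the pool handles thread creation, reuse and task hand-off.
class PoolDemo {
    static int sumOfSquares(int n) {
        // Pool policy: 4 core threads, 4 max threads, unbounded FIFO work queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                futures.add(pool.submit(() -> x * x)); // each task is independent
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10));
    }
}
```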
The problem is that your question is asking for something that makes little sense.
Before you can meaningfully talk about the "statistical difference" between things, you have to have some way of quantifying and measuring them. And before that can happen, you need a clear statement of what you are trying to quantify / measure.
What you are asking satisfies none of these criteria.
Assuming that you have a meaningful question ...
At a practical level, the normal way that people try to quantify the effect of something like this (using thread pools versus creating new threads) is to develop a benchmark application with variants corresponding to the two strategies. Then measure the relative performance. But this has many problems.
The most fundamental problem is that what you are actually measuring is the effect of the two strategies for that benchmark, and that benchmark only. Generalizing from the benchmark to other applications is very difficult. The problem is that there are "hidden parameters" embedded in the design of any benchmark. For instance, the number of processors, the number of threads, the length and complexity of the tasks, and so on. Without having a good intuition as to what the parameters are, it is difficult to design a benchmark to take them into account. And even if you succeed in figuring out what the hidden parameters are and quantifying their effect, you have the problem that you can't figure out what those parameters will be in a real (more complex) application. At the end of the day, you'll end up with a model that can't give you quantitative answers for real problems. (Computing has nothing like Newton's Law of Gravity.)
Is there an implementation of a queue that blocks on take but is bounded by a maximum size? When the size of the queue reaches the given maximum, instead of blocking 'put', it should remove the head element and insert the new one. So put() does not block but take() does.
One use case is a very slow consumer: the system should not crash (run out of memory). Instead, the oldest messages are dropped, because I do not want to block the producer.
An example of this would be a stock trading system. When you get a spike in stock trade/quote data and you haven't consumed the earlier data, you want to automatically throw away the old trades/quotes.
Java does not currently have a thread-safe queue that does what you are looking for. However, there is BlockingDeque (a double-ended queue), around which you can write a wrapper that takes from the head or the tail as you see fit.
This class, like a BlockingQueue, is thread-safe.
Several strategies are provided in ThreadPoolExecutor. Search for "AbortPolicy" in the ThreadPoolExecutor javadoc. You can also implement your own policy if you want. Perhaps one of the Discard policies (DiscardPolicy or DiscardOldestPolicy) is similar to what you want. Personally I think CallerRuns is what you want in most cases.
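For illustration, here is a small sketch of ThreadPoolExecutor.DiscardOldestPolicy, the built-in policy that drops the oldest queued task when the queue is full. The single worker, the queue capacity of 1, and the latch that holds the first task are contrived purely to make the behaviour visible; the class name DiscardOldestDemo is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: one worker thread, room for one waiting task.
class DiscardOldestDemo {
    static List<String> run() {
        List<String> ran = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.DiscardOldestPolicy());
        try {
            // A occupies the only worker until the gate opens.
            pool.execute(() -> { awaitQuietly(gate); ran.add("A"); });
            pool.execute(() -> ran.add("B")); // waits in the queue
            pool.execute(() -> ran.add("C")); // queue full: B is dropped, C is queued
            gate.countDown();
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(ran);
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // B never runs
    }
}
```

The submitting thread is never blocked here, which is the property the question asks for, at the cost of silently losing the dropped task.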
I think using these is a better solution, but if you absolutely want to implement it at the queue, I'd probably do it by composition. Perhaps use a LinkedList or something similar and guard every access with the synchronized keyword.
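A minimal sketch of the composition idea, assuming an ArrayDeque guarded by synchronized methods rather than a LinkedList: put() never blocks and evicts the head when the queue is full, while take() blocks until an element is available. The class name DropOldestQueue and the choice to return the evicted element from put() are assumptions for this example.

```java
import java.util.ArrayDeque;

// Hypothetical wrapper: non-blocking put that drops the oldest element, blocking take.
class DropOldestQueue<E> {
    private final ArrayDeque<E> items = new ArrayDeque<>();
    private final int capacity;

    DropOldestQueue(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Never blocks. Returns the evicted head element, or null if nothing was dropped. */
    synchronized E put(E e) {
        E evicted = (items.size() == capacity) ? items.pollFirst() : null;
        items.addLast(e);
        notifyAll(); // wake any consumer waiting in take()
        return evicted;
    }

    /** Blocks until an element is available, then removes and returns the head. */
    synchronized E take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();
        }
        return items.pollFirst();
    }

    synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) throws InterruptedException {
        DropOldestQueue<String> quotes = new DropOldestQueue<>(2);
        quotes.put("quote-1");
        quotes.put("quote-2");
        quotes.put("quote-3"); // drops quote-1
        System.out.println(quotes.take());
    }
}
```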
EDIT:(some clarifications..)
"Executor" is basically a thread pool combined with a blocking queue. It is the recommended way to implement a producer/consumer pattern in Java. The authors of these libraries provide several strategies to cope with issues like the one you mentioned. If you are interested, here is another approach to specifically address the OOME issue (the source is framework specific and can't be used as is).
I am facing this issue:
I have lots of threads (1024) that access one large collection - a Vector.
Question:
Is it possible to do something about it that would allow me to perform concurrent actions on it without having to synchronize everything (since that takes time)? What I mean is something like how a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks
Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
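As a sketch of those semantics, an ArrayBlockingQueue with capacity 1 accepts an element only when it is empty and hands one out only when it is not. The class and method names here (SingleSlot, demo) are made up for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo of "add only if empty, remove only if non-empty".
class SingleSlot {
    static String demo() {
        BlockingQueue<String> slot = new ArrayBlockingQueue<>(1);
        boolean first = slot.offer("a");  // accepted: the slot was empty
        boolean second = slot.offer("b"); // rejected: the slot is full, no blocking
        String taken = slot.poll();       // removes "a"; would return null if empty
        boolean third = slot.offer("c");  // accepted again
        return first + "," + second + "," + taken + "," + third;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

offer() and poll() return immediately; put() and take() are the blocking variants if threads should wait instead.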
Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List).
Looks like you are implementing the producer-consumer pattern. You should google "producer consumer java" or have a look at the BlockingQueue interface.
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation while other threads are updating the map. So, for example, your app will occasionally add elements even when the collection is not actually empty.
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your user threads post a Callable on a queue and have only one thread deal with the mutation. This will eliminate the need for synchronization on the collection. The user threads can wait on Future.get().
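A rough sketch of that idea, assuming a single-threaded ExecutorService as the queue plus mutator thread: the list is only ever touched by that one thread, so it needs no locking, and callers get a Future back. The class and method names (SingleWriter, addIfEmpty, removeFirst) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: all mutations are funnelled through one thread.
class SingleWriter {
    // Confined to the mutator thread, so a plain ArrayList is enough.
    private final List<String> items = new ArrayList<>();
    private final ExecutorService mutator = Executors.newSingleThreadExecutor();

    /** Adds only if the list is currently empty; the Future reports whether it was added. */
    Future<Boolean> addIfEmpty(String s) {
        return mutator.submit(() -> items.isEmpty() && items.add(s));
    }

    /** Removes the first element; the Future yields null if the list was empty. */
    Future<String> removeFirst() {
        return mutator.submit(() -> items.isEmpty() ? null : items.remove(0));
    }

    void shutdown() {
        mutator.shutdown();
    }

    static String demo() {
        SingleWriter w = new SingleWriter();
        try {
            boolean a = w.addIfEmpty("x").get(); // list was empty
            boolean b = w.addIfEmpty("y").get(); // list already holds "x"
            String r = w.removeFirst().get();
            return a + "," + b + "," + r;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            w.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the check and the mutation run as one task on one thread, the "add only if empty" condition is enforced strictly.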
Just an idea.
If you do not want to change your data structure and have only infrequent writes, you might also use one or more ReentrantReadWriteLock instances to synchronize access. Then many threads can read at the same time, but when a thread wants to write, all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.
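For the read-write lock suggestion, a minimal sketch could look like the following: a plain ArrayList guarded by one ReentrantReadWriteLock, so that reads can proceed in parallel while a write has exclusive access. The class name GuardedList and its small API are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many concurrent readers, one writer at a time.
class GuardedList<E> {
    private final List<E> items = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    E get(int index) {
        lock.readLock().lock(); // shared: other readers may hold it too
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    void add(E element) {
        lock.writeLock().lock(); // exclusive: waits for readers to finish
        try {
            items.add(element);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedList<String> list = new GuardedList<>();
        list.add("a");
        list.add("b");
        System.out.println(list.size() + " " + list.get(1));
    }
}
```

This pays off mainly when reads greatly outnumber writes; with frequent writes it behaves much like a single lock.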