I was asked a question in an interview: I have a list available in the main method, and I was told that some operation must be performed on each item in the list. How would I achieve this using threads?
Consider the following scenario:
I have a list of integers and need to print all the values from the list. Can this be done with threads, so that multiple threads each work on one item and each thread prints one value, rather than one thread printing all the values? I am not trying to modify any value in the list.
I hope you are looking for something like this:
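import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiThreadExample {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        for (int i : list) {
            // One thread per item; 'i' is effectively final, so the anonymous class can capture it
            Thread th = new Thread() {
                @Override
                public void run() {
                    System.out.println(i);
                }
            };
            th.start();
        }
    }
}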
The output for one execution is:
run:
3
1
2
BUILD SUCCESSFUL (total time: 0 seconds)
Yes, it is a typical producer-consumer paradigm:
Imagine a Runnable class that receives an Iterator as a parameter. It waits on a certain monitor, consumes one item from the iterator, and finally notifies the same monitor, looping while the iterator has more items.
With this in place, it is enough to create the list of numbers, create the consumer threads, passing each of them the list's iterator, and start them.
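A minimal sketch of the idea above, not a definitive implementation. The names (`SharedIteratorDemo`, `consumeAll`) are made up for illustration, and instead of an explicit wait/notify handshake it simply synchronizes on the shared iterator, which is enough to make sure each item is taken by exactly one thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class SharedIteratorDemo {

    // Starts `threadCount` consumers over one shared iterator and returns
    // the items they handled, in whatever order the threads got to them.
    static List<Integer> consumeAll(List<Integer> source, int threadCount)
            throws InterruptedException {
        final Iterator<Integer> items = source.iterator();
        final List<Integer> handled =
                Collections.synchronizedList(new ArrayList<Integer>());

        Runnable consumer = () -> {
            while (true) {
                Integer item;
                // The iterator doubles as the monitor: only one thread at a
                // time may check hasNext() and take the next item.
                synchronized (items) {
                    if (!items.hasNext()) {
                        return;
                    }
                    item = items.next();
                }
                // The per-item operation itself runs outside the lock.
                System.out.println(Thread.currentThread().getName() + " -> " + item);
                handled.add(item);
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(consumer, "consumer-" + i);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait until every consumer has run out of items
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        consumeAll(Arrays.asList(1, 2, 3), 3);
    }
}
```

Which thread prints which value varies from run to run; the only guarantee is that every value is handled exactly once.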
The code below is not tested at all. It's just something that comes to mind. The last implementation, using parallelStream(), might be what you are looking for.
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
public class DemoApplication {
public static void main(String[] args) {
final List<Integer> myIntegerList = Arrays.asList(1, 2, 3);
// Good old for-each-loop
for (Integer item : myIntegerList) {
System.out.print(item);
}
// Java 8 forEach with Consumer
final Consumer<Integer> consumer = new Consumer<Integer>() {
@Override
public void accept(Integer item) {
System.out.print(item);
}
};
myIntegerList.forEach(consumer);
// Java 8 forEach with Lambda
myIntegerList.forEach((item) -> System.out.print(item));
// Java 8 forEach on parallelStream with Lambda
myIntegerList.parallelStream().forEach((item) -> System.out.print(item));
}
}
I am trying to understand the advantage of threads.
There are basically two reasons for using multiple threads in a program:
(1) Asynchronous event handling: Imagine a program that must wait for and respond to several different kinds of input, and each kind of input can happen at completely arbitrary times.
Before threads, we used to write a big event loop, that would poll for each different kind of event, and then dispatch to different handler functions. Things could start to get ugly when one or more of the event handlers was stateful (i.e., what it did next would depend on the history of previous events.)
A program that has one thread for each different kind of event often is much cleaner. That is to say, it's easier to understand, easier to modify, etc. Each thread loops waiting for just one kind of event, and its state (if any) can be kept in local variables, or its state can be implicit (i.e., depends on what function the thread is in at any given time).
(2) Multiprocessing (a.k.a., "parallel processing", "concurrent programming",...): Using worker threads to perform background computations probably is the most widespread model of multiprocessing in use at this moment in time.
Multithreading is the lowest level of all multiprocessing models, which means (a) it is the hardest to understand, but (b) it is the most versatile.
It can be done. We can make use of a ConcurrentHashMap: add the list's items to the map and pass it to the threads. Each thread will then try to get the lock on the resource before operating on it.
Related
We have multiple threads calling add(obj) on an ArrayList.
My theory is that when add is called concurrently by two threads, only one of the two objects being added is really added to the ArrayList. Is this plausible?
If so, how do you get around this? Use a synchronized collection like Vector?
There is no guaranteed behavior for what happens when add is called concurrently by two threads on ArrayList. However, it has been my experience that both objects have been added fine. Most of the thread safety issues related to lists deal with iteration while adding/removing. Despite this, I strongly recommend against using vanilla ArrayList with multiple threads and concurrent access.
Vector used to be the standard for concurrent lists, but now the standard is to use Collections.synchronizedList.
Also I highly recommend Java Concurrency in Practice by Goetz et al if you're going to be spending any time working with threads in Java. The book covers this issue in much better detail.
Any number of things could happen. You could get both objects added correctly. You could get only one of the objects added. You could get an ArrayIndexOutOfBoundsException because the size of the underlying array was not adjusted properly. Or other things may happen. Suffice it to say that you cannot rely on any behavior occurring.
As alternatives, you could use Vector, you could use Collections.synchronizedList, you could use CopyOnWriteArrayList, or you could use a separate lock. It all depends on what else you are doing and what kind of control you have over access to the collection.
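To make those alternatives concrete, here is a small sketch; the class and method names (`SafeAddDemo`, `fill`) are invented for this example. It runs the same concurrent add loop against three of the thread-safe options mentioned above and reports the final size:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CopyOnWriteArrayList;

class SafeAddDemo {

    // Runs `threads` threads that each add `perThread` integers to `target`,
    // waits for all of them, and returns the resulting size.
    static int fill(List<Integer> target, int threads, int perThread)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    target.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return target.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Each of these is safe for concurrent add(), so each line should
        // report 8 * 1000 = 8000 elements.
        System.out.println("synchronizedList: "
                + fill(Collections.synchronizedList(new ArrayList<Integer>()), 8, 1000));
        System.out.println("Vector: "
                + fill(new Vector<Integer>(), 8, 1000));
        System.out.println("CopyOnWriteArrayList: "
                + fill(new CopyOnWriteArrayList<Integer>(), 8, 1000));
    }
}
```

Passing a plain `new ArrayList<Integer>()` to `fill` instead is the unsafe case discussed above: it may report fewer elements or throw, and nothing about the outcome is guaranteed.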
You could also get a null, an ArrayIndexOutOfBoundsException, or something left up to the implementation. HashMaps have been observed to go into an infinite loop in production systems. You don't really need to know what might go wrong, just don't do it.
You could use Vector, but it tends to turn out that its interface is not rich enough. You will probably find that you want a different data structure in most cases.
I came up with the following code to mimic somewhat a real world scenario.
100 tasks are run in parallel and they update their completed status to the main program. I use a CountDownLatch to wait for task completion.
import java.util.concurrent.*;
import java.util.*;
public class Runner {
// Should be replaced with Collections.synchronizedList(new ArrayList<Integer>())
public List<Integer> completed = new ArrayList<Integer>();
/**
* @param args
*/
public static void main(String[] args) {
Runner r = new Runner();
ExecutorService exe = Executors.newFixedThreadPool(30);
int tasks = 100;
CountDownLatch latch = new CountDownLatch(tasks);
for (int i = 0; i < tasks; i++) {
exe.submit(r.new Task(i, latch));
}
try {
latch.await();
System.out.println("Summary:");
System.out.println("Number of tasks completed: "
+ r.completed.size());
} catch (InterruptedException e) {
e.printStackTrace();
}
exe.shutdown();
}
class Task implements Runnable {
private int id;
private CountDownLatch latch;
public Task(int id, CountDownLatch latch) {
this.id = id;
this.latch = latch;
}
public void run() {
Random r = new Random();
try {
Thread.sleep(r.nextInt(5000)); //Actual work of the task
} catch (InterruptedException e) {
e.printStackTrace();
}
completed.add(id);
latch.countDown();
}
}
}
I ran the application 10 times, and in at least 3 or 4 of those runs the program did not print the correct number of completed tasks. Ideally it should print 100 (if no exceptions happen), but in some cases it printed 98, 99, etc.
This shows that concurrent updates of an ArrayList do not give correct results.
If I replace the ArrayList with a synchronized version, the program outputs the correct results.
You can use List l = Collections.synchronizedList(new ArrayList()); if you want a thread-safe version of ArrayList.
The behavior is probably undefined since ArrayList isn't thread-safe. If you modify the list while an Iterator is iterating over it, then you will get a ConcurrentModificationException. You can wrap the ArrayList with Collections.synchronizedList, use a thread-safe collection (there are many), or just put the add calls in a synchronized block.
Instead of new ArrayList() you could use:
Collections.synchronizedList( new ArrayList() );
or
new Vector();
I prefer synchronizedList because it:
is 50-100% faster
can work with already existing ArrayLists
In my recent experience, adding new elements to an ArrayList from different threads will miss a few of them; using Collections.synchronizedList(new ArrayList()) avoids that issue.
List<String> anotherCollection = new ArrayList<>();
List<String> list = new ArrayList<>();
// if 'anotherCollection' is big enough, this will miss some elements.
anotherCollection.parallelStream().forEach(el -> list.add("element" + el));
List<String> listSync = Collections.synchronizedList(new ArrayList<>());
// regardless of how big 'anotherCollection' is, this will add all the elements.
anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
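Another option, sketched here on the assumption that the goal is simply to build the transformed list, is to let the stream assemble the result itself instead of adding to a shared list from inside forEach. `CollectDemo` and `label` are illustrative names, and the input is a list of integers only because that is easy to generate:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectDemo {

    static List<String> label(List<Integer> input) {
        // Each worker thread fills its own partial container and the stream
        // merges them at the end, so no list is shared while being written.
        return input.parallelStream()
                .map(el -> "element" + el)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 100_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(label(input).size());
    }
}
```

Because the source list is ordered, the collected result also keeps the source order, which the forEach-and-add approach does not promise.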
java.util.concurrent has a thread-safe array list (CopyOnWriteArrayList). The standard ArrayList is not thread-safe and the behavior when multiple threads update at the same time is undefined. There can also be odd behaviors with multiple readers when one or more threads is writing at the same time.
http://java.sun.com/j2se/1.4.2/docs/api/java/util/ArrayList.html
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Since there is no synchronization internally, what you theorize is not plausible.
So, things get out of sync, with unpleasant and unpredictable results.
I have gotten my code into a state where I am creating a couple of threads and then inside those threads I use a library framework which spawns some additional threads over the life span of my application.
I have no control over how many threads are spawned inside the library framework, but I know they exist because I can see them in the Eclipse debugger. I have kept the threads I use outside the library framework to a minimum, because I really don't want a multithreaded application, but sometimes you have to.
Now I am at the point where I need to do things with sockets and I/O, both of which are inherently hard to deal with in a multithreaded environment. While I am going to make my program thread safe, I'd rather not get into that situation in the first place, or at least minimize the occurrences. The classes I am attempting to reduce multithreading in aren't time sensitive, and I'd like them to complete "when they get the time". As it happens, the lazy work is all in the same class definition, but due to reasons the class is instantiated a hell of a lot.
I was wondering whether it is possible to make all instances of a single class use only one thread, even when they are instantiated from multiple threads, and if so, how?
I imagine the only way to achieve this would be to create a separate thread specifically for handling and processing instances of that single class type.
Or do I just have to think of a new way to structure my code?
EDIT: included an example of my application's architecture:
public class Example {
public ArrayList<ThreadTypeA> threads = new ArrayList<ThreadTypeA>();
public static void main(String[] args) {
threads.add(new ThreadTypeA());
// left out how dataObj gets to ThreadTypeB for brevity
dataObj data = new dataObj(events);
}
}
public ThreadTypeA {
public ArrayList<ThreadTypeB> newThreads = new ArrayList<ThreadTypeB>();
public Thread thread = new Thread(this, "");
}
public ThreadTypeB {
// left out how dataObj gets to ThreadTypeB for brevity
public libObj libObj = new Library(dataObj);
}
public Library {
public Thread thread = new Thread(this, "");
@Override
public void editMe(dataObj) {
dataObj.callBack();
}
}
public dataObj(events) {
public void callMe() {
for (Event event: events) {
event.callMe();
}
}
}
There are a number of different events that can be called, ranging from writing to files and making SQL queries to sending emails and using proprietary Ethernet-serial comms. I want all events to run on the same thread, sequentially.
Rather than having Threads, consider having Callables or Runnables. These are objects which represent the work that is to be done. Your code can pass these to a thread pool for execution, and you'll get a Future back. If you care about the answer, you call get on the Future and your code will wait for the execution to complete. If it's fire-and-forget, then you can be assured it's queued and will get done in good time.
Generally it makes more sense to divorce your execution code from the threads that run it to allow patterns like this.
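As a rough sketch of that separation, assuming the goal from the question is to have every event run on one thread, one after another: a single-thread executor queues whatever is submitted to it and runs the tasks in submission order. The names `SequentialEvents` and `runAll` and the event strings are placeholders, not part of the asker's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SequentialEvents {

    // Submits one task per event name to a single worker thread and returns
    // what each task reported, in submission order.
    static List<String> runAll(List<String> eventNames) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : eventNames) {
                Callable<String> task =
                        () -> name + " on " + Thread.currentThread().getName();
                pending.add(worker.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // waits for that task to finish
            }
            return results;
        } finally {
            worker.shutdown(); // lets the worker thread exit once idle
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : runAll(Arrays.asList("write-file", "sql-query", "send-email"))) {
            System.out.println(line);
        }
    }
}
```

In a long-running application the executor would more likely be created once and shared by every caller, so that work submitted from any thread ends up on the same worker.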
To restrict thread resources use a limited thread pool:
ExecutorService executor = Executors.newFixedThreadPool(4);
for (int i = 0; i < 100; ++i) {
executor.execute(new Runnable() { ... });
}
executor.shutdown();
Reusing the threads of such a pool is also said to be faster than creating new ones.
It may be a faint hope, but perhaps the library does something similar and even lets you configure the thread pool size.
I have two threads which both need to access an ArrayList<short[]> instance variable.
One thread is going to asynchronously add short[] items to the list via a callback when new data has arrived : void dataChanged(short[] theData)
The other thread is going to periodically check if the list has items and if it does it is going to iterate over all the items, process them, and remove them from the array.
How can I set this up to guard for collisions between the two threads?
This contrived code example currently throws a java.util.ConcurrentModificationException
//instance variables
private ArrayList<short[]> list = new ArrayList<short[]>();
//asynchronous callback happening on the thread that adds the data to the list
void dataChanged(short[] theData) {
list.add(theData);
}
//thread that iterates over the list and processes the current data it contains
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
while (true) {
for(short[] item : list) {
//process the data
}
//clear the list to discard the data which has been processed.
list.clear();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
You might want to use a producer consumer queue like an ArrayBlockingQueue instead or a similar concurrent collection.
The producer–consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue. The producer's job is to generate a piece of data, put it into the buffer and start again. At the same time, the consumer is consuming the data (i.e., removing it from the buffer) one piece at a time. The problem is to make sure that the producer won't try to add data into the buffer if it's full and that the consumer won't try to remove data from an empty buffer.
One thread offers short[]s and the other take()s them.
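A sketch of how that could look for the short[] case. The class name `SampleBuffer`, the capacity of 1024, and the batching in `takeBatch` are assumptions made for this example, not something the question specifies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SampleBuffer {
    // The capacity is an arbitrary choice for this sketch.
    private final BlockingQueue<short[]> queue = new ArrayBlockingQueue<>(1024);

    // Called on the producer (callback) thread. offer() does not block:
    // it returns false if the buffer is currently full.
    boolean dataChanged(short[] theData) {
        return queue.offer(theData);
    }

    // Called on the consumer thread: blocks until at least one item is
    // available, then also grabs everything else that is already queued.
    List<short[]> takeBatch() throws InterruptedException {
        List<short[]> batch = new ArrayList<>();
        batch.add(queue.take());
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        SampleBuffer buffer = new SampleBuffer();
        Thread producer = new Thread(() -> {
            for (short i = 0; i < 5; i++) {
                buffer.dataChanged(new short[] {i});
            }
        });
        producer.start();

        int processed = 0;
        while (processed < 5) {
            for (short[] item : buffer.takeBatch()) {
                System.out.println("processing item starting with " + item[0]);
                processed++;
            }
        }
    }
}
```

Because take() blocks while the queue is empty, the consumer no longer needs the sleep-and-poll loop, and removing items is part of taking them, so there is nothing to clear afterwards.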
The easiest way is to change the type of list to a thread safe list implementation:
private List<short[]> list = new CopyOnWriteArrayList<short[]>();
Note that this type of list is not extremely efficient if you mutate it a lot (add/remove) - but if it works for you that's a simple solution.
If you need more efficiency, you can use a synchronized list instead:
private List<short[]> list = Collections.synchronizedList(new ArrayList<short[]>());
But you will need to synchronize for iterating:
synchronized(list) {
for(short[] item : list) {
//process the data
}
}
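One thing to watch with the synchronized approach: if the list is iterated in one synchronized block and cleared afterwards, anything added in between is discarded unprocessed. A possible way around that, sketched here with a hypothetical `GuardedList` class and a separate lock object, is to swap the list out under the lock and process the batch outside it:

```java
import java.util.ArrayList;
import java.util.List;

class GuardedList {
    private final Object lock = new Object();
    private List<short[]> pending = new ArrayList<>();

    // Producer side: the add is guarded by the lock.
    void dataChanged(short[] theData) {
        synchronized (lock) {
            pending.add(theData);
        }
    }

    // Consumer side: hand over everything collected so far and start a
    // fresh list, all in one step under the lock. The caller then owns the
    // returned batch and can process it without holding the lock.
    List<short[]> drain() {
        synchronized (lock) {
            List<short[]> batch = pending;
            pending = new ArrayList<>();
            return batch;
        }
    }

    public static void main(String[] args) {
        GuardedList guarded = new GuardedList();
        guarded.dataChanged(new short[] {1, 2});
        guarded.dataChanged(new short[] {3});
        for (short[] item : guarded.drain()) {
            System.out.println("processing " + item.length + " samples");
        }
    }
}
```

The producer is only ever blocked for the duration of an add or a reference swap, never for the whole processing pass.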
EDIT: proposals to use a BlockingQueue are probably better but would need more changes in your code.
You might look into a BlockingQueue for this instead of an ArrayList.
Take a look at Java's synchronization support.
This page covers making a group of statements synchronized on a specified object. That is: only one thread may execute any sections synchronized on that object at once, all others have to wait.
You can use synchronized blocks, but I think the best solution is to not share mutable data between threads at all.
Have each thread write to its own space, then collect and aggregate the results when the workers are finished.
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#synchronizedList%28java.util.List%29
You can ask the Collections class to wrap up your current ArrayList in a synchronized list.
We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return their own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing - we have considered two options so far :-
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to the file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the thread has taken its timing result, being blocked waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be the best way, or are there any other better ways ?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a threadpool, and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;
public class WebServiceTester {
    public static class Tester implements Callable<Long> {
        @Override
        public Long call() {
            long start = System.currentTimeMillis();
            //Do your test here
            long end = System.currentTimeMillis();
            return end - start; // elapsed time in milliseconds
        }
    }
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (int i = 0; i < 1000; i++) {
            futures.add(pool.submit(new Tester()));
        }
        // A List keeps every timing; a Set would silently drop duplicate values
        List<Long> results = new ArrayList<Long>();
        for (Future<Long> future : futures) {
            results.add(future.get()); // blocks until that task has finished
        }
        pool.shutdown();
        //Manipulate results however you wish....
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the java concurrency packages), each thread decrementing it (flagging they are finished), then once all complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking them what their time was.
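A sketch of the CountDownLatch variant, with assumptions stated up front: `LatchTimings` and `timeAll` are invented names, the web service call is represented by an arbitrary Runnable, and each thread stores its timing in its own array slot so nothing needs to be synchronized beyond the latch itself.

```java
import java.util.concurrent.CountDownLatch;

class LatchTimings {

    // Runs `call` once on each of `threadCount` threads and returns the
    // elapsed time of each run in nanoseconds.
    static long[] timeAll(int threadCount, Runnable call) throws InterruptedException {
        final long[] nanos = new long[threadCount];
        final CountDownLatch done = new CountDownLatch(threadCount);
        for (int i = 0; i < threadCount; i++) {
            final int slot = i; // each thread writes only to its own slot
            new Thread(() -> {
                long start = System.nanoTime();
                try {
                    call.run();
                } finally {
                    nanos[slot] = System.nanoTime() - start;
                    done.countDown(); // flag this thread as finished
                }
            }).start();
        }
        done.await(); // returns once every thread has counted down
        return nanos;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] timings = timeAll(10, () -> {
            try {
                Thread.sleep(20); // stand-in for the web service call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (long nanos : timings) {
            System.out.println(nanos / 1_000_000 + " ms");
        }
    }
}
```

The main thread reads the array only after await() returns, by which point every worker has already stored its result.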
The executor framework can be implemented here. The time processing can be done by the Callable object. The Future can help you identify if the thread has completed processing.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
I understand the concept behind threading and have written threads in other languages, but I am having trouble understanding how to adapt them to my needs in java.
Basically, at present I have a vector of objects, which are read in from a file sequentially.
The file then has a list of events, which need to happen concurrently, so waiting for one event that takes 20-30 seconds to finish is not an option.
There are only a couple of methods in the object which deal with these events. However, from looking at tutorials, objects must extend Thread or implement Runnable; yet if the object is in a thread, making a method call to that object seems to happen sequentially anyway.
Any extra information would be appreciated, as I am clearly missing something; I am just not quite sure what!
So to summarise: how can I execute a single method using a thread?
To start a thread you call start() on an instance of Thread or a subclass thereof. The start() method returns immediately. At the same time, the other thread (the one incarnated by the Thread instance) takes off, and proceeds with executing the run() method of the Thread instance.
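For the specific question of running a single method on a thread, a small sketch may help. The `Event` class and its `handle` method are stand-ins for the asker's objects; the point is that the object itself does not have to extend Thread or implement Runnable, because a method reference can serve as the Runnable:

```java
class SingleMethodDemo {

    // Stand-in for one of the objects read from the file.
    static class Event {
        private final String name;
        private volatile boolean handled;

        Event(String name) {
            this.name = name;
        }

        // Stand-in for the method that takes 20-30 seconds.
        void handle() {
            handled = true;
        }

        boolean isHandled() {
            return handled;
        }
    }

    // Runs event.handle() on a new thread and returns that thread.
    static Thread startHandling(Event event) {
        Thread t = new Thread(event::handle, "handler-" + event.name);
        t.start(); // returns immediately; handle() runs on the new thread
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread first = startHandling(new Event("a"));
        Thread second = startHandling(new Event("b"));
        // Both calls are now in flight at the same time; join() waits for them.
        first.join();
        second.join();
        System.out.println("both events handled");
    }
}
```

Calling `event.handle()` directly would run it on the calling thread, which is why method calls on an object appear sequential no matter which class the object extends.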
Managing threads is not as easy as it seems. For a smoother API, try using an Executor (see the classes in java.util.concurrent).
The best thing to do in Java is create another class that takes in the data you need to process and performs whatever you need it to perform:
class Worker implements Runnable {
    private final Object mydata;

    Worker(Object data) {
        mydata = data;
    }

    @Override
    public void run() {
        //process the data
        System.out.println(mydata.toString());
        //or if you want to use your class then
        //(this cast only works if a YourClass instance was passed in):
        YourClass yc = (YourClass) mydata;
        yc.methodB();
    }
}
class YourClass
{
private final ExecutorService executor = Executors.newCachedThreadPool();
private ArrayList<Object> list;
YourClass()
{
list = new ArrayList<Object>();
list.add(new Object());
...
...
list.add(new Object());
}
void methodA()
{
for(Object item : list )
{
// Create a new thread with the worker class taking the data
executor.execute(new Worker(item));
}
}
void methodB(){/*do something else here*/}
}
Note that instead of getting the data, you can pass the actual class that you need the method to be invoked on:
executor.execute(new Worker(new MyClass()));
In the run method of the Worker class you invoke whatever you need to invoke on MyClass. The executor hands each Worker to one of its threads (creating a new one when none is idle) and calls run on it, so the Workers run on separate threads, in parallel.
Thomas has already given the technical details. I am going to try and focus on the logic.
Here is what I can suggest from my understanding of your problem.
Let's say you have a collection of objects of type X (or maybe even a mix of different types). You need to call methods foo and/or bar on these objects based on some specified event. So now, you may have a second collection that stores those events.
So we have two List objects (one for the X objects and other for the events).
Now, we have a function execute that will take X, and the event, and call foo or bar. This execute method can be wrapped in a thread, and executed simultaneously. Each of these threads can take one object from the list, increment the counter, and execute foo/bar. Once done, check the counter, and take the next one from the list. You can have 5 or more of these threads working on the list.
So, as we see, the objects coming from file do not have to be the Thread objects.
You have to be very careful that the List and counter are synchronized. Much better data structures are possible. I am sticking to a crude one for ease of understanding.
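The list-plus-counter scheme could be sketched like this. It is one possible reading of the description, with an AtomicInteger standing in for the synchronized counter and plain Runnables standing in for the "call foo or bar on X" step; `CounterWorkers` and `processAll` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CounterWorkers {

    // Starts `threadCount` workers that repeatedly claim the next index in
    // `jobs` and run that job. Returns how many jobs were completed.
    static int processAll(List<Runnable> jobs, int threadCount)
            throws InterruptedException {
        final AtomicInteger next = new AtomicInteger(0);
        final AtomicInteger completed = new AtomicInteger(0);
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                while (true) {
                    // Claiming an index is atomic, so no two workers
                    // ever pick the same job.
                    int index = next.getAndIncrement();
                    if (index >= jobs.size()) {
                        return;
                    }
                    jobs.get(index).run();
                    completed.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> jobs = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            final int id = i;
            jobs.add(() -> System.out.println(
                    Thread.currentThread().getName() + " ran job " + id));
        }
        System.out.println("completed: " + processAll(jobs, 5));
    }
}
```

The list is only read while the workers run; if jobs were added or removed concurrently, the list itself would need to be a thread-safe collection as well.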
Hope this helps.
The key to threads is to remember that each task that must run concurrently needs its own thread. Tasks executing in the same thread will execute sequentially. Dividing the concurrent tasks among separate threads will allow you to do your required concurrent processing.