How to use Multithreading to effectively

How to use Multithreading to effectively - java

I want to do a task that I've already completed except this time using multithreading. I have to read a lot of data from a file (line by line), grab some information from each line, and then add it to a Map. The file is over a million lines long so I thought it may benefit from multithreading.
I'm not sure about my approach here since I have never used multithreading in Java before.
I want to have the main method do the reading, and then giving the line that has been read to another thread which will format a String, and then give it to another thread to put into a map.
public static void main(String[] args)
{
//Some information read from file
BufferedReader br = null;
String line = '';
try {
br = new BufferedReader(new FileReader("somefile.txt"));
while((line = br.readLine()) != null) {
// Pass line to another task
}
// Here I want to get a total from B, but I'm not sure how to go about doing that
}
public class Parser extends Thread
{
private Mapper m1;
// Some reference to B
public Parse (Mapper m) {
m1 = m;
}
public parse (String s, int i) {
// Do some work on S
key = DoSomethingWithString(s);
m1.add(key, i);
}
}
public class Mapper extends Thread
{
private SortedMap<String, Integer> sm;
private String key;
private int value;
boolean hasNewItem;
public Mapper() {
sm = new TreeMap<String, Integer>;
hasNewItem = false;
}
public void add(String s, int i) {
hasNewItem = true;
key = s;
value = i;
}
public void run() {
while (!Thread.currentThread().isInterrupted()) {
try {
if (hasNewItem) {
// Find if street name exists in map
sm.put(key, value);
newEntry = false;
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
}
// I'm not sure how to give the Map back to main.
}
}
I'm not sure if I am taking the right approach. I also do not know how to terminate the Mapper thread and retrieve the map in the main. I will have multiple Mapper threads but I have only instantiated one in the code above.
I also just realized that my Parse class is not a thread, but only another class if it does not override the run() method so I am thinking that the Parse class should be some sort of queue.
And ideas? Thanks.
EDIT:
Thanks for all of the replies. It seems that since I/O will be the major bottleneck there would be little efficiency benefit from parallelizing this. However, for demonstration purpose, am I going on the right track? I'm still a bit bothered by not knowing how to use multithreading.

Why do you need multiple threads? You only have one disk and it can only go so fast. Multithreading it won't help in this case, almost certainly. And if it does, it will be very minimal from a user's perspective. Multithreading isn't your problem. Reading from a huge file is your bottle neck.

Frequently I/O will take much longer than the in-memory tasks. We refer to such work as I/O-bound. Parallelism may have a marginal improvement at best, and can actually make things worse.
You certainly don't need a different thread to put something into a map. Unless your parsing is unusually expensive, you don't need a different thread for it either.
If you had other threads for these tasks, they might spend most of their time sitting around waiting for the next line to be read.
Even parallelizing the I/O won't necessarily help, and may hurt. Even if your CPUs support parallel threads, your hard drive might not support parallel reads.
EDIT:
All of us who commented on this assumed the task was probably I/O-bound -- because that's frequently true. However, from the comments below, this case turned out to be an exception. A better answer would have included the fourth comment below:
Measure the time it takes to read all the lines in the file without processing them. Compare to the time it takes to both read and process them. That will give you a loose upper bound on how much time you could save. This may be decreased by a new cost for thread synchronization.

You may wish to read Amdahl's Law. Since the majority of your work is strictly serial (the IO) you will get negligible improvements by multi-threading the remainder. Certainly not worth the cost of creating watertight multi-threaded code.
Perhaps you should look for a new toy-example to parallelise.

Related

Missing updates with locks and ConcurrentHashMap

I have a scenario where I have to maintain a Map which can be populated by multiple threads, each modifying their respective List (unique identifier/key being the thread name), and when the list size for a thread exceeds a fixed batch size, we have to persist the records to the database.
Aggregator class
private volatile ConcurrentHashMap<String, List<T>> instrumentMap = new ConcurrentHashMap<String, List<T>>();
private ReentrantLock lock ;
public void addAll(List<T> entityList, String threadName) {
try {
lock.lock();
List<T> instrumentList = instrumentMap.get(threadName);
if(instrumentList == null) {
instrumentList = new ArrayList<T>(batchSize);
instrumentMap.put(threadName, instrumentList);
}
if(instrumentList.size() >= batchSize -1){
instrumentList.addAll(entityList);
recordSaver.persist(instrumentList);
instrumentList.clear();
} else {
instrumentList.addAll(entityList);
}
} finally {
lock.unlock();
}
}
There is one more separate thread running after every 2 minutes (using the same lock) to persist all the records in Map (to make sure we have something persisted after every 2 minutes and the map size does not gets too big)
if(//Some condition) {
Thread.sleep(//2 minutes);
aggregator.getLock().lock();
List<T> instrumentList = instrumentMap.values().stream().flatMap(x->x.stream()).collect(Collectors.toList());
if(instrumentList.size() > 0) {
saver.persist(instrumentList);
instrumentMap .values().parallelStream().forEach(x -> x.clear());
aggregator.getLock().unlock();
}
}
This solution is working fine in almost for every scenario that we tested, except sometimes we see some of the records went missing, i.e. they are not persisted at all, although they were added fine to the Map.
My questions are:
What is the problem with this code?
Is ConcurrentHashMap not the best solution here?
Does the List that is used with the ConcurrentHashMap have an issue?
Should I use the compute method of ConcurrentHashMap here (no need I think, as ReentrantLock is already doing the same job)?

The answer provided by #Slaw in the comments did the trick. We were letting the instrumentList instance escape in non-synchronized way i.e. access/operations are happening over list without any synchonization. Fixing the same by passing the copy to further methods did the trick.
Following line of code is the one where this issue was happening
recordSaver.persist(instrumentList);
instrumentList.clear();
Here we are allowing the instrumentList instance to escape in non-synchronized way i.e. it is passed to another class (recordSaver.persist) where it was to be actioned on but we are also clearing the list in very next line(in Aggregator class) and all of this is happening in non-synchronized way. List state can't be predicted in record saver... a really stupid mistake.
We fixed the issue by passing a cloned copy of instrumentList to recordSaver.persist(...) method. In this way instrumentList.clear() has no affect on list available in recordSaver for further operations.

I see, that you are using ConcurrentHashMap's parallelStream within a lock. I am not knowledgeable about Java 8+ stream support, but quick searching shows, that
ConcurrentHashMap is a complex data structure, that used to have concurrency bugs in past
Parallel streams must abide to complex and poorly documented usage restrictions
You are modifying your data within a parallel stream
Based on that information (and my gut-driven concurrency bugs detector™), I wager a guess, that removing the call to parallelStream might improve robustness of your code. In addition, as mentioned by #Slaw, you should use ordinary HashMap in place of ConcurrentHashMap if all instrumentMap usage is already guarded by lock.
Of course, since you don't post the code of recordSaver, it is possible, that it too has bugs (and not necessarily concurrency-related ones). In particular, you should make sure, that the code that reads records from persistent storage — the one, that you are using to detect loss of records — is safe, correct, and properly synchronized with rest of your system (preferably by using a robust, industry-standard SQL database).

It looks like this was an attempt at optimization where it was not needed. In that case, less is more and simpler is better. In the code below, only two concepts for concurrency are used: synchronized to ensure a shared list is properly updated and final to ensure all threads see the same value.
import java.util.ArrayList;
import java.util.List;
public class Aggregator<T> implements Runnable {
private final List<T> instruments = new ArrayList<>();
private final RecordSaver recordSaver;
private final int batchSize;
public Aggregator(RecordSaver recordSaver, int batchSize) {
super();
this.recordSaver = recordSaver;
this.batchSize = batchSize;
}
public synchronized void addAll(List<T> moreInstruments) {
instruments.addAll(moreInstruments);
if (instruments.size() >= batchSize) {
storeInstruments();
}
}
public synchronized void storeInstruments() {
if (instruments.size() > 0) {
// in case recordSaver works async
// recordSaver.persist(new ArrayList<T>(instruments));
// else just:
recordSaver.persist(instruments);
instruments.clear();
}
}
#Override
public void run() {
while (true) {
try { Thread.sleep(1L); } catch (Exception ignored) {
break;
}
storeInstruments();
}
}
class RecordSaver {
void persist(List<?> l) {}
}
}

Synchronizing searches and modifications

What's a good way of allowing searches from multiple threads on a list (or other data structure), but preventing searches on the list and edits to the list on different threads from interleaving? I tried using synchronized blocks in the searching and editing methods, but that can cause unnecessary blocking when trying to run searches in multiple threads.
EDIT: The ReadWriteLock is exactly what I was looking for! Thanks.

Usually, yes ReadWriteLock is good enough.
But, if you're using Java 8 you can get a performance boost with the new StampedLock that lets you avoid the read lock. This applies when you have much more frequent reads(searches) compared with writes(edits).
private StampedLock sl = new StampedLock();
public void edit() { // write method
long stamp = sl.writeLock();
try {
doEdit();
} finally {
sl.unlockWrite(stamp);
}
}
public Object search() { // read method
long stamp = sl.tryOptimisticRead();
Object result = doSearch(); //first try without lock, search ideally should be fast
if (!sl.validate(stamp)) { //if something has modified
stamp = sl.readLock(); //acquire read lock and search again
try {
result = doSearch();
} finally {
sl.unlockRead(stamp);
}
}
return result;
}

Java concurrency pattern to parallel parts of task

I read lines from file, in one thread of course. Lines was sorted by key.
Then I collect lines with same key (15-20 lines), make parsing, big calculation, etc, and push resulting object to statistic class.
I want to paralell my programm to read in one thread, make parsing and calc in many threads, and join results in one thread to write to stat class.
Is any ready pattern or solution in java7 framework for this problem?
I realize it with executor for multithreading, pushing to blockingQueue, and reading queue in another thread, but i think my code sucks and will produce bugs
Many thanks
upd:
I can't map all file in memory - it's very big

You already have the main classes of approaches in mind. CountDownLatch, Thread.join, Executors, Fork/Join. Another option is the Akka framework, which has message passing overheads measured in 1-2 microseconds and is open source. However let me share another approach that often out performs the above approaches and is simpler, this approach is born from working on batch file loads in Java for a number of companies.
Assuming that your goal of splitting the work up is performance, rather than learning. Performance as measured by how long it takes from start to finish. Then it is often difficult to make it faster than memory mapping the file, and processing in a single thread that has been pinned to a single core. It is also gives much simpler code too. A double win.
This may be counter intuitive, however the speed of processing files is nearly always limited by how efficient the file loading is. Not how parallel the processing is. Hence memory mapping the file is a huge win. Once memory mapped we want the algorithm to have low contention with the hardware as it performs the file load. Modern hardware tend to have the IO controller and the memory controller on the same socket as the CPU; which when combined with the prefetchers within the CPU itself lead to a hell of a lot of efficiency when processing the file in a orderly fashion from a single thread. This can be so extreme that going parallel may actually be a lot slower. Pinning a thread to a core usually speeds up memory bound algorithms by a factor of 5. Which is why the memory mapping part is so important.
If you have not already, give it a try.

Without facts and numbers it is hard to give you advices. So let's start from the beginning:
You must identify the bottleneck. Do you really need to perform the computation in parallel or is your job IO bound ? Avoid concurrency if possible, it could be faster.
If computations must be done in parallel you must decide how fine or coarse grained your tasks must be. You need to measure your computations and tasks to be able to size them. Avoid to create too many tasks
You should have a IO thread, several workers, and a "data gatherer" thread. No mutable data.
Be sure to not slow down the IO thread because of task submission. Otherwise you should use more coarse grained tasks or use a better task dispatcher (who said disruptor ?)
The "Data gatherer" thread should be the only one to mutate the final state
Avoid unnecessary data copy and object creation. Quite often, when iterating on large files the bottleneck is the GC. Last week, I achieved a 6x speedup replacing a standard scala object by a flyweight pattern. You should also try to pre-allocate everything and use large buffers (page sized).
Avoid disk seeks.
Having that said you should be one the right track. You can start with an Executor using properly sized tasks. Tasks write into a data structure, like your blocking queue, shared between workers and the "data gatherer" thread. This threading model is really simple, efficient and hard to get wrong. It is usually efficient enough. If you still require better performances then you must profile your application and understand the bottleneck. Then you can decide the way to go: refine your task size, use faster tools like the disruptor/Akka, improve IO, create fewer objects, tune your code, buy a bigger machine or faster disks, move to Hadoop etc. Pinning each thread to a core (require platform specific code) could also provide a significant boost.

You can use CountDownLatch
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/CountDownLatch.html
to synchronize the starting and joining of threads. This is better than looping on the set of threads and calling join() on each thread reference.

Here is what I would do if asked to split work as you are trying to:
public class App {
public static class Statistics {
}
public static class StatisticsCalculator implements Callable<Statistics> {
private final List<String> lines;
public StatisticsCalculator(List<String> lines) {
this.lines = lines;
}
#Override
public Statistics call() throws Exception {
//do stuff with lines
return new Statistics();
}
}
public static void main(String[] args) {
final File file = new File("path/to/my/file");
final List<List<String>> partitionedWork = partitionWork(readLines(file), 10);
final List<Callable<Statistics>> callables = new LinkedList<>();
for (final List<String> work : partitionedWork) {
callables.add(new StatisticsCalculator(work));
}
final ExecutorService executorService = Executors.newFixedThreadPool(Math.min(partitionedWork.size(), 10));
final List<Future<Statistics>> futures;
try {
futures = executorService.invokeAll(callables);
} catch (InterruptedException ex) {
throw new RuntimeException(ex);
}
try {
for (final Future<Statistics> future : futures) {
final Statistics statistics = future.get();
//do whatever to aggregate the individual
}
} catch (InterruptedException | ExecutionException ex) {
throw new RuntimeException(ex);
}
executorService.shutdown();
try {
executorService.awaitTermination(1, TimeUnit.DAYS);
} catch (InterruptedException ex) {
throw new RuntimeException(ex);
}
}
static List<String> readLines(final File file) {
//read lines
return new ArrayList<>();
}
static List<List<String>> partitionWork(final List<String> lines, final int blockSize) {
//divide up the incoming list into a number of chunks
final List<List<String>> partitionedWork = new LinkedList<>();
for (int i = lines.size(); i > 0; i -= blockSize) {
int start = i > blockSize ? i - blockSize : 0;
partitionedWork.add(lines.subList(start, i));
}
return partitionedWork;
}
}
I have create a Statistics object, this holds the result of the work done.
There is a StatisticsCalculator object which is a Callable<Statistics> - this does the calculation. It is given a List<String> and it processes the lines and creates the Statistics.
The readLines method I leave to you to implement.
The most important method in many ways is the partitionWork method, this divides the incoming List<String> which is all the lines in the file into a List<List<String>> using the blockSize. This essentially decides how much work each thread should have, tuning of the blockSize parameter is very important. As if each work is only one line then the overheads would probably outweight the advantages whereas if each work of ten thousand lines then you only have one working Thread.
Finally the meat of the opertation is the main method. This calls the read and then partition methods. It spawns an ExecutorService with a number of threads equal to the number of bits of work but up to a maximum of 10. You may way to make this equal to the number of cores you have.
The main method then submits a List of all the Callables, one for each chunk, to the executorService. The invokeAll method blocks until the work is done.
The method now loops over each returned List<Future> and gets the generated Statistics object for each; ready for aggregation.
Afterwards don't forget to shutdown the executorService as it will prevent your application form exiting.
EDIT
OP wants to read line by line so here is a revised main
public static void main(String[] args) throws IOException {
final File file = new File("path/to/my/file");
final ExecutorService executorService = Executors.newFixedThreadPool(10);
final List<Future<Statistics>> futures = new LinkedList<>();
try (final BufferedReader reader = new BufferedReader(new FileReader(file))) {
List<String> tmp = new LinkedList<>();
String line = null;
while ((line = reader.readLine()) != null) {
tmp.add(line);
if (tmp.size() == 100) {
futures.add(executorService.submit(new StatisticsCalculator(tmp)));
tmp = new LinkedList<>();
}
}
if (!tmp.isEmpty()) {
futures.add(executorService.submit(new StatisticsCalculator(tmp)));
}
}
try {
for (final Future<Statistics> future : futures) {
final Statistics statistics = future.get();
//do whatever to aggregate the individual
}
} catch (InterruptedException | ExecutionException ex) {
throw new RuntimeException(ex);
}
executorService.shutdown();
try {
executorService.awaitTermination(1, TimeUnit.DAYS);
} catch (InterruptedException ex) {
throw new RuntimeException(ex);
}
}
This streams the file line by line and, after a given number of lines fires a new task to process the lines to the executor.
You would need to call clear on the List<String> in the Callable when you are done with it as the Callable instances are references by the Futures they return. If you clear the Lists when you're done with them that should reduce the memory footprint considerably.
A further enhancement may well be to use the suggestion here for a ExecutorService that blocks until there is a spare thread - this will guranatee that there are never more than threads*blocksize lines in memory at a time if you clear the Lists when the Callables are done with them.

Updating highly read Lists/Maps in a concurrent environment

The following class acts as a simple cache that gets updated very infrequently (say e.g. twice a day) and gets read quite a lot (up to several times a second). There are two different types, a List and a Map. My question is about the new assignment after the data gets updated in the update method. What's the best (safest) way for the new data to get applied?
I should add that it isn't necessary for readers to see the absolute latest value. The requirements are just to get either the old or the new value at any given time.
public class Foo {
private ThreadPoolExecutor _executor;
private List<Object> _listObjects = new ArrayList<Object>(0);
private Map<Integer, Object> _mapObjects = new HashMap<Integer, Object>();
private Object _mutex = new Object();
private boolean _updateInProgress;
public void update() {
synchronized (_mutex) {
if (_updateInProgress) {
return;
} else {
_updateInProgress = true;
}
}
_executor.execute(new Runnable() {
#Override
public void run() {
try {
List<Object> newObjects = loadListObjectsFromDatabase();
Map<Integer, Object> newMapObjects = loadMapObjectsFromDatabase();
/*
* this is the interesting part
*/
_listObjects = newObjects;
_mapObjects = newMapObjects;
} catch (final Exception ex) {
// error handling
} finally {
synchronized (_mutex) {
_updateInProgress = false;
}
}
}
});
}
public Object getObjectById(Integer id) {
return _mapObjects.get(id);
}
public List<Object> getListObjects() {
return new ArrayList<Object>(_listObjects);
}
}
As you see, currently no ConcurrentHashMap or CopyOnWriteArrayList is used. The only synchronisation is done in the update method.
Although not necessary for my current problem, it would be also great to know the best solution for cases where it is essential for readers to always get the absolute latest value.

You could use plan synchronization unless you are reading over 10,000 times per second.
If you want concurrent access I would use on of the concurrent collections like ConcurrentHashMap or CopyOnWriteArrayList. These are simpler to use than synchronizing the collection. (i.e. you don't need them for performance reasons, use them for simplicity)
BTW: A modern CPU can perform billions of operations in 0.1 seconds so several times a second is an eternity to a computer.

I am also seeing this issue and think of multiple solutions:
Use synchronization block on the both codes, one where reading and other where writing.
Make a separate remove list, add all removable items in that list. Remove in the same thread where reading the list just after reading is done. This way reading and deleting will happen in sequence and no error will come.

reduce in performance when used multithreading in java

I am new to multi-threading and I have to write a program using multiple threads to increase its efficiency. At my first attempt what I wrote produced just opposite results. Here is what I have written:
class ThreadImpl implements Callable<ArrayList<Integer>> {
//Bloom filter instance for one of the table
BloomFilter<Integer> bloomFilterInstance = null;
// Data member for complete data access.
ArrayList< ArrayList<UserBean> > data = null;
// Store the result of the testing
ArrayList<Integer> result = null;
int tableNo;
public ThreadImpl(BloomFilter<Integer> bloomFilterInstance,
ArrayList< ArrayList<UserBean> > data, int tableNo) {
this.bloomFilterInstance = bloomFilterInstance;
this.data = data;
result = new ArrayList<Integer>(this.data.size());
this.tableNo = tableNo;
}
public ArrayList<Integer> call() {
int[] tempResult = new int[this.data.size()];
for(int i=0; i<data.size() ;++i) {
tempResult[i] = 0;
}
ArrayList<UserBean> chkDataSet = null;
for(int i=0; i<this.data.size(); ++i) {
if(i==tableNo) {
//do nothing;
} else {
chkDataSet = new ArrayList<UserBean> (data.get(i));
for(UserBean toChk: chkDataSet) {
if(bloomFilterInstance.contains(toChk.getUserId())) {
++tempResult[i];
}
}
}
this.result.add(new Integer(tempResult[i]));
}
return result;
}
}
In the above class there are two data members data and bloomFilterInstance and they(the references) are passed from the main program. So actually there is only one instance of data and bloomFilterInstance and all the threads are accessing it simultaneously.
The class that launches the thread is(few irrelevant details have been left out, so all variables etc. you can assume them to be declared):
class MultithreadedVrsion {
public static void main(String[] args) {
if(args.length > 1) {
ExecutorService es = Executors.newFixedThreadPool(noOfTables);
List<Callable<ArrayList<Integer>>> threadedBloom = new ArrayList<Callable<ArrayList<Integer>>>(noOfTables);
for (int i=0; i<noOfTables; ++i) {
threadedBloom.add(new ThreadImpl(eval.bloomFilter.get(i),
eval.data, i));
}
try {
List<Future<ArrayList<Integer>>> answers = es.invokeAll(threadedBloom);
long endTime = System.currentTimeMillis();
System.out.println("using more than one thread for bloom filters: " + (endTime - startTime) + " milliseconds");
System.out.println("**Printing the results**");
for(Future<ArrayList<Integer>> element: answers) {
ArrayList<Integer> arrInt = element.get();
for(Integer i: arrInt) {
System.out.print(i.intValue());
System.out.print("\t");
}
System.out.println("");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
I did the profiling with jprofiler and
![here]:(http://tinypic.com/r/wh1v8p/6)
is a snapshot of cpu threads where red color shows blocked, green runnable and yellow is waiting. I problem is that threads are running one at a time I do not know why?
Note:I know that this is not thread safe but I know that I will only be doing read operations throughout now and just want to analyse raw performance gain that can be achieved, later I will implement a better version.

Can anyone please tell where I have missed
One possibility is that the cost of creating threads is swamping any possible performance gains from doing the computations in parallel. We can't really tell if this is a real possibility because you haven't included the relevant code in the question.
Another possibility is that you only have one processor / core available. Threads only run when there is a processor to run them. So your expectation of a linear speed with the number of threads and only possibly achieved (in theory) if is a free processor for each thread.
Finally, there could be memory contention due to the threads all attempting to access a shared array. If you had proper synchronization, that would potentially add further contention. (Note: I haven't tried to understand the algorithm to figure out if contention is likely in your example.)
My initial advice would be to profile your code, and see if that offers any insights.
And take a look at the way you are measuring performance to make sure that you aren't just seeing some benchmarking artefact; e.g. JVM warmup effects.

That process looks CPU bound. (no I/O, database calls, network calls, etc.) I can think of two explanations:
How many CPUs does your machine have? How many is Java allowed to use? - if the threads are competing for the same CPU, you've added coordination work and placed more demand on the same resource.
How long does the whole method take to run? For very short times, the additional work in context switching threads could overpower the actual work. The way to deal with this is to make a longer job. Also, run it a lot of times in a loop not counting the first few iterations (like a warm up, they aren't representative.)

Several possibilities come to mind:
There is some synchronization going on inside bloomFilterInstance's implementation (which is not given).
There is a lot of memory allocation going on, e.g., what appears to be an unnecessary copy of an ArrayList when chkDataSet is created, use of new Integer instead of Integer.valueOf. You may be running into overhead costs for memory allocation.
You may be CPU-bound (if bloomFilterInstance#contains is expensive) and threads are simply blocking for CPU instead of executing.
A profiler may help reveal the actual problem.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.