Parallelization (Java 8) vs Concurrency (Java 7)

I want to parse multiple files to extract the required data and then write the output into an XML file. I have used the Callable interface to implement this. My colleague asked me to use a Java 8 feature which does this job easily. I am confused about which one of them I should use now.
list.parallelStream().forEach(a -> {
System.out.println(a);
});

Using concurrency or a parallel stream only helps if you have independent tasks to work on. A good example of when you wouldn't do this is when you are locking on a shared resource, e.g.
// makes no sense to use parallel here.
list.parallelStream().forEach(a -> {
// locks System.out so only one thread at a time can do any work.
System.out.println(a);
});
However, as a general question, I would use parallelStream for processing data, instead of the concurrency libraries directly, because:
a functional style of coding discourages shared mutable state. (Strictly, you are not supposed to have any mutable state in functional programming, but Java is not really a functional language.)
it's easier to write and understand for processing data.
it's easier to test whether using parallel helps or not. Most likely it won't, and you can just as easily change it back to being serial.
IMHO, given that the chances that parallel coding will really help are low, the best feature of parallelStream is not how simple it is to add, but how simple it is to take out.
The concurrency library is better if you have ad hoc work which is difficult to model as a stream of data, e.g. a worker pool for client requests might be simpler to implement using an ExecutorService.
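To illustrate the "easy to add, easy to take out" point, here is a minimal sketch. The parse method is a hypothetical stand-in for parsing one file (it only returns the name's length), and the class and method names are made up for the example. Results are gathered with collect rather than written to a shared list, so there is no shared mutable state to protect.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelToggle {
    // Hypothetical stand-in for "parse one file": a pure function of its input.
    static int parse(String fileName) {
        return fileName.length();
    }

    // One flag switches between serial and parallel processing.
    static List<Integer> parseAll(List<String> fileNames, boolean parallel) {
        Stream<String> stream = parallel ? fileNames.parallelStream() : fileNames.stream();
        // collect() gathers results without shared mutable state and keeps encounter order.
        return stream.map(ParallelToggle::parse).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.txt", "report.csv", "data.xml");
        System.out.println(parseAll(files, false)); // [5, 10, 8]
        System.out.println(parseAll(files, true));  // [5, 10, 8]
    }
}
```

Whether the parallel version is actually faster depends on the workload, so it is worth timing both before keeping it.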

Related

spring batch multi-threaded processor

I am trying to process records in processor step using multiple processor classes. These classes can work in parallel. Currently I have written a multi threaded step where I
Set input and output row for a processor class
Submit it to Executor service
Get all future objects and collect final output
Now, as soon as I make my job parallel by adding a taskExecutor, I get issues: the input objects set in step 1 get overwritten in step 2, and processors are called with the overwritten values. I tried to find out whether I can write a composite processor that delegates tasks to multiple steps in parallel, but they work only in a sequential manner.
Any inputs would be greatly helpful. Thanks !

Welcome to concurrency. You can get yourself into a lot of trouble when you do not follow the path which keeps you in the safe, deterministic world. You can get rid of all your issues if you use pure functions: your functions should not have any side effects, and all your variables should be final. You'll find that you won't have any concurrency issues if you stick to this. In general, stay away from using the threading libraries that ship with Java directly. You should treat thread pools, executors etc. as a resource. You should probably do a bit of reading about concurrency, locks, volatile variables, and why these lower-level constructs are hard to use, and then look at higher-order constructs such as promises, futures and actors.
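A minimal sketch of what that could look like in plain Java (not Spring Batch; Row, process and processAll are hypothetical names for illustration). The input is passed as an immutable argument instead of being set on a shared processor instance, so one task cannot overwrite another task's input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PureProcessorDemo {
    // Immutable input: all fields are final and there are no setters.
    static final class Row {
        final int id;
        final String text;

        Row(int id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Pure function: the result depends only on the argument; nothing is stored on a shared object.
    static String process(Row row) {
        return row.id + ":" + row.text.toUpperCase();
    }

    static List<String> processAll(List<Row> rows) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final Row row : rows) {
                Callable<String> task = () -> process(row);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(1, "alpha"), new Row(2, "beta"));
        System.out.println(processAll(rows)); // [1:ALPHA, 2:BETA]
    }
}
```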

Using Actors instead of `synchronized`

Every time I read about using synchronized in Scala the author will usually mention that Actors should be used instead (this for example). While I understand roughly how actors work I'd really like to see an example of Actors being used to replace Java's synchronized method modifier (by this I mean its Scala equivalent - the synchronized block) in a piece of code. Modifying the internals of a data structure for instance would be nice to see.
Is this a good use of Actors or have I been misinformed?

1) Overview
Scala Actors can replace the complex coordination logic in standard Java threaded applications, logic which often eludes developers working on complex multithreaded systems.
Consider the following Java code snippet that one might see in a simple, threaded application (this code is waiting for an asynchronous request to complete).
myAsyncRequest.startCalculation();
// Poll once per second; checkIfDone() is assumed to update notDone.
while (notDone) {
    myAsyncRequest.checkIfDone();
    Thread.sleep(1000);
}
System.out.println("Done ! Value is : " + myAsyncRequest.getCalculationValue());
To see a direct replacement of this sort of code using Scala's higher level concurrency model, check this post out : Scala program exiting before the execution and completion of all Scala Actor messages being sent. How to stop this?
2) Now, back to the code snippet. There are some obvious issues here; let's take a quick look:
The code is coupling the logic of "monitoring" the execution of calculation to the processing of the calculated results.
There are heuristics embedded in the code (Thread.sleep(1000)) which have no clear logical justification (why wait a second? Why not 3 seconds?), thus adding unnecessary logic to the code block.
It doesn't scale: if I'm running 1000 clients, and each is constantly checking the results, I could generate some pretty ugly traffic for no good reason.
How does Scala modify this paradigm?
Scala actors can return "futures"
These encapsulate the expectation that, soon enough, the "thing" that you want an actor to do will be accomplished. The Scala "future" replaces this Java construct: it makes explicit the fact that my while loop is "expecting" something to occur in the near future, and that there is an action to be done afterwards.
Scala actors can pass "messages"
Although I'm "waiting" (in the while loop above) for completion, it's obvious that another way to implement this would be for the calculation object to simply "tell me" when it is done. Message passing enables this, but it is somewhat complicated and leads to untraceable, unreadable code in some Java implementations. Since Scala abstracts this notion in a way that is directly designed to accommodate concurrent workloads, the message passing design pattern can now be implemented in a way which isn't overly complex, thus decoupling the logic of "waiting" from the logic of processing.
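For comparison, the same "tell me when it is done" idea can be sketched in Java using CompletableFuture, which was added in Java 8. This is not the Scala API the answer describes, only an analogous construct; calculate is a hypothetical stand-in for the long-running request.

```java
import java.util.concurrent.CompletableFuture;

class FutureInsteadOfPolling {
    // Hypothetical stand-in for the long-running calculation.
    static int calculate() {
        return 6 * 7;
    }

    static CompletableFuture<String> startCalculation() {
        return CompletableFuture
                .supplyAsync(FutureInsteadOfPolling::calculate)   // runs asynchronously
                .thenApply(value -> "Done! Value is: " + value);  // runs once the result is ready
    }

    public static void main(String[] args) {
        // join() blocks only here, at the edge of the program; there is no sleep-and-check loop.
        System.out.println(startCalculation().join()); // Done! Value is: 42
    }
}
```

The monitoring logic disappears: the follow-up action is attached to the future instead of being guarded by a polling loop with an arbitrary sleep interval.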
3) The short answer: In general, the Scala APIs are built to encode concurrent logic at a higher level of abstraction, so that your concurrent code is declarative, rather than muddled in implementation details.
4) Synchronization: a lower-level concept which, although essential, can complicate our code.
Synchronization is an artifact of lower-level, multithreaded programming. By providing higher-level abstractions of the most common parallel programming paradigms, Scala makes this particular construct unnecessary in many of the most common concurrent programming use cases. In fact, nowadays, even Java does this :) The java.util.concurrent package gives us atomic data types and data structures, obviating the need to wrap simple operations in "synchronized" blocks. However, standard Java does not support the higher-level notions of "Actors" and "Futures", which can be effectively managed and coordinated without needing to manually manage synchronized method calls or object modifications.
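As a small illustration of the atomic data types mentioned above, here is a sketch of a counter shared by several threads with no synchronized block. The class and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounterDemo {
    static int countWithThreads(int threads, int incrementsPerThread) {
        final AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no synchronized needed
                }
            });
            workers[i].start();
        }
        // Wait for every worker before reading the final value.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000)); // 40000
    }
}
```

With a plain int and count++ the total would usually come out lower, because increments from different threads can overwrite each other.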

Actors guarantee that only a single message will be handled at a time, so there will never be two threads accessing any of the instance members; ergo, no need to use synchronized.
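The one-message-at-a-time guarantee can be imitated in Java with a single-threaded executor acting as the mailbox. This is only a sketch of the idea, not an actor library; CounterActor and its methods are hypothetical names. The count field is read and written only by tasks running on the mailbox thread, so it needs neither synchronized nor volatile.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CounterActor {
    private int count = 0; // touched only by tasks running in the mailbox
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // "Send a message": enqueue the update instead of calling a synchronized method.
    void increment() {
        mailbox.execute(() -> count++);
    }

    // Reading the state is also a message, processed after everything queued before it.
    int get() {
        Callable<Integer> read = () -> count;
        try {
            return mailbox.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void stop() {
        mailbox.shutdown();
    }

    static int demo(int senders, int messagesPerSender) {
        CounterActor actor = new CounterActor();
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < messagesPerSender; j++) {
                    actor.increment();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            return actor.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            actor.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // 4000
    }
}
```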

Manually Increasing the Amount of CPU a Java Application Uses

I've just made a program with Eclipse that takes a really long time to execute. It's taking even longer because it's loading my CPU to 25% only (I'm assuming that is because I'm using a quad-core and the program is only using one core). Is there any way to make the program use all 4 cores to max it out? Java is supposed to be natively multi-threaded, so I don't understand why it would only use 25%.

You still have to create and manage threads manually in your application. Java can't determine that two tasks can run asynchronously and automatically split the work into several threads.

This is a pretty vague question because we don't know much about what your program does. If your program is single-threaded, then no number of cores on your machine is going to make it run any faster. Java does have threading support, but it won't automatically parallelize your code for you. To speed it up, you'll need to identify parts of the computation that can be run in parallel with one another and add code as appropriate to split up and reconstitute the work. Without more info on what your program does, I can't help you out.
Another important detail to note is that Java threads are not the same as system threads. The JVM often has its own thread scheduler that tries to put Java threads onto actual system threads in a way that's fair, but there's no actual guarantee that it will do so.

Yes, Java is multi-threaded, but the multi-threading doesn't happen "by magic".
Have a look either at the Thread class or at the Executor framework. Essentially you need to split your job into "subtasks", each of which can run on a single processor, then do something like this:
Executor ex = Executors.newFixedThreadPool(4);
while (thereAreMoreSubtasksToDo) {
    ex.execute(new Runnable() {
        public void run() {
            ... do subtask ...
        }
    });
}
Turning a serial routine/algorithm into a parallel one isn't necessarily trivial: you need to know in particular about a range of issues broadly termed "thread-safety". You may be interested in some material I've written about thread-safety in Java, and threading in general if you follow the links: the key thing to bear in mind is that if any data/objects are being shared among the different threads running, then you need to take special precautions. That said, for independent things that you just want to "run at the same time", then the above pattern will get you started.
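Here is a self-contained sketch of that pattern for one concrete, independent workload: summing the numbers 1..n. The workload and the names ParallelSum and sum are chosen only for illustration. Each subtask works on its own range and returns a partial result, so no data is shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {
    // Sums 1..n by splitting the range into one chunk per thread.
    static long sum(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = n / threads;
            for (int i = 0; i < threads; i++) {
                final long from = i * chunk + 1;
                // The last chunk also takes the remainder.
                final long to = (i == threads - 1) ? n : (i + 1) * chunk;
                Callable<Long> subtask = () -> {
                    long partial = 0;
                    for (long k = from; k <= to; k++) {
                        partial += k;
                    }
                    return partial;
                };
                parts.add(pool.submit(subtask));
            }
            // Reconstitute the result from the partial sums.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(1_000_000L, cores)); // 500000500000
    }
}
```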

Java is multi-threaded but if your application runs in only one thread, only one thread will be used. (Apart from the internal threads Java uses for finalization, garbage collection and so on.)
If you want your code to use multiple threads, you have to split it up manually, either by starting threads by yourself or using a third party thread pool. I'd suggest the latter option as it's safer but both can work equally well.

You've got a bit of learning ahead of you (actually, quite a bit of learning) - but it's learning you should do if you are going to be doing any serious programming.
Here's a starting point: http://download.oracle.com/javase/tutorial/essential/concurrency/
But you might want to look into a good book on Java multi-threading (I did this so long ago that any book I could recommend would be out of print). This sort of hard topic is well suited for learning from a text instead of online tutorials.

Approach to a thread safe program

All,
What should be the approach to writing a thread safe program? Given a problem statement, my perspective is:
1 > Start off with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace them with concurrent classes where possible
3 > Underline the critical sections and enclose them in synchronized
4 > Perform tests for deadlocks
Does anyone have any suggestions on other approaches or improvements to my approach? So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Programming in Java

Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide on what areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. The method used to make one area of the code threadsafe may differ from the methods used for other areas. For example, understanding whether there will be a high volume of reading vs writing is important and might affect the types of locks you use to protect the data.
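As one example of matching the lock type to the access pattern, a read-mostly structure could use a ReadWriteLock, which lets many readers proceed together while writers get exclusive access. This is a minimal sketch; the ReadMostlyCache class is hypothetical, and whether it beats a plain synchronized block or a ConcurrentHashMap depends on the workload.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadMostlyCache {
    private final Map<String, String> data = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at the same time.
    String get(String key) {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: no readers or other writers while it is held.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            data.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadMostlyCache cache = new ReadMostlyCache();
        cache.put("language", "Java");
        System.out.println(cache.get("language")); // Java
    }
}
```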
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.

You should use final and immutable fields wherever possible; put updates to any other data that you want to change inside:
synchronized (this) {
// update
}
And remember, sometimes stuff breaks, and if that happens, you don't want to prolong the program execution by trying every possible way to counter it; instead, "fail fast".

As you have asked about "thread-safety" and not concurrent performance, your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale well in a multi-CPU environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.
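A minimal sketch of that copy-on-write approach, using an AtomicReference and a compare-and-set loop. The AccountState and Snapshot names and the balance/version fields are hypothetical; the point is that several fields move together from one immutable state to the next in a single atomic step.

```java
import java.util.concurrent.atomic.AtomicReference;

class AccountState {
    // Immutable snapshot: both fields always change together.
    static final class Snapshot {
        final long balance;
        final int version;

        Snapshot(long balance, int version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> state = new AtomicReference<>(new Snapshot(0, 0));

    void deposit(long amount) {
        while (true) {
            Snapshot current = state.get();
            Snapshot next = new Snapshot(current.balance + amount, current.version + 1);
            // Publish the new state only if nobody changed it in the meantime; otherwise retry.
            if (state.compareAndSet(current, next)) {
                return;
            }
        }
    }

    Snapshot snapshot() {
        return state.get();
    }

    static Snapshot demo(int threads, int depositsPerThread) {
        AccountState account = new AccountState();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < depositsPerThread; j++) {
                    account.deposit(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.snapshot();
    }

    public static void main(String[] args) {
        Snapshot result = demo(4, 1000);
        System.out.println(result.balance + " after " + result.version + " updates"); // 4000 after 4000 updates
    }
}
```

Readers never block: they take whatever snapshot is current and can rely on its fields being consistent with each other.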

Buy and read Brian Goetz's "Java Concurrency in Practice".

Any variables (memory) that can be accessed by multiple threads, potentially at the same time, need to be protected by a synchronisation mechanism.

Multi-Threaded Application - Help with some pseudo code!

I am working on a multi-threaded application and need help with some pseudo-code. To make it simpler for implementation I will try to explain that in simple terms / test case.
Here is the scenario -
I have an array list of strings (say 100 strings)
I have a Reader Class that reads the strings and passes them to a Writer Class that prints the strings to the console. Right now this runs in a Single Thread Model.
I wanted to make this multi-threaded but with the following features -
Ability to set MAX_READERS
Ability to set MAX_WRITERS
Ability to set BATCH_SIZE
So basically the code should instantiate that many Readers and Writers and do the work in parallel.
Any pseudo code will really be helpful to keep me going!

This sounds like the classic consumer-producer problem. Have a look at Wikipedia's article about it. They have plenty of pseudo code there.

Aside from using the producer-consumer pattern that has been suggested, I would recommend that you use the CopyOnWriteArrayList so you can have lock-free reads and iteration of your list (each write copies the underlying array, so writes are comparatively expensive). Since you're only working with a couple of hundred strings you will probably not have any performance issues with the CopyOnWriteArrayList.
If you're concerned about performance then I actually think it might be better if you use the BlockingQueue or a ConcurrentHashMap. They will allow you to maximize throughput with your multithreaded application.
The recommended option:
A BlockingQueue works very well with multiple producers and consumers, but of course it implies an order of data processing (FIFO). If you're OK with FIFO ordering, then you will probably find that the BlockingQueue is a faster and more robust option.
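A rough sketch of the BlockingQueue option, using the MAX_READERS, MAX_WRITERS and BATCH_SIZE settings from the question. Everything else (the class name, the queue capacity of 16, the empty-list sentinel used to stop the writers, and returning the written strings so the result can be checked) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class ReaderWriterPipeline {
    static final int MAX_READERS = 2;
    static final int MAX_WRITERS = 3;
    static final int BATCH_SIZE = 10;
    // Sentinel batch that tells a writer to stop; compared by identity.
    private static final List<String> POISON = new ArrayList<>();

    static List<String> run(List<String> input) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(16);
        AtomicInteger nextIndex = new AtomicInteger();
        ConcurrentLinkedQueue<String> written = new ConcurrentLinkedQueue<>();

        // Each reader claims the next unread batch and hands it to the queue.
        Runnable reader = () -> {
            try {
                while (true) {
                    int start = nextIndex.getAndAdd(BATCH_SIZE);
                    if (start >= input.size()) {
                        return;
                    }
                    int end = Math.min(start + BATCH_SIZE, input.size());
                    List<String> batch = new ArrayList<>(input.subList(start, end));
                    queue.put(batch); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Each writer takes batches until it sees the sentinel.
        Runnable writer = () -> {
            try {
                while (true) {
                    List<String> batch = queue.take(); // blocks while the queue is empty
                    if (batch == POISON) {
                        return;
                    }
                    for (String s : batch) {
                        System.out.println(s);
                        written.add(s);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        List<Thread> readers = startAll(reader, MAX_READERS);
        List<Thread> writers = startAll(writer, MAX_WRITERS);
        try {
            for (Thread t : readers) {
                t.join();
            }
            // All data is queued; send one stop signal per writer.
            for (int i = 0; i < MAX_WRITERS; i++) {
                queue.put(POISON);
            }
            for (Thread t : writers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(written);
    }

    private static List<Thread> startAll(Runnable task, int count) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(task);
            t.start();
            threads.add(t);
        }
        return threads;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add("string-" + i);
        }
        System.out.println("written: " + run(input).size());
    }
}
```

With more than one writer the strings are not guaranteed to be printed in their original order, since batches are processed in parallel.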
I think that the Wikipedia article has sufficient pseudo code for you to use, but you can also check out some of the following SO questions:
https://stackoverflow.com/search?q=java+producer+consumer
Java Producer-Consumer Designs:
Producer/Consumer threads using a Queue
design of a Producer/Consumer app
