Using Actors instead of `synchronized` - java

Every time I read about using synchronized in Scala the author will usually mention that Actors should be used instead (this for example). While I understand roughly how actors work I'd really like to see an example of Actors being used to replace Java's synchronized method modifier (by this I mean its Scala equivalent - the synchronized block) in a piece of code. Modifying the internals of a data structure for instance would be nice to see.
Is this a good use of Actors or have I been misinformed?

1) Overview
Scala Actors can replace the complex business logic in a standard Java threaded application s which often evade developers working on complex multithreaded systems.
Consider the following java code snippet that one might see in a a simple, threaded application (this code is waiting for an asynchronous request to complete).
myAsyncRequest.startCalculation();
while(notDone)
myAsyncRequest.checkIfDone();
Thread.sleep(1000);
System.out.println("Done ! Value is : " + myAsyncRequest.getCalculationValue());
To see a direct replacement of this sort of code using Scala's higher level concurrency model, check this post out : Scala program exiting before the execution and completion of all Scala Actor messages being sent. How to stop this?
2) Now : back to the code snpipet --- There are some obvious issues here, lets take a quick look :
The code is coupling the logic of "monitoring" the execution of calculation to the processing of the calculated results.
There are heuristics embedded in the code (Thread.sleep(1000)) which have no clear logical justification (why wait a second ? Why not wait 3 seconds ?), thus adding unecessary logic to the code block.
It doesnt scale - if I'm running 1000 clients, and each is constantly checking the results, I could generate some pretty ugly traffic --- for no good reason.
How does scala modify this paradigm ?
Scala actors can return "futures"
These encapsulate the expectation that, soon enough, the "thing" that you want an actor to do will be accomplished. The scala "future" replaces this java construct : It makes "explicit" the fact that , my while loop is "expecting" something to occur in the near future, and there is an action to be done afterwards.
Scala actors can pass "messages"
Although I'm "waiting" (in the while loop above) for completion, its obvious that another way to implement would be if the calculation object would simply "tell me" when it was done. Message passing enables this, but is somewhat complicated and leads to untraceable, unreadable code in some java implementations. Since scala abstracts this notion in such a way that is directly designed to accomodate concurrent work-loads, the message passing design pattern can now be implemented in a way which isn't overly complex, thus decoupling the logic of "waiting" from the logic of processing.
3) The short answer : In general, the scala API's are built to encode concurrent logic at a higher level of abstraction, so that you're concurrent code is declarative, rather than muddled in implementation details.
4) Synchronization : A lower-level concept which , although essential, can complicate our code .
Synchronization is an artifact of lower-level, multithreaded programming. By providing higher level abstractions of the most common parallel programming paradigms, Scala makes this particular construct unnecessary in many of the most common concurrent programming user cases. In fact, nowadays, even java does this :) The java.util.concurrent package gives us atomic data types and data structures, obviating the need to wrap simple operations in "synchronized" blocks. However, standard Java does not support the higher level notions of "Actors" and "Futures" which can be effectively managed and coordinated without needing to manually manage synchronized method calls or object modifications.

Actors guarantee that only a single message will be handles at time so that there will not be two threads accessing any of the instance members - ergo no need to use synchronized

Related

Is it wrong to use a Java future for an infinite operation?

I need to implement an infinite operation in a Clojure program. I hope the syntax is reasonably clear for Java programmers too:
(defn- my-operation
(future
(while #continue
(do things)
(Thread/sleep polling-time)))
That gives me exactly what I want, the operation in a different thread so it does not block the main one, and also a pretty clear and straightforward syntax without having to deal with native Java functions which would force me to use the dot special form.
But the definition of a Java future is a "representation of the result of an asynchronous computation" and in this case, well, I'm not really doing anything with the result.
Is it wrong to use them this way?.
Is there any technical difference that should worry me compared to starting my own Thread?.
Is this semantically wrong?.
This will tie up a thread in the thread pool that Clojure manages, and uses for such things. The intended use of that thread pool is for short running operations, so it's kind-of a misuse. Also, the intention behind a future is to calculate a value once so it can be dereferenced multiple times, so that's kinds-of a misuse too.
There are lots of other options for long running tasks, including using Java's Executor framework, core-async or starting your own thread.
(defonce background-thread (doto (Thread. (fn []
(while #continue
(do things)
(Thread/sleep polling-time))))
(.setDaemon true)
(.start)))
As others have mentioned, it might be worth thinking about unhandled exceptions. Stuart Sierra's blog post is a great place to start.
The docstring for future says that it returns something you can call deref on. It is a reference type in the same way that delay or atom is, with it's own semantics of how it get's a value and while it's true you could just create a future and let it run forever, if you see a future in some code, it implies that you care about the value it produces.
(clojure.repl/doc future)
-------------------------
clojure.core/future
([& body])
Macro
Takes a body of expressions and yields a future object that will
invoke the body in another thread, and will cache the result and
return it on all subsequent calls to deref/#. If the computation has
not yet finished, calls to deref/# will block, unless the variant of
deref with timeout is used. See also - realized?.
It should also be noted that Futures will swallow all exceptions that take place in them, until they're dereferenced. If you never plan on dereferencing them, be prepared for tears.
I've been bitten multiple times by this, where suddenly everything just stops working for no apparent reason. It's only later that I realize that an exception occurred that I had no idea about.
Along the same vein as what #l0st suggests, this is what I've been using for simple cases:
(defmacro thread [& body]
`(doto (Thread. (fn [] ~#body)
(.start))))
Everything you then pass to thread will be implicitly run in a new thread, which allows you to write:
(thread
(while #continue
(stuff)))
Which is basically what you had before in terms of syntax. Note though that of course this will shadow the macro with the same name if you ever decide to use Clojure.Async.
Ya, dealing with Java interop is a pain, but if you just tuck it away somewhere, it can lead to some nice custom code.

Parallelization (Java 8) vs Concurrency (Java 7)

I want to parse multiple files to extract the required data and then write the output into an XML file. I have used Callable Interface to implement this. My colleague asked me to use Java 8 feature which does this job easily. I am really confused which one of them I should use now.
list.parallelStream().forEach(a -> {
System.out.println(a);
});
Using concurrency or a parallel stream only helps if you have independent tasks to work on. A good example of when you wouldn't do this is what you are locking on a shared resources e.g.
// makes no sense to use parallel here.
list.parallelStream().forEach(a -> {
// locks System.out so only one thread at a time can do any work.
System.out.println(a);
});
However, as a general question, I would use parallelStream for processing data, instead of the concurrency libraries directly because;
a functional style of coding discourages shared mutable state. (Actually how are not supposed to have an mutable state in functional programming but Java is not really a functional language)
it's easier to write and understand for processing data.
it's easier to test whether using parallel helps or not. Most likely ti won't and you can just as easily change it back to being serial.
IMHO Given the chances that using parallel coding will really help is low, the best feature of parallelStream is not how simple it is to add, but how simple it is to take out.
The concurrency library is better if you have ad hoc work which is difficult to model as a stream of data. e.g. a worker pool for client requests might be simplier to implement using an ExecutorService.

Is java concurrent package implemented using locks?

Conceptually,
Mutex
Reader's/Writer lock (Better form of Mutex)
Semaphore
Condition Variable
are used as four major synchronization mechanisms, which are purely lock based. Different programming language have different terms/jargon for these 4 mechanisms. POSIX pthread package is one such example for such implementation.
First two get implemented using spin lock(Busy-wait).
Last two get implemented using sleep lock.
Lock based synchronisation is expensive in terms of cpu cycles.
But, I learnt that java.util.concurrent packages do not use lock(sleep/spin) based mechanism to implement synchronisation.
My question:
What is the mechanism used by java concurrent package to implement synchronization? Because spin lock is cpu intensive and sleep lock is more costlier than spin lock due to frequent context switch.
That very much depends on what parts of the java.util.concurrent package you use (and to a lesser degree on the implementation). E.g. the LinkedBlockingQueue as of Java 1.7 uses both ReentrantLocks and Conditions, while e.g. the java.util.concurrent.atomic classes or the CopyOnWrite* classes rely on volatiles + native methods (that insert the appropriate memory barriers).
The actual native implementation of Locks, Semaphores, etc. also varies between architectures and implementations.
Edit: If you really care about performance, you should measure performance of your specific workload. There are folks far more clever than me like A. Shipilev (whose site is a trove of information on this topic) on the JVM team, who do this and care deeply about JVM performance.
This question is best answered by looking at the source code for java.util.concurrent. The precise implementation depends on the class you are referring to.
For example, many of the implementations make use of volatile data and sun.misc.Unsafe, which defers e.g. compare-and-swap to native operations. Semaphore (via AbstractQueuedSynchronizer) makes heavy use of this.
You can browse through the other objects there (use the navigation pane on the left of that site) to take a look at the other synchronization objects and how they are implemented.
The short answer is no.
Concurrent collections are not implemented with locks compared to synchronized collections.
I myself had the exact same issue as what is asked, wanted to always understand the details. What helped me ultimately to fully understand what's going on under the hood was to read the following chapter in java concurrency in practice:
5.1 Synchronized collections
5.2 Concurrent collections
The idea is based on doing atomic operations, which basically requires no lock, since they are atomic.
The OP's question and the comment exchanges appear to contain quite a bit of confusion. I will avoid answering the literal questions and instead try to give an overview.
Why does java.util.concurrent become today's recommended practice?
Because it encourages good application coding patterns. The potential performance gain (which may or may not materialize) is a bonus, but even if there is no performance gain, java.util.concurrent is still recommended because it helps people write correct code. Code that is fast but is flawed has no value.
How does java.util.concurrent encourage good coding patterns?
In many ways. I will just list a few.
(Disclaimer: I come from a C# background and do not have comprehensive knowledge of Java's concurrent package; though a lot of similarities exist between the Java and C# counterparts.)
Concurrent data collections simplifies code.
Often, we use locking when we need to access and modify a data structure from different threads.
A typical operation involves:
Lock (blocked until succeed),
Read and write values,
Unlock.
Concurrent data collections simplify this by rolling all these operations into a single function call. The result is:
Simpler code on the caller's side,
Possibly more optimized, because the library implementation can possibly use a different (and more efficient) locking or lock-free mechanism than the JVM object monitor.
Avoids a common pitfall of race condition: Time of check to time of use.
Two broad categories of concurrent data collection classes
There are two flavors of concurrent data collection classes. They are designed for very different application needs. To benefit from the "good coding patterns", you must know which one to use given each situation.
Non-blocking concurrent data collections
These classes can guarantee a response (returning from a method call) in a deterministic amount of time - whether the operation succeeds or fails. It never deadlocks or wait forever.
Blocking concurrent data collections
These classes make use of JVM and OS synchronization features to link together data operations with thread control.
As you have mentioned, they use sleep locks. If a blocking operation on a blocking concurrent data collection is not satisfied immediately, the thread requesting this operation goes into sleep, and will be waken up when the operation is satisfied.
There is also a hybrid: blocking concurrent data collections that allow one to do a quick (non-blocking) check to see if the operation might succeed. This quick check can suffer from the "Time of check to time of use" race condition, but if used correctly it can be useful to some algorithms.
Before the java.util.concurrent package becomes available, programmers often had to code their own poor-man's alternatives. Very often, these poor alternatives have hidden bugs.
Besides data collections?
Callable, Future, and Executor are very useful for concurrent processing. One could say that these patterns offer something remarkably different from the imperative programming paradigm.
Instead of specifying the exact order of execution of a number of tasks, the application can now:
Callable allows packaging "units of work" with the data that will be worked on,
Future provides a way for different units of work to express their order dependencies - which work unit must be completed ahead of another work unit, etc.
In other words, if two different Callable instances don't indicate any order dependencies, then they can potentially be executed simultaneously, if the machine is capable of parallel execution.
Executor specifies the policies (constraints) and strategies on how these units of work will be executed.
One big thing which was reportedly missing from the original java.util.concurrent is the ability to schedule a new Callable upon the successful completion of a Future when it is submitted to an Executor. There are proposals calling for a ListenableFuture.
(In C#, the similar unit-of-work composability is known as Task.WhenAll and Task.WhenAny. Together they make it possible to express many well-known multi-threading execution patterns without having to explicitly create and destroy threads with own code.)

Java v Scala from a concurrency viewpoint

I am kicking off my final year project right now. I am going to be investigating the concurrency approaches from java and scala perspectives. Having come out of a java concurrency module, I can see why people say that the shared state threading approach is difficult to reason about. You have critical sections to worry about, run the risk of race conditions and deadlocks etc due to the non deterministic way in which java threads operate. With 1.5 this reasoning was given some clarity ,but still, far from crystal clear.
At first view, scala appears to remove this complex reasoning through the actors class. This has given the programmer the ability to develop concurrent systems from a more sequential viewpoint and easier to conceptualize. But, for this positive, am I right in saying that there are some drawbacks? For instance, say we want to sort a large list in both scenarios - with java you create two threads split the list in two, worry about the critical sections, atomic actions etc and go code. With scala, because it is "share nothing" you actually have to pass the list/2 to two actors to peform the sort operation, right?
I guess my question is that the price you pay for simpler reasoning is performance overhead of having to pass the collection to your actors, in scala?
I was thinking of doing some benchmark tests to this effect (selection sort, quick sort etc;) but because one is functional and one is imperative - I will not be comparing apples with apples from an algorithm viewpoint.
I would really appreciate any views you guys have on the above to give me some ideas to get me started.
Many thanks.
The nice thing about Scala is that you can do concurrency the Java way if you want. All the Java classes are available.
So it really boils down to the difference between a model where you have threads with concurrent access to mutable variables, and a model where you have stateful actors which send messages to each other but do not peek into each others' internals. And you're absolutely right that in some scenarios you have to trade off performance against ease of getting the code correct.
I generally find as a rough rule of thumb that if you're going to have a pile of threads spending a significant amount of time waiting for a lock to open up, using a Java model, and there is no clean way to separate the work to avoid having everyone waiting for that resource, and if the execution switches between threads quickly, then the Java model is far superior to an actor model where the actor sends an "I'm done" message back to a supervisor, which then sends out a "Here's new work!" message to an existing non-busy actor. Sorting algorithms, depending on how you envision them, can very much fall into this category.
For most everything else, the performance penalty associated with actors doesn't amount to much as far as I've seen. If you can conceive of your problem as lots and lots of reactive elements (i.e. they only need time when they've received a message), then actors can scale particularly well (millions available, though only a handful are working at any given instant); with threads, you'd need to have some sort of extra internal state to keep track of who should be doing what work, since you couldn't handle that many active threads.
I'm just going to point out here that Scala does not copy arguments passed to actors, so actors can share whatever it is passed to them.
As opposed to Erlang, it is the programmer's responsibility to avoid sharing mutable stuff. However, there is no penalty in sharing immutable stuff, since there's no need to lock it, as all accesses to it are read-only. And Scala has strong support for immutable data structures.

Approach to a thread safe program

All,
What should be the approach to writing a thread safe program. Given a problem statement, my perspective is:
1 > Start of with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace with possible concurrent classes
3 > Underline the critical section and enclose them in synchronized
4 > Perform test for deadlocks
Does anyone have any suggestions on the other approaches or improvements to my approach. So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Programming in Java
Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide on what areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. Methods of making one area of the code threadsafe may be different than making other areas different. For example, understanding whether there will be a high volume of reading vs writing is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.
You should use final and immutable fields wherever possible, any other data that you want to change add inside:
synchronized (this) {
// update
}
And remember, sometimes stuff brakes, and if that happens, you don't want to prolong the program execution by taking every possible way to counter it - instead "fail fast".
As you have asked about "thread-safety" and not concurrent performance, then your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale much in a multi cpu environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.
Buy and read Brian Goetz's "Java Concurrency in Practice".
Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.

Categories