Why Clojure instead of Java for concurrent programming

Why Clojure instead of Java for concurrent programming - java

When Java is providing the capabilities for concurrent programming, what are the major advantages in using Clojure (instead of Java)?

Clojure is designed for concurrency.
Clojure provides concurrency primitives at a higher level of abstraction than Java. Some of these are:
A Software Transactional Memory system for dealing with synchronous and coordinated changes to shared references. You can change several references as an atomic operation and you don't have to worry about what the other threads in your program are doing. Within your transaction you will always have a consistent view of the world.
An agent system for asynchronous change. This resembles message passing in Erlang.
Thread local changes to variables. These variables have a root binding which are shared by every thread in your program. However, when you re-bind a variable it will only be visible in that thread.
All these concurrency primitives are built on top of Clojures immutable data structures (i.e., lists, maps, vectors etc.). When you enter the world of mutable Java objects all of the primitives break down and you are back to locks and condition variables (which also can be used in clojure, when necessary).

Without being an expert on Clojure I would say that the main advantage is that Clojure hides a lot of the details of concurrent programming and as we all know the devil is in the details, so I consider that a good thing.
You may want to check this excellent presentation from Rick Hickey (creator of Clojure) on concurrency in Clojure. EDIT: Apparently JAOO has removed the old presentations. I haven't been able to locate a new source for this yet.

Because Clojure is based on the functional-programming paradigm, which is to say that it achieves safety in concurrency by following a few simple rules:
immutable state
functions have no side effects
Programs written thus pretty much have horizontal scalability built-in, whereas a lock-based concurrency mechanism (as with Java) is prone to bugs involving race conditions, deadlocks etc.

Because the world has advanced in the past 10 years and the Java language (!= the JVM) is finding it hard to keep up. More modern languages for the JVM are based on new ideas and improved concepts which makes many tedious tasks much more simple and safe.

One of the cool things about having immutable types is that most of the built-in functions are already multi-threaded. A simple 'reduce' will span multiple cores/processors, without any extra work from you.
So, sure you can be multi-threaded with Java, but it involves locks and whatnot. Clojure is multi-threaded without any extra effort.

Yes, Java provides all necessary capabilities for concurrent programs.
An analogy: C provides all necessary capabilities for memory-safe programs, even with lots of string handling. But in C memory safety is the programmer's problem.
As it happens, analyzing concurrency is quite hard. It's better to use inherently safe mechanisms rather than trying to anticipate all possible concurrency hazards.
If you attempt to make a shared-memory mutable-data-structure concurrent program safe by adding interlocks you are walking on a tightrope. Plus, it's largely untestable.
One good compromise might be to write concurrent Java code using Clojure's functional style.

In addition to Clojure's approach to concurrency via immutable data, vars, refs (and software transactional memory), atoms and agents... it's a Lisp, which is worth learning. You get Lisp macros, destructuring, first class functions and closures, the REPL, and dynamic typing - plus literals for lists, vectors, maps, and sets - all on top of interoperability with Java libraries (and there's a CLR version being developed too.)
It's not exactly the same as Scheme or Common Lisp, but learning it will help you if you ever want to work through the Structure and Interpretation of Computer Programs or grok what Paul Graham's talking about in his essays, and you can relate to this comic from XKCD. ;-)

This video presentation makes a very strong case, centred around efficient persistent data structures implemented as tries.

Java programming language evolution is quite slow mainly because of Sun's concern about backward compatibility.
Why don't you want just directly use JVM languages like Clojure and Scala?

Related

How exactly does Scala leverage more cores as opposed to Java or other non-functional languages?

I was listening to a Martin Odersky video recently where he attempts to explain the fundamental advantage of functional languages (such as Scala, but of course not necessarily Scala) over OOP or procedural languages.
To paraphrase, he explains that Moore's Law is failing us lately, and so to make processors "faster", instead of being able to double the number of transistors in the cores, CPU manufacturers are simply offering more cores. This in turn lends the CPU to being leveraged more fully by concurrent/multi-threaded applications. So that primary take away was: the more concurrent the application the more snippets of its code are running simultaneously on different cores, and with more cores on the CPU that overall faster the program executes.
So far, so good.
What he failed to explain (or more likely, what I failed to grasp) is why functional languages like Scala lend themselves to being more concurrent then other non-functional languages. Since we happen to be talking about the JVM space, let's do a quick comparison of Java and Scala. What is it about a Scala program that, if it were instead implemented in pure Java, would make it more difficult to parallelize/make concurrent, especially if its all compiling down to JVM bytecode and running on the same JVMs with the same native capabilities??
Meaning I have two JVM apps, App1 written in Java and App2 written in Scala. They both do the exact same tasks and accomplish the same things. Both get compiled down to JVM bytecode and are ran on the same computer with the same JVM installed on it. How is the Scala app able to "tap into" JVM capabilities that make it more suited for concurrency than the Java one (and thus, faster on a CPU with more cores on it)?

You are absolutely correct when you say that both of your apps ultimately boil down to byte code on the JVM. Most of the libraries and frameworks available to Scala for concurrency are also available to Java. However, the thing that sets Scala apart is Scala’s built in support for immutability. Scala has a very rich immutable collections and this feature of embracing immutability makes it easier to write code for concurrent and parallel applications. Scala abstracts the user from writing thread level code and lets the user focus more on the business logic. The Futures API comes in very handy during this.
Scala embraces the functional programming paradigm (i.e. composing functions) and isolating mutable state within actors, by which you can eliminate all of the typical concurrency problems that you would run into by, e.g. programming threads in Java. This is why Scala was chosen as the language to design Spark.
Now coming to main part of your question as to how scala leverages more cores as compared to other non functional languages :-
The Futures API in scala has an underlying execution context which is nothing but a thread pool which is configurable by a developer. Using this API you will be able to take advantage of all your cores.

I want to add more to what is already mentioned. I believe, in general, Java applications are faster but negligible than Scala applications. However, fast execution speed alone is not enough. For example,
Can we develop Scala applications faster than Java? The point is, even if the Scala applications are slower by 10~15% (for example) in term of execution speed but the development time is ~30% faster using FP best practices then it is worth to use Scala (or other Functional Languages). FP emphasizes a lot of functions composability and type safety. Another trait of FP is to use values (immutable variables) over variables. Immutability makes multicore programming easier to manage and less error-prone. Java can do the same using some of the FP best practices. So, during development, Java programmers will encounter a lot of inconveniences because these practices are not readily available in Java.
Does compiling Java sources faster than Scala? Well, this is a big yes. Scala compiler checks whole lots of things, e.g. implicit functions, during compilation. However, Scala programmers write lesser unit tests because the compiler checks the validity of the functions' parameters and return types. The compiler will break if the types are not aligned properly.
The argument where both Java and Scala applications are compiled into JVM bytecode and executed under the same JVM, therefore, they should have the same capabilities is wrong. Both applications are limited by the JVM capabilities unless one of them uses the undocumented JVM features. Then, that language has an "unfair" advantage. Discounting that, the only way one language is faster than the other, given the same JVM runtime environment, is to arrange the bytecode more efficiently.

What advantages does Scala have over Java for concurrent programming?

How can scala make writing multi-threaded programs easier than in java? What can scala do (that java can't) to facilitate taking advantage of multiple processors?

The rules for concurrency are
1 avoid it if you can
2 share nothing if you can
3 share immutable objects if you can
4 be very careful (and lucky)
For rule 2 Scala helps by providing a nicely integrated message passing library out of the box in the form of the actors.
For rule 3 Scala helps to make everything immutable by default.
For rule 4 Scala's flexible syntax allows the creation of internal DSL's making it easier and less wordy to express what you need consicely. i.e. less place for surprises (if done well)

If one takes Akka as a foundation for concurrent (and distributed) computing, one might argue the only difference is the usual stuff that distinguishes Scala from Java, since Akka has both Java and Scala bindings for all its facilities.

There is nothing Scala does that Java does not. That would be silly. Scala runs on the same JVM that Java does.
What Scala does do is make it easier to write, easier to reason about and easier to debug a multi-thread program.
The good bits of Scala for concurrency are its focus on immutable objects, its message-passing and its Actors.
This gives you thread-safe read-only data, easy ways to pass that data to other threads, and easy use of a thread pool.

Python vs. Java -- Which would you pick to do concurrent programming and why? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Also, if not python or java, then would you more generally pick a statically-typed language or a dynamic-type language?

I would choose the JVM over python, primarily because multi-threading in Python is impeded by the Global Interpreter Lock. However, Java is unlikely to be your best when running on the JVM. Clojure or Scala (using actors) are both likely to be better suited to multi-threaded problems.
If you do choose Java you should consider making use of the java.util.concurrent libraries and avoid multi-threading primitives such as synchronized.

For concurrency, I would use Java. By use Java, I actually mean Scala, which borrows a lot from Erlang's concurrency constructs, but is (probably) more accessible to a Java developer who has never used either before.
Python threads suffer from having to wait for the Global Interpreter Lock, making true concurrency (within a single process) unachievable for CPU-bound programs. As I understand, Stackless Python solves some (though not all) of CPython's concurrency deficiencies, but as I have not used it, I can't really advise on it.

Definetely Stackless Python! That a Python variant especially made for concurrency.
But in the end it depends on your target platform and what you are trying to achieve.

If not Java/Python I would go for a functional language since taking side effects into account is one of the complexities of writing concurrent software. (As far as your question goes : this one happens to be statical typed, but compiler infered most of the time)
Personally I would pick F#, since I've seen lots of nice examples of writing concurrent software with ease using it.
As an introduction : this man is equally fun as inspiring, even a must have seen if you are not interested in F# what so ever.

I don't think the argument is about language choice or static or dynamic typing - it's between two models of concurrency - shared memory and message passing. Which model makes more sense in your situation & does your chosen language allow you to make a choice or are you forced to adopt one model over the other?
Why not have a look at Erlang (which has dynamic typing) and message passing, the Actor model, and read why Joe Armstrong doesn't like shared memory. There's also a interesting discussion about java concurrency using locks and threads here on SO.
I don't know about Python, but Java, along with the inbuilt locks and threads model, has a mesasge passing framework called Kilim.

I would use Java, via Jython. Java has strong thread capabilities, and it can be written using the Python syntax with Jython, so you got the best of the two worlds.
Python itself is not really good with concurrency, and is slower than Java anyway.
But if you have concurrency issues and free hands, I'd have a look at Erlang because it has been design for such problems. Of course, you must consider Erlang only if you have:
time to master a (very) new technology
control on a reasonable part of the production chain, since Erland need some adaptations in your toolbox to fit

Neither. Concurrent programming is notoriously hard to get correct. There is the option of using a process oriented programming language like occam-pi which is based of the idea of communicating sequential processes and the pi calculus. This allows compile time checking for deadlock and many other problems that arise during concurrent systems development. If you do not like occam-pi, which I cant blame you if you dont, you could try Go the new language from google which also implements a version of CSP.

For some tasks, Python is too slow. Your single thread Java program could be faster than the concurrent version of Python on a multi-core computer...
I'd like to use Java or Scala, F# or simply go to C++(MPI and OpenMPI).

The Java environment (JVM + libraries) is better for concurrency than (C)Python, but Java the language sucks. I would probably go with another language on the JVM - Jython has already been mentioned, and Clojure and Scala both have excellent support for concurrency.
Clojure is particularly good - it has support for high performance persistent data structures, agents and software transactional memory. It is a dynamic language but you can give it type hints to get performance as good as Java.
Watch this video on InfoQ by Richard Hickey (creator of Clojure) on the problems with traditional approaches to concurrency, and how Clojure handles it.

I'd look at Objective-C and the Foundation Framework. Asynchronous, concurrent programming is well provided for.
This of course depends on your access to Apple's Developer Tools or GnuStep, but if you have access to either one it's a good route to take with concurrent programming.

The answer is that it depends. For example are you trying to take advantage of multiple cores or cpus on a single machine or are you wanting to distribute your task across many machines? How important is speed vs. ease of implementation?
As mentioned before, Python has the Global Interpreter Lock but you could use the multiprocessing module. Note that while Stackless is very cool, it won't utilise multiple cores on its own. Python is usually considered easier to work with than Java. If speed is a priority Java is usually faster.
The java.util.concurrent library in Java makes writing concurrent applications on a single machine simpler but you'll still need to synchronise around any shared state. While Java isn't necessarily the best language for concurrency, there are a lot of tools, libraries, documentation and best practices out there to help.
Using message passing and immutability instead of threads and shared state is considered the better approach to programming concurrent applications. Functional languages that discourage mutability and side effects are often preferred as a result. If distributing your concurrent applications across multiple machines is a requirement, it is worth looking at runtimes designed for this e.g. Erlang or Scala Actors.

Multicore programming: what's necessary to do it?

I have a quadcore processor and I would really like to take advantage of all those cores when I'm running quick simulations. The problem is I'm only familiar with the small Linux cluster we have in the lab and I'm using Vista at home.
What sort of things do I want to look into for multicore programming with C or Java? What is the lingo that I want to google?
Thanks for the help.

The key word is "threading" - wouldn't work in a cluster, but it will be just fine in a single multicore machine (actually, on any kind of Windows, much better in general than spawning multiple processes -- Windows' processes are quite heavy-weight compared to Linux ones). Not quite that easy in C, very easy in Java -- for example, start here!

You want Threads and Actors
Good point ... you can't google for it unless you know some keywords.
C: google pthread, short for Posix Thread, although the win32 native interface is non-posix, see Creating Threads on MSDN.
Java: See class Thread
Finally, you should read up a bit on functional programming, actor concurrency, and immutable objects. It turns out that managing concurrency in plain old shared memory is quite difficult, but message passing and functional programming can allow you to use styles that are inherently much safer and avoid concurrency problems. Java does allow you to do everything the hard way, where data is mutable shared memory and you desperately try to manually interlock shared state structures. But you can also use an advanced style from within java. Perhaps start with this JavaWorld article: Actors on the JVM.

Check out this book: Java Concurrency in Practice

I think you should consider Clojure, too. It runs on the JVM and has good Java interoperability. As a Lisp, it's different from what you're used to with C and Java, so it might not be your cup of tea, but it's worth taking a look at the issues addressed by Clojure anyway, since the concepts are valuable regardless of what language you use. Check out this video, and then, if you're so inclined, the clojure site, which has links to some other good screencasts more specifically about Clojure in the upper right.

It depends on what your preferred language is to get the job done.
Besides the threading solutions, you may can also consider
MPI as a possibility from Java and C --- as well as from Python or R or whatever you like.
DeinoMPI appears to be popular on Windows, and OpenMPI just started with support for Windows too in the current release 1.3.3.

A lot of people have talked about threading, which is one approach, but consider another way of doing it. What if you had several JVM's started up, connected to the network, and waiting for work to come their way? How would you program an application so that it could leverage all of those JVMs without knowing whether or not they are on the same CPU?
On a quadcore machine, you should be able to run 12 or more JVMs to process work. And if you approach the problem from this angle, scaling up to multiple computers is fairly simple, although you do have to consider higher network latencies when your communication is across a real network.

Here is a good source of info on threading in C#.

You need to create multithreaded programs. Java supports multi-threading out of the box (though older JVMs ran all threads on one core). For C, you'll either need to use platform specific code to to create and manipulate threads (pthread* for Linux, CreateThread and company for Windows). Alternatively, you might want to do your threading from C++, where there are a fair number of libraries (e.g. Boost::threads) to make life a bit simpler and allow portable code.
If you want code that'll be portable across a single machine with multiple cores AND a cluster, you might look into MPI. It's really intended for the cluster situation, but has been ported to work on a single machine with multiple processors or multiple cores -- though it's not as efficient as code written specifically for a single machine.

So, that's a very broad question. You can experiment with multithreaded programming using many different programming languages including C or Java. If you wanted me to pick one for you, then I'd pick C. :)
You want to look into Windows threads, POSIX threads (or multithreading in Java, if that's language). You might want to try to find some problems to experiment with. I'd suggest trying out matrix multiplication; start with a sequential version and then try to improve the time using threads.
Also, OpenMP is available for Windows and offers a much different view of how to multithreaded programming.

Even though you asked specifically for C or Java, Erlang isn't a bad choice of language if this is just a learning exercise
It allows you to do multiprocess style programming very very easily and it has a large set of libraries that let you dive in at just about any level you like.
It was built for distributed programming in a very pragmatic way. If you are comfortable with java, the transition shouldn't be too difficult.
If you are interested, I would recommend the book, "Programming Erlang" by Joe Armstrong.
(as a note: there are other languages designed to run on in highly parallel environments like Haskell. Erlang just tends to be more pragmatic than languages like Haskell which are rooted more in theory)

If you want to do easy threading, such as parallel loops, I recommend check out .NET 4.0 Beta (C# in VS2010 Beta).
The book chapter Joe linked to is a very good one I use myself and highly recommend, but doesn't cover the new parallel extensions to the .NET framework.

yes , many threads , but if the threads are accessing the same position in the memory only one thread will execute,
we need multi memory cores

By far the easiest way to do multicore programming on Windows is to use .NET 4 and C# or F#. Here is a simple example where a parallel program (from the shootout) is 7× shorter in F# than Java and just as fast.
.NET 4 provides a lot of new infrastructure for parallel programming and it is really easy to use.

That you say "take advantage" sounds to me as something more than doing just any multi-threading. Simulations in my book are computation-intensive and in that respect the most efficient language is C. Some would say assembly but there are very few x86 assembly programmers who can beat a modern C compiler.
For the Windows NT engine (NT4, 2000, XP, Vista and 7) the mechanisms you should look into are threads, critical sections and I/O completion ports (iocp). Threads are nice but you need to be able to synchronize them among themselves and with I/O which is where cs's and iocps come in. To make sure your wringing every last bit of performance out of your code you need to profile, analyze, experiment/re-construct. Lots of fun but very time-consuming.

Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).
If you want to parallelize a computation, you're probably going to need multithreading, because you probably want the threads to cooperate on the same memory.
Speaking about performance, threads are faster to create and manage than processes (because the OS doesn't need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other's memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.
Taken from this answer.

Why Java programs?

I've had little exp. with java and since now i've worked with c++. what makes this one more special and preferred?
Moreover I would like to know about the use of System.in classes and parseInt classes.

Java is vastly easier to work with, especially when developing large programs.
Debugging: Java generates nice Stacktraces
Stability: You can catch every exception
Development Speed: you need no linker (which can take many minutes in C++); with a modern IDE (e.g. eclipse) you can edit code in-place while the program is running
Garbage Collection and run-time type safety eliminate whole classes of errors
really good free (as in beer) IDEs

In theory (and sometimes in reality) Java programs also run on multiple platforms, "write once - debug, er, run everywhere" type of thing. That makes it very useful for a variety of projects.
In my personal experience, while learning Java shortly after being introduced to C++, Java seemed simpler and easier to learn and understand, hence more productive, as was said before. While program structure and syntax is very similar, there is no need to worry about pointers and other potentially dangerous language features.

This is really very broad and I think these are really 2 or 3 different questions. I'll address the first one very briefly. Java utilizes garbage collection, or autmatic memory management. That is, arguably, the biggest difference between it from a language like C++. There are clearly some potential for increase in productivity in that you don't have to worry as much about memory, although in reality you do need to pay attention to your references. Perhaps you could refine your question a bit.

Java works in browsers! (Milpa for example). You can say Flash too, but with Java you can leverage the numerous classes coming with it (another advantage over C++, even if both languages has a good set of free libraries on the Net) and your knowledge of the language.
As said, Java is supported on many platforms with minimal adjustments, with a fast, efficient VM, from big servers to mobile phones.
OO support is arguably better designed, avoiding mistakes done in C++. Somehow, C# is to Java what Java is to C++ ^_^ (I won't argue on this, I don't know C# enough actually, it is just an historical point).
In the same spirit, Java is slightly more abstract, avoiding pointers, manual memory management and some other low level stuff.
That doesn't mean than one is better than the other, STL helps C++ for some of the issues above, etc.
I am not sure how to answer the last sentence, these are object and method respectively, not classes.
I never used System.in yet, I suppose it is usable if you feed the Java program with < or | on the command line. And parseInt is a static method of Integer class.

The language features have already been mentioned (GC, reflection etc.). But there is another major difference: Libraries. I know there is the STL and Boost and all kinds of libraries out there, but they are not all of a piece, they all feel different. Often you are forced to use all kinds of C-APIs (e.g. threading or sockets, just to mention two things). All the C++ evangelists will now jump in and tell about some kind of cool OO-socket or OO-threading library, but they are not part of the STL. They should be. Its almost 2009 and everything is networked and multithreaded. This ought to be part of the standard library.
Why is it bad to use those C-APIs? Because it is hard to use them in an object oriented programm. Try using Win32's CreateThread() with the listener-pattern (C#-users: read "delegates").
For a "rich client application", where performance is not a big deal, I would always use Java or C#. If I need raw speed (think signal processing or embedded applications), I would rather use C instead of C++.
BTW: I have used all four languages (C, C++, Java, C#) for a long time.

If you like to program really the object oriented way, then you need to go from C++ to Java. One of the problems with C++ is that most programmers actually use it as C and don't exploit all its OO features. Java is here stricter.

With C++ you're programming "on the metal", whereas with Java you're programming towards a virtual machine. The Java software stack all the way down to the VM is constructed to give a highly abstracted programming experience. This is most clearly apparent in the use of datatypes "that just are" (i.e. the programmers need no understanding of how they translate into memory areas), garbage collection "that just works" (the programmers don't have to deal with allocation and deallocation issues) and the ubiquity of exceptions for error handling and propagation. Pointers are not part of Java, the system takes care of where and how things are allocated.
From this, you might see that the design philosophy of Java is very different from C++: Java tries to enforce that the programmer should stick to certain ways of working which are considered to be safe and to make programming easier. Some people hate this aspect of Java, other people love it.

It really depends on what you're trying to do.
For a lot of higher level functionality where optimal performance may not matter, Java is easier and more reliable to use. For example, garbage collection, array checking, etc. It's also sandboxed, of course.
For me, another major benefit of Java is the use of reflection and of run time class loading. I write a lot of plugins within pluggable architectures, and can ensure I can add more new classes to a running program on any platform. Last time I tried to do that in C++, I had to mess with DLLs and COM.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.