How to make full use of multiple processors?

How to make full use of multiple processors? - java

I am doing web crawling on a server with 32 virtual processors using Java. How can I make full of these processors? I've seen some suggestions on multi-threaded programming, but I wonder how that could ensure all processors would be taken advantage of since we can do multi-threaded programming on single processor machine as well.

There is no simple answer to this ... except the way to ensure all processors are used is to use multi-threading the right way. (Note: that is a circular answer!)
Basically, the way to get effective use of multiple processors is to:
ensure that there is work that can be done in parallel, and
reduce / eliminate contention points that force one thread to wait while another thread does something.
This is difficult enough when you are doing simple computation. For a web crawler, you've got the additional problems that the threads will be competing for network and (possibly) remove server bandwidth, and they will typically be attempting to put their results into a shared data structure or database.
That's about all that can be said at this level of generality ...
And as #veer correctly points, you can't "ensure" it.
... but using a load of threads will surely be quicker wall-time-wise because all the miserable network latency will happen in parallel ...
Actually, if you go overboard, a load of threads can reduce throughput because of contention. Just throwing lots of threads at the problem is rarely a good idea.

A computer or a program is only as fast as the slowest link in its processing chain. Just increasing the CPU capacity is not going to ensure a drastic performance peak. Leaving aside other issues like your cache-size, RAM, etc., there are two basic kinds of approach to your question about how to take advantage of all your processors:
[1] Using a Jit/just-in-time compiler/interpreter technology such as Java/.NET. I don't know much about java, but the .NET jitter is definitely designed to take advantage of all the available processors on the mahcine. In fact, this very feature makes a jitter stand out against other static language compilers like C/C++, because the jitter "knows" that it is sitting on 32 processors, it is in a much better position to take advantage of them than a program statically compiled on any other machine. (provided you have written a robust multi-threading code for it!)
[2] Programming in C/C++. This is the classic approach. If you compile your code on the same machine with 32 CPUs, and take proper care in your program such as memory-management, handling pointers, etc. the C/C++ program will be the most optimal and will perform better than its CLR/JVM counterpart (as it runs without the extra overhead of a garbage-collector or a VM).
But keep in mind that writing robust code is much easier in .NET/Java than C/C++. So, if you are not a "hard-core" programmer, I would suggest going with the former approach. Also remember to handle your multiple threads with care, such as locking variables when multiple threads try to change the same variables. However, excessive locking might make your code hang, if a variable behaves unexpectedly.

Processor management is implemented in native through the Virtual machine you are using i.e., JVM. You can have a look here Java Hotspot VM Options to optimize your machine if you are using Java Hotspot VM. If you are using a third party VM then your provider may help you with tuning it for your requirements.
Application performance in design practically depends on you.
If you would like to monitor your threads and memory usage to optimize your application, you can use any VM monitoring tools available to date. The Java virtual machine (JVM) has built-in instrumentation that enables you to monitor and manage it using JMX.
For details you can check Platform Monitoring and management using JMX. For third party VMs you have to contact the vendor I guess.

Related

high performance rtsp server

I want to implement a high performance rtsp server which is to handle vod request --- it only handles signaling request, it does not need to streaming the media file. I have accomplish a version that is written in Java basing on the Mina networking framework, and the performance seems to be not very high.
As far as I know, high performance SIP server(e.g. VoIP server) is written in C (e.g. OpenSIPS, Kamailo), should I use C or C++ for my project to get a significant performance improvement?
BTW. I found some explanation of the reason why OpenSER is written in C by its author:
"On the other hand, it is the garbage collector that can cause lots of troubles when developing SIP applications in Java. Aheavily loaded server written in Java stopsworking when the garbage collector is cleaning the memory. The delay caused by the garbage collector can be even more than 10 seconds. Such delays are unacceptable"
Is that a fact nowadays which mean that I should use C too?

There are a huge number of variables here, language may not be the determining factor. Trustin Lee, the author of MINA, later created Netty, which offers very high performance indeed. Lee himself says that MINA has "relatively poor performance" as a result of the complexity of some of the features it offers being too tightly bound to the core. So you might look at Netty before completely rewriting everything.
If you're using Oracle's JVM, you're using an extremely optimized runtime system that identifies hotspots in the code (hence the name "HotSpot") and aggressively optimizes them at runtime. It's been a long time since you could say, ipso facto, that Java code would run more slowly than C code. Well-written, optimized C code probably out-performs equivalent Java code in certain select tasks, but a generalization from there is probably no longer appropriate, and of course your code has to take on several of the burdens that the JVM shoulders for you with Java. Also note that there are several things you can do to tune the JVM's garbage collector, for instance to prefer consistency and short pauses over footprint and long pauses.
Obviously C has several strengths (being close to the machine is sometimes exactly what you want), as does explicit memory management for certain tasks.

Have you compared your rtsp server with Wowza?
Wowza is also written in Java, if your rtsp server has lower performance than Wowza, I believe you could improve its performance without changing language, otherwise, if Wowza has similar performance with your server, it indicates that Java cannot satisfy the performance requirements, maybe you should consider to use c/c++ instead.

I built my own RtspServer in C# and have no problem streaming to hundreds of clients.
http://net7mma.codeplex.com/
Code Project article # http://www.codeproject.com/Articles/507218/Managed-Media-Aggregation-using-Rtsp-and-Rtp
You are more then welcome to adopt / reference the design! (Apache 2 License)

Measure execution time of Java with JVMTI

For the profiler which I implement using JVMTI I would like to start measuring the execution time of all Java methods. The JVMTI offers the events:
MethodEntry
MethodExit
So this would be quite easy to implement, however I came across this note in the API:
Enabling method entry or exit events will significantly degrade performance on many platforms and is thus not advised for performance critical usage (such as profiling). Bytecode instrumentation should be used in these cases.
But my profiling agent works headless, which means the collected data is serialized and sent via socket to a server application displaying the results. How should I realize this using byte code instrumentation. I am kind of confused how to go on from here. Could someone explain to me, if I have to switch the strategy or how can I approach this problem?

I don't know about the Sun JVM but the IBM JVM goes into what we call FullSpeedDebug mode when you request the MethodEntry/Exit events.... FSD slows down execution quite a bit.
As you say you can use BCI as my profiler does but unless you are selective about which methods you instrument you will also see a slow down. For example my profiler inserts a if(profiling) callProfilerHook() on every entry and all of the possible exits in a method all object creates and some other areas as well.... These additional checks can slow down execution by over 50%...
As for how to BCI... well I wrote my own C library to do it... it's technically not hard (hint just delete the StackMapTable) but I may take you a while.. Alternatively you can use ASM et. al.
Finally... you callBackHook will add overhead and on small methods render the reported CPU/Clock time meaningless unless you perform some sophisticated overhead calculation... even if you do this your callback code affects the shape of the processor L1 caches and the Java code becomes less efficient because it has less room..
My profiler basically ignores the reported times as I visualize the execution in an interesting way... I'm looking to understand the flow of all of the code, in fact in most cases what code is running (most Java projects have no idea of the millions on lines of third-party code running in their app)

Confused, are languages like python, ruby single threaded? unlike say java? (for web apps)

I was reading how Clojure is 'cool' because of its syntax + it runs on the JVM so it is multithreaded etc. etc.
Are languages like ruby and python single threaded in nature then? (when running as a web app).
What are the underlying differences between python/ruby and say java running on tomcat?
Doesn't the web server have a pool of threads to work with in all cases?

Both Python and Ruby have full support for multi-threading. There are some implementations (e.g. CPython, MRI, YARV) which cannot actually run threads in parallel, but that's a limitation of those specific implementations, not the language. This is similar to Java, where there are also some implementations which cannot run threads in parallel, but that doesn't mean that Java is single-threaded.
Note that in both cases there are lots of implementations which can run threads in parallel: PyPy, IronPython, Jython, IronRuby and JRuby are only few of the examples.
The main difference between Clojure on the one side and Python, Ruby, Java, C#, C++, C, PHP and pretty much every other mainstream and not-so-mainstream language on the other side is that Clojure has a sane concurrency model. All the other languages use threads, which we have known to be a bad concurrency model for at least 40 years. Clojure OTOH has a sane update model which allows it to not only present one but actually multiple sane concurrency models to the programmer: atomic updates, software transactional memory, asynchronous agents, concurrency-aware thread-local global variables, futures, promises, dataflow concurrency and in the future possibly even more.

A confused question with a lot of confused answers...
First, threading and concurrent execution are different things. Python supports threads just fine; it doesn't support concurrent execution in any real-world implementation. (In all serious implementations, only one VM thread can execute at a time; the many attempts to decouple VM threads have all failed.)
Second, this is irrelevant for web apps. You don't need Python backends to execute concurrently in the same process. You spawn separate processes for each backend, which can then each handle requests in parallel because they're not tied together at all.
Using threads for web backends is a bad idea. Why introduce the perils of threading--locking, race conditions, deadlocks--to something inherently embarrassingly parallel? It's much safer to tuck each backend away in its own isolated process, avoiding the potential for all of these problems.
(There are advantages to sharing memory space--it saves memory, by sharing static code--but that can be solved without threads.)

CPython has a Global Interpreter Lock which can reduce the performance of multi-threaded code in Python. The net effect, in some cases, is that threads can't actually run simultaneously because of locking contention. Not all Python implementations use a GIL so this may not apply to JPython, IronPython or other implementations.
The language itself does support threading and other asynchronous operations. The python libraries can also support threading internally without exposing it directly to the Python interpreter.
If you've heard anything negative about Python and threading (or that it doesn't support it), it is probably someone encountering a situation where the GIL is causing a bottleneck..

Certainly the webserver will have a pool of threads. That's only outside the control of your program. Those threads are used to handle HTTP requests. Each HTTP request is handled in a separate thread and the thread is released back to pool when the associated HTTP response is finished. If the webserver doesn't have such a pool, it would have been extremely slow in serving.
Whether a programming language is singlethreaded or multithreaded dependens on the possibility to programmatically spawn new threads using the language in question. If that isn't possible, then the language is singlethreaded, for example PHP. As far as I can see, both Ruby and Python supports multithreading.

The short answer is yes, they are single threaded.
The long answer is it depends.
JRuby is multithreaded and can be run in tomcat like other java code. MRI (default ruby) and Python both have a GIL (Global Interpreter Lock) and are thus single threaded.
The way it works for web servers is further complicated by the number of available server configurations. For most ruby applications there are (at least) two levels of servers, a proxy/static file server like nginx and then the ruby app server.
Nginx does not use threads like apache or tomcat, it uses non-blocking events (and I think forked worker processes). This allows it to deal with higher levels of concurrency than would be allowed with the overhead and scheduling inefficiencies of native threads.
The various ruby apps servers also work in different ways to get high throughput and concurrency without threads. Thin uses libev and the asynchronous evented model like Nginx. Mongrel uses a round-robin pool of worker processes. Unicorn uses native Unix IPC (select on a socket) to load balance to a pool of forked processes through one master proxy socket.
Threads are only one way to address concurrency. Multiple processes and evented models are a different approach that ties in well with the Unix base. This is fundamentally different from the way Java treats the world.

Python
Let me try to put it more simply than the more detailed answers.
The heart of the answer here doesn't really have to do with Python being single-threaded versus multi-threaded. It has a more to do with threading versus multiprocessing.
Saying Python is "single-threaded" doesn't really capture reality, because you can certainly have more than one thread running in a Python process. Just use the threading library, and create more than one thread. There, now you have just proven that Python isn't single-threaded.
But using multiple threads in Python does NOT mean you're using multiple CPU processors concurrently. In fact, the Global Interpreter Lock prevents this. So this is where questions arise.
Basically, threading in Python cannot be used for parallel CPU computation. But you CAN do parallel CPU computation with Python by using multiprocessing instead of multi-threading.
I found this article very helpful when researching this: https://timber.io/blog/multiprocessing-vs-multithreading-in-python-what-you-need-to-know/ . It includes real-world examples of when you'd want to use multiprocessing versus multi-threading.

Most languages don't define single or multithreading. Usually, that is left up to the libraries to implement.
That being said, some languages are better at it than others. CPython, for instance, has issues with interpreter locking during multithreading, Jython (python running on the JVM) does not.
Some of the real power of Clojure (IMO) is that it runs on the JVM. You get multithreading and tons of libraries for free.

A few interpreted programming
languages such as CPython and Ruby
support threading, but have a
limitation that is known as a Global
Interpreter Lock (GIL). The GIL is a
mutual exclusion lock held by the
interpreter that prevents the
interpreter from concurrently
interpreting the applications code on
two or more threads at the same time,
which effectively limits the
concurrency on multiple core systems.
from wikipedia Thread

keeping this very short..
Python supports Multi Threading.
Python does NOT support parallel execution of its Threads.
Exception:
Above statement may vary with implementations of Python not using GIL (Global Interpreter Locking).
If a particular implementation is not using GIL, then, that would be Multi Threaded as well as support Parallel Execution

Ruby
The Ruby Interpreter is single threaded, which is to say that several of its methods are not thread safe.
In the Rails world, this single-thread has mostly been pushed to the server. So, you'll see nginx running with a pool of mongrel servers, each of which has an interpreter in memory, processes 1 request at a time, and in its own thread.
Passenger, running "ruby enterprise" brings the concept of garbage collection and some thread safety into Rails, and it's nice.
Still work to be done in Rails on this area, but it's getting there slowly -- but in general, the idea is to have multiple services and servers.

How to untangle the knots in al those threads...
Clojure did not invent threading, however it has particularly strong support for it with Software Transactional Memory, Atoms, Agents, parallel map operations, ...
All other have accumulated threading support. Ruby is a special case as it has green threads in some implementations which are a kind of software emulated threads and do not use all the cores. 1.9 will put this to rest.
Regarding web servers, no they do not always work multithreaded, apache has traditionally ran as a flock of daemons which are a pool of separate single threaded processes. Now currently there are more options to run apache servers.
To summarize all modern languages support threading in one form or another.
The newer languages like scala and clojure are adding specific support to improve working with multiple threads without explicit locking as this has traditionally be the great pitfall of multithreading.

Reading these answers here... A lot of them try to sound smarter than they really are imho (im mostly talking about Ruby related stuff as thats the one i'm most familiar with).
In fact, JRuby is currently the only Ruby implementation that supports true concurrency. On JVM Ruby threads are mapped to OS native threads, without GIL interfering. So its totally correct to say that Ruby is not multithreaded.
In 1.8.x Ruby is actually run inside one OS thread, and while you do have the fake feeling of concurrency with green threads, then in reality GIL will pretty much prevent you from having true concurrency.
In Ruby 1.9 this changed a bit, as now a Ruby process can have many OS threads attached to it (plus the green threads), but again GIL will totally destroy the point and become the bottleneck.
In practice, from a regular webapp standpoint, it should not matter much if its single or multithreaded. The problem mostly arises on the server side anyhow and it mostly is a matter of scaling technique difference.

Yes Ruby and Python can handle multi-threading, but for many cases (web) is better to rely on the threads generated by the http requests from the client to the server. Even if you generate many threads on a same application to low the runtime cost or to handle many task at time, in a web application case that's usually too much time, no one will wait happily more than some fractions of a second for the response of your application in a single page, it's more wise to use AJAX (Asynchronous JavaScript And XML) techniques: make sure the design of your web shows up rapidly, and make an asynchronous insertion of those hard-coding things later.
That does not mean that multi-threading is useless for web! It's highly recommended to low the charge of your server if you want to run recursive-complicated-hardcore-applications (not for a website, I mean), but what that thing return must end in files or in databases, so then could be softly served by a http response.

Alternative to Java

I need an alternative to Java, because I am working on a genetics-calculation project.
It takes a lot of memory and the most of the cpu time. And therefore it won´t work when I deploy it on a server, because many people use the program at the same time.
Does anybody know another language that is not running in a virtual machine and is similar to Java (object-oriented, using exceptions and type-safety)?
Best regards,
Jonathan

To answer the direct question: there are dozens of languages that fit your explicit requirements. AmmoQ listed a few; Wikipedia has many more.
And I think that you'll be disappointed with every one of them.
Despite what Java haters want you to think, Java's performance is not much different than any other compiled language. Just changing languages won't improve performance much.
You'll probably do better by getting a profiler, and looking at the algorithms that you used.
Good luck!

If your apps is consuming most of the CPU and memory on a single-user workstation, I'm skeptical that translating it into some non-VM language is going to help much. With Java, you're depending on the VM for things like memory management; you're going to have to re-implement their equivalents in your non-VM language. Also, Java's memory management is pretty good. Your application probably isn't real-time sensitive, so having it pause once in a while isn't a problem. Besides, you're going to be running this on a multi-user system anyway, right?
Memory usage will have more to do with your underlying data structures and algorithms rather than something magical about the language. Unless you've got a really great memory allocator library for your chosen language, you may find you uses just as much memory (if not more) due to bugs in your program.
Since your app is compute-intensive, some other language is unlikely to make it less so, unless you insert some strategic sleep() calls throughout the code to deliberately make it yield the CPU more often. This will slow it down, but will be nicer to the other users.
Try running your app with Java's -server option. That will engage a VM designed for long-running programs and includes a JIT that will compile your Java into native code. It may make your program run a bit faster, but it will still be CPU and memory bound.

If you don't like C++, you might consider D, ObjectiveC or the new Go language from google.

You may try C++, it satisfies all your requirements.

Use Python along with numpy, scipy and matplotlib packages. numpy is a Python package which has all the number crunching code implemented in C. Hence runtime performance (bcoz of Python Virtual Machine) won't be an issue.
If you want compiled, statically typed language only, have a look at Haskell.

Can your algorithms be parallelised?
No matter what language you use you may come up against limitations at some point if you use a single process. Using something like Hadoop will mean you can retain Java and ease of use but you can run in parallel across many machines.

On the same theme as #Barry Brown's answer:
If your application is compute / memory intensive in Java, it will probably be compute / memory intensive in C++ or any other "more efficient" language. You might get some extra leeway ... but you'll soon run into the same performance wall.
IMO, you need to do the following things:
You need profile your application, and look for any major performance bottlenecks. You might find some real surprises.
In the light of the previous step, review the design and algorithms, paying attention to space and time complexity issues. Do some research to see if someone has discovered better algorithms for doing the computations that are problematic from a performance perspective.
If the previous steps don't get you ahead of the curve, see if you can upgrade your platform; get a bigger machine with more processors, more memory, etc.
If you are still stuck, your only other option is a scale-out design. Assuming that individual user requests are processed in a single-threaded, re-architect your system so that you can run "workers" across multiple servers, with a load balancer on the front. If you have a persistent back-end, look into how you can replicate that. And so on.
Figure out if the key algorithms can be parallelized / distributed so that the resource intensive parts of a user request execute in parallel on multiple processors / multiple servers; e.g. using a "map-reduce" framework.
OK, so there is no easy answer. But simply changing programming languages is NOT a good answer.

Regardless of language your program will need to share with others when running in multiple instances on a single machine. That is simply the way computers work.
The best way to allow your current program to scale to use the available hardware resources is to chop your amount of work into small, independent pieces, and make them implement the Callable interface. These can then be executed by a suitable Executor which can then be chosen according to the available hardware. See the Executors class for many preconfigured versions. THis is what I would recommend you to do here.
If you want to switch language then Mac OS X 10.6 allows for programming in the way described above with C and ObjectiveC and if you do it properly OS X can distribute the code over all available computing resources (both CPU and GPU and what have we).
If none of the above is interesting to you, then consider one of the Grid frameworks. Terracotta may be a good place to start.

F# or ruby, or python, they are very good for calculations, and many other things
NASA uses python

Well.. I think you are looking for C#.
C# is Object Oriented and has excellent support for Generics. You can use it do write both WinForm and server-side applications.
You can read more about C# generics here: http://msdn.microsoft.com/en-us/library/ms379564(VS.80).aspx
Edit:
My mistake, geneTIcs, not geneRIcs. It does not change the fact C# will do the job, and using generics will reduce load significantly.

You might find the computer language shootout here interesting.
For example, here's Java vs C++.
You might find Ocaml (from which F# is derived) worth a look; it meets your requirements for OO, exceptions, static types and it has a native compiler, however according to the shootout you may be trading less memory for lower speed.

Multicore programming: what's necessary to do it?

I have a quadcore processor and I would really like to take advantage of all those cores when I'm running quick simulations. The problem is I'm only familiar with the small Linux cluster we have in the lab and I'm using Vista at home.
What sort of things do I want to look into for multicore programming with C or Java? What is the lingo that I want to google?
Thanks for the help.

The key word is "threading" - wouldn't work in a cluster, but it will be just fine in a single multicore machine (actually, on any kind of Windows, much better in general than spawning multiple processes -- Windows' processes are quite heavy-weight compared to Linux ones). Not quite that easy in C, very easy in Java -- for example, start here!

You want Threads and Actors
Good point ... you can't google for it unless you know some keywords.
C: google pthread, short for Posix Thread, although the win32 native interface is non-posix, see Creating Threads on MSDN.
Java: See class Thread
Finally, you should read up a bit on functional programming, actor concurrency, and immutable objects. It turns out that managing concurrency in plain old shared memory is quite difficult, but message passing and functional programming can allow you to use styles that are inherently much safer and avoid concurrency problems. Java does allow you to do everything the hard way, where data is mutable shared memory and you desperately try to manually interlock shared state structures. But you can also use an advanced style from within java. Perhaps start with this JavaWorld article: Actors on the JVM.

Check out this book: Java Concurrency in Practice

I think you should consider Clojure, too. It runs on the JVM and has good Java interoperability. As a Lisp, it's different from what you're used to with C and Java, so it might not be your cup of tea, but it's worth taking a look at the issues addressed by Clojure anyway, since the concepts are valuable regardless of what language you use. Check out this video, and then, if you're so inclined, the clojure site, which has links to some other good screencasts more specifically about Clojure in the upper right.

It depends on what your preferred language is to get the job done.
Besides the threading solutions, you may can also consider
MPI as a possibility from Java and C --- as well as from Python or R or whatever you like.
DeinoMPI appears to be popular on Windows, and OpenMPI just started with support for Windows too in the current release 1.3.3.

A lot of people have talked about threading, which is one approach, but consider another way of doing it. What if you had several JVM's started up, connected to the network, and waiting for work to come their way? How would you program an application so that it could leverage all of those JVMs without knowing whether or not they are on the same CPU?
On a quadcore machine, you should be able to run 12 or more JVMs to process work. And if you approach the problem from this angle, scaling up to multiple computers is fairly simple, although you do have to consider higher network latencies when your communication is across a real network.

Here is a good source of info on threading in C#.

You need to create multithreaded programs. Java supports multi-threading out of the box (though older JVMs ran all threads on one core). For C, you'll either need to use platform specific code to to create and manipulate threads (pthread* for Linux, CreateThread and company for Windows). Alternatively, you might want to do your threading from C++, where there are a fair number of libraries (e.g. Boost::threads) to make life a bit simpler and allow portable code.
If you want code that'll be portable across a single machine with multiple cores AND a cluster, you might look into MPI. It's really intended for the cluster situation, but has been ported to work on a single machine with multiple processors or multiple cores -- though it's not as efficient as code written specifically for a single machine.

So, that's a very broad question. You can experiment with multithreaded programming using many different programming languages including C or Java. If you wanted me to pick one for you, then I'd pick C. :)
You want to look into Windows threads, POSIX threads (or multithreading in Java, if that's language). You might want to try to find some problems to experiment with. I'd suggest trying out matrix multiplication; start with a sequential version and then try to improve the time using threads.
Also, OpenMP is available for Windows and offers a much different view of how to multithreaded programming.

Even though you asked specifically for C or Java, Erlang isn't a bad choice of language if this is just a learning exercise
It allows you to do multiprocess style programming very very easily and it has a large set of libraries that let you dive in at just about any level you like.
It was built for distributed programming in a very pragmatic way. If you are comfortable with java, the transition shouldn't be too difficult.
If you are interested, I would recommend the book, "Programming Erlang" by Joe Armstrong.
(as a note: there are other languages designed to run on in highly parallel environments like Haskell. Erlang just tends to be more pragmatic than languages like Haskell which are rooted more in theory)

If you want to do easy threading, such as parallel loops, I recommend check out .NET 4.0 Beta (C# in VS2010 Beta).
The book chapter Joe linked to is a very good one I use myself and highly recommend, but doesn't cover the new parallel extensions to the .NET framework.

yes , many threads , but if the threads are accessing the same position in the memory only one thread will execute,
we need multi memory cores

By far the easiest way to do multicore programming on Windows is to use .NET 4 and C# or F#. Here is a simple example where a parallel program (from the shootout) is 7× shorter in F# than Java and just as fast.
.NET 4 provides a lot of new infrastructure for parallel programming and it is really easy to use.

That you say "take advantage" sounds to me as something more than doing just any multi-threading. Simulations in my book are computation-intensive and in that respect the most efficient language is C. Some would say assembly but there are very few x86 assembly programmers who can beat a modern C compiler.
For the Windows NT engine (NT4, 2000, XP, Vista and 7) the mechanisms you should look into are threads, critical sections and I/O completion ports (iocp). Threads are nice but you need to be able to synchronize them among themselves and with I/O which is where cs's and iocps come in. To make sure your wringing every last bit of performance out of your code you need to profile, analyze, experiment/re-construct. Lots of fun but very time-consuming.

Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).
If you want to parallelize a computation, you're probably going to need multithreading, because you probably want the threads to cooperate on the same memory.
Speaking about performance, threads are faster to create and manage than processes (because the OS doesn't need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other's memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.
Taken from this answer.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.