I use a desktop with eight cores to build a Java application using Ant (through a javac target). Is there a way to speed up the compilation by using more than one thread or process?
I know I can run several Ant tasks in parallel, but I don't think this can be applied to a single compilation target, or does it?
I don't know of any way to tell Ant itself to make effective use of multiple cores. But you can tell Ant to use the Eclipse compiler, which has built-in support for multithreaded compilation.
As long as the javac you are calling doesn't use all the cores, it doesn't really matter what you tell Ant. You can use the compiler attribute to define which Java compiler should be used for the task.
If you have several build targets, you can additionally use fork="yes" to run the compiler externally in its own process.
http://ant.apache.org/manual/Tasks/javac.html#compilervalues
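For reference, a minimal sketch of both options; it assumes ecj.jar (which contains the JDTCompilerAdapter) is on Ant's classpath, and the paths are illustrative:

<!-- run the JDK compiler in a separate process -->
<javac srcdir="src" destdir="build/classes" fork="yes"/>

<!-- or hand compilation to the Eclipse compiler instead -->
<javac srcdir="src" destdir="build/classes"
       compiler="org.eclipse.jdt.core.JDTCompilerAdapter"/>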
The documentation for Ant's <parallel> task seems to indicate that it's unlikely to work correctly with javac:
Anyone trying to run large Ant task sequences in parallel, such as javadoc and javac at the same time, is implicitly taking on the task of identifying and fixing all concurrency bugs in the tasks that they run.
Accordingly, while this task has uses, it should be considered an advanced task which should be used in certain batch-processing or testing situations, rather than an easy trick to speed up build times on a multiway CPU.
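If you do experiment despite that warning, the usual shape is to parallelize whole independent sub-builds rather than a single compilation. A minimal sketch, assuming moduleA and moduleB are hypothetical, fully independent sub-projects with their own build files:

<!-- each nested task runs in its own thread -->
<parallel threadCount="2">
  <ant antfile="moduleA/build.xml" target="compile"/>
  <ant antfile="moduleB/build.xml" target="compile"/>
</parallel>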
You can use Buck Build to increase your build speed and utilize multiple cores.
In a nutshell:
Buck is a build system developed and used by Facebook. It encourages
the creation of small, reusable modules consisting of code and
resources, and supports a variety of languages on many platforms.
Buck builds independent artifacts in parallel to take advantage of multiple
cores on your machine. Further, it reduces incremental build times by
keeping track of unchanged modules so that the minimal set of modules
is rebuilt.
Not as far as I know. The Eclipse compiler has had some work done to speed it up using multiple cores, but it doesn't buy as much as you would probably like.
The question is: can you live with incremental compilation during development, recompiling only what changed? The full rebuild can then be left to the build server.
I assume it might not help much, because javac pulls all files into memory, and if multiple processes each have to do that, the effort is simply duplicated. However, if you want to compile a few fairly separate pieces of Java code, you can just do:
#!/usr/bin/env bash
# compile three mostly independent files in parallel
javac file1.java &
javac file2.java &
javac file3.java &
wait  # block until all background compiles finish
If the three files have mostly different dependencies, this might save time; if the dependencies overlap, it probably won't save much.
We have been working on our project with SCons as the build system for a few years. SCons is awesome and has improved our development process a lot.
There were ~15,000 C++ source files initially, and as the project evolves, source files in other languages (Java/Scala) are being added to the code base as well.
We then developed our own builders, which manage dependencies and invoke the external tools (javac, scalac) to build those sources; this works really well.
Recently I was working on optimizing the current build system and found a performance difference between the C++ and Java build processes:
Environment setup:
Build server with 24 cores, Python 2.7.3, Scons 2.2.0
Command: scons --duplicate=soft-copy -j8
When building Java code, CPU usage is high (easily observed in top) and spans multiple cores:
[screenshot: Java build]
However, when building C++ code, CPU usage is always ~4% and runs on only one core, no matter how many jobs are given to SCons:
[screenshot: C++ build]
I've been googling a lot but could not find anything useful. Am I hitting Python's GIL? I believe each gcc/g++ command is run in a separate sub-process, just like javac in our own builders, so the GIL should not be involved here. Is there any workaround to fully utilize multiple cores and speed up the C++ build further?
Thanks in advance.
As WindLeeWT explained in one of his comments, the cause of the observed behaviour was that ccache was installed and configured on the server in question. Most of the CPU usage during a C++ build is taken up by the compilation stage, which ccache avoided entirely. That's why no CPU usage across several cores could be seen in top.
As soon as they launched a "build from scratch" on another server without ccache, several cores were running at 90% with 'cc1plus -o ...' as expected.
No performance penalties (GIL etc.) were involved, and neither Python nor SCons degraded performance in any significant way.
I'm looking for a way to boost my team's productivity, and one way to do that would be to shorten the time it takes to compile & unit test & package & deploy our Java EE application which is getting bigger and bigger.
The trivial solution that I know of is to set up a powerful computer with N processors (N ~= num of developers) and a blazingly fast disk system and a lot of memory, and run everything on this computer and connect to it via X remotely. It would certainly be much faster than compiling on our laptops, but still cheaper and easier to maintain than to buy each developer his/her own supercomputer.
Is there another way to solve this problem? For example, could we run our IDEs locally and tell them to compile Java source remotely? Can NetBeans / Eclipse / IntelliJ / etc. do this? Or is there a special tool for remote Java compilation that also makes use of multiple processors? It need not be free/open source.
Unfortunately our laptops MUST run a (company managed) Windows Vista, so another reason to go for the separate server computer is to let us use linux on it and finally get rid of the annoying managed environment.
EDIT: to sum up the answers so far, one way to shorten build times is to leave compilation for the developers individually (because compiling is supposed to be fast), skip running unit tests and hot-deploy (without packaging) to the container.
Then, when the developer decides to check his/her code in, a continuous integration server (such as Hudson) is triggered to clean & build & run tests & package & deploy.
SOLUTION: I've accepted Thorbjørn's answer since I think that's going to be the closest to which way I'm planning to proceed. Although out of curiosity I'm still interested in solving the original problem (=remote Java compiling)...
You essentially need two workflows.
The OFFICIAL build, which checks out the sources, builds the whole thing from scratch, runs all the unit tests, and then builds the bits which will eventually ship to the customer after testing.
Developer hot-deploying after each source code change into the container the IDE knows about.
These two can actually be vastly different!
For the official build, get Jenkins up and running and tell it to watch your source repository and build whenever there is a change (and tell those who break the build). If you can get the big computer for building, use it for this purpose.
For the developers, look into a suitable container with very good IDE deployment options, and set that up for usage for each and every developer. This will VERY rapidly pay off! JBoss was previously very good for exactly this purpose.
And no, I don't know of an efficient remote Java compilation option, and I don't think this is what you should pursue for the developers.
See what Joel thinks about Build Servers: http://www.joelonsoftware.com/articles/fog0000000023.html
If you don't like Jenkins, plenty others exist.
(2016 edit: Hudson changed to Jenkins. See https://stackoverflow.com/a/4974032/53897 for the history behind the name change)
It's common to set up a build server, e.g. running Hudson, to do the compiling/packaging/unit-testing/deploying.
You'd likely still need the clients to at least perform a compile. Shifting to a build server may also require changing the work process if you aren't using one now - e.g. if the goal is to take load off the client machines, your developers would check code in and have the unit tests run automatically, instead of running the unit tests first and then checking in.
You could mount each developer's directory (e.g. via NFS) on the powerful machine and then create an External Tool Configuration in Eclipse (GUI access) that triggers the build on the external server.
JavaRebel can also increase productivity. It eliminates the need for redeployments.
You can recompile a single file and see the changes being applied directly on the server.
When things start getting too big for efficient builds, it may be time to investigate breaking your code up into modules/JARs (how it breaks apart depends on many project specifics and on how your team tends to work). If you find a good setup, you can get away with less compiling (you don't always need to rebuild the whole project) and more/quicker copying/JARing to get to the point where you can test new code.
What your project needs is a build system to do the building, testing and packaging for you. Hudson is a good example of such a continuous-integration build system.
Is there a distributed compiler for Java, analogous to distcc for C/C++?
The direct answer to your question is "no". However, it probably would not help you anyway… compiling Java is very fast.
On a small project, compilation is fast enough that you shouldn't really care. On a large project, you would need to ship the file to be compiled over the network, and potentially also ship many megabytes of dependencies along with it.
One thing that you can do to improve your compilation speed is to use the eclipse compiler instead of the Sun javac. The Eclipse compiler is multi-threaded, and so will, with luck, use all the cores of your machine.
It is probably also worth mentioning that Apple recently reduced distcc support, since in general, on newer hardware, it costs more time to get the code somewhere else to compile and back than it does to just do the compilation locally. To quote Apple:
The single-computer build performance of Xcode has been improved to the point that distributed building with Distributed Network Builds is slower than local builds in most circumstances.
Maybe Jikes would work for you. You can achieve very similar effects with a clever Ant script and an NFS-like filesystem...
If you're annoyed with waiting a long time for your java compiles, then you might consider one of the following:
break your project up into several different JAR files (in a hierarchic dependency; see the sketch after this list). With any luck, your changes will only affect source in one of those JARs, while the others can continue to serve as dependencies.
break your project into groups of sources, perhaps by package, and use Apache Ant to coordinate your compiling. I was always too lazy to do it, but you can set up explicit dependency management to avoid re-compiling code for which .class files already exist and are newer than the source. The effort that goes into setting this up once can pay dividends within a few days if the project is large and compiles are chewing up a lot of your time.
As opposed to multi-coring, reducing the amount of code that you need to recompile will also reduce your PC's energy consumption and carbon footprint ;)
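A sketch of the first suggestion in Ant terms (the module names and paths are made up): compile a core JAR once and let the rest of the code build against it, so that a change in app sources leaves core untouched:

<target name="core">
  <javac srcdir="core/src" destdir="core/classes"/>
  <jar destfile="lib/core.jar" basedir="core/classes"/>
</target>

<target name="app" depends="core">
  <javac srcdir="app/src" destdir="app/classes" classpath="lib/core.jar"/>
</target>

Since <javac> and <jar> only rebuild what is out of date, the core target is close to a no-op when only app code has changed.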
I did write the start of one for Java 6:
http://www.pointdefence.net/jarc/index.html
It distributes work at the Java-compiler-task level, so it would work well with parallel compilation of independent Maven modules.
I think parallel compilation of independent Maven modules should be quite easy with some simple scripts - just pull from version control, change directory and run mvn clean compile. Add mvn deploy to get the artifact into your artifact repository.
This should work even with dependent modules, though it will need some work on synchronization.
Just about every Java project that I've seen uses either Maven or Ant. They are fine tools, and I think just about any project can use them. But whatever happened to make? It's used for a variety of non-Java projects and can easily handle Java. Sure, you have to download make.exe if you use Windows, but Ant and Maven also don't come with the JDK.
Is there some fundamental flaw with make when used with Java? Is it just because Ant and Maven are written in Java?
The fundamental issue with Make and Java is that Make works on the premise that you specify a dependency, and then a rule to resolve that dependency.
With basic C, that is typically: "to convert a main.c file into a main.o file, run cc main.c".
You can do that in Java, but you quickly learn something.
Mostly, that the javac compiler is slow to start up.
The difference between:
javac Main.java
javac This.java
javac That.java
javac Other.java
and
javac Main.java This.java That.java Other.java
is night and day.
Exacerbate that with hundreds of classes, and it just becomes untenable.
Then you combine that with the fact that Java tends to be organized as groups of files in directories, vs. C and others, which tend towards a flatter structure. Make doesn't have much direct support for working with hierarchies of files.
Make also isn't very good at determining which files are out of date at a collection level.
With Ant, it will go through, sum up all of the files that are out of date, and then compile them in one go. Make will simply call the Java compiler on each individual file. Getting make NOT to do this requires enough external tooling to really show that Make is not quite up to the task.
That's why alternatives like Ant and Maven rose up.
Actually, make can handle recompiling all outdated Java files in one command. Change the first line if you don't want to compile all files in the directory or want a specific order...
# all .java files in the current directory
JAVA_FILES:=$(wildcard *.java)
#
# the rest is independent of the directory
#
JAVA_CLASSES:=$(patsubst %.java,%.class,$(JAVA_FILES))

.PHONY: classes
LIST:=

# each out-of-date .class appends its source file to LIST (rule below);
# the recipe then compiles the whole list in a single javac run
# (recipe lines must be indented with tabs)
classes: $(JAVA_CLASSES)
	if [ ! -z "$(LIST)" ] ; then \
		javac $(LIST) ; \
	fi

$(JAVA_CLASSES) : %.class : %.java
	$(eval LIST+=$$<)
The venerable make program handles separately compiled languages like C and C++ reasonably well. You compile a module, it uses #include to pull in the text of other include files, and writes a single object file as output. The compiler is very much a one-at-a-time system, with a separate linking step to bind the object files into an executable binary.
However, in Java, the compiler has to actually compile other classes that you import with import. Although it would be possible to write something that generated all the necessary dependencies from Java source code, so that make would build classes in the correct order one at a time, this still wouldn't handle cases such as circular dependencies.
The Java compiler can also be more efficient by caching the compiled results of other classes while compiling further classes that depend on the results of ones already compiled. This sort of automatic dependency evaluation is not really possible with make alone.
The question is based on an incorrect assumption: a non-trivial number of developers do use make. See Java Build Tools: Ant vs. Maven. As for why a developer wouldn't use make: many developers either have never used make, or used it and hated it with a fire that burns hotter than a thousand suns. As such, they use alternative tools.
All the other answers about the technical merits of each are true. Ant and Maven may be better suited to Java than make, or as Hank Gay points out, they may not :)
However, you asked if it matters that Ant and Maven are written in Java. Although on StackOverflow we don't consider such thoughts (closed! not-programming-related! etc.), OF COURSE THAT'S PART OF THE THING. On rails we use Rake, C dudes use make, and in Java we use Ant and Maven. While it's true that the Ant or Maven developers will look after the Java developer perhaps better than others, there's also another question: what do you write Ant tasks in? Java. If you're a Java developer, that's an easy fit.
So yeah, part of it is to use tools written in the language you are tooling.
Ant and later Maven were designed to solve some headaches caused by Make (while creating new ones in the process). It is just evolution.
...Soon thereafter, several open source Java projects realized that Ant could solve the problems they had with Makefiles....
From http://ant.apache.org/faq.html#history
Whether they solve anything or just create an extra format to learn is a subjective topic. The truth is that's pretty much the history of every new invention: The creator says it solves a lot of problems and the original users say those are virtues.
The main advantage it has is the possibility to integrate with Java.
I guess a similar history applies to rake, for instance.
One of the major issues solved by Maven (and Ivy-enabled Ant setups) over make is automated dependency resolution and downloading of your dependency jars.
Short answer: because make isn't good. Even on the C front you see many alternatives popping up.
Long answer: make has several flaws that make it barely suitable for compiling C and entirely unsuitable for compiling Java. You can force it to compile Java if you want, but expect to run into issues, some of which have no suitable solution or workaround. Here are a few:
Dependency resolution
make inherently expects files to have a tree-like dependency on each other, in which one file is the output of building several others. This already backfires in C when dealing with header files: make requires a make-specific include file to be generated to represent the dependency of a C file on its header files, so that a change to the latter causes the former to be rebuilt. However, since the C file itself isn't recreated (merely rebuilt), make often requires specifying the target as .PHONY. Fortunately, GCC supports generating those files automatically.
In Java, dependencies can be circular, and there's no tool for auto-generating class dependencies in make format. ant's Depend task can, instead, read the class files directly, determine which classes they import, and delete any class file that is out of date. Without this, any non-trivial dependency may force you into repeated clean builds, removing any advantage of using a build tool.
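A minimal sketch of that pairing (paths are illustrative): <depend> deletes class files whose transitive dependencies changed, and <javac> then recompiles only the missing ones:

<depend srcdir="src" destdir="build/classes" closure="yes"/>
<javac srcdir="src" destdir="build/classes"/>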
Spaces in filenames
While neither Java nor C encourages spaces in source filenames, in make this can be a problem even if the spaces are only in the file path. Consider, for example, source code living in C:\My Documents\My Code\program\src. This alone is enough to break make, because make treats filenames as strings, whereas ant treats paths as special objects.
Scanning files for build
make requires explicitly stating which files are to be built for each target. ant instead lets you specify a folder that is auto-scanned for source files. That may seem like a minor convenience, but consider that in Java each new class requires a new file; adding files to the project can become a big hassle fast.
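For example, a single task like the following picks up every new .java file under src automatically (the excludes pattern is purely illustrative):

<javac srcdir="src" destdir="build/classes" excludes="**/experimental/**"/>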
And the biggest problem with make:
make is POSIX-dependent
Java's motto is "compile once, run everywhere". But restricting that compilation to POSIX-based systems, on which Java support is actually at its worst, is not the intention.
Build rules in make are essentially small bash scripts. Even though there is a port of make to Windows, for it to work properly, it has to be bundled with a port of bash, which includes a POSIX emulation layer for the file system.
This comes in two varieties:
MSYS, which tries to limit the POSIX translation to file paths, and can therefore have unpleasant gotchas when running external tools not made especially for it.
Cygwin, which provides a complete POSIX emulation. The resulting programs, however, tend to still rely on that emulation layer.
For that reason, on Windows, the standard build tool isn't even make at all, but rather MSBuild, which is also an XML-based tool, closer in principle to ant.
By contrast, ant is built in Java, can run everywhere, and contains internal tools, called "tasks", for manipulating files and executing commands in a platform-independent way. It's sufficiently versatile that you can actually have an easier time building a C program in Windows using ant than using make.
And one last minor one:
Even C programs don't use make natively
You may not notice this at first, but C programs generally aren't shipped with a Makefile. They are shipped with a CMakeLists.txt, or a bash configure script, which generates the actual Makefile. By contrast, the source of a Java program built using ant is shipped with the ant script pre-built. A Makefile is a product of other tools - that's how unsuitable make is as a build tool on its own. ant is standalone, and deals with everything you need for your Java build process, without any additional requirements or dependencies.
When you run ant on any platform, it Just Works(tm). You can't get that with make. It's incredibly platform and configuration dependent.
I think the most likely explanation is that several factors discouraged the use of make within the Java community in a critical period of time (the late 1990s):
Because Java encompasses multiple platforms, Java programmers in general were not as adept at Unix tools as were programmers generally confined to a Unix environment (e.g., C and Perl programmers). Note that this is IN GENERAL. Without a doubt there are and were gifted Java programmers with a deep understanding of Unix.
Consequently, they were less adept at make and didn't know how to use it effectively.
While it is possible to write a short and simple Makefile that compiles Java efficiently, extra care is required to do so in a platform-independent way.
Consequently there was an appetite for an intrinsically platform-independent build tool.
It was in this environment that Ant and later Maven were created.
In short, while make most certainly can be used for Java projects, there was a moment of opportunity to make it the de facto Java build tool. That moment has passed.
The assumption that no one is (mis)using make for Java is wrong - unless I am no one.
"Managing Projects with GNU Make" (available under GFDL) contains a complete chapter dedicated to using make with java projects.
As it contains a long (and hopefully fair) list of the pros and cons of using make instead of other tools you might want to take a look there. (see: http://oreilly.com/catalog/make3/book/)
Make scripts tend to be inherently platform-dependent. Java is supposed to be platform-independent. Therefore, having a build system that only works on one platform for a multi-platform sourcebase is kind of a problem.
Ant is an XML configuration oriented improvement over Makefiles and Maven is a dependency build tool improvement over Ant. Some projects use all three. I think the JDK projects used to use a mix of makefiles and ant.
One big reason is that both Ant and Maven (and most Java-targeted SCM, CI and IDE tools) are written in Java by/for Java developers. This makes them simpler to integrate into your development environment and allows other tools, such as the IDE and CI servers, to integrate portions of the ant/maven libraries within the build/deployment infrastructure.
Once upon a time I worked on a Java project that used gmake. My recollection is hazy but IIRC we had a hard time dealing with the package directory structure that javac expects. I also remember that building JAR files was a hassle unless you had something trivial.
Apache Ant isn't anything like Make. Make is about describing dependencies between files, and how to build files. Ant is about dependencies between "tasks", and is really more a way of gluing build scripts together.
This may help you: AntVsMake
Ant and Maven approach the build dependency graph and the management of it from a more 'modern' view... But as Oscar says, they created their own problems while attempting to address the old problems with make.
I've never used GNU Make for Java projects, but I used to use jmk. Sadly it hasn't been updated since 2002.
It had some Java-specific functionality but was small enough to include in your source tarball without significantly increasing its size.
Nowadays I just assume any Java developer I share code with has Ant installed.
We have a large codebase that takes approximately 12 minutes on the developer machines to auto-generate some Java 5 classes using JavaCC, then compile all the classes and run the unit tests.
The project consists of multiple sub-projects which can be built in groups, but we are aiming for a full build in under 10 minutes.
What tips are there for reducing this build time?
Thanks
One quick fix that might shave some time off is to ensure that you are running Ant using the server JVM (by default it uses the client VM). Set ANT_OPTS to include "-server".
Profile the build process and see where the bottlenecks are. This can give you some ideas as to how to improve the process.
Try building independent projects in parallel on multi-core/CPU machines. As an extension of this idea, you may want to look around for a Java equivalent of distcc (don't know whether it exists) to distribute your build over a number of machines.
Get better machines.
Some tips for reducing build time:
Do less work: e.g. remove unnecessary logging/echoing to files and console.
Make your build incremental: compile only the changed classes.
Eliminate duplicated effort: easier said than done, but if you run the build in debug mode ("ant -debug") you can sometimes spot redundant tasks or targets.
Avoid expensive operations: copying files and packaging JARs into WARs are necessary for a release, but signing JARs is expensive and should, if possible, only be done for milestone releases rather than for every build.
Try to take inspiration from The Pragmatic Programmer. Compile only what is necessary; have two or more test suites, one for quick tests and another for full tests. Consider whether every build step really needs to run each time. If necessary, try the Jikes compiler instead of javac; once a project spans several hundred classes I switch to Jikes to improve speed, but be aware of potential incompatibility issues. Don't forget to include an all-in-one target that performs every step with a full rebuild and a full test of the project.
Now that you've explained the process in more detail, here are two more options:
A dedicated machine/cluster where the build is performed much quicker than on a normal workstation. The developers would then, before a commit, run a script that builds their code on the dedicated machine/cluster.
Change the partitioning into sub-projects so that it's harder to break one project by modifying another. This should then make it less important to do a full build before every commit. Only commits that are touching sensitive sub-projects, or those spanning multiple projects would then need to be "checked" by means of a full build.
This probably wouldn't help in the very near term, but figured I should throw it out there anyway.
If your project can be broken into smaller projects (a database subsystem or logging, as examples), you may be interested in using something like Maven to handle the build. You can run each smaller piece as a separate project or module, and Maven will keep track of what needs to be rebuilt when changes exist. This way the build can focus on the main portion of your project and won't take nearly as long.
What is the breakdown in time spent:
generating the classes
compiling the classes
running the tests
Depending on your project, you may see significant reductions in build time by allocating a larger heap to javac (memoryMaximumSize) and junit (maxmemory).
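A sketch of those two attributes (the sizes are arbitrary, and both attributes are only honoured when fork="yes"):

<javac srcdir="src" destdir="build/classes"
       fork="yes" memoryMaximumSize="512m"/>

<junit fork="yes" maxmemory="512m">
  <classpath path="build/classes"/>
  <batchtest>
    <fileset dir="build/classes" includes="**/*Test.class"/>
  </batchtest>
</junit>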
Is it very important that the entire build takes less than 10 minutes? If you make the sub-projects independent of one another, you can work on one sub-project with the others already compiled (think Maven or Ivy to manage the dependencies).
Another solution is to treat the sub-projects as standalone projects. Each project then follows its own release cycle and is available from a local Maven/Ivy repository. This of course works well only if at least parts of the project are reasonably stable.