Lightweight microbenchmark library with graph output (Java)

Is there a good Java library for taking the legwork out of writing good micro-benchmarks? I'm thinking of something which can provide (with a minimum of hassle) text output of results (CSV or HTML, take your pick) and maybe graphs summarizing the results. Ideally, it should be something that plays nicely with JUnit or equivalent, and it should be simple to configure benchmarks with variable parameters.
I've looked at Japex, but found it too heavyweight (25 MB of libraries to include?!) and frankly it was just a pain to work with: virtually nonexistent documentation, mucking about with Ant, XML, and paths... etc.

A few of us from the Google Collections team are in the early days of building something that satisfies your needs. Here's the code to measure how long foo() takes:
public class Benchmark1 extends SimpleBenchmark {
    public void timeFoo(int reps) {
        for (int i = 0; i < reps; i++) {
            foo();
        }
    }
}
Neither the API nor the tool itself is particularly stable. We aren't even ready to receive bug reports or feature requests! If I haven't scared you off yet, I invite you to take Caliper for a spin.

Oracle now has JMH. Not only is it written by members of the JIT team (which takes much of the legwork out of writing good micro-benchmarks), but it also has other neat features like pluggable profilers (including ones that print the assembly of your hotspots with per-line CPU time).
It prints out tables. Not sure about graphs. The benchmarks can be configured with variable parameters. The documentation is fairly good.
It is easy to set up and get going. I've got it integrated with JUnit, and the developers provide a Maven archetype to get started.
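For a flavour of the API, here is a minimal sketch of a parameterized JMH benchmark. The class name and the array-summing workload are invented for illustration; the annotations are JMH's own:
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class ArraySumBenchmark {
    // JMH runs the whole benchmark once for each listed value.
    @Param({"10", "1000", "100000"})
    int size;

    int[] data;

    @Setup
    public void setUp() {
        data = new int[size];
    }

    @Benchmark
    public long sum() {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total; // returning the result keeps the JIT from eliminating the loop
    }
}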

Related

Non-Toy Software Transactional Memory for C or Java

I'm thinking about the possibility of teaching the use of Software Transactional Memory through one or two guided laboratories for a university course. I only know about Haskell's STM, but the students of the course have probably never heard a word about it.
I already found some lists of such libraries online or in other questions (e.g., http://en.wikipedia.org/wiki/Software_transactional_memory#C.2FC.2B.2B). I'm checking them out as you read this, but many of them do not seem to have very good documentation (most are research prototypes only vaguely described in papers, and I would rather teach about something more widely used and well documented).
Furthermore, many of the links provided by wikipedia are dangling.
To sum it up, are there STM implementations aimed at industrial projects (or at least non-toy ones, to ensure a certain level of quality) and well documented (to give some good pointers to the students)?
EDIT: I'm not the teacher of the course, I just help him with the laboratories. Of course the students will be taught basics of concurrency and distributed algorithms before. This was just an idea to propose something different towards the end of the course.
Production-quality STM libraries are not intended as a teaching tool, not even as "best practice". What is worth learning for any college/university course is maybe 1% of the code; the remaining 99% is nitty-gritty, platform-dependent corner cases. The 1% that is interesting is not highlighted in any way, so you have no way of finding it.
What I recommend for a college/university course (no matter whether introductory or advanced) is to implement STM building blocks yourself (and only for one platform).
Start by introducing the problems: concurrency, cache...
Then introduce the atomic helpers we have: CAS (cmpxchg) and fences; see the sketch after this list.
Then build examples together with your students, first easy, then harder and more complex.
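To make the CAS building block concrete, here is a minimal Java sketch of a lock-free counter built on AtomicInteger.compareAndSet; the class itself is invented for illustration:
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // compareAndSet is Java's CAS: it writes `next` only if the
            // value is still `current`, and reports whether it succeeded.
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race; loop and retry with the fresh value.
        }
    }
}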
Leading on from eznme's answer, here are some good problems that I covered while at university for concurrency.
Dining philosophers problem
In computer science, the dining philosophers problem is an example problem often used in concurrent algorithm design to illustrate synchronization issues and techniques for resolving them.
Using the same implementation from here, by Jeff Magee and Jeff Kramer, and solving the problem using monitors.
Most shared-memory applications are easier to demonstrate with integers than with strings (thanks to Java's AtomicInteger class). So the best way to demonstrate shared memory, in my opinion, is to get the students to write an application that uses a thread pool to calculate prime numbers, or to compute some integral.
Or a good example of threads and shared memory is the Producer-consumer problem.
The producer-consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem.
An implementation can be found here; there is also an implementation from Massey University by the software engineering professor Jens Dietrich.
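As a starting point for such a lab, here is a minimal producer-consumer sketch built on Java's ArrayBlockingQueue; the buffer size, item count, and poison-pill convention are arbitrary choices:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedBufferDemo {
    public static void main(String[] args) {
        // The bounded buffer: put() blocks when full, take() blocks when empty.
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<Integer>(4);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 20; i++) {
                    buffer.put(i); // blocks while the buffer is full
                }
                buffer.put(-1); // poison pill telling the consumer to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                int item;
                while ((item = buffer.take()) != -1) { // blocks while empty
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}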
For distributed algorithms, MapReduce and Hadoop are highly documented distributed frameworks. For distributed-programming libraries, look into MPI (Message Passing Interface) and OpenMP (or pragmas for C++). There are also parallel implementations of Dijkstra's shortest-path algorithm.
There are three good ways to do STM today.
The first way is to use gcc and do TM in C or C++. As of gcc 4.7, transactional memory is supported via the -fgnu-tm flag. The gcc maintainers have done a lot of work, and as of the 4.9 (trunk) branch, you can even use hardware TM (e.g., Intel Haswell TSX). There is a draft specification for the interface to the TM at http://justingottschlich.com/tm-specification-for-c-v-1-1/, which is not too painful. You can also find use cases of gcc's TM from the TM community (see, for example, the application track papers from transact 2014: http://transact2014.cse.lehigh.edu).
The implementation itself is a bit complex, but that's what it takes to be correct. There's a lot of literature on the things that can go wrong, especially in a type-unsafe language like C or C++. GCC gets all of these things right. Really.
The second way is to use Java. You can either use DeuceSTM, which is very easy to extend (type safety makes TM implementation much easier!), or use Scala's Akka library for STM. I prefer Deuce, because it's easier to extend and easier to use (you just annotate a method with @Atomic, and Deuce's Java agents do the rest).
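For illustration, this is roughly what a Deuce-style atomic method looks like. Treat the package and annotation names as assumptions from memory rather than verified API, and check them against the Deuce release you use:
import org.deuce.Atomic; // package name as remembered; verify against your Deuce version

public class Account {
    private int balance;

    // Deuce's Java agent instruments this method at load time so that
    // the body executes as a single atomic transaction.
    @Atomic
    public void transferTo(Account other, int amount) {
        this.balance -= amount;
        other.balance += amount;
    }
}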
The third way is to use Scala. I've not done much in this space, but researchers seem to love Akka. If you're affiliated with a parallel/distributed class, you might even be using Scala already.

Lightweight runtime "build system"

I've recently worked on refactoring a system that processes bundles of client data. The system executes a series of steps, each of which consumes files from previous steps (and sometimes in-memory data) and produces its own output, in the form of files or data. Sometimes the output data for a particular step is already available. I have to be careful to make sure that, when one step fails, we continue to run all possible steps (ones that don't depend on the failed step), so that the final output is as complete as possible. Furthermore, not all steps have to be run in all situations.
Previously, the relationships were all implicit in the structure of the code. For instance:
void processClientData() {
    try {
        processA();
    } catch (Exception e) {
        log.log(Level.SEVERE, "exception occurred in step A", e);
        processC(); // C doesn't depend on A, so we can still run it.
        throw e;
    }
    processB();
    processC();
    // etc... for ~20 steps
}
I changed this to make the dependencies explicit, the error handling uniform, etc, by introducing Tasks:
public interface Task {
    List<Task> getDependencies();
    void execute(); // only called after all dependencies have been executed
}

public class TaskRunner {
    public void run(Set<Task> targets) {
        // run the dependencies and targets a la ANT
        // make sure to run all possible tasks on the "road" to targets
        // ...
    }
}
This starts to feel a lot like a very watered-down version of a build system with dependency management (ANT, being most familiar to me). I don't want to pull in ANT for this kind of thing, and I certainly don't want to write out the XML.
I have my system up and running (mostly), but it still feels a bit hacked together, and I have since reflected on how much I hate reinventing the wheel. I would expect that this is a fairly common problem - one that has been solved many times over by people smarter than me. Alas, a few hours of googling turned up nothing.
Is there a library that implements this sort of thing, without being a really heavy-weight build system? I'd also appreciate any pointers, including libraries in other languages (or even novel systems) that I should take inspiration from.
EDIT: I appreciate the suggestions (and I will give them due consideration), but I'm really NOT looking for a "build system" per se. What I am looking for is something more like the kernel of a build system, that I could just call directly from Java and use as a small, low-overhead library for doing said dependency analysis, task execution, and resulting resource management. Like I said, I have existing (working) code in pure Java, and I don't want to bring in XML and all of the baggage that comes with it, without a very compelling reason.
At its core, a build system does three things: it manages dependencies, it tests whether something is "built" or not, and it "builds" the things that aren't built.
Dependency management is little more than a simple topological sort. The rest is iterating through the tasks in dependency order and processing them.
You can readily create something like:
BuildSystem bs = new BuildSystem();
bs.addTask(new Task1());
bs.addTask(new Task...);
bs.addTask(new TaskN());
bs.build();

public void build() {
    List<Task> sortedTasks = topologicalTaskSort(tasks);
    for (Task t : sortedTasks) {
        if (t.needsBuilding()) {
            t.execute();
        }
    }
}
If you have no need to externalize the list of Tasks, then there's no reason for an XML file or anything.
The topological sort allows you to simply add tasks to the list and let the system sort things out. Not a problem with 4 tasks, more of an issue with dozens of tasks.
The sort fails if it detects a cycle of dependency, so that's where you get that control.
Something like this is "too simple" to need a framework. I don't know how you're doing your dependency management now.
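For what it's worth, here is a minimal sketch of such a sort (Kahn's algorithm) written against the Task interface from the question; it assumes every task and its dependencies are passed in:
import java.util.*;

public class TaskSorter {
    // Kahn's algorithm: repeatedly emit tasks whose dependencies have all been emitted.
    public static List<Task> topologicalTaskSort(Collection<Task> tasks) {
        Map<Task, Integer> inDegree = new HashMap<>();
        Map<Task, List<Task>> dependents = new HashMap<>();
        for (Task t : tasks) {
            inDegree.putIfAbsent(t, 0);
            for (Task dep : t.getDependencies()) {
                inDegree.putIfAbsent(dep, 0);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(t);
                inDegree.merge(t, 1, Integer::sum); // one more unmet dependency for t
            }
        }
        Deque<Task> ready = new ArrayDeque<>();
        for (Map.Entry<Task, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<Task> sorted = new ArrayList<>();
        while (!ready.isEmpty()) {
            Task t = ready.remove();
            sorted.add(t);
            for (Task d : dependents.getOrDefault(t, Collections.emptyList())) {
                if (inDegree.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        // Anything left over is part of a dependency cycle.
        if (sorted.size() != inDegree.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return sorted;
    }
}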
Take a look at jsr166 fork/join framework. It seems to me this is exactly what you're trying to accomplish.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinTask.html
This is included in JDK7 but is available as a separate jar for 5 and 6. If I wasn't on my tablet I'd write a more comprehensive example. Maybe someone else can expand in the meantime.
public class DependencyTreeTask extends RecursiveAction {
    private final List<DependencyTreeTask> dependencies = new ArrayList<DependencyTreeTask>();

    public void addDependency(DependencyTreeTask t) {
        dependencies.add(t);
    }

    @Override
    protected void compute() {
        invokeAll(dependencies);
    }
}
...
// build tree...
DependencyTreeTask root = ...
ForkJoinPool pool = new ForkJoinPool();
pool.invoke(root);
You also have to take care if your graph is unconnected, but there is a well-known set of algorithms for determining this.
I would consider writing a Maven plugin; it isn't that hard, and it's much lighter weight because you only have to provide the relevant special logic. All the infrastructure is provided by Maven. Maven 3 would even give you things like parallel builds, where your plugin supports it, for free, amongst all the other things it provides.
One of the major goals of Maven 3 was a re-write to make it as easy as possible to embed the workflow engine in your own projects.
I've heard of Jenkins being used for this sort of thing in addition to its primary 'build system' role. I only just started using Jenkins, so I can't say for sure whether it will do what you need. I'm impressed with it so far. It's relatively easy to use and has a lot of configuration options. There are a large number of plugins for it as well. Just run Jenkins and go to the plugins page to review the list and install them.
Your code reminds me of iwant, a Java build engine I've been developing. You can declare your target definitions with dependencies using fluent Java, and in addition to using it normally from a command line (or an Ant script), you can also embed your build in a Java program.

PHP to Java (using PtoJ)

I would like to transition our codebase from poorly written PHP code to poorly written Java, since I believe Java code is easier to tidy up. What are the pros and cons, and for those who have done it yourselves, would you recommend PtoJ for a project of about 300k ugly lines of code? Tips and tricks are most welcome; thanks!
Poorly written PHP is likely to be very hard to convert because a lot of the bad stuff in PHP just doesn't exist in Java (the same is true vice versa though, so don't take that as me saying Java is better - I'm going to keep well clear of that flame-war).
If you're talking about a legacy PHP app, then it's highly likely that your code contains a lot of procedural code and inline HTML, neither of which will convert well to Java.
If you're really unlucky, you'll have things like eval() statements, dynamic variable names (using $$ syntax), looped include() statements, reliance on the 'register_globals' flag, and worse. That kind of stuff will completely thwart any conversion attempt.
Your other major problem is that debugging the result after the conversion is going to be hell, even if you have beautiful code to start with. If you want to avoid regressions, you will basically need to go through the entire code base on both sides with a fine comb.
The only time you're going to get a satisfactory result from an automated conversion of this type is if you start with a reasonably tidy code base, written at least mainly in up-to-date OOP code.
In my opinion, you'd be better off doing the refactoring exercise before the conversion. But of course, given your question, that would rather defeat the point. Therefore my recommendation is to stick with PHP. PHP code can be very good, and even bad PHP can be polished up with a bit of refactoring.
[EDIT]
In answer to @Jonas's question in the comments, 'what is the best way to refactor horrible PHP code?'
It really depends on the nature of the code. A large monolithic block of code (which describes a lot of the bad PHP I've seen) can be very hard (if not impossible) to implement unit tests for. You may find that functional tests are the only kind of tests you can write against the old code base. These would use Selenium or similar tools to run the code through the browser as if it were a user. If you can get a set of reliable functional tests written, it is good for helping you remain confident that you aren't introducing regressions.
The good news is that it can be very easy - and satisfying - to rip apart bad code and rebuild it.
The way I've approached it in the past is to take a two-stage approach.
Stage one rewrites the monolithic code into decent quality procedural code. This is relatively easy, and the new code can be dropped into place as you go. This is where the bulk of the work happens, but you'll still end up with procedural code. Just better procedural code.
Stage two: once you've got a critical mass of reasonable quality procedural code, you can then refactor it again into an OOP model. This has to wait until later, because it is typically quite hard to convert old bad quality PHP straight into a set of objects. It also has to be done in fairly large chunks because you'll be moving large amounts of code into objects all at once. But if you did a good job in stage one, then stage two should be fairly straightforward.
When you've got it into objects, then you can start seriously thinking about unit tests.
I would say that automatic conversion from PHP to Java has the following:
pros:
quick and dirty, possibly making happy some project manager concerned with short-term delivery (assuming you're lucky and the automatically generated code works without too much debugging, which I doubt)
cons:
ugly code: I doubt that automatic conversion from ugly PHP will generate anything but ugly Java
unmaintainable code: the automatically generated code is likely to be unmaintainable, or at least very difficult to maintain
bad approach: I assume you have a PHP web application; in this case, I think the automatic translation is unlikely to use Java best practices for web applications, or the available frameworks
In summary
I would avoid automatic translation from PHP to Java, and I would at least consider rewriting the application from the ground up in Java. Especially if you have a web application: choose a good Java framework for webapps, do a careful design, and proceed with an incremental implementation (one feature of your original PHP webapp at a time). With this approach, you'll end up with cleaner code that is easier to maintain and evolve... and you may find that the required time is not that much bigger than what you'd need to clean and debug the automatically generated code :)
P2J appears to be offline now, but I've written a proof-of-concept that converts a subset of PHP into Java. It uses the transpiler library for SWI-Prolog:
:- use_module(library(transpiler)).
:- set_prolog_flag(double_quotes, chars).
:- initialization(main).

main :-
    Input = "function add($a,$b){ print $a.$b; return $a.$b;} function squared($a){ return $a*$a; } function add_exclamation_point($parameter){return $parameter.\"!\";}",
    translate(Input, 'php', 'java', X),
    atom_chars(Y, X),
    writeln(Y).
This is the program's output:
public static String add(String a, String b) {
    System.out.println(a + b);
    return a + b;
}

public static int squared(int a) {
    return a * a;
}

public static String add_exclamation_point(String parameter) {
    return parameter + "!";
}
In contrast to other answers here, I would agree with your strategy to convert "PHP code to poorly written Java, since I believe Java code is easier to tidy up", but you need to make sure the tool that you are using doesn't introduce more bugs than you can handle.
An optimal strategy would be:
1) Do the automated conversion.
2) Get an MVP running with some basic tests.
3) Start using the excellent Eclipse/IntelliJ refactoring tools to make the code more readable.
A modern Java IDE can refactor code without introducing bugs when used properly. It can also tell you which functions are never called, along with a lot of other inspections.
I don't know how good "PtoJ" was, since their website has vanished, but you ideally want something that translates not just the syntax but the logic. I used php2java.com recently and it worked very well. I've also used various "syntax" converters (not just for PHP to Java, but also ObjC -> Swift and Java -> Swift), and even they work just fine if you put in the time to make things work afterwards.
Also, I found this interesting blog entry about what might have happened to numiton PtoJ (http://www.runtimeconverter.com/single-post/2017/11/14/What-happened-to-numition).
http://www.numiton.com/products/ntile-ptoj/translation-samples/web-and-db-access/mysql.html
Would you rather not use Hibernate?

Simple Java Map/Reduce framework [closed]

Can anyone point me at a simple, open-source Map/Reduce framework/API for Java? There doesn't seem to be much evidence of such a thing existing, but someone else might know different.
The best I can find is, of course, Hadoop MapReduce, but that fails the "simple" criterion. I don't need the ability to run distributed jobs, just something that lets me run map/reduce-style jobs on a multi-core machine, in a single JVM, using standard Java 5-style concurrency.
It's not a hard thing to write oneself, but I'd rather not have to.
Have you checked out Akka? While Akka is really a distributed actor-model-based concurrency framework, you can implement a lot of things simply with little code. It's just so easy to divide work into pieces with it, and it automatically takes full advantage of a multi-core machine, as well as being able to use multiple machines to process work. Unlike using threads, it feels more natural to me.
I have a Java map/reduce example using Akka. It's not the easiest map/reduce example, since it makes use of futures, but it should give you a rough idea of what's involved. There are several major things that my map/reduce example demonstrates:
How to divide the work.
How to assign the work: Akka has a really simple messaging system as well as a work partitioner whose schedule you can configure. Once I learned how to use it, I couldn't stop. It's just so simple and flexible. I was using all four of my CPU cores in no time. This is really great for implementing services.
How to know when the work is done and the result is ready to process: This is actually the portion that may be the most difficult and confusing to understand unless you're already familiar with Futures. You don't need to use Futures, since there are other options. I just used them because I wanted something shorter for people to grok.
If you have any questions, Stack Overflow actually has an awesome Akka Q&A section.
I think it is worth mentioning that these problems are history as of Java 8. An example:
int heaviestBlueBlock =
    blocks.stream()
          .filter(b -> b.getColor() == BLUE)
          .map(Block::getWeight)
          .reduce(0, Integer::max);
In other words: single-node MapReduce is available in Java 8.
For more details, see Brian Goetz's presentation about Project Lambda.
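To make the multi-core angle explicit, here is a self-contained variant of the snippet above; the Block class is a stand-in invented here, and parallelStream() is what spreads the map/filter/reduce across cores:
import java.util.Arrays;
import java.util.List;

public class HeaviestBlue {
    static class Block {
        final String color;
        final int weight;
        Block(String color, int weight) { this.color = color; this.weight = weight; }
    }

    static int heaviest(List<Block> blocks) {
        // parallelStream() runs the pipeline on the common fork/join pool
        return blocks.parallelStream()
                     .filter(b -> b.color.equals("BLUE"))
                     .mapToInt(b -> b.weight)
                     .max()
                     .orElse(0);
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("BLUE", 3), new Block("RED", 9), new Block("BLUE", 7));
        System.out.println(heaviest(blocks)); // prints 7
    }
}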
I use the following structure:
int procs = Runtime.getRuntime().availableProcessors();
ExecutorService es = Executors.newFixedThreadPool(procs);
List<Future<TaskResult>> results = new ArrayList<Future<TaskResult>>();
for (int i = 0; i < tasks; i++)
    results.add(es.submit(new Task(i)));
for (Future<TaskResult> future : results)
    reduce(future);
I realise this might be a little after the fact, but you might want to have a look at the JSR 166y ForkJoin classes from JDK 7.
There is a backported library that works under JDK 6 without any issues, so you don't have to wait until the next millennium to have a go with it. It sits somewhere between a raw executor and Hadoop, giving a framework for working on map/reduce jobs within the current JVM.
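To sketch what working at that level looks like, here is a classic fork/join map/reduce over an array: the task recursively splits itself (the "map" side) and sums the halves (the "reduce" side). The threshold and workload are arbitrary choices:
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000;
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then combine
    }
}

// usage: long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));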
I created a one-off for myself a couple of years ago when I got an 8-core machine, but I wasn't terribly happy with it. I never got it to be as simple to use as I had hoped, and memory-intensive tasks didn't scale well.
If you don't get any real answers I can share more, but the core of it is:
public class LocalMapReduce<TMapInput, TMapOutput, TOutput> {
    private int m_threads;
    private Mapper<TMapInput, TMapOutput> m_mapper;
    private Reducer<TMapOutput, TOutput> m_reducer;
    ...
    public TOutput mapReduce(Iterator<TMapInput> inputIterator)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(m_threads);
        Set<Future<TMapOutput>> futureSet = new HashSet<Future<TMapOutput>>();
        // map phase: submit one worker per input
        while (inputIterator.hasNext()) {
            TMapInput m = inputIterator.next();
            Future<TMapOutput> f = pool.submit(m_mapper.makeWorker(m));
            futureSet.add(f);
            Thread.sleep(10);
        }
        // reduce phase: poll the futures until all have completed
        while (!futureSet.isEmpty()) {
            Thread.sleep(5);
            for (Iterator<Future<TMapOutput>> fit = futureSet.iterator(); fit.hasNext();) {
                Future<TMapOutput> f = fit.next();
                if (f.isDone()) {
                    fit.remove();
                    TMapOutput x = f.get();
                    m_reducer.reduce(x);
                }
            }
        }
        return m_reducer.getResult();
    }
}
EDIT: Based on a comment, below is a version without sleep. The trick is to use CompletionService which essentially provides a blocking queue of completed Futures.
public class LocalMapReduce<TMapInput, TMapOutput, TOutput> {
    private int m_threads;
    private Mapper<TMapInput, TMapOutput> m_mapper;
    private Reducer<TMapOutput, TOutput> m_reducer;
    ...
    public TOutput mapReduce(Collection<TMapInput> input)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(m_threads);
        CompletionService<TMapOutput> futurePool =
                new ExecutorCompletionService<TMapOutput>(pool);
        Set<Future<TMapOutput>> futureSet = new HashSet<Future<TMapOutput>>();
        for (TMapInput m : input) {
            futureSet.add(futurePool.submit(m_mapper.makeWorker(m)));
        }
        pool.shutdown();
        int n = futureSet.size();
        for (int i = 0; i < n; i++) {
            // take() blocks until the next mapper finishes, so no polling is needed
            m_reducer.reduce(futurePool.take().get());
        }
        return m_reducer.getResult();
    }
}
I'll also note this is a very distilled map-reduce algorithm, including a single reduce worker which does both the reduce and merge operation.
I like to use Skandium for parallelism in Java. The framework implements certain patterns of parallelism (namely Master-Slave, Map/Reduce, Pipe, Fork and Divide & Conquer) for multi-core machines with shared memory. This technique is called "algorithmic skeletons". The patterns can be nested.
In detail there are skeletons and muscles. Muscles do the actual work (split, merge, execute and condition). Skeletons represent the patterns of parallelism, except for "While", "For" and "If", which can be useful when nesting patterns.
Examples can be found inside the framework. It took me a while to understand how to use the muscles and skeletons, but after getting over this hurdle I really like this framework. :)
Have you had a look at GridGain?
You might want to take a look at the project website of Functionals 4 Java: http://f4j.rethab.ch/ It introduces filter, map, and reduce to Java versions before 8.
A MapReduce API was introduced into v3.2 of Hazelcast (see the MapReduce API section in the docs). While Hazelcast is intended to be used in a distributed system, it works perfectly well in a single node setup, and it's fairly lightweight.
You can try LeoTask, a parallel task running and results aggregation framework.
It is free and open source: https://github.com/mleoking/leotask
Here is a brief introduction showing its API: https://github.com/mleoking/leotask/blob/master/leotask/introduction.pdf?raw=true
It is a lightweight framework working on a single computer, using all of its available CPU cores.
It has the following features:
Automatic & parallel parameter space exploration
Flexible & configuration-based result aggregation
Programming model focusing only on the key logic
Reliable & automatic interruption recovery
and Utilities:
Dynamic & cloneable network structures
Integration with Gnuplot
Network generation according to common network models
DelimitedReader: a sophisticated reader that explores CSV (Comma-separated values) files like a database
Fast random number generator based on the Mersenne Twister algorithm
An integrated CurveFitter from the ImageJ project

Class libraries for Java's immense verbosity

I recently got into Java. I have a background in dynamic languages and I'm finally figuring out why people complain about Java's verbosity. Are there any class libraries out there that address this issue? I'd much rather type something like String text = someClass.stdin() instead of the 8 or so lines it takes to get user input in Java.
In Java 5:
import java.util.Scanner;
...
System.out.print("Enter your name: ");
String userName = new Scanner(System.in).nextLine();
Or, in Java 6:
String userName = System.console().readLine("Enter your name: ");
Some of the Apache Commons libraries (particularly Lang, IO and Collections) are designed to hide the verbosity of certain core Java APIs. The verbosity of the Java language, however, we're all stuck with.
Sure, there are several: Jython (formerly JPython), JRuby, Clojure, Scala...
Google has also released a number of libraries that complement sections of the standard library, like the Google Collections library. Guice is also a nice lightweight DI framework that, IMHO, is easier to learn than Spring.
The standard library is so large I don't think you'll find a single library that replaces everything. Your best bet is to look for libraries that solve individual problems (e.g., I don't like the Collections API, I need an object pool, etc.).
I'd be interested in seeing these 8 lines to get user input in Java.
I personally think that Java's verbosity becomes an asset as your program becomes larger. Unlike C and C++, everything is done in a more object oriented way. You get the object representing your output, then you issue an operation on it, and so on. Much easier to understand and maintain in the long run.
Is this as quick as a nice printf() here and there? No. Is it as convenient as scripting in Python? Of course not. But that's part of the cost of using a language like Java, just like the lack of Lambdas is annoying.
As an engineer your role is to pick the best tool for the job. I do most of my coding in Java, and some in Python, accepting the tradeoffs of each.
While you can't change the language, you could use libraries that simplify some operations (e.g., Google's or Apache's IO libraries). You could also write your own classes for the things that annoy you the most.
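As a toy example of that last suggestion, here is the kind of wrapper the question wishes for; the Std class and its stdin() method are invented names:
import java.util.Scanner;

public final class Std {
    private static final Scanner IN = new Scanner(System.in);

    private Std() {
    }

    // Reads one line from standard input.
    public static String stdin() {
        return IN.nextLine();
    }
}

// usage: String text = Std.stdin();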
I also think you're confusing the verbosity of the language and of the standard library. The library contains a lot of stuff, most of it you'll never need. I find the existing division fairly straightforward and have never found myself in areas I didn't care about.
If you really can't stand Java, you might want to use hybrid languages like Scala.
I'm a big fan of leaning on my IDE's live-templating features (IntelliJ IDEA). I can't remember the last time I spelled out StringBuffer or System.out.println("...").
