Java.
Is it normal that i get stack overflow error after 10 000 recursive void function calls with a reference and two integers as arguments?
Got 6gb ram memory, tried running through IDE and command line. I'm pretty sure the code is correct and recursion should finish.
It's about a Fill tool for a tile map editor. It start's at a certain tile and goes up, down, right and left if the coincident tile is of the same type and doesn't come back.
Tried different approaches, here is the one with additional boolean table indicating whether [x][y] tile was visited and replacing marked tiles after recursion is done:
public void fillRec(Tile t, int column, int row) {
if (affected[column][row] || t.getName() != pattern)
return;
/*t.replaceMe(editor.currentTileButton.spawnTile(column, row,
editor.tileMap));*/
affected[column][row] = true;
if (column < editor.tileMap.tilesX - 1) {
fillRec(editor.tileMap.tiles[column + 1][row], column + 1, row);
}
if (column > 0) {
fillRec(editor.tileMap.tiles[column - 1][row], column - 1, row);
}
if (row < editor.tileMap.tilesY - 1) {
fillRec(editor.tileMap.tiles[column][row + 1], column, row + 1);
}
if (row > 0) {
fillRec(editor.tileMap.tiles[column][row - 1], column, row - 1);
}
}
This works fine with ~75x75 map, so did functions replacing tile and doing other heavy stuff in their bodies.
Yes, each method call uses up a Stack frame. If you want to use large scale recursion in Java, you'll need to use a Trampoline - which can swap stack space for heap space. A trampoline typically has two states
completed
more work to do
The completed state holds the final result, and the more work to do can be implemented with a Supplier (in Java 8) or similar construct, that makes the next recursive call. The Trampoline implementation should manages the calls to your method, and iterates rather than recurses.
Here is a simple looping example with a Trampoline.
Trampoline<Integer> loop(int times,int sum){
if(times==0)
return Trampoline.done(sum);
else
return Trampoline.more(()->loop(times-1,sum+times));
}
To make the call to loop
loop(100,10).result();
Note the method returns a lazy Trampoline Object immediately (i.e. it doesn't perform the summing), and the Trampoline runs through the simple summing algorithm when result is called - in an iterative, rather than recursive fashion.
There is a Trampoline implementation in a library I wrote called cyclops-trampoline that you can use. Or if you prefer here is how to roll your own (this implementation makes use of a nice technique by Mario Fusco of managing the trampoline iteration in a Java 8 Stream).
public interface Trampoline<T> {
default Trampoline<T> bounce(){
return this;
}
T result();
default boolean complete() {
return true;
}
public static <T> Trampoline<T> done(T result) {
return () -> result;
}
public static <T> Trampoline<T> more(Trampoline<Trampoline<T>> trampoline) {
return new Trampoline<T>() {
#Override
public boolean complete() {
return false;
}
#Override
public Trampoline<T> bounce() {
return trampoline.result();
}
public T result() {
return trampoline(this);
}
T trampoline(Trampoline<T> trampoline) {
return Stream.iterate(trampoline,Trampoline::bounce)
.filter(Trampoline::complete)
.findFirst()
.get()
.result();
}
};
}
}
It depends on how much data these functions place on the stack in relation to the configured (or default) stack size, so it's not only the stack size used by the arguments to the function call.
So yes, it does not sound unnormal. You should play with the stack size or implement it differently.
That sounds normal. If you don't specify otherwise, the default Java stack size is 1Mb or less, depending on your JVM and execution platform. A stack overflow at ~10,000 recursive calls sounds quite plausible for a stack with the default size.
You can change the JVM's default stack size with the -Xss option; e.g. -Xss10m sets the default size to 10MB.
You can also specify a thread stack size directly via the Thread constructor.
However, this does illustrate a point that is not obvious to new Java programmers. Unlike typical functional programming languages (and many others) the standard Java implementations do not do "tail call optimization". This means that a recursive call sequence always needs stack space that is proportional to the maximum recursion depth.
This is a potential problem for programmers who prefer to use recursion rather than iteration. Unfortunately, if your data is such that deep recursion is a possibility, you really need to convert to an iterative solution. (Or find some other way to move the "recursion state" off the stack.)
Related
I know this question has been answered already via numerous methods:
Set maximum stack size (-Xss20m)
Avoid the test what so ever - if you need a bigger recursion the problem is within the program.
Those methods are great, but I know there is a problem in my code, and I want to specifically limit (to a small number e.g. 5) the recursion depth, to test whether this is the problem.
Is there a method, like the sys.setrecursionlimit in python?
The least invasive "Manual" way to do this (also the most hacky) is probably to create a static variable in your class that is having the recursion issue. When you enter the recursive method use it to count the recursion depth (by adding or subtracting) and when you exit, reverse what you did upon entry.
This isn't great but it is a lot better than trying to set a stack depth (Nearly any system call in java will blow through 5 levels of stack without even blinking).
If you don't use a static you may end up having to pass your stack depth variable through quite a few classes, it's very invasive to the rest of your code.
As an alternative I suggest you let it fail "normally" and throw the exception then meditate upon the stack trace for a while--they are really informative and will probably lead you to the source of your problem more quickly than anything else.
static int maxDepth = 5;
public void recursiveMethod() {
if(maxDepth-- == 0)
throw new IllegalStateException("Stack Overflow");
recursiveMethod();
maxDepth++;
}
Create this class:
public class RecursionLimiter {
public static int maxLevel = 10;
public static void emerge() {
if (maxLevel == 0)
return;
try {
throw new IllegalStateException("Too deep, emerging");
} catch (IllegalStateException e) {
if (e.getStackTrace().length > maxLevel + 1)
throw e;
}
}
}
Then import static and insert emerge() call into the beginning of any method in your code that can be deeply recursive. You can adjust maximum allowed recursion level via the maxLevel variable. The emerge() procedure will interrupt execution on a level greater than the value of that variable. You can switch off this behaviour by setting maxLevel to 0. This solution is thread-safe because it doesn't use any counter at all.
I wrote a minimal somewhat-lazy (int) sequence class, GarbageTest.java, as an experiment, to see if I could process very long, lazy sequences in Java, the way I can in Clojure.
Given a naturals() method that returns the lazy, infinite, sequence of natural numbers; a drop(n,sequence) method that drops the first n elements of sequence and returns the rest of the sequence; and an nth(n,sequence) method that returns simply: drop(n, lazySeq).head(), I wrote two tests:
static int N = (int)1e6;
// succeeds # N = (int)1e8 with java -Xmx10m
#Test
public void dropTest() {
assertThat( drop(N, naturals()).head(), is(N+1));
}
// fails with OutOfMemoryError # N = (int)1e6 with java -Xmx10m
#Test
public void nthTest() {
assertThat( nth(N, naturals()), is(N+1));
}
Note that the body of dropTest() was generated by copying the body of nthTest() and then invoking IntelliJ's "inline" refactoring on the nth(N, naturals()) call. So it seems to me that the behavior of dropTest() should be identical to the behavior of nthTest().
But it isn't identical! dropTest() runs to completion with N up to 1e8 whereas nthTest() fails with OutOfMemoryError for N as small as 1e6.
I've avoided inner classes. And I've experimented with a variant of my code, ClearingArgsGarbageTest.java, that nulls method parameters before calling other methods. I've applied the YourKit profiler. I've looked at the byte code. I just cannot find the leak that causes nthTest() to fail.
Where's the "leak"? And why does nthTest() have the leak while dropTest() does not?
Here's the rest of the code from GarbageTest.java in case you don't want to click through to the Github project:
/**
* a not-perfectly-lazy lazy sequence of ints. see LazierGarbageTest for a lazier one
*/
static class LazyishSeq {
final int head;
volatile Supplier<LazyishSeq> tailThunk;
LazyishSeq tailValue;
LazyishSeq(final int head, final Supplier<LazyishSeq> tailThunk) {
this.head = head;
this.tailThunk = tailThunk;
tailValue = null;
}
int head() {
return head;
}
LazyishSeq tail() {
if (null != tailThunk)
synchronized(this) {
if (null != tailThunk) {
tailValue = tailThunk.get();
tailThunk = null;
}
}
return tailValue;
}
}
static class Incrementing implements Supplier<LazyishSeq> {
final int seed;
private Incrementing(final int seed) { this.seed = seed;}
public static LazyishSeq createSequence(final int n) {
return new LazyishSeq( n, new Incrementing(n+1));
}
#Override
public LazyishSeq get() {
return createSequence(seed);
}
}
static LazyishSeq naturals() {
return Incrementing.createSequence(1);
}
static LazyishSeq drop(
final int n,
final LazyishSeq lazySeqArg) {
LazyishSeq lazySeq = lazySeqArg;
for( int i = n; i > 0 && null != lazySeq; i -= 1) {
lazySeq = lazySeq.tail();
}
return lazySeq;
}
static int nth(final int n, final LazyishSeq lazySeq) {
return drop(n, lazySeq).head();
}
In your method
static int nth(final int n, final LazyishSeq lazySeq) {
return drop(n, lazySeq).head();
}
the parameter variable lazySeq hold a reference to the first element of your sequence during the entire drop operation. This prevents the entire sequence from getting garbage collected.
In contrast, with
public void dropTest() {
assertThat( drop(N, naturals()).head(), is(N+1));
}
the first element of your sequence is returned by naturals() and directly passed to the invocation of drop, thus removed from the operand stack and does not exist during the execution of drop.
Your attempt to set the parameter variable to null, i.e.
static int nth(final int n, /*final*/ LazyishSeq lazySeqArg) {
final LazyishSeq lazySeqLocal = lazySeqArg;
lazySeqArg = null;
return drop(n,lazySeqLocal).head();
}
does not help, as now, the lazySeqArg variable is null, but the lazySeqLocal holds a reference to the first element.
A local variable does not prevent garbage collection in general, the collection of otherwise unused objects is permitted, but that doesn’t imply that a particular implementation is capable of doing it.
In case of the HotSpot JVM, only optimized code will get rid of such unused references. But here, nth is not a hot spot, as the heavy things happen within drop method.
This is the reason why the same issue does not appear at the drop method, despite it also holds a reference to the first element in its parameter variable. The drop method contains the loop doing the actual work, hence, is very likely to get optimized by the JVM, which may cause it to eliminate unused variables, allowing the already processed part of the sequence to become collected.
There are many factors which may affect the JVM’s optimizations. Besides the different shape of the code, it seems that that rapid memory allocations during the unoptimized phase may also reduce the optimizer’s improvements. Indeed, when I run with -Xcompile, to forbid interpreted execution altogether, both variants run successfully, even int N = (int)1e9 is no problem anymore. Of course, forcing compilation raises the startup time.
I have to admit that I do not understand why the mixed mode performs that much worse and I’ll investigate further. But generally, you have to be aware that the efficiency of the garbage collector is implementation dependent, so objects collected in one environment may stay in memory in another.
Clojure implements a strategy for dealing with this sort of scenario which it calls "locals clearing". There's support for it in the compiler that makes it kick in automatically where required in pure Clojure code (unless disabled at compilation time – this is sometimes useful for debugging). Clojure does also clear locals in various places in its Java runtime, however, and the way it does that could be used in Java libraries and possibly even application code, though it would undoubtedly be somewhat cumbersome.
Before I get into what Clojure does, here's a short summary of what is going on in this example:
nth(int, LazyishSeq) is implemented in terms of drop(int, LazyishSeq) and LazyishSeq.head().
nth passes both its arguments to drop and has no further use for them.
drop can easily be implemented so as to avoid holding on to the head of the passed-in sequence.
Here nth still holds on to the head of its sequence argument. The runtime may potentially discard that reference, but it is not guaranteed that it will.
The way Clojure deals with this is by clearing the reference to the sequence explicitly before control is handed off to drop. This is done using a rather elegant trick (link to the below snippet on GitHub as of Clojure 1.9.0):
// clojure/src/jvm/clojure/lang/Util.java
/**
* Copyright (c) Rich Hickey. All rights reserved.
* The use and distribution terms for this software are covered by the
* Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
* which can be found in the file epl-v10.html at the root of this distribution.
* By using this software in any fashion, you are agreeing to be bound by
* the terms of this license.
* You must not remove this notice, or any other, from this software.
**/
// … beginning of the file omitted …
// the next line is the 190th in the file as of Clojure 1.9.0
static public Object ret1(Object ret, Object nil){
return ret;
}
static public ISeq ret1(ISeq ret, Object nil){
return ret;
}
// …
Given the above, the call to drop inside nth can be changed to
drop(n, ret1(lazySeq, lazySeq = null))
Here lazySeq = null is evaluated as an expression before control is transferred to ret1; the value is null and there is also the side effect of setting the lazySeq reference to null. The first argument to ret1 will have been evaluated by this point, however, so ret1 receives the reference to the sequence in its first argument and returns it as expected, and that value is then passed to drop.
Thus drop receives the original value held by the lazySeq local, but the local itself is cleared before control is transferred to drop.
Consequently nth no longer holds on to the head of the sequence.
I solved this problem using a graph, but unfortunately now I'm stuck with having to use a 2d array and I have questions about the best way to go about this:
public class Data {
int[][] structure;
public data(int x, int y){
structure = new int[x][y]
}
public <<TBD>> generateRandom() {
// This is what my question is about
}
}
I have a controller/event handler class:
public class Handler implements EventHandler {
#Override
public void onEvent(Event<T> e) {
this.dataInstance.generateRandom();
// ... other stuff
}
}
Here is what each method will do:
Data.generateRandom() will generate a random value at a random location in the 2d int array if there exists a value in the structure that in not initialized or a value exists that is equal to zero
If there is no available spot in the structure, the structure's state is final (i.e. in the literal sense, not the Java declaration)
This is what I'm wondering:
What is the most efficient way to check if the board is full? Using a graph, I was able to check if the board was full on O(1) and get an available yet also random location on worst-case O(n^2 - 1), best case O(1). Obviously now with an array improving n^2 is tough, so I'm just now focusing on execution speed and LOC. Would the fastest way to do it now to check the entire 2d array using streams like:
Arrays.stream(board).flatMapToInt(tile -> tile.getX()).map(x -> x > 0).count() > board.getWidth() * board.getHeight()
(1) You can definitely use a parallel stream to safely perform read only operations on the array. You can also do an anyMatch call since you are only caring (for the isFull check) if there exists any one space that hasn't been initialized. That could look like this:
Arrays.stream(structure)
.parallel()
.anyMatch(i -> i == 0)
However, that is still an n^2 solution. What you could do, though, is keep a counter of the number of spaces possible that you decrement when you initialize a space for the first time. Then the isFull check would always be constant time (you're just comparing an int to 0).
public class Data {
private int numUninitialized;
private int[][] structure;
public Data(int x, int y) {
if (x <= 0 || y <= 0) {
throw new IllegalArgumentException("You can't create a Data object with an argument that isn't a positive integer.");
}
structure = new int[x][y];
int numUninitialized = x * y;
}
public void generateRandom() {
if (isFull()) {
// do whatever you want when the array is full
} else {
// Calculate the random space you want to set a value for
int x = ThreadLocalRandom.current().nextInt(structure.length);
int y = ThreadLocalRandom.current().nextInt(structure[0].length);
if (structure[x][y] == 0) {
// A new, uninitialized space
numUninitialized--;
}
// Populate the space with a random value
structure[x][y] = ThreadLocalRandom.current().nextInt(Integer.MIN_VALUE, Integer.MAX_VALUE);
}
}
public boolean isFull() {
return 0 == numUninitialized;
}
}
Now, this is with my understanding that each time you call generateRandom you take a random space (including ones already initialized). If you are supposed to ONLY choose a random uninitialized space each time it's called, then you'd do best to hold an auxiliary data structure of all the possible grid locations so that you can easily find the next random open space and to tell if the structure is full.
(2) What notification method is appropriate for letting other classes know the array is now immutable? It's kind of hard to say as it depends on the use case and the architecture of the rest of the system this is being used in. If this is an MVC application with a heavy use of notifications between the data model and a controller, then an observer/observable pattern makes a lot of sense. But if your application doesn't use that anywhere else, then perhaps just having the classes that care check the isFull method would make more sense.
(3) Java is efficient at creating and freeing short lived objects. However, since the arrays can be quite large I'd say that allocating a new array object (and copying the data) over each time you alter the array seems ... inefficient at best. Java has the ability to do some functional types of programming (especially with the inclusion of lambdas in Java 8) but only using immutable objects and a purely functional style is kind of like the round hole to Java's square peg.
As a Java dev, I'm currently looking at Go because I think it's an interesting language.
To start with it, I decided to take a simple Java project I wrote months ago, and re-write it in Go to compare performances and (mainly, actually) compare the code readability/complexity.
The Java code sample is the following:
public static void main(String[] args) {
long start = System.currentTimeMillis();
Stream<Container> s = Stream.from(new Iterator<Container>() {
int i = 0;
#Override
public boolean hasNext() {
return i < 10000000;
}
#Override
public Container next() {
return new Container(i++);
}
});
s = s.map((Container _source) -> new Container(_source.value * 2));
int j = 0;
while (s.hasNext()) {
s.next();
j++;
}
System.out.println(System.currentTimeMillis() - start);
System.out.println("j:" + j);
}
public static class Container {
int value;
public Container(int v) {
value = v;
}
}
Where the map function is:
return new Stream<R>() {
#Override
public boolean hasNext() {
return Stream.this.hasNext();
}
#Override
public R next() {
return _f.apply(Stream.this.next());
}
};
And the Stream class is just an extension to java.util.Iterator to add custom methods to it. Other methods than map differs from standard Java Stream API.
Anyway, to reproduce this, I wrote the following Go code:
package main
import (
"fmt"
)
type Iterator interface {
HasNext() bool
Next() interface{}
}
type Stream interface {
HasNext() bool
Next() interface{}
Map(transformer func(interface{}) interface{}) Stream
}
///////////////////////////////////////
type incremetingIterator struct {
i int
}
type SampleEntry struct {
value int
}
func (s *SampleEntry) Value() int {
return s.value
}
func (s *incremetingIterator) HasNext() bool {
return s.i < 10000000
}
func (s *incremetingIterator) Next() interface{} {
s.i = s.i + 1
return &SampleEntry{
value: s.i,
}
}
func CreateIterator() Iterator {
return &incremetingIterator{
i: 0,
}
}
///////////////////////////////////////
type stream struct {
source Iterator
}
func (s *stream) HasNext() bool {
return s.source.HasNext()
}
func (s *stream) Next() interface{} {
return s.source.Next()
}
func (s *stream) Map(tr func(interface{}) interface{}) Stream {
return &stream{
source: &mapIterator{
source: s,
transformer: tr,
},
}
}
func FromIterator(it Iterator) Stream {
return &stream{
source: it,
}
}
///////////////////////////////////////
type mapIterator struct {
source Iterator
transformer func(interface{}) interface{}
}
func (s *mapIterator) HasNext() bool {
return s.source.HasNext()
}
func (s *mapIterator) Next() interface{} {
return s.transformer(s.source.Next())
}
///////////////////////////////////////
func main() {
it := CreateIterator()
ss := FromIterator(it)
ss = ss.Map(func(in interface{}) interface{} {
return &SampleEntry{
value: 2 * in.(*SampleEntry).value,
}
})
fmt.Println("Start")
for ss.HasNext() {
ss.Next()
}
fmt.Println("Over")
}
Both producing the same result but when Java takes about 20ms, Go takes 1050ms (with 10M items, test ran several times).
I'm very new to Go (started couple of hours ago) so please be indulgent if I did something really bad :-)
Thank you!
The other answer changed the original task quite "dramatically", and reverted to a simple loop. I consider it to be different code, and as such, it cannot be used to compare execution times (that loop could be written in Java as well, which would give smaller execution time).
Now let's try to keep the "streaming manner" of the problem at hand.
Note beforehand:
One thing to note beforehand. In Java, the granularity of System.currentTimeMillis() could be around 10 ms (!!) which is in the same order of magnitude of the result! This means the error rate could be huge in Java's 20 ms! So instead you should use System.nanoTime() to measure code execution times! For details, see Measuring time differences using System.currentTimeMillis().
Also this is not the correct way to measure execution times, as running things for the first time might run several times slower. For details, see Order of the code and performance.
Genesis
Your original Go proposal runs on my computer roughly for 1.1 seconds, which is about the same as yours.
Removing interface{} item type
Go doesn't have generics, trying to mimic this behavior with interface{} is not the same and have serious performance impact if the value you want to work with is a primitive type (e.g. int) or some simple structs (like the Go equivalent of your Java Container type). See: The Laws of Reflection #The representation of an interface. Wrapping an int (or any other concrete type) in an interface requires creating a (type;value) pair holding the dynamic type and value to be wrapped (creation of this pair also involves copying the value being wrapped; see an analysis of this in the answer How can a slice contain itself?). Moreover when you want to access the value, you have to use a type assertion which is a runtime check, so the compiler can't be of any help optimizing that (and the check will add to the code execution time)!
So let's not use interface{} for our items, but instead use a concrete type for our case:
type Container struct {
value int
}
We will use this in the iterator's and stream's next method: Next() Container, and in the mapper function:
type Mapper func(Container) Container
Also we may utilize embedding, as the method set of Iterator is a subset of that of Stream.
Without further ado, here is the complete, runnable example:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type Iterator interface {
HasNext() bool
Next() Container
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type Stream interface {
Iterator
Map(Mapper) Stream
}
type iterStream struct {
Iterator
}
func NewStreamFromIter(it Iterator) Stream {
return iterStream{Iterator: it}
}
func (is iterStream) Map(f Mapper) Stream {
return mapperStream{Stream: is, f: f}
}
type mapperStream struct {
Stream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.Stream.Next())
}
func (ms mapperStream) Map(f Mapper) Stream {
return nil // Not implemented / needed
}
func main() {
s := NewStreamFromIter(&incIter{})
s = s.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 210 ms. Nice, we're already sped it up 5 times, yet we're far from Java's Stream performance.
"Removing" Iterator and Stream types
Since we can't use generics, the interface types Iterator and Stream doesn't really need to be interfaces, since we would need new types of them if we'd wanted to use them to define iterators and streams of another types.
So the next thing we do is we remove Stream and Iterator, and we use their concrete types, their implementations above. This will not hurt readability at all, in fact the solution is shorter:
package main
import (
"fmt"
"time"
)
type Container struct {
value int
}
type incIter struct {
i int
}
func (it *incIter) HasNext() bool {
return it.i < 10000000
}
func (it *incIter) Next() Container {
it.i++
return Container{value: it.i}
}
type Mapper func(Container) Container
type iterStream struct {
*incIter
}
func NewStreamFromIter(it *incIter) iterStream {
return iterStream{incIter: it}
}
func (is iterStream) Map(f Mapper) mapperStream {
return mapperStream{iterStream: is, f: f}
}
type mapperStream struct {
iterStream
f Mapper
}
func (ms mapperStream) Next() Container {
return ms.f(ms.iterStream.Next())
}
func main() {
s0 := NewStreamFromIter(&incIter{})
s := s0.Map(func(in Container) Container {
return Container{value: in.value * 2}
})
fmt.Println("Start")
start := time.Now()
j := 0
for s.HasNext() {
s.Next()
j++
}
fmt.Println(time.Since(start))
fmt.Println("j:", j)
}
Execution time: 50 ms, we've again sped it up 4 times compared to our previous solution! Now that's the same order of magnitude of the Java's solution, and we've lost nothing from the "streaming manner". Overall gain from the asker's proposal: 22 times faster.
Given the fact that in Java you used System.currentTimeMillis() to measure execution, this may even be the same as Java's performance. Asker confirmed: it's the same!
Regarding the same performance
Now we're talking about roughly the "same" code which does pretty simple, basic tasks, in different languages. If they're doing basic tasks, there is not much one language could do better than the other.
Also keep in mind that Java is a mature adult (over 21 years old), and had an enormous time to evolve and be optimized; actually Java's JIT (just-in-time compilation) is doing a pretty good job for long running processes, such as yours. Go is much younger, still just a kid (will be 5 years old 11 days from now), and probably will have better performance improvements in the foreseeable future than Java.
Further improvements
This "streamy" way may not be the "Go" way to approach the problem you're trying to solve. This is merely the "mirror" code of your Java's solution, using more idiomatic constructs of Go.
Instead you should take advantage of Go's excellent support for concurrency, namely goroutines (see go statement) which are much more efficient than Java's threads, and other language constructs such as channels (see answer What are golang channels used for?) and select statement.
Properly chunking / partitioning your originally big task to smaller ones, a goroutine worker pool might be quite powerful to process big amount of data. See
Is this an idiomatic worker thread pool in Go?
Also you claimed in your comment that "I don't have 10M items to process but more 10G which won't fit in memory". If this is the case, think about IO time and the delay of the external system you're fetching the data from to process. If that takes significant time, it might out-weight the processing time in the app, and app's execution time might not matter (at all).
Go is not about squeezing every nanosecond out of execution time, but rather providing you a simple, minimalist language and tools, by which you can easily (by writing simple code) take control of and utilize your available resources (e.g. goroutines and multi-core CPU).
(Try to compare the Go language spec and the Java language spec. Personally I've read Go's lang spec multiple times, but could never get to the end of Java's.)
This is I think an interesting question as it gets to the heart of differences between Java and Go and highlights the difficulties of porting code. Here is the same thing in go minus all the machinery (time ~50ms here):
values := make([]int64, 10000000)
start := time.Now()
fmt.Println("Start")
for i := int64(0); i < 10000000; i++ {
values[i] = 2 * i
}
fmt.Println("Over after:", time.Now().Sub(start))
More seriously here is the same thing with a map over a slice of entries which is a more idiomatic version of what you have above and could work with any sort of Entry struct. This actually works out at a faster time on my machine of 30ms than the for loop above (anyone care to explain why?), so probably similar to your Java version:
package main
import (
"fmt"
"time"
)
type Entry struct {
Value int64
}
type EntrySlice []*Entry
func New(l int64) EntrySlice {
entries := make(EntrySlice, l)
for i := int64(0); i < l; i++ {
entries[i] = &Entry{Value: i}
}
return entries
}
func (entries EntrySlice) Map(fn func(i int64) int64) {
for _, e := range entries {
e.Value = fn(e.Value)
}
}
func main() {
entries := New(10000000)
start := time.Now()
fmt.Println("Start")
entries.Map(func(v int64) int64 {
return 2 * v
})
fmt.Println("Over after:", time.Now().Sub(start))
}
Things that will make operations more expensive -
Passing around interface{}, don't do this
Building a separate iterator type - use range or for loops
Allocations - so building new types to store answers, transform in place
Re using interface{}, I would avoid this - this means you have to write a separate map (say) for each type, not a great hardship. Instead of building an iterator, a range is probably more appropriate. Re transforming in place, if you allocate new structs for each result it'll put pressure on the garbage collector, using a Map func like this is an order of magnitude slower:
entries.Map(func(e *Entry) *Entry {
return &Entry{Value: 2 * e.Value}
})
To stream split the data into chunks and do the same as above (keeping a memo of last object if you depend on previous calcs). If you have independent calculations (not as here) you could also fan out to a bunch of goroutines doing the work and get it done faster if there is a lot of it (this has overhead, in simple examples it won't be faster).
Finally, if you're interested in data processing with go, I'd recommend visiting this new site: http://gopherdata.io/
Just as a complement to the previous comments, I changed the code of both Java and Go implementations to run the test 100 times.
What's interesting here is that Go takes a constant time between 69 and 72ms.
Owever, Java takes 71ms the first time (71ms, 19ms, 12ms) and then between 5 and 7ms.
From my test and understanding, this comes from the fact that the JVM takes a bit of time to properly load the classes and do some optimization.
In the end I'm still having this 10 times performance difference but I'm not giving up and I'll try to have a better understanding of how Go works to try to have it more fast :)
Assume that we have a given interface:
public interface StateKeeper {
public abstract void negateWithoutCheck();
public abstract void negateWithCheck();
}
and following implementations:
class StateKeeperForPrimitives implements StateKeeper {
private boolean b = true;
public void negateWithCheck() {
if (b == true) {
this.b = false;
}
}
public void negateWithoutCheck() {
this.b = false;
}
}
class StateKeeperForObjects implements StateKeeper {
private Boolean b = true;
#Override
public void negateWithCheck() {
if (b == true) {
this.b = false;
}
}
#Override
public void negateWithoutCheck() {
this.b = false;
}
}
Moreover assume that methods negate*Check() can be called 1+ many times and it is hard to say what is the upper bound of the number of calls.
The question is which method in both implementations is 'better'
according to execution speed, garbage collection, memory allocation, etc. -
negateWithCheck or negateWithoutCheck?
Does the answer depend on which from the two proposed
implementations we use or it doesn't matter?
Does the answer depend on the estimated number of calls? For what count of number is better to use one or first method?
There might be a slight performance benefit in using the one with the check. I highly doubt that it matters in any real life application.
premature optimization is the root of all evil (Donald Knuth)
You could measure the difference between the two. Let me emphasize that these kind of things are notoriously difficult to measure reliably.
Here is a simple-minded way to do this. You can hope for performance benefits if the check recognizes that the value doesn't have to be changed, saving you an expensive write into the memory. So I have changed your code accordingly.
interface StateKeeper {
public abstract void negateWithoutCheck();
public abstract void negateWithCheck();
}
class StateKeeperForPrimitives implements StateKeeper {
private boolean b = true;
public void negateWithCheck() {
if (b == false) {
this.b = true;
}
}
public void negateWithoutCheck() {
this.b = true;
}
}
class StateKeeperForObjects implements StateKeeper {
private Boolean b = true;
public void negateWithCheck() {
if (b == false) {
this.b = true;
}
}
public void negateWithoutCheck() {
this.b = true;
}
}
public class Main {
public static void main(String args[]) {
StateKeeper[] array = new StateKeeper[10_000_000];
for (int i=0; i<array.length; ++i)
//array[i] = new StateKeeperForObjects();
array[i] = new StateKeeperForPrimitives();
long start = System.nanoTime();
for (StateKeeper e : array)
e.negateWithCheck();
//e.negateWithoutCheck();
long end = System.nanoTime();
System.err.println("Time in milliseconds: "+((end-start)/1000000));
}
}
I get the followings:
check no check
primitive 17ms 24ms
Object 21ms 24ms
I didn't find any performance penalty of the check the other way around when the check is always superfluous because the value always has to be changed.
Two things: (1) These timings are unreliable. (2) This benchmark is far from any real life application; I had to make an array of 10 million elements to actually see something.
I would simply pick the function with no check. I highly doubt that in any real application you would get any measurable performance benefit from the function that has the check but that check is error prone and is harder to read.
Short answer: the Without check will always be faster.
An assignment takes a lot less computation time than a comparison. Therefore: an IF statement is always slower than an assignment.
When comparing 2 variables, your CPU will fetch the first variable, fetch the second variable, compare those 2 and store the result into a temporary register. That's 2 fetches, 1 compare and a 1 store.
When you assign a value, your CPU will fetch the value on the right hand of the '=' and store it into the memory. That's 1 fetch and 1 store.
In general, if you need to set some state, just set the state. If, on the otherhand, you have to do something more - like log the change, inform about the change, etc. - then you should first inspect the old value.
But, in the case when methods like the ones you provided are called very intensely, there may be some performance difference in checking vs non-checking (whether the new value is different). Possible outcomes are:
1-a) check returns false
1-b) check returns true, value is assigned
2) value is assigned without check
As far as I know, writing is always slower than reading (all the way down to register level), so the fastest outcome is 1-a. If your case is that the most common thing that happens is that the value will not be changed ('more than 50%' logic is just not good enough, the exact percentage has to be figured out empirically) - then you should go with checking, as this eliminates redundant writing operation (value assignment). If, on the other hand, value is different more than often - assign it without checking.
You should test your concrete cases, do some profiling, and based on the result determine the best implementation. There is no general "best way" for this case (apart from "just set the state").
As for boolean vs Boolean here, I would say (off the top of my head) that there should be no performance difference.
Only today I've seen few answers and comments repeating that
Premature optimization is the root of all evil
Well obviously one if statement more is one thing more to do, but... it doesn't really matter.
And garbage collection and memory allocation... not an issue here.
I would generally consider the negateWithCheck to be slightly slower due there always being a comparison. Also notice in the StateKeeperOfObjects you are introducing some autoboxing. 'true' and 'false' are primitive boolean values.
Assuming you fix the StateKeeperOfObjects to use all objects, then potentially, but most likely not noticeable.
The speed will depend slightly on the number of calls, but in general the speed should be considered to be the same whether you call it once or many times (ignoring secondary effects such as caching, jit, etc).
It seems to me, a better question is whether or not the performance difference is noticeable. I work on a scientific project that involves millions of numerical computations done in parallel. We started off using Objects (e.g. Integer, Double) and had less than desirable performance, both in terms of memory and speed. When we switched all of our computations to primitives (e.g. int, double) and went over the code to make sure we were not introducing anything funky through autoboxing, we saw a huge performance increase (both memory and speed).
I am a huge fan of avoiding premature optimization, unless it is something that is "simple" to implement. Just be wary of the consequences. For example, do you have to represent null values in your data model? If so, how do you do that using a primitive? Doubles can be done easily with NaN, but what about Booleans?
negateWithoutCheck() is preferable because if we consider the number of calls then negateWithoutCheck() has only one call i.e. this.b = false; where as negateWithCheck() has one extra with previous one.