In a performance-critical part of my code, I need to clear an int array buffer by setting it back to all 0s.
Should I do buffer = new int[size] or Arrays.fill(buffer, 0)? The first seems to be faster in my tests, but maybe it will slow down eventually because of garbage collection. I don't have confidence in my own tests (because of things like compiler optimization), so I am asking here.
If it matters, buffer will have a size of about 300, and I need to clear it when it fills up, i.e. after every 300 iterations of my main loop.
I read "More efficient to create new array or reset array" but it doesn't specifically address larger arrays. Also, it is about Objects, not ints, which I think could matter.
Is it faster to create a new array or to set all elements of the current array to 0?
There is no simple answer. The JVM can allocate a default-initialized array faster than Arrays.fill(array, 0) can fill an array of the same size. But the flipside is that there are GC-related overheads that are difficult to quantify:
The GC costs are typically proportional to amount of reachable data. For non-reachable objects, the cost is essentially the cost of zeroing memory.
The GC costs / efficiency will depend on the heap size, and on how full it is.
The GC overheads also depend on the lifetime of the objects. For example a long-lived object will typically be tenured to the "old" generation and GC'd less often. But the flipside is that write barriers may make array writes slower.
Different GC's perform differently.
Different Java JIT compilers, etc perform differently.
And so on.
The bottom line is that it is not possible to give a clear answer without knowing ... more information than you can provide to create a valid model of the behavior.
Likewise, artificial benchmarks are liable to involve making explicit or implicit choices about various of the above (overt and hidden) variables. The result is liable to be that the benchmark results don't reflect real performance in your application.
So the best answer is to measure and compare the performance in the context of your actual application. In other words:
Get your application working
Write a benchmark for measuring your application's performance with realistic test data / inputs
Use the benchmark to compare the performance of the two alternatives in the context of your application.
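For the comparison itself, a JMH-style harness avoids many of the pitfalls of hand-rolled timing loops (dead-code elimination, warm-up effects). A minimal sketch, assuming the JMH library is on the classpath and using the 300-element buffer from the question:

    import java.util.Arrays;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class ClearBufferBenchmark {
        int[] buffer = new int[300];

        @Benchmark
        public int[] allocateNew() {
            buffer = new int[300];     // fresh, zero-initialized array
            return buffer;             // returned to defeat dead-code elimination
        }

        @Benchmark
        public int[] fillExisting() {
            Arrays.fill(buffer, 0);    // reuse the existing array
            return buffer;
        }
    }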
(Your question has the smell of premature optimization about it. You should be able to put off deciding which of these alternatives is better ... until you have the tools to make a well-founded decision.)
I am just wondering why the JDK HashMap rehashing process does not take the gradual approach that Redis uses. Although the rehash calculation of the JDK HashMap is quite elegant and effective, it will still take noticeable time when the original HashMap contains a large number of entries.
I am not an experienced Java user, so I always suppose that there must be a consideration by the Java designers that is beyond the limit of my cognitive capability.
A gradual rehash like Redis's effectively distributes the workload across each put, delete or get on the HashMap, which could significantly reduce the resize/rehashing time.
I have also compared the two hash methods, which to my mind don't prevent the JDK from doing a gradual rehash.
I hope someone can give a clue or some inspiration. Thanks a lot in advance.
If you think about the costs and benefits of incremental rehashing for something like HashMap, it turns out that the costs are not insignificant, and the benefits are not as great as you might like.
An incrementally rehashing HashMap:
Uses 50% more memory on average, because it needs to keep both the old table and new table around during the incremental rehash; and
Has a somewhat higher computational cost per operation. Also:
The rehashing is still not entirely incremental, because allocating the new hash table array has to be done all at once; so
There are no improvements in the asymptotic complexity of any operation. And finally:
Almost nothing that would really need incremental rehashing can be implemented in Java at all, due to the unpredictable GC pauses, so why bother?
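To make those trade-offs concrete, here is a toy sketch of the Redis-style scheme (an illustrative class, not how java.util.HashMap works): both tables coexist during a rehash (the memory overhead above), every operation first migrates one old bucket (the per-operation cost), and the new table must still be allocated in one step.

    import java.util.ArrayList;
    import java.util.List;

    class IncrementalHashMap<K, V> {
        private static final class Node<K, V> {
            final K key; V value;
            Node(K key, V value) { this.key = key; this.value = value; }
        }

        private List<Node<K, V>>[] old;              // non-null only mid-rehash
        private List<Node<K, V>>[] cur = alloc(16);
        private int nextBucket;                      // next old bucket to migrate
        private int size;

        @SuppressWarnings("unchecked")
        private static <K, V> List<Node<K, V>>[] alloc(int n) {
            return (List<Node<K, V>>[]) new List[n];
        }

        private static int index(Object key, int length) {
            return (key.hashCode() & 0x7fffffff) % length;
        }

        // Migrate a single old bucket; called at the start of every operation.
        private void step() {
            if (old == null) return;
            List<Node<K, V>> bucket = old[nextBucket];
            if (bucket != null) {
                for (Node<K, V> n : bucket) {
                    int i = index(n.key, cur.length);
                    if (cur[i] == null) cur[i] = new ArrayList<>();
                    cur[i].add(n);
                }
            }
            if (++nextBucket == old.length) old = null;   // rehash finished
        }

        public void put(K key, V value) {
            step();
            if (old == null && size >= cur.length * 3 / 4) {
                old = cur;                    // begin an incremental rehash; note
                cur = alloc(cur.length * 2);  // that this allocation still happens
                nextBucket = 0;               // all at once
            }
            Node<K, V> n = find(key);
            if (n != null) { n.value = value; return; }
            int i = index(key, cur.length);
            if (cur[i] == null) cur[i] = new ArrayList<>();
            cur[i].add(new Node<>(key, value));
            size++;
        }

        public V get(K key) {
            step();
            Node<K, V> n = find(key);
            return n == null ? null : n.value;
        }

        private Node<K, V> find(K key) {             // must consult both tables
            Node<K, V> n = findIn(cur, key);
            return (n == null && old != null) ? findIn(old, key) : n;
        }

        private Node<K, V> findIn(List<Node<K, V>>[] t, K key) {
            List<Node<K, V>> bucket = t[index(key, t.length)];
            if (bucket != null)
                for (Node<K, V> n : bucket)
                    if (n.key.equals(key)) return n;
            return null;
        }
    }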
I was reading the comments on this answer and I saw this quote.
Object instantiation and object-oriented features are blazing fast to use (faster than C++ in many cases) because they're designed in from the beginning. and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.
One user (with really high rep I might add) boldly defended this claim, stating that
heap allocation in java is better than C++'s
and added this statement defending the collections in java
And Java collections are fast compared to C++ collections due largely to the different memory subsystem.
So my question is: can any of this really be true, and if so, why is Java's heap allocation so much faster?
This sort of statement is ridiculous; people making it are either incredibly uninformed, or incredibly dishonest. In particular:
The speed of dynamic memory allocation in the two cases will depend on the pattern of dynamic memory use, as well as the implementation. It is trivial for someone familiar with the algorithms used in both cases to write a benchmark proving whichever one he wanted to be faster. (Thus, for example, programs using large, complex graphs that are built, then torn down and rebuilt, will typically run faster under garbage collection. As will programs that never use enough dynamic memory to trigger the collector. Programs using few, large, long-lived allocations will often run faster with manual memory management.)
When comparing the collections, you have to consider what is in the collections. If you're comparing large vectors of double, for example, the difference between Java and C++ will likely be slight, and could go either way. If you're comparing large vectors of Point, where Point is a value class containing two doubles, C++ will probably blow Java out of the water, because it uses pure value semantics (with no additional dynamic allocation), whereas Java needs to dynamically allocate each Point (and no dynamic allocation is always faster than even the fastest dynamic allocation). If the Point class in Java is correctly designed to act as a value (and thus immutable, like java.lang.String), then doing a translation on the Point in a vector will require a new allocation for every Point; in C++, you could just assign.
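To illustrate that last point with a sketch (Point and translateAll here are hypothetical names, not a standard API): in Java, every translated element is a fresh heap allocation, whereas the equivalent C++ std::vector<Point> could simply assign in place.

    import java.util.ArrayList;
    import java.util.List;

    final class Point {                            // immutable value-like class
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        Point translate(double dx, double dy) {
            return new Point(x + dx, y + dy);      // one heap allocation per call
        }
    }

    class TranslateDemo {
        static List<Point> translateAll(List<Point> points, double dx, double dy) {
            List<Point> out = new ArrayList<>(points.size());
            for (Point p : points)
                out.add(p.translate(dx, dy));      // millions of Points means
            return out;                            // millions of allocations
        }
    }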
Much depends on the optimizer. In Java, the optimizer works with perfect knowledge of the actual use cases in this particular run of the program, and perfect knowledge of the actual processor it is running on in this run. In C++, the optimizer must work with data from a profiling run, which will never correspond exactly to any one run of the program, and the optimizer must (usually) generate code that will run (and run quickly) on a wide variety of processor versions. On the other hand, the C++ optimizer may take significantly more time analysing the different paths (and effective optimization can require a lot of CPU); the Java optimizer has to be fairly quick.
Finally, although not relevant to all applications, C++ can be single threaded. In which case, no locking is needed in the allocator, which is never the case in Java.
With regard to the two claims quoted above: C++ can use more or less the same algorithms as Java in its heap allocator. I've used C++ programs where the ::operator delete() function was empty, and the memory was garbage collected. (If your application allocates lots of short-lived, small objects, such an allocator will probably speed things up.) And as for the second: the really big advantage C++ has is that its memory model doesn't require everything to be dynamically allocated. Even if allocation in Java takes only a tenth of the time it would take in C++ (which could be the case, if you only count the allocation, and not the time needed for the collector sweeps), with large vectors of Point, as above, you're comparing two or three allocations in C++ with millions of allocations in Java.
And finally: "why is Java's heap allocation so much faster?" It isn't, necessarily, if you amortise the time for the collection phases. The time for the allocation itself can be very cheap, because Java (or at least most Java implementations) uses a relocating collector, which results in all of the free memory being in a single contiguous block. This is at least partially offset by the time needed in the collector: to get that contiguity, you've got to move data, which means a lot of copying. In most implementations, it also means an additional indirection in the pointers, and a lot of special logic to avoid issues when one thread has the address in a register, or such.
Your questions don't have concrete answers. For example, C++ does not define memory management at all; it leaves allocation details up to the library implementation. Therefore, within the bounds of C++, a given platform may have a very slow heap allocation scheme, and Java would certainly be faster if it bypasses that. On another platform, memory allocations may be blazing fast, outperforming Java. As James Kanze pointed out, Java also places very few constraints on memory management (e.g. even the GC algorithm is entirely up to the JVM implementor). Because Java and C++ do not place constraints on memory management, there is no concrete answer to that question. C++ is purposefully open about underlying hardware and kernel functions, and Java is purposefully open about JVM memory management. So the question becomes very fuzzy.
You may find that some operations are faster in Java, and some not. You never know until you try, however:
In practice, the real differences lie in your higher level algorithms and implementations. For all but the most absolutely performance critical applications, the differences in performance of identical data structures in different languages is completely negligible compared to the performance characteristics of the algorithm itself. Concentrate on optimizing your higher level implementations. Only after you have done so, and after you have determined that your performance requirements are not being met, and after you have benchmarked and found (unlikely) that your bottleneck is in container implementations, should you start to think of things like this.
In general, as soon as you find yourself thinking or reading about C++ vs. Java issues, stop and refocus on something productive.
The Java heap is faster because (simplified) all you need to do to allocate is to bump the heap-top pointer (just like on the stack). This is possible because the heap is periodically compacted. So the price for this speed is:
Periodic GC pauses for heap compacting
Increased memory usage
There is no free lunch... So while collection operations may be fast, the cost is amortized as an overall slowdown while the GC works.
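To illustrate the bump-the-pointer idea, a toy sketch (not a real JVM allocator; names and sizes are illustrative): allocation is just an index increment into a pre-reserved region, which is exactly why a compacted heap can allocate as cheaply as a stack.

    class BumpAllocator {
        private final byte[] heap;     // the pre-reserved, compacted region
        private int top;               // the heap-top pointer

        BumpAllocator(int capacity) { heap = new byte[capacity]; }

        // Returns the offset of a fresh n-byte block, or -1 if a collection
        // (compaction) would be needed first.
        int allocate(int n) {
            if (top + n > heap.length) return -1;
            int block = top;
            top += n;                  // the entire cost of an allocation
            return block;
        }
    }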
While I am a fan of Java, it is worth noting that C++ supports allocation of objects on the stack, which is faster than heap allocation.
If you use C++ efficiently, with all its various ways of doing the same thing, it will be faster than Java (even if it takes you longer to find the optimal combination).
If you program in C++ as you would in Java, e.g. everything on the heap and all methods virtual, with lots of runtime checks that don't do anything and can be optimised away dynamically, it will be slower. Java has optimised these things further because (a) they are the only things Java does, (b) they can be optimised dynamically more efficiently, and (c) Java has fewer features and side effects, so it is easier for the optimiser to get decent speeds.
and Collections are fast. Standard Java beats standard C/C++ in this area, even for most optimized C code.
This may be true for particular collections, but most certainly isn't true for all collections in all usage patterns.
For instance, a java.util.HashMap will outperform a std::map, because the latter is required to be sorted. That is, the fastest Map in the Java standard library is faster than the fastest Map in the C++ one (at least prior to C++11, which added std::unordered_map).
On the other side, a std::vector<int> is far more efficient than a java.util.ArrayList<Integer> (due to type erasure, you can't use a java.util.ArrayList<int>, and therefore end up with about 4 times the memory consumption, and possibly poorer cache locality and correspondingly slower iteration).
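The boxing behind that difference is easy to see in plain Java (a small illustrative demo): each element added to the ArrayList becomes a reference to a separately heap-allocated Integer, while the int[] stores the values inline.

    import java.util.ArrayList;
    import java.util.List;

    class BoxingDemo {
        public static void main(String[] args) {
            int[] primitives = new int[1_000_000];            // one contiguous block
            List<Integer> boxed = new ArrayList<>(1_000_000);
            for (int i = 0; i < 1_000_000; i++) {
                primitives[i] = i;
                boxed.add(i);   // autoboxes to an Integer object (for values
            }                   // outside the small cached range)
        }
    }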
In short, like most sweeping generalizations, this one doesn't always apply. However, neither would the opposite assertion (that Java is always slower than C++). It really depends on the details, such as how you use the collection, or even which versions of the languages you compare.
I'm exploring options to help my memory-intensive application, and in doing so I came across Terracotta's BigMemory. From what I gather, they take advantage of non-garbage-collected, off-heap "native memory," and apparently this is about 10x slower than heap-storage due to serialization/deserialization issues. Prior to reading about BigMemory, I'd never heard of "native memory" outside of normal JNI. Although BigMemory is an interesting option that warrants further consideration, I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])? Or do the vagaries of garbage collection, etc. render this question unanswerable? I know "measure it" is a common answer around here, but I'm afraid I would not set up a representative test as I don't yet know enough about how native memory works in Java.
Direct memory is faster when performing IO because it avoids one copy of the data. However, for 95% of applications you won't notice the difference.
You can store data in direct memory, but it won't be faster than storing data in POJOs (or as safe, readable, or maintainable). If you are worried about GC, try creating your objects (they have to be mutable) in advance and reusing them without discarding them. If you don't discard your objects, there is nothing to collect.
Is Java native memory faster (I think this entails ByteBuffer objects?) than traditional heap memory when there are no serialization issues (for instance if I am comparing it with a huge byte[])?
Direct memory can be faster than using a byte[] if you access non-byte types like int, as it can read/write the whole four bytes without converting the data to individual bytes.
However, it is slower than using POJOs, as it has to bounds-check every access.
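The int access pattern being described uses the standard java.nio API, e.g.:

    import java.nio.ByteBuffer;

    class DirectIntAccess {
        public static void main(String[] args) {
            ByteBuffer direct = ByteBuffer.allocateDirect(1024);
            direct.putInt(0, 42);                  // writes all four bytes at once
            System.out.println(direct.getInt(0));  // reads them back (bounds-checked)
        }
    }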
Or do the vagaries of garbage collection, etc. render this question unanswerable?
The speed has nothing to do with the GC. The GC only matters when creating or discarding objects.
BTW: If you minimise the number of objects you discard and increase your Eden size, you can prevent even a minor collection from occurring for a long time, e.g. a whole day.
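For example, with standard HotSpot flags the young generation can be sized explicitly (the values and the MyApp class name here are purely illustrative):

    java -Xmn4g -Xms6g -Xmx6g MyApp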
The point of BigMemory is not that native memory is faster, but rather, it's to reduce the overhead of the garbage collector having to go through the effort of tracking down references to memory and cleaning it up. As your heap size increases, so do your GC intervals and CPU commitment. Depending upon the situation, this can create a sort of "glass ceiling" where the Java heap gets so big that the GC turns into a hog, taking up huge amounts of processor power each time it kicks in. Also, many GC algorithms require some level of locking that means nobody can do anything until that portion of the GC reference-tracking algorithm finishes, though many JVMs have gotten much better at handling this. Where I work, with our app server and JVMs, we found that the "glass ceiling" is about 1.5 GB. If we try to configure the heap larger than that, the GC routine starts eating up more than 50% of total CPU time, so it's a very real cost. We've determined this through various forms of GC analysis provided by our JVM vendor.
BigMemory, on the other hand, takes a more manual approach to memory management. It reduces the overhead and sort of takes us back to having to do our own memory cleanup, as we did in C, albeit in a much simpler approach akin to a HashMap. This essentially eliminates the need for a traditional garbage collection routine, and as a result, we eliminate that overhead. I believe that the Terracotta folks used native memory via a ByteBuffer as it's an easy way to get out from under the Java garbage collector.
The following whitepaper has some good info on how they architected BigMemory and some background on the overhead of the GC: http://www.terracotta.org/resources/whitepapers/bigmemory-whitepaper.
I'm intrigued by what could be accomplished with native memory if the serialization issue could be bypassed.
I think that your question is predicated on a false assumption. AFAIK, it is impossible to bypass the serialization issue that they are talking about here. The only thing you could do would be to simplify the objects that you put into BigMemory and use custom serialization / deserialization code to reduce the overheads.
While benchmarks might give you a rough idea of the overheads, the actual overheads will be very application specific. My advice would be:
Only go down this route if you know you need to. (You will be tying your application to a particular implementation technology.)
Be prepared for some intrusive changes to your application if the data involved isn't already managed as a cache.
Be prepared to spend some time in (re-)tuning your caching code to get good performance with BigMemory.
If your data structures are complicated, expect proportionately larger runtime overheads and tuning effort.
I'm writing some pretty CPU-intensive, concurrent numerical code that will process large amounts of data stored in Java arrays (e.g. lots of double[100000]s). Some of the algorithms might run millions of times over several days so getting maximum steady-state performance is a high priority.
In essence, each algorithm is a Java object that has a method API something like:
public double[] runMyAlgorithm(double[] inputData);
or alternatively the caller could pass in a reference to an array that receives the output data:
public void runMyAlgorithm(double[] inputData, double[] outputData);
Given this requirement, I'm trying to determine the optimal strategy for allocating / managing array space. Frequently the algorithms will need large amounts of temporary storage space. They will also take large arrays as input and create large arrays as output.
Among the options I am considering are:
Always allocate new arrays as local variables whenever they are needed (e.g. new double[100000]). Probably the simplest approach, but will produce a lot of garbage.
Pre-allocate temporary arrays and store them as final fields in the algorithm object - the big downside would be that only one thread could run the algorithm at any one time.
Keep pre-allocated temporary arrays in ThreadLocal storage, so that a thread can use a fixed amount of temporary array space whenever it needs it. ThreadLocal would be required since multiple threads will be running the same algorithm simultaneously (see the sketch after this list).
Pass around lots of arrays as parameters (including the temporary arrays for the algorithm to use). Not good, since it will make the algorithm API extremely ugly if the caller has to be responsible for providing temporary array space...
Allocate extremely large arrays (e.g. double[10000000]) but also provide the algorithm with offsets into the array so that different threads will use a different area of the array independently. Will obviously require some code to manage the offsets and allocation of the array ranges.
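To make option 3 concrete, here is a sketch of what the ThreadLocal variant might look like (the class name is illustrative, and the loop body is a placeholder computation):

    class MyAlgorithm {
        // One scratch array per thread, allocated once and then reused.
        private static final ThreadLocal<double[]> SCRATCH =
                ThreadLocal.withInitial(() -> new double[100000]);

        public void runMyAlgorithm(double[] inputData, double[] outputData) {
            double[] tmp = SCRATCH.get();      // no allocation after the first call
            for (int i = 0; i < inputData.length; i++) {
                tmp[i] = inputData[i] * 2.0;   // placeholder computation
            }
            System.arraycopy(tmp, 0, outputData, 0, inputData.length);
        }
    }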
Any thoughts on which approach would be best (and why)?
What I have noticed when working with memory in Java is the following. If your memory-use patterns are simple (mostly 2-3 types of memory allocations), you can usually do better than the default allocator. You can either preallocate a pool of buffers at application startup and use them as needed, or go the other route (allocate a huge array at the beginning and hand out pieces of it when needed). In effect you are writing your own memory allocator. But chances are you will do a worse job than Java's default allocator.
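For reference, the preallocated-pool idea might look like this minimal sketch (names are illustrative; error handling omitted):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class BufferPool {
        private final BlockingQueue<double[]> free;

        BufferPool(int buffers, int size) {
            free = new ArrayBlockingQueue<>(buffers);
            for (int i = 0; i < buffers; i++) free.add(new double[size]);
        }

        double[] acquire() throws InterruptedException {
            return free.take();       // blocks if all buffers are checked out
        }

        void release(double[] buffer) {
            free.add(buffer);         // caller must not keep a reference
        }
    }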
I would probably try to do the following: standardize the buffer sizes and allocate normally. That way, after a while, the only memory allocations/deallocations will be of fixed sizes, which will greatly help the garbage collector run fast. Another thing I would do is make sure, at algorithm design time, that the total memory needed at any one point will not exceed something like 80-85% of the machine's memory, in order not to trigger a full collection inadvertently.
Apart from those heuristics, I would test the hell out of any solution I picked and see how it works in practice.
Allocating big arrays is relatively cheap for the GC. You tend to use up your Eden space quickly, but the cost is largely per object. I suggest you write the code in the simplest manner possible and optimise it later, after profiling the application. A double[100000] is less than a MB, so you can fit over a thousand of them in a GB.
Memory is a lot cheaper than it used to be. An 8 GB server costs about £850. A 24 GB server costs about £1,800 (a 24 GB machine could allow you 24K x double[100000]). You may find that using a large heap size, or even a large Eden size, gives you the efficiency you want.