Excessive memory use by Java

In a project of mine I constantly compress little blocks of data.
Now I find that the JVM grows to 6GB of RAM (resident (RES) RAM, not shared or virtual) and then dies with an out-of-memory error.
It is as if the garbage collector never runs.
I've pulled out the relevant code and pasted it below. When I run it (Java 6, 32-bit Linux) it grows to 1GB of RAM.
Anyone got an idea how to reduce the memory usage?
import java.util.Random;
import java.util.zip.Deflater;

class test {
    int blockSize = 4096;
    Random r = new Random();

    public test() throws Exception {
        blockSize = 4096;
        byte[] data = new byte[blockSize];
        for (int index = 0; index < blockSize; index++)
            data[index] = (byte) r.nextInt();
        for (long cnt = 0; cnt < 1000000; cnt++) {
            byte[] result = compress(data);
            if (result != null)
                data[0] = result[0];
        }
    }

    byte[] compress(byte[] in) {
        assert in.length == blockSize;
        Deflater compresser = new Deflater();
        compresser.setInput(in);
        compresser.finish();
        byte[] out = new byte[in.length];
        int outLen = compresser.deflate(out);
        if (outLen < blockSize) {
            byte[] finalOut = new byte[outLen];
            System.arraycopy(out, 0, finalOut, 0, outLen);
            return finalOut;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        new test();
    }
}

Well, Folkert van Heusden solved his own problem, but to summarize:
Early in the compress(byte[] in) method, we create a java.util.zip.Deflater.
We use the Deflater to do some work, and then we leave the compress() method and lose our reference to the deflater variable. At this point, the Deflater is no longer in use and is waiting to be reclaimed by the garbage collector.
A Deflater allocates both Java heap memory and C/C++/native heap memory. The native heap memory allocated by a Deflater is held until the Deflater.finalize() method is called by the garbage collector. If the garbage collector doesn't run soon enough (there may still be plenty of free Java heap memory), we can run out of C/C++ heap memory. If that happens, we get "Out of memory" errors.
The Oracle bug report JDK-4797189 is probably related. It contains a code snippet that illustrates and reproduces the problem:
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class Bug {
    public static void main(String args[]) {
        while (true) {
            /* If EITHER of these two lines is uncommented, the JVM
               runs out of memory */
            final Deflater deflater = new Deflater(9, true);
            final Inflater inflater = new Inflater(true);
        }
    }
}
The solution is to free the resources when you are finished, by calling the Deflater.end() method (or Inflater.end()).
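Applied to the compress() method from the question, a minimal sketch of the fix (assuming nothing else about the surrounding class) looks like this:

byte[] compress(byte[] in) {
    Deflater compresser = new Deflater();
    try {
        compresser.setInput(in);
        compresser.finish();
        byte[] out = new byte[in.length];
        int outLen = compresser.deflate(out);
        if (outLen < in.length) {
            byte[] finalOut = new byte[outLen];
            System.arraycopy(out, 0, finalOut, 0, outLen);
            return finalOut;
        }
        return null;
    } finally {
        // Release the native zlib memory right away instead of waiting
        // for the garbage collector to call finalize().
        compresser.end();
    }
}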

Well, it seems to me that there is no memory leak in the code, so it actually seems the VM is not GC-ing the byte arrays.
"Anyone got an idea how to reduce the memory usage?"
Well, I would try with
byte firstByteOfDataWhichIsCompressedAndThenUncompressed(byte [] in) { ... }
which specifically returns the first byte of the uncompressed array, rather than the whole array. I know, it's a horrible method name, and I hope you will find a better one.
The following code
for (long cnt = 0; cnt < 1000000; cnt++) {
    byte[] result = compress(data);
    if (result != null)
        data[0] = result[0];
}
would become
for (long cnt = 0; cnt < 1000000; cnt++)
    data[0] = firstByteOfDataWhichIsCompressedAndThenUncompressed(data);
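A minimal sketch of such a method (hypothetical; it also folds in the end() fix from the answer above, and for incompressible data it simply returns the caller's own first byte to mirror the original null check):

byte firstByteOfDataWhichIsCompressedAndThenUncompressed(byte[] in) {
    Deflater compresser = new Deflater();
    try {
        compresser.setInput(in);
        compresser.finish();
        byte[] out = new byte[in.length];
        int outLen = compresser.deflate(out);
        // The temporary arrays never escape this method, so nothing
        // here keeps a large object reachable in the caller's loop.
        return (outLen < in.length) ? out[0] : in[0];
    } finally {
        compresser.end(); // release the native zlib memory
    }
}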

Related

Flush cache lines in case of a Single Shot benchmark

I'd like to run a SingleShot JMH benchmark with every level of the cache hierarchy that holds the memory I'm working on reliably flushed.
The benchmark looks roughly as follows:
@State(Scope.Benchmark)
public class MyBnchmrk {
    public byte[] buffer;

    @Setup(Level.Trial)
    public void generateSampleData() throws IOException {
        // writes to buffer ...
    }

    @Setup(Level.Invocation)
    public void flushCaches() {
        // Ideally I'd like to invoke here something like the
        // _mm_clflushopt() intrinsic from GCC/clang for each line of the buffer
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    public void benchmarkMemoryBoundCode() {
        // the benchmark
    }
}
Is there a Java way to flush the caches before a single-shot measurement, or is a hand-written clflush required?
If you want to measure the cost of cache misses, calling clflush directly is possible from Java, but you end up writing a JNI library with an assembly intrinsic. And you probably can't do it reliably anyway, since you need to provide a virtual address, and the GC may move your buffer at any time.
Instead I offer you this:
Use a single-shot benchmark as you do.
Measuring a single operation would not be a good idea (measuring nanoseconds has a high error). Instead, create a million identical buffers and perform the same operation on each of them; every invocation then touches the next buffer, which is not in the cache.
You can also run some calculation between iterations, for example reading 32+ MB of memory so it evicts cache lines from the cache. But with a million buffers, this doesn't show any benefit.
The resulting code:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 1)
public class BufferBenchmarkLatency {
    public static final int BATCH_SIZE = 1000000;
    public static final int MY_BUFFER_SIZE = 1024;
    public static final int CACHE_LINE_PADDING = 256;

    public static class StateHolder extends Padder {
        byte[] buffer;

        StateHolder() {
            buffer = new byte[CACHE_LINE_PADDING + MY_BUFFER_SIZE + CACHE_LINE_PADDING];
            Arrays.fill(buffer, (byte) ThreadLocalRandom.current().nextInt());
        }
    }

    private final StateHolder[] arr = new StateHolder[BATCH_SIZE];
    private int index;

    @Setup(Level.Trial)
    public void setUpTrial() {
        for (int i = 0; i < arr.length; i++) {
            arr[i] = new StateHolder();
        }
        ArrayUtil.shuffle(arr);
    }

    @Setup(Level.Iteration)
    public void prepareForIteration(Blackhole blackhole) {
        index = 0;
        blackhole.consume(CacheUtil.evictCacheLines());
        System.gc();
        System.gc();
    }

    @Benchmark
    public long read() {
        byte[] buffer = arr[index].buffer;
        return buffer[0];
    }

    @TearDown(Level.Invocation)
    public void move() {
        index++;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(BufferBenchmarkLatency.class.getSimpleName())
                .measurementBatchSize(BATCH_SIZE)
                .warmupBatchSize(BATCH_SIZE)
                .measurementIterations(10)
                .warmupIterations(10)
                .build();
        new Runner(opt).run();
    }
}
As you see, I pad the state holder itself, so the buffer references always land on different cache lines (the Padder class has 24 long fields). Oh, and I also pad the buffer itself; JMH won't do it for you.
I've implemented this idea, and I get an average result of around 100 ns for a simple operation like reading the first element of the buffer. To read the first element you need to read two cache lines (the buffer reference plus the first element). The full code is here.
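Note that Padder, ArrayUtil and CacheUtil are the author's own helpers and are not shown here. As a rough sketch of what evictCacheLines() might do, following the "read 32+ MB so it evicts cache lines" idea from above (the class name, buffer size and stride are illustrative assumptions, not the author's actual code):

final class CacheUtil {
    // Large enough to exceed a typical last-level cache (illustrative size).
    private static final byte[] EVICTION_BUFFER = new byte[64 * 1024 * 1024];

    // Walking the whole buffer pulls its lines into the cache and thereby
    // evicts previously cached data. The caller feeds the returned sum to a
    // Blackhole so the loop cannot be optimized away.
    static long evictCacheLines() {
        long sum = 0;
        for (int i = 0; i < EVICTION_BUFFER.length; i += 64) { // one read per 64-byte line
            sum += EVICTION_BUFFER[i];
        }
        return sum;
    }
}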

How to copy native memory to DirectByteBuffer

I know one way - using memcpy on C++ side:
C++ method:
void CopyData(void* buffer, int size)
{
    memcpy(buffer, source, size);
}
JNR mapping:
void CopyData(@Pinned @Out ByteBuffer byteBuffer, @Pinned @In int size);
Java invocation:
ByteBuffer buffer = ByteBuffer.allocateDirect(size);
adapter.CopyData(buffer, size);
But I would like to handle case when native code does not copy data, but only returns pointer to the memory which is to be copied:
C++ methods:
void* GetData1()
{
    return source;
}
// or
struct Data
{
    void* data;
};
void GetData2(Data* outData)
{
    outData->data = source;
}
I know how to write a JNR mapping to be able to copy data to a HeapByteBuffer:
Pointer GetData1();
// or
void GetData2(@Pinned @Out Data outData);

final class Data extends Struct {
    public final Struct.Pointer data;

    public Data(Runtime runtime) {
        super(runtime);
        data = new Struct.Pointer();
    }
}
Java invocation:
ByteBuffer buffer = ByteBuffer.allocate(size);
Pointer dataPtr = adapter.GetData1();
dataPtr.get(0, buffer.array(), 0, buffer.array().length);
// or
ByteBuffer buffer = ByteBuffer.allocate(size);
Data outData = new Data(runtime);
adapter.GetData2(outData);
Pointer dataPtr = outData.data.get();
dataPtr.get(0, buffer.array(), 0, buffer.array().length);
But I have not found a way to copy memory to a DirectByteBuffer instead of a HeapByteBuffer. The above snippet of code does not work for a DirectByteBuffer, because buffer.array() is null for such a buffer: it is backed by a native memory area.
Please help.
I have found several ways to copy JNR native memory to a DirectByteBuffer. They differ in efficiency. Currently I use the following approach; I don't know whether it is the best one or the one intended by the JNR authors:
ByteBuffer buffer = ByteBuffer.allocateDirect(size);
Pointer dataPtr = adapter.GetData1();
long destAddress = ((DirectBuffer)buffer).address();
Pointer destPtr = AsmRuntime.pointerValue(destAddress, runtime);
assert dataPtr.isDirect() && destPtr.isDirect();
dataPtr.transferTo(0, destPtr, 0, size);
or
ByteBuffer buffer = ByteBuffer.allocateDirect(size);
Data outData = new Data(runtime);
adapter.GetData2(outData);
Pointer dataPtr = outData.data.get();
long destAddress = ((DirectBuffer)buffer).address();
Pointer destPtr = AsmRuntime.pointerValue(destAddress, runtime);
assert dataPtr.isDirect() && destPtr.isDirect();
dataPtr.transferTo(0, destPtr, 0, size);
It is important that the assert clause above holds. It guarantees that the pointers are jnr.ffi.provider.jffi.DirectMemoryIO instances and that the efficient memcpy method is used for copying (check the implementation of DirectMemoryIO.transferTo()).
The alternative is to wrap DirectByteBuffer using the following method:
Pointer destPtr = Pointer.wrap(runtime, destAddress);
or
Pointer destPtr = Pointer.wrap(runtime, destAddress, size);
but not this:
Pointer destPtr = Pointer.wrap(runtime, buffer);
The first and second pointers are backed by DirectMemoryIO, but the third pointer is backed by ByteBufferMemoryIO and it involves slow byte-by-byte copying.
The one drawback is that a DirectMemoryIO instance is quite heavyweight. It allocates 32 bytes on the JVM heap, so with a large number of JNR invocations the DirectMemoryIO instances can consume a significant amount of memory.

Does initialization of any variable invoke GC in Java

I got a Java memory puzzle from the internet and am trying hard to understand it. What I understand so far is that Java releases the memory of an object whose life has ended when a new object is initialized, though I am unable to find any proof of this. If anyone has any insight, please guide me.
The puzzle is below:
private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6);

public void f()
{
    {
        byte[] data = new byte[dataSize];
    }
    //int i = 0; //If uncommented then the program works fine
    byte[] data2 = new byte[dataSize];
}
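For what it's worth, the usual explanation of this puzzle, sketched as comments on the same code (this is the commonly cited local-variable-slot argument, not an official statement from the puzzle's author):

public void f()
{
    {
        byte[] data = new byte[dataSize];
        // 'data' goes out of scope at the end of this block, but its
        // local-variable slot in the stack frame still holds the reference,
        // so the array may stay reachable as far as the GC is concerned.
    }
    //int i = 0; // Declaring a new local here reuses (and overwrites) the
    //           // dead slot, making the first array collectable.
    byte[] data2 = new byte[dataSize]; // otherwise two ~60%-of-heap arrays
                                       // must coexist -> OutOfMemoryError
}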

Is Java 7 intelligent enough to collect objects that are no longer used within a scope, although the scope has not completely ended?

Consider the following code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

import com.google.common.collect.Lists;
import com.google.common.collect.Range;
import com.google.common.collect.RangeSet;
import com.google.common.collect.TreeRangeSet;

public class BitSetTest
{
    public static void main(final String[] args) throws IOException
    {
        System.out.println("Start?");
        int ch = System.in.read();
        List<Integer> numbers = getSortedNumbers();
        System.out.println("Generated numbers");
        ch = System.in.read();
        RangeSet<Integer> rangeSet = TreeRangeSet.create();
        for (Integer number : numbers)
        {
            rangeSet.add(Range.closed(number, number));
        }
        System.out.println("Finished rangeset");
        ch = System.in.read();
        BitSet bitset = new BitSet();
        for (Integer number : numbers)
        {
            bitset.set(number.intValue());
        }
        System.out.println("Finished bitset");
        ch = System.in.read();
        //System.out.println(numbers.size());
        //System.out.println(rangeSet.isEmpty());
        //System.out.println(bitset.size());
    }

    private static List<Integer> getSortedNumbers()
    {
        int max = 200000000;
        int n = max / 10;
        List<Integer> numbers = Lists.newArrayListWithExpectedSize(max);
        File file = new File("numbers.txt");
        if (file.exists())
        {
            try (BufferedReader reader = new BufferedReader(new FileReader(file)))
            {
                String line;
                while ((line = reader.readLine()) != null)
                {
                    numbers.add(Integer.valueOf(line));
                }
            }
            catch (IOException e1)
            {
                throw new RuntimeException(e1);
            }
        }
        else
        {
            for (int i = 0; i < n; i++)
            {
                int number = (int) (Math.random() * max);
                numbers.add(number);
                if (i > 0 && i % 10000 == 0)
                {
                    System.out.println(i);
                }
            }
            Collections.sort(numbers);
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(file)))
            {
                writer.write(numbers.get(0) + "");
                for (int i = 1; i < n; i++)
                {
                    writer.write("\n");
                    writer.write(numbers.get(i) + "");
                }
            }
            catch (IOException e1)
            {
                throw new RuntimeException(e1);
            }
        }
        return numbers;
    }
}
At the first pause (System.in.read()), JConsole shows memory usage of 4MB.
At the second pause ("Generated numbers"), since a large list has been instantiated, memory usage jumps to 922MB.
At the next pause ("Finished rangeset"), after running GC the memory comes back down to 4MB, which means the list was collected even though its enclosing method has not ended.
When the commented-out printlns at the end are uncommented, the list does not get collected until they execute.
I just wanted to understand: is the JVM intelligent enough to determine the lifetime of an object based on the point after which it is no longer used?
Garbage collection is based on generations (there are some changes in Java 8). Until Java 8, memory was divided into three parts: the Young generation, the Old generation and PermGen. All newly created objects go into the Young generation, and if they are still reachable after some time they are migrated to the Old generation. PermGen was used mostly for the JVM's own data. Garbage collection of the Young generation is called minor garbage collection and happens relatively frequently.
Java's approach to garbage collection is "mark and sweep" (see the first link): it marks all objects that are not referenced by any live code as dead, and then cleans them up (the sweep).
In your particular case the following is happening:
Java loads your class into the Young generation;
Java kicks off the main class;
Your code allocates more memory in the Young generation;
While your code runs, this memory is referenced by live code and is therefore not cleaned;
Your program stops running;
Garbage collection kicks in and detects that all the data, and the class itself, is no longer referenced by any live code, so it marks them for deletion and eventually cleans them up.
Based on what you are saying, there is a good chance that your class and all your data never even make it to the Old generation.
To be clear: garbage collection happens in parallel with your code and can therefore detect that some data is no longer referenced. Assuming that an object only becomes unreferenced when its enclosing method ends is not always correct (as proven by your test).
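A small sketch that makes this visible (the explicit null stands in for what the JIT's liveness analysis can do automatically in compiled code; whether the collection actually happens on a given JVM and GC is not guaranteed, since System.gc() is only a hint):

import java.lang.ref.WeakReference;

public class ReachabilityDemo
{
    public static void main(String[] args)
    {
        byte[] data = new byte[100_000_000];
        WeakReference<byte[]> ref = new WeakReference<>(data);
        data = null;   // the array is unreachable from here on
        System.gc();   // request a collection (only a hint)
        // If the GC ran, the array has been collected even though the
        // method that allocated it is still executing.
        System.out.println("collected: " + (ref.get() == null));
    }
}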

Java searches an array by brute force and doesn't spend memory?

I'm comparing methods for constructing an array without repeated elements, measuring the time each one takes and the memory it uses.
For the hash method and the TreeSet method the memory measurement prints without problem, but for the brute-force search it doesn't print any memory use. Is it possible that the brute force doesn't use any "respectable" memory because it just compares elements one by one? This is the code I have. Is it possible that something is wrong?
public static void main(String[] args)
{
    Random r = new Random();
    int warmup = 0;
    while (warmup < nr) {
        tempoInicial = System.nanoTime();
        memoriaInicial = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        while (ne < max)
        {
            valor = r.nextInt(maxRandom);
            acrescentar();
        }
        tempoFinal = System.nanoTime();
        memoriaFinal = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        retirar();
        System.gc();
        warmup++;
    }
}
and
private static void acrescentar()
{
    if (usaTreeSet)
    {
        if (ts.contains(valor))
            return;
        ts.add(valor);
    }
    if (usaHashSet)
    {
        if (hs.contains(valor))
            return;
        hs.add(valor);
    }
    if (usaBruteForce)
    {
        for (int i = 0; i < ne; i++)
        {
            if (pilha[i] == valor)
                return;
        }
    }
    pilha[ne] = valor;
    ne++;
}
When testing small amounts of memory, try turning off the TLAB; then no object is too small. ;) Use -XX:-UseTLAB. The TLAB hands out blocks of memory to each thread in advance, and these blocks do not count towards the free memory.
You might find this article on Getting the size of an Object useful.
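As a rough sketch of the measurement itself (illustrative names; run it with and without -XX:-UseTLAB to see the difference):

public class MemoryDelta
{
    public static void main(String[] args)
    {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // encourage a clean baseline (only a hint)
        long before = rt.totalMemory() - rt.freeMemory();

        int[] pilha = new int[1_000_000]; // the structure being measured
        pilha[0] = 42;

        long after = rt.totalMemory() - rt.freeMemory();
        // With -XX:-UseTLAB every allocation is visible here; with the TLAB
        // on, small allocations can hide inside a thread's pre-allocated
        // block and the delta may print as zero.
        System.out.println("approx bytes used: " + (after - before));
    }
}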
