Assume I have a large array of relatively small objects, which I need to iterate over frequently.
I would like to optimize the iteration by improving cache performance, so I want to allocate the objects [and not just the references] contiguously in memory; that way I'd get fewer cache misses and the overall performance could be significantly better.
In C++ I could simply allocate an array of the objects and they would be laid out as I want, but in Java an array only holds references, and each object is allocated individually.
I am aware that if I allocate the objects "at once" [one after the other], the JVM is likely to place them as contiguously as it can, but that may not be enough if the memory is fragmented.
My questions:
Is there a way to tell the JVM to defragment the memory just before I start allocating my objects? Would that be enough to ensure [as far as possible] that the objects are allocated contiguously?
Is there a different solution to this issue?
New objects are created in the Eden space. The Eden space is never fragmented; it is always empty after a GC.
The problem you have is that after a GC, objects can be arranged randomly in memory, or even, surprisingly, in the reverse of the order in which they are referenced.
A workaround is to store the fields as a series of arrays. I call this a column-based table instead of a row-based table.
e.g. Instead of writing
class PointCount {
    double x, y;
    int count;
}
PointCount[] pc = ... // an array of lots of small objects
use column-based data types:
class PointCounts {
    double[] xs, ys;
    int[] counts;
}
or
class PointCounts {
    // Trove primitive collections avoid boxing
    TDoubleArrayList xs, ys;
    TIntArrayList counts;
}
The arrays themselves could be in up to three different places, but the data is otherwise always contiguous. This can even be marginally more efficient if you perform operations on a subset of the fields.
public int totalCount() {
    int sum = 0;
    // counts are contiguous with nothing between the values
    for (int i : counts) sum += i;
    return sum;
}
A solution I use to avoid GC overhead when holding large amounts of data is to access a direct or memory-mapped ByteBuffer through an interface.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MyCounters {
    public static void main(String... args) {
        Runtime rt = Runtime.getRuntime();
        long used1 = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();
        int length = 100 * 1000 * 1000;
        PointCount pc = new PointCountImpl(length);
        for (int i = 0; i < length; i++) {
            pc.index(i);
            pc.setX(i);
            pc.setY(-i);
            pc.setCount(1);
        }
        for (int i = 0; i < length; i++) {
            pc.index(i);
            if (pc.getX() != i) throw new AssertionError();
            if (pc.getY() != -i) throw new AssertionError();
            if (pc.getCount() != 1) throw new AssertionError();
        }
        long time = System.nanoTime() - start;
        long used2 = rt.totalMemory() - rt.freeMemory();
        System.out.printf("Creating an array of %,d used %,d bytes of heap and took %.1f seconds to set and get%n",
                length, (used2 - used1), time / 1e9);
    }
}
interface PointCount {
    // set the index of the element referred to
    void index(int index);
    double getX();
    void setX(double x);
    double getY();
    void setY(double y);
    int getCount();
    void setCount(int count);
    void incrementCount();
}
class PointCountImpl implements PointCount {
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = X_OFFSET + 8;
    static final int COUNT_OFFSET = Y_OFFSET + 8;
    static final int LENGTH = COUNT_OFFSET + 4;
    final ByteBuffer buffer;
    int start = 0;

    PointCountImpl(int count) {
        this(ByteBuffer.allocateDirect(count * LENGTH).order(ByteOrder.nativeOrder()));
    }

    PointCountImpl(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    @Override
    public void index(int index) {
        start = index * LENGTH;
    }

    @Override
    public double getX() {
        return buffer.getDouble(start + X_OFFSET);
    }

    @Override
    public void setX(double x) {
        buffer.putDouble(start + X_OFFSET, x);
    }

    @Override
    public double getY() {
        return buffer.getDouble(start + Y_OFFSET);
    }

    @Override
    public void setY(double y) {
        buffer.putDouble(start + Y_OFFSET, y);
    }

    @Override
    public int getCount() {
        return buffer.getInt(start + COUNT_OFFSET);
    }

    @Override
    public void setCount(int count) {
        buffer.putInt(start + COUNT_OFFSET, count);
    }

    @Override
    public void incrementCount() {
        setCount(getCount() + 1);
    }
}
Running with the -XX:-UseTLAB option (to get accurate memory allocation sizes) prints
Creating an array of 100,000,000 used 12,512 bytes of heap and took 1.8 seconds to set and get
As it's off-heap, it has next to no GC impact.
Sadly, there is no way of ensuring that objects are created at, or stay in, adjacent memory locations in Java.
However, objects created in sequence will most likely end up adjacent to each other (of course, this depends on the actual VM implementation). I'm pretty sure the writers of the VM are aware that locality is highly desirable and don't go out of their way to scatter objects randomly around.
The garbage collector will at some point probably move the objects; if your objects are short-lived, that should not be an issue. For long-lived objects it then depends on how the GC moves the surviving objects. Again, I think it's reasonable to assume that the people writing the GC have given the matter some thought and will perform copies in a way that does not hurt locality more than is unavoidable.
There are obviously no guarantees for any of the above assumptions, but since we can't do anything about it anyway, stop worrying :)
The only thing you can do at the java source level is to sometimes avoid composition of objects - instead you can "inline" the state you would normally put in a composite object:
class MyThing {
    int myVar;
    // ... more members
    // composite object
    Rectangle bounds;
}
instead:
class MyThing {
    int myVar;
    // ... more members
    // "inlined" rectangle
    int x, y, width, height;
}
Of course this makes the code less readable and potentially duplicates a lot of code.
Ordering class members by access pattern seems to have a slight effect (I noticed a slight alteration in a benchmarked piece of code after I had reordered some declarations), but I've never bothered to verify whether it's true. It would make sense, though, if the VM does no reordering of members; you can inspect the layout it actually chooses as sketched below.
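If you want to check, the OpenJDK JOL (Java Object Layout) tool prints the actual field layout; a minimal sketch, assuming the org.openjdk.jol:jol-core dependency is available on the classpath:

import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    static class MyThing {
        int myVar;
        int x, y, width, height;
    }

    public static void main(String[] args) {
        // prints header size, field offsets and any padding HotSpot inserts
        System.out.println(ClassLayout.parseClass(MyThing.class).toPrintable());
    }
}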
On the same topic, it would also be nice (from a performance point of view) to be able to reinterpret an existing primitive array as another type (e.g. cast int[] to float[]). And while we're at it, why not wish for union members as well? I sure do.
But we'd have to give up a lot of platform and architecture independence in exchange for these possibilities.
It doesn't work that way in Java. Iteration is not a matter of incrementing a pointer, and there is no performance impact based on where on the heap the objects are physically stored.
If you still want to approach this in a C/C++ way, think of a Java array as an array of pointers to structs. When you loop over the array, it doesn't matter where the actual structs are allocated; you are looping over an array of pointers.
I would abandon this line of reasoning. It's not how Java works, and it's micro-optimization anyway.
As far as I can see, java.lang.instrument.Instrumentation.getObjectSize returns the size without the elements (the size doesn't depend on the number of elements in the list; it is always 24 bytes). That looks strange. Is it correct that the list items are not measured?
The size of an object in Java is fixed, because Java, like C++, is a statically-typed language. The size basically corresponds to what is written in the class file; since the class file's field layout is fixed, so is the size of its instances.
An ArrayList for example, looks roughly like this:
public class ArrayList implements List
{
    Object[] elementData;
    private int size;
}
The size of any ArrayList instance would be the size of elementData (a reference to an Object[]), size (an int), and object overhead. So 24 sounds about right.
That is often referred to as a shallow size.
It sounds like what you want instead is the deep size, i.e. the size of the List + all of the objects it refers to + all the objects those objects refer to, etc.
That's not directly possible using the instrumentation API, but with some hacky reflection + instrumentation you might be able to get it. The basic idea is to reflect over the object and recursively call getObjectSize() on all referenced objects, skipping circular references.
Something along the lines:
import java.lang.instrument.Instrumentation;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;

public class DeepSizeCounter {
    // supplied by an agent's premain method (see the sketch below)
    static Instrumentation instrumentation;

    private HashSet<Object> counted = new HashSet<>();
    private long size = 0;

    public long getSizeDeep(Object x) throws Exception {
        counted.clear();
        size = 0;
        computeSizeDeep(x);
        return size;
    }

    private void computeSizeDeep(Object x) throws Exception {
        if (x != null && !counted.contains(x)) {
            counted.add(x);
            size += instrumentation.getObjectSize(x);
            if (x.getClass().isArray()) {
                if (!x.getClass().getComponentType().isPrimitive())
                    for (Object y : (Object[]) x)
                        computeSizeDeep(y);
            } else {
                // note: getDeclaredFields() does not include superclass fields
                for (Field f : x.getClass().getDeclaredFields()) {
                    if (!f.getType().isPrimitive() && (f.getModifiers() & Modifier.STATIC) == 0) {
                        f.setAccessible(true);
                        computeSizeDeep(f.get(x));
                    }
                }
            }
        }
    }
}
Use as:
long totalSize = new DeepSizeCounter().getSizeDeep(myList);
Beware that this approach is unreliable, because objects can have back-references to some global state that you can't distinguish automatically, so you can end up counting far too many objects. Also, if access control is enforced, the Field.setAccessible() hack won't work and you won't be able to count non-public members.
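For completeness: the Instrumentation instance can't be obtained from plain application code; the JVM hands it to a Java agent. A minimal sketch, assuming the agent jar's manifest contains a Premain-Class: SizeAgent entry and that both classes live in the same package:

import java.lang.instrument.Instrumentation;

public class SizeAgent {
    // invoked by the JVM before main() when run with -javaagent:sizeagent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        DeepSizeCounter.instrumentation = inst;
    }
}

You would then start your program with java -javaagent:sizeagent.jar YourMainClass.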
Hello,
My program spends a lot of time creating objects on the heap, so after a while I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
I can't put my full application in this discussion, so I created a prototype to explain what my program is doing.
The part of my program that deals with creating objects looks as follows:
**The calling program:**
import java.util.ArrayList;
import java.util.Random;

public class Example {
    public static void main(String[] args) {
        ArrayList<Edge> list = new ArrayList<Edge>();
        for (int i = 0; i < 5000; i++) {
            for (int j = 0; j < 5000; j++) {
                if (i == j)
                    continue;
                int weight = new Random().nextInt();
                Edge edge = new Edge(new Vertex(i + ""), new Vertex(j + ""), weight);
                list.add(edge);
            }
        }
    }
}
The class Vertex:
public class Vertex {
    private String sequence;

    public Vertex() {
    }

    public Vertex(String seq) {
        this.sequence = seq;
    }
}
The class Edge:
public class Edge {
    private Vertex source;
    private Vertex destination;
    private int weight;

    public Edge() {
    }

    public Edge(Vertex source, Vertex destination, int weight) {
        int[][] array = new int[500][500];
        // here I need the array to do some calculation
        anotherClass.someCalculation(array, source, destination);
        this.source = source;
        this.destination = destination;
        this.weight = weight;
    }
}
So as you can see:
I have 5000 vertices and need to create 5000*5000 edges, and each edge constructor allocates a 500*500 array.
For this reason the heap fills up at some point. The problem, as I understood from many discussions I read, is that there is no guarantee the garbage collector will free memory.
So what are the solutions to this problem? Normally I don't need the edge's array after constructing the Edge; the array is only needed during construction of the edges.
Another question: how can I minimize the memory utilisation of my program? I tried turning the int array into a char array, but it didn't help.
Many thanks
Your program actually creates 2 * 5000 * 5000 vertices, i.e. a fresh pair for each iteration of the inner (j) loop. You need to first create the 5000 vertices and keep references to them in an array, then reuse them when creating your edges.
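A minimal sketch of that change, reusing the list, Random and loop bounds from the question:

// create each Vertex exactly once, then reuse it for every edge
Vertex[] vertices = new Vertex[5000];
for (int i = 0; i < 5000; i++) {
    vertices[i] = new Vertex(i + "");
}
Random r = new Random(); // also reuse one Random instead of one per edge
for (int i = 0; i < 5000; i++) {
    for (int j = 0; j < 5000; j++) {
        if (i == j)
            continue;
        list.add(new Edge(vertices[i], vertices[j], r.nextInt()));
    }
}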
I assume that the "array" variable is only needed for the calculation. If that is true, then use a local variable instead of an instance variable. That way you don't keep an instance of that array alive for every Edge, just one at a time while you complete your calculation, reducing the memory usage.
Redefinition of the Edge class:
public class Edge {
    private Vertex source;
    private Vertex destination;
    private int weight;
    // private int[][] array; // don't keep an instance copy of this array
    //                        // if it is not needed beyond the calculation

    public Edge() {
    }

    public Edge(Vertex source, Vertex destination, int weight) {
        int[][] array = new int[500][500]; // array will be GC-eligible after
                                           // the constructor completes
        // here I need the array to do some calculation
        anotherClass.someCalculation(array, source, destination);
        this.source = source;
        this.destination = destination;
        this.weight = weight;
    }
}
The problem with your code is due to the following.
Every time you create an object of the Edge class
Edge edge = new Edge(new Vertex(i + ""), new Vertex(j + ""), weight);
the following constructor is invoked:
public Edge(Vertex source, Vertex destination, int weight)
Therefore, for each Edge object created, the following 500 x 500 integer array is allocated:
int[][] array = new int[500][500];
which consumes a lot of memory, because you are creating around 5000 x 5000 Edge objects.
So, if you want to store edge weights, try making one global integer array for them.
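A minimal sketch of that suggestion: one shared weight matrix indexed by vertex number (note that even this single array is 5000 * 5000 * 4 bytes, roughly 100 MB):

// one int weight per (i, j) pair instead of per-Edge scratch arrays
int[][] weights = new int[5000][5000];
Random r = new Random();
for (int i = 0; i < 5000; i++)
    for (int j = 0; j < 5000; j++)
        if (i != j)
            weights[i][j] = r.nextInt();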
It's important to know that each int element is 4 bytes, so a two-dimensional array defined as [500][500] is about 1 MB before any useful data is even stored in it. You're also creating 5000 x 5000 of them; just the arrays, with no other data, would be over 25 TB.
A "GC overhead limit exceeded" error occurs when the GC is spending most of its time garbage collecting with few results. This is to be expected here, since all of the arrays being created are reachable from the main thread.
You should create and expand arrays as you need them (think of an ArrayList), or only create the arrays when you need them.
If you ABSOLUTELY need all the edges in memory, you will either need to increase the heap or learn to serialize the results and load only what you need for calculations.
I made a Java object that has a lot of Boolean fields. I was considering using BitSet when I started questioning its usefulness.
Of course, one would use it for memory reasons, since a boolean is 8 bits alone, 4 in an array. With BitSet, each value is stored as a single bit. However, wouldn't the memory saved be blown out of the water by the following overhead?
BitSet class and method definition metadata (per runtime)
The objects needed as keys to semantically retrieve the values (per class using BitSet)
The metadata for the bits array in BitSet (per instance)
versus using booleans:
boolean value (per instance)
Let's take a look at the following class:
private boolean isVisible; // 8 bits per boolean * 82 booleans = ~0.6Kb
// 81 lines later...
private boolean isTasty;
// ...
public boolean isVisible() { return isVisible; }
// ...
public boolean isTasty() { return isTasty; }
public void setVisible(boolean newVisibility) { isVisible = newVisibility; }
// ...
public void setTasty(boolean newTastiness) { isTasty = newTastiness; }
Now, if I were to combine all my booleans into one BitSet and still keep my code semantic, I might do this:
private static final int _K_IS_VISIBLE = 1; // 32 bits per key * 82 keys = ~2.5Kb
// ...
private static final int _K_IS_TASTY = 82;
private BitSet bools = new BitSet(82); // backed by 2 longs = 128 bits
// ...
public boolean isVisible() { return bools.get(_K_IS_VISIBLE); }
// ...
public boolean isTasty() { return bools.get(_K_IS_TASTY); }
public void setVisible(boolean newVisibility) { bools.set(_K_IS_VISIBLE, newVisibility); }
// ...
public void setTasty(boolean newTastiness) { bools.set(_K_IS_TASTY, newTastiness); }
tl;dr
costOfUsingBitSet =
bitSetMethodsAndClassMetaData + // BitSet class overhead
(numberOfKeysToRetrieveBits * Integer.SIZE) + // Semantics overhead
(numberOfBitSetsUsed * floor((bitsPerBitSet / Long.SIZE) + 1)); // BitSet internal array overhead
and possibly more. Whereas using booleans would be:
costOfBooleans =
(numberOfBooleansOutsideArrays * 8) +
(numberOfBooleansInsideArrays * 4);
I feel like the overhead of BitSet is much higher. Am I right?
BitSet will use less memory; storing one value per bit is far more efficient. The method overhead you are looking at is paid once, no matter how many instances of your class you have, so its cost is amortized to basically zero.
The advantage of a plain boolean field over an array of booleans or a BitSet is that it is not an Object, so you have one less level of indirection.
Cache behaviour is a primary driver of performance, so you have to weigh that extra indirection against the higher likelihood of evicting data from the cache that comes with higher memory consumption.
Roughly speaking, a few booleans will be faster but use more memory; as you add more fields or approach huge numbers of flags, the scales will tip towards BitSet.
There's a nice space comparison between boolean[] and BitSet here:
https://www.baeldung.com/java-boolean-array-bitset-performance
(I think they swapped the labels in one chart; it should be more bits per memory (blue) for the BitSet.)
The key takeaway is that the BitSet beats the boolean[] in terms of memory footprint, except for a minimal number of bits.
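As a back-of-the-envelope illustration (assuming HotSpot's typical 1 byte per boolean[] element and roughly 16 bytes per object header):

// boolean[10000]    : 16 B header + 10,000 * 1 B            ~ 10 KB
// new BitSet(10000) : BitSet header + long[157] header
//                     + 157 * 8 B words                     ~ 1.3 KB
// For only a handful of bits, the BitSet's fixed overhead (its own
// header plus the long[] header) dominates, which is why boolean[]
// wins only for very small bit counts.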
An alternative in your example is to use two longs as bit flags:
class A {
    // 1st set
    private static final long IS_VISIBLE_MASK = 1L;
    ...
    private static final long IS_DARK_MASK = 1L << 63;
    // 2nd set...
    private static final long IS_TASTY_MASK = 1L;
    ...
    // IS_VISIBLE_MASK .. IS_DARK_MASK
    long data1;
    // IS_TASTY_MASK ...
    long data2;

    boolean isDark = (data1 & IS_DARK_MASK) != 0;
}
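Setting, clearing and testing a flag then use the usual mask operations; a short usage sketch:

data1 |= IS_DARK_MASK;                       // set the flag
data1 &= ~IS_DARK_MASK;                      // clear the flag
boolean dark = (data1 & IS_DARK_MASK) != 0;  // test the flag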
Limitations
BitSet comes with an annoying limitation: it can address at most Integer.MAX_VALUE bits. I needed as many bits as I could store in RAM, so I modified the original implementation in two ways:
It wastes less computation for fixed-size LongBitSets (i.e. the user specifies the length at construction time).
It can reach the last bit in the biggest possible word array.
I added details on the limitations in this thread.
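A minimal sketch of the core idea: a hypothetical fixed-size LongBitSet addressed by long indices (a real class needs bounds checks and the rest of the BitSet API):

class LongBitSet {
    private final long[] words;

    LongBitSet(long nBits) {
        // one 64-bit word per 64 bits, rounded up
        words = new long[(int) ((nBits + 63) >>> 6)];
    }

    void set(long index) {
        // Java masks the shift count to the low 6 bits, i.e. index % 64
        words[(int) (index >>> 6)] |= 1L << index;
    }

    boolean get(long index) {
        return (words[(int) (index >>> 6)] & (1L << index)) != 0;
    }
}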
more updates
As is explained in the selected answer, the problem is in the JVM's garbage collection algorithm.
The JVM uses a card marking algorithm to keep track of modified references in object fields. For each reference assignment to a field, it marks the associated card dirty; this causes false sharing and hence blocks scaling. The details are well described in this article: https://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
The option -XX:+UseCondCardMark (in Java 1.7u40 and up) mitigates the problem and makes it scale almost perfectly.
updates
I found out (hinted by Park Eung-ju) that assigning an object into a field variable makes the difference. If I remove the assignment, it scales perfectly.
I think it probably has something to do with the Java memory model, such as: an object reference must point to a valid address before it becomes visible. But I am not completely sure. Both a double and an object reference are (likely) 8 bytes on a 64-bit machine, so it seems to me that assigning a double value and an object reference should be the same in terms of synchronization.
Does anyone have a reasonable explanation?
Here I have a weird Java multi-threading scalability problem.
My code simply iterates over an array (using the visitor pattern) to compute simple floating-point operations and assigns the results to another array. There is no data dependency and no synchronization, so it should scale linearly (2x faster with 2 threads, 4x faster with 4 threads).
When a primitive (double) array is used, it scales very well. When an object-type (e.g. String) array is used, it doesn't scale at all (even though the value of the String array is not used at all...).
Here's the entire source code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

class Table1 {
    public static final int SIZE1 = 200000000;
    public static final boolean OBJ_PARAM;
    static {
        String type = System.getProperty("arg.type");
        if ("double".equalsIgnoreCase(type)) {
            System.out.println("Using primitive (double) type arg");
            OBJ_PARAM = false;
        } else {
            System.out.println("Using object type arg");
            OBJ_PARAM = true;
        }
    }
    byte[] filled;
    int[] ivals;
    String[] strs;

    Table1(int size) {
        filled = new byte[size];
        ivals = new int[size];
        strs = new String[size];
        Arrays.fill(filled, (byte) 1);
        Arrays.fill(ivals, 42);
        Arrays.fill(strs, "Strs");
    }

    public boolean iterate_range(int from, int to, MyVisitor v) {
        for (int i = from; i < to; i++) {
            if (filled[i] == 1) {
                // XXX: Here we are passing a double or String argument
                if (OBJ_PARAM) v.visit_obj(i, strs[i]);
                else v.visit(i, ivals[i]);
            }
        }
        return true;
    }
}
class HeadTable {
    byte[] filled;
    double[] dvals;
    boolean isEmpty;

    HeadTable(int size) {
        filled = new byte[size];
        dvals = new double[size];
        Arrays.fill(filled, (byte) 0);
        isEmpty = true;
    }

    public boolean contains(int i, double d) {
        if (filled[i] == 0) return false;
        if (dvals[i] == d) return true;
        return false;
    }

    public boolean contains(int i) {
        if (filled[i] == 0) return false;
        return true;
    }

    public double groupby(int i) {
        assert filled[i] == 1;
        return dvals[i];
    }

    public boolean insert(int i, double d) {
        if (filled[i] == 1 && contains(i, d)) return false;
        if (isEmpty) isEmpty = false;
        filled[i] = 1;
        dvals[i] = d;
        return true;
    }

    public boolean update(int i, double d) {
        assert filled[i] == 1;
        dvals[i] = d;
        return true;
    }
}
class MyVisitor {
    public static final int NUM = 128;
    int[] range = new int[2];
    Table1 table1;
    HeadTable head;
    double diff = 0;
    int i;
    int iv;
    String sv;

    MyVisitor(Table1 _table1, HeadTable _head, int id) {
        table1 = _table1;
        head = _head;
        int elems = Table1.SIZE1 / NUM;
        range[0] = elems * id;
        range[1] = elems * (id + 1);
    }

    public void run() {
        table1.iterate_range(range[0], range[1], this);
    }

    // YYY 1: with a double argument, this function is called
    public boolean visit(int _i, int _v) {
        i = _i;
        iv = _v;
        insertDiff();
        return true;
    }

    // YYY 2: with a String argument, this function is called
    public boolean visit_obj(int _i, Object _v) {
        i = _i;
        iv = 42;
        sv = (String) _v;
        insertDiff();
        return true;
    }

    public boolean insertDiff() {
        if (!head.contains(i)) {
            head.insert(i, diff);
            return true;
        }
        double old = head.groupby(i);
        double newval = Math.min(old, diff);
        head.update(i, newval);
        head.insert(i, diff);
        return true;
    }
}
public class ParTest1 {
    public static int THREAD_NUM = 4;

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            THREAD_NUM = Integer.parseInt(args[0]);
            System.out.println("Setting THREAD_NUM:" + THREAD_NUM);
        }
        Table1 table1 = new Table1(Table1.SIZE1);
        HeadTable head = new HeadTable(Table1.SIZE1);
        MyVisitor[] visitors = new MyVisitor[MyVisitor.NUM];
        for (int i = 0; i < visitors.length; i++) {
            visitors[i] = new MyVisitor(table1, head, i);
        }
        int taskPerThread = visitors.length / THREAD_NUM;
        MyThread[] threads = new MyThread[THREAD_NUM];
        CyclicBarrier barrier = new CyclicBarrier(THREAD_NUM + 1);
        for (int i = 0; i < THREAD_NUM; i++) {
            threads[i] = new MyThread(barrier);
            for (int j = taskPerThread * i; j < taskPerThread * (i + 1); j++) {
                if (j >= visitors.length) break;
                threads[i].addVisitors(visitors[j]);
            }
        }
        Runtime r = Runtime.getRuntime();
        System.out.println("Force running gc");
        r.gc(); // running GC here (excluding the GC effect)
        System.out.println("Running gc done");
        // not measuring the 1st run (excluding the JIT compilation effect)
        for (int i = 0; i < THREAD_NUM; i++) {
            threads[i].start();
        }
        barrier.await();
        for (int i = 0; i < 10; i++) {
            MyThread.start = true;
            long s = System.currentTimeMillis();
            barrier.await();
            long e = System.currentTimeMillis();
            System.out.println("Iter " + i + " Exec time:" + (e - s) / 1000.0 + "s");
        }
    }
}
class MyThread extends Thread {
    static volatile boolean start = true;
    static int tid = 0;
    int id = 0;
    ArrayList<MyVisitor> tasks;
    CyclicBarrier barrier;

    public MyThread(CyclicBarrier _barrier) {
        super("MyThread" + (tid++));
        barrier = _barrier;
        id = tid;
        tasks = new ArrayList<MyVisitor>(256);
    }

    void addVisitors(MyVisitor v) {
        tasks.add(v);
    }

    public void run() {
        while (true) {
            while (!start) { ; }
            for (int i = 0; i < tasks.size(); i++) {
                MyVisitor v = tasks.get(i);
                v.run();
            }
            start = false;
            try { barrier.await(); }
            catch (InterruptedException e) { break; }
            catch (Exception e) { throw new RuntimeException(e); }
        }
    }
}
The Java code compiles with no dependencies, and you can run it with the following command:
java -Darg.type=double -server ParTest1 2
You pass the number of worker threads as an argument (the above uses 2 threads).
After setting up the arrays (which is excluded from the measured time), it performs the same operation 10 times, printing the execution time of each iteration.
With the above option it uses the double array, and it scales very well with 1, 2, 4 threads (i.e. the execution time reduces to 1/2 and 1/4), but with
java -Darg.type=Object -server ParTest1 2
it uses the Object (String) array, and it doesn't scale at all!
I measured the GC time, but it was insignificant (and I also forced a GC run before measuring times). I have tested with Java 6 (update 43) and Java 7 (update 51), and it's the same.
The code has comments marked XXX and YYY describing the difference when the arg.type=double or arg.type=Object option is used.
Can you figure out what is going on with the String-type argument passing here?
The HotSpot VM generates the following assembly for a reference-type putfield bytecode:
mov ref, OFFSET_OF_THE_FIELD(this)  <- this stores the new value for the field
mov this, REGISTER_A
shr 0x9, REGISTER_A
movabs OFFSET_X, REGISTER_B
mov %r12b, (REGISTER_A, REGISTER_B, 1)
The putfield operation itself completes in one instruction,
but more instructions follow.
They are "card marking" instructions (http://www.ibm.com/developerworks/library/j-jtp11253/).
Writing a reference field on any object within the same card (512 bytes) stores a value to the same card-table memory address.
And I guess stores to the same memory address from multiple cores mess up the caches and pipelines.
Just add
byte[] garbage = new byte[600];
to the MyVisitor definition.
Then the MyVisitor instances will be spaced far enough apart not to share a card-marking entry, and you will see the program scale.
This is not a complete answer but may provide a hint for you.
I have changed your code
Table1(int size) {
    filled = new byte[size];
    ivals = new int[size];
    strs = new String[size];
    Arrays.fill(filled, (byte) 1);
    Arrays.fill(ivals, 42);
    Arrays.fill(strs, "Strs");
}
to
Table1(int size) {
    filled = new byte[size];
    ivals = new int[size];
    strs = new String[size];
    Arrays.fill(filled, (byte) 1);
    Arrays.fill(ivals, 42);
    Arrays.fill(strs, new String("Strs"));
}
After this change, the running time with 4 threads and the object-type array was reduced.
According to http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7
For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write.
Writes and reads of volatile long and double values are always atomic.
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
Assigning a reference is always atomic,
while assigning a double is not atomic unless it is declared volatile.
The problem is that sv can be seen by other threads, and its assignment is atomic.
Therefore, wrapping the visitor's member variables (i, iv, sv) in ThreadLocal will solve the problem.
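A minimal sketch of that suggestion (a hypothetical rework of the visitor's sv field; the per-thread slot means the reference store lands in each thread's own ThreadLocalMap instead of a field shared on one card):

// inside MyVisitor, instead of "String sv;"
private final ThreadLocal<String> sv = new ThreadLocal<String>();

public boolean visit_obj(int _i, Object _v) {
    i = _i;
    iv = 42;
    sv.set((String) _v); // write goes to this thread's own map
    insertDiff();
    return true;
}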
"sv = (String)_v;" makes the difference. I also confirmed that the type casting is not the factor. Just accessing _v can't make the difference. Assigning some value to sv field makes the difference. But I can't explain why.
I'm comparing methods of constructing an array without repeated elements, and then measuring the time taken and the memory used.
For the hash method and the TreeSet method the memory usage prints without problems, but for the brute-force search it doesn't print any memory use. Is it possible that brute force doesn't use any "respectable" amount of memory because it just compares elements one by one? This is the code I have; is something wrong with it?
public static void main(String[] args)
{
    Random r = new Random();
    int warmup = 0;
    // nr, ne, max, maxRandom, valor and the tempo/memoria fields are defined elsewhere
    while (warmup < nr) {
        tempoInicial = System.nanoTime();
        memoriaInicial = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        while (ne < max)
        {
            valor = r.nextInt(maxRandom);
            acrescentar();
        }
        tempoFinal = System.nanoTime();
        memoriaFinal = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
        retirar();
        System.gc();
        warmup++;
    }
}
and
private static void acrescentar()
{
    if (usaTreeSet)
    {
        if (ts.contains(valor))
            return;
        ts.add(valor);
    }
    if (usaHashSet)
    {
        if (hs.contains(valor))
            return;
        hs.add(valor);
    }
    if (usaBruteForce)
    {
        for (int i = 0; i < ne; i++)
        {
            if (pilha[i] == valor)
                return;
        }
    }
    pilha[ne] = valor;
    ne++;
}
When testing small amounts of memory, try turning off the TLAB; then no object is too small. ;) -XX:-UseTLAB The TLAB hands out blocks of memory to each thread at a time, and allocations within those blocks do not show up in the free-memory count.
You might find this article on getting the size of an Object useful.