I have a binary tree in which every node represents an electronic logic gate (AND, OR, ...). My task is to compute the output value of the whole tree (the original post showed a picture of such a binary tree):
This is my code so far (without the threads implementation):
gate_node:
public class gate_node {
gate_node right_c, left_c;
Oprtator op;
int value;
int right_v, left_v;
public gate_node(gate_node right, gate_node left, Oprtator op) {
this.left_c = left;
this.right_c = right;
this.op = op;
right_v = left_v = 0;
}
void add_input(int right_v, int left_v){
this.right_v=right_v;
this.left_v=left_v;
}
int compute(int array_index, int arr_size) {
/*
* The following use of a static sInputCounter assumes that the
* static/global input array is ordered from left to right, irrespective
* of "depth".
*/
final int left, right;
System.out.print(this.op+"(");
if (null != this.left_c) {
left = this.left_c.compute(array_index,arr_size/2);
System.out.print(",");
} else {
left = main_class.arr[array_index];
System.out.print(left + ",");
}
if (null != this.right_c) {
right = this.right_c.compute(array_index + arr_size/2,arr_size/2);
System.out.print(")");
} else {
right = main_class.arr[array_index + 1];
System.out.print(right + ")");
}
return op.calc(left, right);
}
}
Oprtator:
public abstract class Oprtator {
abstract int calc(int x, int y);
}
And
public class and extends Oprtator {
public int calc(int x, int y){
return (x&y);
}
}
Or
public class or extends Oprtator {
public int calc(int x, int y){
return (x|y);
}
}
The tree:
public class tree implements Runnable {
gate_node head;
tree(gate_node head) {
this.head = head;
}
void go_right() {
head = head.right_c;
}
void go_left() {
head = head.left_c;
}
@Override
public void run() {
// TODO Auto-generated method stub
}
}
main class
public class main_class {
public static int arr[] = { 1, 1, 0, 1, 0, 1, 0, 1 };
public static void main(String[] args) {
tree t = new tree(new gate_node(null, null, new and()));
t.head.right_c = new gate_node(null, null, new or());
t.head.right_c.right_c = new gate_node(null, null, new and());
t.head.right_c.left_c = new gate_node(null, null, new and());
t.head.left_c = new gate_node(null, null, new or());
t.head.left_c.right_c = new gate_node(null, null, new and());
t.head.left_c.left_c = new gate_node(null, null, new and());
int res = t.head.compute(0, arr.length);
System.out.println();
System.out.println("The result is: " + res);
}
}
I want to calculate it using thread pool, like this algorithm:
Preparation:
Implement each gate as a class/object. It has to have 2 attributes: input A, input B and a way to calculate result;
Implement a tree. Each node is a pair (gate, next_node). Root is a node with next_node being null. Leaves are nodes such that no other node points to it.
Use a shared (thread safe) queue of nodes. It is initially empty.
There is a fixed number (pick one, does not depend on number of gates) of threads which continuously wait for an element from the queue (unless the result is reached in which case they just quit).
Loop:
Whenever an input occurs on a node put the node in a queue (at the beginning inputs go to leaves). This can be simply implemented by defining add_input method on a gate.
A thread picks up a node from queue:
If one of the input is missing discard it (it will be there one more time when second input appears). Another idea is to put the node in a queue only when both inputs are there.
If both inputs are there, then calculate result and pass it to next_node if it is not null (and put next_node in the queue). If next_node is null, then this is your result - break the loop and finalize.
The only problem is that I don't know how to create a shared BlockingQueue into which every node object in the tree can insert itself, or how to create a fixed-size array of threads that constantly wait for new elements to become available in the queue (and then process them) until the head is removed from the queue (meaning the calculation is done).
I searched online for BlockingQueue examples, but I only found producer/consumer examples, and I'm having a hard time adapting them to my problem. I would really appreciate it if anyone could help.
I can give you a few starting pointers to get you going :)
To create your threads just spawn that many threads:
for (int i=0;i<MAX_THREADS;i++) {
new Thread(myRunnable).start();
}
You may well want to store a reference to those threads but it isn't required. The threads need no special setup as they are all identical and they all just sit there grabbing items off the queue.
To share a blocking queue the simplest way is just to make it static and final:
static final BlockingQueue<gate_node> blockingQueue = new LinkedBlockingQueue<>();
Now all the threads can access it.
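Putting those two pointers together, here is a minimal sketch of the queue-based algorithm from the question. It is not your gate_node class: Node, feed, worker, evaluate and the POISON marker are hypothetical names invented for this example, and the tree shape (AND over two ORs over four ANDs) is hard-coded to match your main_class. A node goes into the queue only once both inputs have arrived, and the worker that computes the root hands every thread a "poison pill" so they all quit.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntBinaryOperator;

public class GateQueueDemo {
    /** Hypothetical node: a gate plus a link to the parent it feeds. */
    static final class Node {
        final IntBinaryOperator op;
        final Node parent;          // null for the root
        final boolean isLeftChild;  // which input of the parent this node feeds
        private int left, right, arrived;

        Node(IntBinaryOperator op, Node parent, boolean isLeftChild) {
            this.op = op;
            this.parent = parent;
            this.isLeftChild = isLeftChild;
        }

        /** Stores one input; returns true once both inputs are present. */
        synchronized boolean addInput(boolean toLeft, int value) {
            if (toLeft) left = value; else right = value;
            return ++arrived == 2;
        }

        synchronized int calc() {
            return op.applyAsInt(left, right);
        }
    }

    // Shared by every worker thread, as suggested above.
    static final BlockingQueue<Node> QUEUE = new LinkedBlockingQueue<>();
    static final Node POISON = new Node(null, null, false); // tells a worker to quit
    static volatile int result;

    /** A node enters the queue only when its second input arrives. */
    static void feed(Node node, boolean toLeft, int value) throws InterruptedException {
        if (node.addInput(toLeft, value)) QUEUE.put(node);
    }

    static void worker(int threads) {
        try {
            while (true) {
                Node node = QUEUE.take();            // blocks until work is available
                if (node == POISON) return;
                int value = node.calc();
                if (node.parent != null) {
                    feed(node.parent, node.isLeftChild, value);
                } else {
                    result = value;                  // root reached: publish, then stop everyone
                    for (int i = 0; i < threads; i++) QUEUE.put(POISON);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Evaluates AND(OR(AND, AND), OR(AND, AND)) over eight inputs. */
    static int evaluate(int[] in, int threads) {
        IntBinaryOperator and = (x, y) -> x & y;
        IntBinaryOperator or = (x, y) -> x | y;
        Node root = new Node(and, null, false);
        Node orL = new Node(or, root, true);
        Node orR = new Node(or, root, false);
        Node[] leaves = {
            new Node(and, orL, true), new Node(and, orL, false),
            new Node(and, orR, true), new Node(and, orR, false)
        };
        Thread[] pool = new Thread[threads];
        try {
            for (int i = 0; i < threads; i++) {
                pool[i] = new Thread(() -> worker(threads));
                pool[i].start();
            }
            for (int i = 0; i < leaves.length; i++) {
                feed(leaves[i], true, in[2 * i]);
                feed(leaves[i], false, in[2 * i + 1]);
            }
            for (Thread t : pool) t.join();          // each worker exits after its poison pill
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = { 1, 1, 0, 1, 0, 1, 0, 1 };
        System.out.println("The result is: " + evaluate(arr, 3));
    }
}
```

The synchronized addInput is what guarantees that exactly one of the two children enqueues the parent, so no node is ever processed with a missing input.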
Incidentally if I was doing this I wouldn't use the Queue at all, I'd use a ThreadPoolExecutor and just send the processing to that as new runnables.
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
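For comparison, here is a rough sketch of the executor variant, under the same assumptions as before: Node, feed and evaluate are made-up names, and the tree shape is hard-coded to match the question. Executors.newFixedThreadPool gives you a ThreadPoolExecutor with a fixed number of threads, and a CompletableFuture stands in for "the root has been computed".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntBinaryOperator;

public class GateExecutorDemo {
    /** Hypothetical node: a gate plus a link to the parent it feeds. */
    static final class Node {
        final IntBinaryOperator op;
        final Node parent;          // null for the root
        final boolean isLeftChild;
        private int left, right, arrived;

        Node(IntBinaryOperator op, Node parent, boolean isLeftChild) {
            this.op = op;
            this.parent = parent;
            this.isLeftChild = isLeftChild;
        }

        synchronized boolean addInput(boolean toLeft, int value) {
            if (toLeft) left = value; else right = value;
            return ++arrived == 2;   // true once both inputs are present
        }

        synchronized int calc() {
            return op.applyAsInt(left, right);
        }
    }

    private final ExecutorService pool;
    private final CompletableFuture<Integer> result = new CompletableFuture<>();

    GateExecutorDemo(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Delivers one input; submits the node as a task once it has both. */
    void feed(Node node, boolean toLeft, int value) {
        if (!node.addInput(toLeft, value)) return;   // still waiting for the other input
        pool.execute(() -> {
            int v = node.calc();
            if (node.parent == null) result.complete(v);   // root: publish the answer
            else feed(node.parent, node.isLeftChild, v);
        });
    }

    /** Evaluates AND(OR(AND, AND), OR(AND, AND)) over eight inputs. */
    static int evaluate(int[] in, int threads) {
        GateExecutorDemo demo = new GateExecutorDemo(threads);
        IntBinaryOperator and = (x, y) -> x & y;
        IntBinaryOperator or = (x, y) -> x | y;
        Node root = new Node(and, null, false);
        Node orL = new Node(or, root, true);
        Node orR = new Node(or, root, false);
        Node[] leaves = {
            new Node(and, orL, true), new Node(and, orL, false),
            new Node(and, orR, true), new Node(and, orR, false)
        };
        try {
            for (int i = 0; i < leaves.length; i++) {
                demo.feed(leaves[i], true, in[2 * i]);
                demo.feed(leaves[i], false, in[2 * i + 1]);
            }
            return demo.result.join();   // blocks until the root task completes
        } finally {
            demo.pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] arr = { 1, 1, 0, 1, 0, 1, 0, 1 };
        System.out.println("The result is: " + evaluate(arr, 3));
    }
}
```

The executor owns the queue and the worker loop, so there is no poison pill and no hand-written take() loop to get wrong.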
I'm using a PriorityQueue, and I've implemented a Comparable class with a compareTo method.
Now I want to know whether my queue is sorted: if I call poll(), will it return the element with the minimum costSum?
Class: State.java
public class State<N extends Comparable<N>> implements Comparable<State<N>> {
private final ArrayList<Integer> board;
private State<N> predecessor;
private double totalCostFromStart; //g(x)
private double minimumRemainingCostToTarget; //h(x)
private double costSum; //f(x)
private Move direction;
public State(ArrayList<Integer> board,
State<N> predecessor,
double minimumRemainingCostToTarget,
Move direction) {
this.board = board;
this.predecessor = predecessor;
this.totalCostFromStart = predecessor == null ? 0 : predecessor.totalCostFromStart + 1;
this.minimumRemainingCostToTarget = minimumRemainingCostToTarget;
this.direction=direction;
calculateCostSum();
}
private void calculateCostSum() {
this.costSum = this.totalCostFromStart + this.minimumRemainingCostToTarget;
}
@Override
public int compareTo(State<N> nNode) {
// Double.compare already returns a negative, zero, or positive value
return Double.compare(this.costSum, nNode.costSum);
}
}
Class : AStar.java
public State AStar(ArrayList<Integer> initialBoard,
State source,
ArrayList<Integer> target,
Heuristic heuristic){
int minimumRemainingCostToTarget= heuristic.getRank(initialBoard, target);
source = new State(initialBoard, null, minimumRemainingCostToTarget, null);
PriorityQueue<State> open = new PriorityQueue<>();
Set<ArrayList<Integer>> close = new HashSet<>(181440);
// add the initial state to the open set; f(n) is an attribute of source
open.add(source);
while(!open.isEmpty()){
State currentState = open.poll();//<<<----------------------
}
return null;
}
Now i want to know if my queue is sorted, if i use poll() method will this return the queue of the minimum costSum?
The Javadoc describes this:
The head of this queue is the least element with respect to the specified ordering. ... The queue retrieval operations poll, remove, peek, and element access the element at the head of the queue.
So, yes, it is the minimum element.
Note, however, that the queue isn't internally sorted: if you print a priority queue, you may find that its elements do not appear in ascending order. The elements are simply stored in an order that satisfies the heap property, which allows the data structure to be updated efficiently once the minimum element is removed.
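To see both behaviours side by side, here is a small sketch that uses plain Double values in place of your State objects (PollOrderDemo and drain are names made up for this example). Printing the queue shows the internal heap layout, while repeated poll() calls always hand back the smallest remaining element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class PollOrderDemo {
    /** Polls until the queue is empty and records the order of retrieval. */
    static List<Double> drain(PriorityQueue<Double> queue) {
        List<Double> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());   // always the least remaining element
        }
        return out;
    }

    public static void main(String[] args) {
        PriorityQueue<Double> open =
                new PriorityQueue<>(Arrays.asList(5.0, 1.0, 4.0, 2.0, 3.0));
        // toString() walks the backing heap array, which need not be sorted
        System.out.println("iteration order: " + open);
        System.out.println("poll order:      " + drain(open));
    }
}
```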
I'm trying to get a LinkedList from my buffer class (CharacterBuffer) and then loop through every element of the LinkedList in the Reader class. But when I return the LinkedList from the get() method of CharacterBuffer, I cannot loop through it in the Reader class. I've also tried to loop through the list inside the buffer class and return each element from there, but that doesn't work either.
Any help is highly appreciated!
public class CharacterBuffer {
private char ch;
private LinkedList buffer = new LinkedList();
private boolean filled;
public void put(char ch) {
buffer.addLast(ch);
}
public void filled() {
filled = true;
}
public Object get() throws InterruptedException {
while (buffer.isEmpty()) {
// wait();
return "Waiting";
}
return buffer;
}
public synchronized void putSync(char ch) {
buffer.addLast(ch);
}
public synchronized Object getSync() throws InterruptedException {
while (buffer.isEmpty()) {
// wait();
return "---------";
}
for(int i = 0; i<buffer.size(); i++){
System.out.println(buffer.get(i));
}
return buffer;
}
public int size(){
return buffer.size();
}
}
public class Reader extends Thread {
private GUIMutex gui;
private CharacterBuffer buffer;
private boolean isSynced;
public Reader(GUIMutex gui, CharacterBuffer buffer, boolean isSynced) {
this.gui = gui;
this.buffer = buffer;
this.isSynced = isSynced;
}
public void run() {
String data = "test";
while (true) {
try {
// data = buffer.get();
if (isSynced) {
gui.setReaderText(buffer.getSync() + "\n");
} else {
for(int i = 0; i<buffer.get().size(); i++){
gui.setReaderText(i);
}
gui.setReaderText(buffer.get() + "\n");
}
Thread.sleep(700);
} catch (InterruptedException e) {
}
}
}
}
I think you don't understand what you are talking about, so let's try to shed some light here.
In the end, you are talking about some kind of "collection" class that contains multiple elements; in your case a LinkedList. The thing is: in order to make use of such a class, you need a clear understanding of the APIs you intend to provide.
You figured that you want to use that buffer to store individual char values, that you add using putSync().
But then ... what exactly is getSync() supposed to do?
In your case, you are simply returning the buffer, and that is probably wrong.
Instead, you want to have methods like:
synchronized boolean hasNext()
and
synchronized char getNext()
A user of your class can call the first method to find out whether there are more chars; if so, the second method returns the next one.
That would be a first, simple way to improve your code. A more reasonable way would be to implement a method getIterator() that returns an object implementing the Iterator interface.
Other things to note: if you are using the "built-in" LinkedList; please understand that this class supports generics!
Thus you should be using it like:
private final List<Character> buffer = new LinkedList<>();
to get all the benefits from using strongly typed collections!
EDIT: upon your comments, I think using a LinkedList is simply the wrong approach here.
Instead of using a List, you want to use a Queue, like:
private final Queue<Character> buffer = new ConcurrentLinkedQueue<>();
That class lets one party add elements at the tail of the queue while another party removes elements from the head.
Extra bonus: that class already does the synchronisation work for you, so you don't need to care about that!
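As a rough sketch of what the buffer could look like on top of such a queue (QueueCharacterBuffer, poll and drain are names I made up; this is not your original CharacterBuffer API): the writer calls put(), and the reader either takes one char at a time or drains everything that is currently buffered into a String.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueCharacterBuffer {
    // Thread-safe FIFO: writers add at the tail, readers remove from the head
    private final Queue<Character> buffer = new ConcurrentLinkedQueue<>();

    public void put(char ch) {
        buffer.add(ch);
    }

    /** Removes and returns the oldest char, or null if the buffer is empty. */
    public Character poll() {
        return buffer.poll();
    }

    /** Removes everything currently buffered and returns it as one String. */
    public String drain() {
        StringBuilder sb = new StringBuilder();
        Character c;
        while ((c = buffer.poll()) != null) {
            sb.append(c.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        QueueCharacterBuffer buf = new QueueCharacterBuffer();
        for (char ch : "hello".toCharArray()) {
            buf.put(ch);
        }
        System.out.println(buf.drain());
    }
}
```

Because poll() returns null on an empty queue instead of blocking, the reader never needs the wait() logic that was commented out in the question.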
Use StringBuilder instead
StringBuilder sb = new StringBuilder(128);
// add chars using sb.append(char)
for (int i = 0, n = sb.length(); i < n; i++)
{
char c = sb.charAt(i);
}
or
String s = sb.toString();
I'm currently learning algorithms and was trying to adapt some code from Algorithms by Robert Sedgewick. Here's a link to the part of the code I'm having problems with: http://algs4.cs.princeton.edu/44sp/DirectedCycle.java.html.
For my code the dependencies are a Vertex class with a constructor that just takes a vertex number. I also have a DirectedEdge class, that has a fromVertex, a toVertex and a weight. Finally, I have an EdgeWeightedDigraph class that has a list of the vertices within the graph and a DirectedEdge adjacency list that holds a list of the edges adjacent to a particular vertex and a test class that just initializes the various instances variables with the data in addition to executing the DirectedCycle.java program. The code compiles and runs but returns a wrong cycle using the data supplied by the book (http://algs4.cs.princeton.edu/42digraph/tinyDG.txt). The book specifies a Directed cycle: 3 5 4 3 but my code returns 3 2 3 instead. I noticed that the code runs fine until it encounters an already marked vertex and the onStack else-if code executes. For some reason the loop that contains the first call to dfs() iterates only once for index = 0 so after a marked vertex is encountered it does not retrace its steps and continue with the next vertex as it should with a depth first search. Not sure what I'm missing but any help will be appreciated. Please let me know if you need me to include the other code for the dependent classes listed above.
Here's my own code:
import java.util.Stack;
public class DirectedCycle {
private boolean[] marked;
private Vertex[] edgeTo;
private Stack<Vertex> cycle;
private boolean[] onStack;
public DirectedCycle(EdgeWeightedDigraph graph) {
onStack = new boolean[graph.getNumOfVertices()];
edgeTo = new Vertex[graph.getNumOfVertices()];
marked = new boolean[graph.getNumOfVertices()];
for (int index = 0; index < graph.getVertices().size(); index++) {
if (!marked[index] && cycle == null) {
dfs(graph, graph.getVertex(index));
}
}
}
private void dfs(EdgeWeightedDigraph graph, Vertex vertex) {
onStack[vertex.getVertexNumber()] = true;
marked[vertex.getVertexNumber()] = true;
for (DirectedEdge w : graph.adjacent(vertex)) {
if (this.hasCycle()) {
return;
}
else if (!marked[w.toVertex().getVertexNumber()]) {
edgeTo[w.toVertex().getVertexNumber()] = vertex;
dfs(graph, w.toVertex());
}
else if (onStack[w.toVertex().getVertexNumber()]) {
cycle = new Stack<>();
for (Vertex v = vertex;
v.getVertexNumber() != w.toVertex().getVertexNumber();
v = edgeTo[v.getVertexNumber()]) {
cycle.push(v);
}
cycle.push(w.toVertex());
cycle.push(vertex);
}
}
onStack[vertex.getVertexNumber()] = false;
}
public boolean hasCycle() {
return cycle != null;
}
public Iterable<Vertex> cycle() {
return cycle;
}
}
Question Background
I am comparing two (at a time, actually many) text files, and I want to determine how similar they are. To do so, I have created small, overlapping groups of text from each file. I now want to determine the number of those groups from one file which are also from the other file.
I would prefer to use only Java 8 with no external libraries.
Attempts
These are my two fastest methods. The first contains a bunch of logic which allows it to stop if meeting the threshold is not possible with the remaining elements (this saves a bit of time in total, but of course executing the extra logic also takes time). The second is slower. It does not have those optimizations, actually determines the intersection rather than merely counting it, and uses a stream, which is quite new to me.
I have an integer threshold and dblThreshold (the same value cast to a double), which are the minimum percentage of the smaller file which must be shared to be of interest. Also, from my limited testing, it seems that writing all the logic for either set being larger is faster than calling the method again with reversed arguments.
public int numberShared(Set<String> sOne, Set<String> sTwo) {
int numFound = 0;
if (sOne.size() > sTwo.size()) {
int smallSize = sTwo.size();
int left = smallSize;
for (String item: sTwo) {
if (numFound < threshold && ((double)numFound + left < (dblThreshold) * smallSize)) {
break;
}
if (sOne.contains(item)) {
numFound++;
}
left--;
}
} else {
int smallSize = sOne.size();
int left = smallSize;
for (String item: sOne) {
if (numFound < threshold && ((double)numFound + left < (dblThreshold) * smallSize)) {
break;
}
if (sTwo.contains(item)) {
numFound++;
}
left--;
}
}
return numFound;
}
Second method:
public int numberShared(Set<String> sOne, Set<String> sTwo) {
if (sOne.size() < sTwo.size()) {
long numFound = sOne.parallelStream()
.filter(segment -> sTwo.contains(segment))
.collect(Collectors.counting());
return (int)numFound;
} else {
long numFound = sTwo.parallelStream()
.filter(segment -> sOne.contains(segment))
.collect(Collectors.counting());
return (int)numFound;
}
}
Any suggestions for improving upon these methods, or novel ideas and approaches to the problem are much appreciated!
Edit: I just realized that the first part of my threshold check (which seeks to eliminate, in some cases, the need for the second check with doubles) is incorrect. I will revise it as soon as possible.
If I understand you correctly, you have already determined which methods are fastest, but aren't sure how to implement your threshold-check when using Java 8 streams. Here's one way you could do that - though please note that it's hard for me to do much testing without having proper data and knowing what thresholds you're interested in, so take this simplified test case with a grain of salt (and adjust as necessary).
public class Sets {
private static final int NOT_ENOUGH_MATCHES = -1;
private static final String[] arrayOne = { "1", "2", "4", "9" };
private static final String[] arrayTwo = { "2", "3", "5", "7", "9" };
private static final Set<String> setOne = new HashSet<>();
private static final Set<String> setTwo = new HashSet<>();
public static void main(String[] ignoredArguments) {
setOne.addAll(Arrays.asList(arrayOne));
setTwo.addAll(Arrays.asList(arrayTwo));
boolean isFirstSmaller = setOne.size() < setTwo.size();
System.out.println("Number shared: " + (isFirstSmaller ?
numberShared(setOne, setTwo) : numberShared(setTwo, setOne)));
}
private static long numberShared(Set<String> smallerSet, Set<String> largerSet) {
SimpleBag bag = new SimpleBag(3, 0.5d, largerSet, smallerSet.size());
try {
smallerSet.forEach(eachItem -> bag.add(eachItem));
return bag.duplicateCount;
} catch (IllegalStateException exception) {
return NOT_ENOUGH_MATCHES;
}
}
public static class SimpleBag {
private Map<String, Boolean> items;
private int threshold;
private double fraction;
protected int duplicateCount = 0;
private int smallerSize;
private int numberLeft;
public SimpleBag(int aThreshold, double aFraction, Set<String> someStrings,
int otherSetSize) {
threshold = aThreshold;
fraction = aFraction;
items = new HashMap<>();
someStrings.forEach(eachString -> items.put(eachString, false));
smallerSize = otherSetSize;
numberLeft = otherSetSize;
}
public void add(String aString) {
Boolean value = items.get(aString);
boolean alreadyExists = value != null;
if (alreadyExists) {
duplicateCount++;
}
items.put(aString, alreadyExists);
numberLeft--;
if (cannotMeetThreshold()) {
throw new IllegalStateException("Can't meet threshold; stopping at "
+ duplicateCount + " duplicates");
}
}
public boolean cannotMeetThreshold() {
return duplicateCount < threshold
&& (duplicateCount + numberLeft < fraction * smallerSize);
}
}
}
So I've made a simplified "Bag-like" implementation that starts with the contents of the larger set mapped as keys to false values (since we know there's only one of each). Then we iterate over the smaller set, adding each item to the bag, and, if it's a duplicate, switching the value to true and keeping track of the duplicate count (I initially did a .count() at the end of .stream().allMatch(), but this'll suffice for your special case). After adding each item, we check whether we can't meet the threshold, in which case we throw an exception (arguably not the prettiest way to exit the .forEach(), but in this case it is an illegal state of sorts). Finally, we return the duplicate count, or -1 if we encountered the exception. In my little test, change 0.5d to 0.51d to see the difference.
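If the exception bothers you, the same early exit can be written as a plain loop. This is only a sketch under simplifying assumptions: SharedCount is a made-up class, it keeps just the fraction check (your integer threshold shortcut is left out, since you mention it needs revising), and it returns an empty OptionalInt where the code above returns -1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.OptionalInt;
import java.util.Set;

public class SharedCount {
    /**
     * Counts the items of the smaller set that also occur in the larger set,
     * giving up as soon as the required fraction can no longer be reached.
     */
    static OptionalInt numberShared(Set<String> smaller, Set<String> larger, double fraction) {
        int needed = (int) Math.ceil(fraction * smaller.size());
        int found = 0;
        int left = smaller.size();
        for (String item : smaller) {
            if (found + left < needed) {
                return OptionalInt.empty();   // even if all remaining items match, it is too few
            }
            if (larger.contains(item)) {
                found++;
            }
            left--;
        }
        return found >= needed ? OptionalInt.of(found) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        Set<String> one = new HashSet<>(Arrays.asList("1", "2", "4", "9"));
        Set<String> two = new HashSet<>(Arrays.asList("2", "3", "5", "7", "9"));
        System.out.println(numberShared(one, two, 0.5));
        System.out.println(numberShared(one, two, 0.51));
    }
}
```

The trade-off is that this version is sequential; it gives up parallelStream() in exchange for a cheap, exception-free early exit.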
more updates
As explained in the selected answer, the problem lies in the write barrier used by the JVM's garbage collector.
The JVM uses a card-marking algorithm to keep track of modified references in object fields. For each reference assignment to a field, it marks the associated entry in the card table as dirty; when several threads do this for objects in the same card, the result is false sharing, which blocks scaling. The details are well described in this article: https://blogs.oracle.com/dave/entry/false_sharing_induced_by_card
The option -XX:+UseCondCardMark (in Java 7u40 and up) mitigates the problem and makes the program scale almost perfectly.
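To make the difference concrete, here is a toy simulation in plain Java. It is not HotSpot code: CardMarkSketch is a made-up class, the real barrier is emitted by the JIT as machine instructions, and the CLEAN/DIRTY values are only illustrative. The 512-byte card size is taken from the shift by 9 in the assembly quoted in the answer. The unconditional barrier writes the card entry on every reference store, while the conditional one reads it first and writes only if the card is not already dirty.

```java
import java.util.Arrays;

public class CardMarkSketch {
    static final int CARD_SHIFT = 9;   // 2^9 = 512-byte cards
    static final byte DIRTY = 0;       // illustrative marker values
    static final byte CLEAN = -1;

    final byte[] cardTable;
    int stores;                        // counts actual writes to the card table

    CardMarkSketch(int heapBytes) {
        cardTable = new byte[(heapBytes >>> CARD_SHIFT) + 1];
        Arrays.fill(cardTable, CLEAN);
    }

    /** Default barrier: always store, even if the card is already dirty. */
    void markUnconditional(int address) {
        cardTable[address >>> CARD_SHIFT] = DIRTY;
        stores++;
    }

    /** Conditional barrier: read first, store only when the card is not dirty yet. */
    void markConditional(int address) {
        int card = address >>> CARD_SHIFT;
        if (cardTable[card] != DIRTY) {
            cardTable[card] = DIRTY;
            stores++;
        }
    }

    /** Simulates reference writes that all land in the same 512-byte card. */
    static int storesFor(boolean conditional, int writes) {
        CardMarkSketch heap = new CardMarkSketch(4096);
        for (int i = 0; i < writes; i++) {
            int address = i % 512;     // every address falls inside card 0
            if (conditional) heap.markConditional(address);
            else heap.markUnconditional(address);
        }
        return heap.stores;
    }

    public static void main(String[] args) {
        System.out.println("unconditional stores: " + storesFor(false, 1000));
        System.out.println("conditional stores:   " + storesFor(true, 1000));
    }
}
```

Fewer stores to the shared card entry means fewer cache-line invalidations between cores, which is presumably why the option restores scaling here.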
updates
I found out (thanks to a hint from Park Eung-ju) that assigning an object to a field variable makes the difference. If I remove the assignment, it scales perfectly.
I think it probably has something to do with the Java memory model (for example, an object reference must point to a valid address before it becomes visible), but I am not completely sure. Both a double and an Object reference (likely) take 8 bytes on a 64-bit machine, so it seems to me that assigning a double value and assigning an Object reference should be the same in terms of synchronization.
Does anyone have a reasonable explanation?
Here I have a weird Java multi-threading scalability problem.
My code simply iterates over an array (using the visitor pattern) to compute simple floating-point operations and assign the result to another array. There is no data dependency, nor synchronization, so it should scale linearly (2x faster with 2 threads, 4x faster with 4 threads).
When primitive (double) array is used, it scales very well. When object type (e.g. String) array is used, it doesn't scale at all (even though the value of the String array is not used at all...)
Here's the entire source code:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;
class Table1 {
public static final int SIZE1=200000000;
public static final boolean OBJ_PARAM;
static {
String type=System.getProperty("arg.type");
if ("double".equalsIgnoreCase(type)) {
System.out.println("Using primitive (double) type arg");
OBJ_PARAM = false;
} else {
System.out.println("Using object type arg");
OBJ_PARAM = true;
}
}
byte[] filled;
int[] ivals;
String[] strs;
Table1(int size) {
filled = new byte[size];
ivals = new int[size];
strs = new String[size];
Arrays.fill(filled, (byte)1);
Arrays.fill(ivals, 42);
Arrays.fill(strs, "Strs");
}
public boolean iterate_range(int from, int to, MyVisitor v) {
for (int i=from; i<to; i++) {
if (filled[i]==1) {
// XXX: Here we are passing double or String argument
if (OBJ_PARAM) v.visit_obj(i, strs[i]);
else v.visit(i, ivals[i]);
}
}
return true;
}
}
class HeadTable {
byte[] filled;
double[] dvals;
boolean isEmpty;
HeadTable(int size) {
filled = new byte[size];
dvals = new double[size];
Arrays.fill(filled, (byte)0);
isEmpty = true;
}
public boolean contains(int i, double d) {
if (filled[i]==0) return false;
if (dvals[i]==d) return true;
return false;
}
public boolean contains(int i) {
if (filled[i]==0) return false;
return true;
}
public double groupby(int i) {
assert filled[i]==1;
return dvals[i];
}
public boolean insert(int i, double d) {
if (filled[i]==1 && contains(i,d)) return false;
if (isEmpty) isEmpty=false;
filled[i]=1;
dvals[i] = d;
return true;
}
public boolean update(int i, double d) {
assert filled[i]==1;
dvals[i]=d;
return true;
}
}
class MyVisitor {
public static final int NUM=128;
int[] range = new int[2];
Table1 table1;
HeadTable head;
double diff=0;
int i;
int iv;
String sv;
MyVisitor(Table1 _table1, HeadTable _head, int id) {
table1 = _table1;
head = _head;
int elems=Table1.SIZE1/NUM;
range[0] = elems*id;
range[1] = elems*(id+1);
}
public void run() {
table1.iterate_range(range[0], range[1], this);
}
//YYY 1: with double argument, this function is called
public boolean visit(int _i, int _v) {
i = _i;
iv = _v;
insertDiff();
return true;
}
//YYY 2: with String argument, this function is called
public boolean visit_obj(int _i, Object _v) {
i = _i;
iv = 42;
sv = (String)_v;
insertDiff();
return true;
}
public boolean insertDiff() {
if (!head.contains(i)) {
head.insert(i, diff);
return true;
}
double old = head.groupby(i);
double newval=Math.min(old, diff);
head.update(i, newval);
head.insert(i, diff);
return true;
}
}
public class ParTest1 {
public static int THREAD_NUM=4;
public static void main(String[] args) throws Exception {
if (args.length>0) {
THREAD_NUM = Integer.parseInt(args[0]);
System.out.println("Setting THREAD_NUM:"+THREAD_NUM);
}
Table1 table1 = new Table1(Table1.SIZE1);
HeadTable head = new HeadTable(Table1.SIZE1);
MyVisitor[] visitors = new MyVisitor[MyVisitor.NUM];
for (int i=0; i<visitors.length; i++) {
visitors[i] = new MyVisitor(table1, head, i);
}
int taskPerThread = visitors.length / THREAD_NUM;
MyThread[] threads = new MyThread[THREAD_NUM];
CyclicBarrier barrier = new CyclicBarrier(THREAD_NUM+1);
for (int i=0; i<THREAD_NUM; i++) {
threads[i] = new MyThread(barrier);
for (int j=taskPerThread*i; j<taskPerThread*(i+1); j++) {
if (j>=visitors.length) break;
threads[i].addVisitors(visitors[j]);
}
}
Runtime r=Runtime.getRuntime();
System.out.println("Force running gc");
r.gc(); // running GC here (excluding GC effect)
System.out.println("Running gc done");
// not measuring 1st run (excluding JIT compilation effect)
for (int i=0; i<THREAD_NUM; i++) {
threads[i].start();
}
barrier.await();
for (int i=0; i<10; i++) {
MyThread.start = true;
long s=System.currentTimeMillis();
barrier.await();
long e=System.currentTimeMillis();
System.out.println("Iter "+i+" Exec time:"+(e-s)/1000.0+"s");
}
}
}
class MyThread extends Thread {
static volatile boolean start=true;
static int tid=0;
int id=0;
ArrayList<MyVisitor> tasks;
CyclicBarrier barrier;
public MyThread(CyclicBarrier _barrier) {
super("MyThread"+(tid++));
barrier = _barrier;
id=tid;
tasks = new ArrayList<MyVisitor>(256);
}
void addVisitors(MyVisitor v) {
tasks.add(v);
}
public void run() {
while (true) {
while (!start) { ; }
for (int i=0; i<tasks.size(); i++) {
MyVisitor v=tasks.get(i);
v.run();
}
start = false;
try { barrier.await();}
catch (InterruptedException e) { break; }
catch (Exception e) { throw new RuntimeException(e); }
}
}
}
The Java code can be compiled with no dependency, and you can run it with the following command:
java -Darg.type=double -server ParTest1 2
You pass the number of worker threads as an argument (the above uses 2 threads).
After setting up the arrays (which is excluded from the measured time), it performs the same operation 10 times, printing the execution time of each iteration.
With the above option, it uses the double array, and it scales very well with 1, 2, and 4 threads (i.e. the execution time drops to 1/2 and 1/4), but
java -Darg.type=Object -server ParTest1 2
with this option, it uses the Object (String) array, and it doesn't scale at all!
I measured the GC time, but it was insignificant (and I also forced a GC run before measuring the times). I have tested with Java 6 (update 43) and Java 7 (update 51), and the result is the same.
The code has comments with XXX and YYY describing the difference when arg.type=double or arg.type=Object option is used.
Can you figure out what is going on with the String-type argument passing here?
The HotSpot VM generates the following assembly for a putfield bytecode on a reference-type field.
mov ref, OFFSET_OF_THE_FIELD(this) <- this puts the new value for field.
mov this, REGISTER_A
shr 0x9, REGISTER_A
movabs OFFSET_X, REGISTER_B
mov %r12b, (REGISTER_A, REGISTER_B, 1)
The putfield operation itself is completed by the first instruction,
but more instructions follow.
They are the "card marking" instructions. (http://www.ibm.com/developerworks/library/j-jtp11253/)
A write to a reference field of any object within the same card (512 bytes) stores a value to the same memory address.
My guess is that stores to the same memory address from multiple cores interfere with the caches and pipelines.
Just add
byte[] garbage = new byte[600];
to the MyVisitor definition.
Then the MyVisitor instances will be spaced far enough apart that they no longer share a card-table entry, and you will see that the program scales.
This is not a complete answer but may provide a hint for you.
I have changed your code
Table1(int size) {
filled = new byte[size];
ivals = new int[size];
strs = new String[size];
Arrays.fill(filled, (byte)1);
Arrays.fill(ivals, 42);
Arrays.fill(strs, "Strs");
}
to
Table1(int size) {
filled = new byte[size];
ivals = new int[size];
strs = new String[size];
Arrays.fill(filled, (byte)1);
Arrays.fill(ivals, 42);
Arrays.fill(strs, new String("Strs"));
}
after this change, the running time with 4 threads with object type array reduced.
According to http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7
For the purposes of the Java programming language memory model, a single write to a non-volatile long or double value is treated as two separate writes: one to each 32-bit half. This can result in a situation where a thread sees the first 32 bits of a 64-bit value from one write, and the second 32 bits from another write.
Writes and reads of volatile long and double values are always atomic.
Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values.
Assignments of references are always atomic,
whereas writes to a double are not atomic unless it is declared volatile.
The problem is that sv can be seen by other threads and its assignment is atomic.
Therefore, wrapping the visitor's member variables (i, iv, sv) in ThreadLocal should solve the problem.
"sv = (String)_v;" makes the difference. I also confirmed that the type casting is not the factor. Just accessing _v can't make the difference. Assigning some value to sv field makes the difference. But I can't explain why.