Is there an ``arrayAdd`` method or something to facilitate array additions? - java

So I have an array
static final int N = maxNofActiveThreads;
static final int[] arr = new int[N * nofEntries];
Where the N threads write to mutually exclusive regions of the array.
I should now like to add a monitoring thread that will periodically collect the results for decision-making by simply summing up all the threads' tables.
I.e. in pseudo-code
int[] snapshot = arr[0 : nofEntries] + arr[nofEntries : 2*nofEntries] + ... + arr[(N-1) * nofEntries : N*nofEntries]
The obvious choice would be to simply create
int[] snapshot = new int[nofEntries];
System.arraycopy(arr, 0, snapshot, 0, nofEntries);
and then walk through the rest of arr, adding one value at a time.
Is there a smarter/more efficient way?
Oh, and we don't care if we miss an update every so often, it will eventually show up on the next pass and that's fine. No need for any synchronisation.
(I should also mention that the project I'm working on requires me to use Java 7.)

The best I can imagine is to use the Arrays class's static method parallelSetAll (note that it, like the lambda below, requires Java 8, which conflicts with the Java 7 constraint above). As it tries to parallelize the operation, it may be more efficient. The Javadoc for another parallel method says:
... computation is usually more efficient than sequential loops for large arrays.
So you could use:
private int tsum(int index) {
    int val = 0;
    for (int t = 0; t < N; t++) {
        val += arr[index + t * nofEntries];
    }
    return val;
}
and:
IntUnaryOperator generator = x -> tsum(x);
int[] snapshot = new int[nofEntries];
Arrays.parallelSetAll(snapshot, generator);
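Since the question is pinned to Java 7, where parallelSetAll and lambdas are unavailable, the fallback is the plain loop the question already sketches. A minimal version, assuming the same arr, N and nofEntries fields:

private int[] snapshot() {
    // copy the first thread's slice, then add the remaining slices in place
    int[] snapshot = new int[nofEntries];
    System.arraycopy(arr, 0, snapshot, 0, nofEntries);
    for (int t = 1; t < N; t++) {
        int base = t * nofEntries;
        for (int i = 0; i < nofEntries; i++) {
            snapshot[i] += arr[base + i];
        }
    }
    return snapshot;
}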

Related

Calculate all permutations of a collection in parallel

I need to calculate all permutations of a collection and I have code for that, but the problem is that it is linear and takes a lot of time.
public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
    List<E> input = new ArrayList<>(inputSet);
    Set<Set<E>> ret = new HashSet<>();
    int len = inputSet.size();
    // run over all numbers between 1 and 2^length (one number per subset). each bit represents an object
    // include the object in the set if the corresponding bit is 1
    for (int i = (1 << len) - 1; i > 0; i--) {
        Set<E> comb = new HashSet<>();
        for (int j = 0; j < len; j++) {
            if ((i & 1 << j) != 0) {
                comb.add(input.get(j));
            }
        }
        ret.add(comb);
    }
    return ret;
}
I am trying to make the computation run in parallel.
I thought of writing the logic using recursion and then executing the recursive calls in parallel, but I am not exactly sure how to do that.
Would appreciate any help.
There is no need to use recursion, in fact, that might be counter-productive. Since the creation of each combination can be performed independently of the others, it can be done using parallel Streams. Note that you don’t even need to perform the bit manipulations by hand:
public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
    // use inputSet.stream().distinct().collect(Collectors.toList());
    // to get only distinct combinations
    // (in case source contains duplicates, i.e. is not a Set)
    List<E> input = new ArrayList<>(inputSet);
    final int size = input.size();
    // sort out input that is too large. In fact, even lower numbers might
    // be way too large. But using <63 bits allows to use long values
    if (size >= 63) throw new OutOfMemoryError("not enough memory for "
        + BigInteger.ONE.shiftLeft(input.size()).subtract(BigInteger.ONE) + " permutations");
    // the actual operation is quite compact when using the Stream API
    return LongStream.range(1, 1L << size) /* .parallel() */
        .mapToObj(l -> BitSet.valueOf(new long[] {l}).stream()
            .mapToObj(input::get).collect(Collectors.toSet()))
        .collect(Collectors.toSet());
}
The inner stream operation, i.e. iterating over the bits, is too small to benefit from parallel operations, especially as it would have to merge the result into a single Set. But if the number of combinations to produce is sufficiently large, running the outer stream in parallel will already utilize all CPU cores.
The alternative is not to use a parallel stream, but to return the Stream<Set<E>> itself instead of collecting into a Set<Set<E>>, to allow the caller to chain the consuming operation directly.
By the way, hashing an entire Set (or lots of them) can be quite expensive, so the cost of the final merging step(s) is likely to dominate the performance. Returning a List<Set<E>> instead can dramatically improve the performance. The same applies to the alternative of returning a Stream<Set<E>> without collecting the combinations at all, as this also works without hashing the Sets.
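A sketch of that stream-returning alternative, under the same assumptions as the collecting version (the method name streamAllCombinations is made up here):

public static <E> Stream<Set<E>> streamAllCombinations(Collection<E> inputSet) {
    List<E> input = new ArrayList<>(inputSet);
    final int size = input.size();
    if (size >= 63) throw new OutOfMemoryError("too many combinations");
    // lazy: each combination is built only when the caller consumes it,
    // and nothing is ever hashed into an outer Set
    return LongStream.range(1, 1L << size)
        .mapToObj(l -> BitSet.valueOf(new long[] {l}).stream()
            .mapToObj(input::get).collect(Collectors.toSet()));
}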

Build Spark JavaRDD List from DropResult objects

(What's possible in Scala should be possible in Java, right? But I would take Scala suggestions as well)
I am not trying to iterate over an RDD; instead, I need to build one with n elements from a random/simulator class of a type called DropResult. DropResult can't be cast into anything else.
I thought the Spark "find PI" example had me on the right track but no luck. Here's what I am trying:
On a one-time basis a DropResult is made like this:
// make a single DropResult from pld (PipeLinkageData)
DropResult dropResultSeed = pld.doDrop();
I am trying something like this:
JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRangeList(1, getSimCount())).foreach(pld.doDrop());
I just need to run pld.doDrop() about 10^6 times on the cluster and put the results in a Spark RDD for the next operation, also on the cluster. I can't figure out what kind of function to use on "parallelize" to make this work.
makeRangeList:
private List<Integer> makeRangeList(int lower, int upper) {
    List<Integer> range = IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    return range;
}
(FWIW I was trying to use the Pi example from http://spark.apache.org/examples.html as a model of how to do a for loop to create a JavaRDD)
int count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(new Function<Integer, Boolean>() {
    public Boolean call(Integer i) {
        double x = Math.random();
        double y = Math.random();
        return x * x + y * y < 1;
    }
}).count();
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);
Yeah, it seems like you should be able to do this pretty easily. It sounds like you just need to parallelize an RDD of 10^6 integers so that you can map them to 10^6 DropResult objects in an RDD.
If this is the case, I don't think you need to explicitly create a list as above. It seems like you should just be able to use makeRange() the way the Spark Pi example does, like this:
JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRange(1, getSimCount()))
    .map(new Function<Integer, DropResult>() {
        public DropResult call(Integer i) {
            return pld.doDrop();
        }
    });
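If Java 8 is available, the same thing reads more compactly as a lambda. A sketch, assuming pld is effectively final and Serializable (Spark must serialize objects captured in closures and ship them to the executors):

JavaRDD<DropResult> simCountRDD =
    spark.parallelize(makeRange(1, getSimCount()))
         .map(i -> pld.doDrop());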

Java multiple instances of same class share the same running objects, but they should not

I'm writing a multithreaded Java optimization algorithm that launches several instances of the same class, for run-time reasons. This class has subclasses of its own.
The algorithm searches through the search space for an optimal solution by means of random movements. So, if I run several instances of it, I should take advantage of my system's cores and improve the search by widening the search space.
I've noticed that the first instance runs well, but the others seem to share the running objects of the first, picking up the information they hold, even after it has finished.
That's not what I want; I want each instance to be isolated from the others.
I'm using Executor Services:
Code:
ExecutorService executorService = Executors.newCachedThreadPool();
ExecutorCompletionService<float[][]> service = new ExecutorCompletionService<float[][]>(executorService);
IteratedGreedy[] ig = new IteratedGreedy[instances];
Future<float[][]>[] future = new Future[instances];
float[][][] solutions = new float[instances][][];
// launching instances:
for (int i = 0; i < instances; i++) {
    path = "\\" + i + ".txt";
    ig[i] = new IteratedGreedy(path);
    future[i] = service.submit(ig[i]);
}
// retrieving solutions (starting at 0, so the first result is not skipped):
for (int i = 0; i < instances; i++) {
    solutions[i] = future[i].get();
}
As you may expect, the IteratedGreedy class has its own subclasses inside.
Any help is appreciated.
The problem is that, somewhere in the code, there's a class with a global static variable:
static float[][] matrix;
And then, a method uses it:
void someMethod(int i, int b) {
    // every instance reads the same shared static matrix
    float f = matrix[i][b];
}
The solution is to change the way the method obtains the matrix, passing it in instead of reading the shared static field:
void someMethod(float[][] matrix, int i, int j) {
    // each caller now supplies its own matrix
    float f = matrix[i][j];
}
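An equivalent fix, sketched here with a made-up class name SearchState, is to drop static and give each instance its own matrix, so parallel tasks never touch each other's state:

class SearchState {
    // one matrix per instance instead of one shared static matrix
    private final float[][] matrix;

    SearchState(int rows, int cols) {
        this.matrix = new float[rows][cols];
    }

    float valueAt(int i, int j) {
        return matrix[i][j];
    }
}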

Merge Queues in Java

I was wondering what's the best way to write a method to merge an ArrayQueue with another Queue without removing any elements from the queue that's passed.
eg. queue1 = [1,2,3,4] and queue2 = [5,6,7,8,9,10].
When queue1.mergeQs(queue2) was called it would create queue1 = [1,5,2,6,3,7,4,8,9,10] whilst queue2 would remain [5,6,7,8,9,10]
public void mergeQs(ArrayQmerge q){}
This way seems harder to implement than if you were to pass both Queues and return a new merged Queue. Thanks.
Just to clarify, I'm looking for a method to interleave the elements from the two queues.
One detail that might help you is that private fields are visible between different objects of the same class in Java. That means that as long as you only intend to merge queues of your own class, your code has full access to all internal fields, such as the array you use to store your elements.
For the simplest case, where all elements are stored in a linear array with the queue head being at index zero, something like this might be a start:
public void mergeQs(ArrayQmerge q) {
    Object[] array = new Object[this.size() + q.size()];
    int i;
    int o;
    // Interleave elements
    for (i = 0, o = 0; i < this.size() && i < q.size(); ++i) {
        array[o++] = this.array[i];
        array[o++] = q.array[i];
    }
    // Copy the remaining elements; at most one of these loops can run,
    // since i already equals the smaller of the two sizes
    while (i < this.size()) {
        array[o++] = this.array[i++];
    }
    while (i < q.size()) {
        array[o++] = q.array[i++];
    }
    this.array = array;
}
You can create a new Queue locally in the merge method, then assign your class's queue to the local version.
Since you are using your own homebrew ArrayQueue, this is conjecture.
Creating and returning a new queue is, as I think you already said, much easier and more efficient, since inserting elements into an array-backed structure means shuffling the rest of the elements down one position for each insert.
An alternative is to implement public void mergeQs(ArrayQmerge q) by swapping out the underlying array backing it. You get the same easy implementation as returning a new Queue, but with the in-place side effect.
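For comparison, a minimal sketch of the interleaving against the standard java.util.Queue interface (the helper name interleave is made up); iterating instead of polling leaves both input queues untouched:

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

static <E> Queue<E> interleave(Queue<E> a, Queue<E> b) {
    Queue<E> merged = new ArrayDeque<E>(a.size() + b.size());
    Iterator<E> ia = a.iterator();
    Iterator<E> ib = b.iterator();
    // take one element from each queue in turn, then drain whichever is longer
    while (ia.hasNext() || ib.hasNext()) {
        if (ia.hasNext()) merged.add(ia.next());
        if (ib.hasNext()) merged.add(ib.next());
    }
    return merged;
}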

Better practice to re-instantiate a List or invoke clear()

Using Java (1.6) is it better to call the clear() method on a List or just re-instantiate the reference?
I have an ArrayList that is filled with an unknown number of Objects and periodically "flushed" - where the Objects are processed and the List is cleared. Once flushed the List is filled up again. The flush happens at a random time. The number within the List can potentially be small (10s of Objects) or large (millions of objects).
So is it better to have the "flush" call clear() or new ArrayList()?
Is it even worth worrying about this sort of issue, or should I let the VM worry about it? How could I go about looking at the memory footprint of Java to work this sort of thing out for myself?
Any help greatly appreciated.
The main thing to be concerned about is what other code might have a reference to the list. If the existing list is visible elsewhere, do you want that code to see a cleared list, or keep the existing one?
If nothing else can see the list, I'd probably just clear it - but not for performance reasons; just because the way you've described the operation sounds more like clearing than "create a new list".
The ArrayList<T> docs don't specify what happens to the underlying data structures, but looking at the 1.7 implementation in Eclipse, it looks like you should probably call trimToSize() after clear() - otherwise you could still have a list backed by a large array of null references. (Maybe that isn't an issue for you, of course... maybe that's more efficient than having to copy the array as the size builds up again. You'll know more about this than we do.)
(Of course creating a new list doesn't require the old list to set all the array elements to null... but I doubt that that will be significant in most cases.)
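For example, a small sketch of that clear-then-trim pattern (the cast is needed because trimToSize() is on ArrayList, not on the List interface):

List<Object> list = new ArrayList<Object>(1000000); // backing array sized for a million elements
list.clear();                            // size becomes 0, but the backing array keeps its length
((ArrayList<Object>) list).trimToSize(); // release the large backing array as well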
The way you are using it looks very much like how a Queue is used. When you work off the items on the queue, they are removed as you process them.
Using one of the Queue classes might make the code more elegant.
There are also variants which handle concurrent updates in a predictable way.
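A sketch of that pattern with java.util.concurrent.ConcurrentLinkedQueue (class and method names here are invented for illustration); draining with poll() removes items as they are processed, so no clear() or re-instantiation is needed:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class Flusher {
    private final Queue<Object> pending = new ConcurrentLinkedQueue<Object>();

    void add(Object o) {
        pending.add(o); // safe even while a flush is in progress
    }

    void flush() {
        Object item;
        // drain whatever is queued; concurrent adds show up on a later flush
        while ((item = pending.poll()) != null) {
            process(item);
        }
    }

    private void process(Object item) { /* handle one element */ }
}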
I think that if the ArrayList is flushed very frequently, e.g. continuously in a loop, it is better to use clear(); if the flushing is not too frequent, you may create a new instance. Also, since you say the element count may vary from tens of objects to millions, you could give each new ArrayList an in-between initial capacity so that it avoids resizing many times.
There is no advantage to list.clear() over creating a new list.
Here is my investigation comparing the performance.
import java.util.ArrayList;
import java.util.List;

public class ClearList {
    public static void testClear(int m, int n) {
        List<Integer> list = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                list.add(Integer.parseInt("" + j + i));
            }
            list.clear();
        }
        System.out.println(System.currentTimeMillis() - start);
    }

    public static void testNewInit(int m, int n) {
        List<Integer> list = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                list.add(Integer.parseInt("" + j + i));
            }
            list = new ArrayList<>();
        }
        System.out.println(System.currentTimeMillis() - start);
    }

    public static void main(String[] args) {
        System.out.println("clear ArrayList:");
        testClear(991000, 100);
        System.out.println("new ArrayList:");
        testNewInit(991000, 100);
    }
}
/*--*
* Out:
*
* clear ArrayList:
* 8391
* new ArrayList:
* 6871
*/
