I need to calculate all combinations (subsets) of a collection, and I have code for that, but the problem is that it runs sequentially on a single thread and takes a lot of time.
public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
List<E> input = new ArrayList<>(inputSet);
Set<Set<E>> ret = new HashSet<>();
int len = inputSet.size();
// run over all numbers between 1 and 2^length (one number per subset). each bit represents an object
// include the object in the set if the corresponding bit is 1
for (int i = (1 << len) - 1; i > 0; i--) {
Set<E> comb = new HashSet<>();
for (int j = 0; j < len; j++) {
if ((i & 1 << j) != 0) {
comb.add(input.get(j));
}
}
ret.add(comb);
}
return ret;
}
I am trying to make the computation run in parallel.
I thought of the option of writing the logic using recursion and then executing the recursive calls in parallel, but I am not exactly sure how to do that.
Would appreciate any help.
There is no need to use recursion; in fact, that might be counter-productive. Since the creation of each combination can be performed independently of the others, it can be done using parallel Streams. Note that you don’t even need to perform the bit manipulations by hand:
public static <E> Set<Set<E>> getAllCombinations(Collection<E> inputSet) {
// use inputSet.stream().distinct().collect(Collectors.toList());
// to get only distinct combinations
// (in case source contains duplicates, i.e. is not a Set)
List<E> input = new ArrayList<>(inputSet);
final int size = input.size();
// sort out input that is too large. In fact, even lower numbers might
// be way too large. But using <63 bits allows to use long values
if(size >= 63) throw new OutOfMemoryError("not enough memory for "
    + BigInteger.ONE.shiftLeft(size).subtract(BigInteger.ONE) + " combinations");
// the actual operation is quite compact when using the Stream API
return LongStream.range(1, 1L<<size) /* .parallel() */
.mapToObj(l -> BitSet.valueOf(new long[] {l}).stream()
.mapToObj(input::get).collect(Collectors.toSet()))
.collect(Collectors.toSet());
}
The inner stream operation, i.e. iterating over the bits, is too small to benefit from parallel operations, especially as it would have to merge the result into a single Set. But if the number of combinations to produce is sufficiently large, running the outer stream in parallel will already utilize all CPU cores.
The alternative is not to use a parallel stream, but to return the Stream<Set<E>> itself instead of collecting into a Set<Set<E>>, to allow the caller to chain the consuming operation directly.
By the way, hashing an entire Set (or lots of them) can be quite expensive, so the cost of the final merging step(s) is likely to dominate the performance. Returning a List<Set<E>> instead can improve performance dramatically. The same applies to the alternative of returning a Stream<Set<E>> without collecting the combinations at all, as this also works without hashing the Sets.
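As a rough sketch of that last alternative (the method name streamAllCombinations is just chosen for illustration), the final collect is simply dropped and the caller decides how to consume the stream:
public static <E> Stream<Set<E>> streamAllCombinations(Collection<E> inputSet) {
    List<E> input = new ArrayList<>(inputSet);
    final int size = input.size();
    if(size >= 63) throw new OutOfMemoryError("not enough memory for "
        + BigInteger.ONE.shiftLeft(size).subtract(BigInteger.ONE) + " combinations");
    // same bit-to-element mapping as above, but without the final collect;
    // the caller may call .parallel() on the returned stream before consuming it
    return LongStream.range(1, 1L << size)
        .mapToObj(l -> BitSet.valueOf(new long[] {l}).stream()
            .mapToObj(input::get).collect(Collectors.toSet()));
}
A caller could then write, for instance, streamAllCombinations(data).parallel().forEach(System.out::println); or collect into a List<Set<E>> without ever hashing whole Sets.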
So I have an array
static final int N = maxNofActiveThreads;
static final int[] arr = new int[N*nofEntries];
Where the N threads write to mutually exclusive regions of the array.
I should now like to add a monitoring thread that will periodically collect the results for decision-making by simply summing up all the threads' tables.
I.e. in pseudo-code
int[] snapshot = arr[0 : nofEntries] + arr[nofEntries : 2*nofEntries] + ... + arr[(N-1) * nofEntries : N*nofEntries]
The obvious choice would be to simply create
int[] snapshot = new int[nofEntries]
System.arraycopy(arr, 0, snapshot, 0, nofEntries);
and then walking through the rest of arr, adding one value at a time.
Is there a smarter/more efficient way?
Oh, and we don't care if we miss an update every so often, it will eventually show up on the next pass and that's fine. No need for any synchronisation.
(I should also mention that the project I'm working on requires me to use Java 7.)
The best I can imagine is to use the Arrays class's static method parallelSetAll. Since it tries to parallelize the operation, it may be more efficient. The Javadoc for another parallel method says:
... computation is usually more efficient than sequential loops for large arrays.
So you could use:
private int tsum(int index) {
int val = 0;
for (int t=0; t<N; t++) {
val += arr[index + t * nofEntries];
}
return val;
}
and:
IntUnaryOperator generator = x -> tsum(x);
int[] snapshot = new int[nofEntries];
Arrays.parallelSetAll(snapshot, generator);
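Note that Arrays.parallelSetAll (and the lambda above) were only introduced in Java 8; under the Java 7 constraint mentioned in the question, the same tsum helper can be driven by a plain loop (a sequential sketch):
int[] snapshot = new int[nofEntries];
for (int i = 0; i < nofEntries; i++) {
    snapshot[i] = tsum(i); // sum entry i across all N per-thread regions
}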
Does the ArrayList object store the last requested value in memory to access it faster the next time? Or do I need to do this myself?
Or more concretely, in terms of performance, is it better to do this :
for (int i = 0; i < myArray.size(); i++){
int value = myArray.get(i);
int result = value + 2 * value - 5 / value;
}
Instead of doing this :
for (int i = 0; i < myArray.size(); i++) {
    int result = myArray.get(i) + 2 * myArray.get(i) - 5 / myArray.get(i);
}
In terms of performance, it doesn't matter one bit. No, ArrayList doesn't cache anything, although the JITted end result could be a different issue.
If you're wondering which version to use, use the first one. It's clearer.
You can answer your (first) question yourself by looking into the actual source:
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
So: no, there is no caching taking place, but you can also see that there is not much of an impact in terms of performance, because the get method is essentially just an array access.
But it's still good to avoid multiple calls, for several reasons:
int result = value + 2 * value - 5 / value is easier to understand (i.e. realizing that you use the same value three times in your calculation)
If you later decide to change the underlying list (e.g. to a LinkedList) you might end up with an impact on performance and then have to change your code to get around it.
As long as you don't synchronize access to the list, repeated calls of get(index) might actually return different values if a call of set(index, value) has taken place between two calls (even in small source blocks like this, it's possible - BTST)
The second point also has a consequence for how to access all values of a list, which leads to the decision to avoid list.get(i) altogether if you're going to iterate over all elements of a list. In that case it's better to use an Iterator or streams.
Your code would then look like this:
Iterator<Integer> it = myArray.iterator();
while (it.hasNext()) {
int value = it.next();
int result = value + 2 * value - 5 / value;
}
LinkedList is very slow when accessing elements by a specific index, but it can iterate quite fast from one element to the next, so the Iterator returned by LinkedList makes use of that, while the Iterator returned by ArrayList simply accesses the internal array (without the repeated range-check calls you can see in the get method above).
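For completeness, a stream-based variant of the same loop could look like this (a sketch, assuming myArray is a List<Integer> as in the question):
myArray.stream()
    .mapToInt(Integer::intValue)
    .forEach(value -> {
        int result = value + 2 * value - 5 / value;
        // use result here
    });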
I have a function that processes vectors. The size of the input vector can be anything up to a few million elements. The problem is that the function can only process vectors of no more than about 100k elements without problems.
I would like to call the function on smaller parts if the vector has too many elements.
Vector<Stuff> process(Vector<Stuff> input) {
Vector<Stuff> output = new Vector<>();
while (true) {
if (input.size() > 50000) {
output.addAll(doStuff(input.pop_front_50k_first_ones_as_subvector()));
}
else {
output.addAll(doStuff(input));
break;
}
}
return output;
}
How should I do this?
Not sure if a Vector with millions of elements is a good idea, but Vector implements List, and thus there is subList which provides a lightweight (non-copy) view of a section of the Vector.
You may have to update your code to work with the interface List instead of only the specific implementation Vector, though (because the sublist returned is not a Vector, and it is just good practice in general).
You probably want to rewrite your doStuff method to take a List rather than a Vector argument,
public Collection<Output> doStuff(List<Stuff> v) {
// calculation
}
(and notice that Vector<T> is a List<T>)
and then change your process method to something like
Vector<Stuff> process(Vector<Stuff> input) {
    Vector<Stuff> output = new Vector<>();
    int startIdx = 0;
    while (startIdx < input.size()) {
        int endIdx = Math.min(startIdx + 50000, input.size());
        output.addAll(doStuff(input.subList(startIdx, endIdx)));
        startIdx = endIdx;
    }
    return output;
}
this should work as long as the "input" Vector isn't being concurrently updated during the running of the process method.
If you can't change the signature of doStuff, you're probably going to need to wrap a new Vector around the result of subList,
output.addAll(doStuff(new Vector<Stuff>(input.subList(startIdx, endIdx))));
What's the most efficient way to make an array of a given length, with each element containing its subscript?
Possible description with my dummy-level code:
/**
* The IndGen function returns an integer array with the specified dimensions.
*
* Each element of the returned integer array is set to the value of its
* one-dimensional subscript.
*
* @see Modeled on IDL's INDGEN function:
* http://idlastro.gsfc.nasa.gov/idl_html_help/INDGEN.html
*
* @param size
* @return int[size], each element set to value of its subscript
* @author you
*
* */
public int[] IndGen(int size) {
int[] result = new int[size];
for (int i = 0; i < size; i++) result[i] = i;
return result;
}
Other tips, such as doc style, welcome.
Edit
I've read elsewhere how inefficient a for loop is compared to other methods, as for example in Copying an Array:
Using clone: 93 ms
Using System.arraycopy: 110 ms
Using Arrays.copyOf: 187 ms
Using for loop: 422 ms
I've been impressed by the imaginative responses to some questions on this site, e.g., Display numbers from 1 to 100 without loops or conditions. Here's an answer that might suggest some methods:
public class To100 {
public static void main(String[] args) {
String set = new java.util.BitSet() {{ set(1, 100+1); }}.toString();
System.out.append(set, 1, set.length()-1);
}
}
If you're not up to tackling this challenging problem, no need to vent: just move on to the next unanswered question, one you can handle.
Since it's infeasible to use terabytes of memory at once, and especially to do any calculation with them simultaneously, you might consider using a generator. (You were probably planning to loop over the array, right?) With a generator, you don't need to initialize an array first (so you can start using it immediately) and almost no memory is used (O(1)).
I've included an example implementation below. It is bounded by the limitations of the long primitive.
import java.util.Iterator;
import java.util.NoSuchElementException;
public class Counter implements Iterator<Long> {
private long count;
private final long max;
public Counter(long start, long endInclusive) {
this.count = start;
this.max = endInclusive;
}
@Override
public boolean hasNext() {
return count <= max;
}
@Override
public Long next() {
if (this.hasNext())
return count++;
else
throw new NoSuchElementException();
}
@Override
public void remove() {
throw new UnsupportedOperationException();
}
}
Find a usage demonstration below.
Iterator<Long> i = new Counter(0, 50);
while (i.hasNext()) {
System.out.println(i.next()); // Prints 0 to 50
}
The only thing I can think of is using "++i" instead of "i++", but I think the Java compiler already has this optimization.
Other than that, this is pretty much the best algorithm there is.
You could make a class that acts as if it had an array but doesn't, and simply returns the number it is given (i.e. the identity function), but that's not what you've asked for.
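For what it's worth, a minimal sketch of that idea (the class name IndexView is made up for illustration):
// acts like an int[size] whose element i is always i, without storing anything
public class IndexView {
    private final int size;

    public IndexView(int size) {
        this.size = size;
    }

    public int get(int index) {
        if (index < 0 || index >= size)
            throw new ArrayIndexOutOfBoundsException(index);
        return index; // the identity function: element i is i
    }

    public int length() {
        return size;
    }
}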
As others have said in their answers, your code is already close to the most efficient I can think of, at least for small arrays. If you need to create those arrays many times and they are very big, instead of continuously iterating in a for loop you could create all the arrays once and then copy them. The copy operation will be faster than iterating over the array if the array is very big. It would be something like this (in this example for a maximum of 1000 elements):
public static int[][] cache = {{0},{0,1},{0,1,2},{0,1,2,3},{0,1,2,3,4}, ..., {0,1,2,...,998,999}};
Then, from the code where you need to create those arrays a lot of times, you would use something like this:
int[] arrayOf50Elements = Arrays.copyOf(cache[49], 50);
Note that this way you are using a lot of memory to improve the speed. I want to emphasize that this will only be worth the complication when you need to create those arrays a lot of times, the arrays are very big, and maximum speed is one of your requirements. In most of the situations I can think of, the solution you proposed will be the best one.
Edit: I've just seen the huge amount of data and memory you need. The approach I propose would require memory of the order of n^2, where n is the maximum integer you expect to have. In this case that's impractical, due to the monstrous amount of memory you would need. Forget about this. I leave the post because maybe it is useful for others.
I have some events, where each of them has a probability to happen, and a weight if they do. I want to create all possible combinations of probabilities of events, with the corresponding weights. In the end, I need them sorted in weight order. It is like generating a probability tree, but I only care about the resulting leaves, not which nodes it took to get them. I don't need to look up specific entries during the creation of the end result, just to create all the values and sort them by weight.
There will be only about 5-15 events, but since there are 2^n resulting possibilities with n events, and this is to be done very often, I don't want it to take an unnecessarily long time. Speed is much more important than the amount of storage used.
The solution I came up with works but is slow. Any idea for a quicker solution or some ideas for improvement?
class ProbWeight {
double prob;
double eventWeight;
public ProbWeight(double aProb, double aeventWeight) {
prob = aProb;
eventWeight = aeventWeight;
}
public ProbWeight(ProbWeight aCellProb) {
prob = aCellProb.getProb();
eventWeight = aCellProb.geteventWeight();
}
public double getProb(){
return prob;
}
public double geteventWeight(){
return eventWeight;
}
public void doesHappen(ProbWeight aProb) {
prob*=aProb.getProb();
eventWeight += aProb.geteventWeight();
}
public void doesNotHappen(ProbWeight aProb) {
prob*=(1-aProb.getProb());
}
}
//Data generation for testing
List<ProbWeight> dataList = new ArrayList<ProbWeight>();
for (int i =0; i<5; i++){
ProbWeight prob = new ProbWeight(Math.random(), 10*Math.random());
dataList.add(prob);
}
//The list where the results will end up
List<ProbWeight> resultingProbList = new ArrayList<ProbWeight>();
// a temporary list to avoid modifying a list while looping through it
List<ProbWeight> tempList = new ArrayList<ProbWeight>();
resultingProbList.add(dataList.remove(0));
for (ProbWeight data : dataList){ //for each event
//go through the already created event combinations and create two new for each
for(ProbWeight listed: resultingProbList){
ProbWeight firstPossibility = new ProbWeight(listed);
ProbWeight secondPossibility = new ProbWeight(listed);
firstPossibility.doesHappen(data);
secondPossibility.doesNotHappen(data);
tempList.add(firstPossibility);
tempList.add(secondPossibility);
}
resultingProbList = new ArrayList<ProbWeight>(tempList);
tempList.clear(); // clear before the next event, otherwise the old combinations are added again
}
// Then sort the list by weight using sort and a comparator
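The sort mentioned in the comment above could, for example, look like this (a sketch using the geteventWeight accessor defined in the class):
Collections.sort(resultingProbList, new Comparator<ProbWeight>() {
    @Override
    public int compare(ProbWeight a, ProbWeight b) {
        return Double.compare(a.geteventWeight(), b.geteventWeight());
    }
});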
It is 50% about choosing an appropriate data structure and 50% about the algorithm. Data structure - I believe TreeBidiMap will do the magic for you. You will need to implement 2 Comparators - 1 for the weight and another for the probability.
Algorithm - trivial.
Good luck!
Just a few tricks to try to speed up your code:
- try to avoid unnecessary object allocation
- try to use the right constructor for your collections: in your code sample it seems that you already know the size of the collections, so pass it as a parameter to the constructors to prevent useless resizing (and GC calls), as in the sketch below
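A minimal sketch of the pre-sizing tip (assuming at most 15 events, as stated in the question):
int maxEvents = 15; // upper bound on the number of events (from the question)
List<ProbWeight> resultingProbList = new ArrayList<ProbWeight>(1 << maxEvents);
List<ProbWeight> tempList = new ArrayList<ProbWeight>(1 << maxEvents);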
You may also try to use a sorted Set instead of a List so that the ordering happens on the fly.
HTH
jerome