Combinatorics algorithm parallelization - java

I'm writing a program that computes C(n, k) combinations where n and k can differ greatly (e.g. n=39, k=13 gives 8,122,425,444 combinations), and I need to run some calculations on every combination in real time. The question is how I can divide my algorithm across several threads to make it faster.
public void getCombinations(List<Item> items) {
    int n = items.size();
    int k = 13;
    // res holds the current combination as 1-based indices into items
    int[] res = new int[k];
    for (int i = 1; i <= k; i++) {
        res[i - 1] = i;
    }
    int p = k;
    while (p >= 1) {
        // here I make a Set from items in List by ids in res[]
        Set<Item> cards = convert(res, items);
        // some calculations
        if (res[k - 1] == n) {
            p--;
        } else {
            p = k;
        }
        if (p >= 1) {
            // advance to the next combination in lexicographic order
            for (int i = k; i >= p; i--) {
                res[i - 1] = res[p - 1] + i - p + 1;
            }
        }
    }
}

private Set<Item> convert(int[] res, List<Item> items) {
    Set<Item> set = new TreeSet<Item>();
    for (int i : res) {
        set.add(items.get(i - 1));
    }
    return set;
}

If you're using JDK 7, you could use fork/join to divide and conquer this algorithm.
If you want to keep things simple, I would give each thread a subset of the input to compute and use a CountDownLatch to wait until all threads have completed; a sketch follows below. The number of threads depends on your CPU.
You could also use Hadoop's map/reduce if you expect the input to grow, so you can compute across several computers. You will need to express the work as a map/reduce operation - have a look at the examples.
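For illustration, here is a minimal sketch of the thread-per-subset idea with a CountDownLatch. enumerateWithFixedFirst is a hypothetical helper that runs the question's loop with res[0] pinned to a given value; note that the per-range work is uneven, since combinations are not distributed uniformly over first indices:

import java.util.List;
import java.util.concurrent.CountDownLatch;

void getCombinationsParallel(final List<Item> items, final int k) throws InterruptedException {
    final int n = items.size();
    int threads = Runtime.getRuntime().availableProcessors();
    final CountDownLatch latch = new CountDownLatch(threads);
    for (int t = 0; t < threads; t++) {
        // this thread handles combinations whose first index lies in [lo, hi]
        final int lo = t * (n - k + 1) / threads + 1;
        final int hi = (t + 1) * (n - k + 1) / threads;
        new Thread(new Runnable() {
            public void run() {
                try {
                    for (int first = lo; first <= hi; first++) {
                        enumerateWithFixedFirst(items, k, first); // hypothetical worker
                    }
                } finally {
                    latch.countDown(); // signal completion even if the worker throws
                }
            }
        }).start();
    }
    latch.await(); // block until every worker has finished
}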

The simplest way to split combinations is to have combinations of combinations. ;)
For each possible "first" value you can create a new task in a thread pool. Or you can create a new task for each possible pair of "first" and "second" values, or each triple, etc. You only need to create as many tasks as you have CPUs, so you don't need to go overboard.
e.g. say you want to create all possible selections of 13 from 39 items.
for (Item item : items) {
    List<Item> items2 = new ArrayList<Item>(items);
    items2.remove(item);
    // create a task which considers all selections of 12 from 38 (plus item)
    createCombinationsOf(item, items2, 12);
}
This creates roughly equal work for 39 CPUs, which may be more than enough. If you want more tasks, create pairs (39*38/2 of them) instead.
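A sketch of how those per-item tasks might be fed to a pool, with createCombinationsOf as the hypothetical worker from the snippet above:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// one task per possible "first" item; the pool size caps actual parallelism
void submitTasks(final List<Item> items) throws InterruptedException {
    ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    for (final Item item : items) {
        final List<Item> items2 = new ArrayList<Item>(items);
        items2.remove(item);
        pool.execute(new Runnable() {
            public void run() {
                createCombinationsOf(item, items2, 12); // hypothetical worker
            }
        });
    }
    pool.shutdown();                         // no more tasks will be submitted
    pool.awaitTermination(1, TimeUnit.DAYS); // wait for all tasks to finish
}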

Your question is quite vague.
What problem are you having right now? Implementing the divide-and-conquer part of the algorithm (threading, joining, etc.), or figuring out how to divide the problem into its sub-parts?
The latter should be your first step. Do you know how to break your original problem into several smaller problems (that can then be dispatched to Executor threads or a similar mechanism to be processed), and how to join the results?

I have been working on some code that works with combinatoric sets of this size. Here are a few suggestions for getting output in a reasonable amount of time.
Instead of building a list of combinations and then processing them, write your program to take a rank for a combination. You can safely assign signed 64-bit long values to each combination for all k values up to n = 66. This lets you easily break the number system into ranges and assign them to different threads/hardware.
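For illustration, here is a minimal sketch of lexicographic unranking (my own code, not a specific library API). A worker assigned the rank interval [lo, hi) just loops over it, calling unrank for each rank:

// number of k-subsets of an n-set; the final result fits a long for n <= 66,
// but beware that intermediate products here can overflow for n near that
// limit - they are fine for C(39, 13)
static long binomial(int n, int k) {
    if (k < 0 || k > n) return 0;
    k = Math.min(k, n - k);
    long result = 1;
    for (int i = 1; i <= k; i++) {
        result = result * (n - k + i) / i; // stays integral at every step
    }
    return result;
}

// returns the rank-th k-combination of {0, ..., n-1} in lexicographic order
static int[] unrank(long rank, int n, int k) {
    int[] combo = new int[k];
    int element = 0;
    for (int i = 0; i < k; i++) {
        // skip whole blocks of combinations that start with smaller elements
        while (binomial(n - element - 1, k - i - 1) <= rank) {
            rank -= binomial(n - element - 1, k - i - 1);
            element++;
        }
        combo[i] = element++;
    }
    return combo;
}

For C(39, 13), splitting [0, 8122425444) into equal rank intervals gives every worker exactly the same number of combinations, unlike splitting by first element.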
If your computation is simple, you should look at using OpenCL or CUDA to do the work. There are a couple of options for doing this. Rootbeer and Aparapi let you stay in Java and have a library take care of the GPU details. JavaCL is a nice binding to OpenCL, if you don't mind writing kernels directly in C99. AWS has GPU instances for doing this type of work.
If you are going to collect a result for each combination, you are really going to need to consider storage space. For your example of C(39,13), you would need a little under 61 Gigs just to store a long for each combination. You need a good strategy for dealing with datasets of this size.
If you are trying to roll up this data into a simple result for the entire set of combinations, then follow #algolicious' suggestion and look at map/reduce to solve this problem.
If you really need answers for each combination, but a little error is OK, you may want to look at using AI algorithms or a linear solver to compress the data. Be aware that these techniques will only work if there is something to learn in the resulting data.
If no error is acceptable but you still need every answer, consider just recomputing each result on demand from the combination's rank.

Related

More efficient alternative to these "for" loops?

I'm taking an introductory course in Java, and one of my latest projects involves making sure an array doesn't contain any duplicate elements (i.e. has distinct elements). I used a for loop with an inner for loop, and it works, but I've heard that you should try to avoid using many iterations in a program (and other methods in my classes have a fair number of iterations as well). Is there any efficient alternative to this code? I'm not asking for code of course, just "concepts." Would there potentially be a recursive way to do this? Thanks!
The array sizes are generally <= 10.
/** Iterates through a String array ARRAY to see if each element in ARRAY is
 *  distinct. Returns false if ARRAY contains duplicates. */
boolean distinctElements(String[] array) { //Efficient?
    for (int i = 0; i < array.length; i += 1) {
        for (int j = i + 1; j < array.length; j += 1) {
            if (array[i] == array[j]) {
                return false;
            }
        }
    }
    return true;
}
"Efficiency" is almost always a trade-off. Occasionally, there are algorithms that are simply better than others, but often they are only better in certain circumstances.
For example, the code above has time complexity O(n^2).
One improvement might be to sort the strings: you can then find duplicates by comparing each element with its neighbours. The time complexity is reduced to O(n log n), because the sorting dominates the linear comparison of neighbouring elements.
However - what if you don't want to change the elements of the array - for instance, some other bit of your code relies on them being in their original order? Now you have to copy the array, sort the copy, and then look for duplicates. This doesn't increase the overall time or storage complexity, but it does increase the actual time and storage, since more work is being done and more memory is required.
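A minimal sketch of that copy-sort-scan variant (assuming no null elements):

import java.util.Arrays;

// O(n log n): sort a copy so the original order is preserved;
// after sorting, any duplicates are adjacent.
static boolean distinctElements(String[] array) {
    String[] copy = array.clone();
    Arrays.sort(copy);
    for (int i = 1; i < copy.length; i++) {
        if (copy[i].equals(copy[i - 1])) {
            return false;
        }
    }
    return true;
}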
Big-oh notation only gives you an asymptotic bound, ignoring constant factors. Maybe you only have access to a really slow sorting algorithm: then it may actually turn out to be faster just to use your O(n^2) loops, because you avoid invoking the very slow sort.
This could be the case when you have very small inputs. An oft-cited example of an algorithm that has poor time complexity but actually is useful in practice is Bubble Sort: it's O(n^2) in the worst case, but if you have a small and/or nearly-sorted array, it can actually be pretty darn fast, and pretty darn simple to implement - never forget the inefficiency of you having to write and debug the code, and to have to ask questions on SO when it doesn't work as you expect.
What if you know that the elements are already sorted because you know something about their source? Now you can simply iterate through the array, comparing neighbours, and the time complexity is down to O(n). I can't remember where I read it, but I once saw a blog post saying (I paraphrase):
A given computer can never be made to go quicker; it can only ever do less work.
If you can exploit some property to do less work, that improves your efficiency.
So, efficiency is a subjective criterion:
Whenever you ask "is this efficient", you have to be able to answer the question: "efficient with respect to what?". It might be space; it might be time; it might be how long it takes you to write the code.
You have to know the constraints of the hardware that you're going to run it on - memory, disk, network requirements etc may influence your choices.
You need to know the requirements of the user on whose behalf you are running it. One user might want the results as soon as possible; another user might want the results tomorrow. There is never a need to find a solution better than "good enough" (although that can be a moving goal once the user sees what is possible).
You also have to know what inputs you want it to be efficient for, and what properties of that input you can exploit to avoid unnecessary work.
First, array[i] == array[j] tests reference equality. That's not how you test Strings for value equality; use equals instead.
I would add each element to a Set. If any element isn't successfully added (because it's a duplicate), Set.add(E) returns false. Something like,
static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>();
    for (String str : array) {
        if (!set.add(str)) {
            return false;
        }
    }
    return true;
}
You could render the above without a short-circuit, like

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>(Arrays.asList(array));
    return set.size() == array.length;
}

library for integer factorization in java or scala

There are a lot of questions about how to implement factorization; however, for production use, I would rather use an open-source library to get something efficient and well tested right away.
The method I am looking for looks like this:
static int[] getPrimeFactors(int n)
It would return {2, 2, 3} for n=12.
A library may also have an overload for handling long or even BigInteger types.
The question is not about a particular application; it is about having a library which handles this problem well. Many people argue that different implementations are needed depending on the range of the numbers; in this regard, I would expect the library to select the most reasonable method at runtime.
By efficient I don't mean "world's fastest" (I would not work on the JVM for that...), I just mean handling the int and long range within a second rather than an hour.
It depends what you want to do. If your needs are modest (say, you want to solve Project Euler problems), a simple implementation of Pollard's rho algorithm will find factors up to ten or twelve digits instantly; if that's what you want, let me know, and I can post some code. If you want a more powerful factoring program written in Java, you can look at the source code behind Dario Alpern's applet; I don't know about a test suite, and it's really not designed with an open API, but it does have lots of users and is well tested. Most of the heavy-duty open-source factoring programs are written in C or C++ and use the GMP big-integer library, but you may be able to access them via your language's foreign function interface; look for names like gmp-ecm, msieve, pari or yafu. If those don't satisfy you, a good place to ask for more help is the Mersenne Forum.
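To give a flavour of the simple end of that spectrum, here is a minimal Pollard's rho sketch (my own illustration, not code from any of the libraries mentioned), using BigInteger so the squaring cannot overflow:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class Factorizer {
    private static final BigInteger ONE = BigInteger.ONE;
    private static final BigInteger TWO = BigInteger.valueOf(2);

    public static List<BigInteger> getPrimeFactors(BigInteger n) {
        List<BigInteger> factors = new ArrayList<BigInteger>();
        factorInto(n, factors);
        return factors;
    }

    private static void factorInto(BigInteger n, List<BigInteger> factors) {
        if (n.compareTo(ONE) <= 0) return;
        if (n.isProbablePrime(40)) { // Miller-Rabin, error probability < 2^-40
            factors.add(n);
            return;
        }
        BigInteger d = rho(n); // n is composite, so split it and recurse
        factorInto(d, factors);
        factorInto(n.divide(d), factors);
    }

    // Floyd-cycle variant of Pollard's rho: a non-trivial factor of composite n
    private static BigInteger rho(BigInteger n) {
        if (!n.testBit(0)) return TWO; // even
        for (BigInteger c = ONE; ; c = c.add(ONE)) { // new constant on failure
            BigInteger x = TWO, y = TWO, d = ONE;
            while (d.equals(ONE)) {
                x = step(x, c, n);
                y = step(step(y, c, n), c, n); // y moves twice as fast as x
                d = x.subtract(y).abs().gcd(n);
            }
            if (!d.equals(n)) return d;
        }
    }

    private static BigInteger step(BigInteger x, BigInteger c, BigInteger n) {
        return x.multiply(x).add(c).mod(n); // x -> x^2 + c (mod n)
    }
}

getPrimeFactors(BigInteger.valueOf(12)) yields [2, 2, 3] (the order is not guaranteed in general, so sort if it matters).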
If you want to solve your problem, rather than get exactly what you are asking for, you want a table. You can precompute it using silly slow methods, store it, and then look up the factors of any number in microseconds. In particular, you want a table where the smallest factor of each number is listed at the index corresponding to that number - much more memory efficient if you use trial division to remove a few of the smallest primes first - and then you walk your way down the table until you hit a 1 (meaning no more divisors; what you have left is prime). This takes only two bytes per table entry, which means you can store everything on any modern machine heftier than a smartphone.
I can demonstrate how to create this if you're interested, and show how to check that it is correct with greater reliability than you could hope to achieve from an active community and unit tests of a complex algorithm (unless you ran the algorithm to generate this table and verified that it was all OK).
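A minimal sketch of the table approach (using a full int per entry for clarity; the two-bytes-per-entry packing described above is a further optimization):

import java.util.ArrayList;
import java.util.List;

// spf[i] holds the smallest prime factor of i (0 for i < 2)
static int[] buildSpfTable(int limit) {
    int[] spf = new int[limit + 1];
    for (int i = 2; i <= limit; i++) {
        if (spf[i] == 0) { // i is prime: claim every multiple not yet claimed
            for (long j = i; j <= limit; j += i) {
                if (spf[(int) j] == 0) spf[(int) j] = i;
            }
        }
    }
    return spf;
}

// full factorization by repeatedly dividing out the smallest factor
static List<Integer> getPrimeFactors(int n, int[] spf) {
    List<Integer> factors = new ArrayList<Integer>();
    while (n > 1) {
        factors.add(spf[n]);
        n /= spf[n];
    }
    return factors;
}

With the table built, getPrimeFactors(12, spf) returns [2, 2, 3], and each lookup costs only as many divisions as the number has prime factors.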
I need them for testing whether a polynomial is primitive or not.
Checking whether the gcd is 1 directly, as below, is faster than trying to find the factors of all the numbers.
public static boolean gcdIsOne(int[] nums) {
    int smallest = Integer.MAX_VALUE;
    for (int num : nums) {
        if (num > 0 && smallest > num) // was "smallest < num", which picked the largest
            smallest = num;
    }
    // any common divisor must also divide the smallest number, so i <= smallest;
    // the original bound i * i <= smallest missed common primes above
    // sqrt(smallest), e.g. {7, 14}
    OUTER:
    for (int i = 2; i <= smallest; i = (i == 2 ? 3 : i + 2)) {
        for (int num : nums) {
            if (num % i != 0)
                continue OUTER;
        }
        return false; // i divides every number, so the gcd is greater than 1
    }
    return true;
}
I tried this function in Scala. Here is my result:

def getPrimeFactors(i: Int) = {
    def loop(i: Int, mod: Int, primes: List[Int]): List[Int] = {
        if (i < 2) primes // i == 1 here, which means we are done
        else {
            if (i % mod == 0) loop(i / mod, mod, mod :: primes)
            else loop(i, mod + 1, primes)
        }
    }
    loop(i, 2, Nil).reverse
}
I tried to keep it as functional as possible.
if (i % mod == 0) loop(i / mod, mod, mod :: primes) checks whether we found a divisor. If we did, we add it to primes and divide i by mod.
If we did not find a new divisor, we just increase the divisor.
loop(i, 2, Nil).reverse starts the recursion and orders the result in increasing order.
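For example, getPrimeFactors(12) evaluates to List(2, 2, 3), matching the example in the question. Every divisor this loop emits is necessarily prime, because all smaller factors have already been divided out of i by the time mod reaches them.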

Solving 3D knapsack using bruteforce in Java

I want to solve a 3-dimensional knapsack problem.
I have a number of boxes, each with a width, height, length and value. I have a specified space, and I want to place boxes into that space such that I get the optimal profit. I would like to do it using brute force.
I'm programming in Java.
I tried to do it with recursion, like so:

public void solveBruteforce(double freeX, double freeY, double freeZ) {
    for (int i = 0; i < numOfBoxes; i++) {
        for (int j = 0; j < BoxObject.numOfVariations; j++) {
            if (possible to place box) {
                place(box);
                add(value);
                solveBruteforce(newX, newY, newZ);
                remove(box);  // backtrack before trying the next placement
                remove(value);
            }
        }
    }
}

But I run into the problem that each placement leaves different free x, y and z dimensions.
Could someone help me to find another way to do it?
First thing is, use an octree to keep track of where things are in the space. An octree is a 3D tree in which every node splits its cube into eight octants, with occupancy flags at every node, dividing your space into a structure that is efficient to search over. This is useful if you want to use some kind of heuristic search to place the boxes, and even if you are trying all possibilities, since it can short-circuit forbidden (crowded) placements.
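A minimal occupancy-octree sketch (my own illustration, not code from this answer; real implementations add node pooling and merge fully occupied children). Boxes are axis-aligned [min, max) regions, and a depth cap conservatively marks tiny straddling cells as occupied:

class Octree {
    static class Box {
        final double x0, y0, z0, x1, y1, z1;
        Box(double x0, double y0, double z0, double x1, double y1, double z1) {
            this.x0 = x0; this.y0 = y0; this.z0 = z0;
            this.x1 = x1; this.y1 = y1; this.z1 = z1;
        }
    }

    private static final double MIN_CELL = 1e-3; // depth cap

    private final double x, y, z, size; // min corner and edge length of this cube
    private boolean full;               // whole cube known to be occupied
    private Octree[] children;          // null until subdivided

    Octree(double x, double y, double z, double size) {
        this.x = x; this.y = y; this.z = z; this.size = size;
    }

    // does b miss this cube entirely?
    private boolean disjoint(Box b) {
        return b.x1 <= x || b.x0 >= x + size
            || b.y1 <= y || b.y0 >= y + size
            || b.z1 <= z || b.z0 >= z + size;
    }

    // does b cover this cube completely?
    private boolean covers(Box b) {
        return b.x0 <= x && b.y0 <= y && b.z0 <= z
            && b.x1 >= x + size && b.y1 >= y + size && b.z1 >= z + size;
    }

    void occupy(Box b) {
        if (full || disjoint(b)) return;
        if (covers(b) || size <= MIN_CELL) { // conservative at the depth cap
            full = true;
            children = null;
            return;
        }
        if (children == null) subdivide();
        for (Octree child : children) child.occupy(b);
    }

    boolean overlapsOccupied(Box b) {
        if (disjoint(b)) return false;
        if (full) return true;
        if (children == null) return false; // untouched, hence empty
        for (Octree child : children)
            if (child.overlapsOccupied(b)) return true;
        return false;
    }

    private void subdivide() {
        double h = size / 2;
        children = new Octree[8];
        for (int c = 0; c < 8; c++) {
            children[c] = new Octree(
                x + ((c & 1) != 0 ? h : 0),
                y + ((c & 2) != 0 ? h : 0),
                z + ((c & 4) != 0 ? h : 0), h);
        }
    }
}

Placing a box then reduces to one overlapsOccupied check followed by occupy.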
Brute force will take a long time. But if that's what you want, you need to define an ordering for trying out permutations of placements.
Since you will need very many iterations, recursion is not so great: you risk a stack overflow.
A first-draft alternative would involve a greedy algorithm: take the box that maximizes your profit (say, the largest), place it, then take the next largest box and find the best fit for that, and so on.
But, say you wanted to try all possible combinations:
def maximize_profit(boxes, space):
    max_profit = 0
    best_fits = list()
    while Arranger.hasNext():
        a_fit, a_profit = Arranger.next(boxes, space)
        if a_profit == max_profit:
            best_fits.append(a_fit)
        elif a_profit > max_profit:
            max_profit = a_profit
            best_fits = [a_fit]  # was [a_profit]: store the fit, not the profit
    return best_fits, max_profit
For ideas on how to define the Arranger, think about choosing #{box} slots from #{space} possibilities, skipping arrangements that are identical up to symmetry. Alternatively, a "flood fill" method may give you ideas.

Representing a 100K X 100K matrix in Java

How can I store a 100K x 100K matrix in Java?
I can't do that with a normal array declaration, as it throws a java.lang.OutOfMemoryError.
The Colt library has a sparse matrix implementation for Java.
You could alternatively use Berkeley DB as your storage engine.
Now if your machine has enough actual RAM (at least 9 gigabytes free), you can increase the heap size in the Java command-line.
If the vast majority of entries in your matrix will be zero (or even some other constant value) a sparse matrix will be suitable. Otherwise it might be possible to rewrite your algorithm so that the whole matrix doesn't exist simultaneously. You could produce and consume one row at a time, for example.
Sounds like you need a sparse matrix. Others have already suggested good 3rd party implementations that may suit your needs...
Depending on your application, you could get away without a third-party matrix library by just using a Map as a backing store for your matrix data. Kind of like this...
public class SparseMatrix<T> {
    private T defaultValue;
    private int m;
    private int n;
    // keys are (long) row * n + col; a long is needed because
    // 100,000 * 100,000 overflows an int
    private Map<Long, T> data = new TreeMap<Long, T>();

    /// create a new matrix with m rows and n columns
    public SparseMatrix(int m, int n, T defaultValue) {
        this.m = m;
        this.n = n;
        this.defaultValue = defaultValue;
    }

    /// set value at [i,j] (row, col)
    public void setValueAt(int i, int j, T value) {
        if (i >= m || j >= n || i < 0 || j < 0)
            throw new IllegalArgumentException(
                    "index (" + i + ", " + j + ") out of bounds");
        data.put((long) i * n + j, value);
    }

    /// retrieve value at [i,j] (row, col)
    public T getValueAt(int i, int j) {
        if (i >= m || j >= n || i < 0 || j < 0)
            throw new IllegalArgumentException(
                    "index (" + i + ", " + j + ") out of bounds");
        T value = data.get((long) i * n + j);
        return value != null ? value : defaultValue;
    }
}
A simple test case illustrating the SparseMatrix's use would be:

public class SparseMatrixTest extends TestCase {
    public void testMatrix() {
        SparseMatrix<Float> matrix =
                new SparseMatrix<Float>(100000, 100000, 0.0F);
        matrix.setValueAt(1000, 1001, 42.0F);
        assertTrue(matrix.getValueAt(1000, 1001) == 42.0);
        assertTrue(matrix.getValueAt(1001, 1000) == 0.0);
    }
}
This is not the most efficient way of doing it, because every non-default entry in the matrix is stored as an Object. Depending on the number of actual values you are expecting, the simplicity of this approach might trump integrating a 3rd-party solution (and possibly dealing with its license - again, depending on your situation).
Adding matrix operations like multiplication to the above SparseMatrix implementation should be straightforward (and is left as an exercise for the reader ;-)
100,000 x 100,000 = 10,000,000,000 (10 billion) entries. Even if you're storing single-byte entries, that's still in the vicinity of 10 GB - does your machine even have that much physical memory, let alone a willingness to allocate that much to a single process?
Chances are you're going to need to look into some kind of a way to only keep part of the matrix in memory at any given time, and the rest buffered on disk.
There are a number of possible solutions, depending on how much memory you have, how sparse the array actually is, and what the access patterns are going to be.
If 100K * 100K * 8 bytes (roughly 80 GB) is less than the amount of physical memory your machine can give the JVM, a simple non-sparse array is a viable solution.
If the array is sparse, with (say) 75% or more of the elements being zero, then you can save space by using a sparse array library. Various alternatives have been suggested, but in all cases, you still need to work out whether this will give you enough savings. Figure out how many non-zero elements there are going to be, multiply that by 8 (to give you doubles) and by (say) 4 to account for the overheads of the sparse array. If that is less than the amount of physical memory that you can make available to the JVM, then sparse arrays are a viable solution.
If sparse and non-sparse arrays (in memory) won't work, things will get more complicated, and the viability of any solution will depend on the access patterns for the array data.
One approach is to represent the array as a file that is mapped into memory in the form of a MappedByteBuffer (a sketch follows this list of approaches). Assuming that you don't have enough physical memory to store the entire file in memory, you are going to be hitting the virtual memory system hard. So it is best if your algorithm only needs to operate on contiguous sections of the array at any time. Otherwise, you'll probably die from swapping.
A second approach is a variation of the first. Map the array/file a section at a time, and when you are done, unmap and move to the next section. This only works if the algorithm works on the array in sections.
A third approach is to represent the array using a light-weight database like BDB. This will be slower than any in-memory solution because reading array elements will translate into disc accesses. But if you get it wrong it won't kill the system like the memory mapped approach will. (And if you do this on Linux/Unix, the system's disc block cache may speed things up, depending on your algorithm's array access patterns)
A fourth approach is to use a distributed memory cache. This replaces disc i/o with network i/o, and it is hard to say whether this is a good or bad thing.
A fifth approach is to analyze your algorithm and see if it is amenable to implementing as a distributed algorithm; e.g. with sections of the array and corresponding parts of the algorithm on different machines.
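To make the memory-mapped approach (the first one above) concrete, here is a minimal sketch (my own illustration; the segment size is arbitrary and error handling is omitted). A single MappedByteBuffer can map at most 2 GB, so the matrix is split across several mappings:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Dense double matrix backed by a file on disk, mapped in 1 GB segments.
public class FileBackedMatrix {
    private static final int DOUBLES_PER_SEGMENT = 1 << 27; // 2^27 doubles = 1 GB
    private final MappedByteBuffer[] segments;
    private final int n; // number of columns

    public FileBackedMatrix(String path, int m, int n) throws Exception {
        this.n = n;
        long totalDoubles = (long) m * n;
        int count = (int) ((totalDoubles + DOUBLES_PER_SEGMENT - 1) / DOUBLES_PER_SEGMENT);
        segments = new MappedByteBuffer[count];
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            FileChannel channel = file.getChannel();
            for (int s = 0; s < count; s++) {
                long offset = (long) s * DOUBLES_PER_SEGMENT * 8;
                long bytes = Math.min((long) DOUBLES_PER_SEGMENT * 8, totalDoubles * 8 - offset);
                // mapping beyond the current end of file grows the file
                segments[s] = channel.map(FileChannel.MapMode.READ_WRITE, offset, bytes);
            }
        }
    }

    public double get(int i, int j) {
        long index = (long) i * n + j;
        return segments[(int) (index / DOUBLES_PER_SEGMENT)]
                .getDouble(((int) (index % DOUBLES_PER_SEGMENT)) * 8);
    }

    public void set(int i, int j, double value) {
        long index = (long) i * n + j;
        segments[(int) (index / DOUBLES_PER_SEGMENT)]
                .putDouble(((int) (index % DOUBLES_PER_SEGMENT)) * 8, value);
    }
}

As the surrounding answers note, performance is dominated by the access pattern: row-major scans touch each mapped page once, while random access will thrash.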
You could upgrade to this machine:
http://www.azulsystems.com/products/compute_appliance.htm
864 processor cores and 768 GB of memory - it only costs about as much as a single-family house (somewhere).
Well, I'd suggest that you increase the memory available to your JVM, but you're going to need a lot of memory, as you're talking about 10 billion items. It's (barely) possible with lots of memory or a clustered JVM, but that's probably the wrong answer.
You're getting the OutOfMemoryError because when you declare an array such as int[1000], the memory is allocated immediately (and doubles take up more space than ints, so an int representation will also save you space). Maybe you can substitute a more efficient implementation of your array (if you have many empty entries, look up "sparse matrix" representations).
You could store pieces in an outside system, like memcached or memory-mapped buffers.
There are lots of good suggestions here; maybe if you posted a more detailed description of the problem you're trying to solve, people could be more specific.
You should try an "external" package to handle matrices - I've never done that myself, but maybe something like JAMA.
Unless you have 100K x 100K x 8 ~ 80GB of memory, you cannot create this matrix in memory. You can create this matrix on disk and access it using memory mapping. However, using this approach will be very slow.
What are you trying to do? You may find that representing your data in a different way will be much more efficient.

Divide a list of numbers into smaller list with "sum" approximately same

I execute around 2000 tests on a grid, each test being run as a separate task on the grid. The tests have a rather big startup time. Total execution takes 500 hours of compute but finishes in less than 10 hours on a 60-node SunGridEngine. Runtimes of the tests vary from 5 minutes to 90 minutes. Combining tests without much intelligence gave some performance gain. I would like to create "tasks" that are of approximately equal size. How can I do so?
(What we do now: sort all tests and keep adding to a task until the sum of execution times is approximately 5 hours. Looking for something better.)
Doing this optimally is NP-complete. This is a variation of the partition problem, which is a special case of the subset sum problem, which is itself a special case of the knapsack problem.
In your case you probably don't need an exact solution, so you can probably use some heuristics to get something "good enough" in a reasonable amount of time. See the Methods section of the partition problem page for a description of some approaches.
What you are looking for is the partition problem for k sets.
There is some literature about k=3, called the 3-partition problem. This is NP-complete in the strong sense.
There are many heuristics that should give an approximate result quickly.
I suggest you start here: http://en.wikipedia.org/wiki/Partition_problem
Hope this helps.
This is a version of the subset-sum problem, and is NP-complete. Your best bet is to employ some subset-sum heuristics.
Your problem sounds a little like a shop scheduling problem. There are all kinds of different sequencing approaches, some of which are described here. Sorting in increasing order of processing time, for instance, will minimize the mean waiting time and a whole bunch of other measures. If you elaborate a bit more on the objective, the setup times, the processing times, and any interdependence, that would help.
Looking at the links Laurence posted, I thought I would try whipping something up. The algorithm is to assign the longest test to the shortest task list (repeat until all the tests are assigned). Using your numbers and random test times, the standard deviation across task lists was pretty low, less than 2 minutes over several runs (code in C#, but nothing that wouldn't be trivial to convert):
private static void BuildJobs()
{
    // assumes a min-heap PriorityQueue<Task>, ordered by total task-list time
    PriorityQueue<Task> tasks = new PriorityQueue<Task>();

    // create a task list for each node
    for (int i = 0; i < 60; i++)
    {
        Task t = new Task();
        tasks.Enqueue(t);
    }

    // get the list of tests, in order from longest to shortest
    int[] testList = new int[2000];
    for (int i = 0; i < testList.Length; i++)
    {
        testList[i] = random.Next(5, 90);
    }
    Array.Sort<int>(testList);
    Array.Reverse(testList);

    // add the longest remaining test to the currently shortest task list
    foreach (int time in testList)
    {
        Task t = tasks.Dequeue();
        t.addTest(time);
        tasks.Enqueue(t);
    }

    Debug.WriteLine(CalculateStdDev(tasks));
}
