Bug: parameter 'initialCapacity' of ConcurrentHashMap's construct method?

One of the constructors of java.util.concurrent.ConcurrentHashMap:
public ConcurrentHashMap(int initialCapacity) {
if (initialCapacity < 0)
throw new IllegalArgumentException();
int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
MAXIMUM_CAPACITY :
tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
this.sizeCtl = cap;
}
What does the argument passed to 'tableSizeFor(...)' mean?
initialCapacity + (initialCapacity >>> 1) + 1
I think the argument should be:
(int)(1.0 + (long)initialCapacity / LOAD_FACTOR)
or just:
initialCapacity
I think the expression is wrong, or at least a bug. Did I misunderstand something?
I sent a bug report to OpenJDK, and it seems they have officially confirmed that it is most likely a bug: https://bugs.openjdk.java.net/browse/JDK-8202422
Update: Doug Lea commented on the bug; it seems he agrees that it is a bug.

I strongly suppose it’s an optimization trick.
You’re on to the correct thought. The constructor you cite uses the default load factor of 0.75, so to accommodate initialCapacity elements the hash table size needs to be at least
initialCapacity / 0.75
(roughly the same as multiplying by 1.3333333333). However, floating-point divisions are somewhat expensive (only slightly so), and we would additionally need to round up to an integer. I guess that an integer division would already help:
(initialCapacity * 4 + 2) / 3
(the + 2 is for making sure that the result is rounded up; the * 4 ought to be cheap since it can be implemented as a left shift). The implementors do even better: shifts are a lot cheaper than divisions.
initialCapacity + (initialCapacity >>> 1) + 1
This is really multiplying by 1.5, so it gives a result that will often be greater than needed, but it’s fast. The + 1 compensates for the fact that the “multiplication” rounded down.
Details: the >>> is an unsigned right shift, filling a zero into the leftmost position. Already knowing that initialCapacity was non-negative this gives the same result as a division by 2, ignoring the remainder.
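To make the shift detail concrete, here is a minimal sketch. The class and method names (UnsignedShiftDemo, halfByShift) are made up for illustration and are not part of the JDK source.

```java
class UnsignedShiftDemo {
    // For non-negative n, an unsigned right shift by one equals n / 2.
    static int halfByShift(int n) {
        return n >>> 1;
    }

    public static void main(String[] args) {
        System.out.println(halfByShift(7)); // 3, same as 7 / 2
        System.out.println(7 / 2);          // 3
        System.out.println(-8 >> 1);        // -4, the signed shift keeps the sign bit
        System.out.println(-8 >>> 1);       // 2147483644, the unsigned shift fills in a zero
    }
}
```

The two shifts only differ for negative inputs, which the constructor has already rejected by the time the expression is evaluated.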
Edit: I may add that tableSizeFor rounds up to a power of 2, so most often the same power of 2 will be the final result even when the first calculation gave a slightly greater result than needed. For example, if you ask for capacity for 10 elements (to keep the calculation simple), table size 14 would be enough, where the formula yields 16. But the 14 would be rounded up to a power of 2, so we get 16 anyway, so in the end there is no difference. If you asked for room for 12 elements, size 16 would still suffice, but the formula yields 19, which is then rounded up to 32. This is the more unusual case.
Further edit: Thank you for the information in the comments that you have submitted this as a JDK bug and for providing the link: https://bugs.openjdk.java.net/browse/JDK-8202422. The first comment by Martin Buchholz agrees with you:
Yes, there is a bug here. The one-arg constructor effectively uses a
load-factor of 2/3, not the documented default of 3/4…
I myself would not have considered this a bug unless you regard it as a bug that you occasionally get a greater capacity than you asked for. On the other hand you are right, of course (in your exemplarily terse bug report) that there is an inconsistency: You would expect new ConcurrentHashMap(22) and new ConcurrentHashMap(22, 0.75f, 1) to give the same result since the latter just gives the documented default load factor/table density; but the table sizes you get are 64 from the former and 32 from the latter.
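The numbers quoted in this answer can be reproduced with a small sketch. It assumes a stand-in helper, roundUpToPowerOfTwo, that only imitates what the JDK's private tableSizeFor does for small positive inputs; shiftFormula and exactFormula are likewise hypothetical names for the two expressions discussed above.

```java
class TableSizeDemo {
    // Stand-in for the JDK's private tableSizeFor (an assumption, valid for small positive n):
    // round n up to the next power of two.
    static int roundUpToPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return (highest == n) ? n : highest << 1;
    }

    // The expression used by the one-argument constructor.
    static int shiftFormula(int initialCapacity) {
        return initialCapacity + (initialCapacity >>> 1) + 1;
    }

    // Integer version of initialCapacity / 0.75, rounded up.
    static int exactFormula(int initialCapacity) {
        return (initialCapacity * 4 + 2) / 3;
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {10, 12, 22}) {
            System.out.println(capacity + ": "
                    + roundUpToPowerOfTwo(shiftFormula(capacity)) + " vs "
                    + roundUpToPowerOfTwo(exactFormula(capacity)));
        }
        // prints:
        // 10: 16 vs 16
        // 12: 32 vs 16
        // 22: 64 vs 32
    }
}
```

For 10 elements both formulas end at the same table size; for 12 and 22 the shift-based formula lands one power of two higher.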

The expression (int)(1.0 + (long)initialCapacity / LOAD_FACTOR) makes sense for HashMap, but not for ConcurrentHashMap, at least not in the same sense.
For HashMap, capacity is the number of buckets before a resize happens; for ConcurrentHashMap, it's the number of entries before a resize is performed.
Testing this is fairly easy:
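// counts how often the backing table changed its length
private static int currentResizeCalls = 0;
private static <K, V> void debugResize(Map<K, V> map, K key, V value) throws Throwable {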
Field table = map.getClass().getDeclaredField("table");
AccessibleObject.setAccessible(new Field[] { table }, true);
Object[] nodes = ((Object[]) table.get(map));
// first put
if (nodes == null) {
map.put(key, value);
return;
}
map.put(key, value);
Field field = map.getClass().getDeclaredField("table");
AccessibleObject.setAccessible(new Field[] { field }, true);
int x = ((Object[]) field.get(map)).length;
if (nodes.length != x) {
++currentResizeCalls;
}
}
public static void main(String[] args) throws Throwable {
// replace with new ConcurrentHashMap<>(1024) to see a different result
Map<Integer, Integer> map = new HashMap<>(1024);
for (int i = 0; i < 1024; ++i) {
debugResize(map, i, i);
}
System.out.println(currentResizeCalls);
}
For HashMap, a resize happened once; for ConcurrentHashMap, it didn't.
And growing by a factor of 1.5 is not a new thing at all; ArrayList has the same strategy.
As for the shifts, they are cheaper than ordinary arithmetic, and >>> is used because it is unsigned.

Related

Creating combinations of a BitSet

Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only bits which are set can be flipped, i.e. I only need combinations of the bits which are set.
E.g. BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combinations - 1100, 1000, 0100, 0000
I can think of a few solutions, e.g. I can take combinations of all 4 bits and then XOR the combinations with the original BitSet. But this would be very resource-intensive for large sparse BitSets, so I am looking for a more elegant solution.

It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
Set<BitSet> sets = new HashSet<>();
if (set.isEmpty()) {
sets.add(new BitSet(0));
return sets;
}
int head = set.nextSetBit(0);
BitSet rest = set.get(0, set.size());
rest.clear(head);
for (BitSet s : powerset(rest)) {
BitSet newSet = s.get(0, s.size());
newSet.set(head);
sets.add(newSet);
sets.add(s);
}
return sets;
}

You can perform the operation in a single linear pass instead of recursion, if you realize the integer numbers are a computer’s intrinsic variant of “on off” patterns and iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case, is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java’s builtin collections. If your source BitSet has 32 one bits, the 2³² possible combinations do not only require a hundred GB heap, but also a collection supporting 2³² elements, i.e. a size not representable as int.
So the code above terminates early if the number exceeds the capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it will fail with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
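The andNot check mentioned above could look roughly like the following sketch. The class name and the helpers isCombinationOf and bits are hypothetical; only BitSet.andNot, clone, set and isEmpty are actual JDK methods.

```java
import java.util.BitSet;

class BitSetSubsetCheck {
    // A candidate is one of the combinations if it has no bit set outside the source.
    static boolean isCombinationOf(BitSet candidate, BitSet source) {
        BitSet extra = (BitSet) candidate.clone(); // copy, so the candidate is not modified
        extra.andNot(source);                      // clear every bit that is also set in source
        return extra.isEmpty();                    // nothing left means candidate is a subset
    }

    // Small helper to build a BitSet from bit indices.
    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int index : indices) {
            set.set(index);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet source = bits(1, 3);
        System.out.println(isCombinationOf(bits(1), source));    // true
        System.out.println(isCombinationOf(bits(0, 1), source)); // false, bit 0 is not in source
    }
}
```

This answers the membership question in one pass over the words of the BitSet, without generating any of the combinations.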

Determinism of Java 8 streams

Motivation
I've just rewritten some 30 mostly trivial parsers and I need the new versions to behave exactly like the old ones. Therefore, I stored their example input files and some signature of the outputs produced by the old parsers for comparison with the new ones. This signature contains the counts of successfully parsed items, sums of some hash codes and up to 10 pseudo-randomly chosen items.
I thought this was a good idea as the equality of the hash code sums sort of guarantees that the outputs are exactly the same and the samples allow me to see what's wrong. I'm only using samples as otherwise it'd get really big.
The problem
Basically, given an unordered collection of strings, I want to get a list of up to 10 of them, so that when the collection changes a bit, I still get mostly the same samples in the same positions (the input is unordered, but the output is a list). This should work also when something is missing, so ideas like taking the 100th smallest element don't work.
ImmutableList<String> selectSome(Collection<String> list) {
if (list.isEmpty()) return ImmutableList.of();
return IntStream.range(1, 20)
.mapToObj(seed -> selectOne(list, seed))
.distinct()
.limit(10)
.collect(ImmutableList.toImmutableList());
}
So I start with numbers from 1 to 20 (so that after distinct I still most probably have my 10 samples), call a stateless deterministic function selectOne (defined below) returning one string which is maximal according to some funny criteria, remove duplicates, limit the result and collect it using Guava. All steps should be IMHO deterministic and "ordered", but I may be overlooking something. The other possibility would be that all my 30 new parsers are wrong, but this is improbable given that the hashes are correct. Moreover, the results of the parsing look correct.
String selectOne(Collection<String> list, int seed) {
// some boring mixing, definitely deterministic
for (int i=0; i<10; ++i) {
seed *= 123456789;
seed = Integer.rotateLeft(seed, 16);
}
// ensure seed is odd
seed = 2*seed + 1;
// first element is the candidate result
String result = list.iterator().next();
// the value is the hash code multiplied by the seed
// overflow is fine
int value = seed * result.hashCode();
// looking for s maximizing seed * s.hashCode()
for (final String s : list) {
final int v = seed * s.hashCode();
if (v < value) continue;
// tiebreaking by taking the bigger or smaller s
// this is needed for determinism
if (s.compareTo(result) * seed < 0) continue;
result = s;
value = v;
}
return result;
}
This sampling doesn't seem to work. I get a sequence like
"9224000", "9225000", "4165000", "9200000", "7923000", "8806000", ...
with one old parser and
"9224000", "9225000", "4165000", "3030000", "1731000", "8806000", ...
with a new one. Both results are perfectly repeatable. For other parsers, it looks very similar.
Is my usage of streams wrong? Do I have to add .sequential() or the like?
Update
Sorting the input collection has solved the problem:
ImmutableList<String> selectSome(Collection<String> collection) {
final List<String> list = Lists.newArrayList(collection);
Collections.sort(list);
.... as before
}
What's still missing is an explanation why.
The explanation
As stated in the answers, my tiebreaker was an all-breaker, as I failed to check for a tie. Something like
if (v==value && s.compareTo(result) < 0) continue;
works fine.
I hope that my confused question may be at least useful for someone looking for "consistent sampling". It wasn't really Java 8 related.
I should've used Guava ComparisonChain or better Java 8 arg max to avoid my stupid mistake:
String selectOne(Collection<String> list, int seed) {
.... as before
final int multiplier = 2*seed + 1;
return list.stream()
.max(Comparator.comparingInt(s -> multiplier * s.hashCode())
.thenComparing(s -> s)) // <--- FOOL-PROOF TIEBREAKER
.get();
}

The mistake is that your tiebreaker is not in fact breaking a tie. We should be selecting s when v > value, but instead we're falling back to compareTo(). This breaks comparison symmetry, making your algorithm dependent on encounter order.
As a bonus, here's a simple test case to reproduce the bug:
System.out.println(selectOne(Arrays.asList("1", "2"), 4)); // 1
System.out.println(selectOne(Arrays.asList("2", "1"), 4)); // 2

In selectOne you just want to select String s with max rank of value = seed * s.hashCode(); for that given seed.
The problem is with the "tiebreaking" line:
if (s.compareTo(result) * seed < 0) continue;
It is not deterministic: depending on the order of the elements, it skips different elements, so a change in element order changes the result.
Remove the tiebreaking if and the result will be insensitive to the order of elements in input list.
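As an alternative to removing the check entirely, the tie can be tested explicitly. The following is a minimal sketch under simplifying assumptions: the seed mixing from the question is left out, and the class name SelectOneDemo is made up.

```java
import java.util.Arrays;
import java.util.Collection;

class SelectOneDemo {
    // Picks the string maximizing multiplier * hashCode(); ties go to the greater string.
    // Simplification (assumption): the seed mixing loop from the question is omitted.
    static String selectOne(Collection<String> list, int seed) {
        int multiplier = 2 * seed + 1; // odd multiplier, overflow is fine
        String result = null;
        int value = 0;
        for (String s : list) {
            int v = multiplier * s.hashCode();
            boolean better = result == null
                    || v > value
                    || (v == value && s.compareTo(result) > 0); // tiebreak only on a real tie
            if (better) {
                result = s;
                value = v;
            }
        }
        return result; // null for an empty collection
    }

    public static void main(String[] args) {
        // The result no longer depends on the encounter order.
        System.out.println(selectOne(Arrays.asList("1", "2"), 4)
                .equals(selectOne(Arrays.asList("2", "1"), 4))); // true
        // "Aa" and "BB" share a hash code, so the tiebreaker decides.
        System.out.println(selectOne(Arrays.asList("Aa", "BB"), 4)); // BB
        System.out.println(selectOne(Arrays.asList("BB", "Aa"), 4)); // BB
    }
}
```

Because the comparison now defines a total order on the elements, any iteration order ends at the same maximum.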

Fast way to check if long integer is a cube (in Java)

I am writing a program in which I am required to check if certain large numbers (permutations of cubes) are cubic (equal to n^3 for some n).
At the moment I simply use the method
static boolean isCube(long input) {
double cubeRoot = Math.pow(input,1.0/3.0);
return Math.round(cubeRoot) == cubeRoot;
}
but this is very slow when working with large numbers (10+ digits). Is there a faster way to determine if integer numbers are cubes?

There are only 2^21 cubes that don't overflow a long (2^22 - 1 if you allow negative numbers), so you could just use a HashSet lookup.
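A rough sketch of the precomputed-table idea follows. Note that it swaps the HashSet for a sorted long[] with binary search, purely to keep the memory footprint small (about 16 MB); it also assumes non-negative input only, and the class name CubeTable is hypothetical.

```java
import java.util.Arrays;

class CubeTable {
    // n = 0 .. 2^21 - 1: every cube of these fits into a non-negative long.
    private static final int COUNT = 1 << 21;
    private static final long[] CUBES = new long[COUNT];

    static {
        for (int n = 0; n < COUNT; n++) {
            long v = n;
            CUBES[n] = v * v * v; // ascending, so the array is already sorted
        }
    }

    // Assumption: negative inputs are simply reported as "not a cube" here.
    static boolean isCube(long input) {
        return Arrays.binarySearch(CUBES, input) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isCube(27L)); // true
        System.out.println(isCube(28L)); // false
    }
}
```

Whether this beats a HashSet in practice depends on cache behaviour and would need measuring.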

The Hacker's Delight book has a short and fast function for integer cube roots which could be worth porting to 64bit longs, see below.
It appears that testing if a number is a perfect cube can be done faster than actually computing the cube root. Burningmath has a technique that uses the "digital root" (sum the digits; repeat until it's a single digit). If the digital root is 0, 1 or 8 (or 9, which is what the repeated digit sum produces when the number is divisible by 9), your number might be a perfect cube.
This method could be extremely valuable for your case of permuting (the digits of?) numbers. If you can rule out a number by its digital root, all permutations are also ruled out.
They also describe a technique based on the prime factors for checking perfect cubes. This looks most appropriate for mental arithmetic, as I think factoring is slower than cube-rooting on a computer.
Anyway, the digital root is quick to compute, and you even have your numbers as a string of digits to start with. You'll still need a divide-by-10 loop, but your starting point is the sum of digits of the input, not the whole number, so it won't be many divisions. (Integer division is about an order of magnitude slower than multiplication on current CPUs, but division by a compile-time-constant can be optimized to multiply+shift with a fixed-point inverse. Hopefully Java JIT compilers use that, too, and maybe even use it for runtime constants.)
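A minimal sketch of the digital-root pre-filter, assuming the input is available as a string of decimal digits. The class and method names are made up; a repeated digit sum of 9 is treated as equivalent to a remainder of 0 modulo 9.

```java
class DigitalRootFilter {
    // Sum the digits, then keep summing until a single digit remains.
    static int digitalRoot(String digits) {
        int sum = 0;
        for (int i = 0; i < digits.length(); i++) {
            sum += digits.charAt(i) - '0';
        }
        while (sum >= 10) {
            int next = 0;
            while (sum > 0) {
                next += sum % 10;
                sum /= 10;
            }
            sum = next;
        }
        return sum;
    }

    // false means "definitely not a cube"; true only means "could be one".
    static boolean mightBeCube(String digits) {
        int root = digitalRoot(digits);
        return root == 0 || root == 1 || root == 8 || root == 9;
    }

    public static void main(String[] args) {
        System.out.println(mightBeCube("125")); // true (125 = 5^3, digital root 8)
        System.out.println(mightBeCube("11"));  // false (digital root 2)
        System.out.println(mightBeCube("10"));  // true, a false positive the later checks must catch
    }
}
```

Since every permutation of the digits has the same digit sum, one rejected number rejects all of its permutations.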
This plus A. Webb's test (input % 819 -> search of a table of 45 entries) will rule out a lot of inputs as not possible perfect cubes.
IDK if binary search, linear search, or hash/set would be best.
These tests could be a front-end to David Eisenstat's idea of just storing the set of longs that are perfect cubes in a data structure that allows quick is-present checks. (e.g. HashSet). Yes, cache misses are expensive enough that at least the digital-root test is probably worth it before doing a HashSet lookup, maybe both.
You could use less memory on this idea by using it for a Bloom Filter instead of an exact set (David Ehrman's suggestion). This would give another candidate-rejection frontend to the full calculation. The Guava BloomFilter implementation requires a "funnel" function to translate objects to bytes, which in this case should be f(x)=x.
I suspect that Bloom filtering isn't going to be a big win over an exact HashSet check, since it requires multiple memory accesses. It's appropriate when you really can't afford the space for a full table, and what you're filtering out is something really expensive like a disk access.
The integer cube root function (below) is probably faster than a single cache miss. If the cbrt check is causing cache misses, then probably the rest of your code will suffer more cache misses too, when its data is evicted.
Math.SE had a question about this for perfect squares, but that was about squares, not cubes, so none of this came up. The answers there did discuss and avoid the problems in your method, though. >.<
There are several problems with your method:
The problem with using pow(x, 1./3) is that 1/3 does not have an exact representation in floating point, so you're not "really" getting the cube root. So use cbrt. It's highly unlikely to be slower, unless it has higher accuracy that comes with a time cost.
You're assuming Math.pow or Math.cbrt always return a value that's exactly an integer, and not 41.999999 or something. Java docs say:
The computed result must be within 1 ulp of the exact result.
This means your code might not work on a conforming Java implementation. Comparing floating point numbers for exactly equal is tricky business. What Every Computer Scientist Should Know About Floating-Point Arithmetic has much to say about floating point, but it's really long. (With good reason. It's easy to shoot yourself in the foot with floating point.) See also Comparing Floating Point Numbers, 2012 Edition, Bruce Dawson's series of FP articles.
I think it won't work for all long values. double can only precisely represent integers up to 2^53 (size of the mantissa in a 64bit IEEE double). Math.cbrt of integers that can't be represented exactly is even less likely to be an exact integer.
FP cube root, and then testing the resulting integer, avoids all the problems that the FP comparison introduced:
static boolean isCube(long input) {
double cubeRoot = Math.cbrt(input);
long intRoot = Math.round(cubeRoot);
return (intRoot*intRoot*intRoot) == input;
}
(After searching around, I see other people on other stackoverflow / stackexchange answers suggesting that integer-comparison method, too.)
If you need high performance, and you don't mind having a more complex function with more source code, then there are possibilities. For example, use a cube-root successive-approximation algorithm with integer math. If you eventually get to a point where n^3 < input < (n+1)^3, then input isn't a cube.
There's some discussion of methods on this math.SE question.
I'm not going to take the time to dig into integer cube-root algorithms in detail, as the cbrt part is probably not the main bottleneck. Probably input parsing and string->long conversion is a major part of your bottleneck.
Actually, I got curious. Turns out there is already an integer cube-root implementation available in Hacker's Delight (use / copying / distributing even without attribution is allowed. AFAICT, it's essentially public domain code.):
// Hacker's delight integer cube-root (for 32-bit integers, I think)
int icbrt1(unsigned x) {
int s;
unsigned y, b;
y = 0;
for (s = 30; s >= 0; s = s - 3) {
y = 2*y;
b = (3*y*(y + 1) + 1) << s;
if (x >= b) {
x = x - b;
y = y + 1;
}
}
return y;
}
That 30 looks like a magic number based on the number of bits in an int. Porting this to long would require testing. (Also note that this is C; Java has no unsigned type, so a Java port needs a signed type wide enough for the input, such as long.)
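A possible Java port for non-negative long values is sketched below. It assumes the starting shift is widened from 30 to 60 (the largest multiple of 3 below 63); this is an adaptation of the routine above, not code from the book, and the class name is hypothetical.

```java
class IntegerCubeRoot {
    // Floor of the cube root of a non-negative long, computed one result bit at a time.
    static long icbrt(long x) {
        if (x < 0) {
            throw new IllegalArgumentException("non-negative input expected");
        }
        long y = 0;
        for (int s = 60; s >= 0; s -= 3) {
            y = 2 * y;
            // b = ((y + 1)^3 - y^3) * 2^s, the amount the cube grows if the next bit is set
            long b = (3 * y * (y + 1) + 1) << s;
            if (x >= b) {
                x -= b;
                y++;
            }
        }
        return y;
    }

    static boolean isCube(long input) {
        long root = icbrt(input);
        return root * root * root == input;
    }

    public static void main(String[] args) {
        System.out.println(icbrt(27L));           // 3
        System.out.println(icbrt(26L));           // 2
        System.out.println(icbrt(Long.MAX_VALUE)); // 2097151
        System.out.println(isCube(1000000L));     // true
    }
}
```

The shift must stay a multiple of 3, because each iteration settles one bit of the root, which corresponds to three bits of the input.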
IDK if this is common knowledge among Java people, but the 32bit Windows JVM doesn't use the server JIT engine, and doesn't optimize your code as well.

You can first eliminate a large number of candidates by testing modulo given numbers. For example, a cube modulo the number 819 can only take on the following 45 values.
0 125 181 818 720 811 532 755 476
1 216 90 307 377 694 350 567 442
8 343 559 629 658 351 190 91 469
27 512 287 252 638 118 603 161 441
64 729 99 701 792 378 260 468 728
So, you could eliminate actually having to compute the cubic root in almost 95% of uniformly distributed cases.
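Instead of typing in the table, the residues can be generated at class-load time. This is a sketch with hypothetical names (Mod819Filter, mightBeCube, residueCount); it relies only on the fact that n^3 mod 819 depends on n mod 819.

```java
class Mod819Filter {
    private static final int MODULUS = 819;
    private static final boolean[] CUBE_RESIDUE = new boolean[MODULUS];

    static {
        // 818^3 is about 5.5 * 10^8, so the cube still fits into an int.
        for (int n = 0; n < MODULUS; n++) {
            CUBE_RESIDUE[(n * n * n) % MODULUS] = true;
        }
    }

    // Number of distinct residues a cube can have modulo 819.
    static int residueCount() {
        int count = 0;
        for (boolean possible : CUBE_RESIDUE) {
            if (possible) {
                count++;
            }
        }
        return count;
    }

    // false means "definitely not a cube"; true means the full check is still needed.
    static boolean mightBeCube(long input) {
        // floorMod keeps the index non-negative even for negative inputs
        return CUBE_RESIDUE[(int) Math.floorMod(input, (long) MODULUS)];
    }

    public static void main(String[] args) {
        System.out.println(residueCount()); // 45
        System.out.println(mightBeCube(27L)); // true
        System.out.println(mightBeCube(2L));  // false
    }
}
```

A boolean array indexed by the residue avoids the question of which search structure to use for the 45 values.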

The Hacker's Delight routine seems to work on long numbers if you just change int to long and 30 to 60. If you change 30 to 61 it does not seem to work.
I didn't really understand the program, so I made another version that seems to work in Java.
private static int cubeRoot(long n) {
final int MAX_POWER = 21;
int power = MAX_POWER;
long factor;
long root = 0;
long next, square, cube;
while (power >= 0) {
factor = 1 << power;
next = root + factor;
while (true) {
if (next > n) {
break;
}
if (n / next < next) {
break;
}
square = next * next;
if (n / square < next) {
break;
}
cube = square * next;
if (cube > n) {
break;
}
root = next;
next += factor;
}
--power;
}
return (int) root;
}

Please define "very slow". Here is a test program:
public static void main(String[] args) {
for (long v = 1; v > 0; v = v * 10) {
long start = System.nanoTime();
for (int i = 0; i < 100; i++)
isCube(v);
long end = System.nanoTime();
System.out.println(v + ": " + (end - start) + "ns");
}
}
static boolean isCube(long input) {
double cubeRoot = Math.pow(input,1.0/3.0);
return Math.round(cubeRoot) == cubeRoot;
}
Output is:
1: 290528ns
10: 46188ns
100: 45332ns
1000: 46188ns
10000: 46188ns
100000: 46473ns
1000000: 46188ns
10000000: 45048ns
100000000: 45048ns
1000000000: 44763ns
10000000000: 45048ns
100000000000: 44477ns
1000000000000: 45047ns
10000000000000: 46473ns
100000000000000: 47044ns
1000000000000000: 46188ns
10000000000000000: 65291ns
100000000000000000: 45047ns
1000000000000000000: 44477ns
I don't see a performance impact of "large" numbers.

Reduce time complexity of a program (in Java)?

This question is quite a long shot. It could take quite long, so if you haven't the time I understand.
Let me start by explaining what I want to achieve:
Me and some friends play this math game where we get 6 random numbers out of a pool of possible numbers: 1 to 10, 25, 50, 75 and 100. 6 numbers are chosen out of these and no duplicates are allowed. Then a goal number will be chosen in the range of [100, 999]. With the 6 aforementioned numbers, we can use only basic operations (addition, subtraction, multiplication and division) to reach the goal. Only integers are allowed and not all 6 integers are required to reach the solution.
An example: We start with the numbers 4,8,6,9,25,100 and need to find 328.
A possible solution would be: ((4 x 100) - (9 x 8)) = 400 - 72 = 328. With this, I have only used 4 out of the 6 initial numbers and none of the numbers have been used twice. This is a valid solution.
We don't always find a solution on our own, that's why I figured a program would be useful. I have written a program (in Java) which has been tested a few times throughout and it had worked. It did not always give all the possible solutions, but it worked within its own limitations. Now I've tried to expand it so all the solutions would show.
On to the main problem:
The program that I am trying to execute is running incredibly long. As in, I would let it run for 15 minutes and it doesn't look like it's anywhere near completion. So I thought about it and the options are indeed quite endless. I start with 6 numbers, I compare the first with the other 5, then the second with the other 5 and so on until I've done this 6 times (and each comparison I compare with every operator, so 4 times again). Out of the original one single state of 6 numbers, I now have 5 times 6 times 4 = 120 states (with 5 numbers each). All of these have to undergo the same ritual, so it's no wonder it's taking so long.
The program is actually too big to list here, so I will upload it for those interested:
http://www.speedyshare.com/ksT43/MathGame3.jar
(Click on the MathGame3.jar title right next to download)
Here's the general rundown on what happens:
-6 integers + goal number are initialized
-I use the class StateNumbers that are acting as game states
-> in this class the remaining numbers (initially the 6 starting numbers)
are kept as well as the evaluated expressions, for printing purposes
This method is where the main operations happen:
StateNumbers stateInProcess = getStates().remove(0);
ArrayList<Integer> remainingNumbers = stateInProcess.getRemainingNumbers();
for(int j = 0; j < remainingNumbers.size(); j++){
for(int i = 0; i < remainingNumbers.size(); i++){
for(Operator op : Operator.values()){ // Looping over different operators
if(i == j) continue;
...
}
}
}
I evaluate for the first element all the possible operations with all the remaining numbers for that state. I then check with a self written equals to see if it's already in the arraylist of states (which acts as a queue, but the order is not of importance). If it's not there, then the state will be added to the list and then I do the same for the other elements. After that I discard the state and pick another out of the growing list.
The list grows in size to 80k states in 10 minutes and grows slower and slower. That's because there is an increasing amount of states to compare to when I want to add a new state. It's making me wonder if comparing with other states to prevent duplicates is such a good idea.
The completion of this program is not really that important, but I'd like to see it as a learning experience. I'm not asking anyone to write the code for me, but a friendly suggestion on what I could have handled better would be very much appreciated. This means if you have something you'd like to mention about another aspect of the program, please do. I'm unsure if this is too much to ask for on this forum as most topics handle a specific part of a program. While my question is specific as well, the causes could be many.
EDIT: I'm not trying to find the fastest single solution, but every solution. So if I find a solution, my program will not stop. It will however try to ignore doubles like:
((4+5)*7) and (7*(5+4)). Only one of the two is accepted because the equals method for addition and multiplication does not care about the positioning of the operands.

It would probably be easier to write this using recursion, i.e. a depth-first search, as this would simplify the bookkeeping for intermediary states.
If you want to keep a breadth-first approach, make sure that the list of states supports efficient removal of the first element, i.e. use a java.util.Queue such as java.util.ArrayDeque. I mention this because the most frequently used List implementation (i.e. java.util.ArrayList) needs to copy its entire contents to remove the first element, which makes removing the first element very expensive if the list is large.
120 states (with 5 numbers each). All of these have to undergo the same ritual, so it's no wonder it's taking so long.
Actually, it is quite surprising that it would. After all, a 2GHz CPU performs 2 billion clock cycles per second. Even if checking a state were to take as many as 100 clock cycles, that would still mean 20 million states per second!
On the other hand, if I understand the rules of the game correctly, the set of candidate solutions is given by all orderings of the 6 numbers (of which there are 6! = 720), with one of 4 operators in the 5 spaces in between, and a defined evaluation order of the operators. That is, we have a total of 6! * 4^5 * 5! = 88 473 600 candidate solutions, so processing should complete in a couple of seconds.
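The count above can be double-checked with a few lines of arithmetic. The class and method names are made up for this sketch.

```java
class CandidateCount {
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    static long power(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    // orderings of the numbers * operator choices for the gaps * evaluation orders
    static long candidates(int numbers, int operators) {
        int gaps = numbers - 1;
        return factorial(numbers) * power(operators, gaps) * factorial(gaps);
    }

    public static void main(String[] args) {
        System.out.println(candidates(6, 4)); // 88473600
    }
}
```

At 20 million states per second, roughly 88 million candidates work out to a little over four seconds.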
PS: A full solution would probably not be very time-consuming to write, so if you wish, I can also post code - I just didn't want to spoil your learning experience.
Update: I have written the code. It was harder than I thought, as the requirement to find all solutions implies that we need to print a solution without unwinding the stack. I, therefore, kept the history for each state on the heap. After testing, I wasn't quite happy with the performance (about 10 seconds), so I added memoization, i.e. each set of numbers is only processed once. With that, the runtime dropped to about 3 seconds.
As Stackoverflow doesn't have a spoiler tag, I increased the indentation so you have to scroll right to see anything :-)
package katas.countdown;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
enum Operator {
plus("+", true),
minus("-", false),
multiply("*", true),
divide("/", false);
final String sign;
final boolean commutes;
Operator(String sign, boolean commutes) {
this.sign = sign;
this.commutes = commutes;
}
int apply(int left, int right) {
switch (this) {
case plus:
return left + right;
case minus:
return left - right;
case multiply:
return left * right;
case divide:
int mod = left % right;
if (mod == 0) {
return left / right;
} else {
throw new ArithmeticException();
}
}
throw new AssertionError(this);
}
@Override
public String toString() {
return sign;
}
}
class Expression implements Comparable<Expression> {
final int value;
Expression(int value) {
this.value = value;
}
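@Override
public int compareTo(Expression o) {
// Integer.compare avoids the overflow that value - o.value can hit for large or negative values
return Integer.compare(value, o.value);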
}
@Override
public int hashCode() {
return value;
}
@Override
public boolean equals(Object obj) {
return value == ((Expression) obj).value;
}
@Override
public String toString() {
return Integer.toString(value);
}
}
class OperationExpression extends Expression {
final Expression left;
final Operator operator;
final Expression right;
OperationExpression(Expression left, Operator operator, Expression right) {
super(operator.apply(left.value, right.value));
this.left = left;
this.operator = operator;
this.right = right;
}
@Override
public String toString() {
return "(" + left + " " + operator + " " + right + ")";
}
}
class State {
final Expression[] expressions;
State(int... numbers) {
expressions = new Expression[numbers.length];
for (int i = 0; i < numbers.length; i++) {
expressions[i] = new Expression(numbers[i]);
}
}
private State(Expression[] expressions) {
this.expressions = expressions;
}
/**
* @return a new state constructed by removing indices i and j, and adding expr instead
*/
State replace(int i, int j, Expression expr) {
Expression[] exprs = Arrays.copyOf(expressions, expressions.length - 1);
if (i < exprs.length) {
exprs[i] = expr;
if (j < exprs.length) {
exprs[j] = expressions[exprs.length];
}
} else {
exprs[j] = expr;
}
Arrays.sort(exprs);
return new State(exprs);
}
@Override
public boolean equals(Object obj) {
return Arrays.equals(expressions, ((State) obj).expressions);
}
public int hashCode() {
return Arrays.hashCode(expressions);
}
}
public class Solver {
final int goal;
Set<State> visited = new HashSet<>();
public Solver(int goal) {
this.goal = goal;
}
public void solve(State s) {
if (s.expressions.length > 1 && !visited.contains(s)) {
visited.add(s);
for (int i = 0; i < s.expressions.length; i++) {
for (int j = 0; j < s.expressions.length; j++) {
if (i != j) {
Expression left = s.expressions[i];
Expression right = s.expressions[j];
for (Operator op : Operator.values()) {
if (op.commutes && i > j) {
// no need to evaluate the same branch twice
continue;
}
try {
Expression expr = new OperationExpression(left, op, right);
if (expr.value == goal) {
System.out.println(expr);
} else {
solve(s.replace(i, j, expr));
}
} catch (ArithmeticException e) {
continue;
}
}
}
}
}
}
}
public static void main(String[] args) {
new Solver(812).solve(new State(75, 50, 2, 3, 8, 7));
}
}
As requested, each solution is reported only once (where two solutions are considered equal if their set of intermediary results is). Per Wikipedia description, not all numbers need to be used. However, there is a small bug left in that such solutions may be reported more than once.

What you're doing is basically a breadth-first search for a solution. This was also my initial idea when I saw the problem, but I would add a few things.
First, the main thing you're doing with your ArrayList is to remove elements from it and test if elements are already present. Since your range is small, I would use a separate HashSet, or BitSet for the second operation.
Second, and more to the point of your question, you could also add the final state to your initial points, and search backward as well. Since all your operations have inverses (addition and subtraction, multiplication and division), you can do this. With the Set idea above, you would effectively halve the number of states you need to visit (this trick is known as meet-in-the-middle).
Other small things would be:
Don't divide unless your resulting number is an integer
Don't add a number outside the range (so >999) into your set/queue
The total number of states is 999 (the number of integers between 1 and 999 inclusive), so you shouldn't really run into performance issues here. I'm thinking your biggest drain is that you're testing inclusion in an ArrayList which is O(n).
Hope this helps!
EDIT: Just noticed this. You say you check whether a number is already in the list, but then remove it. If you remove it, there's a good chance you're going to add it back again. Use a separate data structure (a Set works perfectly here) to store your visited states, and you should be fine.
EDIT 2: As per other answers and comments (thanks @kutschkem and @meriton), a proper Queue is better for popping elements (constant versus linear for ArrayList). In this case, you have too few states for it to be noticeable, but use either a LinkedList or ArrayDeque when you do a BFS.
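To make the visited-set and queue advice concrete, here is a minimal sketch. The original question's exact moves are not shown here, so the move set below (add, subtract, multiply by, or exactly divide by one of a few operands) and the names RangeBfs and shortestPath are assumptions for illustration only. What it demonstrates is the data-structure choice: a BitSet for O(1) visited checks and an ArrayDeque as the BFS queue.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Queue;

final class RangeBfs {
    static final int MAX = 999; // states are the integers 1..999, as in the answer

    /**
     * Returns the minimum number of moves from start to target, or -1 if unreachable.
     * Assumed moves (hypothetical): +, -, * and exact / by any of the given operands.
     * Assumes 1 <= start <= MAX and small operands.
     */
    static int shortestPath(int start, int target, int[] operands) {
        BitSet visited = new BitSet(MAX + 1);     // O(1) membership test
        int[] depth = new int[MAX + 1];           // moves needed to reach each state
        Queue<Integer> queue = new ArrayDeque<>(); // O(1) removal from the head
        visited.set(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            if (current == target) {
                return depth[current];
            }
            for (int operand : operands) {
                int quotient = (operand != 0 && current % operand == 0) ? current / operand : -1;
                int[] candidates = {current + operand, current - operand, current * operand, quotient};
                for (int next : candidates) {
                    // skip anything outside the range or already seen
                    if (next < 1 || next > MAX || visited.get(next)) {
                        continue;
                    }
                    visited.set(next);
                    depth[next] = depth[current] + 1;
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(shortestPath(1, 10, new int[] {2, 3}));
    }
}
```

Because a state is marked as visited when it is enqueued, no number enters the queue twice, so at most 999 states are ever processed.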
Updated answer to solve Countdown
Sorry for my misunderstandings before. To solve countdown, you can do something like this:
Suppose your 6 initial numbers are a1, a2, ..., a6, and your target number is T. You want to check whether there is a way to assign operators o1, o2, ..., o5 such that
a1 o1 a2 ... o5 a6 = T
There are 5 operators, each of which can take one of 4 values, so there are 4 ^ 5 = 2 ^ 10 = 1024 possibilities. You can use fewer than all 6 numbers, but if you build your solution recursively, you will have checked all of those shorter expressions along the way (more on this later). The 6 initial numbers can also be permuted in 6! = 720 ways, which leads to a total of 2 ^ 10 * 6! = 737,280 candidate expressions.
Since this is small, what I would do is loop through every permutation of the initial 6 numbers, and try to assign the operators recursively. For that, define a function
void solve(int result, int index, List<Integer> permutation)
where result is the value of the computation so far, and index is the index in the permutation list. You then loop over every operator and call
solve(result op permutation.get(index), index + 1, permutation)
If at any point you find a solution, check to see if you haven't found it before, and add it if not.
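A rough sketch of this recursion follows. It deviates from the signature above in two ways that are my own assumptions: it carries the target as an extra parameter and returns a boolean instead of collecting solutions. The class name LeftToRightCountdown is made up for the example. Note that it only evaluates strictly left to right, so expressions such as (a + b) * (c + d) are not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LeftToRightCountdown {

    /** True if some ordering of a prefix of the numbers, combined left to right, hits target. */
    static boolean reachable(List<Integer> numbers, int target) {
        return permute(new ArrayList<>(numbers), 0, target);
    }

    // Generates every permutation in place by swapping, then tries the operators on it.
    private static boolean permute(List<Integer> perm, int k, int target) {
        if (k == perm.size()) {
            return solve(perm.get(0), 1, perm, target);
        }
        for (int i = k; i < perm.size(); i++) {
            Collections.swap(perm, k, i);
            boolean found = permute(perm, k + 1, target);
            Collections.swap(perm, k, i); // undo the swap before trying the next choice
            if (found) {
                return true;
            }
        }
        return false;
    }

    // result is the value computed so far; index is the next unused position.
    private static boolean solve(long result, int index, List<Integer> perm, int target) {
        if (result == target) {
            return true; // not all numbers need to be used
        }
        if (index == perm.size()) {
            return false;
        }
        int next = perm.get(index);
        if (solve(result + next, index + 1, perm, target)) return true;
        if (solve(result - next, index + 1, perm, target)) return true;
        if (solve(result * next, index + 1, perm, target)) return true;
        // only divide when the quotient is an integer
        return next != 0 && result % next == 0 && solve(result / next, index + 1, perm, target);
    }

    public static void main(String[] args) {
        System.out.println(reachable(List.of(1, 2, 3), 9)); // (1 + 2) * 3
    }
}
```

The running value is a long so that multiplying six numbers of up to three digits each cannot overflow.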
Apologies for being so dense before. I hope this is more to the point.

Your problem is analogous to a Coin Change Problem. First do all of the combinations of subtractions so that you can have your 'unit denomination coins' which should be all of the subtractions and additions, as well as the normal numbers you are given. Then use a change making algorithm to get to the number you want. Since we did subtractions beforehand, the result may not be exactly what you want but it should be close and a lot faster than what you are doing.
Say we are given the numbers as the set S = {1, 5, 10, 25, 50, 75, 100}. We then do all the combinations of subtractions and additions and add them to S, i.e. {-99, -95, -90,..., 1, 5, 10,..., 101, 105,...}. Now we use a coin change algorithm with the elements of S as the denominations. If we do not get a solution then it is not solvable.
There are many ways to solve the coin change problem, a few are discussed here:
AlgorithmBasics-examples.pdf

Dealing with overflow in Java without using BigInteger

Suppose I have a method to calculate combinations of r items from n items:
public static long combi(int n, int r) {
if ( r == n) return 1;
long numr = 1;
for(int i=n; i > (n-r); i--) {
numr *=i;
}
return numr/fact(r);
}
public static long fact(int n) {
long rs = 1;
if(n <2) return 1;
for (int i=2; i<=n; i++) {
rs *=i;
}
return rs;
}
As you can see, it involves a factorial, which can easily overflow the result. For example, if I call fact(200) I get zero. The question is: why do I get zero?
Secondly, how do I deal with overflow in the above context? The method should return the largest possible number that fits in a long if the result is too big, instead of returning a wrong answer.
One approach (but this could be wrong) is that if the result exceeds some large number, for example 1,400,000,000, then return the remainder of the result modulo 1,400,000,001. Can you explain what this means and how I can do that in Java?
Note that I do not guarantee that above methods are accurate for calculating factorial and combinations. Extra bonus if you can find errors and correct them.
Note that I can only use int or long and if it is unavoidable, can also use double. Other data types are not allowed.
I am not sure who marked this question as homework. This is NOT homework. I wish it was homework and I was back in time, a young student at university. But I am old, with more than 10 years working as a programmer. I just want to practice developing highly optimized solutions in Java. In our times at university, the Internet did not even exist. Today's students are lucky that they can even post their homework on a site like SO.

Use the multiplicative formula, instead of the factorial formula.
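For illustration, a minimal sketch of the multiplicative formula, C(n, r) = product over i = 1..r of (n - r + i) / i, using only int and long. The class name MultiplicativeCombi and the choice to saturate at Long.MAX_VALUE are my assumptions, the latter based on the asker's stated wish to get the largest long back when the result is too big. Math.multiplyExact (Java 8+) throws ArithmeticException when a long product overflows.

```java
final class MultiplicativeCombi {

    /** Returns C(n, r), or Long.MAX_VALUE if an intermediate product overflows a long. */
    static long combi(int n, int r) {
        if (n < 0 || r < 0 || r > n) {
            throw new IllegalArgumentException("need 0 <= r <= n");
        }
        r = Math.min(r, n - r); // C(n, r) == C(n, n - r); the smaller r means fewer steps
        long result = 1;
        try {
            for (int i = 1; i <= r; i++) {
                // Before this step result == C(n - r + i - 1, i - 1), so the product
                // equals i * C(n - r + i, i) and the division by i is always exact.
                result = Math.multiplyExact(result, (long) (n - r + i)) / i;
            }
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE; // saturate instead of returning a wrapped value
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(combi(61, 30));
        System.out.println(combi(200, 100));
    }
}
```

The check is slightly conservative: the intermediate product is up to r times larger than the final value, so a result very close to Long.MAX_VALUE may be reported as too big even though it would fit.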

Since it's homework, I don't want to just give you a solution. However, a hint I will give is that instead of calculating two large numbers and dividing the result, try calculating both together, e.g. calculate the numerator until it's about to overflow, then calculate the denominator. In this last step you can choose to divide the numerator instead of multiplying the denominator. This stops both values from getting really large when the ratio of the two is relatively small.
I got this result before an overflow was detected.
combi(61,30) = 232714176627630544 which is 2.52% of Long.MAX_VALUE
The only "bug" I found in your code is not having any overflow detection, since you know it's likely to be a problem. ;)

To answer your first question (why did you get zero), the values of fact() as computed by modular arithmetic were such that you hit a result with all 64 bits zero! Change your fact code to this:
public static long fact(int n) {
long rs = 1;
if( n <2) return 1;
for (int i=2; i<=n; i++) {
rs *=i;
System.out.println(rs);
}
return rs;
}
Take a look at the outputs! They are very interesting.
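If you would rather not read through the whole printout, the following sketch pins down where the zero first appears. The class and method names are made up for the example. The reasoning: long arithmetic keeps only the low 64 bits, so the product becomes exactly zero as soon as n! contains at least 64 factors of two, and it stays zero from then on.

```java
final class FactorialWraparound {

    /** Smallest n for which n!, computed with wrapping long arithmetic, is exactly zero. */
    static int firstZeroFactorial() {
        long product = 1;
        for (int i = 2; ; i++) {
            product *= i; // silently keeps only the low 64 bits
            if (product == 0) {
                return i;
            }
        }
    }

    /** Number of factors of two in n!: floor(n/2) + floor(n/4) + floor(n/8) + ... */
    static int twosInFactorial(int n) {
        int count = 0;
        for (int p = n / 2; p > 0; p /= 2) {
            count += p;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(firstZeroFactorial());  // 66
        System.out.println(twosInFactorial(65));   // 63
        System.out.println(twosInFactorial(66));   // 64
        System.out.println(twosInFactorial(200));  // 197
    }
}
```

So fact(66) is already zero, and fact(200), with 197 factors of two, certainly is.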
Now onto the second question....
It looks like you want to give exact integer (er, long) answers for values of n and r that fit, and throw an exception if they do not. This is a fair exercise.
To do this properly you should not use factorial at all. The trick is to recognize that C(n,r) can be computed incrementally by adding terms. This can be done using recursion with memoization, or by the multiplicative formula mentioned by Stefan Kendall.
As you accumulate the results into a long variable that you will use for your answer, check the value after each addition to see if it goes negative. When it does, throw an exception. If it stays positive, you can safely return your accumulated result as your answer.
To see why this works consider Pascal's triangle
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
which is generated like so:
C(0,0) = 1 (base case)
C(1,0) = 1 (base case)
C(1,1) = 1 (base case)
C(2,0) = 1 (base case)
C(2,1) = C(1,0) + C(1,1) = 2
C(2,2) = 1 (base case)
C(3,0) = 1 (base case)
C(3,1) = C(2,0) + C(2,1) = 3
C(3,2) = C(2,1) + C(2,2) = 3
...
When computing the value of C(n,r) using memoization, store the results of recursive invocations as you encounter them in a suitable structure such as an array or hashmap. Each value is the sum of two smaller numbers. The numbers start small and are always positive. Whenever you compute a new value (let's call it a subterm) you are adding smaller positive numbers. Recall from your computer organization class that whenever you add two positive fixed-width two's-complement numbers, there is an overflow if and only if the sum is negative. It only takes one overflow in the whole process for you to know that the C(n,r) you are looking for is too large.
This line of argument could be turned into a nice inductive proof, but that might be for another assignment, and perhaps another StackExchange site.
ADDENDUM
Here is a complete application you can run. (I haven't figured out how to get Java to run on codepad and ideone).
/**
* A demo showing how to do combinations using recursion and memoization, while detecting
* results that cannot fit in 64 bits.
*/
public class CombinationExample {
/**
* Returns the number of combinations of r things out of n total.
*/
public static long combi(int n, int r) {
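// validate before allocating: a negative n would otherwise fail in the array
// constructor, and a negative r would index outside the cache
if (n < 0 || r < 0 || r > n) {
throw new IllegalArgumentException("Nonsense args");
}
long[][] cache = new long[n + 1][n + 1];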
return c(n, r, cache);
}
/**
* Recursive helper for combi.
*/
private static long c(int n, int r, long[][] cache) {
if (r == 0 || r == n) {
return cache[n][r] = 1;
} else if (cache[n][r] != 0) {
return cache[n][r];
} else {
cache[n][r] = c(n-1, r-1, cache) + c(n-1, r, cache);
if (cache[n][r] < 0) {
throw new RuntimeException("Woops too big");
}
return cache[n][r];
}
}
/**
* Prints out a few example invocations.
*/
public static void main(String[] args) {
String[] data = ("0,0,3,1,4,4,5,2,10,0,10,10,10,4,9,7,70,8,295,100," +
"34,88,-2,7,9,-1,90,0,90,1,90,2,90,3,90,8,90,24").split(",");
for (int i = 0; i < data.length; i += 2) {
int n = Integer.parseInt(data[i]);
int r = Integer.parseInt(data[i + 1]);
System.out.printf("C(%d,%d) = ", n, r);
try {
System.out.println(combi(n, r));
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
}
}
Hope it is useful. It's just a quick hack so you might want to clean it up a little.... Also note that a good solution would use proper unit testing, although this code does give nice output.

You can use the java.math.BigInteger class to deal with arbitrarily large numbers.

If you make the return type double, it can handle up to fact(170), but you'll lose some precision because of the nature of double (I don't know why you'd need exact precision for such huge numbers).
For inputs over 170, the result is Infinity.
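A small sketch of that variant, with a made-up class name. Keep in mind that a double carries only about 15 to 17 significant decimal digits, so large factorials are approximations.

```java
final class DoubleFactorial {

    /** Factorial as a double: approximate for large n, Infinity once the value exceeds Double.MAX_VALUE. */
    static double fact(int n) {
        double result = 1.0;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fact(170));
        System.out.println(Double.isInfinite(fact(171)));
    }
}
```

Dividing two such values to get a combination is fragile: once either factorial is Infinity, the quotient is Infinity or NaN even if the true result is small.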

Note that java.lang.Long includes constants for the min and max values for a long.
When you add together two signed 2s-complement positive values of a given size, and the result overflows, the result will be negative. Bit-wise, it will be the same bits you would have gotten with a larger representation, only the high-order bit will be truncated away.
Multiplying is a bit more complicated, unfortunately, since you can overflow by more than one bit.
But you can multiply in parts. Basically you break the two factors into low and high halves (or more than that, if you already have an "overflowed" value), perform the four possible multiplications between the four halves, then recombine the results. (It's really just like doing decimal multiplication by hand, but each "digit" is, say, 32 bits.)
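Here is a sketch of both checks under the assumption that the operands are non-negative, which is the case for factorials and combinations. All names are invented for the example. The multiplication splits each long into two 32-bit "digits" and recombines the four partial products into a 128-bit result.

```java
final class OverflowChecks {
    private static final long MASK32 = 0xFFFFFFFFL;

    /** True if a + b does not fit in a long. Assumes a >= 0 and b >= 0. */
    static boolean addOverflows(long a, long b) {
        return a + b < 0; // two non-negative values overflow exactly when the sum wraps negative
    }

    /** Full 128-bit product of two non-negative longs, returned as {high 64 bits, low 64 bits}. */
    static long[] multiplyInParts(long a, long b) {
        long aLo = a & MASK32, aHi = a >>> 32;
        long bLo = b & MASK32, bHi = b >>> 32;
        // the four partial products, like long multiplication with base-2^32 digits
        long lowLow = aLo * bLo;
        long lowHigh = aLo * bHi;
        long highLow = aHi * bLo;
        long highHigh = aHi * bHi;
        // sum of everything that lands in bits 32..63, including the carry out of lowLow
        long middle = (lowLow >>> 32) + (lowHigh & MASK32) + (highLow & MASK32);
        long low = (middle << 32) | (lowLow & MASK32);
        long high = highHigh + (lowHigh >>> 32) + (highLow >>> 32) + (middle >>> 32);
        return new long[] {high, low};
    }

    /** True if a * b does not fit in a long. Assumes a >= 0 and b >= 0. */
    static boolean multiplyOverflows(long a, long b) {
        long[] product = multiplyInParts(a, b);
        // fits only if the high word is empty and the low word's sign bit is clear
        return product[0] != 0 || product[1] < 0;
    }

    public static void main(String[] args) {
        System.out.println(multiplyOverflows(3_000_000_000L, 3_000_000_000L)); // false
        System.out.println(multiplyOverflows(4_000_000_000L, 4_000_000_000L)); // true
    }
}
```

If the standard library is acceptable, Math.addExact and Math.multiplyExact (Java 8+) throw ArithmeticException on overflow, and Math.multiplyHigh (Java 9+) returns the high 64 bits of the product directly.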

You can copy the code from java.math.BigInteger to deal with arbitrarily large numbers. Go ahead and plagiarize.
