Common element in Java infinite streams

I have three infinite Java IntStream objects. I want to find the smallest element that is present in all three of them.
IntStream a = IntStream.iterate(286, i->i+1).map(i -> (Integer)i*(i+1)/2);
IntStream b = IntStream.iterate(166, i->i+1).map(i -> (Integer)i*(3*i-1)/2);
IntStream c = IntStream.iterate(144, i->i+1).map(i -> i*(2*i-1));
I can always employ a brute force solution (without streams) which involves iterating in nested loops, but I was wondering if we can do it more efficiently with streams?

You need to iterate all 3 in parallel, advancing the one with the lowest value, checking if all 3 are equal.
Your code will not find an answer for the next value after 40755, because the next value is 1_533_776_805, which has an intermediate value (before division by 2) higher than Integer.MAX_VALUE (2_147_483_647).
So, here is one way to use your streams, after changing them to long and guarding against overflow.
// Needs: import java.util.PrimitiveIterator.OfLong; import java.util.stream.LongStream;
LongStream a = LongStream.iterate(286, i->i+1).map(i -> Math.multiplyExact(i, i+1)/2);
LongStream b = LongStream.iterate(166, i->i+1).map(i -> Math.multiplyExact(i, 3*i-1)/2);
LongStream c = LongStream.iterate(144, i->i+1).map(i -> Math.multiplyExact(i, 2*i-1));
OfLong aIter = a.iterator();
OfLong bIter = b.iterator();
OfLong cIter = c.iterator();
long aVal = aIter.nextLong();
long bVal = bIter.nextLong();
long cVal = cIter.nextLong();
// Advance whichever stream(s) hold the current minimum until all three agree
while (aVal != bVal || bVal != cVal) {
    long min = Math.min(Math.min(aVal, bVal), cVal);
    if (aVal == min)
        aVal = aIter.nextLong();
    if (bVal == min)
        bVal = bIter.nextLong();
    if (cVal == min)
        cVal = cIter.nextLong();
}
System.out.println(aVal);

These functions are always increasing. So the code should stop when the magic equal triplet is found.
The thing to code is:
a) when a stream's current value is below any other, it can iterate next for itself.
b) when it meets the same candidate value, it waits for the 3rd stream to take a decision.
c) when it has a higher value than all the others, it changes the candidate and waits for both others.
Reference juggling.
There may also be no solution at all (or at least none that is found in a short time).
Notice that stream c can only produce even numbers (when seeded with even). There might be some optimization there to skip a and b faster.

I don't think there is anything smart possible with the Stream API. The main reason is that you can't really go over one stream until some condition is met. Instead, you look at the current 3 elements and pick the next element from one of the streams before comparing the elements again.
The most efficient (and possibly also the cleanest) solution is to use iterators and keep calling the next() method on the right streams until the answer is found.
To start with, you can focus on two streams only and find their first common value:
while (elementA != elementB) {
    if (elementA < elementB) {
        elementA = iteratorA.next();
    } else {
        elementB = iteratorB.next();
    }
}
Then you need to make the third stream catch up with these two:
while (elementC < elementA) {
    elementC = iteratorC.next();
}
At this point there are two options:
either elementC == elementA in which case you have the answer
or elementC > elementA, in which case you advance the first two streams to their next values (the third one is already ahead, so advancing it too could skip a match) and start over
One thing to remember is the max value of integer. Because you have i^2, this means that it will overflow for i about 46k, so you need to change streams of ints to streams of longs (the answer is about 1.5 billion - and that's after division by 2 in these functions).
Since you are doing exercises for practice, I don't think it's right to give you the full working code, but let me know if you still struggle with it ;)
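For readers who want to check their own attempt, the steps above can be sketched as follows. This is a minimal sketch rather than the answerer's code: the class and method names are made up for illustration, it assumes all three sequences are strictly increasing, and it uses long with Math.multiplyExact so that an overflow raises an exception instead of wrapping silently. When the third value is already ahead, only the first two iterators are advanced.

```java
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

class CommonElement {
    // Hypothetical helpers: the three sequences from the question, as long iterators.
    static PrimitiveIterator.OfLong triangular(long start) {
        return LongStream.iterate(start, i -> i + 1)
                .map(i -> Math.multiplyExact(i, i + 1) / 2).iterator();
    }

    static PrimitiveIterator.OfLong pentagonal(long start) {
        return LongStream.iterate(start, i -> i + 1)
                .map(i -> Math.multiplyExact(i, 3 * i - 1) / 2).iterator();
    }

    static PrimitiveIterator.OfLong hexagonal(long start) {
        return LongStream.iterate(start, i -> i + 1)
                .map(i -> Math.multiplyExact(i, 2 * i - 1)).iterator();
    }

    // First value produced by all three strictly increasing iterators.
    static long firstCommon(PrimitiveIterator.OfLong a,
                            PrimitiveIterator.OfLong b,
                            PrimitiveIterator.OfLong c) {
        long x = a.nextLong(), y = b.nextLong(), z = c.nextLong();
        while (true) {
            // Step 1: find the next common value of the first two sequences.
            while (x != y) {
                if (x < y) x = a.nextLong(); else y = b.nextLong();
            }
            // Step 2: let the third sequence catch up.
            while (z < x) z = c.nextLong();
            if (z == x) return x;
            // Step 3: the third value is ahead, so move the first two on and retry.
            x = a.nextLong();
            y = b.nextLong();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstCommon(triangular(286), pentagonal(166), hexagonal(144)));
    }
}
```

With the starting indices from the question, this should print the 1_533_776_805 mentioned in the other answer.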

Related

Creating combinations of a BitSet

Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only Bits which are Set can be flipped. i.e. only need combinations of Bits which are set.
For Eg. BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combination - 1100, 1000, 0100, 0000
I can think of a few solutions E.g. I can take combinations of all 4 bits and then XOR the combinations with the original Bitset. But this would be very resource-intensive for large sparse BitSets. So I was looking for a more elegant solution.
It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
    Set<BitSet> sets = new HashSet<>();
    if (set.isEmpty()) {
        sets.add(new BitSet(0));
        return sets;
    }
    // Lowest set bit; a primitive int avoids needless boxing
    int head = set.nextSetBit(0);
    // Copy of the input without the head bit
    BitSet rest = set.get(0, set.size());
    rest.clear(head);
    for (BitSet s : powerset(rest)) {
        BitSet newSet = s.get(0, s.size());
        newSet.set(head);
        sets.add(newSet);
        sets.add(s);
    }
    return sets;
}
You can perform the operation in a single linear pass instead of recursion, if you realize the integer numbers are a computer’s intrinsic variant of “on off” patterns and iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case, is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java’s builtin collections. If your source BitSet has 32 one bits, the 2³² possible combinations do not only require a hundred GB heap, but also a collection supporting 2³² elements, i.e. a size not representable as int.
So the code above terminates early if the number exceeds the capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it will fail with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
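The andNot check mentioned above might look like the following minimal sketch. The class and method names are hypothetical; the only BitSet calls used are clone, andNot and isEmpty. Since andNot modifies its receiver, the sketch works on a copy.

```java
import java.util.BitSet;

class BitSetSubset {
    // True when every bit set in candidate is also set in source,
    // i.e. candidate is one of the combinations powerset(source) would produce.
    static boolean isSubset(BitSet candidate, BitSet source) {
        BitSet extra = (BitSet) candidate.clone(); // andNot mutates, so copy first
        extra.andNot(source);                      // keeps only bits missing from source
        return extra.isEmpty();
    }

    public static void main(String[] args) {
        BitSet source = BitSet.valueOf(new long[] {0b1010});
        System.out.println(isSubset(BitSet.valueOf(new long[] {0b1000}), source)); // true
        System.out.println(isSubset(BitSet.valueOf(new long[] {0b0110}), source)); // false
    }
}
```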

Is it more efficient to scan an array once against multiple predicates or multiple times against a single predicate

I have an int array with 1000 elements. I need to extract the size of various sub-populations within the array (How many are even, odd, greater than 500, etc..).
I could use a for loop and a bunch of if statements to increment a counter for each matching item, such as:
for(int i = 0; i < someArray.length; i++) {
    if(conditionA) sizeA++;
    if(conditionB) sizeB++;
    if(conditionC) sizeC++;
    ...
}
or I could do something more lazy such as:
Supplier<IntStream> ease = () -> Arrays.stream(someArray);
int sizeA = ease.get().filter(conditionA).toArray().length;
int sizeB = ease.get().filter(conditionB).toArray().length;
int sizeC = ease.get().filter(conditionC).toArray().length;
...
The benefit of doing it the second way seems to be limited to readability, but is there a massive hit on efficiency? Could it possibly be more efficient? I guess it boils down to this: is iterating through the array once with 4 conditions always better than iterating through it 4 times with one condition each time (assuming the conditions are independent)? I am aware that in this particular example the second method has lots of additional method calls, which I'm sure don't help efficiency.
Preamble:
As @Kayaman points out, for a small array (1000 elements) it probably doesn't matter.
The correct approach to this kind of thing is to do the optimization after you have working code, and a working benchmark, and after you have profiled the code to see where the real hotspots are.
But assuming that this is worth spending effort on optimization, the first version is likely to be faster than the second version for a couple of reasons:
The overheads of incrementing and testing the index are only incurred once in the first version versus three times in the second one.
For an array that is too large to fit into the memory cache, the first version will entail fewer memory reads than the second one. Since memory access is typically a bottleneck (especially on a multi-core machine), this can be significant.
Streams add an extra performance overhead compared to simple iteration of an array.
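To make the two shapes concrete, here is a minimal sketch of both, with hypothetical class and method names. The multi-pass variant uses count() instead of toArray().length, so it does not allocate an intermediate array per predicate. It is not a benchmark, only an illustration of what would be compared.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

class PredicateCounts {
    // One pass over the data, testing every predicate for each element.
    static long[] countSinglePass(int[] data, IntPredicate... predicates) {
        long[] counts = new long[predicates.length];
        for (int value : data) {
            for (int p = 0; p < predicates.length; p++) {
                if (predicates[p].test(value)) counts[p]++;
            }
        }
        return counts;
    }

    // One stream pass per predicate.
    static long[] countMultiPass(int[] data, IntPredicate... predicates) {
        long[] counts = new long[predicates.length];
        for (int p = 0; p < predicates.length; p++) {
            counts[p] = Arrays.stream(data).filter(predicates[p]).count();
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 600};
        long[] counts = countSinglePass(data, x -> x % 2 == 0, x -> x % 2 != 0, x -> x > 500);
        System.out.println(Arrays.toString(counts)); // [4, 3, 1]
    }
}
```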
I did some time measuring with this code:
Random r = new Random();
int[] array = IntStream.generate(() -> r.nextInt(100)).limit(1000).toArray();
long odd = 0;
long even = 0;
long divisibleBy3 = 0;
long start = System.nanoTime();
//for (int i: array) {
// if (i % 2 == 1) {
// odd++;
// }
// if (i % 2 == 0) {
// even++;
// }
// if (i % 3 == 0) {
// divisibleBy3++;
// }
//}
even = Arrays.stream(array).parallel().filter(x -> x % 2 == 0).toArray().length;
odd = Arrays.stream(array).parallel().filter(x -> x % 2 == 1).toArray().length;
divisibleBy3 = Arrays.stream(array).parallel().filter(x -> x % 3 == 0).toArray().length;
System.out.println(System.nanoTime() - start);
The above outputs an 8-digit number of nanoseconds, usually around 14000000 (about 14 ms).
If I uncomment the for loop and comment out the streams, I get a 5-digit number as output, usually around 80000 (about 0.08 ms).
So the streams are slower in terms of execution time.
When the array size is bigger, though, the difference between streams and loops becomes smaller.

Determinism of Java 8 streams

Motivation
I've just rewritten some 30 mostly trivial parsers and I need the new versions to behave exactly like the old ones. Therefore, I stored their example input files and some signature of the outputs produced by the old parsers for comparison with the new ones. This signature contains the counts of successfully parsed items, sums of some hash codes and up to 10 pseudo-randomly chosen items.
I thought this was a good idea as the equality of the hash code sums more or less guarantees that the outputs are exactly the same and the samples allow me to see what's wrong. I'm only using samples as otherwise it'd get really big.
The problem
Basically, given an unordered collection of strings, I want to get a list of up to 10 of them, so that when the collection changes a bit, I still get mostly the same samples in the same positions (the input is unordered, but the output is a list). This should work also when something is missing, so ideas like taking the 100th smallest element don't work.
ImmutableList<String> selectSome(Collection<String> list) {
if (list.isEmpty()) return ImmutableList.of();
return IntStream.range(1, 20)
.mapToObj(seed -> selectOne(list, seed))
.distinct()
.limit(10)
.collect(ImmutableList.toImmutableList());
}
So I start with numbers from 1 to 20 (so that after distinct I still most probably have my 10 samples), call a stateless deterministic function selectOne (defined below) returning one string which is maximal according to some funny criteria, remove duplicates, limit the result and collect it using Guava. All steps should be IMHO deterministic and "ordered", but I may be overlooking something. The other possibility would be that all my 30 new parsers are wrong, but this is improbable given that the hashes are correct. Moreover, the results of the parsing look correct.
String selectOne(Collection<String> list, int seed) {
// some boring mixing, definitely deterministic
for (int i=0; i<10; ++i) {
seed *= 123456789;
seed = Integer.rotateLeft(seed, 16);
}
// ensure seed is odd
seed = 2*seed + 1;
// first element is the candidate result
String result = list.iterator().next();
// the value is the hash code multiplied by the seed
// overflow is fine
int value = seed * result.hashCode();
// looking for s maximizing seed * s.hashCode()
for (final String s : list) {
final int v = seed * s.hashCode();
if (v < value) continue;
// tiebreaking by taking the bigger or smaller s
// this is needed for determinism
if (s.compareTo(result) * seed < 0) continue;
result = s;
value = v;
}
return result;
}
This sampling doesn't seem to work. I get a sequence like
"9224000", "9225000", "4165000", "9200000", "7923000", "8806000", ...
with one old parser and
"9224000", "9225000", "4165000", "3030000", "1731000", "8806000", ...
with a new one. Both results are perfectly repeatable. For other parsers, it looks very similar.
Is my usage of streams wrong? Do I have to add .sequential() or alike?
Update
Sorting the input collection has solved the problem:
ImmutableList<String> selectSome(Collection<String> collection) {
final List<String> list = Lists.newArrayList(collection);
Collections.sort(list);
.... as before
}
What's still missing is an explanation why.
The explanation
As stated in the answers, my tiebreaker was an all-breaker because I forgot to check for a tie. Something like
if (v==value && s.compareTo(result) < 0) continue;
works fine.
I hope that my confused question may be at least useful for someone looking for "consistent sampling". It wasn't really Java 8 related.
I should've used Guava ComparisonChain or better Java 8 arg max to avoid my stupid mistake:
String selectOne(Collection<String> list, int seed) {
.... as before
final int multiplier = 2*seed + 1;
return list.stream()
.max(Comparator.comparingInt(s -> multiplier * s.hashCode())
.thenComparing(s -> s)) // <--- FOOL-PROOF TIEBREAKER
.get();
}
The mistake is that your tiebreaker is not in fact breaking a tie. We should be selecting s when v > value, but instead we're falling back to compareTo(). This breaks comparison symmetry, making your algorithm dependent on encounter order.
As a bonus, here's a simple test case to reproduce the bug:
System.out.println(selectOne(Arrays.asList("1", "2"), 4)); // 1
System.out.println(selectOne(Arrays.asList("2", "1"), 4)); // 2
In selectOne you just want to select String s with max rank of value = seed * s.hashCode(); for that given seed.
The problem is with the "tiebreaking" line:
if (s.compareTo(result) * seed < 0) continue;
It is not deterministic: for a different order of elements it skips different elements, so changing the order of the elements changes the result.
Remove the tiebreaking if and the result will be insensitive to the order of elements in the input list (as long as no two strings produce the same value; exact ties still need a real tiebreaker).
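A minimal self-contained sketch of the corrected loop, assuming the same seed mixing as in the question. The class name is made up, and returning null for an empty collection is an assumption of this sketch. The string comparison is consulted only when the two values are actually equal.

```java
import java.util.Arrays;
import java.util.Collection;

class ConsistentSampling {
    static String selectOne(Collection<String> list, int seed) {
        // Same mixing as in the question; int overflow is intended.
        for (int i = 0; i < 10; ++i) {
            seed *= 123456789;
            seed = Integer.rotateLeft(seed, 16);
        }
        seed = 2 * seed + 1; // ensure the multiplier is odd

        String result = null; // stays null for an empty collection (assumption)
        int value = 0;
        for (String s : list) {
            int v = seed * s.hashCode();
            // Take s if it ranks strictly higher, or on a real tie if it is the larger string.
            if (result == null || v > value || (v == value && s.compareTo(result) > 0)) {
                result = s;
                value = v;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Both calls see the same elements in a different order and should agree.
        System.out.println(selectOne(Arrays.asList("1", "2"), 4));
        System.out.println(selectOne(Arrays.asList("2", "1"), 4));
    }
}
```

Because the pair (value, string) defines a total order over distinct strings, the maximum does not depend on the iteration order of the collection.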

Number Guessing Game Over Intervals

I have just started my long path to becoming a better coder on CodeChef. People begin with the problems marked 'Easy' and I have done the same.
The Problem
The problem statement defines the following -:
n, where 1 <= n <= 10^9. This is the integer which Johnny is keeping secret.
k, where 1 <= k <= 10^5. For each test case or instance of the game, Johnny provides exactly k hints to Alice.
A hint is of the form op num Yes/No, where -
op is an operator from <, >, =.
num is an integer, again satisfying 1 <= num <= 10^9.
Yes or No are answers to the question: Does the relation n op num hold?
If the answer to the question is correct, Johnny has uttered a truth. Otherwise, he is lying.
Each hint is fed to the program and the program determines whether it is the truth or possibly a lie. My job is to find the minimum possible number of lies.
Now CodeChef's Editorial answer uses the concept of segment trees, which I cannot wrap my head around at all. I was wondering if there is an alternative data structure or method to solve this question, maybe a simpler one, considering it is in the 'Easy' category.
This is what I tried -:
class Solution //Represents a test case.
{
HashSet<SolutionObj> set = new HashSet<SolutionObj>(); //To prevent duplicates.
BigInteger max = new BigInteger("1000000000"); //Max range (10^9).
BigInteger min = new BigInteger("1"); //Min range.
int lies = 0; //Lies counter.
void addHint(String s)
{
String[] vals = s.split(" ");
set.add(new SolutionObj(vals[0], vals[1], vals[2]));
}
void testHints()
{
for(SolutionObj obj : set)
{
//Given number is not in range. Lie.
if(obj.bg.compareTo(min) == -1 || obj.bg.compareTo(max) == 1)
{
lies++;
continue;
}
if(obj.yesno)
{
if(obj.operator.equals("<"))
{
max = new BigInteger(obj.bg.toString()); //Change max value
}
else if(obj.operator.equals(">"))
{
min = new BigInteger(obj.bg.toString()); //Change min value
}
}
else
{
//Still to think of this portion.
}
}
}
}
class SolutionObj //Represents a single hint.
{
String operator;
BigInteger bg;
boolean yesno;
SolutionObj(String op, String integer, String yesno)
{
operator = op;
bg = new BigInteger(integer);
if(yesno.toLowerCase().equals("yes"))
this.yesno = true;
else
this.yesno = false;
}
@Override
public boolean equals(Object o)
{
if(o instanceof SolutionObj)
{
SolutionObj s = (SolutionObj) o; //Make the cast
if(this.yesno == s.yesno && this.bg.equals(s.bg)
&& this.operator.equals(s.operator))
return true;
}
return false;
}
@Override
public int hashCode()
{
return this.bg.intValue();
}
}
Obviously this partial solution is incorrect, save for the range check that I have done before entering the if(obj.yesno) portion. I was thinking of updating the range according to the hints provided, but that approach has not borne fruit. How should I be approaching this problem, apart from using segment trees?
Consider the following approach, which may be easier to understand. Picture the 1d axis of integers, and place on it the k hints. Every hint can be regarded as '(' or ')' or '=' (greater than, less than or equal, respectively).
Example:
-----(---)-------(--=-----)-----------)
Now, the true value is somewhere on one of the 40 values of this axis, but actually only 8 segments are interesting to check, since anywhere inside a segment the number of true/false hints remains the same.
That means you can scan the hints according to their ordering on the axis, and maintain a counter of the true hints at that point.
In the example above it goes like this:
segment counter
-----------------------
-----( 3
--- 4
)-------( 3
-- 4
= 5 <---maximum
----- 4
)----------- 3
) 2
This algorithm only requires sorting the k hints and then scanning them. It's near linear in k (O(k*log k), with no dependence on n), therefore it should have a reasonable running time.
Notes:
1) In practice the hints may have non-distinct positions, so you'll have to handle all hints of the same type on the same position together.
2) If you need to return the minimum set of lies, then you should maintain a set rather than a counter. That shouldn't have an effect on the time complexity if you use a hash set.
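The sort-and-scan idea above can be sketched as follows. This is a variant rather than the answerer's exact procedure: each hint is first turned into the interval(s) of n for which it would be truthful, and a sorted map of +1/-1 events takes the place of the explicit walk over segments. The class name, the MAX bound and the "op num Yes/No" string format are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

class MinimumLies {
    static final long MAX = 1_000_000_000L; // upper bound of n from the problem statement

    static int minLies(String[] hints) {
        // delta.get(x) = change in the number of truthful hints when n moves from x-1 to x
        TreeMap<Long, Integer> delta = new TreeMap<>();
        for (String hint : hints) {
            String[] parts = hint.trim().split("\\s+");
            char op = parts[0].charAt(0);
            long num = Long.parseLong(parts[1]);
            boolean yes = parts[2].equalsIgnoreCase("Yes");
            switch (op) {
                case '<':
                    if (yes) add(delta, 1, num - 1); else add(delta, num, MAX);
                    break;
                case '>':
                    if (yes) add(delta, num + 1, MAX); else add(delta, 1, num);
                    break;
                default: // '='
                    if (yes) {
                        add(delta, num, num);
                    } else {
                        add(delta, 1, num - 1);
                        add(delta, num + 1, MAX);
                    }
            }
        }
        // Scan the events in sorted order, tracking the best number of truthful hints.
        int running = 0, best = 0;
        for (int d : delta.values()) {
            running += d;
            best = Math.max(best, running);
        }
        return hints.length - best;
    }

    // Marks the hint as truthful for every n in [lo, hi]; empty intervals are ignored.
    private static void add(Map<Long, Integer> delta, long lo, long hi) {
        if (lo > hi) return;
        delta.merge(lo, 1, Integer::sum);
        delta.merge(hi + 1, -1, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(minLies(new String[] {"< 2 Yes", "> 4 Yes", "= 3 No"})); // 1
    }
}
```

Hints that share a position are merged into one map entry, which takes care of note 1 above. The TreeMap keeps the cost at O(k log k).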
Calculate the number of lies if the target number = 1 (store this in a variable lies).
Let target = 1.
Sort and group the statements by their respective values.
Iterate through the statements.
Update target to the current statement group's value. Update lies according to how many of those statements would become either true or false.
Then update target to that value + 1 (Why do this? Consider when you have > 5 and < 7 - 6 may be the best value) and update lies appropriately (skip this step if the next statement group's value is this value).
Return the minimum value for lies.
Running time:
O(k) for the initial calculation.
O(k log k) for the sort.
O(k) for the iteration.
O(k log k) total.
My idea for this problem is similar to how Eyal Schneider views it. Denoting '>' as greater than, '<' as less than and '=' as equals, we can sort all the hints by their num and scan through all the interesting points one by one.
For each point, we keep the number of '<' and '=' hints from 0 up to that point (in an array called int[] lessAndEqual) and the number of '>' and '=' hints from that point onward (in an array called int[] greaterAndEqual). The number of lies at a particular point i is then equal to
lessAndEqual[i] + greaterAndEqual[i + 1]
We can fill the lessAndEqual and greaterAndEqual arrays with two scans in O(n) and sort all the hints in O(n log n), so the overall time complexity is O(n log n), where n here is the number of hints.
Note: special care is needed for hints whose operator is '='. Also notice that the range for num is 10^9, so some form of coordinate compression is needed to fit the arrays into memory.

all possible ways in which K items can be arranged in N slots

I am looking for an algorithm to find all combinations of K values to n items.
Example:
K values are [R,B] & N is 2 so i get {RR, RB, BR, BB} 2*2 = 4 ways
K values are [R,B] & N is 3 so i get {RRR, RRB, RBB, RBR, BRR, BRB, BBR, BBB} 2*2*2 = 8 ways
I need to find out the generic algorithm to find all possible ways in which K items can be arranged in N slots. (repeat is allowed)
Another example would be:
K values are [R,G,B] & N is 5 so I need to find 3^5 = 243 combinations.
This problem lends itself exceptionally well to a recursive solution.
The solution in the general case is clearly formed by taking the solution for N - 1, and then prepending each of your set's elements in turn to the results. In pseudocode:
f(options, 0) = [""]
f(options, n) = options foreach o => o ++ f(options, n-1)
This could be implemented recursively in Java, but you'd run into stack overflow errors for moderately large values of n; and I also suspect the JIT compiler is less effective at optimising recursive algorithms so performance would suffer.
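A minimal recursive sketch of that pseudocode in Java, with illustrative class and method names. It assumes the base case yields a list holding one empty string, which is what lets the prepending step produce results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecursiveArrangements {
    // All strings of length n built from the given options, repeats allowed.
    static List<String> arrangements(List<String> options, int n) {
        List<String> results = new ArrayList<>();
        if (n == 0) {
            results.add(""); // exactly one way to fill zero slots
            return results;
        }
        List<String> shorter = arrangements(options, n - 1);
        for (String o : options) {
            for (String tail : shorter) {
                results.add(o + tail); // prepend each option to every shorter result
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(arrangements(Arrays.asList("R", "B"), 2)); // [RR, RB, BR, BB]
    }
}
```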
However, recursive algorithms can always be converted into a loop equivalent. In this case it might look something like:
List<String> results = new ArrayList<String>();
results.add(""); // Seed it for the base case n=0
for (int i = 0; i < n; i ++) {
List<String> previousResults = results;
results = new ArrayList<String>();
for (String s : options) {
for (String base : previousResults) {
results.add(s + base);
}
}
}
return results;
This works (I hope!) similarly to the recursive method - at each iteration it "saves" the current progress (i.e. the result for n-1) to previousResults, then just iterates over the options in turn, to get the result of prepending them to the previous results.
It would be interesting to see the effects of passing the recursive solution through any automatic recursion-to-iterative algorithms, and compare both the readability and performance to this hand-created one. This is left as an exercise for the reader.
I'd use an N-digit counter in base k.
e.g: k=3, n=5
(0,0,0,0,0)
(0,0,0,0,1),
....
(2,2,2,2,2)
Implementing such a counter is easy: keep an array of size n+1 and set all elements to zero at first. Each time, increase the last element, and if it exceeds k-1, reset it and increase its neighbor (repeating for as long as the neighbors exceed k-1). The process terminates when element n+1 is set to 1.
If you try this and get stuck, leave a comment.
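The counter described above might be sketched like this. Names are illustrative, and instead of an extra overflow digit at position n+1 the sketch stops when the carry runs off the left end of the array, which is an equivalent termination test.

```java
import java.util.ArrayList;
import java.util.List;

class BaseKCounter {
    // Enumerates all options.length^n arrangements by counting in base k.
    static List<String> arrangements(String[] options, int n) {
        int k = options.length;
        List<String> results = new ArrayList<>();
        if (k == 0 && n > 0) return results; // nothing to place in the slots
        int[] digits = new int[n];           // starts at (0, 0, ..., 0)
        while (true) {
            StringBuilder sb = new StringBuilder();
            for (int d : digits) sb.append(options[d]);
            results.add(sb.toString());
            // Increment the last digit, carrying to the left while a digit reaches k.
            int pos = n - 1;
            while (pos >= 0 && ++digits[pos] == k) {
                digits[pos] = 0;
                pos--;
            }
            if (pos < 0) return results;     // carry left the array: counter wrapped around
        }
    }

    public static void main(String[] args) {
        System.out.println(arrangements(new String[] {"R", "B"}, 2)); // [RR, RB, BR, BB]
    }
}
```

Unlike the list-building loop, this keeps only one digit array as working state, so the results could also be processed one at a time instead of being collected.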
