Set of m integers from an array of size n - java

Write a method to randomly generate a set of m integers from an array of size n. Each element must have equal probability of being chosen.
This is a fairly well known question - featured in multiple books and interviews - but either I am not reading the question correctly, or to me the requirements of this question cannot actually all be fulfilled at the same time in Java.
Let's say we have an array of size n=3:
Integer[] ar = {1,1,5}
If we choose m=2 for our randomly generated set, I don't see how we can guarantee an equal probability for each element to be chosen.
In other words, asking for a Java set of 2 integers from the given array of size 3 makes it impossible to ensure an equal probability for each element. To illustrate, if we call [0] a, [1] b, and [2] c, then all the 2-element selections chosen at random, with removal, look like this:
ab
ba
ac
bc
ca
cb
Since choices 1) and 2) (ab and ba) contain the same value twice, they violate the unique-element requirement of a set in Java. So in this particular situation element 'c', i.e. the number 5, always ends up being chosen with a probability of 100% if we are to end up with a set of 2 elements.
I guess it is even easier to illustrate this issue if the array contains only duplicates i.e. {1,1,1}, and then a set of m=2 integers would simply be impossible.
Is there something I am misreading or misinterpreting with regard to this question?
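For reference, here is a minimal sketch of one common reading of the exercise. It assumes that "set" is meant loosely as a selection of m array positions, so equal values at different positions count as different elements and duplicates may appear in the result. That reading is an assumption and is not stated in the question; the class and method names are made up for illustration. The sketch uses a partial Fisher-Yates shuffle, which gives every position the same chance m/n of being selected.

```java
import java.util.Arrays;
import java.util.Random;

final class RandomSubset {
    // Picks m of the n array positions uniformly at random.
    // Assumption: positions, not distinct values, are the "elements".
    static int[] pick(int[] source, int m, Random rng) {
        if (m < 0 || m > source.length) {
            throw new IllegalArgumentException("m must be between 0 and n");
        }
        int[] copy = Arrays.copyOf(source, source.length);
        for (int i = 0; i < m; i++) {
            // Swap a uniformly chosen element of the untouched tail into slot i.
            int j = i + rng.nextInt(copy.length - i);
            int tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return Arrays.copyOf(copy, m);
    }
}
```

With {1,1,5} and m=2 this can return 1,1 as well as 1,5 or 5,1, which is only acceptable if the result does not have to be a java.util.Set of distinct values.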


Would it be efficient to sort with indexes

So I was thinking of a new sorting algorithm that might be efficient but I am not too sure about that.
1) Imagine we have an array a of only positive numbers.
2) We go through the array and find the biggest number n.
3) We create a new array b of the size n+1.
4) We go through every entry in the unsorted array and, in the second array, increase by one the value at the index equal to the number we are looking at. (In pseudo-code this means: b[a[i]]++; where a[i] is the number we are currently looking at)
5) Once we have done this with every element in a, the array b stores at every index how often the number equal to that index occurs in a. (For example: b[0] = 3 means that we had 3 zeros in the initial array a)
6) We go through the whole array b and skip all the empty fields and create a new List or Array out of it.
So I can imagine that this algorithm can be very fast and efficient for smaller numbers only since at the end we have to go through the whole array b to build the sorted one, which is going to be really time consuming.
If, for example, we have an array a = {1000, 1}, it would still check 1001 elements in array b for whether or not they are 0, even though we only have 2 elements in the initial array.
With smaller numbers, however, we should almost get an O(n) result? I am not too sure about that and that's why I am asking you. Maybe I am even missing something really important. Thanks for your help in advance :)
Congratulations on independently re-discovering the counting sort.
This is indeed a very good sorting strategy for situations when the range is limited, and the number of items in your array is significantly greater than the size of that range.
In situations when the range is greater than the number of items in the array a traditional sorting algorithm would give you better performance.
Algorithms of this kind are called pseudo-polynomial.
"we should almost get a O(n) result"
You get an O(N+M) result, where M is the largest number in the first array. You also spend O(M) memory, so it only makes sense if M is small. See counting sort.
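The six steps from the question can be sketched in Java roughly as follows. This is only an illustration under the question's own assumption of non-negative integers; the class and method names are hypothetical.

```java
final class CountingSortSketch {
    // Sorts non-negative ints in O(N + M) time and O(M) extra memory,
    // where N is the array length and M is the largest value.
    static int[] sort(int[] a) {
        if (a.length == 0) {
            return new int[0];
        }
        int max = 0;
        for (int v : a) {
            if (v < 0) {
                throw new IllegalArgumentException("only non-negative values are supported");
            }
            max = Math.max(max, v);
        }
        int[] counts = new int[max + 1];      // step 3: array b of size n+1
        for (int v : a) {
            counts[v]++;                      // step 4: b[a[i]]++
        }
        int[] sorted = new int[a.length];
        int next = 0;
        for (int value = 0; value <= max; value++) {   // step 6: walk all of b
            for (int c = 0; c < counts[value]; c++) {
                sorted[next++] = value;
            }
        }
        return sorted;
    }
}
```

For a = {1000, 1} the last loop still visits all 1001 counters, which is the M term in O(N+M).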

How can I write a recursive permutation function without using arrays and substrings in Java

My teacher and I were discussing whether or not a recursive permutation function could be written without the use of substrings and/or arrays in Java.
Is there a way to do this?
The answer is yes, this can be done. I'm assuming that "without the use of substrings and/or arrays" refers to the info being passed to the recursion. You have to have some sort of container for the elements that are to be permuted.
In that case it can be done by pulling some hideous tricks with numerically encoding the indices of the elements as digits of a numeric argument. For instance, if there are 3 elements and I use 1 as a sentinel value in the left-most digit (so you can have 0 as the leading index sometimes), 1 means I haven't started, 10 means the first element has been selected, 102 means the first and third, and 1021 means I'm ready to print the permutation since I now have a 4 digit argument and there are 3 elements in the set. I can then deconstruct which elements to print using % 10 and / 10 arithmetic to pick them off.
I implemented this in Ruby rather than Java, and I'm not going to share the actual code because it's too horrible to contemplate. However, it works recursively with only the input array of elements and an integer as arguments, no partial solution substrings or arrays.
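Since the Ruby code is not shown, the following is a hypothetical Java reconstruction of the digit-encoding idea, not the answerer's actual implementation. It assumes the elements are the characters of a String read with charAt (no substrings, no arrays), that there are at most 10 of them so each index fits in one decimal digit, and that results are appended to a StringBuilder used purely as an output sink. All names are made up for illustration.

```java
final class DigitEncodedPermutations {
    // Returns all permutations of the characters in elements, one per line.
    static String permutations(String elements) {
        if (elements.length() > 10) {
            throw new IllegalArgumentException("at most 10 elements fit this digit encoding");
        }
        StringBuilder out = new StringBuilder();
        permute(elements, 1L, out);   // 1 is the sentinel: nothing chosen yet
        return out.toString();
    }

    private static void permute(String elements, long state, StringBuilder out) {
        if (countChosen(state) == elements.length()) {
            emit(elements, state, out);
            out.append('\n');
            return;
        }
        for (int index = 0; index < elements.length(); index++) {
            if (!isChosen(state, index)) {
                permute(elements, state * 10 + index, out);   // append index as a digit
            }
        }
    }

    // Number of digits after the leading sentinel.
    private static int countChosen(long state) {
        int count = 0;
        for (long s = state; s >= 10; s /= 10) {
            count++;
        }
        return count;
    }

    // True if index already appears among the digits after the sentinel.
    private static boolean isChosen(long state, int index) {
        for (long s = state; s >= 10; s /= 10) {
            if (s % 10 == index) {
                return true;
            }
        }
        return false;
    }

    // Writes the chosen elements in selection order using % 10 and / 10.
    private static void emit(String elements, long state, StringBuilder out) {
        if (state < 10) {
            return;   // only the sentinel is left
        }
        emit(elements, state / 10, out);
        out.append(elements.charAt((int) (state % 10)));
    }
}
```

For "abc" the states visited on the first branch are 1, 10, 101 and 1012, the last of which is written out as abc.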

How to improve the complexity of HashMap iteration?

I implemented a custom HashMap class (in C++, but shouldn't matter). The implementation is simple -
A large array holds pointers to Items.
Each item contains the key - value pair, and a pointer to an Item (to form a linked list in case of key collision).
I also implemented an iterator for it.
My implementation of incrementing/decrementing the iterator is not very efficient. From the present position, the iterator scans the array of hashes for the next non-null entry. This is very inefficient, when the map is sparsely populated (which it would be for my use case).
Can anyone suggest a faster implementation, without affecting the complexity of other operations like insert and find? My primary use case is find, secondary is insert. Iteration is not even needed, I just want to know this for the sake of learning.
PS: Why I implemented a custom class? Because I need to find strings with some error tolerance, while ready made hash maps that I have seen provide only exact match.
EDIT: To clarify, I am talking about incrementing/decrementing an already obtained iterator. Yes, this is mostly done in order to traverse the whole map.
The errors in strings (keys) in my case come from OCR errors, so I cannot use the error handling techniques used to detect typing errors. The chance of the first character being wrong is almost the same as that of the last one.
Also, my keys are always strings, one word to be exact. The number of entries will be less than 5000, so a hash table size of 2^16 is enough for me. It will still be sparsely populated, but that's ok.
My hash function:
hash code size is 16 bits.
First 5 bits for the word length. ==> Max possible key length = 32. Reasonable, given that key is a single word.
Last 11 bits for sum of the char codes. I only store the English alphabet characters, and do not need case sensitivity. So 26 codes are enough, 0 to 25. So a key with 32 'z' = 25 * 32 = 800. Which is well within 2^11. I even have scope to add case sensitivity, if needed in future.
Now when you compare a key containing an error with the correct one,
say "helo" with "hello"
1. Length of the keys is approx the same
2. sum of their chars will differ by the sum of the dropped/added/distorted chars.
In the hash code, as the first 5 bits are for the length, the whole table has fixed sections for every possible key length. All sections are of the same size. The first section stores keys of length 1, the second of length 2, and so on.
Now 'hello' is stored in the 5th section, as its length is 5. When we try to find 'helo',
Hashcode of 'hello' = (length - 1) (sum of chars) = (4) (7 + 4 + 11 + 11 + 14) = (4) (47)
= (00100)(00000101111)
similarly, hashcode of 'helo' = (3)(36)
= (00011)(00000100100)
We jump to its bucket, and don't find it there.
So we try to check for ONE distorted character. This will not change the length, but it changes the sum of characters by at most -25 to +25. So we search from 25 places backwards to 25 places forward, i.e. we check the sum part from (36-25) to (36+25) in the same section. We won't find it.
We check for an additional character error. That means the correct string would contain only 3 characters, so we go to the third section. The additional char would have increased the sum of chars by at most 25, and this has to be compensated for. So we search the third section for the appropriate 25 places, (36 - 0) to (36 - 25). Again we don't find it.
Now we consider the case of a missing character. The original string would then contain 5 chars, and the second part of the hashcode, the sum of chars in the original string, would be larger by 0 to 25. So we search the corresponding 25 buckets in the 5th section, (36 + 0) to (36 + 25). As 47 (the sum part of 'hello') lies in this range, we will find a match of the hashcode. And we also know that this match will be due to a missing character. So we compare the keys allowing for a tolerance of 1 missing character. And we get a match!
In reality, this has been implemented to allow more than one error in key.
It can also be optimized to use only 25 places for the first section (since it has only one character) and so on.
Also, checking 25 places seems overkill, as we already know the largest and smallest char of the key. But it gets complex in case of multiple errors.
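The hash layout described above can be sketched as follows. The sketch is in Java rather than the C++ of the actual implementation, and the class and method names are hypothetical; it only covers computing the 16-bit code (5 bits of length minus one, 11 bits of letter sum), not the tolerant lookup.

```java
final class LengthSumHash {
    // Upper 5 bits: word length minus one. Lower 11 bits: sum of letter codes (a=0 .. z=25).
    static int hash(String word) {
        if (word.isEmpty() || word.length() > 32) {
            throw new IllegalArgumentException("key length must be between 1 and 32");
        }
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            char c = Character.toLowerCase(word.charAt(i));
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("only English letters are supported");
            }
            sum += c - 'a';
        }
        return ((word.length() - 1) << 11) | sum;
    }

    // The section (length part) and the sum part of a hash code.
    static int section(int hash) {
        return hash >>> 11;
    }

    static int sumPart(int hash) {
        return hash & 0x7FF;
    }
}
```

With these definitions 'hello' maps to section 4 with sum 47, and 'helo' to section 3 with sum 36, matching the bit patterns in the walkthrough.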
You mention an 'error tolerance' for the string. Why not build the 'tolerance' into the hash function itself and thus obviate the need for iteration?
You could go the way of Java's LinkedHashMap class. It adds efficient iteration to a hashmap by also making it a doubly-linked list.
The entries are key-value pairs that have pointers to the previous and next entries. The hashmap itself has the large array as well as the head of the linked list.
Insertion/deletion are constant time for both data structures, searches are done via the hashmap, and iteration via the linked list.
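A minimal sketch of that layout, assuming String keys and int values for brevity. This is not the real java.util.LinkedHashMap code, and all names are hypothetical; it only shows a bucket array for lookups combined with a doubly-linked list for iteration.

```java
import java.util.ArrayList;
import java.util.List;

final class LinkedBucketsMap {
    private static final class Entry {
        final String key;
        int value;
        Entry nextInBucket;    // collision chain inside one bucket
        Entry before, after;   // doubly-linked list over all entries

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] buckets;
    private Entry head, tail;

    LinkedBucketsMap(int capacity) {
        buckets = new Entry[capacity];
    }

    private int indexOf(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(String key, int value) {
        int i = indexOf(key);
        for (Entry e = buckets[i]; e != null; e = e.nextInBucket) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        Entry e = new Entry(key, value);
        e.nextInBucket = buckets[i];
        buckets[i] = e;
        e.before = tail;               // append to the iteration list
        if (tail == null) head = e; else tail.after = e;
        tail = e;
    }

    Integer get(String key) {
        for (Entry e = buckets[indexOf(key)]; e != null; e = e.nextInBucket) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    boolean remove(String key) {
        int i = indexOf(key);
        Entry prev = null;
        for (Entry e = buckets[i]; e != null; prev = e, e = e.nextInBucket) {
            if (!e.key.equals(key)) continue;
            if (prev == null) buckets[i] = e.nextInBucket; else prev.nextInBucket = e.nextInBucket;
            if (e.before == null) head = e.after; else e.before.after = e.after;   // unlink from list
            if (e.after == null) tail = e.before; else e.after.before = e.before;
            return true;
        }
        return false;
    }

    // Iteration touches only stored entries, never empty buckets.
    List<String> keys() {
        List<String> result = new ArrayList<>();
        for (Entry e = head; e != null; e = e.after) result.add(e.key);
        return result;
    }
}
```

The cost is two extra pointers per entry; in exchange, moving an iterator forward or backward is a single pointer hop regardless of how sparse the table is.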

Subsets of a given Set of Integers whose sum is a Constant N : Java

Given a set of integers, how do I find a subset that sums to a given value (the subset sum problem)?
Example : S = {1,2,4,3,2,5} and n= 7
Finding the possible subsets whose sum is n.
I tried googling and found many links, but they were not clear.
How can we solve this in Java, what data structure should be used, and what is its complexity?
In three steps:
Find the powerset of S (the set of all subsets of S)
Compute the sum of each subset
Filter out subsets that did not sum to 7.
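I won't give you any code, but I will explain how it works.
Run a loop from 0 to (2^k-1), where k is the number of elements in the set.
For each value i of the loop counter, a 1 at position j of its binary representation indicates that the j-th element is chosen, and a 0 that it is not.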
Test to see if the sum of chosen numbers is equal to n.
The above method will evaluate each possible subset of the given set.
If the upper limit of the values is small, then a dynamic programming approach could be used.
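The bitmask loop described above can be sketched in Java roughly as follows. It is a brute-force illustration that takes O(2^k * k) time, so it is only practical for small k; the class and method names are hypothetical. Equal values at different positions are treated as different elements.

```java
import java.util.ArrayList;
import java.util.List;

final class SubsetSumBruteForce {
    // Returns every selection of positions whose values add up to target.
    static List<List<Integer>> subsetsWithSum(int[] values, int target) {
        int k = values.length;
        if (k > 30) {
            throw new IllegalArgumentException("too many elements for an int bitmask");
        }
        List<List<Integer>> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << k); mask++) {
            int sum = 0;
            List<Integer> chosen = new ArrayList<>();
            for (int bit = 0; bit < k; bit++) {
                if ((mask & (1 << bit)) != 0) {   // bit set: element at this position is chosen
                    sum += values[bit];
                    chosen.add(values[bit]);
                }
            }
            if (sum == target) {
                result.add(chosen);
            }
        }
        return result;
    }
}
```

For example, subsetsWithSum(new int[]{1, 2, 3}, 3) yields the two selections [1, 2] and [3].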

Algorithm which tells if a number is obtainable from a given set using only '+' ,'*' and brackets

I have two lists of numbers. For every member of the second one, I must tell whether it is obtainable using all the numbers of the first one, placing '+' or '*' between them and using as many '(' ')' as I want.
I can't change the order.
List1 can contain a max of 20 elements between 1 and 100.
List2 can contain a max of 5 elements between 1 and 20'000.
EX:
List1=[2 4 3 5]
List2=[19 15 24]
19-> 2+(4*3)+5 YES
15 NO
24->2*(4+3+5) YES
With brute force it takes ages to handle inputs with List1 larger than 10.
edit: numbers are always positive.
edit:
I find the max and min numbers that are obtainable from the list, discard all the targets that fall outside this range, and then try all the remaining ones.
MAX = n1*n2*n3*...*ni; if there are 1s, they are added to their smallest neighbour
MIN = n1+n2+...+ni, with 1s excluded
Still, it's not fast enough when the inputs are big (List1 longer than 10 or numbers in List2 bigger than 10000).
For each sublist of List1, compute the numbers between 1 and 20,000 that can be made with that sublist. The resulting DP bears resemblance to CYK.
I'm being somewhat vague here because this is almost certainly a programming contest problem.
@u mad is correct, but I'll give a little more detail.
Suppose that n = size of list 1. For each 0 <= i < j <= n you need to compute all of the distinct values in the range (1..20_000) that can be made from the numbers in the interval [i, j-1]. You can do this with recursion and memoization.
Once you've done this then the problem is easy.
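A rough sketch of that recursion with memoization, assuming positive inputs as stated in the question. The names are hypothetical, and it makes no claim about meeting any contest time limit. It relies on the fact that with positive integers neither '+' nor '*' can make a value smaller, so anything above the limit can be dropped.

```java
import java.util.BitSet;

final class PlusTimesReachability {
    // For each target, tells whether it can be built from all of list1, in order,
    // using '+', '*' and brackets. Values above limit are discarded.
    static boolean[] obtainable(int[] list1, int[] targets, int limit) {
        int n = list1.length;
        BitSet[][] memo = new BitSet[n + 1][n + 1];
        BitSet all = reach(list1, 0, n, limit, memo);
        boolean[] answers = new boolean[targets.length];
        for (int t = 0; t < targets.length; t++) {
            answers[t] = targets[t] >= 1 && targets[t] <= limit && all.get(targets[t]);
        }
        return answers;
    }

    // Values obtainable from the elements at positions i .. j-1.
    private static BitSet reach(int[] list, int i, int j, int limit, BitSet[][] memo) {
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        BitSet values = new BitSet(limit + 1);
        if (j - i == 1) {
            if (list[i] <= limit) {
                values.set(list[i]);
            }
        } else {
            // The outermost operator splits the interval at some position k.
            for (int k = i + 1; k < j; k++) {
                BitSet left = reach(list, i, k, limit, memo);
                BitSet right = reach(list, k, j, limit, memo);
                for (int a = left.nextSetBit(0); a >= 0; a = left.nextSetBit(a + 1)) {
                    for (int b = right.nextSetBit(0); b >= 0; b = right.nextSetBit(b + 1)) {
                        if (a + b <= limit) values.set(a + b);
                        if ((long) a * b <= limit) values.set(a * b);
                    }
                }
            }
        }
        memo[i][j] = values;
        return values;
    }
}
```

For List1=[2 4 3 5] and List2=[19 15 24] with a limit of 20000 this reports YES, NO, YES, as in the question's example.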
You could try a smart brute force which discards sets of equations by chunks.
