Searching an array for sum of values - java

I have a system that generates values in a text file which contains values as below
Line 1 : Total value possible
Line 2 : No of elements in the array
Line 3(extra lines if required) : The numbers themselves
I am now thinking of an approach where I subtract the first integer in the array from the total value and then search the array for the remainder, repeating this for each element until the pair is found.
The other approach is to try every combination of two integers in the array and check whether they sum to the total.
As per my analysis the first solution is better since it cuts down on the number of iterations. Is my analysis correct here, and is there any other better approach?
Edit :
I'll give a sample here to make it more clear
Line 1 : 200
Line 2 : 10
Line 3 : 10 20 80 78 19 25 198 120 12 65
Now the valid pair here is 80,120 since they sum to 200 (given in line one as the total value possible in the input file), and their positions in the array are 3 and 8. To find this pair, my approach is to take the first element, subtract it from the total value possible, and then search for the other element using basic search algorithms.
Using the example, I first take 10 and subtract it from 200, which gives 190. I then search for 190; if it is found, the pair is found, otherwise I continue the same process with the next element.

Your problem is a bit vague, but if you are looking for a pair in the array that sums to a certain number, it can be done in O(n) on average using a hash table.
Iterate the array, and for each element:
(1) Check if it is in the table. If it is, stop and report that such a pair exists.
(2) Else: insert num - element into the hash table.
If your iteration terminates without finding a match, there is no such pair.
pseudo code:

checkIfPairExists(arr, num):
    set <- new empty hash set
    for each element in arr:
        if set.contains(element):
            return true
        else:
            set.add(num - element)
    return false
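Since the question is about Java, here is a minimal sketch of the same idea using java.util.HashSet (the class and method names are just illustrative choices, not anything from your code):

import java.util.HashSet;
import java.util.Set;

public class PairSum {
    // Returns true if two elements of arr sum to num: O(n) average time, O(n) extra space.
    static boolean checkIfPairExists(int[] arr, int num) {
        Set<Integer> seen = new HashSet<>();
        for (int element : arr) {
            if (seen.contains(element)) {
                return true;             // element completes a pair with an earlier value
            }
            seen.add(num - element);     // remember the value that would complete a pair with element
        }
        return false;
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 80, 78, 19, 25, 198, 120, 12, 65};
        System.out.println(checkIfPairExists(arr, 200));   // true, because 80 + 120 == 200
    }
}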
The general problem of "is there a subset that sums to a certain number" is NP-Hard, and is known as the subset-sum problem, so there is no known polynomial solution to it.

If you're trying to find a pair of numbers which sum to a third number, in general you'll have something like:
for (i = 0; i < N; i++)
    for (j = i + 1; j < N; j++)
        if (numbers[i] + numbers[j] == result)
            the answer is <i, j>
end
which is O(n^2). However, it is possible to do better.
If the list of numbers is sorted (which takes O(n log n) time) then you can try:
for (i = 0; i < N; i++)
    binary_search numbers[i+1:N] for result - numbers[i]
    if the search succeeds:
        the answer is <i, search_result_index>
end
That is, you can step through each number and then do a binary search on the remaining list for its companion number. This takes O(n log n) time. You may need to implement the search function above yourself, as built-in functions may just walk down the list in O(n) time, leading to an O(n^2) result.
For both methods, you'll want to check for the special case that the current number is equal to your result.
Both algorithms use no more space than is taken by the array itself.
Apologies for the coding style, I'm not terribly familiar with Java and it's the ideas here which are important.
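To make the second idea concrete, here is a hedged Java sketch of the sort-plus-binary-search variant (it only reports whether a pair exists; sorting discards the original positions, so recovering indices would need extra bookkeeping):

import java.util.Arrays;

public class PairSumSorted {
    // Returns true if two elements of numbers sum to result. O(n log n) overall.
    static boolean pairExists(int[] numbers, int result) {
        int[] sorted = numbers.clone();
        Arrays.sort(sorted);                                                  // O(n log n)
        for (int i = 0; i + 1 < sorted.length; i++) {
            // Binary search only the part after i so the same element is not reused.
            int j = Arrays.binarySearch(sorted, i + 1, sorted.length, result - sorted[i]);
            if (j >= 0) {
                return true;                                                  // pair is <sorted[i], sorted[j]>
            }
        }
        return false;
    }
}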

Related

How can I make this guessing game?

How can I do this? I'm having a problem with it. The program should behave like this:
Enter the mode: A
Enter the minimal possible integer: 1
Enter the maximal possible integer: 10
I have generated a random integer between 1 and 10.
Try to guess: 5
No. It is smaller!
Try to guess: 3
Done.
Enter the mode: B
Enter the minimal possible integer: 1
Enter the maximal possible integer: 10
Generate a random integer between 1 and 10...
Which method I should use to guess it?
1: Binary search
2: Interpolation search
Enter your choice of the method: 1
Is it 5? (<, >, =)
<
Is it 3?
Done.
Learning different search algorithms is very interesting.
Since you say that you need to learn, here is a theoretical explanation of both algorithms.
Binary search
This is a computer science search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.
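For mode B, binary search boils down to repeatedly guessing the middle of the remaining range. A minimal, hedged Java sketch (the I/O wording is my own, and the midpoint rounding may differ slightly from your sample run):

import java.util.Scanner;

public class BinaryGuesser {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int lo = 1, hi = 10;                              // the range entered by the user
        while (lo <= hi) {
            int guess = lo + (hi - lo) / 2;               // middle of the remaining range
            System.out.println("Is it " + guess + "? (<, >, =)");
            String answer = in.nextLine().trim();
            if (answer.equals("=")) {
                System.out.println("Done.");
                return;
            } else if (answer.equals("<")) {              // the secret number is smaller
                hi = guess - 1;
            } else {                                      // the secret number is bigger
                lo = guess + 1;
            }
        }
    }
}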
Interpolation search
Instead of calculating the midpoint, interpolation search estimates the position of the target value, taking into account the lowest and highest elements in the array as well as length of the array. This is only possible if the array elements are numbers. It works on the basis that the midpoint is not the best guess in many cases. For example, if the target value is close to the highest element in the array, it is likely to be located near the end of the array.
To find the position to be searched, it uses the following formula:

pos = lo + ((x - arr[lo]) * (hi - lo)) / (arr[hi] - arr[lo])

arr[] -> array in which elements need to be searched
x     -> element to be searched
lo    -> starting index in arr[]
hi    -> ending index in arr[]
Source: Wikipedia, Binary search.
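As a hedged Java sketch of that formula applied to a sorted int array (the method and variable names are mine; the long cast just guards against overflow):

public class InterpolationSearch {
    // Returns the index of x in the sorted array arr, or -1 if it is not present.
    static int interpolationSearch(int[] arr, int x) {
        int lo = 0, hi = arr.length - 1;
        while (lo <= hi && x >= arr[lo] && x <= arr[hi]) {
            if (arr[hi] == arr[lo]) {                     // all remaining values equal; avoid division by zero
                return arr[lo] == x ? lo : -1;
            }
            // pos = lo + (x - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo])
            int pos = lo + (int) ((long) (x - arr[lo]) * (hi - lo) / (arr[hi] - arr[lo]));
            if (arr[pos] == x) return pos;
            if (arr[pos] < x) lo = pos + 1;
            else hi = pos - 1;
        }
        return -1;
    }
}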
How you wish to implement them is at your own discretion.
If you are stuck with that and post the code of your attempt, we can help you more.

Partition array into K subsets of same sum value

I'm trying to figure out the following problem:
Given a set S of N positive integers, the task is to divide them into K subsets such that the sum of the element values in each of the K subsets is equal.
I want to do this for a set of not more than 10 integers, with values no bigger than 10, and fewer than 5 subsets.
All integers need to be distributed, and only perfect solutions (meaning all subset sums are equal, no approximations) are accepted.
I want to solve it recursively using backtracking. Most resources I found online used other approaches I did not understand, using bitmasks or something, or only covered two subsets rather than K subsets.
My first idea was to
Sort the set in ascending order, check all base cases (e.g. an even distribution is not possible), and calculate the target value every subset has to reach so that all subsets are equal.
Go through each subset, filling it (starting with the biggest values first) until that target value is reached (meaning it is full).
If the target value for a subset can't be met (the undistributed values are too big, etc.), go back and try another combination for the previous subset.
Keep going back if dead ends are encountered.
Stop if all dead ends have been encountered or a perfect solution has been found.
Unfortunately I am really struggling with this, especially with implementing the backtrack and retrying new combinations.
Any help is appreciated!
The given set S with N elements has 2^N subsets (well explained here: https://www.mathsisfun.com/activity/subsets.html ). A partition is a grouping of the set's elements into non-empty subsets, in such a way that every element is included in one and only one of the subsets. The total number of partitions of an n-element set is the Bell number Bn.
A solution for this problem can be implemented as follows:
1) create all possible partitions of the set S, called P(S).
2) loop over P(S) and filter out the partitions in which the sums of the element values in the subsets do not all match.
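Because the inputs here are tiny (at most 10 values, each at most 10, fewer than 5 subsets), the same try-everything idea can also be expressed as the backtracking the question asks about: recursively place each value into one of the K buckets and undo the choice when it leads nowhere. A hedged Java sketch (names and structure are my own, not a reference solution):

import java.util.Arrays;

public class KPartition {
    // Returns true if nums can be split into k subsets that all have the same sum.
    static boolean canPartition(int[] nums, int k) {
        int total = Arrays.stream(nums).sum();
        if (k <= 0 || total % k != 0) return false;     // base case: an even split is impossible
        int[] sorted = nums.clone();
        Arrays.sort(sorted);                            // ascending; we place from the end (biggest first)
        return place(sorted, sorted.length - 1, new int[k], total / k);
    }

    // Try to place sorted[idx], sorted[idx-1], ... into buckets without exceeding target.
    static boolean place(int[] sorted, int idx, int[] buckets, int target) {
        if (idx < 0) return true;                       // every value placed, so every bucket equals target
        int value = sorted[idx];
        for (int b = 0; b < buckets.length; b++) {
            if (buckets[b] + value <= target) {
                buckets[b] += value;                    // choose
                if (place(sorted, idx - 1, buckets, target)) return true;
                buckets[b] -= value;                    // backtrack: undo and try the next bucket
            }
            if (buckets[b] == 0) break;                 // empty buckets are interchangeable, so stop here
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(canPartition(new int[]{4, 3, 2, 3, 5, 2, 1}, 4)); // true: {5} {4,1} {3,2} {3,2}
    }
}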

How to improve the complexity of HashMap iteration?

I implemented a custom HashMap class (in C++, but shouldn't matter). The implementation is simple -
A large array holds pointers to Items.
Each item contains the key - value pair, and a pointer to an Item (to form a linked list in case of key collision).
I also implemented an iterator for it.
My implementation of incrementing/decrementing the iterator is not very efficient. From the present position, the iterator scans the array of hashes for the next non-null entry. This is very inefficient, when the map is sparsely populated (which it would be for my use case).
Can anyone suggest a faster implementation, without affecting the complexity of other operations like insert and find? My primary use case is find, secondary is insert. Iteration is not even needed, I just want to know this for the sake of learning.
PS: Why did I implement a custom class? Because I need to find strings with some error tolerance, while the ready-made hash maps I have seen provide only exact matching.
EDIT: To clarify, I am talking about incrementing/decrementing an already obtained iterator. Yes, this is mostly done in order to traverse the whole map.
The errors in the strings (keys) in my case come from OCR errors, so I can not use the error handling techniques used to detect typing errors. The chance of the first character being wrong is almost the same as that of the last one.
Also, my keys are always string, one word to be exact. Number of entries will be less than 5000. So hash table size of 2^16 is enough for me. Even though it will still be sparsely populated, but that's ok.
My hash function:
The hash code size is 16 bits.
The first 5 bits are for the word length. ==> Max possible key length = 32. Reasonable, given that the key is a single word.
The last 11 bits are for the sum of the char codes. I only store English alphabet characters and do not need case sensitivity, so 26 codes are enough, 0 to 25. A key of 32 'z' characters gives 25 * 32 = 800, which is well within 2^11. I even have scope to add case sensitivity if needed in the future.
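A hedged sketch of that layout, written in Java rather than the poster's C++ and based only on my reading of the description (the worked example below uses length - 1 in the top bits, so the sketch does the same):

class SectionedHash {
    // Sketch of the described 16-bit hash: (length - 1) in the top 5 bits,
    // sum of letter codes (a = 0 .. z = 25) in the bottom 11 bits.
    static int hash(String key) {
        int sum = 0;
        for (char c : key.toLowerCase().toCharArray()) {
            sum += c - 'a';
        }
        return ((key.length() - 1) << 11) | (sum & 0x7FF);
    }
    // hash("hello") == (4 << 11) | 47, i.e. bit pattern (00100)(00000101111)
}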
Now when you compare a key containing an error with the correct one,
say "hell" with "hello"
1. Length of the keys is approx the same
2. sum of their chars will differ by the sum of the dropped/added/distorted chars.
In the hash code, as the first 5 bits are for length, the whole table has a fixed section for every possible key length. All sections are the same size. The first section stores keys of length 1, the second keys of length 2, and so on.
Now 'hello' is stored in the 5th section, as its length is 5. When we try to find 'hello',
Hashcode of 'hello' = (length - 1) (sum of chars) = (4) (7 + 4 + 11 + 11 + 14) = (4) (47)
= (00100)(00000101111)
similarly, hashcode of 'helo' = (3)(36)
= (00011)(00000100100)
We jump to its bucket, and don't find it there.
So we check for ONE distorted character. This will not change the length, but changes the sum of characters by at most -25 to +25. So we search from 25 places backwards to 25 places forwards, i.e. we check the sum part from (36 - 25) to (36 + 25) in the same section. We won't find it.
We check for an additional-character error. That means the correct string would contain only 3 characters, so we go to the third section. The sum of chars due to the additional char would have increased by at most 25, and that has to be compensated for. So we search the third section in the appropriate 25 places, (36 - 0) to (36 - 25). Again we don't find it.
Now we consider the case of a missing character, so the original string would contain 5 chars, and the second part of the hash code, the sum of chars in the original string, would be larger by 0 to 25. So we search the corresponding 25 buckets in the 5th section, (36 + 0) to (36 + 25). As 47 (the sum part of 'hello') lies in this range, we will find a match of the hash code. And we also know that this match will be due to a missing character. So we compare the keys allowing for a tolerance of 1 missing character. And we get a match!
In reality, this has been implemented to allow more than one error in key.
It can also be optimized to use only 25 places for the first section (since it has only one character) and so on.
Also, checking 25 places seems overkill, as we already know the largest and smallest char of the key. But it gets complex in case of multiple errors.
You mention an 'error tolerance' for the strings. Why not build the tolerance into the hash function itself and thus obviate the need for iteration?
You could go the way of Java's LinkedHashMap class. It adds efficient iteration to a hash map by also making it a doubly-linked list.
The entries are key-value pairs that have pointers to the previous and next entries. The hashmap itself has the large array as well as the head of the linked list.
Insertion/deletion are constant time for both data structures, searches are done via the hashmap, and iteration via the linked list.
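A hedged sketch of what that entry layout could look like (field names are illustrative, and it is shown in Java even though the original map is C++):

// Hash map entry that also participates in a doubly-linked list,
// in the style of java.util.LinkedHashMap.
class Entry<K, V> {
    final K key;
    V value;
    Entry<K, V> nextInBucket;    // collision chain within one bucket
    Entry<K, V> before, after;   // doubly-linked list over all entries, in insertion order

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }
}
// The map keeps the bucket array plus the head and tail of the linked list.
// Iteration follows the 'after' pointers and never has to scan empty buckets.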

Sorting string so that there aren't two same characters on adjacent places [duplicate]

It's a bonus school task for which we haven't received any teaching yet, and I'm not looking for complete code, but some tips to get going would be pretty cool. I'm going to post what I've done so far in Java when I get home, but here's something I've done already.
So, we have to write a sorting algorithm which, for example, sorts "AAABBB" to "ABABAB". The max input size is 10^6, and it all has to happen in under 1 second. If there's more than one answer, the first one in alphabetical order is the right one. I started to test different algorithms that just rearrange the characters, without the alphabetical-order requirement in mind, to see how things work out.
First version:
Save the character counts in an integer array where the index is the ASCII code and the value is the number of times that character occurs in the char array.
Then I picked the 2 highest counts and alternately appended their characters to the new character array until some other count became higher, and then swapped to it. It worked well, but of course the order wasn't right.
Second version:
Followed the same idea, but stopped picking the most frequent character and just picked the indexes in the order they appear in my array. This works well until the input is something like CBAYYY. The algorithm sorts it to ABCYYY instead of AYBYCY. Of course I could try to find some free spots for those Y's, but at that point it starts to take too long.
An interesting problem, with an interesting tweak. Yes, this is a permutation or rearranging rather than a sort. No, the quoted question is not a duplicate.
Algorithm.
Count the character frequencies.
Output alternating characters from the two lowest in alphabetical order.
As each is exhausted, move to the next.
At some point the highest frequency char will be exactly half the remaining chars. At that point switch to outputting all of that char alternating in turn with the other remaining chars in alphabetical order.
Some care required to avoid off-by-one errors (odd vs even number of input characters). Otherwise, just writing the code and getting it to work right is the challenge.
Note that there is one special case, where the number of characters is odd and the frequency of one character starts at (half plus 1). In this case you need to start with step 4 in the algorithm, outputting all one character alternating with each of the others in turn.
Note also that if one character comprises more than half the input then apart for this special case, no solution is possible. This situation may be detected in advance by inspecting the frequencies, or during execution when the tail consists of all one character. Detecting this case was not part of the spec.
Since no sort is required the complexity is O(n). Each character is examined twice: once when it is counted and once when it is added to the output. Everything else is amortised.
My idea is the following; with the right implementation it can be almost linear.
First establish a function to check whether a solution is even possible. It should be very fast: something like checking that the most frequent letter does not make up more than 1/2 of all letters, taking into consideration whether it can go first.
Then, while there are still letters remaining, take the alphabetically first letter that is not the same as the previous one and still makes a solution of the rest possible.
The correct algorithm would be the following:
Build a histogram of the characters in the input string.
Put the CharacterOccurrences in a PriorityQueue / TreeSet where they're ordered on highest occurrence, lowest alphabetical order
Have an auxiliary variable of type CharacterOccurrence
Loop while the PQ is not empty
Take the head of the PQ and keep it
Add the character of the head to the output
If the auxiliary variable is set => Re-add it to the PQ
Store the kept head in the auxiliary variable with 1 occurrence less unless the occurrence ends up being 0 (then unset it)
if the size of the output == size of the input, it was possible and you have your answer. Else it was impossible.
Complexity is O(N * log(N))
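A hedged Java sketch of those steps (an int[] pair stands in for a CharacterOccurrence, and the input is assumed to be uppercase A-Z; it produces a valid arrangement following the ordering above, while the alphabetically-first requirement needs the extra care discussed in the other answers):

import java.util.PriorityQueue;

public class Rearrange {
    // Priority-queue algorithm: highest count first, ties broken by lowest letter.
    static String rearrange(String input) {
        int[] counts = new int[26];
        for (char c : input.toCharArray()) counts[c - 'A']++;

        // Each entry is {letter index, remaining count}.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
                (a, b) -> a[1] != b[1] ? b[1] - a[1] : a[0] - b[0]);
        for (int i = 0; i < 26; i++) {
            if (counts[i] > 0) pq.add(new int[]{i, counts[i]});
        }

        StringBuilder out = new StringBuilder();
        int[] held = null;                       // auxiliary variable: entry kept out for one round
        while (!pq.isEmpty()) {
            int[] head = pq.poll();              // take the head of the PQ and keep it
            out.append((char) ('A' + head[0]));  // add its character to the output
            if (held != null) pq.add(held);      // re-add the previously held entry
            head[1]--;
            held = head[1] > 0 ? head : null;    // hold the head back with one occurrence less
        }
        return out.length() == input.length() ? out.toString() : null; // null => impossible
    }

    public static void main(String[] args) {
        System.out.println(rearrange("AAABBB")); // ABABAB
    }
}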
Make a bidirectional table of character frequencies: character->count and count->character. Record an optional<Character> which stores the last character output (or none if there is none). Store the total number of characters.
If (total number of characters - 1) < 2 * (highest character count), use the character with the highest count (otherwise there would be no solution). Fail if this is the same as the last character output.
Otherwise, use the alphabetically earliest character that isn't the last character output.
Record the last character output, and decrease both the total and the used character's count.
Loop while we still have characters.
While this question is not quite a duplicate, the part of my answer giving the algorithm for enumerating all permutations with as few adjacent equal letters as possible can readily be adapted to return only the minimum, as its proof of optimality requires that every recursive call yield at least one permutation. The extent of the changes outside of the test code is to try keys in sorted order and to break after the first hit is found. The running time of the code below is polynomial (O(n) if I bothered with better data structures), since unlike its ancestor it does not enumerate all possibilities.
david.pfx's answer hints at the logic: greedily take the least letter that doesn't eliminate all possibilities, but, as he notes, the details are subtle.
from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange

# Most frequent remaining letter.
def get_mode(count):
    return max(count.items(), key=itemgetter(1))[0]

# Emit x, recurse on the rest, then undo the changes (backtrack).
def enum2(prefix, x, count, total, mode):
    prefix.append(x)
    count_x = count[x]
    if count_x == 1:
        del count[x]
    else:
        count[x] = count_x - 1
    yield from enum1(prefix, count, total - 1, mode)
    count[x] = count_x
    del prefix[-1]

def enum1(prefix, count, total, mode):
    if total == 0:
        yield tuple(prefix)
        return
    if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
        # The most frequent letter is forced here.
        yield from enum2(prefix, mode, count, total, mode)
    else:
        defect_okay = not prefix or count[prefix[-1]] * 2 > total
        mode = get_mode(count)
        for x in sorted(count.keys()):
            if defect_okay or [x] != prefix[-1:]:
                yield from enum2(prefix, x, count, total, mode)
                break  # take only the alphabetically first viable letter

def enum(seq):
    count = Counter(seq)
    if count:
        yield from enum1([], count, sum(count.values()), get_mode(count))
    else:
        yield ()

# Number of adjacent equal pairs.
def defects(lst):
    return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))

# Brute-force check against all permutations.
def test(lst):
    perms = set(permutations(lst))
    opt = min(map(defects, perms))
    slow = min(perm for perm in perms if defects(perm) == opt)
    fast = list(enum(lst))
    assert len(fast) == 1
    fast = min(fast)
    print(lst, fast, slow)
    assert slow == fast

for r in range(10000):
    test([randrange(3) for i in range(randrange(6))])
You start by counting how many of each letter you have in your array:
For example you have 3 - A, 2 - B, 1 - C, 4 - Y, 1 - Z.
1) Each time, you output the lowest letter you are allowed to put (here it is A).
So you start with:
A
Then you cannot put A any more, so you put B:
AB
and so on, until you reach:
ABABACYZ
This works as long as you still have at least 2 kinds of characters left. But here you will still have 3 Y's remaining.
2) To place the last characters, you go from your first Y and insert one at every second position, moving towards the beginning.
So you get ABAYBYAYCYZ.
3) Then you take the subsequence between your Y's, which is YBYAYCY, and you sort the letters between the Y's:
BAC => ABC
And you arrive at
ABAYAYBYCYZ
which should be the solution of your problem.
To do all this, I think a LinkedList is the best way.
I hope it helps :)

Use hashing to find a subarray of strings with minimum total length which contain all the distinct strings in the original array

Hi, this is a Java exercise on hashing. We have an array of N strings (1 <= N <= 100000), and the program must find the minimum total length of a consecutive subarray which contains all the distinct strings present in the original array.
For example, if the original array is {apple, orange, orange, pear, pear, apple, pear},
the shortest consecutive subarray is {orange, pear, pear, apple},
so the answer is 19 (6 + 4 + 4 + 5).
I've written code which visits every element in the array and creates a new hash table to find the length of the subarray starting there which contains all the distinct strings. It becomes very, very slow once N is larger than 1000, so I hope there is a faster algorithm. Thank you!
1. Pass through the array once, using a hash to keep track of whether you've seen a word before or not. Count the distinct words in the array by adding to your count only when you're seeing a word for the first time.
2. Pass through the array a second time, using a hash to keep track of the number of times you've seen each word. Also keep track of the sum of the lengths of all the words you've seen. Keep going until you have seen every word at least once.
3. Now move the start of the range forward as long as you can do so without reducing a word's count to zero. Remember to adjust your hash and length sum accordingly. This gives you the first range which includes every word at least once and can't be shrunk without excluding a word.
4. Repeatedly do the following: move the left end of your range forward by one, and then move the right end forward until you find another instance of the word that you just dropped from the left end. Each time you do this, you have another minimal range that includes each word at least once.
5. While doing steps 3 and 4, keep track of the minimum length so far, and the start and end of the associated range. You're done when you would need to move the right end of your range past the end of the array. At this point you have the minimum length, and the range that achieves it.
This runs in linear time.
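A hedged Java sketch of that sliding window (the names are mine, and it tracks only the minimum total length rather than the range itself):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MinCoveringSubarray {
    // Returns the minimum total length of a consecutive subarray containing every distinct word.
    static int minTotalLength(String[] words) {
        Set<String> distinct = new HashSet<>();
        for (String w : words) distinct.add(w);
        int need = distinct.size();

        Map<String, Integer> window = new HashMap<>();  // word -> count inside the current window
        int covered = 0;                                // distinct words currently covered by the window
        long lengthSum = 0;                             // total characters inside the window
        long best = Long.MAX_VALUE;

        int left = 0;
        for (int right = 0; right < words.length; right++) {
            String w = words[right];
            window.merge(w, 1, Integer::sum);
            if (window.get(w) == 1) covered++;
            lengthSum += w.length();

            // Shrink from the left while every distinct word stays covered.
            while (covered == need) {
                best = Math.min(best, lengthSum);
                String out = words[left++];
                lengthSum -= out.length();
                if (window.merge(out, -1, Integer::sum) == 0) covered--;
            }
        }
        return (int) best;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "orange", "orange", "pear", "pear", "apple", "pear"};
        System.out.println(minTotalLength(words)); // 19
    }
}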
