Longest sequence of numbers - java

I was recently asked this question in an interview. I could give an O(n log n) solution, but couldn't find the logic for an O(n) one. Can someone help me with an O(n) solution?
In an array find the length of longest sequence of numbers
Example :
Input : 2 4 6 7 3 1
Output: 4 (because 1,2,3,4 is a sequence even though they are not in consecutive positions)
The solution should also be realistic in terms of space consumed, i.e. it should remain practical even with an array of 1 billion numbers.

For non-consecutive numbers you need a means of sorting them in O(n). In this case you can use a BitSet, provided the values are non-negative.
import java.util.BitSet;
import java.util.stream.IntStream;

int[] ints = {2, 4, 6, 7, 3, 1};
BitSet bs = new BitSet();
IntStream.of(ints).forEach(bs::set); // one bit per value; values must be non-negative
// Each run of set bits is a consecutive sequence; keep the longest run.
int max = 0;
int set = bs.nextSetBit(0);
while (set >= 0) {
    int clear = bs.nextClearBit(set); // first missing value after the run
    max = Math.max(max, clear - set);
    set = bs.nextSetBit(clear);       // -1 once no values remain
}
System.out.println(max);

Traverse the array once and build a hash map whose key is a number from the input array and whose value is a boolean indicating whether the element has been processed (initially all are false). Traverse once more and do the following: when you check number a, set its value to true in the hash map and immediately check the hash map for the elements a-1 and a+1. If found, mark them as true and proceed to check their neighbors, incrementing the length of the current contiguous subsequence. Stop when there are no more neighbors, and update the longest length. Move forward in the array and continue with the unprocessed elements. It is not obvious at first glance that this solution is O(n), but there are only two array traversals and the hash map ensures that every element of the input is processed only once.
Main lesson: if you have to reduce time complexity, it is often necessary to use additional space.
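The two traversals described above can be sketched as follows. This is only one possible reading of the answer: the class and method names are made up for illustration, and the overflow guards at Integer.MIN_VALUE and Integer.MAX_VALUE are an addition that the answer does not mention.

```java
import java.util.HashMap;
import java.util.Map;

class LongestConsecutive {
    // Returns the length of the longest run of consecutive integers present in nums.
    static int longestConsecutive(int[] nums) {
        // Pass 1: every value starts out unprocessed.
        Map<Integer, Boolean> processed = new HashMap<>();
        for (int n : nums) {
            processed.put(n, false);
        }
        int longest = 0;
        // Pass 2: grow a run around each value that has not been absorbed yet.
        for (int n : nums) {
            if (processed.get(n)) {
                continue;
            }
            processed.put(n, true);
            int length = 1;
            int lo = n;
            // Extend downwards through a-1, a-2, ...
            while (lo != Integer.MIN_VALUE && processed.containsKey(lo - 1)) {
                processed.put(--lo, true);
                length++;
            }
            int hi = n;
            // Extend upwards through a+1, a+2, ...
            while (hi != Integer.MAX_VALUE && processed.containsKey(hi + 1)) {
                processed.put(++hi, true);
                length++;
            }
            longest = Math.max(longest, length);
        }
        return longest;
    }
}
```

Calling LongestConsecutive.longestConsecutive(new int[]{2, 4, 6, 7, 3, 1}) returns 4. Each value is marked at most once, which is where the O(n) bound comes from. Note that a HashMap of boxed integers uses considerably more memory per element than a bit per value, which matters for the 1 billion element case in the question.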

Related

Finding number of subarrays whose sum equals `k`

We would be given an array of integers and a value k. We need to find the total number of sub-arrays whose sum equals k.
I found some interesting code online (on Leetcode) which is as follows:
public class Solution {
public int subarraySum(int[] nums, int k) {
int sum = 0, result = 0;
Map<Integer, Integer> preSum = new HashMap<>();
preSum.put(0, 1);
for (int i = 0; i < nums.length; i++) {
sum += nums[i];
if (preSum.containsKey(sum - k)) {
result += preSum.get(sum - k);
}
preSum.put(sum, preSum.getOrDefault(sum, 0) + 1);
}
return result;
}
}
To understand it, I walked through some specific examples like [1,1,1,1,1] with k=3 and [1,2,3,0,3,2,6] with k=6. While the code works perfectly in both the cases, I fail to follow how it actually computes the output.
I have two specific points of confusion:
1) Why does the code continuously add the values in the array, without ever zeroing it out? For example, in case of [1,1,1,1,1] with k=3, once sum=3, don't we need to reset sum to zero? Doesn't not resetting sum interfere with finding later subarrays?
2) Shouldn't we simply do result++ when we find a subarray of sum k? Why do we add preSum.get(sum-k) instead?
Let's handle your first point of confusion first:
The reason the code keeps summing the array and doesn't reset sum is that we are saving each running sum in preSum (previous sums) as we go. Then, any time we get to a point where sum-k is a previous sum (say the running sum through index i), we know that the elements after index i, up to and including our current index, sum to exactly k.
For example, take [1,2,3,0,3,2,6] with k=6 and a current index of 4 (counting from 0). The sum at our current index is 9 and the sum through index 1 is 3, so since 9 minus 3 is 6, the sum between indexes 2 and 4 (inclusive) is 6.
Another way to think about this is to see that discarding [1,2] from the front of the array (at our current index of 4) leaves [3,0,3], a subarray of sum 6.
Using this method of thinking, we can say we want to discard from the front of the array until we are left with a subarray of sum k. We could do this by saying, for each index, "discard just 1, then discard 1+2, then discard 1+2+3, etc" (these numbers are from our example) until we found a subarray of sum k (k=6 in our example).
That gives a perfectly valid solution, but notice we would be doing this at every index of our array, and thus summing the same numbers over and over. A way to save computation would be to save these sums for later use. Even better, we already sum these same numbers to get our current sum, so we can just save that total as we go.
To find a subarray, we can just look through our saved sums, subtracting each from the current sum and testing whether what we are left with is k. It is a bit annoying to have to subtract every saved sum, so we can rearrange the equation: if sum-x=k is true, then sum-k=x is also true. This way we can just check whether sum-k is a saved sum, and, if it is, know we have found a subarray of sum k. A hash map makes this lookup efficient.
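To make the "save the sums, then subtract each one" idea concrete, here is a minimal sketch of the slower version that tests every saved sum at every index. The class and method names are hypothetical; the hash map in the question's code replaces the inner loop with a single lookup of sum-k.

```java
class PrefixSumScan {
    // Counts subarrays summing to k by testing every saved prefix sum (O(n^2)).
    static int countSubarrays(int[] nums, int k) {
        // prefix[j] holds the sum of nums[0..j-1]; prefix[0] = 0 means "discard nothing".
        int[] prefix = new int[nums.length + 1];
        int count = 0;
        for (int i = 0; i < nums.length; i++) {
            prefix[i + 1] = prefix[i] + nums[i];
            int sum = prefix[i + 1];
            // Try discarding each shorter prefix from the front.
            for (int j = 0; j <= i; j++) {
                if (sum - prefix[j] == k) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For [1,2,3,0,3,2,6] with k=6 this counts 4 subarrays: [1,2,3], [1,2,3,0], [3,0,3] and [6].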
Now for your second point of confusion:
Most of the time you are right, upon finding an appropriate subarray we could just do result++. Almost always, the values in preSum will be 1, so result+=preSum.get(sum-k) will be equivalent to result+=1, or result++.
The only time it isn't is when preSum.put is called on a sum that has been reached before. How can we get back to a sum we already had? The only way is with either negative numbers, which cancel out previous numbers, or with zero, which doesn't affect the sum at all.
Basically, we get back to a previous sum when a subarray's sum is equal to 0. Two examples of such subarrays are [2,-2] or the trivial [0]. With such a subarray, when we find a later, adjoining subarray with sum k, we need to add more than 1 to result as we have found more than one new subarray, one with the zero-sum subarray (sum=k+0) and one without it (sum=k).
This is the reason for that +1 in the preSum.put as well. Every time we reach the same sum again, we have found another zero-sum subarray. With two zero-sum subarrays, finding a new adjoining subarray with sum=k actually gives 3 subarrays: the new subarray (sum=k), the new subarray plus the first zero-sum (sum=k+0), and the original with both zero-sums (sum=k+0+0). This logic holds for higher numbers of zero-sum subarrays as well.
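The difference between the two counting strategies can be checked on a small input that contains a zero-sum subarray. The sketch below is illustrative only: countWithIncrement is a hypothetical variant that does result++ as proposed in the question, and it is shown purely to expose the undercount.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ZeroSumDemo {
    // Frequency-based count, following the code from the question.
    static int countWithFrequencies(int[] nums, int k) {
        Map<Integer, Integer> preSum = new HashMap<>();
        preSum.put(0, 1);
        int sum = 0, result = 0;
        for (int n : nums) {
            sum += n;
            result += preSum.getOrDefault(sum - k, 0);
            preSum.merge(sum, 1, Integer::sum);
        }
        return result;
    }

    // Hypothetical variant that only does result++; it undercounts when a sum repeats.
    static int countWithIncrement(int[] nums, int k) {
        Set<Integer> seen = new HashSet<>();
        seen.add(0);
        int sum = 0, result = 0;
        for (int n : nums) {
            sum += n;
            if (seen.contains(sum - k)) {
                result++;
            }
            seen.add(sum);
        }
        return result;
    }
}
```

With nums = [1,2,-2,3] and k=3, the running sum 1 is reached twice because [2,-2] sums to zero. The frequency version returns 3 ([1,2], [3] and [2,-2,3]), while the increment version returns 2.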

Understanding Radix Sort Algorithm [duplicate]

I am trying to understand how this radix sort algorithm works. I am new to algorithms and bits, so this isn't coming easily to me. So far I have added these comments to my code to try to make it easier to understand. I am not sure if I have grasped the concept correctly, so if anyone can see any problems with my comments, or something I am not understanding correctly, please help me out :)
Also would anyone be able to explain this line of code to me: mask = 1 << bit;
My commented code:
public static ArrayList<Integer> RadixSort(ArrayList<Integer> a)
//This method implements the radix sort algorithm, taking an integer array as an input
{
ArrayList<Integer> array = CopyArray(a);
//Created a new integer array called 'array' and set it to equal the array inputed to the method
//This was done by copying the array entered to the method through the CopyArray method, then setting the results of the method to the new empty array
Integer[] zerobucket = new Integer[a.size()];
Integer[] onebucket = new Integer[a.size()];
//Created two more integer arrays to act as buckets for the binary values
//'zerobucket' will hold array elements where the ith bit is equal to 0
//'onebucket' will hold array elements where the ith bit is equal to 1
int i, bit;
//Created two integer variables i & bit, these will be used within the for loops below
//Both i & bit will be incremented to run the radix sort for every bit of the binary value, for every element in the array
Integer element, mask;
//Created an integer object called element, this will be used to retrieve the ith element of the unsorted array
//Created an integer object called mask, this will be used to compare the bit values of each element
for(bit=0; bit<8; ++bit)
//Created a for loop to run for every bit of the binary value e.g.01000000
//Change from 8 to 32 for whole integers - will run 4 times slower
{
int zcount = 0;
int ocount = 0;
//Created two integer variables to allow the 'zerobucket' and 'onebucket' arrays to be increment within the for loop below
for(i=0; i<array.size(); ++i)
//Created a nested for loop to run for every element of the unsorted array
//This allows every bit for every binary value in the array
{
element = array.get(i);
//Set the variable 'element' to equal the ith element in the array
mask = 1 << bit;
if ((element & mask) == 0)
//If the selected bit of the binary value is equal to 0, run this code
{
zerobucket[zcount++] = array.get(i);
//Set the next element of the 'zerobucket' array to equal the ith element of the unsorted array
}
else
//Else if the selected bit of the binary value is not equal to 0, run this code
{
onebucket[ocount++] = array.get(i);
//Set the next element of the 'onebucket' array to equal the ith element of the unsorted array
}
}
for(i=0; i<ocount; ++i)
//Created a for loop to run for every element within the 'onebucket' array
{
array.set(i,onebucket[i]);
//Appended the ith element of the 'onebucket' array to the ith position in the unsorted array
}
for(i=0; i<zcount; ++i)
//Created a for loop to run for every element within the 'zerobucket' array
{
array.set(i+ocount,zerobucket[i]);
//Appended the ith element of the 'zerobucket' array to the ith position in the unsorted array
}
}
return(array);
//Returned the sorted array to the method
}
I did not write this code; I was given it to try to understand.
I'll answer your questions in reverse order...
mask = 1 << bit;
Due to precedence rules, you could write this as
mask = (1 << bit);
Which is a bit more obvious. Take the integer 1 (0x01), shift it left by bit positions, and assign the result to mask. So if bit is 2, mask is 00000100 (showing only the low eight bits). If bit is 4, mask is 00010000. And so on.
The reason for the mask is the line that follows:
if ((element & mask) == 0)
which is meant to identify whether the bit at position bit is a 1 or a 0. An item ANDed with the bitmask will be zero or non-zero depending on whether the bit in the same position as the 1 in the bitmask is 0 or 1, respectively.
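A tiny sketch of the shift and the AND test may help here. The helper names isBitSet and toBinary8 are made up for illustration and are not part of the code in the question.

```java
class MaskDemo {
    // Returns true when the bit at position 'bit' (0 = least significant) of 'element' is 1.
    static boolean isBitSet(int element, int bit) {
        int mask = 1 << bit;
        return (element & mask) != 0;
    }

    // Renders the low 8 bits of a value as a zero-padded binary string.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }
}
```

MaskDemo.toBinary8(1 << 2) gives "00000100" and MaskDemo.toBinary8(1 << 4) gives "00010000". For the element 5 (00000101), isBitSet returns true for bits 0 and 2 and false for bit 1.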
Now the more complicated question. The algorithm in question is a least significant bit radix sort, meaning a radix sort in which the passes over the sorted values go from the least to most significant (or in the case of software integers, right to left) bits.
The following pseudocode describes your code above:
array = copy(a)
for every bit position from 0 to 7    // the code examines the low 8 bits
    for each item in array
        if bit at bit position in item is 0
            put item in zeroes bucket
        else
            put item in ones bucket
    array = ordered concatenation of ones bucket and zeroes bucket
return array
So why does this work? You can think of this as an iterative weighted ranking of the items in the array. All other bits being equal, an item with a 1 bit will be a larger item than an item with a 0 bit (rank the items). Each pass becomes more important to final position (weight the rankings). The successive application of these binary sorts will result in items that more often have a 1 being more often placed in the 1's bucket. Each time that an item stays in the 1's bucket when other items don't, that item improves its relative position in the 1's bucket when compared to other items that in future passes may also have a 1. Consistent presence in the 1's bucket is then associated with consistent improvement in position, the logical extreme of which will be the largest value in the array.
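The pseudocode can also be written compactly with plain int arrays. This is a sketch rather than the code from the question: the class name, the bits parameter and the use of System.arraycopy are choices made here, and it assumes every value fits in the low bits that are examined (0 to 255 for bits = 8).

```java
import java.util.Arrays;

class BinaryRadixSort {
    // LSD radix sort on the low 'bits' bits; ones bucket first, so the result is descending.
    static int[] sortDescending(int[] input, int bits) {
        int[] array = Arrays.copyOf(input, input.length);
        int[] zeroes = new int[array.length];
        int[] ones = new int[array.length];
        for (int bit = 0; bit < bits; bit++) {
            int mask = 1 << bit;
            int zeroCount = 0, oneCount = 0;
            // Stable partition: items keep their relative order inside each bucket.
            for (int value : array) {
                if ((value & mask) == 0) {
                    zeroes[zeroCount++] = value;
                } else {
                    ones[oneCount++] = value;
                }
            }
            // Ordered concatenation: ones bucket followed by zeroes bucket.
            System.arraycopy(ones, 0, array, 0, oneCount);
            System.arraycopy(zeroes, 0, array, oneCount, zeroCount);
        }
        return array;
    }
}
```

Because the ones bucket is placed first on every pass, the largest values end up at the front; copying the zeroes bucket first instead would give ascending order. The sort only works because each pass is stable, so the ordering established by the less significant bits survives among items that agree on the current bit.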
Hope that helps.

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave the solution but that was of O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the sequence holds a single distinct value, namely the first element (no matter how long the sequence is).
If they are different, divide the sequence and repeat for each subsequence.
Runs in roughly O(m log(n)) for m distinct values (so O(log(n)) when m is a small constant), and O(n) only in the worst case (when each element is different).
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
List<Integer> result = new LinkedList<Integer>();
findUniqueNumbers(data, 0, data.length - 1, result, false);
return result;
}
private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
int a = data[i1];
int b = data[i2];
// homogenous sequence a...a
if (a == b) {
if (!skipFirst) {
result.add(a);
}
}
else {
//divide & conquer
int i3 = (i1 + i2) / 2;
findUniqueNumbers(data, i1, i3, result, skipFirst);
findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
}
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do O(mlogn).
Note that this is going to be efficient when m << n (e.g. m=2 and n=100).
Algorithm:
Initialization: Current element y = first element x[0]
Step 1: Do a binary search for the last occurrence of y in x (can be done in O(log(n)) time). Let its index be i.
Step 2: If i is the last index, stop. Otherwise set y = x[i+1] and go to step 1.
Edit: In cases where m = O(n) this algorithm is going to work badly. To alleviate it you can run it in parallel with regular O(n) algorithm. The meta algorithm consists of my algorithm and O(n) algorithm running in parallel. The meta algorithm stops when either of these two algorithms complete.
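The steps above can be sketched as follows, without the parallel O(n) fallback mentioned in the edit. The class and method names are illustrative, and the input is assumed to be sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

class DistinctBySearch {
    // Returns the distinct values of a sorted array using one binary search per value.
    static List<Integer> distinct(int[] x) {
        List<Integer> result = new ArrayList<>();
        int start = 0;
        while (start < x.length) {
            int y = x[start];
            result.add(y);
            // Binary search for the last index holding y in x[start..length-1].
            int lo = start, hi = x.length - 1;
            while (lo < hi) {
                int mid = lo + (hi - lo + 1) / 2; // round up so the range always shrinks
                if (x[mid] == y) {
                    lo = mid;       // last occurrence is at mid or later
                } else {
                    hi = mid - 1;   // x[mid] > y, so the last occurrence is before mid
                }
            }
            start = lo + 1;         // first index of the next distinct value
        }
        return result;
    }
}
```

For the question's example {1, 1, 1, 5, 5, 5, 9, 10, 10} this returns [1, 5, 9, 10], using four binary searches instead of a full scan.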
Since the data consists of integers, there are a finite number of unique values that can occur between any two values. So, start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there will be some repeating values. Put a[0] and a[length-1] into some constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. You know that the array is sorted. So, if the two values are different, you can look at the middle element now. If the middle element equals the first value, you know that you can skip the whole left part of the array and only analyze the right part recursively; likewise, if it equals the last value, you can skip the right part. Otherwise, analyze both left and right part recursively.
Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time regardless of array size if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes this algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;
/**
* remove duplicates in a sorted array in average O(log(n)), worst O(n)
* @author XXX
*/
public class UniqueValue {
public static void main(String[] args) {
int[] test = {-1, -1, -1, -1, 0, 0, 0, 0,2,3,4,5,5,6,7,8};
UniqueValue u = new UniqueValue();
System.out.println(u.getUniqueValues(test, 0, test.length - 1));
}
// i must be start index, j must be end index
public List<Integer> getUniqueValues(int[] array, int i, int j) {
if (array == null || array.length == 0) {
return new ArrayList<Integer>();
}
List<Integer> result = new ArrayList<>();
if (array[i] == array[j]) {
result.add(array[i]);
} else {
int mid = (i + j) / 2;
result.addAll(getUniqueValues(array, i, mid));
// avoid duplicate divide
while (mid < j && array[mid] == array[++mid]);
if (array[(i + j) / 2] != array[mid]) {
result.addAll(getUniqueValues(array, mid, j));
}
}
return result;
}
}

ArrayList sorting longest sequence

I'm not asking anyone to solve this for me, I just need a little push because I have no earthly idea on where to begin with this. All I know is that I should implement collections in this and have a sort.
Write a method longestSortedSequence that returns the length of the longest sorted sequence within a list of integers. For example, if a variable called list stores the following sequence of values:
[1, 3, 5, 2, 9, 7, -3, 0, 42, 308, 17]
then the call: list.longestSortedSequence() would return the value 4 because it is the length of the longest sorted sequence within this list (the sequence -3, 0, 42, 308). If the list is empty, your method should return 0. Notice that for a non-empty list the method will always return a value of at least 1 because any individual element constitutes a sorted sequence.
Assume you are adding to the ArrayIntList class with following fields:
public class ArrayIntList
{
private int[] elementData;
private int size;
// your code goes here
}
Iterate over the array, and increment a counter variable if the next element you process is larger than the last one.
If the next element is smaller, or the end of the array is reached, store the current counter value if it's larger than the currently stored max value, and reset the counter variable to 1, since the new element starts a sequence of its own.
Pseudo code:
If the list is empty, the result is 0
Variable X: first item of list
Variable Y: length of sequence (initial: 1)
Variable Z: max length occurred (initial: 0)
Loop over the list starting from 2nd item
    if item is higher than X
        add 1 to Y
    else
        if Y is higher than Z
            set Z to Y
        end if
        set Y to 1
    end if
    set X to item
End-Loop
if Y is higher than Z
    set Z to Y
end if
This method will restart the counter every time the sequence 'restarts', aka: it's no longer sorted. While the list is sorted it just adds 1 for each element that is in sorted order.
When the sequence stops being ordered it checks if the current sequence is longer than the longest sequence length so far. If it is, you have your new longest sequence.
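As a rough Java sketch of that loop inside the ArrayIntList class from the question: the varargs constructor is hypothetical and exists only so the sketch can be exercised, and equal neighbours are treated as still sorted, which is an assumption (use > instead of >= if the sequence must be strictly increasing).

```java
class ArrayIntList {
    private int[] elementData;
    private int size;

    // Hypothetical constructor, added only so the sketch can be exercised.
    ArrayIntList(int... values) {
        elementData = values.clone();
        size = values.length;
    }

    // Length of the longest run of adjacent elements in non-decreasing order.
    public int longestSortedSequence() {
        if (size == 0) {
            return 0;
        }
        int current = 1; // length of the run ending at the current element
        int longest = 1; // best run seen so far
        for (int i = 1; i < size; i++) {
            if (elementData[i] >= elementData[i - 1]) {
                current++;
            } else {
                current = 1; // the run restarts at this element
            }
            longest = Math.max(longest, current);
        }
        return longest;
    }
}
```

For the list [1, 3, 5, 2, 9, 7, -3, 0, 42, 308, 17] this returns 4. Updating longest on every iteration avoids the separate check after the loop for a run that reaches the end of the list.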
Have you thought about a for loop and if-else statements? I hope this doesn't give it away. Think one element at a time.
Loop over your array and compare element i with element i+1. Keep a counter. While element i is less than element i+1, increment the counter; when element i is greater than element i+1, reset the counter.

Looking for a hint (not the answer) on how to return the longest ascending non contiguous substring when I already have the length

My code currently returns the length of the largest substring:
for(int i = 1; i<=l-1;i++)
{
counter = 1;
for(int j = 0; j<i;j++)
{
if(seq[j]<seq[j+1])
{
count[j] = counter++;
}
}
}
for(int i = 0;i<l-1;i++)
{
if(largest < count[i+1])
{
largest = count[i+1];
}
}
Here seq holds the numbers in the sequence. So if the sequence is 5;3;4;8;6;7, it prints out 4. However, I would like it to also print out 3;4;6;7, which is the longest substring in ascending order.
I am trying to get both the length of the largest subsequence and the actual sequence, but so far I only have the length.
My instinct is to store each number in an array, together with the count, while the count is being worked out. Then returning the longest count could also return the array attached to it. I think this can be done with hashtables, but I'm not sure how to use those.
I am just looking for a hint, not the answer.
Thanks
You need to implement a dynamic programming algorithm for the longest ascending subsequence. The idea is to store a pair of values for each position i:
1) The length of the longest ascending subsequence that ends at position i
2) The index of the item preceding the current one in such ascending subsequence, or -1 if all prior numbers are greater than or equal to the current one.
You can easily build both these arrays by setting the first pair to {Length=1, Prior=-1}, walking the array in ascending index order, and looking for the "best" predecessor for the current item at index i. The predecessor must fit these two conditions:
1) It must have a lower index and be smaller than the item at i, and
2) It must end an ascending subsequence longer than that of the best predecessor you have found so far.
Here is how the data would look for your sequence:
Index:        0   1   2   3   4   5
Value:        5   3   4   8   6   7
-----------------------------------
Length:       1   1   2   3   3   4
Predecessor: -1  -1   1   2   2   4
Once you finish the run, find the max value among lengths array, and chain it back to the beginning using the predecessor's indexes until you hit -1.
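For readers who want more than a hint, the length and predecessor arrays and the final walk back can be sketched as below. The class and method names are made up for illustration, and this is the straightforward O(n^2) version of the dynamic programming approach, which returns one longest strictly ascending subsequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LongestAscending {
    // O(n^2) dynamic programming with predecessor links.
    static List<Integer> longestAscending(int[] seq) {
        int n = seq.length;
        int[] length = new int[n]; // longest ascending subsequence ending at i
        int[] prior = new int[n];  // index of the previous item in it, or -1
        int bestEnd = -1;          // index where the overall longest one ends
        for (int i = 0; i < n; i++) {
            length[i] = 1;
            prior[i] = -1;
            for (int j = 0; j < i; j++) {
                // j is a better predecessor if it is smaller and ends a longer chain.
                if (seq[j] < seq[i] && length[j] + 1 > length[i]) {
                    length[i] = length[j] + 1;
                    prior[i] = j;
                }
            }
            if (bestEnd == -1 || length[i] > length[bestEnd]) {
                bestEnd = i;
            }
        }
        // Follow the predecessor links back from the end, then reverse.
        List<Integer> result = new ArrayList<>();
        for (int i = bestEnd; i != -1; i = prior[i]) {
            result.add(seq[i]);
        }
        Collections.reverse(result);
        return result;
    }
}
```

For the sequence 5;3;4;8;6;7 the arrays match the table above, and walking back from index 5 through indexes 4, 2 and 1 yields [3, 4, 6, 7].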
