Reverse Engineer Sorting Algorithm - Java

I have been given 3 algorithms to reverse engineer and explain how they work. So far I have worked out that I have been given a quicksort and a bubble sort; however, I'm not sure what this third algorithm is. I understand how quicksort and bubble sort work, but I just can't get my head around this one. I'm unsure what the variables are for and was hoping someone out there would be able to tell me what's going on here:
public static ArrayList<Integer> SortB(ArrayList<Integer> a)
{
    ArrayList<Integer> array = CopyArray(a);
    Integer[] zero = new Integer[a.size()];
    Integer[] one = new Integer[a.size()];
    int i, b;
    Integer x, p;

    // Change from 8 to 32 for whole integers - will run 4 times slower
    for (b = 0; b < 8; ++b)
    {
        int zc = 0;
        int oc = 0;
        for (i = 0; i < array.size(); ++i)
        {
            x = array.get(i);
            p = 1 << b;
            if ((x & p) == 0)
            {
                zero[zc++] = array.get(i);
            }
            else
            {
                one[oc++] = array.get(i);
            }
        }
        for (i = 0; i < oc; ++i) array.set(i, one[i]);
        for (i = 0; i < zc; ++i) array.set(i + oc, zero[i]);
    }
    return array;
}

This is a Radix Sort, limited to the least significant eight bits. It does not complete the sort unless you change the loop to go 32 times instead of 8.
Each iteration processes a single bit b. It prepares a mask p by shifting 1 left by b places. This produces a power of two: 1, 2, 4, 8, ... (in binary: 1, 10, 100, 1000, 10000, ...).
For each bit, the number of elements in the original array with bit b set to 1 and to 0 are separated into two buckets called one and zero. Once the separation is over, the elements are placed back into the original array, and the algorithm proceeds to the next iteration.
This implementation uses twice the size of the original array in extra storage, and goes through the array a total of 16 times (64 times in the full version: one reading pass and one writing pass per bit). The asymptotic complexity of the algorithm is linear.

Looks like a bit-by-bit radix sort to me, but it sorts in descending order: on each pass the one bucket is written back before the zero bucket, so elements with the current bit set end up at the front.
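For comparison, here is a minimal sketch of the ascending variant (my own rewrite, not part of the original assignment): the only change is writing the zero bucket back first. It assumes non-negative values; extending the loop to all 32 bits of a signed int would also need the sign bit handled specially.

import java.util.ArrayList;

// Ascending LSD radix sort over the low 8 bits, mirroring SortB
static ArrayList<Integer> sortAscending(ArrayList<Integer> a)
{
    ArrayList<Integer> array = new ArrayList<>(a);
    Integer[] zero = new Integer[a.size()];
    Integer[] one = new Integer[a.size()];
    for (int b = 0; b < 8; ++b)              // 8 bits; widen for larger values
    {
        int zc = 0, oc = 0;
        for (int x : array)                  // bucket elements by bit b
        {
            if ((x & (1 << b)) == 0) zero[zc++] = x;
            else one[oc++] = x;
        }
        for (int i = 0; i < zc; ++i) array.set(i, zero[i]);       // zeros first
        for (int i = 0; i < oc; ++i) array.set(zc + i, one[i]);   // then ones
    }
    return array;
}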

What is the space complexity of BitSet in this scenario

I am doing a LeetCode problem where I have to find the duplicate in an array whose values are in the range [1, N] inclusive, and came upon this solution:
public int findDuplicate(int[] nums) {
    BitSet bit = new BitSet();
    for (int num : nums) {
        if (!bit.get(num)) {
            bit.set(num);
        } else {
            return num;
        }
    }
    return -1;
}
The use of BitSet here, I'm assuming, is similar to using a boolean[] to keep track of whether we saw the current number previously. So my question is: what is the space complexity of this? The runtime seems to be O(n), where n is the size of the input array. Would the same be true for the space complexity?
Link to problem : https://leetcode.com/problems/find-the-duplicate-number/
Your BitSet creates an underlying long[] to store the values. Reading the code of BitSet#set, I would say it's safe to say that the array will never be larger than max(nums) / 64 * 2 = max(nums) / 32 entries. Since long has a fixed size, this comes down to O(max(nums)). If nums contains large values, you can do better with a hash map.
I'm trying this out with simple code, and it seems to corroborate my reading of the code.
BitSet bitSet = new BitSet();
bitSet.set(100);
System.out.println(bitSet.toLongArray().length); // 2 (max(nums) / 32 = 3.125)
bitSet.set(64000);
System.out.println(bitSet.toLongArray().length); // 1001 (max(nums) / 32 = 2000)
bitSet.set(100_000);
System.out.println(bitSet.toLongArray().length); // 1563 (max(nums) / 32 = 3125)
Note that the factor of 2 I added is conservative; in general the factor will be smaller. That's why my formula consistently over-estimates the actual length of the long array, but never by more than a factor of 2. This is the code in BitSet that made me add it:
private void ensureCapacity(int wordsRequired) {
    if (words.length < wordsRequired) {
        // Allocate larger of doubled size or required size
        int request = Math.max(2 * words.length, wordsRequired);
        words = Arrays.copyOf(words, request);
        sizeIsSticky = false;
    }
}
In summary, I would say the bit set is only a good idea if you have reason to believe the values are small relative to how many of them there are. For example, if you have only two values but they are over a billion, you will needlessly allocate an array of several million elements.
Additionally, even when the values remain small, this solution pays for growth on sorted arrays, because BitSet#set has to reallocate and copy the word array as the set bits climb. Thanks to the doubling in ensureCapacity, the total copying work stays O(max(nums)) amortized rather than quadratic, but if max(nums) is very large those copies still hurt. To avoid them entirely, you can first find the maximum, allocate the necessary length in the BitSet, and only then go through the array.
At this point, using a map is simpler and fits all situations. If speed really matters, my bet is that the BitSet will beat a map under specific conditions (lots of values, but small, and with the bit set pre-sized as described).
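A minimal sketch of that pre-sizing idea (the method name findDuplicatePresized is my own, not part of the original solution):

import java.util.Arrays;
import java.util.BitSet;

public int findDuplicatePresized(int[] nums) {
    // Size the BitSet up front so set() never has to reallocate and copy
    int max = Arrays.stream(nums).max().orElse(0);
    BitSet bit = new BitSet(max + 1);
    for (int num : nums) {
        if (bit.get(num)) return num;
        bit.set(num);
    }
    return -1;
}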

Finding mode for every window of size k in an array

Given an array of size n and k, how do you find the mode for every contiguous subarray of size k?
For example
arr = 1 2 2 6 6 1 1 7
k = 3
ans = 2 2 6 6 1 1
I was thinking of having a HashMap where the key is the number and the value is its frequency, a TreeMap where the key is the frequency and the value is the number, and a queue to remove the oldest element when the size exceeds k. That gives O(n log n) time overall. Can we do this in O(1) per window, i.e. O(n) overall?
This can be done in O(n) time
I was intrigued by this problem in part because, as I indicated in the comments, I felt certain that it could be done in O(n) time. I had some time over this past weekend, so I wrote up my solution to this problem.
Approach: Mode Frequencies
The basic concept is this: the mode of a collection of numbers is the number(s) which occur with the highest frequency within that set.
This means that whenever you add a number to the collection, if the number added was not already one of the mode-values then the frequency of the mode would not change. So with the collection (8 9 9) the mode-values are {9} and the mode-frequency is 2. If you add say a 5 to this collection ((8 9 9 5)) neither the mode-frequency nor the mode-values change. If instead you add an 8 to the collection ((8 9 9 8)) then the mode-values change to {9, 8} but the mode-frequency is still unchanged at 2. Finally, if you instead added a 9 to the collection ((8 9 9 9)), now the mode-frequency goes up by one.
Thus in all cases when you add a single number to the collection, the mode-frequency is either unchanged or goes up by only one. Likewise, when you remove a single number from the collection, the mode-frequency is either unchanged or goes down by at most one. So all incremental changes to the collection result in only two possible new mode-frequencies. This means that if we had all of the distinct numbers of the collection indexed by their frequencies, then we could always find the new Mode in a constant amount of time (i.e., O(1)).
To accomplish this I use a custom data structure ("ModeTracker") that has a multiset ("numFreqs") to store the distinct numbers of the collection along with their current frequency in the collection. This is implemented with a Dictionary<int, int> (I think that this is a Map in Java). Thus given a number, we can use this to find its current frequency within the collection in O(1).
This data structure also has an array of sets ("freqNums") that given a specific frequency will return all of the numbers that have that frequency in the current collection.
I have included the code for this data structure class below. Note that this is implemented in C# as I do not know Java well enough to implement it there, but I believe that a Java programmer should have no trouble translating it.
(pseudo)Code:
class ModeTracker
{
    HashSet<int>[] freqNums;        //numbers at each frequency
    Dictionary<int, int> numFreqs;  //frequencies for each number
    int modeFreq_ = 0;              //frequency of the current mode

    public ModeTracker(int maxFrequency)
    {
        freqNums = new HashSet<int>[maxFrequency + 2];
        // populate all frequency slots up front, so we don't have to check later
        // (must run to maxFrequency + 2: a number's frequency can transiently
        // reach maxFrequency + 1 between an add and the following remove)
        for (int i = 0; i < maxFrequency + 2; i++)
        {
            freqNums[i] = new HashSet<int>();
        }
        numFreqs = new Dictionary<int, int>();
    }

    // requires "using System.Linq;" for First()
    public int Mode { get { return freqNums[modeFreq_].First(); } }

    public void addNumber(int n)
    {
        adjustNumberCount(n, 1);
        // new mode-frequency is one greater or the same
        if (freqNums[modeFreq_ + 1].Count > 0) modeFreq_++;
    }

    public void removeNumber(int n)
    {
        adjustNumberCount(n, -1);
        // new mode-frequency is the same or one less
        if (freqNums[modeFreq_].Count == 0) modeFreq_--;
    }

    int adjustNumberCount(int num, int adjust)
    {
        // make sure we already have this number
        if (!numFreqs.ContainsKey(num))
        {
            // add entries for it
            numFreqs.Add(num, 0);
            freqNums[0].Add(num);
        }
        // now adjust this number's frequency
        int oldFreq = numFreqs[num];
        int newFreq = oldFreq + adjust;
        numFreqs[num] = newFreq;
        // move the number from its old frequency set to the new one
        freqNums[oldFreq].Remove(num);
        freqNums[newFreq].Add(num);
        return newFreq;
    }
}
Also, below is a small C# function that demonstrates how to use this data structure to solve the problem originally posed in the question.
int[] ModesOfSubarrays(int[] arr, int subLen)
{
    ModeTracker tracker = new ModeTracker(subLen);
    int[] modes = new int[arr.Length - subLen + 1];
    for (int i = 0; i < arr.Length; i++)
    {
        // add every number into the tracker
        tracker.addNumber(arr[i]);
        if (i >= subLen)
        {
            // remove the number that just rotated out of the window
            tracker.removeNumber(arr[i - subLen]);
        }
        if (i >= subLen - 1)
        {
            // add the new Mode to the output
            modes[i - subLen + 1] = tracker.Mode;
        }
    }
    return modes;
}
I have tested this and it does appear to work correctly for all of my tests.
Complexity Analysis
Going through the individual steps of the `ModesOfSubarrays()` function:
1. The new ModeTracker object is created in O(k) time, where k = subLen <= n.
2. The modes[] array is created in O(n) time.
3. The for(..) loop runs n times:
   3a: the addNumber() function takes O(1) time
   3b: the removeNumber() function takes O(1) time
   3c: getting the new Mode takes O(1) time
So the total time is O(k) + O(n) + n*(O(1) + O(1) + O(1)) = O(n).
Please let me know of any questions that you might have about this code.
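For Java readers, here is a rough translation of the same idea (my own sketch, not the original author's code; the class and method names are mine):

import java.util.*;

class ModeTracker {
    private final Set<Integer>[] freqNums;  // numbers at each frequency
    private final Map<Integer, Integer> numFreqs = new HashMap<>(); // frequency of each number
    private int modeFreq = 0;               // frequency of the current mode

    @SuppressWarnings("unchecked")
    ModeTracker(int maxFrequency) {
        freqNums = new HashSet[maxFrequency + 2];
        for (int i = 0; i < freqNums.length; i++) freqNums[i] = new HashSet<>();
    }

    int mode() { return freqNums[modeFreq].iterator().next(); }

    void addNumber(int n) {
        adjustCount(n, 1);
        if (!freqNums[modeFreq + 1].isEmpty()) modeFreq++; // rises by at most one
    }

    void removeNumber(int n) {
        adjustCount(n, -1);
        if (freqNums[modeFreq].isEmpty()) modeFreq--;      // falls by at most one
    }

    private void adjustCount(int num, int adjust) {
        int oldFreq = numFreqs.getOrDefault(num, 0);
        int newFreq = oldFreq + adjust;
        numFreqs.put(num, newFreq);
        // move the number from its old frequency bucket to the new one
        freqNums[oldFreq].remove(num);
        freqNums[newFreq].add(num);
    }
}

static int[] modesOfSubarrays(int[] arr, int k) {   // assumes arr.length >= k
    ModeTracker tracker = new ModeTracker(k);
    int[] modes = new int[arr.length - k + 1];
    for (int i = 0; i < arr.length; i++) {
        tracker.addNumber(arr[i]);
        if (i >= k) tracker.removeNumber(arr[i - k]);   // slide the window
        if (i >= k - 1) modes[i - k + 1] = tracker.mode();
    }
    return modes;
}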

Array Duplicate Efficiency Riddle

Recently in AP Computer Science A, our class learned about arrays. Our teacher posed a riddle to us.
Say you have 20 numbers, each between 10 and 100 inclusive (these numbers are read from another file using a Scanner).
As each number is read, we must print the number if and only if it is not a duplicate of a number already read. Now, here's the catch. We must use the smallest array possible to solve the problem.
That's the real problem I'm having. All of my solutions require a pretty big array that has 20 slots in it.
I am required to use an array. What would be the smallest array that we could use to solve the problem efficiently?
If anyone could explain the method with pseudocode (or in words) that would be awesome.
In the worst case we have to use an array of length 19.
Why 19? Each unique number has to be remembered in order to sort out duplicates from the following numbers. Since you know that there are 20 numbers incoming, but not more, you don't have to store the last number. Either the 20th number already appeared (then don't do anything), or the 20th number is unique (then print it and exit – no need to save it).
By the way: I wouldn't call an array of length 20 big :)
If your numbers are integers: you have a range from 10 to 100, i.e. 91 possible values, so you need 91 bits to record which values have already been read. A Java long has 64 bits, so you will need an array of two longs. Let every bit (except the superfluous ones) stand for a number from 10 to 100, and initialize both longs with 0. When a number is read, check whether the bit mapped to that value is set to 1. If yes, the number is a duplicate; if not, set the bit to 1.
This is the idea behind the BitSet class.
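A minimal sketch of that idea (my own illustration; the method name printFirstOccurrences is hypothetical, and it assumes the 20 values arrive via a Scanner and all lie in [10, 100]):

import java.util.Scanner;

static void printFirstOccurrences(Scanner in) {
    long[] seen = new long[2];            // 128 bits; we only use 91 of them
    for (int i = 0; i < 20; i++) {
        int n = in.nextInt();
        int bit = n - 10;                 // map 10..100 onto bits 0..90
        long mask = 1L << (bit % 64);
        if ((seen[bit / 64] & mask) == 0) {
            System.out.println(n);        // first time we see this value
            seen[bit / 64] |= mask;
        }
    }
}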
Agree with Socowi. If the number of numbers is known and equal to N, it is always possible to use an array of length N-1 to store the values seen so far. Once the last element of the input is received, and it is already known to be the last, there is no need to store it in the array.
Another idea: if your numbers are small and really confined to the range [10, 100], you can pack at least 2 of them into a single long and extract them again using binary AND (masking). In this case an array of N/2 longs is possible. But it makes searching the array more complicated and does not save much memory; only the number of items in the array decreases.
You technically don't need an array: since the input size is fixed, you could just declare 20 variables. But let's say it wasn't fixed.
As the other answer says, the worst case is indeed 19 slots in the array. But, assuming we are talking about integers here, there is a better scenario where some numbers form a contiguous interval. In that case, you only have to remember the highest and lowest number of the interval, since anything in between is also a duplicate. You can use an array of such intervals.
With the range of 10 to 100, the numbers can be spaced apart, so in the worst case you still need an array of 19 intervals. But if the best case occurs and all numbers form one contiguous interval, you only need 1 array slot.
The problem you'd still have to solve is to create an abstraction over the array that expands itself by 1 when an element is added, so it uses the minimal size necessary (similar to ArrayList, which doubles in size when capacity is reached).
Since an array cannot change size at run time, you need a companion variable to count the numbers that are not duplicates, and you fill the array only partially with those numbers.
Here is simple code that uses a companion variable currentSize and fills the array partially.
Alternatively, you can use an ArrayList, which changes size at run time.
final int LENGTH = 20;
double[] numbers = new double[LENGTH];
int currentSize = 0;
Scanner in = new Scanner(System.in);
while (in.hasNextDouble()) {
    if (currentSize < numbers.length) {
        numbers[currentSize] = in.nextDouble();
        currentSize++;
    }
}
Edit
Now currentSize contains the count of numbers actually stored, and you will not have filled all 20 elements if there were duplicates. Of course, you still need some code to determine whether a number is a duplicate or not.
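For instance, a minimal sketch of that missing check (my own addition, not part of the original answer), replacing the while loop above:

import java.util.Scanner;

final int LENGTH = 20;
double[] numbers = new double[LENGTH];
int currentSize = 0;
Scanner in = new Scanner(System.in);
while (in.hasNextDouble() && currentSize < numbers.length) {
    double value = in.nextDouble();
    boolean duplicate = false;
    for (int i = 0; i < currentSize; i++) {   // scan only the filled part
        if (numbers[i] == value) { duplicate = true; break; }
    }
    if (!duplicate) {
        System.out.println(value);            // print only first occurrences
        numbers[currentSize++] = value;
    }
}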
My last answer misunderstood what you were needing, but I turned this thing up that does it with an int array of 5 elements using bit shifting. Since we know the max number is 100, we can (quite messily) store four numbers in each index.
Random rand = new Random();
int[] numbers = new int[5];   // 5 ints x 4 bytes each = 20 packed values
int curNum;
for (int i = 0; i < 20; i++) {
    curNum = rand.nextInt(100);
    System.out.println(curNum);   // debug: echo every number read
    boolean print = true;
    // unpack each previously stored byte and compare it to the current number
    for (int x = 0; x < i; x++) {
        byte numberToCheck = (byte) (numbers[x / 4] >>> ((x % 4) * 8));
        if (numberToCheck == curNum) {
            print = false;
        }
    }
    if (print) {
        System.out.println("No Match: " + curNum);
    }
    // pack the current number into the next free byte
    numbers[i / 4] = numbers[i / 4] | (curNum << ((i % 4) * 8));
}
I use Random to get my ints, but you could easily change this to a Scanner.

Longest sequence of numbers

I was recently asked this question in an interview. I could give an O(n log n) solution, but couldn't work out an approach for O(n). Can someone help me with an O(n) solution?
In an array, find the length of the longest sequence of consecutive numbers.
Example:
Input: 2 4 6 7 3 1
Output: 4 (because 1, 2, 3, 4 is a sequence, even though they are not in consecutive positions)
The solution should also be realistic in terms of space consumed, i.e. it should remain practical even with an array of 1 billion numbers.
For non-consecutive numbers you need a means of sorting them in O(n). In this case you can use BitSet (which assumes the values are non-negative).
import java.util.BitSet;
import java.util.stream.IntStream;

int[] ints = {2, 4, 6, 7, 3, 1};
BitSet bs = new BitSet();
IntStream.of(ints).forEach(bs::set);
// now search for the longest consecutive run of set bits
int last = 0, max = 0;
do {
    int set = bs.nextSetBit(last);
    if (set < 0) break;                  // no more set bits
    int clear = bs.nextClearBit(set + 1);
    int len = clear - set;
    if (len > max)
        max = len;
    last = clear;
} while (last > 0);
System.out.println(max);
Traverse the array once and build a hash map whose keys are the numbers from the input array and whose values are boolean flags indicating whether the element has been processed (initially all false). Traverse the array once more and do the following: when you check number a, set its flag to true in the hash map, then immediately check the map for the existence of a-1 and a+1. For each neighbor found, mark it true as well and keep checking its neighbors, incrementing the length of the current contiguous run. Stop when there are no more neighbors, and update the longest length. Then move forward in the array and continue with the unprocessed elements. It is not obvious at first glance that this solution is O(n), but there are only two array traversals, and the hash map ensures that every element of the input is processed only once.
Main lesson: if you have to reduce time complexity, it is often necessary to use additional space.
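A common way to realize this idea in Java is with a HashSet in place of the boolean flags; a minimal sketch (my own, with a hypothetical method name):

import java.util.HashSet;
import java.util.Set;

static int longestConsecutive(int[] nums) {
    Set<Integer> set = new HashSet<>();
    for (int n : nums) set.add(n);
    int best = 0;
    for (int n : set) {
        if (set.contains(n - 1)) continue;   // only start counting at a run's smallest element
        int len = 1;
        while (set.contains(n + len)) len++; // walk upward through the run
        best = Math.max(best, len);
    }
    return best;
}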

Finding unique numbers from sorted array in less than O(n)

I had an interview and there was the following question:
Find unique numbers from sorted array in less than O(n) time.
Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10
I gave a solution, but it was O(n).
Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.
Divide and conquer:
look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
If both are equal, the only value in the sequence is the first (no matter how long the sequence is).
If they are different, divide the sequence and repeat for each subsequence.
For m distinct values this runs in roughly O(m log n) time, approaching O(log n) when there are very few distinct values, and degrading to O(n) only in the worst case (when each element is different).
Java code:
public static List<Integer> findUniqueNumbers(int[] data) {
    List<Integer> result = new LinkedList<Integer>();
    findUniqueNumbers(data, 0, data.length - 1, result, false);
    return result;
}

private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {
    int a = data[i1];
    int b = data[i2];
    // homogeneous sequence a...a
    if (a == b) {
        if (!skipFirst) {
            result.add(a);
        }
    } else {
        // divide & conquer
        int i3 = (i1 + i2) / 2;
        findUniqueNumbers(data, i1, i3, result, skipFirst);
        findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
    }
}
I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).
If your sorted array of size n has m distinct elements, you can do it in O(m log n).
Note that this is going to be efficient when m << n (e.g. m = 2 and n = 100).
Algorithm:
Initialization: current element y = first element x[0]
Step 1: do a binary search for the last occurrence of y in x (can be done in O(log n) time). Let its index be i.
Step 2: y = x[i+1]; go to step 1.
Edit: in cases where m = O(n), this algorithm will perform badly. To alleviate that, you can run it in parallel with the regular O(n) algorithm: the meta-algorithm consists of my algorithm and the O(n) algorithm running side by side, stopping as soon as either of the two completes.
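A minimal sketch of that skip-ahead loop (my own illustration; uniques is a hypothetical name):

import java.util.ArrayList;
import java.util.List;

static List<Integer> uniques(int[] x) {
    List<Integer> out = new ArrayList<>();
    int i = 0;
    while (i < x.length) {
        int y = x[i];
        out.add(y);
        // binary search in [i, x.length - 1] for the last index holding y
        int lo = i, hi = x.length - 1;
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2;   // round up so the range shrinks
            if (x[mid] == y) lo = mid; else hi = mid - 1;
        }
        i = lo + 1;   // jump past the entire run of y
    }
    return out;
}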
Since the data consists of integers, there is a finite number of distinct values that can occur between any two values. So start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there must be some repeating values. Put a[0] and a[length-1] into a constant-access-time container like a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. Since the array is sorted, if the two values are different, you can look at the middle element next. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both the left and right parts recursively.
Depending on the data in the array, you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same, since you will know it after checking only the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N), because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time, because they have to be the consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you have to output each number, and there are N of them.
Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than in the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time, regardless of array size, if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes this algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
import java.util.*;

/**
 * Remove duplicates in a sorted array in average O(log(n)), worst O(n).
 * @author XXX
 */
public class UniqueValue {
    public static void main(String[] args) {
        int[] test = {-1, -1, -1, -1, 0, 0, 0, 0, 2, 3, 4, 5, 5, 6, 7, 8};
        UniqueValue u = new UniqueValue();
        System.out.println(u.getUniqueValues(test, 0, test.length - 1));
    }

    // i must be the start index, j must be the end index
    public List<Integer> getUniqueValues(int[] array, int i, int j) {
        if (array == null || array.length == 0) {
            return new ArrayList<Integer>();
        }
        List<Integer> result = new ArrayList<>();
        if (array[i] == array[j]) {
            result.add(array[i]);
        } else {
            int mid = (i + j) / 2;
            result.addAll(getUniqueValues(array, i, mid));
            // skip duplicates of array[mid] so the same value is not divided twice
            while (mid < j && array[mid] == array[++mid]);
            if (array[(i + j) / 2] != array[mid]) {
                result.addAll(getUniqueValues(array, mid, j));
            }
        }
        return result;
    }
}
