Find n highest numbers - java

Millions of integers are given. How can I find the n largest numbers among them? Note that since the input is huge, I can't keep it all in memory.
Any suggestions?
Thanks
shag

You can iterate through all the numbers (reading them from a medium one by one, for example) and keep only a list of the n largest numbers seen so far.
In pseudocode:
max_numbers = new int[n], filled with the first n numbers read
while not end of file:
    read number
    if number > min(max_numbers):
        replace the minimum value of max_numbers with number

Just have an array of n elements and if you find one number that is bigger than the smallest in the array, you can change it.
You could keep an extra variable where you keep the smallest number in the array so you only iterate on it when you know you have to change something.

Get an array of length n; as you run through the numbers, replace its smallest element whenever you read a bigger one.

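public void largest(int[] numbers) {
    int highest = Integer.MIN_VALUE;
    int lowest = Integer.MAX_VALUE;
    for (int current : numbers) {
        if (current > highest) {
            highest = current;   // new largest value so far
        }
        if (current < lowest) {
            lowest = current;    // new smallest value so far
        }
    }
    System.out.println("highest = " + highest + ", lowest = " + lowest);
}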
What I would do.

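Maintain a min-heap of size n. Its root is the smallest of the n largest numbers seen so far, so each new number only needs to be compared against the root.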

EDITED
I recommend using a heap-based priority queue, taking Michael's suggestion to its logical conclusion. Don't store 10, store n.
PQ a[n];
a.insert(input);
O(log n) per insertion, FTW
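A minimal sketch of the priority-queue idea, assuming the input arrives as an `IntStream` (the class and method names here are made up for illustration). It keeps a min-heap of at most n values, so the root is always the smallest of the current candidates and is the only one that ever needs replacing:

```java
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.stream.IntStream;

class TopN {
    // Returns the n largest values of the input, in ascending order.
    // Only n values are held in memory at any time.
    public static int[] largest(IntStream input, int n) {
        if (n <= 0) {
            return new int[0];
        }
        // java.util.PriorityQueue is a min-heap by default.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        input.forEach(x -> {
            if (heap.size() < n) {
                heap.add(x);              // still filling the first n slots
            } else if (x > heap.peek()) {
                heap.poll();              // drop the smallest candidate
                heap.add(x);              // keep the new, larger value
            }
        });
        int[] result = heap.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(result);
        return result;
    }
}
```

For m input values this does at most m heap updates of O(log n) each, and numbers that are not larger than the current root are rejected with a single comparison.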

Related

find the highest product you can get from three of the integers in an array - how to solve using brute force

int arr[] = {10, 10, 1, 3};
Assumptions: Assume every int is positive. Assume array contains at least 3 ints
Find the highest product you can get from three of the integers in the above array. We should return 300 (which we get by taking 10 ∗ 10 ∗ 3).
I want to solve this using the brute force method. Basically, I want to multiply each integer by each other integer, and then multiply that product by each remaining integer. Can anyone show me how this can be done using three nested loops? I want to learn how it's done using brute force first before trying the optimized approach.
Thanks.
Using three for loops :
public static Integer highestProduct(int array[])
{
    if ((array == null) || (array.length < 3))
    {
        return null;
    }
    else
    {
        int max_product = Integer.MIN_VALUE;
        // i < j < k, so every triple of positions is tried exactly once
        for (int i = 0; i < array.length; i++)
        {
            for (int j = i + 1; j < array.length; j++)
            {
                for (int k = j + 1; k < array.length; k++)
                {
                    int product = array[i] * array[j] * array[k];
                    if (product >= max_product)
                    {
                        max_product = product;
                    }
                }
            }
        }
        return max_product;
    }
}
There are also solutions that avoid brute force.
1. Sort
First sort the array, which takes O(n log n) time.
After sorting, select the last 3 items. Since they are the highest items, their product will be the maximum.
NOTE: This will not work if there are any negative numbers in your array.
To fix that, check a few combinations: calculate the product of the first 3 items, then of the last 3, then of the first 2 and the last 1. One of these will be the greatest.
2. Dynamic programming
Please see matrix chain multiplication or max growing length problems and the dynamic programming solutions for them. They will help you understand what dynamic programming is and how to build a simple algorithm for your problem.
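The sorting approach from item 1 can be sketched as follows. This is only an illustration: the class name is made up, the result is widened to `long` on the assumption that the product of three values fits there, and it checks only the two candidates that can actually win once negatives are allowed (the last three items, or the two smallest items times the largest).

```java
import java.util.Arrays;

class HighestProduct {
    // Highest product of any three values, allowing negative numbers.
    public static long highestProduct(int[] array) {
        if (array == null || array.length < 3) {
            throw new IllegalArgumentException("need at least three values");
        }
        int[] s = array.clone();   // sort a copy, leave the input untouched
        Arrays.sort(s);
        int n = s.length;
        // Candidate 1: the three largest values.
        long lastThree = (long) s[n - 1] * s[n - 2] * s[n - 3];
        // Candidate 2: two most negative values times the largest value.
        long firstTwoLastOne = (long) s[0] * s[1] * s[n - 1];
        return Math.max(lastThree, firstTwoLastOne);
    }
}
```

For the array in the question, {10, 10, 1, 3}, both candidates are 10 * 10 * 3 = 300.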

Finding mean and median in constant time

This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are in the range [0, 999].
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
    private int[] store;
    private long size;
    private long total;
    private int highestIndex;
    private int lowestIndex;

    public FindAverage() {
        store = new int[1000];
        size = 0;
        total = 0;
        highestIndex = Integer.MIN_VALUE;
        lowestIndex = Integer.MAX_VALUE;
    }

    public void insert(int item) throws OutOfRangeException {
        if (item < 0 || item > 999) {
            throw new OutOfRangeException();
        }
        store[item]++;
        size++;
        total += item;
        highestIndex = Integer.max(highestIndex, item);
        lowestIndex = Integer.min(lowestIndex, item);
    }

    public float getMean() {
        return (float) total / size;
    }

    public float getMedian() {
        // not implemented yet
    }
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.
You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.
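A minimal sketch of that scan, assuming the same counting array as in the question (the class name and the use of `double` and `IllegalArgumentException` are choices made here, not part of the original code). It walks the 1000 counters once, accumulating counts until it passes the ranks of the middle element(s):

```java
class CountingStats {
    private final int[] store = new int[1000]; // store[v] = how often v was inserted
    private long size;
    private long total;

    public void insert(int item) {
        if (item < 0 || item > 999) {
            throw new IllegalArgumentException("out of range: " + item);
        }
        store[item]++;
        size++;
        total += item;
    }

    public double getMean() {
        return (double) total / size;
    }

    public double getMedian() {
        if (size == 0) {
            throw new IllegalStateException("no values inserted");
        }
        long lowRank = (size + 1) / 2;  // 1-based rank of the lower middle element
        long highRank = size / 2 + 1;   // 1-based rank of the upper middle element
        long seen = 0;
        int low = -1;
        for (int value = 0; value < store.length; value++) {
            seen += store[value];
            if (low < 0 && seen >= lowRank) {
                low = value;
            }
            if (seen >= highRank) {
                // for odd size both ranks coincide, so this is just the middle value
                return (low + value) / 2.0;
            }
        }
        throw new AssertionError("unreachable: counts do not add up to size");
    }
}
```

The loop runs at most 1000 times no matter how many numbers have been inserted, which is what makes it O(1) in the sense described above.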
The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers: the median index (i.e. the value of the median) and the number of values you've read that lie to the left (or right) of the median. I will just stop here, hoping you will be able to figure out how to continue on your own.
EDIT (as pointed out in the comments): you already have the array with the sorted elements (store) and you know the number of elements to the left of the median (size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.
For the general case, where the range of elements is unlimited, no such data structure can exist for any comparison-based algorithm, because it would allow sorting in O(n).
Proof: Assume such DS exist, let it be D.
Let A be input array for sorting. (Assume A.size() is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).
sort(A):
    ds = new D()
    for each x in A:
        ds.add(x)
    m1 = min(A) - 1
    m2 = max(A) + 1
    for (i = 0; i < A.size(); i++):
        ds.add(m1)
    # at this point, ds.median() is smallest element in A
    for (i = 0; i < A.size(); i++):
        yield ds.median()
        # Each two insertions advances median by 1
        ds.add(m2)
        ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: Since add() and median() each take O(1) time, every iteration costs O(1), and the number of iterations is linear, so the overall complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting m1 n times, the median is the smallest element in A. Every two insertions after that advance the median by one item, and since the median advances in sorted order, the total output is sorted.
Since the above algorithm sorts in O(n), which is not possible in the comparison model, such a DS does not exist.
QED.

Array Duplicate Efficiency Riddle

Recently in AP Computer Science A, our class learned about arrays. Our teacher posed a riddle to us.
Say you have 20 numbers, each from 10 through 100 inclusive (these numbers are read from another file using Scanners).
As each number is read, we must print the number if and only if it is not a duplicate of a number already read. Now, here's the catch. We must use the smallest array possible to solve the problem.
That's the real problem I'm having. All of my solutions require a pretty big array that has 20 slots in it.
I am required to use an array. What would be the smallest array that we could use to solve the problem efficiently?
If anyone could explain the method with pseudocode (or in words) that would be awesome.
In the worst case we have to use an array of length 19.
Why 19? Each unique number has to be remembered in order to sort out duplicates from the following numbers. Since you know that there are 20 numbers incoming, but not more, you don't have to store the last number. Either the 20th number already appeared (then don't do anything), or the 20th number is unique (then print it and exit – no need to save it).
By the way: I wouldn't call an array of length 20 big :)
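A small sketch of the 19-slot idea, generalized to N - 1 slots for N inputs. The class and method names are made up, and instead of printing it returns the values that would be printed so the behaviour is easy to check:

```java
import java.util.ArrayList;
import java.util.List;

class FirstSeenPrinter {
    // Returns the values that would be printed: each value once,
    // in order of first appearance.
    public static List<Integer> uniques(int[] input) {
        // One slot fewer than the input: the last value never has to be stored.
        int[] seen = new int[Math.max(input.length - 1, 0)];
        int count = 0;                       // number of slots in use
        List<Integer> printed = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {
            int value = input[i];
            boolean duplicate = false;
            for (int j = 0; j < count; j++) {
                if (seen[j] == value) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                printed.add(value);
                if (i < input.length - 1) {
                    seen[count++] = value;   // remember it for later comparisons
                }
            }
        }
        return printed;
    }
}
```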
If your numbers are integers: you have a range from 10 to 100, so you need 91 bits to record which values have already been read. A Java long has 64 bits, so you will need an array of two longs. Let every bit (except for the superfluous ones) stand for a number from 10 to 100. Initialize both longs to 0. When a number is read, check whether the bit mapped to that value is set to 1. If it is, the number is a duplicate; if not, set the bit to 1.
This is the idea behind the BitSet class.
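A minimal sketch of the two-long bitmap, assuming integer inputs in the range 10 to 100 (the class and method names are invented for this example). Value v is mapped to bit v - 10, so bits 0 to 63 live in the first long and bits 64 to 90 in the second:

```java
class SeenBits {
    private final long[] bits = new long[2]; // 128 bits, 91 of them used

    // Marks value as seen. Returns true if this is its first appearance,
    // false if it is a duplicate.
    public boolean markIfNew(int value) {
        if (value < 10 || value > 100) {
            throw new IllegalArgumentException("expected 10..100, got " + value);
        }
        int bit = value - 10;            // 0..90
        int word = bit / 64;             // which long holds the bit
        long mask = 1L << (bit % 64);    // position inside that long
        boolean isNew = (bits[word] & mask) == 0;
        bits[word] |= mask;
        return isNew;
    }
}
```

Printing each number only when `markIfNew` returns true gives the required output with an array of length 2.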
Agree with Socowi. If the number of inputs is known and equal to N, it is always possible to use an array of length N-1 to store the values seen so far. Once the last element of the input is received, and it is known to be the last one, there is no need to store it in the array.
Another idea: if your numbers are small and really lie in the range [10:100], you can pack at least 2 small integers into 1 long and extract them again with a binary AND. In this case it is possible to use an array of length N/2. But this makes searching the array more complicated and does not save much memory; only the number of items in the array decreases.
You technically don't need an array, since the input size is fixed, you can just declare 20 variables. But let's say it wasn't fixed.
As other answer says, worst case is indeed 19 slots in the array. But, assuming we are talking about integers here, there is a better case scenario where some numbers form a contiguous interval. In that case, you only have to remember the highest and lowest number, since anything in between is also a duplicate. You can use an array of intervals.
With the range of 10 to 100, the numbers can be spaced apart and you still need an array of 19 intervals, in the worst case. But let's say, that the best case occurs, and all numbers form a contiguous interval, then you only need 1 array slot.
The problem you'd still have to solve is to create an abstraction over an array that expands itself by 1 when an element is added, so that it uses the minimal size necessary. (Similar to ArrayList, except that ArrayList grows by a larger step when its capacity is reached.)
Since an array cannot change size at run time, you need a companion variable that counts the numbers that are not duplicates, and you fill the array only partially, with those numbers.
Here is some simple code that uses a companion variable currentSize and fills the array partially.
Alternatively, you can use an ArrayList, which changes size at run time.
final int LENGTH = 20;
double[] numbers = new double[LENGTH];
int currentSize = 0;
Scanner in = new Scanner(System.in);
while (in.hasNextDouble()) {
    if (currentSize < numbers.length) {
        numbers[currentSize] = in.nextDouble();
        currentSize++;
    }
}
Edit
Now currentSize holds the count of numbers that are not duplicates, and you do not fill all 20 elements if there were some duplicates. Of course you still need some code to determine whether a number is a duplicate or not.
My last answer misunderstood what you needed, but I came up with this, which does it with an int array of 5 elements using bit shifting. Since we know the max number is 100, we can store (quite messily) four numbers in each index.
Random rand = new Random();
int[] numbers = new int[5];         // four 8-bit slots per int, 20 slots in total
int curNum;
for (int i = 0; i < 20; i++) {
    curNum = 10 + rand.nextInt(91); // random value in 10..100 inclusive
    System.out.println(curNum);
    boolean print = true;
    for (int x = 0; x < i; x++) {
        // slot x lives in int x / 4, at byte position x % 4
        byte numberToCheck = (byte) (numbers[x / 4] >>> ((x % 4) * 8));
        if (numberToCheck == curNum) {
            print = false;
        }
    }
    if (print) {
        System.out.println("No Match: " + curNum);
    }
    int index = i / 4;
    numbers[index] = numbers[index] | (curNum << ((i % 4) * 8));
}
I use rand to get my ints but you could easily change this to a scanner.

Determining Highest and Lowest Numbers in an Array

I'm trying to solve a problem where I need to write java code to find the two highest and the smallest number in an array given the below conditions:
-Every element is a real number
-Every element is random
Any ideas on the best approach?
You have to examine every number, so your best algorithm is linear in the length of the array.
The standard approach is to just scan the array, keeping track of the two smallest and largest numbers that you've seen so far.
So, given that firstMin, secondMin, firstMax, secondMax are respectively the smallest, second smallest, largest and second largest values that you've seen so far, on the next iteration of the loop:
if (value > firstMax) {
    secondMax = firstMax;
    firstMax = value;
}
else if (value > secondMax) {
    secondMax = value;
}
if (value < firstMin) {
    secondMin = firstMin;
    firstMin = value;
}
else if (value < secondMin) {
    secondMin = value;
}
At the end of this block, we maintain the invariant that firstMin, secondMin, firstMax, secondMax are respectively the smallest, second smallest, largest and second largest values that you've seen so far. This proves correctness.
This algorithm is linear in the length of the array, examines each value exactly once, and makes at most four comparisons per value. It is also O(1) in space: it uses only four extra memory locations for the top and bottom two values.
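Wrapped into a complete method, the scan might look like the sketch below. The class name, the choice of `double[]` for "real numbers", the infinite starting values, and the order of the returned array are all assumptions made for this example:

```java
class Extremes {
    // Returns {firstMin, secondMin, secondMax, firstMax}.
    public static double[] find(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double firstMin = Double.POSITIVE_INFINITY;
        double secondMin = Double.POSITIVE_INFINITY;
        double firstMax = Double.NEGATIVE_INFINITY;
        double secondMax = Double.NEGATIVE_INFINITY;
        for (double value : values) {
            if (value > firstMax) {
                secondMax = firstMax;   // old maximum becomes the runner-up
                firstMax = value;
            } else if (value > secondMax) {
                secondMax = value;
            }
            if (value < firstMin) {
                secondMin = firstMin;   // old minimum becomes the runner-up
                firstMin = value;
            } else if (value < secondMin) {
                secondMin = value;
            }
        }
        return new double[]{firstMin, secondMin, secondMax, firstMax};
    }
}
```

Note that with this version a value that occurs twice can fill both the first and the second slot; if the two values must be distinct, the `else if` branches need an extra inequality check.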
Have a variable to keep track of the min, second_min, max, and second_max values that you have seen so far.
You can go through the elements of the array one by one, and update your min/max variables accordingly. Here are some cases to consider:
If current element is smaller than your min, save your min to second_min and update your min.
If current element is larger than your max, save your max to second_max and update your max
If current element is smaller than second_min, but larger than min, update second_min only
If current element is larger than second_max, but smaller than max, update second_max only
Is super-optimized performance necessary? If not,
Arrays.sort( array );
Look at the first and last two elements.
I think the best way is to use the array as a heap, or another special data structure, if you can. Keeping the structure balanced and ordered will be the most efficient for you. This may mean leaving some indices in your array empty, since an element's index in the array carries information about its position in your ADT. http://en.wikipedia.org/wiki/Heap_(data_structure)

Split an array in half and find the two Max values and then merge the two values

I'm taking an online class, so there isn't any help from the teachers or other classmates. Our assignment is to find the max value and its index in an array of random numbers. We need to do it in two ways: a regular loop (brute force) and divide and conquer. In the divide and conquer version we need to split the array into two smaller arrays, find the max of both, and then merge.
I got the brute force version to work, and I got the divide and conquer version to find the max as well. But I can't seem to get the max of the two smaller arrays and merge the two. We also need to count how many comparisons are made by both methods and print the output.
Here's what I have so far:
public class MinMaxValues {
    // Find maximum (largest) value in array using Divide and Conquer
    public static int findMax(int[] numbers, int left, int right)
    {
        int middle;
        int max_l, max_r, max_m;
        if (left == right) // Only one element...
        {
            // Base case: solved easily...
            return numbers[left];
        }
        else
        {
            // Solve smaller problems
            middle = (left + right) / 2; // Divide into 2 halves
            max_l = findMax(numbers, left, middle);
            // Find max in first half
            max_r = findMax(numbers, middle + 1, right);
            // Find max in second half
            //System.out.println("Maximum Value = " + max_r);
            max_m = max_l + max_r;
            // Use the solutions to solve original problem
            if (max_l > max_r)
                return (max_l);
            else
                return (max_r);
            //return(max_m);
        }
    }
}
You are never returning an array.
Also you don't make any changes to the array.
You must change the array in some way once you find the max.
Try wrapping it with a method.
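public static int[] maxSort(int[] array) {
    int[] work = array.clone();            // leave the caller's array untouched
    int[] sorted = new int[array.length];
    for (int length = work.length; length > 0; length--) {
        // assumes findMax returns the maximum value of work[0..length-1]
        int max = findMax(work, 0, length - 1);
        sorted[length - 1] = max;
        // overwrite one occurrence of max with the last unsorted element
        int i = 0;
        while (work[i] != max) {
            i++;
        }
        work[i] = work[length - 1];
    }
    return sorted;
}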
I am not 100% sure it's working, but I think it's a step in the right direction.
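You need to distinguish more carefully between the points in your program where you compare indices and the points where you compare the values at those indices. The assignment asks for the index of the maximum as well; if findMax returns indices rather than values, then instead of checking whether max_l > max_r, I believe you mean to check whether numbers[max_l] > numbers[max_r].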
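As a sketch of that suggestion, here is a divide and conquer variant that returns the index of the maximum and counts value comparisons. The class name, the static `comparisons` counter, and the choice to prefer the left index on ties are assumptions made for illustration, not part of the assignment:

```java
class MaxFinder {
    // Number of value comparisons made so far; reset to 0 before each run.
    public static int comparisons = 0;

    // Returns the index of the largest value in numbers[left..right].
    public static int findMaxIndex(int[] numbers, int left, int right) {
        if (left == right) {
            return left;                              // one element: it is the max
        }
        int middle = left + (right - left) / 2;       // divide into two halves
        int maxL = findMaxIndex(numbers, left, middle);
        int maxR = findMaxIndex(numbers, middle + 1, right);
        comparisons++;                                // the "merge" is one comparison
        // compare the values at the indices, not the indices themselves
        return numbers[maxL] >= numbers[maxR] ? maxL : maxR;
    }
}
```

The value is then `numbers[findMaxIndex(numbers, 0, numbers.length - 1)]`. For an array of n elements this makes n - 1 comparisons, the same count as a simple loop that compares each element after the first against the running maximum.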
