ArrayList or any other elegant solution?

ArrayList or any other elegant solution? - java

(Apologies in advance for the descriptive question)
I have to take integers as values for two variables and keep taking the integer values for as long as I like- from the console. At any point of time , I might requisition what is the sum of the integers entered corresponding to the variables. For example, For Variable A , I have entered 5,5,4 and for Variable B I have entered 2,3,6 [These inputs are taken from the console at different times, not at the same time]. At any point of time I would need to display the sum of A( which is 14) and sum of B(which is 11).
What I have thought is to declare two arraylists : one each for A and B. As I get the values for for A & B, I would keep adding them to their respective arraylists. Whenever there is a demand for displaying the sum of A , I would add the values in the index of Arraylist for A and display. Same for B.
However, I have a doubt that this approach may be inefficient. Because if I enter one thousand integers for variable A, then the arrayList would become 1000 indexes long and would consume too much memory and summing would be too much of work. Can I achieve what I want without using an arraylist ? some other elegant solution?

You have doubt about 1000 integers in a list, and the time it takes to sum them? Let me remove all your doubts: You don't have a problem.
Try this code:
long start = System.currentTimeMillis();
ArrayList<Integer> list = new ArrayList<>();
for (int i = 0; i < 1000000; i++)
list.add(i);
int sum = 0;
for (int num : list)
sum += num;
long end = System.currentTimeMillis();
System.out.printf("Sum of %d values is %d%n", list.size(), sum);
System.out.printf("Built and calculated in %.3f seconds%n", (end - start) / 1000.0);
It built an ArrayList<Integer> with one million values, not a measly 1000, then sums them. Output is:
Sum of 1000000 values is 1783293664
Built and calculated in 0.141 seconds
So, a million in way less than one second. 1000? Not an issue!

It sounds like you are trying to do something like:
(in Java)
private int a = 0;
public void add_to_A(int input)
{
a += input;
return;
}
public int a_sum()
{
return a;
}
As noted, an arrayList would require O(n) space, and would run at O(n) time for any given point in the input. If you received a pathological input, asking for the sum at every step, this would require O(n^2) time, which wouldnt be good for this problem.
As other people have noted, memory isnt going to be an issue for 1000 integers, BUT for the sake of knowledge, it is important to note that if we increase the size of our input a few orders of magnitude and give it a bad input (ask the size of A at every step) run time will blow up quickly. Going by the statistic Andreas submitted, if the array algorithm runs in a few milliseconds for 1000 inputs, the same algorithm running for a million inputs would be closer to a few days/weeks for an unlucky run.

Related

My collection's maximum size is not sufficient for computation, what can I do about it?

Working with ArrayList, its size grows exponentially with each iteration and, over time, the collection becomes too small to continue the computation.
With each iteration, you need to change each element and, according to some condition, add more new elements. My code worked for 100 iterations, but now it needs 256 and it can't handle it anymore.
It turns out a very large flow of calculations, size exceeds int and I need to continue computing in long, but ArrayList cant do it due to size restrictions.
I guess that at a certain stage I need to create new ArrayList's, also iterate over each of them multithreaded, adding new data to the new one, without hitting the limit of ArrayList's sizes.
How can I realize that? Initial size of my ArrayList is 300, each value is a number from 0 to 5.
Here is the code for my methods:
public static void main(String...args) {
ArrayList<Integer> old = new ArrayList<>();
//here is filling my array with 300 elements, too much numbers, don't want to dirty the question with them
//now I call the killer method, the capacity of which is no longer enough
calculateNew(256, old); //it was worked for 100 iterarions, but now i need 256 and he cant do it
old.size(); //accordingly, I need to calculate this value after 256 iterations
}
static void calculateNew(int days, ArrayList<Integer> old){
for(int i = 0; i < days; i++){
oneDayNew(old);
System.out.println("Day " + (i+1<10? i+1 + " " : i+1) + ": size of school: " + old.size());
}
}
static void oneDayNew(ArrayList<Integer> old){
int countNew = 0;
for(int i = 0; i < old.size(); i++){
if(old.get(i) == 0) {
countNew++;
old.set(i, 7);
}
if(old.get(i) > 0) {
int temp = old.get(i);
old.set(i, --temp);
}
}
if(countNew > 0 ){
for(int i = 0; i < countNew; i++){
old.add(8);
}
}
}
What can I do? Multi-threading? Array of ArrayLists? I'm looking for hints, not a ready-made solution.

The size of school after 256 iterations is over 350 billion (starting with 300 random elements sized 0-8). Even if every number was 1 byte, which it isn't, it would need over 350 GB of memory. As it stands, it needs 10* that. Therefore, your approach unfortunately does not scale. This is common for brute force algorithms. You do not want to store every number separately, and you definitely do not want to process them one by one.
Change your data structure.
For example, you can group the numbers together and only store how many of each numbers you have, using a long[] as a histogram. That way, histogram[0] would store how many zeros you have, histogram[1] would store how many ones you have, etc. all the way up to your max value. Of course, a Map<Integer, Long> would work just as well for the purpose. That way, you only use constant memory, and you process much fewer numbers per iteration as they are grouped together.

Calculate absolute minimum difference between any two numbers of a huge integer array

I have a very long array in a Java program (300 000+ unsorted integers) and need to calculate the minimum absolute difference between any two numbers inside the array, and display the absolute difference and the corresponding pair of numbers as an output. The whole calculation should happen very quickly.
I have the following code, which would usually work:
private static void calcMinAbsDiff(int[] inputArray)
{
Arrays.sort(inputArray);
int minimum = Math.abs(inputArray[1] - inputArray[0]);
int firstElement = inputArray[0];
int secondElement = inputArray[1];
for (int i = 2; i < inputArray.length; i++)
{
if(Math.abs(inputArray[i] - inputArray[i-1]) < minimum)
{
minimum = Math.abs(inputArray[i] - inputArray[i-1]);
firstElement = inputArray[i-1];
secondElement = inputArray[i];
}
}
System.out.println("Minimum Absolute Difference : "+minimum);
System.out.println("Pair of Elements : ("+firstElement+", "+secondElement+")");
}
However, the output I receive is all 0s. I believe this is because the array is way too long.

If you have two or more zeros and no negative integers in your dataset, then your output is expected. After sorting, then inputArray[0] and inputArray[1] would both be 0, and the difference would be 0. No other pair of adjacent elements would have an absolute difference less than 0, so minimum, firstElement, and second Element would all be 0 at the end of the algorithm.
If you really have no zeros in your dataset, or if you do have negative integers, then you may have an initialization problem. Check this thread:
Why is my simple Array only printing zeros in java?
If that's not it, then only other thing I can think of is that you have a problem in the previous scope causing the data to get zeroed out.
I would try printing samples of your dataset at various points to see exactly where/when it's getting zeroed.
If you still have trouble, then post more info on the dataset and the scope which calls this function to help us see what's going on. Let us know how you make out!

Finding mean and median in constant time

This is a common interview question.
You have a stream of numbers coming in (let's say more than a million). The numbers are between [0-999]).
Implement a class which supports three methods in O(1)
* insert(int i);
* getMean();
* getMedian();
This is my code.
public class FindAverage {
private int[] store;
private long size;
private long total;
private int highestIndex;
private int lowestIndex;
public FindAverage() {
store = new int[1000];
size = 0;
total = 0;
highestIndex = Integer.MIN_VALUE;
lowestIndex = Integer.MAX_VALUE;
}
public void insert(int item) throws OutOfRangeException {
if(item < 0 || item > 999){
throw new OutOfRangeException();
}
store[item] ++;
size ++;
total += item;
highestIndex = Integer.max(highestIndex, item);
lowestIndex = Integer.min(lowestIndex, item);
}
public float getMean(){
return (float)total/size;
}
public float getMedian(){
}
}
I can't seem to think of a way to get the median in O(1) time.
Any help appreciated.

You have already done all the heavy lifting, by building the store counters. Together with the size value, it's easy enough.
You simply start iterating the store, summing up the counts until you reach half of size. That is your median value, if size is odd. For even size, you'll grab the two surrounding values and get their average.
Performance is O(1000/2) on average, which means O(1), since it doesn't depend on n, i.e. performance is unchanged even if n reaches into the billions.
Remember, O(1) doesn't mean instant, or even fast. As Wikipedia says it:
An algorithm is said to be constant time (also written as O(1) time) if the value of T(n) is bounded by a value that does not depend on the size of the input.
In your case, that bound is 1000.

The possible values that you can read are quite limited - just 1000. So you can think of implementing something like a counting sort - each time a number is input you increase the counter for that value.
To implement the median in constant time, you will need two numbers - the median index(i.e. the value of the median) and the number of values you've read and that are on the left(or right) of the median. I will just stop here hoping you will be able to figure out how to continue on your own.
EDIT(as pointed out in the comments): you already have the array with the sorted elements(stored) and you know the number of elements to the left of the median(size/2). You only need to glue the logic together. I would like to point out that if you use linear additional memory you won't need to iterate over the whole array on each insert.

For the general case, where range of elements is unlimited, such data structure does not exist based on any comparisons based algorithm, as it will allow O(n) sorting.
Proof: Assume such DS exist, let it be D.
Let A be input array for sorting. (Assume A.size() is even for simplicity, that can be relaxed pretty easily by adding a garbage element and discarding it later).
sort(A):
ds = new D()
for each x in A:
ds.add(x)
m1 = min(A) - 1
m2 = max(A) + 1
for (i=0; i < A.size(); i++):
ds.add(m1)
# at this point, ds.median() is smallest element in A
for (i = 0; i < A.size(); i++):
yield ds.median()
# Each two insertions advances median by 1
ds.add(m2)
ds.add(m2)
Claim 1: This algorithm runs in O(n).
Proof: Since we have constant operations of add() and median(), each of them is O(1) per iteration, and the number of iterations is linear - the complexity is linear.
Claim 2: The output is sorted(A).
Proof (guidelines): After inserting n times m1, the median is the smallest element in A. Each two insertions after it advances the median by one item, and since the advance is sorted, the total output is sorted.
Since the above algorithm sorts in O(n), and not possible under comparisons model, such DS does not exist.
QED.

Array Duplicate Efficiency Riddle

Recently in AP Computer Science A, our class recently learned about arrays. Our teacher posed to us a riddle.
Say you have 20 numbers, 10 through 100 inclusive, right? (these numbers are gathered from another file using Scanners)
As each number is read, we must print the number if and only if it is not a duplicate of a number already read. Now, here's the catch. We must use the smallest array possible to solve the problem.
That's the real problem I'm having. All of my solutions require a pretty big array that has 20 slots in it.
I am required to use an array. What would be the smallest array that we could use to solve the problem efficiently?
If anyone could explain the method with pseudocode (or in words) that would be awesome.

In the worst case we have to use an array of length 19.
Why 19? Each unique number has to be remembered in order to sort out duplicates from the following numbers. Since you know that there are 20 numbers incoming, but not more, you don't have to store the last number. Either the 20th number already appeared (then don't do anything), or the 20th number is unique (then print it and exit – no need to save it).
By the way: I wouldn't call an array of length 20 big :)

If your numbers are integers: You have a range from 10 to 100. So you need 91 Bits to store which values have already been read. A Java Long has 64 Bits. So you will need an array of two Longs. Let every Bit (except for the superfluous ones) stand for a number from 10 to 100. Initialize both longs with 0. When a number is read, check if the corresponding bit mapped to the read value is set to 1. If yes, the read number is a duplicate, if no set the bit to 1.
This is the idea behind the BitSet class.

Agree with Socowi. If number of numbers is known and it is equal to N , it is always possible to use N-1 array to store duplicates. Once the last element from the input is received and it is already known that this is the last element, it is not really needed to store this last value in the duplicates array.
Another idea. If your numbers are small and really located in [10:100] diapason, you can use 1 Long number for storing at least 2 small Integers and extract them from Long number using binary AND to extract small integers values back. In this case it is possible to use N/2 array. But it will make searching in this array more complicated and does not save much memory, only number of items in the array will be decreased.

You technically don't need an array, since the input size is fixed, you can just declare 20 variables. But let's say it wasn't fixed.
As other answer says, worst case is indeed 19 slots in the array. But, assuming we are talking about integers here, there is a better case scenario where some numbers form a contiguous interval. In that case, you only have to remember the highest and lowest number, since anything in between is also a duplicate. You can use an array of intervals.
With the range of 10 to 100, the numbers can be spaced apart and you still need an array of 19 intervals, in the worst case. But let's say, that the best case occurs, and all numbers form a contiguous interval, then you only need 1 array slot.
The problem you'd still have to solve is to create an abstraction over an array, that expands itself by 1 when an element is added, so it will use the minimal size necessary. (Similar to ArrayList, but it doubles in size when capacity is reached).

Since an array cannot change size at run time You need a companion variable to count the numbers that are not duplicates and fill the array partially with only those numbers.
Here is a simple code that use companion variable currentsize and fill the array partially.
Alternative you can use arrayList which change size during run time
final int LENGTH = 20;
double[] numbers = new double[LENGTH];
int currentSize = 0;
Scanner in = new Scanner(System.in);
while (in.hasNextDouble()){
if (currentSize < numbers.length){
numbers[currentSize] = in.nextDouble();
currentSize++;
}
}
Edit
Now the currentSize contains those actual numbers that are not duplicates and you did not fill all 20 elements in case you had some duplicates. Of course you need some code to determine whither a numbers is duplicate or not.

My last answer misunderstood what you were needing, but I turned this thing up that does it an int array of 5 elements using bit shifting. Since we know the max number is 100 we can store (Quite messily) four numbers into each index.
Random rand = new Random();
int[] numbers = new int[5];
int curNum;
for (int i = 0; i < 20; i++) {
curNum = rand.nextInt(100);
System.out.println(curNum);
boolean print = true;
for (int x = 0; x < i; x++) {
byte numberToCheck = ((byte) (numbers[(x - (x % 4)) / 4] >>> ((x%4) * 8)));
if (numberToCheck == curNum) {
print = false;
}
}
if (print) {
System.out.println("No Match: " + curNum);
}
int index = ((i - (i % 4)) / 4);
numbers[index] = numbers[index] | (curNum << (((i % 4)) * 8));
}
I use rand to get my ints but you could easily change this to a scanner.

Enhanced for loop performance worse than traditional indexed lookup?

I just came across this seemingly innocuous comment, benchmarking ArrayList vs a raw String array. It's from a couple years ago, but the OP writes
I did notice that using for String s: stringsList was about 50% slower than using an old-style for-loop to access the list. Go figure...
Nobody commented on it in the original post, and the test seemed a little dubious (too short to be accurate), but I nearly fell out of my chair when I read it. I've never benchmarked an enhanced loop against a "traditional" one, but I'm currently working on a project that does hundreds of millions of iterations over ArrayList instances using enhanced loops so this is a concern to me.
I'm going to do some benchmarking and post my findings here, but this is obviously a big concern to me. I could find precious little info online about relative performance, except for a couple offhand mentions that enhanced loops for ArrayLists do run a lot slower under Android.
Has anybody experienced this? Does such a performance gap still exist? I'll post my findings here, but was very surprised to read it. I suspect that if this performance gap did exist, it has been fixed in more modern VM's, but I guess now I'll have to do some testing and confirm.
Update: I made some changes to my code, but was already suspecting what others here have already pointed out: sure the enhanced for loop is slower, but outside of very trivial tight loops, the cost should be a miniscule fraction of the cost of the logic of the loop. In my case, even though I'm iterating over very large lists of strings using enhanced loops, my logic inside the loop is complex enough that I couldn't even measure a difference after switching to index-based loops.
TL;DR: enhanced loops are indeed slower than a traditional index-based loop over an arraylist; but for most applications the difference should be negligible.

The problem you have is that using an Iterator will be slower than using a direct lookup. On my machine the difference is about 0.13 ns per iteration. Using an array instead saves about 0.15 ns per iteration. This should be trivial in 99% of situations.
public static void main(String... args) {
int testLength = 100 * 1000 * 1000;
String[] stringArray = new String[testLength];
Arrays.fill(stringArray, "a");
List<String> stringList = new ArrayList<String>(Arrays.asList(stringArray));
{
long start = System.nanoTime();
long total = 0;
for (String str : stringArray) {
total += str.length();
}
System.out.printf("The for each Array loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
}
{
long start = System.nanoTime();
long total = 0;
for (int i = 0, stringListSize = stringList.size(); i < stringListSize; i++) {
String str = stringList.get(i);
total += str.length();
}
System.out.printf("The for/get List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
}
{
long start = System.nanoTime();
long total = 0;
for (String str : stringList) {
total += str.length();
}
System.out.printf("The for each List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
}
}
When run with one billion entries entries prints (using Java 6 update 26.)
The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000
When run with one billion entries entries prints (using OpenJDK 7.)
The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000
i.e. exactly the same. ;)

Every claim that X is slower than Y on a JVM which does not address all the issues presented in this article ant it's second part spreads fears and lies about the performance of a typical JVM. This applies to the comment referred to by the original question as well as to GravityBringer's answer. I am sorry to be so rude, but unless you use appropriate micro benchmarking technology your benchmarks produce really badly skewed random numbers.
Tell me if you're interested in more explanations. Although it is all in the articles I referred to.

GravityBringer's number doesn't seem right, because I know ArrayList.get() is as fast as raw array access after VM optimization.
I ran GravityBringer's test twice on my machine, -server mode
50574847
43872295
30494292
30787885
(2nd round)
33865894
32939945
33362063
33165376
The bottleneck in such tests is actually memory read/write. Judging from the numbers, the entire 2 arrays are in my L2 cache. If we decrease the size to fit L1 cache, or if we increase the size beyond L2 cache, we'll see 10X throughput difference.
The iterator of ArrayList uses a single int counter. Even if VM doesn't put it in a register (the loop body is too complex), at least it will be in the L1 cache, therefore r/w of are basically free.
The ultimate answer of course is to test your particular program in your particular environment.
Though it's not helpful to play agnostic whenever a benchmark question is raised.

The situation has gotten worse for ArrayLists. On my computer running Java 6.26, there is a fourfold difference. Interestingly (and perhaps quite logically), there is no difference for raw arrays. I ran the following test:
int testSize = 5000000;
ArrayList<Double> list = new ArrayList<Double>();
Double[] arr = new Double[testSize];
//set up the data - make sure data doesn't have patterns
//or anything compiler could somehow optimize
for (int i=0;i<testSize; i++)
{
double someNumber = Math.random();
list.add(someNumber);
arr[i] = someNumber;
}
//ArrayList foreach
long time = System.nanoTime();
double total1 = 0;
for (Double k: list)
{
total1 += k;
}
System.out.println (System.nanoTime()-time);
//ArrayList get() method
time = System.nanoTime();
double total2 = 0;
for (int i=0;i<testSize;i++)
{
total2 += list.get(i);
}
System.out.println (System.nanoTime()-time);
//array foreach
time = System.nanoTime();
double total3 = 0;
for (Double k: arr)
{
total3 += k;
}
System.out.println (System.nanoTime()-time);
//array indexing
time = System.nanoTime();
double total4 = 0;
for (int i=0;i<testSize;i++)
{
total4 += arr[i];
}
System.out.println (System.nanoTime()-time);
//would be strange if different values were produced,
//but no, all these are the same, of course
System.out.println (total1);
System.out.println (total2);
System.out.println (total3);
System.out.println (total4);
The arithmetic in the loops is to prevent the JIT compiler from possibly optimizing away some of the code. The effect of the arithmetic on performance is small, as the runtime is dominated by the ArrayList accesses.
The runtimes are (in nanoseconds):
ArrayList foreach: 248,351,782
ArrayList get(): 60,657,907
array foreach: 27,381,576
array direct indexing: 27,468,091

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.