Perform arithmetic on number array without iterating - java

For example, I would like to do something like the following in java:
int[] numbers = {1,2,3,4,5};
int[] result = numbers*2;
//result now equals {2,4,6,8,10};
Is this possible to do without iterating through the array? Would I need to use a different data type, such as ArrayList? The current iterating step is taking up some time, and I'm hoping something like this would help.

No, you can't multiply each item in an array without iterating through the entire array. As pointed out in the comments, even if you could use the * operator in such a way the implementation would still have to touch each item in the array.
Further, a different data type would have to do the same thing.

I think a different answer from the obvious may be beneficial to others who have the same problem and don't mind a layer of complexity (or two).
In Haskell, there is something known as "Lazy Evaluation", where you could do something like multiply an infinitely large array by two, and Haskell would "do" that. When you accessed the array, it would try to evaluate everything as needed. In Java, we have no such luxury, but we can emulate this behavior in a controllable manner.
You will need to create or extend your own List class and add some new functions. You would need functions for each mathematical operation you wanted to support. I have examples below.
LazyList ll = new LazyList();
// Add a couple million objects
ll.multiplyList(2);
The internal implementation of this would be to create a Queue that stores all the primitive operations you need to perform, so that order of operations is preserved. Now, each time an element is read, you perform all operations in the Queue before returning the result. This means that reads are very slow (depending on the number of operations performed), but we at least get the desired result.
If you find yourself iterating through the whole array each time, it may be useful to de-queue at the end instead of preserving the original values.
If you find that you are making random accesses, I would preserve the original values and return modified results when called.
If you need to update entries, you will need to decide what that means. Are you replacing a value there, or are you replacing a value after the operations were performed? Depending on your answer, you may need to run backwards through the queue to get a "pre-operations" value to replace an older value. The reasoning is that on the next read of that same object, the operations would be applied again and then the value would be restored to what you intended to replace in the list.
There may be other nuances with this solution, and again the way you implement it would be entirely different depending on your needs and how you access this (sequentially or randomly), but it should be a good start.
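A minimal sketch of the idea described above, assuming int values only. The class name LazyIntList and the methods addToList and flush are hypothetical; only multiplyList appears in the example earlier. Operations are recorded in order and applied when an element is read:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical class: keeps the raw values plus a queue of pending operations.
class LazyIntList {
    private final int[] values;
    private final List<IntUnaryOperator> pending = new ArrayList<>();

    LazyIntList(int[] values) {
        this.values = values.clone();
    }

    // O(1): records the operation instead of touching any element.
    void multiplyList(int factor) {
        pending.add(v -> v * factor);
    }

    // O(1): a second operation type, to show that order is preserved.
    void addToList(int amount) {
        pending.add(v -> v + amount);
    }

    // Applies every pending operation, in insertion order, to one element.
    int get(int index) {
        int v = values[index];
        for (IntUnaryOperator op : pending) {
            v = op.applyAsInt(v);
        }
        return v;
    }

    // Writes the pending operations into storage and empties the queue.
    // Useful when the whole list is about to be read sequentially.
    void flush() {
        for (int i = 0; i < values.length; i++) {
            values[i] = get(i);
        }
        pending.clear();
    }
}
```

Each read costs one pass over the pending operations, so calling flush before a full sequential scan keeps reads cheap; that call is of course the iteration the question wanted to avoid, just deferred.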

With the introduction of Java 8 this task can be done using streams.
private long tileSize(int[] sizes) {
return IntStream.of(sizes).reduce(1, (x, y) -> x * y);
}
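The snippet above folds the array into a single product. For the element-wise multiplication asked about in the question, a stream version might look like the following sketch (the class and method names are made up for illustration). The stream still visits every element once; it only hides the loop.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class StreamScale {
    // Returns a new array with every element multiplied by factor.
    static int[] scale(int[] numbers, int factor) {
        return IntStream.of(numbers).map(n -> n * factor).toArray();
    }

    public static void main(String[] args) {
        int[] result = scale(new int[] {1, 2, 3, 4, 5}, 2);
        System.out.println(Arrays.toString(result)); // [2, 4, 6, 8, 10]
    }
}
```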

No, it isn't. If your collection is really big and you want to do it faster, you can try to operate on the elements in two or more threads, but you have to take care of synchronization (use a synchronized collection) or divide your collection into two (or more) collections and operate on one collection in each thread. I'm not sure whether it will be faster than just iterating through the array; it depends on the size of your collection and on what you want to do with each element. If you want to use this solution, you will have to measure whether it is faster in your case. It might be slower, and it will definitely be much more complicated.
Generally, if it's not a critical part of the code and the execution time isn't too long, I would leave it as it is now.
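For a plain int array, one way to try the multi-threaded route without writing the thread handling yourself is Arrays.parallelSetAll from the JDK. This is only a sketch (the class and method names are illustrative), and whether it beats a simple loop has to be measured for your array size:

```java
import java.util.Arrays;

class ParallelScale {
    // Fills result[i] = numbers[i] * factor, possibly on several threads.
    static int[] scale(int[] numbers, int factor) {
        int[] result = new int[numbers.length];
        // Each index is written independently, so no extra synchronization is needed.
        Arrays.parallelSetAll(result, i -> numbers[i] * factor);
        return result;
    }
}
```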


Sorting a list in ascending order using a for loop repeatedly for binarySearch (Java)

I have a list of classes that I am attempting to sort in ascending order, by adding items in a for loop like so.
private static void addObjectToIndex(classObject object) {
for (int i=0;i<collection.size();i++)
{
if (collection.get(i).ID >= object.ID)
{
collection.insertElementAt(object, i);
return;
}
}
if (classObject.size() == 0)
classObject.add(object);
}
This is faster than sorting it every time I call that function, as that would be simpler but slower, as it gives O(N) time as opposed to using Collections.sort's O(N log N) every time (unless I'm wrong).
The problem is that when I run Collections.binarySearch to attempt to grab an item out of the Vector collection(The collection requires method calls on an atomic basis) it still ends up returning negative numbers as shown in the code below.
Comparator<classObject> c = new Comparator<classObject>()
{
public int compare(classObject u1, classObject u2)
{
int z1 = (int)(u1).ID;
int z2 = (int)(u2).ID;
if(z1 > z2)
return 1;
return z2 <= z1 ? 0 : -1;
}
};
int result = Collections.binarySearch(collection, new classObject(pID), c);
if (result < 0)
return null;
if (collection.get(result).ID != pID)
return null;
else
return collection.get(result);
Something like
result = -1043246
Shows up in the debugger, resulting in the second code snippet returning null.
Is there something I'm missing here? It's probably brain dead simple. I've tried adjusting the for loop that places things in order, <=, >=, < and > and it doesn't work. Adding object to the index i+1 doesn't work. Still returning null, which makes the entire program blow up.
Any insight would be appreciated.
Boy, did you get here from the 80s, because it sure sounds like you've missed quite a few API updates!
This is faster than sorting it every time I call that function, as that would be simpler but slower, as it gives O(N) time as opposed to using Collections.sort's O(N log N) every time (unless I'm wrong).
You're now spending an O(n) investment on every insert, so that's O(n^2) total, vs. the model of 'add everything you want to add without sorting it' and then 'at the very end, sort the entire list', which is O(n log n).
Vector is threadsafe which is why I'm using it as opposed to something else, and that can't change
Nope. Threadsafety is not that easy; what you've written isn't thread safe.
Vector is obsolete and should never be used. What Vector does (vs. ArrayList) is that each individual operation on a vector is thread safe (i.e. atomic). Note that you can get this behaviour from any list if you really need it with: List<T> myList = Collections.synchronizedList(someList);, but it is highly unlikely you want this.
Take your current implementation of addObjectToIndex. It is not atomic: it makes many different method calls on your vector, and these have zero guarantee of being consistent with each other. If two threads both call addObjectToIndex and your computer has more than one core, then you will eventually end up with a list that looks like [1, 2, 5, 4, 10], i.e., not sorted.
That method just doesn't work properly unless its view of your collection is consistent for the entirety of the run. In other words, that block needs to be 'atomic': it either does it all or does none of it, and it needs a consistent view throughout. Stick a synchronized around the entire block. Vector, in contrast, considers each individual call atomic and nothing else, which doesn't work here. More generally, 'just synchronize' is a rather inefficient way to do multicore; the various collections in java.util.concurrent are usually vastly more efficient and much easier to use. You should read through that API and see if there's anything that'll work for you.
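As a rough illustration of "synchronize the entire block", here is a sketch in which the whole search-then-insert sequence runs under one lock. The class SortedIndex is hypothetical, and plain Integer ids stand in for the objects from the question:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder: every method that touches the list takes the same lock.
class SortedIndex {
    private final List<Integer> ids = new ArrayList<>();

    // Finding the position and inserting happen atomically with respect
    // to other calls on this object.
    synchronized void add(int id) {
        int i = 0;
        while (i < ids.size() && ids.get(i) < id) {
            i++;
        }
        // i == ids.size() covers both the empty list and the
        // "larger than everything so far" case.
        ids.add(i, id);
    }

    // Returns a copy so callers never see the list mid-update.
    synchronized List<Integer> snapshot() {
        return new ArrayList<>(ids);
    }
}
```

Because the lock covers the whole method, a plain ArrayList is enough inside it; Vector's per-call locking adds nothing here.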
if(z1 > z2) return 1;
I'm pretty sure your insert code sorts ascending, but your comparator sorts descending. Which would break the binary search code (the binary search code is specced to return arbitrary garbage if the list isn't sorted, and as far as the comparator you use here is concerned, it isn't). You should use the same comparator anytime it is relevant, and not re-implement the logic multiple times (or if you do, at least test it!).
There is also no need to write all this code.
Comparator<classObject> c = Comparator.comparingInt(co -> co.ID);
is all you need.
However
It looks like what you really want is a collection that keeps itself continually sorted. Java has that; it's called a TreeSet. You pass it a Comparator (or you don't, and TreeSet expects that the elements you put in have a natural order, either is fine), and it will keep the collection sorted, at very cheap cost (far better than your O(n^2)!), continually. It IS a set, meaning if the comparator says that 2 items are equal, then adding both to the set results in the second add call being ignored (sets cannot contain the same element more than once, and for a TreeSet, 'the same element' is defined solely by 'comparing them returns 0' - TreeSet ignores hashCode and equals entirely).
This sounds like what you really want. If you need 2 different objects with the same ID to be added anyway, then add some more fields to your comparator (instead of returning 0 upon same ID, move on to checking the insertion timestamp or whatnot). But, with a name like 'ID', sounds like duplicates aren't going to happen.
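A small sketch of the TreeSet approach, assuming an int id field. The Item class and the helper names are invented for the example and stand in for classObject:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for the question's classObject.
class Item {
    final int id;

    Item(int id) {
        this.id = id;
    }
}

class TreeSetDemo {
    // Builds a set that keeps itself sorted by id; duplicates by id are ignored.
    static TreeSet<Item> build(int... ids) {
        TreeSet<Item> set = new TreeSet<>(Comparator.comparingInt((Item it) -> it.id));
        for (int id : ids) {
            set.add(new Item(id));
        }
        return set;
    }

    // Lookup by id: ceiling returns the smallest element >= the probe, or null.
    static Item find(TreeSet<Item> set, int id) {
        Item hit = set.ceiling(new Item(id));
        return (hit != null && hit.id == id) ? hit : null;
    }
}
```

TreeSet itself is not thread safe; if several threads share the set, ConcurrentSkipListSet from java.util.concurrent offers the same sorted-set interface.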
The reason you want to use this off-the-shelf stuff is because otherwise you need to do it yourself, and if you're going to endeavour to write it yourself, you need to be a good programmer. Which you clearly aren't (yet - we all started as newbies and learned to become good later, it's the natural order of things). For example, if I try to add an element to a non-empty collection where the element I try to add has a larger ID than anything in the collection, it just won't add anything. That's because you wrote if (classObject.size() == 0) classObject.add(object); but you wanted classObject.add(object); without the if. Also, in Java we write ClassObject, not classObject, and more generally, ClassObject is a completely meaningless name. Find a better name; this helps code be less confusing, and this question does suggest you could use some of that.

Faster Access Version of ArrayList?

Does anyone know of something similar to ArrayList that is better geared to handling really large amounts of data as quickly as possible?
I've got a program with a really large ArrayList that's getting choked up when it tries to explore or modify the ArrayList.
Presumably when you do:
//i is an int;
arrayList.remove(i);
The code behind the scenes runs something like:
public T remove(int i){
//Let's say ArrayList stores its data in a T [] array called "contents".
T output = contents[i];
T [] overwrite = new T [contents.length - 1];
//Yes, I know generic arrays aren't created this simply. Bear with me here...
for(int x=0;x<i;x++){
overwrite[x] = contents[x];
}
for(int x=i+1;x<contents.length;x++){
overwrite[x-1] = contents[x];
}
contents = overwrite;
return output;
}
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
I've tried to alleviate this problem by creating my own custom ArrayList subclass which segments its data storage into smaller ArrayLists. Any process that requires the ArrayList to scan its data for a specific item generates a new search thread for each of the smaller ArrayLists within (to take advantage of my multiple CPU cores).
But this system doesn't work, because when the Thread calling the search has an item in any of the ArrayLists synchronized, it can block those separate search threads from completing their search, which in turn locks up the original thread that called the search, essentially deadlocking the whole program.
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
Any ideas?
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
The answer depends a lot on what sort of data you are talking about and the specific operations you need. You use the word "explore" without defining it.
If you are talking about looking up a record, then nothing beats a HashMap – ConcurrentHashMap for threaded operation. If you are talking about keeping items in order, especially when dealing with threads, then I'd recommend a ConcurrentSkipListMap, which has O(log N) lookup, insert, remove, etc.
You may also want to consider using multiple collections. You need to be careful that the collections don't get out of sync, which can be especially challenging with threads, but that might be faster depending on the various operations you are making.
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
As mentioned ConcurrentSkipListMap is O(logN) for rearranging an item. i.e. remove and add with new position.
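To make "remove and add with new position" concrete, here is a sketch that uses an Integer key as the position. The class and method names are illustrative only:

```java
import java.util.concurrent.ConcurrentSkipListMap;

class SkipListDemo {
    // Entries are kept sorted by key no matter the insertion order.
    static ConcurrentSkipListMap<Integer, String> sample() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    // "Rearranging" an item: remove it under the old key, insert under the new one.
    // Each call is thread safe on its own, but the pair is not one atomic step.
    static void move(ConcurrentSkipListMap<Integer, String> map, int oldKey, int newKey) {
        String value = map.remove(oldKey);
        if (value != null) {
            map.put(newKey, value);
        }
    }
}
```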
The [ArrayList.remove(i)] code behind the scenes runs something like: ...
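Well, not really. You can look at the code in the JDK, right? ArrayList does not allocate a new array on every removal; it shifts the elements after the removed index down in place with System.arraycopy(...). That may still not be efficient enough for your case, because the shift is O(N) in the number of elements moved, but it is a fast bulk copy rather than the two element-by-element loops above.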
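A simplified sketch of removal with System.arraycopy, written from the description above rather than copied from the JDK source. The helper name and signature are made up; it moves only the tail after index i and allocates nothing:

```java
class ArrayRemove {
    // Removes the element at index i from the first `size` slots of contents,
    // shifting the tail down by one in place. Returns the new size.
    static int removeAt(Object[] contents, int size, int i) {
        int moved = size - i - 1; // number of elements after index i
        if (moved > 0) {
            System.arraycopy(contents, i + 1, contents, i, moved);
        }
        contents[size - 1] = null; // clear the now-unused last slot
        return size - 1;
    }
}
```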
One example of good usage for a linked list is where the list elements are very large ie. large enough that only one or two can fit in CPU cache at the same time. At this point the advantage that contiguous block containers like vectors or arrays for iteration have is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in realtime.
ref: Under what circumstances are linked lists useful?
ref : https://coderanch.com/t/508171/java/Collection-datastructure-large-data
Different collection types have different time complexities for various operations. Typical complexities are O(1), O(N), and O(log(N)). To choose a collection, you first need to decide which operations you use often, and avoid collections that have O(N) complexity for those operations. Here you often use ArrayList.remove(i), which is O(N). Even worse, you use remove(i) and not remove(element). If removing an element you already hold were the only operation used often, then LinkedList could help: unlinking a node through its iterator is O(1), although LinkedList.remove(element) still has to search for the element first, and LinkedList.remove(i) is also O(N).
I doubt that a List with remove(i) complexity of O(1) can be implemented. The best possible time is O(log(N)), which is definitely better than O(N). Java standard library has no such implementation. You can try to google it by "binary indexed tree" keywords.
But the first thing I would do is to review the algorithm and try to get rid of List.remove(i) operation.

Efficiency-wise, would it be quicker to make an ArrayList or use an array when adding to the first index?

I am using Java. I want to add to the start of an Array. Would it be more efficient to move all variables up one space in the array, leaving one spot for a new variable to be added in index 0, or to just use an ArrayList?
I am aware an ArrayList will move the values for me, but I have heard that they are very inefficient, is this true?
Are there any other APIs that will do this efficiently?
Apart from the method call overhead and some small maintenance cost, ArrayList is no more inefficient than copying array elements yourself. Some implementations of ArrayList may even be faster at moving data, by allowing the list to start somewhere else in the backing array than at index 0, as ArrayDeque does.
Neither would be efficient, because each insertion at the beginning needs to move what you've added so far. This means that inserting N elements takes O(N^2) time, which is rather inefficient.
LinkedList<T>s are better for situations when you need to insert at the beginning of the list. However, they have memory overhead, and do not allow fast lookup based on the index.
If you do not need to use your list until after all elements have been inserted, you may be better off inserting elements at the back of the list, and then reversing the list before starting to use it.
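The append-then-reverse idea can be sketched as follows, next to an ArrayDeque variant for the case where you need the front available at all times. The class and method names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FrontInsert {
    // Appends at the back (amortized O(1) each), then reverses once at the end.
    static List<Integer> appendThenReverse(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        Collections.reverse(list);
        return list;
    }

    // ArrayDeque supports amortized O(1) insertion at the front directly,
    // at the cost of having no get(index).
    static ArrayDeque<Integer> dequeFront(int n) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            deque.addFirst(i);
        }
        return deque;
    }
}
```

Both produce the same order as inserting each value at index 0, for example 2, 1, 0 when n is 3.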
ArrayList also uses an array internally to store the data. When you add an item at index 0, it shifts the existing items up by one with a bulk copy (System.arraycopy), which is about as fast as that shift can be done. So it is better to use ArrayList for simpler coding, but if you can come up with a better algorithm, then go for a plain array.
If you add to the first index very frequently, it will be very expensive, as it needs to relocate all the elements from index 1 to the end of the array, and it will resize itself when it runs out of room for the new element.
LinkedLists provide better performance in such cases, but they do not implement the RandomAccess behaviour.
ArrayList provides enough performance for normal usage, and what's even more important - they are safe. So you don't need to worry about getting out-of-bounds, null-pointers etc.
To make it "faster" you can, for example, get rid of ArrayList's checking capacity etc., but then you are making your code unsafe, which means you must be sure you are setting the right parameters, because if not you will be getting IndexOutOfBounds etc.
You can read a very interesting post about Trove - using primitive collections for performance, for more information.
But 99 times out of 100, there is no real need. Remember and repeat after me:
Premature optimization is the root of all evil.
Besides, I really recommend checking out the JDK source code yourself. You can learn a lot and, obviously, see how it's made.

Am I writing this method the right way?

I've got an ArrayList called conveyorBelt, which stores orders that have been picked and placed on the conveyor belt. I've got another ArrayList called readyCollected which contains a list of orders that can be collected by the customer.
What I'm trying to do with the method I created is this: when an ordNum is entered, it returns true if the order is ready to be collected by the customer (thus removing the collected order from readyCollected). If the order hasn't even been picked yet, then it returns false.
I was wondering is this the right way to write the method...
public boolean collectedOrder(int ordNum)
{
int index = 0;
Basket b = new Basket(index);
if(conveyorBelt.isEmpty()) {
return false;
}
else {
readyCollected.remove(b);
return true;
}
}
I'm a little confused since you're not using ordNum at all.
If you want to confirm operation of your code and generally increase the reliability of what you're writing, you should check out unit testing and the Java frameworks available for this.
You can solve this problem using an ArrayList, but I think that this is fundamentally the wrong way to think about the problem. An ArrayList is good for storing a complete sequence of data without gaps where you are only likely to add or remove elements at the very end. It's inefficient to remove elements at other positions, and if you have just one value at a high index, then you'll waste a lot of space filling in all lower positions with null values.
Instead, I'd suggest using a Map that associates order numbers with the particular order. This more naturally encodes what you want - every order number is a key associated with the order. Maps, and particularly HashMaps, have very fast lookups (expected constant time) and use (roughly) the same amount of space no matter how many keys there are. Moreover, the time to insert or remove an element from a HashMap is expected constant time, which is extremely fast.
As for your particular code, I agree with Brian Agnew on this one that you probably want to write some unit tests for it and find out why you're not using the ordNum parameter. That said, I'd suggest reworking the system to use HashMap instead of ArrayList before doing this; the savings in time and code complexity will really pay off.
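A minimal sketch of the Map-based version, under the assumption that order numbers are unique. The class OrderDesk and the method markReady are hypothetical, and a String stands in for the Basket type from the question:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for orders that are ready for collection.
class OrderDesk {
    // Order number -> basket (String used here as a stand-in for Basket).
    private final Map<Integer, String> readyCollected = new HashMap<>();

    // Called when an order has come off the conveyor belt.
    void markReady(int ordNum, String basket) {
        readyCollected.put(ordNum, basket);
    }

    // Returns true and removes the order if it was ready, false otherwise.
    boolean collectedOrder(int ordNum) {
        return readyCollected.remove(ordNum) != null;
    }
}
```

With a Map, remove(ordNum) looks up by key; the same call on an ArrayList with an int argument would remove by index instead.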
Based on your description, why isn't this sufficient :
public boolean collectedOrder(int ordNum) {
return (readyCollected.remove(ordNum) != null);
}
Why does the conveyorBelt ArrayList even need to be checked?
As already pointed out, you most likely need to be using ordNum.
Aside from that the best answer anyone can give with the code you've posted is "perhaps". Your logic certainly looks correct and ties in with what you've described, but whether it's doing what it should depends entirely on your implementation elsewhere.
As a general pointer (which may or may not be applicable in this instance) you should make sure your code deals with edge cases and incorrect values. So you might want to flag something's wrong if readyCollected.remove(b); returns false for instance, since that indicates that b wasn't in the list to remove.
As already pointed out, take a look at unit tests using JUnit for this type of thing. It's easy to use and writing thorough unit tests is a very good habit to get into.

Java Performance - ArrayLists versus Arrays for lots of fast reads

I have a program where I need to make 100,000 to 1,000,000 random-access reads to a List-like object in as little time as possible (as in milliseconds) for a cellular automata-like program. I think the update algorithm I'm using is already optimized (keeps track of active cells efficiently, etc). The Lists do need to change size, but that performance is not as important. So I am wondering if the performance from using Arrays instead of ArrayLists is enough to make a difference when dealing with that many reads in such short spans of time. Currently, I'm using ArrayLists.
Edit:
I forgot to mention: I'm just storing integers, so another factor is using the Integer wrapper class (in the case of ArrayLists) versus ints (in the case of arrays). Does anyone know if using ArrayList will actually require 3 pointer lookups (one for the ArrayList, one for the underlying array, and one for the Integer->int) whereas the array would only require 1 (array address+offset to the specific int)? Would HotSpot optimize the extra lookups away? How significant are those extra lookups?
Edit2:
Also, I forgot to mention I need to do random access writes as well (writes, not insertions).
Now that you've mentioned that your arrays are actually arrays of primitive types, consider using the collection-of-primitive-type classes in the Trove library.
@viking reports significant (ten-fold!) speedup using Trove in his application - see comments. The flip-side is that Trove collection types are not type compatible with Java's standard collection APIs. So Trove (or similar libraries) won't be the answer in all cases.
Try both, but measure.
Most likely you could hack something together to make the inner loop use arrays without changing all that much code. My suspicion is that HotSpot will already inline the method calls and you will see no performance gain.
Also, try Java 6 update 14 and use -XX:+DoEscapeAnalysis
ArrayLists are slower than Arrays, but most people consider the difference to be minor. In your case it could matter, though, since you're dealing with hundreds of thousands of them.
By the way, duplicate: Array or List in Java. Which is faster?
I would go with Kevin's advice.
Stay with the lists first and measure your performance. If your program is too slow, compare it to a version with an array. If that gives you a measurable performance boost, go with the arrays; if not, stay with the lists, because they will make your life much, much easier.
There will be an overhead from using an ArrayList instead of an array, but it is very likely to be small. In fact, the useful bit of data in the ArrayList can be stored in registers, although you will probably use more (List size for instance).
You mention in your edit that you are using wrapper objects. These do make a huge difference. If you are typically using the same value repeatedly, then a sensible cache policy may be useful (Integer.valueOf gives the same results for -128 to 127). For primitives, primitive arrays usually win comfortably.
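The small-value cache can be observed with a reference comparison, as in this sketch (the class and method names are invented). Values outside the guaranteed range may or may not be cached, depending on the JVM and its settings, so the example makes no claim about them:

```java
class IntegerCacheDemo {
    // True when two boxings of the same value yield the same object.
    static boolean sameInstance(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));  // true: inside the guaranteed cache range
        System.out.println(sameInstance(-128)); // true: inside the guaranteed cache range
    }
}
```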
As a refinement, you might want to make sure the adjacent cells tend to be adjacent in the array (you can do better than rows of columns with a space filling curve).
One possibility would be to re-implement ArrayList (it's not that hard), but expose the backing array via a lock/release call cycle. This gets you convenience for your writes, but exposes the array for a large series of read/write operations that you know in advance won't impact the array size. If the list is locked, add/delete is not allowed - just get/set.
for example:
int[] directArray = myArrayList.lockArray(); // primitive array, so += 1 below compiles
try{
// myArrayList.add(), delete() would throw an illegal state exception
for (int i = 0; i < 50000; i++){
directArray[i] += 1;
}
} finally {
myArrayList.unlockArray();
}
This approach continues to encapsulate the array growth/etc... behaviors of ArrayList.
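One possible shape for such a list, specialised to int for brevity. This is a sketch with hypothetical names (LockableIntList, lockArray, unlockArray), and the "lock" is a plain flag rather than a thread-safe lock:

```java
import java.util.Arrays;

// Hypothetical growable int list that can temporarily expose its backing array.
class LockableIntList {
    private int[] data = new int[16];
    private int size;
    private boolean locked;

    // Appending may reallocate the backing array, so it is refused while locked.
    void add(int value) {
        if (locked) {
            throw new IllegalStateException("list is locked");
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return data[i];
    }

    int size() {
        return size;
    }

    // Hands out the backing array; only indices [0, size()) are meaningful.
    int[] lockArray() {
        locked = true;
        return data;
    }

    void unlockArray() {
        locked = false;
    }
}
```

Direct reads and writes on the returned array skip the method call and bounds check of get, and the flag guarantees the array is not swapped out while the caller holds it.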
Java uses double indirection for its objects so they can be moved about in memory and have its references still be valid, this means every reference lookup is equivalent to two pointer lookups. These extra lookups cannot be optimised away completely.
Perhaps even worse, your cache performance will be terrible. Accessing values in cache is going to be many times faster than accessing values in main memory (perhaps 10x). If you have an int[], you know the values will be consecutive in memory and thus load into cache readily. However, for Integer[] the individual Integer objects can be scattered randomly across your memory and are much more likely to be cache misses. Also, an Integer uses 24 bytes, which means they are much less likely to fit into your caches than 4-byte values.
If you update an Integer, this often results in a new object being created, which is many orders of magnitude slower than updating an int value.
If you're creating the list once, and doing thousands of reads from it, the overhead from ArrayList may well be slight enough to ignore. If you're creating thousands of lists, go with the standard array. Object creation in a loop quickly goes quadratic, simply because of all the overhead of instantiating the member variables, calling the constructors up the inheritance chain, etc.
Because of this -- and to answer your second question -- stick with standard ints rather than the Integer class. Profile both and you'll quickly (or, rather, slowly) see why.
If you're not going to be doing a lot more than reads from this structure, then go ahead and use an array as that would be faster when read by index.
However, consider how you're going to get the data in there, and if sorting, inserting, deleting, etc, are a concern at all. If so, you may want to consider other collection based structures.
Primitives are much (much much) faster. Always. Even with JIT escape analysis, etc. Skip wrapping things in java.lang.Integer. Also, skip the array bounds check most ArrayList implementations do on get(int). Most JITs can recognize simple loop patterns and remove the check, but there isn't much reason to muck with it if you're worried about performance.
You don't have to code primitive access yourself - I'd bet you could cut over to using IntArrayList from the COLT library - see http://acs.lbl.gov/~hoschek/colt/ ("Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java") - in a few minutes of refactoring.
The options are:
1. To use an array
2. To use the ArrayList which internally uses an array
It is obvious the ArrayList introduces some overhead (look into ArrayList source code). For the 99% of the use cases this overhead can be easily ignored. However if you implement time sensitive algorithms and do tens of millions of reads from a list by index then using bare arrays instead of lists should bring noticeable time savings. USE COMMON SENSE.
Please take a look here: http://robaustin.wikidot.com/how-does-the-performance-of-arraylist-compare-to-array I would personally tweak the test to avoid compiler optimizations, e.g. I would change "j = " into "j += " with the subsequent use of "j" after the loop.
An Array will be faster simply because at a minimum it skips a function call (i.e. get(i)).
If you have a static size, then Arrays are your friend.
