Vector vs ArrayList (in a non-multithreaded environment): under which requirements would we use Vector rather than ArrayList?
One I know of: when the collection has to grow dynamically and very frequently, since a Vector's capacity grows by 100% on each resize and an ArrayList's by 50%.
I think you should not use Vector, because you pay for its synchronization even in a non-threaded environment. So use ArrayList.
Vector synchronizes each individual operation, not a whole sequence of operations, and it is widely considered obsolete (although it is not formally marked @Deprecated).
Why is Java Vector class considered obsolete or deprecated?
In a single-threaded environment, never use Vector. Its methods are "synchronized", which adds locking overhead to every call and makes it slower than ArrayList.
So even if the growth behavior is different and Vector doubles its capacity, that does not make up for the cost paid on every single operation; the ArrayList operations are faster.
Does anyone know of something similar to ArrayList that is better geared to handling really large amounts of data as quickly as possible?
I've got a program with a really large ArrayList that's getting choked up when it tries to explore or modify the ArrayList.
Presumably when you do:
//i is an int;
arrayList.remove(i);
The code behind the scenes runs something like:
public T remove(int i){
    //Let's say ArrayList stores its data in a T [] array called "contents".
    T output = contents[i];
    T [] overwrite = new T [contents.length - 1];
    //Yes, I know generic arrays aren't created this simply. Bear with me here...
    for(int x=0;x<i;x++){
        overwrite[x] = contents[x];
    }
    for(int x=i+1;x<contents.length;x++){
        overwrite[x-1] = contents[x];
    }
    contents = overwrite;
    return output;
}
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
I've tried to alleviate this problem by creating my own custom ArrayList subclass which segments its data storage into smaller ArrayLists. Any process that requires the ArrayList to scan its data for a specific item generates a new search thread for each of the smaller ArrayLists within (to take advantage of my multiple CPU cores).
But this system doesn't work: when the thread calling the search holds a lock on an item in any of the ArrayLists, it can block those separate search threads from completing their search, which in turn blocks the original thread that called the search, essentially deadlocking the whole program.
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
Any ideas?
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
The answer depends a lot on what sort of data you are talking about and the specific operations you need. You use the word "explore" without defining it.
If you are talking about looking up a record, then nothing beats a HashMap (or a ConcurrentHashMap for threaded operation). If you are talking about keeping items in order, especially when dealing with threads, then I'd recommend a ConcurrentSkipListMap, which has O(log N) lookup, insert, remove, etc.
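As a rough illustration of the ordered-map suggestion, here is a minimal sketch. The class name SortedLookupDemo and the sample keys are made up for this example; only ConcurrentSkipListMap and its put, get, remove and firstKey methods come from the JDK.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical demo class: keeps entries sorted by key with O(log N) operations.
final class SortedLookupDemo {
    static ConcurrentSkipListMap<Integer, String> sample() {
        ConcurrentSkipListMap<Integer, String> map = new ConcurrentSkipListMap<>();
        // Insertion order does not matter; the map orders entries by key.
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> map = sample();
        System.out.println(map.firstKey()); // smallest key
        map.remove(10);                     // O(log N) removal by key
        System.out.println(map.firstKey()); // next smallest key
    }
}
```

Whether a map fits at all depends on whether the records have a natural key, which the question does not say.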
You may also want to consider using multiple collections. You need to be careful that the collections don't get out of sync, which can be especially challenging with threads, but that might be faster depending on the various operations you are making.
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
As mentioned, ConcurrentSkipListMap is O(log N) for rearranging an item, i.e. removing it and adding it back at a new position.
The [ArrayList.remove(i)] code behind the scenes runs something like: ...
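Well, not really. You can look at the code in the JDK, right? ArrayList uses System.arraycopy(...) for these sorts of operations and shifts the tail of the existing array in place instead of allocating a new array on every removal. That is still O(N) in the number of elements after the removed index, but the constant factor is much lower than the element-by-element loop in your sketch. It may still not be efficient enough for your case.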
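To make the System.arraycopy point concrete, here is a minimal sketch of an array-backed list. SimpleArrayList is a hypothetical class written for this illustration, not the JDK source, and its growth rule is only an approximation of what OpenJDK does.

```java
import java.util.Arrays;

// Hypothetical minimal array-backed list; NOT the JDK implementation.
final class SimpleArrayList<T> {
    private Object[] contents = new Object[10];
    private int size = 0;

    void add(T item) {
        if (size == contents.length) {
            // Assumed growth rule: roughly 50% larger when the array is full.
            contents = Arrays.copyOf(contents, size + (size >> 1) + 1);
        }
        contents[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return (T) contents[i];
    }

    T remove(int i) {
        T removed = get(i);
        int tail = size - i - 1;
        if (tail > 0) {
            // Shift the tail one slot left inside the same array; no new allocation.
            System.arraycopy(contents, i + 1, contents, i, tail);
        }
        contents[--size] = null; // clear the vacated slot so it can be collected
        return removed;
    }

    int size() { return size; }
}
```

The shift is still linear in the number of elements after index i, so removing from the front of a list with millions of entries remains expensive.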
One example of good usage for a linked list is where the list elements are very large ie. large enough that only one or two can fit in CPU cache at the same time. At this point the advantage that contiguous block containers like vectors or arrays for iteration have is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in realtime.
ref: Under what circumstances are linked lists useful?
ref : https://coderanch.com/t/508171/java/Collection-datastructure-large-data
Different collection types have different time complexities for various operations. Typical complexities are O(1), O(log(N)), and O(N). To choose a collection, you first need to decide which operations you use often, and avoid collections which have O(N) complexity for those operations. Here you often use ArrayList.remove(i), which is O(N). Even worse, you remove by index and not at a position you already hold. If the only frequent operation were removing the element an iterator is currently positioned on, then LinkedList could help, since Iterator.remove() on a LinkedList is O(1). But LinkedList.remove(i) is also O(N), and so is LinkedList.remove(element), because both first have to walk the list to find the node.
I doubt that a List with remove(i) complexity of O(1) can be implemented. The best possible time is O(log(N)), which is definitely better than O(N). Java standard library has no such implementation. You can try to google it by "binary indexed tree" keywords.
But the first thing I would do is to review the algorithm and try to get rid of List.remove(i) operation.
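One way to get rid of repeated remove(i) calls, assuming the removals can be expressed as a condition on the elements, is a single bulk pass. The class BulkRemoval and the "drop even numbers" rule are invented for this sketch; removeIf is a standard Collection method since Java 8.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes many elements in one pass instead of one remove(i) per element.
final class BulkRemoval {
    static List<Integer> dropEvens(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input); // leave the caller's list untouched
        copy.removeIf(n -> n % 2 == 0);              // single traversal of the list
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= 5; i++) numbers.add(i);
        System.out.println(dropEvens(numbers));
    }
}
```

Calling remove(i) k times costs O(k·N) in the worst case, whereas one filtering pass is O(N) in total.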
I've been programming for quite a bit and recently started learning more pure Computer Science topics (for a job interview).
I know the difference between an Array and a LinkedList data structure, but now that I have started using Java I'm seeing this ArrayList, which I'm having trouble conceptualizing.
Web searches have only really shown me HOW to use them and WHEN to use them (benefits of each), but nothing can answer my question of:
What is an ArrayList? My assumption is that it is a list that maintains memory references to each element, making it also able to act like an array.
I also have a feeling since Java is open, that I should be able to look at the Class definition, but haven't figured out how to do that yet either.
Thanks!
I like to think of it as a data-structure that lets you enjoy both worlds, the quick-access to an index like with an array and the infinite growth of a list. Of course, there are always trade-offs.
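ArrayList is actually a wrapper around an array. Every time the backing array fills up, a new, larger array is created and all the data from the original array is copied to the new one. The exact growth factor is an implementation detail: OpenJDK grows by about 50%, although the explanation below uses doubling to keep the arithmetic simple.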
From the java doc:
Resizable-array implementation of the List interface. Implements all
optional list operations, and permits all elements, including null. In
addition to implementing the List interface, this class provides
methods to manipulate the size of the array that is used internally to
store the list. (This class is roughly equivalent to Vector, except
that it is unsynchronized.) The size, isEmpty, get, set, iterator, and
listIterator operations run in constant time. The add operation runs
in amortized constant time, that is, adding n elements requires O(n)
time. All of the other operations run in linear time (roughly
speaking). The constant factor is low compared to that for the
LinkedList implementation.
Each ArrayList instance has a capacity. The capacity is the size of
the array used to store the elements in the list. It is always at
least as large as the list size. As elements are added to an
ArrayList, its capacity grows automatically. The details of the growth
policy are not specified beyond the fact that adding an element has
constant amortized time cost.
An application can increase the capacity of an ArrayList instance
before adding a large number of elements using the ensureCapacity
operation. This may reduce the amount of incremental reallocation.
This allows O(1) access for most of the operations like it would take with an array. Once in a while you need to pay for this performance with an insert operation that takes much longer though.
This is called amortized complexity. Each operation takes only O(1), aside from those times when the array has to grow. At those times you pay O(n), but if you average it over n operations, the average time taken is only O(1) and not O(n).
Let's take an example:
We have an array of size 100 (n=100). You append 100 items and each append takes only O(1); of course, all get-by-index operations also take O(1) (as this is an array). On the 101st insertion there is no more capacity in the array, so the ArrayList creates a new array (assume size 200 for this example; OpenJDK would allocate about 150), copies all the values to it (O(n) operations) and then inserts the 101st item. Until you fill the array to 200 items, all further appends take O(1).
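The amortized argument can be checked with a small simulation. This is only a sketch: GrowthCost is a made-up class, and it assumes the doubling policy from the example rather than the real ArrayList policy.

```java
// Hypothetical simulation of an append-only array that doubles when full.
final class GrowthCost {
    // Returns how many element copies happen while appending n items.
    static long copiesWithDoubling(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size; // every existing element moves to the new array
                capacity *= 2;  // assumed policy: double the capacity
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        // Starting at capacity 100, resizes happen at sizes 100, 200, 400 and 800.
        System.out.println(copiesWithDoubling(1000, 100)); // 100 + 200 + 400 + 800 = 1500
    }
}
```

The total number of copies stays below 2n, which is why the average cost per append is constant.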
An ArrayList is a list that is directly backed by an array. More specifically, it's backed by an array that is dynamically resized. You can read a bit more about it in its source code; there are some pretty good comments to it.
The reason that this is significant is due to how a LinkedList is implemented - as a traditional collection of nodes and references to other nodes. This has performance impacts in indexing and traversal, whereas with an ArrayList, since it's backed by an array, all one needs to do is index into the specific array to retrieve the value.
First I should say that in my book (2005), Vector<E> is used extensively in place of arrays. At the same time, there is no explanation of the differences between the two. Checking the Oracle documentation for the Vector class, it's pretty easy to understand its usage.
Doing some additional research on StackOverflow and Google, I found that the Vector class is considered obsolete and that ArrayList should be used instead. Is this correct? I also found an extensive explanation of the differences between Array and ArrayList.
The part that I can't really understand: Is there a rule on where I should use ArrayList instead of simple arrays? It seems like I should always use ArrayList. It looks more efficient and should be easier to implement collections of values/objects, is there any down side with this approach?
Some history:
Vector has existed since Java 1.0;
the List interface has existed since Java 1.2, and so has ArrayList;
Vector was retrofitted to implement the List interface at that same time;
Java 5, which introduced generics, was released in 2004.
Your course, dating back to 2005, should have had knowledge of ArrayList at the very list (sorry, least), and should have introduced generics too.
As to Array, there is java.lang.reflect.Array, which helps with reflections over arrays (ie, int[], etc).
Basically:
Vector synchronizes all operations, which is a waste in 90+% of cases;
if you want concurrent collections, Java 5 has introduced ConcurrentHashMap, CopyOnWriteArrayList etc, you should use those;
DO NOT use Vector anymore in any event; some code in the JDK still uses it, but it is for backwards compatibility reasons. In new code, there are better alternatives, as mentioned in the previous point;
since Java 1.2, Vector does not offer the same thread safety guarantees as it used to offer anyway.
The latter point is interesting. Prior to Iterator there was Enumeration, and Enumeration did not offer the possibility to remove elements; Iterator, however, does.
So, let us take two threads t1 and t2, a Vector, and those two threads having an Iterator over that vector. Thread t1 does:
while (it.hasNext())
it.next();
Thread t2 does:
// remember: different iterator
if (!it.hasNext())
it.remove();
With some unlucky timing, you have:
t1                      t2
------                  ------
hasNext(): true
                        .hasNext(): false
                        removes last element
.next() --> BOOM
Therefore, Vector is in fact not thread safe. And it is even less thread safe since Java 5's introduction of the "foreach loop", which creates a "hidden" iterator.
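The iterator problem does not even need two threads to show up. The sketch below (VectorIterationDemo is a hypothetical class) modifies a Vector inside a for-each loop; the hidden iterator detects the change and fails, even though every individual Vector method is synchronized.

```java
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Vector;

// Hypothetical demo: per-method synchronization does not protect an iteration.
final class VectorIterationDemo {
    // Returns true if modifying the vector during a for-each loop fails fast.
    static boolean failsDuringIteration() {
        List<Integer> vector = new Vector<>();
        vector.add(1);
        vector.add(2);
        vector.add(3);
        try {
            for (Integer value : vector) {
                if (value == 1) {
                    vector.remove(value); // remove(Object): structural change mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the hidden iterator noticed the modification
        }
    }

    public static void main(String[] args) {
        System.out.println(failsDuringIteration());
    }
}
```

With two threads the same failure depends on timing, which is what makes it hard to reproduce and debug.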
The basic difference between an array and an ArrayList is that an array has a fixed size, whereas an ArrayList can dynamically grow in size as needed. So, if you are sure that your array size won't change, then you can use an array. But if you want to add elements later, then an ArrayList, which is an implementation of the List interface, is the way to go.
An ArrayList is itself backed by an array, though. Internally it uses a fixed-size array, with a default initial capacity of 10 (which can be changed), but that detail is hidden from you. So you don't have to bother about the changing size of the ArrayList.
Whenever you add more elements than the internal array can hold, the internal array is replaced by a larger one. That means repeated expansion can become an overhead if you regularly insert a large number of elements, although this is rarely a problem. Still, you can give your own initial capacity when creating an ArrayList. So that's up to you to decide.
As for the Vector vs ArrayList discussion: yes, Vector is now effectively deprecated (not technically, but its use is discouraged, as stated in the comments by @Luiggi), and you should use an ArrayList. The difference is that Vector synchronizes each operation, which is almost never required. When you need synchronization, you can always create a synchronized list using Collections.synchronizedList.
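A minimal sketch of the Collections.synchronizedList approach follows. SynchronizedListDemo and its method names are invented for the example; the requirement to lock the list manually while iterating comes from the Collections.synchronizedList documentation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of wrapping an ArrayList in a synchronized view.
final class SynchronizedListDemo {
    static List<Integer> newSynchronizedList() {
        return Collections.synchronizedList(new ArrayList<>());
    }

    static int sum(List<Integer> syncList) {
        int total = 0;
        // Single calls such as add() are synchronized by the wrapper, but iteration
        // is a compound operation, so the caller must hold the list's lock itself.
        synchronized (syncList) {
            for (int n : syncList) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = newSynchronizedList();
        list.add(1);
        list.add(2);
        list.add(3);
        System.out.println(sum(list));
    }
}
```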
For more on this discussion, see this post.
An ArrayList is an implementation of List. There are other variations too. Like you also have a LinkedList, to get the functionality of a traditional linked list.
Vector Class is actually deprecated and to use ArrayList instead, is this correct?
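Yes, in practice this is correct. Vector and some other legacy collections (such as Hashtable) are not formally marked @Deprecated, but they are considered obsolete and have been superseded by newer collections like ArrayList and HashMap. Here are a few reasons why Vector is considered obsolete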
Is there a rule on where i should use ArrayList instead of simple Arrays?
Almost always. I can think of two reasons why you should use arrays:
Makes JNI calls easier. It is MUCH easier to pass a simple array between C++ and Java than an ArrayList object.
You can gain a little bit of performance, since accessing an element of a simple array avoids the method call and the extra index check inside ArrayList.get. (The array access itself is still bounds-checked by the JVM, and the JIT can often inline the method call.)
On the other hand, using ArrayList gives a lot of advantages. You do not need to think about controlling the array's size when you add a new element, you can use the simple API of ArrayList for adding/removing elements from your collection, etc.
I'll just add my two cents.
If you need a collection of primitive data and optimization matters, arrays will generally be faster, as they eliminate the need for auto-boxing and auto-unboxing.
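The boxing difference can be seen in a small sketch. BoxingDemo is a hypothetical class; it does not measure anything, it only shows where the conversions happen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: the same sum over an int[] and over a List<Integer>.
final class BoxingDemo {
    static long sumArray(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v; // plain primitive reads, no objects involved
        }
        return total;
    }

    static long sumList(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // each element is an Integer object that gets unboxed here
        }
        return total;
    }

    static List<Integer> box(int[] values) {
        List<Integer> boxed = new ArrayList<>(values.length);
        for (int v : values) {
            boxed.add(v); // auto-boxing: int -> Integer
        }
        return boxed;
    }
}
```

Whether the difference matters should be measured for the workload in question; for small collections it is usually negligible.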
Could you please let me know Performance wise why Array is better than Collection?
It is not. It will actually depend on the use you make of your container.
Some algorithms may run in O(n) on an array and in O(1) on another collection type (which implements the Collection interface).
Think about removal of an item, for instance. In that case the array, even though it is a native type, would perform slower than the linked list and its method calls (which could be inlined anyway on some VMs): it runs in O(n) versus O(1) for a linked list, provided you already hold the position of the node to remove (for example through an iterator).
Think about searching for an element. It runs in O(n) for an array versus O(log n) for a tree.
Some Collection implementations use an array to store their elements (ArrayList, for example), so in that case performance will not be significantly different.
You should spend time on optimizing your algorithm (and make use of the various collection types available) instead of worrying about the pros and cons of an array versus a Collection.
Many collections are wrappers for arrays. This includes ArrayList, HashMap/Set, StringBuilder. For optimised code, the performance difference of the operations is minimal except when you come to operations which are better suited to that data structure e.g. lookup of a Map is much faster than the lookup in an array.
Using generics for collections which are basically primitives can be slower, not because the collection is slower but because of the extra object creation and cache usage (as the memory needed can be higher). This difference is usually too small to matter, but if you are concerned about this you can use the Trove4J libraries, which are wrappers for arrays of primitives instead of arrays of Objects.
Where collections are slower is when you use operations which they are not suitable for e.g. random access of a LinkedList, but sensible coding can avoid these situations.
Basically, because arrays are primitive data structures in Java. Accesses to them can be translated directly into native memory-access instructions rather than method calls.
That said, though, it's not entirely obvious that arrays will strictly outperform collections in all circumstances. If your code references collection variables where the runtime type can be monomorphically known at JIT-time, Hotspot will be able to inline the access methods, and where they are simple, can be just as fast since there's basically no overhead anyway.
Many of the collections' access methods are intrinsically more complex than array referencing, however. There is, for instance, no way that a HashMap will be as efficient as a simple array lookup, no matter how much Hotspot optimizes it.
You cannot compare the two. ArrayList is an implementation, Collection is an interface. There might be different implementations for the Collection interface.
In practice, you choose the implementation which has the simplest access to your data. Usually that is ArrayList if you need to loop through all elements, and HashMap (or the legacy Hashtable) if you need access by key.
Performance should be considered only after measurements are made. Then it is easy to change the implementation because the collection framework has common interfaces like the Collection interface.
The question is which one to use and when?
An array is basically a fixed-size collection of elements. The bad point about an array is that it is not resizable. But its constant size provides efficiency if you know the number of elements in advance. So arrays are better to use when you know how many elements you will have.
Collection
ArrayList is another collection where the number of elements is resizable. So if you are not sure about the number of elements in the collection use an ArrayList. But there are certain facts to be considered while using ArrayLists.
ArrayList is not synchronized. So if multiple threads access and modify the list, synchronization has to be handled externally.
ArrayList is internally implemented as an array. When a new element is added and the internal array is already full, a larger array is created (about 50% larger in OpenJDK), all the n existing elements are copied from the old array to the new array, and then the new element is stored. Additions that fit within the current capacity do not copy anything.
Adding n elements requires O(n) time.
The isEmpty, size, iterator, set, get and listIterator operations run in constant time, independently of which element you access.
Only objects can be added to an ArrayList; primitives are auto-boxed.
Permits null elements.
If you need to add a large number of elements to an ArrayList, you can use the ensureCapacity(int minCapacity) operation to make sure that the ArrayList has the required capacity. This ensures that the internal array is copied at most once while all the elements are added, which speeds up the addition of elements. Also, inserting an element in the middle of, say, 1000 elements requires shifting 500 elements up by one position before the new element can be stored.
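A short sketch of pre-sizing with ensureCapacity is shown below. PreSizing and fill are made-up names; ensureCapacity itself is a real method on ArrayList (it is not part of the List interface).

```java
import java.util.ArrayList;

// Hypothetical helper: reserves capacity once before a bulk insert.
final class PreSizing {
    static ArrayList<Integer> fill(int n) {
        ArrayList<Integer> list = new ArrayList<>();
        list.ensureCapacity(n); // grow the backing array up front, at most one reallocation
        for (int i = 0; i < n; i++) {
            list.add(i);        // no further reallocation while size <= n
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(1000).size());
    }
}
```

When the final size is known at construction time, passing it to the constructor, as in new ArrayList<>(n), has the same effect.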
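The benefit of using ArrayList is that accessing random elements is cheap and is not affected by the number of elements in the ArrayList. Adding elements at the tail is cheap as well (amortized constant time), but adding elements at the head or in the middle is costly, because all later elements have to be shifted.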
Vector is similar to ArrayList, with the difference that it is synchronized. In addition to an initial capacity, it lets you specify a capacity increment. So if your vector has a capacity of 10 and a capacity increment of 10, then when you add the 11th element a new internal array of 20 elements is created and the existing elements are copied into it. Adding the 12th to 20th elements then does not require a new array.
By default, when a vector needs to grow the size of its internal data structure to hold more elements, the size of internal data structure is doubled, whereas for ArrayList the size is increased by only 50%. So ArrayList is more conservative in terms of space.
LinkedList is much more flexible and lets you insert, add and remove elements at both ends of your collection: it can be used as a queue and even as a double-ended queue! Internally a LinkedList does not use arrays. A LinkedList is a sequence of nodes which are doubly linked. Each node holds the stored element and two links, or pointers, to the next and previous nodes. A LinkedList looks like a chain of people holding each other's hands. You can insert a person (node) into that chain or remove one. Linked lists permit node insert/remove operations at any point in the list in constant time, once you already hold a reference to that point (for example through a ListIterator).
So inserting elements at the head or the tail of a linked list is not expensive, and neither is inserting in the middle once an iterator is positioned there. Retrieving elements from the head or the tail is cheap too, because java.util.LinkedList is doubly linked. But random access by index is heavy: to reach the (n+1)th element, the list has to walk through the n elements before it (or through the elements after it, when it is closer to the tail).
Also linked list is not synchronized. So multiple threads modifying and reading the list would need to be synchronized externally.
So the choice of which class to use for creating lists depends on the requirements. ArrayList or Vector( if you need synchronization ) could be used when you need to add elements at the end of the list and access elements randomly - more access operations than add operations. Whereas a LinkedList should be used when you need to do a lot of add/delete (elements) operations from the head or the middle of the list and your access operations are comparatively less.
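As an illustration of the head and tail operations where LinkedList is cheap, here is a minimal sketch. DequeDemo is a hypothetical class; the Deque methods used are part of the JDK.

```java
import java.util.Deque;
import java.util.LinkedList;

// Hypothetical demo: LinkedList used as a double-ended queue.
final class DequeDemo {
    static String drain() {
        Deque<String> deque = new LinkedList<>();
        deque.addLast("b");  // queue is now: b
        deque.addFirst("a"); // a, b
        deque.addLast("c");  // a, b, c
        StringBuilder out = new StringBuilder();
        out.append(deque.pollFirst()); // takes "a" from the head
        out.append(deque.pollLast());  // takes "c" from the tail
        out.append(deque.pollFirst()); // takes "b", the only element left
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain()); // prints "acb"
    }
}
```

If only queue or deque behaviour is needed, java.util.ArrayDeque offers the same interface on top of an array and is often the more compact choice.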
What are the differences between the two data structures ArrayList and Vector, and where should you use each of them?
Differences
Vectors are synchronized, ArrayLists are not.
Data growth methods
Use ArrayLists if there is no specific requirement to use Vectors.
Synchronization
If multiple threads access an ArrayList concurrently then we must externally synchronize the block of code which modifies the list either structurally or simply modifies an element. Structural modification means addition or deletion of element(s) from the list. Setting the value of an existing element is not a structural modification.
Collections.synchronizedList is normally used at the time of creation of the list to avoid any accidental unsynchronized access to the list.
Data growth
Internally, both the ArrayList and Vector hold onto their contents using an Array. When an element is inserted into an ArrayList or a Vector, the object will need to expand its internal array if it runs out of room. A Vector defaults to doubling the size of its array, while the ArrayList increases its array size by 50 percent.
As the documentation says, a Vector and an ArrayList are almost equivalent. The difference is that access to a Vector is synchronized, whereas access to an ArrayList is not. What this means is that only one thread can call methods on a Vector at a time, and there's a slight overhead in acquiring the lock; if you use an ArrayList, this isn't the case. Generally, you'll want to use an ArrayList; in the single-threaded case it's a better choice, and in the multi-threaded case, you get better control over locking. Want to allow concurrent reads? Fine. Want to perform one synchronization for a batch of ten writes? Also fine. It does require a little more care on your end, but it's likely what you want. Also note that if you have an ArrayList, you can use the Collections.synchronizedList function to create a synchronized list, thus getting you the equivalent of a Vector.
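The "one synchronization for a batch of writes" idea might look like the sketch below. BatchedWrites is a hypothetical class, and locking on the private list is just one possible choice of lock object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: an ArrayList guarded by one lock acquisition per batch.
final class BatchedWrites {
    private final List<Integer> items = new ArrayList<>();

    void addBatch(int start, int count) {
        synchronized (items) { // one lock for the whole batch, not one per add
            for (int i = 0; i < count; i++) {
                items.add(start + i);
            }
        }
    }

    int size() {
        synchronized (items) {
            return items.size();
        }
    }

    // Runs two writer threads against one shared instance and returns the final size.
    static int runTwoWriters() {
        BatchedWrites shared = new BatchedWrites();
        Thread a = new Thread(() -> shared.addBatch(0, 1000));
        Thread b = new Thread(() -> shared.addBatch(1000, 1000));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return shared.size();
    }
}
```

With a Vector the same batch would acquire and release the lock once per element, and another thread could interleave its own writes in the middle of the batch.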
Vector is a broken class that is not threadsafe, despite it being "synchronized" and is only used by students and other inexperienced programmers.
ArrayList is the go-to List implementation used by professionals and experienced programmers.
Professionals wanting a threadsafe List implementation use a CopyOnWriteArrayList.
ArrayList is newer and 20-30% faster.
If you don't need something that only Vector explicitly provides, use ArrayList.
There are 2 major differences between Vector and ArrayList.
Vector is synchronized by default, and ArrayList is not.
Note: you can also make an ArrayList synchronized by passing the ArrayList object to the Collections.synchronizedList() method.
Synchronized means that individual method calls can be made from multiple threads without corrupting the list.
An ArrayList grows by 50% of its previous capacity when there is not enough space for a new element, whereas a Vector grows by 100% of its previous capacity when there is no space for a new incoming element.
Other than this, there are some practical differences between them, in terms of programming effort:
To get the element at a particular location from a Vector we use the elementAt(int index) function. This function name is very lengthy.
In place of this, ArrayList has get(int index), which is very easy to remember and to use.
Similarly, to replace an existing element with a new element in a Vector we use the setElementAt() method, which is again very lengthy and may irritate the programmer when used repeatedly. In place of this, ArrayList has the set(int index, E element) method, which is easy to use and remember. (The add(int index, E element) method inserts a new element instead of replacing one.)
In the same way, ArrayList has more programmer-friendly and easy-to-use function names in general.
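The naming difference can be sketched side by side. NamingDemo is a made-up class; note that the List counterpart of setElementAt is set(int, E), while add(int, E) would insert rather than replace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical demo comparing the legacy Vector names with the List API.
final class NamingDemo {
    static String viaVector() {
        Vector<String> v = new Vector<>();
        v.addElement("a");
        v.setElementAt("b", 0); // legacy order: element first, then index
        return v.elementAt(0);
    }

    static String viaArrayList() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.set(0, "b");       // List order: index first, then element
        return list.get(0);
    }
}
```

Since Vector also implements List, the shorter get and set methods work on a Vector as well; the long names are only the pre-1.2 legacy API.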
When to use which one?
Try to avoid using Vectors completely. ArrayLists can do everything that a Vector can do. Moreover, ArrayLists are not synchronized by default. If you want, you can synchronize one whenever you need to by using the Collections utility class.
ArrayList has function names that are easy to remember and use.
Note: even though an ArrayList grows by 50% at a time, you can avoid repeated growth by calling the ensureCapacity() method to make sure that you allocate sufficient memory at the initial stage itself.
Hope it helps.
ArrayList and Vector both implement the List interface and maintain insertion order. But there are many differences between the ArrayList and Vector classes...
ArrayList -
ArrayList is not synchronized.
ArrayList grows its array by 50% of the current size when the number of elements exceeds its capacity.
ArrayList is not a legacy class; it was introduced in JDK 1.2.
ArrayList is fast because it is non-synchronized.
ArrayList uses the Iterator interface to traverse the elements.
Vector -
Vector is synchronized.
Vector grows by 100%, i.e. it doubles its array size, when the number of elements exceeds its capacity.
Vector is a legacy class.
Vector is slow because it is synchronized, i.e. in a multithreaded environment it keeps other threads waiting until the current thread releases the lock on the object.
Vector uses the Enumeration interface to traverse the elements, but it can use Iterator as well.
See Also : https://www.javatpoint.com/difference-between-arraylist-and-vector
Basically, both ArrayList and Vector use an internal Object array.
ArrayList: The ArrayList class extends AbstractList and implements the List interface and RandomAccess (a marker interface). ArrayList supports dynamic arrays that can grow as needed. It gives us fast iteration over elements.
ArrayList uses an internal Object array with a default initial capacity of 10. When this capacity is exceeded, the array is automatically grown by half of its current capacity, that is, to 15.
Vector: Vector is similar to ArrayList, but the differences are that it is synchronized and that, with its default initial capacity of 10, it doubles its capacity when the size exceeds it, which means the new capacity will be 20. Like ArrayList, Vector implements RandomAccess (other List implementations, such as CopyOnWriteArrayList, do too). Vector has four constructors, one of which takes two parameters: Vector(int initialCapacity, int capacityIncrement). capacityIncrement is the amount by which the capacity is increased when the vector overflows, so it gives you more control over the growth.
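Vector exposes its current capacity through the public capacity() method, so the growth rules described above can be observed directly. VectorGrowthDemo is a hypothetical helper written for this sketch.

```java
import java.util.Vector;

// Hypothetical demo: observes Vector growth via the public capacity() method.
final class VectorGrowthDemo {
    // Adds the given number of elements and reports the resulting capacity.
    static int capacityAfter(Vector<Integer> v, int adds) {
        for (int i = 0; i < adds; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        // Default constructor: capacity 10, doubled when the 11th element arrives.
        System.out.println(capacityAfter(new Vector<>(), 11));      // 20
        // Explicit capacityIncrement of 5: grows from 10 to 15 instead.
        System.out.println(capacityAfter(new Vector<>(10, 5), 11)); // 15
    }
}
```

ArrayList has no public capacity accessor, so its growth cannot be observed in the same way without reflection.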
Some other differences are: