I need to store a large amount of information, say for example 'names', in a Java List. The number of items can change (in short, I cannot predefine the size). I am of the opinion that from a memory allocation perspective LinkedList would be a better option than ArrayList, because once an ArrayList reaches its maximum size the memory allocation automatically doubles, so there would always be a chance of more memory being allocated than is needed.
I understand from other posts here that individual elements stored in a LinkedList take more space than in an ArrayList, as LinkedList also needs to store the node information, but I am still guessing that for the scenario I have defined LinkedList might be a better option. Also, I do not want to get into the performance aspect (fetching, deleting etc.), as much has already been discussed on it.
LinkedList might allocate fewer entries, but those entries are astronomically more expensive than they'd be for ArrayList -- enough that even the worst-case ArrayList is cheaper as far as memory is concerned.
(FYI, I think you've got it wrong; ArrayList grows by 1.5x when it's full, not 2x.)
See e.g. https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt : LinkedList consumes 24 bytes per element, while ArrayList consumes in the best case 4 bytes per element, and in the worst case 6 bytes per element. (Results may vary depending on 32-bit versus 64-bit JVMs, and compressed object pointer options, but in those comparisons LinkedList costs at least 36 bytes/element, and ArrayList is at best 8 and at worst 12.)
UPDATE:
I understand from other posts here that individual elements stored in a LinkedList takes more space than an ArrayList as LinkedList also needs to store the node information, but I am still guessing for the scenario I have defined LinkedList might be a better option. Also, I do not want to get into the performance aspect (fetching, deleting etc) , as much has already been discussed on it.
To be clear, even in the worst case, ArrayList is 4x smaller than a LinkedList with the same elements. The only possible way to make LinkedList win is to deliberately fix the comparison by calling ensureCapacity with a deliberately inflated value, or to remove lots of values from the ArrayList after they've been added.
In short, it's basically impossible to make LinkedList win the memory comparison, and if you care about space, then calling trimToSize() on the ArrayList will instantly make ArrayList win again by a huge margin. Seriously. ArrayList wins.
... but I am still guessing for the scenario I have defined LinkedList might be a better option
Your guess is incorrect.
Once you have got past the initial capacity of the array list, the size of the backing array will be between 1 and 2 references times the number of entries. This is due to the strategy used to grow the backing array.
For a linked list, the nodes occupy AT LEAST 3 references times the number of entries, because each node has a next and prev reference as well as the entry reference. (And in fact, it is more than 3 times, because of the space used by the nodes' object headers and padding. Depending on the JVM and pointer size, it can be as much as 6 times.)
The only situation where a linked list will use less space than an array list is if you badly over-estimate the array list's initial capacity. (And for very small lists ...)
When you think about it, the only real advantage linked lists have over array lists is when you are inserting and removing elements. Even then, it depends on how you do it.
ArrayList uses one reference per object (or two when its backing array is double the size it needs to be). A reference is typically 4 bytes.
LinkedList uses only the nodes it needs, but these can be 24 bytes each.
So even at its worst, ArrayList will be 3x smaller than LinkedList.
For fetching, ArrayList supports random access in O(1), but LinkedList is O(n). For deleting from the end, both are O(1); for deleting from somewhere in the middle, ArrayList is O(n).
Unless you have millions of entries, the size of the collection is unlikely to matter. What will matter first is the size of entries which is the same regardless of the collection used.
Back of the envelope worst-case:
500,000 names in an array sized to 1,000,000 = 500,000 used, 500,000 empty pointers in the unused portion of the allocated array.
500,000 entries in a linked list = 3 pointers per entry (Node object holds current, prev, and next) = 1,500,000 pointers in memory. (Then you have to add the size of the Node itself.)
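The back-of-the-envelope numbers above can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate, not a measurement: the class and method names are made up for illustration, and it counts references only, ignoring object headers and padding.

```java
class ReferenceCountEstimate {
    // Worst case assumed in the estimate above: the backing array has
    // twice as many slots as there are elements.
    static long arrayListSlotsWorstCase(long elements) {
        return 2 * elements;
    }

    // Each doubly linked node holds three references: item, prev and next.
    static long linkedListReferences(long elements) {
        return 3 * elements;
    }

    public static void main(String[] args) {
        long names = 500_000;
        System.out.println("ArrayList slots (worst case): " + arrayListSlotsWorstCase(names));
        System.out.println("LinkedList references:        " + linkedListReferences(names));
    }
}
```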
ArrayList.trimToSize() may satisfy you.
Trims the capacity of this ArrayList instance to be the list's current
size. An application can use this operation to minimize the storage of
an ArrayList instance.
By the way, in the Java 6 ArrayList the capacity is not doubled when the maximum size is reached; it grows to about 1.5 times the old capacity.
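A minimal sketch of how trimToSize() might be used once a list has been fully loaded. The class name, the method name and the generated "name-N" values are placeholders for illustration; the only library call that matters here is ArrayList.trimToSize().

```java
import java.util.ArrayList;

class TrimExample {
    // Fills a list of unknown final size, then releases the unused slots.
    static ArrayList<String> loadNames(int count) {
        ArrayList<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("name-" + i);
        }
        // Copies the elements into an array of exactly size() slots.
        // The contents and the size of the list are unchanged.
        names.trimToSize();
        return names;
    }

    public static void main(String[] args) {
        ArrayList<String> names = loadNames(1000);
        System.out.println(names.size() + " names, last = " + names.get(names.size() - 1));
    }
}
```

Trimming costs one array copy, so it is best done once, after loading has finished, rather than after every add.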
Related
I am currently reading my textbook and I am totally confused why a dynamic array would require O(n) time to delete an item at the end. I understand that deleting an item from any other index is O(n) because you have to copy all the data and move them to fill in the gap, but if it’s at the end don’t we simply just decrement the count and set the index to like 0 or null? I included a picture from my book. It’s weird cause it says indexing is O(1) so we must know where the item is so we don’t have to traverse the array like a linked list.
First, let's look up what the book means by a "Dynamic Array":
Dynamic array (also called as growable array, resizable array,
dynamic table, or array list) is a random access, variable-size list data structure that allows elements to be added or removed.
[...]
Note: We will see the implementation for dynamic array in the Stacks, Queues and Hashing chapters.
From this we learn that array lists are examples of a "Dynamic Array" as the author of the book defines it.
But looking further, the book mentioned that:
As soon as that array becomes full, create the new array of size
double than the original array. Similarly, reduce the array size to
half if the elements in the array are less than half.
(emphasis added by me)
A Java ArrayList doesn't do this: its storage does not shrink when elements are removed. But the author is talking about a dynamic array that does reduce its size (or believes that ArrayList does).
In that case, from a worst-worst-case perspective, you could say that the complexity is O(n) because reducing the size involves copying n elements to the reduced array.
Conclusion:
Although it's not true for Java ArrayList implementations, when the author of this book talks about "dynamic arrays" that "reduce the array size" on deletion when necessary, then the worst-case complexity of a delete at the end of the array is indeed O(n).
That entry seems like it's either
incorrect, or
true, but misleading.
You are absolutely right that you can just destroy the object at the final position in a dynamic array and then decrement the size to remove the last element. In many implementations of dynamic arrays, you'll sometimes need to perform resize operations to make sure that the size of the allocated array is within some constant factor of the number of elements. If that happens, then yes, you'll need to make a new array, copy over the old elements, and free the previous array, which does take time O(n). However, you can show that these resizes are sufficiently infrequent that the average cost of doing a remove from the end is O(1). (In a more technical sense, we say that the amortized cost of removing an element from the end is O(1)). That is, as long as you care only about the total cost of performing a series of operations rather than the cost of any individual operation, you would not be wrong to just pretend each operation costs you time O(1).
I'd say that you're most likely looking at a typo in the materials. Looking at their entry for appending to the end, which differentiates between the not-full and full cases, I think this is likely a copy/paste error. Following the table's lead, that should say something to the effect of "O(1) if the array is not 'too empty,' O(n) otherwise." But again, keep in mind that the amortized efficiency of each operation is O(1), meaning that these scary O(n) terms aren't actually likely to burn you in practice unless you are in a specialized environment where each operation needs to work really quickly.
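The amortized argument can be checked with a toy dynamic array that counts how many elements its resizes copy. This is a sketch under stated assumptions, not how ArrayList works: the class is hypothetical, it doubles when full, and it halves only when a quarter full (a common choice that avoids repeated grow/shrink cycles, and different from the "less than half" rule quoted from the book).

```java
class ShrinkingArray {
    private static final int MIN_CAPACITY = 4;

    private Object[] data = new Object[MIN_CAPACITY];
    private int size = 0;
    private long copies = 0; // total elements copied by all resizes so far

    void addLast(Object value) {
        if (size == data.length) {
            resize(data.length * 2); // O(n) step, but it happens rarely
        }
        data[size++] = value;
    }

    Object removeLast() {
        if (size == 0) {
            throw new IllegalStateException("array is empty");
        }
        Object value = data[--size];
        data[size] = null; // drop the reference so it can be collected
        if (data.length > MIN_CAPACITY && size <= data.length / 4) {
            resize(data.length / 2); // O(n) step, but it happens rarely
        }
        return value;
    }

    private void resize(int newCapacity) {
        Object[] replacement = new Object[newCapacity];
        System.arraycopy(data, 0, replacement, 0, size);
        copies += size;
        data = replacement;
    }

    int size() { return size; }
    int capacity() { return data.length; }
    long copies() { return copies; }

    public static void main(String[] args) {
        ShrinkingArray a = new ShrinkingArray();
        int n = 1000;
        for (int i = 0; i < n; i++) a.addLast(i);
        for (int i = 0; i < n; i++) a.removeLast();
        System.out.println(2 * n + " operations copied " + a.copies() + " elements in total");
    }
}
```

The total number of copied elements stays within a small constant multiple of the number of operations, which is what "amortized O(1)" means here, even though an individual removeLast() call occasionally costs O(n).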
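In Java, deleting the last element of a dynamic array (ArrayList) is O(1), because no array copy takes place.
ArrayList.remove(int index) checks whether the index is the last one before copying anything:
int numMoved = size - index - 1;   // number of elements to the right of index
if (numMoved > 0)
    // shift the trailing elements left by one slot; skipped when removing the last element
    System.arraycopy(elementData, index + 1, elementData, index, numMoved);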
Insertion and deletion are operations that we generally do not perform on arrays, because they have a fixed length by nature. You cannot increase or decrease the length of something which is fixed by its nature.
When people speak of "dynamic arrays" in Java they tend to mean using class ArrayList, which is backed by an array, and it provides the illusion of the ability to insert and remove elements.
But in order for some piece of software to provide the illusion of performing insertions and deletions on an array, each time (or almost each time, there are optimizations possible) it has to allocate a new array of the new desired length, and copy the old array into it, skipping the removed element or adding the inserted element as the case may be. That array copy is where the O(N) comes from.
And, due to the optimizations performed by ArrayList, the table that you posted is not accurate: it should rather say 'O(1) if the array has not shrunk by much, O(N) if the array has shrunk by so much that reallocation is deemed necessary'. But I guess that would have been too long to fit in the table cell.
As you have mentioned, this could be confusing. If you add an element to a full dynamic array, it changes its size by a constant interval: it creates a new array and copies the elements to the new array, as you may already know. And when elements are removed, it will also shrink if needed.
For example, if the interval is 4, adding the 1st, 2nd, 3rd and 4th elements is fine, but when you add the 5th item the dynamic array will grow into an 8-element array and copy all the elements to the new array.
It is the same when it is decreasing. If you remove one item from a 5-item array which has an interval of 4, the dynamic array will create a new 4-element array and copy the elements.
Yes. When the dynamic array does not have to shrink, removing the element takes O(1), but when it has to shrink it is O(n), as you may have already figured out.
When you give the big O notation you are describing the worst case, so it is O(n).
As far as I think
As it is a dynamic array, the computer system does not know the current length of the array, so finding the length takes O(n) time and deleting the element at the end then takes O(1) time.
Deleting an item from a dynamic array (ArrayList in Java) requires searching for the item and then deleting it.
If the element is at the end of the list, then the search itself will take n steps. I hope this makes sense to you.
You can view source at http://www.docjar.com/html/api/java/util/ArrayList.java.html
Generally, they say that we have moved from Array to ArrayList for the following reason:
Arrays are fixed size, whereas ArrayLists are not.
One of the disadvantages of ArrayList is:
When it reaches its capacity, the ArrayList grows to 3/2 of its current size. As a result, memory can be wasted if we do not utilize the space properly. In this scenario, arrays are preferred.
If we use ArrayList.trimToSize(), will that make ArrayList the unanimous choice, eliminating the only advantage (fixed size) an array has over it?
One short answer would be: trimToSize doesn't solve everything, because shrinking an array after it has grown - is not the same as preventing growth in the first place; the former has the cost of copying + garbage collection.
The longer answer would be: int[] is low level, ArrayList is high level, which means it's more convenient but gives you less control over the details. Thus in business-oriented code (e.g. manipulating a short list of "Products") I'll prefer ArrayList so that I can forget about the technicalities and focus on the business. In mathematically-oriented code I'll probably go for int[].
There are additional subtle differences, but I'm not sure how relevant they are to you. E.g. concurrency: if you change the data of an ArrayList from several threads simultaneously, it will intentionally fail, because that's the intuitive requirement for most business code. An int[] will allow you to do whatever you want, leaving it up to you to make sure it makes sense. Again, this can all be summarized as "low level"...
If you are developing an extremely memory-critical application, need resizability as well, and performance can be traded off, then trimming the array list is your best bet. This is the only case in which an array list with trimming is the unanimous choice.
In other situations, what you are actually doing is:
1. You have created an array list. The default capacity of the list is 10.
2. You added an element and applied the trim operation, so both size and capacity are now 1. How does trimToSize work? It basically creates a new array with the actual size of the list and copies the old array data to the new array. The old array is left for garbage collection.
3. You added another element. Since the list is full, it will be reallocated with 50% more space. Again, a procedure similar to step 2 is followed.
4. You call trimToSize again and it follows the same procedure as step 2.
5. Things repeat...
So you see, we are incurring a lot of performance overhead just to keep the list capacity and size the same. Fixed size is not offering you anything advantageous here except saving a few extra slots, which is hardly an issue on modern machines.
In a nutshell, if you want resizability without writing lots of boilerplate code, then the array list is the unanimous choice. But if the size never changes and you don't need any dynamic function such as removal, then an array is the better choice. A few extra bytes are hardly an issue.
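To get a feel for how rarely an untrimmed list reallocates, the growth sequence can be simulated. This sketch assumes an initial capacity of 10 and the growth step capacity + (capacity >> 1), i.e. roughly 1.5x as described in this thread; the exact formula is an implementation detail that differs between JDK versions, and the class and method names are hypothetical.

```java
class GrowthSimulation {
    // Assumed growth step: add half of the current capacity (about 1.5x).
    static int nextCapacity(int capacity) {
        return capacity + (capacity >> 1);
    }

    // Number of reallocations needed before 'elements' items fit,
    // starting from the default capacity of 10 and never trimming.
    static int reallocationsFor(int elements) {
        int capacity = 10;
        int reallocations = 0;
        while (capacity < elements) {
            capacity = nextCapacity(capacity);
            reallocations++;
        }
        return reallocations;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1_000, 1_000_000}) {
            System.out.println(n + " elements -> " + reallocationsFor(n) + " reallocations");
        }
    }
}
```

Under these assumptions, growing on demand reallocates only a logarithmic number of times, whereas trimming after every add forces a reallocation on every single add.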
What is the maximum size of HashSet, Vector, LinkedList? I know that ArrayList can store more than 3277000 numbers.
However the size of list depends on the memory (heap) size. If it reaches maximum the JDK throws an OutOfMemoryError.
But I don't know the limit for the number of elements in HashSet, Vector and LinkedList.
There is no specified maximum size of these structures.
The actual practical size limit is probably somewhere in the region of Integer.MAX_VALUE (i.e. 2147483647, roughly 2 billion elements), as that's the maximum size of an array in Java.
A HashSet uses a HashMap internally, so it has the same maximum size as that
A HashMap uses an array which always has a size that is a power of two, so it can be at most 2^30 = 1073741824 elements big (since the next power of two is bigger than Integer.MAX_VALUE).
Normally the number of elements is at most the number of buckets multiplied by the load factor (0.75 by default). However, when the HashMap stops resizing, then it will still allow you to add elements, exploiting the fact that each bucket is managed via a linked list. Therefore the only limit for elements in a HashMap/HashSet is memory.
A Vector uses an array internally which has a maximum size of exactly Integer.MAX_VALUE, so it can't support more than that many elements
A LinkedList doesn't use an array as the underlying storage, so that doesn't limit the size. It uses a classical doubly linked list structure with no inherent limit, so its size is only bounded by the available memory. Note that a LinkedList will report the size wrongly if it is bigger than Integer.MAX_VALUE, because it uses a int field to store the size and the return type of size() is int as well.
Note that the Collection API does define how a Collection with more than Integer.MAX_VALUE elements should behave. Most importantly, it states this in the size() documentation:
If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.
Note that while HashMap, HashSet and LinkedList seem to support more than Integer.MAX_VALUE elements, none of those implement the size() method in this way (i.e. they simply let the internal size field overflow).
This leads me to believe that other operations also aren't well-defined in this condition.
So I'd say it's safe to use those general-purpose collections with up to Integer.MAX_VALUE elements. If you know that you'll need to store more than that, then you should switch to dedicated collection implementations that actually support this.
In all cases, you're likely to be limited by the JVM heap size rather than anything else. Eventually you'll always get down to arrays, so I very much doubt that any of them will manage more than 2^31 - 1 elements, but you're very, very likely to run out of heap before then anyway.
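The two int limits mentioned above (the largest power-of-two table size, and a size counter that is simply allowed to overflow) come down to plain int arithmetic. A small sketch, with class and method names invented for illustration; it does not touch any real collection:

```java
class IntLimits {
    // Largest power of two that still fits in a positive int.
    static final int MAX_POWER_OF_TWO = 1 << 30;

    // Doubling the largest power of two overflows instead of reaching 2^31.
    static int doubledMaxPowerOfTwo() {
        return MAX_POWER_OF_TWO << 1;
    }

    // What an unchecked int size counter does when one more element
    // is counted after Integer.MAX_VALUE.
    static int sizeCounterAfterOneMoreAdd() {
        int size = Integer.MAX_VALUE;
        size++;
        return size;
    }

    public static void main(String[] args) {
        System.out.println("2^30              = " + MAX_POWER_OF_TWO);
        System.out.println("2^30 doubled      = " + doubledMaxPowerOfTwo());
        System.out.println("MAX_VALUE + 1 add = " + sizeCounterAfterOneMoreAdd());
    }
}
```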
It very much depends on the implementation details.
A HashSet uses an array as its underlying store, which by default it attempts to grow when the collection is 75% full. This means it will fail if you try to add more than about 750,000,000 entries. (It cannot grow the array from 2^30 to 2^31 entries.)
Increasing the load factor increases the maximum size of the collection. e.g. a load factor of 10 allows 10 billion elements. (It is worth noting that HashSet is relatively inefficient past 100 million elements as the distribution of the 32-bit hashcode starts to look less random, and the number of collisions increases)
A Vector doubles its capacity and starts at 10. This means it will fail to grow above approx 1.34 billion. Changing the initial size to 2^n-1 gives you slightly more head room.
BTW: Use ArrayList rather than Vector if you can.
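The "approx 1.34 billion" figure follows from repeated doubling. The sketch below only reproduces that arithmetic; the class and method names are hypothetical, and some JDK versions may clamp the capacity near Integer.MAX_VALUE rather than fail, so treat it as an illustration of the reasoning and not of Vector's actual behaviour.

```java
class DoublingLimit {
    // Largest capacity reachable from 'initialCapacity' by doubling
    // without exceeding Integer.MAX_VALUE.
    static long largestDoubledCapacity(long initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        long capacity = initialCapacity;
        while (capacity * 2 <= Integer.MAX_VALUE) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println("start at 10:   " + largestDoubledCapacity(10));
        System.out.println("start at 1023: " + largestDoubledCapacity(1023));
    }
}
```

Starting from 10 the doubling stops at 10 * 2^27, while an initial capacity of the form 2^n - 1 (1023 here) ends up much closer to Integer.MAX_VALUE.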
A LinkedList has no inherent limit and can grow beyond 2.1 billion. At this point size() could return Integer.MAX_VALUE; however, some functions such as toArray will fail, as they cannot put all objects into an array. It will instead give you the first Integer.MAX_VALUE elements rather than throw an exception.
As @Joachim Sauer notes, the current OpenJDK could return an incorrect result for sizes above Integer.MAX_VALUE, e.g. it could be a negative number.
The maximum size depends on the memory settings of the JVM and of course the available system memory. Specific size of memory consumption per list entry also differs between platforms, so the easiest way might be to run simple tests.
As stated in other answers, an array cannot reach 2^31 entries. Other data types are limited either by this or they will likely misreport their size() eventually. However, these theoretical limits cannot be reached on some systems:
On a 32 bit system, the number of bytes available never exceeds 2^32 exactly. And that is assuming that you have no operating system taking up memory. A 32 bit pointer is 4 bytes. Anything which does not rely on arrays must include at least one pointer per entry: this means that the maximum number of entries is 2^32/4 or 2^30 for things that do not utilize arrays.
A plain array can achieve its theoretical limit, but only a byte array; a short array of length 2^31-1 would use up about 2^32+38 bytes.
Some Java VMs have introduced a new memory model that uses compressed pointers. By adjusting pointer alignment, slightly more than 2^32 bytes may be referenced with 32-bit pointers. Around four times more. This is enough to cause a LinkedList size() to become negative, but not enough to allow it to wrap around to zero.
A 64-bit system has 64-bit pointers, making all pointers twice as big and making non-array lists a bunch fatter. This also means that the maximum capacity supported jumps to 2^64 bytes exactly. This is enough for a 2D array to reach its theoretical maximum. byte[0x7fffffff][0x7fffffff] uses memory approximately equal to 40 + 40*(2^31-1) + (2^31-1)*(2^31-1) = 40 + 40*(2^31-1) + (2^62 - 2^32 + 1)
My first question is I want to select 100000 elements from database,can list store that many elements?
My second question is I want to fetch all the elements from database in minimum time?Is list is the best way to store elements or is there any other way which can improve performance?
1) Yes, a list can store 100000+ elements. The maximum capacity of a List is limited only by the amount of memory the JVM has available.
2) For performance, it depends on the type of data to be stored. Normally HashMaps are used for data from databases.
I normally use lists with far more elements than that, and a list is a good way to do it. If you store strings it works really well, but what about primitive types?
You can store Integer.MAX_VALUE elements in a List, I suspect, since the index cannot accept a value larger than this.
A List can store more than 100000 elements. The list capacity is only bound by the JVM memory capacity or Integer.MAX_VALUE, whichever is less.
However, if you know the number of elements that will be retrieved, then using a simple array gives far better performance.
The maximum size of a List is limited by the maximum value of a Java integer, because integers are used to index the list and to return the size of the list in the method int size();. The maximum value of an int in Java is Integer.MAX_VALUE which is 2147483647.
A particular implementation of List could have a lower limit, but for java.util.ArrayList, that is the limit.
Of course you could run out of memory long before that, that really depends on the memory of your computer and whether you are using a 64-bit version of the JVM or the 32-bit version.
For your second question: the time it takes to transfer data from the database is almost always far higher than the time taken to store the data in the memory of the computer, so if you only worry about the time it takes to store the data in the list, then you should not worry.
If you however are thinking about the time it takes to retrieve the data, then it really depends on how you are retrieving the data from the collection (using a particular key for example).
In many cases, an implementation of java.util.Map such as java.util.HashMap will have better performance when you are retrieving data by a particular key.
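A short sketch of the difference between scanning a list and looking up by key in a HashMap. The Customer class, its fields and the sample rows are invented for illustration; in practice the rows would come from the database query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyLookupExample {
    // Hypothetical row type standing in for whatever the query returns.
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static List<Customer> sample() {
        List<Customer> rows = new ArrayList<>();
        rows.add(new Customer(1, "Ada"));
        rows.add(new Customer(2, "Grace"));
        rows.add(new Customer(3, "Linus"));
        return rows;
    }

    // List lookup by key: walks the list, O(n) per lookup.
    static Customer findByScan(List<Customer> rows, int id) {
        for (Customer c : rows) {
            if (c.id == id) {
                return c;
            }
        }
        return null;
    }

    // Build the index once; each later lookup is expected O(1).
    static Map<Integer, Customer> indexById(List<Customer> rows) {
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : rows) {
            byId.put(c.id, c);
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Customer> rows = sample();
        System.out.println(findByScan(rows, 2).name);
        System.out.println(indexById(rows).get(2).name);
    }
}
```

If the data is only iterated once from start to finish, the plain list is enough and the map buys nothing.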
can list store that many elements?
Many implementations of java.util.List do not restrict the number of elements, i.e. the number of elements is only limited by available heap memory.
The most commonly used List implementation, ArrayList, is limited to about 2 billion elements (Integer.MAX_VALUE), because that is the maximum length of a Java array.
Other List implementations, such as the Lists returned by Arrays.asList(), Collections.emptyList(), or Collections.singletonList(), have a fixed size, and can not be added to.
Is list is the best way to store elements or is there any other way which can improve performance?
If all you need is to store the elements for later iteration, an ArrayList is probably the best choice, but compared to the cost of communicating with a database, the overhead of any collection implementation will be insignificant, as the database will generally have to perform disk I/O, which is far slower than writing the data to main memory, and writing the actual data (the objects in the list) will take longer than writing the Collection itself.
I want to select 100000 elements from database,can list store that many elements?
Yes. There is an upper limit on the size of an ArrayList (2^31) ... but you are a long way off that. And some other List implementations don't have that limit.
I want to fetch all the elements from database in minimum time? Is list is the best way to store elements or is there any other way which can improve performance?
Most of the CPU time will be spent performing the query and reading from the resultset rather than appending to the list.
The performance of the collection will depend on the element type (object or primitive), and on whether or not you know how many elements there will be. A bare array will give you the best performance if you know the element count beforehand, and an ArrayList if you don't know [1]. For the case of primitive types, consider using the "trove" list type instead to avoid the overhead of using primitive type wrappers.
[1] That is ... unless you are prepared to implement an ArrayList-like expansion algorithm for your array-based collection.
I want to figure out something.
Is this:
List<String> list = new ArrayList<String>();
list.add("abc");
list.add(null);
equal to this
List<String> list = new ArrayList<String>();
list.add("abc");
in memory usage?
Thanks!
The initial capacity of an ArrayList is ten (references). That is to say, the underlying array will be of size ten, even if you've only got one entry in your collection. Those references will default to null, and consequently setting the second reference to null will neither affect the internal state of the arraylist (in terms of the underlying array) nor its memory consumption.
If you'd added an eleventh item (set to null), the ArrayList would expand its capacity, and consequently you'd consume more memory, but rather because the ArrayList had created extra buckets for your String references.
From the doc linked above:
Each ArrayList instance has a capacity. The capacity is the size of
the array used to store the elements in the list. It is always at
least as large as the list size. As elements are added to an
ArrayList, its capacity grows automatically. The details of the growth
policy are not specified beyond the fact that adding an element has
constant amortized time cost.
In this case yes, as ArrayList allocates an array of 10 positions by default (in openjdk).
If you used LinkedList instead, then the answer would be no.
Maybe. The first list has two elements and the second list has one element, so the first list is larger. However, most lists default to a capacity of 10 elements, so the two lists may be the same size in memory; if the second add required the list to be expanded, then it would take more memory.
Well, "it depends".
With the code you have there, running on the Sun JVM, and the way that they've implemented ArrayList, the answer is "yes", they're equal.
This is because by default, an ArrayList starts off with an array of 10 elements. Odds are that most implementations do the same thing.
But if you had an ArrayList with 10 elements, and another with 11, whether the elements are null or not, the 11 element one would consume more RAM because the internal array used to track the elements would have expanded.
Finally, if you were using different List implementation, such as a LinkedList, then the two lists would consume different amounts of memory, as the LinkedList doesn't pre-allocate anything and uses a node wrapper for its elements.
Edit: Changed my mind.
As many people are stating, yes, they will take up the same amount of memory at first; once you add enough nulls to exceed the initial capacity, your memory usage will increase.
I originally thought that because you can access the null in the first list that it would lead to more memory, however the null values are there in both cases, and the ability to access them does not affect memory allocation.
In this specific case, using an ArrayList with two elements they do.
The initial capacity for an ArrayList is 10 elements, so as long as you add 10 or fewer elements the size of the ArrayList itself is the same (size in memory, not size()). When you add the 11th element, the ArrayList will have to grow the internal storage and thus take more memory.
No, both are not equal.
list [abc,null]
Another list stored
list [abc]
so both are not equal
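The answers above use "equal" in two different senses: equal contents versus equal memory footprint. A small sketch of the first sense, with hypothetical class and method names. It can only show the observable difference (size() and equals()); the backing array's capacity is not exposed by the public API, so the memory claim is not demonstrated here.

```java
import java.util.ArrayList;
import java.util.List;

class NullEntryExample {
    static List<String> withNull() {
        List<String> list = new ArrayList<String>();
        list.add("abc");
        list.add(null); // null is stored as a real element
        return list;
    }

    static List<String> withoutNull() {
        List<String> list = new ArrayList<String>();
        list.add("abc");
        return list;
    }

    public static void main(String[] args) {
        System.out.println("sizes: " + withNull().size() + " vs " + withoutNull().size());
        System.out.println("equals: " + withNull().equals(withoutNull()));
    }
}
```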