Difference between ArrayList.trimToSize() and Array?

It is often said that we moved from arrays to ArrayList for the following reason:
Arrays have a fixed size, whereas ArrayLists do not.
One of the disadvantages of ArrayList is:
When it reaches its capacity, an ArrayList grows to 3/2 of its current size. As a result, memory can be wasted if we do not use the space properly. In this scenario, arrays are preferred.
If we use ArrayList.trimToSize(), does that make ArrayList the unanimous choice, eliminating the only advantage (fixed size) that arrays have over it?

One short answer would be: trimToSize doesn't solve everything, because shrinking an array after it has grown is not the same as preventing growth in the first place; the former has the cost of copying plus garbage collection.
The longer answer would be: int[] is low level and ArrayList is high level, which means it is more convenient but gives you less control over the details. Thus in business-oriented code (e.g. manipulating a short list of "Products") I'll prefer ArrayList so that I can forget about the technicalities and focus on the business. In mathematically-oriented code I'll probably go for int[].
There are additional subtle differences, but I'm not sure how relevant they are to you. E.g. concurrency: if you change the data of an ArrayList while it is being iterated, for instance from several threads simultaneously, it will usually fail on purpose (its iterators are fail-fast and throw ConcurrentModificationException on a best-effort basis), because that's the intuitive requirement for most business code. An int[] will allow you to do whatever you want, leaving it up to you to make sure it makes sense. Again, this can all be summarized as "low level"...
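The intentional failure mentioned above can be observed even without threads. The following is a minimal sketch, assuming a single-threaded loop to keep the outcome deterministic; the class and method names are made up for illustration. It adds to an ArrayList while iterating over it, which the list's iterator detects and rejects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not taken from the original answer.
class FailFastDemo {
    // Returns true if the ArrayList iterator rejects a structural change
    // (an add) made while the list is being iterated.
    static boolean arrayListFailsFast() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                list.add(value * 10); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList failed fast: " + arrayListFailsFast());
    }
}
```

With several threads the same check is only best effort, so it should be treated as a debugging aid rather than a synchronization mechanism.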

If you are developing an extremely memory-critical application, also need resizability, and can trade off performance, then trimming the array list is your best bet. This is the only case in which an array list with trimming is the unanimous choice.
In other situations, what you are actually doing is:
1. You create an array list. The default capacity of the list is 10.
2. You add an element and apply the trim operation, so both size and capacity are now 1. How does trimToSize work? It basically creates a new array with the actual size of the list and copies the old array's data into it. The old array is left for garbage collection.
3. You add another element. Since the list is full, it is reallocated with 50% more space. Again, a procedure similar to step 2 is followed.
4. You call trimToSize again, and it follows the same procedure as step 2.
5. This repeats...
So you see, we incur a lot of performance overhead just to keep the list's capacity and size the same. A fixed size offers no advantage here except saving a little extra space, which is hardly an issue on modern machines.
In a nutshell, if you want resizability without writing lots of boilerplate code, then an array list is the unanimous choice. But if the size never changes and you don't need any dynamic function such as removal, then an array is the better choice. A few extra bytes are hardly an issue.
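If trimming is wanted at all, one way to avoid the repeated copy cycle described above is to trim a single time after the list has been filled. This is a minimal sketch under that assumption; the class name, the load method and the generated names are hypothetical.

```java
import java.util.ArrayList;

// Hypothetical demo class, not taken from the original answer.
class TrimOnceDemo {
    // Fills a list and trims it once at the end, so the extra copy made by
    // trimToSize() happens a single time instead of after every add.
    static ArrayList<String> load(int count) {
        ArrayList<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("name-" + i);
        }
        names.trimToSize(); // capacity now matches size; later adds will grow it again
        return names;
    }

    public static void main(String[] args) {
        ArrayList<String> names = load(1000);
        System.out.println("elements stored: " + names.size());
    }
}
```

This only pays off if the list is rarely modified afterwards; any later add makes the list grow again.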

Related

Why is deletion of an item at end of Dynamic array O(n) time complexity?

I am currently reading my textbook and I am totally confused why a dynamic array would require O(n) time to delete an item at the end. I understand that deleting an item from any other index is O(n) because you have to copy all the data and move them to fill in the gap, but if it’s at the end don’t we simply just decrement the count and set the index to like 0 or null? I included a picture from my book. It’s weird cause it says indexing is O(1) so we must know where the item is so we don’t have to traverse the array like a linked list.

First, let's look up what the book means by a "Dynamic Array":
Dynamic array (also called as growable array, resizable array,
dynamic table, or array list) is a random access, variable-size list data structure that allows elements to be added or removed.
[...]
Note: We will see the implementation for dynamic array in the Stacks, Queues and Hashing chapters.
From this we learn that array lists are examples of a "Dynamic Array" as the author of the book defines it.
But looking further, the book mentioned that:
As soon as that array becomes full, create the new array of size
double than the original array. Similarly, reduce the array size to
half if the elements in the array are less than half.
(emphasis added by me)
A Java ArrayList doesn't do this: it doesn't reduce its storage when elements are removed. But the author is talking about a structure that does reduce the array size (or believes that ArrayList does).
In that case, from a strict worst-case perspective, you could say that the complexity is O(n), because reducing the size involves copying n elements to the reduced array.
Conclusion:
Although it's not true for Java ArrayList implementations, when the author of this book talks about "dynamic arrays" that "reduce the array size" on deletion when necessary, then the worst-case complexity of a delete at the end of the array is indeed O(n).

That entry seems like it's either
incorrect, or
true, but misleading.
You are absolutely right that you can just destroy the object at the final position in a dynamic array and then decrement the size to remove the last element. In many implementations of dynamic arrays, you'll sometimes need to perform resize operations to make sure that the size of the allocated array is within some constant factor of the number of elements. If that happens, then yes, you'll need to make a new array, copy over the old elements, and free the previous array, which does take time O(n). However, you can show that these resizes are sufficiently infrequent that the average cost of doing a remove from the end is O(1). (In a more technical sense, we say that the amortized cost of removing an element from the end is O(1)). That is, as long as you care only about the total cost of performing a series of operations rather than the cost of any individual operation, you would not be wrong to just pretend each operation costs you time O(1).
I'd say that you're most likely looking at a typo in the materials. Looking at their entry for appending to the end, which differentiates between the not-full and full cases, I think this is likely a copy/paste error. Following the table's lead, that should say something to the effect of "O(1) if the array is not 'too empty,' O(n) otherwise." But again, keep in mind that the amortized efficiency of each operation is O(1), meaning that these scary O(n) terms aren't actually likely to burn you in practice unless you are in a specialized environment where each operation needs to work really quickly.
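The amortized argument can be checked with a small experiment. The sketch below is an illustration, not the book's implementation: it doubles when full and, as an assumption chosen to avoid repeated resizing near a boundary, halves only when the array is a quarter full. The class name, the initial capacity of 4 and the copy counter are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical growable int array that counts how many elements resizing copies.
class DynamicIntArray {
    private static final int MIN_CAPACITY = 4;
    private int[] data = new int[MIN_CAPACITY];
    private int size = 0;
    private long copies = 0; // total elements copied by all resizes so far

    void addLast(int value) {
        if (size == data.length) {
            resize(data.length * 2); // O(n) step, but only when the array is full
        }
        data[size++] = value;
    }

    int removeLast() {
        if (size == 0) {
            throw new IllegalStateException("array is empty");
        }
        int value = data[--size]; // the usual O(1) path: just decrement the count
        if (data.length > MIN_CAPACITY && size <= data.length / 4) {
            resize(data.length / 2); // O(n) step, but only when mostly empty
        }
        return value;
    }

    private void resize(int newCapacity) {
        data = Arrays.copyOf(data, newCapacity);
        copies += size;
    }

    int size() { return size; }
    int capacity() { return data.length; }
    long copies() { return copies; }

    public static void main(String[] args) {
        DynamicIntArray array = new DynamicIntArray();
        int n = 1000;
        for (int i = 0; i < n; i++) array.addLast(i);
        for (int i = 0; i < n; i++) array.removeLast();
        System.out.println("operations: " + (2 * n) + ", elements copied: " + array.copies());
    }
}
```

The number of copied elements stays proportional to the number of operations, which is what an amortized cost of O(1) per operation means, even though an individual removeLast can still take O(n).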

In Java, for a dynamic array (ArrayList), deleting the last element is O(1), because no array elements are copied.
ArrayList checks whether the index is the last one:
int numMoved = size - index - 1;
if (numMoved > 0)
    // copy the array elements after index one slot to the left
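To make the check quoted above concrete, here is a simplified sketch of removal from a backing array. It is modelled on the quoted fragment but is not the JDK source; the class name, method name and return value are assumptions made for illustration.

```java
// Hypothetical simplified removal, loosely modelled on the fragment above.
class RemoveSketch {
    // Removes the element at index from the first `size` slots of elementData
    // and returns how many elements had to be shifted.
    static int removeAt(Object[] elementData, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Shift the tail one slot to the left; this is the O(n) part.
            System.arraycopy(elementData, index + 1, elementData, index, numMoved);
        }
        elementData[size - 1] = null; // clear the freed slot so it can be collected
        return numMoved;
    }

    public static void main(String[] args) {
        Object[] items = {"a", "b", "c", "d", null};
        System.out.println("shifted when removing last: " + removeAt(items, 4, 3));
    }
}
```

Removing the last element gives numMoved == 0, so nothing is copied and the backing array is not reallocated.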

Insertion and deletion are operations that we generally do not perform on arrays, because they have a fixed length by nature. You cannot increase or decrease the length of something which is fixed by its nature.
When people speak of "dynamic arrays" in Java they tend to mean using class ArrayList, which is backed by an array, and it provides the illusion of the ability to insert and remove elements.
But in order for some piece of software to provide the illusion of performing insertions and deletions on an array, each time (or almost each time, there are optimizations possible) it has to allocate a new array of the new desired length, and copy the old array into it, skipping the removed element or adding the inserted element as the case may be. That array copy is where the O(N) comes from.
And, due to the optimizations performed by ArrayList, the table that you posted is not accurate: it should rather say 'O(1) if the array has not shrunk by much, O(N) if the array has shrunk by so much that reallocation is deemed necessary'. But I guess that would have been too long to fit in the table cell.

As you mentioned, this can be confusing. When you add an element to a full dynamic array, it grows by a constant interval: it creates a new array and copies the elements into it, as you may already know. When elements are removed, the array also shrinks if needed.
For example, if the interval is 4, adding the 1st, 2nd, 3rd and 4th elements is fine, but when you add the 5th item the dynamic array grows into an 8-element array and copies all the elements to the new array.
The same happens when it shrinks. If you remove one item from a 5-item array with an interval of 4, the dynamic array creates a new 4-element array and copies the elements.
Yes. When the dynamic array does not have to shrink, removing the element takes O(1), but when it has to shrink it is O(n), as you may have already figured out.
Big O notation is used here to describe the worst case, so it is O(n).

As far as I can tell:
Because it is a dynamic array, the computer system does not know its current length, so finding the length takes O(n) time, and deleting the element at the end then takes O(1) time.

Deleting an item from a dynamic array (ArrayList in Java) requires searching for the item and then deleting it.
If the element is at the end of the list, then the search itself takes n computation steps. I hope this makes sense to you.
You can view source at http://www.docjar.com/html/api/java/util/ArrayList.java.html

ArrayList vs LinkedList from memory allocation perspective

I need to store a large amount of information, say for example 'names' in a java List. The number of items can change (or in short I cannot predefine the size). I am of the opinion that from a memory allocation perspective LinkedList would be a better option than ArrayList, as for an ArrayList once the max size is reached, automatically the memory allocation doubles and hence there would always be a chance of more memory being allocated than what is needed.
I understand from other posts here that individual elements stored in a LinkedList takes more space than an ArrayList as LinkedList also needs to store the node information, but I am still guessing for the scenario I have defined LinkedList might be a better option. Also, I do not want to get into the performance aspect (fetching, deleting etc) , as much has already been discussed on it.

LinkedList might allocate fewer entries, but those entries are astronomically more expensive than they'd be for ArrayList -- enough that even the worst-case ArrayList is cheaper as far as memory is concerned.
(FYI, I think you've got it wrong; ArrayList grows by 1.5x when it's full, not 2x.)
See e.g. https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt : LinkedList consumes 24 bytes per element, while ArrayList consumes in the best case 4 bytes per element, and in the worst case 6 bytes per element. (Results may vary depending on 32-bit versus 64-bit JVMs, and compressed object pointer options, but in those comparisons LinkedList costs at least 36 bytes/element, and ArrayList is at best 8 and at worst 12.)
UPDATE:
I understand from other posts here that individual elements stored in a LinkedList takes more space than an ArrayList as LinkedList also needs to store the node information, but I am still guessing for the scenario I have defined LinkedList might be a better option. Also, I do not want to get into the performance aspect (fetching, deleting etc) , as much has already been discussed on it.
To be clear, even in the worst case, ArrayList is 4x smaller than a LinkedList with the same elements. The only possible way to make LinkedList win is to deliberately fix the comparison by calling ensureCapacity with a deliberately inflated value, or to remove lots of values from the ArrayList after they've been added.
In short, it's basically impossible to make LinkedList win the memory comparison, and if you care about space, then calling trimToSize() on the ArrayList will instantly make ArrayList win again by a huge margin. Seriously. ArrayList wins.

... but I am still guessing for the scenario I have defined LinkedList might be a better option
Your guess is incorrect.
Once you have got past the initial capacity of the array list, the size of the backing array will be between 1 and 2 references times the number of entries. This is due to the strategy used to grow the backing array.
For a linked list, the nodes occupy AT LEAST 3 references times the number of entries, because each node has a next and prev reference as well as the entry reference. (And in fact, it is more than 3 times, because of the space used by the nodes' object headers and padding. Depending on the JVM and pointer size, it can be as much as 6 times.)
The only situation where a linked list will use less space than an array list is if you badly over-estimate the array list's initial capacity. (And for very small lists ...)
When you think about it, the only real advantage linked lists have over array lists is when you are inserting and removing elements. Even then, it depends on how you do it.

ArrayList uses one reference per object (or two when it is double the size it needs to be). A reference is typically 4 bytes.
LinkedList uses only the nodes it needs, but these can be 24 bytes each.
So even at its worst, ArrayList will be 3x smaller than LinkedList.
For fetching, ArrayList supports random access in O(1), but LinkedList is O(n). For deleting from the end, both are O(1); for deleting from somewhere in the middle, ArrayList is O(n).
Unless you have millions of entries, the size of the collection is unlikely to matter. What will matter first is the size of the entries, which is the same regardless of the collection used.

Back of the envelope worst-case:
500,000 names in an array sized to 1,000,000 = 500,000 used pointers plus 500,000 empty pointers in the unused portion of the allocated array.
500,000 entries in a linked list = 3 pointers per entry (the Node object holds current, prev, and next) = 1,500,000 pointers in memory. (Then you have to add the size of the Node itself.)
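The same back-of-the-envelope count can be written down as arithmetic. This is only a sketch of the estimate above: it counts references and ignores object headers, padding and the stored names themselves. The class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the pointer counts from the estimate above.
class PointerEstimate {
    // An array-backed list holds one reference slot per unit of capacity,
    // whether or not the slot is in use.
    static long arrayListReferences(long capacity) {
        return capacity;
    }

    // A doubly linked list holds three references per entry: item, prev and next.
    static long linkedListReferences(long entries) {
        return 3 * entries;
    }

    public static void main(String[] args) {
        System.out.println("array list references:  " + arrayListReferences(1_000_000));
        System.out.println("linked list references: " + linkedListReferences(500_000));
    }
}
```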

ArrayList.trimToSize() may satisfy you.
Trims the capacity of this ArrayList instance to be the list's current
size. An application can use this operation to minimize the storage of
an ArrayList instance.
By the way, in Java 6's ArrayList the capacity is not doubled when the maximum size is reached; it grows to about 1.5 times the old capacity.
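The growth factor can be sketched as plain arithmetic. The two formulas below are the growth rules as I understand them from the OpenJDK sources (Java 6, and Java 7/8); treat them as an assumption and check the source of your own JDK. The class and method names are hypothetical.

```java
// Hypothetical helper illustrating roughly 1.5x capacity growth.
class GrowthSketch {
    // Rule assumed for Java 7 and 8: old capacity plus half of it.
    static int nextCapacity(int oldCapacity) {
        return oldCapacity + (oldCapacity >> 1);
    }

    // Rule assumed for Java 6: three halves of the old capacity, plus one.
    static int java6NextCapacity(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    public static void main(String[] args) {
        int capacity = 10; // documented default capacity
        for (int i = 0; i < 5; i++) {
            System.out.println("capacity: " + capacity);
            capacity = nextCapacity(capacity);
        }
    }
}
```

Either way the unused portion of the backing array is at most about a third of its capacity right after a growth step, rather than half as with doubling.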

Remove unused allocated Memory from HashMaps

I want to read some XML files and convert them to a graph (no graphics, just a model). But because the files are very large (2.2 GB), my model object, which holds all the information, becomes even larger (4x the size of the file...).
Googling through the net, I tried to find ways to reduce the object size. I tried different collection types but would like to stick to a HashMap (because I have to have random access). The actual keys and values make up just a small amount of the allocated memory. Most of the hash table is empty...
If I'm not totally wrong, a garbage collection doesn't help me to free the allocated memory and reduce the size of the HashMap. Is there another way to release unused memory and shrink the HashMap? Or is there a way to do perfect hashing? Or should I just use another collection?
Thanks in advance,
Sebastian

A HashMap is typically just a large array of references filled to a certain percentage of capacity. If only 80% of the map is filled, the remaining 20% of the array cells are unused (i.e., are null). The extra overhead is really only just the empty (null) cells.
On a 32-bit CPU, each array cell is usually 4 bytes in size (although some JVM implementations may allocate 8 bytes). That's not really that much unused space overall.
Once your map is filled, you can copy it to another HashMap with a more appropriate (smaller) size giving a larger fill percentage.
Your question seems to imply that there are more allocated but unused objects that you're worried about. But how is that the case?
Addendum
Once a map is filled beyond its load factor (0.75 by default for java.util.HashMap), a larger array is allocated, the old array's contents are rehashed into the new array, and then the smaller array is left to be garbage collected. This is obviously an expensive operation, so choosing a reasonably large initial size for the map is key to improving performance.
If you can (over)estimate the number of cells needed, preallocating a map can reduce or even eliminate the resizing operations.
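Both suggestions, preallocating and copying into a smaller map, can be sketched as follows. This is a minimal illustration that assumes the default load factor of 0.75; the class and method names are hypothetical, and HashMap rounds the requested capacity up to a power of two internally, so the result is compact only up to that rounding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for sizing a HashMap from an expected entry count.
class PresizedMap {
    // Initial capacity that lets expectedEntries fit without a resize,
    // assuming the default load factor of 0.75.
    static int initialCapacityFor(int expectedEntries) {
        return (int) (expectedEntries / 0.75f) + 1;
    }

    // Copies a finished map into a new HashMap sized for exactly its contents.
    static <K, V> Map<K, V> compactCopy(Map<K, V> source) {
        Map<K, V> copy = new HashMap<>(initialCapacityFor(source.size()));
        copy.putAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Integer> big = new HashMap<>(1 << 16); // deliberately oversized
        big.put("a", 1);
        big.put("b", 2);
        Map<String, Integer> small = compactCopy(big);
        System.out.println("entries kept: " + small.size());
    }
}
```

While the copy is being made both maps are alive, so peak memory use rises temporarily before the old map can be collected.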

What you are asking is not entirely clear: is the memory taken by the objects that you put inside the HashMap or by the HashMap itself? The latter shouldn't be the case, since it only holds references.
In any case, take a look at WeakHashMap; maybe it is what you are looking for. It is a hash map that doesn't guarantee that keys are kept inside it, so it should be used as a sort of cache, but from your description I don't really know whether that is your case or not.

If you get nowhere with reducing the memory footprint of your hashmap, you could always put the data in a database. Depending on how the data is accessed, you might still get reasonable performance if you introduce a cache in front of the db.

One thing that might come into play is that you might have substrings that reference old, larger strings, and those substrings then make it impossible for the GC to collect char arrays that are much bigger than needed.
This happens when you are using XML parsers that return attributes/values as substrings of a larger string. (In JDKs before Java 7 update 6, a substring is only a limited view of the larger string; later versions copy the characters.)
Try to put your strings in the map by doing something like this:
map.put(new String(key), new String(value));
Note that the GC then might get more work to do when you are populating the map, and this might not help you if you don't have that many substrings that are referencing larger strings.

If you're really serious about this and you have time to spare, you can make your own implementation of the Map interface based on minimal perfect hashing
If your keys are Strings, then there apparently is a map available for you here.
I haven't tried it myself but it brags about reduced memory usage.

You might give the Trove collections a shot. They advertise it as a more time and space efficient drop-in replacement for the java.util Collections.

What else can I use instead of a HashMap?

In my project, I get the entries of a form from two servers and keep them in a HashMap.
The key is the serverName and the value is a 2D ArrayList (ArrayList<ArrayList<Object>>).
In the ArrayList, I keep the values of the fields that belong to the form on that server.
I compare these values between the two servers and print them to an Excel file.
My problem is that when I get a form with 12,000 entries and 100 fields, this map uses around 400M of memory. I don't want my program to use this much memory. Can you suggest anything?

I doubt it's the hashmap that is causing you problems, but the ArrayList, since it allocates room for 10 entries by default. If you're only storing one or two values for each index, then that will be wasteful.
You could try setting the initial size to 1 or 2 to see if that helps. A potential downside is that if the size is too small, it will cause frequent reallocation. But you will see yourself if that causes any significant slowdown.

The HashMap is not at all the problem here. What objects are actually contained in the ArrayList<ArrayList<Object>>?
You really should use VisualVM and do some heap profiling to see what actually takes up your memory. That's much better than the guesswork here, and you may be surprised by the result.

I suppose that much of the memory waste results from using a lot of ArrayLists. They are designed for dynamic use (additions and removals), so they usually have many unused positions. If your matrix is static, consider using a 2D array instead of a list of lists. Otherwise, try to set the capacity of each ArrayList to some estimated value instead of the default value.
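Both options can be sketched briefly. This is a minimal illustration with hypothetical class and method names: one helper turns a finished list of lists into a 2D array whose rows are exactly as long as their contents, the other creates a row with an estimated capacity instead of the default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for the two suggestions above.
class MatrixSketch {
    // Copies a list of lists into a 2D array; each row array has exactly
    // as many slots as the row has elements.
    static Object[][] toMatrix(List<? extends List<?>> rows) {
        Object[][] matrix = new Object[rows.size()][];
        for (int i = 0; i < rows.size(); i++) {
            matrix[i] = rows.get(i).toArray();
        }
        return matrix;
    }

    // Creates a row whose backing array is sized for the expected field count.
    static List<Object> newRow(int fieldCount) {
        return new ArrayList<>(fieldCount);
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        List<Object> row = newRow(2);
        row.add("field-1");
        row.add(42);
        rows.add(row);
        Object[][] matrix = toMatrix(rows);
        System.out.println("rows: " + matrix.length + ", columns: " + matrix[0].length);
    }
}
```

Neither option shrinks the stored values themselves, so profiling is still needed to see where the memory actually goes.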

The problem is obviously not the HashMap itself, because it has no more than two entries (the keys are your two server names). You just have to handle a large amount of data (2 x 12000 x 100 values, if I get it right, plus the result, which is an 'excel file'). It just needs some memory. The big objects are the two 2D array lists. The map just has references to those data structures.
Usually I wouldn't care and just increase the max heap size to 512M or even 1G.

Java collections memory consumption

Say I instantiate 100,000 Vectors:
a[0..100k] = new Vector<Integer>();
If I do this instead:
a[0..100k] = new Vector<Integer>(1);
will they take less memory? That is ignoring whether they have stuff in them, and ignoring the overhead of expanding them when there has to be more than 1 element.

According to the Javadoc, the default capacity for a vector is 10, so I would expect it to take more memory than a vector of capacity 1.
In practice, you should probably use an ArrayList unless you need to work with another API that requires vectors.

When you create a Vector, you either specify the size you want it to have at the start or leave the default value. But it should be noted that in any case everything stored in a Vector is just a bunch of references, which take very little space compared to the objects they actually point to.
So yes, you will save space initially, but only by an amount equal to the difference between the default size and the specified size, multiplied by the size of a reference. If you create a really large number of vectors, as in your case, the initial size does matter.

Well, sort of yes. IIRC Vector initializes internally 16 elements by default which means that due to byte alignment and other stuff done by underlying VM you'll save a considerable amount of memory initially.
What are you trying to accomplish, though?

Yes, they will. Putting in reasonable "initial sizes" for collections is one of the first things I do when confronted with a need to radically improve memory consumption of my program.

Yes, it will. By default, Vector allocates space for 10 elements.
Vector()
Constructs an empty vector so that its internal data array has size 10
and its standard capacity increment is zero.
Therefore, it reserves memory for 10 memory references.
That being said, in real life situations, this is rarely a concern. If you are truly generating 100,000 Vectors, you need to rethink your design.
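The difference in reserved slots can be checked directly, because Vector exposes its current capacity through capacity(). The following is a minimal sketch; the class and method names are hypothetical.

```java
import java.util.Vector;

// Hypothetical demo class comparing the default and an explicit initial capacity.
class VectorCapacityDemo {
    // Capacity of a Vector created with the no-argument constructor.
    static int defaultCapacity() {
        return new Vector<Integer>().capacity();
    }

    // Capacity of a Vector created with an explicit initial capacity.
    static int capacityOf(int initialCapacity) {
        return new Vector<Integer>(initialCapacity).capacity();
    }

    public static void main(String[] args) {
        System.out.println("default capacity:  " + defaultCapacity());
        System.out.println("explicit capacity: " + capacityOf(1));
    }
}
```

Each Vector still carries its own object header and fields, so the saving per instance is limited to the reference slots that are not reserved.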
