Usefulness of ArrayList<E>.clear()?

I was looking through the Java documentation, and I came across the clear() method of ArrayLists.
What's the use of this, as opposed to simply reassigning a new ArrayList object to the variable?

Because there might be multiple references to the same list, it can be preferable and/or more practical to clear it than to reassign all of those references.
If you empty your list a lot (within, say, a large loop), there's no point creating lots of temporary objects. Sure, the garbage collector will eventually clean them up, but there's no point being wasteful with resources if you don't have to be.
And because clearing the list is less work than creating a new one.
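To make the aliasing point concrete, here is a small self-contained sketch (the class and variable names are mine):
import java.util.ArrayList;
import java.util.List;

public class ClearDemo {
    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        List<String> sameList = batch;           // a second reference to the same list

        batch.add("a");
        batch.clear();                           // empties the list both references see
        System.out.println(sameList.isEmpty());  // true

        batch = new ArrayList<>();               // reassignment only rebinds 'batch'...
        batch.add("b");
        System.out.println(sameList.isEmpty());  // still true: 'sameList' points at the old list
    }
}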

You might have a final field (class variable) List:
private final List<Thing> things = ...
Somewhere in the class you want to clear (remove all) things. Since things is final it can't be reassigned. Furthermore, you probably don't want to reassign a new List instance as you already have a perfectly good List instantiated.
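A minimal sketch of that situation (Thing is just a stand-in element type):
import java.util.ArrayList;
import java.util.List;

class Thing {}  // placeholder for whatever the list actually stores

class ThingHolder {
    private final List<Thing> things = new ArrayList<>();

    void add(Thing t) { things.add(t); }

    void removeAllThings() {
        // things = new ArrayList<>();  // would not compile: 'things' is final
        things.clear();                 // clearing in place is the only option
    }
}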

Imagine the situation where there are multiple references to the same java.util.ArrayList throughout your code. It would be very difficult, or nearly impossible, to create a new instance of the list and assign it to every one of those variables. But java.util.ArrayList.clear() does the trick!

If your objective really is to clear the list, you pay less with clear() than with creating a new object.
Reassigning a reference doesn't clear the object. The assumption is that if there are no other references to it, it will be reclaimed by the GC relatively soon. Otherwise, you've just got yourself a mess.

In addition to the reasons mentioned, clearing a list is often more semantically correct than creating a new one. If your desired outcome is a cleared list, the action you should take is to clear the list, not create a new list.

clear() doesn't reallocate a new object, so it's less of a performance hit.
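Roughly speaking, an ArrayList-style clear() just nulls out the used slots and resets the size, keeping the already-grown backing array for reuse. A simplified sketch (not the actual JDK source, which differs in details such as modCount handling):
class SimpleArrayList<E> {
    private Object[] elementData = new Object[10];
    private int size;

    public void clear() {
        for (int i = 0; i < size; i++) {
            elementData[i] = null;  // let the GC reclaim the elements
        }
        size = 0;                   // the backing array keeps its capacity
    }
}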

Related

Is it more efficient to put the 'new' in the put method, or use an intermediate variable?

Which of the below ways of adding to a HashMap is more efficient (considering both time and space efficiency)?
Way 1:
Music foo = new Music(Files.getMusic("bar/bold.mp3"));
HashMap.put("rock", foo);
Way 2:
HashMap.put("rock", new Music(Files.getMusic("bar/bold.mp3")));
Both are exactly the same. When running
new Music(Files.getMusic("bar/bold.mp3"));
you create an object in memory and get back a reference to it. Whether or not you temporarily store that reference in foo before passing it to the map makes no real difference (and even if it did, it would be optimized away).
This is identical.
Java passes object references by value; you have exactly the same number of objects created in both cases.
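To see those reference semantics concretely, here is a self-contained variant of the example (StringBuilder stands in for the Music class, which isn't shown in the question):
import java.util.HashMap;
import java.util.Map;

public class ReferenceDemo {
    public static void main(String[] args) {
        Map<String, StringBuilder> map = new HashMap<>();

        StringBuilder foo = new StringBuilder("rock");
        map.put("genre", foo);                // stores the reference, not a copy

        foo.append(" and roll");              // mutate through the local variable
        System.out.println(map.get("genre")); // prints "rock and roll"
    }
}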

" Suppose you are passing or returning an array of references to mutable objects to/from a method..."

"Suppose you are passing or returning an array of references to mutable objects to/from a method. Is it safe to make a reference copy only? Is it safe to make a shallow copy?"
This is a study question that was given to my class and the answer is "Neither one is safe. Only a deep copy is safe in this case."
Why is this?
"Safe" can mean a lot of things, but in your particular context it is about the safety of "your" private data (the text refers to "you" as a writer of some Java class). Your private data cannot be safe if you let the client of your class access and modify it behind your back.
Therefore:
if you return an array of mutable objects, you must make copies of all those objects and return them in a new array;
if you get an array of mutable objects passed in, you must again copy them all and put them into a new array, because your client already has references to the objects they passed in.
In practice, all this is a lot of CPU work and takes memory, so it is rarely done. You either design everything to be immutable, or else live with the danger inherent in mutable objects.
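For illustration, a minimal sketch of the "copy on the way out" rule; the mutable Account class and its copy constructor are assumptions made up for the example:
class Account {
    private int balance;
    Account(int balance) { this.balance = balance; }
    Account(Account other) { this.balance = other.balance; }  // copy constructor
    void deposit(int amount) { balance += amount; }
}

class Bank {
    private final Account[] accounts = { new Account(100), new Account(200) };

    // Deep copy on the way out: callers get their own Account objects,
    // so mutating them cannot touch the Bank's private state.
    public Account[] getAccounts() {
        Account[] copy = new Account[accounts.length];
        for (int i = 0; i < accounts.length; i++) {
            copy[i] = new Account(accounts[i]);
        }
        return copy;
    }
}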
If your objects are mutable that means that any client with a reference to them can modify them. This can lead to race conditions, deadlocks and other un-fun behavior. However, if you make a deep copy of your objects right before using them, you will effectively be working with a snapshot of the objects. This ensures no other client is able to modify them, eliminating any concurrency or correctness concern.
In the first case both the original array elements and the objects can be modified. In the second case only the objects can be modified, as we no longer have access to the original array. If you perform a deep copy we are working with entirely different arrays and objects, so of course it's safe.
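A compact demonstration of the three levels of copying, using StringBuilder as a stand-in mutable object:
import java.util.Arrays;

public class CopyDemo {
    public static void main(String[] args) {
        StringBuilder[] original = { new StringBuilder("a"), new StringBuilder("b") };

        // Reference copy: same array object, same elements.
        StringBuilder[] refCopy = original;
        refCopy[0] = new StringBuilder("x");      // visible through 'original' too

        // Shallow copy: a new array that shares the element objects.
        StringBuilder[] shallow = Arrays.copyOf(original, original.length);
        shallow[1] = new StringBuilder("y");      // does NOT affect 'original'
        shallow[0].append("!");                   // DOES affect original[0]

        // Deep copy: a new array AND new element objects.
        StringBuilder[] deep = new StringBuilder[original.length];
        for (int i = 0; i < original.length; i++) {
            deep[i] = new StringBuilder(original[i]);
        }
        deep[0].append("?");                      // leaves 'original' untouched

        System.out.println(Arrays.toString(original)); // [x!, b]
    }
}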

Map clear vs null

I have a map that I use to store dynamic data that is discarded as soon as it is used (i.e. the entries are consumed quickly). It responds to user interaction in the sense that when the user clicks a button the map is filled, the data is used to do some work, and then the map is no longer needed.
So my question is: what's the better approach for emptying the map? Should I set it to null each time, or should I call clear()? I know clear() is linear in time, but I don't know how to compare that cost with the cost of creating the map each time. The size of the map is not constant, though it may run from n to 3n elements between creations.
If a map is not referenced from other objects where it may be hard to set a new one, simply null-ing out an old map and starting from scratch is probably lighter-weight than calling a clear(), because no linear-time cleanup needs to happen. With the garbage collection costs being tiny on modern systems, there is a good chance that you would save some CPU cycles this way. You can avoid resizing the map multiple times by specifying the initial capacity.
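For the presizing point, a sketch; 0.75 is HashMap's default load factor, and expectedMax stands in for the question's 3n upper bound:
import java.util.HashMap;
import java.util.Map;

class MapFactory {
    // HashMap rehashes once size exceeds capacity * loadFactor, so pick a
    // capacity that can hold expectedMax entries without any resizing.
    static Map<String, Object> presized(int expectedMax) {
        return new HashMap<>((int) (expectedMax / 0.75f) + 1);
    }
}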
One situation where clear() is preferred would be when the map object is shared among multiple objects in your system. For example, if you create a map, give it to several objects, and then keep some shared information in it, setting the map to a new one in all these objects may require keeping references to objects that have the map. In situations like that it's easier to keep calling clear() on the same shared map object.
Well, it depends on how much memory you can throw at it. If you have a lot, then it doesn't matter. However, setting the map itself to null frees up the garbage collector: if only the map holds references to the instances inside it, the garbage collector can collect not only the map but also everything it contains. clear() does empty the map, but it has to iterate over everything in it to set each reference to null, and that work happens on your thread, during your execution time; the garbage collector essentially has to do this work anyway, so let it do its thing. Just note that setting the map to null doesn't let you reuse it. A typical pattern for reusing a map variable is:
Map<String, String> whatever = new HashMap<String, String>();
// .. do something with map
whatever = new HashMap<String, String>();
This allows you to reuse the variable without setting it to null at all; you silently discard the reference to the old map. In a language without automatic memory management this would be atrocious practice, since overwriting the last pointer to the old map would leak it (you would have to free it first), but in Java, since nothing references the old map any more, the GC marks it as eligible for collection.
I feel that nulling the existing map is cheaper than clear(), since object creation is very cheap in modern JVMs.
Short answer: use Collection.clear() unless it is too complicated to keep the collection around.
Detailed answer: in Java, the allocation of memory is almost instantaneous; it is little more than a pointer that gets moved inside the VM. However, the initialization of those objects might add up to something significant. Also, all objects that use an internal buffer are sensitive to resizing and copying of their contents. Using clear() makes sure that the buffers eventually stabilize at some size, so that reallocating memory and copying the old buffer to a new one is never necessary again.
Another important issue is that allocating and then releasing a lot of objects requires more frequent runs of the garbage collector, which can cause sudden lag.
If you always hold on to the map, it will be promoted to the old generation. If each user has one corresponding map, the number of maps in the old generation is proportional to the number of users. That may trigger full GCs more frequently as the number of users increases.
You can use both with similar results.
One prior answer notes that clear is expected to take constant time in a mature map implementation. Without checking the source code of the likes of HashMap, TreeMap, ConcurrentHashMap, I would expect their clear method to take constant time, plus amortized garbage collection costs.
Another poster notes that a shared map cannot be nulled. Well, it can be if you want, but you do it by using a proxy object which encapsulates a proper map and nulls it out when needed. Of course, you'd have to implement the proxy map class yourself.
Map<Foo, Bar> myMap = new ProxyMap<Foo, Bar>();
// Internally, the above object holds a reference to a proper map,
// for example, a hash map. Furthermore, this delegates all calls
// to the underlying map. A true proxy.
myMap.clear();
// The clear method simply reinitializes the underlying map.
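A minimal sketch of that idea (ProxyMap is a made-up class, not a JDK type; extending AbstractMap keeps the example short, while a production version would forward every Map method):
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class ProxyMap<K, V> extends AbstractMap<K, V> {
    private Map<K, V> delegate = new HashMap<>();

    @Override
    public V put(K key, V value) { return delegate.put(key, value); }

    @Override
    public Set<Map.Entry<K, V>> entrySet() { return delegate.entrySet(); }

    @Override
    public void clear() {
        // Instead of a linear-time clear, drop the old map wholesale
        // and let the garbage collector reclaim it.
        delegate = new HashMap<>();
    }
}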
Unless you did something like the above, clear and nulling out are equivalent in the ways that matter, but I think it's more mature to assume your map, even if not currently shared, may become shared at a later time due to forces you can't foresee.
There is another reason to clear instead of nulling out, even if the map is not shared. Your map may be instantiated by an external client, like a factory, so if you clear your map by nulling it out, you might end up coupling yourself to the factory unnecessarily. Why should the object that clears the map have to know that you instantiate your maps using Guava's Maps.newHashMap() with God knows what parameters? Even if this is not a realistic concern in your project, it still pays off to align yourself to mature practices.
For the above reasons, and all else being equal, I would vote for clear.
HTH.

Is it OK to change the value from outside a Map?

So I have a code snippet here. I got this issue while I was discussing some code with my friend:
Map<Integer , List<String>> myMap = new HashMap<Integer , List<String>>();
List<String> list = new ArrayList<String>();
myMap.put(45,list);
List<String> lst = myMap.get(45);
lst.add("String1");
lst.add("String2");
lst.add("String3");
System.out.println(myMap.get(45));
My question here is:
Is it OK to modify the list outside the map through another reference? I am asking from an OOP design point of view.
That is completely OK, IMHO.
When you write
List<String> lst = myMap.get(45);
it is still referring to the value in the map for the key 45.
Once you get the value (a reference to the list), it's up to you what you do with it.
Is it OK to modify the list outside the map through another reference? I am asking from an OOP design point of view.
It really depends on the context in which you're modifying it. If you plan on doing this a lot, with a lot of different values, then you're quickly going to find yourself with very confusing code that is difficult to debug and to follow.
BUT, in your example, you first load it from the map, then you edit it. It's completely clear that the data is coming from your Map object. Provided you make it clear with comments and documentation, especially when you're passing this reference between other methods, this isn't bad practice at all.
It is OK, provided that you take care of any potential synchronization; e.g. if there are multiple threads that might be modifying the map and/or the list.
You might be confusing this with the case where you modify a key object. That is distinctly NOT OK if the modification breaks the hash table invariants (a short demonstration follows the list); e.g.
if it causes either the key's hashcode to change, or
if it causes the key to give a different result when compared with some other key used in the table.
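Here is that demonstration of how mutating a key strands its entry; it uses a List as the key because List's hashCode depends on its contents:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    public static void main(String[] args) {
        Map<List<String>, String> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, "value");

        key.add("b");  // mutating the key changes its hashCode

        System.out.println(map.get(key));         // null: the lookup hashes to the wrong bucket
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.size());           // 1: the entry is stranded in the map
    }
}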
I am asking from OOP design point of view.
I'd say that OO design is neutral on this issue. You are using a Java interface (i.e. Map) that doesn't take control of the values. You are not violating encapsulation because the values are not encapsulated by the Map abstraction.
Whether this is sound design from the application perspective depends on the overall design. We can't make a judgement one way or another without understanding the context.
Every reference has a scope; it is your call (based on your requirements) whether you want the Map to be accessed through multiple references or through a single one.
It's OK.
After you have added the strings to the list in your code snippet, and then get the list from the map again, the list you get back will contain the extra strings you just added.
That depends on what you want to do with the list and what your requirements are.
I'd say it is OK-ish, but it might be better to encapsulate that in another object (a sketch follows below).
Consider the question of what to do with empty lists: should they be removed or kept?
Encapsulation would allow you to ensure that empty lists are removed, since the user would then only access the wrapper, not the list directly.
Btw, with HashMap you have to change the list outside the map ;)
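A sketch of such a wrapper (MultiValueStore is a made-up name) that keeps the "no empty lists" invariant and hands out copies instead of the internal lists:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiValueStore {
    private final Map<Integer, List<String>> map = new HashMap<>();

    void add(int key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    boolean remove(int key, String value) {
        List<String> values = map.get(key);
        if (values == null) return false;
        boolean removed = values.remove(value);
        if (values.isEmpty()) {
            map.remove(key);  // enforce the invariant: no empty lists kept
        }
        return removed;
    }

    // Callers get a copy, so they cannot mutate the internal lists.
    List<String> get(int key) {
        List<String> values = map.get(key);
        return values == null ? new ArrayList<>() : new ArrayList<>(values);
    }
}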
ArrayList is mutable: it is resizeable and keeps the same reference after modification. To get an unmodifiable list you should use the following code:
List<String> list = Collections.unmodifiableList(new ArrayList<String>());
If you define the list this way, you can't modify it through that reference.
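Note that unmodifiableList returns a read-only view, not a frozen copy: anyone holding the backing list can still change what the view shows. A quick demonstration:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableDemo {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        List<String> readOnly = Collections.unmodifiableList(backing);

        backing.add("hello");                 // allowed: we still hold the backing list
        System.out.println(readOnly.get(0));  // "hello" - the wrapper is a live view

        readOnly.add("world");                // throws UnsupportedOperationException
    }
}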

What's best for array reinitialization: set to null by iteration, or simply allocate a new array?

In my code, I have an array that has about 1000 elements :
Object[] arr = new Object[1000];
After my array is populated (entirely or just partially), I need to reinitialize it. From what I know, I have two choices: reinitialize it with the new keyword, or iterate over it and set each element to null. I think the first approach is better than the second, but I'm also waiting for your thoughts.
Any links or articles on this topic are welcome.
Thanks in advance
The first one is better. By reinitializing the variable with the new keyword, you make the previous array eligible for garbage collection (assuming no other live objects hold a reference to it or its elements).
The second one would eventually achieve the same effect, but there is an added performance hit because you have to iterate over the elements one by one. For 1000 entries this would likely be very fast, but as the number grows, so does the cost.
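The two options side by side, as a self-contained sketch:
import java.util.Arrays;

public class ReinitDemo {
    public static void main(String[] args) {
        Object[] arr = new Object[1000];
        // ... populate and use arr ...

        // Option 1 (usually preferable): allocate a fresh array; the old one
        // and any objects only it referenced become eligible for GC.
        arr = new Object[1000];

        // Option 2: keep the same array object and null out every slot.
        Arrays.fill(arr, null);  // equivalent to the manual loop
    }
}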
Agree with your first choice: use arr = new Object[1000] and don't loop over it.
Also, new Object[1000] doesn't create 1000 objects; it only allocates a "placeholder" for 1000 objects (an array of null references), so it's a very cheap operation.
And if you know you will repopulate all 1000 slots, you can just reuse the array as-is without reinitializing it at all.
The first is the best way: Object[] arr = new Object[1000];
You can also find a number of ways an array can be initialised in the link below:
array initialisation
First, by setting the array reference to null you effectively tell the GC it can check everything that was in that array and clean it up if necessary. So there is no need to iterate through the elements, even if you aren't going to do a new right away.
That being said, the only time you really need to explicitly set a variable to null (for GC purposes, anyway) is when you no longer need the data it points to, have nothing new to put in its place, AND the variable, for whatever reason, will stay in scope for a while. In that case it's advisable to set the value to null, or better yet, rework your code so the variable goes out of scope and that is done for you.
So, for instance, in your example, say arr was a static member of some class and you just needed to do some processing on the array at startup and never look at it again. In that case, the contents of arr will stick around for the entire time your program is running UNLESS you explicitly set it to null (or assign it a new value).
