While working on my app, I came across behavior that I did not expect and had not encountered before.
Consider this simple class:
public class A {
public long id;
public long date;
public List<Long> list;
/* constructors */
}
Now consider these 2 approaches to doing the same thing:
/* Approach #1 */
List<A> mList = new ArrayList<A>();
long mLong = ......;
A mA = new A(id, date);
if(!mList.contains(mA))
mList.add(mA);
mA = mList.get(mList.indexOf(mA));
if(!mA.list.contains(mLong))
mA.list.add(mLong);
/* Approach #2 */
List<A> mList = new ArrayList<A>();
long mLong = ......;
A mA = new A(id, date);
if(!mA.list.contains(mLong))
mA.list.add(mLong);
if(!mList.contains(mA))
mList.add(mA);
As you can see, approach #2 is more efficient than approach #1, and also much easier to understand.
Apparently, though, approach #2 does not work as expected.
The code actually runs in a loop, and it is expected that there could be various objects of type A inside mList, and an unknown (more than 1) amount of long values inside the list field of each object.
What really happens is that the first approach works fine, while the second approach results in a situation where there is always 1 long value inside list of every object (even when there should be more).
I personally can't see what could possibly cause it to work that way, which is why I'm here, asking for the answer to this 'mystery'. My wildest guess would say it's related to pointers, or maybe some default behavior of List<T> I'm not aware of.
With that being said, what could be the cause to this unexpected behavior?
P.S: I did try to run a search before posting, but I really had no idea what to search for, so I didn't find anything useful.
second approach results in a situation where there is always 1 long value inside list of every object (even when there should be more).
problem
A mA = new A(id, date);
if(!mA.list.contains(mLong))
As you can see, you are not getting the reference to the A instance that is already in mList; you are checking whether the long value is contained in the list of the instance you just created, which will therefore only ever receive one value. So basically you create a new instance of class A with one long value in its list and add it to mList.
Your first approach, on the other hand, fetches the A instance that was already added and checks whether its list contains the long value; if not, it adds the value to that list.
This is because mList.contains(mA) internally checks the objects for equality by calling o1.equals(o2). The default implementation of equals() looks like this:
public boolean equals(Object o) {
return this == o;
}
Obviously the instances are not the same so you are adding a new instance every time. Override equals() in your class A to fix the problem. I guess the instances are the same if they have the same id?
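// When overriding equals(), also override hashCode() so equal objects share a hash code.
@Override
public boolean equals(Object o) {
return o instanceof A && this.id == ((A) o).id;
}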
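To make the effect of equals() on contains() and indexOf() concrete, here is a minimal sketch. Item and findOrAdd are hypothetical names standing in for the asker's class A and the lookup logic, and it assumes that two instances count as equal when their ids match.

```java
import java.util.ArrayList;
import java.util.List;

final class EqualsDemo {
    // Hypothetical stand-in for class A; equality is assumed to depend on id only.
    static final class Item {
        final long id;
        final List<Long> list = new ArrayList<>();

        Item(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id); // kept consistent with equals
        }
    }

    // Returns the stored instance equal to candidate, adding candidate first if absent.
    static Item findOrAdd(List<Item> items, Item candidate) {
        int index = items.indexOf(candidate); // relies on Item.equals
        if (index >= 0) {
            return items.get(index);
        }
        items.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        findOrAdd(items, new Item(1)).list.add(10L);
        findOrAdd(items, new Item(1)).list.add(20L);
        System.out.println(items.size());      // 1
        System.out.println(items.get(0).list); // [10, 20]
    }
}
```

Without the equals() override, each new Item(1) would be treated as a different element and the list would grow on every call.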
If there was a method mList.addIfAbsent(mA) (which returns either mA after adding it to the list, or the object which was already present and matches mA on equals), it would make your operation as trivial as
mA = mList.addIfAbsent(mA);
mA.list.addIfAbsent(mLong);
In your second example you obviously break this mechanism for the case when the mA equivalent is already in there. Basically, you change the definition of addIfAbsent(mA) to "adds mA to the list if no other object in the list is equal to it, and returns mA."
You can improve performance, achieving the identical result as your second example (sans the bug) like this:
int indOfOld = mList.indexOf(mA);
if (indOfOld != -1)
mA = mList.get(indOfOld);
else
mList.add(mA);
if(!mA.list.contains(mLong))
mA.list.add(mLong);
This won't cut your big-O complexity, but will at least make do with just one O(n) operation (compared with two in your working code).
By the way, this may be obvious to you and everyone else (if so, excuse me), but if those lists grow beyond a few thousand elements, you could get a significant performance improvement from a HashSet, or a LinkedHashSet if you care about insertion order. In that case you would just try to add an object, getting false if it was already there, at a cost of only O(1) time. Note that Set has no get method, so to retrieve the stored instance in O(1) time instead of going through indexOf you would keep the objects in a HashMap (or LinkedHashMap) keyed by whatever defines equality. Either way, class A needs consistent equals() and hashCode() implementations.
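Here is a rough sketch of the hash-based variant, assuming the id alone identifies an entry. Because java.util.Set has no get method, it uses a LinkedHashMap keyed by id rather than a set of A objects; HashedGroups, record and valuesFor are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class HashedGroups {
    // id -> values seen for that id; both levels keep insertion order.
    private final Map<Long, Set<Long>> valuesById = new LinkedHashMap<>();

    // Records value under id; returns false if it was already recorded for that id.
    boolean record(long id, long value) {
        return valuesById.computeIfAbsent(id, k -> new LinkedHashSet<>()).add(value);
    }

    // Returns the values recorded for id, or an empty set if the id is unknown.
    Set<Long> valuesFor(long id) {
        return valuesById.getOrDefault(id, new LinkedHashSet<>());
    }

    public static void main(String[] args) {
        HashedGroups groups = new HashedGroups();
        groups.record(1L, 10L);
        groups.record(1L, 20L);
        groups.record(1L, 10L); // duplicate, ignored
        System.out.println(groups.valuesFor(1L)); // [10, 20]
    }
}
```

Both the lookup by id and the duplicate check run in expected O(1) time, at the cost of boxing the long values.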
Related
I am searching for a way in Java to directly access an object in a list that contains a given object.
What I mean is something like this:
List<ObjectA> list = new ArrayList<ObjectA>();
ObjectB objb = new ObjectB();
list.add(new ObjectA(objb, new ObjectB()));
ObjectA containsObjB = null;
boolean gotit = false;
for (ObjectA a : list)
{
for (ObjectB pObjB : a.getObjBs())
{
if (pObjB.equals(objb))
{
containsObjB = a;
gotit = true;
break;
}
}
if (gotit) break; // leave the outer loop once a match is found
}
This would be the long way around, but since this operation will be really time-critical, I wondered whether it is possible to map the list so that I can access the correct ObjectA instantly.
I'm not sure if this is possible; if not, any suggestion on how to make it fast or faster is welcome.
Greetings
Sebastian
Well, if you're stuck using a list, and the list isn't already sorted or in a known order, you have to look at every item.
Even if you wanted to build an index/map, you'd still have to look at every item to build the map... so if it is a one time lookup, the brute force might just be the way to go.
If this is a common occurrence, you could build some kind of index, like a Map of objb:obja, presuming each B is only in one A...
Or you could change the code so that it isn't lists of lists, it is collections optimized for lookups.
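The index idea could be sketched roughly as follows. Owner stands in for ObjectA and plain strings stand in for ObjectB (both hypothetical), and the sketch assumes each child belongs to at most one owner and has usable equals() and hashCode() implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OwnerIndex {
    // Hypothetical stand-in for ObjectA: a holder of several child objects.
    static final class Owner {
        final List<String> children = new ArrayList<>();

        Owner(String... kids) {
            children.addAll(Arrays.asList(kids));
        }
    }

    // Builds child -> owner in one pass over all owners.
    // Assumes each child appears in at most one owner; later owners would overwrite earlier ones.
    static Map<String, Owner> buildIndex(List<Owner> owners) {
        Map<String, Owner> index = new HashMap<>();
        for (Owner owner : owners) {
            for (String child : owner.children) {
                index.put(child, owner);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Owner first = new Owner("b1", "b2");
        Owner second = new Owner("b3");
        Map<String, Owner> index = buildIndex(Arrays.asList(first, second));
        System.out.println(index.get("b3") == second); // true
    }
}
```

Building the index still visits every item once, so it only pays off when many lookups follow, and it has to be kept in sync whenever the lists change.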
Which of these statements is better in terms of performance and memory usage:
if(val.equals(CONSTANT1) || val.equals(CONSTANT2) ..... || val.equals(CONSTANTn)) {
}
OR
if(Arrays.asList(CONSTANT1,CONSTANT2, ..... ,CONSTANTn).contains(val)) {
}
A better question to ask would be how to write this code more clearly (and faster, if performance actually matters). The answer to that would be a switch statement (or possibly even polymorphism, if you want to convert your constants into an enum) or a lookup array.
But if you insist on comparing your two approaches, the first is slightly faster. To see this, let's look at what the second approach entails:
create a new array with the constants, to pass them to the vararg parameter of Arrays.asList
create a new list object wrapping that array
iterate over that array, comparing each element with equals
The third step is equivalent to your first approach.
Finally, it's worth noting that such an operation will likely take far less than a microsecond, so unless you invoke this method millions of times per second, any approach will be fast enough.
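The switch statement mentioned in this answer might look like the sketch below. It assumes val is a String (switching on strings requires Java 7 or later), and the constant values are made up for illustration.

```java
final class ConstantCheck {
    // Hypothetical constants; case labels must be compile-time constants.
    static final String CONSTANT1 = "red";
    static final String CONSTANT2 = "green";
    static final String CONSTANT3 = "blue";

    // Returns true if val equals one of the constants.
    static boolean isKnown(String val) {
        if (val == null) {
            return false; // switching on a null String would throw NullPointerException
        }
        switch (val) {
            case CONSTANT1:
            case CONSTANT2:
            case CONSTANT3:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnown("green")); // true
        System.out.println(isKnown("pink"));  // false
    }
}
```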
Theoretically #1 is faster but insignificantly, because Arrays.asList creates only one object - a list view (wrapper) of the specified array, there is no array copying:
public static <T> List<T> asList(T... a) {
return new ArrayList<T>(a);
}
private static class ArrayList<E> extends AbstractList<E>
implements RandomAccess, java.io.Serializable
{
private static final long serialVersionUID = -2764017481108945198L;
private final E[] a;
ArrayList(E[] array) {
if (array==null)
throw new NullPointerException();
a = array;
}
Since you are not using a loop I guess that the number of values is so low that in practice any differences will be irrelevant.
However, having said that, if one was to iterate by hand and use equals() versus asList() and contains()... it would still be the same.
Arrays.asList() returns a private implementation of a list which extends AbstractList and simply wraps around the existing array by reference (no copy is done). The contains() method uses the indexOf() which goes through the array using equals() on each element until it finds a match and then returns it. If you would break on your loop when you find an equals then both implementations would be quite equivalent.
The only difference would be a tiny memory footprint for the additional list structure that Arrays.asList() creates, other than that...
if(val.equals(CONSTANT1) || val.equals(CONSTANT2) ..... || val.equals(CONSTANTn)) {
}
is better in terms of performance and memory, because the second one takes time to build a list before it starts searching that list for val. Extra memory is required to hold the list, and extra time is spent iterating through it, whereas comparing val with the constants directly makes use of short-circuit evaluation.
Let's say I have a list like this:
private LinkedList<String> messages = new LinkedList<String>();
When my method gets invoked for the first time, some strings are added to this list. I also have another method in which I need to clear the previously added values from this list. To clear it I can use:
messages.clear();
This will remove all the elements from the list. Also I can create a new instance like this:
messages = new LinkedList<String>();
Which way is more proper to clear the list?
messages.clear();
will actually clear the list, whereas messages = new LinkedList<String>(); just makes messages reference a new list instance, so you could argue that the first way is the more "correct" way to clear the list instance.
Say you have a list that is referenced by two variables, a and b, like this (they don't have to be as close to each other as this; they might even be in different files):
List<String> a = new LinkedList<String>(); // not final, because a is reassigned below
final List<String> b = a;
Now, there is a big difference between
a.clear();
which will make both a and b reference the same, empty list, and
a = new LinkedList<String>();
which will make 'a' reference a new, empty list, and 'b' the old, populated list. (So they do not reference the same list).
Since you probably want them to reference the same list, a.clear() is preferred, since you won't get any surprises when you're looking at the list referenced by b (which you might believe to be empty, but which turns out to be populated if you use the new-approach).
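The difference is easy to observe with a small sketch; the class and method names below are hypothetical.

```java
import java.util.LinkedList;
import java.util.List;

final class ClearVsNew {
    // Size seen through the alias b after clearing the shared list via a.
    static int sizeViaAliasAfterClear() {
        List<String> a = new LinkedList<>();
        List<String> b = a;
        a.add("x");
        a.clear(); // empties the one list both variables refer to
        return b.size();
    }

    // Size seen through the alias b after pointing a at a brand-new list.
    static int sizeViaAliasAfterReassign() {
        List<String> a = new LinkedList<>();
        List<String> b = a;
        a.add("x");
        a = new LinkedList<>(); // b still refers to the old, populated list
        return b.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeViaAliasAfterClear());    // 0
        System.out.println(sizeViaAliasAfterReassign()); // 1
    }
}
```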
I prefer the first approach, i.e. messages.clear();, as it clears the elements but the List is not destroyed and recreated. All elements are removed as desired.
One side effect is there though: It iterates your list and removes one item at a time so if the list is huge then it's an unnecessary overhead.
for (Node<E> x = first; x != null; ) {
Node<E> next = x.next;
x.item = null;
x.next = null;
x.prev = null;
x = next;
}
first = last = null;
size = 0;
modCount++;
In the same way, the second approach also has a side effect: if you are using the reference to your list somewhere else in your program, that needs to be handled properly, otherwise you could get some unwanted surprises. For example, if you added your list to some other object or variable, the first approach will clear the elements everywhere the list is referenced, while the second will not.
Summary: the outcomes of the two approaches differ at a low level, though both seem to serve your high-level requirement (clearing the list). Decide carefully based on your low-level requirements.
They are almost the same, but I would say messages.clear() is more flexible.
The second approach is simple and widely used, but if your list has the final modifier you cannot clear it that way.
messages.clear();
is more efficient. For extra safety you can check that the list is not empty beforehand.
Personally, I prefer to use LinkedList#clear because it makes it clearer to the reader what you are doing.
But new LinkedList<String>(); will work fine as well. So it's up to you what to use!
It clearly depends upon your need.
If you want to keep the reference to your list instance (for example, if the clearing happens inside a method that receives messages as a parameter), then calling .clear() is the best solution.
On the other hand, if the list you want to clear is a member field of the object the current method belongs to (or a local variable in that method), then you can assign new LinkedList<String>(); without any trouble.
Note that, to avoid the first case (which I tend to disapprove of), I usually return the objects I modify as results from the methods that modify them.
The first one is preferable. The second one puts an extra burden on the garbage collector, whereas the first one does not.
I'm having problems with Iterator.remove() called on a HashSet.
I have a Set of timestamped objects. Before adding a new item to the Set, I loop through the set, identify an old version of that data object and remove it (before adding the new object). The timestamp is included in hashCode() and equals(), but not in equalsData().
for (Iterator<DataResult> i = allResults.iterator(); i.hasNext();)
{
DataResult oldData = i.next();
if (data.equalsData(oldData))
{
i.remove();
break;
}
}
allResults.add(data);
The odd thing is that i.remove() silently fails (no exception) for some of the items in the set. I've verified the following:
The line i.remove() is actually called. I can call it from the debugger directly at the breakpoint in Eclipse and it still fails to change the state of Set
DataResult is an immutable object so it can't have changed after being added to the set originally.
The equals() and hashCode() methods use @Override to ensure they are the correct methods. Unit tests verify these work.
This also fails if I just use a for statement and Set.remove instead. (e.g. loop through the items, find the item in the list, then call Set.remove(oldData) after the loop).
I've tested in JDK 5 and JDK 6.
I thought I must be missing something basic, but after spending some significant time on this my colleague and I are stumped. Any suggestions for things to check?
EDIT:
There have been questions - is DataResult truly immutable. Yes. There are no setters. And when the Date object is retrieved (which is a mutable object), it is done by creating a copy.
public Date getEntryTime()
{
return DateUtil.copyDate(entryTime);
}
public static Date copyDate(Date date)
{
return (date == null) ? null : new Date(date.getTime());
}
FURTHER EDIT (some time later):
For the record -- DataResult was not immutable! It referenced an object which had a hashcode which changed when persisted to the database (bad practice, I know). It turned out that if a DataResult was created with a transient subobject, and the subobject was persisted, the DataResult hashcode was changed.
Very subtle -- I looked at this many times and didn't notice the lack of immutability.
I was very curious about this one still, and wrote the following test:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;
public class HashCodeTest {
private int hashCode = 0;
@Override public int hashCode() {
return hashCode ++;
}
public static void main(String[] args) {
Set<HashCodeTest> set = new HashSet<HashCodeTest>();
set.add(new HashCodeTest());
System.out.println(set.size());
for (Iterator<HashCodeTest> iter = set.iterator();
iter.hasNext();) {
iter.next();
iter.remove();
}
System.out.println(set.size());
}
}
which results in:
1
1
If the hashCode() value of an object has changed since it was added to the HashSet, it seems to render the object unremovable.
I'm not sure if that's the problem you're running into, but it's something to look into if you decide to re-visit this.
Under the covers, HashSet uses HashMap, which calls HashMap.removeEntryForKey(Object) when either HashSet.remove(Object) or Iterator.remove() is called. This method uses both hashCode() and equals() to validate that it is removing the proper object from the collection.
If both Iterator.remove() and HashSet.remove(Object) are not working, then something is definitely wrong with your equals() or hashCode() methods. Posting the code for these would be helpful in diagnosis of your issue.
Are you absolutely certain that DataResult is immutable? What is the type of the timestamp? If it's a java.util.Date are you making copies of it when you're initializing the DataResult? Keep in mind that java.util.Date is mutable.
For instance:
Date timestamp = new Date();
DataResult d = new DataResult(timestamp);
System.out.println(d.getTimestamp());
timestamp.setTime(System.currentTimeMillis());
System.out.println(d.getTimestamp());
Would print two different times.
It would also help if you could post some source code.
You should all be careful of any Java Collection that fetches its children by hashcode, in the case that its child type's hashcode depends on its mutable state. An example:
HashSet<HashSet<?>> or HashSet<AbstractSet<?>> or HashMap variant:
HashSet retrieves an item by its hashCode, but its item type
is a HashSet, and hashSet.hashCode depends on its item's state.
Code for that matter:
HashSet<HashSet<String>> coll = new HashSet<HashSet<String>>();
HashSet<String> set1 = new HashSet<String>();
set1.add("1");
coll.add(set1);
System.out.println(set1.hashCode()); //---> will output X
set1.add("2");
System.out.println(set1.hashCode()); //---> will output Y
coll.remove(set1); // WILL FAIL TO REMOVE (SILENTLY)
The reason is that HashSet's remove method uses a HashMap, which identifies keys by hashCode, while AbstractSet's hashCode is dynamic and depends on the set's own mutable contents.
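A small sketch of the failure and of one possible way around it: take the element out of the collection before mutating it, then put it back. The class and method names are hypothetical, and the first result reflects how java.util.HashSet behaves in practice rather than anything the Set contract guarantees.

```java
import java.util.HashSet;
import java.util.Set;

final class MutableElementDemo {
    // Mutates an element while it is stored in the outer set, then tries to remove it.
    static boolean removeAfterMutation() {
        Set<Set<String>> coll = new HashSet<>();
        Set<String> inner = new HashSet<>();
        inner.add("1");
        coll.add(inner);
        inner.add("2"); // inner.hashCode() changes while inner is stored in coll
        return coll.remove(inner);
    }

    // Removes the element first, mutates it, then re-adds it under its new hash code.
    static boolean removeMutateReAdd() {
        Set<Set<String>> coll = new HashSet<>();
        Set<String> inner = new HashSet<>();
        inner.add("1");
        coll.add(inner);
        coll.remove(inner);
        inner.add("2");
        coll.add(inner);
        return coll.remove(inner);
    }

    public static void main(String[] args) {
        System.out.println(removeAfterMutation()); // false
        System.out.println(removeMutateReAdd());   // true
    }
}
```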
Thanks for all the help. I suspect the problem must be with equals() and hashCode() as suggested by spencerk. I did check those in my debugger and with unit tests, but I've got to be missing something.
I ended up doing a workaround-- copying all the items except one to a new Set. For kicks, I used Apache Commons CollectionUtils.
Set<DataResult> tempResults = new HashSet<DataResult>();
CollectionUtils.select(allResults,
new Predicate()
{
public boolean evaluate(Object oldData)
{
return !data.equalsData((DataResult) oldData);
}
}
, tempResults);
allResults = tempResults;
I'm going to stop here -- too much work to simplify down to a simple test case. But the help is much appreciated.
It's almost certainly the case the hashcodes don't match for the old and new data that are "equals()". I've run into this kind of thing before and you essentially end up spewing hashcodes for every object and the string representation and trying to figure out why the mismatch is happening.
If you're comparing items pre/post database, sometimes it loses the nanoseconds (depending on your DB column type) which can cause hashcodes to change.
Have you tried something like
boolean removed = allResults.remove(oldData);
if (!removed) // COMPLAIN BITTERLY!
In other words, remove the object from the Set and break the loop. That won't cause the Iterator to complain. I don't think this is a long term solution but would probably give you some information about the hashCode, equals and equalsData methods
The Java HashSet has an issue in "remove()" method. Check the link below. I switched to TreeSet and it works fine. But I need the O(1) time complexity.
https://bugs.openjdk.java.net/browse/JDK-8154740
If there are two entries with the same data, only one of them is replaced... have you accounted for that? And just in case, have you tried another collection data structure that doesn't use a hashcode, say a List?
I'm not up to speed on my Java, but I know that you can't remove an item from a collection when you are iterating over that collection in .NET, although .NET will throw an exception if it catches this. Could this be the problem?
Let's say I have this type in my application:
public class A {
public int id;
public B b;
public boolean equals(Object another) { return another instanceof A && this.id == ((A) another).id; }
public int hashCode() { return 31 * id; } // 31 is a nice prime number
}
and a Set<A> structure. Now, I have an object of type A and want to do the following:
If my A is within the set, update its field b to match my object.
Else, add it to the set.
So checking if it is in there is easy enough (contains), and adding to the set is easy too. My question is this: how do I get a handle to update the object within? The Set interface doesn't have a get method, and the best I could think of was to remove the object in the set and add mine. Another, even worse, alternative is to traverse the set with an iterator to try to locate the object.
I'll gladly take better suggestions... This includes the efficient use of other data structures.
Yuval =8-)
EDIT: Thank you all for answering... Unfortunately I can't 'accept' the best answers here, those that suggest using a Map, because changing the type of the collection radically for this purpose only would be a little extreme (this collection is already mapped through Hibernate...)
Since a Set can only contain one instance of an object (as defined by its equals and hashCode methods), just remove it and then add it. If there was one already, that other one will be removed from the Set and replaced by the one you want.
I have code that does something similar - I am caching objects so that everywhere a particular object appears in a bunch of different places on the GUI, it's always the same one. In that case, instead of using a Set I'm using a Map, and when I get an update, I retrieve the object from the Map and update it in place rather than creating a new instance.
You really want to use a Map<Integer,A>, not a Set<A>.
Then map the ID (even though it's also stored in A!) to the object. So storing new is this:
A a = ...;
Map<Integer,A> map = new HashMap<Integer,A>();
map.put( a.id, a );
Your complete update algorithm is:
public static void update( Map<Integer,A> map, A obj ) {
A existing = map.get( obj.id );
if ( existing == null )
map.put( obj.id, obj );
else
existing.b = obj.b;
}
However, it might be even simpler. I'm assuming A has more fields than the ones you showed. If that is not the case, a Map<Integer,B> is in fact what you want, and the whole thing collapses to nothing:
Map<Integer,B> map = new HashMap<Integer,B>();
// The insert-or-update is just this:
map.put( id, b );
I don't think you can make it any easier than using remove/add if you are using a Set.
set.remove(a);
set.add(a);
If a matching A was found it will be removed and then you add the new one, you don't even need the if (set.contains(A)) conditional.
If you have an object with an ID and an updated field and you don't really care about any other aspects of that object, just throw it out and replace it.
If you need to do anything else to the A that matches that ID then you'll have to iterate through the Set to find it or use a different Container (like the Map as Jason suggested).
No one has mentioned this yet, but basing hashCode or equals on a mutable property is one of those really, really big things that you shouldn't do. Don't muck about with object identity after you leave the constructor - doing so greatly increases your chances of having really difficult-to-figure out bugs down the road. Even if you don't get hit with bugs, the accounting work to make sure that you always properly update any and all data structures that relies on equals and hashCode being consistent will far outweigh any perceived benefits of being able to just change the id of the object as you run.
Instead, I strongly recommend that you pass id in via the constructor, and if you need to change it, create a new instance of A. This will force users of your object (including yourself) to properly interact with the collection classes (and many others) that rely on immutable behavior in equals and hashCode.
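As a rough illustration of that recommendation, the sketch below fixes the id at construction and keeps the mutable state out of equals() and hashCode(). Entity, payload and withId are hypothetical names, not part of the asker's code.

```java
import java.util.HashSet;
import java.util.Set;

final class ImmutableIdDemo {
    // Hypothetical entity: id is final, so equals/hashCode cannot change
    // while the object sits inside a hash-based collection.
    static final class Entity {
        private final int id;
        private String payload; // mutable, deliberately excluded from equality

        Entity(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        String getPayload() { return payload; }

        void setPayload(String payload) { this.payload = payload; }

        // "Changing" the id means creating a new instance.
        Entity withId(int newId) { return new Entity(newId, payload); }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entity && ((Entity) o).id == id;
        }

        @Override
        public int hashCode() { return id; }
    }

    public static void main(String[] args) {
        Set<Entity> set = new HashSet<>();
        Entity e = new Entity(1, "a");
        set.add(e);
        e.setPayload("b"); // does not affect the hash code
        System.out.println(set.remove(e)); // true
    }
}
```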
What about a Map<A,A>? I know it's redundant, but I believe it will get you the behavior you'd like. Really, I'd love to see Set have a get(Object o) method.
You might want to generate a decorator called ASet and use an internal Map as the backing data structure
class ASet {
private Map<Integer, A> map;
public ASet() {
map = new HashMap<Integer, A>();
}
public A updateOrAdd(Integer id, int delta) {
A a = map.get(id); // look up by id
if(a == null) {
a = new A(id);
map.put(id,a);
}
a.setX(a.getX() + delta);
return a;
}
}
You can also take a look at the Trove API. While it is mainly geared toward performance and toward working with primitive values, it exposes this feature very nicely (e.g. map.adjustOrPutValue(key, initialValue, deltaValue)).
It's a bit outside scope, but you forgot to re-implement hashCode(). When you override equals please override hashCode(), even in an example.
For example; contains() will very probably go wrong when you have a HashSet implementation of Set as the HashSet uses the hashCode of Object to locate the bucket (a number which has nothing to do with business logic), and only equals() the elements within that bucket.
public class A {
public int id;
public B b;
public int hashCode() {return id;} // simple and efficient enough for small Sets
public boolean equals(Object another) {
if (another == null || !(another instanceof A)) {
return false;
}
return this.id == ((A)another).id;
}
}
public class Logic {
/**
* Replace the element in data with the same id as element, or add element
* to data when the id of element is not yet used by any A in data.
*/
public void update(Set<A> data, A element) {
data.remove(element); // Safe even if the element is not in the Set
data.add(element);
}
}
EDIT: Yuval correctly pointed out that Set.add does not overwrite an existing element, but only adds the element if it is not yet in the collection (with "is" implemented by equals).
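The behavior described in the edit can be checked with a short sketch. Rec is a hypothetical stand-in for A whose equality is based on id only.

```java
import java.util.HashSet;
import java.util.Set;

final class SetAddDemo {
    // Hypothetical stand-in for A: equality by id, b is the payload to update.
    static final class Rec {
        final int id;
        final String b;

        Rec(int id, String b) {
            this.id = id;
            this.b = b;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Rec && ((Rec) o).id == id;
        }

        @Override
        public int hashCode() { return id; }
    }

    // add() alone keeps the element that was already present.
    static String addOnly() {
        Set<Rec> set = new HashSet<>();
        set.add(new Rec(1, "old"));
        set.add(new Rec(1, "new")); // returns false, set is unchanged
        return set.iterator().next().b;
    }

    // remove() followed by add() replaces the existing element.
    static String removeThenAdd() {
        Set<Rec> set = new HashSet<>();
        set.add(new Rec(1, "old"));
        Rec fresh = new Rec(1, "new");
        set.remove(fresh); // removes the equal element, if any
        set.add(fresh);
        return set.iterator().next().b;
    }

    public static void main(String[] args) {
        System.out.println(addOnly());       // old
        System.out.println(removeThenAdd()); // new
    }
}
```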