Hashset objects - java

I'm writing a piece of code which takes a large number of objects and adds them to another array. The catch is, I don't want any duplicates. Is there a way I could use a HashSet to solve this problem?
public static Statistic[] combineStatistics(Statistic[] rptData, Statistic[] dbsData) {
HashSet<Statistic> set = new HashSet<Statistic>();
for (int i=0; i<rptData.length; i++) {
set.add(rptData[i]);
}
/*If there's no data in the database, we don't have anything to add to the new array*/
if (dbsData!=null) {
for (int j=0; j<dbsData.length;j++) {
set.add(dbsData[j]);
}
}
Statistic[] total=set.toArray(new Statistic[0]);
for (int workDummy=0; workDummy<total.length; workDummy++) {
System.out.println(total[workDummy].serialName);
}
return total;
}//end combineStatistics()

Properly implement equals(Object obj) and hashCode() on YourObject if you expect value equality instead of reference equality.
Set<YourObject> set = new HashSet<YourObject>(yourCollection);
or
Set<YourObject> set = new HashSet<YourObject>();
set.add(...);
then
YourObject[] array = set.toArray(new YourObject[0]);
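A minimal sketch of what that could look like for the question's Statistic class. It assumes, purely for illustration, that two statistics are duplicates when their serialName matches; the real class may need to compare more fields. StatisticCombiner is a hypothetical helper name, and a LinkedHashSet is swapped in for the HashSet so the result keeps insertion order.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the question's Statistic class.
// Assumption: two statistics are duplicates when serialName matches.
class Statistic {
    final String serialName;

    Statistic(String serialName) {
        this.serialName = serialName;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Statistic)) return false;
        return Objects.equals(serialName, ((Statistic) obj).serialName);
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal objects produce equal hash codes.
        return Objects.hashCode(serialName);
    }
}

class StatisticCombiner {
    // Merges both arrays, keeping the first occurrence of each statistic.
    static Statistic[] combine(Statistic[] rptData, Statistic[] dbsData) {
        Set<Statistic> set = new LinkedHashSet<Statistic>();
        for (Statistic s : rptData) {
            set.add(s);
        }
        if (dbsData != null) { // the database may have returned nothing
            for (Statistic s : dbsData) {
                set.add(s);
            }
        }
        return set.toArray(new Statistic[0]);
    }
}
```

Without the equals/hashCode pair, the set would fall back to reference equality and keep every object.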

I think you should pay attention to:
1 - What should happen if there is a duplicate in the original Collection? Keep the first one added to the array? Keep the other(s)?
2 - You definitely need to implement equals and hashCode so that duplicate objects can be detected.
3 - Are you going to create a fixed-size array and never add more objects, or are you going to keep adding to it?
You can use any kind of Set, but a LinkedHashSet gives you a defined iteration order (insertion order, much like an array). HashSet won't guarantee any order, and TreeSet keeps its elements sorted in ascending order.
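To make the difference in iteration order concrete, here is a small sketch. SetOrderDemo and its method names are made up for illustration; only the java.util classes are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrderDemo {
    // Keeps the first occurrence of each element, in insertion order.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<String>(new LinkedHashSet<String>(input));
    }

    // Returns the distinct elements sorted by their natural ordering.
    static List<String> sortedOrder(List<String> input) {
        return new ArrayList<String>(new TreeSet<String>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(insertionOrder(input)); // [pear, apple, fig]
        System.out.println(sortedOrder(input));    // [apple, fig, pear]
        // HashSet removes the duplicate too, but its order is unspecified.
        System.out.println(new HashSet<String>(input).size()); // 3
    }
}
```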

Depends on what you are referring to as a duplicate. If you mean an identical object, then you could use a List and simply see if the List contains the object prior to adding it to the list.
Object obj = new Object();
List<Object> list = new ArrayList<Object>();
if (!list.contains(obj)) {
list.add(obj);
}

Related

Passing ArrayList as value only and not reference

Simply put, I have a method with an ArrayList parameter. In the method I modify the contents of the ArrayList for purposes relevant only to what is returned by the method. Therefore, I do not want the ArrayList which is being passed as the parameter to be affected at all (i.e. not passed as a reference).
Everything I have tried has failed to achieve the desired effect. What do I need to do so that I can make use of a copy of the ArrayList within the method only, but not have it change the actual variable?
Even if you had a way to pass the array list as a copy and not by reference it would have been only a shallow copy.
I would do something like:
void foo(final ArrayList list) {
ArrayList listCopy = new ArrayList(list);
// Rest of the code
}
And just work on the copied list.
You can create a copy of the ArrayList using ArrayList's copy constructor:
ArrayList copy = new ArrayList(original);
But if the elements of the list are also objects, then you must be aware that modifying a member of the copy will also modify that member in the original.
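A quick sketch of that caveat, using StringBuilder as a stand-in for any mutable element type (ShallowCopyDemo is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

class ShallowCopyDemo {
    // Returns a new list that holds the same element references.
    static <T> List<T> shallowCopy(List<T> original) {
        return new ArrayList<T>(original);
    }

    public static void main(String[] args) {
        List<StringBuilder> original = new ArrayList<StringBuilder>();
        original.add(new StringBuilder("a"));

        List<StringBuilder> copy = shallowCopy(original);
        copy.add(new StringBuilder("b")); // structural change: affects the copy only
        copy.get(0).append("!");          // element change: the object is shared

        System.out.println(original.size()); // 1
        System.out.println(original.get(0)); // a!
    }
}
```

If the elements must be isolated as well, each one has to be copied individually.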
You could pass Collections#unmodifiableList(yourList) to hand the method an unmodifiable view of your list (it is a read-only wrapper around the same list, not a copy). By the way, your List<Whatever> is already passed by value, since Java always passes by value: inside foo(List<Whatever> list) you cannot change which list the caller's variable refers to, but you can modify the list's contents.
public class MyClass {
List<Whatever> list = new ArrayList<Whatever>();
public void bar() {
//filling list...
foo(Collections.unmodifiableList(list));
}
public void foo(List<Whatever> list) {
//do what you want with list except modifying it...
}
}
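A small sketch of how such a wrapper behaves, in case it helps. UnmodifiableViewDemo and rejectsAdd are hypothetical names; the point is that the wrapper blocks writes but still reflects later changes to the backing list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UnmodifiableViewDemo {
    // Returns true if the list refuses structural changes.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> backing = new ArrayList<String>(Arrays.asList("a"));
        List<String> view = Collections.unmodifiableList(backing);

        System.out.println(rejectsAdd(view)); // true: the view is read-only
        backing.add("b");                     // the owner can still change the list
        System.out.println(view.size());      // 2: the view sees the change
    }
}
```

So this protects the caller's list from the method, but the method still cannot modify "its own" version; for that it needs a copy.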
You could use the .clone method or a CopyOnWriteArrayList to make a copy, thereby not impacting the original.
Try this in your method:
void method(List<Integer> list) {
    List<Integer> copyList = new ArrayList<Integer>();
    copyList.addAll(list); // copies all the element references of the original list into the new list
}
I'm not sure why, but even after new ArrayList<MyObj>(old) the objects were still being changed in places they weren't supposed to be. So I had to instantiate new copies of the objects inside the list.
I wrote a copy constructor for MyObj, like the one on ArrayList, and did this:
newArray = new ArrayList<MyObj>();
for (int i = 0; i < oldArray.size(); i++) {
    newArray.add(new MyObj(oldArray.get(i)));
}
I just hope this helps someone else if the answer from Avi is not enough in your case, like mine, with code too messy to even understand =P
Just clone it.
public ArrayList cloneArrayList(ArrayList lst){
ArrayList list = new ArrayList();
for (int i=0; i<lst.size(); i++){
list.add(lst.get(i));
}
return list;
}
As suggested in the comments, you can also use
ArrayList copy = new ArrayList(original);
and also
ArrayList copy = new ArrayList();
copy.addAll(original);
Along the lines of the existing answers, but using the List API: you can use the subList(fromIndex, toIndex) method. It creates a view of the list containing only the desired elements (in sequence) without copying them. Be aware that the view is backed by the original list, so add/remove and set operations made through the view do change the original. This saves you an explicit copy only when the method just reads the sublist; if it modifies it, wrap the view in a new ArrayList first.
Something like this:
public void recursiveMethod(List<Integer> list) {
    if (list.isEmpty()) // base case
        return;
    // do the work for the head element here, then recurse on the tail;
    // subList creates a tail view and does not itself modify the list
    recursiveMethod(list.subList(1, list.size()));
}
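A short check of how subList actually behaves, since it is easy to get wrong. SubListDemo and tailCopy are hypothetical names. The view writes through to the original list; wrapping it in a new ArrayList gives an independent copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SubListDemo {
    // Returns an independent copy of everything after the first element.
    static <T> List<T> tailCopy(List<T> list) {
        return new ArrayList<T>(list.subList(1, list.size()));
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>(Arrays.asList(1, 2, 3));

        List<Integer> copy = tailCopy(list);
        copy.clear();             // only the copy is emptied
        System.out.println(list); // [1, 2, 3]

        List<Integer> view = list.subList(1, list.size());
        view.remove(0);           // removes index 0 of the view, i.e. the element 2
        System.out.println(list); // [1, 3]
    }
}
```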

In Java, How to remove duplication from an ArrayList<StringBuilder> efficiently?

I tried to use HashSet to remove the duplications from an ArrayList<StringBuilder>.
E.g. Here is an ArrayList, each line is a StringBuilder object.
"u12e5 u13a1 u1423"
"u145d"
"u12e5 u13a1 u1423"
"u3ab4 u1489"
I want to get the following:
"u12e5 u13a1 u1423"
"u145d"
"u3ab4 u1489"
My current implementation is:
static void removeDuplication(ArrayList<StringBuilder> directCallList) {
HashSet<StringBuilder> set = new HashSet<StringBuilder>();
for(int i=0; i<directCallList.size()-1; i++) {
if(set.contains(directCallList.get(i)) == false)
set.add(directCallList.get(i));
}
StringBuilder lastString = directCallList.get(directCallList.size()-1);
directCallList.clear();
directCallList.addAll(set);
directCallList.add(lastString);
}
But the performance becomes worse and worse as the ArrayList size grows. Is there any problem with this implementation? Or do you have any better ones in terms of performance?
StringBuilder doesn't override equals() or hashCode(). Two StringBuilders are only equal if they are the exact same object, so adding them to a HashSet won't exclude two different StringBuilder objects with identical content.
You should convert the StringBuilders to String objects.
Also, you should initialize your HashSet with an "initial capacity" in the constructor. This will help with the speed if you are dealing with large numbers of objects.
Lastly, it's not necessary to call contains() on the hashset before adding an object. Just add your Strings to the set, and the set will reject duplicates (and will return false).
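Putting those three points together, a minimal sketch might look like this. BuilderDedup and distinctContents are hypothetical names, and a LinkedHashSet is used instead of a plain HashSet so the first occurrences keep their original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BuilderDedup {
    // Returns the distinct contents of the builders, first occurrence first.
    static List<String> distinctContents(List<StringBuilder> builders) {
        // Pre-size the set so it does not have to rehash while filling up.
        Set<String> seen = new LinkedHashSet<String>(Math.max(16, builders.size() * 2));
        for (StringBuilder builder : builders) {
            // String has value-based equals/hashCode; add() ignores duplicates.
            seen.add(builder.toString());
        }
        return new ArrayList<String>(seen);
    }
}
```

Each element is hashed once, so the whole pass stays linear in the size of the list.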
Let's analyze your method to find where we can improve it:
static void removeDuplication(ArrayList<StringBuilder> directCallList) {
HashSet<StringBuilder> set = new HashSet<StringBuilder>();
for(int i=0; i<directCallList.size()-1; i++) {
if(set.contains(directCallList.get(i)) == false)
set.add(directCallList.get(i));
}
This for loop repeats once for each element in the ArrayList. This seems unavoidable for the task at hand. However, since HashSet can only contain one of each item, the if statement is redundant. HashSet.add() does the exact same check again.
StringBuilder lastString = directCallList.get(directCallList.size()-1);
I don't understand the need to get the lastString from your list and then add it. If your loop works correctly, it should have already been added to the HashSet.
directCallList.clear();
Depending on the implementation of the list, this can take up to O(n) time because it might need to visit every element in the list.
directCallList.addAll(set);
Again, this takes O(n) time. If there are no duplicates, set contains the original items.
directCallList.add(lastString);
This line seems to be a logic error. You will add a String which is already in the set and added to directCallList.
}
So overall, this algorithm takes O(n) time, but there is a constant factor of 3. If you can reduce this factor, you can improve the performance. One way to do this is to simply create a new ArrayList, rather than clearing the existing one.
Additionally, this removeDuplication() function can be written in one line if you use the correct constructors and return the ArrayList without duplicates:
static List<StringBuilder> removeDuplication(List<StringBuilder> inList) {
return new ArrayList<StringBuilder>(new HashSet<StringBuilder>(inList));
}
Of course, this still doesn't address the issues with StringBuilder that others have pointed out.
So you had some other options, but I like my solutions short, simple, and to the point. I've changed your method to no longer manipulate the parameter, but rather return a new List. I used a Set<String> to see if the contents of each StringBuilder was already included and returned the unique Strings. I also used a for each loop instead of accessing by index.
static List<StringBuilder> removeDuplication(List<StringBuilder> directCallList) {
    HashSet<String> set = new HashSet<String>();
    List<StringBuilder> returnList = new ArrayList<StringBuilder>();
    for (StringBuilder builder : directCallList) {
        // add() returns true only the first time this content is seen
        if (set.add(builder.toString()))
            returnList.add(builder);
    }
    return returnList;
}
As Sam states, StringBuilder does not override hashCode and equals and so the Set will not work appropriately.
I think the answer is to wrap the Builder in an object that executes toString only once:
class Wrapper{
final String string;
final StringBuilder builder;
Wrapper(StringBuilder builder){
this.builder = builder;
this.string = builder.toString();
}
public int hashCode(){return string.hashCode();}
public boolean equals(Object o){return o instanceof Wrapper && string.equals(((Wrapper) o).string);}
}
public Set removeDups(List<StringBuilder> list){
Set<Wrapper> set = ...;
for (StringBuilder builder : list)
set.add(new Wrapper(builder));
return set;
}
The removeDups method could be updated to extract the builders from the set and return a List<StringBuilder>
As explained, StringBuilders don't override Object#equals and aren't Comparable.
Although using StringBuilders to concatenate your Strings is the way to go, I would suggest that once you are done with your concatenation, you should store the underlying strings (stringBuilder.toString()) instead of the StringBuilders in your list.
Removing duplicates then becomes a one-liner:
Set<String> set = new HashSet<String>(list);
Or even better, store the strings in the set directly if you don't need to know that there are duplicates.

Is it possible to use the values method for a HashMap if the values are ArrayLists?

I'm stuck trying to get something to work in an assignment. I have a HashMap<Integer, ArrayList<Object>> called sharedLocks and I want to check whether a certain value can be found in any ArrayList in the HashMap.
The following code obviously wouldn't work because Object[] can't be cast to ArrayList[], but it is a demonstration of the general functionality that I want.
ArrayList[] values = (ArrayList[]) sharedLocks.values().toArray();
boolean valueExists = false;
for (int i = 0; i < values.length; i++) {
if (values[i].contains(accessedObject)) {
valueExists = true;
}
}
Is there a way for me to check every ArrayList in the HashMap for a certain value? I'm not sure how to use the values method for HashMaps in this case.
Any help would be much appreciated.
HashMap.values() returns a Collection. You can iterate through the collection without having to convert it to an array (or list).
for (ArrayList<Object> value : sharedLocks.values()) {
...
}
A HashMap is a bit special, in that it doesn't really have an index to go by at all...
What you want to do, is turn the HashMap into a collection first, and then iterate through the collection with an iterator.
Whenever you get hold of an ArrayList in the HashMap, you cycle through every element in the arrayList, and then you jump out if you find it :)
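In code, that description could be sketched roughly as follows. SharedLockSearch and containsValue are hypothetical names; sharedLocks has the type given in the question.

```java
import java.util.ArrayList;
import java.util.Map;

class SharedLockSearch {
    // Returns true if any list stored in the map contains the wanted object.
    static boolean containsValue(Map<Integer, ArrayList<Object>> sharedLocks, Object wanted) {
        for (ArrayList<Object> list : sharedLocks.values()) {
            if (list.contains(wanted)) {
                return true; // jump out as soon as it is found
            }
        }
        return false;
    }
}
```

Note that contains() relies on equals(), so the stored objects need a sensible equals() if value equality is wanted.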
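Use the toArray method which takes an array as an argument.
This fills the array you pass in (when it is large enough) and keeps the element type, so you don't need to cast the result. Note that Java does not allow creating an array of a parameterized type such as new ArrayList<Object>[n], so the array has to be created with the raw type; assigning it to ArrayList<Object>[] compiles with an unchecked warning.
ArrayList<Object>[] values =
    sharedLocks.values().toArray(new ArrayList[sharedLocks.size()]);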
One more thing to consider is if multiple threads can modify this HashMap. In this case, you will want to synchronize this line of code to the HashMap and make sure all modifications are also synchronized. This will make sure that other threads won't modify the contents between the .size() call and the .toArray() call, which is possible.
You don't need arrays:
boolean valueExists = false;
for (ArrayList<Object> value : sharedLocks.values()) {
if (value.contains(accessedObject)) {
valueExists = true;
break;
}
}
Why not just iterate through all the values in the map:
for (ArrayList<Object> list : sharedLocks.values()) {
    if (list.contains(accessedObject)) {
        // ...
    }
}
Here's a link to an example of iterating through a HashMap. Use it to pull out each ArrayList, then extend it to search each element of the ArrayList for the given entry.
http://www.java-examples.com/iterate-through-values-java-hashmap-example
You will need a nested for-each loop.
for (ArrayList<Object> list : sharedLocks.values()) {
    for (Object element : list) {
        // do comparison
    }
}
You might just get away with a single for-each loop and a call such as contains() on each list within it.

Cross compare ArrayList elements and remove duplicates

I have an ArrayList<MyObject> that may (or may not) contain duplicates of MyObject I need to remove from the List. How can I do this in a way that I don't have to check duplication twice as I would do if I were to iterate the list in two for-loops and cross checking every item with every other item.
I just need to check every item once, so comparing A:B is enough - I don't want to compare B:A again, as I already did that.
Furthermore; can I just remove duplicates from the list while looping? Or will that somehow break the list and my loop?
Edit: Okay, I forgot an important part looking through the first answers: A duplicate of MyObject is not just meant in the Java way meaning Object.equals(Object), but I need to be able to compare objects using my own algorithm, as the equality of MyObjects is calculated using an algorithm that checks the Object's fields in a special way that I need to implement!
Furthermore, I can't just override equals in MyObject as there are several different algorithms that implement different strategies for checking the equality of two MyObjects - e.g. there is a simple HashComparer and a more complex EuclidDistanceComparer, both being AbstractComparers implementing different algorithms for the public abstract boolean isEqual(MyObject obj1, MyObject obj2);
Sort the list, and the duplicates will be adjacent to each other, making them easy to identify and remove. Just go through the list remembering the value of the previous item so you can compare it with the current one. If they are the same, remove the current item.
And if you use an ordinary for-loop to go through the list, you control the current position. That means that when you remove an item, you can decrement the position (n--) so that the next time around the loop will visit the same position (which will now be the next item).
You need to provide a custom comparison in your sort? That's not so hard:
Collections.sort(myArrayList, new Comparator<MyObject>() {
public int compare(MyObject o1, MyObject o2) {
return o1.getThing().compareTo(o2.getThing());
}
});
I've written this example so that getThing().compareTo() stands in for whatever you want to do to compare the two objects. You must return an integer that is zero if they are the same, positive if o1 is greater than o2, and negative if o1 is less than o2. If getThing() returned a String or a Date, you'd be all set because those classes have a compareTo method already. But you can put whatever code you need to in your custom Comparator.
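The whole approach (sort, then drop adjacent duplicates with an index-based loop) might be sketched like this. SortDedup is a hypothetical name, and the sketch assumes your notion of "equal" can be expressed as a Comparator returning 0, which may not hold for something like a distance-based comparer.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortDedup {
    // Sorts the list in place, then removes every element that compares
    // equal to its predecessor. The list must be modifiable.
    static <T> void sortAndRemoveDuplicates(List<T> list, Comparator<? super T> cmp) {
        Collections.sort(list, cmp);
        for (int n = 1; n < list.size(); n++) {
            if (cmp.compare(list.get(n - 1), list.get(n)) == 0) {
                list.remove(n); // drop the current item...
                n--;            // ...and revisit this position next time round
            }
        }
    }
}
```

Each pair of neighbours is compared once, so after the O(n log n) sort the duplicate check itself is a single pass. On an ArrayList each remove shifts the tail, so for very large lists collecting the survivors into a new list may be cheaper.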
Create a set and it will remove the duplicates automatically for you if the ordering is not important.
Set<MyObject> mySet = new HashSet<MyObject>(yourList);
Instantiate a new set-based collection, such as a HashSet. Don't forget to implement equals and hashCode for MyObject.
Good Luck!
If object order is insignificant
If the order is not important, you can put the elements of the list into a Set:
Set<MyObject> mySet = new HashSet<MyObject>(yourList);
The duplicates will be removed automatically.
If object order is significant
If ordering is significant, then you can manually check for duplicates, e.g. using this snippet:
// Copy the list.
ArrayList<String> newList = (ArrayList<String>) list.clone();
// Iterate
for (int i = 0; i < list.size(); i++) {
for (int j = list.size() - 1; j >= i; j--) {
// If i is j, then it's the same object and don't need to be compared.
if (i == j) {
continue;
}
// If the compared objects are equal, remove them from the copy and break
// to the next loop
if (list.get(i).equals(list.get(j))) {
newList.remove(list.get(i));
break;
}
System.out.println("" + i + "," + j + ": " + list.get(i) + "-" + list.get(j));
}
}
This will remove all duplicates, leaving the last duplicate value as original entry. In addition, it will check each combination only once.
Using Java 8
Java Streams makes it even more elegant:
List<Integer> newList = oldList.stream()
.distinct()
.collect(Collectors.toList());
If you need to consider two of your objects equal based on your own definition, you could do the following:
public static <T, U> Predicate<T> distinctByProperty(Function<? super T, ?> propertyExtractor) {
Set<Object> seen = ConcurrentHashMap.newKeySet();
return t -> seen.add(propertyExtractor.apply(t));
}
(by Stuart Marks)
And then you could do this:
List<MyObject> newList = oldList.stream()
.filter(distinctByProperty(t -> {
// Your custom property to use when determining whether two objects
// are equal. For example, consider two objects equal if their names
// start with the same character.
return t.getName().charAt(0);
}))
.collect(Collectors.toList());
Furthermore
You cannot modify a list directly while an Iterator (which is what a for-each loop uses) is iterating over it; the next call on the iterator will throw a ConcurrentModificationException. Removing through the iterator's own remove() method is safe. You can also modify the list if you loop over it with an index-based for loop, but then you must control the index yourself (decrementing it after removing an entry).
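A minimal sketch of removing safely during iteration, using the iterator's own remove() method instead of calling remove() on the list. IteratorRemoveDemo and removeNegatives are hypothetical names chosen for the example.

```java
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Removes all negative numbers from the list in place.
    static void removeNegatives(List<Integer> numbers) {
        Iterator<Integer> it = numbers.iterator();
        while (it.hasNext()) {
            if (it.next() < 0) {
                it.remove(); // safe: the iterator stays consistent with the list
            }
        }
    }
}
```

Calling numbers.remove(...) inside a for-each loop over the same list is what typically triggers the ConcurrentModificationException.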
Or http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html if you need sort order.
EDIT: What about using http://docs.oracle.com/javase/6/docs/api/java/util/TreeSet.html? It allows you to pass in a Comparator at construction time, and it treats two elements as duplicates when that Comparator returns 0 rather than calling equals(). This gives you the flexibility of creating different sets that are ordered according to your Comparator and that implement your "equality" strategy.
Don't forget about equals() and hashCode() though...
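A minimal sketch of the TreeSet idea. A TreeSet built with a Comparator treats two elements as duplicates when the comparator returns 0, so different comparators give different "equality" strategies. ComparatorSetDemo and distinctBy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparatorSetDemo {
    // Returns the distinct items according to cmp, sorted by cmp.
    // Of several items that compare equal, the first one added is kept.
    static <T> List<T> distinctBy(List<T> items, Comparator<? super T> cmp) {
        Set<T> set = new TreeSet<T>(cmp);
        set.addAll(items);
        return new ArrayList<T>(set);
    }
}
```

For example, distinctBy(Arrays.asList("b", "A", "a", "B"), String.CASE_INSENSITIVE_ORDER) keeps "A" and "b". The comparator has to define a consistent ordering for this to work.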

Modifying a set during iteration java

I'm looking to make a recursive method iterative.
I have a list of Objects I want to iterate over, and then check their subobjects.
Recursive:
doFunction(Object)
while(iterator.hasNext())
{
//doStuff
doFunction(Object.subObjects);
}
I want to change it to something like this
doFunction(Object)
iterator = hashSet.iterator();
while(iterator.hasNext())
{
//doStuff
hashSet.addAll(Object.subObjects);
}
Sorry for the poor pseudocode, but basically I want to iterate over subobjects while appending new objects to the end of the list to check.
I could do this using a list, and do something like
while(list.size() > 0)
{
//doStuff
list.addAll(Object.subObjects);
}
But I would really like to not add duplicate subObjects.
Of course I could just check whether list.contains(each subObject) before I add it.
But I would love to use a Set to accomplish that more cleanly.
So basically, is there any way to append to a set while iterating over it, or is there an easier way to make a List act like a set rather than manually checking .contains()?
Any comments are appreciated.
Thanks
I would use two data structures --- a queue (e.g. ArrayDeque) for storing objects whose subobjects are to be visited, and a set (e.g. HashSet) for storing all visited objects without duplication.
Set visited = new HashSet(); // all visited objects
Queue next = new ArrayDeque(); // objects whose subobjects are to be visited
// NOTE: At all times, the objects in "next" are contained in "visited"
// add the first object
visited.add(obj);
Object nextObject = obj;
while (nextObject != null)
{
// do stuff to nextObject
for (Object o : nextObject.subobjects)
{
boolean fresh = visited.add(o);
if (fresh)
{
next.add(o);
}
}
nextObject = next.poll(); // removes the next object to visit, null if empty
}
// Now, "visited" contains all the visited objects
NOTES:
ArrayDeque is a space-efficient queue. It is implemented as a cyclic array, which means you use less space than a List that keeps growing when you add elements.
"boolean fresh = visited.add(o)" combines "boolean fresh = !visited.contains(o)" and "if (fresh) visited.add(o)".
I think your problem is inherently a problem that needs to be solved via a List. If you think about it, your Set version of the solution is just converting the items into a List then operating on that.
Of course, List.contains() is a slow operation in comparison to Set.contains(), so it may be worth coming up with a hybrid if speed is a concern:
while(list.size() > 0)
{
//doStuff
for each subObject
{
if (!set.contains(subObject))
{
list.add(subObject);
set.add(subObject);
}
}
}
This solution is fast and also conceptually sound - the Set can be thought of as a list of all items seen, whereas the List is a queue of items to work on. It does take up more memory than using a List alone, though.
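The hybrid idea, a set of everything seen so far plus a queue of items still to process, can be sketched as runnable Java. Node is a hypothetical stand-in for the question's object type with a subObjects list, and Traversal.visitAll is a made-up name. Node does not override equals(), so duplicates are detected by identity here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical object type with sub-objects, as in the question.
class Node {
    final String name;
    final List<Node> subObjects = new ArrayList<Node>();

    Node(String name) {
        this.name = name;
    }
}

class Traversal {
    // Visits root and everything reachable from it exactly once.
    static Set<Node> visitAll(Node root) {
        Set<Node> visited = new LinkedHashSet<Node>(); // everything seen so far
        Queue<Node> toProcess = new ArrayDeque<Node>(); // work still to do
        visited.add(root);
        toProcess.add(root);
        while (!toProcess.isEmpty()) {
            Node current = toProcess.poll();
            // do stuff with current here
            for (Node child : current.subObjects) {
                // add() returns false for objects already seen,
                // so each object is queued at most once
                if (visited.add(child)) {
                    toProcess.add(child);
                }
            }
        }
        return visited;
    }
}
```

Nothing is ever added to a collection while it is being iterated, so no ConcurrentModificationException can occur, and cycles between objects terminate.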
If you do not use a List, the iterator will throw an exception as soon as you read from it after modifying the set. I would recommend using a List and enforcing insertion limits, then using ListIterator as that will allow you to modify the list while iterating over it.
HashSet nextObjects = new HashSet();
HashSet currentObjects = new HashSet(firstObject.subObjects);
while(currentObjects.size() > 0)
{
    Iterator iter = currentObjects.iterator();
    while(iter.hasNext())
    {
        Object current = iter.next(); // advance the iterator, otherwise the loop never ends
        //doStuff
        nextObjects.addAll(current.subObjects);
    }
    currentObjects = nextObjects;
    nextObjects = new HashSet();
}
I think something like this will do what I want, I'm not concerned that the first Set contains duplicates, only that the subObjects may point to the same objects.
Use more than one set and do it in "rounds":
/* very pseudo-code */
doFunction(Object o) {
Set processed = new HashSet();
Set toProcess = new HashSet();
Set processNext = new HashSet();
toProcess.add(o);
while (toProcess.size() > 0) {
for(it = toProcess.iterator(); it.hasNext();) {
Object current = it.next();
doStuff(current);
processNext.addAll(current.subObjects);
}
processed.addAll(toProcess);
toProcess = processNext;
toProcess.removeAll(processed);
processNext = new HashSet();
}
}
Why not create an additional set that contains the entire set of objects? You can use that for lookups.
