How to effectively remove updated HashSet items - java

Given the following snippet, how do we effectively remove elements that have previously been updated/changed?
public static class Foo {
@Override
public int hashCode() {
return new Random().nextInt();
}
}
public static void main(String[] args) {
Set<Foo> set = new HashSet<>();
set.add(new Foo());
set.removeIf(f -> true); // Returns true, but no deletion occurs
assert set.size() == 0; // Fails as set still contains its single item
}
Note: The above snippet is intended to simulate a different Foo upon next call to Object::hashCode (on Set::remove and Set::removeIf).
EDIT:
For those who did not understand the "random hash" part, here is a different view of the problem stated above:
public static class Bar {
public String firstName;
public String lastName;
public Bar() {
this(null, null);
}
public Bar(String firstName, String lastName) {
this.firstName = firstName;
this.lastName = lastName;
}
@Override
public int hashCode() {
int result = 1;
result *= 59 + (firstName == null ? 43 : firstName.hashCode());
result *= 59 + (lastName == null ? 43 : lastName.hashCode());
return result;
}
}
public static void main(String[] args) {
Set<Bar> set = new HashSet<>();
String originalFirstName = "FOO";
String updatedFirstName = "FOO_CHANGED";
// Create bar
Bar bar = new Bar();
bar.firstName = originalFirstName;
bar.lastName = "BAR";
// Add bar
set.add(bar);
// Change bar
System.out.println("Bar hash (now): " + bar.hashCode());
bar.firstName = updatedFirstName;
System.out.println("Bar hash (new): " + bar.hashCode());
Bar oldBar = new Bar(originalFirstName, bar.lastName);
Bar changedBar = new Bar(bar.firstName, bar.lastName);
System.out.println("Old bar hash: " + oldBar.hashCode()); // Hash matches old value
System.out.println("Changed bar hash: " + changedBar.hashCode()); // Hash matches new value
set.remove(oldBar); // Removes no elements (returns false)
set.remove(changedBar); // Removes no elements (returns false)
set.removeIf(f -> true); // Removes no elements (returns true)
Iterator<Bar> iterator = set.iterator();
while (iterator.hasNext()) {
iterator.next();
iterator.remove(); // Fails silently
}
assert set.size() == 0;
}
There's no random hash at all.
There are different hashes indeed, but apparently the elements can never be removed once they (and therefore their hashes) have been changed, no matter what. We can confirm that with both Set::remove calls, where set.remove(oldBar) should have removed the element, as oldBar's hash equals the hash bar had when it was added.

Like all the other answers and comments, I should first say that hashCode should remain consistent: it is supposed to stay the same while an element is stored in a hash-based collection.
Amusingly, the code snippet in OpenJDK 11 will return 0 when the set's size is queried, but on OpenJDK 8 it will remain 1.
This happened due to changes in the standard library (see JDK-8170733):
The removeIf in HashMap#keySet (HashSet uses a HashMap underneath) is not overridden, so it relies on Iterator#remove.
Implementation of the latter method has been changed to avoid recomputing the hashCode inside the HashMap.HashIterator::remove method.
So removeIf will successfully remove the elements.
See this commit:
- K key = p.key;
- removeNode(hash(key), key, null, false, false);
+ removeNode(p.hash, p.key, null, false, false);
Once again: do not rely on this implementation detail.
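A version-independent way to update an element is to take it out of the set while its hash is still valid, change it, and put it back. The following is only a minimal sketch of that idea; the `Person` class and the `rename` helper are hypothetical names made up for illustration and are not part of the question's code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SafeUpdateDemo {
    // Removes the element while its stored hash still matches, mutates it, then re-adds it.
    static void rename(Set<Person> set, Person p, String newName) {
        boolean wasPresent = set.remove(p);
        p.firstName = newName;
        if (wasPresent) {
            set.add(p);
        }
    }

    public static void main(String[] args) {
        Set<Person> set = new HashSet<>();
        Person p = new Person("FOO");
        set.add(p);

        rename(set, p, "FOO_CHANGED");

        System.out.println(set.contains(p)); // true
        System.out.println(set.remove(p));   // true
        System.out.println(set.isEmpty());   // true
    }
}

// Hypothetical class for illustration; its hashCode depends on a mutable field.
class Person {
    String firstName;

    Person(String firstName) {
        this.firstName = firstName;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(firstName, ((Person) o).firstName);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(firstName);
    }
}
```

Because the element is re-inserted under its new hash, later lookups and removals behave the same on every JDK version.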

The problem here is that if you modify the object in such a way that the hash code is different, then it is no longer structurally the same object. Another way to say this: original.equals(modified) is false (or at least should be, due to the contracts of equals() and hashCode()). One solution is to modify hashCode() to calculate based on some invariant. In other words, the returned hash code is based only on the identifying data in a Foo object that will never change no matter what. For example, this could be an id, such as for an object that maps to an underlying database table.
Alternatively, you could find a different data structure that matches your use case better. For example, an ArrayList might be more appropriate, since you can remove items at a given index regardless of the state of that object.
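The invariant-based approach might look roughly like this. It is a sketch under the assumption that each object carries an immutable identifier; the `Account` class, its `id` field and `displayName` field are hypothetical and not taken from the question.

```java
import java.util.HashSet;
import java.util.Set;

class StableHashDemo {
    public static void main(String[] args) {
        Set<Account> set = new HashSet<>();
        Account account = new Account(42L, "FOO");
        set.add(account);

        // Changing a field that equals/hashCode ignore leaves the set consistent.
        account.displayName = "FOO_CHANGED";

        System.out.println(set.contains(account)); // true
        System.out.println(set.remove(account));   // true
    }
}

// Hypothetical class: equality and hash are derived only from the immutable id.
class Account {
    private final long id;
    String displayName;

    Account(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Account && ((Account) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

The trade-off is that two objects with the same id but different mutable data count as the same set element, which may or may not be what you want.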

An inconsistent hashCode is wrong.
However, I could not understand how it would affect you in your case, as you invoke removeIf, which iterates over all elements of the set.
So I tried it using Java 11 and it worked. The set was emptied and the size returned was 0, as expected. I am curious which configuration you use.
public static void main(String[] args){
Set<Foo> s = new HashSet<>();
s.add(new Foo("user1", 3));
s.add(new Foo("user2", 5));
s.forEach( e -> System.out.println(e));
s.removeIf(f-> true);
s.forEach( e ->System.out.println(e));
System.out.println(s.size());
}

Related

Duplicate items added in ConcurrentSkipListSet

I am trying to maintain insertion order in ConcurrentSkipListSet. The item being added is a custom class type with value(String) and index (int) properties. It implements Comparable interface. The set behaves very inconsistently, sometimes adding duplicate items. Items are considered duplicate if they have same value.
// This is the Item class being added in the set.
final class Item implements Comparable<Item> {
private String value;
private int index;
Item(String val, int idx) {
this.value = val;
this.index = idx;
}
@Override
public int compareTo(Item o) {
// returns zero when values are equal indicating it's a duplicate item.
return this.value.equals(o.value) ? 0 : this.index - o.index;
}
@Override
public String toString() {
return this.value;
}
}
// Below is the main class.
public class Test {
ConcurrentSkipListSet<Item> set;
AtomicInteger index;
public Test() {
set = new ConcurrentSkipListSet<>();
index = new AtomicInteger(0);
}
public static void main(String[] args) {
for (int i = 1; i <= 10; i++) {
Test test = new Test();
test.addItems();
test.assertItems();
}
}
//trying to test it for 10 times. It always fails for once or twice.
private void assertItems() {
Iterator<Item> iterator = set.iterator();
String[] values = {"yyyy", "bbbb", "aaaa"};
for (String value : values) {
if (!value.equals(iterator.next().toString())) {
System.out.println("failed for :" + set);
return;
}
}
System.out.println("passed for :" + set);
}
//adding items with some duplicate values
private void addItems() {
set.add(new Item("yyyy", index.getAndIncrement()));
set.add(new Item("bbbb", index.getAndIncrement()));
set.add(new Item("yyyy", index.getAndIncrement()));
set.add(new Item("aaaa", index.getAndIncrement()));
}
Expected : passed for :[yyyy, bbbb, aaaa]
Actual : failed for :[yyyy, bbbb, yyyy, aaaa]
But as mentioned before, the result is very inconsistent. Most of the time, it passes.
Please let me know what could be the reason for this behavior. Is the 'compareTo()' method wrong? If so, it should always fail.
Ideally we should override 'equals()' method also. But it doesn't matter from sorted set perspective.
Appreciate your help.
You have broken the contract of compareTo, which results in undefined behaviour.
Finally, the implementor must ensure that x.compareTo(y)==0 implies
that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
You can easily see that you fail this requirement by pulling your Items out into variables:
final Item x = new Item("yyyy", index.getAndIncrement());
final Item z = new Item("bbbb", index.getAndIncrement());
final Item y = new Item("yyyy", index.getAndIncrement());
System.out.println(x.compareTo(y));
System.out.println(x.compareTo(z));
System.out.println(y.compareTo(z));
Output:
0
-1
1
The signs are different, therefore the contract has been broken.
In your compareTo-implementation you are mixing two different properties in an illegal way. Thus you break the contract of the Comparable interface.
In your comparison, you look at the index only if the values are not equal. This way you do not define an overall natural order for your items. Depending on what comparison is done first, the result of sorting a list will be random.
@Override
public int compareTo(Item o) {
int vCompare = this.value.compareTo(o.value);
if (vCompare == 0) {
return this.index - o.index;
}
return vCompare;
}
This implementation will first compare by value and then by index. It adheres to the Comparable contract and actually defines a natural order for Items and works fine with the Set implementation.
Caution: This sample implementation will break the tests.
The tests are there to show the code behaves as intended. But in this case the intended behavior is the actual issue.
It is incompatible with the Comparable contract.
You cannot sort a list by numeric index and expect a lookup by alphabetical value to succeed. But that's exactly what is attempted here. Sort by index but find duplicate names. It does not work this way.
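If the underlying goal is simply "unique values, in insertion order", one option that avoids comparison-based ordering altogether is a `LinkedHashSet`. This is a different technique from the `ConcurrentSkipListSet` used in the question, shown only as a sketch; it stores the plain `String` values, and wrapping the set with `Collections.synchronizedSet` is an assumption about how thread safety might be handled (iteration would still need manual synchronization).

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class InsertionOrderDemo {
    // Collects values into a set that rejects duplicates and remembers insertion order.
    static Set<String> uniqueInOrder(String... values) {
        Set<String> set = Collections.synchronizedSet(new LinkedHashSet<>());
        for (String value : values) {
            set.add(value); // a repeated value is ignored; the first position is kept
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(uniqueInOrder("yyyy", "bbbb", "yyyy", "aaaa")); // [yyyy, bbbb, aaaa]
    }
}
```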
I don't know the implementation of ConcurrentSkipListSet in detail, but it looks like you need to override the equals method of your class to specify what qualifies two objects to be equal.
This is not an answer, rather a solution to achieve the objective based on root cause finding by @Michael and @Jochen. Modified the Item class comparator to below to have natural order of value String.
public int compareTo(Item o) {
return this.value.compareTo(o.value);
}
And then, added an index based comparator to achieve FIFO retrieval.
// This iterator would now be used in assertItems() method in main class.
private Iterator<Item> getFIFOIterator() {
ArrayList<Item> list = new ArrayList<>(set);
list.sort(Comparator.comparingInt(Item::getIndex));
return list.iterator();
}
@Michael and @Jochen: I appreciate you taking the time to figure out the root cause.

Iterating through hashmap and creating unique objects - trying to prevent duplicates

I explain what I am trying to do in comments above the parts in the method:
public int addPatron(String name) throws PatronException {
int i = 0;
//1. Iterate through a hashmap, and confirm the new name I am trying to add to the record doesn't already exist in the hashmap
for (Map.Entry<Integer, Patron> entry : patrons.entrySet()) {
Patron nameTest = entry.getValue();
//2. If the name I am trying to add already exists, we want to throw an exception saying as much.
if (nameTest.getName() == name) {
throw new PatronException ("This patron already exists");
//3. If the name is unique, we want to get the largest key value (customer number) already in the hash, an increment by one.
} else if (nameTest.getName() != name) {
Map.Entry<Integer,Patron> maxEntry = null;
for(Map.Entry<Integer, Patron> entryCheck : patrons.entrySet()) {
if (maxEntry == null || entryCheck.getKey() > maxEntry.getKey()) {
maxEntry = entryCheck;
i = maxEntry.getKey();
i++;
}
}
} else {
throw new PatronException("Something's not working!");
}
//4. If everything is ok up to this point, we want to us the name and the new customer id number, and use those to create a new Patron object, which then gets added to a hashmap for this class which contains all the patrons.
Patron newPatron = new Patron(name, i);
patrons.put(i, newPatron);
}
return i;
}
When I try and run a simple unit test that will fail if I successfully add the same name for addPatron twice in a row, the test fails.
try {
testLibrary.addPatron("Dude");
testLibrary.addPatron("Dude");
fail("This shouldn't have worked");
The test fails, telling me the addPatron method is able to use the same name twice.
@Jon Skeet:
My Patron class looks like this:
public class Patron {
//attributes
private String name = null;
private int cardNumber = 0;
//operations
public Patron (String name, int cardNumber){
this.name = name;
this.cardNumber = cardNumber;
}
public String getName(){
return name;
}
public int getCardNumber(){
return cardNumber;
}
}
As others have said, the use of == for comparing strings is almost certainly inappropriate. However, it shouldn't actually have caused a problem in your test case, as you're using the same constant string twice, so == should have worked. Of course, you should still fix the code to use equals.
It's also not clear what the Patron constructor or getName methods do - either of those could cause a problem (e.g. if they create a new copy of the string - that would cause your test to fail, but would also be unnecessary usually).
What's slightly more worrying to me is this comment:
// 3. If the name is unique, we want to get the largest key value (customer number)
// already in the hash, an increment by one.
This comment is within the main loop. So by that point we don't know that the name is unique - we only know that it doesn't match the name of the patron in this iteration.
Even more worrying - and I've only just noticed this - you perform the add within the iteration block too. It seems to me that you should have something more like this:
public int addPatron(String name) throws PatronException {
int maxKey = -1;
for (Map.Entry<Integer, Patron> entry : patrons.entrySet()) {
if (entry.getValue().getName().equals(name)) {
// TODO: Consider using IllegalArgumentException
throw new PatronException("This patron already exists");
}
maxKey = Math.max(maxKey, entry.getKey());
}
int newKey = maxKey + 1;
Patron newPatron = new Patron(name, newKey);
patrons.put(newKey, newPatron);
return newKey;
}
Additionally, it sounds like really you want a map from name to patron, possibly as well as the id to patron map.
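A rough sketch of that two-map idea follows. The `PatronRegistry` class, its field names and the running `nextCardNumber` counter are assumptions made for illustration (the counter replaces the max-key scan above), and `IllegalArgumentException` stands in for `PatronException` so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class PatronRegistry {
    private final Map<Integer, Patron> patronsById = new HashMap<>();
    private final Map<String, Patron> patronsByName = new HashMap<>();
    private int nextCardNumber = 0;

    // Rejects duplicate names with a single map lookup instead of a loop.
    int addPatron(String name) {
        if (patronsByName.containsKey(name)) {
            throw new IllegalArgumentException("This patron already exists: " + name);
        }
        int cardNumber = nextCardNumber++;
        Patron patron = new Patron(name, cardNumber);
        patronsById.put(cardNumber, patron);
        patronsByName.put(name, patron);
        return cardNumber;
    }

    public static void main(String[] args) {
        PatronRegistry registry = new PatronRegistry();
        System.out.println(registry.addPatron("Dude")); // 0
        try {
            registry.addPatron("Dude");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // This patron already exists: Dude
        }
    }
}

// Minimal stand-in for the Patron class shown in the question.
class Patron {
    private final String name;
    private final int cardNumber;

    Patron(String name, int cardNumber) {
        this.name = name;
        this.cardNumber = cardNumber;
    }

    String getName() {
        return name;
    }

    int getCardNumber() {
        return cardNumber;
    }
}
```

Both maps must be updated together, which is the price of getting constant-time lookups by name as well as by id.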
You need to use equals to compare String objects in java, not ==. So replace:
if (nameTest.getName() == name) {
with:
if (nameTest.getName().equals(name)) {
Try to use
nameTest.getName().equals(name)
instead of
nameTest.getName() == name
because now you're comparing references and not the value of the String.
it's explained here
Took another look at your code
Well, I took another look at your code, and the problem is that your HashMap is empty at the start of the test. So the loop body never runs: no Patron is ever added and no exception is ever thrown.
The cause of the problem is how you have used the comparison operator ==.
When you use this operator on two objects, what you test is whether both variables point to the same object.
To test two objects for value equality, you should use the equals() method, or compareTo if available.
For the String class, calling equals is sufficient to check that both strings contain the same characters.
What is the equals method for?
Comparing the values of objects.
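The difference between the two comparisons can be seen in a small sketch (the class and variable names here are made up for illustration). It also shows why two identical string literals, as in the unit test from the question, can still compare equal with ==.

```java
class StringEqualityDemo {
    public static void main(String[] args) {
        String literal = "Dude";
        String sameLiteral = "Dude";
        String copy = new String("Dude"); // a distinct object with the same characters

        System.out.println(literal == sameLiteral); // true: identical literals share one interned object
        System.out.println(literal == copy);        // false: different objects
        System.out.println(literal.equals(copy));   // true: same characters
    }
}
```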
The problem is how you compare names.

equals and hashCode: Is Objects.hash method broken?

I am using Java 7, and I have the following class below. I implemented equals and hashCode correctly, but the problem is that equals returns false in the main method below yet hashCode returns the same hash code for both objects. Can I get more sets of eyes to look at this class to see if I'm doing anything wrong here?
UPDATE: I replaced the line on which I call the Objects.hash method with my own hash function: chamorro.hashCode() + english.hashCode() + notes.hashCode(). It returns a different hash code, which is what hashCode is supposed to do when two objects are different. Is the Objects.hash method broken?
Your help will be greatly appreciated!
import org.apache.commons.lang3.StringEscapeUtils;
public class ChamorroEntry {
private String chamorro, english, notes;
public ChamorroEntry(String chamorro, String english, String notes) {
this.chamorro = StringEscapeUtils.unescapeHtml4(chamorro.trim());
this.english = StringEscapeUtils.unescapeHtml4(english.trim());
this.notes = notes.trim();
}
@Override
public boolean equals(Object object) {
if (!(object instanceof ChamorroEntry)) {
return false;
}
if (this == object) {
return true;
}
ChamorroEntry entry = (ChamorroEntry) object;
return chamorro.equals(entry.chamorro) && english.equals(entry.english)
&& notes.equals(entry.notes);
}
@Override
public int hashCode() {
return java.util.Objects.hash(chamorro, english, notes);
}
public static void main(String... args) {
ChamorroEntry entry1 = new ChamorroEntry("Åguigan", "Second island south of Saipan. Åguihan.", "");
ChamorroEntry entry2 = new ChamorroEntry("Åguihan", "Second island south of Saipan. Åguigan.", "");
System.err.println(entry1.equals(entry2)); // returns false
System.err.println(entry1.hashCode() + "\n" + entry2.hashCode()); // returns same hash code!
}
}
Actually, you happened to trigger pure coincidence. :)
Objects.hash happens to be implemented by successively adding the hash code of each given object and then multiplying the result by 31, while String.hashCode does the same with each of its characters. By coincidence, the difference in the "English" strings you used occurs exactly one position further from the end of the string than the same difference in the "Chamorro" strings, so everything cancels out perfectly. Congratulations!
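The cancellation can be checked with a few lines of arithmetic. This is just an illustrative sketch (the class name is made up); it uses the strings from the question, with Å written as the escape \u00C5 to sidestep source-encoding issues.

```java
import java.util.Objects;

class HashCancellationDemo {
    public static void main(String[] args) {
        String chamorro1 = "\u00C5guigan";
        String chamorro2 = "\u00C5guihan";
        String english1 = "Second island south of Saipan. \u00C5guihan.";
        String english2 = "Second island south of Saipan. \u00C5guigan.";

        // 'g' -> 'h' two positions before the last character: weight 31^2.
        System.out.println(chamorro2.hashCode() - chamorro1.hashCode()); // 961
        // 'h' -> 'g' three positions before the last character: weight -31^3.
        System.out.println(english2.hashCode() - english1.hashCode());   // -29791

        // Objects.hash weights the first argument by 31^2 and the second by 31,
        // so the two differences contribute 31^2 * 961 and 31 * -29791, which sum to 0.
        System.out.println(Objects.hash(chamorro1, english1, "")
                == Objects.hash(chamorro2, english2, "")); // true
    }
}
```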
Try with other strings, and you'll probably find that it works as expected. As others have already pointed out, this effect is not actually wrong, strictly speaking, since hash codes may correctly collide even if the objects they represent are unequal. If anything, it might be worthwhile trying to find a more efficient hash, but I hardly think it should be necessary in realistic situations.
There is no requirement that unequal objects must have different hashCodes. Equal objects are expected to have equal hashCodes, but hash collisions are not forbidden. return 1; would be a perfectly legal implementation of hashCode, if not very useful.
There are only 32 bits worth of possible hash codes, and an unbounded number of possible objects, after all :) Collisions will happen sometimes.
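A well-known pair of colliding strings makes the point concrete. In this sketch (class name made up for illustration) the two strings share a hash code but are not equal, and a HashSet still keeps them apart because it falls back to equals within a bucket.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false

        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        System.out.println(set.size()); // 2: equal hashes do not make the elements equal
    }
}
```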
Since a hash code is a 32-bit int value, there is always a possibility of collisions (the same hash code for two objects), but it is rare and coincidental. Your example is one such highly coincidental case. Here is the explanation.
When you call Objects.hash, it internally calls Arrays.hashCode() with logic as below:
public static int hashCode(Object a[]) {
if (a == null)
return 0;
int result = 1;
for (Object element : a)
result = 31 * result + (element == null ? 0 : element.hashCode());
return result;
}
For your 3 param hashCode, it results into below:
31 * (31 * (31 *1 +hashOfString1)+hashOfString2) + hashOfString3
For your first object. Hash value of individual Strings are:
chamorro --> 1140493257
english --> 1698758127
notes --> 0
And for second object:
chamorro --> 1140494218
english --> 1698728336
notes -->0
If you notice, first two values of the hash code in both objects are different.
But when it computes the final hash code as:
int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;
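Coincidentally, both result in the same hash code, 1919283673. The two chamorro hashes differ by 961 (31^2) and the two english hashes differ by 29791 (31^3) in the opposite direction, so after weighting (31^2 * 961 versus 31 * 29791) the differences cancel exactly; the 32-bit overflow merely wraps both sums in the same way.
Verify the theory yourself by using the code segment below: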
public static void main(String... args) {
ChamorroEntry entry1 = new ChamorroEntry("Åguigan",
"Second island south of Saipan. Åguihan.", "");
ChamorroEntry entry2 = new ChamorroEntry("Åguihan",
"Second island south of Saipan. Åguigan.", "");
System.out.println(entry1.equals(entry2)); // returns false
System.out.println("Åguigan".hashCode());
System.out.println("Åguihan".hashCode());
System.out.println("Second island south of Saipan. Åguihan.".hashCode());
System.out.println("Second island south of Saipan. Åguigan.".hashCode());
System.out.println("".hashCode());
System.out.println("".hashCode());
int hashCode1 = 31*(31*(31+1140493257) + 1698758127)+0;
int hashCode2 = 31*(31*(31+1140494218) + 1698728336)+0;
System.out.println(entry1.hashCode() + "\n" + entry2.hashCode());
System.out.println(getHashCode(
new String[]{entry1.chamorro, entry1.english, entry1.notes})
+ "\n" + getHashCode(
new String[]{entry2.chamorro, entry2.english, entry2.notes}));
System.out.println(hashCode1 + "\n" + hashCode2); // returns same hash code!
}
public static int getHashCode(Object a[]) {
if (a == null)
return 0;
int result = 1;
for (Object element : a)
result = 31 * result + (element == null ? 0 : element.hashCode());
return result;
}
If you use different string parameters, you will most likely get different hash codes.
it's not necessary for two unequal objects to have different hashes, the important thing is to have the same hash for two equal objects.
I can implement hashCode() like this :
public int hashCode() {
return 5;
}
and it will stay correct (but inefficient).

BST intersection, NullPointerException

I am trying to create a new BST from the intersection of 2 known BSTs. I am getting a NullPointerException in the intersect2 method in the second case, at the line "cur3.item.set_account_id(cur1.item.get_accountid()+ cur2.item.get_accountid());". I know you get the error when you try to dereference a variable without initializing it, but I think I am initializing it? I'm not really sure. I would appreciate the help.
public static Bst<Customer> intersect(Bst<Customer> a, Bst<Customer> b){
return( intersect2(a.root, b.root));
}
public static Bst<Customer> intersect2(BTNode<Customer> cur1, BTNode<Customer> cur2){
Bst<Customer> result = new Bst<Customer>();
// 1. both empty -> true
if (cur1==null && cur2==null){
result=null;
}
// 2. both non-empty -> compare them
else if (cur1!=null && cur2!=null) {
BTNode<Customer> cur3 = new BTNode<Customer>();
cur3.item.set_account_id(cur1.item.get_accountid()+ cur2.item.get_accountid());
result.insert(cur3.item);
intersect2(cur1.left, cur2.left);
intersect2(cur1.right, cur2.right);
}
// 3. one empty, one not -> false
else if (cur1==null ||cur2==null){
BTNode<Customer> cur3 = new BTNode<Customer>();
cur3.item=null;
intersect2(cur1.left, cur2.left);
intersect2(cur1.right, cur2.right);
}
return result;
}
A NullPointerException can be caused by a number of things. In your given example, cur1 and cur2 are not null, but there is no guarantee that cur1.item, cur1.item.accountId (and similarly for cur2) are not null.
Since you have not described the underlying implementation, I cannot assist further.
I can suggest that you do a few things:
1.) Check the implementation of your objects (if this happens EVERY time, there may be some sort of initialization problem).
2.) Whenever you create an instance of your item, do you make sure to specify the accountId field? Try giving a default value for this field so it cannot be null (try some sort of illegal value [e.g. -1, false, etc.] and test for it).
If you would post more implementation details, I (or someone) may be able to directly identify the problem.
Regards.
Edit: 4/20 @ 17:11
Here's an example of what you should be doing.
public class Customer {
private int accountId;
public Customer() {
this.accountId = 0;
}
public Customer(int account_identification) {
this.accountId = account_identification;
}
//As a side note, general practice implies fields be private
//Use a method (hence the term 'getter' and the reciprocal, 'setter')
public int getId() {
return this.accountId;
}
public void setId(int replacement_account_identification) {
this.accountId = replacement_account_identification;
}
}
It is because the item variable in Customer object is not initialized.
Does creating a BTNode automatically allocate its member item ?
You do:
cur3.item.set_account_id(.. )
For this to succeed, both cur3 and cur3.item need to be not null.
Same applies to cur1 and cur2 as well, that you reference later in that line.
And the example of the 3rd case shows that BTNode.item can be null in some scenarios:
cur3.item=null;
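The point can be illustrated with a small sketch. The `BTNode` and `Customer` classes below are hypothetical stand-ins, since the question does not show the real ones; the assumption is that a freshly constructed node leaves `item` as null, which is what the exception suggests. The `combine` helper is likewise made up for illustration.

```java
class NodeInitDemo {
    // Creates the Customer first, so node.item is never dereferenced while it is null.
    static BTNode<Customer> combine(BTNode<Customer> a, BTNode<Customer> b) {
        if (a == null || b == null || a.item == null || b.item == null) {
            return null; // nothing to combine
        }
        BTNode<Customer> node = new BTNode<>();
        node.item = new Customer(a.item.accountId + b.item.accountId);
        return node;
    }

    public static void main(String[] args) {
        BTNode<Customer> bare = new BTNode<>();
        System.out.println(bare.item == null); // true: the node exists, its item does not

        BTNode<Customer> a = new BTNode<>();
        a.item = new Customer(3);
        BTNode<Customer> b = new BTNode<>();
        b.item = new Customer(4);
        System.out.println(combine(a, b).item.accountId); // 7
    }
}

// Hypothetical stand-ins; the real classes are not shown in the question.
class BTNode<T> {
    T item;
    BTNode<T> left;
    BTNode<T> right;
}

class Customer {
    int accountId;

    Customer(int accountId) {
        this.accountId = accountId;
    }
}
```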

Returning searched results in an array in Java without ArrayList

I started down this path of implementing a simple search in an array for a hw assignment without knowing we could use ArrayList. I realized it had some bugs in it and figured I'd still try to know what my bug is before using ArrayList. I basically have a class where I can add, remove, or search from an array.
public class AcmeLoanManager
{
public void addLoan(Loan h)
{
int loanId = h.getLoanId();
loanArray[loanId - 1] = h;
}
public Loan[] getAllLoans()
{
return loanArray;
}
public Loan[] findLoans(Person p)
{
//Loan[] searchedLoanArray = new Loan[10]; // create new array to hold searched values
searchedLoanArray = this.getAllLoans(); // fill new array with all values
// Looks through only valid array values, and if Person p does not match using Person.equals()
// sets that value to null.
for (int i = 0; i < searchedLoanArray.length; i++) {
if (searchedLoanArray[i] != null) {
if (!(searchedLoanArray[i].getClient().equals(p))) {
searchedLoanArray[i] = null;
}
}
}
return searchedLoanArray;
}
public void removeLoan(int loanId)
{
loanArray[loanId - 1] = null;
}
private Loan[] loanArray = new Loan[10];
private Loan[] searchedLoanArray = new Loan[10]; // separate array to hold values returned from search
}
When testing this, I thought it worked, but I think I am overwriting my member variable after I do a search. I initially thought that I could create a new Loan[] in the method and return that, but that didn't seem to work. Then I thought I could have two arrays. One that would not change, and the other just for the searched values. But I think I am not understanding something, like shallow vs deep copying???....
The return value from getAllLoans is overwriting the searchedLoanArray reference, which means that both loanArray and searchedLoanArray are pointing at the same underlying array. Try making searchedLoanArray a local variable, and then use Arrays.copyOf. If you're trying not to use standard functions for your homework, manually create a new Loan array of the same size as loanArray, and then loop and copy the values over.
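A minimal sketch of the copy-then-filter idea, using a plain `String[]` in place of `Loan[]` so that it is self-contained (the `filterCopy` helper is a made-up name for illustration):

```java
import java.util.Arrays;

class ArrayCopyDemo {
    // Returns a new array in which non-matching entries are null; the source array is left untouched.
    static String[] filterCopy(String[] source, String wanted) {
        String[] result = Arrays.copyOf(source, source.length); // new array, same element references
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !result[i].equals(wanted)) {
                result[i] = null;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] all = {"ann", "bob", null, "ann"};
        String[] found = filterCopy(all, "ann");

        System.out.println(Arrays.toString(found)); // [ann, null, null, ann]
        System.out.println(Arrays.toString(all));   // [ann, bob, null, ann]
    }
}
```

Arrays.copyOf makes a shallow copy: the array itself is new, but the elements are the same objects, which is all that is needed here because only array slots are overwritten.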
Your searchedLoanArray and loanArray point to the same array. Doing this
private Loan[] searchedLoanArray = new Loan[10]
does nothing, as you never use that new Loan[10].
This is the key to your problem:
searchedLoanArray = this.getAllLoans()
That just points searchedLoanArray at loanArray.
You could rewrite it like this:
public Loan[] findLoans(Person p)
{
Loan[] allLoans = this.getAllLoans();
System.arraycopy(allLoans, 0, searchedLoanArray, 0, allLoans.length); // fill new array with all values
// remainder of method the same
}
But as it stands, the code still has some problems:
The maximum number of loans is fixed to the size of the array. You will avoid this problem when you switch to List<Loan>.
Using the id as an index means that your ids must be carefully generated. If IDs come from a database, you may find that the list tries to allocate a huge amount of memory to size itself to match the Id. You would be better using a Map, then the size of the map is based on the number of loans, rather than their IDs.
As the number of people and loans increases, the search time will also increase. You can reduce search time to a constant (irrespective of how many People) by using a Map<Person, List<Loan>>, which allows quick lookup of the loans associated just with that person.
Here's a version with these changes:
class AcmeLoanManager
{
public void addLoan(Loan l)
{
Person client = l.getClient();
List<Loan> loans = clientLoans.get(client);
if (loans==null)
{
loans = new ArrayList<Loan>();
clientLoans.put(client, loans);
}
loans.add(l);
allLoans.put(l.getLoanId(), l);
}
public void removeLoan(int loanId)
{
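Loan l = allLoans.remove(loanId);
if (l != null) {
// also drop the loan from its client's list
List<Loan> loans = clientLoans.get(l.getClient());
if (loans != null) loans.remove(l);
}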
}
public Collection<Loan> getAllLoans()
{
return allLoans.values();
}
public List<Loan> findLoans(Person p)
{
List<Loan> loans = clientLoans.get(p);
if (loans==null)
loans = Collections.emptyList();
return loans;
}
private Map<Integer,Loan> allLoans = new HashMap<Integer,Loan>();
private Map<Person, List<Loan>> clientLoans = new HashMap<Person,List<Loan>>();
}
I hope this helps!
What I would do is loop through the values and reassign each value to the new variable. Alternatively, you could use "deep copy" technique as described here in Javaworld: http://www.javaworld.com/javaworld/javatips/jw-javatip76.html
