Getting unique entries from the file - java

I am parsing a file with more than 4M lines in it. Each line has the form a^b^c^d^...^....
Now I want all the unique points from the file, where a point is unique if its first two entries are unique. So what I do is:
String str;
Set<String> lines = new LinkedHashSet<String>();
Set<String> set = Collections.synchronizedSet(lines);
String str1[] = str.split("\\^");
set.add(str1[0]+"^"+str1[1]);
This gives me the unique combinations of the 1st and 2nd entries from the file. However, I also want the 3rd entry (the timestamp), i.e. str1[2], associated with each of these points. The new file should be of the form:
str1[0]^str1[1]^str1[2]
How do I go about doing this?

There are a few solutions that come to mind.
Make a class for the 3 entries.
Override the equals method (and hashCode to match) and compare only the first 2 entries there, so 2 objects are equal if their first 2 entries are equal. Now add all the items to the set.
What you'll get in your set is the unique first and second points, each with the first occurrence of its timestamp.
Another solution is to keep two collections: a list with your 2 points + timestamp, and a set with only your 2 points.
Then you can call set.contains(...) to check whether you have already seen the point and, if you haven't, add it to the list with 2 points + timestamp.
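The second idea can be sketched with a single LinkedHashMap instead of two collections: the key is the first two entries and the value is the first timestamp seen for them. The class name UniquePoints, the method firstTimestamps, and the sample lines are made up for illustration, and the sketch assumes that lines with fewer than three ^-separated entries can simply be skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UniquePoints {

    // Keeps the first timestamp seen for each (first, second) pair, in input order.
    static Map<String, String> firstTimestamps(Iterable<String> lines) {
        Map<String, String> seen = new LinkedHashMap<>();
        for (String line : lines) {
            // A limit of 4 leaves the rest of the line unsplit; only three entries are needed.
            String[] parts = line.split("\\^", 4);
            if (parts.length < 3) {
                continue; // assumption: malformed lines are skipped
            }
            // putIfAbsent leaves an existing entry untouched, so the first timestamp wins.
            seen.putIfAbsent(parts[0] + "^" + parts[1], parts[2]);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a^b^100^x",
                "a^c^101^y",
                "a^b^102^z");
        for (Map.Entry<String, String> e : firstTimestamps(lines).entrySet()) {
            System.out.println(e.getKey() + "^" + e.getValue());
        }
    }
}
```

With the three sample lines this prints a^b^100 followed by a^c^101. For the real file the lines would come from a reader instead of a hard-coded list.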

Create a class containing the information you need which you will store in the set, but only care about the first two in equals/hashCode. Then you can do:
Set<Point> set = new HashSet<Point>();
String str1[] = str.split("\\^");
set.add(new Point(str1[0], str1[1], str1[2]));
Using:
public class Point {
String str1;
String str2;
String str3;
public Point(String str1, String str2, String str3) {
this.str1 = str1;
this.str2 = str2;
this.str3 = str3;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((str1 == null) ? 0 : str1.hashCode());
result = prime * result + ((str2 == null) ? 0 : str2.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Point other = (Point) obj;
if (str1 == null) {
if (other.str1 != null)
return false;
} else if (!str1.equals(other.str1))
return false;
if (str2 == null) {
if (other.str2 != null)
return false;
} else if (!str2.equals(other.str2))
return false;
return true;
}
}

Related

Is there any way to find the duplicate values i.e ArrayList from a hash map?

For example, I have declared a HashMap of the form:
Map<String, List<Tracks>> dupItems = new LinkedHashMap<>();
Tracks is a model class that contains name, address, and age.
I added items to a list of Tracks (tracks) and then added that list to the map as:
dupItems.put("Project", tracks);
dupItems.put("Report", tracks);
What I want is a list of duplicate tracks, i.e. how can I match the items on the basis of the values of the tracks? All values have to be the same: same name, same address, and same age.

If you are looking for the duplicate map values, that is, duplicate lists of Tracks:
One way you can do this is iterate over the values, putting them all in a Set data structure. If you find that the value was already in the set, then it's a duplicate, and you add it to a separate data structure that keeps a record of the duplicate values:
Set<List<Tracks>> findDuplicateValues(Map<String, List<Tracks>> dupItems) {
Set<List<Tracks>> allValues = new HashSet<>();
Set<List<Tracks>> duplicateValues = new HashSet<>();
for (List<Tracks> value : dupItems.values()) {
if (!allValues.add(value)) {
// It's a duplicate!
duplicateValues.add(value);
}
}
return duplicateValues;
}
For this to work reliably, the Tracks class has to implement the equals and hashCode methods. Comparing two objects with the same values using the equals method should return true.
On the other hand, if you are looking for duplicate Tracks values, no matter which map value list contains them, you just need to add a loop to the previous method and collect Tracks instead of lists:
Set<Tracks> findDuplicateValues(Map<String, List<Tracks>> dupItems) {
    Set<Tracks> allValues = new HashSet<>();
    Set<Tracks> duplicateValues = new HashSet<>();
    for (List<Tracks> values : dupItems.values()) {
        for (Tracks value : values) {
            // add() returns false when the value is already in the set
            if (!allValues.add(value)) {
                // It's a duplicate!
                duplicateValues.add(value);
            }
        }
    }
    return duplicateValues;
}

I'm assuming we search for duplicate Tracks instances.
public Set<Tracks> findDuplicates(Map<String, List<Tracks>> dupItems) {
Set<Tracks> all = new HashSet<>();
return dupItems.values().stream()
.flatMap(list -> list.stream()) // flatten all lists into a single stream of Tracks
.filter(t -> !all.add(t)) // add() returns false for a track already seen, so only duplicates pass
.collect(Collectors.toSet()); // collect into a set to avoid duplicates in the result
}
A Set detects duplicates only if equals() and hashCode() are correctly implemented.
My Eclipse editor generated:
private static class Tracks {
String name;
String address;
int age;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((address == null) ? 0 : address.hashCode());
result = prime * result + age;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Tracks other = (Tracks) obj;
if (address == null) {
if (other.address != null)
return false;
} else if (!address.equals(other.address))
return false;
if (age != other.age)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
You can also have a look at Apache Commons EqualsBuilder.

Java Custom Object with multiple properties as Map key or concatenation of its properties

I have a requirement where I have to aggregate a number of objects based on their properties. Each object has around 10 properties, and aggregation must be done on all of them. For example:
If there are two objects A and B of some class C with properties p1, p2, p3, ... p10 (all of type String), then these two objects must be considered equal only if all their corresponding properties are equal.
For this I have two approaches in mind, both using a HashMap in Java:
Approach 1 - Use an object of type C as the key and an Integer count as the value. Increase the count every time an existing object is found in the Map; otherwise create a new key-value pair.
HashMap<C, Integer>
But in this approach, since I have to aggregate on all the properties, I will have to write (override) an equals() method that checks all the String properties for equality, and a matching implementation of hashCode().
Approach 2 - Use as the key a single String made by concatenating all the properties of the object, and as the value a wrapper object with two properties: the object of type C and a count of type Integer.
For each object (C), create a String key by concatenating its properties. If the key already exists in the Map, get the wrapper object and update its count property; otherwise create a new key-value pair.
HashMap<String, WrapperObj>
In this approach I don't have to write equals() or hashCode() by hand to use a String as the key, and using a String as a Map key is also considered good practice.
Approach 2 seems easy to implement and efficient, as opposed to Approach 1, where all the properties are checked one by one every time equals() is called.
But I am not sure whether Approach 2 is a standard way of comparing two objects and performing this kind of operation.
Please suggest if there is any other way to implement this requirement, for example a better way to implement equals() for a key whose properties must all be taken into consideration when checking for equality.
Example -
Class whose objects needs aggregation with hash and equals implementation in case of Approach 1
public class Report {
private String p1;
private String p2;
private String p3;
private String p4;
.
.
.
private String p10;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((p1 == null) ? 0 : p1.hashCode());
result = prime * result + ((p2 == null) ? 0 : p2.hashCode());
result = prime * result + ((p3 == null) ? 0 : p3.hashCode());
result = prime * result + ((p4 == null) ? 0 : p4.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (!(obj instanceof Report))
return false;
Report other = (Report) obj;
if (p1 == null) {
if (other.p1 != null)
return false;
} else if (!p1.equals(other.p1))
return false;
if (p2 == null) {
if (other.p2 != null)
return false;
} else if (!p2.equals(other.p2))
return false;
if (p3 == null) {
if (other.p3 != null)
return false;
} else if (!p3.equals(other.p3))
return false;
if (p4 == null) {
if (other.p4 != null)
return false;
} else if (!p4.equals(other.p4))
return false;
.
.
.
if (p10 == null) {
if (other.p10 != null)
return false;
} else if (!p10.equals(other.p10))
return false;
return true;
}
}
Code For aggregation Approach 1-
Map<Report, Integer> map = new HashMap<Report, Integer>();
for(Report report : reportList) {
if(map.get(report) != null)
map.put(report, map.get(report)+1);
else
map.put(report, 1);
}
Approach 2 - With wrapper class and not implementing equals and hash for Report class.
public class Report {
private String p1;
private String p2;
private String p3;
private String p4;
public String getP1() {
return p1;
}
public void setP1(String p1) {
this.p1 = p1;
}
public String getP2() {
return p2;
}
public void setP2(String p2) {
this.p2 = p2;
}
public String getP3() {
return p3;
}
public void setP3(String p3) {
this.p3 = p3;
}
public String getP4() {
return p4;
}
public void setP4(String p4) {
this.p4 = p4;
}
}
Report wrapper class -
public class ReportWrapper {
private Report report;
private Integer count;
public Report getReport() {
return report;
}
public void setReport(Report report) {
this.report = report;
}
public Integer getCount() {
return count;
}
public void setCount(Integer count) {
this.count = count;
}
}
Code For aggregation Approach 2-
Map<String, ReportWrapper> map = new HashMap<String,
ReportWrapper>();
for(Report report : reportList) {
String key = report.getP1() + ";" + report.getP2() +
";" + report.getP3() +
";" + .....+ ";" + report.getP10();
ReportWrapper rw = map.get(key);
if(rw != null) {
rw.setCount(rw.getCount()+1);
map.put(key, rw);
}
else {
ReportWrapper wrapper = new ReportWrapper();
wrapper.setReport(report);
wrapper.setCount(1);
map.put(key, wrapper);
}
}
P.S.: Here I am mostly concerned with which approach is better.

Consider using equals and hashCode methods generated by your IDE, or use a tool like Lombok, which generates them from an annotation so you don't have to write any code.
For Lombok:
https://projectlombok.org/features/EqualsAndHashCode
How to use @EqualsAndHashCode With Include - Lombok
This is what IDEA generates if you want to go that route. No special process required.
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
Report report = (Report) o;
return Objects.equals(prop1, report.prop1) &&
Objects.equals(prop2, report.prop2) &&
Objects.equals(prop3, report.prop3) &&
Objects.equals(prop4, report.prop4) &&
Objects.equals(prop5, report.prop5) &&
Objects.equals(prop6, report.prop6) &&
Objects.equals(prop7, report.prop7) &&
Objects.equals(prop8, report.prop8) &&
Objects.equals(prop9, report.prop9);
}
@Override
public int hashCode() {
return Objects.hash(prop1, prop2, prop3, prop4, prop5, prop6, prop7, prop8, prop9);
}
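As a rough sketch of how Approach 1 looks with Objects-based equals/hashCode, and of one pitfall of the concatenated key in Approach 2: the class MiniReport below is a hypothetical three-property stand-in for the ten-property Report, and countByObject and concatenatedKey are names made up for this example. Map.merge replaces the get/put pair used in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class ReportCounting {

    // Hypothetical three-property stand-in for the ten-property Report class.
    static final class MiniReport {
        final String p1, p2, p3;

        MiniReport(String p1, String p2, String p3) {
            this.p1 = p1;
            this.p2 = p2;
            this.p3 = p3;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MiniReport)) return false;
            MiniReport other = (MiniReport) o;
            return Objects.equals(p1, other.p1)
                    && Objects.equals(p2, other.p2)
                    && Objects.equals(p3, other.p3);
        }

        @Override
        public int hashCode() {
            return Objects.hash(p1, p2, p3);
        }

        // Approach 2 style key: joins the properties with ';'.
        String concatenatedKey() {
            return p1 + ";" + p2 + ";" + p3;
        }
    }

    // Approach 1: the object itself is the key; merge() stores 1 or adds 1 to the old count.
    static Map<MiniReport, Integer> countByObject(List<MiniReport> reports) {
        Map<MiniReport, Integer> counts = new HashMap<>();
        for (MiniReport r : reports) {
            counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        MiniReport a = new MiniReport("x;y", "z", "w");
        MiniReport b = new MiniReport("x", "y;z", "w");
        System.out.println(countByObject(Arrays.asList(a, b, a)).get(a)); // 2
        // Two different objects produce the same concatenated key "x;y;z;w".
        System.out.println(a.concatenatedKey().equals(b.concatenatedKey())); // true
    }
}
```

The second line of output shows why a concatenated key is only safe when the delimiter can never occur inside a property value; the object key has no such restriction.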

Updating the objects of a Set in Java

I am trying to read from a file and count how many times each string appears in the file. I am using a HashSet of Item objects, where Item is a class I have created.
In my main I read the file and add each String in the file to my set. While adding, I try to increment the count of an item in the set that appears more than once. Here's my implementation of Item:
package pack;
public class Item {
public String name;
public int count=1;
public Item(String name)
{
this.name = name;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + count;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Item other = (Item) obj;
if (count != other.count)
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}
For an input file like this :
chair table teapot
teapot book table
chair floor ceiling
wall chair floor
it is giving an output as follows :
wall appears 1 times
book appears 1 times
table appears 2 times
floor appears 2 times
teapot appears 2 times
chair appears 1 times
ceiling appears 1 times
chair appears 2 times
Here the set has duplicate elements, which I don't want. What is the correct way to update the objects inside a set?

I think this will help you.
Create a list of all keywords and use the code below.
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("a");
list.add("b");
list.add("a");
// get all Unique keywords
Set<String> set = new HashSet<String>(list);
for(String keyword: set){
System.out.println(keyword + ": " + Collections.frequency(list, keyword));
}
}
output (HashSet iteration order is not guaranteed):
a: 2
b: 1

Your Item class uses the count field in its definition of equals and hashCode. This means that when you call set.contains(i) for the second occurrence of the string, contains will return true since count==1. You then increment count, and when you call set.contains(i) for the third occurrence of the string contains will return false, since the count of the Item in the set does not match the count of the Item you are passing to contains.
To fix this, you should change your definition of equals and hashCode to consider only the string and not the count.
This implementation will work, but is overly complex. You could simply create a Map<String, Integer> and increase the Integer (count) each time you see a new occurrence of the string.

Your implementation is right; the only problem is in the equals method of your Item class.
The equals method also uses the count variable, but name is the only unique field in that class. You have treated count + name as the unique key, which causes the problem.

HashSet uses hashCode and equals to determine identity, so you should change hashCode and equals to work with the name only when you don't want to include the count of items in the test for equality:
package pack;
public class Item {
public String name;
public int count=1;
public Item(String name)
{
this.name = name;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Item other = (Item) obj;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
}

I think the problem is in your equals method, when you do this check:
if (count != other.count)
return false;

Have you considered using a HashMap for your problem? Put the name in the key and the counter in the value. This way you don't need an Item class at all.
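A minimal sketch of the HashMap idea, using the sample input from the question. The class name WordCount and the method count are made up for this example, and words are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCount {

    // Counts occurrences of each whitespace-separated word across all lines.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Stores 1 for a new word, otherwise adds 1 to the existing count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "chair table teapot",
                "teapot book table",
                "chair floor ceiling",
                "wall chair floor");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + " appears " + e.getValue() + " times");
        }
    }
}
```

Because the word itself is the key, chair is reported once with a count of 3 instead of appearing twice as in the HashSet version.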

Java Collection removeAll not removing a thing

So I have an old list, a new list, and a unique list. I read in the data from each list (old/new) and make a bunch of objects from my class file. Then I add the newList to the unique, followed by my removing the old list to determine the unique Users.
CLASS
public class User {
private String fName;
private String mInitial;
private String lName;
private String age;
private String city;
private String state;
... // set and get methods
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((age == null) ? 0 : age.hashCode());
result = prime * result + ((city == null) ? 0 : city.hashCode());
result = prime * result + ((fName == null) ? 0 : fName.hashCode());
result = prime * result + ((lName == null) ? 0 : lName.hashCode());
result = prime * result
+ ((mInitial == null) ? 0 : mInitial.hashCode());
result = prime * result + ((state == null) ? 0 : state.hashCode());
return result;
}
@Override
public boolean equals(Object o) {
if(o == null) return false;
if (getClass() != o.getClass()) return false;
User other = (User) o;
if(this.fName != other.fName) return false;
if(! this.mInitial.equals(other.mInitial)) return false;
if(! this.lName.equals(other.lName)) return false;
if(! this.age.equals(other.age)) return false;
if(! this.city.equals(other.city)) return false;
if(! this.state.equals(other.state)) return false;
return true;
}
}
MAIN
try {
// List creation (new, old, unique)
List<User> listNew = new ArrayList<User>();
List<User> listOld = new ArrayList<User>();
Collection<User> listUnique = new HashSet<User>();
// Read the files in with while loop,
// ...
// Put them in their respective list
// ...
listUnique.addAll(listNew);
System.out.println("Junk... " + listUnique.size());
listUnique.removeAll(listOld);
// Checking the sizes of lists to confirm stuff is working or not
System.out.println(
"New: \t" + listNew.size() + "\n" +
"Old: \t" + listOld.size() + "\n" +
"Unique: " + listUnique.size() + "\n"
);
}
catch { ... }
OUTPUT
Junk... 20010
New: 20010
Old: 20040
Unique: 20010
So basically it is adding the content to the list but the removeAll doesn't work. Could this be a problem with my hashCode() in my User Class file? I just cannot figure out why it's not working. (Note: I auto generated my hashCode in the class file, not sure if that's a bad idea)
Thanks for any help!

As Takendarkk pointed out, it might be because you are comparing references instead of values for the fName string (this.fName != other.fName). If the two strings come from different sources they are different objects, so they are treated as unequal even if they have the same value. Use equals() there, as you do for the other fields.
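A small sketch of the fix, assuming a reduced two-field class: SimpleUser and onlyInNew are hypothetical names standing in for the User class and the addAll/removeAll steps in the question. Objects.equals compares string values and also tolerates null fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class RemoveAllDemo {

    // Hypothetical two-field stand-in for the User class in the question.
    static final class SimpleUser {
        final String fName;
        final String lName;

        SimpleUser(String fName, String lName) {
            this.fName = fName;
            this.lName = lName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            SimpleUser other = (SimpleUser) o;
            // Objects.equals compares values; != on Strings would compare references.
            return Objects.equals(fName, other.fName)
                    && Objects.equals(lName, other.lName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fName, lName);
        }
    }

    // Returns the users that are in newUsers but not in oldUsers.
    static Set<SimpleUser> onlyInNew(List<SimpleUser> newUsers, List<SimpleUser> oldUsers) {
        Set<SimpleUser> unique = new HashSet<>(newUsers);
        unique.removeAll(oldUsers);
        return unique;
    }

    public static void main(String[] args) {
        // new String(...) forces distinct references, as with strings read from a file.
        List<SimpleUser> newUsers = Arrays.asList(
                new SimpleUser(new String("Ann"), "Lee"),
                new SimpleUser(new String("Bob"), "Ray"));
        List<SimpleUser> oldUsers = Arrays.asList(
                new SimpleUser(new String("Ann"), "Lee"));
        System.out.println(onlyInNew(newUsers, oldUsers).size()); // 1
    }
}
```

With a reference comparison on fName, the two Ann users would never be equal and removeAll would remove nothing, which matches the sizes reported in the question.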

How to order a unique list based on 2 object attributes in Java

I have a list of objects I'm referring to as Artifacts. I need to sort alphabetically by the "Name" attribute and in numerical order by an attribute that Artifact has called "Level".
The Level is not always set in Artifact and in that case the entire collection should be alphabetical. If the Artifact has a Level then that takes precedence and should be sorted by order of Level.
The Artifacts need to be unique based upon the Name attribute. I could use a Set collection and override the equals method of the Artifact to sort Alphabetically. However, when I want to sort by Level, the equals method relevant to Name will throw off the results of this sort.
What collections and object structure should I use to remain unique by Name but also be able to sort by Level?

You'll want to look at the Comparable interface and the Comparator interface. Implement Comparable if this is the only way your objects can be compared; use a Comparator otherwise.

I think java.util.TreeSet is a good container for your problem. It is a Set and it uses the Comparable mechanism.
So you have two options:
1) pass a Comparator to the TreeSet constructor
2) make your Artifact implement Comparable
TIP: In your compareTo method you can use the compareTo method of String.
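Note that a TreeSet treats two elements as duplicates when the comparison returns 0, so a comparison on level and then name does not by itself enforce uniqueness by name alone. One way to keep the two concerns apart, as a sketch: deduplicate by name with a map, then sort the survivors with a Comparator. The Artifact class, the name uniqueByNameSorted, and the choice to place artifacts without a level last are assumptions made for this example; names are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArtifactSorting {

    // Hypothetical minimal Artifact; level may be null when it is not set.
    static final class Artifact {
        final String name;
        final Integer level;

        Artifact(String name, Integer level) {
            this.name = name;
            this.level = level;
        }

        @Override
        public String toString() {
            return level + " " + name;
        }
    }

    // Level first (artifacts without a level go last), then name as the tie-breaker.
    static final Comparator<Artifact> BY_LEVEL_THEN_NAME =
            Comparator.comparing((Artifact a) -> a.level,
                            Comparator.nullsLast(Comparator.<Integer>naturalOrder()))
                    .thenComparing(a -> a.name);

    static List<Artifact> uniqueByNameSorted(List<Artifact> input) {
        // Step 1: uniqueness by name only; the first artifact with a given name wins.
        Map<String, Artifact> byName = new LinkedHashMap<>();
        for (Artifact a : input) {
            byName.putIfAbsent(a.name, a);
        }
        // Step 2: ordering is a separate concern handled by the comparator.
        List<Artifact> sorted = new ArrayList<>(byName.values());
        sorted.sort(BY_LEVEL_THEN_NAME);
        return sorted;
    }

    public static void main(String[] args) {
        List<Artifact> input = Arrays.asList(
                new Artifact("b", 2),
                new Artifact("a", null),
                new Artifact("c", 1),
                new Artifact("b", 5));
        System.out.println(uniqueByNameSorted(input)); // [1 c, 2 b, null a]
    }
}
```

If no artifact has a level, every level comparison ties and the result is purely alphabetical by name.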

The code below will sort the set, giving precedence to the level and then the name. If a level is null, it will be placed at the beginning, as if it were level 0. For null names, the Artifact will be positioned as if it had an empty name. Hope that helps.
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
public class Artifact implements Comparable<Artifact> {
private String name;
private Integer level;
public Artifact(String name, Integer level) {
this.name = name;
this.level = level;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((level == null) ? 0 : level.hashCode());
result = prime * result + ((name == null) ? 0 : name.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Artifact other = (Artifact) obj;
if (level == null) {
if (other.level != null)
return false;
} else if (!level.equals(other.level))
return false;
if (name == null) {
if (other.name != null)
return false;
} else if (!name.equals(other.name))
return false;
return true;
}
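@Override
public int compareTo(Artifact o) {
    // Treat a null level as 0 and a null name as "" on both sides,
    // so comparing against an Artifact with null fields cannot throw.
    int thisLevel = (level == null) ? 0 : level;
    int otherLevel = (o.level == null) ? 0 : o.level;
    if (thisLevel != otherLevel) {
        return Integer.compare(thisLevel, otherLevel);
    }
    String thisName = (name == null) ? "" : name;
    String otherName = (o.name == null) ? "" : o.name;
    return thisName.compareTo(otherName);
}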
public String toString() {
return level + " " + name;
}
public static void main(String[] args) {
Artifact a1 = new Artifact("a", 1);
Artifact a2 = new Artifact("a", 2);
Artifact a3 = new Artifact("a", 3);
Artifact b1 = new Artifact("b", 1);
Artifact b2 = new Artifact("b", 2);
Artifact b2a = new Artifact("b", 2);
Artifact nullLevel = new Artifact("a",null);
Artifact nullName = new Artifact(null,2);
SortedSet<Artifact> set = new TreeSet<Artifact>();
set.add(a1);
set.add(a2);
set.add(a3);
set.add(b1);
set.add(b2);
set.add(b2a);
set.add(nullLevel);
set.add(nullName);
System.out.println(Arrays.toString(set.toArray()));
}
}
