Compare two lists of unequal lengths and remove partial matches? - java

Let's say I have two lists like:
List1 = Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department
List2 = Atlanta Police Department, Fulton Tax Commissioner, Fulton Health Department,Grady Hospital
I want my final list to look like this:
Final List = Fulton Tax Commissioner 's Office,Grady Hospital,Fulton Health Department,Atlanta Police Department
I can remove duplicates from these lists by adding both the lists to a set. But how do I remove partial matches like Fulton Tax Commissioner?

I suggest: Set the result to a copy of list 1. For each member of list 2:
- If the result contains the same member, skip it.
- If the result contains a member that starts with the list 2 member, also skip the list 2 member.
- If the result contains a member that is a prefix of the list 2 member, replace it by the list 2 member.
- Otherwise add the list 2 member to the result.
If using Java 8, the tests in the 2nd and 3rd bullets can be conveniently done with streams, for example result.stream().anyMatch(s -> s.startsWith(list2Member));.
There is room for optimization, for example using a TreeSet (if it’s OK to sort the items).
Edit: In Java:
List<String> result = new ArrayList<>(list1);
for (String list2Member : list2) {
    if (result.stream().anyMatch(s -> s.startsWith(list2Member))) { // includes case where list2Member is in result
        // skip
    } else {
        OptionalInt resultIndex = IntStream.range(0, result.size())
                .filter(ix -> list2Member.startsWith(result.get(ix)))
                .findAny();
        if (resultIndex.isPresent()) {
            result.set(resultIndex.getAsInt(), list2Member);
        } else {
            result.add(list2Member);
        }
    }
}
The result is:
[Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department, Atlanta Police Department]
I believe this is exactly the result you asked for.
Further edit: In Java 9 you may use (not tested):
resultIndex.ifPresentOrElse(ix -> result.set(ix, list2Member), () -> result.add(list2Member));
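The TreeSet optimization mentioned above could look roughly like the sketch below. It is not part of the original answer: the class name PrefixMergeSketch and the method merge are made up for illustration, and the result comes out sorted rather than in the original order. It relies on the fact that all strings starting with a given prefix sit next to each other in a sorted set, so a single ceiling() lookup answers the "is an equal or longer entry already there?" question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not from the original answer.
class PrefixMergeSketch {

    /** Merges list2 into list1; when one entry is a prefix of another, only the longer one is kept. */
    static TreeSet<String> merge(List<String> list1, List<String> list2) {
        TreeSet<String> result = new TreeSet<>(list1);
        for (String member : list2) {
            // Strings that start with `member` sort at or after it and are contiguous,
            // so looking at the smallest element >= member is sufficient.
            String ceiling = result.ceiling(member);
            if (ceiling != null && ceiling.startsWith(member)) {
                continue; // an equal or longer entry is already present
            }
            // Look for an existing entry that is a proper prefix of `member`
            // and drop the first one found, mirroring the "replace" step.
            for (int len = 1; len < member.length(); len++) {
                if (result.remove(member.substring(0, len))) {
                    break;
                }
            }
            result.add(member);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList(
                "Fulton Tax Commissioner 's Office", "Grady Hospital", "Fulton Health Department");
        List<String> list2 = Arrays.asList(
                "Atlanta Police Department", "Fulton Tax Commissioner",
                "Fulton Health Department", "Grady Hospital");
        System.out.println(merge(list1, list2));
    }
}
```

The prefix search costs one set lookup per character of the list 2 member, which is usually cheaper than scanning the whole result for long lists.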

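Add the items to a TreeSet constructed with a comparator, like below:
Set<String> s = new TreeSet<>(new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // One possible way to say that a partial match is considered the same:
        // treat two strings as equal when one is a prefix of the other.
        if (o1.startsWith(o2) || o2.startsWith(o1)) {
            return 0;
        }
        return o1.compareTo(o2);
    }
});
s.addAll(yourList);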


Stream - filter based on hashMap value

I want to start from a collection of diploma projects and by using stream I want to get an arrayList of diploma project titles, from the students that have taken a course identified by courseId. They will also need to have passed the course with grade of 2 or higher.
I have this DiplomaProject class:
public class DiplomaProject{
    String title;
    ArrayList<Student> authors;
}
Each diplomaProject can have multiple authors.
I have this Course class:
public class Course{
String courseName;
String courseId;
}
This Student class:
public class Student{
String name;
HashMap<Course, Integer> courseList;
DiplomaProject diplomaProject;
}
The grade of the course is the Integer value of courseList.
This is my current solution, but I know it does not do what I want. I can't find a way to filter based on the value of the courseList, and I do not know how I can get the diplomaProject titles at the end of the stream (they are only available at the top level).
public static List<String> diplomaProjectTitle(List<DiplomaProject> diplomaProjects) {
    return diplomaProjects.stream()
        .map(diplomaProject -> diplomaProject.authors)
        .flatMap(students -> students.stream())
        .filter(student -> student.courseList.keySet().equals("math1001"))
        .flatMap(student -> student.courseList.keySet().stream())
        .map(student -> student.courseName)
        .collect(Collectors.toList());
}
You are losing the info on the diploma projects with the .map functions. What you want to do is operate within the .filter() function of the first diploma project stream.
Therefore:
public List<String> diplomaProjectTitles(List<DiplomaProject> diplomaProjects) {
    return diplomaProjects.stream()
        .filter(projects -> projects.getAuthors().stream().map(Student::getCourseList)
            //get the course matching this one (both name and id)
            .map(c -> c.get(new Course("math101", "1")))
            //check if that course has grade greater than the minimum
            .anyMatch(grade -> grade>=2))
        .map(DiplomaProject::getTitle)
        .collect(Collectors.toList());
}
For this to work, though, you would need to modify your Course class. Since you are using it as a key in a hash map and want to look it up with a newly created instance, you will need to add the hashCode() and equals() methods.
public class Course {
    private String courseName;
    private String courseId;

    @Override
    public int hashCode() {
        return courseName.hashCode() + courseId.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if(o instanceof Course oc) {
            return oc.getCourseName().equals(this.getCourseName()) && oc.getCourseId().equals(this.getCourseId());
        }
        return false;
    }
    //getters and setters
}
In order to test it I created a simple method that prepares a test case
public void filterStudents() {
    List<DiplomaProject> diplomaProjects = new ArrayList<>();
    List<Course> courses = new ArrayList<>();
    courses.add(new Course("math101", "1"));
    courses.add(new Course("calc101", "2"));
    courses.add(new Course("calc102", "3"));
    List<Student> students = new ArrayList<>();
    Map<Course, Integer> courseMap = Map.of(courses.get(0), 3, courses.get(1), 1);
    students.add(new Student("TestSubj", courseMap));
    Map<Course, Integer> courseMap2 = Map.of(courses.get(0), 1, courses.get(1), 3);
    students.add(new Student("TestSubj2", courseMap2));
    diplomaProjects.add(new DiplomaProject("Project1", students));
    diplomaProjects.add(new DiplomaProject("Project2", List.of(students.get(1))));
    log.info("Diploma projects are " + diplomaProjectTitles(diplomaProjects));
}
This way, Project1 will have one student with math101 at grade 3 and one at grade 1, and Project2 will have a student with math101 at grade 1. As expected, the filtering method returns only Project1.
I want to get a List of diploma project titles, from the students that have taken a Course identified by the given courseId. They will also need to have passed the course with grade of 2 or higher.
In your method diplomaProjectTitle you're actually losing access to the titles of the diploma projects at the very beginning of the stream pipeline, because the very first operation extracts the authors from the stream element.
You need the stream to remain of type Stream<DiplomaProject> in order to get a list of diploma project titles as a result. Therefore, all the logic needed to select the desired diploma projects should reside in the filter() operation.
That's how it might be implemented:
public static List<String> diplomaProjectTitle(List<DiplomaProject> diplomaProjects,
                                               String courseId,
                                               Integer grade) {
    return diplomaProjects.stream()
        .filter(diplomaProject -> diplomaProject.getAuthors().stream()
            .anyMatch(student -> student.getCourseList().entrySet().stream()
                // the map is keyed by Course, not by String, so match on the course's id
                .anyMatch(entry -> entry.getKey().getCourseId().equals(courseId)
                    && entry.getValue() >= grade)
            )
        )
        .map(DiplomaProject::getTitle)
        .toList(); // or .collect(Collectors.toList()) for JDK version earlier than 16
}
A couple of important notes:
Avoid public fields; access fields from outside the class via getters rather than directly.
Pay attention to the names of your method, variables, etc. The name courseList is confusing because it's actually not a List. This map should rather be named gradeByCourse to describe its purpose in a clear way.
Leverage abstractions - write your code against interfaces. See What does it mean to "program to an interface"?
Pay attention to the types you're working with. In keySet().equals("math1001"), even an IDE can tell you that something is wrong, because a Set can never be equal to a String.
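As a rough illustration of the notes above, here is a minimal sketch of how the classes might look with private fields, getters, a field declared against the Map interface, and the gradeByCourse name. The class names CourseSketch and StudentSketch and the helper hasPassed are hypothetical and chosen only for this example; they are not from the question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's Course class.
final class CourseSketch {
    private final String courseId;
    private final String courseName;

    CourseSketch(String courseId, String courseName) {
        this.courseId = courseId;
        this.courseName = courseName;
    }

    String getCourseId() { return courseId; }
    String getCourseName() { return courseName; }

    // equals/hashCode are needed because instances are used as map keys
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CourseSketch)) {
            return false;
        }
        CourseSketch other = (CourseSketch) o;
        return courseId.equals(other.courseId) && courseName.equals(other.courseName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(courseId, courseName);
    }
}

// Hypothetical stand-in for the question's Student class.
final class StudentSketch {
    private final String name;
    // declared as Map (the interface), named after what it holds
    private final Map<CourseSketch, Integer> gradeByCourse;

    StudentSketch(String name, Map<CourseSketch, Integer> gradeByCourse) {
        this.name = name;
        this.gradeByCourse = new HashMap<>(gradeByCourse);
    }

    String getName() { return name; }

    Map<CourseSketch, Integer> getGradeByCourse() {
        return Collections.unmodifiableMap(gradeByCourse);
    }

    /** True if this student has the course with the given id at minGrade or better. */
    boolean hasPassed(String courseId, int minGrade) {
        for (Map.Entry<CourseSketch, Integer> entry : gradeByCourse.entrySet()) {
            if (entry.getKey().getCourseId().equals(courseId) && entry.getValue() >= minGrade) {
                return true;
            }
        }
        return false;
    }
}
```

With a helper like hasPassed, the stream filter reduces to anyMatch(student -> student.hasPassed(courseId, grade)).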
A step-by-step way of thinking:
We need to filter projects based on the criteria that these have at least one author (student) who has passed a specific course (courseId) with a grade >= 2 (another filter).
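dipProjects.stream()
    .filter(p -> p.getAuthors().stream()
        .anyMatch(s -> s.getCourseList().entrySet().stream()
            // courseList is keyed by Course, so compare the course's id with courseId
            .anyMatch(e -> e.getKey().getCourseId().equals(courseId) && e.getValue() >= grade)))
    .map(p -> p.getTitle())
    .collect(Collectors.toList());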

How to find the last object within a search condition in a lambda stream and remove it

I have a list of books. Some of these books are the same; however, they are not duplicates, because they differ in a variable called copyNumber. Example:
Book1(Title: Spiderman, ISBN: 1111, CopyNo: 1)
Book2(Title: Spiderman, ISBN: 1111, CopyNo: 2)
Book3(Title: Spiderman, ISBN: 1111, CopyNo: 3)
Book4(Title: Alice in wonderland, ISBN: 2222, CopyNo: 1)
Book5(Title: Alice in wonderland, ISBN: 2222, CopyNo: 2)
My goal is to list all these books in order of their copy numbers, find the book (and its copies) via its ISBN, then delete the last copy. So if I were to call deleteBook("1111"); for the example above, I would delete the 3rd copy of Spiderman. If I were to call deleteBook("2222"); I would delete the second copy of Alice in wonderland.
public void deleteBook(String isbn){
    Collections.sort(books, new OrderBooksByCopyNumber());
    Book book = books.stream().filter((b) -> (b.getISBNNumber().equals(isbn))).findFirst().get();
    books.remove(book);
}
The code above is close to what I want; however, I don't want findFirst(), I want to do the equivalent of findLast(). Below is my comparator, in case it's of any relevance. Thanks to anyone who can help!
private class OrderBooksByCopyNumber implements Comparator<Book>{
    @Override
    public int compare(Book o1, Book o2) {
        return o1.getCopyNumber() < o2.getCopyNumber() ? -1 : o1.getCopyNumber() == o2.getCopyNumber() ? 0 : 1;
    }
}
There are several problems with your code.
First, there is no reason to write your own comparator when all you want is to compare the copy numbers. The comparator you wrote is equivalent to Comparator.comparingInt(Book::getCopyNumber).
Second, if you are using the get method of an Optional, you are probably doing something wrong. In this case, you are not considering the case when the Optional is empty - when no books match the given ISBN. Instead, you should be doing something like
...findFirst().ifPresent( book -> books.remove(book) );
Third, you are sorting the collection. This leaves it sorted by copy number as a side effect - which is probably not what you intended. Also, it takes O(N log N) time to sort it - when all you need is the maximum among the matching books, which shouldn't require more than O(N).
What you should do instead is:
public void deleteBook(String isbn){
    books.stream()
        .filter((b) -> (b.getISBNNumber().equals(isbn)))
        .max(Comparator.comparingInt(Book::getCopyNumber))
        .ifPresent(book->books.remove(book));
}
Note that you are still traversing the list twice, because the remove method also has to iterate the book collection one by one to find the matching book to delete. It may be more efficient to forget the stream and just work with a good old ListIterator.
int maxCopyNumber = -1;
int bookIndex = -1;
ListIterator<Book> iter = books.listIterator(); // typed iterator, so next() returns a Book
while ( iter.hasNext() ) {
    Book book = iter.next();
    if ( book.getISBNNumber().equals(isbn) && book.getCopyNumber() > maxCopyNumber ) {
        maxCopyNumber = book.getCopyNumber();
        bookIndex = iter.previousIndex();
    }
}
if ( bookIndex >= 0 ) {
    books.remove(bookIndex);
}
If your list is an ArrayList or similar, this avoids searching it a second time: remove(int) goes straight to the index, although it still has to shift the elements that follow it.
Finally, consider using a different data structure altogether! For example, a Map<String,List<Book>>, where the key is the ISBN, and each entry is the list of books, ordered by copy number. This will allow you to reach the list directly, and just remove the last element. (If the list becomes empty, delete the entry altogether, or check it's not empty to begin with).
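A minimal sketch of that map-based layout, under the assumption that copies are always appended in copy-number order. The class BookShelfSketch, its nested Book class and the methods addCopy and copyCount are invented for this illustration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container; the question's own Book class is not shown, so a tiny one is defined here.
class BookShelfSketch {

    static final class Book {
        final String title;
        final String isbn;
        final int copyNumber;

        Book(String title, String isbn, int copyNumber) {
            this.title = title;
            this.isbn = isbn;
            this.copyNumber = copyNumber;
        }
    }

    // key: ISBN, value: copies in ascending copy-number order
    private final Map<String, List<Book>> copiesByIsbn = new HashMap<>();

    /** Appends a new copy; its copy number is one more than the current number of copies. */
    void addCopy(String title, String isbn) {
        List<Book> copies = copiesByIsbn.computeIfAbsent(isbn, key -> new ArrayList<>());
        copies.add(new Book(title, isbn, copies.size() + 1));
    }

    /** Removes the copy with the highest copy number; returns false if the ISBN is unknown. */
    boolean deleteBook(String isbn) {
        List<Book> copies = copiesByIsbn.get(isbn);
        if (copies == null) {
            return false;
        }
        copies.remove(copies.size() - 1); // last element = highest copy number
        if (copies.isEmpty()) {
            copiesByIsbn.remove(isbn); // drop the entry so empty lists never linger
        }
        return true;
    }

    int copyCount(String isbn) {
        List<Book> copies = copiesByIsbn.get(isbn);
        return copies == null ? 0 : copies.size();
    }
}
```

Here deleteBook needs one hash lookup and one removal at the end of an ArrayList, with no sorting or scanning.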
You could use stream.skip(), passing in the number of books with the given ISBN minus 1, and then call findFirst().get().
Documented here:
https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html
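A sketch of how the skip() idea might be spelled out, assuming the matching books are sorted by copy number inside the stream so that the last one is the highest copy. The class SkipSketch and its nested Book class are hypothetical, and ifPresent is used instead of get() so that an unknown ISBN is simply ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; Book is a minimal stand-in for the question's class.
class SkipSketch {

    static final class Book {
        private final String isbn;
        private final int copyNumber;

        Book(String isbn, int copyNumber) {
            this.isbn = isbn;
            this.copyNumber = copyNumber;
        }

        String getIsbn() { return isbn; }
        int getCopyNumber() { return copyNumber; }
    }

    /** Removes the copy with the highest copy number for the given ISBN, if there is one. */
    static void deleteBook(List<Book> books, String isbn) {
        long matches = books.stream()
                .filter(b -> b.getIsbn().equals(isbn))
                .count();
        if (matches == 0) {
            return; // nothing to delete
        }
        books.stream()
                .filter(b -> b.getIsbn().equals(isbn))
                .sorted(Comparator.comparingInt(Book::getCopyNumber))
                .skip(matches - 1)          // skip everything except the last match
                .findFirst()
                .ifPresent(books::remove);  // the stream has finished by the time this runs
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>(Arrays.asList(
                new Book("1111", 1), new Book("1111", 2), new Book("1111", 3),
                new Book("2222", 1), new Book("2222", 2)));
        deleteBook(books, "1111");
        System.out.println(books.size());
    }
}
```

This walks the list twice (once to count, once to pick), so max() is usually the simpler choice; the sketch only shows that skip() can express the same thing.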
First, sort the books with the given ISBN. Second, find the book with the max CopyNo. Third, delete that book.
public static void deleteBook(String isbn, List<Book> books) {
    Book book = books.stream()
            .filter((b) -> (b.getISBN().equals(isbn)))
            .sorted(Comparator.comparingInt(Book::getCopyNo))
            .reduce((book1, book2) -> book2).orElse(null);
    books.remove(book);
}
Thanks @RealSkeptic: sorting in reverse order and taking the first element is more direct.
public static void deleteBook(String isbn, List<Book> books) {
    Book book = books.stream()
            .filter((b) -> (b.getISBN().equals(isbn)))
            .sorted((o1, o2) -> Integer.compare(o2.getCopyNo(), o1.getCopyNo())) // highest copy number first
            .findFirst().orElse(null);
    books.remove(book);
}

List of 10 Taxpayers who spent the most

I need to return a List, or a Collection in general, that gives me the 10 taxpayers who spent the most in the entire system. The classes are divided in User, Taxpayer (which extends User) and Expense, and in my main class Main I have a Map holding every single value for Users and Expenses, respectively a Map<String, User> users and a Map<String, Expense> expenses.
The first step would be to go through the Map of users and check if it's a Taxpayer , then for that Taxpayer get all the Expenses he has done. Inside each expense there's a variable called Value with a getValue method to return the Value.
I've tried to do it but I was having a problem in updating the Collection if the next Taxpayer had a higher sum on Expense values than the one on the "end" of the Collection.
Also, I would prefer if this wasn't done in Java 8 since I'm not very comfortable with it and there's more conditions that I would need to set in the middle of the method.
Edit (what I have until now):
public List<Taxpayer> getTenTaxpayers(){
    List<Taxpayer> list = new ArrayList<Taxpayer>();
    for(User u: this.users.values()){
        if(!u.getUserType()){ // if it is a Taxpayer
            Taxpayer t = (Taxpayer) u;
            double sum = 0;
            for(Expense e: this.expenses.values()){
                if(t.getNIF().equals(e.getNIFClient())){ //NIF is the code that corresponds to the Taxpayer. If the expense belongs to this Taxpayer, enters the if statement.
                    sum += e.getValue();
                    if(list.size()<10){
                        list.add(t.clone());
                    }
                }
            }
        }
    }
}
So if I understand correctly, once you already have 10 Taxpayers in your list, you are struggling with how to add another taxpayer while keeping only the top 10 "spenders".
One way to approach this is to gather the expenses of all your Taxpayers and add them all to your list. Then sort the list in reverse order by the amount they have spent. Then just get the first 10 entries from the list.
You could do this using the Collections.sort() method defining your own custom Comparator
Something like:
List<Taxpayer> taxpayers =...
Collections.sort(taxpayers, new Comparator<Taxpayer>()
{
    @Override
    public int compare(Taxpayer o1, Taxpayer o2)
    {
        // using your correct total spent here; Double.compare because the sum is a double
        return Double.compare(o1.sum, o2.sum);
        // or to just sort in reverse order
        // return Double.compare(o2.sum, o1.sum);
    }
});
Or if Taxpayer implements Comparable you can just use
Collections.sort(taxpayers);
Then reverse
Collections.reverse(taxpayers);
Then get the top 10 (or fewer, if the list is shorter)
List<Taxpayer> top10 = taxpayers.subList(0, Math.min(10, taxpayers.size()));
To be more efficient though you could just define the comparator to sort the list in reverse order - then you don't need to reverse the list - just get the top 10.
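Putting the steps above together without Java 8 streams could look something like the sketch below. It is deliberately simplified: taxpayers are represented only by their NIF strings, and the class TopSpendersSketch, its nested Expense class and the method topSpenders are assumptions made for this example rather than the asker's real classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: a taxpayer is identified by its NIF only.
class TopSpendersSketch {

    static final class Expense {
        final String nifClient;
        final double value;

        Expense(String nifClient, double value) {
            this.nifClient = nifClient;
            this.value = value;
        }
    }

    /** Returns the NIFs of up to `limit` taxpayers, highest total spending first. */
    static List<String> topSpenders(List<String> taxpayerNifs, List<Expense> expenses, int limit) {
        // 1. Sum the expenses per taxpayer in a single pass over the expenses.
        final Map<String, Double> totalByNif = new HashMap<String, Double>();
        for (String nif : taxpayerNifs) {
            totalByNif.put(nif, 0.0);
        }
        for (Expense e : expenses) {
            Double soFar = totalByNif.get(e.nifClient);
            if (soFar != null) { // ignore expenses of users who are not taxpayers
                totalByNif.put(e.nifClient, soFar + e.value);
            }
        }
        // 2. Sort all taxpayers in reverse order of their total.
        List<String> ranked = new ArrayList<String>(totalByNif.keySet());
        Collections.sort(ranked, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Double.compare(totalByNif.get(b), totalByNif.get(a));
            }
        });
        // 3. Keep the first `limit` entries, or all of them if there are fewer.
        return new ArrayList<String>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Summing first and sorting once avoids the problem of updating a partially filled top-10 list while the totals are still changing.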

Using streams to find the highest unique values

Let's say I have this simple Car object:
class Car {
String id;
String model;
Integer price;
}
Now I have a list of cars which might look like so:
{ Car(1, Commodore, 55000), Car(2, F150, 120000), Car(3, F150, 130000),
Car(4, Camry, 50000), Car(5,Commodore,50000)}
I would like to filter any duplicate models out of the List, ensuring that I'm only keeping in the most expensive priced car of each duplicate, e.g:
{ Car(1, Commodore, 55000), Car(3, F150, 130000), Car(4, Camry, 50000) }
I believe that I can do this using the Streams API, but I'm just struggling to chain the right calls together.
In my head, I would imagine the pseudocode would be something like:
Collect all the cars, group by model
Use Collectors.maxBy() on each individual model's List to get the priciest
Amalgamate the resulting one-item lists into a global car list again
Trying to stitch that together got a bit messy though - any ideas?
You can create a Map of the max valued car by model:
List<Car> cars = new ArrayList<>();
Map<String, Optional<Car>> maxCarByModel =
    cars.stream()
        .collect(Collectors.groupingBy(c -> c.model,
                                       Collectors.maxBy(Comparator.comparing(c -> c.price))));
Use Collectors.toMap with a merge function that keeps the car with the max price:
Collection<Car> carsWithMaxPrice = cars.stream().collect(Collectors.toMap(
        Car::getModel,
        Function.identity(),
        (c1, c2) -> c1.getPrice() > c2.getPrice() ? c1 : c2))
    .values();
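For completeness, here is a small self-contained sketch of the toMap approach run against the sample data from the question. The class name PriciestPerModelSketch and the nested Car class with getters are assumptions for the example. It also passes LinkedHashMap::new as the map supplier, which is an addition not in the answer above, so that models come out in the order they were first seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; Car mirrors the question's fields with getters added.
class PriciestPerModelSketch {

    static final class Car {
        private final String id;
        private final String model;
        private final int price;

        Car(String id, String model, int price) {
            this.id = id;
            this.model = model;
            this.price = price;
        }

        String getId() { return id; }
        String getModel() { return model; }
        int getPrice() { return price; }
    }

    /** Keeps only the most expensive car of each model, in first-seen model order. */
    static List<Car> priciestPerModel(List<Car> cars) {
        return new ArrayList<>(cars.stream()
                .collect(Collectors.toMap(
                        Car::getModel,
                        Function.identity(),
                        // when two cars share a model, keep the pricier one
                        BinaryOperator.maxBy(Comparator.comparingInt(Car::getPrice)),
                        LinkedHashMap::new))
                .values());
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("1", "Commodore", 55000), new Car("2", "F150", 120000),
                new Car("3", "F150", 130000), new Car("4", "Camry", 50000),
                new Car("5", "Commodore", 50000));
        for (Car car : priciestPerModel(cars)) {
            System.out.println(car.getId() + " " + car.getModel() + " " + car.getPrice());
        }
    }
}
```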

Creating a list of booleans on the fly

Imagine we are pulling data about people and their favourite foods.
The data would come to us in the format: "Name, FavFood1, FavFood2..FavFoodn".
e.g. "James, Beans, Chicken". Notice how we do not know how many foods a person will favour.
From this data we create an instance of a Person object which captures the person's name and favourite foods. After we have pulled data on every person, we want to create a spreadsheet whose columns would be: Name|Potato|Chicken|Beans|Curry etc.
All of the values to the right of the person's name will be simple boolean values representing whether or not that food was one of the person's favourites.
The problem is that we do not know in advance all the foods that someone could possibly favour, and as such cannot just set up boolean instance variables in the Person class.
I've given this some thought, implementing sets, hash sets and hash maps; however, every solution I think of ends up being horribly inelegant, so I've turned to the genius of Stack Overflow for help on this one.
My question is: What design pattern / approach can I use to cleanly achieve the outcome I desire? Whilst this is a language-agnostic question I am programming this in Java, so if there's anything in the Java API or elsewhere built for this, do let me know.
Thanks in advance!
Try this. It generates data in csv form.
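import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Person {
    final String name;
    final Set<String> foods;

    Person(String name, Set<String> foods) {
        this.name = name;
        this.foods = foods;
    }

    // one boolean per column, in the order of the given food list
    Stream<Boolean> getBooleans(List<String> foods) {
        return foods.stream().map(food -> this.foods.contains(food));
    }

    @Override
    public String toString() {
        return "Person(" + name + ", " + foods +")";
    }
}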
class Test {
    public static void main(String[] args) throws Exception {
        List<String> data = Arrays.asList(
            "James, Beans, Chicken",
            "Emily, Potato, Curry",
            "Clara, Beans, Curry"
        );
        List<String> foodNames = Arrays.asList(
            "Potato", "Chicken", "Beans", "Curry"
        );
        Stream<Person> persons = data.stream().map(d -> {
            String[] split = d.split(",");
            for(int i = 0; i < split.length; i++) {
                split[i] = split[i].trim();
            }
            String name = split[0];
            Set<String> foods = Stream.of(split).skip(1).collect(Collectors.toSet());
            return new Person(name, foods);
        });
        Stream<String> csvData = persons.map(p ->
            p.name + ", " + p.getBooleans(foodNames)
                .map(b -> b.toString())
                .collect(Collectors.joining(", "))
        );
        csvData.forEach(System.out::println);
    }
}
First of all, I highly recommend that, whatever you do, you do it in a separate class with methods like addFavoriteFood(String food), boolean isFavoriteFood(String food) and getFavorites(String food).
Personally I think the implementation of this class should contain both an instance HashSet (to hold the foods this person likes) and a SortedSet, shared by all instances, that contains a list of ALL foods. (See notes at end)
Add would add it to both sets, getFavorites would return those in the first Hash set.
Hmm, it may also need a static getAllFavorites() method to return the SortedSet
Since your FavoriteFoods class knows the master list AND the person's favorites, you could even have it do most of the work by having a getFormattedRow() and a static getFormattedHeaderRow() method. Then your implementation is just:
System.out.println(FavoriteFoods.getFormattedHeaderRow());
for(Person person:people)
    System.out.println(person.favoriteFood.getFormattedRow());
Again, the best thing here is that you can just use the Simplest Thing That Could Possibly Work for your implementation and re-do it later if need be since, being isolated in another class, it doesn't infect all your code with nasty implementation-specific sets, classes, booleans, etc.
Notes about the master list: This master list could naively be implemented as a Static but that's a bad idea--optimally the same masterList SortedSet would be passed into each instance on construction. Also since it is shared among all instances and is mutable it brings in issues if your solution is threaded!
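One way the FavoriteFoods class described above might be sketched, with the master SortedSet passed in at construction as the notes recommend. Everything here is an assumption made for illustration: in particular, getFormattedRow takes the person's name and getFormattedHeaderRow takes the master set as parameters (unlike the parameterless calls shown earlier), and '|' is used as the column separator.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of the class described in the answer; not thread-safe.
class FavoriteFoods {
    private final SortedSet<String> masterList;              // shared by all instances
    private final Set<String> favorites = new HashSet<>();   // this person's foods only

    FavoriteFoods(SortedSet<String> masterList) {
        this.masterList = masterList;
    }

    /** Records the food for this person and makes sure it has a column in the master list. */
    void addFavoriteFood(String food) {
        favorites.add(food);
        masterList.add(food);
    }

    boolean isFavoriteFood(String food) {
        return favorites.contains(food);
    }

    /** One row: the name followed by true/false for every food in the master list. */
    String getFormattedRow(String name) {
        StringBuilder row = new StringBuilder(name);
        for (String food : masterList) {
            row.append('|').append(isFavoriteFood(food));
        }
        return row.toString();
    }

    /** Header row built from the same master list, so the columns line up with the rows. */
    static String getFormattedHeaderRow(SortedSet<String> masterList) {
        StringBuilder header = new StringBuilder("Name");
        for (String food : masterList) {
            header.append('|').append(food);
        }
        return header.toString();
    }

    public static void main(String[] args) {
        SortedSet<String> allFoods = new TreeSet<>();
        FavoriteFoods james = new FavoriteFoods(allFoods);
        james.addFavoriteFood("Beans");
        james.addFavoriteFood("Chicken");
        System.out.println(getFormattedHeaderRow(allFoods));
        System.out.println(james.getFormattedRow("James"));
    }
}
```

Rows should only be formatted after every person has been read, otherwise early rows will be missing columns for foods that show up later.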
What is so inelegant about this pseudocode?
Set<String> allFoods = new TreeSet<String>();
List<Person> allPersons = new ArrayList<Person>();
while (hasMorePersons()) {
    Person person = getNextPerson();
    allPersons.add(person);
    allFoods.addAll(person.getFoods());
}
spreadSheet.writeHeader("Name", allFoods);
for (Person person : allPersons) {
    spreadSheet.writeName(person.getName());
    for (String food : allFoods) {
        // assume getFoods() return a Set<String>,
        // not necessarily ordered (could be a HashSet)
        boolean yourBooleanHere = person.getFoods().contains(food);
        spreadSheet.writeBoolean(yourBooleanHere);
    }
    spreadSheet.nextLine();
}
If you need a table of booleans or whatever else, you can easily store them anywhere you want during the second loop.
Note: TreeSet orders foods according to the natural order (that is, alphabetically). To output them in the order they are encountered, use a LinkedHashSet instead.
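A tiny sketch of the difference between the two set types, using the food names from the question. The class FoodOrderSketch and its two helper methods are made up for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class comparing column order for the two set types.
class FoodOrderSketch {

    /** Columns in natural (alphabetical) order. */
    static Set<String> alphabetical(List<String> foods) {
        return new TreeSet<>(foods);
    }

    /** Columns in the order the foods were first encountered. */
    static Set<String> encounterOrder(List<String> foods) {
        return new LinkedHashSet<>(foods);
    }

    public static void main(String[] args) {
        // "Chicken" appears twice; both sets drop the duplicate
        List<String> encountered = Arrays.asList("Potato", "Chicken", "Beans", "Curry", "Chicken");
        System.out.println(alphabetical(encountered));   // [Beans, Chicken, Curry, Potato]
        System.out.println(encounterOrder(encountered)); // [Potato, Chicken, Beans, Curry]
    }
}
```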
