Java Streams with combining multiple rows to one - java

My code consists of a class with 10 variables. The class will get the data from a database table and the results from it is a List. Here is a sample class:
@Data
class pMap {
long id;
String rating;
String movieName;
}
The data will be as follows:
id=1, rating=PG-13, movieName=
id=1, rating=, movieName=Avatar
id=2, rating=, movieName=Avatar 2
id=2, rating=PG, movieName=
I want to combine the rows that share an id into a single row, grouping by id with Java streams. The end result should look like this Map<Long, pMap>:
1={id=1, rating=PG-13, movieName=Avatar},
2={id=2, rating=PG, movieName=Avatar 2}
I am not sure how I can get the rows combined to one by pivoting them.

You can use toMap to achieve this:
Map<Long, pMap> myMap = myList.stream().collect(Collectors.toMap(x -> x.id, Function.identity(),
(x1, x2) -> new pMap(x1.id, x1.rating != null ? x1.rating : x2.rating, x1.movieName != null ? x1.movieName : x2.movieName)));
I am passing three functions to the toMap method (the merge function also assumes that pMap has a constructor taking all three fields):
First one is a key mapper. It maps an element to a key of the map. In this case, I want the key to be the id.
The second one is a value mapper. I want the value to be the actual pMap so this is why I am passing the identity function.
The third argument is a merger function that tells how to merge two values with the same id.
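As a minimal, self-contained sketch of the same idea: the class name MovieRow and the helper firstNonBlank are hypothetical stand-ins, not part of the original code. The sample data shows the missing fields as blank, so this sketch treats both null and the empty string as "missing"; if your rows really contain null, the plain null check above is enough.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MergeRowsDemo {

    // Hypothetical stand-in for the question's pMap class.
    static class MovieRow {
        final long id;
        final String rating;
        final String movieName;

        MovieRow(long id, String rating, String movieName) {
            this.id = id;
            this.rating = rating;
            this.movieName = movieName;
        }

        @Override
        public String toString() {
            return "{id=" + id + ", rating=" + rating + ", movieName=" + movieName + "}";
        }
    }

    // Returns a if it is neither null nor empty, otherwise b.
    static String firstNonBlank(String a, String b) {
        return (a != null && !a.isEmpty()) ? a : b;
    }

    // Groups rows by id and merges rows with the same id field by field.
    static Map<Long, MovieRow> mergeById(List<MovieRow> rows) {
        return rows.stream().collect(Collectors.toMap(
                row -> row.id,               // key mapper
                Function.identity(),         // value mapper
                (x1, x2) -> new MovieRow(    // merge function for duplicate ids
                        x1.id,
                        firstNonBlank(x1.rating, x2.rating),
                        firstNonBlank(x1.movieName, x2.movieName))));
    }

    public static void main(String[] args) {
        List<MovieRow> rows = List.of(
                new MovieRow(1, "PG-13", ""),
                new MovieRow(1, "", "Avatar"),
                new MovieRow(2, "", "Avatar 2"),
                new MovieRow(2, "PG", ""));
        System.out.println(mergeById(rows));
    }
}
```

The merge function is applied pairwise, so it also works when more than two rows share an id.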

Related

How to Add up the values in a Nested Collection using Streams

I have the following TicketDTO Object:
public class TicketDTO {
private LocalDate date;
private Set<OffenceDTO> offences;
}
And every OffenceDTO has an int field - penalty points.
public class OffenceDTO {
private int penaltyPoints;
}
I would like to add up the penalty points into a single int value by streaming the Set of offences of each ticket, but only for tickets dated within the last two years.
I have collected the tickets from the last two years, but I don't know how to go through their offences and sum the points.
This is what I've written so far:
tickets().stream()
.filter(ticketEntity -> isDateBetween(LocalDate.now(), ticketEntity.getDate()))
.collect(Collectors.toList());
"I would like to collect the penalty points in a single int value by streaming the set of tickets."
It can be done in the following steps:
Turn the stream of filtered tickets into a stream of OffenceDTO using flatMap();
Extract the penalty points from each OffenceDTO with mapToInt(), which transforms a stream of objects into an IntStream;
Apply sum() to get the total.
int totalPenalty = tickets().stream()
.filter(ticketEntity -> isDateBetween(LocalDate.now(), ticketEntity.getDate()))
.flatMap(ticketDTO -> ticketDTO.getOffences().stream())
.mapToInt(OffenceDTO::getPenaltyPoints)
.sum();
Assuming that tickets() is a method that returns a List of TicketDTO, you could stream the List and filter its elements with your custom method isDateBetween (as you were doing).
Then use flatMap to map each ticket to its offences. This gives you a stream of the OffenceDTO objects whose ticket falls within the last two years (according to your isDateBetween method).
Finally, you can collect the points of each OffenceDTO by summing them with the summingInt method of the Collectors class.
int res = tickets().stream()
.filter(ticketEntity -> isDateBetween(LocalDate.now(), ticketEntity.getDate()))
.flatMap(ticketDTO -> ticketDTO.getOffences().stream())
.collect(Collectors.summingInt(OffenceDTO::getPenaltyPoints));
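For reference, here is a self-contained sketch of the pipeline that can be run as is. The classes Ticket and Offence are simplified stand-ins for the DTOs, and isWithinLastTwoYears is a hypothetical replacement for isDateBetween, whose real implementation is not shown in the question; "today" is passed in as a parameter so the result does not depend on when the code runs.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;

class PenaltyPointsDemo {

    // Simplified stand-in for OffenceDTO.
    static class Offence {
        private final int penaltyPoints;
        Offence(int penaltyPoints) { this.penaltyPoints = penaltyPoints; }
        int getPenaltyPoints() { return penaltyPoints; }
    }

    // Simplified stand-in for TicketDTO.
    static class Ticket {
        private final LocalDate date;
        private final Set<Offence> offences;
        Ticket(LocalDate date, Set<Offence> offences) {
            this.date = date;
            this.offences = offences;
        }
        LocalDate getDate() { return date; }
        Set<Offence> getOffences() { return offences; }
    }

    // Hypothetical date check: true if date lies in [today - 2 years, today].
    static boolean isWithinLastTwoYears(LocalDate today, LocalDate date) {
        return !date.isBefore(today.minusYears(2)) && !date.isAfter(today);
    }

    // Sums the penalty points of all offences on tickets from the last two years.
    static int totalPenalty(List<Ticket> tickets, LocalDate today) {
        return tickets.stream()
                .filter(t -> isWithinLastTwoYears(today, t.getDate()))
                .flatMap(t -> t.getOffences().stream())
                .mapToInt(Offence::getPenaltyPoints)
                .sum();
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2024, 6, 1);
        List<Ticket> tickets = List.of(
                new Ticket(LocalDate.of(2023, 1, 10), Set.of(new Offence(3), new Offence(2))),
                new Ticket(LocalDate.of(2020, 1, 1), Set.of(new Offence(5))),   // too old, ignored
                new Ticket(LocalDate.of(2024, 5, 1), Set.of(new Offence(1))));
        System.out.println(totalPenalty(tickets, today));
    }
}
```

Both variants above (mapToInt().sum() and Collectors.summingInt) produce the same total; the first avoids boxing and is usually the more direct choice for a plain sum.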

Read values from Java Map using Spark Column using java

I tried the code below to look up Map values by a Spark column in Java, but I get null instead of the value stored in the Map for each key.
The Spark Dataset is named dataset1 and contains a single column named KEY.
values in dataset :
KEY
1
2
Java Code -
Map<String,String> map1 = new HashMap<>();
map1.put("1","CUST1");
map1.put("2","CUST2");
dataset1.withColumn("ABCD", functions.lit(map1.get(col("KEY"))));
Current Output is:
ABCD (Column name)
null
null
Expected Output :
ABCD (Column name)
CUST1
CUST2
Please help me get this expected output.
The reason why you get this output is pretty simple. The get function in java can take any object as input. If that object is not in the map, the result is null.
The lit function in spark is used to create a single value column (all rows have the same value). e.g. lit(1) creates a column that takes the value 1 for each row.
Here, map1.get(col("KEY")) (which is executed on the driver) asks map1 for the value corresponding to a Column object (not the value inside the column, but the Java/Scala object representing the column). The map does not contain that object as a key, so the result is null. You could therefore just as well have written lit(null), which is why you get a null result in your dataset.
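The Map.get part of this can be reproduced without Spark. In the sketch below, ColumnRef is a hypothetical stand-in for Spark's Column class: it describes a column but holds no row values. Because Map.get accepts any Object, the lookup compiles and quietly returns null.

```java
import java.util.HashMap;
import java.util.Map;

class MapGetDemo {

    // Hypothetical stand-in for a Spark Column: a description of a column, not its values.
    static class ColumnRef {
        final String name;
        ColumnRef(String name) { this.name = name; }
    }

    static Map<String, String> customers() {
        Map<String, String> map1 = new HashMap<>();
        map1.put("1", "CUST1");
        map1.put("2", "CUST2");
        return map1;
    }

    // Looks up the column object itself: no String key equals it, so the result is null.
    static String lookupByColumnObject() {
        Object key = new ColumnRef("KEY");
        return customers().get(key);
    }

    // Looks up an actual row value, which is what a UDF receives for each row.
    static String lookupByValue(String value) {
        return customers().get(value);
    }

    public static void main(String[] args) {
        System.out.println(lookupByColumnObject());
        System.out.println(lookupByValue("1"));
    }
}
```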
To solve your problem, you could wrap your map access within a UDF for instance. Something like:
UserDefinedFunction map_udf = udf(new UDF1<String, String>() {
@Override
public String call(String x) {
return map1.get(x);
}
}, DataTypes.StringType );
spark.udf().register("map_udf", map_udf);
dataset1.withColumn("ABCD", expr("map_udf(KEY)"));

Preserving memory with two HashMaps that contain similar values

I am loading 2 large datasets into two separate HashMaps, sequentially. (The datasets are serialized into many Record objects, depicted below). The HashMaps are represented like so, with the key as the id of the Record:
Map<Long, Record> recordMapA = new HashMap<>();
Map<Long, Record> recordMapB = new HashMap<>();
The Record object looks like so:
class Record {
Long id;
Long timestamp;
String category;
String location;
}
In many cases, the records are the same between the two datasets, except that the timestamp field differs. For my use case, any two Record objects are equal if all field values except for the timestamp field are the same.
// These two records are the same because only the timestamp differs
Record recordA = new Record(54321, 1615270861975L, "foo", "USA");
Record recordB = new Record(54321, 1615357219994L, "foo", "USA");
To preserve memory, is there a way to make it so that if two Record objects are "equal", both of those map entry values in maps A and B would refer to the same Record object in memory? I've overridden the equals and hashCode methods for the Record object to ignore timestamp, then checked if RecordMapA already contains the same record. If so, I put the record from RecordMapA into RecordMapB instead of putting the new Record that has been serialized from dataset B into Map B. However the impact on memory seems negligible so far.
One side note is that I need to retain both maps (instead of merging them into one) for purposes of comparison later.
If the records are 'small enough' then I would not bother trying anything fancy. For large records, the easiest way seems to be to do what you're doing.
void addToMap(Long key, Record rec, Map<Long,Record> map,
Map<Long,Record> otherMap) {
Record existing = otherMap.get(key);
map.put(key, existing != null ? existing : rec);
}
This assumes that if the key is present in the other map, the record it maps to must be equal to the new one. If that is not guaranteed, you'll need to check:
void addToMap(Long key, Record rec, Map<Long,Record> map,
Map<Long,Record> otherMap) {
Record existing = otherMap.get(key);
if (existing != null && existing.equals(rec))
map.put(key, existing);
else
map.put(key, rec);
}
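A runnable sketch of the second variant follows, assuming equals and hashCode ignore the timestamp as described in the question. The class name DataRecord is a hypothetical stand-in for Record. Note that when a record is shared, map B's entry carries map A's timestamp, so this only fits if the second timestamp is not needed later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SharedRecordDemo {

    // Stand-in for the question's Record class; equality ignores the timestamp.
    static class DataRecord {
        final Long id;
        final Long timestamp;
        final String category;
        final String location;

        DataRecord(Long id, Long timestamp, String category, String location) {
            this.id = id;
            this.timestamp = timestamp;
            this.category = category;
            this.location = location;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof DataRecord)) return false;
            DataRecord other = (DataRecord) o;
            return Objects.equals(id, other.id)
                    && Objects.equals(category, other.category)
                    && Objects.equals(location, other.location);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, category, location);
        }
    }

    // Reuses the instance from otherMap when it is equal to rec, otherwise stores rec.
    static void addToMap(Long key, DataRecord rec,
                         Map<Long, DataRecord> map, Map<Long, DataRecord> otherMap) {
        DataRecord existing = otherMap.get(key);
        map.put(key, rec.equals(existing) ? existing : rec);
    }

    public static void main(String[] args) {
        Map<Long, DataRecord> recordMapA = new HashMap<>();
        Map<Long, DataRecord> recordMapB = new HashMap<>();
        recordMapA.put(54321L, new DataRecord(54321L, 1615270861975L, "foo", "USA"));
        addToMap(54321L, new DataRecord(54321L, 1615357219994L, "foo", "USA"),
                recordMapB, recordMapA);
        // Same reference in both maps, so only one object stays reachable for this key.
        System.out.println(recordMapA.get(54321L) == recordMapB.get(54321L));
    }
}
```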

Check if all object entities are equal using Java Streams [duplicate]

I am new to Java 8. I have a list of custom objects of type A, where A is like below:
class A {
int id;
String name;
}
I would like to determine if all the objects in that list have same name. I can do it by iterating over the list and capturing previous and current value of names. In that context, I found How to count number of custom objects in list which have same value for one of its attribute. But is there any better way to do the same in java 8 using stream?
You can map each A to its name, apply the distinct intermediate operation, and use limit(2) so the stream can stop early. Then check whether the count is less than or equal to 1: if so, all objects have the same name (or the list is empty); otherwise they do not.
boolean result = myList.stream()
.map(A::getName)
.distinct()
.limit(2)
.count() <= 1;
With the example shown above, we leverage the limit(2) operation so that we stop as soon as we find two distinct object names.
One way is to get the name of the first element of the list and use allMatch to check every element against it (this assumes the list is not empty).
String firstName = yourListOfAs.get(0).name;
boolean allSameName = yourListOfAs.stream().allMatch(x -> x.name.equals(firstName));
Another way is to count the distinct names (note that this returns false for an empty list):
boolean result = myList.stream().map(A::getName).distinct().count() == 1;
Of course, you need to add a getter for the 'name' field.
One more option is partitioning. Partitioning is a special kind of grouping in which the resulting map always contains two groups: one for true and one for false.
This way, you get both the matching and the non-matching elements. Compare the names with equals rather than ==, since == compares references.
String firstName = yourListOfAs.get(0).name;
Map<Boolean, List<A>> partitioned = yourListOfAs.stream().collect(partitioningBy(e -> firstName.equals(e.name)));
With Java 9 you can use takeWhile, which takes elements until the predicate returns false. This is similar to a break statement in a while loop.
String firstName = yourListOfAs.get(0).name;
List<A> filterList = yourListOfAs.stream()
.takeWhile(e -> firstName.equals(e.name)).collect(Collectors.toList());
if (filterList.size() == yourListOfAs.size())
{
//all objects have the same name
}
Or use groupingBy then check entrySet size.
boolean b = list.stream()
.collect(Collectors.groupingBy(A::getName,
Collectors.toList())).entrySet().size() == 1;
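To compare the approaches side by side, here is a small self-contained sketch of two of them. The method names are illustrative, and both methods treat an empty list as "all the same", which is an assumption you may want to change.

```java
import java.util.List;
import java.util.Objects;

class SameNameDemo {

    static class A {
        final int id;
        final String name;
        A(int id, String name) { this.id = id; this.name = name; }
        String getName() { return name; }
    }

    // distinct + limit(2): stops pulling elements once two different names are seen.
    static boolean allSameNameDistinct(List<A> list) {
        return list.stream()
                .map(A::getName)
                .distinct()
                .limit(2)
                .count() <= 1;
    }

    // allMatch against the first element: short-circuits on the first mismatch.
    static boolean allSameNameAllMatch(List<A> list) {
        if (list.isEmpty()) {
            return true; // assumption: an empty list counts as "all the same"
        }
        String first = list.get(0).getName();
        return list.stream().allMatch(a -> Objects.equals(first, a.getName()));
    }

    public static void main(String[] args) {
        List<A> same = List.of(new A(1, "x"), new A(2, "x"));
        List<A> mixed = List.of(new A(1, "x"), new A(2, "y"));
        System.out.println(allSameNameDistinct(same) + " " + allSameNameAllMatch(same));
        System.out.println(allSameNameDistinct(mixed) + " " + allSameNameAllMatch(mixed));
    }
}
```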

Fast aggregation of multiple ArrayLists into a single one

I have the following list:
List<ArrayList> list;
list.get(i) contains the ArrayList object with the following values {p_name=set1, number=777002}.
I have to create a
Map<key,value>
where the key contains the p_name, and values are the numbers.
How can I do this simply and quickly, given that there can be hundreds of entries in the initial list and each number can be present in multiple p_name entries?
Update: Here is my current solution
List<Row> list; //here is my data
Map<String,String> map = new TreeMap<String,String>();
for (Row l : list) {
if (l.hasValues()) {
Map<String, String> values = l.getResult(); // internal method of Row interface that returns a map
String key = values.get( "number");
map.put(key, values.get( "p_name" ));
}
}
The method works, but maybe it could be done better?
PS : There is an obvious error in my design. I wonder if you find it :)
Since a key can have more than one value, what you are looking for is a Multimap.
Or a simple map in the form
Map<Key,ArrayList<Values>>
As far as I can see, there is no "fast" shortcut here: you still need to iterate through all the elements and check all the values.
And hundreds of entries is really not much for Java.
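As a rough sketch of the "simple map" form using streams: each row is modelled here as a Map<String, String>, standing in for what Row.getResult() returns in the question, and every row is assumed to contain both a p_name and a number entry. The result maps each p_name to the list of its numbers.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupNumbersDemo {

    // Groups the "number" values by their "p_name", keeping keys sorted via TreeMap.
    static Map<String, List<String>> numbersByName(List<Map<String, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row.get("p_name"),
                TreeMap::new,
                Collectors.mapping(row -> row.get("number"), Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("p_name", "set1", "number", "777002"),
                Map.of("p_name", "set2", "number", "777002"),
                Map.of("p_name", "set1", "number", "777003"));
        System.out.println(numbersByName(rows));
    }
}
```

This is still a single pass over the list, so it is no faster than the loop; it only avoids overwriting earlier values when a key occurs more than once.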
