Getting the sum and max from a list using stream - java

Hi, I have a List where the data looks like this:
[{"month":"April","day":"Friday","count":5},
{"month":"April","day":"Monday","count":6},
{"month":"April","day":"Saturday","count":2},
{"month":"April","day":"Sunday","count":1},
{"month":"April","day":"Thursday","count":7},
{"month":"April","day":"Tuesday","count":8},
{"month":"April","day":"Wednesday","count":10},
{"month":"March","day":"Friday","count":3},
{"month":"March","day":"Monday","count":2},
{"month":"March","day":"Saturday","count":15},
{"month":"March","day":"Sunday","count":11},
{"month":"March","day":"Thursday","count":4},
{"month":"March","day":"Tuesday","count":20},
{"month":"March","day":"Wednesday","count":7},
{"month":"May","day":"Friday","count":2},
{"month":"May","day":"Monday","count":0},
{"month":"May","day":"Saturday","count":7},
{"month":"May","day":"Sunday","count":4},
{"month":"May","day":"Thursday","count":8},
{"month":"May","day":"Tuesday","count":3},
{"month":"May","day":"Wednesday","count":6}]
My object class is:
String month;
String day;
Integer count;
What I want to get by using streams is the sum of count grouped by month, plus the day with the max count for that month.
So the end result will look something like:
April, Wednesday, 39
March, Tuesday, 62
May, Thursday, 30
I have been trying to use stream and groupingBy, but no luck. Any help is appreciated. Thanks.
EDIT
Map<String, Integer> totalMap = transactions.stream()
    .collect(Collectors.groupingBy(MonthlyTransaction::getMonth,
        Collectors.summingInt(MonthlyTransaction::getCount)));

Map<String, String> maxMap = transactions.stream()
    .collect(Collectors.groupingBy(MonthlyTransaction::getMonth))
    .values().stream()
    .toMap(Object::getDay, Collextions.max(Object::getCount);
obviously the maxMap method is wrong but I do not know how to write it.

If you want to find both the sum of counts per month and the day with the max count per month in a single pass, I think you need a custom collector.
First, let's create a holder class to store the results:
public class Statistics {

    private final String dayWithMaxCount;
    private final long totalCount;

    public Statistics(String dayWithMaxCount, long totalCount) {
        this.dayWithMaxCount = dayWithMaxCount;
        this.totalCount = totalCount;
    }

    // TODO getters and toString
}
Then, create this method, which returns a collector that accumulates both the sum of counts and the max count, along with the day in which that max was found:
public static Collector<MonthlyTransaction, ?, Statistics> withStatistics() {

    class Acc {
        long sum = 0;
        long maxCount = Long.MIN_VALUE;
        String dayWithMaxCount;

        void accumulate(MonthlyTransaction transaction) {
            sum += transaction.getCount();
            if (transaction.getCount() > maxCount) {
                maxCount = transaction.getCount();
                dayWithMaxCount = transaction.getDay();
            }
        }

        Acc merge(Acc another) {
            sum += another.sum;
            if (another.maxCount > maxCount) {
                maxCount = another.maxCount;
                dayWithMaxCount = another.dayWithMaxCount;
            }
            return this;
        }

        Statistics finish() {
            return new Statistics(dayWithMaxCount, sum);
        }
    }

    return Collector.of(Acc::new, Acc::accumulate, Acc::merge, Acc::finish);
}
This uses the local class Acc to accumulate and merge partial results. The finish method returns an instance of the Statistics class, which holds the final results. At the end, I'm using Collector.of to create a collector based on the methods of the Acc class.
Finally, you can use the method and class defined above as follows:
Map<String, Statistics> statisticsByMonth = transactions.stream()
.collect(Collectors.groupingBy(MonthlyTransaction::getMonth, withStatistics()));
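Printing the grouped results then reproduces the expected output. A small sketch, assuming Statistics gets the getters from the TODO above (getDayWithMaxCount() and getTotalCount() are assumed names):

// hypothetical getters from the TODO in the Statistics class
statisticsByMonth.forEach((month, stats) ->
    System.out.println(month + ", " + stats.getDayWithMaxCount() + ", " + stats.getTotalCount()));
// e.g. April, Wednesday, 39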

I did this in 2 steps instead of trying to write 1 stream to achieve the result:
// First get the total of counts, grouping by month
Map<String, Integer> totalMap = transactions.stream()
    .collect(Collectors.groupingBy(MonthlyTransaction::getMonth,
        Collectors.summingInt(MonthlyTransaction::getCount)));

List<MonthlyTransaction> finalStat = new ArrayList<>();

// iterate over the total count map
totalMap.entrySet().stream().forEach(entry -> {
    // Using the Stream filter to mimic a group by
    MonthlyTransaction maxStat = transactions.stream()
        .filter(t -> t.getMonth().equals(entry.getKey()))
        // getting the item with the max count for the month
        .max(Comparator.comparing(MonthlyTransaction::getCount))
        .get(); // safe here: every key in totalMap came from at least one transaction
    // Setting the count to the total value from the map, as the max count value is not a requirement.
    // Note: this mutates the original transaction object.
    maxStat.setCount(entry.getValue());
    // add the item to the list
    finalStat.add(maxStat);
});
This may not be the best approach to the problem, but it gives me the exact result. Thanks to everyone who had a look at it and tried to help.

Related

How can I filter rows having particular column value which is above average using java for input comma separated file

Suppose I have a file with data in comma-separated format as below
TIMESTAMP,COUNTRYCODE,RESPONSETIME
1544190995,US,500
1723922044,GB,370
1711557214,US,750
How can I filter rows by RESPONSETIME above average using Java? Here the average of RESPONSETIME is 540, so I need to display all rows having RESPONSETIME greater than 540. The data lines are not guaranteed to be in any particular order. Can we do both (finding the average and filtering rows having RESPONSETIME above average) in a single method?
Currently I am finding the average as below. How can I apply the filter and return a collection inside the same method? As per my understanding it is not possible to read a file twice inside the same method.
public static Collection<?> filterByResponseTimeAboveAverage(Reader source) {
    BufferedReader br = new BufferedReader(source);
    String line = null;
    Collection<String> additionalList = new ArrayList<String>();
    int iteration = 0;
    String[] myArray = null;
    long count = 0;
    long responseTime = 0;
    long sum = 0;
    int numOfResponseTime = 0;
    long average = 0;
    List<String> myList = new ArrayList<String>();
    try
    {
        while ((line = br.readLine()) != null) {
            System.out.println("Inside while");
            if (iteration == 0) {
                iteration++;
                continue;
            }
            myArray = line.split(",");
            for (String eachval : myArray)
            {
                boolean isNumeric = eachval.chars().allMatch(x -> Character.isDigit(x));
                // since the input data line is not guaranteed to be in any particular order,
                // I am finding RESPONSETIME like this
                if (isNumeric)
                {
                    count = eachval.chars().count();
                    if (count < 10)
                    {
                        responseTime = Integer.parseInt(eachval);
                        sum = sum + responseTime;
                        numOfResponseTime++;
                    }
                }
                myList.add(eachval);
            }
        }
        average = sum / numOfResponseTime;
        System.out.println("Average -- " + average);
        ---------------
        ---------------
}
As per my understanding its not possible to read a file twice inside
same method.
You can, but you should not, as it is not efficient.
You have mainly two ways of proceeding.
Optimized way:
read all values from the file and compute the average of RESPONSETIME
filter the values above the average
You could introduce a private method, invoked by filterByResponseTimeAboveAverage(), that retrieves all values from the source and computes their average; a sketch follows.
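A minimal sketch of that idea, under the assumption that the header row names the columns (so RESPONSETIME can be located by name rather than by digit count) and that the usual java.io, java.util and java.util.stream imports are present:

public static Collection<String> filterByResponseTimeAboveAverage(Reader source) throws IOException {
    // read the file once into memory
    List<String> lines = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(source)) {
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
        }
    }
    // locate the RESPONSETIME column from the header
    List<String> header = Arrays.asList(lines.get(0).split(","));
    int rtIndex = header.indexOf("RESPONSETIME");
    List<String> dataLines = lines.subList(1, lines.size());
    // first pass: compute the average
    double average = dataLines.stream()
        .mapToLong(l -> Long.parseLong(l.split(",")[rtIndex]))
        .average()
        .orElse(0);
    // second pass over the in-memory list, not over the file
    return dataLines.stream()
        .filter(l -> Long.parseLong(l.split(",")[rtIndex]) > average)
        .collect(Collectors.toList());
}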
Functional way (a little more overhead):
read all values from the file
use IntStream.average() to compute the average of RESPONSETIME
filter the values above the average
For the second and last steps, it could look like:
// getAverage() here is assumed to be the getter for the row's RESPONSETIME value
double average = list.stream()
    .mapToInt(MyObject::getAverage)
    .average()
    .getAsDouble();

List<MyObject> filteredElements = list.stream()
    .filter(o -> o.getAverage() > average)
    .collect(Collectors.toList());
How can I apply filter and return as collection inside the same method?
Use Java 8 streams and lambdas

Best Data Structure for fast retrieval, update, and keeping ordering

The problem is as follows:
I need to keep track of url + click count.
I need to be able to update url quickly with click count when user click on that url.
I need to be able to retrieve the top 10 click count URL quickly.
NOTE: Assuming you cannot use the database.
What is the best data structure to achieve the result?
I have thought about using a map, but a map doesn't keep track of the ordering of the top 10 clicks.
You need an additional List<Map.Entry<URL,Integer>> holding the top ten, with T being the click count of the lowermost entry.
If you count another click and the new count is still not greater than T: do nothing.
If the increased count is greater than T, check whether the URL is already in the list. If it is, do nothing (it stays in the top ten). If it is not, add its entry to the list, sort, and delete the last entry if the list now has more than 10 entries. Then update T. A sketch of this bookkeeping follows.
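A minimal sketch of that idea, assuming clicks are tallied in a Map<String, Integer> keyed by the URL string (class and method names here are illustrative, not from the question):

class TopTenTracker {
    private final Map<String, Integer> counts = new HashMap<>();
    // kept sorted descending by click count, at most 10 entries
    private final List<Map.Entry<String, Integer>> topTen = new ArrayList<>();
    private int threshold = 0; // T: click count of the lowermost top-ten entry

    void click(String url) {
        int newCount = counts.merge(url, 1, Integer::sum);
        if (newCount <= threshold) {
            return; // cannot displace anything in the top ten
        }
        // refresh the entry if present, otherwise add it
        topTen.removeIf(e -> e.getKey().equals(url));
        topTen.add(new AbstractMap.SimpleEntry<>(url, newCount));
        topTen.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        if (topTen.size() > 10) {
            topTen.remove(topTen.size() - 1);
        }
        // T only starts filtering once the list is full
        threshold = topTen.size() == 10 ? topTen.get(9).getValue() : 0;
    }

    List<Map.Entry<String, Integer>> topTen() {
        return Collections.unmodifiableList(topTen);
    }
}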
The best data structure I can think of is a TreeSet.
The elements of a TreeSet are sorted, so you can easily find the top items.
Also, maintain a separate comparator class for URL that implements Comparator, so you can put the logic for keeping elements sorted by count there. Use this comparator while creating the TreeSet. Insertion/update/delete/get operations all happen in O(log n).
Here is how you would define the structure.
TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());

class URL {
    private String url;
    int count;

    public URL(String string, int i) {
        url = string;
        count = i;
    }

    @Override
    public int hashCode() {
        return url.hashCode();
    }

    @Override // No need to write this method. Just used it for testing
    public String toString() {
        return "url : " + url + " ,count : " + count + "\n";
    }
}
One more note: use the hashCode method of your URL class as the hash code of your url.
This is how you define the URLComparator class; the compare logic is based on URL count.
class URLComparator implements Comparator<URL> {
    @Override
    public int compare(URL o1, URL o2) {
        // note: URLs with equal counts compare as 0, and a TreeSet treats such
        // elements as duplicates; add a tie-breaker on the url string to avoid losing entries
        return Integer.compare(o2.count, o1.count);
    }
}
Testing
TreeSet<URL> treeSet = new TreeSet<URL>(new URLComparator());
treeSet.add(new URL("url1", 12));
treeSet.add(new URL("url2", 0));
treeSet.add(new URL("url3", 5));
System.out.println(treeSet);
Output:
[url : url1 ,count : 12
, url : url3 ,count : 5
, url : url2 ,count : 0
]
To print the top 10 elements, use the following code:
Iterator<URL> iterator = treeSet.iterator();
int count = 0;
while (count < 10 && iterator.hasNext()) {
    System.out.println(iterator.next());
    count++;
}
You can use a Map<String, Integer> for this use case:
It keeps track of key (url) and value (click count).
You can put an updated click count to the map when the user clicks that url.
You can retrieve the top 10 click counts after sorting the map based on the entry set:
// create a list out of the entryset of your map
Set<Map.Entry<String, Integer>> set = map.entrySet();
List<Map.Entry<String, Integer>> list = new ArrayList<>(set);
// this can be clubbed in another stub to act on top 'N' click counts
list.sort((o1, o2) -> (o2.getValue()).compareTo(o1.getValue()));
list.stream().limit(10).forEach(entry ->
System.out.println(entry.getKey() + " ==== " + entry.getValue()));
Using a Map, you will have to sort the values to get the top 10 URLs, which gets you O(n log n) complexity using a comparator that sorts by value.
Another way is:
Using a doubly linked list (of size 10) with a HashMap, proceeding in an LRU-cache way.
Retrieve/update will be O(1).
The top 10 results will be the items in the list.
Structure of the doubly linked list node:
class UrlAndCountNode {
    String url;
    int count;
    UrlAndCountNode next;
    UrlAndCountNode prev;
}
Structure of Map:
Map<String, UrlAndCountNode>
That's an interesting question IMO. It seems you need something that is sorted by clicks, but at the same time you need to alter these values. The only way to do that with a sorted data structure is to remove the entry you want to update and put the updated one back; simply updating the clicks in place will not work. As such, I think keeping the entries sorted by clicks is the better option.
The downside is that entries with the same number of clicks would get overridden (elements that compare equal count as duplicates), so something like Guava's Multiset is a much better option.
As such I would do this:
static class Holder {
    private final String name;
    private final int clicks;

    public Holder(String name, int clicks) {
        this.name = name;
        this.clicks = clicks;
    }

    public String getName() {
        return name;
    }

    public int getClicks() {
        return clicks;
    }

    @Override
    public String toString() {
        return "name = " + name + " clicks = " + clicks;
    }
}
And methods would look like this:
private static List<Holder> firstN(Multiset<Holder> set, int n) {
    return set.stream().limit(n).collect(Collectors.toList());
}

private static void updateOne(Multiset<Holder> set, String urlName, int more) {
    Iterator<Holder> iter = set.iterator();
    int currentClicks = 0;
    boolean found = false;
    while (iter.hasNext()) {
        Holder h = iter.next();
        if (h.getName().equals(urlName)) {
            currentClicks = h.getClicks();
            iter.remove();
            found = true;
        }
    }
    if (found) {
        set.add(new Holder(urlName, currentClicks + more));
    }
}
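The answer doesn't show how the Multiset is created. A sketch, assuming Guava's TreeMultiset, sorted by clicks descending with a tie-breaker on the name so that distinct URLs with equal clicks don't collide:

// sorted by clicks descending; the tie-breaker keeps distinct URLs
// with equal click counts from being treated as duplicates
Multiset<Holder> set = TreeMultiset.create(
    Comparator.comparingInt(Holder::getClicks).reversed()
        .thenComparing(Holder::getName));

set.add(new Holder("url1", 12));
set.add(new Holder("url2", 12));
set.add(new Holder("url3", 5));

System.out.println(firstN(set, 2)); // the two most-clicked entries
updateOne(set, "url3", 10);         // url3 now has 15 clicks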

How to Sort a list of strings and find the 1000 most common values in java

In java (either using external libraries or not) I need to take a list of approximately 500,000 values and find the most frequently occurring (mode) 1000. Doing my best to keep the complexity to a minimum.
What I've tried so far: making a hash, but I can't, because it would have to be backwards (key = count, value = string), otherwise when getting the top 1000 my complexity will be garbage. And the backwards way doesn't really work well, because insertion gets a terrible complexity: I'd have to search for where my string is, remove it, and insert it one count higher.
I've tried using a binary search tree, but that had the same issue of what to sort on, either the count or the string. If it's the string, then getting the counts for the top 1000 is bad, and vice versa, insertion is bad.
I could sort the list first (by string) and then iterate over the list and keep a count until the string changes. But what data structure should I use to keep track of the top 1000?
Thanks
I would first create a Map<String, Long> to store the frequency of each word. Then, I'd sort this map by value in descending order and finally I'd keep the first 1000 entries.
In code:
List<String> top1000Words = listOfWords.stream()
    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
    .entrySet().stream()
    // the explicit type witness is needed so the chained reversed() compiles
    .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
    .limit(1000)
    .map(Map.Entry::getKey)
    .collect(Collectors.toList());
You might find it cleaner to separate the above into 2 steps: first collecting to the map of frequencies and then sorting its entries by value and keeping the first 1000 entries.
I'd separate this into three phases:
Count word occurrences (e.g. by using a HashMap<String, Integer>)
Sort the results (e.g. by converting the map into a list of entries and ordering by value descending)
Output the top 1000 entries of the sorted results
The sorting will be slow if the counts are small (e.g. if you've actually got 500,000 separate words) but if you're expecting lots of duplicate words, it should be fine.
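A compact sketch of those three phases, assuming the input is a List<String> named words:

// phase 1: count word occurrences
Map<String, Integer> counts = new HashMap<>();
for (String word : words) {
    counts.merge(word, 1, Integer::sum);
}

// phase 2: sort the entries by count, descending
List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

// phase 3: output the top 1000
sorted.stream().limit(1000)
    .forEach(e -> System.out.println(e.getKey() + "\t" + e.getValue()));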
I have had this question open for a few days now and have decided to rebel against Federico's elegant Java 8 answer and submit the least Java 8 answer possible.
The following code makes use of a helper class that associates a tally with a string.
public class TopOccurringValues {
    static HashMap<String, StringCount> stringCounts = new HashMap<>();
    // set low for demo. Change to 1000 (or whatever)
    static final int TOP_NUMBER_TO_COLLECT = 10;

    public static void main(String[] args) {
        // load your strings in here
        List<String> strings = loadStrings();

        // tally up string occurrences
        for (String string : strings) {
            StringCount stringCount = stringCounts.get(string);
            if (stringCount == null) {
                stringCount = new StringCount(string);
            }
            stringCount.increment();
            stringCounts.put(string, stringCount);
        }

        // sort by which have most
        ArrayList<StringCount> sortedCounts = new ArrayList<>(stringCounts.values());
        Collections.sort(sortedCounts);

        // collect the top occurring strings
        ArrayList<String> topCollection = new ArrayList<>();
        int upperBound = Math.min(TOP_NUMBER_TO_COLLECT, sortedCounts.size());
        System.out.println("string\tcount");
        for (int i = 0; i < upperBound; i++) {
            StringCount stringCount = sortedCounts.get(i);
            topCollection.add(stringCount.string);
            System.out.println(stringCount.string + "\t" + stringCount.count);
        }
    }

    // in this demo, strings are randomly generated numbers.
    private static List<String> loadStrings() {
        Random random = new Random(1);
        ArrayList<String> randomStrings = new ArrayList<>();
        for (int i = 0; i < 5000000; i++) {
            randomStrings.add(String.valueOf(Math.round(random.nextGaussian() * 1000)));
        }
        return randomStrings;
    }

    static class StringCount implements Comparable<StringCount> {
        int count = 0;
        String string;

        StringCount(String string) {this.string = string;}

        void increment() {count++;}

        @Override
        public int compareTo(StringCount o) {return o.count - count;}
    }
}
55 lines of code! It's like reverse code golf. The String generator creates 5 million strings instead of 500,000 because: why not?
string count
-89 2108
70 2107
77 2085
-4 2077
36 2077
65 2072
-154 2067
-172 2064
194 2063
-143 2062
The randomly generated strings are drawn from a Gaussian with a standard deviation of 1000, so values closer to 0 occur more often and dominate the top counts.
The solution I chose was to first make a HashMap with key-value pairs of <String, Integer>. I got the counts by iterating over a linked list and inserting the key-value pairs; before insertion I would check for existence and, if present, increase the count. That part was quite straightforward.
For the next part, where I needed to sort according to value, I used the Guava library published by Google. Its multimap makes it easy to sort by value instead of key: it in a sense reverses the hash and allows multiple values to be mapped to one key, so I can keep all of my top 1000, as opposed to some solutions mentioned above which would give me just one value per key.
The last step was to iterate over the multimap (backwards) to get the 1000 most frequent occurrences.
Have a look at the code of the function if you're interested:
private static void findNMostFrequentOccurrences(ArrayList<String> profileName, int n) {
    HashMap<String, Integer> hmap = new HashMap<String, Integer>();
    // iterate through our data
    for (int i = 0; i < profileName.size(); i++) {
        String currentId = profileName.get(i);
        if (hmap.get(currentId) == null) {
            hmap.put(currentId, 1);
        } else {
            int currentCount = hmap.get(currentId);
            currentCount += 1;
            hmap.put(currentId, currentCount);
        }
    }
    // a TreeMultimap keeps its keys (the counts) sorted ascending, so the
    // largest count is always last; a plain ArrayListMultimap would rely on
    // the HashMap's arbitrary iteration order and break the loop below
    TreeMultimap<Integer, String> multimap = TreeMultimap.create();
    hmap.entrySet().forEach(entry -> {
        multimap.put(entry.getValue(), entry.getKey());
    });
    for (int i = 0; i < n; i++) {
        if (!multimap.isEmpty()) {
            int lastKey = Iterables.getLast(multimap.keys());
            String lastValue = Iterables.getLast(multimap.values());
            multimap.remove(lastKey, lastValue);
            System.out.println(i + 1 + ": " + lastValue + ", Occurrences: " + lastKey);
        }
    }
}
You can do that with the Java Stream API:
List<String> input = Arrays.asList(new String[]{"aa", "bb", "cc", "bb", "bb", "aa"});
// First we compute a map of word -> occurrences
final Map<String, Long> collect = input.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// Here we sort the map and collect the first 1000 entries
final List<Map.Entry<String, Long>> entries = new ArrayList<>(collect.entrySet());
final List<Map.Entry<String, Long>> result = entries.stream()
.sorted(Comparator.comparing(Map.Entry::getValue, Comparator.reverseOrder()))
.limit(1000)
.collect(Collectors.toList());
result.forEach(System.out::println);

Intersecting ranges when using RangeMap

I came across a problem like this:
You are maintaining a trading platform for a hedge fund. Traders in your hedge fund execute trading strategies throughout the day.
For the sake of simplicity let's assume that each trading strategy makes i pounds/minute for as long as it's running. i can be negative.
In the end of the day you have a log file that looks as follows:
timestamp_start_1, timestamp_end_1, i_1
timestamp_start_2, timestamp_end_2, i_2
timestamp_start_3, timestamp_end_3, i_3
Each line represents when the strategy started executing, when it stopped, and the income produced as a rate.
Write some code to return the time during the day when the hedge fund was making the highest amount of money per minute.
Examples:
Input:
(1, 13, 400)
(10, 20, 100)
Result :
(10,13) 500
Input:
(12,14, 400)
(10,20,100)
Result:
(12,14) 500
Input:
(10, 20, 400)
(21,25,100)
Result:
(10,20) 400
I have been trying to use guava RangeMap to solve it, but there is no apparent way to intersect the overlapping intervals.
For example:
private static void method(Record[] array) {
    RangeMap<Integer, Integer> rangeMap = TreeRangeMap.create();
    for (Record record : array) {
        rangeMap.put(Range.closed(record.startTime, record.endTime), record.profitRate);
    }
    System.out.println(rangeMap);
}

public static void main(String[] args) {
    Record[] array = {new Record(1, 13, 400), new Record(10, 20, 100)};
    method(array);
}
And the map looks like:
[[1..10)=400, [10..20]=100]
Is there any way to override the overlap behavior or any other data structure that can be used to solve the problem?
Use RangeMap.subRangeMap(Range) to identify all the existing range entries intersecting with some other particular range, which will allow you to filter out the intersections.
This might look something like:
void add(RangeMap<Integer, Integer> existing, Range<Integer> range, int add) {
    // snapshot the intersecting sub-ranges before they get overwritten
    List<Map.Entry<Range<Integer>, Integer>> overlaps = new ArrayList<>(
        existing.subRangeMap(range).asMapOfRanges().entrySet());
    existing.put(range, add);
    // re-apply each overlap with the new rate added on top
    for (Map.Entry<Range<Integer>, Integer> overlap : overlaps) {
        existing.put(overlap.getKey(), overlap.getValue() + add);
    }
}
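With the first example's input, this splits the map into disjoint ranges whose values are the summed rates; finding the answer is then a scan over asMapOfRanges(). A sketch:

RangeMap<Integer, Integer> rates = TreeRangeMap.create();
add(rates, Range.closed(1, 13), 400);
add(rates, Range.closed(10, 20), 100);
System.out.println(rates); // [[1..10)=400, [10..13]=500, (13..20]=100]

// the best interval is the entry with the largest value
Map.Entry<Range<Integer>, Integer> best = rates.asMapOfRanges().entrySet().stream()
    .max(Map.Entry.comparingByValue())
    .get();
System.out.println(best.getKey() + " " + best.getValue()); // [10..13] 500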
As mentioned in the comments: A RangeMap is probably not suitable for this, because the ranges of a RangeMap have to be disjoint.
One approach for solving this in general was mentioned in the comments: One could combine all the ranges, and generate all disjoint ranges from that. For example, given these ranges
|------------| :400
|----------|:100
one could compute all sub-ranges that are implied by their intersections
|--------| :400
|---| :500
|------|:100
Where the range in the middle would obviously be the solution in this case.
But in general, there are some imponderables in the problem statement. For example, it is not entirely clear whether multiple ranges may have the same start time and/or the same end time. Also relevant for possible optimizations is whether the records are "ordered" in any way.
But regardless of that, one generic approach could be as follows:
Compute a mapping from all start times to the records that start there
Compute a mapping from all end times to the records that end there
Walk through all start- and end times (in order), keeping track of the accumulated profit rate for the current interval
(Yes, this is basically the generation of the disjoint sets from the comments. But it does not "construct" a data structure holding this information. It just uses this information to compute the maximum, on the fly).
An implementation could look like this:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

class Record
{
    int startTime;
    int endTime;
    int profitRate;

    public Record(int startTime, int endTime, int profitRate)
    {
        this.startTime = startTime;
        this.endTime = endTime;
        this.profitRate = profitRate;
    }

    @Override
    public String toString()
    {
        return "(" + startTime + "..." + endTime + ", " + profitRate + ")";
    }
}

public class MaxRangeFinder
{
    public static void main(String[] args)
    {
        test01();
        test02();
    }

    private static void test01()
    {
        System.out.println("Test case 01:");
        Record[] records = {
            new Record(1, 13, 400),
            new Record(10, 20, 100),
        };
        Record max = computeMax(Arrays.asList(records));
        printNicely(Arrays.asList(records), max);
    }

    private static void test02()
    {
        System.out.println("Test case 02:");
        Record[] records = {
            new Record(1, 5, 100),
            new Record(2, 6, 200),
            new Record(3, 4, 50),
            new Record(3, 4, 25),
            new Record(5, 8, 200),
        };
        Record max = computeMax(Arrays.asList(records));
        printNicely(Arrays.asList(records), max);
    }

    private static Record computeMax(Collection<? extends Record> records)
    {
        // Create mappings from the start times to all records that start
        // there, and from the end times to the records that end there
        Map<Integer, List<Record>> recordsByStartTime =
            new LinkedHashMap<Integer, List<Record>>();
        for (Record record : records)
        {
            recordsByStartTime.computeIfAbsent(record.startTime,
                t -> new ArrayList<Record>()).add(record);
        }
        Map<Integer, List<Record>> recordsByEndTime =
            new LinkedHashMap<Integer, List<Record>>();
        for (Record record : records)
        {
            recordsByEndTime.computeIfAbsent(record.endTime,
                t -> new ArrayList<Record>()).add(record);
        }

        // Collect all times where a record starts or ends
        Set<Integer> eventTimes = new TreeSet<Integer>();
        eventTimes.addAll(recordsByStartTime.keySet());
        eventTimes.addAll(recordsByEndTime.keySet());

        // Walk over all events, keeping track of the
        // starting and ending records
        int accumulatedProfitRate = 0;
        int maxAccumulatedProfitRate = -Integer.MAX_VALUE;
        int maxAccumulatedProfitStartTime = 0;
        int maxAccumulatedProfitEndTime = 0;
        for (Integer eventTime : eventTimes)
        {
            int previousAccumulatedProfitRate = accumulatedProfitRate;

            // Add the profit rate of the starting records
            List<Record> startingRecords = Optional
                .ofNullable(recordsByStartTime.get(eventTime))
                .orElse(Collections.emptyList());
            for (Record startingRecord : startingRecords)
            {
                accumulatedProfitRate += startingRecord.profitRate;
            }

            // Subtract the profit rate of the ending records
            List<Record> endingRecords = Optional
                .ofNullable(recordsByEndTime.get(eventTime))
                .orElse(Collections.emptyList());
            for (Record endingRecord : endingRecords)
            {
                accumulatedProfitRate -= endingRecord.profitRate;
            }

            // Update the information about the maximum, if necessary
            if (accumulatedProfitRate > maxAccumulatedProfitRate)
            {
                maxAccumulatedProfitRate = accumulatedProfitRate;
                maxAccumulatedProfitStartTime = eventTime;
                maxAccumulatedProfitEndTime = eventTime;
            }
            if (previousAccumulatedProfitRate == maxAccumulatedProfitRate &&
                accumulatedProfitRate < previousAccumulatedProfitRate)
            {
                maxAccumulatedProfitEndTime = eventTime;
            }
        }
        return new Record(
            maxAccumulatedProfitStartTime,
            maxAccumulatedProfitEndTime,
            maxAccumulatedProfitRate);
    }

    private static void printNicely(
        Collection<? extends Record> records,
        Record max)
    {
        StringBuilder sb = new StringBuilder();
        int maxEndTime = Collections.max(records,
            (r0, r1) -> Integer.compare(r0.endTime, r1.endTime)).endTime;
        for (Record record : records)
        {
            sb.append(" ")
                .append(createString(record, maxEndTime))
                .append("\n");
        }
        sb.append("Max: ").append(createString(max, maxEndTime));
        System.out.println(sb.toString());
    }

    private static String createString(Record record, int maxEndTime)
    {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < record.startTime)
        {
            sb.append(" ");
            i++;
        }
        sb.append("|");
        while (i < record.endTime)
        {
            sb.append("-");
            i++;
        }
        sb.append("|");
        while (i < maxEndTime)
        {
            sb.append(" ");
            i++;
        }
        sb.append(":").append(record.profitRate);
        return sb.toString();
    }
}
The output for the two test cases given in the code is
Test case 01:
|------------| :400
|----------|:100
Max: |---| :500
Test case 02:
|----| :100
|----| :200
|-| :50
|-| :25
|---|:200
Max: |-| :400

Java : How to do aggregation over a list supporting min, max, avg, last kind of aggregations in each group

I have done this earlier in MySQL itself, as that seems the proper way, but I have to do some business logic calculations and then apply the group by on the resulting list. Any suggestions for doing this in Java without compromising performance? (I have looked at lambdaj; it seems to slow down due to heavy use of proxies, though I haven't tried it.)
List<Item> contains name, value, unixtimestamp as properties, and is returned by the database.
Each record is 5 mins apart.
I should be able to group by a dynamic sample time, say 1 hour, which means grouping every 12 records into one record, and then applying min, max, avg, last on each group.
Any suggestions appreciated.
[Update] I have the below working; I still have to do the aggregation on each of the lists in the indexed map values. As you can see, I created a map of lists, where the key is an integer representation of the sample time requested (30 minutes is the sample requested here).
private List<Item> performConsolidation(List<Item> items) {
    ListMultimap<Integer, Item> groupByTimestamp = ArrayListMultimap.create();
    List<Item> consolidatedItems = new ArrayList<>();
    for (Item item : items) {
        groupByTimestamp.put((int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30)), item);
    }
    return consolidatedItems;
}
Here is one suggestion:
public Map<Long, List<Item>> group_items(List<Item> items, long sample_period) {
    Map<Long, List<Item>> grouped_result = new HashMap<Long, List<Item>>();
    long group_key;
    for (Item item : items) {
        group_key = item.timestamp / sample_period;
        if (grouped_result.containsKey(group_key)) {
            grouped_result.get(group_key).add(item);
        }
        else {
            grouped_result.put(group_key, new ArrayList<Item>());
            grouped_result.get(group_key).add(item);
        }
    }
    return grouped_result;
}
sample_period is the number of seconds to group by: 3600 = hour, 900 = 15 mins
The keys in the map can of course be pretty big numbers (depending on the sample period), but this grouping will preserve the internal time order of the groups, i.e. lower keys are those that come first in time order. If we assume that the data in the original list is in time order, we could get the value of the first key and then subtract it from all keys. That way we get keys 0, 1, etc. In that case, before the for loop starts we need:
long subtract = items.get(0).timestamp / sample_period; // integer division, since both operands are ints/longs
Then inside the for loop:
group_key = item.timestamp / sample_period - subtract;
Something along these lines will work, i.e. group your dataset as you describe. Then you can apply min, max, avg etc. to the resulting lists. But since those functions will of course have to iterate over the individual group lists again, it may be better to incorporate those calculations into this solution, and have the function return something like a Map<Long, Aggregate>, where Aggregate is a new type containing fields for avg, min, max, and then the list of items in the group. As for performance, I would think this is acceptable: this is a plain O(n) solution.
Edit:
OK, I just want to add a more complete solution/suggestion which also calculates the min, max and avg:
public class Aggregate {
    public double avg;
    public double min;
    public double max;
    public List<Item> items = new ArrayList<Item>();

    public Aggregate(Item item) {
        min = item.value;
        max = item.value;
        avg = item.value;
        items.add(item);
    }

    public void addItem(Item item) {
        items.add(item);
        if (item.value < this.min) {
            this.min = item.value;
        }
        else if (item.value > this.max) {
            this.max = item.value;
        }
        this.avg = (this.avg * (this.items.size() - 1) + item.value) / this.items.size();
    }
}
public Map<Long, Aggregate> group_items(List<Item> items, long sample_period) {
    Map<Long, Aggregate> grouped_result = new HashMap<Long, Aggregate>();
    long group_key;
    long subtract = items.get(0).timestamp / sample_period;
    for (Item item : items) {
        group_key = item.timestamp / sample_period - subtract; // item, not items
        if (grouped_result.containsKey(group_key)) {
            grouped_result.get(group_key).addItem(item);
        }
        else {
            grouped_result.put(group_key, new Aggregate(item));
        }
    }
    return grouped_result;
}
That is just a rough solution; we might want to add some more properties to the aggregate, etc. A usage sketch follows.
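A hypothetical usage, assuming Item exposes public value and timestamp (seconds) fields as in the snippets above; the "last" aggregation from the question falls out of the group lists, since the input is assumed to be in time order:

Map<Long, Aggregate> hourly = group_items(items, 3600);
hourly.forEach((bucket, agg) -> {
    Item last = agg.items.get(agg.items.size() - 1); // last sample in the bucket
    System.out.println("bucket " + bucket + ": min=" + agg.min + " max=" + agg.max
        + " avg=" + agg.avg + " last=" + last.value);
});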
Setting aside the computation of min/max/etc., I note that your performConsolidation method looks like it could use Multimaps.index. Just pass it the items and a Function<Item, Integer> that computes the value you want:
return (int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30));
That won't save a ton of code, but it may make it easier to see what's happening at a glance: index(items, timeBucketer).
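A sketch of that suggestion, with timeBucketer (a java.util.function.Function) as an illustrative name; Multimaps.index is Guava and returns an ImmutableListMultimap:

Function<Item, Integer> timeBucketer = item ->
    (int) Math.floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30));

ImmutableListMultimap<Integer, Item> byBucket = Multimaps.index(items, timeBucketer::apply);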
If you can use my xpresso project you can do the following:
Let your input list be:
list<tuple> items = x.list(x.tuple("name1",1d,100),x.tuple("name2",3d,105),x.tuple("name1",4d,210));
You first unzip your list of tuples to get a tuple of lists:
tuple3<list<String>,list<Double>,list<Integer>> unzipped = x.unzip(items, String.class, Double.class, Integer.class);
Then you can aggregate the way you want:
x.print(x.tuple(x.last(unzipped.value0), x.avg(unzipped.value1), x.max(unzipped.value2)));
The preceding will produce:
(name1,2.67,210)
