My API needs to read a large recordset and transform it into a hierarchy (JSON) so that the UI (Angular) can display it appropriately. I am looking for an efficient way to achieve this transformation (for thousands of records).
Which Collection type is best suited? Are there any preferred mappers?
Details:
public class Batch implements Serializable {
private Timestamp deliveryDateTime;
private String deliveryLocation;
private String patientName;
// other batch details
}
I have a list of batches, Collection<Batch>. When I return this collection to the UI, it needs to be sorted first by deliveryDateTime, then by deliveryLocation, and then by patientName.
The resulting JSON will look like:
{
"deliveryDateTimes": [
{
"deliveryDateTime": "Mon, 20-Nov-2017",
"deliveryLocations": [
{
"deliveryLocation": "location1",
"patients": [
{
"patientName": "LastName1, FirstName1",
"batches": [
{
"otherBatchDetails": "other batch details"
// other batch details.
},
{
"otherBatchDetails": "other batch details"
// other batch details.
}
]
}
]
}
]
}
]
}
You can try this one. I have tried it and it works fine for me.
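import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
public class BatchTest {
    public static void main(String[] args) {
        List<Batch> sortedList = generateBatches().stream()
                .sorted(Comparator.comparing(Batch::getDeliveryDateTime).reversed()
                        .thenComparing(Batch::getDeliveryLocation)
                        .thenComparing(Batch::getPatientName))
                .collect(Collectors.toList());
        // groupingBy uses a HashMap by default, which would throw away the sort order;
        // LinkedHashMap::new keeps every level in encounter (sorted) order.
        Map<Date, Map<String, Map<String, List<Batch>>>> result = sortedList.stream()
                .collect(Collectors.groupingBy(Batch::getDeliveryDateTime, LinkedHashMap::new,
                        Collectors.groupingBy(Batch::getDeliveryLocation, LinkedHashMap::new,
                                Collectors.groupingBy(Batch::getPatientName, LinkedHashMap::new,
                                        Collectors.toList()))));
        System.out.println("Batches : " + result);
    }
    private static List<Batch> generateBatches() {
        // DB call to fetch list of objects
        return new ArrayList<>();
    }
}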
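The nested Map above serializes as JSON objects keyed by date, location and patient, not as the arrays shown in the question's JSON. If the exact shape matters, one option is to sort once and then build small DTOs in a single pass over the sorted list, which stays O(n log n) for thousands of records. A minimal sketch: the DTO class names are hypothetical, chosen so that their public fields match the JSON keys (Jackson serializes public fields by default), and Batch is a stand-in with a placeholder details field. Wrap the returned list in an object with a deliveryDateTimes field if you need the exact top-level shape.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Stand-in for the question's Batch; "details" is a hypothetical placeholder for the other batch details.
class Batch {
    private final Timestamp deliveryDateTime;
    private final String deliveryLocation;
    private final String patientName;
    private final String details;

    Batch(Timestamp deliveryDateTime, String deliveryLocation, String patientName, String details) {
        this.deliveryDateTime = deliveryDateTime;
        this.deliveryLocation = deliveryLocation;
        this.patientName = patientName;
        this.details = details;
    }

    public Timestamp getDeliveryDateTime() { return deliveryDateTime; }
    public String getDeliveryLocation() { return deliveryLocation; }
    public String getPatientName() { return patientName; }
    public String getDetails() { return details; }
}

// Hypothetical DTOs, one per JSON level.
class DeliveryDateGroup {
    public Timestamp deliveryDateTime;
    public List<DeliveryLocationGroup> deliveryLocations = new ArrayList<>();
}

class DeliveryLocationGroup {
    public String deliveryLocation;
    public List<PatientGroup> patients = new ArrayList<>();
}

class PatientGroup {
    public String patientName;
    public List<Batch> batches = new ArrayList<>();
}

class BatchHierarchy {
    static List<DeliveryDateGroup> toHierarchy(Collection<Batch> input) {
        List<Batch> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Batch::getDeliveryDateTime)
                .thenComparing(Batch::getDeliveryLocation)
                .thenComparing(Batch::getPatientName));

        List<DeliveryDateGroup> dates = new ArrayList<>();
        DeliveryDateGroup date = null;
        DeliveryLocationGroup location = null;
        PatientGroup patient = null;
        for (Batch b : sorted) {
            // The list is sorted, so a new group starts exactly when its key changes;
            // starting a new outer group also forces new inner groups.
            if (date == null || !date.deliveryDateTime.equals(b.getDeliveryDateTime())) {
                date = new DeliveryDateGroup();
                date.deliveryDateTime = b.getDeliveryDateTime();
                dates.add(date);
                location = null;
            }
            if (location == null || !location.deliveryLocation.equals(b.getDeliveryLocation())) {
                location = new DeliveryLocationGroup();
                location.deliveryLocation = b.getDeliveryLocation();
                date.deliveryLocations.add(location);
                patient = null;
            }
            if (patient == null || !patient.patientName.equals(b.getPatientName())) {
                patient = new PatientGroup();
                patient.patientName = b.getPatientName();
                location.patients.add(patient);
            }
            patient.batches.add(b);
        }
        return dates;
    }
}
```

The single pass relies only on the sort, so swapping in a different ordering (for example descending dates) only requires changing the comparator.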
A TreeSet collection can be used in this context. A comparator for the TreeSet can be written as follows:
import java.util.Comparator;
class BatchSorter implements Comparator<Batch> {
    @Override
    public int compare(Batch b1, Batch b2) {
        // 1. delivery date/time
        int byDate = b1.getDeliveryDateTime().compareTo(b2.getDeliveryDateTime());
        if (byDate != 0) {
            return byDate;
        }
        // 2. delivery location (only reached if the dates are equal)
        int byLocation = b1.getDeliveryLocation().compareTo(b2.getDeliveryLocation());
        if (byLocation != 0) {
            return byLocation;
        }
        // 3. patient name (only reached if the locations are equal)
        return b1.getPatientName().compareTo(b2.getPatientName());
    }
}
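This can be used in a TreeSet as follows:
TreeSet<Batch> ts = new TreeSet<>(new BatchSorter());
Note that a TreeSet treats two elements as duplicates when compare returns 0, so batches with the same date, location and patient (such as the multiple batches per patient in the question's JSON) would collapse into one. Add a final tie-breaker on a unique field, if Batch has one, or sort a List with Collections.sort(list, new BatchSorter()) instead.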
Related
I'm using Jackson in a Spring MVC application. I want to use a String value as a key name when converting a Java POJO to JSON:
"record": {
"<Dynamic record name String>": {
"value": {
....
}
}
}
So the dynamic record name String could be "abcd", "xyz" or any other string value. How can I define my "record" POJO to have a key like that?
Unfortunately, you cannot have dynamic fields in Java classes (unlike some other languages), so you have two choices:
Using Maps
Using JSON objects (i.e. JsonNode in case of Jackson)
Suppose you have data like this:
{
"record": {
"jon-skeet": {
"name": "Jon Skeet",
"rep": 982706
},
"darin-dimitrov": {
"name": "Darin Dimitrov",
"rep": 762173
},
"novice-user": {
"name": "Novice User",
"rep": 766
}
}
}
Create two classes to capture it, one for the user and another for the object itself:
User.java:
public class User {
private String name;
private Long rep;
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public Long getRep() { return rep; }
public void setRep(Long rep) { this.rep = rep; }
@Override
public String toString() {
return "User{" +
"name='" + name + '\'' +
", rep=" + rep +
'}';
}
}
Data.java:
public class Data {
private Map<String, User> record;
public Map<String, User> getRecord() { return record; }
public void setRecord(Map<String, User> record) { this.record = record; }
@Override
public String toString() {
return "Data{" +
"record=" + record +
'}';
}
}
Now, parse the JSON (I assume there is a data.json file in the root of your classpath):
import com.fasterxml.jackson.databind.ObjectMapper;
public class App {
public static void main(String[] args) throws Exception {
final ObjectMapper objectMapper = new ObjectMapper();
System.out.println(objectMapper.readValue(App.class.getResourceAsStream("/data.json"), Data.class));
System.out.println(objectMapper.readTree(App.class.getResourceAsStream("/data.json")));
}
}
This will output:
Data{record={jon-skeet=User{name='Jon Skeet', rep=982706}, darin-dimitrov=User{name='Darin Dimitrov', rep=762173}, novice-user=User{name='Novice User', rep=766}}}
{"record":{"jon-skeet":{"name":"Jon Skeet","rep":982706},"darin-dimitrov":{"name":"Darin Dimitrov","rep":762173},"novice-user":{"name":"Novice User","rep":766}}}
In the case of a Map you can use some static classes, like User in this case, or go completely dynamic by using Maps of Maps (Map<String, Map<String, ...>>). However, if you find yourself using too many maps, consider switching to JsonNode. It is conceptually similar to a Map and is designed specifically for highly dynamic data, though it can be harder to work with later on.
Take a look at a complete example, I've prepared for you here.
This is in Kotlin but I have found a solution to the same problem using Jackson.
This code assumes there is no root "record" node, so either get rid of it or start one node deeper (for example with node.get("record")). To turn the records, which are keyed by their id, into a list of records that carry the id inside the object:
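// Record is assumed to be a simple class, e.g. data class Record(val id: String, val name: String, val rep: String)
val node = ObjectMapper().reader().readTree(json)
val recordList = mutableListOf<Record>()
node.fields().forEach {
    val record = Record(
        it.key,                         // the record id is the field name
        it.value.get("name").asText(),
        it.value.get("rep").asText()
    )
    recordList.add(record)
}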
node.fields() returns an iterator over the node's children as Map.Entry<String, JsonNode> pairs, where:
key = record id
value = nested data (another JSON object)
Iterating over these entries, you get the id from the key and the nested data from the value.
With this solution, you don't need multiple classes to deserialize the list of records.
I have my data in this format:
{
"0" : {"a": {}}, {"b": {}}, ...
"1" : {"c": {}}, {"d": {}}, ...
.
.
.
}
I am able to capture it into a map using Jackson's @JsonAnySetter annotation, which receives every property that does not match a declared field:
import com.fasterxml.jackson.annotation.JsonAnySetter;
import java.util.LinkedHashMap;
import java.util.Map;
public class Destination{
Map<String, Object> destination = new LinkedHashMap<>();
@JsonAnySetter
void setDestination(String key, Object value) {
destination.put(key, value);
}
}
I have this class (in Java), which I want to use in Spark (1.6):
public class Aggregation {
private Map<String, Integer> counts;
public Aggregation() {
counts = new HashMap<String, Integer>();
}
public Aggregation add(Aggregation ia) {
String key = buildCountString(ia);
addKey(key);
return this;
}
private void addKey(String key, int cnt) {
if(counts.containsKey(key)) {
counts.put(key, counts.get(key) + cnt);
}
else {
counts.put(key, cnt);
}
}
private void addKey(String key) {
addKey(key, 1);
}
public Aggregation merge(Aggregation agg) {
for(Entry<String, Integer> e: agg.counts.entrySet()) {
this.addKey(e.getKey(), e.getValue());
}
return this;
}
private String buildCountString(Aggregation rec) {
...
}
}
When starting Spark I enabled Kryo and registered this class (in Scala):
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[Aggregation]))
And I want to use it with Spark aggregate like this (Scala):
rdd.aggregate(new InteractionAggregation)((agg, rec) => agg.add(rec), (a, b) => a.merge(b) )
Somehow this raises a "Task not serializable" exception.
But when I use the class with map and reduce, everything works fine:
val rdd2= interactionObjects.map( _ => new InteractionAggregation())
rdd2.reduce((a,b) => a.merge(b))
println(rdd2.count())
Do you have an idea why the error occurs with aggregate but not with map/reduce?
Thanks and regards!
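Your Aggregation class should implement Serializable. When you call aggregate, the zero value (your new Aggregation() object) is captured in the task closure that the driver sends to all workers, and Spark serializes closures with Java serialization even when spark.serializer is set to Kryo, so registering the class with Kryo does not help here. With map and reduce, the Aggregation objects are created on the workers instead, so none of them has to be shipped from the driver inside a closure.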
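A minimal sketch of the fix. Because buildCountString is elided in the question, this version uses a hypothetical add(String key) that takes the key directly, purely so the example runs; merge keeps the question's semantics. The roundTrip helper is a local check that the object survives plain Java serialization, which is what the zero value of aggregate needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Implementing java.io.Serializable allows the zero value passed to aggregate()
// to be Java-serialized as part of the task closure.
class Aggregation implements Serializable {
    private static final long serialVersionUID = 1L;

    // HashMap is itself Serializable, so this field needs no special handling.
    private final Map<String, Integer> counts = new HashMap<>();

    // Hypothetical simplification of the question's add(Aggregation), whose key
    // comes from the elided buildCountString(); here the key is passed in directly.
    public Aggregation add(String key) {
        counts.merge(key, 1, Integer::sum);
        return this;
    }

    public Aggregation merge(Aggregation other) {
        other.counts.forEach((k, v) -> counts.merge(k, v, Integer::sum));
        return this;
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }

    // Local check: serialize and deserialize with plain Java serialization.
    static Aggregation roundTrip(Aggregation agg) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(agg);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Aggregation) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A NotSerializableException (an IOException) would surface here.
            throw new IllegalStateException("Aggregation is not serializable", e);
        }
    }
}
```

If the class did not implement Serializable, roundTrip would fail with a NotSerializableException, which is essentially the failure Spark reports as "Task not serializable".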
Our code has several processors, each with several API methods, and each method is also overloaded with a version that accepts a collection.
For example:
public class Foo {
public X foo(Y y){...}
public Collection<X> foo(Collection<Y> y){... // iterate and execute foo(y) ... }
public Z bar(W w){...}
public Collection<Z> bar(Collection<W> w){... // iterate and execute bar(w) ... }
}
public class Other{
// also method and method on collection
}
Naturally, those collection methods all duplicate the same iteration code.
What we are looking for is some pattern, or a use of generics, so that the iteration over the collection is implemented only once; for that, we also need a way to pass the method name somehow.
I'd suggest the Strategy pattern, and doing something like:
public interface Transformer<X, Y> {
Y transform( X input );
}
class Processor {
public <X,Y> Collection<Y> process( Collection<X> input, Transformer<X, Y> transformer) {
Collection<Y> ret = new LinkedList<Y>();
// generic loop, delegating transformation to specific transformer
for( X x : input) {
ret.add( transformer.transform( x ) );
}
return ret;
}
}
Example:
public static void main( String[] args ) {
List<String> strings = new LinkedList<String>();
strings.add( "1" );
strings.add( "2" );
strings.add( "3" );
Processor p = new Processor();
Collection<Integer> numbers = p.process( strings, new Transformer<String, Integer>() {
@Override
public Integer transform( String input ) {
return Integer.parseInt( input );
}
} );
}
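On Java 8 or later, the hand-written Transformer interface corresponds to the standard java.util.function.Function, so the generic loop can accept a Function and callers can pass a lambda or method reference instead of an anonymous class. A minimal sketch; the class name Processors is just for illustration:

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class Processors {
    // The iteration lives here once; the per-element work is the strategy passed in.
    static <X, Y> List<Y> process(Collection<X> input, Function<? super X, ? extends Y> transformer) {
        return input.stream()
                .map(transformer)
                .collect(Collectors.toList());
    }
}
```

For example, Processors.process(strings, Integer::parseInt) replaces the anonymous Transformer above, and a collection overload such as foo(Collection<Y>) could shrink to a one-liner that delegates to process with this::foo.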
I can't see how reflection could help here. You're trying to replace something as trivial as
public Collection<X> foo(Collection<Y> y) {
List<X> result = Lists.newArrayList();
for (Y e : y) result.add(foo(e));
return result;
}
by something probably much slower. I don't think that saving those 3 lines (several times) is worth it, but you might want to try either annotation processing (possibly without using annotations) or dynamic code generation. In both cases you'd write the original class as is without the collection methods and use a different one containing both the scalar and the collection methods.
Or you might want to make it more functionally styled:
public class Foo {
public final RichFunction<Y, X> foo = new RichFunction<Y, X>() {
@Override
public X apply(Y y) {
return foo(y);
}
};
// after some refactoring the original method can be made private
// or inlined into the RichFunction
public X foo(Y y){...}
// instead of calling the original method like
// foo.foo(y)
// you'd use
// foo.foo.apply(y)
// which would work for both the scalar and collection methods
}
public abstract class RichFunction<K, V> implements com.google.common.base.Function<K, V> {
Collection<V> apply(Collection<K> keys) {
List<V> result = Lists.newArrayList();
for (K k : keys) result.add(apply(k));
return result;
}
}
RUAKH, I chose to implement your suggestion of using reflection (although, I admit, I don't like reflection). I did something like the code below. Thanks :)
public class Resource {
private static final int CLIENT_CODE_STACK_INDEX;
static {
// Finds out the index of "this code" in the returned stack trace - funny but it differs in JDK 1.5 and 1.6
int i = 0;
for (StackTraceElement ste : Thread.currentThread().getStackTrace()) {
i++;
if (ste.getClassName().equals(Resource.class.getName())) {
break;
}
}
CLIENT_CODE_STACK_INDEX = i;
}
public static String getCurrentMethodName() {
return Thread.currentThread().getStackTrace()[CLIENT_CODE_STACK_INDEX].getMethodName();
}
protected <IN,OUT> Collection<OUT> doMultiple(String methodName, Collection<IN> inCol, Class<?>... parameterTypes){
Collection<OUT> result = new ArrayList<OUT>();
try {
Method m = this.getClass().getDeclaredMethod(methodName, parameterTypes);
if (inCol==null || inCol.size()==0){
return result;
}
for (IN in : inCol){
Object o = m.invoke(this, in);
result.add((OUT) o);
}
}catch (Exception e){
e.printStackTrace();
}
return result;
}
}
public class FirstResource extends Resource{
public String doSomeThing(Integer i){
// LOTS OF LOGIC
return i.toString();
}
public Collection<String> doSomeThing(Collection<Integer> ints){
return doMultiple(getCurrentMethodName(), ints, Integer.class);
}
}
You should use the Strategy pattern. It lets you avoid the if/else chains that make code more complex, produces less coupled and simpler code, and gives you more ways to configure behavior dynamically.
I have created a Vector<Table> to store data as Table objects. Each Table contains the following components:
[Vector<Record> records, String tableName, String keyColumnName, int recordCount, int columnCount]
I need to sort the tables in the above Vector by tableName, in my own custom order, and return a Vector<Table> sorted that way for other processes.
I have written the method below.
private Vector<Table> orderTables(Vector<Table> loadTables) {
List<String> tableNames = new ArrayList<String>();
for (Table table : loadTables) {
String tblName = table.getTableName();
tableNames.add(tblName);
}
Collections.sort(tableNames, new MyComparable());
return null;
}
But I have no idea how to write a Comparator for this. My custom sort order is stored in a .properties file. I can read the file and get the value, but I don't know how to compare against it.
How could I do it?
Before clarification
You need to write a Comparator for Table objects that delegates to the tableName's comparator:
new Comparator<Table>() {
@Override public int compare(Table one, Table two) {
return one.getTableName().compareTo(two.getTableName());
}
}
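Note that this will consider Tables that have the same name to be equal. This can mess things up if you put these tables in a sorted collection such as a TreeSet or TreeMap, which uses the comparator to detect duplicates. To avoid this, you can detect this case and, if the table names are the same, compare the hash codes with Integer.compare(one.hashCode(), two.hashCode()); plain subtraction can overflow, and distinct objects can still share a hash code.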
Guava's ComparisonChain is a convenient way to write such multi-stage comparisons:
new Comparator<Table>() {
@Override public int compare(Table one, Table two) {
return ComparisonChain.start()
.compare(one.getTableName(), two.getTableName())
.compare(one.hashCode(), two.hashCode())
.result();
}
}
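On Java 8 or later, the same two-stage comparison can be written without Guava using Comparator.comparing and thenComparingInt. A minimal sketch; Table here is a hypothetical stand-in that only has the tableName field:

```java
import java.util.Comparator;

// Hypothetical stand-in for the question's Table class.
class Table {
    private final String tableName;
    Table(String tableName) { this.tableName = tableName; }
    public String getTableName() { return tableName; }
}

class TableComparators {
    // Sort by name first; for equal names fall back to the hash code, like the
    // ComparisonChain version (objects with equal hash codes still compare as 0).
    static final Comparator<Table> BY_NAME_THEN_HASH =
            Comparator.comparing(Table::getTableName)
                      .thenComparingInt(Table::hashCode);
}
```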
After clarification
Okay, the question is to impose a predefined sorting order rather than sorting the Tables by name. In that case, you need to make a Comparator that is aware of the ordering defined in the .properties file.
One way to achieve this is to initialize a mapping of table names to sorting order indices, and refer to that mapping during the comparison. Given the property value:
SORT_ORDER = SALES,SALE_PRODUCTS,EXPENSES,EXPENSES_ITEMS
The mapping should look like:
{
SALES: 0,
SALE_PRODUCTS: 1,
EXPENSES: 2,
EXPENSES_ITEMS: 3
}
Here's what the comparator would look like:
private static class PredefinedOrderComparator implements Comparator<Table> {
public PredefinedOrderComparator() {
// Initialize orderIndex here
}
private final Map<String, Integer> orderIndex;
@Override public int compare(Table one, Table two) {
return orderIndex.get(one.getTableName()) - orderIndex.get(two.getTableName());
}
}
To populate orderIndex from the property value, you need to:
Get the comma-separated list using getProperty() as you mentioned
Split that value on comma (I recommend using Guava's Splitter, but String.split or others will work too)
Initialize a new HashMap<String, Integer> and an int index = 0
Iterate through the split tokens, map the current token to index and increment index
Note the implicit assumption that none of the table names contains a comma.
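A minimal sketch of those steps, assuming the property is named SORT_ORDER as in the example and using String.split instead of Guava's Splitter. Table is again a hypothetical stand-in. One design choice is made here that the version above leaves open: table names missing from the list sort after all listed ones, instead of causing a NullPointerException when orderIndex.get returns null.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the question's Table class.
class Table {
    private final String tableName;
    Table(String tableName) { this.tableName = tableName; }
    public String getTableName() { return tableName; }
}

class PredefinedOrderComparator implements Comparator<Table> {
    private final Map<String, Integer> orderIndex = new HashMap<>();

    PredefinedOrderComparator(Properties props) {
        // Get the comma-separated list, e.g. "SALES,SALE_PRODUCTS,EXPENSES,EXPENSES_ITEMS".
        String value = props.getProperty("SORT_ORDER", "");
        // Split on commas and map each token to its position in the list.
        int index = 0;
        for (String token : value.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                orderIndex.put(name, index++);
            }
        }
    }

    @Override
    public int compare(Table one, Table two) {
        // Unknown names get Integer.MAX_VALUE, so they sort after every listed table.
        int a = orderIndex.getOrDefault(one.getTableName(), Integer.MAX_VALUE);
        int b = orderIndex.getOrDefault(two.getTableName(), Integer.MAX_VALUE);
        return Integer.compare(a, b);
    }
}
```

Since Vector implements List, Collections.sort(loadTables, new PredefinedOrderComparator(props)) sorts the question's Vector<Table> in place, and the method can then return it.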
public class MyComparable implements Comparator<Table>{
@Override
public int compare(Table table1, Table table2) {
return table1.getTableName().compareTo(table2.getTableName());
}
}
For consistency, you may also want to override hashCode and equals in the Table class; Collections.sort itself relies only on the comparator.
I wrote you a very simple example of how to work with a Comparator. If you create a class called Main, paste the contents below into it, then compile and run it, you can see what's going on.
A comparator just needs to implement an interface, which has one method to implement: public int compare(T arg0, T arg1). There you specify how a collection will get sorted; in this case, alphabetically.
I hope this helps you.
import java.util.*;
public class Main {
public static void main(String[] args) {
System.out.println("Start\n");
List<Item> items = new ArrayList<Item>();
for(String s : new String[]{"mzeaez", "xcxv", "hjkhk", "azasq", "iopiop"}) {
items.add(createItem(s));
}
System.out.println("Items before sort:");
System.out.println(Item.toString(items));
Collections.sort(items, new ItemComparator());
System.out.println("Items after sort:");
System.out.println(Item.toString(items));
System.out.println("End");
}
private static Item createItem(String s) {
Item item = new Item();
item.setS(s);
return item;
}
}
class Item {
private String s;
public String getS() {
return s;
}
public void setS(String s) {
this.s = s;
}
@Override
public String toString() {
return "Item: " + s;
}
public static String toString(Collection<Item> items) {
String s = "";
for(Item item : items) {
s += item + "\n";
}
return s;
}
}
class ItemComparator implements Comparator<Item> {
@Override
public int compare(Item item1, Item item2) {
return item1.getS().compareTo(item2.getS());
}
}
I have a class, the outline of which is basically listed below.
import org.apache.commons.math.stat.Frequency;
public class WebUsageLog {
private Collection<LogLine> logLines;
private Collection<Date> dates;
WebUsageLog() {
this.logLines = new ArrayList<LogLine>();
this.dates = new ArrayList<Date>();
}
SortedMap<Double, String> getFrequencyOfVisitedSites() {
SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
Collection<String> domains = new HashSet<String>();
Frequency freq = new Frequency();
for (LogLine line : this.logLines) {
freq.addValue(line.getVisitedDomain());
domains.add(line.getVisitedDomain());
}
for (String domain : domains) {
frequencyMap.put(freq.getPct(domain), domain);
}
return frequencyMap;
}
}
The intention of this application is to allow our Human Resources folks to view the Web Usage Logs we send to them. However, over time I'd like to offer the option to view not only the frequency of visited sites, but also the frequency of other members of LogLine (things like assigned categories, accessed types [text/html, img/jpeg, etc.], filter verdicts, and so on). Ideally, I'd like to avoid writing individual methods to compile the data for each of those types, since they would each end up looking nearly identical to the getFrequencyOfVisitedSites() method.
So, my question is twofold: first, can you see anywhere where this method should be improved, from a mechanical standpoint? And secondly, how would you make this method more generic, so that it might be able to handle an arbitrary set of data?
This is basically the same as Eugene's solution; I just left all the frequency calculation in the original method and used the strategy only to select the field to work on.
If you don't like enums you could certainly do this with an interface instead.
public class WebUsageLog {
private Collection<LogLine> logLines;
private Collection<Date> dates;
WebUsageLog() {
this.logLines = new ArrayList<LogLine>();
this.dates = new ArrayList<Date>();
}
SortedMap<Double, String> getFrequency(LineProperty property) {
SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
Collection<String> values = new HashSet<String>();
Frequency freq = new Frequency();
for (LogLine line : this.logLines) {
freq.addValue(property.getValue(line));
values.add(property.getValue(line));
}
for (String value : values) {
frequencyMap.put(freq.getPct(value), value);
}
return frequencyMap;
}
public enum LineProperty {
VISITED_DOMAIN {
@Override
public String getValue(LogLine line) {
return line.getVisitedDomain();
}
},
CATEGORY {
@Override
public String getValue(LogLine line) {
return line.getCategory();
}
},
VERDICT {
@Override
public String getValue(LogLine line) {
return line.getVerdict();
}
};
public abstract String getValue(LogLine line);
}
}
Then, given an instance of WebUsageLog, you could call it like this (or drop the WebUsageLog.LineProperty prefix with a static import):
WebUsageLog usageLog = ...
SortedMap<Double, String> visitedSiteFrequency = usageLog.getFrequency(WebUsageLog.LineProperty.VISITED_DOMAIN);
SortedMap<Double, String> categoryFrequency = usageLog.getFrequency(WebUsageLog.LineProperty.CATEGORY);
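One mechanical issue shared by the question's method and the enum version: the result is a TreeMap keyed by percentage, so two values with the same frequency map to the same key and all but one of them are silently dropped. A minimal sketch that avoids this, assuming Java 8+: it takes the field as a java.util.function.Function<LogLine, String> (an interface-based strategy, e.g. LogLine::getVisitedDomain), counts occurrences itself instead of using commons-math's Frequency, and returns value-to-proportion pairs ordered from the most to the least frequent. LogLine is a hypothetical stand-in with two of the question's getters.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the question's LogLine.
class LogLine {
    private final String visitedDomain;
    private final String category;
    LogLine(String visitedDomain, String category) {
        this.visitedDomain = visitedDomain;
        this.category = category;
    }
    public String getVisitedDomain() { return visitedDomain; }
    public String getCategory() { return category; }
}

class FrequencyReport {
    // Returns value -> proportion (0..1), ordered from the most to the least frequent value.
    // Values with equal frequency are all kept. groupingBy rejects null keys,
    // so the property function must not return null.
    static Map<String, Double> frequencyOf(Collection<LogLine> lines, Function<LogLine, String> property) {
        Map<String, Long> counts = lines.stream()
                .collect(Collectors.groupingBy(property, Collectors.counting()));
        double total = lines.size();
        Map<String, Double> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> result.put(e.getKey(), e.getValue() / total));
        return result;
    }
}
```

Usage would look like FrequencyReport.frequencyOf(logLines, LogLine::getCategory); adding a new report only needs another getter reference.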
I'd introduce an abstraction like "data processor" for each computation type, so you can just call individual processors for each line:
...
void process(Collection<Processor> processors) {
for (LogLine line : this.logLines) {
for (Processor processor : processors) {
processor.process(line);
}
}
for (Processor processor : processors) {
processor.complete();
}
}
...
public interface Processor {
public void process(LogLine line);
public void complete();
}
public class FrequencyProcessor implements Processor {
SortedMap<Double, String> frequencyMap = new TreeMap<Double, String>(Collections.reverseOrder()); //we reverse order to sort from the highest percentage to the lowest.
Collection<String> domains = new HashSet<String>();
Frequency freq = new Frequency();
public void process(LogLine line) {
String property = getProperty(line);
freq.addValue(property);
domains.add(property);
}
protected String getProperty(LogLine line) {
return line.getVisitedDomain();
}
public void complete() {
for (String domain : domains) {
frequencyMap.put(freq.getPct(domain), domain);
}
}
}
You could also change the LogLine API to be more like a Map, i.e. instead of the strongly typed line.getVisitedDomain() you could use line.get("VisitedDomain"); then you can write a generic FrequencyProcessor for all properties and just pass a property name to its constructor.
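A minimal sketch of that idea, assuming LogLine is reduced to a hypothetical map-backed class with a get(String) method. The Processor interface mirrors the one in this answer, and this FrequencyProcessor keeps plain counts per value instead of using Frequency and a TreeMap; property names such as "VisitedDomain" are illustrative.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-backed LogLine: properties are looked up by name.
class LogLine {
    private final Map<String, String> fields = new HashMap<>();
    LogLine set(String name, String value) { fields.put(name, value); return this; }
    String get(String name) { return fields.get(name); }
}

interface Processor {
    void process(LogLine line);
    void complete();
}

// One class handles any property; the property name is passed to the constructor.
class FrequencyProcessor implements Processor {
    private final String propertyName;
    private final Map<String, Integer> counts = new HashMap<>();
    private int total;
    private Map<String, Double> result;

    FrequencyProcessor(String propertyName) { this.propertyName = propertyName; }

    @Override
    public void process(LogLine line) {
        counts.merge(line.get(propertyName), 1, Integer::sum);
        total++;
    }

    @Override
    public void complete() {
        // Convert raw counts into proportions (0..1) once all lines have been seen.
        result = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), (double) e.getValue() / total);
        }
    }

    Map<String, Double> getResult() { return result; }
}

class LogRunner {
    // Same shape as the answer's process(): every processor sees every line, then completes.
    static void process(Collection<LogLine> lines, Collection<? extends Processor> processors) {
        for (LogLine line : lines) {
            for (Processor p : processors) {
                p.process(line);
            }
        }
        for (Processor p : processors) {
            p.complete();
        }
    }
}
```

The trade-off is losing compile-time checking of property names, which the enum and Function-based variants keep.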