I have a CSV file that links a region to a zip code. It looks like this (lowest-zip, highest-zip, region):
1600,1799,1
1800,1899,1
4300,4699,1
2820,2839,2
2850,2879,2
2930,2949,2
5600,5819,3
5850,6514,3
6516,6549,3
6800,6849,3
I need a function that returns the region based on the zip code. Something like this:
foo = getRegion(1600) // foo is set to 1
bar = getRegion(1642) // bar is set to 1
baz = getRegion(4351) // baz is set to 1
qux = getRegion(1211) // qux is set to null
The way I currently implemented this is using a HashMap. When I read the CSV I iterate over every value between 1600 and 1799 and create a key-value pair for each zip code / region combination and repeat that for every row in the CSV. The result is a HashMap looking like this:
1600,1
1601,1
1602,1
...
1799,1
1800,1
1801,1
...
This creates a large HashMap, which does work. Is there a more (memory) efficient implementation than exploding this small table to a large HashMap?
Something like the following will help:
class ZipRange {
    int start;
    int end;
}
// Fill this map while parsing the CSV
Map<ZipRange, Integer> zipToRegion = new HashMap<>();
// Returns the region for the given zip code, or -1 if no range contains it
int getRegion(int zipToSearch) {
    for (Map.Entry<ZipRange, Integer> entry : zipToRegion.entrySet()) {
        ZipRange zip = entry.getKey();
        if (zipToSearch >= zip.start && zipToSearch <= zip.end) {
            return entry.getValue();
        }
    }
    return -1;
}
I think you want a segment tree
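If the ranges never overlap (an assumption, though the sample CSV satisfies it), a sorted map keyed by each range's lowest zip is a simpler alternative to a full segment tree. The sketch below is not a segment tree: it uses `TreeMap.floorEntry` to find the range that starts at or below the zip and then checks the upper bound. The class and method names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one map entry per CSV row instead of one per zip code.
class ZipRegionLookup {
    // key = lowest zip of a range, value = {highest zip, region}
    private final TreeMap<Integer, int[]> ranges = new TreeMap<>();

    // Registers one CSV row: lowest-zip, highest-zip, region.
    void addRange(int low, int high, int region) {
        ranges.put(low, new int[] {high, region});
    }

    // Returns the region for the zip code, or null if no range contains it.
    Integer getRegion(int zip) {
        // floorEntry finds the range with the greatest start <= zip
        Map.Entry<Integer, int[]> entry = ranges.floorEntry(zip);
        if (entry == null || zip > entry.getValue()[0]) {
            return null;
        }
        return entry.getValue()[1];
    }
}
```

Each lookup costs O(log n) in the number of CSV rows, and memory grows with the number of rows rather than with the number of zip codes.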
I have a YAML file that looks like this:
foo:
  bar:
    - entry1: 1
      entry2: a
    - entry1: 2
      entry2: b
(Where the actual list is much longer.) I'm reading this file using Apache Configuration2's YAMLConfiguration. I can see the data in the internal data structures used in Apache Configuration2, but I can't figure out how to get this list out. I actually have a class that matches the structure of the list elements, which is what I'd really like to read into:
class MyListEntry {
public int entry1;
public String entry2;
}
How can I get the data from YAMLConfiguration into a List<MyListEntry>?
Here's the solution I found (note that this works for any HierarchicalConfiguration, not just YAMLConfiguration):
// configurationsAt returns a List<HierarchicalConfiguration<ImmutableNode>>, one entry for each element of the list
var subConfigList = hierarchicalConfig.configurationsAt("foo.bar");
List<MyListEntry> myListEntries = new ArrayList<>(subConfigList.size());
// iterate over the sub-configurations and pull out the specific values of interest
for (var subConfig : subConfigList) {
    MyListEntry myListEntry = new MyListEntry();
    myListEntry.entry1 = subConfig.getInt("entry1");
    myListEntry.entry2 = subConfig.getString("entry2");
    myListEntries.add(myListEntry);
}
I searched the site and didn't find anything similar. I'm new to Java streams, but I understand that they can replace loops. I would like to know whether there is a way to filter a CSV file using a stream, as shown below, so that only the repeated records are included in the result, grouped by the Center field.
Initial CSV file
Final result
In addition, the same pair cannot appear in the final result inversely, as shown in the table below:
This shouldn't happen
Is there a way to do it using stream and grouping at the same time, since theoretically, two loops would be needed to perform the task?
Thanks in advance.
You can do it in one pass as a stream with O(n) efficiency:
class PersonKey {
// have a field for every column that is used to detect duplicates
String center, name, mother, birthdate;
public PersonKey(String line) {
// implement String constructor
}
// implement equals and hashCode using all fields
}
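List<String> lines; // the input
Set<PersonKey> seen = new HashSet<>();
// add() returns false for a key that was already seen, so only repeated records pass the filter
List<String> duplicates = lines.stream()
    .filter(p -> !seen.add(new PersonKey(p)))
    .distinct()
    .collect(toList());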
The trick here is that a HashSet has constant time operations and its add() method returns false if the value being added is already in the set, true otherwise.
What I understood from your examples is that you consider an entry a duplicate if all its attributes except the ID have the same value. You can use anyMatch for this:
list.stream().filter(x ->
list.stream().anyMatch(y -> isDuplicate(x, y))).collect(Collectors.toList())
So what does the isDuplicate(x,y) do?
It returns a boolean. In this method you can check whether all fields except the ID have the same value:
private boolean isDuplicate(CsvEntry x, CsvEntry y) {
return !x.getId().equals(y.getId())
&& x.getName().equals(y.getName())
&& x.getMother().equals(y.getMother())
&& x.getBirth().equals(y.getBirth());
}
I've assumed that all the fields are Strings; change the checks according to the actual types. This will give you the duplicate entries with their corresponding IDs. Note that it compares every entry with every other entry, so it takes O(n²) time.
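Another way to get the repeated records grouped by Center in a single pipeline is to group twice: first by every column except the ID, then by center. The sketch below assumes a column layout of id, center, name, mother, birthdate with no quoted commas; that layout and the names `DuplicateFinder`, `keyOf` and `centerOf` are guesses for illustration, since the original tables are not visible here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DuplicateFinder {
    // Duplicate key: every column except the id (assumed to be column 0).
    static List<String> keyOf(String line) {
        List<String> columns = Arrays.asList(line.split(","));
        return columns.subList(1, columns.size());
    }

    // Center is assumed to be column 1.
    static String centerOf(String line) {
        return line.split(",")[1];
    }

    // Returns only the lines whose key occurs more than once, grouped by center.
    static Map<String, List<String>> duplicatesByCenter(List<String> lines) {
        Map<List<String>, List<String>> byKey = lines.stream()
            .collect(Collectors.groupingBy(
                DuplicateFinder::keyOf, LinkedHashMap::new, Collectors.toList()));
        return byKey.values().stream()
            .filter(group -> group.size() > 1)   // keep repeated records only
            .flatMap(List::stream)
            .collect(Collectors.groupingBy(
                DuplicateFinder::centerOf, LinkedHashMap::new, Collectors.toList()));
    }
}
```

Because records are collected into groups rather than matched pairwise, the same pair cannot show up twice in reversed order, and the work stays roughly linear in the number of lines.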
My sample request
{
"requestModel":{
"CUSTID": "100"
},
"returnParameters":[
{
"name":"NETWORK/NETID",
"datatype":"String",
"order":"asc",
"sequence":1
},
{
"name":"INFODATA/NAME",
"datatype":"String",
"order":"asc",
"sequence":1
},
{
"name":"SOURCE/SYSTEM",
"datatype":"int",
"order":"asc",
"sequence":2
}
]
}
Sample Response
Below is the dynamically generated JSON response in Map format (the response parameters differ each time, depending on the request parameters):
"responseModel":{
"documents": [
{
"NETWORK":[
{"NETID":"1234"},
{"ACT":"300"}
],
"SOURCE": {
"SYSTEM":"50"
},
"INFODATA":{
"NAME":"PHIL"
}
},
{
"NETWORK":[
{"NETID":"1234"},
{"ACT":"300"}
],
"SOURCE": {
"SYSTEM":"100"
},
"INFODATA":{
"NAME":"PHIL"
}
}
]
}
Problem Statement
I need to do multi-level sorting based on the "returnParameters" in the request, which is dynamic.
"order" indicates ascending or descending, and "sequence" indicates the priority for ordering (like "group by" in an SQL query).
Code
Map<String,Object> documentList = new HashMap<String,Object>();
JSONObject jsonObject= new JSONObject(response.getContent());
response.getContent() contains the JSON response above in Map format.
Now I convert the map to a list of JSON objects:
JSONArray jsonArray= (JSONArray)jsonObject.get("documents");
ArrayList<JSONObject> list = new ArrayList<>();
for(int i=0;i<jsonArray.length();i++){
list.add((JSONObject) jsonArray.get(i));
}
Collections.sort(list, new ResponseSorter());
public class ResponseSorter implements Comparator<JSONObject> {
@Override
public int compare(JSONObject o1,JSONObject o2){
String s1= (String)((JSONObject) o1.get("NETWORK")).get("NETID");
String s2= (String)((JSONObject) o2.get("NETWORK")).get("NETID");
int i1=Integer.parseInt(s1);
int i2=Integer.parseInt(s2);
return i1-i2;
}
}
I'm stuck on how to proceed. I created a comparator for integers. Should I create one for each data type?
Also, I need to construct the composite comparator dynamically by parsing the "returnParameters". The sample below is hard-coded; how can I create it dynamically?
(String)((JSONObject) o1.get("NETWORK")).get("NETID"); -> this should be built dynamically, since "returnParameters" is also dynamic (NETWORK and NETID may not appear in another request), so my comparator must be able to build the keys at run time.
Would anyone be able to help me create a composite comparator at run time for sorting?
NOTE: a Java POJO cannot be created because the response is dynamic in nature.
In your case a simple comparator that's provided with the sort parameters might be easier to understand than a bunch of nested comparators.
Basically you'd do something like this:
class ReturnParameterComparator implements Comparator<JSONObject> {
    private List<ReturnParameter> params; //set via constructor
    public int compare( JSONObject left, JSONObject right) {
        int result = 0;
        for( ReturnParameter p : params ) {
            //how exactly you get those values depends on the actual structure of your data and parameters
            String leftValueStr = left.get( p );
            String rightValueStr = right.get( p );
            switch( p.datatype ) {
                case "String":
                    result = leftValueStr.compareTo( rightValueStr );
                    break;
                case "int":
                    //convert and then compare - I'll leave the rest for you
                    break;
            }
            //invert the result if the order is descending
            if( "desc".equals( p.order ) ) {
                result = -result;
            }
            //the values are not equal so return the order, otherwise continue with the next parameter
            if( result != 0 ) {
                return result;
            }
        }
        //at this point all values are to be considered equal, otherwise we'd have returned already (from the loop body)
        return 0;
    }
}
Note that this is just a stub to get you started. You'll need to add quite a few things:
how to correctly use the parameters to extract the values from the json objects
how to convert the data based on the type
how to handle nulls, missing or incompatible data (e.g. if a value should be sorted as "int" but it can't be parsed)
Adding all those would be way too much for the scope of this question and depends on your data and requirements anyway.
EDITED after additional questions in comments and additional info in description
You have a couple of steps you need to do here to get to the solution:
You want to have the sorting be dynamic based on the value of the property sequence in the request. So you need to parse the names of those returnParameters and put them in order. Below I map them to a List where each String[] has the name and order (asc/desc). The list will be ordered using the value of sequence:
List<String[]> sortParams = params.stream() // params is a List<JSONObject>
.filter(json -> json.containsKey("sequence")) // filter those that have "sequence" attribute
.sorted( sequence ) // sorting using Comparator called sequence
.map(jsonObj -> new String[]{jsonObj.get("name").toString(), jsonObj.get("order").toString()} )
.collect(Collectors.toList());
Before this, you first map the objects in the returnParameters array of the request to a List. Then the stream is processed by: 1. filtering the JSONObjects to keep only those that have the property sequence; 2. sorting the JSONObjects using the comparator below; 3. getting "name" and "order" from each JSONObject and putting them in a String[]; 4. collecting those arrays into a list. This list is ordered with the priority 1 attributes first, then priority 2, and so on, which is the same order in which you want the JSONObjects sorted in the end.
Comparator<JSONObject> sequence = Comparator.comparingInt(
jsonObj -> Integer.valueOf( jsonObj.get("sequence").toString() )
);
So for your example, sortParams would look like: List( String[]{"NETWORK/NETID", "asc"}, String[]{"INFODATA/NAME", "asc"}, String[]{"SOURCE/SYSTEM", "asc"} )
Then you need to write a method that takes two params: a JSONObject and a String (the path to the property) and returns the value of that property. Originally I advised you to use JSONAware interface and then figure out the sub-class, but let's forget about that for now.
I am not going to write this method for you. Just keep in mind that .get(key) method of JSON.Simple always yields an Object. Write a method with this signature:
public String findSortValue(JSONObject doc, String path){
// split the path
// find the parent
// cast it (parent was returned as an Object of type Object)
// find the child
return value;
}
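For readers who want a starting point, here is one possible shape for such a method. It is only a sketch under assumptions: documents are treated as plain `java.util.Map` and `java.util.List` values (JSON.simple's JSONObject and JSONArray extend HashMap and ArrayList, so they should fit), and when a step of the path lands on an array, as NETWORK does in the sample response, the first element containing the key wins. That array rule and the class name `JsonPaths` are guesses for illustration, not something stated in the question.

```java
import java.util.List;
import java.util.Map;

class JsonPaths {
    // Walks a '/'-separated path such as "INFODATA/NAME" through nested maps and lists.
    // Returns the value as a String, or null when any step of the path is missing.
    static String findSortValue(Map<?, ?> doc, String path) {
        Object current = doc;
        for (String key : path.split("/")) {
            current = child(current, key);
            if (current == null) {
                return null;
            }
        }
        return current.toString();
    }

    // Looks the key up in a map, or in the first element of a list that contains it.
    private static Object child(Object node, String key) {
        if (node instanceof Map) {
            return ((Map<?, ?>) node).get(key);
        }
        if (node instanceof List) {
            for (Object element : (List<?>) node) {
                Object found = child(element, key);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }
}
```

Returning null for a missing path leaves the decision about how to order incomplete documents to the comparator that calls this method.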
Write a generic individual comparator (that compares values of just one sort attribute at a time) and figures out if it's an Int, Date or regular String. I would write this as a regular method so it'll be easier to combine everything later on. Since you had so many questions about this I've made an example:
int individualComparator(String s1, String s2){
    int compResult = 0;
    try{
        int numeric1 = Integer.parseInt(s1);
        int numeric2 = Integer.parseInt(s2);
        // if this point was reached both values could be parsed; Integer.compare avoids overflow
        compResult = Integer.compare(numeric1, numeric2);
    } catch (NumberFormatException nfe){
        // if the catch block is reached they weren't numeric
        try{
            DateTime date1 = DateTime.parse(s1);
            DateTime date2 = DateTime.parse(s2);
            compResult = date1.compareTo(date2); // compareTo method of joda.time, the library I'm using
        } catch (IllegalArgumentException iae){
            //if this catch block is reached they weren't dates either
            compResult = s1.compareTo(s2);
        }
    }
    return compResult;
}
Write an overall Comparator that combines everything:
Comparator<JSONObject> overAllComparator = (jsonObj1, jsonObj2) -> {
    // each sortParams entry is {name, order}; position 0 holds the path to look up
    List<String[]> sortValuesList = sortParams.stream()
        .map(param -> new String[]{ findSortValue(jsonObj1, param[0]), findSortValue(jsonObj2, param[0]) } )
        .collect(Collectors.toList());
    //assuming we always have 3 attributes to sort on
    int comp1 = individualComparator(sortValuesList.get(0)[0], sortValuesList.get(0)[1]);
    int comp2 = individualComparator(sortValuesList.get(1)[0], sortValuesList.get(1)[1]);
    int comp3 = individualComparator(sortValuesList.get(2)[0], sortValuesList.get(2)[1]);
    int result = 0;
    if (comp1 != 0){
        result = comp1;
    } else if (comp2 != 0){
        result = comp2;
    } else{
        result = comp3;
    }
    return result;
};
This Comparator is written lambda-style, for more info https://www.mkyong.com/java8/java-8-lambda-comparator-example/ .
First it takes the ordered list of sortParams we made in step 1 and, for each entry, returns an array where position 0 has the value for jsonObj1 and position 1 has the value for jsonObj2, and collects these arrays in sortValuesList. Then, for each attribute to sort on, it gets the result of the individualComparator method. Finally it goes down the line and returns, as the result of the overall comparison, the first result that isn't 0 (when a comparator returns 0, both values are equal).
The only thing that's missing now is the asc/desc value from the request. You can add that by chaining int comp1 = individualComparator(sortValuesList.get(0)[0], sortValuesList.get(0)[1]); with a simple method that takes an int and a String and multiplies the int by -1 if the String equals "desc". (Remember that in sortParams we added the value for order at position 1 of the array.)
Because the first list we made, sortParams, was ordered by the priority indicated in the request, and we always did everything in the order of this list, the result is a multi-level sort in this order. It is generic and will be determined dynamically by the contents of returnParameters in the request. You can apply it to your list of JSONObjects by using Collections.sort().
My suggestion: learn about:
Comparator.comparing which allows you to build your comparator by specifying the key extractor
Comparator.thenComparing which allows you to chain multiple comparators. The comparators later in the chain are called only if predecessors say the objects are equal
A tutorial if you need one: https://www.baeldung.com/java-8-comparator-comparing
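As a rough illustration of how those two methods can be combined in a loop to build the comparator at run time, here is a sketch. To stay self-contained it assumes each document has already been flattened to a `Map<String, String>` keyed by the parameter name (for example "SOURCE/SYSTEM"); `DynamicSort` and `SortParam` are hypothetical names, and a real version would extract values from the JSON structure and handle missing or unparsable values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class DynamicSort {
    // Hypothetical holder for one entry of "returnParameters".
    static class SortParam {
        final String name;
        final String datatype;
        final String order;
        final int sequence;

        SortParam(String name, String datatype, String order, int sequence) {
            this.name = name;
            this.datatype = datatype;
            this.order = order;
            this.sequence = sequence;
        }
    }

    // Builds one comparator from the parameters, lowest sequence (highest priority) first.
    static Comparator<Map<String, String>> build(List<SortParam> params) {
        List<SortParam> ordered = new ArrayList<>(params);
        ordered.sort(Comparator.comparingInt((SortParam param) -> param.sequence));

        Comparator<Map<String, String>> result = (a, b) -> 0; // neutral start: all equal
        for (SortParam p : ordered) {
            Comparator<Map<String, String>> next;
            if ("int".equals(p.datatype)) {
                next = Comparator.comparingInt(
                    (Map<String, String> doc) -> Integer.parseInt(doc.get(p.name)));
            } else {
                next = Comparator.comparing((Map<String, String> doc) -> doc.get(p.name));
            }
            if ("desc".equals(p.order)) {
                next = next.reversed(); // flips only this level
            }
            result = result.thenComparing(next);
        }
        return result;
    }
}
```

Calling `reversed()` on the individual comparator before chaining it keeps the direction local to that level, which is usually what a per-parameter "order" field asks for.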
I have a test.csv file that is formatted as:
Home,Owner,Lat,Long
5th Street,John,5.6765,-6.56464564
7th Street,Bob,7.75,-4.4534564
9th Street,Kyle,4.64,-9.566467364
10th Street,Jim,14.234,-2.5667564
I have a HashMap populated from a file that contains the same headers as the CSV, in a different format and with no accompanying data.
For example:
Map<Integer, String> container = new HashMap<>();
where,
Key, Value
[0][NULL]
[1][Owner]
[2][Lat]
[3][NULL]
I have also created a second hash map as follows:
BufferedReader reader = new BufferedReader (new FileReader("test.csv"));
CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
Boolean headerParsed = false;
CSVRecord headerRecord = null;
int i;
Map<String,String> value = new HashMap<>();
for (final CSVRecord record : parser) {
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
for (i =0; i< record.size(); i++) {
value.put (headerRecord.get(0), record.get(0));
}
}
I want to read and compare the hash maps: if the container map has a value that is also in the value map, then I put that value into a corresponding object.
example object
public DataSet (//args) {
this.home
this.owner
this.lat
this.longitude
}
I want to create a function that compares the hash maps and, when a key in the value map matches an entry in the container map, sets the corresponding value in the object. Something really simple that also handles the setting efficiently.
Please note: I made the CSV header and the rows finite here. In real life, the CSV could have x number of fields (Home, Owner, Lat, Long, houseType, houseColor, etc.) and n number of values associated with those fields.
First off, your approach to this problem is unnecessarily long. From what I see, all you are trying to do is this:
Select two columns from a CSV file and add them to a data structure. I emphasize the word two because a map has a key and a value: one column becomes the key, and the other becomes the value.
What you should do instead:
Import the names of columns you wish to add to the data structure into two strings. (You may read them from a file).
Iterate over the CSV file using the CSVParser class that you did.
Store the value corresponding to the first desired column in a string, repeat with the value corresponding to the second desired column, and push them both into a DataSet object, and push the DataSet object into a List<DataSet>.
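The steps above could be sketched roughly as follows. This is not the CSVParser-based code from the question: to keep it self-contained it splits lines with `String.split`, so it assumes a simple CSV with a header line and no quoted commas. It also keeps any number of wanted columns in a map per row instead of exactly two fields; `ColumnSelector` and `select` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnSelector {
    // Returns one map per data row, containing only the wanted columns.
    // Assumes the first line is the header and that cells contain no commas.
    static List<Map<String, String>> select(List<String> csvLines, List<String> wanted) {
        List<String> header = Arrays.asList(csvLines.get(0).split(","));
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] cells = line.split(",");
            Map<String, String> row = new LinkedHashMap<>();
            for (String column : wanted) {
                int index = header.indexOf(column);
                // skip columns that are not in the header or missing from a short row
                if (index >= 0 && index < cells.length) {
                    row.put(column, cells[index]);
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the sample file and the wanted columns Owner and Lat, each row map holds just those two entries, and each map could then be copied into a DataSet object.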
If you prefer to stick to your way of solving the problem:
Basically, the empty file is supposed to hold just the headers (column names), and that's why you named the corresponding hash map containers. The second file is supposed to contain the values and hence you named the corresponding hash map values.
First off, where you say
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
you probably mean to say
if (!headerParsed) {
headerRecord = record;
headerParsed = true;
}
and where you say
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(0), record.get(0));
}
you probably mean
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(i), record.get(i));
}
i.e. You iterate over one record and store the value corresponding to each column.
I haven't tried this code on my desktop, but since the for loop also iterates over Home and Long, it will store those columns as well (e.g. value.put("Home", "5th Street")), so you should add an extra check before calling value.put. Wrap it inside an if statement that checks whether headerRecord.get(i) exists in the container hash map.
for (i = 0; i < record.size(); i++) {
    // container maps column index to header name, so look the header up among its values
    if (container.containsValue(headerRecord.get(i))) {
        value.put(headerRecord.get(i), record.get(i));
    }
}
Now thing is, that the data structure itself depends on which values from the containers hash map you want to store. It could be Home and Lat, or Owner and Long. So we are stuck. How about you create a data structure like below:
class DataSet {
    String val1;
    String val2;
}
Also, note that this DataSet is only for storing ONE row. For storing information from multiple rows, you need to create a Linked List of DataSet.
Lastly, the container file contains ALL the column names. Not all these columns will be stored in the Data Set (i.e. You chose to NULL Home and Long. You could have chosen to NULL Owner and Lat), hence the header file is not what you need to make this decision.
If you think about it, just iterate over the values hash map and store the first value in string val1 and the second value in val2.
List<DataSet> myList = new ArrayList<>();
// value is the map that was filled while reading the CSV
for (Map.Entry<String, String> pair : value.entrySet()) {
    DataSet row = new DataSet(); // a new object for every entry
    row.val1 = pair.getKey();
    row.val2 = pair.getValue();
    myList.add(row);
}
I hope this helps.
I need to be able to sort multiple intermediate result sets and write them to a file in sorted order. The sort is based on a single column/key value. Each result set record will be a list of values (like a record in a table).
The intermediate result sets are obtained by querying entirely different databases.
The intermediate result sets are already sorted on some key (or column). They need to be combined and sorted again on the same key (or column) before being written to a file.
Since these result sets can be massive (on the order of MBs), this cannot be done in memory.
My solution, broadly:
Use a hash map and a random access file. Since the result sets are already sorted, when retrieving them I will store the sorted column values as keys in a HashMap. The value in the HashMap will be an address in the random access file where every record associated with that column value is stored.
Any ideas?
Have a pointer into every set, initially pointing to the first entry
Then choose the next result from the set, that offers the lowest entry
Write this entry to the file and increment the corresponding pointer
This approach has basically no overhead and time is O(n). (it's Merge-Sort, btw)
Edit
To clarify: It's the merge part of merge sort.
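For more than two result sets, the "choose the set with the lowest entry" step can be done with a priority queue holding one head record per source. The sketch below is a generic illustration over iterators rather than JDBC result sets; `SortedMerge`, `Head` and the `sink` callback (which would write to the file) are hypothetical names. Only one record per source is held in memory at a time.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

class SortedMerge {
    // Pairs the current head record of a source with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Merges already-sorted sources and passes the records to sink in sorted order.
    static <T> void merge(List<Iterator<T>> sources, Comparator<? super T> order,
                          Consumer<? super T> sink) {
        Comparator<Head<T>> byValue = (x, y) -> order.compare(x.value, y.value);
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(byValue);
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heap.add(new Head<>(source.next(), source));
            }
        }
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            sink.accept(smallest.value); // e.g. write the record to the output file
            if (smallest.source.hasNext()) {
                // advance the "pointer" of the source we just consumed from
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
    }
}
```

With k sources and n records in total, the heap makes this O(n log k); for a small, fixed number of sources that is effectively the linear behaviour described above.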
If you've got 2 pre-sorted result sets, you should be able to iterate them concurrently while writing the output file. You just need to compare the current row in each set:
Simple example (not ready for copy-and-paste use!):
ResultSet a, b;
// fetch a and b (first() requires scrollable result sets)
a.first();
b.first();
while (!a.isAfterLast() || !b.isAfterLast()) {
    if (a.isAfterLast()) {
        // a is exhausted: drain b
        writeToFile(b);
        b.next();
    } else if (b.isAfterLast()) {
        // b is exhausted: drain a
        writeToFile(a);
        a.next();
    } else {
        // both have a current row: write the smaller one and advance that set
        int valueA = a.getInt("SORT_PROPERTY");
        int valueB = b.getInt("SORT_PROPERTY");
        if (valueA < valueB) {
            writeToFile(a);
            a.next();
        } else {
            writeToFile(b);
            b.next();
        }
    }
}
Sounds like you are looking for an implementation of the Balance Line algorithm.