Only take most recent line from CSV when a value appears twice


I'm working with a CSV file in Mule that could look something like the following:
ID|LastUpdated
01|01/12/2016 09:00:00
01|01/12/2016 09:45:00
02|01/12/2016 09:00:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
I'm trying to find a way of stripping out all duplicate occurrences of an ID value by taking only the most recent one, as determined by the LastUpdated column. I'm trying to achieve this using DataWeave but have so far had no luck. I'm open to writing the logic into a custom Java class, but I have limited knowledge of how to do that as well.
My desired output is something like the following:
ID|LastUpdated
01|01/12/2016 09:45:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
Any help or guidance would be appreciated.
Edit: It's worth noting that I expect the inbound file to be quite large (up to 000's of rows), so I need to be aware of performance in my solution.
Edit: a solution using DataWeave can be found on the Mulesoft forum here.

If the dates/hours are always sorted in your CSV as in the example you gave, then you can keep all your IDs as keys in a Map and just update the value corresponding to each ID:
public static void main(String[] arg){
// I replace all the CSV reading by this list for the example
ArrayList<String> lines = new ArrayList<>();
lines.add("01|01/12/2016 09:00:00");
lines.add("01|01/12/2016 09:45:00");
lines.add("02|01/12/2016 09:00:00");
lines.add("02|01/12/2016 09:45:00");
lines.add("03|01/12/2016 09:00:00");
Iterator it = lines.iterator();
Map<String, String> lastLines = new HashMap<String, String>();
while (it.hasNext()) { // Iterate over the CSV lines here
String s = (String)it.next();
String id = s.substring(0, s.indexOf("|"));
String val = s.substring(s.indexOf("|") + 1 , s.length());
lastLines.put(id, val);
}
Iterator<String> keys = lastLines.keySet().iterator();
while (keys.hasNext()) {
String id = (String) keys.next();
System.out.println(id + "|" + lastLines.get(id));
}
}
This produces:
01|01/12/2016 09:45:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
If the CSV records can be in any order, then you need to compare the dates and keep only the most recent one for each ID.
// HH is the 24-hour clock; hh (12-hour) would parse 12:30:00 as 00:30:00
private static final SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss");
public static void main(String... args) {
// I replace all the CSV reading by this list for the example
ArrayList<String> lines = new ArrayList<>();
lines.add("01|01/12/2016 09:45:00");
lines.add("01|01/12/2016 09:00:00");
lines.add("02|01/12/2016 09:00:00");
lines.add("02|01/12/2016 09:45:00");
lines.add("03|01/12/2016 09:00:00");
Iterator it = lines.iterator();
Map<String, String> lastLines = new HashMap<String, String>();
while (it.hasNext()) { // Iterate over the CSV lines here
String s = (String)it.next();
String id = s.substring(0, s.indexOf("|"));
String val = s.substring(s.indexOf("|") + 1 , s.length());
if(lastLines.containsKey(id)){
try{
Date storeDate = sdf.parse(lastLines.get(id));
Date readDate = sdf.parse(val);
if(readDate.getTime() > storeDate.getTime())
lastLines.put(id, val);
}catch(ParseException pe){
pe.printStackTrace();
}
}else{
lastLines.put(id, val);
}
}
Iterator<String> keys = lastLines.keySet().iterator();
while (keys.hasNext()) {
String id = (String) keys.next();
System.out.println(id + "|" + lastLines.get(id));
}
}
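I'm not sure about the date format you are currently using. You may need to adjust the pattern passed to SimpleDateFormat to match your data; note that HH is the 24-hour clock and hh is the 12-hour clock. The pattern letters are described in the SimpleDateFormat documentation.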
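For comparison, here is a minimal sketch of the same idea using java.time and a single pass over the lines. It assumes the header row has already been skipped, that every line has the form ID|dd/MM/yyyy HH:mm:ss with a 24-hour clock, and that the class and method names (LatestPerId, latestPerId) are made up for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestPerId {
    // Assumed layout: day/month/year with a 24-hour clock, e.g. 01/12/2016 09:45:00
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    /** Returns one "ID|LastUpdated" line per ID, keeping the latest timestamp. */
    static List<String> latestPerId(List<String> dataLines) {
        // LinkedHashMap keeps the IDs in the order in which they were first seen
        Map<String, LocalDateTime> latest = new LinkedHashMap<>();
        for (String line : dataLines) {
            int sep = line.indexOf('|');
            String id = line.substring(0, sep);
            LocalDateTime updated = LocalDateTime.parse(line.substring(sep + 1), FORMAT);
            // merge keeps whichever of the stored and the new timestamp is later
            latest.merge(id, updated, (stored, read) -> read.isAfter(stored) ? read : stored);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, LocalDateTime> entry : latest.entrySet()) {
            result.add(entry.getKey() + "|" + entry.getValue().format(FORMAT));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the CSV file (header already removed)
        List<String> lines = Arrays.asList(
                "01|01/12/2016 09:45:00",
                "01|01/12/2016 09:00:00",
                "02|01/12/2016 09:00:00",
                "02|01/12/2016 13:45:00",
                "03|01/12/2016 09:00:00");
        latestPerId(lines).forEach(System.out::println);
    }
}
```

Each line is parsed once and the map holds one entry per distinct ID, so the work grows linearly with the number of rows, which should be acceptable for a file with thousands of rows.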

Just saw this one and I believe #danw had asked this question on Mule forum too. There is a better way to achieve it with DataWeave.
Check out my answer on mule forum -
http://forums.mulesoft.com/questions/40897/only-take-most-recent-line-from-csv-when-a-value-a.html#answer-40975

Related

Get,Put key and values from nested hashmap

I want to create a nested HashMap which returns the frequency of terms among multiple files. Like,
Map<String, Map<String, Integer>> wordToDocumentMap=new HashMap<>();
I have been able to return the number of times a term appears in a file.
String str = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world."; // str stands in for the contents of a file such as a.java
// The query string
String query = "edited Wikipedia volunteers";
// Split the given string and the query string on space
String[] strArr = str.split("\\s+");
String[] queryArr = query.split("\\s+");
// Map to hold the frequency of each word of query in the string
Map<String, Integer> map = new HashMap<>();
for (String q : queryArr) {
for (String s : strArr) {
if (q.equals(s)) {
map.put(q, map.getOrDefault(q, 0) + 1);
}
}
}
// Display the map
System.out.println(map);
My code counts the frequency of each query term individually. But I want to map each query term and its frequency to its file names. I have searched around the web for a solution but am finding it tough to find one that applies to me. Any help would be appreciated!

I hope I'm understanding you correctly.
What you want is to be able to read in a list of files and map the file name to the map you create in the code above. So let's start with your code and let's turn it into a function:
public Map<String, Integer> createFreqMap(String str, String query) {
    // Split the given string and the query string on whitespace
    String[] strArr = str.split("\\s+");
    String[] queryArr = query.split("\\s+");
    // Map to hold the frequency of each word of query in the string
    Map<String, Integer> map = new HashMap<>();
    for (String q : queryArr) {
        for (String s : strArr) {
            if (q.equals(s)) {
                map.put(q, map.getOrDefault(q, 0) + 1);
            }
        }
    }
    // Display the map
    System.out.println(map);
    return map;
}
OK, so now you have a nifty function that makes a map from a string and a query.
Now you're going to want to set up a way of reading a file into a string.
There are a bunch of ways to do this. You can look here for some ways that work for different Java versions: https://stackoverflow.com/a/326440/9789673
Let's go with this (assuming Java 11 or later):
String content = Files.readString(path, StandardCharsets.US_ASCII);
Where path is the path to the file you want.
Now we can put it all together:
String[] paths = {"this.txt", "that.txt"};
Map<String, Map<String, Integer>> output = new HashMap<>();
String query = "edited Wikipedia volunteers"; //String query = "hello";
for (int i = 0; i < paths.length; i++) {
    // Files.readString takes a Path and throws IOException, so handle or declare it
    String content = Files.readString(Paths.get(paths[i]), StandardCharsets.US_ASCII);
    output.put(paths[i], createFreqMap(content, query));
}
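To show how values are put into and read back out of the nested map, here is a small self-contained sketch that works on in-memory strings instead of files. The file names and the contents of b.txt are made up for the example, and the class and method names (TermFrequencies, count) are hypothetical. The outer key is the file name and the inner map goes from query term to count, as in the loop above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class TermFrequencies {
    /** Counts how often each query term occurs in each document (exact, case-sensitive match). */
    static Map<String, Map<String, Integer>> count(Map<String, String> documents, String query) {
        String[] terms = query.split("\\s+");
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Put: create the inner map for this file name on first use
            Map<String, Integer> freq = result.computeIfAbsent(doc.getKey(), k -> new HashMap<>());
            for (String word : doc.getValue().split("\\s+")) {
                for (String term : terms) {
                    if (term.equals(word)) {
                        freq.merge(term, 1, Integer::sum); // add 1, or start at 1
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up documents standing in for file contents
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("a.java", "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.");
        documents.put("b.txt", "edited edited text");

        Map<String, Map<String, Integer>> output = count(documents, "edited Wikipedia volunteers");

        // Get: outer lookup by file name, inner lookup by term
        System.out.println(output.get("a.java").getOrDefault("Wikipedia", 0));
        System.out.println(output.get("b.txt").getOrDefault("edited", 0));
        System.out.println(output.get("b.txt").getOrDefault("volunteers", 0));
    }
}
```

Using getOrDefault on the inner map avoids a null when a term never occurs in a given file.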

Mapping several columns from sql to a java object

I am trying to retrieve and process code from JIRA, unfortunately the pieces of information (which are in the Metadata-Plugin) are saved in a column, not a row.
Picture of JIRA-MySQL-Database
The goal is to save this in an object with following attributes:
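public class DesiredObject {
    private String Object_Key;
    // Java identifiers cannot contain dots, so the metadata keys are written in camelCase
    private String azeKundeName;        // aze.kunde.name
    private Long azeKundeSchluessel;    // aze.kunde.schluessel
    private String azeProjektName;      // aze.projekt.name
    private Long azeProjektSchluessel;  // aze.projekt.schluessel
    // getters and setters here
}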
My workbench is STS and it's a Spring-Boot-Application.
I can fetch a List of Object-Keys with the JRJC using:
JiraController jiraconnect = new JiraController();
List<JiraProject> jiraprojects = new ArrayList<JiraProject>();
jiraprojects = jiraconnect.findJiraProjects();
This works perfectly, and the USER_KEY and USER_VALUE are also easily retrievable, but I hope there is a better way than to perform three SQL searches for each project and then somehow build an object from all those lists.
I was starting with
for (JiraProject jp : jiraprojects) {
String SQL = "select * from jira_metadata where ENRICHED_OBJECT_KEY = ?";
List<DesiredObject> objects = jdbcTemplateObject.query(SQL, new Object[] { "com.atlassian.jira.project.Project:" + jp.getProjectkey() }, XXX); // "do" is a reserved word, hence "objects"
}
to get a list with every object, but I'm stuck as I can't figure out an ObjectMapper (XXX) that is able to write this into an object.
Usually I go with
object.setter(rs.getString("SQL-Column"));
But that isn't working, as all my columns are called the same. (USER_KEY & USER_VALUE)
The Database is automatically created by JIRA, so I can't "fix" it.
The Object_Keys are unique which is why I tried to use those to collect all the data from my SQL-Table.
I hope all you need to enlighten me is in this post, if not feel free to ask for more!
Edit: Don't worry if you see both 'project' and 'projekt'; that's because I gave most of my classes German names and descriptions.

I created a HashMap with the object key and a unique token in brackets, e.g. "(1)JIRA".
String SQL = "select * from ao_cc6aeb_jira_metadata";
List<JiraImportObjekt> jioList = jdbcTemplateObject.query(SQL, new JiraImportObjektMapper());
HashMap<String, String> hmap = new HashMap<String, String>();
Integer unique = 1;
for (JiraImportObjekt jio : jioList) {
hmap.put("(" + unique.toString() + ")" + jio.getEnriched_Object_Key(),
jio.getUser_Key() + "(" + jio.getUser_Value() + ")");
unique++;
}
I changed this into a TreeMap
Map<String, String> tmap = new TreeMap<String, String>(hmap);
And then I iterated through that TreeMap via
String aktuProj = "";
for (String s : tmap.keySet()) {
    if (aktuProj.equals(s.replaceAll("\\([^\\(]*\\)", ""))) {
    } else {
        // Add Element to list and start new Element
    }
    // a lot of other stuff
}
What I did was to put all the data in the right order, iterate through and process everything like I wanted it.
Object hinfo = hmap.get(s);
if (hinfo.toString().replaceAll("\\([^\\(]*\\)", "").equals("aze.kunde.schluessel")) {
Matcher m = Pattern.compile("\\(([^)]+)\\)").matcher(hinfo.toString());
while (m.find()) {
jmo[obj].setAzeKundeSchluessel(Long.parseLong(m.group(1), 10));
// logger.info("AzeKundeSchluessel: " +
// jmo[obj].getAzeKundeSchluessel());
}
} else ...
After the loop I needed to add the last Element.
Now I have a List with the Elements which is easy to use and ready for further steps.
I cut out a lot of code because most of it is customized for my problem. The roadmap should be enough to solve it, though.
Good luck!
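A different way to get from key/value rows to one element per project is to group the rows by object key in a single pass, without the numbered prefixes and regular expressions. The following is only a sketch in plain Java, not the code used above: MetadataRow is a hypothetical holder for the ENRICHED_OBJECT_KEY, USER_KEY and USER_VALUE columns, and the sample values are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MetadataPivot {
    /** Hypothetical holder for one row: ENRICHED_OBJECT_KEY, USER_KEY, USER_VALUE. */
    static class MetadataRow {
        final String objectKey;
        final String userKey;
        final String userValue;

        MetadataRow(String objectKey, String userKey, String userValue) {
            this.objectKey = objectKey;
            this.userKey = userKey;
            this.userValue = userValue;
        }
    }

    /** Groups the rows so that each object key maps to its USER_KEY -> USER_VALUE pairs. */
    static Map<String, Map<String, String>> groupByObjectKey(List<MetadataRow> rows) {
        Map<String, Map<String, String>> grouped = new LinkedHashMap<>();
        for (MetadataRow row : rows) {
            grouped.computeIfAbsent(row.objectKey, k -> new LinkedHashMap<>())
                   .put(row.userKey, row.userValue);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up sample rows; in the real application they would come from one SQL query
        String project = "com.atlassian.jira.project.Project:JIRA";
        List<MetadataRow> rows = Arrays.asList(
                new MetadataRow(project, "aze.kunde.name", "ACME"),
                new MetadataRow(project, "aze.kunde.schluessel", "4711"));

        Map<String, String> values = groupByObjectKey(rows).get(project);
        // The numeric attributes can then be converted where the target object is filled
        long kundeSchluessel = Long.parseLong(values.get("aze.kunde.schluessel"));
        System.out.println(values.get("aze.kunde.name") + " / " + kundeSchluessel);
    }
}
```

Each inner map can then be copied into one target object through its setters, which needs only a single query for all projects.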

Sorting a map by date key in Java

I'm trying to sort a map in java by date key using TreeMap. Here's my code
public static void sort() {
BufferedReader br;
String line;
String[] data;
Date date ;
DateFormat df = new SimpleDateFormat("dd-mm-YYY");
Map<Date,String> map = new TreeMap<Date,String>();
try {
br = new BufferedReader(new FileReader(
"/home/user/Desktop/train/2013-training_set.txt"));
int i=0;
while ((line = br.readLine()) != null) {
++i;
data = line.split(":");
map.put(df.parse(data[1]), line);
}
System.out.println(map.size()+" i = "+i);
Set st = mp.entrySet();
Iterator it = st.iterator();
while (it.hasNext()) {
Map.Entry me = (Map.Entry) it.next();
System.out.print(me.getKey() + "->:");
System.out.println(me.getValue());
}
} catch (Exception e) {
e.printStackTrace();
}
}
data[1] contains the date in string format and looks like e.g. 21-3-2013. The problem is that the TreeMap (mp) stores only 12 key-value pairs (one for each month) instead of the 103 (i) expected. Any ideas?

See http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html.
Use y for year, M for month in year, and d for day in month. Specifically, lowercase m is minute in hour, while uppercase M is month in year.
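As a rough illustration of the corrected pattern, the sketch below parses dates such as 21-3-2013 with "d-M-yyyy" and lets a TreeMap order the lines by date. The line layout (a label, a colon, then the date) is an assumption, since the real file format is not shown, and the class and method names are made up.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortByDate {
    /** Returns the lines keyed and ordered by the date found after the first colon. */
    static Map<Date, String> sortByDate(List<String> lines) {
        // d = day in month, M = month in year, y = year
        SimpleDateFormat df = new SimpleDateFormat("d-M-yyyy");
        Map<Date, String> sorted = new TreeMap<>();
        for (String line : lines) {
            String[] data = line.split(":");
            try {
                sorted.put(df.parse(data[1]), line);
            } catch (ParseException e) {
                throw new IllegalArgumentException("Unparseable date in line: " + line, e);
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed "label:date" layout
        List<String> lines = Arrays.asList("a:21-3-2013", "b:5-1-2013", "c:21-4-2013");
        // Prints the lines in date order: b, a, c
        sortByDate(lines).values().forEach(System.out::println);
    }
}
```

Lines that share exactly the same date would still overwrite each other, because a map holds one value per key.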

It looks like the lines
data = line.split(":");
map.put(df.parse(data[1]), line);
are effectively parsing out only the month. line.split(":") produces an array split on ":". If the dates in your data file are formatted as dd:mm:yyyy, then the resulting array data would be {dd, mm, yyyy}, so data[1] is simply the month.
I could be wrong, but I suspect this is why you are only getting the 12 key-value pairs; you are parsing only for the month, and any time you get a new month key, you're overwriting the old key.

Trying to get the arraylist value inside hashmap key

I'm probably being stupid here... but I need help with this one! Basically I need to do a .contains("message") to determine if the key already contains the incoming message.
Thanks in advance!
EDIT: Just as a note, I do not want it to do anything if it already exists! Currently it's not adding it to the list.
EDIT2: The date will not matter for the incoming message because the incoming message does not have the date portion.
private Map<Integer,List<String>> map = new HashMap<Integer,List<String>>();
public synchronized void addToProblemList(String incomingMessage, int storeNumber){
Date date = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy h:mm:ss a");
String formattedDate = sdf.format(date);
if(map.get(storeNumber)==null){
map.put(storeNumber, new ArrayList<String>());
}
for(String lookForText : map.get(storeNumber)){
if(lookForText.contains(incomingMessage)){
}else if(!lookForText.contains(incomingMessage)){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}
}
It used to look like this, but it always added it:
public synchronized void addToProblemList(String incomingMessage, int storeNumber){
Date date = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy h:mm:ss a");
String formattedDate = sdf.format(date);
if(map.get(storeNumber)==null){
map.put(storeNumber, new ArrayList<String>());
}
if(map.get(storeNumber).contains(incomingMessage)==true){
//Do nothing
}
if (map.get(storeNumber).contains(incomingMessage)==false){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}

What you are adding to the map is the store number as a key and an empty ArrayList as its value.
So when the first message for a store arrives, the list is empty, and your for loop does not execute because there are no elements to iterate over.
So add this:
if(map.get(storeNumber)==null){
ArrayList<String> aList = new ArrayList<String>();
aList.add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
map.put(storeNumber, aList);
}
Note that in map.get(storeNumber).contains(incomingMessage)==true you don't need the boolean comparison, as contains() already returns a boolean.
The reason your original approach wouldn't have worked is that List.contains() checks whether the list contains an exactly matching string. It would not, because the String you added also contained "\nTime of incident: "+formattedDate+"\n..., which would not have matched just incomingMessage.

You have this:
for(String lookForText : map.get(storeNumber)){
if(lookForText.contains(incomingMessage)){
}else if(!lookForText.contains(incomingMessage)){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
}
Try this instead:
List<String> messages = map.get(storeNumber);
if(!messages.contains(incomingMessage)){
map.get(storeNumber).add(incomingMessage+"\nTime of incident: "+formattedDate+"\n--------------------------------------------------------");
}
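Because the stored strings carry the "Time of incident" suffix, an exact List.contains() on the raw message will not find them, and adding to the list inside a for-each loop over that same list can throw ConcurrentModificationException. Here is a sketch that checks first and adds afterwards. It compares with startsWith on the message plus the suffix marker rather than the substring contains() from the question, and the boolean return value and the messagesFor helper are additions for the example.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProblemList {
    private final Map<Integer, List<String>> map = new HashMap<>();

    /** Adds the message for the store unless it is already recorded; returns true if it was added. */
    synchronized boolean addToProblemList(String incomingMessage, int storeNumber) {
        List<String> messages = map.computeIfAbsent(storeNumber, k -> new ArrayList<>());
        // Stored entries are the raw message followed by the timestamp suffix,
        // so compare against that prefix instead of the whole string
        String prefix = incomingMessage + "\nTime of incident: ";
        for (String stored : messages) {
            if (stored.startsWith(prefix)) {
                return false; // already recorded: do nothing
            }
        }
        String formattedDate = new SimpleDateFormat("MM/dd/yyyy h:mm:ss a").format(new Date());
        // The list is only modified after the loop has finished
        messages.add(prefix + formattedDate
                + "\n--------------------------------------------------------");
        return true;
    }

    /** Returns a copy of the messages recorded for the store (empty if there are none). */
    synchronized List<String> messagesFor(int storeNumber) {
        return new ArrayList<>(map.getOrDefault(storeNumber, new ArrayList<>()));
    }

    public static void main(String[] args) {
        ProblemList problems = new ProblemList();
        System.out.println(problems.addToProblemList("Printer down", 7)); // true
        System.out.println(problems.addToProblemList("Printer down", 7)); // false
        System.out.println(problems.messagesFor(7).size());               // 1
    }
}
```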

How to sort a string into a map and print the results

I have a string in the format nm=Alan&hei=72&hair=brown
I would like to split this information up, add a conversion to the first value and print the results in the format
nm Name Alan
hei Height 72
hair Hair Color brown
I've looked at various methods using the split function and hashmaps but have had no luck piecing it all together.
Any advice would be very useful to me.

Map<String, String> aliases = new HashMap<String, String>();
aliases.put("nm", "Name");
aliases.put("hei", "Height");
aliases.put("hair", "Hair Color");
String[] params = str.split("&"); // gives you string array: nm=Alan, hei=72, hair=brown
for (String p : params) {
String[] nv = p.split("=");
String name = nv[0];
String value = nv[1];
System.out.println(nv[0] + " " + aliases.get(nv[0]) + " " + nv[1]);
}
I really do not understand what your problem was...

Try something like this:
static final String DELIMETER = "&";
Map<String,String> map = new HashMap<String,String>();
map.put("nm","Name");
map.put("hei","Height");
map.put("hair","Hair color");
StringBuilder builder = new StringBuilder();
String input = "nm=Alan&hei=72&hair=brown";
String[] splitted = input.split(DELIMETER);
for(String str : splitted){
    int index = str.indexOf("=");
    String key = str.substring(0,index);
    builder.append(key).append(" ");
    builder.append(map.get(key)).append(" ");
    builder.append(str.substring(index + 1)); // skip the "=" itself
    builder.append("\n");
}
System.out.print(builder);

A HashMap consists of many key, value pairs. So when you use split, devise an appropriate regex (&). Once you have your string array, you can use one of the elements as the key (think about which element will make the best key). However, you may now be wondering- "how do I place the rest of elements as the values?". Perhaps you can create a new class which stores the rest of the elements and use objects of this class as values for the hashmap.
Then printing becomes easy- merely search for the value of the corresponding key. This value will be an object; use the appropriate method on this object to retrieve the elements and you should be able to print everything.
Also, remember to handle exceptions in your code. e.g. check for nulls, etc.
Another thing: your question mentions the word "sort". I don't fully get what that means in this context...

Map<String, String> propsMap = new HashMap<String, String>();
Map<String, String> propAlias = new HashMap<String, String>();
propAlias.put("nm", "Name");
propAlias.put("hei", "Height");
propAlias.put("hair", "Hair Color");
String[] props = input.split("&");
if (props != null && props.length > 0) {
for (String prop : props) {
String[] propVal = prop.split("=");
if (propVal != null && propVal.length == 2) {
propsMap.put(propVal[0], propVal[1]);
}
}
}
for (Map.Entry<String, String> tuple : propsMap.entrySet()) {
if (propAlias.containsKey(tuple.getKey())) {
System.out.println(tuple.getKey() + " " + propAlias.get(tuple.getKey()) + " " + tuple.getValue());
}
}
