SnakeYAML load into Guava MultiMap - java

I am trying to load a YAML file into a Multimap using SnakeYAML, but I keep encountering the following exception: java.base/java.util.LinkedHashMap cannot be cast to com.google.common.collect.Multimap. Is there any way to efficiently and effectively load a SnakeYAML object into a Guava Multimap? I am aware that you can do this with regular HashMaps, but my target YAML has duplicate keys, so it requires the use of Multimaps. Thanks for the help in advance. My code for populating the Multimap using SnakeYAML is as follows:
//Read the config file
InputStream configIn;
try {
    configIn = FileUtil.loadFileAsStream(configPath);
    //Load the config file into the YAML object
    pluginConf = LinkedHashMultimap.create((Multimap<String, Object>) configData.load(configIn));
}
catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
EDIT: Sample YAML that has duplicate keys
join:
  message: "JMessage1"
quit:
  message: "QMessage1"
join:
  message: "JMessage2"
quit:
  message: "QMessage2"
join:
  message: "JMessage3"
quit:
  message: "QMessage3"

I did manage to solve this problem, but I used regex with regular HashMaps instead of the Guava Multimaps like I originally intended. For example, here's a YAML file that has duplicated keys:
join:
  message: "JMessage1"
quit:
  message: "QMessage1"
join:
  message: "JMessage2"
quit:
  message: "QMessage2"
To parse out all of the messages, all I did was add a number to each of the parent keys that were being repeated (join becomes join1, quit becomes quit1, and so on), which makes every key unique. Since the digits can then be stripped back out, each key can easily be reverted to plain join or quit with the regex str.replaceAll("[^a-zA-Z.]", "").toLowerCase() in a loop that iterates through the HashMap. Since the entries now just read as join, quit, join, quit as they pass through the loop, their values can easily be grabbed using something like the following snippet:
if (entry.equals("join")) {
    //Do stuff
}
After that, the values can be added to something like an ArrayList or other collection. In my case, I used an ArrayList and assigned the properties of the YAML file to instance variables of an object. The following snippet from one of my classes in the project accomplishes this:
//Iterate through the HashMap
for (Entry<?, ?> configEntry : pluginConf.entrySet()) {
    //Get a string from the selected key that contains only lower-case letters and periods
    String entryKey = configEntry.getKey().toString().replaceAll("[^a-zA-Z.]", "").toLowerCase();
    //Search for join messages
    if (entryKey.equals("messages.join")) {
        //Get the key name of the current entry
        String joinKeyName = configEntry.getKey().toString();
        //Create a join message object
        JoinMessage newJoinMessage = new JoinMessage(
            configData.getString(joinKeyName + ".message")
        );
        //Add the join message object to the join message ArrayList
        joinMessages.add(newJoinMessage);
        //Add to the join message quantity
        joinMsgQuantity++;
    }
}
Not really using HashMaps that allow duplicates like I originally asked for, but this "hack" worked flawlessly for me. Hope this helps anyone else looking to do something like this.

Related

How to sort a csv file?

I want to create a file in this format:
device1,t1,t2,t3,t4,t5
device2,t1,t2,t3,t4,t5
device3,t6,t7,t8,t9,t10
device4,t6,t7,t8,t9,t10
Here, t1, t2, ..., tn are time stamps.
Each value tn is generated by one execution of a JAR file, which also produces the corresponding device name.
I am able to generate a format like this using the JAR file now:
For example:
Current format in csv file:
device1,t1,device2,t2,device2,t3,device1,t4,device2,t5,device2,t6,device1,t7,device2,t8
I want it in this format in the csv file:
device1-t1,t4,t7
device2-t2,t3,t5,t6,t8
So here, I have to put the time stamps belonging to a specific device on the right-hand side.
Please let me know how I can sort it in Java.
I will answer this here as per my understanding of your question.
What you can do is create a HashMap which stores the device name as the key.
For the values, create a sorted collection.
Feed your timestamps into this sorted collection and keep updating the HashMap entry for the corresponding device-name key.
As you update the timestamp collections, they will automatically be kept in sorted order.
Your HashMap will look like:
key : value (collection)
device1 : t1, t4, t7
device2 : t2, t5, t8 (more timestamps get appended to the end of this collection)
Then write this HashMap data out to the CSV file.
That is what to do from the Java end.
If you want the CSV itself to re-sort whenever a new timestamp is added for a device, I don't think you can do that from Java; you would have to apply some logic to the CSV file itself once all your data has been added to it.
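Here is a minimal sketch of that approach, assuming the current CSV is the device,timestamp pair format shown in the question, that the timestamps sort correctly as plain strings, and that the file names results.csv and sorted.csv are just placeholders:
import java.io.*;
import java.util.*;

public class GroupTimestampsByDevice {
    public static void main(String[] args) throws IOException {
        // device name -> sorted set of its timestamps (TreeSet keeps them ordered)
        Map<String, SortedSet<String>> byDevice = new TreeMap<>();

        try (BufferedReader reader = new BufferedReader(new FileReader("results.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // input is assumed to be device1,t1,device2,t2,... pairs on each line
                String[] fields = line.split(",");
                for (int i = 0; i + 1 < fields.length; i += 2) {
                    byDevice.computeIfAbsent(fields[i], k -> new TreeSet<>()).add(fields[i + 1]);
                }
            }
        }

        // write one line per device, e.g. device1-t1,t4,t7
        try (PrintWriter out = new PrintWriter(new FileWriter("sorted.csv"))) {
            for (Map.Entry<String, SortedSet<String>> e : byDevice.entrySet()) {
                out.println(e.getKey() + "-" + String.join(",", e.getValue()));
            }
        }
    }
}
The TreeSet keeps each device's timestamps sorted as they are inserted, which is the "sorted collection" described above.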
This is the solution:
I got output as:
Entire map:{Device1=[[t8], t9], Device2=[[[[[t2], t3], t5], t7], t10]}
// assumed declarations: tree maps each device to its aggregated timestamps, values is a scratch list
Map<String, String> tree = new TreeMap<>();
List<String> values = new ArrayList<>();
BufferedReader reader = new BufferedReader(new FileReader("results.csv"));
String eachline;
while ((eachline = reader.readLine()) != null)
{
    String[] fields = eachline.split(",");
    if (Integer.parseInt(fields[2]) == 0) // data is = 0
    {
        if (tree.get(fields[0]) != null) // returns null if this key is not present
        {
            values.add(tree.get(fields[0])); // get the existing value for that device
        }
        values.add(fields[1]); // to the previous value, add the next value
        tree.put(fields[0], values.toString()); // write to the map along with the value
        values.clear();
    }
}
reader.close();
System.out.println("Entire map:" + tree);

How to remove duplicate columns after a JOIN in Pig?

Let's say I JOIN two relations like:
-- part looks like:
-- 1,5.3
-- 2,4.9
-- 3,4.9
-- original looks like:
-- 1,Anju,3.6,IT,A,1.6,0.3
-- 2,Remya,3.3,EEE,B,1.6,0.3
-- 3,akhila,3.3,IT,C,1.3,0.3
jnd = JOIN part BY $0, original BY $0;
The output will be:
1,5.3,1,Anju,3.6,IT,A,1.6,0.3
2,4.9,2,Remya,3.3,EEE,B,1.6,0.3
3,4.9,3,akhila,3.3,IT,C,1.3,0.3
Notice that $0 is shown twice in each tuple. E.g.:
1,5.3,1,Anju,3.6,IT,A,1.6,0.3
^     ^
|-----|
I can remove the duplicate key manually by doing:
jnd = foreach jnd generate $0,$1,$3,$4 ..;
Is there a way to remove this dynamically? Like remove(the duplicate key joiner).
I have faced the same kind of issue while working on data set joins and other data processing techniques where the column names get repeated in the output.
So I was working on a UDF which removes the duplicate columns by using the schema name of each field, retaining the data from the first occurrence of each unique column.
Pre-requisite:
All of the fields must have names.
You need to download the UDF file and build it into a jar in order to use it.
UDF file location on GitHub:
GitHub UDF Java File Location
We will take the above question as an example.
--Data Set A contains this data
-- 1,5.3
-- 2,4.9
-- 3,4.9
--Data Set B contains this data
-- 1,Anju,3.6,IT,A,1.6,0.3
-- 2,Remya,3.3,EEE,B,1.6,0.3
-- 3,Akhila,3.3,IT,C,1.3,0.3
PIG Script:
REGISTER /home/user/
DSA = LOAD '/home/user/DSALOC' AS (ROLLNO:int,CGPA:float);
DSB = LOAD '/home/user/DSBLOC' AS (ROLLNO:int,NAME:chararray,SUB1:float,BRANCH:chararray,GRADE:chararray,SUB2:float);
JOINOP = JOIN DSA BY ROLLNO,DSB BY ROLLNO;
After the join we will get the column names as
DSA::ROLLNO:int,DSA::CGPA:float,DSB::ROLLNO:int,DSB::NAME:chararray,DSB::SUB1:float,DSB::BRANCH:chararray,DSB::GRADE:chararray,DSB::SUB2:float
To turn that into
DSA::ROLLNO:int,DSA::CGPA:float,DSB::NAME:chararray,DSB::SUB1:float,DSB::BRANCH:chararray,DSB::GRADE:chararray,DSB::SUB2:float
DSB::ROLLNO:int is removed.
We need to use the UDF as
JOINOP_NODUPLICATES = FOREACH JOINOP GENERATE FLATTEN(org.imagine.REMOVEDUPLICATECOLUMNS(*));
Where org.imagine.REMOVEDUPLICATECOLUMNS is the UDF.
This UDF removes duplicate columns by using the name in the schema. So DSA::ROLLNO:int is retained and DSB::ROLLNO:int is removed from the dataset.

How to retrieve the Field that "hit" in Lucene

Maybe I'm really missing something.
I have indexed a bunch of key/value pairs in Lucene (v4.1 if it matters). Say I have
key1=value1 and key2=value2, e.g. as read from a properties file.
They get indexed both as specific fields and into a catchall "ALL" field, e.g.
new Field("key1", "value1", aFieldTypeMimickingKeywords);
new Field("key2", "value2", aFieldTypeMimickingKeywords);
new Field("ALL", "key1=value1", aFieldTypeMimickingKeywords);
new Field("ALL", "key2=value2", aFieldTypeMimickingKeywords);
// then get added to the Document of course...
I can then do a wildcard search, using
new WildcardQuery(new Term("ALL", "*alue1"));
and it will find the hit.
But it would be nice to get more info, like "what was the complete value (e.g. "key1=value1") that goes with that hit?".
The best I can figure out is to get the Document, then get the list of IndexableFields, then loop over all of them and see if field.stringValue().contains("alue1"). (I can look at the data structures in the debugger and all the info is there.)
This seems completely insane because isn't that what Lucene just did? Shouldn't the hit information return some of the Fields?
Is Lucene missing what seems like "obvious" functionality? Google and staring at the APIs haven't revealed anything straightforward, but I feel like I must be searching for the wrong things.
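For reference, a rough sketch of that brute-force approach under Lucene 4.x, assuming the fields are stored and that searcher and docID come from the search hit (the variable names are made up):
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;

// load the stored document for the hit and scan its fields manually
Document doc = searcher.doc(docID);
for (IndexableField field : doc.getFields()) {
    String value = field.stringValue();
    // stringValue() is null for fields whose value is not a plain string
    if (value != null && value.contains("alue1")) {
        System.out.println(field.name() + " matched: " + value);
    }
}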
You might want to try the IndexSearcher.explain() method. Once you get the ID of the matching document, prepare a query for each field (using the same search keywords) and invoke Explanation.isMatch() for each one: the queries that yield true will give you the matched fields. Example:
for (String field : fields) {
    Query query = new WildcardQuery(new Term(field, "*alue1"));
    Explanation ex = searcher.explain(query, docID);
    if (ex.isMatch()) {
        //Your query matched this field
    }
}

Converting a awk 2D array with counts into hashmap in java

I found this problem quite interesting. I am using an awk 2D array that holds a key, a value, and a count, and it is being printed to a file. The file is in the format below:
A100|B100|3
A100|C100|2
A100|B100|5
Now that I have a file like this, my goal is to convert it into a hash map so that the final output from the hash map is:
A100|B100|8
A100|C100|2
Just an aggregation
The challenge is that this one has 3 dimensions and not two. I did have another file in the format below:
D100|4
H100|5
D100|6
I easily aggregated the above, as it is only 2D, using the code below:
String[] fields = strLine.trim().split("\\|");
if (hashmap.containsKey(fields[0]))
{
    //Update the value of the key here
    hashmap.put(fields[0], hashmap.get(fields[0]) + Integer.parseInt(fields[1]));
}
else
{
    //Inserting the key to the map
    hashmap.put(fields[0], Integer.parseInt(fields[1]));
}
So this was quite simple to implement.
But when it comes to 3D I have to have another check inside. My idea for this is to maintain something like [B100, beanObject] entries:
Map<String, beanClassObject> hashmap = new HashMap<String, beanClassObject>();
plus a second-field hash map (secondFieldHashMap, used in the code below) that maps the second field of the file to the corresponding bean object.
The bean class would have getter and setter methods for the 2nd and 3rd fields of the file. I hope I am clear with this point. So the implementation of this would be:
if (hashmap.containsKey(fields[0]))
{
    **//Have to check whether that particular key/value pair already exists... I didn't find any method for this, just a normal iteration. Could you guide me regarding this?**
    //Update the value of the key here
    secondFieldHashMap.get(fields[1]).count = secondFieldHashMap.get(fields[1]).getCount() + Integer.parseInt(fields[2]);
}
else
{
    //Inserting the key to the map
    hashmap.put(fields[0], Integer.parseInt(fields[1]));
    secondFieldHashMap.get(fields[1]).count = Integer.parseInt(fields[2]);
}
else
{
    // This means there is no key field
    // Hence insert the key field and also update the count of secondFieldHashMap as done previously.
}
Could you please throw some ideas my way regarding this? Thank you.
Consider using a Table available in the Google Guava libraries.
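A minimal sketch of that suggestion, assuming the goal is to sum the counts per (first key, second key) pair and that the input file name counts.txt is just a placeholder:
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class AggregateCounts {
    public static void main(String[] args) throws IOException {
        // row key = first field, column key = second field, value = running count
        Table<String, String, Integer> counts = HashBasedTable.create();

        try (BufferedReader reader = new BufferedReader(new FileReader("counts.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.trim().split("\\|");
                int count = Integer.parseInt(fields[2]);
                Integer existing = counts.get(fields[0], fields[1]);
                counts.put(fields[0], fields[1], existing == null ? count : existing + count);
            }
        }

        // print the aggregated rows, e.g. A100|B100|8
        for (Table.Cell<String, String, Integer> cell : counts.cellSet()) {
            System.out.println(cell.getRowKey() + "|" + cell.getColumnKey() + "|" + cell.getValue());
        }
    }
}
A Table gives you one cell per (row, column) pair, so the "third dimension" is simply the value stored in that cell, and no second hash map or bean class is needed.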

retrieving the values from the nested hashmap

I have an XML file with many copies of a table node structure like the one below:
<databasetable TblID="123" TblName="Department1_mailbox">
  <SelectColumns>
    <Slno>dept1_slno</Slno>
    <To>dept1_to</To>
    <From>dept1_from</From>
    <Subject>dept1_sub</Subject>
    <Body>dept1_body</Body>
    <BCC>dept1_BCC</BCC>
    <CC>dept1_CC</CC>
  </SelectColumns>
  <WhereCondition>MailSentStatus='New'</WhereCondition>
  <UpdateSuccess>
    <MailSentStatus>'Yes'</MailSentStatus>
    <MailSentFailedReason>'Mail Sent Successfully'</MailSentFailedReason>
  </UpdateSuccess>
  <UpdateFailure>
    <MailSentStatus>'No'</MailSentStatus>
    <MailSentFailedReason>'Mail Sending Failed '</MailSentFailedReason>
  </UpdateFailure>
</databasetable>
Since it is not efficient to traverse the file every time to fetch the details of each node for the queries in the program, I used nested hashmaps to store the details while traversing the XML file the first time. The structure I used is as below:
MapMaster
  Key              Value
  123              MapDetails
    Key              Value
    TblName          Department1_mailbox
    SelectColumns    mapSelect
      Key        Value
      Slno       dept1_slno
      To         dept1_to
      From       dept1_from
      Subject    dept1_sub
      Body       dept1_body
      BCC        dept1_BCC
      CC         dept1_CC
    WhereCondition   MailSentStatus='New'
    UpdateSuccess    mapUS
      MailSentStatus         'Yes'
      MailSentFailedReason   'Mail Sent Successfully'
    UpdateFailure    mapUF
      MailSentStatus         'No'
      MailSentFailedReason   'Mail Sending Failed'
But the problem I'm facing now is retrieving the value part using the nested keys. For example,
if I need the value of the Slno key, I have to specify TblID, SelectColumns, and Slno in nested form like:
String Slno = (String) ((HashMap) ((HashMap) mapMaster.get("123")).get("SelectColumns")).get("Slno");
This is inconvenient to use in a program. Please suggest a solution, but don't just tell me that iterators are available, as I have to fetch individual values from the map according to the needs of my program.
EDIT: my program has to fetch the IDs of the departments that have the privilege to send mails, and then these IDs are compared with the IDs in the XML file. Only the information for the IDs that match is fetched from the XML. That is all my program does. Please help.
Thanks in advance,
Vishu
Never cast to a specific Map implementation. It is better to cast to the Map interface, i.e.
((Map)one.get("foo")).get("bar")
But do not use casting at all in your case. You can define the collection using generics, so the compiler will do the work for you:
Map<String, Map<String, Integer>> one = new HashMap<String, Map<String, Integer>>();
Map<String, Integer> two = new HashMap<String, Integer>();
Now you can say:
int n = one.get("foo").get("bar");
No casting, no problems.
But the better solution is not to use nested maps at all. Create your own classes like SelectColumns, WhereCondition, etc. Each class should have the appropriate private fields, getters, and setters. Now parse your XML, creating instances of these classes, and then use the getters to traverse the data structure.
BTW, if you wish to use JAXB you hardly have to do anything! Something like the following:
Unmarshaller u = JAXBContext.newInstance(SelectColumns.class, WhereCondition.class).createUnmarshaller();
SelectColumns[] columns = (SelectColumns[])u.unmarshal(in);
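As a rough sketch, such a class could look something like this with JAXB annotations; the element names come from the XML above, while the String field types and the selection of fields shown are assumptions:
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// maps the <SelectColumns> element of the XML onto a plain Java object
@XmlRootElement(name = "SelectColumns")
public class SelectColumns {

    private String slno;
    private String to;

    @XmlElement(name = "Slno")
    public String getSlno() { return slno; }
    public void setSlno(String slno) { this.slno = slno; }

    @XmlElement(name = "To")
    public String getTo() { return to; }
    public void setTo(String to) { this.to = to; }

    // ...getters/setters for From, Subject, Body, BCC, CC in the same style
}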
One approach would be to generate fully qualified keys that contain the XML path to the element or attribute. These keys would be unique, stored in a single hashmap, and would get you to the element quickly.
Your code would simply have to generate a unique textual representation of the path, and store and retrieve the XML element based on that key.
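A minimal sketch of that idea, assuming the path segments are joined with "/" and the values are stored as plain strings (the key format is made up for illustration):
import java.util.HashMap;
import java.util.Map;

public class FlatConfigMap {
    public static void main(String[] args) {
        // single flat map: fully qualified XML path -> value
        Map<String, String> config = new HashMap<>();

        // populated once while traversing the XML (values taken from the sample above)
        config.put("123/TblName", "Department1_mailbox");
        config.put("123/SelectColumns/Slno", "dept1_slno");
        config.put("123/SelectColumns/To", "dept1_to");
        config.put("123/WhereCondition", "MailSentStatus='New'");
        config.put("123/UpdateSuccess/MailSentStatus", "'Yes'");

        // retrieval is then a single lookup instead of a chain of nested casts
        String slno = config.get("123/SelectColumns/Slno");
        System.out.println(slno); // dept1_slno
    }
}
The map is filled once while walking the XML; afterwards every lookup is a single get call rather than a chain of casts through the nested maps.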
