Splitting using a Pattern matcher (Java)

I have some entries in a log file, and I want to use a Pattern matcher to extract them.
Log entries:
1223-12-23 00:00:00 exception : 1223. Operation Cannot be done
1223-12-24 00:00:01 exception : 1221. Operation Cannot be done
I want to get the entries as:
String[] date = {"1223-12-23 00:00:00", "1223-12-24 00:00:01"};
String[] message = {"exception : 1223. Operation Cannot be done", "exception : 1221. Operation Cannot be done"};
Is there an efficient way to do this?

I already used Flat File Parsing Library to perform a similar task.

Better than my other answer:
import java.util.ArrayList;
import java.util.List;

// dynamic lists of strings for dates and messages
List<String> dates = new ArrayList<>();
List<String> messages = new ArrayList<>();
// split your log file content into lines (handles both \n and \r\n)
String[] lines = yourLogFileContentAsString.split("\\r?\\n");
for (String line : lines) {
    if (line.length() < 20) continue; // skip blank or malformed lines
    // the date is characters 0-18, e.g. "1223-12-23 00:00:00"
    dates.add(line.substring(0, 19));
    // the message starts at character 20, after the separating space
    messages.add(line.substring(20));
}
// you wanted arrays
String[] datesArray = dates.toArray(new String[0]);
String[] messagesArray = messages.toArray(new String[0]);
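Since the question asks specifically about a Pattern matcher, here is a minimal sketch of a regex-based alternative. It assumes every entry sits on its own line and starts with a timestamp shaped like yyyy-MM-dd HH:mm:ss, as in the sample above; the class name LogSplitter and its split method are made up for illustration. Unlike fixed substring offsets, lines that do not match the pattern are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogSplitter {
    // Assumed format: "yyyy-MM-dd HH:mm:ss <message>", one entry per line.
    // Group 1 captures the timestamp, group 2 the rest of the line.
    private static final Pattern ENTRY = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) (.*)$", Pattern.MULTILINE);

    /** Returns {dates, messages} for every line that matches ENTRY; other lines are skipped. */
    static String[][] split(String log) {
        List<String> dates = new ArrayList<>();
        List<String> messages = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            dates.add(m.group(1));
            messages.add(m.group(2));
        }
        return new String[][] {dates.toArray(new String[0]), messages.toArray(new String[0])};
    }

    public static void main(String[] args) {
        String log = "1223-12-23 00:00:00 exception : 1223. Operation Cannot be done\n"
                + "1223-12-24 00:00:01 exception : 1221. Operation Cannot be done";
        String[][] result = split(log);
        String[] date = result[0];
        String[] message = result[1];
        for (int i = 0; i < date.length; i++) {
            System.out.println(date[i] + " -> " + message[i]);
        }
    }
}
```

With Pattern.MULTILINE, ^ and $ match at each line boundary, and since . does not match line terminators by default, a trailing \r from Windows line endings is not captured into the message.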

Related

unique number of words in a propertyfile

What is the optimal way to count the number of unique words in a property file (just the values) in Java 8?
For example, the entries may be:
key1=This is my value for error {0}
key2=This is success message.Great.
The output should be 10 (including {0}).
What I tried:
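Properties property = new Properties();
property.load(in);
StringBuilder completeString = new StringBuilder();
Enumeration<Object> em = property.keys();
while (em.hasMoreElements()) {
    String str = (String) em.nextElement();
    // separate the values with a space so words from different values don't merge
    completeString.append(property.get(str)).append(' ');
}
Set<String> myset = new HashSet<>();
// "+" treats a run of spaces/dots as one separator, so no empty strings are added
String[] s = completeString.toString().split("[ .]+");
for (int i = 0; i < s.length; i++) {
    myset.add(s[i]);
}
for (String sss : myset) {
    System.out.println(sss);
}
System.out.println(myset.size());
Is there a simpler way in Java 8?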

Data used:
I used a dummy Properties object:
Properties prop = new Properties();
prop.put("A", "This is my value for error {0}");
prop.put("B", "This is success message.Great.");
Good old Java:
Using the same logic, you can simply split each property value inside the loop:
Set<String> set = new HashSet<>();
Enumeration<Object> em = prop.keys();
while (em.hasMoreElements()) {
    String key = (String) em.nextElement();
    // split the value, not the key
    for (String s : prop.getProperty(key).split("[ .]")) {
        set.add(s);
    }
}
In Java 8, with the Stream API:
Define the pattern used to split each "word":
Pattern pattern = Pattern.compile("[ .]");
Next, get a Stream<String> of the values.
You can either build a List<Object> from the values:
Stream<String> stream =
        // create a List<Object> from the enumeration of values and stream it
        Collections.list(prop.elements()).stream()
        // convert each value to a String
        .map(o -> (String) o);
Or stream the Map.Entry set of the Properties:
Stream<String> stream =
        prop.entrySet().stream() // iterate the Map.Entry<Object, Object>
        .map(e -> (String) e.getValue());
(I'm not sure which is more efficient.)
Then flatMap the stream, splitting each String into a new Stream<String>:
stream.flatMap(pattern::splitAsStream) // split with the pattern defined above; returns a new Stream<String>
Then collect the stream into a Set:
        .collect(Collectors.toSet()); // collect into a Set<String>
The result is a Set that prints something like this (the iteration order is not guaranteed):
[Great, success, for, This, {0}, is, my, error, message, value]
Summary:
Set<String> set =
        prop.entrySet().stream()
        .map(e -> (String) e.getValue())
        .flatMap(Pattern.compile("[ .]")::splitAsStream)
        .collect(Collectors.toSet());
int uniqueWords = set.size(); // 10 for the example data
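For completeness, here is a self-contained version of the stream approach that also prints the count. It is a sketch under a few assumptions: it reads the values through stringPropertyNames() and getProperty() (which only see entries whose key and value are both Strings) rather than entrySet(), and it drops empty tokens in case a value starts with a separator or contains two in a row. UniqueWordCount and uniqueWords are hypothetical names.

```java
import java.util.Properties;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class UniqueWordCount {
    // Same separator as above: a single space or a dot
    private static final Pattern WORD_SEPARATOR = Pattern.compile("[ .]");

    /** Collects the distinct words found in all property values. */
    static Set<String> uniqueWords(Properties prop) {
        return prop.stringPropertyNames().stream()
                .map(prop::getProperty)                 // key -> value
                .flatMap(WORD_SEPARATOR::splitAsStream) // value -> words
                .filter(word -> !word.isEmpty())        // drop empty tokens
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Properties prop = new Properties();
        prop.setProperty("key1", "This is my value for error {0}");
        prop.setProperty("key2", "This is success message.Great.");
        Set<String> words = uniqueWords(prop);
        System.out.println(words);
        System.out.println(words.size()); // 10
    }
}
```

In a real program you would fill the Properties object with prop.load(in) instead of setProperty.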

Getting original text after using stanford NLP parser

Hello people of the internet,
We're having the following problem with the Stanford NLP API:
We have a String that we want to transform into a list of sentences.
First, we used String sentenceString = Sentence.listToString(sentence); but listToString does not return the original text because of the tokenization. Now we tried to use listToOriginalTextString in the following way:
private static List<String> getSentences(String text) {
    Reader reader = new StringReader(text);
    DocumentPreprocessor dp = new DocumentPreprocessor(reader);
    List<String> sentenceList = new ArrayList<String>();
    for (List<HasWord> sentence : dp) {
        String sentenceString = Sentence.listToOriginalTextString(sentence);
        sentenceList.add(sentenceString);
    }
    return sentenceList;
}
This does not work. Apparently we have to set an attribute "invertible" to true, but we don't know how. How can we do this?
In general, how do you use listToOriginalTextString properly? What preparations are needed?
Sincerely,
Khayet

If I understand correctly, you want to map the tokens back to the original input text after tokenization. You can do it like this:
// tokenize with PTBTokenizer (PTBLexer)
List<CoreLabel> tokens = PTBTokenizer.coreLabelFactory().getTokenizer(new StringReader(text)).tokenize();
// split into sentences with the Stanford sentence splitter (WordToSentenceProcessor)
WordToSentenceProcessor<CoreLabel> processor = new WordToSentenceProcessor<>();
List<List<CoreLabel>> splitSentences = processor.process(tokens);
// for each sentence
for (List<CoreLabel> s : splitSentences) {
    // for each word
    for (CoreLabel token : s) {
        // here you can get the token value and its character offsets in the original text:
        // token.value(), token.beginPosition(), token.endPosition()
    }
}
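To illustrate why the character offsets matter, here is a small self-contained sketch that does not use CoreNLP at all. The Token class and the toy tokenize method are hypothetical stand-ins for CoreLabel and PTBTokenizer; the point is only that joining token values loses the original spacing, while slicing the input from the first token's begin offset to the last token's end offset recovers the sentence exactly, which is what beginPosition() and endPosition() make possible.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetDemo {
    // Hypothetical stand-in for CoreLabel: a token value plus its character offsets
    static final class Token {
        final String value;
        final int begin;
        final int end;

        Token(String value, int begin, int end) {
            this.value = value;
            this.begin = begin;
            this.end = end;
        }
    }

    // Toy tokenizer (NOT PTBTokenizer): runs of letters/digits and single punctuation marks
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            } else {
                i++; // a single punctuation character
            }
            tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    // Recover the exact original text of a token span from its first begin and last end offset
    static String originalText(String text, List<Token> sentence) {
        return text.substring(sentence.get(0).begin, sentence.get(sentence.size() - 1).end);
    }

    public static void main(String[] args) {
        String text = "Hello,   world!";
        List<Token> tokens = tokenize(text);
        StringBuilder joined = new StringBuilder();
        for (Token t : tokens) {
            if (joined.length() > 0) joined.append(' ');
            joined.append(t.value);
        }
        System.out.println(joined);                     // Hello , world !
        System.out.println(originalText(text, tokens)); // Hello,   world!
    }
}
```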

String sentenceStr = sentence.get(CoreAnnotations.TextAnnotation.class);
This gives you the original text of the sentence. An example from the JSONOutputter.java file:
l2.set("id", sentence.get(CoreAnnotations.SentenceIDAnnotation.class));
l2.set("index", sentence.get(CoreAnnotations.SentenceIndexAnnotation.class));
l2.set("sentenceOriginal",sentence.get(CoreAnnotations.TextAnnotation.class));
l2.set("line", sentence.get(CoreAnnotations.LineNumberAnnotation.class));

Only take most recent line from CSV when a value appears twice

I'm working with a CSV file in Mule that could look something like the following:
ID|LastUpdated
01|01/12/2016 09:00:00
01|01/12/2016 09:45:00
02|01/12/2016 09:00:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
I'm trying to find a way of stripping out all duplicate occurrences of an ID value by keeping only the most recent one, as determined by the LastUpdated column. I'm trying to achieve this using DataWeave but have had no luck so far. I'm open to writing the logic into a custom Java class, but I have limited knowledge of how to do that as well.
My desired output is something like the following:
ID|LastUpdated
01|01/12/2016 09:45:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
Any help or guidance would be appreciated.
Edit: it's worth noting that I expect the inbound file to be quite large (up to thousands of rows), so I need to keep performance in mind.
Edit: a solution using DataWeave can be found on the Mulesoft forum here.

If the dates/times are always sorted in your CSV, as in your example, you can keep each ID as a key in a Map and simply update its value every time that ID appears:
public static void main(String[] args) {
    // I replaced all the CSV reading with this list for the example
    List<String> lines = new ArrayList<>();
    lines.add("01|01/12/2016 09:00:00");
    lines.add("01|01/12/2016 09:45:00");
    lines.add("02|01/12/2016 09:00:00");
    lines.add("02|01/12/2016 09:45:00");
    lines.add("03|01/12/2016 09:00:00");
    // LinkedHashMap keeps the IDs in the order they first appear
    Map<String, String> lastLines = new LinkedHashMap<>();
    for (String s : lines) { // iterate over the CSV lines here
        int sep = s.indexOf('|');
        String id = s.substring(0, sep);
        String val = s.substring(sep + 1);
        // a later line for the same ID overwrites the earlier one
        lastLines.put(id, val);
    }
    for (Map.Entry<String, String> entry : lastLines.entrySet()) {
        System.out.println(entry.getKey() + "|" + entry.getValue());
    }
}
This produces:
01|01/12/2016 09:45:00
02|01/12/2016 09:45:00
03|01/12/2016 09:00:00
If the CSV records can be in any order, you need to compare the dates and keep only the most recent one for each ID.
// HH is the 24-hour clock field (hh would be the 12-hour one)
private static final SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss");

public static void main(String... args) {
    // I replaced all the CSV reading with this list for the example
    List<String> lines = new ArrayList<>();
    lines.add("01|01/12/2016 09:45:00");
    lines.add("01|01/12/2016 09:00:00");
    lines.add("02|01/12/2016 09:00:00");
    lines.add("02|01/12/2016 09:45:00");
    lines.add("03|01/12/2016 09:00:00");
    Map<String, String> lastLines = new LinkedHashMap<>();
    for (String s : lines) { // iterate over the CSV lines here
        int sep = s.indexOf('|');
        String id = s.substring(0, sep);
        String val = s.substring(sep + 1);
        String stored = lastLines.get(id);
        if (stored == null) {
            lastLines.put(id, val);
        } else {
            try {
                Date storedDate = sdf.parse(stored);
                Date readDate = sdf.parse(val);
                // keep only the most recent timestamp for this ID
                if (readDate.after(storedDate)) {
                    lastLines.put(id, val);
                }
            } catch (ParseException pe) {
                pe.printStackTrace();
            }
        }
    }
    for (Map.Entry<String, String> entry : lastLines.entrySet()) {
        System.out.println(entry.getKey() + "|" + entry.getValue());
    }
}
I'm not sure which date format you are using, so you may need to adjust the pattern passed to SimpleDateFormat; note that hh is the 12-hour clock field and HH the 24-hour one. You can find the related documentation here.
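As an alternative for Java 8 and later, here is a sketch that swaps SimpleDateFormat for java.time's DateTimeFormatter (which is immutable and thread-safe) and uses Map.merge to keep the later timestamp per ID. It assumes the header line has already been skipped and that dates are dd/MM/yyyy with a 24-hour clock; LatestPerId and latest are made-up names. It still makes a single pass and holds only one entry per ID in memory.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestPerId {
    // Assumption: dd/MM/yyyy with a 24-hour clock; change to MM/dd/yyyy if that is your format
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    /** Returns "ID|LastUpdated" lines, one per ID, keeping the most recent timestamp. */
    static List<String> latest(List<String> rows) {
        Map<String, String> latestById = new LinkedHashMap<>();
        for (String row : rows) {
            int sep = row.indexOf('|');
            String id = row.substring(0, sep);
            String ts = row.substring(sep + 1);
            // merge keeps whichever timestamp parses to the later LocalDateTime
            latestById.merge(id, ts, (oldTs, newTs) ->
                    LocalDateTime.parse(newTs, FMT).isAfter(LocalDateTime.parse(oldTs, FMT)) ? newTs : oldTs);
        }
        List<String> out = new ArrayList<>();
        latestById.forEach((key, value) -> out.add(key + "|" + value));
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "01|01/12/2016 09:45:00",
                "01|01/12/2016 09:00:00",
                "02|01/12/2016 09:00:00",
                "02|01/12/2016 09:45:00",
                "03|01/12/2016 09:00:00");
        latest(rows).forEach(System.out::println);
    }
}
```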

Just saw this one, and I believe @danw also asked this question on the Mule forum. There is a better way to achieve this with DataWeave.
Check out my answer on the Mule forum:
http://forums.mulesoft.com/questions/40897/only-take-most-recent-line-from-csv-when-a-value-a.html#answer-40975

Parsing xml content line by line and extracting some values from it

How can I elegantly extract these values from the following text content? I have a long file that contains thousands of entries. I tried the XML Parser and XML Slurper approaches, but I ran out of memory (I have only 1 GB). So now I'm reading the file line by line and extracting the values. But I think there should be a better way to do this in Java/Groovy, maybe a cleaner and reusable one. (I read the content from standard input.)
One line of content:
<sample t="336" lt="0" ts="1406036100481" s="true" lb="txt1016.pb" rc="" rm="" tn="Thread Group 1-9" dt="" by="0"/>
My Groovy solution:
Map<String, List<Integer>> requestSet = new HashMap<String, List<Integer>>();
String reqName;
String[] tmpData;
Integer reqTime;
System.in.eachLine() { line ->
    if (line.find("sample")) {
        tmpData = line.split(" ");
        reqTime = Integer.parseInt(tmpData[1].replaceAll('"', '').replaceAll("t=", ""));
        reqName = tmpData[5].replaceAll('"', '').replaceAll("lb=", "");
        if (requestSet.containsKey(reqName)) {
            List<Integer> myList = requestSet.get(reqName);
            myList.add(reqTime);
            requestSet.put(reqName, myList);
        } else {
            List<Integer> myList = new ArrayList<Integer>();
            myList.add(reqTime);
            requestSet.put(reqName, myList);
        }
    }
}
Any suggestions or code snippets that would improve this?
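A common way to avoid loading the whole document into memory is a streaming (pull) parser. Below is a sketch using the JDK's StAX API (javax.xml.stream) in Java; it is not from the original thread. It assumes the input is a single well-formed XML document whose sample elements carry the t and lb attributes; the results root element and the second sample in the demo are made up, and SampleStats and readSamples are hypothetical names. Because attributes are read by name, the code no longer depends on their position in the line.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class SampleStats {
    /** Streams the XML and groups every sample's "t" value by its "lb" label. */
    static Map<String, List<Integer>> readSamples(InputStream in) throws XMLStreamException {
        Map<String, List<Integer>> requestSet = new LinkedHashMap<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // events are pulled one at a time; no DOM or XmlSlurper tree is built
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "sample".equals(reader.getLocalName())) {
                    String label = reader.getAttributeValue(null, "lb");
                    int time = Integer.parseInt(reader.getAttributeValue(null, "t"));
                    requestSet.computeIfAbsent(label, k -> new ArrayList<>()).add(time);
                }
            }
        } finally {
            reader.close();
        }
        return requestSet;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in for System.in: the question's sample line inside a hypothetical <results> root,
        // plus a second, made-up sample for the same label
        String xml = "<results>\n"
                + "<sample t=\"336\" lt=\"0\" ts=\"1406036100481\" s=\"true\" lb=\"txt1016.pb\""
                + " rc=\"\" rm=\"\" tn=\"Thread Group 1-9\" dt=\"\" by=\"0\"/>\n"
                + "<sample t=\"120\" lb=\"txt1016.pb\"/>\n"
                + "</results>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readSamples(in)); // {txt1016.pb=[336, 120]}
    }
}
```

In the real program you would call readSamples(System.in) instead of building the demo string.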

Arraylist as Message when sending mail

I have this ArrayList:
ArrayList<String> list = new ArrayList<String>();
I populate this ArrayList from some DB queries, and I must send the list by e-mail.
public void sendMail(ArrayList<String> carriers) throws Exception {
    Email email = new SimpleEmail();
    email.setHostName("mail.test.com.tr");
    email.setSmtpPort(587);
    email.setAuthentication("testuser@mail.test.com.tr", "testuserpass");
    email.setSSLOnConnect(false);
    email.setFrom("testuser@mail.test.com.tr");
    email.setSubject("Test Information List");
    email.setMsg("Last 1 hour Information;\n" + carriers);
    email.addTo("test@mail.test.com.tr");
    email.send();
    System.out.println("Email sent successfully.");
}
When I call sendMail(list), the mail arrives in my mailbox successfully, but all the strings in the list appear side by side in the message body. I want them aligned vertically. Let me explain:
Now:
trying1, trying2, trying3
Desired format:
trying1
trying2
trying3
How can I handle this?
--SOLVED--
StringBuilder b = new StringBuilder();
for (Object carrier : carriers)
    b.append(carrier).append("\n");
String carriersString = b.toString();
I added the lines above at the beginning of the sendMail() method and changed the setMsg line to:
email.setMsg("Last 1 hour Information;\n" + carriersString);
Thanks to @Icewind

You have to concatenate the strings into your desired format yourself. The default toString() of the list joins the values with a comma and a space (and wraps them in square brackets).
Something like this:
StringBuilder b = new StringBuilder();
for (String carrier : carriers)
    b.append(carrier).append("\n");
String carriersString = b.toString();
Or with StringUtils from Apache Commons Lang (http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html):
String carriersString = StringUtils.join(carriers, "\n");
...snip...
email.setMsg("Last 1 hour Information;\n"+carriersString);
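On Java 8 or later, String.join does the same thing without an extra library. A minimal sketch follows; the CarrierMessage class and buildMessage helper are hypothetical names. Unlike the StringBuilder loop, it does not add a trailing newline after the last carrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CarrierMessage {
    // Builds the mail body: the header line followed by one carrier per line
    static String buildMessage(List<String> carriers) {
        return "Last 1 hour Information;\n" + String.join("\n", carriers);
    }

    public static void main(String[] args) {
        List<String> carriers = new ArrayList<>(Arrays.asList("trying1", "trying2", "trying3"));
        System.out.println(buildMessage(carriers));
    }
}
```

Inside sendMail() you would then call email.setMsg(buildMessage(carriers)).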
