Intelligent String Parsing in Java

I have an email subject line that I need to parse. I need to find the first occurrence of any word from a given list of keywords and extract the word that follows it. The keyword and the value can be separated by '=', ',', ';', '.', or a blank.
For example:
List of keywords for customer: ["customer","client","kunden","kd.nr."]
List of keywords for order: ["order","auftrag","auftragsnummer","auftragnr."]
Separators: [= , ; .]
Subject line: Customer 2013ABC has send an Auftrag 2056899A for Motif=A
I need to extract the following information:
customer=2013ABC
order=2056899A
Motif=A
I am using Java 7, so the Scanner class can be used as well.
Thanks in advance for any tips.

You can achieve this with regular expressions. Here is some sample code (the longer order keywords are listed first so that "auftragsnummer" is not cut short by "auftrag"):
Pattern p = Pattern.compile(".*(customer|client|kunden|kd\\.nr\\.)[=,;\\. ]*(\\w*).*(order|auftragsnummer|auftragnr\\.|auftrag)[=,;\\. ]*(\\w*).*[ ](.*)$", Pattern.CASE_INSENSITIVE);
String subject = "subjectline: kd.nr. 2013ABC has send an Auftrag 2056899A for Motif=A";
Matcher m = p.matcher(subject);
if (m.matches()) {
    System.out.println(m.group(1) + " : " + m.group(2));
    System.out.println(m.group(3) + " : " + m.group(4));
    System.out.println(m.group(5));
}
Hope this helps.
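A single pattern like the one above only matches when the customer keyword comes before the order keyword. As a rough sketch of an alternative, each field can be searched for on its own, so the order of the fields in the subject line no longer matters. The class name SubjectParser and the helpers keywordPattern and valueAfterKeyword are made up for this illustration, and the sketch assumes at least one separator between keyword and value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubjectParser {
    // Keyword lists from the question; longer alternatives come first so that
    // "auftragsnummer" is not matched as "auftrag" followed by "snummer".
    static final Pattern CUSTOMER = keywordPattern("customer|client|kunden|kd\\.nr\\.");
    static final Pattern ORDER = keywordPattern("auftragsnummer|auftragnr\\.|auftrag|order");

    // Builds: word boundary, one keyword, at least one separator, then the value.
    static Pattern keywordPattern(String keywords) {
        return Pattern.compile("\\b(?:" + keywords + ")[=,;. ]+(\\w+)",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns the word after the first keyword occurrence, or null if none is found.
    static String valueAfterKeyword(Pattern p, String subject) {
        Matcher m = p.matcher(subject);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String subject = "Customer 2013ABC has send an Auftrag 2056899A for Motif=A";
        System.out.println("customer=" + valueAfterKeyword(CUSTOMER, subject));
        System.out.println("order=" + valueAfterKeyword(ORDER, subject));
    }
}
```

Because find() returns the leftmost match, each call yields the value after the first keyword occurrence, which is what the question asks for.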

Related

Java Regex OR operator not working properly

I have these strings:
String test1=":test:block1:%a1%a2%a3%a4:block2:BL";
and
String test2=":test:block2:BL:block1:%a1%a2%a3%a4";
I've created a regex pattern to isolate this piece of the string
block1:%a1%a2%a3%a4:
from the rest, so that I end up with:
for test1: "block1:%a1%a2%a3%a4:" (with ':' at the end)
for test2: ":block1:%a1%a2%a3%a4" (with ':' at the beginning)
The regex I've created is:
"(block1:(.*?):|:block1:(.*))";
It works with test1, but with test2 it returns this:
block1:%a1%a2%a3%a4:block2:BL
Can someone give me a hand with this?
Cheers!

You may use
block1:([^:]*)
It matches block1: text and then captures into Group 1 any 0 or more chars other than :.
See Java demo:
String patternString = "block1:([^:]*)";
String[] tests = {":test:block1:%a1%a2%a3%a4:block2:BL",
        ":test:block2:BL:block1:%a1%a2%a3%a4"};
for (int i = 0; i < tests.length; i++)
{
    Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
    Matcher m = p.matcher(tests[i]);
    if (m.find())
    {
        System.out.println(tests[i] + " matched. Match: " +
                m.group(0) + ", Group 1: " + m.group(1));
    }
}
Output:
:test:block1:%a1%a2%a3%a4:block2:BL matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4
:test:block2:BL:block1:%a1%a2%a3%a4 matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4
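One way to see why the alternation from the question misbehaves: the regex engine returns the leftmost match, and the second alternative (:block1:(.*)) can start one character earlier, at the ':' in front of block1, so it wins and its greedy (.*) runs to the end of the input. The sketch below compares both patterns; the class name Block1Demo and the helper firstMatch are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Block1Demo {
    // The pattern from the question and the negated-character-class fix.
    static final Pattern ORIGINAL = Pattern.compile("(block1:(.*?):|:block1:(.*))");
    static final Pattern FIXED = Pattern.compile("block1:([^:]*)");

    // Returns the requested group of the first match, or null if nothing matches.
    static String firstMatch(Pattern p, String input, int group) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String test1 = ":test:block1:%a1%a2%a3%a4:block2:BL";
        String test2 = ":test:block2:BL:block1:%a1%a2%a3%a4";
        // The second alternative starts at the ':' before "block1" and swallows the rest.
        System.out.println(firstMatch(ORIGINAL, test1, 0));
        // [^:]* stops at the next ':' (or at the end of the input).
        System.out.println(firstMatch(FIXED, test1, 1));
        System.out.println(firstMatch(FIXED, test2, 1));
    }
}
```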

Parsing a Log File to Display Data from Multiple Lines Using Regular Expressions

So I'm trying to parse a bit of code here to get message text from a log file. I'll explain as I go. Here's the code:
// Print to interactions
try
{
    // assigns the input file to a filereader object
    BufferedReader infile = new BufferedReader(new FileReader(log));
    sc = new Scanner(log);
    while(sc.hasNext())
    {
        String line=sc.nextLine();
        if(line.contains("LANTALK")){
            Document doc = Jsoup.parse(line);
            Element idto = doc.select("MBXTO").first();
            Element msg = doc.select("MSGTEXT").first();
            System.out.println(" to " + idto.text() + " " +
                    msg.text());
            System.out.println();
        } // End of if
    } // End of while
    try
    {
        // Print to output file
        sc = new Scanner (log);
        while(sc.hasNext())
        {
            String line=sc.nextLine();
            if(line.contains("LANTALK")){
                Document doc = Jsoup.parse(line);
                Element idto = doc.select("MBXTO").first();
                Element msg = doc.select("MSGTEXT").first();
                outFile.println(" to " + idto.text() + " " +
                        msg.text());
                outFile.println();
                outFile.println();
            } // End of if
        } // End of while
    } // end of try
I'm getting input from a log file, here's a sample of what it looks like and the lines that I'm filtering out:
08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>
<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>
</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he
gave me a credit card number</MSGTEXT>
08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]
08:25:20.945 [+] [T:000FF4] [S:1:1:1124:5607:5] LANMSGTYPESTOPPED [0/2 | 0]
08:25:21.327 [+] [T:000FE8] [S:1:1:1124:5607:5] LANMSGTYPESTARTED [0/2 | 0]
So far, I've been able to filter the line that contains the message (LANMSG). And from that, I've been able to get the id number of the recipient (MBXTO). But the next line contains the sender's id, which I need to pull out and display. ([S:1:1:1124:SENDERID:5]). How should I do this? Below is a copy of the output I'm getting:
to 5760 and I talked to him and he gave me a credit card number
And here's what I need to get:
SENDERID to 5760 and I talked to him and he gave me a credit card number
Any help you guys could give me on this would be great. I'm just not sure how to go about getting the information I need.

Your question isn't clear enough, and it looks like you haven't used regex in this code. Remember to state what you have tried before asking.
Anyway, the regex you're looking for is:
(\d{2}:\d{2}:\d{2}\.\d{3})\s\[D\].+?<MBXID>(\d+)<\/MBXID><MBXTO>(\d+)<\/MBXTO>.+?<MSGTEXT>(.+?)<\/MSGTEXT>
Working example in Regex101
It should capture:
$1: 08:25:20.740
$2: 1124
$3: 5760
$4: and I talked to him and he
gave me a credit card number (note that it also captures \n, the newline character).
(In Java, you use matcher.group(number) instead of $number.)
You can then use these group references to build your formatted output.
E.g.: $1 [$2] to [$3] $4
Should return:
08:25:20.740 [1124] to [5760] and I talked to him and he
gave me a credit card number
Remember that when you put a regex in a Java string literal, you must escape every backslash (\), which is why the regex looks longer:
Pattern pattern = Pattern.compile("(\\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\s\\[D\\].+?<MBXID>(\\d+)<\\/MBXID><MBXTO>(\\d+)<\\/MBXTO>.+?<MSGTEXT>(.+?)<\\/MSGTEXT>", Pattern.DOTALL);
// DOTALL makes '.' also match the newlines in the input; the reluctant quantifiers (.+?) keep each match within a single LANMSG entry
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    String output = matcher.group(1) + " [" + matcher.group(2) + "] to [" + matcher.group(3) + "] " + matcher.group(4);
    System.out.println(output);
}
As for your second problem (you have since edited it out of the question, but I'll answer anyway):
You can parse groups 2 and 3 as integers:
int id1 = Integer.parseInt(matcher.group(2));
int id2 = Integer.parseInt(matcher.group(3));
This way you can create a method that returns a name for these IDs, e.g. UserUtil.getName(int id).
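The question actually asks for the sender ID, which sits on the log line after the message ([S:1:1:1124:SENDERID:5]). A possible sketch, assuming the layout of the sample log: every LANMSG entry is followed directly by a "[+] ... [S:...] LANMSG" line, and the sender is the fourth numeric field of the S block. The class name LanMsgParser, the constant SAMPLE and the method format are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LanMsgParser {
    // Group 1: recipient (MBXTO), group 2: message text, group 3: sender id
    // taken from the "[S:a:b:c:SENDER:e] LANMSG" line that follows the message.
    static final Pattern LANMSG = Pattern.compile(
            "<MBXTO>(\\d+)</MBXTO>.*?<MSGTEXT>(.*?)</MSGTEXT>\\s*"
            + "\\S+ \\[\\+\\] \\[T:\\w+\\] \\[S:(?:\\d+:){3}(\\d+):\\d+\\] LANMSG\\b",
            Pattern.DOTALL);

    static final String SAMPLE =
            "08:25:20.740 [D] [T:000FF0] [F:LANTALK2C] <CMD>LANMSG</CMD>\n"
            + "<MBXID>1124</MBXID><MBXTO>5760</MBXTO><SUBTEXT>LanTalk</SUBTEXT><MOBILEADDR>\n"
            + "</MOBILEADDR><LAP>0</LAP><SMS>0</SMS><MSGTEXT>and I talked to him and he\n"
            + "gave me a credit card number</MSGTEXT>\n"
            + "08:25:20.751 [+] [T:000FF0] [S:1:1:1124:5607:5] LANMSG [15/2 | 0]\n"
            + "08:25:20.945 [+] [T:000FF4] [S:1:1:1124:5607:5] LANMSGTYPESTOPPED [0/2 | 0]\n";

    // Returns one "SENDER to RECIPIENT text" line per message found in the log.
    static List<String> format(String log) {
        List<String> lines = new ArrayList<>();
        Matcher m = LANMSG.matcher(log);
        while (m.find()) {
            // Collapse line breaks inside the message into single spaces.
            String text = m.group(2).replaceAll("\\s+", " ").trim();
            lines.add(m.group(3) + " to " + m.group(1) + " " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : format(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

If a message is not followed by such a line, the lazy quantifiers may stretch into the next entry, so a real log would need a stricter pattern or a line-by-line state machine.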

Java - Regex to split mathematical expression for operator excluding operator which comes under brackets

I need to split the string below on its operators using a regex, but my attempt also splits the parts inside brackets.
Input
T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14
Expected output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14
It should not split anything between "(" and ")".
I tried the code below, but it still splits what is inside "(" and ")":
String[] result = expr.split("[+*/]");
Any pointers on how to fix this?
I am new to regex.
Input
(T(i-1).XX_1 + XY_8) + T(i-1).YY_2 * (ZY_14 + ZY_14)
Output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14 , ZY_14
If a term is just T(i-1), it should be ignored.
It does not work for the expression below:
XY_98 + XY_99 +XY_100
String lineExprVal = lineExpr.replaceAll("\\s+","");
String[] result = lineExprVal.split("[+*/-] (?!(^))");

You can split everything outside your parentheses like this:
String str = "T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14";
String result[] = str.split("[+*/-] (?!(^))");
// [+*/-] is the list of your delimiters
System.out.println(Arrays.toString(result));
This will print:
[T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14]
The idea is simple: you split on the delimiters that are not inside your parentheses. Note that this pattern only matches an operator that is followed by a space, which is why the '-' in T(i-1) is left alone and why it stops working once all whitespace has been removed from the input.
You can check the code on ideone and the regex in the regex demo.
EDIT
For your second case, you have to use this regex:
String str = "(T(i-1).XX_1+XY_8)+T(i-1).YY_2*(ZY_14+ZY_14)";
String result[] = str.split("[+*/-](?![^()]*(?:\\([^()]*\\))?\\))");
System.out.println(Arrays.toString(result));
This will give you:
[(T(i-1).XX_1+XY_8), T(i-1).YY_2, (ZY_14+ZY_14)]
 ^----Group 1-----^  ^-Group 2-^  ^--Group 3--^
See the regex demo. I based this solution on this post: Regex to match only comma's but not inside multiple parentheses.
Hope this helps.

Splitting your second expression with split() is really hard, if not impossible, so it is better to use a Pattern and a Matcher instead. For your expression, you need this regex:
(\w+\([\w-*+\/]+\).\w+)|((?:(\w+\(.*?\))))|(\w+)
Here is a regex demo that will help you understand it.
To get the result, you need to loop through the matches:
public static void main(String[] args) {
    String input = "(T(i-1).XX_1 + XY_8) + X + T(i-1).YY_2 * (ZY_14 + ZY_14) + T(i-1)";
    Pattern pattern = Pattern.compile("(\\w+\\([\\w-*+\\/]+\\).\\w+)|((?:(\\w+\\(.*?\\))))|(\\w+)");
    Matcher matcher = pattern.matcher(input);
    List<String> reslt = new ArrayList<>();
    while (matcher.find()) { // loop through the matches
        if (matcher.group(1) != null) {
            reslt.add(matcher.group(1));
        }
        // In your case you have to skip these two groups (a standalone T(i-1))
        // if (matcher.group(2) != null) {
        //     reslt.add(matcher.group(2));
        // }
        // if (matcher.group(3) != null) {
        //     reslt.add(matcher.group(3));
        // }
        if (matcher.group(4) != null) {
            reslt.add(matcher.group(4));
        }
    }
    reslt.forEach(System.out::println);
}
This will give you:
T(i-1).XX_1
XY_8
X
T(i-1).YY_2
ZY_14
ZY_14
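As a regex-free alternative, here is a rough sketch of a small scanner that tracks parentheses while walking the expression. It rests on one assumption that is not stated in the thread: a '(' that directly follows identifier characters (as in T(i-1)) belongs to the operand, while any other '(' is plain grouping. The class name OperandSplitter and its methods are made up for this example, and unlike the answer above it keeps a standalone T(i-1) instead of dropping it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class OperandSplitter {
    // Splits an expression into operands, keeping index parentheses such as T(i-1) intact.
    static List<String> operands(String expr) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        // One entry per open parenthesis: true if it directly followed an identifier.
        Deque<Boolean> parens = new ArrayDeque<>();
        int indexDepth = 0; // how many identifier-attached parentheses are currently open
        for (char c : expr.replaceAll("\\s+", "").toCharArray()) {
            if (c == '(') {
                boolean attached = cur.length() > 0;
                parens.push(attached);
                if (attached) {
                    indexDepth++;
                    cur.append(c);
                }
            } else if (c == ')') {
                // poll() returns null on unbalanced input, which is treated as grouping.
                boolean attached = Boolean.TRUE.equals(parens.poll());
                if (attached) {
                    indexDepth--;
                    cur.append(c);
                } else {
                    flush(cur, out);
                }
            } else if (indexDepth == 0 && "+-*/".indexOf(c) >= 0) {
                flush(cur, out); // operator outside T(...): the current operand ends here
            } else {
                cur.append(c);
            }
        }
        flush(cur, out);
        return out;
    }

    // Moves the current operand, if any, into the result list.
    static void flush(StringBuilder cur, List<String> out) {
        if (cur.length() > 0) {
            out.add(cur.toString());
            cur.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(operands("(T(i-1).XX_1 + XY_8) + T(i-1).YY_2 * (ZY_14 + ZY_14)"));
        System.out.println(operands("XY_98 + XY_99 +XY_100"));
    }
}
```

Because whitespace is stripped first, the result does not depend on how the operators are spaced.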

Split string after n amount of digits occurrence

I'm parsing some folder names here. I have a program that lists subfolders of a folder and parses folder names.
For example, one folder could be named something like this:
"Folder.Name.1234.Some.Info.Here-ToBeParsed"
and I would like to parse it so that the name becomes "Folder Name". At the moment I first use string.replaceAll() to get rid of the special characters, which leaves this 4-digit sequence in the middle. I would like to split the string at that point. How can I achieve this?
Currently my code looks something like this:
// Parsing string if regex p matches folder's name
if(b) {
    //System.out.println("Folder: \" " + name + "\" contains special characters.");
    String result = name.replaceAll("[\\p{P}\\p{S}]", " "); // Getting rid of all punctuation and symbols.
    //System.out.println("Parsed: " + name + " > " + result);
    // If string matches regex p2
    if(b2) {
        //System.out.println("Folder: \" " + result + "\" contains release year.");
        String parsed_name[] = result.split("20"); // This is the line where I would like to split when 4 digits in a row occur.
        //System.out.println("Parsed: " + result + " > " + parsed_name[0]);
        movieNames.add(parsed_name[0]);
    }
Or maybe there is an even easier way to do this? Thanks in advance!

You should keep it simple, like this:
String name = "Folder.Name.1234.Some.Info.Here-ToBeParsed";
String repl = name.replaceFirst("\\.\\d{4}.*", "")
        .replaceAll("[\\p{P}\\p{S}&&[^']]+", " ");
//=> Folder Name
replaceFirst removes everything from the first dot followed by 4 digits onward.
replaceAll replaces each run of punctuation and symbol characters (except the apostrophe) with a single space.
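If you would rather keep the split-based flow from the question, a minimal sketch could look like the following. It assumes the release year is the first standalone run of exactly four digits; the class name FolderNames and the method title are invented for this illustration.

```java
class FolderNames {
    // Replaces punctuation and symbols with spaces, then keeps the part before
    // the first standalone 4-digit number (assumed to be the release year).
    static String title(String folderName) {
        String spaced = folderName.replaceAll("[\\p{P}\\p{S}]+", " ");
        // Limit 2: only the first 4-digit run matters, the rest is discarded.
        return spaced.split("\\s*\\b\\d{4}\\b", 2)[0].trim();
    }

    public static void main(String[] args) {
        System.out.println(title("Folder.Name.1234.Some.Info.Here-ToBeParsed"));
    }
}
```

A name that starts with four digits would yield an empty title here, so that case needs separate handling.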

Java how to setup regex for this string

I'm trying to pull two strings, via a Matcher object, out of one string that is stored in my online database.
Each string appears after s:64: and is enclosed in quotation marks.
Example: s:64:"stringhere"
This is what I currently have, but every regex I've tried has failed:
Pattern p = Pattern.compile("I don't know what to put as the regex");
Matcher m = p.matcher(data);
So with that said, all I need is the regex that will return the two strings in the matcher so that m.group(1) is my first string and m.group(2) is my second string.

Try this regex:
s:64:\"(.*?)\"
Code:
Pattern pattern = Pattern.compile("s:64:\"(.*?)\"");
Matcher matcher = pattern.matcher(YourStringVar);
// Check the first two occurrences
int count = 0;
while (matcher.find() && count++ < 2) {
    System.out.println("Group : " + matcher.group(1));
}
Here group(1) returns the quoted string of each match.
OUTPUT:
Group : First Match
Group : Second Match
Refer LIVE DEMO

String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
Pattern p = Pattern.compile("s:64:\"([^\"]*)\".*s:64:\"([^\"]*)\"");
Matcher m = p.matcher(data);
if (m.find()) {
    System.out.println("First string: '" + m.group(1) + "'");
    System.out.println("Second string: '" + m.group(2) + "'");
}
prints:
First string: 'first string'
Second string: 'second string'

The regex you need is Pattern.compile("s:64:\"(.*?)\".*s:64:\"(.*?)\"")
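If the number of s:64: entries is not fixed at two, one option is to collect every match into a list and index into it afterwards. This is only a sketch; the class name SerializedStrings and the method extract are invented here, and the sample data is taken from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SerializedStrings {
    // Matches s:64:"..." and captures the text between the quotes.
    static final Pattern S64 = Pattern.compile("s:64:\"([^\"]*)\"");

    // Returns the quoted strings of all s:64: entries, in order of appearance.
    static List<String> extract(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = S64.matcher(data);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
        List<String> values = extract(data);
        System.out.println("First string: '" + values.get(0) + "'");
        System.out.println("Second string: '" + values.get(1) + "'");
    }
}
```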
