Java Regex Multiline issue - java

I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(Dotall since it is a Multiline pattern and i want to cover linebreaks as well)
However this pattern does not match the String. If I try to remove the LOGHEADER\[END\] part of the regex I get a match, that contains the whole String. I don't get why it is not matching for the original RegEx.
Any help is appreciated - thanks a lot!

The important thing to remember about this Java matches() method is that your regular expression must match the entire line.
So, you have to use find() this way to capture all in-between <!--LOGHEADER[START]/--> and n<!--LOGHEADER[END]/--:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
Or, to follow the logics you suggest (just using matches), we need to add ^.* and .*$:
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());

You actually need to use Pattern and Matcher classes along with find method. The below regex will fetch all the lines which exists between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you do want to match also the LOGHEADER lines, then a capturing group would be an unnecessary one.
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}

Related

JAVA regex to find string

i have a string like this:
font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;
How can I get the value of the color and the value of background-color?
color:#ffffff;
background-color:#ff0000;
i have tried the following code but the result is not my expected.
Pattern pattern = Pattern.compile("^.*(color:|background-color:).*;$");
The result will display:
font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;
If you want to have multiple matches in a string, don't assert ^ and $ because if those matches, then the whole string matches, which means that you can't match it again.
Also, use a lazy quantifier like *?. This will stop matching as soon as it finds some string that matches the pattern after it.
This is the regex you should use:
(color:|background-color:)(.*?);
Group 1 is either color: or background-color:, group 2 is the color code.
Demo
To do this you should use the (?!abc) expression in regex. This finds a match but doesn't select it. After that you can simply select the hexcode, like this:
String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman";
Pattern pattern = Pattern.compile("(?!color:)#.{6}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
Pattern pattern = Pattern.compile("color\\s*:\\s*([^;]+)\\s*;\\s*background-color\\s*:\\s*([^;]+)\\s*;");
Matcher matcher = pattern.matcher("font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;");
if (matcher.find()) {
System.out.println("color:" + matcher.group(1));
System.out.println("background-color:" + matcher.group(2));
}
No need to describe the whole input, only the relevant part(s) that you're looking to extract.
The regex color:(#[\\w\\d]+); does the trick for me:
String input = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
String regex = "color:(#[\\w\\d]+);";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Notice that m.group(1) returns the matching group which is inside the parenthesis in the regex. So the regex actually matches the whole color:#ffffff; and color:#ff0000; parts, but the print only handles the number itself.
Use a CSS parser like ph-css
String input = "font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;";
final CSSDeclarationList cssPropertyList =
CSSReaderDeclarationList.readFromString(input, ECSSVersion.CSS30);
System.out.println(cssPropertyList.get(1).getProperty() + " , "
+ cssPropertyList.get(1).getExpressionAsCSSString());
System.out.println(cssPropertyList.get(2).getProperty() + " , "
+ cssPropertyList.get(2).getExpressionAsCSSString());
Prints:
color , #ffffff
background-color , #ff0000
Find more about ph-css on github

Regex - find data inside left and right encloses

I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I done so far is:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
And I got all text until the last & which is: 123+456+789&xxxxxxxxx&yyyyyyyyyy while the requested output is: 123+456+789
Any suggestions how to fix it (regex is mandatory)?
Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
The [^&] matches any character but an ampersand.
You almost getting, you need to make your regex lazy (or non greedy) like this:
String ps = "text=(.*?)&";
here ---^
Working demo
Try this regex :
([0-9+]+)
Link : https://regex101.com/r/xU2zF4/1
java code :
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // value of s
System.out.println(m.group(1)); // returns 123+456+789
}

Regular expression to fetch username= 'testuserMM' from the string

I tried this regex to capture username
highs\(\d+\)\[.*?\]\[\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\]\sftid\(\d+\):\s.
It didn't work.
<55>Mar 17 12:02:00 forcesss-off [Father][1x91422234][eee][hote] abcd(QlidcxpOulqsf): highs(23455814)[mothers][192.192.21.12] ftid(64322816): oops authentication failed with (http-commo-auth, username='testuserMM' password='********'congratulation-fakem='login' )
You can use a much simpler regex for that:
\busername='([^']+)
See demo, result is in Group 1.
REGEX:
\b - Word boundary
username=' - literal string username='
([^']+) - A capturing group containing our substring that only contains 1 or more symbols other then a single apostrophe.
UPDATE:
Here are 2 ways to get the text you are looking for:
String str = "<55>Mar 17 12:02:00 forcesss-off [Father][1x91422234][eee][hote] abcd(QlidcxpOulqsf): highs(23455814)[mothers][192.192.21.12] ftid(64322816): oops authentication failed with (http-commo-auth, username='testuserMM' password='********'congratulation-fakem='login' )";
String res = str.replaceAll(".*\\busername='([^']+)'.*", "$1");
System.out.println(res);
String rx = "(?<=\\busername=')[^']+";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
System.out.println(m.group());
}
See IDEONE demo

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
This is very simple in Java
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in<ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if there are different matches in the input String, this is different. I tried a slightly different code replacing if (matcher.find()) { with while (matcher.find()) {. The result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
The problem is that the replace() method transforms the string over and over again.
A better way is to replace one match at a time. The matcher class has an appendReplacement-method for this.
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);

Unclosed group near index 14

I get error on this line:
Pattern pattern = Pattern.compile(word + "\\(.*\\)");
It says:
Exception in thread "AWT-EventQueue-0" java.util.regex.PatternSyntaxException: Unclosed group near index 14
I know this error is when you leave not escaped special characters but I dont see any there..
FULL CODE:
StyleConstants.setItalic(set, true);
for (String word : code.split("\\s")) {
Pattern pattern = Pattern.compile(word + "\\(.*\\)");
Matcher matcher = pattern.matcher(word);
while (matcher.find()) {
doc.setCharacterAttributes(matcher.start(), word.length(), set, true);
}
}
code is a string. It explodes code and checks every word. If word matches, colors it
I tried the following:
Pattern pattern = Pattern.compile("abcd" + "\\(.*\\)");
log.debug("RegEx: " + pattern);
And that worked fine:
RegEx: abcd\(.*\)
I can only assume that you have some unescaped characters in word.
If you don't know the value of word at compile time then build the pattern and log it before calling the compile() method:
String regex = word + "\\(.*\\)";
System.out.println("Regex: \"" + regex + "\"");
Pattern pattern = Pattern.compile(regex);

Categories