Unclosed group near index 14 - java

I get an error on this line:
Pattern pattern = Pattern.compile(word + "\\(.*\\)");
It says:
Exception in thread "AWT-EventQueue-0" java.util.regex.PatternSyntaxException: Unclosed group near index 14
I know this error occurs when special characters are left unescaped, but I don't see any there.
FULL CODE:
StyleConstants.setItalic(set, true);
for (String word : code.split("\\s")) {
Pattern pattern = Pattern.compile(word + "\\(.*\\)");
Matcher matcher = pattern.matcher(word);
while (matcher.find()) {
doc.setCharacterAttributes(matcher.start(), word.length(), set, true);
}
}
code is a String. The loop splits it on whitespace and checks every word; if a word matches, it gets styled.

I tried the following:
Pattern pattern = Pattern.compile("abcd" + "\\(.*\\)");
log.debug("RegEx: " + pattern);
And that worked fine:
RegEx: abcd\(.*\)
I can only assume that you have some unescaped characters in word.
If you don't know the value of word at compile time then build the pattern and log it before calling the compile() method:
String regex = word + "\\(.*\\)";
System.out.println("Regex: \"" + regex + "\"");
Pattern pattern = Pattern.compile(regex);
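If word comes from arbitrary source text, one option is to escape it with Pattern.quote() before concatenating it into the regex. This is only a minimal sketch, assuming the intent is to treat word as a literal; the class name QuotedWordDemo, the helper callPattern and the sample token are made up for illustration:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class QuotedWordDemo {
    // Builds "<literal word>\(.*\)". Pattern.quote wraps the word in \Q...\E,
    // so characters such as '(' lose their special meaning.
    static Pattern callPattern(String word) {
        return Pattern.compile(Pattern.quote(word) + "\\(.*\\)");
    }

    public static void main(String[] args) {
        String word = "foo(bar"; // hypothetical token containing an unescaped '('

        try {
            // Same construction as in the question: the '(' opens a group
            // that is never closed, so compile() throws.
            Pattern.compile(word + "\\(.*\\)");
        } catch (PatternSyntaxException e) {
            System.out.println("Unquoted: " + e.getDescription());
        }

        // The quoted version compiles and matches the literal text.
        System.out.println("Quoted matches: "
                + callPattern(word).matcher("foo(bar(x)").find());
    }
}
```

Whether quoting is appropriate depends on whether word is ever meant to carry regex syntax of its own.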


Why does matcher.find() always return 'false' for an input parameter?

I have a strange situation which I find difficult to understand regarding regex matcher.
When I pass the input parameter issueBody to the matcher, matcher.find() always returns false, while passing a hard-coded String with the same value as issueBody works as expected.
The regex function:
private Map<String, String> extractCodeSnippet(Set<String> resolvedIssueCodeLines, String issueBody) {
String codeSnippetForCodeLinePattern = "\\(Line #%s\\).*\\W\\`{3}\\W+(.*)(?=\\W+\\`{3})";
Map<String, String> resolvedIssuesMap = new HashMap<>();
for (String currentResolvedIssue : resolvedIssueCodeLines) {
String currentCodeLinePattern = String.format(codeSnippetForCodeLinePattern, currentResolvedIssue);
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(issueBody);
while (matcher.find()) {
resolvedIssuesMap.put(currentResolvedIssue, matcher.group());
}
}
return resolvedIssuesMap;
}
With the following, matcher.find() always returns false:
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(issueBody);
While with the following, it always returns true:
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.MULTILINE);
Matcher matcher = pattern.matcher("**SQL_Injection** issue exists # **VB_3845_112_lines/encode.frm** in branch **master**\n" +
"\n" +
"Severity: High\n" +
"\n" +
"CWE:89\n" +
"\n" +
"[Vulnerability details and guidance](https://cwe.mitre.org/data/definitions/89.html)\n" +
"\n" +
"[Internal Guidance](https://checkmarx.atlassian.net/wiki/spaces/AS/pages/79462432/Remediation+Guidance)\n" +
"\n" +
"[ppttx](http://WIN2K12-TEMP/bbcl/ViewerMain.aspx?planid=1010013&projectid=10005&pathid=1)\n" +
"\n" +
"Lines: 41 42 \n" +
"\n" +
"---\n" +
"[Code (Line #41):](null#L41)\n" +
"```\n" +
" user_name = txtUserName.Text\n" +
"```\n" +
"---\n" +
"[Code (Line #42):](null#L42)\n" +
"```\n" +
" password = txtPassword.Text\n" +
"```\n" +
"---\n");
My question is: why? What is the difference between the two statements?
TL;DR:
By using Pattern.UNIX_LINES, you tell the Java regex engine to treat only a newline (LF) as a line terminator, so . matches any char but LF. Use
Pattern pattern = Pattern.compile(currentCodeLinePattern, Pattern.UNIX_LINES);
In your hard-coded string, you have only newline (LF) endings, while your issueBody most likely contains \r\n (CRLF) endings. Your pattern matches only a single non-word char with \W (see the \\W\\`{3} pattern part), but CRLF consists of two non-word chars. By default, . does not match line break chars, so it matches neither \r (CR) nor \n (LF). The \(Line #%s\).*\W\`{3} part fails exactly because of this:
\(Line #%s\) - matches (Line #<NUMBER>)
.* - matches 0 or more chars other than any line break char (up to CR or CRLF)
\W - matches a char other than a letter/digit/_ (so, only \r or \n)
\`{3} - 3 backticks - these are only matched if there was a \n ending, not \r\n (CRLF).
Again, with Pattern.UNIX_LINES, . matches any char but a newline (LF), so .* can consume the CR before it.
BTW, Pattern.MULTILINE only makes ^ match at the start of each line and $ match at the end of each line; since there is neither ^ nor $ in your pattern, you may safely discard this option.
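The difference can be reproduced with a small sketch. It assumes the real issueBody is the hard-coded text with CRLF line endings, which is a guess about the input; the class name LineEndingDemo and the helper found are hypothetical:

```java
import java.util.regex.Pattern;

final class LineEndingDemo {
    // Same pattern as in the question, with the line number already filled in.
    static final String REGEX =
            String.format("\\(Line #%s\\).*\\W\\`{3}\\W+(.*)(?=\\W+\\`{3})", "41");

    // Compiles the pattern with the given flags and reports whether it is found.
    static boolean found(String body, int flags) {
        return Pattern.compile(REGEX, flags).matcher(body).find();
    }

    public static void main(String[] args) {
        String lf = "[Code (Line #41):](null#L41)\n```\n user_name = txtUserName.Text\n```\n";
        String crlf = lf.replace("\n", "\r\n"); // assumed shape of the real issueBody

        System.out.println(found(lf, 0));                    // true
        System.out.println(found(crlf, 0));                  // false
        System.out.println(found(crlf, Pattern.UNIX_LINES)); // true
    }
}
```

Note that with UNIX_LINES the captured group can end with a stray \r, so trimming the result may still be needed.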

JAVA regex to find string

I have a string like this:
font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;
How can I get the value of the color and the value of background-color?
color:#ffffff;
background-color:#ff0000;
I have tried the following code, but the result is not what I expected.
Pattern pattern = Pattern.compile("^.*(color:|background-color:).*;$");
The result will display:
font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;
If you want multiple matches in a string, don't anchor with ^ and $: if those match, the whole string is consumed by one match, so nothing is left to match again.
Also, use a lazy quantifier like *?. It stops matching as soon as the part of the pattern after it can match.
This is the regex you should use:
(color:|background-color:)(.*?);
Group 1 is either color: or background-color:, group 2 is the color code.
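Put together, a minimal sketch of that regex in a find() loop could look like this (the class name ColorExtractDemo and the helper colors are illustrative only):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ColorExtractDemo {
    // Maps each matched property prefix (group 1) to its value (group 2),
    // in the order the matches appear in the input.
    static Map<String, String> colors(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile("(color:|background-color:)(.*?);").matcher(style);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String style = "font-size:36pt;color:#ffffff;background-color:#ff0000;"
                + "font-family:Times New Roman;";
        // Prints:
        // color:#ffffff
        // background-color:#ff0000
        colors(style).forEach((property, value) -> System.out.println(property + value));
    }
}
```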
To do this you can use a lookbehind, (?<=abc), in the regex. It requires the text before the match to be abc but does not include it in the match. After that you can simply select the hex code, like this:
String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman";
Pattern pattern = Pattern.compile("(?<=color:)#.{6}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
Pattern pattern = Pattern.compile("color\\s*:\\s*([^;]+)\\s*;\\s*background-color\\s*:\\s*([^;]+)\\s*;");
Matcher matcher = pattern.matcher("font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;");
if (matcher.find()) {
System.out.println("color:" + matcher.group(1));
System.out.println("background-color:" + matcher.group(2));
}
No need to describe the whole input, only the relevant part(s) that you're looking to extract.
The regex color:(#[\\w\\d]+); does the trick for me:
String input = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
String regex = "color:(#[\\w\\d]+);";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Notice that m.group(1) returns the capturing group, i.e. the part inside the parentheses in the regex. So the regex actually matches the whole color:#ffffff; and color:#ff0000; parts, but only the hex value itself is printed.
Use a CSS parser like ph-css
String input = "font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;";
final CSSDeclarationList cssPropertyList =
CSSReaderDeclarationList.readFromString(input, ECSSVersion.CSS30);
System.out.println(cssPropertyList.get(1).getProperty() + " , "
+ cssPropertyList.get(1).getExpressionAsCSSString());
System.out.println(cssPropertyList.get(2).getProperty() + " , "
+ cssPropertyList.get(2).getExpressionAsCSSString());
Prints:
color , #ffffff
background-color , #ff0000
Find more about ph-css on github

Regex to extract particular strings from response data

As a response I am getting the following string:
String response = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>";
From this I have to fetch only 6:35a and 59°F.
Using the substring and indexOf methods I can get the values from the string, but it seems like a lot of code.
Is there an easier way to get them, for example with a regular expression?
Try this:
timeTempText">(.*?)<\/span>
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "timeTempText\">(.*?)<\\/span>";
final String string = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>"
+ "asdfasdf asdfasdf timeTempText\">59°F</span> asdfasdf\n";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
/*
1st Capturing Group (.*?)
.*? matches any character (except for line terminators)
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
*/
Capturing Group 1 has the value
You can do it with regexes, but that's a lot of code:
String response = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>";
Matcher matcher = Pattern.compile("\"timeTempText\">(.*?)</span>").matcher(response);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
You create a Pattern
Then retrieve the Matcher
You loop through the matches with the while(matcher.find()) idiom.
matcher.group(1) returns the 1st group of the pattern. I.e. the matched text between the first ()
Please note that this code is very brittle. You're better off with XPATH.
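For comparison, here is a rough XPath sketch using the JDK's built-in javax.xml APIs. It assumes the response is well-formed XML once wrapped in a single root element, which real-world HTML often is not; the class name SpanXPathDemo, the helper timeTempTexts and the <root> wrapper are made up for this example:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SpanXPathDemo {
    // Returns the text of every <span class="timeTempText"> in the fragment.
    static List<String> timeTempTexts(String fragment) throws Exception {
        // Wrap the fragment in a single root so it parses as one XML document.
        String xml = "<root>" + fragment + "</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//span[@class='timeTempText']", doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // The degree sign is written as a Unicode escape (U+00B0) to avoid
        // depending on the source file encoding.
        String response = "<span class=\"timeTempText\">6:35a</span>"
                + "<span class=\"dividerText\"> | </span>"
                + "<span class=\"timeTempText\">59\u00B0F</span>";
        System.out.println(timeTempTexts(response));
    }
}
```

If the input is HTML that is not well-formed, an HTML parser would be needed instead of the XML parser used here.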

Java Regex Multiline issue

I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(DOTALL, since the input spans multiple lines and I want . to match line breaks as well.)
However this pattern does not match the String. If I try to remove the LOGHEADER\[END\] part of the regex I get a match, that contains the whole String. I don't get why it is not matching for the original RegEx.
Any help is appreciated - thanks a lot!
The important thing to remember about Java's matches() method is that the regular expression must match the entire input string, not just part of it.
So, you have to use find() this way to capture everything between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
Or, to follow the logic you suggest (using only matches()), add .* on both sides so the pattern covers the whole input (the ^ and $ anchors are optional with matches()):
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
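The difference between the two methods can be seen side by side in a short sketch. The shortened sample input and the names MatchesVsFindDemo, matchesWhole and findsAnywhere are illustrative assumptions, not part of the original code:

```java
import java.util.regex.Pattern;

final class MatchesVsFindDemo {
    // The pattern from the question, unchanged.
    static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // matches() succeeds only if the pattern covers the entire input.
    static boolean matchesWhole(String content) {
        return HEADER.matcher(content).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean findsAnywhere(String content) {
        return HEADER.matcher(content).find();
    }

    public static void main(String[] args) {
        String content = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#";

        System.out.println(matchesWhole(content));  // false: input starts with "<!--"
        System.out.println(findsAnywhere(content)); // true
    }
}
```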
You actually need to use the Pattern and Matcher classes along with the find method. The regex below fetches all the lines that exist between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you also want to match the LOGHEADER lines themselves, the capturing group is unnecessary.
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}

Java how to setup regex for this string

So I'm trying to pull two strings via a matcher object from one string that is stored in my online databases.
Each string appears after s:64: and is enclosed in double quotes.
Example: s:64:"stringhere"
I'm currently trying to get them as shown below, but every regex I've tried has failed:
Pattern p = Pattern.compile("I don't know what to put as the regex");
Matcher m = p.matcher(data);
So with that said, all I need is the regex that will return the two strings in the matcher so that m.group(1) is my first string and m.group(2) is my second string.
Try this regex:
s:64:\"(.*?)\"
Code:
Pattern pattern = Pattern.compile("s:64:\"(.*?)\"");
Matcher matcher = pattern.matcher(YourStringVar);
// Check all occurrences
int count = 0;
while (matcher.find() && count++ < 2) {
System.out.println("Group : " + matcher.group(1));
}
Here group(1) returns the captured string of each match.
OUTPUT:
Group : First Match
Group : Second Match
String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
Pattern p = Pattern.compile("s:64:\"([^\"]*)\".*s:64:\"([^\"]*)\"");
Matcher m = p.matcher(data);
if (m.find()) {
System.out.println("First string: '" + m.group(1) + "'");
System.out.println("Second string: '" + m.group(2) + "'");
}
prints:
First string: 'first string'
Second string: 'second string'
The regex you need should be Pattern.compile("s:64:\"(.*?)\".*s:64:\"(.*?)\"")
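One caveat with the two-group patterns: the .* between the groups is greedy, so if the data ever contains more than two s:64: entries, group 2 ends up being the last one rather than the second. A small sketch comparing this with the find() loop, using a made-up three-entry input and hypothetical names (GreedyGapDemo, twoGroups, eachMatch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyGapDemo {
    // One match with two groups; the greedy .* skips ahead to the last entry.
    static List<String> twoGroups(String data) {
        Matcher m = Pattern.compile("s:64:\"([^\"]*)\".*s:64:\"([^\"]*)\"").matcher(data);
        return m.find() ? Arrays.asList(m.group(1), m.group(2)) : new ArrayList<>();
    }

    // One match per entry, collected in order of appearance.
    static List<String> eachMatch(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile("s:64:\"(.*?)\"").matcher(data);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String data = "s:64:\"a\" x s:64:\"b\" y s:64:\"c\""; // hypothetical data
        System.out.println(twoGroups(data)); // [a, c]
        System.out.println(eachMatch(data)); // [a, b, c]
    }
}
```

With exactly two entries, as in the question, both approaches return the same two strings.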
