JAVA regex to find string - java

I have a string like this:
font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;
How can I get the value of the color and the value of background-color?
color:#ffffff;
background-color:#ff0000;
I have tried the following code, but the result is not what I expected.
Pattern pattern = Pattern.compile("^.*(color:|background-color:).*;$");
The result will display:
font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;

If you want multiple matches in one string, don't anchor the pattern with ^ and $: once those match, a single match has consumed the whole string, so there is nothing left to match again.
Also, use a lazy quantifier like *?. This will stop matching as soon as it finds some string that matches the pattern after it.
This is the regex you should use:
(color:|background-color:)(.*?);
Group 1 is either color: or background-color:, group 2 is the color code.
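As a rough sketch of how that regex could be used from Java: the class name ColorExtractor and the helper extract are made up for this illustration, and the input is the string from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class ColorExtractor {
    // Same regex as above: group 1 is the property name (with ':'), group 2 the value.
    private static final Pattern COLOR =
            Pattern.compile("(color:|background-color:)(.*?);");

    // Returns property -> value, in the order the matches occur.
    static Map<String, String> extract(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = COLOR.matcher(style);
        while (m.find()) {
            String name = m.group(1);
            // Drop the trailing ':' that is part of group 1.
            result.put(name.substring(0, name.length() - 1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String style = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
        System.out.println(extract(style)); // {color=#ffffff, background-color=#ff0000}
    }
}
```

Because the pattern is unanchored, a property such as border-color: would also be picked up as color:, so this probably needs tightening for real style strings.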

To do this you can use a lookbehind, (?<=abc), in the regex. It asserts that abc comes immediately before the current position but doesn't include it in the match. After that you can simply select the hex code, like this (both color: and background-color: end in color:, so both values are found):
String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman";
Pattern pattern = Pattern.compile("(?<=color:)#.{6}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}

Pattern pattern = Pattern.compile("color\\s*:\\s*([^;]+)\\s*;\\s*background-color\\s*:\\s*([^;]+)\\s*;");
Matcher matcher = pattern.matcher("font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;");
if (matcher.find()) {
System.out.println("color:" + matcher.group(1));
System.out.println("background-color:" + matcher.group(2));
}

No need to describe the whole input, only the relevant part(s) that you're looking to extract.
The regex color:(#[\\w\\d]+); does the trick for me:
String input = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
String regex = "color:(#[\\w\\d]+);";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Notice that m.group(1) returns the capturing group, i.e. the part of the regex inside the parentheses. So the regex actually matches the whole color:#ffffff; and color:#ff0000; parts, but only the color code itself is printed.

Use a CSS parser like ph-css
String input = "font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;";
final CSSDeclarationList cssPropertyList =
CSSReaderDeclarationList.readFromString(input, ECSSVersion.CSS30);
System.out.println(cssPropertyList.get(1).getProperty() + " , "
+ cssPropertyList.get(1).getExpressionAsCSSString());
System.out.println(cssPropertyList.get(2).getProperty() + " , "
+ cssPropertyList.get(2).getExpressionAsCSSString());
Prints:
color , #ffffff
background-color , #ff0000
Find out more about ph-css on GitHub.

Related

Alternative to positive lookbehind when there are unknown number of spaces

My replacerRegex is
("schedulingCancelModal": \{\s*? "title": ")(.+?)(?=")
The right value is getting picked up, i.e. valueToBePicked:
But how do I get ("schedulingCancelModal": \{\s*? "title": ") not to be included in the result like positive lookbehind does?
My Java code so far:
Pattern replacerPattern = Pattern.compile(replacerRegex);
Matcher matcher = replacerPattern.matcher(value);
while (matcher.find()) {
String valueToBePicked = matcher.group();
}
You can simply select matcher.group(2) which will give you the contents of the second capture group. For example:
String replacerRegex = "(\"schedulingCancelModal\": \\{\\s*? \"title\": \")(.+?)(?=\")";
String value = "\"valueToBePicked\": \"schedulingCancelModal\": {\n \"title\": \"Are you sure you want to leave scheduling?\", ... }";
Pattern replacerPattern = Pattern.compile(replacerRegex);
Matcher matcher = replacerPattern.matcher(value);
while (matcher.find()) {
String valueToBePicked = matcher.group(2);
System.out.println(valueToBePicked);
}
Output:
Are you sure you want to leave scheduling?
Demo on rextester

Regex to extract particular strings from response data

As a response I am getting the following string:
String response = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>";
From this I have to fetch only 6:35a and 59°F.
Using the substring and indexOf methods I can get the values from the string, but it seems like a lot of code.
Is there an easier way, for example using a regular expression? If so, how can I extract these strings with one?
Try this:
timeTempText">(.*?)<\/span>
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "timeTempText\">(.*?)<\\/span>";
final String string = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>"
+ "asdfasdf asdfasdf timeTempText\">59°F</span> asdfasdf\n";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
/*
1st Capturing Group (.*?)
.*? matches any character (except for line terminators)
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
*/
Capturing group 1 holds the value you are looking for.
You can do it with regexes, but that's a lot of code:
String response = "<span class=\"timeTempText\">6:35a</span><span class=\"dividerText\"> | </span><span class=\"timeTempText\">59°F</span>";
Matcher matcher = Pattern.compile("\"timeTempText\">(.*?)</span>").matcher(response);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
You create a Pattern
Then retrieve the Matcher
You loop through the matches with the while(matcher.find()) idiom.
matcher.group(1) returns the 1st group of the pattern. I.e. the matched text between the first ()
Please note that this code is very brittle. You're better off with XPath.
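For comparison, here is a minimal sketch of the XPath route using only the JDK's built-in XML APIs. It assumes the response is well-formed XML once wrapped in a root element, which real-world HTML often is not. The class name SpanTextExtractor, the method timeTempTexts and the <root> wrapper are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for illustration.
class SpanTextExtractor {
    // Returns the text of every <span class="timeTempText"> in the fragment.
    static List<String> timeTempTexts(String fragment) {
        try {
            // The fragment has several top-level elements, so wrap it in an
            // assumed <root> element to make it a single XML document.
            String xml = "<root>" + fragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//span[@class='timeTempText']", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Parsing fails if the input is not well-formed XML.
            throw new IllegalArgumentException("Could not parse fragment as XML", e);
        }
    }

    public static void main(String[] args) {
        // \u00B0 is the degree sign, escaped to avoid source-encoding issues.
        String response = "<span class=\"timeTempText\">6:35a</span>"
                + "<span class=\"dividerText\"> | </span>"
                + "<span class=\"timeTempText\">59\u00B0F</span>";
        for (String text : timeTempTexts(response)) {
            System.out.println(text);
        }
    }
}
```

If the real response is not valid XML, a lenient HTML parser would likely be a better fit than the strict XML parser used here.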

Java Regex Multiline issue

I have a String read from a file via apache commons FileUtils.readFileToString, which has the following format:
<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)
I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] line. Therefore I created a java regex:
String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
(DOTALL, since the content spans multiple lines and I want . to match line breaks as well.)
However this pattern does not match the String. If I try to remove the LOGHEADER\[END\] part of the regex I get a match, that contains the whole String. I don't get why it is not matching for the original RegEx.
Any help is appreciated - thanks a lot!
The important thing to remember about Java's matches() method is that the regular expression must match the entire input string, not just a part of it.
So, you have to use find() this way to capture everything in between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->:
String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while(m.find()) {
System.out.println(m.group());
}
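Or, to follow the logic you suggest (just using matches()), we need to add .* on both sides (the ^ and $ anchors are optional here, because matches() already requires the whole input to match):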
String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
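To make the difference between matches() and find() concrete, here is a small sketch that runs the question's original pattern through both. The class name MatchesVsFind, its two helper methods and the shortened log text are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class MatchesVsFind {
    // The pattern from the question, unchanged.
    private static final Pattern HEADER =
            Pattern.compile("LOGHEADER\\[START\\].*LOGHEADER\\[END\\]", Pattern.DOTALL);

    // true only if the pattern covers the input from the first to the last character
    static boolean wholeInputMatches(String s) {
        return HEADER.matcher(s).matches();
    }

    // true if the pattern matches anywhere inside the input
    static boolean containsMatch(String s) {
        return HEADER.matcher(s).find();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file content in the question.
        String log = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#Debug#";
        System.out.println(wholeInputMatches(log)); // false: input starts with "<!--", not "LOGHEADER"
        System.out.println(containsMatch(log));     // true
    }
}
```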
You actually need to use the Pattern and Matcher classes along with the find method. The regex below will fetch all the lines that exist between LOGHEADER[START] and LOGHEADER[END].
String s = "<!--LOGHEADER[START]/-->\n" +
"<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" +
"<!--LOGGINGVERSION[2.0.7.1006]/-->\n" +
"<!--NAME[./log/defaultTrace_00.trc]/-->\n" +
"<!--PATTERN[defaultTrace_00.trc]/-->\n" +
"<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" +
"<!--ENCODING[UTF8]/-->\n" +
"<!--FILESET[0, 20, 10485760]/-->\n" +
"<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" +
"<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" +
"<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" +
"<!--LOGHEADER[END]/-->\n" +
"#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
If you also want to match the LOGHEADER lines themselves, the capturing group is unnecessary.
Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while(m.find())
{
System.out.println(m.group());
}
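If "filter out" means removing the header block from the content rather than extracting it, something along these lines might work. This is only a sketch: the class name LogHeaderStripper, the method stripHeader and the shortened sample log are assumptions made for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class LogHeaderStripper {
    // (?s) lets . match line breaks; .*? stops at the first END marker;
    // \R? also swallows the line break right after the header, if any.
    private static final Pattern HEADER_BLOCK = Pattern.compile(
            "(?s)<!--LOGHEADER\\[START\\].*?<!--LOGHEADER\\[END\\]/-->\\R?");

    // Returns the content with the first header block removed.
    static String stripHeader(String fileContent) {
        return HEADER_BLOCK.matcher(fileContent).replaceFirst("");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file content in the question.
        String log = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#Debug#";
        System.out.println(stripHeader(log)); // #2.0#Debug#
    }
}
```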

How do I use regex in Java to pull this from html?

I'm trying to pull data from the ESPN box scores, and one of the html files has:
<td style="text-align:left" nowrap>Channing Frye, PF</td>
and I'm only interested in grabbing the name (Channing Frye) and the position (PF)
Right now, I've been using Pattern.quote(start) + "(.*?)" + Pattern.quote(end) to grab text in between start and end, but I'm not sure how I'm supposed to grab text that starts with pattern .../http://espn.go.com/nba/player/_/id/ and then can contain (any integer)/anyfirst-anylast"> then grab the name I need (Channing Frye), then </a>, and then grab the position I need (PF) and ends with pattern </td>
Thanks!
Here is the pattern:
http://espn.go.com/nba/player/_/id/(\d+)/([\w-]+)">(.*?)</a>,\s*(\w+)</td>
You can use this tool - http://www.regexplanet.com/advanced/java/index.html for verifying regular expressions.
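A sketch of how this pattern might be applied in Java. The sample row below is hypothetical: its <a> element, the id 1234 and the slug channing-frye are made up to fit the URL shape described in the question, and the class name BoxScoreParser and method nameAndPosition are invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class BoxScoreParser {
    // The pattern from above, with backslashes and the quote escaped for a Java string.
    private static final Pattern PLAYER = Pattern.compile(
            "http://espn.go.com/nba/player/_/id/(\\d+)/([\\w-]+)\">(.*?)</a>,\\s*(\\w+)</td>");

    // Returns {name, position}, or null if the row does not match.
    static String[] nameAndPosition(String html) {
        Matcher m = PLAYER.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Group 3 is the link text (the name), group 4 the position.
        return new String[] { m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Made-up row; the real id in the href is not given in the question.
        String row = "<td style=\"text-align:left\" nowrap>"
                + "<a href=\"http://espn.go.com/nba/player/_/id/1234/channing-frye\">Channing Frye</a>, PF</td>";
        String[] result = nameAndPosition(row);
        System.out.println(result[0]); // Channing Frye
        System.out.println(result[1]); // PF
    }
}
```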
You could use this pattern:
\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<
This will match any link in the HTML that contains /nba/player/
String re = "\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<";
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern p = Pattern.compile(re, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
example: http://regex101.com/r/hA3uV0
Use this regex:
[A-Z\sa-z0-9]+(?=</a>)|\w+(?=</td>)
Here is one regex:
. is used for any item, .+ is used for any 1+ items
.* means 0 or more items
\s is used for space
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern pattern = Pattern.compile("<td.+>.*<a.+>(.+)</a>[\\s,]+(.+)</td>");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
You can use:
String lString = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern lPattern = Pattern.compile("<td.+><a.+id/\\d+/.+\\-.+>(.+)</a>, (.+)</td>");
Matcher lMatcher = lPattern.matcher(lString);
while(lMatcher.find()) {
System.out.println(lMatcher.group(1));
System.out.println(lMatcher.group(2));
}
This will give you:
Channing Frye
PF

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
This is very simple in Java
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if there are different matches in the input String, this is different. I tried a slightly different code replacing if (matcher.find()) { with while (matcher.find()) {. The result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
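As a small sketch of how such references behave: $0 inserts the whole match and $1 the first capturing group. The class name TicketTagger, both method names and the second regex with its capturing group are made up for this illustration.

```java
// Hypothetical helper class, only for illustration.
class TicketTagger {
    // $0 refers to the whole match, so every ticket is wrapped individually.
    static String wrapTickets(String message) {
        return message.replaceAll("ABC-\\d+", "<$0>");
    }

    // $1 refers to the first capturing group, here only the digits.
    static String numbersOnly(String message) {
        return message.replaceAll("ABC-(\\d+)", "#$1");
    }

    public static void main(String[] args) {
        String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
        System.out.println(wrapTickets(message));
        // <ABC-9913>, <ABC-9915> - Bugfix: Some text. (<ABC-9913>,<ABC-9915>)
        System.out.println(numbersOnly(message));
        // #9913, #9915 - Bugfix: Some text. (#9913,#9915)
    }
}
```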
The problem is that the replace() method transforms the string over and over again.
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this.
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
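On Java 9 or later, the same one-match-at-a-time replacement can probably be written more compactly with Matcher.replaceAll and a function. This is a sketch assuming Java 9+; the class name TicketWrapper and the method wrap are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration; requires Java 9 or later.
class TicketWrapper {
    private static final Pattern TICKET = Pattern.compile("ABC-\\d+");

    // Computes the replacement for each match from the match itself.
    static String wrap(String message) {
        return TICKET.matcher(message)
                // quoteReplacement keeps any '$' or '\' in the computed text literal.
                .replaceAll(mr -> Matcher.quoteReplacement("<" + mr.group() + ">"));
    }

    public static void main(String[] args) {
        String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
        System.out.println(wrap(message));
        // <ABC-9913>, <ABC-9915> - Bugfix: Some text. (<ABC-9913>,<ABC-9915>)
    }
}
```

The function form is mainly useful when the replacement needs logic that a plain $0 reference cannot express, such as a lookup per ticket.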
