How to delete <script>..</script> by regexp in Java?

Now I have this:
String s = "1<script type='text/javascript'>2</script>3<script type='text/javascript'>3</script>5";
Pattern pattern = Pattern.compile("<script.*</script>");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
s = s.replace(matcher.group(), "");
}
System.out.println(s);
The result is
15
But I need
135
In PHP we have the /U modifier, but what should I do in Java? I thought about something like this, but it is incorrect:
Pattern pattern = Pattern.compile("<script[^(script)]*</script>");

<script([^>]*)?>.*?<\/script>
Try this. You need a ? after .* to make the match lazy, so it stops at the first closing tag instead of the last one.
See demo.
http://regex101.com/r/kO7lO2/3
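A minimal sketch of the lazy match applied to the string from the question. The class and method names (ScriptStripper, stripScripts) are made up for illustration, and the (?is) flags are an addition so the pattern also handles upper-case tags and scripts that span several lines.

```java
import java.util.regex.Pattern;

class ScriptStripper {
    // .*? is lazy: it stops at the first </script> instead of the last one.
    // (?i) ignores case, (?s) lets . match line breaks (both are assumptions, not in the answer).
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script[^>]*>.*?</script>");

    // Hypothetical helper: removes every script element from the input.
    static String stripScripts(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "1<script type='text/javascript'>2</script>3"
                 + "<script type='text/javascript'>3</script>5";
        System.out.println(stripScripts(s)); // prints 135
    }
}
```

Calling replaceAll on the matcher also avoids the find-and-replace loop from the question. For arbitrary real-world HTML an HTML parser is usually more robust than a regex.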

Use replaceAll to replace matches of the regex below with an empty string:
<script [^>]*>[^<]*</script>


Parsing a String using Java regex

I have a Java string in the following format:
externalCustomerID: { \"custToken\": \"xyz\" }
I want to extract the value xyz from the string above.
Can anyone suggest a regex for that in Java?
Try this one:
Pattern pattern = Pattern.compile("(\\w+: \\{ \"\\w+\": \")(\\w+)");
Matcher matcher = pattern.matcher("externalCustomerID: { \"custToken\": \"xyz\" }");
if (matcher.find()) {
System.out.println(matcher.group(2));
}

JAVA regex to find string

I have a string like this:
font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;
How can I get the value of the color and the value of background-color?
color:#ffffff;
background-color:#ff0000;
I have tried the following code, but the result is not what I expected.
Pattern pattern = Pattern.compile("^.*(color:|background-color:).*;$");
The result will display:
font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;
If you want multiple matches in one string, don't anchor the pattern with ^ and $. With those anchors the pattern has to match the whole string, so there can be at most one match and nothing is left to match again.
Also, use a lazy quantifier like *?. It stops matching as soon as the part of the pattern that follows it can match.
This is the regex you should use:
(color:|background-color:)(.*?);
Group 1 is either color: or background-color:, group 2 is the color code.
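A short sketch of how that regex could be used with a find() loop on the string from the question. The class and method names (ColorProps, extract) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColorProps {
    // No ^ or $ anchors, so find() can report several matches in one string.
    private static final Pattern P =
            Pattern.compile("(color:|background-color:)(.*?);");

    // Hypothetical helper: returns each match as "property:value".
    static List<String> extract(String style) {
        List<String> result = new ArrayList<>();
        Matcher m = P.matcher(style);
        while (m.find()) {
            // group(1) is the property name with its colon, group(2) the value
            result.add(m.group(1) + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;"
                 + "font-family:Times New Roman;";
        for (String entry : extract(s)) {
            System.out.println(entry);
        }
        // prints:
        // color:#ffffff
        // background-color:#ff0000
    }
}
```

Note that the pattern would also pick up the "color:" part of a property such as border-color, so it suits input where only these two color properties occur.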
To do this you can use a lookbehind, (?<=abc), in the regex. It requires the preceding text to match but does not include it in the result. After that you can simply select the hex code, like this:
String s = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman";
Pattern pattern = Pattern.compile("(?<=color:)#.{6}");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group());
}
Pattern pattern = Pattern.compile("color\\s*:\\s*([^;]+)\\s*;\\s*background-color\\s*:\\s*([^;]+)\\s*;");
Matcher matcher = pattern.matcher("font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;");
if (matcher.find()) {
System.out.println("color:" + matcher.group(1));
System.out.println("background-color:" + matcher.group(2));
}
No need to describe the whole input, only the relevant part(s) that you're looking to extract.
The regex color:(#[\\w\\d]+); does the trick for me:
String input = "font-size:36pt;color:#ffffff;background-color:#ff0000;font-family:Times New Roman;";
String regex = "color:(#[\\w\\d]+);";
Matcher m = Pattern.compile(regex).matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Notice that m.group(1) returns the capturing group, which is the part inside the parentheses in the regex. So the regex actually matches the whole color:#ffffff; and color:#ff0000; parts, but only the color code itself is printed.
Use a CSS parser like ph-css:
String input = "font-size:36pt; color:#ffffff; background-color:#ff0000; font-family:Times New Roman;";
final CSSDeclarationList cssPropertyList =
CSSReaderDeclarationList.readFromString(input, ECSSVersion.CSS30);
System.out.println(cssPropertyList.get(1).getProperty() + " , "
+ cssPropertyList.get(1).getExpressionAsCSSString());
System.out.println(cssPropertyList.get(2).getProperty() + " , "
+ cssPropertyList.get(2).getExpressionAsCSSString());
Prints:
color , #ffffff
background-color , #ff0000
Find out more about ph-css on GitHub.

Extract json data from given string

I have a string like this:
a.b.c.d.e =
{"altImages":2,"available":1,"availableColorCount":3};
Now I only need to fetch:
{"altImages":2,"available":1,"availableColorCount":3}
What regex should I use to extract that part from the given string? Please help.
My attempt:
(?smi)a.b.c.d\\(.*\"e\"=(.*?)\\}\\);.*
But it does not work.
Try this:
.+\s*=\s*(\{(?:.+:.+,?)+\})(?=;)
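A possible way to run this pattern from Java, as a sketch. The braces are escaped here because java.util.regex rejects an unescaped { in this position, and the class and method names (JsonSlice, extract) are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonSlice {
    // Group 1 captures everything from the opening { to the closing },
    // and the lookahead (?=;) requires a semicolon right after it.
    private static final Pattern P =
            Pattern.compile(".+\\s*=\\s*(\\{(?:.+:.+,?)+\\})(?=;)");

    // Hypothetical helper: returns the object literal, or null if none is found.
    static String extract(String text) {
        Matcher m = P.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "a.b.c.d.e = \n"
                + "{\"altImages\":2,\"available\":1,\"availableColorCount\":3};";
        System.out.println(extract(text));
        // prints {"altImages":2,"available":1,"availableColorCount":3}
    }
}
```

This only works while the object literal sits on a single line, because . does not match line breaks by default. If the captured text has to be read as data afterwards, handing it to a JSON parser is the safer route.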
You can use something like:
.*?\n(.*);
Here is the version with named groups:
String text = "a.b.c.d.e = \n{\"altImages\":2,\"available\":1,\"availableColorCount\":3};";
Pattern pattern = Pattern.compile(".*?\n(?<JSON>.*);");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
System.out.println(matcher.group("JSON"));
}

Regex - find data inside left and right encloses

I have this string:
text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz
I need to extract 123+456+789
What I have done so far:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "text=(.*)&";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0));
System.out.println(m.group(1));
}
I get all the text up to the last &, which is 123+456+789&xxxxxxxxx&yyyyyyyyyy, while the requested output is 123+456+789.
Any suggestions on how to fix it (a regex is mandatory)?
Use a negated character class:
String ps = "text=([^&]*)";
The value you need will be in Group 1.
The [^&] matches any character but an ampersand.
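Put together with the string from the question, the negated character class could be used as in the sketch below. The class and method names (TextParam, extract) are placeholders chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextParam {
    // [^&]* consumes characters up to, but not including, the first &.
    private static final Pattern P = Pattern.compile("text=([^&]*)");

    // Hypothetical helper: returns the value after "text=", or null if absent.
    static String extract(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
        System.out.println(extract(s)); // prints 123+456+789
    }
}
```

Because the class cannot cross an &, no backtracking is needed, and the pattern also works when the value is the last thing in the string.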
You almost have it. You need to make your regex lazy (non-greedy), like this:
String ps = "text=(.*?)&";
The ? after .* is what makes the quantifier lazy.
Try this regex:
([0-9+]+)
Link: https://regex101.com/r/xU2zF4/1
Java code:
String s = "text=123+456+789&xxxxxxxxx&yyyyyyyyyy&zzzzzzzzzzz";
String ps = "([0-9+]+)";
Pattern p = Pattern.compile(ps);
Matcher m = p.matcher(s);
if (m.find()){
System.out.println(m.group(0)); // the whole match: 123+456+789
System.out.println(m.group(1)); // group 1, also 123+456+789
}

How do I use regex in Java to pull this from html?

I'm trying to pull data from the ESPN box scores, and one of the html files has:
<td style="text-align:left" nowrap>Channing Frye, PF</td>
and I'm only interested in grabbing the name (Channing Frye) and the position (PF).
Right now I use Pattern.quote(start) + "(.*?)" + Pattern.quote(end) to grab the text between start and end. However, I'm not sure how to match text that starts with the pattern .../http://espn.go.com/nba/player/_/id/, continues with (any integer)/anyfirst-anylast">, then captures the name I need (Channing Frye), then </a>, then captures the position I need (PF), and ends with the pattern </td>.
Thanks!
Here is the pattern:
http://espn.go.com/nba/player/_/id/(\d+)/([\w-]+)">(.*?)</a>,\s*(\w+)</td>
You can use this tool to verify regular expressions: http://www.regexplanet.com/advanced/java/index.html
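A sketch of this pattern in Java. The sample cell in the question does not show the link element, so the HTML below is an assumed shape: the player id 12345 and the slug channing-frye are invented for illustration, as are the class and method names (EspnCell, parse).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EspnCell {
    // Groups: 1 = numeric id, 2 = name slug, 3 = player name, 4 = position
    private static final Pattern P = Pattern.compile(
            "http://espn\\.go\\.com/nba/player/_/id/(\\d+)/([\\w-]+)\">(.*?)</a>,\\s*(\\w+)</td>");

    // Hypothetical helper: returns {name, position}, or null when nothing matches.
    static String[] parse(String html) {
        Matcher m = P.matcher(html);
        return m.find() ? new String[] { m.group(3), m.group(4) } : null;
    }

    public static void main(String[] args) {
        // Assumed cell markup; id and slug are made up.
        String html = "<td style=\"text-align:left\" nowrap>"
                + "<a href=\"http://espn.go.com/nba/player/_/id/12345/channing-frye\">"
                + "Channing Frye</a>, PF</td>";
        String[] result = parse(html);
        System.out.println(result[0]); // prints Channing Frye
        System.out.println(result[1]); // prints PF
    }
}
```

The dots in the host name are escaped here so they only match literal dots. The original pattern leaves them unescaped, which also works but is slightly more permissive.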
You could use this pattern:
\\/nba\\/player\\/_\\/.*\\\">(.*)<.+>,\\s(.*)<
This will match any link in the HTML that contains /nba/player/.
String re = "/nba/player/_/.*\">(.*)<.+>,\\s(.*)<";
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern p = Pattern.compile(re, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
example: http://regex101.com/r/hA3uV0
Use this regex:
[A-Z\sa-z0-9]+(?=</a>)|\w+(?=</td>)
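The two lookaheads can be tried out as in the sketch below. It assumes the cell contains a link around the player name; the href value, the id 12345, and the names LookaheadTokens and tokens are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadTokens {
    // First alternative: text directly before </a>; second: a word directly before </td>.
    // The lookaheads check the closing tags without making them part of the match.
    private static final Pattern P =
            Pattern.compile("[A-Z\\sa-z0-9]+(?=</a>)|\\w+(?=</td>)");

    // Hypothetical helper: collects every match in document order.
    static List<String> tokens(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(html);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed cell markup; the id and slug are invented.
        String html = "<td style=\"text-align:left\" nowrap>"
                + "<a href=\"http://espn.go.com/nba/player/_/id/12345/channing-frye\">"
                + "Channing Frye</a>, PF</td>";
        System.out.println(tokens(html)); // prints [Channing Frye, PF]
    }
}
```

The character class only allows letters, digits, and whitespace, so a name containing an apostrophe, a period, or a hyphen would be cut short.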
Here is one regex:
. matches any single character; .+ matches one or more characters
.* matches zero or more characters
\s matches a whitespace character
String str = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern pattern = Pattern.compile("<td.+>.*<a.+>(.+)</a>[\\s,]+(.+)</td>");
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
You can use:
String lString = "<td style=\"text-align:left\" nowrap>Channing Frye, PF</td>";
Pattern lPattern = Pattern.compile("<td.+><a.+id/\\d+/.+\\-.+>(.+)</a>, (.+)</td>");
Matcher lMatcher = lPattern.matcher(lString);
while(lMatcher.find()) {
System.out.println(lMatcher.group(1));
System.out.println(lMatcher.group(2));
}
This will give you:
Channing Frye
PF
