Java replaceAll with regex only replacing first instance - java

I am trying to update all url's in a CSS string and my regex only seems to get the first one. I want to get anything like:
url("file")
url('file');
url(file);
I also want to exclude things where the url is data:
url("data: ...");
url('data: ...');
url(data: ...);
I wrote some code to do this, but it only replaces the first one:
String str = ".ff0{font-family:sans-serif;visibility:hidden;}#font-face{font-family:ff1;src:url(f1.woff)format(\"woff\");}.ff1{font-family:ff1;line-height:1.330566;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff2;src:url(f2.woff)format(\"woff\");}.ff2{font-family:ff2;line-height:1.313477;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff3;src:url(f3.woff)format(\"woff\");}.ff3{font-family:ff3;line-height:1.386719;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff4;src:url()format(\"woff\");";
str = str.replaceAll("url\\((['\"]?)(?!data)(.*)\\1\\)","url(someURL/$2)");
out.println(str);
Any ideas on how to fix? I imagine it has something to do with the regex.

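You probably want to use a non-greedy (lazy) quantifier (*? instead of *).
To exclude the data entries properly, also use a possessive quantifier when capturing the quotes: ?+ instead of ?. Otherwise the engine can backtrack, let the quote group match nothing, and treat the opening quote as part of the URL, which slips past the (?!data) check.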
So your regex should look as follows:
url\((['"]?+)(?!data)(.*?)\1\)
Note that in a Java string literal you need to escape the backslashes and the double quote, as you did in your example.
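A minimal sketch of the regex above in Java, to show the data: exclusion at work. The class name, the sample CSS, and the someURL/ prefix are placeholders for illustration, not part of the original question's code base.

```java
import java.util.regex.Pattern;

class CssUrlRewrite {
    // Possessive quote group (['"]?+) plus lazy body (.*?), as in the regex above.
    private static final Pattern CSS_URL =
            Pattern.compile("url\\((['\"]?+)(?!data)(.*?)\\1\\)");

    // Prefixes every non-data URL with the placeholder "someURL/".
    static String rewrite(String css) {
        return CSS_URL.matcher(css).replaceAll("url(someURL/$2)");
    }

    public static void main(String[] args) {
        String css = "a{src:url(f1.woff)}"
                + "b{src:url(\"f2.woff\")}"
                + "c{src:url('data:abc')}"
                + "d{src:url(f3.woff)}";
        // The data: entry is left untouched; the other three are rewritten.
        System.out.println(rewrite(css));
    }
}
```

Note that the replacement string drops the original quotes, just like the replacement in the question; use url($1someURL/$2$1) if you want to keep them.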

Your .* is greedy: it runs to the end of the string and then backtracks only as far as the last closing parenthesis, so the whole CSS is consumed by a single match. Use .*? instead, which makes the engine match as few characters as possible.
str = str.replaceAll("url\\((['\"]?)(?!data)(.*?)\\1\\)","url(someURL/$2)");
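To see the difference side by side, here is a small sketch that runs the greedy and the lazy version on a shortened input. The class name and the two-entry sample string are made up for the demonstration.

```java
class GreedyVsLazy {
    // The pattern from the question: greedy (.*)
    static String greedy(String css) {
        return css.replaceAll("url\\((['\"]?)(?!data)(.*)\\1\\)", "url(someURL/$2)");
    }

    // The corrected pattern: lazy (.*?)
    static String lazy(String css) {
        return css.replaceAll("url\\((['\"]?)(?!data)(.*?)\\1\\)", "url(someURL/$2)");
    }

    public static void main(String[] args) {
        String css = "url(a.woff)format(\"woff\");url(b.woff)format(\"woff\");";
        // Greedy: one match spanning up to the last ')', so only the first URL gets the prefix.
        System.out.println(greedy(css));
        // Lazy: each url(...) is matched and rewritten separately.
        System.out.println(lazy(css));
    }
}
```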

Something like this should work:
~\((?!.*data).+\)~

Related

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
I'm wondering what I'm missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.
It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc")
If you want to be more discriminating you can include a word boundary after abc, giving you
string = string.replaceAll("/abc\\b.*", "/abc")
To explain why the original regex does not work:
.*\\babc\\b.* - the word boundaries are not required here, and because .* appears at both ends, the pattern matches the whole string. Replacing that match with "abc" therefore replaces the entire string, which is why you get the wrong answer. Instead, match only the part you want to change; only the matched part is replaced.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc
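A short sketch comparing the two patterns on the URL from the question; the class and method names are only for illustration.

```java
class ReplaceAfterWord {
    // The pattern from the question: matches the whole string.
    static String original(String s) {
        return s.replaceAll(".*\\babc\\b.*", "abc");
    }

    // Matches only "/abc" and whatever follows it.
    static String fixed(String s) {
        return s.replaceAll("/abc\\b.*", "/abc");
    }

    public static void main(String[] args) {
        String url = "http://localhost:somePort/abc/soap/1.0";
        System.out.println(original(url)); // abc
        System.out.println(fixed(url));    // http://localhost:somePort/abc
    }
}
```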
You should use replaceFirst, since the first match already consumes everything after it:
text= text.replaceFirst("/abc.*", "/abc");
Or
You can use indexOf to get the index of certain word and then get substring.
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());
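One thing to watch with the indexOf approach: if the word is not present, indexOf returns -1 and the substring call quietly returns the wrong prefix. A hedged sketch with a guard (the helper name truncateAfter is hypothetical):

```java
class TruncateAfterWord {
    // Keeps everything up to and including the first occurrence of word;
    // returns the text unchanged when the word does not occur.
    static String truncateAfter(String text, String word) {
        int index = text.indexOf(word);
        if (index < 0) {
            return text;
        }
        return text.substring(0, index + word.length());
    }

    public static void main(String[] args) {
        System.out.println(truncateAfter("http://localhost:somePort/abc/soap/1.0", "abc"));
        System.out.println(truncateAfter("http://localhost:somePort/xyz", "abc"));
    }
}
```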

I want to capture an alphanumeric group without the underscore

I want to capture an alphanumeric group with a regex so that the match does not include the leading underscore. For example, _reverse(abc) should return reverse(. I am using (?<name>\w+), but it returns _reverse(.
You can try this,
[^a-zA-Z0-9()\\s+]
The output will be reverse(abc)
You can specify characters explicitly, e.g.:
[a-zA-Z0-9]+
From what you are showing, I assume you want to strip the underscore and everything after the opening parenthesis.
Basically, that should work with a regex like this:
"_([a-zA-Z0-9]+\()"
This can be used in conjunction with a Matcher to extract the capturing group (in this case, [a-zA-Z0-9]+\() and return it.
Note that you can find almost all the help you need with regular expressions on utility sites like RegEx 101 and Regexper, the latter being a nice visualizer that only works with JavaScript-like expressions.
Also, RegEx 101 contains a regex debugger to help avoid dangerous regular expressions.
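For reference, the Matcher approach could look roughly like this. It is a sketch only; the class name and the choice to return null when nothing matches are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripLeadingUnderscore {
    // The underscore is matched but sits outside the capturing group.
    private static final Pattern NAME = Pattern.compile("_([a-zA-Z0-9]+\\()");

    // Returns the captured name including the "(", or null if there is no match.
    static String extract(String input) {
        Matcher matcher = NAME.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("_reverse(abc)")); // reverse(
    }
}
```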

Strip out every occurrence using regex

I want to strip out every occurrence of (title) from a string like below. How do I write a regex for that? I tried a regex like below but it doesn't work.
String ruler1="115.28(54)(title) is renumbered 115.363(title) and amended to read:";
Pattern rulerPattern1 = Pattern.compile("(.*)\\(title\\)(.*)", Pattern.MULTILINE);
System.out.println(rulerPattern1.matcher(ruler1).replaceAll(""));
The regex is much simpler than that - all you need is to escape parentheses, like this:
\\(title\\)
You do not need to use the Pattern class explicitly, because replaceAll takes a regular expression.
String ruler1="115.28(54)(title) is renumbered 115.363(title) and amended to read:";
String result = ruler1.replaceAll("\\(title\\)", "");
Your pattern replaces everything in a string that contains "(title)"
Here is a demo on ideone.
Just use what String has to offer:
System.out.println(ruler1.replace("(title)", ""));
DO NOT be fooled by its name vs .replaceAll(), it is very misleading:
.replace() does NOT use regexes;
.replace() DOES replace all occurrences.
Given what you need to do, it is a perfect fit; see the Javadoc for String.replace().
I don't think regex is a great solution for something so simple. Try StringUtils.replace() from the Apache commons-lang package.
String result = StringUtils.replace(ruler1,"(title)","");
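To compare the JDK-only options from the answers above in one place, here is a small sketch. The class and method names are invented for the example; Pattern.quote is shown as one more way to avoid escaping the parentheses by hand.

```java
import java.util.regex.Pattern;

class StripTitle {
    // The pattern from the question: .* on both sides swallows the whole line.
    static String originalAttempt(String s) {
        return Pattern.compile("(.*)\\(title\\)(.*)", Pattern.MULTILINE).matcher(s).replaceAll("");
    }

    // Regex with escaped parentheses.
    static String withRegex(String s) {
        return s.replaceAll("\\(title\\)", "");
    }

    // Same regex, letting Pattern.quote do the escaping.
    static String withQuote(String s) {
        return s.replaceAll(Pattern.quote("(title)"), "");
    }

    // Plain literal replacement, no regex involved.
    static String withReplace(String s) {
        return s.replace("(title)", "");
    }

    public static void main(String[] args) {
        String ruler1 = "115.28(54)(title) is renumbered 115.363(title) and amended to read:";
        System.out.println("[" + originalAttempt(ruler1) + "]"); // prints [] because everything was matched
        System.out.println(withRegex(ruler1));
        System.out.println(withQuote(ruler1));
        System.out.println(withReplace(ruler1));
    }
}
```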

Replace string inside tags?

I want to replace the content inside some tags, e.g.:
<p>this it to be replaced</p>
I can extract the content between the tags with a group like this, but can I actually replace just the group?
str = str.replaceAll("<p>([^<]*)</p>", "replacement");
You can use lookaround (positive lookahead and lookbehind) for this:
Change the regex to: "(?<=<p>)(.*?)(?=</p>)" and you will be fine.
Example:
String str = "<p>this it to be replaced</p>";
System.out.println(str.replaceAll("(?<=<p>)(.*?)(?=</p>)", "replacement"));
Output:
<p>replacement</p>
Note, however, that if you are parsing HTML you should be using some kind of HTML parser; regular expressions are often not good enough.
Change the regex to this:
(?<=<p>).*?(?=</p>)
ie
str = str.replaceAll("(?<=<p>).*?(?=</p>)", "replacement");
This uses a "look behind" and a "look ahead" to assert, but not capture, input before/after the matching (non-greedy) regex
Just in case anyone is wondering, this answer is different to dacwe's: His uses unnecessary brackets. This answer is the more elegant :)
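As a runnable sketch of the lookaround approach, plus one alternative that was not in the answers above: capture the tags themselves and put them back with $1 and $2. Class and method names are hypothetical, and the usual caveat about regex and HTML still applies.

```java
class ReplaceInsideTags {
    // Lookbehind/lookahead: only the text between the tags is matched.
    static String withLookaround(String html, String replacement) {
        return html.replaceAll("(?<=<p>).*?(?=</p>)", replacement);
    }

    // Alternative: match the tags too, then re-insert them via group references.
    static String withGroups(String html, String replacement) {
        return html.replaceAll("(<p>)[^<]*(</p>)", "$1" + replacement + "$2");
    }

    public static void main(String[] args) {
        String str = "<p>this it to be replaced</p>";
        System.out.println(withLookaround(str, "replacement")); // <p>replacement</p>
        System.out.println(withGroups(str, "replacement"));     // <p>replacement</p>
    }
}
```

If the replacement text can contain $ or \, wrap it in Matcher.quoteReplacement first, since replaceAll treats those characters specially.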

Java: regex - how do i get the first quote text

As a beginner with regex, I suspect this is too simple a question, but I'll ask anyway and hope you don't mind helping.
Let's say I have a text like "hello 'cool1' word! 'cool2'"
and I want to get the text of the first quote (which is 'cool1' without the ').
What should my pattern be? And when using a matcher, how do I guarantee it will return the first quote and not the second?
(Please suggest a solution using only regex.)
Use this regular expression:
'([^']*)'
Use as follows: (ideone)
String s = "hello 'cool1' word! 'cool2'";
Pattern pattern = Pattern.compile("'([^']*)'");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    // group(1) is the text between the first pair of quotes
    System.out.println(matcher.group(1));
}
Or this if you know that there are no new-line characters in your quoted string:
'(.*?)'
When using a matcher, how do I guarantee it will return the first quote and not the second?
It will find the first quoted string first because it searches from left to right. If you ask it for the next match, it will give you the second quoted string.
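A quick sketch of that left-to-right behaviour: calling find() repeatedly walks through the quoted strings in order. The class name and the list-returning helper are just for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStrings {
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    // Collects the contents of every single-quoted string, in order of appearance.
    static List<String> all(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(all("hello 'cool1' word! 'cool2'")); // [cool1, cool2]
    }
}
```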
If you want to match the first quote's text without the ' characters, you can use lookahead and lookbehind, like
(?<=').*?(?=')
for example
System.out.println("hello 'cool1' word! 'cool2'".replaceFirst("(?<=').*?(?=')", "ABC"));
//out -> hello 'ABC' word! 'cool2'
You could just split the string on quotes and get the second piece (which will be between the first and second quotes).
If you insist on regex, try this:
/^.*?'(.*?)'/
Make sure it's set to multiline, unless you know you'll never have newlines in your input. Then, get the subpattern from the result and that will be your string.
To support double quotes too:
/^.*?(['"])(.*?)\1/
Then get subpattern 2.
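In Java, the double-quote variant might look like the sketch below. The leading ^.*? is left out because Matcher.find() already scans from the left; the class name and the null return for "no match" are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstQuotedText {
    // Group 1 is the opening quote, \1 requires the same quote to close it,
    // group 2 is the text in between.
    private static final Pattern QUOTED = Pattern.compile("(['\"])(.*?)\\1");

    // Returns the text of the first quoted string, or null if there is none.
    static String first(String text) {
        Matcher matcher = QUOTED.matcher(text);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(first("hello 'cool1' word! \"cool2\"")); // cool1
        System.out.println(first("say \"it's\" now"));              // it's
    }
}
```

Pass Pattern.DOTALL to Pattern.compile if the quoted text itself may contain newlines.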
