Strip out every occurrence using regex - java

I want to strip out every occurrence of (title) from a string like the one below. How do I write a regex for that? I tried the following, but it doesn't work.
String ruler1="115.28(54)(title) is renumbered 115.363(title) and amended to read:";
Pattern rulerPattern1 = Pattern.compile("(.*)\\(title\\)(.*)", Pattern.MULTILINE);
System.out.println(rulerPattern1.matcher(ruler1).replaceAll(""));

The regex is much simpler than that - all you need is to escape parentheses, like this:
\\(title\\)
You do not need to use the Pattern class explicitly, because replaceAll takes a regular expression.
String ruler1="115.28(54)(title) is renumbered 115.363(title) and amended to read:";
String result = ruler1.replaceAll("\\(title\\)", "");
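Your pattern has (.*) on both sides of \\(title\\), so a single match spans the entire line that contains "(title)", and replaceAll then replaces all of it with the empty string.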
Here is a demo on ideone.
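To make the difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The class and method names (TitleStripDemo, stripWithOriginalPattern, stripTitle) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class TitleStripDemo {
    // Pattern from the question: (.*) on both sides makes one match cover the whole line.
    static String stripWithOriginalPattern(String input) {
        return Pattern.compile("(.*)\\(title\\)(.*)", Pattern.MULTILINE)
                .matcher(input)
                .replaceAll("");
    }

    // Corrected pattern: matches only the literal text "(title)".
    static String stripTitle(String input) {
        return input.replaceAll("\\(title\\)", "");
    }

    public static void main(String[] args) {
        String ruler1 = "115.28(54)(title) is renumbered 115.363(title) and amended to read:";
        // Prints "[]" because the whole line was matched and removed.
        System.out.println("[" + stripWithOriginalPattern(ruler1) + "]");
        // Prints "115.28(54) is renumbered 115.363 and amended to read:"
        System.out.println(stripTitle(ruler1));
    }
}
```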

Just use what String has to offer:
System.out.println(ruler1.replace("(title)", ""));
DO NOT be fooled by its name vs .replaceAll(), it is very misleading:
.replace() does NOT use regexes;
.replace() DOES replace all occurrences.
Given what you need to do, it is a perfect fit. Javadoc for .replace()
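A small sketch of why the literal version is the safer fit here. It compares .replace() with .replaceAll() called on the same unescaped text; the class and method names are hypothetical.

```java
class ReplaceVsReplaceAll {
    // replace() treats its first argument as literal text and replaces every occurrence.
    static String literalReplace(String input) {
        return input.replace("(title)", "");
    }

    // replaceAll() treats "(title)" as a regex: the parentheses become a capturing
    // group, so only the word "title" is matched and the brackets are left behind.
    static String unescapedRegexReplace(String input) {
        return input.replaceAll("(title)", "");
    }

    public static void main(String[] args) {
        String ruler1 = "115.28(54)(title) is renumbered 115.363(title) and amended to read:";
        // Prints "115.28(54) is renumbered 115.363 and amended to read:"
        System.out.println(literalReplace(ruler1));
        // Prints "115.28(54)() is renumbered 115.363() and amended to read:"
        System.out.println(unescapedRegexReplace(ruler1));
    }
}
```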

I don't think regex is a great solution for something so simple. Try StringUtils.replace() from the Apache commons-lang package.
String result = StringUtils.replace(ruler1,"(title)","");

Related

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
I'm wondering what I'm missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.
It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc")
If you want to be more discriminating you can include a word boundary after abc, giving you
string = string.replaceAll("/abc\\b.*", "/abc")
Just to explain why the given regex won't work:
\b \b - the word boundaries are not required here. More importantly, because .* appears at both the beginning and the end, the pattern matches the whole string, so replacing the match with "abc" replaces the entire string with "abc". Hence you get the wrong answer. Instead, match only the part that needs to change; only the matched text is replaced with the replacement string.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc
You should use replaceFirst: on a single-line string the first match already consumes everything after /abc, so there is nothing left for a second replacement.
text= text.replaceFirst("/abc.*", "/abc");
Or
You can use indexOf to get the index of certain word and then get substring.
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());
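One caveat with the indexOf version: indexOf returns -1 when the word is absent, and the substring call then silently produces a wrong result or throws. A minimal sketch with a guard, assuming the input should be returned unchanged in that case (truncateAfter is a hypothetical helper name):

```java
class TruncateAfterWord {
    // Keeps everything up to and including the first occurrence of word.
    // Assumption: if word does not occur, the text is returned unchanged.
    static String truncateAfter(String text, String word) {
        int index = text.indexOf(word);
        if (index < 0) {
            return text;
        }
        return text.substring(0, index + word.length());
    }

    public static void main(String[] args) {
        String url = "http://localhost:somePort/abc/soap/1.0";
        // Prints "http://localhost:somePort/abc"
        System.out.println(truncateAfter(url, "/abc"));
        // Same result via the regex route from the answers above.
        System.out.println(url.replaceFirst("/abc.*", "/abc"));
    }
}
```

Searching for "/abc" rather than "abc" avoids cutting at an earlier "abc" that happens to appear in the host name.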

What should be the Regex Pattern

Hi, I have a String like "ANCBTH2016100931011730300000458" which always starts with ANCBTH followed by numbers.
What regex would match such a string when it may or may not contain spaces, and the position of the spaces is not fixed?
Example:
ANCBTH2016100931011730300000458
ANCBTH 2016100931011730300000458
ANCBTH 20161009 31011730300000458
I would like to have a regex which satisfied all above examples.
You can test this regex: ANCBTH((?:\s?\d+)*)
To test: regex101
Why don't you remove all the spaces from the incoming string first, e.g.
String yourString = "ANCBTH 20161009 31011730300000458";
yourString = yourString.replaceAll("\\s+", "");
The result must be assigned back because strings are immutable. yourString then becomes ANCBTH2016100931011730300000458
and then you may run a regex like
ANCBTH[0-9]*
and all of your string with spaces anywhere will pass this pattern.
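The two suggestions can be compared side by side. This is only a sketch: the class and method names are invented, and it assumes the whole input has to match (matches() rather than find()).

```java
import java.util.regex.Pattern;

class AncbthMatcher {
    // First answer: digits in groups, each optionally preceded by one whitespace character.
    private static final Pattern WITH_SPACES = Pattern.compile("ANCBTH(?:\\s?\\d+)*");
    // Second answer: plain prefix plus digits, applied after whitespace is removed.
    private static final Pattern DIGITS_ONLY = Pattern.compile("ANCBTH[0-9]*");

    static boolean matchesDirectly(String input) {
        return WITH_SPACES.matcher(input).matches();
    }

    static boolean matchesAfterStripping(String input) {
        String compact = input.replaceAll("\\s+", "");
        return DIGITS_ONLY.matcher(compact).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "ANCBTH2016100931011730300000458",
            "ANCBTH 2016100931011730300000458",
            "ANCBTH 20161009 31011730300000458"
        };
        for (String sample : samples) {
            System.out.println(matchesDirectly(sample) + " " + matchesAfterStripping(sample));
        }
    }
}
```

The approaches are not equivalent: stripping whitespace first also accepts a space inside the prefix (for example "ANC BTH 1"), which the direct pattern rejects. Which behaviour is wanted depends on the data.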

Java replaceAll with regex only replacing first instance

I am trying to update all URLs in a CSS string and my regex only seems to get the first one. I want to get anything like:
url("file")
url('file');
url(file);
I also want to exclude things where the url is data:
url("data: ...");
url('data: ...');
url(data: ...);
I wrote some code to do this, but it only replaces the first one:
String str = ".ff0{font-family:sans-serif;visibility:hidden;}#font-face{font-family:ff1;src:url(f1.woff)format(\"woff\");}.ff1{font-family:ff1;line-height:1.330566;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff2;src:url(f2.woff)format(\"woff\");}.ff2{font-family:ff2;line-height:1.313477;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff3;src:url(f3.woff)format(\"woff\");}.ff3{font-family:ff3;line-height:1.386719;font-style:normal;font-weight:normal;visibility:visible;}#font-face{font-family:ff4;src:url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI1IiBoZWlnaHQ9IjUiPgo8cmVjdCB3aWR0aD0iNSIgaGVpZ2h0PSI1IiBmaWxsPSIjOWU5ZTllIj48L3JlY3Q+CjxwYXRoIGQ9Ik0wIDVMNSAwWk02IDRMNCA2Wk0tMSAxTDEgLTFaIiBzdHJva2U9IiM4ODgiIHN0cm9rZS13aWR0aD0iMSI+PC9wYXRoPgo8L3N2Zz4=)format(\"woff\");";
str = str.replaceAll("url\\((['\"]?)(?!data)(.*)\\1\\)","url(someURL/$2)");
out.println(str);
Any ideas on how to fix? I imagine it has something to do with the regex.
You probably want to use a non-greedy quantifier (*? instead of *).
To exclude the data entries properly, also use a possessive quantifier for capturing the quotes: ?+ instead of ?.
So your regex should look as follows:
url\((['"]?+)(?!data)(.*?)\1\)
Note that in a Java string literal you need to double each backslash and escape the double quote, as you did in your example.
Your .* is greedy. It's capturing to the end of the string. Use .*?, instead, which will force the engine to capture as few characters as possible.
str = str.replaceAll("url\\((['\"]?)(?!data)(.*?)\\1\\)","url(someURL/$2)");
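A shortened sketch of the greedy and the lazy variants on a small CSS string. The sample CSS, class name and method names are made up for illustration; the lazy variant uses the possessive quote group (?+) from the first answer so that quoted data: URLs stay excluded.

```java
class CssUrlRewriter {
    // Greedy (.*) as in the question: one match runs from the first "url(" to the last ")".
    static String rewriteGreedy(String css) {
        return css.replaceAll("url\\((['\"]?)(?!data)(.*)\\1\\)", "url(someURL/$2)");
    }

    // Lazy (.*?) stops at the first closing parenthesis. The possessive ?+ keeps the
    // engine from dropping an opening quote just to get past the (?!data) check.
    static String rewriteLazy(String css) {
        return css.replaceAll("url\\((['\"]?+)(?!data)(.*?)\\1\\)", "url(someURL/$2)");
    }

    public static void main(String[] args) {
        String css = "a{src:url(f1.woff)}b{src:url(\"f2.woff\")}"
                + "c{src:url('data:abc')}d{src:url(data:xyz)}";
        System.out.println(rewriteGreedy(css)); // only the first url(...) gets the prefix
        System.out.println(rewriteLazy(css));   // both file URLs get it, data: URLs are untouched
    }
}
```

Note that the replacement string drops the original quotes, exactly as the replacement in the question does, and that (?!data) also skips any file whose name merely starts with "data".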
Something like this should work:
~\((?!.*data).+\)~

Replace string inside tags?

I want to replace the content inside some tags, e.g.:
<p>this it to be replaced</p>
I could extract the content between the tags with groups like this, but can I actually replace the group?
str = str.replaceAll("<p>([^<]*)</p>", "replacement");
You can use lookaround (positive lookahead and lookbehind) for this:
Change the regex to: "(?<=<p>)(.*?)(?=</p>)" and you will be fine.
Example:
String str = "<p>this it to be replaced</p>";
System.out.println(str.replaceAll("(?<=<p>)(.*?)(?=</p>)", "replacement"));
Output:
<p>replacement</p>
Note, however, that if you are parsing HTML you should be using some kind of HTML parser; regular expressions are often not good enough.
Change the regex to this:
(?<=<p>).*?(?=</p>)
ie
str = str.replaceAll("(?<=<p>).*?(?=</p>)", "replacement");
This uses a "look behind" and a "look ahead" to assert, but not capture, the input before and after the (non-greedy) match, so only the text between the tags is replaced.
Just in case anyone is wondering, this answer is different to dacwe's: His uses unnecessary brackets. This answer is the more elegant :)
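To answer the original "can I replace the group?" more directly: replaceAll always replaces the whole match, so the alternative to lookarounds is to capture the tags and put them back with $1 and $2. A sketch of both routes, with hypothetical class and method names; Matcher.quoteReplacement is used so that a replacement containing $ or \ is taken literally.

```java
import java.util.regex.Matcher;

class ParagraphContentReplacer {
    // Lookaround version: the tags are asserted but are not part of the match.
    static String replaceWithLookaround(String html, String replacement) {
        return html.replaceAll("(?<=<p>).*?(?=</p>)", Matcher.quoteReplacement(replacement));
    }

    // Group version: the tags are matched, captured and written back around the new text.
    static String replaceWithGroups(String html, String replacement) {
        return html.replaceAll("(<p>)[^<]*(</p>)",
                "$1" + Matcher.quoteReplacement(replacement) + "$2");
    }

    public static void main(String[] args) {
        String str = "<p>this it to be replaced</p>";
        System.out.println(replaceWithLookaround(str, "replacement")); // <p>replacement</p>
        System.out.println(replaceWithGroups(str, "replacement"));     // <p>replacement</p>
    }
}
```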

java Regex - split but ignore text inside quotes?

Using only regular expression methods, the method String.replaceAll and ArrayList,
how can I split a String into tokens, but ignore delimiters that exist inside quotes?
The delimiter is any character that is not alphanumeric or part of quoted text.
for example:
The string :
hello^world'this*has two tokens'
should output:
hello
worldthis*has two tokens
I know there is a damn good accepted answer already, but I would like to add another regex-based (and, may I say, simpler) approach: split the given text on any non-alphanumeric delimiter that is not inside single quotes, using
Regex:
(?=(([^']+'){2})*[^']*$)[^a-zA-Z\d]+
Which basically means: match a run of non-alphanumeric characters if it is followed by an even number of single quotes; in other words, match it only if it is outside single quotes.
Code:
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
    string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+")
));
Output:
[hello, world'this*has two tokens', 2ndToken]
Demo:
Here is a live working Demo of the above code.
Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:
String s = "hello^world'this*has two tokens'";
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
See it working online: ideone
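The Matcher approach can be extended slightly to produce the output the question asks for, where the quote characters themselves are dropped from each token. This is a sketch under the assumption that quotes are balanced; QuoteAwareTokenizer and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // A token is a run of alphanumerics and/or single-quoted sections, in any order.
    private static final Pattern TOKEN = Pattern.compile("(?:[a-zA-Z0-9]+|'[^']*')+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            // Remove the quote characters but keep the quoted text itself.
            tokens.add(matcher.group().replace("'", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints "hello" and "worldthis*has two tokens" on separate lines.
        for (String token : tokenize("hello^world'this*has two tokens'")) {
            System.out.println(token);
        }
    }
}
```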
You cannot in any reasonable way. You are posing a problem that regular expressions aren't good at.
Do not use a regular expression for this. It won't work. Use / write a parser instead.
You should use the right tool for the right task.
