Java regex text replace - java

I have text like this
Some. / text to-match (1)
I want to replace characters such as ./() with _, to get:
Some_text_to_match_1
What pattern should I use?

You may trim the string from non-word chars on both ends (with .replaceAll("^\\W+|\\W+$", "")), and then replace 1 or more non-word character chunks with _ inside the string (with .replaceAll("\\W+", "_")):
String s = "Some. / text to-match (1)";
s = s.replaceAll("^\\W+|\\W+$", "").replaceAll("\\W+", "_");
System.out.println(s);
See the Java demo
Details:
\W matches a non-word character
+ matches 1 or more occurrences of the subpattern this quantifier modifies.
Since we need to use 2 different replacements when trimming the string and then replacing non-word chars inside it, we cannot use just 1 replaceAll.
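To illustrate why two calls are needed, here is a minimal sketch that compares a single replaceAll("\\W+", "_") with the two-step version on the string from the question. The class and method names are made up for this example.

```java
class NonWordToUnderscore {
    // One pass only: every run of non-word chars becomes "_",
    // including runs at the very start or end of the string.
    static String singlePass(String s) {
        return s.replaceAll("\\W+", "_");
    }

    // Two passes: strip non-word chars at both ends first,
    // then collapse the remaining inner runs into a single "_".
    static String trimThenReplace(String s) {
        return s.replaceAll("^\\W+|\\W+$", "").replaceAll("\\W+", "_");
    }

    public static void main(String[] args) {
        String s = "Some. / text to-match (1)";
        System.out.println(singlePass(s));      // Some_text_to_match_1_
        System.out.println(trimThenReplace(s)); // Some_text_to_match_1
    }
}
```

The single pass leaves a trailing underscore because the closing parenthesis is itself a run of non-word characters.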

Related

Matching whole words with special characters with a dynamically built pattern

I need to match an exact substring in a string in Java. I've tried with
String pattern = "\\b"+subItem+"\\b";
But it doesn't work if my substring contains non alphanumerical characters.
I want this to work exactly as the "Match whole word only" function in Notepad++.
Could you help?
I suggest either unambiguous word boundaries (that match a string only if the search pattern is not enclosed with letters, digits or underscores):
String pattern = "(?<!\\w)"+Pattern.quote(subItem)+"(?!\\w)";
where (?<!\w) matches a location not preceded by a word char and (?!\w) matches a location not followed by a word char (see this regex demo), or a variation that takes into account leading/trailing special chars of the potential match:
String pattern = "(?:\\B(?!\\w)|\\b(?=\\w))" + Pattern.quote(subItem) + "(?:(?<=\\w)\\b|(?<!\\w)\\B)";
See the regex demo.
Details:
(?:\B(?!\w)|\b(?=\w)) - either a non-word boundary if the next char is not a word char, or a word boundary if the next char is a word char
Data\[3\] - the escaped subItem (Data[3] in this example), matched literally
(?:(?<=\w)\b|(?<!\w)\B) - either a word boundary if the preceding char is a word char, or a non-word boundary if the preceding char is not a word char.
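As a rough sketch of the difference, the snippet below counts matches for a \b-based pattern and for the two suggested patterns. The sample text, the subItem value Data[3], and the class and method names are assumptions made for illustration. Pattern.quote is applied in all three cases so that only the boundary handling differs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordMatch {
    // Counts how many times the regex matches in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Plain word boundaries: fails when subItem starts or ends with a non-word char.
    static String naive(String subItem) {
        return "\\b" + Pattern.quote(subItem) + "\\b";
    }

    // Unambiguous boundaries: no word char right before or right after the match.
    static String unambiguous(String subItem) {
        return "(?<!\\w)" + Pattern.quote(subItem) + "(?!\\w)";
    }

    // Adaptive boundaries: the boundary type depends on the first/last char of subItem.
    static String adaptive(String subItem) {
        return "(?:\\B(?!\\w)|\\b(?=\\w))" + Pattern.quote(subItem)
                + "(?:(?<=\\w)\\b|(?<!\\w)\\B)";
    }

    public static void main(String[] args) {
        String text = "Use Data[3] here, not MyData[3]"; // hypothetical sample input
        String subItem = "Data[3]";
        System.out.println(count(naive(subItem), text));       // 0
        System.out.println(count(unambiguous(subItem), text)); // 1
        System.out.println(count(adaptive(subItem), text));    // 1
    }
}
```

The \b version finds nothing here because there is no word boundary between ] and the following space.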

Java regex to replace all special characters in a String with an underscore also considering removing leading,trailing,multiple underscores

I need a regular expression that replaces all special characters (treating a run of them as one) with a single underscore, without adding a leading or trailing underscore when the String starts or ends with special characters. I have tried the following, but it doesn't seem to work.
String myDefaultString = "_###%Default__$*_123_";
myDefaultString.replaceAll("[\\p{Punct}&&[^_]]", "_");
My eventual result should be Default_123 where the regular expression needs to consider leading underscore and remove them keeping the underscore in between Default and 123 but also should remove trailing and multiple underscores in between the String.
Also tried the following regex
myDefaultString.replaceAll("[^a-zA-Z0-9_.]+", "_")
But it does not seem to work. Is what I'm trying to achieve very complicated, or is there a better way to do it?
You may use this regex in replaceAll:
String str = "_###%Default__$*_123_";
str = str.replaceAll("[\\p{Punct}&&[^_]]+|^_+|\\p{Punct}+(?=_|$)", "");
//=> "Default_123"
RegEx Demo
RegEx Details:
[\\p{Punct}&&[^_]]+: Match 1+ punctuation characters that are not _
|: OR
^_+: Match 1+ underscores at start
|: OR
\\p{Punct}+(?=_|$): Match 1+ punctuation characters if that is followed by a _ or end of string.
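A small sketch that runs this pattern on a few inputs may help to see what each alternative does. Only the first input comes from the question; the other two inputs and the class and method names are made up for illustration.

```java
class PunctCleanup {
    // Applies the pattern from the answer: matched punctuation is removed,
    // and an underscore survives only when a word character follows it.
    static String clean(String s) {
        return s.replaceAll("[\\p{Punct}&&[^_]]+|^_+|\\p{Punct}+(?=_|$)", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("_###%Default__$*_123_")); // Default_123
        System.out.println(clean("__a__b__"));              // a_b
        System.out.println(clean("a#b"));                   // ab
    }
}
```

Note that the last input shows a limitation: punctuation with no underscore next to it is deleted rather than turned into an underscore, so this pattern fits inputs where the words are already separated by at least one underscore.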

How to extract and replace a String with specific format?

I have an input String like:
(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)
What I want to do is find all words starting with "rm" and replace them with remove function.
(remove(01ADS21212), 'adfffddd', remove(Adssssss), '1231232131', remove(2321312322))
I am trying to use the replaceAll function, but I don't know how to extract the parts after the "rm" literal.
statement.replaceAll("\\(rm*.,", "remove($1)");
Is there any way to get these parts?
You have not captured any substring with a capturing group, so there is no Group 1 for $1 to refer to.
You may use
.replaceAll("\\brm(\\w*)", "remove($1)")
See the regex demo
Details
\b - a word boundary (to start matching only at the start of a word)
rm - a literal part
(\w*) - Group 1: 0+ word chars (letters, digits or underscores)
The $1 in the replacement pattern stands for Group 1 value.
If you mean to match any chars other than a comma and whitespace after rm, use "\\brm([^\\s,]*)", see this regex demo.
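For reference, a minimal sketch applying the \brm(\w*) pattern to the input from the question. The class and method names are hypothetical. The second method shows what Java does when the replacement refers to a group that the pattern does not define.

```java
class RmToRemove {
    // Wraps whatever follows a word-initial "rm" in remove(...).
    static String wrap(String s) {
        return s.replaceAll("\\brm(\\w*)", "remove($1)");
    }

    // Returns true if replaceAll rejects "$1" when the pattern has no capturing group.
    static boolean missingGroupThrows() {
        try {
            "rm01".replaceAll("rm\\w*", "remove($1)");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)";
        System.out.println(wrap(s));
        // (remove(01ADS21212), 'adfffddd', remove(Adssssss), '1231232131', remove(2321312322))
        System.out.println(missingGroupThrows()); // true
    }
}
```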
Use Replace with an empty string. E.g. (C#):
string str = "(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)";
Console.WriteLine(str.Replace("rm", ""));
Output: (01ADS21212, 'adfffddd', Adssssss, '1231232131', 2321312322)

Use regex java to replace a string before and after a certain character

My effort: I tried looking at similar questions, but I cannot figure out the answer. I also tried using the web (https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/) to figure this out myself, but I just can't get it right. I tried using myString.replaceAll("_.+/[^.]*/", "");
I have a string: String myString = "hello_AD123.mp3";
And I want to use a Java regex to REMOVE everything from the underscore (inclusive) up to, but not including, the .mp3 extension. How would I do this?
So I want the final result to be the following: myString = "hello.mp3";
Your regex did not work because it matched something that is missing from your string:
_ - an underscore followed with...
.+ - one or more characters other than line terminators
/ - a literal / symbol
[^.]* - zero or more characters other than a dot
/ - a literal /.
There are no slashes in your input string.
You can use
String myString = "hello_AD123.mp3";
myString = myString.replaceFirst("_.*[.]", ".");
// OR myString = myString.replaceFirst("_[^.]*", "");
System.out.println(myString);
See the IDEONE Java demo
The pattern _[^.]* matches an underscore and then 0+ characters other than a literal dot. In case the string has dots before .mp3, "_.*[.]" matches _ up to the last ., and needs to be replaced with a ..
See the regex Demo 1 and Demo 2.
Details:
_ - matches _
[^.]* - matches zero or more (due to the * quantifier) characters other than a literal dot (a negated character class, [^...], is used, and . inside a character class is treated as a literal dot character, i.e. a full stop or period).
OR
.*[.] - matches 0 or more characters other than a newline up to the last literal dot (consuming the dot, thus, the replacement pattern should be ".").
The .replaceFirst() method is used because we only need to perform a single search and replace operation. With the _[^.]* pattern, the matched substring is removed because the replacement is an empty string (""); with the _.*[.] pattern, it is replaced with "." to restore the consumed dot.
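The two variants only differ when the part after the underscore contains extra dots. The sketch below shows this on a second, made-up file name (hello_AD1.2.3.mp3); the class and method names are also hypothetical.

```java
class StripAfterUnderscore {
    // Greedy variant: removes from "_" up to the LAST dot, then puts the dot back.
    static String upToLastDot(String s) {
        return s.replaceFirst("_.*[.]", ".");
    }

    // Negated-class variant: removes from "_" up to (not including) the FIRST dot.
    static String upToFirstDot(String s) {
        return s.replaceFirst("_[^.]*", "");
    }

    public static void main(String[] args) {
        System.out.println(upToLastDot("hello_AD123.mp3"));    // hello.mp3
        System.out.println(upToFirstDot("hello_AD123.mp3"));   // hello.mp3
        System.out.println(upToLastDot("hello_AD1.2.3.mp3"));  // hello.mp3
        System.out.println(upToFirstDot("hello_AD1.2.3.mp3")); // hello.2.3.mp3
    }
}
```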

How to split a string without losing any word?

I am using Eclipse for Java and I want to split an input line without losing any characters.
For example, the input line is:
IPOD6 1 USD6IPHONE6 16G,64G,128G USD9,USD99,USD999MACAIR 2013-2014 USD123MACPRO 2013-2014,2014-2015 USD899,USD999
and the desired output is:
IPOD6 1 USD6
IPHONE6 16G,64G,128G USD9,USD99,USD999
MACAIR 2013-2014 USD123
MACPRO 2013-2014,2014-2015 USD899,USD999
I was using split("(?<=\\bUSD\\d{1,99}+)") but it doesn't work.
You just need to add a non-word boundary \B inside the positive lookbehind. \B matches between two word characters or between two non-word characters. It won't split between USD9 and the comma in the USD9, substring, because a word boundary exists there: 9 is a word character and , is a non-word character. It does split between USD6 and IPHONE6, because a non-word boundary \B exists there: 6 and I are both word characters.
String s = "IPOD6 1 USD6IPHONE6 16G,64G,128G USD9,USD99,USD999MACAIR 2013-2014 USD123MACPRO 2013-2014,2014-2015 USD899,USD999";
String[] parts = s.split("(?<=\\bUSD\\d{1,99}+\\B)");
for (String part : parts) {
    System.out.println(part);
}
Output:
IPOD6 1 USD6
IPHONE6 16G,64G,128G USD9,USD99,USD999
MACAIR 2013-2014 USD123
MACPRO 2013-2014,2014-2015 USD899,USD999
Without making it too complicated, use this pattern:
(?=IPOD|IPHONE|MAC)
and replace each match with a newline. After that, it is easy to capture the lines or split them into an array.
Demo
Or maybe this pattern:
((USD\d+,?)+)
and replace with $1\n
Demo
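In Java, the first pattern can also be passed straight to split instead of inserting newlines. This is a minimal sketch, assuming Java 8 or later (where a zero-width match at the start of the input does not produce an empty leading element) and assuming that every record starts with one of the listed product prefixes. The class and method names are made up.

```java
class SplitBeforeProduct {
    // Splits right before each known product prefix; the lookahead consumes no characters,
    // so nothing is lost from the input.
    static String[] splitRecords(String s) {
        return s.split("(?=IPOD|IPHONE|MAC)");
    }

    public static void main(String[] args) {
        String s = "IPOD6 1 USD6IPHONE6 16G,64G,128G USD9,USD99,USD999"
                + "MACAIR 2013-2014 USD123MACPRO 2013-2014,2014-2015 USD899,USD999";
        for (String part : splitRecords(s)) {
            System.out.println(part);
        }
    }
}
```

Unlike the lookbehind approach, this one depends on a fixed list of prefixes, so it needs updating whenever a new product name appears.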
