RegExp - Replace exact String [duplicate] - java

This question already has answers here:
Regex match entire words only
(7 answers)
Closed 5 years ago.
I need help with a regular expression. I am trying to replace "Inc" with "LLC" in many strings.
I used this code in Java:
newValue = newValue.replaceAll("(?i)Inc[^\\w]","LLC");
But the result is incorrect: one character is removed after the replacement:
Example: CONSULTANTS, INC. ("ABC") ==> CONSULTANTS, LLC ("ABC") // the "." was removed.
Please note that I used [^\w] to avoid replacing "inc" inside words like "include".

You can use \b for word boundaries if the search term contains no regex special characters such as ^ or ():
String s = "include inc".replaceAll("(?i)\\binc\\b", "LLC");
System.out.println(s); // prints "include LLC"; (?i) makes the match case-insensitive.
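The approach above fails if the search term itself contains regex special characters, for example ^inc. In that case, quote the term with Pattern.quote(s), which wraps it in the quoted-literal markers \Q...\E. Also note that \b only matches where a word character meets a non-word character, so lookarounds are a safer boundary check when the term starts or ends with punctuation.
Utility method using a quoted literal (needs java.util.regex.Pattern and java.util.regex.Matcher):
public static String replaceExact(String str, String from, String to) {
// Pattern.quote treats 'from' literally; quoteReplacement keeps $ and \ in 'to' literal.
return str.replaceAll("(?<!\\w)" + Pattern.quote(from) + "(?!\\w)", Matcher.quoteReplacement(to));
}
Note: Pattern.quote("wor^d") returns the regex \Qwor^d\E (written "\\Qwor^d\\E" as a Java string literal). Unlike concatenating \Q and \E by hand, it also copes with a term that itself contains \E.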
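To make the difference concrete, here is a minimal sketch that compares the question's pattern with a lookaround version on the sample string. The class and method names (WordBoundaryDemo, consuming, lookaround) are made up for illustration. The point is that [^\w] consumes the character after "Inc", while a lookahead only checks it.

```java
public class WordBoundaryDemo {
    // The question's pattern: [^\w] is part of the match, so the character
    // after "Inc" is replaced together with it.
    static String consuming(String s) {
        return s.replaceAll("(?i)Inc[^\\w]", "LLC");
    }

    // Lookarounds only assert that no word character is adjacent;
    // they do not consume anything.
    static String lookaround(String s) {
        return s.replaceAll("(?i)(?<!\\w)Inc(?!\\w)", "LLC");
    }

    public static void main(String[] args) {
        String input = "CONSULTANTS, INC. (\"ABC\") include";
        System.out.println(consuming(input));  // CONSULTANTS, LLC ("ABC") include
        System.out.println(lookaround(input)); // CONSULTANTS, LLC. ("ABC") include
    }
}
```

The lookaround version also matches "Inc" at the very end of a string, which the [^\w] version cannot, because that pattern requires one more character to be present.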

I can not use regex to replace text contain "\" character [duplicate]

This question already has answers here:
How to escape text for regular expression in Java?
(8 answers)
Closed 2 years ago.
I have a snippet
public static void main(String[] args) {
// replacement text
String replacement = "Not Set";
// text
String text = "Item A \\(XXX\\) Lock"; // text is: Item A \(XXX\) Lock
String regex = "\\(XXX\\)"; // regex is: \(XXX\), which matches the literal text (XXX)
// result text
String resultText = text.replaceAll(regex, replacement);
System.out.println("Result text: " + resultText);
}
resultText is still "Item A \(XXX\) Lock": I cannot replace "\(XXX\)" with "Not Set".
Please help me if you know about this problem.
The regex language has its own escape sequences on top of the Java string literal escape sequences. So to match a backslash, you need \\ in the regex and thus \\\\ in the Java string literal.
In this case you could also use Pattern.quote:
text.replaceAll(Pattern.quote(regex), Matcher.quoteReplacement(replacement));
The characters \, ( and ) all have a special meaning when used in a regular expression. But you don't want to use them with their special meaning, which means you have to escape them in the regular expression. That means preceding them with \, to tell the regular expression processor not to invoke the special meaning of those characters.
In other words, a regular expression containing \\ will match \, a regular expression containing \( will match ( and so on.
To match \(XXX\), the regular expression you want will be \\\(XXX\\\) - see how there's an extra \ for each \, ( and ) that you want to match. But to specify this regular expression in a Java String literal, you need to write \\ in place of each \. That is, you need to write "\\\\\\(XXX\\\\\\)". There are six \ characters in each little run of them.
String regex = "\\\\\\(XXX\\\\\\)";
String resultText = text.replaceAll(regex, replacement);
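The options discussed above can be compared side by side. This is a sketch only, and the class and method names are hypothetical. It shows the hand-escaped regex, the Pattern.quote variant, and String.replace, which treats both arguments literally and so avoids regex escaping altogether.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashReplaceDemo {
    // Hand-escaped regex: six backslashes in the Java literal before each parenthesis.
    static String withEscapedRegex(String text, String replacement) {
        return text.replaceAll("\\\\\\(XXX\\\\\\)", Matcher.quoteReplacement(replacement));
    }

    // Let Pattern.quote do the escaping of the literal search text.
    static String withQuote(String text, String literal, String replacement) {
        return text.replaceAll(Pattern.quote(literal), Matcher.quoteReplacement(replacement));
    }

    // String.replace takes literal text, so no regex escaping is needed.
    static String withReplace(String text, String literal, String replacement) {
        return text.replace(literal, replacement);
    }

    public static void main(String[] args) {
        String text = "Item A \\(XXX\\) Lock"; // Item A \(XXX\) Lock
        String literal = "\\(XXX\\)";          // \(XXX\)
        System.out.println(withEscapedRegex(text, "Not Set"));   // Item A Not Set Lock
        System.out.println(withQuote(text, literal, "Not Set")); // Item A Not Set Lock
        System.out.println(withReplace(text, literal, "Not Set")); // Item A Not Set Lock
    }
}
```

When the search text is a fixed literal rather than a pattern, String.replace is usually the simplest of the three.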

Empty Strings within a non empty String [duplicate]

This question already has answers here:
Replace with empty string replaces newChar around all the characters in original string
(4 answers)
Closed 6 years ago.
I'm confused by this code:
public class StringReplaceWithEmptyString
{
public static void main(String[] args)
{
String s1 = "asdfgh";
System.out.println(s1);
s1 = s1.replace("", "1");
System.out.println(s1);
}
}
And the output is:
asdfgh
1a1s1d1f1g1h1
So my first thought was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and one for the start of 's').
I then checked whether a String is represented as a char[], using the questions "In Java, is a String an array of chars?" and "String representation in Java", and the answer was yes.
So I tried to assign an empty character '' to a char variable, but it gives a compiler error:
Invalid character constant
The same error occurs when I try it in a char[]:
char[] c = {'','a','','s'}; // compile-time error
So I'm confused about three things:
How is an empty String represented in a char[]?
Why am I getting that output for the code above?
How is the String s1 represented in a char[] when it is first initialized?
Sorry if any part of my question is wrong.
Just adding some more explanation to Tim Biegeleisen's answer.
As of Java 8, the code of the replace method in the java.lang.String class is:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Here you can see that the replacement is done by a regex Pattern and Matcher. The empty target "" produces zero-length matches, and a zero-length match is possible at every position: before the first character, between each pair of characters, and after the last one.
So, behind the scenes, your code is executed as follows:
Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));
The output becomes:
1a1s1d1f1g1h1
Going with Andy Turner's great comment, your call to String#replace() is (in Java 8) implemented using a regex Matcher's replaceAll(). As such, there is a regex replacement happening here. The matches occur before the first character, in between each pair of characters in the string, and after the last character.
^|a|s|d|f|g|h|$
^ this and every pipe matches to empty string ""
The match you are making is a zero length match. In Java's regex implementation used in String.replaceAll(), this behaves as the example above shows, namely matching each inter-character position and the positions before the first and after the last characters.
Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/
A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.
This is because it does a regex match of the pattern/replacement you pass to the replace().
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
Parameters:
target - The sequence of char values to be replaced
replacement - The replacement sequence of char values
Returns: The resulting string
Throws: NullPointerException if target or replacement is null.
Since: 1.5
Please read more at the link below ... (Also browse through the source code).
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29
A regex such as "" matches every possible empty string within a string. In this case that means the empty position at the start of the string and the empty position after every character, including the end.
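The positions of those zero-length matches can be listed directly. The sketch below is illustrative only (the class and method names are invented). It runs the empty literal pattern over a string with Matcher.find() and records where each match starts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchDemo {
    // Collects the start index of every match of the empty pattern in s.
    static List<Integer> emptyMatchPositions(String s) {
        Matcher m = Pattern.compile("", Pattern.LITERAL).matcher(s);
        List<Integer> positions = new ArrayList<>();
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // A 6-character string has 7 positions: indices 0 through 6.
        System.out.println(emptyMatchPositions("asdfgh")); // [0, 1, 2, 3, 4, 5, 6]
        System.out.println("asdfgh".replace("", "1"));     // 1a1s1d1f1g1h1
    }
}
```

There is exactly one match per position, which is why only a single "1" appears between 'a' and 's' rather than two.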

Recursive replaceAll java [duplicate]

This question already has answers here:
Regex to replace repeated characters
(2 answers)
Closed 6 years ago.
I am trying to collapse all the repeated characters in a String in Java, keeping only one of each run.
For example:
aaaaa ---> a
For that, I have tried using the replaceAll method:
"aaaaa".replaceAll("a*","a") //returns "aa"
I have developed a recursive method, which is probably not very efficient:
public String recursiveReplaceAll(String original,String regex, String replacement) {
if (original.equals(original.replaceAll(regex, replacement))) return original;
return recursiveReplaceAll(original.replaceAll(regex, replacement),regex,replacement);
}
This method works. I was just wondering whether there is a way, using regex for example, to do the same work with better performance.
Your replaceAll approach was nearly right. The problem is that * also matches zero occurrences, so after "aaaaa" is consumed the pattern matches the empty string at the end and a second "a" is inserted. You want +, which means "one or more".
"aaaaa".replaceAll("a+","a") // Returns "a"
You can do it without recursion. The regular expression "(.)\\1+" captures any character that is followed by one or more copies of itself, and the replacement "$1" substitutes the whole run with the single captured character. Thus, this collapses any repeated characters.
public static void main(String[] args) {
String str = "aaaabbbaaa";
String result = str.replaceAll("(.)\\1+", "$1");
System.out.println(result); // prints "aba".
}
With this, it works for all characters.
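As a quick check of the patterns discussed above, here is a small sketch. The squeeze helper and the class name are hypothetical and exist only for this example.

```java
public class SqueezeDemo {
    // Collapses every run of a repeated character into a single occurrence.
    // Note: '.' does not match line terminators by default.
    static String squeeze(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println("aaaaa".replaceAll("a*", "a")); // aa (extra empty match at the end)
        System.out.println("aaaaa".replaceAll("a+", "a")); // a
        System.out.println(squeeze("aaaabbbaaa"));         // aba
        System.out.println(squeeze("Mississippi"));        // Misisipi
    }
}
```

A single replaceAll pass is enough here, because each run is matched greedily in one go, so no recursion or repeated calls are needed.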

String.split() not working with "[]" [duplicate]

This question already has answers here:
Why can't I split a string with the dollar sign?
(6 answers)
Closed 7 years ago.
I have an IPv6 string:
String str = "demo1 26:11:d0a2:f020:0:0:0:a3:2123 demo2";
String searchString = "26:11:d0a2:f020:0:0:0:a3:2123";
When I use str.split(searchString), the code returns
["demo1 ", " demo2"]
which is fine. But when I use:
String str = "demo1 [26:11:d0a2:f020:0:0:0:a3]:2123 demo2";
String searchString = "[26:11:d0a2:f020:0:0:0:a3]:2123";
and call str.split(searchString), it returns
[demo1 [26:11:d0a2:f020:0:0:0:a3]:2123 demo2]
which I guess is wrong. Can someone tell me why I am getting this output?
Since the split function takes a regex as its parameter, you need to escape those brackets. Otherwise [26:11:d0a2:f020:0:0:0:a3] is a character class that matches only a single character.
String searchString = "\\[26:11:d0a2:f020:0:0:0:a3\\]:2123";
str.split(searchString);
This happens because split(String regex) takes a regex pattern string as its argument, and every substring that matches the pattern is treated as a delimiter.
In your pattern, the text inside [] is interpreted as a character set.
To make it work the way you intend, you have to use this regex pattern string:
\[26:11:d0a2:f020:0:0:0:a3\]:2123
i.e. in Java:
String searchString = "\\[26:11:d0a2:f020:0:0:0:a3\\]:2123";
In Java regexes, [abc] means "match a OR b OR c". I encourage you to escape your square brackets. Try:
String str = "demo1 [26:11:d0a2:f020:0:0:0:a3]:2123 demo2";
String searchString = "\\[26:11:d0a2:f020:0:0:0:a3\\]:2123";
You have to escape regex special characters. Use \\[ ... \\] in the searchString variable.
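If the delimiter is meant to be taken literally, one alternative to escaping each bracket by hand is to let Pattern.quote do it. This is a sketch with made-up names (SplitLiteralDemo, splitOnLiteral), not part of the original answers.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLiteralDemo {
    // Splits s on the exact text of delimiter, with no regex interpretation.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String str = "demo1 [26:11:d0a2:f020:0:0:0:a3]:2123 demo2";
        String searchString = "[26:11:d0a2:f020:0:0:0:a3]:2123";
        // Unquoted, the bracketed part is a character class and the string is not split.
        System.out.println(str.split(searchString).length);                 // 1
        System.out.println(Arrays.toString(splitOnLiteral(str, searchString))); // [demo1 ,  demo2]
    }
}
```

This keeps the search string readable and also works when it comes from user input whose special characters are not known in advance.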

Tell if string contains a-z chars [duplicate]

This question already has answers here:
How to check whether a string contains at least one alphabet in java?
(2 answers)
Closed 8 years ago.
I am very new to programming. I want to check if a string s contains any a-z characters. I use:
if(s.contains("a") || s.contains("b") || ... {
}
But is there any way to do this with shorter code? Thanks a lot.
You can use regular expressions:
// matches() must match the whole string, so a bare [a-z] only matches a one-character string.
// To emulate contains, add .* on both sides.
if (s.matches(".*[a-z].*")) {
// Do something
}
This checks whether the string contains at least one character in a-z.
To check whether all characters are in a-z, use:
if ( ! s.matches(".*[^a-z].*") ) {
// Do something
}
For more information on regular expressions in Java, see:
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
In addition to regular expressions, and assuming you actually want to know whether the String contains anything other than letters, you can use Character.isLetter(char):
boolean hasNonLetters = false;
for (char ch : s.toCharArray()) {
if (!Character.isLetter(ch)) {
hasNonLetters = true;
break;
}
}
// hasNonLetters is true only if the String contains something that isn't a letter
From the Javadoc for Character.isLetter(char):
A character is considered to be a letter if its general category type, provided by Character.getType(ch), is any of the following:
UPPERCASE_LETTER
LOWERCASE_LETTER
TITLECASE_LETTER
MODIFIER_LETTER
OTHER_LETTER
Use regular expressions. The Pattern.matches() method can do this, but it must match the entire input, so wrap the character class in .* on both sides. For example:
Pattern.matches(".*[a-z].*", "TESTING STRING a"); // true
If you need to check a great number of strings, compile the pattern once and reuse it to improve performance.
Try this:
Pattern p = Pattern.compile("[a-z]");
if (p.matcher(stringToMatch).find()) {
//...
}
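Putting the precompiled-pattern idea together, a minimal sketch might look like the following. The class and method names are invented for illustration. find() looks for a match anywhere in the input, so no surrounding .* is needed.

```java
import java.util.regex.Pattern;

public class LowercaseCheckDemo {
    // Compiled once and reused across calls.
    private static final Pattern LOWERCASE_ASCII = Pattern.compile("[a-z]");

    // True if s contains at least one character in the range a-z.
    static boolean containsLowercaseAscii(String s) {
        return LOWERCASE_ASCII.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLowercaseAscii("TESTING STRING a")); // true
        System.out.println(containsLowercaseAscii("TESTING STRING"));   // false
        System.out.println("TESTING STRING a".matches("[a-z]"));        // false: matches() needs a full match
    }
}
```

Note that [a-z] covers only ASCII lowercase letters; for other alphabets, Character.isLetter or a Unicode character class would be the thing to look at.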
