Escape any non-alphanumeric characters in a string using Regex - java

I would like to escape non-alphanumeric characters occurring in a string as follows:
Say the original string is "test_"; I would like to transform it into "test\_".
One approach is to scan the original string and build a new one, appending a '\' in front of each non-alphanumeric character as it is found.
But I am wondering if there is a cleaner way to do the same with a regular expression.

You can refer to the matched text in the replacement string, as shown below:
public class Main {
    public static void main(String[] args) {
        String s = "test_";
        s = s.replaceAll("[^\\p{Alnum}]", "\\\\$0");
        System.out.println(s);
    }
}
Output:
test\_
Notes:
$0 represents the string matched by the complete regex pattern, [^\\p{Alnum}].
\p{Alnum} specifies alphanumeric character and ^ inside [] is used to negate the pattern. Learn more about patterns from the documentation.
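Notice the four backslashes in the replacement: the Java string literal "\\\\" produces two backslash characters, and replaceAll() treats that pair in a replacement string as one literal \.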
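The same replacement can be wrapped in a small helper. This is only a minimal sketch: the class name EscapeNonAlnum and the method escape are made up for illustration. It builds the replacement with Matcher.quoteReplacement so the literal backslash does not have to be doubled by hand.

```java
import java.util.regex.Matcher;

// Hypothetical helper class, not part of the original answer.
class EscapeNonAlnum {

    // Prefixes every character that is not an ASCII letter or digit with a backslash.
    // Note: \p{Alnum} is ASCII-only by default, so a letter such as 'é' is escaped too.
    static String escape(String input) {
        // quoteReplacement("\\") yields a replacement-safe literal backslash;
        // $0 then re-inserts the character that was matched.
        String replacement = Matcher.quoteReplacement("\\") + "$0";
        return input.replaceAll("[^\\p{Alnum}]", replacement);
    }

    public static void main(String[] args) {
        System.out.println(escape("test_"));  // test\_
        System.out.println(escape("a.b-c"));  // a\.b\-c
    }
}
```

Strings that contain only letters and digits pass through unchanged.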

Related

Java String replaceAll() method using {} curly brackets

So for my app in Android Studio I want to replace the following:
String card = cards.get(count).getCard();
if (card.contains("{Player1}")) {
    String replacedCard = card.replaceAll("{Player1}", "Poep");
}
An example of String card can be: {Player1} switch drinks with the person next to you.
Somehow I can't use {} for the replacing. With the { it says: "Dangling metacharacter". Screenshot: https://prnt.sc/s2bbl8
Is there a solution for this?
The first argument of replaceAll is a String that is parsed as a regular expression (regex). The braces { } are reserved metacharacters with a special meaning inside a regular expression. To match them as normal characters, you need to escape them with a leading backslash \, and because the backslash is also a special character in Java string literals, you need to escape it with an additional backslash:
String replacedCard = card.replaceAll("\\{Player1\\}", "Poep");
Both { } are reserved regex characters. Since the replaceAll() function takes in a regex parameter, you have to explicitly state that { and } are part of your actual string. You can do this by prefixing them with the escape character: \. But because the escape character is also a reserved character, you need to escape it too.
Here's the correct way to write your code:
String card = cards.get(count).getCard();
if (card.contains("{Player1}")) {
    String replacedCard = card.replaceAll("\\{Player1\\}", "Poep");
}
You need to escape the opening { with a backslash, i.e.:
String card = "{Player1}";
if (card.contains("{Player1}")) {
    String replacedCard = card.replaceAll("\\{Player1}", "Poep");
    System.out.println("replace: " + replacedCard);
}
The method String.replaceAll expects a regular expression. The other answers already give a solution for this. However, if you don't need regular expressions, then you can also use String.replace:
String replacedCard = card.replace("{Player1}", "Poep");
Since the input value of the replaceAll method expects a regex, you need to escape the curly brackets with a backslash. The curly brackets are special characters in the context of regular expressions.
In a Java string literal, a backslash in a regex is written as a double backslash \\ (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for reference).
So you would need to adjust the line like so:
String replacedCard = card.replaceAll("\\{Player1\\}", "Poep");
{} are special characters in regular expressions. The replaceAll method takes a regular expression as its first parameter, so if you also want to replace the curly brackets you have to escape them with \\, as follows:
String card = cards.get(count).getCard();
if (card.contains("{Player1}")) {
    String replacedCard = card.replaceAll("\\{Player1}", "Poep");
}
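To compare the options from the answers above side by side, here is a minimal sketch. The class PlaceholderDemo and its method names are hypothetical. Matcher.quoteReplacement is added on the assumption that the player name might contain $ or \, which replaceAll would otherwise interpret in the replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PlaceholderDemo {
    static final String PLACEHOLDER = "{Player1}";

    // Literal replacement: no regex involved, so nothing needs escaping.
    static String withReplace(String card, String name) {
        return card.replace(PLACEHOLDER, name);
    }

    // Regex replacement with the braces escaped by hand.
    static String withEscapedRegex(String card, String name) {
        return card.replaceAll("\\{Player1\\}", Matcher.quoteReplacement(name));
    }

    // Regex replacement where Pattern.quote does the escaping.
    static String withQuotedRegex(String card, String name) {
        return card.replaceAll(Pattern.quote(PLACEHOLDER), Matcher.quoteReplacement(name));
    }

    public static void main(String[] args) {
        String card = "{Player1} switch drinks with the person next to you.";
        System.out.println(withReplace(card, "Poep"));
        System.out.println(withEscapedRegex(card, "Poep"));
        System.out.println(withQuotedRegex(card, "Poep"));
    }
}
```

All three calls print the same line. When the text to replace is fixed and no pattern matching is needed, String.replace is usually the simplest choice.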

Java split with special characters

I have the code below, which splits a string on <div>\\$\\$PZ\\$\\$</div>, but it does not work with the special characters.
public class HelloWorld {
    public static void main(String[] args) {
        String str = "test<div>\\$\\$PZ\\$\\$</div>test";
        String[] arrOfStr = str.split("<div>\\$\\$PZ\\$\\$</div>", 2);
        for (String a : arrOfStr)
            System.out.println(a);
    }
}
The output is test<div>\$\$PZ\$\$</div>test.
It works when I remove the special characters.
Can you please help?
As you already know, the parameter to split(...) is a regular expression, so some characters have special meaning. If you want the parameter to be treated literally, i.e. not as a regex, call the Pattern.quote(String s) method.
Example
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2);
for (String a : arrOfStr)
    System.out.println(a);
Output
test
test
The quote() method simply surrounds the literal text with the regex \Q...\E quotation pattern (1), e.g. your <div>\$\$PZ\$\$</div> text becomes:
\Q<div>\$\$PZ\$\$</div>\E
For fixed text you could just do that yourself, i.e. the following 3 versions all create the same regex to split on:
str.split(Pattern.quote("<div>\\$\\$PZ\\$\\$</div>"), 2)
str.split("\\Q<div>\\$\\$PZ\\$\\$</div>\\E", 2)
str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2)
To me, the 3rd one, using \ to escape, is the least readable/desirable version.
If there are a lot of special characters to escape, using \Q...\E is easier than \-escaping each special character separately, but few people use it, so it is not widely known.
The quote() method is especially useful when you need to treat dynamic text literally, e.g. when the text to split on is configurable by the user.
1) quote() will correctly handle literal text containing \E.
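A minimal sketch of the dynamic-delimiter case follows. The class QuoteDemo and the method splitLiteral are hypothetical names. The second call uses a delimiter that itself contains \E, to illustrate the footnote.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuoteDemo {

    // Splits text at most once on the delimiter, treating the delimiter literally.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter), 2);
    }

    public static void main(String[] args) {
        // Delimiter from the question: <div>\$\$PZ\$\$</div>
        String[] a = splitLiteral("test<div>\\$\\$PZ\\$\\$</div>test", "<div>\\$\\$PZ\\$\\$</div>");
        System.out.println(Arrays.toString(a)); // [test, test]

        // A delimiter that is exactly the two characters \E still works.
        String[] b = splitLiteral("left\\Eright", "\\E");
        System.out.println(Arrays.toString(b)); // [left, right]
    }
}
```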
This:
String str = "test<div>\\$\\$PZ\\$\\$</div>test";
String[] arrOfStr = str.split("<div>\\\\\\$\\\\\\$PZ\\\\\\$\\\\\\$</div>", 2);
for (String a : arrOfStr) {
    System.out.println(a);
}
prints:
test
test
EDIT: Why do we need all those backslashes? It's because of how string literals that represent regular expressions must be written. This page describes the reason with examples. The essence is this:
For a backslash \...
...the pattern to match that would be \\... (to escape the escape)
... but the string literal to create that pattern would have to have one backslash to escape each of the two backslashes: \\\\.
Add to that the original need to also escape the $, that gives us our 6 backslashes in the string representation.
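The layers can be checked directly. The sketch below uses a made-up class BackslashLayers; it prints what the six-backslash literal actually contains and confirms that it matches a backslash followed by a dollar sign.

```java
// Hypothetical demo class; the names are illustrative only.
class BackslashLayers {

    // Six backslashes in source code become three backslash characters in the string,
    // i.e. the regex \\\$ : an escaped backslash followed by an escaped dollar sign.
    static final String REGEX = "\\\\\\$";

    // True only if text is exactly a backslash followed by a dollar sign.
    static boolean isBackslashDollar(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                    // \\\$
        System.out.println(REGEX.length());           // 4
        System.out.println(isBackslashDollar("\\$")); // true
        System.out.println(isBackslashDollar("$"));   // false
    }
}
```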

How to split a string at only . and not if it is preceded by double slash?

I'm using the regex \/\/[.] to match //. in a String.
This is//. a .example .String
If we split the above String at each dot, the output should be
This is//. a
example
String
What is the regular expression for the String.split() method?
You want to split a string at a dot that is not immediately preceded by //.
Use
.split("(?<!//)\\.")
See the regex demo
The (?<!//) is a negative lookbehind that fails the match if there is a // text immediately to the left of the current location.
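As a minimal sketch of the lookbehind in use (the class LookbehindSplit and the method splitOnDots are hypothetical names): the split itself leaves the spaces around each piece, so the pieces are trimmed afterwards to get the output asked for in the question.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class LookbehindSplit {

    // Splits at every dot that is not immediately preceded by "//",
    // then trims the surrounding whitespace from each piece.
    static String[] splitOnDots(String text) {
        String[] parts = text.split("(?<!//)\\.");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String str = "This is//. a .example .String";
        System.out.println(Arrays.toString(splitOnDots(str)));
        // [This is//. a, example, String]
    }
}
```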
You can do it as follows:
import java.util.Arrays;
public class Main {
    public static void main(String[] args) {
        String str = "This is//. a .example .String";
        String[] strArr = str.split("[^//.]\\.");
        System.out.println(Arrays.toString(strArr));
    }
}
Output:
[This is//. a, example, String]
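Explanation: a negated character class excludes certain characters, e.g. [^abcde] matches any character except a, b, c, d, e. Here [^//.] matches one character that is neither / nor ., so the character just before each matched dot (the space in this input) is consumed by the split as well.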
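One caveat with the character-class pattern is worth seeing on a different input: the class matches a real character, which becomes part of the delimiter and is dropped from the result. The sketch below (hypothetical class ConsumingSplitDemo) contrasts it with a zero-width lookbehind, which removes only the dot.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class ConsumingSplitDemo {

    // The character before the dot is part of the match, so it is removed too.
    static String[] splitConsuming(String text) {
        return text.split("[^//.]\\.");
    }

    // The lookbehind only inspects the preceding text; just the dot is removed.
    static String[] splitLookbehind(String text) {
        return text.split("(?<!//)\\.");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitConsuming("ab.cd")));  // [a, cd]
        System.out.println(Arrays.toString(splitLookbehind("ab.cd"))); // [ab, cd]
    }
}
```

With the question's input the two approaches look alike only because every splitting dot happens to be preceded by a space.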
In the split() method, a literal dot is passed as \\., i.e. String.split("\\.")
How about splitting on a dot if it is not preceded by //:
String.split("\\w*(?<!//)\\.")

How to use regular expression to replace non-digits and math operators together?

How do I only keep chars of [0-9] and [+-*/] in a string in Java? My approach is to use a union to create a single character class comprised of [0-9] and [+-*/] character classes, but I got an empty string.
Here is an example string I use: 10+2*2-5
public void cleanup(String s) {
    String regex = "[^0-9[^+-*//]]";
    String tmp = s.replaceAll(regex, "");
    System.out.println(tmp);
}
You want your character class ([...]) to include the range 0-9 and the additional characters *, /, - and +. All you need is to put them one after the other and escape - (\\-), unless it's the last character. Then use a negation construct (^) at the very beginning of the class:
public class Example {
    public static void main(String[] args) {
        String test = "a3f6+[,b7*\"d/-8u";
        System.out.println(test.replaceAll("[^0-9/*+-]", ""));
    }
}
Outputs
36+7*/-8
What about s/[^0-9]|[^+-*//]//g for the regex?
The issue is the way you use - in the second part. When - appears in the middle of a character class, as in [^+-*/], it is treated as part of a range expression, just like in 0-9. Put the - at the end of the class so that it isn't treated as a range.
The following expression should do what you are after:
[^0-9*/+-]
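Both ways of handling the hyphen can be sketched as follows. The class OperatorFilter and its method names are made up for illustration; the first pattern puts - last, the second escapes it in the middle of the class.

```java
// Hypothetical demo class; the names are illustrative only.
class OperatorFilter {

    // Hyphen placed last in the class, so it is a literal '-'.
    static String keepWithTrailingHyphen(String s) {
        return s.replaceAll("[^0-9*/+-]", "");
    }

    // Hyphen escaped, so it can sit anywhere in the class.
    static String keepWithEscapedHyphen(String s) {
        return s.replaceAll("[^0-9+\\-*/]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepWithTrailingHyphen("10+2*2-5"));  // 10+2*2-5
        System.out.println(keepWithEscapedHyphen("1 + 2 = x3")); // 1+23
    }
}
```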

Java regular expression: Matches back slash character

How to match a backslash (\) in a Java regular expression? I have a script that matches all LaTeX tags in a file, but it doesn't work.
import java.util.regex.Pattern;
public class TestMatchTag {
    public static void main(String[] args) {
        String tag = "\begin";
        if (Pattern.matches("\\\\[a-z]+", tag)) {
            System.out.println("MATCH");
        }
    }
}
Replace String tag = "\begin"; with String tag = "\\begin";. The regex is valid, but your input string needs to escape \ character.
Try this,
Pattern.matches("[\\a-z]+", tag)
You need another backslash to escape the "\" in "\begin"; change it to "\\begin", otherwise the "\b" in your "\begin" will be treated as a single character (a backspace).
This should work...
Pattern.matches("\\\\[a-z]+", tag);
[a-z]+ allows any character between a and z one or more times, and the regex \\ (written "\\\\" in a Java string literal) matches a single "\".
You can validate your expression online here.
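The difference between the two string literals can be checked with a short sketch. The class LatexTagDemo and the method isLatexTag are hypothetical names; the pattern is the one from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LatexTagDemo {

    // True if s is a backslash followed by one or more lowercase ASCII letters.
    static boolean isLatexTag(String s) {
        return Pattern.matches("\\\\[a-z]+", s);
    }

    public static void main(String[] args) {
        // "\\begin" is a backslash followed by "begin" (6 characters).
        System.out.println(isLatexTag("\\begin"));        // true
        // "\begin" starts with the backspace escape \b, followed by "egin" (5 characters).
        System.out.println(isLatexTag("\begin"));         // false
        System.out.println("\begin".charAt(0) == '\b');   // true
    }
}
```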
