I'm starting to learn regex and I don't know if I understand it correctly.
I have a problem with function replaceAll because it does not replace the character in a string that I want to replace.
Here is my code:
public class TestingRegex {
public static void main (String args[]) {
String string = "Hel%l&+++o_Wor_++l%d&#";
char specialCharacters[] = {'%', '%', '&', '_'};
for (char sc : specialCharacters) {
if (string.contains(sc + ""))
string = string.replaceAll(sc + "", "\\" + sc);
}
System.out.println("New String: " + string);
}
}
The output is the same as the original. Nothing changed.
I want the output to be : Hel\%l\&+++o\_Wor\_++l\%d\&\#.
Please help. Thanks in advance.
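The reason it's not working: in the replacement string of replaceAll the backslash is itself an escape character, so you need four backslashes in the Java source to produce a single "real" backslash in the output. Also, replaceAll takes a String pattern, not a char.
string = string.replaceAll(String.valueOf(sc), "\\\\" + sc);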
should work. But this is not the right way to do it. You don't need a for loop at all:
String string = "Hel%l&+++o_Wor_++l%d&#";
string = string.replaceAll("[%&_]", "\\\\$0");
and you're done.
Explanation:
[%&_] matches any one of the three characters you want to replace (add # to the class if it should be escaped too, as in your expected output)
$0 is the result of the match, so
"\\\\$0" means "a backslash plus whatever was matched by the regex".
Caveat: This solution is obviously not checking whether any of those characters had already been escaped previously. So
Hello\%
would become
Hello\\%
which you would not want to happen. Could this be a problem?
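If already-escaped characters are a concern, one possible approach is a negative lookbehind that skips a character when a backslash directly precedes it. The following is only a sketch: the class and method names are made up for illustration, the character class includes # on the assumption that it should be escaped too, and the lookbehind does not handle an escaped backslash followed by a special character.

```java
class EscapeSpecials {
    // Escapes every %, &, _ and # with a leading backslash.
    // "\\\\$0" in Java source is the replacement \\$0: a literal backslash plus the match.
    static String escapeAll(String s) {
        return s.replaceAll("[%&_#]", "\\\\$0");
    }

    // Same, but leaves a character alone when a backslash directly precedes it.
    static String escapeUnescaped(String s) {
        return s.replaceAll("(?<!\\\\)[%&_#]", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeAll("Hel%l&+++o_Wor_++l%d&#"));
        System.out.println(escapeUnescaped("Hello\\% and 50%"));
    }
}
```

With the second method, the text Hello\% stays as it is, while an unescaped % elsewhere in the string still gets its backslash.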
Related
I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");
Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation :
Match any char not in a to z, A to Z, ', _, - and replaceAll() method will replace any matched char with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm
You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
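To make the difference between the ASCII-only class and the Unicode-aware class concrete, here is a small sketch that runs both on the same inputs. The class and method names are hypothetical, chosen only for this illustration.

```java
class KeepLetters {
    // Keeps ASCII letters plus ' _ and -; everything else is removed.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    // Keeps any Unicode letter plus ' _ and -; everything else is removed.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}'_-]+", "");
    }

    public static void main(String[] args) {
        String ascii = "a_zE'R-z4r#m";
        String accented = "na\u00EFve 42"; // "naive" with a diaeresis on the i
        System.out.println(keepAsciiLetters(ascii));
        System.out.println(keepUnicodeLetters(ascii));
        System.out.println(keepAsciiLetters(accented));
        System.out.println(keepUnicodeLetters(accented));
    }
}
```

Both versions agree on plain ASCII input; they differ only when the text contains letters outside a-z and A-Z, which the first version drops.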
Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these". Note that the \w used in the example below also keeps digits, since it matches [a-zA-Z0-9_].
public class Replacer {
public static void main(String[] args) {
System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^\\w'_-]", ""));
}
}
You can try this:
String str = "Se#rbi323a'and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
And it is the result:
str = Serbia'and_Europe-the-[America]
I'm trying to replace part of a String based on a certain phrase being present within it. Consider the string "Hello my Dg6Us9k. I am alive.".
I want to search for the phrase "my" and remove the 8 characters to its right, which removes the hash code. This gives the string "Hello. I am alive." How can I do this in Java?
You could achieve this with the String.replaceAll function. Assign the result back, since the method returns a new string:
string = string.replaceAll("\\bmy.{8}", "");
The \\b is optional. It is a word boundary, which matches between a word character and a non-word character. .{8} matches exactly the following 8 characters.
To also remove the space before my:
System.out.println("Hello my Dg6Us9k. I am alive.".replaceAll("\\smy.{8}", ""));
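The same idea can be wrapped in a small helper so that the phrase and the number of characters are parameters. This is a sketch under assumptions: the class and method names are invented here, Pattern.quote is used so that a phrase containing regex metacharacters is matched literally, and no word boundary is enforced, so "my" inside a longer word would match as well.

```java
import java.util.regex.Pattern;

class RemoveAfterPhrase {
    // Removes the first occurrence of the phrase, an optional whitespace character
    // in front of it, and the given number of characters that follow it.
    static String removePhraseAndNext(String s, String phrase, int count) {
        String regex = "\\s?" + Pattern.quote(phrase) + ".{" + count + "}";
        return s.replaceFirst(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(removePhraseAndNext("Hello my Dg6Us9k. I am alive.", "my", 8));
    }
}
```

If the phrase is missing, or fewer than count characters follow it, the regex does not match and the input is returned unchanged.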
This should do it:
String s = "Hello my Dg6Us9k. I am alive";
s = s.replace(s.substring(s.indexOf("my") - 1, s.indexOf("my") + 10), "");
This replaces the 11-character substring that starts at the space before "my" with nothing. The result is assigned back to s because strings are immutable.
Use regex like this :
public static void main(String[] args) {
String s = "Hello my Dg6Us9k. I am alive";
String newString=s.replaceFirst("\\smy\\s\\w{7}", "");
System.out.println(newString);
}
Output:
Hello. I am alive
Java strings are immutable, so you cannot change the string. You have to create a new string. So, find the index i of "my". Then concatenate the substring before it (0...i) and the substring after the removed part (i+2+8...).
int i = s.indexOf("my");
if (i == -1) { return s; } // no "my" in there, nothing to remove
String ret = s.substring(0, i); // keeps the space before "my"
ret = ret.concat(s.substring(i + 2 + 8)); // skip "my" (2 chars) and the 8 chars after it
return ret;
If you want to be flexible about the hash code length, use the following regexp:
String foo="Hello my Dg6Us9k. I am alive.";
String bar = foo.replaceFirst("\\smy.*?\\.", ".");
System.out.println(bar);
I have string, and I want to replace one of its character with backslash \
I tried the following, but no luck.
engData.replace("'t", "\\'t")
and
engData = engData.replace("'t", String.copyValueOf(new char[]{'\\', 't'}));
INPUT : "can't"
EXPECTED OUTPUT : "can\'t"
Any idea how to do this?
Try this..
String s = "can't";
s = s.replaceAll("'","\\\\'");
System.out.println(s);
Output:
can\'t
This will replace every occurrence of ' with \' in your string.
Try it like this:
engData = engData.replace("'", "\\'");
INPUT : can't
EXPECTED OUTPUT : can\'t
String is immutable in Java. You need to assign back the modified string to itself.
engData = engData.replace("'t", "\\'t"); // assign the modified string back.
This is possible with regex:
engData = engData.replaceAll("('t)","\\\\$1");
The ( and ) specify a group. The 't matches every occurrence of the substring 't. Finally, the second argument replaces each match with a backslash character, written \\\\ (four because the backslash has to be escaped once for the Java string literal and once more for the replacement syntax), followed by the first group, $1. Thus you are replacing every substring 't with \'t.
The same thing is possible without regex, using what you tried:
engData = engData.replace("'t","\\'t"); //note the assignment; Strings are immutable
See String.replace(CharSequence, CharSequence)
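The difference in backslash counts between replace and replaceAll can be seen side by side in the sketch below. The class and method names are hypothetical; Matcher.quoteReplacement is a standard java.util.regex method that escapes backslashes and dollar signs in a replacement string for you.

```java
import java.util.regex.Matcher;

class BackslashReplacement {
    // Literal replacement: "\\'" in Java source is the two characters \ and '.
    static String withReplace(String s) {
        return s.replace("'", "\\'");
    }

    // Regex replacement: the backslash must also be escaped for the replacement
    // syntax, hence four backslashes in Java source.
    static String withReplaceAll(String s) {
        return s.replaceAll("'", "\\\\'");
    }

    // Regex replacement with the escaping delegated to quoteReplacement.
    static String withQuoteReplacement(String s) {
        return s.replaceAll("'", Matcher.quoteReplacement("\\'"));
    }

    public static void main(String[] args) {
        System.out.println(withReplace("can't"));
        System.out.println(withReplaceAll("can't"));
        System.out.println(withQuoteReplacement("can't"));
    }
}
```

All three calls produce the same string; the plain replace version is usually the easiest to read when no regex is needed.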
For String instances you can use str.replaceAll(), which returns a new String with the requested changes:
String str = "./";
String s_modified = str.replaceAll("\\./", "");
The following works for me:
class Foobar {
public static void main(String[] args) {
System.err.println("asd't".replaceAll("'t", "\\\\'t")); // four backslashes yield one in the output
}
}
public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
public static final String specialChars2 = "`~!##$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
Whatever str1 is I want all the characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).
My problem is that if I use specialChars1, it does not remove some characters like ;, ', ", and if I use specialChars2 it gives me an error:
java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
How can this be achieved? I have searched but could not find a perfect solution.
This worked for me:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
For this input string:
/-+!##$%^&())";:[]{}\ |wetyk 678dfgh
It yielded this result:
+wetyk+678dfgh
replaceAll expects a regex:
public static final String specialChars2 = "[`~!##$%^&*()_+\\[\\]\\\\;',./{}|:\"<>?]";
The problem with your first regex, is that "\W\S" means find a sequence of two characters, the first of which is not a letter or a number followed by a character which is not whitespace.
What you mean is "[^\w\s]", which means: find a single character which is neither a word character (letter, digit, or underscore) nor whitespace. (We can't use "[\W\S]", as this means find a character which is not a word character OR is not whitespace, which is essentially every character.)
The second regex is a problem because you are trying to use reserved characters without escaping them. You can enclose them in [] where most characters (not all) do not have special meanings, but the whole thing would look very messy and you have to check that you haven't missed out any punctuation.
Example:
String sequence = "qwe 123 :#~ ";
String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");
String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");
System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');
This outputs:
without special chars: 'qwe 123  '
spaces as pluses: 'qwe+123++'
If you want to group multiple spaces into one + then use "\s+" as your regex instead (remember to escape the backslash in the Java string literal).
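Putting both steps together, a minimal sketch could look like the following. The class and method names are made up for this example. It uses \p{Alnum} rather than \w on the assumption that underscores should be removed as well, since \w matches the underscore.

```java
class PlusSeparated {
    // Step 1: drop everything that is not an ASCII letter, a digit, or whitespace.
    // Step 2: collapse each run of whitespace into a single plus sign.
    static String clean(String s) {
        return s.replaceAll("[^\\p{Alnum}\\s]", "").replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        System.out.println(clean("qwe 123 :#~ "));
        System.out.println(clean("a_b c"));
    }
}
```

Leading or trailing whitespace still turns into a + at that end of the result; calling trim() between the two steps would avoid that if it is not wanted.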
I had a similar problem to solve and I used following method:
text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
Code with a timing benchmark:
public static String cleanPunctuations(String text) {
return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}
public static void test(String in){
long t1 = System.currentTimeMillis();
String out = cleanPunctuations(in);
long t2 = System.currentTimeMillis();
System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");
}
public static void main(String[] args) {
String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
"[`~!##$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
test(s1);
String s2 = "\"Sample Text=\" with - minimal \t punctuation's";
test(s2);
}
Sample Output
In=My text with 212354 digits spaces and
newline tab [`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text=" with - minimal punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
you can use a regex like this:
[<#![CDATA[¢<(+|!$*);¬/¦,%_>?:#="~{#}\]]]#>]`
remove "#" at first and at end from expression
regards
@npinti
Using "\w" is almost the same as "\dA-Za-z", except that \w also matches the underscore.
This worked for me:
String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).
What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.
Reassign the variable to a substring:
s = s.substring(0, s.length() - 1)
Also an alternative way of solving your problem: you might also want to consider using a StringTokenizer to read the file and set the delimiters to be the characters you don't want to be part of words.
Use:
String str = "whatever";
str = str.replaceAll("[,.]", "");
replaceAll takes a regular expression. This:
[,.]
...looks for each comma and/or period.
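Since the goal in the question is an array of words, the character class can be combined with a split on whitespace. The sketch below is one way to do it; the class and method names are hypothetical, and the set of characters to strip (comma, period, round brackets) is an assumption taken from the question.

```java
import java.util.Arrays;

class WordArray {
    // Removes commas, periods and round brackets, then splits on runs of whitespace.
    static String[] words(String text) {
        String cleaned = text.replaceAll("[,.()]", "").trim();
        // split on an empty string would return one empty element, so handle it here
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Hello, world (this is fine).")));
    }
}
```

Note that this also removes periods inside tokens such as "3.14", which may or may not be what you want.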
To remove the last character do as Mark Byers said
s = s.substring(0, s.length() - 1);
Additionally, another way to remove the characters you don't want would be to use the .replace(oldCharacter, newCharacter) method.
as in:
s = s.replace(",","");
and
s = s.replace(".","");
You can't modify a String in Java. They are immutable. All you can do is create a new string that is substring of the old string, minus the last character.
In some cases a StringBuffer might help you instead.
The best method is what Mark Byers explains:
s = s.substring(0, s.length() - 1)
For example, if we want to replace a backslash \ with a space " " using replaceAll, the following does not work:
String.replaceAll("\\", "");
or
String.replaceAll("\\$", ""); //if it is a path
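For the backslash case, a lone backslash is not a valid regex, so the pattern itself needs four backslashes in Java source. The sketch below shows three ways that should work; the class and method names are invented for this illustration.

```java
class BackslashToSpace {
    // Regex version: "\\\\" in Java source is the regex \\, which matches one backslash.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\\\\", " ");
    }

    // Literal version: replace(CharSequence, CharSequence) does not use regex.
    static String viaReplace(String s) {
        return s.replace("\\", " ");
    }

    // Character version: replace(char, char).
    static String viaChar(String s) {
        return s.replace('\\', ' ');
    }

    public static void main(String[] args) {
        String path = "C:\\dir\\file"; // the text C:\dir\file
        System.out.println(viaReplaceAll(path));
        System.out.println(viaReplace(path));
        System.out.println(viaChar(path));
    }
}
```

When no pattern matching is needed, the non-regex replace variants avoid the double escaping altogether.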
Note that the word boundaries also depend on the Locale. I think the best way to do it is to use the standard java.text.BreakIterator. Here is an example from the java.sun.com tutorial.
import java.text.BreakIterator;
import java.util.Locale;
public static void main(String[] args) {
String text = "\n" +
"\n" +
"For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).\n" +
"\n" +
"What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.\n" +
"\n" +
"Every help appreciated. Thanx";
BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.getDefault());
extractWords(text, wordIterator);
}
static void extractWords(String target, BreakIterator wordIterator) {
wordIterator.setText(target);
int start = wordIterator.first();
int end = wordIterator.next();
while (end != BreakIterator.DONE) {
String word = target.substring(start, end);
if (Character.isLetterOrDigit(word.charAt(0))) {
System.out.println(word);
}
start = end;
end = wordIterator.next();
}
}
Source: http://java.sun.com/docs/books/tutorial/i18n/text/word.html
You can use the replaceAll() method, assigning the result back each time:
s = s.replaceAll(",", "");
s = s.replaceAll("\\.", "");
s = s.replaceAll("\\(", "");
etc.