How to strip off unwanted characters in Android? - java

I have this code
public String StripText(String name){
String stripped = name.replaceAll("/:!##$%^&*()<>+?\"{}[]=`~;", "");
return stripped;
}
which doesn't work. I want it to return a string, erasing occurrences of characters like "/" ":" "!" "#" and so on.
e.g. If I give it a string "puppy:)love", I want it to return a string containing only "puppylove".

You just need to put all the characters inside a character class. With a "character class", also called a "character set", you can tell the regex engine to match any one out of several characters. Simply place the characters you want to match between square brackets.
String stripped = name.replaceAll("[/:!##$%^&*()<>+?\"{}\\[\\]=`~;]", "");
You also need to escape the [ and ] characters inside the character class; otherwise the first ] would be treated as the end of the class.
Example:
String name = "[{puppy:)love}]";
String stripped = name.replaceAll("[/:!##$%^&*()<>+?\"{}\\[\\]=`~;]", "");
System.out.println(stripped);
Output:
puppylove

You need to escape all the special characters, or put them inside [] with the brackets themselves escaped, like this: [/:!##$%^&*()<>+?\"{}\\[\\]=~;]+ in your code.
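As a minimal sketch of the character-class approach described above, the pattern can be compiled once and reused. The class name CharStripper and the constant name UNWANTED are hypothetical; the character set is the one from the question, with the square brackets escaped.

```java
import java.util.regex.Pattern;

class CharStripper {
    // Character class listing every character to remove; [ and ] are escaped.
    private static final Pattern UNWANTED =
            Pattern.compile("[/:!#$%^&*()<>+?\"{}\\[\\]=`~;]");

    // Returns a copy of the input with all listed characters removed.
    static String stripText(String name) {
        return UNWANTED.matcher(name).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripText("puppy:)love"));      // puppylove
        System.out.println(stripText("[{puppy:)love}]"));  // puppylove
    }
}
```

Compiling the pattern once avoids recompiling the regex on every call, which String.replaceAll would otherwise do.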

Related

Java escape characters in strings - string contains \r (need to keep "r")

I have a string "EAD\rgonzalez" which is passed to me.
I need to pull out "rgonzalez" from it.
I am running into problems with the "\" character.
I cannot find the index of it, I cannot replace it, etc.
Any help on pulling the data after the "\" would be appreciated.
The string that i receive is in the format of domain\username; the data can vary.
Another example would be US\ngross where \n would be interpreted as a newline character.
To clarify, I am not adding a '\', i am trying to split a string on a '\'
This string contains '\r' which in itself is a character, a special one.
I need a way to make \r contained within my string two separate characters, a '\' and an 'r'.
You haven't provided any code, but I'm assuming what you're doing is something like this:
String user = request.getParameter("user"); // user = "EAD\rgonzalez"
If you were to declare a static string in your application, you would have to escape the backslash because it is a special character for Java strings:
String user = "EAD\\rgonzalez";
To split that string on the backslash you must escape it twice in the regex that you pass to the split method: once because backslash is a special character in Java string literals, and again because backslash is a special character in regular expressions. So instead of one backslash you have four: the one is escaped, giving two, and then both of those are escaped again.
String[] parts = user.split("\\\\");
Now you have split the string:
System.out.println(parts[0]); // "EAD"
System.out.println(parts[1]); // "rgonzalez"
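If the double escaping feels error-prone, there are two ways to avoid hand-writing the regex. This is only a sketch, assuming the value really contains one literal backslash between domain and user name; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class DomainUser {
    // Returns the part after the first backslash, or the whole input if none.
    static String userName(String value) {
        int idx = value.indexOf('\\');
        return idx < 0 ? value : value.substring(idx + 1);
    }

    // Pattern.quote builds a regex that matches the backslash literally.
    static String[] splitOnBackslash(String value) {
        return value.split(Pattern.quote("\\"));
    }

    public static void main(String[] args) {
        String user = "EAD\\rgonzalez"; // the characters EAD\rgonzalez
        System.out.println(userName(user));                              // rgonzalez
        System.out.println(String.join(" | ", splitOnBackslash(user))); // EAD | rgonzalez
    }
}
```

The indexOf variant involves no regex at all, so only the Java string-literal escape is needed.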
The string that i receive is in the format of domain\username... the data can vary
The data shouldn't vary if that is the input your program expects.
where \n would be interpreted as a newline character
I'm not sure how you'd get newlines from a single-line input form. If you do, then your input is invalid because it does not follow the format you've specified and are expecting. If you did interpret newlines and other whitespace characters, you would treat the whole thing as either the domain or the username, potentially breaking your program logic. Since you have stated the requirement of domain\username, I don't think you are required to handle any other form of input.
I am collecting this string from the header data from the request object in a webapp.
In that case, the raw value should not contain an escape character; written as a Java string literal it is "domain\\username". When you print the value, the escape characters aren't shown.
I cannot find the index of it,
With the correct representation, indexOf("\\") will work...
pulling the data after the "\"
Since you would have the value as domain\\username, you need to escape both of the backslashes within the method of split(String pattern) since that is a regular expression.
For example,
// needs: import java.util.Arrays;
public static void main (String[] args) throws java.lang.Exception
{
    String in = "EAD\\rgonzalez";
    System.out.println(in.indexOf("\\")); // find the index of '\'
    String[] parts = in.split("\\\\"); // split on the backslash
    System.out.println(Arrays.toString(parts));
}
Again, the string "EAD\rgonzalez" is not in the form of domain\username, as demonstrated here
System.out.print("EAD\rgonzalez".matches("[A-Z]+\\\\[a-z]+")); // false
The magic you need is in org.apache.commons.lang.StringEscapeUtils
Here is a demo:
package ignoreescapeseq2;
import org.apache.commons.lang.StringEscapeUtils;
/*
* @author Charles Knell
*/
public class IgnoreEscapeSeq2 {
public static void main(String[] args) {
String string = "EAD\rgonzalez"; // REQUIRED INPUT STRING
String eString = StringEscapeUtils.escapeJava(string);
String [] sArray = eString.split("\\\\");
System.out.println("domain: " + sArray[0]);
System.out.println("username: " + sArray[1]);
}
}
Here is the output:
domain: EAD
username: rgonzalez
Although this MAY answer the question, there does still seem to be a problem if you must define the string in Java. As you said, "EAD\xgonzalez" isn't a valid Java string because \x isn't a valid escape character. The solution above only works if the input string never has to be explicitly defined, as in the demo.

java regex replaceAll with negated groups

I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");
Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation :
This matches any character that is not in a to z, A to Z, ', _ or -, and replaceAll() replaces every matched character with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm
You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these". Note that \w in the example below also matches digits and the underscore, so digits are kept too.
public class Replacer {
public static void main(String[] args) {
System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^\\w'_-]", ""));
}
}
You can try this:
String str = "Se#rbi323a`and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
This is the result:
str = Serbia'and_Europe-the-[America]
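The practical difference between the ASCII range and \p{L} in the answers above shows up with non-ASCII letters. A small sketch for comparison; the class and method names are hypothetical.

```java
class KeepLetters {
    // Keeps ASCII letters plus ' _ and - ; everything else is removed.
    static String keepAscii(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    // Keeps any Unicode letter plus ' _ and - .
    static String keepUnicode(String s) {
        return s.replaceAll("[^\\p{L}'_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepAscii("a_zE'R-z4r#m"));   // a_zE'R-zrm
        System.out.println(keepAscii("caf\u00e9-1"));    // caf-
        System.out.println(keepUnicode("caf\u00e9-1"));  // café-
    }
}
```

The hyphen is placed last inside the brackets so that it is read as a literal character rather than as a range operator.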

Java String.replace/replaceAll not working

So, I'm trying to parse a String input in Java that contains (opening) square brackets. I have str.replace("\\[", ""), but this does absolutely nothing. I've tried replaceAll also, with more than one different regex, but the output is always unchanged. Part of me wonders if this is possibly caused by the fact that all my back-slash characters appear as yen symbols (ever since I added Japanese to my languages), but it's been that way for over a year and hasn't caused me any issues like this before.
Any idea what I might be doing wrong here?
Strings are immutable in Java. Make sure you re-assign the return value to the same String variable:
str = str.replaceAll("\\[", "");
For the normal replace method, you don't need to escape the bracket:
str = str.replace("[", "");
public String replaceAll(String regex, String replacement)
As the signature above shows, replaceAll expects a regular expression as its first argument, so you need to escape characters like "(" and ")" (with "\") if they occur in the text you want to remove from the string. For example:
String oldString = "This is (stringTobeReplaced) with brackets.";
String newString = oldString.replaceAll("\\(stringTobeReplaced\\)", "");
System.out.println(newString); // will output "This is  with brackets." (both surrounding spaces remain)
Another way of doing this is to use Pattern.quote("str") :
String newString = oldString.replaceAll(Pattern.quote("(stringTobeReplaced)"), "");
This will consider the string as literal to be replaced.
As always, the problem is not that "xxx doesn't work", it is that you don't know how to use it.
First things first:
a String is immutable; if you read the javadoc of .replace() and .replaceAll(), you will see that both specify that a new String instance is returned;
replace() accepts a string literal as its first argument, not a regex literal.
Which means that you probably meant to do:
str = str.replace("[", "");
If you only ever do:
str.replace("[", "");
then the new instance will be created but you ignore it...
In addition, and this is a common trap with String (the other being that .matches() is misnamed), in spite of their respective names, .replace() does replace all occurrences of its first argument with its second argument; the only difference is that .replaceAll() accepts a regex as a first argument, and a "regex aware" expression as its second argument; for more details, see the javadoc of Matcher's .replaceAll().
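To make the points above concrete, here is a short sketch contrasting the discarded result with the reassigned one, and showing how both arguments of replaceAll can be quoted so they are taken literally. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Literal replacement: no regex involved, all occurrences are replaced.
    static String literal(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // Regex replacement with both arguments quoted so they are taken literally.
    static String quotedRegex(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target),
                            Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String str = "[a][b]";
        str.replace("[", "");        // result discarded: str is unchanged
        System.out.println(str);     // [a][b]
        str = str.replace("[", "");  // reassign to keep the new String
        System.out.println(str);     // a]b]
        System.out.println(quotedRegex("cost: (x)", "(x)", "$5")); // cost: $5
    }
}
```

Without Matcher.quoteReplacement, the "$5" in the last call would be read as a reference to capture group 5 and fail at run time.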
For it to work it has to be inside a method.
for example:
public class AnyClass {
String str = "gtrg4\r\n" + "grtgy\r\n" + "grtht\r\n" + "htrjt\r\n" + "jtyjr\r\n" + "kytht";
public String getStringModified() {
// Strings are immutable: return the new String produced by replaceAll.
return str.replaceAll("\r\n", "");
}
}

Eliminating Unicode Characters and Escape Characters from String

I want to remove all Unicode Characters and Escape Characters like (\n, \t) etc. In short I want just alphanumeric string.
For example :
\u2029My Actual String\u2029
\nMy Actual String\n
I want to fetch just 'My Actual String'. Is there any way to do so, either by using a built in string method or a Regular Expression ?
Try
String stg = "\u2029My Actual String\u2029 \nMy Actual String";
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
String out = "";
while(mat.find()){
out+=mat.group()+" ";
}
System.out.println(out);
The regex matches all things except unicode and escape characters.
Output:
My Actual String My Actual String
Try this:
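anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");
to remove escaped characters (the four characters after \u are hex digits, so matching only decimal digits would miss sequences such as \u00e9). If you also want to remove all other special characters use this one:
anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");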
(I guess you want to keep the whitespaces, if not remove \\s from the one above)
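The patterns above target escape sequences that appear as literal backslash text. If the string instead holds the actual characters (a real newline or a real U+2029), a negated class is enough. A minimal sketch under that assumption; the class name AlnumOnly and the choice to keep plain spaces are mine, not the asker's.

```java
class AlnumOnly {
    // Removes every run of characters that are neither letters, digits nor
    // plain spaces, then trims leading and trailing spaces.
    static String clean(String s) {
        return s.replaceAll("[^\\p{L}\\p{N} ]+", "").trim();
    }

    public static void main(String[] args) {
        String paragraphSeparator = String.valueOf((char) 0x2029);
        System.out.println(clean(paragraphSeparator + "My Actual String" + paragraphSeparator));
        System.out.println(clean("\nMy Actual String\n"));
        // both lines print: My Actual String
    }
}
```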

How to check if a string contains a substring containing spaces?

Say I have a string like this in java:
"this is {my string: } ok"
Note, there can be any number of white spaces in between the various characters. How do I check the above string to see if it contains just the substring:
"{my string: }"
Many thanks!
If you are looking to see if a String contains another specific sequence of characters then you could do something like this :
String stringToTest = "blah blah blah";
if(stringToTest.contains("blah")){
return true;
}
You could also use matches. For a decent explanation on matching Strings I would advise you check out the Java Oracle tutorials for Regular Expressions at :
http://docs.oracle.com/javase/tutorial/essential/regex/index.html
Cheers,
Jamie
If you have any number of white space between each character of your matching string, I think you are better off removing all white spaces from the string you are trying to match before the search. I.e. :
String searchedString = "this is {my string: } ok";
String stringToMatch = "{my string: }";
boolean foundMatch = searchedString.replaceAll(" ", "").contains(stringToMatch.replaceAll(" ",""));
Put it all into a string variable, say s, then do s.contains("{my string: }"); this will return true if {my string: } is in s.
For this purpose you need to use String#contains(CharSequence).
Note, there can be any number of white spaces in between the various characters.
The String#trim() method returns a copy of the string with leading and trailing whitespace omitted. It does not touch whitespace inside the string.
For example:
String myStr = "this is {my string: } ok";
if (myStr.trim().contains("{my string: }")) {
//Do something.
}
The easiest thing to do is to strip all the spaces from both strings.
return stringToSearch.replaceAll("\\s", "").contains(
        stringToFind.replaceAll("\\s", ""));
Look for the regex
\{\s*my\s+string:\s*\}
This matches any sequence that contains
A left brace
Zero or more spaces
'my'
One or more spaces
'string:'
Zero or more spaces
A right brace
Where 'space' here means any whitespace (tab, space, newline, cr)
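In Java, the regex described above needs its backslashes doubled inside the string literal, and find() rather than matches() is used to look for it anywhere in the input. A short sketch; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BraceMatcher {
    // \{ \s* my \s+ string: \s* \} written as a Java string literal.
    private static final Pattern MY_STRING =
            Pattern.compile("\\{\\s*my\\s+string:\\s*\\}");

    // True if the input contains the brace expression with any whitespace layout.
    static boolean containsMyString(String s) {
        return MY_STRING.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsMyString("this is {my string: } ok"));     // true
        System.out.println(containsMyString("this is {  my \t string:} ok")); // true
        System.out.println(containsMyString("this is {mystring: } ok"));      // false
    }
}
```

Unlike stripping all spaces from both strings, this still requires at least one whitespace character between "my" and "string:".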
