Not able to remove multiple whitespace(s) in a string in Java

Problem:
Can't remove multiple white spaces in a string while working in eclipse editor
Context:
String myString1 ="aye bye tye ";
String myString2 =myString1.replaceAll("\\s+","");
System.out.println("replaced string ="+myString2);
In the output the white spaces are not removed, and the result is the same as the original string:
replaced string =aye bye tye
But if there is only one white space between the words, as in
String myString1 ="aye bye tye";
the result comes out correctly:
replaced string =ayebyetye
I wonder where I am going wrong?

I can only guess that the spaces are not really the space character (U+0020) but some other Unicode space character, such as U+00A0 NO-BREAK SPACE. By default, \s matches only ASCII whitespace ([ \t\n\x0B\f\r]), so such characters are not removed.
If you want to remove all Unicode whitespace, enable the UNICODE_CHARACTER_CLASS flag with the inline construct (?U):
String myString2 = myString1.replaceAll("(?U)\\s+", "");
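To see the difference between the two patterns, here is a minimal sketch. It assumes the gaps in the input are U+00A0 characters, which the question does not confirm; the class and method names are illustrative only.

```java
class UnicodeWhitespaceDemo {
    // Default \s: matches only ASCII whitespace, so U+00A0 is left alone.
    static String stripAscii(String s) {
        return s.replaceAll("\\s+", "");
    }

    // (?U) enables UNICODE_CHARACTER_CLASS, so \s follows the Unicode White_Space property.
    static String stripUnicode(String s) {
        return s.replaceAll("(?U)\\s+", "");
    }

    public static void main(String[] args) {
        String input = "aye\u00A0\u00A0bye\u00A0tye"; // assumed input with no-break spaces
        System.out.println(stripAscii(input).equals(input)); // true: nothing was removed
        System.out.println(stripUnicode(input));             // ayebyetye
    }
}
```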

Use a space in the replacement so that each run of one or more whitespace characters is replaced by a single space character.
String myString2 = myString1.replaceAll("\\s+", " ");
or
String myString2 = myString1.replaceAll("(\\s)+", "$1");

Why is
String myString2 = myString1.replace(" ", "");
not an option? You don't need a regex at all: String.replace treats its argument as literal text, whereas replaceAll compiles it as a regex.

Related

How to strip all characters from a string except for numbers, space and +

How to strip all characters from a string except for numbers, space and +?
What is the regex for that?
I want to use
String regex = ???
String string = previousString.replaceAll(regex, "");
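but I don't know how to write a regex that will keep only 0-9, space and +
regex = "[^\\d\\s\\+]"
^: not (negates the character class)
\d: any digit
\s: any whitespace character
\+: a literal plus sign (inside a character class the backslash is optional)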
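A small sketch of that pattern in use. The sample input and the class and method names are made up for illustration.

```java
class KeepDigitsSpacePlusDemo {
    // Removes every character that is not a digit, whitespace, or '+'.
    static String keep(String s) {
        return s.replaceAll("[^\\d\\s+]", "");
    }

    public static void main(String[] args) {
        // Hypothetical input; brackets, hyphen and letters are stripped.
        System.out.println(keep("+1 (555) 123-4567 ext"));
    }
}
```

Note that \s keeps tabs and newlines as well as spaces; use a literal space in the class instead if only U+0020 should survive.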

Remove apostrophe and white space from string

So I'm playing around with string manipulation. I've already replaced whitespace characters with hyphens. Now I want to both replace whitespace characters and remove apostrophes from the string. How can I do this?
This is what I've tried so far:
String str = "Please Don't Ask Me";
String newStr = str.replaceAll("\\s+","-");
System.out.println("New string is " + newStr);
Output is:
Please-Don't-Ask-Me
But I want the output to be:
Please-Dont-Ask-Me
But I can't get removing the apostrophe to work. Any ideas? Help is much appreciated. Thanks.
Try this:
String newStr = str.replaceAll("\\s+","-").replaceAll("'", "");
The first replaceAll returns the String with all spaces replaced with -, then we perform on this another replaceAll to replace all ' with nothing (Meaning, we are removing them).
It's very easy: use replaceAll again on the resulting String:
String newStr = str.replaceAll("\\s+","-").replaceAll("'","");
Try this..
String s = "This is a string that contain's a few incorrect apostrophe's. don't fail me now.'O stranger o'f t'h'e f'u't'u'r'e!";
System.out.println(s);
s = s.replace("\'", "");
System.out.println("\n"+s);
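The answers above can be combined into one helper. This is only a sketch: the method name is hypothetical, and the trim() call is an extra assumption so that leading or trailing whitespace does not turn into stray hyphens.

```java
class HyphenateDemo {
    // Drops apostrophes (literal replace, no regex), then turns each whitespace run into one hyphen.
    static String hyphenate(String s) {
        return s.replace("'", "").trim().replaceAll("\\s+", "-");
    }

    public static void main(String[] args) {
        System.out.println(hyphenate("Please Don't Ask Me")); // Please-Dont-Ask-Me
    }
}
```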

Eliminating Unicode Characters and Escape Characters from String

I want to remove all Unicode Characters and Escape Characters like (\n, \t) etc. In short I want just alphanumeric string.
For example :
\u2029My Actual String\u2029
\nMy Actual String\n
I want to fetch just 'My Actual String'. Is there any way to do so, either by using a built in string method or a Regular Expression ?
Try
String stg = "\u2029My Actual String\u2029 \nMy Actual String";
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
String out = "";
while(mat.find()){
out+=mat.group()+" ";
}
System.out.println(out);
The regex matches runs of word characters and skips Unicode separators and escape characters such as \n and \t.
Output:
My Actual String My Actual String
Try this:
anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");
to remove escape sequences that appear as literal text in the string (a backslash followed by u and four hex digits, or a backslash followed by any single character). If you also want to remove all other special characters, use this one:
anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");
(I guess you want to keep the whitespace; if not, remove \\s from the one above.)
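If the string holds the actual characters (a real U+2029 or newline rather than a backslash sequence typed as text), a plain whitelist may be enough. The following is a sketch under that assumption; the names are illustrative, and it keeps only ASCII letters, digits and spaces.

```java
class AlnumOnlyDemo {
    // Removes everything except ASCII letters, digits and the space character,
    // then trims any spaces left at the ends.
    static String clean(String s) {
        return s.replaceAll("[^A-Za-z0-9 ]", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("\u2029My Actual String\u2029")); // My Actual String
        System.out.println(clean("\nMy Actual String\n"));         // My Actual String
    }
}
```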

How to check if a string contains a substring containing spaces?

Say I have a string like this in java:
"this is {my string: } ok"
Note, there can be any number of white spaces in between the various characters. How do I check the above string to see if it contains just the substring:
"{my string: }"
Many thanks!
If you are looking to see if a String contains another specific sequence of characters then you could do something like this :
String stringToTest = "blah blah blah";
if(stringToTest.contains("blah")){
return true;
}
You could also use matches. For a decent explanation on matching Strings I would advise you check out the Java Oracle tutorials for Regular Expressions at :
http://docs.oracle.com/javase/tutorial/essential/regex/index.html
Cheers,
Jamie
If you have any number of white space between each character of your matching string, I think you are better off removing all white spaces from the string you are trying to match before the search. I.e. :
String searchedString = "this is {my string: } ok";
String stringToMatch = "{my string: }";
boolean foundMatch = searchedString.replaceAll(" ", "").contains(stringToMatch.replaceAll(" ",""));
Put it all into a string variable, say s, then do s.contains("{my string: }"); this will return true if {my string: } is in s.
For this purpose you need to use String#contains(CharSequence).
Note, there can be any number of white spaces in between the various
characters.
For this purpose the String#trim() method returns a copy of the string with leading and trailing whitespace omitted. Note that it does not change whitespace inside the string.
For example:
String myStr = "this is {my string: } ok";
if (myStr.trim().contains("{my string: }")) {
//Do something.
}
The easiest thing to do is to strip all the spaces from both strings.
return stringToSearch.replaceAll("\\s", "").contains(
stringToFind.replaceAll("\\s", ""));
Look for the regex
\{\s*my\s+string:\s*\}
This matches any sequence that contains
A left brace
Zero or more spaces
'my'
One or more spaces
'string:'
Zero or more spaces
A right brace
Where 'space' here means any whitespace (tab, space, newline, cr)
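In Java the regex above has to be written with doubled backslashes. A minimal sketch, with illustrative class and method names, that uses find() so the pattern may occur anywhere in the string:

```java
import java.util.regex.Pattern;

class BraceSubstringDemo {
    // \{ \s* my \s+ string: \s* \}  written as a Java string literal
    private static final Pattern TARGET =
            Pattern.compile("\\{\\s*my\\s+string:\\s*\\}");

    // find() searches for the pattern anywhere in s; matches() would require the whole string to match.
    static boolean containsTarget(String s) {
        return TARGET.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTarget("this is {my string: } ok"));         // true
        System.out.println(containsTarget("this is {  my \t string:\n} ok"));   // true
        System.out.println(containsTarget("this is {mystring: } ok"));          // false
    }
}
```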

How to remove duplicate white spaces in string using Java?

How to remove duplicate white spaces (including tabs, newlines, spaces, etc...) in a string using Java?
Like this:
yourString = yourString.replaceAll("\\s+", " ");
For example
System.out.println("lorem ipsum dolor \n sit.".replaceAll("\\s+", " "));
outputs
lorem ipsum dolor sit.
What does that \s+ mean?
\s+ is a regular expression. \s matches a space, tab, new line, carriage return, form feed or vertical tab, and + says "one or more of those". Thus the above code replaces every run of whitespace with a single space character (a lone tab or newline also becomes a space).
Source: Java: Removing duplicate white spaces in strings
You can use the regex
(\s)\1
and
replace it with $1.
Java code:
str = str.replaceAll("(\\s)\\1","$1");
If the input is "foo\t\tbar " you'll get "foo\tbar " as output. But if the input is "foo\t bar" it will remain unchanged, because it does not contain two consecutive identical whitespace characters.
If you treat all the whitespace characters(space, vertical tab, horizontal tab, carriage return, form feed, new line) as space then you can use the following regex to replace any number of consecutive white space with a single space:
str = str.replaceAll("\\s+"," ");
But if you want to replace two consecutive white space with a single space you should do:
str = str.replaceAll("\\s{2}"," ");
String str = " Text with multiple spaces ";
str = org.apache.commons.lang3.StringUtils.normalizeSpace(str);
// str = "Text with multiple spaces"
Try this (you have to import java.util.regex.*):
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(string);
boolean check = matcher.find();
String str = matcher.replaceAll(" ");
Here, string is the string from which you want to remove duplicate white spaces.
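The fastest (but not the prettiest) way I found is
while (cleantext.indexOf("  ") != -1)
cleantext = StringUtils.replace(cleantext, "  ", " ");
This runs pretty fast on Android, in contrast to a regex. Note that the search string is two spaces and the replacement is one space, so the loop repeats until no double space is left.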
Though it is too late, I have found a better solution (that works for me) that will replace all consecutive same type white spaces with one white space of its type. That is:
Hello!\n\n\nMy World
will be
Hello!\nMy World
Notice there are still leading and trailing white spaces. So my complete solution is:
str = str.trim().replaceAll("(\\s)+", "$1");
Here, trim() removes leading and trailing whitespace. (\\s) captures a single whitespace character (such as ' ', '\n' or '\t') in group #1, and + matches one or more repetitions of that group, so (\\s)+ matches any run of whitespace characters. $1 replaces the whole run with the contents of group #1, which is the last whitespace character matched in the run. A run of identical characters therefore collapses to one character of that type, while a mixed run collapses to its final character. The above solution will change like this:
Hello!\n\n\nMy World
will be
Hello!\nMy World
I have not found my above solution here so I have posted it.
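The difference between collapsing any run and collapsing only runs of the same character can be sketched as follows. The second method uses a backreference, (\\s)\\1+, which is a swapped-in variant and not the pattern from the answer above; all names are illustrative.

```java
class SameTypeCollapseDemo {
    // Pattern from the answer: any whitespace run becomes the last character of that run.
    static String collapseAny(String s) {
        return s.replaceAll("(\\s)+", "$1");
    }

    // Variant: \1 must repeat the captured character, so only runs of one identical character collapse.
    static String collapseSameType(String s) {
        return s.replaceAll("(\\s)\\1+", "$1");
    }

    public static void main(String[] args) {
        String input = "a \n\nb"; // space, newline, newline between a and b
        System.out.println(collapseAny(input).equals("a\nb"));       // true
        System.out.println(collapseSameType(input).equals("a \nb")); // true
    }
}
```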
If you want to get rid of all leading and trailing extraneous whitespace then you want to do something like this:
// \\A = Start of input boundary
// \\z = End of input boundary
// \\s* (rather than \\s+) also handles whitespace on only one side; (?s) lets . match newlines
string = string.replaceAll("(?s)\\A\\s*(.*?)\\s*\\z", "$1");
Then you can remove the duplicates using the other strategies listed here:
string = string.replaceAll("\\s+"," ");
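You can also try using StringTokenizer. Its one-argument constructor splits on space, tab, newline, carriage return and form feed. A simple way is,
String s = "Your Text Here";
StringTokenizer st = new StringTokenizer(s);
StringBuilder sb = new StringBuilder();
while(st.hasMoreTokens())
{
sb.append(st.nextToken());
// put a single space back between tokens
if (st.hasMoreTokens()) sb.append(' ');
}
System.out.println(sb);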
This can be done in three steps:
Convert the string into a character array (toCharArray)
Loop over the character array
Then apply the string replace function (replace("string you want to replace", "replacement string"))
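One possible reading of these steps, sketched without any regex. The answer does not spell out the loop body, so the logic below is an assumption: it copies characters into a StringBuilder and emits a single space for each whitespace run. The names are illustrative.

```java
class CollapseWhitespaceDemo {
    // Walks the character array once; each run of whitespace becomes one space.
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean previousWasWhitespace = false;
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                if (!previousWasWhitespace) {
                    out.append(' '); // first whitespace character of a run
                }
                previousWasWhitespace = true;
            } else {
                out.append(c);
                previousWasWhitespace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("lorem  ipsum\t\n dolor")); // lorem ipsum dolor
    }
}
```

Character.isWhitespace does not treat U+00A0 as whitespace, so no-break spaces pass through unchanged in this sketch.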
