How to remove duplicate white spaces in string using Java? - java

How to remove duplicate white spaces (including tabs, newlines, spaces, etc...) in a string using Java?

Like this:
yourString = yourString.replaceAll("\\s+", " ");
For example
System.out.println("lorem ipsum dolor \n sit.".replaceAll("\\s+", " "));
outputs
lorem ipsum dolor sit.
What does that \s+ mean?
\s+ is a regular expression. \s matches a space, tab, new line, carriage return, form feed or vertical tab, and + says "one or more of those". Thus the above code replaces every run of one or more whitespace characters with a single space character (so a lone tab or newline also becomes a space).
Source: Java: Removing duplicate white spaces in strings
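As a minimal, self-contained sketch of the answer above (the class and method names are made up for illustration), the replacement can be wrapped in a helper and tried on input that mixes spaces, tabs and a newline:

```java
class CollapseWhitespace {
    // Replaces every run of whitespace (spaces, tabs, line breaks) with one space.
    static String collapse(String input) {
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("lorem\t\tipsum dolor \n sit.")); // lorem ipsum dolor sit.
        // A single tab is a run of length one, so it is converted as well.
        System.out.println(collapse("a\tb")); // a b
    }
}
```

Leading and trailing runs are reduced to one space rather than removed, so call trim() as well if that matters.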

You can use the regex
(\s)\1
and
replace it with $1.
Java code:
str = str.replaceAll("(\\s)\\1","$1");
If the input is "foo\t\tbar " you'll get "foo\tbar " as output. But if the input is "foo\t bar" it will remain unchanged, because it does not contain two consecutive identical whitespace characters.
If you treat all the whitespace characters (space, vertical tab, horizontal tab, carriage return, form feed, new line) as a space, then you can use the following regex to replace any number of consecutive whitespace characters with a single space:
str = str.replaceAll("\\s+"," ");
But if you only want to replace each pair of two consecutive whitespace characters with a single space, you should do:
str = str.replaceAll("\\s{2}"," ");
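To make the difference between the three patterns concrete, here is a small sketch that runs each of them on tab-based input. The class and method names are hypothetical and only serve the comparison:

```java
class WhitespacePatterns {
    // Collapses a pair of identical whitespace characters into one of them.
    static String dedupeSameChar(String s) {
        return s.replaceAll("(\\s)\\1", "$1");
    }

    // Collapses any run of whitespace into a single space.
    static String collapseRuns(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Replaces each pair of whitespace characters with a single space.
    static String collapsePairs(String s) {
        return s.replaceAll("\\s{2}", " ");
    }

    public static void main(String[] args) {
        String threeTabs = "a\t\t\tb";
        System.out.println(dedupeSameChar(threeTabs).equals("a\t\tb")); // true
        System.out.println(collapseRuns(threeTabs).equals("a b"));      // true
        // Only the first two tabs form a pair; the third tab is left over.
        System.out.println(collapsePairs(threeTabs).equals("a \tb"));   // true
    }
}
```

Only \\s+ guarantees that no consecutive whitespace is left after a single pass.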

String str = " Text with multiple spaces ";
str = org.apache.commons.lang3.StringUtils.normalizeSpace(str);
// str = "Text with multiple spaces"

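Try this. You have to import java.util.regex.*;
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(string);
// No separate find() call is needed; replaceAll() resets the matcher and scans the whole input.
String str = matcher.replaceAll(" ");
Here string is the input from which you want to remove duplicate whitespace. Keeping the compiled Pattern around avoids recompiling the regex when the same replacement is applied to many strings.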

The fastest (but not prettiest) way I found is:
// "  " below is a string of TWO spaces; it is replaced with one space until no pair is left
while (cleantext.indexOf("  ") != -1)
    cleantext = StringUtils.replace(cleantext, "  ", " ");
For me this ran considerably faster on Android than a regex.
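The same loop can be sketched without Apache Commons by using String.replace from the JDK instead of StringUtils.replace. This is a swap made for the sake of a dependency-free example, and no claim is made here about its relative speed:

```java
class LoopCollapse {
    // Repeatedly replaces two consecutive spaces with one until none remain.
    // Only the space character is handled; tabs and newlines are left alone.
    static String collapseSpaces(String text) {
        String result = text;
        while (result.contains("  ")) {          // two spaces
            result = result.replace("  ", " ");  // two spaces -> one space
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a     b  c")); // a b c
    }
}
```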

Though it is too late, I have found a better solution (that works for me) that will replace all consecutive same type white spaces with one white space of its type. That is:
Hello!\n\n\nMy World
will be
Hello!\nMy World
Notice there are still leading and trailing white spaces. So my complete solution is:
str = str.trim().replaceAll("(\\s)+", "$1");
Here, trim() removes all leading and trailing whitespace. (\\s) captures a single whitespace character (such as ' ', '\n' or '\t') in group #1, and the + sign matches one or more repetitions of the preceding token, so (\\s)+ matches any run of one or more whitespace characters. $1 replaces each matched run with the contents of group #1, which holds exactly one whitespace character. Note that when a run mixes different whitespace characters, group #1 holds the last character of that run. The above solution will change this:
Hello!\n\n\nMy World
will be
Hello!\nMy World
I have not found my above solution here so I have posted it.
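A short sketch of how (\\s)+ behaves, next to a variant that uses a backreference, (\\s)\\1+, so that only repetitions of the same character are collapsed. The backreference variant is not part of the answer above; it is added here as an assumption about what "same type" is meant to cover, and the names are illustrative:

```java
class SameTypeCollapse {
    // Collapses any whitespace run to the LAST character of that run.
    static String collapseToLast(String s) {
        return s.replaceAll("(\\s)+", "$1");
    }

    // Collapses only repetitions of the same whitespace character.
    static String collapseSameChar(String s) {
        return s.replaceAll("(\\s)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseToLast("Hello!\n\n\nMy World").equals("Hello!\nMy World")); // true
        // Mixed run "\t\n\t": only the final tab survives.
        System.out.println(collapseToLast("a\t\n\tb").equals("a\tb"));           // true
        // Each repeated character is reduced separately.
        System.out.println(collapseSameChar("a\t\t\n\nb").equals("a\t\nb"));     // true
    }
}
```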

If you want to get rid of all leading and trailing extraneous whitespace then you want to do something like this:
// \\A = Start of input boundary
// \\z = End of input boundary
string = string.replaceAll("\\A\\s+(.*?)\\s+\\z", "$1");
Then you can remove the duplicates using the other strategies listed here:
string = string.replaceAll("\\s+"," ");

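You can also try StringTokenizer. When no delimiter is given it splits on space, tab, newline, carriage return and form feed. A simple way is:
String s = "Your \t Text \n Here";
StringTokenizer st = new StringTokenizer(s); // import java.util.StringTokenizer
StringBuilder sb = new StringBuilder();
while (st.hasMoreTokens())
{
    sb.append(st.nextToken());
    if (st.hasMoreTokens()) sb.append(' '); // exactly one space between tokens
}
System.out.println(sb); // Your Text Here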

This can be done in three steps:
Convert the string into a character array (toCharArray()).
Loop over the character array.
Then apply the string replace function (replace("string you want to replace", "replacement string")).
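One possible reading of these steps, sketched with a StringBuilder in place of the final replace call (that substitution is an assumption, since the answer does not say what exactly is replaced). The class and method names are made up:

```java
class CharArrayCollapse {
    // Walks the characters once and emits a single space per whitespace run.
    static String collapse(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean previousWasWhitespace = false;
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                if (!previousWasWhitespace) {
                    out.append(' '); // first whitespace of a run
                }
                previousWasWhitespace = true;
            } else {
                out.append(c);
                previousWasWhitespace = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("a \t\n b\t\tc")); // a b c
    }
}
```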

Related

Java String: remove all other whitespaces before and after line breaks

I need to remove leading and trailing whitespace, which is done with .trim(). I also need to replace all line breaks with a single space, which is done with .replaceAll("\\R+", " "), but before that I need to remove all whitespace (except line breaks) before and after each line break.
String toReplace = "\t\t random \t\n\r\t text \t\t";
String result = toReplace.replaceAll(Some magic, "")
.replaceAll("\\R+", " ")
.trim();
Assert.assertEquals("random text", result);
Thank you.
You can match the whitespaces around the line breaks and remove them together with the line break replacement with space:
String result = toReplace.replaceAll("\\h*\\R+\\h*", " ").trim();
The regex is \h*\R+\h*, and the .replaceAll("\\h*\\R+\\h*", " ") replaces the following pattern sequence with a single regular space:
\h* - zero or more horizontal whitespace
\R+ - one or more line break sequences
\h* - zero or more horizontal whitespace
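The pattern can be checked against the input from the question with a small sketch like this (the class and method names are hypothetical):

```java
class LineBreakJoin {
    // Replaces each group of line breaks, together with the horizontal
    // whitespace around it, with one space, then trims both ends.
    static String joinLines(String input) {
        return input.replaceAll("\\h*\\R+\\h*", " ").trim();
    }

    public static void main(String[] args) {
        String toReplace = "\t\t random \t\n\r\t text \t\t";
        System.out.println(joinLines(toReplace)); // random text
    }
}
```

\h and \R are available in java.util.regex since Java 8.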

How to remove spaces from string only if it occurs once between two words but not if it occurs thrice?

I am a beginner working on a diff and regenerate algorithm but for Strings. I store the patch in a file. To regenerate the new string from old I use that file. Although the code works, I face a problem when using space.
I use replaceAll(" ", ""); for removing spaces. This is fine when the string is [char][space][char], but creates a problem when it is like [space][space][space]. Here, I want the space to be retained (only one).
I thought of doing replaceAll("   ", " "); (three spaces replaced by one). But this would leave the spaces in strings of the type [char][space][char]. I am using a Scanner to scan through the string.
Is there a way to achieve this?
Input                  Output
c                   => c
cc                  => cc
c c (one space)     => cc
c  c (two spaces)   => This is not possible, since there will be padding of one space for each character
c   c (three spaces) => c c
We can also split the string wherever there is more than one whitespace character, then join the resulting array into a string using the Stream and Collectors API.
We also remove the single spaces by using replaceAll() in a Stream#map operation:
String test = " this  is a test of  space in string "; // two spaces after "this" and after "of"
//using the pattern \\s{n,} for splitting at multi spaces
String[] arr = test.split("\\s{2,}");
String s = Arrays.stream(arr)
.map(str -> str.replaceAll(" ", ""))
.collect(Collectors.joining(" "));
System.out.println(s);
Output:
this isatestof spaceinstring
You could use lookarounds to do your replacement:
String newText = text
.replaceAll("(?<! ) (?! )", "")
.replaceAll(" +", " ");
The first replaceAll removes any space not surrounded by spaces; the second one replaces the remaining sequences of spaces by a single one.
Ideone example. Sequences of two or more spaces become a single space, and single spaces are removed.
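A runnable sketch of the two-step lookaround replacement, checked against the cases from the question. The class and method names are made up for the example:

```java
class SingleSpaceRemover {
    static String squeeze(String text) {
        return text
                // Step 1: drop a space that has no space before or after it.
                .replaceAll("(?<! ) (?! )", "")
                // Step 2: shrink every remaining run of spaces to one space.
                .replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        System.out.println(squeeze("c c"));     // cc
        System.out.println(squeeze("c   c"));   // c c
        System.out.println(squeeze("a b   c")); // ab c
    }
}
```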
Lookarounds
A lookaround in the context of regular expressions is a collective term for lookbehinds and lookaheads. These are so-called zero-width assertions, that means they match a certain pattern, but do not actually consume characters. There are positive and negative lookarounds.
A short example: the pattern Ira(?!q) matches the substring Ira, but only if it's not followed by a q. So if the input string is Iraq, it won't match, but if the input string is Iran, then the match is Ira.
More info:
https://www.regular-expressions.info/lookaround.html
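The Ira(?!q) example can be tried directly. In this sketch the helper name firstMatch is made up; it returns the matched text, or null if there is no match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    private static final Pattern IRA_NOT_Q = Pattern.compile("Ira(?!q)");

    // Returns the matched text, or null when the pattern does not match.
    static String firstMatch(String input) {
        Matcher m = IRA_NOT_Q.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Iraq")); // null
        // The lookahead checks the 'n' but does not include it in the match.
        System.out.println(firstMatch("Iran")); // Ira
    }
}
```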
If you want to replace any run of whitespace with a single space, you could use:
value.replaceAll("\\s+", " ")
I had to use two replacements:
String e = "a b   c"; // one space after "a", three spaces after "b"
e = e.replaceAll("([A-Za-z])\\s([A-Za-z])", "$1$2");
e = e.replaceAll("   ", " "); // three spaces -> one space
System.out.println(e);
Which prints
ab c
The first one replaces any letter-space-letter combo with just the two letters, and then the second replaces any triple-space with a single space.
The first replacement is using backreferences. $1 refers to the part inside the first set of parenthesis that matches the first letter, and $2 refers to the part inside the second set of parenthesis.
If you have leading/trailing spaces on the input, you can call trim() before doing the replacements.
e = e.trim();

Regex pattern doesn't work when ending without a space

I want to remove strings that contain either http or https. I have the following code segment:
String line="abc http://someurl something https://someurl";
if (line.contains("https") || line.contains("http")) {
System.out.println(line);
String x = line.replaceAll("https?://.*?\\s+", " ");
System.out.println(x);
}
The output is: abc something https://someurl (doesn't remove the ending url)
Desired output is: abc something
I'm guessing its a simple change to the regex...
Edit: Sorry, the previous example didn't contain an actual url after the http.
Your regex is
https?://.*?\\s+
That final token \s+ means one or more whitespace characters, so a URL at the very end of the string, with nothing after it, never matches. Simply switching to \s* does not help here: the lazy .*? would then be free to match nothing, and only the http:// prefix would be removed. Instead, let the URL end at either whitespace or the end of the input:
String x = line.replaceAll("https?://.*?(?:\\s+|$)", " ");
That said, if the URLs you have are valid and don't contain any space characters, it would probably make more sense to match non-space characters with \S and replace with the empty string, rather than look for space characters, match them, and then replace with another space:
String x = line.replaceAll("https?://\\S*", "");
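Removing the URLs with \S* leaves the spaces that surrounded them, so to reach the exact desired output a whitespace clean-up step is still needed. A sketch of the full chain follows; the collapse-and-trim step is an addition that is not part of the answer above, and the names are illustrative:

```java
class UrlStripper {
    static String stripUrls(String line) {
        return line
                .replaceAll("https?://\\S*", "") // remove each URL up to the next whitespace
                .replaceAll("\\s+", " ")         // collapse the gaps left behind
                .trim();                         // drop leading/trailing whitespace
    }

    public static void main(String[] args) {
        String line = "abc http://someurl something https://someurl";
        System.out.println(stripUrls(line)); // abc something
    }
}
```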

Regex add space between all punctuation

I need to add spaces between all punctuation in a string.
// "Hello: World." -> "Hello : World ."
// "It's 9:00?" -> "It ' s 9 : 00 ?"
// "1.B,3.D!" -> "1 . B , 3 . D !"
I think a regex is the way to go, matching all non-punctuation [a-zA-Z\\d]+, adding a space before and/or after, then extracting the remainder matching all punctuation [^a-zA-Z\\d]+.
But I don't know how to (recursively?) call this regex. Looking at the first example, the regex will only match the "Hello". I was thinking of just building a new string by continuously removing and appending the first instance of the matched regex, while the original string is not empty.
private String addSpacesBeforePunctuation(String s) {
StringBuilder builder = new StringBuilder();
final String nonpunctuation = "[a-zA-Z\\d]+";
final String punctuation = "[^a-zA-Z\\d]+";
String found;
while (!s.isEmpty()) {
// regex stuff goes here
found = ???; // found group from respective regex goes here
builder.append(found);
builder.append(" ");
s = s.replaceFirst(found, "");
}
return builder.toString().trim();
}
However this doesn't feel like the right way to go... I think I'm over complicating things...
You can use lookarounds based regex using punctuation property \p{Punct} in Java:
str = str.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
(?<=\\S) Asserts if prev char is not a white-space
(?<=\\p{Punct}) asserts a position if previous char is a punctuation char
(?=\\p{Punct}) asserts a position if next char is a punctuation char
(?=\\S) Asserts if next char is not a white-space
IdeOne Demo
When you see a punctuation mark, you have four possibilities:
Punctuation is surrounded by spaces
Punctuation is preceded by a space
Punctuation is followed by a space
Punctuation is neither preceded nor followed by a space.
Here is code that does the replacement properly:
String ss = s
.replaceAll("(?<=\\S)\\p{Punct}", " $0")
.replaceAll("\\p{Punct}(?=\\S)", "$0 ");
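It uses two expressions: the first inserts a space before punctuation that is not already preceded by one (case 3), and the second inserts a space after punctuation that is not already followed by one (case 2). Since the expressions are applied one after the other, they take care of case 4 as well. Case 1 requires no change.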
Demo.
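Both approaches can be checked against the three examples from the question. This sketch wraps each in a method (the class and method names are hypothetical):

```java
class PunctuationSpacer {
    // Single zero-width pattern: insert a space at every position between two
    // non-whitespace characters where at least one of them is punctuation.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<=\\S)(?:(?<=\\p{Punct})|(?=\\p{Punct}))(?=\\S)", " ");
    }

    // Two passes: add a missing space before, then a missing space after.
    static String withTwoPasses(String s) {
        return s
                .replaceAll("(?<=\\S)\\p{Punct}", " $0")
                .replaceAll("\\p{Punct}(?=\\S)", "$0 ");
    }

    public static void main(String[] args) {
        System.out.println(withLookarounds("Hello: World.")); // Hello : World .
        System.out.println(withTwoPasses("It's 9:00?"));      // It ' s 9 : 00 ?
        System.out.println(withTwoPasses("1.B,3.D!"));        // 1 . B , 3 . D !
    }
}
```

Note that \p{Punct} covers ASCII punctuation only unless the UNICODE_CHARACTER_CLASS flag is used.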

How to replace last letter to another letter in java using regular expression

I have seen "," replaced with "." by using ".$"|",$", but this logic is not working with letters.
I need to replace the last letter of each word with another letter, for all words in the string EXAMPLE_TEST, using Java.
This is my code:
Pattern replace = Pattern.compile("n$");//here got the real problem
matcher2 = replace.matcher(EXAMPLE_TEST);
EXAMPLE_TEST=matcher2.replaceAll("k");
i also tried "//n$" ,"\n$" etc
Please help me to get the solution
input text=>njan ayman
output text=> njak aymak
Instead of the end of string $ anchor, use a word boundary \b
String s = "njan ayman";
s = s.replaceAll("n\\b", "k");
System.out.println(s); //=> "njak aymak"
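To see why the original n$ pattern only changed the final word, the two anchors can be compared side by side in a short sketch (names are made up for illustration):

```java
class LastLetterDemo {
    // $ anchors at the end of the input, so at most the last word is affected.
    static String replaceAtEndOfInput(String s) {
        return s.replaceAll("n$", "k");
    }

    // \b matches at the end of every word, so each trailing 'n' is replaced.
    static String replaceAtEndOfWord(String s) {
        return s.replaceAll("n\\b", "k");
    }

    public static void main(String[] args) {
        System.out.println(replaceAtEndOfInput("njan ayman")); // njan aymak
        System.out.println(replaceAtEndOfWord("njan ayman"));  // njak aymak
    }
}
```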
You can use lookahead and group matching:
String EXAMPLE_TEST = "njan ayman";
String s = EXAMPLE_TEST.replaceAll("(n)(?=\\s|$)", "k");
System.out.println("s = " + s); // prints: s = njak aymak
Explanation:
(n) - the matched word character
(?=\\s|$) - which is followed by a space or at the end of the line (lookahead)
The above is only an example! if you want to switch every comma with a period the middle line should be changed to:
s = s.replaceAll("(,)(?=\\s|$)", "\\.");
Here's how I would set it up:
(?=.\b)\w
Which in Java would need to be escaped as following:
(?=.\\b)\\w
It translates to something like "a word character (\w) at a position where the lookahead (?=) sees any single character (.) followed by the end of a word (\b)", in other words the last character of each word.
String s = "njan ayman aowkdwo wdonwan. wadawd,.. wadwdawd;";
s = s.replaceAll("(?=.\\b)\\w", "");
System.out.println(s); //nja ayma aowkdw wdonwa. wadaw,.. wadwdaw;
This removes the last character of all words, but leaves following non-alphanumeric characters. You can specify only specific characters to remove/replace by changing the . to something else.
However, the other answers are perfectly good and might achieve exactly what you are looking for.
// oldLetter and newLetter are placeholder String variables, e.g. "n" and "k"
if (word.endsWith(oldLetter)) {
    word = word.substring(0, word.length() - 1) + newLetter;
}
