I have:
String s = "Hello world";
or
String s = " Hello world ";
The result should be:
String[] splited = s.split("REGEX");
splited[0].equals(" Hello"); // true
splited[1].equals("world "); // true
I tried s.trim().split(" +"), but that loses the leading space in splited[0], and that space should stay.
How can I do this with a regex?
You could combine a negative lookbehind with a positive lookahead:
String[] array = s.split("(?<!^\\s*)\\s+(?=\\S)");
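(?<!^\\s*) not preceded by the start of the string plus 0 or more whitespaces (negative lookbehind)
(?=\\S) followed by a non-whitespace character (lookahead, so that character is not consumed)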
A limited way (allowing at most 1000 whitespace characters at the beginning):
String[] splited = s.split("(?<!\\A\\s{0,1000})\\s+(?=\\S)");
details:
(?<!\\A\\s{0,1000}) # not preceded by white-spaces at the start of the string
\\s+ # white-spaces
(?=\\S) # followed by a non white-space character
Or the same, strictly for spaces (not tabs, newlines, etc.):
String[] splited = s.split("(?<!\\A {0,1000}) +(?=[^ ])");
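For a quick check, here is a self-contained sketch that runs the bounded-lookbehind pattern on both inputs from the question. The class SplitKeepLeading and the helper splitKeepingLeadingSpace are illustrative names, not from the original answers.

```java
import java.util.Arrays;

// Illustrative names; the regex is the bounded \s version from the answer above.
class SplitKeepLeading {
    // {0,1000} keeps the maximum length of the lookbehind bounded.
    private static final String REGEX = "(?<!\\A\\s{0,1000})\\s+(?=\\S)";

    // Splits on whitespace runs that are followed by a non-whitespace character,
    // except the run of leading whitespace at the very start of the string.
    static String[] splitKeepingLeadingSpace(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        // prints [Hello, world]
        System.out.println(Arrays.toString(splitKeepingLeadingSpace("Hello world")));
        // prints [ Hello, world ]
        System.out.println(Arrays.toString(splitKeepingLeadingSpace(" Hello world ")));
    }
}
```

Trailing whitespace stays attached to the last element, because the pattern only splits where a non-whitespace character follows.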
I am trying to split a string into a string array. The input can come in several variations.
I tried:
String strExample = "A, B";
// possible options are:
1. A,B
2. A, B
3. A , B
4. A ,B
String[] parts;
parts = strExample.split("/"); // splits the string but doesn't remove the space, so the second item in the array is " B"
parts = strExample.split("/| ");
parts = strExample.split(",|\\s+");
Any guidance would be appreciated.
To split with comma enclosed with optional whitespace chars you may use
s.split("\\s*,\\s*")
The \s*,\s* pattern matches
\s* - 0+ whitespaces
, - a comma
\s* - 0+ whitespaces
If you also want to make sure there are no leading/trailing spaces, consider calling trim() on the string before splitting.
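A minimal sketch running \s*,\s* against the four input variants listed in the question; CommaSplit and splitOnComma are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class CommaSplit {
    // A comma, optionally surrounded by whitespace on either side
    static String[] splitOnComma(String s) {
        return s.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        String[] inputs = {"A,B", "A, B", "A , B", "A ,B"};
        for (String in : inputs) {
            // each variant prints [A, B]
            System.out.println(Arrays.toString(splitOnComma(in)));
        }
    }
}
```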
You can use
parts = strExample.split("\\s*,\\s*");
for your case.
I need to split some sentences into words.
For example:
Upper sentence.
Lower sentence. And some text.
I do it by:
String[] words = text.split("(\\s+|[^.]+$)");
But the output I get is:
Upper, sentence.Lower, sentence., And, some, text.
And it should be like:
Upper, sentence., Lower, sentence., And, some, text.
Notice that I need to preserve all the characters (.,-?! etc.)
In regular expressions, \W+ matches one or more non-word characters.
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
So if you want to get the words in the sentences you can use \W+ as the splitter.
String[] words = text.split("\\W+");
This gives the following output:
Upper
sentence
Lower
sentence
And
some
text
UPDATE:
Since you have updated your question, if you want to preserve all characters and split by spaces, use \s+ as the splitter.
String[] words = text.split("\\s+");
I have checked the following code block and confirmed that it works with newlines too:
String text = "Upper sentence.\n" +
"Lower sentence. And some text.";
String[] words = text.split("\\s+");
for (String word : words){
System.out.println(word);
}
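To compare the two splitters from this answer on the question's two-line input, here is a small sketch (SentenceWords, byNonWord and byWhitespace are illustrative names):

```java
import java.util.Arrays;

class SentenceWords {
    // \W+ splits on runs of non-word characters, so punctuation is dropped
    static String[] byNonWord(String text) {
        return text.split("\\W+");
    }

    // \s+ splits on runs of whitespace (including newlines), so punctuation is kept
    static String[] byWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "Upper sentence.\n" + "Lower sentence. And some text.";
        // prints [Upper, sentence, Lower, sentence, And, some, text]
        System.out.println(Arrays.toString(byNonWord(text)));
        // prints [Upper, sentence., Lower, sentence., And, some, text.]
        System.out.println(Arrays.toString(byWhitespace(text)));
    }
}
```

Because \W+ throws the punctuation away, only \s+ meets the requirement to preserve characters like '.'.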
Replace dots, commas, etc. with a space, then split on whitespace:
String text = "hello.world this is.a sentence.";
String[] list = text.replaceAll("\\.", " " ).split("\\s+");
System.out.println(new ArrayList<>(Arrays.asList(list)));
Result: [hello, world, this, is, a, sentence]
Edit:
If it is only about dots, this trick should work:
String text = "hello.world this is.a sentence.";
String[] list = text.replaceAll("\\.", ". " ).split("\\s+");
System.out.println(new ArrayList<>(Arrays.asList(list)));
[hello., world, this, is., a, sentence.]
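The answer mentions commas and other punctuation, but its code only replaces dots. Below is a sketch of the same replace-then-split idea with a character class; the set [.,!?] is an assumed example, and the trim() call is an addition so that leading punctuation does not produce an empty first element. PunctuationToSpace and words are illustrative names.

```java
import java.util.Arrays;

class PunctuationToSpace {
    // Replace each listed punctuation character with a space, trim, then split on whitespace.
    // [.,!?] is only an example set; extend it as needed.
    static String[] words(String text) {
        return text.replaceAll("[.,!?]", " ").trim().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [hello, world, this, is, a, sentence]
        System.out.println(Arrays.toString(words("hello.world, this is.a sentence!")));
    }
}
```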
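The expression \\s+ means "one or more whitespace characters". I think what you need to do is replace it with \\s*, which means "zero or more whitespace characters". Be aware, though, that \\s* also matches the empty string, so as a split delimiter it splits between individual characters as well.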
A simple answer for the updated question:
String text = "Upper sentence.\n"+
"Lower sentence. And some text.";
The regex [ ]+|\n+ means: one or more spaces, OR one or more newlines.
String[] arr1 = text.split("[ ]+|\n+");
System.out.println(Arrays.toString(arr1));
result:
[Upper, sentence., Lower, sentence., And, some, text.]
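A sketch contrasting [ ]+|\n+ with \s+ (SpaceOrNewline and its methods are illustrative names). The alternation only covers spaces and newlines, so a tab stays inside a token; likewise, with \r\n line endings the \r would stay attached to the preceding word.

```java
import java.util.Arrays;

class SpaceOrNewline {
    // Spaces or newlines only; "\n" in the Java literal is a real newline character
    static String[] spacesOrNewlines(String text) {
        return text.split("[ ]+|\n+");
    }

    // Any whitespace: space, tab, newline, carriage return, form feed, vertical tab
    static String[] anyWhitespace(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "a\tb c\n\nd";
        // three elements; the first one still contains the tab between a and b
        System.out.println(Arrays.toString(spacesOrNewlines(text)));
        // prints [a, b, c, d]
        System.out.println(Arrays.toString(anyWhitespace(text)));
    }
}
```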
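You can split the string into substrings using the following line of code:
String[] result = speech.split("\\s");
Note that \\s matches exactly one whitespace character, so consecutive whitespace produces empty strings in the result; use "\\s+" to treat a run of whitespace as a single delimiter.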
For reference: https://alvinalexander.com/java/edu/pj/pj010006
String numbers = "5 1 5 1";
So, is it:
String [] splitNumbers = numbers.split();
or:
String [] splitNumbers = numbers.split("\s+");
Looking for: ["5","1","5","1"]
Any idea why neither of the .split lines will work? I tried reading answers about the regex, but I'm not getting anywhere.
In your second attempt, split("\s+"), you need to escape the \ with another backslash:
split("\\s+");
Or
split(" ") can also do it.
Note that split("\\s+") splits on runs of whitespace of any length, including newlines (\n) and tabs (\t), while split(" ") splits on every single space.
For example, when the numbers are separated by two spaces in one place, say "5  1 5 1",
using split("\\s+"); you will get {"5","1","5","1"}
while using split(" "); you will get {"5","","1","5","1"}
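A runnable sketch of this difference, using a string with two spaces after the first 5 (SplitSpaces, bySpace and byWhitespace are illustrative names):

```java
import java.util.Arrays;

class SplitSpaces {
    // Every single space is a delimiter, so two spaces leave an empty string between them
    static String[] bySpace(String s) {
        return s.split(" ");
    }

    // A run of whitespace counts as one delimiter
    static String[] byWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String numbers = "5  1 5 1"; // two spaces after the first 5
        System.out.println(Arrays.toString(byWhitespace(numbers))); // [5, 1, 5, 1]
        System.out.println(Arrays.toString(bySpace(numbers)));      // [5, , 1, 5, 1]
    }
}
```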
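You must escape the backslash with an additional \, since \ is the escape character in Java string literals:
public class SplitNumbers {
    public static void main(String[] args) {
        String numbers = "5 1 5 1";
        // "\\s+" in Java source is the regex \s+ (one or more whitespace characters)
        String[] tokens = numbers.split("\\s+");
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}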
So the additional \ escapes the next \ which is then treated as the literal \.
When using \\s+, the string is split on runs of one or more whitespace characters (space, tab, newline, etc.).
I know that you can split your string using myString.split("something"). But I do not know how I can split a string by two delimiters.
Example:
myString = "abc==abc++abc==bc++abc";
I need something like this:
myString.split("==|++")
What is the regular expression for this?
Use this:
myString.split("(==)|(\\+\\+)")
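A sketch of this answer's pattern (TwoDelimiters is an illustrative name). The question's attempt "==|++" fails because + is a regex quantifier: a + with nothing before it makes Pattern throw a PatternSyntaxException, which is why both plus signs are escaped here. The groups are optional; "==|\\+\\+" behaves the same.

```java
import java.util.Arrays;

class TwoDelimiters {
    // "==" or "++"; '+' is a regex quantifier, so both plus signs are escaped
    static String[] split(String s) {
        return s.split("(==)|(\\+\\+)");
    }

    public static void main(String[] args) {
        // prints [abc, abc, abc, bc, abc]
        System.out.println(Arrays.toString(split("abc==abc++abc==bc++abc")));
    }
}
```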
Here is how I would do it if I had to split using two substrings:
String mainString = "This is a dummy string with both_spaces_and_underscores!";
String delimiter1 = " ";
String delimiter2 = "_";
mainString = mainString.replaceAll(delimiter2, delimiter1);
String[] split_string = mainString.split(delimiter1);
Replace all instances of the second delimiter with the first, then split on the first.
Note: replaceAll lets you use a regex for delimiter2, so you should actually replace all matches of delimiter2 with some string that matches delimiter1's regex.
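The replace-then-split idea can be made safe for delimiters that contain regex metacharacters, such as the question's "==" and "++". A sketch under that assumption: the helper splitOnEither is hypothetical, it uses the literal String.replace(CharSequence, CharSequence) instead of replaceAll, and it wraps delimiter1 in Pattern.quote so that split treats it literally too.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ReplaceThenSplit {
    // Hypothetical helper: map every occurrence of delimiter2 onto delimiter1, then split on delimiter1.
    // String.replace is literal (no regex); Pattern.quote makes split treat delimiter1 literally.
    static String[] splitOnEither(String s, String delimiter1, String delimiter2) {
        String unified = s.replace(delimiter2, delimiter1);
        return unified.split(Pattern.quote(delimiter1));
    }

    public static void main(String[] args) {
        // prints [abc, abc, abc, bc, abc]
        System.out.println(Arrays.toString(splitOnEither("abc==abc++abc==bc++abc", "==", "++")));
        // prints [This, is, a, dummy, string, with, both, spaces, and, underscores!]
        System.out.println(Arrays.toString(
                splitOnEither("This is a dummy string with both_spaces_and_underscores!", " ", "_")));
    }
}
```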
You can use this
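String myString = "abc==abc++abc==bc++abc";
String[] splitString = myString.split("\\W+");
The regular expression \W+ splits the string on runs of non-word characters (anything other than letters, digits and underscore), so it only works when the pieces themselves contain nothing but word characters.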
Try this
String str = "aa==bb++cc";
String[] split = str.split("={2}|\\+{2}");
System.out.println(Arrays.toString(split));
The answer is an array of
[aa, bb, cc]
{2} matches exactly two occurrences of the preceding character, that is either = or + (escaped).
| matches either side.
The backslash is doubled in the Java string literal, so the regex the engine actually sees is ={2}|\+{2}.
How to remove duplicate white spaces (including tabs, newlines, spaces, etc...) in a string using Java?
Like this:
yourString = yourString.replaceAll("\\s+", " ");
For example
System.out.println("lorem ipsum dolor \n sit.".replaceAll("\\s+", " "));
outputs
lorem ipsum dolor sit.
What does that \s+ mean?
\s+ is a regular expression. \s matches a space, tab, newline, carriage return, form feed or vertical tab, and + says "one or more of those". Thus the code above replaces every run of whitespace with a single space character, including runs of length one (a lone tab or newline also becomes a space).
Source: Java: Removing duplicate white spaces in strings
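A minimal sketch of the replaceAll("\\s+", " ") approach (CollapseWhitespace and collapse are illustrative names). The input below contains runs of several spaces, which is an assumption about the original example; note that a single tab is also turned into a space.

```java
class CollapseWhitespace {
    // Every run of whitespace, even a single tab or newline, becomes one space
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("lorem  ipsum   dolor \n sit.")); // lorem ipsum dolor sit.
        System.out.println(collapse("a\tb"));                          // a b
    }
}
```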
You can use the regex
(\s)\1
and
replace it with $1.
Java code:
str = str.replaceAll("(\\s)\\1","$1");
If the input is "foo\t\tbar " you'll get "foo\tbar " as output. But if the input is "foo\t bar", it will remain unchanged because it does not contain two consecutive identical whitespace characters.
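A sketch of the backreference approach and its limits (BackrefCollapse, pairOnce and wholeRun are illustrative names). Each match of (\s)\1 consumes exactly two identical characters, so a run of three spaces only shrinks to two; adding + after the backreference, as in (\s)\1+, is a variant (not from the original answer) that collapses the whole run of the same character.

```java
class BackrefCollapse {
    // Replaces each non-overlapping pair of identical whitespace characters with one
    static String pairOnce(String s) {
        return s.replaceAll("(\\s)\\1", "$1");
    }

    // Variant: \\1+ also consumes the rest of the run of that same character
    static String wholeRun(String s) {
        return s.replaceAll("(\\s)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(pairOnce("foo\t\tbar ").equals("foo\tbar ")); // true
        System.out.println(pairOnce("foo\t bar").equals("foo\t bar"));   // true: tab + space differ
        System.out.println(pairOnce("a   b"));                           // "a  b" (two spaces left)
        System.out.println(wholeRun("a   b"));                           // "a b"
    }
}
```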
If you treat all whitespace characters (space, vertical tab, horizontal tab, carriage return, form feed, newline) as a space, you can use the following regex to replace any number of consecutive whitespace characters with a single space:
str = str.replaceAll("\\s+"," ");
But if you only want to replace each pair of consecutive whitespace characters with a single space, use:
str = str.replaceAll("\\s{2}"," ");
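A sketch showing what \s{2} does with longer runs, next to a \s{2,} variant that is not part of the original answer (PairVsRun and its methods are illustrative names). \s{2,} collapses runs of two or more whitespace characters and leaves single tabs or newlines untouched.

```java
class PairVsRun {
    // Each match consumes exactly two whitespace characters
    static String pairs(String s) {
        return s.replaceAll("\\s{2}", " ");
    }

    // Runs of two or more whitespace characters become one space; single ones stay as they are
    static String runsOfTwoOrMore(String s) {
        return s.replaceAll("\\s{2,}", " ");
    }

    public static void main(String[] args) {
        String input = "a   b\tc"; // three spaces, then a single tab
        System.out.println(pairs(input).equals("a  b\tc"));          // true
        System.out.println(runsOfTwoOrMore(input).equals("a b\tc")); // true
    }
}
```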
String str = " Text with multiple spaces ";
str = org.apache.commons.lang3.StringUtils.normalizeSpace(str);
// str = "Text with multiple spaces"
Try this (you need to import java.util.regex.*):
Pattern pattern = Pattern.compile("\\s+");
Matcher matcher = pattern.matcher(string);
// A separate find() call is not needed: replaceAll() resets the matcher and replaces every match
String str = matcher.replaceAll(" ");
Here, string is the string from which you want to remove duplicate whitespace.
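The main benefit of calling Pattern.compile yourself is reuse: in OpenJDK, String.replaceAll compiles its regex again on every call. A sketch that compiles once and reuses the pattern (PrecompiledCollapse and collapse are illustrative names):

```java
import java.util.regex.Pattern;

class PrecompiledCollapse {
    // Compiled once and reused for every call
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    static String collapse(String input) {
        // replaceAll() resets the matcher itself, so no find() call is needed
        return WHITESPACE_RUN.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String[] lines = {"a  b", "c\t\td", "e \n f"};
        for (String line : lines) {
            System.out.println(collapse(line)); // prints "a b", then "c d", then "e f"
        }
    }
}
```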
The fastest (but not the prettiest) way I found is:
while (cleantext.indexOf("  ") != -1)
    cleantext = StringUtils.replace(cleantext, "  ", " ");
In my experience this runs pretty fast on Android, as opposed to a regex.
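A dependency-free sketch of the same loop, assuming the plain JDK String.replace(CharSequence, CharSequence), which is literal rather than regex-based, in place of StringUtils.replace. LoopCollapse and collapseSpaces are illustrative names. Like the loop above, it only collapses the space character; tabs and newlines are left alone.

```java
class LoopCollapse {
    // Keep replacing double spaces with single spaces until none are left
    static String collapseSpaces(String s) {
        String result = s;
        while (result.contains("  ")) {
            result = result.replace("  ", " ");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a     b")); // a b
    }
}
```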
Though it is late, I have found a better solution (that works for me) that replaces each run of consecutive whitespace of the same type with one whitespace character of that type. That is:
Hello!\n\n\nMy World
will be
Hello!\nMy World
This still leaves leading and trailing whitespace, so my complete solution is:
str = str.trim().replaceAll("(\\s)+", "$1");
Here, trim() removes all leading and trailing whitespace. (\\s) captures one whitespace character (such as ' ', '\n' or '\t') in group #1, and the + repeats that group one or more times, so (\\s)+ matches a run of consecutive whitespace characters. $1 replaces the whole run with the contents of group #1, which is a single whitespace character. Because a repeated group only keeps its last iteration, a run that mixes different whitespace characters (for example a space followed by a tab) is replaced by the last character of the run.
I did not find this solution here, so I have posted it.
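A sketch that checks the trim().replaceAll("(\\s)+", "$1") combination on the example from this answer and on a run that mixes a space and a tab, where group #1 ends up holding the last character of the run (LastWhitespaceOfRun and squeeze are illustrative names):

```java
class LastWhitespaceOfRun {
    static String squeeze(String str) {
        // (\\s)+ matches a whole run; $1 is the last whitespace character the group matched
        return str.trim().replaceAll("(\\s)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(squeeze("Hello!\n\n\nMy World").equals("Hello!\nMy World")); // true
        System.out.println(squeeze("  a \tb ").equals("a\tb"));                         // true
    }
}
```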
If you want to get rid of all leading and trailing extraneous whitespace then you want to do something like this:
// \\A = Start of input boundary
// \\z = End of input boundary
string = string.replaceAll("\\A\\s+|\\s+\\z", "");
Then you can remove the duplicates using the other strategies listed here:
string = string.replaceAll("\\s+"," ");
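A sketch that combines both steps (TrimAndCollapse and normalize are illustrative names). It strips the ends with the alternation \A\s+|\s+\z, which also works when only one end has whitespace or when the text spans several lines, and then collapses the internal runs.

```java
class TrimAndCollapse {
    static String normalize(String s) {
        // Remove leading and trailing whitespace (\A = start of input, \z = end of input)
        String trimmed = s.replaceAll("\\A\\s+|\\s+\\z", "");
        // Collapse the remaining internal runs to one space
        return trimmed.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("  a \t b\n"));        // a b
        System.out.println(normalize("no-trim  needed"));   // no-trim needed
    }
}
```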
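You can also try using StringTokenizer, which by default splits on spaces, tabs, newlines, carriage returns and form feeds. A simple way is:
String s = "Your Text Here";
// No delimiter argument: the default delimiter set " \t\n\r\f" is used
StringTokenizer st = new StringTokenizer(s);
StringBuilder sb = new StringBuilder();
while (st.hasMoreTokens()) {
    if (sb.length() > 0) {
        sb.append(' '); // join tokens with exactly one space
    }
    sb.append(st.nextToken());
}
System.out.println(sb);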
This can be done in three steps:
1. Convert the string into a character array (toCharArray()).
2. Apply a for loop over the character array.
3. Then apply the string replace function (replace("string you want to replace", "original string")).
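One possible Java reading of these steps, sketched under assumptions: rather than calling replace, the loop copies characters into a StringBuilder and emits a single space for each run of whitespace. CharLoopCollapse and collapse are illustrative names, and Character.isWhitespace is used as the whitespace test, which is close to, but not identical with, the regex \s.

```java
class CharLoopCollapse {
    // Walk the char array; copy non-whitespace as-is and emit one ' ' per whitespace run.
    // Leading/trailing whitespace is collapsed to one space, not removed.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean previousWasSpace = false;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                if (!previousWasSpace) {
                    sb.append(' ');
                }
                previousWasSpace = true;
            } else {
                sb.append(c);
                previousWasSpace = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapse("lorem  ipsum\t\n dolor")); // lorem ipsum dolor
    }
}
```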