How to split a string by keywords in Java [duplicate]

I need to split an input and put it into a list according to a list of keywords or delimiters that I have created.
I've tried simply splitting the string by spaces and processing it afterwards. The problem is that some of my keywords contain spaces, so they get split apart, which causes undesired behavior.
// "Jr.", "III", "Sr", "pro se" are all keywords in my list
String input = "Abraham Lincoln Jr. III Sr pro se";
String [] splitBySpace = input.split(" ");
List<String> separatedName = new ArrayList<>(Arrays.asList(splitBySpace));
My list ends up being: {Abraham, Lincoln, Jr., III, Sr, pro, se}
And I would like it to be: {Abraham, Lincoln, Jr., III, Sr, pro se}
This output would also work: {Abraham Lincoln, Jr., III, Sr, pro se} (I don't need the strings that are not in my delimiter list to be split apart)

You could use this: \s(?=(Jr\.|III|Sr|pro se)) - and add in the keywords you want to preserve.
Here's a working demo to see what it matches: https://regexr.com/47gbm
So with String[] split = input.split("\\s(?=(Jr\\.|III|Sr|pro se))"); you should get the desired result.
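When the keywords live in a list, the same lookahead can be built at runtime instead of being typed by hand. The sketch below is only an illustration: the class and method names (KeywordSplit, splitBeforeKeywords) are hypothetical, each keyword is escaped with Pattern.quote so that the dot in "Jr." stays literal, and it assumes a keyword should only count when it is followed by whitespace or the end of the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordSplit {
    // Hypothetical helper: splits on whitespace that is directly followed by a keyword.
    static String[] splitBeforeKeywords(String input, List<String> keywords) {
        // Quote every keyword so regex metacharacters such as '.' are matched literally.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Assumption: a keyword must end at whitespace or at the end of the input.
        String regex = "\\s+(?=(?:" + alternation + ")(?:\\s|$))";
        return input.split(regex);
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Jr.", "III", "Sr", "pro se");
        String[] parts = splitBeforeKeywords("Abraham Lincoln Jr. III Sr pro se", keywords);
        System.out.println(Arrays.toString(parts));
    }
}
```

As with the hand-written regex, words that are not keywords ("Abraham Lincoln") stay together in one element.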

You can try using a different separator in the string holding the keywords. Instead of separating them with " ", separate them with "|".
String input = "Abraham|Lincoln|Jr.|III|Sr|pro se";
String[] splitByPipe = input.split("\\|");
This way you won't have any problems with the spaces you need to keep.


How to split a string every N words [duplicate]

I want to split one big string into smaller parts, so given for example:
"A B C D E F G H I J K L"
I want to get array (String []): [A,B,C,D], [E,F,G,H], [I,J,K,L]
Is there a regex for that, or do I need to do it manually: first split on every space and then concatenate every N words?
You can create a regex that describes this pattern.
e.g. "((?:\w+\s*){4})"
Or in simple words:
The \w+\s* part means one or more word characters (letters, digits, underscore) followed by zero or more whitespace characters.
It is wrapped in parentheses and followed by {4} to indicate that we want it to occur 4 times.
That in turn is wrapped in another pair of parentheses, because we want to capture the whole result.
The inner parentheses, the ones that {4} applies to, start with (?: which makes them a non-capturing group. We don't want to capture the individual words just yet.
You can use that pattern in Java to extract each chunk of 4 words.
Then you can split each individual result with a second regex, \s+ (one or more whitespace characters).
Edit
One more thing, you may notice that the first matched group also contains whitespace at the end. You can get rid of that with a more advanced regex: ((?:\w+\s+){3}(?:\w+))\s*
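A minimal sketch of how that second pattern could be used with Pattern and Matcher. The class and method names (WordChunks, chunk) are made up for illustration, and the chunk size is passed in as a parameter rather than hard-coded as 4. Trailing words that do not fill a complete chunk are not matched by this regex and are therefore dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordChunks {
    // Hypothetical helper: returns every run of exactly n words as its own array.
    static List<String[]> chunk(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // (n - 1) words each followed by whitespace, then one last word; group 1 has no trailing whitespace.
        Pattern pattern = Pattern.compile("((?:\\w+\\s+){" + (n - 1) + "}\\w+)\\s*");
        Matcher matcher = pattern.matcher(text);
        List<String[]> chunks = new ArrayList<>();
        while (matcher.find()) {
            // Second step from the answer: split the captured chunk on whitespace.
            chunks.add(matcher.group(1).split("\\s+"));
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String[] words : chunk("A B C D E F G H I J K L", 4)) {
            System.out.println(Arrays.toString(words));
        }
    }
}
```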
You could use regex for this:
e.g.:
String x = "AAS BASD CAFAS DAFASF EASFASF FAFSASF GA HASF IAS JAS KAS LSA";
ArrayList<String> found = new ArrayList<>();
Pattern pattern = Pattern.compile("(\\w+\\s\\w+\\s\\w+\\s\\w+)");
Matcher m = pattern.matcher(x);
while (m.find()) {
    String s = m.group();
    found.add(s);
}
//if you want to convert your List to an Array
String[] result = found.toArray(new String[0]);
System.out.println(Arrays.toString(result));
Result: [AAS BASD CAFAS DAFASF, EASFASF FAFSASF GA HASF, IAS JAS KAS LSA]
This pattern ("(\\w+\\s\\w+\\s\\w+\\s\\w+)") matches 4 words separated by one space. The loop iterates over every found match and adds it to your result list.
There are multiple ways to achieve this. For example, let your string be
String str = "A B C D E F G H I J K L";
One way to split it is with a regular expression:
java.util.Arrays.toString(str.split("(?<=\\G....)"))
Here the .... sets how many characters go into each piece; another way to write the pattern is .{4}. Note that this cuts by character count, not by word count.
Another way is Guava's Splitter:
Iterable<String> strArr = Splitter.fixedLength(3).split(str);
There may be other ways to achieve the same.
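The fallback mentioned in the question itself (split on whitespace first, then rejoin every N words) needs no special regex. The following is a rough sketch under that reading; WordGroups and groupWords are hypothetical names, and unlike a fixed-length regex it keeps a shorter final group when the word count is not a multiple of N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordGroups {
    // Hypothetical helper: joins every n consecutive words back into one string.
    static List<String> groupWords(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> groups = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return groups; // nothing to group
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i += n) {
            int end = Math.min(i + n, words.length);
            groups.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(groupWords("A B C D E F G H I J K L", 4));
    }
}
```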

About String split for two numbered variable which is read as String [duplicate]

If I read 1symbol2 as a String, I can split it with String.split("") into 3 variables, but if I read 12 symbol 16 as a String and apply String.split(" "), it is split into 6 variables. How can I split it into 3 variables, that is (12, symbol, 16)?
Note:
Any of the following can be considered symbols: +,-,*,/,%,~,!,#,#,$,^,&
If you can separate the three strings 12, + and 16 with commas, that is, something like 12,+,16, then the code below will work for you.
String str = "12,+,16";
String a[] = str.split(",");
System.out.println(a[0]+" "+a[1]+" "+a[2]);
Result will be --> 12 + 16
Try this and let me know
You can use following regex to separate your string in parts:
String myString = "12+16";
String[] result = myString.split("(?<=[-+*/])|(?=[-+*/])");
System.out.println(Arrays.toString(result));
Output:
[12, +, 16]
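The same lookaround split can be extended to the full symbol list from the question. This is a sketch rather than a complete tokenizer: OperatorTokens and tokenize are hypothetical names, and stripping all whitespace before splitting is an added assumption so that input such as "12 + 16" yields the same three parts as "12+16".

```java
import java.util.Arrays;

class OperatorTokens {
    // Character class with the symbols listed in the question; '-' comes first so it is literal.
    private static final String SYMBOLS = "[-+*/%~!#$^&]";

    // Hypothetical helper: splits before and after every symbol and keeps the symbol itself.
    static String[] tokenize(String expression) {
        // Assumption: whitespace carries no meaning, so it is removed up front.
        String compact = expression.replaceAll("\\s+", "");
        return compact.split("(?<=" + SYMBOLS + ")|(?=" + SYMBOLS + ")");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("12 + 16")));
    }
}
```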
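Instead of String.split(" "), you can split on the operator itself: String.split("\\+"). The + has to be escaped because it is a regex metacharacter, and note that this variant drops the operator from the result.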

Java split String that start with space [duplicate]

I know how to split a string by space as the following:
String[] array = string.split(" ");
This works great until I try to split a string that starts with a space, like
" I like apple"
The result looks something like this:
{"", "I", "like", "apple"}
How can I split the string so that it only keeps strings that are not empty?
You can call string.trim() and then string.split(" "). The trim() method removes spaces before the first non-space character and after the last non-space character.
To remove leading and trailing spaces, you can use .trim().
String[] array = string.trim().split(" ");
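One caveat: trim() only removes whitespace at the two ends, so several spaces in a row inside the string would still produce empty elements with split(" "). A small sketch that also covers that case by splitting on \\s+ (the names LeadingSpaceSplit and words are hypothetical):

```java
import java.util.Arrays;

class LeadingSpaceSplit {
    // Hypothetical helper: returns the non-empty words of s.
    static String[] words(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            // "".split(...) would return one empty string, so handle this case explicitly.
            return new String[0];
        }
        // \\s+ treats any run of whitespace as a single separator.
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words(" I like apple")));
    }
}
```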

Android: Split string by "\n" as string not as new line [duplicate]

I have the string below, which comes from the server:
String text = "- 30016264\n- 30014837\n- 30014836\n";
When I split it like this
String[] list = text.split("\n");
I get a list of length 1:
list[0] = "- 30016264\n- 30014837\n- 30014836\n";
And when I split it like this
String[] list = text.split("\\n");
I get the same list of length 1:
list[0] = "- 30016264\n- 30014837\n- 30014836\n";
How do I write the code to split the string on the literal characters "\n" rather than on a line break?
NOTE: The string arrives from the server exactly as written here, and when I use it as a TextView value it is displayed on one single line.
If your input comes from the server in this format:
- 30016264\n- 30014837\n- 30014836\n
then in Java source it has to be written with a double backslash, like this:
- 30016264\\n- 30014837\\n- 30014836\\n
because the backslash is a special character in Java string literals and has to be escaped with another backslash.
To split on \\n you then need \\\\n. Four backslashes are needed because the regex engine also treats the backslash as an escape character: the regex \\n matches a literal backslash followed by n, and each of those two backslashes has to be doubled again in the Java string literal.
Your solution should look like:
String text = "- 30016264\\n- 30014837\\n- 30014836\\n";
String[] split = text.split("\\\\n");
Outputs
- 30016264
- 30014837
- 30014836
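If counting backslashes feels error-prone, Pattern.quote can build the literal pattern instead. The sketch below assumes, as in the answer above, that the server really sends a backslash followed by the letter n; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralBackslashN {
    // Hypothetical helper: splits on the two-character sequence backslash + 'n'.
    static String[] splitOnLiteralBackslashN(String text) {
        // "\\n" in Java source is the two characters \ and n; quote() makes the regex match them literally.
        return text.split(Pattern.quote("\\n"));
    }

    public static void main(String[] args) {
        String fromServer = "- 30016264\\n- 30014837\\n- 30014836\\n";
        System.out.println(Arrays.toString(splitOnLiteralBackslashN(fromServer)));
    }
}
```

A string that contains real line breaks is left in one piece by this helper, which makes it easy to check which of the two forms the server actually delivers.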

How to split a String sentence into words using split method in Java? [duplicate]

I need to split some sentences into words.
For example:
Upper sentence.
Lower sentence. And some text.
I do it by:
String[] words = text.split("(\\s+|[^.]+$)");
But the output I get is:
Upper, sentence.Lower, sentence., And, some, text.
And it should be like:
Upper, sentence., Lower, sentence., And, some, text.
Notice that I need to preserve all the characters (.,-?! etc.)
In regular expressions, \W+ matches one or more non-word characters.
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html
So if you only want the words in the sentences, you can use \W+ as the splitter.
String[] words = text.split("\\W+");
This will give you the following output:
Upper
sentence
Lower
sentence
And
some
text
UPDATE :
Since you have updated your question, if you want to preserve all characters and split by spaces, use \s+ as the splitter.
String[] words = text.split("\\s+");
I have checked the following code block and confirmed that it works with new lines too.
String text = "Upper sentence.\n" +
        "Lower sentence. And some text.";
String[] words = text.split("\\s+");
for (String word : words) {
    System.out.println(word);
}
Replace dots, commas, etc. with a space and then split on whitespace:
String text = "hello.world this is.a sentence.";
String[] list = text.replaceAll("\\.", " " ).split("\\s+");
System.out.println(new ArrayList<>(Arrays.asList(list)));
Result: [hello, world, this, is, a, sentence]
Edit:
If it is only for dots, this trick should work:
String text = "hello.world this is.a sentence.";
String[] list = text.replaceAll("\\.", ". " ).split("\\s+");
System.out.println(new ArrayList<>(Arrays.asList(list)));
[hello., world, this, is., a, sentence.]
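The expression \\s+ means "1 or more whitespace characters", while \\s* means "zero or more whitespace characters". Be careful with the latter: \\s* also matches the empty string, so splitting on it breaks the text into single characters. \\s+ is the one to use here.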
Simple answer for updated question
String text = "Upper sentence.\n"+
"Lower sentence. And some text.";
// one or more spaces OR one or more newlines
String[] arr1 = text.split("[ ]+|\n+");
System.out.println(Arrays.toString(arr1));
result:
[Upper, sentence., Lower, sentence., And, some, text.]
You can split the string into sub strings using the following line of code:
String[] result = speech.split("\\s");
For reference: https://alvinalexander.com/java/edu/pj/pj010006
