Split each string from a paragraph

I am trying to split a paragraph into its individual words. The paragraph uses normal punctuation as delimiters, such as , . ! ? and possibly others.
I am trying to achieve this in Java. Here is my code:
private void printWords(String inputString) {
String[] x = inputString.split("[.!,\\s]");
for(String temp: x){
System.out.println(temp);
}
}
Sample input String:
He is srk. Oh! I am a very good friend of srk.
My output:
He
is
srk
Oh
I
am
a
very
good
friend
of
srk
There is a problem here: the actual output also contains empty strings wherever a punctuation mark is followed by a space (for example after "srk." and "Oh!"). What regular expression should I use to split any given paragraph into words, without empty entries in the output?

You need to add a + to make your expression match one or more characters:
String[] x = inputString.split("[.!,\\s]+");
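A minimal sketch of this fix, assuming you also want to cover ? ; : and similar marks. The class name ParagraphWords and the broader delimiter set [\p{Punct}\s] are illustrative choices, not part of the original answer. Note that \p{Punct} includes the apostrophe, so a word like don't would be cut in two.

```java
import java.util.Arrays;

class ParagraphWords {
    // Splits on runs of punctuation and/or whitespace, so ". " counts as one delimiter.
    static String[] words(String text) {
        return text.trim().split("[\\p{Punct}\\s]+");
    }

    public static void main(String[] args) {
        String input = "He is srk. Oh! I am a very good friend of srk.";
        // Prints: [He, is, srk, Oh, I, am, a, very, good, friend, of, srk]
        System.out.println(Arrays.toString(words(input)));
    }
}
```

If the text starts with a delimiter, the first array element is an empty string, so check for that before using element 0.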

What about:
String[] x = inputString.split("\\W+");

Related

Java split with certain pattern

String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";
String[] ab = abc.split("(\\d+),[a-z]");
System.out.println(ab[0]);
Expected Output:
abc_123
low
101.111.111.111,100.254.132.156
abc
1
The problem is that I am not able to find an appropriate regex for this pattern.

I would suggest not trying to solve every problem with one regular expression.
It seems that your initial string contains values separated by ",". So split those values on ",".
Then iterate over the result and "join" the elements that are IP addresses (as it seems that this is what you are looking for).
And just for the sake of it: keep in mind that IP addresses are actually pretty complicated, and a pattern that matches every valid form is far from trivial.

You could use lookahead and lookbehind to check whether a . followed by three digits precedes the , or three digits followed by a . follow it:
String[] ab = abc.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
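The following sketch runs that lookaround pattern against the sample string from the question. The class and method names (IpAwareSplit, fields) are made up for illustration. The pattern only keeps two addresses together when the octets touching the comma have exactly three digits, so treat it as specific to data shaped like the sample.

```java
import java.util.Arrays;

class IpAwareSplit {
    // A comma is a delimiter unless it is preceded by ".ddd" AND followed by "ddd."
    static String[] fields(String line) {
        return line.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
    }

    public static void main(String[] args) {
        String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1";
        for (String field : fields(abc)) {
            System.out.println(field);
        }
        // Shorter octets are not recognised, so these two addresses are split apart:
        System.out.println(Arrays.toString(fields("a,10.0.0.1,10.0.0.2,b")));
    }
}
```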

String[] ab = abc.split(",");
System.out.println(ab[0]);
System.out.println(ab[1]);
int i = 2;
while(i < ab.length && ab[i].matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}")) {
if(i > 2) System.out.print(",");
System.out.print(ab[i++]);
}
System.out.println();
System.out.println(ab[i++]);
System.out.println(ab[i++]);

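First split the string into an array on ",", then use a regex to check whether each element has the desired format (an IP address). If it does, concatenate those elements, separated by ",". Note that this version prints the joined addresses last, after all the other fields.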
String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";//or something else.
String[] split = abc.split(",");
String concat="";
for(String data:split){
boolean matched=data.matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}");
if(matched){
concat=concat+","+data;
}else{
System.out.println(data);
}
}
if(concat.length()>0){
System.out.println(concat.substring(1)); // drop the leading comma
}

Splitting a string with multiple spaces

I want to split a string like
"first    middle  last"
with String.split(). But when I try to split it I get
String[] array = {"first","","","","middle","","last"}
I tried using String.isEmpty() to check for empty strings after splitting, but it doesn't work on Android. Here is my code:
String s = "First    Middle  Last";
String[] array = s.split(" ");
for(int i=0; i<array.length; i++) {
//displays segmented strings here
}
I think there is a way to split it like this: {"first","middle","last"} but can't figure out how.
Thanks for the help!

Since the argument to split() is a regular expression, you can look for one or more spaces (" +") instead of just one space (" ").
String[] array = s.split(" +");

try using this s.split("\\s+");
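A small sketch comparing the two patterns on a string with runs of spaces. The class name WhitespaceSplit and the helper words are hypothetical. Trimming first matters because a string that begins with whitespace otherwise produces an empty first element.

```java
import java.util.Arrays;

class WhitespaceSplit {
    // Splits on runs of whitespace; blank input yields an empty array.
    static String[] words(String s) {
        String trimmed = s.trim();
        // "".split("\\s+") would return [""], so handle blank input explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String s = "first    middle  last";
        System.out.println(Arrays.toString(s.split(" ")));  // [first, , , , middle, , last]
        System.out.println(Arrays.toString(words(s)));      // [first, middle, last]
    }
}
```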

If you have a string like
String s = "This is a test string  This is the next part  This is the third part";
and want to get an array like
String[] sArray = { "This is a test string", "This is the next part", "This is the third part" };
you should try
String[] sArray = s.split("\\s{2,}");
The {2,} quantifier means that a run of at least two whitespace characters (with no upper bound) is required for a split to occur.
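As a quick check of the {2,} quantifier, here is a sketch that separates phrases by two spaces while leaving the single spaces inside each phrase alone. PhraseSplit and phrases are illustrative names only.

```java
import java.util.Arrays;

class PhraseSplit {
    // Only runs of two or more whitespace characters act as delimiters.
    static String[] phrases(String s) {
        return s.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String s = "This is a test string  This is the next part  This is the third part";
        // Each phrase keeps its internal single spaces.
        for (String phrase : phrases(s)) {
            System.out.println("[" + phrase + "]");
        }
    }
}
```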

This worked for me (in JavaScript):
s.split(/\s+/)
var foo = "first middle last";
console.log(foo.split(/\s+/));

Since split() uses regular expressions, you can do something like s.split("\\s+") to set the split delimiter to be any number of whitespace characters.

How about using something that is provided out of the box by the Android SDK?
TextUtils.split(stringToSplit, " +");

If someone is looking for Kotlin code:
val str = " fly me to the moon "
println(str.trim().split(" +".toRegex()))
// output - [fly, me, to, the, moon]

How do I fill a new array with split pieces from an existing one? (Java)

I'm trying to split paragraphs of information from an array into a new one which is broken into individual words. I know that I need to use the String[] split(String regex), but I can't get this to output right.
What am I doing wrong?
(assume that sentences[i] is the existing array)
String phrase = sentences[i];
String[] sentencesArray = phrase.split("");
System.out.println(sentencesArray[i]);
Thanks!

It might be just the console output going wrong. Try replacing the last line by
System.out.println(java.util.Arrays.toString(sentencesArray));
The empty-string argument to phrase.split("") is suspect too. Try passing a word boundary:
phrase.split("\\b");

You are using an empty expression for splitting, try phrase.split(" ") and work from there.

This does nothing useful:
String[] sentencesArray = phrase.split("");
You're splitting on the empty string, which returns an array of the individual characters in the string (on Java 7 and earlier, preceded by an empty string).
It's hard to tell from your question/code what you're trying to do, but if you want to split on words you need something like:
private static final Pattern SPC = Pattern.compile("\\s+");
...
String[] words = SPC.split(phrase);
The regex splits on one or more whitespace characters, which is probably what you want.
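To tie this back to the question's array of sentences, here is a minimal sketch that collects the words of every sentence into one list using a precompiled pattern. The class name SentenceWords and the method allWords are assumptions for illustration; punctuation stays attached to the words because only whitespace is treated as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SentenceWords {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Flattens an array of sentences into a single list of words.
    static List<String> allWords(String[] sentences) {
        List<String> words = new ArrayList<>();
        for (String sentence : sentences) {
            for (String word : WHITESPACE.split(sentence.trim())) {
                // A blank sentence yields one empty token; skip it.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] sentences = { "He is srk.", "  Oh!  I am a friend. ", "" };
        System.out.println(allWords(sentences));
    }
}
```

If you need a String[] rather than a List, call toArray(new String[0]) on the result.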

String[] sentencesArray = phrase.split("");
The regex that the phrase is split on is empty here. If you wish to split on a space character, use:
String[] sentencesArray = phrase.split(" ");
// note the space between the quotes

Use String.split() with multiple delimiters

I need to split a string based on the delimiters - and . (dash and dot). Below is my desired output.
AA.BB-CC-DD.zip ->
AA
BB
CC
DD
zip
But the following code does not work:
private void getId(String pdfName){
String[]tokens = pdfName.split("-\\.");
}

I think you need to include the regex OR operator:
String[]tokens = pdfName.split("-|\\.");
What you have will match:
[DASH followed by DOT together] -.
not
[DASH or DOT any of them] - or .

Try this regex: "[-.]+". The trailing + treats consecutive delimiter characters as one. Remove the + if you do not want this.

You can use the regex "\\W", which matches any non-word character. The required line would be:
String[] tokens=pdfName.split("\\W");

The string you give split is the string form of a regular expression, so:
private void getId(String pdfName){
String[]tokens = pdfName.split("[\\-.]");
}
That means to split on any character in the [] (we escape - with a backslash because it denotes a range inside []; the escape can be omitted when - is the first or last character of the class, as in [-.]; and of course we have to escape the backslash because this is a string). (Conversely, . is normally special but isn't special inside [].)
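The difference between the question's pattern and the alternatives can be seen in a short sketch. DelimiterSplit and tokens are illustrative names; the patterns themselves come from the question and the answers above.

```java
import java.util.Arrays;

class DelimiterSplit {
    // Character class: any single '-' or '.' is a delimiter.
    static String[] tokens(String name) {
        return name.split("[-.]");
    }

    public static void main(String[] args) {
        String name = "AA.BB-CC-DD.zip";
        // "-\\." means a dash immediately followed by a dot, which never occurs here,
        // so the whole string comes back as the only element.
        System.out.println(Arrays.toString(name.split("-\\.")));  // [AA.BB-CC-DD.zip]
        // Alternation and the character class both split on either character.
        System.out.println(Arrays.toString(name.split("-|\\."))); // [AA, BB, CC, DD, zip]
        System.out.println(Arrays.toString(tokens(name)));        // [AA, BB, CC, DD, zip]
    }
}
```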

Using Guava you could do this:
Iterable<String> tokens = Splitter.on(CharMatcher.anyOf("-.")).split(pdfName);

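For two character sequences as delimiters, "AND" and "OR", this should work. Splitting on the bare pattern "AND|OR" would also cut inside words that contain those letters (the OR in YORK, for example), so include the surrounding whitespace in the delimiter. This also removes the need to trim the results.
String text ="ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
String[] cities = text.split("\\s+(?:AND|OR)\\s+");
Result : cities = {"ISTANBUL", "NEW YORK", "PARIS", "TOKYO", "MOSCOW"}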

pdfName.split("[.-]+");
[.-] -> any one of the . or - can be used as delimiter
+ sign signifies that if the aforementioned delimiters occur consecutively we should treat it as one.

I'd use Apache Commons:
import org.apache.commons.lang3.StringUtils;
private void getId(String pdfName){
String[] tokens = StringUtils.split(pdfName, "-.");
}
It'll split on any of the specified separators, as opposed to StringUtils.splitByWholeSeparator(str, separator) which uses the complete string as a separator

String[] token=s.split("[.-]");

You can also use something like this:
s.split("[\\s\\-\\.\\'\\?\\,\\_\\#]+");
I have added a few other characters as a sample. Escaping every punctuation character inside the class is redundant for most of them but harmless, and it saves you from remembering which ones are special.

Try this code (JavaScript):
var string = 'AA.BB-CC-DD.zip';
var array = string.split(/[-.]/);

You can also pass a regular expression as the argument to the split() method. See the example below:
private void getId(String pdfName){
String[]tokens = pdfName.split("-|\\.");
}

s.trim().split("[\\W]+")
should work.

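Note that String.split does not accept varargs: its only overloads are split(String regex) and split(String regex, int limit), so a call like pdfName.split("-", ".") does not compile. Put all the delimiters into a single regular expression instead:
String[] tokens = pdfName.split("[-.]");
You can list as many delimiter characters inside the brackets as you want.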

If you know the string will always be in the same format, first split it at the first . only and store the element at index 0 in a variable. Then split the element at index 1 on - and store indexes 0 and 1. Finally, split index 2 of that array on . to obtain the remaining fields.
Refer to the following snippet:
String[] tmp = pdfName.split("\\.", 2); // the dot must be escaped; limit 2 splits at the first dot only
String val1 = tmp[0];
tmp = tmp[1].split("-");
String val2 = tmp[0];
...

What is the best way to extract the first word from a string in Java?

I am trying to write a short method that parses a string and extracts the first word. I have been looking for the best way to do this.
I assume I would use str.split(","); however, I would like to grab just the first word from a string and save it in one variable, and put the rest of the tokens in another variable.
Is there a concise way of doing this?

The second parameter of the split method is optional; if specified, it limits the result to at most N elements, so the target string is split at most N-1 times.
For example:
String mystring = "the quick brown fox";
String arr[] = mystring.split(" ", 2);
String firstWord = arr[0]; //the
String theRest = arr[1]; //quick brown fox
Alternatively you could use the substring method of String.
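A hedged sketch of the limit-based approach that also copes with single-word input, where the array has only one element and arr[1] would throw. The names FirstAndRest and firstAndRest are invented for this example, and splitting on "\\s+" rather than " " is an assumption that any whitespace separates words.

```java
class FirstAndRest {
    // Returns {firstWord, rest}; rest is "" when the input has only one word.
    static String[] firstAndRest(String s) {
        String[] parts = s.trim().split("\\s+", 2); // at most two elements
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] { parts[0], rest };
    }

    public static void main(String[] args) {
        String[] result = firstAndRest("the quick brown fox");
        System.out.println("First: " + result[0]); // First: the
        System.out.println("Rest: " + result[1]);  // Rest: quick brown fox
    }
}
```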

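You could do this:
String input = "hello world, this is a line of text";
int i = input.indexOf(' ');
String word = input.substring(0, i);
String rest = input.substring(i);
This avoids regular expressions entirely, so it is likely the fastest way of doing this task. It assumes the input contains a space (otherwise indexOf returns -1 and substring throws), and rest keeps its leading space.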

To simplify the above:
text.substring(0, text.indexOf(' '));
Here is a ready function:
private String getFirstWord(String text) {
int index = text.indexOf(' ');
if (index > -1) { // Check if there is more than one word.
return text.substring(0, index).trim(); // Extract first word.
} else {
return text; // Text is the first word itself.
}
}

A simple one I use is
str.contains(" ") ? str.split(" ")[0] : str
where str is your string. So, if
str is empty, it is returned as is.
str has one word, it is returned as is.
str has multiple words, the first word is extracted and returned.
Hope this is helpful.

import org.apache.commons.lang3.StringUtils;
...
StringUtils.substringBefore("Grigory Kislin", " ")

You can use String.split with a limit of 2.
String s = "Hello World, I'm the rest.";
String[] result = s.split(" ", 2);
String first = result[0];
String rest = result[1];
System.out.println("First: " + first);
System.out.println("Rest: " + rest);
// prints =>
// First: Hello
// Rest: World, I'm the rest.
API docs for: split

For those who are searching for Kotlin:
var delimiter = " "
var mFullname = "Mahendra Rajdhami"
var greetingName = mFullname.substringBefore(delimiter)

like this:
final String str = "This is a long sentence";
final String[] arr = str.split(" ", 2);
System.out.println(Arrays.toString(arr));
arr[0] is the first word, arr[1] is the rest

You could use a Scanner
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
The scanner can also use delimiters other than whitespace. This example reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
prints the following output:
1
2
red
blue

None of these answers appears to define what the OP might mean by a "word". As others have already said, a "word boundary" may be a comma, and certainly can't be counted on to be a space, or even "white space" (i.e. also tabs, newlines, etc.)
At the simplest, I'd say the word has to consist of any Unicode letters, and any digits. Even this may not be right: a String may not qualify as a word if it contains numbers, or starts with a number. Furthermore, what about hyphens, or apostrophes, of which there are presumably several variants in the whole of Unicode? All sorts of discussions of this kind and many others will apply not just to English but to all other languages, including non-human language, scientific notation, etc. It's a big topic.
But a start might be this (NB written in Groovy):
String givenString = "one two9 thr0ee four"
// String givenString = "oňňÜÐæne;:tŵo9===tĥr0eè? four!"
// String givenString = "mouse"
// String givenString = "&&^^^%"
String[] substrings = givenString.split( '[^\\p{L}^\\d]+' )
println "substrings |$substrings|"
println "first word |${substrings[0]}|"
This works OK for the first, second and third givenStrings. For "&&^^^%" it says that the first "word" is a zero-length string, and the second is "^^^" (the second ^ in the character class is a literal caret, so carets count as word characters here). Actually a leading zero-length token is String.split's way of saying "your given String starts not with a token but a delimiter".
NB in regex \p{L} means "any Unicode letter". The parameter of String.split is of course what defines the "delimiter pattern"... i.e. a clump of characters which separates tokens.
NB2 Performance issues are irrelevant for a discussion like this, and almost certainly for all contexts.
NB3 My first port of call was Apache Commons' StringUtils package. They are likely to have the most effective and best engineered solutions for this sort of thing. But nothing jumped out... https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html ... although something of use may be lurking there.
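For readers who want the same idea in plain Java, here is a sketch that searches for the first run of Unicode letters and digits instead of splitting, which avoids the leading zero-length token altogether. The class name FirstWord and the choice of "letters and digits" as the definition of a word are assumptions carried over from the discussion above, not a general rule.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // A "word" here is a run of Unicode letters and/or digits (an assumption).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\d]+");

    // Returns the first word, or Optional.empty() if the text contains none.
    static Optional<String> firstWord(String text) {
        Matcher matcher = WORD.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstWord("one two9 thr0ee four")); // Optional[one]
        System.out.println(firstWord(";:mouse!"));             // Optional[mouse]
        System.out.println(firstWord("&&^^^%"));               // Optional.empty
    }
}
```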

You could also use http://download.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html

I know this question has been answered already, but I have another solution (For those still searching for answers) which can fit on one line:
It uses the split functionality but only gives you the 1st entity.
String test = "123_456";
String value = test.split("_")[0];
System.out.println(value);
The output will show:
123

The easiest way I found is this (note that this snippet is Dart, not Java):
void main() {
String input = "hello world, this is a line of text";
print(input.split(" ").first);
}
Output: hello

Assuming Delimiter is a blank space here:
Before Java 8:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
String[] words = sentence.split(delimiter);
return words[0];
}
Java 8 and later:
private String getFirstWord(String sentence){
String delimiter = " "; //Blank space is delimiter here
String firstWord = Arrays.stream(sentence.split(delimiter))
.findFirst()
.orElse("No word found");
return firstWord;
}

If you already know the position of the word, you can also use substring with fixed indexes:
String anotherPalindrome = "Niagara. O roar again!";
String roar = anotherPalindrome.substring(11, 15); // "roar"
