Java Splitting A Sentence

I am writing a program for Twitter that reads a tweet and extracts the hashtags in it.
The problem is that I can't work out how to split the text. For example, from "I love #computers so much." I need to obtain only the "computers" part.
I thought about calling split with # as the delimiter, but that just cuts the sentence in half, so it isn't a solution on its own. Any ideas?

You do want to split on the # first. After that you want just the word, so split the second part on the space " ".
String string = "I love #computers so much.";
String[] parts = string.split("#");
String part1 = parts[0]; // "I love "
String part2 = parts[1]; // "computers so much."
String[] parts2 = part2.split(" ");
String output = parts2[0]; // "computers"
The above should work, though I haven't tested it.
If there are multiple hashtags the above won't work; try the following instead:
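// Requires java.util.Arrays, java.util.ArrayList and java.util.List.
String string = "I love #computers so #much omg #lol .";
String[] stringParts = string.split("#");
// Drop the first element: it is the text before the first '#'.
String[] parts = Arrays.copyOfRange(stringParts, 1, stringParts.length);
List<String> output = new ArrayList<>(); // grows as needed, unlike a fixed-size array
for (String part : parts) {
    if (part.contains(" ")) {
        // The hashtag is everything up to the first space.
        output.add(part.split(" ")[0]);
    }
}
The one limitation of this code is that each hashtag must be followed by a space: a hashtag with no space after it is skipped, and punctuation attached directly to a hashtag ends up in the word.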

You would do well to take a look at solving the problem using regular expressions.... try something like (?<=#)\w+ -- it will return all alpha numerics after the #, while not capturing the #. You may want to change the \w to include additional characters as required. Hope this helps.
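To make the lookbehind suggestion concrete, here is a minimal sketch in Java. The class and method names (HashtagExtractor, extractHashtags) are made up for illustration, and it assumes a hashtag consists only of the characters matched by \w (ASCII letters, digits and underscore).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagExtractor {
    // The lookbehind (?<=#) requires a preceding '#' but keeps it out of the match.
    private static final Pattern HASHTAG = Pattern.compile("(?<=#)\\w+");

    // Hypothetical helper: returns every hashtag in the tweet, without the '#'.
    static List<String> extractHashtags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints [computers, much, lol]
        System.out.println(extractHashtags("I love #computers so #much omg #lol ."));
    }
}
```

Because \w stops at punctuation, a tweet ending in "#computers." still yields "computers" without the trailing period.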

You can use regular expressions to obtain the hashtags from the tweet. Something like:
String sentence = "I love #computers and #something_Else so much";
// "#\\S+" matches '#' plus every following non-whitespace character,
// so each result still includes the leading '#'.
Pattern p = Pattern.compile("#\\S+");
List<String> hashTags = new ArrayList<>();
Matcher matcher = p.matcher(sentence);
while (matcher.find()) {
    hashTags.add(matcher.group(0));
}
System.out.println(hashTags); // [#computers, #something_Else]

Related

How to retrieve all records in a string after a certain value

I am looking to retrieve all of the values after "xnum=" and before the "," delimiter. For example, in the String below, I would like to retrieve the values "zjdb" and "2jdb" and store them in an array in the order they are found. I know this is an unusual thing to ask for, but unfortunately it's the only way to solve the problem I am currently facing.
String: "{zjdb={fname=jbdjd, lname=ejdj, xnum=zjdb, email=ejdj}, 2jdb={fname=ij, lname=vji, xnum=2jdb, email=bbb}}"
I understand that I need to loop through and search for "x", then check whether the next character is "n", the next is "u", and so on, and then take the text from the index after the "=" up to the ",". But I've thought about this too much and it's too complex for me. Does anyone know a reasonably simple solution? Thanks so much in advance.

You can use a regular expression to find the tokens, assuming the input is well formed, i.e. each xnum= value is followed by either , or }:
String input = "{zjdb={fname=jbdjd, lname=ejdj, xnum=zjdb, email=ejdj}, 2jdb={fname=ij, lname=vji, xnum=2jdb, email=bbb}}";
Pattern p = Pattern.compile("[{ ]xnum=([^,}]+)[,}]");
Matcher m = p.matcher(input);
while (m.find()) {
    String xnum = m.group(1); // the value between "xnum=" and the next ',' or '}'
    System.out.println(xnum);
}

You can use a regex and some replace to clean your text.
For instance you can have something like this:
String str = "{zjdb={fname=jbdjd, lname=ejdj, xnum=zjdb, email=ejdj}, 2jdb={fname=ij, lname=vji, xnum=2jdb, email=bbb}}";
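// Strip the braces, then split on commas (with optional trailing spaces).
String[] result = str.replaceAll("[{}]", "")
        .split(", *");
for (String item : result) {
    // Keep only the xnum entries and print the part after '='.
    if (item.startsWith("xnum=")) {
        System.out.println(item.split("=")[1]);
    }
}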

The code below gives the output you want, although @Karol Dowbecki's approach is better. It does the same thing without a regular expression.
String input = "{zjdb={fname=jbdjd, lname=ejdj, xnum=zjdb, email=ejdj}, 2jdb={fname=ij, lname=vji, xnum=2jdb, email=bbb}}";
String[] b = input.split(",");
// Trim each piece first: splitting on "," leaves a leading space on every entry.
List<String> list = Arrays.stream(b).map(String::trim).filter(s -> s.startsWith("xnum=")).collect(Collectors.toList());
List<String> finalList = new ArrayList<>();
for (String s : list) {
    String x = s.replace("xnum=", "");
    finalList.add(x);
}
System.out.println(finalList); // [zjdb, 2jdb]

How to get a array of string like ["#{xxxx}","#{yyyy}"] from a string like "abc#{xxxx}def#{yyyy}ghi" using java?

How can I get an array of strings like ["#{xxxx}","#{yyyy}"] from a string like "abc#{xxxx}def#{yyyy}ghi" using Java?
I'm not good at English, so it takes me great effort to express my question.
I want to extract the UEL expressions, and I assume there is some existing method for this situation.

Scanner sc = new Scanner(System.in);
String input = sc.nextLine();
// Drop everything before the first '#' and after the last '}'.
String formattedInput = input.substring(input.indexOf("#"), input.lastIndexOf("}") + 1);
// Regex for letters that sit between a '}' and the next '#'.
String regex = "(?<=[}])[A-Za-z]*(?=[#])";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(formattedInput);
// Remove the characters between '}' and "#{".
if (m.find()) {
    formattedInput = formattedInput.replaceAll(regex, "");
}
System.out.println(formattedInput);
Input: abc#{xxxx}def#{yyyy}
Output: #{xxxx}#{yyyy}
I am not really sure what you are asking, because the question is not clearly worded, but this code removes any characters that are not enclosed in a #{} tag. You can then split the resulting string into an array. I hope this helps.
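As an alternative that goes straight to the array the question asks for, here is a minimal sketch that collects each #{...} match directly. The class and method names (ExpressionExtractor, extractExpressions) are hypothetical, and the sketch assumes the expressions are not nested and contain no '}' inside them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpressionExtractor {
    // Matches "#{", then any run of characters other than '}', then the closing '}'.
    private static final Pattern EXPRESSION = Pattern.compile("#\\{[^}]*\\}");

    // Hypothetical helper: returns every #{...} occurrence in order of appearance.
    static String[] extractExpressions(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EXPRESSION.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [#{xxxx}, #{yyyy}]
        System.out.println(Arrays.toString(extractExpressions("abc#{xxxx}def#{yyyy}ghi")));
    }
}
```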

Splitting a string with multiple spaces

I want to split a string like
"first    middle  last"
with String.split(). But when I try to split it I get
String[] array = {"first","","","","middle","","last"}
I tried using String.isEmpty() to check for empty strings after splitting, but it doesn't work on Android. Here is my code:
String s = "First    Middle  Last";
String[] array = s.split(" ");
for (int i = 0; i < array.length; i++) {
    // displays segmented strings here
}
I think there is a way to split it like this: {"first","middle","last"}, but I can't figure out how.
Thanks for the help!

Since the argument to split() is a regular expression, you can look for one or more spaces (" +") instead of just one space (" ").
String[] array = s.split(" +");

Try using s.split("\\s+");

If you have a string like
String s = "This is a test string  This is the next part  This is the third part";
and want to get an array like
String[] sArray = { "This is a test string", "This is the next part", "This is the third part" };
you should try
String[] sArray = s.split("\\s{2,}");
The {2,} quantifier means the split happens only where there are at least two consecutive whitespace characters, with no upper limit.

This worked for me (note that this snippet is JavaScript, not Java):
s.split(/\s+/)
var foo = "first middle last";
console.log(foo.split(/\s+/));

Since split() uses regular expressions, you can do something like s.split("\\s+") to set the split delimiter to be any number of whitespace characters.
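One detail worth knowing with this approach: if the string starts with whitespace, split("\\s+") returns an empty first element. The sketch below trims first to avoid that; the class and method names (WhitespaceSplit, splitWords) are invented for the example.

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Hypothetical helper: splits on runs of whitespace, ignoring leading and trailing whitespace.
    static String[] splitWords(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // nothing but whitespace: no words
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [first, middle, last]
        System.out.println(Arrays.toString(splitWords("  first    middle  last ")));
    }
}
```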

How about using something that is provided out of the box by Android SDK.
TextUtils.split(stringToSplit, " +");

If someone is looking for Kotlin code:
val str = " fly me to the moon "
println(str.trim().split(" +".toRegex()))
// output - [fly, me, to, the, moon]

regular expression to split the string in java

I want to split a string such as [AO_12345678, Real Estate] into AO_12345678 and Real Estate.
How can I do this in Java using regex?
The main issue I'm facing is avoiding the "[" and "]".
Please help.

Does it really have to be regex?
if not:
String s = "[AO_12345678, Real Estate]";
String[] split = s.substring(1, s.length()-1).split(", ");

I'd go the pragmatic way:
String org = "[AO_12345678, Real Estate]";
String plain = org;
// Strip the leading '[' and the trailing ']' if they are present.
if (plain.startsWith("[")) {
    plain = plain.substring(1);
}
if (plain.endsWith("]")) {
    plain = plain.substring(0, plain.length() - 1);
}
// Split on the comma and any whitespace that follows it.
String[] result = plain.split(",\\s*");
If the string is always surrounded with '[]' you can just substring it without checking.

One easy way, assuming the format of all your inputs is consistent, is to ignore regex altogether and just split it. Something like the following would work:
String[] parts = input.split(","); // parts is ["[AO_12345678", " Real Estate]"]
String firstWithoutBrace = parts[0].substring(1);
String secondWithoutBrace = parts[1].substring(0, parts[1].length() - 1);
String first = firstWithoutBrace.trim();
String second = secondWithoutBrace.trim();
Of course you can tailor this as you wish - you might want to check whether the braces are present before removing them, for example. Or you might want to keep any spaces before the comma as part of the first string. This should give you a basis to modify to your specific requirements however.
And in a simple case like this I'd much prefer code like the above to a regex that extracted the two strings - I consider the former much clearer!

You can also use StringTokenizer. Here is the code:
String str = "[AO_12345678, Real Estate]";
StringTokenizer st = new StringTokenizer(str, "[],", false);
String s1 = st.nextToken();        // s1 = "AO_12345678"
String s2 = st.nextToken().trim(); // s2 = "Real Estate" (trim removes the space after the comma)
Refer to the javadocs to read about StringTokenizer:
http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html

Another option using regular expressions (RE) capturing groups:
private static void extract(String text) {
Pattern pattern = Pattern.compile("\\[(.*),\\s*(.*)\\]");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) { // or .matches for matching the whole text
String id = matcher.group(1);
String name = matcher.group(2);
// do something with id and name
System.out.printf("ID: %s%nName: %s%n", id, name);
}
}
If speed/memory is a concern, the RE can be optimized to (using Possessive quantifiers instead of Greedy ones)
"\\[([^,]*+),\\s*+([^\\]]*+)\\]"

What is the best way to extract the first word from a string in Java?

Trying to write a short method so that I can parse a string and extract the first word. I have been looking for the best way to do this.
I assume I would use str.split(","), but I would like to grab just the first word from a string and save it in one variable, and put the rest of the tokens in another variable.
Is there a concise way of doing this?

The second parameter of the split method is optional; if specified, it limits the result to at most N parts (the delimiter is applied at most N - 1 times).
For example:
String mystring = "the quick brown fox";
String arr[] = mystring.split(" ", 2);
String firstWord = arr[0]; //the
String theRest = arr[1]; //quick brown fox
Alternatively you could use the substring method of String.
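One caveat with split and a limit of 2: if the string contains no delimiter, the returned array has only one element, so reading index 1 throws ArrayIndexOutOfBoundsException. A minimal sketch that guards against this follows; the class and method names (FirstWord, firstAndRest) are hypothetical, and it assumes words are separated by whitespace.

```java
import java.util.Arrays;

public class FirstWord {
    // Hypothetical helper: returns { firstWord, rest }, where rest is "" if there is only one word.
    static String[] firstAndRest(String text) {
        // Limit 2: split at the first run of whitespace only.
        String[] parts = text.trim().split("\\s+", 2);
        String first = parts[0];
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] { first, rest };
    }

    public static void main(String[] args) {
        // prints [the, quick brown fox]
        System.out.println(Arrays.toString(firstAndRest("the quick brown fox")));
        // prints [mouse, ]
        System.out.println(Arrays.toString(firstAndRest("mouse")));
    }
}
```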

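You could do this:
String input = "hello world, this is a line of text";
int i = input.indexOf(' ');           // index of the first space
String word = input.substring(0, i);  // "hello"
String rest = input.substring(i + 1); // everything after the first space
This avoids regular expressions and arrays, so it should be fast, but it assumes the input contains a space: indexOf returns -1 otherwise, and substring(0, -1) throws an exception.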

To simplify the above:
text.substring(0, text.indexOf(' '));
Here is a ready function:
private String getFirstWord(String text) {
int index = text.indexOf(' ');
if (index > -1) { // Check if there is more than one word.
return text.substring(0, index).trim(); // Extract first word.
} else {
return text; // Text is the first word itself.
}
}

The simple approach I use is:
str.contains(" ") ? str.split(" ")[0] : str
where str is your string. So:
if str is empty, it is returned as is;
if str is a single word, it is returned as is;
if str has multiple words, the first word is extracted and returned.
Hope this is helpful.

import org.apache.commons.lang3.StringUtils;
...
StringUtils.substringBefore("Grigory Kislin", " ")

You can use String.split with a limit of 2.
String s = "Hello World, I'm the rest.";
String[] result = s.split(" ", 2);
String first = result[0];
String rest = result[1];
System.out.println("First: " + first);
System.out.println("Rest: " + rest);
// prints =>
// First: Hello
// Rest: World, I'm the rest.

For those who are searching for Kotlin:
var delimiter = " "
var mFullname = "Mahendra Rajdhami"
var greetingName = mFullname.substringBefore(delimiter)

like this:
final String str = "This is a long sentence";
final String[] arr = str.split(" ", 2);
System.out.println(Arrays.toString(arr));
arr[0] is the first word, arr[1] is the rest

You could use a Scanner
http://download.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html
The scanner can also use delimiters
other than whitespace. This example
reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
prints the following output:
1
2
red
blue

None of these answers appears to define what the OP might mean by a "word". As others have already said, a "word boundary" may be a comma, and certainly can't be counted on to be a space, or even "white space" (i.e. also tabs, newlines, etc.)
At the simplest, I'd say the word has to consist of any Unicode letters, and any digits. Even this may not be right: a String may not qualify as a word if it contains numbers, or starts with a number. Furthermore, what about hyphens, or apostrophes, of which there are presumably several variants in the whole of Unicode? All sorts of discussions of this kind and many others will apply not just to English but to all other languages, including non-human language, scientific notation, etc. It's a big topic.
But a start might be this (NB written in Groovy):
String givenString = "one two9 thr0ee four"
// String givenString = "oňňÜÐæne;:tŵo9===tĥr0eè? four!"
// String givenString = "mouse"
// String givenString = "&&^^^%"
String[] substrings = givenString.split( '[^\\p{L}^\\d]+' )
println "substrings |$substrings|"
println "first word |${substrings[0]}|"
This works OK for the first, second and third givenStrings. For "&&^^^%" it says that the first "word" is a zero-length string, and the second is "^^^". Actually a leading zero-length token is String.split's way of saying "your given String starts not with a token but a delimiter".
NB in regex \p{L} means "any Unicode letter". The parameter of String.split is of course what defines the "delimiter pattern"... i.e. a clump of characters which separates tokens.
NB2 Performance issues are irrelevant for a discussion like this, and almost certainly for all contexts.
NB3 My first port of call was Apache Commons' StringUtils package. They are likely to have the most effective and best engineered solutions for this sort of thing. But nothing jumped out... https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html ... although something of use may be lurking there.
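For readers who want a plain Java starting point along the same lines, here is a minimal sketch. It is not a translation of the Groovy snippet: instead of splitting on delimiters it searches for the first run of "word" characters, which avoids the leading zero-length token. The class and method names (UnicodeFirstWord, firstWord) are made up, and the definition of a word as Unicode letters (\p{L}) plus decimal digits (\p{Nd}) is an assumption carried over from the discussion above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeFirstWord {
    // Assumption: a "word" is a run of Unicode letters (\p{L}) and decimal digits (\p{Nd}).
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // Hypothetical helper: returns the first word, or an empty Optional if there is none.
    static Optional<String> firstWord(String text) {
        Matcher m = WORD.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstWord("one two9 thr0ee four")); // Optional[one]
        System.out.println(firstWord(";;mouse!"));             // Optional[mouse]
        System.out.println(firstWord("&&^^^%"));               // Optional.empty
    }
}
```

Returning an Optional makes the "no word at all" case explicit, rather than signalling it with a zero-length string.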

You could also use http://download.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html

I know this question has been answered already, but I have another solution (For those still searching for answers) which can fit on one line:
It uses the split functionality but only gives you the 1st entity.
String test = "123_456";
String value = test.split("_")[0];
System.out.println(value);
The output will show:
123

The easiest way I found is this (note that this snippet is Dart, not Java):
void main() {
String input = "hello world, this is a line of text";
print(input.split(" ").first);
}
Output: hello

Assuming the delimiter is a blank space:
Before Java 8:
private String getFirstWord(String sentence) {
    String delimiter = " "; // blank space is the delimiter here
    String[] words = sentence.split(delimiter);
    return words[0];
}
Java 8 and later:
private String getFirstWord(String sentence) {
    String delimiter = " "; // blank space is the delimiter here
    return Arrays.stream(sentence.split(delimiter))
            .findFirst()
            .orElse("No word found");
}

You can also use substring with explicit indices, like this:
String anotherPalindrome = "Niagara. O roar again!";
String roar = anotherPalindrome.substring(11, 15); // "roar"
