How do I get the 5th word in a string in Java?

Say I have a string holding the text of a document and I want to save the 124th word of that string in another string. How would I do this? I assume a "word" is any run of text between spaces (including things like hyphens).
Edit:
Basically, what I'm doing right now is grabbing text off a webpage, and I want to get the number values next to certain words. For example, the webpage is saved in a string like .....health 78 mana 32..... or something of the sort, and I want to get the 78 and the 32 and save each one as a variable.

If you have a string
String s = "...";
then you can get the word (separated by spaces) in the nth position using split(delimiter) which returns an array String[]:
String word = s.split("\\s+")[n-1];
Note:
The argument passed to split() is the delimiter. In this case, "\\s+" is a regular expression that matches one or more whitespace characters, so the string is split on each run of whitespace, even when several whitespace characters appear together.
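For the edited part of the question (reading the number that follows a keyword such as health or mana), a regular expression with a capturing group may be simpler than counting words. The following is a minimal sketch, not a definitive implementation: the class name KeywordValue and the helper valueAfter are hypothetical, and it assumes the value is a non-negative integer small enough for an int, separated from the keyword by whitespace.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordValue {
    // Hypothetical helper: returns the first integer that follows `keyword`
    // and at least one whitespace character, or an empty OptionalInt.
    static OptionalInt valueAfter(String text, String keyword) {
        // Pattern.quote keeps any regex metacharacters in the keyword literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\s+(\\d+)");
        Matcher m = p.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        String page = ".....health 78 mana 32.....";
        System.out.println(valueAfter(page, "health").getAsInt()); // prints 78
        System.out.println(valueAfter(page, "mana").getAsInt());   // prints 32
    }
}
```

Returning an OptionalInt forces the caller to decide what to do when the keyword is missing from the page.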

Why not convert the String to a String array using StringName.split(" "), i.e. split the string on spaces? Then it is only a matter of retrieving the 124th element of the array (index 123).

For example, you have a string like this:
String a="Hello stackoverflow i am Gratin";
To print the 5th word, write:
System.out.println(a.split("\\s+")[4]);

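This is a different approach that avoids indexing into an array:
String word = input.replaceAll("^\\s*(\\S+\\s+){4}(\\S+)?.*", "$2");
The fifth word is captured by the second group, so the replacement is "$2" ("$1" would return the fourth word and the whitespace after it). If the input ends right after the fourth word and its trailing whitespace, the result is a blank String. With fewer words than that, the pattern does not match at all and replaceAll() returns the input unchanged, so short inputs still need a check.
Solutions that rely on split() need an extra step of checking the resulting array length to prevent an ArrayIndexOutOfBoundsException if there are fewer than 5 words.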
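The length check mentioned above can be wrapped in a small helper. This is a sketch under stated assumptions: the class name NthWord and the method nthWord are hypothetical, words are separated by whitespace, and an empty string is returned when the requested word does not exist.

```java
class NthWord {
    // Hypothetical helper: returns the n-th (1-based) whitespace-separated
    // word of s, or "" if s has fewer than n words.
    static String nthWord(String s, int n) {
        String trimmed = s.trim(); // avoids an empty first element from leading whitespace
        if (trimmed.isEmpty() || n < 1) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        return n <= words.length ? words[n - 1] : "";
    }

    public static void main(String[] args) {
        System.out.println(nthWord("Hello stackoverflow i am Gratin", 5)); // prints Gratin
        System.out.println(nthWord("only two", 5).isEmpty());              // prints true
    }
}
```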

Related

Count the number of words in a string

I want to count the number of words in a given String. For example, we are parsing a large text document.
I have used this method
noOfWords = countedStegoText.trim().split(" +").length;
But what if the text contains two kinds of spaces, for example U+0020 and U+205F? How can I count the number of words in that case?
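.split(...) takes a regular expression, so simply build one that includes all of your separator characters. Inside a character class the characters are just listed; a | there would be treated as a literal pipe.
For example:
"hello world-foo_bar".split("[ \\-_]")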
Results in an array of length 4
["hello", "world", "foo", "bar"]
To use Unicode characters in a regex you write \u####, and a trailing + treats a run of separators as one, so you're looking for something like:
countedStegoText.trim().split("[\\u0020\\u205F]+").length
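One detail worth checking is that String.trim() only strips characters up to U+0020, so a leading U+205F would still produce an empty first element. The sketch below counts only the non-empty pieces instead; the class name WordCount and the method countWords are hypothetical, and it assumes U+0020 and U+205F are the only separators.

```java
class WordCount {
    // Hypothetical helper: counts words separated by runs of U+0020 (space)
    // and/or U+205F (medium mathematical space).
    static int countWords(String text) {
        int count = 0;
        for (String part : text.split("[\\u0020\\u205F]+")) {
            // A leading separator yields one empty element; skip it.
            if (!part.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("one two\u205Fthree")); // prints 3
        System.out.println(countWords("\u205F a  b "));       // prints 2
    }
}
```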

Correct way to split UTF-8 String

I want to split a utf-8 string.
I have tried the StringTokenizer but it fails.
The title should be "0" but it shows as "عُدي_صدّام_حُسين".
String test = "en.m عُدي_صدّام_حُسين 1 0";
StringTokenizer stringTokenizer = new StringTokenizer(test);
String code = stringTokenizer.nextToken();
String title = stringTokenizer.nextToken();
What is the correct way to split a utf-8 string?
The problem here is that the Arabic text isn't "at the end" of the string.
For example, if I select the contents of the string literal (in Chrome), moving my mouse from left to right, it selects the en.m first, then all of the Arabic text, then the 0 1. The text only looks as if it were "at the end" because that is how it is rendered.
The string, as specified in your Java source code actually does have the عُدي_صدّام_حُسين as the second token. So, you're splitting it correctly, you're just not splitting what you think you're splitting.
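One way to confirm the logical order of the tokens, independent of how an editor or browser renders right-to-left text, is to print each token with its index. The sketch below makes two assumptions that are not in the original post: the class name TokenOrder is hypothetical, and a short Arabic word written with Unicode escapes stands in for the real title so that the order in the source code is unambiguous.

```java
class TokenOrder {
    // Splits on runs of whitespace; tokens come back in logical (storage) order.
    static String[] tokens(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Stand-in input: "en.m", a three-letter Arabic word, "1", "0".
        String test = "en.m \u0639\u062F\u064A 1 0";
        String[] parts = tokens(test);
        for (int i = 0; i < parts.length; i++) {
            System.out.println(i + ": " + parts[i]);
        }
    }
}
```

Token 1 is the Arabic word and token 3 is "0", whatever the display order looks like on screen.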
There is no single correct way, but I normally use the substring() method of the String class. You can pass it either a begin index, in which case it returns the substring from that index to the end of the original String, or both bounds of the substring. If you do not know the index of a character, you can locate it with the indexOf() method of the same class.

Splitting characters using regex returns an empty value in java

Here is my input
....ALPO..LAPOL.STRING
I want to separate the string at each '.' and store the pieces in a string array list.
I tried using the below code,
ArrayList<String> words = new ArrayList<> (Arrays.asList(chars.stream().map(String::valueOf)
.collect(Collectors.joining("")).split("\\.+")));
There is a problem with the regex in split("\\.+").
EXPECTED OUTPUT:
ALPO
LAPOL
STRING
ACTUAL OUTPUT:
" " - > Blank Space
LAPOL
STRING
It prints the first value of the list as an empty value because there are several '.' characters before the 'A'. How can I get rid of this empty value in the string array list? Any help would be appreciated!
The empty element appears because the delimiters are matched before the first value you need to get.
You need to remove the delimiter symbols from the start of the string first with .replaceFirst("^\\.+", "") and then split:
String results[] = "....ALPO..LAPOL.STRING".replaceFirst("^\\.+", "").split("\\.+");
System.out.println(Arrays.toString(results));
The ^\\.+ pattern matches the beginning of a string (^) and then 1 or more literal dots (\\.+). replaceFirst is used because only 1 replacement is expected (no need using replaceAll).
A bit more details on splitting in Java can be found in the documentation:
public String[] split(String regex)
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
However, leading empty elements will be included if found. So, we first need to get rid of those delimiters at the string start.
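An alternative to stripping the leading delimiters first is to split as-is and drop any empty pieces afterwards. This is only a sketch of that variation, not the answer's method; the class name DotSplit and the method splitOnDots are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DotSplit {
    // Hypothetical helper: splits on runs of dots and discards empty pieces,
    // such as the one produced by dots at the start of the input.
    static List<String> splitOnDots(String s) {
        return Arrays.stream(s.split("\\.+"))
                .filter(part -> !part.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitOnDots("....ALPO..LAPOL.STRING")); // prints [ALPO, LAPOL, STRING]
    }
}
```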

Substring containing words up to n characters

I've got a string and I want to get its first words, as many as fit into N characters in total.
For example:
String s = "This is some text form which I want to get some first words";
Let's say that I want to get words up to 30 characters, result should look like this:
This is some text form which
Is there any method for this? I don't want to reinvent the wheel.
EDIT: I know the substring method, but it can break words. I don't want to get something like
This is some text form whi
etc.
You could use regular expressions to achieve this. Something like below should do the job:
String input = "This is some text form which I want to get some first words";
Pattern p = Pattern.compile("(\\b.{25}[^\\s]*)");
Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(1));
}
This yields:
This is some text form which
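The pattern takes the first 25 characters and then extends the match to the end of the word it landed in ([^\\s]*), so no word is cut. I used 25 because a plain 25-character substring would end in the middle of "which"; you can replace it with whatever value you want. Note that the match can end up longer than the number you give.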
Split your string on the space character ' ', then append each piece to a new string for as long as the length of the new string stays within the limit.
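The split-and-accumulate idea could look roughly like this. It is a sketch under assumptions: the class name FirstWords and the method firstWords are hypothetical, words are joined with a single space, and "up to N characters" is read as a total length of at most N.

```java
class FirstWords {
    // Hypothetical helper: returns the longest prefix of whole words whose
    // total length (including single separating spaces) is at most maxLength.
    static String firstWords(String s, int maxLength) {
        StringBuilder result = new StringBuilder();
        for (String word : s.trim().split("\\s+")) {
            // One extra character for the space, except before the first word.
            int extra = (result.length() == 0 ? 0 : 1) + word.length();
            if (result.length() + extra > maxLength) {
                break;
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(word);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(firstWords(s, 29)); // prints This is some text form which
        System.out.println(firstWords(s, 30)); // prints This is some text form which I
    }
}
```

With a limit of exactly 30 the one-letter word "I" still fits, so the result is one word longer than the sample in the question; a limit of 28 or 29 reproduces that sample.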
You could also do it like this, without regex:
String s = "This is some text form which I want to get some first words";
// Find the first space at or after index 28
int index = s.indexOf(' ', 29 - 1);
System.out.println(s.substring(0, index));
The output is: This is some text form which
Obligatory edit: there is no length check in there, and indexOf() returns -1 when no space is found, so take care of that.
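The missing check might be added along these lines. This is a sketch, with the hypothetical names CutAtSpace and cutAtSpace, and it assumes that returning the whole string is acceptable when there is no space at or after the given index.

```java
class CutAtSpace {
    // Hypothetical helper: cuts s at the first space at or after fromIndex.
    // String.indexOf returns -1 when nothing is found (including when
    // fromIndex is past the end), in which case s is returned unchanged.
    static String cutAtSpace(String s, int fromIndex) {
        int index = s.indexOf(' ', fromIndex);
        return index < 0 ? s : s.substring(0, index);
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(cutAtSpace(s, 28));       // prints This is some text form which
        System.out.println(cutAtSpace("short", 10)); // prints short
    }
}
```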

take characters after spaces in java

I have a string and it has two words with 3 spaces between them. For example: "Hello   Java". I need to extract "Java" from this string. I searched for regex and tokens, but couldn't find a solution.
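All you have to do is split the string on whitespace and take the second element (index 1), since Java is the second word. You said it has 3 spaces, so split on a run of spaces rather than on a single one:
string.split(" +")[1];
A plain split(" ") would produce empty strings between the consecutive spaces, and index 1 would be one of them. With " +", if string is equal to Hello Java with three spaces, this will work.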
There are many ways you can do this, but which you choose depends on how you may expect your input to vary. If you can assume there will always be exactly 3 spaces in the string, all sequential, then just use the indexOf method to locate the first space, add 3 to that index, and take a substring with the resulting value. If you're unsure how many sequential spaces there will be, use lastIndexOf and add 1. You can also use the split method mentioned in another solution.
For instance:
String s = "Hello   Java";
s = s.substring(s.lastIndexOf(" ")+1);
System.out.println(s);
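To see how the split variants behave on input with three consecutive spaces, the following sketch prints the resulting arrays side by side. The class name ThreeSpaces and the helper lastWord are hypothetical; lastWord assumes the wanted word is the last whitespace-separated token.

```java
import java.util.Arrays;

class ThreeSpaces {
    // Hypothetical helper: returns the last whitespace-separated word of s.
    static String lastWord(String s) {
        String[] parts = s.trim().split("\\s+");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String s = "Hello   Java"; // three spaces between the words
        // A single-space delimiter leaves empty strings between adjacent spaces.
        System.out.println(Arrays.toString(s.split(" ")));  // prints [Hello, , , Java]
        // A run-of-spaces delimiter treats the three spaces as one separator.
        System.out.println(Arrays.toString(s.split(" +"))); // prints [Hello, Java]
        System.out.println(lastWord(s));                    // prints Java
    }
}
```

Unlike the lastIndexOf variant, lastWord trims first, so trailing whitespace in the input does not lead to an empty result.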
