I want to count the number of words in a given String. For example, we are parsing a large text document.
I have used this method
noOfWords = countedStegoText.trim().split(" +").length;
But what if the text contains two kinds of space characters, for example U+0020 and U+205F? How can I count the number of words in that case?
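.split(...) takes a regular expression, so simply build one that includes all the characters you want to split on. Inside a character class [...] the characters are listed directly; a | there is matched literally, not treated as "or".
For example:
"hello world-foo_bar".split("[ \\-_]")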
Results in an array of length 4
["hello", "world", "foo", "bar"]
To match Unicode characters in a regex you can use \u#### escapes, so you're looking for something like the following (the trailing + treats a run of separators as a single delimiter):
countedStegoText.trim().split("[\u0020\u205F]+").length
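A minimal sketch of that idea as a reusable helper, in case it is useful. The class and method names (WordCounter, countWords) are made up for illustration. It skips empty tokens instead of relying on trim(), because trim() only strips characters up to U+0020 and would leave a leading U+205F in place.

```java
import java.util.regex.Pattern;

final class WordCounter {
    // One or more separators: U+0020 (space) or U+205F (medium mathematical space).
    private static final Pattern SEPARATORS = Pattern.compile("[\\u0020\\u205F]+");

    // Counts the non-empty tokens between separator runs.
    static int countWords(String text) {
        int count = 0;
        for (String token : SEPARATORS.split(text)) {
            if (!token.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Words separated by a mix of both space characters.
        System.out.println(countWords("one two\u205F\u205Fthree  four")); // prints 4
    }
}
```

Other separators can be added to the character class in the same way.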
I've got a string and I want to get its first words, as many as fit within N characters in total.
For example:
String s = "This is some text form which I want to get some first words";
Let's say that I want to get words up to 30 characters, result should look like this:
This is some text form which
Is there any method for this? I don't want to reinvent the wheel.
EDIT: I know the substring method, but it can break words. I don't want to get something like
This is some text form whi
etc.
You could use regular expressions to achieve this. Something like below should do the job:
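import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "This is some text form which I want to get some first words";
// Match 25 characters starting at the first word boundary, then continue to the end of the current word
Pattern p = Pattern.compile("(\\b.{25}[^\\s]*)");
Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(1));
}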
This yields:
This is some text form which
I used 25 because cutting at exactly 25 characters would land in the middle of a word; the trailing [^\\s]* then extends the match to the end of that word. You can replace 25 with whatever value you want, keeping in mind that the result can be a few characters longer than that value.
Split your string on the space character ' ', then append each substring to a new string in turn, checking each time whether the new string's length would exceed the limit.
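A rough sketch of that split-and-accumulate idea, under the assumption that words should be rejoined with single spaces. SplitTruncator and firstWords are hypothetical names, not anything from the JDK.

```java
final class SplitTruncator {
    // Appends whole words until adding the next one would exceed maxChars.
    static String firstWords(String text, int maxChars) {
        StringBuilder result = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // A joining space is needed before every word except the first.
            int needed = result.length() == 0 ? word.length() : word.length() + 1;
            if (result.length() + needed > maxChars) {
                break;
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(word);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(firstWords(s, 29)); // prints: This is some text form which
    }
}
```

With a limit of 30 this sketch also keeps the one-letter word "I", because "This is some text form which I" is exactly 30 characters long.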
You could do it like this without regex:
String s = "This is some text form which I want to get some first words";
// Find the first space at or after index 28 (the 29th character)
int index = s.indexOf(' ', 29 - 1);
System.out.println(s.substring(0, index));
The output is: This is some text form which
Obligatory edit: there is no length check in there, so take care of it yourself. indexOf returns -1 when there is no space at or after that index, and substring then throws a StringIndexOutOfBoundsException.
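One possible way to add the missing checks, sketched with lastIndexOf so the result never exceeds the limit. The names IndexTruncator and firstWords are illustrative only, and the sketch assumes words are separated by single spaces.

```java
final class IndexTruncator {
    // Returns the longest prefix of whole words that fits in maxChars characters.
    static String firstWords(String text, int maxChars) {
        if (text.length() <= maxChars) {
            return text; // nothing to cut
        }
        // Last space at or before index maxChars; -1 if even the first word is too long.
        int cut = text.lastIndexOf(' ', maxChars);
        return cut < 0 ? "" : text.substring(0, cut);
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(firstWords(s, 29)); // prints: This is some text form which
    }
}
```

Returning an empty string when no word fits is a design choice for this sketch; throwing an exception or returning a hard-cut substring would be equally reasonable.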
I have a string and it has two words with 3 spaces between them. For example: "Hello   Java". I need to extract "Java" from this string. I searched for regex and tokens, but can't find a solution.
All you have to do is split the string on the run of spaces and take the element at index 1, since Java is the second word. Splitting on a single space would produce empty strings for the gaps between the three spaces, so use a pattern that matches one or more spaces:
string.split(" +")[1];
In your case, if string is equal to "Hello   Java" with three spaces, this returns "Java".
There are many ways you can do this, but which you choose depends on how you may expect your input to vary. If you can assume there will always be exactly 3 spaces in the string, all sequential, then just use the indexOf method to locate the first space, add 3 to that index, and take a substring with the resulting value. If you're unsure how many sequential spaces there will be, use lastIndexOf and add 1. You can also use the split method mentioned in another solution.
For instance:
String s = "Hello   Java";
s = s.substring(s.lastIndexOf(" ")+1);
System.out.println(s);
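To compare the three options side by side, here is a small sketch. The class AfterSpaces and its method names are invented for this example, and each method assumes the input really has the shape it expects (at least one space, two words).

```java
final class AfterSpaces {
    // Assumes exactly `gap` consecutive spaces follow the first word.
    static String afterFixedGap(String s, int gap) {
        return s.substring(s.indexOf(' ') + gap);
    }

    // Works for any number of consecutive spaces before the last word.
    static String afterLastSpace(String s) {
        return s.substring(s.lastIndexOf(' ') + 1);
    }

    // Splits on runs of spaces and takes the second word.
    static String secondWord(String s) {
        return s.trim().split(" +")[1];
    }

    public static void main(String[] args) {
        String s = "Hello   Java"; // three spaces
        System.out.println(afterFixedGap(s, 3)); // prints: Java
        System.out.println(afterLastSpace(s));   // prints: Java
        System.out.println(secondWord(s));       // prints: Java
    }
}
```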
I want to remove a numeric value plus the words after it.
I have some Strings like the ones below. When I apply my regex, it only removes the numeric value; the characters after that value are not removed. I want those to be deleted as well.
Before:
APPLEJUICE2.4L
GreenAppleJuice1L
HALVEDPEACHES415g
IceyChocIceCream60ml
After:
APPLEJUICE
GreenAppleJuice
HALVEDPEACHES
IceyChocIceCream
You can use this regex:
String repl = input.replaceFirst("\\d\\S*", "");
That is, find the first digit in the input and match every non-whitespace character after it.
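As a quick check against the sample strings from the question, a sketch like the following could be used. QuantityStripper and stripQuantity are hypothetical names wrapping the same replaceFirst call.

```java
final class QuantityStripper {
    // Removes the first digit and every non-whitespace character that follows it.
    static String stripQuantity(String input) {
        return input.replaceFirst("\\d\\S*", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "APPLEJUICE2.4L", "GreenAppleJuice1L", "HALVEDPEACHES415g", "IceyChocIceCream60ml"
        };
        for (String sample : samples) {
            System.out.println(stripQuantity(sample));
        }
    }
}
```

Note that a product name which itself contains a digit would be cut at that digit, so this only fits inputs shaped like the examples.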
Say I have a string holding the text of a document and I want to save the 124th word of that string in another string. How would I do this? I assume each "word" is a run of text between spaces (including things like hyphens).
Edit:
Basically, what I'm doing right now is grabbing text off a webpage, and I want to get the number values next to a certain word. For example, the webpage is saved in a string like .....health 78 mana 32..... and I want to get the 78 and the 32 and save each as a variable.
If you have a string
String s = "...";
then you can get the word (separated by spaces) in the nth position using split(delimiter) which returns an array String[]:
String word = s.split("\\s+")[n-1];
Note:
The argument passed to split() is the delimiter. In this case, "\\s+" is a regular expression that matches one or more whitespace characters, so the string is split on each run of whitespace, even when several whitespace characters appear together.
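To make the difference concrete, here is a small sketch comparing a single-space delimiter with "\\s+" on text that contains a double space. SplitDelimiters and its two methods are illustrative names only.

```java
import java.util.Arrays;

final class SplitDelimiters {
    // Splits on every single space; consecutive spaces yield empty strings.
    static String[] bySingleSpace(String text) {
        return text.split(" ");
    }

    // Splits on runs of whitespace; consecutive spaces count as one delimiter.
    static String[] byWhitespaceRuns(String text) {
        return text.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "one  two"; // two spaces between the words
        System.out.println(Arrays.toString(bySingleSpace(text)));    // prints: [one, , two]
        System.out.println(Arrays.toString(byWhitespaceRuns(text))); // prints: [one, two]
    }
}
```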
Why not convert the String to a String array using StringName.split(" "), i.e. split the string on spaces? Then it is only a matter of retrieving the 124th element of the array (index 123).
For example, you have a string like this:
String a="Hello stackoverflow i am Gratin";
To get the 5th word, just write:
System.out.println(a.split("\\s+")[4]);
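This is a different approach that avoids indexing into an array. It returns a blank String if the text ends right after the fourth word and its trailing whitespace; if the input does not contain at least four words each followed by whitespace, the pattern does not match and the input is returned unchanged:
String word = input.replaceAll("^\\s*(\\S+\\s+){4}(\\S+)?.*", "$2");
Solutions that rely on split() need an extra check of the resulting array's length to avoid an ArrayIndexOutOfBoundsException if there are fewer than 5 words.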
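For the split() route, the length check could look something like this sketch. NthWord and nthWord are hypothetical names, and returning an empty string for a missing word is an assumption about what the caller wants.

```java
final class NthWord {
    // Returns the n-th word (1-based), or "" if the text has fewer than n words.
    static String nthWord(String text, int n) {
        String[] words = text.trim().split("\\s+");
        if (n < 1 || n > words.length) {
            return "";
        }
        return words[n - 1];
    }

    public static void main(String[] args) {
        String a = "Hello stackoverflow i am Gratin";
        System.out.println(nthWord(a, 5)); // prints: Gratin
        System.out.println(nthWord(a, 6)); // prints an empty line
    }
}
```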
Hi, in my program a string is generated like "1&area_id=54&cid=3": first an integer, then the string "&area_id=", then another integer, then the string "&cid=", and then the final integer. The two strings are always the same, but the integers change. How can I write a split call so that I get these three integers in three variables or in an array? I can separate them by looping, but I want to use the split function. Thanks.
How about
string.split("&\\w+=")
This works with your example:
System.out.println(Arrays.asList("1&area_id=54&cid=3".split("&\\w+=")));
outputs
[1, 54, 3]
The call to string.split("&\\w+=") reads in English: Split string on every match for the regular expression parameter, and then return all substrings in between the matched tokens as an array.
The regular expression reads: match every substring that starts with "&", followed by one or more ("+") word characters ("\\w", i.e. letters, digits, and the underscore, as in area_id from your example), followed by "=". For more details see the Javadoc for java.util.regex.Pattern.
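If the pieces are needed as numbers rather than strings, a sketch along these lines may help. IdParser and parseIds are made-up names, and the code assumes every piece between the delimiters is a valid integer (Integer.parseInt throws a NumberFormatException otherwise).

```java
import java.util.Arrays;

final class IdParser {
    // Splits on "&name=" delimiters and converts each remaining piece to an int.
    static int[] parseIds(String input) {
        String[] parts = input.split("&\\w+=");
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Integer.parseInt(parts[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] ids = parseIds("1&area_id=54&cid=3");
        System.out.println(Arrays.toString(ids)); // prints: [1, 54, 3]
    }
}
```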