Substring containing words up to n characters - java

I have a string and I want to get the first words that together contain up to N characters.
For example:
String s = "This is some text form which I want to get some first words";
Let's say I want to get words up to 30 characters in total; the result should look like this:
This is some text form which
Is there any method for this? I don't want to reinvent the wheel.
EDIT: I know about the substring method, but it can cut words in half. I don't want to get something like
This is some text form whi
etc.

You could use regular expressions to achieve this. Something like the following should do the job:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "This is some text form which I want to get some first words";
Pattern p = Pattern.compile("(\\b.{25}[^\\s]*)");
Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(1));
}
This yields:
This is some text form which
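Here \b starts the match at a word boundary, .{25} takes the next 25 characters, and [^\s]* extends the match to the end of the word those characters cut into. I used 25 because cutting at exactly 25 characters would break a word ("This is some text form wh"); you can replace it with whatever value you want. Note that the result can be longer than that value, since the last word is always completed.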
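As a sketch, the same regex can be wrapped in a method that takes the limit as a parameter. The class and method names are made up for illustration, and the sketch assumes single-line input with n >= 0, since . does not match line terminators by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFirstWords {
    // Returns the first n characters of s, extended to the end of the word
    // that the cut would otherwise break. The result can be longer than n.
    // Assumes n >= 0 and single-line input ('.' does not match line terminators).
    static String firstWords(String s, int n) {
        if (s.length() <= n) {
            return s; // nothing needs to be cut
        }
        // \b: start at a word boundary; .{n}: take n characters;
        // [^\s]*: continue to the end of the word being cut (may match nothing)
        Pattern p = Pattern.compile("\\b.{" + n + "}[^\\s]*");
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : s;
    }

    public static void main(String[] args) {
        String input = "This is some text form which I want to get some first words";
        System.out.println(firstWords(input, 25)); // This is some text form which
    }
}
```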

Split your string on spaces, then append each word to a new string, checking each time whether the new string's length would exceed the limit.
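A minimal sketch of this split-and-append idea, with hypothetical class and method names. Unlike the regex answer, it never exceeds the limit, so with a limit of exactly 30 the one-letter word "I" still fits (the result is 30 characters); a limit of 29 gives the output from the question. Runs of whitespace are collapsed to single spaces.

```java
class SplitFirstWords {
    // Joins whole words from the start of s while the total length stays within maxLength.
    static String firstWords(String s, int maxLength) {
        StringBuilder result = new StringBuilder();
        for (String word : s.trim().split("\\s+")) {
            // A separating space is needed before every word except the first
            int needed = result.length() == 0 ? word.length() : word.length() + 1;
            if (result.length() + needed > maxLength) {
                break; // the next word would exceed the limit
            }
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(word);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(firstWords(s, 30)); // This is some text form which I
        System.out.println(firstWords(s, 29)); // This is some text form which
    }
}
```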

You could do it like this, without regex:
String s = "This is some text form which I want to get some first words";
// Find the first space at or after index 28 (29 - 1), i.e. the end of the word crossing that position
int index = s.indexOf(' ', 29-1);
System.out.println(s.substring(0,index));
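The output is: This is some text form which
Obligatory edit: there is no length check here. If no space is found at or after the start index (for example, when the string is shorter than that), indexOf returns -1 and substring throws a StringIndexOutOfBoundsException, so guard against that case.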
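One way to add that guard is sketched below; the helper name cutAtSpace is hypothetical, and the answer's hard-coded 29 - 1 becomes its fromIndex parameter.

```java
class IndexOfFirstWords {
    // Cuts s at the first space found at or after fromIndex, so the word
    // containing fromIndex (if any) is kept whole.
    static String cutAtSpace(String s, int fromIndex) {
        int index = s.indexOf(' ', fromIndex);
        // indexOf returns -1 when there is no space at or after fromIndex,
        // including when fromIndex >= s.length(); substring(0, -1) would throw,
        // so return the whole string instead
        return index == -1 ? s : s.substring(0, index);
    }

    public static void main(String[] args) {
        String s = "This is some text form which I want to get some first words";
        System.out.println(cutAtSpace(s, 29 - 1));       // This is some text form which
        System.out.println(cutAtSpace("short", 29 - 1)); // short
    }
}
```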

Related

Fix issue with regex expression getting first word in string in Java?

I am running a simple Java application and I want to grab the very first word of each string that I pass in. I used str.split(" "), but the strings I get are very dynamic and constantly changing. Sometimes a comma is tacked on to the first word, and once I pass that as a parameter it crashes my code. So I am trying to find a regular expression that grabs just the first word, up until a space, comma, period, etc.
Example String:
WebServer, Config where AppConfig.display ends-with '/conf/workers.properties' and Config.content.content contains 'worker.lyc_' and Config.parent.guid==WebServer.guid and exists(WebServer.container.virtualHosts.serverName contains 'www.laffatservices.gix.com')
My goal is to grab just WebServer.

If your goal is to grab only the first word, then here is your regex: ^[A-Za-z]{2,}
Note that this matches a sequence of at least two uppercase or lowercase letters, and only at the start of the string.
Hope that helps your case.
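A sketch of how that regex could be used with Matcher.find(); the class and method names are illustrative. Because of the ^ anchor, input that starts with a comma or space yields no match, and digits or underscores end the word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWordLetters {
    // ^ anchors at the start of the input; [A-Za-z]{2,} needs two or more ASCII letters
    private static final Pattern LEADING_WORD = Pattern.compile("^[A-Za-z]{2,}");

    // Returns the leading word, or null if the text does not start with one
    static String firstWord(String text) {
        Matcher m = LEADING_WORD.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstWord("WebServer, Config where AppConfig.display")); // WebServer
        System.out.println(firstWord(",WebServer")); // null
    }
}
```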

If you want to find() the first word made of characters that are NOT commas, whitespace, etc., you can use a negated character class [^...] and place inside it all the characters you don't want. This creates a character class that accepts every other character.
In your case you can use code like:
String yourText = "WebServer, Config where AppConfig.display ends-with '/conf/workers.properties' and Config.content.content contains 'worker.lyc_' and Config.parent.guid==WebServer.guid and exists(WebServer.container.virtualHosts.serverName contains 'www.laffatservices.gix.com')";
Pattern p = Pattern.compile("[^,\\s]+");
Matcher m = p.matcher(yourText);
if (m.find()){
String firstMatch = m.group();
System.out.println(firstMatch);
}else{
//handle case where there is no match found.
}
Output: WebServer
NOTE: Using find() also lets you find the word even when your text starts with characters you want to skip; for example, with String yourText = ",,,foo,,"; the result would be foo.
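The snippet above can be turned into a reusable method that returns an Optional instead of leaving the no-match branch as a comment. This is only a sketch; FirstToken and firstToken are illustrative names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstToken {
    // One or more characters that are neither commas nor whitespace
    private static final Pattern TOKEN = Pattern.compile("[^,\\s]+");

    // find() skips any leading commas/whitespace before the first token
    static Optional<String> firstToken(String text) {
        Matcher m = TOKEN.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstToken("WebServer, Config where AppConfig.display").orElse("<none>")); // WebServer
        System.out.println(firstToken(",,,foo,,").orElse("<none>")); // foo
        System.out.println(firstToken(" , ,").orElse("<none>"));     // <none>
    }
}
```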

Correct way to split UTF-8 String

I want to split a UTF-8 string.
I have tried StringTokenizer, but it fails.
The title should be "0" but it shows as "عُدي_صدّام_حُسين".
String test = "en.m عُدي_صدّام_حُسين 1 0";
StringTokenizer stringTokenizer = new StringTokenizer(test);
String code = stringTokenizer.nextToken();
String title = stringTokenizer.nextToken();
What is the correct way to split a utf-8 string?

The problem here is that the Arabic text isn't "at the end" of the string.
For example, if I select the contents of the string literal (in Chrome) by moving my mouse from left to right, it selects en.m first, then all of the Arabic text, then the 0 1. The text only looks "at the end" because of how it is rendered.
The string, as specified in your Java source code actually does have the عُدي_صدّام_حُسين as the second token. So, you're splitting it correctly, you're just not splitting what you think you're splitting.
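A quick way to check this is to print each token with its index, so the order is unambiguous even if the console renders Arabic right-to-left. This is a sketch with illustrative names; the source file must be compiled as UTF-8 (the default since JDK 18; on older JDKs pass javac -encoding UTF-8).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenOrder {
    // Collects tokens in the order they are stored in the String (logical order),
    // which is not necessarily the order in which a bidi-aware display shows them.
    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "en.m عُدي_صدّام_حُسين 1 0";
        List<String> parts = tokens(test);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println(i + ": " + parts.get(i));
        }
        System.out.println(parts.get(3)); // 0 (the fourth token, not the second)
    }
}
```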

Generally, there is no single correct way, but I normally use the substring() method of the String class. You can pass it either a begin index, in which case it returns the substring from that index to the end of the original String, or both bounds of the substring within the original String. With the indexOf() method of the same class you can locate a character within the original String if you do not know its index.
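A sketch of this substring()/indexOf() approach, assuming tokens are separated by single spaces; the class and method names are hypothetical.

```java
class SubstringTokens {
    // Returns the n-th token (0-based) of s, assuming tokens are separated by
    // single spaces; returns null if there are not enough tokens.
    static String tokenAt(String s, int n) {
        int start = 0;
        for (int i = 0; i < n; i++) {
            int space = s.indexOf(' ', start); // separator after token i
            if (space == -1) {
                return null;
            }
            start = space + 1;
        }
        int end = s.indexOf(' ', start);
        // The one-argument substring runs to the end of the String
        return end == -1 ? s.substring(start) : s.substring(start, end);
    }

    public static void main(String[] args) {
        String test = "en.m title 1 0";
        System.out.println(tokenAt(test, 0)); // en.m
        System.out.println(tokenAt(test, 3)); // 0
    }
}
```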

String split method returning first element as empty using regex

I'm trying to get the digits from the expression [1..1] using Java's split method. I'm using the regex ^\\[|\\.{2}|\\]$ inside split, but split returns a String array whose first value is empty, followed by "1" at index 1 and "1" at index 2. Could anyone please tell me what I'm doing wrong in this regex, so that I get only the digits in the array returned by split?

You should use matching. Change your expression to:
`^\[(.*?)\.\.(.*)\]$`
And get your results from the two captured groups.
As for why split acts this way, it's simple: you asked it to split on the [ character, but there's still an "empty string" between the start of the string and the first [ character.
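A sketch of the matching approach with that expression (written as a Java string literal); RangeGroups and bounds are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeGroups {
    // Java string form of ^\[(.*?)\.\.(.*)\]$
    private static final Pattern RANGE = Pattern.compile("^\\[(.*?)\\.\\.(.*)\\]$");

    // Returns the two captured bounds, or null if the input is not of the form [a..b]
    static String[] bounds(String s) {
        Matcher m = RANGE.matcher(s);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] b = bounds("[1..1]");
        System.out.println(b[0] + " " + b[1]); // 1 1
    }
}
```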

Your regex matches [, .., and ], so split splits at each of these occurrences.
You should not use split here; instead, match each number in your string using a regex.
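Matching each number can be sketched with a find() loop over \d+; the names are illustrative, and values are assumed to fit in an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeNumbers {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every run of digits, ignoring the brackets and dots around them
    static List<Integer> numbers(String s) {
        List<Integer> result = new ArrayList<>();
        Matcher m = NUMBER.matcher(s);
        while (m.find()) {
            result.add(Integer.parseInt(m.group()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(numbers("[1..1]")); // [1, 1]
    }
}
```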

You've set it up such that [, ] and .. are delimiters. Split returns an empty first element because the first character of your string [1..1] is a delimiter. I would strip delimiters from the front and end of your string.
So, something like
input.replaceFirst("^\\[", "").split("^\\[|\\.{2}|\\]$");
Or, use regex and regex groups (such as the other answers in this question) more directly rather than through split.
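A sketch comparing the original split with the strip-then-split version; the class and method names are hypothetical. Note that "^[" on its own would not compile as a regex (unclosed character class), so the bracket is escaped as "^\\[".

```java
import java.util.Arrays;

class StripThenSplit {
    // Removing the leading '[' first means split() no longer starts with a delimiter,
    // so no empty first element is produced. The trailing ']' still matches \]$,
    // but split(String) discards trailing empty strings.
    static String[] splitRange(String input) {
        return input.replaceFirst("^\\[", "").split("^\\[|\\.{2}|\\]$");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("[1..1]".split("^\\[|\\.{2}|\\]$"))); // [, 1, 1]
        System.out.println(Arrays.toString(splitRange("[1..1]")));              // [1, 1]
    }
}
```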

Why not use a regex to capture the numbers? This is more effective and less error-prone. In that case the regex looks like:
^\[(\d+)\.{2}(\d+)\]$
And you can capture them with:
Pattern pat = Pattern.compile("^\\[(\\d+)\\.{2}(\\d+)\\]$");
Matcher matcher = pat.matcher(text);
if (matcher.find()) { // we've found a match
    int range_from = Integer.parseInt(matcher.group(1));
    int range_to = Integer.parseInt(matcher.group(2));
}
range_from and range_to are then the integers you can work with.
The advantage is that the pattern fails on strings that don't make much sense, like ..3[4.
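To see that rejection in action, the snippet can be wrapped in a method; parseRange is a hypothetical name, and returning null for malformed input is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RangeParser {
    private static final Pattern RANGE = Pattern.compile("^\\[(\\d+)\\.{2}(\\d+)\\]$");

    // Returns {from, to}, or null when the text is not exactly "[digits..digits]".
    // Integer.parseInt throws NumberFormatException for values beyond int range.
    static int[] parseRange(String text) {
        Matcher matcher = RANGE.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(matcher.group(1)), Integer.parseInt(matcher.group(2)) };
    }

    public static void main(String[] args) {
        int[] r = parseRange("[1..1]");
        System.out.println(r[0] + ".." + r[1]);          // 1..1
        System.out.println(parseRange("..3[4") == null); // true
    }
}
```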

Check if a string has a word followed by a number

I have strings like:
String test="top 10 products";
String test2="show top 10 products";
Is there a way to check if the word "top" has a number following it? If so, how to get that number to another string?
I'm thinking about using indexOf("top"), adding 4 to that, and trying to get the next word. Not sure how it will work. Any suggestions?

If you only want to extract a possible number after a single (or the first) occurrence of "top", that's a viable approach. Don't forget to check that the word exists and that something actually follows it.
You can also use regular expression for this, which will need a bit less error checking:
top\\s+([0-9]+)
You could even make a Pattern out of this, then call Matcher.find() in a loop to extract the numbers from multiple matches:
Pattern pat = Pattern.compile("top\\s+([0-9]+)");
Matcher matcher = pat.matcher("top 10 products or top 20 products");
while (matcher.find()) {
System.out.println(matcher.group(1));
}
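The indexOf() approach from the question, with the checks recommended in this answer, might be sketched like this. The method name is hypothetical; it assumes exactly one space after "top" and, like plain indexOf, does not check that "top" starts a word, so "stop 5" would also match.

```java
class NumberAfterTop {
    // Returns the digits right after the first "top " in text, or null.
    static String numberAfterTop(String text) {
        int pos = text.indexOf("top ");
        if (pos == -1) {
            return null; // the word is not there
        }
        int start = pos + 4; // skip "top " (the +4 from the question)
        int end = start;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++;
        }
        // end == start means nothing numeric follows "top "
        return end > start ? text.substring(start, end) : null;
    }

    public static void main(String[] args) {
        System.out.println(numberAfterTop("show top 10 products")); // 10
    }
}
```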

An evil regex can help you.
String test="top 10 products";
System.out.println(test.replaceAll(".*?\\w+\\s+(\\d+).*", "$1"));
Output:
10
Note: if the String contains no "word, whitespace, digits" sequence, replaceAll finds no match and returns the entire String unchanged. You will have to compare the length of the returned String with that of the original; if the lengths are the same, your String doesn't contain the expected pattern. Also note that the regex accepts any word followed by a number, not only "top".
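A sketch of that check as a method (hypothetical name). It compares with equals() rather than length; this works because a successful replacement always makes the string shorter. It assumes single-line input, since . does not match line breaks.

```java
class NumberAfterWord {
    // Returns the digits following the first "word, whitespace, digits" sequence,
    // or null if there is none. Assumes single-line input.
    static String numberAfterWord(String text) {
        String result = text.replaceAll(".*?\\w+\\s+(\\d+).*", "$1");
        // Without a match, replaceAll returns the input unchanged
        return result.equals(text) ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(numberAfterWord("top 10 products")); // 10
        System.out.println(numberAfterWord("top products"));    // null
    }
}
```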

How do I get the 5th word in a string? Java

Say I have a text document stored in a string and I want to save the 124th word of that string in another string. How would I do this? I assume a "word" is any run of text between spaces (including things like hyphens).
Edit:
Basically, what I'm doing right now is grabbing text off a webpage, and I want to get the number values next to a certain word. The page is saved in a string like .....health 78 mana 32..... or something of the sort, and I want to get the 78 and the 32 and save them as variables.

If you have a string
String s = "...";
then you can get the word (separated by spaces) in the nth position using split(delimiter) which returns an array String[]:
String word = s.split("\\s+")[n-1];
Note:
The argument passed to split() is the delimiter. In this case, "\\s+" is a regular expression meaning one or more whitespace characters, so the string is split on each run of whitespace, even when several whitespace characters appear together.
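Two details are worth guarding against: leading whitespace makes split("\\s+") return an empty String as its first element, and asking for a word beyond the end throws ArrayIndexOutOfBoundsException. A sketch with a hypothetical nthWord helper (1-based n, null when there is no such word):

```java
class NthWord {
    // Returns the n-th word (1-based) of s, or null if there are fewer than n words.
    static String nthWord(String s, int n) {
        // Trim first: a leading whitespace run would otherwise produce an empty first element
        String trimmed = s.trim();
        if (trimmed.isEmpty() || n < 1) {
            return null;
        }
        String[] words = trimmed.split("\\s+");
        // Bounds check instead of risking ArrayIndexOutOfBoundsException
        return n <= words.length ? words[n - 1] : null;
    }

    public static void main(String[] args) {
        System.out.println(nthWord("Hello stackoverflow i am Gratin", 5)); // Gratin
    }
}
```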

Why not convert the String to a String array using StringName.split(" "), i.e. split the string on spaces? Then it's only a matter of retrieving the 124th element of the array (index 123).

For example, given a string like this:
String a="Hello stackoverflow i am Gratin";
to get the 5th word (index 4), write:
System.out.println(a.split("\\s+")[4]);

This is a different approach that automatically returns a blank String if there isn't a 5th word:
String word = input.replaceAll("(?s)^(?:\\s*(?:\\S+\\s+){4}(\\S+))?.*", "$1");
The optional group captures the 5th word; if there are fewer than five words it fails, .* consumes the whole input, and the unmatched group is replaced by an empty string. Solutions that rely on split() need an extra step of checking the resulting array length to avoid an ArrayIndexOutOfBoundsException when there are fewer than 5 words.
