Regex: Looking for dots in a sentence except inside brackets - java

I'm looking for a regex to split a Java string on dots, except when the dots are inside brackets.
For instance, in this example sentence:
word1.word2.word3[word4.word5[word6.word7]].word8
I would like to split only on the first two dots and the last one (just before "word8").
I managed to get to this regex:
\.(?![^\[]*?\])
But it's not good enough, as it also splits on the dot between words 4 and 5 :-(
Any idea how to solve this particular case?

By looking at PerlMonks discussions I don't think the problem can be solved in Java by a single regex.
If you are okay with using multiple steps, you could first remove all bracket pairs along with their contents (starting with the innermost) and then split the remaining string on dots:
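import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "word1.word2.word3[word4.word5[word6.word7]].word8";
        // Matches an innermost bracket pair: no '[' or ']' between the brackets.
        final Pattern BRACKET_PAIR = Pattern.compile("\\[[^\\[\\]]+\\]");
        // Remove innermost pairs repeatedly until no bracket pair is left.
        while (BRACKET_PAIR.matcher(str).find()) {
            str = BRACKET_PAIR.matcher(str).replaceFirst("");
        }
        for (String word : str.split("\\.")) {
            System.out.println(word);
        }
    }
}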
Resulting in the output:
word1
word2
word3
word8
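Note that the approach above throws away the bracketed text. If the goal is instead to keep it attached to its token (so that word3[word4.word5[word6.word7]] stays in one piece), one option is to skip regex and walk the string while tracking the nesting depth. This is only a sketch: the class and method names are made up for illustration, and it assumes the brackets are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class BracketAwareSplit {
    // Hypothetical helper: splits on '.' only when not inside [ ... ].
    static List<String> splitOutsideBrackets(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // current bracket nesting level
        for (char c : input.toCharArray()) {
            if (c == '[') {
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
            }
            if (c == '.' && depth == 0) {
                // A top-level dot ends the current token.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String str = "word1.word2.word3[word4.word5[word6.word7]].word8";
        for (String part : splitOutsideBrackets(str)) {
            System.out.println(part);
        }
    }
}
```

For the example string this prints word1, word2, word3[word4.word5[word6.word7]] and word8, one per line.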

Related

Splitting a string by space, dot and comma at the same time

How can I split a string by space, dot and comma at the same time? I want to get rid of them and get words only.
My code for space:
str=array.get(0).split(" ");
After some advice I wrote this:
str=array.get(0).split("[ ]|[.]|[,]|[ \t]");
but I see a new problem.
The split method accepts a regex pattern, so you can match more elaborate cases when splitting your string.
A matching pattern for your case would be:
[ .,]+
Regex explanation:
[ .,] - The brackets create a character set, which matches any single character in the set (space, dot or comma).
+ - The plus sign is a quantifier: it matches the previous token (the character set) one or more times. This solves the problem of consecutive delimiters producing empty strings in the array.
You can test it with the following code:
class Main {
    public static void main(String[] args) {
        String str = "Hello, World!, StackOverflow. Test Regex";
        String[] split = str.split("[ .,]+");
        for (String s : split) {
            System.out.println(s);
        }
    }
}
The output is:
Hello
World!
StackOverflow
Test
Regex
Using .split() can lead to having empty entries in your array.
Try this:
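import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "This is... a real sentence, actually.";
        // \w+ matches each run of word characters, so spaces and punctuation are skipped.
        Pattern reg = Pattern.compile("\\w+");
        Matcher m = reg.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}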
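To make the empty-entry point concrete, here is a small comparison (a sketch, with class and method names invented for the example). When the input starts with a delimiter, split returns a leading empty string, while collecting \w+ matches with find() does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitVersusFind {
    // Splits on runs of space, dot and comma.
    static List<String> bySplit(String text) {
        return Arrays.asList(text.split("[ .,]+"));
    }

    // Collects every run of word characters instead of splitting.
    static List<String> byFind(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = ", Hello, World.";
        System.out.println(bySplit(text).size()); // 3: the first entry is ""
        System.out.println(byFind(text).size());  // 2: Hello and World
    }
}
```

Trailing empty strings are dropped by split, which is why only the leading one shows up here.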

Is there a way to find a word by searching for only part of it?

I need to take a phrase and check whether it contains a specific word; if it does, even as part of a longer word, I want to print that entire word.
I know how to find the word "apple", but I can't figure out how to find the word "appletree".
So far, I have some code that finds the word apple and prints that out.
String phrase = "She's sitting under an appletree!";
if (phrase.contains("apple")) {
System.out.println("apple");
} else {
System.out.println("none");
}
How do I print "appletree"?
Use regex for a 1-liner:
String target = phrase.replaceAll(".*?(\\w*apple\\w*).*", "$1");
This works by matching (and thus replacing) the entire input while capturing the target, then using a backreference ($1) to the captured group, so that only the target is returned.
The word in which apple appears is matched using \w* (i.e. any number of word characters) at either end of apple. The leading characters outside the target are matched with the reluctant quantifier .*?, which consumes as few of them as possible; a greedy .* would instead consume everything right up to apple, so a word like dapple would come back as just apple.
Test code:
String phrase = "She's sitting under an appletree!";
String target = phrase.replaceAll(".*?(\\w*apple\\w*).*", "$1");
System.out.println(target);
Output:
appletree
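One thing to watch with the replaceAll one-liner: when the phrase does not contain apple at all, nothing is replaced and the whole phrase comes back unchanged. If you need the "none" case from the question, a Matcher with find() is one way to tell the two apart. A rough sketch, where the class and method names are just for illustration:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordContaining {
    // Returns the first word that contains the given fragment, if any.
    static Optional<String> firstWordContaining(String phrase, String fragment) {
        // Pattern.quote keeps regex metacharacters in the fragment literal.
        Pattern p = Pattern.compile("\\w*" + Pattern.quote(fragment) + "\\w*");
        Matcher m = p.matcher(phrase);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String phrase = "She's sitting under an appletree!";
        System.out.println(firstWordContaining(phrase, "apple").orElse("none"));
        System.out.println(firstWordContaining("No fruit here", "apple").orElse("none"));
    }
}
```

The first call prints appletree and the second prints none.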
You could use a Scanner to read the phrase. Call scanner.next() to read each whitespace-separated token (in this case, each word) into a String variable s, then test it with if (s.contains("apple")) and print it with System.out.println(s).
Hope this helps!
Robin.
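For reference, the Scanner idea could look roughly like this (a sketch only; the class and method names are invented). Keep in mind that Scanner splits on whitespace by default, so punctuation stays attached to the token.

```java
import java.util.Scanner;

class ScannerSearch {
    // Returns the first whitespace-separated token containing the fragment,
    // or "none" when no token contains it.
    static String findToken(String phrase, String fragment) {
        try (Scanner scanner = new Scanner(phrase)) {
            while (scanner.hasNext()) {
                String s = scanner.next();
                if (s.contains(fragment)) {
                    return s;
                }
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        // The token keeps its trailing '!', so this prints "appletree!".
        System.out.println(findToken("She's sitting under an appletree!", "apple"));
    }
}
```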
Without using a regex for the search, you could simply split the sentence into words, loop through them and check whether each one contains the requested word, old-school style:
String[] arr = phrase.split("\\s+");
for (String word : arr) {
    if (word.contains("apple")) return word;
}

Split String end with special characters - Java

I have a string which I want to first split by space, and then separate the words from the special characters.
For Example, let's say the input is:
Hi, How are you???
I already wrote the logic to split by space here:
String input = "Hi, How are you???";
String[] words = input.split("\\s+");
Now, I want to separate each word from the special characters.
For example: "Hi," to {"Hi", ","} and "you???" to {"you", "???"}
If the string does not end with any special characters, just ignore it.
Can you please help me with the regular expression and code for this in Java?
The following regex should help you out:
(\s+|[^A-Za-z0-9]+)
This is not written as a Java string literal, so you need to escape each backslash with another backslash.
It matches runs of whitespace (\s+) and runs of characters other than A-Za-z0-9. This is a workaround, since there isn't (or at least I do not know of) a regex for special characters.
You can test this regex here.
If you use this regex with the split function, it will return the words, not the special characters and whitespace it matched on.
UPDATE
According to this answer here on SO, Java has \P{Alpha}+, which matches one or more non-alphabetic characters. So you could try:
(\s|\P{Alpha})+
I want to separate each word from the special character.
For example: "Hi," to {"Hi", ","} and "you???" to {"you", "???"}
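A regex to achieve the above behavior:
import java.util.Arrays;
import java.util.regex.Pattern;

String stringToSearch = "Hi, you???";
// [a-z]{0} matches the empty string, so this is effectively a split on \b (word boundaries).
Pattern p1 = Pattern.compile("[a-z]{0}\\b");
String[] str = p1.split(stringToSearch);
System.out.println(Arrays.asList(str));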
output:
[Hi, , , you, ???]
@mike is right... we need to split the sentence on the special characters, which leaves only the words. Here is the code:
public class Main {
    public static void main(String[] args) {
        String match = "Hi, How are you???";
        String[] words = match.split("\\P{Alpha}+");
        for (String word : words) {
            System.out.print(word + " ");
        }
    }
}
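Splitting like this drops the special characters. If you want them back as separate tokens, as the question asks ("Hi," to {"Hi", ","}), one possible approach is to match tokens instead of splitting. This is a sketch under the assumption that "special character" means anything that is neither A-Za-z0-9 nor whitespace; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsAndSymbols {
    // A token is either a run of letters/digits or a run of non-space symbols.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+|[^A-Za-z0-9\\s]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each token on its own line: Hi , How are you ???
        for (String token : tokenize("Hi, How are you???")) {
            System.out.println(token);
        }
    }
}
```

Words without trailing special characters simply produce a single token, so nothing extra is needed for that case.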

Split a sentence in array of string with special characters or spaces intact

I want to split a sentence having spaces or any special character into an array of words with spaces or special character also an element of array.
Sentence like:
aman,amit and sumit went to top-up
should be split into an array of String:
{"aman",",","amit"," ","and"," ","sumit"," ","went"," ","to"," ","top","-","up"}
Please suggest any regex or logic to split the same using java.
I missed one thing in my question: I also need to split on numeric characters. But split("\b") does not split a string like
abc12def
into
{"abc","12","def"} or {"abc","1","2","def"}
It seems all you need is to match either word characters (\w+) or non-word ones (\W+). Combine these with an alternation operator and - perhaps - add a Pattern.UNICODE_CHARACTER_CLASS (or its inline/embedded version (?U)) to make the pattern Unicode-aware:
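import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String value = "aman,amit and sumit went to top-up";
// Each match is either a run of word characters or a run of non-word characters.
String pattern = "(?U)\\w+|\\W+";
List<String> lst = new ArrayList<>();
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(value);
while (m.find()) {
    lst.add(m.group(0));
}
System.out.println(lst);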
See the Java demo
I hope the code snippet below helps you solve this.
public class Main {
    public static void main(final String[] args) {
        String message = "aman,amit and sumit went to top-up";
        String[] messages = message.split("\\b");
        for (String string : messages) {
            System.out.println(string);
        }
    }
}
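Neither \b nor \w+|\W+ separates digits from letters, because digits count as word characters, so abc12def stays in one piece. One way to cover that follow-up requirement is to treat letters, digits and everything else as three separate token classes. A sketch, with hypothetical class and method names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterDigitOtherSplit {
    // Runs of letters, runs of digits, or runs of anything else (spaces, punctuation).
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|[^\\p{L}\\p{N}]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc12def"));
        System.out.println(tokenize("aman,amit and sumit went to top-up"));
    }
}
```

With this pattern abc12def becomes abc, 12, def, and the spaces and punctuation of the example sentence are kept as their own elements.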

How to write a regex to split a String in this format?

I want to use [,.!?;~] to split a string, but I want to keep the [,.!?;~] in place. For example:
This is the example, but it is not enough
To
[This is the example,, but it is not enough] // length=2
[0]=This is the example,
[1]=but it is not enough
As you can see, the comma is still in its place. I did this with the regex (?<=([,.!?;~])+). But if some special word (e.g. but) comes after the [,.!?;~], I do not want to split at that point. For example:
I want this sentence to be split into this form, but how to do. So if
anyone can help, that will be great
To
[0]=I want this sentence to be split into this form, but how to do.
[1]=So if anyone can help,
[2]=that will be great
As you can see, this part (form, but) is not split in the first sentence.
I've used:
Positive lookbehind (?<=a)b to keep the delimiter.
Negative lookahead a(?!b) to rule out stop words.
Notice how I've appended the regex (?!\\s*(but|and|if)) after the regex you provided. You can put all the stop words you want to rule out (e.g. but, and, if) inside the parentheses, separated by the pipe symbol.
Also notice that the delimiter is still in its place.
Output
Count of tokens = 3
I want this sentence to be split into this form, but how to do.
So if anyone can help,
that will be great
Code
public class HelloWorld {
    public static void main(String[] args) {
        String str = "I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great";
        // Split after a delimiter (the lookbehind keeps it in place) unless a stop word follows.
        String[] tokensVal = str.split("(?<=([,.!?;~])+)(?!\\s*(but|and|if))");
        // Print the number of tokens, then each token.
        System.out.println("Count of tokens = " + tokensVal.length);
        for (String token : tokensVal) {
            System.out.println(token);
        }
    }
}
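Two refinements may be worth considering, offered here as a variation rather than as part of the answer above. Because the split point is zero-width, the space after the delimiter stays at the start of the next token; and because the stop words are not anchored, a following word such as android would also block the split (it starts with and). The sketch below consumes the whitespace and adds a \b after the stop words. The class and method names are hypothetical, and the lookbehind is limited to a single delimiter character.

```java
import java.util.Arrays;
import java.util.List;

class StopWordSplit {
    // Split after a delimiter, swallowing the whitespace that follows it,
    // unless the next word is one of the stop words (whole words only).
    private static final String SPLIT_REGEX =
            "(?<=[,.!?;~])\\s*(?!\\s*(?:but|and|if)\\b)";

    static List<String> split(String text) {
        return Arrays.asList(text.split(SPLIT_REGEX));
    }

    public static void main(String[] args) {
        String str = "I want this sentence to be split into this form, but how to do. So if anyone can help, that will be great";
        for (String token : split(str)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The brackets in the printout are only there to show that the tokens no longer carry leading spaces.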
