Method to extract Words from a sentence - java

I'm having a hard time writing a method that extracts words from a sentence. The words should start with one of aAeEiIoOuU and be 5 letters long, for example "ether".
The method should return a String array. My problem is that I want the length of the array to equal the number of words found: if it finds 3 words, the array length should be 3 too.
This is my code at the moment:
public static String[] extractWords(String text){
String text = "einer hallo hallo einer";
String pattern = "\\b[AaEeIiOoUu]\\p{L}\\p{L}\\p{L}\\p{L}\\b";
Pattern p = Pattern.compile(pattern, Pattern.UNICODE_CASE);
Matcher m = p.matcher(text);
int i = 0;
while (m.find()){
i++;
}
String[] array = new String[i];
while(m.find()){
System.out.println(m.group());
array[i] = m.group();
i++;
}
}

You should be using an ArrayList here. To fill an array directly, you have to do the matching twice (once to count, once to collect), which is unnecessary extra work.
Also, just so you know, the second while(m.find()) loop will not run even once, because the matcher has been exhausted by the first loop. You would need to re-initialize the Matcher object, and reset i to 0, since after the counting loop it already equals the array length:
m = p.matcher(text); // Needed before second while loop (m.reset() also works).
But that is not needed. Let's use an ArrayList instead:
public static String[] extractWords(String text){
    // Use quantifier to match 4 characters, instead of repeating it 4 times.
    String pattern = "\\b[AaEeIiOoUu]\\p{L}{4}\\b";
    Pattern p = Pattern.compile(pattern, Pattern.UNICODE_CASE);
    Matcher m = p.matcher(text);
    List<String> matchedWords = new ArrayList<>();
    while (m.find()){
        matchedWords.add(m.group());
    }
    // If you want an array, convert the list to array
    return matchedWords.toArray(new String[matchedWords.size()]);
}
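For reference, a self-contained version of the list-based approach could look like the sketch below. The class name VowelWordExtractor and the main method are illustrative additions, not part of the question; the pattern is the one from the answer, compiled once as a constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, used only to make the example runnable.
class VowelWordExtractor {
    // A vowel followed by exactly four more letters, bounded by word breaks.
    private static final Pattern FIVE_LETTER_VOWEL_WORD =
            Pattern.compile("\\b[AaEeIiOoUu]\\p{L}{4}\\b");

    static String[] extractWords(String text) {
        List<String> matchedWords = new ArrayList<>();
        Matcher m = FIVE_LETTER_VOWEL_WORD.matcher(text);
        // Single pass: the list grows as needed, so no counting loop is required.
        while (m.find()) {
            matchedWords.add(m.group());
        }
        // The resulting array has exactly as many slots as there were matches.
        return matchedWords.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] words = extractWords("einer hallo hallo einer");
        System.out.println(words.length + " " + Arrays.toString(words)); // 2 [einer, einer]
    }
}
```

Because the array is created from the list, its length always equals the number of words found, which is what the question asks for.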

Related

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find repeated occurrences of a character in the string and cap them at a maximum number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, as the maximum allowed repetition is 2.
In the above code I found such occurrences and tried replacing them with -, and that works. But can someone help me get the current character and replace the run with the allowed number of occurrences?
I.e., if aaa is found, it is replaced by aa.
Or is there an alternative method without using regex?
You can declare the second group in a regex and use it as a replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 a character repeated once
)
\2+ a character repeated at least once more
Thus, the first group captures a replacement string.
It isn't hard to extrapolate this solution for a different maximum value of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
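The question also asks whether this can be done without regex. One possible sketch is a single pass that tracks the length of the current run and drops characters once the run exceeds the limit. The names RepeatLimiter and limitRepeats are made up for this example. Note two assumptions that differ from the regex answers: the comparison is case-sensitive, and every character is capped, not only letters.

```java
// Hypothetical helper class for illustration.
class RepeatLimiter {
    static String limitRepeats(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder(input.length());
        int runLength = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Extend the current run if c equals the previous character, else start a new run.
            runLength = (i > 0 && c == input.charAt(i - 1)) ? runLength + 1 : 1;
            // Keep the character only while the run is within the allowed length.
            if (runLength <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("abaaadccc", 2)); // abaadcc
    }
}
```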

How do I take a string with a named group and replace only that named capture group with a value in Java 7

Say for example I have the following string with a named capture group:
/this/(?<capture1>.*)/a/string/(?<capture2>.*)
And I want to replace the capture group with a value like "foo" so that I end up with a string that looks like:
/this/foo/a/string/bar
Limitations are:
Regex must be used as the string is evaluated elsewhere but it doesn't have to be a capture group.
I'd rather not have to regex match the regex.
EDIT: There can be many groups in the string.
You can find the start and end index of each match:
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    int startindex = matcher.start();
    int stopindex = matcher.end();
    // Your code for replacing that range and generating a new string with foo
    // you can use a StringBuffer to delete and insert the characters, as you know the indexes
}
Full Implementation:
public static String getnewString(String text, String reg) {
    StringBuffer result = new StringBuffer(text);
    Pattern pattern = Pattern.compile(reg);
    Matcher matcher = pattern.matcher(text);
    // Match indexes refer to the original text, so track how far earlier
    // replacements have shifted the contents of result.
    int offset = 0;
    while (matcher.find()) {
        int startindex = matcher.start() + offset;
        int stopindex = matcher.end() + offset;
        System.out.println(startindex + " " + stopindex);
        result.replace(startindex, stopindex, "foo");
        offset += "foo".length() - (matcher.end() - matcher.start());
    }
    return result.toString();
}
Try this,
int lastIndex = s.lastIndexOf("/");
String newString = s.substring(0, lastIndex+1).concat("newString");
System.out.println(newString);
Get the substring up to and including the last '/', then append the new string to it as shown above.
I got it:
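String string = "/this/(?<capture1>.*)/a/string/(?<capture2>.*)";
Pattern pattern = Pattern.compile(string);
// Match the regex against its own source text: each .* group then captures its own definition.
Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
    string = string.replace(matcher.group("capture1"), "value 1");
    string = string.replace(matcher.group("capture2"), "value 2");
}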
Crazy, but works.
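Since the question mentions that there can be many groups, the same trick can be wrapped in a helper that takes a map from group name to value. This is only a sketch: GroupTemplateFiller and fill are invented names, and it relies on the assumption that the regex matches its own source text, which holds for simple .* groups like the ones above but not for arbitrary patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class GroupTemplateFiller {
    static String fill(String regexTemplate, Map<String, String> valuesByGroup) {
        // Assumption: the template, used as a regex, matches its own text.
        Matcher matcher = Pattern.compile(regexTemplate).matcher(regexTemplate);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Template does not match its own text: " + regexTemplate);
        }
        String result = regexTemplate;
        for (Map.Entry<String, String> entry : valuesByGroup.entrySet()) {
            // group(name) holds the literal "(?<name>...)" text, which is swapped for the value.
            result = result.replace(matcher.group(entry.getKey()), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("capture1", "foo");
        values.put("capture2", "bar");
        System.out.println(fill("/this/(?<capture1>.*)/a/string/(?<capture2>.*)", values));
        // /this/foo/a/string/bar
    }
}
```

Named groups and Matcher.group(String) are available from Java 7, so this stays within the version the question targets.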

How to replace all {!XXX} from string?

I have string with multiple {!XXX} phrases. For example:
Kumar gaurav {!str1} is just {!str2}, adasdas {!str3}
I need to replace all {!str} values with corresponding str, how to replace all {!str} from my string?
You can use a Pattern and Matcher, which let you query the string for an unknown number of elements, in combination with a regular expression such as \{!str\d\}, which lets you break the text down based on the tags.
For example...
String text = "All that {!str1} is {!str2}";
Map<String, String> values = new HashMap<>(25);
values.put("{!str1}", "glitters");
values.put("{!str2}", "gold");
Pattern p = Pattern.compile("\\{!str\\d\\}");
Matcher matcher = p.matcher(text);
while (matcher.find()) {
String match = matcher.group();
    text = text.replace(match, values.get(match)); // literal replacement, no regex escaping needed
}
System.out.println(text);
Which outputs
All that glitters is gold
You could also use something like...
int previousStart = 0;
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String match = matcher.group();
int start = matcher.start();
int end = matcher.end();
sb.append(text.substring(previousStart, start));
sb.append(values.get(match));
previousStart = end;
}
if (previousStart < text.length()) {
sb.append(text.substring(previousStart));
}
Which does away with the String creation in a loop and relies more on the position of the match to cut the original text around the tokens, which makes me happier ;)
Use this regex, it's simple:
String string="hello world {!hello}";
string=string.replaceAll("\\{!(.*?)\\}", "replace");
System.out.println(string); //this will print (hello world replace)
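Another option for the map-based replacement is Matcher.appendReplacement together with appendTail, which builds the result in one pass. The sketch below makes a few assumptions of its own: the class and method names are invented, the map is keyed by the bare name (str1) instead of the full token ({!str1}), and tokens without a mapping are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class PlaceholderRenderer {
    // Matches {!name} and captures name in group 1.
    private static final Pattern TOKEN = Pattern.compile("\\{!(\\w+)\\}");

    static String render(String text, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            // Unknown tokens are kept as they are (an assumption of this sketch).
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("str1", "glitters");
        values.put("str2", "gold");
        System.out.println(render("All that {!str1} is {!str2}", values)); // All that glitters is gold
    }
}
```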

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
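A minimal sketch of how that regex might be applied in Java follows. The class and method names are illustrative, and returning an empty string when the input starts with a lowercase letter is an assumption of this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class UppercasePrefix {
    // Everything from the start of the string up to the first lowercase ASCII letter.
    private static final Pattern LEADING_NON_LOWERCASE = Pattern.compile("^[^a-z]+");

    static String extract(String s) {
        Matcher m = LEADING_NON_LOWERCASE.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("MOC.OOHAY#ABC.PQRqwertySDdd")); // MOC.OOHAY#ABC.PQR
    }
}
```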
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String arr[] = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
for (String test : arr) {
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}

Java regex matcher always returns false

I have a string expression from which I need to get some values. The string is as follows
#min({(((fields['example6'].value + fields['example5'].value) * ((fields['example1'].value*5)+fields['example2'].value+fields['example3'].value-fields['example4'].value)) * 0.15),15,9.087})
From this string, I need to obtain an ArrayList of strings which contains values such as "example1", "example2" and so on.
I have a Java method which looks like this:
String regex = "/fields\\[['\"]([\\w\\s]+)['\"]\\]/g";
ArrayList<String> arL = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(expression);
while(m.find()){
arL.add(m.group());
}
But m.find() always returns false. Is there anything I'm missing?
The problem is with the '/'s. If what you want to extract is only the field name, you should use m.group(1):
String regex = "fields\\[['\"]([\\w\\s]+)['\"]\\]";
ArrayList<String> arL = new ArrayList<String>();
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(expression);
while(m.find()){
arL.add(m.group(1));
}
The main issue you seem to have is that you are using delimiters (as in PHP or Perl or JavaScript) that cannot be used in a Java regex. Also, you have your matches in the first capturing group, but you are using group() that returns the whole match (including fields[').
Here is working code:
String str = "#min({(((fields['example6'].value + fields['example5'].value) * ((fields['example1'].value*5)+fields['example2'].value+fields['example3'].value-fields['example4'].value)) * 0.15),15,9.087})";
ArrayList<String> arL = new ArrayList<String>();
String rx = "(?<=fields\\[['\"])[\\w\\s]*(?=['\"]\\])";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
while (m.find()) {
arL.add(m.group());
}
Note that I have added look-arounds so that group() returns just the text between the quotes.
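To see both points side by side (the /.../g delimiters never match in Java, and group(1) yields only the name), here is a small runnable sketch. FieldNameExtractor and fieldNames are names chosen for this example, and the test expression is a shortened version of the one in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration.
class FieldNameExtractor {
    // No surrounding slashes and no trailing "g": Java takes the string as the pattern itself.
    private static final Pattern FIELD =
            Pattern.compile("fields\\[['\"]([\\w\\s]+)['\"]\\]");

    static List<String> fieldNames(String expression) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(expression);
        while (m.find()) {
            names.add(m.group(1)); // group(1) is the name; group() would include fields['...']
        }
        return names;
    }

    public static void main(String[] args) {
        String expression = "(fields['example6'].value + fields['example5'].value) * 0.15";
        // With JavaScript-style delimiters the pattern looks for a literal "/" and "/g".
        boolean slashVersionMatches =
                Pattern.compile("/fields\\[['\"]([\\w\\s]+)['\"]\\]/g").matcher(expression).find();
        System.out.println(slashVersionMatches);    // false
        System.out.println(fieldNames(expression)); // [example6, example5]
    }
}
```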
