This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 5 years ago.
I am stuck on a simple issue. I want to check whether any of the words he, be, or de is present in my text.
So I created the pattern (shown in the code) using '|' to mean OR
and then matched it against my text. But the match returns false (in the print statement).
I tried the same match in Notepad++ using regex search and it worked there, but it gives false (no match) in Java.
public class Del {
public static void main(String[] args) {
String pattern="he|be|de";
String text= "he is ";
System.out.println(text.matches(pattern));
}
}
Can anyone suggest what I am doing wrong?
Thanks
It's because matches() tries to match the pattern against the entire string, not against a part of it. To check whether some part of the string matches the regex, use find() instead:
Matcher m = Pattern.compile("he|be|de").matcher("he is ");
m.find(); //true
If you want to keep using matches(), which checks the entire string, and test whether that string contains he, be, or de, use the regex .*(he|be|de).*
. means any character, and * means the preceding element may occur zero or more times.
Example:
"he is ".matches(".*(he|be|de).*"); //true
String regExp="he|be|de";
Pattern pattern = Pattern.compile(regExp);
String text = "he is ";
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.find());
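To see the difference side by side, here is a minimal sketch. The class and method names (MatchVsFind, wholeInputMatches, containsMatch, containsWholeWord) are made up for illustration, and the \b word-boundary variant is an extra assumption about what "word" should mean in the question:

```java
import java.util.regex.Pattern;

class MatchVsFind {
    private static final Pattern WORDS = Pattern.compile("he|be|de");
    // Assumption: "word" means delimited by word boundaries.
    private static final Pattern WHOLE_WORDS = Pattern.compile("\\b(?:he|be|de)\\b");

    // matches() succeeds only if the whole input fits the pattern.
    static boolean wholeInputMatches(String text) {
        return WORDS.matcher(text).matches();
    }

    // find() succeeds if any substring of the input fits the pattern.
    static boolean containsMatch(String text) {
        return WORDS.matcher(text).find();
    }

    // Like containsMatch, but "he" inside "the" does not count.
    static boolean containsWholeWord(String text) {
        return WHOLE_WORDS.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("he is "));  // false
        System.out.println(wholeInputMatches("he"));      // true
        System.out.println(containsMatch("he is "));      // true
        System.out.println(containsMatch("the sea"));     // true ("he" inside "the")
        System.out.println(containsWholeWord("the sea")); // false
    }
}
```

Note that plain find() with he|be|de also reports a hit inside longer words, which may or may not be what you want.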
This question already has answers here:
How do I split a string in Java?
(39 answers)
Closed 6 years ago.
I am trying to split a string using a regex, with the closing bracket as the delimiter, and I have to keep the bracket.
Input string: (GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)
Needed output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
I am using the Java regex "\([^)]*?\)" and it throws an error. Below is the code I am using; the error is thrown when I try to get the group.
Pattern splitDelRegex = Pattern.compile("\\([^)]*?\\)");
Matcher regexMatcher = splitDelRegex.matcher("(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)");
List<String> matcherList = new ArrayList<String>();
while(regexMatcher.find()){
String perm = regexMatcher.group(1);
matcherList.add(perm);
}
Any help is appreciated. Thanks.
You simply forgot to put capturing parentheses around the entire regex. You are not capturing anything at all. Just change the regex to
Pattern splitDelRegex = Pattern.compile("(\\([^)]*?\\))");
^ ^
I tested this in Eclipse and got your desired output.
You could use
str.split("\\)")
(split takes a regex, so the closing parenthesis has to be escaped.) That returns an array of strings that you know are each missing the closing parenthesis, so you can add it back afterwards. That seems much easier and less error-prone to me.
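A rough sketch of that idea, plus a variant that keeps the delimiter by splitting on a zero-width lookbehind. The class and method names are hypothetical, and the lookbehind variant is my own addition rather than part of the suggestion above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitKeepParen {
    // Split on ")" (escaped, because split takes a regex) and add it back.
    static List<String> splitAndRestore(String input) {
        List<String> parts = new ArrayList<>();
        for (String piece : input.split("\\)")) {
            parts.add(piece + ")");
        }
        return parts;
    }

    // Split at the empty position right after each ")", so nothing is consumed.
    static List<String> splitWithLookbehind(String input) {
        return Arrays.asList(input.split("(?<=\\))"));
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
        splitAndRestore(input).forEach(System.out::println);
        splitWithLookbehind(input).forEach(System.out::println);
    }
}
```

Both variants assume the input consists only of back-to-back bracketed groups; any text between groups would end up attached to the following piece.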
You could try changing this line :
String perm = regexMatcher.group(1);
To this :
String perm = regexMatcher.group();
group() with no argument returns the entire match (group 0), so it works even though the regex has no capturing group.
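A small sketch of why group(1) fails with the original pattern while group() works. The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // The pattern from the question: no capturing group, only escaped parentheses.
    private static final Pattern BRACKETED = Pattern.compile("\\([^)]*?\\)");

    // group() (same as group(0)) is always available after a successful find().
    static List<String> allMatches(String input) {
        Matcher m = BRACKETED.matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // group(1) refers to a capturing group this pattern does not have.
    static boolean groupOneThrows(String input) {
        Matcher m = BRACKETED.matcher(input);
        if (!m.find()) {
            return false;
        }
        try {
            m.group(1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String input = "(GROUP=test1)(GROUP=test2) (GROUP=test3)(GROUP=test4)";
        System.out.println(allMatches(input));
        System.out.println(groupOneThrows(input)); // true
    }
}
```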
I'm not sure why you need to split the string at all. You can capture each of the bracketed groups with a regex.
Try this regex: (\\([a-zA-Z0-9=]*\\)). It has a capturing group () that looks for text that starts with a literal \\(, contains [a-zA-Z0-9=] zero or more times (*), and ends with a literal \\). This is a pretty loose regex; you could tighten up the match if the text inside the brackets is predictable.
String input = "(GROUP=test1)(GROUP=test2)(GROUP=test3)(GROUP=test4)";
String regex = "(\\([a-zA-Z0-9=]*\\))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) { // find the next match
System.out.println(matcher.group()); // print the match
}
Output:
(GROUP=test1)
(GROUP=test2)
(GROUP=test3)
(GROUP=test4)
This question already has answers here:
Java Regex matching between curly braces
(5 answers)
Closed 6 years ago.
I am trying to extract value between { } using
"\\(\\{[^}]+\\}\\)"
regex in java. My input is
String text = "Hi this is {text to be extracted}."
I want output as
"text to be extracted"
but that regex isn't working.
Try this:
"\\{([^}]*)\\}"
Online Demo
Then $1 (group 1) contains text to be extracted.
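In Java that could look roughly like this. The class and method names are hypothetical, and returning an Optional for the no-match case is just one possible choice:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceExtract {
    // Literal "{", then a captured run of non-"}" characters, then literal "}".
    private static final Pattern BRACES = Pattern.compile("\\{([^}]*)\\}");

    // Returns the text inside the first {...} pair, if there is one.
    static Optional<String> firstBraced(String text) {
        Matcher m = BRACES.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "Hi this is {text to be extracted}.";
        System.out.println(firstBraced(text).orElse("(no match)")); // text to be extracted
    }
}
```

find() is used rather than matches() because the braces are only a part of the input.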
The regexp seems malformed.
You need to match the extra characters before and after the group, and you should not escape the parentheses that form the group.
Also, you can use a named group to extract exactly the text you care about.
Here is working code:
String text = "Hi this is {text to be extracted}.";
Pattern p = Pattern.compile(".*\\{(?<t>[^}]+)\\}.*");
Matcher m = p.matcher(text);
if (m.matches()) {
System.out.println(m.group("t"));
}
This question already has answers here:
Rationale for Matcher throwing IllegalStateException when no 'matching' method is called
(6 answers)
Closed 7 years ago.
I am trying to implement simple regex string matching with wildcards in Java. The idea is that you have a needle (the string to search for) and a haystack (the string being searched); you have to search for the needle in the haystack and give the starting index of the needle. The wildcard comes in when the string supplied as the needle is incomplete and the missing character(s) are replaced with an underscore '_' (for example test is equivalent to t_st or tes_t or te__).
I have written a simple method that takes the haystack and needle as arguments, but I can't get it to work. I keep getting an IllegalStateException: No match available error. Here is the code:
static int findRegex(String needle, String haystack)
{
char [] needleChars = needle.toCharArray();
StringBuilder builder = new StringBuilder("");
builder.append(".*");
for (char c: needleChars)
{
builder.append('(');
builder.append(c);
builder.append('|');
builder.append('_');
builder.append(')');
}
System.out.println(builder.toString());
return Pattern.compile(builder.toString()).matcher(haystack).start();
}
I have tested the regex pattern generated by the code (.*(t|_)(e|_)(s|_)(t|_)) and it works. Where did I go wrong?
IllegalStateException: No match available means that the Matcher has no successful match to report.
It can be thrown when either
you haven't called one of these methods on your Matcher to make it search for a match:
matches()
find()
lookingAt()
or the call returned false, which means that despite trying, the regex engine couldn't find a match. In that case there is no valid index for start() to return.
In your code, start() is called without any of these methods being called first.
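Both situations can be reproduced with a few lines. This is only a sketch; the class and method names and the sample strings are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoMatchAvailableDemo {
    private static final Pattern P = Pattern.compile("t.st");

    // Case 1: start() is called before any match attempt.
    static boolean throwsWithoutAttempt() {
        Matcher m = P.matcher("a test");
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Case 2: find() was called but returned false.
    static boolean throwsAfterFailedFind() {
        Matcher m = P.matcher("xyz");
        boolean found = m.find();
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return !found;
        }
    }

    // The safe way: only ask for start() after a successful find().
    static int startOrMinusOne(String haystack) {
        Matcher m = P.matcher(haystack);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(throwsWithoutAttempt());    // true
        System.out.println(throwsAfterFailedFind());   // true
        System.out.println(startOrMinusOne("a test")); // 2
        System.out.println(startOrMinusOne("xyz"));    // -1
    }
}
```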
Anyway I suspect that your method should look more like
static int findRegex(String needle, String haystack) {
String regex = needle.replace("_", ".{0,10}?");
//System.out.println(regex);
Matcher matcher = Pattern.compile(regex).matcher(haystack);
if (matcher.find()){
return matcher.start();
}else{
return -1;
}
}
I simply replaced every _ with .{0,10}? to let it match any characters (up to 10 of them). I also added ? to make the quantifier reluctant, so te_t would find the minimal match.
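If each underscore is instead meant to stand for exactly one missing character (an assumption; the question can be read either way), a variant could translate _ to . and quote everything else so that regex metacharacters in the needle are taken literally. The class and method names here are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WildcardSearch {
    // Returns the index of the first occurrence of needle in haystack,
    // treating each '_' in needle as "any single character", or -1 if none.
    static int indexOfWithWildcard(String needle, String haystack) {
        StringBuilder regex = new StringBuilder();
        for (char c : needle.toCharArray()) {
            if (c == '_') {
                regex.append('.'); // one arbitrary character (not a line terminator)
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(haystack);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWithWildcard("t_st", "a test here")); // 2
        System.out.println(indexOfWithWildcard("te__", "a test"));      // 2
        System.out.println(indexOfWithWildcard("x_z", "a test"));       // -1
    }
}
```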
This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 8 years ago.
I'm experiencing some problems with Java regular expressions.
I have a program that reads through an HTML file and replaces any string enclosed in #VR# markers, e.g. #VR#Test1 2 3 4#VR#
However, my issue is that if a line contains more than one string surrounded by #VR#, it does not match them separately. It matches the leftmost #VR# with the rightmost #VR# in the line and takes everything in between.
For example:
#VR#Google#VR#
My code would match
URL-GOES-HERE#VR#" target="_blank" style="color:#f4f3f1; text-decoration:none;" title="ContactUs">#VR#Google
Here is my Java code. Would appreciate if you could help me to solve this:
Pattern p = Pattern.compile("#VR#.*#VR#");
Matcher m;
Scanner scanner = new Scanner(htmlContent);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String match_found = m.group().replaceAll("#VR#", "");
System.out.println("group: " + match_found);
}
}
I tried replacing m.group() with m.group(0) and m.group(1) but nothing. Also m.groupCount() always returns zero, even if there are two matches as in my example above.
Thanks, your help will be very much appreciated.
Your problem is that .* is "greedy"; it will try to match as long a substring as possible while still letting the overall expression match. So, for example, in #VR# 1 #VR# 2 #VR# 3 #VR#, it will match 1 #VR# 2 #VR# 3.
The simplest fix is to make it "non-greedy" (matching as little as possible while still letting the expression match), by changing the * to *?:
Pattern p = Pattern.compile("#VR#.*?#VR#");
Also m.groupCount() always returns zero, even if there are two matches as in my example above.
That's because m.groupCount() returns the number of capture groups (parenthesized subexpressions, whose corresponding matched substrings are retrieved using m.group(1), m.group(2), and so on) in the underlying pattern. In your case, the pattern has no capture groups, so m.groupCount() returns 0.
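A short sketch that puts the greedy pattern, the non-greedy pattern, and groupCount() next to each other, using the sample line from above. The class and method names are made up, and the variant with a capture group around .*? is my own addition:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects group(groupIndex) of every successive match of regex in line.
    static List<String> findAll(String regex, String line, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(line);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(groupIndex));
        }
        return found;
    }

    // Number of capture groups in the pattern; it does not depend on the input.
    static int captureGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String line = "#VR# 1 #VR# 2 #VR# 3 #VR#";
        System.out.println(findAll("#VR#.*#VR#", line, 0));    // one match: the whole line
        System.out.println(findAll("#VR#.*?#VR#", line, 0));   // [#VR# 1 #VR#, #VR# 3 #VR#]
        System.out.println(findAll("#VR#(.*?)#VR#", line, 1)); // [ 1 ,  3 ]
        System.out.println(captureGroups("#VR#.*?#VR#"));      // 0
        System.out.println(captureGroups("#VR#(.*?)#VR#"));    // 1
    }
}
```

The " 2 " in the sample line is not reported because the markers are consumed in pairs: the second #VR# closes the first match and is not reused to open the next one.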
You can try the regular expression:
#VR#(((?!#VR#).)+)#VR#
Demo:
private static final Pattern REGEX_PATTERN =
Pattern.compile("#VR#(((?!#VR#).)+)#VR#");
public static void main(String[] args) {
String input = "#VR#Google#VR# ";
System.out.println(
REGEX_PATTERN.matcher(input).replaceAll("$1")
); // prints "Google "
}
This question already has answers here:
Java: splitting a comma-separated string but ignoring commas in quotes
(12 answers)
Closed 9 years ago.
I'm stuck with this regex.
So, I have input as:
"Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)
and in the end what I would like to get is:
"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
I've tried regex
(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*
It works if I don't have a comma inside parentheses.
RegEx
(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?
Java Code
String input = ... ;
Matcher m = Pattern.compile(
"(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while(m.find()) { // iterate over successive matches
System.out.println(m.group());
}
Output
"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
"Besnoitia wallacei (organism)"
"Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
"Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
Don't use regexes for this. Write a simple parser that keeps track of the number of parentheses encountered, and whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
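A minimal sketch of such a parser, assuming quotes are never escaped and that a comma only separates items when it is outside both quotes and parentheses. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelCommaSplitter {
    // Splits input on commas that are outside quotes and outside parentheses.
    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of currently open parentheses
        boolean inQuotes = false; // true while inside "..."

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }

            if (c == ',' && !inQuotes && depth == 0) {
                parts.add(current.toString().trim()); // end of one item
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            parts.add(last);
        }
        return parts;
    }

    public static void main(String[] args) {
        String input = "\"Crane device, (physical object)\"(X1,x2,x4), "
                + "not \"Seen by research nurse (finding)\", "
                + "EntirePatellaBodyStructure(X1,X8), not SuturingOfHandProcedure(X1)";
        split(input).forEach(System.out::println);
    }
}
```

Unlike a regex, this keeps working if parentheses are nested, because depth is a counter rather than a flag.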
Would this do what you need?
System.out.println(yourString.replaceAll(", not", "\nnot"));
Assuming that there is no possibility of nesting () within (), and no possibility of (say) \" within "", you can write something like:
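import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is a run of quoted strings, parenthesized groups, and other
// characters; a comma outside quotes and parentheses ends the run.
private static final Pattern CUSTOM_SPLIT_PATTERN =
Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");
private static final String[] customSplit(final String input) {
final List<String> ret = new ArrayList<String>();
final Matcher m = CUSTOM_SPLIT_PATTERN.matcher(input);
while(m.find()) {
ret.add(m.group(1)); // group 1 excludes the leading whitespace
}
return ret.toArray(new String[ret.size()]);
}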
(disclaimer: not tested).