This question already has answers here:
Rationale for Matcher throwing IllegalStateException when no 'matching' method is called
(6 answers)
Closed 7 years ago.
I am trying to implement simple regex string matching with wildcards in Java. The idea is that you have a needle (the string to search for) and a haystack (the string being searched); you have to search for the needle in the haystack and return the starting index of the needle. The wildcard comes in when the string supplied as the needle is incomplete and the missing character(s) is/are replaced with an underscore '_' (for example, test is equivalent to t_st or tes_t or te__).
I have written a simple method that takes the haystack and needle as arguments, but I can't get it to work. I keep getting an IllegalStateException: No match available error. Here is the code:
static int findRegex(String needle, String haystack)
{
char [] needleChars = needle.toCharArray();
StringBuilder builder = new StringBuilder("");
builder.append(".*");
for (char c: needleChars)
{
builder.append('(');
builder.append(c);
builder.append('|');
builder.append('_');
builder.append(')');
}
System.out.println(builder.toString());
return Pattern.compile(builder.toString()).matcher(haystack).start();
}
I have tested the regex pattern generated by the code (.*(t|_)(e|_)(s|_)(t|_)) and it works. Where did I go wrong?
IllegalStateException: No match available means that the Matcher has no successful match to report, so there is no valid index that start() could return.
It can be thrown when
you don't call one of these methods on your Matcher to make it search for a match:
matches()
find()
lookingAt()
or when the result of the method you called was false, which means that, despite trying, the regex engine wasn't able to find any match.
In your code, start() is called directly on a freshly created Matcher, so the first case applies.
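Both situations can be reproduced in a few lines. The sketch below is illustrative only; the class name NoMatchDemo and the helper startThrows are made up for this example and are not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NoMatchDemo {

    // Returns true if calling start() on the given matcher throws
    // IllegalStateException, i.e. there is no current match to report.
    static boolean startThrows(Matcher matcher) {
        try {
            matcher.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher untouched = Pattern.compile("test").matcher("a test");
        System.out.println(startThrows(untouched)); // true: no match attempted yet

        Matcher failed = Pattern.compile("test").matcher("nothing here");
        failed.find();                              // returns false
        System.out.println(startThrows(failed));    // true: the attempt failed

        Matcher found = Pattern.compile("test").matcher("a test");
        found.find();                               // returns true
        System.out.println(startThrows(found));     // false: start() is valid now
    }
}
```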
Anyway I suspect that your method should look more like
static int findRegex(String needle, String haystack) {
String regex = needle.replace("_", ".{0,10}?");
//System.out.println(regex);
Matcher matcher = Pattern.compile(regex).matcher(haystack);
if (matcher.find()){
return matcher.start();
}else{
return -1;
}
}
I simply replaced each _ with .{0,10}? to let it match any characters (up to 10 of them). I also added ? to make this quantifier reluctant, so te_t would find the minimal match.
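If each underscore is instead meant to stand for exactly one missing character, and the rest of the needle may contain regex metacharacters, a variant along these lines might be closer to the question. This is only a sketch: the class name WildcardSearch and the method name indexOfWildcard are made up for illustration, and treating _ as a single-character wildcard is an assumption about the intended behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WildcardSearch {

    // Hypothetical helper: returns the index of the first match of needle in
    // haystack, where each '_' in needle stands for exactly one character.
    // Returns -1 if there is no match.
    static int indexOfWildcard(String needle, String haystack) {
        StringBuilder regex = new StringBuilder();
        for (char c : needle.toCharArray()) {
            if (c == '_') {
                regex.append('.'); // wildcard: any one character (except line terminators)
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        Matcher matcher = Pattern.compile(regex.toString()).matcher(haystack);
        // find() must be called, and must return true, before start() is valid.
        return matcher.find() ? matcher.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWildcard("t_st", "this is a test")); // 10
        System.out.println(indexOfWildcard("te__", "no match here"));  // -1
    }
}
```

Quoting every literal character with Pattern.quote means a needle such as a.c only matches a literal dot, not any character.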
This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 5 years ago.
I am stuck on a simple issue: I want to check whether any of the words he, be, de is present in my text.
So I created the pattern (shown in the code) using '|' to mean OR
and then matched it against my text. But the match gives me false (in the print statement).
I tried the same match in Notepad++ using regex search and it worked there, but it gives false (no match) in Java.
public class Del {
public static void main(String[] args) {
String pattern="he|be|de";
String text= "he is ";
System.out.println(text.matches(pattern));
}
}
Can anyone suggest what I am doing wrong?
Thanks
It's because you are trying to match against the entire string instead of finding a matching part. For example, this code finds a part of the string that matches the regex:
Matcher m = Pattern.compile("he|be|de").matcher("he is ");
m.find(); //true
When you want to match the entire string and check whether it contains he, be, or de, use the regex .*(he|be|de).*
. means any character, and * means the preceding element may occur zero or more times.
Example:
"he is ".matches(".*(he|be|de).*"); //true
String regExp="he|be|de";
Pattern pattern = Pattern.compile(regExp);
String text = "he is ";
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.find());
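The difference between the three matching methods can be seen side by side. The following is a minimal sketch; the class MatchModes and its helper methods are names invented for this example.

```java
import java.util.regex.Pattern;

final class MatchModes {

    // Returns true if the regex matches somewhere inside text.
    static boolean containsMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Returns true only if the regex matches the whole of text.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Returns true if the regex matches a prefix of text.
    static boolean matchesPrefix(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String regex = "he|be|de";
        System.out.println(matchesWhole(regex, "he is "));  // false
        System.out.println(containsMatch(regex, "he is ")); // true
        System.out.println(matchesPrefix(regex, "he is ")); // true
        System.out.println(matchesWhole(regex, "he"));      // true
    }
}
```

Note that he|be|de also matches inside longer words such as "the". If only whole words should count, wrapping the alternation in word boundaries, as in \\b(he|be|de)\\b, is one option.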
This question already has answers here:
Regex to replace repeated characters
(2 answers)
Closed 6 years ago.
I am trying to replace all the repeated characters in a String in Java, leaving only one.
For example:
aaaaa ---> a
For that, I have tried using the replaceAll method:
"aaaaa".replaceAll("a*","a") //returns "aa"
I have developed a recursive method, which is probably not very efficient:
public String recursiveReplaceAll(String original,String regex, String replacement) {
if (original.equals(original.replaceAll(regex, replacement))) return original;
return recursiveReplaceAll(original.replaceAll(regex, replacement),regex,replacement);
}
This method works, I was just wondering if there was anything using RegEx for example, which does the work with better performance.
Your replaceAll approach was nearly right - it's just that * also matches 0 occurrences, so the empty string left at the end of the input is matched and replaced as well. You want + to mean "one or more".
"aaaaa".replaceAll("a+","a") // Returns "a"
You can do it without recursion. The regular expression "(.)\\1+" matches any character followed by one or more repetitions of itself, and replaceAll replaces each such run with the captured character ($1). Thus, this collapses any repeated characters.
public static void main(String[] args) {
String str = "aaaabbbaaa";
String result = str.replaceAll("(.)\\1+", "$1");
System.out.println(result); // prints "aba".
}
With this, it works for all characters.
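The three variants discussed here can be compared directly. This is a small sketch; the class CollapseRepeats and the method collapse are illustrative names only.

```java
final class CollapseRepeats {

    // Replaces every run of the same character with a single occurrence.
    static String collapse(String input) {
        return input.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println("aaaaa".replaceAll("a*", "a")); // aa
        System.out.println("aaaaa".replaceAll("a+", "a")); // a
        System.out.println(collapse("aaaabbbaaa"));        // aba
    }
}
```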
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
JAVA: check a string if there is a special character in it
I'm trying to create a method to check if a password starts or ends with a special character. There were a few other checks that I have managed to code, but this seems a bit more complicated.
I think I need to use regex to do this efficiently. I have already created a method that checks if there are any special characters, but I can't figure out how to modify it.
Pattern p = Pattern.compile("\\p{Punct}");
Matcher m = p.matcher(password);
boolean a = m.find();
if (!a)
System.out.println("Password must contain at least one special character!");
According to the book I'm reading I need to use ^ and $ in the pattern to check if it starts or ends with a special character. Can I just add both statements to the existing pattern or how should I start solving this?
EDIT:
Alright, I think I got the non-regex method working:
for (int i = 0; i < password.length(); i++) {
if (SPECIAL_CHARACTERS.indexOf(password.charAt(i)) >= 0) // >= 0, because indexOf returns 0 for the first character
specialCharSum++;
}
Can't you just use charAt to get the character and indexOf to check for whether or not the character is special?
final String SPECIAL_CHARACTERS = "?#"; // And others
if (SPECIAL_CHARACTERS.indexOf(password.charAt(0)) >= 0
|| SPECIAL_CHARACTERS.indexOf(password.charAt(password.length() - 1)) >= 0) {
System.out.println("password begins or ends with a special character");
}
I haven't profiled (profiling is the golden rule for performance), but I would expect iterating through a compile-time constant string to be faster than building and executing a finite-state automaton for a regular expression. Furthermore, Java's regular expressions are more complex than FSAs, so I would expect that Java regular expressions are implemented differently and are thus slower than FSAs.
The simplest approach would be an or with grouping.
Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(password);
boolean a = m.find();
if (!a)
System.out.println("Password must contain at least one special character at the beginning or end!");
Use this pattern:
"^\\p{Punct}|\\p{Punct}$"
^\\p{Punct} = "start of string, followed by a punctuation character"
| = "or"
\\p{Punct}$ = "punctuation character, followed by end of string"
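Wrapped in a method, the pattern above might be used as follows. This is a sketch under assumptions: the class PasswordEdges and the method startsOrEndsWithSpecial are hypothetical names, and "special character" is taken to mean \p{Punct}.

```java
import java.util.regex.Pattern;

final class PasswordEdges {

    // Compiled once and reused; matches a punctuation character at the very
    // start of the input or at its end.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}|\\p{Punct}$");

    // Returns true if the password begins or ends with a punctuation character.
    static boolean startsOrEndsWithSpecial(String password) {
        // find() is used because only part of the password has to match.
        return EDGE_PUNCT.matcher(password).find();
    }

    public static void main(String[] args) {
        System.out.println(startsOrEndsWithSpecial("!abc")); // true
        System.out.println(startsOrEndsWithSpecial("abc?")); // true
        System.out.println(startsOrEndsWithSpecial("a#c"));  // false
    }
}
```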
Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* - followed by 0 or more arbitrary characters
matcher.group(1) - get the 1st (only) capturing group.
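One thing to watch with this approach is that group(1) throws IllegalStateException if matches() returned false, for example when the input is shorter than seven characters. A guarded version could look like the sketch below; the class FixedOffsetGroup and the method extract are invented names, and returning null for a non-matching input is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedOffsetGroup {

    // Four arbitrary characters, then three captured ones, then anything.
    private static final Pattern SLICE = Pattern.compile(".{4}(.{3}).*");

    // Returns the captured three characters, or null if the input does not match.
    static String extract(String input) {
        Matcher matcher = SLICE.matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // XXX
        System.out.println(extract("abc"));         // null
    }
}
```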
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4, 7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify what are admissible characters and length of the string between two "test" strings. I assumed any length is OK including zero and that we seek a substring composed of small and capital letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why if you don't need it. Of course you should protect this code against null and strings that are too short.
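A guarded version of the substring call might look like this. It is a minimal sketch: the class SafeSlice and the method threeFromFifth are hypothetical names, and returning null for unusable input is an assumption, since the question does not say what should happen in that case.

```java
final class SafeSlice {

    // Returns the 3 characters starting at the 5th position (index 4),
    // or null if the input is null or shorter than 7 characters.
    static String threeFromFifth(String s) {
        if (s == null || s.length() < 7) {
            return null;
        }
        return s.substring(4, 7);
    }

    public static void main(String[] args) {
        System.out.println(threeFromFifth("testXXXtest")); // XXX
        System.out.println(threeFromFifth("short"));       // null
    }
}
```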
Use the String.replaceAll() Method
If you don't need to be performance optimized, you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
I'm trying to find all the occurrences of "Arrows" in text, so in
"<----=====><==->>"
the arrows are:
"<----", "=====>", "<==", "->", ">"
This works:
String[] patterns = {"<=*", "<-*", "=*>", "-*>"};
for (String p : patterns) {
Matcher A = Pattern.compile(p).matcher(s);
while (A.find()) {
System.out.println(A.group());
}
}
but this doesn't:
String p = "<=*|<-*|=*>|-*>";
Matcher A = Pattern.compile(p).matcher(s);
while (A.find()) {
System.out.println(A.group());
}
No idea why. It often reports "<" instead of "<====" or similar.
What is wrong?
Solution
The following program compiles to one possible solution to the question:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class A {
public static void main( String args[] ) {
String p = "<=+|<-+|=+>|-+>|<|>";
Matcher m = Pattern.compile(p).matcher(args[0]);
while (m.find()) {
System.out.println(m.group());
}
}
}
Run #1:
$ java A "<----=====><<---<==->>==>"
<----
=====>
<
<---
<==
->
>
==>
Run #2:
$ java A "<----=====><=><---<==->>==>"
<----
=====>
<=
>
<---
<==
->
>
==>
Explanation
An asterisk will match zero or more of the preceding characters. A plus (+) will match one or more of the preceding characters. Thus <-* matches < whereas <-+ matches <- and any extended version (such as <--------).
When you match "<=*|<-*|=*>|-*>" against the string "<---", it matches the first part of the pattern, "<=*", because * includes zero occurrences. The * quantifier is greedy, but alternation is not: the engine does not look for the longest possible alternative, it just takes the first one that matches.
Your first solution will match everything that you are looking for because you send each pattern into matcher one at a time and they are then given the opportunity to work on the target string individually.
Your second attempt does not work the same way because you are supplying a single pattern with multiple expressions OR'ed together, and the alternatives are tried in order, leftmost first. If an alternative matches, no matter how minimal the match, find() reports that match and the search continues from there.
See Thangalin's response for a solution that will make the second work like the first.
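The effect of alternation order can be checked by collecting the matches of both patterns on the question's input. The sketch below is illustrative; the class ArrowFinder and the method findAll are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArrowFinder {

    // Collects every successive match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<----=====><==->>";
        // The first alternative <=* succeeds on a bare "<", so "<----" is lost.
        System.out.println(findAll("<=*|<-*|=*>|-*>", input));
        // prints [<, =====>, <==, ->, >]

        // With +, an alternative only succeeds if at least one = or - follows.
        System.out.println(findAll("<=+|<-+|=+>|-+>|<|>", input));
        // prints [<----, =====>, <==, ->, >]
    }
}
```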
For <======= you need <=+ as the regex. <=* will match zero or more ='s, which means it also matches the zero case, so an input like <--- yields just <. The same goes for the other cases you have. You should read up a bit on regexes. This book is FANTASTIC:
Mastering Regular Expressions
Your regex pattern finds most of the arrows in your example "<----=====><==->>", but it reports the first one as "<" rather than "<----":
String p = "<=*|<-*|=*>|-*>";
Matcher A = Pattern.compile(p).matcher(s);
while (A.find()) {
System.out.println(A.group());
}
The pattern is broken for inputs such as "<-", which yields "<", whereas "<=" yields "<=" as it should. This is because the first alternative, <=*, is tried first and succeeds after matching just "<" when no = follows.