Java regex return after first match - java

How do I return after the first match of a regular expression? (Does the Matcher.find() method do that?)
Say I have a string "abcdefgeee". I want the regex engine to stop searching as soon as it finds the first match of "e", for example. I am writing a method that returns true/false depending on whether the pattern is found, and I don't want to scan the whole string for "e". (I am looking for a regex solution.)
Another question: sometimes when I use matches(), it doesn't return what I expect. For example, if I compile my pattern as "[a-z]" and then use matches(), it doesn't match. But when I compile the pattern as ".*[a-z].*", it matches. Is that the behaviour of the matches() method of the Matcher class?
Edit: here's what I actually want to do. For example, I want to search for a $ sign AND a # sign in a string. So I would define two compiled patterns (I only know the basics and can't find a logical AND for regex).
pattern1 = Pattern.compile("\\$"); // $ is a regex metacharacter, so it is escaped here
pattern2 = Pattern.compile("#");
Then I would just use
if ( match1.find() && match2.find() ){
return true;
}
in my method.
I only want the matchers to search the string for the first occurrence and return.
Thanks

For your second question: matches() does work correctly; your example uses two different regular expressions.
.*[a-z].* will match a String that contains at least one lower-case letter a-z anywhere in it. [a-z] will only match a one-character String that is a lower-case letter a-z. I think you might mean to use something like [a-z]+
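To make the difference concrete, here is a minimal sketch comparing matches() and find() on the patterns discussed above. The class and method names (MatchesVsFind, wholeMatch, partialMatch) are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only if the regex accounts for the whole input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True as soon as the regex matches somewhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("[a-z]", "a"));        // true: one lower-case letter
        System.out.println(wholeMatch("[a-z]", "abc"));      // false: three characters
        System.out.println(wholeMatch("[a-z]+", "abc"));     // true
        System.out.println(wholeMatch(".*[a-z].*", "ABcD")); // true: contains a lower-case letter
        System.out.println(partialMatch("[a-z]", "ABcD"));   // true: find() does not need the whole string
    }
}
```

If all you need is "does the pattern occur anywhere", find() is usually the more direct choice than wrapping the pattern in .* on both sides.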

Another question: sometimes when I use matches(), it doesn't return what I expect. For example, if I compile my pattern as "[a-z]" and then use matches(), it doesn't match. But when I compile the pattern as ".*[a-z].*", it matches. Is that the behaviour of the matches() method of the Matcher class?
Yes, matches(...) tests the entire target string.
... here's what I actually want to do. For example, I want to search for a $ sign AND a # sign in a string. So I would define two compiled patterns (I only know the basics and can't find a logical AND for regex).
I know you said you wanted to use regex, but all your examples seem to suggest you have no need for it: those are all single characters that can be handled with a couple of indexOf(...) calls.
Anyway, using regex, you could do it like this:
public static boolean containsAll(String text, String... patterns) {
for(String p : patterns) {
Matcher m = Pattern.compile(p).matcher(text);
if(!m.find()) return false;
}
return true;
}
But, again: indexOf(...) would do the trick as well:
public static boolean containsAll(String text, String... subStrings) {
for(String s : subStrings) {
if(text.indexOf(s) < 0) return false;
}
return true;
}
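As a rough illustration of the "stop at the first match" behaviour, the sketch below reports where find() stops, and also shows why a literal such as "$" should be quoted before it is handed to a regex-based method. The names FirstMatchDemo, firstMatchIndex and containsAllLiterals are hypothetical; Pattern.quote is the standard way to treat a string as a literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchDemo {
    // Index of the first match of regex in text, or -1 if there is none.
    // find() returns as soon as one match is found; it does not look for later ones.
    static int firstMatchIndex(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    // True if every literal string occurs somewhere in text.
    static boolean containsAllLiterals(String text, String... literals) {
        for (String literal : literals) {
            // Pattern.quote makes metacharacters such as $ match themselves.
            if (!Pattern.compile(Pattern.quote(literal)).matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchIndex("e", "abcdefgeee"));      // 4
        System.out.println(firstMatchIndex("$", "abc"));             // 3: end of input, not a dollar sign
        System.out.println(firstMatchIndex("\\$", "abc"));           // -1
        System.out.println(containsAllLiterals("a$b#c", "$", "#"));  // true
        System.out.println(containsAllLiterals("a$bc", "$", "#"));   // false
    }
}
```

Note that an unescaped "$" matches the end of the input, so find() succeeds on any string; that is easy to miss when the pattern comes from a caller.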

Related

Regex to replace All turkish symbols to regular latin symbols

I have a class that replaces all turkish symbols to similar latin symbols and pass the result to searcher.
these are the methods for symbol replacement
@Override
String replaceTurkish(String words) {
if (checkWithRegExp(words)) {
return words.toLowerCase().replaceAll("ç", "c").replaceAll("ğ", "g").replaceAll("ı", "i").
replaceAll("ö", "o").replaceAll("ş", "s").replaceAll("ü", "u");
} else return words;
}
public static boolean checkWithRegExp(String word){
Pattern p = Pattern.compile("[öçğışü]");
Matcher m = p.matcher(word);
return m.matches();
}
But this always returns the words string unmodified.
What am I doing wrong?
Thanks in advance!
Per the Java 7 api, Matcher.matches()
Attempts to match the entire region against the pattern.
Your pattern is "[öçğışü]", which regex101.com (an awesome resource) says will match
a single character in the list öçğışü literally
Perhaps you may see the problem already. Your regex is not going to match anything except a single Turkish character, since you are attempting to match the entire region against a regex which will only ever accept one character.
I recommend either using find(), per suggestion by Andreas in the comments, or using a regex like this:
".*[öçğışü].*"
which should actually find words which contain any Turkish-specific characters.
Additionally, I'll point out that regex is case-sensitive, so if there are upper-case variants of these letters, you should include those as well and modify your replace statements.
Finally (edit): you can make your Pattern case-insensitive, but your replaceAll calls will still need to change to handle upper case. Note that Pattern.CASE_INSENSITIVE on its own only covers US-ASCII letters; for non-ASCII letters it has to be combined with Pattern.UNICODE_CASE, so you should test that combination before relying on it.
Pattern p = Pattern.compile(".*[öçğışü].*", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
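For reference, here is a minimal sketch of the find() variant suggested above. It is an illustration under a few assumptions, not the asker's actual class: the name TurkishFolding is made up, only the six lower-case letters from the question are handled, case is left unchanged, and the letters are written as Unicode escapes so the source does not depend on the file encoding.

```java
import java.util.regex.Pattern;

class TurkishFolding {
    // The six lower-case letters from the question, in this order:
    // o-umlaut, c-cedilla, soft g, dotless i, s-cedilla, u-umlaut.
    private static final Pattern TURKISH =
            Pattern.compile("[\u00f6\u00e7\u011f\u0131\u015f\u00fc]");

    // find() succeeds if any one of the letters occurs anywhere in the word.
    static boolean containsTurkish(String word) {
        return TURKISH.matcher(word).find();
    }

    // Replaces each of the six letters with its plain Latin look-alike.
    static String replaceTurkish(String words) {
        if (!containsTurkish(words)) {
            return words;
        }
        return words.replace('\u00e7', 'c')
                    .replace('\u011f', 'g')
                    .replace('\u0131', 'i')
                    .replace('\u00f6', 'o')
                    .replace('\u015f', 's')
                    .replace('\u00fc', 'u');
    }
}
```

Since these are single-character substitutions, String.replace(char, char) is enough here; replaceAll with a regex is not needed for the replacement step.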

regex in java string

While trying some Java coding on the codingbat.com site, I repeatedly ran into a question about how regular expressions work in Java strings.
I know there are Java methods like matches() or find() as well as replace() and so on, but this isn't where I wanted to go.
Take a quick look at the example:
boolean doubleX(String str) {
if(str.equals("xx")){
return true;
} else {
return false;
}
}
I wonder whether I could use regular expressions in the string to add a quantifier, for example
<----- add regex here
if(str.equals("x\[x.*]")){
Would you be so kind as to explain how I could use regex in strings? From what I understood, I thought it would be possible even without the Java regex methods, because the escape character \ makes them usable in plain code. Did I get this wrong?
Use String#matches(String)
if (str.matches(regex)) {
// ...
}
This will only find out if there is a match for the regex though.
What I suggest is that you specify the quantifier in your regex instead of counting the number of matches, like so:
public boolean isX(String str, int count) {
return str.matches("^x{" + count + "}$");
}
Some methods accept a regex as input and some do not. In general you can't use a regex in a plain String comparison, because to a method like equals() it is just a plain string. Your own methods or a framework's methods can still support regex internally, using Pattern or other approaches.
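A small sketch of that distinction, using the quantifier from the question. The class name LiteralVsRegex and its two helper methods are made up for illustration; equals() and matches() are the standard String methods.

```java
class LiteralVsRegex {
    // equals() compares characters one by one; "x{2}" is just five characters here.
    static boolean literalEquals(String str) {
        return str.equals("x{2}");
    }

    // matches() compiles its argument as a regex and tests the whole string against it.
    static boolean regexMatches(String str) {
        return str.matches("x{2}");
    }

    public static void main(String[] args) {
        System.out.println(literalEquals("xx"));    // false
        System.out.println(literalEquals("x{2}"));  // true
        System.out.println(regexMatches("xx"));     // true
        System.out.println(regexMatches("x{2}"));   // false
        System.out.println(regexMatches("xxx"));    // false: matches() needs the whole string to fit
    }
}
```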
You can use the Pattern and the Matcher class
private final Pattern PATTERN = Pattern.compile("x\\[x.*]"); // the backslash must be doubled inside a Java string literal
and then
Matcher matcher = PATTERN.matcher(str);
if (matcher.find())
doSomething();

Negative Lookaround Regex - Only one occurrence - Java

I am trying to find if a string contains only one occurrence of a word ,
e.g.
String : `jjdhfoobarfoo` , Regex : `foo` --> false
String : `wewwfobarfoo` , Regex : `foo` --> true
String : `jjfffoobarfo` , Regex : `foo` --> true
Multiple foo's may occur anywhere in the string, so they can be non-consecutive.
I tested the following regex in Java with the string foobarfoo, but it doesn't work: it returns true.
static boolean testRegEx(String str){
return str.matches(".*(foo)(?!.*foo).*");
}
I know this topic may seem like a duplicate, but I am surprised, because when I use this regex: (foo)(?!.*foo).* it works!
Any idea why this happens?
Use two anchored look-aheads:
static boolean testRegEx(String str){
return str.matches("^(?=.*foo)(?!.*foo.*foo.*$).*");
}
The key points are that the negative look-ahead checking for two foo's is anchored to the start and, importantly, contains an end-of-input anchor.
If you want to check if a string contains another string exactly once, here are two possible solutions, (one with regex, one without)
static boolean containsRegexOnlyOnce(String string, String regex) {
Matcher matcher = Pattern.compile(regex).matcher(string);
return matcher.find() && !matcher.find();
}
static boolean containsOnlyOnce(String string, String substring) {
int index = string.indexOf(substring);
if (index != -1) {
return string.indexOf(substring, index + substring.length()) == -1;
}
return false;
}
All of them work fine. Here's a demo of your examples:
String str1 = "jjdhfoobarfoo";
String str2 = "wewwfobarfoo";
String str3 = "jjfffoobarfo";
String foo = "foo";
System.out.println(containsOnlyOnce(str1, foo)); // false
System.out.println(containsOnlyOnce(str2, foo)); // true
System.out.println(containsOnlyOnce(str3, foo)); // true
System.out.println(containsRegexOnlyOnce(str1, foo)); // false
System.out.println(containsRegexOnlyOnce(str2, foo)); // true
System.out.println(containsRegexOnlyOnce(str3, foo)); // true
You can use this pattern:
^(?>[^f]++|f(?!oo))*foo(?>[^f]++|f(?!oo))*$
It's a bit long but performant.
The same with the classical example of the ashdflasd string:
^(?>[^a]++|a(?!shdflasd))*ashdflasd(?>[^a]++|a(?!shdflasd))*$
details:
(?> # open an atomic group
[^f]++ # all characters but f, one or more times (possessive)
| # OR
f(?!oo) # f not followed by oo
)* # close the group, zero or more times
The possessive quantifier ++ is like the greedy quantifier + but doesn't allow backtracking.
The atomic group (?>..) is like a non-capturing group (?:..) but doesn't allow backtracking either.
These features are used here for performance (memory and speed), but the subpattern can be replaced by:
(?:[^f]+|f(?!oo))*
The problem with your regex is that the first .* initially consumes the whole string, then backs off until it finds a spot where the rest of the regex can match. That means, if there's more than one foo in the string, your regex will always match the last one. And from that position, the lookahead will always succeed as well.
Regexes that you use for validating have to be more precise than the ones you use for matching. Your regex is failing because the .* can match the sentinel string, 'foo'. You need to actively prevent matches of foo before and after the one you're trying to match. Casimir's answer shows one way to do that; here's another:
"^(?>(?!foo).)*+foo(?>(?!foo).)*+$"
It's not quite as efficient, but I think it's a lot easier to read. In fact, you could probably use this regex:
"^(?!.*foo.*foo).+$"
It's a great deal more inefficient, but a complete regex n00b would probably figure out what it does.
Finally, notice that none of these regexes (mine or Casimir's) uses lookbehinds. I know it seems like the perfect tool for the job, but no. In fact, lookbehind should never be the first tool you reach for. And not just in Java. Whatever regex flavor you use, it's almost always easier to match the whole string in the normal way than it is to use lookbehinds. And usually much more efficient, too.
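To see the greedy first .* in action, the sketch below reports which foo the original regex captures and compares it with the stricter pattern given above. The class and method names (GreedyLastFoo, capturedFooStart, exactlyOneFoo) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLastFoo {
    // The regex from the question: the leading .* is greedy.
    static final Pattern ORIGINAL = Pattern.compile(".*(foo)(?!.*foo).*");

    // The stricter alternative: no foo may start before or after the matched one.
    static final Pattern STRICT = Pattern.compile("^(?>(?!foo).)*+foo(?>(?!foo).)*+$");

    // Start index of the captured foo if the whole input matches, else -1.
    static int capturedFooStart(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.matches() ? m.start(1) : -1;
    }

    static boolean exactlyOneFoo(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The original regex matches, and the captured foo is the last one (index 6).
        System.out.println(capturedFooStart("foobarfoo"));  // 6
        System.out.println(exactlyOneFoo("foobarfoo"));     // false
        System.out.println(exactlyOneFoo("wewwfobarfoo"));  // true
        System.out.println(exactlyOneFoo("jjfffoobarfo"));  // true
    }
}
```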
Someone answered the question but then deleted the answer.
The following short code works correctly:
static boolean testRegEx(String str){
return !str.matches("(.*?foo.*){0}|(.*?foo.*){2,}");
}
Any idea on how to invert the result inside the regex itself ?

Need regex to match the given string

I need a regex to match a particular string, say 1.4.5 in the below string . My string will be like
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
If I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, you want a word character before the match and no digit after it. Is that correct?
For that you can use lookbehind and lookahead assertions. Those assertions do only check their condition, but they do not match, therefore that stuff is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern, which can be simplified too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
That would be my pattern. ((?:...) is a non-capturing group.)
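A minimal sketch of that pattern in use, assuming you only want the first such match. The class and method names (VersionExtractor, firstVersion) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionExtractor {
    // Word character before, digit.digit.digit, no digit after.
    // The look-arounds only check their condition; they are not part of the match.
    private static final Pattern VERSION =
            Pattern.compile("(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)");

    // Returns the first match, or null if the input contains none.
    static String firstVersion(String input) {
        Matcher m = VERSION.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstVersion("absdfsdfsdfc1.4.5kdecsdfsdff")); // 1.4.5
        System.out.println(firstVersion("1.4.5k"));                       // null: nothing before the 1
    }
}
```

Because the surrounding characters are only asserted, group() returns 1.4.5 directly and no capturing group is needed.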
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match x numbers, separated by x-1 dots in between?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str= "absdfsdfsdfc**1.4.5**kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
String regex ="[0-9]{1}\\.[0-9]{1}\\.[0-9]{1}";
Matcher matcher = Pattern.compile( regex ).matcher( str);
if (matcher.find())
{
String year = matcher.group(0);
System.out.println(year);
}
else
{
System.out.println("no match found");
}

How to find the exact word using a regex in Java?

Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could be possibly wrong with this approach? If it is wrong, then what is the right solution to find the exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.
When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
m.find();
System.out.println(m.group());
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
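Calling find() repeatedly might look like the sketch below, which collects the start offset of every whole-word occurrence. The names FindAllWords and wordOffsets, and the sample sentence, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllWords {
    // Start offsets of every whole-word occurrence of "this" in text.
    static List<Integer> wordOffsets(String text) {
        Matcher m = Pattern.compile("\\bthis\\b").matcher(text);
        List<Integer> offsets = new ArrayList<>();
        // Each call to find() resumes where the previous match ended.
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // "thistle" is skipped because there is no word boundary after "this".
        System.out.println(wordOffsets("this and this, not thistle")); // [0, 9]
    }
}
```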
Full example method for matcher:
public static String REGEX_FIND_WORD="(?i).*?\\b%s\\b.*?";
public static boolean containsWord(String text, String word) {
String regex=String.format(REGEX_FIND_WORD, Pattern.quote(word));
return text.matches(regex);
}
Explanation:
(?i) - ignore case
.*? - allow (optionally) any characters before
\b - word boundary
%s - variable to be replaced by String.format (quoted to avoid regex errors)
\b - word boundary
.*? - allow (optionally) any characters after
For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
This writes "true":
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));
You may use groups to find the exact word. Regex API specifies groups by parentheses. For example:
A(B(C))D
This statement consists of three groups, which are indexed from 0.
0th group - ABCD
1st group - BC
2nd group - C
So if you need to find a specific word, you can use two methods of the Matcher class: find() to locate the text described by the regex, and then group(int) to get the String captured by a given group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
m.find();
System.out.println(m.group(1));
The above code prints "beautiful".
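The group numbering for A(B(C))D can be checked with a short sketch like this one. The class and method names (GroupNumbering, groups) and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or an empty array if the regex is not found in the input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() does not include group 0, hence the + 1.
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints ABCD, BC, C on separate lines.
        for (String g : groups("A(B(C))D", "xABCDy")) {
            System.out.println(g);
        }
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting from left to right.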
Is your searchString going to be regular expression? if not simply use String.contains(CharSequence s)
System.out.println(input.matches(".*\\bthis$"));
also works. Here the .* matches everything up to and including the space, and then this must be a whole word at the end of the input.
