regex parenthesis OR digit start of line - java

I'm new to regex and trying to get this statement to pass. What I'm trying to do is check the first character or two from the string.
I want to see that if it starts with a number then the statement will be true. I also want to check that if the first character is a "(" then i want to check that the next number is a digit too.
So far I've got:
if (str.matches("^[(?\d")){
return true;
but this doesn't seem to work. ^ for anchoring to start, (? to check for optional parenthesis and then check if it's a digit afterwards. How have i stuffed up?
So 0800, (0800, (09, 09, should pass where as *08, (*0, AB, (AB, *AB should fail.
Thanks

You need to remove the opening square bracket and escape the metacharacters. Also, matches() tells whether or not the entire string matches the given regular expression. So, you need to add the token .* afterwards to greedily match every single character in the string.
if (str.matches("\\(?\\d.*")) { ... }
Ideone Demo

This regex should do the trick:
^\d.*|^\(\d.*
In Java, with escaping of backslashes:
if (str.matches("^\\d.*|^\\(\\d.*"))

Related

Regex return true if even a substring follows the pattern

I was just practicing regex and found something intriguing
for a string
"world9 a9$ b6$" my regular expression "^(?=.*[\\d])(?=\\S+\\$).{2,}$"
will return false as there is a space in between before the look ahead finds the $ sign with at least one digit and non space character.
As a whole the string doesn't matches the pattern.
What should be the regular expression if I want to return true even if a substring follows a pattern?
as in this one a9$ and b6$ both follow the regular expression.
You can use
^(?=\D*\d)(?=.*\S\$).{2,}$
See the regex demo. As The fourth bird mentions, since \S\$ matches two chars, you may simply move the pattern to the consuming part, and use ^(?=\D*\d).*\S\$.*$, see this regex demo.
Details
^ - start of string (implicit if used in .matches())
(?=\D*\d) - a positive lookahead that requires zero or more non-digit chars followed with a digit char immediately to the right of the current location
(?=.*\S\$) - a positive lookahead that requires zero or more chars other than line break chars, as many as possible, followed with a non-whitespace char and a $ char immediately to the right of the current location
.{2,} - any two or more chars other than line break chars, as many as possible
$ - end of string (implicit if used in .matches())
Mostly, knock out the ^ and $ bits, as those force this into a full string match, and you want substring matches. In general, look-ahead seems like a mistake here, what are you trying to accomplish by using that? (Look-ahead/look-behind is rarely needed in general). All you need is:
Pattern.compile("\\S+\\$");
possibly, if you want an element (such as a9$) to stand on its own, use \b which is regexpese for word break: Basically, whitespace (and a few other characters, such as underscores. Most non-letter, non-digits characters are considered a break. Think [^a-zA-Z0-9]) - but \b also matches start/end of input. Thus:
Pattern.compile("\\b\\S+\\$\\b")
still matches foo a9$ bar, or a9$ just fine.
If you MUST put this in terms of a full match, e.g. because matches() (which always does a full string match) is run and you can't change that, well, put ^.* in front and .*$ at the back of it, simple as that.
Absolutely nothing about this says "This can only be needed with lookahead".

Java Regex with "Joker" characters

I try to have a regex validating an input field.
What i call "joker" chars are '?' and '*'.
Here is my java regex :
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm tying to match is :
Minimum 2 alpha-numeric characters (other than '?' and '*')
The '*' can only appears one time and at the end of the string
The '?' can appears multiple time
No WhiteSpace at all
So for example :
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
abcd = Not OK (space at the begining)
I've made the regex a bit complicated and I'm lost can you help me?
^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear twice or more:
\?*[a-zA-Z\d]\?*
which asserts that there must be one character in the class [a-zA-Z\d] with 0 to infinity questions marks on the left or right of it.
Then, the regex matches \*?, which means an 0 or 1 asterisk character, at the end of the string.
Demo
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
Demo
Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both described at demonstrated at Regex101.
^ is a start of the String
\?* indicates any number of initial ? characters (must be escaped)
\w{2,} at least 2 alphanumeric characters
\?* continues with any number of and ? characters
\*? and optionally one last * character
(?<!\s) and the whole String must have not \s white character (using negative look-behind)
$ is an end of the String
Other way to solve this problem could be with look-ahead mechanism (?=subregex). It is zero-length (it resets regex cursor to position it was before executing subregex) so it lets regex engine do multiple tests on same text via construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: last condition (conditionN) is not placed in (?=...) to let regex engine move cursor after tested part (to "consume" it) and move on to testing other things after it. But to make it possible conditionN must match precisely that section which we want to "consume" (earlier conditions didn't have that limitation, they could match substrings of any length, like lets say few first characters).
So now we need to think about what are our conditions.
We want to match only alphanumeric characters, ?, * but * can appear (optionally) only at end. We can write it as ^[a-zA-Z0-9?]*[*]?$. This also handles non-whitespace characters because we didn't include them as potentially accepted characters.
Second requirement is to have "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't actually test whole text but only some part of it, we can place it in look-ahead mechanism.
Above conditions seem to cover all we wanted so we can combine them into regex which can look like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$

Java Regular Expression Negative Look Ahead Finding Wrong Match

Assume I have the following string.
create or replace package test as
-- begin null; end;/
end;
/
I want a regular expression that will find the semicolon not preceded by a set of "--" double dashes on the same line. I'm using the following pattern "(?!--.*);" and I'm still getting matches for the two semicolons on the 2nd line.
I feel like I'm missing something about negative look aheads but I can't figure out what.
If you want to match semicolons only on the lines which do not start with --, this regex should do the trick:
^(?!--).*(;)
Example
I only made a few changes from your regex:
Multi-line mode, so we can use ^ and $ and search by line
^ at the beginning to indicate start of a line
.* between the negative lookahead and the semicolon, because otherwise with the first change it would try to match something like ^;, which is wrong
(I also added parentheses around the semicolon so the demo page displays the result more clearly, but this is not necessary and you can change to whatever is most convenient for your program.)
First of all, what you need is a negative lookbehind (?<!) and not a negative lookahead (?!) since you want to check what's behind your potential match.
Even with that, you won't be able to use the negative lookbehind in your case since the Java's regex engine does not support variable length lookbehind. This means that you need to know exactly how many characters to look behind your potential match for it to work.
With that said, wouldn't be simpler in your case to just split your String by linefeed/carriage return and then remove the line that start with "--"?
The reason "(?!--.*);" isn't working is because the negative look ahead is asserting that when positioned before a ; that the next two chars are --, which of course matches every time (; is always not --).
In java, to match a ; that doesn't have -- anywhere before it:
"\\G(((?<!--)[^;])*);"
To see this in action using a replaceAll() call:
String s = "foo; -- begin null; end;";
s = s.replaceAll("\\G(((?<!--)[^;])*);", "$1!");
System.out.println(s);
Output:
foo! -- begin null; end;
Showing that only semi colons before a double dash are matched.

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
[] are used to match any of the things you specify inside the brackets, and are to be followed by a quantifier.
The second regexp will work: ^\(\d+\), so check your code.
Check also so there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, java can be tricky with escapes depending on if the regexp you type is directly translated to a regexp or is first a string literal. You may need to double escape your escapes.
In Java you have to escape parenthesis, so "\\(\\d+\\)" should match (1) in case one and two. Adding ^ as you did "^\\(\\d+\\)" will match only case1.
You have to use double back slashes within java string. Consider this
"\n" give you [line break]
"\\n" give you [backslash][n]
If you are going to downvote my post, at least comment to tell me WHY it's not useful.
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $

regex certain character can exist or not but nothing after that

I'm new to regex and I'm trying to do a search on a couple of string.
I wanted to check if a certain character, in this case its ":" (without the quote) exist on the strings.
If : does not exist in the string it would still match, but if : exist there should be nothing after that only space and new line will be allowed.
I have this pattern, but it does not seem to work as I want it.
(.*)(:?\s*\n*)
Thank you.
If I understand your question correctly, ^[^:]*(:\s*)?$
Let's break this down a bit:
^ Starting anchor; without this, the match can restart itself every time it sees another colon, or non-whitespace following a colon.
[^:]* Match any number of characters that AREN'T colon characters; this way, if the entire string is non-colon characters, the string is treated as a valid match.
(:\s*)? If at any point we do see a colon, all following characters must be white space until the end of the string; the grouping parens and following ? act to make this an all-or-nothing conditional statement.
$ Ending anchor; without this, the regex won't know that if it sees a colon the following whitespace MUST persist until the end of the string.
here is a pattern which should work
/^([^:]*|([^:]*:\s*))$/
you can use the pipe to manage alternatives
Another way is :
^[^:]*(|:[\n]*)$
^[^:]* => starts with anything except :
(|:[\n]*)$ => ends either with exactly nothing OR ':' followed by line breaks

Categories