Regex for multiple instances of character - java

In Java, using a regular expression, how would I check whether a string contains the correct number of instances of a character?
For example, take the string hello.world.hello:world:. How could this string be checked to see whether it contains two instances of a . or two instances of a :?
I have tried
Pattern p = Pattern.compile("[:]{2}");
Matcher m = p.matcher("hello.world.hello:world:");
m.find();
but that failed.
Edit
First I would like to say thank you for all the answers. I noticed a lot of the answers said something along the lines of "This means: zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice". So if you were checking for 3 : in a string such as Hello::World: how would you do it?

Well, using matches you could use:
"([^:]*:[^:]*){2}"
This means: "zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice".
Using find is not as good, as there may be additional : and it will just ignore them.
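A minimal sketch of the difference between matches and find for this pattern. The class name ColonMatchDemo and the sample strings are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class ColonMatchDemo {
    // The pattern from the answer above: two groups, each containing exactly one colon.
    static final Pattern TWO_COLONS = Pattern.compile("([^:]*:[^:]*){2}");

    // matches() requires the whole input to fit the pattern.
    static boolean wholeStringMatches(String s) {
        return TWO_COLONS.matcher(s).matches();
    }

    // find() is satisfied by any substring that fits the pattern,
    // so additional colons elsewhere in the input are ignored.
    static boolean substringFound(String s) {
        return TWO_COLONS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches("hello.world.hello:world:"));
        System.out.println(wholeStringMatches("a:b:c:d"));
        System.out.println(substringFound("a:b:c:d"));
    }
}
```

With three colons in the input, find still reports success because the first two colons are enough, while matches rejects the string.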

You can use this regex based on two lookaheads assertions:
^(?=(?:[^.]*\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)
(?=(?:[^.]*\.){2}[^.]*$) makes sure there are exactly 2 DOTS and (?=(?:[^:]*:){2}[^:]*$) asserts that there are exactly 2 colons in input string.
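As a rough sketch, the lookahead pattern could be used from Java as follows. The class and method names are hypothetical. Because the pattern consists only of an anchor and lookaheads, it consumes no characters, so find() is used here rather than matches().

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class TwoDotsTwoColons {
    // Same regex as above, with backslashes doubled for a Java string literal.
    static final Pattern BOTH = Pattern.compile(
        "^(?=(?:[^.]*\\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)");

    // True when the input has exactly two dots AND exactly two colons.
    static boolean hasTwoDotsAndTwoColons(String s) {
        return BOTH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTwoDotsAndTwoColons("hello.world.hello:world:"));
        System.out.println(hasTwoDotsAndTwoColons("a.b:c:"));
    }
}
```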

You can determine whether the string has exactly the given number of a certain character, say ':', by attempting to match it against a pattern of this form:
^(?:[^:]*[:]){2}[^:]*$
That says: exactly two repetitions of a non-capturing group consisting of any number (including zero) of characters other than ':' followed by one colon, with the second repetition followed by any number of additional characters other than ':'.
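The same form extends to other counts by changing the number in the braces, which also covers the three-colon case from the question's edit. A minimal sketch, assuming a hypothetical helper hasExactlyColons that builds the pattern for a given count:

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ColonCounter {
    // Builds ^(?:[^:]*:){n}[^:]*$ for the requested count n and tests the whole input.
    static boolean hasExactlyColons(String s, int n) {
        Pattern p = Pattern.compile("^(?:[^:]*:){" + n + "}[^:]*$");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyColons("hello.world.hello:world:", 2));
        System.out.println(hasExactlyColons("Hello::World:", 3));
        System.out.println(hasExactlyColons("Hello::World:", 2));
    }
}
```

Compiling the pattern on each call keeps the sketch short; code that runs this check often would probably cache the compiled Pattern per count.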

Related

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may contain a separator ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.matches(regex, String), but I am not able to find the correct regex.
Dealing with numbers can be difficult. This approach will deal with your examples, but check carefully. I also didn't do "all characters" in the middle grouping, as "all" would include numbers, so instead I assumed that finding the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The foregoing won't match that case.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
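A minimal sketch of how the whitelisted-operator expression might be used to pull an equation apart. The class name SimpleEquation and the method split are hypothetical. Group 1 holds the left number, group 3 the operator with any surrounding whitespace, and group 4 the right number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SimpleEquation {
    // The whitelisted-operator regex from the answer above.
    static final Pattern EQUATION = Pattern.compile(
        "((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)");

    // Returns {left, operator, right}, or null when the whole input does not match.
    static String[] split(String input) {
        Matcher m = EQUATION.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3).trim(), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", split("400,021+213.443")));
        System.out.println(String.join(" | ", split("300 - -200")));
    }
}
```

Note that [\d,.]+ accepts any mix of digits, commas and dots, so the sketch does not check that a number is well formed.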
In your regex you have to escape the dot as \. to match it literally, and escape the plus as \+, or else it would make the ? a possessive quantifier. To match 1+ digits you have to use a quantifier: [0-9]+
For your example data, you could match 1+ digits followed by an optional part consisting of a dot or a comma and 1+ more digits, both at the start and at the end. If you want to match any single character between them, you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any character (use .+ to repeat 1+ times)
Same as the first pattern
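A small sketch comparing the any-character version with the character-class version mentioned above. The class and method names are hypothetical, and the operator list [-+*] is just the example given in this answer. String matching is done with matches(), so the whole input must fit.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class NumberOpNumber {
    // Any single character is allowed between the two numbers.
    static final Pattern ANY_CHAR =
        Pattern.compile("\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?");

    // Only the listed operators are allowed between the two numbers.
    static final Pattern OPERATOR =
        Pattern.compile("\\d+(?:[.,]\\d+)?[-+*]\\d+(?:[.,]\\d+)?");

    static boolean matchesAnyChar(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    static boolean matchesOperator(String s) {
        return OPERATOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyChar("400,021+213.443"));
        System.out.println(matchesAnyChar("500x400"));
        System.out.println(matchesOperator("500x400"));
    }
}
```

The character-class version is stricter: it rejects inputs such as 500x400 that the any-character version accepts.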

Formulating a regex with a single dot

I am trying to formulate a regex for the following scenario :
The String to match : mName87.com
So, the string may consist of any number of alphanumeric characters, but can contain only a single dot anywhere in the string.
I formulated this regex: [a-zA-Z0-9.], but it matches even multiple dots (.).
What am I doing wrong here?
The regex you provided matches only a single character of the string you're trying to validate. There are a few things to take care of in your scenario:
You want to match over the whole string, so your regex must start with ^ (beginning of the string) and end with $ (end of the string).
Then you want to accept any number of alpha-numeric characters, this is done with [a-zA-Z0-9]+, here the + means one or more characters.
Then match the point: \. (you must escape it here)
Finally accept more characters again.
All together the regex would then be:
^[a-zA-Z0-9]+\.[a-zA-Z0-9]+$
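A minimal Java sketch of this regex in use. The class and method names are hypothetical. Note that this pattern requires at least one alphanumeric character on each side of the dot, which is an assumption of this answer rather than something stated in the question.

```java
// Hypothetical helper class, for illustration only.
final class SingleDot {
    // True when the input is alphanumerics, exactly one dot, then alphanumerics.
    static boolean hasSingleDot(String s) {
        return s.matches("^[a-zA-Z0-9]+\\.[a-zA-Z0-9]+$");
    }

    public static void main(String[] args) {
        System.out.println(hasSingleDot("mName87.com"));
        System.out.println(hasSingleDot("a.b.c"));
    }
}
```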
You can use this regex:
\\w*\\.\\w*
Try with:
^([a-zA-Z0-9]+\.)+[a-zA-Z]$
use this regular expression ^[a-zA-Z0-9]*\.[a-zA-Z0-9.]*$
EDITED:
Try
([a-zA-Z0-9]+\.[a-zA-Z0-9]+)|(\.[a-zA-Z0-9]+)|([a-zA-Z0-9]+\.)
That is: [two words with the dot in the middle] OR [a word that starts with a dot] OR [a word that ends with a dot]

How to combine these regex for javascript

Hi, I am trying to use a regex in JS to identify 3 identical consecutive characters (these could be letters, digits, or any non-alphanumeric characters).
This identifies 3 identical consecutive letters or digits: '(([0-9a-zA-Z])\1\1)'
This identifies 3 identical consecutive non-alphanumerics: '(([^0-9a-zA-Z])\1\1)'
I am trying to combine both, like this: '(([0-9a-zA-Z])\1\1)|(([^0-9a-zA-Z])\1\1)'
But I am doing something wrong and it's not working (it returns true for '88aa3BBdd99##').
Edit : And to find NO 3 identical characters, this seems to be wrong /(^([0-9a-zA-Z]|[^0-9a-zA-Z])\1\1)/ --> RegEx in JS to find No 3 Identical consecutive characters
thanks
Nohsib
The problem is that backreferences are counted from left to right throughout the whole regex. So if you combine them your numbers change:
(([0-9a-zA-Z])\2\2)|(([^0-9a-zA-Z])\4\4)
You could also remove the outer parens:
([0-9a-zA-Z])\1\1|([^0-9a-zA-Z])\2\2
Or you could just capture the alternatives in one set of parens together and append one back-reference to the end:
([0-9a-zA-Z]|[^0-9a-zA-Z])\1\1
But since your character classes match all characters anyway you can have that like this as well:
([\s\S])\1\1
And if you activate the DOTALL or SINGLELINE option, you can use a . instead:
(.)\1\1
It's actually much simpler:
(.)\1\1
The (.) matches any character, and each \1 is a back reference that matches the exact string that was matched by the first capturing group. You should be aware of what the . actually matches and then modify the group (in the parentheses) to fit your exact needs.
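The thread is about JavaScript, but the backreference syntax for this pattern is the same in Java, so here is a rough Java sketch. The class and method names are hypothetical, and Pattern.DOTALL is used so that . also matches line terminators.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class TripleChar {
    // One captured character followed by two back references to it.
    static final Pattern TRIPLE = Pattern.compile("(.)\\1\\1", Pattern.DOTALL);

    // True when the input contains 3 identical consecutive characters anywhere.
    static boolean hasTriple(String s) {
        return TRIPLE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTriple("88aa3BBdd99##"));
        System.out.println(hasTriple("ab###c"));
    }
}
```

To check that a string has no such run, negate the result of hasTriple.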

Regular Expression to match one or more digits 1-9, one '|', one or more '*' and zero or more ','

I'm new to regular expressions and I need to find a regular expression that matches one or more digits [1-9], only ONE '|' sign, one or more '*' signs and zero or more ',' signs.
The string should not contain any other characters.
This is what I have:
if(this.ruleString.matches("^[1-9|*,]*$"))
{
return true;
}
Is it correct?
Thanks,
Vinay
I think you should test separately for each type of symbol rather than write one complex expression.
First, test that there are no invalid symbols: "^[1-9|*,]+$"
Then test for digits with "[1-9]"; it should match at least once.
Then test for "\\|", "\\*" and "," and check the number of matches.
If all tests pass, then your string is valid.
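The separate tests could be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes that the order of the symbols does not matter, which the question does not settle. Counting is done with String.chars() rather than with a regex per symbol.

```java
// Hypothetical helper class, for illustration only.
final class RuleStringCheck {
    // Counts how many times the character c occurs in s.
    static long count(String s, char c) {
        return s.chars().filter(ch -> ch == c).count();
    }

    static boolean isValid(String s) {
        return s.matches("[1-9|*,]+")      // only allowed symbols, at least one
            && s.matches(".*[1-9].*")      // at least one digit 1-9
            && count(s, '|') == 1          // exactly one '|'
            && count(s, '*') >= 1;         // at least one '*'; ',' is optional
    }

    public static void main(String[] args) {
        System.out.println(isValid("1|*"));
        System.out.println(isValid("1||*"));
    }
}
```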
Nope, try this:
"^[1-9]+\\|\\*+,*$"
Please give us at least 10 strings that you want to accept and 10 that you want to reject, and tell us whether the characters have to follow a fixed sequence or whether their order doesn't matter, so that we can write a reliable regex.
For now, all I can offer is:
^[1-9]+\|{1}\*+,*$
This RegEx was tested against these sample strings, accepting them:
56421|*****,,,
2|*********,,,
1|*
7|*,
18|****
123456789|*
12|********,,
1516332|**,,,
111111|*
6|*****,,,,
And it was tested against these sample strings, rejecting them:
10|*,
2***525*|*****,,,
123456,15,22*66*****4|,,,*167
1|2*3,4,5,6*
,*|173,
|*,
||12211
12
1|,*
1233|54|***,,,,
I assume your given order is strict and all conditions apply at the same time.
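A minimal Java sketch of the strict-order regex, checked against a few of the sample strings listed above. The class and method names are hypothetical, and the equivalent \\| is written in place of \|{1}.

```java
// Hypothetical helper class, for illustration only.
final class StrictRule {
    // Digits 1-9, then exactly one '|', then one or more '*', then zero or more ','.
    static boolean isValid(String s) {
        return s.matches("^[1-9]+\\|\\*+,*$");
    }

    public static void main(String[] args) {
        System.out.println(isValid("56421|*****,,,"));
        System.out.println(isValid("1|,*"));
    }
}
```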
It looks like the pattern you need is
n-n, one or more times separated by commas
then a bar (|)
then n*n, one or more times separated by commas.
Here is a regular expression for that.
([1-9]{1}[0-9]*\-[0-9]+){1}
(,[1-9]{1}[0-9]*\-[0-9]+)*
\|
([1-9]{1}[0-9]*\*[0-9]+){1}
(,[1-9]{1}[0-9]*\*[0-9]+)*
But it is complex, and it does not take the details into account. For example, in the case of n-m, you presumably want n to be less than m. You also likely want the same number of n-m entries before the bar as x*y entries after it.
It depends on whether you want to check the syntax completely or not. (I hope you do.)
Since this is so complex, it should be done with code rather than with a single regular expression.
This regex should work:
"^[1-9\\|\\*,-]*$"
Assert position at the beginning of the string «^»
Match a single character present in the list below «[1-9\|*,-]»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A character in the range between “1” and “9” «1-9»
A | character «\|»
A * character «*»
The character “,” «,»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

regex for specific digit prefix

I am trying to write the following regex rules, but couldn't find a solution.
I am sorry if I didn't make it clear: I want a different regex for each rule. I am using Java.
The first rule should fail for all digit inputs that start with the prefix '1900' or '1901'.
(190011 - fail, 190111 - fail, 41900 - success...)
The second rule should succeed for all digit inputs with the prefix '*'.
I want a different regex for each rule (I am not looking for a combination of both of them).
Does this RE fit the purpose?
'\A(\*|(?!190[01])).*'
\A means 'the beginning of the string'. I think it's the same in Java's regexes.
EDIT
\A : "from the very beginning of the string ....". In Python (which is what I actually know) this can be omitted if we use the function match(), which always analyzes from the very beginning, instead of search(), which searches everywhere in a string. If you want the regex to analyze lines from the very beginning of each line, \A must be replaced by ^ (with the multiline option enabled).
(...|...) : ".... there must be one of the two following options : ....."
\* : "...the first option is one character only, a star; ...". As a star is a special character meaning 'zero or more times what comes before' in regex strings, it must be escaped to mean strictly 'a star'.
(?!190[01]) : "... the second option isn't a pattern that must be found and possibly captured but a pattern that must be absent (still after the very beginning). ...". The two characters ?! are what says 'there must not be the following characters'. The pattern not to be found is 4 digits long, '1900' or '1901'.
(?!.......) is a negative lookahead assertion. All kinds of assertions begin with (? : the parenthesis overrides the usual meaning of ? , which is why assertions are always written with parentheses.
If \* has matched, one character has been consumed. By contrast, if the assertion is verified, the corresponding first 4 characters of the string have not been consumed: the regex engine has looked ahead through the analysed string up to the 4th character to verify them, and then it has come back to its initial position, which here is the very beginning of the string.
If you want the two-option part (...|...) not to be a capturing group, write ?: just after the first paren, giving '\A(?:\*|(?!190[01])).*'
.* : After the beginning pattern (one star matched, or an assertion verified) the regex engine goes on and matches all the characters until the end of the line. If the string has newlines and you want the regex to match all the characters until the end of the string, and not only of a line, you must specify that . matches newlines too (in Python this is done with re.DOTALL), or replace .* with (.|\r|\n)*
I finally understand that you apparently want to match strings composed of digit characters. If so, the RE must be changed to '\A(?:\*|(?!190[01]))\d*' . This RE matches empty strings. If you do not want empty strings to match, put \d+ in place of \d* . If you want only strings with at least one digit to match, even after the star when the string begins with a star, then use '\A(?:\*|(?!190[01]))(?=\d)\d*'
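Since the question is about Java, here is a rough Java sketch of the combined pattern, using the \d+ variant so that empty strings do not match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class PrefixRuleCombined {
    // Either a leading star, or no 1900/1901 prefix; then one or more digits.
    static final Pattern RULE = Pattern.compile("\\A(?:\\*|(?!190[01]))\\d+");

    static boolean accepts(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("190011"));
        System.out.println(accepts("41900"));
        System.out.println(accepts("*1900"));
    }
}
```

Because matches() already tests the whole input from its start, the \A is redundant in this sketch; it is kept to mirror the pattern as written above.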
For the first rule, you should use a combo regex with two captures, one to capture the 1900/1901-prefixed case, and one to capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:
(190[01]\d+)|(\d+)
Or just a simple 190[01]\d+ and negate your logic.
Regexes are not really very good at excluding something.
You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.
You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is definite, you can use a negative look-behind to exclude this prefix while matching the remaining exact number digits.
For the second rule, simply:
\*\d+
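A minimal sketch of the two separate rules in Java, using the "simple regex and negate your logic" option for the first one. The class and method names are hypothetical, and the sketch assumes the inputs to the first rule are already known to consist of digits only.

```java
// Hypothetical helper class, for illustration only.
final class PrefixRules {
    // Rule one: fail when the digits start with 1900 or 1901.
    // Note that 190[01]\d+ needs at least one digit after the prefix.
    static boolean passesRuleOne(String digits) {
        return !digits.matches("190[01]\\d+");
    }

    // Rule two: succeed for a star followed by one or more digits.
    static boolean passesRuleTwo(String s) {
        return s.matches("\\*\\d+");
    }

    public static void main(String[] args) {
        System.out.println(passesRuleOne("190011"));
        System.out.println(passesRuleOne("41900"));
        System.out.println(passesRuleTwo("*123"));
    }
}
```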
