Java regex for a specific phone number format - java

I am looking for a regex java pattern to match the following string:
[Phone Number]= 1234567890
Here:
The regex should look for the hardcoded string "[Phone Number]=", optionally followed by a space, and then followed by
any number of digits.
That means it should match:
[Phone Number]= 123456 and
[Phone Number]=1234567890
Any help is appreciated.

Well, something like:
String pattern = "\\[Phone Number\\]= ?\\d+";
The backslashes are doubled just because of Java string literal syntax
The square brackets are escaped so that they are matched literally instead of being read as a character class (a set of characters to choose from)
The ? means zero or one space
The \d+ (after unescaping) means "at least one digit"
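A minimal sketch of how that pattern could be used from Java. The class and method names (PhoneLineDemo, isPhoneLine) are made up for illustration, and it assumes the whole input line should fit the pattern, hence matches() rather than find():

```java
import java.util.regex.Pattern;

final class PhoneLineDemo {
    // Pattern from the answer above; backslashes are doubled for the Java string literal.
    static final Pattern PHONE = Pattern.compile("\\[Phone Number\\]= ?\\d+");

    // matches() only succeeds when the entire input fits the pattern.
    static boolean isPhoneLine(String line) {
        return PHONE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneLine("[Phone Number]= 123456"));    // true
        System.out.println(isPhoneLine("[Phone Number]=1234567890")); // true
        System.out.println(isPhoneLine("[Phone Number]=  12"));       // false: two spaces
        System.out.println(isPhoneLine("[Phone Number]= "));          // false: no digits
    }
}
```

If the text can appear in the middle of a longer string, find() on the same Pattern would be the thing to try instead.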

What are your rules for a phone number?
Simply a list of digits is rarely a valid phone number format.
A simple search can find you the regexps for specific countries (I'll provide a specific one if you can tell us which one you need).


Regex format for a particular Match

I am trying to write a regex for the following format
PA-123456-067_TY
It's always PA, followed by a dash, 6 digits, another dash, then 3 digits, and ends with _TY
Apparently, when I write this regex to match the above format it shows the output correctly
^[^[PA]-]+-(([^-]+)-([^_]+))_([^.]+)
with all the Negation symbols ^
This does not work if I write the regex in the below format without negation symbols
[[PA]-]+-(([-]+)-([_]+))_([.]+)
Can someone explain to me why is this so?
The negation symbol means that the character cannot be anything within the specified class. Your regex is much more complicated than it needs to be and is therefore obfuscating what you really want.
You probably want something like this:
^PA-(\d+)-(\d+)_TY$
... which matches anything that starts with PA-, then includes two groups of numbers separated by a dash, then an underscore and the letters TY. If you want everything after the PA to be what you capture, but separated into the three groups, then it's a little more abstract:
^PA-(.+)-(.+)_(.+)$
This matches:
PA-
a capture group of any characters
a dash
another capture group of any characters
an underscore
all the remaining characters until end-of-line
Character classes [...] are saying match any single character in the list, so your first capture group (([^-]+)-([^_]+)) is looking for anything that isn't a dash any number of times followed by a dash (which is fine) followed by anything that isn't an underscore (again fine). Having the extra set of parentheses around that creates another capture group (probably group 1 as it's the first parentheses reached by the regex engine)... that part is OK but probably makes interpreting the answer less intuitive in this case.
In the rewrite, however, your first capture group (([-]+)-([_]+)) starts with [-]+, which means "one or more dashes", followed by a dash, followed by one or more underscores, followed by an underscore. Since your input does not have a dash immediately following PA-, the entire regex fails to find anything.
Putting the PA inside embedded character classes is also making things complicated. The first part of your first one is looking for, well, I'm not actually sure how [^[PA]-]+ is interpreted in practice but I suspect it's something like "not either a P or an A or a dash any number of times". The second one is looking for the opposite, I think. But you don't want any of that, you just want to start without anything other than the actual sequence of characters you care about, which is just PA-.
Update: As per the clarifications in the comments on the original question, knowing you want fixed-size groups of digits, it would look like this:
^PA-(\d{6})-(\d{3})_TY$
That captures PA-, then a 6-digit number, then a dash, then a 3-digit number, then _TY. The six digit number and 3 digit numbers will be in capture groups 1 and 2, respectively.
If the sizes of those numbers could ever change, then replace {x} with + to just capture numbers regardless of max length.
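As a rough sketch, this is how the fixed-size version could be used from Java to pull out the two numbers. The class name PaCodeDemo and the parse helper are hypothetical, and returning null for non-matching input is just one possible choice:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PaCodeDemo {
    // ^PA-(\d{6})-(\d{3})_TY$ with backslashes doubled for the string literal.
    static final Pattern CODE = Pattern.compile("^PA-(\\d{6})-(\\d{3})_TY$");

    // Returns {six-digit part, three-digit part}, or null if the input does not fit.
    static String[] parse(String input) {
        Matcher m = CODE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("PA-123456-067_TY");
        System.out.println(parts[0] + " / " + parts[1]); // 123456 / 067
        System.out.println(parse("PA-12345-067_TY"));    // null: only five digits
    }
}
```

The groups are kept as strings on purpose, so a leading zero such as the one in 067 is not lost.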
According to your comment, this would be appropriate: PA-\d{6}-\d{3}_TY
EDIT: If you want to match a whole line, use it with anchors: ^PA-\d{6}-\d{3}_TY$

Regex for multiple instances of character

In Java, using a regular expression, how would I check a string to see if it has the correct number of occurrences of a character?
For example take the string hello.world.hello:world:. How could this string be checked to see if it contained two instances of a . or two instances of a :?
I have tried
Pattern p = Pattern.compile("[:]{2}");
Matcher m = p.matcher("hello.world.hello:world:");
m.find();
but that failed.
Edit
First I would like to say thank you for all the answers. I noticed a lot of the answers said something along the lines of "This means: zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice". So if you were checking for 3 : in a string such as Hello::World: how would you do it?
Well, using matches you could use:
"([^:]*:[^:]*){2}"
This means: "zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice".
Using find is not as good, as there may be additional : and it will just ignore them.
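Here is a small sketch of that idea with the repeat count made a parameter, which also covers the "3 colons" case from the edit. The names ColonCountDemo and hasExactly are invented for the example, and it assumes n is at least 1 (with n = 0 this particular pattern only accepts the empty string):

```java
final class ColonCountDemo {
    // True when the input contains exactly n colons (assumes n >= 1).
    // String.matches() requires the whole input to fit the pattern,
    // so extra colons cannot be silently skipped.
    static boolean hasExactly(String input, int n) {
        return input.matches("([^:]*:[^:]*){" + n + "}");
    }

    public static void main(String[] args) {
        System.out.println(hasExactly("hello.world.hello:world:", 2)); // true
        System.out.println(hasExactly("Hello::World:", 3));            // true
        System.out.println(hasExactly("Hello::World:", 2));            // false
    }
}
```

Each repetition of the group contains exactly one colon, so the only way to cover the full string is to have exactly n of them.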
You can use this regex based on two lookaheads assertions:
^(?=(?:[^.]*\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)
(?=(?:[^.]*\.){2}[^.]*$) makes sure there are exactly 2 DOTS and (?=(?:[^:]*:){2}[^:]*$) asserts that there are exactly 2 colons in input string.
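A hedged sketch of using the lookahead version from Java; the class and method names are made up. Because the pattern consists only of an anchor and two lookaheads, it consumes no characters, so this uses find() (matches() would only succeed on an empty input):

```java
import java.util.regex.Pattern;

final class DotsAndColonsDemo {
    // Both lookaheads start at the beginning of the input and scan to its end.
    static final Pattern TWO_EACH = Pattern.compile(
            "^(?=(?:[^.]*\\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)");

    // True only when the input has exactly two dots AND exactly two colons.
    static boolean hasTwoDotsAndTwoColons(String s) {
        return TWO_EACH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTwoDotsAndTwoColons("hello.world.hello:world:")); // true
        System.out.println(hasTwoDotsAndTwoColons("hello.world:hello:"));       // false: one dot
        System.out.println(hasTwoDotsAndTwoColons("a.b.c:d:e:"));               // false: three colons
    }
}
```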
You can determine whether the string has exactly the given number of a certain character, say ':', by attempting to match it against a pattern of this form:
^(?:[^:]*[:]){2}[^:]*$
That says: exactly two repetitions of a non-capturing group, each consisting of any number (including zero) of characters other than ':' followed by one colon, and after the second group any number of additional characters other than ':' up to the end of the string.

word range or \w in negative lookbehind

I was trying to write a regex to extract the word in the position of Delhi in the text
sending to: GK Delhi, where sending to: is fixed and I don't want to capture whatever is in the position of GK. GK will always be a single word in my case. What I came up with, which I think should work, is (?<=sending to: \w )Delhi, meaning: if the text starts with sending to: and ends with Delhi, then return Delhi.
Please help me to fix this.
Three points,
\w matches a single word character. Use \w+ to match one or more or \w* to match zero or more word characters.
Don't forget about the space between GK and Delhi: \s+.
Just a note: The (?<= construct is the positive lookbehind, not negative one.
So the regex could look like this:
(?<=sending to:\s*\w+\s+)Delhi
Please also note that arbitrary-length lookbehind is only supported by very few regex engines, but you didn't say anything about the tool you are using.
Update:
Java doesn't support arbitrary-length lookbehind expressions.
The possibilities you have are:
The matched text will always be Delhi (on successful match). So if you are only checking for a match, then you could just use the regex: sending to:\s*\w+\s+Delhi.
If you want to extend the regex to other towns in future, then you could use a capturing group. The regex would be, for example, sending to:\s*\w+\s+(Delhi|Mumbai) and in Java code you would get the city name via matcher.group(1).
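The capturing-group option might look roughly like this in Java. CityDemo and the city helper are illustrative names only, and the list of towns is just the two from the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CityDemo {
    // No lookbehind needed: match the whole phrase, capture only the city.
    static final Pattern SENDING =
            Pattern.compile("sending to:\\s*\\w+\\s+(Delhi|Mumbai)");

    // Returns the captured city name, or null when the phrase is not found.
    static String city(String text) {
        Matcher m = SENDING.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(city("sending to: GK Delhi"));  // Delhi
        System.out.println(city("sending to:AB  Mumbai")); // Mumbai
        System.out.println(city("sending to: Delhi"));     // null: no word before the city
    }
}
```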
Please post your actual Java code showing how you are using the regex if you want more detailed advice.

Regular Expression of a Specific Word

I want to create a regular expression in java using standard libraries that will accommodate the following sentence:
12 of 128
Obviously the numbers can be anything though... From 1 digit to many
Also, I'm not sure how to accommodate the word "of" but I thought maybe something along the lines of:
[\d\sof\s\d]
This should work for you:
(\d+\s+of\s+\d+)
This will assume that you want to capture the full block of text as "one group", and there can be one-or-more whitespace characters in between each (if only one space, you can change \s+ to just \s).
If you want to capture the numbers separately, you can try:
(\d+)\s+of\s+(\d+)
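For what it's worth, a sketch of reading both numbers out with java.util.regex. The names OfDemo and parse are invented here, and it assumes the numbers are small enough to fit in an int (Integer.parseInt throws NumberFormatException otherwise):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OfDemo {
    static final Pattern X_OF_Y = Pattern.compile("(\\d+)\\s+of\\s+(\\d+)");

    // Returns {first, second}, or null when no "N of M" is present.
    // find() is used so surrounding text such as "Page 3 of 10" is allowed.
    static int[] parse(String text) {
        Matcher m = X_OF_Y.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }

    public static void main(String[] args) {
        int[] r = parse("12 of 128");
        System.out.println(r[0] + ", " + r[1]); // 12, 128
        System.out.println(parse("12of128"));   // null: whitespace is required
    }
}
```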
You want this:
\d+\sof\s\d+
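The relevant change from what you already had is the addition of the two plus signs and the removal of the square brackets (inside brackets, the whole thing is a character class that matches a single character). A plus sign means "one or more", so each \d+ matches a run of at least one digit.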
Sample: http://regexr.com?32cao
This regexp
"\\d+ of \\d+"
will match at least one to any number of digits, followed by string " of " followed by one to any number of digits.

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means the pattern will also match the empty string, which contains none of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), it will not require a full string match and you can use the regex [a-zA-Z0-9]+. Matcher.lookingAt() also drops the full-match requirement, but it only matches at the start of the input.
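To make the difference between the three Matcher methods concrete, here is a small sketch. The class and method names are made up; the Matcher calls themselves (matches, lookingAt, find) are standard java.util.regex:

```java
import java.util.regex.Pattern;

final class MatchModesDemo {
    static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // Entire input must be alphanumeric.
    static boolean wholeString(String s) {
        return ALNUM.matcher(s).matches();
    }

    // Input must begin with an alphanumeric run; the rest is ignored.
    static boolean atStart(String s) {
        return ALNUM.matcher(s).lookingAt();
    }

    // An alphanumeric run may appear anywhere in the input.
    static boolean anywhere(String s) {
        return ALNUM.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeString("abc ") + " " + atStart("abc ") + " " + anywhere("abc "));
        // false true true
        System.out.println(wholeString(" abc") + " " + atStart(" abc") + " " + anywhere(" abc"));
        // false false true
    }
}
```

For the "keep anything that contains at least some text" requirement in the question, the find() variant is probably the closest fit.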
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-Z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");
This prints "doesn't match."
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to add the start and end anchors (^ and $) if you want the string to consist only of letters and numbers.
Finally, you use the * quantifier, which means zero or more, so the pattern can match zero characters: matches() returns true for an empty string, and empty input passes your filter. What you need is the + quantifier. So the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
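A quick sketch comparing the three variants discussed in this thread, in case it helps to see the differences side by side. The class and constant names are hypothetical; only Pattern and Matcher come from the standard library:

```java
import java.util.regex.Pattern;

final class AlnumFilterDemo {
    static final Pattern STRICT = Pattern.compile("^[a-zA-Z0-9]+$"); // suggested pattern
    static final Pattern TYPO   = Pattern.compile("^[a-zA-z0-9]+$"); // A-z typo
    static final Pattern STAR   = Pattern.compile("^[a-zA-Z0-9]*$"); // zero or more

    static boolean passes(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(passes(STRICT, "riz99")); // true
        System.out.println(passes(STRICT, "a_b"));   // false
        System.out.println(passes(TYPO, "a_b"));     // true: '_' lies between 'Z' and 'a'
        System.out.println(passes(STRICT, ""));      // false
        System.out.println(passes(STAR, ""));        // true: * accepts the empty string
    }
}
```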
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex, as in [a-zA-Z0-9 ]*? This should match any normal text with letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether the input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it reports a match, e.g. riz99, riz99z
Otherwise it reports no match, e.g. 99z., riz99.z, riz99.9
Example code (JavaScript, where e is presumably an input event):
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
    console.log('match')
} else {
    console.log('not match')
}
