Extract mobile number from string using regex - java

I want to extract mobile number from a string.
Example string is "Hi, Your Mobile no. is: 9876499321."
Now I want to extract "9876499321" from the string. My main string can have +919876499321 or 919876499321 or 09876499321 inside the string along with other words. How to achieve this?
Rules I want:
First of all remove all "-"
Then extract number that can range from 10 digit to 14 digit (inclusive)
I have tried this:
String myregex = "^\\d{10}$";
Pattern pattern = Pattern.compile(myregex);
Matcher matcher = pattern.matcher(inputStr);
while (matcher.find()) {
System.out.println(matcher.group());
}
I am not able to find any match.

You may remove all hyphens before passing the string to pattern.matcher and then match standalone numbers of 10 to 14 digits:
String inputStr = "Hi, Your Mobile no. is: 9876499321. Also, +919876499321 or 919876499321 or 09-876499321.";
String myregex = "(?<!\\d)\\d{10,14}(?!\\d)";
// Or String myregex = "\\b\\d{10,14}\\b";
Pattern pattern = Pattern.compile(myregex);
Matcher matcher = pattern.matcher(inputStr.replace("-", ""));
while(matcher.find()) {
System.out.println(matcher.group());
}
Output:
9876499321
919876499321
919876499321
09876499321
The (?<!\d)\d{10,14}(?!\d) pattern matches a run of 10 to 14 digits only if it is neither preceded nor followed by another digit.
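To make the difference between the three patterns concrete, here is a minimal sketch. The class name PhoneExtractDemo, the helper extract, and the second sample string are made up for illustration; only the regexes come from the question and the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneExtractDemo {
    // Hypothetical helper: strips hyphens, then collects every match of the regex.
    public static List<String> extract(String input, String regex) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input.replace("-", ""));
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Hi, Your Mobile no. is: 9876499321.";
        // ^ and $ require the whole input to be exactly 10 digits, so nothing matches.
        System.out.println(extract(text, "^\\d{10}$"));
        // The lookarounds only require that no digit touches the match.
        System.out.println(extract(text, "(?<!\\d)\\d{10,14}(?!\\d)"));

        // Assumed edge case: a number glued to letters.
        // \b sees no word boundary between 'o' and '0', so it finds nothing here.
        String glued = "no09-876499321x";
        System.out.println(extract(glued, "\\b\\d{10,14}\\b"));
        System.out.println(extract(glued, "(?<!\\d)\\d{10,14}(?!\\d)"));
    }
}
```

If numbers can be directly adjacent to letters or underscores, the lookaround version is the safer of the two patterns; when numbers are always delimited by spaces or punctuation, both behave the same.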

If it's always the last 10 digits of a 10+ digit string, you can do the following:
String myregex = "^.*(\\d{10})([^\\d].*|$)";
And use matcher.group(1) instead of matcher.group(). Note that group(0) is the same as group() and returns the whole match, not the captured 10 digits.
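A minimal sketch of this approach, reading capture group 1. The class name LastTenDigits and the helper lastTen are hypothetical, and removing hyphens first is carried over from the question's rules rather than from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastTenDigits {
    // The greedy .* pushes the capture as far right as possible, so group 1
    // holds the last 10 digits of the last digit run that has 10 or more digits.
    private static final Pattern LAST_TEN =
            Pattern.compile("^.*(\\d{10})([^\\d].*|$)");

    // Hypothetical helper: returns the captured digits, or null when nothing matches.
    public static String lastTen(String input) {
        Matcher m = LAST_TEN.matcher(input.replace("-", ""));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastTen("+919876499321"));
        System.out.println(lastTen("09-876499321"));
        System.out.println(lastTen("Hi, Your Mobile no. is: 9876499321."));
        System.out.println(lastTen("no digits here"));
    }
}
```

The first three calls all yield the same 10-digit number, which is the point of this variant: country and trunk prefixes are dropped. It returns only one number per input, so it does not fit text that contains several numbers.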

Related

Extracting a substring from String with regex in Java (with condition)

I need to extract a substring from a string using regex. The tricky (for me) part is that the string may be in one of two formats:
either LLDDDDLDDDDDDD/DDD (eg. AB1000G242424/001) or just between 1 and 7 digits (eg. 242424).
The substring I need to extract should be:
If string is 7 digits or longer, then extract substring consisting of 7 digits.
Else (if string is shorter than 7 digits), then extract substring consisting of 1-6 digits.
Below is one of my tries.
String regex = ("([0-9]{7}|[0-9]{0,6})");
Pattern pattern = Pattern.compile(regex);
Matcher matcher;
matcher = pattern.matcher("242424");
String extractedNr1 = "";
while (matcher.find()) {
extractedNr1 += matcher.group();
}
matcher = pattern.matcher("AB1000G242424/001");
String extractedNr2 = "";
while (matcher.find()) {
extractedNr2 += matcher.group();
}
System.out.println("ExtractedNr1 = " + extractedNr1);
System.out.println("ExtractedNr2 = " + extractedNr2);
Output:
ExtractedNr1 = 242424
ExtractedNr2 = 1000242424001
I understand the second one is a concatenation of all the matches, but I don't understand why the matches are arranged like that. Can I make a regex that stops immediately after finding a match (with priority for the first option, that is, 7 digits)?
I thought about using some conditional statement, but apparently these are not supported in java.util.regex, and I cannot use third party library.
I can do this in java obviously, but the whole point is in using regex.
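Regex is a secondary concern here; the runs of digits must be compared by length. Since \d stands for a digit and \D for a non-digit in a regex, you can use Pattern.splitAsStream as follows (it needs java.util.regex.Pattern, java.util.Optional and java.util.Comparator):
Optional<String> takeDigits(String s) {
    // Split on runs of non-digits, keep digit runs of at most 7 characters,
    // and return the longest of them.
    return Pattern.compile("\\D+").splitAsStream(s)
        .filter(w -> !w.isEmpty() && w.length() <= 7)
        .max(Comparator.comparingInt(String::length));
}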
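As a runnable illustration of the stream-based idea, here is a small sketch built on Pattern.splitAsStream (available since Java 8). The class name TakeDigitsDemo and the sample inputs other than the two from the question are assumptions.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Pattern;

public class TakeDigitsDemo {
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Longest run of digits that is at most 7 characters long, if there is one.
    public static Optional<String> takeDigits(String s) {
        return NON_DIGITS.splitAsStream(s)
                .filter(w -> !w.isEmpty() && w.length() <= 7)
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        System.out.println(takeDigits("242424"));
        System.out.println(takeDigits("AB1000G242424/001"));
        System.out.println(takeDigits("no digits"));
    }
}
```

Be aware that a digit run longer than 7 characters is skipped by the filter rather than truncated to 7, which may or may not be what the question intends.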
You can use String.replaceAll to remove the non-digit characters:
String extracted = "AB1000G242424/001".replaceAll("[^0-9]", "");
if (extracted.length() > 7)
extracted = extracted.substring(0, 7);
Output:
1000242

JAVA split with regex doesn't work

I have the following String 46MTS007 and I have to split the numbers from the letters, so as a result I should get an array like {"46", "MTS", "007"}
String s = "46MTS007";
String[] spl = s.split("\\d+|\\D+");
But spl remains empty. What's wrong with the regex? I tested it on regex101 and it works as expected (with the global flag).
If you want to use split you can use this lookaround based regex:
(?<=\d)(?=\D)|(?<=\D)(?=\d)
This splits at every position where the previous character is a digit and the next one is a non-digit, or where the previous character is a non-digit and the next one is a digit.
In Java:
String s = "46MTS007";
String[] spl = s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
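The two split calls can be compared side by side in a short sketch. The class name SplitDemo and the helper splitDigitsAndLetters are hypothetical names used only for this illustration.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits at every zero-width position where a digit meets a non-digit.
    public static String[] splitDigitsAndLetters(String s) {
        return s.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
    }

    public static void main(String[] args) {
        String s = "46MTS007";
        // Here every character is consumed as a delimiter, so only empty strings
        // are left, and split() discards trailing empty strings.
        System.out.println(s.split("\\d+|\\D+").length);
        // The lookaround regex matches between characters and consumes nothing.
        System.out.println(Arrays.toString(splitDigitsAndLetters(s)));
    }
}
```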
The regex you're using cannot produce the pieces you want. split() treats whatever the regex matches as a delimiter, and here the regex matches the entire string, so only empty strings are left and the returned array is empty. You can use Pattern and Matcher instead to find the successive groups in the string.
public static void main(String[] args) {
String line = "46MTS007";
String regex = "\\D+|\\d+";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(line);
while(m.find())
System.out.println(m.group());
}
Output:
46
MTS
007
Note: m.find() must be called before each m.group(); otherwise the matcher does not advance to the next match.

Regex for find data

I have used the regex (?:#\d{7}) to extract only the 7 digits after '#'.
For example, given a string like "#1234567890", the above pattern gives me the 7 digits after '#'.
Now the problem is that I have a string like "Referenc number #1234567890",
where "Referenc number #" is fixed.
I am looking for a regex that returns the number 1234567 from the above string.
I have a file that contains the above string along with other data.
You can try something like this:
String ref_no = "Referenc number #123456789";
Pattern p = Pattern.compile("Referenc number #([0-9]{7})");
Matcher m = p.matcher(ref_no);
while (m.find())
{
System.out.println(m.group(1));
}
The ?: makes a group non-capturing, so if you put it around the hash sign on its own, the hash is used for matching but excluded from the capture (read the digits with group(1)).
(?:#)(\d{7})
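A minimal sketch of the difference between the whole match and the captured group for this pattern. The class name RefNumberDemo and the helper refDigits are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefNumberDemo {
    private static final Pattern REF = Pattern.compile("(?:#)(\\d{7})");

    // Hypothetical helper: returns the 7 digits after '#', or null if there are none.
    public static String refDigits(String text) {
        Matcher m = REF.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Matcher m = REF.matcher("Referenc number #1234567890");
        if (m.find()) {
            System.out.println(m.group());  // whole match, '#' included
            System.out.println(m.group(1)); // captured digits only
        }
    }
}
```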
If the String always starts with Referenc number # you could just use the following code:
String text = "Referenc number #1234567890";
Pattern pattern = Pattern.compile("\\d{7}");
Matcher matcher = pattern.matcher(text);
while(matcher.find()){
System.out.println(matcher.group());
}

Extract only the numbers from String

I need a regex that, given one of the following strings: "12.123.123/1234-11", "12.123123123411" or "1123123/1234-11",
extracts only the digits (for example 12123123123411):
Pattern padrao = Pattern.compile("\\d+");
Matcher matcher = padrao.matcher("12.123.123/1234-11");
while (matcher.find()) {
System.out.println(matcher.group());
}
//output:12,123,123,1234,11,
//I need: 12123123123411
Can anyone help me?
A better way would be use String#replaceAll(regex, replacement) method to replace all characters except digits (As you see, the method takes a regex for replacing):
String str = "12.123.123/1234-11";
String digits = str.replaceAll("\\D", "");
\D matches any non-digit character and is equivalent to [^0-9].
Note that the backslash must be escaped inside a Java string literal, so the pattern is written as "\\D".
If you are required to use the Matcher#group() method, build the result in a StringBuilder, appending each run of digits as it is found:
String str = "12.123.123/1234-11";
StringBuilder digits = new StringBuilder();
Matcher matcher = Pattern.compile("\\d+").matcher(str);
while (matcher.find()) {
digits.append(matcher.group());
}
System.out.println(digits);
You could simply remove all the non-digit characters through replaceAll:
String out = string.replaceAll("\\D+", "");
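The replaceAll and Matcher approaches can be checked against each other with a small sketch. The class and method names (DigitsOnlyDemo, viaReplaceAll, viaMatcher) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsOnlyDemo {
    // Deletes every run of non-digit characters.
    public static String viaReplaceAll(String s) {
        return s.replaceAll("\\D+", "");
    }

    // Appends every run of digits found by the matcher.
    public static String viaMatcher(String s) {
        StringBuilder digits = new StringBuilder();
        Matcher m = Pattern.compile("\\d+").matcher(s);
        while (m.find()) {
            digits.append(m.group());
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        String str = "12.123.123/1234-11";
        System.out.println(viaReplaceAll(str));
        System.out.println(viaMatcher(str));
    }
}
```

Both methods return the same string for any input, so the choice is mostly about readability; replaceAll is the shorter of the two.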

Retrieve numbers separated by '-'

Let's say I have a large amount of (random) text. Within this text there is a phone number, consisting of three digits, a dash, another three digits, a dash, and four digits, for example XXX-XXX-XXXX. What would be the regex for retrieving this number from the text? I tried using:
Matcher matcher = pattern.matcher(previousText);
Pattern pattern2 = Pattern.compile(".*(\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d).*");
Matcher matcher2 = pattern2.matcher(currentText);
Now, I thought it would work, but it doesn't. Please help.
The regex: \d{3}-\d{3}-\d{4}
Pattern pattern = Pattern.compile(".*(\\d{3}-\\d{3}-\\d{4}).*");
Matcher matcher = pattern.matcher(text);
if (matcher.find()) {
String number = matcher.group(1);
System.out.println(number);
}
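If the text may contain more than one such number, a variant without the surrounding .* can collect all of them. This is a sketch under assumptions: the class name DashedPhoneDemo and the helper findAll are hypothetical, and the lookarounds that reject longer digit runs are an addition that is not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DashedPhoneDemo {
    // Assumption: a number must not be directly preceded or followed by a digit.
    private static final Pattern PHONE =
            Pattern.compile("(?<!\\d)\\d{3}-\\d{3}-\\d{4}(?!\\d)");

    // Returns every XXX-XXX-XXXX number in the text, in order of appearance.
    public static List<String> findAll(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PHONE.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Call 555-123-4567 or 555-765-4321 today"));
    }
}
```

With the greedy .* prefix and group(1), find() reports only the last number in the text; dropping the .* lets find() step through the matches one by one.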
