Extracting a substring from String with regex in Java (with condition) - java

I need to extract a substring from a string using regex. The tricky (for me) part is that the string may be in one of two formats:
either LLDDDDLDDDDDDD/DDD (eg. AB1000G242424/001) or just between 1 and 7 digits (eg. 242424).
The substring I need to extract is:
If the string is 7 digits or longer, extract a substring consisting of 7 digits.
Otherwise (if the string is shorter than 7 digits), extract a substring consisting of 1-6 digits.
Below is one of my tries.
String regex = ("([0-9]{7}|[0-9]{0,6})");
Pattern pattern = Pattern.compile(regex);
Matcher matcher;
matcher = pattern.matcher("242424");
String extractedNr1 = "";
while (matcher.find()) {
extractedNr1 += matcher.group();
}
matcher = pattern.matcher("AB1000G242424/001");
String extractedNr2 = "";
while (matcher.find()) {
extractedNr2 += matcher.group();
}
System.out.println("ExtractedNr1 = " + extractedNr1);
System.out.println("ExtractedNr2 = " + extractedNr2);
Output:
ExtractedNr1 = 242424
ExtractedNr2 = 1000242424001
I understand the second one is a concatenation of all the matches, but I don't understand why the matches are arranged like that. Can I make a regex that will stop immediately after finding a match (with priority for the first option, that is 7 digits)?
I thought about using some conditional statement, but apparently these are not supported in java.util.regex, and I cannot use third party library.
I can do this in java obviously, but the whole point is in using regex.

Regex is a secondary concern; the runs of digits must be compared by length. Since in regex \d stands for a digit and \D for a non-digit, you can split on \D+ with Pattern.splitAsStream (String itself has no splitAsStream method) as follows:
Optional<String> takeDigits(String s) {
    // Split on runs of non-digits, keep digit runs of at most 7 characters, pick the longest
    return Pattern.compile("\\D+").splitAsStream(s)
            .filter(w -> !w.isEmpty() && w.length() <= 7)
            .max(Comparator.comparingInt(String::length));
}

You can use String.replaceAll to remove the non-digit characters:
String extracted = "AB1000G242424/001".replaceAll("[^0-9]", "");
if (extracted.length() > 7)
extracted = extracted.substring(0, 7);
Output:
1000242
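On the question's "why are the matches arranged like that": the sketch below, which only assumes the regex and input from the question, collects every match that find() reports. The helper name allMatches and the class name MatchTrace are made up for illustration. Because [0-9]{0,6} can match zero digits, find() also succeeds with an empty match at each non-digit position, and the loop in the question simply concatenates all of these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    // Hypothetical helper: returns every match find() reports, empty ones included.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String regex = "([0-9]{7}|[0-9]{0,6})"; // regex from the question
        // Brackets make the empty matches visible in the output.
        for (String m : allMatches(regex, "AB1000G242424/001")) {
            System.out.println("[" + m + "]");
        }
    }
}
```

Replacing the while loop with a single if (matcher.find()) would stop at the first match, but with this regex the first match on the second input is the empty one at the leading letter, so the quantifier would need a lower bound of 1 as well.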

Related

Get substring between "first two" occurrences of a character

I have a String:
String thestra = "/aaa/bbb/ccc/ddd/eee";
In my situation, this String will always contain at least two slashes.
And I am getting the /aaa/ like below, which is the subString between "FIRST TWO occurrences" of the char / in the String.
System.out.println("/" + thestra.split("\\/")[1] + "/");
It solves my purpose but I am wondering if there is any other elegant and cleaner alternative to this?
Please notice that I need both slashes (leading and trailing) around aaa. i.e. /aaa/
You can use indexOf, which accepts a second argument for an index to start searching from:
int start = thestra.indexOf("/");
int end = thestra.indexOf("/", start + 1) + 1;
System.out.println(thestra.substring(start, end));
Whether or not it's more elegant is a matter of opinion, but at least it doesn't find every / in the string or create an unnecessary array.
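The question guarantees two slashes, but if that guarantee is ever dropped, the indexOf approach needs a guard. A minimal sketch, where the class name FirstSegment, the method name betweenFirstTwoSlashes, and the choice to return null are all assumptions made for illustration:

```java
class FirstSegment {
    // Returns the text from the first '/' through the second '/', inclusive,
    // or null when the input contains fewer than two slashes.
    static String betweenFirstTwoSlashes(String s) {
        int start = s.indexOf('/');
        if (start < 0) {
            return null; // no slash at all
        }
        int end = s.indexOf('/', start + 1); // search resumes after the first slash
        if (end < 0) {
            return null; // only one slash
        }
        return s.substring(start, end + 1);
    }

    public static void main(String[] args) {
        System.out.println(betweenFirstTwoSlashes("/aaa/bbb/ccc/ddd/eee"));
    }
}
```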
Scanner::findInLine returning the first match of the pattern may be used:
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(new Scanner(thestra).findInLine("/[^/]*/"));
Output:
/aaa/
Use Pattern and Matcher from java.util.regex.
Pattern pattern = Pattern.compile("/.*?/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
String match = matcher.group(0); // output
}
Pattern.compile("/.*?/")
.matcher(thestra)
.results()
.map(MatchResult::group)
.findFirst().ifPresent(System.out::println);
You can test this variant :)
With best regards, Fr0z3Nn
Every time, in my situation, for this String, minimum two slashes will be present
If that is guaranteed, split at each / while keeping the delimiters and take the first three substrings.
String str = String.format("%s%s%s",(thestra.split("((?<=\\/)|(?=\\/))")));
You could also match the leading forward slash, then use a negated character class [^/]* to optionally match any character except / and then match the trailing forward slash.
String thestra = "/aaa/bbb/ccc/ddd/eee";
Pattern pattern = Pattern.compile("/[^/]*/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
System.out.println(matcher.group());
}
Output
/aaa/
One of the many ways can be replacing the string with group#1 of the regex, [^/]*(/[^/].*?/).* as shown below:
public class Main {
public static void main(String[] args) {
String thestra = "/aaa/bbb/ccc/ddd/eee";
String result = thestra.replaceAll("[^/]*(/[^/].*?/).*", "$1");
System.out.println(result);
}
}
Output:
/aaa/
Explanation of the regex:
[^/]* : Not the character, /, any number of times
( : Start of group#1
/ : The character, /
[^/]: Not the character, /
.*?: Any character any number of times (lazy match)
/ : The character, /
) : End of group#1
.* : Any character any number of times
Updated the answer as per the following valuable suggestion from Holger:
Note that to the Java regex engine, the / has no special meaning, so there is no need for escaping here. Further, since you’re only expecting a single match (the .* at the end ensures this), replaceFirst would be more idiomatic. And since there was no statement about the first / always being at the beginning of the string, prepending the pattern with either .*? or [^/]* would be a good idea.
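The replaceFirst variant from that suggestion might look like the sketch below. The class and method names are hypothetical; the pattern is the one from the answer. Note that when the pattern does not match at all, replaceFirst returns the input unchanged.

```java
class FirstSegmentRegex {
    // Keeps only group 1: the first "/.../" segment. The leading [^/]* absorbs
    // any text before the first slash, and the trailing .* absorbs the rest.
    static String firstSegment(String s) {
        return s.replaceFirst("[^/]*(/[^/].*?/).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("/aaa/bbb/ccc/ddd/eee"));
    }
}
```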
I am surprised nobody mentioned using Path as of Java 7.
String thestra = "/aaa/bbb/ccc/ddd/eee";
String path = Paths.get(thestra).getName(0).toString();
System.out.println("/" + path + "/");
Output:
/aaa/
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(thestra.substring(0, thestra.indexOf("/", 2) + 1));

Extract mobile number from string using regex

I want to extract mobile number from a string.
Example string is "Hi, Your Mobile no. is: 9876499321."
Now I want to extract "9876499321" from the string. My main string can have +919876499321 or 919876499321 or 09876499321 inside the string along with other words. How to achieve this?
Rules I want:
First of all remove all "-"
Then extract number that can range from 10 digit to 14 digit (inclusive)
I have tried this:
String myregex = "^\\d{10}$";
Pattern pattern = Pattern.compile(myregex);
Matcher matcher = pattern.matcher(inputStr);
while (matcher.find()) {
System.out.println(matcher.group());
}
I am not able to find any match.
You may remove all hyphens before passing the string to pattern.matcher and then match standalone numbers of 10 to 14 digits:
String inputStr = "Hi, Your Mobile no. is: 9876499321. Also, +919876499321 or 919876499321 or 09-876499321.";
String myregex = "(?<!\\d)\\d{10,14}(?!\\d)";
// Or String myregex = "\\b\\d{10,14}\\b";
Pattern pattern = Pattern.compile(myregex);
Matcher matcher = pattern.matcher(inputStr.replace("-", ""));
while(matcher.find()) {
System.out.println(matcher.group());
}
Output:
9876499321
919876499321
919876499321
09876499321
The (?<!\d)\d{10,14}(?!\d) pattern matches a run of 10 to 14 digits only if it is not immediately preceded or followed by another digit.
If it's always the last 10 digits of a 10+ digit string, you can do the following:
String myregex = "^.*(\\d{10})([^\\d].*|$)";
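And use matcher.group(1) instead of matcher.group(), because the ten digits are captured in group 1 (matcher.group(0) is the same as matcher.group() and returns the whole match).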
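A minimal sketch of this last-ten-digits idea, assuming hyphens are stripped first as the question's rules require. The class name LastTenDigits, the method name extract, and the null return for "no match" are illustrative choices, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastTenDigits {
    // Greedy .* pushes the ten-digit group as far right as possible;
    // the group must be followed by a non-digit or the end of the input.
    private static final Pattern LAST_TEN =
            Pattern.compile("^.*(\\d{10})([^\\d].*|$)");

    // Returns the captured ten digits, or null if there is no such run.
    static String extract(String input) {
        Matcher m = LAST_TEN.matcher(input.replace("-", ""));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Hi, Your Mobile no. is: +919876499321."));
    }
}
```

Since . does not match line terminators by default, this sketch assumes single-line input.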

Java regex for detecting version with optional period

I'm trying to find a proper regex in Java to detect every mention of version 1 in a large body of content. I only care about version 1 or version 1.0, but not 1.1. The matched text can then be followed by any other character or the end of the line.
How do I do that in java?
Thanks in advance
String regex="(version)(\\s)(1|1\\.0)";
Pattern p = Pattern.compile(regex);
Matcher m = null;
String testString1="version 1";
m = p.matcher(testString1);
System.out.println (m.find());
String testString2="version 1.0";
m = p.matcher(testString2);
System.out.println (m.find());
String testString3="version 1.1"; // should not match
m = p.matcher(testString3);
System.out.println (m.find());
If you have version string in a longer string then use this lookahead regex:
\bversion\s+1(?:\.0)?(?=\s|$)
In Java:
final String regex = "\\bversion\\s+1(?:\\.0)?(?=\\s|$)";
(?=\\s|$) is positive lookahead to assert that we have a whitespace or line end after version number.
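To check the lookahead pattern against the three test strings from the question, something like the following sketch could be used. The class name VersionOne and the method name mentionsVersionOne are made up for illustration.

```java
import java.util.regex.Pattern;

class VersionOne {
    private static final Pattern VERSION_ONE =
            Pattern.compile("\\bversion\\s+1(?:\\.0)?(?=\\s|$)");

    // True if the text contains "version 1" or "version 1.0"
    // followed by whitespace or the end of the input.
    static boolean mentionsVersionOne(String text) {
        return VERSION_ONE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsVersionOne("version 1"));
        System.out.println(mentionsVersionOne("version 1.0"));
        System.out.println(mentionsVersionOne("version 1.1")); // should not match
    }
}
```

One consequence of requiring whitespace or end of input: a version number directly followed by punctuation, such as "version 1." at the end of a sentence, is not matched by this pattern.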
Proposal:
"(version)\\s(1(\\.0)?)([^\\.0-9].*|$)"
The string "version" needs to be present
followed by any whitespace
then a single "1"
optionally followed by ".0"
and the next char is either something other than "." or a digit (so 1.01 is rejected, as is 1.0.1), or the end of the string is reached

Java Regex "-[0-9]{0,}" seems to match "-abc"

Regex:
"-[0-9]{0,}"
String:
"-abc"
According to the test here, that should not happen. I assume I'm doing something wrong in my code.
Code:
public static void main(String[] args) {
String s = "-abc";
String regex = "-[0-9]{0,}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
if (matcher.group().length() == 0)
break;
// get the number less the dash
int beginIndex = matcher.start();
int endIndex = matcher.end();
String number = s.substring(beginIndex + 1, endIndex);
s = s.replaceFirst(regex, "negative " + number);
}
System.out.println(s);
}
Some context: The speech synthesis program I use cannot pronounce numbers with a leading negative sign, so it must be replaced with the word "negative".
-[0-9]{0,}
means your string must contain a -, which can then be followed by 0 or more digits.
So -abc is the zero-digit case.
You didn't specify ^ and $, so your regex also finds a match in foo-bar, lll-0, and even abc-.
{0,} has exactly the same meaning as *. Your regexp thus means "a dash that can be followed by digits". -abc contains a dash, so the pattern is found.
-\d+ should suit your needs better (don't forget to escape the backslash for java: -\\d+).
If you want the whole string to match the pattern, anchor your regexp with ^ and $: ^-\d+$.
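With -\d+ in place, the replacement loop from the question can be reduced to a single replaceAll with a capturing group. This is only a sketch of one way to do it; the class name NegativeNumbers and the method name spellOutNegatives are hypothetical.

```java
class NegativeNumbers {
    // Replaces a dash that is directly followed by digits with the word
    // "negative"; $1 re-inserts the captured digits.
    static String spellOutNegatives(String s) {
        return s.replaceAll("-(\\d+)", "negative $1");
    }

    public static void main(String[] args) {
        System.out.println(spellOutNegatives("-abc"));
        System.out.println(spellOutNegatives("it is -5 outside"));
    }
}
```

This sketch treats every dash before a digit as a minus sign, so a range or subtraction such as 10-5 would also be rewritten. If that matters, a lookbehind such as (?<!\d) in front of the dash is one possible refinement.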

regex pattern - extract a string only if separated by a hyphen

I've looked at other questions, but they didn't lead me to an answer.
I've got this code:
Pattern p = Pattern.compile("exp_(\\d{1}-\\d)-(\\d+)");
The string I want to be matched is: exp_5-22-718
I would like to extract 5-22 and 718. I'm not sure why it's not working. What am I missing? Many thanks
Try this one:
Pattern p = Pattern.compile("exp_(\\d-\\d+)-(\\d+)");
In your original pattern you specified that the second number should contain exactly one digit, so I used \d+ to match as many digits as possible.
I also removed {1} from the first number, as it adds nothing to the regexp.
If the string is always prefixed with exp_ I wouldn't use a regular expression.
I would:
replaceFirst() exp_
split() the resulting string on -
Note: This answer is based on assumptions. I offer it as a more robust option if you have multiple hyphens. However, if you need to validate the format of the digits, then a regular expression may be better.
In your regexp you missed the required quantifier on the second \\d. It should be + or {2}.
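The replaceFirst() and split() steps could be sketched as follows, assuming the input always has the shape exp_A-B-C. The class name ExpParts, the method name parse, and the exception on malformed input are illustrative choices.

```java
class ExpParts {
    // Strips the exp_ prefix, splits on hyphens, and rejoins the first two
    // pieces so that "exp_5-22-718" yields {"5-22", "718"}.
    static String[] parse(String s) {
        String[] parts = s.replaceFirst("^exp_", "").split("-");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected exp_A-B-C but got: " + s);
        }
        return new String[] { parts[0] + "-" + parts[1], parts[2] };
    }

    public static void main(String[] args) {
        String[] result = parse("exp_5-22-718");
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```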
String yourString = "exp_5-22-718";
Matcher matcher = Pattern.compile("exp_(\\d-\\d+)-(\\d+)").matcher(yourString);
if (matcher.find()) {
System.out.println(matcher.group(1)); //prints 5-22
System.out.println(matcher.group(2)); //prints 718
}
You can use the string.split methods to do this. Check the following code.
I assume that your strings starts with "exp_".
String str = "exp_5-22-718";
if (str.contains("-")){
String newStr = str.substring(4, str.length());
String[] strings = newStr.split("-");
for (String string : strings) {
System.out.println(string);
}
}
