How to extract a specific substring from an expression in Java

I have the following text:
Units Currently On Bed List
[total beds=0]
Number Of Beds Unit Interval Select All
The number after '=' is dynamic and subject to change. How can I extract the number in Java using regex?

If, as you say, only the number after = changes, then for your example data you could capture the number in a group:
\[.+?=(\d+)\]
Match a \[
Match any character one or more times non greedy .+?
Match an equal sign =
Capture 1 or more digits (\d+)
Match a \]
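A minimal Java sketch of the pattern above using Pattern and Matcher. The class name BedCount, the method extractCount and the choice to return -1 when no bracket is found are hypothetical, for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BedCount {
    // Group 1 captures the digits between '=' and the closing ']'
    private static final Pattern BEDS = Pattern.compile("\\[.+?=(\\d+)\\]");

    /** Returns the number inside "[...=N]", or -1 if no such bracket is found. */
    static int extractCount(String text) {
        Matcher m = BEDS.matcher(text);
        // find() scans the whole text, so the lines around the bracket are ignored
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String text = "Units Currently On Bed List\n"
                + "[total beds=0]\n"
                + "Number Of Beds Unit Interval Select All";
        System.out.println(extractCount(text)); // prints 0
    }
}
```

Because the pattern is compiled once into a static field, repeated calls do not pay the compilation cost again.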

I'm not sure about using regex, but this is one way:
int first = str.indexOf('=') + 1;
int last = str.lastIndexOf(']');
String nbr = str.substring(first, last);
int number = Integer.parseInt(nbr);
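The snippet above looks for the first '=' and the last ']' in the whole string, so an '=' or ']' elsewhere in the text would throw the indexes off. A hedged variant, assuming the number always sits inside a bracket like "[total beds=0]", starts each search from the '[' instead. The class and method names are hypothetical, and the method does no validation, so malformed input may throw or give a wrong result.

```java
public class BedCountNoRegex {
    /** Parses the number between '=' and the next ']' that follow the first '['. */
    static int extractCount(String str) {
        int open = str.indexOf('[');             // start of the bracketed part (assumed present)
        int first = str.indexOf('=', open) + 1;  // first character after the '='
        int last = str.indexOf(']', first);      // the ']' that closes this bracket
        return Integer.parseInt(str.substring(first, last).trim());
    }

    public static void main(String[] args) {
        String text = "Units Currently On Bed List\n[total beds=0]\nNumber Of Beds Unit Interval Select All";
        System.out.println(extractCount(text)); // prints 0
    }
}
```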

Related

Use regex to get 2 specific groups of substring

String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
         + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
         + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
         + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
Wherever /Jack/M appears in the string, I want to pull out the section numbers (250342, 251234) and the values (10.00, 11.00) associated with it using regex.
I tried something like this https://regex101.com/r/4te0Lg/1 but it still doesn't work:
.Section(\d+(?:\.\d+)?).*/Jack/M
If the only parts of each section that change are the section number, the name of the person and the last value (like in your example) then you can make a pattern very easily by using one of the sections where Jack appears and replacing the numbers you want by capturing groups.
Example:
#Section250342,Main,First/HS/12345/Jack/M,2000 10.00
becomes,
#Section(\d+),Main,First/HS/12345/Jack/M,2000 (\d+\.\d{2})
If the section substring keeps the format but the other parts of it may change, then just replace the rest like this:
#Section(\d+),\w+,(?:\w+/)*Jack/M,\d+ (\d+\.\d{2})
I'm assuming that "Main" is a class, "First/HS/..." is a path and that the last value always has exactly 2 decimal places.
\d - A digit: [0-9]
\w - A word character: [a-zA-Z_0-9]
+ - one or more times
* - zero or more times
{2} - exactly 2 times
() - a capturing group
(?:) - a non-capturing group
For reference see: https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/regex/Pattern.html
Simple Java example showing how to get the values from the capturing groups using java.util.regex.Pattern and java.util.regex.Matcher:
import java.util.regex.*;

public class GetMatch {
    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        Pattern p = Pattern.compile("#Section(\\d+),\\w+,(?:\\w+/)*Jack/M,\\d+ (\\d+\\.\\d{2})");
        Matcher m;
        String[] tokens = s.split(",(?=#)"); // split the sections into separate strings
        for (String t : tokens) { // check every string produced by the split
            m = p.matcher(t);
            if (m.matches()) { // if the string matches the pattern, print the capturing groups
                System.out.printf("Section: %s, Value: %s%n", m.group(1), m.group(2));
            }
        }
    }
}
You could use 2 capture groups, and use a tempered greedy token approach to not cross #Section followed by a digit.
#Section(\d+)(?:(?!#Section\d).)*\bJack/M,\d+\h+(\d+(?:\.\d+)?)\b
Explanation
#Section(\d+) Match #Section and capture 1+ digits in group 1
(?:(?!#Section\d).)* Match any character if not directly followed by #Section and a digit
\bJack/M, Match the word Jack and /M,
\d+\h+ Match 1+ digits and 1+ spaces
(\d+(?:\.\d+)?) Capture group 2, match 1+ digits and an optional decimal part
\b A word boundary
In Java:
String regex = "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b";
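A small sketch of how that regex could be used without splitting the input first: a find() loop scans the whole string and stops at each Jack/M section. The class name JackSections and the "section=value" output format are hypothetical. Note that \h (horizontal whitespace) requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JackSections {
    // Group 1: section number; group 2: the value after "Jack/M,<digits> "
    private static final Pattern JACK = Pattern.compile(
            "#Section(\\d+)(?:(?!#Section\\d).)*\\bJack/M,\\d+\\h+(\\d+(?:\\.\\d+)?)\\b");

    /** Returns "section=value" for every section that mentions Jack/M. */
    static List<String> find(String s) {
        List<String> results = new ArrayList<>();
        Matcher m = JACK.matcher(s);
        while (m.find()) { // each find() resumes after the previous match
            results.add(m.group(1) + "=" + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "#Section250342,Main,First/HS/12345/Jack/M,2000 10.00,"
                + "#Section250322,Main,First/HS/12345/Aaron/N,2000 17.00,"
                + "#Section250399,Main,First/HS/12345/Jimmy/N,2000 12.00,"
                + "#Section251234,Main,First/HS/12345/Jack/M,2000 11.00";
        System.out.println(find(s)); // [250342=10.00, 251234=11.00]
    }
}
```

The tempered token (?:(?!#Section\d).)* is what keeps a match that starts at Aaron's or Jimmy's section from running on into the next Jack/M section.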

Mask mobile number in Java

I would like to mask the last 4 digits of the identity number (HKID):
A123456(7) -> A123***(*)
I can do it like this:
hkid.replaceAll("\\d{3}\\(\\d\\)", "***(*)")
However, is there a regular expression that matches only the last 4 digits, so that I could replace each of them with "*" like this?
hkid.replaceAll(regex, "*")
Please help, thanks.
Jessie
Personally, I wouldn't do it with regular expressions:
char[] cs = hkid.toCharArray();
for (int i = cs.length - 1, d = 0; i >= 0 && d < 4; --i) {
if (Character.isDigit(cs[i])) {
cs[i] = '*';
++d;
}
}
String masked = new String(cs);
This goes from the end of the string, looking for digit characters, which it replaces with a *. Once it's found 4 (or reaches the start of the string), it stops iterating, and builds a new string.
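The loop above can be wrapped in a reusable static method. This is a sketch; the name maskLastDigits and the count parameter (a generalisation of the hard-coded 4) are hypothetical.

```java
public class MaskDigits {
    /** Replaces the last {@code count} digit characters of {@code s} with '*'. */
    static String maskLastDigits(String s, int count) {
        char[] cs = s.toCharArray();
        // walk backwards, masking digits until 'count' of them have been replaced
        for (int i = cs.length - 1, d = 0; i >= 0 && d < count; --i) {
            if (Character.isDigit(cs[i])) {
                cs[i] = '*';
                ++d;
            }
        }
        return new String(cs);
    }

    public static void main(String[] args) {
        System.out.println(maskLastDigits("A123456(7)", 4)); // A123***(*)
    }
}
```

One difference from a regex using \d: Character.isDigit also returns true for non-ASCII digits, whereas \d in java.util.regex matches only [0-9] by default.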
While I agree that a non-regex solution is probably the simplest and fastest, here's a regex that matches the last 4 digits regardless of whether there is a grouping or not: \d(?=(?:\D*\d){0,3}\D*$)
This expression is meant to match any digit that is followed by 0 to 3 digits before hitting the end of the input.
A short breakdown of the expression:
\d matches a single digit
\D matches a single non-digit
(?=...) is a positive look-ahead that contributes to the match but isn't consumed
(?:...){0,3} is a non-capturing group repeated 0 to 3 times.
$ matches the end of the input
So you could read the expression as follows: "match a single digit if it is followed by a sequence of 0 to 3 times any number of non-digits which are followed by a single digit and that sequence is followed by any number of non-digits and the end of the input" (sounds complicated, no?).
Some results when using input.replaceAll( "\\d(?=(?:\\D*\\d){0,3}\\D*$)", "*" ):
input = "A1234567" -> output = "A123****"
input = "A123456(7)" -> output = "A123***(*)"
input = "A12345(67)" -> output = "A123**(**)"
input = "A1(234567)" -> output = "A1(23****)"
input = "A1234B567" -> output = "A123*B***"
As you can see in the last example the expression will match digits only. If you want to match letters as well either replace \d and \D with \w and \W (note that \w matches underscores as well) or use custom character classes, e.g. [02468] and [^02468] to match even digits only.
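The results listed above can be reproduced with a short program. This sketch compiles the pattern once and calls Matcher.replaceAll; the class name MaskLastFour is hypothetical.

```java
import java.util.regex.Pattern;

public class MaskLastFour {
    // A digit followed by at most 3 more digits (with any non-digits between them) before the end
    private static final Pattern LAST_FOUR = Pattern.compile("\\d(?=(?:\\D*\\d){0,3}\\D*$)");

    static String mask(String input) {
        return LAST_FOUR.matcher(input).replaceAll("*");
    }

    public static void main(String[] args) {
        String[] inputs = {"A1234567", "A123456(7)", "A12345(67)", "A1(234567)", "A1234B567"};
        for (String in : inputs) {
            System.out.println(in + " -> " + mask(in));
        }
    }
}
```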

Matching a sequence of dot-separated digits of variable length with regular expressions

I am parsing text from an Excel spread sheet using Java.
I need to validate whether a sequence of 3 integers is present in the text.
The sequence of integers is:
comma separated inside
whitespace-delimited outside
Integers in the sequence can either have 1 or 2 digits.
This is my attempt:
*((\d|\d\d)[^\w](\d|\d\d)[^\w](\d|\d\d))*
With the * meaning that I can have characters before it, (\d|\d\d) being a number of either one or two digits, and [^\w] being a non-word character?
Valid text: CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS
Invalid text: CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100
Your last comment actually clarifies the question a bit.
Assuming you are looking for a dot-separated sequence of 1 or 2 digits, externally delimited by whitespace, here's an example:
String ok = "CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS";
String notOk = "CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100";
Pattern p = Pattern.compile("(\\d{1,2}(\\.|\\s)){3}");
Matcher m = p.matcher(ok);
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
m = p.matcher(notOk);
while (m.find()) {
System.out.printf("Found: %s%n", m.group());
}
Output
Found: 05.1.2
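The pattern above also accepts whitespace between the numbers (for example "1 2 3 ") and includes the trailing separator in the match. If only dots may appear between the numbers and the whole sequence must be delimited by whitespace or the string edges, a stricter variant with lookarounds could look like this. This is a sketch under those assumptions; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DottedSequence {
    // Three 1-2 digit numbers joined by dots, with whitespace (or the string edge) on both sides
    private static final Pattern SEQ =
            Pattern.compile("(?<!\\S)\\d{1,2}\\.\\d{1,2}\\.\\d{1,2}(?!\\S)");

    static boolean containsSequence(String text) {
        return SEQ.matcher(text).find();
    }

    /** Returns the first matching sequence, or null if there is none. */
    static String firstSequence(String text) {
        Matcher m = SEQ.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstSequence("CPI WEIGHTS 05.1.2 : CARPETS & OTHER FLOOR COVERINGS"));
        System.out.println(containsSequence("CPIH INDEX 05.2 : HOUSEHOLD TEXTILES 2005=100"));
    }
}
```

The lookbehind (?<!\S) also rejects input such as "123.4.5", which the looser pattern would partially match as "23.4.5".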

Regular expression for phone number starting with '00' or '+'

I've got a regex problem: I'm trying to require that a phone number begin with either "00" or "+", but my attempt doesn't work.
String PHONE_PATTERN = "^[(00)|(+)]{1}[0-9\\s.\\/-]{6,20}$";
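It still accepts, for example, "0123-45678". What am I doing wrong?
Inside a character class every character is matched literally, which means [(00)|(+)] matches a single 0, +, |, ( or ). So your pattern only checks that the first character is one of those, which is why "0123-45678" passes.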
Use this regex:
String PHONE_PATTERN = "^(?:00|\\+)[0-9\\s.\\/-]{6,20}$";
If you have removed spaces, hyphens and the like from the number, and you want to accept either +xxnnnnnnnnn or 00xxnnnnnnnnn (where xx is the country code and n is the 9-digit number), or 0nnnnnnnnn (a national number: a zero followed by 9 digits), then try this regex:
String PHONE_PATTERN = "^(?:(?:00|\\+)\\d{2}|0)[1-9]\\d{8}$";
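A quick way to check the two phone patterns from the answers above, written as Java string literals with doubled backslashes. The class and method names are hypothetical; matches() is used so that the whole input must fit the pattern.

```java
import java.util.regex.Pattern;

public class PhoneCheck {
    // Starts with "00" or "+", then 6-20 digits, whitespace, dots, slashes or hyphens
    static final Pattern LOOSE = Pattern.compile("^(?:00|\\+)[0-9\\s.\\/-]{6,20}$");
    // "+xx" or "00xx" country code, or a leading "0", then 9 digits whose first digit is non-zero
    static final Pattern STRICT = Pattern.compile("^(?:(?:00|\\+)\\d{2}|0)[1-9]\\d{8}$");

    static boolean looseMatch(String s) {
        return LOOSE.matcher(s).matches();
    }

    static boolean strictMatch(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looseMatch("0123-45678"));     // false: starts with neither "00" nor "+"
        System.out.println(looseMatch("0044 20 1234 5678")); // true
        System.out.println(strictMatch("+44123456789"));  // true
    }
}
```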

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it matches every number separately (100, 2011, 10, ...).
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds.
EDIT:
This regex will match the first group of digits, but, as pointed out in other answers, a simple string operation followed by parseInt is a better solution if you know the number is at the beginning of the string.
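In Java the /.../ delimiters are dropped and the backslashes doubled. A minimal sketch with a hypothetical class and method name, returning null when the string contains no digit:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNumber {
    // Skip any leading non-digits, then capture the first run of digits in group 1
    private static final Pattern FIRST = Pattern.compile("^[^\\d]*(\\d+)");

    /** Returns the first run of digits in s, or null if s contains no digit. */
    static String firstNumber(String s) {
        Matcher m = FIRST.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("100 2011-10-20 14:28:55")); // 100
    }
}
```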
Try this to match the first number that stands on its own, delimited by whitespace or the ends of the string (it need not be at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
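Regexes add overhead compared with simple string operations, so they are often unnecessary for a task this simple.
The code below would do the trick. Calling Integer.parseInt on the whole string would throw a NumberFormatException, so parse only the part before the first space:
int num = Integer.parseInt("100 2011-10-20 14:28:55".split(" ")[0]);
[0-9] matches any single digit from 0 to 9, and + means one or more times. [0-9]{3} would match exactly 3 digits.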
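Try ^(?'num'[0-9]+).*$, which forces the match to start at the beginning, read a number, store it in the group 'num' and consume the remainder without capturing it. Java does not accept the (?'num'...) syntax; write the group as (?<num>[0-9]+) instead.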
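A sketch of the named-group version in Java, using the (?<num>...) form and reading the group back by name with Matcher.group(String). The class and method names are hypothetical. Because . does not match line terminators by default, matches() fails if the rest of the input spans several lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroup {
    // Java spells a named group (?<num>...)
    private static final Pattern LEADING = Pattern.compile("^(?<num>[0-9]+).*$");

    /** Returns the number at the very start of s, or null if s does not start with a digit. */
    static String leadingNumber(String s) {
        Matcher m = LEADING.matcher(s);
        return m.matches() ? m.group("num") : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("100 2011-10-20 14:28:55")); // 100
    }
}
```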
This C# string extension method also works when the string does not start with a number.
It returns 1234 in each of these cases: "1234asdfwewf", "%sdfsr1234", "## # 1234".
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
// remove the non-digit chars from the string start only
source = source.Substring(notNumber.Length);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
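A rough Java counterpart of the C# method, written as a plain static method because Java has no extension methods. It is a sketch, not a port of the exact C# code: it skips the leading non-digits by index, then takes the run of ASCII digits that follows. The class and method names are hypothetical.

```java
public class FirstDigits {
    /** Returns the first run of ASCII digits in source, or "" if there is none. */
    static String firstNumber(String source) {
        if (source == null || source.isEmpty()) {
            return source; // mirrors the C# version, which returns null/empty input unchanged
        }
        int start = 0;
        // skip the non-digit characters at the start
        while (start < source.length() && !isAsciiDigit(source.charAt(start))) {
            start++;
        }
        int end = start;
        // take the digits that follow
        while (end < source.length() && isAsciiDigit(source.charAt(end))) {
            end++;
        }
        return source.substring(start, end);
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("%sdfsr1234")); // 1234
    }
}
```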
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the start of a string, you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g. .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d) or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or a . and a digit) on the right.
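Combining the signed, scientific-notation and lookahead variants above, a Java sketch might read the leading number with Matcher.lookingAt() (which matches at the start of the input without requiring the whole string to match) and convert it with Double.valueOf. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingNumber {
    // Optional sign, digits with an optional decimal point, optional exponent;
    // (?!\.?\d) rejects a match that would leave a digit (or '.' plus a digit) right after it
    private static final Pattern NUM =
            Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+([eE][+-]?[0-9]+)?(?!\\.?\\d)");

    /** Returns the leading number as a Double, or null if the string does not start with one. */
    static Double leadingNumber(String s) {
        Matcher m = NUM.matcher(s);
        return m.lookingAt() ? Double.valueOf(m.group()) : null;
    }

    public static void main(String[] args) {
        System.out.println(leadingNumber("100 2011-10-20 14:28:55")); // 100.0
        System.out.println(leadingNumber("-.55 left"));               // -0.55
    }
}
```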
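import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Numbers {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        String str = s.nextLine();
        Pattern p = Pattern.compile("[0-9]+");
        Matcher m = p.matcher(str);
        // prints every run of digits; use "if" instead of "while" to stop after the first one
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}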
\d+
\d matches any digit, and + extends the match to the digits directly after it, up to the first non-digit character such as a space or letter.
