Java - Extract strings with Regex - java

I've this string
String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
and I need to extract these 3 substrings
1234
06:30
07:45
If I use the regex \\d{2}:\\d{2}, I'm only able to extract the first time, 06:30:
Pattern depArrHours = Pattern.compile("\\d{2}:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
matcher.find();
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // IndexOutOfBoundsException: No group 1
Also I don't know how to extract 1234. This string can change but it always comes after 'XX~ '
Do you have any idea on how to match these strings with regex expressions?
UPDATE
Thanks to Adam's suggestion, I now have this regex that matches my string:
Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");
I get the number and the two times with matcher.group(1), matcher.group(2) and matcher.group(3).

The matcher.group() function takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created using a pair of parentheses "(...)". Anything matched within the parentheses is captured. Groups are numbered from left to right (again, starting from 1) by their opening parenthesis, which means that groups can be nested. Since there are no parentheses in your regular expression, there can be no group 1.
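To make the numbering concrete, here is a minimal sketch. The class and method names (GroupBasics, countGroups, firstHour) are made up for illustration; the only library calls are Pattern.compile, Matcher.find, Matcher.group and Matcher.groupCount.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBasics {
    // Number of capturing groups declared in a regex (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns the hour part of the first HH:mm time in the input, or null if there is none.
    static String firstHour(String input) {
        Matcher m = Pattern.compile("(\\d{2}):(\\d{2})").matcher(input);
        if (!m.find()) {
            return null; // group(...) may only be called after a successful match
        }
        return m.group(1); // group(0) would be the whole "HH:mm" text
    }

    public static void main(String[] args) {
        System.out.println(countGroups("\\d{2}:\\d{2}"));     // 0: no parentheses, so no group 1
        System.out.println(countGroups("(\\d{2}):(\\d{2})")); // 2
        System.out.println(firstHour("01/01/2010 06:30"));    // 06
    }
}
```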
The javadoc on the Pattern class covers the regular expression syntax.
If you are looking for a pattern that might recur some number of times, you can use Matcher.find() repeatedly until it returns false. Matcher.group(0) once on each iteration will then return what matched that time.
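A minimal sketch of the find() loop, applied to the sample string from the question. The helper name findTimes is hypothetical, and the pattern assumes both hour and minute are always two digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllTimes {
    // Collects every substring that looks like HH:mm, in order of appearance.
    static List<String> findTimes(String input) {
        List<String> times = new ArrayList<>();
        Matcher m = Pattern.compile("\\d{2}:\\d{2}").matcher(input);
        while (m.find()) {        // each call moves on to the next match
            times.add(m.group()); // group() is shorthand for group(0)
        }
        return times;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findTimes(myString)); // [06:30, 07:45]
    }
}
```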
If you want to build one big regular expression that matches everything all at once (which I believe is what you want), then put a set of capturing parentheses around each of the three things that you want to capture, call Matcher.matches() and then Matcher.group(n), where n is 1, 2 and 3 respectively. Of course Matcher.matches() might also return false, in which case the pattern did not match and you can't retrieve any of the groups.
In your example, what you probably want to do is have it match some preceding text, then start a capturing group, match digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.
Let's say I had strings of the form:
Eat 12 carrots at 12:30
Take 3 pills at 01:15
And I wanted to extract the quantity and times. My regular expression would look something like:
"\w+ (\d+) [\w ]+ (\d{2}:\d{2})"
The code would look something like:
Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline);
if (m.matches()) {
    System.out.println("The quantity is " + m.group(1));
    System.out.println("The time is " + m.group(2));
}
The regular expression means "a string containing a word, a space, one or more digits (which are captured in group 1), a space, a set of words and spaces ending with a space, followed by a time (captured in group 2)". The time pattern assumes that the hour is always zero-padded to 2 digits. I would give an example closer to what you are looking for, but the description of the possible input is a little vague.
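Going only by the single sample string in the question, a sketch with three capturing groups could look like this. The assumptions are mine: the number always follows the literal "XX~ ", and both times are written as two digits, a colon, two digits. The sketch uses lazy ".*?" between the groups, because a greedy ".*" followed by \d{1,2} can swallow the leading digit of the time (as far as I can tell, the greedy version would return 6:30 rather than 06:30). The names ExtractThree and extract are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractThree {
    // Group 1: digits after "XX~ ", group 2: first time, group 3: next time.
    private static final Pattern P =
            Pattern.compile("XX~ (\\d+).*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2})");

    // Returns {number, firstTime, secondTime}, or null if the input does not fit.
    static String[] extract(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(Arrays.toString(extract(myString))); // [1234, 06:30, 07:45]
    }
}
```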

Related

Using regex get particular value

I have a response like below
2020 Aug 05 09:31:25.646515 arrisxg1v4 WPEWebProcess[22024]: [AAMP-PLAYER]NotifyBitRateChangeEvent :: bitrate:2800000 desc:BitrateChanged - Network adaptation width:1280 height:720 fps:25.000000 position:256.000000
How can I use a regex to retrieve only the position value, as an Integer? In Java I can get it with the .split method, but how can I get the value 256 using a regex?
Try this one-liner:
String positionStr = str.replaceAll("(?:(?!position:).)*(?:position:(\\d+))?.*", "$1");
Integer position = positionStr.isEmpty() ? null : Integer.valueOf(positionStr);
This regex matches the whole string, capturing the target position value in group 1 ((?:...) is a non-capturing group), and replaces the match (i.e. everything) with the captured group. This effectively deletes everything you don't want.
Conveniently, because the capture is optional (it has a quantifier of ?), if the input does not have a position: value, the result is an empty string.
The negative lookahead (?!position:). prevents the dot running past our target. Without the negative lookahead, the first dot would consume the entire input.
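A small sketch that wraps the one-liner in a method and exercises both cases, the log line from the question and an input with no position: field. The class and method names are hypothetical.

```java
class PositionOneLiner {
    // Returns the integer part of the position value, or null if the field is missing.
    static Integer extractPosition(String str) {
        String positionStr =
                str.replaceAll("(?:(?!position:).)*(?:position:(\\d+))?.*", "$1");
        return positionStr.isEmpty() ? null : Integer.valueOf(positionStr);
    }

    public static void main(String[] args) {
        String log = "2020 Aug 05 09:31:25.646515 arrisxg1v4 WPEWebProcess[22024]: "
                + "[AAMP-PLAYER]NotifyBitRateChangeEvent :: bitrate:2800000 "
                + "desc:BitrateChanged - Network adaptation width:1280 height:720 "
                + "fps:25.000000 position:256.000000";
        System.out.println(extractPosition(log));            // 256
        System.out.println(extractPosition("fps:25.000000")); // null
    }
}
```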
You could try something like this:
Pattern pattern = Pattern.compile(".*position:(\\d+(\\.\\d+)?)");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    String position = matcher.group(1);
}
Basically what this expression says is:
Accept anything up until the word position
Accept a colon :
Accept 1 or more digits (0-9), optionally followed by a dot and 1 or more digits
Then by using matcher.group you take everything between parentheses starting at the 1st parenthesis from the left.
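If only the integer part is wanted, one possible variation (a sketch, not the only way) is to search with find() and keep the decimals outside the capturing group. The names PositionFinder and findPosition are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PositionFinder {
    // Group 1 holds the digits before the optional decimal part.
    private static final Pattern POSITION = Pattern.compile("position:(\\d+)(?:\\.\\d+)?");

    // Returns the integer part of the first position value, or null if none is found.
    static Integer findPosition(String line) {
        Matcher m = POSITION.matcher(line);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(findPosition("height:720 fps:25.000000 position:256.000000")); // 256
        System.out.println(findPosition("height:720 fps:25.000000"));                     // null
    }
}
```

Because find() searches anywhere in the input, no leading .* is needed here.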

Can't get rid of the empty first line when I use regex for polynomials

here is my code
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
I'm trying to get the output 5,2,3,4,
but instead I get this:
exp:
exp:5
exp:2
exp:3
exp:4
I don't know where the empty first line comes from, and I cannot find a way to get rid of it. I tried other regexes and also Pattern.compile, but I still can't get rid of the first line. When I try the new string "X+X^5+2X^2+3X^3+4X^4", the first line shows exp:X.
I also tried my problem in an online regex tester, and its answer is 5,2,3,4, but Eclipse gives an empty line and then 5,2,3,4. I need help figuring this out.
Try using a Pattern and a Matcher instead of split, e.g.:
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How it works:
Pattern used: \^([0-9]+)
It matches any substring that starts with ^ followed by 1 or more digits (note the + sign). The caret (^) is prefixed with a backslash (\) because it has a special meaning in regular expressions (beginning of the input), but in your case you just want an exact match of the ^ character.
We wrap the part we are interested in in a group so that we can refer to it later. That means marking it with parentheses ( and ).
Then we want to put our pattern into a Java String. In a String literal, the \ character has a special meaning: it starts an escape sequence, e.g. "\n" represents a new line. So if we put our pattern into a String literal, we need to escape the \ and our pattern becomes "\\^([0-9]+)". Note the double \.
Next we iterate through all matches, getting group 1, which is our number. Note that the ^ character is not included in group 1 even though it is part of our pattern. That is because we used parentheses to mark the group we are looking for, which in our case contains only the digits.
It happens because you are using the split method, which looks for occurrences of the regex and splits the string at those positions. Your string starts with X^, which matches your regex, so the text before that first match (an empty string) becomes the first element of the result.
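A short sketch that reproduces the leading empty element and shows one possible way to drop it while keeping split. The class and method names are hypothetical, and filtering out empty strings is just one option.

```java
import java.util.Arrays;

class SplitLeadingEmpty {
    // Splits with the regex from the question; the first element is "" because
    // the input starts with a (non-empty) match of the delimiter.
    static String[] rawSplit(String poly) {
        return poly.split("(|\\+\\d)[xX]\\^");
    }

    // Same split, with empty pieces removed.
    static String[] exponents(String poly) {
        return Arrays.stream(rawSplit(poly))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String a = "X^5+2X^2+3X^3+4X^4";
        System.out.println(Arrays.toString(rawSplit(a)));  // [, 5, 2, 3, 4]
        System.out.println(Arrays.toString(exponents(a))); // [5, 2, 3, 4]
    }
}
```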

minimum number in a string should be 1 regex validation?

I have a String which I need to validate. It should contain only numbers separated by spaces, or just a single number, and every number should be at least 1. For example:
3 1 2
1 p 3
6 3 2
0 3 2
The first and third are valid strings and the others are not.
I came up with the regex below, but I am not sure how I can check that the minimum number in the string is always 1.
str.matches("(\\d|\\s)+")
Regex used from here
Just replace \\d with [1-9].
\\d is just a shorthand for the class [0-9].
This is a better regex though: ([1-9]\\s)*[1-9]$. It requires exactly one whitespace character between numbers and won't allow whitespace at the end. Note that it only accepts single-digit numbers.
Not everything can or should be solved with regular expressions.
You could use a simple expression like
str.matches("\\d+(\\s+\\d+)*")
or something similar to simply check that your input line contains only groups of digits separated by one or more whitespace characters.
If that matches, you split along the spaces and for each group of digits you turn it into a number and validate against the valid range.
I have a gut feeling that regular expressions are actually not sufficient for the kind of validation you need.
If it should contain only numbers separated by a space, or just a single number, where each number is at least 1 and can also be 10 or larger, you might use:
^[1-9]\\d*(?: [1-9]\\d*)*$
Note that if you want to match a space only, instead of using \s which matches more you could just add a space in the pattern.
Explanation
^ Assert the start of the string
[1-9]\\d* Match a number from 1 up
(?: [1-9]\\d*)* Repeat a number from 1 up with a prepended space
$ Assert end of the string
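As a quick check, here is a sketch that runs this pattern against the four sample lines from the question. String.matches already requires the whole input to match, so the ^ and $ anchors are left out here; the class name is made up.

```java
class MinOneValidator {
    // True if the input is one or more numbers >= 1 (no leading zeros),
    // separated by exactly one space.
    static boolean isValid(String str) {
        return str.matches("[1-9]\\d*(?: [1-9]\\d*)*");
    }

    public static void main(String[] args) {
        for (String s : new String[] { "3 1 2", "1 p 3", "6 3 2", "0 3 2" }) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

With these inputs the first and third lines come out as valid and the other two as invalid, which is what the question asks for.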
Regex is part of the solution. But I don't think that regex alone can solve your problem.
This is my proposed solution:
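private static boolean isValid(String str) {
    // Groups of digits separated by single whitespace characters. A character class
    // such as [(\\d+)\\s] would also accept '(', '+' and ')' and break parseInt.
    Pattern pattern = Pattern.compile("\\d+(?:\\s\\d+)*");
    Matcher matcher = pattern.matcher(str);
    // Every number must be at least 1, so the smallest one must be >= 1.
    return matcher.matches() && Arrays.stream(matcher.group().split("\\s"))
            .mapToInt(Integer::parseInt)
            .min().getAsInt() >= 1;
}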
Pay attention to the matching method: matcher.matches() checks the pattern against the entire input. Don't use matcher.find(), because it will not reject invalid input such as "1 p 2".

Java Regex capture multiple groups with groups containing others

I'm trying to build a regular expression which captures multiple groups, with some of them being contained in others. For instance, let's say I want to capture every 4-gram that follows a 'to' prefix:
input = "I want to run to get back on shape"
expectedOutput = ["run to get back", "get back on shape"]
In that case I would use this regex:
"to((?:[ ][a-zA-Z]+){4})"
But it only captures the first item in expectedOutput (with a space prefix but that's not the point).
This is quite easy to solve without regex, but I'd like to know if it is possible only using regex.
You can put the whole pattern inside a lookahead to get overlapping matches:
String s = "I want to run to get back on shape";
Pattern pattern = Pattern.compile("(?=\\bto\\b((?:\\s*[\\p{L}\\p{M}]+){4}))");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1).trim());
}
The regex (?=\bto\b((?:\s*[\p{L}\p{M}]+){4})) checks each location in the string (since it is a zero width assertion) and looks for:
\bto\b - a whole word to
((?:\s*[\p{L}\p{M}]+){4}) - Group 1 capturing 4 occurrences of
\s* zero or more whitespace(s)
[\p{L}\p{M}]+ - one or more letters or diacritics
If you want to allow capturing fewer than 4 words, use a greedy limiting quantifier such as {0,4} (or {1,4} to require at least one word) instead of {4}.
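To see why the lookahead matters, this sketch runs the same word pattern twice, once as a normal (consuming) match and once wrapped in (?=...). The helper firstGroups and the class name are made up for the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    static final String WORDS = "((?:\\s*[\\p{L}\\p{M}]+){4})";
    static final String CONSUMING = "\\bto\\b" + WORDS;
    static final String LOOKAHEAD = "(?=\\bto\\b" + WORDS + ")";

    // Collects the trimmed group 1 of every match of regex in input.
    static List<String> firstGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "I want to run to get back on shape";
        // The consuming match eats the second "to", so only one result is found.
        System.out.println(firstGroups(CONSUMING, s)); // [run to get back]
        // The lookahead consumes nothing, so the second "to" can still match.
        System.out.println(firstGroups(LOOKAHEAD, s)); // [run to get back, get back on shape]
    }
}
```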
This is how groups are numbered in a regex: by the position of their opening parenthesis, from left to right.
1 ((A)(B(C))) // first group (encloses all the others)
2 (A) // second group
3 (B(C)) // third group (encloses the fourth group)
4 (C) // fourth group
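A tiny sketch that prints the groups of ((A)(B(C))) when matched against "ABC". The helper name groupsOf is hypothetical; index 0 of the returned array is the whole match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOrder {
    // Returns group 0..groupCount() for a full match, or an empty array if there is no match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Index:  0    1    2  3   4
        System.out.println(Arrays.toString(groupsOf("((A)(B(C)))", "ABC"))); // [ABC, ABC, A, BC, C]
    }
}
```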

Match regex but only replace the first section - Java

I'm trying to take a phone number which can be in the format either +44 or +4 followed by any number of digits or hyphens, and replace the +44 or +4 with +44 or +4 followed by a space.
I believe I need a lookaround to match the full number but only replace the initial prefix. What I'm trying at the moment is
^[+]\d[0-9](?:([0-9]+))?
which matches the number (without hyphens). However, I thought the lookahead would only check the rest of the number without capturing the extra digits, but it seems to capture the whole thing.
Can anyone point me in the right direction as to what I've done wrong?
EDIT:
To be clearer my Java code is
Pattern pattern = Pattern.compile("^[+]\\d[0-9](?:([0-9]+))?");
String num = null;
if (pattern.matcher("+441234567890").matches()) {
    num = pattern.matcher(title).replaceFirst("$0 $1");
}
Thanks.
If you want to match the whole number but replace only part of it, you should not use a positive lookahead, just grouping, as in:
(^\+\d\d)([\d-]+)?
prefix will be in group 1, and the rest of number in group 2, so to add a space between these parts, just use something like group1 + space + group2.
In your example it should look like this:
Pattern pattern = Pattern.compile("(^\\+\\d\\d)([\\d-]+)?");
if(pattern.matcher("+441234567890").matches()) {
num = pattern.matcher(title).replaceFirst("$1 $2");
}
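However, this regex will always capture two digits in the prefix. If you want to match +44 or +4, you should use:
(^\+(?:44|4))([\d-]+)?
The alternation is non-capturing, so the rest of the number stays in group 2 and "$1 $2" still works. If you have more possible prefixes, you need to extend this regex accordingly.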
Your regex didn't work as you expected because (?:...) is a non-capturing group, not a lookahead: the text it matches is still consumed and is part of the overall match. $0 is the whole match, and $1 is the inner capturing group ([0-9]+), so "$0 $1" produces the whole number followed by a space and a second copy of the trailing digits.
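Putting it together, a minimal sketch of the replacement could look like this. The class and method names are hypothetical. Unlike the regex above, this variant assumes at least one digit or hyphen follows the prefix, and it anchors the end of the input with $.

```java
class PhonePrefix {
    // Inserts a space after a leading +44 or +4; other input is returned unchanged.
    static String spaceAfterPrefix(String number) {
        // Group 1: the prefix (+44 is tried before +4), group 2: the rest of the number.
        return number.replaceFirst("^(\\+(?:44|4))([\\d-]+)$", "$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(spaceAfterPrefix("+441234567890")); // +44 1234567890
        System.out.println(spaceAfterPrefix("+4-123-456"));    // +4 -123-456
        System.out.println(spaceAfterPrefix("01234567890"));   // 01234567890 (no prefix, unchanged)
    }
}
```

Since replaceFirst leaves the input untouched when nothing matches, a separate matches() check is not strictly needed in this version.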
