Regex capturing: get only result from second group - java

I have the following string:
'pp_3', 365]
What comes after pp_ may vary in length. What comes after the comma and before ] is what I'd like to capture (and only that). Its length varies, but it is always a number.
I've come up with (?<=pp_).*,(.*)(?=]). It gives 3', 365 as the full match, and group 1 holds the part I want, 365. How do I get only 365 as the full match?
Please let me know if I haven't explained this clearly. Thanks

Try this:
[^_]*_(\d*)'\s*,\s*(\-?\d+)\s*]
This regex captures 2 groups, that correspond to each of the numbers, the first after pp_ and the second after ', (which may be negative). If you don't want to capture the first one as a group, instead of (\d*), just write (?:\d*).
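A minimal Java sketch of reading the second group from this pattern, assuming the input has the same shape as the sample. The class and method names (PpNumberExtractor, secondNumber) are hypothetical; the `\-?` from the pattern is written as plain `-?`, which means the same thing outside a character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PpNumberExtractor {
    // Group 1: digits after pp_; group 2: the (possibly negative) number before ]
    private static final Pattern PP =
            Pattern.compile("[^_]*_(\\d*)'\\s*,\\s*(-?\\d+)\\s*]");

    // Returns the second number as text, or null if nothing matches
    static String secondNumber(String input) {
        Matcher m = PP.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(secondNumber("'pp_3', 365]"));  // 365
        System.out.println(secondNumber("'pp_42', -7]"));  // -7
    }
}
```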

Try this expression. The second group should be what you're after:
(?<='pp_)(\d*', )(\d*)]

To match only the digits with a positive lookbehind, you can put a bounded quantifier inside the lookbehind (you choose the upper limit yourself), which Java supports:
(?<=pp_[^,]{0,1000}, )\d+(?=])
Explanation
(?<= Positive lookbehind, assert what is on the left is
pp_[^,]{0,1000} Match pp_, match any char except , 0-1000 times
, Match a comma and space
) Close lookbehind
\d+ Match 1+ digits
(?=]) Positive lookahead, assert what is on the right is ]
In Java
String regex = "(?<=pp_[^,]{0,1000}, )\\d+(?=])";
You could also use a capturing group instead:
pp_[^,]*, (\d+)]
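A small sketch comparing the two approaches in Java, assuming the sample input format. With the bounded lookbehind the whole match (group 0) is just the digits; with the plain capturing pattern the digits are read from group 1. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PpDigitsOnly {
    // Bounded lookbehind (limit of 1000 as in the answer): the full match is only the digits
    private static final Pattern LOOKBEHIND =
            Pattern.compile("(?<=pp_[^,]{0,1000}, )\\d+(?=])");
    // Capturing-group alternative: the digits end up in group 1
    private static final Pattern CAPTURE = Pattern.compile("pp_[^,]*, (\\d+)]");

    static String viaLookbehind(String input) {
        Matcher m = LOOKBEHIND.matcher(input);
        return m.find() ? m.group() : null;  // group() is the whole match
    }

    static String viaCapture(String input) {
        Matcher m = CAPTURE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "'pp_3', 365]";
        System.out.println(viaLookbehind(s) + " " + viaCapture(s));  // 365 365
    }
}
```

The capturing version avoids having to pick an arbitrary upper bound for the lookbehind.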

Related

How to make Regex lookahead to match one and two digit numbers?

Let's say for example that I have a string reading "1this12string". I would like to use String#split with regex using lookahead that will give me ["1this", "12string"].
My current statement is (?=\d), which works very well for single digit numbers. I am having trouble modifying this statement to include both 1 and 2 digit numbers.
Add a lookbehind so you don't split within numbers:
(?<!\d)(?=\d)
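A sketch of this lookbehind plus lookahead used with String#split. On Java 8 and later, a zero-width match at the very start of the input does not produce a leading empty string, so "1this12string" splits into exactly two pieces. The class and method names are hypothetical.

```java
import java.util.Arrays;

class DigitSplit {
    // Split before a digit only when the previous character is not a digit,
    // so multi-digit numbers such as 12 stay together
    static String[] splitBeforeNumbers(String s) {
        return s.split("(?<!\\d)(?=\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeNumbers("1this12string")));
        // [1this, 12string]
    }
}
```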
If you really want to use Regex Lookahead, try this:
(\d{1,2}[^\d]*)(?=\d|\b)
Note that this assumes every piece starts with 1 or 2 digits. If that is not the case, please let us know so that we can improve it.
Regex logic
\d{1,2} to match 1 or 2 digits at the front
[^\d]* to match non-digit characters following the first 1 or 2 digit(s)
Enclose the above two segments in parentheses () to make them a capturing group for extracting the matched text.
(?=\d to fulfill your requirement to use Regex Lookahead
|\b to allow the matching text to be at the end of a text (just before a word boundary)
I think you can also achieve your task with a simpler regex, without using the relatively more sophisticated feature like Regex Lookahead. For example:
\d{1,2}[^\d]*
This works equally well for your sample input. If your requirement is more involved than this, please let us know so we can fine-tune it.
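A sketch of using the simpler \d{1,2}[^\d]* pattern with Matcher#find instead of split, collecting each match. As the answer notes, it assumes every piece starts with 1 or 2 digits: a longer run such as 123abc would come out as 12 and 3abc. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitChunks {
    // One or two digits followed by any run of non-digits
    private static final Pattern CHUNK = Pattern.compile("\\d{1,2}[^\\d]*");

    static List<String> chunks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            result.add(m.group());  // each match is one piece
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chunks("1this12string"));  // [1this, 12string]
    }
}
```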
Use
String[] splits = string.split("(?<=\\D)(?=\\d)");
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\D non-digits (all but 0-9)
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of look-ahead

How to match a string in this way?

I need to check if a String matches this specific pattern.
The pattern is:
(Numbers)(all characters allowed)(numbers)
and the numbers may contain a decimal separator ("." or ",")!
For instance the input could be 500+400 or 400,021+213.443.
I tried Pattern.matches("[0-9],?.?+[0-9],?.?+", theequation2), but it didn't work!
I know that I have to use the method Pattern.matches(regex, String), but I can't find the correct regex.
Dealing with numbers can be difficult. This approach handles your examples, but check it carefully. I also didn't allow "all characters" in the middle group, since "all" would include digits; instead I assumed that matching up to the next non-number would be appropriate.
This Java regex handles the requirements:
"((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)"
However, there is a potential issue in the above. Consider the following:
300 - -200. The pattern above won't match that case, because its middle group excludes the minus sign used as the operator.
Now, based upon the examples, I think the point is that one should have a valid operator. The number of math operations is likely limited, so I would whitelist the operators in the middle. Thus, something like:
"((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)"
Would, I think, be more appropriate. The [*/+-] can be expanded for the power operator ^ or whatever. Now, if one is going to start adding words (such as mod) in the equation, then the expression will need to be modified.
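A quick sketch for checking both patterns with String#matches, which requires the whole input to match. It shows the 300 - -200 case failing with the first pattern and passing with the operator-whitelisted one. The class name is hypothetical; note that [\d,.]+ is permissive and would also accept something like 1,,2.

```java
class EquationPatterns {
    // First pattern: number, run of non-digit/non-minus chars, number
    static final String LOOSE = "((-?)[\\d,.]+)([^\\d-]+)((-?)[\\d,.]+)";
    // Operator-whitelisted pattern
    static final String OPERATOR = "((-?)[\\d,.]+)([\\s]*[*/+-]+[\\s]*)((-?)[\\d,.]+)";

    public static void main(String[] args) {
        for (String s : new String[] {"500+400", "400,021+213.443", "300 - -200"}) {
            // matches() anchors the pattern to the whole string
            System.out.println(s + " loose=" + s.matches(LOOSE) + " operator=" + s.matches(OPERATOR));
        }
    }
}
```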
In your regex you have to escape the dot as \. to match it literally, and escape the plus as \+; otherwise ?+ is read as a possessive quantifier. To match 1+ digits you need a quantifier: [0-9]+.
For your example data, you could match a number at the start and at the end: 1+ digits followed by an optional part consisting of a dot or comma and 1+ digits. To match exactly one character of any kind in between, you could use a dot.
Instead of using a dot, you could also use for example a character class [-+*] to list some operators or list what you would allow to match. If this should be the only match, you could use anchors to assert the start ^ and the end $ of the string.
\d+(?:[.,]\d+)?.\d+(?:[.,]\d+)?
In Java:
String regex = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
That would match:
\d+(?:[.,]\d+)? 1+ digits followed by an optional part that matches . or , followed by 1+ digits
. Match any single character (use .+ to repeat 1+ times)
\d+(?:[.,]\d+)? Same as the first part
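A sketch comparing the dot version with the character-class variant the answer suggests. Because . also matches a digit, the dot version accepts a plain 1234 (split as 12, 3, 4); listing operators avoids that. The operator list [-+*/] and the class name are assumptions for illustration.

```java
class NumberPair {
    // Answer's pattern: number, any single character, number
    static final String ANY_CHAR = "\\d+(?:[.,]\\d+)?.\\d+(?:[.,]\\d+)?";
    // Variant with an assumed operator list instead of the dot
    static final String OPERATOR_CLASS = "\\d+(?:[.,]\\d+)?[-+*/]\\d+(?:[.,]\\d+)?";

    public static void main(String[] args) {
        for (String s : new String[] {"500+400", "400,021+213.443", "1234"}) {
            System.out.println(s + " dot=" + s.matches(ANY_CHAR) + " class=" + s.matches(OPERATOR_CLASS));
        }
    }
}
```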

Match regex but only replace the first section - Java

I'm trying to take a phone number which can be in the format either +44 or +4 followed by any number of digits or hyphens, and replace the +44 or +4 with +44 or +4 followed by a space.
I believe I need a lookaround to match the full number but replace only the initial prefix. What I'm trying at the moment is:
^[+]\d[0-9](?:([0-9]+))?
which matches the number (without hyphens). However, I thought the lookahead would only match the number and not capture the extra digits, but it seems to capture the whole thing.
Can anyone point me in the right direction as to what I've done wrong?
EDIT:
To be clearer, my Java code is:
Pattern pattern = Pattern.compile("^[+]\\d[0-9](?:([0-9]+))?");
if (pattern.matcher("+441234567890").matches()) {
String num = pattern.matcher(title).replaceFirst("$0 $1");
}
Thanks.
If you want to match the whole number but replace only part of it, you don't need a lookahead; plain grouping is enough, as in:
(^\+\d\d)([\d-]+)?
The prefix will be in group 1 and the rest of the number in group 2, so to add a space between these parts, just use group 1 + space + group 2.
In your example it should look like this:
Pattern pattern = Pattern.compile("(^\\+\\d\\d)([\\d-]+)?");
if(pattern.matcher("+441234567890").matches()) {
num = pattern.matcher(title).replaceFirst("$1 $2");
}
However, this regex always captures exactly two digits as the prefix. If you want to match +44 or +4, you should use:
(^\+(44|4))([\d-]+)?
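Note that the alternation adds a second group, so the rest of the number moves to group 3 (use "$1 $3" as the replacement). If you have more possible prefixes, you need to extend this regex as well.
Your regex didn't work as you expected because it contains no lookahead: (?:...) is a non-capturing group, and whatever it matches is still consumed as part of the overall match. So $0 referred to the whole number, while $1 referred to the inner capturing group ([0-9]+), i.e. the digits after the prefix.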
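A sketch of both replacements with Matcher#replaceFirst. The +44/+4 pattern is used exactly as written in the answer, which is why its replacement refers to $1 and $3 rather than $2. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PhonePrefix {
    // Two-digit prefix in group 1, the rest (digits or hyphens) in group 2
    private static final Pattern TWO_DIGIT = Pattern.compile("(^\\+\\d\\d)([\\d-]+)?");
    // +44 or +4: the (44|4) alternation is group 2, so the rest is group 3
    private static final Pattern PLUS_44_OR_4 = Pattern.compile("(^\\+(44|4))([\\d-]+)?");

    static String spaceAfterTwoDigitPrefix(String number) {
        return TWO_DIGIT.matcher(number).replaceFirst("$1 $2");
    }

    static String spaceAfter44Or4(String number) {
        return PLUS_44_OR_4.matcher(number).replaceFirst("$1 $3");
    }

    public static void main(String[] args) {
        System.out.println(spaceAfterTwoDigitPrefix("+441234567890"));  // +44 1234567890
        System.out.println(spaceAfter44Or4("+4123-456"));               // +4 123-456
    }
}
```

Writing the alternation as (?:44|4) instead would keep the rest of the number in group 2.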

Regex Exceptions

I am trying to come up with a Java regex that will match numbers with 2 to 3 decimal places and not match any number with more than 3.
This is my regex:
[0-9]{2}[.][0-9]{3}
It matches 41.51778000 and 18.740,
but I only want it to match numbers that have exactly 3 decimal places, not numbers with more than three.
You need to make the regex match the start and end of the input as well:
^[0-9]{2}[.][0-9]{3}$
You must use a word boundary on either side to prevent unexpected matches:
\b[0-9]{2}[.][0-9]{2,3}\b
In Java it would be:
\\b\\d{2}\\.\\d{2,3}\\b
You can invoke Matcher#matches or String.matches instead of, say, Matcher#find to match the whole String.
Otherwise, you can prepend ^ and append $ to your pattern, to delimit start and end of input.
Finally, you can surround your pattern with something like \\D, \\b or \\s to match non-digits, word boundaries or whitespace around it, respectively, if you need to invoke find on an input containing more than one instance of the pattern.
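A sketch of the two options above: String#matches for a whole-string check, and Matcher#find with word boundaries when searching inside longer text. The class and method names and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalPlaces {
    // Whole-string check: two digits, a dot, exactly three digits
    static boolean isWholeMatch(String s) {
        return s.matches("[0-9]{2}[.][0-9]{3}");
    }

    // Inside longer text: \b stops matches that continue into more digits
    private static final Pattern IN_TEXT = Pattern.compile("\\b\\d{2}\\.\\d{2,3}\\b");

    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = IN_TEXT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isWholeMatch("18.740") + " " + isWholeMatch("41.51778000"));  // true false
        System.out.println(findAll("18.740 and 41.51778000"));                        // [18.740]
    }
}
```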

How do I capture the text that is before and after a multiple regex matches in java?

Given a test string of:
I have a 1234 and a 2345 and maybe a 3456 id.
I would like to match all the IDs (the four digit numbers) AND at the same time get 12 characters of their surrounding text (before and after) (if any!)
So the matches should be:
BEFORE MATCH AFTER
Match #1: I have a- 1234 -and a 2345-
Match #2: -1234 and a- 2345 -and maybe a
Match #3: and maybe a- 3456 -id.
Here, - stands for a space character.
Note:
The BEFORE part of Match #1 is shorter than 12 characters (there are not enough characters at the beginning of the string). The same goes for the AFTER part of Match #3 (there are not enough characters after the last match).
Can I achieve these matches with a single regex in java?
My best attempt so far is to use a positive look behind and an atomic group (to get the surrounding text) but it fails in the beginning and the end of the string when there are not enough characters (like my note above)
(?<=(.{12}))(\d{4})(?>(.{12}))
This matches only 2345. If I use a small enough value for the quantifiers (2 instead of 12, for example) then I correctly match all IDs.
Here is a link to my regex playground where I was trying my regex's:
http://regex101.com/r/cZ6wG4
When you look at the MatchResult (http://docs.oracle.com/javase/7/docs/api/java/util/regex/MatchResult.html) interface implemented by the Matcher class (http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html), you will find the methods start() and end(), which give you the index of the first character of the match and the index just past its last character within the input string. Once you have the indices, you can use some simple math and the substring method to extract the parts you want.
I hope this helps you, because I won't write the entire code for you.
There might be a way to do what you want purely with regex, but I think using the indices and substring is easier (and probably more reliable).
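A sketch of the start()/end() plus substring approach, assuming an ID is a standalone four-digit number. The context width is clamped to the bounds of the input, so the first and last matches simply get shorter BEFORE/AFTER parts. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SurroundingText {
    // Assumption: an ID is a standalone four-digit number
    private static final Pattern ID = Pattern.compile("\\b\\d{4}\\b");

    // Returns {before, id, after} for each ID, with up to `width` chars of context
    static List<String[]> withContext(String text, int width) {
        List<String[]> result = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            int from = Math.max(0, m.start() - width);          // clamp at start of input
            int to = Math.min(text.length(), m.end() + width);  // clamp at end of input
            result.add(new String[] {
                text.substring(from, m.start()),
                m.group(),
                text.substring(m.end(), to)
            });
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "I have a 1234 and a 2345 and maybe a 3456 id.";
        for (String[] r : withContext(text, 12)) {
            System.out.println("[" + r[0] + "] " + r[1] + " [" + r[2] + "]");
        }
    }
}
```

Because the context is cut from the original string rather than consumed by the regex, neighbouring matches can share context, as in the expected output from the question.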
You can do it in a single regex:
Pattern regex = Pattern.compile("(?<=^.{0,10000}?(.{0,12}))(\\d+)(?=(.{0,12}))");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
String before = regexMatcher.group(1); // up to 12 chars before the number
String match = regexMatcher.group(2); // the number itself
String after = regexMatcher.group(3); // up to 12 chars after the number
}
Explanation:
(?<= # Assert that the following can be matched before current position
^.{0,10000}? # Match as few characters as possible from the start of the string
(.{0,12}) # Match and capture up to 12 chars in group 1
) # End of lookbehind
(\d+) # Match and capture in group 2: Any number
(?= # Assert that the following can be matched here:
(.{0,12}) # Match and capture up to 12 chars in group 3
) # End of lookahead
You don't need a lookbehind or an atomic group for this, but you do need a lookahead:
(.{0,12}?)\b(\d+)\b(?=(.{0,12}))
I'm assuming your IDs are not embedded in longer words (hence the \b). I used a reluctant quantifier in the leading portion ({0,12}?) to prevent it from consuming more than one ID when they're spaced close to each other, as in:
I have a 1234, 2345 and 1456 id.
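A sketch that runs this pattern and a greedy variant over the example above, collecting group 2. With a greedy {0,12} the leading group swallows ", 2345 and " and 2345 is skipped; the reluctant version finds all three IDs. The greedy variant and the class and method names are added here for comparison only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantContext {
    static final Pattern RELUCTANT = Pattern.compile("(.{0,12}?)\\b(\\d+)\\b(?=(.{0,12}))");
    // Same pattern with a greedy leading group, for comparison
    static final Pattern GREEDY = Pattern.compile("(.{0,12})\\b(\\d+)\\b(?=(.{0,12}))");

    // Collects the ID (group 2) of every match
    static List<String> ids(Pattern p, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "I have a 1234, 2345 and 1456 id.";
        System.out.println(ids(RELUCTANT, text));  // [1234, 2345, 1456]
        System.out.println(ids(GREEDY, text));     // [1234, 1456]
    }
}
```

Because the leading group is consumed, the BEFORE text of one match cannot overlap the previous match; the lookahead part, being zero-width, can.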
