Regex to match the nearest character backwards - java

I have this string:
P.1 P.2 P.3 P.4
ASTON VETERINARY HOSPITAL
Page 1/2
00 PennelJ Road
Media, PA 19063-5983
(610) 474-5670
Client :
I want to get the text between P.\d and Client. Here is my attempt:
(P.\d)[\s\S]*(?=^.+Client :?)
The problem is that it matches from the first page marker, P.1. I need the nearest P.\d before Client.
How can I change the regex so that it matches from P.4?

I tried this with non-greedy operators, but that's not going to work. I'd move away from having the entire regex match precisely what you want, and use groups instead. Then you can write a pattern that matches any number of those P.1 constructs, and your scan for the Client string at the end becomes a lot simpler because you don't have to do it as a lookahead. Thus:
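import java.util.regex.Matcher;
import java.util.regex.Pattern;
String x = "P.1 P.2 P.3 P.4 foobar Client :";
// The greedy .* inside the repeated group runs on to the last P.\d; "result" then holds what follows it.
Pattern p = Pattern.compile("((P\\.\\d)(.*(P\\.\\d))*)+(?<result>.*)Client");
Matcher m = p.matcher(x);
System.out.println(m.find());          // prints true
System.out.println(m.group("result")); // prints " foobar "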
This seems to produce precisely what you want. The syntax (?<whatever>REGEX HERE) is regular-expression-ese for "let me grab just this bit later by asking for the group 'whatever'". Named groups like this require Java 7 or later.
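The answer's test string is a single line, but the text in the question spans several lines, and by default . does not match line terminators. A minimal sketch, assuming the question's multi-line input: compiling the same pattern with Pattern.DOTALL lets it reach across lines. The class and method names (NearestPageMarker, extract) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NearestPageMarker {
    // The answer's pattern; DOTALL lets '.' also match line breaks in the multi-line input.
    private static final Pattern PATTERN = Pattern.compile(
            "((P\\.\\d)(.*(P\\.\\d))*)+(?<result>.*)Client", Pattern.DOTALL);

    // Returns the text between the last "P.<digit>" marker and "Client", or null if absent.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group("result") : null;
    }

    public static void main(String[] args) {
        String input = "P.1 P.2 P.3 P.4\n"
                + "ASTON VETERINARY HOSPITAL\n"
                + "Page 1/2\n"
                + "Client :";
        System.out.println(extract(input));
    }
}
```

The greedy .* inside the repeated group backtracks from the end of the input to the last place where P\.\d matches, which is why the captured text starts right after P.4.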

Related

Java Regex Look-Behind Doesn't Work

I am working on a regex for matching phone numbers, and this is the result:
(?:(?:0{2}|\+)?([1-9][0-9]))? ?([1-9][0-9])? ?([1-9][0-9]{5})
As you can see, there are spaces between the numbers. I want them to appear only when there is some other number before the space, so:
"0022 45 432345" - should match
"45 345678" or "560032" - should match
" 324400" - shouldn't match because of the space in the beginning
I've been reading different tutorials about regexes and found out about look-behinds, but a simple construction like this (just for a test):
Pattern p2 = Pattern.compile("(?<=abc)aa");
Matcher m2 = p2.matcher("abcaa");
doesn't work.
Can you tell me what's wrong?
Another problem: I want a character to occur only when it is THE FIRST character in the string; otherwise it shouldn't occur. For example:
0043 022 234567 should not match, but 022 123450 should.
I'm stuck right now and would appreciate any help a lot.
This should work just fine. The spaces are moved into the optional groups and are themselves optional. This way, they only match if the group before them is present, but even then they are still optional. No look-behind required.
(?:(?:(?:00|\+)?([1-9][0-9]) ?)?([1-9][0-9]) ?)?([1-9][0-9]{5})
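A quick way to check the answer's pattern against the examples from the question is a small harness. This sketch assumes the whole input is validated with matches(), which is what makes the leading space in " 324400" fail; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PhoneRegexCheck {
    // The answer's pattern: each optional group carries its own trailing optional space.
    static final Pattern PHONE = Pattern.compile(
            "(?:(?:(?:00|\\+)?([1-9][0-9]) ?)?([1-9][0-9]) ?)?([1-9][0-9]{5})");

    // matches() requires the entire input to match, so a leading space is rejected.
    static boolean isValid(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0022 45 432345", "45 345678", "560032", " 324400"};
        for (String s : samples) {
            // prints true, true, true, false for these four samples
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

With find() instead of matches(), " 324400" would still produce a match (on "324400"), so the choice of method matters here.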
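Lookbehind is a zero-length match: (?<=abc)aa consumes only the "aa", never the "abc".
According to the javadoc, the Matcher.matches method determines whether the whole string matches the pattern, so matching "abcaa" fails because nothing in the pattern consumes the leading "abc".
What you're looking for is something like the Matcher.find and Matcher.group methods: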
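import java.util.regex.Matcher;
import java.util.regex.Pattern;
final Pattern pattern = Pattern.compile("(?<=abc)aa");
final Matcher matcher = pattern.matcher("abcaa");
final String subMatch;
// find() searches for the pattern anywhere in the input instead of requiring a whole-string match.
if (matcher.find()) {
subMatch = matcher.group();
} else {
subMatch = "";
}
System.out.println(subMatch); // prints "aa"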
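To see the difference between the two methods side by side, here is a minimal sketch using the question's test pattern; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    static final Pattern P = Pattern.compile("(?<=abc)aa");

    // True only if the entire input matches; the lookbehind itself consumes nothing.
    static boolean wholeInputMatches(String input) {
        return P.matcher(input).matches();
    }

    // Start index of the first "aa" that is preceded by "abc", or -1 if there is none.
    static int firstMatchIndex(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("abcaa")); // false
        System.out.println(firstMatchIndex("abcaa"));   // 3
    }
}
```

The look-behind works in both cases; only matches() adds the extra requirement that the pattern cover the whole string.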

Capture Regex repeating string between slashes in URL

I have the following partial URL:
/it/xyz/test/param+1/param-2/1234/gfd4
Basically: two letters at the beginning, a slash, another unknown string, and then a series of repeatable strings between slashes.
I need to capture every string (I know a split on the / delimiter would be fine, but I am interested in how to extract them with a regex). I came up with this first:
^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)
but it only captures
group1: it
group2: xyz
group3: /test
and of course it ignores the rest of the string.
If I add a * at the end, it only captures the last segment:
^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)*
group1: it
group2: xyz
group3: /gfd4
So I am obviously missing some fundamentals; in addition to the proper regex, I would like an explanation.
I tagged this as Java because the engine that parses the regex is JDK 7's. As far as I know, each engine may have its differences.
As mentioned here, this is expected:
With one group in the pattern, you can only get one exact result in that group.
If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.
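A minimal sketch of that behavior with the question's pattern, written as a Java string literal (the class and method names are hypothetical): group 3 matches once per segment, but after the match it only reports the last one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The question's pattern with * on group 3; '/' needs no escape in a Java regex.
    static final Pattern URL = Pattern.compile(
            "^/([a-zA-Z]{2})/([a-zA-Z]{1,10})(/[a-zA-Z1-9+\\-]+)*");

    // Returns what group 3 holds after the match: only its last repetition.
    static String lastRepetition(String url) {
        Matcher m = URL.matcher(url);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // prints "/gfd4", not every segment
        System.out.println(lastRepetition("/it/xyz/test/param+1/param-2/1234/gfd4"));
    }
}
```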
I would rather capture the rest of the string in group 3 (with (\/.*$)), then split it around '/'. Or apply that pattern to the rest of the string:
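import java.util.regex.Matcher;
import java.util.regex.Pattern;
// '/' needs no escaping in a Java regex, and "\/" is not a legal escape in a Java string literal.
Pattern p = Pattern.compile("(/[a-zA-Z1-9+\\-]+)");
Matcher m = p.matcher(str);
while (m.find()) {
String place = m.group(1);
// ...
}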
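Putting the two steps together, here is a minimal sketch: one pattern captures the first two parts and keeps the rest in group 3, then find() walks the repeated segments. The class and method names are hypothetical, and the segment character class is copied from the question (note that it excludes 0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlSegments {
    // Two-letter code, a second segment, then everything else (starting with '/') in group 3.
    static final Pattern HEAD = Pattern.compile("^/([a-zA-Z]{2})/([a-zA-Z]{1,10})(/.*)$");
    // One repeated segment, using the character class from the question.
    static final Pattern SEGMENT = Pattern.compile("/([a-zA-Z1-9+\\-]+)");

    // Returns every captured part in order, or an empty list if the URL does not fit.
    static List<String> parts(String url) {
        List<String> result = new ArrayList<>();
        Matcher head = HEAD.matcher(url);
        if (!head.matches()) {
            return result;
        }
        result.add(head.group(1));
        result.add(head.group(2));
        // Each find() call picks up the next segment, so no repetition is lost.
        Matcher seg = SEGMENT.matcher(head.group(3));
        while (seg.find()) {
            result.add(seg.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [it, xyz, test, param+1, param-2, 1234, gfd4]
        System.out.println(parts("/it/xyz/test/param+1/param-2/1234/gfd4"));
    }
}
```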

Java reduce dynamic regex pattern to eliminate duplicated qualifiers

Yes, I know: another regex question. MEH! Well, this is kind of regex, but more pattern recognition that drives regex generation...
Anyway, I'm working on a brain teaser and need to convert a binary representation of a string of characters to some other string representation, i.e. 0 = A|AA|AAA|AAAA+ and 1 = A|AA|AAA|AAAA+|B|BB|BBB|BBBB+. For example, does 1010101010 == AAAAABBBBAAAA? The input file is rather large.
My solution was to create regex on the fly using the pattern A+ for 0 and (A+|B+) for 1.
The issue is that the input can be pretty large (the binary representation is up to 150 chars and the AB notation up to 1000 chars), so as I iterate over it I end up with a large regex pattern that does not perform quickly enough for my needs (it needs to match a character string of up to 1000 characters in less than 10 seconds).
To speed up the solution, I wanted to reduce the size of the generated regex, so for the binary input 1010101010 I want the regex to be (A+(A+|B+))+ instead of my generated A+(A+|B+)A+(A+|B+)A+(A+|B+)A+(A+|B+)A+(A+|B+)
My thought was that I could detect the repeating pattern and reduce it to just the first sequence that is repeated and then generate the regex string off of that.
Any thoughts?
Instead of trying to make one big pattern that matches the whole file, you could use partial matching: loop with matcher.find() to iterate through the individual matches of A+B+.
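import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("A+B+");
Matcher matcher = pattern.matcher(input); // input: the AB string being checked
while (matcher.find())
{
String part = matcher.group(); // this is the matched part
}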
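The question's own idea, detecting the repeating unit and emitting it only once, can be sketched as follows. This is a hypothetical helper (BinaryToRegex, toRegex) that assumes the mapping stated in the question (0 -> A+, 1 -> (A+|B+)), so for 1010101010 it emits (?:(?:A+|B+)A+){5}. It uses an exact {count} quantifier rather than +, because + would also accept any other number of repetitions. It shortens the pattern text, but it does not by itself reduce the backtracking the engine may do, so it is not guaranteed to meet the 10-second target.

```java
import java.util.regex.Pattern;

public class BinaryToRegex {
    // Mapping taken from the question: 0 -> A+, 1 -> (A+|B+), with non-capturing groups.
    static String bitPattern(char bit) {
        return bit == '0' ? "A+" : "(?:A+|B+)";
    }

    // Finds the smallest unit that, repeated, rebuilds the whole bit string,
    // and emits that unit once with an exact {count} quantifier.
    static String toRegex(String bits) {
        int n = bits.length();
        for (int period = 1; period <= n; period++) {
            if (n % period != 0) {
                continue;
            }
            boolean repeats = true;
            for (int i = period; i < n && repeats; i++) {
                repeats = bits.charAt(i) == bits.charAt(i % period);
            }
            if (repeats) {
                StringBuilder unit = new StringBuilder();
                for (int i = 0; i < period; i++) {
                    unit.append(bitPattern(bits.charAt(i)));
                }
                int count = n / period;
                return count == 1 ? unit.toString() : "(?:" + unit + "){" + count + "}";
            }
        }
        return ""; // only reached for an empty input
    }

    public static void main(String[] args) {
        String regex = toRegex("1010101010");
        System.out.println(regex);                                   // (?:(?:A+|B+)A+){5}
        System.out.println(Pattern.matches(regex, "BABABABABA"));    // true
    }
}
```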

Java Regex to check "=number", ex "=5455"?

I want to check a string that matches the format "=number", ex "=5455".
As long as the first char is "=" and the rest is any number made of [0-9] digits (a dot is not allowed), it should pop up a "correct" message.
if(str.matches("^[=][0-9]+")){
Window.alert("correct");
}
So, is this ^[=][0-9]+ the correct one?
If it is not correct, can you provide a correct solution?
If it is correct, can you find a better solution?
I'm no big regex expert, and people more knowledgeable than me might correct this answer, but:
I don't think there's a point in using [=] rather than simply =. The [...] block declares a set of alternative characters, so why declare a set of just one character?
I don't think you need ^: if your input string contains any character before =, it won't match anyway. I'm unsure whether its presence makes your regex faster, slower, or has no effect.
In conclusion, I'd use =[0-9]+
That should be correct: it looks for an = sign anchored at the beginning, followed by one or more digits 0-9.
Your regex will work, even though it can be simplified:
.matches() is anchored at both ends: it succeeds only if the whole input matches the regex, so the beginning-of-input anchor is not needed;
you don't need the character class around the =.
Therefore:
if (str.matches("=[0-9]+")) { ... }
If you want to match a string that merely begins with that regex, you have to use a Pattern, a Matcher and .find():
final Pattern p = Pattern.compile("^=[0-9]+");
final Matcher m = p.matcher(str);
if (m.find()) { ... }
And finally, Matcher also has .lookingAt() which anchors the regex only at the beginning of the input.
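The three Matcher methods differ only in how much of the input must match. A minimal sketch comparing them on a few sample strings (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class EqualsNumberCheck {
    static final Pattern P = Pattern.compile("=[0-9]+");

    // The whole input must be "=" followed by digits (same as str.matches("=[0-9]+")).
    static boolean whole(String s) {
        return P.matcher(s).matches();
    }

    // The input must start with "=digits"; anything may follow.
    static boolean prefix(String s) {
        return P.matcher(s).lookingAt();
    }

    // "=digits" may appear anywhere in the input.
    static boolean anywhere(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"=5455", "=5455abc", "x=5455", "=54.55"};
        for (String s : samples) {
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
    }
}
```

Only matches() rejects "=54.55"; lookingAt() and find() both accept it because "=54" is a valid prefix.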

Regular expression to match this query

I have a sentence like this:
Well, {hero}Superman X. 123 Sr.{/hero}, the most lovable guy was hated by {lover}Louis{/lover}.
I am using a Java regular expression like this (which, of course, is not working):
Pattern search = Pattern.compile("\\}.*\\{/");
It actually gives me this output:
}Superman X. 123 Sr.{/hero}, the most lovable guy was hated by {lover}Louis{/
When actually I want "Superman X. 123 Sr." and then "Louis". How can this be achieved without running a while loop and incrementing the index? I could try that, but I wanted to know if there is an easier way that I am missing.
There may be a better regex, but this one, (\{\w+\})([\w\.\s]+)(\{/\w+\}), does the job:
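import java.util.regex.Matcher;
import java.util.regex.Pattern;
String test = "Well, {hero}Superman X. 123 Sr.{/hero}, the most lovable guy"+
" was hated by {lover}Louis{/lover}.";
// Group 2 allows only word characters, dots and whitespace, so it cannot run past a closing tag.
Pattern p = Pattern.compile("(\\{\\w+\\})([\\w\\.\\s]+)(\\{/\\w+\\})");
Matcher m = p.matcher(test);
while(m.find()){
System.out.println(m.group(2)); // prints "Superman X. 123 Sr." then "Louis"
}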
That is because quantifiers are greedy by default. You want a lazy quantifier, so try .*? instead of just .*.
Also, you might want to capture the tag itself:
Pattern.compile("\\{([^}]+)\\}(.*?)\\{/\\1\\}");
In a Java regex the backreference is written \1, which becomes "\\1" inside a string literal (a single \1 there would be an octal character escape, not a backreference). You should end up with the tag name in the first captured group (hero or lover in this case) and the enclosed text in the second.
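A minimal sketch of the lazy quantifier plus backreference approach on the sentence from the question; the class and method names are hypothetical, and the "tag=text" output format is only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagContents {
    // Group 1: tag name; group 2: enclosed text (lazy); \1 requires the matching closing tag.
    static final Pattern TAG = Pattern.compile("\\{([^}]+)\\}(.*?)\\{/\\1\\}");

    // Collects "tag=text" for every tagged span, scanning left to right with find().
    static List<String> contents(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "Well, {hero}Superman X. 123 Sr.{/hero}, the most lovable guy"
                + " was hated by {lover}Louis{/lover}.";
        // prints [hero=Superman X. 123 Sr., lover=Louis]
        System.out.println(contents(test));
    }
}
```

Because .*? stops at the first closing tag whose name matches group 1, the two spans are returned separately instead of being merged into one greedy match.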
