Java regex matching in between two patterns [duplicate] - java

This question already has answers here:
Regex Match all characters between two strings
(16 answers)
Closed 3 years ago.
I've a url like
https://example.com/helloworld/#.id==imhere
or
https://example.com/helloworld/#.id==imnothere?param1=value1
I want to extract the value imhere or imnothere from these URLs.
Pattern.compile("(?<=helloworld\\/#\\.id==).*(?=\\?)");
The problem is that when there is no ? in the URL (the first case), the pattern does not match.
Can someone help me to fix this?
Sorry, my mistake: I missed the #.id part in the URL.

This expression should do it:
^.*#==(.*?)(?:\?.*)?$
regex101 demo
It searches for #== and grabs everything after this string, up to a ?, if any. The trick is the lazy quantifier *?.
The actual match is in group one. Translated to Java, a sample application would look like this:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Sample {
    private static final String PATTERN_TEMPLATE = "^.*#==(.*?)(?:\\?.*)?$";
    public static void main (final String... args) {
        final Pattern pattern = Pattern.compile(PATTERN_TEMPLATE);
        final String firstTest = "https://example.com/helloworld/.#==imhere";
        final Matcher firstMatcher = pattern.matcher(firstTest);
        if (firstMatcher.matches()) {
            System.out.println(firstMatcher.group(1));
        }
        final String secondTest =
            "https://example.com/helloworld/.#==imnothere?param1=value1";
        final Matcher secondMatcher = pattern.matcher(secondTest);
        if (secondMatcher.matches()) {
            System.out.println(secondMatcher.group(1));
        }
    }
}
Ideone demo
If one wants the regex to also validate that helloworld/. is present, one can simply extend the regular expression:
^.*helloworld\/\.#==(.*?)(?:\?.*)?$
regex101 demo
But one should be careful when translating this expression to Java. The backslashes have to be escaped.
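As a rough illustration of the escaping point, here is a minimal sketch of the extended expression written as a Java string literal. The class and method names (EscapedPatternSketch, extract) are made up for this example. Each regex backslash is doubled, and the \/ from the regex101 form can become a plain /, since / is not special in Java regular expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedPatternSketch {
    // regex101 form: ^.*helloworld\/\.#==(.*?)(?:\?.*)?$
    // Java literal: every regex backslash is written as two backslashes.
    static final Pattern PATTERN =
            Pattern.compile("^.*helloworld/\\.#==(.*?)(?:\\?.*)?$");

    // Returns the captured value, or null when the URL does not match.
    static String extract(String url) {
        Matcher m = PATTERN.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("https://example.com/helloworld/.#==imhere"));
        System.out.println(extract("https://example.com/helloworld/.#==imnothere?param1=value1"));
        // No "helloworld/." before the marker, so this one is rejected.
        System.out.println(extract("https://example.com/other/.#==imhere"));
    }
}
```

The sample URLs follow the .#== form used in this answer, not the #.id== form of the edited question.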

I would not use a regular expression for this. It’s a heavyweight solution for a simple problem.
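Use the URI class to extract the fragment (everything after the #, which in these URLs includes the ? and what follows it), then take the text after ==, up to the ?, if any:
String s = "https://example.com/helloworld/#.id==imnothere?param1=value1";
URI uri = URI.create(s);
// getPath() returns only "/helloworld/" here; the id is part of the fragment
String fragment = uri.getFragment();
String id = fragment.substring(fragment.indexOf("==") + 2);
int queryIndex = id.indexOf('?');
if (queryIndex >= 0) id = id.substring(0, queryIndex);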

To match after a specific phrase up to the next whitespace, question mark, or end of string, the Java regex is this :
"(?<=helloworld/\\.#==).*?(?![^\\s?])"
https://regex101.com/r/9yraPt/1
If the match may span lines, add (?s) to the beginning.
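A minimal sketch of how this lookaround approach might be used from Java. The lookbehind literal is adapted here to the #.id== form of the URLs in the edited question, which is an assumption on my part, and the names LookaroundSketch and extract are hypothetical. Because the lookarounds consume nothing, the whole match is the value and no capture group is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSketch {
    // The lookbehind pins the start right after the marker; the negative
    // lookahead stops the lazy .*? before whitespace, '?', or end of input.
    static final Pattern ID =
            Pattern.compile("(?<=helloworld/#\\.id==).*?(?![^\\s?])");

    // Returns the matched value, or null when the marker is not found.
    static String extract(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("https://example.com/helloworld/#.id==imhere"));
        System.out.println(extract("https://example.com/helloworld/#.id==imnothere?param1=value1"));
    }
}
```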

Related

A bug in a regex in JDK 8?

I have this reference working Perl script with a regex, copied from a Java snippet that isn't giving the expected results:
my $regex = '^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$';
if ("A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A" =~ /$regex/)
{
print "Matches 1=$1 2=$2 3=$3 4=$4\n";
}
This correctly outputs:
Matches 1=PROD 2=COMP 3=LOGL 4=00000000-0000-8033-0000-000200354F0A
Now the equivalent Java snippet:
private static final String NON_SYSTEM_TYPE_REGEX = "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$";
private static final Pattern NON_SYSTEM_TYPE_PATTERN = Pattern.compile(MutableUniqueIdentity.NON_SYSTEM_TYPE_REGEX);
...
final Matcher match = MutableUniqueIdentity.NON_SYSTEM_TYPE_PATTERN.matcher(uniqueIdentity);
The uniqueIdentity input is further back in the stack trace (in a unit test) and is this value:
final String id5CompactString = "A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A";
NOTE: The regex and uniqueIdentity values were copied to the Perl program from a debug session to assert if a different language comes up with a different result (which it did).
ADDITIONAL NOTE: The reason the non-capture group is there is to allow the third element in the string to be optional, so it has to deal with both of these:
A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A
A-PROD-COMP-00000000-0000-8033-0000-000200354F0A
My unit test fails in Java - the third match group, which should be LOGL, is in fact 0000.
Here is a screenshot of the debugger right after the regex match line above:
You can see that the pattern matches, you can verify that the input parameter (text) and regex are the same as the Perl script, but the result is different!
So my question is: why does match.group(3) have a value of 0000 (when it should have the value LOGL), and how does that relate back to the regex and the string it is applied to?
In Perl it yields the correct result - LOGL.
Additional info: I have perused this page that highlights the differences between Perl and Java regex engines, and there doesn't appear to be anything applicable.
Replace your regex with the following regex:
^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})-(?:([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$
This has been moved out----------^
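I have moved - out of the non-capturing group. Note that with this change the pattern always requires two hyphens between the second element and the final hexadecimal part, so the shorter form A-PROD-COMP-00000000-0000-8033-0000-000200354F0A no longer matches.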
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        final String NON_SYSTEM_TYPE_REGEX = "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})-(?:([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$";
        final Pattern NON_SYSTEM_TYPE_PATTERN = Pattern.compile(NON_SYSTEM_TYPE_REGEX);
        String uniqueIdentity = "A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A";
        final Matcher match = NON_SYSTEM_TYPE_PATTERN.matcher(uniqueIdentity);
        if (match.find()) {
            System.out.printf("Matches 1=%s 2=%s 3=%s 4=%s%n", match.group(1), match.group(2), match.group(3),
                    match.group(4));
        }
    }
}
Output:
Matches 1=PROD 2=COMP 3=LOGL 4=00000000-0000-8033-0000-000200354F0A
Check the demo at regex101 as well.
Ok I've made it work, but I don't understand why.
The regex needs to be made non-greedy, so instead of:
^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$
it needs to be:
^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*?-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$
(with the extra ? after the * of the non-capture group)
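A possible explanation, inferred from the observed output and not checked against the JDK sources: with the greedy *, the engine first takes -LOGL and then tries one more iteration, which succeeds on -0000 (the first four characters of 00000000). The following - then fails, so the engine gives that iteration back, and the reported 0000 suggests group 3 was not reset at that point. The lazy *? tries the fewest iterations first, so the extra iteration is never attempted. The sketch below only exercises the lazy pattern on both input forms; the names LazyQuantifierSketch and describe are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyQuantifierSketch {
    // Same expression as above, with the lazy *? on the optional group.
    static final Pattern LAZY = Pattern.compile(
            "^[AT]-([A-Z0-9]{4})-([A-Z0-9]{4})(?:-([A-Z0-9]{4}))*?"
            + "-([A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12})$");

    // Formats the four groups like the Perl script; an absent group prints as null.
    static String describe(String id) {
        Matcher m = LAZY.matcher(id);
        if (!m.matches()) {
            return "no match";
        }
        return String.format("1=%s 2=%s 3=%s 4=%s",
                m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        System.out.println(describe("A-PROD-COMP-LOGL-00000000-0000-8033-0000-000200354F0A"));
        System.out.println(describe("A-PROD-COMP-00000000-0000-8033-0000-000200354F0A"));
    }
}
```

For the shorter identifier, group 3 does not participate in the match, so callers would need to handle null.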

How to remove everything after last hyphen using regex java [duplicate]

This question already has answers here:
Remove string after last occurrence of a character
(5 answers)
Closed 3 years ago.
I want to remove everything after the last hyphen in
branch:resource-fix-c95e12f
I have tried the following call:
replaceAll(".*[^/]*$","");
The expected result is branch:resource-fix
Instead of a regex, could you use String::lastIndexOf?
str.substring(0, str.lastIndexOf('-'))
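One caveat with the lastIndexOf approach: when the input contains no hyphen, lastIndexOf returns -1 and substring(0, -1) throws a StringIndexOutOfBoundsException. A minimal sketch with a guard follows; the class and method names are hypothetical, and returning the input unchanged is just one reasonable choice for that case.

```java
class LastHyphenSketch {
    // Returns the text before the last '-', or the input unchanged
    // when it contains no hyphen at all.
    static String beforeLastHyphen(String s) {
        int idx = s.lastIndexOf('-');
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(beforeLastHyphen("branch:resource-fix-c95e12f"));
        System.out.println(beforeLastHyphen("branch"));
    }
}
```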
You can use this regex:
-[^-]+$
Example: https://regex101.com/r/dHSNNN/1
You can use the following pattern
-[^-]*$
Demo
This expression might simply work:
-[^-]*$
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        final String regex = "-[^-]*$";
        final String string = "branch:resource-fix-c95e12f";
        final String subst = "";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        final String result = matcher.replaceAll(subst);
        System.out.println(result);
    }
}
Another option would be ^(.*)-.*$ replaced with $1 (in a Java replacement string, groups are referenced as $1, not \\1).
Demo 2
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.
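A short sketch of the capture-group option in Java, next to the -[^-]*$ form for comparison. In Java's replaceFirst and replaceAll, the replacement string refers to groups with $1; a backslash there only escapes the next character. The class and method names are made up for the example.

```java
class BackreferenceReplaceSketch {
    // The greedy (.*) runs up to the last '-'; "$1" puts group 1 back.
    static String keepBeforeLastHyphen(String s) {
        return s.replaceFirst("^(.*)-.*$", "$1");
    }

    // Equivalent result by deleting the last hyphen and everything after it.
    static String dropLastSegment(String s) {
        return s.replaceAll("-[^-]*$", "");
    }

    public static void main(String[] args) {
        String input = "branch:resource-fix-c95e12f";
        System.out.println(keepBeforeLastHyphen(input));
        System.out.println(dropLastSegment(input));
    }
}
```

Both methods leave an input without any hyphen unchanged, because neither pattern matches it.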

Using or '|' in regex [duplicate]

This question already has answers here:
Difference between matches() and find() in Java Regex
(5 answers)
Closed 5 years ago.
I am stuck on a simple issue: I want to check whether any of the words he, be, de is present in my text.
So I created the pattern (shown in the code) using '|' to mean OR
and then matched it against my text. But the match gives me false (in the print statement).
I tried the same match in Notepad++ using regex search and it worked there, but it gives false (no match) in Java.
public class Del {
    public static void main(String[] args) {
        String pattern="he|be|de";
        String text= "he is ";
        System.out.println(text.matches(pattern));
    }
}
Can anyone suggest what I am doing wrong?
Thanks
It's because you are trying to match against the entire string instead of the part to find. For example, this code will find that only a part of the string is conforming to the present regex:
Matcher m = Pattern.compile("he|be|de").matcher("he is ");
m.find(); //true
When you want to match an entire string and check whether that string contains he, be, or de, use the regex .*(he|be|de).*
. matches any character, and * means the preceding element may occur zero or more times.
Example:
"he is ".matches(".*(he|be|de).*"); //true
String regExp="he|be|de";
Pattern pattern = Pattern.compile(regExp);
String text = "he is ";
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.find());
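To put the two methods side by side, here is a small sketch contrasting matches() (whole input) with find() (any part of the input). Since the question asks about words, it also shows a variant with \b word boundaries, which is my own addition and not part of the answers above: a plain he|be|de would also be found inside a word such as "the". All names are hypothetical.

```java
import java.util.regex.Pattern;

class WordPresenceSketch {
    static final Pattern ANYWHERE = Pattern.compile("he|be|de");
    // \b requires a word boundary on both sides of the alternative.
    static final Pattern WHOLE_WORD = Pattern.compile("\\b(?:he|be|de)\\b");

    static boolean containsSubstring(String text) {
        return ANYWHERE.matcher(text).find();
    }

    static boolean containsWord(String text) {
        return WHOLE_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("he is ".matches("he|be|de")); // whole string must match
        System.out.println(containsSubstring("he is "));
        System.out.println(containsSubstring("the cat")); // "he" inside "the"
        System.out.println(containsWord("the cat"));
        System.out.println(containsWord("he is "));
    }
}
```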

How to split a binary string into groups that contain only ones or zeros with Java regular expressions? [duplicate]

This question already has an answer here:
Split regex to extract Strings of contiguous characters
(1 answer)
Closed 7 years ago.
I'm new to using regular expressions, but I think that in an instance like this using them would be the quickest and most elegant way. I have a binary string, and I need to split it into groups that only contain consecutive zeros or ones, for example:
110001
would be split into
11
000
1
I just can't figure it out, this is my current code, thanks:
class Solution {
    public static void main(String args[]) {
        String binary = Integer.toBinaryString(67);
        String[] exploded = binary.split("0+| 1+");
        for(String string : exploded) {
            System.out.println(string);
        }
    }
}
Try
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Solution {
    public static void main(String[] args) {
        String binary = Integer.toBinaryString(67);
        System.out.println(binary);
        Pattern pattern = Pattern.compile("0+|1+");
        Matcher matcher = pattern.matcher(binary);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
Rather than split, you can match with this regex and use captured group #1 for your matches:
(([01])\2*)
RegEx Demo
If you want to use split, use (?<=0)(?=1)|(?<=1)(?=0).
Not sure what you want to do with all zeros or all ones, though.
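The two suggestions above, the (([01])\2*) match and the lookaround split, could be sketched in Java roughly as follows. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunSplitSketch {
    // Group 2 captures one digit and \2* repeats that same digit,
    // so group 1 is one full run of identical characters.
    static List<String> runsByFind(String bits) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("(([01])\\2*)").matcher(bits);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    // Splits at the zero-width positions between a 0 and a 1, in either order.
    static List<String> runsBySplit(String bits) {
        return Arrays.asList(bits.split("(?<=0)(?=1)|(?<=1)(?=0)"));
    }

    public static void main(String[] args) {
        System.out.println(runsByFind("110001"));
        System.out.println(runsBySplit("110001"));
    }
}
```

For an input that is all zeros or all ones, both variants return the whole string as a single run.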
The method split requires a pattern to describe the separator. To achieve your goal you have to describe a location (you don’t want to consume characters at the split position) between the groups:
public static void main(String args[]) {
    String binary = Integer.toBinaryString(67);
    String[] exploded = binary.split("(?<=([01]))(?!\\1)");
    for(String string : exploded) {
        System.out.println(string);
    }
}
(?<=([01])) describes via “look-behind” that before the splitting position, there must be either 1 or 0, and captures the character in a group. (?!\\1) specifies via “negative look-ahead” that the character after the split position must be different than the character found before the position.
Which is exactly what is needed to split into groups of the same character. You could replace [01] with . here to make it a general solution for splitting into groups having the same character, regardless of which one.
The reason it's not working is because of the nature of the split method. The found pattern will not be included in the array. You would need to use a regex search instead.
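To see concretely what "the found pattern will not be included" means, here is a small sketch that prints what split returns for the pattern from the question and for the same pattern without the stray space. The helper name pieces is hypothetical.

```java
import java.util.Arrays;

class SplitSeparatorSketch {
    // Shows the array that split produces for a given separator regex.
    static String pieces(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        String binary = Integer.toBinaryString(67); // "1000011"
        // " 1+" needs a leading space and never matches, so only the
        // zeros act as a separator and are dropped from the result.
        System.out.println(pieces(binary, "0+| 1+"));
        // Without the space every character belongs to a separator,
        // leaving only empty strings, which split discards.
        System.out.println(pieces(binary, "0+|1+"));
    }
}
```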

What is the regex for string with the format XYZ%20DEF.emx#ZMP_00234C3B7?XYZ%20DEF/ABC_AL12345?

I want to find the string of pattern
XYZ%20DEF.emx#ZMP_00234C3B7?XYZ%20DEF/ABC_AL12345?
inside another string.
Following are the rules for the regex
'.emx' is fixed in the same relative position.
'#' is fixed in the same relative position.
'/' is fixed in the same relative position.
all '?' are fixed in the same relative position.
The portion before '.emx' is url encoded so there will be %20 in them.
The portion before '.emx' repeats itself after the first '?'.
below is the string I have constructed based on the rules shown above.
SOMENAME_WITH_%20.emx#SOMETHING_WITH_NUMBERS_AND_ALPHABETS?SOMENAME_WITH_%20/SOME_OTHER_NAME_WITH_%20AL12345?
I have made an attempt at getting to the regex I need and below is how far I have got.
(\w+)\.emx#\w+\?\1\/\w+AL\d{5}\?
This regex finds the string if there are no '%20's in it. I don't know how to modify the above regex to look for %20 within the string.
The regex for finding %20 is
%[A-Fa-f0-9]{2}
but I don't know how to combine the 2 regexs to get what I want.
What is the correct regex for finding the string with the pattern shown above?
Based on the answer I chose below, I finally used the following regex (as a Java string):
([\\w%\\)\\(]+)\\.emx#\\w+\\?\\1\\/([\\w%\\)\\(?-]+)AL\\d{5}\\?
You can try with:
(\\w+%\\w+)\\.emx#[\\w%]+\\?\\1\\/[\\w%]+AL\\d{5}\\?
DEMO
Here is example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest{
    public static void main(String[] args){
        String string = "fregtreDFGDFG5464SOMENAME_WITH_%20.emx#SOMETHING_WITH_NUMBERS_" +
                "AND_ALPHABETS?SOMENAME_WITH_%20/SOME_OTHER_NAME_WITH_%20AL12345?" +
                "SDFSFDSFSDfdste5464565yGFDGdfgdfgdfgdfgTRy45y/retertre?retertreterERter" +
                "45345435XYZ%20DEF.emx#ZMP_00234C3B7?XYZ%20DEF/ABC_AL12345?%?54654DFGDfg?5656//56456";
        Pattern pattern = Pattern.compile("(\\w+%\\w+)\\.emx#[\\w%]+\\?\\1\\/[\\w%]+AL\\d{5}\\?");
        Matcher matcher = pattern.matcher(string);
        while(matcher.find()) {
            System.out.println(string.substring(matcher.start(), matcher.end()));
            System.out.println();
        }
    }
}
with result:
SOMENAME_WITH_%20.emx#SOMETHING_WITH_NUMBERS_AND_ALPHABETS?SOMENAME_WITH_%20/SOME_OTHER_NAME_WITH_%20AL12345?
XYZ%20DEF.emx#ZMP_00234C3B7?XYZ%20DEF/ABC_AL12345?
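The question also asks how to combine \w with %[A-Fa-f0-9]{2}. One way, sketched here as an alternative and not taken from the accepted answer, is to treat a name as a repetition of "either a word character or one percent-encoded byte". This accepts names with zero, one, or several encoded characters. The class, constant, and method names are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedNameSketch {
    // One name character: a word character or a %XX escape.
    private static final String NAME = "(?:\\w|%[A-Fa-f0-9]{2})+";

    // Group 1 is the name before .emx; \1 requires it again after the first '?'.
    static final Pattern PATTERN = Pattern.compile(
            "(" + NAME + ")\\.emx#\\w+\\?\\1/" + NAME + "AL\\d{5}\\?");

    // Returns the first occurrence in the text, or null when there is none.
    static String firstMatch(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(
                "see XYZ%20DEF.emx#ZMP_00234C3B7?XYZ%20DEF/ABC_AL12345? end"));
        System.out.println(firstMatch("A%20B%20C.emx#Q1?A%20B%20C/D%20AL00001?"));
    }
}
```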
