I need to configure a url matching pattern with inverse criteria. Something like this.
/endpoint/300[0-9]+ -> These should not match and everything else should match.
I am trying the negative look back with \/endpoint\/(?!^300)\d+, but is not quite working. Can somebody explain what I am missing?
Related
I am trying to create a regex in Java to match and get the name, version, channel and owner for each dependency but I haven't been able to have one that covers all the possible scenarios:
the structure is something like name/version#owner/channel, where the version might have a semver structure, the owner and channel are optional.
Currently, I have :
^(?<name>[\d\w][\d\w\+\.-]+)\/(?<version>[\d\w][\d\w\.-]+)(#(?<owner>\w+))?(\/(?<channel>.+))?$
but it's failing for boost_atomic/1.59.0+4#owner/release, since the +4 is not matched and I need the value before that -> 1.59.0
Some other scenarios that need to be valid and are valid for the regex above are:
Poco/1.9.0#pocoproject/stable
zlib/1.2.11#conan/stable
freetype/2.10.1/stable
openssl/1.0.2g/stable
openssl/1.0.2g
openssl/1.0.2g#owner
Also, there might be some dependencies with comments :
zlib/1.2.11#conan/stable # comment
In that case I would need to get rid of the component and only get the relevant information with the regex.
I am not sure if my current regex is good, but from what I've tested only some scenarios are missing
You can simplify your regex and avoid putting too many characters in that character set and escaping them, instead use something like [^\/] to capture anything except / as you want to capture anything preceding a slash.
I've made some modifications and the updated regex that should work for you is following,
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(#(?<owner>\w+))?(\/(?<channel>\S+))?(?:\s*#\s*(?<comment>.+))?$
I've added another named group for comment as you mentioned that can also be present. Let me know if this works for you.
Try this demo
Edit: If channel contains a text like release:132434 and anything followed by a colon is to be ignored as part of channel, you can use updated regex below,
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(?:#(?<owner>\w+))?(?:\/(?<channel>[^:\s]+)\S*)?(?:\s*#\s*(?<comment>.+))?\s*$
Updated Demo
Before y'all jump on me for posting something similar to previous questions asked, yes, there seem to be a number of regex related questions but nothing which seems to help me, or at least that I can see.
I am trying to parse strings in JAVA using PATTERN and MATCHER and am really having no joy. My regular expression seems to match my input string when I use a few of the online regular expression testing websites but Java simply does not match my expression.
My input string is:
"Big apple" title="Little Apple" type="Container" url="http://malcolm.com/testing"
The regular expression I am using to match is ".*" title="(.*)" type="Container" url="(.*)"
Essentially I want to pull out the text within the second and the fourth set of quotes. There will always be 4 sets of quotes with text within and around.
I am coding as follows:
Variable XMLSubstring contains the string above (including the quotes) and is as stated, even when I print it out.
Pattern p = Pattern.compile(".* title=\"(.*)\" type=\"Container\" url=\"(.*)\"");
m = p.matcher(XMLSubstring);
It doesn't appear to be rocket science I'm attempting but I'm pulling my hair out trying to debug the bloody thing.
Is there something wrong with my regex pattern?
Is there something wrong with the code I am using?
Am I simply a moron and should stop coding with immediate effect?
EDIT & UPDATE: I have found the problem. My string had a space at the end of it which was breaking the parser! How silly, and I think based on that, I need to accept the third suggestion of mine and give up programming. Thanks all for your assistance.
Try this,
String str="\"Big apple\" title=\"Little Apple\" type=\"Container\" url=\"http://malcolm.com/testing\"";
Pattern p=Pattern.compile(".* title=\\\".*\\\" type=\\\"Container\\\" url=\\\".*\\\"");
Matcher m=p.matcher(str);
I write a RegExp for validate a file URL
file:/{2,3}[a-zA-Z0-9]*\\|?/.*
for URL like
file:/D:/workspace/project/build/libs/myjar-1.0.jar
But doesnt work,
am looking for a pattern that matches only URL like this,no other.
Pattern Will return false URLs like
file:/workspace/project/build/libs/myjar-1.0.jar
and
file:/D:/workspace/project/build/libs/myjar-1.0
are will not match
please help
Complete question rewrite
Given the OP's updated criteria, the regex you're looking for is file\:\/\w\:\/[^\s]*\.jar; ensure you enabled g (global) and i (case-insensitive) modifiers.
See a working example on Regex101
If there is no context or rule your regex should be:
/file\:.*/i
See it here: http://regex101.com/r/yY5xG6
I'm currently having some issues with a regex to extract a URL.
I want my regex to take URLS such as:
http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000
Through some tutorials, I've learned that this regex will find all the above: ^(http|https)\://.*$ however, it will also take http://local:1000;http://invalid http://khttp://as a single string when it shouldn't take it at all.
I understand that my expression isn't written to exclude this, but my issue is I cannot think of how to write it so it checks for this scenario.
Any help is greatly appreciated!
Edit:
Looking at my issue, it seems that I could eliminate my issue as long as I can implement a check to make sure '//' doesn't occur in my string after the initial http:// or https://, any ideas on how to implement?
Sorry this will be done with Java
I also need to add the following constraint: a string such as http://local:80/test:90 fails because of the duplicate of port...aka I need to have a constraint that only allows two total : symbols in a valid string (one after http/s) and one before port.
This will only produce a match if if there is no :// after its first appearance in the string.
^https?:\/\/(?!.*:\/\/)\S+
Note that trying to parse a valid url from within a string is very complex, see
In search of the perfect URL validation regex, so the above does not attempt to do that.
It will just match the protocol and following non-space characters.
In Java
Pattern reg = Pattern.compile("^https?:\\/\\/(?!.*:\\/\\/)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("No match");
}
Check your programming language to see if it already has a parser. E.g. php has parse_url()
From http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
This may change based on the programming language/tool
/[A-Za-z]+://[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#/.=]+/g
I need to "grab" an attribute of a custom HTML tag. I know this sort of question has been asked many times before, but regex really messes with my head, and I can't seem to get it working.
A sample of XML that I need to work with is
<!-- <editable name="nameValue"> --> - content goes here - <!-- </editable> -->
I want to be able to grab the value of the name attribute, which in this case is nameValue. What I have is shown below but this returns a null value.
My regex string (for a Java app, hence the \ to escape the ") is:
"(.)?<!-- <editable name=(\".*\")?> -->.*<!-- </editable> -->(.)?"
I am trying to grab the attribute with quotation marks I figure this is the easiest and most general pattern to match. Well it just doesn't work, any help will help me keep my hair.
I don't think you need the (.)?s at the beginning and end of your regex. And you need to put in a capturing group for getting only the content-goes-here bit:
This worked for me:
String xml = "RANDOM STUFF<!-- <editable name=\"nameValue\"> --> - content goes here - <!-- </editable> -->RANDOM STUFF";
Pattern p = Pattern.compile("<!-- <editable name=(\".*\")?> -->(.*)<!-- </editable> -->");
Matcher m = p.matcher(xml);
if (m.find()) {
System.out.println(m.group(2));
} else {
System.out.println("no match found");
}
This prints:
- content goes here -
Your search is greedy. Use "\<\!-- \<editable name=\"(.*?)\"\> --\>.*?\<\!-- \<\/editable\> --\>" (added ?). Please note that this one will not work correctly with nested <editable> elements.
If you don't want to perform syntax checking, you could also simply go with: "\<\!-- \<editable name=\"(.*?)\"\> --\>" or even "\<editable name=\"(.*?)\"\>" for better simplicity and performance.
Edit: should be
Pattern re = Pattern.compile( "\\<editable name=\"(.*?)\"\\>" );
I use JavaScript, but it should help to make the expression non-greedy where possible and use not matches instead of any character matches. Not sure how similar regexps are with Java, but instead of using the expression \".*\" try using \"[^\"]*\". That will search for any character within the attribute value that isn't a quote, meaning the expression can't match beyond the attribute value.
Hope that helps
Regexes are fundamentally bad at parsing HTML (see Can you provide some examples of why it is hard to parse XML and HTML with a regex? for why). What you need is an HTML parser. See Can you provide an example of parsing HTML with your favorite parser? for examples using a variety of parsers.
You may find the answer using TagSoup helpful.