Regex to match conan dependency from conanfile.txt - java

I am trying to create a regex in Java that matches each dependency and extracts its name, version, channel and owner, but I haven't found one that covers all the possible scenarios.
The structure is roughly name/version#owner/channel, where the version may follow semver and the owner and channel are optional.
Currently, I have:
^(?<name>[\d\w][\d\w\+\.-]+)\/(?<version>[\d\w][\d\w\.-]+)(#(?<owner>\w+))?(\/(?<channel>.+))?$
but it's failing for boost_atomic/1.59.0+4#owner/release, since the +4 is not matched, and I need only the value before it (1.59.0).
Some other scenarios that need to be valid and are valid for the regex above are:
Poco/1.9.0#pocoproject/stable
zlib/1.2.11#conan/stable
freetype/2.10.1/stable
openssl/1.0.2g/stable
openssl/1.0.2g
openssl/1.0.2g#owner
Also, some dependencies might have comments:
zlib/1.2.11#conan/stable # comment
In that case I would need to get rid of the comment and only get the relevant information with the regex.
I am not sure if my current regex is good, but from what I've tested, only some scenarios are missing.

You can simplify your regex: instead of listing and escaping many characters in a character set, use something like [^\/] to capture anything except /, since you want everything preceding a slash.
I've made some modifications; the updated regex, which should work for you, is the following:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(#(?<owner>\w+))?(\/(?<channel>\S+))?(?:\s*#\s*(?<comment>.+))?$
I've added another named group for the comment, since you mentioned one can also be present. Let me know if this works for you.
Edit: If the channel can contain text like release:132434, and everything from the colon onward should be left out of the channel, you can use the updated regex below:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(?:#(?<owner>\w+))?(?:\/(?<channel>[^:\s]+)\S*)?(?:\s*#\s*(?<comment>.+))?\s*$
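For reference, here is a minimal Java sketch of how a pattern like this could be applied. Note that the version group above, [^\/#\s]+, also captures a +4 suffix, so the sketch uses a variant that excludes + from the version and skips whatever follows it. That variant, the class name ConanReferenceDemo and the parse helper are assumptions made for illustration, not part of the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConanReferenceDemo {
    // Variant of the second regex above. Assumption (not in the answer): a '+' in
    // the version starts build metadata, which is matched but kept out of <version>.
    // The slashes are left unescaped because '/' needs no escaping in a Java regex.
    private static final Pattern REFERENCE = Pattern.compile(
            "^(?<name>[^/]+)/(?<version>[^/#\\s+]+)(?:\\+[^/#\\s]*)?"
            + "(?:#(?<owner>\\w+))?(?:/(?<channel>[^:\\s]+)\\S*)?"
            + "(?:\\s*#\\s*(?<comment>.+))?\\s*$");

    /** Returns the named parts of one line, or null if the line does not match. */
    static Map<String, String> parse(String line) {
        Matcher m = REFERENCE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String group : new String[] {"name", "version", "owner", "channel", "comment"}) {
            // group(...) returns null when an optional group did not take part in the match
            parts.put(group, m.group(group));
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] lines = {
            "boost_atomic/1.59.0+4#owner/release",
            "zlib/1.2.11#conan/stable # comment",
            "freetype/2.10.1/stable",
            "openssl/1.0.2g"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + parse(line));
        }
    }
}
```

Optional parts that are absent come back as null, so callers should check for that before using owner, channel or comment.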

Related

How to write regex to match a key in yaml file?

I have a YAML file that looks like the one below. Sonar provides sonar-yaml-plugin by default, with some templates that accept a regex as input to verify whether a particular key is present in a .yml file.
I want a regex that matches the entire key logging:file
server:
  port: 8989
logging:
  file: ./sample1.txt
  path: ./log
I have tried using (logging)(?s:.*?)(file), but it's not validating when I use it in sonar-plugin.
I'm not sure exactly what you want to match, but this regex might help you do so or design your desired expression:
^(logging:|\s+file:)(.+)
This expression has a left boundary at the start, ^.
Your two keys are connected with an OR (|).
It then matches everything after that using .+
You can also add additional boundaries to it; however, if you could add some real samples to your question, it would be easier to answer.
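As one possible way of adding boundaries, here is a rough Java sketch that only accepts a file key nested directly under a top-level logging key. The pattern, the class name YamlKeyDemo and the loggingFile helper are assumptions for illustration; whether sonar-yaml-plugin applies a regex across several lines like this is not something I can confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YamlKeyDemo {
    // Assumptions: "logging" is a top-level key, its children are indented with
    // spaces or tabs, and "file" has an inline value. \R matches any line break.
    private static final Pattern LOGGING_FILE = Pattern.compile(
            "^logging:[ \\t]*\\R(?:[ \\t]+.*\\R)*?[ \\t]+file:[ \\t]*(\\S.*)$",
            Pattern.MULTILINE);

    /** Returns the value of logging.file, or null if no such key is found. */
    static String loggingFile(String yaml) {
        Matcher m = LOGGING_FILE.matcher(yaml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String yaml = "server:\n"
                + "  port: 8989\n"
                + "logging:\n"
                + "  file: ./sample1.txt\n"
                + "  path: ./log\n";
        System.out.println(loggingFile(yaml));
    }
}
```

The lazy group in the middle skips other indented children of logging, so the order of file and path does not matter; an unindented line ends the logging block and stops the search.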

Java's named capturing groups and trying to see if they are optional

I am currently using named capturing groups in a regex applied to a URL. The client feeds in the regex, but I need to get:
list of capturing group names
which of the names are required
which of the names are optional
Currently, I am cheating by translating {id} or {someVar} to a capture group, so everything is required. Now, however, because of add/edit, some URLs look like this:
/postadd
/postedit/someIdHere
so the regex is ONE route matching both. I believe it would look something like this
/postadd|/postedit/(?<id>[^/]+)
I would really, really prefer not to use a regex on the regex to find out whether a group is optional (code like that is hard to read and reverse engineer). Is there any way instead to list the capturing groups and find out whether each one is optional?
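One thing that can be observed without parsing the regex itself is whether a named group took part in a particular match: Matcher.group(name) returns null for a group that did not participate. The sketch below only shows that runtime check for the route above; it does not list group names or classify them up front, and the class name OptionalGroupDemo and the idOf helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupDemo {
    // The single route from the question, matching both the add and the edit URL.
    private static final Pattern ROUTE =
            Pattern.compile("/postadd|/postedit/(?<id>[^/]+)");

    /**
     * Returns the captured id, or null when the URL matched through the
     * alternative without the id group (or did not match at all).
     */
    static String idOf(String url) {
        Matcher m = ROUTE.matcher(url);
        return m.matches() ? m.group("id") : null;
    }

    public static void main(String[] args) {
        System.out.println(idOf("/postadd"));
        System.out.println(idOf("/postedit/someIdHere"));
    }
}
```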

Configuring the tokenisation of the search term in an elasticsearch query

I am doing a general search against elasticsearch (1.7) using a match query against a number of specified fields. This is done in a Java app with one box to enter search terms in. Various search options are allowed (for example, surrounding a phrase with quotes to look for the phrase rather than its component words). This means I am doing full text searches.
All is well except my account refs have forward slashes in them and a search on an account ref produces thousands of results. If I surround the account ref with quotes I get just the result I want. I assume an account ref of AC/1234/A01 is searching for [AC OR 1234 OR A01]. Initially I thought this was a regex issue but I don’t think it is.
I raised a similar question a while ago, and one suggestion which I had thought worked was to add "analyzer": "keyword" to the query (in my code, queryStringQueryBuilder.analyzer("keyword")).
The problem with this is that many of the other fields searched are not keyword fields, and it is stopping a lot of flexible search options from working (case sensitivity etc.). I assume the query has become something along the lines of an exact match in the text search.
I've looked at this the wrong way around for a while now and as I see it I can't fix it in the index or even in the general analyser settings as even if the account ref field is tokenised and analysed perfectly for my requirement the search will still search all the other fields for [AC OR 1234 OR A01].
Is there a way of configuring the search query to not split the account number on forward slashes? I could test ignoring all punctuation if it is possible to only split by whitespaces although I would prefer not to make such a radical change...
So I guess what I am asking is whether there is another built-in analyzer which would still do a full text search but would not split the search term on punctuation. If not, is this something I could do with a custom analyzer (without applying it to the index itself)?
Thanks.
The simplest way to do it is to replace / with a character that doesn't cause the word to be split into two tokens and doesn't interfere with your other terms (_, ., ' should work), or to remove / completely, using a mapping char filter. There is a similar example here: https://stackoverflow.com/a/23640832/783043

Replacing text in between certain symbols

I am given a string like this:
CSF#asomedatahere#iiwin#hnotwhatIwant
And I want to replace the string that is present BETWEEN #i and #h (h could be any character). This is what I have so far, and I feel that I am close; however, there may not always be a #CHAR after this #idata pattern.
(?<=#i)(.*)(?=#.*)
I would like it to work for that optionally not being there. As it can be seen in the link below it works for the first case not the second. I tried adding a '?' at the end to make the last part optional but that makes it not work for the first case.
Here is a link that will show you actively what is not working: http://fiddle.re/vtvmc
You need to expand the look-ahead to use the end of the input as well:
(?<=#i)(.*?)(?=#.*|$)
Because .*? is reluctant, the match stops at the first # that follows (or at the end of the input). This would match
iwin in CSF#asomedatahere#iiwin#hnotwhatIwant
iwin in CSF#asomedatahere#iiwin#h
iwin in CSF#asomedatahere#iiwin
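Since the goal is to replace that value, here is a small Java sketch using the expression above with replaceAll. The class name ReplaceBetweenDemo and the replaceValue helper are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceBetweenDemo {
    // Text after "#i", up to the next '#' or the end of the input.
    private static final Pattern AFTER_HASH_I =
            Pattern.compile("(?<=#i)(.*?)(?=#.*|$)");

    /** Replaces the value that follows "#i" with the given literal text. */
    static String replaceValue(String input, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted
        return AFTER_HASH_I.matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceValue("CSF#asomedatahere#iiwin#hnotwhatIwant", "X"));
        System.out.println(replaceValue("CSF#asomedatahere#iiwin", "X"));
    }
}
```

The look-behind and look-ahead are zero-width, so #i and the following #h stay in place and only the value between them is swapped.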

Regex: How not to match a few letters

I have the following string: SEE ATTACHED ADDENDUM TO HUD-1194,520.07
Inside that string is HUD-1 and after that is 194,520.07. What I want is the 194,520.07 part.
I have written the following regular expression to pull that value out:
[^D\-1](?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?
However, this pulls out: 94,520.07
I know it has something to do with this part: [^D\-1] "eating" too many of the 1's. Any ideas how I can stop it from "eating" 1's after the first one that appears in HUD-1?
UPDATED:
The reason for all the other stuff is that I only want a match if the value after HUD-1 is a money amount. The rest of the regex tries to cover all the different ways a money amount could be written.
Why not something as simple as:
.*HUD\-1(.*+)
OK, based on your updated question I see you need to be more restrictive. Try changing [^D\-1] to just (?:HUD\-1)?. For what it's worth, your currency regex is very lax, allowing input like:
001 001 .31412341234123
You might consider not reinventing the wheel there, I'm sure you can find a currency RegEx quickly via Google. Otherwise, I'd also suggest anchoring your RegEx with a $ at the end of it.
This change makes the second capturing group of the regex contain the full number you want (everything after the first 1) and puts HUD-1, if present, in a separate capturing group.
(HUD-1)?((?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?)
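A short Java sketch of reading the second group from this expression follows. The class name HudAmountDemo and the amountOf helper are illustrative names only, and the sketch simply takes the first match found in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HudAmountDemo {
    // Group 1: the optional "HUD-1" prefix. Group 2: the money amount.
    private static final Pattern AMOUNT = Pattern.compile(
            "(HUD-1)?((?:-|\\()?\\$?(?:\\d{1,3}[ ,]?)*(?:\\.\\d+)\\)?)");

    /** Returns the first amount found in the text, or null if there is none. */
    static String amountOf(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(amountOf("SEE ATTACHED ADDENDUM TO HUD-1194,520.07"));
    }
}
```

Because HUD-1 is consumed by its own group before the digits are tried, the leading 1 of 194,520.07 stays with the amount.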
