Regex to dynamically replace text in Java

What I am trying to do is make something like:
some random stuff substitute("random stuff","xxx") test
be replaced with the following:
some xxx test
If I use:
substitute\((.*?)*\)
I can find the portion, but what I ultimately want is multiple groups, where the first group is the text to search for and the second group is the replacement. I want it to be generic enough that I don't depend on the , since it can appear anywhere. Is there a regex that could work for all cases, or should I depend on the "" to get what I need?

I haven't tested it, but you could try
\w*\(\"(.*)\".*\"(.*)\"\)
If the order of the arguments in substitute doesn't change, the first group holds the search term and the second group holds the replacement term.
And yes, I don't see another way than depending on () and "".
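A minimal Java sketch of the idea above, assuming both arguments are always wrapped in double quotes and contain no embedded quotes. The class name `SubstituteDemo`, the method `apply`, and the slightly stricter pattern (`[^"]*` instead of `.*`) are illustrative choices, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstituteDemo {
    // Group 1 = text to search for, group 2 = replacement text.
    // The leading \s* swallows the whitespace in front of the directive
    // so that removing it does not leave a double space behind.
    private static final Pattern DIRECTIVE = Pattern.compile(
            "\\s*substitute\\(\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*\\)");

    // Applies the first substitute("search","replacement") directive found in the input.
    static String apply(String input) {
        Matcher m = DIRECTIVE.matcher(input);
        if (!m.find()) {
            return input; // no directive present, nothing to do
        }
        String search = m.group(1);
        String replacement = m.group(2);
        // Cut the directive out, then perform a literal (non-regex) replacement.
        String withoutDirective = input.substring(0, m.start()) + input.substring(m.end());
        return withoutDirective.replace(search, replacement);
    }

    public static void main(String[] args) {
        String input = "some random stuff substitute(\"random stuff\",\"xxx\") test";
        System.out.println(apply(input)); // prints "some xxx test"
    }
}
```

Because the arguments are matched with `[^"]*`, a comma inside either quoted argument does not confuse the split between search and replacement.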

Related

Regex to match conan dependency from conanfile.txt

I am trying to create a regex in Java to match and get the name, version, channel and owner for each dependency but I haven't been able to have one that covers all the possible scenarios:
the structure is something like name/version#owner/channel, where the version might have a semver structure, the owner and channel are optional.
Currently, I have:
^(?<name>[\d\w][\d\w\+\.-]+)\/(?<version>[\d\w][\d\w\.-]+)(#(?<owner>\w+))?(\/(?<channel>.+))?$
but it's failing for boost_atomic/1.59.0+4#owner/release, since the +4 is not matched and I need the value before that -> 1.59.0
Some other scenarios that need to be valid and are valid for the regex above are:
Poco/1.9.0#pocoproject/stable
zlib/1.2.11#conan/stable
freetype/2.10.1/stable
openssl/1.0.2g/stable
openssl/1.0.2g
openssl/1.0.2g#owner
Also, there might be some dependencies with comments:
zlib/1.2.11#conan/stable # comment
In that case I would need to get rid of the comment and only get the relevant information with the regex.
I am not sure if my current regex is good, but from what I've tested only some scenarios are missing.
You can simplify your regex and avoid putting so many escaped characters in the character sets. Instead, use something like [^\/] to capture anything except /, since you want to capture everything preceding a slash.
I've made some modifications, and the updated regex that should work for you is the following:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(#(?<owner>\w+))?(\/(?<channel>\S+))?(?:\s*#\s*(?<comment>.+))?$
I've added another named group for the comment, since you mentioned that one can also be present. Let me know if this works for you.
Edit: If the channel contains text like release:132434 and the colon and anything after it should be ignored as part of the channel, you can use the updated regex below:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(?:#(?<owner>\w+))?(?:\/(?<channel>[^:\s]+)\S*)?(?:\s*#\s*(?<comment>.+))?\s*$
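A short Java sketch of how the updated regex could be used with named groups. The class name `ConanRefParser` and the method `parse` are hypothetical. Since `/` has no special meaning in Java regexes, `\/` is written as a plain `/` here. Note that with this pattern a build suffix such as `+4` stays part of the `version` group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConanRefParser {
    // Same structure as the updated regex above, written as a Java string literal.
    private static final Pattern REF = Pattern.compile(
            "^(?<name>[^/]+)/(?<version>[^/#\\s]+)(?:#(?<owner>\\w+))?"
            + "(?:/(?<channel>[^:\\s]+)\\S*)?(?:\\s*#\\s*(?<comment>.+))?\\s*$");

    // Returns the named parts of one dependency line, or null if the line does not match.
    // Optional parts that are absent map to null.
    static Map<String, String> parse(String line) {
        Matcher m = REF.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> parts = new LinkedHashMap<>();
        for (String key : new String[] {"name", "version", "owner", "channel", "comment"}) {
            parts.put(key, m.group(key));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("zlib/1.2.11#conan/stable # comment"));
        System.out.println(parse("openssl/1.0.2g"));
        System.out.println(parse("boost_atomic/1.59.0+4#owner/release"));
    }
}
```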

Is there a way to force order of letters when using replaceall()?

I want to filter out a string like this in a link: (The link is not exactly like this, much longer).
https://www.(whatever).com/junk/section39843**no2938**.
I need to get no(number) as a String.
When trying to use replaceAll("[^no0-9]","")
to remove everything other than no(number), I get on39843no2938.
Is there a way to leave ONLY "no" with all the numbers? Numbers are fine, because I'm searching through with contains().
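One possible approach, sketched in Java: a character class such as `[^no0-9]` has no notion of order, so instead of deleting everything else, search for the ordered sequence `no` followed by digits and take that match. The class name `NoNumberExtractor`, the method `extract`, and the sample URL with `example.com` are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoNumberExtractor {
    // "no" followed by one or more digits, in exactly that order.
    private static final Pattern NO_NUMBER = Pattern.compile("no\\d+");

    // Returns the first "no<digits>" occurrence in the text, or null if there is none.
    static String extract(String text) {
        Matcher m = NO_NUMBER.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical URL modelled on the one in the question.
        String url = "https://www.example.com/junk/section39843no2938.";
        System.out.println(extract(url)); // prints "no2938"
    }
}
```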

Java's named capturing groups and trying to see if they are optional

I am currently using named capturing groups in a regex applied to a URL. The client feeds in the regex, but I need to get:
list of capturing group names
which of the names are required
which of the names are optional
Currently, I am cheating by translating {id} or {someVar} to a capture group, and everything is required. Now, however, because of add/edit, some URLs look like this:
/postadd
/postedit/someIdHere
so the regex is ONE route matching both. I believe it would look something like this:
/postadd|/postedit/(?<id>[^/]+)
I would really, really, really prefer not to use a regex on the regex to find out if a group is optional (as code like that is hard to read and reverse engineer). Is there any way instead to list the capturing groups and find out whether each one is optional or not?
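This does not answer the static question of which groups are optional, but as a partial sketch: at match time, `Matcher.group(name)` returns null for a named group that did not participate in the match. Probing the route against sample URLs can therefore show that a group is optional for that route. The class name `OptionalGroupProbe` and the method `groupParticipated` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupProbe {
    // Reports whether the named group took part in matching the given URL.
    // Throws if the URL does not match the route at all.
    static boolean groupParticipated(Pattern route, String url, String groupName) {
        Matcher m = route.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("URL does not match route: " + url);
        }
        // group(name) is null when the group exists but was skipped in this match.
        return m.group(groupName) != null;
    }

    public static void main(String[] args) {
        Pattern route = Pattern.compile("/postadd|/postedit/(?<id>[^/]+)");
        System.out.println(groupParticipated(route, "/postadd", "id"));             // prints "false"
        System.out.println(groupParticipated(route, "/postedit/someIdHere", "id")); // prints "true"
    }
}
```

A group that is absent for at least one matching URL is optional for that route; a group that is present in every sample is only probably required, since the samples may not cover every branch.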

Regex: How not to match a few letters

I have the following string: SEE ATTACHED ADDENDUM TO HUD-1194,520.07
Inside that string is HUD-1 and after that is 194,520.07. What I want is the 194,520.07 part.
I have written the following regular expression to pull that value out:
[^D\-1](?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?
However, this pulls out: 94,520.07
I know it has something to do with this part: [^D\-1] "eating" too many of the 1's. Any ideas how I can stop it from "eating" 1's after the first one that appears in HUD-1?
UPDATED:
The reason for all the other stuff is that I only want to match if the value after HUD-1 is a money amount. The rest of that regex tries to cover all the different ways a money amount could be written.
Why not something as simple as:
.*HUD\-1(.*+)
OK, I see from your updated question that you need to be more restrictive. Try changing [^D\-1] to just (?:HUD\-1)?. For what it's worth, your currency RegEx is very lax, allowing input like:
001 001 .31412341234123
You might consider not reinventing the wheel there; I'm sure you can find a currency RegEx quickly via Google. Otherwise, I'd also suggest anchoring your RegEx with a $ at the end of it.
This change makes the second match group of the regex include the full number you want (everything after the first 1), and puts the HUD-1, if present, in a separate match group:
(HUD-1)?((?:-|\()?\$?(?:\d{1,3}[ ,]?)*(?:\.\d+)\)?)
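A small Java sketch of using this two-group regex. The class name `HudAmountExtractor` and the method `extractAmount` are hypothetical; the pattern is the one from the answer, written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HudAmountExtractor {
    // Group 1: optional "HUD-1" prefix. Group 2: the money amount that follows.
    private static final Pattern AMOUNT = Pattern.compile(
            "(HUD-1)?((?:-|\\()?\\$?(?:\\d{1,3}[ ,]?)*(?:\\.\\d+)\\)?)");

    // Returns the first amount found in the text, or null if there is none.
    static String extractAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "SEE ATTACHED ADDENDUM TO HUD-1194,520.07";
        System.out.println(extractAmount(line)); // prints "194,520.07"
    }
}
```

Because `HUD-1` is consumed by its own group before the amount pattern starts, the leading `1` of the amount is no longer eaten.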

How to regex match pairs within pairs

My question is fairly straightforward, even if the purpose it will serve is pretty complicated. I will use a simple example:
AzzAyyAxxxxByyBzzB
So normally I would want to get everything between A and B. However, because some of the content between the first A and the last B (one pair) contains additional AB pairs I need to push back the end of the match. (Not sure if that last part made sense).
So what I'm looking for is some RegEx that would allow me to have the following output:
Match 1
Group 1: AzzAyyAxxxxByyBzzB
Group 2: zzAyyAxxxxByyBzz
Then I would match it again to get:
Match 2
Group 1: AyyAxxxxByyB
Group 2: yyAxxxxByy
Then finally again to get:
Match 3
Group 1: AxxxxB
Group 2: xxxx
Obviously if I try (A(.*?)B) on the whole input I get:
Match x
Group 1: AzzAyyAxxxxB
Group 2: zzAyyAxxxx
Which is not what I'm looking for :)
I hope this makes sense. I understand if this can't be done in RegEx, but I thought I would ask some of you regex wizards before I give up on it and try something else. Thanks!
Additional Info:
The project I'm working on is written in Java.
One other problem is that I'm parsing a document which could contain something like this:
AzzAyyAxxxxByyBzzB
Here is some unrelated stuff
AzzAyyAxxxxByyBzzB
AzzzBxxArrrBAssssB
And the top AB pairs need to be separate from the bottom AB pairs.
You made your regex explicitly ungreedy by using the ?. Just leave it out and the regex will consume as much as possible before matching the B:
(A(.*)B)
However, in general nested structures are beyond the scope of regular expressions. In a case like this:
AxxxByyyAzzzB
You would now also match from the first A to the last B. If this is possible in your scenario, you might be better off going through the string yourself character by character and counting As and Bs to figure out which ones belong together.
EDIT:
Now that you have updated the question and we figured this out in the comments, you do have the problem of multiple consecutive pairs. In this case, this cannot be done with a regex engine that does not support recursion.
However you can switch to matching from the inside out.
A([^AB]*)B
This will only get innermost pairs, because there can be neither an A nor a B between the delimiters. If you find it, you can then remove the pair and continue with your next match.
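The inside-out loop could be sketched in Java as follows. The class name `InsideOutPairs` and the method `innermostContents` are hypothetical. Each step removes the matched pair entirely, so the content reported for an outer pair no longer includes the inner pairs that were already removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsideOutPairs {
    // Matches only a pair with no other A or B between the delimiters.
    private static final Pattern INNERMOST = Pattern.compile("A([^AB]*)B");

    // Repeatedly finds an innermost pair, records its content, and removes the pair.
    static List<String> innermostContents(String text) {
        List<String> contents = new ArrayList<>();
        String remaining = text;
        Matcher m = INNERMOST.matcher(remaining);
        while (m.find()) {
            contents.add(m.group(1));
            // Drop the matched pair so the enclosing pair becomes innermost next time.
            remaining = remaining.substring(0, m.start()) + remaining.substring(m.end());
            m = INNERMOST.matcher(remaining);
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println(innermostContents("AzzAyyAxxxxByyBzzB")); // prints "[xxxx, yyyy, zzzz]"
    }
}
```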
Use word boundaries if you use multiline mode:
\bA(.*)B\b #for matches that do not span from the beginning of the line to the end
or
^A(.*)B$ #for matches that start from beginning of line till end
You won't be able to do this with Regular Expressions alone. What you're describing is more Context-Free than Regular. In order to parse something like this, you need to push a new context onto a stack every time you encounter an 'A' and pop the stack every time you encounter a 'B'. You need something more like a pushdown automaton than a regular expression.
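A minimal Java sketch of that stack-based idea, assuming 'A' and 'B' only ever appear as delimiters. The class name `NestedPairs` and the method `findPairs` are hypothetical. Pairs are reported innermost first, in the order their closing 'B' is encountered; a 'B' without a matching 'A' is ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NestedPairs {
    // Returns every balanced A...B span, including the delimiters.
    static List<String> findPairs(String text) {
        List<String> pairs = new ArrayList<>();
        Deque<Integer> openPositions = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 'A') {
                openPositions.push(i); // remember where this pair starts
            } else if (c == 'B' && !openPositions.isEmpty()) {
                int start = openPositions.pop(); // most recent unmatched 'A'
                pairs.add(text.substring(start, i + 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints "[AxxxxB, AyyAxxxxByyB, AzzAyyAxxxxByyBzzB]"
        System.out.println(findPairs("AzzAyyAxxxxByyBzzB"));
        // prints "[AzzzB, ArrrB, AssssB]"
        System.out.println(findPairs("AzzzBxxArrrBAssssB"));
    }
}
```

Stripping the first and last character of each result gives the inner content that the question lists as Group 2.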
