I am given a string like this:
CSF#asomedatahere#iiwin#hnotwhatIwant
And I want to replace the string that is present BETWEEN #i and #h (h could be any character). This is what I have so far, and I feel that I am close; however, there may not always be a #CHAR after this #idata pattern:
(?<=#i)(.*)(?=#.*)
I would like it to work when that part is missing as well. As can be seen in the link below, it works for the first case but not the second. I tried adding a '?' at the end to make the last part optional, but that makes it fail for the first case.
Here is a link that will show you actively what is not working: http://fiddle.re/vtvmc
You need to expand the look-ahead to accept the end of the input as well, and make the capture lazy (.*?) so that it stops at the first # after #i:
(?<=#i)(.*?)(?=#.*|$)
This would match iwin in each of these inputs:
CSF#asomedatahere#iiwin#hnotwhatIwant
CSF#asomedatahere#iiwin#h
CSF#asomedatahere#iiwin
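Since the goal is to replace that value, here is a minimal sketch of plugging the answer's pattern into String.replaceAll. The class name HashIReplace, the helper replaceValue and the replacement text NEW are hypothetical names chosen for illustration.

```java
public class HashIReplace {
    // The answer's pattern: (?<=#i) requires "#i" right before the match,
    // the lazy (.*?) takes as few characters as possible, and the lookahead
    // (?=#.*|$) lets the match end either before a '#' or at the end of input.
    static final String PATTERN = "(?<=#i)(.*?)(?=#.*|$)";

    // Hypothetical helper: replaces the value that follows "#i".
    // Note: '$' and '\' in the replacement are special for replaceAll
    // (see Matcher.quoteReplacement); "NEW" contains neither.
    static String replaceValue(String input, String replacement) {
        return input.replaceAll(PATTERN, replacement);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "CSF#asomedatahere#iiwin#hnotwhatIwant",
            "CSF#asomedatahere#iiwin#h",
            "CSF#asomedatahere#iiwin"
        };
        for (String input : inputs) {
            // Prints each input with iwin replaced by NEW.
            System.out.println(replaceValue(input, "NEW"));
        }
    }
}
```

With the greedy (.*) from the question, the $ alternative would let the first input's match run all the way to the end of the string, which is why the lazy quantifier matters here.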
Related
I am trying to create a regex in Java to match and get the name, version, channel and owner for each dependency, but I haven't been able to write one that covers all the possible scenarios.
The structure is something like name/version#owner/channel, where the version might have a semver structure, and the owner and channel are optional.
Currently, I have:
^(?<name>[\d\w][\d\w\+\.-]+)\/(?<version>[\d\w][\d\w\.-]+)(#(?<owner>\w+))?(\/(?<channel>.+))?$
But it fails for boost_atomic/1.59.0+4#owner/release, since the +4 is not matched; I need the value before it (1.59.0).
Some other scenarios that need to be valid and are valid for the regex above are:
Poco/1.9.0#pocoproject/stable
zlib/1.2.11#conan/stable
freetype/2.10.1/stable
openssl/1.0.2g/stable
openssl/1.0.2g
openssl/1.0.2g#owner
Also, there might be some dependencies with comments:
zlib/1.2.11#conan/stable # comment
In that case I would need to get rid of the comment and only get the relevant information with the regex.
I am not sure if my current regex is good, but from what I've tested, only some scenarios are missing.
You can simplify your regex: instead of putting many characters in that character set and escaping them, use something like [^\/] to capture anything except /, since you want to capture whatever precedes a slash.
I've made some modifications, and the updated regex that should work for you is the following:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(#(?<owner>\w+))?(\/(?<channel>\S+))?(?:\s*#\s*(?<comment>.+))?$
I've added another named group for the comment, since you mentioned that one can also be present. Let me know if this works for you.
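A minimal sketch of using this regex from Java with named groups, run against the sample lines from the question. The class ConanRefParser and its parse helper are hypothetical; the \/ escapes are dropped because / is not a special character in Java regex, so the pattern is otherwise the one above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConanRefParser {
    // The answer's regex with the unnecessary \/ escapes removed.
    static final Pattern REF = Pattern.compile(
        "^(?<name>[^/]+)/(?<version>[^/#\\s]+)(#(?<owner>\\w+))?"
            + "(/(?<channel>\\S+))?(?:\\s*#\\s*(?<comment>.+))?$");

    // Hypothetical helper: returns {name, version, owner, channel, comment},
    // with null for parts that are absent, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = REF.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("name"), m.group("version"), m.group("owner"),
            m.group("channel"), m.group("comment")
        };
    }

    public static void main(String[] args) {
        String[] lines = {
            "boost_atomic/1.59.0+4#owner/release",
            "Poco/1.9.0#pocoproject/stable",
            "freetype/2.10.1/stable",
            "openssl/1.0.2g",
            "openssl/1.0.2g#owner",
            "zlib/1.2.11#conan/stable # comment"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + Arrays.toString(parse(line)));
        }
    }
}
```

Note that for the boost_atomic line the version group keeps the suffix (1.59.0+4); getting only 1.59.0 would need extra handling.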
Edit: If the channel contains text like release:132434 and anything following the colon should be left out of the channel, you can use the updated regex below:
^(?<name>[^\/]+)\/(?<version>[^\/#\s]+)(?:#(?<owner>\w+))?(?:\/(?<channel>[^:\s]+)\S*)?(?:\s*#\s*(?<comment>.+))?\s*$
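A minimal sketch of the updated regex in Java, again with the \/ escapes dropped and hypothetical names (ConanRefParserV2, parse). It shows the channel group stopping at the colon while \S* consumes the rest of that token without capturing it.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConanRefParserV2 {
    // The updated regex from the edit. [^:\s]+ stops the channel at the first
    // ':' or whitespace; \S* then swallows e.g. ":132434" without capturing it.
    static final Pattern REF = Pattern.compile(
        "^(?<name>[^/]+)/(?<version>[^/#\\s]+)(?:#(?<owner>\\w+))?"
            + "(?:/(?<channel>[^:\\s]+)\\S*)?(?:\\s*#\\s*(?<comment>.+))?\\s*$");

    // Hypothetical helper: {name, version, owner, channel, comment},
    // null entries for absent parts, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = REF.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("name"), m.group("version"), m.group("owner"),
            m.group("channel"), m.group("comment")
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("zlib/1.2.11#conan/release:132434")));
        System.out.println(Arrays.toString(parse("zlib/1.2.11#conan/stable # comment")));
    }
}
```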
The following piece of code checks the variable portion after /en with (^$|.*), which should accept either nothing or any characters. So the expression should match /en AND /en/bla, /en/blue etc.
But the expression doesn't work when checking for just /en:
"/en".matches("/en(^$|.*)")
Is there a way to make this empty-string check (^$) work in Java?
Edit:
I mean: is there a way to make this piece of code return true?
What you're currently doing is checking whether en is followed by the start of the string and then the end of the string (which doesn't make sense, since the start of the string has to come first), or by anything else. This should work:
"/en".matches("/en(|.*)")
Or just using ? (optional):
"/en".matches("/en(.*)?")
But it's rather pointless, since * means zero or more (so .* also matches an empty string); just this should do it:
"/en".matches("/en.*")
EDIT:
Your code was already returning true, but not because of the ^$ part: the match came from .* (similar to the above).
I should point out that you may as well use startsWith, unless your real data is more complex:
"/en".startsWith("/en")
Is there a way to make this piece of code return true?
"/en".matches("/en(^$|.*)")
That code does return true. Just try it!
However, your pattern is unnecessarily complex. Try:
"/en".matches("/en.*")
This will match /en followed by anything (including nothing).
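To check the claims in both answers, here is a small sketch that evaluates the original pattern, the suggested alternatives and startsWith on the inputs from the question. The class EnPrefixCheck and its check helper are hypothetical.

```java
import java.util.Arrays;

public class EnPrefixCheck {
    // Hypothetical helper: evaluates the patterns from the answers on one input.
    static boolean[] check(String s) {
        return new boolean[] {
            // Original pattern: without the MULTILINE flag, ^ only matches at the
            // start of the input, so the ^$ branch never succeeds after "/en";
            // the result comes from the .* branch.
            s.matches("/en(^$|.*)"),
            s.matches("/en(|.*)"),  // explicit empty alternative
            s.matches("/en(.*)?"),  // optional group
            s.matches("/en.*"),     // .* already allows zero characters
            s.startsWith("/en")     // plain prefix test, no regex
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/en", "/en/bla", "/en/blue"}) {
            // Every entry is true for these inputs.
            System.out.println(s + " -> " + Arrays.toString(check(s)));
        }
    }
}
```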
I need to replace a particular token (in this case, ?) with another token (we can start with ! or something). String's replaceAll method will work for this. But I don't want to replace the question mark if it happens to follow the token action. (That would be bad!)
I've tried text = text.replaceAll("[^a][^c][^t][^i][^o][^n]\\?","!"); but that didn't work.
For example, I want test.action?param=lol?omg to turn into test.action?param=lol!omg. I know I could do something silly like
text.replaceAll("action\\?","%%%CRAZYTOKEN%%%")
.replaceAll("\\?","!")
.replaceAll("%%%CRAZYTOKEN%%%","action?");
but that just seems like a waste of time, especially on large strings. I'd rather do it right.
You need a negative lookbehind:
text.replaceAll("(?<!action)\\?", "!");
This asserts that the ? does not follow action.
Just use a zero-width negative lookbehind assertion to only match ?s which don't follow action:
text = text.replaceAll("(?<!action)\\?", "!");
Note the extra \\ before the ?. You need to escape the ? since it is a special character in regex.
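A minimal sketch comparing the lookbehind from the answers with the attempt from the question. The class QuestionMarkReplace and both helper names are hypothetical. It also shows why the character-class attempt fails: it needs six consumed characters before the ?, and those characters get replaced too.

```java
public class QuestionMarkReplace {
    // The answers' approach: (?<!action) is a zero-width negative lookbehind,
    // so a '?' is replaced only when the text right before it is not "action",
    // and nothing before the '?' is consumed or removed.
    static String replaceUnlessAfterAction(String text) {
        return text.replaceAll("(?<!action)\\?", "!");
    }

    // The question's attempt, for comparison: it requires six characters before
    // the '?', each different from the matching letter of "action", and it
    // consumes them, so they are replaced along with the '?'.
    static String questionAttempt(String text) {
        return text.replaceAll("[^a][^c][^t][^i][^o][^n]\\?", "!");
    }

    public static void main(String[] args) {
        System.out.println(replaceUnlessAfterAction("test.action?param=lol?omg")); // test.action?param=lol!omg
        System.out.println(questionAttempt("test.action?param=lol?omg"));          // unchanged
        System.out.println(questionAttempt("hello world?"));                       // hello!
    }
}
```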
Good morning. I realize there are a ton of questions out there regarding replace and replaceAll(), but I haven't seen this one.
What I'm looking to do is parse a string (which contains valid HTML, up to a point); then, after I see the second instance of <p> in the string, I want to remove everything that starts with & and ends with ; until I see the next </p>.
To do the second part, I was hoping to use something along the lines of s.replaceAll("&*;","")
That doesn't work, but hopefully it gets my point across: I am looking to replace anything that starts with & and ends with ;
You should probably leave the parsing to a DOM parser (see this question). I can almost guarantee you'll have to do this to find text within the <p> tags.
For the replacement logic, String.replaceAll uses regular expressions, which can do the matching you want.
The "wildcard" in regular expressions that you want is the .* expression. Using your example:
String ampStr = "This &escape;String";
String removed = ampStr.replaceAll("&.*;", "");
System.out.println(removed);
This outputs This String. This is because the . represents any character, and the * means "this character 0 or more times." So .* basically means "any number of characters." However, feeding it:
"This &escape;String &anotherescape;Extended"
will probably not do what you want: it will output This Extended, because .* is greedy and runs to the last ; in the string. To fix this, specify exactly what you want to match instead of the . character. This is done using [^;], which means "any character that's not a semicolon":
String removed = ampStr.replaceAll("&[^;]*;", "");
This may perform better than &.*?; on non-matching strings, so I recommend this version, especially since not all HTML files will contain a &abc; token, and the &.*?; version can become a performance bottleneck as a result.
The expression you want is:
s.replaceAll("&.*?;","");
But do you really want to be parsing HTML this way? You may be better off using an XML parser.
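The three patterns discussed above can be compared side by side on the answer's sample string. This is a minimal sketch with hypothetical class and method names. Like the answers, it leaves out the "only after the second <p>" part of the question, which is better handled by an HTML parser.

```java
public class EntityStrip {
    static final String SAMPLE = "This &escape;String &anotherescape;Extended";

    // Greedy: .* runs to the LAST ';', removing everything in between.
    static String stripGreedy(String s) {
        return s.replaceAll("&.*;", "");
    }

    // Negated class: [^;]* cannot cross a ';', so each entity is removed separately.
    static String stripNegated(String s) {
        return s.replaceAll("&[^;]*;", "");
    }

    // Lazy: .*? stops at the FIRST ';' after each '&'.
    static String stripLazy(String s) {
        return s.replaceAll("&.*?;", "");
    }

    public static void main(String[] args) {
        System.out.println(stripGreedy(SAMPLE));  // This Extended
        System.out.println(stripNegated(SAMPLE)); // This String Extended
        System.out.println(stripLazy(SAMPLE));    // This String Extended
    }
}
```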
Regex Pattern - ([^=](\\s*[\\w-.]*)*$)
Test String - paginationInput.entriesPerPage=5
The Java regex engine crashes, or takes ages (> 2 mins), trying to find a match. This is not the case for the following test inputs:
paginationInput=5
paginationInput.entries=5
My requirement is to get hold of the String on the right-hand side of = and replace it with something. The above pattern is doing it fine except for the input mentioned above.
I want to understand why this happens and how I can optimize the regex for my requirement, so as to avoid other peculiar cases.
You can use a lookbehind to make sure your match starts at the character after the =:
(?<=\\=)([\\s\\w\\-.]*)$
As for why it is crashing, it's the second * around the group. I'm not sure why you need that, since it means you are asking for:
A single character, anything but equals
Then 0 or more repeats of the following group:
Any amount of white space
Then any amount of word characters, dash, or dot
End of string
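Anyway, take out that * and it no longer spins forever. The problem is catastrophic backtracking: [\\w-.]* is repeated inside a group that is itself repeated, so a run of characters such as aginationInput.entriesPerPage can be split among the group's repetitions in exponentially many ways. When the $ then fails (because =5 still follows), the engine tries those splits before giving up, and their number grows with every extra character, which fits the observation that only the longest key hangs. I'd still go for the more specific regex using the lookbehind.
Also, I don't know how you are using this, but why did you have the $ in there? With it, you can only match the last value in the string (if there is more than one). It seems like you'd be better off with a lookahead to a newline or the end: (?=\\n|$)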
[Edit]: Update per comment below.
Try this:
=\\s*(.*)$
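A minimal sketch of both suggestions in Java: the lookbehind pattern for replacing the right-hand side, and =\\s*(.*)$ for capturing it. The class RightHandSide, its helpers and the new value 10 are hypothetical. It deliberately does not run the original ([^=](\\s*[\\w-.]*)*$) pattern, since that is the one reported to take minutes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RightHandSide {
    // Lookbehind version: the match starts right after '=' and must run
    // (through whitespace, word characters, '-' and '.') to the end of input.
    private static final Pattern AFTER_EQUALS = Pattern.compile("(?<=\\=)([\\s\\w\\-.]*)$");

    // Capturing version from the edit: group 1 is everything after '=' and optional spaces.
    private static final Pattern VALUE = Pattern.compile("=\\s*(.*)$");

    // Hypothetical helper: replaces the right-hand side with newValue.
    static String replaceValue(String input, String newValue) {
        // quoteReplacement stops '$' or '\' in newValue being read as group references.
        return AFTER_EQUALS.matcher(input).replaceAll(Matcher.quoteReplacement(newValue));
    }

    // Hypothetical helper: returns the right-hand side, or null if there is no '='.
    static String value(String input) {
        Matcher m = VALUE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "paginationInput=5",
            "paginationInput.entries=5",
            "paginationInput.entriesPerPage=5"
        };
        for (String s : inputs) {
            System.out.println(value(s) + " -> " + replaceValue(s, "10"));
        }
    }
}
```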