I have a regex that I am not able to interpret: \w\1.
I thought it would match aa, since the word character a appears twice and the first group would be a word character for this regex. But it's not behaving that way.
Does backreferencing work only if we place parentheses around part of the regex?
Any help would be appreciated. Thanks.
\n refers to the nth capturing group. However, there are no capturing groups in your regex to refer to. You likely want:
(\w)\1
As a Java string that would be "(\\w)\\1".
(\w) captures the matched subexpression and assigns it a one-based ordinal number; \1 then matches the same text that group 1 captured.
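A minimal Java sketch of the corrected pattern, assuming the goal is simply to find a doubled word character. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // (\w) is capturing group 1; \1 must repeat exactly the text group 1 matched.
    private static final Pattern DOUBLED = Pattern.compile("(\\w)\\1");

    /** Returns the first doubled word character in the input, or null if there is none. */
    static String firstDoubled(String input) {
        Matcher m = DOUBLED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubled("aa"));     // aa
        System.out.println(firstDoubled("letter")); // tt
        System.out.println(firstDoubled("abc"));    // null
    }
}
```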
Related
I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that the string can be whatever things, except nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)). I thought the *? would non-greedily match any number of not(string) sequences, and the lookahead would stop the match at anything that is not not(string), so that I could extract only the part that I want.
However, when I tested it on an input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, whereas I only need $(string).not(string).not(string).
Obviously I am still missing or misusing something. Any suggestions?
Try this one (escaped for Java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part that you are after. You can test it out (unescaped for Java, though).
If we assume that parentheses are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need to add a lookahead, since the non-capturing group has a greedy quantifier (so the group is repeated as many times as possible).
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
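As a rough illustration of the non-nested pattern above, here is a small Java sketch run against the input from the question. The class and method names are hypothetical, and it covers only the simple variant without quoted strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotChainDemo {
    // $(...) followed by zero or more .not(...) calls; assumes no nested parentheses.
    private static final Pattern CHAIN =
            Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    /** Returns the $(...).not(...)... prefix found in the input, or null if none. */
    static String extractChain(String input) {
        Matcher m = CHAIN.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints $(string).not(string).not(string)
        System.out.println(extractChain("$(string).not(string).not(string).append(string)"));
        // prints $(x), since .not(...) may repeat zero times
        System.out.println(extractChain("$(x).append(y)"));
    }
}
```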
From here, "group zero denotes the entire pattern". Use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
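This will also work. It gives you two groups: one consisting of the $(...) part, and another for the .not(...) part. Note that a repeated capturing group keeps only its last repetition, so group 2 holds only the final .not(...); to get all of them as one string, wrap the repetition in an outer group, as in ((?:\.not\([\w ]+\))*).
Please note: you might have to add escape characters for Java.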
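To make the group numbering concrete, here is a hedged Java sketch of the pattern above, escaped as a Java string. The class name, method name, and sample input are invented for illustration. It shows that group(0) is the whole match, group(1) the $(...) part, and that the repeated group 2 retains only its last iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbersDemo {
    private static final Pattern P =
            Pattern.compile("(\\$\\([\\w ]+\\))(\\.not\\([\\w ]+\\))*");

    /** Returns {group(0), group(1), group(2)} for the first match, or null if none. */
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("$(string).not(a).not(b).append(c)");
        System.out.println(g[0]); // $(string).not(a).not(b)
        System.out.println(g[1]); // $(string)
        System.out.println(g[2]); // .not(b)  (last repetition only)
    }
}
```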
I am trying to match a string that start with the set word "hotel", then a hyphen, then a word of any length, then another hyphen and finally a number of any length.
Edit: Dima gave the solution I needed in the comments of this question! Thanks Dima.
Further edit: elaborating on Dima's answer, adding capturing groups making it easier to retrieve the information entered, and correcting the last bit to only accept digits:
^hotel-(.+)-(\d+)
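A small Java sketch of how the two capturing groups might be read back. The class and method names are hypothetical. It uses Matcher.matches(), which requires the whole input to match; that is a choice made for this sketch, since the pattern as written has no $ at the end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HotelParser {
    // Group 1: the word part (greedy, so it may itself contain hyphens); group 2: the number.
    private static final Pattern HOTEL = Pattern.compile("^hotel-(.+)-(\\d+)");

    /** Returns {name, number} if the whole input has the expected shape, else null. */
    static String[] parse(String input) {
        Matcher m = HOTEL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("hotel-california-42");
        System.out.println(parts[0] + " / " + parts[1]); // california / 42
        System.out.println(parse("hotel-something"));    // null
    }
}
```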
^hotel-(.)*$
(But hotel-something WILL work, according to your initial statement).
So, if you actually want something like:
hotel-XXXXXX-YYYYYYY
Then the regex is :
^hotel-(.)*-(.)*$
Try a regex online tester like http://www.regextester.com/.
If you want to match the start of the input, you use ^.
So if you have ^hotel-\b, that forces hotel to be at the start of the string.
As a note, you can use $ for the end of the string in a similar way.
\bhotel-[^\s-]+-[^\s-]+\b
\b means that there should be a word boundary
[^\s-] means anything but - or whitespace
https://regex101.com/r/mH3vY8/1
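For searching inside a longer text rather than validating a whole string, the word-boundary version can be sketched in Java roughly as follows. The class name, method name, and sample sentences are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HotelFinder {
    // \b keeps "hotel" from matching inside a longer word;
    // [^\s-]+ is a run of characters that are neither whitespace nor a hyphen.
    private static final Pattern HOTEL =
            Pattern.compile("\\bhotel-[^\\s-]+-[^\\s-]+\\b");

    /** Returns the first hotel-xxx-yyy token in the text, or null if none. */
    static String findFirst(String text) {
        Matcher m = HOTEL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findFirst("Book hotel-ritz-123 today")); // hotel-ritz-123
        System.out.println(findFirst("xhotel-a-b"));                // null
    }
}
```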
I am trying to extract, from this kind of string, ou=persons,ou=(.*),dc=company,dc=org, the last part immediately preceded by a comma and not followed by (.*). In this case, that should give dc=company,dc=org.
Reading up on regex, this seems to be a positive lookbehind (preceded by) containing a negative lookahead.
So I have arrived at this regex: (?<=(,(?!.*\Q(.*)\E))).*, but it returns ,dc=company,dc=org with the comma. I want the same thing without the comma. What am I doing wrong?
The comma appears because the capturing group contains it.
You can make the outer capturing group non-capturing with (?:):
(?<=(?:,(?!.*\Q(.*)\E))).*
It seems that I have solved my problem myself, by removing the capturing group around the negative lookahead. It gives the following regex: (?<=,(?!.*\Q(.*)\E)).*.
This is linked to the behavior of capturing groups in lookarounds, as explained at http://www.regular-expressions.info/lookaround.html in the part "Lookaround Is Atomic".
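A hedged Java sketch of the final regex applied to the sample string from the question. The class and method names are made up, and the sketch reads the whole match (group 0) rather than any capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LdapSuffixDemo {
    // Match everything after a comma, provided the literal text "(.*)" does not
    // occur anywhere after that comma. \Q...\E quotes the parentheses, dot and star.
    private static final Pattern SUFFIX =
            Pattern.compile("(?<=,(?!.*\\Q(.*)\\E)).*");

    /** Returns the part after the first comma that has no "(.*)" following it, or null. */
    static String suffix(String dn) {
        Matcher m = SUFFIX.matcher(dn);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints dc=company,dc=org
        System.out.println(suffix("ou=persons,ou=(.*),dc=company,dc=org"));
    }
}
```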
I'm facing some trouble writing a regular expression in Java to parse information from a logfile.
I have a String where the structure "timeinstant: some strings with any character" is repeated from 1 to N times.
timeinstant has the format "yyyy/mm/dd hh:MM:ss:MMMMMM" (MMMMMM being microseconds).
What I'm trying to do is to find the microseconds of the last timeinstant contained in an incoming string.
For example, with the string
] 2012/04/02 16:28:51:861819: abcdefg : lwersdgsdg remote=xx.xxx.xx.xxx:yyy3f] accepted and identified as: John 2012/04/02 16:28:51:862987: pump: Received data on connection {John} [
I'd like m.find() to point to "987: pump...". In order to get this, I'm using a regex with lookahead:
"(\\d{3}:)(?!\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})"
But right now m.find() is pointing to 819 (contained in 2012/04/02 16:28:51:861819).
Your regex is very near to the one you need.
In your negative lookahead, you just forgot that different timestamps are separated by several characters. So you have to add .+ or .* in your lookahead to allow for that.
Here is the regex you need:
"(\\d{3}):(?!.+\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})"
In your example, it will give you the "987" you are looking for.
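The following Java sketch applies that regex to the sample line from the question. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastMicrosDemo {
    // Three digits and a colon, not followed (anywhere later) by another timestamp.
    private static final Pattern LAST_MICROS = Pattern.compile(
            "(\\d{3}):(?!.+\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})");

    /** Returns the last three microsecond digits of the final timestamp, or null. */
    static String lastMicros(String line) {
        Matcher m = LAST_MICROS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "] 2012/04/02 16:28:51:861819: abcdefg : lwersdgsdg "
                + "remote=xx.xxx.xx.xxx:yyy3f] accepted and identified as: John "
                + "2012/04/02 16:28:51:862987: pump: Received data on connection {John} [";
        System.out.println(lastMicros(line)); // 987
    }
}
```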
If you are only interested in the last occurrence of three digits followed by a colon, wouldn't .*(\d{3}:) work?
Why don't you just use
(\\d{3}: \\w+)
and then call m.find() repeatedly until it returns false, keeping the last match?
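That looping idea could look roughly like this in Java. The class and method names are illustrative only, and the sketch simply keeps whatever the last successful find() produced.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastMatchLoop {
    private static final Pattern P = Pattern.compile("(\\d{3}: \\w+)");

    /** Walks through all matches and returns the last one, or null if there were none. */
    static String lastMatch(String line) {
        Matcher m = P.matcher(line);
        String last = null;
        while (m.find()) {
            last = m.group(1); // overwritten on each iteration, so the final value wins
        }
        return last;
    }

    public static void main(String[] args) {
        String line = "] 2012/04/02 16:28:51:861819: abcdefg : lwersdgsdg "
                + "remote=xx.xxx.xx.xxx:yyy3f] accepted and identified as: John "
                + "2012/04/02 16:28:51:862987: pump: Received data on connection {John} [";
        System.out.println(lastMatch(line)); // 987: pump
    }
}
```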
I'm using Java regular expressions to match and capture a string such as:
0::10000
A solution would be:
(0::\d{1,8})
However, the match would succeed for the input
10::10000
as well, which is wrong. Therefore, I now have:
[^\d](0::\d{1,8})
which means it must lead with any character except a number, but that means there needs to be some character before the first zero. What I really want (and what I need help with) is to say "lead with a non-number or nothing at all."
In conclusion the final solution regular expression should match the following:
0::10000kjkj0::10000
and should not match the following:
10::10000
This site may be of use if someone wants to help.
Thanks.
You need a negative lookbehind:
(?<!\d)(0::\d{1,8})
It means "match 0::\d{1,8} not preceded by \d".
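A short Java sketch of the lookbehind version, checked against the inputs from the question. The class and method names are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // (?<!\d) succeeds at the start of the input or after any non-digit character.
    private static final Pattern P = Pattern.compile("(?<!\\d)(0::\\d{1,8})");

    /** Returns the captured 0::nnn token, or null if the input contains none. */
    static String capture(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("0::10000"));     // 0::10000
        System.out.println(capture("kjkj0::10000")); // 0::10000
        System.out.println(capture("10::10000"));    // null
    }
}
```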