Java regular expressions with negative lookahead - java

I'm facing some trouble writing a regular expression in Java to parse information from a logfile.
I have a String where the structure "timeinstant: some strings with any character" is repeated from 1 to N times.
timeinstant has the format "yyyy/mm/dd hh:MM:ss:MMMMMM" (MMMMMM being microseconds).
What I'm trying to do is find the microseconds of the last timeinstant contained in an incoming string.
For example, with the string
] 2012/04/02 16:28:51:861819: abcdefg : lwersdgsdg remote=xx.xxx.xx.xxx:yyy3f] accepted and identified as: John 2012/04/02 16:28:51:862987: pump: Received data on connection {John} [
I'd like m.find() to point to "987: pump...". To get this, I'm using a regex with a negative lookahead:
"(\\d{3}:)(?!\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})"
But right now m.find() is pointing to 819 (contained in 2012/04/02 16:28:51:861819).

Your regex is very close to the one you need.
In your negative lookahead, you just forgot that consecutive timestamps are separated by other characters. So you have to add .+ or .* at the start of the lookahead to allow for them.
Here is the regex you need:
"(\\d{3}):(?!.+\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})"
In your example, it will give you the "987" you are looking for.
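A minimal sketch of how this pattern could be used from Java, assuming the single-line log string from the question. The class and method names (LastMicros, lastMicroseconds) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastMicros {
    // Three digits and a colon that are NOT followed, anywhere later on the
    // same line, by another full timestamp.
    private static final Pattern LAST = Pattern.compile(
        "(\\d{3}):(?!.+\\d{4}/\\d{2}/\\d{2}\\s\\d{2}:\\d{2}:\\d{2}:\\d{6})");

    // Returns the last three microsecond digits of the final timestamp,
    // or null when nothing matches.
    static String lastMicroseconds(String log) {
        Matcher m = LAST.matcher(log);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String log = "] 2012/04/02 16:28:51:861819: abcdefg : lwersdgsdg remote=xx.xxx.xx.xxx:yyy3f] "
            + "accepted and identified as: John 2012/04/02 16:28:51:862987: pump: "
            + "Received data on connection {John} [";
        System.out.println(lastMicroseconds(log)); // prints 987
    }
}
```

Note that . does not match line terminators by default, so this sketch assumes all timestamps sit on one line; for multi-line input, Pattern.DOTALL would probably be needed.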

If you are only interested in the last occurrence of three digits followed by a colon, wouldn't .*(\d{3}:) work?

Why don't you just use
(\\d{3}: \\w+)
and then call m.find() repeatedly until it returns false, keeping the last match?

Related

Regex match one character one time in any combination

I have read many questions and answers but still have no idea how to solve my problem.
I have a String like this:
23424(223)+32 -32
The following is allowed in the full String:
any digit, multiple times, anywhere in the String
) once, anywhere in the String
( once, anywhere in the String
+ once, anywhere in the String
spaces, multiple times, anywhere in the String
-, multiple times, anywhere in the String
My main problem is restricting a character to a single occurrence anywhere in the String. I hope you can help me.
This example String should not match: 23424(223)3+3)2 -32
You can use this regex with negative lookaheads:
^(?!.*\(.*\()(?!.*\).*\))(?!.*\+.*\+)[\d ()+-]+$
We are using 3 negative lookaheads:
(?!.*\(.*\() # Negative lookahead to disallow more than one (
(?!.*\).*\)) # Negative lookahead to disallow more than one )
(?!.*\+.*\+) # Negative lookahead to disallow more than one +
Reference: Lookarounds in regex
In Java use:
String regex = "^(?!.*\\(.*\\()(?!.*\\).*\\))(?!.*\\+.*\\+)[\\d ()+-]+$";
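A short sketch of the pattern above checked against the two strings from the question. The class and method names (SingleUseChars, isValid) are illustrative only.

```java
import java.util.regex.Pattern;

final class SingleUseChars {
    // Each negative lookahead rejects input containing a second '(', ')' or '+';
    // the character class then restricts the input to the allowed characters.
    private static final Pattern ALLOWED = Pattern.compile(
        "^(?!.*\\(.*\\()(?!.*\\).*\\))(?!.*\\+.*\\+)[\\d ()+-]+$");

    static boolean isValid(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("23424(223)+32 -32"));   // true
        System.out.println(isValid("23424(223)3+3)2 -32")); // false: two ')'
    }
}
```

With matches() the ^ and $ anchors are redundant, but keeping them lets the same pattern work with find() as well.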
You can also solve it without a regex: iterate over the string and count each character in a map with this logic:
if (map.get(str.charAt(i)) == null)
    map.put(str.charAt(i), 1);
else
    map.put(str.charAt(i), map.get(str.charAt(i)) + 1);
When the loop is done, look up each one-time character in the map and check whether it occurs more than once.
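The counting approach could be fleshed out roughly as follows. This is only a sketch: the names (CharCounts, count, atMostOnce) are hypothetical, and it checks only the one-time characters, not whether every character belongs to the allowed set.

```java
import java.util.HashMap;
import java.util.Map;

final class CharCounts {
    // Counts how often each character occurs in s.
    static Map<Character, Integer> count(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        return counts;
    }

    // True when none of the characters listed in singles occurs more than once in s.
    static boolean atMostOnce(String s, String singles) {
        Map<Character, Integer> counts = count(s);
        for (char c : singles.toCharArray()) {
            if (counts.getOrDefault(c, 0) > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(atMostOnce("23424(223)+32 -32", "()+"));   // true
        System.out.println(atMostOnce("23424(223)3+3)2 -32", "()+")); // false
    }
}
```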

Regex for multiple instances of character

In Java, using a regular expression, how would I check whether a string contains the correct number of instances of a character?
For example take the string hello.world.hello:world:. How could this string be checked to see if it contained two instances of a . or two instances of a :?
I have tried
Pattern p = Pattern.compile("[:]{2}");
Matcher m = p.matcher("hello.world.hello:world:");
m.find();
but that failed.
Edit
First I would like to say thank you for all the answers. I noticed a lot of the answers said something along the lines of "This means: zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice". So if you were checking for 3 : in a string such as Hello::World: how would you do it?
Well, using matches you could use:
"([^:]*:[^:]*){2}"
This means: "zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice".
Using find is not as good, as there may be additional : and it will just ignore them.
You can use this regex based on two lookaheads assertions:
^(?=(?:[^.]*\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)
(?=(?:[^.]*\.){2}[^.]*$) makes sure there are exactly 2 DOTS and (?=(?:[^:]*:){2}[^:]*$) asserts that there are exactly 2 colons in input string.
You can determine whether the string has exactly the given number of a certain character, say ':', by attempting to match it against a pattern of this form:
^(?:[^:]*[:]){2}[^:]*$
That reads: exactly two repetitions of a non-capturing group consisting of any number (including zero) of characters other than ':' followed by one colon, with the second repetition followed by any number of additional characters other than ':'.
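To cover the follow-up about three colons, the repetition count can be made a parameter. A minimal sketch, with a made-up class and method name (ColonCount, hasExactlyColons):

```java
import java.util.regex.Pattern;

final class ColonCount {
    // True when s contains exactly n colons: n groups of "non-colons then one colon",
    // followed only by non-colons up to the end of the input.
    static boolean hasExactlyColons(String s, int n) {
        return Pattern.matches("^(?:[^:]*:){" + n + "}[^:]*$", s);
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyColons("hello.world.hello:world:", 2)); // true
        System.out.println(hasExactlyColons("Hello::World:", 3));            // true
        System.out.println(hasExactlyColons("Hello::World:", 2));            // false
    }
}
```

The same shape works for the dot if [^:] and : are swapped for [^.] and \\. in the pattern string.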

word range or \w in negative lookbehind

I was trying to write a regex to extract the word in the position of Delhi from the text
sending to: GK Delhi, where sending to: is fixed and I don't want to capture whatever is in the position of GK. GK will always be a single word in my case. What I came up with, and expected to work, is (?<=sending to: \w )Delhi, meaning: if the text starts with sending to: and ends with Delhi, then return Delhi.
Please help me to fix this.
Three points,
\w matches a single word character. Use \w+ to match one or more or \w* to match zero or more word characters.
Don't forget about the space between GK and Delhi: \s+.
Just a note: The (?<= construct is the positive lookbehind, not negative one.
So the regex could look like this:
(?<=sending to:\s*\w+\s+)Delhi
Please also note that arbitrary-length lookbehind is only supported by very few regex engines, but you didn't say anything about the tool you are using.
Update:
Java doesn't support arbitrary-length lookbehind expressions.
The possibilities you have are:
The matched text will always be Delhi (on successful match). So if you are only checking for a match, then you could just use the regex: sending to:\s*\w+\s+Delhi.
If you want to extend the regex to other towns in future, then you could use a capturing group. The regex would be, for example, sending to:\s*\w+\s+(Delhi|Mumbai) and in Java code you would get the city name via matcher.group(1).
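The capturing-group variant might look like this in Java. It is a sketch under the assumption that the input resembles the sample text; CityExtractor and city are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CityExtractor {
    // "sending to:", one skipped word, whitespace, then the city in capturing group 1.
    private static final Pattern CITY =
        Pattern.compile("sending to:\\s*\\w+\\s+(Delhi|Mumbai)");

    // Returns the captured city name, or null when the text does not match.
    static String city(String text) {
        Matcher m = CITY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(city("sending to: GK Delhi")); // Delhi
        System.out.println(city("sending to: Delhi"));    // null, no word before the city
    }
}
```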
Please post the actual Java code showing how you are using the regex if you want more detailed advice.

How to interpret following regex?

I have a regex but I am not able to interpret it: \w\1.
I thought it would match aa, since the word character a appears twice and the first group would be a word character for this regex. But it's not behaving in this manner.
Does backreferencing work only if we place parentheses around part of the regex?
Any help would be appreciated. Thanks.
\n refers to the nth capturing group. However, there are no capturing groups in your regex to refer to. You likely want:
(\w)\1
As a Java string that would be "(\\w)\\1".
(\w) captures the matched subexpression and assigns it a one-based ordinal number, so \1 refers back to whatever the first group matched.
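A small sketch of the backreference in action; the class and method names (DoubledChar, firstDoubled) are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubledChar {
    // Group 1 captures one word character; \1 requires the same character again.
    private static final Pattern DOUBLED = Pattern.compile("(\\w)\\1");

    // Returns the first doubled character pair in s, or null if there is none.
    static String firstDoubled(String s) {
        Matcher m = DOUBLED.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubled("aa"));    // aa
        System.out.println(firstDoubled("hello")); // ll
        System.out.println(firstDoubled("abc"));   // null
    }
}
```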

Formulating a regex with a single dot

I am trying to formulate a regex for the following scenario :
The String to match : mName87.com
So, the string may consist of any number of alphanumeric characters, but can contain only a single dot anywhere in the string.
I formulated this regex: [a-zA-Z0-9.], but it matches even multiple dots (.).
What am I doing wrong here?
The regex you provided matches only a single character, not the whole string you're trying to validate. There are a few things to take care of in your scenario:
You want to match over the whole string, so your regex must start with ^ (beginning of the string) and end with $ (end of the string).
Then you want to accept any number of alpha-numeric characters, this is done with [a-zA-Z0-9]+, here the + means one or more characters.
Then match the point: \. (you must escape it here)
Finally accept more characters again.
All together the regex would then be:
^[a-zA-Z0-9]+\.[a-zA-Z0-9]+$
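As a quick check, the pattern could be exercised from Java like this. SingleDot and isValid are hypothetical names; note that this version requires at least one alphanumeric character on each side of the dot.

```java
import java.util.regex.Pattern;

final class SingleDot {
    // One or more alphanumerics, exactly one literal dot, one or more alphanumerics.
    private static final Pattern SINGLE_DOT =
        Pattern.compile("^[a-zA-Z0-9]+\\.[a-zA-Z0-9]+$");

    static boolean isValid(String s) {
        return SINGLE_DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("mName87.com")); // true
        System.out.println(isValid("a.b.c"));       // false: two dots
        System.out.println(isValid("nodot"));       // false: no dot
    }
}
```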
You can use this regex:
\\w*\\.\\w*
Try with:
^[a-zA-Z0-9]+\.[a-zA-Z]+$
Use this regular expression: ^[a-zA-Z0-9]*\.[a-zA-Z0-9]*$
EDITED:
Try
([a-zA-Z0-9]+\.[a-zA-Z0-9]+)|(\.[a-zA-Z0-9]+)|([a-zA-Z0-9]+\.)
That is: [two words with the dot in the middle] OR [a word that starts with a dot] OR [a word that ends with a dot]
