How regex lookaround works when used alone - java

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        Pattern a = Pattern.compile("(?=\\.)|(?<=\\.)");
        Matcher b = a.matcher(".");
        while (b.find()) System.out.print("+");
    }
}
I've been reading the lookaround section on Regular-Expressions.info and trying to figure out how it works, and I'm stuck on this. When I run the code above, the output is ++, which I don't understand: "." is the only character to match the pattern against, and apparently there is nothing behind or ahead of it, so how can the pattern match twice?

As the regex engine advances through the input, it treats each character, and each gap before and after a character, as a distinct position.
Your input has 3 positions:
1. Just before the first character
2. The first character
3. Just after the first character
Position 1 matches (?=\\.) because a . lies immediately ahead of it.
Position 3 matches (?<=\\.) because a . lies immediately behind it.
Lookarounds are zero-width, so both matches are empty and neither consumes the . itself.
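To see where the two matches land, one option is to record the start index of each match instead of printing a +. This is only a sketch: the class name LookaroundPositions and the helper matchStarts are made up for illustration, and the pattern is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundPositions {
    // Hypothetical helper: returns the start index of every match of the question's pattern.
    static List<Integer> matchStarts(String input) {
        Matcher m = Pattern.compile("(?=\\.)|(?<=\\.)").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            // Both alternatives are zero-width, so start() == end() for every match.
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("."));  // [0, 1]
    }
}
```

Index 0 is the gap before the dot (the look-ahead succeeds there) and index 1 is the gap after it (the look-behind succeeds there), which accounts for the two + signs.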

Related

Why does this regex fails to check accurately?

I have the following method, which checks a given string against three regexes in stages. For some reason the regexes fail to reject some inputs, even though, as far as I can tell, they are correct. Can someone point out what I am doing wrong?
I have the following code:
public class App {
    public static void main(String[] args) {
        String identifier = "urn:abc:de:xyz:234567.1890123";
        if (identifier.matches("^urn:abc:de:xyz:.*")) {
            System.out.println("Match ONE");
            if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[0-9]{1,7}.*")) {
                System.out.println("Match TWO");
                if (identifier.matches("^urn:abc:de:xyz:[0-9]{6,12}.[a-zA-Z0-9.-_]{1,20}$")) {
                    System.out.println("Match Three");
                }
            }
        }
    }
}
Ideally, this code should generate the output
Match ONE
Match TWO
Match Three
Only when the identifier is "urn:abc:de:xyz:234567.1890123.abd12", but it produces the same output even when the identifier does not match the intended format, as with the following inputs:
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ANC"
"urn:abc:de:xyz:234567.1890123"
"urn:abc:de:xyz:234567.1890ACB.123"
I don't understand why it allows alphanumeric characters after the ., or why it does not care about the characters after the second ..
I would like my regex to check that the string has the following format:
1. The string starts with urn:abc:de:xyz:
2. Then it has 6 to 12 digits [0-9] (234567).
3. Then it has the decimal point .
4. Then it has 1 to 7 digits [0-9] (1890123).
5. Then it has the decimal point .
6. Finally it has 1 to 20 alphanumeric or special characters (ABC123.-_12).
This is a valid string for my regex: urn:abc:de:xyz:234567.1890123.ABC123.-_12
This is an invalid string for my regex, as it is missing the elements from point 6:
urn:abc:de:xyz:234567.1890123
This is also an invalid string for my regex, as it is missing the elements from point 4 (it has ABC instead of decimal digits):
urn:abc:de:xyz:234567.1890ABC.ABC123.-_12
This part of the regex:
[0-9]{6,12}.[0-9]{1,7} matches 6 to 12 digits, followed by any character, followed by 1 to 7 digits.
To match a literal dot, it needs to be escaped. Try this:
^urn:abc:de:xyz:[0-9]{6,12}\.[0-9]{1,7}\.[a-zA-Z0-9\-_]{1,20}$
That pattern does not allow a dot inside the final segment. This one accepts one or more dot-separated segments at the end of the string, as in your example ABC123.-_12:
^urn:abc:de:xyz:\d{6,12}\.\d{1,7}(?:\.[\w-]{1,20})+$
In a Java string literal, write each backslash twice (\\. and \\d).
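As a rough check, the second pattern can be run against the inputs from the question. The class name UrnValidator and the method isValid are made up for this sketch. The only change to the pattern is doubling the backslashes for the Java string literal.

```java
import java.util.regex.Pattern;

class UrnValidator {
    // The second pattern from the answer, with each backslash doubled for a Java string literal.
    static final Pattern URN = Pattern.compile(
            "^urn:abc:de:xyz:\\d{6,12}\\.\\d{1,7}(?:\\.[\\w-]{1,20})+$");

    // Hypothetical helper: true only if the whole identifier has the required shape.
    static boolean isValid(String identifier) {
        return URN.matcher(identifier).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "urn:abc:de:xyz:234567.1890123.ABC123.-_12",  // valid example from the question
            "urn:abc:de:xyz:234567.1890123",              // missing the final segment
            "urn:abc:de:xyz:234567.1890ABC.ABC123.-_12"   // letters in the second numeric part
        };
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```

With these samples, only the first identifier is accepted. The other two are rejected because the escaped dots and the digit-only second part are now enforced.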

Regular expression not matching on first and last word of string

I am trying to write a Java program that looks for specific words in a string. I have it working for the most part, but it doesn't seem to match if the word is the first or last word in the string. Here is an example:
"trying to find the first word".matches(".*[^a-z]find[^a-z].*") //returns true
"trying to find the first word".matches(".*[^a-z]trying[^a-z].*") //returns false
"trying to find the first word".matches(".*[^a-z]word[^a-z].*") //returns false
Any idea how to make this match on any word in the string?
Thanks in advance,
Craig
The problem is the character class [^a-z] before and after the word: it has to consume an actual character, so it cannot match at the start or end of the string. I think what you actually want is a word boundary \b (as per ColinD's comment) rather than "not a character in the a-z range". As pointed out in the comments (thanks), you'll also need to handle the start and end of string cases.
So try, e.g.:
"(?:^|.*\\b)trying(?:\\b.*|$)"
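You can make the surrounding character classes optional with ?. Check the link below and test more cases to see whether this gives the proper output:
https://regex101.com/r/oP5zB8/1
(.*[^a-z]?trying[^a-z]?.*)
Note that with both classes optional, this also matches when trying is part of a longer word, such as retrying.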
I think (^|^.*[^a-z])trying([^a-z].*$|$) just fits your need.
Or (?:^|^.*[^a-z])trying(?:[^a-z].*$|$) for non capturing parentheses.
You can try the following program to check for words at the start and end of a string:
package com.ajsodhi.utilities;
import java.util.regex.Pattern;
public class RegExStartEndWordCheck {
public static final String stringToMatch = "StartingsomeWordsEndWord";
public static void main(String[] args) {
String regEx = "Starting[A-Za-z0-9]{0,}EndWord";
Pattern patternOriginalSign = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
boolean OriginalStringMatchesPattern = patternOriginalSign.matcher(stringToMatch).matches();
System.out.println(OriginalStringMatchesPattern);
}
}
You should use the word boundary \b, which matches at the beginning or end of a word, instead of [^a-z], which requires an actual character on each side of the word.
Something like:
".*\\bfind\\b.*"

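The word-boundary approach from the answers above can be sketched as follows. The class name WordBoundaryDemo and the helper containsWord are made up for illustration. The sketch also uses Matcher.find() rather than matches() with surrounding .*, which is a substitution on my part, not something the answers prescribe.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if 'word' occurs as a whole word anywhere in 'text'.
    static boolean containsWord(String text, String word) {
        // Pattern.quote guards against regex metacharacters in the word itself.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // find() searches anywhere in the text, so no leading or trailing .* is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "trying to find the first word";
        System.out.println(containsWord(text, "trying"));  // first word
        System.out.println(containsWord(text, "find"));    // middle word
        System.out.println(containsWord(text, "word"));    // last word
        System.out.println(containsWord("retrying to find", "trying"));  // part of a longer word
    }
}
```

The first three calls report a match. The last one does not, because there is no word boundary between "re" and "trying".
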
unable to understand how this regex operates on the input string?

I have the following regex pattern that I am splitting a String on. I don't understand how it matches and where the split happens. I do have a basic understanding of regex and how it works.
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.regex.Pattern;

public class URLmatching {
    private static final Pattern SPLIT_PATTERN = Pattern.compile("(?<!(^|[A-Z0-9]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");

    public void print() throws URISyntaxException {
        System.out.println(this.getClass().getSimpleName());
        final String[] string = SPLIT_PATTERN.split(getClass().getSimpleName());
        System.out.println(Arrays.toString(string));
    }

    public static void main(String[] args) throws URISyntaxException {
        URLmatching u = new URLmatching();
        u.print();
    }
}
Output:
URLmatching
[UR, Lmatching]
This expression uses both negative look-behind (?<!) and positive look-ahead (?=).
Negative look-behind succeeds only if the expression inside the parentheses does not match immediately before the current position. So, in the first alternative, (?<!(^|[A-Z0-9])) checks that the current position is not the beginning of the string and that the previous character is not one of [A-Z0-9].
Look-ahead checks if the expression matches immediately after the current position.
So, this expression will split wherever one of the two conditions matches:
(?<!(^|[A-Z0-9]))(?=[A-Z]) - This matches if the current position is not the beginning of the string (^), the previous character is not in A-Z0-9, and the next character is in A-Z. In other words, it won't match anywhere in URLmatching.
An example where it would match is UrlMatching, where it matches between l and M.
(?<!^)(?=[A-Z][a-z]) - This matches if the current position is not the beginning of the string, and the next two characters are an upper-case letter (A-Z) followed by a lower-case letter (a-z). In URLmatching this only matches in one place, immediately before the upper-case L, giving you the output you observe.
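To check the explanation above against a few more inputs, here is a small sketch using the same pattern. The class name CamelCaseSplit, the helper split, and the extra sample name parseHTTPResponse are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CamelCaseSplit {
    // Same pattern as in the question.
    static final Pattern SPLIT_PATTERN =
            Pattern.compile("(?<!(^|[A-Z0-9]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");

    // Hypothetical helper: splits a name at every position where either alternative matches.
    static List<String> split(String name) {
        return Arrays.asList(SPLIT_PATTERN.split(name));
    }

    public static void main(String[] args) {
        System.out.println(split("URLmatching"));        // [UR, Lmatching]  (second alternative, before "Lm")
        System.out.println(split("UrlMatching"));        // [Url, Matching]  (first alternative, between l and M)
        System.out.println(split("parseHTTPResponse"));  // [parse, HTTP, Response]
    }
}
```

In the last sample both alternatives fire: the first splits between e and H, and the second splits before Re, which keeps the acronym HTTP together.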

Translate regex for Java

How can this regex ^\d+(?:[\.\,]\d+)?$ be made usable with Java?
input.matches("^\\d+(?:[\\.\\,]\\d+)?$"); // Redundant character escape
Your expression is fine. Note: matches() must match the entire input, so it behaves as if ^ were added at the start and $ at the end of your pattern. Also, you do not need to escape . or , inside a character class.
input.matches("\\d+(?:[,.]\\d+)?");
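The whole-input behaviour of matches() can be compared with find(), which looks for the pattern anywhere in the input. This is a minimal sketch: the class name MatchesVsFind and the two helper methods are made up for illustration, and the pattern is the simplified one shown above.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Simplified pattern from the answer: no anchors, no escapes inside the character class.
    static final Pattern NUMBER = Pattern.compile("\\d+(?:[,.]\\d+)?");

    // Hypothetical helper: the entire input must be a number.
    static boolean wholeInput(String s) {
        return NUMBER.matcher(s).matches();
    }

    // Hypothetical helper: a number may appear anywhere in the input.
    static boolean anywhere(String s) {
        return NUMBER.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("123,45"));    // true
        System.out.println(wholeInput("abc123"));    // false: "abc" is not part of a number
        System.out.println(anywhere("abc123"));      // true: "123" is found inside the input
        System.out.println(wholeInput("123..123"));  // false
    }
}
```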
Your code executes fine
public static void main(String[] args) throws Exception {
String input = "123";
System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
input = "123.123";
System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
input = "123,123";
System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
input = "123..123";
System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
}
prints
true
true
true
false
As per regex101, your pattern matches a string that starts with one or more digits, optionally followed by a non-capturing group containing a literal . or , and one or more digits, and then ends.
That's what you have, that's what will match.

Regex of base classes

I am trying to create a hexadecimal calculator but I have a problem with the regex.
Basically, I want the string to only accept 0-9, A-E, and the special characters +-*_
My code keeps returning false no matter how I change the regex, and adding the asterisk gives me a PatternSyntaxException.
public static void main(String[] args) {
String input = "1A_16+2B_16-3C_16*4D_16";
String regex = "[0-9A-E+-_]";
System.out.println(input.matches(regex));
}
Also whenever I add the * as part of the regex it gives me this error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 9
[0-9A-E+-*_]+
         ^
You need to match more than one character with your regex. As it currently stands, it matches only a single character.
To match one or more characters, add a + to the end of the regex:
[0-9A-E+-_]+
To also match a *, add a star inside the brackets, so the final regex (as a Java string) would be
[0-9A-E+\\-_*]+
You need to escape the -, otherwise the regex treats +-_ as a range and accepts every character between + and _, which is not what you want.
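The difference between the unescaped range and the escaped hyphen can be checked with a short sketch. The class name HexInputCheck, the constant names, and the helper compiles are made up for illustration. The regexes and the sample input are the ones from the question and the answer above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HexInputCheck {
    static final String RANGE = "[0-9A-E+-_]+";       // +-_ is a range from '+' to '_'
    static final String ESCAPED = "[0-9A-E+\\-_*]+";  // hyphen escaped, star added

    // Hypothetical helper: true if the regex compiles without a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "1A_16+2B_16-3C_16*4D_16";
        System.out.println(input.matches(RANGE));       // false: '*' sorts just before '+', outside the range
        System.out.println(input.matches(ESCAPED));     // true
        System.out.println("1F".matches(RANGE));        // true: 'F' falls inside the '+'..'_' range
        System.out.println("1F".matches(ESCAPED));      // false
        System.out.println(compiles("[0-9A-E+-*_]+"));  // false: "+-*" is a reversed range
    }
}
```

The last line reproduces the exception from the question: with the star placed right after the unescaped hyphen, "+-*" is read as a range whose end sorts before its start.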
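Your regex is OK as far as syntax goes, so there should be no exception; just add + at the end of the regex, which means one or more characters like those in the brackets:
"[0-9A-E+-_]+"
Note that +-_ here is a range from + to _, which does not include *, so this pattern still rejects your sample input. Since it seems you wanted * as well, escape the - and add the star, as in "[0-9A-E+\\-_*]+".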
public static boolean isValidCode (String code) {
Pattern p = Pattern.compile("[fFtTvV\\-~^<>()]+"); //a-zA-Z
Matcher m = p.matcher(code);
return m.matches();
}
