unable to understand how this regex operates on the input string? - java

I have the following regex pattern that I am splitting the String on. I don't understand how this is matched and the split happens. I do have basic understanding of regex and how it works.
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.regex.Pattern;

public class URLmatching {
    private static final Pattern SPLIT_PATTERN = Pattern.compile("(?<!(^|[A-Z0-9]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");
    public void print() throws URISyntaxException {
        System.out.println(this.getClass().getSimpleName());
        final String[] string = SPLIT_PATTERN.split(getClass().getSimpleName());
        System.out.println(Arrays.toString(string));
    }
    public static void main(String[] args) throws URISyntaxException {
        URLmatching u = new URLmatching();
        u.print();
    }
}
Output:
URLmatching
[UR, Lmatching]

This expression uses both negative look-behind, (?<!...), and positive look-ahead, (?=...).
Negative look-behind asserts that the expression inside the parentheses does not match immediately before the current position. So, in the first alternative, (?<!(^|[A-Z0-9])) checks that the current position is not the beginning of the string and that the previous character is not one of [A-Z0-9].
Look-ahead checks whether the expression matches immediately after the current position.
So, this expression will split wherever one of the two alternatives matches:
(?<!(^|[A-Z0-9]))(?=[A-Z]) - This matches if the current position is not the beginning of the string (^), the previous character is not in A-Z0-9, and the next character is in A-Z. For URLmatching it does not match anywhere, because every upper-case letter is either at the start or preceded by another upper-case letter.
An example where it would match is UrlMatching, where it matches between l and M.
(?<!^)(?=[A-Z][a-z]) - This matches if the current position is not the beginning of the string and the next two characters are an upper-case letter (A-Z) followed by a lower-case letter (a-z). In URLmatching this matches in only one place, immediately before the upper-case L, giving you the output you observe.
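To see the two alternatives at work, here is a small sketch that applies the pattern from the question to a few names. The class name CamelCaseSplitDemo, the split helper, and the third input (parseHTTPResponse) are made up for illustration and are not part of the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CamelCaseSplitDemo {
    // Same pattern as in the question.
    static final Pattern SPLIT_PATTERN =
            Pattern.compile("(?<!(^|[A-Z0-9]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])");

    // Hypothetical helper: split a name at every position the pattern matches.
    static String[] split(String name) {
        return SPLIT_PATTERN.split(name);
    }

    public static void main(String[] args) {
        // Only the second alternative matches, just before "Lm".
        System.out.println(Arrays.toString(split("URLmatching")));       // [UR, Lmatching]
        // The position between "l" and "M" satisfies the first alternative.
        System.out.println(Arrays.toString(split("UrlMatching")));       // [Url, Matching]
        // Illustrative input: first alternative before "H", second before "Re".
        System.out.println(Arrays.toString(split("parseHTTPResponse"))); // [parse, HTTP, Response]
    }
}
```

Both alternatives are zero-width, so split removes no characters; it only cuts the string at the matched positions.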

Related

How to match two string using java Regex

String 1= abc/{ID}/plan/{ID}/planID
String 2=abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If your "{anything here}" may be empty, you can use .*. The . matches any character (except line terminators by default), and * means "repeat the preceding element zero or more times". So .* matches a string of any length, including 0, made of any characters. If {anything here} should contain at least one character, use + instead of *; it means almost the same, but requires at least one character.
My suggestion: abc/.+/plan/.+/planID
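A minimal sketch of that suggestion, with one variation. The LOOSE pattern is the one suggested above; the STRICT pattern rests on an assumption the question does not state, namely that an ID never contains a slash. The class and method names are made up for illustration.

```java
class PathTemplateDemo {
    // Pattern suggested above: each {ID} is one or more of any character.
    static final String LOOSE = "abc/.+/plan/.+/planID";
    // Assumption, not from the question: an ID never contains '/'.
    static final String STRICT = "abc/[^/]+/plan/[^/]+/planID";

    static boolean matchesLoose(String s) {
        return s.matches(LOOSE);
    }

    static boolean matchesStrict(String s) {
        return s.matches(STRICT);
    }

    public static void main(String[] args) {
        System.out.println(matchesLoose("abc/1234/plan/456/planID"));  // true
        System.out.println(matchesLoose("abc//plan//planID"));         // false: .+ needs at least one character
        System.out.println(matchesLoose("abc/1/2/plan/3/planID"));     // true: .+ also crosses '/'
        System.out.println(matchesStrict("abc/1/2/plan/3/planID"));    // false: [^/]+ stops at '/'
        System.out.println(matchesStrict("abc/1234/plan/456/planID")); // true
    }
}
```

Whether the stricter form is what you want depends on what {ID} may contain; if slashes are allowed inside an ID, stay with .+.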
If {ID} can contain anything I assume it can also be empty.
So this regex should work :
str.matches("^abc.*plan.*planID$");
^abc at the beginning
.* Zero or more of any Character
planID$ at the end
Here is a small program; try it and adapt it to your requirements. It works for the example above; check your other test cases, and if one fails, please comment with that test case. I am using a regex specifically because you asked for a match using Java regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
    public static void main(String args[])
    {
        // Create the pattern to search for
        Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
        // Check whether the pattern occurs in the input.
        // find() also succeeds on a partial match; use matches() to require the whole string.
        Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
        if (isMatch.find())
            System.out.println("Yes");
        else
            System.out.println("No");
    }
}
If the line always starts with 'abc' and ends with 'planID' (the (?i) flag makes the match case-insensitive), then the following will work:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);

How regex lookaround works when used alone

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args){
        Pattern a = Pattern.compile("(?=\\.)|(?<=\\.)");
        Matcher b = a.matcher(".");
        while (b.find()) System.out.print("+");
    }
}
I've been reading the lookaround section on Regular-Expressions.info and trying to figure out how it works, and I'm stuck on this. When I run the code above, the result is ++, which I don't understand: "." is the only token to match the pattern against, and apparently there's nothing behind or ahead of the ".", so how can it match twice?
As the regex engine advances through the input, it considers both characters and positions before and after characters as distinct positions within the input.
Your input has 3 positions:
1. Just before the first character
2. The first character
3. Just after the first character
Position 1 matches (?=\\.).
Position 3 matches (?<=\\.).
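One way to check this is to print where each match starts. The sketch below uses a made-up helper, matchStarts, that collects Matcher.start() for every match. Java reports zero-based indices between characters, so "just before the first character" shows up as 0 and "just after the first character" as 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundPositionsDemo {
    // Hypothetical helper: start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start()); // each match is empty, so start() == end()
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("(?=\\.)", "."));           // [0]
        System.out.println(matchStarts("(?<=\\.)", "."));          // [1]
        System.out.println(matchStarts("(?=\\.)|(?<=\\.)", "."));  // [0, 1]
    }
}
```

Both matches are zero-width: neither consumes the "." itself, which is why a single character can yield two matches.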

How to use regular expression to replace non-digits and math operators together?

How do I only keep chars of [0-9] and [+-*/] in a string in Java? My approach is to use a union to create a single character class comprised of [0-9] and [+-*/] character classes, but I got an empty string.
Here is an example string I use: 10+2*2-5
public void cleanup(String s){
    String regex = "[^0-9[^+-*//]]";
    String tmp = s.replaceAll(regex, "");
    System.out.println(tmp);
}
You want your character class ([...]) to include the range 0-9 and the additional characters *, /, - and +. All you need is to put them one after the other and escape - (\\-), unless it's the last character. Then, put a negation (^) at the very beginning, inside the brackets:
public class Example {
    public static void main(String[] args) {
        String test = "a3f6+[,b7*\"d/-8u";
        System.out.println(test.replaceAll("[^0-9/*+-]", ""));
    }
}
Outputs
36+7*/-8
What about s/[^0-9]|[^+-*//]//g for the regex?
The problem is the way you use - in the second part. If you put - in the middle of a character class, as in [^+-*/], it is treated as part of a range expression, just like in 0-9. Put the - at the end of the class so that it isn't treated as a range.
The following expression should do what you are after:
[^0-9*/+-]
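As a sketch, the cleanup method from the question could use this expression as follows. Returning the result instead of printing it is a choice made here for illustration, and the class name and second input are made up.

```java
class ExpressionCleanupDemo {
    // Remove everything that is not a digit or one of * / + -.
    // The '-' comes last in the class, so it is a literal and not a range.
    static String cleanup(String s) {
        return s.replaceAll("[^0-9*/+-]", "");
    }

    public static void main(String[] args) {
        System.out.println(cleanup("10+2*2-5"));           // 10+2*2-5
        System.out.println(cleanup("10 + 2 * 2 - 5 = x")); // 10+2*2-5
    }
}
```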

Translate regex for Java

How can this regex, ^\d+(?:[\.\,]\d+)?$, be made usable in Java?
input.matches("^\\d+(?:[\\.\\,]\\d+)?$"); // Redundant character escape
Your expression is fine. Note: String.matches requires the whole input to match, so it behaves as if ^ were at the start and $ at the end of your pattern; the explicit anchors are redundant. Also, you do not need to escape . or , inside a character class.
input.matches("\\d+(?:[,.]\\d+)?");
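The implicit anchoring of matches can be seen by comparing it with Matcher.find, which accepts a match anywhere in the input. This is only a sketch; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Simplified pattern from above, without ^ and $.
    static final Pattern NUMBER = Pattern.compile("\\d+(?:[,.]\\d+)?");

    // matches(): the whole input must fit the pattern.
    static boolean isNumber(String input) {
        return NUMBER.matcher(input).matches();
    }

    // find(): a match anywhere in the input is enough.
    static boolean containsNumber(String input) {
        return NUMBER.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isNumber("123,123"));        // true
        System.out.println(isNumber("123..123"));       // false
        System.out.println(containsNumber("123..123")); // true: "123" alone matches
    }
}
```

If you switch from matches to find, the ^ and $ anchors are no longer redundant and need to go back into the pattern.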
Your code executes fine
public static void main(String[] args) throws Exception {
    String input = "123";
    System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
    input = "123.123";
    System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
    input = "123,123";
    System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
    input = "123..123";
    System.out.println(input.matches("^\\d+(?:[\\.\\,]\\d+)?$"));
}
prints
true
true
true
false
As per regex101, your matched string will start with one or more digits, followed by a non-capturing group that occurs once or not at all, containing a literal . or , and one or more digits, and then end.
That's what you have, that's what will match.

Regex of base classes

I am trying to create a hexadecimal calculator but I have a problem with the regex.
Basically, I want the string to only accept 0-9, A-E, and special characters +-*_
My code keeps returning false no matter how I change the regex, and adding the asterisk gives me a PatternSyntaxException.
public static void main(String[] args) {
    String input = "1A_16+2B_16-3C_16*4D_16";
    String regex = "[0-9A-E+-_]";
    System.out.println(input.matches(regex));
}
Also whenever I add the * as part of the regex it gives me this error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 9
[0-9A-E+-*_]+
         ^
You need to match more than one character with your regex. As it currently stands you only match one character.
To match one or more characters add a + to the end of the regex
[0-9A-E+-_]+
Also to match a * just add a star in the brackets so the final regex would be
[0-9A-E+\\-_*]+
You need to escape the - otherwise the regex thinks you want to accept all character between + and _ which is not what you want.
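A short sketch comparing the three variants on the input from the question. The class and constant names are made up, and "1Z_16" is an illustrative input chosen to show the effect of the unescaped range.

```java
class HexInputDemo {
    // Original regex: one character only, with + to _ read as a range.
    static final String SINGLE = "[0-9A-E+-_]";
    // With a quantifier, but still containing the + to _ range and no '*'.
    static final String RANGE = "[0-9A-E+-_]+";
    // Hyphen escaped and '*' added, as suggested above.
    static final String ESCAPED = "[0-9A-E+\\-_*]+";

    public static void main(String[] args) {
        String input = "1A_16+2B_16-3C_16*4D_16";
        System.out.println(input.matches(SINGLE));    // false: more than one character
        System.out.println(input.matches(RANGE));     // false: '*' is not in the class
        System.out.println(input.matches(ESCAPED));   // true
        // The unescaped range from '+' to '_' also lets in letters such as Z.
        System.out.println("1Z_16".matches(RANGE));   // true
        System.out.println("1Z_16".matches(ESCAPED)); // false
    }
}
```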
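Your regex is syntactically OK and should throw no exceptions; just add + at the end of the regex, which means one or more characters like those in brackets. It seems you wanted * as well, and note that the unescaped - between + and _ is read as a range, so escape it if you want only those exact characters.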
"[0-9A-E+-_]+"
public static boolean isValidCode (String code) {
    Pattern p = Pattern.compile("[fFtTvV\\-~^<>()]+"); //a-zA-Z
    Matcher m = p.matcher(code);
    return m.matches();
}
