Differences in RegEx syntax between Python and Java

I have a working regex in Python that I am trying to convert to Java. There seems to be a subtle difference between the two implementations.
The regex is itself meant to match a regex literal. The regex in question is:
/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)
One of the strings it has problems with is: /\s+/;
The regex is not supposed to match the trailing ;. In Python the regex works correctly (it does not match the trailing ;), but in Java the match includes the ;.
The Question(s):
What can I do to get this RegEx working in Java?
Based on what I read here there should be no difference for this RegEx. Is there somewhere a list of differences between the RegEx implementations in Python vs Java?

Java parses regular expressions differently from Python in a small set of cases. Here the unescaped [ characters nested inside character classes were causing the problem: Python treats a [ inside a character class as a literal, whereas Java treats it as the start of a nested class (a union), so it has to be escaped in Java.
The original RegEx (for Python):
/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)
The fixed RegEx (for Java and Python):
/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)
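As a rough check, the fixed regex can be written as a Java string literal (every backslash doubled) and run against the problem input. This is only a sketch: the class and method names are made up for illustration, and it uses find() to locate the first regex literal in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLiteralMatcher {
    // The fixed regex from above as a Java string literal:
    // each backslash of the regex is written twice in the source.
    static final Pattern REGEX_LITERAL = Pattern.compile(
            "/(\\\\.|[^\\[/\\\\\\n]|\\[(\\\\.|[^\\]\\\\\\n])*\\])+/([gim]+\\b|\\B)");

    // Returns the first regex literal found in the input, or null if there is none.
    static String firstLiteral(String input) {
        Matcher m = REGEX_LITERAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLiteral("/\\s+/;"));  // the input is /\s+/;
        System.out.println(firstLiteral("/[/]/g"));   // a slash inside a character class
    }
}
```

With the escaped brackets, the match for /\s+/; stops before the ; and the character class in /[/]/g is consumed as a unit.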

The obvious difference between Java and Python is that in a Java string literal every backslash must itself be escaped, so patterns end up with many more backslashes.
Moreover, you are probably running into a mismatch between the matching methods rather than a difference in the regex notation itself:
Given the Java code
String regex, input; // initialized to something
Matcher matcher = Pattern.compile( regex ).matcher( input );
Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. Python's closest equivalent is re.fullmatch( regex, input ) (Python 3.4 and later); in older versions the same result can be achieved by using re.match( regex, input ) with a regex that ends with $.
Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
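A small sketch of the three Java methods side by side. The pattern \d+ and the sample inputs are chosen only for illustration, and the helper names are hypothetical:

```java
import java.util.regex.Pattern;

public class MatchModes {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole input must match.
    static boolean whole(String s) { return DIGITS.matcher(s).matches(); }

    // lookingAt(): the match must start at the beginning of the input.
    static boolean prefix(String s) { return DIGITS.matcher(s).lookingAt(); }

    // find(): the match may occur anywhere in the input.
    static boolean anywhere(String s) { return DIGITS.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123abc", "abc123"}) {
            System.out.printf("%-6s matches=%b lookingAt=%b find=%b%n",
                    s, whole(s), prefix(s), anywhere(s));
        }
    }
}
```

For "123abc" only matches() fails, and for "abc123" only find() succeeds.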
For more details also read Java's documentation of Matcher and compare to the Python documentation.
Since you said that isn't the problem, I decided to run a test: http://ideone.com/6w61T
It looks like Java is doing exactly what you need (group 0, the entire match, doesn't contain the ;). Your problem is elsewhere.

Related

How can I make this into a Java regex?

I used regex101 to make my expression, and it looks like this using their symbols
\d+ [+-\/*] \d*
Basically, I want the user to enter something like 123 + 123, where the entire statement is one string with exactly one space after the first number and one space after the operator.
The above expression works, but it doesn't carry over unchanged to Java.
I thought these symbols were universal, but I guess not. Any ideas how to convert this to the proper syntax?
Regular expressions are not universal. In general, no two regular expression systems are the same.
Java does not have regular expressions. Some Java classes support regular expressions. The Pattern class defines the regular expressions that are used by some Java classes, including Matcher, which seems likely to be the class you are using.
As already identified in the comments, \ is the escape-the-next-character character in Java. If you want to represent \ in a String, you must use \\. For example, \d in a regular expression must be written \\d in a Java String.
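A minimal sketch of what the doubled backslashes look like for this question. Two details are my own adjustments rather than part of the original expression: the - is moved to the front of the character class, because in [+-\/*] the part +-\/ is a range from + to / that also admits , and ., and the second number uses \d+ on the assumption that it is required. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class ArithmeticInput {
    // Regex: \d+ [-+*/] \d+   (each backslash is doubled in Java source)
    static final Pattern EXPR = Pattern.compile("\\d+ [-+*/] \\d+");

    // True when the whole input is: number, one space, operator, one space, number.
    static boolean isValid(String input) {
        return EXPR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123 + 123"));   // single spaces around the operator
        System.out.println(isValid("123  + 123"));  // two spaces after the first number
        System.out.println(isValid("123 , 123"));   // a comma is not an operator
    }
}
```

Because matches() is used, no ^ or $ anchors are needed.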
You can simply use capturing groups () and design the regex as you wish. This regex might be one way to do so:
((\d+\s)(\+|\-)(\s\d+))
It has four groups, and group 1 wraps the whole expression, so you can refer to the entire input as $1.
You can also escape, with \, any characters that the host language requires.
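Here is a sketch of reading those groups from Java, with the regex above copied into a string literal. The helper name and the trimming of whitespace are assumptions for illustration. Note that this regex only accepts + and - as operators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // ((\d+\s)(\+|\-)(\s\d+)) with each backslash doubled for Java source.
    static final Pattern EXPR = Pattern.compile("((\\d+\\s)(\\+|\\-)(\\s\\d+))");

    // Returns {left operand, operator, right operand}, or null if the input does not match.
    static String[] parts(String input) {
        Matcher m = EXPR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Groups 2 and 4 include the whitespace captured by \s, so trim it off.
        return new String[] { m.group(2).trim(), m.group(3), m.group(4).trim() };
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", parts("123 + 456")));
    }
}
```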

Translate php regex to java

I am having trouble translating this PHP regex, /^([-\.\w]+)$/, to a Java regex.
I tried ^([-\\.\\w]+)$ but it doesn't work.
The regex is used to validate a string used as a file name.
In PHP, têst.ext is not allowed, but in Java it is.
In java, it would be:
str.matches("[-.\\w]+")
There is no need to escape the dot inside a character class in any language/tool.
There is no need to use ^ or $ with Java's String#matches(), because anchoring is implied (the whole string must match).
There is no need to create a group (the parentheses).
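A sketch of the check, with one caveat that may explain the têst.ext discrepancy. On standard Java SE, \w covers only ASCII letters, digits and underscore unless UNICODE_CHARACTER_CLASSES is set. Other runtimes may behave differently, which is an assumption on my part about the asker's environment. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class FileNameCheck {
    // Default behaviour: \w is ASCII-only on standard Java SE.
    static boolean asciiName(String s) {
        return s.matches("[-.\\w]+");
    }

    // With UNICODE_CHARACTER_CLASSES, \w also accepts non-ASCII letters.
    static final Pattern UNICODE_NAME =
            Pattern.compile("[-.\\w]+", Pattern.UNICODE_CHARACTER_CLASSES);

    static boolean unicodeName(String s) {
        return UNICODE_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "t\u00east.ext"; // têst.ext, written with a Unicode escape
        System.out.println(asciiName("test.ext"));
        System.out.println(asciiName(accented));
        System.out.println(unicodeName(accented));
    }
}
```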

Regular Expressions match randomly instead of around quotes in Java

I am writing a program in Java that uses regular expressions, and I have run into an error. I am basically making a programming language and parsing it line by line. Where it goes wrong is when it tries to find strings. I have to match in the order identifiers, strings, then integers, but I can't have the identifier pattern matching inside strings. Strings are defined by having double quotes around them. Here is my expression:
[^"]([^\W][a-zA-Z0-9]+)[^"]
I cannot show my Java code, because it is all over the place, with the way I programmed it. It should just be the expression, and that's it.
It would be helpful if you can explain more what exactly you are trying to match. E.g. give some example texts and what your expression currently outputs for them.
At the moment I think you are trying to match Strings, text that is surrounded by ". For example foofoo"text123"barbar and your desired output is text123.
When defining a regular expression in a Java string literal, you need to escape characters such as " and \ with a backslash. Here is a Java-usable version of the regex you have provided:
Pattern pattern = Pattern.compile("[^\"]([^\\W][a-zA-Z0-9]+)[^\"]");
You may then use the Pattern object together with a Matcher object to find your text. See the Javadoc for Pattern.
Here is a Pattern that matches text surrounded by ":
Pattern pattern = Pattern.compile("\"[^\"]*\"");
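A sketch of using that pattern with a Matcher to pull out every quoted string on a line. The capturing group around the contents, and the class and method names, are additions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStrings {
    // Same pattern as above, plus a capturing group around the text between the quotes.
    static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the contents of every double-quoted string in the line, in order.
    static List<String> contents(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(contents("foofoo\"text123\"barbar"));
    }
}
```

Running the string pattern first and then applying the identifier pattern only to the text outside the matches is one way to keep identifiers from matching inside strings.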

How do I translate this Perl regular expression into Java?

How would you translate this Perl regex into Java?
/pattern/i
While it compiles, it does not match "PattErn" for me:
Pattern p = Pattern.compile("/pattern/i");
Matcher m = p.matcher("PattErn");
System.out.println(m.matches()); // prints "false"
How would you translate this Perl regex into Java?
/pattern/i
You can't.
There are a lot of reasons for this. Here are a few:
Java doesn't support as expressive a regex language as Perl does. It lacks grapheme support (like \X) and full property support (like \p{Sentence_Break=SContinue}), is missing Unicode named characters, doesn't have a (?|...|...|) branch reset operator, doesn’t have named capture groups or a logical \x{...} escape before Java 7, has no recursive regexes, etc etc etc. I could write a book on what Java is missing here: Get used to going back to a very primitive and awkward to use regex engine compared with what you’re used to.
Another even worse problem is that you have lookalike faux amis like \w and \b and \s, and even \p{alpha} and \p{lower}, which behave differently in Java compared with Perl; in some cases the Java versions are completely unusable and buggy. That’s because Perl follows UTS#18 but before Java 7, Java did not. You must add the UNICODE_CHARACTER_CLASSES flag from Java 7 to get these to stop being broken. If you can’t use Java 7, give up now, because Java had many many many other Unicode bugs before Java 7 and it just isn’t worth the pain of dealing with them.
Java handles linebreaks via ^ and $ and ., but Perl expects Unicode linebreaks to be \R. You should look at UNIX_LINES to understand what is going on there.
Java does not by default apply any Unicode casefolding whatsoever. Make sure to add the UNICODE_CASE flag to your compilation. Otherwise you won’t get things like the various Greek sigmas all matching one another.
Finally, it is different because at best Java only does simple casefolding, while Perl always does full casefolding. That means that you won’t get \xDF to match "SS" case insensitively in Java, and similar related issues.
In summary, the closest you can get is to compile with the flags
CASE_INSENSITIVE | UNICODE_CASE | UNICODE_CHARACTER_CLASSES
which is equivalent to an embedded "(?iuU)" in the pattern string.
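A small sketch of what the case flags change in practice. The sample characters (é/É and ß) are picked only for illustration, and the helper names are hypothetical:

```java
import java.util.regex.Pattern;

public class CaseFolding {
    // CASE_INSENSITIVE alone only folds US-ASCII letters.
    static boolean asciiOnly(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // Adding UNICODE_CASE extends folding to non-ASCII letters (simple folding only).
    static boolean unicodeAware(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiOnly("pattern", "PattErn"));     // ASCII letters fold
        System.out.println(asciiOnly("\u00e9", "\u00c9"));       // é vs É without UNICODE_CASE
        System.out.println(unicodeAware("\u00e9", "\u00c9"));    // é vs É with UNICODE_CASE
        System.out.println(unicodeAware("\u00df", "SS"));        // ß vs SS needs full casefolding
    }
}
```

The last call stays false because ß to "SS" is a full casefolding, which Java does not perform.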
And remember that match in Java doesn’t mean match, perversely enough.
EDIT
And here’s the rest of the story...
While it compiles, it does not match "PattErn" for me:
Pattern p = Pattern.compile("/pattern/i");
Matcher m = p.matcher("PattErn");
System.out.println(m.matches()); // prints "false"
You shouldn’t have slashes around the pattern.
The best you can do is to translate
$line = "I have your PaTTerN right here";
if ($line =~ /pattern/i) {
print "matched.\n";
}
this way
import java.util.regex.*;

String line = "I have your PaTTerN right here";
String pattern = "pattern";
// The flag constants live on Pattern, so they must be qualified
// (or statically imported).
Pattern regcomp = Pattern.compile(pattern,
        Pattern.CASE_INSENSITIVE
        | Pattern.UNICODE_CASE
        // comment next line out for legacy Java \b\w\s breakage
        | Pattern.UNICODE_CHARACTER_CLASSES
);
Matcher regexec = regcomp.matcher(line);
if (regexec.find()) {
    System.out.println("matched");
}
There, see how much easier that isn’t? :)
Java regexes do not have delimiters, and modifiers are passed as a separate argument:
Pattern p = Pattern.compile("pattern", Pattern.CASE_INSENSITIVE);
The Perl equivalent of:
/pattern/i
in Java would be:
Pattern p = Pattern.compile("(?i)pattern");
Or simply do:
System.out.println("PattErn".matches("(?i)pattern"));
Note that "string".matches("pattern") matches the pattern against the entire input string. In other words, the following would return false:
"foo pattern bar".matches("pattern")

Differences in regex syntax in Python and Java

I have the following working regex in Python and I am trying to convert it to Java. I thought that regexes work the same in both languages, but evidently they don't.
Python regex: ^\d+;\d+-\d+
My Java attempt: ^\\d+;\\d+-\\d+
Example strings that should be matched:
3;1-2,2-3
68;12-15,1-16,66-1,1-2
What is the right solution in Java?
Thank you, Tomas
The regex is faulty for the input. I don't know what you were doing in Python, but this doesn't match the whole strings in any regex engine I know.
This should do the trick (Java string escaping is omitted):
^\d+;(\d+-\d+,?)+
I.e. you need to continue matching the number pairs separated by commas.
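A sketch comparing the patterns in Java. One plausible explanation for the Python behaviour, which is an assumption since the Python code is not shown, is that re.match was used: it only anchors at the start, so lookingAt() is the nearer Java counterpart. The STRICT variant, which requires a comma between pairs and rejects a trailing comma, is my own addition, as are the class and method names.

```java
import java.util.regex.Pattern;

public class PairList {
    static final Pattern ORIGINAL = Pattern.compile("^\\d+;\\d+-\\d+");
    static final Pattern ANSWER = Pattern.compile("^\\d+;(\\d+-\\d+,?)+");
    // Hypothetical stricter variant: pairs must be separated by exactly one comma.
    static final Pattern STRICT = Pattern.compile("\\d+;\\d+-\\d+(,\\d+-\\d+)*");

    static boolean originalPrefix(String s) { return ORIGINAL.matcher(s).lookingAt(); }
    static boolean originalWhole(String s)  { return ORIGINAL.matcher(s).matches(); }
    static boolean answerWhole(String s)    { return ANSWER.matcher(s).matches(); }
    static boolean strictWhole(String s)    { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        String sample = "68;12-15,1-16,66-1,1-2";
        System.out.println(originalPrefix(sample)); // prefix match, like Python's re.match
        System.out.println(originalWhole(sample));  // whole-string match with the original regex
        System.out.println(answerWhole(sample));    // whole-string match with the answer's regex
        System.out.println(strictWhole(sample));    // whole-string match with the stricter variant
    }
}
```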
