Translate php regex to java - java

I am having trouble translating this PHP regex, /^([-\.\w]+)$/, to a Java regex.
I tried ^([-\\.\\w]+)$ but it doesn't work.
The regex is used to validate a string that will be used as a file name.
In PHP, têst.ext is rejected, but in Java it is accepted.

In Java, it would be:
str.matches("[-.\\w]+")
There is no need to escape the dot inside a character class in any language or tool.
There is no need to use ^ or $ with Java's String#matches() because they are implied (the whole string must match).
There is no need to create a group (the parentheses).
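As a minimal sketch of the answer above: the class and method names are hypothetical, only the pattern comes from the answer. Without the UNICODE_CHARACTER_CLASS flag, \w in java.util.regex covers only [a-zA-Z_0-9], so an accented letter such as ê should be rejected by this pattern.

```java
final class FileNameCheck {
    // Hypothetical helper; the pattern is the one suggested in the answer.
    static boolean isValidFileName(String name) {
        // matches() requires the whole string to match, so no ^ or $ is needed.
        return name.matches("[-.\\w]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidFileName("test.ext"));        // true
        // The escape below spells the accented name from the question.
        System.out.println(isValidFileName("t\u00east.ext"));   // false
    }
}
```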

Related

Regex pattern for multi equal operation

My goal is to compare one string against multiple other strings for equality using only a regex in Java 8.
I used the syntax below:
"^UK (Main Land)|German|Japan|Swiss|French|Italian$"
This works for German, Japan, Swiss and French, but validation fails for UK (Main Land) and Italian.
What change do I have to make for it to work?
There are a couple of issues here.
The parentheses, as literal chars, must be escaped.
If you use Matcher.find(), you need the ^ and $ anchors to make sure the pattern matches the entire string (although \A and \z would be better), and you also need to group the alternatives with either (...) or (?:...) so that the anchors apply to every alternative.
You do not need the group or the anchors if you use String.matches() or Pattern.matches(), which require an entire-string match.
I'd rather use
boolean result = text.matches("UK \\(Main Land\\)|German|Japan|Swiss|French|Italian");
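The two options described in this answer can be sketched side by side. The class and method names are made up for illustration; the alternatives are the ones from the question.

```java
import java.util.regex.Pattern;

final class CountryCheck {
    // Anchors sit outside a non-capturing group, so they apply to every alternative.
    private static final Pattern ANCHORED = Pattern.compile(
            "\\A(?:UK \\(Main Land\\)|German|Japan|Swiss|French|Italian)\\z");

    // Option 1: find() with explicit anchors and a group.
    static boolean viaFind(String text) {
        return ANCHORED.matcher(text).find();
    }

    // Option 2: matches() already requires the whole string to match.
    static boolean viaMatches(String text) {
        return text.matches("UK \\(Main Land\\)|German|Japan|Swiss|French|Italian");
    }

    public static void main(String[] args) {
        System.out.println(viaFind("UK (Main Land)"));
        System.out.println(viaMatches("Italian"));
        System.out.println(viaFind("Italians"));
    }
}
```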

How can I make this into a Java regex?

I used regex101 to build my expression, and it looks like this in their syntax:
\d+ [+-\/*] \d*
Basically, I want the user to enter something like 123 + 123, where the entire statement is one string with exactly one space after the first number and one space after the operator.
The above expression works there, but it doesn't carry over unchanged into Java.
I thought these symbols were universal, but I guess not. Any ideas how to convert this to the proper syntax?
Regular expressions are not universal. In general, no two regular expression systems are the same.
Java does not have regular expressions. Some Java classes support regular expressions. The Pattern class defines the regular expressions that are used by some Java classes including Matcher, which seems likely to be the class you are using.
As already identified in the comments, \ is the escape-the-next-character character in Java. If you want to represent \ in a String, you must use \\. For example, \d in a regular expression must be written \\d in a Java String.
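A rough sketch of how the question's expression might look as a Java string, with two assumptions that are not in the original: the dash is moved to the front of the character class so it is a literal rather than a range operator, and the second operand uses \d+ so it cannot be empty. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ArithmeticInput {
    // Every regex backslash is doubled inside the Java string literal.
    // Assumption: '-' comes first in the class, so [-+*/] is four literal operators.
    private static final Pattern EXPRESSION = Pattern.compile("\\d+ [-+*/] \\d+");

    static boolean isValid(String input) {
        // matches() checks the whole input, not just a part of it.
        return EXPRESSION.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123 + 123"));
        System.out.println(isValid("123 +123"));
    }
}
```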
You can simply use groups () and design a regex as you wish. This regex might be one way to do so:
((\d+\s)(\+|\-)(\s\d+))
It has four capturing groups, and the entire input is available as group 1 ($1). Note that it only accepts + and - as operators.
You can also escape, with \, whichever characters the language requires.

Add Dash to Java Regex

I am trying to modify an existing regex that is pulled in from a properties file by a Java program that someone else built.
The current regex used to match an email address is:
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz@example.com, but now some email addresses have dashes in them, such as abc-def.xyz@example.com, and those fail the pattern match.
What should my new regex be to add the dash to the match, or is there a better way to represent it?
Based on the regex you are using, you can add the dash to your character classes:
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
becomes
RR.emailRegex=^[a-zA-Z0-9_\\.-]+@[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
By the way, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+@[\\w-]+\\.[\\w-]+$
Anyway, I would use the Apache Commons EmailValidator instead, like this:
if (EmailValidator.getInstance().isValid(email)) ....
The meaning of - inside a character class is different from its meaning elsewhere. Inside a character class, - denotes a range, e.g. 0-9. If you want to include a literal -, put it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because there it is treated as a literal dot.
Your regex can be simplified further. \w denotes [A-Za-z0-9_], so you can use
^[-\w.]+@[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+@[\\w]+\\.[\\w]+$
^[a-zA-Z0-9_\\.\\-]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
should solve your problem. In a regex you need to escape any character that has a special meaning to the engine where it appears (e.g. - in the middle of a character class, or ? and * outside one).
The correct regex fix is below.
Old regex:
^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
New regex:
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Actually, I read this post, which covers all the special cases, so the best pattern that works correctly with Java is
String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";
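The dash rule discussed in the answers above (a range in the middle of a character class, a literal at either end or when escaped) can be checked with a small sketch. The class and method names are hypothetical, and only the local part of the address is tested to keep the example short.

```java
final class DashInCharacterClass {
    // '-' is the last character of the class, so it is a literal dash.
    static boolean localPartWithDash(String s) {
        return s.matches("[\\w.-]+");
    }

    // Same class without the dash, as in the original pattern.
    static boolean localPartWithoutDash(String s) {
        return s.matches("[\\w.]+");
    }

    public static void main(String[] args) {
        System.out.println(localPartWithDash("abc-def.xyz"));
        System.out.println(localPartWithoutDash("abc-def.xyz"));
        System.out.println("b".matches("[a-c]"));   // a-c is a range here
        System.out.println("-".matches("[ac-]"));   // trailing '-' is a literal
        System.out.println("-".matches("[a\\-c]")); // escaped '-' is a literal
    }
}
```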

Differences in RegEx syntax between Python and Java

I have a working regex in Python and I am trying to convert to Java. It seems that there is a subtle difference in the implementations.
The RegEx is trying to match another reg ex. The RegEx in question is:
/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)
One of the strings that it is having problems on is: /\s+/;
The regex is not supposed to match the trailing ;. In Python the regex works correctly (and does not match the trailing ;), but in Java it does include the ;.
The Question(s):
What can I do to get this RegEx working in Java?
Based on what I read here there should be no difference for this RegEx. Is there somewhere a list of differences between the RegEx implementations in Python vs Java?
Java doesn't parse regular expressions the same way as Python in a small set of cases. In this particular case the nested [ characters were causing problems. In Python you don't need to escape a nested [, but in Java you do.
The original RegEx (for Python):
/(\\.|[^[/\\\n]|\[(\\.|[^\]\\\n])*])+/([gim]+\b|\B)
The fixed RegEx (for Java and Python):
/(\\.|[^\[/\\\n]|\[(\\.|[^\]\\\n])*\])+/([gim]+\b|\B)
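As a sketch, the fixed regex could be written as a Java string literal like this, with every regex backslash doubled. The class and method names are hypothetical, and the sample input is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLiteralMatcher {
    // The fixed pattern from above, escaped for a Java string literal.
    private static final Pattern REGEX_LITERAL = Pattern.compile(
            "/(\\\\.|[^\\[/\\\\\\n]|\\[(\\\\.|[^\\]\\\\\\n])*\\])+/([gim]+\\b|\\B)");

    // Returns the first regex literal found in the input, or null if there is none.
    static String firstLiteral(String input) {
        Matcher m = REGEX_LITERAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The Java literal "/\\s+/;" is the six-character input /\s+/;
        System.out.println(firstLiteral("/\\s+/;"));
    }
}
```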
The obvious difference between Java and Python is that Java has no raw string literals, so every backslash in the pattern must itself be escaped in the string literal.
Moreover, you are probably running into a mismatch between the matching methods, not a difference in the actual regex notation.
Given the Java code:
String regex, input; // initialized to something
Matcher matcher = Pattern.compile( regex ).matcher( input );
Java's matcher.matches() (also Pattern.matches( regex, input )) matches the entire string. Its Python counterpart is re.fullmatch( regex, input ), available since Python 3.4; on older versions the same result can be achieved by using re.match( regex, input ) with a regex that ends with $.
Java's matcher.find() and Python's re.search( regex, input ) match any part of the string.
Java's matcher.lookingAt() and Python's re.match( regex, input ) match the beginning of the string.
For more details also read Java's documentation of Matcher and compare to the Python documentation.
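The three Java matching methods listed above can be compared with a short sketch. The class and helper names are made up for illustration; the Matcher methods themselves are the standard ones.

```java
import java.util.regex.Pattern;

final class MatchModes {
    // Whole input must match (like Python's re.fullmatch).
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Match must start at the beginning of the input (like Python's re.match).
    static boolean prefix(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // Match may occur anywhere in the input (like Python's re.search).
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "\\d+";
        System.out.println(whole(regex, "123abc"));    // false
        System.out.println(prefix(regex, "123abc"));   // true
        System.out.println(anywhere(regex, "abc123")); // true
    }
}
```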
Since you said that isn't the problem, I decided to do a test: http://ideone.com/6w61T
It looks like Java is doing exactly what you need it to (group 0, the entire match, doesn't contain the ;). Your problem is elsewhere.

Regex in GWT to match URLs

I implemented the Pattern class as shown here:
http://www.java2s.com/Code/Java/GWT/ImplementjavautilregexPatternwithJavascriptRegExpobject.htm
And I would like to use the following regex to match urls in my String:
(http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?
Unfortunately, the Java compiler of course fails on parsing that string because it doesn't use valid escape sequences (since the above is technically a url pattern for JavaScript, not Java)
At the end of the day, I'm looking for a regex pattern that will both compile in Java and execute in JavaScript correctly.
You will have to use JSNI to do the regex evaluation in JavaScript. If you write the regex with escaped backslashes, it will be carried over to JavaScript as is and will be invalid there. It will still work in Hosted or Dev mode, since that is still running Java bytecode, but not in the compiled application.
A simple JSNI example to test if a given string is a valid URL:
// Java method
public native boolean isValidUrl(String url) /*-{
var pattern = /(http|https):\/\/(\w+:{0,1}\w*#)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%#!\-\/]))?/;
return pattern.test(url);
}-*/;
There may be other irregularities between the Java and Javascript regex engines, so it's better to offload it completely to Javascript at least for moderately complex regexes.
The pattern itself looks fine, but I guess the problem is the backslash escaping.
Please take a look at this: http://www.regular-expressions.info/java.html
In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string, becomes "\\\\". That's right: 4 backslashes to match a single one.
So, if you reuse your JavaScript regex in Java, you need to replace each \ with \\, and vice versa.
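The four-backslash rule from the quoted passage can be checked with a tiny sketch. The class and method names are hypothetical.

```java
final class BackslashEscaping {
    // The regex \\ (one literal backslash) needs four backslashes in Java source:
    // the compiler halves them once, the regex engine halves them again.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    public static void main(String[] args) {
        String oneBackslash = "\\"; // Java literal for a single backslash character
        System.out.println(oneBackslash.length());           // 1
        System.out.println(isSingleBackslash(oneBackslash)); // true
    }
}
```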
I don't know exactly how this would help but here is the exact function you requested in Javascript. I guess using JSNI like Anurag said will help.
var urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
function isValidURL(url) {
    // Anchor a local copy so the shared urlPattern is not modified on every call.
    var regex = new RegExp("^" + urlPattern + "$");
    return regex.test(url);
}
As @S.Mark said, I basically took the "Java" way of writing regular expressions and used it in JavaScript.
In Java, you would do it the following way (note that the expression is the same).
String urlPattern = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
Hope this helps. PS: this regular expression works and even validates sites pointing to localhost:port, where port is any numeric port number.
