I have escaped special characters and verified that the string passed to the Pattern is what I want.
I printed it on the screen and all double slashes were single again.
Particularly, I want these to be found:
\z.\s.\f.jtuy \z.yu \aw.o
lambda expressions. My regex is
(\\[a-z]{1,}\.){1,}[a-z]{1,}
and it, as I said, works online. But why not in Eclipse?
Do double backslashes get to the Pattern unchanged?
Is there any replacement for them?
Thanks.
If by "in Eclipse" you mean "in the Java source code", you might need to use four backslashes: four backslashes in the string literal become two backslashes for the regex engine. You need to escape the backslash twice: once for the Java string and a second time for the regex engine.
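As a minimal sketch of the four-backslash rule, the snippet below writes the regex from the question as a Java string literal and collects the matches from the sample input. The class name LambdaRegexDemo and the helper findAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LambdaRegexDemo {
    // Regex as the engine should see it: (\\[a-z]{1,}\.){1,}[a-z]{1,}
    // Every backslash is doubled once more for the Java string literal.
    static final Pattern LAMBDA = Pattern.compile("(\\\\[a-z]{1,}\\.){1,}[a-z]{1,}");

    // Hypothetical helper: returns every substring the pattern finds.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = LAMBDA.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The literal below holds the text: \z.\s.\f.jtuy \z.yu \aw.o
        String input = "\\z.\\s.\\f.jtuy \\z.yu \\aw.o";
        System.out.println(findAll(input));
    }
}
```

Printing the pattern with System.out.println(LAMBDA.pattern()) is a quick way to see what the regex engine actually receives.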
I used regex101 to make my expression, and it looks like this using their symbols
\d+ [+-\/*] \d*
Basically I want a user to enter like 123 + 123 but the entire statement is one string with exactly one space after the first number and one space after the operator
The above expression works there, but it doesn't work the same way when I put it into Java.
I thought these symbols were universal, but I guess not. Any ideas how to convert this to the proper syntax?
Regular expressions are not universal. In general, no two regular expression systems are the same.
Java does not have regular expressions as a language feature. Some Java classes support regular expressions. The Pattern class defines the regular expressions that are used by some Java classes, including Matcher, which seems likely to be the class you are using.
As already identified in the comments, \ is the escape-the-next-character character in Java. If you want to represent \ in a String literal, you must write \\. For example, \d in a regular expression must be written \\d in a Java String literal.
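A small sketch of that rule applied to the expression from the question. The class name CalcInputDemo and the method isValid are hypothetical; only the pattern comes from the question.

```java
import java.util.regex.Pattern;

class CalcInputDemo {
    // regex101 form: \d+ [+-\/*] \d*
    // In a Java string literal every backslash is written twice.
    static final Pattern EXPR = Pattern.compile("\\d+ [+-\\/*] \\d*");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean isValid(String input) {
        return EXPR.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123 + 123"));  // true
        System.out.println(isValid("123+123"));    // false: the spaces are required
    }
}
```

Note that +-\/ inside the character class forms a range from + to /, so this pattern also accepts , and . as operators. Writing the class as [+*/-] would avoid that.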
You can simply use groups () and design a RegEx as you wish. This RegEx might be one way to do so:
((\d+\s)(\+|\-)(\s\d+))
It has four groups; group 1 wraps the whole expression, so you can refer to the entire input using $1.
You can also escape, with \, any characters that your language requires to be escaped.
Take these strings for example:
"hello world\n" (correct: the regex should match this)
"I'm happy \ here" (incorrect, because the escape character is not used correctly: the regex should not match this one)
I've tried searching on google but didn't find anything helpful.
I want this one to be used in a parser which only parses string literals from a java code file.
Here is the regex I used:
"\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\""
What am I doing wrong?
I guess you gave us the regex in Java String literal form, like
String regex = "\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\"";
Unpacking that from Java's String escaping syntax gives the raw regex:
\"(\[tbnrf'"\])*[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]\"
That consists of:
\" matching a double-quote character (Java String literal begins here). Escaping the double quotes with backslash isn't necessary: " on its own is ok as well.
(\[tbnrf'"\])*: a group, repeated 0...n times. I guess you want that to match against the various Java backslash escapes, but that should read (\\[tbnrf'"\\])* with a double backslash in front and inside the character class. And maybe you want to cover the Java octal escapes as well (see the language specification), giving (\\[tbnrf01234567'"\\])*
[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]: a character class matching one character from a selected list of alphabetic and punctuation characters. I'd replace that with [^"\\], meaning anything but double quote or backslash.
\" matching a double-quote character (string literal ends here). Once again, no need to escape the double quote.
Besides the individual elements, the overall structure of the regex probably isn't what you want: You allow only strings beginning with any number of backslash escapes, followed by exactly one non-escape character, and this enclosed in a pair of double quotes.
The overall structure should instead be "(backslash_escape|simple_character)*"
So, the complete regex would be:
"(\\[tbnrf01234567'"\\]|[^"\\])*"
or, expressed in a Java literal:
String regex = "\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"";
And, although this is shorter than your original attempt, I'd still not call it readable and opt for a different implementation, not using regular expressions.
P.S. Although I did some testing with my regex, I'm not at all sure that it covers all relevant cases correctly.
P.P.S. There are the \uxxxx escapes, not yet covered by the regex.
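A minimal sketch that tries the complete regex above on the two sample strings from the question. The class name StringLiteralRegexDemo and the method isStringLiteral are made up for illustration, and the \uxxxx escapes are still not covered.

```java
import java.util.regex.Pattern;

class StringLiteralRegexDemo {
    // Raw regex: "(\\[tbnrf01234567'"\\]|[^"\\])*"
    static final Pattern LITERAL =
            Pattern.compile("\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"");

    // Hypothetical helper: true if the whole text is one string literal
    // according to the regex above.
    static boolean isStringLiteral(String source) {
        return LITERAL.matcher(source).matches();
    }

    public static void main(String[] args) {
        // Source text: "hello world\n"  (quotes and backslash are part of the text)
        System.out.println(isStringLiteral("\"hello world\\n\""));     // true
        // Source text: "I'm happy \ here"  (backslash followed by a space)
        System.out.println(isStringLiteral("\"I'm happy \\ here\""));  // false
    }
}
```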
I am trying to use a series of delimiters for an input. It's for homework. We were told that we should use the backslash (\) too. If I use it like this (it's at the end):
scanner.useDelimiter("\\;|\\:|\\?|\\~|/|\\.|,|\\<|\\>|\\`|\\[|\\]|\\{|\\}|\\(|\\)|\\!|\\#|\\#|\\$|\\%|\\^|\\&|\\-|\\_|\\+|\\'|\\=|\\*|\"|\\||\n|\t|\r|\\");
It won't work. It says unsupported escape sequence. If I add another backslash, it says "Illegal line end in string literal". If I add yet another, it will escape to a double backslash, and that's not what I need.
I couldn't find any solution for this, and that's why I'm asking. I have already finished the homework using Scanner, and changing that now is not an option (a lot to re-implement).
Thank you.
You should use four backslashes at the end, like:
scanner.useDelimiter("\\;|\\:| ... |\r|\\\\");
This is the way it should work. You said that this would match double backslashes. Have you actually tried it? If you did, and it still matches double backslashes, I suspect your input is escaped somewhere too (maybe it is a string literal somewhere in your code?).
The reason behind this is that your string is de-escaped twice: once at compile time, like every other string literal in the Java language, and once when the regex is compiled. That means that after the first step it is still escaped once, so the regex compiler gets two backslashes \\. The regex compiler will de-escape that too (just like \r), and will match a single \ character.
If you would like to match two backslashes this way, then you have to use eight backslashes (\\\\\\\\ or \\\\{2}) in your literal. Yeah, pretty ugly.
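A short sketch of the double de-escaping, using a much smaller delimiter set than the one in the question. The class name BackslashDelimiterDemo and the method tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class BackslashDelimiterDemo {
    // Hypothetical helper: splits the input on ; , or a single backslash.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            // "\\\\" in source -> \\ for the regex engine -> one literal backslash.
            scanner.useDelimiter(";|,|\\\\");
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The literal below holds the text: a\b;c,d
        System.out.println(tokens("a\\b;c,d"));  // [a, b, c, d]
    }
}
```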
I think you are using the delimiter in the wrong way.
There is a related topic; check it first:
How do I use a delimiter in Java Scanner?
I found this regex in a plugin called hi5validator for jQuery, and I found it pretty good, I'm already using it on JavaScript:
/^([\+][0-9]{1,3}([ \.\-])?)?([\(][0-9]{1,6}[\)])?([0-9 \.\-]{1,32})(([A-Za-z \:]{1,11})?[0-9]{1,4}?)$/
I wanted to use this regex in Java. I tried to do the same thing with another regex in that library, but when I used an online evaluator, the expression gave lots of trouble. Fortunately, I found another regex that helped with that.
As for this one, can someone give me the proper Java version?
The logic of your regex is fine - you need to fix some minor details:
Put double quotes " instead of slashes / around your regex.
Do not backslash-escape parentheses ( ), dashes - in trailing positions, pluses +, colons :, or dots . inside character classes (I am not sure it is necessary to escape these characters in JavaScript either).
Here is what you should get:
"^([+][0-9]{1,3}([ .-])?)?([(][0-9]{1,6}[)])?([0-9 .-]{1,32})(([A-Za-z :]{1,11})?[0-9]{1,4}?)$"
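A quick sketch of the Java version in use. The class name PhoneRegexDemo, the method isPhone, and the sample numbers are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

class PhoneRegexDemo {
    // No backslashes remain in the pattern, so nothing needs doubling
    // for the Java string literal.
    static final Pattern PHONE = Pattern.compile(
            "^([+][0-9]{1,3}([ .-])?)?([(][0-9]{1,6}[)])?([0-9 .-]{1,32})"
            + "(([A-Za-z :]{1,11})?[0-9]{1,4}?)$");

    // Hypothetical helper: true if the whole string matches the pattern.
    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("+1 (555) 123-4567"));  // true
        System.out.println(isPhone("abc"));                // false
    }
}
```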
I recently noticed that String.replaceAll(regex, replacement) behaves very weirdly when it comes to the escape character "\" (backslash).
For example, consider a string holding a file path, String text = "E:\\dummypath",
and we want to replace the "\\" with "/".
text.replace("\\","/") gives the output "E:/dummypath" whereas text.replaceAll("\\","/") raises the exception java.util.regex.PatternSyntaxException.
If we want to implement the same functionality with replaceAll() we need to write it as,
text.replaceAll("\\\\","/")
One notable difference is that replaceAll() treats its first argument as a regex, whereas replace() treats its arguments as plain character sequences!
But text.replaceAll("\n","/") works exactly the same as its char-sequence equivalent text.replace("\n","/")
Digging Deeper:
Even weirder behavior can be observed when we try some other inputs.
Let's assign text="Hello\nWorld\n"
Now,
text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/") all give the same output: Hello/World/
I feel Java has really messed up regexes in the best possible way! No other language seems to have these playful behaviors in its regexes. Is there any specific reason why Java messed up like this?
You need to escape twice: once for Java, once for the regex.
Java code is
"\\\\"
makes a regex string of
"\\" - two chars
but the regex needs an escape too so it turns into
\ - one symbol
@Peter Lawrey's answer describes the mechanics. The "problem" is that backslash is an escape character both in Java string literals and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two sets of escaping to consider ... depending on what you want the regex to mean.
But why is it like that?
It is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. The awkwardness of double escaping didn't become apparent in Java until regex support was added in the form of the Pattern class ... in Java 1.4.
So how do other languages manage to avoid this?
They do it by providing direct or indirect syntactic support for regexes in the programming language itself. For instance, in Perl, Ruby, JavaScript and many other languages, there is a syntax for patterns / regexes (e.g. '/pattern/') where string literal escaping rules do not apply. C# and Python provide an alternative "raw" string literal syntax in which backslashes are not escapes. (But note that if you use the normal C# / Python string syntax, you have the Java problem of double escaping.)
Why do text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/") all give the same output?
The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.
The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.
The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However in the regex language, a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.
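The three cases can be checked side by side with a small sketch like this one. The class name NewlineReplaceDemo and the method replaceThreeWays are hypothetical.

```java
class NewlineReplaceDemo {
    // Hypothetical helper: runs the three replaceAll calls discussed above.
    static String[] replaceThreeWays(String text) {
        return new String[] {
            text.replaceAll("\n", "/"),    // regex is a raw newline character
            text.replaceAll("\\n", "/"),   // regex is \n, the regex escape for newline
            text.replaceAll("\\\n", "/")   // regex is a backslash followed by a raw newline
        };
    }

    public static void main(String[] args) {
        for (String result : replaceThreeWays("Hello\nWorld\n")) {
            System.out.println(result);  // Hello/World/ each time
        }
    }
}
```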
1) Let's say you want to replace a single \ using Java's replaceAll method:
\
˪--- 1) the final backslash
2) Java's replaceAll method takes a regex as first argument. In a regex literal, \ has a special meaning, e.g. in \d which is a shortcut for [0-9] (any digit). The way to escape a metachar in a regex literal is to precede it with a \, which leads to:
\ \
| ˪--- 1) the final backslash
|
˪----- 2) the backslash needed to escape 1) in a regex literal
3) In Java, there is no regex literal: you write a regex in a string literal (unlike JavaScript for example, where you can write /\d+/). But in a string literal, \ also has a special meaning, e.g. in \n (a new line) or \t (a tab). The way to escape a metachar in a string literal is to precede it with a \, which leads to:
\\\\
|||˪--- 1) the final backslash
||˪---- 3) the backslash needed to escape 1) in a string literal
|˪----- 2) the backslash needed to escape 1) in a regex literal
˪------ 3) the backslash needed to escape 2) in a string literal
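The layering above can be tried out with a sketch along these lines. The class name BackslashReplaceDemo and its two helper methods are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BackslashReplaceDemo {
    // Literal "\\\\" -> regex \\ -> matches one backslash in the input.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Hypothetical helper: reports whether a string is a valid regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "E:\\dummypath";                // holds E:\dummypath
        System.out.println(toForwardSlashes(text));   // E:/dummypath
        System.out.println(text.replace("\\", "/"));  // E:/dummypath, no regex involved
        System.out.println(compiles("\\"));           // false: a lone backslash escapes nothing
    }
}
```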
This is because Java tries to give \ a special meaning in the replacement string, so that \$ will be a literal $ sign, but in the process they seem to have removed the actual special meaning of \
While text.replaceAll("\\\\","/") can at least be considered okay in some sense (though it is not absolutely right either), the fact that all three calls, text.replaceAll("\n","/"), text.replaceAll("\\n","/"), and text.replaceAll("\\\n","/"), give the same output seems even funnier. It contradicts the reasoning behind rejecting text.replaceAll("\\","/").
Java didn't mess up regular expressions. Rather, Java likes to mess with coders by trying to do something unique and different when it is not at all required.
One way around this problem is to replace backslash with another character, use that stand-in character for intermediate replacements, then convert it back into backslash at the end. For example, to convert the literal text \r\n (backslash, r, backslash, n) to \n:
String out = in.replace('\\','#').replaceAll("#r#n","#n").replace('#','\\');
Of course, that won't work very well if you choose a replacement character that can occur in the input string.
I think Java really messed with regular expressions in String.replaceAll().
Other than Java, I have never seen a language parse regular expressions this way. You will be confused if you have used regexes in other languages.
If you need "\\" in the replacement string, you can use java.util.regex.Matcher.quoteReplacement(String):
text.replaceAll("/", Matcher.quoteReplacement("\\"));
By using this method of the Matcher class you can get the expected result.
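A minimal, self-contained sketch of that call. The class name QuoteReplacementDemo and the method toBackslashes are hypothetical.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // quoteReplacement escapes \ and $ so the replacement is taken literally.
    static String toBackslashes(String path) {
        return path.replaceAll("/", Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        System.out.println(toBackslashes("E:/dummypath"));  // E:\dummypath
    }
}
```

When no regex is needed at all, path.replace("/", "\\") gives the same result without any replacement-string escaping.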