Regex matching quoted string but ignoring escaped quotation mark - java

What I want to know is how to modify the following regex, \".*?\", so that it ignores an escaped " character (\") and does not stop matching at \".
For example:
parameter1 = " fsfsdfsd \" " parameter2 = " fsfsfs "
I want to match:
" fsfsdfsd \" "
and
" fsfsfs "
but not
" fsfsdfsd \" " parameter2 = " fsfsfs "
etc...

Try this one:
"(?:\\"|[^"])*"
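Note that it still matches "test \" as a whole, because the backslash can also be consumed by [^"] (you can probably avoid that using lookbehind). Escape the characters with \ if your context, such as a Java string literal, requires it.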

I usually handle this sort of task by figuring out what are the elements that can appear between quote marks. In this case, each element can be:
any character that is not \ or ";
the two-character sequence \";
a \ that is not followed by ".
You can expand this if desired, by allowing \\ to represent \, for instance, or allowing other escapes; it should be pretty simple to modify the above list.
Then the regular expression just follows the rules in the list. (Note: this is a regex, not a Java string literal.)
"(([^\\"]|\\"|\\(?!"))*)"
which means that, within the quote marks, we match zero or more of: (1) a character other than \ or " (the character class); (2) the sequence \"; (3) a \ not followed by " (negative lookahead). Of course, the Java string literal looks pretty ugly:
"\"(([^\\\\\"]|\\\\\"|\\\\(?!\"))*)\""
(Note: not tested.)
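As a rough check of the element-based pattern above, here is a minimal sketch that runs it over the sample line from the question. The class and method names (QuotedStringDemo, findQuoted) are made up for illustration, and the groups are written as non-capturing; nothing beyond java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuotedStringDemo {
    // Regex: "(?:[^\\"]|\\"|\\(?!"))*"
    // i.e. a quote, then any number of: a char other than \ or ",
    // the sequence \", or a \ not followed by ", then a closing quote.
    static final Pattern QUOTED =
            Pattern.compile("\"(?:[^\\\\\"]|\\\\\"|\\\\(?!\"))*\"");

    // Returns every quoted string found in the input, quotes included.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The text is: parameter1 = " fsfsdfsd \" " parameter2 = " fsfsfs "
        String line = "parameter1 = \" fsfsdfsd \\\" \" parameter2 = \" fsfsfs \"";
        for (String s : findQuoted(line)) {
            System.out.println(s);
        }
    }
}
```

On the sample line this prints the two quoted strings separately, which is the behaviour the question asks for.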

You will need negative lookbehind in your regex:
(?<!\\\\)\".*?(?<!\\\\)\"

The correct regexp for matching strings between quotes is:
"([^\\"]+|\\.|\\\\)*"
but because backslashes need to be escaped in Java string literals, the resulting expression is:
Pattern.compile("\"(?:[^\\\\\"]+|\\\\.|\\\\\\\\)*\"");
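This expression matches backslash-escaped characters and the backslash itself, for example:
... "123 \\\" 456 \\" ...
Here the first \\ is a backslash literal and the \" that follows it is an escaped quote; the final \\ is another backslash literal, so the " after it closes the string.
The regexps written in the comments above will fail on this example.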
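To make the difference concrete, the sketch below runs the escape-aware pattern from this answer and the lookbehind pattern from the earlier answer against the example text. The class name, field names, and the firstMatch helper are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing two patterns from the answers above.
class EscapedBackslashDemo {
    // Regex: "(?:[^\\"]+|\\.|\\\\)*"  (consumes escapes pairwise)
    static final Pattern ESCAPE_AWARE =
            Pattern.compile("\"(?:[^\\\\\"]+|\\\\.|\\\\\\\\)*\"");

    // Regex: (?<!\\)".*?(?<!\\)"  (only looks at the single char before each quote)
    static final Pattern LOOKBEHIND =
            Pattern.compile("(?<!\\\\)\".*?(?<!\\\\)\"");

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The text is: ... "123 \\\" 456 \\" ...
        String input = "... \"123 \\\\\\\" 456 \\\\\" ...";
        System.out.println(firstMatch(ESCAPE_AWARE, input)); // the whole quoted string
        System.out.println(firstMatch(LOOKBEHIND, input));   // null: every later quote follows a backslash
    }
}
```

The lookbehind version cannot tell that the final " is preceded by an escaped backslash rather than by an escaping one, so it finds no closing quote at all.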

Related

Why does this regex match online but not in my environment? [duplicate]

I am trying out the following code and it prints false.
I expected it to print true.
In addition, the Pattern.compile() statement gives a warning: 'redundant escape character'.
Can someone please explain why this does not return true, and why I see the warning?
public static void main(String[] args) {
String s = "\\n";
System.out.println(s);
Pattern p = Pattern.compile("\\\n");
Matcher mm = p.matcher(s);
System.out.println(mm.matches());
}
The s="\\n" means you assign a backslash and n to the variable s, and it contains a sequence of two chars, \ and n.
The Pattern.compile("\\\n") means you define a regex pattern \<LF> (a backslash and a newline, line feed, char) that matches a newline (LF) char, because escaped non-word non-special chars match themselves. \, matches a ,, \; matches a ;. Thus, this pattern won't match the string in variable s.
The redundant escape warning is thrown because \<LF> matches the same newline char that can be matched with mere <LF>.
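The two levels of escaping described above can be checked with a small sketch. The class name and the fullMatch helper are hypothetical; the only API assumed is Pattern.matches, which tests whether the whole text matches the regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: string-literal escaping vs. regex escaping.
class EscapeLevelsDemo {
    // True when the entire text matches the regex.
    static boolean fullMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String newline = "\n";      // one char: LF
        String backslashN = "\\n";  // two chars: \ and n

        System.out.println(fullMatch("\n", newline));       // regex <LF>   -> true
        System.out.println(fullMatch("\\n", newline));      // regex \n     -> true
        System.out.println(fullMatch("\\\\n", backslashN)); // regex \\n    -> true
        System.out.println(fullMatch("\\\n", backslashN));  // regex \<LF>  -> false (the question's case)
    }
}
```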
More examples:
Regex | Regex string literal | Matching text | Matching string literal
<LF>  | "\n"                 | <LF>          | "\n"
\n    | "\\n"                | <LF>          | "\n"
\\n   | "\\\\n"              | \n            | "\\n"
Because "\\n" evaluates to a backslash (\) and the letter n, while "\\\n" evaluates to a backslash (\) and then a newline (\n).
Backslashes within string literals in Java source code are interpreted as required by The Java™ Language Specification as either Unicode escapes (section 3.3) or other character escapes (section 3.10.6). It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary.
Refer : https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Your source s has two characters, '\' and 'n'. If you meant it to be a \ followed by a line break, it should be "\\\n".
The pattern has two characters, '\' and '\n' (line break), and the escape character \ is not needed, hence the warning. If you meant a \ followed by a line break, it should be "\\\\\n" (\ twice to escape it for the regex, and then \n).
String s = "\\\n";
System.out.println(s);
Pattern p = Pattern.compile("\\\\\n");
Matcher mm = p.matcher(s);
System.out.println(mm.matches());

Regular expression to match escaped sequences in java

I am looking for regex to check for all escape sequences in java
\b backspace
\t horizontal tab
\n linefeed
\f form feed
\r carriage return
\" double quote
\' single quote
\\ backslash
How do I write a regex and perform validation to allow words, text areas, strings, or sentences containing valid escape sequences?
This regex will match all the escape sequences you listed:
\\[btnfr"'\\]
In Java you need to double each backslash, so the code becomes:
Pattern p = Pattern.compile("\\\\[btnfr\\\"\\'\\\\]");
if(p.matcher("\\b backspace").find()){
System.out.println("Contains escape sequence");
}
The following regex should meet your need:
Pattern pattern = Pattern.compile("\\\\[\\\\btnfr\'\"]");
as in
Pattern pattern = Pattern.compile("\\\\[\\\\btnfr\'\"]");
String[] strings = new String[]{"\\b","\\t","\\n","\\f","\\r","\\\'","\\\"", "\\\\"};
for (String s:strings) {
System.out.println(s + " - " + pattern.matcher(s).matches());
}
To match a single \, you would have to add 4 \ inside a regex string.
Considering a string, "\\" stands for a single \.
When you have "\\" as a regex string, it means a \ which is a special character in regex and it is supposed to be followed by certain other character to form an escape sequence.
In this way, we need "\\\\", to match a single \ which is equivalent to the string "\\".
EDIT: There is no need to escape the single quote in the regex string. So "\\\\[\\\\btnfr\'\"]" can be replaced with "\\\\[\\\\btnfr'\"]".
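The answers above detect an escape sequence; the question also asks about validation. One possible sketch, under the assumption that "valid" means every backslash in the text starts one of the listed sequences, is shown below. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator: accepts text in which every backslash
// begins one of the sequences \b \t \n \f \r \" \' \\.
class EscapeValidatorDemo {
    // Regex: (?:[^\\]|\\[btnfr"'\\])*
    // Each step consumes either a non-backslash char or a complete escape sequence.
    static final Pattern VALID =
            Pattern.compile("(?:[^\\\\]|\\\\[btnfr\"'\\\\])*");

    static boolean hasOnlyValidEscapes(String text) {
        return VALID.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyValidEscapes("tab: \\t"));    // text is: tab: \t
        System.out.println(hasOnlyValidEscapes("bad: \\q"));    // text is: bad: \q
        System.out.println(hasOnlyValidEscapes("dangling \\")); // text ends with a lone \
    }
}
```

Because matches() requires the whole input to fit the pattern, a single unknown escape or a trailing lone backslash makes the check fail.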
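You'll need DOTALL if you want . to match line terminators. You might also find \s handy, as it represents all whitespace (line terminators included, with or without DOTALL). Note that a literal backslash inside the character class has to be written as \\\\ in the Java string; a lone \\ would escape the closing bracket and the pattern would not compile. E.g.
Pattern p = Pattern.compile("([\\s\"'\\\\]+)", Pattern.DOTALL);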
Matcher m = p.matcher("foo '\r\n\t bar");
assertTrue(m.find());
assertEquals(" '\r\n\t ", m.group(1));

REGEXP - how to read " character?

I'm using hadoop pig with regexp (REGEX_EXTRACT_ALL) - this is Java parsing.
I have a string:
"DYN_USER_ID=32753477; $Path=\"/\"; DYN_USER_CONFIRM=e6d2a0a7b7715cb10d1dca504e3c5e80; $Path=\"/\"" "Nokia6070/2.0 (03.20) Profile/MIDP-2.0 Configuration/CLDC-1.1"
I'm expecting two groups:
First: DYN_USER_ID=32753477; $Path=\"/\"; DYN_USER_CONFIRM=e6d2a0a7b7715cb10d1dca504e3c5e80; $Path=\"/\"
Second: Nokia6070/2.0 (03.20) Profile/MIDP-2.0 Configuration/CLDC-1.1
As you can see, inside the first string there is a " character, but with the escape character \.
The simplest solution is:
"(.*)" "(.*)"
But is it the best one?
"(.*)(?<!\\\\)" "(.*)"
This uses negative lookbehind: (?<!☀), where ☀ is some string; here the backslash character is represented by a regex-escaped and String-escaped backslash.
Ideally, you should be using the negated character class [^"] so that it matches from the first delimiter " to the last delimiter ", but the problem is that it ignores escaped " characters. If you can have escaped " and escaped \ in your strings, it will be better if you use something like this:
"((?:\\.|[^"\\])+)" "((?:\\.|[^"\\])+)"
The group (?:\\.|[^"\\])+ matches, one or more times, either an escaped character (a backslash followed by any character) or a single character other than " and \.
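A minimal plain-Java sketch of the pattern above follows (not a Pig script, where quoting rules may differ). The input is a shortened, made-up sample in the same shape as the string from the question, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: extract two quoted fields that may contain \" and \\.
class TwoQuotedFieldsDemo {
    // Regex: "((?:\\.|[^"\\])+)" "((?:\\.|[^"\\])+)"
    static final Pattern TWO_FIELDS = Pattern.compile(
            "\"((?:\\\\.|[^\"\\\\])+)\" \"((?:\\\\.|[^\"\\\\])+)\"");

    // Returns the two captured groups, or null when the line does not match.
    static String[] extract(String line) {
        Matcher m = TWO_FIELDS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Shortened sample; the text is: "$Path=\"/\"; A=1" "Nokia6070/2.0"
        String line = "\"$Path=\\\"/\\\"; A=1\" \"Nokia6070/2.0\"";
        String[] groups = extract(line);
        System.out.println(groups[0]); // first field, escaped quotes kept as-is
        System.out.println(groups[1]); // second field
    }
}
```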

java regex: find pattern of 1 or more numbers followed by a single

I'm having a Java regex problem.
How can I find a pattern of 1 or more digits followed by a single . in a string?
"^[\\d]+[\\.]$"
^ = start of string
[\\d] = any digit
+ = 1 or more occurrences
\\. = escaped dot char
$ = end of string
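I think this is the answer to your question. Note that String.matches() only succeeds when the whole string matches, so a Matcher with find() is used to search inside the text, and a negative lookahead keeps the character after the dot out of the match:
String searchText = "asdgasdgasdg a121341234.sdg asdg as12..dg a1234.sdg ";
Matcher m = Pattern.compile("\\d+\\.(?!\\.)").matcher(searchText);
while (m.find()) System.out.println(m.group());
This will find "121341234." and "1234." but not "12."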
(\\d)+\\.
\\d represents any digit
+ says one or more
Refer this http://www.vogella.com/articles/JavaRegularExpressions/article.html
In regex the metacharacter \d represents a digit, but to write it in Java code as a regex you have to use \\d because the text is parsed twice.
First the string-literal parser converts it to \d, and then the regex parser interprets it as the digit metacharacter (which is what we want).
For the "one or more" part we use the + greedy quantifier.
To represent a . we use \\. because of the double parsing scenario.
So in the end we have (\\d)+(\\.).
(\\d+)\\.
\\d is for numbers, + is for one and more, \\. is for dot. If . is written without backslash before it it matches any character.
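The difference between checking a whole string and searching inside one is easy to trip over here, so the following sketch shows both with the same \\d+\\. pattern. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: whole-string matching vs. searching with find().
class DigitsDotDemo {
    // Regex: \d+\.  (one or more digits followed by a literal dot)
    static final Pattern DIGITS_DOT = Pattern.compile("\\d+\\.");

    // True only when the entire string is digits followed by one dot.
    static boolean isDigitsThenDot(String s) {
        return DIGITS_DOT.matcher(s).matches();
    }

    // Collects every occurrence of the pattern inside a longer text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DIGITS_DOT.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(isDigitsThenDot("123."));  // whole string fits the pattern
        System.out.println(isDigitsThenDot("a123.")); // extra leading char, so no full match
        System.out.println(findAll("a12.b 7. x"));    // occurrences inside the text
    }
}
```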

Regular expression for String.replaceAll

I need a regular expression that can be used with the replaceAll method of the String class to replace all instances of * with .*, except those preceded by \
i.e. conversion would be
[any character]*[any character] => [any character].*[any character]
* => .*
\* => \* (i.e. no conversion.)
Can someone please help me?
Use lookbehind.
String resultString = subjectString.replaceAll("(?<!\\\\)\\*", ".*");
Explanation :
"(?<!" + // Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
"\\\\" + // Match the character “\” literally
")" +
"\\*" // Match the character “*” literally
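It may be possible to do this without capture groups, but this should work (note that it misses a * that immediately follows another *, because the preceding character has already been consumed):
myString.replaceAll("(^|[^\\\\])\\*", "$1.*");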
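The lookbehind-based replacement from the first answer can be exercised with the three cases listed in the question. This is a small sketch; the class and method names are hypothetical.

```java
// Hypothetical demo: turn * into .* unless the * is preceded by a backslash.
class WildcardConverterDemo {
    // Regex: (?<!\\)\*  (a * that is not preceded by \); replacement: .*
    static String toRegexWildcards(String s) {
        return s.replaceAll("(?<!\\\\)\\*", ".*");
    }

    public static void main(String[] args) {
        System.out.println(toRegexWildcards("a*b"));   // a.*b
        System.out.println(toRegexWildcards("*"));     // .*
        System.out.println(toRegexWildcards("a\\*b")); // text a\*b is left unchanged
    }
}
```

Since the lookbehind does not consume the preceding character, consecutive asterisks are each replaced as well.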
