I am using an email validation pattern I found at How to validate an email and it works fine except it allows a + in the first part of the email and that isn't allowed in my specs. The original code is
public static final String EMAIL_PATTERN = "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
protected boolean isInvalidEmail(String email) {
pattern = Pattern.compile(EMAIL_PATTERN);
matcher = pattern.matcher(email);
return !matcher.matches();
}
I thought I could just remove the + from "^[_A-Za-z0-9-\\+]" but then I get a PatternSyntaxException: Unclosed character class. Can someone tell me why removing the + uncloses the class? Thanks!
You have to remove the \\+ portion.
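In a Java string literal, \\ is an escaped backslash, so the regex engine receives a single \. In the regex, \+ escapes the + operator. Thus \\+ in the source breaks down to \+ in the regex, which means match the literal + character.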
Note: The + regex operator means match one or more of the preceding element.
You get Unclosed character class because removing only the + leaves \\] in the string, which the regex engine sees as \], an escaped (literal) closing bracket. The bracket therefore becomes part of the class instead of ending it, so the class has no matching closing square bracket. As Jonny Henly mentions, the fix that matches your spec is to remove the whole \\+; this answer explains why the class ends up unclosed.
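To see the difference concretely, here is a minimal sketch. The class name EmailPatternDemo, the helper compiles, and the sample addresses are made up for illustration. It uses the pattern with the whole \\+ removed and checks that a class ending in \\] does not compile:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EmailPatternDemo {
    // The original pattern with the escaped plus (\\+) removed entirely.
    static final String EMAIL_PATTERN = "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@"
            + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";

    // Removing only the + leaves "\\]", i.e. the regex \] (a literal bracket),
    // so nothing closes the character class.
    static final String BROKEN_CLASS = "^[_A-Za-z0-9-\\]";

    static boolean isInvalidEmail(String email) {
        return !Pattern.compile(EMAIL_PATTERN).matcher(email).matches();
    }

    // Hypothetical helper: reports whether a regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInvalidEmail("john.doe@example.com")); // false
        System.out.println(isInvalidEmail("john+doe@example.com")); // true
        System.out.println(compiles(BROKEN_CLASS));                 // false
    }
}
```

Compiling the pattern once into a static final Pattern instead of on every call would also be worth considering, but that is outside the question.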
I am looking for regex which can help me replace strings like
source=abc/task=cde/env=it --> source='abc'/task='cde'/env='it'
To be more precise, I want to wrap in single quotes every value that starts after an = and ends at either a / or the end of the string.
Tried code like this
"source=abc/task=cde/env=it".replaceAll("=(.*?)/","'$1'")
But that results in
source'abc'task'cde'env=it
Using lookahead and look behind:
(?<==)([^/]*)((?=/)|$)
Lookbehind allows you to specify what comes before your match. In this case an equals: (?<==).
The main match in my regex looks for any non-slash character, zero or more times: ([^/]*)
Lookahead allows you to specify what comes after your match. In this case, a slash: (?=/).
The $ matches the end of the line, so that the last item in your test data is quoted too. ((?=/)|$) combines this with the lookahead, meaning "either a slash comes after the match or this is the end of the line".
Here it is in action in a test.
@Test
public void test_quote_items() {
String regex = "(?<==)([^/]*)((?=/)|$)";
String actual = "source=abc/task=cde/env=it".replaceAll(regex,"'$1'");
String expected = "source='abc'/task='cde'/env='it'";
assertEquals(expected, actual);
}
Try
String input = "source=abc/task=cde/env=it".replaceAll("=(.*?)(/|$)","='$1'/");
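The problems I found are that your replacement drops the = and the /, so both have to be put back in the replacement string,
and that your pattern requires a / after the value, so the last value, which is followed by the end of the string, is never matched.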
output
source='abc'/task='cde'/env='it'/
If you don't want the trailing /, it is trivial to strip afterwards.
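One way to avoid the trailing slash altogether is to match only the = and the value, and leave the separators alone. This is a minimal sketch under the assumption that values never contain a /; the class and method names are made up for illustration:

```java
class QuoteValuesDemo {
    // Hypothetical helper: quote every run of non-slash characters that follows '='.
    // The '/' separators are never matched, so they stay where they are.
    static String quoteValues(String input) {
        return input.replaceAll("=([^/]*)", "='$1'");
    }

    public static void main(String[] args) {
        // prints source='abc'/task='cde'/env='it'
        System.out.println(quoteValues("source=abc/task=cde/env=it"));
    }
}
```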
I want to split an input string based on a regex pattern using the Pattern.split(CharSequence) API. The regex uses both lookaheads and lookbehinds. It is supposed to split on a delimiter (,) and needs to ignore the delimiter if it is enclosed in double quotes ("x,y").
The regex is - (?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)
The input string for which this split call is getting timed out is -
"","1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]","QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I read that lookaround constructs are expensive and can cause timeouts if the string is too long.
The pattern also does not detect the first delimiter, at [STIFFENER]","QH20426AD3, in the string above. But if I remove the backslashes around [\"BOLT,HI-JOK\"] at the end of the string, the regex detects the delimiters and splits without timing out.
I am not very experienced with lookarounds in regex. Can someone please give hints on how I can optimize this regex and avoid the timeouts?
Any pointers, article links are appreciated!
If you want to split on every comma that is directly followed by a complete double-quoted string (from the opening to the closing double quote), you can use:
,(?="[^"\\]*(?:\\.[^"\\]*)*")
The pattern matches:
, Match a comma
(?= Positive lookahead
"[^"\\]* Match " and 0+ times any char except " or \
(?:\\.[^"\\]*)*" Optionally repeat matching \ followed by any char (the .), then again 0+ chars other than " or \, and finally match the closing "
) Close lookahead
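To see what the part inside the lookahead does on its own, here is a minimal sketch that uses it with Matcher.find() to pull out each quoted field directly, without any lookaround. It assumes every field is wrapped in double quotes, as in the sample input; the class name, method name, and shortened test string are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFieldsDemo {
    // Regex: "[^"\\]*(?:\\.[^"\\]*)*"
    // An opening quote, plain characters, optional backslash escapes each
    // followed by more plain characters, and the closing quote.
    private static final Pattern FIELD =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Hypothetical helper: returns every quoted field, quotes included.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "\"\",\"a,b [X]\",\"c [\\\"BOLT,HI-JOK\\\"]\"";
        for (String field : fields(line)) {
            System.out.println(field);
        }
    }
}
```

Because each alternative inside the pattern starts with a different character, the matcher never has to retry the same text in a different way, which is what keeps this form cheap on long inputs.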
String string = "\"\",\"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]\",\"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\\\"BOLT,HI-JOK\\\"]\"\n";
String[] parts = string.split(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
for (String part : parts)
System.out.println(part);
Output
""
"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]"
"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I get a String from the JSP, containing [", e.g.
["Bulgaria
I would like to replace all the [" occurrences for [', but I don't know exactly how to do it...
I just tried:
str = str.replaceAll("[\\\"", "['");
with the result
java.util.regex.PatternSyntaxException: Unclosed character class near index 2 [\"
and
html = html.replaceAll("[\"", "['");
with the result
java.util.regex.PatternSyntaxException: Unclosed character class near index 1 [" ^
any help will be appreciated
Try this:
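str = str.replaceAll("\\[\"", "['");
In a Java string literal a regex backslash has to be written as \\, and [ is a special character in regex, hence the \\ in front of it. " is only special in the Java string literal, not in the regex, so a single \ is enough to escape it. Strings are immutable, so remember to assign the result.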
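Since the text being replaced is a fixed string, the regex can also be avoided. This is a small sketch of two alternatives, String.replace (which treats its arguments literally) and Pattern.quote (which escapes the whole search string); the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class BracketQuoteDemo {
    // String.replace does a literal replacement, so '[' needs no escaping.
    static String viaReplace(String s) {
        return s.replace("[\"", "['");
    }

    // Pattern.quote turns the search text into a literal regex for replaceAll.
    static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("[\""), "['");
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("[\"Bulgaria")); // ['Bulgaria
        System.out.println(viaQuote("[\"Bulgaria"));   // ['Bulgaria
    }
}
```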
"Test[\"".replaceAll("\\[\"", "['"); // Test['
This works just fine for normal string literal ("hello").
"([^"]*)"
But I also want my regex to match literal such as "hell\"o".
This is what I have been able to come up with, but it doesn't work.
("(?=(\\")*)[^"]*")
Here I have tried to look ahead for \".
How about
Pattern.compile("\"((\\\\\"|[^\"])*)\"")//
                 ^^ - to match " literal
                     ^^^^ - to match \ literal
                     ^^^^^^ - will match \" literal
or
Pattern.compile("\"((?:\\\\\"|[^\"])*)\"")//
if you don't want to add more capturing groups.
This regex accepts \" or any character other than " between the quotation marks.
Demo:
String input = "ab \"cd\" ef \"gh \\\"ij\"";
Matcher m = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"").matcher(input);
while (m.find())
System.out.println(m.group(1));
Output:
cd
gh \"ij
Use this method:
"((?:[^"\\\\]*|\\\\.)*)"
(Backslashes are shown doubled, as they would appear in a Java string literal.) [^"\\\\]* no longer matches \ either, while the other alternative, \\\\., matches any escaped character.
Try with this one:
Pattern pattern = Pattern.compile("((?:\\\\\"|[^\"])*)");
\\\\\" to match \" or,
[^\"] to match anything but "
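The escaped-character approach can be sketched as runnable Java as follows. This is an illustration rather than a definitive version: it uses [^"\\] without the inner *, which is a small variation on the patterns above, and the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralDemo {
    // Regex: "((?:[^"\\]|\\.)*)"
    // Inside the quotes, accept either one ordinary character
    // (not a quote, not a backslash) or a backslash followed by any character.
    private static final Pattern LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Hypothetical helper: returns the contents of each quoted literal.
    static List<String> contents(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The input text is: ab "cd" ef "gh \"ij"
        for (String s : contents("ab \"cd\" ef \"gh \\\"ij\"")) {
            System.out.println(s);
        }
    }
}
```

Unlike an alternative that only recognises \", matching a backslash plus any character also copes with an escaped backslash (\\) directly before the closing quote.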
I want to split the string
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
to
name
employeeno
dob
joindate
I wrote the following Java code for this, but it prints only name; the other matches are not printed.
String fields = "name[Employee Name], employeeno[Employee No], dob[Date of Birth], joindate[Date of Joining]";
Pattern pattern = Pattern.compile("\\[.+\\]+?,?\\s*" );
String[] split = pattern.split(fields);
for (String string : split) {
System.out.println(string);
}
What am I doing wrong here?
Thank you
This part:
\\[.+\\]
matches the first [, the .+ then gobbles up the entire string (if no line breaks are in the string) and then the \\] will match the last ].
You need to make the .+ reluctant by placing a ? after it:
Pattern pattern = Pattern.compile("\\[.+?\\]+?,?\\s*");
And shouldn't \\]+? just be \\] ?
The error is that you are matching greedily. You can change it to a non-greedy match:
Pattern.compile("\\[.+?\\],?\\s*")
                      ^
There's an online regular expression tester at http://gskinner.com/RegExr/?2sa45 that will help you a lot when you try to understand regular expressions and how they are applied to a given input.
Would it be better to use a negated character class to match the text inside the square brackets? \[(\w+\s)+\w+[^\]]\]
For a good example, see "How does using a negated character class work internally (without backtracking)?".
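As a minimal sketch of the negated-character-class idea applied to the original split, the pattern below uses [^\]]* for the bracket contents, so the match cannot run past the first closing bracket. It is a simpler pattern than the one suggested above, and the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

class FieldNamesDemo {
    // Regex: \[[^\]]*\],?\s*
    // A bracketed label, then an optional comma and any following whitespace.
    private static final Pattern LABEL = Pattern.compile("\\[[^\\]]*\\],?\\s*");

    // Hypothetical helper: splits on the bracketed labels, leaving the field names.
    static String[] names(String fields) {
        return LABEL.split(fields);
    }

    public static void main(String[] args) {
        String fields = "name[Employee Name], employeeno[Employee No], "
                + "dob[Date of Birth], joindate[Date of Joining]";
        for (String name : names(fields)) {
            System.out.println(name); // name, employeeno, dob, joindate (one per line)
        }
    }
}
```

The last label sits at the very end of the input, which would produce a trailing empty string, but Pattern.split drops trailing empty strings by default.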