Regex match for string literal including escape sequence

This works just fine for a normal string literal ("hello"):
"([^"]*)"
But I also want my regex to match a literal such as "hell\"o".
This is what I have been able to come up with, but it doesn't work:
("(?=(\\")*)[^"]*")
Here I have tried to look ahead for \".

How about
Pattern.compile("\"((\\\\\"|[^\"])*)\"")
where, inside the Java string literal:
\" matches a " literal,
\\\\ matches a \ literal,
\\\\\" matches a \" literal;
or
Pattern.compile("\"((?:\\\\\"|[^\"])*)\"")
if you don't want to add more capturing groups.
This regex accepts \" or any non-" character between the quotation marks.
Demo:
String input = "ab \"cd\" ef \"gh \\\"ij\"";
Matcher m = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"").matcher(input);
while (m.find())
System.out.println(m.group(1));
Output:
cd
gh \"ij
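The captured group still contains the escape backslash (gh \"ij). If the unescaped value is wanted, a minimal sketch, assuming \" is the only escape sequence that occurs in the input, is to post-process group(1) after matching. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnescapeDemo {
    // Same pattern as in the answer: a quote, then any run of \" or non-quote chars, then a quote.
    private static final Pattern QUOTED = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"");

    static List<String> values(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            // group(1) keeps the raw text (e.g. gh \"ij); turn each \" back into a plain "
            out.add(m.group(1).replace("\\\"", "\""));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "ab \"cd\" ef \"gh \\\"ij\"";
        for (String v : values(input)) {
            System.out.println(v); // prints cd, then gh "ij
        }
    }
}
```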

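Use this pattern:
"((?:[^"\\\\]*|\\\\.)*)"
(the backslashes are doubled as they would be inside a Java string literal; the regex itself is "((?:[^"\\]*|\\.)*)").
Now [^"\\\\]* will not match \ either, just as it does not match ". Instead, the other alternative matches a backslash followed by any character, so every escaped character is consumed as a pair.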
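A minimal runnable sketch of this approach follows. Dropping the inner * is an assumption of the sketch, not part of the answer, giving "((?:[^"\\]|\\.)*)": with a nested quantifier such as (?:[^"\\]*|...)* the engine can try many ways of splitting the same run of characters when no closing quote is found. The sample input includes a literal ending in an escaped backslash ("C:\\"), which this pattern handles because the backslash pair is consumed as one unit; with the \"-or-non-quote alternation, the second backslash would instead pair with the closing quote. QuotedStrings and contents are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedStrings {
    // Regex: "((?:[^"\\]|\\.)*)"
    // Each step consumes either one char that is neither " nor \, or a backslash plus
    // the char it escapes, so an escaped quote can never close the literal.
    private static final Pattern LITERAL = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    static List<String> contents(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LITERAL.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Text being searched: x = "hell\"o"; y = "C:\\"; z = "ok"
        String input = "x = \"hell\\\"o\"; y = \"C:\\\\\"; z = \"ok\"";
        for (String c : contents(input)) {
            System.out.println(c); // hell\"o, then C:\\, then ok
        }
    }
}
```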

Try with this one:
Pattern pattern = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"");
\\\\\" (in the Java string) to match \", or
[^\"] to match anything but "

Related

Exclude regex in java

I have this line, which takes a regex and matches the value from the response:
Matcher m = Pattern.compile("(" + elem.get("urlRegex").getAsString() + ")").matcher(response);
And here is the value of elem.get("urlRegex").getAsString():
https?://(www\.)?facebook\.com/(?!(i|bussiness|legal|dialog|sharer|share\.phpr|tr|business|platform|help|ads|policies|selfxss|audiencenetwork)$)([a-zA-Z0-9_\-]|(\.))+
and response is an HTTPS response body.
This regex should match URLs like
https://www.facebook.com/testaksdflasfjasldf
https://www.facebook.com/rqwerpoiqwern
https://www.facebook.com/gbjkdasjasdfuiew
and it shouldn't match URLs like
https://www.facebook.com/i
https://www.facebook.com/bussiness
https://www.facebook.com/legal
https://www.facebook.com/sharer
But it matches both, and the exclusion doesn't work.
I debugged it on regex101, and there it works.
Edit 1:
I removed $ from the exclusion and it works,
but because i is in the exclusion group,
the regex will not match URLs like
https://www.facebook.com/intel
https://www.facebook.com/inscanasdas
https://www.facebook.com/iasdasdasd
Edit 2:
I tested code similar to mine with this regex on https://www.jdoodle.com/online-java-compiler/, and the regex works there.

You have a few mistakes in your regex:
Escaping only needs a single backslash, not two.
All characters with special meaning in regex (like ?, (, ), .) need to be escaped.
The last part of your regex was wrong.
Use this:
https\?://\(www\.\)\?facebook\.com/(?!(i|bussiness|legal|dialog|sharer|share\.phpr|tr|business|platform|help|ads|policies|selfxss|audiencenetwork)$)[a-zA-Z0-9_\-]+
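The question's Edit 1 problem (dropping $ also blocks /intel) likely comes from what the exclusion is anchored to: without MULTILINE, $ inside the lookahead means the end of the whole input, which a URL in the middle of a long response never reaches. A sketch under that reading, which is an assumption rather than part of the answer above: replace $ with a nested negative lookahead that only rejects a reserved word when no further path character follows it. The allowed path characters [A-Za-z0-9_.\-] are taken from the question; FacebookUrls and find are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FacebookUrls {
    // Reserved words copied from the question's exclusion list.
    private static final String RESERVED =
        "i|bussiness|legal|dialog|sharer|share\\.phpr|tr|business|platform|help|ads|policies|selfxss|audiencenetwork";

    // Characters assumed to be allowed in the profile path segment.
    private static final String SEG = "[A-Za-z0-9_.\\-]";

    // (?!(?:RESERVED)(?!SEG)) rejects a reserved word only when it is the whole segment,
    // i.e. when it is not followed by another segment character (so /intel still matches).
    private static final Pattern URL = Pattern.compile(
        "https?://(?:www\\.)?facebook\\.com/(?!(?:" + RESERVED + ")(?!" + SEG + "))" + SEG + "+");

    static List<String> find(String response) {
        List<String> out = new ArrayList<>();
        Matcher m = URL.matcher(response);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String response = "x https://www.facebook.com/testaksdflasfjasldf y https://www.facebook.com/i "
            + "z https://www.facebook.com/intel w https://www.facebook.com/legal";
        find(response).forEach(System.out::println); // the testaks... and intel URLs only
    }
}
```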

Regex pattern matching is getting timed out

I want to split an input string based on a regex pattern using the Pattern.split(String) API. The regex uses both positive and negative lookarounds. The regex is supposed to split on a delimiter (,) and needs to ignore the delimiter if it is enclosed in double quotes ("x,y").
The regex is:
(?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)
The input string for which this split call times out is:
"","1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]","QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I read that lookaround techniques are heavy and can cause timeouts if the string is too long. If I remove the backslashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, the regex is able to detect the delimiters and split.
The pattern also does not detect the first delimiter at [STIFFENER]","QH20426AD3 in the above string. Again, if I remove the backslashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, the regex detects it.
I am not very experienced with lookarounds in regex. Can someone please give hints about how I can optimize this regex and avoid timeouts?
Any pointers or article links are appreciated!

If you want to split on a comma only when it is followed by a complete double-quoted string (from an opening double quote to its closing double quote):
,(?="[^"\\]*(?:\\.[^"\\]*)*")
The pattern matches:
, Match a comma
(?= Positive lookahead
"[^"\\]* Match " and then 0+ times any char except " or \
(?:\\.[^"\\]*)*" Optionally repeat matching \ followed by any char (the escaped char, via the .) and again any chars other than " and \, then match the closing "
) Close lookahead
String string = "\"\",\"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]\",\"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\\\"BOLT,HI-JOK\\\"]\"\n";
String[] parts = string.split(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
for (String part : parts)
System.out.println(part);
Output
""
"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]"
"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"

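If the regex keeps timing out on longer inputs, one alternative (a different technique, not the regex above) is a single left-to-right scan that tracks whether it is inside double quotes, so each character is looked at once. The sketch assumes, as in the sample data, that a backslash escapes the next character, and it keeps the quotes in each field like the regex output does. Unlike the lookahead, it splits on every comma outside quotes, not only on commas followed by a quoted field. CsvLineSplitter and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvLineSplitter {
    // Splits on commas that are outside double-quoted sections.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                // Copy the backslash and the escaped char verbatim; an escaped " never toggles quoting.
                current.append(c).append(line.charAt(++i));
            } else if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == ',' && !inQuotes) {
                // Top-level comma: close the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Text being split: "","a,b","c [\"BOLT,HI-JOK\"]"
        String line = "\"\",\"a,b\",\"c [\\\"BOLT,HI-JOK\\\"]\"";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```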
Regex to extract hashtags with two dot-separated parts

I'm trying to create a regular expression to extract some text from strings. I want to extract text from URLs or normal text messages, e.g.:
endpoint/?userId=#someuser.id
OR
Hi #someuser.name, how are you?
From both I want to extract exactly #someuser.name from the message and #someuser.id from the URL. There might be many of those strings to extract from the URLs and messages.
My regular expression currently looks like this:
(#[^\.]+?\.)([^\W]\w+\b)
It works fine, except for one case that I don't know how to handle, e.g.:
These strings SHOULD NOT be matched: # .id, #.id. There must be at least one character between # and the dot, and one or more spaces alone between those characters should not be matched.
How can I do it using my current regex?

You may use
String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";
Details
# - a # symbol
[^.#]* - zero or more chars other than . and #
[^.#\\s] - any char but ., # and whitespace
[^#.]* - zero or more chars other than . and #
\. - a dot
\w+ - 1+ word chars (letters, digits or _).
Java demo:
String s = "# #.id\nendpoint/?userId=#someuser.id\nHi #someuser.name, how are you?";
String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(0));
}
Output:
#someuser.id
#someuser.name

You can try the following regex:
#(\w+)\.(\w+)
Notes:
Remove the parentheses if you do not want to capture any groups.
In your Java regex string you need to escape every \;
this gives #(\\w+)\\.(\\w+)
If the id is made only of digits, you can change the second \w to [0-9].
If the username includes characters other than letters, digits and underscore, you have to change \w into a character class with all the allowed characters defined explicitly.
Code sample:
String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id, #.id.";
Matcher m = Pattern.compile("#(\\w+)\\.(\\w+)").matcher(input);
while (m.find()) {
System.out.println(m.group());
}
output:
#someuser.id
#someuser.name

The redefined requirements are:
We search for pattern #A.B
A can be anything except only whitespace, and it may not contain # or .
B can only be regular ASCII letters or digits
Converting those requirements into a (possible) regex:
#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+
Explanation:
#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+ # The entire capture for the Java-Matcher:
# # A literal '#' character
[^.#]+ # Followed by 1 or more characters which are NOT '.' nor '#'
( \\.) # Followed by a '.' character
(?<! ) # Which is NOT preceded by (negative lookbehind):
# # A literal '#'
\\s+ # With 1 or more whitespaces
[A-Za-z0-9]+ # Followed by 1 or more alphanumeric characters
# (PS: \\w+ could be used here if '_' is allowed as well)
Test code:
String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id #.id %^*##*(.H(#EH Ok, # some spaces here .but none here #$p€©ï#l.$p€©ï#l that should do it..";
System.out.println("Input: \""+ input + '"');
System.out.println("Outputs: ");
java.util.regex.Matcher matcher = java.util.regex.Pattern.compile("#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+")
.matcher(input);
while(matcher.find())
System.out.println('"'+matcher.group()+'"');
Which outputs:
Input: "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id #.id %^*##*(.H(#EH Ok, # some spaces here .but none here #$p€©ï#l.$p€©ï#l that should do it.."
Outputs:
"#someuser.id"
"#someuser.name"
"#*(.H"
"# some spaces here .but"
"# some spaces here .but"

#(\w+)[.](\w+)
results in two groups, e.g.
endpoint/?userId=#someuser.id -> group(1) = someuser and group(2) = id (group(0) is the whole match, #someuser.id)
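In Java, group(0) is always the entire match and the parenthesized groups are numbered from 1. A small runnable check of that numbering; HashtagGroups and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashtagGroups {
    private static final Pattern TAG = Pattern.compile("#(\\w+)[.](\\w+)");

    // Returns {whole match, group 1, group 2} for the first match, or null if there is none.
    static String[] firstMatch(String s) {
        Matcher m = TAG.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(0), m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] g = firstMatch("endpoint/?userId=#someuser.id");
        System.out.println("group(0) = " + g[0]); // #someuser.id
        System.out.println("group(1) = " + g[1]); // someuser
        System.out.println("group(2) = " + g[2]); // id
    }
}
```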

How to capture multiple groups in regex?

I am trying to capture the following word and number:
stxt:usa,city:14
I can capture usa and 14 using:
stxt:(.*?),city:(\d.*)$
However, when the text is:
stxt:usa
the regex does not match. I tried to apply an or condition using |, but it did not work:
stxt:(.*?),|city:(\d.*)$

You may use
(stxt|city):([^,]+)
Pattern details:
(stxt|city) - either the stxt or the city substring (you may add \b before the ( to match only a whole word) (Group 1)
: - a colon
([^,]+) - 1 or more characters other than a comma (Group 2).
Java demo:
String s = "stxt:usa,city:14";
Pattern pattern = Pattern.compile("(stxt|city):([^,]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
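Because each key:value pair is found by a separate find() call, the same pattern works whether city is present or not. A sketch that collects the pairs into a map, assuming each key appears at most once (a repeated key would overwrite the earlier value); KeyValues and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValues {
    // Group 1 = key (stxt or city), group 2 = everything up to the next comma.
    private static final Pattern PAIR = Pattern.compile("(stxt|city):([^,]+)");

    static Map<String, String> parse(String s) {
        Map<String, String> out = new LinkedHashMap<>(); // keeps input order
        Matcher m = PAIR.matcher(s);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("stxt:usa,city:14")); // {stxt=usa, city=14}
        System.out.println(parse("stxt:usa"));         // {stxt=usa}
    }
}
```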

Looking at your string, you could also just find the word/digits after each colon:
:(\w+)
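A minimal sketch of that idea in Java. Note that it discards the key, so on its own it cannot tell a stxt value from a city value; AfterColon and values are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AfterColon {
    private static final Pattern VALUE = Pattern.compile(":(\\w+)");

    // Collects group 1: the run of word chars right after each colon.
    static List<String> values(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("stxt:usa,city:14")); // [usa, 14]
        System.out.println(values("stxt:usa"));         // [usa]
    }
}
```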

Regular expression to match start of an address

I would like to match the start of an address in Java. I tried it on this website (http://www.regexplanet.com/advanced/java/index.html) and it matched the address, but the minute I tried it in NetBeans it did not.
Any idea why?
Pattern p = Pattern.compile("\bcloud.*");
Matcher m = p.matcher("cloud (cloud.yahoo.com:225) - v0.00014 ( jan 10 1999 / 24:12:56 )");
System.out.println(m.matches());

The \ should be escaped. Otherwise "\b" in a Java string literal is interpreted as a BACKSPACE character instead of a word boundary.
Pattern p = Pattern.compile("\\bcloud.*"); // note the doubled backslash before b
See http://ideone.com/1rdLg6
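A self-contained comparison of the two string literals; the class name WordBoundaryDemo and the fullMatch helper are hypothetical. matches() requires the whole input to match, which cloud.* satisfies once the word boundary is recognized.

```java
import java.util.regex.Pattern;

final class WordBoundaryDemo {
    static final String INPUT = "cloud (cloud.yahoo.com:225) - v0.00014 ( jan 10 1999 / 24:12:56 )";

    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "\b" in Java source is a single backspace char (U+0008), not a regex token.
        System.out.println((int) "\b".charAt(0));           // 8
        // "\\b" reaches the regex engine as \b, the word-boundary assertion.
        System.out.println(fullMatch("\\bcloud.*", INPUT)); // true
        // The backspace char does not occur in INPUT, so this cannot match.
        System.out.println(fullMatch("\bcloud.*", INPUT));  // false
    }
}
```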

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
