Regex - negate a group within a class - java

I'm using Java's matcher to group terms in a string using the following regex:
Pattern.compile("(\\\\\"[^\\\\\"]*\\\\\"|[^\\s\\\\\"]+)");
This is the part I'm having trouble with: [^\s\\\"]
I'd like it to only match non-spaces and dangling escaped quotes such as \". Is there any way to group the \\ and \" within a character class so they're only matched together?
I tried to use lookahead/lookbehind, but found that including it within the character class put me back at square one.

A character class matches a single character. If I understood you correctly, you want to match only the string \". To do this, you don't need a character class at all--the regex \\" matches that already! (Inside a Java string, it would look like \\\\\" which is ridiculous, but there you have it.)
You can group things together using parentheses: (\\\\\"). You can also alternate inside a group like this using |. So to match non-spaces or \", you can do this: (\S|\\\\\"). (Note that \S is the same as [^\s].)
EDIT: I wasn't paying enough attention. You can match everything but \" or a space as follows: (\\\\(?!")|[^\s\\]), I think.
How about this: ([^\\s\\\\]|\\\\(?!\")). Written as a Java string, this matches any single character that is neither whitespace nor a \, or a \ that is not followed by a ".
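To see how the pieces from the answers above could fit together, here is a minimal sketch. It assumes the input really contains backslash-quote pairs as phrase delimiters, and it repeats the last answer's alternation with + so that it matches whole tokens. The class name, method name and sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapedQuoteTokens {
    // Regex before Java escaping: \\"[^\\"]*\\"|(?:[^\s\\]|\\(?!"))+
    //   first alternative:  a phrase delimited by backslash-quote pairs
    //   second alternative: a run of characters that are neither whitespace
    //                       nor a backslash, or a backslash not followed by a quote
    private static final Pattern TOKEN = Pattern.compile(
            "\\\\\"[^\\\\\"]*\\\\\"|(?:[^\\s\\\\]|\\\\(?!\"))+");

    // Collects every match of TOKEN, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The Java literal below is the text: foo \"bar baz\" a\b
        System.out.println(tokenize("foo \\\"bar baz\\\" a\\b"));
    }
}
```

With that sample input the delimited phrase stays in one token, while the backslash in a\b is kept because it is not followed by a quote.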

Related

Regex Match word that include a Dot

I have a question. Take this sentence, for example:
"HalloAnna daveca.nn dave anna ca. anna"
I only want to match the standalone "ca.".
My regex looks like this:
(?i)\b(ca\.)\b
But this doesn't work and I don't know why. Any ideas?
//Update
I execute it with:
testSource.replaceAll()
and with
pattern.matcher(testSource).replaceAll().
Neither works.
You must escape the dot and assert a non-word following:
(?i)\bca\.(?=\W)
See live demo.
You should use it like this:
Pattern.compile("(?i)\\b(ca\\.)(?=\\W)").matcher(a).replaceAll("SOME TEXT");
Which, if you omit the Java escapes, gives the regex (?i)\b(ca\.)(?=\W).
Every \ in a plain regex has to be doubled in a Java string literal: \\.
Also, \b is a word boundary, but it only matches at a position between a word character (letter, digit or underscore) and a non-word character, or at the start or end of the input next to a word character. In your case the match ends with a dot, which is not a word character, and the dot is followed by a space, which is not one either, so \b cannot match after the dot. Instead you can require that a non-word character follows, using \W. To keep that character out of the match (so it won't be replaced), wrap it in a lookahead: (?=\W).
Another point: an unescaped . matches any character, but you want to match a literal dot, so you have to escape it as \., which in a Java string becomes \\..
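A small sketch of the lookahead approach, using the sentence from the question. The second pattern, with a negative lookahead (?!\w), is not from the answers; it is an assumption on my part about what you may want if "ca." can also appear at the very end of the text, where (?=\W) has no following character to look at. Class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StandaloneCa {
    // Pattern from the answers: word boundary, "ca", a literal dot, then a
    // lookahead requiring a non-word character (which is not consumed).
    static final Pattern FOLLOWED_BY_NON_WORD = Pattern.compile("(?i)\\bca\\.(?=\\W)");

    // Variant not taken from the answers: a negative lookahead also accepts
    // "ca." at the very end of the input.
    static final Pattern NOT_FOLLOWED_BY_WORD = Pattern.compile("(?i)\\bca\\.(?!\\w)");

    static String replace(Pattern pattern, String text, String replacement) {
        // quoteReplacement keeps $ and \ in the replacement from being interpreted.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String sentence = "HalloAnna daveca.nn dave anna ca. anna";
        // prints: HalloAnna daveca.nn dave anna SOME TEXT anna
        System.out.println(replace(FOLLOWED_BY_NON_WORD, sentence, "SOME TEXT"));
        // prints: anna SOME TEXT
        System.out.println(replace(NOT_FOLLOWED_BY_WORD, "anna ca.", "SOME TEXT"));
    }
}
```

The "ca." inside "daveca.nn" is left alone in both cases because there is no word boundary between the "e" and the "c".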

Regular expression for match

How can I find lines like this in a file?
######## this_is_a_line.sh ########
I tried the regular expression below:
(#)+( )+(A-Za-z0-9_)+(.sh)( )+(#)+
But it doesn't seem to work. Can anyone tell me what is wrong?
You used (...) (a capturing group) instead of [...] (a character class). Use the character class:
#+ +[A-Za-z0-9_]+\.sh +#+
    ^^^^^^^^^^^^^
See regex demo (note most of the capture groups are redundant here, and I removed them. Also, . must be escaped to match a literal dot.)
The [A-Za-z0-9_]+ matches 1 or more letters, digits or _.
The (A-Za-z0-9_)+ matches 1 or more repetitions of the literal text A-Za-z0-9_ (see demo).
Also, in Java, you can use \w to match [A-Za-z0-9_] and shorten your regex to
#+ +\w+\.sh +#+
Do not forget that you need to double each \ in the pattern string in Java (String pattern = "#+ +\\w+\\.sh +#+";).
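Here is a rough sketch of how the shortened pattern could be applied to the lines of a file. To keep it self-contained it filters an in-memory list instead of reading a real file, and the class name, method name and sample lines are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class BannerLines {
    // Shortened pattern from the answer, with backslashes doubled for Java.
    private static final Pattern BANNER = Pattern.compile("#+ +\\w+\\.sh +#+");

    // Pattern from the question, kept only for comparison.
    static final Pattern ORIGINAL = Pattern.compile("(#)+( )+(A-Za-z0-9_)+(.sh)( )+(#)+");

    // Keeps only the lines that match the banner pattern in full.
    static List<String> findBanners(List<String> lines) {
        return lines.stream()
                .filter(line -> BANNER.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "######## this_is_a_line.sh ########",
                "echo hello",
                "# just a comment");
        System.out.println(findBanners(lines));
        // The original only matches the literal text "A-Za-z0-9_" in the middle:
        System.out.println(ORIGINAL.matcher("# A-Za-z0-9_.sh #").matches()); // prints: true
        System.out.println(ORIGINAL.matcher(lines.get(0)).matches());        // prints: false
    }
}
```

If the banner can appear with other text around it on the same line, find() instead of matches() would be the call to reach for.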

what is missing in my java regex?

I want to fetch
http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png
from
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
I have tried this code:
String a = "";
Pattern pattern = Pattern.compile("url(.*)");
Matcher matcher = pattern.matcher(imgpath);
if (matcher.find()) {
a = (matcher.group(1));
}
return a;
but a == (http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_639_o_4746_precious_image_1419867529.png)
How can I fine-tune it?
Why use a regular expression to begin with?
Given
final String s = "url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)";
If the string always has the same format, a simple s.substring(4, s.length() - 1) would be better.
That said, if you insist on a regular expression:
You have to escape the ( as \(, and because the \ itself has to be escaped in a Java string, it becomes \\(. The same goes for the ).
Then you can capture the URL with url\\((.+)\\).
Learn to use RegEx101.com before coming here, it will point out errors like this immediately.
As you already seem to know, ( and ) represent groups, which means that in the regex
url(.*)
(.*) will place everything after url in group 1, which in the case of
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
will be
(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
If you want to exclude ( and ) from the match, you need to add them to the regex as literals, which means you need to escape them. There are many ways to do it, like adding \ before each of them, or surrounding each of them with [ ].
Another problem with your regex is that .* finds the longest possible match, and since . represents any character (except line separators), it can also include ( and ). To solve this problem, you can make the * quantifier reluctant by adding ? after it, so your final regex can be written as the string
"url\\((.*?)\\)"
---------------
url
\\( - ( literal
(.*?) - group 1
\\) - ) literal
or, instead of ., you can use a character class that accepts all characters except ), like
"url\\(([^)]*)\\)"
Try this regex:
url\((.*?)\)
The outermost parentheses are escaped so they will be matched literally. The inner parentheses are for capturing a group. The question mark after the .* is to make the match lazy, so the first closing parenthesis found will end the group.
Note that to use this regex in Java, you'll have to additionally escape the backslashes in order to express the above regex as a string literal:
String regex = "url\\((.*?)\\)";
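To make the difference between the greedy and the lazy quantifier visible, here is a small sketch. The input with two url(...) entries is hypothetical (the question only has one), and the helper name is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsReluctant {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "url(a.png) url(b.png)";
        // Greedy: .* runs to the end and backtracks to the LAST ")".
        System.out.println(firstGroup("url\\((.*)\\)", input));    // prints: a.png) url(b.png
        // Reluctant: .*? stops at the FIRST ")".
        System.out.println(firstGroup("url\\((.*?)\\)", input));   // prints: a.png
        // Negated class: [^)]* cannot run past a ")" in the first place.
        System.out.println(firstGroup("url\\(([^)]*)\\)", input)); // prints: a.png
    }
}
```

With a single url(...) in the input, as in the question, all three variants capture the same text; the difference only shows once a second closing parenthesis appears later in the string.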
You need to escape the () to match the parentheses in the string, and then add another set of () around the part you want to pull out in group 1, the actual URL. I also changed the part inside the parentheses to [^)]*, which will match everything until it finds a ). See below:
url\(([^)]*)\)
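Putting the answers together, a possible helper could look like the sketch below. It shows both the regex route and the plain substring route suggested earlier; the class and method names are illustrative, and returning an Optional is just one way to handle input that has no url(...) in it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CssUrl {
    // Escaped outer parentheses match literally; the inner pair is group 1.
    private static final Pattern URL_FUNCTION = Pattern.compile("url\\(([^)]*)\\)");

    // Regex route: works even when other text surrounds url(...).
    static Optional<String> extract(String value) {
        Matcher m = URL_FUNCTION.matcher(value);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Substring route: only valid if the value is exactly "url(" + something + ")".
    static String extractFixedFormat(String value) {
        return value.substring(4, value.length() - 1);
    }

    public static void main(String[] args) {
        String imgpath = "url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/"
                + "p_608_o_6288_precious_image_1419866866.png)";
        System.out.println(extract(imgpath).orElse(""));
        System.out.println(extractFixedFormat(imgpath));
    }
}
```

Both calls in main print the bare address without the surrounding url( and ).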

Pattern Matching failed when "\" is input

My pattern is something like this:
"^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$"
But when I try to match something with a backslash in it, like this:
"abc\\#abc"
...it does not match. Can anyone explain why?
Try the pattern below:
"^[a-zA-Z0-9_'^&/+-\\\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$";
or
"^[a-zA-Z0-9_'^&/+-\\{0,}}.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$";
The expression \\ matches a single backslash \
Try escaping each backslash of your test string with an additional backslash: e.g.
"abc\\\\#abc" becomes "abc\\\\\\\\#abc"
You need to use "\\\\" if you want the end result to look like "\"
Why, you ask?
The Java compiler sees the string "\\\\" and turns that into "\\", as "\" is an escape character.
Afterwards the regex engine sees the string "\\" and turns it into "\", as "\" is an escape character there too.
So to match a single backslash you must type four.
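The two layers of escaping can be checked directly. This is only a sketch; the class and method names are invented, and Pattern.quote is shown as an alternative for when you would rather not count backslashes by hand.

```java
import java.util.regex.Pattern;

final class BackslashLayers {
    // True if the given regex source matches a text consisting of one backslash.
    static boolean matchesOneBackslash(String regexSource) {
        return Pattern.matches(regexSource, "\\");
    }

    public static void main(String[] args) {
        String target = "\\";        // Java literal for a 1-character text
        String regexSource = "\\\\"; // Java literal for a 2-character regex
        System.out.println(target.length());                   // prints: 1
        System.out.println(regexSource.length());              // prints: 2
        System.out.println(matchesOneBackslash(regexSource));  // prints: true
        // Pattern.quote builds a regex that matches its argument literally.
        System.out.println(matchesOneBackslash(Pattern.quote(target))); // prints: true
    }
}
```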
I'm assuming you're writing the regex in your Java source code, like this:
Pattern p = Pattern.compile(
"^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$"
);
I'm also assuming you meant \\. as a backslash followed by a dot, not as an escaped dot.
Because it's in a string literal, you have to escape backslashes one more time. That means you have to use four backslashes in the regex to match one in the target string. You also need to escape the - (hyphen), or move it to the end of the character class as I did below, so the regex compiler doesn't think (for example) that [+-.] is meant to be a range expression like [0-9] or [a-z].
"^[a-zA-Z0-9_'^&/+\\\\.-]+#[a-zA-Z0-9_'^&/+.-]+$"
I also changed your {1,} to + because it means the same thing, and got rid of the {1,1} because it doesn't do anything. And I changed your &amp; to &. I don't know how that got in there, but if you wrote it that way in your source code, it's wrong.
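A quick way to compare the pattern from the question with the rewritten one is sketched below. The class name is made up; the two patterns are copied from the question and from the answer above.

```java
import java.util.regex.Pattern;

final class BackslashInClass {
    // Pattern from the question: inside the class, +-\. is read as a range
    // from '+' to '.', so a backslash is not part of the class.
    static final Pattern ORIGINAL = Pattern.compile(
            "^[a-zA-Z0-9_'^&/+-\\.]{1,}#{1,1}[a-zA-Z0-9_'^&/+-.]{1,}$");

    // Rewritten pattern: four backslashes in the source give one literal
    // backslash in the class, and the hyphen sits at the end as a literal.
    static final Pattern FIXED = Pattern.compile(
            "^[a-zA-Z0-9_'^&/+\\\\.-]+#[a-zA-Z0-9_'^&/+.-]+$");

    public static void main(String[] args) {
        String input = "abc\\#abc"; // Java literal for the text abc, backslash, #abc
        System.out.println(ORIGINAL.matcher(input).matches()); // prints: false
        System.out.println(FIXED.matcher(input).matches());    // prints: true
    }
}
```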

java regex pattern unclosed character class

I need some help. I'm getting:
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 24
^[a-zA-Z└- 0-9£µ /.'-\]*$
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.clazz(Pattern.java:2254)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
Here is my code:
String testString = value.toString();
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$");
Matcher m = pattern.matcher(testString);
I have to use the Unicode value for some characters because I'm working with XHTML.
Any help would be great!
Assuming that you want to match \ and - and not ]:
Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");
You need to double-escape your backslashes, as \ is also an escape character in regex. Thus \\] escapes the backslash for Java but not for the regex engine, which then sees \] as an escaped bracket. You need to add another Java-escaped \ in order to regex-escape your second Java-escaped \.
So \\\\ after Java escaping becomes \\, which is then regex-escaped to \.
Moving - to the end of the sequence means that it is used as a literal character instead of a range operator, as pointed out by Pshemo.
It is hard to say what you are trying to achieve, but I can see a few strange things in your regex:
You opened a character class but never closed it. Instead you used \\], which makes ] an ordinary character.
If you want to include ] in your character class, then you need an additional ] at the end, like "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]]*$"
If you want to include \ in your character class, then you need to use the \\\\ version, because you need to escape its special meaning twice: once for the regex engine and once for the Java string.
You used - in '-\\], which inside a character class specifies a range of characters, like a-z or A-Z. To remove its special meaning you need to use \\-
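The difference between the two patterns can be reproduced with a short sketch. It assumes, as the first answer does, that the goal is to allow a literal backslash and a literal hyphen; the class name, helper names and sample values are invented.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class UnclosedClass {
    // Pattern from the question: the regex engine reads the escaped bracket
    // as a literal, so the character class is never closed.
    static final String BROKEN = "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$";

    // Pattern from the first answer: four backslashes for one literal
    // backslash, and the hyphen moved to the end of the class.
    static final String FIXED = "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$";

    // True if the regex source compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    static boolean isValid(String value) {
        return Pattern.matches(FIXED, value);
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN));             // prints: false
        System.out.println(compiles(FIXED));              // prints: true
        System.out.println(isValid("O'Neil-Smith 12/3")); // prints: true
        System.out.println(isValid("a]b"));               // prints: false
    }
}
```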
