I have a string:
HLN (Formerly Headline News)
I want to remove everything inside the parens and the parens themselves, leaving only:
HLN
I've tried to do this with a regex, but my difficulty is with this pattern:
"(.+?)"
When I use it, it always gives me a PatternSyntaxException. How can I fix my regex?
Because parentheses are special characters in regexps you need to escape them to match them explicitly.
For example:
"\\(.+?\\)"
String foo = "(x)()foo(x)()";
String cleanFoo = foo.replaceAll("\\([^\\(]*\\)", "");
// cleanFoo value will be "foo"
The above removes both empty and non-empty parenthesized groups, wherever they appear in the string.
plain regex:
\([^\(]*\)
You can test here: http://www.regexplanet.com/simple/index.html
My code is based on previous answers
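As a minimal sketch, here is the escaped pattern applied to the string from the question. The class and method names are made up for illustration, and the leading \\s* is an addition (not part of the answers above) so that the space before the parenthesis is removed as well:

```java
class ParenStripper {
    // Hypothetical helper: removes each "(...)" group plus any whitespace directly before it.
    static String stripParenthetical(String s) {
        // \\s*   optional whitespace before the group
        // \\(    a literal opening parenthesis
        // .+?    one or more characters, as few as possible
        // \\)    a literal closing parenthesis
        return s.replaceAll("\\s*\\(.+?\\)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripParenthetical("HLN (Formerly Headline News)")); // prints HLN
    }
}
```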
You could use the following regular expression to find parentheticals:
\([^)]*\)
The \( matches a left parenthesis, the [^)]* matches any number of characters other than a right parenthesis, and the \) matches a right parenthesis.
If you're including this in a Java string literal, you must escape the \ characters like the following:
String regex = "\\([^)]*\\)";
String foo = "bar (baz)";
String boz = foo.replaceAll("\\(.+\\)", ""); // or replaceFirst
boz is now "bar "
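The patterns in these answers differ once a string holds more than one parenthesized group. The sketch below compares them; the class name, method names and sample input are invented for illustration:

```java
class GreedyVsLazy {
    // Greedy: .+ runs to the LAST ")" in the string.
    static String greedy(String s) {
        return s.replaceAll("\\(.+\\)", "");
    }

    // Lazy: .+? stops at the first ")" it can, but still needs at least one character inside.
    static String lazy(String s) {
        return s.replaceAll("\\(.+?\\)", "");
    }

    // Negated class: [^)]* also matches an empty "()".
    static String negated(String s) {
        return s.replaceAll("\\([^)]*\\)", "");
    }

    public static void main(String[] args) {
        String s = "a (b) c (d) e";
        System.out.println("[" + greedy(s) + "]");  // prints [a  e]
        System.out.println("[" + lazy(s) + "]");    // prints [a  c  e]
        System.out.println("[" + negated(s) + "]"); // prints [a  c  e]
    }
}
```

The greedy form is only safe when the string is known to contain a single group.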
I'm trying to create a regular expression matcher, but it doesn't work as expected.
String input = "// source C:\\path\\to\\folder";
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\[a-zA-Z0-9_-]+)+", input));
It returns false but it should pass. What is wrong with that regex?
Backslashes. That's what is wrong.
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", input));
^^
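In a regex, a literal backslash must itself be escaped with a backslash, which makes two. Add Java string-literal escaping on top of that, and you must write four backslashes in source code to match one.
You forgot the \\\\ in front of [a-zA-Z0-9_-] (escaping the - inside the class is optional when it comes last):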
String input = "// source C:\\path\\to\\folder";
System.out.println(Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_\\-]+)+", input));
Use \\\\ to match a single backslash in a Java regex:
String input = "// source C:\\path\\to\\folder";
boolean m = Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", input);
//=> true
You need one level of escaping (\\) for the Java string literal and another (\\) for the underlying regex engine to get a literal \.
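The two levels of escaping can be checked directly. This is a small sketch with invented class and method names; it only uses the pattern and input already shown above:

```java
import java.util.regex.Pattern;

class BackslashLevels {
    // Same pattern as in the answers: four backslashes in source, one literal backslash matched.
    static boolean matchesSourceLine(String input) {
        return Pattern.matches("//\\s*source\\s+[a-zA-Z]:(\\\\[a-zA-Z0-9_-]+)+", input);
    }

    public static void main(String[] args) {
        // The Java compiler turns "\\\\" into a 2-character string: \\
        System.out.println("\\\\".length()); // prints 2
        // The regex engine then reads those 2 characters as one literal backslash.
        System.out.println("a\\b".matches("a\\\\b")); // prints true
        System.out.println(matchesSourceLine("// source C:\\path\\to\\folder")); // prints true
    }
}
```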
I want to fetch
http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png
from
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
I have tried this code:
String a = "";
Pattern pattern = Pattern.compile("url(.*)");
Matcher matcher = pattern.matcher(imgpath);
if (matcher.find()) {
a = (matcher.group(1));
}
return a;
but a == (http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_639_o_4746_precious_image_1419867529.png)
how can I fine tune it?
Why use a regular expression to begin with?
Given
final String s = "url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)";
If the string is always the same format a simple substring(4,s.length()-1) would be better.
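A sketch of that substring approach follows. The class name, method name and the example.com URL are placeholders, and the prefix/suffix check is an added safeguard that assumes unexpected input should be returned unchanged:

```java
class UrlSubstring {
    // Hypothetical helper: strips the "url(" prefix and the ")" suffix when both are present.
    static String unwrap(String s) {
        if (s.startsWith("url(") && s.endsWith(")")) {
            // "url(" is 4 characters long; drop it and the final ")"
            return s.substring(4, s.length() - 1);
        }
        return s; // assumption: leave anything else untouched
    }

    public static void main(String[] args) {
        System.out.println(unwrap("url(http://example.com/a.png)")); // prints http://example.com/a.png
    }
}
```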
That said, if you insist on a regular expression:
You have to escape the ( as \( in the regex, and in a Java string literal the \ itself must be escaped, so it becomes \\(. The same goes for the ).
Then you can capture the group with url\\((.+)\\).
Learn to use RegEx101.com before coming here, it will point out errors like this immediately.
As you already seem to know, ( and ) delimit groups, which means that in the regex
url(.*)
(.*) will place everything after url in group 1, which in case of
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
will be
(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
If you want to exclude ( and ) from the match, you need to add them to the regex as literals, which means you need to escape them. There are several ways to do that, such as adding \ before each of them or wrapping each one in [ ].
The other problem with your regex is that .* finds the longest possible match, and since . matches any character (except line separators), it can also swallow ( and ). To solve this, you can make the * quantifier reluctant by adding ? after it, so your final regex can be written as the string
"url\\((.*?)\\)"
---------------
url
\\( - ( literal
(.*?) - group 1
\\) - ) literal
or, instead of ., you can use a character class that accepts every character except ), like
"url\\(([^)]*)\\)"
Try this regex:
url\((.*?)\)
The outermost parentheses are escaped so they will be matched literally. The inner parentheses are for capturing a group. The question mark after the .* is to make the match lazy, so the first closing parenthesis found will end the group.
Note that to use this regex in Java, you'll have to additionally escape the backslashes in order to express the above regex as a string literal:
String regex = "url\\((.*?)\\)";
You need to escape the outer () to match the parentheses in the string, and then add another set of () around the part you want to pull out as group 1, the actual URL. I also changed the part inside the parentheses to [^)]*, which matches everything up to the first ). See below:
url\(([^)]*)\)
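Put together with Matcher, the extraction might look like the sketch below. The class name, method name and the example.com URL are placeholders, and returning an empty string on no match mirrors the question's code rather than any fixed convention:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlExtractor {
    // Escaped outer parentheses match literally; the inner pair captures group 1.
    private static final Pattern URL = Pattern.compile("url\\(([^)]*)\\)");

    // Hypothetical helper: returns the text between "url(" and the first ")".
    static String extractUrl(String imgpath) {
        Matcher matcher = URL.matcher(imgpath);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("url(http://example.com/a.png)")); // prints http://example.com/a.png
    }
}
```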
I have a string that I want to strip the brackets from.
This is my string: "(name)"
and I want to get "name",
the same thing without the brackets.
I had String s = "(name)";
and I wrote
s = s.replaceAll("(","");
s = s.replaceAll(")","");
and I get an exception for that:
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed group near index 1
(
How do I get rid of the brackets?
The parenthesis characters ( and ) delimit a capturing group in a regular expression, and replaceAll treats its first argument as one. The characters need to be escaped.
s = s.replaceAll("\\(","");
s = s.replaceAll("\\)","");
Better yet, you could simply place the parentheses in a character class to prevent them from being interpreted as metacharacters:
s = s.replaceAll("[()]","");
s = s.replace("(", "").replace(")", "");
Regex isn't needed here.
If you wanted to use a regex (not sure why you would), you could do something like this:
s = s.replaceAll("\\(", "").replaceAll("\\)", "");
The problem was that ( and ) are metacharacters, so you need to escape them to have them matched literally.
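For comparison, the three variants suggested in these answers can be put side by side. The class and method names are invented for illustration:

```java
class BracketRemoval {
    // Plain-string replace: no regex involved, so no escaping needed.
    static String withReplace(String s) {
        return s.replace("(", "").replace(")", "");
    }

    // Character class: ( and ) are ordinary characters inside [...].
    static String withCharClass(String s) {
        return s.replaceAll("[()]", "");
    }

    // Escaped metacharacters: \\( in Java source is \( for the regex engine.
    static String withEscapes(String s) {
        return s.replaceAll("\\(", "").replaceAll("\\)", "");
    }

    public static void main(String[] args) {
        System.out.println(withReplace("(name)"));   // prints name
        System.out.println(withCharClass("(name)")); // prints name
        System.out.println(withEscapes("(name)"));   // prints name
    }
}
```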
String#replaceAll takes a regular expression as its first argument.
You are passing grouping metacharacters as that regular expression, which is why you get the error.
Metacharacters are used to group, divide, and perform special operations in patterns.
\ Escape the next meta-character (it becomes a normal/literal character)
^ Match the beginning of the line
. Match any character (except newline)
$ Match the end of the line (or before newline at the end)
| Alternation (‘or’ statement)
() Grouping
[] Custom character class
So use
1. \\( instead of (
2. \\) instead of )
You'll need to escape the brackets like this:
s = s.replaceAll("\\(","");
s = s.replaceAll("\\)","");
You need two backslashes: the regex engine needs to see \( to treat the bracket as a literal bracket (and not as part of the regex syntax), and in a Java string literal you have to escape that backslash so the regex engine actually receives it.
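A short sketch that makes the failure and the fix visible. The class and helper names are hypothetical. Note that a single-backslash literal such as "\(" would not even compile as Java source, which is why the doubled form is required:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeLevels {
    // Hypothetical helper: reports whether the regex engine accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("("));      // prints false (unclosed group)
        System.out.println(compiles("\\("));    // prints true
        // The source literal "\\(" is the 2-character string \( at runtime.
        System.out.println("\\(".length());     // prints 2
    }
}
```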
You need to escape the ( and the ) because they have a special meaning in regular expressions.
Do it like this:
s = s.replaceAll("\\(","");
s = s.replaceAll("\\)","");
s=s.replace("(","").replace(")","");
I have a question about using replaceAll() function.
if a string has parentheses as a pair, replace it with "",
while(S.contains("()"))
{
S = S.replaceAll("\\(\\)", "");
}
But why does replaceAll("\\(\\)", "") need \\(\\)?
Because as noted by the javadocs, the argument is a regular expression.
Parentheses in a regular expression are used for grouping. If you're going to match parentheses literally as part of a regular expression, they must be escaped.
It's because replaceAll expects a regex, and ( and ) have a special meaning in regular expressions and need to be escaped.
An alternative is to use replace, which counter-intuitively does the same thing as replaceAll but takes a string as an input instead of a regex:
S = S.replace("()", "");
First, the regex call can be replaced with a plain-string one:
S = S.replace("()", "");
Keep the while loop only if the input can contain nested pairs such as "(())": removing the inner pair leaves a new "()" behind, which a single pass does not revisit.
Second, the first argument to .replaceAll() is a regular expression, and parens are special tokens in regular expressions (they are grouping operators).
Also, .replaceAll() replaces all occurrences in one pass, so for input without nesting you don't need the while loop at all. You could also have written:
S = S.replaceAll("\\Q()\\E", "");
It is left as an exercise to the reader as to what \Q and \E are: http://regularexpressions.info gives the answer ;)
S = S.replaceAll("\\(\\)", ""): the first argument is a regular expression.
Because the method's first argument is a regular expression, and ( and ) are special characters in regex, so you need to escape them.
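The sketch below compares a single pass with a loop and shows Pattern.quote, which produces the \Q...\E form for you. The class and method names are invented, and the nested sample input is an assumption about what the original loop was meant to handle:

```java
import java.util.regex.Pattern;

class EmptyPairs {
    // One pass: removes every "()" present at the start, but not pairs created by the removal.
    static String onePass(String s) {
        return s.replace("()", "");
    }

    // Loop: repeats until no "()" is left, so nested pairs collapse completely.
    static String untilStable(String s) {
        while (s.contains("()")) {
            s = s.replace("()", "");
        }
        return s;
    }

    // Regex variant with the literal text quoted instead of escaped by hand.
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("()"), "");
    }

    public static void main(String[] args) {
        System.out.println(onePass("a(())b"));     // prints a()b
        System.out.println(untilStable("a(())b")); // prints ab
        System.out.println(quoted("x()y()"));      // prints xy
    }
}
```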
Because parentheses are special characters in regexps, so you need to escape them. To get a literal \ in a string in Java you need to escape it like so : \\.
So () => \(\) => \\(\\)
I have this line of code to remove some punctuation:
str.replaceAll("[\\-\\!\\?\\.\\,\\;\\:\\\"\\']", "");
I don't know if all the chars in this regex need to be escaped, but I escaped only for safety.
Is there some way to build a regex like this in a more clear way?
Inside [...] most metacharacters lose their special meaning, so you don't need to escape them. A . meaning "any character" wouldn't make sense inside [.] anyway!
The exceptions to the rule are
] since it would close the whole [...] expression prematurely.
^ if it is the first character, since [^abc] matches everything except a, b and c.
- unless it's the first/last character, since [a-z] matches all characters from a to z.
\ since it remains the escape character inside [...].
Thus, you could write
str.replaceAll("[-!?.,;:\"']", "")
To quote a string into a regular expression, you could also use Pattern.quote which escapes the characters in the string as necessary.
Demo:
String str = "abc-!?.,;:\"'def";
System.out.println(str.replaceAll("[-!?.,;:\"']", "")); // prints abcdef
You might need to escape the double-quotes because you have the string in double-quotes; but as aioobe says, don't escape the rest. Put the - at the start or end of the class, however, so it can't be read as a range.
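As a rough sketch of both cases: the first method uses the unescaped class from the answer above, and the second shows the few characters that do need a backslash inside [...]. The class name, method names and the second sample input are made up for illustration:

```java
class PunctuationClass {
    // No regex escapes needed: - comes first, and the \" is only Java string escaping.
    static String stripPunctuation(String s) {
        return s.replaceAll("[-!?.,;:\"']", "");
    }

    // Characters that stay special inside a class, each escaped: ] ^ - and \ itself.
    static String stripSpecials(String s) {
        return s.replaceAll("[\\]\\^\\-\\\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("abc-!?.,;:\"'def")); // prints abcdef
        System.out.println(stripSpecials("a]b^c-d\\e"));           // prints abcde
    }
}
```

Since String is immutable, the result of replaceAll has to be assigned or returned; calling it on its own line leaves the original string unchanged.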