In the end I need a regex that converts a phone number into an E.164-conformant number. So far I have this:
result = s.replaceAll("[(*)|+| ]", "");
It removes the spaces, the "+" sign and the parentheses "()" just fine. But it does not match the content of the parentheses. What I want is for a number such as +49 (0)11 111 11 11 to become 49111111111.
How can I get this to work?
You can do it, but what if there's more than just a zero between parentheses?
result = s.replaceAll("\\([^()]*\\)|[*+ ]+", "");
As a verbose regex:
result = s.replaceAll(
"(?x) # Allow comments in the regex. \n" +
"\\( # Either match a ( \n" +
"[^()]* # then any number of characters except parentheses \n" +
"\\) # then a ). \n" +
"| # Or \n" +
"[*+\\ ]+ # Match one or more asterisks, pluses or spaces", "");
[(*)|+| ]
is a character class, matching any single parenthesis, asterisk, bar, plus or space character. Get rid of the square brackets and use something like
s.replaceAll("\\(.*?\\)|\\D", "");
This will remove anything between (and including) parentheses, as well as anything else that is not a digit. Note that this will not handle nested parentheses very well - it will eat everything from an open parenthesis to the first closing one it finds, so would change (123(45)67) into 67 (the unbalanced close parenthesis being removed as it's a \D)
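To make the difference concrete, here is a minimal sketch that runs the asker's original character class next to the alternation suggested above. The class and method names are made up for illustration; only String.replaceAll is real API.

```java
class PhoneDigitsDemo {
    // The asker's original pattern: a character class, so it can only
    // remove single characters and never the content between parentheses.
    static String withCharacterClass(String s) {
        return s.replaceAll("[(*)|+| ]", "");
    }

    // Alternation: first try to remove "(...)" including its content (lazily),
    // otherwise remove any single non-digit character.
    static String withAlternation(String s) {
        return s.replaceAll("\\(.*?\\)|\\D", "");
    }

    public static void main(String[] args) {
        String number = "+49 (0)11 111 11 11";
        System.out.println(withCharacterClass(number)); // 490111111111
        System.out.println(withAlternation(number));    // 49111111111
        // Nested parentheses: the lazy match stops at the first ")".
        System.out.println(withAlternation("(123(45)67)")); // 67
    }
}
```

The lazy .*? is what makes the nested case end at the first closing parenthesis; if nesting can occur in your input, a regex like this is probably not the right tool.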
You might try this: "(\\(\\d+\\))|\\+|\\s". It removes the parentheses together with any digits between them, the plus sign, and whitespace.
I think you are expecting a little too much magic from character classes. Firstly, in character classes, don't use |. It is just another character that will be matched by the character class. Simply list all the characters you want to include without any delimiters.
Secondly, a character class really just matches single characters. So (*) inside a character class can by definition do nothing more than remove (, or * (literally) or ). If you are 100% sure that your input will never have nested parentheses or unmatched parentheses or something, then you can do something like this:
"(?:\\([^)]*\\)|\\D)+"
My program works how I want it to but I stumbled upon something that I don't understand.
String problem = "4 - 2";
problem = problem.replaceAll("[^-?+?0-9]+", " ");
System.out.println(Arrays.asList(problem.trim().split(" ")));
prints [4, -, 2]
but
String problem = "4 - 2";
problem = problem.replaceAll("[^+?-?0-9]+", " ");
System.out.println(Arrays.asList(problem.trim().split(" ")));
doesn't even do anything with the minus sign and prints [4, 2]
Why does it do that, it seems like both should work.
The hyphen has a special meaning inside a character class: it is used to define a character range (like a-z or 0-9), except when:
it is at the start of the character class or immediately after the negation character ^
it is escaped with a backslash
it is at the end of the character class
with some regex engines, when it comes after a shorthand character class like \w, \s, \d, \p{thing},... (in these cases the situation isn't ambiguous, since it can't be a range)
In the first example, it is seen as a literal hyphen (since it is at the beginning).
In your second example, I assume that ?-? defines a range between ? and ? (that is nothing more than the character ?)
Note: ? doesn't have a special meaning inside a character class (it is no longer a quantifier but a simple literal character)
If you are trying to match a literal - inside [ and ], the safest option is to escape it: \-. In the first case, the - comes immediately after the negating ^, so it is taken literally and there is nothing to escape. In the second case, ?-? is read as a range, which makes the regular expression behave in a way you did not expect.
PS: To escape in Java, you need \\ instead of \.
In the second example, +?-? means "a plus sign, or any chars between ? and ?, inclusive". Of course, that means just ?, so the whole regex is equivalent to [^+?0-9]+.
The only time within a character class (between the square brackets) that - doesn't mean "between, inclusive" is at the start of the character class, or immediately following a ^ that starts it, or at the end of the character class, or when it's escaped (\-).
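A small sketch to check the rules above against the two snippets from the question, plus a third variant with the hyphen escaped. The class and method names are hypothetical; the regexes are the ones from the question.

```java
import java.util.Arrays;
import java.util.List;

class HyphenInClassDemo {
    // Same steps as in the question: replace, trim, split on single spaces.
    static List<String> tokens(String input, String regex) {
        return Arrays.asList(input.replaceAll(regex, " ").trim().split(" "));
    }

    public static void main(String[] args) {
        // Hyphen right after ^ : literal, so the minus sign survives.
        System.out.println(tokens("4 - 2", "[^-?+?0-9]+"));   // [4, -, 2]
        // ?-? is a range from ? to ?, so the minus sign is not in the class.
        System.out.println(tokens("4 - 2", "[^+?-?0-9]+"));   // [4, 2]
        // Escaped hyphen: literal regardless of position.
        System.out.println(tokens("4 - 2", "[^+?\\-0-9]+"));  // [4, -, 2]
    }
}
```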
For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
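[] is used to match any single one of the characters you specify inside the brackets, optionally followed by a quantifier. So ^[\(\d+\)] matches just one character at the start (a parenthesis, a digit or a plus sign), not the sequence (1).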
The second regexp will work: ^\(\d+\), so check your code.
Check also so there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, java can be tricky with escapes depending on if the regexp you type is directly translated to a regexp or is first a string literal. You may need to double escape your escapes.
In Java you have to escape parenthesis, so "\\(\\d+\\)" should match (1) in case one and two. Adding ^ as you did "^\\(\\d+\\)" will match only case1.
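As a quick check of the above, here is a minimal sketch run against the two cases from the question. The class and method names are made up; the optional \s* is the leading-whitespace allowance suggested earlier in this thread.

```java
import java.util.regex.Pattern;

class LeadingNumberDemo {
    // In Java source, "\\(" is the two-character regex \( : a literal parenthesis.
    private static final Pattern AT_START = Pattern.compile("^\\s*\\(\\d+\\)");
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");

    static boolean startsWithNumber(String s) {
        return AT_START.matcher(s).find();
    }

    static boolean containsNumber(String s) {
        return ANYWHERE.matcher(s).find();
    }

    public static void main(String[] args) {
        String case1 = "(1) Hello, how are you?";
        String case2 = "Hi. (1) How're you doing?";
        System.out.println(startsWithNumber(case1)); // true
        System.out.println(startsWithNumber(case2)); // false
        System.out.println(containsNumber(case2));   // true
    }
}
```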
You have to use double back slashes within java string. Consider this
"\n" give you [line break]
"\\n" give you [backslash][n]
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
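For what it's worth, a minimal sketch of how that lookbehind regex could be used from Java. The class and method names are hypothetical. Java needs a lookbehind with a bounded maximum length, which is presumably why {1,9999} is used rather than +.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterNumberDemo {
    // Same regex as above, with backslashes doubled for the Java string literal.
    private static final Pattern AFTER_NUMBER =
            Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Returns the text following "(digits)", or null if there is no such text.
    static String textAfterNumber(String s) {
        Matcher m = AFTER_NUMBER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterNumber("(1) Hello, how are you?"));   // Hello, how are you?
        System.out.println(textAfterNumber("Hi. (1) How're you doing?")); // How're you doing?
        System.out.println(textAfterNumber("No number here"));            // null
    }
}
```

Note that this regex is not anchored with ^, so it also matches in case2; it returns the text after the number rather than testing where the number is.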
I'm trying to write a regular expression that checks ahead to make sure there is either a whitespace character OR an opening parenthesis after the words I'm searching for.
Also, I want it to look back and make sure it is preceded by either a non-Word (\W) or nothing at all (i.e. it is the beginning of the statement).
So far I have,
"(\\W?)(" + words.toString() + ")(\\s | \\()"
However, this also matches the stuff at either ends - I want this pattern to match ONLY the word itself - not the stuff around it.
I'm using Java flavor Regex.
As you tagged your question yourself, you need lookarounds:
String regex = "(?<=\\W|^)(" + Pattern.quote(words.toString()) + ")(?= |[(])";
(?<=X) means "preceded by X"
(?<!X) means "not preceded by X"
(?=X) means "followed by X"
(?!X) means "not followed by X"
What about the word itself: will it always start with a word character (i.e., one that matches \w)? If so, you can use a word boundary for the leading condition.
"\\b" + theWord + "(?=[\\s(])"
Otherwise, you can use a negative lookbehind:
"(?<!\\w)" + theWord + "(?=[\\s(])"
I'm assuming the word is either quoted like so:
String theWord = Pattern.quote(words.toString());
...or doesn't need to be.
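Here is a minimal sketch of the negative-lookbehind variant in use. The class name, method name and sample text are made up for illustration. Because both conditions are lookarounds, the match covers only the word itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // Returns the start index of every occurrence of `word` that is not preceded
    // by a word character and is followed by whitespace or "(".
    static List<Integer> findStarts(String text, String word) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(word) + "(?=[\\s(])");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start()); // m.group() is exactly the word here
        }
        return starts;
    }

    public static void main(String[] args) {
        // "sprint" and "printer" are rejected by the lookbehind and lookahead.
        System.out.println(findStarts("print(x); sprint (y); print z; printer", "print")); // [0, 22]
    }
}
```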
If you don't want a group to be captured by the matching, you can use the special construct (?:X)
So, in your case:
"(?:\\W?)(" + words.toString() + ")(?:\\s|\\()"
You will only have two groups then, group(0) for the whole string and group(1) for the word you are looking for.
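A short sketch of what the two groups contain. The class name, method name and sample input are hypothetical. The alternation is written without spaces around |, since spaces in a regex are literal unless comments mode is on, and the word is passed through Pattern.quote as a precaution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns { group(0), group(1) } for the first match, or null if none.
    static String[] wholeAndWord(String text, String word) {
        Pattern p = Pattern.compile("(?:\\W?)(" + Pattern.quote(word) + ")(?:\\s|\\()");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] groups = wholeAndWord("x print(y)", "print");
        System.out.println("[" + groups[0] + "]"); // [ print(]
        System.out.println("[" + groups[1] + "]"); // [print]
    }
}
```

So group(0) still includes the surrounding characters; only group(1) is the bare word. If the whole match itself must be just the word, lookarounds are the way to go.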
In a previous question that I asked,
String split in java using advanced regex
someone gave me a fantastic answer to my problem (as described in the link above),
but I never managed to fully understand it. Can somebody help me? The regex I was given
is this:
"(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+"
I can understand some basic things, but there are parts of this regex that even after
thoroughly searching Google I could not find, like the question mark preceding the s at the
start, or how exactly the second parenthesis works with the question mark and the equals sign at the start. Is it possible also to expand it and make it able to work with other types of quotes, like “ ” for example?
Any help is really appreciated.
"(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+" Explained;
(?s) # This equals a DOTALL flag in regex, which allows the `.` to match newline characters. As far as I can tell from your regex, it's superfluous.
(?= # Start of a lookahead, it checks ahead in the regex, but matches "an empty string"(1)
(([^\"]+\"){2})* # This group is repeated any amount of times, including none. I will explain the content in more detail.
([^\"]+\") # This is looking for one or more occurrences of a character that is not `"`, followed by a `"`.
{2} # Repeat 2 times. When combined with the previous group, it is looking for 2 occurrences of text followed by a quote. In effect, this means it is looking for an even number of `"`.
[^\"]* # Matches any character which is not a double quote sign. This means literally _any_ character, including newline characters without enabling the DOTALL flag
$ # The lookahead actually inspects until end of string.
) # End of lookahead
\\s+ # Matches one or more whitespace characters, including spaces, tabs and so on
The repeated group in the lookahead makes the regex match only whitespace that is not between two ", as in this string:
text that has a "string in it".
When used with String.split, this splits the string into [text, that, has, a, "string in it".]
The lookahead only succeeds where an even number of " follow, so an unbalanced quote changes the result. In the following string, only the whitespace after the quote has an even number (zero) of " ahead:
text that nearly has a "string in it.
Splitting the string into [text that nearly has a "string, in, it.]
(1) When I say that the lookahead matches "an empty string", I mean that it consumes nothing: it only looks ahead from the current position in the input and checks a condition. The text that is actually matched is the whitespace matched by \\s+, which follows the lookahead.
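To see the regex at work, here is a minimal sketch that feeds it to String.split. The class and method names are made up; the regex is the one from the question, unchanged.

```java
import java.util.Arrays;

class QuoteAwareSplit {
    // The regex from the question: whitespace with an even number of " ahead.
    static final String REGEX = "(?s)(?=(([^\"]+\"){2})*[^\"]*$)\\s+";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        // Balanced quotes: the whitespace inside the quotes is left alone.
        System.out.println(Arrays.toString(split("text that has a \"string in it\".")));
        // prints [text, that, has, a, "string in it".]

        // Unbalanced quote: only whitespace with zero quotes ahead is matched.
        System.out.println(Arrays.toString(split("text that nearly has a \"string in it.")));
        // prints [text that nearly has a "string, in, it.]
    }
}
```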
The (?s) part is an embedded flag expression, enabling the DOTALL mode, which means the following:
In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
The (?=expr) is a look-ahead expression. This means that the regex looks to match expr, but then moves back to the same point before continuing with the rest of the evaluation.
In this case, it means that the regex matches any \\s+ occurrence that is followed by an even number of ", and then by non-" characters until the end ($). In other words, it checks that there is an even number of " ahead.
It can definitely be expanded to other quotes too. The only problem is the ([^\"]+\"){2} part, that will probably have to be made to use a back-reference (\n) instead of the {2}.
This is fairly simple..
Concept
It splits at \s+ whenever there is an even number of " ahead.
For example:
Hello hi "Hi World"
     ^  ^   ^
     |  |   |->will not split here since there is an odd number of " ahead
     ----
      |
      |->split here because there is an even number of " ahead
Grammar
\s matches a \n or \r or space or \t
+ is a quantifier which matches previous character or group 1 to many times
[^\"] would match anything except "
(x){2} would match x 2 times
a(?=bc) would match if a is followed by bc
(?=ab)a would first check for ab from the current position and then return to that position. It then matches a. (?=ab)c would not match c.
With (?s) (single-line mode), . would match newlines. So in this case there is no need for (?s), since the regex contains no .
I would use
\s+(?=([^"]*"[^"]*")*[^"]*$)
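A minimal sketch of this last regex used with String.split, run on the example from the diagram above. The class and method names are hypothetical.

```java
import java.util.Arrays;

class EvenQuoteSplit {
    // Whitespace followed by an even number of " up to the end of the input.
    static final String REGEX = "\\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Hello hi \"Hi World\"")));
        // prints [Hello, hi, "Hi World"]

        // [^"]* also allows nothing between two quotes, so "" is handled.
        System.out.println(Arrays.toString(split("a \"\" b")));
        // prints [a, "", b]
    }
}
```

One observable difference from the original regex: because it uses [^"]* rather than [^"]+, an empty quoted string such as "" still counts as a pair of quotes.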
I have a string with the following format:
String str = "someString(anotherString)(lastString)";
I wanted to replace the lastString inside the last brackets, i.e new String should be
newStr = "someString(anotherString)(modified)";
I am using regex with "\\(([^\\}]+)\\)$" pattern.
But I am unable to change only the last content inside brackets.
The above regex gives me the output:
"someString(modified)";
I just want to replace the content of the last brackets; any characters can appear in front of the last bracket.
Any help is appreciated.
I think you have a typo in your expression. Replace the curly bracket with a regular parenthesis, and I think it will be fine.
yourString.replaceAll("(.+\\(.+\\)\\()[^\\)]+(\\)$)", "$1modified$2")
String resultString = subjectString.replaceAll(
"(?x)\\( # match opening parenthesis\n" +
"[^()]* # match 0 or more characters except parentheses\n" +
"\\) # match closing parenthesis\n" +
"$ # match the end of the string", "(modified)");
So far, this is not allowing for whitespace between the closing parenthesis and the end of the string. You might want to insert a \\s* before the $ if you need to handle that case, too.
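Putting that together, a minimal sketch with the optional \\s* included, next to the pattern from the question for comparison. The class and method names are made up. Matcher.quoteReplacement is an addition of mine, so that a $ or \ in the new text is not read as a group reference.

```java
import java.util.regex.Matcher;

class LastGroupReplacer {
    // Replaces the last "(...)" group, tolerating trailing whitespace.
    static String replaceLast(String s, String newContent) {
        String replacement = Matcher.quoteReplacement("(" + newContent + ")");
        return s.replaceAll("\\([^()]*\\)\\s*$", replacement);
    }

    // The pattern from the question: [^\}]+ also matches ")" and "(",
    // so the match starts at the first opening parenthesis.
    static String replaceWithOriginalPattern(String s, String newContent) {
        String replacement = Matcher.quoteReplacement("(" + newContent + ")");
        return s.replaceAll("\\(([^\\}]+)\\)$", replacement);
    }

    public static void main(String[] args) {
        String str = "someString(anotherString)(lastString)";
        System.out.println(replaceLast(str, "modified"));                // someString(anotherString)(modified)
        System.out.println(replaceWithOriginalPattern(str, "modified")); // someString(modified)
    }
}
```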