Remove all characters between 2 characters in a String - java

I have a database with 50000 rows. I want to get all rows one by one, put them into a String variable and delete all characters between "(" and ")", then update the row. I just don't know how to delete all characters between "(" and ")". I want to do this with java.

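Your regular expression would be something like:
data = data.replaceAll("\\(.*?\\)", "()"); // if you want to keep the brackets
OR
data = data.replaceAll("\\(.*?\\)", ""); // if you don't want to keep the brackets
Strings are immutable, so replaceAll returns a new string and the result has to be assigned. The regular expression consists of 3 parts: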
\\( //This part matches the opening bracket
.*? //This part matches anything in between
\\) //This part matches the closing bracket
I hope this helps
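A minimal sketch of both variants as reusable methods. The class and method names (ParenStripper, stripGroups, emptyGroups) and the sample row are made up for illustration, and the sketch assumes the parentheses are never nested:

```java
class ParenStripper {
    // Removes every "(...)" group, brackets included.
    // Assumption: parentheses are not nested and do not span line breaks.
    static String stripGroups(String data) {
        return data.replaceAll("\\(.*?\\)", "");
    }

    // Empties every "(...)" group but keeps the brackets themselves.
    static String emptyGroups(String data) {
        return data.replaceAll("\\(.*?\\)", "()");
    }

    public static void main(String[] args) {
        String row = "Widget (blue) large (discontinued)"; // hypothetical row value
        System.out.println(stripGroups(row));
        System.out.println(emptyGroups(row)); // prints "Widget () large ()"
    }
}
```

Removing a group leaves the surrounding spaces in place, so you may want to collapse double spaces or trim the result before writing the row back.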

Use the replaceAll(String regex, String replacement) method, like this:
data = data.replaceAll("\\(.*?\\)", "");

Related

Sentence split with <sup></sup>

I have the following sentence:
String str = " And God said, <sup>c</sup>“Let there be light,” and there was light.";
How do I retrieve all of the words in the sentence, expecting the following?
And
God
said
Let
there
be
light
and
there
was
light
First, get rid of any leading or trailing space:
.trim()
Then get rid of HTML entities (&...;):
.replaceAll("&.*?;", "")
& and ; are literal chars in Regex, and .*? is the non-greedy version of "any character, any number of times".
Next get rid of tags and their contents:
.replaceAll("<(.*?)>.*?</\\1>", "")
< and > will be taken literally again, .*? is explained above, (...) defines a capturing group, and \\1 references that group.
And finally, split on any sequence of non-letters:
.split("[^a-zA-Z]+")
[a-zA-Z] means all characters from a to z and A to Z, ^ inverts the match, and + means "once or more".
So everything together would be:
String[] words = str.trim().replaceAll("&.*?;", "").replaceAll("<(.*?)>.*?</\\1>", "").split("[^a-zA-Z]+");
Note that this doesn't handle self-closing tags like <img src="a.png" />.
Also note that if you need full HTML parsing, you should think about letting a real engine parse it, as parsing HTML with Regex is a bad idea.
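The steps above can be put together roughly as follows. This is only a sketch: the class name WordSplitter and the method name words are invented here, and the curly quotes are written as Unicode escapes so the source compiles regardless of file encoding:

```java
class WordSplitter {
    // Trims, drops HTML entities, drops paired tags with their contents,
    // then splits on runs of non-letters.
    static String[] words(String str) {
        return str.trim()
                .replaceAll("&.*?;", "")
                .replaceAll("<(.*?)>.*?</\\1>", "")
                .split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String str = " And God said, <sup>c</sup>\u201CLet there be light,\u201D and there was light.";
        for (String word : words(str)) {
            System.out.println(word);
        }
    }
}
```

Because split drops trailing empty strings, the final full stop does not produce an empty last element. A string that still starts with a non-letter after trimming would produce an empty first element.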
Alternatively, you can use String.replaceAll(regex, replacement) with the regex [^A-Za-z]+ to keep only letters. On its own that would also keep the letters of the sup tag and the c inside it, which is why the first statement removes the tags and everything between them:
String str = " And God said, <sup>c</sup>“Let there be light,” and there was light.".replaceAll("<sup>[^<]*</sup>", "");
String newstr = str.replaceAll("[^A-Za-z]+", " ");

Validate string that contains template in regex

I have a problem trying to validate this string...
So, the user selects a template: q( ). Then, the user fills in the contents inside the brackets which can end up like this:
q(a,b,c)
I have tried different ways using regex to validate this String, but it keeps returning the answer "No". I believe the problem is "q(" and ")" in my regex, as I am not sure what it should look like.
Here's a snippet of the code:
String data2 = "q(a,b,c)";
String regex2 = "q([a-zA-Z0-9,'])";
if (data2.matches(regex2)) {
    System.out.println("yes");
} else {
    System.out.println("No");
}
I do have an alternative way by removing "q(" and ")" from the data2 string, but I would rather handle it in the regex without the need to remove characters from a String.
Any suggestions?
You need to escape the parentheses (and escape the escape character so that it compiles) and add a + after the character class to indicate one or more characters:
String regex2 = "q\\([a-zA-Z0-9,']+\\)";
You can read the meaning of every character in a regular expression in the Pattern javadoc.
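To see the difference, here is a small sketch that wraps the corrected pattern in a method; the names TemplateValidator and isValid are hypothetical. In the original pattern the unescaped parentheses form a capturing group, so it only matches "q" followed by exactly one allowed character:

```java
class TemplateValidator {
    // Matches "q(" + one or more letters, digits, commas or apostrophes + ")".
    static boolean isValid(String data) {
        return data.matches("q\\([a-zA-Z0-9,']+\\)");
    }

    public static void main(String[] args) {
        System.out.println(isValid("q(a,b,c)") ? "yes" : "No"); // prints "yes"
        // The unescaped version treats ( ) as a group, so it accepts "qa" instead:
        System.out.println("qa".matches("q([a-zA-Z0-9,'])"));   // prints "true"
    }
}
```

String.matches requires the whole string to match, so no ^ or $ anchors are needed.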

Regex Lookahead and Lookbehinds: followed by this or that

I'm trying to write a regular expression that checks ahead to make sure there is either a white space character OR an opening parenthesis after the words I'm searching for.
Also, I want it to look back and make sure it is preceded by either a non-Word (\W) or nothing at all (i.e. it is the beginning of the statement).
So far I have,
"(\\W?)(" + words.toString() + ")(\\s | \\()"
However, this also matches the stuff at either ends - I want this pattern to match ONLY the word itself - not the stuff around it.
I'm using Java flavor Regex.
As you tagged your question yourself, you need lookarounds:
String regex = "(?<=\\W|^)(" + Pattern.quote(words.toString()) + ")(?= |[(])";
(?<=X) means "preceded by X"
(?<!X) means "not preceded by X"
(?=X) means "followed by X"
(?!X) means "not followed by X"
What about the word itself: will it always start with a word character (i.e., one that matches \w)? If so, you can use a word boundary for the leading condition.
"\\b" + theWord + "(?=[\\s(])"
Otherwise, you can use a negative lookbehind:
"(?<!\\w)" + theWord + "(?=[\\s(])"
I'm assuming the word is either quoted like so:
String theWord = Pattern.quote(words.toString());
...or doesn't need to be.
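A short sketch of the negative-lookbehind variant in use. The class WordFinder, the method find and the sample text are assumptions made for the example; the method returns the start index of every match, and each match covers only the word itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // Finds occurrences of word that are not preceded by a word character
    // and are followed by whitespace or "(". The lookarounds consume nothing.
    static List<Integer> find(String word, String text) {
        Pattern p = Pattern.compile("(?<!\\w)" + Pattern.quote(word) + "(?=[\\s(])");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "sprint" and "printer" are skipped; "print(" and "print z" match.
        System.out.println(find("print", "print(x); sprint (y); print z; printer("));
        // prints "[0, 22]"
    }
}
```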
If you don't want a group to be captured by the matching, you can use the special construct (?:X)
So, in your case:
"(?:\\W?)(" + words.toString() + ")(?:\\s|\\()"
You will only have two groups then: group(0) for the whole match and group(1) for the word you are looking for. Note that the surrounding characters are still consumed as part of group(0).

Replace all content within braces?

In the end I need a regex that converts a phone number into an E.164-conformant number. So far I have this:
result = s.replaceAll("[(*)|+| ]", "");
It removes the spaces, the "+" sign and the parentheses "()" just fine, but it does not remove the content between the parentheses. I want e.g. the number +49 (0)11 111 11 11 to become 49111111111.
How can I get this to work?
You can do it, but what if there's more than just a zero between parentheses?
result = s.replaceAll("\\([^()]*\\)|[*+ ]+", "");
As a verbose regex:
result = s.replaceAll(
"(?x) # Allow comments in the regex. \n" +
"\\( # Either match a ( \n" +
"[^()]* # then any number of characters except parentheses \n" +
"\\) # then a ). \n" +
"| # Or \n" +
"[*+\\ ]+ # Match one or more asterisks, pluses or spaces", "");
[(*)|+| ]
is a character class, matching any single parenthesis, asterisk, bar, plus or space character. Get rid of the square brackets and use something like
s.replaceAll("\\(.*?\\)|\\D", "");
This will remove anything between (and including) parentheses, as well as anything else that is not a digit. Note that this will not handle nested parentheses very well: it will eat everything from an open parenthesis to the first closing one it finds, so it would change (123(45)67) into 67 (the unbalanced closing parenthesis being removed because it matches \D).
You might try this: "(\\(\\d+\\))|\\+|\\s". It removes the parentheses and their contents, the plus sign, and whitespace.
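Combining the ideas from the answers above, a sketch might look like this. PhoneNormalizer and digitsOnly are made-up names, and the method only strips characters; it does not check that the result is a valid E.164 number:

```java
class PhoneNormalizer {
    // Removes "(...)" groups with their contents, then every remaining non-digit.
    // Assumption: parentheses are balanced and not nested.
    static String digitsOnly(String s) {
        return s.replaceAll("\\([^()]*\\)|\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("+49 (0)11 111 11 11")); // prints "49111111111"
    }
}
```

Using [^()]* instead of .*? keeps a match from running across another opening parenthesis.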
I think you are expecting a little too much magic from character classes. Firstly, in character classes, don't use |. It is just another character that will be matched by the character class. Simply list all the characters you want to include without any delimiters.
Secondly, a character class really just matches single characters. So (*) inside a character class can by definition do nothing more than match a (, a * (literally) or a ). If you are 100% sure that your input will never have nested or unmatched parentheses, then you can do something like this:
"(?:\\([^)]*\\)|\\D)+"

java regex help

I have a string with the following format:
String str = "someString(anotherString)(lastString)";
I wanted to replace the lastString inside the last brackets, i.e new String should be
newStr = "someString(anotherString)(modified)";
I am using regex with "\\(([^\\}]+)\\)$" pattern.
But I am unable to change only the last content inside brackets.
The above regex gives me the output:
"someString(modified)";
I just want to replace the content of the last brackets; any characters can appear in front of the last bracket.
Any help is appreciated.
I think you have a typo in your expression. Replace the curly bracket with a regular one, and I think it will be fine.
yourString.replaceAll("(.+\\(.+\\)\\()[^)]+(\\)$)", "$1modified$2")
String resultString = subjectString.replaceAll(
"(?x)\\( # match opening parenthesis\n" +
"[^()]* # match 0 or more characters except parentheses\n" +
"\\) # match closing parenthesis\n" +
"$ # match the end of the string", "(modified)");
So far, this is not allowing for whitespace between the closing parenthesis and the end of the string. You might want to insert a \\s* before the $ if you need to handle that case, too.
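A compact sketch of that variant with the optional trailing whitespace included. The class name LastGroupReplacer and the method replaceLast are hypothetical; Matcher.quoteReplacement is used so that a $ or \ in the new content is inserted literally:

```java
import java.util.regex.Matcher;

class LastGroupReplacer {
    // Replaces the final "(...)" group at the end of the string.
    // Any whitespace after the closing parenthesis is consumed as well.
    static String replaceLast(String s, String newContent) {
        String replacement = Matcher.quoteReplacement("(" + newContent + ")");
        return s.replaceAll("\\([^()]*\\)\\s*$", replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceLast("someString(anotherString)(lastString)", "modified"));
        // prints "someString(anotherString)(modified)"
    }
}
```

If the string does not end with a parenthesized group, nothing matches and the input is returned unchanged.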
