Deletion in a regex - java

I will split this problem up to make it easier for me.
For this expression:
"created":"589c8377576a33706397f3f4"
I wrote this regex:
output_row.json.replaceAll("\"created\":\"589c8377576a33706397f3f4\"","");
It works! Now I would like to use a dynamic token, e.g. [[:xdigit:]].
I tried this, but it didn't work:
output_row.json.replaceAll("\"created\":\"[[:xdigit:]]\"","");
Could you advise me, please?

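[[:xdigit:]] is exactly one hex digit. Add the + quantifier to match 1 to n hex digits, or the * quantifier to match 0 to n. Note that the [[:xdigit:]] form is POSIX bracket syntax, which java.util.regex does not support; the Java equivalent is \p{XDigit}.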
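A minimal sketch of that advice in Java, using \p{XDigit}, which is how java.util.regex spells the hex-digit class (inside a Java character class, [[:xdigit:]] is read as the literal characters :, x, d, i, g and t). The class and method names are made up for illustration, and the JSON is assumed to be held in a plain String.

```java
class HexTokenDemo {
    // \p{XDigit} is Java's hex-digit class; + requires one or more digits.
    private static final String CREATED_FIELD = "\"created\":\"\\p{XDigit}+\"";

    // Hypothetical helper: removes the "created" key/value pair from a JSON fragment.
    static String removeCreated(String json) {
        // Strings are immutable, so the result has to be returned or assigned.
        return json.replaceAll(CREATED_FIELD, "");
    }

    public static void main(String[] args) {
        String json = "{\"created\":\"589c8377576a33706397f3f4\"}";
        System.out.println(removeCreated(json)); // prints {}
    }
}
```

Without the + quantifier the pattern would only match a value that is a single hex digit, which is why the attempt in the question left the input unchanged.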

Finally I found the answer:
//replace the value of the key created
output_row.json = output_row.json.replaceAll("\"created\":\"[a-zA-Z0-9]+\"","\"created\":\"" + formatted + "\"");
I don't know why the [[:xdigit:]] class is not accepted in the Talend editor. Perhaps it is not part of Java's regex syntax?
Anyway, the topic is closed for me!
Ale

Related

restricting the regular expression only for a line

I have the CSV file below from one of the systems.
""demo"",""kkkk""
""demo " ","fg"
" " demo" "
"demo"
"value1","" frg" ","vaue5"
"val3",""tttyy " ",""hjhj","ghuy"
The objective is to collapse every doubled pair of double quotes so that only a single set of double quotes remains, as shown below. The amount of space between the double quotes is not fixed. This has to be handled in a Java program using the replaceAll function.
"demo","kkkk"
"demo","fg"
"demo"
"demo"
"value1","frg","vaue5"
"val3","tttyy","hjhj","ghuy"
I tried this on regex101 with "[ ]*" and it works for PHP >= 7.3 but not in Java.
I also tried [\"][\"]|[^\"]\s+[\"] but still do not get the desired output. Any suggestions for a regular expression that can be used in a Java program?

Based on shown sample data, you can use:
String repl = str.replaceAll("(?:\\h*\"){2}\\h*", "\"");
RegEx Details:
(?:\h*"){2}: Match two double quotes, each optionally preceded by horizontal whitespace (spaces or tabs)
\h*: Match 0 or more horizontal whitespace characters after the second quote
Replacement is just a "
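The following sketch runs that replacement over the sample lines from the question. The class and method names are illustrative only, and \h assumes Java 8 or later.

```java
class QuoteCleanupDemo {
    // Two double quotes, each optionally preceded by horizontal whitespace,
    // plus any horizontal whitespace that trails the second quote.
    private static final String DOUBLED_QUOTES = "(?:\\h*\"){2}\\h*";

    // Hypothetical helper: collapses each doubled quote into a single one.
    static String normalize(String line) {
        return line.replaceAll(DOUBLED_QUOTES, "\"");
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"\"demo\"\",\"\"kkkk\"\"",
            "\"\"demo \" \",\"fg\"",
            "\" \" demo\" \"",
            "\"demo\"",
            "\"value1\",\"\" frg\" \",\"vaue5\"",
            "\"val3\",\"\"tttyy \" \",\"\"hjhj\",\"ghuy\""
        };
        for (String line : lines) {
            System.out.println(normalize(line));
        }
    }
}
```

A quote that is followed by a comma or by field text never satisfies the {2} repetition, so correctly quoted fields such as "fg" pass through unchanged.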

Regex Replacing issue understanding

I'm trying to program replacement logic for invalid phone numbers, which I provide through a Map.
I have read through a few regex threads, but I don't know whether this is actually possible.
Example:
Input phone number: +410712345678
regex I'm trying to use:
"^\\+(?:[0-9] ?){6,14}[0-9]$"
The number after applying the regex and filtering should be +41712345678, i.e. the first instance of 0 is removed.
Second example:
input phone number: +41(071)2345678
regex I'm trying to use:
"^\\+(?:[0-9] ?)\\({0,3}\\){3,11}[0-9]$"
The number after applying the regex and filtering should be +41712345678, i.e. the first instance of 0 and the parentheses are removed.
I'm trying to use some kind of pattern to automatically remove those invalid pieces from the phone numbers. The numbers need to be formatted that way to work with my VOIP application.
Is there any way to create a filter pattern like that with regex?

It seems you should apply that rule only to Swiss phone numbers, i.e. +41 numbers, because simply removing the first 0 from any international number is wrong.
So, ph = ph.replaceFirst("^(\\+41)\\(?0?([0-9]{2})\\)?", "$1$2").
See regex101 for how it works.
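As a quick check, the sketch below applies that replaceFirst call to the two numbers from the question, plus an already valid number and a non-Swiss one. The class and method names are hypothetical.

```java
class SwissNumberDemo {
    // +41, an optional "(", an optional trunk 0, two digits, an optional ")".
    private static final String TRUNK_PREFIX = "^(\\+41)\\(?0?([0-9]{2})\\)?";

    // Hypothetical helper: keeps the country code and the two digits after the 0.
    static String normalize(String phone) {
        return phone.replaceFirst(TRUNK_PREFIX, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("+410712345678"));   // prints +41712345678
        System.out.println(normalize("+41(071)2345678")); // prints +41712345678
        System.out.println(normalize("+41712345678"));    // already valid, unchanged
        System.out.println(normalize("+490712345678"));   // not +41, unchanged
    }
}
```

Anchoring on ^(\+41) is what keeps the rule from touching numbers of other countries.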

Thank you for your answer.
I applied the Regex to my TestImport with the following code:
//...
log.debug("Applying Regex :" + SearchString + " with Replace: " + ReplaceString);
log.debug("Applying Regex for Number:" + Person.get(EPerson.Rufnummer));
Person.put(EPerson.Rufnummer, Person.get(EPerson.Rufnummer).replaceFirst(SearchString, ReplaceString));
log.debug("New Number is:" +Person.get(EPerson.Rufnummer));
log.debug("Applying Regex for Number:" + Person.get(EPerson.RufnummerMobil));
Person.put(EPerson.RufnummerMobil, Person.get(EPerson.RufnummerMobil).replaceFirst(SearchString, ReplaceString));
log.debug("New Number is:" +Person.get(EPerson.RufnummerMobil));
//...
DEBUG [AddressbookFactory] Applying Numberfilter to: {Vorname=Testinator, Nachname=Test, Rufnummer=+410717271818, RufnummerMobil=, RufnummerPrivat=+41(071)7271818, Fax=, Strasse=, PLZ=, Stadt=, Bundesland=, Email=, Firma=, URL=}
DEBUG [AddressbookFactory] Regex Detected
DEBUG [AddressbookFactory] Applying Regex :^(\+41)\(?0?([0-9]{2})\)? with Replace: $1$2
DEBUG [AddressbookFactory] Applying Regex for Number:+410717271818
DEBUG [AddressbookFactory] New Number is: +41717271818
DEBUG [AddressbookFactory] Applying Regex for Number:+41(071)7271818
DEBUG [AddressbookFactory] New Number is: +41717271818
...
And it worked!
Thank you so much for your quick response!
I marked your answer as useful, but because of my "newbie" reputation it does not show.
This question is resolved.
Sincerely, Fabian95qw

search and replace string in java using pattern

Given the string
Content ID [9283745997] Content ID [9283005997] There can be text in between Content ID [9283745953] Content ID [9283741197] Content ID [928374500] There can be valid text here which should not be removed.
I want to remove each piece of text that starts with Content ID and is followed by a number in square brackets, such as [9283745997]. Any digits can appear between the square brackets. Eventually I want the result string to be
There can be text in between There can be valid text here which should not be removed.
Could anyone please provide a valid regex that captures this recurring text, given that the numerals within the square brackets differ each time?
I appreciate your help!
My solution to this was:
Pattern p = Pattern.compile("(Content ID \\[\\d*\\] )");
Matcher m = p.matcher(str);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // replace each match with nothing
    m.appendReplacement(sb, "");
}
m.appendTail(sb);
System.out.println(sb);

So basically you are trying to remove each occurrence of Content ID [one or more digits].
To do this you can use the replaceAll("regex","replacement") method of the String class. As the replacement you can use the empty String "".
The only remaining problem is which regex to use.
to match Content ID just write it literally as "Content ID "
to match [ or ] you have to put \ before each of them, because they are regex metacharacters and need escaping (in a Java string literal you write \ as "\\")
to represent one digit (a character in the range 0-9) regex uses \d (again, in Java you write \ as "\\", which gives "\\d")
to say "one or more of the previously described element" just add + after that element. For example, to match one or more letters a you can write a+.
Now you should be able to build the correct regex. If you have any questions, feel free to ask them in the comments.

Try this one:
(Content ID \[[0-9]+\])
You can test it here: http://regexpal.com/

I would use the regex
Content ID \[\d+\] ?
Implement it like this:
str.replaceAll("Content ID \\[\\d+\\] ?", "");
You can find an explanation and demonstration here: http://regex101.com/r/qD5rJ6
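Here is a small runnable sketch of that call against the string from the question. The class and method names are made up for the example.

```java
class ContentIdDemo {
    // "Content ID ", a bracketed run of digits, and an optional trailing space.
    private static final String CONTENT_ID = "Content ID \\[\\d+\\] ?";

    // Hypothetical helper: strips every "Content ID [digits]" marker.
    static String strip(String text) {
        return text.replaceAll(CONTENT_ID, "");
    }

    public static void main(String[] args) {
        String str = "Content ID [9283745997] Content ID [9283005997] "
                + "There can be text in between Content ID [9283745953] "
                + "Content ID [9283741197] Content ID [928374500] "
                + "There can be valid text here which should not be removed.";
        System.out.println(strip(str));
    }
}
```

The optional space at the end of the pattern is what prevents double spaces where a marker sat between two pieces of text.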

Replace two quotes

I can't find a solution to this simple problem.
I want to replace two consecutive '' or `` by ".
Input:
some ``text'' dspsdj
Out:
some "text"
Why:
s.replaceAll("[`{2}'{2}]", "\"")
Out:
some ""text""
???
Thank you

You should do it like this:
s.replaceAll("``|''", "\"")
What you may have intended to do was this here:
s.replaceAll("[`']{2}", "\"")
But that wouldn't be entirely correct, because it would also match mixed pairs such as `' or '`.
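To make the difference concrete, the sketch below runs the three patterns side by side. In the original attempt everything inside [...] is a set of single characters, so {, 2 and } are treated as literals and every lone backtick or apostrophe is replaced. The class and method names are illustrative.

```java
class QuotePairDemo {
    // Original attempt: one character out of the set ` { 2 } '
    static String characterClass(String s) {
        return s.replaceAll("[`{2}'{2}]", "\"");
    }

    // Alternation: exactly two backticks or exactly two apostrophes.
    static String alternation(String s) {
        return s.replaceAll("``|''", "\"");
    }

    // Class followed by a quantifier: any two characters from the set,
    // which includes mixed pairs such as `'
    static String classWithQuantifier(String s) {
        return s.replaceAll("[`']{2}", "\"");
    }

    public static void main(String[] args) {
        String input = "some ``text'' dspsdj";
        System.out.println(characterClass(input));       // some ""text"" dspsdj
        System.out.println(alternation(input));          // some "text" dspsdj
        System.out.println(classWithQuantifier("a`'b")); // a"b
        System.out.println(alternation("a`'b"));         // a`'b
    }
}
```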

String input = "some ``text'' dspsdj";
String output = input.replaceAll("`{2}|'{2}", "\"");

Put the cardinality after the class:
.replaceAll("[`']{2}", "\"");

Try this:
String resultString = subjectString.replaceAll("([\"'`])\\1", "\"");
Explanation:
(["'`])\1
(["'`]) matches a single character from the list "'` and captures it as group 1.
\1 matches the same text that group 1 most recently captured.
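A short sketch of the backreference version, with hypothetical class and method names. Because \1 has to repeat whatever group 1 captured, only identical pairs are collapsed, and the set here also covers a doubled double quote.

```java
class BackrefQuoteDemo {
    // Group 1 captures one quote character; \1 requires the same character again.
    private static final String SAME_QUOTE_TWICE = "([\"'`])\\1";

    // Hypothetical helper: turns any identical pair of quote characters into ".
    static String collapse(String s) {
        return s.replaceAll(SAME_QUOTE_TWICE, "\"");
    }

    public static void main(String[] args) {
        System.out.println(collapse("some ``text'' dspsdj")); // some "text" dspsdj
        System.out.println(collapse("a`'b"));                 // a`'b (mixed pair, unchanged)
    }
}
```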

Bug in java.util.regex in sun jdk 6.0.24?

The following code blocks on my system. Why?
System.out.println( Pattern.compile(
"^((?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*)/\\*.*?\\*/(.*)$",
Pattern.MULTILINE | Pattern.DOTALL ).matcher(
"\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';"
).matches() );
The pattern (designed to detect comments of the form /*...*/ but not within ' or ") should be fast, as it is deterministic...
Why does it take soooo long?

You're running into catastrophic backtracking.
Looking at your regex, it's easy to see how .*? and (.*) can match the same content since both also can match the intervening \*/ part (dot matches all, remember). Plus (and even more problematic), they can also match the same stuff that ((?:[^'"][^'"]*|"[^"]*"|'[^']*')*) matches.
The regex engine gets bogged down in trying all the permutations, especially if the string you're testing against is long.
I've just checked your regex against your string in RegexBuddy. It aborts the match attempt after 1.000.000 steps of the regex engine. Java will keep churning on until it gets through all permutations or until a Stack Overflow occurs...
You can greatly improve the performance of your regex by prohibiting backtracking into stuff that has already been matched. You can use atomic groups for this, changing your regex into
^((?>[^'"]+|"[^"]*"|'[^']*')*)(?>/\*.*?\*/)(.*)$
or, as a Java string:
"^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$"
This reduces the number of steps the regex engine has to go through from > 1 million to 58.
Be advised though that this will only find the first occurrence of a comment, so you'll have to apply the regex repeatedly until it fails.
Edit: I just added two slashes that were important for the expressions to work. Yet I had to change more than 6 characters.... :(
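The sketch below tries the atomic-group pattern on the string from the question, where it now fails fast instead of hanging. One thing worth checking in your own tests: because [^'"]+ also matches '/', the atomic group can swallow the start of a comment that follows unquoted text, and then the match fails. The second pattern, SLASH_AWARE, is an assumption added here and not part of the answer: it keeps '/' out of the unquoted run and only accepts a slash that does not open a comment. All class, field and method names are made up.

```java
import java.util.regex.Pattern;

class AtomicCommentDemo {
    private static final int FLAGS = Pattern.MULTILINE | Pattern.DOTALL;

    // The atomic-group pattern from the answer, as a Java string.
    static final Pattern FROM_ANSWER = Pattern.compile(
            "^((?>[^'\"]+|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$", FLAGS);

    // Assumed variant: unquoted runs exclude '/', and a lone '/' is allowed
    // only when it is not followed by '*'.
    static final Pattern SLASH_AWARE = Pattern.compile(
            "^((?>[^'\"/]+|/(?!\\*)|\"[^\"]*\"|'[^']*')*)(?>/\\*.*?\\*/)(.*)$", FLAGS);

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean hasComment(Pattern pattern, String sql) {
        return pattern.matcher(sql).matches();
    }

    public static void main(String[] args) {
        String noComment = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" SET \"VERSION\" = 12 "
                + "WHERE NAME = 'SOMENAMEVALUE';";
        String withComment = "UPDATE \"$SCHEMA\" /*a comment\n\n*/ SET x = 'y';";
        String quotedComment = "UPDATE '/*a*/' SET x = 1;";

        System.out.println(hasComment(FROM_ANSWER, noComment));     // false, returns quickly
        System.out.println(hasComment(FROM_ANSWER, withComment));   // false: run swallowed "/*"
        System.out.println(hasComment(SLASH_AWARE, withComment));   // true
        System.out.println(hasComment(SLASH_AWARE, quotedComment)); // false
    }
}
```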

I recommend that you read Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...).

I think it's because of this bit:
(?:[^'\"][^'\"]*|\"[^\"]*\"|'[^']*')*
Removing the second and third alternatives gives you:
(?:[^'\"][^'\"]*)*
or:
(?:[^'\"]+)*
Repeated repeats can take a long time.

For detecting /* ... */ comments I would suggest code like this:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" /*a comment\n\n*/ SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Pattern pt = Pattern.compile("\"[^\"]*\"|'[^']*'|(/\\*.*?\\*/)",
        Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = pt.matcher(str);
boolean found = false;
while (matcher.find()) {
    if (matcher.group(1) != null) {
        found = true;
        break;
    }
}
if (found)
    System.out.println("Found Comment: [" + matcher.group(1) + ']');
else
    System.out.println("Didn't find Comment");
For above string it prints:
Found Comment: [/*a comment
*/]
But if I change input string to:
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" '/*a comment\n\n*/' SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
OR
String str = "\n\n\n\n\n\nUPDATE \"$SCHEMA\" \"/*a comment\n\n*/\" SET \"VERSION\" = 12 WHERE NAME = 'SOMENAMEVALUE';";
Output is:
Didn't find Comment
