Replacing quotes in a Java String only on specific places - java

We have a String as below.
\config\test\[name="sample"]\identifier["2"]\age["3"]
I need to remove the quotes surrounding the numbers. For example, the above string after replacement should look like below.
\config\test\[name="sample"]\identifier[2]\age[3]
Currently I'm trying with the regex as below
String.replaceAll("\"\\\\d\"", "");
This is replacing the numbers also. Please help to find out a regex for this.

You can use replaceAll with this regex \"(\d+)\" so you can replace the matching of \"(\d+)\" with the capturing group (\d+) :
String str = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
str = str.replaceAll("\"(\\d+)\"", "$1");
//----------------------^____^------^^
Output
\config\test\[name="sample"]\identifier[2]\age[3]
regex demo
Take a look about Capturing Groups

We can try doing a blanket replacement of the following pattern:
\["(\d+)"\]
And replacing it with this:
\[$1\]
Note that we specifically target quoted numbers only appearing in square brackets. This minimizes the risk of accidentally doing an unintended replacement.
Code:
String input = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
input = input.replaceAll("\\[\"(\\d+)\"\\]", "[$1]");
System.out.println(input);
Output:
\config\test\[name="sample"]\identifier[2]\age[3]
Demo here:
Rextester

You can use:
(?:"(?=\d)|(?<=\d)")
and replace it with nothing == ( "" )
fast test:
echo '\config\test\[name="sample"]\identifier["2"]\age["3"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
\config\test\[name="sample"]\identifier[2]\age[3]
test2:
echo 'identifier["123"]\age["456"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
identifier[123]\age[456]
NOTE
if you have only a single double quote " it works fine; otherwise you should add quantifier + for both beginning and end "
test3:
echo '"""""1234234"""""' | perl -lpe 's/(?:"+(?=\d)|(?<=\d)"+)//g'
the output:
1234234

Related

Regex pattern matching is getting timed out

I want to split an input string based on the regex pattern using Pattern.split(String) api. The regex uses both positive and negative lookaheads. The regex is supposed to split on a delimiter (,) and needs to ignore the delimiter if it is enclosed in double inverted quotes("x,y").
The regex is - (?<!(?<!\Q\\E)\Q\\E)\Q,\E(?=(?:[^\Q"\E]*(?<=\Q,\E)\Q"\E[[^\Q,\E|\Q"\E] | [\Q"\E]]+[^\Q"\E]*[^\Q\\E]*[\Q"\E]*)*[^\Q"\E]*$)
The input string for which this split call is getting timed out is -
"","1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]","QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"
I read that the lookup technics are heavy and can cause the timeouts if the string is too long. And if I remove the backward slashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, then the regex is able to detect and split.
The pattern also does not detect the first delimiter at place [STIFFENER]","QH20426AD3 with the above string. But if I remove the backward slashes enclosing [\"BOLT,HI-JOK\"] at the end of the string, then the regex is able to detect it.
I am not very experienced with the lookup in regex, can some one please give hints about how can I optimize this regex and avoid time outs?
Any pointers, article links are appreciated!
If you want to split on a comma, and the strings that follow are from an opening till closing double quote after it:
,(?="[^"\\]*(?:\\.[^"\\]*)*")
The pattern matches:
, Match a comma
(?= Positive lookahad
"[^"\\]* Match " and 0+ times any char except " or \
(?:\\.[^"\\]*)*" Optionally repeat matching \ to escape any char using the . and again match any chars other than " and /
) Close lookahead
Regex demo | Java demo
String string = "\"\",\"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]\",\"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\\\"BOLT,HI-JOK\\\"]\"\n";
String[] parts = string.split(",(?=\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")");
for (String part : parts)
System.out.println(part);
Output
""
"1114356033020-0011,- [BRACKET],1114356033020-0017,- [FRAME],1114356033020-0019,- [CLIP],1114356033020-0001,- [FRAME ASSY],1114356033020-0013,- [GUSSET],1114356033020-0015,- [STIFFENER]"
"QH20426AD3 [RIVET,SOL FL HD],UY510AE3L [NUT,HEX],PO41071B0 [SEALING CMPD],LL510A3-10 [\"BOLT,HI-JOK\"]"

Regular expression: Replace everything before first occurence

I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
Outputs: mydomain.com but this is giving me issues when the domains are in the vein of dev.mydomain.com.pe or dev.mydomain.com.uk in those cases I am getting only the .com.pe and .com.uk parts.
Is there a modifier I can use on my regex to make sure it only takes what is before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk
You may use
^[^.]+\.(?=.*\.)
See the regex demo and the regex graph:
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
(?=.*\.) - followed with any 0 or more chars other than line break chars as many as possible and then a ..
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
Following regex will work for you. It will find first part (if exists), captures rest of the string as 2nd matching group and replaces the string with 2nd matching group. .*? is non-greedy search that will match until it sees first dot character.
(.*?\.)?(.*\..*)
Regex Demo
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com

how to check regex starts and ends with regex

I am having the regex for capturing string if they are in between double quote and not start or end with /.
But the regex solution which I wanted.
The regex should not capture
Condition 1. Capture text between two double or single quotes.
Condition 2. But it shouldn't capture if starts with [ and ends with ]
Condition 3. But it shouldn't if starts with /" and ends with /' or starts with /" and ends with /'
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
output:
captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the regex is between starts with [ and ends with ] but it is having double or single quote then also it should not capture.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the above-expected result for the input "[/"Dimming Value/"]" there is a two-time double quote but is capturing only one excluding /". So, the output is [/"Dimming Value/"]. Like this, I want if /' (single quote preceded by /).
Note:
For input "[/"Dimming Value/"]" or '[/'Dimming Value/']', here although the text is between double quote and single quote and having [ and ] it should not ignore the string. The output should be [/"Dimming Value/"].
As I understood, you want to capture text between double quotes, except:
if initial double quotes prefixed by [ or final double quotes suffixed by ]
doubles quotes prefixed by / should not be the begin or end of matched text
I don't know if you want also capture text between single quotes, because you text is not complete clear.
For create a non capture group with negative matching of prefixed chars, you need a group of type Negative Lookbehind, with syntax (?<!prefix that you dont want), but this is not present on java or javascript regex engine.
The best regex that I build to return what you want for you example (but only work on PHP or python (you can check it on site regex101.com or similar)) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction for don't match if initial double quotes suffixed by ] to prevent match "][" on text ["test"]["in"]
Anyway, this will not solve your problem, since will not work within java or javascript engine!
Do you have any way to process the results, and exclude the bad matches?
If so, you can match bad prefix and bad suffix and exclude it from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full javascript code, including pos processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
EDIT:
equivalent java code:
CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
.matcher(yourStringHere);
while (m.find()) {
String s = m.group();
if (!s.startsWith("[") && !s.endsWith("]")) {
allMatches.add(s);
}
}

Java Regex, Split String at punctuation marks except in brackets

I have this String: "Hello, my Name is [[Peter.java]]."
The desired split is: [Hello, my, Name, is, [[Peter.java]]]
I split at punktuation marks but completly ignore things in these brackets.
I tried:
string.split("(?!\\[\\[.*\\]\\])\\s*(\\,|\\.|\\s)\\s*")
but this doesnt work because the output is [Hello, my, Name, is, [[Peter, java]]]. Can you help me?
Other examples:
"Hello. My name is [[Peter.java]]" --> [Hello, My, name, is, [[Peter.java]]]
"Hi. How, [[are,you]]" --> [Hi, How, [[are,you]]]
You can use this regex to split:
[.,\s]+(?!\w+])
Working demo
The code:
public void testRegex() {
String str = "Hello. my Name is [[Peter.java]].";
String[] arr = str.split("[.,\\s]+(?!\\w+])");
System.out.println(Arrays.toString(arr));
}
// Output: [Hello, my, Name, is, [[Peter.java]]]
Edit: as HamZa pointed in his comment, the regex above fails is the string is something, like this]. So, to leverage the usage of SKIP & FAIL pcre feature, this regex can be improved by using:
\[\[.*?\]\] # Match our brackets
(*SKIP)(*FAIL) # Skip that match and proceed further
| # or
[\s.,]+ # any character of: whitespace (\n, \r, \t,
\f, and " "), '.', ',' (1 or more times)
Working demo
Instead of using String.split, you'll probably want to use a different sort of regex.
/\[\[(.*?)\]\]|(\w+)\W/g
Online demo
Then use a matcher to iterate through the matches.

A Little RegEx Help Please - Part 2

Here's the input string:
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif')
With this RegExp: (?<=')[^']+.xml(?=')
... we get this:
http://www.something.com/videos/JohnsAwesomeCaption.xml
... which is exactly what I wanted. BUT this time I'd like to select the complete string EXCEPT for the above. Basically an inverse selection. The output should look like this:
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', '', '/videos/video-splash-image.gif')
Thanks!
Do a match of your pattern, get the substring that matched, then do a replace of that exact substring with an empty string (or '', or whatever). If your language has the option to return the indices that matched rather than the text that matched it's even easier because you can just remove the text range identified by the indices.
Some languages provide a "regusb" (regular expression substitute) command that lets you replace a regular expression with another string, which lets you do the match-and-replace all in one step. Different languages may call that command by different names.
Try this if you have to do the task in Regex:
Replace
(.+)(?<=')[^']+\.xml(?=')(.+)
with
$1$2
Not sure which language you are using but in Java you can do:
str = "loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif')";
System.out.println(str.replaceFirst("(?<=')[^']+\\.xml(?=')", ""));
Or in php use this code:
$str = "loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif')";
echo preg_replace("~(?<=')[^']+\.xml(?=')~", "", $str);
OUTPUT (from both)
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', '', '/videos/video-splash-image.gif')
EDIT
Use following Java code to break the string into 2 parts as per your comments:
Pattern p = Pattern.compile("^(.*)(?<=')[^']+\\.xml(?=')(.*)$");
Matcher m = p.matcher(str);
if (m.find())
System.out.println("Output: " + m.group(1) + "<br /><br />" + m.group(2));
OUTPUT
Matched: loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', '<br /><br />', '/videos/video-splash-image.gif')

Categories