restricting the regular expression only for a line - java

I have the CSV file below from one of our systems.
""demo"",""kkkk""
""demo " ","fg"
" " demo" "
"demo"
"value1","" frg" ","vaue5"
"val3",""tttyy " ",""hjhj","ghuy"
The objective is to collapse every doubled pair of double quotes so that each value is wrapped in exactly one pair, as shown below. The amount of whitespace between the quotes is not fixed. This has to be handled in a Java program using the replaceAll function.
"demo","kkkk"
"demo","fg"
"demo"
"demo"
"value1","frg","vaue5"
"val3","tttyy","hjhj","ghuy"
I tried this on regex101 with "[ ]*" and it works for PHP >= 7.3 but not in Java.
I also tried [\"][\"]|[^\"]\s+[\"] but still did not get the desired output. Any suggestions for a regular expression that can be used in a Java program?

Based on the sample data shown, you can use:
String repl = str.replaceAll("(?:\\h*\"){2}\\h*", "\"");
RegEx Details:
(?:\h*"){2}: Match two double quotes, each optionally preceded by horizontal whitespace (spaces or tabs)
\h*: Match 0 or more trailing horizontal whitespace characters
Replacement is just a single "
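As a quick check, here is a minimal sketch that runs the pattern above over some of the sample lines from the question. The class and method names (QuoteCollapseDemo, collapseQuotes) are made up for illustration. One caveat under this approach: a genuinely empty field written as "" would also be collapsed to a single quote, so this assumes the data contains no empty quoted fields.

```java
import java.util.List;

class QuoteCollapseDemo {
    // Collapses two quotes (each optionally preceded by horizontal whitespace,
    // plus any trailing horizontal whitespace) into a single quote.
    static String collapseQuotes(String line) {
        return line.replaceAll("(?:\\h*\"){2}\\h*", "\"");
    }

    public static void main(String[] args) {
        // Sample lines taken from the question, written as Java literals.
        List<String> lines = List.of(
            "\"\"demo\"\",\"\"kkkk\"\"",
            "\"\"demo \" \",\"fg\"",
            "\" \" demo\" \"",
            "\"value1\",\"\" frg\" \",\"vaue5\"");
        for (String line : lines) {
            System.out.println(collapseQuotes(line));
        }
    }
}
```

Because \h only matches horizontal whitespace, the pattern never reaches across a line break, so it can be applied to the whole file content or to each line separately.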


Any suggestions how to create Regex for this in java for String.replaceAll()?

My String is like this.
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value, I am trying to replace 692950841314120 (or whatever value I get from the server) with a constant like ID. My main goal is to parse this into a POJO. I have tried using:
string.replaceAll("^[0-9]{15}$","ID")
but I think the slashes prevent it from matching. Is there a better way to do this? I know I can use the code below, but I don't want it to turn a longer number into something like ID123 and distort other information in the JSON.
string.replaceAll("[0-9]{15}","ID")
Strictly speaking, if you have a valid JSON string, you should parse it using something like GSON, rather than using regex. That being said, if you must use regex, you could try removing the starting and ending anchors:
string.replaceAll("[0-9]{15}", "ID")
Or maybe use double quotes instead:
string.replaceAll("\"[0-9]{15}\"", "ID")
It is safer to assume the value is inside \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.
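To see why anchoring on the surrounding \" and \": is the more conservative choice, here is a small sketch comparing it with a plain word-boundary replacement. The input is a hypothetical variant of the question's string in which one value also happens to be a 15-digit number; the class and method names are invented for the example.

```java
class EscapedKeyDemo {
    // Replaces a 15-digit run only when it sits between \" and \": (an escaped JSON key).
    static String replaceKey(String s) {
        return s.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2");
    }

    // Replaces any standalone 15-digit run, wherever it appears.
    static String replaceAnywhere(String s) {
        return s.replaceAll("\\b[0-9]{15}\\b", "ID");
    }

    public static void main(String[] args) {
        // Hypothetical input: the key and one value are both 15-digit numbers.
        String input = "{\\\"692950841314120\\\":[{\\\"type\\\":\\\"123456789012345\\\"}]}";
        System.out.println(replaceKey(input));      // only the key becomes ID
        System.out.println(replaceAnywhere(input)); // the key and the value both become ID
    }
}
```

The capture-group version leaves the 15-digit value alone because it is followed by \", rather than \":.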
You should use a word boundary, \b.
Try this.
public static void main(String[] args) {
    String input = "{\\\"692950841314120\\\":"
        + "[{\\\"type\\\":\\\"12345678901234567890\\\","
        + "\\\"call_count\\\":3,"
        + "\\\"total_cputime\\\":1,"
        + "\\\"total_time\\\":5,"
        + "\\\"estimated_time_to_regain_access\\\":0}]}";
    System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}

how to check regex starts and ends with regex

I have a regex that captures a string if it is between double quotes and does not start or end with /.
But it does not yet do everything I want.
The regex should follow these conditions:
Condition 1. Capture text between two double or single quotes.
Condition 2. But it shouldn't capture if the text starts with [ and ends with ].
Condition 3. But it shouldn't capture if it starts with /" and ends with /", or starts with /' and ends with /'.
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
Captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the text starts with [ and ends with ], it should not be captured even though it contains double or single quotes.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the expected result above, the input "[/"Dimming Value/"]" contains two pairs of double quotes, but only the outer pair delimits the capture, because the inner quotes are preceded by /. So the output is [/"Dimming Value/"]. I want the same behaviour for /' (a single quote preceded by /).
Note:
For the input "[/"Dimming Value/"]" or '[/'Dimming Value/']', although the text between the quotes starts with [ and ends with ], the string should not be ignored. The output should be [/"Dimming Value/"].
As I understand it, you want to capture text between double quotes, except:
if the opening double quote is prefixed by [ or the closing double quote is suffixed by ]
double quotes prefixed by / should not be the beginning or end of the matched text
I don't know whether you also want to capture text between single quotes, because your text is not completely clear.
To exclude a prefix without capturing it, you need a negative lookbehind, with the syntax (?<!prefix that you don't want). Java's java.util.regex supports it for bounded-length patterns, and JavaScript has supported it since ES2018; older JavaScript engines do not.
The best regex that I could build to return what you want for your example (you can check it on regex101.com or similar) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction not to match when the opening double quote is immediately followed by ], to prevent matching "][" in the text ["test"]["in"].
If your engine lacks lookbehind support, however, this will not solve your problem.
In that case, do you have any way to process the results and exclude the bad matches?
If so, you can match the bad prefix and bad suffix and exclude them from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full JavaScript code, including post-processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
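EDIT:
Equivalent Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
List<String> allMatches = new ArrayList<>();
// Match quoted text together with an optional surrounding [ and ]
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
        .matcher(yourStringHere);
while (m.find()) {
    String s = m.group();
    // Discard matches that were wrapped in square brackets
    if (!s.startsWith("[") && !s.endsWith("]")) {
        allMatches.add(s);
    }
}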
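For comparison, java.util.regex does accept lookbehind and lookahead, so the lookaround pattern from this answer can also be tried directly in Java without a filtering step. The following is only a sketch against the sample input from the question; the class and method names are invented for illustration, and single-quoted text is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundQuoteDemo {
    // Opening quote: not preceded by [ or /, and not followed by ].
    // Closing quote: not followed by ] or /.
    private static final Pattern QUOTED =
        Pattern.compile("(?<![\\[/])\"(?!\\])(/?.)*?\"(?![\\]/])");

    static List<String> findQuoted(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath("
            + "Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),"
            + "\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
        // Prints the four quoted strings; ["test"] and ["in"] are skipped.
        findQuoted(input).forEach(System.out::println);
    }
}
```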

Replacing quotes in a Java String only on specific places

We have a String as below.
\config\test\[name="sample"]\identifier["2"]\age["3"]
I need to remove the quotes surrounding the numbers. For example, the above string after replacement should look like below.
\config\test\[name="sample"]\identifier[2]\age[3]
Currently I'm trying with the regex as below
String.replaceAll("\"\\\\d\"", "");
This removes the numbers as well. Please help me find a regex for this.
You can use replaceAll with the regex \"(\d+)\" and replace each match with the content of its capturing group (\d+), referenced as $1:
String str = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
str = str.replaceAll("\"(\\d+)\"", "$1");
//----------------------^____^------^^
Output
\config\test\[name="sample"]\identifier[2]\age[3]
Take a look at capturing groups.
We can try doing a blanket replacement of the following pattern:
\["(\d+)"\]
And replacing it with this:
\[$1\]
Note that we specifically target quoted numbers only appearing in square brackets. This minimizes the risk of accidentally doing an unintended replacement.
Code:
String input = "\\config\\test\\[name=\"sample\"]\\identifier[\"2\"]\\age[\"3\"]";
input = input.replaceAll("\\[\"(\\d+)\"\\]", "[$1]");
System.out.println(input);
Output:
\config\test\[name="sample"]\identifier[2]\age[3]
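To illustrate the point about limiting the match to square brackets, here is a small sketch that compares the two patterns on a hypothetical input where the name attribute is itself a quoted number. The class and method names are made up for the example.

```java
class BracketQuoteDemo {
    // Strips quotes around any quoted run of digits.
    static String unquoteAnyNumber(String s) {
        return s.replaceAll("\"(\\d+)\"", "$1");
    }

    // Strips quotes only when the quoted digits fill a [ ... ] pair exactly.
    static String unquoteBracketedNumber(String s) {
        return s.replaceAll("\\[\"(\\d+)\"\\]", "[$1]");
    }

    public static void main(String[] args) {
        // Hypothetical input: name="42" should keep its quotes, age["3"] should not.
        String input = "\\config\\[name=\"42\"]\\age[\"3\"]";
        System.out.println(unquoteAnyNumber(input));       // name loses its quotes too
        System.out.println(unquoteBracketedNumber(input)); // only age["3"] is changed
    }
}
```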
You can use:
(?:"(?=\d)|(?<=\d)")
and replace it with the empty string ( "" ).
Quick test:
echo '\config\test\[name="sample"]\identifier["2"]\age["3"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
\config\test\[name="sample"]\identifier[2]\age[3]
test2:
echo 'identifier["123"]\age["456"]' | perl -lpe 's/(?:"(?=\d)|(?<=\d)")//g'
the output:
identifier[123]\age[456]
NOTE
If each number has exactly one double quote " on each side, this works fine; otherwise you should add the quantifier + to the " at both the beginning and the end.
test3:
echo '"""""1234234"""""' | perl -lpe 's/(?:"+(?=\d)|(?<=\d)"+)//g'
the output:
1234234
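Since the question asks about Java, here is a rough Java equivalent of the Perl one-liners above, using the variant with the + quantifier. The names are hypothetical. Note the assumption built into this approach: it looks only at the character next to each quote, so a quoted value that merely begins or ends with a digit (such as "2nd") would lose one of its quotes.

```java
class DigitQuoteDemo {
    // Removes runs of quotes that are directly followed by a digit
    // or directly preceded by a digit.
    static String stripQuotesNextToDigits(String s) {
        return s.replaceAll("\"+(?=\\d)|(?<=\\d)\"+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotesNextToDigits("identifier[\"123\"]\\age[\"456\"]"));
        System.out.println(stripQuotesNextToDigits("\"\"\"\"\"1234234\"\"\"\"\""));
        // Caveat: only the opening quote is removed here, leaving name=2nd"
        System.out.println(stripQuotesNextToDigits("name=\"2nd\""));
    }
}
```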

Trying to create a regexp

I have a string which I want to parse via a Java or Python regexp:
something (\var1 \var2 \var3 $var4 #var5 $var6 *fdsfdsfd #uytuytuyt fdsgfdgfdgf aaabbccc)
The number of vars is unknown. Their exact names are unknown. Their names may or may not start with "\", "$", "*" or "#", and they're delimited by whitespace.
I'd like to parse them separately, that is, in capture groups, if possible. How can I do that? The output I want is a list of:
[\var1 , \var2 , \var3 , $var4 , #var5 , $var6 , *fdsfdsfd , #uytuytuyt , fdsgfdgfdgf , aaabbccc]
I don't need the java or python code, I just need the regexp. My incomplete one is:
something\s\(.+\)
something\s\((.+)\)
This regex captures the string containing all the variables. Split it on whitespace, since you are sure that they are delimited by whitespace.
import re
m = re.search(r'something\s\((.+)\)', input_string)
if m:
    list_of_vars = m.group(1).split()
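The question also mentions Java, so here is a sketch of the same capture-then-split idea there. It assumes the input looks like the example in the question; the class and method names (VarListDemo, parseVars) are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VarListDemo {
    // Group 1 captures everything between the outer parentheses.
    private static final Pattern WRAPPER = Pattern.compile("something\\s\\((.+)\\)");

    static List<String> parseVars(String input) {
        Matcher m = WRAPPER.matcher(input);
        if (!m.find()) {
            return List.of(); // no "something (...)" wrapper found
        }
        // Split the captured text on runs of whitespace.
        return Arrays.asList(m.group(1).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String input = "something (\\var1 \\var2 $var4 #var5 *fdsfdsfd aaabbccc)";
        System.out.println(parseVars(input));
    }
}
```

A single regex cannot return an unknown number of separate capture groups, which is why the split step is done outside the pattern.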

How to remove comma after a word pattern in java

Please help me find a regex to remove the comma after a word pattern in Java.
Assume I would like to delete the comma after each pattern, where the input is <Word$TAG>, <Word$TAG>, <Word$TAG>, <Word$TAG>, <Word$TAG> and I want my output to be <Word$TAG> <Word$TAG> <Word$TAG> <Word$TAG> . If I simply use .replaceAll(), it will replace all commas, but the Word in my <Word$TAG> may itself contain a comma (,).
For example, Input.txt is as follows
mms§NNP_ACRON, site§N_NN, pe§PSP, ,,,,,§RD_PUNC, link§N_NN, ....§RD_PUNC, CID§NNP_ACRON, team§N_NN, :)§E
and Output.txt
mms§NNP_ACRON site§N_NN pe§PSP ,,,,,§RD_PUNC link§N_NN ....§RD_PUNC CID§NNP_ACRON team§N_NN :)§E
You could search for ", " and replace it with " " (a space), as below:
one = one.replace(", ", " ");
If you might have input like "myString, ,,," or multiple spaces after the comma, you could use replaceAll with a regex such as:
one = one.replaceAll(",\\s+", " ");
(?<=[^,\s]),
Try this. Replace with an empty string. See the demo:
http://regex101.com/r/lZ5mN8/5
Match the data you want, not the data you don't want.
You probably want ([^ ]+), and then keep the captured group, separated by whitespace.
You might even want to narrow it down to ([^ ]+§[^ ]+),. Usually, stricter is better.
You could use a positive lookahead assertion to match all the commas which are followed by a space or end of the line anchor.
String s = "mms§NNP_ACRON, site§N_NN, pe§PSP, ,,,,,§RD_PUNC, link§N_NN, ....§RD_PUNC, CID§NNP_ACRON, team§N_NN, :)§E";
System.out.println(s.replaceAll(",(?=\\s|$)",""));
Output:
mms§NNP_ACRON site§N_NN pe§PSP ,,,,,§RD_PUNC link§N_NN ....§RD_PUNC CID§NNP_ACRON team§N_NN :)§E
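The lookahead and lookbehind suggestions above agree on the sample data but differ when a word contains a comma of its own. The sketch below compares them on a shortened version of the sample and on a hypothetical token, 1,000§NUM; the class and method names are invented, and § is written as \u00A7 only to keep the source file ASCII.

```java
class CommaAfterTagDemo {
    // Removes a comma only when whitespace or the end of input follows it.
    static String stripByLookahead(String s) {
        return s.replaceAll(",(?=\\s|$)", "");
    }

    // Removes any comma preceded by a character that is neither a comma nor whitespace.
    static String stripByLookbehind(String s) {
        return s.replaceAll("(?<=[^,\\s]),", "");
    }

    public static void main(String[] args) {
        String sample = "mms\u00A7NNP, ,,,,,\u00A7RD_PUNC, :)\u00A7E";
        System.out.println(stripByLookahead(sample));
        System.out.println(stripByLookbehind(sample));

        // Hypothetical token with an internal comma: the two patterns disagree here.
        String tricky = "1,000\u00A7NUM, x\u00A7N";
        System.out.println(stripByLookahead(tricky));  // keeps the comma inside 1,000
        System.out.println(stripByLookbehind(tricky)); // also removes the comma inside 1,000
    }
}
```

If words can contain commas, the lookahead form looks like the safer fit for the requirement stated in the question.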
