Java, Regex: Replace fields in Json by field key

Requirements:
For each Json field where key matches specified constant replace value with another constant.
{"regular":"a", "sensitive":"b"}
Parameters "sensitive", "*****".
Expected:
{"regular":"a", "sensitive":"*****"}
Values may or may not have double quotes around them. The replacement constant is always double-quoted. The JSON may be malformed. A Java implementation is preferred.
Key comparison is case insensitive.

Depending on how malformed your "JSON" is, the following might work - if not, we need more test cases:
"sensitive"\s*:\s* # match "sensitive":
( # capture in group 1:
"[^"]*" # any quoted value
| # or
[^\s,{}"]* # any unquoted value, ending at a comma, brace or whitespace
) # end of group 1
In Java:
String resultString = subjectString.replaceAll(
"(?x)\"sensitive\"\\s*:\\s* # match \"sensitive\":\n" +
"( # capture in group 1:\n" +
" \"[^\"]*\" # any quoted value\n" +
"| # or\n" +
" [^\\s,{}\"]* # an unquoted value, ending at comma, brace or whitespace\n" +
") # end of group 1",
"\"sensitive\":\"*****\"");
Test it live on regex101.com.
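The question also asks for case-insensitive key comparison and for the key and mask to be parameters. A minimal sketch of how the same pattern could be parameterized follows; the class name FieldMasker and the method maskField are hypothetical, and it assumes the key appears in double quotes, as in the example input. Unlike the snippet above, it keeps the key's original spelling by re-inserting capture group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldMasker {
    // Hypothetical helper: replaces the value of every field whose quoted key
    // equals `key` (ignoring case) with the double-quoted `mask`.
    static String maskField(String json, String key, String mask) {
        Pattern p = Pattern.compile(
            // group 1: the quoted key, the colon and surrounding whitespace
            "(\"" + Pattern.quote(key) + "\"\\s*:\\s*)"
            // group 2: a quoted value, or an unquoted value ending at
            // whitespace, comma, brace or quote
            + "(\"[^\"]*\"|[^\\s,{}\"]*)",
            Pattern.CASE_INSENSITIVE);
        // quoteReplacement guards against '$' or '\' inside the mask
        String replacement = "$1" + Matcher.quoteReplacement("\"" + mask + "\"");
        return p.matcher(json).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String json = "{\"regular\":\"a\", \"Sensitive\":b}";
        // prints {"regular":"a", "Sensitive":"*****"}
        System.out.println(maskField(json, "sensitive", "*****"));
    }
}
```

Like the original pattern, this does not handle escaped quotes inside quoted values, so it is only a starting point for badly malformed input.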

You can use a positive lookbehind to achieve this:
public static void main(String[] args) {
String s = "{\"regular\":\"a\", \"sensitive\":\"b\"}";
String key = "sensitive";
String val = "****";
System.out.println(s.replaceAll("(?<=\"" + key + "\":\")(\\w+)", val));
key = "regular";
System.out.println(s.replaceAll("(?<=\"" + key + "\":\")(\\w+)", val));
}
Output:
{"regular":"a", "sensitive":"****"}
{"regular":"****", "sensitive":"b"}

You can use the following regex:
String t= "{\"regular\":\"a\", \"sensitive\":\"b\"}"; //{"regular":"a", "sensitive":"b"}
String r = t.replaceAll("(\\s*)\"?sensitive\"?\\s*:\\s*\"?b\"?\\s*", "$1\"sensitive\":\"*****\"");
System.out.println("output "+r); //output {"regular":"a", "sensitive":"*****"}
t= "{\"regular\":\"a\",sensitive:b}"; //{"regular":"a",sensitive:b}
r = t.replaceAll("(\\s*)\"?sensitive\"?\\s*:\\s*\"?b\"?\\s*", "$1\"sensitive\":\"*****\"");
System.out.println("output "+r); //output {"regular":"a","sensitive":"*****"}
DEMO: https://regex101.com/r/uHUhEl/1/

Related

Regex: starts with messages and string between parent message curly brace

I want to get all the message data: the pattern should find each top-level message and everything between that parent message's curly braces. With the pattern below, I am not getting the whole parent body.
String data = "syntax = \"proto3\";\r\n" +
"package grpc;\r\n" +
"\r\n" +
"import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n" +
"import \"google/api/annotations.proto\";\r\n" +
"import \"google/protobuf/wrappers.proto\";\r\n" +
"import \"protoc-gen-swagger/options/annotations.proto\";\r\n" +
"\r\n" +
"message Acc {\r\n" +
" message AccErr {\r\n" +
" enum Enum {\r\n" +
" UNKNOWN = 0;\r\n" +
" CASH = 1;\r\n" +
" }\r\n" +
" }\r\n" +
" string account_id = 1;\r\n" +
" string name = 3;\r\n" +
" string account_type = 4;\r\n" +
"}\r\n" +
"\r\n" +
"message Name {\r\n" +
" string firstname = 1;\r\n" +
" string lastname = 2;\r\n" +
"}";
List<String> allMessages = new ArrayList<>();
Pattern pattern = Pattern.compile("message[^\\}]*\\}");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
String str = matcher.group();
allMessages.add(str);
System.out.println(str);
}
}
I am expecting a result like the one below: a list of strings of size 2.
allMessages.get(0) should be:
message Acc {
message AccErr {
enum Enum {
UNKNOWN = 0;
CASH = 1;
}
}
string account_id = 1;
string name = 3;
string account_type = 4;
}
and allMessages.get(1) should be:
message Name {
string firstname = 1;
string lastname = 2;
}

First remove the input prior to "message" appearing at the start of a line, then split on newlines followed by "message" (include the newlines in the split so that the blank lines between parent messages are consumed):
String[] messages = data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)");
See live demo.
If you actually need a List<String>, pass that result to Arrays.asList():
List<String> messages = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=^message)", "").split("\\R+(?=message)"));
The first regex matches everything from the start up to, but not including, the first line that starts with message, which is replaced with a blank (i.e. deleted). Breaking it down:
(?sm) turns on the flags s, which makes the dot also match newlines, and m, which makes ^ and $ match at the start and end of each line
\\A means the very start of input
.*? means any quantity of any character (including newlines, because the s flag is set); the trailing ? makes it reluctant, so it matches as few characters as possible while still matching
(?=^message) is a lookahead and means the following characters are the start of a line followed by "message"
See regex101 live demo for a thorough explanation.
The split regex matches one or more line break sequences when they are followed by "message":
\\R+ means one or more line break sequences (all OS variants)
(?=message) is a look ahead and means the following characters are "message"
See regex101 live demo for a thorough explanation.
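The two steps described above can be wrapped in a small runnable sketch. The class name MessageSplitter, the method splitMessages and the shortened sample input are hypothetical; the sketch assumes nested messages are indented, so only top-level messages start a line.

```java
import java.util.Arrays;
import java.util.List;

public class MessageSplitter {
    // Hypothetical helper combining the two regex steps.
    static List<String> splitMessages(String data) {
        // Step 1: delete everything before the first line that starts with "message".
        String body = data.replaceAll("(?sm)\\A.*?(?=^message)", "");
        // Step 2: split on line breaks that are directly followed by "message".
        return Arrays.asList(body.split("\\R+(?=message)"));
    }

    public static void main(String[] args) {
        String data = "syntax = \"proto3\";\n\n"
            + "message Acc {\n  message AccErr {\n  }\n  string name = 3;\n}\n\n"
            + "message Name {\n  string firstname = 1;\n}";
        List<String> messages = splitMessages(data);
        System.out.println(messages.size()); // prints 2
        messages.forEach(System.out::println);
    }
}
```

The nested "message AccErr" is not split off because the line break before it is followed by spaces rather than directly by "message".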

Try this for your regex. It anchors on message being the start of a line, and uses a positive lookahead to find the next message or the end of messages.
Pattern.compile("(?s)\r\n(message.*?)(?=(\r\n)+message|$)")
// or
Pattern.compile("(?s)\r?\n(message.*?)(?=(\r?\n)+message|$)")
No splitting, parsing, or managing nested braces either :)
https://regex101.com/r/Wa2xxx/1
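As a rough sketch of how the second pattern might be used to fill the list, assuming the first message is preceded by a line break and nested messages are indented (the class name MessageFinder, the method findMessages and the sample input are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageFinder {
    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> findMessages(String data) {
        Pattern p = Pattern.compile("(?s)\\r?\\n(message.*?)(?=(\\r?\\n)+message|$)");
        Matcher m = p.matcher(data);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // group 1 is the message text without the leading line break
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "syntax = \"proto3\";\n\n"
            + "message Acc {\n  message AccErr {\n  }\n}\n\n"
            + "message Name {\n}";
        System.out.println(findMessages(data).size()); // prints 2
    }
}
```

Note that a message at the very start of the input (with no preceding line break) would not be found by this pattern.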

Java match two strings without last character

I have a URL whose path is /mypath/check/10.10/-123.11. I want the comparison to return true if either number optionally has 3 digits after the decimal point instead of 2; e.g. /mypath/check/10.101/-123.112 should match. The part before the decimal point must match exactly for both numbers.
Some examples:
Success:
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.112
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.11
/mypath/check/10.10/-123.11 = /mypath/check/10.10/-123.112
/mypath/check/10.10/123.11 = /mypath/check/10.101/123.112
.. and so forth
Failure:
/mypath/check/10.10/-123.11 != /mypath/check/10.121/-123.152
/mypath/check/10.11/-123.11 != /mypath/check/10.12/-123.11
The part before the decimal point may start with - and consists of 1 to 3 digits.

Try /mypath/check/10\.10/-?123\.11[ ]*=[ ]*/mypath/check/(\d\d)\.\1\d?/
demo

Try this:
url1.equals(url2) || url1.equals(url2.replaceAll("\\d$", ""))

Idea
Regex subpatterns that should match optionally are suffixed with the ? quantifier. In your case this applies to the 3rd digit after a decimal point.
An equality test modulo that optional digit can be implemented by matching each occurrence of the context pattern and replacing the optional part within the match with the empty string. After this normalization the strings can be tested for equality.
Code
// Initializing test data.
// Will compare Strings in batch1, batch2 at the same array position.
//
String[] batch1 = {
"/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.11/-123.11"
};
String[] batch2 = {
"/mypath/check/10.101/-123.112"
, "/mypath/check/10.101/-123.11"
, "/mypath/check/10.10/-123.112"
, "/mypath/check/10.101/123.112"
, "/mypath/check/10.121/-123.152"
, "/mypath/check/10.12/-123.11"
};
// Regex pattern used for normalization:
// - Basic pattern: decimal point followed by 2 or 3 digits
// - Optional part: 3rd digit of the basic pattern
// - Additional context: Pattern must match at the end of the string or be followed by a non-digit character.
//
Pattern re_p = Pattern.compile("([.][0-9]{2})[0-9]?(?:$|(?![0-9]))");
// Replacer routine for processing the regex match. Returns capture group #1.
// Note: Matcher.replaceAll(Function) requires Java 9 or later.
Function<MatchResult, String> fnReplacer = m -> m.group(1);
// Processing each test case
// Expected result
// match
// match
// match
// match
// mismatch
// mismatch
//
for ( int i = 0; i < batch1.length; i++ ) {
String norm1 = re_p.matcher(batch1[i]).replaceAll(fnReplacer);
String norm2 = re_p.matcher(batch2[i]).replaceAll(fnReplacer);
if (norm1.equals(norm2)) {
System.out.println("Url pair #" + Integer.toString(i) + ": match ( '" + norm1 + "' == '" + norm2 + "' )");
} else {
System.out.println("Url pair #" + Integer.toString(i) + ": mismatch ( '" + norm1 + "' != '" + norm2 + "' )");
}
}
Demo available here (ideone.com).

I'm assuming the first URL always has exactly 2 digits after every decimal point. If so, match the 2nd URL to the regex formed by appending an optional digit to the end of each decimal fraction in the first URL.
static boolean matchURL(String url1, String url2)
{
return url2.matches(url1.replaceAll("([.][0-9]{2})", "$1[0-9]?"));
}
Test:
String url1 = "/mypath/check/10.10/-123.11";
List<String> tests = Arrays.asList(
"/mypath/check/10.10/-123.11",
"/mypath/check/10.10/-123.111",
"/mypath/check/10.101/-123.11",
"/mypath/check/10.101/-123.111",
"/mypath/check/10.11/-123.11"
);
for(String url2 : tests)
System.out.format("%s : %s = %b%n", url1, url2, matchURL(url1, url2));
Output:
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.11/-123.11 = false

Split string based on parentheses

I'm writing Scala code which splits a line on a colon (:).
For example, for an input which looked like:
sparker0i#outlook.com : password
I was doing line.split(" : ") (which is essentially Java) and printing the email and the password on Console.
Now my requirement has changed and now a line will look like:
(sparker0i#outlook.com,sparker0i) : password
I want to print the email, username and password separately.
I tried a regex that first splits on the parentheses, but that didn't work because the pattern is not correct (val lt = line.split("[\\\\(||//)]")). Please guide me to the correct regex/split logic.

I'm not a Scala user, but instead of split, I think you can use Pattern and Matcher to extract this info. Your regex can use groups like:
\((.*?),(.*?)\) : (.*)
regex demo
Then you can extract group 1 for email, group 2 for username and the 3rd group for password.
val input = "(sparker0i#outlook.com,sparker0i) : password"
val pattern = """\((.*?),(.*?)\) : (.*)""".r
pattern.findAllIn(input).matchData foreach {
m => println(m.group(1) + " " + m.group(2) + " " + m.group(3))
}
Credit for this post https://stackoverflow.com/a/3051206/5558072

The regex I would use:
\((.*?),([^)]+)\) : (.+)
Regex Demo
\(       # matches (
(        # start of capture group 1
  .*?    # captures 0 or more characters, as few as possible, until ...
)        # end of capture group 1
,        # matches ,
(        # start of capture group 2
  [^)]+  # captures one or more characters that are not a )
)        # end of capture group 2
\)       # matches )
 :       # matches ' : ' (space, colon, space)
(        # start of capture group 3
  .+     # matches the rest of the string
)        # end of capture group 3
The Java implementation would be:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Test
{
public static void main(String[] args) {
String s = "(sparker0i#outlook.com,sparker0i) : password";
Pattern pattern = Pattern.compile("\\((.*?),([^)]+)\\) : (.+)");
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
}
}
Prints:
sparker0i#outlook.com
sparker0i
password
Java Demo

In Scala 2.13 there is a simple solution without regex:
Welcome to Scala 2.13.1 (OpenJDK 64-Bit Server VM, Java 1.8.0_222).
Type in expressions for evaluation. Or try :help.
scala> val input = "(sparker0i#outlook.com,sparker0i) : password"
input: String = (sparker0i#outlook.com,sparker0i) : password
scala> val s"($mail,$user) : $pwd" = input
mail: String = sparker0i#outlook.com
user: String = sparker0i
pwd: String = password

This works without changing much:
String s = "(sparker0i#outlook.com,sparker0i) : password";
// do whatever you were doing
String[] sArr = s.split(":");
sArr[0] = sArr[0].replaceAll("[(|)]",""); // just replace those parenthesis with empty string
System.out.println(sArr[0] + " " + sArr[1]);
Output
sparker0i#outlook.com,sparker0i password

Need to extract data from CSV file

My file contains the data below; every value is a string.
Input
"abcd","12345","success,1234,out",,"hai"
The output should be like below
Column 1: "abcd"
Column 2: "12345"
Column 3: "success,1234,out"
Column 4: null
Column 5: "hai"
We need to use the comma as the delimiter; the null value comes without double quotes.
Could you please help me find a regular expression to parse this data?

You could try a tool like CSVReader from OpenCsv https://sourceforge.net/projects/opencsv/
You can even configure a CSVParser (used by the reader) to output null on several conditions. From the doc :
/**
* Denotes what field contents will cause the parser to return null: EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER (default)
*/
public static final CSVReaderNullFieldIndicator DEFAULT_NULL_FIELD_INDICATOR = NEITHER;

You can use this Regular Expression
"([^"]*)"
DEMO: https://regex101.com/r/WpgU9W/1
Match 1
Group 1. 1-5 `abcd`
Match 2
Group 1. 8-13 `12345`
Match 3
Group 1. 16-32 `success,1234,out`
Match 4
Group 1. 36-39 `hai`

Using the regex ("[^"]+")|(?<=,)(,) you can find either quoted strings ("[^"]+"), which should be kept as they are, or commas preceded by commas, which denote null field values. All you need to do then is iterate through the matches, check which of the two capture groups is defined, and output accordingly:
String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
Pattern pattern = Pattern.compile("(\"[^\"]+\")|(?<=,)(,)");
Matcher matcher = pattern.matcher(input);
int col = 1;
while (matcher.find()) {
if (matcher.group(1) != null) {
System.out.println("Column " + col + ": " + matcher.group(1));
col++;
} else if (matcher.group(2) != null) {
System.out.println("Column " + col + ": null");
col++;
}
}
Demo: https://ideone.com/QmCzPE

Step #1:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(,,)";
final String string = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"\n"
+ "\"abcd\",\"12345\",\"success,1234,out\",\"null\",\"hai\"";
final String subst = ",\"null\",";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
Original Text:
"abcd","12345","success,1234,out",,"hai"
Transformation: (with null)
"abcd","12345","success,1234,out","null","hai"
Step #2: (use REGEXP)
"([^"]*)"
Result:
abcd
12345
success,1234,out
null
hai
Credits:
Emmanuel Guiton [https://stackoverflow.com/users/7226842/emmanuel-guiton] REGEXP

You can also use the Replace function:
final String inuput = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
System.out.println(inuput);
String[] strings = inuput
.replaceAll(",,", ",\"\",")
.replaceAll(",,", ",\"\",") // in case there is more than one consecutive null
.replaceAll("\",\"", "\";\"")
.replaceAll("\"\"", "")
.split(";");
for (String string : strings) {
String output = string;
if (output.isEmpty()) {
output = null;
}
System.out.println(output);
}

Advanced parsing of numeric ranges from string

I'm using Java to parse strings input by the user, representing either single numeric values or ranges. The user can input the following string:
10-19
And his intention is to use whole numbers from 10-19 --> 10,11,12...19
The user can also specify a list of numbers:
10,15,19
Or a combination of the above:
10-19,25,33
Is there a convenient method, perhaps based on regular expressions, to perform this parsing? Or must I split the string using String.split(), then manually iterate the special signs (',' and '-' in this case)?

This is how I would go about it:
Split using , as the delimiter.
If a token matches the regular expression ^(\\d+)-(\\d+)$, then I know I have a range. I would then extract the two numbers and create my range (it might be a good idea to make sure that the first number is lower than the second, because you never know...). You then act accordingly.
If a token matches the regular expression ^\\d+$, I know I have only one number, so I have a specific page. I would then act accordingly.
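The steps above could be sketched roughly as follows. The class name RangeParser and the method expand are hypothetical, and rejecting descending ranges and malformed tokens with an exception is an assumption about the desired behaviour, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeParser {
    private static final Pattern RANGE = Pattern.compile("^(\\d+)-(\\d+)$");
    private static final Pattern SINGLE = Pattern.compile("^\\d+$");

    // Hypothetical helper: expands input such as "10-19,25,33" into individual integers.
    static List<Integer> expand(String input) {
        List<Integer> values = new ArrayList<>();
        // limit -1 keeps trailing empty tokens, so "1,2," is rejected below
        for (String token : input.split(",", -1)) {
            Matcher range = RANGE.matcher(token);
            if (range.matches()) {
                int from = Integer.parseInt(range.group(1));
                int to = Integer.parseInt(range.group(2));
                if (from > to) {
                    throw new IllegalArgumentException("Descending range: " + token);
                }
                for (int i = from; i <= to; i++) {
                    values.add(i);
                }
            } else if (SINGLE.matcher(token).matches()) {
                values.add(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Not a number or range: " + token);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(expand("10-12,25,33")); // prints [10, 11, 12, 25, 33]
    }
}
```

Whitespace around tokens is not tolerated here; trimming each token before matching would be one way to relax that.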

This tested (and fully commented) regex solution meets the OP requirements:
Java regex solution
// TEST.java 20121024_0700
import java.util.regex.*;
public class TEST {
public static Boolean isValidIntRangeInput(String text) {
Pattern re_valid = Pattern.compile(
"# Validate comma separated integers/integer ranges.\n" +
"^ # Anchor to start of string. \n" +
"[0-9]+ # Integer of 1st value (required). \n" +
"(?: # Range for 1st value (optional). \n" +
" - # Dash separates range integer. \n" +
" [0-9]+ # Range integer of 1st value. \n" +
")? # Range for 1st value (optional). \n" +
"(?: # Zero or more additional values. \n" +
" , # Comma separates additional values. \n" +
" [0-9]+ # Integer of extra value (required). \n" +
" (?: # Range for extra value (optional). \n" +
" - # Dash separates range integer. \n" +
" [0-9]+ # Range integer of extra value. \n" +
" )? # Range for extra value (optional). \n" +
")* # Zero or more additional values. \n" +
"$ # Anchor to end of string. ",
Pattern.COMMENTS);
return re_valid.matcher(text).matches();
}
public static void printIntRanges(String text) {
Pattern re_next_val = Pattern.compile(
"# extract next integers/integer range value. \n" +
"([0-9]+) # $1: 1st integer (Base). \n" +
"(?: # Range for value (optional). \n" +
" - # Dash separates range integer. \n" +
" ([0-9]+) # $2: 2nd integer (Range) \n" +
")? # Range for value (optional). \n" +
"(?:,|$) # End on comma or string end.",
Pattern.COMMENTS);
Matcher m = re_next_val.matcher(text);
String msg;
int i = 0;
while (m.find()) {
msg = " value["+ ++i +"] ibase="+ m.group(1);
if (m.group(2) != null) {
msg += " range="+ m.group(2);
}
System.out.println(msg);
}
}
public static void main(String[] args) {
String[] arr = new String[]
{ // Valid inputs:
"1",
"1,2,3",
"1-9",
"1-9,10-19,20-199",
"1-8,9,10-18,19,20-199",
// Invalid inputs:
"A",
"1,2,",
"1 - 9",
" ",
""
};
// Loop through all test input strings:
int i = 0;
for (String s : arr) {
String msg = "String["+ ++i +"] = \""+ s +"\" is ";
if (isValidIntRangeInput(s)) {
// Valid input line
System.out.println(msg +"valid input. Parsing...");
printIntRanges(s);
} else {
// Match attempt failed
System.out.println(msg +"NOT valid input.");
}
}
}
}
Output:
String[1] = "1" is valid input. Parsing...
value[1] ibase=1
String[2] = "1,2,3" is valid input. Parsing...
value[1] ibase=1
value[2] ibase=2
value[3] ibase=3
String[3] = "1-9" is valid input. Parsing...
value[1] ibase=1 range=9
String[4] = "1-9,10-19,20-199" is valid input. Parsing...
value[1] ibase=1 range=9
value[2] ibase=10 range=19
value[3] ibase=20 range=199
String[5] = "1-8,9,10-18,19,20-199" is valid input. Parsing...
value[1] ibase=1 range=8
value[2] ibase=9
value[3] ibase=10 range=18
value[4] ibase=19
value[5] ibase=20 range=199
String[6] = "A" is NOT valid input.
String[7] = "1,2," is NOT valid input.
String[8] = "1 - 9" is NOT valid input.
String[9] = " " is NOT valid input.
String[10] = "" is NOT valid input.
Note that this solution simply demonstrates how to validate an input line and how to parse/extract value components from each line. It does not further validate that for range values the second integer is larger than the first. This logic check however, could be easily added.
Edit:2012-10-24 07:00 Fixed index i to count from zero.

You can use
strinput = '10-19,25,33'
eval(cat(2,'[',strrep(strinput,'-',':'),']'))
It is best to include some input checks; negative numbers will also cause problems with this method.

In the simplest approach you can use the evil eval for this:
A = eval('[10:19,25,33]')
A =
10 11 12 13 14 15 16 17 18 19 25 33
BUT of course you should think twice before you do that. Especially if this is a user-supplied string! Imagine what would happen if the user supplied any other command...
eval('!rm -rf /')
You would have to make sure that there really is nothing else than what you want. You could do this by regexp.
