parsing a string by regular expression

parsing a string by regular expression - java

I have a string of
"name"=>"3B Ae", "note"=>"Test fddd \"33 Ae\" FIXME", "is_on"=>"keke, baba"
and i want to parse it by a java program into segments of
name
3B Ae
note
Test fddd \"33 Ae\" FIXME
is_on
keke, baba
It is noted that the contents of the string, i.e. name, 3B Ae, are not fixed.
Any suggestion?

If you:
replace => with :
Wrap the full string with {}
The result will look like this, which is valid JSON. You can then use a JSON parser (GSON or Jackson, for example) to parse those values into a java object.
{
"name": "3B Ae",
"note": "Test fddd \"33 Ae\" FIXME",
"is_on": "keke, baba"
}
If you have control over the process that produces this string, I highly recommend that you use a standard format like JSON or XML that can be parsed more easily on the other end.

Because of the quoting rules, I'm not certain that a regular expression (even a PCRE with negative lookbehinds) can parse this consistently. What you probably want is to use a pushdown automaton, or some other parser capable of handling a context-free language.

If you can make sure your data (key or value) does not have a => or a , (or find some other delimiters that will not occur), the solution is pretty simple:
Split the string by , you get the key => value pairs
Split the key value => pairs by => you get what you want
if inputString holds
"name"=>"3B Ae", "note"=>"Test fddd \"33 Ae\" FIXME", "is_on"=>"keke baba"
(from a file for instance)
(I have changed the , to ; from between keke and baba)
String[] keyValuePairs = inputString.split(",");
for(String oneKeyValue : keyValuePairs)
{
String[] keyAndValue = oneKeyValue.split("=>");
}

Related

Any suggestions how to create Regex for this in java for String.replaceAll()?

My String is like this.
{\\\"692950841314120\\\":[{\\\"type\\\":\\\"ads_management\\\",\\\"call_count\\\":3,\\\"total_cputime\\\":1,\\\"total_time\\\":5,\\\"estimated_time_to_regain_access\\\":0}]}
Since the key here is a variable value I am trying to replace this 692950841314120(or the values which I get from sever) with a constant like ID. My main goal is to parse this as POJO. I have tried using..
string.replaceAll("^[0-9]{15}$","ID")
but due to Slashes I think i am not able to get the desired value. Is there any better way to do this. I know I can do below Code but I don't want any ID123 if I added extra value and distort any other info in JSON.
string.replaceAll("[0-9]{15}","ID")

Strictly speaking, if you have a valid JSON string, you should parse it using something like GSON, rather than using regex. That being said, if you must use regex, you could try removing the starting and ending anchors:
string.replaceAll("[0-9]{15}", "ID")
Or maybe use double quotes instead:
string.replaceAll("\"[0-9]{15}\"", "ID")

It is safer to assume the value is inisde \" and \":.
You can then use
.replaceAll("(\\\\\")[0-9]{15}(\\\\\":)", "$1ID$2")
The regex is (\\")[0-9]{15}(\\":) and it means:
(\\") - match and capture \" substring into Group 1
[0-9]{15} - fifteen digits
(\\":) - Group 2: a \": substring.
The $1 and $2 are placeholders holding the Group 1 and 2 values.

You should use "A word boundary" \b.
Try this.
public static void main(String[] args) {
String input = "{\\\"692950841314120\\\":"
+ "[{\\\"type\\\":\\\"12345678901234567890\\\","
+ "\\\"call_count\\\":3,"
+ "\\\"total_cputime\\\":1,"
+ "\\\"total_time\\\":5,"
+ "\\\"estimated_time_to_regain_access\\\":0}]}";
System.out.println(input.replaceAll("\\b[0-9]{15}\\b", "ID"));
}
output:
{\"ID\":[{\"type\":\"12345678901234567890\",\"call_count\":3,\"total_cputime\":1,\"total_time\":5,\"estimated_time_to_regain_access\":0}]}

Take non-nested JSON String and Build Json Nested String

This is very new to me. I am reading data from a cassandra table. This data is being extracted via a "select json * ..." query but here's the thing. The format of that json is
{"acct_ref_nb": 1401040701, "txn_pst_dt": "2020-02-26", "txn_pst_tm": 1934131, "txn_am": 15000.0 ....
Every field is in quotation marks, followed by a colon, followed by the value, then a comma and the next field, so on and so forth.
We need to reformat this and have a nested structure. We also need to change the names of the fields. So you would have something like...
"{
"ccEvent": {
"account": {
"accountReferenceNumber": 1401040701,
"transactionPostDate": "2020-02-26",
"transactionPostTime": 1934131,
"transactionAmount": 15000.0,
........
Is there a preferred library to do this? I'm literally lost even at a high level on how to do this. Thanks.

Java regex lookbehind

I want to match a string that has "json" (occurs more than 2 times) and without string "from" between two "json".
For example(what I want the string match or not):
select json,json from XXX -> Yes
select json from json XXXX -> No
select json,XXXX,json from json XXX -> Yes
Why the third is matching because I just want two "json" string occurs without "from" inside between it.
After learning regex lookbehind, I'm write the regex like this:
select.*json.*?(?<!from)json.*from.*
I'm using regex lookbehind to except the from string.
But after test, I find this regex match the string "select get_json_object from get_json_object" too.
What wrong for my regex? Any suggestion is appreciated.

You need to use tempered greedy token for achieving this. Use this regex,
\bjson\b(?:(?!\bfrom\b).)+\bjson\b
This expression (?:(?!\bfrom\b).)+ will match any text that does not contain from as a whole word inside it.
Regex Demo
For matching the whole line, you can use,
^.*\bjson\b(?:(?!\bfrom\b).)+\bjson\b.*$
Like you wanted in your post, this regex will match the line as long as it finds a string where a from does not appear between two jsons
Regex Demo with full line match
Edit:
Why OP's regex select.*json.*?(?<!from)json.*from.* didn't work as expected
Your regex starts matching with select and then .* matches as much as possible, while making sure it finds json ahead followed by some optional characters and then again expects to find a json string then .* matches again some characters then expects to find a from and finally using .* zero or more optional characters.
Let's take an example string that should match.
select json from json json XXXX
It has two json string without from in between so it should match but it doesn't, because in your regex, the order or presence of json and from is fixed which is json then again json then from which is not the case in this string.
Here is a Java code demo
List<String> list = Arrays.asList("select json,json from XXX","select json from json XXXX","select json,json from json XXX","select json from json json XXXX");
list.forEach(x -> {
System.out.println(x + " --> " + x.matches(".*\\bjson\\b(?:(?!\\bfrom\\b).)+\\bjson\\b.*"));
});
Prints,
select json,json from XXX --> true
select json from json XXXX --> false
select json,json from json XXX --> true
select json from json json XXXX --> true

Extract attributes of an string

I got to deal here with a problem, caused by a dirty design. I get a list of string and want to parse attributes out of it. Unfortunately, I can't change the source, where these String were created.
Example:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false"
Now I want to extract the attributes type, languageCode, url, ref, info and deactivated.
The problem here is the field info, whose text is not limited by quote mark. Also commas may occur in this field, so I can't use the comma at the end of the string, to find out where is ends.
Additional, those strings not always contain all attributes. type, info and deactivated are always present, the rest is optional.
Any suggestions how I can solve this problem?

One possible solution is to search for = characters in the input and then take the single word immediately before it as the field name - it seems that all your field names are single words (no whitespace). If that's the case, you can then take everything after the = until the next field name (accounting for separating ,) as the value.
This assumes that the value cannot contain =.
Edit:
As a possible way to handle embedded =, you can see if the word in front of it is one your known field names - if not, you can possibly treat the = as an embedded character rather than an operator. This, however, assumes that you have a fixed set of known fields (some of which may not always appear). This assumption may be eased if you know that the field names are case-sensitive.

Assuming that order of elements is fixed you could write solution using regex like this one
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String regex = //type, info and deactivated are always present
"type=(?<type>.*?)"
+ "(?:, languageCode=(?<languageCode>.*?))?"//optional group
+ "(?:, url=(?<url>.*?))?"//optional group
+ "(?:, ref=(?<rel>.*?))?"//optional group
+ ", info=(?<info>.*?)"
+ ", deactivated=(?<deactivated>.*?)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
if(m.matches()){
System.out.println("type -> "+m.group("type"));
System.out.println("languageCode -> "+m.group("languageCode"));
System.out.println("url -> "+m.group("url"));
System.out.println("rel -> "+m.group("rel"));
System.out.println("info -> "+m.group("info"));
System.out.println("deactivated -> "+m.group("deactivated"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
rel -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false
EDIT: Version2 regex searching for oneOfPossibleKeys=value where value ends with:
, oneOfPossibleKeys=
or has end of string after it (represented by $).
Code:
String s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars., deactivated=false";
String[] possibleKeys = {"type","languageCode","url","ref","info","deactivated"};
String keysStrRegex = String.join("|", possibleKeys);
//above will contain type|languageCode|url|ref|info|deactivated
String regex = "(?<key>\\b(?:"+keysStrRegex+")\\b)=(?<value>.*?(?=, (?:"+keysStrRegex+")=|$))";
// (?<key>\b(?:type|languageCode|url|ref|info|deactivated)\b)
// =
// (?<value>.*?(?=, (?:type|languageCode|url|ref|info|deactivated)=|$))System.out.println(regex);
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s);
while(m.find()){
System.out.println(m.group("key")+" -> "+m.group("value"));
}
Output:
type -> INFO
languageCode -> EN-GB
url -> http://www.stackoverflow.com
ref -> 1
info -> Text, that may contain all kind of chars.
deactivated -> false

You could use a regular expression, capturing all the "fixed" groups and using whatever remains for info. This should even work if the info part contains , or = characters. Here's some quick example (using Python, but that should not be a problem...).
>>> p = r"(type=[A-Z]+), (languageCode=[-A-Z]+), (url=[^,]+), (ref=\d), (info=.+?), (deactivated=(?:true|false))"
>>> s = "type=INFO, languageCode=EN-GB, url=http://www.stackoverflow.com, ref=1, info=Text, that may contain all kind of chars, even deactivated=true., deactivated=false"
>>> re.search(p, s).groups()
('type=INFO',
'languageCode=EN-GB',
'url=http://www.stackoverflow.com',
'ref=1',
'info=Text, that may contain all kind of chars, even deactivated=true.',
'deactivated=false')
If any of those elements are optional, you can put a ? after those groups, and make the comma optional. If the order can be different, then it's more complicated. In this case, instead of using one RegEx to capture everything at once, use several RegExes to capture the individual attributes and then remove (replace with '') those in the string before matching the next attribute. Finally, match info.
On further consideration, given that those attributes could have any order, it may be more promising to capture just everything spanning from one keyword to the next, regardless of its actual content, very similar to Pshemo's solution:
keys = "type|languageCode|url|ref|info|deactivated"
p = r"({0})=(.+?)(?=\, (?:{0})=|$)".format(keys)
matches = re.findall(p, s)
But this, too, might fail in some very obscure cases, e.g. if the info attribute contains something like ', ref=foo', including the comma. However, there seems to be no way around those ambiguities. If you had a string like info=in this string, ref=1, and in another, ref=2, ref=1, does it contain one ref attribute, or three, or none at all?

How to convert Invalid JSON String to JSONObject in Java?

I have a string response like below which is a invalid json as it contains "obj13=".I want to convert it to a JSONObject(JAVA) and use it.Is there any good way to convert it to JSONObject without using String split operation.
obj13={
players: [
{
name: "rocky",
place: "brazil",
age: "21",
},
{
name: "andy",
place: "New Zealand",
age: "23",
}
]
}

This is, of course, JavaScript, not JSON. If you can, I would go back to the service provider and ask for a JSON response.
If the format of the string is consistent, you could just use:
json=json.substring(json.indexof('=')+1);
and then parse the result. Note that most good parsers should have an option to allow the keywords without quotes and to allow the extraneous commas (mine does, but unfortunately for you it doesn't create JSONObject's but is of a lower level - it's designed to construct the data-structure of the caller's choice, which could be a JSONObject if that's what you wanted but you'd have to code it).
If the result may or may not have the assignment, you may want to get a bit fancier and ensure that the non-whitespace characters before the '=' are valid for a JS identifier and the first non-whitespace after it is '{'.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

parsing a string by regular expression - java

Because of the quoting rules, I'm not certain that a regular expression (even a PCRE with negative lookbehinds) can parse this consistently. What you probably want is to use a pushdown automaton, or some other parser capable of handling a context-free language.

Related

Any suggestions how to create Regex for this in java for String.replaceAll()?

Take non-nested JSON String and Build Json Nested String

Java regex lookbehind

Extract attributes of an string

How to convert Invalid JSON String to JSONObject in Java?

Categories

Resources