JSON broken when double quotes come inside the key/value - Java

Sample data:
{"630":{"TotalLength":"33-3/8" - 36-3/4""},"631":{"Length":"34 37 7/8"}}
We are facing a double-quote issue in a JSON response. How can we replace the double quotes that appear inside a key or value with \"? Java is the development platform.

This answer assumes that you are not in control of creating this JSON-like string. If you do control that part, you should escape the quotes properly there instead.
In this case, systematic parsing is not an option because the string is not valid JSON yet. All I can suggest is to go through the various strings and look for a pattern to which you can apply some logic, escaping every " that prevents the string from being valid JSON.
Here is a possible way to start:
Every " that is needed for the string to be valid JSON is adjacent to one or more of the characters {, :, ,, and }, with or without spaces between the " and those JSON characters.
So, scan the JSON-like string in Java and look at each ". If it sits next to any of the above characters (with or without spaces in between), leave it as it is. If not, replace that " with \".
Note that this method may or may not work, depending on the data in question. What I mean to convey is an approach that you may find useful if there is absolutely no way for the string to be escaped during its creation, and if these strings follow a strict pattern with respect to the unescaped "s.
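The heuristic described above could be sketched roughly as follows. This is only an illustration under the stated assumption that every legitimate " sits next to {, :, , or } (ignoring spaces); the class and method names (QuoteRepair, escapeInnerQuotes, isStructural) are made up for this example, and data such as a value containing `",` would defeat it.

```java
class QuoteRepair {
    // Characters that, per the heuristic, mark a neighbouring quote as structural.
    private static final String JSON_CHARS = "{:,}";

    // Hypothetical helper: escapes every quote that is not adjacent
    // (ignoring whitespace) to one of the JSON characters above.
    static String escapeInnerQuotes(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean alreadyEscaped = i > 0 && s.charAt(i - 1) == '\\';
            if (c == '"' && !alreadyEscaped && !isStructural(s, i)) {
                out.append('\\'); // stray quote inside a key or value
            }
            out.append(c);
        }
        return out.toString();
    }

    // True if the nearest non-whitespace neighbour on either side is a JSON character.
    static boolean isStructural(String s, int i) {
        int prev = i - 1;
        while (prev >= 0 && Character.isWhitespace(s.charAt(prev))) prev--;
        int next = i + 1;
        while (next < s.length() && Character.isWhitespace(s.charAt(next))) next++;
        boolean before = prev >= 0 && JSON_CHARS.indexOf(s.charAt(prev)) >= 0;
        boolean after = next < s.length() && JSON_CHARS.indexOf(s.charAt(next)) >= 0;
        return before || after;
    }

    public static void main(String[] args) {
        String broken = "{\"630\":{\"TotalLength\":\"33-3/8\" - 36-3/4\"\"},\"631\":{\"Length\":\"34 37 7/8\"}}";
        System.out.println(escapeInnerQuotes(broken));
    }
}
```

For the sample data in the question, the two inch marks inside the TotalLength value are the only quotes that get a backslash; the result can then be handed to a regular JSON parser.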

Related

Conversion of String into Map in Java With Some Special Characters

I am trying to convert a String into a Java Map, but I am getting the following exception:
com.google.gson.stream.MalformedJsonException
This is the string that I am trying to map:
String myString = "{name=Nikhil Gupta,age=23,location=234Niwas#res=34}";
Map innerMap = new Gson().fromJson(myString, Map.class);
I understood the main problem here that because of these special characters I am getting this error.
If I remove those spaces and special characters then it will work fine.
Is there any way to do this without removing those spaces and special characters?
The approach I have used so far:
I wrapped the strings containing special characters in single quotes.
String myString = "{name='Nikhil Gupta',age='23',location='234Niwas#res=34'}";
But this is something that I don't want to use in a production environment, as it will not work with nested structures.
Is there a genuine way to approach this in Java?
I understood the main problem here that because of these special characters I am getting this error.
No, it's not because of "special characters" (whatever that means exactly).
{name=Nikhil Gupta,age=23,location=234Niwas#res=34}
The string you're trying to parse is simply not in JSON format, but in some other format that superficially resembles JSON. Your fixes by enclosing values in single quotes still don't make it JSON.
If it were valid JSON, it would look like this:
{"name":"Nikhil Gupta","age":23,"location":"234Niwas#res=34"}
Notable differences with your original:
Keys must be enclosed in double quotes
String values must be enclosed in double quotes (numeric values need not be)
Key and value must be separated by a colon : instead of an equals sign =
Ways to solve this:
Use actual JSON format; see json.org for the specification
If you can't make it real JSON and you must absolutely use the format you are using, then you need to write your own parser for this custom non-JSON format
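As a rough idea of what a hand-written parser for this custom format might look like, here is a minimal sketch. It assumes a flat structure, that a comma always separates pairs, and that the first = in each pair separates key from value; the names KeyValueParser and parse are invented for this example. Nested structures or values containing commas would need a more careful parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueParser {
    // Hypothetical parser for the flat "{key=value,key=value}" format from the question.
    static Map<String, String> parse(String input) {
        String body = input.trim();
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> result = new LinkedHashMap<>();
        if (body.isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            // Split on the FIRST '=' only, so values may themselves contain '='.
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + pair);
            }
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("{name=Nikhil Gupta,age=23,location=234Niwas#res=34}"));
    }
}
```

All values come back as strings here; converting age to a number would be a separate step.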

Escape inch symbol while splitting from comma

I have a CSV splitter that uses the following regex to split a string on commas.
String[] splitData = splitCSV.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
It works so far for strings like 123, "foo", "bar", "no, split, here", but when it encounters an inch sign (") as in the following, it cannot do the splitting.
"123, 1.0" xyz"
I need it to split into 123 and 1.0" xyz
Hope someone can provide a solution for this. Thank you.
A couple of points here:
You should be using an existing CSV processing library, not creating your own with a regex. There are many available for Java, see this question as a starting point. This is a solved problem; there's no reason to reinvent it.
The scenario you mention would be invalid* data. A quote should be escaped within a string, usually by using two quotes together. Having one unescaped quote makes the file invalid; and furthermore there is usually no reliable way to tell what the file "should" be once you have these sorts of errors. What to do about it:
If the file is within your control, correct it. Use a standard escape format for quotes within a string.
If the file is not within your control, you should handle errors separately rather than including this in your core processing. Either preprocess the file looking for errors, or use error handling available in a CSV library to do something with the lines that come back as having an incorrect format. If the errors are limited to a predictable issue that you know ahead of time, you might be able to correct them. But in most cases errors like this lead you to have to reject the lines.
*Technically there is no CSV standard, so anything goes. But this would be a data error in any reasonable format. And in the real world this almost always occurs because someone didn't think the file format through, not because they intentionally planned it this way.
What you have here is an unusual dialect of CSV.
Although there is no formalised standard for CSV, there are broadly two approaches to quotes:
Quotes are not special. That is: 7" single, 12" album is two items: 7" single and 12" album. In this dialect, items containing , are problematic.
Quotes are special. That is: "you, me","me, you" is two items: you, me and me, you. In this dialect, you can put quotes around an entry in order to have a , within an item. However, it makes items containing " problematic, as you have found.
The typical answer to the " problem in the second approach, is to escape quotes. So the item 7" single would appear in the CSV as "7\" single". This of course means that \ becomes a problem, but that's easily solved the same way. AC\DC 7" single appears in the CSV as "AC\\DC 7\" single".
If you can adopt one of these conventional approaches, then do so. Then you can either use an existing CSV library, or roll your own. Although a regex can consume these formats, my opinion is that it's not the clearest way to write code to consume CSV: I've found that a more explicit state machine (e.g. a switch (state) statement) is nice and clear.
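To show what I mean by an explicit state machine, here is a minimal sketch for the backslash-escaped dialect described above. It handles a single line only, and the names (CsvLineSplitter, State, split) are made up for this example; a real CSV library covers many more cases.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSplitter {
    private enum State { UNQUOTED, QUOTED, ESCAPED }

    // Splits one line on commas, honouring "..." quoting and \ escapes inside quotes.
    static List<String> split(String line) {
        List<String> items = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.UNQUOTED;
        for (char c : line.toCharArray()) {
            switch (state) {
                case UNQUOTED:
                    if (c == ',') {            // item separator
                        items.add(current.toString());
                        current.setLength(0);
                    } else if (c == '"') {     // opening quote, not part of the item
                        state = State.QUOTED;
                    } else {
                        current.append(c);
                    }
                    break;
                case QUOTED:
                    if (c == '\\') {           // next character is taken literally
                        state = State.ESCAPED;
                    } else if (c == '"') {     // closing quote
                        state = State.UNQUOTED;
                    } else {
                        current.append(c);
                    }
                    break;
                case ESCAPED:
                    current.append(c);
                    state = State.QUOTED;
                    break;
            }
        }
        items.add(current.toString());
        return items;
    }

    public static void main(String[] args) {
        // The CSV text is: "AC\\DC 7\" single","you, me"
        System.out.println(split("\"AC\\\\DC 7\\\" single\",\"you, me\""));
    }
}
```

Each state only has to answer one question about the current character, which is what keeps this easier to read and extend than a lookahead regex.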
If you can't change your input format, the puzzle you have to solve is, when you encounter a ", is it a metacharacter (part of a pair of quotes surrounding an item) or is it a real character that's part of the item?
As owner of the format, it's up to you to decide what the rule is. Perhaps a " should only be considered a metacharacter if it's next to a ,. But even that causes problems if you allow a mixture of quoted and unquoted items:
"A Town Called Malice", The Jam, 7", £6.99
So, you must come up with your own rules, that work in your domain, and write explicit code to handle that situation. One approach is to pre-process the input into canonical CSV so that it's again suitable for a conventional CSV parser.

Java String#contains() using String#matches() with escape character

I need a simple way to implement the contains function using matches. I believe this is my starting point:
xxx.matches("'.*yyy.*'");
But I need to make it a universal method and pre-process whatever I search for to be accepted by matches! This must be done using only the escape '\' character!
Imagine a string SEARCH_FOR that can contain some special characters that must be "regex escaped"...
String SEARCH_FOR="*.\\";
xxx.matches("'.*" + SEARCH_FOR + ".*'");
Are there any catches? Special situations? Any other "special chars" that should be taken into account?
Are you looking for Pattern.quote(String) ?
This escapes special characters for you.
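A minimal sketch of a contains-style check built on String#matches and Pattern.quote might look like this. The class and method names (RegexContains, containsViaMatches) are made up for illustration, and the (?s) flag is my own addition so that . also matches line breaks in the searched text.

```java
import java.util.regex.Pattern;

class RegexContains {
    // Emulates String#contains with String#matches by quoting the search term.
    static boolean containsViaMatches(String text, String searchFor) {
        // matches() must consume the whole input, hence the .* on both sides.
        return text.matches("(?s).*" + Pattern.quote(searchFor) + ".*");
    }

    public static void main(String[] args) {
        String searchFor = "*.\\"; // the three characters * . \
        System.out.println(containsViaMatches("abc*.\\def", searchFor));
        System.out.println(containsViaMatches("abcdef", searchFor));
    }
}
```

Pattern.quote wraps the term in \Q...\E, so none of its characters are interpreted as regex syntax.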
EDIT:
After reading the comments, I really hope you try Pattern.quote(yourString.toLowerCase()) as it sounds like you've been using Pattern.quote(yourString).toLowerCase(). If DataNucleus is applying the regex then there should be no problems with using the \Q and \E escape sequence.
Since you have really asked for it, ".\\".replaceAll("(\\.|\\$|\\+|\\*|\\\\)", "\\\\$1") outputs \.\\
This will escape .'s, $'s, +'s, *'s and \'s. Note that the security of this is now entirely up to you. If you don't escape something you needed to, or you escape it incorrectly, you will either allow people to use regex inside the search term when you weren't expecting it, or it won't return the results that you were expecting.

Input Sanitizing to not break JSON syntax

So, in a nutshell, I'm trying to create a regex that I can use in a Java program that is about to submit a JSON object to my PHP server.
myString.replaceAll(myRegexString,"");
My problem is that I am absolutely no good with regex, and on top of that I need to escape the characters properly once because the regex is stored in a Java string, and then again inside the regex itself. Good lordy.
What I came up with was this:
String myRegexString = "[\"',{}[]:;]"
The first backslash was to escape outer quotes to get a " in there. And then it struck me that {} and [] are also regex commands. Would I escape those as well? Like:
String myRegexString = "[\"',\{\}\[\]:;]"
Thanks in advance. In case it wasn't clear from the examples above, the only characters I really care about at this moment are:
" { } [ ] , and also ; : ' for general SQL injection protection.
UPDATE:
This is the final regex:
[\\Q\"',{}[\]:;\\E] for anyone else curious. Thanks Amit!
Why don't you use an actual JSON encoding API/framework? What you're doing is not sanitizing. What you're doing is corrupting the data. If my name is O'Reilly, I want it to be spelled O'Reilly, not OReilly. If I send a message containing [ or {, I want these to be in the messages. Use a framework or API that escapes those characters when needed rather than removing them blindly.
Googling for JSON Java will lead you to many APIs and frameworks.
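To illustrate the difference between escaping and removing, here is a hand-rolled sketch of what a JSON encoder does to a string value. It is meant only to show the idea; in real code a JSON library does this for you, and the names JsonStringEscaper and quote are invented for this example.

```java
class JsonStringEscaper {
    // Returns the value as a JSON string literal, escaping instead of deleting characters.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become four-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("O'Reilly said: \"use [a] {library}\""));
    }
}
```

Note that ', [, ], {, }, :, ; and , pass through untouched: inside a JSON string they need no escaping at all, so O'Reilly stays O'Reilly.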
Try something like
String myRegexString = "[\\Q\"',{}[]:;\\E]";
Now the characters between \Q and \E are treated as literal characters.

Regular expression for splitting JSON text in lines after symbols

I am trying to use a regular expression to have this kind of string
{
"key1"
:
value1
,
"key2"
:
"value2"
,
"arrayKey"
:
[
{
"keyA"
:
valueA
,
"keyB"
:
"valueB"
,
"keyC"
:
[
0
,
1
,
2
]
}
]
}
from
JSONObject.toString()
that is one long line of text in my Android Java app
{"key1":"value1","key2":"value2","arrayKey":[{"keyA":"valueA","keyB":"valueB","keyC":[0,1,2]}]}
I found this regular expression for finding all commas.
/(,)(?=(?:[^"]|"[^"]*")*$)/
Now I need to know:
0- if this is reliable, that is, does what they say.
1- if this is works also with commas inside double-quotes.
2- if this takes into account escaped double-quotes.
3- if I have to take into account also single quotes, as this file is produced by my app but occasionally it could be manually edited by the user.
5- It has to be used with the multi-line flag to work with multi-line text.
6- It has to work with replaceAll().
The resulting regular expression will be used for replacing each symbol with a two-char sequence made of the symbol itself plus the \n character.
The resulting text has to be still JSON text.
Subsequent replace actions will take place also for the other symbols
: [ ] { }
and other symbols that can be found in JSON files outside the alphanumeric sequences between quotes (I do not know if the mentioned symbols are the only ones).
It's not that simple, but yes, if you want to do it, you need to match the characters ([, {, ", ', :) and replace each one with itself followed by a newline character.
like:
[ should get replaced with [\n
The answer to your question is yes: it is very reliable and good to implement, since a single line of code does it all. That's what regex is made for.
0- if this is reliable, that is, does what they say.
Let's break down the expression a little:
(,) is a capturing group that matches a single comma
(?=...) would mean a positive lookahead meaning the comma would need to be followed by a match of that group's content
(?:...)* would be a non-capturing group that can occur 0 to many times
[^"]|"[^"]*" would match either any character except a double quote ([^"]) or (|) a pair of double quotes with any character in between except other double quotes ("[^"]*")
As you can see, the last part in particular could make it unreliable if there are escaped double quotes in a text value, so the answer would be "this is reliable if the input is simple enough".
1- if this is works also with commas inside double-quotes.
If the double quote pairs are correctly identified any commas in between would be ignored.
2- if this takes into account escaped double-quotes.
Here's one of the major problems: escaped double quotes would need to be handled. This can get quite complex if you want to handle arbitrary cases, especially if the texts could contain commas as well.
3- if I have to take into account also single quotes, as this file is produced by my app but occasionally it could be manually edited by the user.
Single quotes aren't allowed by the JSON specification, but many parsers support them because humans tend to use them anyway. Thus you might need to take them into account, and that makes no. 2 even more complex because now there might be an unescaped double quote in a single-quoted text.
5- It has to be used with the multi-line flag to work with multi-line text.
I'm not entirely sure about that, but adding the multi-line flag shouldn't hurt. You could add it to the expression itself though, i.e. by prepending (?m).
6- It has to work with replaceAll().
In its current form the regex would work with String#replaceAll() because it only matches the comma - the lookahead is used to determine a match but won't result in the wrong parts being replaced. The matches themselves might not be correct though, as described above.
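The behaviour described in points 1, 2 and 6 can be checked with a small experiment. This is only a sketch: the class and method names (CommaRegexDemo, breakAfterCommas) are made up, and the pattern is the one from the question without the surrounding slashes.

```java
class CommaRegexDemo {
    // The expression from the question, written as a Java string literal.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]|\"[^\"]*\")*$)";

    // Appends a newline after every comma the regex considers to be outside quotes.
    static String breakAfterCommas(String json) {
        return json.replaceAll(COMMA_OUTSIDE_QUOTES, ",\n");
    }

    public static void main(String[] args) {
        // Simple input: the comma inside "x,y" is left alone, the structural one is matched.
        System.out.println(breakAfterCommas("{\"a\":\"x,y\",\"b\":1}"));
        // Escaped quote later in the text: the structural comma after 1 is NOT matched,
        // because the quotes that follow it no longer pair up from the regex's point of view.
        System.out.println(breakAfterCommas("{\"k\":1,\"a\":\"x\\\"y\"}"));
    }
}
```

The second call returns its input unchanged, which is the "reliable only if the input is simple enough" caveat in practice.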
That being said, you should note that JSON is not a regular language and only regular languages are a perfect fit for regular expressions.
Thus I'd recommend using a proper JSON parser (there are quite a lot out there) to parse the JSON into POJOs (might just be a bunch of generic JsonObject and JsonArray instances) and reformat that according to your needs.
Here's an example of how Jackson could be used to accomplish that: https://kodejava.org/how-to-pretty-print-json-string-using-jackson/
In fact, since you're already using JSONObject.toString() you probably don't need the parser itself but just a proper formatter (if you want/need to roll your own you could have a look at the org.json.JSONObject sources).
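If you do roll your own, a small scanner that tracks whether it is inside a string avoids the regex problems discussed above. The following is a minimal sketch, not a full formatter: it puts every structural symbol on its own line, as in the question's example, and the names JsonTokenLines and oneTokenPerLine are invented here. If ordinary indentation is enough, org.json's JSONObject.toString(int indentFactor) may already do the job.

```java
class JsonTokenLines {
    // Puts each of { } [ ] : , on its own line, leaving string contents untouched.
    static String oneTokenPerLine(String json) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;          // character after a backslash is literal
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;         // closing quote
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if ("{}[]:,".indexOf(c) >= 0) {
                // Start a new line unless we are already at the start of one.
                if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
                    out.append('\n');
                }
                out.append(c).append('\n');
            } else if (!Character.isWhitespace(c)) {
                out.append(c);                // numbers, true, false, null
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(oneTokenPerLine("{\"a\":\"x,y\",\"b\":[0,1]}"));
    }
}
```

Because only whitespace between tokens is added, the output is still JSON, and commas, colons or escaped quotes inside string values are never touched.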
