Reading Json data containing special symbols like curly braces, square brackets etc

I have JSON data saved as a string in a database. The structure of the JSON is fine, but it contains regular expressions and URLs that include curly braces {} or square brackets []. I could replace some of these special symbols with hex or decimal encodings and handle them with string manipulation, but I was wondering whether there is another way to handle this situation. I get the following exception for strings containing this kind of JSON data:
org.json.JSONException: Expected a ',' or '}' at character 22891 of {"wires":[{"id"....so on
Please let me know if I need to elaborate more.

Here's the first thing that comes to mind:
When putting the data in the database, try PHP's addslashes/stripslashes (assuming you are using PHP to talk to the database).

Related

flink "equal" symbol changed inside a string

I have a very strange issue with Flink.
I have JSON input with some fields defined inside a POJO.
When I look at the output, the = symbols are changed:
Original string:
"body": "/opensearch/OpenSearch?searchTerms=productType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___&count=10"
String produced by flink:
"body": "/opensearch/OpenSearch?searchTerms\u003dproductType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___\u0026count\u003d10"
Does anyone know how to resolve this issue?

They aren't, not really.
As per the JSON spec, if the byte 0x09 (ASCII tab character) appears inside a JSON string, or if the byte sequence 0x5C 0x74 (The characters \t) appears, or if the sequence 0x5C 0x75 0x30 0x30 0x30 0x39 appears (the characters \u0009), they all mean the exact same thing: There is one character there in that string, and it is the tab.
If you're having trouble with this, your JSON library is broken. Get a better one.
Most likely your JSON library is not broken, and instead one of two things is going on. Either [A] you are comparing raw JSON, or attempting to retrieve info from raw JSON using e.g. regular expressions. Stop doing that; it'll be an endless parade of such 'weirdness', because there are all sorts of ways in which different JSON strings can mean the same thing. Or [B] there is no problem here and you can just continue; you merely saw the difference and understandably assumed that it matters, or that it'll cause problems down the line.
Assuming you don't do silly things like attempting to parse JSON with regular expressions, or comparing raw JSON and assuming that says anything relevant about its content, this will not be a problem.
Specifically, \u003d and = are identical as per the JSON spec. Whatever processed this JSON decided to replace one sequence with another sequence that means the same thing, which is an allowed operation.
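To make the equivalence concrete, here is a minimal sketch of how the body of a JSON string is decoded. The class name JsonStringDecoder is made up for illustration, and this is not how any particular library implements it; in real code you would let your JSON library do the decoding.

```java
final class JsonStringDecoder {

    /** Decodes the escapes in a JSON string body (the text between the quotes). */
    static String decode(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(++i);
            switch (e) {
                case '"': case '\\': case '/': out.append(e); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Four hex digits follow the 'u' and name one UTF-16 code unit.
                    if (i + 4 >= body.length()) {
                        throw new IllegalArgumentException("truncated unicode escape");
                    }
                    out.append((char) Integer.parseInt(body.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "searchTerms=productType&count=10";
        String escaped = "searchTerms\\u003dproductType\\u0026count\\u003d10";
        System.out.println(decode(escaped).equals(decode(original))); // true
    }
}
```

Once both forms are decoded they are the same Java String, which is why the difference only shows up if you look at the raw JSON text.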

Conversion of String into Map in Java With Some Special Characters

I am trying to convert a String into Java Map but getting the following exception
com.google.gson.stream.MalformedJsonException
This is the string that I am trying to Map.
String myString = "{name=Nikhil Gupta,age=23,location=234Niwas#res=34}";
Map innerMap = new Gson().fromJson(myString, Map.class);
I understood the main problem here that because of these special characters I am getting this error.
If I remove those spaces and special characters then it will work fine.
Is there any way to do this without removing those spaces and special characters?
The approach used so far.
Wrapped those strings with special characters inside a single quote.
String myString = "{name='Nikhil Gupta',age='23',location='234Niwas#res=34'}";
But this is something that I don't want to use in a production environment as it will not work with nested structures.
Is there some genuine way to approach this in java?

I understood the main problem here that because of these special characters I am getting this error.
No, it's not because of "special characters" (whatever that means exactly).
{name=Nikhil Gupta,age=23,location=234Niwas#res=34}
The string you're trying to parse is simply not in JSON format, but in some other format that superficially resembles JSON. Your fixes by enclosing values in single quotes still don't make it JSON.
If it were valid JSON, it would look like this:
{"name":"Nikhil Gupta","age":23,"location":"234Niwas#res=34"}
Notable differences with your original:
Keys must be enclosed in double quotes
String values must be enclosed in double quotes (numeric values need not be)
Key and value must be separated by a colon : instead of an equals sign =
Ways to solve this:
Use actual JSON format; see json.org for the specification
If you can't make it real JSON and you must absolutely use the format you are using, then you need to write your own parser for this custom non-JSON format
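If you do end up writing your own parser, a minimal sketch for the flat format in the question could look like the following. The class name KeyValueParser is hypothetical, and the code assumes a single level of {key=value,...} where neither keys nor values contain a comma and keys contain no equals sign. It does not handle nesting or quoting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValueParser {

    /** Parses "{k1=v1,k2=v2}" into an ordered map; every value stays a String. */
    static Map<String, String> parse(String input) {
        String s = input.trim();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("expected text wrapped in { }");
        }
        Map<String, String> result = new LinkedHashMap<>();
        String body = s.substring(1, s.length() - 1);
        if (body.trim().isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            // Split on the FIRST '=' only, so values such as "234Niwas#res=34" survive.
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in: " + pair);
            }
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String myString = "{name=Nikhil Gupta,age=23,location=234Niwas#res=34}";
        System.out.println(parse(myString));
    }
}
```

As soon as values may contain commas or nested braces, this approach stops working, which is a good argument for producing real JSON at the source instead.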

Spring escapes already escaped string

So I have this string that I am saving in database:
{\"facebook\":\"fb.com\",\"twitter\":\"twitter.com\",\"instagram\":\"\",\"googlePlus\":\"\",\"others\":\"espn.com\"}
But when I call the GET api, I get this in JSON
{\\\"facebook\\\":\\\"fb.com\\\",\\\"twitter\\\":\\\"twitter.com\\\",\\\"instagram\\\":\\\"\\\",\\\"googlePlus\\\":\\\"\\\",\\\"others\\\":\\\"espn.com\\\"}
Why is this happening and how can I get the exact same data that is stored in database?

It is escaped again when you're retrieving the data because Spring thinks that the \ character is part of the data and not used to escape ".
You never want to store escaped characters (be it for JSON special characters, HTML characters in a text, ...), you have to store unescaped data to fix your issue. Escaping has to be done when displaying data, not when storing it.
It is a bad practice to store escaped data because of the problem you're having but also because it will take useless storage space in the database (which might not be a problem right now for you, but will be with millions of rows).
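The effect can be reproduced without Spring. The sketch below uses a made-up helper, escape, that stands in for the string escaping a JSON serializer performs (only backslash and double quote are handled here). Applying it to data that is already escaped yields the triple backslashes from the question.

```java
final class DoubleEscapeDemo {

    /** Escapes backslashes and double quotes the way a JSON string requires. */
    static String escape(String raw) {
        // Backslashes first, otherwise the ones added for quotes would be doubled too.
        return raw.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String raw = "{\"facebook\":\"fb.com\"}";
        String stored = escape(raw);      // what was saved: already escaped once
        String served = escape(stored);   // what the serializer sends: escaped again
        System.out.println(stored);
        System.out.println(served);
    }
}
```

Storing the raw value and letting the serializer escape it exactly once avoids the problem.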

You could also use the StringEscapeUtils.unescapeJson(String input) method from Apache Commons Lang for JSON data. See https://commons.apache.org/proper/commons-lang/javadocs/api-3.4/org/apache/commons/lang3/StringEscapeUtils.html

After getting the data, save it in a String and then replace every run of three backslashes with a single one:
String newJava = str.replace("\\\\\\", "\\");

Convert JSON unicode characters to unicode value

Basically, when JSON takes in a string it will convert things like ' or & to their Unicode value. I'm trying to store the JSON value as it goes out, and when it comes back in later, compare it with the JSON value. However, when I send out something like "Let's Party" it comes back "Let\u0027s Party"
So basically, I'm looking to convert all JSON Unicode to their specific Unicode values before storing it or sending it out.

I'm looking to convert all JSON Unicode to their specific Unicode values
I doubt you want to convert all characters to \u-escapes. In that case Let's party would become \u004c\u0065\u0074\u0027\u0073\u0020\u0050\u0061\u0072\u0074\u0079.
There is nothing special about the apostrophe or ampersand that means it has to be encoded in JSON, although some encoders do so anyway (it can have advantages when using JSON inside another wrapper context where those characters are special).
It looks like you want to match the exact output that another encoder produces. To do that you'd have to determine the full set of characters that that encoder decides to escape, and either alter or configure your own JSON encoder to match that. For some characters that could be as simple as doing a string replace: for example as ' may only legitimately appear in JSON as part of a string literal it would be safe to replace with \u0027 after the encoding. This is ugly and fragile though.
It is generally a bad idea to rely on the exact encoding choices of a JSON serialiser. The JSON values {"a": "'", "b": 0.0} and {"b": 0, "a": "\u0027"} represent the same data and should generally be treated as equal. For comparison purposes it is usually better to parse the JSON and check the content piece-by-piece, or re-serialise using your own encoder and compare that output (assuming your JSON encoder is deterministic).

Regular expression for splitting JSON text in lines after symbols

I am trying to use a regular expression to have this kind of string
{
"key1"
:
value1
,
"key2"
:
"value2"
,
"arrayKey"
:
[
{
"keyA"
:
valueA
,
"keyB"
:
"valueB"
,
"keyC"
:
[
0
,
1
,
2
]
}
]
}
from
JSONObject.toString()
that is one long line of text in my Android Java app
{"key1":"value1","key2":"value2","arrayKey":[{"keyA":"valueA","keyB":"valueB","keyC":[0,1,2]}]}
I found this regular expression for finding all commas.
/(,)(?=(?:[^"]|"[^"]*")*$)/
Now I need to know:
0- if this is reliable, that is, does what they say.
1- if this is works also with commas inside double-quotes.
2- if this takes into account escaped double-quotes.
3- if I have to take into account also single quotes, as this file is produced by my app but occasionally it could be manually edited by the user.
5- It has to be used with the multi-line flag to work with multi-line text.
6- It has to work with replaceAll().
The resulting regular expression will be used for replacing each symbol with a two-char sequence made of the symbol itself plus the \n character.
The resulting text has to be still JSON text.
Subsequent replace actions will take place also for the other symbols
: [ ] { }
and other symbols that can be found in JSON files outside the alphanumeric sequences between quotes (I do not know if the mentioned symbols are the only ones).

It's not that simple, but if you want to do it, you need to match the characters ([, {, ", ', :) and replace each one with itself followed by a newline character.
For example:
[ should get replaced with [\n
The answer to your question is yes: it is reliable and easy to implement, since a single line of code does it all. That's what regex is made for.

0- if this is reliable, that is, does what they say.
Let's break down the expression a little:
(,) is a capturing group that matches a single comma
(?=...) would mean a positive lookahead meaning the comma would need to be followed by a match of that group's content
(?:...)* would be a non-capturing group that can occur 0 to many times
[^"]|"[^"]*" would match either any character except a double quote ([^"]) or (|) a pair of double quotes with any character in between except other double quotes ("[^"]*")
As you can see especially the last part could make it unreliable if there are escaped double quotes in a text value, so the answer would be "this is reliable if the input is simple enough".
1- if this is works also with commas inside double-quotes.
If the double quote pairs are correctly identified any commas in between would be ignored.
2- if this takes into account escaped double-quotes.
Here's one of the major problems: escaped double quotes would need to be handled. This can get quite complex if you want to handle arbitrary cases, especially if the texts could contain commas as well.
3- if I have to take into account also single quotes, as this file is produced by my app but occasionally it could be manually edited by the user.
Single quotes aren't allowed by the JSON specification but many parsers support them because humans tend to use them anyway. Thus you might need to take them into account, and that makes no. 2 even more complex because now there might be an unescaped double quote inside a single-quoted text.
5- It has to be used with the multi-line flag to work with multi-line text.
I'm not entirely sure about that but adding the multi-line flag shouldn't hurt. You could add it to the expression itself though, i.e. by prepending (?m).
6- It has to work with replaceAll().
In its current form the regex would work with String#replaceAll() because it only matches the comma - the lookahead is used to determine a match but won't result in the wrong parts being replaced. The matches themselves might not be correct though, as described above.
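The escaped-quote problem from point 2 can be reproduced with a small sketch. The class and method names are made up for illustration; the pattern is the one from the question, written as a Java string literal.

```java
final class CommaRegexDemo {

    // The comma-matching expression from the question, with Java string escaping.
    static final String COMMA = ",(?=(?:[^\"]|\"[^\"]*\")*$)";

    static String breakAfterCommas(String json) {
        return json.replaceAll(COMMA, ",\n");
    }

    public static void main(String[] args) {
        // A comma inside a quoted value is left alone; the structural comma gets a line break.
        String simple = "{\"a\":\"x,y\",\"b\":1}";
        // An escaped quote later in the text throws off the quote pairing,
        // so the structural comma after 1 is NOT matched and nothing changes.
        String tricky = "{\"a\":1,\"b\":\"x\\\"y\"}";
        System.out.println(breakAfterCommas(simple));
        System.out.println(breakAfterCommas(tricky));
    }
}
```

The second input comes back unchanged because the lookahead sees an odd number of double quotes after the comma and treats it as being inside a string.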
That being said, you should note that JSON is not a regular language and only regular languages are a perfect fit for regular expressions.
Thus I'd recommend using a proper JSON parser (there are quite a lot out there) to parse the JSON into POJOs (might just be a bunch of generic JsonObject and JsonArray instances) and reformat that according to your needs.
Here's an example of how Jackson could be used to accomplish that: https://kodejava.org/how-to-pretty-print-json-string-using-jackson/
In fact, since you're already using JSONObject.toString() you probably don't need the parser itself but just a proper formatter (if you want/need to roll your own you could have a look at the org.json.JSONObject sources ).
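If you do roll your own formatter, a minimal sketch that puts every structural symbol on its own line, as in the desired output from the question, could scan the text once while tracking whether it is inside a string. The class name JsonLineSplitter is hypothetical. The code assumes the input is already valid JSON (for example the output of JSONObject.toString()), handles only double-quoted strings, and does no validation.

```java
final class JsonLineSplitter {

    /** Puts each of { } [ ] , : on its own line, leaving string contents untouched. */
    static String split(String json) {
        StringBuilder out = new StringBuilder();
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;          // this char was escaped, whatever it is
                } else if (c == '\\') {
                    escaped = true;           // next char is escaped
                } else if (c == '"') {
                    inString = false;         // unescaped quote closes the string
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if ("{}[],:".indexOf(c) >= 0) {
                endLine(out);
                out.append(c).append('\n');
            } else if (!Character.isWhitespace(c)) {
                out.append(c);                // numbers, true, false, null
            }
        }
        int len = out.length();
        if (len > 0 && out.charAt(len - 1) == '\n') {
            out.setLength(len - 1);           // drop the trailing newline
        }
        return out.toString();
    }

    private static void endLine(StringBuilder out) {
        if (out.length() > 0 && out.charAt(out.length() - 1) != '\n') {
            out.append('\n');
        }
    }

    public static void main(String[] args) {
        System.out.println(split("{\"a\":\"x,\\\"y\\\"\",\"b\":[0,1]}"));
    }
}
```

Because newlines are only inserted outside strings, the result is still valid JSON, and commas, colons, or escaped quotes inside values are left as they are.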
