Convert JSON unicode characters to unicode value - java

Basically, when JSON takes in a string it will convert things like ' or & to their Unicode value. I'm trying to store the JSON value as it goes out, and when it comes back in later, compare it with the JSON value. However, when I send out something like "Let's Party" it comes back "Let\u0027s Party"
So basically, I'm looking to convert all JSON Unicode to their specific Unicode values before storing it or sending it out.

I'm looking to convert all JSON Unicode to their specific Unicode values
I doubt you want to convert all characters to \u-escapes. In that case Let's party would become \u004c\u0065\u0074\u0027\u0073\u0020\u0050\u0061\u0072\u0074\u0079.
There is nothing special about the apostrophe or ampersand that means it has to be encoded in JSON, although some encoders do so anyway (this can help when the JSON is embedded in another context where those characters are special).
It looks like you want to match the exact output that another encoder produces. To do that you'd have to determine the full set of characters that that encoder decides to escape, and either alter or configure your own JSON encoder to match that. For some characters that could be as simple as doing a string replace: for example as ' may only legitimately appear in JSON as part of a string literal it would be safe to replace with \u0027 after the encoding. This is ugly and fragile though.
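A minimal sketch of that post-processing follows. It assumes the encoder you want to match escapes ', &, =, < and > as lowercase \u-escapes (Gson's default HTML-safe escaping, for instance, produces \u0027, \u0026 and \u003d); check which characters your target encoder really escapes. The replacement is only safe because none of these characters can appear in valid JSON outside a string literal. The class and method names are hypothetical.

```java
public class EscapeLikeOtherEncoder {
    // Post-process already-encoded JSON text so it escapes the same characters as
    // another encoder. ASSUMPTION: the target encoder escapes exactly ' & = < >.
    // Safe because in valid JSON these characters only ever occur inside string
    // literals, and JSON has no escape sequence that uses them.
    static String escapeAfterEncoding(String json) {
        return json.replace("'", "\\u0027")
                   .replace("&", "\\u0026")
                   .replace("=", "\\u003d")
                   .replace("<", "\\u003c")
                   .replace(">", "\\u003e");
    }

    public static void main(String[] args) {
        String encoded = "{\"title\":\"Let's Party\"}";
        // Prints the JSON with the apostrophe replaced by a backslash-u escape.
        System.out.println(escapeAfterEncoding(encoded));
    }
}
```

As the answer says, this is fragile: if the other encoder ever changes which characters it escapes, the stored and received text will stop matching again.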
It is generally a bad idea to rely on the exact encoding choices of a JSON serialiser. The JSON values {"a": "'", "b": 0.0} and {"b": 0, "a": "\u0027"} represent the same data and should generally be treated as equal. For comparison purposes it is usually better to parse the JSON and check the content piece-by-piece, or to re-serialise it using your own encoder and compare that output (assuming your JSON encoder is deterministic).
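To illustrate comparing content rather than raw text, here is a minimal sketch of a decoder for a single JSON string literal, written without any library. It is not a full JSON parser (no objects, arrays or numbers) and does only light validation; in practice you would use your JSON library's parser. The names are hypothetical.

```java
public class JsonStringCompare {
    // Decode one JSON string literal (including its surrounding double quotes)
    // into a Java String, resolving every escape the JSON spec defines.
    // Different spellings of the same character therefore decode identically.
    static String decodeJsonString(String literal) {
        if (literal.length() < 2 || literal.charAt(0) != '"'
                || literal.charAt(literal.length() - 1) != '"') {
            throw new IllegalArgumentException("not a JSON string literal: " + literal);
        }
        StringBuilder out = new StringBuilder();
        int end = literal.length() - 1; // index of the closing quote
        for (int i = 1; i < end; i++) {
            char c = literal.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= end) throw new IllegalArgumentException("dangling backslash");
            char e = literal.charAt(i);
            switch (e) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u':
                    // Four hex digits follow; they must end before the closing quote.
                    if (i + 4 >= end) throw new IllegalArgumentException("truncated unicode escape");
                    out.append((char) Integer.parseInt(literal.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("invalid escape: \\" + e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "\"Let's Party\"";
        String received = "\"Let\\u0027s Party\"";
        System.out.println(stored.equals(received));                                     // false
        System.out.println(decodeJsonString(stored).equals(decodeJsonString(received))); // true
    }
}
```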

Related

flink "equal" symbol changed inside a string

I have a very strange issue with Flink.
I have a JSON input with some fields defined inside a POJO.
When I look at the output, the = symbols have been changed:
original string:
"body": "/opensearch/OpenSearch?searchTerms=productType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___&count=10"
String produced by flink:
"body": "/opensearch/OpenSearch?searchTerms\u003dproductType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___\u0026count\u003d10"
Does anyone know how to resolve this issue?
They aren't changed, not really.
As per the JSON spec, if the byte sequence 0x5C 0x74 (the characters \t) appears inside a JSON string, or the sequence 0x5C 0x75 0x30 0x30 0x30 0x39 (the characters \u0009) appears, both mean exactly the same thing: there is one character at that position in the string, and it is the tab (0x09). (A raw 0x09 byte is strictly not allowed inside a JSON string, because control characters must be escaped, although lenient parsers often accept it with the same meaning.)
If you're having trouble with this, your JSON library is broken. Get a better one.
Most likely, though, your JSON library is not broken, and instead either [A] you are comparing raw JSON, or trying to extract information from raw JSON using e.g. regular expressions. Stop doing that: there are all sorts of ways for different JSON strings to mean the same thing, so it will be an endless parade of such 'weirdness', because you're not supposed to do this. Or [B] there is no problem here and you can just continue; you merely saw the difference and understandably assumed that it matters, or that it will cause problems down the line.
Assuming you don't do silly things like parsing JSON with regular expressions, or comparing raw JSON and assuming that tells you anything relevant about its content, this will not be a problem.
Specifically, \u003d and = are identical as per the JSON spec. Whatever processed this JSON decided to replace one sequence with another sequence that means the same thing, which is an allowed operation.

Conversion of String into Map in Java With Some Special Characters

I am trying to convert a String into a Java Map but I am getting the following exception:
com.google.gson.stream.MalformedJsonException
This is the string that I am trying to convert to a Map.
String myString = "{name=Nikhil Gupta,age=23,location=234Niwas#res=34}"
Map innerMap = new Gson().fromJson(myString,Map.class);
I understand the main problem here: I am getting this error because of these special characters.
If I remove those spaces and special characters, it works fine.
Is there any way to do this without removing those spaces and special characters?
The approach I have used so far:
Wrapping the values that contain special characters in single quotes.
String myString = "{name='Nikhil Gupta',age='23',location='234Niwas#res=34'}"
But this is something I don't want to use in a production environment, as it will not work with nested structures.
Is there a proper way to approach this in Java?
I understand the main problem here: I am getting this error because of these special characters.
No, it's not because of "special characters" (whatever that means exactly).
{name=Nikhil Gupta,age=23,location=234Niwas#res=34}
The string you're trying to parse is simply not in JSON format, but in some other format that superficially resembles JSON. Your fix of enclosing values in single quotes still doesn't make it JSON.
If it were valid JSON, it would look like this:
{"name":"Nikhil Gupta","age":23,"location":"234Niwas#res=34"}
Notable differences with your original:
Keys must be enclosed in double quotes
String values must be enclosed in double quotes (numeric values do not)
Key and value must be separated by a colon : instead of an equals sign =
Ways to solve this:
Use actual JSON format; see json.org for the specification
If you can't make it real JSON and you must absolutely use the format you are using, then you need to write your own parser for this custom non-JSON format
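If you go the custom-parser route, a minimal sketch for the flat format in the question might look like this. It is hypothetical and assumes no nesting, no commas inside values, and no = inside keys (a value may contain =, since only the first = in each entry separates key from value). Every value is kept as a String. It does not address nested structures; for those, switching to real JSON is the better option.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlatMapParser {
    // Hypothetical parser for "{key=value,key=value}" with no nesting.
    // Keys and values are trimmed; entry order is preserved.
    static Map<String, String> parseFlat(String s) {
        String body = s.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("expected {...}: " + s);
        }
        body = body.substring(1, body.length() - 1);
        Map<String, String> result = new LinkedHashMap<>();
        if (body.trim().isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            int eq = entry.indexOf('='); // first '=' separates key from value
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in: " + entry);
            }
            result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = parseFlat("{name=Nikhil Gupta,age=23,location=234Niwas#res=34}");
        System.out.println(m); // {name=Nikhil Gupta, age=23, location=234Niwas#res=34}
    }
}
```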

Java equivalent of Javascript: JSON.stringify("long_complex_string")

TL;DR: I have a String variable in Java (not a JSON string, just a string) and I want to encode it to JSON. How? Please read the rest to make sure you've understood the question.
// This is javascript, I use it for this example because I know it better than java
// All of the following strings are valid json strings
const validJsonStrings = [
"{\"key\": \"value\"}",
"true",
"[]",
"\"long_complex_string\""
];
// Each of them can be parsed/decoded as you can easily test with:
console.log(validJsonStrings.map(s => JSON.parse(s)));
I'm interested in the 4th one, "\"long_complex_string\"", which decodes to "long_complex_string".
Now, back to Java. Let's say I have this variable:
String myString = "long_complex_string";
This is not json, it's just a string, it could be very long and could contain many special characters including double quotes. I want to encode this string to json, I want it to be exactly like the 4th string of the previous javascript example. I've seen many examples where objects or arrays are serialized to json, but I'm having trouble finding one that accepts a single string as input.
jsonObj.get("key") will retrieve only the stored value.
Note that \ is an escape character in Java string literals. To get the desired String, your original literal has to escape both the \ and the ", like this:
String original = "my ve\\\"ry c\\tomplex ✪string èè òòò ììì aaa";
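If you'd rather not pull in a library for this one case, here is a minimal sketch of encoding a single Java String as a JSON string literal, similar in spirit to JavaScript's JSON.stringify on a string. It wraps the value in double quotes and escapes only what the JSON spec requires (quote, backslash and control characters); it does not reproduce every detail of JSON.stringify (for example its handling of lone surrogates). The names are hypothetical; a JSON library's string serializer is the more robust choice.

```java
public class JsonStringify {
    // Encode s as a JSON string literal: surrounding double quotes, with
    // quote, backslash and control characters escaped. All other characters,
    // including non-ASCII ones, are emitted unchanged.
    static String toJsonString(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use a 4-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String myString = "long_complex_string";
        System.out.println(toJsonString(myString));          // "long_complex_string"
        System.out.println(toJsonString("say \"hi\"\tnow")); // "say \"hi\"\tnow"
    }
}
```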

How do I store accented characters in S3 metadata?

I am trying to store accented characters such as ò in the metadata of an S3 object. I am using the REST API which according to this page only accepts US-ASCII: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html
Is there a way to convert Strings in Scala or Java from Bòrd to B\u00F2rd?
I have tried using Normalizer.normalize(str, Normalizer.Form.NFD), but when submitted to S3 the character still causes an error because it still appears as ò. When I print the returned String, it also shows ò.
A normalized Unicode string is only normalized in terms of composing characters, not necessarily converted to ASCII. Using NFKC would be more likely to convert characters to ASCII forms, but it certainly would not do so reliably.
It sounds like what you want is to escape non-ascii characters. You could use e.g. UnicodeEscaper from commons-lang, and UnicodeUnescaper to translate back.
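For illustration, the same escape/unescape idea can be sketched with plain JDK code. This is a hypothetical helper, not commons-lang's API: it turns every non-ASCII UTF-16 code unit into a backslash-u escape with four uppercase hex digits (characters outside the BMP become two escapes, one per surrogate), and a matching function reverses it. S3 would store the escaped text literally, so you would have to unescape it yourself after reading the metadata back. The round trip is only exact if the original text never contains a literal backslash-u sequence.

```java
public class AsciiEscape {
    // Replace each non-ASCII UTF-16 code unit with a backslash, 'u' and 4 hex digits.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Reverse escapeNonAscii: turn each backslash-u-plus-4-hex-digits sequence back into a char.
    static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length()) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 0xF2 is LATIN SMALL LETTER O WITH GRAVE, the accented letter from the question.
        String original = "B" + (char) 0xF2 + "rd";
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);                                  // pure ASCII, safe for the header
        System.out.println(unescapeUnicode(escaped).equals(original)); // true
    }
}
```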

Reading Json data containing special symbols like curly braces, square brackets etc

I have JSON data saved as a string in a database. The structure of the JSON data is fine; the problem is that it contains regex expressions and even URLs which contain curly braces {} or square brackets [] etc. I can replace some of these special symbols with the available encodings (e.g. hex or decimal encodings) and do string manipulations to take care of them, but I was wondering whether there is another way to handle this situation. I am getting the following exception for strings containing this type of JSON data.
org.json.JSONException: Expected a ',' or '}' at character 22891 of {"wires":[{"id"....so on
Please let me know if I need to elaborate more.
Here's the first thing coming to my mind:
When putting the data into the database, try PHP's addslashes/stripslashes (assuming you are using PHP to talk to the database).
