flink "equal" symbol changed inside a string - java

I have a very strange issue with Flink.
I have a JSON input with some fields defined inside a POJO.
When I look at the output, the = symbols have been changed:
Original string:
"body": "/opensearch/OpenSearch?searchTerms=productType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___&count=10"
String produced by flink:
"body": "/opensearch/OpenSearch?searchTerms\u003dproductType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___\u0026count\u003d10"
Does anyone know how to resolve this issue?

They aren't, not really.
As per the JSON spec, if the byte 0x09 (ASCII tab character) appears inside a JSON string, or if the byte sequence 0x5C 0x74 (The characters \t) appears, or if the sequence 0x5C 0x75 0x30 0x30 0x30 0x39 appears (the characters \u0009), they all mean the exact same thing: There is one character there in that string, and it is the tab.
If you're having trouble with this, your JSON library is broken. Get a better one.
More likely, though, your JSON library is not broken and one of two things is going on. Either [A] you are comparing raw JSON, or trying to retrieve information from raw JSON using e.g. regular expressions. Stop doing that: there are all sorts of ways for different JSON texts to mean the same thing, so it'll be an endless parade of such 'weirdness'. Or [B] there is no problem here and you can just continue; you merely saw the difference and understandably assumed that it matters, or that it'll cause problems down the line.
Assuming you don't do silly things like parsing JSON with regular expressions, or comparing raw JSON and assuming the result says anything about its content, this will not be a problem.
Specifically, \u003d and = are identical as per the JSON spec. Whatever processed this JSON decided to replace one sequence with another sequence that means the same thing, which is an allowed operation.
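To make that equivalence concrete, here is a minimal Java sketch that decodes only the backslash-u escapes in the two strings from the question and shows they end up identical. The class and method names are made up for illustration, the input is assumed to be well-formed, and this is not a JSON parser: it ignores every other escape, so real code should still go through a JSON library.

```java
final class UnicodeEscapeDemo {
    // Illustrative helper (hypothetical name): replaces each backslash-u escape
    // with the character it denotes. Assumes four valid hex digits follow.
    static String decodeUnicodeEscapes(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("\\u", i) && i + 6 <= raw.length()) {
                // Parse the four hex digits after the backslash and the 'u'.
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "/opensearch/OpenSearch?searchTerms=productType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___&count=10";
        String produced = "/opensearch/OpenSearch?searchTerms\\u003dproductType:OL_2_WFR___%20OR%20OL_2_WRR___%20OR%20SL_2_WST___%20OR%20SR_2_WAT___\\u0026count\\u003d10";
        // Once the escapes are decoded, both strings hold the same characters.
        System.out.println(decodeUnicodeEscapes(produced).equals(original));
    }
}
```

Any conforming JSON parser performs this decoding for you, which is why the two forms are indistinguishable after parsing.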

Related

JSON broken when double quotes comes inside the key/value

Sample data:
{"630":{"TotalLength":"33-3/8" - 36-3/4""},"631":{"Length":"34 37 7/8"}}
We are facing a double quotes issue in a JSON response. How can we replace the double quotes that appear inside a key or value with " \" "? Java is the development platform.
This answer is assuming that you are not in control of creating this JSON-like string. If you can control that part, then you should be escaping properly there itself.
In this case, since parsing systematically is not an option as it's not a valid JSON yet, all I could suggest is to go through the various strings and see if you can find a pattern on which you can apply some logic and escape all the "s which prevent the string from being a valid JSON.
Here is probably a way to start:
All of the "s that need to be there for the string to be valid JSON are adjacent to one or more of the characters {, :, ,, and }, with or without spaces between the " and those JSON characters.
So, if you scan the JSON-like string in Java and look at each ", you leave it as it is when it sits next to any of the above characters (with or without spaces in between). If not, replace that " with a \".
Note that the above method may or may not work depending on the data in question. What I mean to convey is an approach that you may find useful if there's absolutely no way for the string to be escaped during its creation, and if these strings follow a strict pattern with respect to the unescaped "s.
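The heuristic described above could be sketched in Java roughly as follows. The class and method names are invented for this example, and the rule is my reading of the description: a " is kept when the nearest non-space character before it is {, : or , (an opening quote) or the nearest one after it is :, , or } (a closing quote). Like the description, it ignores [ and ], and it will misfire on values that themselves contain sequences such as ", so treat it as a starting point only.

```java
final class QuoteEscaper {
    // Characters that may directly precede an opening quote or follow a closing one.
    private static final String BEFORE_OPENING = "{:,";
    private static final String AFTER_CLOSING = ":,}";

    // Hypothetical helper: escapes every double quote that does not look structural.
    static String escapeInnerQuotes(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Copy non-quote characters and already-escaped quotes unchanged.
            if (c != '"' || (i > 0 && s.charAt(i - 1) == '\\')) {
                out.append(c);
                continue;
            }
            // Find the nearest non-space neighbours on both sides.
            int p = i - 1;
            while (p >= 0 && s.charAt(p) == ' ') p--;
            int n = i + 1;
            while (n < s.length() && s.charAt(n) == ' ') n++;
            boolean opening = p >= 0 && BEFORE_OPENING.indexOf(s.charAt(p)) >= 0;
            boolean closing = n < s.length() && AFTER_CLOSING.indexOf(s.charAt(n)) >= 0;
            out.append(opening || closing ? "\"" : "\\\"");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "{\"630\":{\"TotalLength\":\"33-3/8\" - 36-3/4\"\"},\"631\":{\"Length\":\"34 37 7/8\"}}";
        System.out.println(escapeInnerQuotes(sample));
    }
}
```

On the sample data from the question, only the two inch marks inside the TotalLength value are escaped; every other quote is left alone.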

Convert JSON unicode characters to unicode value

Basically, when JSON takes in a string it will convert things like ' or & to their Unicode value. I'm trying to store the JSON value as it goes out, and when it comes back in later, compare it with the JSON value. However, when I send out something like "Let's Party" it comes back "Let\u0027s Party"
So basically, I'm looking to convert all JSON Unicode to their specific Unicode values before storing it or sending it out.
I'm looking to convert all JSON Unicode to their specific Unicode values
I doubt you want to convert all characters to \u-escapes. In that case Let's party would become \u004c\u0065\u0074\u0027\u0073\u0020\u0050\u0061\u0072\u0074\u0079.
There is nothing special about the apostrophe or ampersand that means it has to be encoded in JSON, although some encoders do so anyway (it can have advantages for using JSON inside another wrapper context where those characters are special).
It looks like you want to match the exact output that another encoder produces. To do that you'd have to determine the full set of characters that that encoder decides to escape, and either alter or configure your own JSON encoder to match that. For some characters that could be as simple as doing a string replace: for example as ' may only legitimately appear in JSON as part of a string literal it would be safe to replace with \u0027 after the encoding. This is ugly and fragile though.
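For illustration only, the string-replace idea could look something like this in Java. The set of escaped characters is an assumption (apostrophe, ampersand, equals sign and angle brackets); you would have to confirm the real set for the encoder you are trying to match. The class and method names are hypothetical. The replacement is only safe because none of these characters can occur outside a string literal in valid JSON.

```java
final class EscapeMatcher {
    // Assumed escape set; verify it against the encoder whose output you must match.
    private static final String ESCAPED = "'&=<>";

    // Hypothetical helper: rewrites the chosen characters as lowercase
    // backslash-u escapes in already-serialised JSON text.
    static String escapeLikeOtherEncoder(String json) {
        StringBuilder out = new StringBuilder(json.length() + 16);
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (ESCAPED.indexOf(c) >= 0) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The apostrophe in the value is rewritten as a six-character escape.
        System.out.println(escapeLikeOtherEncoder("{\"msg\":\"Let's Party\"}"));
    }
}
```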
It is generally a bad idea to rely on the exact encoding choices of a JSON serialiser. The JSON values {"a": "'", "b": 0.0} and {"b": 0, "a": "\u0027"} represent the same data and should generally be treated as equal. For comparison purposes it is usually better to parse the JSON and check the content piece-by-piece, or re-serialise using your own encoder and compare that output (assuming your JSON encoder is deterministic).

Encoding a string in 128c barcode symbology

I am having some trouble with encoding this string into barcode symbology - Code 128.
Text to encode:
1021448642241082212700794828592311
I am using the universal encoder from idautomation.com:
https://www.bcgen.com/fontencoder/
I get the following output for the encoded text for Code 128:
Í*5LvJ8*r5;ÂoP<[7+.Î
However, in ";Âo" the character between the semi-colon and o (let us call it special A) - is not part of the extended character set used in Code128. (See the Latin Supplements at https://www.fonts2u.com/code-128.font)
Yet the same string shows a valid barcode at
https://www.bcgen.com/linear-barcode-creator.html
How?
If I use the output with the Special A on a webpage with a font face for barcodes, the special A character does not show up as the barcode (and that seems correct since the special A is not part of the character set).
What gives? Please help.
I am using the IDAutomation utility to encode the string to 128c symbology. If you can share code to do the encoding (in Java/Python/C/Perl) that would help too.
There are multiple fonts for Code128 that may use different characters to represent the barcode symbols. Make sure the font and the encoding logic match each other.
I used this one http://www.jtbarton.com/Barcodes/Code128.aspx (there is also sample code how to encode it on the site, but you have to translate it from VB). The font works for all three encodings (A, B and C).
Sorry, this is very late.
When you are dealing with the encoding of code 128, in any subset, it's a good idea to think of that coding in terms of numbers, not characters. At this level, when you have shifts, code-changes, checksums and stuff, intermixed with the data, the whole concept of "character" is lost.
However, this is what is happening:
The semicolon in the output corresponds to "27"
The lowercase o corresponds to "79" and the P to "48"
The "A with circumflex" (Â) corresponds to your "00" sequence. This is why you should be dealing with numbers, not characters, at this level of encoding.
How would you expect it to show a character with a code of 00? That would be a space or NUL, neither of which is particularly visible.
Your software has simply rendered it the best way it can, which is to make the character 'visible' by moving it into the extended range. If you look at charmap, you will see that Â is code 0xC2.
The rest (indeed all) of your encoded string looks correct for a Set C encoding.
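Since the question also asked for code, here is a rough Java sketch of Set C encoding that works on numbers first and maps to font characters only at the end. The start value 105, stop value 106 and the weighted checksum modulo 103 are standard Code 128. The value-to-character mapping (0 to Â, 1 to 94 offset by 32, 95 to 106 offset by 100) is an assumption inferred from the encoder output shown in the question, and other fonts may map differently. The class and method names are made up.

```java
final class Code128C {
    static final int START_C = 105;
    static final int STOP = 106;

    // Assumed font mapping, inferred from the output in the question.
    static char toFontChar(int value) {
        if (value == 0) return (char) 194;            // shown as A with circumflex
        if (value <= 94) return (char) (value + 32);  // printable ASCII range
        return (char) (value + 100);                  // 95..106 land on 195..206
    }

    // Encodes an even-length digit string as Start C, data pairs, checksum, Stop.
    static String encode(String digits) {
        if (digits.length() % 2 != 0 || !digits.matches("\\d+")) {
            throw new IllegalArgumentException("Set C needs an even number of digits");
        }
        StringBuilder out = new StringBuilder();
        out.append(toFontChar(START_C));
        int checksum = START_C;
        for (int i = 0, pos = 1; i < digits.length(); i += 2, pos++) {
            int value = Integer.parseInt(digits.substring(i, i + 2));
            checksum += pos * value;   // each pair is weighted by its 1-based position
            out.append(toFontChar(value));
        }
        out.append(toFontChar(checksum % 103));
        out.append(toFontChar(STOP));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("1021448642241082212700794828592311"));
    }
}
```

For the string in the question, the pairs are 10, 21, 44, 86, 42, 24, 10, 82, 21, 27, 00, 79, 48, 28, 59, 23, 11 and the checksum value works out to 14, which reproduces the encoder output quoted there.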

Replacing Java unicode encodings with actual characters

When I make web queries, for accented characters, I get special character encodings back as strings such as "\u00f3" , but I need to replace it with the actual character, like "ó" before making another query.
How would I find these cases without actually looking for each one, one by one?
It seems you're handling JSON formatted data.
Use any of the many freely available JSON libraries to handle this (and other parsing issues) for you instead of trying to do it manually.
The one from JSON.org is pretty widely used, but there are surely others that work just as well.

Reading Json data containing special symbols like curly braces, square brackets etc

I have JSON data saved as a string in a database. The structure of the JSON data is fine, except that it contains regular expressions and even URLs which contain curly braces {} or square brackets [] etc. I can replace some of the special symbols with the available encodings, e.g. hex or decimal encodings, and do string manipulations to take care of these. I was just wondering whether there is another way of handling this situation. I am getting the following exception for the strings containing this type of JSON data.
org.json.JSONException: Expected a ',' or '}' at character 22891 of {"wires":[{"id"....so on
Please let me know if I need to elaborate more.
Here's the first thing coming to my mind:
When putting the thing in the database try php addslashes/stripslashes (assuming you are using php to contact the database).
