JSON decode issue - java

I'm trying to decode JSON output of a Java program (jackson) and having some issues.
The cause of the problem is the following snippet:
{
"description": "... lives\uMOVE™ OFFERS ",
}
Which causes ValueError: Invalid \uXXXX escape.
Any ideas on how to fix this?
EDIT: The output is from an Avro file, the Avro package uses jackson to emit records as JSON.
EDIT2: After poking about in the source files, it might be the case that the JSON is constructed manually (sorry jackson).

What's the original string supposed to look like? \uXXXX is a Unicode escape sequence, so the parser tries to read \uMOVE as a single character, but MOVE is not a valid four-digit hex value. JSON is always assumed to be Unicode, so you'll likely need to fix the string in the originating app.

Try quoting the \u like this:
{
"description": "... lives\\uMOVE™ OFFERS ",
}
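If the producer cannot be fixed right away, the same quoting could be applied programmatically before the text reaches the parser. Here is a minimal sketch in Java, assuming the only defect is a backslash-u that is not followed by four hex digits. The class name InvalidUnicodeEscapeFixer and its fix method are made up for this illustration and are not part of Jackson, Avro, or any other library.

```java
final class InvalidUnicodeEscapeFixer {
    private InvalidUnicodeEscapeFixer() {}

    /**
     * Doubles the backslash of every backslash-u that is NOT followed by
     * four hex digits, so a strict JSON parser sees a literal backslash.
     * Escape pairs are consumed two characters at a time, so an already
     * escaped backslash is left alone.
     */
    public static String fix(String json) {
        StringBuilder out = new StringBuilder(json.length() + 8);
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (c != '\\' || i + 1 >= json.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = json.charAt(i + 1);
            if (next == 'u' && !hasFourHexDigits(json, i + 2)) {
                out.append("\\\\u"); // two backslashes followed by u
            } else {
                out.append(c).append(next); // keep any other escape pair as-is
            }
            i += 2;
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9')
                    || (h >= 'a' && h <= 'f')
                    || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

This only papers over the problem: text that happens to start with four hex digits after the backslash-u would still be decoded as an escape, so fixing the producer remains the better option.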

Basically the input isn't valid json.
The spec on http://www.json.org/ defines how strings should be encoded. You will have to fix the JSON output from the other application.

This is a known bug in Avro versions < 1.6.0. See AVRO-851 for more details.

Jackson does not currently have a configuration feature to allow accepting such input. (Was it generated with Jackson?)
You could modify the stream parser to handle it. Follow the stack trace to the method(s) that would need changing.
You could submit a change request at http://jira.codehaus.org/browse/JACKSON for Jackson to be enhanced to provide such a feature, though I'm not sure how popular the request would be, and whether it would ever be implemented.

Related

Is there any way to deserialize a nonstandard JSON in Java

{
"id":1,
"city":"cityname",
"address":"{"name":"addressName"}"
}
for example
The value of the address field is missing its escapes. Is there a way to deserialize it as a string?
Yes and No.
Yes:
If there is a reliable pattern for detecting the error, you could potentially write some Java code to insert the missing escapes. Then you parse the corrected JSON using a regular JSON parser.
It might even be possible to write a custom JSON parser that treats a "{" sequence as "{\", and so on. Or modify an existing parser to do that.
No: A regular JSON parser will reject this. AFAIK, no mainstream JSON parser supports arbitrary non-standard (i.e. broken!) JSON variants.
A better idea is to fix whatever is generating the broken JSON. Or charge the customer who wants you to support this garbage a LOT OF MONEY because their data doesn't conform to the agreed requirements. (Assuming that the agreed requirements said JSON.)
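To make the "Yes" route concrete, here is a rough sketch of inserting the missing escapes before handing the text to a regular parser. It assumes the error pattern is reliable: the broken value always belongs to a field literally named "address", starts with { and ends with } directly before the closing quote. The class name AddressEscapeRepair and the method repair are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AddressEscapeRepair {
    // group 1: the key and opening quote, group 2: the unescaped object text,
    // group 3: the closing quote followed by a comma or closing brace
    private static final Pattern ADDRESS = Pattern.compile(
            "(\"address\"\\s*:\\s*\")(\\{.*?\\})(\"\\s*[,}])", Pattern.DOTALL);

    private AddressEscapeRepair() {}

    /** Escapes backslashes and quotes inside the broken "address" value. */
    public static String repair(String brokenJson) {
        Matcher m = ADDRESS.matcher(brokenJson);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String escaped = m.group(2)
                    .replace("\\", "\\\\")
                    .replace("\"", "\\\"");
            // quoteReplacement keeps backslashes in the replacement literal
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + escaped + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The repaired text can then go through any standard JSON parser. The heuristic breaks as soon as the embedded value itself contains the sequence }" followed by a comma or brace, which is exactly the fragility described above.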

Cleanest way to deserialize a non-standard (wrong) format of list of JSON string that doesn't have quote

I'm trying to deserialize a Java String to a List of String. Due to some reason, the input may come in two formats:
"[\"string1\", \"string2\"]"
or
"[string1, string2]"
The library I'm using is Jackson databind.
For the first case, it's a typical, easy case that Jackson supports.
For the second, I understand it's not valid JSON, and I could hack around it by stripping the [ ] and splitting the String on commas, but I would like to know if someone knows a clean way to deserialize something like that.
Thanks in advance.
To answer your question, you can look into YAML parsers. :)
Jackson has an extension for YAML support, so that would be your clean solution.
YAML is a superset of JSON, so it can parse any valid JSON... as well as many looser constructs (like strings without ").
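For comparison, the manual fallback mentioned in the question can be written with the JDK alone. This is only a sketch of that hack, not the YAML approach: it assumes that no element contains a comma and that quotes, when present, wrap the whole element. LenientStringList and parse are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class LenientStringList {
    private LenientStringList() {}

    /** Accepts both ["a", "b"] and [a, b] and returns the elements. */
    public static List<String> parse(String input) {
        String s = input.trim();
        if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
            throw new IllegalArgumentException("Expected a bracketed list: " + input);
        }
        String body = s.substring(1, s.length() - 1).trim();
        List<String> result = new ArrayList<>();
        if (body.isEmpty()) {
            return result;
        }
        for (String part : body.split(",")) {
            String item = part.trim();
            // drop one pair of surrounding double quotes if present
            if (item.length() >= 2 && item.startsWith("\"") && item.endsWith("\"")) {
                item = item.substring(1, item.length() - 1);
            }
            result.add(item);
        }
        return result;
    }
}
```

A real YAML or JSON parser handles quoting, escapes and embedded commas properly, which is why it is the cleaner choice when the extra dependency is acceptable.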

Jackson cannot parse control character

In my API response, there is a Ctrl-P control character (code 16). The Jackson parser fails to parse it and throws an error:
com.fasterxml.jackson.core.JsonParseException: Illegal unquoted
character ((CTRL-CHAR, code 16)): has to be escaped using backslash to
be included in string value
I have investigated and found that Jackson library actually tries to catch for ctrl-char.
Can anyone suggest solutions or work around for this? Thanks in advance.
I was able to fix a similar problem by setting Feature.ALLOW_UNQUOTED_CONTROL_CHARS on the JsonParser.
The code in my case looks like this:
parser.setFeatureMask(parser.getFeatureMask() | JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS.getMask());
As stated by others, such JSON is invalid, but in case you have no chance to change JSON, this should help.
Have you tried to configure the mapper to force escape non-ASCII?
This might be enough:
mapper.configure(JsonGenerator.Feature.ESCAPE_NON_ASCII, true);
see documentation
But I agree with StaxMan: the JSON response should be well formatted.
The content you get is not valid JSON: as per the JSON specification, control characters MUST be escaped within String values, and CAN NOT exist outside of them. So I would recommend getting the input data fixed; it is corrupt, and whoever is sending it is not doing a good job of cleansing it or properly escaping it.
Barring that, you can write a Reader (or even InputStream) that filters out or converts said control characters.
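A filtering Reader along those lines could look like the sketch below. It uses only the JDK and drops every character below 0x20 except tab, line feed and carriage return, which JSON allows as whitespace between tokens. The class name ControlCharFilterReader and the clean helper are assumptions made for this example; whether dropping the characters (rather than replacing or escaping them) is acceptable depends on the data.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ControlCharFilterReader extends FilterReader {
    ControlCharFilterReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && isDropped((char) c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        while (true) {
            int n = super.read(buf, off, len);
            if (n <= 0) {
                return n; // end of stream, or a zero-length request
            }
            int kept = off;
            for (int i = off; i < off + n; i++) {
                if (!isDropped(buf[i])) {
                    buf[kept++] = buf[i]; // compact the surviving characters
                }
            }
            if (kept > off) {
                return kept - off;
            }
            // everything in this chunk was filtered out, so read again
        }
    }

    private static boolean isDropped(char c) {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    /** Convenience helper for strings, mainly for demonstration. */
    static String clean(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        try (Reader r = new ControlCharFilterReader(new StringReader(raw))) {
            char[] buf = new char[256];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The reader can be wrapped around the response stream and passed to the JSON parser in place of the original Reader. Raw tabs or newlines inside string values would still be rejected by a strict parser, since this sketch leaves them in place.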

How to fetch source string from a utf-8 represented string

I have a page get from the Internet, and the content is utf-8 encoded as a String which may be like:
{"has_more": true, "items": [{"body": "\u6ca1\u6709\u4f20\u8bf4\u4e2d\u7684\u90a3\u4e48\u597d",...}
I tried to use URLDecoder.decode(), but it doesn't work; it outputs exactly what was put in. Any suggestions? This is a String object that is explicitly UTF-8 encoded, not an InputStream or anything like that. I have done some searching but found little that is relevant.
The source code notation is u-encoded (\uXXXX), but once parsed the String itself is an ordinary string (Java/JavaScript), indistinguishable from any other, just as with \n or \t.
The JDK has a conversion tool though:
native2ascii -encoding UTF-8 -reverse mypage.json plain-utf8.json
That's JSON encoding, which handles certain specific characters in a specific way. It is not URL encoding, which is why URLDecoder doesn't work.
Why don't you try using a JSON library? json simple or GSON are good ones to start off.
As a curiosity: here's where the encoding you're seeing is described: RFC4627
You can use Gson to convert them to a Map.
Check out the libs in Java - JSON in Java
Yes, it's JSON (JavaScript Object Notation), a lightweight data-interchange format.
Go through http://www.json.org/java/
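To show what a JSON library does with these sequences, here is a small JDK-only sketch that turns each backslash-u escape with four hex digits into the character it stands for. It deliberately leaves every other escape (such as \n or \") untouched, so it is not a replacement for a real JSON parser. UnicodeEscapeDecoder and decode are names chosen for this illustration.

```java
final class UnicodeEscapeDecoder {
    private UnicodeEscapeDecoder() {}

    /** Replaces each backslash-u plus four hex digits with the character it encodes. */
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                if (s.charAt(i + 1) == 'u' && isHex4(s, i + 2)) {
                    // four hex digits form one UTF-16 code unit
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                } else {
                    // some other escape pair: copy it through unchanged
                    out.append(c).append(s.charAt(i + 1));
                    i += 2;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex4(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            char h = s.charAt(k);
            boolean hex = (h >= '0' && h <= '9')
                    || (h >= 'a' && h <= 'f')
                    || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

Calling decode on the "body" value from the question yields the original Chinese text. With Gson or another JSON library, this decoding happens automatically while parsing.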

Java library to escape/clean XML?

I get some malformed xml text input like:
"<Tag>something</Tag> 8 > 3, 2 < 3, ... <Tag>something</Tag>"
I want to clean the input to get:
"<Tag>something</Tag> 8 &gt; 3, 2 &lt; 3, ... <Tag>something</Tag>"
That is, escape special symbols like < and > and yet keep the valid tags (<Tag>something</Tag>; note, with the same case).
Do you know of any Java library to do this? Probably an XML/HTML parser? (Though I don't really need a parser, simply a "clean" procedure.)
JTidy is "HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML"
But it can also be used with XML. Check the documentation. It's incredibly smart; it will probably work for you.
I don't know of any library that would do that. Your input is malformed XML, and no proper XML parser would accept it. More important, it is not always possible to distinguish an actual tag from something that looks-like-a-tag-but-is-really-text. Therefore any heuristic-based attempt that you make to solve the problem will be fragile; i.e. it could occasionally produce malformed XML.
The best approach is to address the problem before you assemble the XML.
If you generate the XML by (for example) unparsing a DOM, then the unparser will take care of the escaping for you.
If you are generating the XML by templating or string bashing, then you need to call something like StringEscapeUtils.escapeXml on the relevant text chunks ... before the XML tags get incorporated.
If you leave the problem until after the "XML" has been assembled, it cannot be properly fixed.
The best solution is to fix the program generating your text input. The easiest such fix would involve an escape utility like the other answers suggested. If that's not an option, I'd use a regular expression like
</?[a-zA-Z]+ */?>
to match the expected tags, and then split the string up into tags (which you want to pass through unchanged) and text between tags (against which you want to apply an escape method.)
I wouldn't count on an XML parser to be able to do it for you because what you're dealing with isn't valid XML. It is possible for the existing lack of escaping to produce ambiguities, so you might not be able to do a perfect job either.
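The regex-splitting idea above could look roughly like this. It is a sketch that uses the pattern from this answer to find tags, passes them through unchanged, and escapes &, < and > in the text between them. It assumes the input contains no entities that are already escaped (they would be escaped a second time) and that tags carry no attributes. TagPreservingEscaper and escape are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagPreservingEscaper {
    // Matches simple opening, closing and self-closing tags without attributes
    private static final Pattern TAG = Pattern.compile("</?[a-zA-Z]+ */?>");

    private TagPreservingEscaper() {}

    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(escapeText(input.substring(last, m.start()))); // text before the tag
            out.append(m.group());                                    // the tag itself, unchanged
            last = m.end();
        }
        out.append(escapeText(input.substring(last))); // trailing text
        return out.toString();
    }

    private static String escapeText(String text) {
        // & must be replaced first so the entities added below are not re-escaped
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }
}
```

Anything in the text that happens to look like a tag, for example a literal <b> inside a sentence, is passed through as a tag, so the ambiguity described above still applies.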
Check out Guava's XmlEscaper. It is in pre-release for version 11 but the code is available.
Apache Commons Lang contains a class named StringEscapeUtils which does exactly what you want! The method you'd want to use is escapeXml, I presume.
