JSONParser fails while parsing very large numbers

I am using the org.json.simple.parser JSON parser. One data set contained very large numbers. For example, parsing one line failed with this error:
java.lang.NumberFormatException: For input string: "982134839798321390034432432"
Clearly the value should be parsed as a BigInteger, or there should be an option to treat such values as strings. What can be done in this case?

This is a known issue in "json-simple", see https://github.com/fangyidong/json-simple/issues/73
You need to either:
Switch to a different JSON parser, for example https://github.com/FasterXML/jackson
Apply the patch from issue #73 to a private fork of "json-simple" and use that instead of the released version, or use loegering's fork at https://cliftonlabs.github.io/json-simple/ (linked from issue #73).
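
The exception text matches what Long.valueOf produces for a value outside the 64-bit range, which suggests (an assumption, not checked against the json-simple source here) that the parser converts every integer token to a Long. A minimal sketch of the kind of fallback a patched parser could apply, using only the JDK; IntegerTokens and parseIntegerToken are hypothetical names, not part of any library:

```java
import java.math.BigInteger;

/** Sketch only: shows a Long-then-BigInteger fallback; this is not json-simple code. */
final class IntegerTokens {

    /** Returns a Long when the token fits in 64 bits, otherwise a BigInteger. */
    static Number parseIntegerToken(String token) {
        try {
            return Long.valueOf(token);
        } catch (NumberFormatException notALong) {
            // Tokens beyond Long.MAX_VALUE (9223372036854775807) land here.
            // BigInteger still throws NumberFormatException for non-numeric text.
            return new BigInteger(token);
        }
    }

    public static void main(String[] args) {
        Number small = parseIntegerToken("42");
        Number huge = parseIntegerToken("982134839798321390034432432");
        System.out.println(small.getClass().getSimpleName() + " " + small);
        System.out.println(huge.getClass().getSimpleName() + " " + huge);
    }
}
```

Callers then receive a Number and can inspect its concrete type, instead of the whole parse failing on one oversized value.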

Related

Parsing decimal numbers, some of which lack a decimal separator, in JSON data using JSON-Simple (Java)

I am trying to use the JSON-Simple JSON processor library.
When parsing a JSON fragment such as:
"speed":1.13
…I call get and cast as a Double. No problem.
Double speed = ( Double ) wind.get( "speed" );
But then I encounter a value without a decimal fraction. Ex: 1 rather than 1.0.
"speed":1
Granted, the publisher of this data should have written "speed":1.0. But they did not.
My get with casting throws an exception:
Exception in thread "main" java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Double (java.lang.Long and java.lang.Double are in module java.base of loader 'bootstrap')
Apparently JSON-Simple insisted on parsing the JSON value of 1 as a Long. So I need a workaround, a way to tell JSON-Simple how to parse this particular element.
➥ Is there a way to tell JSON-Simple to parse the string inputs as Double regardless of whether a decimal separator (a decimal point) is present?
➥ Even better, can I tell JSON-Simple to parse the string input for a particular JSON element as BigDecimal to bypass the inaccuracy of floating-point? (that is, going from String to BigDecimal without involving floating-point along the way)

Instead of casting, try:
Double.parseDouble(wind.get( "speed" ).toString())
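
An alternative that avoids the string round trip is to cast to Number, since both Long and Double extend it. The sketch below uses a plain Map as a stand-in for the parsed object so that it runs with the JDK alone; SpeedValues, asDouble, asBigDecimal, and the "gust" key are hypothetical names for illustration:

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

final class SpeedValues {

    /** Works whether the parser produced a Long (1) or a Double (1.13). */
    static double asDouble(Object value) {
        return ((Number) value).doubleValue();
    }

    /**
     * Builds a BigDecimal from the value's decimal text.
     * Caveat: if the parser already produced a Double, the number has
     * passed through floating point before reaching this method.
     */
    static BigDecimal asBigDecimal(Object value) {
        return new BigDecimal(value.toString());
    }

    public static void main(String[] args) {
        // Stand-in for the parsed "wind" object, holding the types the question reports.
        Map<String, Object> wind = new HashMap<>();
        wind.put("speed", 1L);  // "speed":1 arrives as a Long
        wind.put("gust", 1.13); // a value with a fraction arrives as a Double
        System.out.println(asDouble(wind.get("speed")));
        System.out.println(asBigDecimal(wind.get("gust")));
    }
}
```

Neither helper gives a true String-to-BigDecimal path. That requires a parser that keeps the original number text.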

Use a later version
You are using the original JSON-Simple library led by Fang Yidong. Later versions 2 and 3 were developed as a fork at this Clifton Labs page on GitHub, led by Davin Loegering.
The original does not support BigDecimal. The fork does support BigDecimal. See the getBigDecimal method.
The fork has changed the original library quite a bit. See the History section of that Clifton Labs page.

Difference between SolrJ's ResponseParsers

The SolrJ library offers different parsers for Solr's responses.
Namely:
BinaryResponseParser
StreamingBinaryResponseParser
NoOpResponseParser
XMLResponseParser
Sadly the documentation doesn't say much about them, other than:
SolrJ uses a binary format, rather than XML, as its default format.
Users of earlier Solr releases who wish to continue working with XML
must explicitly set the parser to the XMLResponseParser, like so:
server.setParser(new XMLResponseParser());
So it looks like the XMLResponseParser is there mainly for legacy purposes.
What are the differences between the other parsers?
Can I expect performance improvements from using another parser instead of the XMLResponseParser?

The binary parsers are meant to work directly with the Java Object Format (the binary POJO format) to make the creation of data objects as smooth as possible on the client side.
The XML parser was designed to work with the old response format, from a time when there were no real alternatives (as there was no binary response format in Solr). It's a lot more work to consider all the options for an XML format than to use the binary format directly.
The StreamingBinaryResponseParser does the same work as the BinaryResponseParser, but has been designed to make streaming documents possible (i.e. not creating a list of documents and returning that list, but instead returning each document by itself without having to hold them all in memory at the same time). See SOLR-2112 for a description of the feature and why it was added.
Lastly, yes, if you're using SolrJ, use the binary response format, unless you have a very good reason for using the XML based one. If you have to ask the question, you're probably better off with the binary format.
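
The buffered-versus-streaming distinction can be illustrated without SolrJ. The sketch below is plain Java, not SolrJ API: StreamingVersusBuffered, readAll, and streamEach are hypothetical names, and strings stand in for decoded documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class StreamingVersusBuffered {

    /** Buffered style: every document stays in memory until the list is returned. */
    static List<String> readAll(Iterator<String> decodedDocs) {
        List<String> all = new ArrayList<>();
        while (decodedDocs.hasNext()) {
            all.add(decodedDocs.next());
        }
        return all;
    }

    /** Streaming style: each document is handed to the callback and not retained here. */
    static int streamEach(Iterator<String> decodedDocs, Consumer<String> callback) {
        int count = 0;
        while (decodedDocs.hasNext()) {
            callback.accept(decodedDocs.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc-1", "doc-2", "doc-3");
        System.out.println(readAll(docs.iterator()).size());
        System.out.println(streamEach(docs.iterator(), doc -> System.out.println("got " + doc)));
    }
}
```

With the buffered style, peak memory grows with the size of the result set. With the streaming style, it depends only on what the callback chooses to keep.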

Efficient transcoding of Jackson parsed JSON

I'm using the Jackson streaming API to deserialise a fairly large JSON document (on the order of megabytes) into POJOs. It's working fine, but I'd like to optimize it, both in memory use and in processing time (the code runs on Android).
The main problem I'd like to optimize away is converting a large number of strings from UTF-8 to ISO-8859-1. Currently I use:
String result = new String(parser.getText().getBytes("ISO-8859-1"));
As I understand it, the parser first copies the token content into a String (getText()), then creates a byte array from it (getBytes()), which is then used to create the final String in the desired encoding. That is far too much allocation and copying.
The ideal solution would be for getText() to accept an encoding parameter and just give me the final string, but that's not the case.
Any other ideas, or flaws in my thinking?

You can use:
parser.getBinaryValue() (present in version 2.4 of Jackson)
or you can implement an ObjectCodec (with a readValue(...) method that knows how to convert bytes to a String in ISO-8859-1) and set it using parser.setCodec().
If you have control over the JSON generation, avoid using a charset other than UTF-8.
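
On the "flaws in my thinking" part: a Java String holds characters and has no byte encoding of its own. A charset matters only when converting to or from bytes. The sketch below, using only the JDK, shows what the question's round trip does once the decoding charset is made explicit (the single-argument new String(bytes) decodes with the platform's default charset instead). CharsetRoundTrip and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {

    /** Encodes text as ISO-8859-1 bytes; characters outside Latin-1 are replaced by '?'. */
    static byte[] toLatin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** The question's conversion, with the decoding charset stated explicitly. */
    static String viaLatin1(String text) {
        return new String(toLatin1Bytes(text), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9";  // only Latin-1 characters
        String euro = "\u20ac5";     // the euro sign is not in ISO-8859-1
        System.out.println(viaLatin1(latin).equals(latin)); // round trip changes nothing
        System.out.println(viaLatin1(euro));                // unmappable character is lost
        System.out.println(toLatin1Bytes(latin).length);    // one byte per character
    }
}
```

If ISO-8859-1 bytes are what is ultimately needed, it may be enough to keep the String from getText() and encode it once at the point where bytes are written.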

How to convert a Java object to String using Saxon

I am facing a problem with Xalan while converting a Java object to a String: empty open/close tag pairs are converted to self-closing tags, e.g. <span></span> gets converted to </span>.
I have fixed a similar problem when using the Saxon XSL transformer. Is it possible to use Saxon instead of Xalan to convert a Java object to a String?

First, I'm sure you mean <span/> for the self-closing tag.
Second: why is this a problem? If you are generating XML, <span></span> means exactly the same as <span/>, and will be treated the same by any XML parser. (If you're reading the XML without an XML parser, then DON'T). On the other hand, if you are generating HTML, then specifying method="html" should be all you need to do, whether you are using Xalan or Saxon.
Third: I can't see any relationship between your serialization problem and the task of converting Java objects to strings.
You can certainly do such things in Saxon. The documentation for calling Java methods from Saxon can be found here: http://www.saxonica.com/documentation/extensibility/intro.xml (Sorry there's so much of it, but I don't know enough about your situation to give you a more precise pointer).
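
The effect of the output method can be seen with the transformer that ships with the JDK, with no Saxon or standalone Xalan needed. A minimal sketch, assuming the default TransformerFactory on the classpath; EmptyTagSerialization and serialize are hypothetical names:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmptyTagSerialization {

    /** Runs an identity transform and serializes with the given output method ("xml" or "html"). */
    static String serialize(String xml, String method) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, method);
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<div><span></span></div>";
        System.out.println(serialize(input, "xml"));  // empty element may be collapsed
        System.out.println(serialize(input, "html")); // keeps separate open and close tags
    }
}
```

In a stylesheet, the equivalent setting is method="html" on xsl:output, as mentioned above.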

JSON-lib escaping / preserving strings

I am using the JSON-lib library for Java: http://json-lib.sourceforge.net
I just want to add a simple string that may look like JSON, but I do not want the library to work out that it might be JSON. It should treat the value as a plain string. Looking into the source of the library, I can't find a way to do this without ugly hacks.
Example:
JSONObject object = new JSONObject();
String chatMessageFromUser = "{\"dont\":\"treat it as json\"}";
object.put("myString", chatMessageFromUser);
object.toString() will give us {"myString":{"dont":"treat it as json"}}
and I just want to have {"myString":"{\"dont\":\"treat it as json\"}"}
How can I achieve this without modifying the source code? I am using this piece of code as a transport for chat messages from users. It works fine for normal chat messages, but when a user enters a message in JSON format, it breaks because of the default JSON-lib behavior described here.

If I understand the question correctly, I think json-lib is unique in assuming that a String passed to it needs to be parsed. Other libraries typically treat it as a String to include (escaping double quotes and backslashes as necessary), i.e. they work as you would expect.
So you may want to consider other libraries: I would recommend Jackson; Gson also works.

json-simple offers a JSONObject.escape() method.
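
To make the expected result concrete, here is a hand-rolled sketch of the escaping such libraries apply to a string value. It is for illustration only and uses just the JDK; JsonStringEscaping, escape, and member are hypothetical names, and in real code a library's own escaping is the safer choice.

```java
final class JsonStringEscaping {

    /** Escapes text so it can sit between double quotes in a JSON document. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the six-character hex form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** Builds a one-member JSON object whose value is always a string. */
    static String member(String key, String value) {
        return "{\"" + escape(key) + "\":\"" + escape(value) + "\"}";
    }

    public static void main(String[] args) {
        String chatMessageFromUser = "{\"dont\":\"treat it as json\"}";
        System.out.println(member("myString", chatMessageFromUser));
    }
}
```

The value stays a string on the wire, so a receiver that parses the outer object gets the original chat message back unchanged.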
