How to convert a large JSON string to a JSONObject? (Java)

My Spring application makes a REST request to a server, and the response is a JSON string. The string is very large (200 MB). I want to convert it to a JSONObject. Below is my code for the conversion:
exchange = restTemplate.exchange(Url, HttpMethod.POST, postEntity, String.class);
jsonObject = objectMapper.readValue(exchange.getBody(), JSONObject.class);
For a single request, the conversion takes 3-5 seconds. But with multiple requests the conversion takes much longer (60 seconds for 8-10 requests in parallel).
Is there any better way to do this?

I'd say that transforming a 200 MB chunk of JSON into an object using jackson-databind's ObjectMapper will almost always consume a lot of computing time, and moreover huge amounts of memory.
If you don't need the whole object represented by the JSON in memory at a single time, i.e. chunks of it will suffice, I would advise switching to an approach that uses Jackson's streaming API. You can combine it with databind on smaller subsets of the JSON, passing the resulting DTOs to some consumer (a kind of visitor pattern).
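A minimal sketch of that approach, assuming the payload is a top-level JSON array and using a hypothetical Record DTO for its elements (you would also want to stream the HTTP body, e.g. via RestTemplate.execute with a ResponseExtractor, rather than buffering it into a String first):

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.InputStream;
import java.util.function.Consumer;

public class StreamingReader {

    private final ObjectMapper mapper = new ObjectMapper();

    // Streams a top-level JSON array element by element; only one
    // Record (a hypothetical small DTO) is materialized at a time.
    public void readAll(InputStream in, Consumer<Record> consumer) throws Exception {
        try (JsonParser parser = mapper.getFactory().createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("expected a JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                // databind takes over for just this one sub-object
                Record record = mapper.readValue(parser, Record.class);
                consumer.accept(record);
            }
        }
    }
}

Only one element is materialized at a time, so memory use stays flat regardless of the total payload size.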
I hope this is of help for your particular use case.

Related

How to put a limit on memory usage while parsing a JSON string?

How can I rule out a java.lang.OutOfMemoryError when calling
new JSONObject(longAndMalformedJSONString)
I'm using the org.json implementation of a JSON parser.
I'm not looking for a way to decode the bad JSON String. I just want to put an upper limit on memory usage (and possibly CPU usage) and maybe get an exception that I can recover from.
Or, alternatively, is it safe to say, memory usage while parsing will never exceed a certain ratio compared to input string length? Then I could just limit that.
Or is there an alternate library that offers that?
There are two approaches when reading serialized data (JSON, XML, whatever): you either parse the entire input and keep the object in memory, or you navigate the stream via the provided API and you just keep the pieces you are interested in. It seems org.json doesn't have a streaming API, but more sophisticated libraries like Gson do:
Gson gson = new Gson();
JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
List<Message> messages = new ArrayList<>();
reader.beginArray();
while (reader.hasNext()) {
    Message message = gson.fromJson(reader, Message.class);
    messages.add(message);
}
reader.endArray();
reader.close();
You can also put limits on the input, but that depends on the protocol you use for transferring the JSON payload.
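For instance, a minimal sketch of a size-capping Reader (a hypothetical LimitedReader, not part of org.json or any library) that you could wrap around the input before handing it to the parser:

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Fails fast once more than maxChars characters have been read,
// bounding memory use regardless of what the parser does downstream.
class LimitedReader extends FilterReader {
    private final long maxChars;
    private long count;

    LimitedReader(Reader in, long maxChars) {
        super(in);
        this.maxChars = maxChars;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1 && ++count > maxChars) {
            throw new IOException("input exceeds " + maxChars + " characters");
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = super.read(cbuf, off, len);
        if (n > 0 && (count += n) > maxChars) {
            throw new IOException("input exceeds " + maxChars + " characters");
        }
        return n;
    }
}

Used as, e.g., new JSONObject(new JSONTokener(new LimitedReader(reader, 10_000_000))); the IOException should then surface as a JSONException you can catch and recover from.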

JSON performance

I have to turn a List<Map> into a JSON string. The Maps are flat and contain primitive and String data. Right now I'm using Gson, basically like this:
List<Map<String, Object>> list = new ArrayList<>();
Map<String, Object> map = new HashMap<>();
map.put("id", 100);
map.put("name", "Joe");
map.put("country", "US");
// ...
list.add(map);

Gson gson = new GsonBuilder().disableHtmlEscaping().setPrettyPrinting().create();
Stopwatch sw = Stopwatch.createStarted(); // Guava's stopwatch
String s = gson.toJson(list);
System.err.println("toJson laps " + sw);
return s;
The list may have 100 entries, and each map approx. 20 fields. Gson takes a really long time to create the JSON string. The JSON string will be returned in an HTTP response, and right now it takes too much time (8000 ms). So I tried other libraries: json-smart, Jackson, etc., but none of them gave a significant speed boost. Profiling shows the JSON string creation as the hot spot in execution time.
For 100 x 20 fields I really wouldn't expect more than a second, but it takes significantly more time. Is there any cure for this?
Update
I had overlooked some BLOB data that was being returned. Thank you all.
You'd better use Jackson 2:
https://github.com/FasterXML/jackson
Java EE 7 also has a brand-new JSON Processing API:
http://docs.oracle.com/javaee/7/api/javax/json/package-summary.html
Profiling the different libs will provide answers. For enterprise applications, I never use GSON.
If you want to write benchmarks, use JMH. Writing high quality benchmarks isn't easy and JMH does a lot of the heavy lifting for you (although there are still a few gotchas).
If you wonder which Java JSON library serializes/deserializes fastest for various payload sizes, you can look at this extensive benchmark, which compares a dozen of them with JMH. Fastest is dsl-json, second is Jackson. Gson is actually pretty slow for a 'modern' lib.
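If you want to measure your own payload, a minimal JMH sketch comparing Gson and Jackson serialization (the 100 x 20 field counts come from the question above; everything else is an assumption):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@State(Scope.Benchmark)
public class JsonWriteBenchmark {

    private List<Map<String, Object>> list;
    private Gson gson;
    private ObjectMapper mapper;

    @Setup
    public void setup() {
        // Roughly the payload from the question: 100 maps x 20 flat fields.
        list = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Map<String, Object> map = new HashMap<>();
            for (int f = 0; f < 20; f++) {
                map.put("field" + f, (f % 2 == 0) ? (Object) i : "value" + f);
            }
            list.add(map);
        }
        gson = new GsonBuilder().disableHtmlEscaping().create();
        mapper = new ObjectMapper();
    }

    @Benchmark
    public String gsonToJson() {
        return gson.toJson(list);
    }

    @Benchmark
    public String jacksonWriteValue() throws Exception {
        return mapper.writeValueAsString(list);
    }
}

Returning the String from each @Benchmark method lets JMH consume it, which prevents the JIT from optimizing the serialization away.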

Parsing Huge JSON with Jackson

Consider a huge JSON with a structure like:
{"text": "very HUGE text here.."}
I am storing this JSON as an ObjectNode, called, say, json.
Now I try to extract the text from the ObjectNode:
String text = json.get("text").asText()
This JSON can be 4-5 MB in size. When I run this code, I don't get a result (the program keeps executing forever).
The above method works fine for small and normal-sized strings. Is there a better practice for extracting huge values from JSON?
Tested with Jackson (FasterXML): a 7 MB JSON node can be parsed in about 200 milliseconds.
ObjectMapper objectMapper = new ObjectMapper();
InputStream is = getClass().getResourceAsStream("/test.json");
long begin = System.currentTimeMillis();
Map<String,String> obj = objectMapper.readValue(is, HashMap.class);
long end = System.currentTimeMillis();
System.out.println(obj.get("value").length() + "\t" + (end - begin));
The output is:
7888888 168
Try upgrading your Jackson version?
Perhaps your default heap size is too small: if the input is 5 megs UTF-8 encoded, the Java String of it will usually need 10 megs of memory (a char is 16 bits; most English characters are a single byte in UTF-8).
There isn't much you can do about this, regardless of JSON library, if the value has to be handled as a Java String; you need enough memory for the value and the rest of the processing. Further, since the Java heap is divided into different generations, 64 megs may or may not work: since the 10 megs need to be contiguous, they probably get allocated in the old generation.
So: try a bigger heap size and see how much you need.
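If heap tuning alone isn't enough, you can also skip the tree model and pull the value out with Jackson's streaming API; the huge String itself still needs memory, but no tree nodes are built around it. A minimal sketch, assuming the single-field document from the question:

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.InputStream;

public class TextExtractor {

    // Pulls the "text" value out of {"text": "very HUGE text here.."}
    // without building an ObjectNode first; the value is allocated once.
    public static String extractText(InputStream in) throws Exception {
        try (JsonParser parser = new JsonFactory().createParser(in)) {
            while (parser.nextToken() != null) {
                if (parser.getCurrentToken() == JsonToken.FIELD_NAME
                        && "text".equals(parser.getCurrentName())) {
                    parser.nextToken();      // advance to the VALUE_STRING
                    return parser.getText(); // the huge string itself
                }
            }
        }
        return null;
    }
}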

Efficient transcoding of Jackson parsed JSON

I'm using the Jackson streaming API to deserialize quite large JSON (on the order of megabytes) into POJOs. It's working fine, but I'd like to optimize it (both memory- and processing-wise; the code runs on Android).
The main problem I'd like to optimize away is converting a large number of strings from UTF-8 to ISO-8859-1. Currently I use:
String result = new String(parser.getText().getBytes("ISO-8859-1"));
As I understand it, the parser first copies the token content into a String (getText()), then creates a byte array from it (getBytes()), which is then used to create the final String in the desired encoding. Way too many allocations and too much copying.
The ideal solution would be if getText() accepted an encoding parameter and just gave me the final string, but that's not the case.
Any other ideas, or flaws in my thinking?
You can use:
parser.getBinaryValue() (present in version 2.4 of Jackson)
or you can implement an ObjectCodec (with a readValue(...) method that knows how to convert bytes to a String in ISO-8859-1) and set it using parser.setCodec().
If you have control over the JSON generation, avoid using a charset other than UTF-8.
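If what you ultimately need is ISO-8859-1 bytes rather than another String, one copy can be skipped by encoding straight from the parser's internal buffer via JsonParser.getTextCharacters(). A sketch (note the returned buffer is only valid until the parser advances to the next token):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Encode straight from the parser's internal char buffer segment,
// skipping the intermediate String that getText() would allocate.
char[] chars = parser.getTextCharacters();
CharBuffer src = CharBuffer.wrap(chars, parser.getTextOffset(), parser.getTextLength());
ByteBuffer encoded = StandardCharsets.ISO_8859_1.encode(src);
byte[] isoBytes = new byte[encoded.remaining()];
encoded.get(isoBytes);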

Jackson JSON Streaming API: Read an entire object directly to String

I'm trying to stream in an array of JSON objects, one by one, but I need to import each one as a raw JSON String.
Given an array of input like so:
[
    {"object":1},
    {"object":2},
    ...
    {"object":n}
]
I am trying to iterate through the Strings:
{"object":1}
{"object":2}
...
{"object":n}
I can navigate the structure using the streaming API to validate that I have encountered an object, and all that, but I don't think the way I'm getting my String back is ideal.
Currently:
//[...]
//we have read a START_OBJECT token
JsonNode node = parser.readValueAsTree();
String jsonString = anObjectMapper.writeValueAsString(node);
//as opposed to String jsonString = node.toString() ;
//[...]
I imagine the building of the whole JsonNode structure involves a bunch of overhead, which is pointless if I'm just reserializing, so I'm looking for a better solution. Something along the lines of this would be ideal:
//[...]
//we have read a START_OBJECT token
String jsonString = parser.readValueAsString()
//or parser.skipChildrenAsString()
//[...]
The objects are obviously not as simple as
{"object":1}
which is why I'm looking to not waste time doing pointless node building. There may be some ideal way, involving mapping the content to objects and working with that, but I am not in a position where I am able to do that. I need the raw JSON string, one object at a time, to work with existing code.
Any suggestions or comments are appreciated. Thanks!
Edit: parser.getText() returns the current token as text (e.g. START_OBJECT -> "{"), but not the rest of the object.
Edit 2: The motivation for using the streaming API is to stream objects in one by one. The actual JSON files can be quite large, and each object can be discarded after use, so I simply need to iterate through.
There is no way to avoid JSON tokenization (otherwise the parser wouldn't know where objects start and end), so it will always involve some level of parsing and generation.
But you can reduce the overhead slightly by reading values as a TokenBuffer -- it is the Jackson-internal type with the lowest memory/performance overhead (and is used internally whenever things need to be buffered):
TokenBuffer buf = parser.readValueAs(TokenBuffer.class);
// write straight from the buffer if you have a JsonGenerator
jgen.writeObject(buf);
// or, if you must, convert to byte[] or String
byte[] stuff = mapper.writeValueAsBytes(buf);
We can do a bit better, however: if you can create a JsonGenerator for output, just use JsonGenerator.copyCurrentStructure(JsonParser):
jgen.copyCurrentStructure(jp); // points to END_OBJECT after copy
This will avoid all object allocation; and although it will still need to decode the JSON and encode it back as JSON, it will be rather efficient.
And you can in fact use this even for transcoding -- read JSON, write XML/Smile/CSV/YAML/Avro -- between any formats Jackson supports.
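Putting the copyCurrentStructure approach together for the array-of-objects case from the question, a sketch (the Consumer<String> stands in for whatever existing code consumes the raw JSON):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.InputStream;
import java.io.StringWriter;
import java.util.function.Consumer;

public class ObjectStreamer {

    // Iterates a top-level JSON array and hands each element to the
    // consumer as a raw JSON string, one object at a time.
    public static void streamObjects(InputStream in, Consumer<String> consumer) throws Exception {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("expected a JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                StringWriter out = new StringWriter();
                try (JsonGenerator gen = factory.createGenerator(out)) {
                    gen.copyCurrentStructure(parser); // parser now points at END_OBJECT
                }
                consumer.accept(out.toString());
            }
        }
    }
}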
