Will using Jackson input stream bring all contents into memory - java

I have JSON coming from a stream. The data is huge, so I don't want to deserialize it into a concrete Java object. Instead, I'm thinking of using the Jackson parser:
JsonFactory factory = new JsonFactory();
JsonParser parser = factory.createParser(stream); // createJsonParser() in older (pre-2.2) Jackson
My goal is to get specific sections from the stream (specific properties from the object):
{
  "array": [],
  "map": {},
  "bool": "true",
  "string": "abcd"
}
For example, I want to get only the map or the array, and so on.
However, my question is: when I use an input stream and parse it (to get specific sections of the stream), will the entire JSON be brought into memory all at once?
What is the difference between this (parsing) approach and deserializing into an object (and then getting the specific members from the object)?

Of the 3 major processing modes that Jackson supports, Streaming Processing (also known as Incremental Processing) is the most efficient way to process JSON content. It has the lowest memory and processing overhead, and can often match the performance of many binary data formats available on the Java platform (see the "Performance Comparison" link below).
According to the FasterXML documentation (http://wiki.fasterxml.com/JacksonStreamingApi), the JsonParser does not read the whole stream into memory; it only reads from the stream as more tokens are requested.
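For example, here is a minimal sketch (assuming the JSON shape shown in the question) that pulls out just the "map" section as a tree while everything else streams past:

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;

public class SectionExtractor {
    // Returns only the top-level "map" section; other properties stream past
    // without ever being materialized as objects.
    public static JsonNode readMapSection(InputStream stream) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        try (JsonParser parser = mapper.getFactory().createParser(stream)) {
            JsonToken t;
            while ((t = parser.nextToken()) != null) {
                if (t == JsonToken.FIELD_NAME && "map".equals(parser.getCurrentName())) {
                    parser.nextToken();             // advance to the value (START_OBJECT)
                    return mapper.readTree(parser); // materialize just this subtree
                }
            }
        }
        return null; // property not found
    }
}

Only the selected subtree is built in memory. For deeply nested documents you would also want skipChildren() on unwanted objects, so a nested field that happens to share the name is not matched.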

Related

Parse Large JSON File (around 300 MB) to a List of POJOs

I have a large JSON file, roughly 300 MB, which I am parsing using the Jackson ObjectMapper:
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JSR310Module;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

private void parseJson(Object obj) throws IOException {
    ObjectMapper map = new ObjectMapper();
    map.registerModule(new JSR310Module()); // register modules before (de)serializing
    String str = map.writeValueAsString(obj);
    Map<String, List<POJOClass>> result = map.readValue(
            new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8)),
            new TypeReference<Map<String, List<POJOClass>>>() {});
}
The parameter to parseJson is an Object containing the JSON data as a String.
This works fine for JSON files of around 150 MB; however, it starts to fail with a heap space error when JSON files are around 250-300 MB. I am using Jackson 2.4.0.
You don't have enough memory in your Java process to handle such a huge file.
When launching your app, use the -Xmx option to increase the maximum heap available to your Java process:
java -Xmx512m ...
or
java -Xmx1G ...
Have you tried streaming over the JSON, for instance as described at https://www.baeldung.com/jackson-streaming-api, so that you do not have to put everything into memory at once?
For a huge JSON file, you should use the Jackson streaming API. It uses far less memory than the normal approach. In general, your POJO must have a particular shape: it should be separable into many independent parts, such as a list of objects. With the streaming API you can then process the objects in the list one by one, as sketched below.
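A rough sketch of that pattern, assuming a top-level JSON array of POJOClass elements (POJOClass is from the question; the handle() callback is a placeholder):

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.InputStream;

public class StreamingListReader {
    public static void process(InputStream in) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        try (JsonParser parser = mapper.getFactory().createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("expected a top-level JSON array");
            }
            // Bind one element at a time; only a single POJO is in memory at once.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                POJOClass item = mapper.readValue(parser, POJOClass.class);
                handle(item); // process, then let it become garbage
            }
        }
    }

    private static void handle(POJOClass item) {
        // application-specific work goes here
    }
}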

How to put a limit on memory usage while parsing a JSON string?

How can I rule out a java.lang.OutOfMemoryError when calling
new JSONObject(longAndMalformedJSONString)
I'm using the org.json implementation of a JSON parser.
I'm not looking for a way to decode the bad JSON String. I just want to put an upper limit on memory usage (and possibly CPU usage) and maybe get an exception that I can recover from.
Or, alternatively, is it safe to say that memory usage while parsing will never exceed a certain ratio relative to the input string length? Then I could just limit that.
Or is there an alternate library that offers that?
There are two approaches when reading serialized data (JSON, XML, whatever): either you parse the entire input and keep the resulting object in memory, or you navigate the stream via the provided API and keep only the pieces you are interested in. It seems org.json doesn't have a streaming API, but more sophisticated libraries like Gson do:
Gson gson = new Gson();
JsonReader reader = new JsonReader(new InputStreamReader(in, "UTF-8"));
List<Message> messages = new ArrayList<Message>();
reader.beginArray();
while (reader.hasNext()) {
    Message message = gson.fromJson(reader, Message.class);
    messages.add(message);
}
reader.endArray();
reader.close();
You can also put limits on the input itself, but that depends on the protocol you use for transferring the JSON payload. One simple guard is sketched below.
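An illustrative sketch of such a guard (not an org.json feature) that caps the bytes before the parser ever sees them; if you already hold the input as a String, a plain length check does the same job, and Commons IO ships a similar BoundedInputStream that returns EOF instead of throwing:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical guard: fails fast once the payload exceeds a fixed byte budget.
public class SizeCappedInputStream extends FilterInputStream {
    private long remaining;

    public SizeCappedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0 && --remaining < 0) {
            throw new IOException("input exceeds size limit");
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && (remaining -= n) < 0) {
            throw new IOException("input exceeds size limit");
        }
        return n;
    }
}

Note this only bounds input size; it helps because a parser's working memory is typically proportional to input length, but it is not a hard cap on heap usage.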

InputBuffer in Jackson parser

I use Jackson for parsing JSON files. The file is passed as a stream to Jackson to create a parser. Here is the sample code :
JsonFactory f = new JsonFactory();
JsonParser parser = f.createParser(inputStream);
I know that createParser() prefetches data from the stream into an input buffer, and subsequent calls to nextToken() are served from this buffer. In my application, along with parsing, I also want to track the offset in the inputStream up to which I have consumed data. Because of the buffering, offset tracking against the raw stream does not work.
Does anyone know if there is a way to disable buffering in Jackson? Or, is there an API call that I can use to determine if the buffer has data that has not yet been consumed?
Why not use JsonParser.getTokenLocation() or JsonParser.getCurrentLocation() to keep track of the file offset?
The returned JsonLocation has the byte position (in addition to the character position), which should be the position in the underlying input stream:
http://fasterxml.github.io/jackson-core/javadoc/2.2.0/com/fasterxml/jackson/core/JsonParser.html#getCurrentLocation()
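A small sketch of that approach; JsonLocation.getByteOffset() reports the position in the underlying InputStream regardless of the parser's internal read-ahead (it returns -1 when the input is character-based rather than byte-based):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.io.InputStream;

public class OffsetTracker {
    public static void scan(InputStream inputStream) throws IOException {
        try (JsonParser parser = new JsonFactory().createParser(inputStream)) {
            JsonToken token;
            while ((token = parser.nextToken()) != null) {
                long start = parser.getTokenLocation().getByteOffset();  // start of current token
                long end = parser.getCurrentLocation().getByteOffset();  // just past the token
                System.out.printf("%-12s bytes %d..%d%n", token, start, end);
            }
        }
    }
}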

JSON variable substitution placeholders

I'm looking for a Java library that can do variable substitution on the fly when unmarshaling JSON to an object.
For example, the Json template would have variable substitution sites/placeholders like:
{
  "User": {
    "Name": "${name}",
    "Age": ${age}
  }
}
that would result in a Java object representing the following once unmarshaled:
{
  "User": {
    "Name": "Elvis",
    "Age": 80
  }
}
What I want is something along the lines of this:
ObjectMapper mapper = new ObjectMapper();
User user = mapper.readValue(new File("c:\\user.json.template"), User.class, "Elvis", 80);
This is really out of scope for JSON libraries, since the JSON format itself has no support for, or notion of, variable substitution. Your best bet may be to use a JSON library (like Jackson) to get a tree representation (for Jackson that would be JsonNode), traverse it, and use another library for the textual substitution. There are many that can do that, from StringTemplate to others (perhaps the MessageFormat that another answer refers to).
It may also be possible to reverse the order, if your substituted values will never contain "funny" characters (quotes, line feeds): run the string-templating library first, then feed the processed text to the JSON parser.
But that is a bit riskier, as there is usually eventually one case where you do end up substituting in a quote, say, and then parsing fails.
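A rough sketch of the tree-based approach (the substitute() helper and variable map are made up for illustration; note the template's placeholders must be quoted, e.g. "${age}", for the template itself to parse as JSON):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class JsonTemplating {
    // Wherever a text node is exactly "${key}", swap in the node from 'vars'.
    static JsonNode substitute(JsonNode node, Map<String, JsonNode> vars) {
        if (node.isTextual()) {
            String text = node.asText();
            if (text.startsWith("${") && text.endsWith("}")) {
                JsonNode replacement = vars.get(text.substring(2, text.length() - 1));
                if (replacement != null) {
                    return replacement; // may change type, e.g. text -> number
                }
            }
            return node;
        }
        if (node.isObject()) {
            ObjectNode obj = (ObjectNode) node;
            List<String> names = new ArrayList<>();
            obj.fieldNames().forEachRemaining(names::add);
            for (String name : names) {
                obj.set(name, substitute(obj.get(name), vars));
            }
        } else if (node.isArray()) {
            ArrayNode arr = (ArrayNode) node;
            for (int i = 0; i < arr.size(); i++) {
                arr.set(i, substitute(arr.get(i), vars));
            }
        }
        return node;
    }
}

Usage would then be roughly:

ObjectMapper mapper = new ObjectMapper();
JsonNode tree = mapper.readTree(new File("c:\\user.json.template"));
JsonNode resolved = JsonTemplating.substitute(tree, Map.of(
        "name", mapper.getNodeFactory().textNode("Elvis"),
        "age", mapper.getNodeFactory().numberNode(80)));
User user = mapper.treeToValue(resolved, User.class);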
You can use a template engine such as Apache Velocity to preprocess the input stream, and then parse the result with a JSON parser. To make the process on-the-fly, you can run Velocity in a separate thread and let it write its output to a PipedOutputStream.
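A sketch of that wiring (assuming the Velocity dependency; User is the class from the question, and the context would hold ctx.put("name", "Elvis") and ctx.put("age", 80)):

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.io.Writer;

public class TemplatePipeline {
    public static User render(String templatePath, VelocityContext ctx) throws IOException {
        PipedInputStream jsonIn = new PipedInputStream();
        PipedOutputStream templatedOut = new PipedOutputStream(jsonIn);

        // Velocity substitutes ${name}/${age} and writes the JSON on one thread...
        new Thread(() -> {
            try (Writer w = new OutputStreamWriter(templatedOut);
                 Reader template = new FileReader(templatePath)) {
                Velocity.evaluate(ctx, w, "json-template", template);
            } catch (Exception e) {
                e.printStackTrace(); // real code should surface this to the caller
            }
        }).start();

        // ...while Jackson parses the substituted JSON on this thread.
        return new ObjectMapper().readValue(jsonIn, User.class);
    }
}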
Maybe the java.text.MessageFormat class could help?
Here is an example : http://examples.javacodegeeks.com/core-java/text/messageformat/java-messageformat-example/

Jackson JSON Streaming API: Read an entire object directly to String

I'm trying to stream in an array of JSON objects, object by object, but I need to capture each one as a raw JSON String.
Given an array of input like so:
[
  {"object":1},
  {"object":2},
  ...
  {"object":n}
]
I am trying to iterate through the Strings:
{"object":1}
{"object":2}
...
{"object":n}
I can navigate the structure using the streaming API to validate that I have encountered an object, and all that, but I don't think the way I'm getting my String back is ideal.
Currently:
//[...]
//we have read a START_OBJECT token
JsonNode node = parser.readValueAsTree();
String jsonString = anObjectMapper.writeValueAsString(node);
//as opposed to String jsonString = node.toString() ;
//[...]
I imagine building the whole JsonNode structure involves a bunch of overhead, which is pointless if I'm just reserializing, so I'm looking for a better solution. Something along these lines would be ideal:
//[...]
//we have read a START_OBJECT token
String jsonString = parser.readValueAsString()
//or parser.skipChildrenAsString()
//[...]
The objects are obviously not as simple as
{"object":1}
which is why I'm looking to avoid wasting time on pointless node building. There may be some ideal way involving mapping the content to objects and working with that, but I am not in a position to do that. I need the raw JSON String, one object at a time, to work with existing code.
Any suggestions or comments are appreciated. Thanks!
EDIT: parser.getText() returns only the current token as text (e.g. START_OBJECT -> "{"), not the rest of the object.
EDIT 2: The motivation for using the Streaming API is to stream the objects in one by one. The actual JSON files can be quite large, and each object can be discarded after use, so I simply need to iterate through.
There is no way to avoid JSON tokenization (otherwise the parser wouldn't know where objects start and end), so it will always involve some level of parsing and generation.
But you can reduce overhead slightly by reading values as a TokenBuffer -- it is Jackson's internal type with the lowest memory/performance overhead (and is used internally whenever things need to be buffered):
TokenBuffer buf = parser.readValueAs(TokenBuffer.class);
// write straight from the buffer if you have a JsonGenerator
jgen.writeObject(buf);
// or, if you must, convert to byte[] or String
byte[] stuff = mapper.writeValueAsBytes(buf);
We can do a bit better, however: if you can create a JsonGenerator for the output, just use JsonGenerator.copyCurrentStructure(JsonParser):
jgen.copyCurrentStructure(jp); // points to END_OBJECT after copy
This avoids all object allocation; and although it still has to decode the JSON and encode it back, it is quite efficient.
And you can in fact use this even for transcoding -- reading JSON and writing XML/Smile/CSV/YAML/Avro -- between any formats Jackson supports.
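Putting that together for the array in the question, a sketch that yields each element as a raw JSON String (handle() stands in for the existing code that consumes the String):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;

public class RawElementIterator {
    public static void forEachRawObject(InputStream in) throws IOException {
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(in)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IOException("expected a top-level JSON array");
            }
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                StringWriter sw = new StringWriter();
                try (JsonGenerator gen = factory.createGenerator(sw)) {
                    gen.copyCurrentStructure(parser); // leaves parser at END_OBJECT
                }
                handle(sw.toString()); // e.g. {"object":1}
            }
        }
    }

    private static void handle(String rawJson) {
        // pass to the existing code that expects a raw JSON String
    }
}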
