Jackson multiple objects and huge json files - java

I get the feeling that the answer might be a duplicate of this: Jackson - Json to POJO With Multiple Entries but I think that potentially the question is different enough. Also I'm using raw data binding rather than full data binding.
So like the asker of that question, I have multiple objects in a file and I'm trying to turn them into POJOs and stuff them into a database of my design so I can access the data quickly rather than slowly.
The files here are in the order of tens of GB, with up to millions of objects in each file. Anyway here is what I have so far:
ObjectMapper mapper = new ObjectMapper();
Map<String,Object> data = mapper.readValue(new File("foo.json"), Map.class);
System.out.println(data.get("bar"));
And this works great for printing the bar element of the first object in foo, but I need a way to iterate through every element in a way that won't eat up all my memory.
Thanks.

You don't have to choose between streaming (JsonParser) and ObjectMapper; use both.
Traverse a bit with the parser, then call JsonParser.readValueAs(MyType.class) to bind an individual JSON object.
Or call ObjectMapper's readValue() method, passing the JsonParser at the appropriate points. Or use ObjectMapper.reader(Type.class).readValues() and iterate that way.

Use this code sample to see the basic idea.
final InputStream in = new FileInputStream("json.json");
try {
    // readValues() returns a lazy iterator, so objects are bound one at a time
    for (Iterator<Map> it = new ObjectMapper().readValues(
            new JsonFactory().createJsonParser(in), Map.class); it.hasNext();) {
        System.out.println(it.next());
    }
} finally {
    in.close();
}

Assuming you have an array wrapping your objects, create a JsonParser and then call readValuesAs with the appropriate type. It gives you back an Iterator with all your objects that reads through the file as you consume the objects.
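As a rough sketch of the "do both" idea for a file whose top level is one big JSON array: the parser steps past the opening bracket, and the mapper binds one element at a time, so only the current object is held in memory. This assumes Jackson 2.x (the com.fasterxml.jackson packages) is on the classpath; the class name StreamingArrayReader and the method forEachObject are made up for illustration.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper, not part of Jackson.
public class StreamingArrayReader {

    /** Binds each element of a top-level JSON array to a Map and hands it to the sink. */
    public static int forEachObject(Reader source, Consumer<Map<String, Object>> sink) {
        ObjectMapper mapper = new ObjectMapper();
        int count = 0;
        try (JsonParser parser = mapper.getFactory().createParser(source)) {
            // Step past the opening '[' explicitly.
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalArgumentException("Expected the document to start with a JSON array");
            }
            // After each readValue() the parser sits on the END_OBJECT of the bound element,
            // so nextToken() yields either the next START_OBJECT or the closing END_ARRAY.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                @SuppressWarnings("unchecked")
                Map<String, Object> obj = mapper.readValue(parser, Map.class);
                sink.accept(obj);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String json = "[{\"bar\":1},{\"bar\":2},{\"bar\":3}]";
        int n = forEachObject(new StringReader(json), obj -> System.out.println(obj.get("bar")));
        System.out.println(n + " objects read");
    }
}
```

For the multi-gigabyte case, the StringReader would be replaced by a buffered reader over the file, and the sink would write each object to the database instead of printing it.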

Related

Get all JSON keys in the order they are defined in the file

I am trying to get an array of all the json keys found in a json file loaded with Google Gson. I have used SomeJsonObject.keySet() in the past, but sets do not preserve the order of their contents.
What I am looking for is something like below:
JsonElement schema_pre = new JsonParser().parse(new FileReader("SchemaFile.json"));
JsonObject schema = gson.fromJson(schema_pre, JsonObject.class);
String[] keys_all = schema.keyArray();
From JSON RFC7159:
An object is an unordered collection of zero or more name/value pairs [...]
Consequently, the API you are asking for would return more information than actually contained in the JSON object. This behaviour would therefore not comply with the standard.
If you really need an ordering of your JSON objects, you can always express the information in a JSON compliant way by using arrays instead of objects:
{"first":1, "second":2} // Unordered
[{"key":"first","value":1}, {"key":"second","value":2}] // Ordered
Another motivation for expressing the information as arrays is that keys might change. Most NoSQL databases (e.g. MongoDB) are capable of creating indexes on object attributes, and the normalization above is usually the best way to go, even when an ordering is not required.
However, if desired, the map you are looking for can still be created as a temporary index for efficient access of JSON objects by looking them up using a specific key.
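To make the "ordered array plus temporary index" idea concrete, here is a minimal sketch using only JDK classes. The list plays the role of the ordered JSON array of key/value pairs, and the map is the throwaway index for fast lookup. The names OrderedPairs and indexByKey are invented for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the array-of-pairs representation.
public class OrderedPairs {

    /** Builds a lookup from key to position in the ordered list (first occurrence wins). */
    public static Map<String, Integer> indexByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < pairs.size(); i++) {
            index.putIfAbsent(pairs.get(i).getKey(), i);
        }
        return index;
    }

    public static void main(String[] args) {
        // The list keeps the order in which the pairs were defined.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("first", 1));
        pairs.add(new SimpleEntry<>("second", 2));

        // The index is derived data: it can be rebuilt from the list at any time.
        Map<String, Integer> index = indexByKey(pairs);

        for (Map.Entry<String, Integer> pair : pairs) {
            System.out.println(pair.getKey() + "=" + pair.getValue());
        }
        System.out.println(pairs.get(index.get("second")).getValue());
    }
}
```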
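Consider JsonReader, it reads from beginning to end, so the names come back in file order:
JsonReader reader = new JsonReader(new FileReader("SchemaFile.json"));
List<String> keys = new ArrayList<>();
reader.beginObject(); // hasNext() is only meaningful inside an object or array
while (reader.hasNext()) {
    keys.add(reader.nextName());
    reader.skipValue(); // consume the value before asking for the next name
}
reader.endObject();
reader.close();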

Explicit type conversion between child Spring object, and its super java.util object

In Spring I am using JdbcTemplate, but I have the problem that it returns a LinkedCaseInsensitiveMap when querying for a List. When doing the following I still get the Spring LinkedCaseInsensitiveMap, even if I cast the result to java.util.List and declare the left-hand side of the assignment as a java.util.List.
Firstly, how is that even possible?
final java.util.List<Map<String, Object>> list = (java.util.List<Map<String, Object>>) jdbc
.queryForList("SELECT * FROM customer");
So how would one achieve this type of upcast
without needing to declare a second list, allocate memory for it, and then put the objects manually into the java.util.List?
Since LinkedCaseInsensitiveMap subclasses the Java object, I'm having a hard time figuring out how to cast to the super object, which is the Java List. How to achieve this is a mystery at the moment.
Since there is currently no way to know which brokers will use our AMQ, the goal is to keep strictly to JMS objects.
So I can't start sending Spring objects, since JMS should be our standard. Also, please note I do not have the option to implement the AMQ protocol; I need to send basic Java objects.
Since serialising to JSON has been suggested, I will explain why it does not work in this case: I need to send the objects as they are to the receiver, since they will put them into a Notes document.
for (int i = 1; i <= metadata.getColumnCount(); i++) {
String columnName = metadata.getColumnName(i);
Object values = sqlConnection.getRset().getObject(i);
doc.replaceItemValue(columnName, values);
}
So SO'ers, how does one achieve doing this more beautifully?
please help
thanks in advance!
A SQL select can return multiple rows, and each row has multiple selected columns.
The queryForList method you are calling returns a List containing, for each selected row, a Map from column name to column value.
Map and List are interfaces, so Spring is free to pick whatever implementation it likes. It chooses LinkedCaseInsensitiveMap for the Map because that map lists the keys in the order of insertion, so the order in which the columns were selected does not get lost.
If you wish to send the result list to a receiver that you know little about, you can probably best serialize it to JSON and send it as a text message.
You can serialize to JSON using a library like Gson or Jackson2. You create a serializer and feed it the object you wish to convert to a String.
So for example in Gson, where the serializer class is called Gson:
TextMessage message;
// initialize message and headers however you like
// then serialize it to String:
Gson gson = new Gson();
String json = gson.toJson(list);
// and set it in the message:
message.setText(json);
(You can also let Spring JmsTemplate do this for you using a MessageConverter that converts to JSON but I'd estimate that that's a bit harder to get working.)
Alternatively, if you wish to customize the Map that you send as an ObjectMessage, you can use a different query method that allows you to specify a custom RowMapper that creates a java.util.Map implementation of your liking. Note that if you use a TreeMap, it'll sort the columns alphabetically, and if you use a HashMap, it'll put them in no guaranteed order.
If you go the JSON route, the receiver can then unpack the JSON back into Java objects. gson.fromJson(json, List.class) will return a List of Maps.
This is the only way I've found to turn a child object into its parent class without breaking out into methods. In my described scenario it looks like this:
final java.util.ArrayList<java.util.Map<String, Object>> javaList = new java.util.ArrayList<java.util.Map<String, Object>>();
final java.util.List<java.util.Map<String, Object>> list = jdbc
.queryForList("SELECT * FROM customer");
javaList.addAll(list);
But this doesn't look good to me. How would one achieve this in a cleaner way?
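One thing worth checking with the addAll approach above: it copies references, so the outer list becomes a plain ArrayList but each row is still the same map object, with the same implementation class. If the receiver must only ever see JDK classes, each row has to be copied as well. The following is a minimal sketch using only java.util; VendorMap is a made-up stand-in for a library-specific Map subclass (it is not the Spring class), and PlainRows and toPlainRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; uses only JDK collection classes.
public class PlainRows {

    /** Stand-in for a library-specific Map subclass (illustrative only, not a Spring class). */
    static class VendorMap<V> extends LinkedHashMap<String, V> { }

    /** Copies every row into a java.util.LinkedHashMap, keeping the column order. */
    public static List<Map<String, Object>> toPlainRows(List<? extends Map<String, Object>> rows) {
        List<Map<String, Object>> plain = new ArrayList<>(rows.size());
        for (Map<String, Object> row : rows) {
            plain.add(new LinkedHashMap<>(row));
        }
        return plain;
    }

    public static void main(String[] args) {
        VendorMap<Object> row = new VendorMap<>();
        row.put("ID", 1);
        row.put("NAME", "Ada");
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);

        // Copying only the outer list keeps the original row objects.
        List<Map<String, Object>> shallow = new ArrayList<>(rows);
        System.out.println(shallow.get(0).getClass().getSimpleName());           // prints "VendorMap"

        // Copying each row yields plain JDK maps.
        System.out.println(toPlainRows(rows).get(0).getClass().getSimpleName()); // prints "LinkedHashMap"
    }
}
```

Note that no cast can change the runtime class of an object: an upcast only changes the static type the compiler sees, which is why the original cast to java.util.List had no visible effect.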

Jackson - Working with arrays of objects, appending and removing

I'm working with the Jackson API in Java for dealing with JSON. I've been working with it a bit here and there, but nothing too in-depth.
Currently, I'm looking for a good way to take an array of JSON objects (either via a stream or String) that was created from a list of POJOs and append or remove a POJO. In the case of appending, duplicate checking isn't really necessary. As a simple example, let's say I have this array built from a list of Java objects with a single variable named "field":
[{"field":"value"},{"field":"value2"}]
And I'd like to append an object of the same type with "field" set to "value3". I could simply deserialize the whole array into a List of Java Objects, add the new object, then serialize it back into JSON, but that feels like overkill. It would be better if I could use Jackson to simply serialize the new object and append it to the end of the JSON array. The same would apply to removing an existing object from the array.
I've found a way, but strangely, it's over twice as slow as the direct deserialize-add-reserialize method with a list of 500 POJOs that have three fields each, and it only gets worse with more objects.
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getJsonFactory().createJsonParser(input);
JsonGenerator gen = mapper.getJsonFactory().createJsonGenerator(output, JsonEncoding.UTF8);
gen.writeStartArray();
parser.nextToken();
while (parser.nextToken() == JsonToken.START_OBJECT) {
//gen.writeTree(parser.readValueAsTree());
//parser.skipChildren();
//EDIT: This is much faster as the only method in the loop:
gen.copyCurrentStructure(parser);
}
gen.writeTree(mapper.valueToTree(/*new Object to add*/));
gen.writeEndArray();
gen.close();
parser.close();
Even if I don't get each object as a tree and instead move them iteratively as fields/values, it's a bit faster, but still considerably slower than the alternative. Is this to be expected or is there a better way to handle it as streaming data rather than the bulk JSON-to-Java-to-JSON method?
EDIT: AHA! Found that the JsonGenerator can directly copy the current structure from a JsonParser with copyCurrentStructure(JsonParser). Using this in the while loop is faster and now outruns the brute-force method by a considerable amount.

Jackson handling Wrapped elements

I'm parsing the response from last.fm API. But it seems that they used some wrapper for some of the responses, which is causing a bit of a pain. To put an example:
{
"artists":{
"artist":[
{
"name":"Coldplay",
"playcount":"816763",
"listeners":"120815",
"mbid":"cc197bad-dc9c-440d-a5b5-d52ba2e14234",
"url":"http:\/\/www.last.fm\/music\/Coldplay",
"streamable":"1"
},
{
"name":"Radiohead",
"playcount":"846668",
"listeners":"99135",
"mbid":"a74b1b7f-71a5-4011-9441-d0b5e4122711",
"url":"http:\/\/www.last.fm\/music\/Radiohead",
"streamable":"1"
}
],
"@attr":{
"page":"1",
"perPage":"2",
"totalPages":"500",
"total":"1000"
}
}
}
Not only is the response wrapped in the artists object, but the array of objects also has an object wrapper.
So a wrapper class like:
public class LastFMArtistWrapper {
public List<Artist> artists;
}
Would not work. I worked around this by creating two wrapper classes, but this looks really ugly. Is there any way we can use something like @XmlElementWrapper in Jackson?
The JSON response you are getting back from the provider is a serialized representation of a hierarchy of different objects, but from your description, it sounds like you really only need to use and work with a specific subset of this representation, the collection of artists.
One solution of mirroring this representation involves creating the same hierarchy of Java classes, which creates extra overhead in the form of unneeded classes. From what I understand, this is what you wish to avoid.
The org.json project created a generic JSONObject class, which represents a generic collection of key/value pairs within a larger JSON representation. A JSONObject can contain other JSONObjects and JSONArrays, mirroring the representation without the extra overhead of maintaining and writing extra classes.
Thus, these two objects can be reused throughout multiple layers of hierarchy in a JSON representation, without requiring you to replicate the structure. Here is an example of how you could proceed:
// jsonText is the string representation of your JSON
JSONObject jsonObjectWrapper = new JSONObject(jsonText);
// get the "artists" object
JSONObject jsonArtists = jsonObjectWrapper.getJSONObject("artists");
// get the "artist" array and pass its JSON text to Jackson's ObjectMapper, using
// TypeReference to deserialize the JSON array to a Java ArrayList.
List<Artist> artists = objectMapper.readValue(
jsonArtists.getJSONArray("artist").toString(),
new TypeReference<ArrayList<Artist>>() { });
Using the above method, you cut out the extra overhead of having to write extra layers of POJO objects that do nothing but add unnecessary clutter.
TestCollectionDeserialization contains some examples of the readValue method when working with collections and may be helpful.
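The same unwrapping can also be sketched with Jackson alone, swapping org.json for Jackson's own tree model (JsonNode), so the JSON is parsed only once. This assumes Jackson 2.x on the classpath. The Artist class below is a cut-down guess at the POJO from the question, and ArtistExtractor and parseArtists are invented names.

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Jackson or the last.fm API.
public class ArtistExtractor {

    /** Assumed shape of the POJO; only two of the fields from the sample response. */
    public static class Artist {
        public String name;
        public String playcount;
    }

    /** Walks past the "artists" and "artist" wrappers and binds the inner array. */
    public static List<Artist> parseArtists(String json) {
        // Ignore fields the cut-down Artist class does not declare (mbid, url, ...).
        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        try {
            // path() returns a "missing" node instead of null when a field is absent.
            JsonNode artistArray = mapper.readTree(json).path("artists").path("artist");
            if (!artistArray.isArray()) {
                return Collections.emptyList();
            }
            return mapper.convertValue(artistArray, new TypeReference<List<Artist>>() { });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"artists\":{\"artist\":[{\"name\":\"Coldplay\",\"playcount\":\"816763\"}]}}";
        for (Artist artist : parseArtists(json)) {
            System.out.println(artist.name + " " + artist.playcount);
        }
    }
}
```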

What is the most efficient way to use Jackson to continually parse a bunch of strings?

Let's say I have an iterator of Strings. I want to create an Iterator of Java objects, and efficiently convert from one to the other. I'm not sure what the best way to do this is...the docs I've seen seem to create a new parser per String, but I'm not sure if there is an easier way?
Thanks!
Usually I would recommend just creating a new JsonParser per String (and it does work), but if the JSON Strings are very short, an alternative is to create the equivalent of a StringReader that works on a List or array of Strings: something like java.io.SequenceInputStream, but one that works on Strings.
This should have somewhat lower overhead, as long as you take care NOT to concatenate the Strings, and instead present a Reader over the equivalent of the concatenated sequence.
Jackson can then read a sequence of JSON values from such a Reader, either explicitly one by one or, more conveniently, using ObjectMapper.readValues(...) (or the methods of ObjectReader, an instance of which you can create using the various factory methods ObjectMapper has). Something like:
ObjectMapper mapper = new ObjectMapper();
MyReader reader = new MyReader(listOfStrings);
MappingIterator<BeanType> it = mapper.reader(BeanType.class).readValues(reader);
while (it.hasNext()) {
BeanType bean = it.nextValue();
}
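MyReader in the snippet above is not a Jackson or JDK class; it has to be written. A minimal sketch of such a Reader, using only the JDK, might look like the following. The class name StringSequenceReader is made up, and the newline emitted after each String is a design choice of this sketch (so that adjacent values such as two bare numbers cannot run together), not something the answer prescribes.

```java
import java.io.Reader;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical Reader that presents a sequence of Strings as one character stream without concatenating them. */
public class StringSequenceReader extends Reader {
    private final Iterator<String> parts;
    private String current = "";
    private int pos = 0;
    private boolean separatorPending = false;

    public StringSequenceReader(Iterable<String> strings) {
        this.parts = strings.iterator();
    }

    @Override
    public int read(char[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int written = 0;
        while (written < len) {
            if (pos < current.length()) {
                // Copy as much of the current String as fits into the caller's buffer.
                int n = Math.min(len - written, current.length() - pos);
                current.getChars(pos, pos + n, buf, off + written);
                pos += n;
                written += n;
            } else if (separatorPending) {
                // Whitespace after each String keeps adjacent root values apart.
                buf[off + written++] = '\n';
                separatorPending = false;
            } else if (parts.hasNext()) {
                current = parts.next();
                pos = 0;
                separatorPending = true;
            } else {
                break; // all Strings consumed
            }
        }
        return written == 0 ? -1 : written;
    }

    @Override
    public void close() {
        // Nothing to release: the Strings are owned by the caller.
    }

    public static void main(String[] args) {
        StringSequenceReader reader =
                new StringSequenceReader(Arrays.asList("{\"a\":1}", "{\"a\":2}"));
        char[] buf = new char[8];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            System.out.print(new String(buf, 0, n));
        }
    }
}
```

An instance of this class could then be passed where MyReader appears above. Null elements in the input are not handled in this sketch.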
