I am working with a nasty API that returns complex JSON (more than 4,200 lines), including multidimensional arrays.
Some objects are repeated in different locations of the JSON.
For example:
"User":{
"$id": "9",
"Code": "NU",
"DisplayName": "My Name",
"Experience": 2.41
},
Is there an easy way to parse the entire JSON file and find the list of Users?
Sometimes User is at the top level, and sometimes it is nested in a four-dimensional array.
Short Answer
No, you will need to do some work to achieve your goal.
Some Details
This problem is not hopeless.
You can use an event-based JSON parser and ignore everything that is not related to the User object.
Here is a related Stack Overflow question with answers.
Also, try a Google search for "json sax parser java"; you will find plenty of links.
Another option might be an XPath-style query language for JSON, such as JSONPath.
Try a Google search for "json xpath java".
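If the document fits in memory, a plain recursive walk is a simpler alternative to the event-based parser suggested above. The sketch below is not the SAX-style approach; it assumes the JSON has already been parsed into nested java.util.Map and java.util.List values (most JSON libraries can produce such a tree), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: works on a tree of Map/List values, not on raw JSON text.
class UserFinder {
    // Collects every object stored under the given key, at any depth.
    static List<Map<?, ?>> findAll(Object node, String key) {
        List<Map<?, ?>> found = new ArrayList<>();
        collect(node, key, found);
        return found;
    }

    private static void collect(Object node, String key, List<Map<?, ?>> found) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            for (Map.Entry<?, ?> entry : map.entrySet()) {
                Object value = entry.getValue();
                if (key.equals(entry.getKey()) && value instanceof Map) {
                    found.add((Map<?, ?>) value);
                }
                // Keep descending: the key may also occur deeper in the tree.
                collect(value, key, found);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collect(item, key, found);
            }
        }
        // Strings, numbers, booleans and null are leaves; nothing to do.
    }
}
```

Calling UserFinder.findAll(root, "User") returns the User objects whether they sit at the top level or inside nested arrays. Repeated objects are returned once per occurrence, so de-duplicating (for example by "$id") would be a separate step.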
I have the following JSON list:
[
{
"name": "John Doe The First",
"age": 36
},
{
"name": "John Doe The Second",
"age": 10
}
]
And the following Java record:
public record Person(String name, int age) {}
Is it possible to make Jackson skip JSON entries matching a condition, such as age under 18, without writing a Consumer and filtering out these JsonNodes as indicated in this answer?
What I want is described in the third (and so far last) comment on that same answer.
EDIT:
Based on Hiran Chaudhuri's answer, it makes perfect sense that the parser cannot avoid parsing the entire file, which leads me to a second question: isn't it possible for Jackson, once the JsonNode has been converted into a POJO, to filter entries out based on the described conditions?
Something similar to what @JsonFilter does, but without forcing developers to write a filter over a Stream backed by the returned array or to create a whole new filtered list; the returned array would contain just what is needed.
What you are asking is to filter the entries returned from parsing the file, but you want to perform the filtering before the parser (Jackson) has had a chance to read the file. This is not possible.
You would have to scan the input stream and detect whether the person has the right age. If not, you'd want to skip that entry and resume parsing at the next entity. But how would you know how many bytes to skip? All of that is the work of a parser.
You'd better parse the JSON and perform your filtering afterwards, before the application starts processing it.
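A minimal sketch of "parse first, filter afterwards", assuming Jackson (or any other library) has already turned the JSON array into a List of Person records. The class name AdultFilter and the method atLeast are invented for this example; only the Person record comes from the question.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that runs after deserialization has finished.
class AdultFilter {
    // Same shape as the record in the question.
    record Person(String name, int age) {}

    // Keeps only the people at or above the given minimum age.
    static List<Person> atLeast(List<Person> parsed, int minAge) {
        return parsed.stream()
                .filter(p -> p.age() >= minAge)
                .collect(Collectors.toList());
    }
}
```

With the two entries from the question and a minimum age of 18, only "John Doe The First" remains. The parser still reads every entry; the filter only decides what the rest of the application gets to see.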
I have the following HATEOAS style JSON response:
{
"count": 37,
"next": "https://swapi.co/api/species/?page=2",
"previous": null,
"results": [
...
]
}
I have to read all results from that endpoint. However, the only way to do this is to go through the pages; there is no endpoint that returns the whole result as one page.
My idea to solve this is to simply get the next field from the response and repeat until "next" is null.
Is there a better way offered in Java or Gson?
If you already know the total number of elements (count=37; total=100; ...), you can fetch the results simply by pagination. As we can see, only the previous and next fields are provided, so you have to keep fetching next until its value becomes null. Java and Gson have nothing to do with it; it all depends on how much information you get from the backend.
My idea to solve this is to simply get the next field from the response and repeat until "next" is null.
Your approach seems legit.
Unless you can provide a query parameter such as limit or something similar, I don't think there is another way. As ruhul stated, your approach seems good.
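The "follow next until it is null" loop can be sketched as follows. The HTTP call and the JSON decoding are deliberately left out: fetchPage is a hypothetical function supplied by the caller that downloads one URL and decodes it (for example with Gson) into a Page. Page and PageWalker are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; the caller supplies the code that fetches and decodes one page.
class PageWalker {
    // One decoded page: its results plus the URL of the next page (null on the last page).
    static final class Page<T> {
        final List<T> results;
        final String next;

        Page(List<T> results, String next) {
            this.results = results;
            this.next = next;
        }
    }

    // Follows "next" links starting at firstUrl until a page reports next == null.
    static <T> List<T> fetchAll(String firstUrl, Function<String, Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        String url = firstUrl;
        while (url != null) {
            Page<T> page = fetchPage.apply(url);
            all.addAll(page.results);
            url = page.next;
        }
        return all;
    }
}
```

Passing the fetcher in as a function keeps the loop independent of the HTTP client and makes it easy to exercise with canned pages.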
I have a really big JSON file to read and store in a database. I am using a mix of Gson's streaming and object modes. If the file format is correct, it works like a charm, but if the format is incorrect within one object, the whole file is skipped with an exception (reader.hasNext() throws an exception).
Is there a way to skip a particular bad record and continue reading the rest of the file?
Sample JSON file structure:
[{
"A":1,
"B":2,
"C":3
}]
Let's say a comma or colon is missing in this object.
Another example: there are multiple objects and the comma between two of them is missing, as in }{.
Let's say a comma or colon is missing in this object
Unfortunately if you're missing a comma or a colon, then it's impossible to parse the JSON data.
But it's actually a good thing that the parser doesn't accept this data, because it protects you from accidentally reading garbage. Since you are putting this data into a database, it's protecting you from potentially filling your database with garbage.
I believe the best solution is to fix the producer of this JSON data and implement the necessary safeguards to prevent bad JSON data in the future.
I have some json data
{
"attributesMappings": [
{
"domainType": "WI",
"attribute": [
{
"staticAttributes": [
{
"attributeName": "test",
"attributeValue": "test",
"required": true
}
]
}
]
},
{
"domainType": "PI",
"attribute": null
}
]
}
I can read the object using
JSONArray vendorData = mainObj.getJSONArray("attributesMappings");
Suppose I want only the object where domainType = "WI". I know it can be done using
JSONObject obj = vendorData.getJSONObject(0);
Then I can perform the manipulations. But suppose I don't know at which index "WI" is stored; is there a way of getting the data? I know we can iterate over the array items and match on domainType.
Can we pass "WI" to getJSONObject, or something of that sort, to get the complete object? For example:
JSONObject domainType = attributeMappings.getJSONObject("WI");
AFAIK, there is no more efficient way of searching a JSON object tree than iterating it. But the cost of searching will actually be small compared with the cost of parsing the object tree.
You could potentially do better than the "parse to JSONObject" using a stream-based parser, and coding your parse event handlers to look for the information you are trying to extract. If the information you are looking for is near the beginning of the JSON serialization, you could save time by abandoning the parse as soon as you get a search "hit".
If you are only doing one search of the JSON, that is the end of the story.
If you are going to search the same JSON repeatedly, then the way to get better performance is:
Parse the JSON tree, or map it to a POJO tree
Construct a separate index data structure for the in-memory tree.
Use the index to speed up searches.
So, in your example you might build an index for all attribute mappings based on the domainType field.
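A sketch of such an index, under the assumption that the attributesMappings array has already been converted into a List of Maps (one Map per JSON object). DomainTypeIndex and build are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: indexes already-parsed mapping objects by their domainType.
class DomainTypeIndex {
    // Builds a lookup table from domainType to the mapping object that carries it.
    static Map<String, Map<String, Object>> build(List<Map<String, Object>> mappings) {
        Map<String, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> mapping : mappings) {
            Object type = mapping.get("domainType");
            if (type instanceof String) {
                // If a domainType occurs twice, the first occurrence wins.
                index.putIfAbsent((String) type, mapping);
            }
        }
        return index;
    }
}
```

Building the index costs one pass over the array; after that, index.get("WI") is a constant-time lookup, which only pays off when the same data is searched more than once.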
Alternatively, extract just the information you want into a data structure that is designed for your needs.
There are libraries around that do the equivalent of XQuery and XPath for JSON. This approach is definitely more convenient than writing a bunch of iteration code; e.g. (from @cricket_007's comment):
The JSONPath query [for your example] would be $.attributesMappings[?(@.domainType == "WI")]
For more information: Query a JSONObject in java
However, I would be hesitant to use "JSON query" libraries if you are looking for a more efficient solution.
Consider the following (simplified) JSON tree structure:
{
"id": "1",
"metaData": {
"name": "nestedName"
},
"name": "rootName"
}
I put this structure in a com.fasterxml.jackson.databind.JsonNode object. To get the String values of these fields, I need only include this statement in my Java code:
String id = jsonNode.findPath("id").textValue();
I love this not only for its simplicity, but also because my code doesn't have to be aware of the JSON tree structure it's parsing. I realize that if I want [root][name] specifically, though, I'll need some kind of determination logic.
My question is, what is the least amount of logic I will require in order to somehow distinguish/specify what "name" to get? I've looked into the JsonNode.findValues(String fieldName) to get a list of the values, but still not sure how I would then determine which value was coming from which "name" and how to choose the "root" one, or at least, the one closest to the root.
Apologies if this is a duplicate question but I couldn't find an exact match, so asking again.
If you want a node directly underneath the root, use .get():
jsonNode.get("id").textValue();
If you want to get "name" but have problems with ambiguity, you can do something like:
jsonNode.findPath("metaData").findPath("name").textValue();
But then, of course, you now have to know something about the schema.
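For the "closest to the root" part of the question, one option that needs no schema knowledge is a breadth-first search, which visits shallower objects before deeper ones. The sketch below is an assumption-laden illustration: it works on a tree of plain Map and List values rather than on a JsonNode, and NearestField.find is a made-up name. The same idea could be applied to a JsonNode tree by walking its children level by level.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical helper: breadth-first search over a tree of Map/List values.
class NearestField {
    // Returns the value of the shallowest field with the given name, or null if there is none.
    static Object find(Object root, String fieldName) {
        Deque<Object> queue = new ArrayDeque<>();
        if (root != null) {
            queue.add(root);
        }
        while (!queue.isEmpty()) {
            Object node = queue.remove();
            if (node instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) node;
                if (map.containsKey(fieldName)) {
                    return map.get(fieldName); // first hit is the shallowest one
                }
                for (Object child : map.values()) {
                    if (child != null) {
                        queue.add(child); // ArrayDeque rejects null elements
                    }
                }
            } else if (node instanceof List) {
                for (Object child : (List<?>) node) {
                    if (child != null) {
                        queue.add(child);
                    }
                }
            }
        }
        return null;
    }
}
```

For the structure in the question, NearestField.find(tree, "name") yields "rootName", because the root object is examined before metaData. When two matches sit at the same depth, which one is returned depends on the iteration order of the containing map.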