How to intercept Jackson JsonNodes deserialization - java

In my program I'm doing:
private static final ObjectMapper MAPPER = new GridJettyObjectMapper();
....
JsonNode node = MAPPER.readTree(content);
My JSON contains a lot of identical strings, and I would like to intercept the readTree() method and put cached String instances into the TextNodes (using a WeakHashMap, for example).
I hope this will save me a lot of memory. Right now my app just fails with an OutOfMemoryError, and in the heap dump I see millions of identical Strings in TextNodes.
Any idea how to do this?

After some debugging I replaced
JsonNode node = MAPPER.readTree(content);
with
Pojo p = MAPPER.readValue(content, Pojo.class); // PojoDeserializer registered for Pojo.class (e.g. via @JsonDeserialize or a SimpleModule)
and implemented logic in PojoDeserializer that does not generate so many TextNodes.
I used the JsonParser streaming API.
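For the interning approach the question asks about, a minimal sketch of a WeakHashMap-based string cache is below; the class name StringCanonicalizer is hypothetical. In Jackson 2.x, one plausible way to wire it in is to subclass JsonNodeFactory, override textNode(String) so the text passes through canonicalize(), and install that factory with ObjectMapper.setNodeFactory(...); treat that wiring as an assumption to verify against your Jackson version. String.intern() is a simpler alternative if using the JVM's string pool is acceptable.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hands out one shared String instance per distinct value (hypothetical helper). */
final class StringCanonicalizer {
    // Keys are held weakly, and the value is only a weak reference to the same
    // instance, so the cache itself never keeps a string alive.
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    /** Returns a previously seen equal instance if one is still alive, else caches s. */
    synchronized String canonicalize(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = cache.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        cache.put(s, new WeakReference<>(s));
        return s;
    }
}
```

Storing a WeakReference as the value is the important design choice: if the map's value strongly referenced the same String as its key, the entry could never be collected and the cache would itself become a leak.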

Related

Parsing a text file using java with multiple values per line to be extracted

I'm not going to lie, I'm really bad at writing regular expressions. I'm currently trying to parse a text file that is giving me a lot of issues. The goal is to extract the data between the respective "tags/titles". The file in question is a .qbo file laid out as follows (personal information replaced with "DATA"). The parts I care about retrieving are between the "STMTTRN" and "/STMTTRN" tags; I don't plan on putting the rest in my database, but I figured it would help others see the file content I'm working with. I apologize for any confusion prior to this update.
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1><SONRS>
<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
<DTSERVER>20190917133617.000[-4:EDT]</DTSERVER>
<LANGUAGE>ENG</LANGUAGE>
<FI>
<ORG>DATA</ORG>
<FID>DATA</FID>
</FI>
<INTU.BID>DATA</INTU.BID>
<INTU.USERID>DATA</INTU.USERID>
</SONRS></SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0</TRNUID>
<STATUS><CODE>0</CODE><SEVERITY>INFO</SEVERITY></STATUS>
<STMTRS>
<CURDEF>USD</CURDEF>
<BANKACCTFROM>
<BANKID>DATA</BANKID>
<ACCTID>DATA</ACCTID>
<ACCTTYPE>CHECKING</ACCTTYPE>
<NICKNAME>FREEDOM CHECKING</NICKNAME>
</BANKACCTFROM>
<BANKTRANLIST>
<DTSTART>20190717</DTSTART><DTEND>20190917</DTEND>
<STMTTRN><TRNTYPE>POS</TRNTYPE><DTPOSTED>20190717071500</DTPOSTED><TRNAMT>-5.81</TRNAMT><FITID>3893120190717WO</FITID><NAME>DATA</NAME><MEMO>POS Withdrawal</MEMO></STMTTRN>
<STMTTRN><TRNTYPE>DIRECTDEBIT</TRNTYPE><DTPOSTED>20190717085000</DTPOSTED><TRNAMT>-728.11</TRNAMT><FITID>4649920190717WE</FITID><NAME>CHASE CREDIT CRD</NAME><MEMO>DATA</MEMO></STMTTRN>
<STMTTRN><TRNTYPE>ATM</TRNTYPE><DTPOSTED>20190717160900</DTPOSTED><TRNAMT>-201.99</TRNAMT><FITID>6674020190717WA</FITID><NAME>DATA</NAME><MEMO>ATM Withdrawal</MEMO></STMTTRN>
</BANKTRANLIST>
<LEDGERBAL><BALAMT>2024.16</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></LEDGERBAL>
<AVAILBAL><BALAMT>2020.66</BALAMT><DTASOF>20190917133617.000[-4:EDT]</DTASOF></AVAILBAL>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
I want to end up with data that looks or acts like the following, so that each row of data can easily be added to a database:
Example Parse
As David has already answered, it is better to parse the POS output XML using Java. If you are more interested in a regex to get all the information, you can use this regular expression:
<[^>]+>|\\n+
You can test in the following sites.
https://rubular.com/
https://www.regextester.com/
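The regex route can be sketched in Java as below: each STMTTRN block becomes an ordered map of tag name to value, i.e. one database row per transaction. This is a sketch that assumes every leaf element has a closing tag, as in the sample above; many OFX 1.x (SGML) files omit those closing tags, and the FIELD pattern would then need to change. The class name StmtTrnRegex is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StmtTrnRegex {
    // One <STMTTRN>...</STMTTRN> block; DOTALL lets a block span several lines.
    private static final Pattern TRANSACTION =
            Pattern.compile("<STMTTRN>(.*?)</STMTTRN>", Pattern.DOTALL);
    // A leaf element such as <TRNAMT>-5.81</TRNAMT>; \1 requires the matching close tag.
    private static final Pattern FIELD =
            Pattern.compile("<([A-Z0-9.]+)>([^<]*)</\\1>");

    /** Returns one ordered tag-to-value map per transaction. */
    static List<Map<String, String>> parse(String ofx) {
        List<Map<String, String>> rows = new ArrayList<>();
        Matcher tx = TRANSACTION.matcher(ofx);
        while (tx.find()) {
            Map<String, String> row = new LinkedHashMap<>();
            Matcher field = FIELD.matcher(tx.group(1));
            while (field.find()) {
                row.put(field.group(1), field.group(2).trim());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Note that `<STMTTRN>` does not match the `<STMTTRNRS>` wrapper, because the pattern requires `>` immediately after the tag name.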
Given this is XML, I would do one of two things:
either use the Java DOM objects to marshall/unmarshall to/from Java objects (nodes and elements), or
use JAXB to achieve something similar but with better POJO representation.
Mkyong has tutorials for both. Try the DOM parsing one or the JAXB one; his tutorials are simple and easy to follow.
JAXB requires more work and more dependencies, so try DOM first.
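A minimal DOM sketch using only the JDK's javax.xml.parsers is shown below. It assumes, as in the sample, that everything from <OFX> onward is well-formed XML; the KEY:VALUE header lines before it are skipped because they are not XML. If a real file omits closing tags (common in OFX 1.x SGML), the parser will reject it. The class name StmtTrnDom is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class StmtTrnDom {
    /** Returns one ordered tag-to-text map per <STMTTRN> element. */
    static List<Map<String, String>> parse(String qbo) {
        // The KEY:VALUE header lines are not XML, so parse from the <OFX> root on.
        int start = qbo.indexOf("<OFX>");
        if (start < 0) {
            throw new IllegalArgumentException("no <OFX> element found");
        }
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(qbo.substring(start))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("content after <OFX> is not well-formed XML", e);
        }

        List<Map<String, String>> rows = new ArrayList<>();
        NodeList transactions = doc.getElementsByTagName("STMTTRN");
        for (int i = 0; i < transactions.getLength(); i++) {
            Map<String, String> row = new LinkedHashMap<>();
            NodeList children = transactions.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Keep only child elements such as <TRNAMT>; skip whitespace text nodes.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    row.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```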
I would propose the following approach.
Read file line by line with Files:
final List<String> lines = Files.readAllLines(Paths.get("/path/to/file"));
At this point you have all of the file's lines separated and ready to be converted into something more useful, but first you need a class to hold the data.
Create a class for the data in each line, something like:
public class STMTTRN {
private String TRNTYPE;
private String DTPOSTED;
...
...
//constructors
//getters and setters
}
Now that you have the data in separate strings and a class to hold it, you can convert the lines to objects with Jackson:
final XmlMapper xmlMapper = new XmlMapper();
final STMTTRN stmttrn = xmlMapper.readValue(lines.get(0), STMTTRN.class); // assuming this line holds a single <STMTTRN>...</STMTTRN> element
You may want to write a loop, or use a stream with a mapper and a collector, to get the list of STMTTRN objects (keeping only the transaction lines):
final List<STMTTRN> stmttrnData = lines.stream().filter(line -> line.startsWith("<STMTTRN>")).map(this::mapLine).collect(Collectors.toList());
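Where the mapping method might be:
// XmlMapper is thread-safe once configured, so one shared instance can be reused for every line
private static final XmlMapper XML_MAPPER = new XmlMapper();
private STMTTRN mapLine(final String line) {
try {
return XML_MAPPER.readValue(line, STMTTRN.class);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}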

Edit an inputstream and write it out as a stream without storing all of its contents into memory at any given time

I have a method that is receiving an InputStream of data in JSON format. Using Jackson's ObjectMapper, I am able to convert the InputStream into a JsonNode that I can edit, like so:
JsonNode revisions = mapper.readTree(data);
From there, I am able to iterate through each element and make my changes. In doing so, though, I am storing all the elements in a list and then converting the list to a Stream. I would prefer to operate on each element one at a time from the InputStream, that way I don't have to store it all in memory.
Here's what I have:
public Stream<Revision> jsonToRevisionObjects(InputStream allData) throws IOException {
// convert the InputStream to a JsonNode
JsonNode revisions = mapper.readTree(allData);
List<Revision> newRevisions = new ArrayList<>();
for (JsonNode revision : revisions.get("results")) {
// create Revision objects and add them to newRevisions
}
return newRevisions.stream();
}
This essentially defeats the point of using a Stream, since I'm storing all the new Revision objects in memory. Instead, I'd like to read one element at a time and send it off to the stream before loading the next element. Is there a way of doing this? Based on the surrounding code, the input parameter will always be an InputStream (therein lies the problem) and the return type will always be a Stream.
This might be possible if I were able to convert the InputStream into a Stream and do the following:
return allDataStream.map(rev -> {
// create Revision object
});
but I'm not sure how to get to that point if it's a possibility.
To use streaming reads, you must use JsonParser, either directly, or by passing it to ObjectMapper/ObjectReader. If so, you may read sub-trees as JsonNode if you want to.
To construct a JsonParser from InputStream is simple:
JsonParser p = mapper.getFactory().createParser(inputStream);
but what you do after this varies: you can either read the token stream directly from the JsonParser, or ask ObjectMapper or ObjectReader to read the next "value" from the stream. The structure of the JSON data also matters; you may need to advance the parser (nextToken()) if you want to avoid reading all of the contents.
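The remaining piece of the question is turning a pull-style Iterator into a lazy Stream. A minimal, JDK-only sketch follows; LazyStreams and fromIterator are hypothetical names. With Jackson, the iterator could come from ObjectReader.readValues(JsonParser), which returns a MappingIterator (both an Iterator and Closeable) once the parser has been advanced into the "results" array; the exact token positioning depends on the JSON layout, so treat that part as an assumption to verify.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyStreams {
    /**
     * Wraps an iterator in a sequential Stream that pulls one element at a time
     * and closes the underlying resource when the stream is closed.
     */
    static <T> Stream<T> fromIterator(Iterator<T> it, Closeable resource) {
        Stream<T> stream = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        return stream.onClose(() -> {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Because the returned stream only pulls from the iterator when a terminal operation asks for an element, each element would be deserialized on demand. Callers should close the stream (for example with try-with-resources) so that the parser gets closed too.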

Using Jedis, how to cache a Java object

Using the Redis Java client Jedis, how can I cache a Java object?
You should convert your object to a JSON string to store it, then read the JSON and transform it back into your object.
You can use Gson to do so.
//store
Gson gson = new Gson();
String json = gson.toJson(myObject);
jedis.set(key,json);
//restore
String cachedJson = jedis.get(key);
MyObject object = gson.fromJson(cachedJson, MyObject.class);
You can't store objects directly in Redis, so convert the object into a String and then put it in Redis.
To do that, your object must be serialized: convert the object to a byte array, encode it as a String with some encoding (e.g. Base64), and store that String in Redis.
When retrieving, reverse the process: decode the String back into a byte array (e.g. Base64 decoding) and then deserialize it into the object.
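The serialize-then-Base64 steps above can be sketched with JDK classes only. The class and method names are hypothetical, and the object (plus everything it references) must implement Serializable. With Jedis, usage would be along the lines of jedis.set(key, Base64Codec.encode(obj)) and Base64Codec.decode(jedis.get(key), MyObject.class).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

final class Base64Codec {
    /** Serializes obj with Java serialization and encodes the bytes as Base64 text. */
    static String encode(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the bytes only after the ObjectOutputStream is closed, so everything is flushed.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverses encode(): Base64-decodes the text and deserializes the object. */
    static <T> T decode(String text, Class<T> type) {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of the cached object is not on the classpath", e);
        }
    }
}
```

Jedis also has byte[] overloads of set and get, which would let you store the serialized bytes directly and skip the Base64 step.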
I would recommend a more convenient library for this: Redisson, a Redis-based framework for Java.
It has some advantages over Jedis:
You don't need to serialize/deserialize objects yourself each time
You don't need to manage connections yourself
You can work with Redis asynchronously
Redisson does it for you and even more. It supports many popular codecs like Jackson JSON, Avro, Smile, CBOR, MsgPack, Kryo, FST, LZ4, Snappy and JDK Serialization.
RBucket<AnyObject> bucket = redisson.getBucket("anyObject");
// set an object
bucket.set(new AnyObject());
// get an object
AnyObject myObject = bucket.get();

quick-Json deserialization casting issues

I am using quick-Json to serialize a HashMap<String, String[]> in my program, but I am having trouble deserializing the object. I use this code to serialize the map:
JsonGeneratorFactory generatorFactory = JsonGeneratorFactory.getInstance();
JSONGenerator generator = generatorFactory.newJsonGenerator();
String json = generator.generateJson(allCodes);
where allCodes is a HashMap<String, String[]>. This takes the form
[{"key1":["value1","value2"]}]
Here is the code I use to deserialize the object
JsonParserFactory parseFactory=JsonParserFactory.getInstance();
JSONParser parser = parseFactory.newJsonParser();
output = parser.parseJson(inputString);
The output takes the form
{root=[{"key1":["value1","value2"]}]}
My goal is to cast the above back to a HashMap<String, String[]>. The added root parameter makes that difficult, though. Is there a way to get back to the desired HashMap?
Try:
HashMap outputHashMap = (HashMap) (((ArrayList) ((HashMap) parser.parseJson(json)).get("root")).get(0));
There is probably a more elegant way, but this is the best I can think of right now.
Instead of parser.parseJson(inputString) use parser.parse(inputString).
That will give you the format you are looking for.
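If the nested result has to be converted by hand, the cast chain from the first answer can be turned into a typed helper. This sketch assumes, based only on the printed output in the question, that parseJson returns a Map whose "root" entry is a List of Maps and that each JSON array comes back as a List; the class name RootUnwrapper is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RootUnwrapper {
    /**
     * Converts a parsed tree shaped like {root=[{key1=[value1, value2]}]} into a
     * HashMap<String, String[]>, using the first element of the "root" list.
     */
    static HashMap<String, String[]> unwrap(Map<?, ?> parsed) {
        List<?> root = (List<?>) parsed.get("root");
        Map<?, ?> first = (Map<?, ?>) root.get(0);
        HashMap<String, String[]> result = new HashMap<>();
        for (Map.Entry<?, ?> entry : first.entrySet()) {
            // Assumed: each value is a List of scalars; copy it into a String[].
            List<?> values = (List<?>) entry.getValue();
            String[] array = new String[values.size()];
            for (int i = 0; i < array.length; i++) {
                array[i] = String.valueOf(values.get(i));
            }
            result.put(String.valueOf(entry.getKey()), array);
        }
        return result;
    }
}
```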

Thrift: Serialize + Deserialize changes object

I have a thrift struct something like this:
struct GeneralContainer {
1: required string identifier;
2: required binary data;
}
The idea is to be able to pass different types of thrift objects on a single "pipe", and still be able to deserialize at the other side correctly.
But serializing a GeneralContainer object, and then deserializing it changes the contents of the data field. I am using the TBinaryProtocol:
TSerializer serializer = new TSerializer(new TBinaryProtocol.Factory());
TDeserializer deserializer = new TDeserializer(new TBinaryProtocol.Factory());
GeneralContainer container = new GeneralContainer();
container.setIdentifier("my-thrift-type");
container.setData(ByteBuffer.wrap(serializer.serialize(myThriftTypeObject)));
byte[] serializedContainer = serializer.serialize(container);
GeneralContainer testContainer = new GeneralContainer();
deserializer.deserialize(testContainer, serializedContainer);
Assert.assertEquals(container, testContainer); // fails
My guess is that some sort of markers are getting messed up when we serialize an object containing binary field using TBinaryProtocol. Is that correct? If yes, what are my options for the protocol? My goal is to minimize the size of resulting serialized byte array.
Thanks,
Aman
Tracked it down to a bug in Thrift 0.4 serialization. It works fine in Thrift 0.8.
