I am working on an API implemented in Java, and one of its operations requires opening a big JSON file and returning an object identified by a given string.
The file in question consists of an array of objects, tons of objects, and it makes no sense to read the whole file and create tons of Java objects in memory only to return one.
So, what is a good way to read the JSON file in streaming mode?
One excellent library for parsing large JSON files with minimal resources is the popular GSON library. It lets you parse the file as both a stream and an object model: it handles each record as it passes, then discards it, keeping memory usage low.
It supports arbitrarily complex objects (with deep inheritance hierarchies and extensive use of generic types).
Look at this Detailed Tutorial for the GSON approach to this problem.
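As a minimal sketch of that mixed approach, assuming the file is one big top-level JSON array and using a made-up Item class with an id field:

    import com.google.gson.Gson;
    import com.google.gson.stream.JsonReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class StreamingLookup {

        // Made-up record type; adjust the fields to match your JSON objects.
        static class Item {
            String id;
            String payload;
        }

        static Item findById(String path, String wantedId) throws IOException {
            Gson gson = new Gson();
            try (JsonReader reader = new JsonReader(new FileReader(path))) {
                reader.beginArray();                 // the file is one big JSON array
                while (reader.hasNext()) {
                    // Bind just the current element to an object, not the whole file.
                    Item item = gson.fromJson(reader, Item.class);
                    if (wantedId.equals(item.id)) {
                        return item;                 // stop as soon as we find it
                    }
                    // Otherwise the Item becomes garbage right away; memory stays flat.
                }
                reader.endArray();
            }
            return null;
        }
    }

Each call to fromJson consumes exactly one array element from the stream, so only one Item is alive at a time.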
I'm working on a proprietary TCP protocol. This protocol sends and receives messages as a specific sequence of bytes.
I have to be compliant with this protocol, and I can't change it.
So my input/output looks something like this:
\x01\x08\x00\x01\x00\x00\x01\xFF
\x01 - Message type
\x00\x01 - Length
\x00\x00\x01 - Transaction
\xFF - Body
The sequence of fields is important, and I want only the values of the fields in my serialization, nothing about the structure of the class.
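As a rough, hand-rolled illustration of what I mean (the field names are made up, and this per-message packing code is exactly what I would like a library to generate for me):

    import java.nio.ByteBuffer;

    public class ExampleMessage {
        byte messageType;    // \x01
        short length;        // \x00\x01
        byte[] transaction;  // \x00\x00\x01 (3 bytes)
        byte body;           // \xFF

        byte[] toBytes() {
            // ByteBuffer is big-endian by default, matching the dumps above.
            ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 3 + 1);
            buf.put(messageType);
            buf.putShort(length);
            buf.put(transaction);
            buf.put(body);
            return buf.array();
        }
    }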
I'm working on a Java controller that uses this protocol, and I thought I could define the message structures in dedicated classes and serialize/deserialize them, but I was naive.
First of all I tried ObjectOutputStream, but it outputs the entire structure of the object, when I need only the values in a specific order.
Someone already faced this problem:
Java - Object to Fixed Byte Array
and solved it with a dedicated Marshaller.
But I was searching for a more flexible solution.
For text serialization and deserialization I've found:
http://jeyben.github.io/fixedformat4j/
which defines the schema of the line with annotations. But it outputs a String, not a byte[], so 1 is output as "1", which is represented differently depending on the encoding and often takes more bytes.
What I was searching for is something that, given the order of my class properties, will convert each property into a bunch of bytes (based on its internal representation) and append them to a byte[].
Do you know of a library used for that purpose?
Or a simple way to do that, without coding a serialization algorithm for each of my entities?
Serialization just isn't easy; it sounds from your question like you feel you can just invoke something and out rolls compact, simple, versionable, universal data you can then put on the wire. What you need to fix is to scratch the word 'just' from that sentence. You're going to have to invest some time and care.
As you figured out already, Java's baked-in serialization has a ton of downsides. Don't use it.
There are various serializers. The popular ones are things like GSON or Jackson, which let you serialize Java objects into JSON. This isn't particularly efficient, and it is string based. Those sound like crucial downsides, but they really aren't; see below.
You can also spend a little more time specifying the exact format and use protobuf, which lets you write a quite lean and simple data protocol (and protobuf is available for many languages, if you eventually want to write a participant in this protocol in something other than Java later).
So, those are the good options: Go to JSON via Jackson or GSON, or, use protobuf.
But JSON is a string.
You can turn a string into bytes trivially using str.getBytes(StandardCharsets.UTF_8). This cannot fail due to charset encoding differences, as long as you also decode in the same fashion: turn the bytes back into a string with new String(theBytes, StandardCharsets.UTF_8). UTF-8 is guaranteed to be available on all JVMs; if it is not there, your JVM is as broken as a JVM that is missing the String class, not something to worry about.
But JSON is inefficient.
Zip it up, of course. You can trivially wrap an InputStream and an OutputStream so that gzip compression is applied. It is simple, available on just about every platform, and fast (it's not the most cutting-edge compression algorithm, but squeezing out the last few bytes is usually not worth it), and zipped-up JSON can often be more efficient than carefully hand-rolled protobuf, even.
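A minimal sketch of that wrapping, going from a JSON string to gzipped UTF-8 bytes and back:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class GzipJson {

        // JSON string -> gzipped UTF-8 bytes, ready to put on the wire.
        static byte[] encode(String json) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
                gzip.write(json.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        }

        // The reverse: gzipped bytes -> JSON string, decoded the same way.
        static String decode(byte[] wire) throws IOException {
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(wire))) {
                return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

On a socket you would wrap the socket streams directly instead of byte arrays; same idea.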
The one downside is that it's 'slow', but on modern hardware, note that the overhead of encrypting and decrypting this data (which you should obviously be doing!) is usually multiple orders of magnitude more involved. A modern CPU is simply very, very fast: creating JSON and zipping it up is going to take 1% of CPU or less even if you are shipping the collected works of Shakespeare every second.
If an Arduino running on batteries needs to process this data, go with uncompressed, unencrypted protobuf-based data. If you are Facebook writing the WhatsApp protocol, the IaaS money saved by not having to unzip and decode JSON is tiny and pales in comparison to what you spend just running the servers, but at that scale it's worth the development effort.
In just about every other case, just toss gzipped JSON on the line.
I am relatively new to Java and have much more experience with Matlab. I was wondering what the best way is to store a relatively small amount of data that has been calculated in one program, so that it can be used in another program.
Example: program A computes 100 values to be stored in an array. Now I would like to access this array in program B, which needs these values. Of course, I could just write one program that also implements part A, but then every time I execute the whole program, all the values have to be calculated again (in part A), which is a waste of resources. In Matlab, I was able to easily save the array in a .mat file and load it in a different script.
Looking around for an answer I found the option of serialization (What is object serialization?), which I think would be suitable for what I want. My question: is serialization the easiest and quickest way to store a small amount of data in Java, or is there a quicker, more user-friendly option (like .mat files in Matlab)?
I think you have several options for this job. Java object serialization is one possible way. From my point of view there are other options for serializing the data:
Write and read a simple text file to store the computed values (see the sketch after this list).
Use Java Architecture for XML Binding (JAXB) to write annotated Java classes to an XML file. The same is available for JSON.
Use a lightweight database like SQLite or HSQLDB (a native Java database).
Use Apache Thrift or Protocol Buffers to serialize/deserialize Java objects to files.
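A minimal sketch of the first option, with a made-up save/load pair that writes one value per line:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    public class ValueStore {

        // Program A: write one value per line to a plain text file.
        static void save(double[] values, Path file) throws IOException {
            List<String> lines = new ArrayList<>();
            for (double v : values) {
                lines.add(Double.toString(v));
            }
            Files.write(file, lines);
        }

        // Program B: read the values back into an array.
        static double[] load(Path file) throws IOException {
            return Files.readAllLines(file).stream()
                    .mapToDouble(Double::parseDouble)
                    .toArray();
        }
    }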
Say I have a large file with many objects already serialized (this is the easy part). I need random access to the objects in the file when I go to deserialize. The only way I can think of to do this would be to somehow store a file pointer to each object.
Basically I will end up with a large file of serialized objects and don't want to deserialize the entire file when I go to retrieve just one object.
Can anyone point me in the right direction on this one?
You can't. Serialization is called serialization for a reason. It is serial. Random access into a stream of objects will not work, for several reasons, including the stream header, object handles (back-references to objects written earlier in the stream), and so on.
Straight serialization will never be the solution you want.
The serial portion of the name means that the objects are written linearly to the ObjectOutputStream.
The serialization format is well known; here is a link to the Java 6 serialization format.
You have several options:
Deserialize the entire file and go from there.
Write code to read the serialized file and generate an index; maybe even store the index in a file for future use (see the sketch after this list).
Abandon serialization to a file and store the objects in a database.
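A minimal sketch of the index option, assuming you also control the writing side: serialize each object independently (so every block carries its own stream header), remember the offset of each block, and seek back to it later.

    import java.io.*;
    import java.util.HashMap;
    import java.util.Map;

    public class IndexedObjectFile {

        private final RandomAccessFile file;
        private final Map<String, Long> index = new HashMap<>(); // key -> block offset

        public IndexedObjectFile(File f) throws IOException {
            this.file = new RandomAccessFile(f, "rw");
        }

        public void put(String key, Serializable obj) throws IOException {
            // Serialize this one object on its own, with its own stream header.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(obj);
            }
            byte[] block = buf.toByteArray();
            long offset = file.length();
            file.seek(offset);
            file.writeInt(block.length);   // length prefix tells us how much to read back
            file.write(block);
            index.put(key, offset);
        }

        public Object get(String key) throws IOException, ClassNotFoundException {
            file.seek(index.get(key));     // jump straight to the one block we want
            byte[] block = new byte[file.readInt()];
            file.readFully(block);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(block))) {
                return in.readObject();
            }
        }
    }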
I am just getting into writing networked code using sockets in Java. I'm just making some test programs. Originally I was going to send data as comma-separated values, but I recently discovered ObjectOutputStream. Which method would be faster or more bandwidth-efficient? For example, if I'm making a game where I have to send x and y coordinates very often, should I send them through a PrintWriter separated by a comma, or make a Position class and send an instance over ObjectOutputStream? What if I change my code and need to send a lot more data?
What are the pros and cons of sending data as CSV over PrintWriter vs as fields in an object over ObjectOutputStream?
An ad-hoc binary format has a good chance of being more bandwidth-efficient than the default serialization format, which should be (but it's a wild guess, and it depends on the nature and amount of data: you should measure it if it matters) more or less as bandwidth-efficient as a text-based format.
But bandwidth efficiency is not the only thing that matters.
Using serialization, the client and the server must be written in Java and have the classes of the serialized objects on their classpath. If you intend to have clients written in other languages, you shouldn't consider it.
If serialization is OK, it's of course a really easy way to transform almost any Java object into bytes, which allows you to avoid defining a format.
Note that there are alternatives that provide almost the same flexibility, but don't have the Java-only disadvantage of serialization. For example, JSON, XML, or protobuf.
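For a sense of scale, here is a minimal sketch of a Position written both ways (the class shape is an assumption based on the question):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.PrintWriter;

    class Position {
        int x, y;

        Position(int x, int y) { this.x = x; this.y = y; }

        // Ad-hoc binary: always exactly 8 bytes per update.
        void writeBinary(DataOutputStream out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        // CSV: one text line; the size varies with the number of digits.
        void writeCsv(PrintWriter out) {
            out.println(x + "," + y);
        }
    }

ObjectOutputStream, by contrast, also writes the class name and field metadata into the stream before the values, which is where its extra bytes go.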
I think CSV is smaller.
If you want to check the data size, try writing both formats to a file.
I also don't recommend ObjectOutputStream for another reason: you have to keep the serialized objects compatible between versions.
Have you researched serialization and serialVersionUID?
Please check java.io.Serializable.
In my program, I am reading a series of text files from disk. With each text file, I process out some data and store the results as JSON on disk. In this design, each file has its own JSON file. In addition to this, I also store some of the data in a separate JSON file, which holds relevant data from multiple files. My problem is that the shared JSON grows larger and larger with every file parsed and eventually uses too much memory. I am on a 32-bit machine with 4 GB of RAM and cannot increase the memory size of the Java VM any further.
Another constraint to consider is that I often refer back to the old JSON. For instance, say I pull out ObjX from FileY. In pseudo code, the following happens (using Jackson for JSON serialization/deserialization):
    // In the main method (mapper is a Jackson ObjectMapper).
    JsonNode fileYJson = mapper.readTree(new File("FileY"));
    JsonNode objX = fileYJson.get("some_key");
    sharedJson.add(objX);

    // In the sharedJSON object.
    List<JsonNode> objList = new ArrayList<>();

    void add(JsonNode obj) {
        if (!objList.contains(obj)) {
            objList.add(obj);
        }
    }
The only thing I can think to do is use streaming JSON, but the problem is that I frequently need to access the JSON that came before, so I don't know that streaming will work. Also, my data types are not only strings, which I believe prevents me from using Jackson's streaming capabilities. Does anyone know of a good solution?
If you're getting to the point where your data structures are so large that you're running out of memory, you'll have to start using something else. I would recommend that you use a database, which will significantly speed up data retrieval and storage. It will also make the limit of your data structure the size of your hard drive, instead of the size of your RAM.
Try this page for an introduction to Java and Databases.
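If you go that route, here is a minimal sketch using SQLite over JDBC (this assumes the sqlite-jdbc driver is on the classpath; the table layout is made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class SharedStore {
        public static void main(String[] args) throws Exception {
            try (Connection db = DriverManager.getConnection("jdbc:sqlite:shared.db")) {
                db.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY, json TEXT)");

                // Store a parsed object instead of keeping it in an in-memory list.
                try (PreparedStatement put = db.prepareStatement(
                        "INSERT OR REPLACE INTO objects (key, json) VALUES (?, ?)")) {
                    put.setString(1, "ObjX");
                    put.setString(2, "{\"value\": 42}");
                    put.executeUpdate();
                }

                // Refer back to old JSON later without holding it all in RAM.
                try (PreparedStatement get = db.prepareStatement(
                        "SELECT json FROM objects WHERE key = ?")) {
                    get.setString(1, "ObjX");
                    try (ResultSet rs = get.executeQuery()) {
                        if (rs.next()) {
                            System.out.println(rs.getString("json"));
                        }
                    }
                }
            }
        }
    }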
I can't believe that you really need nearly 4 GB of RAM only for text files and JSON.
I see three possible solutions.
Switch to plain text if possible; that is not as memory-hungry.
Just open and close the files as you need them. You can organize the files under a specific naming convention, like the first two/three/... digits of their hashes, and open them as you need them.
If you have that much data, you could maybe switch to a database. That would save a lot of resources.
I would prefer option 3 if it's possible for you.
You could also put the data behind an API and read it from the response body, instead of keeping everything in memory.