Say I have a large file with many objects already serialized (this is the easy part). I need to be able to have random access to the objects in the file when I go to deserialize. The only way I can think to do this would be to somehow store the file pointer to each object.
Basically I will end up with a large file of serialized objects and don't want to deserialize the entire file when I go to retrieve just one object.
Can anyone point me in the right direction on this one?
You can't. Serialization is called serialization for a reason. It is serial. Random access into a stream of objects will not work, for several reasons including the stream header, object handles, ...
Straight serialization will never be the solution you want.
The serial portion of the name means that the objects are written linearly to the ObjectOutputStream.
The serialization format is well known; here is a link to the Java 6 serialization format.
You have several options:
Deserialize the entire file and go from there.
Write code to read the serialized file and generate an index (a sketch follows this list).
Maybe even store the index in a file for future use.
Abandon serialization to a file and store the objects in a database.
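If you go with options 1 and 2, a rough sketch might look like the following. It assumes the file was written as a sequence of writeObject calls on a single ObjectOutputStream; extractKey is a hypothetical helper that pulls a unique identifier out of your objects.

import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.HashMap;
import java.util.Map;

public class SerializedFileIndexer {
    // Reads every object in the serialized file once (option 1) and keys each
    // one by an identifier so later lookups are cheap (option 2).
    public static Map<String, Object> buildIndex(String path)
            throws IOException, ClassNotFoundException {
        Map<String, Object> index = new HashMap<>();
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(path))) {
            while (true) {
                try {
                    Object obj = in.readObject();
                    index.put(extractKey(obj), obj);
                } catch (EOFException endOfStream) {
                    break; // no more objects in the stream
                }
            }
        }
        return index;
    }

    private static String extractKey(Object obj) {
        return obj.toString(); // placeholder key; use a real identifying field in practice
    }
}

The index map (or a serialized copy of it) can then serve future lookups without re-reading the whole file each time.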
I am working on an API implemented in Java, and one of its operations requires opening a big JSON file and returning an object identified by a given string.
The file in question consists of an array of objects, tons of them, and it makes no sense to read the whole file and create tons of Java objects in memory only to return one.
So, what is a good way to read the JSON file in streaming mode?
One excellent library for parsing large JSON files with minimal resources is the popular GSON library. It lets you combine streaming and object-model parsing: you walk the file as a stream, deserialize one record at a time, and discard it once processed, which keeps memory usage low.
It supports arbitrarily complex objects (with deep inheritance hierarchies and extensive use of generic types).
Have a look at this detailed tutorial on the GSON streaming approach to solve the problem.
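A rough sketch of that streaming lookup (the Record class, its id field, and the assumption that the file is one top-level JSON array are placeholders; adapt them to your actual data):

import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;
import java.io.FileReader;
import java.io.IOException;

public class JsonStreamLookup {
    // Hypothetical POJO matching one element of the JSON array.
    static class Record {
        String id;
        String payload;
    }

    public static Record findById(String path, String wantedId) throws IOException {
        Gson gson = new Gson();
        try (JsonReader reader = new JsonReader(new FileReader(path))) {
            reader.beginArray(); // the file is assumed to be one big JSON array
            while (reader.hasNext()) {
                // Deserialize a single element; only this record is held in memory.
                Record record = gson.fromJson(reader, Record.class);
                if (wantedId.equals(record.id)) {
                    return record; // stop as soon as the match is found
                }
            }
            reader.endArray();
        }
        return null; // no matching object in the file
    }
}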
I have a few questions about Object Serialization. I am working on a new version of my math game, and forgot to have it save the game mode on the last three sessions. The records are being saved via object serialization, which leads me here. What I want to know is:
1.) Does object serialization somehow keep hold of the time at which the objects were saved to the file?
2.) In changing ANY of the n objects in the file, do you have to load the one you want to change into memory (via cycling through the objects with a loop), change it, and then rewrite EVERY LAST FREAKING OBJECT back to the file? //seems tedious
Serialization serializes an entire object graph. If you are saving a game, you will probably want to call ObjectOutputStream.writeObject(myGame), which will write the entire game object and all non-transient properties it references, recursively.
To change it, load the game into memory using an ObjectInputStream, change a value, and write it back out.
You might also want to chain a GZIPInputStream and GZIPOutputStream if you are dealing with large amounts of data; it can shrink the serialized size a good bit.
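For example, a minimal sketch of saving and loading a Serializable game object through the compressed streams:

import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedSaves {
    public static void save(Serializable game, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file)))) {
            out.writeObject(game); // writes the whole object graph, compressed
        }
    }

    public static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file)))) {
            return in.readObject(); // reconstitutes the graph from the compressed stream
        }
    }
}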
If you are dealing with really large objects, an embedded database might be a better option, since you can change a single field without loading the whole thing into RAM.
Lastly, if you want to update a timestamp on an object when it is serialized, give your Serializable class a private writeObject method (the serialization machinery finds it by reflection; it is not actually declared in the Serializable interface). Update your timestamp, then call defaultWriteObject on the supplied ObjectOutputStream. This will give you 'last persisted' behavior.
private void writeObject(java.io.ObjectOutputStream out) throws IOException {
    this.lastPersisted = System.currentTimeMillis(); // lastPersisted is a hypothetical timestamp field
    out.defaultWriteObject();                        // then write the remaining fields as usual
}
1.) Does object serialization somehow keep hold of the time at which the objects were saved to the file?
No. It saves the object and only the object, plus whatever it needs to reconstitute it, such as its class name.
2.) In changing ANY of the n objects in the file
You can't change any of the N objects in the file. You have to reconstitute the file as objects, change the object(s), and reserialize.
// seems tedious
It is tedious. Nobody said it wouldn't be tedious. You are using it as a database. It isn't. It is a serialization, which also implies that it is a stream. Exactly the same applies to a text file.
If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern along the lines of using POJOs with references to external streams, I would suggest storing in your POJO something akin to a URI that tells where to find the stream (such as a row ID in a database, a file path, or a URL) rather than storing an instance of the stream itself. In doing so, you'll reduce the number of open file handles, prevent potential concurrency issues, and be able to serialize those objects locally if needed without having to duplicate the raw data persisted elsewhere.
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live on some storage, like a file; i.e., your object stores a pointer (e.g., a file path) to the storage, and every time someone accesses it, you open a stream or create an iterator and let that party read. Note also that in order to save memory, whoever consumes it has to make sure not to hold the whole content in memory.
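A minimal sketch of that idea (DocumentRef and its field names are just placeholders):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DocumentRef {
    private final Path contentPath; // pointer to where the bytes actually live

    public DocumentRef(Path contentPath) {
        this.contentPath = contentPath;
    }

    // Opens a fresh stream on every call; the caller is responsible for closing it.
    public InputStream openContent() throws IOException {
        return Files.newInputStream(contentPath);
    }

    // Lazy, line-by-line access; again, close the stream when done.
    public Stream<String> lines() throws IOException {
        return Files.lines(contentPath);
    }
}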
However, 50 KB or 1 MB is really tiny. Unless you have gigabytes (or maybe hundreds of megabytes), I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.
In my program, I am reading a series of text files from the disk. With each text file, I process out some data and store the results as JSON on the disk. In this design, each file has its own JSON file. In addition to this, I also store some of the data in a separate JSON file, which stores relevant data from multiple files. My problem is that the shared JSON grows larger and larger with every file parsed, and eventually uses too much memory. I am on a 32-bit machine and have 4 GB of RAM, and cannot increase the memory size of the Java VM anymore.
Another constraint to consider is that I often refer back to the old JSON. For instance, say I pull out ObjX from FileY. In pseudo code, the following happens (using Jackson for JSON serialization/deserialization):
// In the main method
FileYJSON = parse(FileY);
ObjX = FileYJSON.get(some_key);
sharedJSON.add(ObjX);

// In the sharedJSON object
List objList;

function add(obj) {
    if (!objList.contains(obj)) {
        objList.add(obj);
    }
}
The only thing I can think to do is use streaming JSON, but the problem is that I frequently need to access the JSON that came before, so I don't know that streaming will work. Also, my data types are not only strings, which I believe prevents me from using Jackson's streaming capabilities. Does anyone know of a good solution?
If you're getting to the point where your data structures are so large that you're running out of memory, you'll have to start using something else. I would recommend that you use a database, which will significantly speed up data retrieval and storage. It will also make the limit of your data structure the size of your hard drive, instead of the size of your RAM.
Try this page for an introduction to Java and Databases.
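As a rough sketch of that direction, the shared records could live in an embedded database instead of an in-memory list. The table layout and the JDBC URL you pass in are assumptions; any embedded database such as H2 or SQLite would do, as long as its driver is on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SharedJsonStore {
    private final Connection conn;

    public SharedJsonStore(String jdbcUrl) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE IF NOT EXISTS shared (id VARCHAR(255) PRIMARY KEY, json CLOB)");
        }
    }

    // Mirrors the "if (!objList.contains(obj))" check, but on disk instead of in RAM.
    public void addIfAbsent(String id, String json) throws SQLException {
        if (get(id) != null) {
            return;
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO shared (id, json) VALUES (?, ?)")) {
            ps.setString(1, id);
            ps.setString(2, json);
            ps.executeUpdate();
        }
    }

    public String get(String id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT json FROM shared WHERE id = ?")) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}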
I can't believe that you really need nearly 4GB RAM only for text files and JSON.
I see three possible solutions.
Switch to plain text if it's possible. That is not that memory hungry.
Just open and close the files as you need them. You can name the files according to a specific convention, such as the first two/three/... digits of their hashes, and open only the ones you need (a sketch of this idea follows below).
If you have that much data, you could switch to a database. That would save a lot of resources.
I would prefer option 3 if it's possible for you.
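For option 2, a minimal sketch of deriving the file name from the first hex digits of a key's hash (the SHA-256 choice and the ".json" suffix are just assumptions):

import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedFileLayout {
    // Maps a record key to the small file it belongs in, so only that file
    // needs to be opened, parsed, and closed again.
    public static Path fileFor(String key, Path baseDir) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        String prefix = String.format("%02x", digest[0] & 0xff); // first two hex digits of the hash
        return baseDir.resolve(prefix + ".json");
    }
}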
You can make an API and get response.body from it.
I'm learning Android/Java programming, but I'm confused about persistent data.
After a lot of research it seems that the best way to store objects is to serialize them in a file, but I couldn't find a simple way to retrieve these objects.
If I create two objects and save their serialized versions, how can I retrieve and list both of them? Do I need to create a file for each object with a specific ID in the filename so I can list them with getFilesDir?
It depends on how complex those objects are (and on your personal preferences, I guess). I have used SharedPreferences to store simple objects before, just for the sake of simplicity, while a co-worker makes generous use of SQLite, but that suits his needs.
Since you do not state what is being stored, the best advice I can give you right now is to have a read here; it covers how persistent data should be dealt with on Android.
"best" way? Please define your criteria.
There are databases (relational and non-relational) or file systems. You can serialize lots of ways: Java serialization, XML, Google's protobuf, and others.
Yes, you'll need a way to associate a unique identifier with each object. In a relational database, you'd use a primary key. You need something like that in any system you use.
If you serialize via some mechanism to a file system, you'll have to write the object into the desired format and stream it to a file. To go the other way, specify the key for the object, read the serialized data, and parse it back into the object.
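A minimal sketch of that file-per-object approach (ObjectStore and the ".ser" suffix are placeholders; on Android the directory would typically come from Context.getFilesDir()):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ObjectStore {
    private final File dir; // e.g. the directory returned by getFilesDir()

    public ObjectStore(File dir) {
        this.dir = dir;
    }

    public void save(String key, Serializable obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new FileOutputStream(new File(dir, key + ".ser")))) {
            out.writeObject(obj);
        }
    }

    public Object load(String key) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new FileInputStream(new File(dir, key + ".ser")))) {
            return in.readObject();
        }
    }

    public String[] listKeys() {
        return dir.list((d, name) -> name.endsWith(".ser")); // one file per object
    }
}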