How to persist large strings in a POJO? - java

If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?

What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you want a general pattern along the lines of POJOs that reference external streams, I would suggest storing in your POJO something that tells where to find the stream (such as a database row ID, a filename, or a URI) rather than an instance of the stream itself. Doing so reduces the number of open file handles, avoids potential concurrency issues, and lets you serialize those objects locally if needed without duplicating raw data that is persisted elsewhere.
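As a minimal sketch of that suggestion, the POJO below keeps only a location and opens a stream on demand. The class name DocumentRef, its fields, and the assumption that the location is a file: URI are all hypothetical; a database row ID would need a different lookup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical POJO: it carries only a pointer to the content, never the content itself.
final class DocumentRef implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final URI contentLocation; // assumed to be a file: URI in this sketch

    DocumentRef(String name, URI contentLocation) {
        this.name = name;
        this.contentLocation = contentLocation;
    }

    String getName() { return name; }

    URI getContentLocation() { return contentLocation; }

    // Opens a fresh stream on each call; the caller owns it and must close it.
    InputStream openContent() throws IOException {
        return Files.newInputStream(Paths.get(contentLocation));
    }
}
```

Because the object holds no open handle, it stays small, can be serialized as-is, and can be passed between layers freely; only the layer that actually needs the bytes calls openContent(), ideally in a try-with-resources block.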

You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live on some storage, like a file. That is, your object stores a pointer (e.g. a file path) to the storage, and every time someone accesses it, you open a stream or create an iterator and let that party read. Note also that, in order to save memory, whoever consumes it has to make sure not to store the whole content in memory.
However, 50KB or 1MB is really tiny. Unless you have gigabytes (or maybe hundreds of megabytes), I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.

Related

In Java, can I remove specific bytes from a file?

So far I managed to do something with Byte Stream : read the original file, and write in a new file while omitting the desired bytes (and then finish by deleting/renaming the files so that there's only one left).
I'd like to know if there's a way to directly modify the bytes without requiring to manipulate more than one file. The reason is because this has to be performed when there is low memory and the file is too big, so cloning the file before trimming it may not be the best option.
"I'd like to know if there's a way to directly modify the bytes without requiring to manipulate more than one file."
There isn't a SAFE way to do it.
The unsafe way to do it involves (for example) mapping the file using a MappedByteBuffer, and shuffling the bytes around.
But the problem is that if something goes wrong while you are doing this, you are liable to end up with a corrupted file.
Therefore, if the user asks to perform this operation when the device's memory is too full to hold a second copy of the file, the best thing is to tell the user to "delete some files first".
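For illustration only, here is roughly what the unsafe in-place approach looks like. This sketch uses positional reads and writes on a FileChannel rather than the MappedByteBuffer mentioned above, and the class and method names are hypothetical. It shifts the tail of the file left over the removed range and then truncates; if it is interrupted part-way, the file is left corrupted, exactly as warned.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InPlaceCut {
    // Removes `length` bytes starting at `offset` by shifting the tail left, then truncating.
    // NOT crash-safe: an interruption part-way leaves the file corrupted.
    static void removeRange(Path file, long offset, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = ch.size();
            if (offset < 0 || length < 0 || offset + length > size) {
                throw new IllegalArgumentException("range outside file");
            }
            ByteBuffer buf = ByteBuffer.allocate(8192); // only one buffer is held in RAM
            long readPos = offset + length;  // first byte to keep after the cut
            long writePos = offset;          // where that byte must end up
            while (readPos < size) {
                buf.clear();
                int n = ch.read(buf, readPos);
                if (n <= 0) {
                    break;
                }
                buf.flip();
                while (buf.hasRemaining()) {
                    writePos += ch.write(buf, writePos);
                }
                readPos += n;
            }
            ch.truncate(writePos); // drop the now-duplicated tail
        }
    }
}
```

The write position always trails the read position, so no unread data is overwritten, but nothing protects against a crash between the first write and the final truncate.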
"The reason is because this has to be performed when there is low memory and the file is too big, so cloning the file before trimming it may not be the best option."
If you are primarily worried about "memory" on the storage device, see above.
If you are worried about RAM, then @RealSkeptic's observation is correct. You shouldn't need to hold the entire file in RAM at the same time. You can read, modify, and write it one buffer at a time.
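A minimal sketch of the buffer-at-a-time approach, assuming the bytes to drop form one contiguous range; the class and method names are hypothetical. Memory use is bounded by the 8 KB buffer regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCut {
    // Copies `in` to `out`, dropping the bytes in [skipFrom, skipFrom + skipLength).
    static void copySkipping(InputStream in, OutputStream out, long skipFrom, long skipLength)
            throws IOException {
        byte[] buf = new byte[8192];
        long pos = 0;                          // absolute offset of buf[0] in the source
        long skipEnd = skipFrom + skipLength;
        int n;
        while ((n = in.read(buf)) != -1) {
            long chunkEnd = pos + n;
            // Part of this chunk that lies before the cut.
            if (pos < skipFrom) {
                int keep = (int) (Math.min(chunkEnd, skipFrom) - pos);
                out.write(buf, 0, keep);
            }
            // Part of this chunk that lies after the cut.
            if (chunkEnd > skipEnd) {
                int start = (int) (Math.max(pos, skipEnd) - pos);
                out.write(buf, start, n - start);
            }
            pos = chunkEnd;
        }
    }
}
```

In practice the streams would wrap the original file and a temporary file on the same volume, followed by a rename of the temporary file over the original.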
You can't remove bytes from the middle of a file without rewriting everything that follows them. You can, however, overwrite bytes in place if that helps.

Deserializing Objects in Java

Say I have a large file with many objects already serialized (this is the easy part). I need to be able to have random access to the objects in the file when I go to deserialize. The only way I can think to do this would be to somehow store the file pointer to each object.
Basically I will end up with a large file of serialized objects and don't want to deserialize the entire file when I go to retrieve just one object.
Can anyone point me in the right direction on this one?
You can't. Serialization is called serialization for a reason. It is serial. Random access into a stream of objects will not work, for several reasons including the stream header, object handles, ...
Straight serialization will never be the solution you want.
The serial portion of the name means that the objects are written linearly to the ObjectOutputStream.
The serialization format is well known and is documented in the Java 6 Object Serialization Specification.
You have several options:
Deserialize the entire file and go from there.
Write code to read the serialized file and generate an index. Maybe even store the index in a file for future use.
Abandon serialization to a file and store the objects in a database.
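One way to make the index option workable is to change how the file is written in the first place. The sketch below is an assumption, not something the answers above prescribe: each object is serialized on its own (a fresh ObjectOutputStream per object) and stored as a length-prefixed record, so the recorded offsets can later be used with RandomAccessFile. It does not apply to an existing file produced by a single ObjectOutputStream. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class IndexedObjectFile {
    // Appends each object as [int length][serialized bytes]; returns the offset of each record.
    static List<Long> writeAll(RandomAccessFile file, List<? extends Serializable> objects)
            throws IOException {
        List<Long> offsets = new ArrayList<>();
        file.seek(file.length());
        for (Serializable obj : objects) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // A fresh ObjectOutputStream per object keeps every record self-contained
            // (own stream header, no handles shared with earlier objects).
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                oos.writeObject(obj);
            }
            offsets.add(file.getFilePointer());
            file.writeInt(bytes.size());
            file.write(bytes.toByteArray());
        }
        return offsets;
    }

    // Reads a single object back, touching only its own record.
    static Object readAt(RandomAccessFile file, long offset)
            throws IOException, ClassNotFoundException {
        file.seek(offset);
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }
}
```

The returned offsets are the index; they can be kept in memory or written to a second file. The cost is some repeated class metadata per record, since nothing is shared between objects.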

My JSON files are too big to fit into memory, what can I do?

In my program, I am reading a series of text files from the disk. With each text file, I process out some data and store the results as JSON on the disk. In this design, each file has its own JSON file. In addition to this, I also store some of the data in a separate JSON file, which stores relevant data from multiple files. My problem is that the shared JSON grows larger and larger with every file parsed, and eventually uses too much memory. I am on a 32-bit machine and have 4 GB of RAM, and cannot increase the memory size of the Java VM anymore.
Another constraint to consider is that I often refer back to the old JSON. For instance, say I pull out ObjX from FileY. In pseudo code, the following happens (using Jackson for JSON serialization/deserialization):
// In the main method.
FileYJSON = parse(FileY);
ObjX = FileYJSON.get(some_key);
sharedJSON.add(ObjX);
// In the sharedJSON object.
List objList;
function add(obj)
    if (!objList.contains(obj))
        objList.add(obj);
The only thing I can think to do is use streaming JSON, but the problem is that I frequently need to access the JSON that came before, so I don't know that streaming will work. Also, my data types are not only strings, which prevents me from using Jackson's streaming capabilities (I believe). Does anyone know of a good solution?
If you're getting to the point where your data structures are so large that you're running out of memory, you'll have to start using something else. I would recommend that you use a database, which will significantly speed up data retrieval and storage. It will also make the limit of your data structure the size of your hard drive, instead of the size of your RAM.
Try this page for an introduction to Java and Databases.
I can't believe that you really need nearly 4GB RAM only for text files and JSON.
I see three possible solutions.
Switch to plain text if it's possible. That is not that memory hungry.
Just open and close the files as you need them. You can organize the files by a naming convention, such as the first two or three digits of their hashes, and open only the one you need.
If you have that much data, you could maybe switch to a database. That would save a lot of resources.
I would prefer option 3 if it's possible for you.
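If a database is not available, the second option can be sketched as follows, assuming each shared object can be written as one line of text (for example, compact JSON). The class name, the two-hex-digit bucket scheme, and the choice of SHA-256 are assumptions for illustration. The duplicate check from the question's add() then reads only one small bucket file instead of keeping the whole list in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.stream.Stream;

final class ShardedStore {
    private final Path dir;

    ShardedStore(Path dir) {
        this.dir = dir;
    }

    // Bucket file named after the first two hex digits of the record's SHA-256 hash.
    private Path bucketFor(String record) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(record.getBytes(StandardCharsets.UTF_8));
            return dir.resolve(String.format("%02x.txt", hash[0] & 0xff));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Adds the record unless it is already stored; returns true if it was added.
    // Records must be single-line strings.
    boolean addIfAbsent(String record) throws IOException {
        Path bucket = bucketFor(record);
        if (Files.exists(bucket)) {
            // Streams the bucket line by line; only this one file is read.
            try (Stream<String> lines = Files.lines(bucket, StandardCharsets.UTF_8)) {
                if (lines.anyMatch(record::equals)) {
                    return false;
                }
            }
        }
        Files.write(bucket, Collections.singletonList(record), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

With 256 buckets, each lookup scans roughly 1/256 of the data; a longer hash prefix gives more, smaller buckets. A real database still does this better, with proper indexes and crash safety.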
You could expose the data through an API and read what you need from the response body.

Rapidly changing Configuration/Status File? JAVA

I need some way to store a configuration/status file that changes rapidly. The file stores the status of each key-value pair, and that status has to be updated very frequently to track the state of a piece of communication hardware (digital multimedia broadcasting).
What is the best way to go about creating such a file? ini? XML? Any off the shelf filewriter in Java? I can't use databases.
It sounds like you need random access to update parts of the file frequently without rewriting the entire file. Design a binary file format and use the RandomAccessFile API to read and write it. You will want to use a fixed number of bytes for each key and each value, so that you can index into the middle of the file and update a value without having to rewrite all of the following records. Basically, you would be re-implementing how a database stores a table.
Another alternative is to only store a single key-value pair per file such that the cost of re-writing the file is minor. Maybe you can think of a way to use file name as the key and only store value in the file content.
I'd be inclined to try the second option unless you are dealing with more than a few thousand records.
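The first option (fixed-width records updated through RandomAccessFile) could look roughly like this. The 16-byte field widths, space padding, ASCII encoding, and all names are assumptions chosen for the sketch.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FixedRecordFile {
    static final int KEY_BYTES = 16;   // assumed fixed key width
    static final int VALUE_BYTES = 16; // assumed fixed value width
    static final int RECORD_BYTES = KEY_BYTES + VALUE_BYTES;

    // Pads text with spaces so that every field occupies exactly `width` bytes.
    private static byte[] pad(String text, int width) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > width) {
            throw new IllegalArgumentException("too long: " + text);
        }
        byte[] field = Arrays.copyOf(raw, width);
        Arrays.fill(field, raw.length, width, (byte) ' ');
        return field;
    }

    // Writes a whole record at the given slot.
    static void put(RandomAccessFile file, int slot, String key, String value) throws IOException {
        file.seek((long) slot * RECORD_BYTES);
        file.write(pad(key, KEY_BYTES));
        file.write(pad(value, VALUE_BYTES));
    }

    // Overwrites only the value of an existing slot; no other record is touched.
    static void updateValue(RandomAccessFile file, int slot, String value) throws IOException {
        file.seek((long) slot * RECORD_BYTES + KEY_BYTES);
        file.write(pad(value, VALUE_BYTES));
    }

    // Reads the value stored in a slot, with the padding removed.
    static String readValue(RandomAccessFile file, int slot) throws IOException {
        byte[] field = new byte[VALUE_BYTES];
        file.seek((long) slot * RECORD_BYTES + KEY_BYTES);
        file.readFully(field);
        return new String(field, StandardCharsets.US_ASCII).trim();
    }
}
```

An update writes 16 bytes at a computed offset, so its cost does not grow with the number of records. The mapping from key to slot number has to be kept somewhere, for example in a HashMap built once at startup.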
The obvious solution would be to put the "configuration" information into a Properties object, and then use Properties.store(...) or Properties.storeToXML(...) to save to a file output stream or writer.
You also need to do something to ensure that whatever is reading the file will see a consistent snapshot. For instance, you could write to a new file each time and do a delete / rename dance to replace the old with the new.
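A minimal sketch of the write-then-rename idea using Properties.store and java.nio.file.Files.move; the class name and the temporary file naming are assumptions. Whether an atomic move replaces an existing target is implementation-specific according to the Files.move documentation, so treat the last step as platform-dependent.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

final class StatusFile {
    // Writes the properties to a temporary sibling file, then renames it over the target,
    // so a reader sees either the old snapshot or the new one, never a half-written file.
    static void save(Properties status, Path target) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "status", ".tmp"); // same directory, same volume
        try (OutputStream out = Files.newOutputStream(tmp)) {
            status.store(out, "status snapshot");
        }
        // ATOMIC_MOVE asks for an all-or-nothing rename; it throws
        // AtomicMoveNotSupportedException where the file system cannot provide one.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Every call rewrites the whole file, which is fine for occasional updates but is exactly the disk traffic concern raised next.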
But if the update rate for the file is too high, you are going to create a lot of disk traffic, and you are bound to slow down your application. This is going to apply (eventually) no matter what file format / API you use. So, you may want to consider not writing to a file at all.
At some point, configuration that changes too rapidly becomes "program state" and not configuration. If it is changing so rapidly, why do you have confidence that you can meaningfully write it to, and then read it from, a filesystem?
Say more about what the status is and who the consumer of the data is...

best practices question: How to save a collection of images and a java object in a single file? File is read to be rendered

I am making a Java program that has a collection of flash-card-like objects. I store the objects in a JTree composed of DefaultMutableTreeNodes. Each node has a user object attached to it, which has a few string/native data type parameters. However, I also want each of these objects to have an image (typical formats: jpg, png, etc.).
I would like to be able to store all of this information, including the images and the tree data to the disk in a single file so the file can be transferred between users and the entire tree, including the images and parameters for each object, can be reconstructed.
I had not approached a problem like this before, so I was not sure what the best practices were. I found XMLEncoder (http://java.sun.com/j2se/1.4.2/docs/api/java/beans/XMLEncoder.html) to be a very effective way of storing my tree and the native data type information. However, I couldn't figure out how to save the image data itself inside the XML file, and I'm not sure it is possible since the data is binary (so restricted characters would be invalid). My next thought was to associate a hash string instead of an image with each user object, and then gzip together all of the images, with the hash strings as their names, and the XML-encoded tree in the same compressed file. That seemed really contrived though.
Does anyone know a good approach for this type of issue?
Thanks!
Assuming this isn't just a serializable graph, consider bundling the files together in Jar format. If you already have your data structures working with XMLEncoder, you can reuse this code by saving the data as a jar entry.
If memory serves, the jar library has better support for Unicode name entries than the zip package, which is why I would favour it.
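A rough sketch of the jar bundling idea with java.util.jar.JarOutputStream. The entry layout ("tree.xml" plus "images/<name>") and the class name are assumptions, and the XML is taken as a byte array so the existing XMLEncoder code can produce it however it already does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class CardBundle {
    // Writes the XML-encoded tree plus every image into one jar file.
    // Entry names ("tree.xml", "images/...") are an assumed layout.
    static void write(Path bundle, byte[] treeXml, Map<String, byte[]> imagesByName)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(bundle);
             JarOutputStream jar = new JarOutputStream(out)) {
            jar.putNextEntry(new JarEntry("tree.xml"));
            jar.write(treeXml);
            jar.closeEntry();
            for (Map.Entry<String, byte[]> image : imagesByName.entrySet()) {
                jar.putNextEntry(new JarEntry("images/" + image.getKey()));
                jar.write(image.getValue());
                jar.closeEntry();
            }
        }
    }
}
```

One way to obtain treeXml is to point XMLEncoder at a ByteArrayOutputStream and close the encoder before bundling; encoding straight into the jar stream is awkward because XMLEncoder.close() also closes the underlying stream. Each user object then only needs to store its image's entry name.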
You might consider using an MS JET database (.mdb file) and storing all the stuff in there. That'll also make it easy to examine and edit the data in (for example) MS Access.
You can employ a virtual file system, which stores its data in a single container. We develop and offer one such file system, SolFS; however, right now there's no Java binding for it. We will release a Java JNI interface for SolFS within a month.
