ObjectInputStream - reading large binary file - problems with memory - java

Before I proceed to my question: please note that I am not working on any client-server application that would require serialization; the program I am trying to customize stores one big instance of one big class in a .dat file. I have read about this issue (memory leak in ObjectOutputStream and ObjectInputStream) and the fact that I would probably need to:
use the ObjectOutputStream.reset() method after writing the class instance in the .dat file, so that it doesn't hold the reference anymore;
re-write the code without using serialization;
split the file and read it in chunks;
change the JVM memory parameter by using -Xmx;
So, I was provided with one class that generates a language model and saves it with a .dat extension; the code was probably optimized for small model files (there are 2 model files provided as examples, both around 10 MB), but I generated a much larger model class, and it is around 40 MB. Then, there is another class in another folder, totally independent of the first one, that uses this model, and the model has to be loaded using ObjectInputStream. Here comes the problem: a classic "OutOfMemoryError: Java heap space".
Writing the object:
try {
    // Create an output stream to the file.
    FileOutputStream file_output = new FileOutputStream(file);
    ObjectOutputStream o = new ObjectOutputStream(file_output);
    o.writeObject(this);
    o.close(); // closing the ObjectOutputStream flushes it and also closes file_output
}
catch (IOException e) {
    System.err.println("IO exception = " + e);
}
Reading the object:
InputStream model = null;
ModelGeneration oRead = null;
ObjectInputStream p = null;
try {
    model = new FileInputStream(filename);
    BufferedInputStream buf = new BufferedInputStream(model);
    p = new ObjectInputStream(buf);
    oRead = (ModelGeneration) p.readObject();
    p.reset();
} catch (IOException e) {
    e.printStackTrace();
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} finally {
    try {
        model.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I tried to use the reset() method, but it is useless because we only load one instance of one class at a time, nothing else is needed. This is also why I can't split the file: only one class instance is stored in the .dat file.
Changing the heap space seems like a worse solution than optimizing the code.
I would really appreciate your advice on what I can do.
Btw, the code is here: http://svn.apache.org/repos/asf/uima/addons/trunk/Tagger/; I only implemented the required classes for a different language.
P.S. It works fine if I create a smaller model, but I would prefer the bigger one.

Related

How do I handle storage in a Java console app that cannot use DB?

I am given an assignment where we are not allowed to use a DB or libraries, only a text file for data storage.
But it has rather complex requirements, e.g. many validations; because of that, we need to "access the db" (i.e. read the text file) many times.
My question is: should I create a class like this:
class SomeRepository {
    static ArrayList<Users> users = new ArrayList<>();

    public SomeRepository() {
        // Instantiate this class on program load.
        // In the constructor, we read the text file, instantiate and store everything inside the ArrayList.
    }

    // public getOneUser() { // for get methods, we don't read from the text file at all }
    // public save() { // text file saving code over here }
}
Is this a good approach to solve the above problem? Currently, what we are doing is reading and writing to the text file every time we want to retrieve some data or write something new.
Wouldn't keeping everything in an ArrayList like this be too expensive in terms of heap memory? Or should I just read/write the text file in every method?
public class IOManager {

    public static void writeObjToTxtFile(String fileName, Object object) {
        File file = new File(fileName + ".txt"); // the file will be created in the root directory where the program runs
        try (FileOutputStream fos = new FileOutputStream(file);
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(object);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Object readObjFromTxtFile(String fileName) {
        Object obj = null;
        File file = new File(fileName + ".txt");
        try (FileInputStream fis = new FileInputStream(file);
             ObjectInputStream ois = new ObjectInputStream(fis)) {
            obj = ois.readObject();
        } catch (ClassNotFoundException | IOException e) {
            e.printStackTrace();
        }
        return obj;
    }
}
Add this class to your project. Since it works for any Object, you can pass and receive objects such as ArrayList<Users> as well. Play around and tinker with it to fit whatever your specific purpose is. Hint: you can write other custom methods that call these methods, e.g.:
public static void writeUsersToFile(ArrayList<Users> usersArrayList) {
    writeObjToTxtFile("users", usersArrayList);
}
P.S. Make sure your objects implement Serializable, e.g.:
public class Users implements Serializable {
}
I would suggest reading the contents of your file into a dynamic list such as an ArrayList at the start of your program. Make the required queries/changes on your ArrayList and then write that ArrayList back to your file when the program is about to close. This will save significant time over repeated file reads/writes.
This isn't without its drawbacks, though. You don't want to hog memory in case of very large files, but considering this is an assignment, that may not be the case. Additionally, should your program terminate prior to the write at the end, all changes made to your database during the current execution will be lost.
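A minimal sketch of that load-once, save-on-close pattern might look like this (the UserRepository name is illustrative, and it reuses the IOManager helpers from the other answer):
import java.util.ArrayList;

public class UserRepository {
    private ArrayList<Users> users;

    @SuppressWarnings("unchecked")
    public UserRepository() {
        // Load everything once at startup; fall back to an empty list on the first run.
        Object loaded = IOManager.readObjFromTxtFile("users");
        users = (loaded != null) ? (ArrayList<Users>) loaded : new ArrayList<>();
    }

    public ArrayList<Users> getUsers() {
        return users; // in-memory reads, no file access needed
    }

    public void save() {
        // Write the whole list back, e.g. right before the program exits.
        IOManager.writeObjToTxtFile("users", users);
    }
}
Calling save() from a JVM shutdown hook (Runtime.getRuntime().addShutdownHook(...)) can reduce the risk of losing changes on a normal exit, though it won't help if the process is killed outright.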

Copied DocumentFile has different size and hash to original

I'm attempting to copy / duplicate a DocumentFile in an Android application, but upon inspecting the created duplicate, it does not appear to be exactly the same as the original (which is causing a problem, because I need to do an MD5 check on both files the next time a copy is called, so as to avoid overwriting the same files).
The process is as follows:
User selects a file from a ACTION_OPEN_DOCUMENT_TREE
Source file's type is obtained
New DocumentFile in target location is initialised
Contents of the first file are duplicated into the second file
The initial stages are done with the following code:
// Get the source file's type
String sourceFileType = MimeTypeMap.getSingleton().getExtensionFromMimeType(contextRef.getContentResolver().getType(file.getUri()));
// Create the new (empty) file
DocumentFile newFile = targetLocation.createFile(sourceFileType, file.getName());
// Copy the file
CopyBufferedFile(new BufferedInputStream(contextRef.getContentResolver().openInputStream(file.getUri())), new BufferedOutputStream(contextRef.getContentResolver().openOutputStream(newFile.getUri())));
The main copy process is done using the following snippet:
void CopyBufferedFile(BufferedInputStream bufferedInputStream, BufferedOutputStream bufferedOutputStream)
{
    // Duplicate the contents of the temporary local File to the DocumentFile
    try
    {
        byte[] buf = new byte[1024];
        bufferedInputStream.read(buf);
        do
        {
            bufferedOutputStream.write(buf);
        }
        while (bufferedInputStream.read(buf) != -1);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    finally
    {
        try
        {
            if (bufferedInputStream != null) bufferedInputStream.close();
            if (bufferedOutputStream != null) bufferedOutputStream.close();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
The problem that I'm facing is that although the file copies successfully and is usable (it's a picture of a cat, and it's still a picture of a cat at the destination), it is slightly different.
The file size has changed from 2261840 to 2262016 (+176)
The MD5 hash has changed completely
Is there something wrong with my copying code that is causing the file to change slightly?
Thanks in advance.
Your copying code is incorrect. It is assuming (incorrectly) that each call to read will either return buffer.length bytes or return -1.
What you should do is capture the number of bytes read in a variable each time, and then write exactly that number of bytes. Your code for closing the streams is verbose and (in theory1) buggy as well.
Here is a rewrite that addresses both of those issues, and some others as well.
void copyBufferedFile(BufferedInputStream bufferedInputStream,
                      BufferedOutputStream bufferedOutputStream)
        throws IOException
{
    try (BufferedInputStream in = bufferedInputStream;
         BufferedOutputStream out = bufferedOutputStream)
    {
        byte[] buf = new byte[1024];
        int nosRead;
        while ((nosRead = in.read(buf)) != -1) // read this carefully ...
        {
            out.write(buf, 0, nosRead);
        }
    }
}
As you can see, I have gotten rid of the bogus "catch and squash exception" handlers, and fixed the resource leak using Java 7+ try with resources.
There are still a couple of issues:
It is better for the copy function to take file name strings (or File or Path objects) as parameters and be responsible for opening the streams.
Given that you are doing block reads and writes, there is little value in using buffered streams. (Indeed, it might conceivably be making the I/O slower.) It would be better to use plain streams and make the buffer the same size as the default buffer size used by the Buffered* classes .... or larger.
If you are really concerned about performance, try using transferFrom as described here:
https://www.journaldev.com/861/java-copy-file
1 - In theory, if the bufferedInputStream.close() throws an exception, the bufferedOutputStream.close() call will be skipped. In practice, it is unlikely that closing an input stream will throw an exception. But either way, the try-with-resources approach deals with this correctly, and far more concisely.
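For the transferFrom suggestion above, a minimal sketch using FileChannel might look like this (the method and path parameters are illustrative, and it applies to plain files rather than content-resolver streams):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

static void copyFileChannel(String sourcePath, String destPath) throws IOException {
    try (FileChannel source = new FileInputStream(sourcePath).getChannel();
         FileChannel dest = new FileOutputStream(destPath).getChannel()) {
        // Let the OS move the bytes; loop because transferFrom may copy
        // fewer bytes than requested in a single call.
        long position = 0;
        long size = source.size();
        while (position < size) {
            position += dest.transferFrom(source, position, size - position);
        }
    }
}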

Split big object before persisting it to file

I have an object that occupies approximately 15 GB in my Java app's heap, and I need to keep it persistent between JVM restarts. I use ObjectOutputStream's writeObject method to write it to a file on disk at regular intervals. Since the writing process is very long (a few minutes) and causes some GC issues, I would like to split the object somehow, persisting each part separately to a different file rather than in a single action to a single file.
Is there a way to do this (and of course to retrieve it back from the files when I need it)?
FileOutputStream fos = null;
GZIPOutputStream gos = null;
ObjectOutputStream oos = null;
try {
    fos = new FileOutputStream("some_path");
    gos = new GZIPOutputStream(fos);
    oos = new ObjectOutputStream(gos);
    oos.writeObject(myLargeObject);
    oos.flush();
    gos.close();
    fos.close();
} catch (Exception e) {
    e.printStackTrace();
}
You may want to take a look at this answer. The reverse process of recomposing the split files is nothing more than reading from each individual file and appending to a "master" one.
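As a rough sketch of that recomposition step (the part-file naming scheme and the MyLargeObject type are assumptions, not from the question), the split files could be concatenated with a SequenceInputStream and deserialized as one logical stream:
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

static MyLargeObject loadFromParts(String basePath) throws IOException, ClassNotFoundException {
    // Collect the parts in order: basePath.part0, basePath.part1, ...
    List<InputStream> parts = new ArrayList<>();
    for (int i = 0; new File(basePath + ".part" + i).exists(); i++) {
        parts.add(new BufferedInputStream(new FileInputStream(basePath + ".part" + i)));
    }
    // Concatenate them into one logical stream and deserialize as usual.
    try (ObjectInputStream ois = new ObjectInputStream(
            new GZIPInputStream(new SequenceInputStream(Collections.enumeration(parts))))) {
        return (MyLargeObject) ois.readObject();
    }
}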

Saving and Loading Custom Objects in Java Program

I am writing a small program to help with planning future workouts. I am nearly finished; however, saving and loading is giving me some trouble. The program works with a list of "Ride" (a custom class) objects that hold a number of fields (a Date, and then some ints and doubles).
Right now, I have two methods, a "saver" and a "loader":
public void saver() {
    try { // Catch errors in I/O if necessary.
        // Open a file to write to, named SaveObj.sav.
        FileOutputStream saveFile = new FileOutputStream("SaveObj.sav");
        // Create an ObjectOutputStream to put objects into the save file.
        ObjectOutputStream save = new ObjectOutputStream(saveFile);
        // Now we do the save.
        for (int x = 0; x < rides.size(); x++) {
            save.writeObject(rides.get(x).getDate());
            save.writeObject(rides.get(x).getMinutes());
            save.writeObject(0);
            save.writeObject(rides.get(x).getIF());
            save.writeObject(rides.get(x).getTss());
        }
        // Close the file.
        save.close(); // This also closes saveFile.
    }
    catch (Exception exc) {
        exc.printStackTrace(); // If there was an error, print the info.
    }
}
public void loader() {
    try {
        // Open the file to read from, named SaveObj.sav.
        FileInputStream saveFile = new FileInputStream("SaveObj.sav");
        // Create an ObjectInputStream to get objects from the save file.
        ObjectInputStream save = new ObjectInputStream(saveFile);
        Ride worker;
        while (save.available() > 0) {
            worker = new Ride((Date) save.readObject(), (int) save.readObject(),
                    (double) save.readObject(), (double) save.readObject(), (int) save.readObject());
            addRide(worker.getDate(), worker.getMinutes(), 0, worker.getIF(), worker.getTss());
        }
        // Close the file.
        save.close(); // This also closes saveFile.
    }
    catch (Exception exc) {
        exc.printStackTrace(); // If there was an error, print the info.
    }
}
When I run the program, neither "save" nor "load" return any errors. A .sav file is created when one does not exist, and is edited each time the program is executed. Yet, the program never restores data from previous sessions. Please let me know if more information is required.
Thanks in advance for the help!
Don't use available(), which returns the number of bytes that can be read without blocking. It does not mean that all bytes have been read.
If your objects are never null, you could use readObject() returning null as the check that all data has been read from the input stream.
Date date = null;
while ((date = (Date) save.readObject()) != null) {
    worker = new Ride(date, (int) save.readObject(), (double) save.readObject(),
            (double) save.readObject(), (int) save.readObject());
    addRide(worker.getDate(), worker.getMinutes(), 0, worker.getIF(), worker.getTss());
}
Otherwise, if the values read may be null, you could serialize the Ride object directly (or a class containing all the fields to serialize) rather than the individual fields, which could be null. With that, the readObject() check to know whether all data has been read could work.
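To illustrate that, a minimal sketch (assuming Ride implements Serializable and rides is an ArrayList<Ride>; the method names are illustrative) could serialize the whole list in one call:
public void saveRides() throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("SaveObj.sav"))) {
        out.writeObject(new ArrayList<>(rides)); // one call writes the whole list
    }
}

@SuppressWarnings("unchecked")
public void loadRides() throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("SaveObj.sav"))) {
        rides = (ArrayList<Ride>) in.readObject(); // single read, no EOF bookkeeping needed
    }
}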
Do not use available() as a condition. It just tells you whether some bytes are available for immediate reading without any delay; it does not mean the stream has reached its end.
Also, you should probably add a BufferedInputStream and BufferedOutputStream between the Object and File streams; that's almost always a good idea.
To solve your issue you could, for example, first write an integer in the save method that tells you how many objects are in the file; on load, read that integer and then run a simple for loop for that many objects, as sketched below.
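A minimal sketch of that count-prefix idea, reusing the field order from the question's code (the 0 placeholder is written as a double here because the loader casts it to double):
// Save: write the number of rides first, then the rides themselves.
save.writeInt(rides.size());
for (int x = 0; x < rides.size(); x++) {
    save.writeObject(rides.get(x).getDate());
    save.writeObject(rides.get(x).getMinutes());
    save.writeObject(0.0); // written as a double so the loader's (double) cast succeeds
    save.writeObject(rides.get(x).getIF());
    save.writeObject(rides.get(x).getTss());
}
save.close();

// Load: read the count back and loop exactly that many times instead of using available().
int count = save.readInt();
for (int i = 0; i < count; i++) {
    Ride worker = new Ride((Date) save.readObject(), (int) save.readObject(),
            (double) save.readObject(), (double) save.readObject(), (int) save.readObject());
    addRide(worker.getDate(), worker.getMinutes(), 0, worker.getIF(), worker.getTss());
}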
Or you could add a PushbackInputStream into the chain and use its read() method as the EOF check. It will return -1 at EOF, and you can stop reading. If it returns anything else, you unread() the byte you just read and continue with the ObjectInputStream placed on top.

Has anyone else seen the Java XML FastInfoset library corrupt text?

I read the claims from Sun people about the wonderful space economy of not only using FastInfoSet, but using it with an external vocab. The code for this purpose is included in the most recent version (1.2.8), but it is not exactly fully documented.
For many files, this works just great for me. However, we've come up with an XML file which, when serialized from DOM with the vocab I created (using the generator in the FI library), and then read back into DOM, mismatches. The mismatches are all in PC-data.
I just call setVocabulary on the serializer and setExternalVocabulary with a map from URI to vocabulary on the reader.
I had to invent my own mechanism to actually serialize a vocabulary; there didn't seem to be one anywhere in the FI library.
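Roughly, that mechanism is nothing more than standard Java serialization of the wrapped vocabulary, something like this sketch (the serializableVocabulary variable and the file name are illustrative):
try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("analysis-vocab.ser"))) {
    oos.writeObject(serializableVocabulary);
}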
One fiddly bit of business is that the org.jvnet.fastinfoset.Vocabulary class is what the generator gives you, but it's not what the parsers and serializers eat. I made arrangements to serialize these, and then use the code below to turn them into the needed objects:
private static void initializeAnalysis() {
    InputStream is = FastInfosetUtils.class.getResourceAsStream(ANALYSIS_VOCAB_CLASSPATH);
    try {
        ObjectInputStream ois = new ObjectInputStream(is);
        analysisJvnetVocab = (SerializableVocabulary) ois.readObject();
        ois.close();
    } catch (IOException e) {
        throw new RuntimeException(e);
    } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
    analysisSerializerVocab = new SerializerVocabulary(analysisJvnetVocab.getVocabulary(), false);
    analysisParserVocab = new ParserVocabulary(analysisJvnetVocab.getVocabulary());
}
and then, to actually write a document:
SerializerVocabulary fullVocab = new SerializerVocabulary();
fullVocab.setExternalVocabulary(ANALYSIS_VOCAB_URI, analysisSerializerVocab, false);
// pass fullVocab to setVocabulary.
and to read:
Map<Object, Object> vocabMap = new HashMap<Object, Object>();
vocabMap.put(ANALYSIS_VOCAB_URI, analysisParserVocab);
// pass map into setExternalVocabulary
I could easily imagine that my recipe for creating serialization vocabularies is not right; it's not like I was following a tutorial. Does anyone happen to know?
UPDATE
Since no one 'round here had anything to add to this question, I made a test case and filed a bug report. Somewhat to my surprise, it turned out that it was, in fact, a bug, and a fix has been made.
