Split big object before persisting it to file - java

I have an object that occupies approximately 15GB of my Java app's heap, and I need it to persist across JVM restarts. I currently use ObjectOutputStream's writeObject method to write it to a file on disk at a fixed interval. Since the writing process is very long (a few minutes) and causes some GC issues, I would like to split the object somehow and persist each part to a separate file, rather than writing everything to a single file in one action.
Is there a way to do this (and of course to retrieve it back from the files when I need it)?
// Current approach: one big writeObject call through a GZIP stream.
// try-with-resources closes the whole chain (including oos, which the
// original version never closed), even if an exception is thrown.
try (FileOutputStream fos = new FileOutputStream("some_path");
     GZIPOutputStream gos = new GZIPOutputStream(fos);
     ObjectOutputStream oos = new ObjectOutputStream(gos)) {
    oos.writeObject(myLargeObject);
} catch (IOException e) {
    e.printStackTrace();
}

You may want to take a look at this answer. The reverse process of recomposing the split files is nothing more than reading from each individual file and appending to a "master" one.
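For illustration, a minimal sketch of that idea: splitting the already-serialized file into fixed-size parts and recomposing them later. The chunk size, ".partN" naming, and part counting are assumptions of this sketch, not taken from the linked answer:

import java.io.*;

public class FileSplitter {

    // Split "source" into source.part0, source.part1, ...
    // Each part holds roughly chunkSize bytes (it may overshoot by up to one buffer).
    static void split(File source, long chunkSize) throws IOException {
        byte[] buffer = new byte[8192];
        try (InputStream in = new BufferedInputStream(new FileInputStream(source))) {
            int part = 0;
            int read = in.read(buffer);
            while (read != -1) {
                long written = 0;
                try (OutputStream out = new BufferedOutputStream(
                        new FileOutputStream(source.getPath() + ".part" + part++))) {
                    do {
                        out.write(buffer, 0, read);
                        written += read;
                        read = in.read(buffer);
                    } while (read != -1 && written < chunkSize);
                }
            }
        }
    }

    // Recompose: read each part in order and append it to the "master" file.
    static void join(File target, int partCount) throws IOException {
        byte[] buffer = new byte[8192];
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            for (int part = 0; part < partCount; part++) {
                try (InputStream in = new BufferedInputStream(
                        new FileInputStream(target.getPath() + ".part" + part))) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                }
            }
        }
    }
}

Note that this splits the bytes on disk after the slow writeObject call has already happened; it doesn't shorten any single write, but it does let you store and move the data in smaller pieces.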

Related

How do I handle storage in a Java console app that cannot use DB?

I have been given an assignment where we are not allowed to use a DB or libraries, only text files, for data storage.
But it has rather complex requirements, e.g. many validations; because of that, we need to "access the db" (i.e. read the text file) many times.
My question is: should I create a class like this:
class SomeRepository {
    static ArrayList<Users> users = new ArrayList<>();

    public SomeRepository() {
        // Instantiate this class on program load.
        // In the constructor, we read the text file, instantiate the objects,
        // and store everything in the ArrayList.
    }

    // public getOneUser() { /* for get methods, we don't read the text file at all */ }
    // public save() { /* text-file saving code over here */ }
}
Is this a good approach to the problem above? Currently, we read and write the text file every time we want to retrieve some data or store something new.
Wouldn't keeping everything in memory be too expensive in terms of heap space? Or should I just read/write the text file in every method?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class IOManager {

    public static void writeObjToTxtFile(String fileName, Object object) {
        // The file is created in the working directory the program runs from.
        File file = new File(fileName + ".txt");
        try (FileOutputStream fos = new FileOutputStream(file);
             ObjectOutputStream oos = new ObjectOutputStream(fos)) {
            oos.writeObject(object);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Object readObjFromTxtFile(String fileName) {
        File file = new File(fileName + ".txt");
        // try-with-resources closes both streams; the original leaked them.
        try (FileInputStream fis = new FileInputStream(file);
             ObjectInputStream ois = new ObjectInputStream(fis)) {
            return ois.readObject();
        } catch (ClassNotFoundException | IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
Add this class to your project. Since it is general for all Objects, you can pass and receive objects such as ArrayList<Users> as well. Tinker with it to fit whatever your specific purpose is. Hint: you can write other custom methods that call these methods, e.g.:
public static void writeUsersToFile(ArrayList<Users> usersArrayList) {
    writeObjToTxtFile("users", usersArrayList);
}
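And a matching reader might look like this (a hypothetical helper; note the unchecked cast back from Object):

@SuppressWarnings("unchecked")
public static ArrayList<Users> readUsersFromFile() {
    return (ArrayList<Users>) readObjFromTxtFile("users");
}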
P.S. Make sure your objects implement Serializable, e.g.:
public class Users implements Serializable {
}
I would suggest reading the contents of your file into a dynamic list such as an ArrayList at the start of your program. Make the required queries/changes against the ArrayList and then write it back to your file when the program is about to close. This saves significant time over repeated file reads/writes.
This isn't without its drawbacks, though. You don't want to hog memory in the case of very large files - but considering this is an assignment, that may not be a concern. Additionally, should your program terminate before the final write, all changes made to your "database" during the current execution will be lost.
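A minimal sketch of that load-once/save-once life cycle, reusing the IOManager helpers above (runProgram is a placeholder for your actual logic):

import java.util.ArrayList;

public class App {
    public static void main(String[] args) {
        // Load once at startup; fall back to an empty list on the first run.
        @SuppressWarnings("unchecked")
        ArrayList<Users> users =
                (ArrayList<Users>) IOManager.readObjFromTxtFile("users");
        if (users == null) {
            users = new ArrayList<>();
        }

        runProgram(users); // all queries/validations hit the in-memory list

        // Persist once on shutdown.
        IOManager.writeObjToTxtFile("users", users);
    }

    private static void runProgram(ArrayList<Users> users) {
        // ... program logic operating on the list ...
    }
}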

ObjectInputStream - reading large binary file - problems with memory

Before I proceed to my question, please note that I am not working on a client-server application that would require serialization; rather, the program I am trying to customize stores one big instance of one big class in a .dat file. I have read about this issue (memory leaks with ObjectOutputStream and ObjectInputStream) and the fact that I would probably need to:
use the ObjectOutputStream.reset() method after writing the class instance to the .dat file, so that the stream doesn't hold the reference anymore (see the sketch after this list);
re-write the code without using serialization;
split the file and read it in chunks;
change the JVM memory parameter by using -Xmx.
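For the first option, ObjectOutputStream.reset() does exist on the writing side: it discards the stream's internal handle table, so objects already written become eligible for garbage collection. A minimal sketch (the file name is a placeholder):

try (ObjectOutputStream oos = new ObjectOutputStream(
        new BufferedOutputStream(new FileOutputStream("model.dat")))) {
    oos.writeObject(model);
    oos.reset(); // drop the stream's back-reference to the written graph
}

Note that this only helps when the same stream stays open for further writes; it does nothing for the single readObject() call that is running out of memory here.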
So, I was provided with a class that generates a language model and saves it with a .dat extension. The code was probably optimized for small model files (the two model files provided as examples are both around 10MB), but the model I generated is much larger, around 40MB. Then there is another class in another folder, totally independent of the first one, that uses this model, and the model has to be loaded using ObjectInputStream. Here comes the problem: a classic "OutOfMemoryError: Java heap space".
Writing the object:
// Create an output stream to the file.
try (FileOutputStream file_output = new FileOutputStream(file);
     ObjectOutputStream o = new ObjectOutputStream(file_output)) {
    // Closing o (not just file_output) flushes the ObjectOutputStream's
    // internal buffer; the original closed only the FileOutputStream.
    o.writeObject(this);
} catch (IOException e) {
    System.err.println("IO exception = " + e);
}
Reading the object:
InputStream model = null;
ModelGeneration oRead = null;
ObjectInputStream p = null;
try {
    model = new FileInputStream(filename);
    BufferedInputStream buf = new BufferedInputStream(model);
    p = new ObjectInputStream(buf);
    oRead = (ModelGeneration) p.readObject();
    p.reset(); // the attempted fix discussed below
} catch (IOException e) {
    e.printStackTrace();
} catch (ClassNotFoundException e) {
    e.printStackTrace();
} finally {
    try {
        model.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I tried to use the reset() method, but it is useless here because we load only one instance of one class at a time; nothing else is needed. This is also why I can't split the file: only one class instance is stored in the .dat file.
Changing the heap space seems like a worse solution than optimizing the code.
I would really appreciate your advice on what I can do.
By the way, the code is here: http://svn.apache.org/repos/asf/uima/addons/trunk/Tagger/; I only implemented the required classes for a different language.
P.S. It works fine if I create a smaller model, but I would prefer the bigger one.

Writing a serialized object to a file - output garbled?

try
{
    File dataFile = new File("C:/Users/keatit/Desktop/players.txt");
    if (!dataFile.exists())
    {
        dataFile.createNewFile();
    }
    FileOutputStream fos = new FileOutputStream("C:/Users/keatit/Desktop/players.txt");
    ObjectOutputStream oos = new ObjectOutputStream(fos);
    oos.writeObject(players);
    oos.close();
}
catch (FileNotFoundException fnfex)
{
    System.out.println(fnfex.getMessage());
}
catch (IOException ioex)
{
    System.out.println(ioex.getMessage());
}
catch(FileNotFoundException fnfex)
{
System.out.println(fnfex.getMessage());
}
catch(IOException ioex)
{
System.out.println(ioex.getMessage());
}
I have a class player which implements Serializable, but when I write objects to the file the text is messed up and looks like the following. Any help would be much appreciated. Thank you.
"¬í sr java.util.ArrayListxÒ™Ça I sizexp w sr players.playerÌ`~%×êòœ I ageL firstNamet Ljava/lang/String;xp t Trevorsq ~ t Michaelax"
This is binary serialization. It isn't meant to produce a human-readable text file; for that, you should look into something like JSON or YAML. I'd strongly recommend against writing ObjectOutputStream output to a .txt file - it gives the wrong impression.
The point of binary serialization is to be able to deserialize the data later with the same serialization protocol - so in this case you'd use ObjectInputStream. You should find that it is able to correctly deserialize the object stored in your file.
(Side-note: FileOutputStream will create a new file automatically if it doesn't exist - you don't need to do so yourself. Additionally, you should use a try-with-resources statement to clean up automatically, rather than just calling close() outside a finally block.)
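To illustrate both points, a minimal sketch of the corrected round trip, assuming the list's element type is your player class and renaming the file to .ser to avoid the text-file impression:

// Writing: FileOutputStream creates the file if it doesn't exist, and
// try-with-resources closes the stream even if writeObject throws.
try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream("C:/Users/keatit/Desktop/players.ser"))) {
    oos.writeObject(players);
} catch (IOException ioex) {
    System.out.println(ioex.getMessage());
}

// Reading it back with the matching ObjectInputStream.
try (ObjectInputStream ois = new ObjectInputStream(
        new FileInputStream("C:/Users/keatit/Desktop/players.ser"))) {
    @SuppressWarnings("unchecked")
    ArrayList<player> restored = (ArrayList<player>) ois.readObject();
} catch (IOException | ClassNotFoundException ex) {
    System.out.println(ex.getMessage());
}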

Serializing a HashMap, but the text file used for input clears itself on every program run - Java

I have already looked at other questions about this but I don't think any of them match mine.
The gist of my program is to create a HashMap of stocks (the key is the ticker and the value is a Stock object) and then, when the program ends, export the HashMap to a text file. The next time I run the program, I read the HashMap back in and continue. So far all my other functionality works, and I can see some data in the text file after a run, but when I run the program again the text file is cleared.
I suspect that when I declare a new FileInputStream and ObjectInputStream, they are somehow deleting the information in that text file and leaving it empty.
My code is as follows:
stockInfo = new HashMap<String, Stock>(10000);
Scanner in = new Scanner(System.in);
File file = new File("mp4output.txt");
fos = new FileOutputStream(file);
oos = new ObjectOutputStream(fos);
fis = new FileInputStream(file);
ois = new ObjectInputStream(fis);
This is how I declare my I/O streams to read in the HashMap.
Next, I try to actually read the HashMap with:
try {
    while (fis.available() > 0) {
        Stock test = (Stock) ois.readObject();
        System.out.println("Stock: " + test.getCompany());
        System.out.println("High: " + test.getHigh());
        System.out.println("Volume: " + test.getHigh());
        System.out.println("Low: " + test.getLow());
        System.out.println("Close: " + test.getClose());
        System.out.println("Open: " + test.getOpen());
        System.out.println("Range: " + test.getRange());
        System.out.println("52 Week average: " + test.getFiftyAvg());
        System.out.println("Current Price: " + test.getcurrentPrice());
    }
}
catch (Exception ex) {
    ex.printStackTrace();
}
However, it never runs this loop, because fis.available() always returns 0: the file has already been emptied.
I feel like I have made a very dumb error somewhere but I cannot find it. Any help would be appreciated!
The FileOutputStream will - by default (or, whenever not created for appending) - truncate the output file. This truncation happens when the FOS is created, and before the FIS has a chance to read the data.
A general solution in a case like this - where the input and output are to the same file - is to read all the input, close the input stream, and then open the output stream and write the new data. In this case the truncation behavior works just fine.
With this in mind, a skeleton may look like this:
Map<String, Stock> readStocks(ObjectInputStream ois) {
    // Read all
}

void writeStocks(ObjectOutputStream oos, Map<String, Stock> stocks) {
    // Write all
}

Map<String, Stock> stocks;

// Use try-with-resources (Java 7+) to make life easier;
// the OIS (and underlying FIS) are guaranteed to be closed
// after this block ends.
try (ObjectInputStream ois = new ObjectInputStream(
        new FileInputStream(file))) {
    stocks = readStocks(ois);
}

// Make changes to loaded data
// (i.e. in accordance with user-supplied input)
updateStockData(stocks);

try (ObjectOutputStream oos = new ObjectOutputStream(
        new FileOutputStream(file))) {
    writeStocks(oos, stocks);
}
The Object Input/Output Streams are merely wrappers over the underlying FIS/FOS, but aren't responsible for the lower-level truncation or IO.
Also, while OIS/OOS are probably sufficient for this task, I would recommend using JSON, because it is "relatively painless" with POJO mappers (e.g. see Gson) and - really why I recommend it - the output is human-readable text.
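For example, the Gson route might look like this (Stock is assumed to be a plain POJO; the file name is a placeholder):

import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.io.FileReader;
import java.io.FileWriter;
import java.lang.reflect.Type;
import java.util.Map;

// Writing: human-readable JSON instead of opaque serialized bytes.
Gson gson = new Gson();
try (FileWriter writer = new FileWriter("stocks.json")) {
    gson.toJson(stocks, writer);
}

// Reading: TypeToken preserves the generic Map<String, Stock> type.
Type type = new TypeToken<Map<String, Stock>>() {}.getType();
try (FileReader reader = new FileReader("stocks.json")) {
    Map<String, Stock> stocks = gson.fromJson(reader, type);
}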
Also, it's not well-defined across operating systems how multiple FIS/FOS objects on the same file interact. The FileOutputStream documentation leaves this vague note (but doesn't specify anything about interactions between FIS/FOS objects where an exception is not thrown):
"A file output stream is an output stream for writing data to a File or to a FileDescriptor. [...] Some platforms, in particular, allow a file to be opened for writing by only one FileOutputStream (or other file-writing object) at a time. In such situations the constructors in this class will fail if the file involved is already open."

Size of object serialized in File

How do I measure the size of my object - that is, the amount of space occupied by my object once serialized to the file?
FileOutputStream fos = new FileOutputStream("personne.serial");
ObjectOutputStream oos = new ObjectOutputStream(fos);
try {
    oos.writeObject(p);
    oos.flush();
} finally {
    try {
        oos.close();
    } finally {
        fos.close();
    }
}
As far as I am aware, the amount of memory an object occupies is implementation-dependent: it may differ between JVMs and also depends on the platform the JVM is running on. I do not believe it is possible to calculate this accurately using the standard APIs.
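That said, if the question is about the size of the serialized form (the bytes in the file) rather than the in-memory footprint, that part is easy to measure, e.g. by checking the file length or counting bytes through a ByteArrayOutputStream. A minimal sketch:

// Size of the serialized form already written to disk:
long bytesOnDisk = new File("personne.serial").length();

// Or measure in memory without touching the disk:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
    oos.writeObject(p);
}
long serializedBytes = baos.size();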
