Friends,
In my application, I came across a scenario where the user may request a report download as a flat file, which may contain a maximum of 1.7 million (17 lakh) records, around 650 MB of data. During this request, my application server either stops serving other threads or throws an out-of-memory exception.
As of now I am iterating through the result set and printing it to the file.
When I googled for this, I came across an API named OpenCSV. I tried that too, but I didn't see any improvement in performance.
Please help me out on this.
Thanks for the quick responses, guys. Here is my code snippet:
try {
    response.setContentType("application/csv");
    PrintWriter dout = response.getWriter();
    while (rs.next()) {
        dout.print(csvRow); // Here I am printing my ResultSet tuples into the flat file (csvRow is the formatted row).
        dout.print("\r\n");
        dout.flush();
    }
}
OpenCSV will cleanly deal with the eccentricities of the CSV format, but a large report is still a large report. Take a look at the specific memory error; it sounds like you need to increase the heap or max perm gen space (it will depend on the error, to be sure). Without any adjustment the JVM will only occupy a fixed amount of RAM (in my experience this number is 64 MB).
If you only stream the data from the result set to the file, without building big buffers, this should be possible. But maybe you are first collecting the data in a growing list before sending it to the file? You should investigate that.
Please specify your question in more detail, otherwise we have to speculate.
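A minimal sketch of that streaming approach, assuming a plain JDBC Statement and the servlet response from the question; the query, the fetch size of 1000, and the buildCsvRow helper are illustrative assumptions, not names from the original code:

response.setContentType("application/csv");
response.setHeader("Content-Disposition", "attachment; filename=\"report.csv\"");
try (Statement stmt = connection.createStatement()) {
    // Hint to the driver to fetch rows in batches instead of all at once
    // (MySQL's Connector/J needs Integer.MIN_VALUE here to truly stream).
    stmt.setFetchSize(1000);
    try (ResultSet rs = stmt.executeQuery("SELECT col1, col2, col3 FROM report_data")) {
        PrintWriter out = response.getWriter();
        while (rs.next()) {
            out.print(buildCsvRow(rs)); // format one row at a time; never collect rows in a list
            out.print("\r\n");
        }
        out.flush();
    }
}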
The CSV format itself isn't limited by memory. The only memory-sensitive part is pre-populating the data for the CSV, but that can be done efficiently as well, for example by querying subsets of rows from the DB using LIMIT/OFFSET (sketched below) and immediately writing them to the file, instead of hauling the entire DB table contents into Java's memory before writing a single line. Also note that the Excel limit on the number of rows in one "sheet" has increased to about one million in newer versions (Excel 2007+).
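A hedged sketch of that LIMIT/OFFSET idea (the table, column names, page size, and writer are illustrative assumptions; the syntax shown is MySQL/PostgreSQL style):

int pageSize = 10000;
try (PreparedStatement ps = connection.prepareStatement(
        "SELECT col1, col2, col3 FROM report_data ORDER BY id LIMIT ? OFFSET ?")) {
    for (int offset = 0; ; offset += pageSize) {
        ps.setInt(1, pageSize);
        ps.setInt(2, offset);
        int rows = 0;
        try (ResultSet page = ps.executeQuery()) {
            while (page.next()) {
                // Write each page straight to the output, so only one page is in memory at a time.
                writer.print(page.getString("col1") + "," + page.getString("col2") + "," + page.getString("col3"));
                writer.print("\r\n");
                rows++;
            }
        }
        if (rows < pageSize) {
            break; // last (partial) page written, we're done
        }
    }
}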
Most decent DBs have an export-to-CSV function which can undoubtedly do this task much more efficiently. In the case of MySQL, for example, you can use the SELECT ... INTO OUTFILE statement for this (LOAD DATA INFILE is its import counterpart).
Related
Quick question: I have lots of data to upload, on average 2-3 KB per request, maybe 2000 requests/second. Two options:
1. Use nginx-upload-module to save the file to disk and then read the file in Java.
2. Just send all the data to Java directly; Java saves the data to a file if it cannot process it in time.
Which one should I use, and why?
Thanks in advance!
I need to sort a huge CSV file (10+ million records) with several algorithms in Java, but I'm having problems with the amount of memory.
Basically I have a huge CSV file where every record has 4 fields of different types (String, int, double).
I need to load this CSV into some structure and then sort it by each of the fields.
My idea was to write a Record class (with its own fields), read the CSV file line by line, make a new Record object for every line, and put them all into an ArrayList. Then call my sorting algorithms for each field.
It doesn't work: I get an OutOfMemoryError when I try to load all the Record objects into my ArrayList.
This way I create tons of objects, and I think that is not a good idea.
What should I do with this huge amount of data? Which method/data structure would be less expensive in terms of memory usage?
My point is just to use the sorting algorithms and see how they behave on a big set of data; it's not important to save the result of the sorting to a file.
I know there are some libs for CSV, but I have to implement this without external libs.
Thank you very much! :D
Cut your file into pieces (depending on the size of the file) and look into merge sort. That way you can sort even big files without using a lot of memory, and it's what databases use when they have to do huge sorts.
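To make that concrete, here is a minimal external merge sort sketch: sort chunks that fit in memory, spill each chunk to a temporary file, then k-way merge the chunks with a priority queue. The chunk size and the line-level comparator are assumptions you would adapt to your record format:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {

    public static void sort(Path input, Path output, Comparator<String> comparator,
                            int linesPerChunk) throws IOException {
        List<Path> chunks = splitAndSortChunks(input, comparator, linesPerChunk);
        mergeChunks(chunks, output, comparator);
    }

    // Phase 1: read the big file in chunks that fit in memory, sort each chunk,
    // and spill it to its own temporary file.
    private static List<Path> splitAndSortChunks(Path input, Comparator<String> cmp,
                                                 int linesPerChunk) throws IOException {
        List<Path> chunkFiles = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>(linesPerChunk);
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == linesPerChunk) {
                    chunkFiles.add(writeSortedChunk(buffer, cmp));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunkFiles.add(writeSortedChunk(buffer, cmp));
            }
        }
        return chunkFiles;
    }

    private static Path writeSortedChunk(List<String> lines, Comparator<String> cmp) throws IOException {
        lines.sort(cmp);
        Path chunk = Files.createTempFile("chunk", ".csv");
        Files.write(chunk, lines, StandardCharsets.UTF_8);
        return chunk;
    }

    // Holds the current line of one chunk reader while it sits in the priority queue.
    private static final class ChunkLine {
        final String line;
        final BufferedReader source;
        ChunkLine(String line, BufferedReader source) {
            this.line = line;
            this.source = source;
        }
    }

    // Phase 2: k-way merge of the sorted chunks; the priority queue always yields
    // the smallest current line across all chunks.
    private static void mergeChunks(List<Path> chunks, Path output, Comparator<String> cmp) throws IOException {
        PriorityQueue<ChunkLine> heap = new PriorityQueue<>((a, b) -> cmp.compare(a.line, b.line));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                BufferedReader r = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
                readers.add(r);
                String first = r.readLine();
                if (first != null) {
                    heap.add(new ChunkLine(first, r));
                }
            }
            while (!heap.isEmpty()) {
                ChunkLine smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.source.readLine();
                if (next != null) {
                    heap.add(new ChunkLine(next, smallest.source));
                }
            }
        } finally {
            for (BufferedReader r : readers) {
                r.close();
            }
            for (Path chunk : chunks) {
                Files.deleteIfExists(chunk);
            }
        }
    }
}

A call like ExternalSort.sort(Paths.get("huge.csv"), Paths.get("sorted.csv"), Comparator.comparingInt((String line) -> Integer.parseInt(line.split(",")[1])), 500000) would, for example, sort by the second field while holding only one chunk in memory at a time.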
I would use an in-memory database such as H2 in in-memory mode (jdbc:h2:mem:),
so everything stays in RAM and isn't flushed to disk (provided you have enough RAM; if not, you might want to use the file-based URL). Create your table in there and write every row from the CSV into it. Provided you set up the indexes properly, sorting and grouping will be a breeze with standard SQL.
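A rough sketch of that approach (the H2 driver must be on the classpath; the table layout, file name, field order, and batch size are illustrative assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class H2CsvSort {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:csvsort");
             Statement st = con.createStatement()) {
            st.execute("CREATE TABLE record(f1 VARCHAR(255), f2 INT, f3 DOUBLE, f4 VARCHAR(255))");

            // Stream the CSV into the table in batches, so only a small slice is held in Java objects.
            try (PreparedStatement ps = con.prepareStatement("INSERT INTO record VALUES (?, ?, ?, ?)");
                 BufferedReader reader = new BufferedReader(new FileReader("huge.csv"))) {
                String line;
                int count = 0;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    ps.setString(1, f[0]);
                    ps.setInt(2, Integer.parseInt(f[1]));
                    ps.setDouble(3, Double.parseDouble(f[2]));
                    ps.setString(4, f[3]);
                    ps.addBatch();
                    if (++count % 10000 == 0) {
                        ps.executeBatch();
                    }
                }
                ps.executeBatch();
            }

            st.execute("CREATE INDEX idx_f2 ON record(f2)");
            try (ResultSet rs = st.executeQuery("SELECT * FROM record ORDER BY f2")) {
                while (rs.next()) {
                    // iterate over the rows in sorted order...
                }
            }
        }
    }
}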
I am able to load data from a huge text file into a database; the number of lines is 33,264,591.
I used a normal BufferedReader for reading line by line and was able to push the data.
But it is taking an enormous amount of time to load: almost 3 hours to read line by line and insert into the database.
Could someone suggest a better way for quick insertion of the data using Java?
Thank you in advance
Well, before going any further, I would suggest using a profiler and finding out why it takes so much time. If you know where the problem is, it will be easier to fix.
I believe the best way to read huge files is to use a BufferedReader and read line by line, which is what you are doing. I am wondering if you are inserting the data in the same loop in which you are reading the file. The only optimization I can think of in your scenario is to do the database inserts in a separate thread, so that a delay in the DB inserts does not block your file reading (see the sketch below). DB inserts will gradually become slower and slower as the size of your table grows, so doing them in a separate thread is a good idea.
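A hedged sketch of that reader-thread/inserter-thread split with a bounded BlockingQueue, so the reader also cannot run far ahead and fill the heap. The table, column, file name, and end-of-data marker are illustrative assumptions, and the surrounding method is assumed to declare the checked exceptions:

final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10000);
final String endOfData = "\u0000EOF"; // hypothetical marker telling the inserter to stop

// Inserter thread: takes lines off the queue and writes them to the database.
Thread inserter = new Thread(() -> {
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO big_table (line_text) VALUES (?)")) {
        String line;
        while (!(line = queue.take()).equals(endOfData)) {
            ps.setString(1, line);
            ps.executeUpdate();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
});
inserter.start();

// Reader (here, the current thread): feeds the queue without waiting on the database.
try (BufferedReader reader = new BufferedReader(new FileReader("huge.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        queue.put(line); // blocks only if the inserter falls 10,000 lines behind
    }
    queue.put(endOfData);
}
inserter.join();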
Do batch inserts instead of inserting one row at a time.
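For example, with JDBC batching and per-batch commits (the table, column, file name, and batch size are illustrative assumptions):

final int batchSize = 5000;
connection.setAutoCommit(false); // commit once per batch instead of once per row

try (PreparedStatement ps = connection.prepareStatement(
        "INSERT INTO big_table (line_text) VALUES (?)");
     BufferedReader reader = new BufferedReader(new FileReader("huge.txt"))) {

    String line;
    long count = 0;
    while ((line = reader.readLine()) != null) {
        ps.setString(1, line);
        ps.addBatch();
        if (++count % batchSize == 0) {
            ps.executeBatch(); // send the accumulated rows in one round trip
            connection.commit();
        }
    }
    ps.executeBatch(); // flush the final partial batch
    connection.commit();
}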
In my program, I read a series of text files from disk. For each text file, I process some data and store the results as JSON on disk. In this design, each file has its own JSON file. In addition, I also store some of the data in a separate JSON file, which holds relevant data from multiple files. My problem is that the shared JSON grows larger and larger with every file parsed and eventually uses too much memory. I am on a 32-bit machine with 4 GB of RAM and cannot increase the memory size of the Java VM any further.
Another constraint to consider is that I often refer back to the old JSON. For instance, say I pull out ObjX from FileY. In pseudo code, the following happens (using Jackson for JSON serialization/deserialization):
// In the main method (mapper is a Jackson ObjectMapper).
JsonNode fileYJson = mapper.readTree(fileY);
JsonNode objX = fileYJson.get("some_key");
sharedJson.add(objX);

// In the sharedJson object
private List<JsonNode> objList = new ArrayList<>();

public void add(JsonNode obj) {
    if (!objList.contains(obj)) {
        objList.add(obj);
    }
}
The only thing I can think of is to use streaming JSON, but the problem is that I frequently need to access the JSON that came before, so I don't know whether streaming will work. Also, my data types are not only strings, which prevents me from using Jackson's streaming capabilities (I believe). Does anyone know of a good solution?
If you're getting to the point where your data structures are so large that you're running out of memory, you'll have to start using something else. I would recommend that you use a database, which will significantly speed up data retrieval and storage. It will also make the limit of your data structure the size of your hard drive, instead of the size of your RAM.
Try this page for an introduction to Java and Databases.
I can't believe that you really need nearly 4 GB of RAM just for text files and JSON.
I see three possible solutions:
1. Switch to plain text if that's possible. It is not that memory hungry.
2. Just open and close the files as you need them. You can organize the files according to a specific naming convention, like the first two/three/... digits of their hashes, and open them as you need them.
3. If you have that much data, you could switch to a database. That would save a lot of resources.
I would prefer option 3 if it's possible for you.
You can make an API and get the response body from it.
I am using the latest POI 3.5 for Excel reading. I have MS Office 2007 installed, and for that POI provides XSSF for processing the data.
For 15,000 lines of data it executes properly, but beyond that limit, at 30,000 or 100,000 or 200,000 lines, it is prone to a Java heap space exception.
The code is below:
InputStream UATinput = new FileInputStream(UATFilePath);
BufferedInputStream uatBufferedInputStream = new BufferedInputStream(UATinput);
XSSFWorkbook UATworkbook = new XSSFWorkbook(uatBufferedInputStream);
I am getting the exception on the last line, complaining about the Java heap size.
I have increased the heap using -Xms256m -Xmx1536m, but for more data it still throws the Java heap space exception.
Can anybody help me out with this exception for XSSFWorkbook?
Instead of reading the entire file into memory, try using the eventusermodel API.
This is a very memory-efficient way to read large files. It works on the principle of a SAX parser (as opposed to DOM), in the sense that it calls callback methods when particular data structures are encountered. It might get a little tricky, as it expects you to know the nitty-gritty of the underlying data.
Here you can find a good tutorial on this topic
Hope this helps!
It's true, guys: after using the event user model, my performance was awesome. Please write to me if you have any issues. djeakandane#gmail.com
If you use XSSFWorkbook, POI has to create a memory model containing your whole Excel file, hence the huge memory consumption. Maybe you could use the Event API, which isn't as simple as the user API but allows much lower memory consumption.
By the way you could also set a bigger value for -Xmx...
The other thing to watch in your own code is how many objects you are "new"ing. If you are creating a lot of objects as you read through cells, it could exhaust the heap as well. Make sure you are being careful with the number of objects you create.
As others have said, your best bet is to switch over to the Event API.
One thing that will make a small difference, though, is to not wrap your file in an input stream! XSSF will happily accept a File as the input, and that has a lower memory footprint than an InputStream. That's because POI needs random access to the contents, and with an input stream the only way to do that is to buffer the whole contents into memory; with a File, it can simply seek around. Using a File rather than an InputStream will save you a little more than the size of the file in memory.
If you can, you should pass a File. If memory is tight, write your InputStream out to a file and use that!
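For example, something along these lines (the UATFilePath variable comes from the question; opening the OPCPackage from a path lets POI seek in the file instead of buffering it, and the read-only access mode is an assumption for this sketch):

OPCPackage pkg = OPCPackage.open(UATFilePath, PackageAccess.READ);
XSSFWorkbook workbook = new XSSFWorkbook(pkg);
// ... read what you need from the workbook ...
pkg.revert(); // close the read-only package without writing anything back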
Try this one: -Xms256m -Xmx512m.
You should really look into processing the XML data grid behind the XLSX format. You will be liberated from the heap space problems.
Here is the tutorial:
Check both links below.
http://poi.apache.org/spreadsheet/how-to.html
http://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/examples/FromHowTo.java
Some basic knowledge of XML parsing and of the SAX API is required.
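To give a feel for it, here is a minimal sketch along the lines of the FromHowTo example linked above, printing every cell value of every sheet. It assumes a POI 3.x-era API (SharedStringsTable.getEntryAt is deprecated in newer releases), and the class and file names are illustrative:

import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class LowMemoryXlsxReader {

    public static void main(String[] args) throws Exception {
        OPCPackage pkg = OPCPackage.open("UAT_data.xlsx"); // illustrative file name
        XSSFReader reader = new XSSFReader(pkg);
        SharedStringsTable sst = reader.getSharedStringsTable();

        XMLReader parser = XMLReaderFactory.createXMLReader();
        parser.setContentHandler(new SheetHandler(sst));

        // Each sheet is streamed as XML; only the current SAX event is held in memory.
        Iterator<InputStream> sheets = reader.getSheetsData();
        while (sheets.hasNext()) {
            try (InputStream sheet = sheets.next()) {
                parser.parse(new InputSource(sheet));
            }
        }
        pkg.close();
    }

    // SAX handler: resolves shared strings and prints each cell value as it is seen.
    private static class SheetHandler extends DefaultHandler {
        private final SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;

        SheetHandler(SharedStringsTable sst) {
            this.sst = sst;
        }

        @Override
        public void startElement(String uri, String localName, String name, Attributes attributes) {
            if ("c".equals(name)) { // c = cell
                nextIsString = "s".equals(attributes.getValue("t")); // t="s" means shared string
            }
            lastContents = "";
        }

        @Override
        public void endElement(String uri, String localName, String name) {
            if (nextIsString) {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                nextIsString = false;
            }
            if ("v".equals(name)) { // v = cell value
                System.out.println(lastContents);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            lastContents += new String(ch, start, length);
        }
    }
}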
The JVM runs with a fixed amount of available memory. Once this memory is exceeded, you will receive a "java.lang.OutOfMemoryError". The JVM tries to make an intelligent choice about the available memory at startup (see the Java settings for details), but you can override the default with the following settings.
To tune performance you can use certain parameters in the JVM.
-Xms1024m sets the minimum available memory for the JVM to 1024 megabytes.
-Xmx1800m sets the maximum available memory for the JVM to 1800 megabytes. The Java application cannot use more heap memory than defined via this parameter.
If you start your Java program from the command line, use for example the following: java -Xmx1024m YourProgram.
You can use SXSSF, a low-memory-footprint streaming API built on top of XSSF: "http://poi.apache.org/spreadsheet/how-to.html#sxssf"
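For the writing side, a minimal SXSSF sketch (note that SXSSF first appeared in POI 3.8, which is newer than the 3.5 mentioned above; the row/column counts and file name are illustrative):

import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfWriteExample {
    public static void main(String[] args) throws Exception {
        // Keep only 100 rows in memory; older rows are flushed to a temporary file.
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try {
            Sheet sheet = wb.createSheet("big");
            for (int rownum = 0; rownum < 1000000; rownum++) {
                Row row = sheet.createRow(rownum);
                for (int cellnum = 0; cellnum < 10; cellnum++) {
                    Cell cell = row.createCell(cellnum);
                    cell.setCellValue("row " + rownum + " cell " + cellnum);
                }
            }
            try (FileOutputStream out = new FileOutputStream("big.xlsx")) {
                wb.write(out);
            }
        } finally {
            wb.dispose(); // delete the temporary files backing the flushed rows
        }
    }
}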