I am writing a full database extract program in Java. The database is Oracle, and it is huge: some tables have ~260 million records. The program has to create one file per table in a specific format, so using Oracle Data Pump etc. is not an option. Also, company security policy does not allow writing a PL/SQL procedure that creates files on the DB server for this requirement. I have to go with Java and JDBC.
The issue I am facing is that, since the files for some of the tables are huge (~30GB), I am running out of memory almost every time, even with a 20GB Java heap. During the creation of the file, when the file size exceeds the heap size, even with one of the most aggressive GC policies the process seems to hang. For example, if the file size is > 20GB and the heap size is 20GB, once heap utilization hits the max heap size it slows down to writing 2MB per minute or so, and at this speed it will take months to get a full extract.
I am looking for some way to overcome this issue. Any help would be greatly appreciated.
Here are some details of the system configuration I have:
Java - JDK 1.6.0_14
System config - RH Enterprise Linux (2.6.18) running on 4 x Intel Xeon E7450 (6 cores) @ 2.39GHz
RAM - 32GB
Database - Oracle 11g
The file-writing part of the code goes below:
private void runQuery(Connection conn, String query, String filePath,
String fileName) throws SQLException, Exception {
PreparedStatement stmt = null;
ResultSet rs = null;
try {
stmt = conn.prepareStatement(query,
ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(maxRecBeforWrite);
rs = stmt.executeQuery();
// Write query result to file
writeDataToFile(rs, filePath + "/" + fileName, getRecordCount(
query, conn));
} catch (SQLException sqle) {
sqle.printStackTrace();
} finally {
try {
rs.close();
stmt.close();
} catch (SQLException ex) {
throw ex;
}
}
}
private void writeDataToFile(ResultSet rs, String tempFile, String cnt)
throws SQLException, Exception {
FileOutputStream fileOut = null;
int maxLength = 0;
try {
fileOut = new FileOutputStream(tempFile, true);
FileChannel fcOut = fileOut.getChannel();
List<TableMetaData> metaList = getMetaData(rs);
maxLength = getMaxRecordLength(metaList);
// Write Header
writeHeaderRec(fileOut, maxLength);
while (rs.next()) {
// Now iterate on metaList and fetch all the column values.
writeData(rs, metaList, fcOut);
}
// Write trailer
writeTrailerRec(fileOut, cnt, maxLength);
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
try {
fileOut.close();
} catch (IOException ioe) {
fileOut = null;
throw new Exception(ioe.getMessage());
}
}
}
private void writeData(ResultSet rs, List<TableMetaData> metaList,
FileChannel fcOut) throws SQLException, IOException {
StringBuilder rec = new StringBuilder();
String lf = "\n";
for (TableMetaData tabMeta : metaList) {
rec.append(getFormattedString(rs, tabMeta));
}
rec.append(lf);
ByteBuffer byteBuf = ByteBuffer.wrap(rec.toString()
.getBytes("US-ASCII"));
fcOut.write(byteBuf);
}
private String getFormattedString(ResultSet rs, TableMetaData tabMeta)
throws SQLException, IOException {
String colValue = null;
// check if it is a CLOB column
if (tabMeta.isCLOB()) {
// Column is a CLOB, so fetch it and retrieve first clobLimit chars.
colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
getCLOBString(rs, tabMeta));
} else {
colValue = String.format("%-" + tabMeta.getColumnSize() + "s", rs
.getString(tabMeta.getColumnName()));
}
return colValue;
}
It's probably due to the way you call prepareStatement; see this question for a similar problem. You don't need scrollability, and a ResultSet is read-only by default, so just call
stmt = conn.prepareStatement(query);
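For illustration, here is a minimal sketch of that streaming approach with a forward-only, read-only statement, a modest fetch size and a buffered writer; the query, fetch size, output path and formatRecord() helper are placeholders, not part of your actual code:
// Sketch: stream rows straight to disk so memory use stays flat regardless of file size.
// Needs the usual java.io and java.sql imports.
private void extractTable(Connection conn, String query, String outFile)
        throws SQLException, IOException {
    PreparedStatement stmt = conn.prepareStatement(query,
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(1000); // rows per round trip, not rows held in memory overall
    ResultSet rs = stmt.executeQuery();
    BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
    try {
        while (rs.next()) {
            out.write(formatRecord(rs)); // your fixed-width formatting goes here
            out.newLine();
        }
    } finally {
        out.close();
        rs.close();
        stmt.close();
    }
}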
Edit:
Map your database tables to classes using JPA.
Then load the objects from the DB with Hibernate in batches of some tolerable size and serialize them to the file.
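A rough sketch of that batched Hibernate idea; MyTableEntity is a made-up mapped class, and the batch size, writer and format() helper are assumptions rather than anything from your code:
// Sketch: read mapped entities in pages and write each page to the file.
int batchSize = 1000;
int offset = 0;
List<?> batch;
do {
    batch = session.createQuery("from MyTableEntity")
                   .setFirstResult(offset)
                   .setMaxResults(batchSize)
                   .list();
    for (Object row : batch) {
        writer.write(format((MyTableEntity) row)); // your fixed-width formatting
        writer.write("\n");
    }
    session.clear();   // evict the batch so the persistence context does not grow
    offset += batchSize;
} while (!batch.isEmpty());
Note that offset-based paging gets slow on a 260-million-row table, so a plain streaming JDBC read is likely to be faster; this only illustrates the suggestion above.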
Is your algorithm like the following? This is assuming a direct mapping between DB rows and lines in the file:
// open file for writing with buffered writer.
// execute JDBC statement
// iterate through result set
// convert rs to file format
// write to file
// close file
// close statement/rs/connection etc
Try using Spring JDBC Template to simplify the JDBC portion.
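If you go the Spring route, a minimal sketch with JdbcTemplate and a RowCallbackHandler, which hands you one row at a time without accumulating the result set; the dataSource, query, output path and formatRecord() helper are assumptions:
// Sketch: each fetched row is pushed to the callback and written immediately.
final BufferedWriter writer = new BufferedWriter(new FileWriter("/tmp/my_table.dat"));
JdbcTemplate template = new JdbcTemplate(dataSource);
template.setFetchSize(1000);
template.query("SELECT * FROM my_table", new RowCallbackHandler() {
    public void processRow(ResultSet rs) throws SQLException {
        try {
            writer.write(formatRecord(rs)); // your fixed-width formatting
            writer.write("\n");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
});
writer.close();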
I believe this must be possible on the default 32 MB Java heap. Just fetch each row, write the data to the file stream, then flush and close once done.
What value are you using for maxRecBeforWrite?
Perhaps the query of the max record length is defeating your setFetchSize by forcing JDBC to scan the entire result for record length? Maybe you could delay writing your header and note the max record size on the fly.
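If the header really does need the maximum record length, one way to avoid a second pass over the data is to reserve space for the header, track the maximum while streaming, and then seek back and patch the header; a sketch under those assumptions (the 20-character header width and "HDR" format are placeholders):
// Sketch: write a placeholder header, stream the rows, then overwrite the header in place.
RandomAccessFile raf = new RandomAccessFile(tempFile, "rw");
raf.writeBytes(String.format("%-20s%n", ""));     // reserve a fixed-width header line
int maxLength = 0;
while (rs.next()) {
    String record = formatRecord(rs);             // your row formatting
    maxLength = Math.max(maxLength, record.length());
    raf.writeBytes(record);
    raf.writeBytes("\n");
}
long end = raf.getFilePointer();
raf.seek(0);
raf.writeBytes(String.format("%-20s", "HDR" + maxLength)); // real header, same width
raf.seek(end);                                    // continue here if a trailer follows
raf.close();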
public static void main(String[] args) throws SQLException{
Connection myconn=null;
Statement stat=null;
PreparedStatement st=null;
CallableStatement cal=null;
ResultSet res=null;
String dburl="jdbc:mysql://localhost:3306/dbms";
String user="root";
String password="admin";
int charread;
try {
myconn=DriverManager.getConnection(dburl, user, password);
System.out.println("connected to database successfully");
File f=new File("test.txt");
stat=myconn.createStatement();
res=stat.executeQuery("select exp_clob from dbmslabprgm1test.new_table where ID=1");
FileWriter writer=new FileWriter(f);
System.out.println("Reading clob from db");
/*while(res.next()) {
System.out.println(res.getString(1));
}*///This comment was written to check whether clob is retrieved correctly into console.It printed the whole clob correctly
if(res.next()) {
Reader reader=res.getCharacterStream("exp_clob");
char[] buf=new char[1024];
while((charread=reader.read(buf))!=1) {
writer.write(buf,0,charread);
System.out.println(".");
}
}
}
catch(Exception e) {e.printStackTrace();}
I ran this code in the Eclipse IDE. It ran for some time (I think it entered some sort of infinite loop and consumed a lot of memory), and later Eclipse reported an internal error (saying that memory consumption was critically high) and stopped responding. Later I opened test.txt from the file explorer, and even that text file made Notepad stop responding. By the way, similar code for BLOB retrieval ran correctly. I think there is no problem with the ResultSet, because the commented-out part prints the whole CLOB correctly.
The line
while(reader.read(buf)!=1)
reads into the buffer until the length read is exactly 1 - this is unlikely ever to happen, so you get an endless loop.
You probably meant to test for -1, which indicates the end of the stream.
However, just changing this is not enough, as your next line is:
writer.write(buf);
This writes the whole of buf to the output - but the reader may not have filled the whole buffer.
You need to use something like:
for (;;) {
int charsRead = reader.read(buf);
if (charsRead < 0) {
break;
}
writer.write(buf, 0, charsRead);
}
You must also call close() on the writer once you have finished writing.
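Putting those pieces together, a minimal sketch of the corrected loop; the column name and the file come from your code, the 8192-character buffer is an arbitrary choice:
// Sketch: copy the CLOB character stream to the file in chunks.
if (res.next()) {
    Reader reader = res.getCharacterStream("exp_clob");
    Writer writer = new BufferedWriter(new FileWriter(f));
    try {
        char[] buf = new char[8192];
        int charsRead;
        while ((charsRead = reader.read(buf)) != -1) {
            writer.write(buf, 0, charsRead);
        }
    } finally {
        writer.close();
        reader.close();
    }
}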
I'm trying to parse values from a CSV file into a SQLite DB; however, the file is quite large (~2,500,000 lines). I ran my program for a few hours, printing where it was up to, but by my calculation the file would have taken about 100 hours to parse completely, so I stopped it.
I'm going to have to run this program as a background process at least once a week, on a new CSV file that is around 90% similar to the previous one. I have come up with a few solutions to improve my program; however, I don't know much about databases, so I have questions about each of my solutions.
Is there a more efficient way to read a CSV file than what I have already?
Is instantiating an ObjectOutputStream and storing the result as a BLOB significantly computationally expensive? I could add the values directly instead, but I use the BLOB later, so storing it now saves me from instantiating a new one multiple times.
Would connection pooling, or changing the way I use the Connection in some other way, be more efficient?
I'm setting the URL column as UNIQUE so I can use INSERT OR IGNORE, but testing this on smaller datasets (~10,000 lines) indicates that there is no performance gain compared to dropping the table and repopulating. Is there a faster way to add only unique values?
Are there any obvious mistakes I'm making? (Again, I know very little about databases.)
public class Database{
public void createResultsTable(){
Statement stmt;
String sql = "CREATE TABLE results("
+ "ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
+ "TITLE TEXT NOT NULL, "
+ "URL TEXT NOT NULL UNIQUE, "
...
...
+ "SELLER TEXT NOT NULL, "
+ "BEAN BLOB);";
try {
stmt = c.createStatement();
stmt.executeUpdate(sql);
} catch (SQLException e) { e.printStackTrace();}
}
public void addCSVToDatabase(Connection conn, String src){
BufferedReader reader = null;
DBEntryBean b;
String[] vals;
try{
reader = new BufferedReader(new InputStreamReader(new FileInputStream(src), "UTF-8"));
for(String line; (line = reader.readLine()) != null;){
//Each line takes the form: "title|URL|...|...|SELLER"
vals = line.split("|");
b = new DBEntryBean();
b.setTitle(vals[0]);
b.setURL(vals[1]);
...
...
b.setSeller(vals[n]);
insert(conn, b);
}
} catch(){
}
}
public void insert(Connection conn, DBEntryBean b){
PreparedStatement pstmt = null;
String sql = "INSERT OR IGNORE INTO results("
+ "TITLE, "
+ "URL, "
...
...
+ "SELLER, "
+ "BEAN"
+ ");";
try {
pstmt = c.prepareStatement(sql);
pstmt.setString(Constants.DB_COL_TITLE, b.getTitle());
pstmt.setString(Constants.DB_COL_URL, b.getURL());
...
...
pstmt.setString(Constants.DB_COL_SELLER, b.getSeller());
// ByteArrayOutputStream baos = new ByteArrayOutputStream();
// oos = new ObjectOutputStream(baos);
// oos.writeObject(b);
// byte[] bytes = baos.toByteArray();
// pstmt.setBytes(Constants.DB_COL_BEAN, bytes);
pstmt.executeUpdate();
} catch (SQLException e) { e.printStackTrace();
} finally{
if(pstmt != null){
try{ pstmt.close(); }
catch (SQLException e) { e.printStackTrace(); }
}
}
}
}
The biggest bottleneck in your code is that you are not batching the insert operations. You should really call pstmt.addBatch() instead of pstmt.executeUpdate() and execute the batch once you have something like 10K rows to insert.
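A minimal sketch of that batching, reusing the prepared statement and constants from your code (with the statement created once, outside the loop); the 10,000-row batch size and the single surrounding transaction are illustrative choices:
// Sketch: accumulate inserts and flush them in batches inside one transaction.
conn.setAutoCommit(false);
int count = 0;
for (String line; (line = reader.readLine()) != null;) {
    String[] vals = line.split("\\|");   // note: '|' must be escaped, split() takes a regex
    pstmt.setString(Constants.DB_COL_TITLE, vals[0]);
    pstmt.setString(Constants.DB_COL_URL, vals[1]);
    // ... remaining columns ...
    pstmt.addBatch();
    if (++count % 10000 == 0) {
        pstmt.executeBatch();
    }
}
pstmt.executeBatch();   // flush the remainder
conn.commit();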
On the CSV parsing side, you should really consider using a CSV library to do the parsing for you. univocity-parsers has the fastest CSV parser around, and it should process these 2.5 million lines in less than a second. I'm the author of this library, by the way.
String.split() is convenient but not fast. For anything more than a few dozen rows it doesn't make sense to use this.
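For reference, a rough sketch of what the parsing loop might look like with univocity-parsers; the pipe delimiter and the reuse of your DBEntryBean and insert() method are assumptions based on your snippet:
// Sketch: let the CSV parser handle splitting and quoting.
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setDelimiter('|');
CsvParser parser = new CsvParser(settings);
parser.beginParsing(new InputStreamReader(new FileInputStream(src), "UTF-8"));
String[] vals;
while ((vals = parser.parseNext()) != null) {
    DBEntryBean b = new DBEntryBean();
    b.setTitle(vals[0]);
    b.setURL(vals[1]);
    // ... remaining columns ...
    insert(conn, b);
}
parser.stopParsing();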
Hope this helps.
I have a byte[] which is actually an image.
I want to store it in Oracle 11g. I created a BLOB column in my table, and I tried to insert it as follows:
String imageStr = "xyz...."
byte[] data = imageStr.getBytes();
String sQuery = "insert into Table (LOCATION , BLOB_DATA) Values ('Lahore', data) ";
It throws the exception "java.sql.SQLException: ORA-01465: invalid hex number".
I searched and found that this type of query should be done via a PreparedStatement,
so I did something like the following:
PreparedStatement prepStmt = dbConnection.prepareStatement("insert into Table (LOCATION, BLOB_DATA) values(?,?)");
prepStmt.setString(1, "Lahore");
prepStmt.setBytes(2, data);
I get an error on dbConnection.prepareStatement(String) because the DBConnection class is not a native Java class.
It's a custom class written by earlier developers for database connections, and it does not have a prepareStatement(String) method.
So what should I do now?
1. Should I create a prepareStatement(String) method in the DBConnection class?
2. Or should I go with the first approach?
You can look at my example of storing an image in the DB:
Statement s;
Connection c;
FileInputStream fis;
PreparedStatement ps;
File file;
try
{
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");//your driver
c=DriverManager.getConnection("Jdbc:Odbc:image","scott","tiger");//password and name changes according to your db
s=c.createStatement();
s.execute("Create table ImageStoring(Image_No number(5),Photo blob)");
}
catch(Exception e1)
{
e1.printStackTrace();
}
try
{
file=new File("D:/ARU/Aruphotos/4.jpg");
fis=new FileInputStream(file);
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
c=DriverManager.getConnection("Jdbc:Odbc:image","scott","tiger");
s=c.createStatement();
ps=c.prepareStatement("insert into ImageStoring values(?,?)");
ps.setInt(1,2);
ps.setBinaryStream(2,fis,(int)file.length());
System.out.println("success");
ps.execute();
ps.close();
c.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
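The JDBC-ODBC bridge used above is dated; with the Oracle thin driver the same insert might look roughly like this (the connection URL, table and column names are placeholders, not your actual schema):
// Sketch: insert a byte[] image into a BLOB column via a PreparedStatement.
Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
PreparedStatement ps = conn.prepareStatement(
        "insert into IMAGES (LOCATION, BLOB_DATA) values (?, ?)");
try {
    ps.setString(1, "Lahore");
    ps.setBytes(2, data);   // data is your byte[] image
    ps.executeUpdate();
} finally {
    ps.close();
    conn.close();
}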
I have a JSP in which, for each row of a table, I need to display an image stored in the database. I retrieve all the row data from the database, including the image as a Blob, and store it in a bean. The image is stored in the bean as a byte array like this:
photo = rs.getBlob("PHOTO");
photoByteArray = photo.getBytes(1, (int)photo.length());
While looping over the list of beans in the JSP, the src attribute points to a servlet like this:
<img class="img" width="55" height="50" src="displayThumbnail?photoData=${part.photoData}">
which serves the image as shown below, but the images don't show up, even though on debugging the byte array does seem to have data.
protected void processRequest(HttpServletRequest request, HttpServletResponse response) throws IOException {
response.setContentType("image/jpeg");
OutputStream o = response.getOutputStream();
String photoDataStr = request.getParameter("photoData");
byte[] photoData = null;
if(photoDataStr != null) {
photoData = photoDataStr.getBytes();
}
o.write(photoData);
o.close();
}
However, the image doesn't show up. Now, if I query the database for each individual image as shown below, the images do show up fine.
protected void processRequest(HttpServletRequest request, HttpServletResponse response) {
PreparedStatement pstmt = null;
ResultSet rs = null;
Connection conn = null;
try {
if(conn == null) {
conn = open();
}
pstmt = conn.prepareStatement("select photo from PART_PHOTOS where id = ?");
String id = request.getParameter("id");
pstmt.setString(1, id);
rs = pstmt.executeQuery();
if (rs.next()) {
Blob b = rs.getBlob("photo");
response.setContentType("image/jpeg");
response.setContentLength((int) b.length());
InputStream is = b.getBinaryStream();
OutputStream os = response.getOutputStream();
byte buf[] = new byte[(int) b.length()];
is.read(buf);
os.write(buf);
os.close();
is.close();
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
ex.printStackTrace();
} finally {
if (rs != null) {
try {
rs.close();
} catch (SQLException e) {
e.printStackTrace();
}
rs = null;
}
if (pstmt != null) {
try {
pstmt.close();
} catch (SQLException e) {
e.printStackTrace();
}
pstmt = null;
}
//check if it's the end of the loop
if (conn != null) {
try {
conn.close();
} catch (SQLException e) {
e.printStackTrace();
}
conn = null;
}
}
}
I would highly appreciate it if anyone could provide recommendations on this.
You're assuming that you can put random binary data into an HTML file, and it will be parsed correctly, and sent back to your server intact. This is a bad assumption! If nothing else, the byte that corresponds to the ASCII for the quote character is going to cause problems, right? Not to mention encoding issues, and the fact that the parameters to a URL must be urlencoded. This is just doomed to fail.
To make this work, you'd have to have some kind of explicit text encoding of the binary data when you serve the page (base64, maybe), and then decode the servlet parameter back to binary image data after the URL is posted back.
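If you really wanted to inline the image data rather than serve it from a separate request, the usual trick is a base64 data URI; a rough sketch, assuming Java 8's java.util.Base64 and a made-up photoBase64 property on the bean:
// Sketch: encode the bytes once while building the bean.
String photoBase64 = Base64.getEncoder().encodeToString(photoByteArray);
part.setPhotoBase64(photoBase64);
// The JSP can then inline the image without a second request:
// <img class="img" width="55" height="50" src="data:image/jpeg;base64,${part.photoBase64}">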
Your first processRequest() snippet is only sending back the byte representation of the photoData request parameter and not the photo data identified by the parameter. Looks like a bug in your code.
It seems you are trying to solve your problem in the wrong manner. When you first create the HTML table, storing the image in your bean from the first query gains you nothing unless you cache the data and the subsequent displayThumbnail request retrieves the image from that cache, avoiding the database query.
If you do not want to mess with caching, then there is no need to store the image in the initial bean at all; just do something like your second processRequest() snippet to fetch the image directly when the browser asks for it.
Your ${part.photoData} expression must return some ID. In the processRequest() method you must read that ID value (using request.getParameter("photoData")), use it to retrieve the image from the database (or better, from a cache or the file system), and send the binary data to the web client.
There is a table phonenumbers with two columns: id and number. There are about half a million entries in the table. The database is MySQL.
The requirement is to develop a simple Java EE application, connected to that database, that allows a user to download all number values, comma separated, by following a specific URL.
If we get all the values into a huge String array, concatenate them (with commas between the values) into a single String, and then send it down to the user, does that sound like a proper solution?
The application is not public and will be used by a limited number of people.
Your best bet is to not store the data in Java's memory in any way, but just write the obtained data to the response immediately as the data comes in. You also need to configure the MySQL JDBC driver to serve the resultset row-by-row by Statement#setFetchSize() as per the MySQL JDBC driver documentation, otherwise it will cache the whole thing in memory.
Assuming you're familiar with Servlets, here's a kickoff example which takes that all into account:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
response.setContentType("text/plain");
response.setHeader("Content-Disposition", "attachment;filename=numbers.txt"); // Force download popup.
Connection connection = null;
Statement statement = null;
ResultSet resultSet = null;
Writer writer = response.getWriter();
try {
connection = database.getConnection();
statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
statement.setFetchSize(Integer.MIN_VALUE);
resultSet = statement.executeQuery("SELECT number FROM phonenumbers");
while (resultSet.next()) {
writer.write(resultSet.getString("number"));
if (!resultSet.isLast()) {
writer.write(",");
}
}
} catch (SQLException e) {
throw new ServletException("Query failed!", e);
} finally {
if (resultSet != null) try { resultSet.close(); } catch (SQLException logOrIgnore) {}
if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
}
}
There's a bit more to properly formatting CSV output. It would be easiest to use an existing library such as this one to generate the output file.
You can generate output to a file on disk (on the web server) and then redirect the browser to that file (with a cron job or whatever to clean up old data) or just stream the result directly back to the user.
If you are streaming directly be sure and set the MIME type to something that will trigger a download in the user's browser (e.g. text/csv or text/comma-separated-values)
If using Mysql 5.1+, I would simply use the proprietary syntax to dump the file somewhere and stream it in a Servlet response.
SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM test_table;
http://dev.mysql.com/doc/refman/5.1/en/select.html
For so many records, if you still want to use JDBC, you may try the following (a sketch follows the list):
- fetch the total number of records
- fetch a few records (using a query limit) and write them
- once you have written a whole chunk, fetch the next one, until you reach the total number of records
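A minimal sketch of that chunked approach using MySQL's LIMIT/OFFSET, reusing the connection and writer from the servlet above; the chunk size and query are illustrative only:
// Sketch: page through the table in fixed-size chunks and stream each chunk out.
int chunkSize = 10000;
int offset = 0;
boolean first = true;
boolean more = true;
while (more) {
    PreparedStatement ps = connection.prepareStatement(
            "SELECT number FROM phonenumbers ORDER BY id LIMIT ? OFFSET ?");
    ps.setInt(1, chunkSize);
    ps.setInt(2, offset);
    ResultSet rs = ps.executeQuery();
    more = false;
    while (rs.next()) {
        more = true;
        if (!first) {
            writer.write(",");
        }
        writer.write(rs.getString("number"));
        first = false;
    }
    rs.close();
    ps.close();
    offset += chunkSize;
}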