Handling large records in a Java EE application

There is a table phonenumbers with two columns: id and number. There are about half a million entries in the table. The database is MySQL.
The requirement is to develop a simple Java EE application, connected to that database, that allows a user to download all number values as comma-separated text by following a specific URL.
If we load all the values into a huge String array, concatenate them into a single comma-separated String, and then send that to the user, does that sound like a proper solution?
The application is not public and will be used by a limited number of people.

Your best bet is not to store the data in Java's memory at all, but to write it to the response as it comes in. You also need to configure the MySQL JDBC driver to serve the result set row by row via Statement#setFetchSize(), as described in the MySQL JDBC driver documentation; otherwise it will cache the whole thing in memory.
Assuming you're familiar with Servlets, here's a kickoff example which takes that all into account:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition", "attachment;filename=numbers.txt"); // Force download popup.

    Connection connection = null;
    Statement statement = null;
    ResultSet resultSet = null;
    Writer writer = response.getWriter();

    try {
        connection = database.getConnection(); // Obtain connection, e.g. from a DataSource / connection pool.
        statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        statement.setFetchSize(Integer.MIN_VALUE); // Tell Connector/J to stream the result set row by row.
        resultSet = statement.executeQuery("SELECT number FROM phonenumbers");

        boolean first = true;
        while (resultSet.next()) {
            // Write a separator before every value except the first
            // (avoids ResultSet#isLast(), which a streaming result set may not support).
            if (!first) {
                writer.write(",");
            }
            writer.write(resultSet.getString("number"));
            first = false;
        }
    } catch (SQLException e) {
        throw new ServletException("Query failed!", e);
    } finally {
        if (resultSet != null) try { resultSet.close(); } catch (SQLException logOrIgnore) {}
        if (statement != null) try { statement.close(); } catch (SQLException logOrIgnore) {}
        if (connection != null) try { connection.close(); } catch (SQLException logOrIgnore) {}
    }
}

There's a bit more to properly formatting CSV output. It would be easiest to use an existing library such as this one to generate the output file.
You can generate output to a file on disk (on the web server) and then redirect the browser to that file (with a cron job or whatever to clean up old data) or just stream the result directly back to the user.
If you are streaming directly, be sure to set the MIME type to something that will trigger a download in the user's browser (e.g. text/csv or text/comma-separated-values).
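For reference, the core of proper CSV quoting (per RFC 4180) is small enough to sketch by hand if you'd rather not pull in a library; the helper name below is just illustrative:

// Hypothetical helper: quote a field if it contains a comma, quote, or newline,
// doubling any embedded quotes (RFC 4180 style).
static String csvEscape(String field) {
    if (field == null) {
        return "";
    }
    if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
    return field;
}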

If using MySQL 5.1+, I would simply use the proprietary SELECT ... INTO OUTFILE syntax to dump the file somewhere and then stream it in a Servlet response.
SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM test_table;
http://dev.mysql.com/doc/refman/5.1/en/select.html
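A minimal sketch of the streaming half, assuming the dump ended up at /tmp/result.txt on a filesystem the servlet container can read (note that INTO OUTFILE writes the file on the database server):

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
    response.setContentType("text/csv");
    response.setHeader("Content-Disposition", "attachment;filename=numbers.csv");
    // Copy the dumped file straight to the response without buffering it in memory.
    java.nio.file.Files.copy(java.nio.file.Paths.get("/tmp/result.txt"), response.getOutputStream());
}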
For so many records, if you still want to use JDBC, you may try the following (see the sketch after this list):
- fetch the total number of records
- fetch a few records (using a query LIMIT) and write them
- when you reach the end of a chunk, fetch another one, until you reach the total number of records
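A rough sketch of that chunked approach, assuming plain JDBC and the phonenumbers table from the question, with connection and writer in scope as in the kickoff servlet above (the chunk size is arbitrary):

// Fetch and write the numbers in chunks using LIMIT/OFFSET so only one chunk
// is ever held in memory at a time.
int chunkSize = 10000;
try (PreparedStatement ps = connection.prepareStatement(
        "SELECT number FROM phonenumbers ORDER BY id LIMIT ? OFFSET ?")) {
    boolean first = true;
    for (int offset = 0; ; offset += chunkSize) {
        ps.setInt(1, chunkSize);
        ps.setInt(2, offset);
        int rows = 0;
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                if (!first) {
                    writer.write(",");
                }
                writer.write(rs.getString("number"));
                first = false;
                rows++;
            }
        }
        if (rows < chunkSize) {
            break; // Last (possibly partial) chunk reached.
        }
    }
}

Note that deep OFFSETs get progressively slower in MySQL; keying each chunk on the id column (WHERE id > lastSeenId) scales better for very large tables.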

Related

loading big csv file to redis

I'm trying to load a CSV file (27 million rows) into a Redis DB, but it throws redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out.
My code:
Jedis jedis = new Jedis();
try {
    BufferedReader in = new BufferedReader(new FileReader(fileName));
    String line = in.readLine();
    int i = 0;
    while (line != null) {
        try {
            String[] edgeDescription = line.split("\\s");
            if (edgeDescription.length >= 0) {
                jedis.hset(String.valueOf(i), edgeDescription[0], edgeDescription[1]);
            }
            i++;
        } catch (OutOfMemoryError e) {
            e.getMessage();
        }
        line = in.readLine();
    }
} catch (Exception e) {
    e.printStackTrace();
}
Given the information provided, it is difficult to answer this question, but here are some points to check:
Where do you specify which Redis instance to connect to? I don't see any host/port, so the Jedis instance presumably tries to connect to localhost:6379.
Is your code able to write any data to Redis at all?
If yes, can you add a counter to track how many lines are processed and how many are actually written to the DB? After the connection times out, what happens to the Redis instance? Is it still running and healthy?
Did you try the redis-cli -h host -p port monitor command?
A small remark: your code is generally not efficient. You should use pipelines and process writes in chunks of around 1,000; this will be much faster. Did you consider this? A sketch follows below.
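A minimal sketch of that pipelining idea, assuming the same line format as in the question and a reachable Redis instance (the host and port below are placeholders):

// Write the hashes through a pipeline, syncing every 1,000 commands so the
// client does not wait for a network round trip per line.
Jedis jedis = new Jedis("redis-host", 6379); // placeholder host/port
try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
    Pipeline pipeline = jedis.pipelined();
    String line;
    int i = 0;
    while ((line = in.readLine()) != null) {
        String[] edge = line.split("\\s");
        if (edge.length >= 2) {
            pipeline.hset(String.valueOf(i), edge[0], edge[1]);
        }
        i++;
        if (i % 1000 == 0) {
            pipeline.sync(); // Flush the batch and read the replies.
        }
    }
    pipeline.sync(); // Flush any remaining commands.
}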

Efficiently adding huge amounts of data from CSV files into an SQLite DB in Java [duplicate]

This question already has answers here:
Android SQLite database: slow insertion
(5 answers)
Closed 6 years ago.
I'm trying to parse values from a CSV file into a SQLite DB; however, the file is quite large (~2,500,000 lines). I ran my program for a few hours, printing where it was up to, but by my calculation the file would have taken about 100 hours to parse completely, so I stopped it.
I'm going to have to run this program as a background process at least once a week, on a new CSV file that is around 90% similar to the previous one. I have come up with a few solutions to improve my program. However I don't know much about databases, so I have questions about each of my solutions.
Is there a more efficient way to read a CSV file than what I have already?
Is instantiating an ObjectOutputStream, and storing it as a BLOB significantly computationally expensive? I could directly add the values instead, but I use the BLOB later, so storing it now saves me from instantiating a new one multiple times.
Would connection pooling, or changing the way I use the Connection in some other way be more efficient?
I'm setting the URL column as UNIQUE so I can use INSERT OR IGNORE, but testing this on smaller datasets (~10,000 lines) indicates no performance gain compared to dropping the table and repopulating it. Is there a faster way to add only unique values?
Are there any obvious mistakes I'm making? (Again, I know very little about databases)
public class Database {

    public void createResultsTable() {
        Statement stmt;
        String sql = "CREATE TABLE results("
                + "ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
                + "TITLE TEXT NOT NULL, "
                + "URL TEXT NOT NULL UNIQUE, "
                ...
                ...
                + "SELLER TEXT NOT NULL, "
                + "BEAN BLOB);";
        try {
            stmt = c.createStatement();
            stmt.executeUpdate(sql);
        } catch (SQLException e) { e.printStackTrace(); }
    }

    public void addCSVToDatabase(Connection conn, String src) {
        BufferedReader reader = null;
        DBEntryBean b;
        String[] vals;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(src), "UTF-8"));
            for (String line; (line = reader.readLine()) != null; ) {
                // Each line takes the form: "title|URL|...|...|SELLER"
                vals = line.split("\\|"); // the pipe must be escaped in a regex
                b = new DBEntryBean();
                b.setTitle(vals[0]);
                b.setURL(vals[1]);
                ...
                ...
                b.setSeller(vals[n]);
                insert(conn, b);
            }
        } catch (IOException e) {
        }
    }

    public void insert(Connection conn, DBEntryBean b) {
        PreparedStatement pstmt = null;
        String sql = "INSERT OR IGNORE INTO results("
                + "TITLE, "
                + "URL, "
                ...
                ...
                + "SELLER, "
                + "BEAN"
                + ");";
        try {
            pstmt = c.prepareStatement(sql);
            pstmt.setString(Constants.DB_COL_TITLE, b.getTitle());
            pstmt.setString(Constants.DB_COL_URL, b.getURL());
            ...
            ...
            pstmt.setString(Constants.DB_COL_SELLER, b.getSeller());
            // ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // oos = new ObjectOutputStream(baos);
            // oos.writeObject(b);
            // byte[] bytes = baos.toByteArray();
            // pstmt.setBytes(Constants.DB_COL_BEAN, bytes);
            pstmt.executeUpdate();
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            if (pstmt != null) {
                try { pstmt.close(); }
                catch (SQLException e) { e.printStackTrace(); }
            }
        }
    }
}
The biggest bottleneck in your code is that you are not batching the insert operations. You should call pstmt.addBatch() instead of pstmt.executeUpdate(), and execute the batch once you have accumulated something like 10K rows to insert.
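A minimal sketch of that change, assuming the question's insert loop is reworked to reuse a single PreparedStatement (the names follow the question's code):

// Reuse one PreparedStatement and flush inserts in batches of 10,000.
PreparedStatement pstmt = conn.prepareStatement(sql);
int count = 0;
for (String line; (line = reader.readLine()) != null; ) {
    String[] vals = line.split("\\|");
    pstmt.setString(Constants.DB_COL_TITLE, vals[0]);
    pstmt.setString(Constants.DB_COL_URL, vals[1]);
    // ... set the remaining columns ...
    pstmt.addBatch();
    if (++count % 10_000 == 0) {
        pstmt.executeBatch(); // Send the accumulated inserts in one go.
    }
}
pstmt.executeBatch(); // Flush the final partial batch.

For SQLite in particular, turning off auto-commit around the batch (conn.setAutoCommit(false) plus a final conn.commit()) usually matters as much as the batching itself, since each committed transaction forces a sync to disk.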
On the CSV parsing side, you should really consider using a CSV library to do the parsing for you. univocity-parsers has the fastest CSV parser around, and it should process these 2.5 million lines in less than a second. I'm the author of this library, by the way.
String.split() is convenient but not fast. For anything more than a few dozen rows it doesn't make sense to use it.
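If you go the library route, a rough sketch with univocity-parsers might look like this (the API names are from memory, so treat them as assumptions to verify against the library's documentation):

// Parse the pipe-delimited file row by row with univocity-parsers.
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setDelimiter('|');
CsvParser parser = new CsvParser(settings);
parser.beginParsing(new InputStreamReader(new FileInputStream(src), "UTF-8"));

String[] row;
while ((row = parser.parseNext()) != null) {
    // row[0] = title, row[1] = URL, ..., last = seller
}
parser.stopParsing();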
Hope this helps.

What is the easiest way to save a String on a server

I created a game and now I want to add a global high score. I want to save the high score on my server. I think the easiest way is to overwrite a text file on my server that stores the scores and names of the top players. How can I do this? The game is not running on my server; it runs on the client side.
Here is an example of writing a string to a file using the java.nio.file.Files class:
try {
    String highscore = "MyString";
    Files.write(new File("D:/temp/file.txt").toPath(), highscore.getBytes());
} catch (IOException e) {
    e.printStackTrace();
}
You can save it as a flat file like flavio.donze did, or you can use a database.
It usually makes no difference whether you save it on the server or somewhere else; the path decides the location.
If you want to upload the scores from a client to a server, there are multiple solutions,
e.g. adding it via an RMI or web service call.
You can get simple PHP/MySQL hosting and save the score in a database.
try {
    URL url = new URL("http://exemple.com/saveScore.php");
    InputStream is = url.openStream();
    Scanner scanner = new Scanner(is);
    scanner.useDelimiter("\\A");
    String response = scanner.hasNext() ? scanner.next() : null;
    if ("ok".equals(response)) { // compare string contents, not references
        System.out.println("Saved!");
    }
} catch (IOException e) {
    e.printStackTrace();
}
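Note that the snippet above never actually sends the score; presumably you would append it as query parameters, something like the following (the parameter names and variables are made up for illustration):

// Build the request URL with the player name and score as query parameters.
String name = URLEncoder.encode(playerName, "UTF-8"); // hypothetical variable holding the player's name
URL url = new URL("http://exemple.com/saveScore.php?player=" + name + "&score=" + score);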
In saveScore.php, after you save, just print ok.
<?php
// DB OPERATIONS
echo "ok";
exit;

Image not served inside jsp from Servlets

I have a JSP where, for each row of a table, I need to display an image stored in the database. I retrieve all the row data from the database, including the image as a Blob, and store it in a bean. The image is stored in the bean as a byte array like this:
photo = rs.getBlob("PHOTO");
photoByteArray = photo.getBytes(1, (int)photo.length());
While looping over the list of beans in the JSP, the src attribute points to a servlet like this:
<img class="img" width="55" height="50" src="displayThumbnail?photoData=${part.photoData}">
which serves the image as shown below, but the images don't show up, even though upon debugging the byte array does seem to contain data.
protected void processRequest(HttpServletRequest request, HttpServletResponse response) throws IOException {
    response.setContentType("image/jpeg");
    OutputStream o = response.getOutputStream();
    String photoDataStr = request.getParameter("photoData");
    byte[] photoData = null;
    if (photoDataStr != null) {
        photoData = photoDataStr.getBytes();
    }
    o.write(photoData);
    o.close();
}
However, the image doesn't show up. If I query the database for each individual image as shown below, the images show up fine.
protected void processRequest(HttpServletRequest request, HttpServletResponse response) {
    PreparedStatement pstmt = null;
    ResultSet rs = null;
    Connection conn = null;
    try {
        if (conn == null) {
            conn = open();
        }
        pstmt = conn.prepareStatement("select photo from PART_PHOTOS where id = ?");
        String id = request.getParameter("id");
        pstmt.setString(1, id);
        rs = pstmt.executeQuery();
        if (rs.next()) {
            Blob b = rs.getBlob("photo");
            response.setContentType("image/jpeg");
            response.setContentLength((int) b.length());
            InputStream is = b.getBinaryStream();
            OutputStream os = response.getOutputStream();
            byte buf[] = new byte[(int) b.length()];
            is.read(buf);
            os.write(buf);
            os.close();
            is.close();
        }
    } catch (Exception ex) {
        System.out.println(ex.getMessage());
        ex.printStackTrace();
    } finally {
        if (rs != null) {
            try {
                rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            rs = null;
        }
        if (pstmt != null) {
            try {
                pstmt.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            pstmt = null;
        }
        // check if it's the end of the loop
        if (conn != null) {
            try {
                conn.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
            conn = null;
        }
    }
}
I would highly appreciate any recommendations on this.
You're assuming that you can put random binary data into an HTML file, have it parsed correctly, and get it sent back to your server intact. This is a bad assumption! If nothing else, the byte that corresponds to the ASCII quote character is going to cause problems, right? Not to mention encoding issues, and the fact that the parameters of a URL must be URL-encoded. This is just doomed to fail.
To make this work, you'd need some kind of explicit text encoding of the binary data when you serve the page (base64, maybe), and then decode the servlet parameter back to binary image data after the URL is posted back.
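A minimal sketch of that encode/decode round trip, using java.util.Base64 (Java 8+, URL-safe variant since the value travels in a query string); note this still produces very long URLs for large images, which is one more reason the ID-based approach in the answers below is preferable:

// When rendering the page: encode the raw bytes into a URL-safe string
// and use it as the photoData parameter value in the <img> src.
String photoParam = java.util.Base64.getUrlEncoder().encodeToString(photoByteArray);

// In the servlet: decode the parameter back into the original bytes.
byte[] photoData = java.util.Base64.getUrlDecoder().decode(request.getParameter("photoData"));
response.setContentType("image/jpeg");
response.getOutputStream().write(photoData);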
Your first processRequest() snippet is only sending back the byte representation of the photoData request parameter and not the photo data identified by the parameter. Looks like a bug in your code.
It seems you are trying to solve your problem in the wrong way. When you first build the HTML table, storing the image in your "bean" from the first query gains you nothing unless you cache that data and the subsequent displayThumbnail request retrieves the image from the cache, avoiding the database query.
If you do not want to mess with caching, there is no need to store the image in your initial bean at all; just do something like your second processRequest() snippet and fetch the image directly when the browser asks for it.
Your ${part.photoData} expression should return some ID. In the processRequest() method, get that ID value (via request.getParameter("photoData")), use it to retrieve the image from the database (or better, from a cache or the file system), and send the binary data to the web client.

How to overcome OutOfMemoryError during huge file write

I am writing a full database extract program in Java. The database is Oracle, and it is huge; some tables have ~260 million records. The program should create one file per table in a specific format, so using Oracle Data Pump etc. is not an option. Also, company security policies do not allow writing a PL/SQL procedure that creates files on the DB server for this requirement. I have to go with Java and JDBC.
The issue I am facing is that since the files for some of the tables are huge (~30GB), I am running out of memory almost every time, even with a 20GB Java heap. When the file size exceeds the heap size during file creation, even with one of the most aggressive GC policies, the process seems to hang up. For example, if the file size is > 20GB and the heap size is 20GB, once heap utilization hits the maximum, it slows down to writing 2MB per minute or so, and at this speed it will take months to get a full extract.
I am looking for some way to overcome this issue. Any help would be greatly appreciated.
Here are some details of the system configuration I have:
Java - JDK1.6.0_14
System config - RH Enterprise Linux (2.6.18) running on 4 x Intel Xeon E7450 (6 cores) @ 2.39GHz
RAM - 32GB
Database Oracle 11g
The file-writing part of the code is below:
private void runQuery(Connection conn, String query, String filePath,
        String fileName) throws SQLException, Exception {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
        stmt = conn.prepareStatement(query,
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(maxRecBeforWrite);
        rs = stmt.executeQuery();
        // Write query result to file
        writeDataToFile(rs, filePath + "/" + fileName, getRecordCount(query, conn));
    } catch (SQLException sqle) {
        sqle.printStackTrace();
    } finally {
        try {
            rs.close();
            stmt.close();
        } catch (SQLException ex) {
            throw ex;
        }
    }
}

private void writeDataToFile(ResultSet rs, String tempFile, String cnt)
        throws SQLException, Exception {
    FileOutputStream fileOut = null;
    int maxLength = 0;
    try {
        fileOut = new FileOutputStream(tempFile, true);
        FileChannel fcOut = fileOut.getChannel();
        List<TableMetaData> metaList = getMetaData(rs);
        maxLength = getMaxRecordLength(metaList);
        // Write Header
        writeHeaderRec(fileOut, maxLength);
        while (rs.next()) {
            // Now iterate on metaList and fetch all the column values.
            writeData(rs, metaList, fcOut);
        }
        // Write trailer
        writeTrailerRec(fileOut, cnt, maxLength);
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        try {
            fileOut.close();
        } catch (IOException ioe) {
            fileOut = null;
            throw new Exception(ioe.getMessage());
        }
    }
}

private void writeData(ResultSet rs, List<TableMetaData> metaList,
        FileChannel fcOut) throws SQLException, IOException {
    StringBuilder rec = new StringBuilder();
    String lf = "\n";
    for (TableMetaData tabMeta : metaList) {
        rec.append(getFormattedString(rs, tabMeta));
    }
    rec.append(lf);
    ByteBuffer byteBuf = ByteBuffer.wrap(rec.toString().getBytes("US-ASCII"));
    fcOut.write(byteBuf);
}

private String getFormattedString(ResultSet rs, TableMetaData tabMeta)
        throws SQLException, IOException {
    String colValue = null;
    // check if it is a CLOB column
    if (tabMeta.isCLOB()) {
        // Column is a CLOB, so fetch it and retrieve first clobLimit chars.
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                getCLOBString(rs, tabMeta));
    } else {
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                rs.getString(tabMeta.getColumnName()));
    }
    return colValue;
}
It's probably due to the way you call prepareStatement; see this question for a similar problem. You don't need scrollability, and a ResultSet is read-only by default, so just call
stmt = conn.prepareStatement(query);
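For what it's worth, a scroll-insensitive result set typically forces the Oracle driver to cache rows on the client, which is where the heap goes. A minimal sketch of the forward-only variant, keeping a fetch-size setting (the value is arbitrary, tune as needed):

// Default statement type is TYPE_FORWARD_ONLY / CONCUR_READ_ONLY,
// so rows can be streamed instead of cached client-side.
PreparedStatement stmt = conn.prepareStatement(query);
stmt.setFetchSize(1000); // fetch rows from the server in chunks
ResultSet rs = stmt.executeQuery();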
Edit:
Map your database tables to classes using JPA.
Then load collections of objects from the DB using Hibernate in batches of some tolerable size and serialize them to the file.
Is your algorithm like the following? This is assuming a direct mapping between DB rows and lines in the file:
// open file for writing with buffered writer.
// execute JDBC statement
// iterate through result set
// convert rs to file format
// write to file
// close file
// close statement/rs/connection etc
Try using Spring JDBC Template to simplify the JDBC portion.
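A minimal sketch of that loop in plain JDBC, assuming one output line per row and a single text column for brevity (the DataSource, column, and file names are placeholders):

// Stream rows straight from the result set into a buffered file writer,
// so only one row is held in memory at a time.
try (Connection conn = dataSource.getConnection(); // placeholder DataSource
     PreparedStatement stmt = conn.prepareStatement("SELECT some_col FROM some_table");
     BufferedWriter out = new BufferedWriter(new FileWriter("/tmp/extract.txt"))) {
    stmt.setFetchSize(1000); // stream in modest chunks instead of caching everything
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            out.write(rs.getString(1)); // convert the row to the file format here
            out.newLine();
        }
    }
}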
I believe this should be possible on the default 32 MB Java heap. Just fetch each row, write the data to the file stream, then flush and close it once done.
What value are you using for maxRecBeforWrite?
Perhaps the query of the max record length is defeating your setFetchSize by forcing JDBC to scan the entire result for record length? Maybe you could delay writing your header and note the max record size on the fly.
