I have created a JSP page where a user can upload a .csv file. The JSP is backed by Java code that reads the .csv file, compares the URLs in the file with the DB, and adds them to the DB if they are not already present.
This scenario works absolutely fine when it is executed on a Windows system.
I uploaded the successfully executed web application folder to a Unix system. When I executed the program on the Unix system, the tool did not compare the URLs with the DB and simply added them.
I suspect there is some problem in reading the .csv file on a Unix system.
I am using Fedora (Linux). Kindly let me know whether there are any differences in reading a .csv file between a Windows system and a Unix system.
The .csv file I am using has the following contents:
http://www.topix.com,sdfasdf
http://rss.news.yahoo.com/rss/topstories,Apple
http://www.apple.com/354,sdfasdf
http://www.topix.com/rss/city/emporia-ks,sdfasdf
http://www.topix.com/rss/,sdfasdf
http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topsongs/limit=10/xml,sdfasdf
http://www.topix.com/rss/city/emp,sdfasdf
http://www.topix.com/rss/city/sandy-ut,dfgsdfg
http://www.apple.com,Yahoo
UPDATE FOR JEFF
try {
    List items = uploadHandler.parseRequest(request);
    Iterator itr = items.iterator();
    while (itr.hasNext()) {
        FileItem item = (FileItem) itr.next();
        if (item.isFormField()) {
            out.println("File Name = " + item.getFieldName() + ", Value = " + item.getString());
        } else {
            File file = new File(destinationDir, item.getName());
            item.write(file);
            //String temp = item.getName();
            String fileToBeRead = "C:/Program Files/Apache Software Foundation/Tomcat 5.5/webapps/Readcsv/files/" + item.getName();
            String urlcnt = "";
            String srccnt = "";
            String contentType = "";
            Connection con = null;
            Statement stmt = null;
            final String rssvar = "Rss";
            final String other = "Other";
            int i = 0;
            int j = 0;
            try {
                BufferedReader br = new BufferedReader(new FileReader(fileToBeRead));
                String strLine = "";
                StringTokenizer st = null;
                while ((strLine = br.readLine()) != null) {
                    st = new StringTokenizer(strLine, ",");
                    while (st.hasMoreTokens()) {
                        urlcnt = st.nextToken();
                        srccnt = st.nextToken();
                    }
                    if (con == null) {
                        SQLConnection.setURL("jdbc:sqlserver://192.168.2.53\\SQL2005;user=sa;password=365media;DatabaseName=LN_ADWEEK");
                        con = SQLConnection.getNewConnection();
                        stmt = con.createStatement();
                    }
                    try {
                        ResultSet rs;
                        boolean hasRows = false;
                        rs = stmt.executeQuery("select url from urls_linkins where url='" + urlcnt + "'");
                        while (rs.next()) {
                            hasRows = true;
                            i++;
                        }
                        if (!hasRows) {
                            j++;
                            URL url = new URL(urlcnt);
                            URLConnection url1 = url.openConnection();
                            contentType = url1.getContentType();
                            PreparedStatement insertUrlStatement = con.prepareStatement("INSERT INTO urls_linkins(url, source_name, is_active, is_periodic, Link_Type, New_Entry) VALUES(?, ?, ?, ?, ?, ?)");
                            if (contentType.contains("rss") || contentType.contains("xml")) {
                                insertUrlStatement.setString(1, urlcnt);
                                insertUrlStatement.setString(2, srccnt);
                                insertUrlStatement.setInt(3, 1);
                                insertUrlStatement.setInt(4, 0);
                                insertUrlStatement.setString(5, rssvar);
                                insertUrlStatement.setInt(6, 1);
                                insertUrlStatement.executeUpdate();
                                insertUrlStatement.close();
                            } else {
                                insertUrlStatement.setString(1, urlcnt);
                                insertUrlStatement.setString(2, srccnt);
                                insertUrlStatement.setInt(3, 1);
                                insertUrlStatement.setInt(4, 0);
                                insertUrlStatement.setString(5, other);
                                insertUrlStatement.setInt(6, 1);
                                insertUrlStatement.executeUpdate();
                                insertUrlStatement.close();
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                out.println("<h2>" + j + " url has been added and " + i + " url already exists in the DB</h2>");
                out.println("<a href=Addurl.jsp>Check URL</a>");
                out.println("<a href=Addurl1.jsp>Add Single URL</a>");
                out.println("<a href=uploadcsv.jsp>Add Multiple URL</a>");
            }
        }
        out.close();
    }
} catch (FileUploadException ex) {
    log("Error encountered while parsing the request", ex);
} catch (Exception ex) {
    log("Error encountered while uploading file", ex);
}
This is the code that reads the .csv file.
Yes, there will be differences in reading the .csv file when you transfer it from a Windows machine to a Unix machine, even though it is a text file. Windows ends each line with a carriage return plus line feed (\r\n), while Unix uses a line feed alone, so stray carriage returns can survive as invisible characters in the values you read.
I suspect the reason it is not comparing the URLs correctly is that such an invisible character makes the string differ from the value stored in the DB, so the tool thinks the URL is new and adds it.
One suggestion would be to run the file through the dos2unix command before parsing it.
http://kb.iu.edu/data/acux.html
Hope it helps.
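If you cannot normalize every uploaded file with dos2unix, you can also make the reading loop defensive. Below is a minimal sketch based on the loop in your code (it assumes each line has the two comma-separated fields shown in your sample); it strips stray carriage returns and whitespace from each token before they reach the DB comparison:

String strLine;
while ((strLine = br.readLine()) != null) {
    // Drop any Windows carriage return and surrounding whitespace
    strLine = strLine.replace("\r", "").trim();
    StringTokenizer st = new StringTokenizer(strLine, ",");
    if (st.countTokens() >= 2) {
        String urlcnt = st.nextToken().trim();
        String srccnt = st.nextToken().trim();
        // ... compare urlcnt against the DB exactly as before
    }
}

This way the same code behaves identically whether the .csv was saved on Windows or on Unix.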
Related
I have got a Microsoft Access database in the resource folder of my Java application.
When the user clicks a button, this database is copied to the temp directory of the PC. Then I create a temporary VBS file in the same directory and execute it.
(This VBS file calls a VBA macro within the database that deletes some records.)
However, as the macro attempts to delete the records, an error is thrown stating that the database is read-only.
Why does this happen?
Here is my code:
When the user clicks the button, some variables are set and then the following code is executed:
private void moveAccess() throws IOException {
    String dbName = "sys_cl_imp.accdb";
    String tempDbPath = System.getenv("TEMP").replace('\\', '/') + "/" + dbName;
    InputStream in = ConscriptioLegere.class.getResourceAsStream("res/" + dbName);
    File f = new File(tempDbPath);
    Files.copy(in, f.toPath(), StandardCopyOption.REPLACE_EXISTING);
    this.dbFilePath = tempDbPath;
    System.out.println("access in temp");
    f = null;
}
Then a connection is made to the database to update some data, with:
Connection con = DriverManager.getConnection("jdbc:ucanaccess://" + dbFilePath);
Statement sql = con.createStatement();
...
sql.close();
con.close();
Afterwards this is executed:
public boolean startImport() {
    File vbsFile = new File(vbsFilePath);
    PrintWriter pw;
    try {
        updateAccess();
    } catch (IOException e) {
        e.printStackTrace();
        return false;
    }
    try {
        pw = new PrintWriter(vbsFile);
        pw.println("Set accessApp = CreateObject(\"Access.Application\")");
        pw.println("accessApp.OpenCurrentDatabase (\"" + dbFilePath + "\")");
        pw.println("accessApp.Run \"sys_cl_imp.importData\", \"" + saveLoc + "\"");
        pw.println("accessApp.CloseCurrentDatabase");
        pw.close();
        Process p = Runtime.getRuntime().exec("cscript /nologo \"" + vbsFilePath + "\"");
While the process is running, the error occurs.
I don't understand why the database is opened read-only.
I tried setting f to null after copying the db, but it proved not to work that way.
Based on this discussion:
The solution is adding ;singleconnection=true to the JDBC url. UCanAccess will close the file once the JDBC connection is closed.
Connection con = DriverManager.getConnection("jdbc:ucanaccess://" + dbFilePath +";singleconnection=true");
Thank you for your solution beckyang.
I managed to get it working with it, but there was a second mistake:
I deleted the contents of a table with Java, then closed the connection and ran the VBA procedure.
In the VBA I was attempting to delete the data again, but as there was none, this didn't work out.
After deleting that SQL from the VBA, the project worked :)
This question already has answers here:
Android SQLite database: slow insertion
(5 answers)
Closed 6 years ago.
I'm trying to parse values from a CSV file into a SQLite DB; however, the file is quite large (~2,500,000 lines). I ran my program for a few hours, printing where it was up to, but by my calculation the file would have taken about 100 hours to parse completely, so I stopped it.
I'm going to have to run this program as a background process at least once a week, on a new CSV file that is around 90% similar to the previous one. I have come up with a few solutions to improve my program; however, I don't know much about databases, so I have questions about each of my solutions.
Is there a more efficient way to read a CSV file than what I have already?
Is instantiating an ObjectOutputStream, and storing it as a BLOB significantly computationally expensive? I could directly add the values instead, but I use the BLOB later, so storing it now saves me from instantiating a new one multiple times.
Would connection pooling, or changing the way I use the Connection in some other way be more efficient?
I'm setting the URL column as UNIQUE so I can use INSERT OR IGNORE, but testing this on smaller datasets (~10,000 lines) indicates that there is no performance gain compared to dropping the table and repopulating. Is there a faster way to add only unique values?
Are there any obvious mistakes I'm making? (Again, I know very little about databases)
public class Database {

    public void createResultsTable() {
        Statement stmt;
        String sql = "CREATE TABLE results("
                + "ID INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, "
                + "TITLE TEXT NOT NULL, "
                + "URL TEXT NOT NULL UNIQUE, "
                ...
                ...
                + "SELLER TEXT NOT NULL, "
                + "BEAN BLOB);";
        try {
            stmt = c.createStatement();
            stmt.executeUpdate(sql);
        } catch (SQLException e) { e.printStackTrace(); }
    }

    public void addCSVToDatabase(Connection conn, String src) {
        BufferedReader reader = null;
        DBEntryBean b;
        String[] vals;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(src), "UTF-8"));
            for (String line; (line = reader.readLine()) != null; ) {
                // Each line takes the form: "title|URL|...|...|SELLER".
                // The pipe must be escaped, since split() takes a regex.
                vals = line.split("\\|");
                b = new DBEntryBean();
                b.setTitle(vals[0]);
                b.setURL(vals[1]);
                ...
                ...
                b.setSeller(vals[n]);
                insert(conn, b);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void insert(Connection conn, DBEntryBean b) {
        PreparedStatement pstmt = null;
        String sql = "INSERT OR IGNORE INTO results("
                + "TITLE, "
                + "URL, "
                ...
                ...
                + "SELLER, "
                + "BEAN"
                + ");";
        try {
            pstmt = conn.prepareStatement(sql);
            pstmt.setString(Constants.DB_COL_TITLE, b.getTitle());
            pstmt.setString(Constants.DB_COL_URL, b.getURL());
            ...
            ...
            pstmt.setString(Constants.DB_COL_SELLER, b.getSeller());
            // ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // oos = new ObjectOutputStream(baos);
            // oos.writeObject(b);
            // byte[] bytes = baos.toByteArray();
            // pstmt.setBytes(Constants.DB_COL_BEAN, bytes);
            pstmt.executeUpdate();
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            if (pstmt != null) {
                try { pstmt.close(); }
                catch (SQLException e) { e.printStackTrace(); }
            }
        }
    }
}
The biggest bottleneck in your code is that you are not batching the insert operations. You should really call pstmt.addBatch() instead of pstmt.executeUpdate() and execute the batch once you have accumulated something like 10K rows to insert.
On the CSV parsing side, you should really consider using a CSV library to do the parsing for you. univocity-parsers has the fastest CSV parser around, and it should process these 2.5 million lines in less than a second. I'm the author of this library, by the way.
String.split() is convenient but not fast. For anything more than a few dozen rows it doesn't make sense to use it.
Hope this helps.
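To make the batching concrete, here is a sketch against the question's results table (column list shortened; the 10K batch size and the entries collection are assumptions to adapt). Wrapping the whole run in one explicit transaction matters just as much as batching for SQLite:

// Sketch: batched INSERT OR IGNORE inside a single transaction.
// Column list is abbreviated; batch size of 10,000 is a starting point.
String sql = "INSERT OR IGNORE INTO results(TITLE, URL, SELLER) VALUES(?, ?, ?)";
conn.setAutoCommit(false);                  // one transaction for the whole load
PreparedStatement pstmt = conn.prepareStatement(sql);
int count = 0;
for (DBEntryBean b : entries) {             // 'entries' stands in for your parsed rows
    pstmt.setString(1, b.getTitle());
    pstmt.setString(2, b.getURL());
    pstmt.setString(3, b.getSeller());
    pstmt.addBatch();
    if (++count % 10000 == 0) {
        pstmt.executeBatch();               // flush every 10K rows
    }
}
pstmt.executeBatch();                       // flush the remainder
conn.commit();
pstmt.close();

With autocommit left on, SQLite syncs to disk after every single statement, which by itself can account for hours of the difference.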
I am doing a text file bulk upload into SQL Server. Whenever I try to upload the files, I get the following exception:
[Microsoft] [ODBC SQL Server Driver] [SQL Server] Could not bulk insert because file 'C:/Form/Input_File/Form.txt' could not be opened.
Operating System error code 3 (The system cannot find the path specified).
Please find the below code:
public void uploadFiles()
{
    File dir = new File(inputFilesPath);
    String[] children = dir.list();
    String completePathFileName = "";
    System.out.println(" Inside Upload ::");
    String saveFileNames = "";
    PreparedStatement prepStat;
    DBConnection dbConnection = new DBConnection();
    Connection conHandler = dbConnection.getConnection();
    if (null != conHandler)
        System.out.println(" Clear ::" + conHandler);
    try
    {
        if (children != null)
        {
            for (int i = 0; i < children.length; i++)
            {
                String filename = children[i];
                System.out.println(" children[i]::" + children[i]);
                // File is validated based on some business rules.
                if (isValidFile(filename) == 1)
                {
                    String[] fileSplit = filename.split("E");
                    String[] extnSplit = fileSplit[1].trim().split(".TXT");
                    completePathFileName += (completePathFileName.equals(""))
                            ? extnSplit[0] : "^" + extnSplit[0];
                    saveFileNames += (saveFileNames.equals(""))
                            ? filename : "," + filename;
                    System.out.println(extnSplit[0]);
                }
                else
                {
                    inValidFileNames += (inValidFileNames.equals(""))
                            ? filename : ";\n" + filename;
                }
            }
            if (!completePathFileName.trim().equals(""))
            {
                System.out.println(completePathFileName);
                prepStat = conHandler.prepareStatement("Exec StartFileImport ?");
                prepStat.setString(1, completePathFileName);
                prepStat.execute();
                saveFileNameToDatabase(saveFileNames);
            }
        }
    }
    catch (SQLException sql)
    {
        System.out.println(sql.getMessage());
    }
}
The Connection object comes from the code below:
public Connection getConnection()
{
    System.out.println("In side DB Connection...");
    try {
        // get a database connection
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        System.out.println("Before Driver");
        conn = DriverManager.getConnection("jdbc:odbc:form26qa", "form26", "form26");
        System.out.println("After Driver");
        if (conn != null)
        {
            System.out.println("Connection established...");
        } //if
        else
        {
            System.out.println("Connection failed...");
        } //else
    } //try
    catch (Exception e)
    {
        System.out.println("Exception ocurred..." + e);
        e.printStackTrace();
    } //catch
    return conn;
}
Explanation:
I am reading files from the input path, extracting the file name and file path, and uploading each file into SQL Server.
The application is able to find the input file in the specified path, but while uploading I get the above-mentioned exception.
Please check and suggest how I can fix the issue.
The file needs to be accessible on the server. The file path is
relative to the server, not your PC. Also, if you are trying to use a
share or a mapped drive it will not work. You need to use the UNC
path.
UNC Name Examples
\\teela\admin$ (to reach C:\WINNT)
\\teela\admin$\system32 (to reach C:\WINNT\system32)
\\teela\temp (to reach C:\temp)
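Applied to the code in the question, the path handed to StartFileImport should therefore be a UNC path that the SQL Server service account can read; for example (server and share names are hypothetical):

// Hypothetical share: replace \\appserver\Form with a share the
// SQL Server service account can actually reach.
prepStat = conHandler.prepareStatement("Exec StartFileImport ?");
prepStat.setString(1, "\\\\appserver\\Form\\Input_File\\Form.txt");
prepStat.execute();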
I have a SQL script file, and I need to execute the commands in it through Java. I searched the internet regarding this and found some code defining parsers to split the SQL statements and execute them, but none of it worked for my script file, because the file contains both CREATE and ALTER statements that end with GO instead of a semicolon. Can anybody suggest a solution to execute the script file?
Thanks,
Mahesh
For simple scripts I generally use this class from iBATIS - ScriptRunner. Alternatively, you can spawn a new db client process from Java and feed it the script you want to execute. This works for all scripts, as simple solutions like ScriptRunner don't work well when the delimiters in the SQL files get changed, for instance.
Here's an example of how to feed the SQL as a string to a spawned db client process:
private void runSql(String pSql) {
    String tCommand = "mysql -u " + username + (password != null ? " -p" + password : "") + " " + dbName;
    System.out.println(tCommand);
    try {
        Process tProcess = Runtime.getRuntime().exec(tCommand);
        OutputStream tOutputStream = tProcess.getOutputStream();
        Writer w = new OutputStreamWriter(tOutputStream);
        System.out.println(pSql);
        w.write(pSql);
        w.flush();
        Scanner in = new Scanner(tProcess.getErrorStream());
        String errorMessage = "";
        while (in.hasNext()) {
            errorMessage += in.next() + " ";
        }
        if (errorMessage.length() > 0) {
            System.out.println(errorMessage);
            throw new ClientSqlExecutionException(errorMessage);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Have a look at the MyBatis Migrations code, it does something like what you need:
http://code.google.com/p/mybatis/wiki/Migration
You need to change the parser so it produces executable statements. But I'm not sure I understand what you mean when you say "execute through Java".
Java won't execute those SQL statements - the database you connect to will. Java can connect to a database using JDBC and send the SQL statements from the file.
I don't see why you have to parse the SQL, unless you want Java to validate them before sending them on to the database server. The server will parse and validate them again, so it feels like you're doing extra work for nothing.
The simplest solution I can present to you is this, presuming I understand your question.
1) Read text file into a string or array via Java IO.
2) Pass string or array to MySQL via JDBC.
Read from file example:
import java.io.*;

class FileRead
{
    public static void main(String args[])
    {
        try {
            // Open the file that is the first
            // command line parameter
            FileInputStream fstream = new FileInputStream("textfile.txt");
            // Get the object of DataInputStream
            DataInputStream in = new DataInputStream(fstream);
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            // Read file line by line
            while ((strLine = br.readLine()) != null) {
                // Print the content on the console
                System.out.println(strLine);
            }
            // Close the input stream
            in.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}
acquired from http://www.roseindia.net/java/beginners/java-read-file-line-by-line.shtml
The simplest way is to simply read the statements and check whether they need a semicolon appended at the end. (This is an example, and it only works if there is one statement per line:)
public void executeScript(String script) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(script));
    String statement;
    while ((statement = in.readLine()) != null) {
        statement = statement.trim().toLowerCase();
        String command = statement.split("[ ]+")[0]; // first word of the statement
        if (command.equals("insert") || command.equals("update") /* || any other */) {
            statement = statement + ";";
        }
        // execute statement using jdbc
    }
    in.close();
}
If you do not know how to use jdbc, just ask :-)
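To round that off, here is roughly what the "execute statement using jdbc" comment could look like in practice. The JDBC URL and credentials are placeholders, and the connection should be opened once outside the reading loop:

// Placeholders: adjust URL, user and password for your database.
Connection con = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
Statement stmt = con.createStatement();
// inside the reading loop:
stmt.execute(statement);   // send the assembled statement to the server
// after the loop:
stmt.close();
con.close();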
Use this slightly modified version of the com.ibatis.common.jdbc.ScriptRunner class, which is fully self-contained, i.e. you do not need any third-party jar dependencies.
It is possible to change the delimiter from ; to GO. I think that should do the trick.
Here is an example:
Reader reader = new BufferedReader(*** Your FileReader instance ***);
try
{
    ScriptRunner runner = new ScriptRunner(connection, false, true);
    runner.setDelimiter("GO", true);
    runner.runScript(reader);
}
finally
{
    reader.close();
}
Apache Ant SQL Task worked for me.
https://ant.apache.org/manual/Tasks/sql.html
I am writing a full database extract program in Java. The database is Oracle, and it is huge; some tables have ~260 million records. The program should create one file per table in a specific format, so using Oracle Data Pump etc. is not an option. Also, some company security policies do not allow writing a PL/SQL procedure to create files on the DB server for this requirement. I have to go with Java and JDBC.
The issue I am facing is that since the files for some of the tables are huge (~30GB), I am running out of memory almost every time, even with a 20GB Java heap. During the creation of the file, when the file size exceeds the heap size, even with one of the most aggressive GC policies the process seems to hang. For example, if the file size is > 20GB and the heap size is 20GB, once heap utilization hits the max heap size it slows down to writing 2MB per minute or so, and at this speed it will take months to get a full extract.
I am looking for some way to overcome this issue. Any help would be greatly appreciated.
Here are some details of the system configuration I have:
Java - JDK 1.6.0_14
System config - RH Enterprise Linux (2.6.18) running on 4 x Intel Xeon E7450 (6 cores) @ 2.39GHz
RAM - 32GB
Database - Oracle 11g
The file-writing part of the code goes below:
private void runQuery(Connection conn, String query, String filePath,
        String fileName) throws SQLException, Exception {
    PreparedStatement stmt = null;
    ResultSet rs = null;
    try {
        stmt = conn.prepareStatement(query,
                ResultSet.TYPE_SCROLL_INSENSITIVE,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(maxRecBeforWrite);
        rs = stmt.executeQuery();
        // Write query result to file
        writeDataToFile(rs, filePath + "/" + fileName, getRecordCount(query, conn));
    } catch (SQLException sqle) {
        sqle.printStackTrace();
    } finally {
        try {
            rs.close();
            stmt.close();
        } catch (SQLException ex) {
            throw ex;
        }
    }
}

private void writeDataToFile(ResultSet rs, String tempFile, String cnt)
        throws SQLException, Exception {
    FileOutputStream fileOut = null;
    int maxLength = 0;
    try {
        fileOut = new FileOutputStream(tempFile, true);
        FileChannel fcOut = fileOut.getChannel();
        List<TableMetaData> metaList = getMetaData(rs);
        maxLength = getMaxRecordLength(metaList);
        // Write header
        writeHeaderRec(fileOut, maxLength);
        while (rs.next()) {
            // Now iterate on metaList and fetch all the column values.
            writeData(rs, metaList, fcOut);
        }
        // Write trailer
        writeTrailerRec(fileOut, cnt, maxLength);
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        try {
            fileOut.close();
        } catch (IOException ioe) {
            fileOut = null;
            throw new Exception(ioe.getMessage());
        }
    }
}

private void writeData(ResultSet rs, List<TableMetaData> metaList,
        FileChannel fcOut) throws SQLException, IOException {
    StringBuilder rec = new StringBuilder();
    String lf = "\n";
    for (TableMetaData tabMeta : metaList) {
        rec.append(getFormattedString(rs, tabMeta));
    }
    rec.append(lf);
    ByteBuffer byteBuf = ByteBuffer.wrap(rec.toString().getBytes("US-ASCII"));
    fcOut.write(byteBuf);
}

private String getFormattedString(ResultSet rs, TableMetaData tabMeta)
        throws SQLException, IOException {
    String colValue = null;
    // check if it is a CLOB column
    if (tabMeta.isCLOB()) {
        // Column is a CLOB, so fetch it and retrieve first clobLimit chars.
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                getCLOBString(rs, tabMeta));
    } else {
        colValue = String.format("%-" + tabMeta.getColumnSize() + "s",
                rs.getString(tabMeta.getColumnName()));
    }
    return colValue;
}
It's probably due to the way you call prepareStatement; see this question for a similar problem. A scroll-insensitive result set typically forces the driver to cache every row it has delivered in client memory, which matches the behavior you're seeing. You don't need scrollability, and a ResultSet is read-only by default, so just call
stmt = conn.prepareStatement(query);
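A forward-only, read-only statement combined with a reasonable fetch size lets the Oracle driver stream rows to you instead of caching the whole result set. A sketch (the fetch size of 5000 is just a starting point to tune):

PreparedStatement stmt = conn.prepareStatement(query);  // FORWARD_ONLY, READ_ONLY by default
stmt.setFetchSize(5000);        // rows per network round trip; tune as needed
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
    // format the current row and write it straight to the file stream
}
rs.close();
stmt.close();

With this shape of code, the heap only ever holds one fetch block of rows, no matter how large the table is.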
Edit:
Map your database tables to classes using JPA.
Then load collections of objects from the DB using Hibernate in batches of some tolerable size and serialize them to the file.
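As a rough illustration of that idea (the entity name, page size, and write helper are all assumptions, and offset-based paging has its own cost on a 260-million-row table):

int pageSize = 10000;
for (int first = 0; ; first += pageSize) {
    List<MyRow> page = em.createQuery("select r from MyRow r", MyRow.class)
            .setFirstResult(first)
            .setMaxResults(pageSize)
            .getResultList();
    if (page.isEmpty()) break;
    writePageToFile(page);   // serialize this batch to the extract file
    em.clear();              // detach the batch so the heap stays flat
}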
Is your algorithm like the following? This is assuming a direct mapping between DB rows and lines in the file:
// open file for writing with buffered writer.
// execute JDBC statement
// iterate through result set
// convert rs to file format
// write to file
// close file
// close statement/rs/connection etc
Try using Spring JDBC Template to simplify the JDBC portion.
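For example, JdbcTemplate with a RowCallbackHandler pushes rows to you one at a time instead of materializing a list, which fits this streaming use case well (a sketch; the writer handling is elided):

JdbcTemplate jdbc = new JdbcTemplate(dataSource);
jdbc.setFetchSize(5000);    // stream in blocks rather than all at once
jdbc.query(query, new RowCallbackHandler() {
    public void processRow(ResultSet rs) throws SQLException {
        // format rs's current row and append it to the output file
    }
});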
I believe this should be possible with the default 32 MB Java heap. Just fetch each row, write the data to the file stream, then flush and close once done.
What value are you using for maxRecBeforWrite?
Perhaps the query for the max record length is defeating your setFetchSize by forcing JDBC to scan the entire result for the record length. Maybe you could delay writing your header and note the max record size on the fly.
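One way to do that: write a fixed-width placeholder header, track the longest record while streaming, then seek back and patch the header at the end. A sketch, with a hypothetical header layout and a formatRow helper standing in for the existing formatting code:

RandomAccessFile raf = new RandomAccessFile(tempFile, "rw");
raf.writeBytes(String.format("%-20s%n", "HDR:?"));       // placeholder header
int maxLen = 0;
while (rs.next()) {
    String rec = formatRow(rs);                          // your existing row formatting
    maxLen = Math.max(maxLen, rec.length());
    raf.writeBytes(rec + "\n");
}
raf.seek(0);                                             // jump back to the header
raf.writeBytes(String.format("%-20s", "HDR:" + maxLen)); // same width, so the newline survives
raf.close();

This removes the extra metadata scan entirely and costs only one extra seek.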