Read 20gb text file with java [closed]

I have a 20 GB text file that I would like to read and store in a database. The problem is that when I try to load it, the program is terminated before it can print anything to show what it is doing, and it seems like this might be due to the size of the file. If anyone has any suggestions on how to read this file efficiently, please show me.

From another post, "Read large files in Java":
First, if your file contains binary data, then using BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it's text data and you need to split it along line breaks, then using BufferedReader is OK (assuming the file contains lines of a sensible length).
Regarding memory, there shouldn't be any problem if you use a decently sized buffer (I'd use at least 1MB to make sure the HD is doing mostly sequential reading and writing).
If speed turns out to be a problem, you could have a look at the java.nio packages, which are supposedly faster than java.io.
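The buffered, line-by-line approach described above can be sketched roughly as follows. The class and method names (LargeFileReader, open, forEachLine) are made up for illustration, and UTF-8 is an assumption about the file's encoding; the 1 MB buffer follows the suggestion above and is only a tuning choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public final class LargeFileReader {
    // 1 MB buffer, as suggested in the answer; adjust after measuring.
    private static final int BUFFER_SIZE = 1 << 20;

    /** Opens a text file for streaming reads (UTF-8 is assumed here). */
    public static BufferedReader open(Path file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8),
                BUFFER_SIZE);
    }

    /**
     * Passes every line to the handler, one at a time, and returns the
     * number of lines read. Only the current line is held in memory.
     */
    public static long forEachLine(BufferedReader reader, Consumer<String> handler) {
        long count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

A caller would open the reader in a try-with-resources block and pass a handler that parses each line or queues it for loading. Because nothing is accumulated, memory use stays flat regardless of file size.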
As for loading it into a database, make sure you use some sort of bulk loading API; otherwise it will take forever.
Here is an example of a bulk loading routine I use for Netezza ...
private static void executeBulkLoad(
        Connection connection,
        String schema,
        String tableName,
        File file,
        String filename,
        String encoding) throws SQLException {
    String filePath = file.getAbsolutePath();
    // The load logs are written to the folder that contains the data file.
    String logFolderPath = filePath.replace(filename, "");
    String sql = "INSERT INTO " + schema + "." + tableName + "\n"
            + "SELECT * FROM\n"
            + "EXTERNAL '" + filePath + "'\n"
            + "USING\n"
            + "(\n"
            + " ENCODING '" + encoding + "'\n"
            + " QUOTEDVALUE 'NO'\n"
            + " FILLRECORD 'TRUE'\n"
            + " NULLVALUE 'NULL'\n"
            + " SKIPROWS 1\n"
            + " DELIMITER '\\t'\n"
            + " LOGDIR '" + logFolderPath + "'\n"
            + " REMOTESOURCE 'JDBC'\n"
            + " CTRLCHARS 'TRUE'\n"
            + " IGNOREZERO 'TRUE'\n"
            + " ESCAPECHAR '\\'\n"
            + ");";
    // try-with-resources closes the statement even if execute() throws.
    try (Statement statement = connection.createStatement()) {
        statement.execute(sql);
    }
}

If you need to load the information into a database, you can use Spring Batch.
With it you can read your file, manage transactions, run processing over the file, persist the rows into a database, and control how many records are written before each commit. I think that is a better option because the first problem is reading the large file, but your next problem will be managing the database transactions, controlling the commits, and so on. I hope this helps.
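To make the "control how many records go into each commit" idea concrete, here is a plain-Java sketch. It is not Spring Batch and does not use any Spring API; ChunkedWriter and forEachChunk are hypothetical names. The handler is where a JDBC batch insert followed by a commit would go.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public final class ChunkedWriter {
    /**
     * Groups rows into chunks of at most chunkSize and hands each chunk to
     * the handler. Returns the number of chunks delivered.
     */
    public static int forEachChunk(Iterator<String> rows, int chunkSize,
                                   Consumer<List<String>> chunkHandler) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize) {
                // In a real loader: insert the chunk as one batch, then commit.
                chunkHandler.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            // The last chunk may be smaller than chunkSize.
            chunkHandler.accept(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

The rows iterator could come from BufferedReader.lines().iterator(), so the file is still streamed. If a chunk fails, only that chunk's transaction needs to be rolled back and retried.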

If you are reading a very large file, always prefer InputStreams, so the data is processed as it arrives instead of being loaded all at once.
e.g.
// conn is assumed to be an already opened connection to the data source
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = in.readLine()) != null) {
    // process line
}
in.close();

Related

Trouble with ResultSet using executeUpdate

I am new to programming and have run into a problem while using executeUpdate with the ResultSet next() method.
It iterates through the result set only once, then the executeUpdate call closes the result set. I get the error: ResultSet not open. Operation "next" not permitted. Verify that autocommit is off.
I have added the con.setAutoCommit(false) statement but the problem still persists.
I need to run the update multiple times with different variable values.
Here is the code I have:
try {
    String eidQuery = "SELECT EID FROM EMPLOYEE_DATA WHERE ACTIVE = TRUE ORDER BY EID";
    int nextEID;
    Statement st = con.createStatement();
    con.setAutoCommit(false);
    rs = st.executeQuery(eidQuery);
    while (rs.next()){
        nextEID = rs.getInt(1);
        String getDailyTotals = "SELECT DATE, SUM(TOTAL), MAX(OUT_1) FROM PUNCHES WHERE EID = " + nextEID + " AND DATE >= '" + fd + "' "
                + "AND DATE <= '" + td + "' GROUP BY DATE";
        ResultSet rs2 = st.executeQuery(getDailyTotals);
        while (rs2.next()){
            double dailyTotal = rs2.getDouble(2);
            if (dailyTotal > 8){
                double dailyOT = dailyTotal-8;
                String dailyDate = rs2.getDate(1).toString();
                Timestamp maxTime = rs2.getTimestamp(3);
                String updateOT = "UPDATE PUNCHES SET OT = " + dailyOT + " WHERE EID = " + nextEID + " AND DATE = '" + dailyDate + "' AND OUT_1 = '" + maxTime + "'";
                st.executeUpdate(updateOT);
            }
        }
    }
    rs = st.executeQuery("SELECT PUNCHES.EID, EMPLOYEE_DATA.FIRST_NAME, EMPLOYEE_DATA.LAST_NAME, SUM(PUNCHES.OT) FROM PUNCHES "
            + "JOIN EMPLOYEE_DATA ON PUNCHES.EID = EMPLOYEE_DATA.EID WHERE PUNCHES.DATE >= '" + fd + "' AND PUNCHES.DATE <= '" + td + "' GROUP BY EMPLOYEE_DATA.FIRST_NAME, EMPLOYEE_DATA.LAST_NAME, PUNCHES.EID");
    Reports.setModel(DbUtils.resultSetToTableModel(rs));
} catch (SQLException ex) {
    Logger.getLogger(GUI.class.getName()).log(Level.SEVERE, null, ex);
    JOptionPane.showMessageDialog(null, ex);
}
You're new to programming and (obviously) Java. Here are a few recommendations that I can offer you:
Do yourself a favor and learn about PreparedStatement. You should not be creating SQL by concatenating Strings.
You are committing the classic newbie sin of mingling database and UI Swing code into a single, hard to debug lump. Better to decompose your app into layers. Start with a data access interface that encapsulates all the database code. Get that tested and give your UI an instance to work with.
Do not interleave an update query inside the loop over a ResultSet. Better to separate the two completely.
Read about MVC. You'll want your Swing View to be separate from the app Controller. Let the Controller interact with the data access interface, get the results, and give the results to the View for display. Keep them decoupled and separate.
Learn JUnit. It'll help you with testing.
From the java.sql.ResultSet javadoc:
A ResultSet object is automatically closed when the Statement object
that generated it is closed, re-executed, or used to retrieve the next
result from a sequence of multiple results.
After you execute the update, the prior ResultSet is closed. You need to rework your code to account for that.
https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
The easiest way to rework might be to use two Statements, one for the query and one for the update, but as noted in duffymo's answer there's a fair amount more you could do to improve things.
From the API documentation for Statement: "By default, only one ResultSet object per Statement object can be open at the same time. Therefore, if the reading of one ResultSet object is interleaved with the reading of another, each must have been generated by different Statement objects."
You need two different Statements if you want to read two different ResultSets in the same nested loops.
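Putting the advice from these answers together, a minimal sketch of the inner loop might look like this. It uses one PreparedStatement for the query and a separate one for the update, so executing the update does not close the ResultSet being read. The table and column names come from the question; the class and method names (OvertimeUpdater, overtime, updateOvertime) are made up, and transaction handling (autocommit, commit) is left to the caller as in the original code.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class OvertimeUpdater {
    private static final double REGULAR_HOURS = 8.0;

    /** Hours worked beyond the regular day, never negative. */
    static double overtime(double dailyTotal) {
        return Math.max(0.0, dailyTotal - REGULAR_HOURS);
    }

    /** Updates OT for one employee and returns the number of rows changed. */
    static int updateOvertime(Connection con, int eid, Date from, Date to)
            throws SQLException {
        String select = "SELECT DATE, SUM(TOTAL), MAX(OUT_1) FROM PUNCHES "
                + "WHERE EID = ? AND DATE >= ? AND DATE <= ? GROUP BY DATE";
        String update = "UPDATE PUNCHES SET OT = ? "
                + "WHERE EID = ? AND DATE = ? AND OUT_1 = ?";
        int updated = 0;
        // Two distinct statements: running 'upd' leaves the ResultSet of 'query' open.
        try (PreparedStatement query = con.prepareStatement(select);
             PreparedStatement upd = con.prepareStatement(update)) {
            query.setInt(1, eid);
            query.setDate(2, from);
            query.setDate(3, to);
            try (ResultSet rs = query.executeQuery()) {
                while (rs.next()) {
                    double ot = overtime(rs.getDouble(2));
                    if (ot > 0) {
                        upd.setDouble(1, ot);
                        upd.setInt(2, eid);
                        upd.setDate(3, rs.getDate(1));
                        upd.setTimestamp(4, rs.getTimestamp(3));
                        updated += upd.executeUpdate();
                    }
                }
            }
        }
        return updated;
    }
}
```

The outer loop over employee IDs would need its own statement as well, for the same reason. Binding parameters also removes the string concatenation the first answer warns about.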

How can I detect the presence of SQL code, in a page of android code?

The purpose of this question is not to figure out how to prevent SQL injection attempts.
Instead, I would like to know how to detect lines of SQL code within an (Android class) file.
Unlike other SQL parser threads, I want to work within Android, and I am simply after something like a regex which could be used to detect SQL code within the current line.
I have an example page of Android code like the below :
String query = "select * from users_table where username = '" + u_username + "' and password = '" + u_password +"'";
SQLiteDatabase db
//some un-important code here...
Cursor c = db.rawQuery( p_query, null );
return c.getCount() != 0;
}
Within it there are lines of SQL code like:
select * from users_table where username = '" + u_username + "' and password = '" + u_password +"'
I have a function like the below, which reads in an (Android Class) file line by line. Then returns an ArrayList of String.
public ArrayList<String> SQLStatementsfromFile(File fileLocation) throws IOException {
    ArrayList<String> SQLStatements = new ArrayList<>();
    FileInputStream is;
    BufferedReader reader;
    if (fileLocation.exists()) {
        is = new FileInputStream(fileLocation);
        reader = new BufferedReader(new InputStreamReader(is));
        String line = reader.readLine();
        while (line != null) {
            line = reader.readLine();
            if (line != null) {
                if (!line.isEmpty()) {
                    if (line.contains(SQL))
                        SQLStatements.add(line);
                }
            }
        }
    }
    return SQLStatements;
}
What I would like to know is: are there any sort of possible Regex statements (or other code detection methods) which could be used to detect SQL code, from each line of android code (so it can be added to the ArrayList SQLStatements)?
To be clear, PLEASE do not give me tips on example code I have written above.
Or advice on examples given. Just PLEASE instead attempt to actually answer my question.
I know parameterised statements within SQL etc. are much more secure, but this is NOT why I have opened this thread; the above are simply examples of SQL code I would like to detect (NOT SQL code I plan to utilize).
Thank you :-)
Assuming any valid SQL statement has to start with INSERT INTO, UPDATE, SELECT or DELETE and end with a semicolon ;, you can find SQL using the following:
/(?:INSERT INTO|UPDATE|SELECT|DELETE)(?:[^;'"]|(?:'[^']*?')|(?:"[^"]*?"))+;/i
So first off, the SQL you wrote is not what I consider valid, as it lacks the ending semicolon.
If you can ensure the syntax of the SQL follows the rules I stated earlier, the ; that ends Java code lines shouldn't interfere with this regex.
Without the semicolon, your example code does not match, but insert the semicolon and it does. Example here:
https://regex101.com/r/zB2yJ1/1
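For use from Java, the same pattern could be wrapped along these lines. This is only a sketch under the answer's assumptions (a statement starts with one of the four keywords and ends with a semicolon inside the string); SqlLineDetector and containsSql are hypothetical names, and the /i flag becomes Pattern.CASE_INSENSITIVE.

```java
import java.util.regex.Pattern;

public final class SqlLineDetector {
    // Same expression as in the answer, with Java string escaping applied.
    private static final Pattern SQL = Pattern.compile(
            "(?:INSERT INTO|UPDATE|SELECT|DELETE)"
                    + "(?:[^;'\"]|(?:'[^']*?')|(?:\"[^\"]*?\"))+;",
            Pattern.CASE_INSENSITIVE);

    /** Returns true if the line contains something that looks like SQL. */
    public static boolean containsSql(String line) {
        return SQL.matcher(line).find();
    }
}
```

In the asker's loop, line.contains(SQL) would become SqlLineDetector.containsSql(line). Note that the keywords are matched anywhere in the line, so an identifier such as updateOT followed later by a semicolon can produce a false positive.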

Greek Words as Question Marks in java when trying to retrieve field value from MS Access DB [duplicate]

This question already has answers here:
Java ODBC MS-Access Unicode character problems
(2 answers)
Closed 8 years ago.
I am working on a project in which I need to create a connection to an MS Access database.
The problem is that when I try to retrieve a value from a field written in Greek, every letter appears as a question mark. Does anyone have any idea how to solve it?
Below is the part of my code where the problem appears:
String KTA1 = KA4prwta.getText() + KA3teleutaia.getText();//KTA
String selectSQL = "SELECT * FROM [" + tablename + " ] WHERE KTA ='" + KTA1 + "'";
try {
    PreparedStatement preparedStatement = conn.prepareStatement(selectSQL);
    ResultSet rs = preparedStatement.executeQuery();
    while (rs.next()) {
        ep = rs.getString("EPON");
        on = rs.getString("ONOM");
    }
    Epwnumotf.setText(ep);
    Onomatf.setText(on);
} catch (SQLException ex) {
    Logger.getLogger(DBinsert.class.getName()).log(Level.SEVERE, null, ex);
}
The issue you have sounds like it's a language encoding issue but without more information, it's hard to suggest a solution.
Where are you 'seeing' these question marks? Wherever it is, you should try using a multi-byte encoded display language and see if that resolves your issue.
E.g. if you're displaying on a website, make sure you're displaying UTF8.
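As a small illustration of how question marks can arise from an encoding mismatch: when Java encodes text into a charset that has no Greek letters, each unmappable character is replaced by '?'. Whether this is what happens in the asker's ODBC setup is not established; the snippet only shows the general mechanism, and GreekEncodingDemo and roundTrip are made-up names.

```java
import java.nio.charset.Charset;

public final class GreekEncodingDemo {
    /** Encodes the text with the given charset and decodes it again. */
    public static String roundTrip(String text, Charset charset) {
        // getBytes replaces characters the charset cannot represent
        // with the charset's replacement byte, which is '?' for Latin-1.
        return new String(text.getBytes(charset), charset);
    }
}
```

Passing the three letters alpha, beta, gamma through UTF-8 returns them unchanged, while passing them through ISO-8859-1 returns "???". If any layer between the database and the display uses a charset without Greek, the same loss occurs and cannot be undone afterwards.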

MySQL, Most efficient Way to Load Data from a parsed file

My File has the following format:
Table1; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...
\n
Table2; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...
\n
Table3; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...
Each batch of records, starting on the line after the TableX header and ending at an empty-line delimiter, is about 700-800 lines long.
Each such batch of lines (rec_xyz...) needs to be imported into the MyISAM table named in the header of the batch (TableX).
I am familiar with the option of piping the stream into the LOAD DATA command using shell commands.
I am interested in a simple Java snippet which will parse this file and execute LOAD DATA for a single batch of records each time (in a for loop, and maybe using a seek command).
For now I am trying to use IGNORE LINES to jump over processed records, but I do not know whether there is an option to ignore lines from BELOW.
Is there a more efficient way to parse and load this type of file into the DB?
EDIT
I have read that JDBC supports an input stream for LOAD DATA starting from 5.1.3. Can I use it to iterate over the file with an input stream and change the LOAD DATA statement each time?
I am attaching my code as a solution.
This solution is based on the additional functionality (setLocalInfileInputStream) available in MySQL Connector/J 5.1.3 and later.
I am piping an input stream into the LOAD DATA statement instead of using a direct file URL.
Additional info: I am using BoneCP as a connection pool
public final void readFile(final String path)
        throws IOException, SQLException, InterruptedException {
    File file = new File(path);
    final Connection connection = getSqlDataSource().getConnection();
    Statement statement = SqlDataSource.getInternalStatement(connection.createStatement());
    // try-with-resources closes the scanner; statement and connection are closed in finally
    try (Scanner fileScanner = new Scanner(file)) {
        // An empty line separates one table's batch from the next.
        fileScanner.useDelimiter(Pattern.compile("^$", Pattern.MULTILINE));
        while (fileScanner.hasNext()) {
            // Skip blank lines until the header line of the next batch.
            String line;
            while ((line = fileScanner.nextLine()).isEmpty());
            // The rest of the batch, up to the next empty line, becomes the LOAD DATA input.
            InputStream is = new ByteArrayInputStream(fileScanner.next().getBytes("UTF-8"));
            String[] tableName = line.split(getSeparator());
            setTable((tableName[0] + "_" + tableName[1]).replace('-', '_'));
            String sql = "LOAD DATA LOCAL INFILE '" + SingleCsvImportBean.getOsDependantFileName(file) + "' "
                    + "INTO TABLE " + SqlUtils.escape(getTable())
                    + " FIELDS TERMINATED BY '" + getSeparator()
                    + "' ESCAPED BY '' LINES TERMINATED BY '" + getLinefeed() + "' ";
            sql += "(" + implodeStringArray(getFields(), ", ") + ")";
            sql += getSetClause();
            // The driver reads from this stream instead of the file named in the statement.
            ((com.mysql.jdbc.Statement) statement).setLocalInfileInputStream(is);
            statement.execute(sql);
        }
    } finally {
        statement.close();
        connection.close();
    }
}
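The splitting step on its own, separated from the database work, could be sketched as below. This is an alternative to the Scanner delimiter trick above, not the poster's code; BatchSplitter and split are hypothetical names. It assumes each block is a header line, then record lines, then a blank line, as in the file format shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public final class BatchSplitter {
    /**
     * Reads blocks of "header line, record lines, blank line" and passes each
     * block's header and records to the handler. Returns the number of blocks.
     */
    public static int split(BufferedReader reader,
                            BiConsumer<String, List<String>> handler) {
        int blocks = 0;
        String header = null;
        List<String> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // A blank line closes the current block, if one is open.
                    if (header != null) {
                        handler.accept(header, records);
                        blocks++;
                        header = null;
                        records = new ArrayList<>();
                    }
                } else if (header == null) {
                    header = line; // first non-blank line of a block
                } else {
                    records.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (header != null) {
            // The last block may not be followed by a blank line.
            handler.accept(header, records);
            blocks++;
        }
        return blocks;
    }
}
```

The handler would derive the table name from the header, join the records back into a byte stream, and run one LOAD DATA statement per block. With 700-800 lines per block, holding a single block in memory is cheap.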

How to parse data from a text file which contains lots of records and each value is separated by a space/tab or maybe both

I need to create a database table with 421 columns, and I have created it. Now I have to load data into the database using a Java program. The data that I need to load is in a text file. The values in the text file are separated by spaces, tabs, or possibly both. How do I extract data from this text file so that the first value is entered under column 1, the second value under column 2 and so on, the 421st value under column 421, the 422nd value under column 1 again, and so on? I am not very good at file handling and parsing in Java, so please help.
421 columns? Wow, you probably should split that huge table into a proper model with relationships. But anyway, that's off-topic.
If you really want to do it using Java, here is a possibility:
public static void readFromFile(String pathToFile) {
    // getFileReaderInClasspath is a helper that opens the file (not shown here).
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader reader = getFileReaderInClasspath(pathToFile)) {
        String line;
        while ((line = reader.readLine()) != null) {
            // Split on spaces and tabs; runs of delimiters produce no empty tokens.
            final StringTokenizer tokenizer = new StringTokenizer(line, " \t");
            final List<String> columns = new ArrayList<String>();
            while (tokenizer.hasMoreTokens()) {
                columns.add(tokenizer.nextToken());
            }
            saveIntoDb(columns);
        }
    } catch (IOException e) {
        throw new IllegalArgumentException("Error reading file (" + pathToFile + ")", e);
    }
}
And obviously you'll need to implement the saveIntoDb(List) method, that inserts the columns into your database.
Alternatively, you could store each list of columns in a List<List<String>> that contains all the rows, and add all of it to the database only at the end of the process.
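The question describes values that wrap by count (the 422nd value starts a new row) rather than by line, so a variant that ignores line breaks may be closer to what is needed. The following is a sketch with made-up names (RowGrouper, group); it relies on Scanner's default delimiter, which treats any run of spaces, tabs and newlines as one separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public final class RowGrouper {
    /**
     * Reads whitespace-separated values and groups them into rows of the
     * given width, regardless of where the line breaks fall.
     */
    public static List<List<String>> group(Readable source, int columns) {
        if (columns <= 0) {
            throw new IllegalArgumentException("columns must be positive");
        }
        List<List<String>> rows = new ArrayList<>();
        List<String> current = new ArrayList<>(columns);
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                current.add(scanner.next());
                if (current.size() == columns) {
                    rows.add(current);
                    current = new ArrayList<>(columns);
                }
            }
        }
        if (!current.isEmpty()) {
            // Trailing values that do not fill a whole row; the caller decides
            // whether to reject them or pad them.
            rows.add(current);
        }
        return rows;
    }
}
```

With the table from the question the call would be group(new FileReader(path), 421). For a very large file, replacing the returned list with a per-row callback would avoid holding every row in memory.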
Why not do it with a SQL statement? Below is an example for MySQL, with tab as the delimiter:
LOAD DATA LOCAL INFILE '/importfile.csv'
INTO TABLE test_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(field1, field2, field3);
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
edit:
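If the delimiter can vary, maybe you can read the file with a BufferedReader and parse it yourself? I didn't test this code, but you can write something like the following and check the text inside the loop. I replace all tabs with spaces and then split on spaces, but if your values contain spaces, that is a problem.
// mySqlStatement is assumed to be an existing java.sql.Statement
BufferedReader br = new BufferedReader(new FileReader("myText.txt"));
String prefix = "insert into myTable (clm1, clm2, clm3) Values (";
StringBuilder statement = new StringBuilder(prefix);
int count = 0; // values added to the current statement so far
String s;
while ((s = br.readLine()) != null) {
    // Strings are immutable, so the result of replace() must be used.
    String[] sub = s.replace("\t", " ").split(" ");
    for (int i = 0; i < sub.length; i++) {
        if (sub[i].isEmpty()) {
            continue; // skip gaps left by repeated delimiters
        }
        if (count > 0) {
            statement.append(",");
        }
        // Values are appended as-is, as in the original idea; quote them if needed.
        statement.append(sub[i]);
        count++;
        if (count == 421) {
            // One full row (421 values) collected: run the insert and start over.
            statement.append(")");
            mySqlStatement.executeUpdate(statement.toString());
            statement = new StringBuilder(prefix);
            count = 0;
        }
    }
}
br.close();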
