I'm fairly new to Java and have been attempting to read a very difficult .txt file and load it into my MySQL DB.
To me, the file has some very odd delimiting rules. The delimiter seems to be a comma everywhere, but other parts just do not make sense. Here are a few examples:
" "," "," "," "," "
" ",,,,,,," "
" ",0.00," "
" ",," ",," ",," "
What I do know is that all fields containing letters use the normal ,"text", format.
All columns that contain only numerals follow this format: ,0.00, except for the first column, which follows the normal format "123456789",
Anything with no data alternates between ,, and ," ",
I have been able to get the program to read the file correctly with java.sql.Statement, but I need it to work with java.sql.PreparedStatement.
I can get it to work with only a few columns selected, but I need this to work with 100+ columns, and some fields contain commas, e.g. "Some Company, LLC".
Here is the code I currently have, but I am at a loss as to where to go next.
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;
public class AccountTest {
public static void main(String[] args) throws Exception {
//Declare DB settings
String dbName = "jdbc:mysql://localhost:3306/local";
String userName = "root";
String password = "";
String fileName = "file.txt";
String psQuery = "insert into accounttest"
+ "(account,account_name,address_1,address_2,address_3) values"
+ "(?,?,?,?,?)";
Connection connect = null;
PreparedStatement statement = null;
String account = null;
String accountName = null;
String address1 = null;
String address2 =null;
String address3 = null;
//Load JDBC Driver
try {
Class.forName("com.mysql.jdbc.Driver");
}
catch (ClassNotFoundException e) {
System.out.println("JDBC driver not found.");
e.printStackTrace();
return;
}
//Attempt connection
try {
connect = DriverManager.getConnection(dbName,userName,password);
}
catch (SQLException e) {
System.out.println("E1: Connection Failed.");
e.printStackTrace();
return;
}
//Verify connection
if (connect != null) {
System.out.println("Connection successful.");
}
else {
System.out.println("E2: Connection Failed.");
}
BufferedReader bReader = new BufferedReader(new FileReader(fileName));
String line;
//import file into mysql DB
try {
//Looping the read block until all lines in the file are read.
while ((line = bReader.readLine()) != null) {
//Splitting the content of comma delimited file
String data[] = line.split("\",\"");
//Renaming array items for ease of use
account = data[0];
accountName = data[1];
address1 = data[2];
address2 = data[3];
address3 = data[4];
// removing double quotes so they do not get put into the db
account = account.replaceAll("\"", "");
accountName = accountName.replaceAll("\"", "");
address1 = address1.replaceAll("\"", "");
address2 = address2.replaceAll("\"", "");
address3 = address3.replaceAll("\"", "");
//putting data into database
statement = connect.prepareStatement(psQuery);
statement.setString(1, account);
statement.setString(2, accountName);
statement.setString(3, address1);
statement.setString(4, address2);
statement.setString(5, address3);
statement.executeUpdate();
}
}
catch (Exception e) {
e.printStackTrace();
statement = null;
}
finally {
bReader.close();
}
}
}
Sorry if it's not formatted correctly. I am still learning, and after being flustered for several days trying to figure this out, I didn't bother making it look nice.
My question is: would something like this be possible with such a jumbled-up file? If so, how do I go about it? Also, I am not entirely familiar with prepared statements: do I have to declare every single column, or is there a simpler way?
Thanks in advance for your help.
EDIT : To clarify, I need to upload a txt file to a MySQL database. I need a way to read and split (unless there is a better way) the data on any of ",", ,,,,, or ,0.00, while still keeping together fields that contain commas, such as "Some Company, LLC". I need to do this with 100+ columns, and the file varies from 3000 to 6000 rows. Doing this as a prepared statement is required. I'm not sure if this is possible, but I appreciate any input anyone might have on the matter.
EDIT2 : I was able to figure out how to get the messy file sorted out thanks to rpc1. Instead of String data[] = line.split("\",\""); I used String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"); I still had to write out each variable to link it to data[], then write out statement.setString and replaceAll("\"", ""); for each column, but I got it working and I couldn't find another way to use prepared statements. Thank you for all your help!
You can use a loop,
for example:
String psQuery = "insert into accounttest"
+ "(account,account_name,address_1,address_2,address_3,..,address_n) values"
+ "(?,?,?,?,?,?,..,?)"; // one ? per column: m = n + 2 placeholders in total
.....
// replace the "," separator with one that does not occur in the data
String data[] = line.replace("\",\"",";").replace("\"","").split(";");
for(int i=0;i<m;i++)
{
if(i<data.length) // index is within the array
statement.setString(i+1, data[i]);
else
statement.setString(i+1, ""); // pad missing fields with an empty string (use setNull for a SQL NULL)
}
statement.executeUpdate();
P.S. If your CSV file is large, use a batch insert (addBatch()),
and use a precompiled Pattern to split the string:
Pattern p = Pattern.compile(";");
p.split(st);
EDIT
Try this split function
private static Pattern pSplit = Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'"); //set pattern as global var
private static Pattern pReplace = Pattern.compile("\"");
public static Object[] split(String st)
{
List<String> list = new ArrayList<String>();
Matcher m = pSplit.matcher(st);
while (m.find())
list.add( pReplace.matcher(m.group(0)).replaceAll("")); // pReplace strips the surrounding double quotes
return list.toArray();
}
For example,
input string: st="\"1212\",\"LL C ,DDD \",\"CA, SPRINGFIELD\",232.11,3232.00";
is split into a 5-item array:
1212
LL C ,DDD
CA, SPRINGFIELD
232.11
3232.00
EDIT2
This version splits only on commas outside quotes, so it also keeps empty values (trailing empty fields are dropped unless you pass a limit of -1 to split):
private static Pattern pSplit = Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
public static String[] split2(String st)
{
String[] tokens = pSplit.split(st);
return tokens;
}
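A minimal sketch of the whole per-line parse, assuming the regex from EDIT2 above. The class name LineParser and the method parseLine are hypothetical names chosen for illustration. One detail worth checking against your data: split without a limit drops trailing empty fields, so this sketch passes -1 to keep them and the column count stays stable.

```java
import java.util.regex.Pattern;

public class LineParser {
    // A comma followed by an even number of double quotes up to the end of the
    // line, i.e. a comma that is not inside a quoted field.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    /** Splits one line into fields, strips the quotes and trims whitespace. */
    public static String[] parseLine(String line) {
        // Limit -1 keeps trailing empty fields such as the last two in: "a",,
        String[] fields = COMMA_OUTSIDE_QUOTES.split(line, -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\"", "").trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line that mixes the formats described in the question.
        String line = "\"123456789\",\"Some Company, LLC\",0.00,,\" \",,";
        for (String field : parseLine(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because both ,, and ," ", end up as an empty string after trimming, the two "no data" variants can be treated the same way when binding parameters.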
I was able to figure out both issues that I was having by this little bit of code. Again, thanks for all of your help!
for (String line = bReader.readLine(); line != null; line = bReader.readLine()) {
//Splitting the content of comma delimited file
String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
//Binding each field of the line to its placeholder before inserting the row.
statement = connect.prepareStatement(psQuery);
for (int i = 0; i < data.length; i++) {
//Removing double quotes so they do not get put into the db
String temp = data[i].replaceAll("\"", "");
statement.setString(i+1, temp);
}
statement.executeUpdate();
}
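For 100+ columns, the query string itself can also be generated instead of typed out. The following is only a sketch under assumptions: BatchInsertSketch, buildInsertSql and insertRows are hypothetical names, the table and column names come from your own code (never from user input, since they are concatenated into the SQL), and the rows are already split into fields. It prepares the statement once and uses addBatch(), as suggested in the answer above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class BatchInsertSketch {
    /** Builds "insert into t (c1,c2) values (?,?)" for the given columns. */
    public static String buildInsertSql(String table, List<String> columns) {
        // One "?" placeholder per column, joined with commas.
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table + " (" + String.join(",", columns)
                + ") values (" + placeholders + ")";
    }

    /** Inserts already-parsed rows with one prepared statement and one batch. */
    public static void insertRows(Connection connect, String sql, int columnCount,
                                  List<String[]> rows) throws SQLException {
        try (PreparedStatement statement = connect.prepareStatement(sql)) {
            for (String[] data : rows) {
                for (int i = 0; i < columnCount; i++) {
                    // Pad short rows with empty strings so every placeholder is bound.
                    statement.setString(i + 1, i < data.length ? data[i] : "");
                }
                statement.addBatch();
            }
            statement.executeBatch();
        }
    }
}
```

With a few thousand rows per file, a single batch is probably fine; for much larger files you may want to call executeBatch() every few hundred rows.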
Related
It seems simple to print the value of a variable in Java, but I am unable to do it properly.
I have a MySQL table that contains first names "fname" and last names "lname". After connecting to MySQL I fetch these values and store them in variables. Then the problem starts. Here is my code:
package signup;
import java.sql.*;
import java.util.Random;
import org.openqa.selenium.Keys;
public class Signup {
private static final String db_connect = "jdbc:mysql://localhost/test1" ;
private static final String uname = "username" ;
private static final String pass = "password" ;
private Connection conMethod() {
Connection conVar = null;
try {
Class.forName("com.mysql.jdbc.Driver");
conVar = DriverManager.getConnection(db_connect, uname, pass);
} catch (SQLException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
return conVar;
}
public void selectMethod() {
Statement query = null;
ResultSet rs1 = null;
Connection conVar2 = conMethod();
try {
query = conVar2.createStatement();
rs1 = query.executeQuery("Select * from fnames2");
String[] fname = new String[500];
String[] lname = new String[500];
int a = 0;
while (rs1.next()) {
fname[a] = rs1.getString(2);
lname[a] = rs1.getString(3);
a++;
}
String firstname = fname[1];
String lastname = lname[1];
String fullname = firstname+" "+lastname;
String email = firstname+lastname+"#yahoo.com";
System.out.println("first name is "+firstname);
System.out.println("last name is "+lastname);
System.out.println("full name is "+fullname);
System.out.println("email is "+email);
} catch (SQLException e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Signup obj = new Signup();
obj.selectMethod();
}
}
And here is its output:
first name is PATRICIA
last name is ALISHA
full name is PATRICIA ALISHA
#yahoo.comATRICIAALISHA
You can see the problem is in the email variable. It should print "email is PATRICIAALISHA#yahoo.com", but it prints "#yahoo.comATRICIAALISHA" instead. Thanks
The output is consistent with lastname being "ALISHA\r". What happens is that when you print it (depending on your OS), the \r character causes the cursor to go back to the beginning of the line. This has no effect on the appearance of the output in the cases where you print "last name is" or "full name is", since the cursor will just go to the next line anyway. But it causes email to be "PATRICIAALISHA\r#yahoo.com", which means that after it outputs email is PATRICIAALISHA, the cursor goes back to the beginning of the line and overwrites what's already there with #yahoo.com, which is just enough to overwrite the text up through the P.
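If that diagnosis holds, trimming the values before using them should be enough. A minimal sketch, assuming the stray character is a trailing "\r" as described above; TrimCarriageReturn and buildEmail are hypothetical names used only for illustration.

```java
public class TrimCarriageReturn {
    /** Builds the address from names that may carry stray whitespace such as "\r". */
    public static String buildEmail(String firstname, String lastname) {
        // trim() removes leading and trailing whitespace, including '\r' and '\n'.
        return firstname.trim() + lastname.trim() + "#yahoo.com";
    }

    public static void main(String[] args) {
        String lastname = "ALISHA\r"; // assumed value, matching the diagnosis above
        System.out.println("email is " + buildEmail("PATRICIA", lastname));
    }
}
```

In the original code the same effect could be had by calling trim() on the result of rs1.getString(...), though cleaning the stored data in the table is the more durable fix.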
The program that I am writing is in Java.
I am attempting to make my program read the file "name.txt" and store the values of the text file in an array.
So far I am using a text file that will be read in my main program, a service class called People.java which will be used as a template for my program, and my main program called Names.java which will read the text file and store its values into an array.
name.txt:
John!Doe
Jane!Doe
Mike!Smith
John!Smith
George!Smith
People.java:
public class People
{
String firstname = " ";
String lastname = " ";
public People()
{
firstname = "First Name";
lastname = "Last Name";
}
public People(String firnam, String lasnam)
{
firstname = firnam;
lastname = lasnam;
}
}
Names.java:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;
public class Names
{
public static void main(String[]args)
{
String a = " ";
String b = "empty";
String c = "empty";
int counter = 0;
People[]peoplearray=new People[5];
try
{
File names = new File("name.txt");
Scanner read = new Scanner(names);
while(read.hasNext())
{
a = read.next();
StringTokenizer token = new StringTokenizer("!", a);
while(token.hasMoreTokens())
{
b = token.nextToken();
c = token.nextToken();
}
People p = new People(b,c);
peoplearray[counter]=p;
++counter;
}
}
catch(IOException ioe1)
{
System.out.println("There was a problem reading the file.");
}
System.out.println(peoplearray[0]);
}
}
As my program shows, I tried to print the value of peoplearray[0], but when I do this, my output reads "empty empty", which are the values I gave String b and String c when I initialized them.
If the program were working correctly, the value of peoplearray[0] should be "John Doe", as those are the first values in "name.txt".
What can I do to fix this problem?
Thanks!
StringTokenizer(String str, String delim)
is the constructor of StringTokenizer: the string to tokenize comes first and the delimiters second.
You have passed the arguments in the wrong order.
Just change your line
StringTokenizer token = new StringTokenizer("!", a); to
StringTokenizer token = new StringTokenizer(a, "!");
Just change it a little bit:
StringTokenizer token = new StringTokenizer(a, "!");
while(token.hasMoreTokens())
{
b = token.nextToken();
c = token.nextToken();
}
//do something with them
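As an alternative to StringTokenizer, String.split does the same job for a two-part line. The sketch below is an illustration only: NameLine is a hypothetical stand-in for the People class. It also overrides toString(), because printing an object that has no toString() override shows the class name and a hash code rather than the field values.

```java
public class NameLine {
    final String firstname;
    final String lastname;

    NameLine(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }

    /** Parses a line such as "John!Doe" into its two parts. */
    static NameLine parse(String line) {
        // Limit 2: split at the first "!" only.
        String[] parts = line.split("!", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected first!last but got: " + line);
        }
        return new NameLine(parts[0], parts[1]);
    }

    @Override
    public String toString() {
        return firstname + " " + lastname;
    }

    public static void main(String[] args) {
        System.out.println(NameLine.parse("John!Doe"));
    }
}
```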
I am trying to get a value out of String[] value; into String lastName;, but I get an error: java.lang.ArrayIndexOutOfBoundsException: 2
at arduinojava.OpenFile.openCsv(OpenFile.java:51) (lastName = value[2];). Here is my code; I am not sure whether it goes wrong at the split(), at declaring the variables, or at copying the data into another variable.
Also, I am calling input.next(); three times to ignore the first row, because otherwise the header row (ending in Field of study) would also be printed out.
The rows I am trying to read are in a .csv file:
University Firstname Lastname Field of study
Karlsruhe Jerone L Software Engineering
Amsterdam Shahin S Software Engineering
Mannheim Saman K Artificial Intelligence
Furtwangen Omid K Technical Computing
Esslingen Cherelle P Technical Computing
Here's my code:
// Declare Variable
JFileChooser fileChooser = new JFileChooser();
StringBuilder sb = new StringBuilder();
// StringBuilder data = new StringBuilder();
String data = "";
int rowCounter = 0;
String delimiter = ";";
String[] value;
String lastName = "";
/**
* Opencsv csv (comma-separated values) reader
*/
public void openCsv() throws Exception {
if (fileChooser.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
// Get file
File file = fileChooser.getSelectedFile();
// Create a scanner for the file
Scanner input = new Scanner(file);
// Ignore first row
input.next();
input.next();
input.next();
// Read from input
while (input.hasNext()) {
// Gets whole row
// data.append(rowCounter + " " + input.nextLine() + "\n");
data = input.nextLine();
// Split row data
value = data.split(String.valueOf(delimiter));
lastName = value[2];
rowCounter++;
System.out.println(rowCounter + " " + data + "Lastname: " + lastName);
}
input.close();
} else {
sb.append("No file was selected");
}
}
The fields in your sample are separated by whitespace, not by semicolons. Split on one or more whitespace characters instead:
data.split("\\s+");
Change the delimiter as shown below:
String delimiter = "\\s+";
EDIT
The CSV file should be in this format: all values enclosed in double quotes, with a valid separator such as a comma, space or semicolon between them.
"University" "Firstname" "Lastname" "Field of study"
"Karlsruhe" "Jerone" "L" "Software Engineering"
"Amsterdam" "Shahin" "S" "Software Engineering"
Please check whether your file actually uses ';' as the delimiter. If not, add it and try again; it should work.
Use the OpenCSV library to read CSV files. There is a detailed example on reading and writing CSV files in Java by Viral Patel.
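For the whitespace-separated sample in the question, here is a minimal sketch; WhitespaceRows and lastNames are hypothetical names. It makes two changes. Scanner.next() reads one token, not one row, so three calls consume only "University Firstname Lastname" and the rest of the header is then read as data; a single nextLine() skips the whole header. Splitting with a limit of 4 keeps a multi-word field such as "Software Engineering" in one piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WhitespaceRows {
    /** Returns the third column (Lastname) of every data row, skipping the header. */
    public static List<String> lastNames(Scanner input) {
        List<String> result = new ArrayList<>();
        if (input.hasNextLine()) {
            input.nextLine(); // skip the whole header row in one call
        }
        while (input.hasNextLine()) {
            String row = input.nextLine().trim();
            if (row.isEmpty()) {
                continue; // ignore blank lines
            }
            // Limit 4: everything after the third separator stays in value[3].
            String[] value = row.split("\\s+", 4);
            if (value.length > 2) {
                result.add(value[2]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "University Firstname Lastname Field of study\n"
                + "Karlsruhe Jerone L Software Engineering\n"
                + "Amsterdam Shahin S Software Engineering\n";
        System.out.println(lastNames(new Scanner(sample)));
    }
}
```

The length check before value[2] is what avoids the ArrayIndexOutOfBoundsException on short or malformed rows.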
I have extracted several values from HTML pages using Jsoup, and now I am trying to insert them one by one into a Derby DB using JDBC in NetBeans.
Here is my code:
public String nameOf() {
String nameStr = null;
String nameResults = "";
for(int j=100;j<=110;j++) {
refNum = j;
//System.out.println("Reference Number: " + refNum);
try {
//crawl and parse HTML from definition and causes page
Document docDandC = Jsoup.connect("http://www.abcd.edu/encylopedia/article/000" + refNum + ".htm").get();
// scrape name data
Elements name = docDandC.select("title");
nameStr = name.get(0).text();
//System.out.println(nameStr);
nameResults += nameStr + " ";
} catch (Exception e) {
//System.out.println("Reference number " + refNum + " does not exist.");
}
}
return nameResults;
}
So this method takes the names of diseases from a series of HTML pages. What I am trying to do is insert one name at a time into a Derby DB that I have created, using JDBC. I have everything set up, and all I have left to do is insert each name into the name field of a table named DISEASE (which has fields: id, name, etc).
nameResults += nameStr + " ";
This part worries me as well since some diseases can have multiple words. Maybe I should use a list of some sort?
Please help! Thanks in advance.
Something like:
public List<String> nameOf() {
...
List<String> nameResults = new ArrayList<String>();
...
nameResults.add(nameStr);
...
return nameResults;
I want to filter a string.
Basically when someone types a message, I want certain words to be filtered out, like this:
User types: hey guys lol omg -omg mkdj*Omg*ndid
I want the filter to run and:
Output: hey guys lol - mkdjndid
And I need the filtered words to be loaded from an ArrayList that contains several words to filter out. Now at the moment I am doing if(message.contains(omg)) but that doesn't work if someone types zomg or -omg or similar.
Use replaceAll with a regex built from the bad word:
message = message.replaceAll("(?i)\\b[^\\w -]*" + badWord + "[^\\w -]*\\b", "");
This passes your test case:
public static void main( String[] args ) {
List<String> badWords = Arrays.asList( "omg", "black", "white" );
String message = "hey guys lol omg -omg mkdj*Omg*ndid";
for ( String badWord : badWords ) {
message = message.replaceAll("(?i)\\b[^\\w -]*" + badWord + "[^\\w -]*\\b", "");
}
System.out.println( message );
}
try:
input.replaceAll("(\\*?)[oO][mM][gG](\\*?)", "").split(" ")
Dave gave you the answer already, but I want to emphasize one point. You will run into a problem if you implement your algorithm as a simple for-loop that just replaces every occurrence of the filtered word. For example, if you filter the word ass inside the word 'classic' and replace it with 'butt', the result is 'clbuttic', which doesn't make any sense. So I would suggest using a word list, like the ones stored on Linux under the /usr/share/dict/ directory, to check whether the word is valid or needs filtering.
I don't quite get what you are trying to do.
I ran into this same problem and solved it in the following way:
1) Have a google spreadsheet with all words that I want to filter out
2) Directly download the google spreadsheet into my code with the loadConfigs method (see below)
3) Replace all l33tsp33k characters with their respective alphabet letter
4) Remove every character except letters from the sentence
5) Run an algorithm that efficiently checks all the possible combinations of words within a string against the list. This part is key: you don't want to loop over your ENTIRE list every time to see if your word is in it. In my case, I found every combination within the string input and checked it against a hashmap (O(1) lookup). This way the runtime grows with the length of the string input, not with the size of the list.
6) Check that the word is not used in combination with a good word (e.g. bass contains *ss). These exceptions are also loaded from the spreadsheet
7) In our case we are also posting the filtered words to Slack, but you can remove that line.
We are using this in our own games and it's working like a charm. Hope you guys enjoy.
https://pimdewitte.me/2016/05/28/filtering-combinations-of-bad-words-out-of-string-inputs/
public static HashMap<String, String[]> words = new HashMap<String, String[]>();
public static void loadConfigs() {
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(new URL("https://docs.google.com/spreadsheets/d/1hIEi2YG3ydav1E06Bzf2mQbGZ12kh2fe4ISgLg_UBuM/export?format=csv").openConnection().getInputStream()));
String line = "";
int counter = 0;
while((line = reader.readLine()) != null) {
counter++;
String[] content = null;
try {
content = line.split(",");
if(content.length == 0) {
continue;
}
String word = content[0];
String[] ignore_in_combination_with_words = new String[]{};
if(content.length > 1) {
ignore_in_combination_with_words = content[1].split("_");
}
words.put(word.replaceAll(" ", ""), ignore_in_combination_with_words);
} catch(Exception e) {
e.printStackTrace();
}
}
System.out.println("Loaded " + counter + " words to filter out");
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Iterates over a String input and checks whether a cuss word was found in a list, then checks if the word should be ignored (e.g. bass contains the word *ss).
* @param input the text to check
* @return the bad words found in the input (empty if none)
*/
public static ArrayList<String> badWordsFound(String input) {
if(input == null) {
return new ArrayList<>();
}
// remove leetspeak
input = input.replaceAll("1","i");
input = input.replaceAll("!","i");
input = input.replaceAll("3","e");
input = input.replaceAll("4","a");
input = input.replaceAll("@","a");
input = input.replaceAll("5","s");
input = input.replaceAll("7","t");
input = input.replaceAll("0","o");
ArrayList<String> badWords = new ArrayList<>();
input = input.toLowerCase().replaceAll("[^a-zA-Z]", "");
for(int i = 0; i < input.length(); i++) {
for(int fromIOffset = 1; fromIOffset < (input.length()+1 - i); fromIOffset++) {
String wordToCheck = input.substring(i, i + fromIOffset);
if(words.containsKey(wordToCheck)) {
// for example, if you want to say the word bass, that should be possible.
String[] ignoreCheck = words.get(wordToCheck);
boolean ignore = false;
for(int s = 0; s < ignoreCheck.length; s++ ) {
if(input.contains(ignoreCheck[s])) {
ignore = true;
break;
}
}
if(!ignore) {
badWords.add(wordToCheck);
}
}
}
}
for(String s: badWords) {
Server.getSlackManager().queue(s + " qualified as a bad word in a username");
}
return badWords;
}