Read a csv that has double quotes with comma inside - java

UPDATE: WORKING SOLUTION POSTED BELOW
I'm trying to process a CSV file by splitting each line on commas. However, a few fields are wrapped in double quotes and contain embedded commas.
Example: "# 29. Toxic substances properly identified, stored, used"
Every field that contains a comma is wrapped in " ". Is there a way to detect these double quotes and work around the commas inside them?
Thanks!
Original code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.FileWriter;
import java.io.PrintWriter;
public class csvFileReader {
public static void main(String[] args) {
String csvFile = "/Users/zzmle/Desktop/data.csv";
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";
int count=0;
try {
br = new BufferedReader(new FileReader(csvFile));
String firstline = br.readLine();
String[] header = firstline.split(",");
while ((line = br.readLine()) != null && count<10) {
//comma is the separator
String[] Restaurant = line.split(cvsSplitBy);
for (int i=0; i<header.length; i++) {
System.out.println(header[i]+": "+Restaurant[i]+" ");
}
System.out.println("-------------------");
count++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
Working Solution:
// #author Zhiming Zhao
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.FileWriter;
import java.io.PrintWriter;
public class csvFileReader {
public static void main(String[] args) {
String csvFile = "data.csv";
BufferedReader br = null;
String line = "";
String cvsSplitBy = ",";
int count=0;
try {
br = new BufferedReader(new FileReader(csvFile));
String firstline = br.readLine();
String[] header = firstline.split(cvsSplitBy);
while ((line = br.readLine()) != null && count<10) { //count<10 is for testing purposes
String[] Restaurant = line.split(cvsSplitBy); //comma is the separator
process(Restaurant); //this is to deal with the commas within quotation marks (which split the elements and shifts them into the wrong places)
//this part prints the header + restaurant for the first ten lines
for (int i=0; i<header.length; i++) {
System.out.println(header[i]+": "+Restaurant[i]+" ");
}
System.out.println("-------------------");
count++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
System.out.println("The file cannot be found, check if the file is under root directory");
} catch (IOException e) {
e.printStackTrace();
System.out.println("Input & Output operations error");
} finally {
if (br != null) {
try {
br.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
// #brief This function specifically deals with the issue of commas within the quotation marks
// #detail It gets the index numbers of the 2 elements containing the quotation marks, then concats everything between them. It works with multiple quotation marks on the same line
public static void process(String[] data) {
int index1 = -1; //initialize the index of the first ", -1 for empty
int index2 = 0; //initialize the index of the second ", 0 for empty
for (int i=0; i<data.length; i++) {
if (data[i].startsWith("\"") && index1 == -1) { //if index1 is empty and the first char of current element is " (startsWith is also safe for empty elements)
index1 = i; //set index1 to current index number
}
if (data[i].endsWith("\"") && index1 != -1) { //if index1 is not empty and the last char of current element is "
index2 = i; //set index2 to current index number
multiconcat(index1, index2, data); //concat the elements between index1 and index2
data = multidelet(index1+1, index2, data); //delete the elements that were copied (index1+1:index2)
i -= (index2-index1); //this is to reset the cursor back to index1 (could be replaced with i = index1)
index1 = -1; //set index1 to empty
}
}
}
// #brief Copy all elements between index1 and index2 to index1, doesn't return anything
public static void multiconcat(int index1, int index2, String[] data) {
for (int i=index1+1; i<=index2; i++) {
data[index1] += data[i];
}
}
// #brief Deletes the elements between index1+1 and index2
public static String[] multidelet(int index1, int index2, String[] data) {
String[] newarr = new String[data.length-(index2-index1+1)];
int n = 0;
for (int i=0; i<data.length; i++) {
if (index1 <= i && i <= index2) continue;
newarr[n] = data[i];
n++;
}
return newarr;
}
}
The csv file
Output for one of the lines with a comma embedded in quotation marks. It's not perfect (the comma within the quotation marks got eaten), but it's a minor issue and I'm too lazy to fix it lol:
serial_number: DA08R0TCU
activity_date: 03/30/2018 12:00:00 AM
facility_name: KRUANG TEDD
violation_code: F035
violation_description: "# 35. Equipment/Utensils - approved; installed; clean; good repair capacity"
violation_status: capacity"
points: OUT OF COMPLIANCE
grade: 1
facility_address: A
facility_city: 5151 HOLLYWOOD BLVD
facility_id: LOS ANGELES
facility_state: FA0064949
facility_zip: CA
employee_id: 90027
owner_id: EE0000857
owner_name: OW0001034
pe_description: 5151 HOLLYWOOD LLC
program_element_pe: RESTAURANT (31-60) SEATS HIGH RISK
program_name: 1635
program_status: KRUANG TEDD
record_id: ACTIVE
score: PR0031205
service_code: 92
service_description: 1
row_id: ROUTINE INSPECTION

My own solution:
Read the first character of each element, if the first character is a double quote, concat this and the next ones (will need to use recursion for this) until there's an element with a double quote as the last character.
This will run considerably faster than reading char by char, as suggested by JGFMK.
And I am not allowed to use external libraries for this project.
STILL IMPLEMENTING THIS, I will update if it works
EDIT: Working solution posted in original post
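For comparison, here is a minimal sketch of a quote-aware splitter that uses no external libraries. It is not the code from the post: it scans the line once, character by character, and only treats a comma as a separator while it is outside double quotes. The class name QuotedCsvSplit, the method splitLine and the sample line are made up for illustration, and the sketch assumes a quoted field never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {
    // Splits one CSV line on commas that are outside double quotes.
    // The surrounding quotes are dropped; a doubled quote ("") inside a
    // quoted field is kept as a single quote character.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes; // entering or leaving a quoted field
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // separator: close the field
                current.setLength(0);
            } else {
                current.append(c); // ordinary character, or a comma inside quotes
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Made-up line: only the quoted text comes from the question.
        String line = "F029,\"# 29. Toxic substances properly identified, stored, used\",1";
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

Because the method returns a new list, the caller always sees the merged fields. In the posted process method, data = multidelet(...) only rebinds the local parameter, so the caller's array keeps its original length and order after the merged element, which would explain why the columns after violation_description are shifted in the output above.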

Don't reinvent the wheel: there are libraries for reading CSV, e.g. http://commons.apache.org/proper/commons-csv/
http://opencsv.sourceforge.net/
https://code.google.com/archive/p/jcsv/

Related

I want to read a file and check whether a word is present in it. If the word is present, one of my methods should return 1

This is my code. I want to read a file called "write.txt" and then compare what was read with a word, for which I use the variable target (of type String). Once the comparison inside the method findTarget succeeds, the method returns 1. I try to call the method but I keep getting an error:
test.java:88: error: cannot find symbol
String testing = findTarget(target1, source1);
^
symbol: variable target1
location: class test
1 error
Can someone correct my mistake? I am quite new to programming.
import java.util.*;
import java.io.*;
public class test {
public static int findTarget( String target, String source )
{
int target_len = target.length();
int source_len = source.length();
int add = 0;
for(int i = 0;i < source_len; ++i) // i is a variable used to count up to source_len.
{
int j = 0; // take another variable to count loops
while(add == 0)
{
if( j >= target_len ) // count upto target length
{
break;
}
else if( target.charAt( j ) != source.charAt( i + j ) )
{
break;
}
else
{
++j;
if( j == target_len )
{
add++; // this will return 1: true
}
}
}
}
return add;
//System.out.println(""+add);
}
public static void main ( String ... args )
{
//String target = "for";
// function 1
try
{
// read the file
File file = new File("write.txt"); //establising a file object
BufferedReader br = new BufferedReader(new FileReader(file));
//reading the files from the file object "file"
String target1;
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
System.out.println(target1);
//target.close();
}
catch (IOException e)
{
System.out.println("file error!");
}
String source1 = "Searching for a string within a string the hard way.";
// function 2
test ob = new test();
String testing = findTarget(target1, source1);
// end
//System.out.println(findTarget(target, source));
System.out.println("the answer is: "+testing);
}
}
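The compiler reports "cannot find symbol" because target1 is declared inside the try block, so it is out of scope at the point where findTarget is called. In addition, findTarget is a static (class) function that returns an int.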
So, where you have this:
test ob = new test();
String testing = findTarget(target1, source1);
...should be changed to call the function from a static context:
//test ob = new test(); not needed, the function is static
int testing = test.findTarget(target1, source1);
// also changed the testing type from String to int, as int IS findTarget's return type.
I don't have your file contents to give a trial run, but that should at least help get past the error.
=====
UPDATE:
You are close!
Inside main, change the code at your loop so that it looks like this:
String target1;
int testing = 0; // move and initialize testing here
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
{
//System.out.println(target1);
testing += test.findTarget(target1, source1);
//target1 = br.readLine();
}
System.out.println("answer is: "+testing);
I have finally been able to solve my problem, but now I am extending the functionality. I want to increment the count by 1 for every match, but my program keeps giving me this output:
answer is: 1 answer is: 1
Instead of printing two 1's, I want my program to print the sum, 1+1 = 2.
Can someone fix this incrementing problem?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
public class test {
public static int findTarget(String target, String source) {
int target_len = target.length();
int source_len = source.length();
int add = 0;
// this function checks the character whether it is present.
for (int i = 0; i < source_len; ++i) // i is a varialbe used to count upto source_len.
{
int j = 0; // take another variable to count loops
while (add == 0)
{
if (j >= target_len) // count upto target length
{
break;
}
else if (target.charAt(j) != source.charAt(i + j))
{
break;
}
else
{
++j;
if (j == target_len)
{
add++; // this will return 1: true
}
}
}
}
return add;
//System.out.println(""+add);
}
public static void main(String... args) {
//String target = "for";
// function 1
try {
// read the file
Scanner sc = new Scanner(System.in);
System.out.println("Enter your review: ");
String source1 = sc.nextLine();
//String source1 = "Searching for a string within a string the hard way.";
File file = new File("write.txt"); //establising a file object
BufferedReader br = new BufferedReader(new FileReader(file)); //reading the files from the file object "file"
String target1;
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
{
//System.out.println(target1);
int testing = test.findTarget(target1, source1);
System.out.println("answer is: "+testing);
//target1 = br.readLine();
}
br.close();
}
catch (IOException e)
{
System.out.println("file error!");
}
}
}
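The accumulation asked for here can be sketched as follows, assuming the words have already been read from the file into a list. The names WordHitCounter and countHits are hypothetical. findTarget keeps the same contract as in the question (1 if the word occurs, otherwise 0), but it is rewritten with regionMatches and a loop bound that stops before the end of source, because the original source.charAt(i + j) can run past the end of the string when a partial match sits at the very end.

```java
import java.util.Arrays;
import java.util.List;

class WordHitCounter {
    // Returns 1 if target occurs somewhere in source, otherwise 0.
    static int findTarget(String target, String source) {
        if (target.isEmpty()) {
            return 0; // an empty word never counts as a hit
        }
        // Stop while the whole target still fits into the rest of source.
        for (int i = 0; i + target.length() <= source.length(); i++) {
            if (source.regionMatches(i, target, 0, target.length())) {
                return 1;
            }
        }
        return 0;
    }

    // Adds up the result for every word instead of printing each one.
    static int countHits(List<String> targets, String source) {
        int total = 0; // declared once, outside the loop
        for (String target : targets) {
            total += findTarget(target, source);
        }
        return total;
    }

    public static void main(String[] args) {
        String source = "Searching for a string within a string the hard way.";
        List<String> targets = Arrays.asList("for", "way", "zzz"); // stand-in for the lines of write.txt
        System.out.println("answer is: " + countHits(targets, source)); // prints "answer is: 2"
    }
}
```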

ArrayIndexOutOfBoundsException generated when br.readLine() = ","

The code below has several functions which allow for things such as writing data to a document, reading it and putting the data in an array for a JTable later down the line.
package tabletest.populatetable;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
public class GetData {
DataClass[] data = new DataClass[500];
int nextPosition = 0;
public GetData() {
readData();
}
public void writeData()
{
try {
BufferedWriter bw = new BufferedWriter(new FileWriter(new File("resources/tabledata.txt")));
for(int i=0; i < nextPosition; i++) {
bw.write(data[i].toString());
bw.newLine();
}
bw.close();
} catch(Exception e) {
System.out.println("Invalid Data input");
}
}
public void readData()
{
try {
BufferedReader br = new BufferedReader(new FileReader(new File("resources/tabledata.txt")));
String nextData = br.readLine();
String[] arrayStringData;
while (nextData != null) {
try {
arrayStringData = nextData.split(",");
} catch(NullPointerException nPE) {
arrayStringData = new String[] {" ", " "};
}
for(int i = 0; i < arrayStringData.length - 1; i++) {
if(arrayStringData[i] == null || arrayStringData[i] == "") {
arrayStringData[i] = " ";
}
}
DataClass getData = new DataClass();
getData.col1 = arrayStringData[0].trim();
getData.col2 = arrayStringData[1].trim();
data[nextPosition] = getData;
nextPosition++;
nextData = br.readLine();
}
br.close();
} catch(Exception e) {
e.printStackTrace();
}
}
public String[][] dataInTableForm() {
final int colCount = 2;
String[][] temp = new String[nextPosition][colCount];
for(int i = 0; i < nextPosition; i++) {
temp[i][0] = data[i].col1;
temp[i][1] = data[i].col2;
}
return temp;
}
private class DataClass {
String col1;
String col2;
public String toString() {
return col1 + ", " + col2;
}
}
}
The document which it is reading, resources/tabledata.txt, is 12 lines long and it looks like this.
asfias, adsnj
aw,aerfae
aw,aewaa
,tre
asfd,
okfas,af
e,ds
sw,f
,
asfias, adsnj
aw,aerfae
aw,aewaa
The problem is on line 9 of the text document, where there is just a , on its own. When there is something before or after the comma this seems to work fine, and I confirmed that the lone comma was causing the problem by removing that line.
When I looked at the console I discovered the problem was an ArrayIndexOutOfBoundsException; the stack trace is below.
java.lang.ArrayIndexOutOfBoundsException: 0
at tabletest.populatetable.GetData.readData(GetData.java:55)
at tabletest.populatetable.GetData.<init>(GetData.java:14)
at tabletest.Table.createTablePanel(Table.java:76)
at tabletest.Table.createPanels(Table.java:34)
at tabletest.Table.runGUI(Table.java:24)
at tabletest.Table.main(Table.java:150)
Line 55 of the code is getData.col1 = arrayStringData[0].trim();
As you can see in the code I attempted several things to prevent this occurring but I have had no luck. I also tried removing the .trim() from the end of the line; however, exactly the same problem occurs.
I would appreciate any help in fixing this problem.
Javadoc of split(String regex) says:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So, ",".split(",") will return an empty array, i.e. new String[0].
If you want to keep trailing empty strings, use ",".split(",", -1), which will return new String[] { "", "" }.
The split function will cut out the "," delimiter and return an empty array.
Your try-catch block catches a NullPointerException, which is only thrown if nextData is null, not if the output array is empty.
The for loop would only work if the output array contains null strings (which it doesn't).
String.split yanks the delimiter out of the resulting array. So, for a string with just "," your array would be of length 0 (the element at index 0 does not exist).
What you need to do is check whether the length is greater than 0. You can do this with arrayStringData.length > 0 in an if statement before trying to access arrayStringData[0] (and, likewise, arrayStringData.length > 1 before reading arrayStringData[1]).
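The two suggestions above (a negative limit for split, plus a length check) can be combined in a small helper. This is only a sketch: the class SplitKeepEmpty and the method twoColumns are hypothetical names, and the single-space placeholder mirrors what the code in the question tries to do.

```java
import java.util.Arrays;

class SplitKeepEmpty {
    // Splits a two-column line. Empty or missing columns become " ".
    static String[] twoColumns(String line) {
        String[] parts = line.split(",", -1); // negative limit keeps trailing empty strings
        String[] cols = {" ", " "};
        for (int i = 0; i < cols.length && i < parts.length; i++) {
            String value = parts[i].trim();
            if (!value.isEmpty()) { // compare content, not references
                cols[i] = value;
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        System.out.println(",".split(",").length);     // prints 0
        System.out.println(",".split(",", -1).length); // prints 2
        System.out.println(Arrays.toString(twoColumns("asfd,"))); // prints [asfd,  ]
    }
}
```

The loop bound checks both array lengths, so a line with no comma at all still yields two columns instead of throwing.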

Eliminate the "\u3000" error in java

When I try to compile a Java file, the compiler says "illegal character \u3000".
After searching, I found that it is the ideographic (full-width) space used in Chinese, Japanese and Korean text, U+3000 in the CJK Symbols and Punctuation block. Instead of deleting the special space manually, I decided to write a simple search-and-delete Java program to eliminate it.
However, it does not report the index of the character.
So how do I write code to eliminate this special space?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.*;
public class BufferReadAFile {
public static void main(String[] args) {
//BufferedReader br = null;
String sCurrentLine;
String message = "";
try {
/*br = new BufferedReader(new FileReader("/Users/apple/Test/Instance1.java"));
while ((sCurrentLine = br.readLine()) != null) {
message += sCurrentLine;
}
*/
String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
//System.out.println(content);
searchSubString(content.toCharArray(),"\\u3000".toCharArray());
} catch (IOException e) {
e.printStackTrace();
}
}
public static void searchSubString(char[] text, char[] ptrn) {
int i = 0, j = 0;
// pattern and text lengths
int ptrnLen = ptrn.length;
int txtLen = text.length;
// initialize new array and preprocess the pattern
int[] b = preProcessPattern(ptrn);
while (i < txtLen) {
while (j >= 0 && text[i] != ptrn[j]) {
j = b[j];
}
i++;
j++;
// a match is found
if (j == ptrnLen) {
System.out.println("found substring at index:" + (i - ptrnLen));
j = b[j];
}
}
}
public static int[] preProcessPattern(char[] ptrn) {
int i = 0, j = -1;
int ptrnLen = ptrn.length;
int[] b = new int[ptrnLen + 1];
b[i] = j;
while (i < ptrnLen) {
while (j >= 0 && ptrn[i] != ptrn[j]) {
// if there is mismatch consider the next widest border
// The borders to be examined are obtained in decreasing order from
// the values b[i], b[b[i]] etc.
j = b[j];
}
i++;
j++;
b[i] = j;
}
return b;
}
}
I don't think "\\u3000" is what you want. You can print out the string and see the content yourself. You should use "\u3000" instead. Note the single back slash.
System.out.println("\\u3000"); // This prints out \u3000
System.out.println("\u3000"); // This prints out the CJK space
Alternatively, you could just use the actual CJK space character directly as in one of the if checks in your CheckEmpty class.
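The difference between the two literals can also be checked by looking at their lengths. A minimal sketch (the class name EscapeCheck and its constants are made up for illustration):

```java
class EscapeCheck {
    static final String ESCAPED = "\\u3000"; // six characters: backslash, u, 3, 0, 0, 0
    static final String REAL = "\u3000";     // one character: the full-width space

    public static void main(String[] args) {
        System.out.println(ESCAPED.length());            // prints 6
        System.out.println(REAL.length());               // prints 1
        System.out.println((int) REAL.charAt(0));        // prints 12288, which is 0x3000
    }
}
```

A pattern built from the first literal therefore only matches text that literally contains a backslash followed by u3000.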
In my question, I was trying to use the KMP algorithm to find the index of a pattern in my Java file.
If we use "\\u3000".toCharArray(), the search looks for the six literal characters \, u, 3, 0, 0, 0, which is not what we want. \u3000 is a special white space: the full-width space used in Chinese, Japanese and Korean text.
If we write a sentence using the full-width space, it looks like this:
Here　is　Full-width　demonstration.
The space is very distinctive, but it is not so visible in a Java file. That inspired me to write the code below:
import java.util.*;
import java.io.*;
public class CheckEmpty{
public static void main(String []args){
try{
String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
if(content.contains(" ")){
System.out.println("English Space");
}
if(content.contains("\\u3000")){
System.out.println("Backslash 3000");
}
if(content.contains("\u3000")){// notice this is the SPECIAL SPACE (U+3000), written as an escape so it stays visible
System.out.println("C J K fullwidth");
//Chinese Japanese Korean white space
}
}catch(FileNotFoundException e){
e.printStackTrace();
}
}
}
As expected, the result shows that the Java file contains both the normal and the full-width space.
After that I wrote another Java file to delete all the special spaces:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.PrintWriter;
import java.io.IOException;
import java.util.*;
public class DeleteTheSpecialSpace {
public static void main(String[] args) {
//BufferedReader br = null;
String sCurrentLine;
String message = "";
try {
String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
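content = content.replace("\u3000", ""); // Strings are immutable, so keep the result; the left parameter is the SPECIAL SPACE (U+3000)
//System.out.println(content);
PrintWriter out = new PrintWriter( "/Users/apple/Coding/Instance1.java" );
out.println(content);
out.close(); // flush the buffered text to the file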
} catch (IOException e) {
e.printStackTrace();
}
}
}
Finally, amazing things happen: there is no error in "Instance1.java", since all full-width spaces have been eliminated.
Compile SUCCESS :)
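Since the original goal was also to report where the character occurs, here is a sketch that does both without KMP, using String.indexOf and String.replace. The names FullWidthSpaceCleaner, indexesOf and clean are hypothetical. Note one deliberate difference from the code above: the full-width space is replaced with an ordinary space rather than deleted, on the assumption that deleting it could glue two tokens together (for example int and a variable name).

```java
import java.util.ArrayList;
import java.util.List;

class FullWidthSpaceCleaner {
    static final char FULL_WIDTH_SPACE = '\u3000';

    // Returns the index of every full-width space in the text.
    static List<Integer> indexesOf(String text) {
        List<Integer> hits = new ArrayList<>();
        int i = text.indexOf(FULL_WIDTH_SPACE);
        while (i >= 0) {
            hits.add(i);
            i = text.indexOf(FULL_WIDTH_SPACE, i + 1); // continue after the last hit
        }
        return hits;
    }

    // Replaces each full-width space with an ordinary space.
    static String clean(String text) {
        return text.replace(FULL_WIDTH_SPACE, ' ');
    }

    public static void main(String[] args) {
        String source = "int\u3000x = 1;"; // stand-in for the file content
        System.out.println(indexesOf(source)); // prints [3]
        System.out.println(clean(source));     // prints int x = 1;
    }
}
```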

Splitting an arraylist and textfile

What I'm trying to do is read numbers from a .txt file, split each line by ; and add the scores to my ArrayList listR2. As of now it semi-works; however, only the last two persons' scores are added, and the first person's score just comes out as null.
Is there some problem with my split?
Any ideas how I can get the program to write all the scores?
It is skipping lines (from your file) in your code because you have used
for (int i = 3; i < itemStudent.length; i++) {
String test = studin.readLine(); //<--- this is the error
listR2.add(test);
}
Instead use
String test = itemStudent[i]; // to add the scores into the listR2
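Put together, that fix can be sketched as a small self-contained method. The class name ScoreColumns, the method scoresOf and the sample line are made up; the sketch assumes the layout implied by the question, with the scores starting at the fourth semicolon-separated field.

```java
import java.util.ArrayList;
import java.util.List;

class ScoreColumns {
    // Collects every field from index 3 onward of one semicolon-separated line.
    static List<String> scoresOf(String line) {
        String[] itemStudent = line.split(";");
        List<String> listR2 = new ArrayList<>();
        for (int i = 3; i < itemStudent.length; i++) {
            listR2.add(itemStudent[i]); // take the score from the current line, not from the reader
        }
        return listR2;
    }

    public static void main(String[] args) {
        // Hypothetical line: last name; first name; email; scores
        System.out.println(scoresOf("Svensson;Anna;anna@example.com;7;9;4")); // prints [7, 9, 4]
    }
}
```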
First, your code:
BufferedReader studin = new BufferedReader(new FileReader(studentFile));
grader.Student student;
student = new Student();
String line, eNamn, fNamn, eMail;
ArrayList<String> listR = new ArrayList<String>();
ArrayList<String> listR2 = new ArrayList<String>();
//loop for the file and setters for first, lastname and email
while ((line = studin.readLine()) != null) {
if (line.contains(";")) {
//# you don't need regex to split on a single specific character
String[] itemStudent = line.split("[;]");
eNamn = itemStudent[0];
fNamn = itemStudent[1];
eMail = itemStudent[2];
//#why are you using the Student object if you never use it in any way ?
//#also you are always updating the same "Student". if you expect to add it to say an ArrayList,
//#you need to declare a new student at the beginning of the loop (not outside of it)
student.setFirstName(fNamn);
student.setLastName(eNamn);
student.setEmail(eMail);
//Loop for the sum of the tests
Integer sum = 0; //# why Integer? The "int" primitive is more than sufficient
for (int index = 3; index < itemStudent.length; index++) {
try {
sum += Integer.parseInt(itemStudent[index]);
listR.add(itemStudent[index]);
} catch (Exception ex) {} //very bad practice, never silently drop exceptions.
}
//# that part is just wrong in many ways, I guess it's some left over debug/testing code
//# this also makes you skip lines as you will read as many lines as you have elements (minus 3) in itemStudent
/*
for (int i = 3; i < itemStudent.length; i++) {
String test = studin.readLine();
listR2.add(test);
}
*/
System.out.println(eNamn + " " + fNamn + " " + eMail + " SUMMA:" + sum + " " );
//# printing an ArrayList shows its contents (via toString), but listR2 stays empty now that the loop above is commented out
System.out.println(listR2);
}
}
The //# lines mark my comments,
and here is a quick example displaying the object approach:
(it may contain misspellings or missing imports, but otherwise it should be fine; the compiler will tell you). To run it:
java Main "your_file"
import java.util.ArrayList;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
class Student{
String fname;
String lname;
String mail;
int sum;
Student(String fn,String ln,String ml){
fname=fn;
lname=ln;
mail=ml;
sum=0;
}
void addScore(int n){
sum += n;
}
public String toString() {
return "Student: "+fname+" "+lname+", "+mail+" sum: "+sum;
}
}
try {
BufferedReader br = new BufferedReader(new FileReader(args[0]));
ArrayList<Student> stdnts = new ArrayList<Student>();
String line = br.readLine();
while (line != null) {
if (line.contains(";")) {
String[] stdnt_arr = line.split(";");
Student stdnt = new Student(stdnt_arr[0],stdnt_arr[1],stdnt_arr[2]);
for (int i = 3;i<stdnt_arr.length;i++){
try {
stdnt.addScore(Integer.parseInt(stdnt_arr[i]));
} catch (NumberFormatException e) {
//not a number
e.printStackTrace();
}
}
stdnts.add(stdnt);
System.out.println(stdnt.toString());
}
line = br.readLine();
}
} catch(IOException e){
//things went wrong reading the file
e.printStackTrace();
}
}
}

Calculate number of words in an ArrayList while some words are on the same line

I'm trying to calculate how many words an ArrayList contains. I know how to do this if every word is in a separate entry, but some of the words are on the same line, like:
hello there
blah
cats dogs
So I'm thinking I should go through every entry and somehow find out how many words the current entry contains, something like:
public int numberOfWords(){
for(int i = 0; i < arraylist.size(); i++) {
int words = 0;
words = words + (number of words on current line);
//words should eventually equal to 5
}
return words;
}
Am I thinking right?
You should declare and initialize int words outside of the loop so that it is not reset on every iteration. You can use the for-each syntax to loop through the list, which eliminates the need to get() items out of the list. To handle multiple words on a line, split the String into an array and count the items in the array.
public int numberOfWords(){
int words = 0;
for(String s:arraylist) {
words += s.split(" ").length;
}
return words;
}
Full Test
import java.util.ArrayList;
import java.util.List;
public class StackTest {
public static void main(String[] args) {
List<String> arraylist = new ArrayList<String>();
arraylist.add("hello there");
arraylist.add("blah");
arraylist.add(" cats dogs");
arraylist.add(" ");
arraylist.add(" ");
arraylist.add(" ");
int words = 0;
for(String s:arraylist) {
s = s.trim().replaceAll(" +", " "); //clean up the String
if(!s.isEmpty()){ //do not count empty strings
words += s.split(" ").length;
}
}
System.out.println(words);
}
}
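The trim, clean-up and split steps in the test above can also be folded into one split on runs of whitespace. This is a sketch under that assumption, not a replacement for the answer; the class name WordCount is made up, and the list is passed in as a parameter so the method can be tried on its own.

```java
import java.util.Arrays;
import java.util.List;

class WordCount {
    // Counts whitespace-separated words across all entries of the list.
    static int numberOfWords(List<String> lines) {
        int words = 0; // declared once, outside the loop
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) { // blank entries contribute nothing
                words += trimmed.split("\\s+").length; // \\s+ treats any run of whitespace as one gap
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello there", "blah", "  cats   dogs", "   ");
        System.out.println(numberOfWords(lines)); // prints 5
    }
}
```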
It should look like this:
public int numberOfWords(){
int words = 0;
for(int i = 0; i < arraylist.size(); i++) {
words = words + arraylist.get(i).split(" ").length; // number of words on the current line
//words should eventually equal to 5
}
return words;
}
I think this could help you.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;
public class LineWord {
public static void main(String args[]) {
try {
File f = new File("C:\\Users\\MissingNumber\\Documents\\NetBeansProjects\\Puzzlecode\\src\\com\\test\\test.txt"); // Creating the File passing path to the constructor..!!
BufferedReader br = new BufferedReader(new FileReader(f)); //
String strLine = " ";
String filedata = "";
while ((strLine = br.readLine()) != null) {
filedata += strLine + " ";
}
StringTokenizer stk = new StringTokenizer(filedata);
List <String> token = new ArrayList <String>();
while (stk.hasMoreTokens()) {
token.add(stk.nextToken());
}
//Collections.sort(token);
System.out.println(token.size());
br.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
}
}
So in this case you read the data from a file, tokenize it, store the tokens in a list and just count them. If you want to get input from the console instead, use a BufferedReader, tokenize the input on spaces, put the tokens in a list and simply get its size.
Hope you got what you are looking for.
