How CSV parsing can be utilized - JAVA - java

I am given a file that will read the following:
"String",int,int
"String",int,int
"String",int,int
...
The number of entries is unknown, so I loop over them with while (scanner.hasNextLine()). My goal is to take the three pieces of data on each line and store them in a Node, using the method BinaryTree.addNode(String, int, int). My problem is reading the data in. I am trying to remove the commas from each line and then re-read the result as follows:
Scanner firstpass = new Scanner(file);
String input = firstpass.nextLine().replaceAll(",", "");
Scanner secondpass = new Scanner(input);
String variable1 = secondpass.next();
int variable2 = secondpass.nextInt();
int variable3 = secondpass.nextInt();
This, however, is a very ineffective way of going about it.
UPDATED
The compiling errors can be fixed with the following:
try {
Scanner scanner1 = new Scanner(file);
while (scanner1.hasNextLine()) {
String inventory = scanner1.nextLine().replaceAll(",", " ");
Scanner scanner2 = new Scanner(inventory);
while (scanner2.hasNext()){
String i = scanner2.next();
System.out.print(i);
}
scanner2.close();
}
scanner1.close();
}
catch (FileNotFoundException ex) {
ex.printStackTrace();
}
which gives me the output:
"String"intint"String"intint"String"intint...
So I know I am on the right track. However, any spaces within the "String" value are removed, so the output would be "SomeString" instead of "Some String". I also still don't know how to remove the quotation marks from the strings.

The format you've shown matches the CSV (Comma-Separated Values) format, so your best option is to use a CSV parser, e.g. Apache Commons CSV ™.
If you don't want to add a third-party library, you could use a regular expression to parse the line.
Reading lines from a file should not be done with a Scanner. Use a BufferedReader instead. See Scanner vs. BufferedReader.
try (BufferedReader in = new BufferedReader(new FileReader(file))) {
Pattern p = Pattern.compile("\"(.*?)\",(-?\\d+),(-?\\d+)");
for (String line; (line = in.readLine()) != null; ) {
Matcher m = p.matcher(line);
if (! m.matches())
throw new IOException("Invalid line: " + line);
String value1 = m.group(1);
int value2 = Integer.parseInt(m.group(2));
int value3 = Integer.parseInt(m.group(3));
// use values here
}
} catch (IOException | NumberFormatException ex) {
ex.printStackTrace();
}
Note that this will not work if the string contains escaped characters, e.g. if it contains embedded double-quotes. For that, you should use a parser library.
The code above will correctly handle embedded spaces and commas.
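To check the claim about embedded spaces and commas without a file, here is a minimal self-contained sketch that runs the same pattern against a few in-memory lines. The class name, the parse helper and the sample data are made up for illustration; in the question, each parsed line would instead be passed to BinaryTree.addNode(String, int, int).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLineDemo {
    // Same pattern as in the answer: a quoted string followed by two integers.
    static final Pattern LINE = Pattern.compile("\"(.*?)\",(-?\\d+),(-?\\d+)");

    // Hypothetical helper: returns {name, first, second} as strings,
    // or null when the line does not have the expected shape.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"Some String\",1,2",   // embedded space
            "\"Doe, John\",-3,40",   // embedded comma
            "no quotes,1,2"          // malformed: the string is not quoted
        };
        for (String sample : samples) {
            String[] parsed = parse(sample);
            System.out.println(parsed == null
                    ? "Invalid line: " + sample
                    : Arrays.toString(parsed));
        }
    }
}
```

The quotes are dropped because only the text between them is captured by group 1, which also answers the question of how to remove the "" from the strings.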

Instead of using
String input = firstpass.nextLine().replaceAll(",", "");
Scanner secondpass = new Scanner(input);
String variable1 = secondpass.next();
int variable2 = secondpass.nextInt();
int variable3 = secondpass.nextInt();
use the following approach:
String line = firstpass.nextLine();
// split on the commas; note that this breaks if the quoted string itself contains a comma
String[] temp = line.split(",");
// strip the surrounding double quotes from the string field
String variable1 = temp[0].replace("\"", "");
int variable2 = Integer.parseInt(temp[1].trim());
int variable3 = Integer.parseInt(temp[2].trim());

Related

Replace quotes in String

I have to replace all the commas that are between double quotes with a dot.
I'm trying to do that with Java's replace and replaceAll methods, but I still haven't found a solution.
Can someone help me?
EDIT:
I have to manually parse a CSV file into objects. So I'm trying to split each input line, but one number has a comma inside it, so the split gives me more fields than I need.
Example: I have to split this string.
"""LASER MEDIA SOCIETA' COOPERATIVA""",CNF146010,FM (S),PIAZZA UMBERTO I - PISTICCI,MT,40N2323,16E3328,383,,"99,1",CITY RADIO,"H: --V: 32 dBW",0.0
Notice that the "99,1" and the ,, before it are what is causing me trouble.
Scanner var = new Scanner(new BufferedReader(new FileReader ("t1.csv")));
ArrayList<Catasto> obj = new ArrayList();
String data = var.nextLine();
String data2 = null;
String full = null;
int j = 0;
while (var.hasNextLine()) {
data = var.nextLine();
data2 = var.nextLine();
full = data + data2;
//full = full.replaceAll("\"*[,]*\"", "."); attempt 1
System.out.println(full);
ArrayList<String> parts = new ArrayList();
String[] parti = full.split(",");
//for (int i = 0; i<parti.length; i++) { this is because I'm trying to change empty string with a null
//if (parti[i] == " ") in order to solve this error: java.lang.NumberFormatException: For input string: ""
// parti[i] = null;
//}
for (int i = 0; i<12; i++) {
parts.add(parti[i]);
}
Catasto foo = new Catasto(parts);
obj.add(foo);
}
var.close();
EDIT 2:
I have solved the problem of the comma between the double quotes, but I don't know why I still get this error: java.lang.NumberFormatException: For input string: ""
You're going to struggle to do it with a single replaceAll or replace, as you need to determine pairs of quotes. Your best bet is to match pairs of quotes and then use replaceAll on each match to change the commas to full stops.
String input = "\"One,Two,There\",\"Four,Five,Six\"";
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(input);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, m.group().replaceAll(",", "."));
}
m.appendTail(sb);
String output = sb.toString(); // "One.Two.There","Four.Five.Six"
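As for the NumberFormatException in EDIT 2: the sample line contains an empty field (the ,, before "99,1"), and number parsers such as Integer.parseInt and Double.parseDouble reject an empty string, so that is most likely the cause. The sketch below is one possible way to handle both problems at once, not the asker's actual code: it splits only on commas that sit outside quotes and substitutes a fallback for empty numeric fields. The class name, the helper names, the shortened sample line and the fallback value 0.0 are all assumptions made for illustration.

```java
import java.util.Arrays;

public class QuoteAwareSplit {
    // Split on commas that are followed by an even number of quotes,
    // i.e. commas outside a quoted field. The limit -1 keeps empty fields.
    static String[] splitCsvLine(String line) {
        return line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
    }

    // Hypothetical helper: strips quotes, treats a decimal comma as a dot,
    // and returns the fallback for an empty field instead of throwing.
    static double parseNumber(String field, double fallback) {
        String cleaned = field.replace("\"", "").replace(',', '.').trim();
        return cleaned.isEmpty() ? fallback : Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        // Shortened version of the line from the question.
        String line = "FM (S),383,,\"99,1\",CITY RADIO";
        String[] fields = splitCsvLine(line);
        System.out.println(Arrays.toString(fields)); // [FM (S), 383, , "99,1", CITY RADIO]
        System.out.println(parseNumber(fields[2], 0.0)); // 0.0 (empty field)
        System.out.println(parseNumber(fields[3], 0.0)); // 99.1
    }
}
```

This still does not cover every CSV feature (for example, line breaks inside quoted fields), so a parser library remains the safer choice for arbitrary input.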

Java using \034 as delimiter in a string

I am trying to use the '\034' field separator character as a delimiter in a string.
The issue is that when I hardcode "\034"+opField and write it to a file it works, but if the "\034" character is read from a file, the output contains the literal text, as in "col1\034col2".
I tried using StringBuilder, but it escapes the \034 to "\\034".
I am using the following code to read the character from the file:
try (BufferedReader br = new BufferedReader(new FileReader(fConfig))){
int lc = 1;
for(String line;(line = br.readLine())!=null;){
String[] rowList = line.split(delim);
int row_len = rowList.length;
if (row_len<2){
System.out.println("Incorrect dictionary file row:"+fConfig.getAbsolutePath()+"\nNot enough values found at row:"+line);
}else{
String key = rowList[0];
String value = rowList[1];
dictKV.put(key, value);
}
lc++;
}
}catch(Exception e){
throw e;
}
Any help is welcome...
[update]: The same thing is happening with the '\t' character: if hardcoded it works fine, but if read from a file it gets appended as literal characters, as in "col0\tcol1".
if(colAl.toLowerCase().contains(" as ")){
String temp = colAl.replaceAll("[ ]+as[ ]+"," | ");
ArrayList<String> tempA = this.brittle_delim(temp,'|');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}else {
ArrayList<String> tempA = this.brittle_delim(colAl,' ');
colAl = tempA.get(tempA.size()-1);
colAl = colAl.trim();
}
if(i==0){
sb.append(colAl);
headerCols+=colAl.trim();
}else{
headerCols+= this.output_field_delim + colAl;
sb.append(this.output_field_delim);
sb.append(colAl);
}
}
}
System.out.println("SB Header Cols:"+sb.toString());
System.out.println("Header Cols:"+headerCols);
Output:
SB Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
Header Cols:
SPRN_CO_ID\034FISC_YR_MTH_DSPLY_CD\034CST_OBJ_CD\034PRFT_CTR_CD\034LEGL_CO_CD\034HEAD_CT_TYPE_ID\034FIN_OWN_CD\034FUNC_AREA_CD\034HEAD_CT_NR
In the above code if I do the following I am getting correct results:
headerCols+= "\034"+ colAl;
output:
SPRN_CO_IDFISC_YR_MTH_DSPLY_CDCST_OBJ_CDPRFT_CTR_CDLEGL_CO_CDHEAD_CT_TYPE_IDFIN_OWN_CDFUNC_AREA_CDHEAD_CT_NR
The FS characters are there, even though they are getting removed here.
You should provide an example demonstrating your problem, not just incomplete code snippets.
The following runnable snippet does what you described.
// create a file with one line
byte[] bytes = "foo bar".getBytes(StandardCharsets.ISO_8859_1);
String fileName = "/tmp/foobar";
Files.write(Paths.get(fileName), bytes);
String headerCols = "";
String outputFieldDelim = "\034";
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
// read the line from the file and split by blank character
String[] cols = br.readLine().split(" ");
// concatenate the values with "\034"
// but ... for your code ...
// don't concatenate String objects in a loop like below
// use a StringBuilder or StringJoiner instead
headerCols += outputFieldDelim + cols[0];
headerCols += outputFieldDelim + cols[1];
}
// output with the "\034" character
System.out.println(headerCols);
I guess this is where I found my solution and the actual words for my Question.
How to unescape string literals in java
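The underlying point is that escape sequences such as "\034" and "\t" are processed by the Java compiler in string literals only. When the same text is read from a file at run time, it is just a backslash followed by ordinary characters, so it has to be converted explicitly. Below is a minimal sketch of such a conversion, assuming only \t and octal escapes of up to three digits are needed; the class and method names are hypothetical, and the octal handling is simplified compared with the full Java literal rules.

```java
public class UnescapeDelimiter {
    // Hypothetical helper: turns the literal text \t or \ooo (octal)
    // into the single character it names. Anything else is copied as-is.
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == 't') {
                    out.append('\t');
                    i += 2;
                    continue;
                }
                if (next >= '0' && next <= '7') {
                    // Read up to three octal digits after the backslash.
                    int j = i + 1;
                    int value = 0;
                    while (j < raw.length() && j < i + 4
                            && raw.charAt(j) >= '0' && raw.charAt(j) <= '7') {
                        value = value * 8 + (raw.charAt(j) - '0');
                        j++;
                    }
                    out.append((char) value);
                    i = j;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // What a config file containing \034 looks like after readLine():
        String fromFile = "\\034";
        String delim = unescape(fromFile);
        System.out.println(fromFile.length());      // 4
        System.out.println(delim.length());         // 1
        System.out.println((int) delim.charAt(0));  // 28, the FS control character
    }
}
```

With this, a delimiter read from a configuration file can be unescaped once and then used exactly like the hardcoded "\034".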

Retrieving part of a string using a delimiter

I am creating an application, but I'm not sure how to get certain parts of a string. I have read in a file that looks like this:
*tp*|21394398437984|163600
*2*|AAA|1234567894561236|STOP|20140527|Success||Automated|DSPRN1234567
*2*|AAA|1234567894561237|STOP|20140527|Success||Automated|DPSRN1234568
*3*|2
I need to read the lines beginning with *2*, so I did this:
s = new Scanner(new BufferedReader(new FileReader("example.dat")));
while (s.hasNext()) {
String str1 = s.nextLine ();
if(str1.startsWith("*2*")) {
System.out.print(str1);
}
}
This reads the whole line, which I'm fine with. Now my issue is that I need to extract certain fields from it: the 2nd (a number), the 4th (also a number), the 5th (Success) and the 7th (DPSRN).
I was thinking about using a String delimiter with | as the delimiter, but I'm not sure where to go after this. Any help would be great.
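You should use String.split("\\|"); it will give you an array (String[]). The pipe must be escaped because split takes a regular expression, in which a bare | means alternation.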
Try following:
String test="*2*|AAA|1234567894561236|STOP|20140527|Success||Automated|DSPRN1234567";
String tok[]=test.split("\\|");
for(String s:tok){
System.out.println(s);
}
Output (the blank line is the empty field between the two adjacent | characters):
*2*
AAA
1234567894561236
STOP
20140527
Success

Automated
DSPRN1234567
What you require will be placed at tok[2], tok[4], tok[5] and tok[8].
Just split the returned line based on your search, which would return an array of String elements where you can retrieve your elements based on their index:
s = new Scanner(new BufferedReader(new FileReader("example.dat")));
String searchLine = "";
while (s.hasNext()) {
searchLine = s.nextLine();
if(searchLine.startsWith("*2*")) {
break;
}
}
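// the pipe must be escaped, because split takes a regular expression
String[] strs = searchLine.split("\\|");
String secondArgument = strs[2];
String fourthArgument = strs[4];
String fifthArgument = strs[5];
// index 6 is the empty field and index 7 is "Automated", so the DSPRN value is at index 8
String seventhArgument = strs[8];
System.out.println(secondArgument);
System.out.println(fourthArgument);
System.out.println(fifthArgument);
System.out.println(seventhArgument);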
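Because String.split takes a regular expression, the escaping of the pipe is the part that is easiest to get wrong. The following sketch shows one way to avoid hand-written escapes by using Pattern.quote, and contrasts it with the unescaped call. The class and method names are made up for illustration; the sample line is the one from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PipeSplitDemo {
    // Pattern.quote makes the pipe a literal; the limit -1 keeps
    // trailing empty fields, which split would otherwise drop.
    static String[] fields(String line) {
        return line.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        String line = "*2*|AAA|1234567894561236|STOP|20140527|Success||Automated|DSPRN1234567";
        String[] f = fields(line);
        System.out.println(f.length); // 9
        // The values the question asks for:
        System.out.println(f[2] + " " + f[4] + " " + f[5] + " " + f[8]);

        // Unescaped, "|" is an empty alternation that matches between
        // every pair of characters, so the string is split apart
        // character by character instead of field by field.
        System.out.println(Arrays.toString("a|b".split("|")));
    }
}
```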

How to replace all special characters with another character in java?

I want to replace all 'special characters' with one particular character in Java.
For example, 'cash&carry' will become 'cash+carry', and 'cash$carry' will also become 'cash+carry'.
I have the following sample CSV file, where the CSV headers are 'What' and 'Where':
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate&barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book store,los+angeles+metro
garment,los+angeles+metro
"cash,carry",los+angeles+metro
cash&carry,los+angeles+metro
cash carry,los+angeles+metro
The expected output
What,Where
salon,new+york+metro
pizza,los+angeles+metro
crate+barrel,los+angeles+metro
restaurants,los+angeles+metro
gas+station,los+angeles+metro
persian+restaurant,los+angeles+metro
car+wash,los+angeles+metro
book+store,los+angeles+metro
garment,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
cash+carry,los+angeles+metro
The sample code is as follows
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}
All the special characters, all the spaces and the double quotes should be removed and replaced as in the desired output.
I am using value.replaceAll("[^A-Za-z0-9 ]", "+"), but it is not working.
The erroneous output:
cash
carry"
Any help is appreciated; I am new to regex.
You need to:
replace all commas within quotes with +
replace non-whitelisted characters with + (and you need to add commas to your whitelist)
remove double quotes
Try this:
line = line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
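To see what this one-liner does, here is a small self-contained sketch that applies it to some of the rows from the question. The class and method names are hypothetical; the regular expression is the one from the answer. The first alternative replaces every character that is not alphanumeric, a quote or a comma; the second replaces a comma only when it is not followed by an even number of quotes, that is, when it sits inside a quoted field.

```java
public class PlusNormalizer {
    // Applies the answer's regex, then strips the double quotes.
    static String normalize(String line) {
        return line
                .replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+")
                .replace("\"", "");
    }

    public static void main(String[] args) {
        String[] rows = {
            "What,Where",
            "crate&barrel,los+angeles+metro",
            "book store,los+angeles+metro",
            "\"cash,carry\",los+angeles+metro"
        };
        for (String row : rows) {
            System.out.println(normalize(row));
        }
        // prints:
        // What,Where
        // crate+barrel,los+angeles+metro
        // book+store,los+angeles+metro
        // cash+carry,los+angeles+metro
    }
}
```

Because the whole line is normalized before it is tokenized, the comma inside "cash,carry" no longer splits the field in two.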
I think your regex is pretty close. Add an exception for commas as well, and get rid of the space, and you are good.
BufferedReader r = new BufferedReader(new InputStreamReader(System.in));
String line;
while ((line = r.readLine()) != null)
{
String replaced = line.replace("\"", "");
replaced = replaced.replaceAll("[^A-Za-z0-9,]", "+");
System.out.println(replaced);
}
Of course, Strings are immutable in Java. Keep that in mind. replaceAll() returns a new String and does not modify the original instance.
You first need to find each quoted section and replace the , inside it with +. Next you can just use replaceAll("[^A-Za-z0-9,]", "+"), which replaces every character that is neither alphanumeric nor a comma with +. To locate the quoted sections, your code can use the pattern
Pattern p = Pattern.compile("\"([^\"]*)\"");
together with appendReplacement and appendTail from the Matcher class to replace each quoted section with its new version.
So in short, your code can look something like this:
Scanner scanner = new Scanner(new File(csvfile));
Pattern p = Pattern.compile("\"([^\"]*)\"");
StringBuffer sb = new StringBuffer();
while(scanner.hasNextLine()){
String line = scanner.nextLine();
Matcher m = p.matcher(line);
while (m.find()){//find quoted sections
//and replace each with its content, with `,` replaced by `+`
//BTW group(1) holds the quoted text without the `"` marks
//quoteReplacement keeps any `$` or `\` in the data from being treated specially
m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).replace(',', '+')));
}
m.appendTail(sb);//we also need to add the rest of the unmatched data to the buffer
//now we can just normally replace special characters with +
String result = sb.toString().replaceAll("[^A-Za-z0-9,]", "+");
//after the job is done we can use the result, so let's print it
System.out.println(result);
//let's not forget to reset the buffer for the next line
sb.delete(0, sb.length());
}
Answer to the question
String csvfile="BidAPI.csv";
try{
// create the 'Array List'
ArrayList<String> What=new ArrayList<String>();
ArrayList<String> Where=new ArrayList<String>();
BufferedReader br=new BufferedReader(new FileReader(csvfile));
StringTokenizer st=null;
String line="";
int linenumber=0;
int columnnumber;
int free=0;
int free1=0;
while((line=br.readLine())!=null){
line =line.replaceAll("[^A-Za-z0-9\",]|,(?!(([^\"]*\"){2})*[^\"]*$)", "+").replace("\"", "");
linenumber++;
columnnumber=0;
st=new StringTokenizer(line,",");
while(st.hasMoreTokens()){
columnnumber++;
String token=st.nextToken();
if("What".equals(token)){
free=columnnumber;
System.out.println("the value of free :"+free);
} else if("Where".equals(token)){
free1=columnnumber;
System.out.println("the value of free1 :"+free1);
}
if(linenumber>1){
if (columnnumber==free){
What.add(token);
} else if(columnnumber==free1){
Where.add(token);
}
}
}
}
// converting the 'What' Array List to array
String[] what=What.toArray(new String[What.size()]);
// converting the 'Where' Array List to array
String[] where = Where.toArray(new String[Where.size()]);
for(int i=0;i<what.length;i++){
String data = what[i].replaceAll("[^A-Za-z0-9\",]| (?!([^\"]*\"){2}[^\"]*$)", "+").replace("\"", "");
System.out.println(data);
System.out.println(where[i]);
String finaldata = data+where[i];
String json = readUrl(desturl);
}
br.close();
}catch(Exception e){
System.out.println("There is an error :"+e);
}

reading from text file to string array

I can search for a string in my text file; however, I also want to sort the data within this ArrayList and implement an algorithm. Is it possible to read from a text file and store the values (Strings) within it in a String[] array?
Also, is it possible to separate the Strings? So instead of my array having:
[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]
is it possible to get an array such as:
["Alice", "was", "beginning", "to", "get"...]
public static void main(String[]args) throws IOException
{
Scanner scan = new Scanner(System.in);
String stringSearch = scan.nextLine();
BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
List<String> words = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
words.add(line);
}
for(String sLine : words)
{
if (sLine.contains(stringSearch))
{
int index = words.indexOf(sLine);
System.out.println("Got a match at line " + index);
}
}
//Collections.sort(words);
//for (String str: words)
// System.out.println(str);
int size = words.size();
System.out.println("There are " + size + " Lines of text in this text file.");
reader.close();
System.out.println(words);
}
To split a line into an array of words, use this:
String[] words = sentence.split("[^\\w']+");
The regex [^\w'] means "not a word char or an apostrophe"
This will capture words with embedded apostrophes like "can't" and skip over all punctuation.
Edit:
A comment has raised the edge case of parsing a quoted word such as 'this' as this.
Here's the solution for that - you have to first remove wrapping quotes:
String[] words = input.replaceAll("(^|\\s)'([\\w']+)'(\\s|$)", "$1$2$3").split("[^\\w']+");
Here's some test code with edge and corner cases:
public static void main(String[] args) throws Exception {
String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
System.out.println(Arrays.toString(words));
}
Output:
[I, ie, me, can't, extract, can't, or, can't]
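Combined with the line-by-line reading from the question, the split can fill a String[] of individual words, which can then be sorted. This is a minimal sketch under a few assumptions: the class and method names are made up, and a StringReader stands in for the FileReader("File1.txt") of the question so that the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordsFromText {
    // Reads every line from the source and collects its words in order.
    static String[] readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("[^\\w']+")) {
                    // split yields a leading "" when a line starts with
                    // punctuation, and a single "" for an empty line
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        String text = "Alice was beginning to get very tired\n"
                + "of sitting by her sister on the bank,";
        String[] words = readWords(new StringReader(text));
        System.out.println(Arrays.toString(words));

        // Sorting the array, ignoring case:
        Arrays.sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(Arrays.toString(words));
    }
}
```

To read from the actual file, pass new FileReader("File1.txt") to readWords instead of the StringReader.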
Also is it possible to separate the Strings?
Yes, you can split the string on whitespace and common punctuation like this:
String[] strSplit;
String str = "This is test for split";
strSplit = str.split("[\\s,;!?\"]+");
See String API
Moreover, you can also read a text file word by word.
// try-with-resources closes the Scanner, and the loop is skipped if the file is missing
try (Scanner scan = new Scanner(new BufferedReader(new FileReader("Your File Path")))) {
while (scan.hasNext()) {
System.out.println(scan.next());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
See Scanner API
