I have a CSV file that contains several duplicate columns, and I am trying to remove them using Java. I found the Apache Commons CSV library, which some people use to remove duplicate rows. How can I use it to remove or skip duplicate columns?
For example, my CSV header is:
ID Name Email Email
So far my code is:
Reader reader = Files.newBufferedReader(Paths.get("user.csv"));
// read csv file
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
.withIgnoreHeaderCase()
.withTrim()
.parse(reader);
for (CSVRecord record : records) {
System.out.println("Record #: " + record.getRecordNumber());
System.out.println("ID: " + record.get("ID"));
System.out.println("Name: " + record.get("Name"));
System.out.println("Email: " + record.get("Email"));
}
// close the reader
reader.close();
Your code is close to what you need - you just need to use CSVPrinter to write out your data to a new file.
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
public class App {
public static void main(String[] args) throws IOException {
try (final Reader reader = Files.newBufferedReader(Paths.get("source.csv"),
StandardCharsets.UTF_8)) {
final Writer writer = Files.newBufferedWriter(Paths.get("target.csv"),
StandardCharsets.UTF_8,
StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING); // creates the file, or empties an existing one before writing
try (final CSVPrinter printer = CSVFormat.DEFAULT
.withHeader("ID", "Name", "Email")
.print(writer)) {
// read each input file record:
Iterable<CSVRecord> records = CSVFormat.DEFAULT
.withFirstRecordAsHeader()
.withIgnoreHeaderCase()
.withTrim()
.parse(reader);
// write each output file record
for (CSVRecord record : records) {
printer.print(record.get("ID"));
printer.print(record.get("Name"));
printer.print(record.get("Email"));
printer.println();
}
}
}
}
}
This transforms the following source file:
ID,Name,Email,Email
1,Albert,foo@bar.com,foo@bar.com
2,Brian,baz@bat.com,baz@bat.com
To this target file:
ID,Name,Email
1,Albert,foo@bar.com
2,Brian,baz@bat.com
Some points to note:
I was wrong in my comment. You do not need to use column indexes - you can use headings (as I do above) in your specific case.
Whenever reading and writing a file, it is recommended to provide the character encoding. In my case, I use UTF-8. (This assumes the original file was created as a UTF-8 file, of course, or is compatible with that encoding.)
When opening the reader and the printer I use "try-with-resources" statements. These mean I do not have to explicitly close the file resources: Java closes the reader for me, and closing the CSVPrinter also closes the writer it wraps.
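If the duplicated headings are not known in advance, the columns to keep can be worked out from the header itself. The following is only a sketch using the plain JDK, not part of Commons CSV: the class and method names (DuplicateColumns, firstOccurrences, project) are made up for illustration, the header comparison ignores case, and every row is assumed to have as many cells as the header.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps only the first occurrence of each header name.
final class DuplicateColumns {

    // Returns the zero-based indexes of the columns to keep.
    static List<Integer> firstOccurrences(String[] header) {
        Set<String> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < header.length; i++) {
            // Set.add returns false when this (case-insensitive) name was seen before
            if (seen.add(header[i].trim().toLowerCase(Locale.ROOT))) {
                keep.add(i);
            }
        }
        return keep;
    }

    // Copies only the kept cells of one row, in their original order.
    // Assumes the row has a cell for every kept index.
    static String[] project(String[] row, List<Integer> keep) {
        String[] out = new String[keep.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = row[keep.get(i)];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] header = {"ID", "Name", "Email", "Email"};
        String[] row = {"1", "Albert", "mail-1", "mail-1"}; // placeholder values
        List<Integer> keep = firstOccurrences(header);
        System.out.println(String.join(",", project(header, keep)));
        System.out.println(String.join(",", project(row, keep)));
    }
}
```

With Commons CSV, the same indexes could then be used with record.get(int) when printing each record, instead of hard-coding the heading names.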
I wrote a small script to read from a CSV file in Java. It takes a CSV and puts some of its values into a HashMap. My CSV has 110 lines (109 without the header), but I get a HashMap with 54 values. When I debug, I can see that lines from my CSV are skipped at each iteration.
Here's the code:
package **CENSORED**.utils;
import com.day.cq.dam.api.Asset;
import com.day.cq.dam.api.Rendition;
import com.day.text.csv.Csv;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
public class DateFormatUtils {
private static String dateFormatCsvPath = "/content/dam/csv/country_date_format.csv";
public static String getDateFormatByLocale(Locale Locale, ResourceResolver resourceResolver) {
Resource res = resourceResolver.getResource(dateFormatCsvPath);
Asset asset = res.adaptTo(Asset.class);
Rendition rendition = asset.getOriginal();
InputStream is = rendition.adaptTo(InputStream.class);
HashMap<String, String> localeToFormat = new HashMap<String, String>();
Csv csv = new Csv();
try {
Iterator<String[]> rowIterator = csv.read(is, StandardCharsets.UTF_8.name());
while (rowIterator.hasNext()) {
String[] row = rowIterator.next();
String country = row[1];
String locale = row[4];
String dateFormat = row[6];
localeToFormat.put(locale.toLowerCase() + "_" + country.toLowerCase(), dateFormat);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Here is what I see while debugging:
At the 1st iteration, line 2 of my CSV gets added to my HashMap. The header has been skipped.
At the 2nd iteration, line 5 gets added to my HashMap, but lines 3-4 aren't.
At the 3rd iteration, line 8 gets added to my HashMap, but lines 6-7 aren't.
In the end I have 53 elements in my HashMap, while I expect 109.
Here's also a sample of my CSV:
ISO 3166 Country Code,ISO639-2 Country Code,Country,ISO 3166 Country Code,ISO639-2 Lang,Language,Date Format
ALB,AL,Albania,sqi,sq,Albanian,yyyy-MM-dd
ARE,AE,United Arab Emirates,ara,ar,Arabic,dd/MM/yyyy
ARG,AR,Argentina,spa,es,Spanish,dd/MM/yyyy
AUS,AU,Australia,eng,en,English,d/MM/yyyy
AUT,AT,Austria,deu,de,German,dd.MM.yyyy
BEL,BE,Belgium,fra,fr,French,d/MM/yyyy
BEL,BE,Belgium,nld,nl,Dutch,d/MM/yyyy
BGR,BG,Bulgaria,bul,bg,Bulgarian,yyyy-M-d
BHR,BH,Bahrain,ara,ar,Arabic,dd/MM/yyyy
BIH,BA,Bosnia and Herzegovina,srp,sr,Serbian,yyyy-MM-dd
BLR,BY,Belarus,bel,be,Belarusian,d.M.yyyy
BOL,BO,Bolivia,spa,es,Spanish,dd-MM-yyyy
BRA,BR,Brazil,por,pt,Portuguese,dd/MM/yyyy
CAN,CA,Canada,fra,fr,French,yyyy-MM-dd
CAN,CA,Canada,eng,en,English,dd/MM/yyyy
Finally, I checked that every line of my CSV ends with a correct EOL.
This is the csv.read() method, from a class made by Adobe for AEM:
public Iterator<String[]> read(InputStream in, String charset) throws IOException {
if (charset == null) {
charset = System.getProperty("file.encoding");
}
InputStream in = new BufferedInputStream(in, 4096);
this.input = new InputStreamReader(in, charset);
return this.read();
}
I finally went with another solution, since I wasn't able to use this one. For posterity: I was developing this for an AEM project, and I decided to leverage the Generic List from ACS AEM Commons to get a dictionary with all the values I needed instead of reading from a CSV. As @Artistotle stated, there is definitely something wrong with the reader, so I'd advise against using com.day.text.csv.Csv.
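For anyone who still wants to read the file directly, a reader built only on the JDK is one way to rule the Adobe class out. This is a rough sketch, not a fix for com.day.text.csv.Csv: the class name DateFormatCsv and its load method are made up, and it assumes that no field contains a comma or a quote, which holds for the sample rows above but not for CSV in general.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical replacement for the Csv-based loop, using only the JDK.
final class DateFormatCsv {

    // Builds a "language_country" -> date format map from CSV content.
    // Assumption: fields never contain commas or quotes.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> localeToFormat = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // the first line is the header, so it is skipped
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] row = line.split(",", -1); // -1 keeps trailing empty cells
                if (row.length < 7) {
                    continue; // ignore lines that do not have all seven columns
                }
                String country = row[1].trim().toLowerCase(Locale.ROOT);
                String language = row[4].trim().toLowerCase(Locale.ROOT);
                localeToFormat.put(language + "_" + country, row[6].trim());
            }
        }
        return localeToFormat;
    }

    public static void main(String[] args) throws IOException {
        String csv = "ISO 3166 Country Code,ISO639-2 Country Code,Country,ISO 3166 Country Code,ISO639-2 Lang,Language,Date Format\n"
                + "ALB,AL,Albania,sqi,sq,Albanian,yyyy-MM-dd\n"
                + "ARE,AE,United Arab Emirates,ara,ar,Arabic,dd/MM/yyyy\n";
        Map<String, String> formats = load(new StringReader(csv));
        System.out.println(formats.size() + " entries, sq_al -> " + formats.get("sq_al"));
    }
}
```

In the AEM code, the InputStream from the rendition could be wrapped in an InputStreamReader with StandardCharsets.UTF_8 and passed to load. Keep in mind that rows producing the same key overwrite each other in a HashMap, so the map can legitimately hold fewer entries than the file has lines.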
I have a file where I save keys that each have several entries (a one-to-many relation).
From this file I need to extract values by searching for a key.
I found a utility (java.util.Properties) to handle properties files in Java.
It works very well for properties files.
But it returns only one occurrence of the searched key.
Since this utility exists for properties files, I expect that a version allowing multiple results already exists too.
Is there a solution that returns an array of strings for the searched key?
Properties is backed by a Hashtable so the key must be unique.
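A quick way to see this is to load a text with a repeated key and ask for it back. This is only a small demonstration sketch: the class name PropertiesDuplicateKeyDemo and the key "color" are made up, and the text is loaded from a string rather than a file to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo: a repeated key leaves a single value in Properties.
final class PropertiesDuplicateKeyDemo {

    // Loads the given properties text and returns the one value kept for the key.
    static String valueFor(String propertiesText, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        String text = "color=red\ncolor=blue\n";
        // Each loaded entry is put into the table, so a later entry replaces an earlier one.
        System.out.println(valueFor(text, "color"));
    }
}
```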
So if you want to stick to multiple instances of the same key you can implement the parsing yourself (if you don't depend too much on the extras managed by Properties):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class FileInput {
public static Map<String, List<String>> loadProperties(String file) throws IOException
{
HashMap<String, List<String>> multiMap = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
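String line;
while ((line = br.readLine()) != null) {
line = line.trim();
if (line.isEmpty() || line.startsWith("#"))
continue; // skip blank lines and comment lines
String[] parts = line.split("=", 2); // limit of 2 keeps any '=' inside the value
if (parts.length < 2)
continue; // skip lines that have no key=value separator
multiMap.computeIfAbsent(parts[0].trim(), k -> new ArrayList<String>()).add(parts[1].trim());
}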
}
return multiMap;
}
public static void main(String[] args) throws IOException {
Map<String,List<String>> result=loadProperties("myproperties.properties");
}
}
UPDATED: improved exception handling (valid remark by @rzwitsersloot). I prefer to throw the exception so the caller can decide what to do if the properties file is missing.
For example, I am trying to find the text "abc" in a .csv file. It appears in column 6 of multiple rows, and I need to delete those rows.
I tried the code below. I am able to get the line/row numbers where the text "abc" is present in column 6, but the rows are not deleted.
import java.io.BufferedReader;
import java.io.*;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;
public class ReadExcel {
public static void main(String[] args) throws Exception{
String csvFile = "csv filelocation";
CSVReader reader = new CSVReader(new FileReader(csvFile));
List<String[]> allElements = reader.readAll();
String [] nextLine;
int lineNumber = 0;
while ((nextLine = reader.readNext()) != null) {
lineNumber++;
if(nextLine[5].equalsIgnoreCase("abc")){
System.out.println("Line # " + lineNumber);
allElements.remove(lineNumber);
}
}
For reading files in CSV format, I currently use the super-csv library. There are various examples available.
Let me know if you need help using it.
If you would rather keep using the opencsv library, here is a new example that writes the new content to a CSV file. I took inspiration from your example code.
List<String[]> allElements; /* This list will contain the lines that meet your criteria */
/*
...
*/
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"));
writer.writeAll(allElements);
writer.close();
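The elided part above is where the rows get filtered. In the question's code, readAll() has already consumed the whole file, so the later readNext() loop has nothing left to read; the filtering can instead be done directly on the list that readAll() returned. Below is a sketch of that step using only the JDK. The class and method names (RowFilter, removeRows) and the sample rows are made up, and it assumes the list is mutable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the filtering step between readAll() and writeAll().
final class RowFilter {

    // Removes every row whose cell at columnIndex equals text, ignoring case.
    // Rows too short to have that column are kept.
    static void removeRows(List<String[]> rows, int columnIndex, String text) {
        rows.removeIf(row -> row.length > columnIndex
                && row[columnIndex].trim().equalsIgnoreCase(text));
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by reader.readAll()
        List<String[]> allElements = new ArrayList<>(Arrays.asList(
                new String[]{"1", "a", "b", "c", "d", "abc"},
                new String[]{"2", "a", "b", "c", "d", "xyz"},
                new String[]{"3", "a", "b", "c", "d", "ABC"}));
        removeRows(allElements, 5, "abc"); // column 6 has index 5
        for (String[] row : allElements) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Using removeIf also avoids removing by position while iterating, which shifts the remaining indexes and makes allElements.remove(lineNumber) hit the wrong rows.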
I have a CSV file which looks like this:
http://gyazo.com/5dcfb8eca4e133cbeac87f514099e320.png
I need to figure out how I can read specific cells and update them in the file.
This is the code I am using:
import java.util.List;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import com.opencsv.*;
public class ReadCSV {
private static final char SEPARATOR = ';';
public static void updateCSV(String input, String output, String replace, int row, int col) throws IOException {
CSVReader reader = new CSVReader(new FileReader(input),SEPARATOR);
List<String[]> csvBody = reader.readAll();
csvBody.get(row)[col]=replace;
reader.close();
CSVWriter writer = new CSVWriter(new FileWriter(output),SEPARATOR,' ');
writer.writeAll(csvBody);
writer.flush();
writer.close();
}
public static void main(String[] args) throws IOException {
String source = "townhall_levels.csv";
String destination = "output.csv";
ReadCSV.updateCSV(source, destination, "lol", 1, 1);
}
}
In this code I am just trying to change A1 to "lol" as a test to see if it works, but I get the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at ReadCSV.updateCSV(ReadCSV.java:16)
at ReadCSV.main(ReadCSV.java:30)
How should I go about achieving my goal and fixing the error?
CSV File: www.forumalliance.net/townhall_levels.csv
You're using ; as the separator to parse the file, but your file uses ,. Also, using a space as the quote character doesn't make much sense. You should use " instead, since that's also what your file uses.
The values you're passing for row and col are 1 and 1. However, indexes start at 0, so cell A1 is row 0, column 0.
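Both points can be reproduced without opencsv. The sketch below is illustrative only: the class CellUpdateDemo, its methods, and the two sample lines are made up (the real file content is not shown here), and the parsing is a plain split with no quote handling. With the wrong separator each line becomes a single cell, which is why index 1 is out of bounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo of the separator and zero-based index issues.
final class CellUpdateDemo {

    // Splits each line on the separator (no quote handling, for illustration only).
    static List<String[]> parse(List<String> lines, String separator) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    // Replaces one cell, using zero-based row and column indexes.
    static void update(List<String[]> rows, int row, int col, String replacement) {
        if (row < 0 || row >= rows.size() || col < 0 || col >= rows.get(row).length) {
            throw new IllegalArgumentException("No cell at row " + row + ", column " + col);
        }
        rows.get(row)[col] = replacement;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("level,cost", "1,100"); // made-up sample data

        // Wrong separator: the whole line ends up in one cell
        System.out.println(parse(lines, ";").get(1).length);

        // Right separator, and A1 addressed as row 0, column 0
        List<String[]> rows = parse(lines, ",");
        update(rows, 0, 0, "lol");
        System.out.println(String.join(",", rows.get(0)));
    }
}
```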