csv parser reading headers - java

I'm working on a CSV parser and want to read the headers separately from the rest of the file.
Here is my code to read the CSV.
The current code reads everything in the file, but I need to read the headers separately.
Please help me with this.
public class csv {
private void csvRead(File file)
{
try
{
BufferedReader br = new BufferedReader( new FileReader(file));
String strLine = "";
StringTokenizer st = null;
File cfile=new File("csv.txt");
BufferedWriter writer = new BufferedWriter(new FileWriter(cfile));
int tokenNumber = 0;
while( (strLine = br.readLine()) != null)
{
st = new StringTokenizer(strLine, ",");
while(st.hasMoreTokens())
{
tokenNumber++;
writer.write(tokenNumber+" "+ st.nextToken());
writer.newLine();
}
tokenNumber = 0;
writer.flush();
}
}
catch(Exception e)
{
e.printStackTrace(); // e.getMessage() on its own discards the message
}
}
}

Commons CSV's CSVFormat has a withHeader() method. Called with no arguments, it makes the parser treat the first record as the header, so you can then read the file by header name.
CSVFormat format = CSVFormat.newFormat(',').withHeader();
After parsing,
Map<String, Integer> headerMap = dataCSVParser.getHeaderMap();
gives you all the headers, mapped to their column indexes.
public class CSVFileReaderEx {
public static void main(String[] args){
readFile();
}
public static void readFile(){
List<Map<String, String>> csvInputList = new CopyOnWriteArrayList<>();
List<Map<String, Integer>> headerList = new CopyOnWriteArrayList<>();
String fileName = "C:/test.csv";
CSVFormat format = CSVFormat.newFormat(',').withHeader();
try (BufferedReader inputReader = new BufferedReader(new FileReader(new File(fileName)));
CSVParser dataCSVParser = new CSVParser(inputReader, format); ) {
List<CSVRecord> csvRecords = dataCSVParser.getRecords();
Map<String, Integer> headerMap = dataCSVParser.getHeaderMap();
headerList.add(headerMap);
headerList.forEach(System.out::println);
for(CSVRecord record : csvRecords){
Map<String, String> inputMap = new LinkedHashMap<>();
for(Map.Entry<String, Integer> header : headerMap.entrySet()){
inputMap.put(header.getKey(), record.get(header.getValue()));
}
if (!inputMap.isEmpty()) {
csvInputList.add(inputMap);
}
}
csvInputList.forEach(System.out::println);
} catch (Exception e) {
System.out.println(e);
}
}
}

Please consider using Commons CSV. The library follows RFC 4180 (Common Format and MIME Type for Comma-Separated Values (CSV) Files), so it can read lines such as:
"aa,a","b""bb","ccc"
It is quite simple to use, with only a few classes. Here is a small sample, adapted from the documentation to the released 1.x API:
Parsing a CSV string that uses tabs as separators, '"' as an optional
value encapsulator, and '#' to start comments:
CSVFormat format = CSVFormat.newFormat('\t').withQuote('"').withCommentMarker('#');
Reader in = new StringReader("a\tb\nc\td");
List<CSVRecord> records = new CSVParser(in, format).getRecords();
Several predefined formats are also available as constants:
DEFAULT - Standard comma separated format as defined by RFC 4180.
EXCEL - Excel file format (using a comma as the value delimiter).
MYSQL - Default MySQL format used by the SELECT INTO OUTFILE and LOAD DATA INFILE operations.
TDF - Tabulation delimited format.
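To make the quoting rules behind that sample line concrete, here is a minimal stdlib-only sketch of how a single RFC 4180 style line can be split. QuotedLineSketch and parseLine are hypothetical names used only for illustration, and the sketch assumes one record per line (no embedded newlines), so it is not a replacement for the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: splits ONE line, assuming no embedded newlines.
final class QuotedLineSketch {

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // a doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"aa,a\",\"b\"\"bb\",\"ccc\""));
    }
}
```

A plain split(",") would cut the first field in two, which is the main reason to prefer a real parser.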

Have you considered OpenCSV?
There is a previous question about it: CSV API for Java
It looks like you can split out the header quite easily:
String fileName = "data.csv";
CSVReader reader = new CSVReader(new FileReader(fileName ));
// if the first line is the header
String[] header = reader.readNext();
// iterate over reader.readNext until it returns null
String[] line = reader.readNext();

Your loop here,
while( (strLine = br.readLine()) != null)
{
//reads everything in your csv
}
reads every line of the CSV, header included.
With Commons CSV you can name the columns yourself, for example:
Reader in = ...;
CSVFormat.EXCEL.withHeader("Col1", "Col2", "Col3").parse(in);
Calling withHeader() with no arguments instead takes the column names from the first record.
As suggested, life is easier with the predefined CSVFormat constants from the Apache Commons CSV library; see the user guide (https://commons.apache.org/proper/commons-csv/user-guide.html).
Cheers.
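If a library is not an option, the header can also be kept apart with plain java.io by consuming the first line before the loop starts. The following is only a sketch: HeaderSplitSketch and its fields are hypothetical names, and the naive comma split assumes there are no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: header and data rows are stored separately.
final class HeaderSplitSketch {
    String[] header = new String[0];
    final List<String[]> rows = new ArrayList<>();

    static HeaderSplitSketch read(Reader source) {
        HeaderSplitSketch result = new HeaderSplitSketch();
        try (BufferedReader br = new BufferedReader(source)) {
            // Read the first line on its own so it never enters the data loop.
            String first = br.readLine();
            if (first != null) {
                result.header = first.split(",", -1);
            }
            String line;
            while ((line = br.readLine()) != null) {
                // Naive split; the -1 limit keeps trailing empty cells.
                result.rows.add(line.split(",", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        HeaderSplitSketch csv = read(new StringReader("name,age\nAnn,30\nBob,"));
        System.out.println(Arrays.toString(csv.header));
        for (String[] row : csv.rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For a real file, pass a FileReader (or a reader with an explicit charset) instead of the StringReader used here.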

Related

Java - Apache Beam: Read file from GCS with "UCS2-LE BOM" encoding

I want to read a file encoded as UCS2-LE with a BOM using TextIO, but it doesn't seem to work.
Is there a way to use TextIO with this encoding? Or is there another library that handles this type of encoding well?
My code is in Java (Apache Beam):
PCollection<KV<String, String>> csvElements =
pipeline.apply("Reads the input csv file", TextIO
.read()
.from(options.getPolledFile()))
.apply("Read File", ParDo.of(new DoFn<String, KV<String,String>>(){
@ProcessElement
public void processElement(ProcessContext c) throws UnsupportedEncodingException {
String element = c.element();
String elStr = new String(element.getBytes(),"UTF-16LE");
c.output(elStr);}}));
I found a solution in a Medium post.
The file I am reading is stored in GCS, hence the extra lines in the try block (compared to the original code).
String file = "path to GCS file";
PCollection<String> readCollection = pipeline.apply(FileIO.match().filepattern(file))
.apply(FileIO.readMatches())
.apply(FlatMapElements
.into(strings())
.via((FileIO.ReadableFile f) -> {
List<String> result = new ArrayList<>();
try {
ReadableByteChannel byteChannelParse = f.open();
InputStream inputStream = Channels.newInputStream(byteChannelParse);
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "UTF-16"));
String line = br.readLine();
while (line != null) {
result.add(line);
line = br.readLine();
}
br.close();
inputStream.close();
}
catch (IOException e) {
throw new RuntimeException("Error while reading", e);
}
return result;
}));
P.S.: I didn't add a line with credentials because I pass them in through the IntelliJ run parameters.
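The decoding step of that solution can be tried without Beam or GCS. The sketch below builds bytes the way a UTF-16LE file with a BOM would look and reads them back through an InputStreamReader with the UTF-16 charset, which uses the BOM to pick the byte order and drops it from the output. Utf16BomSketch and its methods are hypothetical names. As far as I know TextIO decodes lines as UTF-8, which would explain why re-encoding an already decoded String, as in the question, cannot recover the text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone demo of the decoding used in the answer above.
final class Utf16BomSketch {

    // Imitates a "UCS-2 LE BOM" file: BOM bytes FF FE followed by UTF-16LE text.
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFF;
        out[1] = (byte) 0xFE;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Decodes raw bytes with the UTF-16 charset, which honours the BOM.
    static List<String> readLines(byte[] raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(raw), StandardCharsets.UTF_16))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readLines(encodeWithBom("a,b\nc,d")));
    }
}
```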

Combined Xml String Split Java

I am trying to split a combined text file that contains multiple XML files. I want to split on <?xml version='1.0'?>, which marks the start of every new XML document inside the combined file. I'm not sure what the best way to do this is. Currently this is what I have, which does not split correctly.
Updated, working code (fixed the quotes-within-quotes problem and added Pattern.quote):
Scanner scanner = new Scanner( new File("src/main/resources/Flume_Sample"), "UTF-8" );
String combinedText = scanner.useDelimiter("\\A").next();
scanner.close(); // Put this call in a finally block
String delimiter = "<?xml version=\"1.0\"?>";
String[] xmlFiles = combinedText.split("(?="+Pattern.quote(delimiter)+")");
for (int i = 0; i < xmlFiles.length; i++){
File file = new File("src/main/resources/output_"+i);
FileWriter writer = new FileWriter(file);
writer.write(xmlFiles[i]);
System.out.println(xmlFiles[i]);
writer.close();
}
The split method takes a regular expression string, so you should escape your delimiter String to turn it into a valid regex:
String[] xmlFiles = combinedText.split(Pattern.quote(delimiter));
See the Pattern.quote method.
Also be aware that this approach loads the entire input file into memory.
A streamed approach would perform better if the input file is large.
I would use something like this if you want to parse the data manually.
public static void parseFile(File file) {
if (file == null) {
return;
}
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String s;
while ((s = br.readLine()) != null) {
if (s.contains("<?xml version='1.0'?>")) {
// Start writing to a new file here, e.g. with a StringBuilder and a FileWriter.
}
}
} catch (IOException e) {
System.out.println(e);
}
}
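The streamed idea from the answers above can be sketched as follows. It reads one line at a time and closes the current document whenever a new declaration line appears. XmlSplitSketch is a hypothetical name, the sketch assumes every declaration starts at the beginning of a line, and it collects the documents in a list only to stay short; with a large input each finished document would be written to its own file instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assumes each XML declaration starts a line.
final class XmlSplitSketch {
    static final String DECLARATION = "<?xml version=\"1.0\"?>";

    static List<String> split(Reader source) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                // A new declaration ends the document collected so far.
                if (line.startsWith(DECLARATION) && current.length() > 0) {
                    documents.add(current.toString());
                    current.setLength(0);
                }
                current.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current.length() > 0) {
            documents.add(current.toString()); // last document has no successor
        }
        return documents;
    }

    public static void main(String[] args) {
        String combined = DECLARATION + "\n<a/>\n" + DECLARATION + "\n<b/>\n";
        System.out.println(split(new StringReader(combined)).size());
    }
}
```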

reading CSV file till particular column numbers

I am reading a CSV file with the method below:
public ArrayList<String> fileRead(File f) throws IOException {
FileReader fr = new FileReader(f);
BufferedReader br = new BufferedReader(fr);
ArrayList<String> CSVData = new ArrayList<String>();
String text;
try {
while ((text = br.readLine()) != null) {
CSVData.add(text);
log.debug(text);
}
log.info(f + ": Read successfully");
br.close();
fr.close();
} catch (IOException e) {
log.error("Error in reading file " + e.getMessage());
}
return CSVData;
}
but I want to read the file only up to a defined column number, e.g. up to the 20th column.
If there is an empty cell in some column along the way, the code above seems to stop at (text = br.readLine()) != null.
So my question is: how can I read a CSV file up to a particular column, whether or not the cells are empty, so that the reader moves on to the next line after that column (for example, the 20th)?
Thanks in advance for your help and support.
You should use uniVocity-parsers' column selection feature to process your file.
Here's an example:
CsvParserSettings settings = new CsvParserSettings();
settings.selectFields("Foo", "Bar", "Blah");
// or if your file does not have a row with column headers, you can use indexes:
settings.selectIndexes(4, 20, 2);
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(f));
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
Do NOT try to parse a CSV by yourself. There are many intricacies involved. For example, code such as the loop you pasted:
while ((text = br.readLine()) != null) {
will break as soon as your CSV has values that contain newline characters.
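To see that problem concretely, here is a small stdlib-only sketch (EmbeddedNewlineSketch is a hypothetical name). The input holds two records, a header and one data row whose quoted value contains a line break, yet reading it line by line produces three pieces.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo: counts physical lines, which differ from CSV records.
final class EmbeddedNewlineSketch {

    static List<String> physicalLines(String csv) {
        // The StringReader holds no OS resource, so it is left unclosed here.
        return new BufferedReader(new StringReader(csv))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two records: the header, and one row with a quoted two-line value.
        String csv = "id,note\n1,\"first line\nsecond line\"\n";
        for (String line : physicalLines(csv)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

The quoted value is cut in half, so any per-line column counting sees the wrong number of cells.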

Java : Writing CSV in String format to CSV in a file

A method returns a String in comma separated format. For example, the returned String can be like the one given below.
Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China
I will need to get this String and write it into a CSV file. I'll have to insert a header and a footer for this file as well.
For example, when I open the file, the contents for the above data will be
Name,Age,Gender,Country
Tarantino,50,M,USA
Carey Mulligan,27,F,UK
Gong Li,45,F,China
How do we do that? Are there any open source libraries for this task?
The CSV format is not strictly defined, and a header row is optional. It is a pretty simple format: data values are separated by commas, semicolons, spaces and so on.
You just have to write your own simple method that writes your string to a local file using a FileOutputStream or a Writer from the java.io package.
You can use this as a learning example.
I used a BufferedReader because it takes care of the line separators, but you can also use the String#split method and write the resulting tokens.
import java.io.*;
public class Tests {
public static void main(String[] args) {
File file = new File("out.csv");
BufferedWriter out = null;
try {
out = new BufferedWriter(new FileWriter(file));
String string = "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China";
// A StringReader avoids a round trip through the platform default charset.
BufferedReader reader = new BufferedReader(new StringReader(string));
String line;
while ((line = reader.readLine()) != null) {
out.write(line.trim());
out.newLine();
}
}
catch (IOException e) {
// log something
e.printStackTrace();
}
finally {
if (out != null) {
try {
out.close();
} catch (IOException e) {
// ignored
}
}
}
}
}
This is pretty simple:
String str = "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China";
PrintWriter pr = new PrintWriter(new FileWriter(new File("test.csv"), true));
String[] arr = str.split("\\n");
// split the string on the newlines it contains
pr.println("Name,Age,Gender,Country");
// header is written first, then the data rows are appended
for(String s : arr){
pr.println(s.trim()); // trim the space that follows each \n in the sample
}
pr.close();
Don't forget to close the stream in a finally block and to handle the exception.
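The same idea can be written against a plain Writer, which makes it easy to try out with a StringWriter before pointing it at a file. This is only a sketch: CsvStringWriterSketch is a hypothetical name, the line ending is fixed to \n, and no footer is written because the question does not say what the footer should contain.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper: writes a header line followed by the trimmed data rows.
final class CsvStringWriterSketch {

    static void write(Writer out, String header, String data) {
        try {
            out.write(header);
            out.write('\n');
            for (String row : data.split("\n")) {
                out.write(row.trim()); // drop the space after each \n in the sample
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        write(sw, "Name,Age,Gender,Country",
                "Tarantino,50,M,USA\n Carey Mulligan,27,F,UK\n Gong Li,45,F,China");
        System.out.print(sw);
    }
}
```

To write a real file, open a FileWriter or Files.newBufferedWriter in a try-with-resources statement and pass it as the first argument, so the stream is closed even when an exception is thrown.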

Storing single csv line object into arrays

Hey, I've got a chunk of code here that tries to read a single line from a .csv file:
rows = new WarehouseItem[];
public void readCSV(String filename) {
FileInputStream fileStrm = null;
InputStreamReader rdr;
BufferedReader bufRdr;
int lineNum;
String line;
try {
fileStrm = new FileInputStream(filename);
rdr = new InputStreamReader(fileStrm);
bufRdr = new BufferedReader(rdr);
numRows = 0;
line = bufRdr.readLine();
while (line != null) {
rows[numRows] = line;
numRows++;
line = bufRdr.readLine();
}
fileStrm.close();
}
catch (IOException e) {
if (fileStrm != null) {
try {
fileStrm.close();
} catch (IOException ex2) {}
}
System.out.println("Error in file processing: " + e.getMessage());
}
}
On the rows[numRows] = line statement I'm trying to store the line in an array of objects (I have already written a class that contains an array of strings and the number of columns).
I'm not entirely sure how to store the single line I've read in my object.
Any help would be really appreciated :)
Your life would be an awful lot easier if you used a CSV library to do this. With Jackson it's really simple to read CSV into an array of objects.
For example:
CsvMapper mapper = new CsvMapper();
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
File csvFile = new File("input.csv"); // or from String, URL etc
// readerFor replaces the deprecated reader(Class) overload
MappingIterator<Object[]> it = mapper.readerFor(Object[].class).readValues(csvFile);
See here for more info on parsing CSV in java: http://demeranville.com/how-not-to-parse-csv-using-java/
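For the original question, the missing step is that a String cannot be assigned to an element of an object array; each line has to be turned into an object first. Below is a minimal stdlib-only sketch of that step. Row is a hypothetical stand-in for the asker's WarehouseItem class (an array of strings plus a column count), a List replaces the fixed-size array, and the naive comma split assumes there are no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class WarehouseRowSketch {

    // Hypothetical stand-in for WarehouseItem: the cells plus their count.
    static final class Row {
        final String[] columns;
        final int numColumns;

        Row(String line) {
            this.columns = line.split(",", -1); // -1 keeps trailing empty cells
            this.numColumns = columns.length;
        }
    }

    static List<Row> read(Reader source) {
        List<Row> rows = new ArrayList<>(); // grows as needed, unlike an array
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                rows.add(new Row(line)); // build an object from the line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = read(new StringReader("1,bolt,20\n2,nut,"));
        System.out.println(rows.size() + " rows, first has " + rows.get(0).numColumns + " columns");
    }
}
```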
