I would like to be able to take 2 csv files as input, join them (SQL style) on a specific column, and output a new csv file which contains all file1 data, plus one of the columns of data from file2.
Any tips on what the best way to achieve this would be? Since SQL offers a join command, some method of treating the CSV files as databases might work well, but I'm open to all suggestions really - the easiest wins.
All help is much appreciated!
Do some simple file I/O: split each line and load the rows into a Set-type container. Then you can do set-type operations on the contents of the two files:
http://www.java2s.com/Code/Java/Collections-Data-Structure/Setoperationsunionintersectiondifferencesymmetricdifferenceissubsetissuperset.htm
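To make that concrete, here is a minimal sketch of the map-based variant of this idea: load one file into a lookup keyed on the join column, then stream the other file through it. The column indexes, sample data, and the assumption of plain comma-separated lines with no quoted fields are all placeholders for your real inputs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvJoin {
    // Joins rows from two lists of CSV lines on the given key columns (0-based),
    // appending one chosen column of file2 to each matching file1 row.
    // Assumes simple comma-separated lines with no quoted fields.
    static List<String> join(List<String> file1, int key1,
                             List<String> file2, int key2, int pickCol) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : file2) {
            String[] cols = line.split(",");
            lookup.put(cols[key2], cols[pickCol]);
        }
        List<String> out = new ArrayList<>();
        for (String line : file1) {
            String[] cols = line.split(",");
            String extra = lookup.get(cols[key1]);
            if (extra != null) {            // inner join: unmatched rows are dropped
                out.add(line + "," + extra);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Inline sample data; in practice read these with Files.readAllLines(...)
        List<String> employees = List.of("1,alice", "2,bob");
        List<String> loans = List.of("1,100.0", "2,250.0");
        join(employees, 0, loans, 0, 1).forEach(System.out::println);
    }
}
```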
You can parse your CSV files and bind them to beans with opencsv:
http://opencsv.sourceforge.net/
Here you can see how to bind the entities in a CSV to a list of beans:
http://opencsv.sourceforge.net/#javabean-integration
You can then do whatever you want with the lists of beans programmatically, such as appending the lists to each other, applying join-like logic, etc.
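As an illustration of that join-like logic on lists of beans (with the bean binding that opencsv automates replaced by hand-written classes so the sketch stays self-contained), the Employee and Loan classes and their fields below are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BeanJoin {
    // Hypothetical beans; opencsv's bean binding would populate these from CSV.
    static class Employee {
        int empid; String name;
        Employee(int empid, String name) { this.empid = empid; this.name = name; }
    }
    static class Loan {
        int empid; double balance;
        Loan(int empid, double balance) { this.empid = empid; this.balance = balance; }
    }

    // Join-like logic on two lists of beans, keyed on empid.
    static List<String> join(List<Employee> emps, List<Loan> loans) {
        Map<Integer, Double> balanceById = new HashMap<>();
        for (Loan l : loans) balanceById.put(l.empid, l.balance);
        List<String> rows = new ArrayList<>();
        for (Employee e : emps) {
            Double bal = balanceById.get(e.empid);
            if (bal != null) rows.add(e.empid + "," + e.name + "," + bal);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(new Employee(1, "alice"), new Employee(2, "bob"));
        List<Loan> loans = List.of(new Loan(1, 500.0));
        join(emps, loans).forEach(System.out::println);  // prints 1,alice,500.0
    }
}
```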
A very simple, non-programmatic approach: import both text files into a spreadsheet, then use vlookup (or its equivalent) to look up values from one sheet into the other.
For direct manipulation of CSV files as SQL tables see:
Reading a CSV file into Java as a DB table
You might also try to use a JDBC driver for CSV files like this one:
http://sourceforge.net/projects/csvjdbc/
I have written a command-line program to execute arbitrary SQL on CSV files, including multi-file joins, called gcsvsql. You can read about it here:
http://bayesianconspiracy.blogspot.com/2010/03/gcsvsql.html
There is a Google Code project for it here: http://code.google.com/p/gcsvsql/
It's written in Java/Groovy, and will run anywhere Java is available.
I need to output data to a CSV file from Java, but in that csv file I hope to create multiple sheets so that data can be organized in a better way. After some googling, it seems this is not possible. A CSV file can only receive one-sheet data.
Is this true? If yes, what would be the options? Thank you.
A CSV file is interpreted as a sequence of characters that complies with a simple standard, therefore it cannot contain more than one sheet. You can output your data to an Excel file that contains more than one sheet using the Apache POI API.
Comma Separated Value lists are generally created in plain text files, which do not have multiple pages. What you could do instead is create multiple CSV files, but if you really have a lot of data, a data base might be your best bet.
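A minimal sketch of the multiple-files approach: write each logical "sheet" to its own CSV file, using the file name as the sheet name. The directory, sheet names, and sample rows here are just examples:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

public class MultiCsvWriter {
    // One CSV file per logical "sheet": the file name stands in for the sheet name.
    static void writeSheets(Map<String, List<String>> sheets, Path dir) throws IOException {
        Files.createDirectories(dir);
        for (Map.Entry<String, List<String>> sheet : sheets.entrySet()) {
            Files.write(dir.resolve(sheet.getKey() + ".csv"), sheet.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        writeSheets(Map.of(
                "employees", List.of("empid,name", "1,alice"),
                "loans", List.of("empid,balance", "1,500.0")),
                Path.of("out"));  // produces out/employees.csv and out/loans.csv
    }
}
```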
I have two csv files employee.csv and loan.csv.
In employee.csv I have four columns i.e. empid(Integer),name(String),age(Integer),education(String).
In loan.csv I have three columns i.e. loan(Double),balance(Double),empid(Integer).
Now, I want to merge these two CSV files into a single CSV file, joined on the empid column. So in the result.csv file the columns should be:
empid(Integer),
name(String),
age(Integer),
education(String),
loan(Double),
balance(Double).
Also, I have to achieve this only by using the Kettle API from a Java program.
Can anyone please help me?
First of all, you need to create a Kettle transformation as below:
Take two "CSV Input" steps, one for employee.csv and another for loan.csv
Hop both inputs into a "Stream Lookup" step and look up on the "empid" field
Final step: take a "Text file output" step to generate the CSV file output
I have placed the .ktr code here.
Secondly, if you want to execute this transformation using Java, I suggest you read this blog. I have explained how to execute a .ktr/.kjb file using Java.
Extra points:
If the names of the CSV files need to be passed as parameters from the Java code, you can do that by adding the code below:
trans.setParameterValue(parameterName, parameterValue);
where parameterName is the variable name
and parameterValue is the name of the file or its location.
I have already taken the file names as parameters in the Kettle code I have shared.
Hope it helps :)
I have data spread across different CSV files,
like id,name,birthdate in one file and id,address in another.
This is just an example; the user has to specify the columns, as is done when using SSIS,
and what I want to do is create a combined file which has the whole content:
id,name,birthdate,address
Are there any tools available in Java/Ruby for this?
I have seen the sed solution but cannot go with it, as the columns are not fixed.
In short, I want the E and T functions from ETL.
Do you need Java or Ruby? Have you looked at the Unix join utility instead? It's analogous to the SQL join statement, except it works on text files.
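For example (the file names and columns below are made up, and join(1) requires both inputs to be sorted on the join field):

```shell
# Sample inputs - placeholders for your real files
printf '1,alice,1990-01-01\n2,bob,1985-05-05\n' > people.csv
printf '1,12 Main St\n2,9 Elm St\n' > addresses.csv

# join(1) needs both inputs sorted on the join field
sort -t, -k1,1 people.csv > people.sorted
sort -t, -k1,1 addresses.csv > addresses.sorted

# -t, sets the comma delimiter; by default the join key is field 1 of each file
join -t, people.sorted addresses.sorted
```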
I would like to know how to export the contents of a table, or the data from a query, to an Excel file. Also, which file extension is better to export to, xls or csv?
Thanks in advance.
Edit: What I want is for the user to be able to export the contents of a JTable - containing the results from a query - to an Excel-compatible file by pressing a button.
I don't know the best way to do it; I found various ways but I'm not sure which one to follow. Is it possible to generate a JasperReport and then export the same data to Excel?
Edit 2: OK, so I decided to export to .csv like most of you suggest. My last question is which one is better to use, opencsv or javacsv? Both seem really easy to use.
Thanks!
Exporting to CSV is easier - and could be done manually in a pinch, depending on the data (each new row is a new line, and cell values are separated by commas). There are open-source libraries available for this (http://opencsv.sourceforge.net/), and the code for copying a ResultSet to your output should be trivial.
If you absolutely need Excel, use the Apache POI library.
You have to create a text file (CSV) and write the result from the database:
PrintWriter out
    = new PrintWriter(new BufferedWriter(new FileWriter("foo.csv")));
while (rs.next())
{
    out.println(String.format("%s,%s,%s",
            rs.getString(1), rs.getString(2), rs.getString(3)));
}
out.close();
In addition to the answers already given, I would like to say that I would prefer CSV.
CSV is application-agnostic and you could manipulate the data later on with any other language/program (Python, R, Java, Excel, etc).
I had good success with jXLS:
http://jxls.sourceforge.net/
This lets you use JSP-like tags in a native Excel template with all the formatting etc. You pass data to substitute into that Excel template from Java API calls, via a Map structure (analogous to request-scope vars).
This is a good lighter-weight alternative to JasperReports if you just want formatted Excel output.
I've found numerous posts about reading CSV with Java and the APIs they were pointing at all had a line-oriented approach when it came to reading a CSV file. Something like "while you get a line, get the values of every column".
Are there better ways to do that?
Thanks for any suggestions!
You will need a database, whether you write your own or use a third-party one.
Otherwise you will be doing sequential searches on your data to find anything.
You might want to look at this post: Reading a CSV file into Java as a DB table
It looks like you have all the info you need.
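If a full database is overkill, the sequential-search problem can also be avoided with an in-memory index. Here is a sketch, with made-up columns, of indexing parsed rows by one column so each lookup is a hash probe instead of a scan of the whole file:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvIndex {
    // Index CSV rows by one column so lookups don't rescan every line.
    // Assumes simple comma-separated lines with no quoted fields.
    static Map<String, String[]> indexBy(List<String> lines, int keyCol) {
        Map<String, String[]> index = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            index.put(cols[keyCol], cols);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,alice,30", "2,bob,25");
        Map<String, String[]> byId = indexBy(rows, 0);
        System.out.println(byId.get("2")[1]);  // prints bob
    }
}
```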