I have a .csv file with 177 rows and about 18,000 columns. Given a column label, I need to pick out that particular column and, by default, the first two label columns as well.
Please help me with this,
Thanks all,
Priya
So, what's the question? Parse the CSV file. You can either implement the parsing yourself or use third-party code.
If you implement it yourself, read the file line by line, split each line into elements with line.split(",") (note that this simple split does not handle quoted fields containing commas), and put them into a data structure, which should be a map of lists:
Map<String, List<String>> table = new LinkedHashMap<String, List<String>>();
Use the column name as the key and the column's values as the list elements.
LinkedHashMap is preferable here to preserve the order of your columns.
Read the first line, which contains the column names, and create a list for each column:
table.put(columnName, new LinkedList<String>());
Additionally, create an array of the column names (Map has no keys() method; use keySet()):
String[] columns = table.keySet().toArray(new String[0]);
Now continue iterating over your data and populate your table:
String[] data = line.split(",");
for (int i = 0; i < data.length; i++) {
table.get(columns[i]).add(data[i]);
}
TBD...
Good luck.
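Putting the answer's steps together, a minimal sketch of the original task (the first two columns plus one column chosen by label) might look like the following. It assumes a plain comma-separated file with a header line and no quoted fields, since line.split(",") does not understand quotes; ColumnSelector, readTable and select are illustrative names, not from any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ColumnSelector {

    // Builds column name -> column values, preserving column order.
    // Assumes the first line is the header and no field contains a quoted comma.
    static Map<String, List<String>> readTable(List<String> lines) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        String[] columns = lines.get(0).split(",");
        for (String name : columns) {
            table.put(name, new ArrayList<>());
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] data = line.split(",", -1); // -1 keeps trailing empty fields
            for (int i = 0; i < data.length && i < columns.length; i++) {
                table.get(columns[i]).add(data[i]);
            }
        }
        return table;
    }

    // Returns the first two columns plus the column with the given label.
    static Map<String, List<String>> select(Map<String, List<String>> table, String label) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String name : table.keySet()) {
            if (result.size() == 2) {
                break;
            }
            result.put(name, table.get(name));
        }
        if (!table.containsKey(label)) {
            throw new IllegalArgumentException("No such column: " + label);
        }
        result.put(label, table.get(label));
        return result;
    }
}
```

The lines could come from java.nio.file.Files.readAllLines(Paths.get("data.csv")), where the file name is a placeholder. If you only ever need three columns, you could instead look up their three indexes in the header and skip every other field while reading.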
Have you looked into OpenCSV?
You could use OpenCSV.
I have been asked to create a word vector space from a CSV file, so I need to extract words and their vectors (of size 57) into a dictionary so that I can reuse it in later operations.
The format of my CSV file is giving me a lot of problems: it is basically text with keys and doubles, all separated by spaces, and so far I have not managed to separate the string and double parts correctly.
Do you have any idea how to parse this file into a dictionary with (key, vector) entries?
Thanks a lot.
Here is a sample of the CSV file:
key1 4.0966564 7.963437 -2.1844673 1.9319566 -0.04495791 2.454401 3.1006012 -0.3813638 1.567303 -2.2067556 3.44506744 -4.382278 4.1457844 2.342756 -2.7707205 3.5015 2.5717492 -2.6846366...
key2 -3.968007 0.86151505 0.06163538 1.918614 0.34340435 -1.5178788 1.3857365 0.230331 0.7025755 -2.6575062 -0.7426953 3.1636698 2.8441591 0.4522623 3.3907628 2.425691 -1.2052362....
.
.
.
This data structure is called a multi-map: a key can have multiple values.
You can find implementations in libraries, for example Guava's Multimap or Apache Commons Collections' MultiValuedMap.
If you'd rather not have the dependency, and wish to write your own, it might look like this:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiMap {
    private final Map<String, List<Double>> multi = new HashMap<>();

    // Appends newValue to the list stored under key, creating the list on first use.
    public void put(String key, Double newValue) {
        if (newValue != null) {
            multi.computeIfAbsent(key, k -> new ArrayList<>()).add(newValue);
        }
    }
}
It's possible to use generics, but I'm too lazy to bother right now. This example is correct for your narrow use case.
Split each line into tokens by splitting at regex "\\s+". The first value is the key; iterate over all the others to add them to the multi-map.
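A minimal sketch of that loop, assuming each line is a key followed by whitespace-separated numbers. It stores the vector as a List<Double> under the key via computeIfAbsent, which plays the same role as MultiMap.put above; VectorFileParser and parse are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VectorFileParser {

    // Parses lines of the form "key v1 v2 ... vn" into key -> [v1, ..., vn].
    static Map<String, List<Double>> parse(List<String> lines) {
        Map<String, List<Double>> vectors = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+"); // any run of whitespace separates tokens
            List<Double> values = vectors.computeIfAbsent(tokens[0], k -> new ArrayList<>());
            for (int i = 1; i < tokens.length; i++) {
                values.add(Double.parseDouble(tokens[i]));
            }
        }
        return vectors;
    }
}
```

If every vector really has exactly 57 entries, storing a double[] per key instead of a List<Double> would avoid boxing each number.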
You can do something like this:
import java.util.ArrayList;
import java.util.List;

String line = "key1 4.0966564 7.963437";
String[] parts = line.trim().split("\\s+"); // split on any run of whitespace, not just one space
String key = parts[0];
List<Double> values = new ArrayList<>();
for (int i = 1; i < parts.length; i++) {
    values.add(Double.valueOf(parts[i]));
}
Then put the key and its values into your map.
I have some CSV files that I want to transform into JSON.
Unfortunately, the structure of the CSV doesn't match the desired JSON format: a) CSV is a flat structure, whereas the JSON should be nested; b) the column headers don't match the JSON property names.
A minimal illustrative CSV:
ColumnNameX,ColumnNameY,ColumnNameZ
valueX,valueY,valueZ
should be transformed into this JSON:
{
  "XZObject": {
    "absurdlyNotNamedLikeCsvHeading": "valueX",
    "AlsoNOTColumnNameZ": "valueZ"
  },
  "YyyyyWhy": {
    "ThisResemblesColumnNameY": "valueY"
  }
}
I would naively create some POJO classes representing the structure and fill in the values by position, like so (pseudocode):
class Container {Fields:XZObject,YyyyyWhy}
class XZObject {Fields:absurdlyNotNamedLikeCsvHeading,AlsoNOTColumnNameZ}
class YyyyyWhy {Fields:ThisResemblesColumnNameY}
new XZObject(absurdlyNotNamedLikeCsvHeading=csvLineElements[0],AlsoNOTColumnNameZ=csvLineElements[2])
new YyyyyWhy(ThisResemblesColumnNameY=csvLineElements[1])
new Container(XZObject,YyyyyWhy)
…then I'd serialize the Container object to JSON with Gson.
The problem is that when a field gets added to the CSV schema, I'd have to adjust every positional mapping after the new column.
So I wonder: is there a simple way to map CSV columns by header to a specific JSON property, preferably with the Gson library?
In other words, can I, for example, map the value in the column with header "ColumnNameZ" to the property "XZObject.AlsoNOTColumnNameZ" in a simple way?
I think parsing the CSV file into objects is a good way to go.
You can read the first row (the header) first, split it, and determine the index of each column at runtime. Then it doesn't matter if you add, remove, or shuffle columns.
Assuming you read the first line and you have
String firstRow = "ColumnNameX,ColumnNameY,ColumnNameZ";
Parse it this way:
List<String> columnList = Arrays.asList(firstRow.split(","));
int COLUMN_NAME_X_INDEX = columnList.indexOf("ColumnNameX");
int COLUMN_NAME_Y_INDEX = columnList.indexOf("ColumnNameY");
int COLUMN_NAME_Z_INDEX = columnList.indexOf("ColumnNameZ");
Then use the indexes you found:
XZObject xzObject = new XZObject(csvLineElements[COLUMN_NAME_X_INDEX], csvLineElements[COLUMN_NAME_Z_INDEX]);
YyyyyWhy yyyyyWhy = new YyyyyWhy(csvLineElements[COLUMN_NAME_Y_INDEX]);
Container container = new Container(xzObject, yyyyyWhy);
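The same idea can be taken one step further so that no index is hard-coded at all: keep a table from CSV header to target property path and look each header up at runtime. This sketch builds nested LinkedHashMaps instead of POJOs; the MAPPING table, CsvToNested and convert are assumptions made up for illustration and are not part of Gson.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvToNested {

    // Hypothetical mapping: CSV header -> "parentObject.property" in the target JSON.
    static final Map<String, String> MAPPING = Map.of(
            "ColumnNameX", "XZObject.absurdlyNotNamedLikeCsvHeading",
            "ColumnNameY", "YyyyyWhy.ThisResemblesColumnNameY",
            "ColumnNameZ", "XZObject.AlsoNOTColumnNameZ");

    // Converts one data row into nested maps, looking each column up by its header name,
    // so adding, removing or reordering columns needs no index changes.
    static Map<String, Map<String, String>> convert(String headerLine, String dataLine) {
        String[] headers = headerLine.split(",");
        String[] values = dataLine.split(",", -1);
        Map<String, Map<String, String>> root = new LinkedHashMap<>();
        for (int i = 0; i < headers.length && i < values.length; i++) {
            String path = MAPPING.get(headers[i]);
            if (path == null) {
                continue; // column not mapped to any JSON property
            }
            int dot = path.indexOf('.');
            String parent = path.substring(0, dot);
            String property = path.substring(dot + 1);
            root.computeIfAbsent(parent, k -> new LinkedHashMap<>()).put(property, values[i]);
        }
        return root;
    }
}
```

With Gson on the classpath, new Gson().toJson(CsvToNested.convert(header, line)) should serialize these maps as nested JSON objects, with keys in the order the columns appear in the CSV.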
I have a test.csv file that is formatted as:
Home,Owner,Lat,Long
5th Street,John,5.6765,-6.56464564
7th Street,Bob,7.75,-4.4534564
9th Street,Kyle,4.64,-9.566467364
10th Street,Jim,14.234,-2.5667564
I have a hash map populated from a file that contains the same headers as the CSV, just in a different format and with no accompanying data.
For example:
Map<Integer, String> container = new HashMap<>();
where,
Key, Value
[0][NULL]
[1][Owner]
[2][Lat]
[3][NULL]
I have also created a second hash map as follows:
BufferedReader reader = new BufferedReader (new FileReader("test.csv"));
CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
Boolean headerParsed = false;
CSVRecord headerRecord = null;
int i;
Map<String,String> value = new HashMap<>();
for (final CSVRecord record : parser) {
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
for (i =0; i< record.size(); i++) {
value.put (headerRecord.get(0), record.get(0));
}
}
I want to compare the two hash maps: if the container map has a value that is also present in the value map, I put that value into a corresponding object.
Example object:
public DataSet (//args) {
this.home
this.owner
this.lat
this.longitude
}
I want to create a function that compares the two hash maps and, whenever a key of the value map matches a value of the container map, sets that value into the corresponding field of the object. Something really simple that is also efficient at setting the fields.
Please note: I kept the CSV header and rows small here; in real life, the CSV could have any number of fields (Home, Owner, Lat, Long, houseType, houseColor, etc.) and any number of values associated with those fields.
First off, your approach to this problem is unnecessarily long. From what I see, all you are trying to do is this:
Select two columns from a CSV file and add them to a data structure. I emphasize two because a map has a key and a value: one column becomes the key, and the other becomes the value.
What you should do instead:
Read the names of the two columns you wish to add to the data structure into two strings (you may read them from a file).
Iterate over the CSV file using the CSVParser class, as you already do.
For each record, store the value of the first desired column in a string, do the same for the second desired column, put both into a DataSet object, and add that object to a List<DataSet>.
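A rough sketch of these steps follows. To stay self-contained it splits lines with String.split(",") instead of using Apache Commons CSV's CSVParser, so it does not handle quoted fields; the nested DataSet is a minimal stand-in with two string fields, and TwoColumnReader and read are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoColumnReader {

    // Minimal stand-in for the answer's DataSet: one row, two chosen columns.
    static class DataSet {
        final String val1;
        final String val2;

        DataSet(String val1, String val2) {
            this.val1 = val1;
            this.val2 = val2;
        }
    }

    // Collects the values of columns col1 and col2 from every data row.
    static List<DataSet> read(List<String> lines, String col1, String col2) {
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int i1 = header.indexOf(col1);
        int i2 = header.indexOf(col2);
        if (i1 < 0 || i2 < 0) {
            throw new IllegalArgumentException("Unknown column: " + (i1 < 0 ? col1 : col2));
        }
        List<DataSet> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            rows.add(new DataSet(fields[i1], fields[i2]));
        }
        return rows;
    }
}
```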
If you prefer to stick to your way of solving the problem:
Basically, the file without data is supposed to hold just the headers (column names), which is why you named the corresponding hash map container. The second file is supposed to contain the values, which is why you named the corresponding hash map value.
First off, where you say
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
you probably mean to say
if (!headerParsed) {
headerRecord = record;
headerParsed = true;
}
and where you say
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(0), record.get(0));
}
you probably mean
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(i), record.get(i));
}
i.e. You iterate over one record and store the value corresponding to each column.
I haven't tried this code, but since the for loop also iterates over Home and Long, it will also store the columns you don't want (e.g. value.put("Home", "5th Street")). Add an extra check before calling value.put: wrap it in an if and check whether headerRecord.get(i) actually appears among the values of the container hash map.
for (i = 0; i < record.size(); i++) {
    if (container.containsValue(headerRecord.get(i))) {
        value.put(headerRecord.get(i), record.get(i));
    }
}
Now, the thing is that the data structure itself depends on which values from the container hash map you want to store. It could be Home and Lat, or Owner and Long. So we are stuck. How about creating a data structure like the one below:
class DataSet {
    String val1;
    String val2;
}
Also, note that this DataSet stores only ONE row. To store information from multiple rows, you need a list of DataSet objects, such as a LinkedList<DataSet>.
Lastly, the container file contains ALL the column names. Not all of these columns will be stored in the DataSet (you chose to NULL Home and Long; you could have chosen to NULL Owner and Lat), so the header file is not what you need to make this decision.
If you think about it, you can just iterate over the value hash map and, for each entry, store the key in val1 and the value in val2:
List<DataSet> myList = new ArrayList<>();
Iterator<Map.Entry<String, String>> it = value.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, String> pair = it.next();
    DataSet row = new DataSet(); // create a new object for each entry
    row.val1 = pair.getKey();
    row.val2 = pair.getValue();
    myList.add(row);
    it.remove(); // the entry is no longer needed in the map
}
I hope this helps.
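For the question's actual goal of filling an object with home, owner, lat and longitude fields, another option is to keep only the headers listed among the container map's values and set the matching field with a switch on the header name. This is a sketch under assumptions: the field names come from the question's example object, FieldSetter and read are made-up names, and it uses String.split rather than CSVParser.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class FieldSetter {

    // Fields taken from the question's example object; unset fields stay null.
    static class DataSet {
        String home, owner, lat, longitude;
    }

    // Builds one DataSet per data row, setting only the columns listed in wanted
    // (e.g. container.values() from the question's header map).
    static List<DataSet> read(List<String> lines, Collection<String> wanted) {
        String[] headers = lines.get(0).split(",");
        List<DataSet> result = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            DataSet row = new DataSet(); // a fresh object for every row
            for (int i = 0; i < headers.length && i < fields.length; i++) {
                if (!wanted.contains(headers[i])) {
                    continue; // column not requested by the container map
                }
                switch (headers[i]) {
                    case "Home": row.home = fields[i]; break;
                    case "Owner": row.owner = fields[i]; break;
                    case "Lat": row.lat = fields[i]; break;
                    case "Long": row.longitude = fields[i]; break;
                    default: break; // header has no matching field
                }
            }
            result.add(row);
        }
        return result;
    }
}
```

Adding a new field such as houseType then means adding one field and one case label, rather than changing any comparison logic.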
I have an array that was created from an ArrayList, which in turn was created from a ResultSet. The array contains the rows of a database table, and each row (with several columns, depending on my query) exists as a single element of the array. So far so good. My problem is how to get the individual values (columns) from each row, which, as I said, now exists as an element. I can get each element (a row, of course), but that is not what I want. Each element is a composite of several values; how do I get those? I am a beginner and really stuck here. I hope this all makes sense. Here's the code I used to create the array:
List resultsetRowValues = new ArrayList();
while (resultSet.next()){
for (int i = 1; i <= columnCount; i++) {
resultsetRowValues.add(resultSet.getString(i));
}
}
String[] databaseRows = (String[]) resultsetRowValues.toArray(new String[resultsetRowValues.size()]);
EDIT: More explanation
My MySQL query is as follows:
String query = "SELECT FIRSTNAME, LASTNAME, ADDRESS FROM SOMETABLE WHERE CITY='SOMECITY'";
This returns several rows in a ResultSet. According to the sample query, each element of the array should contain three values (columns), i.e. FIRSTNAME, LASTNAME and ADDRESS. But these three values exist in the array as a single element, whereas I want each column separately from each element (which is actually a row of the database table). When I iterate through the array using a for loop and print the values to the console, I get output similar to the following:
Doe
Jhon
Some Street (End of First element)
Smith
Jhon
Some Apartment (End of Second element and so on)
As is evident from the output, each element of the array contains three values, which are printed on separate lines.
How can I get these individual values?
You probably want something like this:
List<Map<String, String>> data = new ArrayList<>();
while (resultSet.next()){
Map<String, String> map = new HashMap<>();
for (int i = 1; i <= columnCount; i++) {
map.put("column" + i, resultSet.getString(i));
}
data.add(map);
}
// usage: data.get(2).get("column12") returns line 3 / column 12
Note that there are other possible options (a 2D array, Guava's Table, ...).
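If you would rather keep the flat list built in the question, note that its loop appends exactly columnCount values per row, in column order, so the rows can be recovered by slicing the list into chunks of columnCount. A small sketch, with RowGrouper and group as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

class RowGrouper {

    // Splits a flat list (row 1 col 1..n, row 2 col 1..n, ...) into one list per row.
    static List<List<String>> group(List<String> flat, int columnCount) {
        if (columnCount <= 0 || flat.size() % columnCount != 0) {
            throw new IllegalArgumentException("List size is not a multiple of columnCount");
        }
        List<List<String>> rows = new ArrayList<>();
        for (int start = 0; start < flat.size(); start += columnCount) {
            rows.add(new ArrayList<>(flat.subList(start, start + columnCount)));
        }
        return rows;
    }
}
```

With the query above, rows.get(r).get(0) is then the FIRSTNAME column of row r, get(1) the LASTNAME and get(2) the ADDRESS.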
My first CSV file looks like this, with a header included (the header appears only at the top, not before every entry):
NAME,SURNAME,AGE
Fred,Krueger,Unknown
.... n records
My second file might look like this:
NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16
.... n records with this header template
The merged file should look like this:
NAME,SURNAME,AGE,MIDDLENAME
Fred,Krueger,Unknown,
Jason,Scarry,16,Noname
....
Basically, if the headers don't match, all new header titles (columns) should be appended after the original header, and the values should follow that order.
Update
The CSV files above were made smaller to illustrate what I want to achieve; in reality, the CSV files are generated one step before this merge and can have up to 100 columns.
How can I do this?
I'd create a model for the 'bigger' format (a simple class with four fields, plus a collection for instances of this class) and implement two parsers, one for each input format. Create records for all rows of both CSV files, and implement a writer that outputs the CSV in the correct format. In brief:
public void convert(File output, File... input) {
    // ThreeColumnFormatParser, FourColumnFormatParser, CsvWriter and
    // isThreeColumnFormat(...) are hypothetical helpers you would write yourself.
    List<Record> records = new ArrayList<Record>();
    for (File file : input) {
        if (isThreeColumnFormat(file)) {
            records.addAll(ThreeColumnFormatParser.parse(file));
        } else {
            records.addAll(FourColumnFormatParser.parse(file));
        }
    }
    CsvWriter.write(output, records);
}
From your comment, I see that you have a lot of different CSV formats with some common columns.
You could define the model for any row in the various csv files like this:
import java.util.HashMap;
import java.util.Map;

public class Record {
    Object id;                                    // some sort of unique identifier
    Map<String, String> values = new HashMap<>(); // all key/values of a single row
    public Record(Object id) { this.id = id; }
    public void put(String key, String value) {
        values.put(key, value);
    }
    public String get(String key) {
        return values.get(key);                   // null if this row has no such column
    }
}
To parse any file, you would first read the header and add the column headers to a global key store (it will be needed later for output), then create records for all rows, like:
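//...
List<Record> records = new ArrayList<Record>();
// getAllFiles(), getColumnsHeaders(file) and file.getLines() are placeholder helpers
// (java.io.File has no getLines()); getLines() should return the data lines only.
for (File file : getAllFiles()) {
    List<String> keys = getColumnsHeaders(file);
    KeyStore.addAll(keys); // the store is a Set; a LinkedHashSet keeps first-seen column order
    int lineNo = 0;
    for (String line : file.getLines()) {
        String[] values = line.split(DELIMITER);
        Record record = new Record(file.getName() + lineNo++); // as an example for id
        for (int i = 0; i < values.length; i++) {
            record.put(keys.get(i), values[i]);
        }
        records.add(record);
    }
}
// ...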
Now the key store holds all the column header names in use, and we can iterate over the collection of records, get the value for every key (null if the file for this record didn't use that key), assemble the CSV lines, and write everything to a new file.
Read in the header of the first file and create a list of the column names. Now read the header of the second file and add any column names that don't exist already in the list to the end of the list. Now you have your columns in the order that you want and you can write this to the new file first.
Next, I would parse each file and, for each row, create a Map of column name to value. Once the row is parsed, you can iterate over the new list of column names, pull the values from the map, and write them immediately to the new file. If a value is null, don't print anything (just a comma, if required).
There might be more efficient solutions available, but I think this meets the requirements you set out.
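The two steps of this last answer can be sketched like this, working on in-memory lines to keep it self-contained. It assumes plain comma-separated files without quoted fields; CsvMerger and merge are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CsvMerger {

    // Each element of files is one CSV file as a list of lines, header first.
    static List<String> merge(List<List<String>> files) {
        // Step 1: union of headers; LinkedHashSet keeps first-seen order,
        // so new columns are appended after the original ones.
        Set<String> columns = new LinkedHashSet<>();
        for (List<String> file : files) {
            columns.addAll(Arrays.asList(file.get(0).split(",")));
        }
        List<String> out = new ArrayList<>();
        out.add(String.join(",", columns));

        // Step 2: map each row by column name, then emit values in merged column order.
        for (List<String> file : files) {
            String[] header = file.get(0).split(",");
            for (String line : file.subList(1, file.size())) {
                String[] fields = line.split(",", -1);
                Map<String, String> row = new HashMap<>();
                for (int i = 0; i < header.length && i < fields.length; i++) {
                    row.put(header[i], fields[i]);
                }
                List<String> cells = new ArrayList<>();
                for (String column : columns) {
                    cells.add(row.getOrDefault(column, "")); // missing column -> empty cell
                }
                out.add(String.join(",", cells));
            }
        }
        return out;
    }
}
```

For the two sample files in the question this yields the header NAME,SURNAME,AGE,MIDDLENAME followed by Fred,Krueger,Unknown, and Jason,Scarry,16,Noname. For real files, java.nio.file.Files.readAllLines could supply each List<String>.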
Try this:
http://ondra.zizka.cz/stranky/programovani/ruzne/querying-transforming-csv-using-sql.texy
crunch input.csv output.csv "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"