I am tasked with scraping data from a webpage and writing it, along with other information, to a CSV file. Currently I use JSoup to scrape the website, but my problem is that I am not sure how to write the results to a CSV.
I store the data of each scraped page inside an object called CSVObject:
public class CSVObject {
    String name;
    String title;
    String description;
    ArrayList<String> color;
    ArrayList<String> size;
    ArrayList<Float> price;
}
I store these objects in an ArrayList<CSVObject>.
The name, title, and description come from the scraped data, but the color, size, and price come from user input. The user can choose multiple values, and each one is added to the corresponding ArrayList in the object.
The desired file output is something like this:
Name Title Description Color Size Price
Shirt Holiday Shirt Shirt Description Black S 15.99
Shirt Black M 19.99
Shirt Black L 24.99
Shirt Green S 15.99
Shirt Green M 19.99
Shirt Green L 24.99
Pants Movie Pants Pants Description Red S 17.99
...
I did some digging and found that the Java CSV library mentioned in How to serialize object to CSV file? can write a file to CSV, but I am not sure how to format it to the desired output. So what should I do to write the file as intended?
Flat file
Comma-Separated Values (CSV) and Tab-Delimited formats are for flat files, a single table in each. That means one set of rows that all share the same set of columns.
To export the data as shown in your example, repeat the values in the first columns instead of suppressing them. Then you have a set of rows that all share the same set of columns.
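For example, with the repeated values written out, the first rows of your sample data become:

Name,Title,Description,Color,Size,Price
Shirt,Holiday Shirt,Shirt Description,Black,S,15.99
Shirt,Holiday Shirt,Shirt Description,Black,M,19.99
Shirt,Holiday Shirt,Shirt Description,Black,L,24.99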
Hierarchy
According to your Java class, you have a hierarchy of data. That does not fit CSV format. Square peg, round hole.
To match the structure of your Java class, you should be serializing your data in a hierarchical format such as XML or JSON.
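For instance, the first object from your example maps naturally to JSON, with the parallel lists kept intact:

{
  "name": "Shirt",
  "title": "Holiday Shirt",
  "description": "Shirt Description",
  "color": ["Black", "Black", "Black", "Green", "Green", "Green"],
  "size": ["S", "M", "L", "S", "M", "L"],
  "price": [15.99, 19.99, 24.99, 15.99, 19.99, 24.99]
}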
Not-really-CSV
If you insist on using that not-really-CSV format you showed, you need nested loops.
Loop your set of objects. For each of those objects, loop the lists contained within.
On the first time through the lists, write out all columns. For subsequent times in the inner loop, suppress those values, writing only a COMMA character to maintain the column count.
Straightforward logic, nothing tricky; follow the same steps you would take if writing these values by hand on paper.
Of course, any field values containing your delimiter character (COMMA, etc.) must be enclosed within quotes. Or just enclose all fields in quotes.
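A minimal sketch of that nested-loop approach, using the CSVObject class from the question (with price as ArrayList<Float>): following your desired output, the name is repeated on every row while title and description are written only once per object. No quoting is done, so it assumes the values contain no commas.

static String toCsv(List<CSVObject> objects) {
    StringBuilder sb = new StringBuilder("Name,Title,Description,Color,Size,Price\n");
    for (CSVObject o : objects) {                    // outer loop: one object per product
        for (int i = 0; i < o.color.size(); i++) {   // inner loop: one row per color/size/price entry
            sb.append(o.name).append(',');
            if (i == 0) {
                sb.append(o.title).append(',').append(o.description);
            } else {
                sb.append(',');                      // suppress title and description, keep the commas
            }
            sb.append(',').append(o.color.get(i))
              .append(',').append(o.size.get(i))
              .append(',').append(o.price.get(i))
              .append('\n');
        }
    }
    return sb.toString();
}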
Here's a quick and dirty example. It assumes your lists of colors, sizes, and prices always have the same length:
interface CSVObject {
    String name();
    String title();
    String description();
    List<String> color();
    List<String> size();
    List<Double> price();
}

List<CSVObject> data = List.of(); // fill with your scraped objects
String csv = data.stream()
        .flatMap(co -> IntStream.range(0, co.color().size())
                .mapToObj(i -> new String[] {
                        co.name(), co.title(), co.description(),
                        co.color().get(i), co.size().get(i), co.price().get(i).toString()
                }))
        .map(sa -> String.join(",", sa))
        .collect(Collectors.joining("\n"));
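Either way, you can then write the resulting string to disk, for example with Files.writeString(Path.of("output.csv"), csv) from java.nio.file (Java 11+); the file name here is just a placeholder.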
Related
I have a large CSV with several columns that have duplicated names, e.g.:
name;surname;status;fullname;status;
John;Doe;Active;John Doe;Married;
And I'd like to map columns by name only, but since there are duplicates I probably need to use @BindByPosition as well. So my class would look like this:
#BindByName("name")
String name;
#BindByName("surname")
String surname;
#BindByName("status")
#BindByPosition(2)
String workStatus;
#BindByName("fullname")
String fullname;
#BindByName("status")
#BindByPosition(4)
String maritialStatus;
But it won't work; it only maps the two "status" fields and leaves the other fields empty. Is there a way to use positions only for those two columns?
In my real code I have 130 columns, and filling in positions for all of them would be a nightmare, especially because the order will change in the future.
But maybe there is some other way of dealing with duplicated column names?
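Not an answer to the annotation question itself, but one workaround is to skip the binding annotations for the duplicated columns and resolve them by reading the header row yourself. A rough sketch (plain split on ';' with no quoting support; all names here are illustrative):

import java.util.*;

static Map<String, List<Integer>> headerIndex(String headerLine) {
    Map<String, List<Integer>> index = new LinkedHashMap<>();
    String[] headers = headerLine.split(";");
    for (int i = 0; i < headers.length; i++) {
        // duplicated names collect several positions, e.g. "status" -> [2, 4]
        index.computeIfAbsent(headers[i], k -> new ArrayList<>()).add(i);
    }
    return index;
}

// Usage: unique names are looked up directly, duplicates by order of appearance.
Map<String, List<Integer>> idx = headerIndex("name;surname;status;fullname;status");
String[] row = "John;Doe;Active;John Doe;Married".split(";");
String workStatus    = row[idx.get("status").get(0)]; // first "status" column
String maritalStatus = row[idx.get("status").get(1)]; // second "status" column

Because the positions are discovered from the header at runtime, a future reordering of the 130 columns does not break anything.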
I'm currently using the SnakeYAML library to parse YAML. Everything is fine except that I have fields that can sometimes be a String and sometimes an array of Strings, as follows:
fields:
  amount: NET A PAYER (-?[\d ]+,\d{2})
  amount_untaxed: TOTAL HT ([\d ]+,\d{2})
  amount_tva: TOTAL TVA ([\d ]+,\d{2})
and with arrays:
fields:
  amount: TotalincludingVAT£(\d+\.\d+)
  date:
    - Invoicedate(\d{1,2}\w+,\d{4})
    - Issuedate(\d{1,2}\w+,\d{4})
So how can I make something that can handle both fields containing a single string and fields containing arrays? I'm currently using a Map<String, String> to store the fields, but it doesn't handle fields that are arrays. I tried a Map<String, List<String>> to handle this, but that doesn't work for single strings.
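One way to deal with this is to load the document into a Map<String, Object> and normalize every value to a List<String> afterwards, since SnakeYAML returns a List for sequences and a String for scalars. A sketch (the file name and method name are illustrative):

import org.yaml.snakeyaml.Yaml;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.*;

static Map<String, List<String>> loadFields(String path) throws IOException {
    Map<String, Object> root;
    try (FileInputStream in = new FileInputStream(path)) {
        root = new Yaml().load(in);              // generic load, untyped result
    }
    @SuppressWarnings("unchecked")
    Map<String, Object> raw = (Map<String, Object>) root.get("fields");

    Map<String, List<String>> fields = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : raw.entrySet()) {
        Object v = e.getValue();
        if (v instanceof List) {
            // YAML sequence: copy each element as a string
            List<String> values = new ArrayList<>();
            for (Object o : (List<?>) v) values.add(o.toString());
            fields.put(e.getKey(), values);
        } else {
            // YAML scalar: wrap the single string in a one-element list
            fields.put(e.getKey(), List.of(v.toString()));
        }
    }
    return fields;
}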
I have a List<Person> that I want to write to a CSV file and then read back again. My problem is that the Person class has a List<Insurance> attribute, so the line length depends on how many objects are in that list. A Person can have zero or more insurances, and these are stored in a List on the specific Person. Is there a reasonable way of doing this?
Like this:
public class Person {
    private int insuranceNR;
    private String firstName;
    private String lastName;
    private List<Insurance> insurances;
}
I have failed to find questions like mine, but if there are please redirect me. Thanks.
CSV is not the best choice here because it is supposed to have a fixed number of columns for effective parsing. You should use JSON or XML.
However, if you still want to use CSV, you must ensure there is only one list-valued field in the class (future modifications would break consistency), and that it is written at the end of the row.
Something like this:
1, Bill, Gates, Ins1, Ins2, Ins3
2, Donald, Trump
3, Elon, Musk, Ins4
4, Jeff, Bezos, Ins5, Ins6, Ins7, Ins8, Ins9
In your code, treat only the first three elements as fixed, and iterate over the remaining ones accordingly.
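A sketch of reading one such line back, assuming setters on Person and a hypothetical Insurance(String) constructor:

import java.util.*;

static Person parseLine(String line) {
    String[] parts = line.split(",\\s*");         // comma plus optional space
    Person p = new Person();
    p.setInsuranceNR(Integer.parseInt(parts[0])); // first three columns are fixed
    p.setFirstName(parts[1]);
    p.setLastName(parts[2]);
    List<Insurance> insurances = new ArrayList<>();
    for (int i = 3; i < parts.length; i++) {      // everything after is an insurance
        insurances.add(new Insurance(parts[i]));
    }
    p.setInsurances(insurances);
    return p;
}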
Here is a reference to a problem similar to yours:
Using CsvBeanReader to read a CSV file with a variable number of columns
I have a text file which gets created by a batch script, where the field positions are always the same. Within that file, I want to go to a specific position and add a field.
For example, suppose my file has only two lines and has the following fields:
number customer_id account_no address price plan
number customer_id account_no address price plan
I want to add an extra field between address and price so the new file will look similar to this:
number customer_id account_no address newfield price plan
number customer_id account_no address newfield price plan
I couldn't find any Utility class that can go to a specific position of a line in a file and write to it.
I could do it the tedious way: place the fields in an array, including the whitespace, and spit them out field by field to a new file. However, that is cumbersome and very tedious (the file has more than 90 fields), so I was wondering if there is an easier way.
If you do it in Java, there is no way to insert data into the middle of a text file; you can only overwrite bytes in place or append to the end of the file.
In any case, if you need to insert a new column and keep the format of the entire data file consistent, you have to insert whitespace for the rows without a value, so rewriting all rows is unavoidable anyway.
I couldn't find any Utility class that can go to a specific position of a line in a file and write to it
If you couldn't find one, you can write one yourself. It is actually not that tedious, and you can reuse it in the future.
// A brief example...
public final class FileUtility {

    private static String filepath;

    public static void setFilePath(String filepath) {
        FileUtility.filepath = filepath;
    }

    public static int searchField(String fieldname, int lineNo) {
        // return the position of the given field name in a specific line
    }

    public static void insertDataAt(String data, int column) {
        // insert the given data at the given column
    }

    public static boolean dataExist(String data, int lineNo) {
        // return true if the given data exists at the given line number
    }
}
Forget about inserting into the middle of the file. Files are sequences of bytes, so you would first have to move every byte after the insertion point forward by N bytes to make room for your modification. That costs more than processing the file on the fly and writing the new lines to another file.
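A sketch of that on-the-fly rewrite, assuming the fields are whitespace-separated and the new field goes after the fourth column as in the example (file names are placeholders):

import java.io.*;
import java.nio.file.*;
import java.util.*;

static void insertColumn() throws IOException {
    try (BufferedReader in = Files.newBufferedReader(Paths.get("old.txt"));
         BufferedWriter out = Files.newBufferedWriter(Paths.get("new.txt"))) {
        String line;
        while ((line = in.readLine()) != null) {
            List<String> fields = new ArrayList<>(Arrays.asList(line.split("\\s+")));
            fields.add(4, "newfield");            // insert after address, before price
            out.write(String.join(" ", fields));
            out.newLine();
        }
    }
}

If the file is truly fixed-width rather than delimited, the same loop applies, but you would splice the new field into each line at a fixed character offset instead of splitting.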
I got a task which I am not sure how to solve.
I have to fill a JTable with rows I get from a .txt document. The problem is that there are multiple .txt documents, each with a different number of rows and columns for the JTable.
example:
inside the cars.txt:
id;hp;price;quantity
1;100;7000;5
4;120;20000;2
7;300;80000;3
inside the bikes.txt
id;price;quantity;color;year
3;80;20;red;2010
5;200;40;green;2011
12;150;10;blue;2007
So, when a .txt file is chosen, a JDialog pops up with a JTable inside, where the data is shown.
I thought I could create a class Anything with a String[][] instance variable whose dimensions I define by reading the .txt: after saving the data in one array, I can count how many rows and columns it has.
With the cars.txt example that would be String[3][4] (three data rows, four columns).
Is that a good way to do it, or is there a better way?
Thanks for the help :D
Your question is a bit vague about what you want to do specifically.
Do you want to simply fill the table with all the data given, or do you only want certain columns used? When you choose the text files, do you know which column names they have (can you hardcode them or not)?
A good start would be...
DefaultTableModel dtm = (DefaultTableModel) yourJTable.getModel();

// This divides your txt file into a string array, one element per row
String[] rowSplit = yourTxtFileThatYouRead.split("\n");

// This assumes that your txt file contains the column headers on the first line
dtm.setColumnIdentifiers(rowSplit[0].split(";"));

// Start the iteration at 1 to skip the column headers
for (int i = 1; i < rowSplit.length; ++i) {
    dtm.addRow(rowSplit[i].split(";")); // addRow fires the table events itself
}
The first part sets the column headers and allows the column count to vary between files.
The second part sequentially adds the rows.
As shown in How to Use Tables: Creating a Table Model, you can extend AbstractTableModel to manage models of arbitrary dimensions. Let your model manage a List<List<String>>. Parse the first line of each file into a List<String> that is accessed by your implementations of getColumnCount() and getColumnName(); parse subsequent lines into one List<String> per row, and access the List of such rows in your implementation of getValueAt(). A related example that manages a Map<String, String> is shown here. Although more complex, you can use Class Literals as Runtime-Type Tokens for non-String data; return the token in your implementation of getColumnClass() to get the default renderer and editor for supported types. Finally, consider one of these file-based JDBC drivers for flat files.
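A skeleton of such a model for the semicolon-separated files shown in the question (header on the first line; all values kept as strings):

import javax.swing.table.AbstractTableModel;
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

class TxtTableModel extends AbstractTableModel {

    private final List<String> columns;
    private final List<List<String>> rows = new ArrayList<>();

    TxtTableModel(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        columns = Arrays.asList(lines.get(0).split(";"));  // header row
        for (String line : lines.subList(1, lines.size())) {
            rows.add(Arrays.asList(line.split(";")));      // one data row per line
        }
    }

    @Override public int getRowCount()    { return rows.size(); }
    @Override public int getColumnCount() { return columns.size(); }
    @Override public String getColumnName(int col) { return columns.get(col); }
    @Override public Object getValueAt(int row, int col) { return rows.get(row).get(col); }
}

// Usage: new JTable(new TxtTableModel(Paths.get("cars.txt"))) works for both
// cars.txt and bikes.txt, whatever their row and column counts.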