I'm trying to read a CSV file into a list of lists (of strings), pass it around for getting some data from a database, build a new list of lists of new data, then pass that list of lists so it can be written to a new CSV file. I've looked all over, and I can't seem to find an example on how to do it.
I'd rather not use simple arrays since the files will vary in size and I won't know what to use for the dimensions of the arrays. I have no issues dealing with the files. I'm just not sure how to deal with the list of lists.
Most of the examples I've found will create multi-dimensional arrays or perform actions inside the loop that's reading the data from the file. I know I can do that, but I want to write object-oriented code. If you could provide some example code or point me to a reference, that would be great.
ArrayList<ArrayList<String>> listOLists = new ArrayList<ArrayList<String>>();
ArrayList<String> singleList = new ArrayList<String>();
singleList.add("hello");
singleList.add("world");
listOLists.add(singleList);
ArrayList<String> anotherList = new ArrayList<String>();
anotherList.add("this is another list");
listOLists.add(anotherList);
Here's an example that reads a list of CSV strings into a list of lists and then loops through that list of lists and prints the CSV strings back out to the console.
import java.util.ArrayList;
import java.util.List;
public class ListExample
{
public static void main(final String[] args)
{
//sample CSV strings...pretend they came from a file
String[] csvStrings = new String[] {
"abc,def,ghi,jkl,mno",
"pqr,stu,vwx,yz",
"123,345,678,90"
};
List<List<String>> csvList = new ArrayList<List<String>>();
//pretend you're looping through lines in a file here
for(String line : csvStrings)
{
String[] linePieces = line.split(",");
List<String> csvPieces = new ArrayList<String>(linePieces.length);
for(String piece : linePieces)
{
csvPieces.add(piece);
}
csvList.add(csvPieces);
}
//write the CSV back out to the console
for(List<String> csv : csvList)
{
//dumb logic to place the commas correctly
if(!csv.isEmpty())
{
System.out.print(csv.get(0));
for(int i=1; i < csv.size(); i++)
{
System.out.print("," + csv.get(i));
}
}
System.out.print("\n");
}
}
}
Pretty straightforward, I think. Just a couple of points to notice:
I recommend using "List" instead of "ArrayList" on the left side when creating list objects. It's better to pass around the interface "List" because then if later you need to change to using something like Vector (e.g. you now need synchronized lists), you only need to change the line with the "new" statement. No matter what implementation of list you use, e.g. Vector or ArrayList, you still always just pass around List<String>.
In the ArrayList constructor, you can leave the argument out and the list will default to a certain initial capacity, growing dynamically as needed. But if you know how big your list might be, you can sometimes save some performance. For instance, if you knew there were always going to be 500 lines in your file, then you could do:
List<List<String>> csvList = new ArrayList<List<String>>(500);
That way you would never waste processing time waiting for your list to grow dynamically. This is why I pass "linePieces.length" to the constructor. It's not usually a big deal, but it can be helpful sometimes.
Hope that helps!
If you really want to handle CSV files properly in Java, it's not a good idea to implement a CSV reader/writer yourself. Check out the library below.
http://opencsv.sourceforge.net/
When your CSV document includes double quotes or embedded newlines, you will run into difficulties.
To learn an object-oriented approach, studying an existing Java implementation first will help you. Also, I don't think it's a good idea to manage one row as a bare List, since CSV doesn't allow rows to have different column counts.
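For example, a rough sketch of reading a file with OpenCSV (assuming the older au.com.bytecode.opencsv package from the SourceForge releases; newer versions use com.opencsv, and the file name here is just a placeholder):
import java.io.FileReader;
import java.util.Arrays;
import java.util.List;

import au.com.bytecode.opencsv.CSVReader;

public class OpenCsvExample
{
    public static void main(String[] args) throws Exception
    {
        CSVReader reader = new CSVReader(new FileReader("input.csv"));
        // Each String[] is one row, with quoted fields and embedded commas already handled.
        List<String[]> rows = reader.readAll();
        reader.close();

        for (String[] row : rows)
        {
            System.out.println(Arrays.toString(row));
        }
    }
}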
The example provided by #tster shows how to create a list of lists. I will provide an example for iterating over such a list.
Iterator<List<String>> iter = listOLists.iterator();
while(iter.hasNext()){
Iterator<String> siter = iter.next().iterator();
while(siter.hasNext()){
String s = siter.next();
System.out.println(s);
}
}
Something like this would work for reading:
String filename = "something.csv";
BufferedReader input = null;
List<List<String>> csvData = new ArrayList<List<String>>();
try
{
input = new BufferedReader(new FileReader(filename));
String line = null;
while (( line = input.readLine()) != null)
{
String[] data = line.split(",");
csvData.add(Arrays.asList(data));
}
}
catch (Exception ex)
{
ex.printStackTrace();
}
finally
{
if (input != null)
{
    try { input.close(); } catch (IOException ioe) { ioe.printStackTrace(); }
}
}
I'd second what xrath said - you're better off using an existing library to handle reading / writing CSV.
If you do plan on rolling your own framework, I'd also suggest not using List<List<String>> as your implementation - you'd probably be better off implementing CSVDocument and CSVRow classes (that may internally use a List<CSVRow> or List<String>, respectively), though for users, only expose an immutable List or an array.
Simply using List<List<String>> leaves too many unchecked edge cases and relies on implementation details - like, are headers stored separately from the data? Or are they in the first row of the List<List<String>>? What if I want to access data by column header from the row rather than by index?
What happens when you call things like:
// reads CSV data, 5 rows, 5 columns
List<List<String>> csvData = readCSVData();
csvData.get(1).add("extraDataAfterColumn");
// now row 1 has a value in (nonexistant) column 6
csvData.get(2).remove(3);
// values in columns 4 and 5 moved to columns 3 and 4,
// attempting to access column 5 now throws an IndexOutOfBoundsException.
You could attempt to validate all this when writing out the CSV file, and this may work in some cases... but in others, you'll be alerting the user of an exception far away from where the erroneous change was made, resulting in difficult debugging.
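For instance, a very rough sketch of that kind of encapsulation (the class names and methods here are purely illustrative, not an existing library):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: a row whose width can't be changed after parsing.
class CSVRow
{
    private final List<String> values;

    CSVRow(List<String> values)
    {
        this.values = new ArrayList<String>(values);
    }

    public String get(int column)
    {
        return values.get(column);
    }

    public List<String> asList()
    {
        return Collections.unmodifiableList(values);
    }
}

// Illustrative only: owns the rows and only hands out read-only views.
class CSVDocument
{
    private final List<CSVRow> rows = new ArrayList<CSVRow>();

    public void addRow(CSVRow row)
    {
        rows.add(row);
    }

    public List<CSVRow> getRows()
    {
        return Collections.unmodifiableList(rows);
    }
}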
public class Test {
public static void main(String[] args) {
List<Integer> ls=new ArrayList<>();
ls.add(1);
ls.add(2);
List<Integer> ls1=new ArrayList<>();
ls1.add(3);
ls1.add(4);
List<List<Integer>> ls2=new ArrayList<>();
ls2.add(ls);
ls2.add(ls1);
List<List<List<Integer>>> ls3=new ArrayList<>();
ls3.add(ls2);
methodRecursion(ls3);
}
private static void methodRecursion(List<?> list) {
    for (Object element : list) {
        if (element instanceof List) {
            // recurse into nested lists, print anything else
            methodRecursion((List<?>) element);
        } else {
            System.out.print(element);
        }
    }
}
}
Also, here is an example of how to print a List of Lists using the enhanced for loop:
public static void main(String[] args){
int[] a={1,3, 7, 8, 3, 9, 2, 4, 10};
List<List<Integer>> triplets;
triplets = sumOfThreeNaive(a, 13); // assumed helper: finds triplets in 'a' that sum to 13
for (List<Integer> list : triplets){
for (int triplet: list){
System.out.print(triplet+" ");
}
System.out.println();
}
}
Related
So I am trying to create a for loop to find unique elements in an ArrayList.
I already have an ArrayList storing user input of 20 places (repeats are allowed), but I am stuck on how to count the number of different places entered in the list, excluding duplicates. (I would like to avoid using hashing.)
Input:
[park, park, sea, beach, town]
Output:
[Number of unique places = 4]
Here's a rough example of the code I'm trying to make:
public static void main(String[] args) {
ArrayList<City> place = new ArrayList<>();
Scanner sc = new Scanner(System.in);
for(...) { // this is just to receive 20 inputs from users using the scanner
...
}
// This is where I am lost on creating a for loop...
}
You can use a Set for that.
https://docs.oracle.com/javase/7/docs/api/java/util/Set.html
Store the list data in a Set. A Set will not contain duplicates, so the size of the Set is the number of elements without duplicates.
Use the size() method to get the Set's size.
https://docs.oracle.com/javase/7/docs/api/java/util/Set.html#size()
Sample Code.
List<String> citiesWithDuplicates =
Arrays.asList(new String[] {"park", "park", "sea", "beach", "town"});
Set<String> cities = new HashSet<>(citiesWithDuplicates);
System.out.println("Number of unique places = " + cities.size());
If you are able to use Java 8, you can use the distinct method of Java streams:
long numOfUniquePlaces = list.stream().distinct().count();
Otherwise, using a set is the easiest solution. Since you don't want to use "hash", use a TreeSet (although HashSet is in most cases the better solution). If that is not an option either, you'll have to manually check for each element whether it's a duplicate or not.
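For example, a quick sketch with a TreeSet (assuming the place names are plain Strings; for your City objects they would need to be Comparable, or you would pass a Comparator):
List<String> placeNames = Arrays.asList("park", "park", "sea", "beach", "town");
Set<String> uniquePlaces = new TreeSet<String>(placeNames);   // sorted, no duplicates, no hashing
System.out.println("Number of unique places = " + uniquePlaces.size());   // prints 4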
One way that comes to mind (without using Set or hashvalues) is to make a second list.
ArrayList<City> places = new ArrayList<>();
//Fill array
ArrayList<String> uniquePlaces = new ArrayList<>();
for (City city : places){
if (!uniquePlaces.contains(city.getPlace())){
uniquePlaces.add(city.getPlace());
}
}
//number of unique places:
int uniqueCount = uniquePlaces.size();
Note that this is not super efficient =D
If you do not want to use an implementation of the Set or Map interfaces (either would solve your problem with one line of code) and you want to stick with an ArrayList, I suggest using something like the Collections.sort() method. It will sort your elements. Then iterate through the sorted list, comparing neighbours and counting duplicates, as in the sketch below. This trick can make solving your iteration problem easier.
Anyway, I strongly recommend using one of the implementations of the Set interface.
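A sketch of that approach, assuming the places are plain Strings (for City objects you'd sort with a Comparator and compare with equals() in the same way):
List<String> places = new ArrayList<String>(
        Arrays.asList("park", "park", "sea", "beach", "town"));
Collections.sort(places);                      // duplicates are now adjacent
int uniqueCount = places.isEmpty() ? 0 : 1;    // the first element is always new
for (int i = 1; i < places.size(); i++) {
    if (!places.get(i).equals(places.get(i - 1))) {
        uniqueCount++;
    }
}
System.out.println("Number of unique places = " + uniqueCount);   // prints 4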
Try the following. If an element occurs multiple times, this keeps the last occurrence in the distinct list.
List<String> citiesWithDuplicates = Arrays.asList(new String[] {
"park", "park", "sea", "beach", "town", "park", "beach" });
List<String> distinctCities = new ArrayList<String>();
int currentIndex = 0;
for (String city : citiesWithDuplicates) {
int index = citiesWithDuplicates.lastIndexOf(city);
if (index == currentIndex) {
distinctCities.add(city);
}
currentIndex++;
}
System.out.println("[ Number of unique places = "
+ distinctCities.size() + "]");
Well, if you do not want to use any HashSets or similar options, a quick and dirty nested for loop like this does the trick (it is just slow as hell if you have a lot of items; 20 would be just fine):
int differentCount = 0;
for (int i = 0; i < place.size(); i++) {
    boolean seenBefore = false;
    // only compare against earlier entries, so each place is counted once
    for (int j = 0; j < i; j++) {
        if (place.get(i).equals(place.get(j))) {
            seenBefore = true;
            break;
        }
    }
    if (!seenBefore)
        differentCount++;
}
System.out.printf("Number of unique places = %d%n", differentCount);
I want to take the contents of a CSV file and remove the duplicates in it. This is a topic that's gotten a lot of coverage here and elsewhere, but none of the suggested methods work for me: the final result still contains the duplicate values.
These are the steps I'm taking to get the text from the CSV file:
String holder = "";
Scanner input = new Scanner(new File("C:"+File.separator+"followers.csv")).useDelimiter(",");
List<String> temp = new ArrayList<String>();
while (input.hasNext())
{
holder = input.next();
temp.add(holder);
}
input.close();
So far, so good.
After trying to turn the ArrayList into a LinkedHashSet and a whole lot else, to no avail, this is what I'm on currently:
List<String> finalList = new ArrayList<String>();
for (String s : temp)
{
if (!finalList.contains(s))
{
finalList.add(s);
}
}
finalList.forEach(System.out::println);
But finalList still contains the duplicate values.
I'm assuming the problem lies with how I'm getting the CSV values into the ArrayList in the first place, but I have no idea where I'm going wrong.
An elegant solution to remove duplicates (without keeping the order) is
Set<String> hs = new HashSet<>();
//assume the ArrayList temp contains your data with duplicates
hs.addAll(temp);
temp.clear();
temp.addAll(hs);
temp then contains your data without duplicates.
You are probably getting whitespace and newlines mixed in with your values, hence the duplicates. Try parsing with the uniVocity-parsers CsvParser, as it eliminates these for you, works faster, and gives you much better support for handling the CSV format in general.
Try this to eliminate your dupes:
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
// creates a CSV parser
CsvParser parser = new CsvParser(settings);
// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("C:" + File.separator + "followers.csv"));
Set<String> result = new LinkedHashSet<>();
for(String[] row : allRows){
for(String element : row){
if(element != null){
//assuming the case of these elements doesn't matter;
//remove the ".toLowerCase()" part if it does.
result.add(element.toLowerCase());
}
}
}
System.out.println(result); //here's your deduplicated data.
Hope it helps.
Disclosure: I'm the author of this library. It's open source and free (Apache 2.0 license).
Hi all, please help me achieve this scenario: I have multiple files such as aaa.txt, bbb.txt, and ccc.txt, with data as follows.
aaa.txt:
100110,StringA,22
200110,StringB,2
300110,StringC, 12
400110,StringD,34
500110,StringE,423
bbb.txt as:
100110,StringA,20.1
200110,StringB,2.1
300110,StringC, 12.2
400110,StringD,3.2
500110,StringE,42.1
and ccc.txt as:
100110,StringA,2.1
200110,StringB,2.1
300110,StringC, 11
400110,StringD,3.2
500110,StringE,4.1
Now I have to read all three files (huge files) and report the result as
100110: (22, 20.1,2.1).
The issue is the size of the files and how to achieve this in an optimized way.
I assume you have some sort of code to handle reading the files line by line, so I'll pseudocode a scanner that can keep pulling lines.
The easiest way to handle this would be to use a Map. In this case, I'll just use a HashMap.
HashMap<String, String[]> map = new HashMap<>();
while (aaa.hasNextLine()) {
String[] lineContents = aaa.nextLine().split(",");
String[] array = new String[3];
array[0] = lineContents[2].trim();
map.put(lineContents[0], array);
}
while (bbb.hasNextLine()) {
String[] lineContents = bbb.nextLine().split(",");
String[] array = map.get(lineContents[0]);
if (array != null) {
array[1] = lineContents[2].trim();
} else {
array = new String[3];
array[1] = lineContents[2].trim();
map.put(lineContents[0], array);
}
}
// same for c, with a new index of 2
To make this thread-safe, you would probably use a concurrent map implementation (e.g. ConcurrentHashMap).
Then you'd create 3 threads that just read and put.
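As a rough sketch of that idea (ConcurrentHashMap and an ExecutorService are my assumptions for the concurrent map and the three threads; the per-line parsing is the same split/trim logic as above and is elided):
final ConcurrentMap<String, String[]> map = new ConcurrentHashMap<String, String[]>();
final String[] files = { "aaa.txt", "bbb.txt", "ccc.txt" };

ExecutorService pool = Executors.newFixedThreadPool(files.length);
for (int i = 0; i < files.length; i++) {
    final int column = i;            // aaa -> column 0, bbb -> 1, ccc -> 2
    final String file = files[i];
    pool.submit(new Runnable() {
        public void run() {
            // open 'file' with a Scanner and, for each line:
            //   String[] parts = line.split(",");
            //   map.putIfAbsent(parts[0], new String[3]);
            //   map.get(parts[0])[column] = parts[2].trim();
        }
    });
}
pool.shutdown();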
Unless you are doing a lot of processing on loading these files, or are reading a lot of smaller files, it might work better as a sequential operation.
If your files are all ordered, simply maintain an array of Scanner pointing to your files and read the lines one by one, output the result file in a file as you go.
Doing so, you will only keep in memory as many lines as the number of files. It is both time and memory efficient.
If your files are not ordered, you can use the sort command to sort them.
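Here's a minimal sketch of that merge, assuming all three files are sorted on the first column and contain exactly the same keys (real code would also need to handle missing keys and I/O errors):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MergeSortedFiles
{
    public static void main(String[] args) throws FileNotFoundException
    {
        String[] names = { "aaa.txt", "bbb.txt", "ccc.txt" };
        Scanner[] scanners = new Scanner[names.length];
        for (int i = 0; i < names.length; i++)
        {
            scanners[i] = new Scanner(new File(names[i]));
        }

        // Only one line per file is held in memory at any time.
        while (scanners[0].hasNextLine())
        {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < scanners.length; i++)
            {
                String[] parts = scanners[i].nextLine().split(",");
                if (i == 0)
                {
                    out.append(parts[0]).append(": (");
                }
                out.append(parts[2].trim());
                out.append(i == scanners.length - 1 ? ")" : ", ");
            }
            System.out.println(out);   // e.g. 100110: (22, 20.1, 2.1)
        }

        for (Scanner s : scanners)
        {
            s.close();
        }
    }
}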
How do I set values for a two-dimensional array of objects in Java?
Following is my for loop:
Object[][] hexgenSecurityInferenceData = null;
for (String methodName: knowGoodMap.keySet()) {
hexgenSecurityInferenceData = new Object[][] {
{
(KnownGoodInfoRO) knowGoodMap.get(methodName), new Object[] {
(MethodPropertiesRO) methodPropertiesMap.get(methodName), (List) methodParametersMap.get(methodName)
}
},
};
}
This produces only one row of data. I am sure I am making a mistake when adding values to the array of Object, but I really don't know how to fix it.
Kindly help me fix this.
You can't add elements to an array - you can only set elements in an array.
I suggest you have a List<Object[]> instead:
List<Object[]> hexgenSecurityInferenceData = new ArrayList<Object[]>();
for (String methodName:knowGoodMap.keySet()) {
hexgenSecurityInferenceData.add(new Object[] {
knowGoodMap.get(methodName),
new Object[] {
methodPropertiesMap.get(methodName),
methodParametersMap.get(methodName)
}
});
}
(I've removed the casts as they were pointless... you're storing the values in an Object[] anyway. The only benefit of the casts would be to cause an exception if the objects were of an unexpected type.)
You could still use an array if you really wanted, but you'd need to create it with the right size to start with, and then keep the "current index". It's then generally harder to use arrays than lists anyway.
If you really need an array, you can create one from the list:
Object[][] array = hexgenSecurityInferenceData.toArray(new Object[0][]);
Doing it in two stages this way will be simpler than directly populating an array up-front.
I'd actually suggest two further changes:
Don't just use Object[] for this... create a type to encapsulate this data. With your current approach, you've even got a nested Object[] within the Object[]... any code reading this data will be horrible.
Use entrySet() instead of keySet(), then you don't need to fetch the value by key
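A sketch of the entrySet() suggestion (still using Object[] rows for brevity, and assuming knowGoodMap is declared as a Map<String, KnownGoodInfoRO>):
List<Object[]> hexgenSecurityInferenceData = new ArrayList<Object[]>();
for (Map.Entry<String, KnownGoodInfoRO> entry : knowGoodMap.entrySet()) {
    String methodName = entry.getKey();
    hexgenSecurityInferenceData.add(new Object[] {
        entry.getValue(),   // no second knowGoodMap.get(methodName) lookup needed
        new Object[] {
            methodPropertiesMap.get(methodName),
            methodParametersMap.get(methodName)
        }
    });
}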
You have a matrix of objects (Object[][]), so if you want to populate this 2-D array you have to do something like:
Object[][] hexgenSecurityInferenceData=new Object[10][10];
for(int i=0; i<10;i++){
for(int j=0; j<10;j++){
hexgenSecurityInferenceData[i][j] = new Object();
}
}
But as Jon pointed out, it's better to have your own implementation/encapsulation instead of using Object.
Using a List is the best way to resolve this. However, you can still do it with an Object[][] by initializing the array up front.
Object[][] hexgenSecurityInferenceData = new Object[knowGoodMap.keySet().size()][];
int i = 0;
for (String methodName : knowGoodMap.keySet())
{
hexgenSecurityInferenceData[i] = new Object[]
{
    (KnownGoodInfoRO) knowGoodMap.get(methodName),
    new Object[] { (MethodPropertiesRO) methodPropertiesMap.get(methodName), (List) methodParametersMap.get(methodName) }
};
i++;
}
I'm storing my word counts in the values of a HashMap; how can I then get the top 500 words in the text?
public ArrayList<String> topWords (int numberOfWordsToFind, ArrayList<String> theText) {
//ArrayList<String> frequentWords = new ArrayList<String>();
ArrayList<String> topWordsArray= new ArrayList<String>();
HashMap<String,Integer> frequentWords = new HashMap<String,Integer>();
int wordCounter=0;
for (int i=0; i<theText.size();i++){
if(frequentWords.containsKey(theText.get(i))){
//find value and increment
wordCounter=frequentWords.get(theText.get(i));
wordCounter++;
frequentWords.put(theText.get(i),wordCounter);
}
else {
//new word
frequentWords.put(theText.get(i),1);
}
}
for (int i=0; i<theText.size();i++){
if (frequentWords.containsKey(theText.get(i))){
// what to write here?
frequentWords.get(theText.get(i));
}
}
return topWordsArray;
}
One other approach you may wish to look at is to think of this another way: is a Map really the right conceptual object here? This may be a good use for a much-neglected-in-Java data structure, the bag. A bag is like a set, but allows an item to be in it multiple times. This greatly simplifies adding a found word.
Google's guava-libraries provides a Bag structure, though there it's called a Multiset. Using a Multiset, you could just call .add() once for each word, even if it's already in there. Even easier, though, you could throw your loop away:
Multiset<String> words = HashMultiset.create(theText);
Now you have a Multiset, what do you do? Well, you can call entrySet(), which gives you a collection of Multiset.Entry objects. You can then stick them in a List (they come in a Set) and sort them using a Comparator. Full code might look like this (using a few other fancy Guava features to show them off):
Multiset<String> words = HashMultiset.create(theText);
List<Multiset.Entry<String>> wordCounts = Lists.newArrayList(words.entrySet());
Collections.sort(wordCounts, new Comparator<Multiset.Entry<String>>() {
public int compare(Multiset.Entry<String> left, Multiset.Entry<String> right) {
// Note reversal of 'right' and 'left' to get descending order
return Integer.compare(right.getCount(), left.getCount());
}
});
// wordCounts now contains all the words, sorted by count descending
// Take the first 500 entries (alternative: use a loop; this is simple because
// it copes easily with fewer than 500 elements)
Iterable<Multiset.Entry<String>> first500 = Iterables.limit(wordCounts, 500);
// Guava-ey alternative: use a Function and Iterables.transform, but in this case
// the 'manual' way is probably simpler:
for (Multiset.Entry<String> entry : first500) {
    topWordsArray.add(entry.getElement());
}
and you're done!
Here you can find a guide on how to sort a HashMap by its values. After sorting, you can just iterate over the first 500 entries.
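A minimal sketch of that, reusing the frequentWords map and topWordsArray list from your method (sort the entries by value, descending, then take the first 500):
List<Map.Entry<String, Integer>> entries =
        new ArrayList<Map.Entry<String, Integer>>(frequentWords.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
    public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
        return b.getValue().compareTo(a.getValue());   // highest counts first
    }
});
for (int i = 0; i < Math.min(500, entries.size()); i++) {
    topWordsArray.add(entries.get(i).getKey());
}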
Take a look at the TreeBidiMap provided by the Apache Commons Collections package. http://commons.apache.org/collections/api-release/org/apache/commons/collections/bidimap/TreeBidiMap.html
It allows you to sort the map according to either the key or the value set.
Hope it helps.
Zhongxian