I am trying to compare two CSV files that contain the same data but whose columns are in different orders. The following code works when the column orders match. How can I tweak it so that it also works when the column orders differ between the two files?
Set<String> source = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(sourceFile)));
Set<String> target = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(targetFile)));
return source.containsAll(target) && target.containsAll(source);
For example, the above test passes when the source file and target file look like this:
source file:
a,b,c
1,2,3
4,5,6
target file:
a,b,c
1,2,3
4,5,6
However, with the same source file, the test fails if the target file looks like this:
target file:
a,c,b
1,3,2
4,6,5
A Set relies on a properly functioning .equals method for comparison, whether it is detecting duplicates or comparing its elements to those in another Collection. When I saw this question, my first thought was to create a new class of Objects to put into your Set, replacing the String Objects. But, at the time, it was easier and faster to produce the code in my previous answer.
Here is another solution, which is closer to my first thought. To start, I created a Pair class, which overrides .hashCode() and .equals(Object other).
package comparecsv1;
import java.util.Objects;
public class Pair <T, U> {
private final T t;
private final U u;
Pair (T aT, U aU) {
this.t = aT;
this.u = aU;
}
@Override
public int hashCode() {
int hash = 3;
hash = 59 * hash + Objects.hashCode(this.t);
hash = 59 * hash + Objects.hashCode(this.u);
return hash;
}
@Override
public boolean equals(Object obj) {
if (this == obj) { return true; }
if (obj == null) { return false; }
if (getClass() != obj.getClass()) { return false; }
final Pair<?, ?> other = (Pair<?, ?>) obj;
if (!Objects.equals(this.t, other.t)) {
return false;
}
return Objects.equals(this.u, other.u);
} // end equals
} // end class pair
The .equals (Object obj) and the .hashCode () methods were auto-generated by the IDE. As you know, .hashCode() should always be overridden when .equals is overridden. Also, some Collection Objects, such as HashMap and HashSet rely on proper .hashCode() methods.
After creating class Pair<T,U>, I created class CompareCSV1. The idea here is to use a Set<Set<Pair<String, String>>> where you have Set<String> in your code.
A Pair<String, String> pairs a value from a column with the header for the column in which it appears.
A Set<Pair<String, String>> represents one row.
A Set<Set<Pair<String, String>>> represents all the rows.
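The three levels above can be tried out in isolation before reading the full class. The following is only a minimal sketch: it swaps in the JDK's AbstractMap.SimpleImmutableEntry (which already implements equals and hashCode) for the Pair class, and the class name RowAsSetSketch and helper row are made up for illustration. It assumes the header and value arrays have the same length.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RowAsSetSketch {
    // Builds one row as a set of (value, header) entries.
    // Assumption: headers and values have the same length.
    static Set<Map.Entry<String, String>> row(String[] headers, String[] values) {
        Set<Map.Entry<String, String>> cells = new HashSet<>();
        for (int i = 0; i < headers.length; i++) {
            cells.add(new SimpleImmutableEntry<>(values[i], headers[i]));
        }
        return cells;
    }

    public static void main(String[] args) {
        Set<Map.Entry<String, String>> sourceRow =
                row("a,b,c".split(","), "1,2,3".split(","));
        Set<Map.Entry<String, String>> targetRow =
                row("a,c,b".split(","), "1,3,2".split(","));
        // Same cells under the same headers, so column order no longer matters.
        System.out.println(sourceRow.equals(targetRow)); // true
    }
}
```

Because each cell carries its own header, a row compares equal to its reordered counterpart. The CompareCSV1 class applies the same idea with Pair.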
package comparecsv1;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public final class CompareCSV1 {
private final Set<Set<Pair<String, String>>> theSet;
private final String [] columnHeader;
private CompareCSV1 (String columnHeadings, String headerSplitRegex) {
columnHeader = columnHeadings.split (headerSplitRegex);
theSet = new HashSet<> ();
}
private Set<Pair<String, String>> createLine
(String columnSource, String columnSplitRegex) {
String [] column = columnSource.split (columnSplitRegex);
Set<Pair<String, String>> lineSet = new HashSet<> ();
int i = 0;
for (String columnValue: column) {
lineSet.add (new Pair<String, String> (columnValue, columnHeader [i++]));
}
return lineSet;
}
public Set<Set<Pair<String, String>>> getSet () { return theSet; }
public String [] getColumnHeaders () {
return Arrays.copyOf (columnHeader, columnHeader.length);
}
public static CompareCSV1 createFromData (List<String> theData
, String headerSplitRegex, String columnSplitRegex) {
CompareCSV1 result =
new CompareCSV1 (theData.get(0), headerSplitRegex);
for (int i = 1; i < theData.size(); ++i) {
result.theSet.add(result.createLine(theData.get(i), columnSplitRegex));
}
return result;
}
public static void main(String[] args) {
String [] sourceData = {"a,b,c,d,e", "6,7,8,9,10", "1,2,3,4,5"
,"11,12,13,14,15", "16,17,18,19,20"};
String [] targetData = {"c,b,e,d,a", "3,2,5,4,1", "8,7,10,9,6"
,"13,12,15,14,11", "18,17,20,19,16"};
List<String> source = Arrays.asList(sourceData);
List<String> target = Arrays.asList (targetData);
CompareCSV1 sourceCSV = createFromData (source, ",", ",");
CompareCSV1 targetCSV = createFromData (target, ",", ",");
System.out.println ("Source contains target? "
+ sourceCSV.getSet().containsAll (targetCSV.getSet())
+ ". Target contains source? "
+ targetCSV.getSet().containsAll (sourceCSV.getSet())
+ ". Are equal? " + targetCSV.getSet().equals (sourceCSV.getSet()));
} // end main
} // end class CompareCSV1
This code has some things in common with the code in my first answer:
Except for the column header lines, which must be first in the "source" and "target" data, matching lines in one file can be in a different order in the other file.
I used String [] Objects, with calls to the Arrays.asList method, as substitutes for your data sources.
It does not contain code to guard against errors, such as lines in the file having a different number of columns from other lines, or a missing header line.
I hard coded "," as the String split expression in main. But the new methods allow the String split expression to be passed in, and they allow separate split expressions for the column header line and the data lines.
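One of the missing guards mentioned above, the check for lines whose column count differs from the header's, could be sketched as follows. This is an illustration only: CsvShapeCheck and firstMalformedLine are hypothetical names, and the sketch uses split with a limit of -1 so that trailing empty columns are counted, which differs from the plain split used in the answer's code.

```java
import java.util.Arrays;
import java.util.List;

class CsvShapeCheck {
    // Returns the index of the first data line whose column count differs
    // from the header line's, or -1 if every line matches.
    static int firstMalformedLine(List<String> lines, String splitRegex) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no header line");
        }
        // Limit -1 keeps trailing empty columns, e.g. "1,2," counts as three.
        int expected = lines.get(0).split(splitRegex, -1).length;
        for (int i = 1; i < lines.size(); i++) {
            if (lines.get(i).split(splitRegex, -1).length != expected) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("a,b,c", "1,2,3", "4,5,6");
        List<String> bad = Arrays.asList("a,b,c", "1,2,3", "4,5");
        System.out.println(firstMalformedLine(good, ",")); // -1
        System.out.println(firstMalformedLine(bad, ","));  // 2
    }
}
```

Running a check like this on both files before building the sets would turn an ArrayIndexOutOfBoundsException into a clearer error.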
Here is some code that could work. It relies on the first line of each file containing column headers.
It's a bit more than a tweak, though. It's an "old dog" approach.
The original code in the question has these lines:
Set<String> source = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(sourceFile)));
Set<String> target = new HashSet<>(org.apache.commons.io.FileUtils.readLines(new File(targetFile)));
With this solution, the data coming in needs more processing before it will be ready to be put into a Set. Those two lines get changed as follows:
List<String> source = (org.apache.commons.io.FileUtils.readLines(new File(sourceFile)));
List<String> target = (org.apache.commons.io.FileUtils.readLines(new File(targetFile)));
This approach will compare column headers in the target file and the source file. It will use that to build an int [] that indicates the difference in column order.
After the order difference array is filled, the data in the file will be put into a pair of Set<List<String>>. Each List<String> will represent one line from the source and target data files. Each String in the List will be data from one column.
In the following code, main is the test driver. Only for my testing purposes, the data files have been replaced by a pair of String [] and reading the file with org.apache.commons.io.FileUtils.readLines has been replaced with Arrays.asList.
package comparecsv;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class CompareCSV {
private static int [] columnReorder;
private static void headersOrder
(String sourceHeader, String targetHeader) {
String [] columnHeader = sourceHeader.split (",");
List<String> sourceColumn = Arrays.asList (columnHeader);
columnReorder = new int [columnHeader.length];
String [] targetColumn = targetHeader.split (",");
for (int i = 0; i < targetColumn.length; ++i) {
int j = sourceColumn.indexOf(targetColumn[i]);
columnReorder [i] = j;
}
}
private static Set<List<String>> toSet
(List<String> data, boolean reorder) {
Set<List<String>> dataSet = new HashSet<> ();
for (String s: data) {
String [] byColumn = s.split (",");
if (reorder) {
String [] reordered = new String [byColumn.length];
for (int i = 0; i < byColumn.length; ++i) {
reordered[columnReorder[i]] = byColumn [i];
}
dataSet.add (Arrays.asList (reordered));
} else {
dataSet.add (Arrays.asList(byColumn));
}
}
return dataSet;
}
public static void main(String[] args) {
String [] sourceData = {"a,b,c,d,e", "1,2,3,4,5", "6,7,8,9,10"
,"11,12,13,14,15", "16,17,18,19,20"};
String [] targetData = {"c,b,e,d,a", "3,2,5,4,1", "8,7,10,9,6"
,"13,12,15,14,11", "18,17,20,19,16"};
List<String> source = Arrays.asList(sourceData);
List<String> target = Arrays.asList (targetData);
headersOrder (source.get(0), target.get(0));
Set<List<String>> sourceSet = toSet (source, false);
Set<List<String>> targetSet = toSet (target, true);
System.out.println ( sourceSet.containsAll (targetSet)
+ " " + targetSet.containsAll (sourceSet) + " " +
( sourceSet.containsAll (targetSet)
&& targetSet.containsAll (sourceSet)));
}
}
Method headersOrder compares the headers, column by column, and populates the columnReorder array. Method toSet creates the Set<List<String>>, either reordering the columns or not, according to the value of the boolean argument.
For the sake of simplification, this assumes lines are easily split using comma. Data such as dog, "Reginald, III", 3 will cause failure.
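If quoted fields such as "Reginald, III" do need to be handled, the plain split (",") could be replaced with a small quote-aware splitter. The following is a rough sketch rather than a full CSV parser: QuotedCsvSplit and splitLine are hypothetical names, it assumes quotes are balanced, it treats a doubled quote inside a quoted field as one literal quote, and it trims whitespace around every field. A dedicated CSV library would be the safer choice for real data.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {
    // Splits one line on commas that are outside double quotes.
    // Assumptions: balanced quotes; "" inside quotes means a literal quote;
    // surrounding whitespace is trimmed from every field.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (ch == ',' && !inQuotes) {
                fields.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = splitLine("dog, \"Reginald, III\", 3");
        System.out.println(fields.size()); // 3
        System.out.println(fields.get(1)); // Reginald, III
    }
}
```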
In testing this, I found lines in the file can be matched with their counterpart in the other file, regardless of order of the lines. Here is an example:
Source:
a,b,c
1,2,3
4,5,6
7,8,9
Target:
a,b,c
4,5,6
7,8,9
1,2,3
The result would be the contents match.
I believe this matches the result the code in the question would give. However, for this solution to work, the first line in each file must contain column headers.
import java.util.*;
public class CarProduct{
String color;
String modelname;
String price;
public CarProduct(String c, String m, String p){
color = c;
modelname = m;
price = p;
}
}
class HashMapApplication{
public static void main(String []ar){
ArrayList<CarProduct> arraylist1 = new ArrayList<CarProduct>();
ArrayList<CarProduct> arraylist2 = new ArrayList<CarProduct>();
ArrayList<CarProduct> arraylist3 = new ArrayList<CarProduct>();
HashMap<String,ArrayList> hashmap = new HashMap<String,ArrayList>();
CarProduct Tata = new CarProduct("black","12 Lakhs","Aria");
arraylist1.add(Tata);
hashmap.put("Tata",arraylist1);
CarProduct WolksWagen = new CarProduct("off white","10 Lakhs","Passat");
arraylist2.add(WolksWagen);
hashmap.put("WolksWagen",arraylist2);
CarProduct Mahindra = new CarProduct("white","15 Lakhs","XUV");
arraylist3.add(Mahindra);
hashmap.put("Mahindra",arraylist3);
//get(int index)
//map.get(id).add(value);
//hashmap.get("")
// this contains error i dont know how iterate it because the class is there and i need access each and every field in it
for (Entry<String, ArrayList<CarProduct>> entry : hashmap.entrySet()) {
System.out.print(entry.getKey()+" | ");
for(String property : entry.getValue()){
System.out.print(property+" ");
}
System.out.println();
}
}
}
I want to extract the values from the hashmap.
The key is given, and the value is an ArrayList.
Please help me convert the objects to strings and display each value using the get method.
import java.util.*;
public class CarProduct{
String color;
String modelname;
String price;
public CarProduct(String c, String m, String p){
color = c;
modelname = m;
price = p;
}
@Override
public String toString()
{
return "Model: " + modelname + " Colour:" + color + " Price:" + price ;
}
}
class HashMapApplication{
public static void main(String []ar){
List<CarProduct> arraylist1 = new ArrayList<CarProduct>();
List<CarProduct> arraylist2 = new ArrayList<CarProduct>();
List<CarProduct> arraylist3 = new ArrayList<CarProduct>();
Map<String,List<CarProduct>> hashmap = new HashMap<String, List<CarProduct>>();
CarProduct Tata = new CarProduct("black","12 Lakhs","Aria");
arraylist1.add(Tata);
hashmap.put("Tata",arraylist1);
CarProduct WolksWagen = new CarProduct("off white","10 Lakhs","Passat");
arraylist2.add(WolksWagen);
hashmap.put("WolksWagen",arraylist2);
CarProduct Mahindra = new CarProduct("white","15 Lakhs","XUV");
arraylist3.add(Mahindra);
hashmap.put("Mahindra",arraylist3);
for (Map.Entry<String, List<CarProduct>> entry : hashmap.entrySet()) {
System.out.print(entry.getKey()+" | ");
for(CarProduct property : entry.getValue()){
System.out.print(property+" ");
}
System.out.println();
}
}
}
So there are a couple of things to correct.
Firstly, the hashmap. It needs to be properly typed, so we've specified the element type of the ArrayList; otherwise we would only be able to get Objects out of it.
Secondly, the for loop. The inner for loop needs to iterate over CarProduct objects, not Strings.
Lastly, for printing the property, we need to override the toString() method in CarProduct. This lets you take a CarProduct and pass it directly to System.out.print(), as you have done.
One thing I should add is that currently you are putting the inputs into your initializer in the wrong order. Your constructor specifies color, model, price but you're using your initializer as color, price, model.
EDIT: Made the lists and maps more generic to reflect the comments on the original question
for (Map.Entry<String, ArrayList> entry : hashmap.entrySet()) {
System.out.print(entry.getKey()+" | ");
//get the arraylist first.
ArrayList<CarProduct> arrayList = entry.getValue();
for(CarProduct x: arrayList){
//display the carProduct
}
System.out.println();
}
You are basically trying to print ArrayList.toString(), which will not give the output you want. First get the ArrayList, then iterate over its contents.
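To display each field separately instead of relying on toString(), the inner loop can call getters on every element. The sketch below is self-contained and only illustrative: the nested Car class stands in for CarProduct, and its getters (getColor, getModelName, getPrice) as well as the describe helper are hypothetical additions that the original class does not have.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CarCatalogSketch {
    // Stand-in for CarProduct, with hypothetical getters added.
    static class Car {
        private final String color;
        private final String modelName;
        private final String price;

        Car(String color, String modelName, String price) {
            this.color = color;
            this.modelName = modelName;
            this.price = price;
        }

        String getColor() { return color; }
        String getModelName() { return modelName; }
        String getPrice() { return price; }
    }

    // Produces one line per car: "key | model, color, price".
    static List<String> describe(Map<String, List<Car>> catalog) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Car>> entry : catalog.entrySet()) {
            for (Car car : entry.getValue()) { // get the list first, then each element
                lines.add(entry.getKey() + " | " + car.getModelName()
                        + ", " + car.getColor() + ", " + car.getPrice());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, List<Car>> catalog = new LinkedHashMap<>();
        List<Car> tata = new ArrayList<>();
        tata.add(new Car("black", "Aria", "12 Lakhs"));
        catalog.put("Tata", tata);
        for (String line : describe(catalog)) {
            System.out.println(line); // Tata | Aria, black, 12 Lakhs
        }
    }
}
```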
I am new to Java, and I have this problem: I am working with a web service from Android, where I send a request and get an answer formatted like this string: 1-0,2-0,3-0,4-0,5-0,6-0,7-0,8-0,12-0,13-0. The number before the "-" is the number of my button, and the number after the "-" is the button status. I split this string into an array like this:
String buttons = "1-0,2-0,3-0,4-0,5-0,6-0,7-0,8-0,13-0,14-0";
String[] totalButtons = buttons.split(",");
Then I make a new request to get the status of my buttons, and I get this:
String status = "1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,9-2,10-1,13-4,14-2";
String[] statusButtons = status.split(",");
The number of buttons is going to stay the same the whole time; in this case, 10 buttons.
The problem is how to compare the elements of the two arrays, given that the statuses can change every two seconds and the second response may contain more buttons than the first. Where a button appears in both arrays, I have to update its status with the new value. For example, the first element of the first array equals the first element of the second array, so there is no problem there. But the second array contains two elements that the first array does not have, in this case 9-2 and 10-1, so they should be dropped. The final result should be like this:
String buttons = "1-0,2-0,3-0,4-0,5-0,6-0,7-0,8-0,13-0,14-0";
String status = "1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,9-2,10-1,13-4,14-2";
String finalButtons = "1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,13-4,14-2";
Here's an idea to get you started:
Map<String,String> buttonStatus = new HashMap<String,String>();
for (String button : totalButtons) {
String parts[] = button.split("-");
buttonStatus.put(parts[0], parts[1]);
}
for (String button : statusButtons) {
String parts[] = button.split("-");
if (buttonStatus.containsKey(parts[0])) {
buttonStatus.put(parts[0], parts[1]);
}
// Java 8 has a "replace" method that will change the value only if the key
// already exists; unfortunately, Android doesn't support it
}
The result will be a map whose keys are taken from the original totalButtons, and whose values will be taken from statusButtons if present. You can go through the keys and values in the Map to get the results, but they won't be in order; if you want them to be in the same order as totalButtons, go through totalButtons again and use buttonStatus.get to get each value.
See the javadoc for java.util.Map for the methods used here.
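The ordering step described above can also be avoided by swapping the HashMap for a LinkedHashMap, which iterates in insertion order. The following is a minimal sketch of that variation, not the answer's exact code: ButtonMergeSketch and merge are hypothetical names, and it assumes every element has the form number-status.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ButtonMergeSketch {
    // Keeps the keys (and their order) from `buttons`, and takes each status
    // from `status` whenever the same key appears there.
    // Assumption: every element looks like "number-status".
    static String merge(String buttons, String status) {
        Map<String, String> merged = new LinkedHashMap<>(); // preserves insertion order
        for (String button : buttons.split(",")) {
            String[] parts = button.split("-");
            merged.put(parts[0], parts[1]);
        }
        for (String button : status.split(",")) {
            String[] parts = button.split("-");
            if (merged.containsKey(parts[0])) { // ignore buttons not in the first list
                merged.put(parts[0], parts[1]);
            }
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : merged.entrySet()) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(e.getKey()).append('-').append(e.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String buttons = "1-0,2-0,3-0,4-0,5-0,6-0,7-0,8-0,13-0,14-0";
        String status = "1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,9-2,10-1,13-4,14-2";
        System.out.println(merge(buttons, status));
        // 1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,13-4,14-2
    }
}
```

Updating the value of an existing key in a LinkedHashMap does not change its position, so the output keeps the order of the first string.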
I would split each of those elements again and then compare the resulting values.
For example:
String[] doubleSplit = totalButtons[index].split("-"); // "1-0" -> {"1", "0"}
import java.util.HashMap;
import java.util.Map;
/**
* @author Davide
*/
public class test {
static Map<Integer, Integer> map;
public static void main(String[] args) {
// init value
String buttons = "1-0,2-0,3-0,4-0,5-4,6-0,7-0,8-0,13-0,14-0";
String[] keys = buttons.split("(-[0-9]*,*)");
init(keys);
// new value
String status = "1-0,2-0,3-2,4-0,5-4,6-0,7-4,8-0,9-2,10-1,13-4,14-2";
String[] statusButtons = status.split(",");
update(statusButtons);
print();
}
public static void init(String[] keys) {
map = new HashMap<Integer, Integer>();
for (String k : keys) {
map.put(Integer.valueOf(k), 0);
}
}
public static void update(String[] statusButtons) {
for (String state : statusButtons) {
String[] split = state.split("-");
int k = Integer.valueOf(split[0]);
int v = Integer.valueOf(split[1]);
if (map.containsKey(k)) {
map.put(k, v);
}
}
}
public static void print() {
String out = "";
for (Object k : map.keySet()) {
out += k + "-" + map.get(k) + ",";
}
System.out.println(out.substring(0, out.length() - 1));
}
}
I am decomposing a series of 90,000+ strings into a discrete list of the individual, non-duplicated pairs of words that are included in the strings with the rxcui id values associated with each string. I have developed a method which tries to accomplish this, but it is producing a lot of redundancy. Analysis of the data shows there are about 12,000 unique words in the 90,000+ source strings, after I clean and format the contents of the strings.
How can I change the code below so that it avoids creating the redundant rows in the destination 2D ArrayList (shown below the code)?
public static ArrayList<ArrayList<String>> getAllWords(String[] tempsArray){//int count = tempsArray.length;
int fieldslenlessthan2 = 0;//ArrayList<String> outputarr = new ArrayList<String>();
ArrayList<ArrayList<String>> twoDimArrayList= new ArrayList<ArrayList<String>>();
int idx = 0;
for (String s : tempsArray) {
String[] fields = s.split("\t");//System.out.println(" --- fields.length is: "+fields.length);
if(fields.length>1){
ArrayList<String> row = new ArrayList<String>();
System.out.println("fields[0] is: "+fields[0]);
String cleanedTerms = cleanTerms(fields[1]);
String[] words = cleanedTerms.split(" ");
for(int j=0;j<words.length;j++){
String word=words[j].trim();
word = word.toLowerCase();
if(isValidWord(word)){//outputarr.add(word);
System.out.println("words["+j+"] is: "+word);
row.add(word_id);//WORD_ID NEEDS TO BE CREATED BY SOME METHOD.
row.add(fields[0]);
row.add(word);
twoDimArrayList.add(row);
idx += 1;
}
}
}else{fieldslenlessthan2 += 1;}
}
System.out.println("........... fieldslenlessthan2 is: "+fieldslenlessthan2);
return twoDimArrayList;
}
The output of the above method currently looks like the following, with many rxcui values for some name values, and with many name values for some rxcui:
How do I change the code above so that the output is a list of unique pairs of name/rxcui values, summarizing all relevant data from the current output while removing only the redundancies?
If you just need a Collection of all words, use a HashSet. Sets are primarily used for contains logic. If you need to associate a value with your string, use a HashMap.
public HashSet<String> getUniqueWords(String[] stringArray) {
HashSet<String> uniqueWords = new HashSet<String>();
for (String str : stringArray) {
uniqueWords.add(str);
}
return uniqueWords;
}
This will give you a collection of all the unique Strings in your array. If you need an ID, use a HashMap:
String[] strList; // your String array
int idCounter = 0;
HashMap<String, Integer> stringIDMap = new HashMap<String, Integer>();
for (String str : strList) {
if (!stringIDMap.containsKey(str)) {
stringIDMap.put(str, idCounter);
idCounter++;
}
}
This will provide you a HashMap with unique String keys and unique Integer values. To get an id for a String you do this:
stringIDMap.get("myString"); // returns the Integer ID associated with the String "myString"
UPDATE
Based on the question update from the OP, I recommend creating an object that holds the String value and the rxcui. You can then place these objects in a Set or HashMap, using an implementation similar to the one provided above.
public MyObject(String str, int rxcui); // The constructor for your new object
MyObject mo1 = new MyObject("hello", 5);
Either
mySet.add(mo1);
will work or
myMap.put(mo1.getStr(), mo1.getRxcui());
What is the purpose of the unique word ID? Is the word itself not unique enough since you are not keeping duplicates?
A very basic way would be to keep a counter going as you are checking new words. For each word that doesn't already exist you could increase the counter and use the new value as the unique id.
Lastly, might I suggest you use a HashMap instead. It would allow you to both insert and retrieve words in O(1) time. I am not entirely sure what you are going for, but I think the HashMap might give you more range.
Edit2:
It would be something a little more along these lines. This should help you out.
public static Set<DataPair> getAllWords(String[] tempsArray) {
Set<DataPair> set = new HashSet<>();
for (String row : tempsArray) {
// PARSE YOUR STRING DATA
// the way you were doing it seemed fine but something like this
String[] rowArray = row.split(" ");
String word = rowArray[1];
int id = Integer.parseInt(rowArray[0]);
DataPair pair = new DataPair(word, id);
set.add(pair);
}
return set;
}
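class DataPair {
private final String word;
private final int id;
public DataPair(String word, int id) {
this.word = word;
this.id = id;
}
@Override
public boolean equals(Object o) {
if (o instanceof DataPair) {
return ((DataPair) o).word.equals(word) && ((DataPair) o).id == id;
}
return false;
}
// A HashSet only detects duplicates if hashCode agrees with equals
@Override
public int hashCode() {
return 31 * word.hashCode() + id;
}
}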
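To see the deduplication working end to end, here is a small self-contained sketch. It is an illustration under assumptions, not the asker's actual pipeline: UniquePairsSketch, WordRxcui and uniquePairs are hypothetical names, each input row is assumed to be an rxcui and a string of words separated by a tab (as in the question's split on "\t"), the rxcui is kept as a String, and the question's cleanTerms and isValidWord steps are left out.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class UniquePairsSketch {
    // Hypothetical value class pairing one word with one rxcui.
    static final class WordRxcui {
        private final String word;
        private final String rxcui;

        WordRxcui(String word, String rxcui) {
            this.word = word;
            this.rxcui = rxcui;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof WordRxcui)) return false;
            WordRxcui other = (WordRxcui) o;
            return word.equals(other.word) && rxcui.equals(other.rxcui);
        }

        @Override
        public int hashCode() {
            return Objects.hash(word, rxcui); // must agree with equals
        }
    }

    // Assumption: each row is "rxcui<TAB>words separated by whitespace".
    static Set<WordRxcui> uniquePairs(String[] rows) {
        Set<WordRxcui> pairs = new HashSet<>();
        for (String row : rows) {
            String[] fields = row.split("\t");
            if (fields.length < 2) {
                continue; // skip rows without a word column
            }
            for (String word : fields[1].trim().toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new WordRxcui(word, fields[0].trim()));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] rows = {"5\taspirin tablet", "5\tAspirin", "7\taspirin", "no-tab-here"};
        // (aspirin,5), (tablet,5), (aspirin,7): the repeated (aspirin,5) is dropped.
        System.out.println(uniquePairs(rows).size()); // 3
    }
}
```

A new element object is created for every word here, so no list instance is shared between entries, and the Set discards a pair as soon as an equal one is already present.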