Java Remove Duplicates from file search for String Array [0] - java

I have a long text file.
I want to remove duplicates from the file. The catch is that the key to compare is the first field of each line, where fields are separated by ":".
For example:
The file lines:
11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443
The result should be:
11234567:229283:29833204:2394803
43523:455543:54335434:53445
I need to keep the first line of each group of duplicates; the others should be ignored.
I tried this:
Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new HashSet<>(10000); // maybe should be bigger
String line11;
while ((line11 = reader11.readLine()) != null) {
lines11.add(line11);
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}
That is working, but it removes only when the complete line is duplicated.
How can I change it so that it looks for the first word in every line and checks for duplicates here; when no duplicate is found, save the complete line; if duplicate then ignore the line?

You need to maintain a Set<String> that holds only the first word of each line.
List<String> lines11;
Set<String> dups;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new ArrayList<>();
dups = new HashSet<>();
String line11;
while ((line11 = reader11.readLine()) != null) {
String first = line11.split(":")[0]; // assuming your separator is :
if (!dups.contains(first)) {
lines11.add(line11);
dups.add(first);
}
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}

I will show only the part that collects the lines.
Use a HashMap:
String tmp[] = null;
HashMap<String, String> lines = new HashMap<String, String>();
String line11 = "";
while ((line11 = reader11.readLine()) != null) {
tmp = line11.split(":");
if(!lines.containsKey(tmp[0])){
lines.put(tmp[0], line11);
}
}
So the loop adds only unique lines, using the first word as the key.
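One caveat with the map approach: a plain HashMap does not iterate in insertion order, so the output lines may come out shuffled. A minimal sketch of the same idea that keeps the original order, assuming the lines are already in memory and the separator is ":" (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FirstFieldDedup {
    // Keeps the first line seen for each key, where the key is the text before the first ':'.
    static List<String> keepFirstPerKey(List<String> lines) {
        // LinkedHashMap iterates in insertion order, unlike HashMap.
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String line : lines) {
            String key = line.split(":", 2)[0];
            // putIfAbsent leaves the existing entry alone, so later duplicates are ignored.
            firstByKey.putIfAbsent(key, line);
        }
        return new ArrayList<>(firstByKey.values());
    }
}
```

The returned list can then be written back to the file with the same BufferedWriter loop used in the other answers.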

You can keep the lines in a list and use an additional set that holds only the first word of each line. Try to add the first word of every new line to the set: if it is already there, add() does not insert it and returns false. Based on that return value you can add the line to the list, or write it directly to your new BufferedWriter.
List<String> lines11;
Set<String> uniqueRecords;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new ArrayList<>(); // no need to give size it will increase dynamically
uniqueRecords = new HashSet<>();
String line11;
while ((line11 = reader11.readLine()) != null) {
String firstWord = line11.split(":")[0]; // text before the first ':'
if(uniqueRecords.add(firstWord )){
lines11.add(line11);
}
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}

Related

Merge two array list into a TreeMap in java

I want to combine these two text files
Driver details text file:
AB11; Angela
AB22; Beatrice
Journeys text file:
AB22,Edinburgh ,6
AB11,Thunderdome,1
AB11,Station,5
And I want my output to be only the names and where the person has been. It should look like this:
Angela
Thunderdome
Station
Beatrice
Edinburgh
Here is my code. I'm not sure what I'm doing wrong, but I'm not getting the right output.
ArrayList<String> names = new ArrayList<String>();
TreeSet<String> destinations = new TreeSet<String>();
public TaxiReader() {
BufferedReader brName = null;
BufferedReader brDest = null;
try {
// Have the buffered readers start to read the text files
brName = new BufferedReader(new FileReader("taxi_details.txt"));
brDest = new BufferedReader(new FileReader("2017_journeys.txt"));
String line = brName.readLine();
String lines = brDest.readLine();
while (line != null && lines != null ){
// The input lines are split on the basis of certain characters that the text files use to split up the fields within them
String name [] = line.split(";");
String destination [] = lines.split(",");
// Add names and destinations to the different arraylists
String x = new String(name[1]);
//names.add(x);
String y = new String (destination[1]);
destinations.add(y);
// add arraylists to treemap
TreeMap <String, TreeSet<String>> taxiDetails = new TreeMap <String, TreeSet<String>> ();
taxiDetails.put(x, destinations);
System.out.println(taxiDetails);
// Reads the next line of the text files
line = brName.readLine();
lines = brDest.readLine();
}
// Catch blocks exist here to catch every potential error
} catch (FileNotFoundException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
// Finally block exists to close the files and handle any potential exceptions that can happen as a result
} finally {
try {
if (brName != null)
brName.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
public static void main (String [] args){
TaxiReader reader = new TaxiReader();
}
You are reading 2 files in parallel, I don't think that's gonna work too well. Try reading one file at a time.
Also you might want to rethink your data structures.
The first file relates a key "AB11" to a value "Angela". A map is better than an arraylist:
Map<String, String> names = new HashMap<String, String>();
String key = line.split(";")[0]; // "AB11" (this file is separated by ';')
String value = line.split(";")[1].trim(); // "Angela"
names.put(key, value);
names.get("AB11"); // "Angela"
Similarly, the second file relates a key "AB11" to multiple values "Thunderdome", "Station". You could also use a map for this:
Map<String, List<String>> destinations = new HashMap<String, List<String>>();
String key = line.split(",")[0]; // "AB11"
String value = line.split(",")[1]; // "Station"
if(destinations.get(key) == null) {
List<String> values = new LinkedList<String>();
values.add(value);
destinations.put(key, values);
} else {
// we already have a destination value stored for this key
// add a new destination to the list
List<String> values = destinations.get(key);
values.add(value);
}
To get the output you want:
// for each entry in the names map
for(Map.Entry<String, String> entry : names.entrySet()) {
String key = entry.getKey();
String name = entry.getValue();
// print the name
System.out.println(name);
// use the key to retrieve the list of destinations for this name
List<String> values = destinations.get(key);
for(String destination : values) {
// print each destination with a small indentation
System.out.println(" " + destination);
}
}
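Putting the pieces above together, here is a minimal sketch that works on lines already read from the two files. The class and method names are hypothetical, and it assumes the formats shown in the question ("AB11; Angela" and "AB11,Station,5"). A LinkedHashMap is used for the names so they print in file order; a TreeMap would sort them by key instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaxiJoin {
    // Returns each driver name followed by that driver's destinations.
    static List<String> report(List<String> driverLines, List<String> journeyLines) {
        // First file: driver id -> name, e.g. "AB11" -> "Angela".
        Map<String, String> names = new LinkedHashMap<>();
        for (String line : driverLines) {
            String[] parts = line.split(";");
            names.put(parts[0].trim(), parts[1].trim());
        }
        // Second file: driver id -> all destinations for that id.
        Map<String, List<String>> destinations = new HashMap<>();
        for (String line : journeyLines) {
            String[] parts = line.split(",");
            destinations.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                        .add(parts[1].trim());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : names.entrySet()) {
            out.add(entry.getValue());
            // A driver without journeys gets an empty list instead of null.
            out.addAll(destinations.getOrDefault(entry.getKey(), Collections.<String>emptyList()));
        }
        return out;
    }
}
```

Using getOrDefault avoids a NullPointerException for drivers that never appear in the journeys file.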

How to compare and edit two csv files in java depending on one column?

public class CompareCSV {
public static void main(String args[]) throws FileNotFoundException, IOException {
String path = "C:\\csv\\";
String file1 = "file1.csv";
String file2 = "file2.csv";
String file3 = "file3.csv";
ArrayList<String> al1 = new ArrayList<String>();
ArrayList<String> al2 = new ArrayList<String>();
BufferedReader CSVFile1 = new BufferedReader(new FileReader("/C:/Users/bida0916/Desktop/macro.csv"));
String dataRow1 = CSVFile1.readLine();
while (dataRow1 != null) {
String[] dataArray1 = dataRow1.split(",");
for (String item1 : dataArray1) {
al1.add(item1);
}
dataRow1 = CSVFile1.readLine();
}
CSVFile1.close();
BufferedReader CSVFile2 = new BufferedReader(new FileReader("C:/Users/bida0916/Desktop/Deprecated.csv"));
String dataRow2 = CSVFile2.readLine();
while (dataRow2 != null) {
String[] dataArray2 = dataRow2.split(",");
for (String item2 : dataArray2) {
al2.add(item2);
}
dataRow2 = CSVFile2.readLine();
}
CSVFile2.close();
for (String bs : al2) {
al1.remove(bs);
}
int size = al1.size();
System.out.println(size);
try {
FileWriter writer = new FileWriter("C:/Users/bida0916/Desktop/NewMacro.csv");
while (size != 0) {
size--;
writer.append("" + al1.get(size));
writer.append('\n');
}
writer.flush();
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
I want to compare two csv files in java and want to have the complete details removed of one csv file from the other by comparing the first column of both the files. Currently I am getting a csv file with one column only having all details jumbled up.
You are adding all values of all columns to a single list, that's why you get the mess in your output:
ArrayList<String> al1=new ArrayList<String>();
//...
String[] dataArray1 = dataRow1.split(",");
for (String item1:dataArray1)
{
al1.add(item1);
}
Add the complete string array from your file to your list, then you can access your data in a structured way:
List<String[]> al1 = new ArrayList<>();
//...
String[] dataArray1 = dataRow1.split(",");
al1.add(dataArray1);
But for removing rows I'd recommend using Maps for faster access, where the key is the element on which you decide which row to delete and the value is the full row from your csv file:
Map<String, String> al1 = new HashMap<>(); // or LinkedHashMap if row order is relevant
//...
String[] dataArray1 = dataRow1.split(",");
al1.put(dataArray1[0], dataRow1);
But be aware that if two rows in a file contain the same value in the first column, only one will be preserved. If that can happen, you might need to adapt that solution to store the data in a Map<String, Set<String>> or Map<String, List<String>>.
At this point I'd like to recommend to extract the file-reading to a separate method, which you can reuse to read both of your input-files and reduce duplicate code:
Map<String, String> al1 = readInputCsvFile(file1);
Map<String, String> al2 = readInputCsvFile(file2);
For the deletion of the lines which shall be removed, iterate over the key set of one of the maps and remove the entry from the other:
for (String key : al2.keySet()) {
al1.remove(key);
}
And for writing your output file, just write the row read from the original file as stored in the 'value' of your map.
for (String dataRow : al1.values()) {
writer.append(dataRow);
writer.append('\n');
}
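The readInputCsvFile method mentioned above is not shown in the answer. One possible shape for such a helper is sketched below; the class name, the method name readRows and the choice to accept a Reader (so it can be fed a FileReader or, for testing, a StringReader) are assumptions, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class CsvRows {
    // Reads all rows, keyed by the first column; the value is the untouched row text.
    static Map<String, String> readRows(Reader source) {
        // LinkedHashMap keeps the rows in file order.
        Map<String, String> rows = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String row;
            while ((row = reader.readLine()) != null) {
                if (row.isEmpty()) {
                    continue; // skip blank lines
                }
                // Limit 2: only the first column needs to be separated out.
                rows.put(row.split(",", 2)[0], row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

With this helper, the removal step becomes `al1.keySet().removeAll(al2.keySet());` after both maps have been read, for example via `CsvRows.readRows(new FileReader(path))`.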
EDIT
If you need to perform operations based on other data columns you should rather store the 'split-array' in the map instead of the full-line string read from the file. Then you have all data columns separately available:
Map<String, String[]> al2 = new HashMap<>();
//...
String[] dataArray2 = dataRow2.split(",");
al2.put(dataArray2[0], dataArray2);
You might then, e.g. add a condition for deleting:
for (Entry<String, String[]> entry : al2.entrySet()) {
String[] data = entry.getValue();
if ("delete".equals(data[17])) {
al1.remove(entry.getKey());
}
}
For writing your output file you have to rebuild the csv-format.
I'd recommend to use Apache commons-lang StringUtils for that task:
for (String[] data : al1.values()) {
writer.append(StringUtils.join(data, ","));
writer.append('\n');
}

Java: Read from txt file and store each word only once in array + sorting

I am having a problem with my program. What I am supposed to do is:
find all words from some txt files
store each word in an array only once
then sort alphabetically
I don't know how to ensure that each word won't appear twice (or more) in my array.
For example, a sentence from one of my files: My cat is huge and my dog is lazy.
I want the words "my" and "is" to appear only once in my array, not twice.
As for the sorting, is there anything built into Java that I can use? I don't know.
Any help is appreciated!
Here is what i have done so far:
try {
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
}
}
}
Here is the modified code to have sorted unique words:
try {
TreeSet<String> uniqueSortedWords = new TreeSet<String>();
File dir = new File(
"words.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream(dir)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line
.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(String token: tokens) {
uniqueSortedWords.add(token);
}
}
System.out.println(uniqueSortedWords);
//call uniqueSortedWords.toArray() to have output in an array
} catch (Exception e) {
e.printStackTrace();
}
I guess you are looking for code something like this.
try {
ArrayList<String> list = new ArrayList<String>();
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(int i=0; i<tokens.length; i++)
{ //Adding non-duplicates to arraylist
if (!list.contains(tokens[i]))
{
list.add(tokens[i]);
}
}
}
Collections.sort(list);
}
}
catch(Exception ex){}
Do not forget: import java.util.*; at the beginning of your code to use Collections.sort();
EDIT
Even though contains is a built-in method you can directly use with ArrayLists, this is how such a method works in fact (just in case if you are curious):
public static boolean ifContains(ArrayList<String> list, String name) {
for (String item : list) {
if (item.equals(name)) {
return true;
}
}
return false;
}
then to call it:
ifContains(list, tokens[i])
You can use a combination of HashSet and TreeSet.
HashSet: allows a null element.
TreeSet: does not allow null elements when using natural ordering; elements are sorted in ascending order by default.
Neither HashSet nor TreeSet holds duplicate elements.
try {
Set<String> list = new HashSet<>();
File f = new File("data.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");// other alternative: line.split("[,;!-]")
for (String token : tokens) {
list.add(token);
}
}
// Add the list to treeSet;Elements in treeSet are sorted
// Note: words must have the same case either lowercase or uppercase
// for sorting to work correctly
TreeSet<String> sortedSet = new TreeSet<>();
sortedSet.addAll(list);
Iterator<String> ite = sortedSet.iterator();
while (ite.hasNext()) {
System.out.println(ite.next());
}
} catch (Exception e) {
e.printStackTrace();
}
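As the comment in the code above notes, "My" and "my" count as different words unless the case is normalized. A minimal sketch that lowercases each token before adding it to a TreeSet is shown below. The class and method names are hypothetical, and the split pattern is a simplified stand-in for the one in the question, not an exact equivalent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class UniqueWords {
    // Returns every distinct word, lowercased, in alphabetical order.
    static List<String> sortedUnique(List<String> lines) {
        // TreeSet drops duplicates and keeps its elements sorted.
        TreeSet<String> words = new TreeSet<>();
        for (String line : lines) {
            // Assumption: words are separated by whitespace, commas, periods, colons or quotes.
            for (String token : line.split("[\\s,.:\"]+")) {
                if (!token.isEmpty()) {
                    words.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return new ArrayList<>(words);
    }
}
```

For the sentence from the question, "My cat is huge and my dog is lazy.", this yields and, cat, dog, huge, is, lazy, my. Call toArray(new String[0]) on the result if an array is required.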

Java - Adding sections of a txt file to an array

I have imported a .csv database file that lists the users of the program, along with other information in the form : UserName, Password, PropertyName, EstimatedValue.
I have figured how to get the username but it will only read the last username on the database and not the others. Help would be greatly appreciated.
import java.util.*;
import java.io.*;
public class readCSV
{
String[] userData;
public void checkLogin() throws IOException
{
try
{
File file = new File("C:/Users/Sean/Documents/Programming assigment/Users.csv");
BufferedReader bufRdr = new BufferedReader(new FileReader(file));
String lineRead = bufRdr.readLine();
while(lineRead != null)
{
this.userData = lineRead.split(",");
lineRead = bufRdr.readLine();
}
bufRdr.close();
}
catch(Exception er){
System.out.print(er);
System.exit(0);
}
}
}
The offending line is this:
this.userData = lineRead.split(",");
You should put it into some collection, e.g. a list
final List<String[]> userData = new LinkedList<String[]> ();
try
{
File file = new File("C:/Users/Sean/Documents/Programming assigment/Users.csv");
BufferedReader bufRdr = new BufferedReader(new FileReader(file));
String lineRead = bufRdr.readLine();
while(lineRead != null)
{
userData.add(lineRead.split(","));
lineRead = bufRdr.readLine(); // advance to the next line, otherwise the loop never ends
}
bufRdr.close();
}
catch(Exception er){
System.out.print(er);
System.exit(0);
}
Your line:
this.userData = lineRead.split(",");
overwrites the value of this.userData with each iteration, the result is that it just holds the value from the final iteration.
If you want to read many users you need a list that holds one String[] per user,
where the field is defined as
ArrayList<String[]> userDataList = new ArrayList<String[]>();
and in your loop:
while(lineRead != null)
{
this.userDataList.add(lineRead.split(","));
lineRead = bufRdr.readLine();
}
Your current code loops through all names, but overwrites the value in each iteration.
Finally only the last value is kept.
Your String[] (userData) is being overwritten on every iteration; you will have to store each one in an array or collection.
List<String[]> list = new ArrayList<String[]>();
while((lineRead=bufRdr.readLine())!= null)
{
this.userData = lineRead.split(",");
list.add(this.userData);
}
bufRdr.close();
To print the contents:
for(String[] str : list){
for(String s: str){
System.out.println(s);
}
}
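For reference, a self-contained version of this approach might look like the sketch below. The class and method names are hypothetical, and it assumes the four-field layout from the question (UserName, Password, PropertyName, EstimatedValue) with optional spaces after the commas. It takes a Reader so that it works with a FileReader as well as an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class UserRows {
    // Returns one String[] per line, so no row overwrites another.
    static List<String[]> readAll(Reader source) {
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split on commas, swallowing any spaces around them.
                rows.add(line.split("\\s*,\\s*"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Each element of the returned list is one user, and index 0 of each array is the username.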
The problem is that in your while loop you are assigning your string to the same variable... so once you have read the entire file.. the variable holds the last value only.
What you need to do is:
Vector<String> userData = new Vector<String>();
then in your loop...
userData.add(lineRead);
then later you can split each one and do additional processing at that time....

Compare values in two files

I have two files which should contain the same values in the first 10 characters of each line (substring 0 to 10), though not necessarily in the same order. I have managed to print the values from each file, but I need to know how to report when a value is in the first file and not in the second file, and vice versa. The files are in these formats. First file:
6436346346....Other details
9348734873....Other details
9349839829....Other details
Second file:
8484545487....Other details
9348734873....Other details
9349839829....Other details
The first record in the first file does not appear in the second file and the first record in the second file does not appear in the first file. I need to be able to report this mismatch in this format:
Record 6436346346 is in the firstfile and not in the secondfile.
Record 8484545487 is in the secondfile and not in the firstfile.
Here is the code I currently have that gives me the required Output from the two files to compare.
package compare.numbers;
import java.io.*;
/**
*
* @author implvcb
*/
public class CompareNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
File f = new File("C:/Analysis/");
String line;
String line1;
try {
String firstfile = "C:/Analysis/RL001.TXT";
FileInputStream fs = new FileInputStream(firstfile);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
System.out.println(account);
}
String secondfile = "C:/Analysis/RL003.TXT";
FileInputStream fs1 = new FileInputStream(secondfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fs1));
while ((line1 = br1.readLine()) != null) {
String account1 = line1.substring(0, 10);
System.out.println(account1);
}
} catch (Exception e) {
e.fillInStackTrace();
}
}
}
Please help on how I can effectively achieve this.
I should mention that I am new to Java and may not grasp the ideas that easily, but I am trying.
Here is the sample code to do that:
public static void eliminateCommon(String file1, String file2) throws IOException
{
List<String> lines1 = readLines(file1);
List<String> lines2 = readLines(file2);
Iterator<String> linesItr = lines1.iterator();
while (linesItr.hasNext()) {
String checkLine = linesItr.next();
if (lines2.contains(checkLine)) {
linesItr.remove();
lines2.remove(checkLine);
}
}
//now lines1 will contain string that are not present in lines2
//now lines2 will contain string that are not present in lines1
System.out.println(lines1);
System.out.println(lines2);
}
public static List<String> readLines(String fileName) throws IOException
{
List<String> lines = new ArrayList<String>();
FileInputStream fs = new FileInputStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
String line = null;
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
lines.add(account);
}
return lines;
}
Perhaps you are looking for something like this
Set<String> set1 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL001.TXT")));
Set<String> set2 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL003.TXT")));
Set<String> onlyInSet1 = new HashSet<>(set1);
onlyInSet1.removeAll(set2);
Set<String> onlyInSet2 = new HashSet<>(set2);
onlyInSet2.removeAll(set1);
If you guarantee that the files will always be the same format, and each readLine() function is going to return a different number, why not have an array of strings, rather than a single string. You can then compare the outcome with greater ease.
OK, first I would save the two sets of strings into collections
Set<String> s1 = new HashSet<String>(), s2 = new HashSet<String>();
//...
while ((line = br.readLine()) != null) {
//...
s1.add(line);
}
Then you can compare those sets and find elements that do not appear in both sets. You can find some ideas on how to do that here.
If you need to know the line number as well, you could just create a String wrapper:
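class Element {
public String str;
public int lineNr;
@Override
public boolean equals(Object other) {
return other instanceof Element && ((Element) other).str.equals(str);
}
@Override
public int hashCode() {
return str.hashCode(); // must agree with equals for HashSet to work
}
}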
Then you can just use Set<Element> instead.
Open two Scanners, and :
// Long rather than Integer: a 10-digit value such as 6436346346 does not fit in an int
final TreeSet<Long> ts1 = new TreeSet<Long>();
final TreeSet<Long> ts2 = new TreeSet<Long>();
while (scan1.hasNextLine()) {
ts1.add(Long.valueOf(scan1.nextLine().substring(0, 10)));
}
while (scan2.hasNextLine()) {
ts2.add(Long.valueOf(scan2.nextLine().substring(0, 10)));
}
You can now compare ordered results of the two trees
EDIT
Modified with TreeSet
Put values from each file to two separate HashSets accordingly.
Iterate over one of the HashSets and check whether each value exists in the other HashSet. Report if not.
Iterate over other HashSet and do same thing for this.
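The steps above can be sketched as follows, assuming the lines of both files have already been read into lists and that the record id is the first 10 characters of each line. The class and method names are hypothetical. A LinkedHashSet is used instead of a plain HashSet only so that the report follows the order of the files.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AccountDiff {
    // Collects the first 10 characters of every line that is long enough.
    static Set<String> ids(List<String> lines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.length() >= 10) {
                ids.add(line.substring(0, 10));
            }
        }
        return ids;
    }

    // Builds one message per id that appears in only one of the two files.
    static List<String> report(List<String> firstLines, List<String> secondLines) {
        Set<String> first = ids(firstLines);
        Set<String> second = ids(secondLines);
        List<String> messages = new ArrayList<>();
        for (String id : first) {
            if (!second.contains(id)) {
                messages.add("Record " + id + " is in the firstfile and not in the secondfile.");
            }
        }
        for (String id : second) {
            if (!first.contains(id)) {
                messages.add("Record " + id + " is in the secondfile and not in the firstfile.");
            }
        }
        return messages;
    }
}
```

Because set lookups take constant time on average, this stays fast for large files, whereas calling contains on a List scans the whole list for every record.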
