Compare values in two files - java

I have two files which should contain the same values in the first 10 characters of each line (substring(0, 10)), though not in the same order. I have managed to print the values from each file, but I need to know how to report when a value is in the first file and not in the second file, and vice versa. The files are in this format:
First file
6436346346....Other details
9348734873....Other details
9349839829....Other details
Second file
8484545487....Other details
9348734873....Other details
9349839829....Other details
The first record in the first file does not appear in the second file and the first record in the second file does not appear in the first file. I need to be able to report this mismatch in this format:
Record 6436346346 is in the firstfile and not in the secondfile.
Record 8484545487 is in the secondfile and not in the firstfile.
Here is the code I currently have. It prints the values from the two files that I need to compare.
package compare.numbers;
import java.io.*;
/**
*
* @author implvcb
*/
public class CompareNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
File f = new File("C:/Analysis/");
String line;
String line1;
try {
String firstfile = "C:/Analysis/RL001.TXT";
FileInputStream fs = new FileInputStream(firstfile);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
System.out.println(account);
}
String secondfile = "C:/Analysis/RL003.TXT";
FileInputStream fs1 = new FileInputStream(secondfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fs1));
while ((line1 = br1.readLine()) != null) {
String account1 = line1.substring(0, 10);
System.out.println(account1);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Please help on how I can effectively achieve this.
I should mention that I am new to Java and may not grasp the ideas that easily, but I am trying.

Here is the sample code to do that:
public static void eliminateCommon(String file1, String file2) throws IOException
{
List<String> lines1 = readLines(file1);
List<String> lines2 = readLines(file2);
Iterator<String> linesItr = lines1.iterator();
while (linesItr.hasNext()) {
String checkLine = linesItr.next();
if (lines2.contains(checkLine)) {
linesItr.remove();
lines2.remove(checkLine);
}
}
//now lines1 will contain string that are not present in lines2
//now lines2 will contain string that are not present in lines1
System.out.println(lines1);
System.out.println(lines2);
}
public static List<String> readLines(String fileName) throws IOException
{
List<String> lines = new ArrayList<String>();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
String line = null;
while ((line = br.readLine()) != null) {
// keep only the first 10 characters, which identify the record
String account = line.substring(0, 10);
lines.add(account);
}
}
return lines;
}

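Perhaps you are looking for something like this (FileUtils comes from Apache Commons IO). Note that it compares whole lines; to compare only the first 10 characters, add line.substring(0, 10) to the sets instead.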
Set<String> set1 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL001.TXT")));
Set<String> set2 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL003.TXT")));
Set<String> onlyInSet1 = new HashSet<>(set1);
onlyInSet1.removeAll(set2);
Set<String> onlyInSet2 = new HashSet<>(set2);
onlyInSet2.removeAll(set1);

If you can guarantee that the files will always be in the same format, and that each readLine() call returns a different number, why not store the values in an array of strings rather than in a single string? You can then compare the results more easily.

OK, first I would save the two sets of strings into collections
Set<String> s1 = new HashSet<String>(), s2 = new HashSet<String>();
//...
while ((line = br.readLine()) != null) {
//...
s1.add(line);
}
Then you can compare those sets and find elements that do not appear in both sets. You can find some ideas on how to do that here.
If you need to know the line number as well, you could just create a String wrapper:
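class Element {
public String str;
public int lineNr;
@Override
public boolean equals(Object other) {
// must override equals(Object), not overload it, or HashSet will not call it
return other instanceof Element && ((Element) other).str.equals(str);
}
@Override
public int hashCode() {
// equal elements must return the same hash code
return str.hashCode();
}
}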
Then you can just use Set<Element> instead.

Open two Scanners (scan1 and scan2, one per file), and:
final TreeSet<Long> ts1 = new TreeSet<Long>();
final TreeSet<Long> ts2 = new TreeSet<Long>();
// 10-digit values such as 6436346346 do not fit in an int, so parse them as long
while (scan1.hasNextLine()) {
ts1.add(Long.valueOf(scan1.nextLine().substring(0, 10)));
}
while (scan2.hasNextLine()) {
ts2.add(Long.valueOf(scan2.nextLine().substring(0, 10)));
}
You can now compare the ordered results of the two trees.
EDIT
Modified with TreeSet

Put the values from each file into two separate HashSets.
Iterate over one of the HashSets and check whether each value exists in the other HashSet. Report it if not.
Iterate over the other HashSet and do the same thing.
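The steps above can be sketched roughly as follows. This is only an illustration: the class name RecordDiff and the helpers keysOf and missingFrom are made up for this example, and the sample lines from the question are inlined instead of being read from C:/Analysis/RL001.TXT and RL003.TXT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original question or answers.
class RecordDiff {

    // Collects the first 10 characters of each line as the record key.
    static Set<String> keysOf(List<String> lines) {
        Set<String> keys = new HashSet<>();
        for (String line : lines) {
            if (line.length() >= 10) {          // skip lines too short to hold a key
                keys.add(line.substring(0, 10));
            }
        }
        return keys;
    }

    // Builds one report line for every key that is in 'source' but not in 'other'.
    static List<String> missingFrom(Set<String> source, Set<String> other,
                                    String sourceName, String otherName) {
        List<String> report = new ArrayList<>();
        for (String key : source) {
            if (!other.contains(key)) {
                report.add("Record " + key + " is in the " + sourceName
                        + " and not in the " + otherName + ".");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Set<String> first = keysOf(Arrays.asList(
                "6436346346....Other details",
                "9348734873....Other details",
                "9349839829....Other details"));
        Set<String> second = keysOf(Arrays.asList(
                "8484545487....Other details",
                "9348734873....Other details",
                "9349839829....Other details"));

        // Check both directions, as described in the steps.
        missingFrom(first, second, "firstfile", "secondfile").forEach(System.out::println);
        missingFrom(second, first, "secondfile", "firstfile").forEach(System.out::println);
    }
}
```

With real files, the lists passed to keysOf could come from a BufferedReader loop like the one in the question. A HashSet does not keep insertion order, so if several records are missing they may be reported in any order.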

Related

Read Tab-Separated-Columns into Lists - Java

Tab-Separated File:
2019-06-06 10:00:00 1.0
2019-06-06 11:00:00 2.0
I'd like to iterate over the file once and add the value of each column to a list.
My working approach would be:
import java.util.*;
import java.io.*;
public class Program {
public static void main(String[] args)
{
ArrayList<Double> List_1 = new ArrayList<Double>();
ArrayList<Double> List_2 = new ArrayList<Double>();
String[] values = null;
String fileName = "File.txt";
File file = new File(fileName);
try
{
Scanner inputStream = new Scanner(file);
while (inputStream.hasNextLine()){
try {
String data = inputStream.nextLine();
values = data.split("\\t");
if (values[1] != null && !values[1].isEmpty() == true) {
double val_1 = Double.parseDouble(values[1]);
List_1.add(val_1);
}
if (values[2] != null && !values[2].isEmpty() == true) {
double val_2 = Double.parseDouble(values[2]);
List_2.add(val_2);
}
}
catch (ArrayIndexOutOfBoundsException exception){
}
}
inputStream.close();
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
System.out.println(List_1);
System.out.println(List_2);
}
}
I get:
[1.0]
[2.0]
It doesn't work without the checks for null, isEmpty and the ArrayIndexOutOfBoundsException.
I would appreciate any hints on how to save a few lines while keeping the scanner approach.
One option is to create a Map of Lists using the column number as a key. This approach gives you an "unlimited" number of columns and exactly the same output as the one in the question.
public class Program {
public static void main(String[] args) throws Exception
{
Map<Integer, List<Double>> listMap = new TreeMap<Integer, List<Double>>();
String[] values = null;
String fileName = "File.csv";
File file = new File(fileName);
Scanner inputStream = new Scanner(file);
while (inputStream.hasNextLine()){
String data = inputStream.nextLine();
values = data.split("\\t");
for (int column = 1; column < values.length; column++) {
List<Double> list = listMap.get(column);
if (list == null) {
listMap.put(column, list = new ArrayList<Double>());
}
if (!values[column].isEmpty()) {
list.add(Double.parseDouble(values[column]));
}
}
}
inputStream.close();
for(List<Double> list : listMap.values()) {
System.out.println(list);
}
}
}
You can clean up your code somewhat by using try-with-resources to open and close the Scanner for you:
try (Scanner inputStream = new Scanner(file))
{
//your code...
}
This is useful because the inputStream will be closed automatically once the try block is left and you will not need to close it manually with inputStream.close();.
Additionally if you really want to "save lines" you can also combine these steps:
double val_2 = Double.parseDouble(values[2]);
List_2.add(val_2);
Into a single step each, since you do not actually use the val_2 anywhere else:
List_2.add(Double.parseDouble(values[2]));
Finally you are also using !values[1].isEmpty() == true which is comparing a boolean value to true. This is typically bad practice and you can reduce it to !values[1].isEmpty() instead which will have the same functionality. Try not to use == with booleans as there is no need.
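Putting these suggestions together, a shortened Scanner version might look roughly like the sketch below. The class name ColumnReader and the helpers addIfPresent and readColumns are invented for this example, and the sample text (with a tab before each numeric column) is an assumption about the file layout. String.split never returns null elements, so the null check can go, and a bounds check replaces the caught ArrayIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question or answers.
class ColumnReader {

    // Parses values[index] and appends it to 'target' if the cell exists and is not empty.
    static void addIfPresent(String[] values, int index, List<Double> target) {
        if (index < values.length && !values[index].isEmpty()) {
            target.add(Double.parseDouble(values[index]));
        }
    }

    // Reads the input once and fills both lists from columns 1 and 2.
    static void readColumns(Scanner in, List<Double> list1, List<Double> list2) {
        while (in.hasNextLine()) {
            String[] values = in.nextLine().split("\t");
            addIfPresent(values, 1, list1);
            addIfPresent(values, 2, list2);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: timestamp, tab, first value, tab, second value.
        String sample = "2019-06-06 10:00:00\t1.0\t2.0\n"
                      + "2019-06-06 11:00:00\t\t4.0\n";
        List<Double> list1 = new ArrayList<>();
        List<Double> list2 = new ArrayList<>();
        // try-with-resources closes the Scanner automatically.
        try (Scanner in = new Scanner(sample)) {
            readColumns(in, list1, list2);
        }
        System.out.println(list1);
        System.out.println(list2);
    }
}
```

To read from disk instead, the Scanner could be constructed from new File("File.txt") as in the question, which additionally requires handling FileNotFoundException.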
You can do it like this:
BufferedReader bfr = Files.newBufferedReader(Paths.get("inputFileDir.tsv"));
String line = null;
List<List<String>> listOfLists = new ArrayList<>(100);
while((line = bfr.readLine()) != null) {
String[] cols = line.split("\\t");
List<String> outputList = new ArrayList<>(Arrays.asList(cols)); // ArrayList has no constructor that takes an array
//at this line your expected list of cols of each line is ready to use.
listOfLists.add(outputList);
}
This is actually simple code in Java. Since you seem to be a beginner in Java and write code like a Python programmer, I decided to write a sample to give you a good starting point. Good luck.

How to compare 2 files and remove non-existent lines?

I'm trying to remove from file 1 every line that does not exist in file 2.
Example:
Input
file 1
text
example
word
file 2
example
word
Output
file 1
example
word
My code is totally the opposite: it eliminates all duplicate words in the 2 files.
My actual output is:
file 1
text
Code
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader = new BufferedReader(new FileReader(file1));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
Set set3 = new HashSet(lines);
set3.removeAll(lines2);
You need the intersection of the two sets. Right now you are calculating the difference between them (the lines of file 1 that are not in file 2).
public static void main(String []args){
Set<String> file1 = new HashSet<>();
Set<String> file2 = new HashSet<>();
file1.add("text");
file1.add("example");
file1.add("word");
file2.add("example");
file2.add("word");
Set<String> intersection = new HashSet<>(file1);
intersection.retainAll(file2);
System.out.println(intersection);
}
Output:
[word, example]
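As the output above shows, a HashSet does not keep the lines in file order. If the order of file 1 matters, one possible variation is to walk file 1 as a list and use a set built from file 2 only for lookups. The class name KeepCommonLines and the method keepCommon are made up for this sketch, and the file contents are inlined as lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original question or answers.
class KeepCommonLines {

    // Returns the lines of file 1 that also occur in file 2, in file 1's order.
    static List<String> keepCommon(List<String> file1Lines, List<String> file2Lines) {
        Set<String> lookup = new HashSet<>(file2Lines); // fast membership checks
        List<String> kept = new ArrayList<>();
        for (String line : file1Lines) {
            if (lookup.contains(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("text", "example", "word");
        List<String> file2 = Arrays.asList("example", "word");
        System.out.println(keepCommon(file1, file2)); // prints [example, word]
    }
}
```

With real files, the two lists could be loaded with Files.readAllLines(Paths.get(...)) and the result written back with Files.write.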
OK, you are almost there with your approach. All you are missing is one more line of code, where you call
lines.removeAll(set3);
then you have the set (lines) with the needed result.
In your original code, you read in file 2 then file 1 and just removed the words in file2 from file1, leaving the one different word.
Here I wrote out the code, and commented. You needed to have a set that then removed that one word from the complete list.
In my code I made a new set, in case you want to rebuild the first set and leave it unmodified.
package scrapCompare;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class CompareLines {
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
//You create a set of words from file 1.
BufferedReader reader = new BufferedReader(new FileReader("file1"));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
//You create a set of words from file 2.
BufferedReader reader2 = new BufferedReader(new FileReader("file2"));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
//In your original code, you create a third set of words equal to file 1, and then delete all the words from file 2.
//It isolates the one different word, but you stopped there.
Set<String> set3 = new HashSet<String>(lines);
set3.removeAll(lines2);
lines.removeAll(set3);
//the answer set is made, in case you want to rebuild the lines set.
Set <String> answer = lines;
//iterator for printing to console.
Iterator<String> itr = answer.iterator();
//print the answer to console
while(itr.hasNext())
System.out.println(itr.next());
//close your readers
reader.close();
reader2.close();
}
}
public class RemoveLine {
public static void main(String[] args) throws IOException {
String file = "../file.txt";
String file1 = "../file1.txt";
String file2 = "../file2.txt";
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
Set<String> lines1 = new HashSet<String>(10000);
String line1;
while ((line1 = reader1.readLine()) != null) {
lines1.add(line1);
}
// keep only the lines of file 1 that also occur in file 2
Set<String> outPut = lines1.stream().filter(lines2::contains).collect(Collectors.toSet());
Charset utf8 = StandardCharsets.UTF_8;
// with no explicit options, Files.write creates the file or truncates an existing one
Files.write(Paths.get(file), outPut, utf8);
}
}

Java: Read from txt file and store each word only once in array + sorting

I am having a problem with my program. What I am supposed to do is:
find all words from some txt files
store each word in an array only once
then sort them alphabetically
I don't know how to ensure that each word won't appear twice (or more) in my array.
For example, a sentence from one of my files: My cat is huge and my dog is lazy.
I want the words "my" and "is" to appear only once in my array, not twice.
As for the sorting, is there anything in Java that I can use? I don't know.
Any help is appreciated!
Here is what i have done so far:
try {
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
}
}
}
Here is the modified code to have sorted unique words:
try {
TreeSet<String> uniqueSortedWords = new TreeSet<String>();
File dir = new File(
"words.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream(dir)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line
.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(String token: tokens) {
uniqueSortedWords.add(token);
}
}
System.out.println(uniqueSortedWords);
//call uniqueSortedWords.toArray() to have output in an array
} catch (Exception e) {
e.printStackTrace();
}
I guess you are looking for code something like this.
try {
ArrayList<String> list = new ArrayList<String>();
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(int i=0; i<tokens.length; i++)
{ //Adding non-duplicates to arraylist
if (!list.contains(tokens[i]))
{
list.add(tokens[i]);
}
}
}
br.close();
}
//sort once, after all files have been read
Collections.sort(list);
}
catch(Exception ex){
ex.printStackTrace();
}
Do not forget: import java.util.*; at the beginning of your code to use Collections.sort();
EDIT
Even though contains is a built-in method you can directly use with ArrayLists, this is how such a method works in fact (just in case if you are curious):
public static boolean ifContains(ArrayList<String> list, String name) {
for (String item : list) {
if (item.equals(name)) {
return true;
}
}
return false;
}
then to call it:
ifContains(list, tokens[i])
You can use a combination of HashSet and TreeSet.
HashSet: allows a null element.
TreeSet: does not allow null elements when using natural ordering; its elements are sorted in ascending order by default.
Neither HashSet nor TreeSet holds duplicate elements.
try {
Set<String> list = new HashSet<>();
File f = new File("data.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");// other alternative: line.split("[,;!-]")
for (String token : tokens) {
list.add(token);
}
}
// Add the list to treeSet;Elements in treeSet are sorted
// Note: words must have the same case either lowercase or uppercase
// for sorting to work correctly
TreeSet<String> sortedSet = new TreeSet<>();
sortedSet.addAll(list);
Iterator<String> ite = sortedSet.iterator();
while (ite.hasNext()) {
System.out.println(ite.next());
}
} catch (Exception e) {
e.printStackTrace();
}
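The note in the code above about letter case can be handled by lower-casing each token before it goes into the set. The following is a minimal sketch under some assumptions: the class name UniqueWords and the method sortedUniqueWords are invented for this example, and the delimiter "\\W+" (any run of non-word characters) is a simplification that replaces the regex from the question.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of the original question or answers.
class UniqueWords {

    // Collects the distinct words of the given lines, lower-cased and sorted.
    static String[] sortedUniqueWords(Iterable<String> lines) {
        Set<String> words = new TreeSet<>();   // sorted, no duplicates
        for (String line : lines) {
            // Assumed delimiter: any run of non-word characters.
            for (String token : line.split("\\W+")) {
                if (!token.isEmpty()) {        // split can yield a leading empty token
                    words.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = sortedUniqueWords(
                Arrays.asList("My cat is huge and my dog is lazy."));
        System.out.println(Arrays.toString(result));
        // prints [and, cat, dog, huge, is, lazy, my]
    }
}
```

Because "My" and "my" both become "my", the word is stored only once, and the returned array is already in alphabetical order.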

Java Remove Duplicates from file search for String Array [0]

I have a long text file.
Now I want to remove duplicates from the file. The problem is that the search key is the first word of each line, where the words are separated by ":"
For example:
The file lines:
11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443
The result should be:
11234567:229283:29833204:2394803
43523:455543:54335434:53445
I need to keep the first line of each group of duplicates; the others should be ignored.
I tried this:
Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new HashSet<>(10000); // maybe should be bigger
String line11;
while ((line11 = reader11.readLine()) != null) {
lines11.add(line11);
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}
That is working, but it removes only when the complete line is duplicated.
How can I change it so that it looks for the first word in every line and checks for duplicates here; when no duplicate is found, save the complete line; if duplicate then ignore the line?
You need to maintain a Set<String> that holds only the first word of each line.
List<String> lines11;
Set<String> dups;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new ArrayList<>();
dups = new HashSet<>();
String line11;
while ((line11 = reader11.readLine()) != null) {
String first = line11.split(":")[0]; // assuming your separator is :
if (!dups.contains(first)) {
lines11.add(line11);
dups.add(first);
}
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}
I will only show the part that adds the lines to the collection.
Use a HashMap:
String tmp[] = null;
HashMap<String, String> lines = new HashMap<String, String>();
String line11 = "";
while ((line11 = reader11.readLine()) != null) {
tmp = line11.split(":");
if(!lines.containsKey(tmp[0])){
lines.put(tmp[0], line11);
}
}
So the loop will add only unique lines, using the first word as the key. Use a LinkedHashMap instead if the original line order must be preserved.
You can keep the data in a list and use an additional set that holds only the first word of each line. Set.add returns false if the word is already in the set, so you can use its return value to decide whether to add the line to the list or write it directly to your BufferedWriter.
List<String> lines11;
Set<String> uniqueRecords;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
lines11 = new ArrayList<>(); // no need to give size it will increase dynamically
uniqueRecords = new HashSet<>();
String line11;
while ((line11 = reader11.readLine()) != null) {
String firstWord = line11.split(":")[0]; // the first word ends at the first ":"
if(uniqueRecords.add(firstWord )){
lines11.add(line11);
}
}
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
for (String unique : lines11) {
writer11.write(unique);
writer11.newLine();
}
}

How do I read from a File to an array

I am trying to read from a file to an array. I tried two different styles and both aren't working. Below are the two styles.
Style 1
public class FileRead {
int i;
String a[] = new String[2];
public void read() throws FileNotFoundException {
//Z means: "The end of the input but for the final terminator, if any"
a[i] = new Scanner(new File("C:\\Users\\nnanna\\Documents\\login.txt")).useDelimiter("\\n").next();
for(i=0; i<=a.length; i++){
System.out.println("" + a[i]);
}
}
public static void main(String args[]) throws FileNotFoundException{
new FileRead().read();
}
}
Style 2
public class FileReadExample {
private int j = 0;
String path = null;
public void fileRead(File file){
StringBuilder attachPhoneNumber = new StringBuilder();
try{
FileReader read = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(read);
while((path = bufferedReader.readLine()) != null){
String a[] = new String[3];
a[j] = path;
j++;
System.out.println(path);
System.out.println(a[j]);
}
bufferedReader.close();
}catch(IOException exception){
exception.printStackTrace();
}
}
I need it to read each line of string and store each line in an array. But neither works. How do I go about it?
Do yourself a favor and use a library that provides this functionality for you, e.g.
Guava:
// one String per File
String data = Files.toString(file, Charsets.UTF_8);
// or one String per Line
List<String> data = Files.readLines(file, Charsets.UTF_8);
Commons / IO:
// one String per File
String data = FileUtils.readFileToString(file, "UTF-8");
// or one String per Line
List<String> data = FileUtils.readLines(file, "UTF-8");
It's not really clear exactly what you're trying to do (partly with quite a lot of code commented out, leaving other code which won't even compile), but I'd recommend you look at using Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
That way you don't need to mess around with the file handling yourself at all.
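If adding a library is not an option, the standard library (Java 7 and later) offers something similar through java.nio.file.Files.readAllLines. The sketch below is only an illustration: the class name LinesToArray is made up, and a temporary file with invented contents stands in for login.txt from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original question or answers.
class LinesToArray {

    // Reads every line of the file into a String array, one element per line.
    static String[] readLines(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file with made-up contents stands in for login.txt.
        Path demo = Files.createTempFile("login", ".txt");
        Files.write(demo, Arrays.asList("alice", "secret"), StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(readLines(demo))); // prints [alice, secret]

        Files.delete(demo);
    }
}
```

The array is sized from the number of lines actually read, which avoids the fixed-size arrays used in both styles in the question.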
