How to compare 2 files and remove non-existent lines? - java

I'm trying to remove from file 1 every line that does not exist in file 2.
Example:
Input
file 1
text
example
word
file 2
example
word
Output
file 1
example
word
My code is totally the opposite: it eliminates all duplicate words in the 2 files.
My actual output is:
file 1
text
Code
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader = new BufferedReader(new FileReader(file1));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
Set set3 = new HashSet(lines);
set3.removeAll(lines2);

You need the intersection of the two sets. Right now you are calculating the difference (the lines of file 1 that are not in file 2), which is why only "text" is left.
public static void main(String []args){
Set<String> file1 = new HashSet<>();
Set<String> file2 = new HashSet<>();
file1.add("text");
file1.add("example");
file1.add("word");
file2.add("example");
file2.add("word");
Set<String> intersection = new HashSet<>(file1);
intersection.retainAll(file2);
System.out.println(intersection);
}
Output:
[word, example]
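A HashSet does not preserve the order of the lines, as the output above shows. If the order of file 1 matters, one option is to filter its lines against a set built from file 2. The following is only a minimal sketch, assuming Java 8 or later; the class name KeepCommonLines and the helper names keepCommon and rewrite are made up for illustration, and the file paths are taken from the command line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeepCommonLines {

    // Returns the lines of 'first' that also occur in 'second', in their original order.
    static List<String> keepCommon(List<String> first, List<String> second) {
        Set<String> allowed = new HashSet<>(second); // fast membership checks
        List<String> kept = new ArrayList<>();
        for (String line : first) {
            if (allowed.contains(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    // Rewrites file1 in place so that it only contains lines present in file2.
    static void rewrite(Path file1, Path file2) throws IOException {
        List<String> kept = keepCommon(
                Files.readAllLines(file1, StandardCharsets.UTF_8),
                Files.readAllLines(file2, StandardCharsets.UTF_8));
        Files.write(file1, kept, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Hypothetical usage: java KeepCommonLines file1.txt file2.txt
            rewrite(Paths.get(args[0]), Paths.get(args[1]));
            return;
        }
        // Without arguments, run on the in-memory data from the question.
        List<String> file1 = Arrays.asList("text", "example", "word");
        List<String> file2 = Arrays.asList("example", "word");
        System.out.println(keepCommon(file1, file2)); // [example, word]
    }
}
```

Unlike retainAll on a HashSet, this keeps duplicate lines of file 1 if they also occur in file 2.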

OK, you are almost there with your approach. All you are missing is one more line of code, where you call
lines.removeAll(set3);
Then the set (lines) holds the result you need.

In your original code, you read file 2, then file 1, and removed the words in file 2 from file 1, which leaves only the one word that differs.
I wrote out the code below with comments. What was missing is a final step that removes that one word from the complete list.
I put the result in a new set, in case you want to keep the first set unmodified.
package scrapCompare;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
public class CompareLines {
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
//You create a set of words from file 1.
BufferedReader reader = new BufferedReader(new FileReader("file1"));
Set<String> lines = new HashSet<String>(10000);
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
//You create a set of words from file 2.
BufferedReader reader2 = new BufferedReader(new FileReader("file2"));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
//In your original code, you create a third set of words equal to file 1, and then delete all the words from file 2.
//It isolates the one different word, but you stopped there.
Set<String> set3 = new HashSet<String>(lines);
set3.removeAll(lines2);
//The answer set is a copy of lines with the isolated words removed, so lines itself stays unmodified.
Set<String> answer = new HashSet<String>(lines);
answer.removeAll(set3);
//iterator for printing to console.
Iterator<String> itr = answer.iterator();
//print the answer to console
while(itr.hasNext())
System.out.println(itr.next());
//close your readers
reader.close();
reader2.close();
}
}

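import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
public class RemoveLine {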
public static void main(String[] args) throws IOException {
String file = "../file.txt";
String file1 = "../file1.txt";
String file2 = "../file2.txt";
BufferedReader reader2 = new BufferedReader(new FileReader(file2));
Set<String> lines2 = new HashSet<String>(10000);
String line2;
while ((line2 = reader2.readLine()) != null) {
lines2.add(line2);
}
BufferedReader reader1 = new BufferedReader(new FileReader(file1));
Set<String> lines1 = new HashSet<String>(10000);
String line1;
while ((line1 = reader1.readLine()) != null) {
lines1.add(line1);
}
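// Keep only the lines of file1 that also exist in file2 (a set lookup instead of a nested stream).
Set<String> outPut = lines1.stream().filter(lines2::contains).collect(Collectors.toSet());
Charset utf8 = StandardCharsets.UTF_8;
// TRUNCATE_EXISTING prevents leftover content if the output file already exists.
Files.write(Paths.get(file), outPut, utf8, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
reader1.close();
reader2.close();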
}
}

Related

How would I transfer 5-letter words into a new text file?

Let's say I have a txt file that has the whole dictionary in it. How would I make this code transfer only 5-letter words into a newly created txt file?
import java.io.*;
public class wordwebster {
public static void main(String[] args) throws IOException {
int five = 0;
File directory = new File(".");
String webster = directory.getCanonicalPath() + File.separator+ "webster.txt";
String fiveLetterWords = directory.getCanonicalPath()+ File.separator +"fiveLetterWords.txt";
File fin = new File(webster);
FileInputStream file = new FileInputStream(fin);
BufferedReader input = new BufferedReader(new InputStreamReader(file));
FileWriter fileStream = new FileWriter(fiveLetterWords,true);
BufferedWriter output = new BufferedWriter(fileStream);
String line = null;
while ((line = input.readLine())!= null){
output.write(line);
output.newLine();
}
input.close();
output.close();
}
}
EDIT:
As asked, let's say the input file (webster.txt) contain the words
Sentence
Frequent
Hello
Send
Variety
False
I would need only five letter words be extracted (Hello and False) and be put into a new file (fiveLetterWords.txt).
If you need to allow only words whose length is exactly five, you can just put an if condition to check before writing into file. Modify your while loop to this,
while ((line = input.readLine()) != null) {
if (line.trim().length() == 5) {
output.write(line);
output.newLine();
}
}
Hope this helps. Let me know if you face any issues.
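The same length check can also be written as a stream filter. This is only a sketch, assuming Java 8 or later; the class name FiveLetterWords and the helper withLength are hypothetical, and the sample words are the ones from the question. For real files, the list could come from Files.readAllLines and the result could be written with Files.write.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FiveLetterWords {

    // Keeps only the words whose trimmed length equals 'length'.
    static List<String> withLength(List<String> words, int length) {
        return words.stream()
                .map(String::trim)
                .filter(word -> word.length() == length)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> webster = Arrays.asList(
                "Sentence", "Frequent", "Hello", "Send", "Variety", "False");
        System.out.println(withLength(webster, 5)); // [Hello, False]
    }
}
```

Note that this version stores the trimmed word, whereas the loop above writes the line exactly as it was read.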

Java: Read from txt file and store each word only once in array + sorting

I am having a problem with my program. What I am supposed to do is:
find all words from some txt files
store each word in an array only once
then sort the words alphabetically
I don't know how to ensure that each word won't appear twice (or more) in my array.
For example, a sentence from one of my files: My cat is huge and my dog is lazy.
I want the words "my" and "is" to appear only once in my array, not twice.
As for the sorting, is there anything built into Java that I can use? I don't know.
Any help is appreciated!
Here is what I have done so far:
try {
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
}
}
}
Here is the modified code to have sorted unique words:
try {
TreeSet<String> uniqueSortedWords = new TreeSet<String>();
File dir = new File(
"words.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream(dir)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line
.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(String token: tokens) {
uniqueSortedWords.add(token);
}
}
System.out.println(uniqueSortedWords);
//call uniqueSortedWords.toArray() to have output in an array
} catch (Exception e) {
e.printStackTrace();
}
I guess you are looking for code something like this.
try {
ArrayList<String> list = new ArrayList<String>();
File dir = new File("path of folder that contains my files");
for (File f : dir.listFiles()) {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while((line = br.readLine())!= null) {
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
for(int i=0; i<tokens.length; i++)
{ //Adding non-duplicates to arraylist
if (!list.contains(tokens[i]))
{
list.add(tokens[i]);
}
}
}
br.close();
}
//Sort once, after all files have been read
Collections.sort(list);
}
catch(Exception ex){
ex.printStackTrace();
}
Do not forget: import java.util.*; at the beginning of your code to use Collections.sort();
EDIT
Even though contains is a built-in method you can directly use with ArrayLists, this is how such a method works in fact (just in case if you are curious):
public static boolean ifContains(ArrayList<String> list, String name) {
for (String item : list) {
if (item.equals(name)) {
return true;
}
}
return false;
}
then to call it:
ifContains(list, tokens[i])
You can use a combination of HashSet and TreeSet.
HashSet: allows a null element.
TreeSet: does not allow null elements (with the default natural ordering); elements are sorted in ascending order by default.
Neither HashSet nor TreeSet holds duplicate elements.
try {
Set<String> list = new HashSet<>();
File f = new File("data.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)));
String line = null;
while ((line = br.readLine()) != null) {
String[] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");// other alternative: line.split("[,;!-]")
for (String token : tokens) {
list.add(token);
}
}
// Add the list to treeSet;Elements in treeSet are sorted
// Note: words must have the same case either lowercase or uppercase
// for sorting to work correctly
TreeSet<String> sortedSet = new TreeSet<>();
sortedSet.addAll(list);
Iterator<String> ite = sortedSet.iterator();
while (ite.hasNext()) {
System.out.println(ite.next());
}
} catch (Exception e) {
e.printStackTrace();
}
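The note about case in the comments above matters for the question's example sentence, where "My" and "my" should count as the same word. One way to handle that is to give the TreeSet a case-insensitive comparator. The following is a minimal sketch under that assumption; the class name UniqueSortedWords, the helper uniqueSorted, and the simplified split pattern (any run of non-letters) are illustrative and not taken from the answers above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class UniqueSortedWords {

    // Splits the text into words and returns them once each, sorted ignoring case.
    static String[] uniqueSorted(String text) {
        // String.CASE_INSENSITIVE_ORDER makes "My" and "my" count as the same element.
        Set<String> words = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty first token
                words.add(token);
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] result = uniqueSorted("My cat is huge and my dog is lazy.");
        System.out.println(Arrays.toString(result)); // [and, cat, dog, huge, is, lazy, My]
    }
}
```

The set keeps the first spelling it sees, so "My" is stored and the later "my" is ignored.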

Java FileReader: How to assign every line in the text file to a variable

I want every line in my textdoc to be assigned to a variable.
import java.io.*;
import static java.lang.System.*;
class readfile {
public static void main(String[] args) {
try {
FileReader fr = new FileReader("filename");
BufferedReader br = new BufferedReader(fr);
String str;
while ((str = br.readLine()) != null) {}
br.close();
} catch (IOException e) {
out.println("file not found");
}
}
}
I would suggest you create a List and store every line in a list like below:
String str;
List<String> fileText = ....;
while ((str = br.readLine()) != null) {
fileText.add(str);
}
A Java 8 solution for creating a List of lines
Path path = Paths.get("filename");
List<String> lines = Files.lines(path).collect(Collectors.toList());
Why do you want to assign each line to a separate variable? It is better to add the lines to a list; then you can access any line by its index.
In JDK 6 or below
List<String> lines = new ArrayList<String>();
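String line;
// readLine() returns null at end of file; ready() is not a reliable end-of-file check
while ((line = reader.readLine()) != null)
lines.add(line);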
In JDK 7 or above
List<String> lines = Files.readAllLines(Paths.get(fileName),
Charset.defaultCharset());
I would do the following (assuming scanner is a java.util.Scanner opened on the file):
List<String> allText = new ArrayList<String>();
while (scanner.hasNextLine()) {
allText.add(scanner.nextLine());
}
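The list-based suggestions above can be combined with try-with-resources so that the reader is always closed. This is only a sketch, assuming Java 7 or later; the class name LinesToList and the helper readAll are hypothetical, and a StringReader stands in for the file so the example runs without one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LinesToList {

    // Reads every line from 'source' into a list and closes the reader afterwards.
    static List<String> readAll(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String str;
            while ((str = br.readLine()) != null) {
                lines.add(str);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // StringReader stands in for the file; with a real file, pass new FileReader("filename").
        List<String> lines = readAll(new StringReader("first\nsecond\nthird"));
        System.out.println(lines.get(1)); // second
        System.out.println(lines.size()); // 3
    }
}
```

Each line is then available by index, which replaces the need for one variable per line.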

Compare values in two files

I have two files which should contain the same values in the first ten characters of each line (substring 0 to 10), though not in the same order. I have managed to print the values from each file, but I need to know how to report when a value is in the first file and not in the second file, and vice versa. The files are in this format:
6436346346....Other details
9348734873....Other details
9349839829....Other details
second file
8484545487....Other details
9348734873....Other details
9349839829....Other details
The first record in the first file does not appear in the second file and the first record in the second file does not appear in the first file. I need to be able to report this mismatch in this format:
Record 6436346346 is in the firstfile and not in the secondfile.
Record 8484545487 is in the secondfile and not in the firstfile.
Here is the code I currently have that gives me the required Output from the two files to compare.
package compare.numbers;
import java.io.*;
/**
*
* @author implvcb
*/
public class CompareNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
File f = new File("C:/Analysis/");
String line;
String line1;
try {
String firstfile = "C:/Analysis/RL001.TXT";
FileInputStream fs = new FileInputStream(firstfile);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
System.out.println(account);
}
String secondfile = "C:/Analysis/RL003.TXT";
FileInputStream fs1 = new FileInputStream(secondfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fs1));
while ((line1 = br1.readLine()) != null) {
String account1 = line1.substring(0, 10);
System.out.println(account1);
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Please help me with how I can achieve this effectively.
I should mention that I am new to Java and may not grasp the ideas that easily, but I am trying.
Here is the sample code to do that:
public static void eliminateCommon(String file1, String file2) throws IOException
{
List<String> lines1 = readLines(file1);
List<String> lines2 = readLines(file2);
Iterator<String> linesItr = lines1.iterator();
while (linesItr.hasNext()) {
String checkLine = linesItr.next();
if (lines2.contains(checkLine)) {
linesItr.remove();
lines2.remove(checkLine);
}
}
//now lines1 will contain string that are not present in lines2
//now lines2 will contain string that are not present in lines1
System.out.println(lines1);
System.out.println(lines2);
}
public static List<String> readLines(String fileName) throws IOException
{
List<String> lines = new ArrayList<String>();
FileInputStream fs = new FileInputStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
String line = null;
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
lines.add(account);
}
br.close();
return lines;
}
Perhaps you are looking for something like this
Set<String> set1 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL001.TXT")));
Set<String> set2 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL003.TXT")));
Set<String> onlyInSet1 = new HashSet<>(set1);
onlyInSet1.removeAll(set2);
Set<String> onlyInSet2 = new HashSet<>(set2);
onlyInSet2.removeAll(set1);
If you can guarantee that the files will always be in the same format, and that each readLine() call returns a different number, why not store the values in an array of strings rather than a single string? You can then compare the results more easily.
OK, first I would save the two sets of strings into collections
Set<String> s1 = new HashSet<String>(), s2 = new HashSet<String>();
//...
while ((line = br.readLine()) != null) {
//...
s1.add(line);
}
Then you can compare those sets and find elements that do not appear in both sets. You can find some ideas on how to do that here.
If you need to know the line number as well, you could just create a String wrapper:
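class Element {
public String str;
public int lineNr;
// Override equals(Object) and hashCode(); otherwise a HashSet will not treat equal strings as duplicates.
@Override
public boolean equals(Object other) {
return other instanceof Element && ((Element) other).str.equals(str);
}
@Override
public int hashCode() {
return str.hashCode();
}
}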
Then you can just use Set<Element> instead.
Open two Scanners (scan1 and scan2), and:
final TreeSet<Long> ts1 = new TreeSet<Long>();
final TreeSet<Long> ts2 = new TreeSet<Long>();
// Long, not Integer: a 10-digit id such as 6436346346 overflows int
while (scan1.hasNextLine()) {
ts1.add(Long.valueOf(scan1.nextLine().substring(0, 10)));
}
while (scan2.hasNextLine()) {
ts2.add(Long.valueOf(scan2.nextLine().substring(0, 10)));
}
You can now compare the ordered results of the two trees.
EDIT
Modified with TreeSet
Put values from each file to two separate HashSets accordingly.
Iterate over one of the HashSets and check whether each value exists in the other HashSet. Report if not.
Iterate over other HashSet and do same thing for this.
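The steps above can be sketched as follows. This is only an illustration under a few assumptions: the class name ReportMismatches and the helpers keys and report are made up, a TreeSet is used instead of a HashSet so the messages come out in a predictable order, and lines shorter than 10 characters are skipped. The sample lines and the message wording are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ReportMismatches {

    // Collects the first 10 characters of every line; shorter lines are skipped.
    static Set<String> keys(List<String> lines) {
        Set<String> keys = new TreeSet<>();
        for (String line : lines) {
            if (line.length() >= 10) {
                keys.add(line.substring(0, 10));
            }
        }
        return keys;
    }

    // Builds one message per record that appears in only one of the two files.
    static List<String> report(List<String> first, List<String> second) {
        Set<String> firstKeys = keys(first);
        Set<String> secondKeys = keys(second);
        List<String> messages = new ArrayList<>();
        for (String key : firstKeys) {
            if (!secondKeys.contains(key)) {
                messages.add("Record " + key + " is in the firstfile and not in the secondfile.");
            }
        }
        for (String key : secondKeys) {
            if (!firstKeys.contains(key)) {
                messages.add("Record " + key + " is in the secondfile and not in the firstfile.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList(
                "6436346346....Other details",
                "9348734873....Other details",
                "9349839829....Other details");
        List<String> second = Arrays.asList(
                "8484545487....Other details",
                "9348734873....Other details",
                "9349839829....Other details");
        for (String message : report(first, second)) {
            System.out.println(message);
        }
    }
}
```

With real files, the two lists could be filled by reading RL001.TXT and RL003.TXT line by line, as in the readLines method shown earlier in this thread.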

How to populate an ArrayList from words in a text file?

I have a text file containing words separated by newlines, in the following format:
>hello
>world
>example
How do I create an ArrayList and store each word as an element?
You can use apache commons FileUtils.readLines().
I think the List it returns is already an ArrayList, but you can use the constructor ArrayList(Collection) to make sure you get one.
public static void main(String[] args) throws IOException{
// TODO Auto-generated method stub
File file = new File("names.txt");
ArrayList<String> names = new ArrayList<String>();
Scanner in = new Scanner(file);
while (in.hasNextLine()){
names.add(in.nextLine());
}
Collections.sort(names);
for(int i=0; i<names.size(); ++i){
System.out.println(names.get(i));
}
}
The simplest way is to use Guava:
File file = new File("foo.txt");
List<String> words = Files.readLines(file, Charsets.UTF_8);
(It's not guaranteed to be an ArrayList, but I'd hope that wouldn't matter.)
You read the file line-by-line, create an ArrayList for Strings, and add line.substring(1) to the defined ArrayList if line.length>0.
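That description can be sketched as follows, assuming every non-empty line starts with the ">" marker shown in the question. The class name WordsFromLines and the helper toWords are hypothetical, and the lines are given in memory here; with a real file they would come from a BufferedReader or Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordsFromLines {

    // Drops the first character (the ">" marker) of every non-empty line.
    static ArrayList<String> toWords(List<String> rawLines) {
        ArrayList<String> words = new ArrayList<>();
        for (String line : rawLines) {
            if (line.length() > 0) {
                words.add(line.substring(1));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> rawLines = Arrays.asList(">hello", ">world", "", ">example");
        System.out.println(toWords(rawLines)); // [hello, world, example]
    }
}
```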
I put the file at "C:\file.txt"; if you run the following, it fills an ArrayList with the words and prints them.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) throws Exception {
File file = new File("C:\\file.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
List<String> lines = new ArrayList<String>();
String line = br.readLine();
while(line != null) {
lines.add(line.replace(">", ""));
line = br.readLine();
}
for(String l : lines) {
System.out.println(l);
}
}
}
I'm sure there are lots of libraries that do this in one line, but here's a "pure" Java implementation:
Notice that we've "wrapped"/"decorated" a standard InputStreamReader (which only offers read() methods for single characters or character arrays) with a BufferedReader, which gives us a nicer readLine() method.
BufferedReader reader = null;
try {
reader = new BufferedReader(new InputStreamReader(
new FileInputStream("test.txt"),
Charset.forName("ISO-8859-1")));
List<String> lines = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
System.out.println(lines);
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
reader.close();
}
}
