Set with duplicates in Java - import from file

I have a small project.
The project reads a txt file into a String (the format is similar to CSV: fields are separated by semicolons, ";").
Next, the String is converted to an ArrayList.
Then, using a Predicate, I remove the elements that do not interest me.
Finally, I convert the ArrayList to a TreeSet to remove duplicates.
Unfortunately, duplicates still occur in the result.
I opened the file in Notepad++ and changed the encoding to ANSI to check for stray characters.
Everything looks fine, yet the duplicates are still there.
Uploaded input file - https://drive.google.com/open?id=1OqIKUTvMwK3FPzNvutLu-GYpvocUsSgu
Any ideas?
public class OpenSCV {
private static final String SAMPLE_CSV_FILE_PATH = "/Downloads/all.txt";
public static void main(String[] args) throws IOException {
File file = new File(SAMPLE_CSV_FILE_PATH);
String str = FileUtils.readFileToString(file, "utf-8");
str = str.trim();
String str2 = str.replace("\n", ";").replace("\"", "" ).replace("\n\n",";").replace("\\*www.*\\","")
.replace("\u0000","").replace(",",";").replace(" ","").replaceAll(";{2,}",";");
List<String> lista1 = new ArrayList<>(Arrays.asList((str2.split(";"))));
Predicate<String> predicate = s -> !(s.contains("@"));
Set<String> removeDuplicates = new TreeSet<>(lista1);
removeDuplicates.removeIf(predicate);
String fileName2 = "/Downloads/allMails.txt";
try ( BufferedWriter bw =
new BufferedWriter (new FileWriter (fileName2)) )
{
for (String line : removeDuplicates) {
bw.write (line + "\n");
}
bw.close ();
} catch (IOException e) {
e.printStackTrace ();
}
}
}

Before calling str.replace, you can try str.trim() to remove leading and trailing spaces and other unseen control characters:
str = str.trim();
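Note that trim() only strips characters at the two ends of the whole string, so a carriage return or a trailing space next to a separator inside the text would survive and make two visually identical entries compare as different. The following is a minimal sketch under that assumption (the uploaded input file was not inspected, so the actual cause may differ). The class name TokenDeduplicator, the method uniqueTokens and the marker parameter are hypothetical; the marker stands for whatever substring the Predicate in the question checks for.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TokenDeduplicator {
    // Splits on semicolons, commas and line breaks, trims every single token,
    // keeps only tokens containing the marker, and collects them into a sorted set.
    static Set<String> uniqueTokens(String raw, String marker) {
        String cleaned = raw.replace("\"", "").replace("\u0000", "");
        return Arrays.stream(cleaned.split("[;,\\r\\n]+"))
                .map(String::trim)                       // per-token trim, not just once for the whole file
                .filter(token -> token.contains(marker)) // also drops empty tokens
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        // Hypothetical sample data with Windows line endings and a trailing space.
        String raw = "a@x.com;b@y.com\r\na@x.com \r\n\"b@y.com\",c";
        System.out.println(uniqueTokens(raw, "@"));
    }
}
```

Trimming each token after the split, rather than the file content once, is the design choice here: it makes the result independent of which line-ending convention the file uses.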

Related

JAVA String Reversing order of string in file io

I have to write code that reverses the order of the lines in a file and writes them to a new file. For example:
Hi my name is Bob.
I am ten years old.
The reversed output will be:
I am ten years old.
Hi my name is Bob.
This is what I have so far. I'm not sure what to pass to the outWriter print statement. Any help will be appreciated. Thanks!
import java.io.*;
import java.util.ArrayList;
import java.util.Scanner;
public class FileRewinder {
public static void main(String[] args) {
File inputFile = new File("ascii.txt");
ArrayList<String> list1 = new ArrayList<String>();
Scanner inputScanner;
try {
inputScanner = new Scanner(inputFile);
} catch (FileNotFoundException f) {
System.out.println("File not found :" + f);
return;
}
while (inputScanner.hasNextLine()) {
String curLine = inputScanner .nextLine();
System.out.println(curLine );
}
inputScanner.close();
File outputFile = new File("hi.txt");
PrintWriter outWriter = null;
try {
outWriter = new PrintWriter(outputFile);
} catch (FileNotFoundException e) {
System.out.println("File not found :" + e);
return;
}
outWriter.println(???);
outWriter.close();
}
}
My suggestion is to read the entire file first and store the sentences (you can split on ".") in a LinkedList<String>, which keeps insertion order.
Then use an Iterator to get the sentences in reverse order and write them to a file. Make sure to put a "." after each sentence.
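The suggestion above can be sketched roughly as follows. This is only an illustration: the class name SentenceReverser and the method reverseSentences are made up here, and the text is passed in as a String instead of being read from ascii.txt. LinkedList.descendingIterator() is the standard way to walk the list from the last element to the first.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class SentenceReverser {
    // Splits the text at each full stop and returns the sentences last-to-first,
    // putting the "." back after each sentence.
    static List<String> reverseSentences(String text) {
        LinkedList<String> sentences = new LinkedList<>();
        for (String part : text.split("\\.")) {
            String sentence = part.trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence + ".");
            }
        }
        List<String> reversed = new ArrayList<>();
        Iterator<String> it = sentences.descendingIterator();
        while (it.hasNext()) {
            reversed.add(it.next());
        }
        return reversed;
    }

    public static void main(String[] args) {
        String text = "Hi my name is Bob.\nI am ten years old.";
        for (String sentence : reverseSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Splitting on "." treats abbreviations and decimal numbers as sentence ends too, so for input that is already one sentence per line, reversing the lines is the simpler route.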
After System.out.println(curLine); add list1.add(curLine); which stores each line of text in your list.
At the end, loop over list1 backwards:
for (int i = list1.size() - 1; i >= 0; --i) {
    outWriter.println(list1.get(i));
}
If the file is small enough for all of its lines to fit in memory, you can read all lines into a list, reverse the order of the list and write the list back to disk.
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
public class Reverse {
    static final Charset FILE_ENCODING = StandardCharsets.UTF_8;
    public static void main(String[] args) throws IOException {
        List<String> inLines = Files.readAllLines(Paths.get("ascii.txt"), FILE_ENCODING);
        Collections.reverse(inLines);
        Files.write(Paths.get("hi.txt"), inLines, FILE_ENCODING);
    }
}

Java BufferedWriter isn't working

I'm having a problem with a BufferedWriter. I am reading in a 50,000-word wordlist, applying a stemming algorithm and creating a new wordlist that contains just the word stems. However, instead of containing any stems, the new file literally just contains:
-
Here is my code:
public static void main(String[] args) {
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
Stemmer s= new Stemmer();
s.addword(word);
s.stem();
String stem =s.toString();
if(!db.contains(stem)){
db.add(stem);
writer.write(stem);
//System.out.println(stem);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
}
The output I get on the console is:
Reduced file from 58110 words to 28201
So I know it's working. I've also tried changing writer.write(stem); to writer.write("hi"); and I still get the same output in newwordlist.txt.
I know it's not the fault of the Stemmer class: I've tried printing the stem string (where I commented out the code) and that produced the correct output on the console, so the fault must be with the writer, but I don't understand what it is.
Edit 1
I simplified the code to:
BufferedReader reader=null;
BufferedWriter writer=null;
try {
writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
HashSet<String> db = new HashSet<String>();
reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist.txt"),"UTF-8"));
String word;
int i=0;
while ((word=reader.readLine())!=null) {
i++;
if(!db.contains(word)){
db.add(word);
writer.write("hi");
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
reader.close();
writer.close();
} catch (IOException e1) {
e1.printStackTrace();
}
Now I get this console output:
Reduced file from 58110 words to 58109
But the output file is still blank.
I would expect the code as given in the Question to produce a file that consists of one line, consisting of all of the "stems" concatenated. (Or in the "hi" version, one line consisting of "hihihi...." repeated a large number of times.)
It is conceivable that whatever you are using to view the file cannot cope with an input file that consists of many thousands of characters ... and no end-of-line.
Change
writer.write(stem);
to
writer.write(stem);
writer.write(EOL);
where EOL is the platform specific end-of-line sequence.
Assuming you are using Java 7 or later, it would be better to use try-with-resources to make sure that the output stream is always flushed and closed, even if there is an error:
public static void main(String[] args) {
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(new FileInputStream("src/wordlist"), "UTF-8"));
BufferedWriter writer = new BufferedWriter(new FileWriter(
new File("src/newwordlist.txt")))) {
HashSet<String> db = new HashSet<>();
String EOL = System.getProperty("line.separator");
String word;
int i = 0;
while ((word = reader.readLine()) != null) {
i++;
Stemmer s = new Stemmer();
s.addword(word);
s.stem();
String stem = s.toString();
if (db.add(stem)) {
writer.write(stem);
writer.write(EOL);
}
}
System.out.println("Reduced file from " + i + " words to " + db.size());
} catch (IOException e1) {
e1.printStackTrace();
}
}
(I tidied up a couple of other things too ...)
The reason you get Reduced file from 58110 words to 58109 console output is that you only have one System.out.println statement after the loop.
The writer should write words only to the output file src/newwordlist.txt and not to the console. If you want your program to output words to the console add additional System.out.println(word) after writer.write("hi");
Hope this helps...
Works for me. Is this your exact class, did you edit it before pasting in?
wordlist;
the
cat
sat
on
the
mat
newwordlist.txt;
thecatsatonmat
My Stemmer just returns the word you gave it.
public class Stemmer {
private String word;
public void addword(String word) {
this.word = word;
}
public void stem() {
// TODO Auto-generated method stub
}
@Override
public String toString() {
return word;
}
}
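The Java documentation for BufferedWriter lists a three-argument overload of write():
write(string,offset,length);
so you could try the following (the one-argument write(String) is inherited from Writer and should write the whole string as well):
writer.write(stem,0,stem.length());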
When I run your edited code I get one line with
hihihihihihihihihihihihihi ............
As expected.
Perhaps you intended to add newline characters like this.
if(!db.contains(word)){
db.add(word);
writer.write(word);
writer.write("\n");
}
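As a variation on the answers above, BufferedWriter.newLine() writes the platform line separator, so no hard-coded "\n" or separate EOL variable is needed. The sketch below is only illustrative: the class UniqueLineWriter and the method copyUnique are hypothetical names, and it takes a generic Reader and Writer (instead of the src/wordlist files) so that it can run without any files present.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashSet;
import java.util.Set;

class UniqueLineWriter {
    // Copies each distinct line from in to out, one per line,
    // and returns the number of distinct lines written.
    static int copyUnique(Reader in, Writer out) throws IOException {
        Set<String> seen = new HashSet<>();
        // try-with-resources flushes and closes both streams, even on error
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (seen.add(line)) {   // add() returns false for a duplicate
                    writer.write(line);
                    writer.newLine();   // platform-specific line separator
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        int count = copyUnique(new StringReader("the\ncat\nsat\non\nthe\nmat\n"), out);
        System.out.println(count + " unique words");
        System.out.print(out);
    }
}
```

For real files, the same method could be called with a FileReader (or InputStreamReader with an explicit charset) and a FileWriter.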

reading from text file to string array

So I can search for a string in my text file; however, I want to sort the data within this ArrayList and implement an algorithm. Is it possible to read from a text file and store the values (Strings) in a String[] array?
Also, is it possible to separate the Strings? So instead of my array having:
[Alice was beginning to get very tired of sitting by her sister on the, bank, and of having nothing to do:]
is it possible to have an array such as:
["Alice", "was", "beginning", "to", "get", ...]
public static void main(String[]args) throws IOException
{
Scanner scan = new Scanner(System.in);
String stringSearch = scan.nextLine();
BufferedReader reader = new BufferedReader(new FileReader("File1.txt"));
List<String> words = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
words.add(line);
}
for(String sLine : words)
{
if (sLine.contains(stringSearch))
{
int index = words.indexOf(sLine);
System.out.println("Got a match at line " + index);
}
}
//Collections.sort(words);
//for (String str: words)
// System.out.println(str);
int size = words.size();
System.out.println("There are " + size + " Lines of text in this text file.");
reader.close();
System.out.println(words);
}
To split a line into an array of words, use this:
String[] words = sentence.split("[^\\w']+");
The regex [^\w']+ means "one or more characters that are neither word characters nor apostrophes".
This will capture words with embedded apostrophes like "can't" and skip over all punctuation.
Edit:
A comment has raised the edge case of parsing a quoted word such as 'this' as this.
Here's the solution for that - you have to first remove wrapping quotes:
String[] words = input.replaceAll("(^|\\s)'([\\w']+)'(\\s|$)", "$1$2$3").split("[^\\w']+");
Here's some test code with edge and corner cases:
public static void main(String[] args) throws Exception {
String input = "'I', ie \"me\", can't extract 'can't' or 'can't'";
String[] words = input.replaceAll("(^|[^\\w'])'([\\w']+)'([^\\w']|$)", "$1$2$3").split("[^\\w']+");
System.out.println(Arrays.toString(words));
}
Output:
[I, ie, me, can't, extract, can't, or, can't]
Also is it possible to separate the Strings?
Yes, you can split a string on whitespace and common punctuation (comma, semicolon, "!", "?" and double quote) like this:
String[] strSplit;
String str = "This is test for split";
strSplit = str.split("[\\s,;!?\"]+");
See String API
Moreover you can also read a text file word by word.
Scanner scan = null;
try {
scan = new Scanner(new BufferedReader(new FileReader("Your File Path")));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while(scan.hasNext()){
System.out.println( scan.next() );
}
See Scanner API
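Putting the pieces from the answers above together, here is a rough sketch of reading a whole file into a String[] of words. The class WordArrayReader and the method readWords are hypothetical names, and the demo writes its own temporary file rather than assuming File1.txt exists. It reuses the [^\w']+ split pattern from the first answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordArrayReader {
    // Reads every line of the file and splits it into words.
    static String[] readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String word : line.split("[^\\w']+")) {
                if (!word.isEmpty()) {   // split yields "" when a line starts with punctuation
                    words.add(word);
                }
            }
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        Files.write(file, Arrays.asList(
                "Alice was beginning to get very tired",
                "of sitting by her sister on the bank,"), StandardCharsets.UTF_8);
        String[] words = readWords(file);
        Arrays.sort(words);              // the array can now be sorted or searched
        System.out.println(Arrays.toString(words));
        Files.delete(file);
    }
}
```

Collecting into a List first and converting at the end avoids having to know the number of words before the file has been read.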

Compare values in two files

I have two files which should contain the same values in the first 10 characters of each line (substring 0 to 10), though not in the same order. I have managed to print the values from each file, but I need to know how to report when a value is in the first file and not in the second file, and vice versa. The files are in these formats.
6436346346....Other details
9348734873....Other details
9349839829....Other details
second file
8484545487....Other details
9348734873....Other details
9349839829....Other details
The first record in the first file does not appear in the second file and the first record in the second file does not appear in the first file. I need to be able to report this mismatch in this format:
Record 6436346346 is in the firstfile and not in the secondfile.
Record 8484545487 is in the secondfile and not in the firstfile.
Here is the code I currently have, which prints the values from the two files that need to be compared.
package compare.numbers;
import java.io.*;
/**
*
* @author implvcb
*/
public class CompareNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
File f = new File("C:/Analysis/");
String line;
String line1;
try {
String firstfile = "C:/Analysis/RL001.TXT";
FileInputStream fs = new FileInputStream(firstfile);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
System.out.println(account);
}
String secondfile = "C:/Analysis/RL003.TXT";
FileInputStream fs1 = new FileInputStream(secondfile);
BufferedReader br1 = new BufferedReader(new InputStreamReader(fs1));
while ((line1 = br1.readLine()) != null) {
String account1 = line1.substring(0, 10);
System.out.println(account1);
}
} catch (Exception e) {
e.fillInStackTrace();
}
}
}
Please help me with how I can achieve this effectively.
I should mention that I am new to Java and may not grasp the ideas that easily, but I am trying.
Here is the sample code to do that:
public static void eliminateCommon(String file1, String file2) throws IOException
{
List<String> lines1 = readLines(file1);
List<String> lines2 = readLines(file2);
Iterator<String> linesItr = lines1.iterator();
while (linesItr.hasNext()) {
String checkLine = linesItr.next();
if (lines2.contains(checkLine)) {
linesItr.remove();
lines2.remove(checkLine);
}
}
//now lines1 will contain string that are not present in lines2
//now lines2 will contain string that are not present in lines1
System.out.println(lines1);
System.out.println(lines2);
}
public static List<String> readLines(String fileName) throws IOException
{
List<String> lines = new ArrayList<String>();
FileInputStream fs = new FileInputStream(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
String line = null;
while ((line = br.readLine()) != null) {
String account = line.substring(0, 10);
lines.add(account);
}
return lines;
}
Perhaps you are looking for something like this
Set<String> set1 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL001.TXT")));
Set<String> set2 = new HashSet<>(FileUtils.readLines(new File("C:/Analysis/RL003.TXT")));
Set<String> onlyInSet1 = new HashSet<>(set1);
onlyInSet1.removeAll(set2);
Set<String> onlyInSet2 = new HashSet<>(set2);
onlyInSet2.removeAll(set1);
If you guarantee that the files will always be the same format, and each readLine() function is going to return a different number, why not have an array of strings, rather than a single string. You can then compare the outcome with greater ease.
OK, first I would save the two sets of strings in collections:
Set<String> s1 = new HashSet<String>(), s2 = new HashSet<String>();
//...
while ((line = br.readLine()) != null) {
//...
s1.add(line);
}
Then you can compare those sets and find elements that do not appear in both sets. You can find some ideas on how to do that here.
If you need to know the line number as well, you could just create a String wrapper:
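class Element {
    public String str;
    public int lineNr;
    // equals(Object) and hashCode() must both be overridden for HashSet lookups to work
    @Override
    public boolean equals(Object other) {
        return other instanceof Element && ((Element) other).str.equals(str);
    }
    @Override
    public int hashCode() {
        return str.hashCode();
    }
}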
Then you can just use Set<Element> instead.
Open two Scanners (scan1 and scan2, one per file), and fill one TreeSet from each. Long is used because 10-digit values do not fit in an int:
final TreeSet<Long> ts1 = new TreeSet<Long>();
final TreeSet<Long> ts2 = new TreeSet<Long>();
while (scan1.hasNextLine()) {
    ts1.add(Long.valueOf(scan1.nextLine().substring(0, 10)));
}
while (scan2.hasNextLine()) {
    ts2.add(Long.valueOf(scan2.nextLine().substring(0, 10)));
}
You can now compare ordered results of the two trees
EDIT
Modified with TreeSet
Put values from each file to two separate HashSets accordingly.
Iterate over one of the HashSets and check whether each value exists in the other HashSet. Report if not.
Iterate over other HashSet and do same thing for this.
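The steps above could look roughly like the following. This is a sketch, not a definitive implementation: the class RecordComparer and its methods keys and report are hypothetical names, and the lines are passed in as lists (in the real program they would be read from RL001.TXT and RL003.TXT). LinkedHashSet, a HashSet subclass, is used only so that the report follows the file order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecordComparer {
    // Collects the first 10 characters of every line as the record key.
    static Set<String> keys(List<String> lines) {
        Set<String> keys = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.length() >= 10) {   // skip lines too short to hold a key
                keys.add(line.substring(0, 10));
            }
        }
        return keys;
    }

    // Builds one message per key that appears in only one of the two files.
    static List<String> report(List<String> firstLines, List<String> secondLines) {
        Set<String> first = keys(firstLines);
        Set<String> second = keys(secondLines);
        List<String> messages = new ArrayList<>();
        for (String key : first) {
            if (!second.contains(key)) {
                messages.add("Record " + key + " is in the firstfile and not in the secondfile.");
            }
        }
        for (String key : second) {
            if (!first.contains(key)) {
                messages.add("Record " + key + " is in the secondfile and not in the firstfile.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> firstFile = Arrays.asList(
                "6436346346....Other details",
                "9348734873....Other details",
                "9349839829....Other details");
        List<String> secondFile = Arrays.asList(
                "8484545487....Other details",
                "9348734873....Other details",
                "9349839829....Other details");
        for (String message : report(firstFile, secondFile)) {
            System.out.println(message);
        }
    }
}
```

Because contains() on a hash-based set is a constant-time lookup on average, this stays fast even for large files, unlike calling contains() on a List for every line.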

How do I read from a File to an array

I am trying to read from a file into an array. I tried two different styles and neither works. Below are the two styles.
Style 1
public class FileRead {
int i;
String a[] = new String[2];
public void read() throws FileNotFoundException {
//Z means: "The end of the input but for the final terminator, if any"
a[i] = new Scanner(new File("C:\\Users\\nnanna\\Documents\\login.txt")).useDelimiter("\\n").next();
for(i=0; i<=a.length; i++){
System.out.println("" + a[i]);
}
}
public static void main(String args[]) throws FileNotFoundException{
new FileRead().read();
}
}
Style 2
public class FileReadExample {
private int j = 0;
String path = null;
public void fileRead(File file){
StringBuilder attachPhoneNumber = new StringBuilder();
try{
FileReader read = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(read);
while((path = bufferedReader.readLine()) != null){
String a[] = new String[3];
a[j] = path;
j++;
System.out.println(path);
System.out.println(a[j]);
}
bufferedReader.close();
}catch(IOException exception){
exception.printStackTrace();
}
}
I need it to read each line of the file and store each line in an array. But neither style works. How do I go about it?
Do yourself a favor and use a library that provides this functionality for you, e.g.
Guava:
// one String per File
String data = Files.toString(file, Charsets.UTF_8);
// or one String per Line
List<String> data = Files.readLines(file, Charsets.UTF_8);
Commons / IO:
// one String per File
String data = FileUtils.readFileToString(file, "UTF-8");
// or one String per Line
List<String> data = FileUtils.readLines(file, "UTF-8");
It's not really clear exactly what you're trying to do (partly with quite a lot of code commented out, leaving other code which won't even compile), but I'd recommend you look at using Guava:
List<String> lines = Files.readLines(file, Charsets.UTF_8);
That way you don't need to mess around with the file handling yourself at all.
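If adding a library is not an option, the JDK itself (Java 7 and later) offers java.nio.file.Files.readAllLines, which returns one String per line. The sketch below converts the result to a String[] as the question asks; the class LinesToArray and the method readLines are hypothetical names, and the demo uses a temporary file with made-up content instead of login.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class LinesToArray {
    // Reads all lines of the file and returns them as an array, one element per line.
    static String[] readLines(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("login", ".txt");
        Files.write(file, Arrays.asList("first line", "second line"), StandardCharsets.UTF_8);
        String[] a = readLines(file);
        for (int i = 0; i < a.length; i++) {   // i < a.length, not <=, to stay in bounds
            System.out.println(a[i]);
        }
        Files.delete(file);
    }
}
```

The array is sized by the number of lines actually read, so there is no fixed length to guess in advance.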
