I've been following various posts on here about removing stop words from an ArrayList (this one more than the others), but I've run into some issues when customising the code to suit my needs.
My code reads in two files: a text file of stop words and a text file of data collected from Twitter. I store the stop words in a HashSet and ultimately want to remove them from the Twitter data, which is stored in an ArrayList. Everything works (reading the files and appending the output to a file) except the removal of the stop words.
The files I'm currently using for tests are here
public static void main(String[] args) {
ArrayList<String> listOfWords = new ArrayList<String>();
try {
// Read in the stop words text file as well as the text file to edit
Scanner stopWordsFile = new Scanner(new File("stopwords_twitter.txt"));
Scanner textFile = new Scanner(new File("LiverpoolTest.txt"));
// Create a set for the stop words
Set<String> stopWords = new HashSet<String>();
// Add each stop word to the set, trimming surrounding whitespace
while (stopWordsFile.hasNext()) {
stopWords.add(stopWordsFile.next().trim());
}
// Creates an empty list for the text files contents
ArrayList<String> words = new ArrayList<String>();
/* For each word in the file correct (removing words between the delimiters)
them and add them to the ArrayList */
while (textFile.hasNextLine()) {
for (String word : textFile.nextLine().trim().toLowerCase()
.replaceAll("/-/-/.*?/-/-/\\s*","").split("/")) {
words.add(word);
}
}
// Iterate over the ArrayList
for(String word : words) {
String wordCompare = word.toLowerCase();
// If the word isn't a stop word, add to listOfWords ArrayList
if (!stopWords.contains(wordCompare)) {
listOfWords.add(word);
}
}
stopWordsFile.close();
textFile.close();
} catch(FileNotFoundException e){
e.printStackTrace();
}
try {
File fileName;
FileWriter fw;
// Create a new textfile for listOfWords
fileName = new File("LiverpoolNoStopWords.txt");
fw = new FileWriter(fileName, true);
// Output listOfWords to a new textfile
for (String str : listOfWords) {
String word = str + "\n";
System.out.print(word);
fw.write(word);
}
fw.close();
} catch(IOException e){
System.err.println("Error. Cannot open file for writing.");
System.exit(1);
}
}
All it took was to debug the program. This is something the OP could have done themselves.
To test the loading of the stop words, I printed the contents of stopWords. It was correct.
To test the parsing of the twitter words, I printed wordCompare right after it is set:
String wordCompare = word.toLowerCase();
System.out.println("|"+wordCompare+"|");
and got this:
|the redmen tv : chris sat down with spanish journalist guillem balague to talk through liverpool’s season as a whole, how real madrid have been playing and how they are likely to play against liverpool tonight!|
||
|watch now:https:|
||
|t.co|
|oqmcx3zs9c|
|subscribe: https:|
||
|t.co|
|tbybrgabge https:|
||
|t.co|
|s6010yicen|
||
|the redmen tv : real madrid v liverpool | https:|
||
|t.co|
|jlwbp8q7bf|
||
|we have hours of build up content including interviews with;|
So, obviously, the problem is that split() doesn't split the text into words. Indeed, the split was expecting a forward slash "/" as the delimiter. I changed it to .split("\\s+") to split on whitespace.
I then added a print for the case where a stop word is found:
if (!stopWords.contains(wordCompare)) {
listOfWords.add(word);
} else {
System.out.println("####$$");
}
and got this:
|the|
####$$
|redmen|
|tv|
|:|
|chris|
|sat|
####$$
|down|
####$$
|with|
####$$
|spanish|
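The fix described above can be sketched as a small self-contained filter. This is only an illustration: the class name StopWordFilter, the method removeStopWords and the four sample stop words are assumptions made for the example, not part of the original program or its stop word file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    // Splits a line on runs of whitespace and keeps only the tokens
    // whose lowercase form is not in the stop word set.
    public static List<String> removeStopWords(String line, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            // A blank line yields a single empty token; skip it.
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stop words, chosen to mirror the debug output above.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "sat", "down", "with"));
        String line = "The Redmen TV : Chris sat down with Spanish journalist";
        System.out.println(removeStopWords(line, stopWords));
    }
}
```

Note that punctuation such as ":" survives as its own token, exactly as in the debug output, so it would need separate handling if it should be dropped too.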
I am trying to parse a CSV file in Java and am running into a problem. When I try to split the CSV up like so:
public static void main(String[] args) {
String nameOfFile = "KingstonNorthWard2016Distribution.csv";
File file = new File(nameOfFile);
try {
Scanner inputStream = new Scanner(file);
while (inputStream.hasNext()){
String data = inputStream.next();
String[] values = data.split(",");
System.out.println(Arrays.toString(values));
}
inputStream.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
My console returns this:
[Distribution]
[Report]
[]
[Print]
[Date/Time:]
[31/10/2016]
[07:27:10PM]
[Bayside]
[City]
[Council]
[2016]
Despite my csv looking like this:
Distribution Report ,,,,,,,,,,,,,,,,,,,,,,,,,,
Print Date/Time: 31/10/2016 07:27:10PM,,,,,,,,,,,,,,,,,,,,,,,,,,
Bayside City Council 2016,,,,,,,,,,,,,,,,,,,,,,,,,,
Central Ward,,,,,,,,,,,,,,,,,,,,,,,,,,
What I can't understand is why my console doesn't look like this:
[Distribution report]
[Print Date/Time: 31/10/2016 07:27:10PM]
[Bayside City Council 2016]
[Central Ward]
If anyone could help, that would be great. For extra points, the CSV later goes on to list names like "Smith, John", so bear that in mind if my split needs to change. Thanks in advance.
hasNext and next iterate over whitespace-delimited tokens; you want hasNextLine and nextLine.
As for the fields that contain your delimiter, we'd have to look at a sample from your dataset to see whether there is a rule that tells split when a delimiter should be ignored.
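As a rough sketch of both points: the code below reads line by line and splits only on commas that sit outside double quotes. It assumes that names such as "Smith, John" are wrapped in double quotes in the file, which the question does not confirm; the class name CsvLines and the method parse are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class CsvLines {
    // Matches a comma only when an even number of double quotes follows it
    // on the line, i.e. when the comma is not inside a quoted field.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Reads whole lines (not whitespace-delimited tokens) and splits each one.
    public static List<String[]> parse(Scanner in) {
        List<String[]> rows = new ArrayList<>();
        while (in.hasNextLine()) {
            // split() without a limit drops trailing empty fields,
            // so "Central Ward,,,," becomes a single-element array.
            rows.add(in.nextLine().split(COMMA_OUTSIDE_QUOTES));
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "Bayside City Council 2016,,,,\n\"Smith, John\",42\n";
        for (String[] row : parse(new Scanner(sample))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The surrounding quotes stay in the field value with this approach. For anything beyond simple files (escaped quotes, line breaks inside fields), a dedicated CSV library is usually the safer choice.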
I need to read a file completely, split the strings inside it, and store them in variables using Java.
For example, my text file contains:
devarajan 1000210 08754540275 600019
ramesh 1000210 08754540275 600019
udhay 1000210 08754540275 600019
I tried using string positions, but it is not working out.
Please find a sample file attached as well. Regards
My Code:
public class Program {
public static void main(String[] args) {
String line = "devarajan 1000210 08754540275 600019 ";
String[] words = line.split("\\W+");
for (String word : words) {
System.out.println(word);
}
}
}
Output:
devarajan 1000210 08754540276
My file will contain a list of strings at fixed positions: position 10-10 will be the name, 20-30 the empid, and 30-40 the phone number. When I used the previous snippet I got blank spaces, such as "devarajan" and " 1000210"; I need to avoid those blank spaces.
My code, in turn, splits as soon as it encounters a blank space instead of splitting by position.
#Twelve, #Kick: I am getting the output as follows for your snippet
But imagine I have a space in my name, e.g. "twelve dollar" instead of "twelvedollar": the name will then get split and stored in different array positions. That is why I asked whether it is possible to split the string based on position.
Just one way to do it:
// try-with-resources closes the Scanner even if reading fails
try (Scanner inFile = new Scanner(new File("myInputFile.txt"))) {
    String[] data;
    ArrayList<String[]> arr = new ArrayList<String[]>();
    // hasNextLine() pairs with nextLine(); hasNext() would skip trailing blank lines
    while (inFile.hasNextLine()) {
        data = inFile.nextLine().trim().split("\\s+"); // or split("\t") if using tabs
        System.out.println(Arrays.toString(data));
        arr.add(data);
    }
}
catch (FileNotFoundException fe) {
    fe.printStackTrace();
}
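Since the follow-up asks about splitting by position rather than on whitespace, here is a minimal sketch using substring. The column boundaries (0-20, 20-30, 30-45) are purely hypothetical, because the exact layout is not clear from the question; the class name FixedWidth and the method splitByPosition are likewise invented for the example.

```java
import java.util.Arrays;

public class FixedWidth {
    // Hypothetical column boundaries (start inclusive, end exclusive);
    // adjust these to the real layout of the file.
    private static final int[] STARTS = {0, 20, 30};
    private static final int[] ENDS = {20, 30, 45};

    // Cuts one line into fields by character position and trims the padding,
    // so a name containing a space stays in a single field.
    public static String[] splitByPosition(String line) {
        String[] fields = new String[STARTS.length];
        for (int i = 0; i < STARTS.length; i++) {
            // Clamp to the line length so short lines do not throw.
            int start = Math.min(STARTS[i], line.length());
            int end = Math.min(ENDS[i], line.length());
            fields[i] = line.substring(start, end).trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build a padded sample line that matches the assumed layout.
        String line = String.format("%-20s%-10s%-15s", "twelve dollar", "1000210", "08754540275");
        System.out.println(Arrays.toString(splitByPosition(line)));
    }
}
```

Each line read with nextLine() in the loop above could be passed to splitByPosition instead of split("\\s+").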
I am trying to read a text file in order to copy some parts of it into a new text file.
This is how I create my Scanner object:
// folder
File vMainFolder = new File(System.getProperty("user.home"),"LightDic");
if (!vMainFolder.exists() && !vMainFolder.mkdirs()) {
System.out.println("Missing LightDic folder.");
return;
}
// file
System.out.println("Enter the source file's name : ");
Scanner vSc = new Scanner(System.in);
String vNomSource = vSc.next();
Scanner vSource;
try {
vSource = new Scanner(new File(vMainFolder, vNomSource+".txt"));
} catch (final java.io.FileNotFoundException pExp) {
System.out.println("Dictionnary not found.");
return;
}
And this is how I wrote my while structure :
while (vSource.hasNextLine()) {
System.out.println("test : entering the loop");
String vMot = vSource.nextLine(); /* edit : I added this statement, which I've forgotten in my previous post */
}
When executing the program, it never prints "test : entering the loop".
Of course, this file I am testing is not empty, it is a list of words like so :
a
Ã
abaissa
abaissable
abaissables
abaissai
I don't understand what I did wrong, I've used this method a few times in the past.
Problem solved.
I don't really know why, but I solved the problem by changing the encoding of my FRdic.txt file from ANSI to UTF-8, after which the file was read.
Thanks to everyone who tried to help me.
Is it normal that a text file encoded in ANSI is not read by the JVM?
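A likely explanation, offered as an assumption rather than a confirmed diagnosis: Scanner decodes the file with the platform default charset unless told otherwise, and it swallows any decoding error, so hasNextLine() can simply return false. The sketch below passes the charset explicitly and surfaces the hidden error through ioException(). The class name CharsetScan and the method readLines are invented for the example, and "windows-1252" is a guess at what "ANSI" meant on the asker's machine.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CharsetScan {
    // Reads all lines of a file, decoding it with the given charset name.
    public static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner in = new Scanner(file, charsetName)) {
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
            // Scanner hides read and decode errors; rethrow the last one, if any,
            // instead of returning a silently truncated result.
            IOException problem = in.ioException();
            if (problem != null) {
                throw problem;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("Usage: java CharsetScan <file>");
            return;
        }
        // Assumption: the "ANSI" file was saved as windows-1252.
        for (String line : readLines(new File(args[0]), "windows-1252")) {
            System.out.println(line);
        }
    }
}
```

Checking ioException() after an unexpectedly empty read is a cheap way to tell an encoding problem apart from a genuinely empty file.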
Possible Duplicate:
Java - Find a line in a file and remove
I have code that takes an ID number and searches for its record; if the record exists, it is displayed.
If the record is found, I want to delete it.
One solution for deleting a line (a user record) is to create another file and copy every line except the found record.
Can anyone tell me another (simple) solution?
My BookRecords.txt file looks like this:
Name Date Number
one 2002 22
two 2003 33
three 2004 44
four 2005 55
My code to find a record:
String bookid=jTextField2.getText();
File f=new File("C:\\BookRecords.txt");
try{
    FileReader Bfr=new FileReader(f);
    BufferedReader Bbr=new BufferedReader(Bfr);
    String bs;
    while( (bs=Bbr.readLine()) != null ){
        // Columns: name, date, number (the ID)
        String[] Ust=bs.split(" ");
        String Bname=Ust[0];
        String Bdate=Ust[1];
        String id = Ust[2];
        if (id.equals(bookid.trim())) {
            jLabel1.setText("Book Found, "+ Bname + " " + Bdate);
            break;
        }
    }
}
catch (IOException ex) {
    ex.printStackTrace();
}
Please help me delete a line (a record).
Thanks.
Working on a single text file is, uhm, a bit strange. But I would recommend that you create a new text file (output):
PrintWriter out = new PrintWriter(new FileWriter("output.txt"));
Only write the lines that don't match the book's ID.
while (...) {
...
if (!id.equals(bookid.trim())) {
out.println(bs);
}
}
out.close();
Later you can rename the file, if you like.
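Put together, the copy-and-rename approach could look like the sketch below. It assumes the record layout from the question (ID in the third whitespace-separated column) and UTF-8 text; the class name RemoveRecord and the method removeById are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

public class RemoveRecord {
    // Copies every line except those whose third column equals bookId into a
    // temporary file, then moves that file over the original.
    // Returns true if at least one record was removed.
    public static boolean removeById(Path source, String bookId) throws IOException {
        Path temp = Files.createTempFile(source.toAbsolutePath().getParent(), "records", ".tmp");
        boolean removed = false;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(temp, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length >= 3 && fields[2].equals(bookId.trim())) {
                    removed = true; // skip the matching record
                    continue;
                }
                out.println(line);
            }
        }
        // Replace the original file only after both streams are closed.
        Files.move(temp, source, StandardCopyOption.REPLACE_EXISTING);
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway file so no real data is touched.
        Path demo = Files.createTempFile("BookRecords", ".txt");
        Files.write(demo, Arrays.asList("Name Date Number", "one 2002 22", "two 2003 33"));
        System.out.println("removed: " + removeById(demo, "22"));
        System.out.println(Files.readAllLines(demo));
        Files.delete(demo);
    }
}
```

Writing to a temporary file in the same directory keeps the final move on one file system, and the original stays untouched if the copy fails halfway.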
replace the entire line in the text file with a backspace character when found
\b