Words read from file are null - java

I am attempting to read some words from the file "words.txt" and then use them in other classes of my program while it runs. This is what I found on the internet, and it doesn't seem to be working properly.
public static List<String> wordsList;

public static void refreshWords(){
    String fileName = "words.txt";
    String line = null;
    try {
        FileReader fileReader = new FileReader(fileName);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        while((line = bufferedReader.readLine()) != null) {
            for(String tempWord : line.split(" ")){
                wordsList.add(tempWord);
            }
        }
        bufferedReader.close();
    }
    catch(FileNotFoundException ex) {
        System.out.println("Unable to open file '" + fileName + "'");
    }
    catch(IOException ex) {
        System.out.println("Error reading file '" + fileName + "'");
    }
}

public static List<String> getListOfWords(){
    return wordsList;
}
From the message displayed as soon as the program runs, which cancels the entire thing, I can determine that the error comes from adding tempWord to wordsList. I would assume that tempWord is null, but I can't seem to find a reason why it would be.
All that I have in the file are a bunch of random words that I thought of off the top of my head, formatted like the following:
this game turtle forest soccer football ball java list annoyed

What you are using there is the old way of doing it (before Java 7).
With Java 7 / 8, reading a file is much easier. So rather than looking for bugs, I'd rewrite this using the new API:
List<String> lines = Files.readAllLines(yourFile.toPath(), StandardCharsets.UTF_8);
See Files.readAllLines(Path, Charset)
Also, in your question, you are splitting lines into words. That's highly unusual; word lists are almost always one word per line.
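To make that concrete, here is a minimal sketch of how refreshWords could look with the newer API. It assumes the file is UTF-8 and keeps the space-separated format from the question; the wrapping class name is just for illustration, and the list is created up front, since calling add on a list that was never initialized would itself throw a NullPointerException:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Words {
    // Create the list up front so add() has something to add to.
    public static List<String> wordsList = new ArrayList<>();

    public static void refreshWords() {
        try {
            // Read the whole file in one call (Java 7+).
            List<String> lines = Files.readAllLines(Paths.get("words.txt"), StandardCharsets.UTF_8);
            wordsList.clear();
            for (String line : lines) {
                for (String word : line.split(" ")) {   // keep the space-separated format
                    wordsList.add(word);
                }
            }
        } catch (IOException ex) {
            System.out.println("Error reading file 'words.txt'");
        }
    }

    public static List<String> getListOfWords() {
        return wordsList;
    }
}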


Creating an inverted index with limited memory in java

I'm curious about how to create an inverted index on data that doesn't fit into memory. Right now I'm reading a file directory and indexing the files based on the contents inside each file, using a HashMap to store the index. The code below is a snippet from a function I use, and I call the function on an entire directory. What do I do if this directory were just massive and the HashMap couldn't fit all the entries? Yes, this does sound like premature optimization. I'm just having fun. I don't want to use Lucene, so don't even mention it; I'm tired of seeing that as the majority answer to "index" questions. This HashMap is my only constraint; everything else is stored in files so I can easily reference things later on.
I'm just curious how I can do this, since it stores entries in the map like so:
keyword -> file1,file2,file3,etc..(locations)
keyword2 -> file9,file11,file13,etc..(locations)
My thought was to create a file that would somehow be able to update itself into the format above, but I feel that's not efficient.
Code Snippet
br = new BufferedReader(new FileReader(file));
while ((line = br.readLine()) != null) {
    for (String _word : line.split("\\W+")) {
        word = _word.toLowerCase();
        if (!ignore_words.contains(word)) {
            fileLocations = index.get(word);
            if (fileLocations == null) {
                fileLocations = new LinkedList<Long>();
                index.put(word, fileLocations);
            }
            fileLocations.add(file_offset);
        }
    }
}
br.close();
Update:
So I managed to come up with something, but performance-wise I feel it is slow, especially if there were a large amount of data. I basically created a file that just has the word and its offset on each line where the word appeared. Let's name it index.txt.
It has a format like so:
word1:offset
word2:offset
word1:offset <-encountered again.
word3:offset
etc...
I then created a separate file for each word and appended the offset to that file each time the word was encountered in the index.txt file.
So basically the format of the word files is like so:
word1.txt -- Format
word1:offset1:offset2:offset3:offset4...and so on
Each time word1 is encountered in the index.txt file, its offset is appended to the end of the word1.txt file.
Then finally, I go through all the word files I created and overwrite index.txt with the final output, so the index file ends up looking like so:
word1:offset1:offset2:offset3:offset4:...
word2:offset9:offset11:offset13:offset14:...
etc..
Then to finish it up, I delete all the word files.
The nasty code snippet for this is below; it's a fair amount.
public void createIndex(String word, long file_offset)
{
    PrintWriter writer;
    try {
        writer = new PrintWriter(new FileWriter(this.file, true));
        writer.write(word + ":" + file_offset + "\n");
        writer.close();
    }
    catch (IOException ioe)
    {
        ioe.printStackTrace();
    }
}
public void mergeFiles()
{
    String line;
    String wordLine;
    String[] contents;
    String[] wordContents;
    BufferedReader reader;
    BufferedReader mergeReader;
    PrintWriter writer;
    PrintWriter mergeWriter;
    try {
        reader = new BufferedReader(new FileReader(this.file));
        while ((line = reader.readLine()) != null)
        {
            contents = line.split(":");
            writer = new PrintWriter(new FileWriter(
                    new File(contents[0] + ".txt"), true));
            if (this.words.get(contents[0]) == null)
            {
                this.words.put(contents[0], contents[0]);
                writer.write(contents[0] + ":");
            }
            writer.write(contents[1] + ":");
            writer.close();
        }
        //This could be put in its own method below.
        mergeWriter = new PrintWriter(new FileWriter(this.file));
        for (String word : this.words.keySet())
        {
            mergeReader = new BufferedReader(
                    new FileReader(new File(word + ".txt")));
            while ((wordLine = mergeReader.readLine()) != null)
            {
                mergeWriter.write(wordLine + "\n");
            }
        }
        mergeWriter.close();
        deleteFiles();
    }
    catch (IOException ioe)
    {
        ioe.printStackTrace();
    }
}
public void deleteFiles()
{
    File toDelete;
    for (String word : this.words.keySet())
    {
        toDelete = new File(word + ".txt");
        if (toDelete.exists())
        {
            toDelete.delete();
        }
    }
}
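As a side note, once the merged index.txt is in the word:offset1:offset2:... layout described above, a single word's offsets can be looked up without pulling the whole index into memory. A minimal sketch under that assumption (class and method names here are just for illustration):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class IndexLookup {
    // Scans the merged index file for one word and returns its offsets,
    // without loading the whole index into memory.
    // Expected line format (from the description above): word:offset1:offset2:...
    public static List<Long> offsetsFor(String indexFile, String word) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(indexFile))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split(":");
                if (parts[0].equals(word)) {
                    List<Long> offsets = new LinkedList<>();
                    for (int i = 1; i < parts.length; i++) {
                        offsets.add(Long.parseLong(parts[i]));
                    }
                    return offsets;
                }
            }
        }
        return new LinkedList<>();   // word not found
    }
}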

How do I add lines (read from a .txt) separated by AN ENTER KEY (in .txt) into separate elements of string arrayList?

I'm reading a .txt file into my program and am adding lines of the .txt into a String ArrayList. How do I add lines DELINEATED BY AN ENTER KEY (in the .txt) into separate elements of the ArrayList? Right now, if I had the following written in the text file:
this is a test
test
test test
It would output:
this is a testtesttest test
What I want it to do is read things on a per-line basis and put them into different elements of the String ArrayList. So I want "this is a test" to be an element, then "test", and then finally "test test".
My code is really ugly, but right now all I want to do is get it to work for my purpose. My first purpose is to read a .txt by line. My second purpose is going to be parsing an element for a particular substring (a URL), connecting to that URL, and then comparing part of the page source of the webpage (parsing for a particular keyword) to the line ABOVE the substring I desire. But that's a question for another time :^)
import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) {
        // The name of the file to open.
        String fileName = "test.txt";
        List<String> listA = new ArrayList<String>();
        // This will reference one line at a time
        String line = null;
        try {
            // FileReader reads text files in the default encoding.
            FileReader fileReader = new FileReader(fileName);
            // Always wrap FileReader in BufferedReader.
            BufferedReader bufferedReader = new BufferedReader(fileReader);
            while((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
                listA.add(line);
                //*** THIS IS WHERE THE MAGIC HAPPENS ***\\ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            }
            // Always close files.
            bufferedReader.close();
        }
        catch(FileNotFoundException ex) {
            System.out.println("Unable to open da file ofheee hah. '" + fileName + "'");
        }
        catch(IOException ex) {
            System.out.println("Error reading file '" + fileName + "'");
            // Or we could just do this:
            // ex.printStackTrace();
        }
        System.out.println();
        System.out.println("array FOr loop thingy incoming:");
        System.out.println();
        for (int i = 0; i < listA.size(); i++) {
            System.out.print((listA.get(i)).toString());
        }
    }
}
You just have to use println instead of print:
System.out.println((listA.get(i)).toString());
Alternatively, you can append the line break character \n.
Your code seems to be working so far. If you just want to see what elements are in listA, just print it out:
System.out.println(listA);
Output:
[this is a test, , test, , test test, ]
Note that the extra lines in your input file are also being stored in listA. I'm not sure if that's the behavior you want.
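If those blank entries aren't wanted, one option is to skip empty lines while filling the list. A small sketch, assuming the same java.io.* / java.util.* imports as the code above:
static List<String> readNonBlankLines(String fileName) throws IOException {
    List<String> lines = new ArrayList<String>();
    BufferedReader reader = new BufferedReader(new FileReader(fileName));
    String line;
    while ((line = reader.readLine()) != null) {
        if (!line.trim().isEmpty()) {   // skip lines that are empty or whitespace-only
            lines.add(line);
        }
    }
    reader.close();
    return lines;
}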

How to read a file in Android

I have a text file called "high.txt". I need the data inside it for my Android app, but I have absolutely no idea how to read it into an ArrayList of Strings. I tried the normal way of doing it in Java, but apparently that doesn't work in Android since it can't find the file. So how do I go about doing this? I have put it in my res folder. But how do you take the input stream that you get from opening the file within Android and read it into an ArrayList of Strings? I am stuck on that part.
Basically it would look something like this:
3. What do you do for an upcoming test?
L: make sure I know what I'm studying and really review and study for this thing. Its what Im good at. Understand the material really well.
CL: Time to study. I got this, but I really need to make sure I know it,
M: Tests can be tough, but there are tips and tricks. Focus on the important, interesting stuff. Cram in all the little details just to get past this test.
CR: -sigh- I don't like these tests. Hope I've studied enough to pass or maybe do well.
R: Screw the test. I'll study later, day before should be good.
This is for a sample question, and all the lines will be stored as separate strings in the ArrayList.
If you put the text file in your assets folder, you can use code like this, which I've taken and modified from one of my projects:
public static void importData(Context context) {
    try {
        BufferedReader br = new BufferedReader(new InputStreamReader(context.getAssets().open("high.txt")));
        String line;
        while ((line = br.readLine()) != null) {
            String[] columns = line.split(",");
            Model model = new Model();
            model.date = DateUtil.getCalendar(columns[0], "MM/dd/yyyy");
            model.name = columns[1];
            dbHelper.insertModel(model);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Within the loop you can do anything you need with the columns; what this example is doing is creating an object from each row and saving it in the database.
For this example the text file would look something like this:
15/04/2013,Bob
03/03/2013,John
21/04/2013,Steve
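To match what the question actually asks (every line of high.txt stored as a separate String in an ArrayList), the same assets approach can be trimmed down to a sketch like this. The method name is just for illustration, and the usual android.content / java.io / java.util imports are assumed:
public static ArrayList<String> readHighLines(Context context) {
    ArrayList<String> lines = new ArrayList<String>();
    try {
        // Open high.txt from the app's assets folder.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(context.getAssets().open("high.txt")));
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);   // one element per line of the file
        }
        br.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return lines;
}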
If you want to read a file from external storage, then use the method below.
public void readFileFromExternal(){
    String path = Environment.getExternalStorageDirectory().getPath()
            + "/AppTextFile.txt";
    try {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        String line, results = "";
        while( ( line = reader.readLine() ) != null)
        {
            results += line;   // note: readLine() strips the line breaks
        }
        reader.close();
        Log.d("FILE","Data in your file : " + results);
    } catch (Exception e) {
        e.printStackTrace();   // don't swallow the error silently
    }
}
//find all files from folder /assets/txt/
String[] elements = new String[0];   // initialize so the loop below still compiles and runs if listing fails
try {
    elements = getAssets().list("txt");
} catch (IOException e) {
    e.printStackTrace();
}

//for every file, read the text line by line
for (String fileName : elements) {
    Log.d("xxx", "File: " + fileName);
    try {
        InputStream open = getAssets().open("txt/" + fileName);
        InputStreamReader inputStreamReader = new InputStreamReader(open);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        String line = "";
        while ((line = bufferedReader.readLine()) != null) {
            Log.d("xxx", line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Java BufferedWriter isn't working

I'm having a problem with a BufferedWriter. I am reading in a 50,000-word wordlist, running a stemming algorithm, and creating a new wordlist that just contains the word stems. Instead of this new file containing any stems, however, it literally just contains:
-
Here is my code:
public static void main(String[] args) {
    BufferedReader reader = null;
    BufferedWriter writer = null;
    try {
        writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
        HashSet<String> db = new HashSet<String>();
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist"), "UTF-8"));
        String word;
        int i = 0;
        while ((word = reader.readLine()) != null) {
            i++;
            Stemmer s = new Stemmer();
            s.addword(word);
            s.stem();
            String stem = s.toString();
            if (!db.contains(stem)) {
                db.add(stem);
                writer.write(stem);
                //System.out.println(stem);
            }
        }
        System.out.println("Reduced file from " + i + " words to " + db.size());
        reader.close();
        writer.close();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}
The output I get on the console is:
Reduced file from 58110 words to 28201
So I know it's working. I've also tried changing writer.write(stem); to writer.write("hi"); and I still get the same output in newwordlist.txt.
I know it's no fault of the Stemmer class; I've tried outputting the stem string (where I commented the code) and that produced the correct output to the console, so the fault must be with the writer, but I don't understand what.
Edit 1
I simplified the code to:
BufferedReader reader = null;
BufferedWriter writer = null;
try {
    writer = new BufferedWriter(new FileWriter(new File("src/newwordlist.txt")));
    HashSet<String> db = new HashSet<String>();
    reader = new BufferedReader(new InputStreamReader(new FileInputStream("src/wordlist.txt"), "UTF-8"));
    String word;
    int i = 0;
    while ((word = reader.readLine()) != null) {
        i++;
        if (!db.contains(word)) {
            db.add(word);
            writer.write("hi");
        }
    }
    System.out.println("Reduced file from " + i + " words to " + db.size());
    reader.close();
    writer.close();
} catch (IOException e1) {
    e1.printStackTrace();
}
Now I get the console output:
Reduced file from 58110 words to 58109
But the output file is still blank.
I would expect the code as given in the Question to produce a file that consists of one line, consisting of all of the "stems" concatenated. (Or in the "hi" version, one line consisting of "hihihi...." repeated a large number of times.)
It is conceivable that whatever you are using to view the file cannot cope with an input file that consists of many thousands of characters ... and no end-of-line.
Change
writer.write(stem);
to
writer.write(stem);
writer.write(EOL);
where EOL is the platform-specific end-of-line sequence.
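For example (a small sketch; EOL here is just a local variable name):
String EOL = System.getProperty("line.separator");   // the platform's end-of-line sequence; System.lineSeparator() also works on Java 7+
writer.write(stem);
writer.write(EOL);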
Assuming you are using Java 7, it would be better to use try-with-resources to make sure that the output stream is always closed / flushed, even if there is an error:
public static void main(String[] args) {
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("src/wordlist"), "UTF-8"));
         BufferedWriter writer = new BufferedWriter(new FileWriter(
            new File("src/newwordlist.txt")))) {
        HashSet<String> db = new HashSet<>();
        String EOL = System.getProperty("line.separator");
        String word;
        int i = 0;
        while ((word = reader.readLine()) != null) {
            i++;
            Stemmer s = new Stemmer();
            s.addword(word);
            s.stem();
            String stem = s.toString();
            if (db.add(stem)) {
                writer.write(stem);
                writer.write(EOL);
            }
        }
        System.out.println("Reduced file from " + i + " words to " + db.size());
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}
(I tidied up a couple of other things too ...)
The reason you get the "Reduced file from 58110 words to 58109" console output is that you only have one System.out.println statement, after the loop.
The writer should write words only to the output file src/newwordlist.txt and not to the console. If you want your program to output words to the console, add an additional System.out.println(word) after writer.write("hi");
Hope this helps...
Works for me. Is this your exact class, or did you edit it before pasting it in?
wordlist:
the
cat
sat
on
the
mat
newwordlist.txt:
thecatsatonmat
My Stemmer just returns the word you gave it.
public class Stemmer {
    private String word;

    public void addword(String word) {
        this.word = word;
    }

    public void stem() {
        // TODO Auto-generated method stub
    }

    @Override
    public String toString() {
        return word;
    }
}
According to the Java documentation, you need to use BufferedWriter.write() as follows:
write(string,offset,length);
so try:
writer.write(stem,0,stem.length());
When I run your edited code I get one line with
hihihihihihihihihihihihihi ............
As expected.
Perhaps you intended to add newline characters like this:
if (!db.contains(word)) {
    db.add(word);
    writer.write(word);
    writer.write("\n");
}

iterating over files giving File not found exception even when the file is there in the dir. Can you please tell me why? Thanks

public static void main(String[] args)
{
    try
    {
        File dir = new File("D:\\WayneProject\\Logs");
        if (dir.isDirectory())
        {
            for (File child : dir.listFiles()) //NOT WORKING AFTER 1 ITERATION
            {
                if (child.isFile())
                {
                    String currentFile = child.getName();
                    String[] fileOutput = currentFile.split("\\.");
                    processFile(currentFile, fileOutput[0]);
                }
            }
        }
    }
Please check the comments. Iterating over the files gives a FileNotFoundException (on the second iteration) even though the file is there in the directory. Can you please tell me why? Thanks
My other function. The fileOutput is used to set the name of the destination file:
public static void processFile(String fileName, String fileOutput)
{
    try
    {
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        String str = null;
        File fileDest1 = new File("D:\\" + fileOutput + "1.csv");
        BufferedWriter wr1 = new BufferedWriter(new FileWriter(fileDest1));
        File fileDest2 = new File("D:\\" + fileOutput + "2.csv");
        BufferedWriter wr2 = new BufferedWriter(new FileWriter(fileDest2));
        wr1.write("Date, Memory Free\n");
        wr2.write("Date, %Idle\n");
        while ((str = br.readLine()) != null)
        {
            String[] st = str.split("\\s+");
            if (st[0].equals("MemFree:"))
            {
                wr1.write(st[1] + ",\n");
            }
            if (isDouble(st))
            {
                wr2.write(st[6] + "," + "\n");
            }
            if (isDate(st[0]))
            {
                String subStr = str.substring(0, 20);
                wr1.write(subStr + ",");
                wr2.write(subStr + ",");
            }
        }
        br.close();
        wr1.close();
        wr2.close();
    }
    catch (FileNotFoundException e)
    {
        e.printStackTrace();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
I suspect this is the problem in two ways:
String currentFile = child.getName();
String[] fileOutput = currentFile.split(".");
processFile(currentFile,fileOutput[0]);
getName() only returns the last part of the filename - the name within the directory. So unless your processFile part then puts the directory part back, you're asking it to process a file within the current working directory.
split takes a regular expression. By providing . as the regular expression, you're splitting on every character. I strongly suspect you actually want currentFile.split("\\."), which will split on a literal dot.
You haven't given any indication of what processFile is doing, but I suspect at least one of those is the root cause, and probably both.
It's worth taking a step back and looking at your diagnostics here, too. If you look at what's being passed to processFile you should be able to understand what's wrong - that it's not a problem with the file system, it's a problem with how you're computing the arguments to processFile. Being able to diagnose errors like this is a very important part of software development.
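To make the first point concrete, one way is to hand processFile a path that still includes the directory. A sketch of the loop body only, keeping the names from the question:
for (File child : dir.listFiles()) {
    if (child.isFile()) {
        String currentFile = child.getPath();                 // keeps the directory part, unlike getName()
        String[] fileOutput = child.getName().split("\\.");   // split on a literal dot
        processFile(currentFile, fileOutput[0]);
    }
}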
Your code works fine for me. The error you might be having is in the processFile function: you are creating a File object from fileName, which does not exist at that path, and then trying to read its contents, which throws the FileNotFoundException. Just comment out the processFile call and your code will work.
