I am trying to compare a .txt file that has a list of words, and a String[] array that is also filled with words.
Solved thank you.
Assuming you're ultimately just trying to get a list of words that are in both files:
Scanner fileReader = new Scanner(file);
Set<String> words = new HashSet<>();
while (fileReader.hasNext()) {
String s = fileReader.next();
words.add(s);
}
fileReader.close();
Scanner otherFileReader = new Scanner(otherFile);
List<String> wordsInBothFiles = new ArrayList<>();
while (otherFileReader.hasNext()) {
String s = otherFileReader.next();
if (words.contains(s)) {
wordsInBothFiles.add(s);
}
}
otherFileReader.close();
// Do whatever it is you have to do with the shared words, like printing them:
// for (String s : wordsInBothFiles) {
// System.out.println(s);
// }
If you check the documentation it will usually explain why a method throws an exception. In this case "no line was found" means you've hit the end of your file. There are two possible ways this error could come about:
String nextLine = scanner.nextLine(); //problem 1: reads a file with no lines
while (scanner.hasNextLine()) {
linearSearch(words,nextLine);
System.out.println(nextLine);
}
scanner.nextLine(); //problem 2: reads after there is not next line
Since your loop appears to be infinite, I'd wager you're getting the exception from the first line. You can fix it by adding the following check before String nextLine = scanner.nextLine();:
if(!scanner.hasNextLine()) {
System.out.println("empty file: " + filePath);
return; //or break or otherwise terminate
}
Beyond that you may still have some other issues but hopefully this resolves your present problem.
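As a minimal sketch of the usual pattern (the class and method names here are made up for illustration), you can avoid both problems by calling nextLine() only inside a loop guarded by hasNextLine(). The example reads from an in-memory string so it runs without a file; a Scanner over a File behaves the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineReaderSketch {
    // Reads every remaining line. Each nextLine() call is guarded by hasNextLine(),
    // so an empty input yields an empty list instead of a NoSuchElementException.
    static List<String> readAllLines(Scanner scanner) {
        List<String> lines = new ArrayList<>();
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the file contents (an assumption for the demo).
        Scanner scanner = new Scanner("apple\nbanana\ncherry");
        for (String line : readAllLines(scanner)) {
            System.out.println(line);
        }
        scanner.close();
    }
}
```

Because the line is read inside the loop, the loop also terminates: every iteration consumes one line.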
I'm building a program that reads in a text file of stop words, then reads in a text file of tweets collected from Twitter. I'm trying to remove the stop words from the collection of tweets, so that I'm left with just the 'interesting' vocabulary, which is then printed to the console.
However, nothing is printed to the console, so it's clearly not working. It was working before I started reading from the test.txt file (when I was using a string created in the program, splitting it, then storing it in an array).
Any help with reading in the test.txt file, removing the stop words, and printing the listOfWords list to the console would be appreciated.
import java.util.*;
import java.io.*;
public class RemoveStopWords {
public static void main(String[] args) {
try {
Scanner stopWordsFile = new Scanner(new File("stopwords_twitter.txt"));
Scanner textFile = new Scanner(new File("Test.txt"));
// Create a set for the stop words (a set as it doesn't allow duplicates)
Set<String> stopWords = new HashSet<String>();
// For each word in the file
while (stopWordsFile.hasNext()) {
stopWords.add(stopWordsFile.next().trim().toLowerCase());
}
// Splits strings and stores each word into a list
ArrayList<String> words = new ArrayList<String>();
while (stopWordsFile.hasNext()) {
words.add(textFile.next().trim().toLowerCase());
}
// Create an empty list (a list because it allows duplicates)
ArrayList<String> listOfWords = new ArrayList<String>();
// Iterate over the array
for(String word : words) {
// Converts current string index to lowercase
String toCompare = word.toLowerCase();
// If the word isn't a stop word, add to listOfWords list
if (!stopWords.contains(toCompare)) {
listOfWords.add(word);
}
}
stopWordsFile.close();
textFile.close();
for (String str : listOfWords) {
System.out.print(str + " ");
}
} catch(FileNotFoundException e){
e.printStackTrace();
}
}
}
You have two while (stopWordsFile.hasNext()) loops. By the time the second one runs, the first has already consumed the whole file, so its condition is always false:
// For each word in the file
while (stopWordsFile.hasNext()) {
stopWords.add(stopWordsFile.next().trim().toLowerCase());
}
// Splits strings and stores each word into a list
ArrayList<String> words = new ArrayList<String>();
while (stopWordsFile.hasNext()) {
words.add(textFile.next().trim().toLowerCase());
}
You should use
while (textFile.hasNext())
instead of
while (stopWordsFile.hasNext())
at the second one.
The problem is that both of your loops test the same Scanner, stopWordsFile, which the first loop has already exhausted:
while (stopWordsFile.hasNext()) { // this will never execute as stopWordsFile has no nextElement left
words.add(textFile.next().trim().toLowerCase());
}
Therefore, change your second while condition to:
while (textFile.hasNext()) {
words.add(textFile.next().trim().toLowerCase());
}
Copy your file into another file by reading it line by line. For each line, check whether it contains a stop word: if it does, remove the stop word and write the modified line to the new file; otherwise write the line unchanged.
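The per-line filtering step could look roughly like this. It is only a sketch: removeStopWords and the sample stop words are hypothetical, and it assumes words are separated by whitespace and compared case-insensitively. Writing each filtered line to the output file is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordFilterSketch {
    // Returns the line with every stop word removed.
    // Assumes stopWords holds lower-case entries; the lookup lower-cases each word.
    static String removeStopWords(String line, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Sample stop words, standing in for the contents of the stop word file.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "is"));
        // prints "weather delight today"
        System.out.println(removeStopWords("The weather is a delight today", stopWords));
    }
}
```

Keeping the stop words in a HashSet makes each lookup cheap, which matters when the tweet file is much larger than the stop word list.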
I am trying to search for words within a text file and replace all upper-case characters with lower-case ones. The problem is that when I use the replaceAll function with a regular expression I get a syntax error. I have tried different tactics, but nothing works. Any tips? I think that maybe I should write a replaceAll method of my own and invoke it, but I don't really see the point.
public static void main() throws FileNotFoundException {
ArrayList<String> inputContents = new ArrayList<>();
Scanner inFile =
new Scanner(new FileReader("H:\\csc8001\\data.txt"));
while(inFile.hasNextLine())
{
String line = inFile.nextLine();
inputContents.add(inFile.nextLine());
}
inFile.close();
ArrayList<String> dictionary = new ArrayList<>();
for(int i= 0; i <inputContents.size(); i++)
{
String newLine = inFile.nextLine();
newLine = newLine(i).replaceAll("[^A-Za-z0-9]");
dictionary.add(inFile.nextLine());
}
// PrintWriter outFile =
// new PrintWriter("H:\\csc8001\\results.txt");
}
There is a compilation error on this line:
newLine = newLine(i).replaceAll("[^A-Za-z0-9]");
Because replaceAll takes 2 parameters: a regex and a replacement.
(And because newLine(i) is not valid Java: newLine is a String variable, not a method.)
This should be closer to what you need:
newLine = newLine.replaceAll("[^A-Za-z0-9]+", " ");
That is, replace non-empty sequences of non-[A-Za-z0-9] characters with a space.
To convert all uppercase letters to lowercase, it's simpler and better to use toLowerCase.
There are many other issues in your code too. For example, some lines in the input will be skipped, due to some inappropriate inFile.nextLine calls. Also, the input file is closed after the first loop, but the second tries to use it, which makes no sense.
With these and a few other issues cleaned up, this should be closer to what you want:
Scanner inFile = new Scanner(new FileReader("H:\\csc8001\\data.txt"));
List<String> inputContents = new ArrayList<>();
while (inFile.hasNextLine()) {
inputContents.add(inFile.nextLine());
}
inFile.close();
List<String> dictionary = new ArrayList<>();
for (String line : inputContents) {
dictionary.add(line.replaceAll("[^A-Za-z0-9]+", " ").toLowerCase());
}
If you want to add words to the dictionary instead of lines, you also need to split the lines on spaces. One simple way to achieve that:
dictionary.addAll(Arrays.asList(line.replaceAll("[^A-Za-z0-9]+", " ").toLowerCase().split(" ")));
I am using 'java.util.Scanner' to read and scan for keywords and want to print the previous 5 lines and next 5 lines of the encountered keyword, below is my code
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("ERROR");
keywords.add("EXCEPTION");
java.io.File file = new java.io.File(LOG_FILE);
Scanner input = null;
try {
input = new Scanner(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
int count = 0;
String previousLine = null;
while(input.hasNext()){
String line = input.nextLine();
for(String keyword : keywords){
if(line.contains(keyword)){
//print prev 5 lines
system.out.println(previousLine); // this will print only last previous line ( i need last 5 previous lines)
???
//print next 5 lines
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
system.out.println(input.nextLine());
}
previousLine = line;
}
any pointers to print previous 5 lines..?
any pointers to print previous 5 lines..?
Save them in a Deque<String>, such as a LinkedList<String>, and use it as a "First In First Out (FIFO)" queue.
Either that or use 5 variables or an array of 5 Strings, manually move Strings from one slot or variable to another, and then print them.
If you use a Deque/LinkedList, use the Deque's addFirst(...) method to add a new String to the beginning and removeLast() to remove the list's last String (if its size is > 5). Iterate through the LinkedList to get the current Strings it contains.
Other suggestions:
Your Scanner's check scanner.hasNextXXX() method should match the get method, scanner.nextXXX(). So you should check for hasNextLine() if you're going to call nextLine(). Otherwise you risk problems.
Please post real code in your questions, not approximate code that will never compile, e.g. system.out.println vs. System.out.println. I know it's a little thing, but it means a lot when others try to play with your code.
Use ArrayList's contains(...) method to get rid of that for loop.
e.g.,
LinkedList<String> fivePrevLines = new LinkedList<>();
java.io.File file = new java.io.File(LOG_FILE);
Scanner input = null;
try {
input = new Scanner(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while (input.hasNextLine()) {
String line = input.nextLine();
if (keywords.contains(line)) {
System.out.println("keyword found!");
for (String prevLine : fivePrevLines) {
System.out.println(prevLine);
}
} else {
fivePrevLines.addFirst(line);
if (fivePrevLines.size() > 5) {
fivePrevLines.removeLast();
}
}
}
if (input != null) {
input.close();
}
Edit
You state in comment:
ok i ran small test program to see if the contains(...) method works ...<unreadable unformatted code>... and this returned keyword not found...!
It's all how you use it. The contains(...) method works to check if a Collection contains another object. It won't work if you feed it a huge String that may or may not use one of the Strings in the collection, but will work on the individual Strings that comprise the larger String. For example:
ArrayList<String> temp = new ArrayList<String>();
temp.add("error");
temp.add("exception");
String s = "Internal Exception: org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object";
String[] tokens = s.split("[\\s\\.:,]+");
for (String token : tokens) {
if (temp.contains(token.toLowerCase())) {
System.out.println("keyword found: " + token);
} else {
System.out.println("keyword not found: " + token);
}
}
Also, you will want to avoid posting code in comments since they don't retain their formatting and are unreadable and untestable. Instead edit your original question and post a comment to alert us to the edit.
Edit 2
As per dspyz:
For stacks and queues, when there isn't any significant functionality/performance reason to use one over the other, you should default to ArrayDeque rather than LinkedList. It's generally faster, takes up less memory, and requires less garbage collection.
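Here is a rough sketch of that suggestion, using an ArrayDeque as the rolling buffer and matching keywords with String.contains on each line. The names linesAroundHits and window are made up for illustration, and merging overlapping windows rather than printing lines twice is my own assumption about the desired behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Scanner;

class ContextBufferSketch {
    // For each line containing a keyword, collects up to `window` preceding lines,
    // the matching line, and up to `window` following lines, in file order.
    static List<String> linesAroundHits(Scanner input, List<String> keywords, int window) {
        Deque<String> previous = new ArrayDeque<>(); // oldest buffered line at the head
        List<String> out = new ArrayList<>();
        int remainingAfter = 0; // lines still owed after the most recent hit
        while (input.hasNextLine()) {
            String line = input.nextLine();
            boolean hit = false;
            for (String keyword : keywords) {
                if (line.contains(keyword)) {
                    hit = true;
                    break;
                }
            }
            if (hit) {
                out.addAll(previous); // buffered lines that were not emitted yet
                previous.clear();
                out.add(line);
                remainingAfter = window;
            } else if (remainingAfter > 0) {
                out.add(line); // part of the "next lines" of an earlier hit
                remainingAfter--;
            } else {
                previous.addLast(line);
                if (previous.size() > window) {
                    previous.removeFirst(); // drop the oldest line
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the log file.
        Scanner log = new Scanner("a\nb\nc\nERROR x\nd\ne\nf");
        for (String line : linesAroundHits(log, Arrays.asList("ERROR", "EXCEPTION"), 2)) {
            System.out.println(line);
        }
        log.close();
    }
}
```

Adding at the tail and removing from the head keeps the buffer in file order, so the previous lines print oldest first.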
If your file is small (< a million lines) you are way better off just copying the lines into an ArrayList and then getting the next and previous 5 lines using random access into the array.
Sometimes the best solution is just plain brute force.
Your code is going to get tricky if you have two keyword hits inside your +-5 line window. Let's say you have hits two lines apart. Do you dump two 10-line windows? One 12-line window?
Random access will make implementing this stuff way easier.
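A sketch of the random-access version, assuming the whole file has already been read into a List<String>. The names are hypothetical, and I have picked one answer to the overlap question: overlapping windows are merged so no line is printed twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RandomAccessWindowSketch {
    static boolean containsAny(String line, List<String> keywords) {
        for (String keyword : keywords) {
            if (line.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Returns the lines within `window` lines of any keyword hit, in file order.
    static List<String> linesAroundHits(List<String> lines, List<String> keywords, int window) {
        List<String> out = new ArrayList<>();
        int nextToEmit = 0; // first index not emitted yet, so overlapping windows merge
        for (int i = 0; i < lines.size(); i++) {
            if (!containsAny(lines.get(i), keywords)) {
                continue;
            }
            int from = Math.max(Math.max(0, i - window), nextToEmit);
            int to = Math.min(lines.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                out.add(lines.get(j));
            }
            nextToEmit = to + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the log file contents.
        List<String> lines = Arrays.asList("a", "b", "c", "ERROR 1", "d", "EXCEPTION 2", "e", "f", "g");
        for (String line : linesAroundHits(lines, Arrays.asList("ERROR", "EXCEPTION"), 2)) {
            System.out.println(line);
        }
    }
}
```

With the two hits two lines apart, this emits one merged window from "b" through "f". Dumping two separate windows instead only requires dropping the nextToEmit bookkeeping.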
I'm working on the question below and am quite close, but on lines 19 and 32 I get the following error and can't figure it out.
foreach not applicable to expression type
for (String place: s)
Question:
Tax inspectors have available to them two text files, called unemployed.txt and taxpayers.txt, respectively. Each file contains a collection of names, one name per line. The inspectors regard anyone who occurs in both files as a dodgy character. Write a program which prints the names of the dodgy characters. Make good use of Java’s support for sets.
My code:
class Dodgy {
public static void main(String[] args) {
HashSet<String> hs = new HashSet<String>();
Scanner sc1 = null;
try {sc1 = new Scanner(new File("taxpayers.txt"));}
catch(FileNotFoundException e){};
while (sc1.hasNextLine()) {
String line = sc1.nextLine();
String s = line;
for (String place: s) {
if((hs.contains(place))==true){
System.out.println(place + " is a dodgy character.");
hs.add(place);}
}
}
Scanner sc2 = null;
try {sc2 = new Scanner(new File("unemployed.txt"));}
catch(FileNotFoundException e){};
while (sc2.hasNextLine()) {
String line = sc2.nextLine();
String s = line;
for (String place: s) {
if((hs.contains(place))==true){
System.out.println(place + " is a dodgy character.");
hs.add(place);}
}
}
}
}
You're trying to iterate over "each string within a string" - what does that even mean?
It feels like you only need to iterate over each line in each file... you don't need to iterate within a line.
Secondly - in your first loop, you're only looking at the first file, so how could you possibly detect dodgy characters?
I would consider abstracting the problem to:
Write a method to read a file and populate a hash set.
Call that method twice to create two sets, then find the intersection.
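A minimal sketch of those two steps, with hypothetical method names (readNames, intersection). The demo reads from in-memory strings so it runs on its own; in the real program you would pass new Scanner(new File("taxpayers.txt")) and the equivalent for unemployed.txt.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class DodgySketch {
    // Reads one name per line into a set, skipping blank lines.
    static Set<String> readNames(Scanner in) {
        Set<String> names = new TreeSet<>();
        while (in.hasNextLine()) {
            String name = in.nextLine().trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Returns the names present in both sets, leaving the arguments untouched.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> both = new TreeSet<>(a);
        both.retainAll(b); // keep only the elements that are also in b
        return both;
    }

    public static void main(String[] args) {
        // Made-up sample names standing in for the two files.
        Set<String> taxpayers = readNames(new Scanner("Alice\nBob\nCarol"));
        Set<String> unemployed = readNames(new Scanner("Bob\nDave\nCarol"));
        for (String name : intersection(taxpayers, unemployed)) {
            System.out.println(name + " is a dodgy character.");
        }
    }
}
```

retainAll modifies the set it is called on, which is why the sketch copies the first set before intersecting.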
The for-each loop only applies to arrays and to types that implement java.lang.Iterable. String is neither, hence the error.
If your intention is to iterate over the characters in the string, replace s with s.toCharArray() and declare the loop variable as char. For-each accepts arrays, even though an array is not a java.lang.Iterable.
I'm reading a text file line by line and converting it into a string.
I'm trying to figure out how to check if the last line of the file is a specific word ("FILTER").
I've tried to use the endsWith(String) method of String class but it's not detecting the word when it appears.
Rather naive solution, but this should work:
String[] lines = fileContents.split("\n");
String lastLine = lines[lines.length - 1];
if("FILTER".equals(lastLine)){
// Do Stuff
}
Not sure why .endsWith() wouldn't work. Is there an extra newline at the end? (In which case the above wouldn't work). Do the cases always match?
Call .trim() on your string before checking with endsWith(..), assuming the file really ends with the desired string. If it does not, you can simply use .contains(..).
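Putting the trim() idea together with the last-line check, a sketch might look like this. lastLineIs is a hypothetical helper, and it assumes the whole file is already in one String and that the comparison is case-sensitive.

```java
class LastLineSketch {
    // True when the last non-blank line of the text equals the expected word.
    static boolean lastLineIs(String fileContents, String expected) {
        String trimmed = fileContents.trim(); // drops trailing newlines and spaces
        int lastBreak = trimmed.lastIndexOf('\n'); // -1 when there is only one line
        String lastLine = trimmed.substring(lastBreak + 1).trim(); // also drops a stray '\r'
        return expected.equals(lastLine);
    }

    public static void main(String[] args) {
        System.out.println(lastLineIs("first\nsecond\nFILTER\n", "FILTER"));   // true
        System.out.println(lastLineIs("first\nsecond\nNOFILTER", "FILTER"));  // false
    }
}
```

Unlike endsWith, this compares the whole last line, so a line such as NOFILTER does not count as a match.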
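// Requires: import java.io.File; import java.io.FileNotFoundException; import java.util.Scanner;
public static boolean compareInFile(String inputWord) {
File file = new File("Deepak.txt");
// try-with-resources closes the Scanner, even on the early return
try (Scanner input = new Scanner(file)) {
while (input.hasNext()) {
// next() returns whitespace-separated tokens, so trailing newlines are ignored
if (inputWord.equals(input.next())) {
return true;
}
}
} catch (FileNotFoundException error) {
// report the missing file instead of silently swallowing the exception
error.printStackTrace();
}
return false;
}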
With
myString.endsWith("FILTER")
the very last characters of the last line are checked. Maybe the method
myString.contains("FILTER")
is the right method for you? If you only want to check the end of the string (say, the last 20 characters), take a substring of that length and test it instead.