Counting Positive and Negative words in a file using dictionaries (Java)

I'm trying to count the occurrences of positive and negative words in a file, to determine whether the file has a positive or a negative tone.
I'm having trouble parsing the file for these words. At the moment I use a BufferedReader to read the main file, as well as the two files containing the dictionaries of positive and negative words. The problem is that my code compares each word of the main file only with the word at the same position in the positive and negative files.
Here is my current code:
import java.io.*;
import java.util.Scanner;
public class ParseTest {
public static void main(String args[]) throws IOException
{
File file1 = new File("fileforparsing");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file1)));
File file2 = new File("positivewordsdictionary");
BufferedReader br1 = new BufferedReader(new InputStreamReader(new FileInputStream(file2)));
int positive = 0;
Scanner sc1 = new Scanner(br);
Scanner sc2 = new Scanner(br1);
while (sc1.hasNext() && sc2.hasNext()) {
String str1 = sc1.next();
String str2 = sc2.next();
if (str1.equals(str2))
positive = positive +1;
}
while (sc2.hasNext())
System.out.println(positive);
sc1.close();
sc2.close();
}
}
I know what's wrong: both scanners advance together, when I'd like the original file to stay on the same word until it has been checked against the whole dictionary. I'm just not sure how to make it do that. Any help would be greatly appreciated.
Thank you in advance.

This won't work: you would need to reopen the dictionary file for every word, and that would be awfully slow. If the dictionaries are not too large, load them into memory and then make a single read-only pass over the file you're trying to analyze.
public static void main(String args[]) throws IOException {
Set<String> positive = loadDictionary("positivewordsdictionary");
Set<String> negative = loadDictionary("negativewordsdictionary");
File file = new File("fileforparsing");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
Scanner sc = new Scanner(br);
String word;
long positiveCount = 0;
long negativeCount = 0;
while (sc.hasNext()) {
word = sc.next();
if (positive.contains(word)) {
System.out.println("Found positive "+positiveCount+":"+word);
positiveCount++;
}
if (negative.contains(word)) {
System.out.println("Found negative "+negativeCount+":"+word);
negativeCount++;
}
}
br.close();
}
public static Set<String> loadDictionary(String fileName) throws IOException {
Set<String> words = new HashSet<String>();
File file = new File(fileName);
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
Scanner sc = new Scanner(br);
while (sc.hasNext()) {
words.add(sc.next());
}
br.close();
return words;
}
Update: I've tried running the code and it's working.

This is a bad approach: don't open two files simultaneously. First open your positive words file, read the words and store them as keys in a Map. Then do the same for the negative words file. Now read the file to analyze line by line and check whether each line contains a positive or negative word; if it does, increase the count (the map value, initialized to 0 at the beginning).
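The map-based idea could be sketched roughly as follows. The class name WordTally, its method names and the in-memory sample words are illustrative assumptions, not part of the original answer; in practice the dictionary words would be read from the two dictionary files, and tokens are split on whitespace only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: dictionary words are the map keys, occurrence counts are the values.
class WordTally {

    // Build a map in which every dictionary word starts with a count of 0.
    static Map<String, Integer> emptyCounts(List<String> dictionary) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : dictionary) {
            counts.put(word, 0);
        }
        return counts;
    }

    // Increment the count of every whitespace-separated token that is a key in the map.
    static void tally(String text, Map<String, Integer> counts) {
        for (String token : text.split("\\s+")) {
            if (counts.containsKey(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
    }

    // Total number of dictionary hits recorded in the map.
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sample words stand in for the contents of the dictionary files.
        Map<String, Integer> positive = emptyCounts(Arrays.asList("good", "great"));
        Map<String, Integer> negative = emptyCounts(Arrays.asList("bad", "awful"));
        String text = "good food but bad service and a great view";
        tally(text, positive);
        tally(text, negative);
        System.out.println("positive=" + total(positive) + " negative=" + total(negative));
    }
}
```

Keeping a count per word, rather than a single counter, also shows which dictionary words drive the tone.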

Consider filling a Set (e.g. a HashSet) with the positive words at the start of your application.
You can use your scanner in a loop to do this:
while(sc2.hasNext()) {
set.add(sc2.next());
}
Then, when you are looping through the other file, you can just check the set to see if it contains the word:
while(sc1.hasNext()) {
if (set.contains(sc1.next())) {
positive++;
}
}

Related

Putting a text file into an ArrayList, but if word exist it skips it

I'm struggling a bit here. I'm trying to add each word from a text file to an ArrayList, and whenever the reader comes across a word that is already in the list, it should skip it. (Does that make sense?)
I don't even know where to start. I kind of know that I need one loop that adds the textfile to the ArrayList and one the checks if the word is not in the list. Any ideas?
PS: Just started with Java
This is what I've done so far, don't even know if I'm on the right path..
public String findWord(){
int text = 0;
int i = 0;
while sc.hasNextLine()){
wordArray[i] = sc.nextLine();
}
if wordArray[i].contains() {
}
i++;
}
A List (an ArrayList or otherwise) is not the best data structure to use; a Set is better. In pseudo code:
define a Set
for each word
if adding to the set returns false, skip it
else do whatever you want to do with the (first time encountered) word
The add() method of Set returns true if the set changed as a result of the call, which only happens if the word isn't already in the set, because sets disallow duplicates.
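A minimal Java sketch of that pseudo code, assuming words are whitespace-separated tokens. The class name FirstOccurrences and the method uniqueWords are made up for illustration, and the Scanner here reads from a String rather than a file to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical sketch: keep each word once, in order of first appearance.
class FirstOccurrences {

    static List<String> uniqueWords(Scanner sc) {
        Set<String> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        while (sc.hasNext()) {
            String word = sc.next();
            // add() returns true only when the word was not already in the set
            if (seen.add(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // For a file, construct the Scanner from a File instead of a String.
        Scanner sc = new Scanner("Hi how are you I am Hardi Who are you");
        System.out.println(uniqueWords(sc));
    }
}
```

If only the distinct words are needed and a List is not required, a LinkedHashSet alone would preserve insertion order without the second collection.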
I once made a similar program; it read through a text file and counted how many times each word came up.
I'd start by importing Scanner as well as the file classes (the imports need to be at the top of the Java class):
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.File;
import java.io.PrintStream;
import java.util.Scanner;
Then you can create the File, as well as a Scanner reading from it; make sure to adjust the path to the file accordingly. The new PrintStream is not necessary, but when dealing with a large amount of data I don't like to flood the console.
public static void main(String[] args) throws FileNotFoundException {
File file=new File("E:/Youtube analytics/input/input.txt");
Scanner scanner = new Scanner(file); //will read from the file above
PrintStream out = new PrintStream(new FileOutputStream("E:/Youtube analytics/output/output.txt"));
System.setOut(out);
}
After this you can use scanner.next() to get the next word, so you would write something like this:
String[] array=new String[MaxAmountOfWords];//array large enough to hold every distinct word
int numberOfWords=0;
String currentWord="";
while(scanner.hasNext()){
currentWord=scanner.next();
//isNotInArray is a helper you still have to write; it checks the words stored so far
if(isNotInArray(currentWord))
{
array[numberOfWords]=currentWord;
numberOfWords++;//only advance when a new word was stored
}
}
If you don't understand any of this or need further guidance to progress, let me know. It is hard to help you if we don't know exactly where you are at...
You can try this:
public List<String> getAllWords(String filePath) throws IOException {
String line;
List<String> allWords = new ArrayList<String>();
//try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)))) {
//read each line of the file
while((line = reader.readLine()) != null) {
//get each word in the line by splitting on runs of non-word characters
for(String word: line.split("\\W+")) {
//skip empty tokens and words that are already in the list
if(!word.isEmpty() && !allWords.contains(word)) {
allWords.add(word);
}
}
}
}
return allWords;
}
Best solution is to use a Set. But if you still want to use a List, here goes:
Suppose the file has the following data:
Hi how are you
I am Hardi
Who are you
Code will be:
List<String> list = new ArrayList<>();
// Get the file.
FileInputStream fis = new FileInputStream("C:/Users/hdinesh/Desktop/samples.txt");
//Construct BufferedReader from InputStreamReader
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String line = null;
// Loop through each line in the file
while ((line = br.readLine()) != null) {
// Split the line on single spaces to get the words
String[] strArray = line.split("[ ]");
for (int i = 0; i< strArray.length; i++) {
if (!list.contains(strArray[i])) {
list.add(strArray[i]);
}
}
}
br.close();
System.out.println(list.toString());
If your text file has sentences with special characters, you will have to write a regex for that.
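One possible regex for that case, as a sketch: split on runs of characters that are not letters, digits or apostrophes, so that punctuation does not stick to the words. The class name PunctuationSplit and the choice of which characters count as part of a word are assumptions; adjust the character class to your data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect unique words from a line that contains punctuation.
class PunctuationSplit {

    static List<String> uniqueWords(String line) {
        List<String> list = new ArrayList<>();
        // Anything that is not a letter, a decimal digit or an apostrophe separates words.
        for (String word : line.split("[^\\p{L}\\p{Nd}']+")) {
            // A line that starts with punctuation produces an empty first token; skip it.
            if (!word.isEmpty() && !list.contains(word)) {
                list.add(word);
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords("Hi, how are you? Who are you?"));
    }
}
```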

Reading in a file and processing data

I am a noobie at programming and I can't seem to figure out what to do.
I am to write a Java program that reads in any number of lines from a file and generate a report with:
the count of the number of values read
the total sum
the average score (to 2 decimal places)
the maximum value along with the corresponding name.
the minimum value along with the corresponding name.
The input file looks like this:
55527 levaaj01
57508 levaaj02
58537 schrsd01
59552 waterj01
60552 boersm01
61552 kercvj01
62552 buttkp02
64552 duncdj01
65552 beingm01
The program runs fine, but when I add in
score = input.nextInt(); and
player = input.next();
The program stops working and the keyboard input seems to stop working for the filename.
I am trying to read each line with the int and name separately so that I can process the data and complete my assignment. I don't really know what to do next.
Here is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Scanner;
public class Program1 {
private Scanner input = new Scanner(System.in);
private static int fileRead = 0;
private String fileName = "";
private int count = 0;
private int score = 0;
private String player = "";
public static void main(String[] args) {
Program1 p1 = new Program1();
p1.getFirstDecision();
p1.readIn();
}
public void getFirstDecision() { //*************************************
System.out.println("What is the name of the input file?");
fileName = input.nextLine(); // gcgc_dat.txt
}
public void readIn(){ //*********************************************
try {
FileReader fr = new FileReader(fileName + ".txt");
fileRead = 1;
BufferedReader br = new BufferedReader(fr);
String str;
int line = 0;
while((str = br.readLine()) != null){
score = input.nextInt();
player = input.next();
System.out.println(str);
line++;
score = score + score;
count++;
}
System.out.println(count);
System.out.println(score);
br.close();
}
catch (Exception ex){
System.out.println("There is no shop named: " + fileName);
}
}
}
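The way you used BufferedReader with Scanner is wrong: the loop reads lines from the file with the BufferedReader, but input.nextInt() and input.next() are called on the Scanner that wraps System.in, so the program waits for keyboard input instead of reading the file.
Note: you can pass a BufferedReader to the Scanner constructor.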
For example :
try( Scanner input = new Scanner( new BufferedReader(new FileReader("your file path goes here")))){
}catch(IOException e){
}
Note: your file reading and any other processing must happen inside the try block, because by the time the catch block runs the resource has already been closed. This construct is called try-with-resources.
Note:
A BufferedReader will create a buffer. This should result in faster
reading from the file. Why? Because the buffer gets filled with the
contents of the file. So, you put a bigger chunk of the file in RAM
(if you are dealing with small files, the buffer can contain the whole
file). Now if the Scanner wants to read two bytes, it can read two
bytes from the buffer, instead of having to ask for two bytes to the
hard drive.
Generally speaking, it is much faster to read 10 times 4096 bytes
instead of 4096 times 10 bytes.
Source: BufferedReader in Scanner's constructor
Suggestion: you can read each line of your file with a BufferedReader and do the parsing yourself, or you can use the Scanner class, which can parse tokens for you.
See also: difference between Scanner and BufferedReader
As a hint you can use this sample for your parsing goal
Code:
String input = "Kick 20";
String[] inputSplited = input.split(" ");
System.out.println("My splited name is " + inputSplited[0]);
System.out.println("Next year I am " + (Integer.parseInt(inputSplited[1])+1));
Output:
My splited name is Kick
Next year I am 21
I hope you can fix your program with these hints.
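Putting the hints together, the report from the question could be sketched like this. It assumes every line is well formed (an int score followed by a name); the class name ScoreReport and the output format are illustrative, and the Scanner in main reads a sample String instead of the real file.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical sketch: let the Scanner parse "score name" pairs and accumulate the report.
class ScoreReport {

    static String summarize(Scanner in) {
        int count = 0;
        long sum = 0;
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        String maxName = "";
        String minName = "";
        // Each record is an int followed by a name token.
        while (in.hasNextInt()) {
            int score = in.nextInt();
            String player = in.next();
            count++;
            sum += score;
            if (score > max) { max = score; maxName = player; }
            if (score < min) { min = score; minName = player; }
        }
        if (count == 0) {
            return "no data";
        }
        return String.format(Locale.ROOT,
                "count=%d sum=%d avg=%.2f max=%d (%s) min=%d (%s)",
                count, sum, (double) sum / count, max, maxName, min, minName);
    }

    public static void main(String[] args) {
        // For a real file use: new Scanner(new BufferedReader(new FileReader(path)))
        String sample = "55527 levaaj01\n57508 levaaj02\n58537 schrsd01\n";
        try (Scanner in = new Scanner(sample)) {
            System.out.println(summarize(in));
        }
    }
}
```

Note that the running total is kept in its own variable; the question's score = score + score overwrites the value that was just read.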

Reading each line of a file and searching for a specific word in java

So I have an assignment that requires me to "search a file line by line for a given string. The output must contain the line number and the line itself." For example, if the word "files" was picked, the output should look something like:
5: He had the files
9: the fILEs were his
Code:
void Search(String input) throws IOException {
int x = 1;
FileReader Search = new FileReader(f);
Scanner in = new Scanner(f);
LineNumberReader L = new LineNumberReader(Search, x);
StreamTokenizer token = new StreamTokenizer(Search);
while (in.hasNextLine())
{
try
{
if (!in.findInLine(input).isEmpty())
{
display(Integer.toString(L.getLineNumber()) + ": " + L.readLine(), "\n");
in.nextLine();
}
} catch (NullPointerException e)
{
System.out.println("Something Happened");
in.nextLine();
}
}
}
So far there are 3 issues I need to figure out with my code:
As soon as a line occurs that does not contain the searched word, it immediately displays the next line, even though the searched word is not in that line, and then terminates without having displayed the rest of the lines that contain the word.
It is supposed to display lines with the word regardless of casing, but does not.
Preferably, it is supposed to display all of them at once, but instead it displays them line by line until it errors out and terminates.
Your main problem is here...
FileReader Search = new FileReader(f);
Scanner in = new Scanner(f);
LineNumberReader L = new LineNumberReader(Search, x);
StreamTokenizer token = new StreamTokenizer(Search);
while (in.hasNextLine())
{
You've basically opened two file readers against the same file, but you seem to be expecting them to know about each other. You advance the Scanner, but that has no effect on the LineNumberReader. This then messes up the reporting and line reading process.
Reading from Scanner should look more like...
while (in.hasNextLine()) {
String text = in.nextLine();
Having said that, I'd actually drop the Scanner in favor of the LineNumberReader as it will provide you with more useful information which you would otherwise have to do yourself.
For example...
FileReader Search = new FileReader(new File("TestFile"));
LineNumberReader L = new LineNumberReader(Search);
String text = null;
while ((text = L.readLine()) != null) {
// Convert the two values to lower case for comparison...
if (text.toLowerCase().contains(input.toLowerCase())) {
System.out.println(L.getLineNumber() + ": " + text);
}
}

removeAll operation on arraylist makes program hang

I'm trying to read in from two files and store them in two separate arraylists. The files consist of words which are either alone on a line or multiple words on a line separated by commas.
I read each file with the following code (not complete):
ArrayList<String> temp = new ArrayList<>();
FileInputStream fis;
fis = new FileInputStream(fileName);
Scanner scan = new Scanner(fis);
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
scan.close();
return temp;
Each file contains almost 1 million words (I don't know the exact number), so I'm not entirely sure that the above code works correctly - but it seems to.
I now want to find out how many words are exclusive to the first file/arraylist. To do so I planned on using list1.removeAll(list2) and then checking the size of list1 - but for some reason this is not working. The code:
public static ArrayList differentWords(String fileName1, String fileName2) {
ArrayList<String> file1 = readFile(fileName1);
ArrayList<String> file2 = readFile(fileName2);
file1.removeAll(file2);
return file1;
}
My main method contains a few different calls and everything works fine until I reach the above code, which just causes the program to hang (in netbeans it's just "running").
Any idea why this is happening?
You are not using input in
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (scan.hasNext()) {
String md5 = scan.next();
temp.add(md5);
}
}
I think you meant to do this:
while (scan.hasNextLine()) {
Scanner input = new Scanner(scan.nextLine());
input.useDelimiter(",");
while (input.hasNext()) {
String md5 = input.next();
temp.add(md5);
}
}
That said, you should look into String#split(), which will probably save you some time:
while (scan.hasNextLine()) {
String line = scan.nextLine();
String[] tokens = line.split(",");
for (String token: tokens) {
temp.add(token);
}
}
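Try this. It removes through an Iterator, because removing from file1 inside a for-each loop over file1 throws a ConcurrentModificationException:
Iterator<String> it = file1.iterator();
while (it.hasNext()) {
if (file2.contains(it.next())) {
it.remove();
}
}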
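A likely reason for the apparent hang is cost rather than a deadlock: removeAll on two ArrayLists has to scan the second list for every element of the first, so with roughly a million words in each that is on the order of 10^12 comparisons. A sketch of a hash-based alternative follows; the names ExclusiveWords and onlyInFirst are illustrative, and it assumes both word lists fit in memory and that duplicates do not need to be counted separately.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: set difference with hash lookups instead of list scans.
class ExclusiveWords {

    // Words that occur in first but not in second; duplicates are collapsed.
    static Set<String> onlyInFirst(List<String> first, List<String> second) {
        Set<String> result = new HashSet<>(first);
        // Wrapping the second list in a HashSet keeps every lookup at expected constant time.
        Set<String> exclude = new HashSet<>(second);
        result.removeAll(exclude);
        return result;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("apple", "pear", "plum", "pear");
        List<String> file2 = Arrays.asList("pear", "fig");
        System.out.println(onlyInFirst(file1, file2).size() + " words are exclusive to the first list");
    }
}
```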

Basic File Reading to Array Storage

I have a simple Java question and I need a simple answer, if possible. I need to input the data from a file and store the data in an array. To do this, the program has to open the data file, count the number of elements in the file, close the file, initialize the array, reopen the file and load the data into the array. I am mainly having trouble getting the file data stored in an array. Here's what I have:
The file to read is here: https://www.dropbox.com/s/0ylb3iloj9af7qz/scores.txt
import java.io.*;
import java.util.*;
import javax.swing.*;
import java.text.*;
public class StandardizedScore8
{
//Accounting for a potential exception and exception subclasses
public static void main(String[] args) throws IOException
{
// TODO a LOT
String filename;
int i=0;
Scanner scan = new Scanner(System.in);
System.out.println("\nEnter the file name:");
filename=scan.nextLine();
File file = new File(filename);
//File file = new File ("scores.txt");
Scanner inputFile = new Scanner (file);
String [] fileArray = new String [filename];
//Scanner inFile = new Scanner (new File ("scores.txt"));
//User-input
// System.out.println("Reading from 'scores.txt'");
// System.out.println("\nEnter the file name:");
// filename=scan.nextLine();
//File-naming/retrieving
// File file = new File(filename);
// Scanner inputFile = new Scanner(file);
I recommend you use a Collection. This way, you don't have to know the size of the file beforehand and you'll read it only once, not twice. The Collection will manage its own size.
Yes, you can, if you don't mind doing things twice. Use while(inputFile.hasNext()) { inputFile.next(); i++; }
to count the number of elements (each token has to be consumed with next(), otherwise the loop never ends), then create an array:
String[] scores = new String[i];
If you do care, use a list instead of an array:
List<String> list = new ArrayList<String>();
while(inputFile.hasNext()) list.add(inputFile.next());
You can read a list element with list.get(i), replace an element with list.set(i,"string") and get the length of the list with list.size().
By the way, your line String [] fileArray = new String [filename]; is incorrect. The size of an array must be an int, not a String.
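For completeness, the count-then-load approach the assignment asks for could look roughly like this. It is a sketch under the assumption that the elements are whitespace-separated tokens; the class name TwoPassLoader and the method load are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical sketch: first pass counts the tokens, second pass fills the array.
class TwoPassLoader {

    static String[] load(File file) throws IOException {
        int count = 0;
        // Pass 1: open the file, count the tokens, close it again.
        try (Scanner counter = new Scanner(file)) {
            while (counter.hasNext()) {
                counter.next(); // consume the token so the loop advances
                count++;
            }
        }
        // Now the array size is known.
        String[] data = new String[count];
        // Pass 2: reopen the file and load the tokens into the array.
        try (Scanner loader = new Scanner(file)) {
            for (int i = 0; i < count; i++) {
                data[i] = loader.next();
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Pass the file name as the only argument, e.g. scores.txt
        if (args.length == 1) {
            System.out.println(Arrays.toString(load(new File(args[0]))));
        }
    }
}
```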
/*
* Do it the easy way using a List
*
*/
public static void main(String[] args) throws IOException
{
Scanner scan = new Scanner(System.in);
System.out.println("\nEnter the file name:");
String filename = scan.nextLine();
FileReader fileReader = new FileReader(filename);
BufferedReader reader = new BufferedReader(fileReader);
List<String> lineList = new ArrayList<String>();
String thisLine = reader.readLine();
while (thisLine != null) {
lineList.add(thisLine);
thisLine = reader.readLine();
}
// test it
int i = 0;
for (String testLine : lineList) {
System.out.println("Line " + i + ": " + testLine);
i++;
}
}
We can use an ArrayList to store the values from the file without knowing the number of elements beforehand.
You can get more info on ArrayList collections from the following urls.
http://docs.oracle.com/javase/tutorial/collections/implementations/index.html
http://www.java-samples.com/showtutorial.php?tutorialid=234
