This code computes the TF-IDF of words in 40 text files in a folder called docs. Whenever I run it, I get a NullPointerException, which I believe comes from the computeTermFrequencies method. I want it to print the top 5 TF-IDF words from each file.
Any help would be greatly appreciated! Thank you!
import java.util.*;
import java.io.*;
public class KeywordExtractor {
public static void main(String[] args) {
String dir = args[0]; // name of directory with input files
HashMap<String, Integer> dfs;
dfs = readDocumentFrequencies("freqs.txt");
for(int i = 1; i <= 40; i++){
String name = dir + "/" + i + ".txt";
HashMap<String,Integer> tfs = computeTermFrequencies(name);
HashMap<String,Double> tfidf = computeTFIDF(tfs,dfs,40);
System.out.println(i + ".txt");
printTopKeywords(tfidf,5);
System.out.println();
}
}
//method that takes a filename as input and returns a hashmap with the number of times
//each word appears in the file
public static HashMap<String, Integer> computeTermFrequencies(String filename) {
HashMap<String, Integer> hm2 = new HashMap<String, Integer>();
try{
FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
String line = "";
line = normalize(line);
//for(String line = br.readLine(); line != null; line = br.readLine()){
while((line=br.readLine())!=null){
String[] words = line.split(" ");
for(int i = 0; i < words.length; i++){
String word = words[i];
if(hm2.containsKey(word)){
int x = hm2.get(word);
x++;
hm2.put(word,x);
}else{
hm2.put(word,1);
}
} //end for
}//end for
}catch(IOException e){
//error
}
return hm2;
}
//method to read frequency file created in another class, it returns a hashMap
public static HashMap<String, Integer> readDocumentFrequencies(String filename){
HashMap<String, Integer> hm = new HashMap<String, Integer>();
//try block
try{
//read file
FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
//for loop to loop through and take words and put in hashmap
for(String line = br.readLine(); line != null; line = br.readLine()){
String[] a = line.split(" ");
String word = a[0];
int number = Integer.parseInt(a[1]);
//put word in hashmap with the frequency of the word
hm.put(word,number);
if(hm.get(word)==null){
System.out.println("sads");
}
}//end for
}
catch(IOException e){
//error
}
return hm;
}
public static HashMap<String, Double> computeTFIDF(HashMap<String, Integer> tfs, HashMap<String, Integer> dfs,
double nDocs) {
HashMap<String, Double> hm3 = new HashMap<String, Double>();
for(String key:tfs.keySet()){
/*if(dfs.get(key)==null){
System.out.println(key);
}*/
double idf = Math.log(nDocs/dfs.get(key));
double tf = tfs.get(key);
hm3.put(key,tf*idf);
}
return hm3;
}
/**
* This method prints the top K keywords by TF-IDF in descending order.
*/
public static void printTopKeywords(HashMap<String, Double> tfidfs, int k) {
ValueComparator vc = new ValueComparator(tfidfs);
TreeMap<String, Double> sortedMap = new TreeMap<String, Double>(vc);
sortedMap.putAll(tfidfs);
int i = 0;
for(Map.Entry<String, Double> entry: sortedMap.entrySet()){
String key = entry.getKey();
Double value = entry.getValue();
System.out.println(key + " " + value);
i++;
if (i >= k) {
break;
}
}
}
public static String normalize(String word) {
return word.replaceAll("[^a-zA-Z ']", "").toLowerCase();
}
}
/*
* This class makes printTopKeywords work. Do not modify.
*/
class ValueComparator implements Comparator<String> {
Map<String, Double> map;
public ValueComparator(Map<String, Double> base) {
this.map = base;
}
public int compare(String a, String b) {
if (map.get(a) >= map.get(b)) {
return -1;
} else {
return 1;
} // returning 0 would merge keys
}
}
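Judging only from the code shown, one likely source of the NullPointerException is this: computeTermFrequencies calls normalize on the empty string before the loop, so the words actually read from the file are never normalized. computeTFIDF then unboxes dfs.get(key), which is null for any word that is missing from freqs.txt. The sketch below assumes freqs.txt holds normalized words. The class name TfIdfSketch, the helper countWords, and the decision to skip unknown words are assumptions made for illustration, not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

final class TfIdfSketch {
    // Same normalization as in the question.
    static String normalize(String s) {
        return s.replaceAll("[^a-zA-Z ']", "").toLowerCase();
    }

    // Counts the words on one line after normalizing it; empty tokens are skipped.
    static void countWords(String line, Map<String, Integer> tfs) {
        for (String word : normalize(line).split(" ")) {
            if (!word.isEmpty()) {
                tfs.merge(word, 1, Integer::sum);
            }
        }
    }

    // Computes tf * log(nDocs / df), skipping words that have no document frequency.
    static Map<String, Double> computeTFIDF(Map<String, Integer> tfs,
                                            Map<String, Integer> dfs,
                                            double nDocs) {
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : tfs.entrySet()) {
            Integer df = dfs.get(e.getKey());
            if (df == null) {
                continue; // assumption: words unknown to dfs are ignored
            }
            result.put(e.getKey(), e.getValue() * Math.log(nDocs / df));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> tfs = new HashMap<>();
        countWords("Hello, hello world!", tfs);
        Map<String, Integer> dfs = new HashMap<>();
        dfs.put("hello", 10);
        // "world" is not in dfs, so only "hello" gets a score.
        System.out.println(computeTFIDF(tfs, dfs, 40));
    }
}
```

In the original program, countWords would be called once per line inside the while loop of computeTermFrequencies.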
I know there are other solutions out there but nothing is working for me.
Question: In my main method, I group IDs by rating, using the rating as the key and the rest of the info as the value, stored in a List. When I create the HashMap and put in the lists, I can print the contents of the HashMap accurately. However, once I pass the map to the evaluate method, the values are lost and I cannot iterate in the same way that I did in the main method, even though the logic is the same. I am not experienced with the Map class in Java. Can somebody please help me figure out why I can no longer iterate over the Map once I pass it to my evaluate method?
import java.io.*;
import java.util.*;
public class Evaluate {
public static double grandTotal;
public static void main(String[] args) throws Exception {
FileInputStream fs = new FileInputStream("testInput.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
FileInputStream fs2 = new FileInputStream("testTest.txt");
BufferedReader br2 = new BufferedReader(new InputStreamReader(fs2));
String line;
String line2;
String[] bloop;
String bleep;
String flooper;
String splitter;
String[] splitInput;
List<String> oneStarList= new ArrayList<String>();
List<String> twoStarList= new ArrayList<String>();
List<String> threeStarList= new ArrayList<String>();
List<String> fourStarList= new ArrayList<String>();
List<String> fiveStarList= new ArrayList<String>();
List<String> values2 = new ArrayList<String>();
try {
while ((line=br.readLine()) != null) {
bloop = new String[10];
bloop = line.split("\\s+");
bleep = bloop[1].toString();
flooper = (bloop[0]+" "+bloop[2]+" "+bloop[3]+" "+bloop[4]);
if (bleep.equals("1")){
oneStarList.add(flooper);
}
else if (bleep.equals("2")){
twoStarList.add(flooper);
}
else if (bleep.equals("3")){
threeStarList.add(flooper);
}
else if (bleep.equals("4")){
fourStarList.add(flooper);
}
else if (bleep.equals("5")){
fiveStarList.add(flooper);
}
grandTotal+=(Double.parseDouble(bloop[2]));
}
}
catch (Exception e){
}
Map<String,List<String>> hmap = new HashMap<String,List<String>>();
hmap.put("1",oneStarList);
hmap.put("2", twoStarList);
hmap.put("3", threeStarList);
hmap.put("4", fourStarList);
hmap.put("5", fiveStarList);
while ((line2=br2.readLine()) != null) {
splitInput = new String[5];
splitInput = line2.split("\\s+");
evaluate(splitInput[0],splitInput[1],hmap);
}
br.close();
br2.close();
}
public static void evaluate(String movID, String usrID, Map<String,List<String>> hash) throws Exception{
FileWriter fw = new FileWriter("outputTest.txt");
BufferedWriter bwr = new BufferedWriter(fw);
List<String> values = new ArrayList<String>();
List<String> outputList = new ArrayList<String>();
String[] floop;
String fleep;
int movIDtotal=0;
int usrIDtotal=0;
int totalValues=0;
double pmovIDStar=0;
double pusrIDStar=0;
double pmovID=0;
double pusrID=0;
double numID=0;
double keyTotalProb=0;
String keyOutputStr;
String keyHold;
final Set<Map.Entry<String,List<String>>> entries = hash.entrySet();
for (String key : hash.keySet()){
values = hash.get(key);
System.out.println(key + ":");
for (int i=0;i<values.size();i++){
System.out.println(values.get(i));
floop = new String[5];
fleep = values.get(i);
floop = fleep.split("\\s+");
if (movID.equals(floop[0])){
movIDtotal++;
totalValues++;
}
if (usrID.equals(floop[0])){
usrIDtotal++;
totalValues++;
}
}
values.clear();
}
for (Map.Entry<String, List<String>> entry: entries){
values= entry.getValue();
keyHold = entry.getKey();
for (int j=0;j<values.size();j++){
floop = new String[5];
fleep = values.get(j);
floop = fleep.split("\\s+");
if (movID.equals(floop[0])){
pmovIDStar = Double.parseDouble(floop[3]);
numID = Double.parseDouble(floop[1]);
pmovID = (numID/movIDtotal);
}
if (usrID.equals(floop[0])){
pusrIDStar = Double.parseDouble(floop[3]);
numID = Double.parseDouble(floop[1]);
pusrID = (numID/usrIDtotal);
}
}
keyTotalProb = ((totalValues/grandTotal)*(pmovIDStar)*(pusrIDStar))/(pusrID*pmovID);
keyOutputStr = Double.toString(keyTotalProb);
outputList.add(keyHold);
outputList.add(keyOutputStr);
values.clear();
}
double max = Double.MIN_VALUE;
for (int m=0;m<outputList.size();m+=2){
double coolguy = Double.parseDouble(outputList.get(m+1));
int index = 0;
if(coolguy>max){
max = coolguy;
index = m;
}
try {
bwr.write(String.format("%-1s %-1s %-1s%n", movID,usrID,outputList.get(index)));
bwr.close();
fw.close();
}
catch(Exception e) {
}
}
}
}
Background info: I'm trying to build a Java program that essentially performs the final stage of the Naive Bayes algorithm to predict user ratings (1-5) for movies. I have used MapReduce to train on the data, and now I have an input file where each line contains the following fields in this order, separated by whitespace rather than commas: movie or user ID, rating, number of times the rating and ID occur together, number of times the ID occurs in total, and the probability that the ID and rating occur together out of all ratings for that ID. Essentially this is the classification stage.
Never suppress exceptions, especially while you are coding and debugging.
catch (Exception e){ } is very bad practice.
When you do:
final Set<Map.Entry<String,List<String>>> entries = hash.entrySet();
it does not copy hash.entrySet() into entries. It creates another reference to the same set.
The same goes for values = entry.getValue();
So what do you expect after your first loop (and the others too)?
When you do:
values.clear();
the values are removed from the lists that are stored in hash, and since entries is just a reference to hash.entrySet(), you get exactly what you observed: empty lists.
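To illustrate the point about references, here is a minimal sketch (the class and method names are hypothetical). Clearing a list obtained from the map empties the list stored in the map, while clearing a copy leaves the map untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ClearAliasDemo {
    // Returns the size of the map's list after clearing a reference to that same list.
    static int sizeAfterClearingReference(Map<String, List<String>> map, String key) {
        List<String> values = map.get(key); // same object as the one held by the map
        values.clear();
        return map.get(key).size();
    }

    // Returns the size of the map's list after clearing an independent copy.
    static int sizeAfterClearingCopy(Map<String, List<String>> map, String key) {
        List<String> values = new ArrayList<>(map.get(key)); // shallow copy
        values.clear();
        return map.get(key).size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> hmap = new HashMap<>();
        hmap.put("1", new ArrayList<>(Arrays.asList("a 3 10 0.3", "b 2 8 0.25")));
        System.out.println(sizeAfterClearingCopy(hmap, "1"));      // prints 2
        System.out.println(sizeAfterClearingReference(hmap, "1")); // prints 0
    }
}
```

In evaluate, the simplest fix is probably to drop both values.clear() calls, since nothing there needs the lists to be emptied.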
I have a Java program that produces output in the following format:
termname:docname:termcount
For example, termname is hello, docname is 2 and termcount is 4:
hello:doc1:4
.....
......
I stored all the values in a map. Here is the program:
public class tuple {
public static void main(String[]args) throws FileNotFoundException, UnsupportedEncodingException, SQLException, ClassNotFoundException, Exception{
File file2 = new File("D:\\logs\\tuple.txt");
PrintWriter tupled = new PrintWriter(file2, "UTF-8");
List<Map<String, Integer>> list = new ArrayList<>();
Map<String, Integer> map = new HashMap<>();
String word;
//Iterate over documents
for (int i = 1; i <= 2; i++) {
//map = new HashMap<>();
Scanner tdsc = new Scanner(new File("D:\\logs\\AfterStem" + i + ".txt"));
//Iterate over words
while (tdsc.hasNext()) {
word = tdsc.next();
final Integer freq = map.get(word);
if (freq == null) {
map.put(word, 1);
} else {
map.put(word, map.get(word) + 1);
}
}
list.add(map);
}
// tupled.println(list);
//tupled.close();
//Print result
int documentNumber = 0;
for (Map<String, Integer> document : list) {
for (Map.Entry<String, Integer> entry : document.entrySet()) {
documentNumber++;
//System.out.println(entry.getKey() + ":doc"+documentNumber+":" + entry.getValue());
tupled.print(entry.getKey());
tupled.print(":doc:");
tupled.print(Integer.toString(documentNumber));
tupled.print(",");
tupled.println(entry.getValue());
}
//documentNumber++;
}
tupled.close();
}
}
Now I want to store these values in a Derby database in NetBeans.
How would I be able to do that?
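One possible approach, sketched with plain JDBC: flatten the per-document maps into rows and insert them with a batched PreparedStatement. The table name TERM_COUNTS, its columns, and the class name TermCountStore are hypothetical. The sketch also assumes there is one map per document and that the Derby driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TermCountStore {
    // Flattens per-document maps into rows of {term, docName, count}.
    static List<Object[]> toRows(List<Map<String, Integer>> docs) {
        List<Object[]> rows = new ArrayList<>();
        int docNumber = 0;
        for (Map<String, Integer> doc : docs) {
            docNumber++; // one number per document, not per term
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                rows.add(new Object[] {e.getKey(), "doc" + docNumber, e.getValue()});
            }
        }
        return rows;
    }

    // Inserts all rows in one batch. TERM_COUNTS and its columns are hypothetical.
    static void insertAll(Connection conn, List<Object[]> rows) throws SQLException {
        String sql = "INSERT INTO TERM_COUNTS (TERM, DOC, CNT) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);
                ps.setString(2, (String) row[1]);
                ps.setInt(3, (Integer) row[2]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

With an embedded Derby database, the connection could come from something like DriverManager.getConnection("jdbc:derby:termsDB;create=true"), where termsDB is a made-up database name. The table would have to exist first, for example CREATE TABLE TERM_COUNTS (TERM VARCHAR(255), DOC VARCHAR(32), CNT INT).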
My String looks like this
http://localhost:8080/HospitalServer/files/file?id=34&firstname=alex&lastname=ozouf&age=33&firstname=kevin&lastname=gerfild&age=27
I use this code to parse the parameters
final Map<String, List<String>> query_pairs = new LinkedHashMap<String, List<String>>();
final String[] pairs = query.split("&");
for (String pair : pairs) {
final int idx = pair.indexOf("=");
final String key = idx > 0 ? URLDecoder.decode(pair.substring(0, idx), "UTF-8") : pair;
if (!query_pairs.containsKey(key)) {
query_pairs.put(key, new LinkedList<String>());
}
final String value = idx > 0 && pair.length() > idx + 1 ? URLDecoder.decode(pair.substring(idx + 1), "UTF-8") : null;
query_pairs.get(key).add(value);
}
System.out.println(query_pairs);
The result is
{id=[34], firstname=[alex, kevin], lastname=[ozouf, gerfild], age=[33, 27]}
The result is not too bad, but I want to group the parameters by person:
{id=[34], 1=[alex,ozouf,33 ], 2=[kevin, gerfild,27]}
I could build it from the previous result, but I have the feeling that the work would be done twice. What do you think I should do?
Here's how you can do it without using any library:
import java.util.Map;
import java.util.HashMap;
public class MyUrlParser {
private static final String SEPARATOR = ",";
public static void main(String[] args) {
final String URL = "http://localhost:8080/HospitalServer/files/file?id=34&firstname=alex&lastname=ozouf&age=33&firstname=kevin&lastname=gerfild&age=27";
MyUrlParser mup = new MyUrlParser();
try {
Map<String, String> parsed = mup.parse(URL);
System.out.println(parsed);
} catch (Exception e) {
System.err.println(e.getMessage());
}
}
public Map<String, String> parse(String url) throws Exception {
Map<String, String> retMap = new HashMap<>();
int queryStringPos = url.indexOf("?");
if (-1 == queryStringPos) {
throw new Exception("Invalid URL");
}
String queryString = url.substring(queryStringPos + 1);
String[] parameters = queryString.split("&");
if (parameters.length > 0) {
retMap.put("id", parameters[0]);
int personCounter = 0;
for (int minSize = 4; minSize <= parameters.length; minSize += 3) {
StringBuilder person = new StringBuilder();
person.append(parameters[minSize-3]);
person.append(SEPARATOR);
person.append(parameters[minSize-2]);
person.append(SEPARATOR);
person.append(parameters[minSize-1]);
personCounter++;
retMap.put("person" + personCounter, person.toString());
}
}
return retMap;
}
}
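As an alternative, here is a sketch that produces the map shape asked for in the question in a single pass, without relying on fixed positions. It assumes that every person's block of parameters starts with firstname. The class name PersonGrouping and the method group are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PersonGrouping {
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Groups query parameters in one pass. Assumes each person's block
    // starts with "firstname"; "id" is kept under its own key.
    static Map<String, List<String>> group(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        int person = 0;
        for (String pair : query.split("&")) {
            int idx = pair.indexOf('=');
            if (idx <= 0) {
                continue; // skip parameters without a key or without '='
            }
            String key = decode(pair.substring(0, idx));
            String value = decode(pair.substring(idx + 1));
            String groupKey;
            if (key.equals("id")) {
                groupKey = "id";
            } else {
                if (key.equals("firstname")) {
                    person++; // a new person starts here
                }
                groupKey = String.valueOf(person);
            }
            result.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "id=34&firstname=alex&lastname=ozouf&age=33"
                + "&firstname=kevin&lastname=gerfild&age=27";
        System.out.println(group(query));
        // prints {id=[34], 1=[alex, ozouf, 33], 2=[kevin, gerfild, 27]}
    }
}
```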
I am trying to write a method that takes an InputStream and returns a HashMap back to main. However, I'm stuck on how to return the HashMap variable. I'm new to Java, so I do not know what I'm doing wrong. For the return statement, I get: pairsCount cannot be resolved to a variable. Thanks in advance.
private static Map<String, Integer> getHashMap(InputStream in)
{
if (in != null)
{
// Using a Scanner object to read one word at a time from the input stream.
@SuppressWarnings("resource")
Scanner sc = new Scanner(in);
String word;
System.out.println(" - Assignment 1 -s%n%n\n");
// Continue getting words until we reach the end of input
List<String> inputWords = new ArrayList<String>();
while (sc.hasNext())
{
word = sc.next();
if (!word.equals(null))
{
inputWords.add(word);
}
}
Map<String, Integer> pairsCount = new HashMap<>();
Iterator<String> it = inputWords.iterator();
String currentWord = null;
String previousWord = null;
Integer wordCount = 0;
while(it.hasNext())
{
currentWord = it.next();
if( previousWord != null )
{
String key = previousWord.concat( "#" ).concat( currentWord );
if( pairsCount.containsKey( key ) )
{
Integer lastCount = pairsCount.get( key );
pairsCount.put( key, lastCount + 1 );
wordCount = wordCount + lastCount;
}
else
{
pairsCount.put( key, 1 );
wordCount = 1;
}
}
previousWord = currentWord;
}
}
return (pairsCount);
}
This is probably because the variable pairsCount is out of scope.
You define it inside the if block but try to return it outside of it.
So try defining Map<String, Integer> pairsCount = new HashMap<>();
before the if (in != null)
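A condensed sketch of that fix follows. It keeps the pair counting from the question but drops the intermediate list and the wordCount variable for brevity. The class name PairCounter is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

final class PairCounter {
    // Counts adjacent word pairs ("previous#current"). The map is declared
    // before the null check, so it is in scope at the return statement.
    static Map<String, Integer> getHashMap(InputStream in) {
        Map<String, Integer> pairsCount = new HashMap<>();
        if (in != null) {
            Scanner sc = new Scanner(in);
            String previousWord = null;
            while (sc.hasNext()) {
                String currentWord = sc.next();
                if (previousWord != null) {
                    pairsCount.merge(previousWord + "#" + currentWord, 1, Integer::sum);
                }
                previousWord = currentWord;
            }
        }
        return pairsCount; // empty map when the stream is null
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("a b a b c".getBytes());
        System.out.println(getHashMap(in));
    }
}
```

Returning an empty map for a null stream is a design choice made here. Throwing an exception would be another reasonable option.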
This is my csv data:
Name,Code,Price,Colour,Type,Stock
A,1001,35000,Red,Car Paint,54
B,1002,56000,Blue,House Paint,90
As you can see, my code is inefficient.
This is because NetBeans does not allow text fields to share a variable name, so I have to give each text field a different name (for example: code1, code2, code3, name1, name2, name3).
Can someone help me loop over this data so that the processing runs four times without my having to repeat the code, and so that it is skipped when the fields are blank?
The following is my code:
try
{
for(int z=0; z<4;z++)
{
String code1;
code1=this.text1.getText();
System.out.println("this is the code: " + code1);
String qty;
int qty1;
qty=this.quantity1.getText();
qty1=Integer.parseInt(qty);
System.out.println("quantity: "+qty1);
String code2;
code2=this.text2.getText();
System.out.println("this is the code: " + code2);
int qty2;
qty=this.quantity2.getText();
qty2=Integer.parseInt(qty);
System.out.println("quantity: "+qty2);
String code3;
code3=this.text3.getText();
System.out.println("this is the code: " + code3);
int qty3;
qty=this.quantity2.getText();
qty3=Integer.parseInt(qty);
System.out.println("quantity: "+qty3);
String code4;
code4=this.text4.getText();
System.out.println("this is the code: " + code4);
int qty4;
qty=this.quantity2.getText();
qty4=Integer.parseInt(qty);
System.out.println("quantity: "+qty4);
int sum=0;
BufferedReader line = new BufferedReader(new FileReader(new File("C:\\Users\\Laura Sutardja\\Documents\\IB DP\\Computer Science HL\\cs\\product.txt")));
String indata;
ArrayList<String[]> dataArr = new ArrayList<>();
String[] club = new String[6];
String[] value;
while ((indata = line.readLine()) != null) {
value = indata.split(",");
dataArr.add(value);
}
for (int i = 0; i < dataArr.size(); i++) {
String[] nameData = dataArr.get(i);
if (nameData[1].equals(code1)) {
System.out.println("Found name.");
name1.setText(""+ nameData[0]);
int price;
price=Integer.parseInt(nameData[2]);
int totalprice=qty1*price;
String total=Integer.toString(totalprice);
price1.setText(total);
sum=sum+totalprice;
break;
}
}
for (int i = 0; i < dataArr.size(); i++) {
String[] nameData = dataArr.get(i);
if (nameData[1].equals(code2)) {
System.out.println("Found name.");
name2.setText(""+ nameData[0]);
int price;
price=Integer.parseInt(nameData[2]);
int totalprice=qty2*price;
String total=Integer.toString(totalprice);
price2.setText(total);
sum=sum+totalprice;
break;
}
}
for (int i = 0; i < dataArr.size(); i++) {
String[] nameData = dataArr.get(i);
if (nameData[1].equals(code3)) {
System.out.println("Found name.");
name3.setText(""+ nameData[0]);
int price;
price=Integer.parseInt(nameData[2]);
int totalprice=qty3*price;
int totalprice3=totalprice;
String total=Integer.toString(totalprice);
price3.setText(total);
sum=sum+totalprice;
break;
}
}
for (int i = 0; i < dataArr.size(); i++) {
String[] nameData = dataArr.get(i);
if (nameData[1].equals(code4)) {
System.out.println("Found name.");
name4.setText(""+ nameData[0]);
int price;
price=Integer.parseInt(nameData[2]);
int totalprice=qty4*price;
int totalprice4=totalprice;
String total=Integer.toString(totalprice);
price4.setText(total);
sum=sum+totalprice;
break;
}
}
total1.setText("Rp. "+sum);
}
}
catch ( IOException iox )
{
System.out.println("Error");
}
Why don't you use a library like http://commons.apache.org/proper/commons-csv/
Solving this problem is actually rather straightforward if you break it down into separate parts.
First you need to load the data into an internal representation that is easy to use. Just loading the file into Java is rather simple, and you have already done this:
BufferedReader csvFile = new BufferedReader(new FileReader(new File(path)));
String line = "start";
int count = 0;
while((line = csvFile.readLine()) != null){
System.out.println(line);
}
csvFile.close();
The next problem is splitting each line and storing it in a meaningful way:
HashMap<Integer, String> record = new HashMap<Integer, String>();
String[] raw = line.split(",");
for(int i=0;i<raw.length; i++){
record.put(i, raw[i]);
}
Now, you state that you only want to store records that have no empty fields, so we need to check for that:
HashMap<Integer, String> record = new HashMap<Integer, String>();
String[] raw = line.split(",");
Boolean store = true;
for(int i=0;i<raw.length; i++){
if(raw[i] == null || raw[i].isEmpty()){
store = false;
break;
}
record.put(i, raw[i]);
}
if(store)
csvData.add(record);
Now, you can load each record of the csv file as a dictionary that you can easily use. All that remains is to save a list of these dictionaries.
ArrayList<Map<Integer, String>> csvData = new ArrayList<Map<Integer, String>>();
BufferedReader csvFile = new BufferedReader(new FileReader(new File(path)));
String line = "start";
int count = 0;
while((line = csvFile.readLine()) != null){
if(count == 0){//skip first line
count++;
continue;
}
HashMap<Integer, String> record = new HashMap<Integer, String>();
String[] raw = line.split(",");
Boolean store = true;
for(int i=0;i<raw.length; i++){
if(raw[i] == null || raw[i].isEmpty())
{
store = false;
break;
}
record.put(i, raw[i]);
}
if(store)
csvData.add(record);
}
csvFile.close();
Full code snippet that loads the data and lets you easily access whatever information you want:
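import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
public class Main {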
public static final int NAME = 0;
public static final int CODE = 1;
public static final int PRICE = 2;
public static final int COLOR = 3;
public static final int TYPE = 4;
public static final int STOCK = 5;
public static void main(String[] args) throws IOException{
ArrayList<Map<Integer, String>> csvData = loadCSVFile("C:\\path\\to\\file\\products.txt");
//Print some of the data
System.out.println("---------------------------");
for(Map<Integer, String> record : csvData){
printInfo(record);
}
}
public static ArrayList<Map<Integer, String>> loadCSVFile(String path) throws IOException{
ArrayList<Map<Integer, String>> csvData = new ArrayList<Map<Integer, String>>();
BufferedReader csvFile = new BufferedReader(new FileReader(new File(path)));
String line = "start";
int count = 0;
while((line = csvFile.readLine()) != null){
if(count == 0){
count++;
continue;
}
HashMap<Integer, String> record = new HashMap<Integer, String>();
String[] raw = line.split(",");
Boolean store = true;
for(int i=0;i<raw.length; i++){
if(raw[i] == null || raw[i].isEmpty())
{
store = false;
break;
}
record.put(i, raw[i]);
}
if(store)
csvData.add(record);
}
csvFile.close();
return csvData;
}
public static void printInfo(Map<Integer, String> record){
System.out.println(record.get(CODE) + " : " + record.get(TYPE));
System.out.println(record.get(NAME) + " : " + record.get(STOCK) + " : " + record.get(PRICE));
System.out.println("---------------------------");
}
}
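To address the repeated code for the four text fields, one option is to collect the field values into arrays and index the products by code, so that a single loop handles every row and blank rows are skipped. The sketch below works on plain strings to stay self-contained. In the form, the arrays could be filled from the existing fields, for example from text1.getText() through text4.getText(). The class name OrderTotals and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrderTotals {
    // Indexes CSV lines (Name,Code,Price,...) by product code, skipping the header.
    static Map<String, String[]> indexByCode(List<String> csvLines) {
        Map<String, String[]> byCode = new HashMap<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String[] fields = csvLines.get(i).split(",");
            if (fields.length >= 3) {
                byCode.put(fields[1], fields);
            }
        }
        return byCode;
    }

    // Sums quantity * price over all rows; blank or unknown entries are skipped.
    static int sum(String[] codes, String[] quantities, Map<String, String[]> byCode) {
        int total = 0;
        for (int i = 0; i < codes.length; i++) {
            String code = codes[i].trim();
            String qty = quantities[i].trim();
            if (code.isEmpty() || qty.isEmpty() || !byCode.containsKey(code)) {
                continue;
            }
            int price = Integer.parseInt(byCode.get(code)[2]);
            total += Integer.parseInt(qty) * price;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("Name,Code,Price,Colour,Type,Stock");
        lines.add("A,1001,35000,Red,Car Paint,54");
        lines.add("B,1002,56000,Blue,House Paint,90");
        String[] codes = {"1001", "1002", "", ""};
        String[] quantities = {"2", "1", "", ""};
        System.out.println("Rp. " + sum(codes, quantities, indexByCode(lines)));
        // prints Rp. 126000
    }
}
```

Reading the file once and looking each code up in the map also avoids scanning the whole product list separately for every field.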