Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I am still learning JAVA and have been trying to find a solution for my program for a few days, but I haven't gotten it fixed yet.
I have many text files (my program saves). The files look like this:
text (tab) number (tab) number (tab)...
text (tab) number (tab) number (tab)...
(tab) means that there is tabulation mark,
text means that is text (string),
number means that there is number (integer).
number of files can be from 1 up to 32 and file with names like: january1; january2; january3...
I need to read all of those files (ignore strings) and sum only numbers like so:
while ((line = br.readLine()) != null) {
counter=counter+1;
String[] info = line.split("\\s+");
for(int j = 2; j < 8; j++) {
int num = Integer.parseInt(info[j]);
data[j][counter]=data[j][counter]+num;
}
};
Simply I want sum all that "tables" to array of arrays (or to any similar kind of variable) and then display it as table. If someone knows any solution or can link any similar calculation, that would be awesome!
So, as I see it, you have four questions you need answered, this goes against the site etiquette of asking A question, but will give it a shot
How to list a series of files, presumably using some kind of filter
How to read a file and process the data in some meaningful way
How to manage the data in data structure
Show the data in a JTable.
Listing files
Probably the simplest way to list files is to use File#list and pass a FileFilter which meets your needs
File[] files = new File(".").listFiles(new FileFilter() {
#Override
public boolean accept(File pathname) {
return pathname.getName().toLowerCase().startsWith("janurary");
}
});
Now, I'd write a method which took a File object representing the directory you want to list and a FileFilter to use to search it...
public File[] listFiles(File dir, FileFilter filter) throws IOException {
if (dir.exists()) {
if (dir.isDirectory()) {
return dir.listFiles(filter);
} else {
throw new IOException(dir + " is not a valid directory");
}
} else {
throw new IOException(dir + " does not exist");
}
}
This way you could search for a number of different set of files based on different FileFilters.
Of course, you could also use the newer Paths/Files API to find files as well
Reading files...
Reading multiple files comes down to the same thing, reading a single file...
// BufferedReader has a nice readline method which makes
// it easier to read text with. You could use a Scanner
// but I prefer BufferedReader, but that's me...
try (BufferedReader br = new BufferedReader(new FileReader(new File("...")))) {
String line = null;
// Read each line
while ((line = br.readLine()) != null) {
// Split the line into individual parts, on the <tab> character
String parts[] = line.split("\t");
int sum = 0;
// Staring from the first number, sum the line...
for (int index = 1; index < parts.length; index++) {
sum += Integer.parseInt(parts[index].trim());
}
// Store the key/value pairs together some how
}
}
Now, we need some way to store the results of the calculations...
Have a look at Basic I/O for more details
Managing the data
Now, there are any number of ways you could do this, but since the amount of data is variable, you want a data structure that can grow dynamically.
My first thought would be to use a Map, but this assumes you want to combining rows with the same name, otherwise you should just us a List within a List, where the outer List represents the rows and the Inner list represents the column values...
Map<String, Integer> data = new HashMap<>(25);
File[] files = listFiles(someDir, januraryFilter);
for (File file : files {
readFile(file, data);
}
Where readFile is basically the code from before
protected void readData(File file, Map<String, Integer> data) throws IOException {
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line = null;
// Read each line
while ((line = br.readLine()) != null) {
//...
// Store the key/value pairs together some how
String name = parts[0];
if (data.containsKey(name)) {
int previous = data.get(name);
sum += previous;
}
data.put(name, sum);
}
}
}
Have a look at the Collections Trail for more details
Showing the data
And finally, we need to show the data. You could simply use a DefaultTableModel, but you already have the data in structure, why not re-use it with a custom TableModel
public class SummaryTableModel extends AbstractTableModel {
private Map<String, Integer> data;
private List<String> keyMap;
public SummaryTableModel(Map<String, Integer> data) {
this.data = new HashMap<>(data);
keyMap = new ArrayList<>(data.keySet());
}
#Override
public int getRowCount() {
return data.size();
}
#Override
public int getColumnCount() {
return 2;
}
#Override
public Class<?> getColumnClass(int columnIndex) {
Class type = Object.class;
switch (columnIndex) {
case 0:
type = String.class;
break;
case 1:
type = Integer.class;
break;
}
return type;
}
#Override
public Object getValueAt(int rowIndex, int columnIndex) {
Object value = null;
switch (columnIndex) {
case 0:
value = keyMap.get(rowIndex);
break;
case 1:
String key = keyMap.get(rowIndex);
value = data.get(key);
break;
}
return value;
}
}
Then you would simply apply it to a JTable...
add(new JScrollPane(new JTable(new SummaryTableModel(data)));
Take a look at How to Use Tables for more details
Conclusion
There are a lot of assumptions that have to be made which are missing from the context of the question; does the order of the files matter? Do you care about duplicate entries?
So it becomes near impossible to provide a single "answer" which will solve all of your problems
I took all the january1 january2... files from the location and used your same function to calculate the value to be stored.
Then I created a table with two headers, Day and Number. Then just added rows according to the values generated.
DefaultTableModel model = new DefaultTableModel();
JTable table = new JTable(model);
String line;
model.addColumn("Day");
model.addColumn("Number");
BufferedReader br = null;
model.addRow(new Object[]{"a","b"});
for(int i = 1; i < 32; i++)
{
try {
String sCurrentLine;
String filename = "january"+i;
br = new BufferedReader(new FileReader("C:\\january"+i+".txt"));
int counter = 0;
while ((sCurrentLine = br.readLine()) != null) {
counter=counter+1;
String[] info = sCurrentLine.split("\\s+");
int sum = 0;
for(int j = 2; j < 8; j++) {
int num = Integer.parseInt(info[j]);
sum += num;
}
model.addRow(new Object[]{filename, sum+""});
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
JFrame f = new JFrame();
f.setSize(300, 300);
f.add(new JScrollPane(table));
f.setVisible(true);
Use Labled Loop and Try-Catch. Below piece adds all number in a line.
You could get some hint from here:
String line = "text 1 2 3 4 del";
String splitLine[] = line.split("\t");
int sumLine = 0;
int i = 0;
contSum: for (; i < splitLine.length; i++) {
try {
sumLine += Integer.parseInt(splitLine[i]);
} catch (Exception e) {
continue contSum;
}
}
System.out.println(sumLine);
Here is another example using vectors . in this example directories will be searched for ".txt" files and added to the JTable.
The doIt method will take in the folder where your text files are located.
this will then with recursion, look for files in folders.
each file found will be split and added following you example file.
public class FileFolderReader
{
private Vector<Vector> rows = new Vector<Vector>();
public static void main(String[] args)
{
FileFolderReader fileFolderReader = new FileFolderReader();
fileFolderReader.doIt("D:\\folderoffiles");
}
private void doIt(String path)
{
System.out.println(findFile(new File(path)) + " in total");
JFrame frame = new JFrame();
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
Vector<String> columnNames = new Vector<String>();
columnNames.addElement("File Name");
columnNames.addElement("Size");
JTable table = new JTable(rows, columnNames);
JScrollPane scrollPane = new JScrollPane(table);
frame.add(scrollPane, BorderLayout.CENTER);
frame.setSize(300, 150);
frame.setVisible(true);
}
private int findFile(File file)
{
int totalPerFile = 0;
int total = 0;
File[] list = file.listFiles(new FilenameFilter()
{
public boolean accept(File dir, String fileName)
{
return fileName.endsWith(".txt");
}
});
if (list != null)
for (File textFile : list)
{
if (textFile.isDirectory())
{
total = findFile(textFile);
}
else
{
totalPerFile = scanFile(textFile);
System.out.println(totalPerFile + " in " + textFile.getName());
Vector<String> rowItem = new Vector<String>();
rowItem.addElement(textFile.getName());
rowItem.addElement(Integer.toString(totalPerFile));
rows.addElement(rowItem);
total = total + totalPerFile;
}
}
return total;
}
public int scanFile(File file)
{
int sum = 0;
Scanner scanner = null;
try
{
scanner = new Scanner(file);
while (scanner.hasNextLine())
{
String line = scanner.nextLine();
String[] info = line.split("\\s+");
int count = 1;
for (String stingInt : info)
{
if (count != 1)
{
sum = sum + Integer.parseInt(stingInt);
}
count++;
}
}
scanner.close();
}
catch (FileNotFoundException e)
{
// you will need to handle this
// don't do this !
e.printStackTrace();
}
return sum;
}
}
Related
I have a csv file, after I overwrite 1 line with the Write method, after re-writing to the file everything is already added to the end of the file, and not to a specific line
using System.Collections;
using System.Collections.Generic;
using UnityEngine.UI;
using UnityEngine;
using System.Text;
using System.IO;
public class LoadQuestion : MonoBehaviour
{
int index;
string path;
FileStream file;
StreamReader reader;
StreamWriter writer;
public Text City;
public string[] allQuestion;
public string[] addedQuestion;
private void Start()
{
index = 0;
path = Application.dataPath + "/Files/Questions.csv";
allQuestion = File.ReadAllLines(path, Encoding.GetEncoding(1251));
file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);
writer = new StreamWriter(file, Encoding.GetEncoding(1251));
reader = new StreamReader(file, Encoding.GetEncoding(1251));
writer.AutoFlush = true;
List<string> _questions = new List<string>();
for (int i = 0; i < allQuestion.Length; i++)
{
char status = allQuestion[i][0];
if (status == '0')
{
_questions.Add(allQuestion[i]);
}
}
addedQuestion = _questions.ToArray();
City.text = ParseToCity(addedQuestion[0]);
}
private string ParseToCity(string current)
{
string _city = "";
string[] data = current.Split(';');
_city = data[2];
return _city;
}
private void OnApplicationQuit()
{
writer.Close();
reader.Close();
file.Close();
}
public void IKnow()
{
string[] quest = addedQuestion[index].Split(';');
int indexFromFile = int.Parse(quest[1]);
string questBeforeAnsver = "";
for (int i = 0; i < quest.Length; i++)
{
if (i == 0)
{
questBeforeAnsver += "1";
}
else
{
questBeforeAnsver += ";" + quest[i];
}
}
Debug.Log("indexFromFile : " + indexFromFile);
for (int i = 0; i < allQuestion.Length; i++)
{
if (i == indexFromFile)
{
writer.Write(questBeforeAnsver);
break;
}
else
{
reader.ReadLine();
}
}
reader.DiscardBufferedData();
reader.BaseStream.Seek(0, SeekOrigin.Begin);
if (index < addedQuestion.Length - 1)
{
index++;
}
City.text = ParseToCity(addedQuestion[index]);
}
}
There are lines in the file by type :
0;0;Africa
0;1;London
0;2;Paris
The bottom line is that this is a game, and only those questions whose status is 0, that is, unanswered, are downloaded from the file. And if during the game the user clicks that he knows the answer, then there is a line in the file and is overwritten, only the status is no longer 0, but 1 and when the game is repeated, this question will not load.
It turns out for me that the first question is overwritten successfully, and all subsequent ones are simply added at the end of the file :
1;0;Africa
0;1;London
0;2;Paris1;1;London1;2;Paris
What's wrong ?
The video shows everything in detail
I want to read a csv File and put words " Jakarta " and " Bandung " in a combobox. Here's the input
id,from,
1,Jakarta
2,Jakarta
5,Jakarta
6,Jakarta
10,Bandung
11,Bandung
12,Bandung
I managed to get the words and put it in the combobox, but as you can see, the text file itself contains a lot word " Jakarta " and " Bandung " while i want to show both only once in the combobox.
Here's my temporary code, which works for now but inefficient and probably can't be used if the word has more variety
public String location;
private void formWindowOpened(java.awt.event.WindowEvent evt) {
String csvFile = "C:\\Users\\USER\\Desktop\\Project Data.csv";
BufferedReader br = null;
LineNumberReader reader = null;
String line = "";
String cvsSplitBy = "-|\\,";
br = new BufferedReader(new FileReader(csvFile));
reader = new LineNumberReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
// use comma as separator
String[] bookingdata = line.split(cvsSplitBy);
location = bookingdata[1];
ComboBoxModel model = cmb1.getModel();
int size = model.getSize();
cmb1.addItem(location);
for(int i = 1; i < size; i++){
if(model.getElementAt(i).equals("from")){
cmb1.removeItemAt(i);
}
else if(model.getElementAt(i).equals("Bandung")){
cmb1.removeItemAt(i);
}
for(int j = 2; j < i; j++){
if(model.getElementAt(j).equals("Jakarta")){
cmb1.removeItemAt(j);
}
}
}
}
}
Someone else recommended this approach
boolean isEquals = false;
for(i = 0; i < a && !isEquals; i++){
isEquals = location.equals("Jakarta");
if(isEquals){
cmb1.addItem("Jakarta");
}
}
This code doesn't work. As the code doesn't stop once it adds a " Jakarta " but it stops after it completed a loop. thus it still creates duplicate within the combobox.
I would like to know if there's any other code i can try. Thank you
Try putting all the words in a Set first and then add it in the combobox. Set itself will take care of exact one occurrence of each word.
Something like this:
while ((line = br.readLine()) != null) {
// use comma as separator
String[] bookingdata = line.split(cvsSplitBy);
location = bookingdata[1];
ComboBoxModel model = cmb1.getModel();
int size = model.getSize();
// add all location in set and set will only allow distinct values
locationSet.add(location);
}
// after looping through all location put it in combobox
for(String location:locationSet)cmb1.addItem(location);
}
}
As discussed in comments, Sets are meant to keep unique values. Please find the screenshot of JShell below:
PS: This is just to give an idea and may need some amendment as per requirement.
--EDITED--
As discussed, it seems you are still missing something, I tried and write below piece of code and worked fine
package com.digital.core;
import java.util.HashSet;
import java.util.Set;
import javax.swing.JComboBox;
import javax.swing.JFrame;
public class Test {
public static void main(String[] args) {
JFrame jframe = new JFrame();
jframe.setSize(300, 300);
String data = "id,from,\n" +
"1,Jakarta\n" +
"2,Jakarta\n" +
"5,Jakarta\n" +
"6,Jakarta\n" +
"10,Bandung\n" +
"11,Bandung\n" +
"12,Bandung";
String[] dataArr = data.split("\n");
Set<String> locationSet = new HashSet<>();
for(String line:dataArr) {
locationSet.add(line.split(",")[1]);
}
JComboBox<String> comboBox = new JComboBox<>();
for(String location:locationSet)
comboBox.addItem(location);
jframe.add(comboBox);
jframe.setVisible(true);
}
}
You could create an ObservablArrayList of strings and as you read the CSV file, check if the list already contains that string:
ObservableList<String> locationsList = FXCollections.observableArrayList();
// Add your strings to the array as they're loaded, but check to
// make sure the string does not already exist
if (!locationsList.contains(location)) {
locationsList.add(location);
}
Then, after reading the whole file and populating the list, just set the items in your combobox to that ObservableArrayList.
So I have several arrays that I create with large CSVs with some basic code - no issues there. But I want to be able to make a new array from a CSV with a method instead of just copying and pasting code. Here's basically what I want to do.
public static void makeArray(String name, String path, int rows, int columns){
//get CSV from path and make array with that data
}
But here's what I currently have, and it doesn't work.
static String[][] makeArray(String name, String path, int rows, int columns) {
name = new String[rows][columns];
Scanner scanIn = null;
int r = 0;
String inputLine = "";
try {
System.out.println("Setting up " + name);
scanIn = new Scanner(new BufferedReader(new FileReader(path)));
while (scanIn.hasNextLine()) {
inputLine = scanIn.nextLine();
String[] inArray = inputLine.split(",");
for(int x = 0; x < inArray.length; x++) {
name[r][x] = inArray[x];
}
r++;
}
return name;
} catch(Exception e){
System.out.println(e);
}
}
I appreciate the help!!
Your code has the following issues:
Java is pass-by-value, so remove the name parameter and return the newly created array instead.
Don't ignore exceptions. If you don't want callers to have to process the IOException, catch it and send an unchecked exception.
Use try-with-resources to make sure your Reader is closed correctly.
Don't pre-declare variables. Declare them where used.
Applying fixes for the above issue, your code becomes:
static String[][] makeArray(String path, int rows, int columns) {
String[][] name = new String[rows][columns];
try (Scanner scanIn = new Scanner(new BufferedReader(new FileReader(path)))) {
for (int r = 0; scanIn.hasNextLine(); r++) {
String inputLine = scanIn.nextLine();
String[] inArray = inputLine.split(",");
for (int x = 0; x < inArray.length; x++) {
name[r][x] = inArray[x];
}
}
} catch (IOException e) {
throw new IllegalArgumentException("Error reading file '" + path + "': " + e, e);
}
return name;
}
Be aware that the code will fail if the file contains too many lines, or if a line contains too many values. You might want to check for that, or make the code figure out the number of rows by itself, e.g. using a List.
UPDATE
To auto-create an array with the number of rows actually found in the file, you can use the following code.
The code also uses the Arrays.copyOf() method to make sure every row is the given number of columns. If a line in the file contains more values than that, they are silently discarded.
static String[][] makeArray(String path, int columns) {
List<String[]> rows = new ArrayList<>();
try (Scanner in = new Scanner(new BufferedReader(new FileReader(path)))) {
while (in.hasNextLine()) {
String[] values = in.nextLine().split(",");
rows.add(Arrays.copyOf(values, columns)); // ignores excessive values
}
} catch (IOException e) {
throw new IllegalArgumentException("Error reading file '" + path + "': " + e, e);
}
return rows.toArray(new String[rows.size()][]);
}
Make a method that creates an array with given parameters?
The problem with your code is that you are not returning anything in your method due to void keyword and the fact that you didn't use return keyword.
So here's what you need to do:
First its return type should be String[][].
Second you should return the edited array in the end of your method.
Another thing is that you are trying to assign a `String[]
name = new String[rows][columns];
Which will cause the Error cannot convert String[][] to String.
So in your method definition change the type of name to String[][] name or even better declare it inside the method because you are initializing it in the method.
So I have to read out a string from a file in Java. It's for a highscore system.
Each line of the file contains something similiar like this: "24/Kilian".
The number in front of the / is the score and the text after the / is the name.
Now, my problem is that I have to sort the scores descending and write them back into the file. The new scores should overwrite the old ones.
I tried it but I can't get it working properly.
I already wrote some code which reads the score + name line by line out of the file.
public static void sortScores() {
String [][]scores = null;
int i = 1;
try (BufferedReader br = new BufferedReader(new FileReader("score.txt"))) {
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
scores[i][0] = line.substring(0, line.indexOf("/"));
scores[i][1] = line.substring(line.indexOf("/"), line.length());
i++;
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
So, this code basically writes the score and the name in a 2D array like this:
score[0][0] = "24";
score[0][1] = "Kilian";
score[1][0] = "33";
score[1][1] = "Name";
score[2][0] = "45";
score[2][1] = "AnotherName";
I hope someone can help me with my problem.
You can use java.util.Arrays's sort-Method:
Arrays.sort(scores, (a, b) -> -a[0].compareTo(b[0]));
But this lead to the case that "3" will be above "23". So probably you should create new class which holds the value and use an ArrayList
I'd recomend you to make a new class Score which holds your data (score + name) and add a new instance of Score into a ArrayList for each row you read from the file. After that you can implement a Comparator and sort your ArrayList. It's much easier because you don't know how big your string array will get and you need to know that when you're working with arrays.
public class Score {
public Score(int score, String name) {
this.score = score;
this.name = name;
}
int score;
String name;
// getter
}
List<Score> scoreList = new ArrayList<>();
String line;
while ((line = br.readLine()) != null) {
scoreList.add(new Score(Integer.parseInt(line.substring(0, line.indexOf("/"))), line.substring(line.indexOf("/"), line.length())));
}
Collections.sort(scoreList, new Comparator<Score>() {
public int compare(Score s1, Score s2) {
return s1.getScore() - s2.getScore();
}
}
// write to file
You can try it:
HashMap<Integer, String > map = new HashMap<>();
try (BufferedReader br = new BufferedReader(new FileReader("score.txt"))) {
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
String[] lines = line.split("/");
map.put(Integer.valueOf(lines[0]),lines[1]);
}
SortedSet<Integer> keys = new TreeSet<Integer>(map.keySet());
keys.forEach(k -> System.out.println(map.get(k).toString() + " value " + k ));
Use Arrays.sort(arr, comparator) with a custom comparator:
Arrays.sort(theArray, new Comparator<String[]>(){
#Override
public int compare(final String[] first, final String[] second){
// here you should usually check that first and second
// a) are not null and b) have at least two items
// updated after comments: comparing Double, not Strings
// makes more sense, thanks Bart Kiers
return Double.valueOf(second[1]).compareTo(
Double.valueOf(first[1])
);
}
});
System.out.println(Arrays.deepToString(theArray));
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am new to java
Can anyone help me with the code to tell how much 2 text files match with each other?
Suppose i have two Files 'a.txt' and 'b.txt'
then i need to know the percentage of match.
thanks
Read in the two files to two Strings str1, str2.
Iterate through each, counting matching chars. Divide number of matches by number of compares, and multiply by 100 to get a percentage.
Scanner sca = new Scanner(new File ("a.txt"));
Scanner scb = new Scanner(new File ("b.txt"));
StringBuilder sba = new StringBuilder();
StringBuilder sbb = new StringBuilder();
while(sca.hasnext()){
sba.append(sca.next());
}
while(scb.hasnext()){
sbb.append(scb.next());
}
String a = sba.toString();
String b = sbb.toString();
int maxlen = Math.max(a.length,b.length);
int matches;
for(int i =0; i<maxlen; i++){
if(a.length <=i || b.length <=i){
break;
}
if(a.chatAt(i)==b.charAt(i)){
matches++;
}
return (((double)matches/(double)maxlen)*100.0)
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.StringTokenizer;
class File_meta_Data // class to store the metadata of file so that scoring can be done
{
String FileName;
long lineNumber;
long Position_In_Line;
long Position_In_Document;
File_meta_Data()
{
FileName = null;
lineNumber = 0;
Position_In_Line = 0;
Position_In_Document = 0;
}
}
public class bluestackv1 {
static int getNumberofInputFiles() // seeks number of resource files from user
{
System.out.println("enter the number of files");
Scanner scan = new Scanner(System.in);
return(scan.nextInt());
}
static List getFiles(int Number_of_input_files) // seeks full path of resource files from user
{
Scanner scan = new Scanner(System.in);
List filename = new ArrayList();
int i;
for(i=0;i<Number_of_input_files;i++)
{
System.out.println("enter the filename");
filename.add(scan.next());
}
return(filename);
}
static String getfile() // seeks the full pathname of the file which has to be matched with resource files
{
System.out.println("enter the name of file to be matched");
Scanner scan = new Scanner(System.in);
return(scan.next());
}
static Map MakeIndex(List filename) // output the index in the map.
{
BufferedReader reader = null; //buffered reader to read file
int count;
Map index = new HashMap();
for(count=0;count<filename.size();count++) // for all files mentioned in the resource list create index of its contents
{
try {
reader = new BufferedReader(new FileReader((String) filename.get(count)));
long lineNumber;
lineNumber=0;
int Count_of_words_in_document;
Count_of_words_in_document = 0;
String line = reader.readLine(); // data is read line by line
while(line!=null)
{
StringTokenizer tokens = new StringTokenizer(line, " ");// here the delimiter is <space> bt it can be changed to <\n>,<\t>,<\r> etc depending on problem statement
lineNumber++;
long Count_of_words_in_line;
Count_of_words_in_line = 0;
while(tokens.hasMoreTokens())
{
List<File_meta_Data> temp = new ArrayList<File_meta_Data>();
String word = tokens.nextToken();
File_meta_Data metadata = new File_meta_Data();
Count_of_words_in_document++; // contains the word number in the document
Count_of_words_in_line++; // contains the word number in line. used for scoring
metadata.FileName = filename.get(count).toString();
metadata.lineNumber = lineNumber;
metadata.Position_In_Document = Count_of_words_in_document;
metadata.Position_In_Line = Count_of_words_in_line;
int occurence;
occurence=0;
if(index.containsKey(word)) //if the word has occured already then update the new entry which concatenates the older and new entries
{
Map temp7 = new HashMap();
temp7 = (Map) index.get(word);
if(temp7.containsKey(metadata.FileName)) // entry of child Map is changed
{
List<File_meta_Data> temp8 = new ArrayList<File_meta_Data>();
temp8 = (List<File_meta_Data>)temp7.get(metadata.FileName); //outputs fioles which contain the word along with its location
temp7.remove(metadata.FileName);
temp8.add(metadata);
temp7.put(metadata.FileName, temp8); // updated entry is added
}
else // if the word has occured for the first time and no entry is in the hashMap
{
temp.add(metadata);
temp7.put(metadata.FileName, temp);
temp=null;
}
Map temp9 = new HashMap();
temp9 = (Map) index.get(word);
index.remove(word);
temp9.putAll(temp7);
index.put(word, temp9);
}
else // similarly is done for parent map also
{
Map temp6 = new HashMap();
temp.add(metadata);
temp6.put(metadata.FileName, temp);
index.put(word,temp6);
}
}
line = reader.readLine();
}
index.put("#words_in_file:"+(String)filename.get(count),Count_of_words_in_document);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
return(index);
}
static String search(Map index,List filename) throws IOException //scores each resource file by comparing with each word in input file
{
double[] overlap = new double[filename.size()]; //stores overlap/coord scores
double[] sigma = new double[filename.size()]; // stores ∑t in q ( tf(t in d) · idf(t)^2 for each resource file
int i;
double max, maxid; // stores file info with max score
max=0;
maxid= -1;
for(i=0;i<filename.size();i++)
{
overlap[i] = 0;
sigma[i] = 0;
}
String bestfile = new String();
double maxscore;
maxscore = -1;
double total;
double cord;
total=0;
File File_to_be_matched = new File(getfile());
BufferedReader reader = new BufferedReader(new FileReader(File_to_be_matched));
String line = reader.readLine();
while(line!=null) //similar to index function
{
StringTokenizer tokens = new StringTokenizer(line, " ");
while(tokens.hasMoreTokens())
{
String word = tokens.nextToken();
double tf,idf;
tf = 0;
idf = 0;
total=total+1;
if(index.containsKey(word))
{
Map temp = new HashMap();
for(i=0;i<filename.size();i++) // for each file a score is calculated for corresponding word which afterwards added
{
int j,count,docFreq;
count=0;
docFreq=0;
temp = (Map) index.get(word);
if(temp.containsKey(filename.get(i)))
{
List l2= (List) temp.get(filename.get(i));
tf = (int) Math.pow((long) l2.size(),0.5); //calculate the term frequency
docFreq = temp.size(); // tells in how many files the word occurs in the file
overlap[i]++;
}
else
{
tf=0;
}
idf = (int) (1 + Math.log((long)(filename.size())/(1+docFreq)));// more the occurence higher similarity of file
sigma[i] = sigma[i] + (int)(Math.pow((long)idf,2) * tf);
}
}
}
line = reader.readLine();
}
double subsetRatio;
for(i=0;i<filename.size();i++) // all scores are added
{
int x = (int)index.get("#words_in_file:"+(String)filename.get(i));
subsetRatio = overlap[i]/x;
overlap[i] = overlap[i]/total;
overlap[i] = overlap[i] * sigma[i];
overlap[i] = overlap[i] * subsetRatio; // files which are subset of some have higher priority
if(max<overlap[i]) // maximum score is calculated
{
max=overlap[i];
maxid = i;
}
}
if(maxid!=-1)
return (String) (filename.get((int) maxid));
else
return("error: Matching does not took place");
}
public static void main(String[] args) throws IOException
{
List filename = new ArrayList();
int Number_of_input_files = getNumberofInputFiles();
filename = getFiles(Number_of_input_files);
Map index = new HashMap();
index = MakeIndex(filename);
//match(index);
while(1==1) //infinite loop
{
String Most_similar_file = search(index,filename);
System.out.println("the most similar file is : "+Most_similar_file);
}
}
}
The problem is to find the most similar file among several resource files.
there are 2 sub-problems to this question
first, as the question states, how to find the most similar file which is done by associating each file with a score by considering different aspects of the content of files
second, to parse each and every word of the input file with a comparatively large resource files
to solve the second problem, Reverse Indexing has been used with HashMaps in java. Since our problem was simple and not modifying i used Inherited Maps instead of Comparator based MapReduce
while searching computing complexity = o(RESOURCEFILES * TOTAL_WORDS_IN _INPUTFILE)
the first problem has been solved by following formula
score(q,d) = coord(q,d) • ∑t in q ( tf(t in d) • idf(t)^2) . subsetRatio
1) coord(q,d) = overlap / maxOverlap
Implication: of the terms in the query, a document that contains more terms will have a higher score
Rational : Score factor based on how many of the query terms are found in the specified document
2) tf(t in d) = sqrt(freq)
Term frequency factor for the term (t) in the document (d).
Implication: the more frequent a term occurs in a document, the greater its score
Rationale: documents which contains more of a term are generally more relevant
3) idf(t) = log(numDocs/(docFreq+1)) + 1 I
implication: the greater the occurrence of a term in different documents, the lower its score
Rational : common terms are less important than uncommon ones
4) SubsetRation = number of occuring words / total words
implication : suppose 2 files, both superlative of input file then file with lesser excessive data will have hiegher similarity
Rational : files with similar content must have higher priority
****************test cases************************
1) input file has no similar word than the resource files
2) input file is similar in content to any one of the file
3) input file is similar in content but different in metadata(meaning position of words is not similar)
4) input file is a subset of resource files
5) input file contains very common words like all 'a' or 'and'
6) input file is not at the location
7) input file cannot be read
Look into opening files, reading them as characters. You actually just need to get a char from each, then check if they match. If they match, then increment the total counter and the match counter. If they don't, only the total counter.
Read more on handling files and streams here: http://docs.oracle.com/javase/tutorial/essential/io/charstreams.html
An example would be this:
BufferedReader br1 = null;
BufferedReader br2 = null;
try
{
br1 = new BufferedReader(new InputStreamReader(new FileInputStream(new File("a.txt")), "UTF-8"));
br2 = new BufferedReader(new InputStreamReader(new FileInputStream(new File("b.txt")), "UTF-8"));
//add logic here
}
catch (Exception e)
{
e.printStackTrace();
}
finally
{
if (br1 != null)
{
try
{
br1.close();
}
catch (Exception e)
{
}
}
if (br2 != null)
{
try
{
br2.close();
}
catch (Exception e)
{
}
}
}