I have developed a Java program that takes a text file as input, selects the duplicate words, and gives output by creating a new text file containing the duplicate words. Now I need it to select triple duplicated words, but I cannot get it right. Below is my Java code:
import java.util.*;
import java.io.*;

public class CheckDuplicate {
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        FileReader file1 = new FileReader("/home/goutam/workspace/DuplicateWord/clean_2014.txt");
        BufferedReader reader1 = new BufferedReader(file1);
        File f = new File("Reduplication.txt");
        FileWriter fw = new FileWriter(f);
        String line = reader1.readLine();
        while (line != null) {
            String[] arr = line.split(" ");
            if (arr.length > 1) {
                for (int i = 0; i < arr.length; i++) {
                    if (i < arr.length - 1) {
                        int cmp = arr[i].compareTo(arr[i + 1]);
                        if (cmp == 0) {
                            fw.write(arr[i].toString());
                            fw.write("\n");
                        }
                    }
                }
            }
            line = reader1.readLine();
        }
        reader1.close();
        file1.close();
    }
}
Your code doesn't work because you're only comparing adjacent elements.
Instead of nested loops, you can achieve what you want easily using a Map with the word as the key and an integer that indicates the count as the value.
When you first encounter a string, you insert it with a value of 1.
When you encounter a string that's already in the map, you simply increment its value.
Then you can iterate over the entries and pick the keys whose value is at least what you want.
I highly recommend using the debugger; it helps you better understand the flow of your program.
This should do the job. Note: I did not compile or test it, but it should at least give you some direction.
public void findRepeatingWords( int atLeastNRepetitions ) throws IOException {
    // Needs: java.io.*, java.util.*, java.util.concurrent.atomic.AtomicInteger
    try ( BufferedReader reader1 = new BufferedReader( new FileReader("/home/goutam/workspace/DuplicateWord/clean_2014.txt") ) ) {
        // There are libraries that can do this, but yeah... doing it old style here
        // Note that usage of AtomicInteger is just a convenience so that we can reduce some lines of code, not used for atomic operations
        Map<String, AtomicInteger> m = new LinkedHashMap<String, AtomicInteger>() {
            @Override
            public AtomicInteger get( Object key ) {
                AtomicInteger cnt = super.get( key );
                if ( cnt == null ) {
                    cnt = new AtomicInteger( 0 );
                    super.put( key, cnt );
                }
                return cnt;
            }
        };
        String line = reader1.readLine();
        while ( line != null ) {
            // Note we use \\W here, which means a non-word character (e.g. spaces, tabs, punctuation, ...)
            String[] arr = line.split( "\\W" );
            for ( String word : arr ) {
                m.get( word ).incrementAndGet();
            }
            line = reader1.readLine();
        }
        // Emit the words once the whole file has been counted
        writeRepeatedWords( atLeastNRepetitions, m );
    }
}
private void writeRepeatedWords( int atLeastNRepetitions, Map<String, AtomicInteger> m ) throws IOException {
File f = new File( "Reduplication.txt" );
try ( PrintWriter pw = new PrintWriter( new FileWriter( f ) ) ) {
for ( Map.Entry<String, AtomicInteger> entry : m.entrySet() ) {
if ( entry.getValue().get() >= atLeastNRepetitions ) {
pw.println( entry.getKey() );
}
}
}
}
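For reference, on Java 8+ the same counting idea can be expressed more compactly with Map#merge; this is a sketch with inline sample input, not the exact code above:

Map<String, Integer> counts = new LinkedHashMap<>();
for (String word : "a b a c a b".split("\\W+")) {
    counts.merge(word, 1, Integer::sum); // inserts 1, or adds 1 to the old count
}
// counts is now {a=3, b=2, c=1}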
Here is the thing you are searching for. I have implemented it using a LinkedHashMap, and it's dynamic code: you can select not only double or triple occurrences but any number of repetitions.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map.Entry;

public class A3 {
    public static void main(String[] args) throws IOException {
        BufferedReader reader1 = new BufferedReader(new java.io.FileReader(
                "src/Source/A3_data"));
        PrintWriter duplicatewriter = new PrintWriter(
                "src/Source/A3_out_double", "UTF-8");
        PrintWriter tripleduplicatewriter = new PrintWriter(
                "src/Source/A3_out_tripple", "UTF-8");
        LinkedHashMap<String, Integer> map = new LinkedHashMap<>();
        String line = reader1.readLine();
        while (line != null) {
            String[] words = line.split(" ");
            int count = 0;
            while (count < words.length) {
                String key = words[count];
                Integer value = map.getOrDefault(key, 0) + 1;
                map.put(key, value);
                count++;
            }
            line = reader1.readLine();
        }
        reader1.close();
        for (Entry<String, Integer> entry : map.entrySet()) {
            if (entry.getValue() == 2)
                duplicatewriter.println(entry.getKey());
            if (entry.getValue() == 3)
                tripleduplicatewriter.println(entry.getKey());
        }
        duplicatewriter.close();
        tripleduplicatewriter.close();
    }
}
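One note on the threshold checks: entry.getValue() == 2 matches words that occur exactly twice (and == 3 exactly three times); if you want "at least n times", compare with >= n instead.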
Since you want the items to appear 3 times in a row, I modified my code to achieve your goal:
public static void main(String[] args) throws Exception {
FileReader file1 = new FileReader("/home/goutam/workspace/DuplicateWord/clean_2014.txt");
BufferedReader reader1 = new BufferedReader(file1);
File f = new File("Reduplication.txt");
FileWriter fw = new FileWriter(f);
String line = reader1.readLine();
while (line != null) {
String[] arr = line.split(" ");
if (arr.length > 1) {
for (int i = 0; i < arr.length; i++) {
if (i < arr.length - 2) { // change from length-1 to length-2
int cmp = arr[i].compareTo(arr[i + 1]);
if (cmp == 0) {
if (arr[i + 1].equals(arr[i + 2])) { // keep comparing the next 2 items
System.out.println(arr[i].toString() + "\n");
fw.write(arr[i].toString());
fw.write("\n");
}
}
}
}
}
line = reader1.readLine();
}
fw.close(); // flush the buffered output, otherwise the file may stay empty
reader1.close();
file1.close();
}
Try this: the code prints a word if its count is at least 3, but you can use any number.
public static void getStringTripple(String a){
String s[]=a.split(" ");
List<String> asList = Arrays.asList(s);
Set<String> mySet = new HashSet<String>(asList);
for(String ss: mySet){
if(Collections.frequency(asList,ss)>=3)
System.out.println(ss + " " +Collections.frequency(asList,ss));
}
}
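For example, a hypothetical call (assuming the method sits in a class with the java.util imports it needs):

getStringTripple("go stop go wait go stop");
// prints: go 3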
I have two text files. I have to develop a Java program which compares the two files and finds the unique words. I have tried a few methods but they didn't work. Example:
test1.txt:
I am a robot. My name is Sofia.
test2.txt:
Hello I am a man. My name is Alex
Output:
Hello robot man Sofia Alex
My approach was like this:
import java.io.*;
import java.util.*;
public class Main {
public static void main(String[] args)
throws FileNotFoundException {
Scanner input = new Scanner(new File("test1.txt"));
Scanner scan = new Scanner(new File("test2.txt"));
ArrayList<String> al = new ArrayList<String>();
ArrayList<String> a2 = new ArrayList<String>();
String test;
while (input.hasNext()) {
String next = input.next();
}
System.out.println("arraylist" + al);
while (scan.hasNext()) {
test = scan.next();
a2.add(test);
}
System.out.println("arraylist2" + a2);
for (int i = 0; i < al.size(); i++) {
    for (int j = 0; j < a2.size(); j++) {
        if (al.get(i).equals(a2.get(j))) {
            break;
        } else {
            System.out.println(al.get(i));
            break;
        }
    }
}
}
}
Note that this is a quick and dirty approach and pretty inefficient. Furthermore, I don't know your exact requirements (full stops? upper/lower case?).
Also take into account that this program doesn't check which list is longer. But this should give you a good hint:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
Scanner input = new Scanner(new File("test1.txt"));
Scanner scan = new Scanner(new File("test2.txt"));
ArrayList<String> list1 = new ArrayList<String>();
ArrayList<String> list2 = new ArrayList<String>();
while (input.hasNext()) {
list1.add(input.next());
}
while (scan.hasNext()) {
list2.add(scan.next());
}
// iterate over list 1
for (int i = list1.size() - 1; i >= 0; i--) {
// if there is a occurence of two identical strings
if (list2.contains(list1.get(i))) {
// remove the String from list 2
list2.remove(list2.indexOf(list1.get(i)));
// remove the String from list 1
list1.remove(i);
}
}
// merge the lists
list1.addAll(list2);
// remove full stops
for (int i = 0; i < list1.size(); i++) {
list1.set(i, list1.get(i).replace(".", ""));
}
System.out.println("Unique Values: " + list1);
}
}
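For reference, run against the example files above this should print: Unique Values: [robot, Sofia, Hello, man, Alex].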
The assumption is that the text files contain only '.' as a sentence terminator.
public static void main(String[] args) throws Exception
{
// Skipping reading from file and storing in string
String stringFromFileOne = "I am a robot. My name is Sofia.";
String stringFromFileTwo = "Hello I am a man. My name is Alex";
Set<String> set1 = Arrays.asList(stringFromFileOne.split(" "))
.stream()
.map(s -> s.toLowerCase())
.map(m -> m.contains(".") ? m.replace(".", "") : m)
.sorted()
.collect(Collectors.toSet());
Set<String> set2 = Arrays.asList(stringFromFileTwo.split(" "))
.stream()
.map(s -> s.toLowerCase())
.map(m -> m.contains(".") ? m.replace(".", "") : m)
.sorted()
.collect(Collectors.toSet());
List<String> uniqueWords;
if (set1.size() > set2.size()) {
uniqueWords = getUniqueWords(set2, set1);
} else {
uniqueWords = getUniqueWords(set1, set2);
}
System.out.println("uniqueWords:" + uniqueWords);
}
private static List<String> getUniqueWords(Set<String> removeFromSet, Set<String> iterateOverSet) {
List<String> uniqueWords;
Set<String> tempSet = new HashSet<String>(removeFromSet);
removeFromSet.removeAll(iterateOverSet);
uniqueWords = iterateOverSet.stream().filter(f -> !tempSet.contains(f) && !f.isEmpty())
.collect(Collectors.toList());
uniqueWords.addAll(removeFromSet);
return uniqueWords;
}
You can use the Guava library, which gives you the difference between two sets.
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;
import com.google.common.collect.Sets;
public class WordTest {
public static void main(String[] args) {
WordTest wordTest = new WordTest();
Set<String> firstFileWords = wordTest.getAllWords("E:\\testing1.txt");
Set<String> secondFileWords = wordTest.getAllWords("E:\\testing2.txt");
Set<String> diff = Sets.difference(firstFileWords, secondFileWords);
Set<String> diff2 = Sets.difference(secondFileWords, firstFileWords);
System.out.println("Set 1: " + firstFileWords);
System.out.println("Set 2: " + secondFileWords);
System.out.println("Difference between " + "Set 1 and Set 2: " + diff);
System.out.println("Difference between " + "Set 2 and Set 1: " + diff2);
}
public Set<String> getAllWords(String path) {
FileInputStream fis = null;
DataInputStream dis = null;
BufferedReader br = null;
Set<String> wordList = new HashSet<>();
try {
fis = new FileInputStream(path);
dis = new DataInputStream(fis);
br = new BufferedReader(new InputStreamReader(dis));
String line = null;
while ((line = br.readLine()) != null) {
StringTokenizer st = new StringTokenizer(line, " ,.;:\"");
while (st.hasMoreTokens()) {
wordList.add(st.nextToken());
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)
br.close();
} catch (Exception ex) {
}
}
return wordList;
}
}
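If you want a single set of words unique to either file (which is what the expected output above asks for), Guava also provides Sets.symmetricDifference:

Set<String> unique = Sets.symmetricDifference(firstFileWords, secondFileWords);
System.out.println("Unique to either file: " + unique);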
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
public class FileComparision {
public static void main(String[] args) throws IOException {
HashSet<String> uniqueSet=new HashSet<String>();
//split the lines based on the delimiter and add it to set
BufferedReader reader=new BufferedReader(new FileReader("test1.txt"));
String line;
while ((line = reader.readLine()) != null) {
Arrays.asList(line.split(" ")).forEach(word -> uniqueSet.add(word));
}
reader.close();
reader=new BufferedReader(new FileReader("test2.txt"));
while ((line = reader.readLine()) != null) {
Arrays.asList(line.split(" ")).forEach(word->{
if(!uniqueSet.contains(word)) {
uniqueSet.add(word) ;
}else {
uniqueSet.remove(word);
}
});
}
reader.close();
//to remove unnecessary characters
//uniqueSet.remove(".");
System.out.println(uniqueSet);
}
}
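Note the toggle behaviour in the second loop: every repeat of a word in test2.txt flips it in and out of uniqueSet, so the result is only reliable when each word occurs at most once per file.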
public static String readFile(String fileName)throws Exception
{
String data = "";
data = new String(Files.readAllBytes(Paths.get(fileName)));
return data;
}
public static void main(String[] args) throws Exception
{
String data = readFile("C:\\Users\\pb\\Desktop\\text1.txt");
String data1 = readFile("C:\\Users\\pb\\Desktop\\text2.txt");
String array[]=data.split(" ");
String array1[]=data1.split(" ");
for(int i=0;i<=array1.length-1;i++){
if(data.contains(array1[i])){
}else{
System.out.println(array1[i]);
}
}
for(int i=0;i<=array.length-1;i++){
if(data1.contains(array[i])){
}else{
System.out.println(array[i]);
}
}
}
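One caveat with this approach: data.contains(array1[i]) is a substring test, so "rob" would count as present in "robot". For exact word matching, check membership in a collection built from the split array instead, e.g. Arrays.asList(array).contains(array1[i]).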
My program opens a file and then saves its words and their byte distance from the file beginning. But the file has too many duplicate words that I don't want, and I also want my list to be in alphabetical order. The problem is that when I fix the order the duplicates get messed up, and vice versa. Here is my code:
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;
class MyMain {
public static void main(String[] args) throws IOException {
ArrayList<DictPage> listOfWords = new ArrayList<DictPage>();
LinkedList<Page> Eurethrio = new LinkedList<Page>();
File file = new File("C:\\Kennedy.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
//This will reference one line at a time...
String line = null;
int line_count=0;
int byte_count;
int total_byte_count=0;
int fromIndex;
int kat = 0;
while( (line = br.readLine())!= null ){
line_count++;
fromIndex=0;
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
String line_rest=line;
for (int i=1; i <= tokens.length; i++) {
byte_count = line_rest.indexOf(tokens[i-1]);
//if ( tokens[i-1].length() != 0)
//System.out.println("\n(line:" + line_count + ", word:" + i + ", start_byte:" + (total_byte_count + fromIndex) + "' word_length:" + tokens[i-1].length() + ") = " + tokens[i-1]);
fromIndex = fromIndex + byte_count + 1 + tokens[i-1].length();
if (fromIndex < line.length())
line_rest = line.substring(fromIndex);
if(!listOfWords.contains(tokens[i-1])){ // so that the same word is not stored twice
//listOfWords.add(tokens[i-1]);
listOfWords.add(new DictPage(tokens[i-1],kat));
kat++;
}
Eurethrio.add(new Page("Kennedy",fromIndex));
}
total_byte_count += fromIndex;
Eurethrio.add(new Page("Kennedy", total_byte_count));
}
Set<DictPage> hs = new HashSet<DictPage>();
hs.addAll(listOfWords);
listOfWords.clear();
listOfWords.addAll(hs);
if (listOfWords.size() > 0) {
Collections.sort(listOfWords, new Comparator<DictPage>() {
@Override
public int compare(final DictPage object1, final DictPage object2) {
return object1.getWord().compareTo(object2.getWord());
}
} );
}
// Print the words...
for (int i = 0; i<listOfWords.size();i++){
System.out.println(""+listOfWords.get(i).getWord()+" "+listOfWords.get(i).getPage());
}
for (int i = 0;i<Eurethrio.size();i++){
System.out.println(""+Eurethrio.get(i).getFile()+" "+Eurethrio.get(i).getBytes());
}
}
}
Use a TreeSet instead of the ArrayList and you'll automatically get ordering and no repetitions (note that DictPage must implement Comparable, or the TreeSet needs a Comparator).
In the first place, why are you using an ArrayList to store your list of words?
ArrayList<DictPage> listOfWords = new ArrayList<DictPage>();
You should use a Set (like HashSet, TreeSet or some other implementation of Set) to store your words if you don't want duplicates.
Set<DictPage> listOfWords = new HashSet<DictPage>(); // no duplicates, but not sorted
Or
Set<DictPage> listOfWords = new TreeSet<DictPage>(); // no duplicates, and sorted as well
This would make sure that your list of words does not contain any duplicates.
And if you want them sorted straight away, you can use TreeSet, which will make it easier for you.
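Since DictPage is a custom class, a plain TreeSet won't know how to order it: you must either make DictPage implement Comparable or hand the TreeSet a Comparator. A minimal sketch (Java 8+), assuming DictPage exposes the getWord() accessor used in the question:

Set<DictPage> listOfWords = new TreeSet<>(Comparator.comparing(DictPage::getWord));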
Use this:
public void stripDuplicatesFromFile(String filename) {
try {
BufferedReader reader = new BufferedReader(new FileReader(filename));
Set<String> lines = new HashSet<String>();
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
reader.close();
BufferedWriter writer = new BufferedWriter(new FileWriter(filename));
for (String unique : lines) {
writer.write(unique);
writer.newLine();
}
writer.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
It takes a file path as input, finds duplicate lines and removes them. But if you have a large file, do not use this. I'm using this method on a very small .txt file (a kind of log file, where order is not important).
I have got two text files with data in the following format
data.txt file as following format
A 10
B 20
C 15
data1.txt file is in format (start node,end node, distance):
A B 5
A C 10
B C 20
I am trying to implement a search strategy; for that I need to load the data from data.txt and ONLY the start node and end node from data1.txt (i.e. I don't need the distance). I need to store this information in a stack, as I think that would be the best data structure for implementing greedy search.
Actually I am not sure how to get started with file I/O to read these files and store them in an array to implement greedy search. So I would highly appreciate any starting ideas on how to proceed.
I am new to this, so please bear with me. Any help is much appreciated. Thanks.
EDIT:
Here is what I have got till now
String heuristic_file = "data.txt";
try
{
FileReader inputHeuristic = new FileReader(heuristic_file);
BufferedReader bufferReader = new BufferedReader(inputHeuristic);
String line;
while ((line = bufferReader.readLine()) != null)
{
System.out.println(line);
}
bufferReader.close();
} catch(Exception e) {
System.out.println("Error reading file " + e.getMessage());
}
My approach doesn't differ fundamentally from the others. Please note the try/catch/finally blocks: always put the closing statements into the finally block, so the opened file is guaranteed to be closed, even if an exception is thrown while reading the file.
The part between the two //[...] markers could surely be done more efficiently. Maybe by reading the whole file in one go and then parsing the text backwards, searching for a line break? Maybe some Stream API supports setting the reading position; I honestly don't know, as I haven't needed that up to now.
I chose the verbose initialization of the BufferedReader because you can then specify the expected encoding of the file. In your case it doesn't matter, since your files contain no symbols outside the standard ASCII range, but I believe it's a semi-best-practice.
Before you ask: r.close() takes care of closing the underlying InputStreamReader and FileInputStream in the right order, until all readers and streams are closed.
public static void readDataFile(String dir, String file1, String file2)
throws IOException
{
File datafile1 = new File(dir, file1);
File datafile2 = new File(dir, file2);
if (datafile1.exists())
{
BufferedReader r = null;
try
{
r = new BufferedReader(
new InputStreamReader(
new FileInputStream(datafile1),
"UTF-8"
)
);
String row;
Stack<Object[]> s = new Stack<Object[]>();
String[] pair;
Integer datapoint;
while((row = r.readLine()) != null)
{
if (row != null && row.trim().length() > 0)
{
// You could use " " instead of "\\s"
// but the latter regular expression
// shorthand-character-class will
// split the row on tab-symbols, too
pair = row.split("\\s");
if (pair != null && pair.length == 2)
{
datapoint = null;
try
{
datapoint = Integer.parseInt(pair[1], 10);
}
catch(NumberFormatException f) { }
// Later you can validate datapairs
// by using
// if (s.pop()[1] != null)
s.add(new Object[] { pair[0], datapoint});
}
}
}
}
catch (UnsupportedEncodingException e1) { }
catch (FileNotFoundException e2) { }
catch (IOException e3) { }
finally
{
if (r != null) r.close();
}
}
// Do something similar with datafile2
if (datafile2.exists())
{
// [...do the same as in the first try/catch block...]
String firstrow = null, lastrow = null;
String row = null;
int i = 0;
do
{
lastrow = row;
row = r.readLine();
if (i == 0)
firstrow = row;
i++;
} while(row != null);
// [...parse firstrow and lastrow into a datastructure...]
}
}
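On the open question above about setting the reading position: java.io.RandomAccessFile does support it via seek(), so the last rows can be read without walking the whole file. A rough sketch, assuming no line is longer than the 4096-byte tail window (an arbitrary choice for this sketch):

try (RandomAccessFile raf = new RandomAccessFile(datafile2, "r")) {
    // Jump near the end of the file and scan the remaining lines.
    raf.seek(Math.max(0, raf.length() - 4096));
    String row, lastrow = null;
    while ((row = raf.readLine()) != null) {
        lastrow = row;
    }
    // lastrow now holds the file's final line.
}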
Use split:
while ((line = bufferReader.readLine()) != null)
{
String[] tokens = line.split(" ");
System.out.println(line + " -> [" + tokens[0] + "]" + "[" + tokens[1] + "][" + tokens[2] + "]");
}
If you must have this in an array, you can use the following:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
public class NodeTest {
public static void main(String[] args) throws ParseException {
try {
File first = new File("data.txt");
File second = new File("data1.txt");
Node[] nodes1 = getNodes(first);
Node[] nodes2 = getNodes(second);
print(nodes1);
print(nodes2);
}
catch(Exception e) {
System.out.println("Error reading file " + e.getMessage());
}
}
public static final void print(Node[] nodes) {
System.out.println("======================");
for(Node node : nodes) {
System.out.println(node);
}
System.out.println("======================");
}
public static final Node[] getNodes(File file) throws IOException {
FileReader inputHeuristic = new FileReader(file);
BufferedReader bufferReader = new BufferedReader(inputHeuristic);
String line;
List<Node> list = new ArrayList<Node>();
while ((line = bufferReader.readLine()) != null) {
String[] tokens = line.split(" ");
list.add(new Node(tokens[0], tokens[1]));
}
bufferReader.close();
return list.toArray(new Node[list.size()]);
}
}
class Node {
String start;
String end;
public Node(String start, String end){
this.start = start;
this.end = end;
}
public String toString() {
return "[" + start + "][" + end + "]";
}
}
Something like this?
HashSet<String> nodes = new HashSet<String>();
try(BufferedReader br = new BufferedReader(new FileReader("data.txt"))) {
String line = br.readLine();
while (line != null) {
String[] l = line.split(" ");
nodes.add(l[0]);
line = br.readLine();
}
}
try(BufferedReader br = new BufferedReader(new FileReader("data1.txt"))) {
String line = br.readLine();
while (line != null) {
String[] l = line.split(" ");
if (nodes.contains(l[0]) || nodes.contains(l[1])) {
    // Do whatever you want ...
}
line = br.readLine(); // read the next line unconditionally, or the loop never advances
}
}
I have written Java code to count the occurrences of words. It uses two .txt files as input and gives words and frequencies as output.
I would also like to print, which file how many times contains a given word. Do you have any idea how to do this?
public class JavaApplication2
{
public static void main(String[] args) throws IOException
{
Path filePath1 = Paths.get("test.txt");
Path filePath2 = Paths.get("test2.txt");
Scanner readerL = new Scanner(filePath1);
Scanner readerR = new Scanner(filePath2);
String line1 = readerL.nextLine();
String line2 = readerR.nextLine();
String text = new String();
text=text.concat(line1).concat(line2);
String[] keys = text.split("[!.?:;\\s]");
String[] uniqueKeys;
int count = 0;
System.out.println(text);
uniqueKeys = getUniqueKeys(keys);
for(String key: uniqueKeys)
{
if(null == key)
{
break;
}
for(String s : keys)
{
if(key.equals(s))
{
count++;
}
}
System.out.println("["+key+"] frequency : "+count);
count=0;
}
}
private static String[] getUniqueKeys(String[] keys)
{
String[] uniqueKeys = new String[keys.length];
uniqueKeys[0] = keys[0];
int uniqueKeyIndex = 1;
boolean keyAlreadyExists = false;
for(int i=1; i<keys.length ; i++)
{
for(int j=0; j<=uniqueKeyIndex; j++)
{
if(keys[i].equals(uniqueKeys[j]))
{
keyAlreadyExists = true;
}
}
if(!keyAlreadyExists)
{
uniqueKeys[uniqueKeyIndex] = keys[i];
uniqueKeyIndex++;
}
keyAlreadyExists = false;
}
return uniqueKeys;
}
}
Firstly, instead of using an array for unique keys, use a HashMap<String, Integer>. It's a lot more efficient.
Your best option is to run your processing over each line/file separately, and store these counts separately. Then merge the two counts to get the overall frequencies.
More Detail:
String[] keys = text.split("[!.?:;\\s]");
HashMap<String, Integer> uniqueKeys = new HashMap<>();
for (String key : keys) {
    if (uniqueKeys.containsKey(key)) {
        // if the key is already in the map, increment its count
        uniqueKeys.put(key, uniqueKeys.get(key) + 1);
    } else {
        // if it isn't in it, add it
        uniqueKeys.put(key, 1);
    }
}
// You now have the count of all unique keys in a given text.
// To print them to the console:
for (Entry<String, Integer> keyCount : uniqueKeys.entrySet()) {
    System.out.println(keyCount.getKey() + ": " + keyCount.getValue());
}
// To merge, if you're using Java 8
for (Entry<String, Integer> keyEntry : uniqueKeys1.entrySet()) {
    uniqueKeys2.merge(keyEntry.getKey(), keyEntry.getValue(), Integer::sum);
}
// To merge, otherwise
for (Entry<String, Integer> keyEntry : uniqueKeys1.entrySet()) {
    if (uniqueKeys2.containsKey(keyEntry.getKey())) {
        uniqueKeys2.put(keyEntry.getKey(),
                uniqueKeys2.get(keyEntry.getKey()) + keyEntry.getValue());
    } else {
        uniqueKeys2.put(keyEntry.getKey(), keyEntry.getValue());
    }
}
UPDATE: code for word occurrences (thanks @George)
This example is for a file, you can use it for multiple files :
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MyTest {
Map<String,Integer> mapTable;
public MyTest(List<String> wordList){
//initialize map
makeMap(wordList);
}
public void makeMap(List<String> wordList){
mapTable = new HashMap<>();
for(int i = 0; i < wordList.size(); i++){
//fill the map up
mapTable.put(wordList.get(i), 0);
}
}
//update occurences in a map
public void updateMap(String [] _words){
for(int i = 0; i < _words.length; i++){
updateWordCount(_words[i]);
}
}
public void updateWordCount(String _word){
int value = 0;
//check if a word present
if(mapTable.containsKey(_word)){
value = mapTable.get(_word);
value++;
mapTable.put(_word, value);
}
}
public void DisplayCounts(){
for( String key : mapTable.keySet()){
System.out.println("Word : "+key+"\t Occurrence(s) :"+mapTable.get(key));
}
}
public void getWordCount(){
String filePath = "C:\\Users\\Jyo\\Desktop\\help.txt";
String line = "";
try {
// FileReader reads text files in the default encoding.
FileReader fileReader = new FileReader(filePath);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader = new BufferedReader(fileReader);
String _words[] = null;
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
_words = line.split(" ");
updateMap(_words);
}
// Always close files.
bufferedReader.close();
} catch (Exception e) {
System.out.println("Error :"+e.getMessage());
}
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
List<String> wordList = new ArrayList<>();
wordList.add("data");
wordList.add("select");
MyTest mt = new MyTest(wordList);
mt.getWordCount();
mt.DisplayCounts();
}
}
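Note that this counts only the words pre-registered in wordList ("data" and "select" in main); updateWordCount silently ignores every other word in the file.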
import java.io.*;
import java.util.*;
public class file1{
public static void main(String[] args) throws Exception{
HashMap<String,Integer> words_fre = new HashMap<String,Integer>();
HashSet<String> words = new HashSet<String>();
try{
File folder = new File("/home/jsrathore/Dropbox/Semester 6th/IR_Lab/lab_01/one");
File[] listOfFiles = folder.listFiles();
BufferedReader bufferedReader=null;
FileInputStream inputfilename=null;
BufferedWriter out= new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outfilename.txt",false), "UTF-8"));
for(File file : listOfFiles){
inputfilename= new FileInputStream(file);
/*System.out.println(file); */
bufferedReader= new BufferedReader(new InputStreamReader(inputfilename, "UTF-8"));
String s;
while((s = bufferedReader.readLine()) != null){
/*System.out.println(line);*/
s = s.replaceAll("\\<.*?>"," ");
if(s.contains("॥") || s.contains(":")|| s.contains("।")||
s.contains(",")|| s.contains("!")|| s.contains("?")){
s=s.replace("॥"," ");
s=s.replace(":"," ");
s=s.replace("।"," ");
s=s.replace(","," ");
s=s.replace("!"," ");
s=s.replace("?"," ");
}
StringTokenizer st = new StringTokenizer(s," ");
while (st.hasMoreTokens()) {
/*out.write(st.nextToken()+"\n");*/
String str=(st.nextToken()).toString();
words.add(str);
}
for(String str : words){
if(words_fre.containsKey(str)){
int a = words_fre.get(str);
words_fre.put(str,a+1);
}else{
words_fre.put(str,1);/*uwords++;//unique words count */
}
}
words.clear();
/*out.write("\n");
out.close();*/
}
Object[] key = words_fre.keySet().toArray();
Arrays.sort(key);
for (int i = 0; i < key.length; i++) {
//System.out.println(key[i]+"= "+words_fre.get(key[i]));
out.write(key[i]+" : "+words_fre.get(key[i]) +"\n");
}
}
out.close();
bufferedReader.close();
}catch(FileNotFoundException ex){
System.out.println("Error in reading line");
}catch(IOException ex){
/*System.out.println("Error in reading line"+fileReader );*/
ex.printStackTrace();
}
}
}
Late answer; however, the code below will count word frequencies efficiently when there are multiple files.
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
public class WordCounter implements Runnable {
private final Scanner scanner;
private Map<String, AtomicLong> sharedCounter;
public WordCounter(Scanner scanner, Map<String, AtomicLong> sharedCounter) {
this.scanner = scanner;
this.sharedCounter = sharedCounter;
}
public void run() {
if (scanner == null) {
return;
}
while (scanner.hasNext()) {
String word = scanner.next().toLowerCase();
sharedCounter.putIfAbsent(word, new AtomicLong(0));
sharedCounter.get(word).incrementAndGet();
}
}
public static void main(String[] args) throws IOException {
// Number of parallel threads to run
int THREAD_COUNT = 10;
List<Path> paths = new ArrayList<>();
// Add path
paths.add(Paths.get("test1.txt"));
paths.add(Paths.get("test2.txt"));
// Shared word counter
Map<String, AtomicLong> sharedCounter = new ConcurrentHashMap<>();
ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
for (Path path : paths) {
executor.execute(new WordCounter(new Scanner(path), sharedCounter));
}
executor.shutdown();
// Wait until all threads have finished
while (!executor.isTerminated()) {
}
System.out.println(sharedCounter);
}
}
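A small refinement: instead of busy-waiting on isTerminated(), the main thread can block until the pool drains with awaitTermination. A sketch (the one-hour cap is an arbitrary choice, and java.util.concurrent.TimeUnit must be imported):

executor.shutdown();
try {
    // Block until all WordCounter tasks finish, up to the arbitrary cap.
    if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
        executor.shutdownNow();
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
System.out.println(sharedCounter);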
So here I have code with a HashMap made up of the words in a file. I am adding words and writing them to the file and it works, but when I use my remove function, for some reason it doesn't do anything. Here is the code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
public class Main {
public static File file = new File( C:\\Users\\N\\Desktop\\Newfolder\\Dictionary\\src\\nmishewa\\geekycamp\\dictionary\\bg_win1251.txt");
public static int value = 1;
private static Scanner input;
public static Scanner in = new Scanner(System.in);
public static Map<String, Integer> map = new HashMap<String, Integer>();
public static void main(String[] args) throws FileNotFoundException {
readFile();
System.out.println("Enter number of function wanted" + "\n1 to add"
+ "\n2 for searching by prefix" + "\n3 for deleting");
int choice = in.nextInt();
if (choice == 1) {
System.out.println("enter words seprated by comma");
String wd = in.next();
add(wd);
}
if (choice == 2) {
System.out.println("Enter prefix");
String wd = in.next();
prefixSearch(wd);
}
if (choice == 3) {
System.out.println("ENTER word to delete");
String wd = in.next();
remove(wd);
}
}
public static void readFile() throws FileNotFoundException {
input = new Scanner(file);
boolean done = false;
int value = 1;
while (input.hasNext()) {
String word = input.next().toLowerCase();
String[] line = word.split("[,\\s]+");
for (int j = 0; j < line.length; j++) {
map.put(line[j], value);
value++;
done = true;
}
}
if (done == true) {
System.out.println("Succes");
}
}
public static void prefixSearch(String wd) {
System.out.println("Enter prefix");
String prefix = wd.toLowerCase();
for (Map.Entry<String, Integer> key : map.entrySet()) {
if (key.getKey().startsWith(prefix)) {
System.out.println(key.getKey());
}
}
}
public static void add(String wd) {
boolean done = false;
String word = wd.toLowerCase();
String[] line = word.split("[,\\s]+");
for (int j = 0; j < line.length; j++) {
if (!map.containsKey(line[j])) {
map.put(line[j], value);
value++;
try {
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(map.toString());
bw.close();
done = true;
} catch (Exception e) {
e.printStackTrace();
}
} else {
continue;
}
}
if (done == true) {
System.out.println("Success");
}
}
public static void remove(String wd) {
boolean done = false;
String word = wd.toLowerCase();
String[] line = word.split("[,\\s]+");
for (int j = 0; j < line.length; j++) {
for (Map.Entry<String, Integer> key : map.entrySet()) {
if (key.getKey().equals(line[j])) {
map.remove(key.getKey(), key.getValue());
try {
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(map.toString());
bw.close();
done = true;
} catch (Exception e) {
e.printStackTrace();
}
} else {
continue;
}
}
}
if (done == true) {
System.out.println("Succes");
}
}
}
Every other method is working just fine, but not remove. Is there something wrong with the loops? Maybe there is a more optimal way?
The reason for the failure is that you're trying to change the map while iterating over its entries. As with any collection, if you try to modify it while iterating over it you'll get a ConcurrentModificationException.
Further, there's a redundant inner for-loop (redundant because the whole purpose of a map is that you don't have to iterate over it when you're looking for specific values), which means you'll try to overwrite the file many times when once is sufficient.
public static void remove(String wd) {
boolean done = false;
String word = wd.toLowerCase();
String[] line = word.split("[,\\s]+");
for (int j = 0; j < line.length; j++) {
map.remove(line[j]);
}
try {
FileWriter fw = new FileWriter(file.getAbsoluteFile());
BufferedWriter bw = new BufferedWriter(fw);
bw.write(map.toString());
bw.close();
done = true;
} catch (Exception e) {
e.printStackTrace();
}
if (done == true) {
System.out.println("Success");
}
}
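As a side note, the removal loop can also be collapsed into one bulk call on the map's key view (using java.util.Arrays):

// Removes every listed word from the map in a single call.
map.keySet().removeAll(Arrays.asList(line));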
The issues that I can see in your code are the following:
You forgot a quote when defining the file:
public static File file = new File( C:\\Users\\N\\Desktop\\Newfolder\\Dictionary\\src\\nmishewa\\geekycamp\\dictionary\\bg_win1251.txt")
should be:
public static File file = new File("C:\\Users\\N\\Desktop\\Newfolder\\Dictionary\\src\\nmishewa\\geekycamp\\dictionary\\bg_win1251.txt");
The remove() method of a map takes just the key of the entry you want to remove (the two-argument form, available since Java 8, removes the entry only when the key is currently mapped to the given value), so:
map.remove(key.getKey(), key.getValue());
should be:
map.remove(key.getKey());
Also, since you're getting the entrySet of your map, you should consider renaming the key variable in the remove() function to entry.