Duplicate Encoder codewars java Junit exception - java

I am doing a kata on Codewars named "Duplicate Encoder".
The code I have written does its job correctly, but JUnit (4.12) insists it does not, both on the website and in my IDE (Eclipse). I have no idea why that is. Could someone shed some light on this issue? Thanks.
The class to be tested:
package com.danman;
import java.util.*;
public class Person {
static String encode(String word){
word = word.toLowerCase();
List<String> bank = new ArrayList<>();
StringBuilder wordTwo = new StringBuilder("");
//1: create a list of all unique elements in the string
for (int n = 0; n < word.length(); n++) {
String temp = word.substring(n, n+1);
if (temp.equals(" ")){continue;}
bank.add(temp);
}
for (int r = 0; r <word.length(); r++){
List<String> bankTwo = bank;
Iterator<String> it = bankTwo.iterator();
String tempTwo = word.substring(r, r+1);
int count = 0;
//2: iterate through the list of elements and append the appropriate token to the StringBuilder
while (it.hasNext()){
if (it.next().equals(tempTwo)){
++count;
}
}
if (count <= 1){
wordTwo.append("(");
} else {
wordTwo.append(")");
}
}
word = wordTwo.toString();
return word;
}
public static void main(String[] args) {
// encode is static, so call it on the class rather than on an instance
System.out.println(Person.encode("Prespecialized"));
}
}
JUnit file:
package com.danman;
import org.junit.Test;
import static org.junit.Assert.assertEquals;
public class PersonTest {
@Test
public void test() {
assertEquals(")()())()(()()(", Person.encode("Prespecialized"));
assertEquals("))))())))", Person.encode("   ()(   "));
}
}

As far as I understand your code, the first assert is fine. I don't see why encoding "   ()(   " should return "))))())))" with it. You build the bank list from the characters of the given string with spaces excluded, then check for each character of the word whether it occurs more than once in bank. For a space the answer is always no: count is 0 because spaces never made it into bank, so ( is appended.
With the code as written, the second assert would have to be
assertEquals("((()()(((", Person.encode("   ()(   "));
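If the kata expects spaces to be counted like any other character (which the expected value "))))())))" suggests, assuming the nine-character input with three spaces on each side of ()( ), one way to get there is to count every character first and encode afterwards. This is only a sketch: the class name DuplicateEncoderSketch is made up for illustration and this is not the kata's reference solution.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateEncoderSketch {
    // Encodes each character as "(" if it occurs once in the word and ")" otherwise.
    // Case is ignored; spaces are counted like any other character (assumption).
    static String encode(String word) {
        String lower = word.toLowerCase();
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : lower.toCharArray()) {
            counts.merge(c, 1, Integer::sum); // first pass: count occurrences
        }
        StringBuilder out = new StringBuilder(lower.length());
        for (char c : lower.toCharArray()) {
            out.append(counts.get(c) > 1 ? ')' : '('); // second pass: encode
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Prespecialized"));
        System.out.println(encode("   ()(   "));
    }
}
```

With this version both asserts from the question should pass unchanged, because the six spaces are now counted as duplicates.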

Maybe you need this
import java.util.ArrayList;
import java.util.Iterator;
public class DuplicateEncoder {
static String encode(String word) {
word=word.toUpperCase();
ArrayList<String> stack1 =new ArrayList<>();
StringBuilder stringBuilder = new StringBuilder();
for(int i=0;i<word.length();i++){
String t = word.substring(i,i+1);
stack1.add(t);
}
for(int i=0;i<word.length();i++){
Iterator<String> iterator =stack1.iterator();
String t = word.substring(i,i+1);
int count=0;
while(iterator.hasNext()){
if(iterator.next().equals(t)){
count++;
}
}
if(count>1){
stringBuilder.append(")");
}
else stringBuilder.append("(");
}
word=stringBuilder.toString();
return word;}
public static void main(String[] args) {
encode("Pup");
}
}

Related

How to check if String[] contains my element? [closed]

I've been asked to write code that adds elements to an array under a few conditions. I've searched all over Stack Overflow for how to find an element in an array, but everything I try gives me errors, so I'm guessing something is wrong with my code and I can't figure out what. Any help is appreciated.
public class WordList
{
String [] words;
int count = 0;
int max = 2;
WordList()
{
words = new String[max];
this.words = words;
this.count = count;
}
public static void main (String[] args)
{
WordList w1 = new WordList();
System.out.println(w1.addWord("Dog"));
System.out.println(w1.addWord("Cat"));
System.out.println(w1.addWord("Fish"));
}
public int addWord(String newWord)
{
for(int i = 0; i < words.length; i++)
{
if(words.contains(newWord) == false && words.length < max)
{
words[i] = newWord;
}
else if(words.contains(newWord) == false && words.length == max)
{
max *= 2;
words[i] = newWord;
}
count = i + 1;
}
return count;
}
I think you could use a Set instead of an array.
import java.util.HashSet;
import java.util.Set;
public class WordList {
private final Set<String> words = new HashSet<>();
public int addWord(String word) {
if (word != null)
words.add(word);
return words.size();
}
public static void main(String[] args) {
WordList w1 = new WordList();
System.out.println(w1.addWord("Dog"));
System.out.println(w1.addWord("Cat"));
System.out.println(w1.addWord("Fish"));
}
}
I think this is what you're trying to do. Plain arrays don't have an indexOf or contains method, so you need to wrap the array with Arrays.asList (make sure you import java.util.Arrays and java.util.List too).
public int addWord(String newWord)
{
List <String> myList = Arrays.asList(words);
for(int i = 0; i < words.length; i++)
{
if(myList.indexOf(newWord) == -1 && words.length < max)
{
words[i] = newWord;
}
else if(myList.indexOf(newWord) == -1 && words.length == max)
{
max *= 2;
words[i] = newWord;
}
count = i + 1;
}
return count;
}
A plain array has no contains method, so either you write your own search or you wrap the array in a List (Arrays.asList returns a view backed by the array, it does not copy the elements) and use its contains() method.
Arrays.asList(words).contains(newWord);
Alternatively, you can use a stream to find the element.
Arrays.stream(words).anyMatch(newWord::equals);
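If the exercise really requires a plain array, the "write your own search" route could look roughly like the sketch below. The class name ArrayWordListSketch, the initial capacity of 2 and the doubling strategy are assumptions for illustration, not part of the original assignment.

```java
import java.util.Arrays;

class ArrayWordListSketch {
    private String[] words = new String[2]; // initial capacity is an arbitrary choice
    private int count = 0;                  // number of slots actually in use

    // Linear search over the filled part of the array only.
    private boolean contains(String word) {
        for (int i = 0; i < count; i++) {
            if (words[i].equals(word)) {
                return true;
            }
        }
        return false;
    }

    // Adds the word if it is new and returns the number of stored words.
    int addWord(String newWord) {
        if (newWord == null || contains(newWord)) {
            return count;
        }
        if (count == words.length) {
            words = Arrays.copyOf(words, words.length * 2); // grow when full
        }
        words[count++] = newWord;
        return count;
    }

    public static void main(String[] args) {
        ArrayWordListSketch list = new ArrayWordListSketch();
        System.out.println(list.addWord("Dog"));
        System.out.println(list.addWord("Cat"));
        System.out.println(list.addWord("Fish"));
    }
}
```

The main method prints 1, 2 and 3. Adding "Dog" a second time would return 3 again, since duplicates are ignored.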
I think you are looking for something like this, although it uses an ArrayList rather than an array. Arrays have no built-in contains method, so with an array you would need to implement your own.
import java.util.ArrayList;
import java.util.List;
public class WordList {
private static List<String> words = new ArrayList<>();
public void addWord(String word) {
if (!words.contains(word)) {
words.add(word);
}
}
public static List<String> getWords() {
return words;
}
public static void main(String... args) {
WordList instance = new WordList();
instance.addWord("Dog");
instance.addWord("Cat");
instance.addWord("Fish");
System.out.println(instance.getWords());
}
}
The error was in words.contains(newWord):
cannot find symbol symbol: method contains(String) location:
variable words of type String[]
It can be resolved by using
Arrays.asList(words).contains(newWord) instead of words.contains(newWord).
Remember that you will need to import java.util.Arrays.
Full code:
import java.util.Arrays;
/**
*
* @author pronet
*/
public class WordList
{
String [] words;
int count = 0;
int max = 2;
WordList()
{
words = new String[max];
this.words = words;
this.count = count;
}
public static void main (String[] args)
{
WordList w1 = new WordList();
System.out.println(w1.addWord("Dog"));
System.out.println(w1.addWord("Cat"));
System.out.println(w1.addWord("Fish"));
}
public int addWord(String newWord)
{
for(int i = 0; i < words.length; i++)
{
if(Arrays.asList( words ).contains(newWord) == false && words.length < max)
{
words[i] = newWord;
}
else if(Arrays.asList( words ).contains(newWord) == false && words.length == max)
{
max *= 2;
words[i] = newWord;
}
count = i + 1;
}
return count;
}
}
Hope it could help you.
import java.util.HashSet;
import java.util.Set;
public class WordList {
private final Set<String> words = new HashSet<>();
public static void main(String[] args) {
WordList w1 = new WordList();
System.out.println(w1.addWord("Dog"));
System.out.println(w1.addWord("Cat"));
System.out.println(w1.addWord("Fish"));
}
public int addWord(String word) {
words.add(word);
return words.size();
}
}

Given a Morse String with out any spaces, how to find the no. of words it can represent irrespective of the meaning

Given a Morse string, e.g. aet = ".- . -", removing the spaces yields an ambiguous Morse string ".-.-", which can represent "aet", "eta", "ent", "etet", etc.
The problem is to find the number of words that the Morse string without spaces can represent, irrespective of the meaning of the words. The constraint is that each new word must have the same length as the input, i.e. "ent" counts for "aet", while other words like "etet" should be discarded.
I implemented a recursive solution, but for some reason it is not working. Below is my code; I am thinking of converting it to a DP approach to improve time efficiency. Can someone help point out the mistake in the code below, and is DP the right approach for this problem? Thanks in advance!
EDIT 1: The program gives me an output, but not the correct one. For example, for the Morse string representing aet = ".- . -", given to the program without any spaces as ".-.-", it should output 3, i.e. 3 words of the same length as the input can be formed, including the input itself: "aet", "eta", "ent". Instead it outputs 1. I think there is something wrong with the recursive calls.
The approach used here is to cut the Morse string at the place where the first valid Morse code is encountered, then repeat the process with the rest of the string until 3 such valid Morse codes are found, and check whether the whole Morse string has been consumed. If it has, increment the word count, and repeat the process for different values of the substring size (the end variable in the code below).
I hope this helps! I tried my best to explain as clearly as I could.
import java.util.*;
import java.io.*;
import java.math.*;
import java.text.*;
public class MorseCode2 {
static Map<String,String> morseCode;
static Map<String,String> morseCode2;
static int count = 0;
public static void main(String args[]){
String[] alpha = {"a","b","c","d","e","f","g","h","i","j","k",
"l","m","n","o","p","q","r","s","t","u","v",
"w","x","y","z"};
String[] morse = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",
".--","-..-","-.--","--.."};
morseCode = new HashMap<String,String>();
morseCode2 = new HashMap<String,String>();
for(int i = 0;i<26;i++){
morseCode.put(morse[i],alpha[i]);
}
for(int i = 0;i<26;i++){
morseCode2.put(alpha[i],morse[i]);
}
Scanner in = new Scanner(System.in);
String input = in.next();
String morseString = "";
for(int j = 0; j< input.length(); j++){
morseString += morseCode2.get(input.charAt(j)+"");
}
countPossibleWord(morseString,input.length(),0,1,0);
System.out.println(count);
in.close();
}
public static void countPossibleWord(String s,int inputSize,int start,int end,int tempCount){
if(start >= s.length() || end > s.length()){
return;
}
if(tempCount>inputSize){
return;
}
String sub = s.substring(start, end);
if(sub.length()>4){
return;
}
if(morseCode.get(sub)!=null){
tempCount++;
countPossibleWord(s,inputSize,end,end+1,tempCount);
}
else{
countPossibleWord(s,inputSize,start,end+1,tempCount);
}
if(tempCount == inputSize && end == s.length()){
count++;
}
countPossibleWord(s,inputSize,start,end+1,0);
}
}
EDIT 2: Thank you all for your responses, and sorry for the confusing code; I will try to write neater and clearer code. I learnt a lot from your replies!
I also somehow made the code work. The problem was that I passed the wrong argument, which changed the state of the recursive calls: in the last call inside countPossibleWord I passed 0 as the last argument instead of tempCount-1. I found this after running through the code manually for larger inputs. Below is the corrected method.
public static void countPossibleWord(String s,int inputSize,int start,int end,int tempCount){
if(start >= s.length() || end > s.length()){
return;
}
if(tempCount>inputSize){
return;
}
String sub = s.substring(start, end);
if(sub.length()>4){
return;
}
if(morseCode.get(sub)!=null){
tempCount++;
countPossibleWord(s,inputSize,end,end+1,tempCount);
}
else{
countPossibleWord(s,inputSize,start,end+1,tempCount);
}
if(tempCount == inputSize && end == s.length()){
count++;
}
countPossibleWord(s,inputSize,start,end+1,tempCount-1);
}
}
If you want a recursive function, be clear about its parameters (use as few as possible) and about when to step down and when to go back up.
My solution would look something like this:
public static int countPossibleWord(String strMorse, String strAlpha, int inputSize) {
if (strMorse.length() > 0) { // still input to process
if (strAlpha.length() >= inputSize)
return 0; // String already has wrong size
int count = 0;
for (int i = 0; i < morse.length; i++) { // try all morse codes
if (strMorse.startsWith(morse[i])) { // on the beginning of the given string
count += countPossibleWord(strMorse.substring(morse[i].length()), strAlpha+alpha[i], inputSize);
}
}
return count;
} else {
if( strAlpha.length() == inputSize ) {
System.out.println( strAlpha );
return 1; // one solution has been found
} else {
return 0; // String has wrong size
}
}
}
Your morse and alpha arrays need to be static variables for this to work.
Note that there is only one situation where the recursion will step down: when there is some input left and the size limit is not reached. Then it will check for the next possible letter in the loop.
All other cases will lead the recursion to go one step up again - and when going up, it will return the number of solutions found.
Call it like this:
System.out.println(countPossibleWord(morseString, "", input.length() ));
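On the DP question from the original post: a bottom-up table is one way to get the count without enumerating the words. The sketch below is not the recursive code above. The class name MorseSplitCounter, the method countWords and the table layout are my own assumptions: ways[k][i] holds the number of ways to read the first i symbols as exactly k letters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class MorseSplitCounter {
    private static final String[] MORSE = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
        "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
        "..-", "...-", ".--", "-..-", "-.--", "--.."};
    private static final Set<String> CODES = new HashSet<>(Arrays.asList(MORSE));
    private static final int MAX_CODE_LENGTH = 4; // longest code for a..z

    // Counts the ways to split `morse` into exactly `letters` valid codes.
    // Every code maps to a distinct letter, so each split is a distinct word.
    static long countWords(String morse, int letters) {
        int n = morse.length();
        long[][] ways = new long[letters + 1][n + 1];
        ways[0][0] = 1; // zero letters consume zero symbols
        for (int k = 1; k <= letters; k++) {
            for (int i = 1; i <= n; i++) {
                // try every possible length for the k-th (last) letter
                for (int len = 1; len <= MAX_CODE_LENGTH && len <= i; len++) {
                    if (CODES.contains(morse.substring(i - len, i))) {
                        ways[k][i] += ways[k - 1][i - len];
                    }
                }
            }
        }
        return ways[letters][n];
    }

    public static void main(String[] args) {
        System.out.println(countWords(".-.-", 3));
    }
}
```

For ".-.-" and 3 letters this yields 3 ("aet", "eta", "ent"), matching the expected output in the question. The running time is proportional to letters times the Morse length times 4, instead of growing with the number of candidate words.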
Using a class variable instead of the return value of the recursive function makes the code extremely unclear, even for you, as @Thomas Weller said. You should clarify the possible cases in which to count one more letter. I had deleted Eclipse, hence I coded it in C; I hope it still helps you understand the algorithm (read char* as string):
char morse[26][5] = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",
".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."};
int countPossibleWord(char* s, int inputSize, int start, char* buffer, int sizeBuff){
if(start == inputSize){
if(sizeBuff == 0) return 1;
else return 0;
}
char buff[sizeBuff+2]; //
strncpy(buff, buffer, sizeBuff);//
buff[sizeBuff] = s[start]; // buff = buff+s[start]
buff[sizeBuff+1] = '\0'; //
for(int i = 0; i < 26; ++i){
//run the equivalent of your map to find a match
if(strcmp(buff, morse[i]) == 0)
return countPossibleWord(s, inputSize, start+1, "", 0) + countPossibleWord(s, inputSize, start+1, buff, sizeBuff+1);
}
return countPossibleWord(s, inputSize, start+1, buff, sizeBuff+1);
}
The problem with your code is that you don't understand it any more, because it is not clean in the sense described by Robert C. Martin. Compare your code to the following. It is certainly still not the cleanest, but I think you can understand what it does. Tell me if you don't.
Consider this main program:
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
public class Program {
public static void main(String[] args) {
String morsetext = enterTextOnConsole();
MorseTable morseTable = new MorseTable();
MorseCode code = convertToMorseCodeWithoutSpaces(morsetext, morseTable);
List<String> guesses = getAllPossibleMeanings(code, morseTable);
List<String> guessesOfSameLength = filterForSameLength(morsetext, guesses);
printListOnConsole(guessesOfSameLength);
}
private static void printListOnConsole(List<String> guessesOfSameLength) {
for (String text : guessesOfSameLength) {
System.out.println(text);
}
}
private static List<String> filterForSameLength(String morsetext, List<String> guesses) {
List<String> guessesOfSameLength = new LinkedList<String>();
for (String guess : guesses) {
if (guess.length() == morsetext.length())
{
guessesOfSameLength.add(guess);
}
}
return guessesOfSameLength;
}
private static List<String> getAllPossibleMeanings(MorseCode code, MorseTable morseTable) {
MorseCodeGuesser guesser = new MorseCodeGuesser(morseTable);
List<String> guesses = guesser.guess(code);
return guesses;
}
private static MorseCode convertToMorseCodeWithoutSpaces(String morsetext, MorseTable morseTable) {
MorseCode code = new MorseCode(morseTable);
code.fromText(morsetext);
code.stripSpaces();
return code;
}
private static String enterTextOnConsole() {
Scanner scanner = new Scanner(System.in);
String text = scanner.next();
scanner.close();
return text;
}
}
and the following MorseTable class:
import java.util.HashMap;
import java.util.Map;
public class MorseTable {
private static final Map<String, String> morseTable;
private static int longestCode = -1;
static
{
morseTable = new HashMap<String, String>();
morseTable.put("a", ".-");
morseTable.put("b", "-...");
morseTable.put("c", "-.-.");
morseTable.put("e", ".");
morseTable.put("t", "-");
morseTable.put("n", "-.");
// TODO: add more codes
for (String code : morseTable.values()) {
longestCode = Math.max(longestCode, code.length());
}
}
public String getMorseCodeForCharacter(char c) throws IllegalArgumentException {
String characterString = ""+c;
if (morseTable.containsKey(characterString)) {
return morseTable.get(characterString);
}
else {
throw new IllegalArgumentException("No morse code for '"+characterString+"'.");
}
}
public int lengthOfLongestMorseCode() {
return longestCode;
}
public String getTextForMorseCode(String morseCode) throws IllegalArgumentException {
for (String key : morseTable.keySet()) {
if (morseTable.get(key).equals(morseCode)) {
return key;
}
}
throw new IllegalArgumentException("No character for morse code '"+morseCode+"'.");
}
}
and the MorseCode class
public class MorseCode {
public MorseCode(MorseTable morseTable)
{
_morseTable = morseTable;
}
final MorseTable _morseTable;
String morseCode = "";
public void fromText(String morsetext) {
for(int i=0; i<morsetext.length(); i++) {
char morseCharacter = morsetext.charAt(i);
morseCode += _morseTable.getMorseCodeForCharacter((morseCharacter));
morseCode += " "; // pause between characters
}
}
public void stripSpaces() {
morseCode = morseCode.replaceAll(" ", "");
}
public MorseCode substring(int begin, int end) {
MorseCode subcode = new MorseCode(_morseTable);
try{
subcode.morseCode = morseCode.substring(begin, end);
} catch(StringIndexOutOfBoundsException s) {
subcode.morseCode = "";
}
return subcode;
}
public MorseCode substring(int begin) {
return substring(begin, morseCode.length());
}
public String asPrintableString() {
return morseCode;
}
public boolean isEmpty() {
return morseCode.isEmpty();
}
}
and last but not least, the MorseCodeGuesser
import java.util.LinkedList;
import java.util.List;
public class MorseCodeGuesser {
private final MorseTable _morseTable;
public MorseCodeGuesser(MorseTable morseTable) {
_morseTable = morseTable;
}
public List<String> guess(MorseCode code) {
List<String> wordList = new LinkedList<String>();
if (code.isEmpty()) return wordList;
for(int firstCodeLength=1; firstCodeLength<=_morseTable.lengthOfLongestMorseCode(); firstCodeLength++) {
List<String> guesses = guess(code, firstCodeLength);
wordList.addAll(guesses);
}
return wordList;
}
private List<String> guess(MorseCode code, int firstCodeLength) {
MorseCode firstCode = code.substring(0, firstCodeLength);
String firstCharacter;
try{
firstCharacter = _morseTable.getTextForMorseCode(firstCode.asPrintableString());
} catch(IllegalArgumentException i) {
return new LinkedList<String>(); // no results for invalid code
}
MorseCode remainingCode = code.substring(firstCodeLength);
if (remainingCode.isEmpty()) {
List<String> result = new LinkedList<String>();
result.add(firstCharacter); // sole result if nothing is left
return result;
}
List<String> result = new LinkedList<String>();
List<String> remainingPossibilities = guess(remainingCode);
for (String possibility : remainingPossibilities) {
result.add(firstCharacter + possibility); // combined results
}
return result;
}
}
I have pasted my own solution below. It uses DFS and gives the correct answer for the given problem statement. Please ask if you have any questions.
alpha =["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
key = [".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--",
"-..-","-.--","--.."]
dic = dict(list(zip(key,alpha)))
def morse_code(morse,count,res,char,length):
    global dic
    # the last letter must consume everything that is left
    if count == length - 1:
        if morse[char:] in dic:
            res = res + 1
        return res
    word = ''
    for i in range(char,len(morse)):
        word = word + morse[i]
        if word not in dic:
            continue
        else:
            count = count + 1
            res = morse_code(morse,count,res,i+1,length)
            count = count - 1
    return res
if __name__ == '__main__':
    inp = input()
    morse = ''
    for i in inp:
        morse = morse + key[ord(i)-ord('a')]
    result = morse_code(morse,0,0,0,len(inp))
    print(result)

How to convert string of numbers into int array in java

I have an input String of "0102030405" how can I split this number by two so that it would have an output of String[] ("01", "02", "03", "04", "05").
Try this,
String a = "0102030405";
System.out.println(Arrays.toString(a.split("(?<=\\G.{2})")));
String input = "0102030405";
String[] output = new String[input.length()/2];
int k=0;
for(int i=0;i<input.length();i+=2){
output[k++] = input.substring(i, i+2);
}
for(String s:output){
System.out.println(s);
}
You could read the string two characters at a time.
This could be solved with the regex "(?<=\\G.{2})"
But I think a cleaner solution is this:
string.substring(startStringInt, endStringInt);
Here is a complete example:
package Main;
import java.util.ArrayList;
import java.util.List;
public class Test {
public static void main(String[] args) {
for (String part : splitString("0102030405", 2)) {
System.out.println(part);
}
}
private static List<String> splitString(String string, int numberOfChars) {
List<String> result = new ArrayList<String>();
for (int i = 0; i < string.length(); i += numberOfChars)
{
result.add(string.substring(i, Math.min(string.length(), i + numberOfChars)));
}
return result;
}
}
import java.util.ArrayList;
public class HelloWorld{
public static void main(String []args){
HelloWorld h1 = new HelloWorld();
String a = "0102030405";
System.out.println(h1.getSplitString(a));
}
private ArrayList<String> getSplitString(String stringToBeSplitted) {
char[] charArray = stringToBeSplitted.toCharArray();
int stringLength = charArray.length;
ArrayList<String> outPutArray = new ArrayList<String>();
for(int i=0; i <= stringLength-2; i+=2){
outPutArray.add("" + charArray[i] + charArray[i+1]);
}
return outPutArray;
}
}
Here the String is first converted to a char array. Then, in a for loop, each pair of characters is concatenated and added to an ArrayList, which is returned. If you need to return an array instead, change the return type to String[] and change the expression in the return statement to
outPutArray.toArray(new String[outPutArray.size()]);
If you pass a string with an odd number of characters, the last character will be omitted. Change the loop condition to fix that.
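Since the title asks for an int array rather than a String array, the same chunking idea can be combined with Integer.parseInt. A minimal sketch, assuming the input contains only decimal digits; the names PairSplitSketch and toIntArray are made up for illustration.

```java
import java.util.Arrays;

class PairSplitSketch {
    // Splits the digits into chunks of `size` characters and parses each chunk
    // as a decimal int. Leading zeros are dropped by parseInt ("01" becomes 1).
    static int[] toIntArray(String digits, int size) {
        int chunks = (digits.length() + size - 1) / size; // round up so a short tail is kept
        int[] result = new int[chunks];
        for (int i = 0; i < chunks; i++) {
            int start = i * size;
            int end = Math.min(digits.length(), start + size);
            result[i] = Integer.parseInt(digits.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toIntArray("0102030405", 2)));
    }
}
```

This prints [1, 2, 3, 4, 5]. Unlike the loop above, an odd-length input keeps its last character as a chunk of its own.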

Regex to count the number of syllables in a text

I searched the whole internet and to my sadness found that there is no correct implementation of count of syllables in a text using regex on the internet. First I would like to clear the definition of a syllable:
Syllables are defined as: a contiguous sequence of vowels, except for a lone "e" at the end of a word if the word has another set of contiguous vowels, makes up one syllable. y is considered a vowel.
I used the following regex expression statement (with split in Java):
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Scanner;
class Graph {
private Map<Integer, ArrayList<Integer>> adjLists;
private int numberOfVertices;
private int numberOfEdges;
public Graph(int V){
adjLists = new HashMap<>(V);
for(int i=0; i<V; i++){
adjLists.put(i, new ArrayList<Integer>());
}
this.numberOfVertices = V;
this.numberOfEdges = 0;
}
public int getNumberOfEdges(){
return this.numberOfEdges;
}
public int getNumberOfVertices(){
return this.numberOfVertices;
}
public void addVertex(){
adjLists.put(getNumberOfVertices(), new ArrayList<Integer>());
this.numberOfVertices++;
}
public void addEdge(int u, int v){
adjLists.get(u).add(v);
adjLists.get(v).add(u);
this.numberOfEdges++;
}
public ArrayList<Integer> getNeighbours(int u){
return new ArrayList<Integer>(adjLists.get(u));
}
public void printTheGraph() {
for(Entry<Integer, ArrayList<Integer>> list: adjLists.entrySet()){
System.out.print(list.getKey()+": ");
for(Integer i: list.getValue()){
System.out.print(i+" ");
}
System.out.println();
}
}
}
@SuppressWarnings("resource")
public class AdjacencyListGraphTest {
public static void main(String[] args) throws Exception {
FileReader reader = new FileReader("graphData");
Scanner in = new Scanner(reader);
int E, V;
V = in.nextInt();
E = in.nextInt();
Graph graph = new Graph(V);
for(int i=0; i<E; i++){
int u, v;
u = in.nextInt();
v = in.nextInt();
graph.addEdge(u, v);
}
graph.printTheGraph();
}
}
But that didn't work.
The main problem is how to express the rule for the final 'e' using regex. Just the regex would suffice. Thank you.
P.S.: If you are unfamiliar with the topic, please don't point to other Stack Overflow questions, as none of them has a correctly implemented answer.
This gives you the number of vowel groups in a word (it does not apply the final 'e' rule):
public int getNumVowels(String word) {
String regexp = "[bcdfghjklmnpqrstvwxz]*[aeiouy]+[bcdfghjklmnpqrstvwxz]*";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(word.toLowerCase());
int count = 0;
while (m.find()) {
count++;
}
return count;
}
You can call it on every word in your string array:
String[] words = getText().split("\\s+");
for (String word : words ) {
System.out.println("Word: " + word + ", vowels: " + getNumVowels(word));
}
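The final 'e' rule from the question's definition can be layered on top of the vowel-group count. The sketch below is one possible reading of that definition, not a pure-regex answer: it counts runs of [aeiouy] and then subtracts one when the word ends in a lone "e" and has at least one other vowel group. The names SyllableCounterSketch and countSyllables are assumptions, and the word is assumed to contain letters only (no trailing punctuation).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyllableCounterSketch {
    private static final String VOWELS = "aeiouy";
    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");

    static int countSyllables(String word) {
        String lower = word.toLowerCase();
        Matcher m = VOWEL_GROUP.matcher(lower);
        int count = 0;
        while (m.find()) {
            count++; // one syllable per contiguous run of vowels
        }
        // A "lone" final e is an e at the end that is not preceded by another vowel.
        boolean loneFinalE = lower.endsWith("e")
                && (lower.length() == 1
                    || VOWELS.indexOf(lower.charAt(lower.length() - 2)) < 0);
        if (loneFinalE && count > 1) {
            count--; // drop it only if the word has another vowel group
        }
        return count;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"the", "make", "segue", "hello"}) {
            System.out.println(w + ": " + countSyllables(w));
        }
    }
}
```

Under this reading "the" stays at 1 (its only vowel group is the final e), "make" drops from 2 to 1, and "segue" stays at 2 because its final e is part of the group "ue".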

Java: I've created a list of word objects to include the name and the frequency, but having trouble updating the frequency

I'm working on a project which has a dictionary of words and I'm extracting them and adding them to an ArrayList as word objects. I have a class called Word as below.
What I'm wondering is how do I access these word objects to update the frequency? As part of this project, I need to only have one unique word, and increase the frequency of that word by the number of occurrences in the dictionary.
Word(String word)
{
this.word = word;
this.freq = 0;
}
public String getWord() {
return word;
}
public int getFreq() {
return freq;
}
public void setFreq() {
freq = freq + 1;
}
This is how I am adding the word objects to the ArrayList...I think it's ok?
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
String newWord = st.nextToken();
word.add(new Word(newWord));
count++;
}
Instead of an ArrayList use a Bag. This keeps the counts for you.
Use a map from each word to its Word object. A HashSet might seem to be enough, but a set has no way to hand back the stored object for updating, and internally a HashSet uses a HashMap anyway. The following code also increases the frequency of words that have already been inserted.
Map<String, Word> wordsMap = new HashMap<String, Word>();
String pattern = "[^a-zA-Z\\s]";
String strippedString = line.replaceAll(pattern, "");
line = strippedString.toLowerCase();
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens())
{
String newWord = st.nextToken();
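Word existingWord = wordsMap.get(newWord);
if(existingWord == null){
// first occurrence: register the word before counting it
existingWord = new Word(newWord);
wordsMap.put(newWord, existingWord);
}
existingWord.setFreq(); // count every occurrence, including the first (freq starts at 0)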
count++;
}
I would solve the problem with the following code:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class Word {
private final String word;
private int frequency;
public Word(String word) {
this.word = word;
this.frequency = 0;
}
public String getWord() {
return word;
}
public int getFrequency() {
return frequency;
}
public void increaseFrequency() {
frequency++;
}
I didn't call this method setFrequency because it is not a real setter method. For a real setter method, you would pass it exactly one parameter.
public static List<Word> histogram(String sentence) {
First, compute the frequency of the individual words.
String[] words = sentence.split("\\W+");
Map<String, Word> histo = new HashMap<String, Word>();
for (String word : words) {
Word w = histo.get(word);
if (w == null) {
w = new Word(word);
histo.put(word, w);
}
w.increaseFrequency();
}
Then, sort the words such that words with higher frequency appear first.
If the frequency is the same, the words are sorted almost alphabetically.
List<Word> ordered = new ArrayList<Word>(histo.values());
Collections.sort(ordered, new Comparator<Word>() {
public int compare(Word a, Word b) {
int fa = a.getFrequency();
int fb = b.getFrequency();
if (fa < fb)
return 1;
if (fa > fb)
return -1;
return a.getWord().compareTo(b.getWord());
}
});
return ordered;
}
Finally, test the code with a simple example.
public static void main(String[] args) {
List<Word> freq = histogram("a brown cat eats a white cat.");
for (Word word : freq) {
System.out.printf("%4d %s\n", word.getFrequency(), word.getWord());
}
}
}
You can use a Multiset of String from Google Collections (now Guava) instead of the Word class.
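If adding a library is not an option, the same bag-style counting can be sketched with a plain Map and Map.merge from the standard library. This is an illustration only: the class name WordFrequencySketch and the choice of a TreeMap (for alphabetical output) are assumptions, and the stripping pattern is the one from the question.

```java
import java.util.Map;
import java.util.TreeMap;

class WordFrequencySketch {
    // Returns each distinct lower-cased word with its number of occurrences.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        // same cleanup as in the question: keep only letters and whitespace
        String cleaned = text.replaceAll("[^a-zA-Z\\s]", "").toLowerCase();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum); // insert 1 or add 1 to the old count
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a brown cat eats a white cat."));
    }
}
```

This prints {a=2, brown=1, cat=2, eats=1, white=1}. A Word class is still useful if each entry needs to carry more than a count.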
