Faster way to check if we can write message from given letters - java

I needed to write a function that takes two strings as input. One is the message I want to write and the other is the set of available letters. The letters are in random order. There is no guarantee that each letter occurs a similar number of times, and some letters might be missing entirely.
The function should determine whether I can write the message with the given letters and return true or false accordingly.
I coded it and I think it is very fast, but how can I improve it, bearing in mind that the string of letters will be very large while the message will be very short?
Is there a faster way?
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class LetterBowl {
public static void main(String []args){
String message = generateRandomStringUpToThousandChars();
String bowlWithLetters = generateRandomStringUpToThousandChars();
if(canConstructMessage(message, bowlWithLetters)) {
System.out.println("Message '" + message + "' can be constructed with letters from bowl : " + bowlWithLetters);
}
}
public static boolean canConstructMessage(String message, String letters) {
Map<Character,Integer> letterMap = stringToCharacterMap(letters);
char[] messageList = stringToCharacterList(message);
for(char c : messageList) {
if (!containsLetterAndSubtract(c,letterMap))
return false;
}
return true;
}
// checks whether the map (bowl) contains the char and subtracts one occurrence from the map (or removes the entry if it was the last one)
public static boolean containsLetterAndSubtract(char c, Map<Character,Integer> letterMap) {
if(letterMap.containsKey(c)) {
if(letterMap.get(c) > 1) {
letterMap.put(c, letterMap.get(c) - 1);
} else {
letterMap.remove(c);
}
return true;
}
return false;
}
public static char[] stringToCharacterList(String message) {
return message.replaceAll(" ", "").toCharArray();
}
public static Map<Character,Integer> stringToCharacterMap(String s) {
Map<Character,Integer> map = new HashMap<Character,Integer>();
for (char c : s.toCharArray()) {
if(map.containsKey(c))
map.put(c, map.get(c) + 1);
else
map.put(c, 1);
}
return map;
}
public static String generateRandomStringUpToThousandChars(){
char[] chars = "abcdefghijklmnopqrstuvwxyz".toCharArray();
StringBuilder sb = new StringBuilder();
Random random = new Random();
int length = random.nextInt(1000); // pick the length once instead of re-rolling it on every iteration
for (int i = 0; i < length; i++) {
char c = chars[random.nextInt(chars.length)];
sb.append(c);
}
String output = sb.toString();
return output;
}
}
For a large bowl and a smaller message I found this would be more efficient (it needs import java.util.Arrays):
public static boolean canConstructMessageSorted(String message, String bowlWithLetters) {
int counter = 0;
boolean hasLetter;
//sorting
char[] chars = bowlWithLetters.toCharArray();
Arrays.sort(chars);
String sortedBowl = new String(chars);
//sorting
chars = message.toCharArray();
Arrays.sort(chars);
String sortedMsg = new String(chars);
for (int i = 0; i < sortedMsg.length(); i++) {
hasLetter = false;
for( ; counter < sortedBowl.length() ; counter++) {
if(sortedMsg.charAt(i) == sortedBowl.charAt(counter)) {
hasLetter = true;
counter++; // consume this bowl letter so it cannot be matched twice
break;
}
}
if(!hasLetter) return false;
}
return true;
}

You're operating at O(message.size + letters.size). This is the lowest worst-case time complexity I can come up with offhand. As for the fastest way, there is always more you could do. For example, defining the method
public static char[] stringToCharacterList(String message)
and only using it once adds a method call. You could put that body directly in canConstructMessage(), saving one frame from being pushed onto and popped off the stack. This is a very small amount of time (and the JIT compiler may well inline such a small method anyway), but since you asked for the fastest way, it is worth mentioning.
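Since the message is short and the bowl is large, one option is to count only the letters the message needs and stop scanning the bowl as soon as every need is met. The following is a minimal sketch of that idea, not the asker's code: the class name EarlyExitLetterBowl is hypothetical, and it assumes spaces in the message are ignored, as in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, introduced only for illustration.
final class EarlyExitLetterBowl {

    static boolean canConstructMessage(String message, String letters) {
        // Count how many copies of each character the message needs.
        // Spaces are skipped, mirroring stringToCharacterList in the question.
        Map<Character, Integer> needed = new HashMap<>();
        int remaining = 0;
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            if (c == ' ') {
                continue;
            }
            needed.merge(c, 1, Integer::sum);
            remaining++;
        }
        // Scan the bowl and stop as soon as every needed letter has been found.
        for (int i = 0; i < letters.length() && remaining > 0; i++) {
            char c = letters.charAt(i);
            Integer count = needed.get(c);
            if (count != null && count > 0) {
                needed.put(c, count - 1);
                remaining--;
            }
        }
        return remaining == 0;
    }
}
```

The worst case is still O(message.size + letters.size), but the map only ever holds as many entries as the message has distinct letters, and the scan over the bowl can end long before the end of the string.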

For every letter in letters, remove 1 copy of it from the message. If the message ends up empty, the answer is "yes":
public static boolean canConstructMessage(String message, String letters) {
for (int i = 0; i < letters.length(); i++)
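// replaceFirst takes a regex, so quote the letter (needs import java.util.regex.Pattern)
message = message.replaceFirst(Pattern.quote(String.valueOf(letters.charAt(i))), "");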
return message.isEmpty();
}
If reusing letters is allowed, you can do it in 1 line:
public static boolean canConstructMessage(String message, String letters) {
return letters.chars().boxed().collect(Collectors.toSet())
.containsAll(message.chars().boxed().collect(Collectors.toSet()));
}

Related

are two anagram or not? [closed]

In this code we initialize boolean isAnagram to false and then set it inside the comparison loop, but we get the wrong result. The two strings are clearly not anagrams, yet the code prints 'anagram'.
package strings;
public class Anagrams {
public static void main(String[] args) {
String a = "aab";
String b = "abc";
boolean isAnagram = false;
int al[] = new int[256];
int bl[] = new int[256];
for(char c:a.toCharArray()) {
int index = (int)c;
al[index]++;
}
for(char c:b.toCharArray()) {
int index = (int)c;
bl[index]++;
}
for(int i = 0; i<256; i++) {
if(al[i] == bl[i]) {
isAnagram = true;
}
}
if(isAnagram) {
System.out.println("anagram");
}else {
System.out.println("not anagram");
}
}
}
I think sorting the strings and then comparing them is simpler.
public static void main(String[] args) {
String a = "aab";
String b = "abc";
char[] a1 = a.toLowerCase().toCharArray();
char[] b1 = b.toLowerCase().toCharArray();
Arrays.sort(a1);
Arrays.sort(b1);
boolean isAnagram = new String(a1).equals(new String(b1));
System.out.println(isAnagram ? "anagram" : "not anagram");
}
The questioner wants their own algorithm to work.
The main bug is that the code needs to look for mismatches between the character counts of the two words.
So you can declare a counter and, while iterating over the two count arrays, increase it every time the number of occurrences of some letter differs between the first and the second word.
At the end, if the counter is greater than 0, the words have different sets of characters.
The working code:
class Ideone
{
public static void main(String[] args) {
String a = "aab";
String b = "abb";
int mismatch = 0;
boolean isAnagram = true;
int al[] = new int[143859];
int bl[] = new int[143859];
for(char c:a.toCharArray()) {
int index = (int)c;
al[index]++;
}
for(char c:b.toCharArray()) {
int index = (int)c;
bl[index]++;
}
for(int i = 0; i<143859; i++) {
if(al[i] != bl[i]) {
mismatch++;
}
}
if (mismatch>0) isAnagram = false;
if(isAnagram) {
System.out.println("anagram");
}else {
System.out.println("not anagram");
}
}
}
Your code yields true if ONE char count matches. But it should only be true if ALL char counts match. Turn the logic around, start with true and set to false on the first mismatch. Change the line
boolean isAnagram = false;
to
boolean isAnagram = true;
and
if(al[i] == bl[i]) {
isAnagram = true;
}
to
if(al[i] != bl[i]) {
isAnagram = false;
break;
}
But sorting the strings is indeed the solution that is more readable and easier to understand.
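The same "start with true, fail on the first mismatch" idea can also be written with a single count array: increment for one string, decrement for the other, and require every slot to end at zero. This is a sketch under assumptions, not the asker's code: the class name AnagramCheck is hypothetical, and the array is sized to cover every possible char value rather than 256.

```java
// Hypothetical helper class, introduced only for illustration.
final class AnagramCheck {

    static boolean areAnagrams(String a, String b) {
        // Strings of different length can never be anagrams.
        if (a.length() != b.length()) {
            return false;
        }
        // One slot per possible char value (0..65535).
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < a.length(); i++) {
            counts[a.charAt(i)]++;
            counts[b.charAt(i)]--;
        }
        // ALL counts must balance out; a single non-zero slot is a mismatch.
        for (int count : counts) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }
}
```

For the strings from the question, areAnagrams("aab", "abc") returns false because the slots for 'a' and 'c' do not balance.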
The problem is the last for-loop:
for(int i = 0; i<256; i++) {
if(al[i] == bl[i]) {
isAnagram = true;
}
}
If only a single position in both arrays matches, isAnagram is set to true. To fix the problem, we can invert our perspective: assume at the start that the two Strings are anagrams (boolean isAnagram = true;) and set the flag to false if and only if the two arrays al and bl differ at some index i. We can also break out of the loop on the first mismatch we find.
public static void main(String[] args) {
String a = "aab";
String b = "aac";
boolean isAnagram = true;
int al[] = new int[256];
int bl[] = new int[256];
for (char c : a.toCharArray()) {
int index = (int) c;
al[index]++;
}
for (char c : b.toCharArray()) {
int index = (int) c;
bl[index]++;
}
for (int i = 0; i < 256; i++) {
if (al[i] != bl[i]) {
isAnagram = false;
break;
}
}
if (isAnagram) {
System.out.println("anagram");
} else {
System.out.println("not anagram");
}
}
Since chars in Java are UTF-16 code units, the int value of a char can be >= 256, in which case indexing an array of length 256 throws an ArrayIndexOutOfBoundsException. To prevent this problem, we can use a Map<Integer, Integer> to keep track of the code point frequency:
public static boolean areAnagrams(String s, String t) {
Objects.requireNonNull(s, "Parameter \"s\" is null");
Objects.requireNonNull(t, "Parameter \"t\" is null");
return Objects.equals(s, t) ||
Objects.equals(getCodePointFrequency(s), getCodePointFrequency(t));
}
public static Map<Integer, Integer> getCodePointFrequency(String s) {
return s.codePoints()
.boxed()
.collect(Collectors.toMap(Function.identity(), c -> 1, Integer::sum));
}
It should be mentioned that this solution has a worst-case time complexity of O(n log(n)), since an insert into a map only guarantees O(log(n)), not O(1). The average case, however, should be O(n), with n being the length of the longer of the two Strings s and t.
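If a predictable O(n log(n)) bound without any map is preferred, the sorting idea from the other answer can be applied to code points instead of chars, which also handles characters outside the Basic Multilingual Plane. A minimal sketch, with the hypothetical class name CodePointAnagram:

```java
import java.util.Arrays;

// Hypothetical helper class, introduced only for illustration.
final class CodePointAnagram {

    static boolean areAnagrams(String s, String t) {
        // Sort the code points of both strings and compare the results.
        int[] a = s.codePoints().sorted().toArray();
        int[] b = t.codePoints().sorted().toArray();
        return Arrays.equals(a, b);
    }
}
```

This trades the average-case O(n) of the map for a simpler implementation with no boxing of Integer keys.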

Common characters in n strings

I'm trying to write a function that prints the number of characters common to n given strings (note that characters may be used multiple times).
I am struggling to do this for n strings. However, I did it for 2 strings without any characters repeated more than once.
I have posted my code below.
public class CommonChars {
public static void main(String[] args) {
String str1 = "abcd";
String str2 = "bcde";
StringBuffer sb = new StringBuffer();
// get unique chars from both the strings
str1 = uniqueChar(str1);
str2 = uniqueChar(str2);
int count = 0;
int str1Len = str1.length();
int str2Len = str2.length();
for (int i = 0; i < str1Len; i++) {
for (int j = 0; j < str2Len; j++) {
// found match stop the loop
if (str1.charAt(i) == str2.charAt(j)) {
count++;
sb.append(str1.charAt(i));
break;
}
}
}
System.out.println("Common Chars Count : " + count + "\nCommon Chars :" +
sb.toString());
}
public static String uniqueChar(String inputString) {
String temp = "";
for (int i = 0; i < inputString.length(); i++) {
// keep only the first occurrence of each character
if (temp.indexOf(inputString.charAt(i)) < 0) {
temp += inputString.charAt(i);
}
}
System.out.println("completed");
return temp;
}
}
3
abcaa
bcbd
bgc
3
There may be cases where the same character is present multiple times in a string. You are not supposed to eliminate those characters; instead, check the number of times they are repeated in the other strings. For example:
3
abacd
aaxyz
aatre
The output should be 2.
It would be better if I could get a solution in Java.
You have to convert all the Strings to Sets of Characters and retain in the first set only the characters that also occur in all the others. The solution below has many places that could be optimised, but it should convey the general idea.
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
public static void main(String[] args) {
List<String> input = Arrays.asList("jonas", "ton", "bonny");
System.out.println(findCommonCharsFor(input));
}
public static Collection<Character> findCommonCharsFor(List<String> strings) {
if (strings == null || strings.isEmpty()) {
return Collections.emptyList();
}
Set<Character> commonChars = convertStringToSetOfChars(strings.get(0));
strings.stream().skip(1).forEach(s -> commonChars.retainAll(convertStringToSetOfChars(s)));
return commonChars;
}
private static Set<Character> convertStringToSetOfChars(String string) {
if (string == null || string.isEmpty()) {
return Collections.emptySet();
}
Set<Character> set = new HashSet<>(string.length() + 10);
for (char c : string.toCharArray()) {
set.add(c);
}
return set;
}
}
Above code prints:
[n, o]
A better strategy for your problem is to use this method:
public int[] countChars(String s){
int[] count = new int[26];
for(char c: s.toCharArray()){
count[c-'a']++;
}
return count;
}
Now if you have n Strings (String[] strings), just find the minimum count of each letter across all strings:
int n = strings.length;
int[][] result = new int[n][26];
for (int i = 0; i < n; i++) {
result[i] = countChars(strings[i]);
}
// summing the per-letter minimum over all strings gives the number of common chars
int commonChars = 0;
for (int i = 0; i < 26; i++) {
int min = result[0][i];
for (int j = 1; j < n; j++) {
if (min > result[j][i]) {
min = result[j][i];
}
}
commonChars += min;
}
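Put together as one method, the per-letter minimum approach might look like the sketch below. The class and method names are hypothetical, and it assumes only lowercase a-z matters (any other character is skipped rather than causing an index error).

```java
// Hypothetical helper class, introduced only for illustration.
final class CommonCharCount {

    static int countCommonChars(String... strings) {
        if (strings == null || strings.length == 0) {
            return 0;
        }
        // Start with the counts of the first string and shrink them
        // to the per-letter minimum over all remaining strings.
        int[] min = countChars(strings[0]);
        for (int j = 1; j < strings.length; j++) {
            int[] counts = countChars(strings[j]);
            for (int i = 0; i < 26; i++) {
                min[i] = Math.min(min[i], counts[i]);
            }
        }
        // The sum of the minima is the number of common characters,
        // counting repeated characters as often as they are shared.
        int total = 0;
        for (int m : min) {
            total += m;
        }
        return total;
    }

    private static int[] countChars(String s) {
        int[] count = new int[26];
        for (char c : s.toCharArray()) {
            if (c >= 'a' && c <= 'z') { // assumption: ignore everything except a-z
                count[c - 'a']++;
            }
        }
        return count;
    }
}
```

With the example from the question, countCommonChars("abacd", "aaxyz", "aatre") returns 2, because 'a' occurs at least twice in every string and no other letter is shared by all three.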
Get list of characters for each string:
List<Character> chars1 = s1.chars() // list of chars for first string
.mapToObj(c -> (char) c)
.collect(Collectors.toList());
List<Character> chars2 = s2.chars() // list of chars for second string
.mapToObj(c -> (char) c)
.collect(Collectors.toList());
Then use retainAll method:
chars1.retainAll(chars2); // retain in chars1 only the chars that are contained in the chars2 also
System.out.println(chars1.size());
If you want to get number of unique chars just use Collectors.toSet() instead of toList()
Well if one goes for hashing:
public static int uniqueChars(String first, String second) {
boolean[] hash = new boolean[26];
int count = 0;
//reduce first string to unique letters
for (char c : first.toLowerCase().toCharArray()) {
hash[c - 'a'] = true;
}
//reduce to unique letters in both strings
for(char c : second.toLowerCase().toCharArray()){
if(hash[c - 'a']){
count++;
hash[c - 'a'] = false;
}
}
return count;
}
This uses a bucket approach, which gives O(n + m) complexity but needs the 26 buckets (the "hash" array).
In my opinion one can't do better in terms of complexity, since you need to look at every letter at least once, which sums to n + m.
In place, the best you can get is, in my opinion, somewhere in the range of O(n log(n)).
Your approach is somewhere in the league of O(n²).
Addendum: if you need the characters as a String (in essence the same as above, with count being the length of the returned String):
public static String uniqueChars(String first, String second) {
boolean[] hash = new boolean[26];
StringBuilder sb = new StringBuilder();
for (char c : first.toLowerCase().toCharArray()) {
hash[c - 'a'] = true;
}
for(char c : second.toLowerCase().toCharArray()){
if(hash[c - 'a']){
sb.append(c);
hash[c - 'a'] = false;
}
}
return sb.toString();
}
public static String getCommonCharacters(String... words) {
if (words == null || words.length == 0)
return "";
Set<Character> unique = words[0].chars().mapToObj(ch -> (char)ch).collect(Collectors.toCollection(TreeSet::new));
for (String word : words)
unique.retainAll(word.chars().mapToObj(ch -> (char)ch).collect(Collectors.toSet()));
return unique.stream().map(String::valueOf).collect(Collectors.joining());
}
Another variant without creating temporary Set and using Character.
public static String getCommonCharacters(String... words) {
if (words == null || words.length == 0)
return "";
int[] arr = new int[26];
boolean[] tmp = new boolean[26];
for (String word : words) {
Arrays.fill(tmp, false);
for (int i = 0; i < word.length(); i++) {
int pos = Character.toLowerCase(word.charAt(i)) - 'a';
if (tmp[pos])
continue;
tmp[pos] = true;
arr[pos]++;
}
}
StringBuilder buf = new StringBuilder(26);
for (int i = 0; i < arr.length; i++)
if (arr[i] == words.length)
buf.append((char)('a' + i));
return buf.toString();
}
Demo
System.out.println(getCommonCharacters("abcd", "bcde")); // bcd

Isogram- a word without repeated letters

I want to write Java code that detects a repeated letter in a word and prints the result, but mine keeps iterating and I have no idea how to go about it. Here is the code:
import java.util.*;
public class Isogram {
public static void main(String[] args){
Scanner input = new Scanner(System.in);
System.out.println("Enter the name: ");
String car = input.nextLine().toLowerCase();
char[] jhd = car.toCharArray();
Arrays.sort(jhd);
for(int ch = 0; ch < jhd.length; ch++){
try {
if (jhd[ch] == jhd[ch + 1]) {// || jhd[ch] == jhd[ch]){
System.out.print("THis is an Isogram");
} else {
System.out.println("Ripu from here");
}
} catch(ArrayIndexOutOfBoundsException ae) {
System.out.println(ae);
}
}
}
}
If you have an adjustment or better code, it would be helpful.
private static String isIsogram(String s){
String[] ary = s.split("");
Set<String> mySet = new HashSet<String>(Arrays.asList(ary));
if(s.length() == mySet.size()){
return "Yes!";
}else{
return "NO";
}
}
Create an array from the string.
Convert the array to a List and then create a Set from that List. A Set keeps only unique values.
If the set size equals the initial string length, then it is an isogram. If the set is smaller than the initial string, then there were duplicate characters.
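The Set idea above can also stop at the first repeated letter instead of building the whole set, because Set.add returns false when the element is already present. A minimal sketch, assuming the check should be case-insensitive and that every character (including spaces or hyphens) counts as a letter; the class name IsogramCheck is hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, introduced only for illustration.
final class IsogramCheck {

    static boolean isIsogram(String word) {
        Set<Character> seen = new HashSet<>();
        for (char c : word.toLowerCase().toCharArray()) {
            // add() returns false if c was already in the set: a repeated letter.
            if (!seen.add(c)) {
                return false;
            }
        }
        return true;
    }
}
```

Returning a boolean rather than "Yes!" or "NO" leaves the choice of message to the caller.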
public static boolean isIsogram(String str) {
boolean status = true;
char [] array = str.toCharArray();
Character[] charObjectArray = ArrayUtils.toObject(array);
Map<Character,Integer> map = new HashMap<>();
for (Character i : charObjectArray){
Integer value = 1;
if (map.containsKey(i)){
Integer val = map.get(i);
map.put(i,val+1);
}
else map.put(i,value);
}
for (Integer integer: map.values()){
if (integer>1){
status= false;
break;
}
else status = true;
}
return status;
}
}

Trouble breaking from a method

I am having difficulties with my method returning true. It is a boolean method that takes two words and tries to see if one can be turned into the other by transposing two neighboring letters. I have had no troubles getting the false boolean. When the code gets to the for loop with an if statement in it it runs fine but does not return true when the if statement is satisfied. For some reason it continues through the for loop. For example, when comparing "teh" and "the" when the loop hits 1 the if statement is satisfied but does not return true, the for loop just keeps running. What am I doing wrong?
public static boolean transposable(String word1, String word2)
{
ArrayList<Character> word1char = new ArrayList<Character>();
ArrayList<Character> word2char = new ArrayList<Character>();
int word1length = word1.length();
int word2length = word2.length();
int count = 0;
String w1 = word1.toUpperCase();
String w2 = word2.toUpperCase();
if(word1length != word2length)
{
return false;
}
for(int i = 0; i < word1length; i++)
{
char letter1 = w1.charAt(i);
word1char.add(letter1);
char letter2 = w2.charAt(i);
word2char.add(letter2);
}
for(int i = 0; i < word1length; i++)
{
char w1c = word1char.get(i);
char w2c = word2char.get(i);
if(w1c == w2c)
{
count++;
}
}
if(count < word1length - 2)
{
return false;
}
for(int i = 0; i < word1length; i++)
{
char w1c = word1char.get(i);
char w2c = word2char.get(i+1);
if(w1c == w2c)
{
return true;
}
}
return false;
}
As pointed out in the comments, this doesn't seem to be the easiest way around this problem. Here is a solution which tries to follow your logic and keeps the use of toUpperCase() and ArrayLists.
Going over your code, it looks like you were getting a bit lost in your logic. This is because you had one method trying to do everything. If you break things down into smaller methods, you avoid repeating code and keep things much cleaner. The code below was tested with Java 8 (although there is no reason why it should not work with Java 7).
public static void main(String args[]) {
String word1 = "Hello";
String word2 = "Hlelo";
transposable(word1, word2);
}
private static boolean transposable(String word1, String word2) {
// Get an ArrayList of characters for both words.
ArrayList<Character> word1CharacterList = listOfCharacters(word1);
ArrayList<Character> word2CharacterList = listOfCharacters(word2);
boolean areWordsEqual;
// Check that the size of the CharacterLists is the same
if (word1CharacterList.size() != word2CharacterList.size()) {
return false;
}
// check to see if words are equal to start with
areWordsEqual = checkIfTwoWordsAreTheSame(word1CharacterList, word2CharacterList);
System.out.print("\n" + "Words are equal to be begin with = " + areWordsEqual);
if (!areWordsEqual) {
/*
This loop i must start at 1 because you can't shift an ArrayList index of 0 to the left!
Loops through all the possible combinations and checks if there is a match.
*/
for (int i = 1; i < word1CharacterList.size(); i++) {
ArrayList<Character> adjustedArrayList = shiftNeighbouringCharacter(word2CharacterList, i);
areWordsEqual = checkIfTwoWordsAreTheSame(word1CharacterList, adjustedArrayList);
System.out.print("\n" + "Loop count " + i + " words are equal " + areWordsEqual + word1CharacterList + adjustedArrayList.toString());
if (areWordsEqual) {
break;
}
}
}
return areWordsEqual;
}
// takes in a String as a parameter and returns an ArrayList of Characters in the order of the String parameter.
private static ArrayList<Character> listOfCharacters(String word) {
ArrayList<Character> wordCharacters = new ArrayList<Character>();
String tempWord = word.toUpperCase();
for (int wordLength = 0; wordLength < tempWord.length(); wordLength++) {
Character currentCharacter = tempWord.charAt(wordLength);
wordCharacters.add(currentCharacter);
}
return wordCharacters;
}
// takes in two character arrayLists, and compares each index character.
private static boolean checkIfTwoWordsAreTheSame(ArrayList<Character> characterList1, ArrayList<Character> characterList2) {
// compare list1 against list two
for (int i = 0; i < characterList1.size(); i++) {
Character currentCharacterList1 = characterList1.get(i);
Character currentCharacterList2 = characterList2.get(i);
if (!currentCharacterList1.equals(currentCharacterList2)) {
return false;
}
}
return true;
}
// this method takes in an ArrayList of characters and the initial index that we want to shift one place to the left.
private static ArrayList<Character> shiftNeighbouringCharacter(ArrayList<Character> characterListToShift, int indexToShiftLeft) {
ArrayList<Character> tempCharacterList = new ArrayList<Character>();
int indexAtLeft = indexToShiftLeft - 1;
// fill the new arrayList full of nulls. We will have to remove these nulls later before we can add proper values in their place.
for (int i = 0; i < characterListToShift.size(); i++) {
tempCharacterList.add(null);
}
//get the current index of indexToShift
Character characterOfIndexToShift = characterListToShift.get(indexToShiftLeft);
Character currentCharacterInThePositionToShiftTo = characterListToShift.get(indexAtLeft);
tempCharacterList.remove(indexAtLeft);
tempCharacterList.add(indexAtLeft, characterOfIndexToShift);
tempCharacterList.remove(indexToShiftLeft);
tempCharacterList.add(indexToShiftLeft, currentCharacterInThePositionToShiftTo);
for (int i = 0; i < characterListToShift.size(); i++) {
if (tempCharacterList.get(i) == null) {
Character character = characterListToShift.get(i);
tempCharacterList.remove(i);
tempCharacterList.add(i, character);
}
}
return tempCharacterList;
}
Hope this helps. If you are still struggling then follow along in your debugger. :)
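For comparison, the same check can be done directly on the strings: find the first position where the words differ, verify that the two letters there are swapped, and require the rest to be identical. This is a sketch under stated assumptions, not the code above: the class name Transposition is hypothetical, and unlike the solution above it returns false for two identical words, since no swap is needed to turn one into the other.

```java
// Hypothetical helper class, introduced only for illustration.
final class Transposition {

    static boolean transposable(String word1, String word2) {
        String w1 = word1.toUpperCase();
        String w2 = word2.toUpperCase();
        if (w1.length() != w2.length()) {
            return false;
        }
        // Skip the common prefix to find the first differing position.
        int i = 0;
        while (i < w1.length() && w1.charAt(i) == w2.charAt(i)) {
            i++;
        }
        // No difference at all, or a difference only in the last letter:
        // there is no neighbouring pair left to swap.
        if (i >= w1.length() - 1) {
            return false;
        }
        // The two letters at i and i + 1 must be exchanged...
        if (w1.charAt(i) != w2.charAt(i + 1) || w1.charAt(i + 1) != w2.charAt(i)) {
            return false;
        }
        // ...and everything after the swapped pair must match exactly.
        return w1.substring(i + 2).equals(w2.substring(i + 2));
    }
}
```

This runs in a single pass over the words and never indexes past the end of either string, which is what the get(i+1) call in the question's last loop does on its final iteration.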

counting unique words in a string without using an array

So my task is to write a program that counts the number of words and unique words in a given string that we get from the user without using arrays.
I can do the first task and was wondering how I could go about doing the second part.
For counting the number of words in the string I have
boolean increment = false;
for (int i = 0; i < inputPhrase.length(); i++){
if(validChar(inputPhrase.charAt(i))) { // validChar(char c) is a simple method that checks whether c is a valid character
increment = true;
}
else if(increment){
phraseWordCount ++;
increment = false;
}
}
if(increment) phraseWordCount++; // in case the phrase ends with a valid character
(Originally I left this out and was off by one word.)
To count unique words, can I somehow modify this?
Here is a suggestion for how to do it without arrays:
1) Read every char until a blank is found and append each char to a second String.
2) If a blank is found, append it (or another token to separate words) to the second String.
2a) Read every word from the second String, comparing it to the current word from the input String.
public static void main(String[] args) {
final String input = "This is a sentence that is containing three times the word is";
final char token = '#';
String processedInput = "";
String currentWord = "";
int wordCount = 0;
int uniqueWordCount = 0;
for (char c : input.toCharArray()) {
if (c != ' ') {
processedInput += c;
currentWord += c;
} else {
processedInput += token;
wordCount++;
String existingWord = "";
int occurences = 0;
for (char c1 : processedInput.toCharArray()) {
if (c1 != token) {
existingWord += c1;
} else {
if (existingWord.equals(currentWord)) {
occurences++;
}
existingWord = "";
}
}
if (occurences <= 1) {
System.out.printf("New word: %s\n", currentWord);
uniqueWordCount++;
}
currentWord = "";
}
}
wordCount++;
System.out.printf("%d words total, %d unique\n", wordCount, uniqueWordCount);
}
Output
New word: This
New word: is
New word: a
New word: sentence
New word: that
New word: containing
New word: three
New word: times
New word: the
New word: word
12 words total, 10 unique
Using the Collections API you can count the unique words with the following method:
private int countWords(final String text) {
Scanner scanner = new Scanner(text);
Set<String> uniqueWords = new HashSet<String>();
while (scanner.hasNext()) {
uniqueWords.add(scanner.next());
}
scanner.close();
return uniqueWords.size();
}
If it is possible that you get normal sentences with punctuation marks you can change the second line to:
Scanner scanner = new Scanner(text.replaceAll("[^0-9a-zA-Z\\s]", "").toLowerCase());
Every time a word ends, findUpTo checks whether the word already occurs in the input before the start of that word. So "if if if" would count as one unique and three total words.
/**
* Created for http://stackoverflow.com/q/22981210/1266906
*/
public class UniqueWords {
public static void main(String[] args) {
String inputPhrase = "one two ones two three one";
countWords(inputPhrase);
}
private static void countWords(String inputPhrase) {
boolean increment = false;
int wordStart = -1;
int phraseWordCount = 0;
int uniqueWordCount = 0;
for (int i = 0; i < inputPhrase.length(); i++){
if(validChar(inputPhrase.charAt(i))) { // validChar(char c) is a simple method that checks whether c is a valid character
increment = true;
if(wordStart == -1) {
wordStart = i;
}
} else if(increment) {
phraseWordCount++;
final String lastWord = inputPhrase.substring(wordStart, i);
boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
if(unique) {
uniqueWordCount++;
}
increment = false;
wordStart = -1;
}
}
if(increment) {
phraseWordCount++; //in the case the last word is a valid character
final String lastWord = inputPhrase.substring(wordStart, inputPhrase.length());
boolean unique = findUpTo(lastWord, inputPhrase, wordStart);
if(unique) {
uniqueWordCount++;
}
}
System.out.println("Words: "+phraseWordCount);
System.out.println("Unique: "+uniqueWordCount);
}
private static boolean findUpTo(String needle, String haystack, int lastPos) {
boolean previousValid = false;
boolean unique = true;
for(int j = 0; unique && j < lastPos - needle.length(); j++) {
final boolean nextValid = validChar(haystack.charAt(j));
if(!previousValid && nextValid) {
// Word start
previousValid = true;
for (int k = 0; k < lastPos - j; k++) {
if(k == needle.length()) {
// We matched all characters. Only if the word isn't finished it is unique
unique = validChar(haystack.charAt(j+k));
break;
}
if (needle.charAt(k) != haystack.charAt(j+k)) {
break;
}
}
} else {
previousValid = nextValid;
}
}
return unique;
}
private static boolean validChar(char c) {
return Character.isAlphabetic(c);
}
}
