I have a program that checks whether two words are anagrams of one another. A few inputs do not work properly, and I would appreciate any help; a solution that is not too advanced would be great, as I am a first-year programmer. "schoolmaster" and "theclassroom" are anagrams of one another. However, when I change "theclassroom" to "theclafsroom", it still says they are anagrams. What am I doing wrong?
import java.util.ArrayList;
public class AnagramCheck {
public static void main(String args[]) {
String phrase1 = "tbeclassroom";
phrase1 = (phrase1.toLowerCase()).trim();
char[] phrase1Arr = phrase1.toCharArray();
String phrase2 = "schoolmaster";
phrase2 = (phrase2.toLowerCase()).trim();
ArrayList<Character> phrase2ArrList = convertStringToArraylist(phrase2);
if (phrase1.length() != phrase2.length()) {
System.out.print("There is no anagram present.");
} else {
boolean isFound = true;
for (int i = 0; i < phrase1Arr.length; i++) {
for (int j = 0; j < phrase2ArrList.size(); j++) {
if (phrase1Arr[i] == phrase2ArrList.get(j)) {
System.out.print("There is a common element.\n");
isFound = true;
phrase2ArrList.remove(j);
}
}
if (isFound == false) {
System.out.print("There are no anagrams present.");
return;
}
}
System.out.printf("%s is an anagram of %s", phrase1, phrase2);
}
}
public static ArrayList<Character> convertStringToArraylist(String str) {
ArrayList<Character> charList = new ArrayList<Character>();
for (int i = 0; i < str.length(); i++) {
charList.add(str.charAt(i));
}
return charList;
}
}
Two words are anagrams of each other if they contain the same number of characters and the same characters. You should only need to sort the characters in lexicographic order, and determine if all the characters in one string are equal to and in the same order as all of the characters in the other string.
Here's a code example. Look into Arrays in the API to understand what's going on here.
public boolean isAnagram(String firstWord, String secondWord) {
char[] word1 = firstWord.replaceAll("[\\s]", "").toCharArray();
char[] word2 = secondWord.replaceAll("[\\s]", "").toCharArray();
Arrays.sort(word1);
Arrays.sort(word2);
return Arrays.equals(word1, word2);
}
Fastest algorithm would be to map each of the 26 English characters to a unique prime number. Then calculate the product of the string. By the fundamental theorem of arithmetic, 2 strings are anagrams if and only if their products are the same.
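A minimal sketch of the prime-product idea, under a few assumptions of mine: only the letters a to z are counted, case is ignored, and BigInteger is used so that the product cannot overflow. The class and method names are made up for illustration.

```java
import java.math.BigInteger;
import java.util.Locale;

class PrimeProductAnagram {
    // The first 26 primes, one per letter 'a'..'z'.
    private static final int[] PRIMES = {
        2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
    };

    // Multiplies the primes of all letters in s; anything outside a..z is skipped (assumption).
    static BigInteger primeProduct(String s) {
        BigInteger product = BigInteger.ONE;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                product = product.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
            }
        }
        return product;
    }

    // Unique factorization: equal products mean equal letter counts.
    static boolean isAnagram(String a, String b) {
        return primeProduct(a).equals(primeProduct(b));
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("schoolmaster", "theclassroom"));
        System.out.println(isAnagram("schoolmaster", "theclafsroom"));
    }
}
```

BigInteger keeps the result exact, at the cost of multiplications that get slower as the product grows.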
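If you sort either array, the solution becomes O(n log n), but if you use a hash map, it is O(n). Tested and working.
import java.util.HashMap;
import java.util.Map;

// Wrapped in a method so that the return statements compile;
// for example, isAnagram("test", "tes") returns false.
static boolean isAnagram(String first, String second) {
    char[] word1 = first.toCharArray();
    char[] word2 = second.toCharArray();
    Map<Character, Integer> lettersInWord1 = new HashMap<Character, Integer>();
    // Count up for every character of the first word
    for (char c : word1) {
        int count = 1;
        if (lettersInWord1.containsKey(c)) {
            count = lettersInWord1.get(c) + 1;
        }
        lettersInWord1.put(c, count);
    }
    // Count down for every character of the second word
    for (char c : word2) {
        int count = -1;
        if (lettersInWord1.containsKey(c)) {
            count = lettersInWord1.get(c) - 1;
        }
        lettersInWord1.put(c, count);
    }
    // Anagrams leave every count at exactly zero
    for (char c : lettersInWord1.keySet()) {
        if (lettersInWord1.get(c) != 0) {
            return false;
        }
    }
    return true;
}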
Here's a simple fast O(n) solution without using sorting or multiple loops or hash maps. We increment the count of each character in the first array and decrement the count of each character in the second array. If the resulting counts array is full of zeros, the strings are anagrams. Can be expanded to include other characters by increasing the size of the counts array.
class AnagramsFaster{
private static boolean compare(String a, String b){
char[] aArr = a.toLowerCase().toCharArray(), bArr = b.toLowerCase().toCharArray();
if (aArr.length != bArr.length)
return false;
int[] counts = new int[26]; // An array to hold the number of occurrences of each character
for (int i = 0; i < aArr.length; i++){
counts[aArr[i]-97]++; // Increment the count of the character at i
counts[bArr[i]-97]--; // Decrement the count of the character at i
}
// If the strings are anagrams, the counts array will be full of zeros
for (int i = 0; i<26; i++)
if (counts[i] != 0)
return false;
return true;
}
public static void main(String[] args){
System.out.println(compare(args[0], args[1]));
}
}
Lots of people have presented solutions, but I just want to talk about the algorithmic complexity of some of the common approaches:
The simple "sort the characters using Arrays.sort()" approach is going to be O(N log N).
If you use radix sorting, that reduces to O(N) with O(M) space, where M is the number of distinct characters in the alphabet. (That is 26 in English ... but in theory we ought to consider multi-lingual anagrams.)
The "count the characters" using an array of counts is also O(N) ... and faster than radix sort because you don't need to reconstruct the sorted string. Space usage will be O(M).
A "count the characters" approach using a dictionary, hashmap, treemap, or equivalent will be slower than the array approach, unless the alphabet is huge.
The elegant "product-of-primes" approach is unfortunately O(N^2) in the worst case. This is because for long-enough words or phrases, the product of the primes won't fit into a long. That means that you'd need to use BigInteger, and N times multiplying a BigInteger by a small constant is O(N^2).
For a hypothetical large alphabet, the scaling factor is going to be large. The worst-case space usage to hold the product of the primes as a BigInteger is (I think) O(N*logM).
A hashcode based approach is usually O(N) if the words are not anagrams. If the hashcodes are equal, then you still need to do a proper anagram test. So this is not a complete solution.
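To illustrate the last point, here is a minimal sketch in which a cheap order-independent hash rejects most non-anagrams and a full count confirms the rest. The hash (a plain sum of char values) and all names are my own choices for illustration, not taken from any posted answer.

```java
import java.util.HashMap;
import java.util.Map;

class HashPrefilterAnagram {
    // Order-independent hash: anagrams always share it,
    // but different letter multisets can collide ("ac" and "bb").
    static int orderIndependentHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Full check; assumes the caller has already verified equal lengths.
    static boolean countsMatch(String a, String b) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : a.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        for (char c : b.toCharArray()) {
            Integer left = counts.get(c);
            if (left == null || left == 0) {
                return false; // b uses c more often than a does
            }
            counts.put(c, left - 1);
        }
        return true;
    }

    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) return false;
        if (orderIndependentHash(a) != orderIndependentHash(b)) return false; // quick reject
        return countsMatch(a, b); // equal hashes still need the real test
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("listen", "silent"));
        System.out.println(isAnagram("ac", "bb"));
    }
}
```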
I would also like to highlight that most of the posted answers assume that each code-point in the input string is represented as a single char value. This is not a valid assumption for code-points outside of the BMP (plane 0); e.g. if an input string contains emojis.
The solutions that make the invalid assumption will probably work most of the time anyway. A code-point outside of the BMP will be represented in the string as two char values: a high surrogate followed by a low surrogate. If the strings contain only one such code-point, we can get away with treating the char values as if they were code-points. However, we can get into trouble when the strings being tested contain 2 or more such code-points. Then the faulty algorithms will fail to distinguish some cases. For example, [SH1, SL1, SH2, SL2] versus [SH1, SL2, SH2, SL1], where SH<n> and SL<n> denote high and low surrogates respectively. The net result will be false anagrams.
Alex Salauyou's answer gives a couple of solutions that will work for all valid Unicode code-points.
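The [SH1, SL1, SH2, SL2] case can be reproduced with a small sketch. The two code points (U+1F600 and U+10400) are arbitrary picks of mine with different high and low surrogates, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class SurrogateAnagramDemo {
    // Compares sorted UTF-16 code units, as the char-based answers do.
    static boolean charBased(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Compares sorted code points instead.
    static boolean codePointBased(String a, String b) {
        return Arrays.equals(a.codePoints().sorted().toArray(),
                             b.codePoints().sorted().toArray());
    }

    // Two supplementary code points: four chars [SH1, SL1, SH2, SL2].
    static String sample() {
        return new StringBuilder().appendCodePoint(0x1F600).appendCodePoint(0x10400).toString();
    }

    // Turns [SH1, SL1, SH2, SL2] into [SH1, SL2, SH2, SL1].
    static String swapLowSurrogates(String s) {
        char[] units = s.toCharArray();
        char tmp = units[1];
        units[1] = units[3];
        units[3] = tmp;
        return new String(units);
    }

    public static void main(String[] args) {
        String a = sample();
        String b = swapLowSurrogates(a);
        System.out.println("char-based: " + charBased(a, b));
        System.out.println("code-point-based: " + codePointBased(a, b));
    }
}
```

The swapped string is still well-formed UTF-16, but it holds two different code points, so only the code-point comparison tells the strings apart.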
O(n) solution without any kind of sorting and using only one map.
public boolean isAnagram(String leftString, String rightString) {
if (leftString == null || rightString == null) {
return false;
} else if (leftString.length() != rightString.length()) {
return false;
}
Map<Character, Integer> occurrencesMap = new HashMap<>();
for(int i = 0; i < leftString.length(); i++){
char charFromLeft = leftString.charAt(i);
int nrOfCharsInLeft = occurrencesMap.containsKey(charFromLeft) ? occurrencesMap.get(charFromLeft) : 0;
occurrencesMap.put(charFromLeft, ++nrOfCharsInLeft);
char charFromRight = rightString.charAt(i);
int nrOfCharsInRight = occurrencesMap.containsKey(charFromRight) ? occurrencesMap.get(charFromRight) : 0;
occurrencesMap.put(charFromRight, --nrOfCharsInRight);
}
for(int occurrencesNr : occurrencesMap.values()){
if(occurrencesNr != 0){
return false;
}
}
return true;
}
And a less generic but slightly faster solution. You have to supply your alphabet here:
public boolean isAnagram(String leftString, String rightString) {
if (leftString == null || rightString == null) {
return false;
} else if (leftString.length() != rightString.length()) {
return false;
}
char letters[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'};
Map<Character, Integer> occurrencesMap = new HashMap<>();
for (char l : letters) {
occurrencesMap.put(l, 0);
}
for(int i = 0; i < leftString.length(); i++){
char charFromLeft = leftString.charAt(i);
Integer nrOfCharsInLeft = occurrencesMap.get(charFromLeft);
occurrencesMap.put(charFromLeft, ++nrOfCharsInLeft);
char charFromRight = rightString.charAt(i);
Integer nrOfCharsInRight = occurrencesMap.get(charFromRight);
occurrencesMap.put(charFromRight, --nrOfCharsInRight);
}
for(Integer occurrencesNr : occurrencesMap.values()){
if(occurrencesNr != 0){
return false;
}
}
return true;
}
We're walking two equal-length strings and tracking the differences between them. We don't care what the differences are, we just want to know if they have the same characters or not. We can do this in a single O(n) pass without any post-processing (or a lot of primes).
public class TestAnagram {
public static boolean isAnagram(String first, String second) {
String positive = first.toLowerCase();
String negative = second.toLowerCase();
if (positive.length() != negative.length()) {
return false;
}
int[] counts = new int[26];
int diff = 0;
for (int i = 0; i < positive.length(); i++) {
int pos = (int) positive.charAt(i) - 97; // convert the char into an array index
if (counts[pos] >= 0) { // the other string doesn't have this
diff++; // an increase in differences
} else { // it does have it
diff--; // a decrease in differences
}
counts[pos]++; // track it
int neg = (int) negative.charAt(i) - 97;
if (counts[neg] <= 0) { // the other string doesn't have this
diff++; // an increase in differences
} else { // it does have it
diff--; // a decrease in differences
}
counts[neg]--; // track it
}
return diff == 0;
}
public static void main(String[] args) {
System.out.println(isAnagram("zMarry", "zArmry")); // true
System.out.println(isAnagram("basiparachromatin", "marsipobranchiata")); // true
System.out.println(isAnagram("hydroxydeoxycorticosterones", "hydroxydesoxycorticosterone")); // true
System.out.println(isAnagram("hydroxydeoxycorticosterones", "hydroxydesoxycorticosterons")); // false
System.out.println(isAnagram("zArmcy", "zArmry")); // false
}
}
Yes, this code depends on the ASCII English character set of lowercase characters, but it shouldn't be hard to adapt to other languages. You can always use a Map<Character, Integer> to track the same information; it'll just be slower.
By using more memory (a HashMap of at most N/2 elements), we do not need to sort the strings.
public static boolean areAnagrams(String one, String two) {
if (one.length() == two.length()) {
String s0 = one.toLowerCase();
String s1 = two.toLowerCase();
HashMap<Character, Integer> chars = new HashMap<Character, Integer>(one.length());
Integer count;
for (char c : s0.toCharArray()) {
count = chars.get(c);
count = Integer.valueOf(count != null ? count + 1 : 1);
chars.put(c, count);
}
for (char c : s1.toCharArray()) {
count = chars.get(c);
if (count == null) {
return false;
} else {
count--;
chars.put(c, count);
}
}
for (Integer i : chars.values()) {
if (i != 0) {
return false;
}
}
return true;
} else {
return false;
}
}
This function actually runs in O(N) ... instead of O(N log N) for the solution that sorts the strings. If I were to assume that you are going to use only alphabetic characters, I could use just an array of 26 ints (from a to z without accents or decorations) instead of the hash map.
If we define:
N = |one| + |two|
we do one iteration over N (once over one to increment the counters, and once over two to decrement them).
Then to check the totals we iterate over at most N/2 entries.
The other algorithms described have one advantage: they do not use extra memory, assuming that Arrays.sort uses in-place versions of quicksort or merge sort. But since we are talking about anagrams, I will assume that we are talking about human languages, so words should not be long enough to cause memory issues.
package Algorithms;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import javax.swing.JOptionPane;
/**
*
* @author Mokhtar
*/
public class Anagrams {
// Write a program to check if two words are anagrams
public static void main(String[] args) {
Anagrams an=new Anagrams();
ArrayList<String> l=new ArrayList<String>();
String result=JOptionPane.showInputDialog("How many words to test anagrams");
if(Integer.parseInt(result) >1)
{
for(int i=0;i<Integer.parseInt(result);i++)
{
String word=JOptionPane.showInputDialog("Enter word #"+i);
l.add(word);
}
System.out.println(an.isanagrams(l));
}
else
{
JOptionPane.showMessageDialog(null, "Can not be tested, \nYou can test two words or more");
}
}
private static String sortString( String w )
{
char[] ch = w.toCharArray();
Arrays.sort(ch);
return new String(ch);
}
public boolean isanagrams(ArrayList<String> l)
{
boolean isanagrams=true;
ArrayList<String> anagrams = null;
HashMap<String, ArrayList<String>> map = new HashMap<String, ArrayList<String>>();
for(int i=0;i<l.size();i++)
{
String word = l.get(i);
String sortedWord = sortString(word);
anagrams = map.get( sortedWord );
if( anagrams == null ) anagrams = new ArrayList<String>();
anagrams.add(word);
map.put(sortedWord, anagrams);
}
for(int h=0;h<l.size();h++)
{
if(!anagrams.contains(l.get(h)))
{
isanagrams=false;
break;
}
}
return isanagrams;
//}
}
}
I am a C++ developer and the code below is in C++. I believe the fastest and easiest way to go about it would be the following:
Create a vector of ints of size 26, with all slots initialized to 0, and count each character of the string in the corresponding slot. The vector is in alphabetical order, so if the first letter in the string is z, its count goes in myvector[25] (indices run from 0 to 25). Note: This relies on the ASCII values of the characters, so your code will look something like this:
string s = "zadg";
for(int i =0; i < s.size(); ++i){
myvector[s[i] - 'a'] = myvector[s[i] - 'a'] + 1;
}
So inserting all the elements would take O(n) time, as you would only traverse the string once. You can now do the exact same thing for the second string, and that too would take O(n) time. You can then compare the two vectors by checking whether the counters in each slot are the same. If they are, you had the same number of EACH character in both strings, and thus they are anagrams. Comparing the two vectors takes time proportional to their size (26 slots), which is constant and independent of n, as you only traverse them once.
Note: The code only works for a single word of characters. If you have spaces, and numbers and symbols, you can just create a vector of size 96 (ASCII characters 32-127) and instead of saying - 'a' you would say - ' ' as the space character is the first one in the ASCII list of characters.
I hope that helps. If I have made a mistake somewhere, please leave a comment.
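For readers following the thread in Java, the 96-slot variant described in the note could look roughly like the sketch below. It assumes that every character is in the range 32 to 127 and throws otherwise; the class and method names are made up.

```java
class PrintableAsciiAnagram {
    private static final int FIRST = ' '; // code 32, the first slot
    private static final int SIZE = 96;   // codes 32..127

    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int[] counts = new int[SIZE];
        for (int i = 0; i < a.length(); i++) {
            counts[index(a.charAt(i))]++; // count up for the first string
            counts[index(b.charAt(i))]--; // count down for the second string
        }
        for (int count : counts) {
            if (count != 0) {
                return false;
            }
        }
        return true;
    }

    // Maps a character to its slot, rejecting anything outside 32..127.
    private static int index(char c) {
        if (c < FIRST || c >= FIRST + SIZE) {
            throw new IllegalArgumentException("Unsupported character code: " + (int) c);
        }
        return c - FIRST;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("a1 b!", "!b 1a"));
        System.out.println(isAnagram("abc", "abd"));
    }
}
```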
So far all proposed solutions work with separate char items, not code points. I'd like to propose two solutions to properly handle surrogate pairs as well (those are characters from U+10000 to U+10FFFF, composed of two char items).
1) One-line O(n logn) solution which utilizes Java 8 CharSequence.codePoints() stream:
static boolean areAnagrams(CharSequence a, CharSequence b) {
return Arrays.equals(a.codePoints().sorted().toArray(),
b.codePoints().sorted().toArray());
}
2) Less elegant O(n) solution (in fact, it will be faster only for long strings with low chances to be anagrams):
static boolean areAnagrams(CharSequence a, CharSequence b) {
int len = a.length();
if (len != b.length())
return false;
// collect codepoint occurrences in "a"
Map<Integer, Integer> ocr = new HashMap<>(64);
a.codePoints().forEach(c -> ocr.merge(c, 1, Integer::sum));
// for each codepoint in "b", look for matching occurrence
for (int i = 0, c = 0; i < len; i += Character.charCount(c)) {
int cc = ocr.getOrDefault((c = Character.codePointAt(b, i)), 0);
if (cc == 0)
return false;
ocr.put(c, cc - 1);
}
return true;
}
Thanks for suggesting that I add comments. While adding them, I found that the logic was incorrect. I have corrected it and added a comment for each piece of code.
// Time complexity: O(N), where N is the number of characters in the string
// Required space: constant
// Works for strings that contain only ASCII chars
private static boolean isAnagram(String s1, String s2) {
// if length of both string's are not equal then they are not anagram of each other
if(s1.length() != s2.length())return false;
// array to store the presence of a character with number of occurrences.
int []seen = new int[256];
// initialize the array with zero. Do not need to initialize specifically since by default element will initialized by 0.
// Added this is just increase the readability of the code.
Arrays.fill(seen, 0);
// convert each string to lower case if you want to make ABC and aBC as anagram, other wise no need to change the case.
s1 = s1.toLowerCase();
s2 = s2.toLowerCase();
// iterate through the first string and count the occurrences of each character
for(int i =0; i < s1.length(); i++){
seen[s1.charAt(i)] = seen[s1.charAt(i)] +1;
}
// iterate through the second string; if any char has 0 occurrences left, return false, because s2 uses a char more often than s1 does.
// otherwise reduce the occurrences by one every time.
for(int i =0; i < s2.length(); i++){
if(seen[s2.charAt(i)] ==0)return false;
seen[s2.charAt(i)] = seen[s2.charAt(i)]-1;
}
// now if both strings have the same occurrences of each character, the seen array must contain only zeros. If any element is non-zero, return false:
// some character either does not appear in one of the strings or its number of occurrences differs
for(int i = 0; i < 256; i++){
if(seen[i] != 0)return false;
}
return true;
}
IMHO, the most efficient solution was provided by @Siguza. I have extended it to cover strings with spaces, e.g. "William Shakespeare", "I am a weakish speller", "School master", "The classroom"
public int getAnagramScore(String word, String anagram) {
if (word == null || anagram == null) {
throw new NullPointerException("Both, word and anagram, must be non-null");
}
char[] wordArray = word.trim().toLowerCase().toCharArray();
char[] anagramArray = anagram.trim().toLowerCase().toCharArray();
int[] alphabetCountArray = new int[26];
int reference = 'a';
for (int i = 0; i < wordArray.length; i++) {
if (!Character.isWhitespace(wordArray[i])) {
alphabetCountArray[wordArray[i] - reference]++;
}
}
for (int i = 0; i < anagramArray.length; i++) {
if (!Character.isWhitespace(anagramArray[i])) {
alphabetCountArray[anagramArray[i] - reference]--;
}
}
for (int i = 0; i < 26; i++)
if (alphabetCountArray[i] != 0)
return 0;
return word.length();
}
// Returns 0 when the strings are anagrams. Caution: 0 is necessary but not sufficient, since different letters with the same total (e.g. "ac" and "bb") also give 0.
public static int isAnagram(String str1, String str2) {
int value = 0;
if (str1.length() == str2.length()) {
for (int i = 0; i < str1.length(); i++) {
value = value + str1.charAt(i);
value = value - str2.charAt(i);
}
} else {
value = -1;
}
return value;
}
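Many complicated answers here. Based on the accepted answer and the comment mentioning the 'ac'-'bb' issue (assuming A=65, B=66, C=67), we could use the square of each integer that represents a char, which handles that case. Note that in Java the ^ operator is XOR, not exponentiation, so the square is written as a multiplication below. Sums of squares can still collide for non-anagrams (for example "knn" and "lol"), so this remains a quick filter rather than a complete test:
public boolean anagram(String s, String t) {
    if (s.length() != t.length())
        return false;
    long value = 0; // long, so that the running sum does not overflow
    for (int i = 0; i < s.length(); i++) {
        long sc = s.charAt(i);
        long tc = t.charAt(i);
        value += sc * sc; // square via multiplication; sc ^ 2 would be XOR
        value -= tc * tc;
    }
    return value == 0;
}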
A similar answer may have been posted in C++; here it is again in Java. Note that the most elegant way would be to use a Trie to store the characters in sorted order; however, that is a more complex solution. One way is to use a HashSet to store all the words we are comparing and then compare them one by one. To compare them, make an array of counts whose index represents the ASCII value of a character (shifted by a normalizer, since e.g. the ASCII value of 'a' is 97) and whose value is the occurrence count of that character. This will run in O(n) time and use O(m + z) space, where m is the size of currentWord and z the size of storedWord, for both of which we create a char[], plus two fixed-size count arrays.
private static final int totalAlphabets = 26;   // assumption: lowercase 'a'..'z' only
private static final int indexNormalizer = 'a'; // 97, so that 'a' maps to index 0

public static boolean makeAnagram(String currentWord, String storedWord){
    if(currentWord.length() != storedWord.length()) return false;//words must be same length
    //int[] rather than Integer[]: slots start at 0 instead of null and compare by value
    int[] currentWordChars = new int[totalAlphabets];
    int[] storedWordChars = new int[totalAlphabets];
    //create temp arrays to compare the words
    storeWordCharacterInArray(currentWordChars, currentWord);
    storeWordCharacterInArray(storedWordChars, storedWord);
    for(int i = 0; i < totalAlphabets; i++){
        //compare the new word to the current charList to see if anagram is possible
        if(currentWordChars[i] != storedWordChars[i]) return false;
    }
    return true;//and store this word in the HashSet of word in the Heap
}
//for each word store its characters
public static void storeWordCharacterInArray(int[] characterList, String word){
    for(char c: word.toCharArray()){
        characterList[c - indexNormalizer] += 1;
    }
}
How a mathematician might think about the problem before writing any code:
The relation "are anagrams" between strings is an equivalence relation, so partitions the set of all strings into equivalence classes.
Suppose we had a rule to choose a representative (crib) from each class, then it's easy to test whether two classes are the same by comparing their representatives.
An obvious representative for a set of strings is "the smallest element by lexicographic order", which is easy to compute from any element by sorting. For example, the representative of the anagram class containing 'hat' is 'aht'.
In your example "schoolmaster" and "theclassroom" are anagrams because they are both in the anagram class with crib "acehlmoorsst".
In pseudocode:
>>> def crib(word):
... return sorted(word)
...
>>> crib("schoolmaster") == crib("theclassroom")
True
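The same idea in Java, as a minimal sketch: the crib is the sorted character sequence, and two strings are anagrams exactly when their cribs are equal. The class and method names are mine.

```java
import java.util.Arrays;

class AnagramCrib {
    // The representative of a word's anagram class: its characters in sorted order.
    static String crib(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Two words are in the same class exactly when their cribs match.
    static boolean areAnagrams(String a, String b) {
        return crib(a).equals(crib(b));
    }

    public static void main(String[] args) {
        System.out.println(crib("hat"));
        System.out.println(areAnagrams("schoolmaster", "theclassroom"));
    }
}
```

Because the crib is an ordinary String, it can also serve as a map key for grouping a word list into anagram classes.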
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
/**
* Check if Anagram by Prime Number Logic
* @author Pallav
*
*/
public class Anagram {
public static void main(String args[]) {
System.out.println(isAnagram(args[0].toUpperCase(),
args[1].toUpperCase()));
}
/**
*
* @param word : The String 1
* @param anagram_word : The String 2 with which Anagram to be verified
* @return true or false based on Anagram
*/
public static Boolean isAnagram(String word, String anagram_word) {
//If length is different return false
if (word.length() != anagram_word.length()) {
return false;
}
char[] words_char = word.toCharArray();//Get the Char Array of First String
char[] anagram_word_char = anagram_word.toCharArray();//Get the Char Array of Second String
int words_char_num = 1;//Initialize Multiplication Factor to 1 (caution: an int product overflows for longer words, which can produce false positives)
int anagram_word_num = 1;//Initialize Multiplication Factor to 1 for String 2
Map<Character, Integer> wordPrimeMap = wordPrimeMap();//Get the Prime numbers Mapped to each alphabets in English
for (int i = 0; i < words_char.length; i++) {
words_char_num *= wordPrimeMap.get(words_char[i]);//get Multiplication value for String 1
}
for (int i = 0; i < anagram_word_char.length; i++) {
anagram_word_num *= wordPrimeMap.get(anagram_word_char[i]);//get Multiplication value for String 2
}
return anagram_word_num == words_char_num;
}
/**
* Get the Prime numbers Mapped to each alphabets in English
* @return
*/
public static Map<Character, Integer> wordPrimeMap() {
List<Integer> primes = primes(26);
int k = 65;
Map<Character, Integer> map = new TreeMap<Character, Integer>();
for (int i = 0; i < primes.size(); i++) {
Character character = (char) k;
map.put(character, primes.get(i));
k++;
}
// System.out.println(map);
return map;
}
/**
* Get the first N prime numbers, where N is at least 2
* @param N : Number of Prime Numbers
* @return
*/
public static List<Integer> primes(Integer N) {
List<Integer> primes = new ArrayList<Integer>();
primes.add(2);
primes.add(3);
int n = 5;
int k = 0;
do {
boolean is_prime = true;
for (int i = 2; i <= Math.sqrt(n); i++) {
if (n % i == 0) {
is_prime = false;
break;
}
}
if (is_prime == true) {
primes.add(n);
}
n++;
// System.out.println(k);
} while (primes.size() < N);
// }
return primes;
}
}
Here is my solution. First explode the strings into char arrays, then sort them, and then compare whether they are equal. Because of the two sorts, the time complexity of this code is O(a log a + b log b); since the method returns early unless a = b, that is O(a log a).
public boolean isAnagram(String s1, String s2) {
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
if (s1.length() != s2.length())
return false;
char arr1[] = s1.toCharArray();
char arr2[] = s2.toCharArray();
Arrays.sort(arr1);
Arrays.sort(arr2);
for (char c : arr1) {
sb1.append(c);
}
for (char c : arr2) {
sb2.append(c);
}
System.out.println(sb1.toString());
System.out.println(sb2.toString());
if (sb1.toString().equals(sb2.toString()))
return true;
else
return false;
}
There are 3 solutions I can think of:
Using sorting
# O(NlogN) + O(MlogM) time, O(N+M) space (sorted() builds new lists)
def solve_by_sort(word1, word2):
return sorted(word1) == sorted(word2)
Using letter frequency count
# O(N+M) time, O(N+M) space
def solve_by_letter_frequency(word1, word2):
from collections import Counter
return Counter(word1) == Counter(word2)
Using the concept of prime factorization. (assign primes to each letter)
import operator
from functools import reduce
# O(N) multiplications - prime factorization (the product grows with word length, so time and space are not strictly O(N) and O(1))
def solve_by_prime_number_hash(word1, word2):
return get_prime_number_hash(word1) == get_prime_number_hash(word2)
def get_prime_number_hash(word):
letter_code = {'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13, 'g': 17, 'h': 19, 'i': 23, 'j': 29, 'k': 31,'l': 37, 'm': 41, 'n': 43,'o': 47, 'p': 53, 'q': 59, 'r': 61, 's': 67, 't': 71, 'u': 73, 'v': 79, 'w': 83, 'x': 89, 'y': 97,'z': 101}
return 0 if not word else reduce(operator.mul, [letter_code[letter] for letter in word])
I have put more detailed analysis of these in my medium story.
Sorting approach is not the best one. It takes O(n) space and O(nlogn) time. Instead, make a hash map of characters and count them (increment characters that appear in the first string and decrement characters that appear in the second string). When some count reaches zero, remove it from hash. Finally, if two strings are anagrams, then the hash table will be empty in the end - otherwise it will not be empty.
Couple of important notes: (1) Ignore letter case and (2) Ignore white space.
Here is the detailed analysis and implementation in C#: Testing If Two Strings are Anagrams
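A minimal Java sketch of the approach described above. It is not a translation of the linked C# implementation; the class and helper names are made up, and ignoring case and whitespace follows the two notes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MapRemovalAnagram {
    static boolean isAnagram(String a, String b) {
        Map<Character, Integer> balance = new HashMap<>();
        adjust(balance, a, +1); // characters of the first string count up
        adjust(balance, b, -1); // characters of the second string count down
        return balance.isEmpty(); // every count returned to zero and was removed
    }

    // Applies delta for each non-whitespace character, dropping entries that reach zero.
    private static void adjust(Map<Character, Integer> balance, String s, int delta) {
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            int updated = balance.getOrDefault(c, 0) + delta;
            if (updated == 0) {
                balance.remove(c);
            } else {
                balance.put(c, updated);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("School master", "The classroom"));
        System.out.println(isAnagram("abc", "abd"));
    }
}
```

Removing entries at zero means the final check is a single isEmpty() call instead of a scan over all counts.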
Some other solution without sorting.
public static boolean isAnagram(String s1, String s2){
//case insensitive anagram
StringBuffer sb = new StringBuffer(s2.toLowerCase());
for (char c: s1.toLowerCase().toCharArray()){
if (Character.isLetter(c)){
int index = sb.indexOf(String.valueOf(c));
if (index == -1){
//char does not exist in other s2
return false;
}
sb.deleteCharAt(index);
}
}
for (char c: sb.toString().toCharArray()){
//only allow whitespace as left overs
if (!Character.isWhitespace(c)){
return false;
}
}
return true;
}
A simple method to figure out whether the testString is an anagram of the baseString.
private static boolean isAnagram(String baseString, String testString){
//Assume that there are no empty spaces in either string.
if(baseString.length() != testString.length()){
System.out.println("The 2 given words cannot be anagram since their lengths are different");
return false;
}
else{
if(baseString.length() == testString.length()){
if(baseString.equalsIgnoreCase(testString)){
System.out.println("The 2 given words are anagram since they are identical.");
return true;
}
else{
List<Character> list = new ArrayList<>();
for(Character ch : baseString.toLowerCase().toCharArray()){
list.add(ch);
}
System.out.println("List is : "+ list);
for(Character ch : testString.toLowerCase().toCharArray()){
if(list.contains(ch)){
list.remove(ch);
}
}
if(list.isEmpty()){
System.out.println("The 2 words are anagrams");
return true;
}
}
}
}
return false;
}
Sorry, the solution is in C#, but I think the different elements used to arrive at the solution are quite intuitive. A slight tweak is required for hyphenated words, but for normal words it should work fine.
internal bool isAnagram(string input1,string input2)
{
Dictionary<char, int> outChars = AddToDict(input2.ToLower().Replace(" ", ""));
input1 = input1.ToLower().Replace(" ","");
foreach(char c in input1)
{
if (!outChars.ContainsKey(c))
return false; // input1 has a character that input2 lacks, or has more of it
if (outChars[c] > 1)
outChars[c] -= 1;
else
outChars.Remove(c);
}
return outChars.Count == 0;
}
private Dictionary<char, int> AddToDict(string input)
{
Dictionary<char, int> inputChars = new Dictionary<char, int>();
foreach(char c in input)
{
if(inputChars.ContainsKey(c))
{
inputChars[c] += 1;
}
else
{
inputChars.Add(c, 1);
}
}
return inputChars;
}
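I saw that no one has used the "hashcode" approach to find anagrams. My approach is a little different from those discussed above, so I thought I would share it. The code below runs in O(n). Note that it treats equal hashes as proof of an anagram, so distinct words whose hashes collide (for example "knn" and "lol", whose squared character codes both sum to 35649) are reported as anagrams.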
/**
* This class performs the logic of finding anagrams
* @author ripudam
*
*/
public class AnagramTest {
public static boolean isAnagram(final String word1, final String word2) {
if (word1 == null || word2 == null || word1.length() != word2.length()) {
return false;
}
if (word1.equals(word2)) {
return true;
}
final AnagramWrapper word1Obj = new AnagramWrapper(word1);
final AnagramWrapper word2Obj = new AnagramWrapper(word2);
if (word1Obj.equals(word2Obj)) {
return true;
}
return false;
}
/*
* Inner class to wrap the string received for anagram check to find the
* hash
*/
static class AnagramWrapper {
String word;
public AnagramWrapper(final String word) {
this.word = word;
}
@Override
public boolean equals(final Object obj) {
return hashCode() == obj.hashCode();
}
@Override
public int hashCode() {
final char[] array = word.toCharArray();
int hashcode = 0;
for (final char c : array) {
hashcode = hashcode + (c * c);
}
return hashcode;
}
}
}
Here is another approach using HashMap in Java
public static boolean isAnagram(String first, String second) {
if (first == null || second == null) {
return false;
}
if (first.length() != second.length()) {
return false;
}
return doCheckAnagramUsingHashMap(first.toLowerCase(), second.toLowerCase());
}
private static boolean doCheckAnagramUsingHashMap(final String first, final String second) {
Map<Character, Integer> counter = populateMap(first, second);
return validateMap(counter);
}
private static boolean validateMap(Map<Character, Integer> counter) {
for (int val : counter.values()) {
if (val != 0) {
return false;
}
}
return true;
}
Here is the test case
@Test
public void anagramTest() {
assertTrue(StringUtil.isAnagram("keep" , "PeeK"));
assertFalse(StringUtil.isAnagram("Hello", "hell"));
assertTrue(StringUtil.isAnagram("SiLeNt caT", "LisTen cat"));
}
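The snippet above calls populateMap, which is not shown. One possible shape for it, purely as an assumption about what the author intended, counts up for the first string and down for the second, so that the zero check leaves only anagrams. The class name and the allZero helper (which mirrors validateMap) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PopulateMapSketch {
    // Hypothetical helper: +1 per character of first, -1 per character of second.
    static Map<Character, Integer> populateMap(String first, String second) {
        Map<Character, Integer> counter = new HashMap<>();
        for (char c : first.toCharArray()) {
            counter.merge(c, 1, Integer::sum);
        }
        for (char c : second.toCharArray()) {
            counter.merge(c, -1, Integer::sum);
        }
        return counter;
    }

    // Same check as validateMap: every count must be zero.
    static boolean allZero(Map<Character, Integer> counter) {
        for (int val : counter.values()) {
            if (val != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allZero(populateMap("keep", "peek")));
        System.out.println(allZero(populateMap("hello", "hellp")));
    }
}
```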
private static boolean checkAnagram(String s1, String s2) {
if (s1 == null || s2 == null) {
return false;
} else if (s1.length() != s2.length()) {
return false;
}
char[] a1 = s1.toCharArray();
char[] a2 = s2.toCharArray();
int length = s2.length();
int s1Count = 0;
int s2Count = 0;
for (int i = 0; i < length; i++) {
s1Count+=a1[i];
s2Count+=a2[i];
}
// Equal sums are necessary but not sufficient: "ac" and "bb" also have equal sums.
return s2Count == s1Count;
}
The simplest O(N) solution uses a Map.
public static Boolean checkAnagram(String string1, String string2) {
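// Strings of different lengths are never anagrams (and would break the shared index below)
if (string1.length() != string2.length()) {
return false;
}
Boolean anagram = true;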
Map<Character, Integer> map1 = new HashMap<>();
Map<Character, Integer> map2 = new HashMap<>();
char[] chars1 = string1.toCharArray();
char[] chars2 = string2.toCharArray();
for(int i=0; i<chars1.length; i++) {
if(map1.get(chars1[i]) == null) {
map1.put(chars1[i], 1);
} else {
map1.put(chars1[i], map1.get(chars1[i])+1);
}
if(map2.get(chars2[i]) == null) {
map2.put(chars2[i], 1);
} else {
map2.put(chars2[i], map2.get(chars2[i])+1);
}
}
Set<Map.Entry<Character, Integer>> entrySet1 = map1.entrySet();
Set<Map.Entry<Character, Integer>> entrySet2 = map2.entrySet();
for(Map.Entry<Character, Integer> entry:entrySet1) {
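// Compare Integer values with equals(); != compares object references and map2.get(...) may be null.
if(!entry.getValue().equals(map2.get(entry.getKey()))) {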
anagram = false;
break;
}
}
return anagram;
}
Let's take a question: given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
Method 1(Using HashMap ):
public class Method1 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
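private static boolean isAnagram(String a, String b) {
// Without this check, "ab" and "a" would be reported as anagrams.
if (a.length() != b.length()) { return false; }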
Map<Character ,Integer> map = new HashMap<>();
for( char c : a.toCharArray()) {
map.put(c, map.getOrDefault(c, 0 ) + 1 );
}
for(char c : b.toCharArray()) {
int count = map.getOrDefault(c, 0);
if(count == 0 ) {return false ; }
else {map.put(c, count - 1 ) ; }
}
return true;
}
}
Method 2 :
public class Method2 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b));// output=> true
}
private static boolean isAnagram(String a, String b) {
int[] alphabet = new int[26];
for(int i = 0 ; i < a.length() ;i++) {
alphabet[a.charAt(i) - 'a']++ ;
}
for (int i = 0; i < b.length(); i++) {
alphabet[b.charAt(i) - 'a']-- ;
}
for( int w : alphabet ) {
if(w != 0 ) {return false;}
}
return true;
}
}
Method 3 :
public class Method3 {
public static void main(String[] args) {
String a = "protijayi";
String b = "jayiproti";
System.out.println(isAnagram(a, b ));// output => true
}
private static boolean isAnagram(String a, String b) {
char[] ca = a.toCharArray() ;
char[] cb = b.toCharArray();
Arrays.sort( ca );
Arrays.sort( cb );
return Arrays.equals(ca , cb );
}
}
Method 4 :
public class AnagramsOrNot {
public static void main(String[] args) {
String a = "Protijayi";
String b = "jayiProti";
isAnagram(a, b);
}
private static void isAnagram(String a, String b) {
Map<Integer, Integer> map = new LinkedHashMap<>();
a.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) + 1));
System.out.println(map);
b.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) - 1));
System.out.println(map);
// The strings are anagrams only if every count has returned to zero.
if (map.values().stream().allMatch(count -> count == 0)) {
System.out.println("Anagrams");
} else {
System.out.println("Not Anagrams");
}
}
}
In Python:
def areAnagram(a, b):
    if len(a) != len(b): return False
    count1 = [0] * 256
    count2 = [0] * 256
    for i in a: count1[ord(i)] += 1
    for i in b: count2[ord(i)] += 1
    for i in range(256):
        if count1[i] != count2[i]: return False
    return True
str1 = "Giniiii"
str2 = "Protijayi"
print(areAnagram(str1, str2))
Let's take another famous Interview Question: Group the Anagrams from a given String:
public class GroupAnagrams {
public static void main(String[] args) {
String a = "Gini Gina Protijayi iGin aGin jayiProti Soudipta";
Map<String, List<String>> map = Arrays.stream(a.split(" ")).collect(Collectors.groupingBy(GroupAnagrams::sortedString));
System.out.println("MAP => " + map);
map.forEach((k,v) -> System.out.println(k +" and the anagrams are =>" + v ));
/*
Look at the Map output:
MAP => {Giin=[Gini, iGin], Paiijorty=[Protijayi, jayiProti], Sadioptu=[Soudipta], Gain=[Gina, aGin]}
As we can see, there are multiple Lists. Hence, we have to use a flatMap(List::stream)
Now, Look at the output:
Paiijorty and the anagrams are =>[Protijayi, jayiProti]
Now, look at this output:
Sadioptu and the anagrams are =>[Soudipta]
List contains only word. No anagrams.
That means we have to work with map.values(). List contains all the anagrams.
*/
String stringFromMapHavingListofLists = map.values().stream().flatMap(List::stream).collect(Collectors.joining(" "));
System.out.println(stringFromMapHavingListofLists);
}
public static String sortedString(String a) {
String sortedString = a.chars().sorted()
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append).toString();
return sortedString;
}
/*
* The output : Gini iGin Protijayi jayiProti Soudipta Gina aGin
* All the anagrams are side by side.
*/
}
Grouping anagrams in Python is just as easy.
Sort the letters of each word, then build a dictionary keyed by the sorted word. Each value of the dictionary is the list of indices (into the original list) of the words that are anagrams of one another.
def groupAnagrams(words):
    # sort each word in the list
    A = [''.join(sorted(word)) for word in words]
    dict = {}
    for indexofsamewords, names in enumerate(A):
        dict.setdefault(names, []).append(indexofsamewords)
    print(dict)
    #{'AOOPR': [0, 2, 5, 11, 13], 'ABTU': [1, 3, 4], 'Sorry': [6], 'adnopr': [7], 'Sadioptu': [8, 16], ' KPaaehiklry': [9], 'Taeggllnouy': [10], 'Leov': [12], 'Paiijorty': [14, 18], 'Paaaikpr': [15], 'Saaaabhmryz': [17], ' CNaachlortttu': [19], 'Saaaaborvz': [20]}
    for index in dict.values():
        print([words[i] for i in index])
if __name__ == '__main__':
    # list of words
    words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP", "Protijayi","Paikpara","dipSouta","Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
The Output :
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
Another important anagram question: find the anagram group occurring the maximum number of times.
In the example, ROOPA is the word whose anagrams occur the maximum number of times.
Hence, ['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP'] will be the final output.
from statistics import mode
import numpy as np
# list of words
words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
"Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP",
"Protijayi","Paikpara","dipSouta","Shyambazaar",
"jayiProti", "North Calcutta", "Sovabazaar"]
print(".....Method 1....... ")
sortedwords = [''.join(sorted(word)) for word in words]
print(sortedwords)
print("...........")
LongestAnagram = np.array(words)[np.array(sortedwords) == mode(sortedwords)]
# Longest anagram
print("Longest anagram by Method 1:")
print(LongestAnagram)
print(".....................................................")
print(".....Method 2....... ")
A = [''.join(sorted(word)) for word in words]
dict = {}
for indexofsamewords,samewords in enumerate(A):
    dict.setdefault(samewords,[]).append(samewords)
#print(dict)
#{'AOOPR': ['AOOPR', 'AOOPR', 'AOOPR', 'AOOPR', 'AOOPR'], 'ABTU': ['ABTU', 'ABTU', 'ABTU'], 'Sadioptu': ['Sadioptu', 'Sadioptu'], ' KPaaehiklry': [' KPaaehiklry'], 'Taeggllnouy': ['Taeggllnouy'], 'Leov': ['Leov'], 'Paiijorty': ['Paiijorty', 'Paiijorty'], 'Paaaikpr': ['Paaaikpr'], 'Saaaabhmryz': ['Saaaabhmryz'], ' CNaachlortttu': [' CNaachlortttu'], 'Saaaaborvz': ['Saaaaborvz']}
aa = max(dict.items() , key = lambda x : len(x[1]))
print("aa => " , aa)
word, anagrams = aa
print("Longest anagram by Method 2:")
print(" ".join(anagrams))
The Output :
.....Method 1.......
['AOOPR', 'ABTU', 'AOOPR', 'ABTU', 'ABTU', 'AOOPR', 'Sadioptu', ' KPaaehiklry', 'Taeggllnouy', 'AOOPR', 'Leov', 'AOOPR', 'Paiijorty', 'Paaaikpr', 'Sadioptu', 'Saaaabhmryz', 'Paiijorty', ' CNaachlortttu', 'Saaaaborvz']
...........
Longest anagram by Method 1:
['ROOPA' 'OOPAR' 'PAROO' 'AROOP' 'AOORP']
.....................................................
.....Method 2.......
aa => ('AOOPR', ['AOOPR', 'AOOPR', 'AOOPR', 'AOOPR', 'AOOPR'])
Longest anagram by Method 2:
AOOPR AOOPR AOOPR AOOPR AOOPR
This could be the simple function call
A mix of functional Code and Imperative style of code
static boolean isAnagram(String a, String b) {
String sortedA = "";
Object[] aArr = a.toLowerCase().chars().sorted().mapToObj(i -> (char) i).toArray();
for (Object o: aArr) {
sortedA = sortedA.concat(o.toString());
}
String sortedB = "";
Object[] bArr = b.toLowerCase().chars().sorted().mapToObj(i -> (char) i).toArray();
for (Object o: bArr) {
sortedB = sortedB.concat(o.toString());
}
return sortedA.equals(sortedB);
}
Related
I have to create a vowel counter and sorter, where someone can input a word or phrase and the program picks out, counts, and sorts the vowels. I have the code to where it counts and sorts the variables and shows their counts to the user, but it doesn't say which vowel has which count and I have exhausted all of my resources. I am very new to coding and know very little, so if there's anything anyone can do to help, I would appreciate it endlessly.
int[] vowelcounter = {a, e, i, o, u}; //This is the count of the vowels after reading the input.
boolean hasswapped = true;
while(hasswapped)
{
hasswapped = false;
for(int j = 0; j<vowelcounter.length; j++)
{
for(int k = j+1; k<vowelcounter.length; k++)
{
if(vowelcounter[j] > vowelcounter[k])
{
int temp = vowelcounter[j];
vowelcounter[j] = vowelcounter[j+1];
vowelcounter[j+1] = temp;
hasswapped = true;
}
}
}
}
for(int j=0; j<vowelcounter.length; j++)
{
System.out.println(vowelcounter[j]);
}
Instead of int value to represent a counter, a class may be introduced to store and print both the vowel character and its count:
class VowelCount {
private final char vowel;
private int count = 0;
public VowelCount(char v) {
this.vowel = v;
}
public void add() { this.count++; }
public int getCount() { return this.count; }
public char getVowel() { return this.vowel; }
@Override
public String toString() { return "Vowel '" + vowel + "' count = " + count;}
}
Then instead of int[] count an array of VowelCount is created and sorted:
VowelCount[] vowelcounter = {
new VowelCount('a'), new VowelCount('e'), new VowelCount('i'),
new VowelCount('o'), new VowelCount('u')
};
Sorting may be implemented using standard method Arrays::sort with a custom comparator instead of home-made bubble sorting
Arrays.sort(vowelcounter, Comparator.comparingInt(VowelCount::getCount));
Then the stats are printed as follows (using a for-each loop along with the overridden toString):
for (VowelCount v: vowelcounter) {
System.out.println(v); // print sorted by count
}
A more advanced way of calculating the frequencies is to use a map from vowels to their frequencies and sort the map entries by count.
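A minimal sketch of that map-based idea follows. The class and method names (VowelFrequencySketch, countVowels, sortedByCount) are made up for illustration, and the sketch assumes only the five vowels a, e, i, o, u are of interest.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class VowelFrequencySketch {
    // Counts each vowel in the text; every vowel starts at 0 so it is always reported.
    static Map<Character, Integer> countVowels(String text) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char v : "aeiou".toCharArray()) {
            counts.put(v, 0);
        }
        for (char c : text.toLowerCase().toCharArray()) {
            if (counts.containsKey(c)) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns a new map whose iteration order is ascending by count.
    static Map<Character, Integer> sortedByCount(Map<Character, Integer> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

Collecting into a LinkedHashMap keeps the sorted order when the result is iterated, so each vowel can be printed next to its count.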
You can use a HashMap:
HashMap<String, Integer> vowelCounts = new HashMap<>();
To add data to it just do:
vowelCounts.put("a", 1); // The vowel "a" is once in the sentence
vowelCounts.put("e", 2); // The vowel "e" is 2 times in the sentence
To print to the console:
for(String vowel : vowelCounts.keySet() ) {
System.out.println(vowel + ": " + vowelCounts.get(vowel));
}
Keep a parallel array char[] vowels = { 'a', 'e', 'i', 'o', 'u' }. Every time you swap the counters, make an identical swap in the vowels array.
int temp = vowelcounter[j];
vowelcounter[j] = vowelcounter[j+1];
vowelcounter[j+1] = temp;
char temp2 = vowels[j];
vowels[j] = vowels[j+1];
vowels[j+1] = temp2;
hasswapped = true;
At the end, print out vowels[j] next to vowelcounter[j];
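Put together, the parallel-array idea might look like the sketch below. The class and method names are hypothetical, and the loop compares adjacent elements only, which is an assumption about the intended bubble sort rather than a copy of the code in the question.

```java
class ParallelSortSketch {
    // Sorts counts in ascending order and applies every swap to vowels as well,
    // so vowels[i] always labels counts[i].
    static void sortTogether(int[] counts, char[] vowels) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int j = 0; j + 1 < counts.length; j++) {
                if (counts[j] > counts[j + 1]) {
                    int tmpCount = counts[j];
                    counts[j] = counts[j + 1];
                    counts[j + 1] = tmpCount;
                    char tmpVowel = vowels[j];
                    vowels[j] = vowels[j + 1];
                    vowels[j + 1] = tmpVowel;
                    swapped = true;
                }
            }
        }
    }
}
```

After the call, printing vowels[j] next to counts[j] in a single loop shows which vowel has which count.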
I'm trying to make a function that prints the number of characters common to n given strings (note that characters may be used multiple times).
I am struggling to perform this operation on n strings. However, I did it for 2 strings without any characters repeated more than once.
I have posted my code.
public class CommonChars {
public static void main(String[] args) {
String str1 = "abcd";
String str2 = "bcde";
StringBuffer sb = new StringBuffer();
// get unique chars from both the strings
str1 = uniqueChar(str1);
str2 = uniqueChar(str2);
int count = 0;
int str1Len = str1.length();
int str2Len = str2.length();
for (int i = 0; i < str1Len; i++) {
for (int j = 0; j < str2Len; j++) {
// found match stop the loop
if (str1.charAt(i) == str2.charAt(j)) {
count++;
sb.append(str1.charAt(i));
break;
}
}
}
System.out.println("Common Chars Count : " + count + "\nCommon Chars :" +
sb.toString());
}
public static String uniqueChar(String inputstr) {
String temp="";
for(int i=0;i<inputstr.length();i++) {
if(temp.indexOf(inputstr.charAt(i))<0) {
temp+=inputstr.charAt(i);
}
}
System.out.println("completed");
return temp;
}
}
3
abcaa
bcbd
bgc
3
There may be cases where the same character is present multiple times in
a string. You are not supposed to eliminate those characters; instead,
check the number of times they are repeated in the other strings. For example:
3
abacd
aaxyz
aatre
The output should be 2.
It would be better if I got a solution in Java.
You have to convert all Strings to Sets of Characters and retain in the first set only the characters that are present in all the others. The solution below has many places which could be optimised, but you should understand the general idea.
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class Main {
public static void main(String[] args) {
List<String> input = Arrays.asList("jonas", "ton", "bonny");
System.out.println(findCommonCharsFor(input));
}
public static Collection<Character> findCommonCharsFor(List<String> strings) {
if (strings == null || strings.isEmpty()) {
return Collections.emptyList();
}
Set<Character> commonChars = convertStringToSetOfChars(strings.get(0));
strings.stream().skip(1).forEach(s -> commonChars.retainAll(convertStringToSetOfChars(s)));
return commonChars;
}
private static Set<Character> convertStringToSetOfChars(String string) {
if (string == null || string.isEmpty()) {
return Collections.emptySet();
}
Set<Character> set = new HashSet<>(string.length() + 10);
for (char c : string.toCharArray()) {
set.add(c);
}
return set;
}
}
Above code prints:
[n, o]
A better strategy for your problem is to use this method:
public int[] countChars(String s){
int[] count = new int[26];
for(char c: s.toCharArray()){
count[c-'a']++;
}
return count;
}
Now if you have n Strings (String[] strings) just find the min of common chars for each letter:
int n = strings.length;
int[][] result = new int[n][26];
for(int i = 0; i<n;i++){
result[i] = countChars(strings[i]);
}
// now if you sum the min common chars for each counter you are ready
int commonChars = 0;
for(int i = 0; i< 26;i++){
int min = result[0][i];
for(int j = 1; j< n;j++){
if(min>result[j][i]){
min = result[j][i];
}
}
commonChars+=min;
}
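For reference, the per-letter minimum idea above can be written as one self-contained class. This is a sketch under the same assumption as the answer (input consists only of lowercase a-z); the class name CommonCharCountSketch and the commonChars method are illustrative names, not part of the original answer.

```java
class CommonCharCountSketch {
    // Frequency of each lowercase letter in s (assumes only 'a'..'z').
    static int[] countChars(String s) {
        int[] count = new int[26];
        for (char c : s.toCharArray()) {
            count[c - 'a']++;
        }
        return count;
    }

    // Sums, over all letters, the smallest number of occurrences in any string.
    static int commonChars(String... strings) {
        if (strings.length == 0) {
            return 0;
        }
        int[] min = countChars(strings[0]);
        for (int i = 1; i < strings.length; i++) {
            int[] current = countChars(strings[i]);
            for (int letter = 0; letter < 26; letter++) {
                min[letter] = Math.min(min[letter], current[letter]);
            }
        }
        int total = 0;
        for (int m : min) {
            total += m;
        }
        return total;
    }
}
```

With the strings from the question (abacd, aaxyz, aatre), the letter a appears twice in each string and no other letter is shared by all three, so the result is 2.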
Get list of characters for each string:
List<Character> chars1 = s1.chars() // list of chars for first string
.mapToObj(c -> (char) c)
.collect(Collectors.toList());
List<Character> chars2 = s2.chars() // list of chars for second string
.mapToObj(c -> (char) c)
.collect(Collectors.toList());
Then use retainAll method:
chars1.retainAll(chars2); // retain in chars1 only the chars that are contained in the chars2 also
System.out.println(chars1.size());
If you want to get number of unique chars just use Collectors.toSet() instead of toList()
Well if one goes for hashing:
public static int uniqueChars(String first, String second) {
boolean[] hash = new boolean[26];
int count = 0;
//reduce first string to unique letters
for (char c : first.toLowerCase().toCharArray()) {
hash[c - 'a'] = true;
}
//reduce to unique letters in both strings
for(char c : second.toLowerCase().toCharArray()){
if(hash[c - 'a']){
count++;
hash[c - 'a'] = false;
}
}
return count;
}
This uses a bucket-sort-style lookup, which gives O(n+m) complexity but needs the 26 buckets (the "hash" array).
In my opinion one can't do better in terms of complexity, as you need to look at every letter at least once, which sums up to n+m.
In situ, the best you can get is, I think, somewhere in the range of O(n log(n)).
Your approach is somewhere in the league of O(n²).
Addon: if you need the characters as a String (in essence the same as above, where count is the length of the returned String):
public static String uniqueChars(String first, String second) {
boolean[] hash = new boolean[26];
StringBuilder sb = new StringBuilder();
for (char c : first.toLowerCase().toCharArray()) {
hash[c - 'a'] = true;
}
for(char c : second.toLowerCase().toCharArray()){
if(hash[c - 'a']){
sb.append(c);
hash[c - 'a'] = false;
}
}
return sb.toString();
}
public static String getCommonCharacters(String... words) {
if (words == null || words.length == 0)
return "";
Set<Character> unique = words[0].chars().mapToObj(ch -> (char)ch).collect(Collectors.toCollection(TreeSet::new));
for (String word : words)
unique.retainAll(word.chars().mapToObj(ch -> (char)ch).collect(Collectors.toSet()));
return unique.stream().map(String::valueOf).collect(Collectors.joining());
}
Another variant without creating temporary Set and using Character.
public static String getCommonCharacters(String... words) {
if (words == null || words.length == 0)
return "";
int[] arr = new int[26];
boolean[] tmp = new boolean[26];
for (String word : words) {
Arrays.fill(tmp, false);
for (int i = 0; i < word.length(); i++) {
int pos = Character.toLowerCase(word.charAt(i)) - 'a';
if (tmp[pos])
continue;
tmp[pos] = true;
arr[pos]++;
}
}
StringBuilder buf = new StringBuilder(26);
for (int i = 0; i < arr.length; i++)
if (arr[i] == words.length)
buf.append((char)('a' + i));
return buf.toString();
}
Demo
System.out.println(getCommonCharacters("abcd", "bcde")); // bcd
Here is my code for whether two strings are anagrams or not
static boolean isAnagram(String a, String b) {
if (a.length() != b.length()) return false;
a = a.toLowerCase();
b = b.toLowerCase();
int m1=0;
for(int i=0;i<a.length();i++){
m1 += (int)a.charAt(i);
m1 -= (int)b.charAt(i);
}
return m1==0;
}
My code fails for two test cases
case 1: String a="xyzw";and String b="xyxy";
case 2: String a="bbcc"; and String b="dabc";
can anyone help me passing the above two cases?
I think your code doesn't work because you sum up the character codes, and the difference can be zero even when the strings are not anagrams, for example "ad" and "bc".
A better way is to sort the characters of both strings: if the sorted arrays have the same length and the same characters in the same order, the two strings are anagrams.
static boolean isAnagram(String str1, String str2) {
int[] str1Chars = str1.toLowerCase().chars().sorted().toArray();
int[] str2Chars = str2.toLowerCase().chars().sorted().toArray();
return Arrays.equals(str1Chars, str2Chars);
}
I hope this helps. (It is a little harder to read because I use streams to create and sort the arrays of characters.)
Try this:
import java.io.*;
class GFG{
/* function to check whether two strings are
anagram of each other */
static boolean areAnagram(char[] str1, char[] str2)
{
// Get lengths of both strings
int n1 = str1.length;
int n2 = str2.length;
// If length of both strings is not same,
// then they cannot be anagram
if (n1 != n2)
return false;
// Sort both strings
quickSort(str1, 0, n1 - 1);
quickSort(str2, 0, n2 - 1);
// Compare sorted strings
for (int i = 0; i < n1; i++)
if (str1[i] != str2[i])
return false;
return true;
}
// Following functions (exchange and partition
// are needed for quickSort)
static void exchange(char A[],int a, int b)
{
char temp;
temp = A[a];
A[a] = A[b];
A[b] = temp;
}
static int partition(char A[], int si, int ei)
{
char x = A[ei];
int i = (si - 1);
int j;
for (j = si; j <= ei - 1; j++)
{
if(A[j] <= x)
{
i++;
exchange(A, i, j);
}
}
exchange (A, i+1 , ei);
return (i + 1);
}
/* Implementation of Quick Sort
A[] --> Array to be sorted
si --> Starting index
ei --> Ending index
*/
static void quickSort(char A[], int si, int ei)
{
int pi; /* Partitioning index */
if(si < ei)
{
pi = partition(A, si, ei);
quickSort(A, si, pi - 1);
quickSort(A, pi + 1, ei);
}
}
/* Driver program to test areAnagram */
public static void main(String args[])
{
char str1[] = {'t','e','s','t'};
char str2[] = {'t','t','e','w'};
if (areAnagram(str1, str2))
System.out.println("The two strings are"+
" anagram of each other");
else
System.out.println("The two strings are not"+
" anagram of each other");
}
}
The implementation isn't correct. While a pair of anagrams will always have the same length and the same sum of characters, this is not a sufficient condition. There are many pairs of strings that have the same length and the same sum of characters and are not anagrams. E.g., "ad" and "bc".
A better implementation would count the number of times each character appears in each string and compare them. E.g.:
public static boolean isAnagram(String a, String b) {
return charCounts(a).equals(charCounts(b));
}
private static Map<Integer, Long> charCounts(String s) {
return s.chars()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}
static boolean isAnagram(String a, String b) {
if (a.length() != b.length())
return false;
a = a.toLowerCase();
b = b.toLowerCase();
HashMap<Integer, Integer> m1 = new HashMap<>(); // Key is ascii number, Value is count. For String a
HashMap<Integer, Integer> m2 = new HashMap<>(); // Key is ascii number, Value is count. For String b
for (int i = 0; i < a.length(); i++) {
int an = (int) (a.charAt(i));
int bn = (int) (b.charAt(i));
// Add 1 to current ascii number. String a.
if (m1.containsKey(an)) {
m1.put(an, m1.get(an) + 1);
}else {
m1.put(an, 1);
}
// Add 1 to current ascii number. String b.
if (m2.containsKey(bn)) {
m2.put(bn, m2.get(bn) + 1);
}else {
m2.put(bn, 1);
}
}
//Check both count equals().
return m1.equals(m2);
}
You should compare the counts of every letter.
If (ascii of a[0] == ascii of b[0] + 1) and (ascii of a[1] == ascii of b[1] - 1), your code will return true because 1 - 1 is zero.
Sorry for the very complex code.
Adding character values is error-prone logic, because A+C and B+B generate the same number. The best option in this case is to sort the characters with Arrays. Look at the code below -
static boolean isAnagram(String a, String b) {
if (a.length() != b.length()) return false;
a = a.toLowerCase();
b = b.toLowerCase();
char[] charA = a.toCharArray();
Arrays.sort(charA);
char[] charB = b.toCharArray();
Arrays.sort(charB);
return Arrays.equals(charA, charB);
}
This should give you what you want.
Try this. It runs in O(word.length) time.
public boolean checkForAnagram(String str1, String str2) {
if (str1 == null || str2 == null || str1.length() != str2.length()) {
return false;
}
return Arrays.equals(getCharFrequencyTable(str1), getCharFrequencyTable(str2));
}
private int[] getCharFrequencyTable(String str) {
int[] frequencyTable = new int[256]; //I am using array instead of hashmap to make you realize that its a constant time operation.
char[] charArrayOfStr = str.toLowerCase().toCharArray();
for(char c : charArrayOfStr) {
frequencyTable[c] = frequencyTable[c]+1;
}
return frequencyTable;
}
Check out below methods :
/**
* Java program - String Anagram Example.
* This program checks if two Strings are anagrams or not
*/
public class AnagramCheck {
/*
* One way to find if two Strings are anagram in Java. This method
* assumes both arguments are not null and in lowercase.
*
* @return true, if both String are anagram
*/
public static boolean isAnagram(String word, String anagram){
if(word.length() != anagram.length()){
return false;
}
char[] chars = word.toCharArray();
for(char c : chars){
int index = anagram.indexOf(c);
if(index != -1){
anagram = anagram.substring(0,index) + anagram.substring(index +1, anagram.length());
}else{
return false;
}
}
return anagram.isEmpty();
}
/*
* Another way to check if two Strings are anagram or not in Java
* This method assumes that both word and anagram are not null and lowercase
* @return true, if both Strings are anagram.
*/
public static boolean iAnagram(String word, String anagram){
char[] charFromWord = word.toCharArray();
char[] charFromAnagram = anagram.toCharArray();
Arrays.sort(charFromWord);
Arrays.sort(charFromAnagram);
return Arrays.equals(charFromWord, charFromAnagram);
}
public static boolean checkAnagram(String first, String second){
char[] characters = first.toCharArray();
StringBuilder sbSecond = new StringBuilder(second);
for(char ch : characters){
int index = sbSecond.indexOf("" + ch);
if(index != -1){
sbSecond.deleteCharAt(index);
}else{
return false;
}
}
return sbSecond.length()==0;
}
}
You are adding the ascii values of characters in given strings and comparing them, which will not always give you correct results. Consider this:
String a="acd" and String b="ccb"
both of them will give you a sum of 296 but these are not anagrams.
You can count of occurrences of characters in both the string and compare them. In above example, it will give you {"a":1,"c":1,"d":1} and {"c":2,"b":1}.
Also,you can associate a prime number with each of the character set [a-z] where 'a' matches 2, 'b' matches 3, 'c' matches 5 and so on.
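Next, you can calculate the product of the prime numbers associated with the characters in the given string. Multiplication is commutative (xy = yx), so the order of the characters does not affect the result, and because prime factorisation is unique, two strings have the same product only if they contain the same characters with the same counts.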
Example:
abc --> 2*3*5 = 30
cba --> 5*3*2 = 30
Note: If the string size is huge, this might not be the best approach as you might encounter overflow issues.
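A minimal sketch of the prime-product idea follows. It uses BigInteger to sidestep the overflow issue mentioned in the note, which is a choice made here rather than something the answer prescribes; the class name PrimeProductSketch is hypothetical and the input is assumed to contain only lowercase a-z.

```java
import java.math.BigInteger;

class PrimeProductSketch {
    // The first 26 primes: 'a' maps to 2, 'b' to 3, 'c' to 5, and so on.
    private static final int[] PRIMES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
            43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};

    // Product of the primes assigned to each character (assumes only 'a'..'z').
    static BigInteger product(String s) {
        BigInteger result = BigInteger.ONE;
        for (char c : s.toCharArray()) {
            result = result.multiply(BigInteger.valueOf(PRIMES[c - 'a']));
        }
        return result;
    }

    static boolean isAnagram(String a, String b) {
        return a.length() == b.length() && product(a).equals(product(b));
    }
}
```

BigInteger avoids overflow at the cost of slower arithmetic on long strings, so a frequency count is usually the simpler choice in practice.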
I wrote this class that can check if two given strings are permutations of each other. However, it is my understanding that this runs at O(n^2) time because the string.indexOf() runs at O(n) time.
How can this program be made more efficient?
import java.util.*;
public class IsPermutation{
public void IsPermutation(){
System.out.println("Checks if two strings are permutations of each other.");
System.out.println("Call the check() method");
}
public boolean check(){
Scanner console = new Scanner(System.in);
System.out.print("Insert first string: ");
String first = console.nextLine();
System.out.print("Insert second string: ");
String second = console.nextLine();
if (first.length() != second.length()){
System.out.println("Not permutations");
return false;
}
for (int i = 0; i < first.length(); i++){
if (second.indexOf(first.charAt(i)) == -1){
System.out.println("Not permutations");
return false;
}
}
System.out.println("permutations");
return true;
}
}
First, it can be done in O(nlogn) by sorting the two strings (after converting them to char[]), and then simple equality test will tell you if the original strings are permutations or not.
An average-case O(n) solution can be achieved by creating a HashMap<Character, Integer>, where each key is a character in the string and the value is the number of its occurrences (this is called a histogram). After you have one for each string, a simple equality check of the two maps will tell you if the original strings are permutations.
One way to achieve O(n) is to count the frequency of every character.
I would use a HashMap with the characters as keys and the frequencies as values.
//create a HashMap containing the frequencys of every character of the String (runtime O(n) )
public HashMap<Character, Integer> getFrequencys(String s){
HashMap<Character, Integer> map = new HashMap<>();
for(int i=0; i<s.length(); i++){
//get character at position i
char c = s.charAt(i);
//get old frequency (edited: if the character is added for the
//first time, the old frequency is 0)
int frequency;
if(map.containsKey(c)){
frequency = map.get(c);
}else{
frequency = 0;
}
//increment frequency by 1
map.put(c, frequency+1 );
}
return map;
}
now you can create a HashMap for both Strings and compare if the frequency of every character is the same
//runtime O(3*n) = O(n)
public boolean compare(String s1, String s2){
if(s1.length() != s2.length()){
return false;
}
//runtime O(n)
HashMap<Character, Integer> map1 = getFrequencys(s1);
HashMap<Character, Integer> map2 = getFrequencys(s2);
//Iterate over every character in map1 (every character contained in s1) (runtime O(n) )
for(Character c : map1.keySet()){
//if the characters frequencys are different, the strings arent permutations
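// compare Integer values with equals(); != compares object references and map2.get(c) may be null
if( !map1.get(c).equals(map2.get(c)) ){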
return false;
}
}
//since every character in s1 has the same frequency in s2,
//and the number of characters is equal => s2 must be a permutation of s1
return true;
}
edit: there was a nullpointer error in the (untested) code
Sorting Solution:
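public boolean IsPermutation(String str1, String str2) {
// Arrays.sort sorts in place and returns void, so sort first and compare afterwards
char[] sortedCharArray1 = str1.toCharArray();
char[] sortedCharArray2 = str2.toCharArray();
Arrays.sort(sortedCharArray1);
Arrays.sort(sortedCharArray2);
return Arrays.equals(sortedCharArray1, sortedCharArray2);
}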
Time Complexity: O(n log n)
Space Complexity: O(n)
Frequency count solution:
//Assuming that characters are only ASCII. The solutions can easily be modified for all characters
public boolean IsPermutation(String str1, String str2) {
if (str1.length() != str2.length())
return false;
int freqCountStr1[] = new int[256];
int freqCountStr2[] = new int[256];
for (int i = 0; i < str1.length(); ++i) {
int c1 = str1.charAt(i);
int c2 = str2.charAt(i);
++freqCountStr1[c1];
++freqCountStr2[c2];
}
// compare all 256 buckets, not just the first str1.length() of them
for (int i = 0; i < freqCountStr1.length; ++i) {
if (freqCountStr1[i] != freqCountStr2[i]) {
return false;
}
}
return true;
}
}
Time Complexity: O(n)
Space Complexity: O(256)
I was recently in an interview and they asked me the following question:
Write a function to return true if a string matches a pattern, false
otherwise
Pattern: 1 character per item, (a-z), input: space delimited string
This was my solution for the first problem:
static boolean isMatch(String pattern, String input) {
char[] letters = pattern.toCharArray();
String[] split = input.split("\\s+");
if (letters.length != split.length) {
// early return - not possible to match if lengths aren't equal
return false;
}
Map<String, Character> map = new HashMap<>();
// aaaa test test test1 test1
boolean used[] = new boolean[26];
for (int i = 0; i < letters.length; i++) {
Character existing = map.get(split[i]);
if (existing == null) {
// put into map if not found yet
if (used[(int)(letters[i] - 'a')]) {
return false;
}
used[(int)(letters[i] - 'a')] = true;
map.put(split[i], letters[i]);
} else {
// doesn't match - return false
if (existing != letters[i]) {
return false;
}
}
}
return true;
}
public static void main(String[] argv) {
System.out.println(isMatch("aba", "blue green blue"));
System.out.println(isMatch("aba", "blue green green"));
}
The next part of the problem stumped me:
With no delimiters in the input, write the same function.
eg:
isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true
I wrote a bruteforce solution (generating all possible splits of the input string of size letters.length and checking in turn against isMatch) but the interviewer said it wasn't optimal.
I have no idea how to solve this part of the problem, is this even possible or am I missing something?
They were looking for something with a time complexity of O(M x N ^ C), where M is the length of the pattern and N is the length of the input, C is some constant.
Clarifications
I'm not looking for a regex solution, even if it works.
I'm not looking for the naive solution that generates all possible splits and checks them, even with optimization since that'll always be exponential time.
It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking that it is a valid one, we can check it "on fly". Let's assume that we have already split a prefix(with length p) of the initial string and have matched i characters from the pattern. Let's take look at the i + 1 character.
If there is a string in the prefix that corresponds to the i + 1 letter, we should just check that a substring that starts at the position p + 1 is equal to it. If it is, we just proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.
If there is no such string, we should try all substrings that start in the position p + 1 and end somewhere after it.
We can also use the following idea to reduce the number of branches in your solution: we can estimate the length of the suffix of the pattern which has not been processed yet (we know the length for the letters that already stand for some strings, and we know a trivial lower bound of 1 on the length of the string for any other letter in the pattern). This allows us to kill a branch if the remaining part of the initial string is too short to match the rest of the pattern.
This solution still has an exponential time complexity, but it can work much faster than generating all splits because invalid solutions can be thrown away much earlier, so the number of reachable states can reduce significantly.
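A minimal sketch of the pruned backtracking described above might look like the following. The class name BacktrackingMatcher and the helper match are made up for illustration, and the sketch assumes that every letter maps to a non-empty string and that two different letters are allowed to map to the same string (add a set of used strings if the problem forbids that).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; a sketch of the idea, not a tuned implementation.
class BacktrackingMatcher {

    static boolean isMatch(String pattern, String input) {
        return match(pattern, 0, input, 0, new HashMap<>());
    }

    // i = number of pattern letters already matched,
    // p = number of input characters already consumed.
    private static boolean match(String pattern, int i, String input, int p,
                                 Map<Character, String> assigned) {
        if (i == pattern.length()) {
            return p == input.length();
        }
        // Lower bound on the input length still needed: the known length for
        // letters that already have a string, 1 for every other letter.
        int needed = 0;
        for (int k = i; k < pattern.length(); k++) {
            String s = assigned.get(pattern.charAt(k));
            needed += (s == null) ? 1 : s.length();
        }
        if (needed > input.length() - p) {
            return false; // remaining input is too short: kill this branch
        }
        char letter = pattern.charAt(i);
        String known = assigned.get(letter);
        if (known != null) {
            // The letter already stands for a string, so it must appear right here.
            return input.startsWith(known, p)
                    && match(pattern, i + 1, input, p + known.length(), assigned);
        }
        // New letter: try every non-empty substring starting at p that still
        // leaves enough room for the rest of the pattern.
        int maxLen = input.length() - p - (needed - 1);
        for (int len = 1; len <= maxLen; len++) {
            assigned.put(letter, input.substring(p, p + len));
            if (match(pattern, i + 1, input, p + len, assigned)) {
                return true;
            }
            assigned.remove(letter);
        }
        return false;
    }
}
```

For example, BacktrackingMatcher.isMatch("ababac", "bluegreenbluegreenbluewhite") returns true under these assumptions, while isMatch("aa", "xy") returns false.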
I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.
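import java.util.HashMap;
import java.util.Map;

boolean matches(String s, String pattern) {
    StringBuilder patternBuilder = new StringBuilder();
    // Maps each pattern letter to the number of the capture group that first matched it.
    Map<Character, Integer> backreferences = new HashMap<>();
    int nextBackreference = 1;
    for (int i = 0; i < pattern.length(); i++) {
        char c = pattern.charAt(i);
        if (!backreferences.containsKey(c)) {
            // First occurrence of this letter: open a new reluctant capture group.
            // ".+?" rather than ".*?" so that a letter cannot map to the empty string.
            backreferences.put(c, nextBackreference++);
            patternBuilder.append("(.+?)");
        } else {
            // Repeated letter: emit a backreference such as \1 to the earlier group.
            patternBuilder.append('\\').append(backreferences.get(c));
        }
    }
    return s.matches(patternBuilder.toString());
}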
You could improve on brute force by first assuming token lengths and checking that the sum of the token lengths equals the length of the test string. That would be quicker than pattern matching each time. It is still very slow as the number of unique tokens increases, however.
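One way to read that suggestion is sketched below: pick a length for each distinct letter, and only compare substrings once the lengths add up to exactly the input length. The class name LengthFirstMatcher and its helpers are hypothetical, and the sketch assumes non-empty tokens and lets two letters share the same token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the "assume token lengths first" idea.
class LengthFirstMatcher {

    static boolean isMatch(String pattern, String input) {
        // Count how often each distinct letter occurs, in first-seen order.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : pattern.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        List<Character> letters = new ArrayList<>(counts.keySet());
        return assign(pattern, input, letters, counts, 0,
                pattern.length(), input.length(), new HashMap<>());
    }

    // Picks a length for letters.get(idx).
    // restCount = pattern positions whose letter has no length yet,
    // remaining = input characters not yet accounted for.
    private static boolean assign(String pattern, String input, List<Character> letters,
                                  Map<Character, Integer> counts, int idx,
                                  int restCount, int remaining,
                                  Map<Character, Integer> lengths) {
        if (idx == letters.size()) {
            // Only run the string comparison when the lengths add up exactly.
            return remaining == 0 && verify(pattern, input, lengths);
        }
        char letter = letters.get(idx);
        int count = counts.get(letter);
        int after = restCount - count; // every later letter needs at least 1 character per occurrence
        for (int len = 1; len * count + after <= remaining; len++) {
            lengths.put(letter, len);
            if (assign(pattern, input, letters, counts, idx + 1,
                    after, remaining - len * count, lengths)) {
                return true;
            }
        }
        lengths.remove(letter);
        return false;
    }

    // With all lengths fixed the split is forced; check that repeated letters agree.
    private static boolean verify(String pattern, String input, Map<Character, Integer> lengths) {
        Map<Character, String> tokens = new HashMap<>();
        int pos = 0;
        for (char c : pattern.toCharArray()) {
            String piece = input.substring(pos, pos + lengths.get(c));
            String previous = tokens.putIfAbsent(c, piece);
            if (previous != null && !previous.equals(piece)) {
                return false;
            }
            pos += piece.length();
        }
        return true;
    }
}
```

The number of length assignments still grows quickly with the number of distinct letters, which is the slowdown mentioned above.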
UPDATE:
Here is my solution. I based it on the explanation in my original response below.
import com.google.common.collect.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.util.Combinations;
import java.util.*;
/**
* Created by carlos on 2/14/15.
*/
public class PatternMatcher {
public static boolean isMatch(char[] pattern, String searchString){
return isMatch(pattern, searchString, new TreeMap<Integer, Pair<Integer, Integer>>(), Sets.newHashSet());
}
private static boolean isMatch(char[] pattern, String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution, Set<String> mappedStrings) {
List<Integer> occurrencesOfCharacterInPattern = getNextUnmappedPatternOccurrences(candidateSolution, pattern);
if(occurrencesOfCharacterInPattern.size() == 0)
return isValidSolution(candidateSolution, searchString, pattern, mappedStrings);
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = sectionsOfUnmappedStrings(searchString, candidateSolution);
if(sectionsOfUnmappedStrings.size() == 0)
return false;
String firstUnmappedString = substring(searchString, sectionsOfUnmappedStrings.get(0));
for (int substringSize = 1; substringSize <= firstUnmappedString.length(); substringSize++) {
String candidateSubstring = firstUnmappedString.substring(0, substringSize);
if(mappedStrings.contains(candidateSubstring))
continue;
List<Pair<Integer, Integer>> listOfAllOccurrencesOfSubstringInString = Lists.newArrayList();
for (int currentIndex = 0; currentIndex < sectionsOfUnmappedStrings.size(); currentIndex++) {
Pair<Integer,Integer> currentUnmappedSection = sectionsOfUnmappedStrings.get(currentIndex);
List<Pair<Integer, Integer>> occurrencesOfSubstringInString =
findAllInstancesOfSubstringInString(searchString, candidateSubstring,
currentUnmappedSection);
for(Pair<Integer,Integer> possibleAddition:occurrencesOfSubstringInString) {
listOfAllOccurrencesOfSubstringInString.add(possibleAddition);
}
}
if(listOfAllOccurrencesOfSubstringInString.size() < occurrencesOfCharacterInPattern.size())
return false;
Iterator<int []> possibleSolutionIterator =
new Combinations(listOfAllOccurrencesOfSubstringInString.size(),
occurrencesOfCharacterInPattern.size()).iterator();
iteratorLoop:
while(possibleSolutionIterator.hasNext()) {
Set<String> newMappedSets = Sets.newHashSet(mappedStrings);
newMappedSets.add(candidateSubstring);
TreeMap<Integer,Pair<Integer,Integer>> newCandidateSolution = Maps.newTreeMap();
// why doesn't Maps.newTreeMap(candidateSolution) work?
newCandidateSolution.putAll(candidateSolution);
int [] possibleSolutionIndexSet = possibleSolutionIterator.next();
for(int i = 0; i < possibleSolutionIndexSet.length; i++) {
Pair<Integer, Integer> candidatePair = listOfAllOccurrencesOfSubstringInString.get(possibleSolutionIndexSet[i]);
if (makesSenseToInsert(newCandidateSolution, occurrencesOfCharacterInPattern.get(i), candidatePair))
newCandidateSolution.put(occurrencesOfCharacterInPattern.get(i), candidatePair);
else
continue iteratorLoop; // this combination does not fit; try the next one instead of abandoning them all
}
if (isMatch(pattern, searchString, newCandidateSolution,newMappedSets))
return true;
}
}
return false;
}
private static boolean makesSenseToInsert(TreeMap<Integer, Pair<Integer, Integer>> newCandidateSolution, Integer startIndex, Pair<Integer, Integer> candidatePair) {
if(newCandidateSolution.size() == 0)
return true;
if(newCandidateSolution.floorEntry(startIndex).getValue().getRight() > candidatePair.getLeft())
return false;
Map.Entry<Integer, Pair<Integer, Integer>> ceilingEntry = newCandidateSolution.ceilingEntry(startIndex);
if(ceilingEntry !=null)
if(ceilingEntry.getValue().getLeft() < candidatePair.getRight())
return false;
return true;
}
private static boolean isValidSolution( Map<Integer, Pair<Integer, Integer>> candidateSolution,String searchString, char [] pattern, Set<String> mappedStrings){
List<Pair<Integer,Integer>> values = Lists.newArrayList(candidateSolution.values());
return areIntegersConsecutive(Lists.newArrayList(candidateSolution.keySet())) &&
arePairsConsecutive(values) &&
values.get(values.size() - 1).getRight() == searchString.length() &&
patternsAreUnique(pattern,mappedStrings);
}
private static boolean patternsAreUnique(char[] pattern, Set<String> mappedStrings) {
Set<Character> uniquePatterns = Sets.newHashSet();
for(Character character:pattern)
uniquePatterns.add(character);
return uniquePatterns.size() == mappedStrings.size();
}
private static List<Integer> getNextUnmappedPatternOccurrences(Map<Integer, Pair<Integer, Integer>> candidateSolution, char[] searchArray){
List<Integer> allMappedIndexes = Lists.newLinkedList(candidateSolution.keySet());
if(allMappedIndexes.size() == 0){
return occurrencesOfCharacterInArray(searchArray,searchArray[0]);
}
if(allMappedIndexes.size() == searchArray.length){
return Lists.newArrayList();
}
for(int i = 0; i < allMappedIndexes.size()-1; i++){
if(!areIntegersConsecutive(allMappedIndexes.get(i),allMappedIndexes.get(i+1))){
return occurrencesOfCharacterInArray(searchArray,searchArray[i+1]);
}
}
List<Integer> listOfNextUnmappedPattern = Lists.newArrayList();
listOfNextUnmappedPattern.add(allMappedIndexes.size());
return listOfNextUnmappedPattern;
}
private static String substring(String string, Pair<Integer,Integer> bounds){
return string.substring(bounds.getLeft(),bounds.getRight());
}
private static List<Pair<Integer, Integer>> sectionsOfUnmappedStrings(String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution) {
if(candidateSolution.size() == 0) {
return Lists.newArrayList(Pair.of(0, searchString.length()));
}
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = Lists.newArrayList();
List<Pair<Integer,Integer>> allMappedPairs = Lists.newLinkedList(candidateSolution.values());
// Don't have to worry about the first index being mapped because of the way the first candidate solution is made
for(int i = 0; i < allMappedPairs.size() - 1; i++){
if(!arePairsConsecutive(allMappedPairs.get(i), allMappedPairs.get(i + 1))){
Pair<Integer,Integer> candidatePair = Pair.of(allMappedPairs.get(i).getRight(), allMappedPairs.get(i + 1).getLeft());
sectionsOfUnmappedStrings.add(candidatePair);
}
}
Pair<Integer,Integer> lastMappedPair = allMappedPairs.get(allMappedPairs.size() - 1);
if(lastMappedPair.getRight() != searchString.length()){
sectionsOfUnmappedStrings.add(Pair.of(lastMappedPair.getRight(),searchString.length()));
}
return sectionsOfUnmappedStrings;
}
public static boolean areIntegersConsecutive(List<Integer> integers){
for(int i = 0; i < integers.size() - 1; i++)
if(!areIntegersConsecutive(integers.get(i),integers.get(i+1)))
return false;
return true;
}
public static boolean areIntegersConsecutive(int left, int right){
return left == (right - 1);
}
public static boolean arePairsConsecutive(List<Pair<Integer,Integer>> pairs){
for(int i = 0; i < pairs.size() - 1; i++)
if(!arePairsConsecutive(pairs.get(i), pairs.get(i + 1)))
return false;
return true;
}
public static boolean arePairsConsecutive(Pair<Integer, Integer> left, Pair<Integer, Integer> right){
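// Compare boxed Integers with equals(); == compares references and can fail for values above 127.
return left.getRight().equals(right.getLeft());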
}
public static List<Integer> occurrencesOfCharacterInArray(char[] searchArray, char searchCharacter){
assert(searchArray.length>0);
List<Integer> occurrences = Lists.newLinkedList();
for(int i = 0;i<searchArray.length;i++){
if(searchArray[i] == searchCharacter)
occurrences.add(i);
}
return occurrences;
}
public static List<Pair<Integer,Integer>> findAllInstancesOfSubstringInString(String searchString, String substring, Pair<Integer,Integer> bounds){
String string = substring(searchString,bounds);
assert(StringUtils.isNoneBlank(substring,string));
int lastIndex = 0;
List<Pair<Integer,Integer>> listOfOccurrences = Lists.newLinkedList();
while(lastIndex != -1){
lastIndex = string.indexOf(substring,lastIndex);
if(lastIndex != -1){
int newIndex = lastIndex + substring.length();
listOfOccurrences.add(Pair.of(lastIndex + bounds.getLeft(), newIndex + bounds.getLeft()));
lastIndex = newIndex;
}
}
return listOfOccurrences;
}
}
It works with the cases provided, but is not thoroughly tested. Let me know if there are any mistakes.
ORIGINAL RESPONSE:
Assuming the string you are searching can contain tokens of arbitrary length (as some of your examples do), then:
You want to start by trying to break your string into parts that match the pattern, looking for contradictions along the way to cut down on your search tree.
When you start processing, you select N characters from the beginning of the string. Now, go and see if you can find that substring in the rest of the string. If you can't, then it can't possibly be a solution. If you can, then your string looks something like this:
(N characters)<...>[(N characters)<...>], where each <...> contains zero or more characters and the <...> parts aren't necessarily the same substring. What's inside [] can repeat a number of times equal to the number of times (N characters) appears in the string.
Now you have the first letter of your pattern matched. You're not sure whether the rest of the pattern matches, but you can basically reuse this algorithm (with modifications) to interrogate the <...> parts of the string.
You would do this for N = 1,2,3,4...
Make sense?
I'll work an example (which doesn't cover all cases, but hopefully illustrates the idea). Note: when I'm referring to substrings in the pattern I'll use single quotes, and when I'm referring to substrings of the string I'll use double quotes.
isMatch("ababac", "bluegreenbluegreenbluewhite")
OK, 'a' is my first pattern.
For N = 1 I get the string "b".
Where is "b" in the search string?
bluegreenbluegreenbluewhite.
OK, so at this point the string MIGHT match with "b" being the pattern 'a'. Let's see if we can do the same with the pattern 'b'. Logically, 'b' MUST be the entire string "luegreen" (because it's squeezed between two consecutive 'a' patterns). Then I check between the 2nd and 3rd 'a'. Yup, it's "luegreen".
OK, so far I've matched all but the 'c' of my pattern. Easy case: it's the rest of the string. It matches.
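To double-check the worked example, one can rebuild the string from the mapping it arrives at ('a' is "b", 'b' is "luegreen", and 'c' is the rest, "luewhite") and compare the result with the search string. The class WorkedExampleCheck and its expand helper are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; it only verifies a mapping, it does not search for one.
class WorkedExampleCheck {

    // Rebuilds the string that a pattern produces under a letter-to-substring mapping.
    static String expand(String pattern, Map<Character, String> mapping) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            sb.append(mapping.get(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> mapping = new HashMap<>();
        mapping.put('a', "b");
        mapping.put('b', "luegreen");
        mapping.put('c', "luewhite");
        // Compare the expansion with the search string from the example.
        System.out.println(expand("ababac", mapping).equals("bluegreenbluegreenbluewhite"));
    }
}
```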
This is basically writing a Perl regex parser: ababc = (.+)(.+)(\1)(\2)(.+). So you just have to convert the pattern to a Perl regex.
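As a small illustration, Java's java.util.regex supports the same backreference syntax, so the regex for ababc can be tried directly. The class and method names below are hypothetical; the regex itself is the one given above, with backslashes doubled for the Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the hand-written regex for the pattern "ababc".
class BackreferenceDemo {

    // \1 and \2 refer back to whatever the first and second groups captured.
    static final Pattern ABABC = Pattern.compile("(.+)(.+)(\\1)(\\2)(.+)");

    static boolean matchesAbabc(String input) {
        // matches() requires the whole input to fit the regex.
        return ABABC.matcher(input).matches();
    }
}
```

Here matchesAbabc("bluegreenbluegreenwhite") is true (for instance with "blue", "green", "white"), while matchesAbabc("abcd") is false because five non-empty pieces cannot fit in four characters. Backtracking regex engines can still take exponential time on inputs like this, so this does not by itself meet the complexity the interviewer asked for.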
Here's a sample snippet of my code:
import java.util.Arrays;

public static final boolean isMatch(String patternStr, String input) {
    // Initial check: if all the characters in the pattern string are unique
    // (degenerate case), immediately return true.
    char[] patt = patternStr.toCharArray();
    Arrays.sort(patt);
    boolean uniqueCase = true;
    for (int i = 1; i < patt.length; i++) {
        if (patt[i] == patt[i - 1]) {
            uniqueCase = false;
            break;
        }
    }
    if (uniqueCase) {
        return true;
    }
    String t1 = patternStr;
    String t2 = input;
    if (patternStr.length() == 0 && input.length() == 0) {
        return true;
    } else if (patternStr.length() != 0 && input.length() == 0) {
        return false;
    } else if (patternStr.length() == 0 && input.length() != 0) {
        return false;
    }
    int count = 0;
    StringBuilder sb = new StringBuilder();
    char[] chars = input.toCharArray();
    String match = "";
    // First read: grow the buffer until it no longer occurs in the rest of the input.
    for (int i = 0; i < chars.length; i++) {
        sb.append(chars[i]);
        count++;
        if (!input.substring(count, input.length()).contains(sb.toString())) {
            // Drop the last character; what remains is the string for the first pattern character.
            match = sb.delete(sb.length() - 1, sb.length()).toString();
            break;
        }
    }
    if (match.length() == 0) {
        match = t2;
    }
    // Based on that character, update the pattern string and the input string.
    t1 = t1.replace(String.valueOf(t1.charAt(0)), "");
    t2 = t2.replace(match, "");
    return isMatch(t1, t2);
}
I decided to first parse the pattern string and determine whether it contains any repeated characters. For example, in "aab" the letter "a" is used twice, so "a" cannot map to something else the second time. Otherwise, if there are no repeated characters, as in "abc", it doesn't matter what our input string is: every pattern character is unique, so it doesn't matter what each one maps to (the degenerate case).
If there are repeated characters in the pattern string, I begin to check what each character maps to. Unfortunately, without knowing the delimiter I can't know how long each string is. Instead, I read one character at a time into a buffer and check whether the rest of the input still contains the buffered string, adding characters letter by letter until the buffer can no longer be found in the rest of the input. Once the string is determined (it is now in the buffer), I delete all occurrences of it from the input string, delete the corresponding character from the pattern string, and recurse.
Apologies if my explanation wasn't very clear; I hope the code is clear, though.