Check anagrams existence - java

Compare 2 Strings and return if they are anagrams or not.
I have a working code:
import java.util.*;
public class HelloWorld {
public static void main(String[] args) {
HashMap<String, Integer> map= new HashMap<>();
HashMap<String, Integer> map1= new HashMap<>();
String str1 = "abaa";
String str2 = "baaa";
String str3 = "bbbb"; //false
for(int i=0 ; i < str1.length(); i++){ //str1 map
String value = String.valueOf(str1.charAt(i));
if (map1.containsKey(value)) {
map1.put(value, map1.get(value) + 1);
} else {
// No such key
map1.put(value, 1);
}
}
for(int i=0 ; i < str3.length(); i++){ //str3 map (swap in str2 to test the true case)
String value = String.valueOf(str3.charAt(i));
if (map.containsKey(value)) {
map.put(value, map.get(value) + 1);
} else {
// No such key
map.put(value, 1);
}
}
if(map1.equals(map)){
System.out.println("true"); //anagrams
} else{
System.out.println("false"); //not anagrams
}
}
}
It prints true for str1, str2 and false for str1, str3, as it should.
I did this using hashmaps though, and I was wondering if this is efficient. How can I calculate the efficiency of this? What is a more efficient method?
My estimate of the efficiency: two O(n) loops, with every HashMap call being O(1). Is that right?

The computational complexity of your implementation is O(n), assuming n is the number of characters for each string, all having the same length. You have two operations:
Building each HashMap is O(n): for each of the n chars you do one lookup and one insert, each of which is O(1)
Comparing two HashMaps is also O(n): for each key in one, you look it up in the other and compare the two values, each of which is O(1)
Together, these operations still run in O(n). This assumes that your HashMap implementation does not have too many collisions for each bucket.

The complexity is O(n) in your case. The final map comparison cannot cost more than O(n), so even in the worst case you have three O(n) steps, which is still O(n) in total.
I have a suggestion to improve your solution:
You can use a simple int[26] array instead of a HashMap, since you have only 26 letters (assuming lowercase a-z only).
Since a new array holds only 0 values, you only have to do ++array[str1.charAt(i) - 97] (no if/else required)
In the second for, you decrement for each character of the other string: --array[str2.charAt(i) - 97] (no if/else required)
For the final step you go through the 26 items and print "is not anagram" and return if you find array[i] != 0, or print "is anagram" after this for
It's still O(n), but I think it uses less memory and the last step is clearer, because it is a constant 26 iterations, that is O(1)
Edited: I updated the code because I forgot you use Java and not C++, sorry. 97 is the char value of 'a', used to normalize the letters from 97-122 to 0-25
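As a rough sketch of the array-based counting described above (the class and method names are made up for illustration, and it assumes both strings contain only lowercase a-z):

```java
class CountingAnagramSketch {
    // Assumes both inputs contain only the letters 'a'..'z'.
    static boolean isAnagram(String str1, String str2) {
        if (str1.length() != str2.length()) {
            return false;
        }
        int[] counts = new int[26];            // Java zero-initializes int arrays
        for (int i = 0; i < str1.length(); i++) {
            counts[str1.charAt(i) - 'a']++;    // 'a' is 97, so 'a'..'z' maps to 0..25
        }
        for (int i = 0; i < str2.length(); i++) {
            counts[str2.charAt(i) - 'a']--;
        }
        for (int c : counts) {                 // constant 26 iterations
            if (c != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("abaa", "baaa")); // true
        System.out.println(isAnagram("abaa", "bbbb")); // false
    }
}
```

Subtracting the char literal 'a' avoids the String.valueOf call entirely; a character outside a-z would throw an ArrayIndexOutOfBoundsException, so real input may need validation first.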

Related

Is there an efficient algorithm for outputting all strings stored in a sorted lexicographically list that are a permutation of an input string?

I would like to find the most efficient algorithm for this problem:
Given a string str and a list of strings lst that consists of only lowercase English characters and is sorted lexicographically, find all the words in lst that are a permutation of str.
for example:
str = "cat", lst = {"aca", "acc", "act", "cta", "tac"}
would return: {"act", "cta", "tac"}
I already have an algorithm that doesn't take advantage of the fact that lst is lexicographically ordered, and I am looking for the most efficient algorithm that takes advantage of this fact.
My algorithm goes like this:
public List<String> getPermutations(String str, List<String> lst){
List<String> res = new ArrayList<>();
for (String word : lst)
if (checkPermutation(word, str))
res.add(word);
return res;
}
public boolean checkPermutation(String word1, String word2) {
if (word1.length() != word2.length())
return false;
int[] count = new int[26];
int i;
for (i = 0; i < word1.length(); i++) {
count[word1.charAt(i) - 'a']++;
count[word2.charAt(i) - 'a']--;
}
for (i = 0; i < 26; i++)
if (count[i] != 0) {
return false;
}
return true;
}
Total runtime is O(NK) where N is the number of strings in lst, and K is the length of str.
One simple optimisation (that only becomes meaningful for really large data sets, as it doesn't really improve the O(NK)):
put all the characters of your incoming str into a Set strChars
now: when iterating the words in your list: fetch the first character of each entry
if strChars.contains(charFromListEntry): check whether it is a permutation
else: obviously, that list word can't be a permutation
Note: the sorted ordering doesn't help much here: because you still have to check the first char of the next string from your list.
There might be other checks to avoid the costly checkPermutation() run, for example to first compare the lengths of the words: when the list string is shorter than the input string, it obviously can't be a permutation of all chars.
But as said, in the end you have to iterate over all entries in your list and determine whether an entry is a permutation. There is no way avoiding the corresponding "looping". The only thing you can affect is the cost that occurs within your loop.
Finally: if your List of strings were a Set, then you could "simply" compute all permutations of your incoming str, and check for each permutation whether it is contained in that Set. But of course, in order to turn a list into a set, you have to iterate over the whole list.
Instead of iterating over the list and checking each element for being a permutation of your string, you can iterate over all permutations of the string and check each presence in the list using binary search.
E.g.
public List<String> getPermutations(String str, List<String> lst){
List<String> res = new ArrayList<>();
perm(str, (1L << str.length()) - 1, new StringBuilder(), lst, res);
return res;
}
private void perm(String source, long unused,
StringBuilder sb, List<String> lst, List<String> result) {
if(unused == 0) {
int i = Collections.binarySearch(lst, sb.toString());
if(i >= 0) result.add(lst.get(i));
}
for(long r = unused, l; (l = Long.highestOneBit(r)) != 0; r-=l) {
sb.append(source.charAt(Long.numberOfTrailingZeros(l)));
perm(source, unused & ~l, sb, lst, result);
sb.setLength(sb.length() - 1);
}
}
Now, the time complexity is O(K! × log N) string comparisons (each costing up to O(K)), which is not necessarily better than the O(NK) of your approach. It heavily depends on the magnitude of K and N. If the string is really short and the list really large, it may have an advantage. Note that the long bit mask limits str to at most 63 characters (far more than K! allows in practice), and that a str with repeated letters generates the same permutation more than once, so a matching word can then appear in the result several times.
There are a lot of optimizations imaginable. E.g. instead of constructing each permutation, followed by a binary search, each recursion step could do a partial search to identify the potential search range for the next step and skip when it's clear that the permutations can't be contained. While this could raise the performance significantly, it can't change the fundamental time complexity, i.e. the worst case.
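A much simpler variant of the range idea, which is not the per-step partial search described above but a swapped-in technique, can be sketched as follows. Every permutation of str lies lexicographically between its letters sorted ascending and its letters sorted descending, so a binary search can skip everything before that range and the scan can stop after it. The class name and the lowerBound helper are hypothetical, and the sketch assumes lst is a random-access list sorted in String.compareTo order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedRangeSketch {
    static List<String> getPermutations(String str, List<String> lst) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars);
        String smallest = new String(chars);                               // e.g. "act"
        String largest = new StringBuilder(smallest).reverse().toString(); // e.g. "tca"

        List<String> res = new ArrayList<>();
        for (int i = lowerBound(lst, smallest); i < lst.size(); i++) {
            String word = lst.get(i);
            if (word.compareTo(largest) > 0) {
                break;                       // past the largest permutation: stop scanning
            }
            if (word.length() != str.length()) {
                continue;
            }
            char[] w = word.toCharArray();
            Arrays.sort(w);
            if (Arrays.equals(w, chars)) {   // same letters with the same counts
                res.add(word);
            }
        }
        return res;
    }

    // Index of the first element that is >= key (lst.size() if there is none).
    static int lowerBound(List<String> lst, String key) {
        int lo = 0, hi = lst.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lst.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> lst = Arrays.asList("aca", "acc", "act", "cta", "tac");
        System.out.println(getPermutations("cat", lst)); // [act, cta, tac]
    }
}
```

In the worst case the range still covers the whole list, so this does not improve the O(NK) bound; it only helps when many words fall outside the range.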

ArrayList vs HashMap time complexity

The scenario is the following:
You have 2 strings (s1, s2) and want to check whether one is a permutation of the other, so you generate all permutations of, let's say, s1, store them, and then iterate over them and compare each against s2 until it is either found or not.
Now, in this scenario, I am deliberating whether an ArrayList or a HashMap is better to use when considering strictly time complexity, as I believe both have O(N) space complexity.
According to the javadocs, ArrayList has a search complexity of O(N) whereas HashMap is O(1). If this is the case, is there any reason to favor using ArrayList over HashMap here, since HashMap would be faster?
The only potential downside I could think of is that your (k,v) pairs might be a bit weird if the key equals the value, i.e. {k = "ABCD", v = "ABCD"}, etc.
As shown here:
import java.io.*;
import java.util.*;
class GFG{
static int NO_OF_CHARS = 256;
/* Checks whether two strings are
permutations of each other */
static boolean arePermutation(char str1[], char str2[])
{
// Strings of different length cannot be permutations
// of each other, e.g. "aaca" and "aca", so bail out
// before doing any counting
if (str1.length != str2.length)
return false;
// Create 2 count arrays; Java initializes
// all int array elements to 0
int count1[] = new int [NO_OF_CHARS];
int count2[] = new int [NO_OF_CHARS];
int i;
// For each character in the input strings,
// increment the count in the corresponding
// count array (assumes every char is below NO_OF_CHARS)
for (i = 0; i < str1.length; i++)
{
count1[str1[i]]++;
count2[str2[i]]++;
}
// Compare count arrays
for (i = 0; i < NO_OF_CHARS; i++)
if (count1[i] != count2[i])
return false;
return true;
}
/* Driver program to test arePermutation */
public static void main(String args[])
{
char str1[] = ("geeksforgeeks").toCharArray();
char str2[] = ("forgeeksgeeks").toCharArray();
if ( arePermutation(str1, str2) )
System.out.println("Yes");
else
System.out.println("No");
}
}
// This code is contributed by Nikita Tiwari.
If you're glued to your implementation, use a HashSet; it still has O(1) lookup time, just without values attached to the keys
You can use a HashSet, as you only need to store the strings themselves.
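If you do keep the generate-all-permutations approach, the HashSet suggestion might look like the sketch below. The class and method names are hypothetical, and because the set holds up to K! strings for an input of length K, this is only workable for short inputs:

```java
import java.util.HashSet;
import java.util.Set;

class PermutationSetSketch {
    // Adds every permutation of 'rest', prefixed by 'prefix', to 'out'.
    static void collect(String prefix, String rest, Set<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            collect(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1),
                    out);
        }
    }

    static boolean isPermutation(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return false;
        }
        Set<String> perms = new HashSet<>();   // duplicates collapse automatically
        collect("", s1, perms);
        return perms.contains(s2);             // expected O(1) lookup
    }

    public static void main(String[] args) {
        System.out.println(isPermutation("ABCD", "DCBA")); // true
        System.out.println(isPermutation("ABCD", "ABCE")); // false
    }
}
```

The lookup is cheap, but building the set dominates the cost, so the counting approach shown in the GFG code remains the better choice for anything but tiny strings.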

How to handle the time complexity for permutation of strings during anagrams search?

I have a program that checks whether two strings are anagrams or not.
It works fine for input strings shorter than 10 characters.
When I input two strings of equal length that are longer than 10 characters, the program keeps running and doesn't produce an answer.
My idea is that if two strings are anagrams, one string must be a permutation of the other.
This program generates all permutations of one string, and after that it checks whether any of them matches the other string. I want the comparison to ignore case.
It returns false when no matching string is found or the strings being compared are not equal in length; otherwise it returns true.
import java.util.ArrayList;
public class Anagrams {
static ArrayList<String> str = new ArrayList<>();
static boolean isAnagram(String a, String b) {
// there is no need for checking these two
// strings because their length doesn't match
if (a.length() != b.length())
return false;
// drop permutations left over from a previous call
Anagrams.str.clear();
Anagrams.permute(a, 0, a.length() - 1);
for (String string : Anagrams.str)
if (string.equalsIgnoreCase(b))
// returns true if there is a matching string
// for b in the permuted string list of a
return true;
// returns false if there is no matching string
// for b in the permuted string list of a
return false;
}
private static void permute(String str, int l, int r) {
if (l == r)
// adds the permuted strings to the ArrayList
Anagrams.str.add(str);
else {
for (int i = l; i <= r; i++) {
str = Anagrams.swap(str, l, i);
Anagrams.permute(str, l + 1, r);
str = Anagrams.swap(str, l, i);
}
}
}
public static String swap(String a, int i, int j) {
char temp;
char[] charArray = a.toCharArray();
temp = charArray[i];
charArray[i] = charArray[j];
charArray[j] = temp;
return String.valueOf(charArray);
}
}
1. I want to know why this program can't process larger strings
2. I want to know how to fix this problem
Can you figure it out?
To solve this problem and check whether two strings are anagrams you don't actually need to generate every single permutation of the source string and then match it against the second one. What you can do instead, is count the frequency of each character in the first string, and then verify whether the same frequency applies for the second string.
The solution above requires one pass over each string, hence Θ(n) time complexity. In addition, you need auxiliary storage for counting characters, which is Θ(1) space for a fixed-size alphabet. These are asymptotically tight bounds.
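A minimal sketch of that frequency count, ignoring case as the question asks, could look like this. The class name is hypothetical, and lower-casing with Locale.ROOT is an assumption about what "ignore case" should mean here:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class FrequencyAnagramSketch {
    static boolean isAnagram(String a, String b) {
        String x = a.toLowerCase(Locale.ROOT);
        String y = b.toLowerCase(Locale.ROOT);
        if (x.length() != y.length()) {
            return false;
        }
        // Count every character of the first string.
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < x.length(); i++) {
            counts.merge(x.charAt(i), 1, Integer::sum);
        }
        // Use up those counts with the characters of the second string.
        for (int i = 0; i < y.length(); i++) {
            char c = y.charAt(i);
            Integer left = counts.get(c);
            if (left == null) {
                return false;              // y has a character that x lacks
            }
            if (left == 1) {
                counts.remove(c);
            } else {
                counts.put(c, left - 1);
            }
        }
        return counts.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("Listen", "Silent"));             // true
        System.out.println(isAnagram("conversation", "voicesranton")); // true
        System.out.println(isAnagram("abc", "abd"));                   // false
    }
}
```

Because the work is two linear passes, strings far longer than 10 characters are handled immediately.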
You're doing it in a very expensive way. The time complexity here is factorial, which grows even faster than exponential: a string of length n has n! permutations, so generating all of them takes a long time once the input is longer than about 10 characters.
11 factorial = 39916800
12 factorial = 479001600
13 factorial = 6227020800
and so on...
So it's not that you will never get an output for longer strings; given enough time and memory, you eventually would.
At something like 20-30 characters I think it would take years to produce any output. Stack depth is not the limit here, since the recursion is only as deep as the string is long; the list holding every permutation will exhaust the heap long before that.
Fact: 50 factorial is so large that it exceeds the number of grains of sand on Earth, and computers cannot enumerate that many items in any practical amount of time.
That is also why you are asked to include special characters in passwords: a bigger character set makes the number of possible combinations too large to brute-force in years. Encryption relies on the same limitation.
So you don't have to, and should not, solve it this way; it is overkill.
Why don't you take each character from one string and match it against every character of the other string? That is quadratic in the worst case.
And if you sort both strings, then you can just say
string1.equals(string2)
true means anagram
false means not anagram
The comparison itself takes linear time; the sorting adds O(n log n) on top of that.
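You can first get arrays of characters from these strings, then sort them, and then compare the two sorted arrays (this needs import java.util.Arrays). It gives the right answer for the examples below, including the one with surrogate pairs, but note that sorting char values splits each pair into its two halves, so two different strings built from the same surrogate halves would be wrongly reported as anagrams; sorting code points instead avoids that.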
public static void main(String[] args) {
System.out.println(isAnagram("ABCD", "DCBA")); // true
System.out.println(isAnagram("𝗔𝗕𝗖𝗗", "𝗗𝗖𝗕𝗔")); // true
}
static boolean isAnagram(String a, String b) {
// invalid incoming data
if (a == null || b == null
|| a.length() != b.length())
return false;
char[] aArr = a.toCharArray();
char[] bArr = b.toCharArray();
Arrays.sort(aArr);
Arrays.sort(bArr);
return Arrays.equals(aArr, bArr);
}
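For input that may contain characters outside the Basic Multilingual Plane, a variant that sorts code points rather than char values can be sketched like this (the class name is hypothetical; String.codePoints() is available from Java 8 on):

```java
import java.util.Arrays;

class CodePointAnagramSketch {
    static boolean isAnagram(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        // Each surrogate pair becomes one int, so pairs are never split apart.
        int[] aArr = a.codePoints().sorted().toArray();
        int[] bArr = b.codePoints().sorted().toArray();
        return Arrays.equals(aArr, bArr);
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("ABCD", "DCBA")); // true
        // Same four surrogate halves, but paired differently,
        // so these are different code points:
        System.out.println(isAnagram("\uD835\uDDD4\uD83D\uDE00",
                                     "\uD835\uDE00\uD83D\uDDD4")); // false
    }
}
```

The second call is the case where sorting a char[] would report a false positive, because both strings contain the same four surrogate halves.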
See also: Check if one array is a subset of the other array - special case

Space complexity of Partition Label Problem

A string S of lowercase letters is given. We want to partition this string into as many parts as possible so that each letter appears in at most one part, and return a list of integers representing size of each part.
Input: S = "ababcbacadefegdehijhklij"
Output: [9,7,8]
Explanation:
The partition is "ababcbaca", "defegde", "hijhklij".
This is a partition so that each letter appears in at most one part.
A partition like "ababcbacadefegde", "hijhklij" is incorrect, because it splits S into fewer parts.
Below is my Code for the above problem:
class Solution {
public List<Integer> partitionLabels(String S) {
char[] st = S.toCharArray();
int k=0,c=0;
List<Integer> res = new ArrayList<Integer> ();
Set<Integer> visited = new HashSet<Integer> ();
for(int i=0 ; i<st.length ; i++)
{
int idx = S.lastIndexOf(st[i]);
if(visited.add(i) && idx>i && idx>k)
{
k = Math.max(k,idx);
visited.add(k);
}
else if(i == k)
{
res.add(i-c+1);
c=i+1;
k++;
}
}
return res;
}
}
The above code works, and I believe its time complexity is O(n) since it visits each element once.
But what is the space complexity? Since I am using a char array whose size is the same as the String S, and a Set whose maximum size can be the size of the String S, is it also O(n)?
As you only use one dimensional arrays and lists your space complexity is O(n).
But your time complexity is O(n²) in the worst case, because each S.lastIndexOf(st[i]) call is O(n).
If you also want the time to be O(n), you have to pre-process the string once (= O(n)) to determine the last occurrence of each character, for instance with a map, to keep the retrieval time O(1).
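The pre-processing step can be sketched as follows. Since S contains only lowercase letters, this sketch uses an int[26] in place of a map (an assumption that follows from the problem statement), and the class name is made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class PartitionLabelsSketch {
    static List<Integer> partitionLabels(String s) {
        // One pass to record the last index of every letter.
        int[] last = new int[26];
        for (int i = 0; i < s.length(); i++) {
            last[s.charAt(i) - 'a'] = i;
        }
        List<Integer> res = new ArrayList<>();
        int start = 0; // first index of the current part
        int end = 0;   // furthest last-occurrence seen within the current part
        for (int i = 0; i < s.length(); i++) {
            end = Math.max(end, last[s.charAt(i) - 'a']); // O(1) lookup
            if (i == end) {          // no letter of this part occurs later
                res.add(i - start + 1);
                start = i + 1;
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(partitionLabels("ababcbacadefegdehijhklij")); // [9, 7, 8]
    }
}
```

Both passes are linear, and apart from the result list the extra space is a fixed 26-entry array, so the char array copy and the visited set are no longer needed.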

First non-repeating character in a stream

My answer to this question is as follows, but I want to know if I can use this code and what will be the complexity:
import java.util.LinkedHashMap;
import java.util.Map.Entry;
public class FirstNonRepeatingCharacterinAString {
private char firstNonRepeatingCharacter(String str) {
LinkedHashMap<Character, Integer> hash =
new LinkedHashMap<Character, Integer>();
for(int i = 0 ; i< str.length() ; i++)
{
if(hash.get(str.charAt(i))==null)
hash.put(str.charAt(i), 1);
else
hash.put(str.charAt(i), hash.get(str.charAt(i))+1);
}
System.out.println(hash.toString());
for(Entry<Character, Integer> c : hash.entrySet())
{
if(c.getValue() == 1)
return c.getKey();
}
return 0 ;
}
public static void main(String args[])
{
String str = "geeksforgeeks";
FirstNonRepeatingCharacterinAString obj =
new FirstNonRepeatingCharacterinAString();
char c = obj.firstNonRepeatingCharacter(str);
System.out.println(c);
}
}
Your question about whether you "can use this code" is a little ambiguous - if you wrote it, I'd think you can use it :)
As for the complexity, it is O(n) where n is the number of characters in the String. To count the number of occurrences, you must iterate over the entire String, plus iterate over them again to find the first one with a count of 1. In the worst case, you have no non-repeating characters, or the only non-repeating character is the last one. In either case, you have to iterate over the whole String once more. So it's O(n+n) = O(n).
EDIT
One thing worth checking, by the way, is the iteration order. A LinkedHashMap in its default insertion-order mode does not re-order an entry when put(Character,Integer) is called for a key that is already present; the Javadoc states that insertion order is not affected when a key is re-inserted. Only a map created in access-order mode (the three-argument constructor with accessOrder set to true) moves entries when they are touched. So your second loop does visit the characters in order of first appearance, and the first entry whose value is 1 is the first non-repeating character. If you want to avoid the repeated get and put calls, you could use a LinkedHashMap<Character,int[]> instead, check for the presence of the key before putting, and, if it exists, merely increment the value stored in the int[]. Note also that the second loop does not always go through the entire entry set: it returns as soon as it finds a count of 1.
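A compact way to write the counting pass, offered as a sketch rather than a drop-in replacement, is Map.merge. It relies on the documented behaviour that an insertion-order LinkedHashMap keeps the position of a key when its value is updated; the class and method names here are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FirstUniqueSketch {
    // Returns the first character that occurs exactly once, or 0 if there is none.
    static char firstNonRepeating(String str) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < str.length(); i++) {
            counts.merge(str.charAt(i), 1, Integer::sum); // insert 1 or add 1
        }
        // Entries come back in order of first appearance.
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                return e.getKey();
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeating("geeksforgeeks")); // f
    }
}
```

The complexity is unchanged at O(n); merge only removes the separate get and put calls.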
