Odd comparison problem in checking for anagram


I'm sorry, the title's awful; however, I couldn't think of any better way to summarize my plight.
In trying to solve a problem that involves checking whether one string is an anagram of another, I implemented a solution that removes all spaces from both strings, converts them to character arrays, sorts them, and then checks whether they are equal to each other.
If so, the program prints out "Is an anagram.", otherwise "Is not an anagram."
The problem is that even though my code compiles successfully and runs fine, the end result will always be "Is not an anagram.", regardless of whether or not the two original strings are indeed anagrams of each other. Quick code I inserted for debugging shows that, in a case with an actual anagram, the two character arrays I end up comparing are apparently identical, yet the result of the comparison is false.
I can't tell why exactly this is happening, unless I'm overlooking something incredibly obvious or there are some extra undisplayed characters in what I compare.
Here's the code:
import java.util.Arrays;
import java.util.Scanner;
public class Anagram {
    public static void main(String[] args) {
        char[] test1;
        char[] test2;
        Scanner input = new Scanner(System.in);
        System.out.print("Enter first phrase>");
        test1 = input.nextLine().replaceAll(" ", "").toCharArray();
        Arrays.sort(test1);
        System.out.print("Enter second phrase>");
        test2 = input.nextLine().replaceAll(" ", "").toCharArray();
        Arrays.sort(test2);
        if (test1.equals(test2)) {
            System.out.println("Is an anagram.");
        }
        else {
            System.out.println("Is not an anagram.");
        }
        /* debugging */
        System.out.println(test1);
        System.out.println(test2);
        System.out.println(test1.equals(test2));
    }
}
And the resulting output from a test run:
Enter first phrase>CS AT WATERLOO
Enter second phrase>COOL AS WET ART
Is not an anagram.
AACELOORSTTW
AACELOORSTTW
false
Any and all help is greatly appreciated.

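Use the Arrays.equals() method to compare two arrays. It compares the elements of the arrays, whereas arrays inherit the default Object.equals() method, which only checks whether both references point to the same array object. From the Javadoc for Arrays.equals(char[], char[]):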
Returns true if the two specified arrays of chars are equal to one another. Two arrays are considered equal if both arrays contain the same number of elements, and all corresponding pairs of elements in the two arrays are equal. In other words, two arrays are equal if they contain the same elements in the same order. Also, two array references are considered equal if both are null.
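A minimal sketch of the question's program with the comparison swapped for Arrays.equals(). The class and method names (AnagramCheck, isAnagram) are illustrative; like the original, it only strips space characters and is case-sensitive.

```java
import java.util.Arrays;

public class AnagramCheck {

    // Removes spaces, sorts the remaining characters, and compares the
    // sorted arrays element by element with Arrays.equals().
    static boolean isAnagram(String first, String second) {
        char[] a = first.replace(" ", "").toCharArray();
        char[] b = second.replace(" ", "").toCharArray();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        char[] x = {'a', 'b'};
        char[] y = {'a', 'b'};
        System.out.println(x.equals(y));         // false: compares references
        System.out.println(Arrays.equals(x, y)); // true: compares contents
        System.out.println(isAnagram("CS AT WATERLOO", "COOL AS WET ART")); // true
    }
}
```

With the question's test input, both sorted arrays are AACELOORSTTW, so Arrays.equals() returns true where test1.equals(test2) returned false.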

The .equals method of the array itself doesn't compare the contents of the array.
If you want to do that, you'll have to do it yourself - something like:
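static boolean sameContents(char[] test1, char[] test2) {
    if (test1.length != test2.length) {
        return false; // arrays of different lengths can never match
    }
    for (int i = 0; i < test1.length; i++) {
        if (test1[i] != test2[i]) {
            return false; // first mismatching element
        }
    }
    return true;
}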
EDIT: Or use the static Arrays.equals.

Related

Comparing two char arrays in Java

I am trying to compare two char arrays lexicographically, using loops and arrays only. I solved the task; however, I think my code is bulky and unnecessarily long, and I would like advice on how to optimize it. I am a beginner. See code below:
import java.util.Scanner;
public class CompareCharArrays {
    public static void main(String[] args) {
        //Compare Character Arrays Lexicographically
        //Write a program that compares two char arrays lexicographically (letter by letter).
        // Research how to convert string to char array.
        Scanner scanner = new Scanner(System.in);
        String word1 = scanner.nextLine();
        String word2 = scanner.nextLine();
        char[] firstArray = word1.toCharArray();
        char[] secondArray = word2.toCharArray();
        for (char element : firstArray) {
            System.out.print(element + " ");
        }
        System.out.println();
        for (char element : secondArray) {
            System.out.print(element + " ");
        }
        System.out.println();
        String s = String.valueOf(firstArray);
        String b = String.valueOf(secondArray);
        int result = s.compareTo(b);
        if (result < 0) {
            System.out.println("First");
        } else if (result > 0) {
            System.out.println("Second");
        } else {
            System.out.println("Equal");
        }
    }
}

I think it's pretty normal. You've done it right. There's not much code to reduce here; the best you can do is not write the two for loops that print the char arrays. Or, if you do want to print the two arrays, use System.out.println(Arrays.toString(array_name)); instead of two dedicated for/for-each loops. It loops over the array for you (the output looks like [a, b, c] rather than a b c), and it makes your code look a little cleaner, which is what you are looking for.
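A small sketch of the Arrays.toString() suggestion; the class name and the format helper are hypothetical, added only so the call is easy to test.

```java
import java.util.Arrays;

public class PrintCharArray {

    // Formats the whole array in one call instead of a hand-written loop
    static String format(char[] chars) {
        return Arrays.toString(chars);
    }

    public static void main(String[] args) {
        char[] firstArray = "hello".toCharArray();
        System.out.println(format(firstArray)); // [h, e, l, l, o]
    }
}
```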

As commented by tgdavies, your schoolwork assignment was likely intended for you to compare characters in your own code rather than call String#compareTo.
In real life, sorting words alphabetically is quite complex because of differing cultural norms across languages and dialects. For real work, we rely on collation tools rather than write our own sorting code. For example, an e with an acute accent (é) may sort before or after a plain e, depending on the cultural context.
But for a schoolwork assignment, the goal of the exercise is likely to have you compare each letter of each word by examining its code point number, the number assigned to identify each character defined in Unicode. These code point numbers are assigned by Unicode in roughly alphabetical order. This code point number ordering is not sufficient to do sorting in real work, but is presumably good enough for your assignment, especially for text using only basic American English using letters a-z/A-Z.
So, if the numbers are the same, move to the next character in each word. When you reach the first position where the letters differ, then in overly simplistic terms, you know which word comes first alphabetically. If one word runs out of letters first, the shorter word comes first. If all the numbers are the same and the words have the same length, the words are the same.
Another real world problem is the char type has been legacy since Java 5, essentially broken since Java 2. As a 16-bit value, char is physically incapable of representing most characters.
So instead of char arrays, use int arrays to hold code point integer numbers.
int[] firstWordCodePoints = firstWord.codePoints().toArray() ;
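The approach described above can be sketched as follows: convert each word to code points, walk both arrays until the first mismatch, and fall back to length when one word is a prefix of the other. The names CompareCodePoints and compareWords are hypothetical, and this is plain code-point ordering (so, for example, uppercase Z sorts before lowercase a), not the culture-aware collation mentioned earlier.

```java
import java.util.Scanner;

public class CompareCodePoints {

    // Returns a negative number if first sorts before second, a positive
    // number if it sorts after, and 0 if both words are identical.
    static int compareWords(String first, String second) {
        int[] a = first.codePoints().toArray();
        int[] b = second.codePoints().toArray();
        int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            if (a[i] != b[i]) {
                // The first differing code point decides the order
                return Integer.compare(a[i], b[i]);
            }
        }
        // Every shared position matches: the shorter word comes first
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String word1 = scanner.nextLine();
        String word2 = scanner.nextLine();
        int result = compareWords(word1, word2);
        if (result < 0) {
            System.out.println("First");
        } else if (result > 0) {
            System.out.println("Second");
        } else {
            System.out.println("Equal");
        }
    }
}
```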

What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

I want to write a function that takes string as a parameter and returns a number corresponding to that string.
Integer hashfunction(String a)
{
    //logic
}
Actually, the question I'm solving is as follows:
Given an array of strings, return all groups of strings that are anagrams. Represent a group by a list of integers representing the index in the original list.
Input : cat dog god tca
Output : [[1, 4], [2, 3]]
Here is my implementation:
public class Solution {
    Integer hashfunction(String a)
    {
        int i=0;int ans=0;
        for(i=0;i<a.length();i++)
        {
            ans+=(int)(a.charAt(i));//Adding all ASCII values
        }
        return new Integer(ans);
    }
    // Obviously this approach is incorrect
    public ArrayList<ArrayList<Integer>> anagrams(final List<String> a) {
        int i=0;
        HashMap<String,Integer> hashtable=new HashMap<String,Integer>();
        ArrayList<Integer> mylist=new ArrayList<Integer>();
        ArrayList<ArrayList<Integer>> answer=new ArrayList<ArrayList<Integer>>();
        if(a.size()==1)
        {
            mylist.add(new Integer(1));
            answer.add(mylist);
            return answer;
        }
        int j=1;
        for(i=0;i<a.size()-1;i++)
        {
            hashtable.put(a.get(i),hashfunction(a.get(i)));
            for(j=i+1;j<a.size();j++)
            {
                if(hashtable.containsValue(hashfunction(a.get(j))))
                {
                    mylist.add(new Integer(i+1));
                    mylist.add(new Integer(j+1));
                    answer.add(mylist);
                    mylist.clear();
                }
            }
        }
        return answer;
    }
}
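A quick demonstration of why the character-sum hash is incorrect: two strings that are not anagrams can add up to the same total. The class name is illustrative, and sumHash reproduces only the summing loop from the question.

```java
public class SumHashCollision {

    // The question's idea: add up the char values of the string
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 'a' + 'd' = 97 + 100 = 197 and 'b' + 'c' = 98 + 99 = 197
        System.out.println(sumHash("ad")); // 197
        System.out.println(sumHash("bc")); // 197, yet "ad" and "bc" are not anagrams
    }
}
```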

Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:
Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.
If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.
Two strings are anagrams if they are exhaustively composed of the same set of characters and the same number of each included character. There are a lot of tools available in Java that make this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
private static int hashString(String s) {
    if (s == null) return 1; // A null string hashes like an empty one: 1, the hashCode of an empty List
    List<Character> charList = new ArrayList<>();
    String lowercase = s.toLowerCase(); // This gets us around case sensitivity
    for (int i = 0; i < lowercase.length(); i++) {
        Character c = Character.valueOf(lowercase.charAt(i));
        if (Character.isWhitespace(c)) continue; // spaces don't count
        charList.add(c); // Note the character for future processing...
    }
    // Now we have a list of Characters... Sort it!
    Collections.sort(charList);
    return charList.hashCode(); // See contract of java.util.List#hashCode
}
And voila; you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in Java, an Integer is merely a 32-bit value. This method can only produce about 4.2 billion unique values, and there are a whole lot more than 4.2 billion strings you can throw at it. This method can therefore produce collisions and report two strings as anagrams when they are not. If that's a problem, you might want to consider using BigInteger instead.
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
private static BigInteger hashString(String s) {
    BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!
    if (s == null) return BigInteger.ONE; // An empty/null string will return 1.
    BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
    List<Character> charList = new ArrayList<>();
    String lowercase = s.toLowerCase(); // This gets us around case sensitivity
    for (int i = 0; i < lowercase.length(); i++) {
        Character c = Character.valueOf(lowercase.charAt(i));
        if (Character.isWhitespace(c)) continue; // spaces don't count
        charList.add(c); // Note the character for future processing...
    }
    // Now we have a list of Characters... Sort it!
    Collections.sort(charList);
    // Calculate our bighash, similar to how java's List interface does.
    for (Character c : charList) {
        int charHash = c.hashCode();
        r = r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
    }
    return r;
}

You need a number that is the same for all strings made up of the same characters.
The String.hashCode method returns a number that is the same for all strings made up of the same characters in the same order.
If you can sort all words consistently (for example: alphabetically) then String.hashCode will return the same number for all anagrams.
char[] chars = inputString.toCharArray();
Arrays.sort(chars); // Arrays.sort returns void, so sort first and then build the String
return String.valueOf(chars).hashCode();
Note: this will work for all words that are anagrams (no false negatives), but it may not work for all words that are not anagrams (possibly false positives). String.hashCode() is only a 32-bit int, so two different sorted strings can share a hash code; this can happen even for very short mixed-case strings, and it becomes more likely as the words get longer and the list gets bigger. If you need certainty, compare the sorted strings themselves.
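To make the false-positive risk concrete, here is a small check using the sort-then-hash idea; the class and method names are illustrative. "Aa" and "BB" are not anagrams, yet their sorted forms hash to the same value.

```java
import java.util.Arrays;

public class SortedHashCollision {

    // Sort the characters, then hash the resulting string
    static int sortedHash(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return String.valueOf(chars).hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are already sorted ('A' = 65 < 'a' = 97)
        // 65 * 31 + 97 = 2112 and 66 * 31 + 66 = 2112
        System.out.println(sortedHash("Aa")); // 2112
        System.out.println(sortedHash("BB")); // 2112
    }
}
```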
Also note: this gives you the answer to the (title of the) question, but it isn't enough for the question you're solving. You need to figure out how to relate this number to an index in your original list.
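One way to relate the digest to indices, sketched under a few assumptions: the key is the sorted string itself (which never collides, unlike an int hash), comparison is case-sensitive with no whitespace stripping, and a word with no anagram partner still forms its own one-element group, since the question does not say. AnagramGroups, key and anagrams are hypothetical names; List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnagramGroups {

    // Anagrams share the same characters in sorted order
    static String key(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Groups 1-based indices of words that share a key.
    // LinkedHashMap keeps groups in order of first appearance.
    static List<List<Integer>> anagrams(List<String> words) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            groups.computeIfAbsent(key(words.get(i)), k -> new ArrayList<>())
                  .add(i + 1);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        List<String> words = List.of("cat", "dog", "god", "tca");
        System.out.println(anagrams(words)); // [[1, 4], [2, 3]]
    }
}
```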

What will be the output from the following three code segments?

I'm currently on a self-learning course for Java and have gotten completely stumped at one of the questions, and I was wondering if anyone can help me see sense...
Question: What will be the output from the following three code segments? Explain fully the differences.
public static void method2(){
    String mystring1 = "Hello World";
    String mystring2 = new String("Hello World");
    if (mystring1.equals(mystring2)) {
        System.out.println("M2 The 2 strings are equal");
    } else {
        System.out.println("M2 The 2 strings are not equal");
    }
}
public static void method3(){
    String mystring1 = "Hello World";
    String mystring2 = "Hello World";
    if (mystring1 == mystring2) {
        System.out.println("M3 The 2 strings are equal");
    } else {
        System.out.println("M3 The 2 strings are not equal");
    }
}
The answer I gave:
Method 2:
"M2 The 2 strings are equal"
It returns equal because, even though they are two separate strings, (mystring1.equals(mystring2)) recognises that the two strings have the exact same value. If == were used here, it would return not equal because they are two different objects.
Method 3:
"M2 The 2 strings are equal"
The 2 strings are equal because they are both pointing towards the exact same string in the pool. == was used here making it look at the two values and it recognises that they both have the exact same characters. It recognises that Hello World was already in the pool so it points myString2 towards that string.
I was pretty confident in my answer but it's wrong. Any help?

Both will return true.
1) Two separate String objects are compared, but .equals compares their actual values, which are equal.
2) Only one String object is involved, because both literals are constants at compile time. They are interned, so both variables point to the same object.
This sentence might be your issue:
== was used here making it look at the two values and it recognises that they both have the exact same characters.
== checks for reference equality whereas you're describing value equality.
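A small runnable check of the reference-versus-value distinction, using the question's "Hello World" values; the class and field names are illustrative. It shows that two identical literals are the same object (so method3's == comparison is true), while new String(...) is a separate object that only equals() treats as equal.

```java
public class StringEquality {

    // Identical literals are interned: both fields refer to one pooled object
    static final String LITERAL_1 = "Hello World";
    static final String LITERAL_2 = "Hello World";
    // new String(...) always creates a separate object with the same characters
    static final String CONSTRUCTED = new String("Hello World");

    public static void main(String[] args) {
        System.out.println(LITERAL_1 == LITERAL_2);            // true  (same object)
        System.out.println(LITERAL_1 == CONSTRUCTED);          // false (different objects)
        System.out.println(LITERAL_1.equals(CONSTRUCTED));     // true  (same characters)
        System.out.println(LITERAL_1 == CONSTRUCTED.intern()); // true  (intern() returns the pooled copy)
    }
}
```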

First two are equal, second two are not. But unless you put it into main() method there will be no output at all.
EDIT: second pair are not the same because "==" compares addresses in memory.

You're right about the first one.
However the second would return "M3 The 2 strings are not equal".
This is because == tests for reference equality and since they are two different variables, they would not equal.

String Permutations

I was recently trying to write a script that prints out all the permutations of a word in Java. For some reason it only prints out one. I just can't figure it out!
import java.util.*;
public class AllPermutations {
    ArrayList<String> letters = new ArrayList<String>();
    public void main(){
        letters.add("H");
        letters.add("a");
        letters.add("s");
        permutate("",letters);
    }
    public void permutate(String word, ArrayList<String> lettersLeft){
        if(lettersLeft.size()==0){
            System.out.println(word);
        }else{
            for(int i=0;i<lettersLeft.size();i++){
                String newWord = new String();
                newWord = word+lettersLeft.get(i);
                lettersLeft.remove(i);
                permutate(newWord, lettersLeft);
            }
        }
    }
}

You need to add the letter you removed back to the lettersLeft list:
public void permutate(String word, ArrayList<String> lettersLeft){
    if(lettersLeft.size()==0){
        System.out.println(word);
    }else{
        for(int i=0;i<lettersLeft.size();i++){
            String temp = lettersLeft.remove(i);
            String newWord = word+temp;
            permutate(newWord, lettersLeft);
            lettersLeft.add(i, temp);
        }
    }
}
I haven't tested it, but I think it should work.
The problem is that the ArrayList is shared: Java passes each call a copy of the reference, not a copy of the list, so every level of the recursion removes letters from the same list. Therefore, once you reach the bottom of your recursion tree, lettersLeft contains 0 elements, and when you go back up, it still has 0 elements.
As a side note, StringBuilder/StringBuffer is better suited to building permutations: String is immutable, so every concatenation creates a new String, and this approach creates at least n! of them. The difference between StringBuilder and StringBuffer is up to you to discover.
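A sketch of the StringBuilder idea with backtracking. Instead of removing letters from the list, it marks them in a boolean[] used array (a swapped-in technique, not the code above) and undoes each append with setLength(), so only finished permutations are turned into Strings. The class and method names are hypothetical, results are collected into a list rather than printed, and List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderPermutations {

    static void permute(List<String> letters, boolean[] used, StringBuilder prefix,
                        int depth, List<String> out) {
        if (depth == letters.size()) {
            out.add(prefix.toString()); // one complete permutation
            return;
        }
        for (int i = 0; i < letters.size(); i++) {
            if (used[i]) continue;       // letter already placed in prefix
            int mark = prefix.length();  // position to undo back to
            used[i] = true;
            prefix.append(letters.get(i));
            permute(letters, used, prefix, depth + 1, out);
            prefix.setLength(mark);      // backtrack: drop the appended letter
            used[i] = false;
        }
    }

    static List<String> permutations(List<String> letters) {
        List<String> out = new ArrayList<>();
        permute(letters, new boolean[letters.size()], new StringBuilder(), 0, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(permutations(List.of("H", "a", "s")));
        // [Has, Hsa, aHs, asH, sHa, saH]
    }
}
```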

The reason is that lettersLeft always refers to the same list object: Java passes the reference, not a copy of the list. Once you remove a letter from lettersLeft, it is permanently removed. So on the first iteration you get "Has" printed out. Once that finishes, the recursion backs up a level to make the second iteration, but what do you know? lettersLeft is empty, so the loop ends without ever reaching the print statement again, and no other permutation is produced. To fix this, create a local copy of the list in each call, just like you did with newWord. Hope that helps.
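A sketch of the local-copy fix described above: each call copies the remaining letters before removing one, so the caller's list is never modified and nothing has to be added back. The class name is illustrative, the method collects results into a list instead of printing them, and List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyPermutations {

    static void permutate(String word, List<String> lettersLeft, List<String> out) {
        if (lettersLeft.isEmpty()) {
            out.add(word); // no letters left: word is a full permutation
            return;
        }
        for (int i = 0; i < lettersLeft.size(); i++) {
            List<String> remaining = new ArrayList<>(lettersLeft); // local copy
            String letter = remaining.remove(i); // only the copy shrinks
            permutate(word + letter, remaining, out);
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        permutate("", List.of("H", "a", "s"), out);
        System.out.println(out); // [Has, Hsa, aHs, asH, sHa, saH]
    }
}
```

The trade-off is one extra list copy per call, which is simpler to reason about than restoring the shared list but allocates more.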

In this case you are removing the letters from the ArrayList, and it becomes empty by the time the first word is complete. After that, the list size is always zero. Add the removed letter back to the list.
I would recommend looking at the link below for good examples of string permutations, as there are memory-efficient solutions to string permutations:
http://www.codingeek.com/java/strings/find-all-possible-permutations-of-string-using-recursive-method/

Alternative to substring

I have a strange problem when adding a value to a String array which is later involved in an array sort using a hash map. I have a filename XFR900a, and the XFR900 part is added to the array using the following code:
private ArrayList<String> Types = new ArrayList<String>();
...
Types.add(name.substring(0, name.length() - 1));
System.out.println(name.substring(0, name.length() - 1));
I even print the line, which gives "XFR900"; however, the array sort later on behaves differently when I use the following code instead:
Types.add("XFR900");
System.out.println(name.substring(0, name.length() - 1));
which is simply the substring part done manually. Very confusing.
Are there any good alternatives to substring, as there must be some odd non-ASCII character in there?
Phil
UPDATE
Thanks for your comments, everyone. Here is some of the code that later compares the string:
for (int i=0;i< matchedArray.size();i++){
    //run through the arrays
    if (last == matchedArray.get(i)) {
        //add arrays to a data array
        ArrayList data = new ArrayList();
        data.add(matchedArray1.get(i));
        data.add(matchedArray2.get(i));
        data.add(matchedArray3.get(i));
        data.add(matchedArray4.get(i));
        data.add(matchedArray5.get(i));
        //put into hash map
        map.put(matchedArray.get(i), data);
    }
    else {
        //TODO
        System.out.println("DO NOT MATCH :" + last + "-" + matchedArray.get(i));
As you can see, I have added a test System.out.println("DO NOT MATCH" ... and below is some of the output:
DO NOT MATCH :FR99-XFR900
DO NOT MATCH :XFR900-XFR900
I only run the substring on the XFR900a filename. The problem is that, for the test line to be printed, last != matchedArray.get(i) must be true, yet the two values look identical when printed to the display.
Phil

You should never use the == operator to compare the contents of strings. == checks whether both operands are the same object. Write last.equals(matchedArray.get(i)) instead. The equals() method checks whether two objects are equal, not whether they are the same object. For String, it checks whether the two strings consist of the same characters. This might eliminate your strange behaviour.
PS: The behaviour of == on strings can seem unpredictable because of how the Java virtual machine handles String objects. String literals are interned, so two identical literals refer to the same object; this is safe because String objects are immutable. A string built at runtime, for example by substring(), is normally a separate object even when its characters are identical. This would explain the difference in behaviour when you write the substring out manually: the literal "XFR900" can be shared, while the substring() result is a new object.
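A short sketch reproducing the situation from the question with its filename value; the class name and the trimLast helper are hypothetical. The == result is deliberately not given as a fixed output, since it depends on whether both sides are the same object, which is exactly why equals() should be used.

```java
public class SubstringCompare {

    // Drops the last character, as the question does with "XFR900a"
    static String trimLast(String name) {
        return name.substring(0, name.length() - 1);
    }

    public static void main(String[] args) {
        String trimmed = trimLast("XFR900a"); // built at runtime
        String literal = "XFR900";            // interned literal

        System.out.println(trimmed);                     // XFR900
        System.out.println(trimmed == literal);          // usually false: substring() builds a new object
        System.out.println(trimmed.equals(literal));     // true: same characters
        System.out.println(trimmed.intern() == literal); // true: intern() returns the pooled copy
    }
}
```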

Use .equals() rather than == because they are strings!
if (last.equals(matchedArray.get(i))) {}

Never use the == operator if you want to check the value, since it checks Object reference equality. Use the equals() method, which checks the value rather than the reference, i.e.:
for (int i=0;i< matchedArray.size();i++){
    //run through the arrays
    if (last.equals(matchedArray.get(i))) { // Line edited
        //add arrays to a data array
        ArrayList data = new ArrayList();
        data.add(matchedArray1.get(i));
        data.add(matchedArray2.get(i));
        data.add(matchedArray3.get(i));
        data.add(matchedArray4.get(i));
        data.add(matchedArray5.get(i));
        //put into hash map
        map.put(matchedArray.get(i), data);
    }
    else {
        //TODO
        System.out.println("DO NOT MATCH :" + last + "-" + matchedArray.get(i));
