Letters having the same equivalence - java

Looking for ideas on how to go about accomplishing this. Basically, I want certain characters to be treated as equivalent.
For example: M = N
So: Mum = Nun
However: Mum can also equal Num.
I was advised to try a map of replacements and this worked until the third example where not all M's are to be changed to N's.
Thanks
This is the code for the map of replacements:
HashMap<String,String> replacements = new HashMap<>();
replacements.put("n","m");
replacements.put("m","n");
String ignoreFirstChar = names[j].charAt(0) + (names[j].substring(1,names[j].length()).replaceAll("[^a-zA-Z]+", "").toLowerCase());
String result = "";
for(int i1 = 0; i1 < ignoreFirstChar.length(); i1++) {
String rpl = replacements.get(ignoreFirstChar.charAt(i1)+"");
result += rpl==null?ignoreFirstChar.charAt(i1):rpl;
}
System.out.println(ignoreFirstChar);
System.out.println(result);

I assume M and m are not equivalent. Therefore, if M = N, we cannot say M = n. If you would like to use a "map of replacements" as was suggested to you, I would use it for the purpose of normalizing your strings.
You would take the current problem of
Given strings x and y, determine whether x equals y
and change it to
Given strings x and y, determine whether normalize(x) equals normalize(y)
The purpose of normalizing your strings is to apply any equivalence rules that you have, such as M = N. That way "Mum" would be converted to "Num", and then you can compare the two strings without having to worry about the rules because they've already been applied.
The normalize method would look something like
/*
* Takes each character in inStr and replaces them as necessary based on
* your replacement map. For example, if you see an "n", then replace it with "m"
*/
String normalize(String inStr) {
String newStr;
// something happens
return newStr;
}
If case-sensitivity is not important, then you would again normalize your strings by first converting them to lower-case or upper-case (doesn't matter, as long as it is consistent)
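The normalization idea above can be sketched as follows. This is only a minimal illustration: the class name EquivalenceNormalizer, the CANONICAL table, and the choice of m/M as the representative letters are assumptions made for the example, not part of the original question.

```java
import java.util.HashMap;
import java.util.Map;

final class EquivalenceNormalizer {
    // Hypothetical rule table: every member of an equivalence class maps to
    // one chosen representative. Here n is folded into m and N into M.
    private static final Map<Character, Character> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put('n', 'm');
        CANONICAL.put('N', 'M');
    }

    // Rewrites each character to its representative; characters without a
    // rule are copied unchanged.
    static String normalize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            char mapped = CANONICAL.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Two strings are equivalent when their normalized forms are equal.
    static boolean equivalent(String x, String y) {
        return normalize(x).equals(normalize(y));
    }

    public static void main(String[] args) {
        System.out.println(equivalent("Mum", "Nun")); // true
        System.out.println(equivalent("Mum", "Num")); // true
        System.out.println(equivalent("Mum", "Mud")); // false
    }
}
```

Note that the table maps in one direction only. A map that swaps the letters (n to m and m to n) does not normalize anything, because "Mum" and "Nun" would simply trade places.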

Related

Replacing a character in a string from another string with the same char index

I'm trying to search and reveal unknown characters in a string. Both strings are of length 12.
Example:
String s1 = "1x11222xx333";
String s2 = "111122223333";
The program should check for all unknowns in s1 represented by x|X and get the relevant chars in s2 and replace the x|X by the relevant char.
So far my code has replaced only the first x|X with the relevant char from s2 but printed duplicates for the rest of the unknowns with the char for the first x|X.
Here is my code:
String VoucherNumber = "1111x22xx333";
String VoucherRecord = "111122223333";
String testVoucher = null;
char x = 'x'|'X';
System.out.println(VoucherNumber); // including unknowns
//find x|X in the string VoucherNumber
for(int i = 0; i < VoucherNumber.length(); i++){
if (VoucherNumber.charAt(i) == x){
testVoucher = VoucherNumber.replace(VoucherNumber.charAt(i), VoucherRecord.charAt(i));
}
}
System.out.println(testVoucher); //after replacing unknowns
}
}
I am always a fan of using StringBuilders, so here's a solution using that:
private static String replaceUnknownChars(String strWithUnknownChars, String fullStr) {
    StringBuilder sb = new StringBuilder(strWithUnknownChars);
    int index; // must be declared outside the loop condition
    // Math.max yields -1 only when neither 'x' nor 'X' remains
    while ((index = Math.max(sb.indexOf("x"), sb.indexOf("X"))) != -1) {
        sb.setCharAt(index, fullStr.charAt(index));
    }
    return sb.toString();
}
It's quite straightforward. You create a new StringBuilder. While an x or X can still be found in the builder (the index is not -1), you take that index and call setCharAt with the character at the same position in the full string.
You are using String.replace(char, char) the wrong way. The documentation says:
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
So if the character occurs more than once, every occurrence is replaced with the same value.
You need to change only the character at a specific position. The easiest way is to work on the char array returned by String.toCharArray; with it you can keep the same loop logic.
Of course, you can also use String.indexOf to find the index of a specific character.
Note: char c = 'x'|'X'; will not give you the expected result. It performs a bitwise OR, which produces a value that is not the one you want.
A bitwise OR sets each result bit to 1 if that bit is 1 in either operand:
0111 1000 (x)
0101 1000 (X)
OR
0111 1000 (x)
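The result is therefore just 'x', so an uppercase X is never matched. Note also that the expression has type int (the operands of a numeric operator are promoted to at least int); the assignment compiles only because the value is a compile-time constant that fits in a char.
You have two solutions here: either use two variables (or an array), or, if you can, lower-case the input with String.toLowerCase and compare against char c = 'x' only.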
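The char-array approach described above could look like the following sketch. The class and method names (VoucherFill, fillUnknowns) are made up for illustration, and the length check is an added assumption based on the question's statement that both strings have the same length.

```java
final class VoucherFill {
    // Copies the character from 'record' into every position where 'masked'
    // holds an unknown marker ('x' or 'X'). Both strings must be equally long.
    static String fillUnknowns(String masked, String record) {
        if (masked.length() != record.length()) {
            throw new IllegalArgumentException("length mismatch");
        }
        char[] chars = masked.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // Two explicit comparisons instead of the bitwise 'x'|'X'
            if (chars[i] == 'x' || chars[i] == 'X') {
                chars[i] = record.charAt(i);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fillUnknowns("1111x22xx333", "111122223333")); // 111122223333
    }
}
```

Because each position is written independently, one unknown never affects another, which is exactly what String.replace(char, char) cannot guarantee.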

Java string arrayList: Sorting the elements in descending order (like polynomials in math)

I'm still quite new to programming, so I'm sorry if I caused you to face palm.
Right now, I am trying to create parentheses-expander in Java. The current program can already expand the parentheses, but it can not simplify the results, because the terms are not in the descending order. I do understand that you could try to add the terms without re-ordering them by comparing the variables contained in each of the elements. However, I want the program to "show work" like a human, so I need the terms in descending order.
And for that, I want to create a method that, given a string arrayList, re-orders the elements in something like descending order for polynomials in math.
If any of the variables had exponents, the variable is just repeated to the number of the exponent.
for example:
X^2 = XX,
a^3 = aaa,
Z^5 = ZZZZZ
Also, there will be no negative exponents nor parentheses.
All elements have either + or - at the beginning(and no other operators after that).
All elements have a coefficient, even if it is 1.
Capital letters have higher importance than lower case letters, and elements with just numbers should be re-located to the very end.
I forgot the mathematical word for that, but the terms should be ordered by A first, then B, and so on until Z, and then a, b, c, and so on. (I mean, terms with the most A's come first, then B, then C, and so on down to z.)
Coefficients and operators should be ignored.
For example, if the input was this:
[-1b,+3XX,-4AA,+1aaa,+20CCa,-9ABa,-9ABaa,+20CCCa,+3BBX,+1aab,+10]
Then I want the method to return the arrayList like:
[-4AA,-9ABaa,-9ABa,+3BBX,+20CCCa,+20CCa,+3XX,+1aaa,+1aab,-1b,+10]
I'm very much stuck right here. any help will be appreciated. If I didn't describe my problem clear enough, please let me know. I will clarify.
I believe wolfram alpha already has parentheses expanding capabilities. However, I still want to make this.
If anyone can help me with this, that will be amazing. Thanks in advance!
You have a couple of challenges that need to be dealt with individually:
How do I parse something like -1b into a format I can work with?
How do I sort by a custom sorting rule?
For the first part, your rule is very well-defined and the format is pretty simple. This lends itself well to using a regular expression to parse it:
Also, there will be no negative exponents nor parentheses. All elements have either + or - at the beginning(and no other operators after that). All elements have a coefficient, even if it is 1.
So a good regular expression format might be:
([-+]\d+)(\w+)?
This would result in two "capture groups". The first would be the numeric part, and the second would be the (optional) repeated string part.
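To make the two capture groups concrete, here is a small sketch that applies the same pattern to single terms. The class name TermParser and the parse helper are hypothetical; only the regular expression comes from the text above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TermParser {
    // Group 1: sign plus coefficient. Group 2 (optional): the letters.
    private static final Pattern TERM = Pattern.compile("([-+]\\d+)(\\w+)?");

    // Returns {coefficient, letters}; letters is null for a pure number.
    static String[] parse(String term) {
        Matcher m = TERM.matcher(term);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid term: " + term);
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] withLetters = parse("+20CCa");
        System.out.println(withLetters[0] + " / " + withLetters[1]); // +20 / CCa
        String[] numberOnly = parse("+10");
        System.out.println(numberOnly[0] + " / " + numberOnly[1]);   // +10 / null
    }
}
```

One thing to keep in mind: \w also matches digits and underscores, so the pattern relies on the greedy \d+ consuming the whole coefficient before the second group starts.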
After decomposing each entry into these two separate parts, it is pretty easy to come up with a set of rules for determining the sort order:
If both of them are numbers (having only the first part), then sort as numbers
If one of them is a number, and the other has letters, sort the number afterward.
If both have numbers and letters, sort according to the letters only using normal String sorting.
An easy way to do custom sorting is to write a custom Comparator class which would be used as an argument to the sort function. Combining all the ideas presented above that might look something like this:
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolynomialComparator implements Comparator<String> {
private static final Pattern pattern = Pattern.compile("([-+]\\d+)(\\w+)?");
@Override
public int compare(String s1, String s2) {
if (s1 == null) throw new NullPointerException("s1");
if (s2 == null) throw new NullPointerException("s2");
int compare = 0;
Matcher m1 = pattern.matcher(s1);
Matcher m2 = pattern.matcher(s2);
if (!m1.matches()) throw new IllegalArgumentException("Invalid Polynomial format: " + s1);
if (!m2.matches()) throw new IllegalArgumentException("Invalid Polynomial format: " + s2);
int n1 = Integer.parseInt(m1.group(1));
int n2 = Integer.parseInt(m2.group(1));
String p1 = m1.group(2);
String p2 = m2.group(2);
if (p1 == null && p2 == null) { // Rule #1: just compare numbers
compare = Integer.compare(n2, n1); // larger number first; avoids the int overflow that n2 - n1 can cause
} else if (p1 == null) { // Rule #2: always sort number last
compare = 1;
} else if (p2 == null) { // Rule #2: always sort non-number first
compare = -1;
} else { // Rule #3: compare the letters
compare = m1.group(2).compareTo(m2.group(2));
}
return compare;
}
}
Finally, to tie it all together, here is a simple program that correctly sorts your provided example using this Comparator (with the exception of the second and third entries, which I believe are in the wrong order in your example):
public static void main(String args[]){
String input = "[-1b,+3XX,-4AA,+1aaa,+20CCa,-9ABa,-9ABaa,+20CCCa,+3BBX,+1aab,+10]";
String[] array = input.substring(1, input.length() - 1).split(",");
Arrays.sort(array, new PolynomialComparator());
System.out.println("[" + String.join(",", array) + "]");
}
OUTPUT: [-4AA,-9ABa,-9ABaa,+3BBX,+20CCCa,+20CCa,+3XX,+1aaa,+1aab,-1b,+10]
Hopefully you can spend some time walking through this and learn a few ideas that will help you on your way. Cheers!

What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

I want to write a function that takes string as a parameter and returns a number corresponding to that string.
Integer hashfunction(String a)
{
//logic
}
Actually the question im solving is as follows :
Given an array of strings, return all groups of strings that are anagrams. Represent a group by a list of integers representing the index in the original list.
Input : cat dog god tca
Output : [[1, 4], [2, 3]]
Here is my implementation :-
public class Solution {
Integer hashfunction(String a)
{
int i=0;int ans=0;
for(i=0;i<a.length();i++)
{
ans+=(int)(a.charAt(i));//Adding all ASCII values
}
return new Integer(ans);
}
// Obviously this approach is incorrect
public ArrayList<ArrayList<Integer>> anagrams(final List<String> a) {
int i=0;
HashMap<String,Integer> hashtable=new HashMap<String,Integer>();
ArrayList<Integer> mylist=new ArrayList<Integer>();
ArrayList<ArrayList<Integer>> answer=new ArrayList<ArrayList<Integer>>();
if(a.size()==1)
{
mylist.add(new Integer(1));
answer.add(mylist);
return answer;
}
int j=1;
for(i=0;i<a.size()-1;i++)
{
hashtable.put(a.get(i),hashfunction(a.get(i)));
for(j=i+1;j<a.size();j++)
{
if(hashtable.containsValue(hashfunction(a.get(j))))
{
mylist.add(new Integer(i+1));
mylist.add(new Integer(j+1));
answer.add(mylist);
mylist.clear();
}
}
}
return answer;
}
}
Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:
Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.
If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.
Two strings are anagrams if they are composed of exactly the same set of characters, with the same number of each character. There are a lot of tools available in Java that make this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.
private static int hashString(String s) {
if (s == null) return 1; // Null is treated like an empty string, whose empty List hashes to 1.
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
return charList.hashCode(); // See contract of java.util.List#hashCode
}
And voila; you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in Java, an Integer is merely a 32-bit value. This method can only produce about 4.3 billion unique values, and there are a whole lot more than 4.3 billion strings you can throw at it. This method can produce collisions and give you nonsensical results. If that's a problem, you might want to consider using BigInteger instead.
private static BigInteger hashString(String s) {
BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!
if (s == null) return BigInteger.ONE; // An empty/null string will return 1.
BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
// Calculate our bighash, similar to how java's List interface does.
for (Character c : charList) {
int charHash = c.hashCode();
r=r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
}
return r;
}
You need a number that is the same for all strings made up of the same characters.
The String.hashCode method returns a number that is the same for all strings made up of the same characters in the same order.
If you can sort all words consistently (for example: alphabetically) then String.hashCode will return the same number for all anagrams.
char[] chars = inputString.toCharArray();
Arrays.sort(chars); // Arrays.sort sorts in place and returns void
return String.valueOf(chars).hashCode();
Note: this will work for all words that are anagrams (no false negatives) but it may not work for all words that are not anagrams (possibly false positives). This is highly unlikely for short words, but once you get to words that are hundreds of characters long, you will start encountering more than one set of anagrams with the same hash code.
Also note: this gives you the answer to the (title of the) question, but it isn't enough for the question you're solving. You need to figure out how to relate this number to an index in your original list.
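One way to relate each word back to its index is sketched below. To sidestep hash collisions it uses the sorted characters themselves as the map key rather than an int hash, which is a deliberate swap from the hashCode approach above. The names AnagramGroups, key, and group are made up for the example, and indexes are 1-based to match the expected output in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AnagramGroups {
    // All anagrams of a word share the same sorted-character key.
    static String key(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Groups the 1-based positions of the words by their key.
    // LinkedHashMap keeps groups in order of first appearance.
    static List<List<Integer>> group(List<String> words) {
        Map<String, List<Integer>> byKey = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            byKey.computeIfAbsent(key(words.get(i)), k -> new ArrayList<>())
                 .add(i + 1);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("cat", "dog", "god", "tca")));
        // prints [[1, 4], [2, 3]]
    }
}
```

A word with no anagram partner ends up in a group of size one; whether such groups should be kept or filtered out depends on the exact problem statement.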

Remove all non alphabetic characters from a String array in java

I'm trying to write a method that removes all non-alphabetic characters from a Java String[] and then converts each String to lower case. I've tried using a regular expression to replace all occurrences of non-alphabetic characters with "". However, the output shows that this does not happen. Here is the code:
static String[] inputValidator(String[] line) {
for(int i = 0; i < line.length; i++) {
line[i].replaceAll("[^a-zA-Z]", "");
line[i].toLowerCase();
}
return line;
}
However if I try to supply an input that has non alphabets (say - or .) the output also consists of them, as they are not removed.
Example Input
A dog is an animal. Animals are not people.
Output that I'm getting
A
dog
is
an
animal.
Animals
are
not
people.
Output that is expected
a
dog
is
an
animal
animals
are
not
people
The problem is your changes are not being stored because Strings are immutable. Each of the method calls is returning a new String representing the change, with the current String staying the same. You just need to store the returned String back into the array.
line[i] = line[i].replaceAll("[^a-zA-Z]", "");
line[i] = line[i].toLowerCase();
Because each method returns a String, you can chain your method calls together. This performs the second method call on the result of the first, allowing you to do both actions in one line.
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
You need to assign the result of your regex back to line[i].
for ( int i = 0; i < line.length; i++) {
line[i] = line[i].replaceAll("[^a-zA-Z]", "").toLowerCase();
}
It doesn't work because strings are immutable; you need to assign the returned value,
e.g.
line[i] = line[i].toLowerCase();
You must reassign the result of toLowerCase() and replaceAll() back to line[i], since Java String is immutable (its internal value never changes, and the methods in String class will return a new String object instead of modifying the String object).
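As this has already been answered, I just thought of sharing one more way that was not mentioned here. Note that \P{Alnum} removes everything that is not alphanumeric, so digits are kept; use \P{Alpha} to keep letters only: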
str = str.replaceAll("\\P{Alnum}", "").toLowerCase();
A cool (but slightly cumbersome, if you don't like casting) way of doing what you want to do is go through the entire string, index by index, casting each result from String.charAt(index) to (byte), and then checking to see if that byte is either a) in the numeric range of lower-case alphabetic characters (a = 97 to z = 122), in which case cast it back to char and add it to a String, array, or what-have-you, or b) in the numeric range of upper-case alphabetic characters (A = 65 to Z = 90), in which case add 32 (A + 32 = 65 + 32 = 97 = a) and cast that to char and add it in. If it is in neither of those ranges, simply discard it.
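A minimal sketch of that range-checking idea follows. It compares char values directly instead of casting to byte, which is a small simplification of the description above, and the names LettersOnly and lettersToLower are invented for the example. It only handles the ASCII letters A-Z and a-z.

```java
final class LettersOnly {
    // Keeps ASCII letters only and lower-cases them by adding 32 to the
    // code of each upper-case letter ('A' is 65, 'a' is 97).
    static String lettersToLower(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                out.append(c);               // already lower case
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) (c + 32)); // shift into the lower-case range
            }
            // anything else is discarded
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersToLower("Animals.")); // animals
    }
}
```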
You can also use Arrays.setAll for this:
Arrays.setAll(array, i -> array[i].replaceAll("[^a-zA-Z]", "").toLowerCase());
Here is a working method:
String name = "Joy.78#,+~'{/>";
String[] stringArray = name.split("\\W+");
StringBuilder result = new StringBuilder();
for (int i = 0; i < stringArray.length; i++) {
result.append(stringArray[i]);
}
String nameNew = result.toString();
nameNew = nameNew.toLowerCase(); // assign the result; Strings are immutable
public static void solve(String line){
// trim to remove unwanted spaces
line= line.trim();
String[] split = line.split("\\W+");
// print using for-each
for (String s : split) {
System.out.println(s);
}
}

intersection of two strings using Java HashSet

I am trying to learn Java by doing some assignments from a Stanford class and am having trouble answering this question.
boolean stringIntersect(String a, String b, int len): Given 2 strings,
consider all the substrings within them of length len. Returns true if
there are any such substrings which appear in both strings. Compute
this in O(n) time using a HashSet.
I can't figure out how to do it using a HashSet because you cannot store repeating characters. So stringIntersect("hoopla", "loopla", 5) should return true.
thanks!
Edit: Thanks so much for all your prompt responses. It was helpful to see explanations as well as code. I guess I couldn't see why storing substrings in a hashset would make the algorithm more efficient. I originally had a solution like :
public static boolean stringIntersect(String a, String b, int len) {
assert (len>=1);
if (len>a.length() || len>b.length()) return false;
String s1=new String(),s2=new String();
if (a.length()<b.length()){
s1=a;
s2=b;
}
else {
s1=b;
s2=a;
}
int index = 0;
while (index<=s1.length()-len){
if (s2.contains(s1.substring(index,index+len)))return true;
index++;
}
return false;
}
I'm not sure I understand what you mean by "you cannot store repeating characters". A HashSet is a Set, so it can do two things: you can add values to it, and you can check whether a value is already in it. In this case, the problem wants you to answer the question by storing strings, not chars, in the HashSet. To do this in Java:
Set<String> stringSet = new HashSet<String>();
Try breaking this problem into two parts:
1. Generate all the substrings of length len of a string
2. Use this to solve the problem.
The hint for part two is:
Step 1: For the first string enter the substrings into a hashset
Step 2: For the second string, check the values in the hashset
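The two steps above can be sketched in Java as follows. This is one possible reading of the hint, not the official assignment solution; the class name SubstringIntersect and the handling of out-of-range len values are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

final class SubstringIntersect {
    static boolean stringIntersect(String a, String b, int len) {
        // Assumed convention: no substrings of that length means no match.
        if (len < 1 || len > a.length() || len > b.length()) {
            return false;
        }
        // Step 1: store every substring of length len from the first string.
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= a.length(); i++) {
            seen.add(a.substring(i, i + len));
        }
        // Step 2: look up every substring of length len from the second string.
        for (int i = 0; i + len <= b.length(); i++) {
            if (seen.contains(b.substring(i, i + len))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(stringIntersect("hoopla", "loopla", 5)); // true
        System.out.println(stringIntersect("hoopla", "loopla", 6)); // false
    }
}
```

Each lookup replaces a scan of the whole first string with a single hash probe, which is where the speed-up over the nested contains approach comes from.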
Note (Advanced): this problem is poorly specified. Inserting a string into a hash table, or looking one up, takes time proportional to the length of the string. A string a of length n has n-k+1, that is O(n-k), substrings of length k. So for a of length n and b of length m the total is O((n-k)*k + (m-k)*k). This is not really O(n): for k = n/2 the running time is O((n/2)*(n/2)) = O(n^2).
Edit: So what if you actually want to do this in O(n) (or perhaps O(n+m+k))? My belief is that the original homework was asking for something like the algorithm I described above. But we can do better. What's more, we can do better and still make a HashSet the crucial tool for our algorithm. The idea is to perform our search using a "rolling hash". Wikipedia describes a couple: http://en.wikipedia.org/wiki/Rolling_hash, but we will implement our own.
A simple solution would be to XOR the values of the character hashes together. This would allow us to add a new char to the hash in O(1) and remove one in O(1), making computing the next hash trivial. But this simple algorithm won't work, for two reasons:
The character hashes may not provide sufficient entropy. Okay, we don't know if we will have this problem, but let's solve it anyway, just for fun.
We will hash permutations to the same value ... "abc" should not have the same hash as "cba"
To solve the first problem we can use an idea from AI, namely let's steal from Zobrist hashing. The idea is to assign every possible character a random value of a greater length. If we were using ASCII, we could easily create an array with all the ASCII characters, but that will run into problems when using Unicode characters. The alternative is to assign values lazily.
object LazyCharHash{
private val map = HashMap.empty[Char,Int]
private val r = new Random
def lHash(c: Char): Int = {
val d = map.get(c)
d match {
case None => {
map.put(c,r.nextInt)
lHash(c)
}
case Some(v) => v
}
}
}
This is Scala code. Scala tends to be less verbose than Java, but still allows me to use Java collections, so I will be using imperative-style Scala throughout. It wouldn't be that hard to translate.
The second problem can be solved as well. First, instead of using a pure XOR, we combine our XOR with a shift, thus the hash function is now:
def fullHash(s: String) = {
var h = 0
for(i <- 0 until s.length){
h = h >>> 1
h = h ^ LazyCharHash.lHash(s.charAt(i))
}
h
}
Of course, using fullHash won't give a performance advantage. It is just a specification.
We need a way of using our hash function to store values in the HashSet (I promised we would use it). We can just create a wrapper class:
class HString(hash: Int, string: String){
def getHash = hash
def getString = string
override def equals(otherHString: Any): Boolean = {
otherHString match {
case other: HString => (hash == other.getHash) && (string == other.getString)
case _ => false
}
}
override def hashCode = hash
}
Okay, to make the hashing function rolling, we just have to XOR out the value associated with the character we will no longer be using. That just takes shifting that value by the appropriate amount.
def stringIntersect(a: String, b: String, len: Int): Boolean = {
val stringSet = new HashSet[HString]()
var h = 0
for(i <- 0 until len){
h = h >>> 1
h = h ^ LazyCharHash.lHash(a.charAt(i))
}
stringSet.add(new HString(h,a.substring(0,len)))
for(i <- len until a.length){
h = h >>> 1
h = h ^ (LazyCharHash.lHash(a.charAt(i - len)) >>> (len))
h = h ^ LazyCharHash.lHash(a.charAt(i))
stringSet.add(new HString(h,a.substring(i - len + 1,i + 1)))
}
...
You can figure out how to finish this code on your own.
Is this O(n)? Well, it depends on what you mean. Big O, big Omega, and big Theta are all measures of bounds. They can describe the worst case of an algorithm, the best case, or something else. In this case these modifications give expected O(n) performance, but only if we avoid hash collisions: it still takes time linear in their length to tell whether two Strings are equal. This random approach works pretty well, and you can scale up the size of the random bit arrays to make it work better, but it does not have guaranteed performance.
You should not store characters in the Hashset, but substrings.
When considering string "hoopla": if you store the substrings "hoopl" and "oopla" in the Hashset (linear operation), then it's linear again to find if one of the substrings of "loopla" matches.
I don't know how they're thinking you're supposed to use the HashSet but I ended up doing a solution like this:
public class StringComparator {
public static boolean compare( String a, String b, int len ) {
Set<String> pieces = new HashSet<String>();
for ( int x = 0; (x + len) <= a.length(); x++ ) {
pieces.add( a.substring( x, x + len ) );
}
for ( String piece : pieces ) {
if ( b.contains(piece) ) {
return true;
}
}
return false;
}
}
