How to Develop a Hash function for traffic license numbers?


Develop a hash function that generates an index value between 0 and 4999 inclusive for a given traffic license number. Your hash function should generate as few collisions as possible and should use the properties of license numbers. The hash method should take the license number as a single String and return an index value. Assume that license numbers have the following format: a city code, which is a number between 10 and 99 inclusive; three letters, which can be any combination from the 26-letter English alphabet; and a two-digit number between 10 and 99 inclusive.
I wrote something for this question, but there are a lot of collisions (1800 for 5k):
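// Assumed: MAX_LENGTH is the length of the letter block (it was not defined in the original snippet)
static final int MAX_LENGTH = 3;
// Treats the letters as a base-27 number ('A' -> 1 ... 'Z' -> 26), reduced mod 5009
static long printValue(String s) {
long result = 0;
for (int i = 0; i < s.length(); i++) {
result += Math.pow(27, MAX_LENGTH - i - 1) * (1 + s.charAt(i) - 'A');
}
result = result % 5009;
return (int) result;
}
// Concatenates city code, letter value and suffix into one integer, then reduces it
public int hashF(String str) {
String a = str.substring(0, 2);
String b = str.substring(5, 7);
String middle = str.substring(2, 5);
int q = (int) printValue(middle);
String last = a + q + b;
int index = Integer.parseInt(last);
// Note: % 5009 can return 5000-5008, outside the required 0-4999 range
index = index % 5009;
return index;
}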
Link to the original file of license numbers.
These are some examples from the file of traffic license numbers. There must be at most 300 collisions.
65HNM25
93DTV23
94WPX23
31RKK46
15YXX90
31MDV74
45BOG99
65JRM50
77VXR55
39TKY41
80MJU73
63QYE57
38FCO80
45ORI16
17CHN73
70SXR63
87CVM74
27EEE85
32PFJ91
50PBA66
70TVK72
15YLS20
80MPM74
21ZRN20
36VVE84
58IDW24
77VDC89
19BVK93
28SUF63

Your problem is not your code, but mathematics. Even a (perfect for you, but not very useful) hash code that produces consecutive hashes that are then taken mod 5000, i.e.
10AAA10 -> 0
10AAA11 -> 1
... etc
99ZZZ99 -> 599 (90 * 26 * 26 * 26 * 90 - 1) % 5000
will statistically produce over 1800 collisions for 5000 randomly chosen plates, and is no better than the simplest implementation, which is to use String's hashCode:
int hash = Math.abs(number.hashCode() % 5000);
It's a silly exercise, as it has no real world use.
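To see where the figure of "over 1800" comes from, here is a minimal sketch assuming the 5000 plates behave like keys drawn uniformly at random over 5000 buckets, with a collision counted as any plate that lands in an already-occupied bucket. The expected number of occupied buckets is m(1 - (1 - 1/m)^n); every other plate is a collision.

```java
public class ExpectedCollisions {
    // Expected collisions when n keys are hashed uniformly at random into m buckets:
    // n minus the expected number of occupied buckets, m * (1 - (1 - 1/m)^n).
    static double expectedCollisions(int n, int m) {
        double expectedOccupied = m * (1.0 - Math.pow(1.0 - 1.0 / m, n));
        return n - expectedOccupied;
    }

    public static void main(String[] args) {
        // 5000 plates into 5000 buckets, as in the question
        System.out.printf("%.0f%n", expectedCollisions(5000, 5000)); // prints 1839
    }
}
```

For n = m this is about n/e, which is why roughly 1840 collisions are expected for 5000 random plates.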

Your split of the license plate into 3 parts is fine. But converting the middle to a number, hashing it, then adding the two outside strings, converting that all to an integer, and then finally executing a modulo on that is ... awkward.
I would start off with converting the prefix (10-99) to an integer, and then subtracting 10 to get the range 0-89.
Then, for each letter, I'd multiply the result by 26, and add the index of the letter (0-25).
Third, I'd multiply the whole result by 90 (the range of the final part), convert the final 2 characters to an integer, subtract 10 to convert the 10-99 range to 0-89, and add to the result from earlier.
Finally, mod the result with 5000 to get to the required 0-4999 range.
Pseudo code:
result = toInt(prefix) - 10
foreach letter in middle:
result = result * 26 + ( letter - 'A' )
result = result * 90 + ( toInt(suffix) - 10)
result = result % 5000
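A minimal Java translation of the pseudocode above, assuming well-formed plates with uppercase letters. countCollisions is a hypothetical helper, not part of the answer, for measuring how a sample of plates spreads over the 5000 slots.

```java
import java.util.List;

public class LicenseHash {
    // Enumerates 10AAA10..99ZZZ99 as 0..(90 * 26^3 * 90 - 1), then reduces mod 5000.
    // Assumes a 7-character plate: 2 digits, 3 uppercase letters, 2 digits.
    static int hash(String plate) {
        int result = Integer.parseInt(plate.substring(0, 2)) - 10;             // city code 10-99 -> 0-89
        for (int i = 2; i < 5; i++) {
            result = result * 26 + (plate.charAt(i) - 'A');                    // letter -> 0-25
        }
        result = result * 90 + (Integer.parseInt(plate.substring(5, 7)) - 10); // suffix 10-99 -> 0-89
        return result % 5000;                                                  // max 142365599 fits in an int
    }

    // Counts plates that land on an index already used by an earlier plate.
    static int countCollisions(List<String> plates) {
        boolean[] used = new boolean[5000];
        int collisions = 0;
        for (String p : plates) {
            int h = hash(p);
            if (used[h]) {
                collisions++;
            }
            used[h] = true;
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(hash("10AAA10")); // 0
        System.out.println(hash("99ZZZ99")); // 599
        System.out.println(countCollisions(List.of("65HNM25", "93DTV23", "94WPX23")));
    }
}
```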

Related

append strings with increasing frequency

You are given two strings S and T. An infinitely long string is formed in the following manner:
Take an empty string,
Append S one time,
Append T two times,
Append S three times,
Append T four times,
and so on, appending the strings alternately and increasing the number of repetitions by 1 each time.
You will also be given an integer K.
You need to find the Kth character of this infinitely long string.
Sample Input (S, T, K):
a
bc
4
Sample Output:
b
Sample Explanation:
The string formed will be "abcbcaaabcbcbcbcaaaaa...". So the 4th character is "b".
My attempt:
public class FindKthCharacter {
public char find(String S, String T, int K) {
// lengths of S and T
int s = S.length();
int t = T.length();
// Counters for S and T
int sCounter = 1;
int tCounter = 2;
// To store final chunks of string
StringBuilder sb = new StringBuilder();
// Loop while K is greater than zero
while (K > 0) {
if (K > sCounter * s) {
K -= sCounter * s;
sCounter += 2;
if (K > tCounter * t) {
K -= tCounter * t;
tCounter += 2;
} else {
return sb.append(T.repeat(tCounter)).charAt(K - 1);
}
} else {
return sb.append(S.repeat(sCounter)).charAt(K - 1);
}
}
return '\u0000';
}
}
But is there any better way to reduce its time complexity?

I've tried to give a guide here, rather than just give the solution.
If s and t are the lengths of the strings S and T, then you need to find the largest odd n such that
(1+3+5+...+n)s + (2+4+6+...+(n+1))t < K.
You can simplify these expressions to get a quadratic equation in n.
Let N be (1+3+..+n)s + (2+4+6+...+(n+1))t. You know that K will lie either in the next (n+2) copies of S, or the (n+3) copies of T that follow. Compare K to N+(n+2)s, and take the appropriate letter of either S or T using a modulo.
The only difficult step here is solving the large quadratic, but you can do it in O(log K) arithmetic operations easily enough by doubling n until it's too large and then using a binary search on the remaining range. (If K is small enough that floating point is precise enough, you can do it in O(1) time using the well-known quadratic formula.)
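A minimal Java sketch of this guide, assuming K is 1-indexed (as in the sample) and that all counts fit in a long. It works with m, the number of completed (S block, T block) pairs, which corresponds to n = 2m - 1 above: 1 + 3 + ... + (2m - 1) = m^2 and 2 + 4 + ... + 2m = m(m + 1), so the condition becomes m^2 s + m(m + 1) t < K. It takes the doubling-then-binary-search route rather than floating point.

```java
public class KthCharacter {
    // Characters in the first m (S block, T block) pairs:
    // S appears 1 + 3 + ... + (2m - 1) = m^2 times, T appears 2 + 4 + ... + 2m = m(m + 1) times.
    static long prefix(long m, long s, long t) {
        return m * m * s + m * (m + 1) * t;
    }

    // k is 1-indexed, as in the question's sample (k = 4 -> 'b').
    static char find(String S, String T, long k) {
        long s = S.length(), t = T.length();
        // Find the largest m with prefix(m) < k: double hi until it reaches k...
        long lo = 0, hi = 1;
        while (prefix(hi, s, t) < k) {
            lo = hi;
            hi *= 2;
        }
        // ...then binary search, keeping prefix(lo) < k <= prefix(hi).
        while (hi - lo > 1) {
            long mid = lo + (hi - lo) / 2;
            if (prefix(mid, s, t) < k) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        long rest = k - prefix(lo, s, t);   // 1-indexed position after lo full pairs
        long sBlock = (2 * lo + 1) * s;     // next block: S repeated 2*lo + 1 times
        if (rest <= sBlock) {
            return S.charAt((int) ((rest - 1) % s));
        }
        rest -= sBlock;                     // otherwise it falls in the T block (2*lo + 2 copies)
        return T.charAt((int) ((rest - 1) % t));
    }

    public static void main(String[] args) {
        System.out.println(find("a", "bc", 4)); // b
    }
}
```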

Here's my quick attempt; there is probably a better solution. Runtime is still O(sqrt(K)), but memory is O(1).
// k is 0-indexed here: pass K - 1 for the question's 1-indexed K
public static char find(String a, String b, int k) {
int lenA = a.length();
int lenB = b.length();
int rep = 0;
boolean isA = false;
while (k >= 0) {
++rep;
isA = !isA;
k -= (isA ? lenA : lenB) * rep;
}
int len = (isA ? lenA : lenB);
int idx = (len * rep + k) % len;
return (isA ? a : b).charAt(idx);
}

Here's an O(1) solution that took me some time to come up with (read: I would have run out of time in an interview). Hopefully the process is clear and you can implement it in code.
Our goal is to return the char that maps to the kth index.
But how? Just 4 easy steps, actually.
Step 1: Find out how many iterations of our two patterns it would take to represent at least k characters.
Step 2: Using this number of iterations i, return how many characters are present in the previous i-1 iterations.
Step 3: Get n, the position of our kth character within iteration i (k - result of step 2).
Step 4: Take (n - 1) mod the length of the pattern to get the index into the pattern for the specific char (n counts from 1, like k). If i is odd, look into s; otherwise look into t.
For step 1, we need to find a formula to give us the iteration i that character k is in. To derive this formula, it may be easier to first derive the formula needed for step 2.
Step 2's formula basically takes an iteration i and returns how many characters are present in total up to and including that iteration (step 2 then applies it to i-1). We are solving for k in this equation and are given i, whereas in step 1 we are solving for i given k. If we can derive the equation for finding k given i, then we can surely reverse it to find i given k.
Now, let's try to derive the formula for step 2 and find k given i. Here it's best to start with the most basic example to see the pattern.
s = "a", t = "b"
i=1 a
i=2 abb
i=3 abbaaa
i=4 abbaaabbbb
i=5 abbaaabbbbaaaaa
i=6 abbaaabbbbaaaaabbbbbb
Counting the total number of combined chars for each pattern during its next iteration gives us:
#iterations of pattern: 1 2 3 4 5 6 7 8 9 10
every new s iteration: 1, 4, 9, 16, 25, 36, 49, 64, 81, 100
every new t iteration: 2, 6, 12, 20, 30, 42, 56, 72, 90, 110
You might notice some nice patterns here. For example, s has a really nice formula to find out how many combined characters it has at any given iteration. It's simply (# of s iterations)^2 * s.length. t also has a simple formula: (# of t iterations * (# of t iterations + 1)) * t.length. You may have noticed that these are the formulas for the sum of odd and even numbers (if you did, kudos). This makes sense because each pattern's total at an iteration i is the sum of all of its previous iterations.
Using s,t as length of their respective patterns, we now have the following formula to find the total number of chars at a given iteration.
#chars = s*(# of s iterations)^2 + t * (# of t iterations * (# of t iterations + 1))
Now we just need to do some math to get the number of iterations for each pattern given i.
# of s iterations given i = ceil(i/2.0)
# of t iterations given i = floor(i/2), which integer division i/2 gives us by default
Plugging these back into our formula we get:
total # of chars = s*(ceil(i/2.0)^2) + t*((i/2)*((i/2)+1))
We have just completed step 2, and we now know at any given iteration how many total chars there are. We could stop here and start picking random iterations and adjusting accordingly until we get near k, but we can do better than that. Let's use the above formula now to complete step 1 which we skipped. We just need to reorganize our equation to solve for i now.
Treating i as continuous (so that both ceil(i/2) and floor(i/2) become i/2) and simplifying, we get:
$$s\left(\frac{i}{2}\right)^2 + t\,\frac{i}{2}\left(\frac{i}{2} + 1\right) = k$$
$$\frac{s\,i^2}{4} + \frac{t}{2}\left(\frac{i}{2} + 1\right) i = k$$
$$\frac{s\,i^2}{4} + \frac{t\,i^2}{4} + \frac{t\,i}{2} - k = 0$$
$$s\,i^2 + t\,i^2 + 2t\,i - 4k = 0$$
$$(s + t)\,i^2 + 2t\,i - 4k = 0$$
This looks like a polynomial. Wow! You're right! It is quadratic in i, which means we can solve it using the quadratic formula with
A = (s + t), B = 2t, C = -4k:
$$i = \frac{-2t + \sqrt{(2t)^2 + 16(s+t)k}}{2(s+t)} = \frac{-t + \sqrt{t^2 + 4(s+t)k}}{s+t}$$
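This is our formula for step 1: taking its ceiling gives the iteration that the kth character is on, in most cases. Here is a desmos graph that graphs our two polynomials from step 2: s(Siterations)^2 and t(Titerations (Titerations + 1)).
The area under both curves is our total number of chars at an iteration (the vertical line). The formula from step 1 is also graphed, and its x intercept (which represents our xth iteration) usually satisfies previous iteration < x <= current iteration, which is why the ceil usually works. It is not guaranteed, though: the derivation replaced ceil(i/2) and floor(i/2) with i/2, which is exact for even i, but for odd i = 2m + 1 the exact count from step 2 differs from the continuous one by (s - t)(m + 3/4). So when s and t have different lengths, the ceiled value can be off by one; for example, s = "aaaaaaaaaa", t = "b", k = 5 gives ceil(1.26) = 2, although the first iteration already holds 10 characters. Because the continuous and exact counts agree at every even i, the ceiled value and the true iteration always lie in the same pair {2m + 1, 2m + 2}, so checking the ceiled value against the exact step 2 formula and moving by at most one iteration fixes it.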
We have now completed steps 1 and 2. We have a formula to get the ith iteration that the kth character is on and a formula that gives us how many characters are in an ith iteration. Steps 3 and 4 should follow and we get our answer. This is constant time.
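A minimal Java sketch of the four steps, assuming k is 1-indexed and counts fit in a long. total() is the exact step 2 formula. After taking the ceiling of the step 1 root, the sketch nudges i by comparing against total(), because the continuous formula only matches the exact count at even iterations (with s = "aaaaaaaaaa", t = "b", k = 5 the raw ceiling is 2, although the character is in iteration 1). The nudge loops are an addition to the recipe above, not part of it.

```java
public class KthCharacterClosedForm {
    // Exact step 2 count: characters in the first i iterations (S on odd i, T on even i).
    static long total(long i, long s, long t) {
        long sIter = (i + 1) / 2; // ceil(i / 2)
        long tIter = i / 2;       // floor(i / 2)
        return s * sIter * sIter + t * tIter * (tIter + 1);
    }

    // k is 1-indexed, as in the question's sample.
    static char find(String S, String T, long k) {
        long s = S.length(), t = T.length();
        // Step 1: positive root of (s + t) i^2 + 2t i - 4k = 0, then ceil.
        double x = (-t + Math.sqrt((double) t * t + 4.0 * (s + t) * k)) / (s + t);
        long i = Math.max(1, (long) Math.ceil(x));
        // The continuous formula is exact only at even i, so move i to the smallest
        // iteration whose exact total reaches k (one step at most, plus any rounding slack).
        while (i > 1 && total(i - 1, s, t) >= k) {
            i--;
        }
        while (total(i, s, t) < k) {
            i++;
        }
        // Step 3: 1-indexed position of the kth character inside iteration i.
        long n = k - total(i - 1, s, t);
        // Step 4: odd iterations repeat S, even ones repeat T.
        String pattern = (i % 2 == 1) ? S : T;
        return pattern.charAt((int) ((n - 1) % pattern.length()));
    }

    public static void main(String[] args) {
        System.out.println(find("a", "bc", 4)); // b
    }
}
```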

Karatsuba algorithm implementation: works for small ns, breaks for bigger ns

I'm working on an implementation of the Karatsuba algorithm for multiplying numbers but, unlike most implementations, I'm using Strings as the primary data structure instead of BigIntegers or longs. I've written a recursive solution that appears to work for all n < 6, but for some reason it fails for odd n greater than 6, even though all of the base cases work. Here's the Karatsuba part of the program, with a few prints left behind from debugging. All of the methods used in it should work as intended; I tested them thoroughly. For factor1 = "180" and factor2 = "109", the correct result is output. For factor1 = "1111" and factor2 = "1111", the correct result is output. For factor1 = "2348711" and factor2 = "8579294", the program outputs "20358060808034" when it should output "20150282190034". I've tried tracing back through the logic, and I can't find where exactly it goes wrong. If anyone has any insight into where something may not work, any help is appreciated.
public static String multiply(String factor1, String factor2) {
// base case of length = 1
System.out.println("Factor1 " + factor1 + " factor2 " + factor2);
if (factor1.length() == 1 && factor2.length() == 1) {
return smallNumberMultiplication(factor1, factor2);
} else if (factor1.length() == 1 && factor2.length() == 2) { //these conditions needed for odd-size #s
return smallNumberMultiplication(factor1, factor2); // max iteration = 10
} else if (factor1.length() == 2 && factor2.length() == 1) {
return smallNumberMultiplication(factor2, factor1); // max iteration = 10
}
// check which factor is smaller, find the index at which the value is split
int numberLength = factor1.length();
int middleIndex = numberLength / 2;
// Find the power to which 10 is raised such that it follows Karatsuba's algorithm for ac
int powerValue = numberLength + numberLength % 2;
// divide both numbers into two parts bounded by middleIndex place
String[] tempSplitString = splitString(factor1, middleIndex);
String f1Large = tempSplitString[0], f1Small = tempSplitString[1];
tempSplitString = splitString(factor2, middleIndex);
String f2Large = tempSplitString[0], f2Small = tempSplitString[1];
String multiplyHighestNumbers, multiplySmallestNumbers, multiplyMiddleNumbers;
// large factor1 * large factor2
multiplyHighestNumbers = multiply(f1Large, f2Large);
// Multiply (f1Large + f1Small)*(f2Large + f2Small)
multiplyMiddleNumbers = multiply(addTwoValues(f1Large, f1Small), addTwoValues(f2Large, f2Small));
// small factor1 * small factor2
multiplySmallestNumbers = multiply(f1Small, f2Small);
// add trailing zeros to values (multiply by 10^powerValue)
String finalHighestNumber = addTrailingZeros(multiplyHighestNumbers, powerValue);
String finalMiddleNumber = addTrailingZeros(
subtractTwoValues(subtractTwoValues(multiplyMiddleNumbers, multiplyHighestNumbers),
multiplySmallestNumbers),
powerValue / 2);
String finalSmallestNumber = multiplySmallestNumbers;
// add each part together
return removeLeadingZeros(addTwoValues(addTwoValues(finalHighestNumber, finalMiddleNumber), finalSmallestNumber));
}

I noticed two problems:
1. Different values are used for splitting (middleIndex) and for shifting (powerValue, needlessly implemented by tacking on zeroes). For productHighParts ("multiplyHighestNumbers") to be closer in length to the other products, use (factor1.length() + factor2.length()) / 4 (half the average length of both factors) for both.
2. This length has to be the length of the less significant part in splitString(), not of the leading part.
(Note that the first two controlled statements can be combined:
if (factor1.length() <= 1 && factor2.length() <= 2).)
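A self-contained sketch of this fix, assuming non-negative decimal strings. The helper names (add, subtract, split, shift, strip) are hypothetical stand-ins for the asker's addTwoValues, subtractTwoValues, splitString, addTrailingZeros and removeLeadingZeros, which were not shown. Both factors are split with the same m = (len1 + len2) / 4, which is the length of the low (less significant) part, and that same m drives both shifts.

```java
public class StringKaratsuba {
    // Removes leading zeros, keeping at least one digit.
    static String strip(String a) {
        int k = 0;
        while (k < a.length() - 1 && a.charAt(k) == '0') k++;
        return a.substring(k);
    }

    // Adds two non-negative decimal strings.
    static String add(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int i = a.length() - 1, j = b.length() - 1, carry = 0;
        while (i >= 0 || j >= 0 || carry > 0) {
            int sum = carry;
            if (i >= 0) sum += a.charAt(i--) - '0';
            if (j >= 0) sum += b.charAt(j--) - '0';
            sb.append((char) ('0' + sum % 10));
            carry = sum / 10;
        }
        return strip(sb.reverse().toString());
    }

    // Computes a - b; assumes a >= b >= 0 and neither has leading zeros.
    static String subtract(String a, String b) {
        StringBuilder sb = new StringBuilder();
        int i = a.length() - 1, j = b.length() - 1, borrow = 0;
        while (i >= 0) {
            int diff = (a.charAt(i--) - '0') - borrow - (j >= 0 ? b.charAt(j--) - '0' : 0);
            borrow = diff < 0 ? 1 : 0;
            sb.append((char) ('0' + (diff + 10) % 10));
        }
        return strip(sb.reverse().toString());
    }

    // Splits a into {high, low}, where low is the last m digits (high is "0" if a is short).
    static String[] split(String a, int m) {
        if (a.length() <= m) return new String[] {"0", a};
        return new String[] {a.substring(0, a.length() - m), a.substring(a.length() - m)};
    }

    // Multiplies by 10^n by appending n zeros.
    static String shift(String a, int n) {
        return a.equals("0") ? a : a + "0".repeat(n);
    }

    static String multiply(String x, String y) {
        x = strip(x);
        y = strip(y);
        if (x.equals("0") || y.equals("0")) return "0";
        // Base case: at most 4 digits in total, so the product easily fits in a long.
        if (x.length() + y.length() <= 4) {
            return Long.toString(Long.parseLong(x) * Long.parseLong(y));
        }
        // One split length for both factors: half their average length.
        int m = (x.length() + y.length()) / 4;
        String[] xs = split(x, m), ys = split(y, m);
        String high = multiply(xs[0], ys[0]);
        String low = multiply(xs[1], ys[1]);
        // (xh + xl)(yh + yl) - xh*yh - xl*yl = xh*yl + xl*yh
        String mid = subtract(subtract(multiply(add(xs[0], xs[1]), add(ys[0], ys[1])), high), low);
        // x*y = high * 10^(2m) + mid * 10^m + low, using the same m as the split
        return add(add(shift(high, 2 * m), shift(mid, m)), low);
    }

    public static void main(String[] args) {
        System.out.println(multiply("2348711", "8579294")); // 20150282190034
    }
}
```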

How can I convert an integer to base 3 [duplicate]

This question already has answers here:
Convert from one base to another in Java
(10 answers)
Closed 4 years ago.
I found a challenge online and thought it was pretty interesting, but even after trying multiple times I couldn't understand what base 3 is or how you can get there.
Write a method, convertIntegerToBase3() which does the following:
-- Accepts an integer parameter (from 0 to 26) and converts it to base 3, which is stored
as a string, which is the return value.
public String convertIntegerToBase3(int num) {
if (num >= 0 && num <= 26) {
// What do I do here?
return null; // placeholder until the conversion is written
}
else {
System.out.println("Error! Number entered wasn't in the range of 0 and 26");
return null;
}
}

Update
Because the answer below only works up to base 10, I've come up with the following answer, which works up to base 36:
private static final char[] CHARS = "0123456789abcdefghijklmnopqrstuvwxyz".toCharArray();
private static String convertIntToBase(int i, int base){
final StringBuilder builder = new StringBuilder();
do{
builder.append(CHARS[i % base]);
i /= base;
} while(i > 0);
return builder.reverse().toString();
}
The logic stays the same, but by indexing into the CHARS array we can go up to base 36, because the ten digits plus the whole alphabet give us 36 symbols for the digits of a number in another base.
Using this will now yield correct numbers for e.g. base 16:
convertIntToBase(255, 16);
Will return the correct hex value:
ff
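For reference, the JDK already provides this conversion: Integer.toString(int i, int radix) handles radixes 2 to 36 and uses lowercase letters for digits above 9. A minimal sketch of the challenge's convertIntegerToBase3() on top of it; throwing IllegalArgumentException for out-of-range input is an assumption of this sketch, whereas the question's attempt prints an error instead.

```java
public class Base3 {
    // Converts 0..26 to base 3 using the JDK's Integer.toString(int i, int radix).
    static String convertIntegerToBase3(int num) {
        if (num < 0 || num > 26) {
            throw new IllegalArgumentException("Number must be between 0 and 26: " + num);
        }
        return Integer.toString(num, 3);
    }

    public static void main(String[] args) {
        System.out.println(convertIntegerToBase3(24)); // 220
        System.out.println(Integer.toString(255, 16)); // ff
    }
}
```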
Old
It's pretty simple: repeatedly divide by the base and use the remainders. It can also be made generic:
public String convertIntToBase(int i, int base){
final StringBuilder builder = new StringBuilder();
do {
builder.append(i % base);
i /= base;
} while(i > 0);
return builder.reverse().toString();
}
which then can be used like the following:
convertIntToBase(24, 3);
Which yields:
220
This works, as said, through division and the remainder (modulo). With the sample number 24 we can go through the steps easily. Each iteration of the loop can be split into the following parts:
take the remainder of dividing i by the base (i % base); this is the next digit, starting from the least significant one
divide i by the base (i /= base); the fractional part is dropped automatically, because integer division in Java truncates toward zero
repeat while i is greater than zero
reverse the collected digits at the end, since they were produced from least to most significant
So with the i = 24 and base = 3 it goes through the following steps:
24 % 3 = 0
24 / 3 = 8
8 % 3 = 2, because 3 goes into 8 twice (6), leaving 2
8 / 3 = 2
2 % 3 = 0
2 / 3 = 0
Reading the remainders from last to first leaves you with the known result: 220

Base 3 simply means representing a number as a polynomial whose base is 3. The same idea extends to other bases, like base 99, etc.
Number in Decimal = (A0)*3^0 + (A1)*3^1 + (A2)*3^2 ...
where each digit AX takes a value in the range of (x)mod(3), or (x)mod(base) in general. In your case that is the range [0,1,2]. For a non-negative x, (x)mod(anything) is always positive or zero.
Try to think about how you could work out each digit; it's essentially a problem of dividing a number. If you know how to divide your integer by the base repeatedly and keep the remainders, you've figured out the problem.
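As a worked instance of the polynomial form, the base-3 digits 2, 2, 0 of 24 reconstruct the decimal value:

$$24 = 2\cdot 3^2 + 2\cdot 3^1 + 0\cdot 3^0 = 18 + 6 + 0$$

So A2 = 2, A1 = 2 and A0 = 0, which is written 220 in base 3.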

Why does quot need to be repeatedly subtracted by 1?

This question is based on this thread: Programming Riddle: How might you translate an Excel column name to a number?
Here is code from that question to translate a column number to an Excel column name:
public String getColName (int colNum) {
String res = "";
int quot = colNum;
int rem;
/*1. Subtract one from number.
*2. Save the mod 26 value.
*3. Divide the number by 26, save result.
*4. Convert the remainder to a letter.
*5. Repeat until the number is zero.
*6. Return the result.
*/
while(quot > 0)
{
quot = quot - 1;
rem = quot % 26;
quot = quot / 26;
//cast to a char and add to the beginning of the string
//add 97 to convert to the correct ascii number
res = (char)(rem+97) + res;
}
return res;
}
I tested this code thoroughly and it works, but I have a question about why this line needs to be repeated for this to work:
quot = quot - 1;
From my understanding, subtracting 1 maps the column number to its distance from 'a': 1 should map to a distance of 0 from 'a', 2 to a distance of 1, and so on. But don't you need to subtract this 1 only once to account for that, not in a loop?
I mean, eventually
quot = quot / 26;
will stop the loop.

Excel columns aren't a normal number system; it's not just base 26. The first two-digit column is "AA". In any normal number system, the first two-digit number ("10") is composed of two different digits, a 1 followed by a 0. Basically, in Excel column numbering, there is no "zero" digit: each position uses the letters A to Z, standing for 1 to 26.
To account for this difference, 1 is subtracted at each iteration, so that each digit's value of 1 to 26 becomes a remainder of 0 to 25.
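A small Java sketch to make this concrete (class and method names are hypothetical). colName follows the loop from the question, while colNameSubtractOnce moves the subtraction out of the loop as the question proposes. Column 27 should be "aa", but subtracting only once turns the rest into ordinary base 26 applied to 26, which gives "ba".

```java
public class ExcelColumns {
    // Bijective base 26: each digit runs 1..26 (a..z), so 1 is subtracted before every digit.
    static String colName(int colNum) {
        StringBuilder sb = new StringBuilder();
        int quot = colNum;
        while (quot > 0) {
            quot--;                              // shift this digit from 1..26 to 0..25
            sb.append((char) ('a' + quot % 26));
            quot /= 26;
        }
        return sb.reverse().toString();
    }

    // The question's proposal: subtract 1 a single time, before the loop.
    static String colNameSubtractOnce(int colNum) {
        StringBuilder sb = new StringBuilder();
        int quot = colNum - 1;
        while (quot > 0) {
            sb.append((char) ('a' + quot % 26));
            quot /= 26;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(colName(27));             // aa
        System.out.println(colNameSubtractOnce(27)); // ba
    }
}
```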

Check whether a number is present in a sequence

I am writing a program for a problem I found on a coding competition website. I have sort of figured out how to solve it, but I am stuck on the math part, so I am boiling the problem down to just what I need.
First, I need to check whether a number is part of a sequence. My sequence is 2*a+1, where a is the previous element in the sequence, or equivalently 2^n-1 for the nth item. So it is 1, 3, 7, 15, 31, 63, ...
I don't really want to generate the whole sequence and check whether a number is present, but I am not sure what a quicker method would be.
Second, if I am given a number, let's say 25, I want to find the next highest number in my sequence. So for 25 it would be 31, for 47 it would be 63, and for 8 it would be 15.
How can I do these things without generating the whole sequence?
I have seen similar questions here with different sequences but I am still not sure how to solve this

Start by finding the explicit formula for any term in your sequence. I'm too lazy to write out a proof, so just add 1 to each term in your sequence:
1 + 1 = 2
3 + 1 = 4
7 + 1 = 8
15 + 1 = 16
31 + 1 = 32
63 + 1 = 64
...
You can clearly see that a_n = 2^n - 1.
To check if a particular number is in your sequence, assume that it is:
x = 2^n - 1
x + 1 = 2^n
From Wikipedia:
The binary representation of integers makes it possible to apply a
very fast test to determine whether a given positive integer x is a
power of two:
positive x is a power of two ⇔ (x & (x − 1)) equals to zero.
So to check, just do (the n > 0 guard excludes 0 and negative inputs, which would otherwise pass):
boolean inSequence(int n) {
return n > 0 && ((n + 1) & n) == 0;
}

As @Blender already pointed out, your sequence is essentially 2^n - 1, so you can use this trick if you store it in an integer format:
boolean inSequence(int value) {
for (int i = 0x7FFFFFFF; i != 0; i >>>= 1) {
if (value == i) {
return true;
}
}
return false;
}
Note that for every element in your sequence, its binary representation is a run of 0s followed by a run of 1s.
For example, 7 in binary is 00000000 00000000 00000000 00000111 and 63 in binary is 00000000 00000000 00000000 00111111.
This solution starts from 0x7FFFFFFF (01111111 11111111 11111111 11111111), uses an unsigned right shift, and compares each value with yours.
Nice and simple.

How to find the next higher number:
For example, given 19 (10011), it should return 31 (11111)
int findNext(int n) { // assumes n >= 0
if (n == 0) return 1;
int ret = 2; // start from binary 10
while ((n >>= 1) > 0) { // one more bit in ret for each remaining bit of n
ret <<= 1;
}
return ret - 1; // e.g. 100000 - 1 = 11111
}
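As an alternative to looping, a sketch using the JDK's Integer.numberOfLeadingZeros: shifting -1 (all ones) right, unsigned, by the number of leading zeros of n leaves exactly as many 1 bits as n has, which is the smallest sequence member that is at least n. The class and method names are hypothetical, and it returns n itself when n is already in the sequence (for example 31 -> 31).

```java
public class SequenceBits {
    // True if n = 2^k - 1 for some k >= 1, i.e. n's binary form is all ones.
    static boolean inSequence(int n) {
        return n > 0 && (n & (n + 1)) == 0;
    }

    // Smallest sequence member >= n: all ones up to n's highest set bit.
    static int nextInSequence(int n) {
        if (n <= 0) {
            return 1;
        }
        return -1 >>> Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(nextInSequence(25)); // 31
        System.out.println(nextInSequence(47)); // 63
        System.out.println(nextInSequence(8));  // 15
        System.out.println(inSequence(63));     // true
    }
}
```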
