I need to measure the physical distance between two places whose names are provided as strings. Since sometimes the names are written slightly differently, I was looking for a library that could help me measure the difference and then combine it with a measure of the latitude and longitude to select the correct matches. Preferred languages: Java or PHP.
Any suggestions?
Have a look at the Levenshtein distance. This is a way of measuring how different two strings are from one another.
Hopefully I understood your question correctly; using "distance" in the same sentence as "latitude and longitude" could be confusing!
Although written in C (with Python and Tcl bindings), libdistance is a tool for applying several distance metrics to strings/data.
Metrics included:
bloom
damerau
euclid
hamming
jaccard
levenshtein
manhattan
minkowski
needleman_wunsch
I took the liberty to translate a piece of C# code I've written to calculate the Levenshtein distance into Java code. It uses only two single-dimension arrays that alternate instead of a big jagged array:
public static int getDifference(String a, String b)
{
// Minimize the amount of storage needed:
if (a.length() > b.length())
{
// Swap:
String x = a;
a = b;
b = x;
}
// Store only two rows of the matrix, instead of a big one
int[] mat1 = new int[a.length() + 1];
int[] mat2 = new int[a.length() + 1];
int i;
int j;
for (i = 1; i <= a.length(); i++)
mat1[i] = i;
mat2[0] = 1;
for (j = 1; j <= b.length(); j++)
{
for (i = 1; i <= a.length(); i++)
{
int c = (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
mat2[i] =
Math.min(mat1[i - 1] + c,
Math.min(mat1[i] + 1, mat2[i - 1] + 1));
}
// Swap:
int[] x = mat1;
mat1 = mat2;
mat2 = x;
mat2[0] = mat1[0] + 1;
}
// It's row #1 because we swap rows at the end of each outer loop,
// as we are to return the last number on the lowest row
return mat1[a.length()];
}
It is not rigorously tested, but it seems to be working okay. It was based on a Python implementation I made for a university exercise. Hope this helps!
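To combine an edit distance with latitude and longitude, as the question describes, one option is to normalise both into the range 0..1 and blend them. The sketch below is only an illustration: the class name, the 50/50 weighting, and the `maxKm` cut-off are assumptions of mine, not something from this thread. The great-circle distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```java
final class PlaceMatchSketch {
    // Mean Earth radius in kilometres (a common approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }

    // Hypothetical blended score in [0, 1]; lower means a more likely match.
    // editDistance is the Levenshtein distance of the two names,
    // longerNameLength the length of the longer name, maxKm an assumed cut-off.
    static double combinedScore(int editDistance, int longerNameLength,
                                double km, double maxKm) {
        double nameTerm = longerNameLength == 0
                ? 0.0
                : (double) editDistance / longerNameLength;
        double geoTerm = Math.min(km / maxKm, 1.0);
        return 0.5 * nameTerm + 0.5 * geoTerm; // equal weights are an arbitrary choice
    }
}
```

The weights would need tuning against real data; candidates with the lowest score are the most plausible matches.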
You might get some decent results using a phonetic algorithm to find slightly misspelled names.
Also, if you use a more mechanical edit distance, you'll probably see better results using a weighted function that accounts for keyboard geometry (i.e. physically close keys are "cheaper" to replace than far off ones). That's a patented method btw, so be careful not to write something that becomes too popular ;)
I would recommend either Levenshtein Distance or the Jaccard Distance for comparing text.
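For the Jaccard distance, a minimal sketch could compare the sets of character bigrams of the two names. Using bigrams (rather than words or single characters) and lower-casing the input are my own assumptions, and the class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class JaccardSketch {
    // Collects the distinct two-character substrings of s, ignoring case.
    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        String t = s.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 2 <= t.length(); i++) {
            grams.add(t.substring(i, i + 2));
        }
        return grams;
    }

    // Jaccard distance: 1 - |intersection| / |union| of the two bigram sets.
    static double distance(String a, String b) {
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 0.0; // two strings without bigrams are treated as identical
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return 1.0 - (double) intersection.size() / union.size();
    }
}
```

For example, "night" and "nacht" share only the bigram "ht" out of seven distinct bigrams, so their distance is 1 - 1/7.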
I found SimMetrics in Java, but haven't used it.
Alright, so I'm working on a leveling system in Java. I have this from my previous question for defining the level "exp" requirements:
int[] levels = new int[100];
for (int i = 1; i < 100; i++) {
levels[i] = (int) (levels[i-1] * 1.1);
}
Now, my question is how would I determine if an exp level is between two different integers in the array, and then return the lower of the two? I've found something close, but not quite what I'm looking for here where it says binary search. Once I find which value the exp falls between I'll be able to determine a user's level. Or, if anyone else has a better idea, please don't hesitate to mention it. Please excuse my possible nooby mistakes, I'm new to Java. Thanks in advance to any answers.
Solved, thanks for all the wonderful answers.
With a general sorted array of numbers, binary search is the way to go, which is O(log n). But because there is a mathematical relationship between the numbers (each number is 1.1 times the previous one), take advantage of that fact. You're looking for the maximum exponent level such that
levels[0] * Math.pow(1.1, level) <= exp
Solving for level,
level = log{base 1.1}(exp / levels[0])
Taking advantage of the fact that loga(b) = ln(b) / ln(a)...
int level = (int) (Math.log((double) exp / levels[0]) / Math.log(1.1));
Because of the mathematical relationship, you just need this calculation, and no searching, so it's O(1).
double base = 1;
double factor = 1.1;
for (double score : Arrays.asList(1.0, 1.1, 1.3, 8.6, 9.46))
{
int level = (int) (Math.log(score / base) / Math.log(factor));
System.out.println(level);
}
Prints
0
1
2
22
23
You can use the built-in binary search method:
int exp = . . .
int pos = Arrays.binarySearch(levels, exp);
if (pos < 0) {
// no exact match -- change pos to the insertion index
pos = -pos - 1;
// Now exp is between levels[pos] and levels[pos - 1]
// (or less than levels[0] if pos is now 0)
} else {
// exp is exactly equal to levels[pos]
}
Yes, you should do binary search here as your array elements seem to be sorted.
This follows from the way you initialize them.
You can also do linear search of course but the former is better.
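A minimal sketch of turning the binary search result into "the lower of the two" levels, assuming `levels` is sorted in strictly increasing order (the class and method names are hypothetical):

```java
import java.util.Arrays;

final class LevelLookup {
    // Returns the index of the largest threshold that is <= exp,
    // or -1 if exp is below levels[0]. Assumes levels is strictly increasing.
    static int levelFor(int[] levels, int exp) {
        int pos = Arrays.binarySearch(levels, exp);
        if (pos >= 0) {
            return pos; // exact match on a threshold
        }
        int insertionPoint = -pos - 1; // index of the first element greater than exp
        return insertionPoint - 1;     // the threshold just below exp
    }
}
```

With thresholds {0, 100, 110, 121}, an exp of 105 falls between 100 and 110, so the method returns index 1.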
Why not just use the Arrays Class?
int valueToFind = 100;
int index = Arrays.binarySearch(levels, valueToFind);
My question: given an array, we have to split it into two sub-arrays such that the absolute difference between the sums of the two arrays is minimal, with the condition that the number of elements in the two arrays may differ by at most one.
Let me give you an example.
Example 1: 100 210 100 75 340
Answer :
Array1{100,210,100} and Array2{75,340} --> Difference = |410-415|=5
Example 2: 10 10 10 10 40
Answer : Array1{10,10,10} and Array2{10,40} --> Difference = |30-50|=20
Here we can see that although we could divide the array into {10,10,10,10} and {40}, we do not, because the constraint "the number of elements in the two arrays may differ by at most one" would be violated.
Can somebody provide a solution for this ?
My approach:
->Calculate sum of the array
->Divide the sum by 2
->Let the size of the knapsack=sum/2
->Consider the weights of the array values as 1. (If you have come across the knapsack problem, you may know about the weight concept.)
->Then consider the array values as the values of the weights.
->Calculate the answer which will be array1 sum.
->Total sum-answer=array2 sum
This approach fails.
Calculating the sums of the two arrays is enough. We are not interested in which elements form each sum.
Thank you!
Source: This is an ICPC problem.
I have an algorithm that works in O(n^3) time, but I have no hard proof it is optimal. It seems to work for every test input I give it (including some with negative numbers), so I figured it was worth sharing.
You start by splitting the input into two equally sized arrays (call them one[] and two[]?). Start with one[0], and see which element in two[] would give you the best result if swapped. Whichever one gives the best result, swap. If none give a better result, don't swap it. Then move on to the next element in one[] and do it again.
That part is O(n^2) by itself. The problem is, it might not get the best results the first time through. If you just keep doing it until you don't make any more swaps, you end up with an ugly bubble-type construction which makes it O(n^3) total.
Here's some ugly Java code to demonstrate (also at ideone.com if you want to play with it):
static int[] input = {1,2,3,4,5,-6,7,8,9,10,200,-1000,100,250,-720,1080,200,300,400,500,50,74};
public static void main(String[] args) {
int[] two = new int[input.length/2];
int[] one = new int[input.length - two.length];
int totalSum = 0;
for(int i=0;i<input.length;i++){
totalSum += input[i];
if(i<one.length)
one[i] = input[i];
else
two[i-one.length] = input[i];
}
float goal = totalSum / 2f;
boolean swapped;
do{
swapped = false;
for(int j=0;j<one.length;j++){
int curSum = sum(one);
float curBestDiff = Math.abs(goal - curSum);
int curBestIndex = -1;
for(int i=0;i<two.length;i++){
int testSum = curSum - one[j] + two[i];
float diff = Math.abs(goal - testSum);
if(diff < curBestDiff){
curBestDiff = diff;
curBestIndex = i;
}
}
if(curBestIndex >= 0){
swapped = true;
System.out.println("swapping " + one[j] + " and " + two[curBestIndex]);
int tmp = one[j];
one[j] = two[curBestIndex];
two[curBestIndex] = tmp;
}
}
} while(swapped);
System.out.println(Arrays.toString(one));
System.out.println(Arrays.toString(two));
System.out.println("diff = " + Math.abs(sum(one) - sum(two)));
}
static int sum(int[] list){
int sum = 0;
for(int i=0;i<list.length;i++)
sum += list[i];
return sum;
}
Can you provide more information on the upper limit of the input?
For your algorithm, I think you are trying to pick floor(n/2) items and find their maximum sum of values as the array1 sum... (If this was not your original thought, please ignore the following lines.)
If this is the case, then the knapsack size should be n/2 instead of sum/2,
but even so, I think it still doesn't work. The answer is min(|a - b|), and maximizing a is a different problem. For example, with {2,2,10,10} you will get a = 20, b = 4, while the answer is a = b = 12.
To answer the problem, I think I need more information about the upper limit of the input.
I cannot come up with a brilliant dp state but a 3-dimensional state
dp(i,n,v) := among the first i items, it is possible to pick n items whose values sum to v
each state is either 0 or 1 (false or true)
dp(i,n,v) = dp(i-1, n, v) | dp(i-1, n-1, v-V[i])
This dp state is so naive that its complexity is really high, usually too high to pass an ACM / ICPC problem, so if possible please provide more information and I'll see if I can come up with a better solution... Hope I can help a bit :)
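The dp(i,n,v) recurrence above can be sketched in Java with the item dimension folded away by iterating the counts downwards. This is a minimal illustration, not a tuned contest solution: it assumes all values are non-negative (sums are used as array indices), and the class and method names are made up.

```java
final class BalancedPartition {
    // Smallest |sum(part1) - sum(part2)| when one part holds floor(n/2) elements.
    // Assumes every value is >= 0.
    static int minDifference(int[] values) {
        int half = values.length / 2;
        int total = 0;
        for (int v : values) {
            total += v;
        }
        // reachable[count][sum] is true if some 'count' items add up to 'sum'.
        boolean[][] reachable = new boolean[half + 1][total + 1];
        reachable[0][0] = true;
        for (int v : values) {
            // Counts go downwards so that each item is used at most once.
            for (int count = half; count >= 1; count--) {
                for (int sum = total; sum >= v; sum--) {
                    if (reachable[count - 1][sum - v]) {
                        reachable[count][sum] = true;
                    }
                }
            }
        }
        int best = Integer.MAX_VALUE;
        for (int sum = 0; sum <= total; sum++) {
            if (reachable[half][sum]) {
                // The other part sums to total - sum.
                best = Math.min(best, Math.abs(total - 2 * sum));
            }
        }
        return best;
    }
}
```

On the two examples from the question this gives 5 and 20, and for {2,2,10,10} it gives 0. Time and memory grow with n times the total sum, which is why the upper limits of the input matter.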
A DP solution will give lg(n) time. Use two arrays: iterate over one from start to end, calculating the running sum; iterate over the other from end to start, doing the same. Finally, iterate from start to end and take the minimal difference.
Possible Duplicate:
In-place transposition of a matrix
I recently attended a technical written interview and came across the following question.
Say I have an array:
testArray = {a1,a2,a3,...an,b1,b2,b3,....bn,c1,c2,c3,.....,cn}
I need to sort this array as:
testArray = {a1,b1,c1,a2,b2,c2,a3,b3,c3,.....,an,bn,cn}
Constraint is I should not use extra memory, should not use any inbuilt function.
Should write complete code, it can be in any language and can also use any data structure.
eg:
Input: {1,2,3,4,5,6,7,8,9}, n = 3
Output: {1,4,7,2,5,8,3,6,9}
I could not get any solution within the constraint, can anyone provide solution or suggestion?
This is just a matrix transpose operation. And there is even a problem and solution for in-place matrix transposition on Wikipedia.
No extra space is impossible, since you need to at least go through the array. O(1) additional memory is possible, with heavy penalty on the time complexity.
The solution is built on the follow-the-cycle algorithm in the Wikipedia page: for each cell, we will find the cell with the smallest index in the cycle. If the cell with the smallest index is greater than or equal (>=) to the index of the current cell, we will perform chain swapping. Otherwise, we ignore the cell, since it has been swapped correctly. The (loosely analyzed) upper bound on time complexity can go as high as O((MN)^2) (we go through M * N cells, and the cycle can only be as long as the total number of cells).
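A rough Java sketch of that follow-the-cycle idea, using only a handful of local variables as extra memory. It treats the array as a rows x cols matrix in row-major order (for the question, rows = 3 and cols = n); the class and method names are my own, and this has only been checked on small inputs.

```java
final class InPlaceTranspose {
    // Transposes a rows x cols row-major matrix stored in 'a', in place.
    static void transpose(int[] a, int rows, int cols) {
        int size = rows * cols;
        if (size < 2) {
            return;
        }
        int last = size - 1; // first and last cells never move
        for (int start = 1; start < last; start++) {
            // Walk the cycle; stop as soon as we are back at, or below, 'start'.
            int idx = destination(start, rows, last);
            while (idx > start) {
                idx = destination(idx, rows, last);
            }
            if (idx < start) {
                continue; // this cycle was already handled from a smaller index
            }
            // 'start' is the smallest index in its cycle: rotate the cycle.
            int carried = a[start];
            idx = start;
            do {
                int next = destination(idx, rows, last);
                int displaced = a[next];
                a[next] = carried;
                carried = displaced;
                idx = next;
            } while (idx != start);
        }
    }

    // Index where the element at i ends up after transposition (for 0 < i < last).
    static int destination(int i, int rows, int last) {
        return (int) ((long) i * rows % last);
    }
}
```

Calling `transpose(new int[]{1,2,3,4,5,6,7,8,9}, 3, 3)` on the question's input rearranges it to {1,4,7,2,5,8,3,6,9}.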
Impossibility
It is impossible to implement this algorithm for an arbitrary length without any extra memory, because you need an iterator to traverse the list, and that takes up space.
Finding the right indices to swap
For fixed lengths of the array and fixed n you can use a matrix transpose algorithm.
and in order to swap the elements y
The algorithm you are looking for is a matrix transpose algorithm.
so you have to swap every element exactly once iterating through it.
http://en.wikipedia.org/wiki/Transpose
basically you have to swap the m -th element in the n - th component with the n - th element in the m -th component. This can be done by a double loop.
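m = length(array)/n;
// Assumes a square matrix (m == n). Starting j at i + 1 swaps each pair
// once; visiting every (i, j) would swap each pair twice and undo the work.
for (i = 0; i < m; i++)
for (j = i + 1; j < n; j++)
{
index_1 = i * m + j;
index_2 = j * m + i;
swap(index_1, index_2);
}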
Note: For fixed m and n this loop can be completely unrolled and therefore m, i, j can be replaced by a constant.
Swapping without Memory consumption
In order to swap every element without using extra space you can use the XOR swap algorithm as pointed out in the comments:
X := X XOR Y
Y := Y XOR X
X := X XOR Y
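In Java the three XOR steps could look like the sketch below (the class name is made up). One caveat worth hedging: if both positions are the same slot, the first XOR zeroes it, so the sketch guards against that case.

```java
final class XorSwap {
    // Swaps a[i] and a[j] without a temporary variable.
    static void swap(int[] a, int i, int j) {
        if (i == j) {
            return; // XOR-ing a slot with itself would set it to 0
        }
        a[i] ^= a[j];
        a[j] ^= a[i];
        a[i] ^= a[j];
    }
}
```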
The simplest way to swap two numbers (a and b) without using a temporary variable is like this:
b = b + a;
a = b - a;
b = b - a;
If you write that in a function, then you're part of the way there. How you keep track of which variable to swap within the arrays without using a temporary variable eludes me right now.
Bear in mind voters: he doesn't actually need to sort the array, just swap the right values.
Edit: this will work with large values in Java (and in C/C++ unless you turn on some very aggressive compiler optimisations - the behaviour is undefined but defaults to sane). The values will just wrap around.
Second edit - some (rather untested) code to flip the array around, with, I think, 4 integers over the memory limit. While it's technically massively thread-unsafe, it would be parallelisable, because you only access each array location once at most:
static int[] a = {1,2,3,4,
5,6,7,8,
9,10,11,12,
13,14,15,16};
static int n = 4;
public static void main(String[] args)
{
for(int i = 0; i < a.length/n; i++) // 1 integer
for(int j = 0; j < n; j++) // 1 integer
if(j > i)
swap(i*n+j, j*n+i);
}
static void swap(int aPos, int bPos) // 2 integers
{
if(a[aPos] != a[bPos])
{
a[bPos] = a[aPos] + a[bPos];
a[aPos] = a[bPos] - a[aPos];
a[bPos] = a[bPos] - a[aPos];
}
}
Apologies if this misunderstands the question; I read it carefully and couldn't work out what was needed other than this.
Take a look at Quicksort algorithm
For more information about available algorithms, go to Sorting algorithm page.
I have a sorted list of ratios, and I need to find a "bin size" that is small enough so that none of them overlap. To put it shortly, I need to do what the title says. If you want a little background, read on.
I am working on a graphical experiment that deals with ratios and the ability of the eye to distinguish between these ratios quickly. So when we are forming these experiments, we use flashes of dots with various ratios chosen from dot bins. A bin is just a range of possible ratios with the mentioned array elements in the center. All dot bins need to be the same size. This means that we need to find the elements in the array that are nearest each other. Keep in mind that the array is sorted.
Can anyone think of a quick cool way to do this? I have never been particularly algorithmically inclined, so right now I am just running through the array backwards and subtracting the next element from the current one and checking that against a sum. Thanks
private double findNumerostyBinRangeConstant(double[] ratios) {
int minI = 0;
double min = 0;
for (int i = ratios.length -1; i > 0; i--) {
if (ratios[i] - ratios[i-1] > min) {
min = ratios[i] - ratios[i-1];
minI = i;
}
}
return Math.sqrt(ratios[minI]/ratios[minI - 1]); // Essentially a geometric mean. Doesn't really matter.
}
Forward moving functions, fixed some logic issues that you had. Since you are looking for the minimum double, your initial comparison variable should start at the max. Removed the comparison by subtraction because you weren't using it later, replaced it with the division.
Note: Haven't tested the fringe cases, including zeros and negatives.
private double findNumerostyBinRangeConstant(double[] ratios) {
double result = Double.MAX_VALUE;
for (int i = 0; i<ratios.length-1; i++) {
if (ratios[i+1]/ratios[i] < result){
result = ratios[i+1]/ratios[i];
}
}
return Math.sqrt(result);
}
Only change: flipped the array search to go in increasing direction -- many architectures prefer going in positive direction. (Some don't.) Haven't verified that I didn't introduce an off-by-one error. (Sorry.)
private double findNumerostyBinRangeConstant(double[] ratios) {
int minI = 0;
double min = Double.MAX_VALUE;
for (int i = 0; i < ratios.length-1; i++) {
if (ratios[i+1] - ratios[i] < min) {
min = ratios[i+1] - ratios[i];
minI = i;
}
}
return Math.sqrt(ratios[minI+1]/ratios[minI]);
}
Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.
I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.
Any ideas?
--
here is the AS3 code for random correlated gaussians, to get the wheels turning:
public static function nextCorrelatedGaussians(r:Number):Array{
var d1:Number;
var d2:Number;
var n1:Number;
var n2:Number;
var lambda:Number;
var arr:Array = new Array();
var isNeg:Boolean;
if (r<0){
r *= -1;
isNeg=true;
}
lambda= ( (r*r) - Math.sqrt( (r*r) - (r*r*r*r) ) ) / (( 2*r*r ) - 1 );
n1 = nextGaussian();
n2 = nextGaussian();
d1 = n1;
d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));
if (isNeg) {d2*= -1}
arr.push(d1);
arr.push(d2);
return arr;
}
I ended up writing a short paper on this
It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.
Here is an implementation of twolfe18's algorithm written in ActionScript 3:
for (var j:int=0; j < size; j++) {
xValues[j]=Math.random();
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;
for (var i:int=0; i < size; i++) {
yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}
boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev).
size is the size of the distribution.
Sample output
Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158
As you can see I'm still a ways off. Any suggestions?
This apparently simple question has been messing with my mind since yesterday evening! I looked into simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is that you can easily simulate two normals with a given correlation, and they outline a method to transform these non-independent normals, but this won't preserve the correlation. The transformed variables will still be correlated, so to speak, but not with the identical coefficient. See the paragraph "Rank correlation coefficients".
Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.
start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.
long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.
i'll try to come back and edit this post to include a closed form solution when i have access to some paper.
EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).
sample to get your x's
compute var(x)
the variance of E should be: (1/(r*sd(x)))^2 - var(x)
generate your y's based on your x's and sample from your normal random variable E
for p < 0, set Y = -X + E. proceed accordingly.
basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x's (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.
SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)*sd(y)). from that, i think the true value of var(E) should be (1/(r*sd(x)))^2 - var(x). see if that works.
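Working the model Y = X + E through the definition cov(x, y)/(sd(x)*sd(y)) seems to lead to a different expression for var(E) than the one in this post: with E independent of X, cov(X, Y) = var(X) and var(Y) = var(X) + var(E), so r = sd(X)/sqrt(var(X) + var(E)), which rearranges to var(E) = var(X) * (1/r^2 - 1). The Java sketch below is my own check of that rearrangement, not the poster's code; the class and method names are made up, it assumes 0 < r <= 1, and note that Y produced this way is not uniformly distributed.

```java
import java.util.Random;

final class CorrelatedSample {
    // Returns {x, y} with x uniform on [0, 1) and y = x + Gaussian noise,
    // where the noise variance is var(x) * (1 / r^2 - 1). Assumes 0 < r <= 1.
    static double[][] generate(int size, double r, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[size];
        for (int i = 0; i < size; i++) {
            x[i] = rng.nextDouble();
        }
        double sdE = Math.sqrt(variance(x) * (1.0 / (r * r) - 1.0));
        double[] y = new double[size];
        for (int i = 0; i < size; i++) {
            y[i] = x[i] + sdE * rng.nextGaussian();
        }
        return new double[][] {x, y};
    }

    static double mean(double[] v) {
        double sum = 0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    // Population variance.
    static double variance(double[] v) {
        double m = mean(v);
        double sum = 0;
        for (double d : v) {
            sum += (d - m) * (d - m);
        }
        return sum / v.length;
    }

    // Sample Pearson correlation coefficient.
    static double pearson(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

As the post says, only the expected Pearson value can be targeted; any single sample will land near the target, not exactly on it.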
To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])
A strange problem demands a strange solution -- here is how I solved it.
-Generate array X
-Clone array X to Create array Y
-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)
-Measure the starting level of pearson's R with array X sorted and array Y unsorted.
WHILE the correlation is outside of the range you are hoping for
IF the correlation is too low
run one iteration of CombSort11 on array Y then recheck correlation
ELSE IF the correlation is too high
randomly swap two values and recheck correlation
And that's it! Combsort is the real key: it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison's demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.
Here is my implementation in AS3:
public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
var yValues:Array = new Array;
var xValues:Array = new Array;
var coVar:Number = 0;
for (var e:int=0; e < size; e++) { //create x values
xValues.push(Math.random());
}
yValues = xValues.concat();
if(r != 1.0){
xValues.sort(Array.NUMERIC);
}
var trueR:Number = Util.getPearson(xValues, yValues);
while(Math.abs(trueR-r)>error){
if (trueR < r-error){ // combsort11 for y
var gap:int = yValues.length;
var swapped:Boolean = true;
while (trueR <= r-error) {
if (gap > 1) {
gap = Math.round(gap / 1.3);
}
var i:int = 0;
swapped = false;
while (i + gap < yValues.length && trueR <= r-error) {
if (yValues[i] > yValues[i + gap]) {
var t:Number = yValues[i];
yValues[i] = yValues[i + gap];
yValues[i + gap] = t;
trueR = Util.getPearson(xValues, yValues)
swapped = true;
}
i++;
}
}
}
else { // decorrelate
while (trueR >= r+error) {
var a:int = Random.randomUniformIntegerBetween(0, size-1);
var b:int = Random.randomUniformIntegerBetween(0, size-1);
var temp:Number = yValues[a];
yValues[a] = yValues[b];
yValues[b] = temp;
trueR = Util.getPearson(xValues, yValues)
}
}
}
var correlates:Array = new Array;
for (var h:int=0; h < size; h++) {
var pair:Array = new Array(xValues[h], yValues[h]);
correlates.push(pair);}
return correlates;
}