What do the parameters of the Euclidean hash family mean?

In a Java Euclidean hash family library, I do not understand the parameter w, or whether there is a relation between the choice of w and the number of dimensions.
public EuclidianHashFamily(int w,int dimensions){
this.dimensions = dimensions;
this.w=w;
I want to know what these integers mean so that I know what values to assign to them.
public EuclideanHash(int dimensions,int w){
Random rand = new Random();
this.w = w;
this.offset = rand.nextInt(w);
randomProjection = new Vector(dimensions);
for(int d=0; d<dimensions; d++) {
//mean 0
//standard deviation 1.0
double val = rand.nextGaussian();
randomProjection.set(d, val);

A bit of web search leading to me quoting Wikipedia:
One of the main applications of LSH is to provide a method for efficient approximate nearest neighbor search algorithms. Consider an LSH family 𝓕. The algorithm has two main parameters: the width parameter k and the number of hash tables L.
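So, w is a width. Note that in the Euclidean hash family it is the width of each hash bucket along the random projection, which is a separate parameter from the k in the quote.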
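To make the role of w concrete, here is a minimal sketch of one hash function from a Euclidean (p-stable) LSH family. It is an illustration under assumptions, not the library's actual source: the class name EuclideanHashSketch, the use of double for w, and the floor-based bucketing are all hypothetical choices. The vector is projected onto a random Gaussian direction, shifted by a random offset in [0, w), and the result is cut into buckets of width w.

```java
import java.util.Random;

/** Hypothetical sketch of one hash function from a p-stable (Euclidean) LSH family. */
class EuclideanHashSketch {
    private final double[] projection; // random direction, one Gaussian per dimension
    private final double offset;       // random shift in [0, w)
    private final double w;            // bucket width along the projection

    EuclideanHashSketch(int dimensions, double w, Random rand) {
        this.w = w;
        this.offset = rand.nextDouble() * w;
        this.projection = new double[dimensions];
        for (int d = 0; d < dimensions; d++) {
            projection[d] = rand.nextGaussian(); // mean 0, standard deviation 1
        }
    }

    /** Projects the vector onto the random direction and returns its bucket index. */
    int hash(double[] vector) {
        double dot = 0.0;
        for (int d = 0; d < projection.length; d++) {
            dot += projection[d] * vector[d];
        }
        return (int) Math.floor((dot + offset) / w);
    }
}
```

Because the projection is Gaussian, the projected difference of two points is normally distributed with a standard deviation equal to their Euclidean distance. So w is best chosen relative to the distances you want to treat as "near", in the units of your data, rather than from the dimension count alone: a larger w puts more points in the same bucket, a smaller w fewer.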

Related

Shuffling through all the points in a 3-dimensional space without storing all possible coordinates

I'm programming a 3-dimensional cellular automaton. The way I'm iterating through it right now in each generation is:
1. Create a list of all possible coordinates in the 3D space.
2. Shuffle the list.
3. Iterate through the list until all coordinates have been visited.
4. Go to 2.
Here's the code:
I've a simple 3 integer struct
public class Coordinate
{
public int x;
public int y;
public int z;
public Coordinate(int x, int y, int z) {this.x = x; this.y = y; this.z = z;}
}
then at some point I do this:
List<Coordinate> all_coordinates = new ArrayList<>();
[...]
for(int z=0 ; z<length ; z++)
{
for(int x=0 ; x<diameter ; x++)
{
for(int y=0 ; y<diameter ; y++)
{
all_coordinates.add(new Coordinate(x,y,z));
}
}
}
and then in the main algorithm I do this:
private void next_generation()
{
Collections.shuffle(all_coordinates);
for (int i=0 ; i < all_coordinates.size() ; i++)
{
[...]
}
}
The problem is, once the automaton gets too large, the list containing all possible points gets huge. I need a way to shuffle through all the points without having to actually store all the possible points in memory. How should I go about this?

One way to do this is to start by mapping your three dimensional coordinates into a single dimension. Let's say that your three dimensions' sizes are X, Y, and Z. So your x coordinate goes from 0 to X-1, etc. The full size of your space is X*Y*Z. We'll call that S.
To map any coordinate in 3-space to 1-space, you use the formula coord = x*(Y*Z) + y*Z + z.
Of course, once you generate the numbers, you have to convert back to 3-space. That's a simple matter of reversing the conversion above. Assuming that coord is the 1-space coordinate:
x = coord / (Y*Z)
coord = coord % (Y*Z)
y = coord / Z
z = coord % Z
Now, with a single dimension to work with, you've simplified the problem to one of generating all the numbers from 0 to S-1 in pseudo-random order, without duplication.
I know of at least three ways to do this. The simplest uses a multiplicative inverse, as I showed here: Given a number, produce another random number that is the same every time and distinct from all other results.
When you've generated all of the numbers, you "re-shuffle" the list by picking different x and m values for the multiplicative inverse calculations.
Another way of creating a non-repeating pseudo-random sequence in a particular range is with a linear feedback shift register. I don't have a ready example, but I have used them. To change the order, (i.e. re-shuffle), you re-initialize the generator with different parameters.
You might also be interested in the answers to this question: Unique (non-repeating) random numbers in O(1)?. That user was only looking for 1,000 numbers, so he could use a table, and the accepted answer reflects that. Other answers cover the LFSR, and a Linear congruential generator that is designed with a specific period.
None of the methods I mentioned require that you maintain much state. The amount of state you need to maintain is constant, whether your range is 20 or 20,000,000.
Note that all of the methods I mentioned above give pseudo-random sequences. They will not be truly random, but they'll likely be close enough to random to fit your needs.
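As a rough illustration of the constant-state idea, the sketch below visits every cell of a grid exactly once by stepping through the 1-space indices with a multiplier that is coprime to S. This is a simplified stand-in for the methods named above, not the code from the linked answers; the class and method names are hypothetical, and it assumes S is below roughly 3 billion so that the products fit in a long.

```java
import java.util.Random;

class ShuffledGrid {
    static long gcd(long a, long b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Picks a random multiplier below s that shares no factor with s. */
    static long pickMultiplier(long s, Random rand) {
        long m;
        do {
            m = 1 + (long) (rand.nextDouble() * (s - 1));
        } while (gcd(m, s) != 1);
        return m;
    }

    /** Maps i in [0, s) to a distinct index in [0, s), provided gcd(m, s) == 1. */
    static long permute(long i, long m, long offset, long s) {
        return (i * m + offset) % s;
    }

    /** Converts a 1-space index back to {x, y, z} for a grid with the given Y and Z sizes. */
    static int[] decode(long coord, int sizeY, int sizeZ) {
        long plane = (long) sizeY * sizeZ;
        long rest = coord % plane;
        return new int[] { (int) (coord / plane), (int) (rest / sizeZ), (int) (rest % sizeZ) };
    }

    public static void main(String[] args) {
        int sizeX = 2, sizeY = 3, sizeZ = 4;
        long s = (long) sizeX * sizeY * sizeZ;
        Random rand = new Random();
        long m = pickMultiplier(s, rand);           // new m and offset = "re-shuffle"
        long offset = (long) (rand.nextDouble() * s);
        for (long i = 0; i < s; i++) {
            int[] c = decode(permute(i, m, offset, s), sizeY, sizeZ);
            System.out.println(c[0] + "," + c[1] + "," + c[2]);
        }
    }
}
```

The only state kept between steps is i, m and offset. The visiting order is a fixed-stride walk, so it looks less random than a true shuffle; an LFSR or a full-period linear congruential generator would scramble the order more thoroughly at the same memory cost.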

Working out a point using curve fitting in java

The following code produces a curve that should fit the points
1, 1
150, 250
10000, 500
100000, 750
1000000, 1000
I built this code based on the documentation here; however, I am not entirely sure how to use the data correctly for further calculations and whether PolynomialCurveFitter.create(3) will affect the answers in these future calculations.
For example, how would I use the output data to calculate the x value when the y value is 200, and how would the value differ if I had PolynomialCurveFitter.create(2) instead of PolynomialCurveFitter.create(3)?
import java.util.ArrayList;
import java.util.Arrays;
import org.apache.commons.math3.fitting.PolynomialCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;
public class MyFuncFitter {
public static void main(String[] args) {
ArrayList<Integer> keyPoints = new ArrayList<Integer>();
keyPoints.add(1);
keyPoints.add(150);
keyPoints.add(10000);
keyPoints.add(100000);
keyPoints.add(1000000);
WeightedObservedPoints obs = new WeightedObservedPoints();
if(keyPoints != null && keyPoints.size() != 1) {
int size = keyPoints.size();
int sectionSize = (int) (1000 / (size - 1));
for(int i = 0; i < size; i++) {
if(i != 0)
obs.add(keyPoints.get(i), i * sectionSize);
else
obs.add(keyPoints.get(0), 1);
}
} else if(keyPoints.size() == 1 && keyPoints.get(0) >= 1) {
obs.add(1, 1);
obs.add(keyPoints.get(0), 1000);
}
PolynomialCurveFitter fitter = PolynomialCurveFitter.create(3);
fitter.withStartPoint(new double[] {keyPoints.get(0), 1});
double[] coeff = fitter.fit(obs.toList());
System.out.println(Arrays.toString(coeff));
}
}

The consequences of changing the degree d for your function
PolynomialCurveFitter.create takes the degree of the polynomial as a parameter.
Very (very) roughly speaking, the polynomial degree describes the "complexity" of the curve you want to fit. A low degree will produce simple curves (just a parabola for d=2), whereas higher degrees will produce more intricate curves, with lots of peaks and valleys of highly varying sizes. Such curves are better able to "fit" all your data points perfectly, at the expense of not necessarily being a good "prediction" of all other values.
Like the blue curve on this graphic:
You can see how the straight line would be a better "approximation", while not fitting the data points exactly.
How to compute x for any y value in the computed function
You "simply" need to solve the polynomial, using the very same library: subtract the target y value from the constant coefficient and find the root of the resulting polynomial.
Let's say you chose a degree of 2.
Your coefficients array coeffs will contain 3 factors {a0, a1, a2}, which describe the equation:
y = a0 + a1*x + a2*x^2
If you want to solve this for a particular value, like y = 600, you need to solve:
600 = a0 + a1*x + a2*x^2
So, basically,
(a0 - 600) + a1*x + a2*x^2 = 0
So, just subtract 600 from a0:
coeffs[0] -= 600;
and find the root of the polynomial using the dedicated function:
PolynomialFunction polynomial = new PolynomialFunction(coeffs);
LaguerreSolver laguerreSolver = new LaguerreSolver();
double x = laguerreSolver.solve(100, polynomial, 0, 1000000);
System.out.println("For y = 600, we found x = " + x);
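For readers who want to see the same idea without the library, here is a small standard-library-only sketch. It swaps the Laguerre solver for plain bisection, so it is not what Commons Math does internally; the class and method names are hypothetical. It assumes the coefficients are ordered from the constant term upwards and that the target y value is bracketed on the search interval.

```java
class PolynomialInverse {
    /** Evaluates a0 + a1*x + a2*x^2 + ... using Horner's method. */
    static double evaluate(double[] coeffs, double x) {
        double result = 0.0;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            result = result * x + coeffs[i];
        }
        return result;
    }

    /** Finds x in [lo, hi] with p(x) = y by bisection; p(lo) - y and p(hi) - y must differ in sign. */
    static double solveFor(double[] coeffs, double y, double lo, double hi) {
        double fLo = evaluate(coeffs, lo) - y;
        double fHi = evaluate(coeffs, hi) - y;
        if (fLo * fHi > 0) {
            throw new IllegalArgumentException("target value is not bracketed on [lo, hi]");
        }
        for (int i = 0; i < 200 && hi - lo > 1e-12 * Math.max(1.0, Math.abs(hi)); i++) {
            double mid = 0.5 * (lo + hi);
            double fMid = evaluate(coeffs, mid) - y;
            if (fLo * fMid <= 0) {
                hi = mid;        // root lies in the lower half
            } else {
                lo = mid;        // root lies in the upper half
                fLo = fMid;
            }
        }
        return 0.5 * (lo + hi);
    }
}
```

A polynomial of degree 2 or 3 can take the same y value at several x values, so the interval you pass decides which solution you get. That is also where create(2) and create(3) differ in practice: the fitted curves are different, so the x found for a given y differs too.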

How to get intersection of two hashmaps with tolerance in Java?

In my java code, I have two hashmaps, and I want to get the intersection as a value. The keys are ARGB values of a color (integer) and its value is the frequency (integer). Basically each hashmap was generated from an image.
I want to determine a value that represents how close the maps are to each other. The higher the value the more close the two maps are to each other. Of course it can't be perfectly strict because in real life two colors can look the same but have slightly different ARGB values, which is where the tolerance part comes in.
So far I got this:
private int colorCompare(Result otherResult) {
HashMap<Integer, Integer> colorMap1 = getColorMap();
HashMap<Integer, Integer> colorMap2 = otherResult.getColorMap();
int sum = 0;
for (Map.Entry<Integer, Integer> entry : colorMap1.entrySet()) {
Integer key = entry.getKey();
Integer value = entry.getValue();
if (colorMap2.containsKey(key)) {
sum += value + colorMap2.get(key);
}
}
return sum;
}
public double CloseTo(Pixel otherpixel) {
Color mycolor = getColor();
Color othercolor = otherpixel.getColor();
double rmean = ( mycolor.getRed() + othercolor.getRed() )/2;
int r = mycolor.getRed() - othercolor.getRed();
int g = mycolor.getGreen() - othercolor.getGreen();
int b = mycolor.getBlue() - othercolor.getBlue();
double weightR = 2 + rmean/256;
double weightG = 4.0;
double weightB = 2 + (255-rmean)/256;
return Math.sqrt(weightR*r*r + weightG*g*g + weightB*b*b);
}
Does anyone know how to incorporate the tolerance part into it as I have no idea...
Thanks

I was unsure what the intersection of two maps would be, but it sounds as though you want to compute a distance of some sort based on the histograms of two images. One classic approach to this problem is Earth mover's distance (EMD). Assume for the moment that the images have the same number of pixels. The EMD between these two images is determined by the one-to-one correspondence between the pixels of the first image and the pixels of the second that minimizes the sum over all paired pixels of the distance between their colors. The EMD can be computed in polynomial time using the Hungarian algorithm.
If the images are of different sizes, then we have to normalize the frequencies and swap out the Hungarian algorithm for one that can solve a more general minimum-cost flow problem.
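A full EMD implementation is too long to show here. As a much simpler stand-in, and explicitly not EMD, the sketch below computes a greedy tolerance-based overlap: each pixel of the first histogram is paired with a not-yet-used pixel of the second whose color lies within a given distance. The class name, the plain RGB distance and the greedy pairing are all assumptions made for illustration; a weighted distance like the one in the question could be dropped in instead.

```java
import java.util.HashMap;
import java.util.Map;

class TolerantHistogramOverlap {
    /** Euclidean distance between two ARGB colors in RGB space (alpha ignored). */
    static double distance(int argb1, int argb2) {
        int dr = ((argb1 >> 16) & 0xFF) - ((argb2 >> 16) & 0xFF);
        int dg = ((argb1 >> 8) & 0xFF) - ((argb2 >> 8) & 0xFF);
        int db = (argb1 & 0xFF) - (argb2 & 0xFF);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }

    /** Greedily pairs pixels whose colors differ by at most tolerance; returns the number paired. */
    static int overlap(Map<Integer, Integer> map1, Map<Integer, Integer> map2, double tolerance) {
        Map<Integer, Integer> remaining = new HashMap<>(map2); // unmatched pixels of image 2
        int matched = 0;
        for (Map.Entry<Integer, Integer> e1 : map1.entrySet()) {
            int need = e1.getValue();
            for (Map.Entry<Integer, Integer> e2 : remaining.entrySet()) {
                if (need == 0) break;
                if (e2.getValue() > 0 && distance(e1.getKey(), e2.getKey()) <= tolerance) {
                    int take = Math.min(need, e2.getValue());
                    e2.setValue(e2.getValue() - take); // these pixels are now used up
                    need -= take;
                    matched += take;
                }
            }
        }
        return matched;
    }
}
```

A higher return value means the images share more similar colors. Because the pairing is greedy, the result can depend on iteration order and is not the optimal matching that EMD would find, and the nested loops cost time proportional to the product of the two map sizes.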

How do I generate normal cumulative distribution in Java? its inverse cdf? How about lognormal?

I am brand new to Java, second day! I want to generate samples from a normal distribution. I am using the inverse transformation method.
Basically, I want to find the normal cumulative distribution, then find its inverse, and generate samples.
My question is: is there a built-in function for the inverse normal cdf? Or do I have to hand code it?
I have seen people refer to this on apache commons. Is this a built-in? Or do I have to download it?
If I have to do it myself, can you give me some tips? If I download, doesn't my prof also have to have the "package" or special file installed?
Thanks in advance!
Edit: I just found out I can't use libraries. I also heard there is a simpler way of converting to normal using radians.

As mentioned here:
Apache Commons Math has what you are looking for.
More specifically, check out the NormalDistributionImpl class.
And no, your professor doesn't need to download anything if you provide him with all the needed libraries.
UPDATE :
If you want to hand code it (I don't know the actual formula), you can check the following link:
http://home.online.no/~pjacklam/notes/invnorm/
There are 2 people who implemented it in java: http://home.online.no/~pjacklam/notes/invnorm/#Java

I had the same problem and found a solution. The following code gives results for the cumulative distribution function just like Excel does:
private static double erf(double x)
{
//A&S formula 7.1.26
double a1 = 0.254829592;
double a2 = -0.284496736;
double a3 = 1.421413741;
double a4 = -1.453152027;
double a5 = 1.061405429;
double p = 0.3275911;
x = Math.abs(x);
double t = 1 / (1 + p * x);
//Direct calculation using formula 7.1.26 is absolutely correct
//But calculation of nth order polynomial takes O(n^2) operations
//return 1 - (a1 * t + a2 * t * t + a3 * t * t * t + a4 * t * t * t * t + a5 * t * t * t * t * t) * Math.Exp(-1 * x * x);
//Horner's method, takes O(n) operations for nth order polynomial
return 1 - ((((((a5 * t + a4) * t) + a3) * t + a2) * t) + a1) * t * Math.exp(-1 * x * x);
}
public static double NORMSDIST(double z)
{
double sign = 1;
if (z < 0) sign = -1;
double result=0.5 * (1.0 + sign * erf(Math.abs(z)/Math.sqrt(2)));
return result;
}
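Since the question also asks for the inverse, here is one hedged way to get it from a CDF like the one above without any library: bisection on the CDF, followed by inverse transform sampling. This is a sketch rather than the method used by Excel or Commons Math; the class and method names are hypothetical, and accuracy is limited by the roughly 1.5e-7 error of the erf approximation, so it is not suitable for extreme tails.

```java
import java.util.Random;

class InverseNormalSketch {
    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Inverts normalCdf by bisection for 0 < p < 1. */
    static double inverseNormalCdf(double p) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = -10.0, hi = 10.0;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (normalCdf(mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /** Inverse transform sampling: push a uniform draw through the inverse CDF. */
    static double sample(Random rand) {
        double u;
        do {
            u = rand.nextDouble();
        } while (u == 0.0); // nextDouble() can return exactly 0, which has no inverse
        return inverseNormalCdf(u);
    }
}
```

For a normal distribution with mean mu and standard deviation sigma, the sample becomes mu + sigma * sample(rand).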

Mathematically, this is a hard problem, and there are a few solutions you might consider.
Disclaimer: Mathematical jargon ahead.
As you probably already know, the normalcdf function is used to calculate probabilities of normal random variables. Because a normal distribution is continuous, the corresponding probability density function (normalpdf) does not itself give probabilities, (in contrast to discrete distributions like binomial or geometric distributions). Instead, the area under the curve gives the probability that the normal random variable falls within a range of values. So, the normalcdf function you seek is the area under a section of the normalpdf function.
Mathematically, finding the area under a continuous curve is a fundamental problem of calculus. The solution to this type of problem is called an integral and integrating a function over a range of numbers means finding the area under the curve and between the lowest value in the range to the highest.
In most circumstances, we could just integrate the pdf function to get the cdf function, then evaluate it wherever we want. The heart of the problem, and the reason that an algorithm in Java is not as simple as one might think, is that the normalpdf function does not have a closed-form integral in terms of elementary functions, so its value cannot be computed exactly in a finite number of elementary steps. So, values of the normalcdf function are particularly elusive.
There are two main classes of solutions for the problem.
1. Numerical Integration Techniques
Numerical integration techniques solve the problem by approximating the area under the curve geometrically. The area is divided into rectangles or other shapes of equal or varying widths, with the height of each given by the pdf function. The sum of the areas of the rectangles is an approximation of the area under the curve, which is the corresponding probability. These techniques can be used to compute values to arbitrary precision, but are more computationally expensive than class 2. Using better approximations (e.g. Simpson's rule) improves accuracy for the same number of evaluations. A simple numeric integration method follows.
public static double normCDF(double z)
{ double LeftEndpoint = -100;
int nRectangles = 100000;
double runningSum = 0;
double x;
for(int n = 0; n < nRectangles; n++){
x = LeftEndpoint + n*(z-LeftEndpoint)/nRectangles;
runningSum += Math.pow(Math.sqrt(2*Math.PI),-1)*Math.exp(-Math.pow(x,2)/2)*(z-LeftEndpoint)/nRectangles;
}
System.out.println(runningSum);
return runningSum;
}
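The Simpson's rule variant mentioned above could look roughly like the following sketch. Two choices here are assumptions of this example rather than part of the method above: the integral is taken from 0 to z and 0.5 is added (using the symmetry of the normal pdf), and the class and method names are hypothetical.

```java
class SimpsonNormalCdf {
    static double pdf(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
    }

    /** Phi(z) = 0.5 + integral of the pdf from 0 to z, by composite Simpson's rule. */
    static double normalCdf(double z, int intervals) {
        int n = (intervals % 2 == 0) ? intervals : intervals + 1; // Simpson needs an even count
        double h = z / n;                                         // negative when z < 0
        double sum = pdf(0.0) + pdf(z);
        for (int i = 1; i < n; i++) {
            sum += pdf(i * h) * (i % 2 == 1 ? 4.0 : 2.0);         // weights 4, 2, 4, 2, ...
        }
        return 0.5 + sum * h / 3.0;
    }
}
```

Starting at 0 instead of a far-left endpoint keeps the integration range short, so a few hundred intervals are typically enough for moderate z, compared with the 100000 rectangles used above.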
2. Analytic Techniques
Analytic techniques take advantage of the fact that while the normalpdf does not have a closed-form integral, the pdf can be "converted" to a sum called a Taylor series, then integrated term-by-term. Basically, it turns the pdf into a sum of infinitely many simple functions, integrates each one analytically, then adds together all of the integrals. Since this is an analytic procedure, a programmer need only include the integral series in the program after computing the coefficients. The precision of the result depends on how many terms of the sum you include in the calculation, and tends to approach accurate values much sooner than numerical integration techniques. For example, the solution by Mohammad Aldefrawy uses just five coefficients. Below is a method that includes the computation of coefficients, so one could compute values to arbitrary precision. (Actually, the normalcdf series isn't computed directly. Instead, the coefficients of the related error function are computed, then converted by a linear transformation.) However, since computation of the coefficients involves the factorial function, one runs into overflow for large numbers of coefficients: the factorial exceeds the range of a double beyond 170!. Thankfully, this method returns values with much higher precision in a fraction of the iterations required by methods in the previous class of solutions to yield similar results.
public static double normalCDF(double x){
System.out.println(0.5*(1+erf(x/Math.sqrt(2))));
return 0.5*(1+erf(x/Math.sqrt(2)));
}
public static double erf(double z)
{
int nTerms = 315;
double runningSum = 0;
for(int n = 0; n < nTerms; n++){
runningSum += Math.pow(-1,n)*Math.pow(z,2*n+1)/(factorial(n)*(2*n+1));
}
return (2/Math.sqrt(Math.PI))*runningSum;
}
static double factorial(int n){
if(n == 0) return 1;
if(n == 1) return 1;
return n*factorial(n-1);
}
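One way to sidestep the factorial entirely is to derive each term of the series from the previous one. The sketch below is a variation on the erf method above, with a hypothetical class name; it sums the same Maclaurin series but never forms n! or z^(2n+1) on their own, so nothing overflows, and it stops once further terms no longer change the sum.

```java
class ErfSeries {
    /** Maclaurin series of erf with each term derived from the previous one. */
    static double erf(double z) {
        double power = z; // holds (-1)^n * z^(2n+1) / n!
        double sum = z;   // the n = 0 term
        for (int n = 1; n < 500; n++) {
            power *= -z * z / n;
            double term = power / (2 * n + 1);
            sum += term;
            if (Math.abs(term) <= 1e-17 * Math.abs(sum)) {
                break;    // remaining terms are below double precision
            }
        }
        return 2.0 / Math.sqrt(Math.PI) * sum;
    }

    static double normalCdf(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }
}
```

With the factorial-based version, a large enough |z| makes both z^(2n+1) and n! infinite, and their ratio is NaN. The recurrence avoids that, although the alternating terms still cancel heavily for large |z|, so accuracy degrades gradually out in the tails.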
Other functions
For the inverse function, since we used the error function in the normalCDF method, we can use the inverse error function in a similar way. Again, we obtain the coefficients of the inverse error function analytically, then compute them as needed in the method.
public static double invErf(double z)
{
int nTerms = 315;
double runningSum = 0;
double[] a = new double[nTerms + 1];
double[] c = new double[nTerms + 1];
c[0]=1;
for(int n = 1; n < nTerms; n++){
double runningSum2=0;
for (int k = 0; k <= n-1; k++){
runningSum2 += c[k]*c[n-1-k]/((k+1)*(2*k+1));
}
c[n] = runningSum2;
runningSum2 = 0;
}
for(int n = 0; n < nTerms; n++){
a[n] = c[n]/(2*n+1);
runningSum += a[n]*Math.pow((0.5)*Math.sqrt(Math.PI)*z,2*n+1);
}
return runningSum;
}
public static double invNorm(double A){
return (2/Math.sqrt(2))*invErf(2*A-1);
}
I don't have a method for the lognormal function, but you could obtain one using the same idea.
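For the lognormal case, the same idea reduces to a change of variable: if ln(X) is normal with mean mu and standard deviation sigma, then the lognormal CDF at x is the standard normal CDF evaluated at (ln(x) - mu) / sigma. The sketch below illustrates this under assumptions: the class name is hypothetical, the normal CDF is the short Abramowitz-Stegun approximation rather than the series above, and sampling uses Random.nextGaussian() instead of an inverse CDF.

```java
import java.util.Random;

class LogNormalSketch {
    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** CDF of a lognormal variable whose logarithm is N(mu, sigma^2). */
    static double logNormalCdf(double x, double mu, double sigma) {
        if (x <= 0.0) {
            return 0.0; // a lognormal variable is always positive
        }
        return normalCdf((Math.log(x) - mu) / sigma);
    }

    /** Draws one lognormal sample by exponentiating a normal draw. */
    static double sample(double mu, double sigma, Random rand) {
        return Math.exp(mu + sigma * rand.nextGaussian());
    }
}
```

The inverse follows the same pattern: exponentiate mu + sigma * invNorm(p), with invNorm being whichever inverse normal CDF you use.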

I never tried it but the guys from algo team were using Colt and they were happy with the results.

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.
I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.
Any ideas?
--
here is the AS3 code for random correlated gaussians, to get the wheels turning:
public static function nextCorrelatedGaussians(r:Number):Array{
var d1:Number;
var d2:Number;
var n1:Number;
var n2:Number;
var lambda:Number;
var r:Number;
var arr:Array = new Array();
var isNeg:Boolean;
if (r<0){
r *= -1;
isNeg=true;
}
lambda= ( (r*r) - Math.sqrt( (r*r) - (r*r*r*r) ) ) / (( 2*r*r ) - 1 );
n1 = nextGaussian();
n2 = nextGaussian();
d1 = n1;
d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));
if (isNeg) {d2*= -1}
arr.push(d1);
arr.push(d2);
return arr;
}

I ended up writing a short paper on this
It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.

Here is an implementation of twolfe18's algorithm written in ActionScript 3:
for (var j:int=0; j < size; j++) {
xValues[j]=Math.random();
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;
for (var i:int=0; i < size; i++) {
yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}
boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev).
size is the size of the distribution.
Sample output
Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158
As you can see I'm still a ways off. Any suggestions?

This apparently simple question has been messing with my mind since yesterday evening! I looked into the topic of simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is that you can easily simulate 2 normals with a given correlation, and they outline a method to transform these non-independent normals, but this won't preserve the correlation. The transformed variables will still be correlated, so to speak, but not with an identical coefficient. See the paragraph "Rank correlation coefficients".
Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.
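A hedged Java sketch of the copula idea: generate two correlated standard normals, then push each through the normal CDF so that both become uniform on [0, 1]. It relies on a standard result for the bivariate normal: if the normals have correlation rho, the resulting uniforms have Pearson correlation (6/pi) * asin(rho/2), so the target r is obtained with rho = 2 * sin(pi * r / 6). The class and method names are hypothetical, and the normal CDF is an approximation, so the marginals are uniform only up to that approximation error.

```java
import java.util.Random;

class CorrelatedUniforms {
    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Returns n pairs {u, v}, each uniform on [0, 1], with expected Pearson correlation r. */
    static double[][] generate(int n, double r, Random rand) {
        double rho = 2.0 * Math.sin(Math.PI * r / 6.0);          // correlation needed between the normals
        double rest = Math.sqrt(Math.max(0.0, 1.0 - rho * rho)); // guard against rounding when |r| = 1
        double[][] pairs = new double[n][2];
        for (int i = 0; i < n; i++) {
            double z1 = rand.nextGaussian();
            double z2 = rho * z1 + rest * rand.nextGaussian();
            pairs[i][0] = normalCdf(z1);
            pairs[i][1] = normalCdf(z2);
        }
        return pairs;
    }

    /** Sample Pearson correlation of the pairs. */
    static double pearson(double[][] pairs) {
        double meanX = 0.0, meanY = 0.0;
        for (double[] p : pairs) { meanX += p[0]; meanY += p[1]; }
        meanX /= pairs.length;
        meanY /= pairs.length;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (double[] p : pairs) {
            sxy += (p[0] - meanX) * (p[1] - meanY);
            sxx += (p[0] - meanX) * (p[0] - meanX);
            syy += (p[1] - meanY) * (p[1] - meanY);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

As with any sampling approach, the correlation of a finite sample only approximates r; the match tightens as n grows. Negative r works without special handling because sin is an odd function.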

start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.
long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.
i'll try to come back and edit this post to include a closed form solution when i have access to some paper.
EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).
sample to get your x's
compute var(x)
the variance of E should be: (1/(rsd(x)))^2 - var(x)
generate your y's based on your x's and sample from your normal random variable E
for p < 0, set Y = -X + E. proceed accordingly.
basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x's (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.
SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)sd(y)). from that, i think the true value of var(E) should be (1/(rsd(x)))^2 - var(x). see if that works.
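Working the definition through for Y = X + E with E independent of X gives cov(X, Y) = var(X) and var(Y) = var(X) + var(E), so the expected Pearson is sd(X) / sqrt(var(X) + var(E)). Setting that equal to r and solving yields var(E) = var(X) * (1/r^2 - 1), which differs from the expression quoted above. The Java sketch below uses this derived value; the class and method names are hypothetical. Note that Y is then a uniform plus normal noise, so only X keeps a uniform distribution.

```java
import java.util.Random;

class NoisyUniform {
    /** Variance the noise E needs so that corr(X, X + E) has expected value r (0 < r <= 1). */
    static double noiseVariance(double varX, double r) {
        return varX * (1.0 / (r * r) - 1.0);
    }

    /** Returns y[i] = x[i] + e[i], with e normal and scaled for the target correlation r. */
    static double[] addNoise(double[] x, double r, Random rand) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double varX = 0.0;
        for (double v : x) varX += (v - mean) * (v - mean);
        varX /= x.length;                                  // sample variance of x
        double sdE = Math.sqrt(noiseVariance(varX, r));
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = x[i] + sdE * rand.nextGaussian();
        }
        return y;
    }
}
```

For uniform x on [0, 1] the variance is about 1/12, so a target of r = 0.8 needs a noise variance of roughly 0.047, far smaller than the 17.59 reported in the sample output earlier.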

To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])

A strange problem demands a strange solution -- here is how I solved it.
-Generate array X
-Clone array X to Create array Y
-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)
-Measure the starting level of pearson's R with array X sorted and array Y unsorted.
WHILE the correlation is outside of the range you are hoping for
    IF the correlation is too low
        run one iteration of CombSort11 on array Y then recheck correlation
    ELSE IF the correlation is too high
        randomly swap two values and recheck correlation
And that's it! Combsort is the real key: it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison's demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.
Here is my implementation in AS3:
public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
var yValues:Array = new Array;
var xValues:Array = new Array;
var coVar:Number = 0;
for (var e:int=0; e < size; e++) { //create x values
xValues.push(Math.random());
}
yValues = xValues.concat();
if(r != 1.0){
xValues.sort(Array.NUMERIC);
}
var trueR:Number = Util.getPearson(xValues, yValues);
while(Math.abs(trueR-r)>error){
if (trueR < r-error){ // combsort11 for y
var gap:int = yValues.length;
var swapped:Boolean = true;
while (trueR <= r-error) {
if (gap > 1) {
gap = Math.round(gap / 1.3);
}
var i:int = 0;
swapped = false;
while (i + gap < yValues.length && trueR <= r-error) {
if (yValues[i] > yValues[i + gap]) {
var t:Number = yValues[i];
yValues[i] = yValues[i + gap];
yValues[i + gap] = t;
trueR = Util.getPearson(xValues, yValues)
swapped = true;
}
i++;
}
}
}
else { // decorrelate
while (trueR >= r+error) {
var a:int = Random.randomUniformIntegerBetween(0, size-1);
var b:int = Random.randomUniformIntegerBetween(0, size-1);
var temp:Number = yValues[a];
yValues[a] = yValues[b];
yValues[b] = temp;
trueR = Util.getPearson(xValues, yValues)
}
}
}
var correlates:Array = new Array;
for (var h:int=0; h < size; h++) {
var pair:Array = new Array(xValues[h], yValues[h]);
correlates.push(pair);}
return correlates;
}
