how to iteratively subtract and compare values in java - java

there are n metric values like below
metric of feature clear 0.4054651081081644
metric of board various 0.6931471805599453
metric of design few 0.025975486403260736
metric of call end 0.13353139262452257
metric of bag other 0.1823215567939546
Now the highest value has to be taken (board various). Starting from that highest value, how do I subtract all the other values one by one, iteratively, and retrieve the values that are above 0.02?

List<Double> vals = <list of your values>
double max = Collections.max(vals);
vals.remove(max);
double result = vals.stream()
.filter(v -> v > .02)
.reduce(max, (v1, v2) -> v1 - v2);

Pseudo code (Python, actually):
metrics = [0.4054651081081644, 0.6931471805599453, ... 0.1823215567939546]
mx = max(metrics)
result = mx  # keep mx unchanged so the comparison below stays valid
for m in metrics:
    if m != mx and m > 0.02:
        result = result - m
The translation to Java is left as an exercise to the reader.
Caution: Comparing floating point numbers can be tricky.
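One possible Java translation of the loop above, as a hedged sketch rather than the answerer's own code. The class name SubtractFromMax and the method subtractFromMax are made up for the example. It skips the maximum by index instead of by value, which sidesteps the floating point equality comparison the caution refers to.

```java
import java.util.Collections;
import java.util.List;

public class SubtractFromMax {

    // Threshold taken from the question; values at or below it are skipped.
    static final double THRESHOLD = 0.02;

    /** Starts from the largest value and subtracts every other value above THRESHOLD. */
    static double subtractFromMax(List<Double> metrics) {
        double max = Collections.max(metrics);
        int maxIndex = metrics.indexOf(max);  // first occurrence of the maximum
        double result = max;
        for (int i = 0; i < metrics.size(); i++) {
            double m = metrics.get(i);
            // Skip the maximum by position, not by comparing doubles for equality.
            if (i != maxIndex && m > THRESHOLD) {
                result -= m;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The five metric values listed in the question.
        List<Double> metrics = List.of(
                0.4054651081081644, 0.6931471805599453, 0.025975486403260736,
                0.13353139262452257, 0.1823215567939546);
        System.out.println(subtractFromMax(metrics));
    }
}
```

With the question's five values every non-maximum entry is above 0.02, so all four are subtracted and the running result ends up slightly negative.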

Related

Java: Intersection of two ellipse segments transformed into 3d space

I have segments of lines and ellipses (NOT planes and ellipsoids) transformed into 3d space and need to calculate whether two given segments intersect and where.
I'm using Java but haven't yet found a library which solves my problem, nor came across some algorithms that I could use for my own implementation.
There are several ways to solve the line-line intersection test. The classic way uses linear algebra (i.e., solving a linear matrix system), but from a software development point of view I prefer the Geometric-Algebra way, in the form of Plucker Coordinates, which only requires implementing vector algebra operations (i.e., cross-product and dot-product). These are simpler to code than the matrix operations needed for solving linear systems.
I will show both for the sake of comparison, then you decide:
Linear Algebra Way
Given line segment P limited by points P1 and P2 and line segment Q limited by points Q1 and Q2.
The parametric form of the lines is given by:
P(t) = P1 + t (P2 - P1)
Q(t) = Q1 + t (Q2 - Q1)
Where t is a real number in the interval [0, 1].
If two lines intersect then the following equation holds:
P(t0) = Q(t1)
Provided that the two unknown numbers t0 and t1 exist. Expanding the above equation we get:
t0 (P2 - P1) - t1 (Q2 - Q1) = Q1 - P1
We can solve for t0 and t1 by expressing the above equation in matrix algebra:
A x = B
Where A is a 3x2 matrix with the coordinates of the vector (P2 - P1) in the first column and the coordinates of the negated vector -(Q2 - Q1) in the second column; x is a 2x1 column vector of the unknowns t0 and t1; and B is a 3x1 column vector with the coordinates of the vector (Q1 - P1).
Classically the system can be solved calculating the pseudo-inverse of matrix A, denoted by A^+:
A^+ = (A^T A)^-1 A^T
See:
https://en.m.wikipedia.org/wiki/Generalized_inverse
Fortunately any matrix package in Java should be able to compute the above calculations very easily and perhaps very efficiently too.
Because A is not square, x = A^+ B is in general only the least-squares solution. If substituting it back reproduces B, i.e., A (A^+ B) == B up to a small tolerance, then there IS a unique answer (intersection) and you can get it by computing the following product:
x = A^+ B
Of course if you cannot compute the pseudo-inverse in the first place, e.g., because (A^T A) is singular (i.e., its determinant is zero), then the lines are parallel and no unique intersection exists.
Since we are dealing with segments, the intersection point is at P(t0) or Q(t1) iff t0 and t1, the two components of x, are both in the interval [0, 1].
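A minimal Java sketch of the linear-algebra route, assuming points are plain double[3] arrays. Instead of forming A^+ explicitly it solves the 2x2 normal equations (A^T A) x = A^T B, which is the same computation for a full-rank 3x2 matrix. The class name, the helper methods and the eps tolerance are illustrative choices, not part of the original answer, and the second column of A is taken as -(Q2 - Q1) so that x = (t0, t1).

```java
public class SegmentIntersection3D {

    static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /**
     * Returns the intersection point of segments P1-P2 and Q1-Q2,
     * or null if they are parallel, skew, or meet outside the segments.
     */
    static double[] intersect(double[] p1, double[] p2, double[] q1, double[] q2, double eps) {
        double[] u = sub(p2, p1);  // first column of A: P2 - P1
        double[] v = sub(q1, q2);  // second column of A: -(Q2 - Q1)
        double[] b = sub(q1, p1);  // right-hand side B: Q1 - P1

        // Normal equations (A^T A) x = A^T B of the 3x2 system.
        double a11 = dot(u, u), a12 = dot(u, v), a22 = dot(v, v);
        double det = a11 * a22 - a12 * a12;
        if (Math.abs(det) < eps) {
            return null;  // A^T A is singular: the lines are parallel
        }
        double r1 = dot(u, b), r2 = dot(v, b);
        double t0 = (a22 * r1 - a12 * r2) / det;
        double t1 = (a11 * r2 - a12 * r1) / det;

        // The least-squares solution is a real intersection only if A x = B,
        // i.e. the points P(t0) and Q(t1) coincide.
        double[] onP = {p1[0] + t0 * u[0], p1[1] + t0 * u[1], p1[2] + t0 * u[2]};
        double[] onQ = {q1[0] - t1 * v[0], q1[1] - t1 * v[1], q1[2] - t1 * v[2]};
        double[] gap = sub(onP, onQ);
        if (dot(gap, gap) > eps * eps) {
            return null;  // skew lines
        }
        boolean insideBoth = t0 >= 0 && t0 <= 1 && t1 >= 0 && t1 <= 1;
        return insideBoth ? onP : null;
    }
}
```

A single eps is used for both the determinant and the residual here to keep the sketch short; real code would probably scale the tolerances with the segment lengths.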
OTHER METHOD OF SOLUTION
To avoid dealing with matrix algebra we can try to solve the system using vector algebra and substitution method:
t0 (P2 - P1) - t1 (Q2 - Q1) = Q1 - P1
t0 = a + t1 b
t1 = C • (Q1 - (1 - a) P1 - a P2) / |C|^2
Where:
a = (P2 - P1) • (Q1 - P1) / |P2 - P1|^2
b = (P2 - P1) • (Q2 - Q1) / |P2 - P1|^2
C = b (P2 - P1) - (Q2 - Q1)
I cannot provide a geometric intuition of the above results yet.
Plucker Coordinates way
Given line segment P limited by points P1 and P2 and line segment Q limited by points Q1 and Q2.
The Plucker Coordinates of line P is given by a pair of 3d vectors (Pd, Pm):
Pd = P2 - P1
Pm = P1 x P2
Where Pm is the cross-product of P1 and P2. The Plucker Coordinates (Qd, Qm) of line Q can be calculated in exactly the same way.
The lines P and Q can only intersect if they are coplanar. The lines P and Q are coplanar iff:
Pd • Qm + Qd • Pm = 0
Where (•) is the dot-product. Since machines have finite precision, a robust test should check not for zero but for a small number. If Pd x Qd = 0 then the lines are parallel (here 0 is the zero vector). Again, a robust test should check, for instance, that the squared length of (Pd x Qd) is small.
If the lines are coplanar and not parallel then their intersection (called "meet" in Plucker's jargon) will be the point:
x = ((Qm • N) Pd - (Pm • N) Qd + (Pm • Qd) N) / ((Pd x Qd) • N)
Where N is any coordinate axis vector (i.e., (1,0,0) or (0,1,0) or (0,0,1)), such that (Pd x Qd) • N is non-zero.
If the plane containing P and Q does not pass through the origin, then Qd • Pm is non-zero and the following simpler formula will work:
x = (Pm x Qm) / (Qd • Pm)
For an introduction to Plucker Coordinates see:
https://en.m.wikipedia.org/wiki/Plücker_coordinates
http://www.realtimerendering.com/resources/RTNews/html/rtnv11n1.html#art3
For a general intersection formula see "Corollary 6" of:
http://web.cs.iastate.edu/~cs577/handouts/plucker-coordinates.pdf
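The Plucker-coordinate test can be sketched in Java as follows, again assuming double[3] points and using the convention Pd = P2 - P1, Pm = P1 x P2 from the text. The meet is computed by expanding x in the basis Pd, Qd, N, which for this convention gives x = ((Qm • N) Pd - (Pm • N) Qd + (Pm • Qd) N) / ((Pd x Qd) • N). The class name and the eps tolerance are illustrative assumptions.

```java
public class PluckerLines {

    static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
    }

    /** Meeting point of the infinite lines through (p1, p2) and (q1, q2); null if skew or parallel. */
    static double[] meet(double[] p1, double[] p2, double[] q1, double[] q2, double eps) {
        double[] pd = sub(p2, p1), pm = cross(p1, p2);  // Plucker coordinates of P
        double[] qd = sub(q2, q1), qm = cross(q1, q2);  // Plucker coordinates of Q

        // Coplanarity: Pd . Qm + Qd . Pm must vanish (up to eps).
        if (Math.abs(dot(pd, qm) + dot(qd, pm)) > eps) {
            return null;
        }
        // Parallel lines: the squared length of Pd x Qd is (nearly) zero.
        double[] n = cross(pd, qd);
        if (dot(n, n) < eps * eps) {
            return null;
        }
        // Pick the coordinate axis N for which |(Pd x Qd) . N| is largest.
        int axis = 0;
        for (int i = 1; i < 3; i++) {
            if (Math.abs(n[i]) > Math.abs(n[axis])) {
                axis = i;
            }
        }
        double denom = n[axis];       // (Pd x Qd) . N
        double pmN = pm[axis];        // Pm . N
        double qmN = qm[axis];        // Qm . N
        double pmQd = dot(pm, qd);    // Pm . Qd
        double[] x = new double[3];
        for (int i = 0; i < 3; i++) {
            double nI = (i == axis) ? 1.0 : 0.0;  // i-th component of N
            x[i] = (qmN * pd[i] - pmN * qd[i] + pmQd * nI) / denom;
        }
        return x;
    }
}
```

The result is a point on the infinite lines; for segments one would still have to check that it lies between the endpoints of both P and Q.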
Transforming Ellipses to Circles Forth and Back
We can always transform an ellipse into a circle. An ellipse has two "radii", called semi-axes, which you can visualize as two orthogonal vectors: a long one called the major semi-axis and a short one called the minor semi-axis. You can apply a non-uniform scaling transformation to the semi-axis vectors to make them the same length, so you get a circle.
We define an ellipse "E" by its center O, which is a 3d point, and its two semi-axes A1 and A2, which are 3d vectors. A normal vector N to the ellipse's plane can be computed as the cross product of its semi-axes, N = A1 x A2, normalized to unit length.
For now suppose there is a linear function M that you can use to transform (scale) your ellipse E into a circle C, with radius equal to the minor semi-axis, by applying it to your ellipse's semi-axes A1 and A2 and to the ellipse's center O.
Notice that the ellipse's center O and the ellipse's normal vector N are not changed by M. So M(N) = N and M(O) = O. That means that the circle is in the same plane and has the same center O as the ellipse. The linear function M has a corresponding inverse function M^-1, so we can transform the vectors of the circle back to get the original ellipse E.
Of course we can transform the endpoints of lines P and Q also using function M for sending them to the "circle space" and we can send them back to "ellipse space" using M^-1.
Using M we can compute the intersection of the lines P and Q with the ellipse E in the circle space. So now we can focus on the line-circle intersection.
Line-Plane Intersection
Given a plane with normal vector N and distance D such that:
N • x + D = 0
For every point x in the plane. Then the intersection with line P with Plucker Coordinates (Pd, Pm) is given by:
x = (N x Pm - D Pd) / (N • Pd)
This works only if the line P is not parallel to the plane (and in particular does not lie in it), i.e.:
N • Pd != 0
And for our ellipse E we have:
N = (A1 x A2)/|A1 x A2|
D = -N • O
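As a hedged Java sketch, the line-plane formula combined with the ellipse's N and D could look like this. It assumes double[3] vectors and the convention Pm = P1 x P2; the class name and eps are illustrative. The method returns null when N • Pd is (nearly) zero, because the division is undefined for a line parallel to the plane.

```java
public class EllipsePlaneIntersection {

    static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
    }

    /**
     * Point where the infinite line through p1 and p2 crosses the plane of the
     * ellipse with center o and semi-axes a1, a2; null if the line is parallel.
     */
    static double[] intersect(double[] p1, double[] p2,
                              double[] o, double[] a1, double[] a2, double eps) {
        double[] n = cross(a1, a2);
        double len = Math.sqrt(dot(n, n));
        n = new double[] {n[0] / len, n[1] / len, n[2] / len};  // N = (A1 x A2)/|A1 x A2|
        double d = -dot(n, o);                                   // D = -N . O

        double[] pd = sub(p2, p1);    // Pd = P2 - P1
        double[] pm = cross(p1, p2);  // Pm = P1 x P2
        double denom = dot(n, pd);    // N . Pd
        if (Math.abs(denom) < eps) {
            return null;              // line parallel to (or lying in) the plane
        }
        double[] nxm = cross(n, pm);  // N x Pm
        return new double[] {
            (nxm[0] - d * pd[0]) / denom,
            (nxm[1] - d * pd[1]) / denom,
            (nxm[2] - d * pd[2]) / denom};
    }
}
```

The returned point lies in the ellipse's plane; whether it is inside the ellipse is a separate check.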
Line-Circle and Point-Circle Intersection
Now having x, the point-in-circle check is easy:
|O - x| <= |A2|
Equality holds only when x is in circle boundary.
If line P is in the circle's plane then the following simple check will give you the answer:
https://stackoverflow.com/a/1079478/9147444
How to Compute the Linear Function M
A simple way to compute M is the following. Use the ellipse's normal vector N and semi-axes A1 and A2 to create a 3x3 matrix U, such that U has the vectors A1, A2 and N as columns.
Then scale the major semi-axis A1 to have the same length as the minor semi-axis A2, and create the matrix V such that V has the new vector A1, then A2 and N, as columns.
Then we define the linear matrix system:
R U = V
Where R is a 3x3 (non-uniform-)scaling-rotation matrix.
We can solve for R by multiplying both sides of the equation by the inverse of U which is denoted by U^-1
R U U^-1 = V U^-1
Since U U^-1 is the identity matrix we get:
R = V U^-1
Using R we define the affine transformation
M(x) = R (x - O) + O
We can use M to transform points to circle space, such as O, P1, P2, Q1 and Q2. But if we need to transform vectors, such as A1, A2, N, Pd and Qd, we need to use the simpler M:
M(x) = R x
Since A1, A2 and N are eigenvectors of R then R is not singular and has an inverse. We define the inverse M as:
M^-1(x) = R^-1 (x - O) + O
And for vectors:
M^-1(x) = R^-1 x
Update: Ellipse-Ellipse intersection
Two intersecting non-coplanar 3d-ellipses have their intersection points on the line formed by the intersection of their planes. So you first find the line formed by the intersection of the planes containing the ellipses (if the planes do not intersect, i.e., they are parallel, then the ellipses do not intersect either).
The line of intersection belongs to both planes, so it is perpendicular to both normals. The direction vector V is the cross product of the plane normals:
V = N1 × N2
To fully determine the line we also need to find a point on the line. We can do that solving the linear equations of the planes. Given the 2x3 matrix N = [N1^T N2^T] with the normals N1 and N2 as rows, and the 2x1 column vector b = [N1 • C1, N2 • C2], where C1 and C2 are the centers of the ellipses, we can form the linear matrix system:
N X = b
Where X is some point satisfying both plane equations. The system is underdetermined since there are infinitely many points on the line satisfying the system. We can still find a particular solution, the one closest to the origin, by using the pseudo-inverse of matrix N, denoted by N^+.
X = N^+ b
The line equation is
L(t) = X + t V
For some scalar t.
Then you can apply the method described earlier to test the line-ellipse intersection i.e., scaling the ellipse to a circle and intersect with the coplanar line. If both ellipses intersect the line in the same points then they intersect.
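The plane-plane step can be sketched in Java without a matrix library. For a 2x3 matrix N with independent rows, N^+ = N^T (N N^T)^-1, so X = N^+ b is a combination k1 N1 + k2 N2 whose coefficients come from a 2x2 system. The class name, the argument layout and eps are assumptions made for the example.

```java
public class PlaneIntersection {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
    }

    /**
     * Intersection line of the planes with normals n1, n2 through the ellipse
     * centers c1, c2. Returns {X, V}, where X is the point of the line closest
     * to the origin and V its direction, or null if the planes are parallel.
     */
    static double[][] intersect(double[] n1, double[] c1, double[] n2, double[] c2, double eps) {
        double[] v = cross(n1, n2);  // V = N1 x N2
        if (dot(v, v) < eps * eps) {
            return null;             // parallel planes
        }
        double b1 = dot(n1, c1);     // b = [N1 . C1, N2 . C2]
        double b2 = dot(n2, c2);

        // Solve (N N^T) k = b, a symmetric 2x2 system, then X = N^T k.
        double g11 = dot(n1, n1), g12 = dot(n1, n2), g22 = dot(n2, n2);
        double det = g11 * g22 - g12 * g12;  // equals |N1 x N2|^2, non-zero here
        double k1 = (g22 * b1 - g12 * b2) / det;
        double k2 = (g11 * b2 - g12 * b1) / det;
        double[] x = {
            k1 * n1[0] + k2 * n2[0],
            k1 * n1[1] + k2 * n2[1],
            k1 * n1[2] + k2 * n2[2]};
        return new double[][] {x, v};
    }
}
```

Points on the line are then L(t) = X + t V, ready for the line-ellipse test against each ellipse.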
Now, the case in which the ellipses actually lie on the same plane is more complex. You can check the approach taken by Dr Eberly in his excellent book "Geometric Tools" (also available online):
https://www.geometrictools.com/Documentation/IntersectionOfEllipses.pdf
And also you can check the C++ source code which is open source:
https://www.geometrictools.com/GTEngine/Include/Mathematics/GteIntrEllipse2Ellipse2.h

Using random function in selecting an object if two same distance values

I have an ArrayList unsolvedOutlets containing object Outlet that has attributes longitude and latitude.
Using the longitude and latitude of Outlet objects in ArrayList unsolvedOutlets, I need to find the smallest distance in that list using the distance formula : SQRT(((X2 - X1)^2)+(Y2-Y1)^2), wherein (X1, Y1) are given. I use Collections.min(list) in finding the smallest distance.
My problem is if there are two or more values with the same smallest distance, I'd have to randomly select one from them.
Code:
ArrayList<Double> distances = new ArrayList<Double>();
Double smallestDistance = 0.0;
for (int i = 0; i < unsolvedOutlets.size(); i++) {
    distances.add(Math.sqrt(
        (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())*
        (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())+
        (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())*
        (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())));
    distances.add(0.0); //added this to test
    distances.add(0.0); //added this to test
    smallestDistance = Collections.min(distances);
    System.out.println(smallestDistance);
}
The console prints 0.0 but it won't stop. Is there a way to know whether there are multiple values with the same smallest value? Then I'd incorporate the Random function. Did that make sense? lol but if anyone has the logic for that, it would be really helpful!!
Thank you!
Keep track of the indices with min distance in your loop and after the loop choose one at random:
Random random = ...
...
List<Integer> minDistanceIndices = new ArrayList<>();
// start at +infinity so that the first real distance always becomes the minimum
double smallestDistance = Double.POSITIVE_INFINITY;
for (int i = 0; i < unsolvedOutlets.size(); i++) {
    double newDistance = Math.sqrt(
        (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())*
        (unsolvedOutlets.get(i).getLatitude() - currSolved.getLatitude())+
        (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude())*
        (unsolvedOutlets.get(i).getLongitude() - currSolved.getLongitude()));
    distances.add(newDistance);
    if (newDistance < smallestDistance) {
        // strictly closer: forget the previous candidates
        minDistanceIndices.clear();
        minDistanceIndices.add(i);
        smallestDistance = newDistance;
    } else if (newDistance == smallestDistance) {
        // tie: remember this index as well
        minDistanceIndices.add(i);
    }
}
if (!unsolvedOutlets.isEmpty()) {
    // pick one of the tied indices at random
    int index = minDistanceIndices.get(random.nextInt(minDistanceIndices.size()));
    Object chosenOutlet = unsolvedOutlets.get(index);
    System.out.println("chosen outlet: "+ chosenOutlet);
}
As Jon Skeet mentioned you don't need to take the square root to compare the distances.
Also if you want to use distances on a sphere your formula is wrong:
With your formula you'll get the same distance for (0° N, 180° E) to (0° N, 0° E) as for (90° N, 180° E) to (90° N, 0° E), but while you need to travel around half the earth to travel from the first to the second, the last 2 coordinates both denote the north pole.
Note: I believe fabian's solution is superior to this, but I've kept it around to demonstrate that there are many different ways of implementing this...
I would probably:
Create a new type which contained the distance from the outlet as well as the outlet (or just the square of the distance), or use a generic Pair type for the same purpose
Map (using Stream.map) the original list to a list of these pairs
Order by the distance or square-of-distance
Look through the sorted list until you find a distance which isn't the same as the first one in the list
You then know how many - and which - outlets have the same distance.
Another option would be to simply shuffle the original collection, then sort the result by distance, then take the first element - that way even if multiple of them do have the same distance, you'll be taking a random one of those.
JB Nizet's option of "find the minimum, then perform a second scan to find all those with that distance" would be fine too - and quite possibly simpler :) Lots of options...
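The shuffle-first idea can be sketched with the standard collections API. This is only an illustration: the Outlet class below is a hypothetical stand-in for the question's class, pickNearest is a made-up name, and a plain minimum search replaces the full sort since only the first element is needed. Squared distances are compared, so no square root is taken.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class NearestOutlet {

    /** Hypothetical stand-in for the question's Outlet class. */
    static final class Outlet {
        final String name;
        final double latitude;
        final double longitude;

        Outlet(String name, double latitude, double longitude) {
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Squared planar distance; enough for comparing distances. */
    static double squaredDistance(Outlet a, Outlet b) {
        double dLat = a.latitude - b.latitude;
        double dLon = a.longitude - b.longitude;
        return dLat * dLat + dLon * dLon;
    }

    /**
     * Returns an outlet with minimal distance to 'from'. The copy is shuffled
     * first, so which of several tied outlets comes out depends on the shuffle.
     * Throws NoSuchElementException if the list is empty.
     */
    static Outlet pickNearest(List<Outlet> outlets, Outlet from, Random random) {
        List<Outlet> shuffled = new ArrayList<>(outlets);  // leave the caller's list untouched
        Collections.shuffle(shuffled, random);
        return Collections.min(shuffled,
                Comparator.comparingDouble((Outlet o) -> squaredDistance(o, from)));
    }

    public static void main(String[] args) {
        List<Outlet> outlets = List.of(
                new Outlet("A", 1, 1), new Outlet("B", 1, -1), new Outlet("C", 5, 5));
        Outlet current = new Outlet("current", 0, 0);
        // A and B are equally close to the current position, so either may be printed.
        System.out.println(pickNearest(outlets, current, new Random()).name);
    }
}
```

Ties are detected with exact double equality here, as in the index-tracking answer; distances that differ only by rounding error would not count as tied.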

Poisson Distribution in Java (correctness?)

I have to generate data for a Poisson distribution. My range is n = 1000 up to 100K. Where n is the number of data elements; k varies from 1 to n. It says to use lambda as n/2
I have never taken stats and have no idea how to get the correct curve here. I can feed it lambda as n/2, but do I vary K from 0-n? I tried this (passing k in as a parameter) and when I graphed the data it ramped up, not a fish tail. What am I doing wrong, or am I doing it correctly?
Thanks
I have this code in java from Knuth.
static double poissonRandomNumber(int lambda) {
    double L = Math.exp(-lambda);
    int k = 0;
    double p = 1;
    do {
        k = k + 1;
        double u = Math.random();
        p = p * u;
    } while (p > L);
    return k - 1;
}
One of the problems you are running into is a basic limitation of how computers represent and perform calculations with floating point numbers.
A real number is represented on a computer in a form similar to scientific notation:
Significant digits × base^exponent
For double precision numbers, there are 11 bits used for the exponent and 52 for the "significant digits" portion. The smallest positive normalized double is about 2.2 × 10^-308, and the first positive floating point number > 0.0, which is subnormal, has a value of about 4.9 × 10^-324 (this is defined as Double.MIN_VALUE in Java). See IEEE Standard 754 Floating Point Numbers for a good writeup on this.
Consider the line of code:
double L = Math.exp(-lambda);
With a lambda of 1000, e^-1000 (which is about 10^-435) is less than Double.MIN_VALUE, so it underflows to 0.0, and there is no way the computer can represent e^-1000 any differently than it can represent e^-100000.
You can solve this problem by noticing that lambda is an "arrival rate", and you can calculate random samples for shorter intervals and sum them. That is
x = p(L);
can be computed as
x = p(L/2) + p(L/2);
and larger numbers can be approximated:
x = 100 * p(L/100);
The Wikipedia article on the Poisson distribution has some good pointers to ways to compute Poisson distributions for large values of lambda.
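The splitting idea can be sketched in Java like this. A sum of independent Poisson samples is itself Poisson with the summed rate, so lambda is cut into chunks small enough for exp(-chunk) to stay representable. The class name, the chunk size of 500 and the use of java.util.Random are assumptions made for the example; the inner loop is Knuth's method from the question.

```java
import java.util.Random;

public class PoissonSampler {

    // Largest rate handed to Knuth's method in one call; exp(-500) is roughly
    // 7e-218, which is still a normal (non-underflowing) double.
    static final double MAX_CHUNK = 500.0;

    /** Knuth's multiplication method; only usable while exp(-lambda) does not underflow. */
    static int knuth(double lambda, Random random) {
        double limit = Math.exp(-lambda);
        int k = 0;
        double p = 1.0;
        do {
            k++;
            p *= random.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    /** Samples Poisson(lambda) as a sum of independent draws with rate at most MAX_CHUNK. */
    static int sample(double lambda, Random random) {
        int total = 0;
        double remaining = lambda;
        while (remaining > MAX_CHUNK) {
            total += knuth(MAX_CHUNK, random);
            remaining -= MAX_CHUNK;
        }
        return total + knuth(remaining, random);  // last, possibly smaller, chunk
    }

    public static void main(String[] args) {
        Random random = new Random();
        // One sample for lambda = n/2 with n = 100000; values cluster around 50000.
        System.out.println(sample(50000, random));
    }
}
```

The running time grows linearly with lambda, which is acceptable for lambda up to 50000 but slow for much larger rates.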

java hashcode for floating point numbers

I want to use Double (or Float) as keys in a Hashmap
Map<Double, String> map = new HashMap<Double, String>();
map.put(1.0, "one");
System.out.println(map.containsKey(Math.tan(Math.PI / 4)));
and this returns false.
if I were comparing these two numbers I would have done something like this
final double EPSILON = 1e-6;
Math.abs(1.0 - Math.tan(Math.PI / 4)) < EPSILON
But since Hashmap would use hashcode it breaks things for me.
I thought to implement a roundKey function that rounds to some multiple of EPSILON before using it as a key
map.put(roundKey(1.0), "one");
map.containsKey(roundKey(Math.tan(Math.PI / 4)));
is there a better way ?
what is the right way to implement this roundKey
If you know what rounding is appropriate, you can use that. e.g. if you need to round to cents, you can round to two decimal places.
However, for the example above discrete rounding to a fixed precision might not be appropriate. e.g. if you round to 6 decimal places, 1.4999e-6 and 1.5001e-6 will not match as one rounds up and the other down even though the difference is << 1e-6.
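A possible roundKey, sketched under the assumption that keys are bucketed to the nearest multiple of EPSILON and stored as long values. The class name and the choice of Math.round are illustrative, not something the thread specifies. The last statement in main reproduces the boundary problem described above.

```java
import java.util.HashMap;
import java.util.Map;

public class RoundedKeys {

    static final double EPSILON = 1e-6;

    /** Maps a double to the index of the nearest multiple of EPSILON. */
    static long roundKey(double value) {
        return Math.round(value / EPSILON);
    }

    public static void main(String[] args) {
        Map<Long, String> map = new HashMap<>();
        map.put(roundKey(1.0), "one");

        // tan(pi/4) differs from 1.0 by far less than EPSILON, so both share a bucket.
        System.out.println(map.containsKey(roundKey(Math.tan(Math.PI / 4))));

        // These two values are only 2e-10 apart but sit on opposite sides of a
        // bucket boundary, so their keys differ.
        System.out.println(roundKey(1.4999e-6) == roundKey(1.5001e-6));
    }
}
```

Long keys hash and compare exactly, which is what makes them safe in a HashMap; the price is the boundary effect shown by the second check.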
In that situation the closest you can do is to use a NavigableMap
NavigableMap<Double, String> map = new TreeMap<>();
double x = ....;
double error = 1e-6;
// inclusive bounds; this four-argument overload returns a NavigableMap view
NavigableMap<Double, String> map2 = map.subMap(x - error, true, x + error, true);
or you can use
// ceilingEntry/floorEntry also return an entry whose key equals x exactly
Map.Entry<Double, String> higher = map.ceilingEntry(x);
Map.Entry<Double, String> lower = map.floorEntry(x);
Map.Entry<Double, String> entry = null;
if (higher == null)
    entry = lower;
else if (lower == null)
    entry = higher;
else if (Math.abs(lower.getKey() - x) < Math.abs(higher.getKey() - x))
    entry = lower;
else
    entry = higher;
// entry is the closest match.
if (entry != null && Math.abs(entry.getKey() - x) < error) {
    // found the closest entry within the error
}
This will find all the entries within a continuous range.
Best way is to not use floating point numbers as keys, as they are (as you discovered) not going to compare.
Kludgy "solutions" like calling them identical if they're within a certain range of each other only lead to problems later, as you're either going to have to stretch the filter or make it more strict in time, both leading to potential problems with existing code, and/or people will forget how things were supposed to work.
Of course in some applications you want to do that, but as a key for looking up something? No. You're probably better off using angles in degrees, and as integers, as the keys here. If you need greater precision than 1 degree, use the angle in e.g. tenth of degrees by storing a number of 0 through 3600.
That will give you reliable behaviour of your Map while retaining the data you're planning to store.

Generating correlated numbers

Here is a fun one: I need to generate random x/y pairs that are correlated at a given value of Pearson product moment correlation coefficient, or Pearson r. You can imagine this as two arrays, array X and array Y, where the values of array X and array Y must be re-generated, re-ordered or transformed until they are correlated with each other at a given level of Pearson r. Here is the kicker: Array X and Array Y must be uniform distributions.
I can do this with a normal distribution, but transforming the values without skewing the distribution has me stumped. I tried re-ordering the values in the arrays to increase the correlation, but I will never get arrays correlated at 1.00 or -1.00 just by sorting.
Any ideas?
--
here is the AS3 code for random correlated gaussians, to get the wheels turning:
public static function nextCorrelatedGaussians(r:Number):Array{
    // r is the function parameter; it must not be declared again as a local variable
    var d1:Number;
    var d2:Number;
    var n1:Number;
    var n2:Number;
    var lambda:Number;
    var arr:Array = new Array();
    var isNeg:Boolean;
    if (r<0){
        r *= -1;
        isNeg=true;
    }
    lambda= ( (r*r) - Math.sqrt( (r*r) - (r*r*r*r) ) ) / (( 2*r*r ) - 1 );
    n1 = nextGaussian();
    n2 = nextGaussian();
    d1 = n1;
    d2 = ((lambda*n1) + ((1-lambda)*n2)) / Math.sqrt( (lambda*lambda) + (1-lambda)*(1-lambda));
    if (isNeg) {d2*= -1}
    arr.push(d1);
    arr.push(d2);
    return arr;
}
I ended up writing a short paper on this
It doesn't include your sorting method (although in practice I think it's similar to my first method, in a roundabout way), but does describe two ways that don't require iteration.
Here is an implementation of twolfe18's algorithm written in Actionscript 3:
for (var j:int=0; j < size; j++) {
    xValues[j]=Math.random();
}
var varX:Number = Util.variance(xValues);
var varianceE:Number = 1/(r*varX) - varX;
for (var i:int=0; i < size; i++) {
    yValues[i] = xValues[i] + boxMuller(0, Math.sqrt(varianceE));
}
boxMuller is just a method that generates a random Gaussian with the arguments (mean, stdDev).
size is the size of the distribution.
Sample output
Target p: 0.8
Generated p: 0.04846346291280387
variance of x distribution: 0.0707786253165176
varianceE: 17.589920412141158
As you can see I'm still a ways off. Any suggestions?
This apparently simple question has been messing with my mind since yesterday evening! I looked into the topic of simulating distributions with a dependency, and the best I found is this: simulate dependent random variables. The gist of it is, you can easily simulate 2 normals with a given correlation, and they outline a method to transform these non-independent normals, but this won't preserve the correlation. The transformed variables will still be correlated, so to speak, but not with an identical coefficient. See the paragraph "Rank correlation coefficients".
Edit: from what I gather from the second part of the article, the copula method would allow you to simulate / generate random variables with rank correlation.
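One way to realise the copula idea in Java, assuming a Gaussian copula: draw two standard normals with correlation rho, then push each through the normal CDF, which makes both marginals uniform on [0, 1]. For this construction the Pearson correlation of the resulting uniforms is (6/pi) asin(rho/2), so choosing rho = 2 sin(pi r / 6) targets an expected Pearson r. The class and method names are made up for the sketch, and the CDF uses the Abramowitz-Stegun 7.1.26 polynomial for erf because the Java standard library has no erf.

```java
import java.util.Random;

public class CorrelatedUniforms {

    /** Standard normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation (error about 1e-7). */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Returns 'size' pairs {x, y}, each marginally uniform on [0, 1], with expected Pearson r. */
    static double[][] generate(double r, int size, Random random) {
        // Correlation the underlying normals need so that the uniforms end up at r.
        double rho = 2.0 * Math.sin(Math.PI * r / 6.0);
        double[][] pairs = new double[size][2];
        for (int i = 0; i < size; i++) {
            double n1 = random.nextGaussian();
            double n2 = rho * n1 + Math.sqrt(1.0 - rho * rho) * random.nextGaussian();
            pairs[i][0] = normalCdf(n1);
            pairs[i][1] = normalCdf(n2);
        }
        return pairs;
    }

    /** Sample Pearson correlation of the pairs. */
    static double pearson(double[][] pairs) {
        double meanX = 0, meanY = 0;
        for (double[] p : pairs) {
            meanX += p[0];
            meanY += p[1];
        }
        meanX /= pairs.length;
        meanY /= pairs.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (double[] p : pairs) {
            double dx = p[0] - meanX, dy = p[1] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

As with any sampling approach, only the expected correlation equals r; an individual sample's Pearson value scatters around it, less so for larger sizes.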
start with the model y = x + e where e is the error (a normal random variable). e should have a mean of 0 and variance k.
long story short, you can write a formula for the expected value of the Pearson in terms of k, and solve for k. note, you cannot randomly generate data with the Pearson exactly equal to a specific value, only with the expected Pearson of a specific value.
i'll try to come back and edit this post to include a closed form solution when i have access to some paper.
EDIT: ok, i have a hand-wavy solution that is probably correct (but will require testing to confirm). for now, assume desired Pearson = p > 0 (you can figure out the p < 0 case). like i mentioned earlier, set your model for Y = X + E (X is uniform, E is normal).
sample to get your x's
compute var(x)
the variance of E should be: var(x) * (1/r^2 - 1)
generate your y's based on your x's and sample from your normal random variable E
for p < 0, set Y = -X + E. proceed accordingly.
basically, this follows from the definition of Pearson: cov(x,y)/var(x)*var(y). when you add noise to the x's (Y = X + E), the expected covariance cov(x,y) should not change from that with no noise. the var(x) does not change. the var(y) is the sum of var(x) and var(e), hence my solution.
SECOND EDIT: ok, i need to read definitions better. the definition of Pearson is cov(x, y)/(sd(x)sd(y)). with cov(x, y) = var(x) and var(y) = var(x) + var(E) this becomes r = sd(x)/sqrt(var(x) + var(E)), so i think the true value of var(E) should be var(x) * (1/r^2 - 1), where r is the desired Pearson. see if that works.
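A small Java sketch of the Y = X + E model, under the assumption that the noise variance comes from solving r = sd(x)/sqrt(var(x) + var(E)), which gives var(E) = var(x) (1/r^2 - 1). It assumes 0 < r <= 1, and the class and method names are hypothetical. Only X stays uniform in this model; Y is a uniform plus Gaussian noise.

```java
import java.util.Random;

public class NoisyLinearModel {

    /** Population variance of the values. */
    static double variance(double[] v) {
        double mean = 0;
        for (double d : v) {
            mean += d;
        }
        mean /= v.length;
        double sum = 0;
        for (double d : v) {
            sum += (d - mean) * (d - mean);
        }
        return sum / v.length;
    }

    /** Sample Pearson correlation of two equally long arrays. */
    static double pearson(double[] x, double[] y) {
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Returns {x, y} with x uniform on [0, 1) and y = x + e, for a target 0 < r <= 1. */
    static double[][] generate(double r, int size, Random random) {
        double[] x = new double[size];
        for (int i = 0; i < size; i++) {
            x[i] = random.nextDouble();
        }
        double varX = variance(x);
        double sdE = Math.sqrt(varX * (1.0 / (r * r) - 1.0));  // sd of the noise E
        double[] y = new double[size];
        for (int i = 0; i < size; i++) {
            y[i] = x[i] + sdE * random.nextGaussian();
        }
        return new double[][] {x, y};
    }
}
```

For r = 0.8 and var(x) near 1/12 the noise variance comes out around 0.047, far smaller than the 17.59 reported in the sample output.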
To get a correlation of 1 both X and Y should be the same, so copy X to Y and you have a correlation of 1. To get a -1 correlation, make Y = 1 - X. (assuming X values are [0,1])
A strange problem demands a strange solution -- here is how I solved it.
-Generate array X
-Clone array X to Create array Y
-Sort array X (you can use whatever method you want to sort array X -- quicksort, heapsort anything stable.)
-Measure the starting level of pearson's R with array X sorted and array Y unsorted.
WHILE the correlation is outside of the range you are hoping for
    IF the correlation is too low
        run one iteration of CombSort11 on array Y then recheck correlation
    ELSE IF the correlation is too high
        randomly swap two values and recheck correlation
And that's it! Combsort is the real key: it has the effect of increasing the correlation slowly and steadily. Check out Jason Harrison's demo to see what I mean. To get a negative correlation you can invert the sort or invert one of the arrays after the whole process is complete.
Here is my implementation in AS3:
public static function nextReliableCorrelatedUniforms(r:Number, size:int, error:Number):Array {
    var yValues:Array = new Array;
    var xValues:Array = new Array;
    var coVar:Number = 0;
    for (var e:int=0; e < size; e++) { //create x values
        xValues.push(Math.random());
    }
    yValues = xValues.concat();
    if(r != 1.0){
        xValues.sort(Array.NUMERIC);
    }
    var trueR:Number = Util.getPearson(xValues, yValues);
    while(Math.abs(trueR-r)>error){
        if (trueR < r-error){ // combsort11 for y
            var gap:int = yValues.length;
            var swapped:Boolean = true;
            while (trueR <= r-error) {
                if (gap > 1) {
                    gap = Math.round(gap / 1.3);
                }
                var i:int = 0;
                swapped = false;
                while (i + gap < yValues.length && trueR <= r-error) {
                    if (yValues[i] > yValues[i + gap]) {
                        var t:Number = yValues[i];
                        yValues[i] = yValues[i + gap];
                        yValues[i + gap] = t;
                        trueR = Util.getPearson(xValues, yValues)
                        swapped = true;
                    }
                    i++;
                }
            }
        }
        else { // decorrelate
            while (trueR >= r+error) {
                var a:int = Random.randomUniformIntegerBetween(0, size-1);
                var b:int = Random.randomUniformIntegerBetween(0, size-1);
                var temp:Number = yValues[a];
                yValues[a] = yValues[b];
                yValues[b] = temp;
                trueR = Util.getPearson(xValues, yValues)
            }
        }
    }
    var correlates:Array = new Array;
    for (var h:int=0; h < size; h++) {
        var pair:Array = new Array(xValues[h], yValues[h]);
        correlates.push(pair);
    }
    return correlates;
}
