Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I have a Rectangle class. It has a length, breadth and area (all ints). I want to hash it such that every rectangle with the same length and breadth hashes to the same value. What is a way to do this?
EDIT: I understand its a broad question. Thats why I asked for "a" way to do it. Not the best way.
A good and simple scheme is to calculate the hash for a pair of integers as follows:
hash = length * CONSTANT + width
Empirically, you will get best results (i.e. the fewest number collisions) if CONSTANT is a prime number. A lot of people1 recommend a value like 31, but the best choice depends on the most likely range of the length and width value. If they are strictly bounded, and small enough, then you could do better than 31.
However, 31 is probably good enough for practical purposes2. A few collisions at this level is unlikely to make a significant performance difference, and even a perfect hashing function does not eliminate collisions at the hash table level ... where you use the modulus of the hash value.
1 - I'm not sure where this number comes from, or whether there are empirical studies to back it up ... in the general case. I suspect it comes from hashing of (ASCII) strings. But 31 is prime ... and it is a Mersenne prime (2^7 - 1) which means it could be computed using a shift and a subtraction if hardware multiple is slow.
2 - I'm excluding cases where you need to worry about someone deliberately creating hash function collisions in an attempt to "break" something.
You can use the Apache Commons library, which has a HashCodeBuilder class. Assuming you have a Rectangle class with a width and a height, you can add the following method:
#Override
public int hashCode(){
return new HashCodeBuilder().append(width).append(height).append(children).toHashCode();
}
What you want (as clarified in your comment on the question) is not possible. There are N possible hashCodes, one for each int, where N is approximately 4.2 billion. Assuming rectangles must have positive dimensions, there are ((N * N) / 4) possible rectangles. How do you propose to make them fit into N hashCodes? When N is > 4, you have more possible rectangles than hashCodes.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 months ago.
Improve this question
I need to generate random numbers between 100 and 500 where the mean is 150.
I've been reading on distribution curves and probability density functions but can't find a concrete solution to this question.
Any help is appreciated.
I can think of two possible approaches you might take. One would be to take something like a normal distribution but that might lead to values outside your range. You could discard those samples and try again, but doing so would alter your mean. So this is probably the more complicated approach.
The other alternative, the one I would actually suggest, is to start with a uniform distribution in the range 0 to 1. Then transform that number so the result has the properties you want.
There are many such transformations you could employ. In the absence of any strong rationale for something else, I would probably go for some formula such as
y = pow(x, a) * b + c
In that formula x would be uniformly distributed in the [0, 1] range, while y should have the bounds and mean you want, once the three parameters have been tuned correctly.
Using b=400 and c=100 you can match the endpoints quite easily, because a number from the [0, 1] range raised to any power will again be a number from that range. So now all you need is determine a. Reversing the effect of b and c you want pow(x, a) to have an mean of (150 - c) / b = 1/8 = 0.125.
To compute the mean (or expected value) in a discrete distribution, you multiply each value with its probability and sum them up. In the case of a continuous distribution that becomes the integral over value times probability density function. The density function of our uniform distribution is 1 in the interval and 0 elsewhere. So we just need to integrate pow(x, a) from 0 to 1. The result of that is 1 / (a + 1) so you get
1 / (a + 1) = 1 / 8
a + 1 = 8
a = 7
So taking it all together I'd suggest
return Math.pow(random.nextDouble(), 7) * 400 + 100
If you want to get a feeling for this distribution, you can consider
x = pow((y - c) / b, 1 / a)
to be the cumulative distribution function. The density would be the derivative of that with respect to y. Ask a computer algebra system and you get some ugly formula. You might as well ask directly for a plot, e.g. on Wolfram Alpha.
That probability density is actually infinite at 100, and then drops quickly for larger values. So one thing you don't get from this approach is a density maximum at 150. If you had wanted that, you'd need a different approach but getting both density maximum and expected value at 150 feels really tricky.
One more thing to consider would be reversing the orientation. If you start with b=-400 and c=500 you get a=1/7. That's a different distribution, with different properties but the same bounds and mean. If I find the time I'll try to plot a comparison for both of these.
Create a List of numbers that have the mean value you want, they can be in any order to reach just this property (so you could just count up or whatever).
Then use Collections.shuffle to randomize the order of the list.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Using Java, I am trying to work my way through a list of problems like this:
Write a program that prints all prime numbers. (Note: if your programming language does not support arbitrary size numbers, printing all primes up to the largest number you can easily represent is fine too.)
Do they mean "all prime numbers up to n"? How can I know whether Java supports arbitrary size numbers or not, and if yes, what is it?
Java's primitives have well defined ranges. ints, for example, range between -231 and 231-1. You can see the full details in Java's tutorial. If you want to represent larger integers than the long primitive allows (i.e., 263-1), you'll have to resort to using the BigInteger class.
A program that printed all primes would be a non-halting program. There are infinitely many primes, so you could run the program forever.
The basic number types in most languages have a defined amount of memory space, and therefore a limited range of numbers they can express. For example, a Java int is stored in 4 bytes. That would make the non-halting primes program impossible, since you would eventually get to a prime larger than the largest number you could store.
As well as primitive number types, many languages have arbitrary size number types. The amount of memory they use will grow as you store larger numbers. Java has BigInteger for this purpose, while Python just stores all numbers that way by default.
Even these arbitrary-size numbers, though, are eventually limited by the amount of memory the program can access. In reality, it is impossible to print all the primes.
What the problem statement you are trying to solve is really saying is that they don't want you to bother about any of this! Just use the standard number type in your language. No system is capable of calculating infinitely many primes. What they want you to do is to write the algorithm that would calculate infinitely many primes given a system and data type capable of doing so.
Perhaps they could have worded it better. In short: don't worry about it.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Improve this question
I'm preparing for an interview. The question I got is: Two numbers are represented by a linked list, where each node contains a single digit.The digits are stored in reverse oder, such that the 1's digit is at the head of the list. Write a function that adds the two numbers and returns the sum as a linked list.
The suggested answer adds each digits individually and keeps a "carry" number. For example, if the first digits of the two numbers are "5" and "7". The algorithm records "2" in the first digit of resulting sum and keeps "1" as a "carry" to add to 10th digit of result.
However, my solution is to traverse the two linked lists and translate them into two integers. Then I add the numbers and translate sum to a new linked list. Wouldn't my solution be more straight forward and efficient?
Thanks!
While your solution may be more straightforward, I don't think it's actually more efficient.
I'm assuming the "correct" algorithm is something like this:
pop the first element of both lists
Add them together (with the carry if there is one) and make a new node using the ones digit
Pass the carry (dividing the sum by 10 to get the actual thing to carry) and repeat 1) and 2), with each successive node being pointed to by the previous one.
The main things I see when I'm comparing your algorithm with that one is:
In terms of memory, you want to create two BigIntegers to store intermediate values (I'm assuming you're using BigInteger or some equivalent Object to avoid the constraints of an int or long), on top of the final linked list itself. The original algorithm doesn't use more than a couple of ints to do intermediate calculations, and so in the long run, the original algorithm actually uses less memory.
You're also suggesting that you do all of your arithmetic using the BigIntegers, rather that in ints. Although it's possible that BigInteger is really optimized to the point where it isn't much slower than primitive operations, I highly doubt that calling BigInteger#add is faster than doing the + operation on ints.
Also, some more food for thought. Suppose you didn't have something handy like BigInteger to store arbitrarily large integers. Then you're going to have to have some way to store arbitrarily large integers for your algorithm to work properly. After that, you basically need a way to add arbitrarily large integers to add arbitrarily large integers, and you end up with a problem where you either have to do something like the original algorithm anyway, or you end up using a completely different representation in a subroutine (yikes).
(Assuming by "integer" you mean int.)
Your solution does not scale beyond numbers that can fit in an int, whereas the original solution is only limited by the amount of available memory.
As far as efficiency is concerned, there is nothing about your solution that would make it more efficient than the original.
Your solution is more straightforward to describe, certainly - and an argument might be made in certain situations that the readability of the code your solution would produce would be preferable, when working with large teams.
However, most of the time - their suggested answer is a lot more memory efficient, and probably more CPU-efficient.
You're suggesting going through the first linked-list, storing it as a number (+1 store). Going through the second, storing it as a number (+1 store). Adding the 2 numbers, saving the result (+1 store). Converting this number into a linked list, and saving it (+1 store)
Their solution involves going through the first and second linked list, while writing to the third, and storing it as a new one (+1 store)
This is a +1 store, vs. your +4 store. This might seem like not much, but if we were to try and add n pairs of numbers at the same time (on a distributed system or something), you're looking at 4n stores, rather than just n stores. Which could be a big deal.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I did an algoritms refresher and I (re)read about a sorting algorithm that runs in linear time, namely Counting Sort.
To be honest I had forgotten about it.
I understand the structure and logic and the fact that it runs in linear time is a very attactive quality.
But I have the folowing question:
As I understanding the concrete implementation of the algorithm relies on 2 things:
1) The range of input numbers is small (otherwise the intermediate array will be huge and with many gaps).
2) We actually know the range of numbers.
Taking that these 2 assumptions are correct (please correct me otherwise), I was wondering what is the best application domain that this algorithm applies to.
I mean specifically in Java, is an implementation like the following Java/Counting sort sufficient:
public static void countingSort(int[] a, int low, int high)
{
int[] counts = new int[high - low + 1]; // this will hold all possible values, from low to high
for (int x : a)
counts[x - low]++; // - low so the lowest possible value is always 0
int current = 0;
for (int i = 0; i < counts.length; i++)
{
Arrays.fill(a, current, current + counts[i], i + low); // fills counts[i] elements of value i + low in current
current += counts[i]; // leap forward by counts[i] steps
}
}
or it is not a trivial matter to come up with the high and low?
Is there a specific application in Java that counting sort is best suited for?
I assume there are subtleties like these otherwise why would anyone bother with all the O(nlogn) algorithms?
Algorithms are not about the language, so this is language-agnostic. As you have said - use counting sort when the domain is small. If you have only three numbers - 1, 2, 3 it is far better to sort them with counting sort, than a quicksort, heapsort or whatever which are O(nlogn). If you have a specific question, feel free to ask.
It is incorrect to say counting sort is O(n) in the general case, the "small range of element" is not a recommendation but a key assumption.
Counting sort assumes that each of the elements is an integer in the
range 1 to k, for some integer k. When k = O(n), the Counting-sort
runs in O(n) time.
In the general situation, the range of key k is independent to the number of element n and can be arbitrarily large. For example, the following array:
{1, 1000000000000000000000000000000000000000000000000000000000000000000000000000}
As for the exact break even point value of k and n where counting sort outperform traditional sort, it is hugely dependent on the implementation and best done via benchmarking (trying both out).
Eclipse 3.5 has a very nice feature to generate Java hashCode() functions. It would generate for example (slightly shortened:)
class HashTest {
int i;
int j;
public int hashCode() {
final int prime = 31;
int result = prime + i;
result = prime * result + j;
return result;
}
}
(If you have more attributes in the class, result = prime * result + attribute.hashCode(); is repeated for each additional attribute. For ints .hashCode() can be omitted.)
This seems fine but for the choice 31 for the prime. It is probably taken from the hashCode implementation of Java String, which was used for performance reasons that are long gone after the introduction of hardware multipliers. Here you have many hashcode collisions for small values of i and j: for example (0,0) and (-1,31) have the same value. I think that is a Bad Thing(TM), since small values occur often. For String.hashCode you'll also find many short strings with the same hashcode, for instance "Ca" and "DB". If you take a large prime, this problem disappears if you choose the prime right.
So my question: what is a good prime to choose? What criteria do you apply to find it?
This is meant as a general question - so I do not want to give a range for i and j. But I suppose in most applications relatively small values occur more often than large values. (If you have large values the choice of the prime is probably unimportant.) It might not make much of a difference, but a better choice is an easy and obvious way to improve this - so why not do it? Commons lang HashCodeBuilder also suggests curiously small values.
(Clarification: this is not a duplicate of Why does Java's hashCode() in String use 31 as a multiplier? since my question is not concerned with the history of the 31 in the JDK, but on what would be a better value in new code using the same basic template. None of the answers there try to answer that.)
I recommend using 92821. Here's why.
To give a meaningful answer to this you have to know something about the possible values of i and j. The only thing I can think of in general is, that in many cases small values will be more common than large values. (The odds of 15 appearing as a value in your program are much better than, say, 438281923.) So it seems a good idea to make the smallest hashcode collision as large as possible by choosing an appropriate prime. For 31 this rather bad - already for i=-1 and j=31 you have the same hash value as for i=0 and j=0.
Since this is interesting, I've written a little program that searched the whole int range for the best prime in this sense. That is, for each prime I searched for the minimum value of Math.abs(i) + Math.abs(j) over all values of i,j that have the same hashcode as 0,0, and then took the prime where this minimum value is as large as possible.
Drumroll: the best prime in this sense is 486187739 (with the smallest collision being i=-25486, j=67194). Nearly as good and much easier to remember is 92821 with the smallest collision being i=-46272 and j=46016.
If you give "small" another meaning and want to be the minimum of Math.sqrt(i*i+j*j) for the collision as large as possible, the results are a little different: the best would be 1322837333 with i=-6815 and j=70091, but my favourite 92821 (smallest collision -46272,46016) is again almost as good as the best value.
I do acknowledge that it is quite debatable whether these calculation make much sense in practice. But I do think that taking 92821 as prime makes much more sense than 31, unless you have good reasons not to.
Actually, if you take a prime so large that it comes close to INT_MAX, you have the same problem because of modulo arithmetic. If you expect to hash mostly strings of length 2, perhaps a prime near the square root of INT_MAX would be best, if the strings you hash are longer it doesn't matter so much and collisions are unavoidable anyway...
Collisions may not be such a big issue... The primary goal of the hash is to avoid using equals for 1:1 comparisons.
If you have an implementation where equals is "generally" extremely cheap for objects that have collided hashs, then this is not an issue (at all).
In the end, what is the best way of hashing depends on what you are comparing. In the case of an int pair (as in your example), using basic bitwise operators could be sufficient (as using & or ^).
You need to define your range for i and j. You could use a prime number for both.
public int hashCode() {
http://primes.utm.edu/curios/ ;)
return 97654321 * i ^ 12356789 * j;
}
I'd choose 7243. Large enough to avoid collissions with small numbers. Doesn't overflow to small numbers quickly.
I just want to point out that hashcode has nothing to do with prime.
In JDK implementation
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
I found if you replace 31 with 27, the result are very similar.