Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Using Java, I am trying to work my way through a list of problems like this:
Write a program that prints all prime numbers. (Note: if your programming language does not support arbitrary size numbers, printing all primes up to the largest number you can easily represent is fine too.)
Do they mean "all prime numbers up to n"? How can I know whether Java supports arbitrary size numbers or not, and if yes, what is it?
Java's primitives have well-defined ranges. ints, for example, range from -2^31 to 2^31-1. You can see the full details in Java's tutorial. If you want to represent integers larger than the long primitive allows (i.e., larger than 2^63-1), you'll have to resort to the BigInteger class.
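To make those limits concrete, here is a minimal sketch. The class name IntegerRanges and the helper oneBeyondLong are made up for this example; the constants and BigInteger calls are from the standard library.

```java
import java.math.BigInteger;

final class IntegerRanges {
    /** Returns Long.MAX_VALUE + 1 computed exactly, using BigInteger. */
    static BigInteger oneBeyondLong() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);   // 2^31 - 1
        System.out.println(Long.MAX_VALUE);      // 2^63 - 1
        System.out.println(Long.MAX_VALUE + 1);  // silently wraps around to Long.MIN_VALUE
        System.out.println(oneBeyondLong());     // 2^63, no overflow
    }
}
```

The primitive addition overflows without any error, which is why the switch to BigInteger has to be a deliberate choice.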
A program that printed all primes would be a non-halting program. There are infinitely many primes, so you could run the program forever.
The basic number types in most languages have a defined amount of memory space, and therefore a limited range of numbers they can express. For example, a Java int is stored in 4 bytes. That would make the non-halting primes program impossible, since you would eventually get to a prime larger than the largest number you could store.
As well as primitive number types, many languages have arbitrary size number types. The amount of memory they use will grow as you store larger numbers. Java has BigInteger for this purpose, while Python just stores all numbers that way by default.
Even these arbitrary-size numbers, though, are eventually limited by the amount of memory the program can access. In reality, it is impossible to print all the primes.
What the problem statement you are trying to solve is really saying is that they don't want you to bother about any of this! Just use the standard number type in your language. No system is capable of calculating infinitely many primes. What they want you to do is to write the algorithm that would calculate infinitely many primes given a system and data type capable of doing so.
Perhaps they could have worded it better. In short: don't worry about it.
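As a rough sketch of what such an algorithm could look like with an arbitrary-size type, assuming plain trial division is acceptable (the class and method names are invented for this example, and the bound on the number of primes is only there so the program halts):

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class PrimePrinter {
    /** Trial division: n is prime if no d with d*d <= n divides it. */
    static boolean isPrime(BigInteger n) {
        if (n.compareTo(BigInteger.valueOf(2)) < 0) return false;
        for (BigInteger d = BigInteger.valueOf(2);
             d.multiply(d).compareTo(n) <= 0;
             d = d.add(BigInteger.ONE)) {
            if (n.mod(d).signum() == 0) return false;
        }
        return true;
    }

    /** Collects the first `count` primes in increasing order. */
    static List<BigInteger> firstPrimes(int count) {
        List<BigInteger> primes = new ArrayList<>();
        for (BigInteger n = BigInteger.valueOf(2);
             primes.size() < count;
             n = n.add(BigInteger.ONE)) {
            if (isPrime(n)) primes.add(n);
        }
        return primes;
    }

    public static void main(String[] args) {
        // Dropping the bound (an endless loop) gives the "non-halting" version.
        firstPrimes(10).forEach(System.out::println);
    }
}
```

Because the candidates are BigIntegers, the only limit left is the memory available to the JVM, as described above.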
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
As a school assignment I have a really big for loop (it runs from 1 to a 13-digit number) with a few BigInteger operations inside. To shorten the loop I can skip all the even numbers, and to avoid unnecessary BigInteger operations I can filter out multiples of 3, 5, 7 and 11. Here is an example:
for (long i = min; i < max; i += 2) {
    if (i % 3 != 0 && i % 5 != 0 && i % 7 != 0 && i % 11 != 0) {
        BigInteger aux = new BigInteger(Long.toString(i));
        BigInteger[] aux2 = k.divideAndRemainder(aux);
        // aux divides k exactly: record both aux and the paired divisor k / aux
        if (aux2[1].longValueExact() == 0) {
            list.add(aux);
            list.add(aux2[0]);
        }
    }
}
Now, this loop would take months to finish, so I thought of splitting the for loop across multiple threads, each one covering a window of the original 13-digit range.
My first question is: with an i7-3770 processor, how many threads should I use to make this run as fast as possible?
edit: the assignment is to optimize a problem that requires a lot of CPU. In this case, to find all divisors of a 23-digit number (the "k" in k.divideAndRemainder(aux)), that's why I'm using BigIntegers. The 13-digit number used by the for loop is the root of k.
Java is fine for multithreading. BigInteger is not exactly performant, but that's not the problem.
On that processor, up to 8 threads can utilize the cores completely. I doubt you run a custom OS that allows you to cede full CPU control to an application, so you want to keep at least 1 logical core for the OS and associated services, as well as 1 for Java misc threads.
If your program takes months to complete on a single thread, then using 4 threads efficiently will make it take weeks to complete. While that's a big speedup, I don't think that's quite enough for you, is it?
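For what it's worth, splitting a range into per-thread windows could look roughly like this. This is only a sketch: the class WindowedSearch and its methods are invented for illustration, it uses a plain long k instead of a BigInteger to keep the example short, and it assumes min >= 1 so there is no division by zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WindowedSearch {
    /** Splits [min, max) into at most `parts` contiguous windows of near-equal size. */
    static List<long[]> split(long min, long max, int parts) {
        List<long[]> windows = new ArrayList<>();
        long size = (max - min + parts - 1) / parts;   // ceiling division
        for (long lo = min; lo < max; lo += size) {
            windows.add(new long[] {lo, Math.min(lo + size, max)});
        }
        return windows;
    }

    /** Finds the divisors of k in [min, max), one task per window (min must be >= 1). */
    static List<Long> divisorsInRange(long k, long min, long max, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (long[] w : split(min, max, threads)) {
                futures.add(pool.submit(() -> {
                    List<Long> found = new ArrayList<>();
                    for (long i = w[0]; i < w[1]; i++) {
                        if (k % i == 0) found.add(i);
                    }
                    return found;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());   // futures are read in window order
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Number of logical cores the JVM sees; leave one or two free if the machine is shared.
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(divisorsInRange(360, 1, 361, threads));
    }
}
```

Each window is independent, so no locking is needed; the results are merged only after every task has finished.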
This is a school assignment, as you've said yourself. Just like with Project Euler, there are ways to do it, and there are ways to do it right. The first ones take days of computing on powerful machines after 10 minutes programming, the latter ones take days of thinking and seconds of computing. Your problem is not the hardware or language, it's the algorithm.
Now for the actual stuff that you can do better. From obvious to less so.
BigInteger aux = new BigInteger(Long.toString(i)); Are you mad? Why would you convert a long to a String and then parse it back to a number?! This is stupid slow. Just use BigInteger.valueOf(long val).
BigInteger is slower than primitives. Why do you need it if your values (up to 13 digits, as you've said yourself) fit comfortably in a long? If the only reason is the divideAndRemainder method - write one yourself! It's going to be 2 lines of code...
You seem to be looking for divisors of a particular number. (The filtering of multiples of 3, 5, 7 and 11 makes me think you're looking for prime ones and will do some filtering later.) There are a number of speedups for such algorithms. The main one (and it's not clear whether you're using it) is to make the maximum checked number the square root of the target, rather than the number itself, half of it, or whatever arbitrary bound people make up. Other ones can be found in the wiki or the more complex wiki.
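A minimal sketch of the square-root bound combined with BigInteger.valueOf, assuming Java 9 or later (BigInteger.sqrt() was added in Java 9). The class and method names are made up for this example, and no skipping of even numbers or small multiples is applied here:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class DivisorSearch {
    /** Returns all divisors of k (k >= 1) in increasing order, testing candidates up to sqrt(k). */
    static List<BigInteger> divisors(BigInteger k) {
        List<BigInteger> small = new ArrayList<>();
        List<BigInteger> large = new ArrayList<>();
        long limit = k.sqrt().longValueExact();           // floor of the square root
        for (long i = 1; i <= limit; i++) {
            BigInteger candidate = BigInteger.valueOf(i); // no String round trip
            BigInteger[] qr = k.divideAndRemainder(candidate);
            if (qr[1].signum() == 0) {
                small.add(candidate);
                // The quotient k / i is the paired divisor above the square root.
                if (!qr[0].equals(candidate)) large.add(0, qr[0]);
            }
        }
        small.addAll(large);
        return small;
    }

    public static void main(String[] args) {
        System.out.println(divisors(BigInteger.valueOf(36)));
    }
}
```

Every divisor below the square root has a partner above it, so the loop never needs to go past the root to find them all.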
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
In a past examination paper, there was a question that gave two methods to check whether int[] A contained all the same values as int[] B (two unsorted arrays of size N). An engineer had to decide which implementation he would use.
The first method used a single for loop with a nested call to linear search; its asymptotic run time was calculated to be theta(n^2), and the additional memory usage for creating copies of A and B is around (N*4N + 4N) bytes.
The second method used an additional int[] C, a sorted copy of B. A for loop is used with a nested binary search; its asymptotic run time was calculated as theta(nlog(n^2)), and the additional memory usage for creating copies of A and B is around (4N + 4N + N*4N + 4N) bytes (the first two 4N's are due to C being a copy of B, and the subsequent copy of C being created in the sort(C); function).
The final question asks which implementation the engineer should use. I believe the faster algorithm is the better option, since for larger inputs it cuts the computation time drastically. The drawback is that with larger inputs he runs the risk of an OutOfMemoryError. I understand one could argue for either method depending on the size of the arrays, but my question is: in the majority of cases, which implementation is better?
The first algorithm has complexity theta(n^2) and the second theta(n log(n^2)), which is the same as theta(n log n) because log(n^2) = 2 log n.
Hence, the second one is much faster for large enough n.
As you mentioned, memory usage comes into play, and one could argue that the second one consumes a lot more memory, but that is not true.
The first algorithm consumes: n*4n + 4n = 4n^2 + 4n bytes
The second algorithm consumes: 4n + 4n + 4n + n*4n = 4n^2 + 12n bytes
Suppose n is equal to 1000: then the first algorithm consumes 4,004,000 bytes and the second 4,012,000 bytes. So there is no big difference in memory consumption between the two algorithms.
So, from a memory consumption perspective it does not really matter which algorithm you choose, and in terms of complexity they both consume theta(n^2) memory.
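The exam code is not shown, so the following is only a guess at what the two methods might look like. It assumes "contains all the same values" means every value of A occurs somewhere in B (duplicates are not counted), and all names are invented for this sketch.

```java
import java.util.Arrays;

final class SameValues {
    /** Method 1: for each element of a, scan b linearly. Worst case O(n^2) comparisons. */
    static boolean containsAllLinear(int[] a, int[] b) {
        for (int x : a) {
            boolean found = false;
            for (int y : b) {
                if (x == y) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    /** Method 2: sort a copy of b once, then binary-search each element of a. O(n log n). */
    static boolean containsAllSorted(int[] a, int[] b) {
        int[] c = Arrays.copyOf(b, b.length);   // plays the role of the extra array C
        Arrays.sort(c);
        for (int x : a) {
            if (Arrays.binarySearch(c, x) < 0) return false;
        }
        return true;
    }
}
```

Written this way, the second method needs only one extra array of n ints, so its memory cost over the first is linear in n.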
A relevant question to many a programmer, but the answer depends entirely on the context. If the software runs on a system with little memory, then the algorithm with the smaller footprint will likely be better. On the other hand, a real-time system needs speed, so the faster algorithm would likely be better.
It is important to also recognize the more subtle issues that arise during execution, e.g. when the additional memory requirements of the faster algorithm force the system to use paging and, in turn, slow down execution.
Thus, it is important to understand in-context the benefit-cost trade off for each algorithm. And as always, emphasize code readability and sound algorithm design/integration.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm preparing for an interview. The question I got is: Two numbers are represented by a linked list, where each node contains a single digit. The digits are stored in reverse order, such that the 1's digit is at the head of the list. Write a function that adds the two numbers and returns the sum as a linked list.
The suggested answer adds each pair of digits individually and keeps a "carry" number. For example, if the first digits of the two numbers are "5" and "7", the algorithm records "2" in the first digit of the resulting sum and keeps "1" as a "carry" to add to the tens digit of the result.
However, my solution is to traverse the two linked lists and translate them into two integers. Then I add the numbers and translate the sum into a new linked list. Wouldn't my solution be more straightforward and efficient?
Thanks!
While your solution may be more straightforward, I don't think it's actually more efficient.
I'm assuming the "correct" algorithm is something like this:
1) Pop the first element of both lists.
2) Add them together (with the carry if there is one) and make a new node using the ones digit.
3) Pass the carry (dividing the sum by 10 to get the actual thing to carry) and repeat 1) and 2), with each successive node being pointed to by the previous one.
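In code, those steps might look something like the sketch below. The Node type and the helper names are hypothetical, since the interview question does not give a list class.

```java
final class DigitListAdder {
    /** Singly linked node holding one decimal digit (hypothetical type for this sketch). */
    static final class Node {
        final int digit;
        Node next;
        Node(int digit) { this.digit = digit; }
    }

    /** Adds two numbers stored least-significant digit first and returns the sum in the same form. */
    static Node add(Node a, Node b) {
        Node dummy = new Node(0);   // placeholder head, dropped at the end
        Node tail = dummy;
        int carry = 0;
        while (a != null || b != null || carry != 0) {
            int sum = carry;
            if (a != null) { sum += a.digit; a = a.next; }
            if (b != null) { sum += b.digit; b = b.next; }
            tail.next = new Node(sum % 10);   // ones digit becomes the next node
            tail = tail.next;
            carry = sum / 10;                 // 0 or 1
        }
        return dummy.next;
    }

    /** Builds a list from digits given least-significant first. */
    static Node fromDigits(int... digits) {
        Node dummy = new Node(0);
        Node tail = dummy;
        for (int d : digits) {
            tail.next = new Node(d);
            tail = tail.next;
        }
        return dummy.next;
    }

    /** Concatenates the digits in list order (least-significant first). */
    static String render(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next) sb.append(n.digit);
        return sb.toString();
    }
}
```

For example, add(fromDigits(7, 1, 6), fromDigits(5, 9, 2)) represents 617 + 295 and yields the list 2, 1, 9, i.e. 912. The only intermediate state is the int sum and the carry.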
The main things I see when I compare your algorithm with that one are:
In terms of memory, you want to create two BigIntegers to store intermediate values (I'm assuming you're using BigInteger or some equivalent Object to avoid the constraints of an int or long), on top of the final linked list itself. The original algorithm doesn't use more than a couple of ints to do intermediate calculations, and so in the long run, the original algorithm actually uses less memory.
You're also suggesting that you do all of your arithmetic using BigIntegers, rather than ints. Although it's possible that BigInteger is optimized to the point where it isn't much slower than primitive operations, I highly doubt that calling BigInteger#add is faster than doing the + operation on ints.
Also, some more food for thought. Suppose you didn't have something handy like BigInteger to store arbitrarily large integers. Then you're going to have to have some way to store arbitrarily large integers for your algorithm to work properly. After that, you basically need a way to add arbitrarily large integers to add arbitrarily large integers, and you end up with a problem where you either have to do something like the original algorithm anyway, or you end up using a completely different representation in a subroutine (yikes).
(Assuming by "integer" you mean int.)
Your solution does not scale beyond numbers that can fit in an int, whereas the original solution is only limited by the amount of available memory.
As far as efficiency is concerned, there is nothing about your solution that would make it more efficient than the original.
Your solution is more straightforward to describe, certainly - and an argument might be made in certain situations that the readability of the code your solution would produce would be preferable, when working with large teams.
However, most of the time - their suggested answer is a lot more memory efficient, and probably more CPU-efficient.
You're suggesting going through the first linked-list, storing it as a number (+1 store). Going through the second, storing it as a number (+1 store). Adding the 2 numbers, saving the result (+1 store). Converting this number into a linked list, and saving it (+1 store)
Their solution involves going through the first and second linked list, while writing to the third, and storing it as a new one (+1 store)
This is a +1 store, vs. your +4 store. This might seem like not much, but if we were to try and add n pairs of numbers at the same time (on a distributed system or something), you're looking at 4n stores, rather than just n stores. Which could be a big deal.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a Rectangle class. It has a length, breadth and area (all ints). I want to hash it such that every rectangle with the same length and breadth hashes to the same value. What is a way to do this?
EDIT: I understand it's a broad question. That's why I asked for "a" way to do it, not the best way.
A good and simple scheme is to calculate the hash for a pair of integers as follows:
hash = length * CONSTANT + width
Empirically, you will get the best results (i.e. the fewest collisions) if CONSTANT is a prime number. A lot of people [1] recommend a value like 31, but the best choice depends on the most likely range of the length and width values. If they are strictly bounded, and small enough, then you could do better than 31.
However, 31 is probably good enough for practical purposes [2]. A few collisions at this level are unlikely to make a significant performance difference, and even a perfect hashing function does not eliminate collisions at the hash table level ... where you use the modulus of the hash value.
1 - I'm not sure where this number comes from, or whether there are empirical studies to back it up ... in the general case. I suspect it comes from hashing of (ASCII) strings. But 31 is prime ... and it is a Mersenne prime (2^5 - 1), which means it could be computed using a shift and a subtraction if hardware multiply is slow.
2 - I'm excluding cases where you need to worry about someone deliberately creating hash function collisions in an attempt to "break" something.
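Put together, a minimal sketch of this scheme might look as follows. The field names follow the question; leaving area out of the hash is my assumption, on the grounds that it is derived from the other two fields.

```java
final class Rectangle {
    private final int length;
    private final int breadth;

    Rectangle(int length, int breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    /** Derived from length and breadth, so it adds nothing to equals/hashCode. */
    int area() {
        return length * breadth;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Rectangle)) return false;
        Rectangle other = (Rectangle) o;
        return length == other.length && breadth == other.breadth;
    }

    @Override
    public int hashCode() {
        return 31 * length + breadth;   // hash = length * CONSTANT + width, with CONSTANT = 31
    }
}
```

equals is overridden alongside hashCode so that rectangles with equal dimensions are both equal and hash to the same value, which is what HashMap and HashSet rely on.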
You can use the Apache Commons library, which has a HashCodeBuilder class. Assuming you have a Rectangle class with a width and a height, you can add the following method:
@Override
public int hashCode() {
    // Combine only the fields that define the rectangle
    return new HashCodeBuilder().append(width).append(height).toHashCode();
}
What you want (as clarified in your comment on the question) is not possible. There are N possible hashCodes, one for each int, where N is approximately 4.2 billion. Assuming rectangles must have positive dimensions, there are ((N * N) / 4) possible rectangles. How do you propose to make them fit into N hashCodes? When N is > 4, you have more possible rectangles than hashCodes.
This question already has answers here:
Java array with more than 4gb elements
(11 answers)
Closed 8 years ago.
I was trying to get all primes before 600851475143.
I was using Sieve of Eratosthenes for this.
This requires me to create a boolean array of that huge size.
Bad idea, you can run out of memory.
Is there any other way? I tried using a string, with the character at each index being 0 or 1 to represent true or false, but the indexOf method returns an int too.
Next I am going to try a 2D array for my problem.
Is there any better way to store such a huge array?
The memory requirement for 600851475143 booleans is at best 70 GB (at one bit per value). This isn't feasible. You need to either use compression as suggested by Stephan, or find a different algorithm for calculating the primes.
I had a similar problem and I used a bit set (basically setting 1 or 0 at the desired offset). I recommend using EWAHCompressedBitmap, which will also compress your bit set.
EDIT
As Alan said, the BitSet will occupy 70 GB of memory, but you can do something else: keep multiple BitSets (consecutive ones, so that you can calculate the absolute position) and load into memory only the BitSet that you need at that moment, something like a lazy load. In this case you have control over the memory used.
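One way to read the "multiple BitSets" idea is a segmented sieve, where each window of the range gets its own BitSet and is processed independently. That is my interpretation rather than something spelled out above, so treat this as a sketch. The names are invented, and lo is assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SegmentedSieve {
    /** Plain sieve of Eratosthenes for the small base primes up to `limit`. */
    static List<Integer> basePrimes(int limit) {
        BitSet composite = new BitSet(limit + 1);
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= limit; i++) {
            if (composite.get(i)) continue;
            primes.add(i);
            for (long j = (long) i * i; j <= limit; j += i) composite.set((int) j);
        }
        return primes;
    }

    /** Returns the primes in [lo, hi), using one BitSet that covers only this window. */
    static List<Long> primesInWindow(long lo, long hi) {
        List<Long> result = new ArrayList<>();
        if (hi <= lo) return result;
        int size = (int) (hi - lo);                  // window length must fit in an int
        BitSet composite = new BitSet(size);
        int root = (int) Math.sqrt((double) hi) + 1; // base primes up to about sqrt(hi) suffice
        for (int p : basePrimes(root)) {
            // First multiple of p inside the window, but never below p*p.
            long first = Math.max((long) p * p, ((lo + p - 1) / p) * p);
            for (long m = first; m < hi; m += p) composite.set((int) (m - lo));
        }
        for (int i = 0; i < size; i++) {
            long n = lo + i;
            if (n >= 2 && !composite.get(i)) result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(primesInWindow(100, 120));
    }
}
```

Bit i of a window stands for the number lo + i, which is the "absolute position" calculation mentioned above. Only one window plus the base primes has to be in memory at a time; for brevity this sketch recomputes the base primes on every call.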
It's not really practical to remember, for each number, whether it was prime or not for such a large range (the sieve is a very slow approach for large numbers in general).
From this link you get an idea of how many primes to expect below X. For your 600 billion range you can expect roughly 20 billion primes. Storing them as a long[] would require about 160GB of memory... that's notably more than the suggested 70GB for storing a single bit for each number, or half of that if you exclude even numbers (2 is the only even prime).
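These figures can be checked with the usual x / ln(x) approximation for the number of primes below x. I am assuming that is the kind of estimate the link gives; the class and method names here are made up.

```java
final class PrimeStorageEstimate {
    /** Prime number theorem approximation: the count of primes below x is roughly x / ln(x). */
    static double approxPrimeCount(double x) {
        return x / Math.log(x);
    }

    /** Converts a byte count to GiB (2^30 bytes). */
    static double gib(double bytes) {
        return bytes / (1L << 30);
    }

    public static void main(String[] args) {
        double x = 600_851_475_143d;
        double primes = approxPrimeCount(x);
        System.out.printf("estimated primes below x: %.1f billion%n", primes / 1e9);
        System.out.printf("stored as long[] (8 bytes each): %.0f GiB%n", gib(primes * 8));
        System.out.printf("one bit per number: %.0f GiB%n", gib(x / 8));
        System.out.printf("one bit per odd number: %.0f GiB%n", gib(x / 16));
    }
}
```

The results land in the same ballpark as the numbers quoted above, with small differences depending on rounding and on whether GB means 10^9 or 2^30 bytes.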
For a desktop computer 35GB in memory may be a bit much, but a good workstation can have that much RAM. I would try a two-dimensional array with bit shifting/masking.
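A sketch of what a two-dimensional array with shifting and masking could look like. The class name and the chunkBits parameter are invented for this example, and chunkBits is assumed to be a positive multiple of 64.

```java
final class LongBitArray {
    private final long[][] chunks;
    private final int chunkBits;   // bits per inner array, a multiple of 64

    LongBitArray(long size, int chunkBits) {
        this.chunkBits = chunkBits;
        int wordsPerChunk = chunkBits / 64;
        int chunkCount = (int) ((size + chunkBits - 1) / chunkBits);
        chunks = new long[chunkCount][];
        long wordsLeft = (size + 63) / 64;
        for (int c = 0; c < chunkCount; c++) {
            // The last chunk may be shorter than the others.
            chunks[c] = new long[(int) Math.min(wordsLeft, wordsPerChunk)];
            wordsLeft -= chunks[c].length;
        }
    }

    /** Sets the bit at a long-valued index. */
    void set(long index) {
        int chunk = (int) (index / chunkBits);
        int word = (int) ((index % chunkBits) >>> 6);   // which 64-bit word inside the chunk
        chunks[chunk][word] |= 1L << (index & 63);      // which bit inside the word
    }

    /** Reads the bit at a long-valued index. */
    boolean get(long index) {
        int chunk = (int) (index / chunkBits);
        int word = (int) ((index % chunkBits) >>> 6);
        return (chunks[chunk][word] & (1L << (index & 63))) != 0;
    }
}
```

With a chunk size such as 1 << 30 bits (128 MiB per inner array), the outer array needs only a few hundred entries for this range, and the index is a long rather than an int.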
I still would expect your sieve code to run a considerable amount of time (something from days to years). I suggest you investigate more advanced prime detection methods than sieve.
You could use HotSpot's internal sun.misc.Unsafe API to allocate a bigger array. I wrote a blog post on how to simulate an array with it. However, it's not an official Java API, so it qualifies as a hack.
Use BitSet. You can then set the bit at any index. 600851475143 is just under 2^40, so the value itself needs only 40 bits (in reality it will occupy 64 bits, as it is stored in a long).
In fact, a long goes up to 2^63 - 1, which is massive for most purposes.