library for integer factorization in java or scala


There are a lot of questions about how to implement factorization, however for production use, I would rather use an open source library to get something efficient and well tested right away.
The method I am looking for looks like this:
static int[] getPrimeFactors(int n)
it would return {2,2,3} for n=12
A library may also have an overload for handling long or even BigInteger types
The question is not about a particular application; it is about having a library that handles this problem well. Many people argue that different implementations are needed depending on the range of the numbers. In that case, I would expect the library to select the most reasonable method at runtime.
By efficient I don't mean "world's fastest" (I would not work on the JVM for that...), I just mean dealing with the int and long range within a second rather than an hour.

It depends what you want to do. If your needs are modest (say, you want to solve Project Euler problems), a simple implementation of Pollard's rho algorithm will find factors up to ten or twelve digits instantly; if that's what you want, let me know, and I can post some code. If you want a more powerful factoring program that's written in Java, you can look at the source code behind Dario Alpern's applet; I don't know about a test suite, and it's really not designed with an open api, but it does have lots of users and is well tested. Most of the heavy-duty open-source factoring programs are written in C or C++ and use the GMP big-integer library, but you may be able to access them via your language's foreign function interface; look for names like gmp-ecm, msieve, pari or yafu. If those don't satisfy you, a good place to ask for more help is the Mersenne Forum.
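As a rough illustration of the "simple implementation of Pollard's rho" mentioned above, here is a minimal sketch. It is not the answerer's code: the class name RhoFactorizer, the trial-division cutoff of 100 and the use of BigInteger for modular multiplication and primality testing are all assumptions made for this example, and it assumes 1 <= n < 2^62.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RhoFactorizer {
    // (a * b) mod m without long overflow; BigInteger is simple, not the fastest option.
    static long mulMod(long a, long b, long m) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b))
                .mod(BigInteger.valueOf(m)).longValue();
    }

    static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    // Finds a non-trivial factor of an odd composite n using Floyd cycle detection.
    // If one polynomial x^2 + c fails (d == n), the next value of c is tried.
    static long rho(long n) {
        for (long c = 1; ; c++) {
            long x = 2, y = 2, d = 1;
            while (d == 1) {
                x = (mulMod(x, x, n) + c) % n;   // tortoise: one step
                y = (mulMod(y, y, n) + c) % n;   // hare: two steps
                y = (mulMod(y, y, n) + c) % n;
                d = gcd(Math.abs(x - y), n);
            }
            if (d != n) return d;
        }
    }

    // Splits n recursively until only (probable) primes remain.
    static void factor(long n, List<Long> out) {
        if (n == 1) return;
        if (BigInteger.valueOf(n).isProbablePrime(40)) { out.add(n); return; }
        long d = rho(n);
        factor(d, out);
        factor(n / d, out);
    }

    // Returns the prime factors of n in ascending order, with multiplicity.
    public static List<Long> primeFactors(long n) {
        List<Long> out = new ArrayList<>();
        // Strip small factors first so rho only sees odd numbers with larger factors.
        for (long p = 2; p < 100 && p * p <= n; p++) {
            while (n % p == 0) { out.add(p); n /= p; }
        }
        factor(n, out);
        Collections.sort(out);
        return out;
    }
}
```

Calling RhoFactorizer.primeFactors(12) gives the 2, 2, 3 from the question. The primality test is probabilistic, so a production library would want a deterministic check for the long range.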

If you want to solve your problem, rather than get what you are asking for, you want a table. You can precompute it using silly slow methods, store it, and then look up the factors for any number in microseconds. In particular, you want a table where the smallest factor is listed in an index corresponding to the number--much more memory efficient if you use trial division to remove a few of the smallest primes--and then walk your way down the table until you hit a 1 (meaning no more divisors; what you have left is prime). This will take only two bytes per table entry, which means you can store everything on any modern machine more hefty than a smartphone.
I can demonstrate how to create this if you're interested, and show how to check that it is correct with greater reliability than you could hope to achieve with an active community and unit tests of a complex algorithm (unless you ran the algorithm to generate this table and verified that it was all ok).
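The table idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's implementation: the class name FactorTable is made up, the table is filled with a sieve rather than "silly slow methods", and it uses a plain int[] instead of the compact two-byte layout described above.

```java
import java.util.ArrayList;
import java.util.List;

public class FactorTable {
    // spf[n] holds the smallest prime factor of n (0 for n < 2).
    private final int[] spf;

    public FactorTable(int limit) {
        spf = new int[limit + 1];
        for (int i = 2; i <= limit; i++) {
            if (spf[i] == 0) {                      // no smaller factor recorded: i is prime
                for (long j = i; j <= limit; j += i) {
                    if (spf[(int) j] == 0) spf[(int) j] = i;
                }
            }
        }
    }

    // Walks down the table: record the smallest factor, divide it out, repeat.
    public List<Integer> primeFactors(int n) {
        List<Integer> out = new ArrayList<>();
        while (n > 1) {
            out.add(spf[n]);
            n /= spf[n];
        }
        return out;
    }
}
```

After the one-off construction cost, each lookup takes one table access per prime factor.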

I need them for testing if a polynomial is primitive or not.
This is faster than trying to find the factors of all the numbers.
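public static boolean gcdIsOne(int[] nums) {
    // Find the smallest positive entry; any common factor must divide it.
    int smallest = Integer.MAX_VALUE;
    for (int num : nums) {
        if (num > 0 && num < smallest)
            smallest = num;
    }
    // Try 2, then the odd candidates, up to and including smallest itself:
    // a common prime factor may exceed sqrt(smallest), e.g. 7 for {14, 21}.
    OUTER:
    for (long i = 2; i <= smallest; i = (i == 2 ? 3 : i + 2)) {
        for (int num : nums) {
            if (num % i != 0)
                continue OUTER;
        }
        return false; // i divides every entry
    }
    return true;
}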
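For comparison, the same test can be written with Euclid's algorithm, which avoids trying candidate divisors altogether. This is a different technique from the one in the answer; the class name CoefficientGcd is hypothetical, and the sketch assumes no entry equals Integer.MIN_VALUE (whose absolute value does not fit in an int).

```java
public class CoefficientGcd {
    // Euclid's algorithm on absolute values; gcd(0, x) is |x|.
    static int gcd(int a, int b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // True when the entries have no common factor greater than 1.
    static boolean gcdIsOne(int[] nums) {
        int g = 0;
        for (int num : nums) {
            g = gcd(g, num);
            if (g == 1) return true;   // cannot get any smaller, stop early
        }
        return g == 1;
    }
}
```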

I tried this function in scala. Here is my result:
def getPrimeFactores(i: Int) = {
  def loop(i: Int, mod: Int, primes: List[Int]): List[Int] = {
    if (i < 2) primes // might be i == 1 as well and means we are done
    else {
      if (i % mod == 0) loop(i / mod, mod, mod :: primes)
      else loop(i, mod + 1, primes)
    }
  }
  loop(i, 2, Nil).reverse
}
I tried to make it as functional as possible.
if (i % mod == 0) loop(i / mod, mod, mod :: primes) checks if we found a divisor. If we did we add it to primes and divide i by mod.
If we did not find a new divisor, we just increase the divisor.
loop(i, 2, Nil).reverse initializes the function and orders the result increasingly.

Related

Integer range in square root algorithm of Cracking the Code book

There is an algorithm in Java for square root in the Cracking the Coding Interview book, as below:
int sqrt(int n) {
    return sqrt_helper(n, 1, n);
}
int sqrt_helper(int n, int min, int max) {
    if (max < min) return -1;
    int guess = (min + max) / 2;
    if (guess * guess == n) {
        return guess;
    } else if (guess * guess < n) {
        return sqrt_helper(n, guess + 1, max);
    } else {
        return sqrt_helper(n, min, guess - 1);
    }
}
The question is:
As min and max are integers, they can have any value in the int range, e.g. max = Integer.MAX_VALUE.
So why does the code not need to worry about guess = (min + max) / 2 exceeding the allowed range, or guess * guess for that matter?

There are simple ways of getting around that problem (like min + (max - min) / 2).
The more serious integer overflow problem is guess * guess. You could change the test to compare guess with n / guess, which is slower but normally won't overflow. Or you could use a bit hack to find a better starting point (clz is useful here, if you have it), since you should be able to find a guess whose square is within the range of representable integers.
An interviewer might be even more impressed if you were able to provide the Newton-Raphson algorithm, which converges extremely rapidly.
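The two overflow fixes described above (the min + (max - min) / 2 midpoint and comparing guess with n / guess) can be combined as in the following sketch. It is an iterative variant written for this example, with the hypothetical class name SafeSqrt; like the book's version it returns -1 when n is not a perfect square.

```java
public class SafeSqrt {
    // Returns the exact integer square root of n, or -1 if n is not a perfect square.
    static int sqrt(int n) {
        if (n < 1) return -1;
        int min = 1, max = n;
        while (min <= max) {
            int guess = min + (max - min) / 2;   // midpoint without computing min + max
            int q = n / guess;                   // division instead of guess * guess
            if (guess > q) {
                max = guess - 1;                 // guess * guess > n
            } else if (guess == q && n % guess == 0) {
                return guess;                    // guess * guess == n
            } else {
                min = guess + 1;                 // guess * guess < n
            }
        }
        return -1;
    }
}
```

The three branches follow from writing n = q * guess + r with 0 <= r < guess: the square is too large exactly when guess > q, and equal exactly when guess == q and r == 0.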

Since you mention "Cracking the Coding Interview"...
Typically in the context of the average coding interview one wouldn't worry about implementation-specific details like this. The interviewer is trying to confirm basic competency and understanding - they'll rarely want to run your code, and it should be a given for both of you that the algorithm will break down at the extreme limits of your language's basic data types. If the interviewer asks specifically about limitations, then you could briefly mention that the function will fail once min + max or guess * guess exceeds Integer.MAX_VALUE in this language (for guess * guess, that already happens when guess is above 46340).
The limitation will apply to almost any algorithm you write for a coding interview, and no reasonable interviewer would expect you to specifically design your solution to mitigate this kind of edge case. I would find it extremely off-putting if I asked a candidate to write a function that produces Fibonacci numbers and they spent time trying to optimize the case where the output exceeds 16 digit values.
If for some reason you needed to find the square root of extremely large values using this algorithm in a real-life scenario, I'd expect you to have to implement it using a generic big number library for your particular language. That being said, I wouldn't roll my own square root algorithm for any real use case under almost any circumstance.

You'd have to confirm your language. But in pseudo-code you can do something like:
int guess = ((min.ToBigInt() + max.ToBigInt()) / 2).ToInt()

Sum to determine the largest multiple of 5 under 1,000

I am currently trying to get a feature working in a Java application I am making, however I'm uncertain how to implement this in a single line.
I know that I could do something along the lines of (not exactly, but roughly):
while(i<995){
i=i+5
}
However I am eager to implement this all into one line, such as in a single
static int highestMult = *the equation*
I would not be using this specifically for the highest multiple of 5 in 1,000, however upon my own research I could not find a desired solution for this specific case, therefore this is an example.
The examples I have previously found all, generally, relate to finding only a highest multiple, not putting together the highest multiple, and a limit.
If this is not knowledge from the back of your head, it'd also be a great help just to understand the logic behind how you came up with the solution, it could save me being stuck on similar issues in the future.
Thanks,

If c is the under number (1000 in your case), and m the multiple (5 in your case), then
((c - 1) / m) * m
is one way. (Note to purists: you don't actually need the outer parentheses but I include them for clarity).
Here I'm exploiting integer arithmetic to force the truncation of ((c - 1) / m) to the flooring integer. Multiplication of this result by m means the final value is a multiple of m. Make sure that c and m are integral types or this will not work (unless you cast explicitly which is not as elegant).
This does not work for c < 1 or m < 1.
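Wrapped in a method, the expression looks like this. The class and method names are made up for the example, and the argument check reflects the restriction to c >= 1 and m >= 1.

```java
public class Multiples {
    // Largest multiple of m that is strictly below c, using integer truncation.
    static int highestMultipleBelow(int c, int m) {
        if (c < 1 || m < 1) {
            throw new IllegalArgumentException("c and m must both be at least 1");
        }
        return ((c - 1) / m) * m;
    }
}
```

For example, highestMultipleBelow(1000, 5) is (999 / 5) * 5 = 199 * 5 = 995, and highestMultipleBelow(995, 5) is 990 because the limit itself is excluded.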

try:
int number=5;
int limit=999;
int i=limit-(limit%number);
where 999 is the limit minus 1
% is the remainder operator
(999%5)=4
If we subtract the remainder from the limit, we get the answer:
999-4=995
We could use limit=1000, but then the result could be 1000 itself.
The remainder is a very useful thing in programming :D
Defined for number > 0 and limit >= 0

Combinatorics algorithm parallelization

I'm writing a program that calculates C(n, k) combinations where there is a big difference between n and k (e.g. n=39, k=13 -> 8122425444 combinations). I also need to do some calculations with every combination in real time. The question is: how can I divide my algorithm among several threads to make it faster?
public void getCombinations(List<Item> items) {
int n = items.size();
int k = 13;
int[] res = new int[k];
for (int i = 1; i <= k; i++) {
res[i - 1] = i;
}
int p = k;
while (p >= 1) {
//here I make a Set from items in List by ids in res[]
Set<Item> cards = convert(res, items);
//some calculations
if (res[k - 1] == n) {
p--;
} else {
p = k;
}
if (p >= 1) {
for (int i = k; i >= p; i--) {
res[i - 1] = res[p - 1] + i - p + 1;
}
}
}
}
private Set<Item> convert(int[] res, List<Item> items) {
Set<Item> set = new TreeSet<Item>();
for (int i : res) {
set.add(items.get(i - 1));
}
return set;
}

If you're using JDK 7 then you could use fork/join to divide and conquer this algorithm.
If you want to keep things simple then I would just get a thread to compute a subset of the input and use a CountDownLatch until all threads have completed. The number of threads depends on your CPU.
You could also use Hadoop's map/reduce if you think the input will grow so you can compute on several computers. You will need to normalise it as a map/reduce operation - but look at examples.

The simplest way to split combinations is to have combinations of combinations. ;)
For each possible "first" value you can create a new task in a thread pool. Or you can create each possible pair of "first" and "second" as a new task, or each triple, etc. You only need to create as many tasks as you have CPUs, so you don't need to go overboard.
e.g. say you want to create all possible selections of 13 from 39 items.
for(Item item: items) {
List<Item> items2 = new ArrayList<Item>(items);
items2.remove(item);
// create a task which considers all selections of 12 from 38 (plus item)
createCombinationsOf(item, items2, 12);
}
This creates roughly equal work for 39 cpus which may be more than enough. If you want more create pairs (39*38/2) of those.
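A runnable variation on this idea is sketched below. It is an assumption-laden illustration rather than the answerer's code: the class ParallelCombinations and its methods are invented names, it works on indices 0..n-1 instead of Item objects, and it only counts the combinations where real code would process them. To make sure no set is visited twice, each task fixes the smallest index and picks the rest only from larger indices, so the tasks are not equal in size (the task for index 0 is the largest).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCombinations {
    // Enumerates all ways to fill chosen[depth..k-1] with increasing indices >= start.
    static long visit(int n, int k, int start, int[] chosen, int depth) {
        if (depth == k) {
            // A complete combination is in chosen[]; real work would go here.
            return 1;
        }
        long count = 0;
        // Upper bound leaves room for the k - depth - 1 indices still to be picked.
        for (int i = start; i <= n - (k - depth); i++) {
            chosen[depth] = i;
            count += visit(n, k, i + 1, chosen, depth + 1);
        }
        return count;
    }

    // One task per possible smallest index; assumes 1 <= k <= n.
    static long countAll(int n, int k, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int first = 0; first <= n - k; first++) {
                final int f = first;
                parts.add(pool.submit(() -> {
                    int[] chosen = new int[k];   // each task owns its buffer
                    chosen[0] = f;
                    return visit(n, k, f + 1, chosen, 1);
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

If the uneven task sizes matter, fixing the first two indices per task gives many more, smaller tasks for the pool to balance.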

Your question is quite vague.
What problem are you having right now? Implementing the divide and conquer part of the algorithm (threading, joining, etc.), or figuring out how to divide a problem into its sub-parts?
The latter should be your first step. Do you know how to break your original problem into several smaller problems (that can then be dispatched to Executor threads or a similar mechanism to be processed), and how to join the results?

I have been working on some code that works with combinatoric sets of this size. Here are a few suggestions for getting output in a reasonable amount of time.
Instead of building a list of combinations and then processing them, write your program to take a rank for a combination. You can safely assign signed 64 bit long values to each combination for all k values up to n = 66. This will let you easily break up the number system and assign it to different threads/hardware.
If your computation is simple, you should look at using OpenCL or CUDA to do the work. There are a couple of options for doing this. Rootbeer and Aparapi are options for staying in Java and letting a library take care of the GPU details. JavaCL is a nice binding to OpenCL, if you do not mind writing kernels directly in C99. AWS has GPU instance for doing this type of work.
If you are going to collect a result for each combination, you are really going to need to consider storage space. For your example of C(39,13), you would need a little under 61 Gigs just to store a long for each combination. You need a good strategy for dealing with datasets of this size.
If you are trying to roll up this data into a simple result for the entire set of combinations, then follow @algolicious' suggestion and look at map/reduce to solve this problem.
If you really need answers for each combination, but a little error is OK, you may want to look at using AI algorithms or a linear solver to compress the data. Be aware that these techniques will only work if there is something to learn in the resulting data.
If some error will not work, but you need every answer, you may want to just consider recomputing it each time you need it, based on the element's rank.
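The rank-based approach from the first suggestion could look roughly like this. It is a sketch under assumptions: the names CombinationRank, binomial and unrank are invented here, ranks follow lexicographic order over indices 0..n-1, and the intermediate products in binomial are only guaranteed to fit in a long for moderate n such as the n=39 of the question.

```java
public class CombinationRank {
    // C(n, k) computed incrementally; each intermediate value is itself a binomial coefficient.
    static long binomial(int n, int k) {
        if (k < 0 || k > n) return 0;
        k = Math.min(k, n - k);
        long r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    // Returns the combination with the given lexicographic rank, 0 <= rank < C(n, k).
    static int[] unrank(int n, int k, long rank) {
        int[] c = new int[k];
        int x = 0;                                   // smallest index still available
        for (int i = 0; i < k; i++) {
            while (true) {
                // Number of combinations that have x at position i.
                long count = binomial(n - x - 1, k - i - 1);
                if (rank < count) break;
                rank -= count;                       // skip all of them
                x++;
            }
            c[i] = x++;
        }
        return c;
    }
}
```

With this, a worker that is handed the rank range [from, to) can call unrank(n, k, from) once and then step through its share, so threads or machines need no shared state beyond the range boundaries.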

Dynamic Java integer/long overflow checking versus performance

This is a rather theoretical question, so while the language is specifically Java, any general solution will suffice.
Suppose I wanted to write a trivial factorial function:
long factorial(int n)
{
//handle special cases like negatives, etc.
long p = 1;
for(int i = 1; i <= n; i++)
{
p = p * i;
}
return p;
}
But now, I also want to check if the factorial overflows (without simply hard coding a MAX_FACTORIAL_PARAMETER or something of the like). In general checking for overflow during multiplication is as simple as checking the result against the original inputs, but in this case, since overflow can occur at any point, it would be rather expensive to perform more divisions and comparisons in every single loop.
The question then, is twofold--is there any way to solve the factorial problem of overflow without checking for multiplication overflow at every step or hard coding a maximum allowed parameter?
And in general, how should I approach problems that involve many stages of iteration/recursion that could silently fail at every stage without compromising performance by introducing expensive checks at each?

While Java does not help you with this, there certainly are languages that help with overflow. For example, C# provides the checked keyword. Underneath, this feature probably uses hardware support in the form of the overflow flag.

It depends on your specific problem. Maybe you can do curve sketching or some other mathematical analysis.
In your given example it would be best just to check on every loop iteration. It's not that time-consuming, as it won't even change your complexity class (because O(n) = O(n+n) = O(2n) = O(n)). In most cases a simple check is also the best thing to do, as it keeps your code clean and maintainable.
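A per-iteration check of this kind can be as small as one division and one comparison. The sketch below is one possible form (the class name DivisionCheckedFactorial is made up): for positive p and i, p * i fits in a long exactly when p <= Long.MAX_VALUE / i.

```java
public class DivisionCheckedFactorial {
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must not be negative");
        long p = 1;
        for (int i = 2; i <= n; i++) {
            // Check before multiplying, so p never holds a wrapped value.
            if (p > Long.MAX_VALUE / i) {
                throw new ArithmeticException("factorial: overflow at i = " + i);
            }
            p = p * i;
        }
        return p;
    }
}
```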

In this specific case the simplest thing to do is hard code the maximum 'n' value. If speed was important to you, you would store all possible values in an array (there aren't many) and not calculate anything. ;)
If you want to improve the method use long result (not much better but a simple change) or use BigInteger which doesn't have an overflow problem. ;)
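The array idea can be sketched as follows. The class name FactorialTable is hypothetical; the table size of 21 relies on 20! being the largest factorial that fits in a signed 64-bit long.

```java
public class FactorialTable {
    // 0! through 20! all fit in a long; 21! does not.
    private static final long[] FACT = new long[21];

    static {
        FACT[0] = 1;
        for (int i = 1; i < FACT.length; i++) {
            FACT[i] = FACT[i - 1] * i;
        }
    }

    static long factorial(int n) {
        if (n < 0 || n >= FACT.length) {
            throw new ArithmeticException("factorial: " + n + " is out of range for long");
        }
        return FACT[n];
    }
}
```

The maximum parameter is still fixed in advance here, which is exactly the trade-off this answer accepts.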

Is there any way to solve the factorial problem of overflow without checking for multiplication overflow at every step or hard coding a maximum allowed parameter?
No.
And in general, how should I approach problems that involve many stages of iteration/recursion that could silently fail at every stage without compromising performance by introducing expensive checks at each?
I don't think that you can in Java.
Some machine instruction sets set an 'overflow' bit if the previous integer arithmetic operation overflowed, but most programming languages don't provide a way to make use of it. C# is an exception, as is (IIRC) Ada.

Since Java 8, Java has the Math.multiplyExact() method. This checks internally for an overflow. Since the method is an 'intrinsic' (see here for a list), the Java implementation will be replaced by dedicated low level machine instructions if possible, possibly just checking a CPU overflow flag, thus making the check quite fast.
Btw, note that this seems to be much faster on JDK 1.8.0_40 compared to JDK 1.8.0_05.
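Applied to the factorial from the question, Math.multiplyExact could be used like this. Only the class name CheckedFactorial is invented; Math.multiplyExact(long, long) is the standard Java 8 method and throws ArithmeticException when the product does not fit in a long.

```java
public class CheckedFactorial {
    static long factorial(int n) {
        if (n < 0) throw new IllegalArgumentException("n must not be negative");
        long p = 1;
        for (int i = 2; i <= n; i++) {
            // Throws ArithmeticException instead of silently wrapping around.
            p = Math.multiplyExact(p, (long) i);
        }
        return p;
    }
}
```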

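In Java, overflow will often give you a negative result, so you could probably use that as a check (it is not a guarantee, though: a product that wraps around more than once can come out positive again):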
long factorial(int n)
{
//handle special cases like negatives, etc.
long p = 1;
for(int i = 1; i <= n; i++)
{
p = p * i;
}
if (p < 0)
{
throw new ArithmeticException("factorial: Overflow");
}
return p;
}
Alternatively, for languages with positive overflow, since n! > (n - 1)! you could try:
long factorial(int n)
{
//handle special cases like negatives, etc.
if (n == 1 || n == 2) { return n; }
// Calculate (n-1)!
long p = 2;
for(int i = 3; i < n; i++) // Note changes to loop start and end.
{
p = p * i;
}
long previous = p; // (n-1)!
p = p * n; // Calculate n!
if (p < previous)
{
throw new ArithmeticException("factorial: Overflow");
}
return p;
}
Those methods only require one check per factorial.

Which is the best way to implement prime number finding algorithms in Java? How do we make library classes and use then in Java?

I want to make library classes in Java and use them in my future programs. I want these library classes to find prime numbers up to a certain number, or the next prime number; in short, to solve most of the basic tasks related to prime numbers.
I have never made a Java library class, and I aim to learn how by doing this. Please help me with that by pointing out a tutorial or something similar. I am familiar with the NetBeans IDE.
I found out a few algorithms like Sieve of Eratosthenes and Sieve of Atkin. It would be great if you can point out a few more such efficient algorithms. I don't want them to be the best but at least good enough. My aim is to learn few things by implementing them. Because I have little practical coding experience I want to do it to improve my skills.
My friend suggested me to use Stream Classes and he was talking something about implementing it by giving the output of one file as an input to another to make my code clean. I didn't understand him very well. Please pardon me if i said anything wrong. What I want to ask in this point is, is that an efficient and OO way of doing what i want to do. If yes please tell me how to do that and if not please point out some other way to do it.
I have basic knowledge of the Java language. What I want to accomplish through this venture is gain coding experience because that is what everyone out here suggested, "to take up small things like these and learn on my own"
thanks to all of you in advance
regards
shahensha
EDIT:
In the Sieve of Eratosthenes and others we are required to store the numbers from 2 to n in a data structure. Where should I store it? I know I can use a dynamic Collection, but just a small question...If i want to find primes in the order of billions or even more (I will use Big Integer no doubt), but all this will get stored in the heap right? Is there a fear of overflow? Even if it doesn't will it be a good practice? Or would it be better to store the numbers or the list (on which we will perform actions depending on the algorithm we use) in a file and access it there? Sorry if my question was too noobish...

"Sieve of Eratosthenes" is a good algorithm for finding prime numbers. If you search Google you can find ready-made implementations in Java.

I'll add some thoughts to this:
There's nothing technically different about a Library Class, it's simply how you use it. To my mind, the most important thing is that you think hard about your public API. Make it big enough to be useful to your prospective callers, keep it small enough that you have freedom to change the internal implementation as you see fit, and ensure that you have a good understanding of what your library does do and what it doesn't do. Don't try to do everything, just do one thing well. (And the API generally extends to documentation too, so make sure you write decent Javadocs.)
Start with either of these as they are fine. If you design your API well, you can change this at any time and roll out version 1.1 that uses a different algorithm (or even uses JNI to call a native C library), and your callers can just drop in the new JAR and use your code without even recompiling. Don't forget that premature optimisation is the root of all evil; don't worry too much about making your first version fast, but focus on making it correct and making it clean.
I'm not sure why your friend was suggesting streams. Streams are a way of dealing with input and output of raw bytes - useful when reading from files or network connections, but generally not a good way to call another Java method. Your library shouldn't worry about input and output, it just needs to offer some methods for numerical calculations. So you should implement methods that take integers (or whatever is appropriate) and return integers.
For instance, you might implement:
/**
 * Calculates the next prime number after a given point.
 *
 * Implementation detail: callers may assume that prime numbers are
 * calculated deterministically, such that the efficiency of calling
 * this method with a large parameter is not dramatically worse than
 * calling it with a small parameter.
 *
 * @param x The lower bound (exclusive) of the prime number to return.
 * Must be strictly positive.
 * @return Colloquially, the "next" prime number after the given parameter.
 * More formally, this number will be prime and there are no prime numbers
 * less than this value and greater than <code>x</code>.
 * @throws IllegalArgumentException if <code>x</code> is not strictly
 * positive.
 */
public long smallestPrimeGreaterThan(long x);
/**
 * Returns all prime numbers within a given range, in order.
 *
 * @param lowerBound The lower bound (exclusive) of the range.
 * @param upperBound The upper bound (exclusive) of the range.
 * @return A List of the prime numbers that are strictly between the
 * given parameters. This list is in ascending order. The returned
 * value is never null; if no prime numbers exist within the given
 * range, then an empty list is returned.
 */
public List<Long> primeNumbersBetween(long lowerBound, long upperBound);
No streams in sight! Uses of streams, such as outputting to the console, should be handled by applications that use your library and not by your library itself. This is what I meant in my first point about being clear of what your library does and doesn't do. You just generate the prime numbers; it's up to the caller to then do something cool with them.
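To show that such an API can start out very simple behind the scenes, here is a first-version sketch that follows the two signatures above. The class name Primes and the trial-division helper isPrime are assumptions for the example; trial division is the easiest thing to get correct, not the fastest, and could be swapped for a sieve later without changing the callers.

```java
import java.util.ArrayList;
import java.util.List;

public class Primes {
    // Trial division by 2 and then odd numbers up to sqrt(n).
    static boolean isPrime(long n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d <= n / d; d += 2) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static long smallestPrimeGreaterThan(long x) {
        if (x <= 0) throw new IllegalArgumentException("x must be strictly positive");
        long candidate = x + 1;
        while (!isPrime(candidate)) candidate++;
        return candidate;
    }

    // Both bounds are exclusive, as in the Javadoc above.
    public static List<Long> primeNumbersBetween(long lowerBound, long upperBound) {
        List<Long> primes = new ArrayList<>();
        for (long n = Math.max(lowerBound + 1, 2); n < upperBound; n++) {
            if (isPrime(n)) primes.add(n);
        }
        return primes;
    }
}
```

Note that the library code does no input or output at all; printing the results is left to the caller.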

But when you compare, the sieve of Atkin is faster than the sieve of Eratosthenes:
http://en.wikipedia.org/wiki/Prime_number_counting_function Also refer to this link where different functions are explained clearly :)
Good luck..

There is no such thing as a "library class". I suppose you mean making a class in such a way that it does its job in a reusable way. The way to do this is to have a clean interface - with minimal (if any) bindings to other libraries or to your execution environment (your main class etc).
The two you mention are "good enough". For your purpose you don't need to look any further.
Just read from System.in and write to System.out and that's it. Though, in your case, there is nothing to read.
To achieve what I think is your goal, you need to write a main class that handles the execution environment: the main function, initializing your algorithm, iteratively looking for the next prime, and writing it to System.out. Of course, you'll need another class to implement the algorithm. It should contain the internal state and provide a method for finding the next prime.

IMO, keep aside the thought that you're making a library (.jar file according to my interpretation of this question).
Focus on creating a simple Java class first, like this:
// PrimeSieve.java
public class PrimeSieve {
    public static void main(String args[])
    {
        int N = Integer.parseInt(args[0]);
        // initially assume all integers are prime
        boolean[] isPrime = new boolean[N + 1];
        for (int i = 2; i <= N; i++) {
            isPrime[i] = true;
        }
        // mark non-primes <= N using Sieve of Eratosthenes
        for (int i = 2; i*i <= N; i++) {
            // if i is prime, then mark multiples of i as nonprime
            // suffices to consider multiples i*i, i*(i+1), ..., up to N
            if (isPrime[i]) {
                for (int j = i; i*j <= N; j++) {
                    isPrime[i*j] = false;
                }
            }
        }
        // count primes
        int primes = 0;
        for (int i = 2; i <= N; i++) {
            if (isPrime[i]) primes++;
        }
        System.out.println("The number of primes <= " + N + " is " + primes);
    }
}
Now, the next step; Implementing it for larger values, you can always use BigInteger. SO questions pertaining to the same:
Java BigInteger Prime numbers
Problems with java.math.BigInteger
BigNums Implementation
Try reading all questions related to BigInteger class on SO, BigInteger Tagged questions.
Hope this helps.
