From my understanding, a 2-dimensional matrix used in mathematics can be represented in Java as a 2-dimensional array. There are, of course, things you can do with real mathematical matrices, such as adding and subtracting them. In Java, however, you would need to write code to do that, and there are libraries available that provide that functionality.
What I would like to know, though, is whether a Java array is even an optimal way of dealing with matrix data. I can think of cases where a 2D matrix has some of its indexes filled in while many are left blank due to the nature of the data. This raises the question of whether that is a waste of memory, especially if the matrix is very large and has many empty indexes.
Do specialized Java math libraries deal with matrices differently and avoid relying on a 2D array? Or do they use a regular Java array as well and just live with the wasted memory?
A few things:
Never create matrices as 2-dimensional arrays. It's always preferable to have a 1-dimensional array with class accessors that take 2 parameters, as sketched below. The reason is purely performance: when you allocate one contiguous chunk of memory, you give the CPU the chance to keep the whole matrix in as few memory pages as possible, which minimizes cache misses and hence boosts performance.
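For illustration, a minimal sketch of such a class, assuming a row-major layout (the class and method names are mine):

// A minimal sketch: a matrix backed by one contiguous 1D array,
// with accessors that take (row, col); row-major layout assumed.
public class Matrix {
    private final int rows, cols;
    private final double[] data;

    public Matrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols]; // one contiguous block
    }

    public double get(int row, int col) {
        return data[row * cols + col]; // translate (row, col) to a flat index
    }

    public void set(int row, int col, double value) {
        data[row * cols + col] = value;
    }
}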
Matrices with many zeros are called sparse matrices. Choosing between a sparse representation and a dense array full of zeros is always a trade-off:
A contiguous array lets the compiler use vector (SIMD) operations for addition, subtraction, and so on.
A non-contiguous, sparse representation will be faster if the relative number of zeros is very high and you implement it in a smart way.
I don't believe the Java standard library provides sparse matrices, but I'm sure there are implementations out there. I'm a C++ dev, and I came to this question because I dealt with matrices a lot during my academic life. A famous C++ library with an easy, high-level interface is Armadillo.
Hope this helps.
I have around 1,700 vectors in the form of:
a*x + b*y + c*z
I need to store them in a memory structure in Java. So far, my idea was to store the data either:
Inside 2-dimensional arrays
Inside lists that hold arrays of 3 values
What would be the optimal move here?
The best choice is the one that you can prove to be the best. This means that when dealing with such questions you should profile the different solutions and see which one performs better with respect to your pattern of data usage.
I see multiple different choices:
have a class Coefficient { double a, b, c; }
use a List<double[]>
use a double[][]
Probably the worst choice would be to wrap the values in Double objects, since the boxing would add a lot of overhead everywhere.
My guess is that double[][] should be slightly more efficient, because the JVM has native instructions for managing arrays, but you won't get the same performance benefit you'd get in other languages: a two-dimensional array in Java is still an array of arrays, so it's not contiguous in memory.
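If contiguity turns out to matter for you, you can flatten the triples into a single array yourself. A minimal sketch (the class and method names are mine):

// Sketch: ~1,700 (a, b, c) triples packed into one flat array,
// so the data stays contiguous in memory; names are illustrative.
public class FlatCoefficients {
    private final double[] coeffs; // layout: [a0, b0, c0, a1, b1, c1, ...]

    public FlatCoefficients(int n) {
        this.coeffs = new double[n * 3];
    }

    public void set(int i, double a, double b, double c) {
        coeffs[3 * i]     = a;
        coeffs[3 * i + 1] = b;
        coeffs[3 * i + 2] = c;
    }

    // evaluates a*x + b*y + c*z for vector i
    public double eval(int i, double x, double y, double z) {
        return coeffs[3 * i] * x + coeffs[3 * i + 1] * y + coeffs[3 * i + 2] * z;
    }
}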
List<double[]> and double[][] probably behave quite similarly with respect to reading or updating values, but things may change if you have a lot of insertions or deletions (assuming you size the list correctly before adding elements).
In the end just profile the code and check the results.
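If you haven't profiled JVM code before, JMH is the usual harness for this kind of micro-benchmark. A minimal sketch (JMH is a separate dependency, and the class and method names here are illustrative):

import java.util.ArrayList;
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Compares summing 1,700 coefficient triples stored as double[][]
// versus List<double[]>; run through the JMH runner as usual.
@State(Scope.Thread)
public class StorageBenchmark {
    double[][] matrix;
    List<double[]> list;

    @Setup
    public void setup() {
        matrix = new double[1700][3];
        list = new ArrayList<>(1700);
        for (double[] row : matrix) {
            list.add(row);
        }
    }

    @Benchmark
    public double sumMatrix() {
        double s = 0;
        for (double[] row : matrix) s += row[0] + row[1] + row[2];
        return s;
    }

    @Benchmark
    public double sumList() {
        double s = 0;
        for (double[] row : list) s += row[0] + row[1] + row[2];
        return s;
    }
}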
I am using the Mahout API for a Naive Bayes Classifier. One of the functions is SparseVectorsFromSequenceFiles, and although I have tried the old Google search, I still do not understand what a sparse vector is.
The closest thing to an explanation I have found is this site, which, to be honest, didn't help me understand it.
Conceptually, vectors are a generalization of arrays, i.e. data structures that allow arbitrary access to their elements using an index. Java's built-in arrays, Vector<T>, and ArrayList<T> are examples of data structures implementing a "regular" (dense) vector concept.
Dense vectors provide constant-time access to their elements by translating a vector index into a memory address using the simple formula baseAddress + index * elementSize. This means that the size in memory is proportional to the largest index that the vector needs to support.
This is acceptable when the number of elements you wish to put in the vector and the highest possible index are relatively close to each other. However, if you wish to use indexes from a wide range to address a relatively small number of elements (say, 1,000 elements scattered across a vector with 100,000 indexes), allocating 100,000 slots is wasteful. You can save memory, at the expense of CPU cycles, by implementing a data structure that exposes the interface of a vector but uses a smaller amount of memory for its internal representation.
The example at your link shows one possible implementation. Other implementations are possible, depending on the distribution of indexes in your data. If the indexes are distributed randomly, you could use a HashMap<Integer,T> as your backing storage for a sparse vector. If indexes are clustered together, you could split your index space by "pages", and allocate a real array only to pages that you need. This implementation would be similar to the way the physical memory is allocated to virtual memory space.
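To make the idea concrete, here is a minimal sketch of a HashMap-backed sparse vector (the class and method names are mine, not Mahout's API):

import java.util.HashMap;
import java.util.Map;

// Sketch: a sparse vector that only stores non-zero entries,
// trading constant-factor CPU cost for memory savings.
public class SparseVector {
    private final int size; // logical length, e.g. 100,000
    private final Map<Integer, Double> entries = new HashMap<>();

    public SparseVector(int size) {
        this.size = size;
    }

    public void set(int index, double value) {
        if (value == 0.0) {
            entries.remove(index); // zeros are simply not stored
        } else {
            entries.put(index, value);
        }
    }

    public double get(int index) {
        return entries.getOrDefault(index, 0.0); // absent entries read as zero
    }

    public int size() {
        return size;
    }
}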
I'm looking to do some element-wise operations (addition, multiplication, sqrt, etc.) on floating-point arrays that are ~800x300 elements in size.
How much of a speedup (if any) would I get from doing this with matrix libraries (JAMA, EJML, etc.) over just doing the element-wise operations in for loops?
For loops look more appealing because my equations can get kind of complicated, and for loops would mean I could keep all my equations as is, in plain old infix notation. Since Java doesn't support operator overloading, using a matrix library wouldn't be as simple. So I only want to use a matrix library if it's going to mean a real speedup. (Speed will be important here.)
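For concreteness, this is the kind of loop I mean (the equation itself is made up):

// A made-up element-wise equation, kept in plain infix notation.
float[][] a = new float[800][300];
float[][] b = new float[800][300];
float[][] out = new float[800][300];

for (int i = 0; i < a.length; i++) {
    for (int j = 0; j < a[i].length; j++) {
        out[i][j] = (float) Math.sqrt(a[i][j] * a[i][j] + 2f * b[i][j] + 1f);
    }
}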
I would suggest using one of the matrix libraries for that. In most cases it should run as fast as simple for loops, but it can also run faster. So, what you get for free: an API and equal or better performance. It also saves you a bit of time writing the element-wise operations.
As the author of the la4j library, I can say that using a third-party library gives you the opportunity to get faster and faster code from new releases. For example, you could choose la4j for your needs. It currently (versions 0.4.0-0.4.5) uses simple for-loop calculations for element-wise operations, so it won't be faster than hand-written code. But I'm now in the middle of developing a new parallel engine for la4j that allows code to run in parallel mode without any significant changes to the API. Like this:
Matrix a = new Basic2DMatrix(...); // a simple 2D-array-backed matrix
Matrix b = new Basic2DMatrix(...); // and so is this one
Matrix c = a.multiply(b);          // a * b in sequential mode
Matrix d = a.par().multiply(b);    // a * b in parallel mode
So all you need to do is change one piece of the code. All these advantages come for free with libraries like la4j. Just let the library do its job and spend your time solving real problems.
As an optional assignment, I'm thinking about writing my own implementation of the BigInteger class, where I will provide my own methods for addition, subtraction, multiplication, etc.
This will be for arbitrarily long integer numbers, even hundreds of digits long.
While doing the math on these numbers digit by digit isn't hard, what do you think the best data structure would be to represent my "BigInteger"?
At first I was considering using an array, but then I realized I could still potentially overflow (run out of array slots) after a large addition or multiplication. Would this be a good case for a linked list, since I can tack on digits in O(1) time?
Is there some other data structure that would be even better suited than a linked list? Should the type that my data structure holds be the smallest possible integer type available to me?
Also, should I be careful about how I store my "carry" variable? Should it, itself, be of my "BigInteger" type?
Check out the book C Interfaces and Implementations by David R. Hanson. It has 2 chapters on the subject, covering the vector structure, word size and many other issues you are likely to encounter.
It's written for C, but most of it is applicable to C++ and/or Java. And if you use C++ it will be a bit simpler because you can use something like std::vector to manage the array allocation for you.
Always use the smallest int type that will do the job you need (bytes). A linked list should work well, since you won't have to worry about overflowing.
If you use binary trees (whose leaves are ints), you get all the advantages of the linked list (unbounded number of digits, etc.) with simpler divide-and-conquer algorithms. In this case you do not have a single base but many, depending on the level at which you're working.
If you do this, you need to use a BigInteger for the carry. You may consider it an advantage of the "linked list of ints" approach that the carry can always be represented as an int (and this is true for any base, not just the base 10 that most answers seem to assume you should use; in any base, the carry is always a single digit).
I might as well say it: it would be a terrible waste to use base 10 when you can use 2^30 or 2^31.
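To illustrate, a sketch of addition over base-2^30 digits stored little-endian in an int array; note the carry never needs more than an int (the method name and digit layout are my own assumptions):

// Sketch: add two non-negative numbers stored as little-endian
// base-2^30 digits; each int holds one digit in [0, 2^30).
static int[] add(int[] x, int[] y) {
    final int BASE_BITS = 30;
    final int MASK = (1 << BASE_BITS) - 1;
    int n = Math.max(x.length, y.length);
    int[] result = new int[n + 1]; // +1 for a possible final carry
    int carry = 0;                 // always 0 or 1 during addition
    for (int i = 0; i < n; i++) {
        int xi = i < x.length ? x[i] : 0;
        int yi = i < y.length ? y[i] : 0;
        int sum = xi + yi + carry; // at most 2^31 - 1, so it fits in an int
        result[i] = sum & MASK;    // low 30 bits are the digit
        carry = sum >>> BASE_BITS; // the rest is the carry
    }
    result[n] = carry; // may leave a leading zero digit; trim if desired
    return result;
}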
Accessing elements of linked lists is slow. I think arrays are the way to go, with lots of bounds checking and run-time array resizing as needed.
Clarification: traversing a linked list and traversing an array are both O(n) operations, but traversing a linked list requires dereferencing a pointer at each step. Just because two algorithms have the same complexity doesn't mean they take the same time to run. The overhead of allocating and deallocating n nodes in a linked list will also be much heavier than the memory management of a single array of size n, even if the array has to be resized a few times.
Wow, there are some… interesting answers here. I'd recommend reading a book rather than try to sort through all this contradictory advice.
That said, C/C++ is also ill-suited to this task. Big-integer arithmetic is a kind of extended-precision math. Most CPUs provide instructions to handle extended-precision math at comparable or the same speed (bits per instruction) as normal math. When you add 2^31 + 2^31 in a 32-bit register, the answer wraps to 0, but there is also a special carry output from the processor's ALU which a program can read and use.
C++ cannot access that flag, and there's no way in C either. You have to use assembler.
Just to satisfy curiosity: you can use standard Boolean arithmetic to recover carry bits and so on, but you will be much better off downloading an existing library.
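For the curious, here is what that looks like in Java, where long arithmetic wraps on overflow (a sketch; the bit trick is the standard majority formula for the carry-out of the top bit):

// Sketch: recover the carry-out of an unsigned 64-bit addition
// using only Boolean and shift operations.
static long addWithCarry(long a, long b, long[] carryOut) {
    long sum = a + b; // wraps modulo 2^64
    // carry-out of bit 63 = (a & b) | ((a | b) & ~sum), taken at the top bit
    carryOut[0] = ((a & b) | ((a | b) & ~sum)) >>> 63;
    return sum;
}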
I would say an array of ints.
An array is indeed a natural fit. I think it is acceptable to throw an OverflowException when you run out of space in memory. The teacher will see attention to detail.
A multiplication roughly doubles the number of digits; an addition increases it by at most 1. It is easy to create a sufficiently big array to store the result of your operation.
The carry is at most a one-digit number in multiplication (9*9 = 81: write digit 1, carry 8). A single int will do.
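For example, a schoolbook multiplication sketch over little-endian base-10 digit arrays; the result array is sized len(x) + len(y) up front, and the running carry stays within 0-9 (the method name and digit layout are my own assumptions):

// Sketch: schoolbook multiplication of little-endian base-10 digits.
// A product of n-digit and m-digit numbers has at most n + m digits.
static int[] multiply(int[] x, int[] y) {
    int[] result = new int[x.length + y.length]; // big enough by construction
    for (int i = 0; i < x.length; i++) {
        int carry = 0;
        for (int j = 0; j < y.length; j++) {
            int t = result[i + j] + x[i] * y[j] + carry; // at most 9 + 81 + 9 = 99
            result[i + j] = t % 10; // keep one digit
            carry = t / 10;         // carry is at most 9, a single digit
        }
        result[i + y.length] = carry; // this slot is still zero here
    }
    return result; // may contain a leading zero digit; trim if desired
}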
std::vector<bool> or std::vector<unsigned int> is probably what you want. You will have to push_back() or resize() on them as you need more space for multiplies, etc. Also, remember to push back the correct sign bits if you're using two's complement.
I would say a std::vector of char (since it only has to hold 0-9), if you plan to work in BCD.
If not BCD, then use a vector of int (you didn't make it clear).
Much less space overhead than a linked list.
And all the advice says "use vector unless you have a good reason not to".
As a rule of thumb, use std::vector instead of std::list, unless you need to insert elements in the middle of the sequence very often. Vectors tend to be faster, since they are stored contiguously and thus benefit from better spatial locality (a major performance factor on modern platforms).
Make sure you use elements that are natural for the platform. If you want to be platform independent, use long. Remember that unless you have some special compiler intrinsics available, you'll need a type at least twice as large to perform multiplication.
I don't understand why you'd want carry to be a big integer. Carry is a single bit for addition and element-sized for multiplication.
Make sure you read Knuth's The Art of Computer Programming; algorithms pertaining to arbitrary-precision arithmetic are described there to a great extent.