From my understanding, a two-dimensional matrix as used in mathematics can be represented in Java with a two-dimensional array. There are of course operations you can perform on mathematical matrices, such as adding and subtracting them. In Java, however, you would need to write code to do that, and there are libraries available that provide that functionality.
What I would like to know though is whether a Java array is even an optimal way of dealing with matrix data. I can think of cases where a 2d matrix has some of its indexes filled in while many are just left blank due to the nature of the data. For me this raises the question whether this is a waste of memory space, especially if the matrix is very large and has a lot of empty indexes.
Do specialized Java math libraries deal with matrices differently, without relying on a 2D array? Or do they use a regular Java array as well and just live with the wasted memory?
A few things:
Avoid building matrices from two-dimensional arrays when performance matters. It is generally preferable to use a one-dimensional array wrapped in a class whose accessors take two parameters (row and column). The reason is purely performance: in Java a two-dimensional array is an array of separately allocated row arrays, whereas a single array is one contiguous chunk of memory. Contiguous storage improves cache locality, which minimizes cache misses and hence boosts performance.
Matrices with many zeros are called sparse matrices. Choosing between a sparse representation and a dense array that stores all the zeros is always a trade-off:
A contiguous (dense) array gives the compiler the opportunity to use vector (SIMD) instructions for addition, subtraction, etc.
A sparse (non-contiguous) representation will be faster if the proportion of zeros is really high and you implement it in a smart way.
I don't believe the Java standard library provides sparse matrices, but I'm sure there are third-party libraries out there. I'm a C++ dev, and I came to this question because I dealt with matrices a lot during my academic life. A famous C++ library with an easy, high-level interface is Armadillo.
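To make the first point concrete, here is a minimal sketch of a matrix backed by a single one-dimensional array with two-parameter accessors. The class name FlatMatrix and its methods are made up for illustration and do not come from any library; row-major order is an assumption.

```java
/** Dense row-major matrix stored in one double[] (hypothetical class, not a library API). */
final class FlatMatrix {
    private final int rows;
    private final int cols;
    private final double[] data; // element (r, c) lives at index r * cols + c

    FlatMatrix(int rows, int cols) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    int rows() { return rows; }

    int cols() { return cols; }

    double get(int r, int c) { return data[index(r, c)]; }

    void set(int r, int c, double value) { data[index(r, c)] = value; }

    /** Element-wise sum, computed with a single loop over the backing arrays. */
    FlatMatrix plus(FlatMatrix other) {
        if (rows != other.rows || cols != other.cols) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        FlatMatrix result = new FlatMatrix(rows, cols);
        for (int i = 0; i < data.length; i++) {
            result.data[i] = data[i] + other.data[i];
        }
        return result;
    }

    /** Translates (row, column) into a position in the flat array, with bounds checks. */
    private int index(int r, int c) {
        if (r < 0 || r >= rows || c < 0 || c >= cols) {
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        }
        return r * cols + c;
    }
}
```

Because plus walks the two backing arrays linearly, it is the kind of loop a JIT compiler may be able to vectorize, though whether that actually happens depends on the JVM.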
Hope this helps.
Related
I have around 1,700 vectors in the form of:
a*x + b*y + c*z
I need to store it inside a memory structure in Java. So far, my idea was either store the data:
Inside 2-dimensional arrays
Inside lists that hold arrays of 3 values
What would be the optimal move here?
The best choice is the one that you can prove to be the best. This means that when dealing with such questions you should profile the different solutions and see which one performs better for your data access pattern.
I see multiple different choices:
have a class Coefficient { double a, b, c; }
use a List<double[]>
use a double[][]
Probably the worst choice would be to wrap them into Double objects since it would place a lot of overhead everywhere.
My guess is that double[][] should be slightly more efficient because the JVM has native instructions for managing arrays, but you won't get the same performance benefit you'd get in other languages, because a two-dimensional array in Java is still an array of arrays, so it's not contiguous in memory.
List<double[]> and double[][] probably behave quite similarly when reading or updating values (assuming you size the list correctly before adding elements), but things may change if you have a lot of insertions or deletions.
In the end just profile the code and check the results.
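For reference, the three options listed above could look roughly like this. It is only a sketch: the class name CoefficientStorageDemo, the evaluate helpers, and the sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CoefficientStorageDemo {
    /** Option 1: one small object per vector a*x + b*y + c*z. */
    static final class Coefficient {
        final double a;
        final double b;
        final double c;

        Coefficient(double a, double b, double c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }

        double evaluate(double x, double y, double z) {
            return a * x + b * y + c * z;
        }
    }

    /** Options 2 and 3: each vector is a double[3] holding {a, b, c}. */
    static double evaluate(double[] abc, double x, double y, double z) {
        return abc[0] * x + abc[1] * y + abc[2] * z;
    }

    public static void main(String[] args) {
        int n = 1_700;
        Coefficient[] asObjects = new Coefficient[n];
        List<double[]> asList = new ArrayList<>(n); // initial capacity avoids regrowth
        double[][] asArray = new double[n][3];      // still an array of n row arrays

        for (int i = 0; i < n; i++) {
            asObjects[i] = new Coefficient(i, 2.0 * i, 3.0 * i);
            asList.add(new double[] {i, 2.0 * i, 3.0 * i});
            asArray[i][0] = i;
            asArray[i][1] = 2.0 * i;
            asArray[i][2] = 3.0 * i;
        }

        // All three hold the same data; only the access syntax differs.
        System.out.println(asObjects[10].evaluate(1, 1, 1));
        System.out.println(evaluate(asList.get(10), 1, 1, 1));
        System.out.println(evaluate(asArray[10], 1, 1, 1));
    }
}
```

With only 1,700 entries the differences are likely to be small, which is one more reason to measure rather than guess.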
I'm curious as to how one would take an unknown amount of, let's just say, ints of user input and store them in an array. I know ArrayList was designed for this, but let's say we can't use it, nor can we ask the user how many inputs they plan on entering.
There is a class ResizableDoubleArray in Apache Commons Math.
The nice thing is that it uses primitives instead of objects. This needs much less memory (interesting for applications with huge arrays).
Similar example:
IntArrayList.java
They all work like ArrayList: when the capacity limit is reached, a new array is allocated that is, e.g., 30% larger. The fast System.arraycopy is used to copy the array contents to the new array.
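If you want to roll your own, the grow-and-copy idea can be sketched in a few lines. The class name GrowableIntArray, the initial capacity of 8, and the roughly 50% growth factor are arbitrary choices for this example, not taken from any of the libraries mentioned.

```java
import java.util.Arrays;

/** Minimal growable array of primitive ints (hypothetical class for illustration). */
final class GrowableIntArray {
    private int[] data = new int[8]; // initial capacity is an arbitrary choice
    private int size = 0;            // number of slots actually in use

    void add(int value) {
        if (size == data.length) {
            // Grow by about 50%, and always by at least one slot.
            int newCapacity = data.length + (data.length >> 1) + 1;
            int[] bigger = new int[newCapacity];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() { return size; }

    /** Returns a copy trimmed to the number of stored values. */
    int[] toArray() { return Arrays.copyOf(data, size); }
}
```

Reading user input then becomes a loop that calls add for each value until the input ends, followed by toArray if a plain int[] is needed.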
See also
Code design: performance vs maintainability
I am using the Mahout API for a Naive Bayes classifier. One of the functions is SparseVectorsFromSequenceFiles, and although I have tried the old Google search, I still do not understand what a sparse vector is.
The closest to an explanation I have is this site, which didn't help me understand, to be honest.
Conceptually, vectors represent a generalization of arrays, i.e. data structures that allow arbitrary access to their elements using an index. Java's built-in arrays, Vector<T> and ArrayList<T> are examples of data structures implementing a "regular" (dense) vector concept.
Dense vectors provide constant-time access to their elements by translating a vector index into a memory address using a simple formula: baseAddress + index * elementSize. This means that the size in memory is proportional to the largest index that the vector needs to support.
This is acceptable when the number of elements that you wish to put in a vector and the highest possible index are relatively close to each other. However, if you wish to use indexes from a wide range to index a relatively small number of elements (say, 1,000 elements scattered across a vector with 100,000 indexes), allocating 100,000 slots is wasteful. You can save memory at the expense of CPU cycles by implementing a data structure that exposes the interface of a vector but uses a smaller amount of memory for its internal representation.
The example at your link shows one possible implementation. Other implementations are possible, depending on the distribution of indexes in your data. If the indexes are distributed randomly, you could use a HashMap<Integer,T> as your backing storage for a sparse vector. If indexes are clustered together, you could split your index space by "pages", and allocate a real array only to pages that you need. This implementation would be similar to the way the physical memory is allocated to virtual memory space.
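As a rough illustration of the HashMap-backed variant, here is a minimal sketch for double values. The class name SparseDoubleVector and its methods are invented for this example and are not Mahout's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Sparse vector that stores only non-zero entries (hypothetical class, not Mahout's). */
final class SparseDoubleVector {
    private final int length;
    private final Map<Integer, Double> nonZero = new HashMap<>();

    SparseDoubleVector(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        this.length = length;
    }

    /** Indexes that were never set read as 0.0, just like a fresh dense array. */
    double get(int index) {
        checkIndex(index);
        return nonZero.getOrDefault(index, 0.0);
    }

    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            nonZero.remove(index); // storing zeros would defeat the purpose
        } else {
            nonZero.put(index, value);
        }
    }

    int length() { return length; }

    /** Number of entries actually held in memory. */
    int storedEntries() { return nonZero.size(); }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Note the trade-off mentioned above: every access goes through hashing and boxing, so lookups cost more CPU than a plain array access, in exchange for memory that scales with the number of stored entries rather than with the highest index.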
Is there a Java array library which supports slicing? I just need regular n x n' x n'' x ... arrays and either taking one slice from given dimension or whole dimension (i.e. no need for ranges).
Notes (read replies to potential comments):
I know that regular Java arrays do not support it, and I'm not willing to write my own slicing library.
Using a Collection-based approach (suggested in a comment on another question) only shifts the problem
Using System.arraycopy does not help in high dimensions, as it does not lower the nesting of loops significantly
This is (sort of, long story) a numerical problem, so an OO approach for the inner code is not necessarily the best one; the most usable abstraction boils down to slicing anyway
I would prefer a R/W view of the slice (if it is only a R/O copy I won't complain though)
EDIT: Unfortunately I need to store objects inside the array, not only doubles.
Vectorz is a vector/matrix library that supports slicing and is a good choice if you are doing numerical work with arrays of double values. It is specifically designed for vector/matrix maths in 3D modelling, gaming, simulation or machine learning contexts.
Advantages:
Very fast (everything backed by primitive doubles and double[] arrays)
100% Pure Java
Supports arbitrary slicing and dicing, mostly as O(1) operations (i.e. no data copying required)
Slices are fully read/write enabled, i.e. you can use them to modify the original structures
You can also join vectors together, take subvector views etc.
Specialised classes for numerical work, e.g. diagonal matrices etc.
It currently supports 0-, 1- and 2-dimensional arrays; higher-dimensional arrays are planned but not yet implemented.
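To show how a slice can be an O(1) read/write view without copying, here is a generic sketch of a strided view over a flat row-major double[]. This is not the Vectorz API; the class name StridedView and its factory methods are made up purely to illustrate the idea.

```java
/** Read/write view onto part of a flat array (hypothetical class, NOT the Vectorz API). */
final class StridedView {
    private final double[] data; // shared with the owner, never copied
    private final int offset;    // position of element 0 of the view
    private final int stride;    // distance between consecutive view elements
    private final int length;

    StridedView(double[] data, int offset, int stride, int length) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.length = length;
    }

    /** View of row r of a row-major matrix with the given number of columns. */
    static StridedView row(double[] data, int cols, int r) {
        return new StridedView(data, r * cols, 1, cols);
    }

    /** View of column c of a row-major rows x cols matrix. */
    static StridedView column(double[] data, int rows, int cols, int c) {
        return new StridedView(data, c, cols, rows);
    }

    double get(int i) { return data[position(i)]; }

    /** Writes through to the underlying array, so the original matrix changes too. */
    void set(int i, double value) { data[position(i)] = value; }

    int length() { return length; }

    private int position(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", length " + length);
        }
        return offset + i * stride;
    }
}
```

Creating a view only stores three ints and a reference, which is why slicing of this kind needs no data copying.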
What is a real-life example of a non-rectangular array and of an N-dimensional array? I know you could use 3-dimensional arrays for gaming, but I'm not sure when you would use more than that. I have also never seen an example of a non-rectangular array.
High-dimensional arrays (3D, 4D, etc.) often arise in dynamic programming algorithms, where they store intermediate results of a larger computation so that the overall result can be assembled from them. For example, the Floyd-Warshall algorithm, when used to compute all-pairs shortest paths, uses a three-dimensional array to cache intermediate values as they are computed. The resulting 3D array is then used to read off the shortest paths between any two nodes in the graph.
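Here is a minimal sketch of that textbook 3D formulation, where dist[k][i][j] is the length of the shortest path from i to j that only passes through intermediate nodes 0..k-1. The class and method names are made up, and the sketch assumes non-negative integer edge weights.

```java
/** Floyd-Warshall with an explicit 3D table (hypothetical class, assumes weights >= 0). */
final class FloydWarshall3D {
    /** Marks a missing edge; halved so that INF + INF does not overflow an int. */
    static final int INF = Integer.MAX_VALUE / 2;

    /**
     * weights[i][j] is the edge weight from i to j, INF if there is no edge,
     * and 0 on the diagonal. Returns the full table dist[k][i][j].
     */
    static int[][][] allLayers(int[][] weights) {
        int n = weights.length;
        int[][][] dist = new int[n + 1][n][n];

        // Layer 0: no intermediate nodes allowed, so only direct edges count.
        for (int i = 0; i < n; i++) {
            System.arraycopy(weights[i], 0, dist[0][i], 0, n);
        }

        // Layer k + 1 additionally allows node k as an intermediate stop.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    int viaK = dist[k][i][k] + dist[k][k][j];
                    dist[k + 1][i][j] = Math.min(dist[k][i][j], viaK);
                }
            }
        }
        return dist;
    }
}
```

The final answers live in the last layer, dist[n]. In practice the table is usually collapsed to two dimensions, since each layer depends only on the previous one.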
Jagged arrays are sometimes used to represent upper-triangular matrices in matrix operations like the QR decomposition or in Gaussian elimination. They also form the basis of some data structures like the exponential array.
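As a small sketch of the upper-triangular case, each row of the jagged array below holds only the entries on and above the diagonal. The class name UpperTriangular and its methods are invented for this example.

```java
/** Upper-triangular n x n matrix stored in a jagged array (hypothetical class). */
final class UpperTriangular {
    // rows[i] holds columns i..n-1, so rows[i].length == n - i.
    private final double[][] rows;

    UpperTriangular(int n) {
        rows = new double[n][];
        for (int i = 0; i < n; i++) {
            rows[i] = new double[n - i];
        }
    }

    /** Entries below the diagonal are not stored and always read as 0.0. */
    double get(int i, int j) {
        return j < i ? 0.0 : rows[i][j - i];
    }

    void set(int i, int j, double value) {
        if (j < i) {
            throw new IllegalArgumentException("cannot write below the diagonal");
        }
        rows[i][j - i] = value;
    }

    /** Total number of stored doubles: n * (n + 1) / 2 instead of n * n. */
    int storedEntries() {
        int total = 0;
        for (double[] row : rows) {
            total += row.length;
        }
        return total;
    }
}
```

For large n this stores roughly half as many values as a full rectangular array.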
Hope this helps!