Aye it's been done a million times before, but damnit I want to do it again. I'm writing a simple Matrix Library for C++ with the intention of doing it right. I've come across something that's fairly obvious in mathematics, but not so obvious to a strongly typed system -- the fact that a 1x1 matrix is just a number. To avoid this, I started walking down the hairy path of matrices as a composition of vectors, but also stumbled upon the fact that two vectors multiplied together could either be a number or a dyad, depending on the orientation of the two.
My question is, what is the right way to deal with this situation in a strongly typed language like C++ or Java?
something that's fairly obvious in mathematics, but not so obvious to a strongly typed system -- the fact that a 1x1 matrix is just a number.
That's arguable. A hardcore mathematician (I'm not one) would probably argue against it; he would say that a 1x1 matrix can be regarded as isomorphic (or something like that) to a scalar, but that they are conceptually different things. Only in some informal sense is "a 1x1 matrix a scalar" (similar to, though stronger than, the claim that a complex number without an imaginary part "is a real").
I don't think that correspondence should be reflected in a strongly typed language. And I don't think it is, in typical implementations (of complex numbers or matrices), e.g. Java's Apache Commons Math. For example, a Complex with zero imaginary part is not a Number (from the type point of view - they cannot be cast to one another).
In the case of matrices, the correspondence is even more disputable. Should we be able to multiply two matrices of sizes (4x3) x (1x1)? If we regard the second as a scalar, it's valid, but not as a matrix, since it violates the restriction on matrix dimensions for multiplication. And I believe Commons sticks to that.
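For what it's worth, here is a minimal sketch (assuming Commons Math 3's linear package) of how that restriction shows up in practice: multiplying a 4x3 matrix by a 1x1 matrix is rejected at runtime, and scaling has to go through a separate scalar method.

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;

public class DimensionDemo {
    public static void main(String[] args) {
        RealMatrix a = new Array2DRowRealMatrix(4, 3);  // 4x3 matrix of zeros
        RealMatrix b = new Array2DRowRealMatrix(1, 1);  // 1x1 matrix, not a scalar
        // a.multiply(b) throws DimensionMismatchException (3 != 1);
        // scalar scaling goes through a dedicated method instead:
        RealMatrix c = a.scalarMultiply(b.getEntry(0, 0));
        System.out.println(c.getRowDimension() + "x" + c.getColumnDimension());  // 4x3
    }
}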
In a weakly typed language (e.g. Matlab) it would be another story.
If you aren't worried about SIMD optimisations and the like, then I would have thought the best way would be to set up a templated tensor. Choose your maximum tensor dimensions and then you can do things like this:
typedef Tensor3D< float, 4, 1, 1 > Vector4;
And so forth. The mathematics, if implemented correctly, will just work with all forms of "matrix" and "vector". Both are, after all, just special cases of tensors.
Edit: knowing the size of a template is actually pretty easy. Add in a GetRows() etc. function and you can return the value you pass into the template at instantiation.
i.e.
template< typename T, int rows, int cols > class Tensor2D
{
public:
    int GetRows() const { return rows; }
    int GetCols() const { return cols; }
};
My advice? Don't worry about the 1x1 case and sleep at night. You shouldn't be worried about any users suddenly deciding to use your library to model a bunch of numbers as 1x1 matrices and complaining about your implementation.
No one who solves these problems will be so foolish. If you're smart enough to use matrices, you're smart enough to use them properly.
As for all the permutations that scalars introduce, I'd say that you must account for them. As a matrix library user, I'd expect to be able to multiply two matrices together to get another matrix, a matrix by a (column or row) vector to get a vector result, and a scalar by a matrix to get another matrix.
If I multiply two vectors I can get a scalar (inner product) or a matrix (outer product), depending on their orientation. Your library had better give me both.
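To make that concrete, here is a minimal sketch (using Commons Math 3 as an illustration, not the asker's own library) of how a typed API can distinguish the two products by return type: the inner product comes back as a plain double, the outer product as a matrix.

import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class VectorProducts {
    public static void main(String[] args) {
        RealVector u = new ArrayRealVector(new double[] {1, 2, 3});
        RealVector v = new ArrayRealVector(new double[] {4, 5, 6});

        double inner = u.dotProduct(v);        // scalar result: 32.0
        RealMatrix outer = u.outerProduct(v);  // 3x3 matrix result

        System.out.println(inner);
        System.out.println(outer.getRowDimension() + "x" + outer.getColumnDimension());
    }
}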
It's not trivial. It's been done "right" by others, but kudos to working it through for yourself.
I would like to create two models of binary prediction: one with the cut point strictly greater than 0.5 (in order to obtain fewer but better signals) and a second with the cut point strictly less than 0.5.
When doing cross-validation, we get a test error tied to a cut point of 0.5. How can I do it with another cut value? I am talking about XGBoost for Java.
xgboost returns a list of scores. You can do whatever you want with that list of scores.
I think that, particularly in Java, it returns a 2D array of shape (1, n).
In binary prediction you probably used a logistic function, so your scores will be between 0 and 1.
Take your scores object and write a custom function that will calculate new predictions according to the rules you've described.
If you are using an automated / xgboost-implemented cross-validation function, you might want to build a customized evaluation function which will do as you bid, and pass it as an argument to xgb.cv.
If you want to be smart when setting your threshold, I suggest reading about the AUC of the ROC curve and the precision-recall curve.
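As a rough illustration, here is a minimal sketch assuming the XGBoost4J API (where Booster.predict(DMatrix) returns a float[][] of scores); the method name and the 0.7 cut point are made up for the example.

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class ThresholdDemo {
    // Turn raw scores into 0/1 labels using a custom cut point instead of 0.5.
    static int[] predictWithCutPoint(Booster booster, DMatrix test, float cutPoint)
            throws XGBoostError {
        float[][] scores = booster.predict(test);  // one score per row for binary:logistic
        int[] labels = new int[scores.length];
        for (int i = 0; i < scores.length; i++) {
            labels[i] = scores[i][0] > cutPoint ? 1 : 0;
        }
        return labels;
    }
}

You would call it as predictWithCutPoint(booster, testMatrix, 0.7f) for the "fewer but better signals" model, and with a lower value for the other one.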
Is there a legitimate use case for AffineTransform.getScaleX() and family?
It returns m00, the top-left element of the transform matrix, and that is pretty useless for determining the scaling the matrix applies. Consider for example a trivial 90 degree rotation matrix: both getScaleX() and getScaleY() return 0.
I would never use this call, as it is confusing to a code reader who may not be familiar with the fact that it does not return scaling in a meaningful way. It is much better to call getMatrix(m) and then access m[0], because most people likely to read code that uses transformations are familiar with matrix math.
There must be a use case for this, but I just don't get it.
As you wrote, the notion of the x scaling factor is quite meaningless in the general case. It is only relevant for the special case of scaling matrices, i.e. those for which getType returns TYPE_GENERAL_SCALE or TYPE_UNIFORM_SCALE. (A translation could be added, too.) In other cases, getScaleX is "correct" in the sense that it does what the docs say, but useless and misleading given its name. Analogous reasoning applies to getShearX, which only makes sense in the context of shearing matrices. getTranslateX is a bit different, since one could argue that this is where the origin gets translated to, regardless of all other transformations the matrix implies.
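A small sketch of the difference (plain java.awt.geom, just to illustrate the point about rotations):

import java.awt.geom.AffineTransform;

public class ScaleDemo {
    public static void main(String[] args) {
        AffineTransform rot = AffineTransform.getRotateInstance(Math.PI / 2);

        // getScaleX() just returns m00, which is (essentially) 0 for a 90-degree rotation...
        System.out.println(rot.getScaleX());

        // ...while the length of the first column of the matrix gives the factor by
        // which unit vectors along x are actually stretched.
        double[] m = new double[6];
        rot.getMatrix(m);  // {m00, m10, m01, m11, m02, m12}
        System.out.println(Math.hypot(m[0], m[1]));  // 1.0
    }
}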
I'm doing some video processing; for each frame I need to compute the gradient of a bivariate function.
The function is represented as a two-dimensional array of doubles, where the domain is the row and column indices and the range is the double value stored at those indices. Or, more simply put, the function f is defined for a double[][] matrix as:
f(x,y)=matrix[x][y]
I'm trying to use the Apache Commons Math library for it:
SmoothingPolynomialBicubicSplineInterpolator interpolator = new SmoothingPolynomialBicubicSplineInterpolator();
BicubicSplineInterpolatingFunction f = interpolator.interpolate(xs, ys, matrix.getData());
double[][] ans = new double[matrix.getRowDimension()][matrix.getColumnDimension()];
for (int i = 0; i < ans.length; i++) {
    for (int j = 0; j < ans[0].length; j++) {
        ans[i][j] = f.partialDerivativeY(i, j);
    }
}
with xs being a sorted array of the x indices (0, 1, ..., matrix.getRowDimension() - 1), and
ys being the same for the column dimension (0, 1, ..., matrix.getColumnDimension() - 1).
The problem is that for a typical matrix of size 150x80 it takes as much as 1.4 seconds to run, which renders it completely unusable for my needs. So, as a novice user of this library, and of programmatic numerical analysis in general, I want to know:
Am I doing something wrong?
Is there another, faster way I can accomplish this task?
Is there another open source library (preferably maven-friendly) that offers a solution?
Numerical differentiation is an entire topic unto itself; a simple Google search should bring up enough material for you to work with (the Wikipedia article alone might be sufficient). There are parameters of your problem that I cannot know, so I can only speak broadly here, but there are direct methods of determining the gradient at a given point, i.e. ones that don't require an interpolation. See Wikipedia for the formulae (ranging from the simple f(x+1)-f(x), i.e. the forward difference with h=1, to the higher order ones). Calculating the partial derivatives is then a simple O(NM) loop with an uber easy formula inside (no interpolation required).
The specifics can get gritty:
The higher order formulae need to be reduced for the edges, or discarded altogether.
Your precise speed requirements might render more complex formulae useless (depending on the platform sometimes the lookup times for higher order formulae make them too slow; again, it depends on the cache etc.). This is easy to test, the formulae are simple; code them and benchmark.
The specific implementation also depends on your error requirements. The theory provides error bounds, so that will play a role in which formula you need; but again, there's a trade-off with the speed requirements. These in turn can be relaxed in practice if you know specifics about the types of matrices you'll be processing, if such a thing is known.
The implementation can be made even easier (and maybe faster) if you have existing convolution tools, since this method is really just a convolution of the matrix (note: technically it's called a cross-correlation).
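To make the direct method concrete, here is a minimal sketch (central differences in the interior, one-sided differences at the edges, grid spacing h = 1 since the domain is the array indices); it mirrors the partialDerivativeY loop from the question without any interpolation.

public class Gradient {
    // Assumes a rectangular (non-ragged) matrix with at least two columns.
    static double[][] partialDerivativeY(double[][] f) {
        int rows = f.length, cols = f[0].length;
        double[][] dY = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (j == 0) {
                    dY[i][j] = f[i][1] - f[i][0];                  // forward difference
                } else if (j == cols - 1) {
                    dY[i][j] = f[i][cols - 1] - f[i][cols - 2];    // backward difference
                } else {
                    dY[i][j] = (f[i][j + 1] - f[i][j - 1]) / 2.0;  // central difference
                }
            }
        }
        return dY;
    }
}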
As an optional assignment, I'm thinking about writing my own implementation of the BigInteger class, where I will provide my own methods for addition, subtraction, multiplication, etc.
This will be for arbitrarily long integer numbers, even hundreds of digits long.
While doing the math on these numbers digit by digit isn't hard, what do you think the best data structure would be to represent my "BigInteger"?
At first I was considering using an Array but then I was thinking I could still potentially overflow (run out of array slots) after a large add or multiplication. Would this be a good case to use a linked list, since I can tack on digits with O(1) time complexity?
Is there some other data structure that would be even better suited than a linked list? Should the type that my data structure holds be the smallest possible integer type available to me?
Also, should I be careful about how I store my "carry" variable? Should it, itself, be of my "BigInteger" type?
Check out the book C Interfaces and Implementations by David R. Hanson. It has 2 chapters on the subject, covering the vector structure, word size and many other issues you are likely to encounter.
It's written for C, but most of it is applicable to C++ and/or Java. And if you use C++ it will be a bit simpler because you can use something like std::vector to manage the array allocation for you.
Always use the smallest int type that will do the job you need (bytes). A linked list should work well, since you won't have to worry about overflowing.
If you use binary trees (whose leaves are ints), you get all the advantages of a linked list (unbounded number of digits, etc.) with simpler divide-and-conquer algorithms. In this case you do not have a single base but many, depending on the level at which you're working.
If you do this, you need to use a BigInteger for the carry. You may consider it an advantage of the "linked list of ints" approach that the carry can always be represented as an int (and this is true for any base, not just base 10, which most answers seem to assume you should use... in any base, the carry is always a single digit).
I might as well say it: it would be a terrible waste to use base 10 when you can use 2^30 or 2^31.
Accessing elements of linked lists is slow. I think arrays are the way to go, with lots of bound checking and run time array resizing as needed.
Clarification: traversing a linked list and traversing an array are both O(n) operations, but traversing a linked list requires dereferencing a pointer at each step. Just because two algorithms have the same complexity, it doesn't mean they take the same time to run. The overhead of allocating and deallocating n nodes in a linked list will also be much heavier than the memory management of a single array of size n, even if the array has to be resized a few times.
Wow, there are some… interesting answers here. I'd recommend reading a book rather than try to sort through all this contradictory advice.
That said, C/C++ is also ill-suited to this task. Big-integer arithmetic is a kind of extended-precision math. Most CPUs provide instructions to handle extended-precision math at a comparable or the same speed (bits per instruction) as normal math. When you add 2^31+2^31 in a 32-bit register, the answer is 0… but there is also a special carry output from the processor's ALU which a program can read and use.
C++ cannot access that flag, and there's no way in C either. You have to use assembler.
Just to satisfy curiosity, you can use standard Boolean arithmetic to recover carry bits, etc. But you will be much better off downloading an existing library.
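For the curious, one way to recover the carry in pure Java (a sketch of that "Boolean arithmetic" idea, not something you'd need if you use a library):

public class CarryDemo {
    static int addWithCarry(int a, int b, int[] carryOut) {
        int sum = a + b;  // wraps modulo 2^32
        // A carry occurred iff the unsigned sum is smaller than either operand.
        carryOut[0] = (Integer.compareUnsigned(sum, a) < 0) ? 1 : 0;
        return sum;
    }

    public static void main(String[] args) {
        int[] carry = new int[1];
        int sum = addWithCarry(0x80000000, 0x80000000, carry);  // 2^31 + 2^31
        System.out.println(sum + " carry=" + carry[0]);         // prints "0 carry=1"
    }
}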
I would say an array of ints.
An array is indeed a natural fit. I think it is acceptable to throw an OverflowException when you run out of space in memory. The teacher will see attention to detail.
A multiplication roughly doubles the number of digits, while an addition increases it by at most 1. It is easy to create a sufficiently big array to store the result of your operation.
The carry is at most a one-digit number in multiplication (9*9 = 81, i.e. digit 1, carry 8). A single int will do.
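A minimal sketch of that idea, grade-school addition on little-endian digit arrays in base 10 (just for illustration; a larger base is better in practice, as other answers note):

public class DigitAdd {
    // a and b store the least significant digit first.
    static int[] add(int[] a, int[] b) {
        int n = Math.max(a.length, b.length);
        int[] result = new int[n + 1];  // one extra slot for a possible final carry
        int carry = 0;
        for (int i = 0; i < n; i++) {
            int da = i < a.length ? a[i] : 0;
            int db = i < b.length ? b[i] : 0;
            int sum = da + db + carry;
            result[i] = sum % 10;
            carry = sum / 10;  // always 0 or 1 here
        }
        result[n] = carry;
        return result;
    }
}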
std::vector<bool> or std::vector<unsigned int> is probably what you want. You will have to push_back() or resize() on them as you need more space for multiplies, etc. Also, remember to push_back the correct sign bits if you're using two's complement.
I would say a std::vector of char (since it only has to hold 0-9), if you plan to work in BCD.
If not BCD, then use a vector of int (you didn't make it clear).
Much less space overhead than a linked list.
And all the standard advice says 'use vector unless you have a good reason not to'.
As a rule of thumb, use std::vector instead of std::list, unless you need to insert elements in the middle of the sequence very often. Vectors tend to be faster, since they are stored contiguously and thus benefit from better spatial locality (a major performance factor on modern platforms).
Make sure you use elements that are natural for the platform. If you want to be platform independent, use long. Remember that unless you have some special compiler intrinsics available, you'll need a type at least twice as large to perform multiplication.
I don't understand why you'd want carry to be a big integer. Carry is a single bit for addition and element-sized for multiplication.
Make sure you read Knuth's The Art of Computer Programming; the algorithms pertaining to arbitrary-precision arithmetic are described there at great length.
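As a sketch of why the double-width type matters: with base-2^32 limbs stored in ints, the product of two limbs plus a carry always fits in 64 bits, so a Java long (treated as unsigned) can hold it exactly. The method below multiplies a little-endian limb array by a single limb; it is illustrative, not a full multiplication routine.

public class LimbMultiply {
    static int[] multiplyBySingleLimb(int[] limbs, int factor) {
        int[] result = new int[limbs.length + 1];
        long carry = 0;
        for (int i = 0; i < limbs.length; i++) {
            long product = (limbs[i] & 0xFFFFFFFFL) * (factor & 0xFFFFFFFFL) + carry;
            result[i] = (int) product;  // low 32 bits
            carry = product >>> 32;     // high 32 bits become the next carry
        }
        result[limbs.length] = (int) carry;
        return result;
    }
}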
We were just assigned a new project in my data structures class -- Generating text with markov chains.
Overview
Given an input text file, we create an initial seed of length n characters. We add that to our output string and choose our next character based on frequency analysis.
This is the cat and there are two dogs.
Initial seed: "Th"
Possible next letters -- i, e, e
Therefore, probability of choosing i is 1/3, e is 2/3.
Now, say we choose i. We add "i" to the output string. Then our seed becomes "hi" and the process continues.
My solution
I have 3 classes: Node, ConcreteTrie, and Driver.
Of course, the ConcreteTrie class isn't a Trie in the traditional sense. Here is how it works:
Given the sentence with k=2:
This is the cat and there are two dogs.
I generate Nodes Th, hi, is, ... + ... , gs, s.
Each of these nodes has children that are the letters that follow them. For example, Node Th would have children i and e. I maintain counts in each of those nodes so that I can later generate the probabilities for choosing the next letter.
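For reference, here is a minimal map-based sketch of the counting idea (not my actual Node/ConcreteTrie classes; names are illustrative): seed -> (next character -> count).

import java.util.HashMap;
import java.util.Map;

public class SeedCounts {
    static Map<String, Map<Character, Integer>> build(String text, int k) {
        Map<String, Map<Character, Integer>> counts = new HashMap<>();
        for (int i = 0; i + k < text.length(); i++) {
            String seed = text.substring(i, i + k);   // the k-character seed
            char next = text.charAt(i + k);           // the character that follows it
            counts.computeIfAbsent(seed, s -> new HashMap<>())
                  .merge(next, 1, Integer::sum);
        }
        return counts;
    }
}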
My question:
First of all, what is the most efficient way to complete this project? My solution seems to be very fast, but I really want to knock my professor's socks off. (On my last project, a variation of the edit distance problem, I did A*, a genetic algorithm, BFS, and simulated annealing -- and I know that the problem is NP-hard.)
Second, what's the point of this assignment? It doesn't really seem to relate to much of what we've covered in class. What are we supposed to learn?
On the relevance of this assignment to what you covered in class (your second question): the idea of a 'data structures' class is to expose students to the very many structures frequently encountered in CS -- lists, stacks, queues, hashes, trees of various types, graphs at large, matrices of various creed and greed, etc. -- and to provide some insight into their common implementations, their strengths and weaknesses and, generally, their various fields of application.
Since most any game / puzzle / problem can be mapped to some set of these structures, there is no lack of subjects upon which to base lectures and assignments. Your class seems interesting because while keeping some focus on these structures, you are also given a chance to discover real applications.
For example, in a thinly disguised fashion, the "cat and two dogs" thing is an introduction to statistical models applied to linguistics. Your curiosity and motivation prompted you to make the connection with Markov models, and that's a good thing, because chances are you'll meet "Markov" a few more times before graduation ;-) and certainly in a professional life in CS or a related domain. So, yes! It may seem that you're butterflying around many applications etc., but so long as you get a feel for which structures and algorithms to select in particular situations, you're not wasting your time!
Now, a few hints on possible approaches to the assignment
The trie seems like a natural fit for this type of problem. Maybe you can ask yourself, however, how this approach would scale if you had to index, say, a whole book rather than this short sentence. It seems mostly linear, although this depends on how each choice is made along the hops in the trie (for this 2nd-order Markov chain): as the number of choices increases, picking a path may become less efficient.
A possible alternative storage for building the index is a stochastic matrix (actually a 'plain', if sparse, matrix during the statistics-gathering process, turned stochastic at the end when you normalize each row - or column, depending on how you set it up - to sum to one (100%)). Such a matrix would be roughly 729 x 28, and would allow the indexing, in one single operation, of a two-letter tuple and its associated following letter. (I got 28 by including the "start" and "stop" signals; details...)
The cost of this more efficient indexing is the use of extra space. Space-wise, the trie is very efficient, storing only the combinations of letter triplets effectively in existence; the matrix, however, wastes some space (you can bet it will end up very sparsely populated, even after indexing much more text than the "dog/cat" sentence).
This size vs. CPU compromise is very common, although some algorithms/structures are sometimes better than others on both counts... Furthermore, the matrix approach wouldn't scale nicely, size-wise, if the problem were changed to base the choice of letters on the preceding, say, three characters.
Nonetheless, maybe look into the matrix as an alternate implementation. It is very much in the spirit of this class to try various structures and see why/where they are better than others (in the context of a specific task).
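A minimal sketch of that alternative (assuming the text has already been lowercased and reduced to letters and spaces; a toy 27-symbol alphabet, without the start/stop markers mentioned above):

public class StochasticMatrix {
    static final int ALPHABET = 27;  // 'a'..'z' plus space

    static int code(char c) { return c == ' ' ? 26 : c - 'a'; }

    static double[][] build(String text) {
        // Rows index two-letter seeds (27*27 = 729), columns index the following letter.
        double[][] m = new double[ALPHABET * ALPHABET][ALPHABET];
        for (int i = 0; i + 2 < text.length(); i++) {
            int row = code(text.charAt(i)) * ALPHABET + code(text.charAt(i + 1));
            m[row][code(text.charAt(i + 2))]++;
        }
        for (double[] row : m) {  // normalize each row to sum to 1 (making it stochastic)
            double total = 0;
            for (double v : row) total += v;
            if (total > 0) for (int j = 0; j < row.length; j++) row[j] /= total;
        }
        return m;
    }
}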
A small side trip you can take is to create a tag cloud based on the probabilities of the letter pairs (or triplets): both the trie and the matrix contain all the data necessary for that; the matrix, with all its interesting properties, may be more suited for this.
Have fun!
You are using a bigram approach with characters, but it is usually applied to words, because the output will be more meaningful when using just a simple generator, as in your case.
1) From my point of view, you are doing it right. But maybe you should try to slightly randomize the selection of the next node? E.g. select a random node from the 5 highest. I mean, if you always select the node with the highest probability, your output string will be too uniform.
2) I've done exactly the same homework at my university. I think the point is to show the students that Markov chains are powerful, but that without extensive study of the application domain the output of the generator will be ridiculous.