Automatic adjustment of 3 parameters to minimize standard deviation - java

Situation:
I am currently developing a Java application based on rules. Every rule has 3 numeric parameters that influence a database communication. I am measuring a value that is affected by these rules and calculating the standard deviation of the measured values. The standard deviation should be as small as possible.
Question:
I am wondering if it is possible to do this automatically. I can already start a test scenario automatically and I can calculate the standard deviation automatically. So now I am looking for a mechanism to adjust the parameters according to the measured values. Any ideas?
Thanx.
PS: I know, it's a very general question...

As Peter says, you have to minimize a function f(a,b,c). There are a lot of elaborate methods for well-behaved functions, e.g. for functions which can be differentiated, or for so-called convex functions. In your case you have a function about which we do not know very much, so f could have several local minima, which rules out many established minimization methods.
If a single evaluation of a parameter set a,b,c is fast, you can try some kind of coordinate descent. This is not the best method, just brute force, but it is easy for you to implement. I will write the standard deviation achieved by (a,b,c) as s(a,b,c).
I'll give you some Python-style pseudocode, which should be easy to read:
def improve(a, b, c):
    eps = .01
    # evaluate s for a small increase and decrease of each parameter
    s1 = s(a * (1 + eps), b, c)
    s2 = s(a, b * (1 + eps), c)
    s3 = s(a, b, c * (1 + eps))
    s4 = s(a * (1 - eps), b, c)
    s5 = s(a, b * (1 - eps), c)
    s6 = s(a, b, c * (1 - eps))
    # determine the minimum of (s1...s6) and take its index (1-based):
    results = [s1, s2, s3, s4, s5, s6]
    i = results.index(min(results)) + 1
    # take the parameters which lead to the minimal si:
    if i == 1:
        a = a * (1 + eps)
    elif i == 2:
        b = b * (1 + eps)
    elif i == 3:
        c = c * (1 + eps)
    elif i == 4:
        a = a * (1 - eps)
    elif i == 5:
        b = b * (1 - eps)
    elif i == 6:
        c = c * (1 - eps)
    return a, b, c
You have to start with some values (a,b,c), and this function gives you a new triple (a,b,c) which leads to less variation. You can apply this step as often as you want.
Maybe you have to adapt eps; that depends on how fast s(a,b,c) changes when you make small modifications to a, b, or c.
This is not the best solution, but it is an easy, hands-on approach to try.
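If the surrounding tooling is Java anyway, the same idea is only a few lines there as well. A minimal sketch; measureStdDev is a placeholder you would replace with code that runs your test scenario for (a, b, c) and returns the measured standard deviation:

// A minimal Java sketch of the coordinate-descent idea from the pseudocode above.
public class ParameterTuner {

    // Stand-in for the real measurement; here just a dummy function with a
    // known minimum so the sketch runs end to end. Replace with your scenario.
    static double measureStdDev(double a, double b, double c) {
        return Math.abs(a - 2) + Math.abs(b - 3) + Math.abs(c - 1);
    }

    // Evaluate the six neighbouring triples and keep the best one found
    // (including the current triple, so the result never gets worse).
    static double[] improve(double a, double b, double c, double eps) {
        double[][] candidates = {
            {a, b, c},
            {a * (1 + eps), b, c}, {a, b * (1 + eps), c}, {a, b, c * (1 + eps)},
            {a * (1 - eps), b, c}, {a, b * (1 - eps), c}, {a, b, c * (1 - eps)}
        };
        double[] best = candidates[0];
        double bestS = measureStdDev(best[0], best[1], best[2]);
        for (double[] cand : candidates) {
            double s = measureStdDev(cand[0], cand[1], cand[2]);
            if (s < bestS) {
                bestS = s;
                best = cand;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] p = {1.0, 1.0, 1.0};   // some starting values
        for (int iter = 0; iter < 200; iter++) {
            p = improve(p[0], p[1], p[2], 0.01);
        }
        System.out.printf("best parameters: a=%.3f b=%.3f c=%.3f%n", p[0], p[1], p[2]);
    }
}

In practice you would stop iterating once no neighbour improves on the current triple, or after a fixed budget of scenario runs.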

Fortunately there are a number of general solutions. It is a matter of searching the function for a minimum: if you have a function x = f(a,b,c), you want to find the a, b, c which give a minimal x. The simplest approach is trial and error, but you can improve on this by using a binary search and linear interpolation (assuming the topology is relatively simple). There are more complex approaches, but you may not need them.
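To make the bracketing-search idea concrete: if s is roughly unimodal in one parameter over a known range, a derivative-free ternary search narrows that range without any calculus. A sketch; the bounds, iteration count, and example function are illustrative assumptions, not part of the answer above:

import java.util.function.DoubleUnaryOperator;

public class BracketSearch {
    // Shrink [lo, hi] around a minimum of f, assuming f is unimodal on [lo, hi].
    static double minimize(DoubleUnaryOperator f, double lo, double hi, int steps) {
        for (int i = 0; i < steps; i++) {
            double m1 = lo + (hi - lo) / 3;
            double m2 = hi - (hi - lo) / 3;
            if (f.applyAsDouble(m1) < f.applyAsDouble(m2)) {
                hi = m2;          // the minimum lies left of m2
            } else {
                lo = m1;          // the minimum lies right of m1
            }
        }
        return (lo + hi) / 2;
    }

    public static void main(String[] args) {
        // Example: minimize (x - 1.7)^2 on [0, 10]
        double x = minimize(v -> (v - 1.7) * (v - 1.7), 0, 10, 60);
        System.out.println(x);    // close to 1.7
    }
}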
Do you know what the actual function is? If it's a pure standard deviation, you can make a, b, c the same, e.g. the average, and your standard deviation will be 0.

If you don't know anything about f, I think I would run random samples for some time and have a look at the results. Then you can decide whether you want to do a gradient descent or something else.
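A minimal sketch of that random-sampling pass; measureStdDev is again a placeholder for the real test scenario, and the parameter ranges are assumed:

import java.util.Random;

public class RandomSearch {
    // Stand-in for running the test scenario; replace with the real measurement.
    static double measureStdDev(double a, double b, double c) {
        return Math.abs(a - 2) + Math.abs(b - 3) + Math.abs(c - 1);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double bestS = Double.MAX_VALUE;
        double[] best = null;
        for (int i = 0; i < 200; i++) {
            // Draw a, b, c uniformly from an assumed range [0, 10].
            double a = rnd.nextDouble() * 10;
            double b = rnd.nextDouble() * 10;
            double c = rnd.nextDouble() * 10;
            double s = measureStdDev(a, b, c);
            if (s < bestS) {
                bestS = s;
                best = new double[]{a, b, c};
                System.out.printf("new best: %.3f %.3f %.3f -> %.4f%n", a, b, c, s);
            }
        }
        System.out.printf("best so far: a=%.3f b=%.3f c=%.3f (s=%.4f)%n",
                best[0], best[1], best[2], bestS);
    }
}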

Related

Programmatically determine asymptotic runtime of a given algorithm? [duplicate]

I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?
If I graphed an O(n) function vs. an O(n lg n) function I think I would be able to visually ascertain which is which; I'm thinking there must be some heuristic solution which enables this to be done automatically.
Any ideas?
Edit: I am happy with a semi-automated solution; I'm just wondering whether there is some way of avoiding doing a fully manual analysis.
It sounds like what you are asking for is an extension of the Halting Problem. I do not believe that such a thing is possible, even in theory.
Just answering the question "Will this line of code ever run?" would be very difficult if not impossible to do in the general case.
Edited to add:
Although the general case is intractable, see here for a partial solution: http://research.microsoft.com/apps/pubs/default.aspx?id=104919
Also, some have stated that doing the analysis by hand is the only option, but I don't believe that is really the correct way of looking at it. An intractable problem is still intractable even when a human being is added to the system/machine. Upon further reflection, I suppose that a 99% solution may be doable, and might even work as well as or better than a human.
You can run the algorithm over data sets of various sizes, and then use curve fitting to come up with an approximation. (Just looking at the curve you create will probably be enough in most cases, but any statistical package has curve fitting.)
Note that some algorithms exhibit one shape with small data sets, but another with large ones... and the definition of "large" remains a bit nebulous. This means that an algorithm with a good performance curve could have so much real-world overhead that (for small data sets) it doesn't work as well as the theoretically better algorithm.
As far as code inspection techniques go, none exist. But instrumenting your code to run at various lengths and outputting a simple file (RunSize RunLength would be enough) should be easy. Generating proper test data could be more complex (some algorithms work better/worse with partially ordered data, so you would want to generate data that represents your normal use case).
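As a rough sketch of that instrumentation in Java (with Arrays.sort standing in for the algorithm under test, and made-up sizes), something like this prints the RunSize/RunLength pairs described above:

import java.util.Arrays;
import java.util.Random;

public class RuntimeProbe {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Print "RunSize RunLength" pairs for several input sizes.
        for (int n = 100_000; n <= 1_600_000; n *= 2) {
            int[] data = rnd.ints(n).toArray();
            long start = System.nanoTime();
            Arrays.sort(data);                        // algorithm under test
            long elapsed = System.nanoTime() - start;
            System.out.println(n + " " + elapsed / 1e9);
        }
    }
}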
Because of the problems with the definition of "what is large" and the fact that performance is data dependent, I find that static analysis often is misleading. When optimizing performance and selecting between two algorithms, the real world "rubber hits the road" test is the only final arbitrator I trust.
A short answer is that it's impossible because constants matter.
For instance, I might write a function that runs in O((n^3/k) + n^2). This simplifies to O(n^3) because as n approaches infinity, the n^3 term will dominate the function, irrespective of the constant k.
However, if k is very large in the above example function, the function will appear to run in almost exactly n^2 until some crossover point, at which the n^3 term will begin to dominate. Because the constant k will be unknown to any profiling tool, it will be impossible to know just how large a dataset to test the target function with. If k can be arbitrarily large, you cannot craft test data to determine the big-oh running time.
I am surprised to see so many attempts to claim that one can "measure" complexity by a stopwatch. Several people have given the right answer, but I think that there is still room to drive the essential point home.
Algorithm complexity is not a "programming" question; it is a "computer science" question. Answering the question requires analyzing the code from the perspective of a mathematician, such that computing the Big-O complexity is practically a form of mathematical proof. It requires a very strong understanding of the fundamental computer operations, algebra, perhaps calculus (limits), and logic. No amount of "testing" can be substituted for that process.
The Halting Problem applies, so the complexity of an algorithm is fundamentally undecidable by a machine.
The limits of automated tools apply, so it might be possible to write a program to help, but it would only be able to help about as much as a calculator helps with one's physics homework, or as much as a refactoring browser helps with reorganizing a code base.
For anyone seriously considering writing such a tool, I suggest the following exercise. Pick a reasonably simple algorithm, such as your favorite sort, as your subject algorithm. Get a solid reference (book, web-based tutorial) to lead you through the process of calculating the algorithm complexity and ultimately the "Big-O". Document your steps and results as you go through the process with your subject algorithm. Perform the steps and document your progress for several scenarios, such as best-case, worst-case, and average-case. Once you are done, review your documentation and ask yourself what it would take to write a program (tool) to do it for you. Can it be done? How much would actually be automated, and how much would still be manual?
Best wishes.
I am curious as to why it is that you want to be able to do this. In my experience when someone says: "I want to ascertain the runtime complexity of this algorithm" they are not asking what they think they are asking. What you are most likely asking is what is the realistic performance of such an algorithm for likely data. Calculating the Big-O of a function is of reasonable utility, but there are so many aspects that can change the "real runtime performance" of an algorithm in real use that nothing beats instrumentation and testing.
For example, the following algorithms have the same exact Big-O (wacky pseudocode):
example a:

huge_two_dimensional_array foo
for i = 0, i < foo[i].length, i++
    for j = 0; j < foo[j].length, j++
        do_something_with foo[i][j]

example b:

huge_two_dimensional_array foo
for j = 0, j < foo[j].length, j++
    for i = 0; i < foo[i].length, i++
        do_something_with foo[i][j]
Again, exactly the same big-O... but one of them uses row ordinality and one of them uses column ordinality. It turns out that due to locality of reference and cache coherency you might have two completely different actual runtimes, especially depending on the actual size of the array foo. This doesn't even begin to touch the actual performance characteristics of how the algorithm behaves if it's part of a piece of software that has some concurrency built in.
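A small Java experiment along those lines; the array size and single-run timing are purely illustrative, and since a Java 2D array is an array of row arrays, the "column ordinality" version jumps between row arrays on every access:

public class TraversalOrder {
    public static void main(String[] args) {
        int n = 3000;
        double[][] foo = new double[n][n];
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++)          // row order: walks each row array sequentially
            for (int j = 0; j < n; j++)
                sum += (long) foo[i][j];
        long rowOrder = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int j = 0; j < n; j++)          // column order: strides across row arrays
            for (int i = 0; i < n; i++)
                sum += (long) foo[i][j];
        long colOrder = System.nanoTime() - t1;

        System.out.println("row order:    " + rowOrder / 1e6 + " ms");
        System.out.println("column order: " + colOrder / 1e6 + " ms (sum=" + sum + ")");
    }
}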
Not to be a negative nelly but big-O is a tool with a narrow scope. It is of great use if you are deep inside algorithmic analysis or if you are trying to prove something about an algorithm, but if you are doing commercial software development the proof is in the pudding, and you are going to want to have actual performance numbers to make intelligent decisions.
Cheers!
This could work for simple algorithms, but what about O(n^2 lg n), or O(n lg^2 n)?
You could get fooled visually very easily.
And if it's a really bad algorithm, maybe it wouldn't return even on n=10.
Proof that this is undecidable:
Suppose that we had some algorithm HALTS_IN_FN(Program, function) which determined whether a program halted in O(f(n)) for all n, for some function f.
Let P be the following program:
if (HALTS_IN_FN(P, f(n)))
{
    while (1);
}
halt;
Since the function and the program are fixed, HALTS_IN_FN on this input is constant time. If HALTS_IN_FN returns true, the program runs forever and of course does not halt in O(f(n)) for any f(n). If HALTS_IN_FN returns false, the program halts in O(1) time.
Thus we have a paradox, a contradiction, and so the problem is undecidable.
A lot of people have commented that this is an inherently unsolvable problem in theory. Fair enough, but beyond that, even solving it for any but the most trivial cases would seem to be incredibly difficult.
Say you have a program with a set of nested loops, each based on the number of items in an array: obviously O(n^2). But what if the inner loop is only run in a very specific set of circumstances? Say, on average, it runs in approximately log(n) cases. Suddenly our "obviously" O(n^2) algorithm is really O(n log n). Writing a program that could determine whether the inner loop will run, and how often, is potentially more difficult than the original problem.
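As a toy illustration of that point (not from the answer above): nested loops that look quadratic at a glance, but whose inner loop only runs about log2(n) times, so the total work is O(n log n):

public class NotReallyQuadratic {
    // Looks like O(n^2), but the inner loop runs about log2(n) times,
    // so the total work is O(n log n).
    static long work(int n) {
        long steps = 0;
        for (int i = 1; i <= n; i++) {
            for (long j = 1; j <= n; j *= 2) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            System.out.println(n + " -> " + work(n) + " steps");
        }
    }
}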
Remember, O(N) isn't god; high constants can and will change the playing field. Quicksort is O(n log n) on average, of course, but when the recursion gets small enough, say down to 20 items or so, many implementations of quicksort will change tactics to a separate algorithm, since it's actually quicker to do a different type of sort there, say insertion sort, which has a worse big-O but a much smaller constant.
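A sketch of that hybrid tactic, using the cutoff of around 20 items mentioned above; the pivot choice and partitioning are deliberately simple:

import java.util.Random;

public class HybridQuicksort {
    private static final int CUTOFF = 20;    // below this size, insertion sort wins

    static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {          // small partition: change tactics
            insertionSort(a, lo, hi);
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {                      // simple partition around the pivot
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        if (lo < j) sort(a, lo, j);
        if (i < hi) sort(a, i, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int k = lo + 1; k <= hi; k++) {
            int v = a[k], m = k - 1;
            while (m >= lo && a[m] > v) { a[m + 1] = a[m]; m--; }
            a[m + 1] = v;
        }
    }

    public static void main(String[] args) {
        int[] a = new Random(1).ints(100_000, 0, 1_000_000).toArray();
        sort(a, 0, a.length - 1);
        boolean ok = true;
        for (int k = 1; k < a.length; k++) ok &= a[k - 1] <= a[k];
        System.out.println("sorted: " + ok);
    }
}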
So, understand your data, make educated guesses, and test.
I think it's pretty much impossible to do this automatically. Remember that O(g(n)) is the worst-case upper bound and many functions perform better than that for a lot of data sets. You'd have to find the worst-case data set for each one in order to compare them. That's a difficult task on its own for many algorithms.
You must also take care when running such benchmarks. Some algorithms will have a behavior heavily dependent on the input type.
Take quicksort for example. It is O(n²) in the worst case, but usually O(n log n), for two inputs of the same size.
The traveling salesman problem is (I think, not sure) O(n²) (EDIT: the correct value is O(n!) for the brute-force algorithm), but most algorithms get rather good approximate solutions much faster.
This means that the benchmarking structure has to be adapted on an ad hoc basis most of the time. Imagine writing something generic for the two examples mentioned: it would be very complex, probably unusable, and would likely give incorrect results anyway.
Jeffrey L Whitledge is correct. A simple reduction from the halting problem proves that this is undecidable...
ALSO, if I could write this program, I'd use it to solve P vs NP, and have $1million... B-)
I'm using the big_O library (link here), which fits the change in execution time against the independent variable n to infer the order-of-growth class O().
The package automatically suggests the best-fitting class by measuring the residual of the collected data against each class's growth behavior.
Check the code in this answer.
Example of output:
Measuring .columns[::-1] complexity against rapid increase in # rows
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
--------------------------------------------------------------------------------
Constant: time = 0.032 (res: 0.021)
Linear: time = -0.051 + 0.024*n (res: 0.011)
Quadratic: time = -0.026 + 0.0038*n^2 (res: 0.0077)
Cubic: time = -0.017 + 0.00067*n^3 (res: 0.0052)
Polynomial: time = -6.3 * x^1.5 (res: 6)
Logarithmic: time = -0.026 + 0.053*log(n) (res: 0.015)
Linearithmic: time = -0.024 + 0.012*n*log(n) (res: 0.0094)
Exponential: time = -7 * 0.66^n (res: 3.6)
--------------------------------------------------------------------------------
I guess this isn't possible in a fully automatic way since the type and structure of the input differs a lot between functions.
Well, since you can't prove whether or not a function even halts, I think you're asking a little much.
Otherwise #Godeke has it.
I don't know what your objective is in doing this, but we had a similar problem in a course I was teaching. The students were required to implement something that works within a certain complexity.
In order not to go over their solutions manually and read their code, we used the method #Godeke suggested. The objective was to find students who used a linked list instead of a balanced search tree, or students who implemented bubble sort instead of heap sort (i.e. implementations that do not work within the required complexity), without actually reading their code.
Surprisingly, the results did not reveal students who cheated. That might be because our students are honest and want to learn (or just knew that we'd check this ;-) ). It is possible to miss cheating students if the inputs are small, or if the input itself is ordered or such. It is also possible to be wrong about students who did not cheat but have large constant factors.
But in spite of the possible errors, it is well worth it, since it saves a lot of checking time.
As others have said, this is theoretically impossible. But in practice, you can make an educated guess as to whether a function is O(n) or O(n^2), as long as you don't mind being wrong sometimes.
First, time the algorithm, running it on inputs of various sizes n. Plot the points on a log-log graph and draw the best-fit line through them. If the line fits all the points well, then the data suggests that the algorithm is O(n^k), where k is the slope of the line.
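A small sketch of that fit in Java: take the (log n, log time) points and compute the least-squares slope, which estimates k. Feeding it the bubble-sort timings printed in another answer further down this page gives an exponent of roughly 2, as you would expect for an O(n^2) sort:

public class LogLogSlope {
    // Least-squares slope of log(time) against log(n); if the running time
    // behaves like c * n^k, the slope estimates the exponent k.
    static double slope(double[] n, double[] time) {
        int m = n.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(n[i]), y = Math.log(time[i]);
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        return (m * sxy - sx * sy) / (m * sxx - sx * sx);
    }

    public static void main(String[] args) {
        // The bubble-sort timings from the Python answer below, rounded.
        double[] n    = {1000, 2000, 3000, 4000, 5000};
        double[] time = {0.078, 0.344, 0.765, 1.344, 2.141};
        // Prints roughly 2, consistent with an O(n^2) sort.
        System.out.printf("estimated exponent k = %.2f%n", slope(n, time));
    }
}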
I am not a statistician. You should take all this with a grain of salt. But I have actually done this in the context of automated testing for performance regressions. The patch here contains some JS code for it.
If you have lots of homogeneous computational resources, I'd time them against several samples, do a linear regression, and then simply take the highest term.
It's easy to get an indication (e.g. "is the function linear? sub-linear? polynomial? exponential")
It's hard to find the exact complexity.
For example, here's a Python solution: you supply the function, and a function that creates parameters of size N for it. You get back a list of (n, time) values to plot, or to perform regression analysis on. It times each call once, for speed; to get a really good indication it would have to time it many times to minimize interference from environmental factors (e.g. with the timeit module).
import time

def measure_run_time(func, args):
    start = time.time()
    func(*args)
    return time.time() - start

def plot_times(func, generate_args, plot_sequence):
    return [
        (n, measure_run_time(func, generate_args(n+1)))
        for n in plot_sequence
    ]
And to use it to time bubble sort:
def bubble_sort(l):
    for i in xrange(len(l)-1):
        for j in xrange(len(l)-1-i):
            if l[j+1] < l[j]:
                l[j], l[j+1] = l[j+1], l[j]
import random

def gen_args_for_sort(list_length):
    result = range(list_length)  # list of 0..N-1
    random.shuffle(result)       # randomize order
    # should return a tuple of arguments
    return (result,)
# timing for N = 1000, 2000, ..., 5000
times = plot_times(bubble_sort, gen_args_for_sort, xrange(1000,6000,1000))
import pprint
pprint.pprint(times)
This printed on my machine:
[(1000, 0.078000068664550781),
(2000, 0.34400010108947754),
(3000, 0.7649998664855957),
(4000, 1.3440001010894775),
(5000, 2.1410000324249268)]

Matrix library vs for loops for element-wise operations in Java

I'm looking to do some element-wise operations (addition, multiplication, sqrt, etc.) on floating point arrays that are ~800x300 elements in size.
How much of a speedup (if any) would I get from doing this with matrix libraries (JAMA, EJML, etc.) over just doing the element-wise operations in for loops?
For loops look more appealing because my equations can get kind of complicated, and for loops would mean I could keep all my equations as-is, in plain old infix notation. Since Java doesn't support operator overloading, using a matrix library wouldn't be as simple. So I only want to use a matrix library if it's going to mean a real speedup. (Speed will be important here.)
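For reference, the plain-loop approach described here might look like the following, with a made-up formula standing in for the real equations; the point is that the formula stays in ordinary infix notation inside the loop body:

public class ElementWise {
    public static void main(String[] args) {
        int rows = 800, cols = 300;
        float[][] a = new float[rows][cols];
        float[][] b = new float[rows][cols];
        float[][] out = new float[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++) { a[i][j] = i + 1; b[i][j] = j + 1; }

        // The (made-up) equation stays in plain infix notation inside the loop.
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                out[i][j] = (float) Math.sqrt(a[i][j] * b[i][j] + 0.5f * a[i][j]);
            }
        }
        System.out.println(out[0][0]);   // sqrt(1*1 + 0.5), about 1.22
    }
}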
I would suggest you use one of the matrix libraries for that. In most cases it should run as fast as simple for loops, but it can also run faster. So, what you get for free: an API and equal or better performance. It also saves you a bit of time writing the element-wise operations yourself.
As the author of the la4j library I can say that using a third-party library gives you the opportunity to get faster and faster code from new releases. For example, you can choose la4j for your needs. It currently (versions 0.4.0-0.4.5) uses simple for-loop calculations for element-wise operations, so it won't be faster than hand-written code. But I'm now in the middle of developing a new parallel engine for la4j that allows code to run in parallel mode without any significant changes to the API. Like this:
Matrix a = new Basic2DMatrix(...); // simple 2D array matrix
Matrix b = new Basic2DMatrix(...); // so is this
Matrix c = a.multiply(b);          // a * b in sequential mode
Matrix c = a.par().multiply(b);    // a * b in parallel mode
So, all you need to do is change one piece of the code. All these advantages you get for free with libraries like la4j. Just let the libraries do their job and spend your time solving real problems.

How can I calculate integrals of step functions in the most simple way?

I have a time series of recorded frequencies from which I would like to calculate per-second means. However, the sample rate is not constant, which means that a simple arithmetic mean is wrong. What I would actually like to compute is the integral of the step function (described by the time series) within each one-second interval.
Consider for example this time series:
08:11:23.400 -> 49.9 Hz
08:11:24.200 -> 50.1 Hz
08:11:24.600 -> 50.15 Hz
08:11:24.800 -> 50.05 Hz
08:11:25.100 -> 49.95 Hz
The arithmetic mean for the second 08:11:24.000 - 08:11:25.000 would be (50.1 + 50.15 + 50.05)/3 = 50.1. But this is not the mean frequency measured in that second. It is instead:
(200*49.9 + 400*50.1 + 200*50.15 + 200*50.05)/1000 = 50.06, because the measured frequencies were in effect for different amounts of time.
This is the calculation of a weighted mean (with the hold times as weights), or equivalently the calculation of the integral of the step function (and then dividing by the time).
First of all: is there a name for this specific calculation? It seems like a rather standard computation on time series to me, but not knowing a name for it makes it hard to google for.
Second: which Java library supports such a calculation? I would like to avoid implementing this myself; I refuse to believe that there is no good standard Java library offering this. I was looking into the Apache Commons Math library, but without any luck (then again, maybe I'm just missing the correct term to look for).
I am not sure the formula 200*49.9 + 400*50.1 + ... is correct. It implies that the frequency 49.9 is in effect from 08:11:23.400 to 08:11:24.200, as if the frequency sensor measured the future frequency. I would rather think that it measures the mean past frequency. Then, is the frequency really a step function? Or is a saw-tooth function closer to reality? Or even a smooth function reconstructed with splines?
As a result, I would recommend computing the integral yourself, and being ready to change the calculation formula. As for bugs, you can equally well make errors while choosing a function from a library and while setting its parameters.
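If you do compute it yourself, the step-function (zero-order hold) version is only a few lines. Here is a sketch with made-up class and method names: it clips each sample's hold interval to the requested one-second window and weights each value by its hold time:

public class TimeWeightedMean {
    // tsMillis/values: (timestamp, value) pairs sorted by time, each value held
    // until the next sample (zero-order hold). Returns the mean over [startMs, endMs).
    static double meanOver(long[] tsMillis, double[] values, long startMs, long endMs) {
        double weightedSum = 0;
        for (int i = 0; i < tsMillis.length; i++) {
            long holdStart = Math.max(tsMillis[i], startMs);
            long holdEnd = (i + 1 < tsMillis.length) ? Math.min(tsMillis[i + 1], endMs) : endMs;
            if (holdEnd > holdStart) {
                weightedSum += values[i] * (holdEnd - holdStart);
            }
        }
        return weightedSum / (endMs - startMs);
    }

    public static void main(String[] args) {
        // The example from the question, with 08:11:23.400 taken as t = 0 ms.
        long[] ts = {0, 800, 1200, 1400, 1700};
        double[] f = {49.9, 50.1, 50.15, 50.05, 49.95};
        // Mean over the second 08:11:24.000 - 08:11:25.000, i.e. [600 ms, 1600 ms)
        System.out.println(meanOver(ts, f, 600, 1600));   // approximately 50.06, matching the question
    }
}

Running the same loop once per second over the sample stream gives the per-second means described in the question.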
There are not many libraries in Java which can do this properly. But the basic thing you are looking for is digital signal processing.
Here's a similar question: Signal processing library in Java?

Comparing c and java programs runtime

I had a job interview today. We were given a programming question and were asked to solve it using C/C++/Java. I solved it in Java and its runtime was 3 seconds (the test input was more than 16000 lines, and the person accompanying us said the running time was reasonable). Another person there solved it in C and the runtime was 0.25 seconds. So I was wondering: is a factor of 12 normal?
Edit:
As I said, I don't think there was really much room for algorithmic variation, except maybe in one little thing. Anyway, there was this protocol that we had to implement:
A (client) and B (server) communicate according to some protocol p. Before the messages are delivered, their validity is checked. The protocol is defined by its states and the text messages that can be sent when it is in a certain state. In all states there was only one valid message that could be sent, except in one state where there were about 10 messages that could be sent. There are 5 states, and the state transitions are defined by the protocol too.
So, what I did for the state from which 10 different messages can be sent was to store their string values in an ArrayList container; then, when I needed to check a message's validity in the corresponding state, I checked arrayList.contains(sentMessageStr). I would think that this operation's complexity is O(n), although I think Java has some built-in optimization for it. Now that I am thinking about it, maybe I should have used a HashSet container. I suppose the C implementation would have stored those predefined legal strings lexicographically in an array and implemented a binary search.
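For illustration, the HashSet variant would only have been a small change, since contains() on a HashSet is an expected O(1) hash lookup rather than a linear scan (the message strings below are made up):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ValidMessages {
    public static void main(String[] args) {
        // Made-up stand-ins for the protocol's ~10 legal messages in that state.
        List<String> legalList = new ArrayList<>(Arrays.asList("MSG_A", "MSG_B", "MSG_C"));
        Set<String> legalSet = new HashSet<>(legalList);

        String sentMessageStr = "MSG_B";
        System.out.println(legalList.contains(sentMessageStr)); // O(n) linear scan
        System.out.println(legalSet.contains(sentMessageStr));  // expected O(1) hash lookup
    }
}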
thanks
I would guess that it's likely the JVM took a significant portion of those 3 seconds just to load. Try running your Java version on the same machine 5 times in a row, or try running both on a dataset 500 times as large. I suspect you'll see a significant constant latency for the Java version that will become insignificant when runtimes go into the minutes.
Sounds more like a case of insufficient samples and unequal implementations (and possibly unequal test beds).
One of the first rules in measurement is to establish enough samples and obtain the mean of the samples for comparison. Even a couple of runs of the same program is not sufficient. You need to tax the machine enough to obtain samples whose values can be compared. That's why test-beds need to be warmed up, so that there are little or no variables at play, except for the system under observation.
And of course, you also have different people implementing the same requirement/algorithm in different manners. It counts. Period. Unless the algorithm implementations have been "normalized", obtaining samples and comparing them are the same as comparing apples and watermelons.
I don't think I need to expand on the fact that the testbeds could have been of varying configurations, or under varying loads.
It's almost impossible to say without seeing the code - you may have a better algorithm for example that scales up much better for larger input but has a greater overhead for small input sizes.
Having said that, this kind of 12x difference is roughly what I would expect if you coded the solution using "higher level" constructs such as ArrayLists / boxed objects and the C solution was basically using optimised, low level pointer arithmetic into a pre-allocated memory region.
I'd rather maintain the higher level solution, but there are times when only hand-optimised low level code will do.....
Another potential explanation is that the JIT had not yet warmed up on your code. In general, you need to reach "steady state" (typically a few thousand iterations of every code path) before you will see top performance in JIT-compiled code.
Performance depends on implementation. Without knowing exactly what you code and what your competitor did, it's very difficult to tell exactly what happened.
But let's say, for instance, that you used objects like Vectors or whatever to solve the problem and the C guy used plain arrays; his implementation is going to be faster than yours for sure.
C code can be translated very efficiently into assembly instructions, while Java, on the other hand, relies on a bunch of stuff (like the JVM) that might make the bytecode of your program fatter and probably a little bit slower.
You will be hard-pressed to find something that can execute faster in Java than in C. It's true that an order of magnitude is a big difference, but in general C is more performant.
On the other hand you can produce a solution to any given problem much quicker in Java (especially taking into account the richness of the libraries).
So at the end of the day, if there is a choice at all, it comes down as a dilemma between performance and productivity.
That depends on the algorithm. Java is of course generally slower than C/C++ since it runs on a virtual machine, but for most common applications its speed is sufficient. I would not call a factor of 12 normal for common applications.
Would be nice if you posted the C and Java codes for comparison.
A factor of 12 can be normal. So could a factor of 1 or 1/2. As some commentators mentioned, a lot has to do with how you coded your solution.
Don't forget that Java programs have to run in a JVM (unless you compile to native machine code), so any benchmarks should take that into account.
You can google for 'java and c speed comparisons' for some analysis
Back in the day I'd say that there's nothing wrong with your Java code being 12 times slower. But nowadays I'd rather say that the C guy implemented it more efficiently. Obviously I might be wrong, but are you sure you used proper data structures and, well, simply coded it well?
Also, did you measure the memory usage? This might sound silly, but last year at uni we had a programming challenge. I don't really remember what it was, but we had to solve a graph problem in whatever language we wanted. I did two implementations of my algorithm, one in C and one in Java; the Java one was about 1.5-2x slower. BUT, for instance, I knew I didn't have to worry about memory management (I knew exactly how big the input would be and how many test samples the teacher would run), so I simply didn't free any memory, which took way too much time in a program that ran for about 1-2 seconds on a graph with about 15k nodes (or was it 150k?). So my Java code was better memory-wise, but it was slower. I also parsed the input myself in C (I didn't do that in Java), which saved me really A LOT of time (about a 20-25% boost; I was amazed myself). I'd say 1.5-2x is more realistic than 12x.
Most likely the algorithm used in the implementation was different.
For instance (an oversimplification), if you want to add a number N, M times, one implementation could be:
long addTimes( long n, long m ) {
    long r = 0;
    long i;
    for( i = 0; i < m ; i++ ) {
        r += n;
    }
    return r;
}
And another implementation could simply be:
long addTimes( long n, long m ) {
    return n * m;
}
Both will run mostly the same in Java and C (you don't even have to change the code), and still one implementation will run a lot faster than the other.

Static Typing and Writing a Simple Matrix Library

Aye it's been done a million times before, but damnit I want to do it again. I'm writing a simple Matrix Library for C++ with the intention of doing it right. I've come across something that's fairly obvious in mathematics, but not so obvious to a strongly typed system -- the fact that a 1x1 matrix is just a number. To avoid this, I started walking down the hairy path of matrices as a composition of vectors, but also stumbled upon the fact that two vectors multiplied together could either be a number or a dyad, depending on the orientation of the two.
My question is, what is the right way to deal with this situation in a strongly typed language like C++ or Java?
"something that's fairly obvious in mathematics, but not so obvious to a strongly typed system -- the fact that a 1x1 matrix is just a number."
That's arguable. A hardcore mathematician (I'm not one) would probably argue against it; he would say that a 1x1 matrix can be regarded as isomorphic (or something like that) to a scalar, but that they are conceptually different things. Only in some informal sense is "a 1x1 matrix a scalar" (similar to, though stronger than, the sense in which a complex number without an imaginary part "is a real").
I don't think that correspondence should be reflected in a strongly typed language, and I don't think it is in typical implementations (of complex numbers or matrices), e.g. Java's Apache Commons Math. For example, a Complex with zero imaginary part is not a Number (from the type point of view: they cannot be cast one into the other).
In the case of matrices, the correspondence is even more disputable. Should we be able to multiply two matrices of sizes (4x3) x (1x1)? If we regard the second as a scalar, it's valid, but not as a matrix, since it violates the dimension restriction for matrix multiplication. And I believe Commons sticks to that.
In a weakly typed language (eg Matlab) it would be another story.
If you aren't worried about SIMD optimisations and the like then I would have thought the best way would be to set up a templated tensor. Choose your maximum tensor dimensions and then you can do things like this:
typedef Tensor3D< float, 4, 1, 1 > Vector4;
And so forth. The mathematics, if implemented correctly, will just work with all forms of "matrix" and "vector". Both are, after all, just special cases of tensors.
Edit: knowing the size of a template is actually pretty easy. Add in a GetRows() etc. function and you can return the value you passed into the template at instantiation, i.e.
template< typename T, int rows, int cols > class Tensor2D
{
public:
    int GetRows() { return rows; }
    int GetCols() { return cols; }
};
My advice? Don't worry about the 1x1 case and sleep at night. You shouldn't be worried about users suddenly deciding to use your library to model a bunch of numbers as 1x1 matrices and complaining about your implementation.
No one who solves these problems will be so foolish. If you're smart enough to use matrices, you're smart enough to use them properly.
As for all the permutations that scalars introduce, I'd say that you must account for them. As a matrix library user, I'd expect to be able to multiply two matrices together to get another matrix, a matrix by a (column or row) vector to get a vector, and a scalar by a matrix to get another matrix.
If I multiply two vectors I can get a scalar (inner product) or a matrix (outer product). Your library had better give them to me.
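In a strongly typed language that request maps naturally onto two operations with different return types. A minimal Java-flavoured sketch (class and method names made up for illustration):

public class Vectors {
    // Inner (dot) product: two vectors give a scalar.
    static double dot(double[] u, double[] v) {
        double sum = 0;
        for (int i = 0; i < u.length; i++) sum += u[i] * v[i];
        return sum;
    }

    // Outer product: two vectors give a matrix (a dyad).
    static double[][] outer(double[] u, double[] v) {
        double[][] m = new double[u.length][v.length];
        for (int i = 0; i < u.length; i++)
            for (int j = 0; j < v.length; j++)
                m[i][j] = u[i] * v[j];
        return m;
    }

    public static void main(String[] args) {
        double[] u = {1, 2}, v = {3, 4};
        System.out.println(dot(u, v));              // 11.0
        System.out.println(outer(u, v)[1][0]);      // 6.0
    }
}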
It's not trivial. It's been done "right" by others, but kudos to working it through for yourself.
