I've encountered JBehave recently and I think we should use it. So I called in our team's tester, and he also thinks we should adopt it.
With that as a starting point, I asked the tester to write stories for a test application (the Bowling Game Kata by Uncle Bob). At the end of the day we would try to map his tests onto the bowling game.
I was expecting a test like this:
Given a bowling game
When player rolls 5
And player rolls 4
Then total pins knocked down is 9
Instead, the tester came up with 'logical tests'; in other words, he was not being that specific. But in his terms this was a valid test.
Given a bowling game
When player does a regular throw
Then score should be calculated appropriately
My problem with this is the ambiguity: what is a 'regular throw'? What is 'appropriately'? What will it mean when one of those steps fails?
However, the tester says that a human does understand this, and that what I was looking for were 'physical tests', which were more cumbersome to write.
I could probably map 'regular' to rolling a 4 twice (still no spare, no strike), but it feels like I am again doing a translation I don't want to make.
So I wonder, how do you approach this? How do you write your JBehave tests? And do you have any experience with situations where it is not you who writes these tests, and you have to map them to your code?
His test is valid, but it requires a certain knowledge of the domain, which no framework will have. Automated tests should be explicit; think of them as examples. Writing them costs more than writing "logical tests", but this pays off in the long run since they can be replayed at will, very quickly, and give immediate feedback.
You should have paired with him on the first tests, to point the effort in the right direction. Perhaps you could give him your test and ask him to increase the coverage by adding new tests.
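To make the "explicit examples" idea concrete, here is a minimal sketch of how the explicit scenario could be mapped to JBehave step definitions. It assumes the kata's Game class with roll(int) and score() methods; the names are illustrative, not taken from the question.

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;
import static org.junit.Assert.assertEquals;

// Sketch only: assumes the kata's Game class exposing roll(int pins) and score().
public class BowlingGameSteps {
    private Game game;

    @Given("a bowling game")
    public void givenABowlingGame() {
        game = new Game();
    }

    @When("player rolls $pins")
    public void whenPlayerRolls(int pins) {
        game.roll(pins);
    }

    @Then("total pins knocked down is $total")
    public void thenTotalPinsKnockedDownIs(int total) {
        assertEquals(total, game.score());
    }
}

Each explicit scenario line binds to exactly one step, so a failing step points at a concrete roll or a concrete expected score; a step like "score should be calculated appropriately" gives you nothing concrete to assert against.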
The amount of explicitness needed in acceptance criteria depends on the level of trust between the development team and the business stakeholders.
In your example, the business is assuming that the developers/testers understand enough about bowling to determine the correct outcome.
But imagine a more complex domain, like finance. For that, it would probably be better to have more explicit examples to ensure a good understanding of the requirement.
Alternatively, let's say you have a scenario:
Given I try to sign up with an invalid email address
Then I should not be registered
For this, a developer/tester probably has better knowledge of what constitutes a valid or invalid email address than the business stakeholder does. You would still want to test against a variety of addresses, but that can be specified within the step definitions, rather than exposing it at the scenario level.
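As a hedged sketch of keeping that variety inside the step definitions (RegistrationService and its methods are hypothetical, not from the question):

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import java.util.Arrays;
import java.util.List;
import static org.junit.Assert.assertFalse;

// Sketch only: the concrete invalid addresses live here, not in the scenario text.
public class SignUpSteps {
    private final RegistrationService service = new RegistrationService(); // hypothetical service
    private final List<String> invalidAddresses = Arrays.asList(
            "no-at-sign.example.com", "two@@example.com", "spaces in@example.com", "");

    @Given("I try to sign up with an invalid email address")
    public void iTryToSignUpWithAnInvalidEmailAddress() {
        for (String address : invalidAddresses) {
            service.register(address);
        }
    }

    @Then("I should not be registered")
    public void iShouldNotBeRegistered() {
        for (String address : invalidAddresses) {
            assertFalse("should have been rejected: " + address, service.isRegistered(address));
        }
    }
}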
I hate vague words such as "appropriately" in expected values. "Appropriately" is just one example of a "toxic word" for testing, and if it is not eliminated, this "approach" can spread, effectively killing testing in general. It might "be enough" for a human tester, but such "test cases" are acceptable only in a first attempt at exploratory smoke testing.
To be reproducible, systematic and automatable, every test case must be specific. (Not just "should"; are we to allow the softness of "would" as well? Instead I use the present tense "shall be", or better the strict "is", as a claim to confirm or refute.) And this rule is absolute once it comes to automation.
What your tester wrote was rather a "test area", a "scenario template", instead of a real test case, because so many possible test results could be produced.
Your scenario, on the other hand, was specific: a very specific, real test case. It is possible to automate it, which is nice: you can delegate it to a machine and evaluate it as often as you need, automatically (with the bonus of an automated report from a Continuous Integration server).
But the "empty scenario template" has some value too: it is an empty skeleton prepared to be filled with data. That is why I like to call these situations "DDT": Data-Driven Testing.
Imagine a web form to be tested, with validations on its 10 inputs, with cross-validations... and the submit button. There can be 10 test cases for every single input:
empty;
with a char, but still too short anyway;
too long for the server, but allowed within the form for copy-paste and further edits;
with invalid chars...
The approach I recommend is to prepare a set of to-pass data, or even to generate it (from a DB or even randomly): whatever you can predict will pass the test, the "happy scenario". Keep that data aside as a data template and use it to initialize the form and fill it up, then break down a single value to create test cases "to fail". Do that, say, 10 times for every single input, for each of the 10 inputs (100 test cases even before cross-rules are attempted)... and then, after the server has refused the form 100 times, fill the form with the to-pass data, without distorting it, so the form is finally accepted. (An accepted submit changes state in the server app, so it needs to go last, to run all 101 cases against the same app state.)
To do your test this way, you need two things:
the empty scenario template,
and a table of 100 rows of data:
10 columns of input data, with only one value manipulated per row as you pass down the table (ever heard of Gray code?),
possibly keeping the derivation history in a row description: which row this one was derived from and how, i.e. via which manipulated value.
Also an 11th column, the "expected result" column(s): the expected pass/fail status, the expected error/validation message, and a reference to the requirements for test-coverage tracking. (Ever seen FitNesse?)
And possibly also a column for the actual result detected when the test is performed, to track the history of each single row/test case (hence the CI server already mentioned).
To combine the "empty scenario skeleton" on one side with the "data table driving the test" on the other, some mechanism is needed, indeed, and your data needs to be imported. You could prepare the rows in Excel, which could in theory be imported directly, but for an easier life I recommend CSV, properties, XML, or any other machine- and human-readable textual format.
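As a rough sketch of the "scenario template plus data table" idea, the driver below reads a CSV whose first 10 columns are the form inputs and whose 11th column is the expected outcome. FormDriver, its methods, and the file name are assumptions for illustration only.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Sketch only: one generic test loop, driven entirely by rows of data.
public class FormDataDrivenTest {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("form-cases.csv"))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (lineNo == 1 || line.trim().isEmpty()) continue; // skip header and blank lines
                String[] cols = line.split(",", -1);                // assumes at least 11 columns per row
                String[] inputs = new String[10];
                System.arraycopy(cols, 0, inputs, 0, 10);
                String expected = cols[10];                         // e.g. "PASS" or an error message

                FormDriver form = new FormDriver();                 // hypothetical driver for the web form
                form.fill(inputs);
                String actual = form.submitAndReadResult();
                if (!expected.equals(actual)) {
                    System.out.printf("row %d FAILED: expected '%s' but got '%s'%n", lineNo, expected, actual);
                }
            }
        }
    }
}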
His 'logical test' has the same information content as the phrase 'test regular bowling score' in a test plan or TODO list. But it is considerably longer, and therefore worse.
Using JBehave at all only makes sense if the test team is responsible for producing tests with more information in them than that. Otherwise, it would be more efficient to take the TODO list and code it up in JUnit.
And I love words like "appropriately" in expected values. You should use Cucumber or other such wrappers as generic documentation; if you're using them to cover and specify all possible scenarios, you're probably wasting a lot of your time scrolling through hundreds of feature files.
I am working on a project where I was provided with a Java matrix-multiplication program that can run in a distributed system. It is run like so:
usage: java Coordinator maxtrix-dim number-nodes coordinator-port-num
For example:
java blockMatrixMultiplication.Coordinator 25 25 54545
I want to extend this code with some kind of failsafe ability, and I am curious how I would create checkpoints within a running matrix-multiplication calculation. The general idea is to recover to where the computation was (it doesn't need to be that fine-grained; just recovering to the beginning, i.e. row 0, column 0, would do).
My first idea is to use log files (like Apache log4j), where I would log the relevant matrix state. Then, if we forcibly shut down the app in the middle of a calculation, we could recover to a reasonable checkpoint.
Should I use MySQL for such a task (or maybe a more lightweight database)? Or would a basic log file (and some useful Apache libraries) be good enough? Any tips appreciated, thanks.
source-code :
MatrixMultiple
Coordinator
Connection
DataIO
Worker
If I understand the problem correctly, all you need to do is recover your place in a single matrix calculation in the event of a crash or if the application is quit halfway through.
Minimum Viable Solution
The simplest approach would be to recover just the two matrices you were actively multiplying, but none of your progress, and multiply them from the beginning the next time you load the application.
The Process:
At the beginning of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, create a file, let's call it recovery_data.txt, with the state of the two arrays being multiplied (parameters a and b). Alternatively, you could use a simple database for this.
At the end of public static int[][] multiplyMatrix(int[][] a, int[][] b) in your MatrixMultiple class, right before you return, clear the contents of the file, or wipe your database.
When the program is initially run, most likely near the beginning of main(String[] args), you should check whether the contents of the text file are non-empty; if they are, multiply the matrices from the file and display the output, otherwise proceed as usual.
Notes on implementation:
Using a simple text file or a full-fledged relational database is a decision you are going to have to make, mostly based on real-world usage that only you can really know, but in my mind a text file wins out in most situations, and here is why. You are going to want to read the data sequentially to rebuild your matrix, so a relational structure is not that useful. Databases are harder to work with (not too hard, but compared to a text file there is no question), and since you would not be making much use of querying, that isn't balanced out by the ways databases usually make a programmer's life easier.
Consider how you are going to store your arrays. In a text file you have several options; my recommendation would be to store each row on a line of text, with the values separated by spaces, commas, or some other character, and then put a blank line before the second matrix. I think a similar approach is used in crAlexander's answer here, but I have not tested his code. Alternatively, you could use something more complicated like JSON, but I think that would be too heavy-handed to justify. If you are using a database, the relational structure should suggest several logical arrangements for your data as well.
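Here is a rough sketch of the text-file variant just described: the two matrices are written one row per line with a blank line between them, the file is cleared once a result is returned, and load() returns null when there is nothing to recover. The file name and class name are illustrative only.

import java.io.*;
import java.util.*;

// Sketch only: persist the two input matrices before multiplying, reload them on startup.
public class RecoveryStore {
    private static final File FILE = new File("recovery_data.txt"); // assumed location

    public static void save(int[][] a, int[][] b) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(FILE))) {
            write(out, a);
            out.println();                 // blank line separates the two matrices
            write(out, b);
        }
    }

    public static void clear() throws IOException {
        new PrintWriter(FILE).close();     // truncates the file once a result has been returned
    }

    // Returns {a, b}, or null when there is nothing to recover.
    public static int[][][] load() throws IOException {
        if (!FILE.exists() || FILE.length() == 0) return null;
        List<int[][]> matrices = new ArrayList<>();
        List<int[]> current = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(FILE))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    matrices.add(current.toArray(new int[0][]));
                    current = new ArrayList<>();
                } else {
                    current.add(Arrays.stream(line.trim().split("\\s+"))
                                      .mapToInt(Integer::parseInt).toArray());
                }
            }
        }
        if (!current.isEmpty()) matrices.add(current.toArray(new int[0][]));
        return matrices.size() == 2 ? new int[][][] { matrices.get(0), matrices.get(1) } : null;
    }

    private static void write(PrintWriter out, int[][] m) {
        for (int[] row : m) {
            StringBuilder sb = new StringBuilder();
            for (int v : row) sb.append(v).append(' ');
            out.println(sb.toString().trim());
        }
    }
}

At the top of multiplyMatrix you would call save(a, b), right before the return you would call clear(), and at startup you would check whether load() gives a non-null result.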
Strategic Checkpoints
You expressed interest in saving some calculations by taking advantage of the possibility that some of them will already have been handled the last time the program ran. Let's first look at the pros and cons of adding a checkpoint after every row has been processed, as best I can see them.
Pros:
Saves computation time the next time the program is run, if the system had been shut down mid-calculation.
Cons:
Making the extra writes will either use more nodes if distributed (more on that later) or increase the overall latency of the calculation, because you now have to throw in a write operation for every checkpoint.
More complicated to implement (but probably not by too much)
If my comments on the implementation of the Minimum Viable Solution convinced you that you could get away with a text file, I take back the parts about not leveraging queries and everything being accessed sequentially: with checkpoints, a database is now perhaps the smarter choice.
I'm not saying that checkpoints are definitely not the better solution, just that I don't know if they are worth it, but here is what I would consider:
Do you expect people to be quitting half way through a calculation frequently relative to the total amount of calculations they will be running? If you think this feature will be used a lot, then the pro of adding checkpoints becomes much more significant relative to the con of it slowing down calculations as a whole.
Does it take a long time to complete a typical calculation that people feed the program? If so, the added latency I mentioned in the cons is (percentage-wise) smaller and perhaps more tolerable, but users are already less happy with the performance, which cancels out some of that effect. It also makes the argument for checkpointing stronger, because it has the potential to save more time.
And so I would only recommend checkpointing like this if you expect a relatively large amount of instances where this is happening, and if it takes a relatively large amount of time to complete a calculation.
If you decide to go with checkpoints, then modify the approach to:
after every row of the result array has been processed, write the contents of that row to your database, or, if you are using the text file, append it at the end after another blank line to separate it from the last matrix.
on startup, if you need to finish a calculation that has already begun, solve and distribute only the rows that have yet to be considered, and retrieve the contents of the other rows from your database.
A quick point on implementing frequent checkpoints: you could greatly reduce the extra latency they add by pushing the writes out to an additional thread. Doing this uses more resources, and there is always some latency in actually spawning the process or thread, but you do not have to wait for the entire write operation to complete before proceeding.
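A sketch of what that could look like, assuming the row checkpoints are appended to the same recovery file; the class name, file name, and format are illustrative only.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: a single-threaded executor serializes the writes so rows land on disk
// in submission order, while the multiplication loop never waits on the I/O.
public class AsyncCheckpointer {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    public void checkpointRow(int rowIndex, int[] row) {
        final int[] copy = row.clone();   // copy so later mutation by the caller cannot race the write
        writer.submit(() -> {
            try (PrintWriter out = new PrintWriter(new FileWriter("recovery_data.txt", true))) {
                StringBuilder sb = new StringBuilder(rowIndex + ":");
                for (int v : copy) sb.append(' ').append(v);
                out.println(sb);
            } catch (IOException e) {
                e.printStackTrace();      // a real implementation would surface this properly
            }
        });
    }

    public void close() {
        writer.shutdown();                // lets queued writes finish before the program exits
    }
}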
A quick warning on the implementation of any such failsafe method
If there is an unchecked edge case where some sort of invalid matrix would crash the program, this failsafe now bricks the program entirely by retrying the same input on every start. To combat this, I see some obvious solutions, but perhaps a bit of thought will let you adapt my approaches into something you prefer:
Use plenty of try/catch statements: if you get any sort of error that seems to be caused by malformed data, wipe your recovery file, or modify it to add a note telling your program to treat it as a special case. A good treatment of this special case might be to display the two matrices at start-up with an explanation that your program failed to multiply them, likely due to malformed content.
Add data to your file/database about how many times the program has quit while solving the current problem; if this is not the first resume, treat it like the special case in the option above.
I hope this gives you enough information to implement your failsafe in the way that makes the most sense for the realistic use you expect. Note that there are other ways you could approach this problem as well, each with its own list of pros and cons to take into consideration.
I want to write a program with the following objective: be able to identify whether a word/phrase represents a thing/product. For example:
1) "A glove comprising at least an index finger receptacle, a middle finger receptacle.." <-Be able to identify glove as a thing/product.
2) "In a window regulator, especially for automobiles, in which the window is connected to a drive..." <- be able to identify regulator as a thing.
Doing this tells me that the text is talking about a thing/product. As a contrast, the following text talks about a process instead of a thing/product: "An extrusion coating process for the production of flexible packaging films of nylon coated substrates consisting of the steps of..."
I have millions of such texts, so doing this manually is not feasible. So far, using NLTK and Python, I have been able to identify some specific cases that use very similar keywords. But I have not been able to do the same with the kinds of examples mentioned above. Any help will be appreciated!
What you want to do is actually pretty difficult. It is a sort of (very specific) semantic labelling task. The possible solutions are:
create your own labelling algorithm, create training data, test, eval and finally tag your data
use an existing knowledge base (lexicon) to extract semantic labels for each target word
The first option is a complex research project in itself. Do it if you have the time and resources.
The second option will only give you the labels that are available in the knowledge base, and these might not match your needs. I would give it a try with Python, NLTK and WordNet (an interface is already available); you might be able to use synset hypernyms for your problem.
This task is an instance of the named entity recognition (NER) problem.
EDIT: There is no clean definition of NER in the NLP community, so one could say this is not an NER task but an instance of the more general sequence-labeling problem. Either way, there is still no tool that can do this out of the box.
Out of the box, Stanford NLP can only recognize the following types:
Recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, ORDINAL, PERCENT), and temporal (DATE, TIME, DURATION, SET) entities
so it is not suitable for solving this task. There are some commercial solutions that may be able to do the job; they can be readily found by googling "product name named entity recognition", and some of them offer free trial plans. I don't know of any free, ready-to-deploy solution.
Of course, you can create your own model by hand-annotating around 1000 or so sentences containing product names and training a classifier such as a Conditional Random Field with some basic features (here is the documentation page that explains how to do that with Stanford NLP). This solution should work reasonably well, though it won't be perfect of course (no system is perfect, but some solutions are better than others).
EDIT: This is a complex task per se, but not that complex unless you want state-of-the-art results. You can create a reasonably good model in just 2-3 days. Here is an example of step-by-step instructions for doing this with an open-source tool:
Download CRF++ and look at the provided examples; they are in a simple text format
Annotate your data in a similar manner
a OTHER
glove PRODUCT
comprising OTHER
...
and so on.
Split your annotated data into two files: train (80%) and dev (20%)
Use the following baseline feature templates (paste them into the template file):
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
U06:%x[0,0]/%x[1,0]
Run:
crf_learn template train.txt model
crf_test -m model dev.txt > result.txt
Look at result.txt: one column will contain your hand-labeled data and another the machine-predicted labels. You can then compare them, compute accuracy, etc. After that you can feed new unlabeled data into crf_test and get your labels.
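If you want a quick way to score that output, a small sketch like the following would do; it assumes the usual CRF++ layout where the gold label is the second-to-last column and the predicted label is the last, with blank lines between sentences.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Sketch only: token-level accuracy over CRF++ output (result.txt).
public class CrfAccuracy {
    public static void main(String[] args) throws IOException {
        int total = 0, correct = 0;
        try (BufferedReader in = new BufferedReader(new FileReader("result.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;        // blank lines separate sentences
                String[] cols = line.trim().split("\\s+");
                String gold = cols[cols.length - 2];
                String predicted = cols[cols.length - 1];
                total++;
                if (gold.equals(predicted)) correct++;
            }
        }
        System.out.printf("token accuracy: %.2f%% (%d/%d)%n", 100.0 * correct / total, correct, total);
    }
}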
As I said, this won't be perfect, but I will be very surprised if it isn't reasonably good (I actually solved a very similar task not long ago), and it will certainly be better than just using a few keywords/templates.
ENDNOTE: this ignores many things and some best practices in solving such tasks; it won't be good enough for academic research and is not 100% guaranteed to work, but it is still useful for this and many similar problems as a relatively quick solution.
Our team is responsible for a large codebase containing legal rules.
The codebase works mostly like this:
class SNR_15_UNR extends Rule {
    public double getValue(RuleContext context) {
        double snr_15_ABK = context.getValue(SNR_15_ABK.class);
        double UNR = context.getValue(GLOBAL_UNR.class);
        if (UNR <= 0) // if UNR value would reduce snr, apply the reduction
            return snr_15_ABK + UNR;
        return snr_15_ABK;
    }
}
When context.getValue(Class<? extends Rule>) is called, it simply evaluates the specified rule and returns the result. This allows you to build a dependency graph while a rule is evaluating, and also to detect cyclic dependencies.
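For illustration only, here is a rough sketch of how such a context might evaluate rules lazily, cache results, and detect cycles. The real implementation is not shown here; this assumes RuleContext can be subclassed and that each rule has a no-argument constructor.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch only: lazy evaluation with memoization and cycle detection.
public class SimpleRuleContext extends RuleContext {
    private final Map<Class<? extends Rule>, Double> cache = new HashMap<>();
    private final Deque<Class<? extends Rule>> evaluating = new ArrayDeque<>();

    @Override
    public double getValue(Class<? extends Rule> ruleClass) {
        if (cache.containsKey(ruleClass)) return cache.get(ruleClass);
        if (evaluating.contains(ruleClass)) {
            throw new IllegalStateException("Cyclic dependency involving " + ruleClass.getName());
        }
        evaluating.push(ruleClass);
        try {
            Rule rule = ruleClass.getDeclaredConstructor().newInstance(); // assumes a no-arg constructor
            double value = rule.getValue(this);
            cache.put(ruleClass, value);
            return value;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Cannot instantiate rule " + ruleClass.getName(), e);
        } finally {
            evaluating.pop();
        }
    }
}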
There are about 500 rule classes like this. We now want to implement tests to verify the correctness of these rules.
Our goal is to implement a testing list as follows:
TEST org.project.rules.SNR_15_UNR
INPUT org.project.rules.SNR_15_ABK = 50
INPUT org.project.rules.UNR = 15
OUTPUT SHOULD BE 50
TEST org.project.rules.SNR_15_UNR
INPUT org.project.rules.SNR_15_ABK = 50
INPUT org.project.rules.UNR = -15
OUTPUT SHOULD BE 35
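For illustration, here is a rough sketch of how one entry of this list might map onto a plain JUnit test, assuming RuleContext can be subclassed to return canned values and using the class names from the code sample above.

import java.util.HashMap;
import java.util.Map;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Sketch only: a stub context returns canned values instead of evaluating dependency rules,
// so a single rule can be tested in isolation.
public class SNR_15_UNR_Test {

    static class StubContext extends RuleContext {
        private final Map<Class<? extends Rule>, Double> values = new HashMap<>();

        StubContext with(Class<? extends Rule> rule, double value) {
            values.put(rule, value);
            return this;
        }

        @Override
        public double getValue(Class<? extends Rule> rule) {
            return values.get(rule);
        }
    }

    @Test
    public void reductionAppliedWhenUnrIsNegative() {
        RuleContext context = new StubContext()
                .with(SNR_15_ABK.class, 50.0)
                .with(GLOBAL_UNR.class, -15.0);
        assertEquals(35.0, new SNR_15_UNR().getValue(context), 1e-9);
    }

    @Test
    public void noReductionWhenUnrIsPositive() {
        RuleContext context = new StubContext()
                .with(SNR_15_ABK.class, 50.0)
                .with(GLOBAL_UNR.class, 15.0);
        assertEquals(50.0, new SNR_15_UNR().getValue(context), 1e-9);
    }
}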
The question is: how many test scenarios are needed? Is it possible to use static code analysis to detect how many unique code paths exist throughout the code? Does any such tool exist, or do I have to start mucking about with the Eclipse JDT?
For clarity: I am not looking for code coverage tools. These tell me which code has been executed and which code was not. I want to estimate the development effort required to implement unit tests.
(EDIT 2/25, focused on test-coding effort):
You have 500 sub-classes, and each appears (based on your example with one conditional) to have 2 cases. I'd guess you need 500*2 tests.
If your code is not as regular as you imply, a conventional (branch) code coverage tool might not be the answer you think you want as a starting place, but it might actually help you make an estimate. Code T < 50 tests across randomly chosen classes, and collect code coverage data P (as a percentage) over whatever part of the code base you think needs testing (particularly your classes). Then you need roughly (1-P)*100*T tests.
If your extended classes are all as regular as you imply, you might consider generating them. If you trust the generation process, you might be able to avoid writing the tests.
(ORIGINAL RESPONSE, focused on path coverage tools)
Most code coverage tools are "line" or "branch" coverage tools; they do not count unique paths through the code. At best they count basic blocks.
Path coverage tools do exist; people have built them for research demos, but commercial versions are relatively rare. You can find one at http://testingfaqs.org/t-eval.html#TCATPATH. I don't think this one handles Java.
One of the issues is that the number of apparent paths through code is generally exponential in the number of decisions, since each decision encountered generates a true path and a false path based on the outcome of the conditional (1 decision --> 2 paths, 2 decisions --> 4 paths, ...). Worse, loops are in effect a decision repeated as many times as the loop iterates; a loop that repeats 100 times in effect has 2**100 paths. To control this problem, the more interesting path coverage tools try to determine the feasibility of a path: if the symbolically combined predicates from the conditionals in a prefix of that path are effectively false, the path is infeasible and can be ignored, since it cannot really occur. Another standard trick is to treat loops as having 0, 1, and N iterations to reduce the number of apparent paths. Managing the number of paths requires rather a lot of machinery, considerably more than most branch-coverage test tools need, which helps explain why real path coverage tools are rare.
how many test scenario's are needed?
Many. 500 might be a good start.
Is it possible to use static code analysis to detect how many unique code paths exist throughout the code?
Yes. It's called a code coverage tool. Here are some free ones: http://www.java-sources.com/open-source/code-coverage
I am developing a financial manager in my free time with Java and a Swing GUI. When the user adds a new entry, he is prompted to fill in: money amount, date, comment and section (e.g. Car, Salary, Computer, Food, ...).
The sections are created "on the fly". When the user enters a new section, it is added to the section JComboBox for further selection. The other point is that the comments could be in different languages, so a list of hard-coded words and synonyms would be enormous.
So, my question is: is it possible to analyse the comment (e.g. "Fuel", "Car service", "Lunch at **") and preselect a fitting section?
My first thought was to do it with a neural network and learn from the input when the user selects another section.
But my problem is that I don't know how to start at all. I tried Encog with Eclipse and did some tutorials (XOR, ...), but all of them use only doubles as input/output.
Could anyone give me a hint on how to start, or any other possible solution for this?
Here is a runnable JAR (current development state, requires Java 7) and the Sourceforge page.
Forget about neural networks. This is a highly technical and specialized field of artificial intelligence, which is probably not suitable for your problem and requires solid expertise. Besides, there are a lot of simpler and better solutions for your problem.
First obvious solution: build a list of words and synonyms for all your sections and parse for these synonyms. You can then collect comments online for synonym analysis, or parse the comments/sections provided by your users to statistically detect relations between words, etc.
There is an infinite number of possible solutions, ranging from the simplest to the most overkill. Now you need to decide whether this feature of your system is critical (prefilling? probably not, then)... and what any development effort will bring you. One hour of work could give you an 80%-satisfying feature, while aiming for 90% could cost a week of work. Is it really worth it?
Go for the simplest solution and tackle the real challenge of any dev project: delivering. Once your app is delivered, then you can always go back and improve as needed.
// simple keyword check on the user's comment (case-insensitive)
String comment = paramInput.toUpperCase();
if (comment.contains("FUEL")) {
    // do the fuel functionality, e.g. preselect the matching section
}
In a simple app, if you will only ever have some specific sections, you can take the string from the comment, check whether it contains certain keywords, and change the value of the section accordingly.
If you have a lot of categories, I would use something like Apache Lucene, where you could index all the categories with their names and the potential keywords/phrases that might appear in a user's description. Then you could simply run the description through Lucene and use the top-matched category as a "best guess".
P.S. Neural network inputs and outputs will always be doubles or floats with values between 0 and 1. As for how to implement the string matching with one, I wouldn't even know where to start.
It seems to me that the following will do:
hard word statistics
maybe a stemming class (English/Spanish) which reduces a word like "lunches" to "lunch"
a list of the most frequent non-words (the, at, a, for, ...)
The best fit is a linear problem, so in theory a fit for a neural net, but why not go straight for the numerical best fit.
A machine learning algorithm such as an artificial neural network doesn't seem like the best solution here. ANNs can be used for multi-class classification (i.e. "which of the provided pre-trained classes does the input belong to?", not just "does the input represent an X?"), which fits your use case. The problem is that they are supervised learning methods, and as such you need to provide a list of pairs of keywords and classes (sections) that spans every possible input your users will provide. This is impossible, and in practice ANNs are re-trained when more data is available to produce better results and create a more accurate decision boundary / representation of the function that maps the inputs to outputs. This also assumes that you know all possible classes before you start, and that each of those classes has training input values that you provide.
The issue is that the input to your ANN (a list of characters or a numerical hash of the string) provides no context by which to classify. There's no higher level information provided that describes the word's meaning. This means that a different word that hashes to a numerically close value can be misclassified if there was insufficient training data.
(As maclema said, the output from an ANN will always be floats with each value representing proximity to a class - or a class with a level of uncertainty.)
A better solution would be to employ some kind of word-relation or synonym graph. A Bag of words model might be useful here.
Edit: In light of your comment that you don't know the sections beforehand, an easy solution to program would be to provide a list of keywords in a file that gets updated as people use the program. Simply storing a mapping of provided comments -> sections, which you will already have in your database, would allow you to filter out non-keywords (and, or, the, ...). One option is then to find the list of sections that the typed keywords belong to, suggest several sections, and let the user pick one; the feedback you get from user selections would improve future suggestions. Another would be to calculate a Bayesian probability (the probability that a word belongs to section X given the previously stored mappings) for all keywords and sections, and either take the modal section or normalise over each unique keyword and take the mean. The probabilities will of course need to be updated as you gather more information; perhaps this could be done with every new addition in a background thread.
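As a rough sketch of the keyword-to-section counting idea (all names are illustrative; stop-word filtering and stemming are omitted):

import java.util.HashMap;
import java.util.Map;

// Sketch only: count how often each (word, section) pair has been seen, then score a new
// comment by summing the per-section counts of its words and suggesting the best match.
public class SectionSuggester {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>(); // word -> (section -> count)

    public void learn(String comment, String section) {
        for (String word : tokenize(comment)) {
            counts.computeIfAbsent(word, w -> new HashMap<>()).merge(section, 1, Integer::sum);
        }
    }

    public String suggest(String comment) {
        Map<String, Integer> score = new HashMap<>();
        for (String word : tokenize(comment)) {
            Map<String, Integer> perSection = counts.get(word);
            if (perSection != null) {
                perSection.forEach((section, c) -> score.merge(section, c, Integer::sum));
            }
        }
        return score.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);  // no suggestion when nothing matched
    }

    private String[] tokenize(String comment) {
        return comment.toLowerCase().split("[^\\p{L}]+");
    }
}

Every time the user confirms or corrects a suggestion you would call learn(comment, chosenSection), so the suggestions improve with use; replacing the raw counts with the Bayesian probabilities described above is a straightforward refinement.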
I’m thinking of adding a feature to the TalkingPuffin Twitter client, where, after some training with the user, it can rank incoming tweets according to their predicted value. What solutions are there for the Java virtual machine (Scala or Java preferred) to do this sort of thing?
This is a classification problem, where you essentially want to learn a function y(x) which predicts whether 'x', an unlabeled tweet, belongs in the class 'valuable' or in the class 'not valuable'.
The trickiest bits here are not the algorithm (Naive Bayes is just counting and multiplying and is easy to code!) but:
Gathering the training data
Defining the optimal feature set
For the first, I suggest you track tweets that the user favorites, replies to, and retweets; for the second, look at qualities like who wrote the tweet, the words in the tweet, and whether or not it contains a link.
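To make the "counting and multiplying" concrete, here is a minimal sketch of a two-class Naive Bayes over word presence; features such as author or link presence are left out, and all names are illustrative.

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: word counts per class, Laplace smoothing, and a log-probability comparison.
public class TweetClassifier {
    private final Map<String, int[]> wordCounts = new HashMap<>(); // word -> {valuable, notValuable}
    private final int[] classCounts = new int[2];                  // {valuable tweets, not-valuable tweets}

    public void train(String tweet, boolean valuable) {
        int c = valuable ? 0 : 1;
        classCounts[c]++;
        for (String w : words(tweet)) {
            wordCounts.computeIfAbsent(w, k -> new int[2])[c]++;
        }
    }

    public boolean isValuable(String tweet) {
        double[] logProb = new double[2];
        int vocabularySize = wordCounts.size() + 1;
        for (int c = 0; c < 2; c++) {
            final int cls = c;
            logProb[cls] = Math.log((classCounts[cls] + 1.0) / (classCounts[0] + classCounts[1] + 2.0));
            int wordsInClass = wordCounts.values().stream().mapToInt(a -> a[cls]).sum();
            for (String w : words(tweet)) {
                int count = wordCounts.getOrDefault(w, new int[2])[cls];
                logProb[cls] += Math.log((count + 1.0) / (wordsInClass + vocabularySize));
            }
        }
        return logProb[0] > logProb[1];
    }

    private Set<String> words(String tweet) {
        return new HashSet<>(Arrays.asList(tweet.toLowerCase().split("\\s+")));
    }
}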
Doing this well is not easy. Google would love to be able to do such things ("What links will the user value"), as would Netflix ("What movies will they value") and many others. In fact, you'd probably do well to read through the notes about the winning entry for the Netflix Prize.
Then you need to extract a bunch of features, as #hmason says. And then you need an appropriate machine learning algorithm; you either need a function approximator (where you try to use your features to predict a value between, say, 0 and 1, where 1 is "best tweet ever" and 0 is "omg who cares") or a classifier (where you use your features to try to predict whether it's a "good" or "bad" tweet).
If you go for the latter--which makes user-training easy, since they just have to score tweets with "like" (to mix social network metaphors)--then you typically do best with support vector machines, for which there exists a fairly comprehensive Java library.
In the former case, there are a variety of techniques that might be worth trying; if you decide to use the LIBSVM library, they have variants for regression (i.e. parameter estimation) as well.