I know this is kind of hard to understand but bear with me here, I'm taking a statistics class and I wanted to automate the process, so what we do is take some data, take the minimum number and maximum, get the range, all easy and already done.
But here is the trick, we need to split the data between classes, for example user entered 10 data points ranging between the numbers 10 and 40, what you wanna do is create classes to fit each number, class one is the numbers between 10 and 15, class 2 is 15 and 20 and so on (this was just an example).. thinking about it first you can just make a shit ton of if conditions, but that doesn't seem intuitive and second all data and class width and number of classes are the user's choice, so how do you suggest I approach this problem?
To help you understand here is a better example:
STATS
Im trying to better organise the types of tasks regularly sent to my team based off of the titles and short comment people enter.
Our team only handles a handful of issues (maybe 10 or so) different types of tasks, so I've put together a list of common words used within the description of a particular type of task and i've been using this to categorise the issues. for example.... an issue might come through like "User x doesn't have access to office after hours, please update their swipecard access level". what i've got so far is if the comments contain 'swipecard' or 'access', its a building access type request.
I've quickly found myself with code that's LOTS of ... if contains, and if !contains...
Is there a neater way of doing what im after?
If you want to make it complex, it sounds like you have a classification problem.
If you want to keep it simple, you're probably on the right track with your if statements and contains(). To get to a cleaner solution, I would approach it as follows:
Create a class to modify your categories - give it two attributes: String categoryName, List<String> commonlyUsedWords;
Populate a list with instances of that class - one per type.
For each issue, loop through the list of categories and check how many words match, and store that as a percentage (e.g. 8 out of 10 words match, therefore 80% match).
Return the category with the highest match rate.
I was reading through a few active SO questions and I came across this one that reminded me of a question that I have had for a while.
In a few applications that I have done, there came a point when I had to use a view that uses category totals to display some easy to read values, such as cost, or counts, or whatever. Up until now, I have always had to find workarounds for these totals because I could not get them to show up in the dynamicViewPanels, or anything else of the like. My solutions have always been, (as David Leedy suggests in the linked question) static HTML tables displaying the counts with the category, View Navigators which display the information in repeaters, and building complex dialogs which allow the user to pick certain information to get a calculation table with the appropriate values and formulas etc.
My question is, did I overlook something in the controls that actually allow these column totals to show up in the existing view panel controls?
EDIT
Just for the sake of clarity, I am not talking about a column with a calculated value, but really the totals for categories.
There is a ready baked function that uses a ViewNavigator to get to values from category totals. That just might do the trick for you.
I've encountered JBehave recently and I think we should use it. So I have called in the tester of our team and he also thinks that this should be used.
With that as starting point I have asked the tester to write stories for a test application (the Bowling Game Kata of Uncle Bob). At the end of the day we would try to map his tests against the bowling game.
I was expecting a test like this:
Given a bowling game
When player rolls 5
And player rolls 4
Then total pins knocked down is 9
Instead, the tester came with 'logical tests', in other words he was not being that specific. But, in his terms this was a valid test.
Given a bowling game
When player does a regular throw
Then score should be calculated appropriately
My problem with this is ambiguity, what is a 'regular throw'? What is 'appropriately'? What will it mean when one of those steps fail?
However, the tester says that a human does understand and that what I was looking for where 'physical tests', which where more cumbersome to write.
I could probably map 'regular' with rolling two times 4 (still no spare, nor strike), but it feels like I am again doing a translation I don't want to make.
So I wonder, how do you approach this? How do you write your JBehave tests? And do you have any experience when it is not you who writes these tests, and you have to map them to your code?
His test is valid, but requires a certain knowledge of the domain, which no framework will have. Automated tests should be explicit, think of them as examples. Writing them costs more than writing "logical tests", but this pays in the long run since they can be replayed at will, very quickly, and give an immediate feedback.
You should have paired with him writing the first tests, to put it in the right direction. Perhaps you could give him your test, and ask him to increase the coverage by adding new tests.
The amount of explicitness needed in acceptance criteria depends on level of trust between the development team and the business stakeholders.
In your example, the business is assuming that the developers/testers understand enough about bowling to determine the correct outcome.
But imagine a more complex domain, like finance. For that, it would probably be better to have more explicit examples to ensure a good understanding of the requirement.
Alternatively, let's say you have a scenario:
Given I try to sign up with an invalid email address
Then I should not be registered
For this, a developer/tester probably has better knowledge of what constitutes a valid or invalid email address than the business stakeholder does. You would still want to test against a variety of addresses, but that can be specified within the step definitions, rather than exposing it at the scenario level.
I hate such vague words as "appropriately" in the "expected values". The "appropriately" is just an example of "toxic word" for the testing, and if not eliminated, this "approach" can get widespread, effectively killing the testing in general. It might "be enough" for human tester, but such "test cases" are acceptable only at first attempts to exploratory "smoke test".
Whatever reproducible, systematical and automatable, every test case must be specific. (not just "should".. to assume the softness of "would" could be allowed? Instead I use the present tense "shall be" or even better strict "is", as a claim to confirm/refuse.) And this rule is absolute once it comes to automation.
What your tester made, was rather a "test-area", a "scenario template", instead of a real test-case: Because so many possible test-results can be produced...
You were specific, in your scenario: That was a very specific real "test case". It is possible to automate your test case, nice: You can delegate it on a machine and evaluate it as often as you need, automatically. (with the bonus of automated report, from an Continuous Integration server)
But the "empty test scenario template"? It has some value too: It is a "scenario template", an empty skeleton prepared to be filled by data: So I love to name these situations "DDT": "Data Driven Testing".
Imagine a web-form to be tested, with validations on its 10 inputs, with cross-validations... And the submit button. There can be 10 test-cases for every single input:
empty;
with a char, but still too short anyway;
too long for the server, but allowed within the form for copy-paste and further edits;
with invalid chars...
The approach I recommend is to prepare a set of to-pass data: even to generate them (from DB or even randomly), whatever you can predict shall pass the test, the "happy scenario". Keep the data aside, as a data-template, and use it to initialize the form, to fill it up, and then to brake-down some single value: Create test cases "to fail". Do it i.e. 10 times for every single input, for each of the 10 inputs (100 tests-cases even before cross-rules attempted) ... and then, after the 100 times of the refusing of the form by the server, fill up the form by the to-pass data, without distorting them, so the form can be accepted finally. (accepted submit changes status on the server-app, so needs to go as the last one, to test all the 101 cases on the same app-state)
To do your test this way, you need two things:
the empty scenario template,
and a table of 100 rows of data:
10 columns of input data: with only one value manipulated, as passing row by row down the table (i.e. ever heard about grey-code?),
possibly keeping the inheritance history in a row-description, where from is the row derived and how, via which manipulated value.
Also the 11th column, the "expected result" column(s) filled: to pass/fail expected status, expected err/validation message, reference to the requirements, for the test-coveradge tracking. (i.e. ever seen FitNesse?)
And possibly also the column for the real detected result, when test performed, to track history of the single row-test-case. (so the CI server mentioned already)
To combine the "empty scenario skeleton" on one side and the "data-table to drive the test" on the other side, some mechanism is needed, indeed. And your data need to be imported. So you can prepare the rows in excel, which could be theoretically imported too, but for the easier life I recommend either CSV, properties, XML, or just any machine&human readable format, textual format.
His 'logical test' has the same information content as the phrase 'test regular bowling score' in a test plan or TODO list. But it is considerably longer, therefor worse.
Using jbehave at all only makes sense in the case the test team are responsible for generating tests with more information in them than that. Otherwise, it would be more efficient to take the TODO list and code it up in JUnit.
And I love words like "appropriately" in the "expected values". You need to use cucumber or other wrappers as the generic documentation. If you're using it to cover and specify all possible scenarios you're probably wasting a lot of your time scrolling through hundred of feature files.
I have a database full of two different types of users (Mentors and Mentees), whereby I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself, this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), while looking at the code shows that the message should only be outputted every 10000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time, why's that?
14 minutes sounds way, way too long to load such a small amount of data too, something's wrong. You might follow up with more info at user#mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? I might suggest LogLikelihoodSimilarity if not.