How do I unit test generated strings where the exact ordering is fairly flexible? Let's say I'm trying to test some code that prints out generated SQL built from key-value pairs. However, the exact order of many of the fragments does not matter.
For example
SELECT
*
FROM
Cats
WHERE
fur = 'fluffy'
OR
colour = 'white'
is functionally identical to
SELECT
*
FROM
Cats
WHERE
colour = 'white'
OR
fur = 'fluffy'
It doesn't matter in which order the condition clauses are generated, but it does matter that they follow the WHERE clause. The order is also hard to predict, since iterating over the entrySet() of a HashMap has no defined order. Sorting the keys would fix that, but introduces a runtime penalty for no (or negative) business value.
How do I unit test the generation of such strings without over-specifying the order?
I thought about using a regex, but I could not think of how to write one that says: "SELECT * FROM Cats WHERE" followed by one of {"fur = 'fluffy'", "colour = 'white'"}, followed by "OR", followed by one of {"fur = 'fluffy'", "colour = 'white'"} ... and not the one used last time.
NB: I'm not actually doing this with SQL, it just made for an easier way to frame the problem.
I see a few different options:
If you can live with a modest runtime penalty, LinkedHashMap keeps insertion order.
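For example, a quick sketch:

Map<String, String> params = new LinkedHashMap<>();
params.put("fur", "fluffy");
params.put("colour", "white");
// entrySet() now always yields "fur" before "colour",
// so the generated fragments come out in a deterministic order.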
If you want to solve this completely without changing your implementation, in your example I don't see why you should have to do anything more complicated than checking that every fragment appears in the generated string, and that each appears after the WHERE. In code:
Map<String, String> parametersAndValues = new LinkedHashMap<>();
parametersAndValues.put("fur", "fluffy");
parametersAndValues.put("colour", "white");
String generatedSql = generateSql(parametersAndValues);
int whereIndex = generatedSql.indexOf("WHERE");
for (Map.Entry<String, String> entry : parametersAndValues.entrySet()) {
    String fragment = String.format("%s = '%s'", entry.getKey(), entry.getValue());
    assertThat(generatedSql, containsString(fragment));
    assertThat(whereIndex, is(lessThan(generatedSql.indexOf(fragment))));
}
But we can make it even simpler than that. Since you don't actually have to test this with a large set of parameters - for most implementations there are only three important quantities, "none, one, or many" - it's actually feasible to test against all possible orderings:
String variation1 = "SELECT ... WHERE fur = 'fluffy' OR colour = 'white'";
String variation2 = "SELECT ... WHERE colour = 'white' OR fur = 'fluffy'";
assertThat(generatedSql, anyOf(is(variation1), is(variation2)));
Edit: To avoid writing all possible variations by hand (which gets rather tedious when you have more than two or three items, as there are n! ways to order n items), you could have a look at an algorithm for generating all permutations of a sequence and do something like this:
List<List<String>> permutations = allPermutationsOf("fur = 'fluffy'",
        "colour = 'white'", "scars = 'numerous'", "disposition = 'malignant'");
List<String> allSqlVariations = new ArrayList<>(permutations.size());
for (List<String> permutation : permutations) {
    allSqlVariations.add("SELECT ... WHERE " + String.join(" OR ", permutation));
}
assertThat(generatedSql, isIn(allSqlVariations));
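allPermutationsOf is not a standard library method; a minimal recursive sketch (fine for the small n used in tests, since n! grows quickly) could look like this:

static List<List<String>> allPermutationsOf(String... items) {
    List<List<String>> result = new ArrayList<>();
    permute(new ArrayList<>(Arrays.asList(items)), new ArrayList<>(), result);
    return result;
}

static void permute(List<String> remaining, List<String> current,
        List<List<String>> result) {
    if (remaining.isEmpty()) {
        result.add(new ArrayList<>(current)); // one complete ordering
        return;
    }
    for (int i = 0; i < remaining.size(); i++) {
        String item = remaining.remove(i); // choose
        current.add(item);
        permute(remaining, current, result);
        current.remove(current.size() - 1); // un-choose (backtrack)
        remaining.add(i, item);
    }
}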
Well, one option would be to somehow parse the SQL, extract the list of fields and check that everything is OK, disregarding the order of the fields. However, this is going to be rather ugly: if done right, you have to implement a complete SQL parser (obviously overkill); if you do it quick-and-dirty using regex or similar, you risk that the test will break on minor changes to the generated SQL.
Instead, I'd propose to use a combination of unit and integration testing:
Have a unit test that tests the code which supplies the list of fields for building the SQL. I.e., have a method Map getRestrictions() which you can easily unit-test.
Have an integration test for the SQL generation as a whole, which runs against a real database (maybe some embedded DB like the H2 database, which you can start just for the test).
That way, you unit-test the actual values supplied to the SQL, and you integration-test that you are really creating the right SQL.
Note: In my opinion this is an example of "integration code", which cannot be usefully unit-tested. The problem is that the code does not produce a real, testable result by itself. Rather, its purpose is to interface with a database (by sending it SQL), which produces the result. In other words, the code does the right thing not if it produces some specific SQL string, but if it drives the database to do the right thing. Therefore, this code can be meaningfully tested only with the database, i.e. in an integration test.
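For the integration half, a minimal sketch against an in-memory H2 database might look like this (the table, the test data, and the generateSql method under test are illustrative, not from the original code):

@Test
public void generatedSqlRunsAgainstARealDatabase() throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:test")) {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE Cats (fur VARCHAR(20), colour VARCHAR(20))");
            st.execute("INSERT INTO Cats VALUES ('fluffy', 'black')");
        }
        Map<String, String> params = new HashMap<>();
        params.put("fur", "fluffy");
        params.put("colour", "white");
        String sql = generateSql(params); // the code under test
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            assertTrue(rs.next()); // the fluffy cat matches the OR condition
        }
    }
}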
First, use a LinkedHashMap instead of a regular HashMap. It shouldn't introduce any noticeable performance degradation. Rather than sorting, it retains insertion ordering.
Second, insert the pairs into the map in a well understood manner. Perhaps you are getting the data from a table, and adding an ordering index is unacceptable. But perhaps the database can be ordered by primary key or something.
Combined, those two changes should give you predictable results.
Alternatively, compare actual vs. expected using something smarter than string equals. Perhaps a regex to scrape out all the pairs that have been injected into the actual SQL query?
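A sketch of that idea: scrape the key = 'value' pairs out of both strings with a regex and compare them as sets, which ignores ordering entirely (extractPairs is an illustrative helper):

static Set<String> extractPairs(String sql) {
    Set<String> pairs = new HashSet<>();
    Matcher m = Pattern.compile("(\\w+) = '([^']*)'").matcher(sql);
    while (m.find()) {
        pairs.add(m.group(1) + "=" + m.group(2)); // e.g. "fur=fluffy"
    }
    return pairs;
}
// Then: assertEquals(extractPairs(expectedSql), extractPairs(actualSql));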
The best I have come up with so far is to use some library during testing (such as PowerMockito) to replace the HashMap with a SortedMap like TreeMap. That way the order is fixed for the tests. However, this only works if the map isn't built in the same code that generates the string.
Related
I am using JDBI to iterate through a result set via streams. Currently mapToMap is causing problems when there are columns with the same name in the result. What I need is just the values, without the column names.
Is there a way to map the results to an Object list/array? The docs do not have an example for this. I would like to have something like:
query.mapTo(List<Object>.class).useStream(s -> { . . .})
First of all - what kind of use case would allow you not to care at all about the column names but only the values? I am genuinely curious.
If it does make sense, it is trivial to implement a RowMapper<List<Object>> in your case, which runs through all the columns by index and puts the results of rs.getObject(i) into a list.
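A minimal sketch of such a mapper (assuming JDBI v3; the class name is illustrative):

public class ObjectListMapper implements RowMapper<List<Object>> {
    @Override
    public List<Object> map(ResultSet rs, StatementContext ctx) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        List<Object> row = new ArrayList<>(columns);
        for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
            row.add(rs.getObject(i)); // value only, column name ignored
        }
        return row;
    }
}

// Usage: query.map(new ObjectListMapper()).useStream(s -> { ... })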
Core question: how do you properly fetch information from a query into objects?
Idea
I am creating functions in my DAO which come down to the following query:
select A.*, count(*)
from A
left join B on B.aId = A.aId
group by A.*
I'm looking for a way to create a jOOQ expression that just gives me a list (or something I can loop over) with objects A (a POJO) and an Integer.
Concrete case
In my code's case: A = Volunteer and B = VolunteerMatch, where I store several matches for each volunteer. B has (volunteerId, volunteerMatchId) as its primary key. Thus this query results in both the information from the Volunteer and the number of matches. Clearly this could be done in two separate queries, but I want to do it in one!
Problem
I cannot find a single object to return from my function. I am trying to get something like List<VolunteerPojo, Integer>. Let me explain better, using examples, why the options I tried don't fit for me.
What I tried 1
SelectHavingStep<Record> query = using(configuration())
.select(Volunteer.VOLUNTEER.fields())
.select(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.count())
.from(Volunteer.VOLUNTEER)
.leftJoin(Volunteermatch.VOLUNTEERMATCH).on(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.eq(Volunteer.VOLUNTEER.VOLUNTEERID))
.groupBy(Volunteer.VOLUNTEER.fields());
Map<VolunteerPojo, List<Integer>> map = query.fetchGroups(
r -> r.into(Volunteer.VOLUNTEER).into(VolunteerPojo.class),
r -> r.into(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.count()).into(Integer.class)
);
The problem with this is that I get a List of Integers. But that is not what I want; I want a single Integer (the count will always return one row). Note: I don't want the solution "just build your own map without the list", since my gut says there is a solution inside jOOQ. I'm here to learn!
What I tried 2
SelectHavingStep<Record> query = using(configuration())
.select(Volunteer.VOLUNTEER.fields())
.select(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.count())
.from(Volunteer.VOLUNTEER)
.leftJoin(Volunteermatch.VOLUNTEERMATCH).on(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.eq(Volunteer.VOLUNTEER.VOLUNTEERID))
.groupBy(Volunteer.VOLUNTEER.fields());
Result<Record> result = query.fetch();
for (Record r : result) {
VolunteerPojo volunteerPojo = r.into(Volunteer.VOLUNTEER).into(VolunteerPojo.class);
Integer count = r.into(Volunteermatch.VOLUNTEERMATCH.VOLUNTEERID.count()).into(Integer.class);
}
However, I do not want to return the Result object from my function. Everywhere I call this function I would have to repeat the r.into(...).into(...) calls, and the compiler won't catch it if the conversion yields an Integer where a real POJO was expected. I want to prevent such future errors. But at least it doesn't wrap the count in a List, I suppose.
Reasoning
Either option is probably fine, but I have the feeling there is something better that I missed in the documentation. Maybe I can adapt (1) to not get a list of integers. Maybe I can change Result<Record> into something like Result<VolunteerPojo, Integer> to indicate what objects really are returned. A solution for each problem would be nice, since I am using jOOQ more and more and this would be a good learning experience!
So close! Don't use ResultQuery.fetchGroups(). Use ResultQuery.fetchMap() instead:
Map<VolunteerPojo, Integer> map =
using(configuration())
.select(VOLUNTEER.fields())
.select(VOLUNTEERMATCH.VOLUNTEERID.count())
.from(VOLUNTEER)
.leftJoin(VOLUNTEERMATCH)
.on(VOLUNTEERMATCH.VOLUNTEERID.eq(VOLUNTEER.VOLUNTEERID))
.groupBy(VOLUNTEER.fields())
.fetchMap(
r -> r.into(VOLUNTEER).into(VolunteerPojo.class),
r -> r.get(VOLUNTEERMATCH.VOLUNTEERID.count())
);
I have two sets of data.
Let's say one is people, the other is groups.
A person can be in multiple groups, while a group can have multiple people.
My operations will basically be CRUD on groups and people,
as well as a method that makes sure a list of people are all in different groups (which gets called a lot).
Right now I'm thinking of making a table of binary 0s and 1s, with a row per person and a column per group.
I can perform the method in O(n) time by adding each person's row of bits and comparing the sum with the bitwise OR of the same rows: the two are equal exactly when no group is shared.
E.g.
Group  A B C D
ppl1   1 0 0 1
ppl2   0 1 1 0
ppl3   0 0 1 0
ppl4   0 1 0 0

check(ppl1, ppl2) = (1001 + 0110) == (1001 | 0110)
                  = 1111 == 1111
                  = true
check(ppl2, ppl3) = (0110 + 0010) == (0110 | 0010)
                  = 1000 == 0110
                  = false
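In code, the check I have in mind amounts to something like this (sketched with plain long masks, one bit per group):

static boolean inDifferentGroups(long groupsOfP1, long groupsOfP2) {
    // The sum equals the bitwise OR exactly when no group bit overlaps.
    return (groupsOfP1 + groupsOfP2) == (groupsOfP1 | groupsOfP2);
}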
I'm wondering if there is a data structure that does something similar already so I don't have to write my own and maintain O(n) runtime.
I don't know all of the details of your problem, but my gut instinct is that you may be overthinking things here. How many objects are you planning on storing in this data structure? If you have really large amounts of data to store, I would recommend using an actual database instead of a data structure. The type of operations you are describing are classical examples of things relational databases are good at. MySQL and PostgreSQL are examples of large-scale relational databases that could do this sort of thing in their sleep. If you'd like something lighter-weight, SQLite would probably be of interest.
If you do not have large amounts of data to store in this data structure, I'd recommend keeping it simple and only optimizing when you are sure it won't be fast enough for what you need. As a first shot, I'd just recommend using Java's built-in List interface to store your people and a Map to store groups. You could do something like this:
// Use a list to keep track of People
List<Person> myPeople = new ArrayList<Person>();
Person steve = new Person("Steve");
myPeople.add(steve);
myPeople.add(new Person("Bob"));
// Use a Map to track Groups
Map<String, List<Person>> groups = new HashMap<String, List<Person>>();
groups.put("Everybody", myPeople);
groups.put("Developers", Arrays.asList(steve));
// Does a group contain everybody?
groups.get("Everybody").containsAll(myPeople); // returns true
groups.get("Developers").containsAll(myPeople); // returns false
This definitely isn't the fastest option available, but if you do not have a huge number of People to keep track of, you probably won't even notice any performance issues. If you do have special conditions that would make the speed of regular Lists and Maps infeasible, please post them and we can make suggestions based on those.
EDIT:
After reading your comments, it appears that I misread your issue on the first run through. It looks like you're not so much interested in mapping groups to people, but instead mapping people to groups. What you probably want is something more like this:
Map<Person, List<String>> associations = new HashMap<Person, List<String>>();
Person steve = new Person("Steve");
Person ed = new Person("Ed");
associations.put(steve, Arrays.asList("Everybody", "Developers"));
associations.put(ed, Arrays.asList("Everybody"));
// This is the tricky part
boolean sharesGroups = checkForSharedGroups(associations, Arrays.asList(steve, ed));
So how do you implement the checkForSharedGroups method? In your case, since the numbers surrounding this are pretty low, I'd just try out the naive method and go from there.
public boolean checkForSharedGroups(
Map<Person, List<String>> associations,
List<Person> peopleToCheck){
List<String> groupsThatHaveMembers = new ArrayList<String>();
for(Person p : peopleToCheck){
List<String> groups = associations.get(p);
for(String s : groups){
if(groupsThatHaveMembers.contains(s)){
// We've already seen this group, so we can return
return false;
} else {
groupsThatHaveMembers.add(s);
}
}
}
// If we've made it to this point, nobody shares any groups.
return true;
}
This method probably doesn't have great performance on large datasets, but it is very easy to understand. Because it's encapsulated in its own method, it should also be easy to update if it turns out you need better performance. If you do need to increase performance, I would look at overriding the equals and hashCode methods of Person, which would make lookups in the associations map faster. From there you could also look at a custom type instead of String for groups, likewise with overridden equals and hashCode. This would considerably speed up the contains method used above.
The reason why I'm not too concerned about performance is that the numbers you've mentioned aren't really that big as far as algorithms are concerned. Because this method returns as soon as it finds two matching groups, in the very worst case you will call ArrayList.contains a number of times equal to the number of groups that exist. In the very best case it only needs to be called twice. Performance will likely only be an issue if you call checkForSharedGroups very, very often, in which case you might be better off finding a way to call it less often instead of optimizing the method itself.
Have you considered a hash table? If you know all of the keys you'll be using in advance, you can use a perfect hash function, which allows you to achieve constant lookup time.
How about having two separate entities for People and Group? Inside People, keep a set of Groups, and vice versa.
class People {
    Set<Group> groups;
    // API for addGroup, getGroups
}

class Group {
    Set<People> people;
    // API for addPeople, getPeople
}
check(People p1, People p2):
1) Call getGroups() on both p1 and p2.
2) Compare the sizes of the two sets.
3) Iterate over the smaller set and check whether each group is present in the other set (as sketched below).
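A minimal sketch of that check, assuming a getGroups() accessor on People; Collections.disjoint already does the "iterate one, probe the other" work:

static boolean inDifferentGroups(People p1, People p2) {
    // Collections.disjoint returns true when the two sets share no element.
    return Collections.disjoint(p1.getGroups(), p2.getGroups());
}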
Now, you can store the People objects in basically any data structure - preferably a linked list if the size is not fixed, otherwise an array.
Say I have a Hashtable<String, Object> with such keys and values:
apple => 1
orange => 2
mossberg => 3
I can use the standard get method to get 1 by "apple", but what I want is to get the same value (or a list of values) by a part of the key, for example "ppl". Of course it may yield several results; in this case I want to be able to process each matching key-value pair. So basically something similar to the SQL statement LIKE '%ppl%', but I don't want to use an (in-memory) database just because I don't want to add unnecessary complexity. What would you recommend?
Update:
Storing the data in a Hashtable isn't a requirement. I'm looking for a general approach to solve this.
The obvious brute-force approach would be to iterate through the keys in the map and match them against the char sequence. That could be fine for a small map, but of course it does not scale.
This could be improved by using a second map to cache search results. Whenever you collect a list of keys matching a given char sequence, you can store these in the second map so that next time the lookup is fast. Of course, if the original map is changed often, it may get complicated to update the cache. As always with caches, it works best if the map is read much more often than changed.
Alternatively, if you know the possible char sequences in advance, you could pre-generate the lists of matching strings and pre-fill your cache map.
Update: Hashtable is not recommended anyway - it is synchronized, and thus much slower than it should be. You are better off using HashMap if no concurrency is involved, or ConcurrentHashMap otherwise; the latter outperforms Hashtable by far.
Apart from that, off the top of my head I can't think of a better collection for this task than maps. Of course, you may experiment with different map implementations to find the one which best suits your specific circumstances and usage patterns. In general, it would thus be:
Map<String, Object> fruits;
Map<String, List<String>> matchingKeys;
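A sketch of the lookup with the cache in front of the scan, using the two maps above (names are illustrative):

List<String> keysMatching(String part) {
    List<String> cached = matchingKeys.get(part);
    if (cached != null) {
        return cached; // cache hit: no scan needed
    }
    List<String> matches = new ArrayList<>();
    for (String key : fruits.keySet()) {
        if (key.contains(part)) {
            matches.add(key);
        }
    }
    matchingKeys.put(part, matches); // remember for next time
    return matches;
}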
Not without iterating explicitly. A Hashtable is designed to go from (exact) key to value in O(1), nothing more, nothing less. If you will be doing query operations over large amounts of data, I recommend you consider a database. You can use an embedded one like SQLite (see SQLiteJDBC), so no separate process or installation is required. You then also get the option of database indexes.
I know of no standard Java collection that can do this type of operation efficiently.
Sounds like you need a trie with references to your data. A trie stores strings and lets you search for strings by prefix. I don't know the Java standard library too well and I have no idea whether it provides an implementation, but one is available here:
http://www.cs.duke.edu/~ola/courses/cps108/fall96/joggle/trie/Trie.java
Unfortunately, a trie only lets you search by prefixes. You can work around this by storing every possible suffix of each of your keys:
For 'apple', you'd store the strings
'apple'
'pple'
'ple'
'le'
'e'
That would allow you to search for every prefix of every suffix of your keys - in other words, every substring.
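A quick sketch of that suffix trick, using a sorted map in place of a real trie (suffixIndex is an illustrative name):

// data is the original Map<String, Object> of keys to values
Map<String, Set<String>> suffixIndex = new TreeMap<>();
for (String key : data.keySet()) {
    for (int i = 0; i < key.length(); i++) {
        // every suffix points back to the key it came from
        suffixIndex.computeIfAbsent(key.substring(i), k -> new HashSet<>()).add(key);
    }
}
// A prefix lookup in suffixIndex now behaves like a contains-lookup on the keys.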
Admittedly, this is the kind of "solution" that would prompt me to continue looking for other options.
First of all, use a HashMap, not a Hashtable.
Then you can filter the map using a predicate, with the utilities in Google Guava:
public Collection<Object> getValues(final String part) {
    Map<String, Object> filtered = Maps.filterKeys(map, new Predicate<String>() {
        @Override
        public boolean apply(String key) {
            return key.contains(part); // keep keys containing the fragment
        }
    });
    return filtered.values();
}
Can't be done in a single operation
You may want to try to iterate the keys and use the ones that contain your desired string.
The only solution I can see (I'm not a Java expert) is to iterate over the keys and match each against a regular expression. Whenever one matches, you put the key-value pair into the map that will be returned.
If you can somehow reduce the problem to searching by prefix, you might find a NavigableMap helpful.
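For example, with a TreeMap you can pull out a whole prefix range without scanning every key (a sketch; exotic characters in keys are ignored here):

NavigableMap<String, Object> map = new TreeMap<>();
String prefix = "app";
// Keys starting with the prefix lie in this half-open range;
// '\uffff' sorts above any character realistically found in keys.
SortedMap<String, Object> withPrefix = map.subMap(prefix, prefix + '\uffff');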
It may be interesting for you to look through this question: Fuzzy string search library in Java.
Also take a look at Lucene (see the second answer there).
What is the best solution, in terms of performance and readability/good coding style, for representing a (Java) enum (a fixed set of constants) on the DB layer: an integer (or any number datatype in general) versus a string representation?
Caveat: Some database systems support enums directly, but this would require keeping the database enum definition in sync with the business-layer implementation. Furthermore, this kind of datatype might not be available on all database systems, and its syntax might differ. I am looking for a solution that is easy to manage and available on all database systems. (So my question only addresses the number vs. string representation.)
The number representation of a constant seems very efficient to store (as an integer it consumes, for example, only two bytes) and is most likely very fast in terms of indexing, but it is hard to read ("0" vs. "1", etc.).
The string representation is more readable (storing "enabled" and "disabled" rather than "0" and "1"), but consumes much more storage space and is most likely also slower in regard to indexing.
My question is: did I miss any important aspects? What would you suggest to use for an enum representation on the database layer?
Thank you very much!
In most cases, I prefer to use a short alphanumeric code, and then have a lookup table with the expanded text. When necessary I build the enum table in the program dynamically from the database table.
For example, suppose we have a field that is supposed to contain, say, transaction type, and the possible values are Sale, Return, Service, and Layaway. I'd create a transaction type table with code and description, make the codes maybe "SA", "RE", "SV", and "LY", and use the code field as the primary key. Then in each transaction record I'd post that code. This takes less space than an integer key in the record itself and in the index. Exactly how it is processed depends on the database engine but it shouldn't be dramatically less efficient than an integer key. And because it's mnemonic it's very easy to use. You can dump a record and easily see what the values are and likely remember which is which. You can display the codes without translation in user output and the users can make sense of them. Indeed, this can give you a performance gain over integer keys: In many cases the abbreviation is good for the users -- they often want abbreviations to keep displays compact and avoid scrolling -- so you don't need to join on the transaction table to get a translation.
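On the Java side, such codes map naturally onto an enum; a minimal sketch using the codes from the example above:

public enum TransactionType {
    SALE("SA"), RETURN("RE"), SERVICE("SV"), LAYAWAY("LY");

    private final String code;

    TransactionType(String code) { this.code = code; }

    public String getCode() { return code; }

    // Map a stored code back to the enum constant when loading records.
    public static TransactionType fromCode(String code) {
        for (TransactionType t : values()) {
            if (t.code.equals(code)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown code: " + code);
    }
}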
I would definitely NOT store a long text value in every record. Like in this example, I would not want to dispense with the transaction table and store "Layaway". Not only is this inefficient, but it is quite possible that someday the users will say that they want it changed to "Layaway sale", or even some subtle difference like "Lay-away". Then you not only have to update every record in the database, but you have to search through the program for every place this text occurs and change it. Also, the longer the text, the more likely that somewhere along the line a programmer will mis-spell it and create obscure bugs.
Also, having a transaction type table provides a convenient place to store additional information about the transaction type. Never ever ever write code that says "if whatevercode='A' or whatevercode='C' or whatevercode='X' then ..." Whatever it is that makes those three codes somehow different from all other codes, put a field for it in the transaction table and test that field. If you say, "Well, those are all the tax-related codes" or whatever, then fine, create a field called "tax_related" and set it to true or false for each code value as appropriate. Otherwise when someone creates a new transaction type, they have to look through all those if/or lists and figure out which ones this type should be added to and which it shouldn't. I've read plenty of baffling programs where I had to figure out why some logic applied to these three code values but not others, and when you think a fourth value ought to be included in the list, it's very hard to tell whether it is missing because it is really different in some way, or if the programmer made a mistake.
The only time I don't create the translation table is when the list is very short, there is no additional data to keep, and it is clear from the nature of the universe that it is unlikely to ever change, so the values can be safely hard-coded. Like true/false or positive/negative/zero or male/female. (And hey, even that last one, obvious as it seems, there are people insisting we now include "transgendered" and the like.)
Some people dogmatically insist that every table have an auto-generated sequential integer key. Such keys are an excellent choice in many cases, but for code lists, I prefer the short alpha key for the reasons stated above.
I would store the string representation, as this is easy to correlate back to the enum and much more stable. Using ordinal() would be bad because the ordinal changes if you add a new enum constant in the middle of the series, so you would have to implement your own numbering scheme.
In terms of performance, it all depends on what the enums would be used for, but it is most likely a premature optimization to develop a whole separate representation with conversion rather than just use the natural String representation.
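A minimal sketch of that round trip, relying on the enum's stable name() rather than its position-dependent ordinal():

enum Status { ENABLED, DISABLED }

// Writing: persist the name.
String dbValue = Status.ENABLED.name();    // "ENABLED"
// Reading: map the stored string back to the constant.
Status restored = Status.valueOf(dbValue);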