DB Unit should ignore order of rows - java

Is there a way to tell DbUnit to ignore the order in which rows are compared? My problem is that I do not know in which order the rows will be written to the database, but DbUnit forces me to give an ordered list.
What I want DbUnit to do is:
1. Check that the number of rows in the database and in the expected dataset match (solved: works out of the box).
2. Check that each expected row is found exactly once in the result set (NOT SOLVED).
Any ideas?

I solved this issue by sorting the rows of both the actual and the expected table, using all columns found in the expected table as the sort key. This approach might cause problems if the table you are checking is large, but in my case it is not. :-)
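import org.dbunit.Assertion;
import org.dbunit.dataset.Column;
import org.dbunit.dataset.ITable;
import org.dbunit.dataset.SortedTable;
Column[] expectedColumns = expectedTable.getTableMetaData().getColumns();
ITable sortedExpected = new SortedTable(expectedTable, expectedColumns);
ITable sortedActual = new SortedTable(actualTable, expectedColumns);
Assertion.assertEquals(sortedExpected, sortedActual);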
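The idea behind the sorted-table comparison can be shown without DbUnit. The sketch below is a simplified stand-in, not DbUnit's API: each row is assumed to be a List<String> of non-null column values, and the class and method names are made up for illustration. Sorting both sides by every column and then comparing element by element also covers the second requirement, because a row that appears twice on one side and once on the other makes the sorted lists differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: a row is modeled as a List<String> of non-null column
// values (a simplification of DbUnit's ITable), compared after sorting both sides.
final class UnorderedRowCompare {

    // Lexicographic comparator over rows: compare column by column,
    // the same idea as sorting on every column of the expected table.
    static final Comparator<List<String>> BY_ALL_COLUMNS = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // True if both tables contain the same rows with the same multiplicities,
    // regardless of the order in which the rows were written.
    static boolean sameRowsIgnoringOrder(List<List<String>> expected, List<List<String>> actual) {
        if (expected.size() != actual.size()) {
            return false; // row counts must match first
        }
        List<List<String>> e = new ArrayList<>(expected);
        List<List<String>> a = new ArrayList<>(actual);
        e.sort(BY_ALL_COLUMNS);
        a.sort(BY_ALL_COLUMNS);
        return e.equals(a); // element-wise comparison after sorting
    }

    public static void main(String[] args) {
        List<List<String>> expected = Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob"));
        List<List<String>> actual = Arrays.asList(
                Arrays.asList("2", "bob"),
                Arrays.asList("1", "alice"));
        System.out.println(sameRowsIgnoringOrder(expected, actual)); // true
    }
}
```

With DbUnit itself, SortedTable plays the role of the sort step and Assertion.assertEquals the role of the final comparison.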

Related

CLOB and CriteriaQuery

I have an entity that has a CLOB attribute:
public class EntityS {
...
@Lob
private String description;
}
To retrieve certain EntityS from the DB we use a CriteriaQuery where we need the results to be unique, so we do:
query.where(builder.and(predicates.toArray(new Predicate[predicates.size()]))).distinct(true).orderBy(builder.asc(root.<Long> get(EntityS_.id)));
If we do that we get the following error:
ORA-00932: inconsistent datatypes: expected - got CLOB
I know that's because you cannot use distinct when selecting a CLOB. But we need the CLOB. Is there a workaround for this using CriteriaQuery with Predicates and so on?
We are using an ugly workaround: getting rid of the .distinct(true) and then filtering the results ourselves, but that's crap. We only use it so we can keep developing the app; we need a better solution, and I can't seem to find one...
In case you are using Hibernate as persistence provider, you can specify the following query hint:
query.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false);
This way, "distinct" is not passed through to the SQL command, but Hibernate will take care of returning only distinct values.
See here for more information: https://thoughts-on-java.org/hibernate-tips-apply-distinct-to-jpql-but-not-sql-query/
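Whether Hibernate does it (with the hint above) or the application does it (the workaround from the question), removing duplicates in memory amounts to keeping the first occurrence of each entity while preserving the order produced by ORDER BY. A minimal sketch, assuming duplicates can be recognized by a key such as the id; the helper name distinctBy is hypothetical, and Hibernate's own implementation may differ in detail:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: keeps the first occurrence of each key, preserving the order
// in which the (ORDER BY id) query returned the rows.
final class InMemoryDistinct {

    static <T, K> List<T> distinctBy(List<T> rows, Function<T, K> keyOf) {
        // LinkedHashMap remembers insertion order, so the SQL ordering survives.
        Map<K, T> firstByKey = new LinkedHashMap<>();
        for (T row : rows) {
            firstByKey.putIfAbsent(keyOf.apply(row), row); // later duplicates are ignored
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Hypothetical ids as they might come back when the same entity is returned repeatedly
        List<Long> ids = Arrays.asList(1L, 1L, 2L, 3L, 3L);
        System.out.println(distinctBy(ids, Function.identity())); // [1, 2, 3]
    }
}
```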
Thinking outside the box - I have no idea if this will work, but perhaps it is worth a shot. (I tested it and it seems to work, but I created a table with just one column, CLOB data type, and two rows, both with the value to_clob('abcd') - of course it should work on that setup.)
To de-duplicate, compute a hash of each clob, and instruct Oracle to compute a row number partitioned by the hash value and ordered by nothing (null). Then select just the rows where the row number is 1. Something like below (t is the table I created, with one CLOB column called c).
I expect that execution time should be reasonably good. The biggest concern, of course, is collisions. How important is it that you not miss ANY of the CLOBs, and how many rows do you have in the base table in the first place? Is something like "one chance in a billion" of having a collision acceptable?
select c
from (
select c, row_number() over (partition by dbms_crypto.hash(c, 3) order by null) as rn
from t
)
where rn = 1;
Note - the user (your application, in your case) must have EXECUTE privilege on SYS.DBMS_CRYPTO. A DBA can grant it if needed.
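For readers who want to see the row_number()-per-hash trick outside Oracle, here is a rough Java analogue, assuming the CLOB contents are available as Strings. The constant 3 passed to dbms_crypto.hash corresponds to SHA-1 (DBMS_CRYPTO.HASH_SH1), so the sketch also uses SHA-1; its digests are computed over UTF-8 bytes and are not meant to match Oracle's values. Unlike order by null, which lets Oracle pick any row in a partition, this keeps the first occurrence. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the same idea outside the database: keep a value only if its
// SHA-1 digest has not been seen before (row_number() = 1 per hash partition).
final class HashDedup {

    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    static List<String> dedupByHash(List<String> clobs) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String c : clobs) {
            // add() returns false when this hash is already present, i.e. rn > 1.
            // Caveat from the answer: two different texts with colliding hashes
            // would wrongly be treated as duplicates.
            if (seen.add(sha1Hex(c))) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(dedupByHash(Arrays.asList("abcd", "abcd", "efgh"))); // [abcd, efgh]
    }
}
```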

hibernate search 4 distinct results on lucene index

In a relational database there are some normalized tables, while the data actually relevant to me is stored in a view, which is really big (about 120 million rows by 80 columns).
About 10 of the 80 columns are relevant for searching, which is to be implemented using Hibernate Search 4.3.2.
It seems logical to me that by indexing the view entity and querying only the 10 desired columns out of 80 (@Field annotation), I get lots of redundant data that differs only in the primary key.
Currently I do the following:
ScrollableResults ids = fullTextSession.createCriteria(clazz)
.addOrder(Order.asc("id"))
.add(Restrictions.ilike(field, query))
.setProjection(Projections.distinct(Projections.id()))
.scroll(ScrollMode.FORWARD_ONLY);
ArrayList<String> results = new ArrayList<String>();
while (ids.next()) {
ScrollableResults redundantResults = fullTextSession.createCriteria(clazz)
.add(Restrictions.idEq(ids.get(0)))
.setProjection(Projections.projectionList()
.add(Projections.property("name"))
.add(Projections.property("city"))
.add(Projections.property("postal"))
)
.scroll(ScrollMode.FORWARD_ONLY);
if (redundantResults.next())
results.add((String) redundantResults.get(0));
redundantResults.close();
}
ids.close();
I know I must be going wrong somewhere. My intentions are:
1. Get a distinct set of objects, matching my search criteria
2. Obtain them only using lucene index, since a DB-query is too expensive
While the step of obtaining the distinct ids seems to perform really well, the second step of getting the property data from the document is really slow. It seems to me that no queries are made to the DB during either step, which matches my intention.
I think that projections are the only way to work on the Lucene index and avoid Hibernate queries to the DB, or am I wrong?
I appreciate any advice how to achieve better search performance.

Search DB entries for a match when table has eight columns

I have to work with a POJO "Order" that has 8 fields, each of which is a column in the "order" table. The DB schema is denormalized (and, worse, deemed final and unchangeable), so now I have to write a search module that can execute a search with any combination of those 8 fields.
What approaches are there for this? Right now I get the input in a new POJO and go through eight IF statements looking for values that are not NULL. Each time I find such a value, I add it to the WHERE condition in my SELECT statement.
Is this the best I can hope for? Or is it arguably better to select on some minimum set of criteria and then iterate over the returned collection in memory, keeping only the entries that match the remaining criteria? I can provide pseudocode if that would be useful. I'm working with Java 1.7, JSF 2.2 and MySQL.
Each time I find such a value I add it to the WHERE condition in my SELECT statement.
If the values are concatenated into the SQL string, this is a prime target for SQL injection attacks!
Would something like the following work with MySQL?
SELECT *
FROM SomeTable
WHERE (@param1 IS NULL OR SomeTable.SomeColumn1 = @param1) AND
(@param2 IS NULL OR SomeTable.SomeColumn2 = @param2) AND
(@param3 IS NULL OR SomeTable.SomeColumn3 = @param3) AND
/* .... */
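The dynamic-WHERE approach from the question does not have to be injection-prone if the values are bound as parameters instead of concatenated. A minimal sketch, assuming the criteria arrive as a map from column name to value (null meaning "not searched"); the class name and column names are hypothetical, and the table name is quoted with backticks because ORDER is a reserved word in MySQL. It only uses Java 7 language features.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: build the WHERE clause only from non-null criteria, but with ?
// placeholders, so user input is bound as parameters and never concatenated.
final class OrderSearchQuery {

    final String sql;
    final List<Object> params;

    private OrderSearchQuery(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    // criteria: column name -> value, null meaning "not part of the search".
    // Column names must come from code (a fixed list), never from user input.
    static OrderSearchQuery build(Map<String, Object> criteria) {
        StringBuilder sql = new StringBuilder("SELECT * FROM `order`");
        List<Object> params = new ArrayList<>();
        String glue = " WHERE ";
        for (Map.Entry<String, Object> e : criteria.entrySet()) {
            if (e.getValue() != null) {
                sql.append(glue).append(e.getKey()).append(" = ?");
                params.add(e.getValue());
                glue = " AND "; // every further condition narrows the search
            }
        }
        return new OrderSearchQuery(sql.toString(), Collections.unmodifiableList(params));
    }

    public static void main(String[] args) {
        Map<String, Object> criteria = new LinkedHashMap<>();
        criteria.put("customer", "ACME");   // hypothetical column names
        criteria.put("status", null);       // not set, so skipped
        criteria.put("city", "Berlin");
        OrderSearchQuery q = build(criteria);
        System.out.println(q.sql);    // SELECT * FROM `order` WHERE customer = ? AND city = ?
        System.out.println(q.params); // [ACME, Berlin]
    }
}
```

The resulting SQL would then go into a PreparedStatement, binding each value with setObject(i + 1, params.get(i)).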

Can astyanax return ordered column names?

Using com.netflix.astyanax, I add entries for a given row as follows:
final ColumnListMutation<String> columnList = m.withRow(columnFamily, key);
columnList.putEmptyColumn(columnName);
Later I retrieve all my columns with:
final OperationResult<ColumnList<String>> operationResult = keyspace
.prepareQuery(columnFamily).getKey(key).execute();
operationResult.getResult().getColumnNames();
This correctly returns all the columns I have added, but the columns are not ordered according to when they were entered in the database. Since each column has a timestamp associated with it, there ought to be a way to do exactly this, but I don't see it. Is there?
Note: If there isn't, I can always change the code above to:
columnList.putColumn(ip,new Date());
and then retrieve the column values, order them accordingly, but that seems cumbersome, inefficient, and silly since each column already has a timestamp.
I know from PlayOrm that if you do column slices, they are returned in order. In fact, PlayOrm uses that to enable S-SQL in partitions and basically batches the column slicing, which comes back in order or in reverse order depending on how it is requested. You may want to do a column slice from 0 to MAXLONG.
I am not sure about getting the row though. I haven't tried that.
Oh, and PlayOrm is just a mapping layer on top of Astyanax, though it is not really relational and is more NoSQL-ish, as demonstrated by its patterns page:
http://buffalosw.com/wiki/Patterns-Page/
Cassandra will never order your columns in "insertion order".
Columns are always sorted in ascending order. What "ascending" means depends on how Cassandra interprets your column names, which you define with the comparator you set when defining your column family.
From what you posted, it looks like you use String timestamp values. If you simply serialized your timestamps as e.g. "123141" and "231", be aware that with a UTF8Type comparator "231" > "123141".
Better approach: use time-based UUIDs as column names, as many examples for time-series data in Cassandra propose. Then you can use the UUIDType comparator.
CREATE COLUMN FAMILY timeseries_data
WITH comparator = UUIDType
AND key_validation_class=UTF8Type;
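The string-ordering pitfall is easy to reproduce in plain Java, since String.compareTo is also a lexical comparison (for ASCII digits it orders the same way a UTF8Type comparator would). The padded helper is only an illustration of a different workaround, fixed-width zero padding, which is not what the answer recommends; time-based UUIDs with UUIDType remain the suggested approach. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Shows why timestamps stored as plain strings sort incorrectly under a
// lexical comparator, and one possible alternative (zero padding).
final class TimestampNameOrder {

    // Zero-pad to a fixed width so lexical order equals numeric order
    // (assumes non-negative values; 19 digits fit any non-negative long).
    static String padded(long millis) {
        return String.format("%019d", millis);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("123141", "231"));
        Collections.sort(names);   // lexical, like UTF8Type
        System.out.println(names); // [123141, 231]: "231" sorts last although 231 < 123141

        List<String> fixed = new ArrayList<>(Arrays.asList(padded(123141), padded(231)));
        Collections.sort(fixed);
        System.out.println(fixed); // the padded 231 now sorts first
    }
}
```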

Implementing result paging in hibernate (getting total number of rows)

How do I implement paging in Hibernate? The Query object has methods called setMaxResults and setFirstResult, which are certainly helpful. But where can I get the total number of results, so that I can show a link to the last page of results and print things such as "results 200 to 250 of xxx"?
You can use Query.setMaxResults(int results) and Query.setFirstResult(int offset).
Edit: there's no way to know from these alone how many results you'll get, so you must first query with "select count(*)...". A little ugly, IMHO.
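Once the total count is known, the numbers the question asks for (the last page and a "results X to Y of Z" label) are plain arithmetic. A sketch, assuming 1-based page numbers and a 1-based label; note that setFirstResult(200) with setMaxResults(50) corresponds to rows 201 to 250 in 1-based terms. The class and method names are hypothetical.

```java
// Sketch of the page arithmetic once the total count is known.
// Pages are numbered from 1 here; that convention is an assumption.
final class PageMath {

    // Value to pass to setFirstResult for a given page.
    static int firstResult(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // Number of the last page (ceiling division); at least 1 so an empty result still has a page.
    static int lastPage(long totalCount, int pageSize) {
        return (int) Math.max(1, (totalCount + pageSize - 1) / pageSize);
    }

    // Text such as "results 201 to 250 of 1234" (1-based, inclusive).
    static String rangeLabel(int page, int pageSize, long totalCount) {
        long from = Math.min((long) firstResult(page, pageSize) + 1, totalCount);
        long to = Math.min((long) page * pageSize, totalCount);
        return "results " + from + " to " + to + " of " + totalCount;
    }

    public static void main(String[] args) {
        System.out.println(firstResult(5, 50));      // 200
        System.out.println(lastPage(1234, 50));      // 25
        System.out.println(rangeLabel(5, 50, 1234)); // results 201 to 250 of 1234
    }
}
```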
You must do a separate query to get the total count. And if, between time A (when the client issues the first paging request) and time B (when another request is issued), new records are added or some records now fit the criteria, you have to query the count again to reflect that. I usually do this in HQL like this:
Long count = (Long) session.createQuery("select count(*) from ....").uniqueResult();
For Criteria queries I usually push my data into a DTO like this:
ScrollableResults scrollable = criteria.scroll(ScrollMode.SCROLL_INSENSITIVE);
if(scrollable.last()){//returns true if there is a resultset
genericDTO.setTotalCount(scrollable.getRowNumber() + 1);
criteria.setFirstResult(command.getStart())
.setMaxResults(command.getLimit());
genericDTO.setLineItems(Collections.unmodifiableList(criteria.list()));
}
scrollable.close();
return genericDTO;
You could perform two queries: a count(*) type query, which should be cheap if you are not joining too many tables together, and a second query that has the limits set. Then you know how many items exist, but only fetch the ones being viewed.
You can do the following: prepare the Criteria query as per your business requirement, with all Predicates, sorting, searching etc.,
and then do as below:
CriteriaBuilder criteriaBuilder = em.getCriteriaBuilder();
CriteriaQuery<Feedback> criteriaQuery = criteriaBuilder.createQuery(Feedback.class);
Root<Feedback> root = criteriaQuery.from(Feedback.class);
//Just prepare all your Predicates as per your business need.
//e.g.:
Predicate yourPredicateAsPerYourBusinessNeed = criteriaBuilder.equal(root.get("applicationName"), applicationName);
criteriaQuery.select(root).where(yourPredicateAsPerYourBusinessNeed).distinct(true);
TypedQuery<Feedback> criteriaQueryWithPredicate = em.createQuery(criteriaQuery);
//Getting total Count Here
Long totalCount = criteriaQueryWithPredicate.getResultStream().distinct().count();
Now we have our actual data, as above, along with the total count.
So now we can apply pagination to the data we have in hand, as below:
List<Feedback> feedbackList = criteriaQueryWithPredicate.setFirstResult(offset).setMaxResults(pageSize).getResultList();
Now you can prepare a wrapper with the list returned by the DB along with the totalCount, the starting page (the offset in this case), the page size etc., and return it to your service / controller class.
I am 101% sure this will solve your problem, because I was facing the same problem and sorted it out the same way.
Thanks- Sunil Kumar Mali
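The wrapper mentioned in the answer above could look roughly like this. It is a hypothetical class, not part of JPA or Hibernate; it just carries the page's rows together with the total count, the offset and the page size so the controller can decide whether to render next/previous links.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper: one page of rows plus what the caller needs to render paging links.
final class PageResult<T> {

    private final List<T> items;
    private final long totalCount;
    private final int offset;
    private final int pageSize;

    PageResult(List<T> items, long totalCount, int offset, int pageSize) {
        this.items = Collections.unmodifiableList(items);
        this.totalCount = totalCount;
        this.offset = offset;
        this.pageSize = pageSize;
    }

    List<T> getItems() { return items; }
    long getTotalCount() { return totalCount; }
    int getOffset() { return offset; }
    int getPageSize() { return pageSize; }

    // True when rows exist beyond this page.
    boolean hasNext() {
        return (long) offset + items.size() < totalCount;
    }

    // True when this is not the first page.
    boolean hasPrevious() {
        return offset > 0;
    }

    public static void main(String[] args) {
        PageResult<String> page = new PageResult<>(Arrays.asList("a", "b"), 5, 0, 2);
        System.out.println(page.hasNext());     // true
        System.out.println(page.hasPrevious()); // false
    }
}
```

In the answer's code it would be built as new PageResult<>(feedbackList, totalCount, offset, pageSize).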
You can just setMaxResults to the maximum number of rows you want returned. There is no harm in setting this value greater than the number of rows actually available. The problem with the other solutions is that they assume the ordering of records remains the same on each repeat of the query, and that there are no changes going on between commands.
To avoid that, if you really want to scroll through results, it is best to use ScrollableResults. Don't throw this object away between pages; keep using it so the records stay in the same order. To find out the number of records from the ScrollableResults, you can simply move to the last() position and then get the row number. Remember to add 1 to this value, since row numbers start counting at 0.
I personally think you should handle the paging in the front-end. I know this isn't that efficient, but at least it would be less error-prone.
If you use the count(*) approach, what happens if records get deleted from the table between requests for a certain page? Lots of things could go wrong this way.
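If the whole result is loaded once and kept on the client (or in the session), paging becomes a matter of slicing that list, so rows deleted in the database between page requests cannot shift the pages; the price is holding the full result in memory. A minimal sketch, assuming 0-based page numbers; the class and method names are hypothetical. subList returns a view of the original list, so copy it if the page must outlive changes to that list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of paging done after the full result is loaded: slice the list
// held by the client instead of re-querying the database per page.
final class ClientSidePaging {

    // Returns page `page` (0-based) of `all`; an empty list past the end.
    static <T> List<T> page(List<T> all, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long from = (long) page * pageSize; // long avoids int overflow for large pages
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, all.size());
        return all.subList((int) from, to);
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(page(rows, 0, 2)); // [1, 2]
        System.out.println(page(rows, 2, 2)); // [5]
        System.out.println(page(rows, 3, 2)); // []
    }
}
```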
