Retrieving separate documents as one object - java

I have two basic document types:
Patient with fields name and number;
Analysis, with fields that consist of 3 arrays, the date of analysis, a bunch of floats and a patientId linking it to the Patient, so there's a 1:M relationship between Patient and Analysis.
I'm using Ektorp as a driver. Right now there are two POJOs reflecting the documents, and the Patient POJO also has a Set<Analysis>, marked by the @DocumentReferences annotation (which I don't actually need yet, but might use later to show all analyses for one patient, for example).
However, what I want to do is use pagination to populate a TableView with rows containing info about both the Patient AND the Analysis, sorted by date of analysis in descending order.
Right now, my only idea is to:
1. Query CouchDB for all Analysis documents sorted by date (using pagination via PageRequest in Ektorp).
2. Then, for each analysis, query the db for the patient with that patientId.
3. Make a separate class representing a row in the table, with fields for both patient and analysis, then create an object of that class using the retrieved Analysis and Patient objects.
In code it would look somewhat like this:
PageRequest pageRequest = PageRequest.firstPage(5);
ViewQuery query = new ViewQuery()
.designDocId("_design/analysis")
.viewName("by_date")
.descending(true)
.includeDocs(true);
Page<Analysis> result = db.queryForPage(query, pageRequest, Analysis.class);
for(Analysis analysis : result) {
Patient patient = db.find(Patient.class, analysis.getPatientId());
Row row = new Row(patient, analysis);
//TODO: ...Populate TableView...
}
This all sounds very cumbersome. Is there a better way to do this? Can I deserialize data I need in one go (or at least in one query)?
I know there's a way to use {_id: doc.patientId} as a value in the emit() function to return the Patient for each Analysis when querying with the parameter include_docs=true, but I'm not sure how to use that to my advantage.
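One way the {_id: doc.patientId} trick could be used, sketched under assumptions: the map function below (kept as a Java string only for reference) assumes Analysis documents carry a type field and a date, and emits two rows per analysis: one whose value is null, so include_docs returns the analysis itself, and one whose value is {_id: doc.patientId}, so include_docs returns the linked patient. Putting doc._id in the key keeps both rows of one analysis adjacent even when two analyses share a date. Patient, Analysis and TableRow are hypothetical stand-ins for the POJOs, and converting Ektorp's view rows into these objects is not shown.

```java
import java.util.ArrayList;
import java.util.List;

class LinkedDocsPager {
    // Hypothetical map function for a view such as _design/analysis/by_date_with_patient.
    // Assumes Analysis docs have "type", "date" and "patientId" fields.
    static final String MAP_FUNCTION =
        "function(doc) {\n" +
        "  if (doc.type === 'analysis') {\n" +
        "    emit([doc.date, doc._id, 0], null);\n" +                  // include_docs -> the analysis
        "    emit([doc.date, doc._id, 1], {_id: doc.patientId});\n" + // include_docs -> the patient
        "  }\n" +
        "}";

    // Hypothetical stand-ins for the POJOs from the question.
    record Patient(String id, String name, String number) {}
    record Analysis(String id, String date, String patientId) {}
    record TableRow(Patient patient, Analysis analysis) {}

    // Each analysis yields two view rows, so N table rows need 2 * N view rows.
    static int viewRowsPerPage(int tableRowsPerPage) {
        return 2 * tableRowsPerPage;
    }

    // Pairs the included docs, in view order, into table rows. Adjacent docs belong
    // to the same analysis because doc._id is part of the key; with descending=true
    // the patient row [date, id, 1] comes before the analysis row [date, id, 0].
    static List<TableRow> pair(List<Object> docsInViewOrder) {
        if (docsInViewOrder.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of rows");
        }
        List<TableRow> rows = new ArrayList<>();
        for (int i = 0; i < docsInViewOrder.size(); i += 2) {
            Object a = docsInViewOrder.get(i);
            Object b = docsInViewOrder.get(i + 1);
            Patient patient = (a instanceof Patient p) ? p : (Patient) b;
            Analysis analysis = (a instanceof Analysis an) ? an : (Analysis) b;
            rows.add(new TableRow(patient, analysis));
        }
        return rows;
    }
}
```

Because every analysis produces two view rows, the PageRequest size would have to be doubled (10 view rows for 5 table rows) so that a page boundary should never split a patient/analysis pair, and the whole page comes back from a single view query.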
I could just embed Patient in Analysis, but then I would have to change every Analysis document if there's a change in Patient, so that's not a good solution.
I've read these two paragraphs of documentation several times over, but I can't for the life of me figure out how to implement them with Ektorp.
Any help would be appreciated.

Related

Use pagination on query on multiple indices in hibernate search

We are implementing a global search that we can use to query all entities in our application, let's say cars, books and movies. Those entities do not share common fields: cars have a manufacturer, books have an author and movies have a director, for example.
In a global search field I'd like to search for all my entities with one query, to be able to use pagination. Two approaches come to my mind when thinking about how to solve this:
Query one index after another and manually merge the result. This means that I have to implement pagination myself.
Add common fields to each item like name and creator (or create an interface as shown here Single return type in Hibernate Search). In this case I can only search for fields in my global search that I map to the common fields.
My question is now: Is there a third (better) way? How would you suggest to implement such a global search on multiple indices?
> Query one index after another and manually merge the result. This means that I have to implement pagination myself.
I definitely wouldn't do that, as this will perform very poorly, especially with deep pagination (page 40, etc.).
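A small in-memory sketch of why manual merging gets expensive (Hit and the method names are hypothetical): to build page `page` of `size` hits, each index has to return its top (page + 1) * size hits, because any of them could end up on that page, and everything before the requested page is fetched only to be thrown away. For page 40 (index 39) with 20 hits per page, that is 800 hits from every index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ManualMergePager {
    // Hypothetical hit: source index, entity id and relevance score.
    record Hit(String index, String id, float score) {}

    // Every index must return this many top hits to serve the requested page.
    static int hitsNeededPerIndex(int page, int size) {
        return (page + 1) * size;
    }

    // Each per-index list must already be sorted by descending score.
    static List<Hit> page(List<List<Hit>> perIndexResults, int page, int size) {
        int needed = hitsNeededPerIndex(page, size);
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> hits : perIndexResults) {
            merged.addAll(hits.subList(0, Math.min(needed, hits.size())));
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score()).reversed());
        int from = Math.min(page * size, merged.size());
        int to = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(from, to)); // the rest is discarded
    }
}
```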
> Add common fields to each item like name and creator (or create an interface as shown here Single return type in Hibernate Search). In this case I can only search for fields in my global search that I map to the common fields.
That's the way. You don't even need a common interface since you can just target multiple fields in the same predicate. The common interface would only help to target all relevant types: you can call .search(MyInterface.class) instead of .search(Arrays.asList(Car.class, Book.class, Movie.class)).
You can still apply predicates to fields that are specific to each type; it's just that fields that appear in more than one type must be consistent (same type, etc.). Also, obviously, if you require that the "manufacturer" (and no other field) matches "james", Books and Movies won't match anymore, since they don't have a manufacturer.
Have you tried it? For example, this should work just fine as long as manufacturer, author and director are all text fields with the same analyzer:
SearchResult<Object> result = searchSession.search( Arrays.asList(
Car.class, Book.class, Movie.class
) )
.where( f -> f.simpleQueryString()
.fields( "manufacturer", "author", "director" )
.matching( "james" ) )
.fetch( 20 );
List<Object> hits = result.hits(); // Result is a mix of Car, Book and Movie.
One approach would be to create a SQL view (SearchEntry?) that combines all of the tables you want to search. This allows you to alias your different column names. It won't be very good for performance but you could also just create one big field that is a concatenation of different searchable fields. Finally, include a "type" field that you tie back to your entity.
Now you can query everything in one go and use the type/id to tie back to a specific entity that the "search" data was initially pulled from.
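A minimal in-memory sketch of the view-based approach: SearchEntry mirrors one row of a hypothetical search_entry view (type, id, concatenated searchable text), search stands in for the SQL filter plus OFFSET/LIMIT, and resolve uses the type column to pick a per-type loader. In a real application the filtering and paging would happen in SQL and the loaders would be repository lookups; all names here are assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.LongFunction;

class SearchEntryResolver {
    // One row of a hypothetical "search_entry" view.
    record SearchEntry(String type, long id, String searchText) {}

    // Stand-in for: WHERE search_text matches the term ... OFFSET ... LIMIT ...
    static List<SearchEntry> search(List<SearchEntry> view, String term, int page, int size) {
        String needle = term.toLowerCase(Locale.ROOT);
        return view.stream()
                .filter(e -> e.searchText().toLowerCase(Locale.ROOT).contains(needle))
                .skip((long) page * size)
                .limit(size)
                .toList();
    }

    // Ties each hit back to its entity through the "type" column.
    static List<Object> resolve(List<SearchEntry> hits, Map<String, LongFunction<Object>> loaders) {
        return hits.stream()
                .map(e -> loaders.get(e.type()).apply(e.id()))
                .toList();
    }
}
```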

Hint HINT_PASS_DISTINCT_THROUGH reduces the amount of Entities returned per page for a PageRequest down to below the configured page size (PostgreSQL)

I'm setting up a JPA Specification-based repository implementation that uses JPA specifications (constructed from RSQL filter strings) to filter the results, define the result ordering and remove, via "distinct", any duplicates that would otherwise be returned because of the joined tables. The JPA Specification builder method joins several tables and sets the "distinct" flag:
final Join<Object, Object> rootJoinedTags = root.join("tags", JoinType.LEFT);
final Join<Object, Object> rootJoinedLocations = root.join("location", JoinType.LEFT);
...
query.distinct(true);
To allow sorting by joined table columns, I've applied the "HINT_PASS_DISTINCT_THROUGH" hint to the relevant repository method (otherwise, sorting by joined table columns returns an error along the lines of "sort column must be included in the SELECT DISTINCT query").
@QueryHints(value = {
@QueryHint(name = org.hibernate.jpa.QueryHints.HINT_PASS_DISTINCT_THROUGH, value = "false")
})
Page<SomeEntity> findAll(@Nullable Specification<SomeEntity> spec, Pageable pageable);
The arguments for said repository method are constructed as such:
final Sort sort = getSort(searchFilter);
final Specification spec = getSpecificationIfPresent(searchFilter);
final PageRequest pageRequest = PageRequest.of(searchFilter.getPageNumber(), searchFilter.getLimit(), sort);
return eventRepository.findAll(spec, pageRequest);
After those changes, filtering and sorting seem to work as expected. However, the hint seems to cause the "distinct" filtering to be applied after the result page has already been constructed, reducing the number of entities returned in the page from the configured "size" PageRequest argument to whatever is left after the duplicates are filtered out. For example, if we make a PageRequest with "page=0" and "pageSize=10", the resulting Page may contain only 5 "SomeEntity" instances, although the database contains far more entries (177 entities, to be exact, in this case). If I remove the hint, the number of returned entities is correct again.
Question: is there a way to make the same Specification query setup work with correctly sized Pages (some other hints that might be added to have duplicate filtering performed before the Page object is constructed)? If not, is there another approach I could use to achieve the required Specification-based filtering, with joined-column sorting and duplicate removal as with "distinct"?
PS: PostgreSQL is the database behind the application in question
The problem you are experiencing has to do with the way you are using the HINT_PASS_DISTINCT_THROUGH hint.
This hint allows you to tell Hibernate that the DISTINCT keyword should not be used in the SELECT statement issued against the database.
You are taking advantage of this fact to allow your queries to be sorted by a field that is not included in the DISTINCT column list.
But that is not how this hint should be used.
This hint should only be used when you are sure that there will be no difference between applying a DISTINCT keyword to the SQL SELECT statement or not, because the SELECT statement will already fetch only distinct values by itself. The idea is to improve the performance of the query by avoiding an unnecessary DISTINCT.
This is usually what happens when you use the query.distinct method in your criteria queries and you are join fetching child relationships. This great article by @VladMihalcea explains how the hint works in detail.
On the other hand, when you use paging, the query will include OFFSET and LIMIT (or something similar, depending on the underlying database) in the SQL SELECT statement issued against the database, limiting your query to a maximum number of results.
As stated, if you use the HINT_PASS_DISTINCT_THROUGH hint, the SELECT statement will not contain the DISTINCT keyword and, because of your joins, it could return duplicate records of your main entity. These records will be processed by Hibernate to eliminate duplicates, because you are using query.distinct, and it will in fact remove duplicates if needed. I think this is why you may get fewer records than requested in your Pageable.
If you remove the hint, the DISTINCT keyword is passed in the SQL statement sent to the database and, as long as you only project information of the main entity, the query will fetch all the records indicated by LIMIT. This is why it always gives you the requested number of records.
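To make the row-count effect concrete, here is a small in-memory simulation, not Hibernate code: JoinedRow is a hypothetical stand-in for one row of the joined SQL result, and the two methods contrast removing duplicates after LIMIT (roughly what happens with the hint) with SELECT DISTINCT followed by LIMIT.

```java
import java.util.List;

class DistinctVsLimit {
    // One row of a joined result: a parent with three tags appears three times.
    record JoinedRow(long entityId, String tag) {}

    // With the hint: LIMIT is applied to joined rows in the database,
    // and duplicates are removed in memory afterwards.
    static List<Long> limitThenDistinct(List<JoinedRow> rows, int pageSize) {
        return rows.stream()
                .limit(pageSize)
                .map(JoinedRow::entityId)
                .distinct()
                .toList();
    }

    // SELECT DISTINCT ... LIMIT: duplicates are removed first,
    // so the page is filled with up to pageSize distinct parents.
    static List<Long> distinctThenLimit(List<JoinedRow> rows, int pageSize) {
        return rows.stream()
                .map(JoinedRow::entityId)
                .distinct()
                .limit(pageSize)
                .toList();
    }
}
```

With the rows (1,a), (1,b), (2,a), (2,c), (3,a) and a page size of 3, the first method yields only the parents 1 and 2, while the second yields 1, 2 and 3.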
You can try to fetch join your child entities (instead of only joining them). This will eliminate the problem of not being able to use the field you need to sort by in the DISTINCT column list and, in addition, you will then be able to apply the hint legitimately.
But if you do so, it will give you another problem: if you use join fetch and pagination to return the main entities and their collections, Hibernate will no longer apply pagination at the database level. It will not include the OFFSET or LIMIT keywords in the SQL statement, and will instead try to paginate the results in memory. This is the famous Hibernate HHH000104 warning:
HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
@VladMihalcea explains that in great detail in the last part of this article.
He also proposed one possible solution to your problem, Window Functions.
In your use case, instead of using Specifications, the idea is that you implement your own DAO. This DAO only needs access to the EntityManager, which is not a big deal, as you can inject it with @PersistenceContext:
@PersistenceContext
protected EntityManager em;
Once you have this EntityManager, you can create native queries and use window functions to build, based on the provided Pageable information, the right SQL statement to issue against the database. This gives you much more freedom over which fields to use for sorting, or whatever else you need.
As the last cited article indicates, window functions are supported by all major databases.
In the case of PostgreSQL, you can easily find them in the official documentation.
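A hedged sketch of what such a native PostgreSQL query could look like. The schema (some_entity, some_entity_tag and their columns), the choice of each entity's smallest tag as its sort key, and the class and method names are all assumptions for illustration. DENSE_RANK() numbers parents rather than joined rows, so filtering on the rank selects exactly `size` distinct parents per page while keeping all their joined rows.

```java
class WindowPagingSql {
    // Hypothetical schema: some_entity(id, name), some_entity_tag(entity_id, tag).
    // Each parent gets one sort key (its smallest tag); DENSE_RANK then numbers parents.
    static String pageQuery(boolean descending) {
        String dir = descending ? "DESC" : "ASC"; // whitelisted, never user input
        return """
            SELECT r.entity_id, r.entity_name, r.tag
            FROM (
              SELECT j.*,
                     DENSE_RANK() OVER (ORDER BY j.sort_key %1$s NULLS LAST, j.entity_id) AS parent_rank
              FROM (
                SELECT e.id AS entity_id, e.name AS entity_name, t.tag AS tag,
                       MIN(t.tag) OVER (PARTITION BY e.id) AS sort_key
                FROM some_entity e
                LEFT JOIN some_entity_tag t ON t.entity_id = e.id
              ) j
            ) r
            WHERE r.parent_rank > ?1 AND r.parent_rank <= ?2
            ORDER BY r.parent_rank
            """.formatted(dir);
    }

    // Rank bounds for a 0-based page: ranks in (page * size, page * size + size].
    static long[] rankBounds(int page, int size) {
        long lower = (long) page * size;
        return new long[] {lower, lower + size};
    }
}
```

Assuming Hibernate's support for JPA-style ?1 positional parameters in native queries, the string could be passed to em.createNativeQuery(...) with the two bounds bound via setParameter(1, ...) and setParameter(2, ...), and the returned rows grouped back into entities in the DAO.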
Finally, one more option, suggested in fact by @nickshoe and explained in great detail in the article he cited, is to perform the sorting and paging in two phases: in the first phase, you create a query that references your child entities and in which you apply paging and sorting. This query allows you to identify the ids of the main entities, which are then used in the second phase of the process to obtain the main entities themselves.
You can take advantage of the aforementioned custom DAO to accomplish this process.
It may be an off-topic answer, but it may help you.
You could try to tackle this problem (pagination of parent-child entities) by separating the query in two parts:
a query for retrieving the ids that match the given criteria
a query for retrieving the actual entities by the resulting ids of the previous query
I came across this solution in this blog post: https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/
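The two-query idea can be simulated in memory to show the ordering details that are easy to get wrong. This is a sketch under assumptions, not the blog post's code: JoinedRow and SomeEntity are hypothetical, each parent is ordered by its smallest joined value, sort values are assumed non-null, and `loader` stands in for the second (fetch-by-ids) query.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TwoPhasePaging {
    record JoinedRow(long entityId, String sortValue) {}   // parent id + joined sort column
    record SomeEntity(long id, List<String> tags) {}         // hypothetical parent entity

    // Phase 1: page over distinct parent ids only, ordered by each parent's
    // smallest joined value, then by id for a stable order.
    static List<Long> pageOfIds(List<JoinedRow> rows, int page, int size) {
        Map<Long, String> sortKeyById = rows.stream()
                .collect(Collectors.toMap(JoinedRow::entityId, JoinedRow::sortValue,
                        (a, b) -> a.compareTo(b) <= 0 ? a : b));
        return sortKeyById.entrySet().stream()
                .sorted(Map.Entry.<Long, String>comparingByValue()
                        .thenComparing(Map.Entry.<Long, String>comparingByKey()))
                .skip((long) page * size)
                .limit(size)
                .map(Map.Entry::getKey)
                .toList();
    }

    // Phase 2: load full entities for those ids (in JPQL, something like
    // "SELECT DISTINCT e FROM SomeEntity e LEFT JOIN FETCH e.tags WHERE e.id IN :ids"),
    // then restore the phase-1 order, since IN (...) does not guarantee any order.
    static List<SomeEntity> loadInOrder(List<Long> ids, Function<List<Long>, List<SomeEntity>> loader) {
        Map<Long, SomeEntity> byId = loader.apply(ids).stream()
                .collect(Collectors.toMap(SomeEntity::id, Function.identity()));
        return ids.stream().map(byId::get).toList();
    }
}
```

Since the second query is restricted to the ids of one page, the collection fetch no longer needs OFFSET/LIMIT, which is what avoids the in-memory pagination warning.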

Modelling data and accessing it using the DAO pattern

I'm creating a very simple application in Java that will be storing questions in an embedded Derby database. I've decided to use the DAO pattern for accessing the data in the database. I cannot make use of an ORM for this project.
A question will have data that I would normally model using a many-to-one relationship in a relational database. An example of this data would be:
A question will have one category. One category will have multiple questions.
A question will have a score of 1000, 2000 or 3000. A score will have many questions.
With the above in mind, I would create three tables (brackets indicate columns):
Question (id, question, scoreId, categoryId)
Score (id, score)
Category (id, category)
My first question is:
Would modelling my data across three tables like I suggest above be bad practice/the wrong way to go about this? Is there any benefit in storing score and category in separate tables? Or would it be better to combine them into the Question table? A many-to-one relationship that links to a table with a single column (with the exception of id) seems redundant to me, as instead of storing an id referencing the Score/Category table, we can simply store the value of the category/score (since the category/score table does not store any additional information).
My second question is:
If modelling my data across separate tables is the correct approach, then how would I access the data using the DAO pattern? My confusion comes from the following:
I would create a DAO to populate a Question model object that would look a little something like this:
public class Question {
String question;
String category;
Integer score;
}
I would create a concrete implementation of the DAO interface like this:
public class QuestionAccessObject implements QuestionDao {
private static final String TABLE_1 = "QUESTION";
private static final String TABLE_2 = "SCORE";
private static final String TABLE_3 = "CATEGORY";
@Override
public List<Question> getAllQuestions() {
List<Question> questions = new ArrayList<>();
//Run a query with joins across the three tables and iterate over the result to populate the list
return questions;
}
}
Shouldn't each DAO object only be concerned with a single table in the database? My approach above doesn't seem like the most correct way to go about this. Separate tables would also make inserting data into the database very messy (I don't understand how I could take a clean approach using the DAO pattern and multiple tables). Creating a DAO for the Score and Category tables just wouldn't really make sense (and if I did this, how would I populate my model?).
> Would modelling my data across three tables like I suggest above be bad practice/the wrong way to go about this? Is there any benefit in storing score and category in separate tables...?
It's a matter of discussion. In the case of score, I would rather keep this information with the question. On the other hand, I would put the category in a separate table, since many questions would share the same category, so that makes perfect sense.
> Shouldn't each DAO object only be concerned with a single table in the database?
Yes, a DAO object should be concerned with a single source of data, as you say. I would certainly try to avoid any ComplexDao, since such classes tend to grow more complex and their number of methods increases over time.
That is what a service layer is for: it combines those results and provides the output to the controller.
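A minimal sketch of that layering, assuming the score stays on the question row and the category lives in its own table. All names (QuestionRow, CategoryRow, QuestionDao.findAll, CategoryDao.findById, QuestionService) are hypothetical; real DAOs would be JDBC implementations against Derby, and in-memory lambdas are enough to try it out.

```java
import java.util.List;
import java.util.Optional;

// Row-level types, one per table (hypothetical).
record QuestionRow(long id, String text, int score, long categoryId) {}
record CategoryRow(long id, String name) {}

// Model handed to the controller/UI.
record Question(String question, String category, int score) {}

// Each DAO talks to exactly one table.
interface QuestionDao { List<QuestionRow> findAll(); }
interface CategoryDao { Optional<CategoryRow> findById(long id); }

// The service layer combines the results of several DAOs into the model.
class QuestionService {
    private final QuestionDao questionDao;
    private final CategoryDao categoryDao;

    QuestionService(QuestionDao questionDao, CategoryDao categoryDao) {
        this.questionDao = questionDao;
        this.categoryDao = categoryDao;
    }

    List<Question> getAllQuestions() {
        return questionDao.findAll().stream()
                .map(row -> new Question(
                        row.text(),
                        categoryDao.findById(row.categoryId())
                                .map(CategoryRow::name)
                                .orElse("(unknown)"),
                        row.score()))
                .toList();
    }
}
```

This issues one category lookup per question; for larger data sets, caching the (small) category table or adding a dedicated read method that uses a single JOIN would avoid that.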
Modeling the data across separate tables is A correct approach (not necessarily the best).
Separating tables helps database normalization: https://en.wikipedia.org/wiki/Database_normalization.
One could argue that the DAO pattern implies that each DAO object is concerned with a single entity. Similar to how ORMs work, an entity could easily reference other entities.
When you query for a question you could also just return the category and score ids inside the question object and force the library user to fetch the score value and category value (lazy fetch) using those id values with their respective DAOs (score and category).
So I believe what you're doing is fine.
Hope this helps

DynamoDB query on sub field of JSON Object

Is it possible to search for a subfield of a json object in dynamoDB table?
My table:
Item: "item name",
Location: {...},
ItemInformation:
{
    ItemName: "itemName",
    ProductLine: {
        Brand: "Razer",
        ManufacturerSource: "Razer"
    }
}
Originally, ItemInformation was a key in this table: to search for an object, we would construct the JSON for the item information and then query with the JSON string as the key. Now we need to implement searching by sub-fields of that object, which can contain different fields each time, e.g. isDigital: "true".
I notice in the question:
DynamoDB advanced scan - JAVA
The answer would seem to be no, and I would have to separate out the fields. But I am curious why and how the PHP library can query for sub-fields of a JSON object in DynamoDB. Is there really no better solution than to store the column as separate fields and then add an index on all of them?
After looking through the documentation, it is not feasible to implement the search fields as I originally intended. The problem is that while the values are JSON, they are stored as string literals, so I would have to refactor to start storing them as JSON objects. Additionally, I cannot add columns and indexes, because the search could operate on any number of fields and different items can have different fields, e.g. an Item can have Brand, BatteryInformation, Name. Given the requirement that any of these sub-fields should be searchable, it's better to do this in CloudSearch or Elasticsearch, where I can index and search on arbitrary fields and values within a column of an object.
Since this is a DynamoDB table, I am going to use CloudSearch, since it offers easier indexing options and data integration.
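If the items end up in a search engine, one common preparation step is flattening the nested attributes into path/value pairs so that any sub-field can be indexed. A sketch, with the class name and the dot-separated path convention as assumptions; whether this step is needed at all depends on the engine and on how its index fields are configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AttributeFlattener {
    // Turns nested attributes into "path -> value" pairs, e.g.
    // ItemInformation.ProductLine.Brand -> Razer, so whatever sub-fields
    // a given item happens to have can be indexed.
    static Map<String, Object> flatten(Map<String, ?> attributes) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", attributes, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, Object> out) {
        for (Map.Entry<String, ?> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map<?, ?> child) {
                @SuppressWarnings("unchecked")
                Map<String, ?> typed = (Map<String, ?>) child; // assumes string keys
                flattenInto(path, typed, out);
            } else {
                out.put(path, e.getValue());
            }
        }
    }
}
```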
At the moment, the only solution available is to store the column as separate fields. AWS may come up with a solution in future releases.

DDD valueObject and database schema

To end the year 2014, I have what I think is a simple question.
I would like to use DDD a bit more, and I'm currently experimenting with various use cases to learn more about it.
My current use case is the following:
We have a new database schema that uses a classic pattern in our company: modeling our nomenclature tables as "id / code / label". I think it's a pretty classic case when using Hibernate, for example.
But in the OO world, things get "complicated" for something this simple when using an API like JDBC or QueryDSL. I need to fetch an object by its code, retrieve its id or load the full object, and then set it as a one-to-one relation on another object.
I'm wondering:
This kind of nomenclature can be an enum (or a class with String constants, depending on the developer). In DDD terms, it is my ValueObject.
The id / code / label table in the database is not i18n-friendly (that's not a prerequisite), so I don't see its advantages, except when the table can be updated dynamically and the use case is "pick something in a combobox loaded from this table and build a relation with another object". But that's all, because if you have business rules that must be applied, you need to know the new code, etc.
My questions are :
Do you often use the id / code / label pattern in your database model?
How do you model your nomenclature data? (Country is perhaps not the best example :) but, no matter what, how do you model it? Without thinking much, I would say a database table for country; but what about a status such as "valid, waiting validation, rejected"?
Do you model your valueObjects using this pattern?
Or do you use lots of enums and only store their toString (or ordinal) in the database?
In the Java OO world, I'm currently thinking that it is easier to manipulate enums than objects loaded from the database: for those I need to build repositories to load them, for example, whereas enums are so simple to use. I'm looking for some reassurance here, or perhaps I am missing something obvious?
thanks
see you in 2015 !
Update 1 :
We can create a "Budget": the first one is marked as "Initial" and the next ones are marked as "Corrective" (with an increment). For example, we can have a list of budgets: "Initial Budget", "Corrective budget #1", "Corrective budget #2".
For this we have the following database design: a Budget table and a Version Budget table, with a foreign key between the two. The Version Budget table only contains an ID, a CODE and a LABEL.
Personally, I would like to remove this table. I don't see the advantages of this structure. From the OO perspective, when I'm creating a budget I can query the database to see whether I need to create an Initial or a Corrective budget (using a count query), and then set the right enum on my new budget. But with the current design I need to query the database using the CODE that I want, select the ID and set the ID. So yes, it's really database-oriented. Where is the DDD part? A ValueObject is something that describes or quantifies something, so it seems to fit my case. A Version describes the current status of my Budget. I can compare two versions just by checking their codes, and they don't have a lifecycle (and I don't want this one in particular to have one).
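A sketch of the enum-based alternative described above. The enum name and methods are hypothetical, existingBudgets would come from the count query, and the enum would be persisted via name() and read back with valueOf() instead of joining a Version Budget table.

```java
enum BudgetVersion {
    INITIAL, CORRECTIVE;

    // The first budget is the initial one; every later one is corrective.
    static BudgetVersion forExistingBudgetCount(long existingBudgets) {
        return existingBudgets == 0 ? INITIAL : CORRECTIVE;
    }

    // Display label following the examples in the question.
    static String label(long existingBudgets) {
        return forExistingBudgetCount(existingBudgets) == INITIAL
                ? "Initial Budget"
                : "Corrective budget #" + existingBudgets;
    }
}
```

One trade-off of this design: the set of versions becomes part of the code, so adding a new kind of version means a code change rather than a new row in a table.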
How do you handle this type of use case?
It's only a simple example, because I found that if you ask a database admin, he would surely say that everything seems good: using primary keys, modeling relations, enforcing constraints, using foreign keys and avoiding data duplication.
Thanks again Mike and Doctor for their comments.
I will pick up on your country example. In most cases, a country will be a value object. There is nothing that will reference a country entity and need to know that it is still the same country if the country's values change. In fact, the country could be represented as an enum, plus some nasty resource lookup functions that translate the ISO3 code into useful display text. What we do is define it as a value object class with iso3, displayname and some other static information. Out of this value object we then define a kind of "power enum" (I still miss a standard term here). The class implementing the country value object gets a private constructor and static properties for each of its values (for each country), plus explicit cast operators from and to int. Now you can treat it just like a normal enum of your programming language. The advantage over a normal enum, besides having more property fields, is that it can also have methods (query methods, of course, that don't change the state of the object). You can even use polymorphism (some countries with different behaviour than others). You could also load the contents of the enum from a database table (without the statics then, and with a static lookupByIso3 method instead).
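A Java sketch of the "power enum" described above, with two example countries. The class and method names are assumptions; Java has no user-defined cast operators, so the int conversion is left out, and Java's own enum type (which can also carry fields and methods) is often the simpler choice there.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class Country {
    // Declared before the constants so it exists when they register themselves.
    private static final Map<String, Country> BY_ISO3 = new LinkedHashMap<>();

    // The only instances that can ever exist (example values).
    public static final Country FRANCE = new Country("FRA", "France");
    public static final Country GERMANY = new Country("DEU", "Germany");

    private final String iso3;
    private final String displayName;

    private Country(String iso3, String displayName) { // private: no other instances
        this.iso3 = iso3;
        this.displayName = displayName;
        BY_ISO3.put(iso3, this);
    }

    public String iso3() { return iso3; }
    public String displayName() { return displayName; }

    public static Optional<Country> lookupByIso3(String iso3) {
        return Optional.ofNullable(BY_ISO3.get(iso3));
    }

    public static Collection<Country> values() {
        return Collections.unmodifiableCollection(BY_ISO3.values());
    }

    @Override public String toString() { return iso3; }
}
```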
You could do the same with other "enum-like" value objects, too. Imagine currencies (they could have conversion methods implemented polymorphically). The handling of the daily exchange rates is a different topic, though.
If the set of values is not fixed (for example, another value object candidate like a postal address), then it is not a value object enum, but a standard value object that can be instantiated with the values you want.
To decide whether you can live with something as a value object, you can ask the following question: do you want copy semantics or reference semantics? If you ever change a property of the object, should all places where you used it update too, or should they stay as they are? If the latter, then the "changed" object is a new and different value object. Another question is whether you need to track changes to an object while recognizing that it remains the "same" despite its changing values. And if you have a value object where you only want specific instances to exist, it is the kind of enum described above.
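The copy-versus-reference question can be illustrated with a small Java sketch (PostalAddress and Customer are hypothetical): the record behaves as a value object, while the class identified by an id behaves as an entity.

```java
// Value object: equality is by value, and "changing" it produces a new instance,
// so every holder of the old instance keeps the old value (copy semantics).
record PostalAddress(String street, String city) {
    PostalAddress withStreet(String newStreet) {
        return new PostalAddress(newStreet, city);
    }
}

// Entity: identity is the id, so it stays "the same" customer even when its
// attributes change, and every reference sees the change (reference semantics).
final class Customer {
    private final long id;
    private PostalAddress address;

    Customer(long id, PostalAddress address) {
        this.id = id;
        this.address = address;
    }

    long id() { return id; }
    PostalAddress address() { return address; }
    void moveTo(PostalAddress newAddress) { this.address = newAddress; }

    @Override public boolean equals(Object o) { return o instanceof Customer c && c.id == id; }
    @Override public int hashCode() { return Long.hashCode(id); }
}
```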
Does that somehow help you?
