In a relational database there are some normalized tables, while the data that is actually relevant to me is stored in a view, which is really big (about 120 million rows and 80 columns).
About 10 of the 80 columns are relevant for search, which is to be implemented using Hibernate Search 4.3.2.
It seems logical to me that by indexing the view entity and querying only the 10 desired columns (@Field annotation) I get loads of redundant data that differs only in the primary key.
Currently I do the following:
// Step 1: scroll over the distinct ids of all matching rows
ScrollableResults ids = fullTextSession.createCriteria(clazz)
    .addOrder(Order.asc("id"))
    .add(Restrictions.ilike(field, query))
    .setProjection(Projections.distinct(Projections.id()))
    .scroll(ScrollMode.FORWARD_ONLY);
ArrayList<String> results = new ArrayList<String>();
// Step 2: one extra criteria query per id to read the projected properties
while (ids.next()) {
    ScrollableResults redundantResults = fullTextSession.createCriteria(clazz)
        .add(Restrictions.idEq(ids.get(0)))
        .setProjection(Projections.projectionList()
            .add(Projections.property("name"))
            .add(Projections.property("city"))
            .add(Projections.property("postal")))
        .scroll(ScrollMode.FORWARD_ONLY);
    if (redundantResults.next()) {
        results.add((String) redundantResults.get(0));
    }
}
I know I must be going wrong somewhere. My intentions are:
1. Get a distinct set of objects matching my search criteria
2. Obtain them using only the Lucene index, since a DB query is too expensive
While obtaining the distinct ids performs really well, the second step of getting the property data from the document is really slow. It seems to me that no queries to the DB are made during either step, which matches my intention.
I think that projections are the only way to work on the Lucene index and avoid Hibernate queries to the DB, or am I wrong?
I would appreciate any advice on how to achieve better search performance.
I'm trying to build pagination into my system. In the DAO I'm using setFirstResult() and setMaxResults() to limit the number of rows returned.
Look:
Query query = entityManager.createNamedQuery(namedQuery);
if (firstResult != null) {
query.setFirstResult(firstResult);
}
if (maxResult != null) {
query.setMaxResults(maxResult);
}
List returnList = query.getResultList();
But for pagination to work I need to know the number of rows without the limits (setFirstResult() and setMaxResults()).
If I have this query:
SELECT * FROM MyEntity e WHERE e.car = :carParam OFFSET 10 LIMIT 20
I would like to count like this:
SELECT Count(*) FROM MyEntity e WHERE e.car = :carParam
But I want to avoid creating another query manually for each entity. How can I do a count() without having to create a new count() query?
There is no way to calculate the total count of results without using the Criteria API or (as you said) creating another query manually. Both of them result in a separate query against the database to calculate the total count. I have some experience with this: it may double your response time if your query runs on a huge data set or if you have a large number of concurrent users.
I think the best way to avoid such overhead on your DBMS is to display a "load more" link at the end of the search results. I highly recommend this approach if displaying the total count is not part of your core business or is not required by the client.
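One way to drive such a "load more" link without a count query is to ask for one row more than the page size and check whether the extra row came back. A minimal sketch, assuming a hypothetical `PageFetcher` callback that stands in for a JPA query run with `setFirstResult(first)` and `setMaxResults(max)`; all class and method names below are illustrative and not part of JPA:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all names here are illustrative, not part of JPA.
class LoadMorePager {

    // Stand-in for a query executed with setFirstResult(first) and setMaxResults(max).
    interface PageFetcher<T> {
        List<T> fetch(int first, int max);
    }

    // One page of rows plus a flag telling the UI whether to show "load more".
    static final class Slice<T> {
        final List<T> rows;
        final boolean hasMore;

        Slice(List<T> rows, boolean hasMore) {
            this.rows = rows;
            this.hasMore = hasMore;
        }
    }

    // Requests pageSize + 1 rows; the extra row only signals that more data exists.
    static <T> Slice<T> load(PageFetcher<T> fetcher, int first, int pageSize) {
        List<T> fetched = fetcher.fetch(first, pageSize + 1);
        boolean hasMore = fetched.size() > pageSize;
        List<T> rows = new ArrayList<>(hasMore ? fetched.subList(0, pageSize) : fetched);
        return new Slice<>(rows, hasMore);
    }

    // In-memory fetcher used only to demonstrate the behaviour without a database.
    static <T> PageFetcher<T> inMemory(List<T> data) {
        return (first, max) -> {
            int from = Math.min(first, data.size());
            int to = Math.min(from + max, data.size());
            return new ArrayList<>(data.subList(from, to));
        };
    }
}
```

With JPA, the fetcher would wrap the existing named query; the only change to the DAO would be passing `pageSize + 1` to `setMaxResults()`.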
Take a look at this link (if you've not checked it before)
I have a table which I need to query, then organize the returned objects into two different lists based on a column value. I can either query the table once, retrieving the column by which I would differentiate the objects and arrange them by looping through the result set, or I can query twice with two different conditions and avoid the sorting process. Which method is generally better practice?
MY_TABLE
NAME AGE TYPE
John 25 A
Sarah 30 B
Rick 22 A
Susan 43 B
Either SELECT * FROM MY_TABLE, then sort in code based on returned types, or
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'A' followed by
SELECT NAME, AGE FROM MY_TABLE WHERE TYPE = 'B'
Logically, a DB query from Java code is more expensive than a loop within the code, because querying the DB involves several steps such as connecting to the DB, creating the SQL query, firing the query and getting the results back.
Besides, something can go wrong between firing the first and the second query.
With an optimized single query and a loop in the code, you can save a lot of time compared to firing two queries.
In your case, you can sort in the query itself if it helps:
SELECT * FROM MY_TABLE ORDER BY TYPE
If more types are added to your table in the future, you will not need to fire an additional query to retrieve them.
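The single-query approach can be sketched as follows. The `Row` class and the hard-coded sample rows are stand-ins for whatever the result set is mapped to (assumptions, not part of the question); the grouping itself is one pass over the results:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: Row mirrors the NAME, AGE, TYPE columns of MY_TABLE.
class RowsByType {

    static final class Row {
        final String name;
        final int age;
        final String type;

        Row(String name, int age, String type) {
            this.name = name;
            this.age = age;
            this.type = type;
        }
    }

    // Single pass over the rows of "SELECT * FROM MY_TABLE": one list per TYPE value.
    static Map<String, List<Row>> groupByType(List<Row> rows) {
        Map<String, List<Row>> byType = new LinkedHashMap<>();
        for (Row row : rows) {
            byType.computeIfAbsent(row.type, t -> new ArrayList<>()).add(row);
        }
        return byType;
    }

    // Sample data copied from the question's table.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("John", 25, "A"),
                new Row("Sarah", 30, "B"),
                new Row("Rick", 22, "A"),
                new Row("Susan", 43, "B"));
    }
}
```

Because the map is keyed by the TYPE value, a type added to the table later simply shows up as another key, with no change to the query.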
It is heavily dependent on the context. If each list is really huge, I would let the database do the hard part of the job with 2 queries. Conversely, in a web application using a farm of application servers and a central database, I would use a single query.
For the general use case, IMHO, I would save database resources, because the database is a common point of congestion, and use only one query.
The only objective argument I can find is that the splitting of the list occurs in memory with a hyper simple algorithm and in a single JVM, where each query requires a bit of initialization and may involve disk access or loading of index pages.
In general, one query performs better.
Also, by issuing two queries you can potentially get inconsistent results (which may be fixed with a higher transaction isolation level, though).
In any case, I believe you still need to iterate through the result set (either directly or by using a framework's methods that return collections).
From the database point of view, the optimum is exactly one statement that fetches exactly everything you need and nothing else. Therefore, your first option is better. But don't generalize that answer in a way that makes you query more data than needed. It's a common mistake for beginners to select all rows from a table (no WHERE clause) and do the filtering in code instead of letting the database do its job.
It also depends on your data volume. For instance, if you have a large data set, doing a SELECT * without any condition might take some time, but if you have an index on your 'TYPE' column, then adding a WHERE clause will reduce the time taken to execute the query. If you are dealing with a small data set, then doing a SELECT * followed by your logic in the Java code is a better approach.
There are four main bottlenecks involved in querying a database.
The query itself - how long the query takes to execute on the server depends on indexes, table sizes etc.
The data volume of the results - there could be hundreds of columns or huge fields and all this data must be serialised and transported across the network to your client.
The processing of the data - java must walk the query results gathering the data it wants.
Maintaining the query - it takes manpower to maintain queries, simple ones cost little but complex ones can be a nightmare.
By careful consideration it should be possible to work out a balance between all four of these factors - it is unlikely that you will get the right answer without doing so.
You can query by two conditions:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B'
This will do both for you at once, and if you want them sorted, you could do the same, but just add an ORDER BY clause:
SELECT * FROM MY_TABLE WHERE TYPE = 'A' OR TYPE = 'B' ORDER BY TYPE ASC
This will sort the results by type, in ascending order.
EDIT:
I didn't notice that originally you wanted two different lists. In that case, you could just do this query, and then find the index where the type changes from 'A' to 'B' and copy the data into two arrays.
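The split described in the edit could look roughly like this. It is only a sketch: `SortedSplit` and its methods are hypothetical names, and the input is assumed to be already sorted by type, as the `ORDER BY TYPE ASC` query would return it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative helper: splits a list that is already sorted by type into two lists.
class SortedSplit {

    // Returns the index of the first element whose type differs from the first
    // element's type, or the list size if every element has the same type.
    static <T> int boundaryIndex(List<T> sortedRows, Function<T, String> typeOf) {
        if (sortedRows.isEmpty()) {
            return 0;
        }
        String firstType = typeOf.apply(sortedRows.get(0));
        for (int i = 1; i < sortedRows.size(); i++) {
            if (!firstType.equals(typeOf.apply(sortedRows.get(i)))) {
                return i;
            }
        }
        return sortedRows.size();
    }

    // Copies the rows before and after the boundary into two independent lists.
    static <T> List<List<T>> split(List<T> sortedRows, Function<T, String> typeOf) {
        int boundary = boundaryIndex(sortedRows, typeOf);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(sortedRows.subList(0, boundary)));
        parts.add(new ArrayList<>(sortedRows.subList(boundary, sortedRows.size())));
        return parts;
    }
}
```

This only handles two types; with more than two, everything after the first boundary ends up in the second list.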
I have a table with group and permission columns. I want to find the max permission from a list of groups. I am using Java and an Oracle database, and I thought of two ways to do this:
Way 1:
in java loop through the group list
result = select permission from table where group = currentgroup
if result > max, max = result
Way 2:
max = select max(permission) from table where group in (group list)
I thought way 2 would be faster, but the group list can be very long and I don't know if it is a good idea to have a long list in a single SQL query.
From the information you've given, the second approach is by far the best. Databases are optimised directly for these kinds of tasks, so, within reason, it's always best to narrow the data down with the database. The first approach means the database needs to return all values anyway, increasing processing time and bandwidth and using up memory within your Java application.
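On the concern about a very long group list: Oracle rejects a single IN list with more than 1000 expressions (ORA-01795), so one way to keep the second approach is to send the list in chunks and take the maximum of the per-chunk results. A minimal sketch of the chunking and of the statement text; the table and column names (`group_permissions`, `group_name`, `permission`) are assumptions, since the question does not name them, and executing the statement through JDBC is left out:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative helper; table and column names are assumptions, not taken from the question.
class MaxPermissionQuery {

    // Oracle allows at most 1000 expressions in one IN list.
    static final int MAX_IN_LIST = 1000;

    // Splits the group list into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunks(List<T> groups, int chunkSize) {
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < groups.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, groups.size());
            result.add(new ArrayList<>(groups.subList(start, end)));
        }
        return result;
    }

    // Builds a parameterized statement with one "?" placeholder per group in the chunk.
    static String maxQuery(int groupCount) {
        String placeholders = String.join(", ", Collections.nCopies(groupCount, "?"));
        return "SELECT MAX(permission) FROM group_permissions WHERE group_name IN ("
                + placeholders + ")";
    }
}
```

Each chunk would then be bound to a `PreparedStatement`, keeping the largest value returned across chunks. Note that `MAX` yields NULL when no row matches, so that case needs handling.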
How is it possible to limit the number of results retrieved from a database?
select e from Entity e /* I need only 10 results for instance */
You can set the maximum number of results to fetch explicitly, for example 10:
entityManager.createQuery(JPQL_QUERY)
.setParameter(arg0, arg1)
.setMaxResults(10)
.getResultList();
It will automatically generate a native query in the back end that retrieves only the specified number of results, if the back end supports it; otherwise the limit is applied in memory after fetching all results.
You can set an offset too, using setFirstResult():
em.createNamedQuery("Entity.list")
.setFirstResult(startPosition)
.setMaxResults(length)
.getResultList();
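When the caller thinks in page numbers rather than row offsets, the two values passed to `setFirstResult()` and `setMaxResults()` can be derived as below. A small sketch; `PageWindow` is a hypothetical helper, not part of JPA, and pages are assumed to be zero-based:

```java
// Hypothetical helper, not part of JPA: converts a zero-based page number into
// the arguments for setFirstResult() and setMaxResults().
class PageWindow {
    final int firstResult;
    final int maxResults;

    private PageWindow(int firstResult, int maxResults) {
        this.firstResult = firstResult;
        this.maxResults = maxResults;
    }

    static PageWindow of(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize >= 1");
        }
        // multiplyExact fails loudly instead of silently overflowing on huge page numbers.
        return new PageWindow(Math.multiplyExact(pageNumber, pageSize), pageSize);
    }
}
```

For example, `PageWindow.of(2, 10)` gives `firstResult` 20 and `maxResults` 10, which would select rows 21 to 30 of the ordered result.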
If you are using Spring Data JPA, you can use Pageable/PageRequest to limit the result to 1 record or any number you want. The first argument is the zero-based page number, and the second argument is the number of records per page.
Pageable page = PageRequest.of(0, 1);
Page<Entity> result = entityRepository.findAll(page);
List<Entity> entities = result.getContent();
Make sure the entityRepository interface extends JpaRepository (which supports sorting and pagination).