Hibernate Batch Operation performance - java

I have around 5000 records to update and I am trying to measure the performance of the operation. It starts at around 100 ms, but after every thousand updates the per-operation time increases by around 80 ms. Why is it slowing down? Is it the JVM?
StatelessSession session = dao.getStatelessSession();
Transaction transaction = session.beginTransaction();
try {
    List<Entity> list = dao.findAll();
    int counter = 0;
    for (Entity each : list) {
        final Date startTime = Clock.getTime();
        webService.execute(each);
        session.update(each);
        counter += 1;
        final Date endTime = Clock.getTime();
        LOGGER.info("***** " + getMilliSecondsDifference(startTime, endTime) + " for count: " + counter + " *****");
    }
} catch (Exception e) {
    LOGGER.info("***** Exception occurred: ", e);
} finally {
    transaction.commit();
    session.close();
}

Hüseyin,
It does not have to be a Hibernate problem at all, judging from your code.
I suggest you comment out the line with the web service call and then run the batch update again.
The network call may be what is getting slower.
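To see which part is growing, you could also time the web service call and the database update separately, for example (a minimal sketch that reuses the fields from your loop; the logging format is just illustrative):

for (Entity each : list) {
    long wsStart = System.nanoTime();
    webService.execute(each);             // external call over the network
    long wsEnd = System.nanoTime();
    session.update(each);                 // StatelessSession issues the UPDATE immediately
    long updEnd = System.nanoTime();
    counter += 1;
    LOGGER.info("count " + counter
            + ": webService=" + (wsEnd - wsStart) / 1_000_000 + " ms"
            + ", update=" + (updEnd - wsEnd) / 1_000_000 + " ms");
}

If the webService timings keep growing while the update timings stay flat, Hibernate is not the bottleneck.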

You have one transaction and are dealing with a large number of objects, so you will probably run into a memory problem and a performance problem as well.
The object references stay in memory until the session is flushed (on commit), so you end up with a large number of objects in memory, in addition to the change-tracking information that Hibernate keeps in the session, which can also hurt performance (I am not a Hibernate expert, but you should consider this point).
I think you should consider splitting the work across several transactions (a sketch follows below).
See these interesting links:
Transaction Management for bulk operations
Hibernate session and Transaction Management Guidelines
Good luck
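A minimal sketch of chunked commits, reusing the StatelessSession from the question (the chunk size of 500 is arbitrary; note that a StatelessSession keeps no persistence context, but shorter transactions still bound the amount of uncommitted work held by the database):

StatelessSession session = dao.getStatelessSession();
Transaction tx = session.beginTransaction();
int counter = 0;
try {
    for (Entity each : dao.findAll()) {
        webService.execute(each);
        session.update(each);
        if (++counter % 500 == 0) {
            tx.commit();                      // end the current transaction
            tx = session.beginTransaction();  // start a fresh one for the next chunk
        }
    }
    tx.commit();
} finally {
    session.close();
}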

Related

How to Do Batch Update in Hibernate Effectively

I have read many articles and found some ways to do batch processing.
One of them is using flush and clear; the following is the code:
long t1 = System.currentTimeMillis();
Session session = getSession();
Transaction transaction = session.beginTransaction();
try {
    Query query = session.createQuery("FROM PersonEntity WHERE id > " + lastMaxId + " ORDER BY id");
    query.setMaxResults(1000);
    rows = query.list();
    int count = 0;
    if (rows == null || rows.size() == 0) {
        return;
    }
    LOGGER.info("fetched {} rows from db", rows.size());
    for (Object row : rows) {
        PersonEntity personEntity = (PersonEntity) row;
        personEntity.setName(randomAlphaNumeric(30));
        lastMaxId = personEntity.getId();
        session.saveOrUpdate(personEntity);
        if (++count % 50 == 0) {
            session.flush();
            session.clear();
            LOGGER.info("Flushed and Cleared");
        }
    }
} finally {
    if (session != null && session.isOpen()) {
        LOGGER.info("Closing Session and committing transaction");
        transaction.commit();
        session.close();
    }
}
long t2 = System.currentTimeMillis();
LOGGER.info("time taken {}s", (t2 - t1) / 1000);
In the above code we process records in batches of 1000 and update them in the same transaction.
That is fine when we only have to do the batch update.
But I have the following questions regarding it:
There can be a case where some other thread (T2) is accessing the same set of rows for runtime update operations; in that case, until the batch of 1000 is committed, T2 remains stuck.
So, how should we handle this case?
Possible thoughts/solutions from my side:
I think we can do the update in a different session with a small batch of, say, 50.
Use a different stateless connection for the update and commit the transactions one by one, but close the session when a batch of 1000 completes.
Please help me find a better solution.
Do you mean to say this:
there is a batch update in progress inside a transaction
in the meanwhile another thread starts updating one of the records that is also in the batch
because of this, the batch will wait until the update in point 2 is complete, which causes the rest of the records in the batch to wait as well.
So far, it appears all good. However, the important point here is that the transaction was used to make the update to a large set of records "faster". Usually, transactions are used to ensure "consistency/atomicity".
How does one design this piece: fast updates to multiple records in one go, with atomicity not being the primary criterion, while a likely update to a record in the batch is also requested by another thread?
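If atomicity of the whole run is not a requirement, one option is to commit after each small chunk so that row locks are released quickly and another thread is blocked for at most the duration of one chunk. A minimal sketch, reusing the entities from the question (the chunk size of 50 is arbitrary):

Session session = getSession();
Transaction tx = session.beginTransaction();
int count = 0;
try {
    for (Object row : rows) {
        PersonEntity personEntity = (PersonEntity) row;
        personEntity.setName(randomAlphaNumeric(30));
        session.saveOrUpdate(personEntity);
        if (++count % 50 == 0) {
            session.flush();
            session.clear();
            tx.commit();                      // releases the row locks taken so far
            tx = session.beginTransaction();  // the next chunk runs in its own transaction
        }
    }
    tx.commit();
} finally {
    session.close();
}

The trade-off is that a failure mid-way leaves the earlier chunks committed, so the job has to be restartable (the lastMaxId bookkeeping in the question already helps with that).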

PostgreSQL's XMIN in Oracle & MySQL

I'm trying to find the equivalent of this code for Oracle & MySQL:
if (vardbtype.equals("POSTGRESQL")) {
    Long previousTxId = 0L;
    Long nextTxId = 0L;
    Class.forName("org.postgresql.Driver");
    System.out.println("----------------------------");
    try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/" + vardbserver, vardbuser, vardbpassword);
         PreparedStatement stmts = c.prepareStatement("SELECT * FROM " + vardbname + " where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
         PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from " + vardbname)
    ) {
        c.setAutoCommit(false);
        while (true) {
            stmts.clearParameters();
            try (ResultSet rss = max.executeQuery()) {
                if (rss.next()) {
                    nextTxId = rss.getLong(1);
                }
            }
            stmts.setLong(1, previousTxId);
            stmts.setLong(2, nextTxId + 1);
            try (ResultSet rss = stmts.executeQuery()) {
                while (rss.next()) {
                    String message = rss.getString("MESSAGE");
                    System.out.println("Message = " + message);
                    TextMessage mssg = session.createTextMessage(message);
                    System.out.println("Sent: " + mssg.getText());
                    producer.send(mssg);
                }
                previousTxId = nextTxId;
            }
            Thread.sleep(batchperiod2);
        }
    }
}
Basically, the code reads the contents of a database table and sends them to ActiveMQ. When the table is updated, it sends only the newly updated content (it does not resend what was already sent). But this code only works on PostgreSQL.
So I'm planning to add an "if" branch so that I can get the data from another database (Oracle and MySQL).
I guess I must change this code, right?
try (Connection c = DriverManager.getConnection("jdbc:postgresql://localhost:5432/" + vardbserver, vardbuser, vardbpassword);
     PreparedStatement stmts = c.prepareStatement("SELECT * FROM " + vardbname + " where xmin::varchar::bigint > ? and xmin::varchar::bigint < ? ");
     PreparedStatement max = c.prepareStatement("select max(xmin::varchar::bigint) as txid from " + vardbname)
) {
A couple of thoughts supplemental to Thorsten's answer.
First, xmin is a system column which is, iirc, stored in the row header on disk. It is updated by writes. I have not yet run into a case where the transaction ids don't increase, but there has to be some wraparound point. For this reason I think you are better off with a trigger which stores the transaction ids in another table and using that table to drive your processing.
For Oracle and MySQL, the underlying storage is sufficiently different that I don't see how you can do this directly.
If you want a common solution, you want a queue table into which a trigger inserts pending copies, and then select/delete from that table in your worker (a sketch of the worker side follows below). This will likely work better on MySQL than on PostgreSQL, and for Oracle you want to look at index-organized tables. If autovacuum has trouble keeping up, ask more questions or hire a consultant.
After further research
InnoDB provides a DB_TRX_ID column which is similar. Note that you cannot assume you have this column just because you are running MySQL: MySQL has different table storage engines and not all of them even support transactions. So that is an important limitation.
I was unable to locate a similar column on Oracle.
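A hedged sketch of the worker side of the queue-table approach, reusing c, session and producer from the question's code (the table pending_messages and its columns are hypothetical, and the trigger that fills it from the source table is assumed to exist already):

// Poll a queue table that a trigger fills on every insert/update,
// forward each row to JMS, then delete the rows that were processed.
try (PreparedStatement select = c.prepareStatement(
             "SELECT id, message FROM pending_messages ORDER BY id");
     PreparedStatement delete = c.prepareStatement(
             "DELETE FROM pending_messages WHERE id = ?")) {
    c.setAutoCommit(false);
    try (ResultSet rs = select.executeQuery()) {
        while (rs.next()) {
            TextMessage mssg = session.createTextMessage(rs.getString("message"));
            producer.send(mssg);
            delete.setLong(1, rs.getLong("id"));
            delete.executeUpdate();
        }
    }
    c.commit();   // rows leave the queue only after they have been sent
}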
This script looks at a table in intervals and outputs all messages inserted since the last loop.
PostgreSQL stores the transaction number that inserted a record, so this can be used to find the newly inserted records (although I am not sure whether a new transaction is guaranteed to have a higher number than all previous ones, as the script assumes).
Other DBMS don't have this pseudo column, so you would have to add a timestamp column to your table and use that instead. You'd have to change the two queries as well as the code to match the data type (I suppose java.sql.Timestamp instead of Long, but I am no Java guy).
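A hedged sketch of what the timestamp-based variant could look like (the created_at column is an assumption and would have to exist in your table, filled by a default or a trigger; c, session, producer, vardbname and batchperiod2 come from the question):

// Same polling idea, but driven by a created_at timestamp column instead of xmin.
try (PreparedStatement stmts = c.prepareStatement(
             "SELECT * FROM " + vardbname + " WHERE created_at > ? AND created_at <= ?");
     PreparedStatement max = c.prepareStatement(
             "SELECT MAX(created_at) FROM " + vardbname)) {
    java.sql.Timestamp previousTs = new java.sql.Timestamp(0L);
    while (true) {
        java.sql.Timestamp nextTs = previousTs;
        try (ResultSet rs = max.executeQuery()) {
            if (rs.next() && rs.getTimestamp(1) != null) {
                nextTs = rs.getTimestamp(1);
            }
        }
        stmts.setTimestamp(1, previousTs);
        stmts.setTimestamp(2, nextTs);
        try (ResultSet rows = stmts.executeQuery()) {
            while (rows.next()) {
                producer.send(session.createTextMessage(rows.getString("MESSAGE")));
            }
        }
        previousTs = nextTs;
        Thread.sleep(batchperiod2);
    }
}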

Do not update row in ResultSet if data has changed

We are extracting data from various database types (Oracle, MySQL, SQL Server, ...). Once the data is successfully written to a file, we want to mark it as transmitted, so we update a specific column.
Our problem is that a user can change the data in the meantime but might forget to commit. The record is then blocked by a select-for-update statement, so it can happen that we mark something as transmitted which actually is not.
This is an excerpt from our code:
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (!extractedData.rowUpdated()) {
        extractedData.updateString("COLUMNNAME", "TRANSMITTED");
        // code will stop here if the user has changed data but did not commit
        extractedData.updateRow();
        // once committed, the changed data is marked as transmitted
    }
}
The method extractedData.rowUpdated() returns false because, technically, the user hasn't changed anything yet.
Is there any way to not update the row and to detect at this late stage whether the data was changed?
Unfortunately I cannot change the program the user uses to change the data.
So you want to
Run through all rows of the table that have not been exported
Export this data somewhere
Mark these rows exported so your next iteration will not export them again
As there might be pending changes on a row, you don't want to mess with that information
How about this (a JDBC sketch of the idea follows after this answer):
You iterate over all rows.
for every row
    generate a hash value for the contents of the row
    compare column "UPDATE_STATUS" with the calculated hash
    if no match
        export row
        store hash into "UPDATE_STATUS"
        if store fails (row locked)
            -> no worries, will be exported again next time
        if store succeeds (on data already changed by user)
            -> no worries, will be exported again as hash will not match
This might further slow down your export, as you'll have to iterate over everything instead of only over everything WHERE UPDATE_STATUS IS NULL, but you might be able to run two jobs: one fast job iterating over WHERE UPDATE_STATUS IS NULL, and one slow and thorough job over WHERE UPDATE_STATUS IS NOT NULL (with the hash re-checking in place).
If you want to avoid store failures/waits, you might want to store the hash/updated information in a second table that copies the primary key plus the hash field value; that way, user locks on the main table would not interfere with your updates at all (as those would be on another table).
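A hedged sketch of that idea in JDBC (the UPDATE_STATUS and ID columns, the MD5 digest and the per-table column list are assumptions; writeRowToFile stands for your existing export step):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;

public class HashedExport {

    // Hash the payload columns of the current row (MD5 hex; any stable digest works).
    static String rowHash(ResultSet rs, List<String> payloadColumns) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String col : payloadColumns) {
            String v = rs.getString(col);
            md.update((v == null ? "" : v).getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Export the row only if its content hash differs from what was stored last time.
    static void exportIfChanged(Connection conn, ResultSet row, String table,
                                List<String> payloadColumns) throws Exception {
        String hash = rowHash(row, payloadColumns);
        if (!hash.equals(row.getString("UPDATE_STATUS"))) {
            // writeRowToFile(row);  // hypothetical export step
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE " + table + " SET UPDATE_STATUS = ? WHERE ID = ?")) {
                ps.setString(1, hash);
                ps.setObject(2, row.getObject("ID"));
                ps.executeUpdate();   // if this blocks or fails, the row is simply picked up again next run
            }
        }
    }
}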
"a user [...] might forget to commit" > A user either commits or he doesn't. "Forgetting" to commit is tantamount to a bug in his software.
To work around that you need to either:
Start a transaction with isolation level SERIALIZABLE, and within that transaction:
Read the data and export it. Data read this way is blocked from being updated.
Update the data you processed. Note: don't do that with an updatable ResultSet, do it with an UPDATE statement. That way you don't need a CONCUR_UPDATABLE + TYPE_SCROLL_SENSITIVE result set, which is much slower than CONCUR_READ_ONLY + TYPE_FORWARD_ONLY (a JDBC sketch of this variant follows after this answer).
Commit the transaction.
That way the buggy software will be blocked from updating data you are processing.
Another way
Start a TRANSACTION at a lower isolation level (default READ COMMITTED) and within that transaction
Select the data with proper Table Hints Eg for SQL Server these: TABLOCKX + HOLDLOCK (large datasets), or ROWLOCK + XLOCK + HOLDLOCK (small datasets), or PAGLOCK + XLOCK + HOLDLOCK. Having HOLDLOCK as a table hint is practically equivalent to having a SERIALIZABLE transaction. Note that lock escalation may escalate the latter two to table locks if the number of locks becomes too high.
Update the data you processed; Note: use an UPDATE statement. Lose the updatable/scroll_sensitive resultset.
Commit the TRANSACTION.
Same deal, the buggy software will be blocked from updating data you are processing.
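A hedged JDBC sketch of the first (SERIALIZABLE) variant; the table and column names are placeholders, while conn and writeDataToFile come from the question:

conn.setAutoCommit(false);
conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
     ResultSet extractedData = stmt.executeQuery("SELECT * FROM MYTABLE WHERE STATUS IS NULL")) {
    writeDataToFile(extractedData);      // rows read here are protected from concurrent updates
    try (Statement mark = conn.createStatement()) {
        // a plain UPDATE instead of an updatable ResultSet
        mark.executeUpdate("UPDATE MYTABLE SET STATUS = 'TRANSMITTED' WHERE STATUS IS NULL");
    }
    conn.commit();                       // locks are released only here
}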
In the end we had to implement optimistic locking. In some tables we already have a column that stores the version number; other tables have a timestamp column that holds the time of the last change (set by a trigger).
While a timestamp might not always be a reliable source for optimistic locking, we went with it anyway. Several changes within a single second are not very realistic in our environment.
Since we have to determine the primary key without describing it beforehand, we have to access the result set metadata. Some of our databases do not support this (DB/2 legacy tables, for example); we are still using the old approach for those.
Note: tableMetaData is an XML config file in which our description of the table is stored. It is not directly related to the metadata of the table in the database.
Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet extractedData = stmt.executeQuery(sql);
writeDataToFile(extractedData);
extractedData.beforeFirst();
while (extractedData.next()) {
    if (tableMetaData.getVersion() != null) {
        markDataAsExported(extractedData, tableMetaData);
    } else {
        markResultSetAsExported(extractedData, tableMetaData);
    }
}

// new way: build an update statement including the version column in the where clause
private void markDataAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    ResultSet resultSetPrimaryKeys = null;
    PreparedStatement versionedUpdateStatement = null;
    try {
        ResultSetMetaData extractedMetaData = extractedData.getMetaData();
        resultSetPrimaryKeys = conn.getMetaData().getPrimaryKeys(null, null, tableMetaData.getTable());
        ArrayList<String> primaryKeyList = new ArrayList<String>();
        String sqlStatement = "update " + tableMetaData.getTable() + " set " + tableMetaData.getUpdateColumn()
                + " = ? where ";
        if (resultSetPrimaryKeys.isBeforeFirst()) {
            while (resultSetPrimaryKeys.next()) {
                primaryKeyList.add(resultSetPrimaryKeys.getString(4));
                sqlStatement += resultSetPrimaryKeys.getString(4) + " = ? and ";
            }
            sqlStatement += tableMetaData.getVersionColumn() + " = ?";
            versionedUpdateStatement = conn.prepareStatement(sqlStatement);
            while (extractedData.next()) {
                versionedUpdateStatement.setString(1, tableMetaData.getUpdateValue());
                for (int i = 0; i < primaryKeyList.size(); i++) {
                    versionedUpdateStatement.setObject(i + 2, extractedData.getObject(primaryKeyList.get(i)),
                            extractedMetaData.getColumnType(extractedData.findColumn(primaryKeyList.get(i))));
                }
                versionedUpdateStatement.setObject(primaryKeyList.size() + 2,
                        extractedData.getObject(tableMetaData.getVersionColumn()), tableMetaData.getVersionType());
                if (versionedUpdateStatement.executeUpdate() == 0) {
                    logger.warn(Message.COLLECTOR_DATA_CHANGED, tableMetaData.getTable());
                }
            }
        } else {
            logger.warn(Message.COLLECTOR_PK_ERROR, tableMetaData.getTable());
            markResultSetAsExported(extractedData, tableMetaData);
        }
    } finally {
        if (resultSetPrimaryKeys != null) {
            resultSetPrimaryKeys.close();
        }
        if (versionedUpdateStatement != null) {
            versionedUpdateStatement.close();
        }
    }
}

// the old way as a fallback
private void markResultSetAsExported(ResultSet extractedData, TableMetaData tableMetaData) throws SQLException {
    while (extractedData.next()) {
        extractedData.updateString(tableMetaData.getUpdateColumn(), tableMetaData.getUpdateValue());
        extractedData.updateRow();
    }
}

How to more quickly add 1 million triples in Sesame 2.7.7

I've noticed that instantiation using the RepositoryConnection method add was slower than when the model is modified using a SPARQL update query. Despite the difference, even the SPARQL update method takes a long time (3.4 minutes for 10,000 triples). Executing multiple inserts (one query for each triple) or one big insert query does not change the performance of the methods; it is still slow. Is there another method appropriate for adding 1 million triples, or are there any special configurations that can help?
Code for RepositoryConnection
Repository myRepository = new HTTPRepository(serverURL, repositoryId);
myRepository.initialize();
RepositoryConnection con = myRepository.getConnection();
ValueFactory f = myRepository.getValueFactory();
i = 0;
j = 1000000;
while (i < j) {
    URI event = f.createURI(ontologyIRI + "event" + i);
    URI hasTimeStamp = f.createURI(ontologyIRI + "hasTimeStamp");
    Literal timestamp = f.createLiteral(fields.get(0));
    con.add(event, hasTimeStamp, timestamp);
    i++;
}
Code for SPARQL
Repository myRepository = new HTTPRepository(serverURL, repositoryId);
myRepository.initialize();
RepositoryConnection con = myRepository.getConnection();
i = 0;
j = 1000000;
while (i < j) {
    query = "INSERT {";
    query += "st:event" + i + " st:hasTimeStamp '" + fields.get(0) + "'^^<http://www.w3.org/2001/XMLSchema#float> .\n"
            + "}"
            + "WHERE { ?x ?y ?z }";
    Update update = con.prepareUpdate(QueryLanguage.SPARQL, query);
    update.execute();
    i++;
}
Edit
I've done experiments with In-Memory and Native Store Sesame repositories with the synchronization value set to 0.
(I only just noticed that you added the requested additional info, hence this rather late reply)
The problem is, as I suspected, that you are not using transactions to batch your update operations together. Effectively, each add operation you do becomes a single transaction (a Sesame repository connection runs in autocommit mode by default), and this is slow and inefficient.
To change this, start a transaction (using RepositoryConnection.begin()), then add your data, and finally call RepositoryConnection.commit() to finalize the transaction.
Here's how you should modify your first code example:
Repository myRepository = new HTTPRepository(serverURL, repositoryId);
myRepository.initialize();
RepositoryConnection con = myRepository.getConnection();
ValueFactory f = myRepository.getValueFactory();
i = 0;
j = 1000000;
try {
    con.begin(); // start the transaction
    while (i < j) {
        URI event = f.createURI(ontologyIRI + "event" + i);
        URI hasTimeStamp = f.createURI(ontologyIRI + "hasTimeStamp");
        Literal timestamp = f.createLiteral(fields.get(0));
        con.add(event, hasTimeStamp, timestamp);
        i++;
    }
    con.commit(); // finish the transaction: commit all our adds in one go
} finally {
    // always close the connection when you're done with it
    con.close();
}
The same applies to your code with the SPARQL update. For more information on how to work with transactions, have a look at the Sesame manual, particularly the chapter about using the Repository API.
As an aside: since you're working over HTTP, there is a risk that if your transaction becomes too large, it will start consuming a lot of memory in your client. If that starts happening, you may want to break up your update into several transactions. But with an update consisting of a million triples I think you should still be alright.
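If memory does become a problem, a minimal sketch of chunked commits could look like this (the chunk size of 100000 is arbitrary; the rest reuses the variables from the example above):

try {
    con.begin();
    while (i < j) {
        URI event = f.createURI(ontologyIRI + "event" + i);
        URI hasTimeStamp = f.createURI(ontologyIRI + "hasTimeStamp");
        Literal timestamp = f.createLiteral(fields.get(0));
        con.add(event, hasTimeStamp, timestamp);
        i++;
        if (i % 100000 == 0) {
            con.commit();   // push this chunk to the server
            con.begin();    // start the next transaction
        }
    }
    con.commit();
} finally {
    con.close();
}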

How to handle large dataset with JPA (or at least with Hibernate)?

I need to make my web app work with really huge datasets. At the moment I either get an OutOfMemoryException or output that takes 1-2 minutes to generate.
Let's keep it simple and suppose that we have two tables in the DB: Worker and WorkLog, with about 1,000 rows in the first one and 10,000,000 rows in the second one. The latter table has several fields, including 'workerId' and 'hoursWorked'. What we need is:
count total hours worked by each user;
list of work periods for each user.
The most straightforward approach (IMO) for each task in plain SQL is:
1)
select Worker.name, sum(hoursWorked) from Worker, WorkLog
where Worker.id = WorkLog.workerId
group by Worker.name;
//results of this query should be transformed to Multimap<Worker, Long>
2)
select Worker.name, WorkLog.start, WorkLog.hoursWorked from Worker, WorkLog
where Worker.id = WorkLog.workerId;
//results of this query should be transformed to Multimap<Worker, Period>
//if it was JDBC then it would be vitally
//to set resultSet.setFetchSize (someSmallNumber), ~100
So, I have two questions:
how to implement each of my approaches with JPA (or at least with Hibernate);
how would you handle this problem (with JPA or Hibernate of course)?
suppose that we have 2 tables in DB: Worker and WorkLog with about 1000 rows in the first one and 10 000 000 rows in the second one
For high volumes like this, my recommendation would be to use the StatelessSession interface from Hibernate:
Alternatively, Hibernate provides a command-oriented API that can be used for streaming data to and from the database in the form of detached objects. A StatelessSession has no persistence context associated with it and does not provide many of the higher-level life cycle semantics. In particular, a stateless session does not implement a first-level cache nor interact with any second-level or query cache. It does not implement transactional write-behind or automatic dirty checking. Operations performed using a stateless session never cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Due to the lack of a first-level cache, stateless sessions are vulnerable to data aliasing effects. A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
ScrollableResults customers = session.getNamedQuery("GetCustomers")
        .scroll(ScrollMode.FORWARD_ONLY);
while (customers.next()) {
    Customer customer = (Customer) customers.get(0);
    customer.updateStuff(...);
    session.update(customer);
}
tx.commit();
session.close();
In this code example, the Customer instances returned by the query are immediately detached. They are never associated with any persistence context.
The insert(), update() and delete() operations defined by the StatelessSession interface are considered to be direct database row-level operations. They result in the immediate execution of a SQL INSERT, UPDATE or DELETE respectively. They have different semantics to the save(), saveOrUpdate() and delete() operations defined by the Session interface.
It seems you can do this with EclipseLink too.
Check this : http://wiki.eclipse.org/EclipseLink/Examples/JPA/Pagination :
Query query = em.createQuery...;
query.setHint(QueryHints.CURSOR, true)
     .setHint(QueryHints.SCROLLABLE_CURSOR, true);
ScrollableCursor scrl = (ScrollableCursor) query.getSingleResult();
Object o = null;
while ((o = scrl.next()) != null) { ... }
There are several techniques that may need to be used in conjunction with one another to create and manipulate queries for large datasets where memory is a limitation:
Use setFetchSize(some value, maybe 100+), as the default (via JDBC) is 10. This is more about performance and is the single biggest related factor. In JPA it can be done with a query hint available from the provider (Hibernate, etc.); there does not (for whatever reason) seem to be a JPA Query.setFetchSize(int) method. A sketch follows after this list.
Do not try to marshal the entire result set for 10K+ records. Several strategies apply: for GUIs, use paging or a framework that does paging. Consider Lucene or commercial searching/indexing engines (Endeca if the company has the money). For sending data somewhere, stream it and flush the buffer every N records to limit how much memory is used; the stream may be flushed to a file, network, etc. Remember that underneath, JPA uses JDBC, and JDBC keeps the result set on the server, fetching only N rows in a row-set group at a time. This breakdown can be manipulated to facilitate flushing data in groups.
Consider what the use case is. Typically, an application is trying to answer questions. When the answer is to weed through 10K+ rows, the design should be reviewed. Again, consider using indexing engines like Lucene, refine the queries, consider using BloomFilters as contains-check caches to find needles in haystacks without going to the database, etc.
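For the fetch-size point, a hedged sketch with Hibernate as the JPA provider (the hint name "org.hibernate.fetchSize" is Hibernate-specific and other providers have their own equivalents; handle() is a hypothetical per-row step). It combines the provider hint with standard JPA pagination and clears the persistence context between pages:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;

void streamWorkLogs(EntityManager em) {
    final int pageSize = 100;
    for (int first = 0; ; first += pageSize) {
        TypedQuery<WorkLog> q = em.createQuery(
                "select wl from WorkLog wl order by wl.id", WorkLog.class);
        q.setHint("org.hibernate.fetchSize", pageSize);  // provider-specific JDBC fetch size
        q.setFirstResult(first);
        q.setMaxResults(pageSize);
        List<WorkLog> page = q.getResultList();
        if (page.isEmpty()) {
            break;
        }
        for (WorkLog wl : page) {
            handle(wl);          // hypothetical per-row processing
        }
        em.clear();              // drop the processed page from the persistence context
    }
}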
Raw SQL shouldn't be considered a last resort. It should still be considered an option if you want to keep things "standard" on the JPA tier, but not on the database tier. JPA also has support for native queries where it will still do the mapping to standard entities for you.
However, if you have a large result set that cannot be processed in the database, then you really should just use plain JDBC as JPA (standard) does not support streaming of large sets of data.
It will be harder to port your application across different application servers if you use JPA implementation specific constructs since the JPA engine is embedded in the application server and you may not have a control on which JPA provider is being used.
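A minimal sketch of the native-query option mentioned above (the SQL and the table/column names are illustrative; WorkLog is the entity from the question):

// Native SQL, but the rows still come back mapped to the WorkLog entity.
@SuppressWarnings("unchecked")
List<WorkLog> longDays = em.createNativeQuery(
        "SELECT * FROM work_log WHERE hours_worked > 8", WorkLog.class)
        .getResultList();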
I'm using something like this and it works very fast. I also hate to use native SQL, as our application should work on any database.
The following results in a very optimized SQL query and returns a list of records which are maps.
String hql = "select distinct " +
"t.uuid as uuid, t.title as title, t.code as code, t.date as date, t.dueDate as dueDate, " +
"t.startDate as startDate, t.endDate as endDate, t.constraintDate as constraintDate, t.closureDate as closureDate, t.creationDate as creationDate, " +
"sc.category as category, sp.priority as priority, sd.difficulty as difficulty, t.progress as progress, st.type as type, " +
"ss.status as status, ss.color as rowColor, (p.rKey || ' ' || p.name) as project, ps.status as projectstatus, (r.code || ' ' || r.title) as requirement, " +
"t.estimate as estimate, w.title as workgroup, o.name || ' ' || o.surname as owner, " +
"ROUND(sum(COALESCE(a.duration, 0)) * 100 / case when ((COALESCE(t.estimate, 0) * COALESCE(t.progress, 0)) = 0) then 1 else (COALESCE(t.estimate, 0) * COALESCE(t.progress, 0)) end, 2) as factor " +
"from " + Task.class.getName() + " t " +
"left join t.category sc " +
"left join t.priority sp " +
"left join t.difficulty sd " +
"left join t.taskType st " +
"left join t.status ss " +
"left join t.project p " +
"left join t.owner o " +
"left join t.workgroup w " +
"left join p.status ps " +
"left join t.requirement r " +
"left join p.status sps " +
"left join t.iterationTasks it " +
"left join t.taskActivities a " +
"left join it.iteration i " +
"where sps.active = true and " +
"ss.done = false and " +
"(i.uuid <> :iterationUuid or it.uuid is null) " + filterHql +
"group by t.uuid, t.title, t.code, t.date, t.dueDate, " +
"t.startDate, t.endDate, t.constraintDate, t.closureDate, t.creationDate, " +
"sc.category, sp.priority, sd.difficulty, t.progress, st.type, " +
"ss.status, ss.color, p.rKey, p.name, ps.status, r.code, r.title, " +
"t.estimate, w.title, o.name, o.surname " + sortHql;
if (logger.isDebugEnabled()) {
    logger.debug("Executing hql: " + hql);
}
Query query = hibernateTemplate.getSessionFactory().getCurrentSession().getSession(EntityMode.MAP).createQuery(hql);
for (String key : filterValues.keySet()) {
    Object valueSet = filterValues.get(key);
    if (logger.isDebugEnabled()) {
        logger.debug("Setting query parameter for " + key);
    }
    if (valueSet instanceof java.util.Collection<?>) {
        query.setParameterList(key, (Collection) filterValues.get(key));
    } else {
        query.setParameter(key, filterValues.get(key));
    }
}
query.setString("iterationUuid", iteration.getUuid());
query.setResultTransformer(Transformers.ALIAS_TO_ENTITY_MAP);
if (logger.isDebugEnabled()) {
    logger.debug("Query building complete.");
    logger.debug("SQL: " + query.getQueryString());
}
return query.list();
I agree that doing the calculation on the database server is your best option in the particular case you mentioned. HQL and JPAQL can handle both of those queries:
1)
select w, sum(wl.hoursWorked)
from Worker w, WorkLog wl
where w.id = wl.workerId
group by w
or, if the association is mapped:
select w, sum(wl.hoursWorked)
from Worker w join w.workLogs wl
group by w
both of which return a List where each Object[] holds a Worker and a Long. Or you could also use "dynamic instantiation" queries to wrap that up, for example:
select new WorkerTotal( w, sum(wl.hoursWorked) )
from Worker w join w.workLogs wl
group by w
or (depending on need) probably even just:
select new WorkerTotal( w.id, w.name, sum(wl.hoursWorked) )
from Worker w join w.workLogs wl
group by w.id, w.name
WorkerTotal is just a plain class. It must have matching constructor(s).
2)
select w, new Period( wl.start, wl.hoursWorked )
from Worker w join w.workLogs wl
this will return you a result for each row in the WorkLog table... The new Period(...) bit is called "dynamic instantiation" and is used to wrap tuples from the result into objects (easier consumption).
For manipulation and general usage, I recommend StatelessSession as Pascal points out.
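For reference, the value classes used by those dynamic-instantiation queries could look roughly like this (a sketch; the field types are assumptions based on the queries above, and the id/name variant of the query would need a matching constructor as well):

// Plain value classes whose constructors match the select-new expressions above.
class WorkerTotal {
    private final Worker worker;
    private final long totalHours;

    public WorkerTotal(Worker worker, Long totalHours) {
        this.worker = worker;
        this.totalHours = totalHours == null ? 0L : totalHours;
    }
    // getters omitted
}

class Period {
    private final java.util.Date start;
    private final int hoursWorked;

    public Period(java.util.Date start, int hoursWorked) {
        this.start = start;
        this.hoursWorked = hoursWorked;
    }
    // getters omitted
}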
