Indexing data in Hibernate Search

I just started integrating Hibernate Search with my Hibernate application. The data is indexed by using Hibernate Session every time I start the server.
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
// Typed list, so that the enhanced for loop below compiles
List<Book> books = session.createQuery("from Book as book").list();
for (Book book : books) {
    fullTextSession.index(book);
}
tx.commit(); // index is written at commit time
This is very awkward, and the server takes 10 minutes to start.
Am I doing this the right way?
I also wrote a scheduler which will update the indexes periodically. Will this update the existing index entries automatically, or create duplicate entries?

As detailed in the Hibernate Search guide, section 3.6.1, if you are using annotations (by now the default), the listeners which launch indexing on store are registered by default:
Hibernate Search is enabled out of the box when using Hibernate Annotations or Hibernate EntityManager. If, for some reason you need to disable it, set hibernate.search.autoregister_listeners to false.
An example of how to turn them on by hand:
hibConfiguration.setListener("post-update", new FullTextIndexEventListener());
hibConfiguration.setListener("post-insert", new FullTextIndexEventListener());
hibConfiguration.setListener("post-delete", new FullTextIndexEventListener());
All you need to do is annotate the entities which you want to be indexed with the
@Indexed(index = "fulltext")
annotation, and then do the fine-grained annotation on the fields, as detailed in the user guide.
So you should neither launch indexing by hand when storing, nor relaunch indexing when the application starts, unless you have entities which were stored before indexing was enabled.
You may run into performance problems when you store an object which has, say, an "attachment", because the attachment is then indexed within the same transaction that stores the entity. See "Hibernate Search and offline text extraction" for a solution to this problem.

Provided you are using an FSDirectoryProvider (which is the default), the Lucene index is persisted on disk. This means there is no need to index on every startup. If you have an existing database, you of course want to create an initial index using the fullTextSession.index() functionality. However, this should not happen on application startup. Consider exposing some sort of trigger URL or admin interface instead.
Once you have the initial index, I would recommend using automatic indexing. This means that the Lucene index gets updated automatically whenever a book is created, updated or deleted. Automatic indexing should also be enabled by default.
I recommend you refer to the automatic and manual indexing sections in the online manual - http://docs.jboss.org/hibernate/stable/search/reference/en/html_single
--Hardy

I currently use Hibernate Search's automatic indexing with JPA and it works really well. To create your indexes initially you can just call the following:
FullTextEntityManager fullTextEntityManager =
    Search.getFullTextEntityManager(entityManager);
try {
    fullTextEntityManager.createIndexer().startAndWait();
} catch (InterruptedException e) {
    // Exception handling
}
where "entityManager" is just a javax.persistence.EntityManager. The above will index all fields marked with @Field for all entities marked as @Indexed.
Then as long as you do all your updates, etc, through the entity manager the indexes are automatically updated. You can then search as per usual but be sure to recreate your EntityManager on each search (you can use the EntityManagerFactory to do so).

Related

Spring JPA always caches data [duplicate]

Let's suppose we have this situation:
We have Spring Data configured in the standard way: there is a Repository object and an Entity object, and everything works well.
Now, for some complex reasons, I have to use the EntityManager (or JdbcTemplate, or whatever sits at a lower level than Spring Data) directly to update the table associated with my Entity, using a native SQL query. So I'm not using the Entity object, but simply doing a manual database update on the table I use as the entity (it's more correct to say the table from which I get the values, see the next lines).
The reason is that I had to bind my spring-data Entity to a MySQL view that makes a UNION of multiple tables, not directly to the table I need to update.
What happens is:
In a functional test, I call the "manual" update method (on the table from which the MySQL view is created) as previously described (through the entity manager), and if I then make a simple Repository.findOne(objectId), I get the old object (not the updated one). I have to call EntityManager.refresh(object) to get the updated object.
Why?
Is there a way to "synchronize" objects (or force some refresh) out of the box in spring-data? Or am I asking for a miracle?
I'm not being ironic; maybe I'm just not expert enough, and this is (probably) my ignorance. If so, please explain why and (if you want) share some advanced knowledge about this amazing framework.

If I make a simple Repository.findOne(objectId) I get old object (not updated one). I've to call EntityManager.refresh(object) to get updated object.
Why?
The first-level cache is active for the duration of a session. Any entity previously retrieved in the context of a session will be returned from the first-level cache unless there is a reason to go back to the database.
Is there a reason to go back to the database after your SQL update? Well, as the book Pro JPA 2 notes (p199) regarding bulk update statements (either via JPQL or SQL):
The first issue for developers to consider when using these [bulk update] statements is that the persistence context is not updated to reflect the results of the operation. Bulk operations are issued as SQL against the database, bypassing the in-memory structures of the persistence context.
which is what you are seeing. That is why you need to call refresh to force the entity to be reloaded from the database as the persistence context is not aware of any potential modifications.
The book also notes the following about using Native SQL statements (rather than JPQL bulk update):
CAUTION: Native SQL update and delete operations should not be executed on tables mapped by an entity. The JP QL operations tell the provider what cached entity state must be invalidated in order to remain consistent with the database. Native SQL operations bypass such checks and can quickly lead to situations where the in-memory cache is out of date with respect to the database.
Essentially then, should you have a 2nd level cache configured then updating any entity currently in the cache via a native SQL statement is likely to result in stale data in the cache.
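To make the behavior described above concrete, here is a minimal plain-Java sketch of a persistence context. It is only a simplified model: ToyPersistenceContext and FirstLevelCacheDemo are hypothetical names, the map stands in for the database table, and none of this is the JPA API. It shows why a find after a native update returns the old state until refresh (or clear) is called.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a persistence context (first-level cache).
// It only mimics the behavior described above; it is not the JPA API.
class ToyPersistenceContext {
    private final Map<Long, String> database;                      // stands in for the table
    private final Map<Long, String> firstLevelCache = new HashMap<>();

    ToyPersistenceContext(Map<Long, String> database) {
        this.database = database;
    }

    // Like find(): once a row has been loaded, it is served from the cache.
    String find(long id) {
        return firstLevelCache.computeIfAbsent(id, database::get);
    }

    // Like refresh(): always re-reads the row and overwrites the cached state.
    String refresh(long id) {
        String fresh = database.get(id);
        firstLevelCache.put(id, fresh);
        return fresh;
    }

    // Like clear(): forgets everything, so the next find() hits the database.
    void clear() {
        firstLevelCache.clear();
    }
}

class FirstLevelCacheDemo {
    // Returns the values seen: first read, read after external update, read after refresh.
    static String[] run() {
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "old");
        ToyPersistenceContext ctx = new ToyPersistenceContext(db);

        String first = ctx.find(1L);        // loaded from the "database"
        db.put(1L, "new");                  // native SQL update, bypassing the context
        String stale = ctx.find(1L);        // still served from the first-level cache
        String refreshed = ctx.refresh(1L); // forces a reload

        return new String[] { first, stale, refreshed };
    }
}
```

In this model the second read still sees the old value because the update never passed through the context, which is the situation the quoted caution warns about.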

In a Spring Boot JpaRepository:
If a modifying query changes entities contained in the persistence context, then this context becomes outdated.
To fetch the latest state of the entities from the database afterwards, use @Modifying(clearAutomatically = true).
The @Modifying annotation has a clearAutomatically attribute which defines whether the underlying persistence context should be cleared after executing the modifying query.
Example:
@Modifying(clearAutomatically = true)
@Query("UPDATE NetworkEntity n SET n.network_status = :network_status WHERE n.network_id = :network_id")
int expireNetwork(@Param("network_id") Integer network_id, @Param("network_status") String network_status);

Based on the way you described your usage, fetching from the repository should retrieve the updated object without the need to refresh it, as long as the method which uses the entity manager to merge is annotated with @Transactional.
Here's a sample test:
@DirtiesContext(classMode = ClassMode.AFTER_CLASS)
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = ApplicationConfig.class)
@EnableJpaRepositories(basePackages = "com.foo")
public class SampleSegmentTest {
    @Resource
    SampleJpaRepository segmentJpaRepository;
    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    @Test
    public void test() {
        Segment segment = new Segment();
        ReflectionTestUtils.setField(segment, "value", "foo");
        ReflectionTestUtils.setField(segment, "description", "bar");
        segmentJpaRepository.save(segment);
        assertNotNull(segment.getId());
        assertEquals("foo", segment.getValue());
        assertEquals("bar", segment.getDescription());
        ReflectionTestUtils.setField(segment, "value", "foo2");
        entityManager.merge(segment);
        Segment updatedSegment = segmentJpaRepository.findOne(segment.getId());
        assertEquals("foo2", updatedSegment.getValue());
    }
}

JAX-WS Webservice with JPA transactions

I'm going to become mad with JPA...
I have a JAX-WS Webservice like that
@WebService
public class MyService
{
    @EJB private MyDbService myDbService;
    ...
    System.out.println(myDbService.read());
    ...
}
My EJB contains
@Stateless
public class MyDbService
{
    @PersistenceContext(unitName="mypu")
    private EntityManager entityManager;

    public MyEntity read()
    {
        MyEntity myEntity;
        String queryString = "SELECT ... WHERE e.name = :type";
        TypedQuery<MyEntity> query = entityManager.createQuery(queryString, MyEntity.class);
        query.setParameter("type", "xyz");
        try
        {
            myEntity = query.getSingleResult();
        }
        catch (Exception e)
        {
            // no single result (or any other failure) is reported as null
            myEntity = null;
        }
        return myEntity;
    }
}
In my persistence.xml, the persistence unit mypu has transaction-type="JTA" and a jta-data-source.
If I call the webservice, it works: the entity is retrieved from the db.
Now, using an external tool, I change the value of one field in my record.
I call the webservice again and ... the entity displayed contains the old value.
If I redeploy, or if I add an entityManager.refresh(myEntity) after the query, I get the correct value again.

In @MyTwoCents' answer, Option 2 is to NOT use your 'external' tool for changes, use your application instead. Caching is of more use if your application knows about all the changes going on, or has some way of being informed of them. This is the better option, but only if your application can be the single access point for the data.
Forcing a refresh, via EntityManager.refresh() or through provider specific query hints on specific queries, or by invalidating the cache as described here https://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching#How_to_refresh_the_cache is another option. This forces JPA to go past the cache and access the database on the specific query. Problems with this are you must either know when the cache is stale and needs to be refreshed, or put it on queries that cannot tolerate stale data. If that is fairly frequent or on every query, then your application is going through all the work of maintaining a cache that isn't used.
The last option is to turn off the second level cache. This forces queries to always load entities into an EntityManager from the database data, not a second level cache. You reduce the risk of stale data (but not eliminate it, as the EntityManager is required to have its own first level cache for managed entities, representing a transactional cache), but at the cost of reloading and rebuilding entities, sometimes unnecessarily if they have been read before by other threads.
Which is best depends entirely on the application and its expected use cases.

Don't be mad, it's fine.
The flow goes like this:
You fired a query with the condition type = "xyz".
Hibernate keeps the result of this query in its cache, so that if you fire the query again it returns the same value as long as the state has not changed.
Now you update the record from some external resource.
Hibernate doesn't have any clue about that.
So when you fire the query again, it returns the value from the cache.
When you do a refresh, Hibernate gets the record from the database.
Solution:
You can either call refresh before the read,
OR
change the table value using Hibernate methods in the application, so that Hibernate is aware of the changes,
OR
disable the Hibernate cache so that each query goes to the DB (not recommended, as it will slow things down).

Clear Hibernate 2nd level cache after manually DB update

In short, I have an entity mapped to a view in the DB (Oracle) with the 2nd level cache enabled (read-only strategy), using ehcache.
If I manually update some column in the DB, the cache is not updated.
I did not find any way to do this; the cache is only updated if the changes are made through the Hibernate entity.
Can I implement this somehow?
Maybe a job to monitor the table (or view)? Or maybe there is some method to notify Hibernate about a change in a specific table in the DB.
Thanks in advance for your answers!

According to Hibernate JavaDoc, you can use org.hibernate.Cache.evictAllRegions() :
evictAllRegions() Evict all data from the cache.
Using Session and SessionFactory:
Session session = sessionFactory.getCurrentSession();
if (session != null) {
    session.clear(); // clear the session (first-level) cache
}
Cache cache = sessionFactory.getCache();
if (cache != null) {
    cache.evictAllRegions(); // evict all data from the second-level cache
}
1) If you only need to evict one entity type (because only certain entities are updated directly in the db) rather than the whole cache, you can use
evictEntityRegion(Class entityClass) Evicts all entity data from the given region (i.e.
2) If you have a lot of entities that can be updated directly in the db, you can use this method, which evicts all entities from the 2nd level cache (it can be exposed to admins through JMX or other admin tools):
/**
 * Evicts all second level cache hibernate entities. This is generally only
 * needed when an external application modifies the game database.
 */
public void evict2ndLevelCache() {
    try {
        Map<String, ClassMetadata> classesMetadata = sessionFactory.getAllClassMetadata();
        Cache cache = sessionFactory.getCache();
        for (String entityName : classesMetadata.keySet()) {
            logger.info("Evicting Entity from 2nd level cache: " + entityName);
            cache.evictEntityRegion(entityName);
        }
    } catch (Exception e) {
        logger.logp(Level.SEVERE, "SessionController", "evict2ndLevelCache", "Error evicting 2nd level hibernate cache entities: ", e);
    }
}
3) Another approach has been described for postgresql+hibernate; I think you can do something similar for Oracle.

Use Debezium to update the cache asynchronously from your database. You can learn more at https://debezium.io/
This article is also very helpful, as it gives a direct implementation:
https://debezium.io/blog/2018/12/05/automating-cache-invalidation-with-change-data-capture/

As mentioned, when you update the DB manually from the back end (not through the application/Hibernate session), the cache is not updated, and your application remains ignorant of the change.
To tell the app about the change, you need to refresh the entire cache or the part of the cache related to the entity, depending on the case. This can be done in two ways:
1- Restart the application. In this case the cache will be rebuilt with the updated DB state.
2- Trigger the update without restarting the app. You need not restart the app, but you have to tell your application that the current cache is invalid and should be refreshed.
You can give this external push to your app in many ways. A few are listed below:
Through JMX.
Through a servlet with a published URL to refresh the cache. Hit the URL after you change tables in the DB.
By implementing a trigger on the database that calls a listener in the application.
While implementing the external push/admin task, you can call a suitable cache-related method, based on your requirement, to invalidate or refresh the cache. Examples: Session.refresh(), Cache.evictAllRegions(), Cache.evictEntityRegion(entityName) etc., as described in other posts.
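As a rough illustration of the "external push" idea, the following plain-Java sketch models a cache organized in per-entity regions plus an admin hook that evicts them. ToyRegionCache and ToyCacheAdmin are hypothetical names and deliberately do not use the Hibernate API; in a real application the hook (called from JMX, a servlet or a database listener) would delegate to the Cache methods named above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a second-level cache with one region per entity name.
// This is not org.hibernate.Cache; it only models the eviction semantics.
class ToyRegionCache {
    private final Map<String, Map<Object, Object>> regions = new ConcurrentHashMap<>();

    void put(String entityName, Object id, Object state) {
        regions.computeIfAbsent(entityName, name -> new ConcurrentHashMap<>()).put(id, state);
    }

    // Returns the cached state, or null on a cache miss.
    Object get(String entityName, Object id) {
        Map<Object, Object> region = regions.get(entityName);
        return region == null ? null : region.get(id);
    }

    // Drops every cached instance of one entity type.
    void evictEntityRegion(String entityName) {
        regions.remove(entityName);
    }

    // Drops everything.
    void evictAllRegions() {
        regions.clear();
    }
}

// Hypothetical admin hook that an external trigger would call
// after somebody has changed tables behind the application's back.
class ToyCacheAdmin {
    private final ToyRegionCache cache;

    ToyCacheAdmin(ToyRegionCache cache) {
        this.cache = cache;
    }

    // entityName == null is taken to mean "unknown scope, evict everything" (an assumption of this sketch).
    void onExternalChange(String entityName) {
        if (entityName == null) {
            cache.evictAllRegions();
        } else {
            cache.evictEntityRegion(entityName);
        }
    }
}
```

Evicting only the affected region keeps the rest of the cache warm, which is usually preferable to clearing everything when the externally modified tables are known.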

You will find here the way to control the second level cache:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html#performance-sessioncache

You can use session.refresh() method to reload the objects which are currently held in session.
Read object loading for more detail.

As of JEE 7.0:
myStatelessDaoBean.getSession().evict(MyEntity.class);

Does Playframework (Ebean ORM) support managed entities

I started working on a new project using Playframework and thought I would try using it with Ebean ORM. What I am wondering now is whether Play with the Ebean implementation supports managed entities and, if so, how? Take this example method from the controller:
@Transactional
public Result changePassword() {
    Long userId = Long.valueOf(session("id"));
    User user = User.find.byId(userId);
    user.setName("John Doe");
    return ok(); // an action has to return a Result to compile
}
Is there any way to persist the changes to the database when the transaction ends? Currently what I am doing is calling user.save(). This is not much, but working with JEE/JPA (and recently Dropwizard) I got used to having my entity changes persisted when the transaction ends.

No, I don't think Play natively supports something like saving automatically at the end of the transaction as you want. An explicit save is the only option.
Ebean.save(xyz);
A general suggestion on your code: from Play 2.3.x there is no need to annotate or explicitly declare the transaction in your case, if the save is the only action on the User bean. By default each action on Ebean beans is executed in a separate transaction. You can specify the transaction explicitly if multiple actions need to be executed in a single transaction.
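For readers unfamiliar with the term, the sketch below models what "managed entities" means in the JPA sense the question refers to: the context remembers the state it loaded and writes back whatever changed at commit time, without an explicit save call. It is a plain-Java toy under stated assumptions (ToyUser and ToyManagedContext are hypothetical names, snapshot comparison is just one possible way to detect changes) and is neither the Ebean nor the JPA API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy entity.
class ToyUser {
    final long id;
    String name;

    ToyUser(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical toy context that "manages" the entities it has loaded.
class ToyManagedContext {
    private final Map<Long, String> database;                     // stands in for the user table
    private final Map<Long, ToyUser> managed = new HashMap<>();   // entities loaded in this transaction
    private final Map<Long, String> snapshots = new HashMap<>();  // state as it was when loaded

    ToyManagedContext(Map<Long, String> database) {
        this.database = database;
    }

    // Loads the entity once and remembers a snapshot of its state.
    ToyUser find(long id) {
        return managed.computeIfAbsent(id, key -> {
            String name = database.get(key);
            snapshots.put(key, name);
            return new ToyUser(key, name);
        });
    }

    // Writes back every entity whose state differs from its snapshot.
    // Returns the number of rows written.
    int commit() {
        int written = 0;
        for (ToyUser user : managed.values()) {
            if (!Objects.equals(snapshots.get(user.id), user.name)) {
                database.put(user.id, user.name);
                snapshots.put(user.id, user.name);
                written++;
            }
        }
        return written;
    }
}
```

With the explicit-save style described in the answer, the call to save() takes the place of this commit-time comparison.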

JPA2(JBoss7.1's Hibernate) entityManager.find() is getting data from Cache not from DB

I am developing a web application using JSF2, JPA2, EJB3 via JBoss7.1.
I have an Entity(Forum) which contains a list of child entities(Topic).
When I tried to get the list of Topics by forumId for the first time the data is being loaded from DB.
List<Topic> topics = entityManager.find(Forum.class, 1).getTopics();
After that I add a few more child entities (Topics) to the Forum and then try again to retrieve the list of Topics by forumId. But I only get the old cached results. The newly inserted child records are not loaded from the DB.
I am able to load the child entities(Topics) by using following methods:
Method1: Calling entityManager.clear() before entityManager.find()
Method2: Using
em.createQuery("select t from Topic t where t.forum.forumId=?1", Topic.class);
or
em.createQuery("SELECT t FROM Topic t JOIN t.forum f WHERE f.forumId = ?1", Topic.class);
I am aware of setting the QueryHints on NamedQueries. But em.find() method is in a super CrudService which is being extended by all DAOs(Stateless EJBs). So setting QueryHints won't work for me.
So I want to know how I can make the em.find() method load data from the DB instead of the cache.
PS: I am using the Extended Persistence Context type.
@PersistenceContext(unitName="forum", type=PersistenceContextType.EXTENDED)
protected EntityManager em;

You can specify the behavior of individual find operations by setting additional properties that control the entity manager's interaction with the second level cache.
Map<String, Object> props = new HashMap<String, Object>();
props.put("javax.persistence.cache.retrieveMode", CacheRetrieveMode.BYPASS);
entityMgr.find(Forum.class, 1, props).getTopics();

Is it possible that the relation between Forum and Topic was only added in one direction in your entity beans? If you set the forum id on the topic, you should also add this topic to the Forum object to have consistent data inside the first level cache. You should also make sure that you are not using two different entity managers for the update and the find. The first level cache is only kept per entity manager, so another em can still contain an older version of the entity.
Probably unrelated, but with JPA2 you also have a minimal API to evict entities from the second level cache, which could be used after an update:
em.getEntityManagerFactory().getCache().evict(Forum.class, forumId);
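The advice about keeping both sides of the relation in sync can be sketched with a small helper method. The classes below reuse the Forum and Topic names from the question but are plain Java; the JPA mapping annotations are left out so the example stays self-contained, and addTopic is a hypothetical helper, not something taken from the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch; in the real entities these would carry the JPA mapping annotations.
class Forum {
    private final List<Topic> topics = new ArrayList<>();

    // Hypothetical helper: updates both sides of the association in one place,
    // so an already loaded Forum sees the new Topic without a reload.
    void addTopic(Topic topic) {
        topics.add(topic);
        topic.setForum(this);
    }

    List<Topic> getTopics() {
        return Collections.unmodifiableList(topics);
    }
}

class Topic {
    private final String title;
    private Forum forum;

    Topic(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    Forum getForum() {
        return forum;
    }

    void setForum(Forum forum) {
        this.forum = forum;
    }
}
```

If new topics are always attached through such a helper, the Forum instance held by the entity manager already contains them, regardless of whether the find is answered from the cache.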

Put @Cacheable(false) on the Forum class.
