I have the following statement in Java, which is called 500 times and takes about 5 minutes to complete:
List<Items> itemList = session.createCriteria(Items.class)
.add(Restrictions.eq("id", itemID))
.setCacheable(true).setCacheRegion("query.DBMSItems")
.list();
I set the following in the configuration:
<property name="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</property>
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_query_cache">true</property>
and on the cache:
<cache
name="query.DBMSItems"
maxElementsInMemory="500000"
eternal="false"
timeToIdleSeconds="6000"
timeToLiveSeconds="60000"
overflowToDisk="false"
statistics="true" />
Hence it looks like the cache is not working.
Any explanation of why this is happening will be greatly appreciated.
With the query cache, the query (together with its parameter values) is cached as the key and the result set as the value. If you execute the query with different parameters, each variant is first run against the database, takes the time it needs, returns the values and stores the corresponding key-value pair in the cache; only subsequent calls of the same query with the same parameters are then served from the second-level cache.
A better solution for your case would be to cache the data itself first and then perform the lookups against the cached data.
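Since the query filters only by id, one option is to rely on the entity-level second-level cache instead of the query cache. A minimal sketch, assuming the Items entity is mapped as cacheable and a primary-key lookup is sufficient for your use case:
// Assumes Items is marked cacheable, e.g. with <cache usage="read-write"/> in its mapping
// or @org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE).
// A lookup by primary key checks the first- and second-level caches before hitting the database.
Items item = (Items) session.get(Items.class, itemID);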
5 minutes for 500 items is very slow... Before making workarounds like caching, it would be better to find the root cause. Try profiling your application; I recommend YourKit.
I recently upgraded my Java Liquibase version from 3.5.3 to 3.6.3
I have a very heavy environment where there are lots of databases and tables (I am using Oracle).
On this environment, I am trying to execute a huge changelog file where I create tables and indices.
Find below a small part of the changelog.
<databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.2.xsd">
...
...
...
<changeSet author="me" id="tableCreation78">
<preConditions onFail="MARK_RAN">
<not>
<tableExists tableName="MY_TABLE_NAME" />
</not>
</preConditions>
<comment>Creating table MY_TABLE_NAME</comment>
<createTable tableName="MY_TABLE_NAME">
<column name="M_ID" type="bigint">
<constraints nullable="false" primaryKey="true" primaryKeyName="PK_MY_TABLE_NAME_190" />
</column>
<column name="M_FORMAT" type="int" />
</createTable>
</changeSet>
...
...
...
<changeSet author="me" id="indexCreation121">
<preConditions onFail="MARK_RAN">
<tableExists tableName="MY_TABLE_NAME"/>
<not>
<indexExists tableName="MY_TABLE_NAME" columnNames="M_FEEDER_ID"/>
</not>
</preConditions>
<comment>Creating index for MY_TABLE_NAME</comment>
<createIndex tableName="MY_TABLE_NAME" indexName="MY_INDEX_NAME">
<column name="M_ID_INDEX"/>
</createIndex>
</changeSet>
...
...
...
</databaseChangeLog>
With Liquibase 3.5.3, creating the indices used to be quick.
When I migrated to Liquibase 3.6.3, I had a severe regression in performance.
What used to run in 1-2 minutes, now takes up to 20 minutes to complete.
The changelog does not define Unique Constraints.
While debugging, I noticed one of the many differences between the two versions: in 3.5.3, the listConstraints and listColumns methods from UniqueConstraintSnapshotGenerator are not called.
In 3.6.3, these methods are called a lot, even though no unique constraints are defined in the changelog. I am guessing they come from the tables previously defined in the environment.
Some of these queries (see below) are called multiple times with exactly the same parameters. I don't know if this is a maintenance step that was added in 3.6.3.
2020-08-13 17:03:52,270 INFO [main] select ucc.owner as constraint_container, ucc.constraint_name as constraint_name, ucc.column_name, f.validated as constraint_validate from all_cons_columns ucc INNER JOIN all_constraints f ON ucc.owner = f.owner AND ucc.constraint_name = f.constraint_name where ucc.constraint_name='UC' and ucc.owner='DB' and ucc.table_name not like 'BIN$%' order by ucc.position
I am not sure if this is the cause of the regression but honestly, I am out of ideas.
Does anybody know if this might be the cause of this regression?
Did they add new maintenance steps in Liquibase 3.6.3 that might be causing this big performance degradation?
Thank you so much!
You may need to perform maintenance on your Oracle data dictionary. Databases that use Liquibase tend to drop and create more objects than the average Oracle database, which can cause performance problems with metadata queries.
First, gather optimizer statistics for fixed objects (V$ objects) and the data dictionary (ALL_ objects). This information helps Oracle build good execution plans for metadata queries. The below statement will take a few minutes but may only need to be run once a year:
begin
dbms_stats.gather_fixed_objects_stats;
dbms_stats.gather_dictionary_stats;
end;
/
Another somewhat common reason for data dictionary query problems is a large number of objects in the recycle bin. The recycle bin is great on production systems, where it lets you instantly recover from dropping the wrong table. But in a development environment, if thousands of objects are constantly dropped but not purged, those old objects can slow down some metadata queries.
--Count the number of objects in the recycle bin.
select count(*) from dba_recyclebin;
--Purge all of them if you don't need them. Must be run as SYS.
purge dba_recyclebin;
Those are two quick and painless solutions to some data dictionary problems. If that doesn't help, you may need to tune specific SQL statements, which may require a lot of information. For example - exactly how long does it take your system to run that query against ALL_CONS_COLUMNS? (On my database, it runs in much less than a second.)
Run Liquibase and then use a query like the one below to find the slowest metadata queries:
select elapsed_time/1000000 seconds, executions, sql_id, sql_fulltext, gv$sql.*
from gv$sql
order by elapsed_time desc;
We are going to migrate a large amount of data (a single type of entity) from Amazon's DynamoDB into a MySQL DB. We are using Hibernate to map this class to a MySQL entity. There are around 3 million entities (excluding the rows of the list property). Here is our class mapping summary:
@Entity
@Table(name = "CUSTOMER")
public class Customer {
@Id
@Column(name = "id")
private String id;
// Other properties, all of which are primitive types or String
@ElementCollection
@CollectionTable(name = "CUSTOMER_USER", joinColumns = @JoinColumn(name = "customer_id"))
@Column(name = "userId")
private List<String> users;
// CONSTRUCTORS, GETTERS, SETTERS, etc.
}
users is a list of Strings. We have created two MySQL tables as follows:
CREATE TABLE CUSTOMER(id VARCHAR(100), PRIMARY KEY(id));
CREATE TABLE CUSTOMER_USER(customer_id VARCHAR(100), userId VARCHAR(100), PRIMARY KEY(customer_id, userId), FOREIGN KEY (customer_id) REFERENCES CUSTOMER(id));
Note: We do not make hibernate generate any id value, we are assigning our IDs to Customer entities which are guaranteed to be unique.
Here is our hibernate.cfg.xml:
<hibernate-configuration>
<session-factory>
<property name="hibernate.dialect"> org.hibernate.dialect.MySQLDialect </property>
<property name="hibernate.connection.driver_class"> com.mysql.jdbc.Driver </property>
<property name="hibernate.connection.url"> jdbc:mysql://localhost/xxx </property>
<property name="hibernate.connection.username"> xxx </property>
<property name="hibernate.connection.password"> xxx </property>
<property name="hibernate.connection.provider_class">org.hibernate.c3p0.internal.C3P0ConnectionProvider</property>
<property name="hibernate.jdbc.batch_size"> 50 </property>
<property name="hibernate.cache.use_second_level_cache">false</property>
<property name="c3p0.min_size">30</property>
<property name="c3p0.max_size">70</property>
</session-factory>
</hibernate-configuration>
We are creating some number of threads each reading data from Dynamo and inserting them to our MySQl DB via Hibernate. Here is what each thread does:
// Each single thread brings resultItems from DynamoDB
Session session = factory.openSession();
Transaction tx = session.beginTransaction();
for(int i = 0; i < resultItems.size(); i++) {
Customer cust = new Customer(resultItems.get(i));
session.save(cust);
if(i % BATCH_SIZE == 0) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
We have our own performance monitoring functions and we continuously log the overall read/write performance. The problem is that the migration starts at about 1500 items/sec read/written (on average) but keeps slowing down as the number of rows in the CUSTOMER and CUSTOMER_USER tables grows (after a few minutes, the r/w speed was around 500 items/sec). I am not experienced with Hibernate, so here are my questions:
What should hibernate.cfg.xml look like for a multi-threaded task like ours? Does the content I gave above fit such a task, or is anything wrong or missing?
There are exactly 50 threads, and each does the following: read from DynamoDB, then insert the results into the MySQL DB, then read from DynamoDB again, and so on. Therefore, communication with Hibernate is not constant. Under these circumstances, what min_size and max_size do you recommend for the c3p0 connection pool? To understand the concept, should I also set the remaining c3p0-related properties in hibernate.cfg.xml?
What can be done to maximize the speed of bulk inserting?
NOTE 1: I did not list all of the properties, because the remaining ones (other than the list of users) are all int, boolean, String, etc.
NOTE 2: All of the above points have been tested and have no negative effect on performance. When we don't insert anything into the MySQL DB, the read speed stays stable for hours.
NOTE 3: Any recommendation/guidance about the structure of the MySQL tables, configuration settings, sessions/transactions, connection pool sizes, batch sizes, etc. would be really helpful!
Assuming you are not doing anything else in the Hibernate transaction other than inserting the data into these two tables, you can use StatelessSession session = sessionFactory.openStatelessSession(); instead of a normal session, which removes the overhead of maintaining the persistence context and caches. But then you will have to insert the nested collection objects separately.
Refer to https://docs.jboss.org/hibernate/orm/3.3/reference/en/html/batch.html
So it could be something like this:
// Each single thread brings resultItems from DynamoDB
StatelessSession session = factory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < resultItems.size(); i++) {
Customer cust = new Customer(resultItems.get(i));
session.insert(cust); // StatelessSession uses insert(); the id is already assigned by the application
// TODO: Create the related customer-user objects for this customer's id and insert them in the same transaction.
// No flush()/clear() is needed: a StatelessSession keeps no persistence context.
}
tx.commit();
session.close();
In your scenario there are 50 threads batch-inserting data into the same tables simultaneously. MySQL has to maintain ACID properties while 50 transactions covering many records in one table remain open or are being committed. That can cause huge overhead.
While migrating data from databases, network latency can cause significant delays when there are many back-and-forth communications with the database. In this case, using multiple threads can be beneficial. But when doing batch fetches and batch inserts, there is little to gain as the database drivers will (or should) communicate data without doing much back-and-forth communications.
In the batch scenario, start with one thread that reads data, prepares a batch and puts it in a queue for one thread that writes data from the prepared batches. Keep the batches small (100 to 1,000 records) and commit often (every 100 records or so). This will minimize the overhead of maintaining the table. If network latency is a problem, try using two threads for reading and two for writing (but any performance gain might be offset by the overhead of having the table used by two threads simultaneously).
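A minimal sketch of that reader/writer split, assuming the question's SessionFactory is available as factory and using a hypothetical fetchNextBatchFromDynamo() helper that returns null when there is nothing left to read:
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// One reader thread hands prepared batches to one writer thread through a bounded queue.
BlockingQueue<List<Customer>> queue = new ArrayBlockingQueue<>(10);

Thread reader = new Thread(() -> {
    List<Customer> batch;
    while ((batch = fetchNextBatchFromDynamo()) != null) { // hypothetical helper
        try {
            queue.put(batch); // blocks if the writer falls behind
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
    }
});

Thread writer = new Thread(() -> {
    while (true) {
        List<Customer> batch;
        try {
            batch = queue.poll(30, TimeUnit.SECONDS); // give up after 30s with no new batches
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        if (batch == null) break;
        StatelessSession session = factory.openStatelessSession();
        Transaction tx = session.beginTransaction();
        for (Customer cust : batch) {
            session.insert(cust);
        }
        tx.commit(); // commit per small batch, as suggested above
        session.close();
    }
});

reader.start();
writer.start();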
Since there is no generated ID, you should benefit from the hibernate.jdbc.batch_size option already in your hibernate configuration. The hibernate.jdbc.fetch_size option (set this to 250 or so) might also be of interest.
As @hermant1900 mentions, using a StatelessSession is also a good idea. But by far the fastest method is the one mentioned by @Rob in the comments: use database tools to export the data to a file and import it into MySQL. I'm quite sure this is also the preferred method: it takes less time and less processing, and there are fewer variables involved - overall a lot more reliable.
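For the MySQL side of that export/import route, a hedged sketch of loading a previously exported CSV over plain JDBC (the file path and CSV format are illustrative, exception handling is omitted, and LOCAL INFILE has to be allowed both on the server and on the Connector/J URL):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Bulk-load an exported CSV with MySQL's LOAD DATA instead of row-by-row inserts.
try (Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/xxx?allowLoadLocalInfile=true", "xxx", "xxx");
     Statement st = con.createStatement()) {
    int rows = st.executeUpdate(
        "LOAD DATA LOCAL INFILE '/tmp/customer.csv' " +
        "INTO TABLE CUSTOMER " +
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
    System.out.println("Loaded " + rows + " rows");
}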
I have a basic setup where the database is read by multiple web applications, and periodically I have a batch application that does a lot of writing. During the writing, the performance of the web apps degrades heavily (their database reads become very slow).
The environment is a MySQL DB using the MyISAM engine; the batch application is a Java SE app using spring-batch and SimpleJDBCTemplate to issue SQL commands via JDBC. I found that MySQL has a parameter that lowers the priority of write operations on the MyISAM engine: low_priority_updates. To quote the docs, among other things, you can "SET LOW_PRIORITY_UPDATES=1 to change the priority in one thread". I opted for this because it's easiest from the configuration standpoint of my application. What I've done is configure my DataSource so that it executes that "SET ..." for each connection it opens, like so:
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
<!-- other props omitted -->
<property name="connectionInitSqls">
<list>
<value>SET low_priority_updates="ON"</value>
</list>
</property>
</bean>
Now my question is: how do I actually check that SQL issued via this DataSource really runs with low priority? If I do SHOW FULL PROCESSLIST in MySQL while the inserts are happening, it just tells me what SQL the connections are executing, nothing about the priority.
If I check the server variables, low_priority_updates is "OFF", but that is just the global server variable; it says nothing about the thread-local value.
So again, is there any actual way to check whether the per-query/per-thread low_priority_updates value is taken into account?
By issuing the SET LOW_PRIORITY_UPDATES=1 command, you are changing the variable's value for the session. Therefore you can verify it by checking the value of the variable in that session.
I know of two ways to do it:
1- SHOW SESSION VARIABLES LIKE 'low_priority_updates'
this shows ON/OFF
2- select @@session.low_priority_updates
this gives 0/1
Important: the above statements/calls will show you the values of the variables in the session where they run.
Therefore, you will need to run them on the connections themselves in order to see the values. I don't know of a way in MySQL to select the values of variables that belong to another session.
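A minimal sketch of doing that check from the application itself, assuming the BasicDataSource bean from the question is available as dataSource (exception handling omitted):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Runs on a connection taken from the same pool, so connectionInitSqls has already been applied to it.
try (Connection con = dataSource.getConnection();
     Statement st = con.createStatement();
     ResultSet rs = st.executeQuery("SELECT @@session.low_priority_updates")) {
    if (rs.next()) {
        System.out.println("low_priority_updates = " + rs.getInt(1)); // 1 = ON, 0 = OFF
    }
}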
If you would like to see them as a list, you might need to work around it by creating a table and logging that info yourself. For example:
CREATE TABLE `mydb`.`my_low_priority_updates` (
`connection_id` INT ,
`low_priority_updates_value` INT NOT NULL
)
ENGINE = MyISAM;
then you need a statement that inserts the connection id and the value into the table:
insert into my_low_priority_updates(connection_id,low_priority_updates_value)
select connection_id(), @@session.low_priority_updates
from dual
where not exists (select 1 from my_low_priority_updates where connection_id=connection_id())
You can put this statement in a procedure and make sure it's called, or add it to a trigger on a table that you know gets updated/inserted into.
After that, querying the my_low_priority_updates table will show you the value of the variable in each connection.
I need to consume a rather large amount of data from a daily CSV file. The CSV contains around 120K records. This slows to a crawl when using Hibernate. Basically, it seems Hibernate does a SELECT before every single INSERT (or UPDATE) when using saveOrUpdate(): for every instance being persisted with saveOrUpdate(), a SELECT is issued before the actual INSERT or UPDATE. I can understand why it does this, but it's terribly inefficient for bulk processing, and I'm looking for alternatives.
I'm confident that the performance issue lies in the way I'm using Hibernate for this, since I got another version working with native SQL (which parses the CSV in the exact same manner) and it literally runs circles around this new version.
So, to the actual question: does a Hibernate alternative to MySQL's "INSERT ... ON DUPLICATE KEY UPDATE" syntax exist?
Or, if I choose to use native SQL for this, can I run native SQL within a Hibernate transaction? Meaning, will it support commits/rollbacks?
There are many possible bottlenecks in bulk operations. The best approach depends heavily on what your data looks like. Have a look at the Hibernate Manual section on batch processing.
At a minimum, make sure you are using the following pattern (copied from the manual):
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
Customer customer = new Customer(.....);
session.save(customer);
if ( i % 20 == 0 ) { //20, same as the JDBC batch size
//flush a batch of inserts and release memory:
session.flush();
session.clear();
}
}
tx.commit();
session.close();
If you are mapping a flat file to a very complex object graph you may have to get more creative, but the basic principle is that you have to find a balance between pushing good-sized chunks of data to the database with each flush/commit and avoiding exploding the size of the session-level cache.
Lastly, if you don't need Hibernate to handle any collections or cascading for your data to be correctly inserted, consider using a StatelessSession.
From the Hibernate Batch Processing documentation:
For updates I used the following:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
ScrollableResults employeeCursor = session.createQuery("FROM Employee")
.scroll();
int count = 0;
while ( employeeCursor.next() ) {
Employee employee = (Employee) employeeCursor.get(0);
employee.updateEmployee();
session.update(employee);
if ( ++count % 50 == 0 ) {
session.flush();
session.clear();
}
}
tx.commit();
session.close();
But for inserts I would go with jcwayne's answer.
According to an answer to a similar question, it can be done by configuring Hibernate to insert objects using a custom stored procedure which uses your database's upsert functionality. It's not pretty, though.
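One way to wire such custom insert SQL into Hibernate (whether a stored procedure call or MySQL's upsert syntax directly) is the @SQLInsert annotation. A hedged sketch with illustrative table, entity and column names; note that the '?' placeholders must follow the order in which Hibernate binds the properties, so check the generated SQL with hibernate.show_sql first:
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.annotations.SQLInsert;

// Illustrative entity: Hibernate will use the custom SQL instead of its generated INSERT.
@Entity
@Table(name = "ITEM")
@SQLInsert(sql = "INSERT INTO ITEM (name, id) VALUES (?, ?) "
               + "ON DUPLICATE KEY UPDATE name = VALUES(name)")
public class Item {
    @Id
    private Long id;
    private String name;
    // getters and setters omitted
}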
High-throughput data import
If you only want to import data without doing any processing or transformation, then a tool like PostgreSQL's COPY is the fastest way to import data.
Batch processing
However, if you need to do transformation, data aggregation, or correlation/merging between the existing data and the incoming data, then you need application-level batch processing.
In this case, you want to flush-clear-commit regularly:
int entityCount = 50;
int batchSize = 25;
EntityManager entityManager = entityManagerFactory()
.createEntityManager();
EntityTransaction entityTransaction = entityManager
.getTransaction();
try {
entityTransaction.begin();
for (int i = 0; i < entityCount; i++) {
if (i > 0 && i % batchSize == 0) {
entityTransaction.commit();
entityTransaction.begin();
entityManager.clear();
}
Post post = new Post(
String.format("Post %d", i + 1)
);
entityManager.persist(post);
}
entityTransaction.commit();
} catch (RuntimeException e) {
if (entityTransaction.isActive()) {
entityTransaction.rollback();
}
throw e;
} finally {
entityManager.close();
}
Also, make sure you enable JDBC batching as well using the following configuration properties:
<property
name="hibernate.jdbc.batch_size"
value="25"
/>
<property
name="hibernate.order_inserts"
value="true"
/>
<property
name="hibernate.order_updates"
value="true"
/>
Bulk processing
Bulk processing is suitable when all rows match pre-defined filtering criteria, so you can use a single UPDATE to change all records.
However, using bulk updates that modify millions of records can increase the size of the redo log or end up taking lots of locks on database systems that still use 2PL (Two-Phase Locking), like SQL Server.
So, while the bulk update is the most efficient way to change many records, you have to pay attention to how many records are to be changed to avoid a long-running transaction.
Also, you can combine bulk update with optimistic locking so that other OLTP transactions won't lose the update done by the bulk processing process.
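A minimal sketch of such a versioned bulk update, assuming a Post entity with a status property and a @Version field named version (names and status values are illustrative):
// A single JPQL bulk UPDATE; bumping the version keeps optimistic locking consistent
// for concurrent OLTP transactions that have already read the affected rows.
int updatedRows = entityManager.createQuery(
        "update Post " +
        "set status = :newStatus, version = version + 1 " +
        "where status = :oldStatus")
    .setParameter("newStatus", "SPAM")
    .setParameter("oldStatus", "PENDING")
    .executeUpdate();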
If you use the sequence or native generator, Hibernate will use a SELECT to get the id:
<id name="id" column="ID">
<generator class="native" />
</id>
You should use hilo or seqHiLo generator:
<id name="id" type="long" column="id">
<generator class="seqhilo">
<param name="sequence">SEQ_NAME</param>
<param name="max_lo">100</param>
</generator>
</id>
The "extra" select is to generate the unique identifier for your data.
Switch to HiLo sequence generation and you can reduce the sequence round trips to the database by a factor of the allocation size. Please note that there will be gaps in the primary keys unless you adjust your sequence value for the HiLo generator.
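For annotation-based mappings, a roughly equivalent sketch (generator and sequence names are illustrative; the exact optimizer Hibernate uses depends on the version and on hibernate.id.new_generator_mappings):
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

// One database round trip per 100 generated ids instead of one per insert.
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "item_seq")
@SequenceGenerator(name = "item_seq", sequenceName = "SEQ_NAME", allocationSize = 100)
private Long id;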
I am writing a query but it always says "No matching index found". I don't know why. My code is as below:
Query query = pm.newQuery(Classified.class);
query.setFilter("emp_Id == emp");
query.setOrdering("upload_date desc");
query.declareParameters("String emp");
List<Classified> results = (List<Classified>)query.execute(session.getAttribute("emp_Id").toString());
<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes autoGenerate="true">
<datastore-index kind="Classified" ancestor="false">
<property name="emp_Id" direction="asc" />
<property name="category" direction="asc" />
<property name="upload_date" direction="desc" />
</datastore-index>
</datastore-indexes>
I have added the above index, but it did not help.
I believe you need to configure a Datastore Index. There's probably one already generated for you in Eclipse at WEB-INF/appengine-generated/datastore-indexes-auto.xml that you just need to copy to WEB-INF/datastore-indexes.xml and deploy again.
Because this needs to be somewhere on the internet...
I kicked myself when I found this out
The error means you do not have an index matching what the query wants to perform. You can have multiple indexes for each entity.
In the Logcat error it will tell you exactly which index to set and what order the elements need to be in.
i.e., if the error says it wants (it won't be nicely formatted):
<datastore-index kind="Classified" ancestor="false">
<property name="category" direction="desc" />
<property name="upload_date" direction="desc" />
</datastore-index>
then go to Project -> war -> WEB-INF -> appengine-generated -> datastore-indexes-auto.xml and add exactly that. Then redeploy the project.
Next, go into your Google Cloud Console and look at Datastore -> Indexes. It should say that the index is being prepared (this goes quicker if you can kill all connected apps and shut down the instance in the console).
Once this has moved into the list of other indexes, rerun your application and it won't error out with regard to the index anymore.
Go get it Gentlemen/Ladies
The index you define must hold all possible results in the order they will be returned. Your query asks for a particular emp_Id, ordered by upload_date, but your index is ordered primarily by category.
Try removing the category line from your index definition, or swapping the order of category and upload_date, to make upload_date the primary sort order for the index. If another part of your code relies on the category line, you may have to make two separate indices (which incurs some computational cost).
Edit: see comment below by Nick Johnson re. extra parameters.
I am running into this issue at the moment when doing a single property query such as:
const query = datastore
.createQuery('Emailing_dev')
.filter('status', '=', 'Scheduled')
In my case, I should not be getting any errors; however, I get Error 9: No matching index found.
If I define the single-property index twice in the YAML, it works:
indexes:
- kind: Emailing_dev
properties:
- name: status
- name: status
but this for sure must be a bug..!!