We are evaluating Neo4j Enterprise Edition Causal Cluster using the Bolt driver for Java.
We have a 3-node core cluster.
The performance we are seeing is far too low.
We are creating just one node with two properties, 1,000,000 times. When measured, we get about 300 TPS (i.e. only 300 nodes are created per second).
The OS is Linux (RHEL).
Each core server runs with 32 GB.
We were expecting close to 50,000 TPS for the creation of a single node, but we are only getting 300 TPS, which is far too low.
I am sure we are missing something big.
This function is called 1,000,000 times by a thread pool of 64 threads.
Code Snippet:
@Override
public void createNode() throws InterruptedException {
    try (Session session = RTNeo4j.getInstance().getWriteDriver().session(AccessMode.WRITE)) {
        try (final Transaction tx = session.beginTransaction()) {
            try {
                tx.run("CREATE (a:Person {name: {name}, id: {id}})",
                        parameters("name", "king", "id", System.currentTimeMillis()));
                tx.success();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
We would appreciate quick help with this evaluation.
You do not have to create a session each time inside the method. Move the session creation outside of the method:
Session session = RTNeo4j.getInstance().getWriteDriver().session(AccessMode.WRITE)
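A minimal sketch of that change, assuming each worker thread keeps its own long-lived session (a Session in the 1.x Java driver is not thread-safe, so it should not be shared across the 64 threads) and, as an extra tweak not in the original answer, batching several CREATEs into one transaction to cut per-commit overhead. The class name, per-thread count and batch size are illustrative:

import org.neo4j.driver.v1.AccessMode;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Transaction;

import static org.neo4j.driver.v1.Values.parameters;

// Hypothetical worker: one Session per thread, reused across many creates.
public class NodeCreator implements Runnable {

    private static final int NODES_PER_THREAD = 15_625; // assumed split: 1,000,000 / 64 threads
    private static final int BATCH_SIZE = 1_000;        // assumed batch size, tune as needed

    @Override
    public void run() {
        // Session is created once per thread, not once per node.
        try (Session session = RTNeo4j.getInstance().getWriteDriver().session(AccessMode.WRITE)) {
            int created = 0;
            while (created < NODES_PER_THREAD) {
                try (Transaction tx = session.beginTransaction()) {
                    for (int i = 0; i < BATCH_SIZE && created < NODES_PER_THREAD; i++, created++) {
                        tx.run("CREATE (a:Person {name: {name}, id: {id}})",
                                parameters("name", "king", "id", System.currentTimeMillis()));
                    }
                    tx.success(); // one commit per batch instead of one per node
                }
            }
        }
    }
}

Reusing the session avoids paying the session setup cost on every call, and committing per batch rather than per node is usually what moves throughput by orders of magnitude.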
I have recently started using JedisCluster for my application. There is little to no documentation and few examples for it. I tested a use case and the results are not what I expected.
import java.util.HashSet;
import java.util.Map;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.JedisPool;

public class Test {
    private static JedisCluster setConnection(HashSet<HostAndPort> IP) {
        JedisCluster jediscluster = new JedisCluster(IP, 30000, 3,
                new GenericObjectPoolConfig() {{
                    setMaxTotal(500);
                    setMinIdle(1);
                    setMaxIdle(500);
                    setBlockWhenExhausted(true);
                    setMaxWaitMillis(30000);
                }});
        return jediscluster;
    }

    public static int getIdleconn(Map<String, JedisPool> nodes) {
        int i = 0;
        for (String k : nodes.keySet()) {
            i += nodes.get(k).getNumIdle();
        }
        return i;
    }

    public static void main(String[] args) {
        HashSet<HostAndPort> IP = new HashSet<HostAndPort>() {{
            add(new HostAndPort("host1", port1));
            add(new HostAndPort("host2", port2));
        }};
        JedisCluster cluster = setConnection(IP);
        System.out.println(getIdleconn(cluster.getClusterNodes()));
        cluster.set("Dummy", "0");
        cluster.set("Dummy1", "0");
        cluster.set("Dummy3", "0");
        System.out.println(getIdleconn(cluster.getClusterNodes()));
        try {
            Thread.sleep(60000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(getIdleconn(cluster.getClusterNodes()));
    }
}
The output for this snippet is:
0
3
3
Questions:
1. I have set the timeout to 30000 in JedisCluster(IP, 30000, 3, new GenericObjectPoolConfig()). I believed this was the connection timeout, meaning idle connections are closed after 30 seconds, but that does not seem to be happening: after sleeping for 60 seconds, the number of idle connections is still 3. What am I doing or understanding wrong here? I want the pool to close a connection if it is not used for more than 30 seconds.
2. setMinIdle(1): does this mean that, regardless of the connection timeout, the pool will always maintain one connection?
3. I prefer availability over throughput for my app. What should the value of setMaxWaitMillis be if the connection timeout is 30 seconds?
4. Though rare, the app fails with redis.clients.jedis.exceptions.JedisNoReachableClusterNodeException: No reachable node in cluster. I think this is connected to 1. How can I prevent it?
1. The 30000 (30 seconds) here is the (socket) timeout: the timeout for a single socket (read) operation. It is not related to closing idle connections. Closing idle connections is controlled by GenericObjectPoolConfig, so check the eviction parameters there (see the sketch after this list).
2. Yes (mostly).
3. setMaxWaitMillis is the timeout for getting a connection object from the connection pool. It is not related to the 30 seconds and does not really help you in terms of availability.
4. Keep your cluster nodes available. There have also been changes in Jedis related to this; you can try a recent version (4.x, even better 4.2.x).
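A minimal sketch of the idle-connection eviction settings mentioned in point 1, assuming commons-pool2's GenericObjectPoolConfig; the 10-second eviction interval is illustrative, and the 30-second idle threshold matches what the question asks for:

import java.util.Set;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class PoolConfigExample {
    public static JedisCluster connect(Set<HostAndPort> nodes) {
        GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
        poolConfig.setMaxTotal(500);
        poolConfig.setMinIdle(1);    // one connection is always kept, even when idle
        poolConfig.setMaxIdle(500);
        // Idle connections are only closed by the evictor thread, which is off by default.
        poolConfig.setTimeBetweenEvictionRunsMillis(10000); // run the evictor every 10 s
        poolConfig.setMinEvictableIdleTimeMillis(30000);    // evict connections idle for > 30 s
        return new JedisCluster(nodes, 30000, 3, poolConfig);
    }
}

With the evictor enabled, idle connections beyond setMinIdle should be closed once they have been idle longer than the threshold.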
For my current project I'm using Cassandra DB, and data is fetched from it frequently: at least 30 DB requests hit every second, and each request needs to fetch at least 40,000 rows. The following is my current code; this method returns a HashMap.
public Map<String, String> loadObject(ArrayList<Integer> tradigAccountList) {
    com.datastax.driver.core.Session session;
    Map<String, String> orderListMap = new HashMap<>();
    List<ResultSetFuture> futures = new ArrayList<>();
    List<ListenableFuture<ResultSet>> Future;
    try {
        session = jdbcUtils.getCassandraSession();
        PreparedStatement statement = jdbcUtils.getCassandraPS(CassandraPS.LOAD_ORDER_LIST);
        // Fire one async query per trading account.
        for (Integer tradingAccount : tradigAccountList) {
            futures.add(session.executeAsync(statement.bind(tradingAccount).setFetchSize(3000)));
        }
        // Collect the results as they complete.
        Future = Futures.inCompletionOrder(futures);
        for (ListenableFuture<ResultSet> future : Future) {
            for (Row row : future.get()) {
                orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
            }
        }
    } catch (Exception e) {
        // Exceptions are currently swallowed here.
    } finally {
    }
    return orderListMap;
}
My data request query is something like this,
"SELECT cliordid,ordermsg FROM omsks_v1.ordersStringV1 WHERE tradacntid = ?".
My Cassandra cluster has 2 nodes, with 32 concurrent read and write threads each, and my DB schema is as follows:
CREATE TABLE omsks_v1.ordersstringv1_copy1 (
tradacntid int,
cliordid text,
ordermsg text,
PRIMARY KEY (tradacntid, cliordid)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'sstable_compression' : 'LZ4Compressor'
}
AND compaction = {
'class' : 'SizeTieredCompactionStrategy'
};
My problem is that I am getting a Cassandra timeout exception. How can I optimize my code to handle all these requests?
It would be better if you attached the snippet of that exception (read/write exception). I assume you are getting a read timeout, since you are trying to fetch a large data set in a single request.
For each request at least 40000 rows needed to fetch from Db
If the record set is large and the resultset is too big, Cassandra throws an exception when the results cannot be returned within the time limit set in cassandra.yaml:
read_request_timeout_in_ms
You can increase the timeout, but this is not a good option. It may resolve the issue (it may no longer throw the exception, but it will take more time to return the results).
Solution: for a big data set you can fetch the results using manual pagination (range queries) with a limit.
SELECT cliordid, ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid = ? AND cliordid > ? LIMIT ?;
Or use a range query:
SELECT cliordid, ordermsg FROM omsks_v1.ordersStringV1
WHERE tradacntid = ? AND cliordid >= ? AND cliordid <= ?;
This will be much faster than fetching the whole resultset.
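A rough sketch of that manual pagination with the 3.x DataStax driver, paging through the clustering column cliordid with a LIMIT; the page size and method name are illustrative, not from the original answer:

// Hypothetical helper: pages through one partition (tradacntid) in chunks.
private static final int PAGE_LIMIT = 5000; // assumed page size

public Map<String, String> loadOrdersForAccount(Session session, int tradingAccount) {
    Map<String, String> orders = new HashMap<>();
    // In real code, prepare this statement once and reuse it.
    PreparedStatement ps = session.prepare(
            "SELECT cliordid, ordermsg FROM omsks_v1.ordersstringv1 "
          + "WHERE tradacntid = ? AND cliordid > ? LIMIT ?");
    String lastSeen = "";              // cliordid is text, so start below any real value
    while (true) {
        ResultSet rs = session.execute(ps.bind(tradingAccount, lastSeen, PAGE_LIMIT));
        int rowsInPage = 0;
        for (Row row : rs) {
            lastSeen = row.getString("cliordid");
            orders.put(lastSeen, row.getString("ordermsg"));
            rowsInPage++;
        }
        if (rowsInPage < PAGE_LIMIT) {
            break;                     // last page reached
        }
    }
    return orders;
}

Each query then only touches a bounded slice of the partition instead of the full 40,000+ rows at once.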
You can also try reducing the fetch size, although it will still return the whole resultset: call public Statement setFetchSize(int fetchSize) and check whether the exception is still thrown.
setFetchSize controls the page size, but it doesn't control the maximum rows returned in a ResultSet.
Another point to note: what is the size of tradigAccountList?
Too many requests at a time may also lead to a timeout. A large tradigAccountList means a lot of read requests are issued at once (load balancing of requests is handled by Cassandra, and how many requests can be handled depends on the cluster size and some other factors), which may cause this exception.
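One way (not mentioned in the original answer) to keep from firing every async read at once is to bound the number of in-flight requests, sketched here with a java.util.concurrent.Semaphore and Guava's MoreExecutors; the limit of 64 is illustrative:

// Illustrative throttle: allow at most MAX_IN_FLIGHT async reads at a time.
private static final int MAX_IN_FLIGHT = 64; // assumed limit, tune for your cluster

public void loadThrottled(Session session, PreparedStatement statement,
                          List<Integer> accounts, Map<String, String> orderListMap)
        throws InterruptedException {
    Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
    List<ResultSetFuture> futures = new ArrayList<>();
    for (Integer account : accounts) {
        inFlight.acquire();      // block if too many requests are already pending
        ResultSetFuture future = session.executeAsync(statement.bind(account).setFetchSize(3000));
        future.addListener(inFlight::release, MoreExecutors.directExecutor());
        futures.add(future);
    }
    for (ResultSetFuture future : futures) {
        for (Row row : future.getUninterruptibly()) {
            orderListMap.put(row.getString("cliordid"), row.getString("ordermsg"));
        }
    }
}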
Some related Links:
Cassandra read timeout
NoHostAvailableException With Cassandra & DataStax Java Driver If Large ResultSet
Cassandra .setFetchSize() on statement is not honoured
I have a loop with heavy memory use and database access against Oracle.
int firstResult = 0;
int maxResult = 500;
int targetTotal = 8000; // more or less
int phase = 1;
for (int i = 0; i <= targetTotal; i += maxResult) {
    try {
        Session session = .... init hibernate session ...
        // Start Transaction
        List<Accounts> importableInvAcList = ...getting list using session and firstResult-maxResult...
        List<ContractData> dataList = new ArrayList<>();
        List<ErrorData> errorDataList = new ArrayList<>();
        for (Accounts account : importableInvAcList) {
            ... Converting 500 Accounts objects to ContractData objects ...
            ... along with 5 more database calls using the existing session ...
            ... On converting the objects we generate thousands of ErrorData ...
            dataList.add(.. converted account to ContractData ..);
            errorDataList.add(.. generated error data ..);
        }
        dataList.stream().forEach(session::save);      // 500 records
        errorDataList.stream().forEach(session::save); // 5,000-10,000 records
        ... Commit Transaction ...
        phase++;
    } catch (Exception e) {
        return;
    }
}
The exception comes out on the second phase (2nd loop). Sometimes it comes out in the 3rd or 5th phase.
I also checked the runtime memory:
Runtime runtime = Runtime.getRuntime();
long total = runtime.totalMemory();
long free = runtime.freeMemory();
long used = total - free;
long max = runtime.maxMemory();
And in the second phase the status was, for example:
Used: 1022 MB, Free: 313 MB, Total Allocated: 1335 MB
Stack Trace is here...
org.hibernate.exception.GenericJDBCException: Cannot open connection
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:140)
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:128)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:52)
at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:449)
at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:167)
at org.hibernate.jdbc.JDBCContext.connection(JDBCContext.java:142)
at org.hibernate.transaction.JDBCTransaction.begin(JDBCTransaction.java:85)
at org.hibernate.impl.SessionImpl.beginTransaction(SessionImpl.java:1463)
at ibbl.remote.tx.TxSessionImpl.beginTx(TxSessionImpl.java:41)
at ibbl.remote.tx.TxController.initPersistence(TxController.java:70)
at com.ibbl.data.util.CDExporter2.run(CDExporter2.java:130)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Listener refused the connection with the following error:
ORA-12518, TNS:listener could not hand off client connection
Note that this process runs in a thread, and there are 3 similar threads running at a time.
Why does this exception show up after the loop has been running for a while?
there are 3 similar Thread running at a time.
If your code creates a total of 3 Threads, then, optimally, you need only 3 Oracle Connections. Create all of them before any Thread is created. Create the Threads, assign each Thread a Connection, then start the Threads.
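A minimal sketch of that arrangement, assuming plain JDBC with DriverManager (the original code uses Hibernate, so this is only illustrative of creating the connections up front and handing one to each thread; the URL and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class FixedConnectionRunner {
    public static void main(String[] args) throws SQLException, InterruptedException {
        int threadCount = 3;
        Connection[] connections = new Connection[threadCount];
        // Open all connections before any worker thread starts.
        for (int i = 0; i < threadCount; i++) {
            connections[i] = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/SERVICE", "user", "password"); // placeholders
        }
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final Connection conn = connections[i];
            workers[i] = new Thread(() -> {
                // ... run the export phases for this worker using the single connection 'conn' ...
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        for (Connection conn : connections) {
            conn.close();
        }
    }
}

The point is that the listener only ever sees 3 connection requests, made before the heavy work begins, instead of a new connection per phase per thread.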
Chances are good, though, that your code is consuming resources far too aggressively on whatever machine is hosting it. Even if you eliminate the ORA-12518, the RDBMS server may "go south". By "go south", I mean that if your application consumes too many resources, the machine hosting it or the machine hosting the RDBMS server may "panic" or do something equally dreadful.
I'm wondering if anyone has experienced the same problem.
We have a Vert.x application whose ultimate purpose is to insert 600 million rows into a Cassandra cluster. We are testing the speed of Vert.x in combination with Cassandra by doing tests with smaller amounts.
If we run the fat jar (build with Shade plugin) without the -cluster option, we are able to insert 10 million records in about a minute. When we add the -cluster option (eventually we will run the Vert.x application in cluster) it takes about 5 minutes for 10 million records to insert.
Does anyone know why?
We know that the Hazelcast config will create some overhead, but we never thought it would be 5 times slower. This implies we will need 5 EC2 instances in the cluster to get the same result as 1 EC2 instance without the cluster option.
As mentioned, everything runs on EC2 instances:
2 Cassandra servers on t2.small
1 Vert.x server on t2.2xlarge
You are actually running into corner cases of the Vert.x Hazelcast Cluster manager.
First of all, you are using a worker verticle to send your messages (30000001). Under the hood Hazelcast is blocking, and when you send a message from a worker, version 3.3.3 does not take that into account. We recently added this fix https://github.com/vert-x3/issues/issues/75 (not present in 3.4.0.Beta1 but present in 3.4.0-SNAPSHOTS) that will improve this case.
Second, when you send all your messages at the same time, you run into another corner case that prevents the Hazelcast cluster manager from using a cache of the cluster topology. This topology cache is usually updated after the first message has been sent, and sending all the messages in one shot prevents the use of the cache (short explanation: HazelcastAsyncMultiMap#getInProgressCount will be > 0 and prevents the cache from being used), hence paying the penalty of an expensive lookup (hence the cache).
If I use Bertjan's reproducer with 3.4.0-SNAPSHOT + Hazelcast and the following change (send one message to the destination, wait for the reply, and only then send all the remaining messages), I get a lot of improvement:
Without clustering: 5852 ms
With clustering with HZ 3.3.3: 16745 ms
With clustering with HZ 3.4.0-SNAPSHOT + initial message: 8609 ms
I also believe you should not use a worker verticle to send that many messages; instead, send them from an event-loop verticle in batches (see the sketch below). Perhaps you should explain your use case, and we can think about the best way to solve it.
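A rough sketch of batched sending from a standard (event-loop) verticle, scheduling the next batch with runOnContext so the event loop is never blocked; the address, total and TestCluster1 class come from the question's code, the batch size and structure are illustrative:

import java.time.LocalDateTime;

import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.Json;

public class BatchedProviderVerticle extends AbstractVerticle {

    private static final int TOTAL = 30_000_000;
    private static final int BATCH_SIZE = 10_000; // assumed batch size

    @Override
    public void start() {
        sendBatch(1);
    }

    private void sendBatch(int from) {
        int to = Math.min(from + BATCH_SIZE, TOTAL + 1);
        for (int i = from; i < to; i++) {
            vertx.eventBus().send("clustertest1",
                    Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
        }
        if (to <= TOTAL) {
            // Yield back to the event loop before queueing the next batch.
            vertx.runOnContext(v -> sendBatch(to));
        }
    }
}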
When you enable clustering (of any kind) in an application, you make it more resilient to failures, but you also add a performance penalty.
For example your current flow (without clustering) is something like:
client ->
vert.x app ->
in memory same process eventbus (negligible) ->
handler -> cassandra
<- vert.x app
<- client
Once you enable clustering:
client ->
vert.x app ->
serialize request ->
network request cluster member ->
deserialize request ->
handler -> cassandra
<- serialize response
<- network reply
<- deserialize response
<- vert.x app
<- client
As you can see, there are many encode/decode operations required, plus several network calls, and this all gets added to your total request time.
In order to achieve the best performance you need to take advantage of locality: the closer you are to your data store, the faster it usually is.
Just to add the code of the project. I guess that would help.
Sender verticle:
public class ProviderVerticle extends AbstractVerticle {

    @Override
    public void start() throws Exception {
        IntStream.range(1, 30000001).parallel().forEach(i -> {
            vertx.eventBus().send("clustertest1", Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
    }
}
And the inserter verticle
public class ReceiverVerticle extends AbstractVerticle {

    private int messagesReceived = 1;
    private Session cassandraSession;

    @Override
    public void start() throws Exception {
        PoolingOptions poolingOptions = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 3)
                .setCoreConnectionsPerHost(HostDistance.REMOTE, 1)
                .setMaxConnectionsPerHost(HostDistance.REMOTE, 3)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 20)
                .setMaxQueueSize(32768)
                .setMaxRequestsPerConnection(HostDistance.REMOTE, 20);

        Cluster cluster = Cluster.builder()
                .withPoolingOptions(poolingOptions)
                .addContactPoints(ClusterSetup.SEEDS)
                .build();

        System.out.println("Connecting session");
        cassandraSession = cluster.connect("kiespees");
        System.out.println("Session connected:\n\tcluster [" + cassandraSession.getCluster().getClusterName() + "]");
        System.out.println("Connected hosts: ");
        cassandraSession.getState().getConnectedHosts().forEach(host -> System.out.println(host.getAddress()));

        PreparedStatement prepared = cassandraSession.prepare(
                "insert into clustertest1 (id, value, created) " +
                "values (:id, :value, :created)");
        PreparedStatement preparedTimer = cassandraSession.prepare(
                "insert into timer (name, created_on, amount) " +
                "values (:name, :createdOn, :amount)");

        BoundStatement timerStart = preparedTimer.bind()
                .setString("name", "clusterteststart")
                .setInt("amount", 0)
                .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
        cassandraSession.executeAsync(timerStart);

        EventBus bus = vertx.eventBus();
        System.out.println("Bus info: " + bus.toString());
        MessageConsumer<String> cons = bus.consumer("clustertest1");
        System.out.println("Consumer info: " + cons.address());
        System.out.println("Waiting for messages");

        cons.handler(message -> {
            TestCluster1 tc = Json.decodeValue(message.body(), TestCluster1.class);
            if (messagesReceived % 100000 == 0)
                System.out.println("Message received: " + messagesReceived);

            BoundStatement boundRecord = prepared.bind()
                    .setInt("id", tc.getId())
                    .setString("value", tc.getValue())
                    .setTimestamp("created", new Timestamp(new Date().getTime()));
            cassandraSession.executeAsync(boundRecord);

            if (messagesReceived % 100000 == 0) {
                BoundStatement timerStop = preparedTimer.bind()
                        .setString("name", "clusterteststop")
                        .setInt("amount", messagesReceived)
                        .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
                cassandraSession.executeAsync(timerStop);
            }

            messagesReceived++;
            //message.reply("OK");
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
        cassandraSession.close();
    }
}
I have created a Quartz job which runs in the background in my JBoss server and is responsible for updating some statistical data at regular intervals (coupled with some database flags).
To load and persist the data I am using Hibernate 4. Everything works fine except one hiccup.
The entire thread, i.e. the job, is wrapped in a single transaction, which over time (as the amount of data increases) becomes huge and worrisome. I am trying to break this single large transaction into multiple smaller ones, such that each transaction processes only a subgroup of the data.
Problem: I tried, rather lamely, to wrap the code in a loop and start/end a transaction at the start/end of the loop. As I expected, it didn't work. I have been looking around various forums for a solution but have not come across anything that indicates how to manage multiple transactions in a single session (where only one transaction will be active at a time).
I am relatively new to Hibernate and would appreciate any help that points me in a direction for achieving this.
Update: adding code to demonstrate what I am trying to achieve and what I mean by breaking into multiple transactions, plus the stack trace when this is executed.
log.info("Starting Calculation Job.");
List<GroupModel> groups = Collections.emptyList();
DAOFactory hibDaoFactory = null;
try {
hibDaoFactory = DAOFactory.hibernate();
hibDaoFactory.beginTransaction();
OrganizationDao groupDao = hibDaoFactory.getGroupDao();
groups = groupDao.findAll();
hibDaoFactory.commitTransaction();
} catch (Exception ex) {
hibDaoFactory.rollbackTransaction();
log.error("Error in transaction", ex);
}
try {
hibDaoFactory = DAOFactory.hibernate();
StatsDao statsDao = hibDaoFactory.getStatsDao();
StatsScaledValuesDao statsScaledDao = hibDaoFactory.getStatsScaledValuesDao();
for (GroupModel grp : groups) {
try {
hibDaoFactory.beginTransaction();
log.info("Performing computation for Group " + grp.getName() + " ["
+ grp.getId() + "]");
List<Stats> statsDetail = statsDao.loadStatsGroup(grp.getId());
// Coputing Steps here
for (Entry origEntry : statsEntries) {
entry.setCalculatedItem1(origEntry.getCalculatedItem1());
entry.setCalculatedItem2(origEntry.getCalculatedItem2());
entry.setCalculatedItem3(origEntry.getCalculatedItem3());
StatsDetailsScaledValues scValues = entry.getScaledValues();
if (scValues == null) {
scValues = new StatsDetailsScaledValues();
scValues.setId(origEntry.getScrEntryId());
scValues.setValues(origEntry.getScaledValues());
} else {
scValues.setValues(origEntry.getScaledValues());
}
statsScaledDao.makePersistent(scValues);
}
hibDaoFactory.commitTransaction();
} catch (Exception ex) {
hibDaoFactory.rollbackTransaction();
log.error("Error in transaction", ex);
} finally {
}
}
} catch (Exception ex) {
log.error("Error", ex);
} finally {
}
log.info("Job Complete.");
The following is the exception stack trace I get upon execution of this job:
org.hibernate.SessionException: Session is closed!
at org.hibernate.internal.AbstractSessionImpl.errorIfClosed(AbstractSessionImpl.java:127)
at org.hibernate.internal.SessionImpl.createCriteria(SessionImpl.java:1555)
at sun.reflect.GeneratedMethodAccessor469.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.hibernate.context.internal.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:352)
at $Proxy308.createCriteria(Unknown Source)
at com.blueoptima.cs.dao.impl.hibernate.GenericHibernateDao.findByCriteria(GenericHibernateDao.java:132)
at com.blueoptima.cs.dao.impl.hibernate.ScrStatsManagementHibernateDao.loadStatsEntriesForOrg(ScrStatsManagementHibernateDao.java:22)
... 3 more
To my understanding from what I have read so far about Hibernate sessions and transactions, it seems that when a session is created it is attached to the thread and lives throughout the thread's life, or until commit or rollback is called. Thus, when the first transaction is committed the session is closed and is unavailable for the rest of the thread's life.
My question remains: how can we have multiple transactions in a single session?
More detail and some examples would be great, but I think I should be able to help with what you have written here.
Have one static SessionFactory (this is big on memory).
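A minimal sketch of such a static SessionFactory holder, assuming a hibernate.cfg.xml on the classpath; the class name HibernateUtil is illustrative, not from the original post:

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateUtil {

    // Built once for the whole application; the SessionFactory is the heavyweight object.
    // buildSessionFactory() without a ServiceRegistry is deprecated in Hibernate 4 but still works.
    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory();

    public static SessionFactory getSessionFactory() {
        return SESSION_FACTORY;
    }
}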
Also, with your transactions you want something like this:
SomeClass object = new SomeClass();
Session session = sessionFactory.openSession(); // create the session object
session.beginTransaction();                     // begins the transaction
session.save(object);                           // saves the object, BUT REMEMBER it isn't persisted until commit
session.getTransaction().commit();              // actually persists the object
session.close();                                // closes the session
This is how I use my transactions; I am not sure if I run as many transactions at a time as you do, but the Session object is lightweight in memory compared to the SessionFactory.
If you want to save more objects at a time, you can do it in one transaction, for example:
SomeClass object1 = new SomeClass();
SomeClass object2 = new SomeClass();
SomeClass object3 = new SomeClass();

Session session = sessionFactory.openSession();
session.beginTransaction();
session.save(object1);
session.save(object2);
session.save(object3);
session.getTransaction().commit(); // when commit is called it will save all 3 objects
session.close();
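And, to the original question of several transactions in one session: with a manually opened session (rather than the thread-bound getCurrentSession() pattern, which, as your stack trace's ThreadLocalSessionContext shows, closes the session on commit), you can begin and commit transactions back to back on the same Session. A rough sketch, using the per-group loop from your job with the computation elided:

Session session = sessionFactory.openSession(); // one session for the whole job
try {
    for (GroupModel grp : groups) {
        session.beginTransaction();
        // ... load, compute and save this group's data using this session ...
        session.getTransaction().commit();      // commits this group's work; the session stays open
        session.clear();                        // optional: drop the first-level cache between groups
    }
} finally {
    session.close();                            // closed once, after all transactions are done
}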
Hope this helps in some way or points you in the right direction.
I think you could configure your program to condense transactions as well. :)
Edit
Here is a great YouTube tutorial; this guy really broke it down for me.
Hibernate Tutorials