Increment times out; set always succeeds after retry - java

I'm getting strange behavior in memcached, in particular, behavior that is strange in its consistency. Here is my test:
@Test
public void testMemc() {
    logger.info("Setting head.");
    memc.set(env.memcachedQueueKeys().head, 3600, 0);
    logger.info("Set head; incrementing.");
    memc.incr(env.memcachedQueueKeys().head, 1);
    logger.info("Incremented.");
}
And here is the output:
28 11:04:52.932 INFO; Setting head.
2014-01-28 11:04:52.933 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for q:unittest:scannedemails:w.
28 11:04:52.933 INFO; Set head; incrementing.
2014-01-28 11:04:52.935 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for q:unittest:scannedemails:w.
FAILED: testMemc
net.spy.memcached.OperationTimeoutException: Mutate operation timed out,unable to modify counter [q:unittest:scannedemails:w]
at net.spy.memcached.MemcachedClient.mutate(MemcachedClient.java:1484)
at net.spy.memcached.MemcachedClient.incr(MemcachedClient.java:1529)
at me.unroll.emailroller.ActOnScanResultsTest.testMemc(ActOnScanResultsTest.java:295)
Most of my intuition for this kind of error fails me here. The following things are all strange:
Why does it always fail exactly once to set?
Why does it permanently fail to increment after seeming to succeed at set?
This is on a high-load server (yes, it's a little wrong to be running a test on a load-bearing server, but if it catches issues like this there's at least some advantage). What can cause this consistent failure? There is only one node.

The problem was that I couldn't connect to memcached at all. This is a bug in spymemcached: the set operation did not throw an exception even though there was no memcached server available to perform the set against.
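One way to make that failure visible is to wait on the future that spymemcached returns from set() before incrementing. A minimal sketch (not the original test; the 5-second wait is an arbitrary choice):

// spymemcached's set() is asynchronous and returns an OperationFuture<Boolean>,
// so a dead connection only surfaces later (here, as the incr timeout) unless
// you block on the future and check its result.
// Needs net.spy.memcached.internal.OperationFuture and java.util.concurrent.TimeUnit.
OperationFuture<Boolean> setResult = memc.set(env.memcachedQueueKeys().head, 3600, 0);
if (!setResult.get(5, TimeUnit.SECONDS)) {
    throw new IllegalStateException("set failed - is memcached reachable?");
}
memc.incr(env.memcachedQueueKeys().head, 1);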

Related

HDFS: namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal

On one of our platforms, the HDFS NameNode shuts down with the following error message every 1 to 3 days:
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<ip1>:<port>,<ip2>:<port>, etc], stream=QuorumOutputStream starting at txid 29873171))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
at java.lang.Thread.run(Thread.java:748)
Before this FATAL log we see log lines like the following, which show a degradation in response time:
WARN client.QuorumJournalManager (QuorumCall.java:waitFor(185)) - Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [<ip1>:<port>,<ip2>:<port>]
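(For context, the 20000 ms in these messages is the QJM write timeout, which is configurable in hdfs-site.xml; the property below is what I understand to be the relevant setting, and raising it would only mask the slow journal writes rather than fix them.)

<!-- hdfs-site.xml: assumed mitigation; lengthens the quorum write timeout behind
     "Timed out waiting 20000ms for a quorum of nodes to respond." Default is 20000 ms. -->
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>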
Have you already encountered this problem, and do you have any advice on how to fix it?
We have already:
checked that our VMs are time synchronized
observed that a burst of network traffic is in progress when the problem occurs, although we have not yet found the root cause
checked our network devices; apart from one port that flaps quickly between UP and DOWN (which we are going to fix), the network seems correct
Thanks in advance

javax.ejb.EJBTransactionRolledbackException: Transaction rolled back

(I didn't find a proper answer among the existing questions, which is why I'm posting this.)
I have an application which processes quite a large amount of data. I am getting the errors/exceptions below and the process gets terminated. Increasing the RAM size resolves the issue, but we can't do that because of a restriction.
2019-02-11 14:02:59,662 ERROR [net.xxx.RuleHandler] (Thread-185232 (HornetQ-client-global-threads-1521150484)) failed to get rules from db: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
at org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl.processAnyDelayedAfterCompletion(SynchronizationCallbackCoordinatorTrackingImpl.java:105) [hibernate-core-4.2.27.Final-redhat-1.jar:4.2.27.Final-redhat-1]
2019-02-11 14:02:59,693 ERROR [net.xxx.ejb.SearchReqMDB] (Thread-185232 (HornetQ-client-global-threads-1521150484)) Something failed while search indexing: net.xxx.xxx.JMPException: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
How can I resolve this issue? Do I need to increase the timeout value for the EJB call, and if so, where do I change it?
Using JBoss EAP 6, EJB & JPA.
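If the transaction timeout does turn out to be the cause, one place it can be raised in JBoss EAP 6 is per bean, with the JBoss-specific @TransactionTimeout annotation (the server-wide default lives in the transactions subsystem of standalone.xml). This is only a sketch under that assumption; the listener body is a placeholder, not the real code:

// Assumes the rollback comes from the MDB's transaction timing out during the
// long-running indexing work. Needs jboss-ejb3-ext-api on the classpath.
import java.util.concurrent.TimeUnit;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import org.jboss.ejb3.annotation.TransactionTimeout;

@MessageDriven
@TransactionTimeout(value = 30, unit = TimeUnit.MINUTES) // per-bean transaction timeout
public class SearchReqMDB implements MessageListener {
    public void onMessage(Message message) {
        // long-running search-indexing work now runs inside a 30-minute transaction
    }
}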

Cassandra - Strange behaviour with one node

I have a cluster with 3 nodes in my development environment, with a keyspace using a replication factor of 2. Originally I had only one node in this cluster, but then I added 2 more nodes, one by one. The Cassandra version is 3.7.
All these nodes are "clones" so I just modified the cassandra.yaml with the corresponding IP for every node.
I've done a repair and cleanup on every node, and in my application I use consistency level ONE.
This is the nodetool status output:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.132.0.4 50.54 GiB 256 70.2% 50dc5baf-b8b3-4e19-8173-cf828afd36af rack1
UN 10.132.0.3 50.31 GiB 256 65.3% 2a45b7a5-41ce-4533-ba63-60fd3c5cc530 rack1
UN 10.132.0.9 33.88 GiB 256 64.5% e601fb16-6608-4e72-a820-dd4661977946 rack1
In cassandra.yaml I have only 10.132.0.3 as the seed node.
So at this point everything works as expected. If I take down one node, everything keeps running "fine", unless that node is 10.132.0.9; if I take down this "bad" node, everything crashes with the following error:
org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM
When I stop the bad node, the good ones show this error in their system.log files (I'm only copying the error, not the entire stack trace):
ERROR [SharedPool-Worker-1] 2018-02-27 10:59:16,449 QueryMessage.java:128 - Unexpected error during query
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM
I don't understand what's wrong with this node and I can't find a solution...
Edit
My connection code:
cluster_builder = Cluster.builder()
.addContactPoints(serverIP.getCassandraList(sysvar))
.withAuthProvider(new PlainTextAuthProvider(serverIP.getCassandraUser(sysvar), serverIP.getCassandraPwd(sysvar))).withPoolingOptions(poolingOptions)
.withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE));
cluster = cluster_builder.build();
session = cluster.connect(keyspace);
My query:
statement = QueryBuilder.insertInto(keyspace, "measurement_minute").values(this.normal_names, (List<Object>) values);
And the execution:
ResultSetFuture future = session.executeAsync(statement.setConsistencyLevel(ConsistencyLevel.ONE));
I want to mention that I restarted, repaired and cleaned up all the nodes.
You are requesting QUORUM with a replication factor of 2. This won't really work well, because with RF=2 a quorum is effectively ALL. For a quorum, a majority of the replicas need to respond to your query.
You can calculate the replica count needed for a quorum as (RF/2)+1 (using integer arithmetic), so RF=2 gives (2/2)+1=2: you need both of your replicas and can't have one down. The reason some queries still work is that they don't use 10.132.0.9.
You can go with a replication factor of RF=3, or use CL.ONE, for example.
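If you take the RF=3 route, a minimal sketch of the change through the same driver session might look like this (the keyspace name is a placeholder, and a repair is needed afterwards so the new replicas actually receive their data):

// Raise the replication factor so a QUORUM (2 of 3) still succeeds with one node down.
// "mykeyspace" is a placeholder for the real keyspace name.
session.execute(
    "ALTER KEYSPACE mykeyspace WITH replication = "
    + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
// Then, on each node, stream the new replicas into place:
//   nodetool repair -full mykeyspace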

Unable to get resource from jedis

After running my application, I am getting this error after around 5 minutes.
Even though I am returning the resource after use, I keep getting this.
I have built jedis-2.2.2-SNAPSHOT.jar from the jedis code base, since it's not released yet.
I had set minIdle=100, maxIdle=200 & maxActive=200. At the time of this exception, the connection count to redis from my application was 122.
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:42)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:442)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:360)
at redis.clients.util.Pool.getResource(Pool.java:40)
... 6 more
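For reference, the pool setup and the return-after-use pattern described above look roughly like this (only the three pool values come from my description; the host, port and placeholder operation are assumptions, and with commons-pool2, which this jedis build uses, maxActive is configured via setMaxTotal):

// Needs redis.clients.jedis.Jedis, JedisPool and JedisPoolConfig.
JedisPoolConfig config = new JedisPoolConfig();
config.setMinIdle(100);
config.setMaxIdle(200);
config.setMaxTotal(200);   // "maxActive" in older commons-pool terminology
JedisPool pool = new JedisPool(config, "localhost", 6379);  // host/port assumed

Jedis jedis = pool.getResource();
try {
    jedis.get("some-key");          // placeholder operation
} finally {
    pool.returnResource(jedis);     // resource is returned after use
}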
Did you check that redis is still up and running?
If not, investigate why it died.
Try redis-cli in a terminal if you can; the "info" command would give you more details.

Java Quartz Ibatis Cron Issues

I have a java webapp using an ibatis row handler to load a very large dataset (1 million rows in an innodb table). The process is run as a nightly cron job by quartz scheduler. However, after it processes for 6 minutes, it dies with the following stack trace:
WARN [DefaultQuartzScheduler_Worker-8] MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(168) | Could not invoke method 'doBatch' on target object [org.myCron#4adb34]
org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation: encountered SQLException [
--- The error occurred in org/myCron/mySqlMap.xml.
--- The error occurred while applying a result map.
--- Check the mySqlMap.outputMapping.
--- The error happened while setting a property on the result object.
--- Cause: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.io.EOFException
STACKTRACE:
java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1903)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2402)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2860)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6106)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:168)
at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.ibatis.common.jdbc.logging.ResultSetLogProxy.invoke(ResultSetLogProxy.java:47)
at $Proxy10.next(Unknown Source)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:380)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:301)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:190)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithRowHandler(GeneralStatement.java:133)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryWithRowHandler(SqlMapExecutorDelegate.java:649)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryWithRowHandler(SqlMapSessionImpl.java:156)
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryWithRowHandler(SqlMapClientImpl.java:133)
at org.springframework.orm.ibatis.SqlMapClientTemplate$5.doInSqlMapClient(SqlMapClientTemplate.java:267)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:165)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryWithRowHandler(SqlMapClientTemplate.java:265)
at org.myCron.doBatch(MyCron.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:248)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:165)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:66)
at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
** END NESTED EXCEPTION **
The stack trace is very vague. The only hints that I see are 'the error happened while setting a property on the result object'. There are only two properties on the result object: a String and an Integer. Both of them permit null values, but my select statements indicate that neither of them has any null values. They both have a proper getter/setter (which makes sense, since the process runs for a while successfully before dying). Every time the cron runs, it dies at a random point (so it isn't stuck on a particular row).
Note - The method 'doBatch' does exist since that is the method that starts the cron process. If it couldn't find doBatch, it couldn't successfully process the first thousand rows.
I've also tried running the job outside of Quartz and it fails there as well. We tried increasing our MySQL net_read_timeout, net_write_timeout, and delayed_insert_timeout, but none of these settings helped with the problem. I also tried setting my log4j level to DEBUG and did not get any helpful info.
Any other ideas about what I could try?
Sounds like MySQL closed the connection for some reason. Check the MySQL log to see if anything shows up. Turn on various logging options for MySQL if necessary.
Also, start printing debug data (including timestamps) from your app: just print everything, then see what the last action was. Perhaps you have some rarely triggered condition in your code that has a bug.
I.e., every single time you talk to MySQL, log it before AND after.
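A minimal sketch of that "log before AND after" advice, assuming a Spring SqlMapClientTemplate and a row handler like the one the cron job already uses (the statement name, logger and field names are placeholders, not the original code):

// Logs a timestamped line before the query, every 10,000 rows while streaming,
// and again after the query completes, so the last line printed brackets the failure.
// Assumes fields: org.apache.log4j.Logger logger and
// org.springframework.orm.ibatis.SqlMapClientTemplate sqlMapClientTemplate.
public void doBatch() {
    logger.debug("queryWithRowHandler starting at " + new java.util.Date());
    sqlMapClientTemplate.queryWithRowHandler("outputMapping.selectAll",
        new com.ibatis.sqlmap.client.event.RowHandler() {
            private long rows = 0;
            public void handleRow(Object row) {
                rows++;
                if (rows % 10000 == 0) {
                    logger.debug(rows + " rows handled at " + new java.util.Date());
                }
                // ... existing per-row processing goes here ...
            }
        });
    logger.debug("queryWithRowHandler finished at " + new java.util.Date());
}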
