Cassandra Downed Host Retry (Hector)

Cassandra Downed Host Retry (Hector) - java

Can someone assist me in understanding and troubleshooting this issue? I do not know what is causing Hector to fail when it tries to connect to the Cassandra cluster.
How can I find out where the issue is?
0 [main] INFO me.prettyprint.cassandra.connection.CassandraHostRetryService - Downed Host Retry service started with queue size 10 and retry delay 30s
168 [main] INFO me.prettyprint.cassandra.service.JmxMonitor - Registering JMX me.prettyprint.cassandra.service_keyspace-name:ServiceType=hector,MonitorType=hector
399 [main] INFO me.prettyprint.cassandra.model.ConfigurableConsistencyLevel - READ ConsistencyLevel set to QUORUM for ColumnFamily Files
400 [main] INFO me.prettyprint.cassandra.model.ConfigurableConsistencyLevel - WRITE ConsistencyLevel set to QUORUM for ColumnFamily Files
406 [main] INFO me.prettyprint.cassandra.model.ConfigurableConsistencyLevel - READ ConsistencyLevel set to QUORUM for ColumnFamily FileList
407 [main] INFO me.prettyprint.cassandra.model.ConfigurableConsistencyLevel - WRITE ConsistencyLevel set to QUORUM for ColumnFamily FileList

From the trace it seems you are using QUORUM as consistency level. Try to use ONE and see if it works. It seems that one or more of the nodes that should satisfy your request are down. Use noderool ring/status or see if any node is down in your cluster.

Related

Elastic APM Java - Transactions & Spans are recorded but not reported to Elastic APM Server or Kibana

I have a standalone JAVA application.
And have integrated it successfully with Elastic APM (+ElasticSearch +Kibana) for capturing telemetries.
Java Version: 8 - OpenJDK
Elastic Agent & Library Version: 1.16
Elastic Search, APM and Kibana Version: 7.7.1
Below are the relevant JVM Options being used:
JAVA_OPTS="$JAVA_OPTS -javaagent:$BASE_HOME/agent-lib/elastic-apm-agent-1.16.0.jar -Delastic.apm.service_name=my-app -Delastic.apm.server_urls=http://elastic-apm-server:8200"
JAVA_OPTS="$JAVA_OPTS -Delastic.apm.application_packages=com,org -Delastic.apm.span_frames_min_duration=-1ms"
JAVA_OPTS="$JAVA_OPTS -Delastic.apm.log_file=$BASE_HOME/logs/apm.log -Delastic.apm.log_level=DEBUG"
I am generating custom transactions and spans using the Tracer/Transaction/Span APIs as suggested in the official documentation.
And as per the generated debug logs. These spans and transactions are getting captured as expected.
I have validated the same by DEBUGGING it over the IDE, that transactions are being captured as expected.
Problem: The custom transactions are not shown on the Kibana APM Dashboard
However some out of the box transactions from Quartz(which is being used in the application) are shown as expected. Which should mean the integration with the Elastic APM Server is fine.
It appears to me, even though the transactions are being captured successfully, those are not reported(sent) to the APM Server
Refer some relevant apm logs:
2020-07-01 12:33:09.569 [pool-1-thread-1] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - startTransaction '' 00-d0025079170e4f03698702f4e68be4ac-cf792454fbef1c77-01 (16970dc3) {
2020-07-01 12:33:09.569 [pool-1-thread-1] DEBUG co.elastic.apm.agent.impl.ElasticApmTracer - Activating 'ExtractionRequestHandler#invokeExtraction' 00-d0025079170e4f03698702f4e68be4ac-cf792454fbef1c77-01 (16970dc3) on thread 26
2020-07-01 12:33:09.569 [pool-1-thread-1] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - increment references to 'ExtractionRequestHandler#invokeExtraction' 00-d0025079170e4f03698702f4e68be4ac-cf792454fbef1c77-01 (16970dc3) (2)
2020-07-01 12:33:09.569 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Receiving SPAN event (sequence 86)
2020-07-01 12:33:09.570 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'ExtractionRequestHandler#invokeExtraction' 00-98a1d8f4970d585915eb03a414b7b14c-994dd2823198f1ef-01 (33d448b5) (4)
2020-07-01 12:33:09.570 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'BOpFileUtils#authorizeFilePath' 00-98a1d8f4970d585915eb03a414b7b14c-133200d1793fbaab-01 (67fba8aa) (0)
2020-07-01 12:33:09.570 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Receiving SPAN event (sequence 87)
2020-07-01 12:33:09.570 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'ExtractionRequestHandler#invokeExtraction' 00-98a1d8f4970d585915eb03a414b7b14c-994dd2823198f1ef-01 (33d448b5) (3)
2020-07-01 12:33:09.570 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'SCR#init' 00-98a1d8f4970d585915eb03a414b7b14c-77cf207c33eb24ab-01 (2f1f25c3) (0)
Need help in finding what I am doing wrong? And how to fix it?

I got the answer after posting the same on Elastic Support Forum.
It was a very prompt response.
This was not a problem from Elastic APM side, and was more of a silly problem from my side.
Refer the discussion to find the problem and solution.

Error UNKNOWN_MEMBER_ID occurred while committing offsets for group xxx

With Kafka client Java library, consuming logs has worked for some time but with the following errors it doesn't work any more:
2016-07-15 19:37:54.609 INFO 4342 --- [main] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 2147483647 dead.
2016-07-15 19:37:54.933 ERROR 4342 --- [main] o.a.k.c.c.internals.ConsumerCoordinator : Error UNKNOWN_MEMBER_ID occurred while committing offsets for group logstash
2016-07-15 19:37:54.933 WARN 4342 --- [main] o.a.k.c.c.internals.ConsumerCoordinator : Auto offset commit failed: Commit cannot be completed due to group rebalance
2016-07-15 19:37:54.941 ERROR 4342 --- [main] o.a.k.c.c.internals.ConsumerCoordinator : Error UNKNOWN_MEMBER_ID occurred while committing offsets for group logstash
2016-07-15 19:37:54.941 WARN 4342 --- [main] o.a.k.c.c.internals.ConsumerCoordinator : Auto offset commit failed:
2016-07-15 19:37:54.948 INFO 4342 --- [main] o.a.k.c.c.internals.AbstractCoordinator : Attempt to join group logstash failed due to unknown member id, resetting and retrying.
It keeps resetting.
Running another instance of the same application gets errors immediately.
I suspect Kafka or its ZooKeeper has a problem but there's no error log.
Any one who has idea on what's going on here?
This is the application I'm using: https://github.com/izeye/log-redirector

I just faced the same issue. I have been investigating, and in this thread and in this wiki you can find the solution.
The issue seems to be that the processing of a batch takes longer than the session timeout.
Either increase the session timeout or the polling frequency or limit the number of bytes received.
What worked for me was changing max.partition.fetch.bytes. But you can also modify session.timeout.ms or the value you pass to your consumer.poll(TIMEOUT)

Slow connection to Cassandra

I have trouble connecting to my 3-node cassandra cluster via Datastax PHP- and Java-Driver.
Especially for the PHP driver it is crucial that i can connect fast to improve loading times of my website.
How can i debug this or what is the reason?
Java output shows this:
09:59:40,284 [main] DEBUG - com.datastax.driver.NEW_NODE_DELAY_SECONDS is undefined, using default value 1
09:59:40,284 [main] DEBUG - com.datastax.driver.NON_BLOCKING_EXECUTOR_SIZE is undefined, using default value 4
09:59:40,297 [main] DEBUG - com.datastax.driver.NOTIF_LOCK_TIMEOUT_SECONDS is undefined, using default value 60
09:59:40,357 [main] DEBUG - Starting new cluster with contact points [/XXX.XXX.XXX.XXX:9042, /XXX.XXX.XXX.YYY:9042, /XXX.XXX.XXX.ZZZ:9042]
09:59:40,402 [main] DEBUG - Using SLF4J as the default logging framework
09:59:40,489 [main] DEBUG - java.nio.Buffer.address: available
09:59:40,490 [main] DEBUG - sun.misc.Unsafe.theUnsafe: available
09:59:40,490 [main] DEBUG - sun.misc.Unsafe.copyMemory: available
09:59:40,490 [main] DEBUG - java.nio.Bits.unaligned: true
09:59:40,492 [main] DEBUG - Java version: 8
09:59:40,492 [main] DEBUG - -Dio.netty.noUnsafe: false
09:59:40,492 [main] DEBUG - sun.misc.Unsafe: available
09:59:40,492 [main] DEBUG - -Dio.netty.noJavassist: false
09:59:40,665 [main] DEBUG - Javassist: available
09:59:40,665 [main] DEBUG - -Dio.netty.tmpdir: /var/folders/4y/t4b47lbn1zjbjpb6x09l30wm0000gn/T (java.io.tmpdir)
09:59:40,666 [main] DEBUG - -Dio.netty.bitMode: 64 (sun.arch.data.model)
09:59:40,666 [main] DEBUG - -Dio.netty.noPreferDirect: false
09:59:40,708 [main] DEBUG - com.datastax.driver.FORCE_NIO is undefined, using default value false
09:59:40,710 [main] INFO - Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
09:59:40,714 [main] DEBUG - -Dio.netty.eventLoopThreads: 8
09:59:40,723 [main] DEBUG - -Dio.netty.noKeySetOptimization: false
09:59:40,723 [main] DEBUG - -Dio.netty.selectorAutoRebuildThreshold: 512
09:59:40,747 [main] DEBUG - -Dio.netty.leakDetectionLevel: simple
09:59:41,035 [main] DEBUG - com.datastax.driver.DISABLE_COALESCING is undefined, using default value false
09:59:41,046 [main] DEBUG - Generated: io.netty.util.internal.__matchers__.com.datastax.driver.core.Message$ResponseMatcher
09:59:41,066 [main] DEBUG - -Dio.netty.allocator.numHeapArenas: 4
09:59:41,066 [main] DEBUG - -Dio.netty.allocator.numDirectArenas: 4
09:59:41,066 [main] DEBUG - -Dio.netty.allocator.pageSize: 8192
09:59:41,066 [main] DEBUG - -Dio.netty.allocator.maxOrder: 11
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.chunkSize: 16777216
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.tinyCacheSize: 512
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.smallCacheSize: 256
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.normalCacheSize: 64
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
09:59:41,067 [main] DEBUG - -Dio.netty.allocator.cacheTrimInterval: 8192
09:59:41,078 [main] DEBUG - Generated: io.netty.util.internal.__matchers__.com.datastax.driver.core.FrameMatcher
09:59:41,082 [main] DEBUG - Generated: io.netty.util.internal.__matchers__.com.datastax.driver.core.Message$RequestMatcher
09:59:41,104 [main] DEBUG - -Dio.netty.initialSeedUniquifier: 0x24d6f22f78c5a924 (took 8 ms)
09:59:41,130 [main] DEBUG - -Dio.netty.allocator.type: unpooled
09:59:41,130 [main] DEBUG - -Dio.netty.threadLocalDirectBufferSize: 65536
09:59:41,197 [cluster1-nio-worker-0] DEBUG - Connection[/XXX.XXX.XXX.YYY:9042-1, inFlight=0, closed=false] Connection opened successfully
09:59:41,218 [cluster1-nio-worker-0] DEBUG - -Dio.netty.recycler.maxCapacity.default: 262144
09:59:41,432 [main] DEBUG - [Control connection] Refreshing node list and token map
09:59:41,518 [main] DEBUG - [Control connection] Refreshing schema
09:59:42,137 [main] DEBUG - [Control connection] Refreshing node list and token map
09:59:42,315 [main] DEBUG - [Control connection] Successfully connected to /XXX.XXX.XXX.YYY:9042
09:59:42,315 [main] INFO - Using data-center name '168' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
09:59:42,315 [main] INFO - New Cassandra host /XXX.XXX.XXX.XXX:9042 added
09:59:42,315 [main] INFO - New Cassandra host /XXX.XXX.XXX.YYY:9042 added
09:59:42,315 [main] INFO - New Cassandra host /XXX.XXX.XXX.ZZZ:9042 added
09:59:42,342 [cluster1-nio-worker-1] DEBUG - Connection[/XXX.XXX.XXX.XXX:9042-2, inFlight=0, closed=false] Connection opened successfully
09:59:42,345 [cluster1-nio-worker-2] DEBUG - Connection[/XXX.XXX.XXX.YYY:9042-1, inFlight=0, closed=false] Connection opened successfully
09:59:42,348 [cluster1-nio-worker-3] DEBUG - Connection[/XXX.XXX.XXX.ZZZ:9042-1, inFlight=0, closed=false] Connection opened successfully
09:59:42,580 [cluster1-nio-worker-2] DEBUG - Added connection pool for /XXX.XXX.XXX.XXX:9042
09:59:42,591 [cluster1-nio-worker-3] DEBUG - Added connection pool for /XXX.XXX.XXX.YYY:9042
09:59:42,609 [cluster1-nio-worker-1] DEBUG - Added connection pool for /XXX.XXX.XXX.ZZZ:9042
As you can see, it takes ~2.5 seconds which is too slow for my use case.
Same happens with the PHP driver, but i don't have a log for this.
Queries are very fast once the driver is connected. Only issue is the slow connection time. I have set up all the three nodes as contact points.
EDIT
Just to clarify: My PHP driver is the problem. I'm wondering why it isn't using pooling/persisting connections. When i call the script two times in a row, every call takes 2-5 seconds. I think the second call should be using the persisting pool. phpinfo() shows persistent clusters & sessions = 0. This is the code that i'm using:
$cluster = Cassandra::cluster()
->withContactPoints('XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.YYY, 'XXX.XXX.XXX.ZZZ')
->withCredentials('USERNAME', 'PASSWORD')
->build();
$keyspace = 'myKeyspace';
$session = $cluster->connect($keyspace);
UPDATE
The problem was my network. Had too little bandwidth.

DataStax drivers are full featured drivers. They are aware of your custer topology and cluster state which requires some expensive operations in the cluster object build stage. It is common for the cluster object creation to take multiple seconds (depending on the size of your cluster/number of nodes).
The best practice is not to create the cluster object for every request (that would be extremely inefficient). Instead, you want to build the cluster object one time and maintain the connections open. Then when you receive a request from your front end, handle it with the existing cluster object.
Cassandra will give you very fast response times when used correctly.
For other c* client best practices, take a look at Brian's Cassandra Loader. This is a good reference application as well as a very efficient bulk loader.
Some key best practices include: Limit the number of async requests if you are using execute async, if you are using batches, ensure that the batches are token specific to avoid excessive coordination, do not use logged batches unless you need atomicity, and do not dynamically manipulate your schema from your application to avoid schema mismatches.

Camel: disable polling messages

I am using Camel to setup some routing using the file and jms-queue components. The problem that I am having is that I cannot disable polling messages sent to the console.
I tryed multiple ways to disable these messages by setting the logging level(runLoggingLevel = OFF) on the routes, trace = false on the context, set a logger on the routes and a few others but nothing works.
A message from the file component looks like this:
2013-08-26 09:34:47,651 DEBUG [Camel (camelContextOrder) thread #0 - file://order-import/order-in] o.a.c.c.f.FileConsumer Took 0.001 seconds to poll: order-import\order-in
And a messsage from the jms queue:
2013-08-26 09:34:46,281 DEBUG [ActiveMQ Journal Checkpoint Worker] o.a.a.s.k.MessageDatabase Checkpoint started.
2013-08-26 09:34:46,403 DEBUG [ActiveMQ Journal Checkpoint Worker] o.a.a.s.k.MessageDatabase Checkpoint done.

You have DEBUG logging level configured. You should change that to INFO etc so Camel / ActiveMQ will not log so much.
Check your logging configuration to adjust this.

twitter hosebird client not working

I am trying to get tweets via hbc-twitter4j-v3 . Example code is : https://github.com/twitter/hbc/blob/master/hbc-example/src/main/java/com/twitter/hbc/example/Twitter4jSampleStreamExample.java
For enabling authentication on proxy, I have also set system properties for host,port and authentication. But it is showing following error-
[main] INFO com.twitter.hbc.httpclient.BasicClient - New connection executed: hosebird-client-0, endpoint: /1.1/statuses/sample.json?delimited=length&stall_warnings=true
[hosebird-client-io-thread-0] INFO com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 Establishing a connection
[main] INFO com.twitter.hbc.httpclient.BasicClient - Stopping the client: hosebird-client-0, endpoint: /1.1/statuses/sample.json?delimited=length&stall_warnings=true
[main] INFO com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 exit event - Stopped by user: waiting for 5000 ms
[main] WARN com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 Client thread failed to finish in 5000 millis
[main] INFO com.twitter.hbc.httpclient.BasicClient - Successfully stopped the client: hosebird-client-0, endpoint: /1.1/statuses/sample.json?delimited=length&stall_warnings=true
[hosebird-client-io-thread-0] WARN com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 Unknown host - stream.twitter.com
[hosebird-client-io-thread-0] WARN com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 failed to establish connection properly
[hosebird-client-io-thread-0] INFO com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 Done processing, preparing to close connection
[hosebird-client-io-thread-0] INFO com.twitter.hbc.httpclient.ClientBase - hosebird-client-0 Shutting down httpclient connection manager
Any help??
Thanks in advance

Hopefully I haven't overlooked something but this is how it appears to me...
If by setting properties you mean the http.proxy* ones, I don't think it will work as hosebird-client uses Apache's HTTP client under the hood which doesn't seem to use them.
From a cursory glance at the code, specifically around the ClientBuilder, it doesn't look like hbc supports proxy configuration - perhaps they have a good reason not to or just don't need the feature themselves, maybe try requesting it?
It looks like one of the ways you can get HttpClient to use a proxy is by adding it to the HttpParams object, e.g.:
HttpParams params = ...
HttpHost proxy = new HttpHost(hostname, port);
params.setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
Whilst the HttpParams object isn't exposed anywhere you could potentially extend the ClientBuilder in order to supply your proxy configuration. If you look at the ClientBuilder#build() method, you can see where the HttpParams object is being set up. Good luck!
EDIT: Additionally, this issue indicates there are no plans to add proxy support directly in hbc.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Cassandra Downed Host Retry (Hector) - java

From the trace it seems you are using QUORUM as consistency level. Try to use ONE and see if it works. It seems that one or more of the nodes that should satisfy your request are down. Use noderool ring/status or see if any node is down in your cluster.

Related

Elastic APM Java - Transactions & Spans are recorded but not reported to Elastic APM Server or Kibana

Error UNKNOWN_MEMBER_ID occurred while committing offsets for group xxx

Slow connection to Cassandra

Camel: disable polling messages

twitter hosebird client not working

Categories

Resources