Flink TaskManager not reconnecting to the new JobManager - java

I have configured Flink in HA mode as mentioned here:
I wanted to test the fault tolerance, hence I did the following:
Set up a Flink cluster with 2 JobManagers and 1 TaskManager
Start a streaming job on the task manager
Kill the active job manager (to simulate a crash)
The leader election is happening as expected.
But the task manager is not reconnecting to the new job manager. It simply tries to reconnect to the previous leader every 10 seconds.
Pasting the task manager log here:
2018-07-25 19:46:08,508 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2018-07-25 19:46:08,515 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2018-07-25 19:46:08,524 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2018-07-25 19:46:08,525 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
2018-07-25 19:46:08,529 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@10.10.97.210:46477/user/resourcemanager(b91b9aeb3565be973c9bb47259414e0a).
2018-07-25 19:46:08,574 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:08,576 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:08,579 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
2018-07-25 19:46:18,606 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: /10.10.97.210:46477
2018-07-25 19:46:18,607 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.10.97.210:46477] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@10.10.97.210:46477]] Caused by: [Connection refused: /10.10.97.210:46477]
2018-07-25 19:46:18,607 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@10.10.97.210:46477/user/resourcemanager..
Restarting the task manager doesn't help.
Restarting the cluster doesn't help.
Please guide me if anything is missing.

Looking into the logs:
Connection refused: /10.10.97.210:46477
Was port 46477 opened/excluded from the firewall?
Just check whether you have set the following in the Flink config:
jobmanager.rpc.port: 6123
blob.server.port: 50100-50200
And then unblock these ports.
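As a quick check, here is a minimal sketch of a reachability probe in Java (the host and port are taken from the log above; adjust them as needed):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        String host = "10.10.97.210";   // JobManager host from the log
        int port = 46477;               // ResourceManager RPC port from the log
        try (Socket socket = new Socket()) {
            // Fail fast if the port is filtered or nothing is listening
            socket.connect(new InetSocketAddress(host, port), 3000);
            System.out.println("Port " + port + " on " + host + " is reachable");
        } catch (IOException e) {
            System.out.println("Cannot reach " + host + ":" + port + " - " + e.getMessage());
        }
    }
}

If this fails from the task manager's machine while the JobManager process is up, the firewall is the likely culprit.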

Related

RabbitMQ Connection refused

After the application starts:
Attempting to connect to: [localhost:5672]
2022-09-22 17:51:12.872 WARN 80292 --- [ 127.0.0.1:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Socket closed)
2022-09-22 17:51:12.880 WARN 80292 --- [)-192.168.87.19] o.s.b.a.amqp.RabbitHealthIndicator : Rabbit health check failed
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused (Connection refused)
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:61) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:510) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:751) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.createConnection(ConnectionFactoryUtils.java:214) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2095) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2068) ~[spring-rabbit-2.2.5.RELEASE.jar:2.2.5.RELEASE]
Everything not related to RabbitMQ is working.
application.properties:
#RABBITMQ
spring.rabbitmq.addresses=amqp://rabbitmq:rabbitmq@localhost:5672
Try using a simple tool like telnet to check that the port is open on the machine, e.g. telnet localhost 5672
Are you running this in a Docker container, with RabbitMQ running on the host machine? In that case localhost identifies the container, not the host; you'd need to reference the host's IP address or some virtual name.
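For instance, if the app runs in a container on Docker Desktop and RabbitMQ runs on the host, a common sketch is to point the client at the special name host.docker.internal (a Docker Desktop convention, not something from the question):

spring.rabbitmq.host=host.docker.internal
spring.rabbitmq.port=5672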
(The property settings are not likely to be the cause of the problem, unless you are not posting the actual values you are using. I don't think you have the correct property names, but the host/port you say you set are the defaults anyway, and I do see the reference to 127.0.0.1:5672 in the logs.) The standard property names are:
spring.rabbitmq.host=localhost
spring.rabbitmq.port=5672
See https://docs.spring.io/spring-boot/docs/1.5.6.RELEASE/reference/html/common-application-properties.html

Can't connect TCP transport for host: localhost/127.0.0.1:4444 and Connection is refused to connect

I'm trying to verify the server health monitor for an app hosted on my localhost, and I'm getting the response below. Please take a look and help me out. In the View Results Tree, the HTTP Request shows a 200 response.
Testing: through JMeter
Application hosted on: Java 11
Metrics: PerfMon
2021-06-16 12:47:04,032 INFO o.a.j.e.StandardJMeterEngine: Running the test!
2021-06-16 12:47:04,033 INFO o.a.j.s.SampleEvent: List of sample_variables: []
2021-06-16 12:47:04,034 INFO k.a.j.p.PerfMonCollector: PerfMon metrics will be stored in C:\Users\DELL\AppData\Local\Temp\perfmon_4504784525335490183.jtl
2021-06-16 12:47:04,038 INFO o.a.j.g.u.JMeterMenuBar: setRunning(true, *local*)
2021-06-16 12:47:04,068 INFO o.a.j.e.StandardJMeterEngine: Starting ThreadGroup: 1 : Thread Group
2021-06-16 12:47:04,068 INFO o.a.j.e.StandardJMeterEngine: Starting 1 threads for group Thread Group.
2021-06-16 12:47:04,069 INFO o.a.j.e.StandardJMeterEngine: Thread will continue on error
2021-06-16 12:47:04,069 INFO o.a.j.t.ThreadGroup: Starting thread group... number=1 threads=1 ramp-up=1 delayedStart=false
2021-06-16 12:47:04,070 INFO o.a.j.t.ThreadGroup: Started thread group number 1
2021-06-16 12:47:04,070 INFO o.a.j.e.StandardJMeterEngine: All thread groups have been started
2021-06-16 12:47:04,070 INFO o.a.j.t.JMeterThread: Thread started: Thread Group 1-1
2021-06-16 12:47:06,002 INFO o.a.j.t.JMeterThread: Thread is done: Thread Group 1-1
2021-06-16 12:47:06,003 INFO o.a.j.t.JMeterThread: Thread finished: Thread Group 1-1
2021-06-16 12:47:06,004 INFO o.a.j.e.StandardJMeterEngine: Notifying test listeners of end of test
2021-06-16 12:47:06,006 ERROR k.a.p.c.AbstractTransport: Error during exit
java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[?:?]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110) ~[?:?]
at java.net.SocketOutputStream.write(SocketOutputStream.java:138) ~[?:?]
at kg.apc.perfmon.client.StreamTransport.writeln(StreamTransport.java:50) ~[perfmon-2.2.2.jar:?]
at kg.apc.perfmon.client.AbstractTransport.disconnect(AbstractTransport.java:63) [perfmon-2.2.2.jar:?]
at kg.apc.jmeter.perfmon.NewAgentConnector.disconnect(NewAgentConnector.java:36) [jmeter-plugins-perfmon-2.1.jar:?]
at kg.apc.jmeter.perfmon.PerfMonCollector.shutdownConnectors(PerfMonCollector.java:281) [jmeter-plugins-perfmon-2.1.jar:?]
at kg.apc.jmeter.perfmon.PerfMonCollector.testEnded(PerfMonCollector.java:149) [jmeter-plugins-perfmon-2.1.jar:?]
at org.apache.jmeter.reporters.ResultCollector.testEnded(ResultCollector.java:345) [ApacheJMeter_core.jar:5.4.1]
at org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfEnd(StandardJMeterEngine.java:218) [ApacheJMeter_core.jar:5.4.1]
at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:493) [ApacheJMeter_core.jar:5.4.1]
at java.lang.Thread.run(Thread.java:834) [?:?]
2021-06-16 12:47:06,024 INFO o.a.j.g.u.JMeterMenuBar: setRunning(false, *local*)
What is your platform?
If you're running on Linux, it's possible to find which processes have actually opened a TCP port by running lsof -a -i4 -i6 -itcp
This will tell you which ports your application has actually opened.
JMeter's PerfMon Metrics Collector is a Listener which talks to a special piece of software called the PerfMon Server Agent, so in order to be able to collect machine performance metrics you need to install this Server Agent onto the machine you want to monitor:
Download ServerAgent-x.x.x.zip
Unpack it somewhere
Use the startAgent.bat script on Windows or startAgent.sh on Unix and derivatives to start the Server Agent
That's it, you should be able to configure PerfMon Metrics Collector to query the metrics of your choice from the Server Agent
More information: How to Monitor Your Server Health & Performance During a JMeter Load Test
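For example, the agent can usually be started on an explicit port like this (the flag names here assume the jmeter-plugins ServerAgent start scripts; verify against your version's help output, and note 4444 is the agent's default TCP port, which matches the port in the question title):

startAgent.sh --tcp-port 4444 --udp-port 4444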

How do I connect my containerised project to Kafka running on localhost? [duplicate]

This question already has answers here:
Connect to Kafka running in Docker
(5 answers)
Closed 2 years ago.
I've been trying to connect my containerised Spring Boot project with an instance of Kafka and Zookeeper running on my localhost, but I seem to be getting an error when I run the Docker images.
Does anyone know what could be causing this error and, if so, the best way to go about fixing it?
I already have ports 9092 and 2021 exposed.
Edit:
I was asked to post the text:
2020-04-15 06:55:34,872 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] DEBUG common.network.Selector.pollSelectionKeys - [Consumer clientId=consumer-1, groupId=message] Connection with /172.17.59.17 disconnected
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:50)
at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:216)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:531)
at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:539)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:249)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:326)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doPoll(KafkaMessageListenerContainer.java:993)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:949)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:901)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
2020-04-15 06:55:34,874 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] DEBUG kafka.clients.NetworkClient.handleDisconnections - [Consumer clientId=consumer-1, groupId=message] Node -1 disconnected.
You need to configure your Kafka broker with the correct advertised.listeners. At the moment your Spring client connects to the broker for the initial connection, but then receives an internal IP/host from Kafka for subsequent connections, which then fail.
Here's an example of Kafka and Zookeeper running in Docker with listeners configured for external connections.
You can read more here: https://rmoff.net/2018/08/02/kafka-listeners-explained/
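For reference, a minimal sketch of the broker settings in server.properties for this scenario, assuming Kafka runs on the host and the containerised app reaches it via host.docker.internal (the listener names and the 29092 port are illustrative, not from the question):

listeners=INTERNAL://0.0.0.0:9092,DOCKER://0.0.0.0:29092
advertised.listeners=INTERNAL://localhost:9092,DOCKER://host.docker.internal:29092
listener.security.protocol.map=INTERNAL:PLAINTEXT,DOCKER:PLAINTEXT
inter.broker.listener.name=INTERNAL

The containerised Spring Boot app would then bootstrap against host.docker.internal:29092, while processes on the host keep using localhost:9092.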

Spark Cannot Connect to HBaseClusterSingleton

I start an HBase cluster for my test class. I use this helper class:
HBaseClusterSingleton.java
and use it as like that:
private static final HBaseClusterSingleton cluster = HBaseClusterSingleton.build(1);
I retrieve configuration object as follows:
cluster.getConf()
and I use it at Spark as follows:
sparkContext.newAPIHadoopRDD(conf, MyInputFormat.class, clazzK, clazzV);
When I run my test there is no need to start up an HBase cluster because Spark will connect to my dummy cluster. However, when I run my test method it throws an error:
2015-08-26 01:19:59,558 INFO [Executor task launch worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-08-26 01:19:59,559 WARN [Executor task launch worker-0-SendThread(localhost:2181)] zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
HBase tests that do not run on Spark work well. When I check the logs I see that the cluster and Spark start up correctly:
2015-08-26 01:35:21,791 INFO [main] hdfs.MiniDFSCluster (MiniDFSCluster.java:waitActive(2055)) - Cluster is active
2015-08-26 01:35:40,334 INFO [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 56941.
I realized that when I start up HBase from the command line, my test method for Spark connects to it!
So, does it mean that it doesn't care about the conf I passed to it? Any ideas about how to solve it?
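One quick way to check this is a hedged sketch like the following, logging the ZooKeeper endpoint the conf actually carries right before handing it to Spark (the keys are standard HBase client settings, not something shown in the post):

org.apache.hadoop.conf.Configuration conf = cluster.getConf();
// Standard HBase client keys; they should point at the mini-cluster's ZooKeeper
System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
System.out.println("hbase.zookeeper.property.clientPort = " + conf.get("hbase.zookeeper.property.clientPort"));

If this prints the default client port 2181 (which is exactly where the executor tries to connect in the log above) rather than the mini-cluster's randomised port, the Spark side is falling back to a default Configuration instead of the one you passed.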

ActiveMQ, Broker Url: how to run on port number other than default 61616

I am trying to run on port 5000 (tcp://localhost:5000).
It shows the following error:
WARNING: Could not refresh JMS Connection for destination 'queue://jmsExample' - retrying in 5000 ms. Cause: Could not connect to broker URL: tcp://localhost:5000. Reason: java.net.ConnectException: Connection refused: connect
Any help please?
Port 5000 is already in use by another process or service.
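As for the "how to run on another port" part of the question: on a standalone broker the listening port is set by the transportConnector in conf/activemq.xml, so a minimal sketch of moving it to 5000 (assuming a default standalone install; the URI options shown are the stock defaults) would be:

<transportConnectors>
    <!-- openwire port changed from the default 61616 to 5000 -->
    <transportConnector name="openwire" uri="tcp://0.0.0.0:5000?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
</transportConnectors>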