How to reconnect a HazelcastClient to HazelcastServer after server restart

How to reconnect a HazelcastClient to HazelcastServer after server restart - java

I'm having a problem using the hazelcast in an architecture based on microservice and springboot.
I keep one of the applications with being the application that will be the server of hazelcast and the others are clients of this.
However if I have to update the application that is the hazelcast server, the clients applications of the cache overturn the connection to the server and when I up the new version of the server these client applications do not reconnect.
Is there any off setting the hazelcastclient to be doing pooling on the server to try to reconnect as soon as it comes back?
My client is like below:
#bean
open fun hazelcastInstance(): HazelcastInstance? {
return try {
val clientConfig = ClientConfig()
HazelcastClient.newHazelcastClient(clientConfig)
} catch (e: Exception) {
log.error("Could not connect to hazelcast server, server up without cache")
null
}
}
and I receive "com.hazelcast.client.HazelcastClientNotActiveException: Client is shutdown" if my server goes down.
I'm grateful if you could help me

The Connection Attempt Limit and Connection Attempt Period configuration elements help to configure clients' reconnection behaviour. The client will retry as many as ClientNetworkConfig.connectionAttemptLimit times to reconnect to the cluster. Connection Attempt Period is the duration in milliseconds between the connection attempts defined by ClientNetworkConfig.connectionAttemptLimit. Here is an example of how you configure them:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
clientConfig.getNetworkConfig().setConnectionAttemptPeriod(5000);
Starting with Hazelcast 3.9, you can use configuration element reconnect-mode to configure how the client will reconnect to the cluster after a disconnection. It has three options (OFF, ON or ASYNC). The option OFF disables the reconnection. ON enables reconnection in a blocking manner where all the waiting invocations will be blocked until a cluster connection is established or failed. The option ASYNC enables reconnection in a non-blocking manner where all the waiting invocations will receive a HazelcastClientOfflineException. Its default value is ON. You can see a configuration example below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getConnectionStrategyConfig()
.setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ON);

Related

AWS JAVA IoT client reconnections and timeouts

I use IoT Rules on CONNECTED/DISCONNECTED topic (from here). So I want to get email when a device is connected or disconnected. On my device I run next code on startup (only on startup):
iotClient = new AWSIotMqttClient(Configuration.IOT_CLIENT_ENDPOINT,
deviceId,
keyStore,
keystorePass);
iotClient.setKeepAliveInterval(1200000); //20 minutes (maximum)
iotClient.connect();
But I get very strange behavior. I have 3 devices, and on each device I get this stacktrace but due to different reasons:
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionSuccess Connection successfully established
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionSuccess Client connection active: <client ID>
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionFailure Connection temporarily lost
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionFailure Client connection lost: <client ID>
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection$1.run Connection is being retried
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionSuccess Connection successfully established
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionSuccess Client connection active: <client ID>
Sometimes I get this stacktrace due to DUPLICATE_CLIENTID disconnection reason, or sometimes due to MQTT_KEEP_ALIVE_TIMEOUT disconnection reason (MQTT_KEEP_ALIVE_TIMEOUT happens every 30-35 minutes, DUPLICATE_CLIENTID happens every 10 minutes)
So, I don't understand why do I need to deal with DUPLICATE_CLIENTID if each client has a unique ID, and to deal with MQTT_KEEP_ALIVE_TIMEOUT if there no an intermittent connectivity issue (I get logs every minute to my server, so it isn't WIFI/internet issue). I use the latest AWS IoT SDK from here - https://github.com/aws/aws-iot-device-sdk-java.
How can I solve these issues?
MY TRICKY SOLUTION:
I added a scheduled thread that sends empty messages to topic - ${iot:Connection.Thing.ThingName}/ping every 20 minutes:
scheduledExecutor.scheduleAtFixedRate(() -> {
try {
iotClient.publish(String.format(Configuration.PING_TOPIC, deviceId), AWSIotQos.QOS0, "");
} catch (AWSIotException e) {
LOGGER.error("Failed to send ping", e);
}
}, Configuration.PING_INITIAL_DELAY_IN_MINUTES, Configuration.PING_PERIOD_IN_MINUTES, TimeUnit.MINUTES);
So this solution solves inactive issue, but I still want to find a more elegant solution...

Looking at your logs, it definitly seems like it is connection lost, then connection retried.
During reconnection, it is still connecting using the deviceID you are passing, (however the connection might not have existed from MQTT side), and therefore it sees that it is trying to connect with the same id.
Reading a bit about this, looks like you might not be actually registering your device as a (thing) in aws..
If you were, they when you create an MQTT connection and pass that thingId, then even on reconnection, it wont give you that DuplicateID error.
AWSIotMqttClient client = new AWSIotMqttClient(...);
SomeDevice someDevice = new SomeDevice(thingName); // SomeDevice extends AWSIotDevice
client.attach(someDevice);
client.connect();
you can also experiment with iotClient.cleanSession(true/false) to see if that can help you.
/**
* Sets whether the client and server should establish a clean session on each connection.
* If false, the server should attempt to persist the client's state between connections.
* This must be set before {#link #connect()} is called.
*
* #param cleanSession
* If true, the server starts a clean session with the client on each connection.
* If false, the server should persist the client's state between connections.
*/
#Override
public void setCleanSession(boolean cleanSession) { super.setCleanSession(cleanSession); }
https://docs.aws.amazon.com/iot/latest/developerguide/iot-thing-management.html
MQTT_KEEP_ALIVE_TIMEOUT If there is no client-server communication
for 1.5x of the client's keep-alive time, the client is disconnected.
That means you are not sending/receiving messages..there is no way to fix that, unless you keep an active connection and do things

MQTT connection breaks upon 1000 simultaneous requests

I am Using Amazon Mq as my Mqtt broker and when around 1000 requests are received simultaneously the mqtt broker breaks and disconnects. Can Anyone tell me how to use Amazon Mq as my broker & simultaneously solve the scaling problem also.

I'm assuming that you have created ActiveMQ as a singleton class. Right?
-For producing a message, you create an instance of PooledConnectionFactory like
-------//some code here
ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(MQTT_END_POINT);
connectionFactory.setUserName();
connectionFactory.setPassword();
PooledConnectionFactory pooledConnectionFactory = getActiveMQInstance().configurePooledConnectionFactory(activeMQConnectionFactory);
-------
This pooledConnectionFactory is used to create a connection then session and then destination is entered (as mentioned on AmazonMQ documentation). You send the message using MessageProducer object and close the MessageProducer, session and connection
-For consumption, there will be an always-alive-listener that is always ready for message to arrive. The consumer part, it follows the same process like consumerConnection, then session and then destination queue to listen on.
As far as I remember, this part is also mentioned in amazonMQ documentation.
There is one problem that the connection to broker is lost for consumer sometimes, (since producer reopens the connections, produces and closes, it is not observed in it). Remember, you will have to reestablish the connection for consumer.
If there is any variance from the above approach please mention. Also, add your amazonMQ broker picture showing the connection, queue, active consumers.
Just out of curiosity, what are the maximum connections you have set for the PooledConnectionFactory?

java - [Apache Curator] How to properly close curator

I'm trying to implement a fallback logic on my connection to Zookeeper using Apache Curator, basically I have two sets of connection strings and if I receive a LOST state on my state listener I try to reconnect my curator client on the another set of connection strings. I could simple put all machines on the same connection string but I want to connect on fallback only when all machines for the default cluster are offline.
The problem is that I can't close the previous curator client when I try to change to the fallback cluster, I keep receiving the LOG message saying that curator is trying to reconnect, even after I connect on the fallback set of zookeepers. Below you can find a code example of what I'm trying to do:
final ConnectionStateListener listener = (client1, state) -> {
if (state == ConnectionState.LOST) {
reconnect();
}
};
And the reconnect method (will change the lastHost to the fallback cluster):
if (client != null) {
client.close();
}
...
client = CuratorFrameworkFactory.newClient(
lastHost,
sessionTimeout,
connectionTimeout,
retryPolicy);
...
client.start()
I can successfully connect on the new set of connection strings (fallback) but the problem is that the previous client keep trying to connect on the previous connection strings.
Looking at the close() method I saw that curator only close things if the State of the client is STARTED, I think that's why curator keep trying to connect on the previous cluster.
Is there a way to close() the curator client without having the STARTED state on it?
If not is there another way to implement this logic (fallback zookeeper servers) on curator?
Thanks.

Hazelcast - Client mode - How to recover after cluster failure?

We are using hazelcast distributed lock and cache functions in our products. Usage of distributed locking is vitally important for our business logic.
Currently we are using the embedded mode(each application node is also a hazelcast cluster member). We are going to switch to client - server mode.
The problem we have noticed for client - server is that, once the cluster is down for a period, after several attempts clients are destroyed and any objects (maps, sets, etc.) that were retrieved from that client are no longer usable.
Also the client instance does not recover even after the Hazelcast cluster comes back up (we receive HazelcastInstanceNotActiveException )
I know that this issue has been addressed several times and ended up as being a feature request:
issue1
issue2
issue3
My question : What should be the strategy to recover the client? Currently we are planning to enqueue a task in the client process as below. Based on a condition it will try to restart the client instance...
We will check whether the client is running or not via clientInstance.getLifecycleService().isRunning() check.
Here is the task code:
private class ClientModeHazelcastInstanceReconnectorTask implements Runnable {
#Override
public void run() {
try {
HazelCastService hazelcastService = HazelCastService.getInstance();
HazelcastInstance clientInstance = hazelcastService.getHazelcastInstance();
boolean running = clientInstance.getLifecycleService().isRunning();
if (!running) {
logger.info("Current clientInstance is NOT running. Trying to start hazelcastInstance from ClientModeHazelcastInstanceReconnectorTask...");
hazelcastService.startHazelcastInstance(HazelcastOperationMode.CLIENT);
}
} catch (Exception ex) {
logger.error("Error occured in ClientModeHazelcastInstanceReconnectorTask !!!", ex);
}
}
}
Is this approach suitable? I also tried to listen LifeCycle events but could not make it work via events.
Regards

In Hazelcast 3.9 we changed the way connection and reconnection works in clients. You can read about the new behavior in the docs: http://docs.hazelcast.org/docs/3.9.1/manual/html-single/index.html#configuring-client-connection-strategy
I hope this helps.

In Hazelcast 3.10 you may increase connection attempt limit from 2 (by default) to maximum:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(Integer.MAX_VALUE);

How to setup timeout for ejb lookup in websphere 7.0

I have developed a standalone Javase client which performs an EJB Lookup to a remote server and executes its method.The Server application is in EJB 3.0
Under some strange magical but rare situations my program hangs indefinetly, on looking inside the issue it seems that while looking up the ejb on the server, I never get the response from the server and it also never times out.
I would like to know if there is a property or any other way through which we can setup the lookup time in client or at the server side.

There is a very nice article that discusses ORB configuration best practices at DeveloperWorks here. I'm quoting the three different settings that can be configured at client (you, while doing a lookup and executing a method at a remote server);
Connect timeout: Before the client ORB can even send a request to a server, it needs to establish an IIOP connection (or re-use an
existing one). Under normal circumstances, the IIOP and underlying TCP
connect operations should complete very fast. However, contention on
the network or another unforeseen factor could slow this down. The
default connect timeout is indefinite, but the ORB custom property
com.ibm.CORBA.ConnectTimeout (in seconds) can be used to change the
timeout.
Locate request timeout: Once a connection has been established and a client sends an RMI request to the server, then LocateRequestTimeout
can be used to limit the time for the CORBA LocateRequest (a CORBA
“ping”) for the object. As a result, the LocateRequestTimeout should
be less than or equal to the RequestTimeout because it is a much
shorter operation in terms of data sent back and forth. Like the
RequestTimeout, the LocateRequestTimeout defaults to 180 seconds.
Request timeout: Once the client ORB has an established TCP connection to the server, it will send the request across. However, it
will not wait indefinitely for a response, by default it will wait for
180 seconds. This is the ORB request timeout interval. This can
typically be lowered, but it should be in line with the expected
application response times from the server.

You can try the following code, which performs task & then waits at most the time specified.
Future<Object> future = executorService.submit(new Callable<Object>() {
public Object call() {
return lookup(JNDI_URL);
}
});
try {
Object result = future.get(20L, TimeUnit.SECONDS); //- Waiting for at most 20 sec
} catch (ExecutionException ex) {
logger.log(LogLevel.ERROR,ex.getMessage());
return;
}
Also, the task can be cancelled by future.cancel(true).

Remote JNDI uses the ORB, so the only option available is com.ibm.CORBA.RequestTimeout, but that will have an affect on all remote calls. As described in the 7.0 InfoCenter, the default value is 180 (3 minutes).

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.