AWS Java IoT client reconnections and timeouts - java

I use IoT Rules on the CONNECTED/DISCONNECTED lifecycle topics (from here), because I want to get an email when a device connects or disconnects. On each device I run the following code on startup (only on startup):
iotClient = new AWSIotMqttClient(Configuration.IOT_CLIENT_ENDPOINT,
        deviceId,
        keyStore,
        keystorePass);
iotClient.setKeepAliveInterval(1200000); // 20 minutes (maximum)
iotClient.connect();
But I get very strange behavior. I have 3 devices, and on each device I see this log sequence, but for different reasons:
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionSuccess Connection successfully established
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionSuccess Client connection active: <client ID>
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionFailure Connection temporarily lost
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionFailure Client connection lost: <client ID>
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection$1.run Connection is being retried
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AwsIotConnection.onConnectionSuccess Connection successfully established
[pool-8-thread-1] com.amazonaws.services.iot.client.core.AbstractAwsIotClient.onConnectionSuccess Client connection active: <client ID>
Sometimes the disconnect reason is DUPLICATE_CLIENTID and sometimes it is MQTT_KEEP_ALIVE_TIMEOUT (MQTT_KEEP_ALIVE_TIMEOUT happens every 30-35 minutes, DUPLICATE_CLIENTID every 10 minutes).
So I don't understand why I have to deal with DUPLICATE_CLIENTID if each client has a unique ID, or with MQTT_KEEP_ALIVE_TIMEOUT if there is no intermittent connectivity issue (my server receives logs from the devices every minute, so it isn't a Wi-Fi/internet problem). I use the latest AWS IoT SDK from here - https://github.com/aws/aws-iot-device-sdk-java.
How can I solve these issues?
MY TRICKY SOLUTION:
I added a scheduled thread that publishes an empty message to the topic ${iot:Connection.Thing.ThingName}/ping every 20 minutes:
scheduledExecutor.scheduleAtFixedRate(() -> {
    try {
        iotClient.publish(String.format(Configuration.PING_TOPIC, deviceId), AWSIotQos.QOS0, "");
    } catch (AWSIotException e) {
        LOGGER.error("Failed to send ping", e);
    }
}, Configuration.PING_INITIAL_DELAY_IN_MINUTES, Configuration.PING_PERIOD_IN_MINUTES, TimeUnit.MINUTES);
This solves the inactivity issue, but I would still like to find a more elegant solution...

Looking at your logs, it definitely seems like the connection is lost and then retried.
During reconnection the client still connects with the deviceId you passed in, but the old connection may not have been fully torn down on the MQTT side yet, so the broker sees a second connection attempt with the same ID.
Reading a bit about this, it looks like you might not actually be registering your device as a thing in AWS.
If you were, then when you create an MQTT connection and attach that thing, even on reconnection you won't get that DUPLICATE_CLIENTID error.
AWSIotMqttClient client = new AWSIotMqttClient(...);
SomeDevice someDevice = new SomeDevice(thingName); // SomeDevice extends AWSIotDevice
client.attach(someDevice);
client.connect();
You can also experiment with iotClient.setCleanSession(true/false) to see if that helps you.
/**
 * Sets whether the client and server should establish a clean session on each connection.
 * If false, the server should attempt to persist the client's state between connections.
 * This must be set before {@link #connect()} is called.
 *
 * @param cleanSession
 *            If true, the server starts a clean session with the client on each connection.
 *            If false, the server should persist the client's state between connections.
 */
@Override
public void setCleanSession(boolean cleanSession) { super.setCleanSession(cleanSession); }
https://docs.aws.amazon.com/iot/latest/developerguide/iot-thing-management.html
MQTT_KEEP_ALIVE_TIMEOUT: If there is no client-server communication for 1.5x of the client's keep-alive time, the client is disconnected.
That means you are not sending or receiving anything over the connection. There is no way around that other than keeping the connection active by actually sending traffic, which is exactly what your ping workaround does.
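Separately, instead of publishing application-level pings, it may be worth experimenting with a shorter keep-alive interval so the SDK's own MQTT PINGREQ packets flow more often, and a single lost ping doesn't push the connection past the 1.5x keep-alive cutoff. A minimal sketch (the 5-minute value is an arbitrary example, not an AWS recommendation):
iotClient = new AWSIotMqttClient(Configuration.IOT_CLIENT_ENDPOINT, deviceId, keyStore, keystorePass);
iotClient.setKeepAliveInterval(300000); // 5 minutes (in milliseconds) instead of the 20-minute maximum
iotClient.connect();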

Related

java - [Apache Curator] How to properly close curator

I'm trying to implement fallback logic for my connection to ZooKeeper using Apache Curator. Basically I have two sets of connection strings, and if I receive a LOST state in my state listener I try to reconnect my Curator client against the other set of connection strings. I could simply put all machines in the same connection string, but I want to fall back only when all machines of the default cluster are offline.
The problem is that I can't close the previous Curator client when I switch to the fallback cluster; I keep seeing the log message saying that Curator is trying to reconnect, even after I've connected to the fallback set of ZooKeepers. Below you can find a code example of what I'm trying to do:
final ConnectionStateListener listener = (client1, state) -> {
    if (state == ConnectionState.LOST) {
        reconnect();
    }
};
And the reconnect method (will change the lastHost to the fallback cluster):
if (client != null) {
    client.close();
}
...
client = CuratorFrameworkFactory.newClient(
        lastHost,
        sessionTimeout,
        connectionTimeout,
        retryPolicy);
...
client.start();
I can successfully connect to the new set of connection strings (fallback), but the previous client keeps trying to connect to the previous connection strings.
Looking at the close() method I saw that Curator only closes things if the client's state is STARTED; I think that's why Curator keeps trying to connect to the previous cluster.
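In other words, it seems I would need a guard like this sketch (CuratorFrameworkState and CloseableUtils come with Curator; lastHost, the timeouts and retryPolicy are my own fields):
// Replace the client, closing the old one only if it actually STARTED;
// otherwise close() does nothing and its retry loop keeps running.
private void reconnect() {
    CuratorFramework old = client;
    if (old != null && old.getState() == CuratorFrameworkState.STARTED) {
        CloseableUtils.closeQuietly(old); // org.apache.curator.utils.CloseableUtils
    }
    client = CuratorFrameworkFactory.newClient(
            lastHost, sessionTimeout, connectionTimeout, retryPolicy);
    client.start();
}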
Is there a way to close() the curator client without having the STARTED state on it?
If not is there another way to implement this logic (fallback zookeeper servers) on curator?
Thanks.

JMS - How to know a subscriber is not alive anymore

I have a distributed system application that uses JBoss as the application server, and a client application that serves as a simulation engine. When a client comes up, it sends a registration message (a JMS message) to the server, and a field is set in the database. When the server comes up, it publishes a message to a topic asking all clients to confirm they are alive. Clients that are alive read the message and send a response to a server queue.
If the user closes a client normally, the client sends a message to the server saying it will unregister, and the server unregisters it on the database side.
If the user closes a client abnormally (kill), the client cannot send the unregistration message, so the server does not know the client is gone. This causes inconsistency in my application. So I need a way to detect that a client that subscribed to a topic is no longer subscribed.
Server sends a message to topic to check that clients are alive.
@Schedule(hour = "*", minute = "*", second = "30", persistent = false)
public void sendNodeStatusRequest() {
    Message msg = MessageFactory.createStatusRequestMessage();
    publishNodeMessage(msg);
}
After a while, the server shows the following log. Can I catch this warning from Java?
07:17:00,698 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive ping from /127.0.0.1:61888. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]
07:17:00,698 WARN [org.hornetq.core.server.impl.ServerSessionImpl] Client connection failed, clearing up resources for session 4e4e9dc6-153e-11e7-80fa-742b62812c29
To me the whole point of a messaging system is decoupled communication. The sender (the server in your case) sends its stuff to the topic without actually knowing who will get the message. The clients come and go, and they should be able to read the message whenever it (still) resides in the topic.
From your question I understand that the server keeps track of all connected clients by receiving their responses on a dedicated queue.
So I'm asking myself - maybe there is something wrong with the design here.
Let me propose a slightly different implementation.
The server should not be aware of any client; at most (because your system seems to work this way) it should know that clients A, B and C are alive right now, only because those clients told it so.
Why not just have the clients send a "keep-alive" message every, say, 1 minute (or less, depending on your needs) to the server queue, without any prior message from the server?
The message can include a client identifier, and probably a timestamp if one is not added by the infrastructure.
The server just consumes these messages and keeps an in-memory list of the available clients along with the last time each of them sent something.
If a client disconnects gracefully, it can send a special message like "I'm client A, consider me disconnected". Otherwise (abnormal termination, network outage, whatever) it just stops sending anything, and a periodic check on the server will find the stale entries in the list and know that something went wrong. A sketch of both sides follows this paragraph.
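A minimal sketch of that scheme (the queue wiring, the "<id>;<timestamp>" message format and the 3-minute staleness cutoff are all assumptions for illustration):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;

class KeepAlive {
    private final Session session;          // an open JMS session
    private final MessageProducer producer; // bound to a well-known "keepalive" queue
    private final String clientId;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>(); // server side

    KeepAlive(Session session, MessageProducer producer, String clientId) {
        this.session = session;
        this.producer = producer;
        this.clientId = clientId;
    }

    // Client side: push "<id>;<timestamp>" once a minute, unprompted.
    void startClientPing() {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            try {
                producer.send(session.createTextMessage(
                        clientId + ";" + System.currentTimeMillis()));
            } catch (JMSException e) {
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.MINUTES);
    }

    // Server side: called for every keep-alive message received.
    void onKeepAlive(String id, long timestamp) {
        lastSeen.put(id, timestamp);
    }

    // Server side: run periodically; anything silent for 3 minutes is stale.
    void sweepStaleClients() {
        long cutoff = System.currentTimeMillis() - TimeUnit.MINUTES.toMillis(3);
        lastSeen.entrySet().removeIf(e -> e.getValue() < cutoff); // these clients are gone
    }
}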
If you still want to stick with the JMS way of doing it, you can try to send the message synchronously, meaning the producer waits until it hears from the consumer, along the lines of the sketch below. More information here: http://docs.oracle.com/javaee/6/tutorial/doc/bncfa.html
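A hedged sketch of that synchronous round trip, using a temporary queue as the reply channel (the session/producer wiring and the 5-second wait are assumptions):
import javax.jms.Destination;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;

// Returns true if the client answered the status request within 5 seconds.
static boolean isClientAlive(Session session, MessageProducer producer) throws JMSException {
    Destination replyTo = session.createTemporaryQueue();
    Message request = session.createMessage();
    request.setJMSReplyTo(replyTo); // tell the client where to answer
    producer.send(request);         // producer is bound to the status topic

    MessageConsumer replyConsumer = session.createConsumer(replyTo);
    Message reply = replyConsumer.receive(5000); // blocks; null = timed out
    replyConsumer.close();
    return reply != null;
}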

Spymemcached and Connection Failures

I thought Spymemcached attempts to reestablish the connection to the server when the connection gets lost.
But I am seeing something different; I wonder what I misunderstand or am doing wrong. Here is some sample code:
MemcachedClient c = new MemcachedClient(AddrUtil.getAddresses("server:11211"));
while (true) {
    try {
        System.out.println(c.get("hatsts"));
        Thread.sleep(10000);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
It runs initially without problem. Then I pull the network plug. Subsequently, the client detects a network failure and throws following exception:
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
But then, when I re-establish the network, the client does not recover and keeps throwing the exception, even after 5 minutes. I tried Spymemcached 2.10.6 and 2.9.0.
What am I missing?
The problem here is that because you pulled the network cable, the TCP socket on your client still thinks the connection is valid. The TCP keepalive varies from operating system to operating system and can be as long as 30 minutes. As a result the application (in this case Spymemcached) is not notified by the TCP layer that the connection is no longer valid, and therefore Spymemcached doesn't try to reconnect.
The way Spymemcached detects this situation is by counting consecutive operation timeouts. The last time I checked, the default value was 99. Once that many ops have timed out, Spymemcached will reconnect. You can change this value through the ConnectionFactory if you want to set it to something else; there's a function called getContinuousTimeout(), which is where Spymemcached gets the default of 99. You can construct your own ConnectionFactory with the ConnectionFactoryBuilder.
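A rough sketch of what that could look like (I'm assuming the builder method is setTimeoutExceptionThreshold, as in the Spymemcached versions I've seen; check your version's API):
import java.io.IOException;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactory;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;

public class FastReconnectClient {
    public static void main(String[] args) throws IOException {
        // Reconnect after 10 consecutive timed-out operations instead of
        // the default, so a dead socket is abandoned much sooner.
        ConnectionFactory cf = new ConnectionFactoryBuilder()
                .setTimeoutExceptionThreshold(10)
                .build();
        MemcachedClient c = new MemcachedClient(cf, AddrUtil.getAddresses("server:11211"));
        System.out.println(c.get("hatsts"));
    }
}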
Hopefully this is enough information to answer your question and get you going in the right direction. If not let me know and I can add some more details.

How to setup timeout for ejb lookup in websphere 7.0

I have developed a standalone Java SE client which performs an EJB lookup to a remote server and executes a method on it. The server application is EJB 3.0.
Under some strange, magical, but rare circumstances my program hangs indefinitely. Looking into the issue, it seems that while looking up the EJB on the server I never get a response, and the call never times out either.
I would like to know if there is a property or any other way to set a timeout for the lookup, on the client or the server side.
There is a very nice article that discusses ORB configuration best practices at DeveloperWorks here. I'm quoting the three settings that can be configured on the client side (you, while doing a lookup and executing a method on a remote server):
Connect timeout: Before the client ORB can even send a request to a server, it needs to establish an IIOP connection (or re-use an existing one). Under normal circumstances, the IIOP and underlying TCP connect operations should complete very fast. However, contention on the network or another unforeseen factor could slow this down. The default connect timeout is indefinite, but the ORB custom property com.ibm.CORBA.ConnectTimeout (in seconds) can be used to change the timeout.
Locate request timeout: Once a connection has been established and a client sends an RMI request to the server, then LocateRequestTimeout can be used to limit the time for the CORBA LocateRequest (a CORBA "ping") for the object. As a result, the LocateRequestTimeout should be less than or equal to the RequestTimeout because it is a much shorter operation in terms of data sent back and forth. Like the RequestTimeout, the LocateRequestTimeout defaults to 180 seconds.
Request timeout: Once the client ORB has an established TCP connection to the server, it will send the request across. However, it will not wait indefinitely for a response; by default it will wait for 180 seconds. This is the ORB request timeout interval. This can typically be lowered, but it should be in line with the expected application response times from the server.
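For a standalone client these ORB properties are typically supplied as -D JVM arguments. A sketch of setting them programmatically instead, assuming this runs before the client ORB is first initialized (the values and JNDI name are only examples):
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class LookupWithOrbTimeouts {
    public static void main(String[] args) throws NamingException {
        // All three timeouts are in seconds and must be set before the
        // client ORB is created, otherwise they are ignored.
        System.setProperty("com.ibm.CORBA.ConnectTimeout", "10");
        System.setProperty("com.ibm.CORBA.LocateRequestTimeout", "30");
        System.setProperty("com.ibm.CORBA.RequestTimeout", "30");

        Context ctx = new InitialContext();
        Object stub = ctx.lookup("ejb/remote/SomeBean"); // can now time out instead of hanging
        System.out.println("Looked up: " + stub);
    }
}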
You can also try the following code, which performs the task and then waits at most the specified time.
Future<Object> future = executorService.submit(new Callable<Object>() {
    public Object call() {
        return lookup(JNDI_URL);
    }
});
try {
    Object result = future.get(20L, TimeUnit.SECONDS); // wait for at most 20 sec
} catch (InterruptedException | ExecutionException | TimeoutException ex) {
    logger.log(LogLevel.ERROR, ex.getMessage());
    return;
}
Also, the task can be cancelled by future.cancel(true).
Remote JNDI uses the ORB, so the only option available is com.ibm.CORBA.RequestTimeout, but that will have an effect on all remote calls. As described in the 7.0 InfoCenter, the default value is 180 (3 minutes).

Forcing socket.connect to wait a specific time before it decides a connection is unavailable

I'm issuing a socket connection using the following snippet:
Socket socket = new Socket();
InetSocketAddress endPoint = new InetSocketAddress("localhost", 1234);
try
{
    socket.connect(endPoint, 30000);
}
catch (IOException e)
{
    e.printStackTrace();
    // Logging
}
The endpoint it is trying to connect to is offline. What I want it to do is attempt to connect and, using the 30000 ms timeout, wait that long before concluding a result.
Currently the 30000 parameter doesn't seem to be applied: from the timestamps in my logging, it determines within 1 second that the connection failed.
How can I force connect to wait for a set amount of time before giving up?
13:13:57,685 6235 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:13:58,685 7235 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:13:59,695 8245 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:14:00,695 9245 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
EDIT: The API does state "Connects this socket to the server with a specified timeout value. A timeout of zero is interpreted as an infinite timeout. The connection will then block until established or an error occurs." However, it appears I'm not experiencing such behaviour, or am not catering for it; most likely the latter.
What you're getting here is correct. connect won't sit on a socket waiting until it sees a server; it will attempt to connect and wait for a response. If there is nothing to connect to, it returns immediately. If there is something to connect to, it will wait up to timeout milliseconds for a response and fail if none is received.
You need to distinguish among several possible exception conditions.
ConnectException with the text 'connection refused', which means the host was up and reachable and nothing was listening at the port. This happens very quickly and cannot be subjected to a timeout.
NoRouteToHostException: this indicates a connectivity issue. Again it happens immediately and cannot be subjected to a timeout.
UnknownHostException: the host names cannot be resolved via DNS. This happens immediately, or rather after a generally short DNS delay, and cannot be subjected to a timeout.
ConnectException with any other text: this can indicate a failure to respond by the target system. Usually happens when firewalls are present. Can be subjected to a timeout.
You are doing the correct thing by calling Socket.connect() with a timeout parameter. If you don't do this, or if you specify a zero timeout, the default system timeout is used, which is of the order of 60-75 seconds depending on the platform. This is contrary to the Javadoc's statement about an 'infinite timeout', which is not correct. Also you cannot increase the timeout beyond this limit via Socket.connect() with a timeout parameter. Alternatively you can use java.nio socket channels in non-blocking mode with a select() to administer the timeout for you, as sketched below, but you still can't increase the timeout beyond the platform default via this or any other method.
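A minimal sketch of that java.nio variant (host, port, and the 30-second wait mirror the question; note a refused connection still fails immediately, when finishConnect() throws):
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class NioConnectTimeout {
    public static void main(String[] args) throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        channel.connect(new InetSocketAddress("localhost", 1234));

        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_CONNECT);
            // Wait up to 30 seconds for the connect to complete or fail.
            if (selector.select(30_000) > 0 && channel.finishConnect()) {
                System.out.println("connected");
            } else {
                System.out.println("not connected within the timeout");
                channel.close();
            }
        }
    }
}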
When the timeout occurs, a SocketTimeoutException is thrown, which you do not catch and log. The IOException is fired when "an error occurs during the connection". The timeout is never applied because an error occurs beforehand.
Edit: Just to clarify: TCP/IP as a suite has many specifics that could prevent a packet from reaching its desired outcome (a SYN/ACK packet). If a computer responds to your SYN packet by informing your application that the port is closed (i.e. there's no application running/listening there), an exception fires telling you that it is impossible to connect to that port. If you wish to keep sending and re-sending SYN packets anyway, on the assumption that an application will come online listening on that port, that is done at a different network layer (and, as far as I know, is not accessible with Java out of the box).
Try socket.setSoTimeout(timeout) before connecting.
