Apache Ignite Client restart scenario - java

This is the scenario:
I started the server node.
I started an Ignite client node via a Java application, say "X".
In Visor, the "node" command showed two nodes, one client and one server.
I killed the Java app "X" with "kill -9 pid".
Now when I go to the Visor terminal and enter "node", it still shows both the "client" and "server" nodes in the list; asking for the client node's details throws an error, as expected.
When I restart the Java app "X", its code again attempts to connect to the Ignite server. But instead of connecting, it prints this log line many times:
"org.apache.ignite.logger.java.JavaLogger" "info" "INFO" "" "284" "Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:62856]" "" "" "" "" "" "" "1587013526124" "" "" "" "" "" "" "ROOT" "{""service"":"""",""logger_name"":""org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi""}"
It never connects, and the Java code after that point never runs, so the application does not resume. I also found this in the Ignite server log:
[10:37:57] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_CRITICAL_OPERATION_TIMEOUT, err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been timed out.]]
[10:37:57,739][SEVERE][exchange-worker-#46][GridCacheDatabaseSharedManager] Checkpoint read lock acquisition has been timed out.
class org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out.
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1708)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1640)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initTopologies(GridDhtPartitionsExchangeFuture.java:1078)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:944)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3258)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3104)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
[10:39:21,547][SEVERE][tcp-disco-msg-worker-[693d29cd 0:0:0:0:0:0:0:1%lo0:47501 crd]-#2][G] Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=db-checkpoint-thread, threadName=db-checkpoint-thread-#59, blockedFor=209s]
I am assuming that because I force-kill the Java application that runs the Ignite client node, some topology imbalance may occur.
Can someone please suggest: if I do force-kill the client application, is there a correct way to restart it so that it re-establishes the connection with the Ignite server and continues working?

This scenario is possible when you have very long timeouts.
Unless you shut the client down gracefully, you should not expect the old node to be dropped, and a new one to join, before all of the relevant timeouts run out: the network timeout, the socket write timeout, and the failure detection timeout.
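A minimal sketch of both mitigations, assuming Ignite 2.x (the timeout values are illustrative, not recommendations): lower the failure detection timeouts so a killed client is evicted from the topology sooner, and close the client on normal JVM exit so the server does not have to wait for the timeouts at all. Note that a shutdown hook never runs on "kill -9"; in that case only the timeouts help.

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientStarter {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);

        // Evict unresponsive nodes sooner (milliseconds; illustrative values).
        cfg.setFailureDetectionTimeout(5_000);
        cfg.setClientFailureDetectionTimeout(10_000);

        Ignite ignite = Ignition.start(cfg);

        // Graceful shutdown on normal JVM exit; does not fire on kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(ignite::close));
    }
}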

Related

JSMPP failover multi URL

I am creating an SMPP message sender that takes the endpoint IP address, port, etc. as parameters.
It currently works fine, but a new requirement is to automatically connect to another supplied IP address when certain conditions are met.
As you might guess, I'm experiencing a race condition between two threads, let's call them T1 and T2.
Here is a simple outline of how my current code is structured:
ConnectionManager: initiates the connection (the first time) and holds the connection pool and sessions.
Runner: gets the current connection from the manager and executes submit_sm() asynchronously; if any error is detected, it re-initiates a new connection held by the manager.
When T1 gets an error and wants to re-initiate a connection to the new address, T2 is already running and attempts to submit a message before T1 has finished initiating the connection; of course, it gets a "pool not open" error.
Are there any ideas or references on how to implement this kind of mechanism?
PS: I am implementing this as a NiFi processor.
Thank you
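One common approach is to serialize reconnection so that only the first thread to observe a failure rebuilds the session, while concurrent threads simply pick up the fresh session instead of racing. Below is a rough sketch of that idea; Session, openSession, and the endpoint handling are placeholders for your JSMPP types, not real API:

import java.util.concurrent.atomic.AtomicReference;

public class ConnectionManager {
    interface Session { /* placeholder for org.jsmpp.session.SMPPSession */ }

    private final String[] endpoints;   // primary plus failover addresses
    private int current;                // index of the endpoint in use
    private final AtomicReference<Session> session = new AtomicReference<>();

    public ConnectionManager(String... endpoints) {
        this.endpoints = endpoints;
    }

    public Session currentSession() {
        return session.get();
    }

    // Called by a Runner that hit an error on 'failed'. Only the first
    // caller actually reconnects; later callers see the session was
    // already swapped and return the new one without reconnecting again.
    public synchronized Session reconnect(Session failed) {
        Session cur = session.get();
        if (cur != null && cur != failed) {
            return cur;  // another thread already reconnected
        }
        current = (current + 1) % endpoints.length;  // move to next endpoint
        Session fresh = openSession(endpoints[current]);
        session.set(fresh);
        return fresh;
    }

    private Session openSession(String endpoint) {
        // placeholder: bind a new SMPPSession to 'endpoint' here
        throw new UnsupportedOperationException("bind to " + endpoint);
    }
}

With this shape, T2's failed submit_sm() leads it into reconnect(failedSession), where it either waits for T1's reconnect to finish (the synchronized block) or receives the session T1 already created.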

Know if an ActiveMQConnection using failover is connected or not to a broker

In my Java application I am using the failover transport to connect to a local ActiveMQ broker:
failover:(tcp://0.0.0.0:61616)
I create one single connection that I reuse in the rest of the application:
ActiveMQConnection connection = (ActiveMQConnection) connectionFactory.createConnection();
In another part of the application, when I receive an external call, I need to send a message to the broker; to do that, I create a new Session:
Session locSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
When the broker is down, my app tries to reconnect to it forever (this is the behavior I really want to have).
However, the problem is that if the broker is down and I receive a call that invokes connection.createSession(false, Session.AUTO_ACKNOWLEDGE), my app hangs forever on that line, waiting for the reconnection to succeed before the session can be created.
Do you know any way to check, before I execute createSession, whether the connection object is still trying to reconnect or is really connected? If I could know this, I could skip creating the session when the app is not connected to the broker (only trying to reconnect), raise an exception instead, and thereby avoid hanging on connection.createSession forever.
I wasn't able to find any property or method on ActiveMQConnection to gather this information.
The failover: URL provides a setting, startupMaxReconnectAttempts, to prevent infinite retry when connecting to the broker the first time.
Also note: wanting an exception to bubble up conflicts with the requirement to retry forever. You would need to adjust the failover settings to match your intended behavior by setting a maximum count or maximum time to retry, after which an exception is thrown and your caller is unblocked.
For example, you could indicate that you only want to retry for 5 minutes and then receive an exception to handle in your code, preventing the infinite blocking.
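A sketch of that configuration. The option names are real failover transport parameters, but the values are illustrative, and the exact effect of timeout on blocked calls should be verified against your ActiveMQ version:

import org.apache.activemq.ActiveMQConnectionFactory;

ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory(
    "failover:(tcp://0.0.0.0:61616)"
    + "?startupMaxReconnectAttempts=5"  // bound the very first connect
    + "&maxReconnectAttempts=-1"        // keep retrying forever afterwards
    + "&timeout=300000");               // fail blocked operations after 5 minutes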
Thank you all for your help and suggestions. They helped me a lot in re-focusing the problem.
However, I found the answer to my question using the method getTransport().isConnected().
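For example, the session creation can be guarded with the method the author found, so the caller fails fast instead of blocking while the failover transport is still reconnecting (a sketch; the exception choice is up to the application):

// 'connection' is the ActiveMQConnection created above.
if (connection.getTransport().isConnected()) {
    Session locSession = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    // ... create a producer and send the message
} else {
    // Still reconnecting: fail fast instead of hanging in createSession.
    throw new IllegalStateException("Broker not reachable, try again later");
}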

Jms How to know subscriber is not alive anymore

I have a distributed system that uses JBoss as an application server, and a client application that serves as a simulation engine. When a client comes up, it sends a registration message (a JMS message) to the server, and a field is set in the database. When the server is up, it sends a message (on a topic) to all clients to check that they are alive. Clients that are alive read the message and respond to the server (on a queue) that they are alive.
If the user closes the client normally, the client sends a message to the server saying it will unregister, and the server unregisters it. This is done on the database side.
If the user closes the client abnormally (kill), the client cannot send the unregistration message, so the server does not know that the client is no longer alive. This causes inconsistency in my application, so I need a way to detect that a client subscribed to a topic is not subscribed anymore.
The server sends a message to the topic to check that clients are alive:
@Schedule(hour = "*", minute = "*", second = "30", persistent = false)
public void sendNodeStatusRequest() {
    Message msg = MessageFactory.createStatusRequestMessage();
    publishNodeMessage(msg);
}
After a while, the server shows the following log. Can I catch this warning from Java?
07:17:00,698 WARN [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] Connection failure has been detected: Did not receive ping from /127.0.0.1:61888. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]
07:17:00,698 WARN [org.hornetq.core.server.impl.ServerSessionImpl] Client connection failed, clearing up resources for session 4e4e9dc6-153e-11e7-80fa-742b62812c29
To me, the whole point of a messaging system is decoupled communication. The sender (the server in your case) sends its messages to the topic without actually knowing who will get them. Clients come and go, and they should be able to read a message whenever it (still) resides in the topic.
From your question I understand that the server keeps track of all the connected clients by receiving their responses on a dedicated queue.
So I'm asking myself: maybe something is wrong with the design here.
Let me propose a slightly different implementation.
The server should not be aware of any client; at most (because your system seems to work this way) it should know that clients A, B, and C are alive right now, only because those clients told it so.
Why not just have the clients send a "keep-alive" message to a server queue every, say, minute (or less, depending on your needs), without any prior message from the server?
The message can include a client identifier, and probably a timestamp if one is not added by the infrastructure.
The server then just receives these messages and keeps an in-memory list of available clients along with the last time each of them sent something.
If a client disconnects gracefully, it can send a special message like "I'm client A, consider me disconnected". Otherwise (abnormal termination, network outage, whatever), it simply stops sending; a dedicated server process periodically checks the list for stale clients, and when it finds one, it knows something went wrong. The sketch below shows this bookkeeping.
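A minimal sketch of that server-side registry; the class name, the staleness window, and the database hook are assumptions, not an existing API:

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClientRegistry {
    private static final Duration STALE_AFTER = Duration.ofMinutes(2); // illustrative

    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();

    // Call from the JMS listener for every keep-alive message.
    public void keepAlive(String clientId) {
        lastSeen.put(clientId, Instant.now());
    }

    // Call when a client announces a graceful disconnect.
    public void unregister(String clientId) {
        lastSeen.remove(clientId);
    }

    // Run periodically (e.g. from an @Schedule method) to find dead clients.
    public void sweep() {
        Instant cutoff = Instant.now().minus(STALE_AFTER);
        lastSeen.forEach((clientId, seen) -> {
            if (seen.isBefore(cutoff)) {
                lastSeen.remove(clientId);
                // mark the client as unregistered in the database here
            }
        });
    }
}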
If you still want to stick with the JMS way of doing this, you can try sending the message synchronously, meaning the producer waits until it hears from the consumer. More information here: http://docs.oracle.com/javaee/6/tutorial/doc/bncfa.html

Spymemcached and Connection Failures

I thought Spymemcached attempts to re-establish the connection to the server when the connection is lost.
But I am seeing something different and wonder what I misunderstand or am doing wrong. Here is some sample code:
import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

MemcachedClient c = new MemcachedClient(AddrUtil.getAddresses("server:11211"));
while (true) {
    try {
        System.out.println(c.get("hatsts"));
        Thread.sleep(10000);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
It runs fine initially. Then I pull the network plug. The client detects the network failure and throws the following exception:
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
But then, when I re-establish the network, the client does not recover and keeps throwing the exception, even after 5 minutes. I tried Spymemcached 2.10.6 and 2.9.0.
What am I missing?
The problem here is that because you pulled the network cable, the TCP socket on your client still thinks the connection is valid. The TCP keepalive varies from operating system to operating system and can be as high as 30 minutes. As a result, the application (in this case Spymemcached) is never notified by the TCP layer that the connection is no longer valid, so Spymemcached doesn't try to reconnect.
The way Spymemcached detects this situation is by counting consecutive operation timeouts. The last time I checked, the default value was 99. Once that many operations have timed out, Spymemcached will reconnect. You can change this value through the ConnectionFactory if you want; getContinuousTimeout() is where Spymemcached gets the default of 99. You can construct your own ConnectionFactory with the ConnectionFactoryBuilder, as sketched below.
Hopefully this is enough information to answer your question and get you going in the right direction. If not, let me know and I can add some more details.
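A sketch of that configuration, assuming the 2.x builder API (verify the setter name, setTimeoutExceptionThreshold, against your Spymemcached version):

import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.MemcachedClient;

MemcachedClient c = new MemcachedClient(
    new ConnectionFactoryBuilder()
        .setTimeoutExceptionThreshold(10)  // reconnect after 10 straight timeouts
        .build(),
    AddrUtil.getAddresses("server:11211"));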

Forcing socket.connect to wait a specific time before it decides a connection is unavailable

I'm issuing a socket connection using the following snippet:
Socket socket = new Socket();
InetSocketAddress endPoint = new InetSocketAddress("localhost", 1234);
try {
    socket.connect(endPoint, 30000);
} catch (IOException e) {
    e.printStackTrace();
    // Logging
}
The endpoint it is trying to connect to is offline. What I want it to do is attempt to connect and, using the 30000 ms timeout, wait that long before concluding a result.
Currently, the 30000 parameter doesn't seem to be applied: judging by the timestamps in my logging, it decides within one second that the connection failed.
How can I force the connect to wait for a set amount of time before giving up?
13:13:57,685 6235 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:13:58,685 7235 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:13:59,695 8245 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
13:14:00,695 9245 DEBUG [Thread-7] - Unable to connect to [localhost:1234]
EDIT: The API does state "Connects this socket to the server with a specified timeout value. A timeout of zero is interpreted as an infinite timeout. The connection will then block until established or an error occurs." However, I don't appear to be experiencing that behaviour, or am not catering for it; most likely the latter.
What you're getting here is correct. connect won't sit on a socket waiting until it sees a server; it attempts to connect and waits for a response. If there is nothing to connect to, it returns immediately. If there is something to connect to, it will wait up to the timeout for a response and fail if none is received.
You need to distinguish among several possible exception conditions.
ConnectException with the text 'connection refused', which means the host was up and reachable and nothing was listening at the port. This happens very quickly and cannot be subjected to a timeout.
NoRouteToHostException: this indicates a connectivity issue. Again it happens immediately and cannot be subjected to a timeout.
UnknownHostException: the host name cannot be resolved via DNS. This happens immediately, or rather after a generally short DNS delay, and cannot be subjected to a timeout.
ConnectException with any other text: this can indicate a failure to respond by the target system. Usually happens when firewalls are present. Can be subjected to a timeout.
You are doing the correct thing by calling Socket.connect() with a timeout parameter. If you don't do this, or if you specify a zero timeout, the default system timeout is used, which is on the order of 60-75 seconds depending on the platform. This is contrary to the Javadoc's statement about an 'infinite timeout', which is not correct. Also, you cannot increase the timeout beyond this limit via Socket.connect() with a timeout parameter. Alternatively, you can use java.nio socket channels in non-blocking mode with a select() to administer the timeout for you, but you still can't increase the timeout beyond the platform default via this or any other method.
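Since "connection refused" returns immediately and cannot be stretched by the connect timeout, the usual workaround when you want to wait up to a fixed period for a server that is not listening yet is to retry the connect in a loop until an overall deadline. A sketch under that assumption (the per-attempt timeout and back-off values are illustrative):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectWithDeadline {
    public static Socket connect(String host, int port, long deadlineMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + deadlineMillis;
        while (true) {
            Socket socket = new Socket();
            try {
                // Per-attempt timeout covers the silent/firewalled case.
                socket.connect(new InetSocketAddress(host, port), 5_000);
                return socket;
            } catch (IOException e) {
                socket.close();
                if (System.currentTimeMillis() >= deadline)
                    throw e;            // overall deadline exhausted
                Thread.sleep(1_000);    // back off before retrying
            }
        }
    }
}

Calling ConnectWithDeadline.connect("localhost", 1234, 30_000) then behaves like the 30-second wait the question asks for, regardless of whether each individual attempt fails fast or times out.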
When the timeout occurs, a SocketTimeoutException is thrown, which you do not catch and log. The IOException fires when "an error occurs during the connection". The timeout is never applied because the error happens first.
Edit: Just to clarify: TCP/IP as a suite has many specifics that could prevent a packet from reaching its desired outcome (a SYN/ACK packet). If a computer responds to your SYN packet by informing your application that the port is closed (i.e. there's no application running/listening there), it will fire an exception telling you that it is impossible to connect to that port. If you wish to send and re-send SYN packets anyway, knowing that an application will come online listening on that port, this has to be done at a different network layer (and, as far as I know, is not accessible in Java out of the box).
Try socket.setSoTimeout(timeout) before connecting.
