Hazelcast - Client mode - How to recover after cluster failure?

We are using Hazelcast's distributed lock and cache functions in our products. Distributed locking is vital to our business logic.
Currently we use embedded mode (each application node is also a Hazelcast cluster member). We are going to switch to client-server mode.
The problem we have noticed with client-server mode is that once the cluster is down for a period, the clients give up after several connection attempts and are destroyed, and any objects (maps, sets, etc.) that were retrieved from those clients are no longer usable.
The client instance also does not recover after the Hazelcast cluster comes back up (we receive HazelcastInstanceNotActiveException).
I know that this issue has been raised several times and ended up as a feature request:
issue1
issue2
issue3
My question: what should the strategy be to recover the client? Currently we plan to schedule a task in the client process, as shown below, which tries to restart the client instance based on a condition.
We check whether the client is running via clientInstance.getLifecycleService().isRunning().
Here is the task code:
private class ClientModeHazelcastInstanceReconnectorTask implements Runnable {

    @Override
    public void run() {
        try {
            HazelCastService hazelcastService = HazelCastService.getInstance();
            HazelcastInstance clientInstance = hazelcastService.getHazelcastInstance();
            boolean running = clientInstance.getLifecycleService().isRunning();
            if (!running) {
                logger.info("Current clientInstance is NOT running. Trying to start hazelcastInstance from ClientModeHazelcastInstanceReconnectorTask...");
                hazelcastService.startHazelcastInstance(HazelcastOperationMode.CLIENT);
            }
        } catch (Exception ex) {
            logger.error("Error occurred in ClientModeHazelcastInstanceReconnectorTask!", ex);
        }
    }
}
Is this approach suitable? I also tried listening to lifecycle events but could not make it work that way.
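For reference, the lifecycle-based approach I was experimenting with looks roughly like this (just a sketch; LifecycleListener and LifecycleEvent are in com.hazelcast.core, and HazelCastService/HazelcastOperationMode are our own classes):
ClientConfig config = new ClientConfig();
// Register the listener before the client starts so we are notified when it shuts itself down.
config.addListenerConfig(new ListenerConfig(new LifecycleListener() {
    @Override
    public void stateChanged(LifecycleEvent event) {
        if (event.getState() == LifecycleEvent.LifecycleState.SHUTDOWN) {
            // The client gave up reconnecting and destroyed itself; trigger our restart logic
            // (ideally handed off to another thread rather than run inside the callback).
            HazelCastService.getInstance().startHazelcastInstance(HazelcastOperationMode.CLIENT);
        }
    }
}));
HazelcastInstance client = HazelcastClient.newHazelcastClient(config);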
Regards

In Hazelcast 3.9 we changed the way connection and reconnection work in clients. You can read about the new behavior in the docs: http://docs.hazelcast.org/docs/3.9.1/manual/html-single/index.html#configuring-client-connection-strategy
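Roughly, the connection strategy configuration described there looks like the sketch below; with ASYNC reconnect the client keeps retrying in the background instead of shutting itself down, and operations attempted while disconnected fail fast rather than destroying the client (exact semantics are in the linked docs):
ClientConfig clientConfig = new ClientConfig();
// ClientConnectionStrategyConfig is in com.hazelcast.client.config (3.9+).
clientConfig.getConnectionStrategyConfig()
        .setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ASYNC);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);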
I hope this helps.

In Hazelcast 3.10 you can increase the connection attempt limit from the default of 2 to the maximum:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(Integer.MAX_VALUE);

Related

Apache Ignite: Caches unusable after reconnecting to Ignite servers

I am using Apache Ignite as a distributed cache and I am running into some fundamental robustness issues. If our Ignite servers reboot for any reason it seems like this breaks all of our Ignite clients, even after the Ignite servers come back online.
This is the error the clients see when interacting with caches after the servers reboot and the clients reconnect:
Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): <redacted>
My expectation is that the Ignite clients would reconnect to the Ignite servers and continue working once the servers are online. From what I've read thick clients should do this, but I don't see this happening. Why is the cache still considered to be stopped?
We are using Ignite 2.7.6 with Kubernetes IP finder.
Looks like you are using a stale cache proxy.
If you are using an in-memory cluster and created a cache dynamically from a client, that cache will disappear when the cluster restarts.
The following code, executed from a client against an in-memory cluster, will throw an exception when the cluster restarts if the cache in question is not part of a server config but was created dynamically on the client.
Ignition.setClientMode(true);
Ignite ignite = Ignition.start();
IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("mycache"); // dynamically created cache

int counter = 0;
while (true) {
    try {
        cache.put(counter, counter);
        System.out.println("added counter: " + counter++);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
generates
java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): mycache
at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1555)
You need to watch for disconnect events/exceptions
see: https://ignite.apache.org/docs/latest/clustering/connect-client-nodes
IgniteCache cache = ignite.getOrCreateCache(cachecfg);
try {
    cache.put(1, "value");
} catch (CacheException e) {
    if (e.getCause() instanceof IgniteClientDisconnectedException) {
        IgniteClientDisconnectedException cause = (IgniteClientDisconnectedException) e.getCause();
        cause.reconnectFuture().get(); // Wait until the client is reconnected.
        // proceed
    }
}
If this is a persistent cluster consisting of multiple baseline nodes, you should wait until the cluster activates:
https://ignite.apache.org/docs/latest/clustering/baseline-topology
while (!ignite.cluster().active()) {
    System.out.println("Waiting for activation");
    Thread.sleep(5000);
}
After reconnecting you might need to reinitialize your cache proxy:
cache = ignite.getOrCreateCache(cachecfg);
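Alternatively, to avoid the stale proxy for this cache entirely, you could declare it statically in the server node's configuration so that it is recreated whenever the cluster restarts instead of being created dynamically by the client. A rough sketch (illustrative names; the configuration classes are in org.apache.ignite.configuration):
// Server-side configuration: "mycache" matches the name the client looks up.
IgniteConfiguration serverCfg = new IgniteConfiguration();
CacheConfiguration<Integer, Integer> cacheCfg = new CacheConfiguration<>("mycache");
serverCfg.setCacheConfiguration(cacheCfg);
Ignite server = Ignition.start(serverCfg);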

Spring websockets Broken pipe & client not receiving messages

I have a few problems with using websockets:
java.io.IOException: Broken Pipe
Client doesn't receive messages
TL;DR
Main things I want to know:
Please list all possible scenarios why the client side closes the connection (apart from refreshing or closing the tab).
Can a broken pipe exception occur other than from the server sending a message over a broken connection? If so, how?
What are the possible scenarios in which the server doesn't send a message, even though it does send heartbeats? (When this happens I need to restart the application for it to work again, which is a terrible solution because it's already in production.)
I have a Spring MVC project that uses WebSockets: SockJS on the client side and org.springframework.web.socket.handler.TextWebSocketHandler on the server side.
A JSON message is generated server side and sent to the client. Sometimes I get a java.io.IOException: Broken pipe. I googled/Stack Overflowed a lot and found many things I don't understand, but the reason is probably that the connection is closed client side while the server still sends a message (for example, a heartbeat). Does this sound right? What other causes can make this exception arise? What are the reasons for the client side to close the connection (apart from refreshing or closing the tab)?
Also, sometimes the client side doesn't get any messages from the server, although the server should be sending them. I log before and after sending the message, and both log statements are printed. Does anyone have an idea why this can occur? I have no errors in the Chrome console. Refreshing the page doesn't help; I need to restart the Spring project...
If you need more info, please leave a comment.
Client side
function connect() {
    var socket = new SockJS('/ws/foo');

    socket.onopen = function () {
        socket.send(fooId); // Ask the server for the Foo with id fooId.
    };

    socket.onmessage = function (e) {
        var foo = JSON.parse(e.data);
        // Do something with foo.
    };
}
Server side
Service
@Service
public class FooService implements InitializingBean {

    public void updateFoo(...) {
        // Update some fields of Foo.
        ...
        // Send foo to clients.
        FooUpdatesHandler.sendFooToSubscribers(foo);
    }
}
WebSocketHandler
public class FooUpdatesHandler extends ConcurrentTextWebSocketHandler {
    // ConcurrentTextWebSocketHandler taken from https://github.com/RWTH-i5-IDSG/BikeMan (Apache License version 2.0)

    private static final Logger logger = LoggerFactory.getLogger(FooUpdatesHandler.class);

    private static final ConcurrentHashMap<String, ConcurrentHashMap<String, WebSocketSession>> fooSubscriptions =
            new ConcurrentHashMap<>();

    public static void sendFooToSubscribers(Foo foo) {
        Map<String, WebSocketSession> sessionMap = fooSubscriptions.get(foo.getId());
        if (sessionMap != null) {
            String fooJson = null;
            try {
                fooJson = new ObjectMapper().writeValueAsString(foo);
            } catch (JsonProcessingException ignored) {
                return;
            }
            for (WebSocketSession subscription : sessionMap.values()) {
                try {
                    logger.info("[fooId={} sessionId={}] Sending foo...", foo.getId(), subscription.getId());
                    subscription.sendMessage(new TextMessage(fooJson));
                    logger.info("[fooId={} sessionId={}] Foo sent.", foo.getId(), subscription.getId());
                } catch (IOException e) {
                    logger.error("Socket sendFooToSubscribers [fooId={}], exception: ", foo.getId(), e);
                }
            }
        }
    }
}
Just an educated guess: Check your networking gear. Maybe there is a misconfigured firewall terminating these connections; or even worse, broken networking gear causing the connections to terminate. If your server has multiple NICs (which is likely the case), it's also possible that there is some misconfiguration using these NICs, or in connecting to the server via different NICs.
If this problem occurs only intermittently, it is possible that you have a problem with some cache - please check whether Spring or SockJS keeps its own caches for socket interaction.
Does this happen on your devices (or on devices that you control)?
Additionally, I suggest using a network packet analyzer like Wireshark. With such a tool you can watch the network activity live.
Some external reasons that can destroy a connection without closing it correctly (and which you cannot detect without connection checks):
device suspend/poweroff
network failure
the browser closing on some error
And that is only a small part of the full list of possible reasons a connection can be destroyed.
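Because connections that die this way never close cleanly, their sessions can linger in fooSubscriptions and the server keeps "sending" to them, which would explain both the broken pipes and clients that no longer receive anything. A defensive sketch (assuming ConcurrentTextWebSocketHandler ultimately extends Spring's TextWebSocketHandler, and that the inner map is keyed by session id as the logging suggests) is to evict sessions as soon as the container reports them closed:
@Override
public void afterConnectionClosed(WebSocketSession session, CloseStatus status) throws Exception {
    // Drop the closed session from every subscription so the server stops writing to dead connections.
    for (ConcurrentHashMap<String, WebSocketSession> sessions : fooSubscriptions.values()) {
        sessions.remove(session.getId());
    }
    super.afterConnectionClosed(session, status);
}
You could apply the same eviction in the send loop when sendMessage throws an IOException.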

How to setup timeout for ejb lookup in websphere 7.0

I have developed a standalone Java SE client which performs an EJB lookup to a remote server and executes its method. The server application uses EJB 3.0.
Under some strange but rare situations my program hangs indefinitely. Looking into the issue, it seems that while looking up the EJB on the server I never get a response, and the call never times out.
I would like to know if there is a property or any other way to set a timeout for the lookup, on the client or the server side.
There is a very nice article that discusses ORB configuration best practices on DeveloperWorks here. I'm quoting the three settings that can be configured on the client (i.e., you, while doing a lookup and executing a method on a remote server):
Connect timeout: Before the client ORB can even send a request to a server, it needs to establish an IIOP connection (or re-use an existing one). Under normal circumstances, the IIOP and underlying TCP connect operations should complete very fast. However, contention on the network or another unforeseen factor could slow this down. The default connect timeout is indefinite, but the ORB custom property com.ibm.CORBA.ConnectTimeout (in seconds) can be used to change the timeout.
Locate request timeout: Once a connection has been established and a client sends an RMI request to the server, then LocateRequestTimeout can be used to limit the time for the CORBA LocateRequest (a CORBA "ping") for the object. As a result, the LocateRequestTimeout should be less than or equal to the RequestTimeout because it is a much shorter operation in terms of data sent back and forth. Like the RequestTimeout, the LocateRequestTimeout defaults to 180 seconds.
Request timeout: Once the client ORB has an established TCP connection to the server, it will send the request across. However, it will not wait indefinitely for a response; by default it will wait for 180 seconds. This is the ORB request timeout interval. This can typically be lowered, but it should be in line with the expected application response times from the server.
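A sketch of how a standalone client might apply these properties (for a thin client it is usually safest to pass them as -D JVM arguments so the ORB sees them before it initializes; the host, port and JNDI name below are illustrative):
// Equivalent to -Dcom.ibm.CORBA.ConnectTimeout=10 -Dcom.ibm.CORBA.RequestTimeout=60 ... on the java command line.
System.setProperty("com.ibm.CORBA.ConnectTimeout", "10");        // seconds to establish the IIOP connection
System.setProperty("com.ibm.CORBA.LocateRequestTimeout", "30");  // seconds for the CORBA LocateRequest
System.setProperty("com.ibm.CORBA.RequestTimeout", "60");        // seconds to wait for the remote response

Hashtable<String, Object> env = new Hashtable<>();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.ibm.websphere.naming.WsnInitialContextFactory");
env.put(Context.PROVIDER_URL, "corbaloc:iiop:remotehost:2809");  // illustrative host/port

Context ctx = new InitialContext(env);
Object ref = ctx.lookup("ejb/com/example/MyRemoteHome");         // hypothetical JNDI name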
You can try the following code, which performs the lookup task and then waits at most the specified time.
Future<Object> future = executorService.submit(new Callable<Object>() {
    public Object call() {
        return lookup(JNDI_URL);
    }
});

try {
    Object result = future.get(20L, TimeUnit.SECONDS); // Wait for at most 20 seconds.
} catch (TimeoutException | InterruptedException | ExecutionException ex) {
    logger.log(LogLevel.ERROR, ex.getMessage());
    return;
}
Also, the task can be cancelled by future.cancel(true).
Remote JNDI uses the ORB, so the only option available is com.ibm.CORBA.RequestTimeout, but that will have an effect on all remote calls. As described in the 7.0 InfoCenter, the default value is 180 seconds (3 minutes).

BindException/Too many file open while using HttpClient under load

I have 1000 dedicated Java threads, each of which polls a corresponding URL every second.
public class Poller {
    public static Node poll(Node node) {
        GetMethod method = null;
        try {
            HttpClient client = new HttpClient(new SimpleHttpConnectionManager(true));
            ......
        } catch (IOException ex) {
            ex.printStackTrace();
        } finally {
            method.releaseConnection();
        }
    }
}
The threads are run every second:
for (int i = 0; i < 1000; i++) {
    MyThread thread = threads.get(i); // threads is a static field
    if (thread.isAlive()) {
        // If the previous thread is still running, let it run.
    } else {
        thread.start();
    }
}
The problem is that if I run the job every second I get random exceptions like these:
java.net.BindException: Address already in use
INFO httpclient.HttpMethodDirector: I/O exception (java.net.BindException) caught when processing request: Address already in use
INFO httpclient.HttpMethodDirector: Retrying request
But if I run the job every 2 seconds or more, everything runs fine.
I even tried shutting down the SimpleHttpConnectionManager instance using shutDown(), with no effect.
If I do netstat, I see thousands of TCP connections in the TIME_WAIT state, which means they have been closed and are clearing up.
So, to limit the number of connections, I tried using a single instance of HttpClient like this:
public class MyHttpClientFactory {
    private static MyHttpClientFactory instance = new MyHttpClientFactory();

    private MultiThreadedHttpConnectionManager connectionManager;
    private HttpClient client;

    private MyHttpClientFactory() {
        init();
    }

    public static MyHttpClientFactory getInstance() {
        return instance;
    }

    public void init() {
        connectionManager = new MultiThreadedHttpConnectionManager();
        HttpConnectionManagerParams managerParams = new HttpConnectionManagerParams();
        managerParams.setMaxTotalConnections(1000);
        connectionManager.setParams(managerParams);
        client = new HttpClient(connectionManager);
    }

    public HttpClient getHttpClient() {
        if (client != null) {
            return client;
        } else {
            init();
            return client;
        }
    }
}
However after running for exactly 2 hours, it starts throwing 'too many open files' and eventually cannot do anything at all.
ERROR java.net.SocketException: Too many open files
INFO httpclient.HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Too many open files
INFO httpclient.HttpMethodDirector: Retrying request
I should be able to increase the number of connections allowed and make it work, but I would just be prolonging the problem. Any idea what the best practice is for using HttpClient in a situation like the above?
Btw, I am still on HttpClient 3.1.
This happened to us a few months back. First, double check to make sure you really are calling releaseConnection() every time. But even then, the OS doesn't actually reclaim the TCP connections all at once. The solution is to use the Apache HTTP Client's MultiThreadedHttpConnectionManager. This pools and reuses the connections.
See http://hc.apache.org/httpclient-3.x/performance.html for more performance tips.
Update: Whoops, I didn't read the lower code sample. If you're doing releaseConnection() and using MultiThreadedHttpConnectionManager, consider whether your OS limit on open files per process is set high enough. We had that problem too, and needed to extend the limit a bit.
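For reference, a sketch of what each poll could look like once every thread reuses the shared, pooled client from the factory above (HttpClient 3.1 API; node.getUrl() is an assumed accessor on your Node class):
public static Node poll(Node node) {
    // Reuse the single MultiThreadedHttpConnectionManager-backed client; never create one per poll.
    HttpClient client = MyHttpClientFactory.getInstance().getHttpClient();
    GetMethod method = new GetMethod(node.getUrl());
    try {
        int status = client.executeMethod(method);
        if (status == HttpStatus.SC_OK) {
            String body = method.getResponseBodyAsString();
            // ... update the node from the response ...
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    } finally {
        method.releaseConnection(); // Return the connection to the pool, even on failure.
    }
    return node;
}
With MultiThreadedHttpConnectionManager you may also want to set managerParams.setDefaultMaxConnectionsPerHost(...) if many of the 1000 URLs share the same host.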
There is nothing wrong with the first error. You have just depleted the available ephemeral ports. Each TCP connection can stay in the TIME_WAIT state for 2 minutes, and you are generating 2000 connections per second, so sooner or later a socket can't find any unused local port and you get that error. TIME_WAIT is designed exactly for this purpose; without it, your system might hijack a previous connection.
The second error means you have too many sockets open. On some systems there is a limit of 1K open files; maybe you just hit that limit due to lingering sockets and other open files. On Linux, you can change this limit using
ulimit -n 2048
But that's limited by a system-wide max value.
As sudo or root, edit the /etc/security/limits.conf file. At the end of the file, just above "# End of File", enter the following values:
* soft nofile 65535
* hard nofile 65535
This raises the limit on open files to 65535.

Threading in javax.websockets / Tyrus

I'm writing a Java app that sends and receives messages from a WebSocket server. When the app receives a message it might take some time to process it, therefore I'm trying to use multiple threads to receive messages. To my understanding Grizzly has selector threads as well as worker threads. By default there is 1 selector thread and 2 worker threads; in the following example I'm trying to increase those to 5 and 10 respectively.
In the example below I'm pausing the thread that calls the onMessage method for 10 seconds to simulate processing of the incoming information. The information comes in every second, therefore 10 threads should be able to handle the amount of traffic.
When I profile the run, only 1 selector thread and 2 worker threads are running. Furthermore, messages are only received at 10-second intervals, indicating that only 1 thread is handling the traffic, which I find very odd. During profiling, one worker thread, e.g. Grizzly(1), receives the first message sent. Then 10 seconds later Grizzly(2) receives the second message; after that Grizzly(2) keeps receiving the messages and Grizzly(1) does not perform any further actions.
Can someone please explain this odd behavior and how to change it so that, for example, 10 threads are constantly waiting in line for a message?
Main:
public static void main(String[] args) {
    WebsocketTextClient client = new WebsocketTextClient();
    client.connect();
    for (int i = 0; i < 60; i++) {
        client.send("Test message " + i);
        try {
            Thread.sleep(1000);
        } catch (Exception e) {
            System.out.println("Error sleeping!");
        }
    }
}
WebsocketTextClient.java:
import java.net.URI;

import javax.websocket.ClientEndpointConfig;
import javax.websocket.Endpoint;
import javax.websocket.EndpointConfig;
import javax.websocket.MessageHandler;
import javax.websocket.Session;

import org.glassfish.tyrus.client.ClientManager;
import org.glassfish.tyrus.client.ThreadPoolConfig;
import org.glassfish.tyrus.container.grizzly.client.GrizzlyClientProperties;

public class WebsocketTextClient {
    private ClientManager client;
    private ClientEndpointConfig clientConfig;
    WebsocketTextClientEndpoint endpoint;

    public WebsocketTextClient() {
        client = ClientManager.createClient();
        client.getProperties().put(GrizzlyClientProperties.SELECTOR_THREAD_POOL_CONFIG,
                ThreadPoolConfig.defaultConfig().setMaxPoolSize(5));
        client.getProperties().put(GrizzlyClientProperties.WORKER_THREAD_POOL_CONFIG,
                ThreadPoolConfig.defaultConfig().setMaxPoolSize(10));
    }

    public boolean connect() {
        try {
            clientConfig = ClientEndpointConfig.Builder.create().build();
            endpoint = new WebsocketTextClientEndpoint();
            client.connectToServer(endpoint, clientConfig, new URI("wss://echo.websocket.org"));
        } catch (Exception e) {
            return false;
        }
        return true;
    }

    public boolean disconnect() {
        return false;
    }

    public boolean send(String message) {
        endpoint.session.getAsyncRemote().sendText(message);
        return true;
    }

    private class WebsocketTextClientEndpoint extends Endpoint {
        Session session;

        @Override
        public void onOpen(Session session, EndpointConfig config) {
            System.out.println("Connection opened");
            this.session = session;
            session.addMessageHandler(new WebsocketTextClientMessageHandler());
        }
    }

    private class WebsocketTextClientMessageHandler implements MessageHandler.Whole<String> {
        @Override
        public void onMessage(String message) {
            System.out.println("Message received from " + Thread.currentThread().getName() + " " + message);
            try {
                Thread.sleep(10000);
            } catch (Exception e) {
                System.out.println("Error sleeping!");
            }
            System.out.println("Resuming");
        }
    }
}
What you appear to be asking is for WebSockets to be able to receive multiple messages sent by the same client connection, to process those messages in separate threads, and to send the responses when they are ready - which means, potentially out of order. This scenario can only happen if the client is multi-threaded.
To deal with multiple threads on the same WebSocket session would generally require the ability for WebSockets to multiplex the data going to and from the client. This is not currently a feature of WebSockets, but could certainly be built on top of it. However, multiplexing those client and server threads on a single channel introduces a fair bit of complexity, because you need to stop all the client and server threads from inadvertently overwriting or starving one another.
The Java spec for MessageHandler is perhaps a little ambiguous about the threading model;
https://docs.oracle.com/javaee/7/api/javax/websocket/MessageHandler.html says:
Each web socket session uses no more than one thread at a time to call its MessageHandlers.
But the important term here is "socket session". If your client is sending multiple messages within the same WebSocket session, the server side handler will execute within a single thread. This doesn't mean you can't do lots of interesting stuff within the thread, particularly if you're using Input/OutputStreams (or Writers) on both ends. It does mean that communication with the client is mediated by just one thread. If you want to multiplex the communication, you'd have to write something on top of the socket to do so; that would include developing your own threading model for dispatching the requests.
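As a rough illustration of that kind of dispatch model (just a sketch, and only valid if out-of-order processing is acceptable): the single MessageHandler thread hands each message to a worker pool owned by the client and returns immediately, so Grizzly's thread is free to read the next frame. ExecutorService and Executors are from java.util.concurrent:
private class WebsocketTextClientMessageHandler implements MessageHandler.Whole<String> {
    // Hypothetical worker pool owned by the client; size it for the expected processing load.
    private final ExecutorService workers = Executors.newFixedThreadPool(10);

    @Override
    public void onMessage(String message) {
        workers.submit(() -> {
            System.out.println("Processing on " + Thread.currentThread().getName() + ": " + message);
            try {
                Thread.sleep(10000); // simulate slow processing, as in the question
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}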
An easier solution would be to create a new Session for each client request. Each client request starts a session (ie, TCP connection), sends the data, and waits for the result. This gives you multiple MessageHandler threads - one per session, per the spec.
This is the most straightforward way to get multi-threading on the server side; any other approach will tend to need a multiplexing mechanism - which, depending on your use case, is perhaps not worth the effort, and certainly carries some complexity and risk.
If you are concerned about the number of sessions (TCP/HTTP connections) between client/s and server/s, you could consider creating a pool of Sessions on the client side, and use each client Session one at a time, returning the session to the pool whenever the client is done with it.
Finally, perhaps not directly relevant: I found that when I used Payara Micro to serve the WebSocket endpoint, I needed to set this:
<resources>
...
<managed-executor-service maximum-pool-size="200" core-pool-size="10" long-running-tasks="true" keep-alive-seconds="300" hung-after-seconds="300" task-queue-capacity="20000" jndi-name="concurrent/__defaultManagedExecutorService" object-type="system-all"></managed-executor-service>
The default ManagedExecutorService only provides a single thread. This appears to be the case in Glassfish as well. This had me running around for hours thinking that I didn't understand the threading model, when it was just the pool size that was confusing me.
