I have been working on a process that continuously monitors a distributed atomic long counter. It polls the counter every minute using the getCounter method of the class shown below. In fact, I have multiple threads running, each of which monitors a different counter (a DistributedAtomicLong) stored in ZooKeeper nodes. Each thread specifies the path of its counter via the parameters of the getCounter method.
public class TagserterZookeeperManager {
public enum ZkClient {
COUNTER("10.11.18.25:2181"); // Integration URL
private CuratorFramework client;
private ZkClient(String servers) {
Properties props = TagserterConfigs.ZOOKEEPER.getProperties();
String zkFromConfig = props.getProperty("servers", "");
if (zkFromConfig != null && !zkFromConfig.isEmpty()) {
servers = zkFromConfig.trim();
}
ExponentialBackoffRetry exponentialBackoffRetry = new ExponentialBackoffRetry(1000, 3);
client = CuratorFrameworkFactory.newClient(servers, exponentialBackoffRetry);
client.start();
}
public CuratorFramework getClient() {
return client;
}
}
public static String buildPath(String ... node) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < node.length; i++) {
if (node[i] != null && !node[i].isEmpty()) {
sb.append("/");
sb.append(node[i]);
}
}
return sb.toString();
}
public static DistributedAtomicLong getCounter(String taskType, int hid, String jobId, String countType) {
String path = buildPath(taskType, hid+"", jobId, countType);
Builder builder = PromotedToLock.builder().lockPath(path + "/lock").retryPolicy(new ExponentialBackoffRetry(10, 10));
DistributedAtomicLong count = new DistributedAtomicLong(ZkClient.COUNTER.getClient(), path, new RetryNTimes(5, 20), builder.build());
return count;
}
}
From within the threads, this is how I am calling this method:
DistributedAtomicLong counterTotal = TagserterZookeeperManager
.getCounter("testTopic", hid, jobId, "test");
Now it seems that after the threads have run for a few hours, at some point I start getting the following org.apache.zookeeper.KeeperException$ConnectionLossException inside the getCounter method, where it tries to read the count:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /contentTaskProd
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:215)
at org.apache.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:148)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:141)
at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.getCurrentValue(DistributedAtomicValue.java:254)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicValue.get(DistributedAtomicValue.java:91)
at org.apache.curator.framework.recipes.atomic.DistributedAtomicLong.get(DistributedAtomicLong.java:72)
...
I keep getting this exception from then on for a while, and I have the feeling it is causing some internal memory leak that eventually leads to an OutOfMemoryError, and the whole process bails out. Does anybody have any idea what the reason for this could be? Why would ZooKeeper suddenly start throwing the connection loss exception? After the process bails out, I can manually connect to ZooKeeper through another small console program that I have written (also using Curator) and everything looks fine there.
To monitor a node in ZooKeeper using Curator you can use a NodeCache. This won't solve your connection problems, but instead of polling the node once a minute you get a push event when it changes.
In my experience, the NodeCache handles disconnections and reconnections quite well.
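A minimal sketch of what that could look like, reusing the CuratorFramework client and the counter path from getCounter above (the NodeCache classes live in org.apache.curator.framework.recipes.cache; the listener body is just an illustration):
public static NodeCache watchCounter(CuratorFramework client, String path) throws Exception {
    final NodeCache cache = new NodeCache(client, path);
    cache.getListenable().addListener(new NodeCacheListener() {
        @Override
        public void nodeChanged() throws Exception {
            ChildData data = cache.getCurrentData(); // null if the node was deleted
            if (data != null) {
                // the DistributedAtomicLong keeps its value in the node's payload
                System.out.println("Counter changed, payload is " + data.getData().length + " bytes");
            }
        }
    });
    cache.start(true); // true = build the initial cache before returning
    return cache;
}
You would create one cache per counter path instead of polling every minute, and close() it when the monitoring thread stops.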
I'm using AWS CloudWatch PutLogEvents to store client-side logs in CloudWatch, and I'm using this reference:
https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html
Below is the sample implementation that I'm following for PutLogEvents, and I'm using a for-each loop to push the list of InputLogEvents to CloudWatch.
https://docs.aws.amazon.com/code-samples/latest/catalog/javav2-cloudwatch-src-main-java-com-example-cloudwatch-PutLogEvents.java.html
After continuously pushing logs for about one or two hours, the service crashes with an out-of-memory error. When I run the application locally it doesn't crash and runs for more than one or two hours, but I haven't run it locally long enough to pin down the crash time. I was wondering whether there is a memory leak issue with the AWS CloudWatch implementation, because before this implementation the service ran for months without any heap issues.
Here is my service implementation; I'm calling this putLogEvents method through the controller. My only suspect area is the cwLogDtos for-each loop, where I set inputLogEvent to null, so it should be garbage collected. Anyway, I have tested this without sending the list of DTOs and I still get the same OOM error.
@Override
public TransportDto putLogEvents(List<CWLogDto> cwLogDtos, UserType userType) throws Exception {
TransportDto transportDto = new TransportDto();
CloudWatchLogsClient logsClient = CloudWatchLogsClient.builder().region(Region.of(region))
.build();
PutLogEventsResponse putLogEventsResponse = putCWLogEvents(logsClient, userType, cwLogDtos);
logsClient.close();
transportDto.setResponse(putLogEventsResponse.sdkHttpResponse());
return transportDto;
}
private PutLogEventsResponse putCWLogEvents(CloudWatchLogsClient logsClient, UserType userType, List<CWLogDto> cwLogDtos) throws Exception{
DateTimeFormatter formatter = DateTimeFormatter.ofPattern(logStreamPattern);
String streamName = LocalDateTime.now().format(formatter);
String logGroupName = logGroupOne;
if(userType.equals(UserType.TWO))
logGroupName =logGroupTwo;
log.info("Total Memory before (in bytes): {}" , Runtime.getRuntime().totalMemory());
log.info("Free Memory before (in bytes): {}" , Runtime.getRuntime().freeMemory());
log.info("Max Memory before (in bytes): {}" , Runtime.getRuntime().maxMemory());
DescribeLogStreamsRequest logStreamRequest = DescribeLogStreamsRequest.builder()
.logGroupName(logGroupName)
.logStreamNamePrefix(streamName)
.build();
DescribeLogStreamsResponse describeLogStreamsResponse = logsClient.describeLogStreams(logStreamRequest);
// Assume that a single stream is returned since a specific stream name was specified in the previous request; if not, a new stream will be created.
String sequenceToken = null;
if(!describeLogStreamsResponse.logStreams().isEmpty()){
sequenceToken = describeLogStreamsResponse.logStreams().get(0).uploadSequenceToken();
describeLogStreamsResponse = null;
}
else{
CreateLogStreamRequest request = CreateLogStreamRequest.builder()
.logGroupName(logGroupName)
.logStreamName(streamName)
.build();
logsClient.createLogStream(request);
request = null;
}
// Build an input log message to put to CloudWatch.
List<InputLogEvent> inputLogEventList = new ArrayList<>();
for (CWLogDto cwLogDto : cwLogDtos) {
InputLogEvent inputLogEvent = InputLogEvent.builder()
.message(new ObjectMapper().writeValueAsString(cwLogDto))
.timestamp(System.currentTimeMillis())
.build();
inputLogEventList.add(inputLogEvent);
inputLogEvent = null;
}
log.info("Total Memory after (in bytes): {}" , Runtime.getRuntime().totalMemory());
log.info("Free Memory after (in bytes): {}" , Runtime.getRuntime().freeMemory());
log.info("Max Memory after (in bytes): {}" , Runtime.getRuntime().maxMemory());
// Specify the request parameters.
// Sequence token is required so that the log can be written to the
// latest location in the stream.
PutLogEventsRequest putLogEventsRequest = PutLogEventsRequest.builder()
.logEvents(inputLogEventList)
.logGroupName(logGroupName)
.logStreamName(streamName)
.sequenceToken(sequenceToken)
.build();
inputLogEventList = null;
logStreamRequest = null;
return logsClient.putLogEvents(putLogEventsRequest);
}
CWLogDto
@Data
@JsonInclude(JsonInclude.Include.NON_NULL)
public class CWLogDto {
private Long userId;
private UserType userType;
private LocalDateTime timeStamp;
private Integer logLevel;
private String logType;
private String source;
private String message;
}
Heap dump summary
Any help will be greatly appreciated.
So I have this gRPC Java server:
@Bean(initMethod = "start", destroyMethod = "shutdown")
public Server bodyShopGrpcServer(@Autowired BodyShopServiceInt bodyShopServiceInt) {
return ServerBuilder.forPort(bodyShopGrpcServerPort)
.executor(Executors.newFixedThreadPool(12))
.addService(new BodyShopServiceGrpcGw(bodyShopServiceInt))
.build();
}
...and this client:
long overallStart = System.nanoTime();
int iterations = 10000;
List<Long> results = new CopyOnWriteArrayList<>();
ExecutorService executorService = Executors.newFixedThreadPool(bodyShopGrpcThreadPoolSize);
ManagedChannel channel =
InProcessChannelBuilder.forName("bodyShopGrpcInProcessServer")
.executor(executorService)
.build();
BodyShopServiceGrpc.BodyShopServiceStub bodyShopServiceStub =
BodyShopServiceGrpc.newStub(channel);
for (int i = 0; i < iterations; i++) {
long start = System.nanoTime();
StreamObserver<MakeBodyResponse> responseObserver =
new StreamObserver<>() {
@Override
public void onNext(MakeBodyResponse makeBodyResponse) {
long stop = System.nanoTime();
results.add(stop - start);
}
@Override
public void onError(Throwable throwable) {
Status status = Status.fromThrowable(throwable);
logger.error("Error status: {}", status);
}
@Override
public void onCompleted() {}
};
bodyShopServiceStub.makeBody(
MakeBodyRequest.newBuilder()
.setBody(CarBody.values()[random.nextInt(CarBody.values().length)].toString())
.build(),
responseObserver);
}
channel
.shutdown()
.awaitTermination(
10, TimeUnit.SECONDS);
long sum = results.stream().reduce(0L, Math::addExact);
BigDecimal avg =
BigDecimal.valueOf(sum).divide(BigDecimal.valueOf(iterations), RoundingMode.HALF_DOWN);
long overallStop = System.nanoTime();
This gives me average round-trip latency and overall time for a batch of 10000.
Now what bothers me is that latency is ~30-50% of overall batch time.
I assume this is because all of the server threads are being assigned to serve client requests and there's no thread left in the pool to serve callbacks.
Is there a way to tune this? I mean, it doesn't seem possible to set different thread pools for requests and callbacks.
I know there's a streaming API in gRPC; is that the preferred/only way to reduce round-trip latency?
Thx @Eric Anderson, it did not occur to me.
Plotted the results and you're absolutely right:
latency plot
My assumption that the callback was waiting for an available thread was wrong; it's just that all the requests enter the system at the same time and therefore start measuring at the same time. In fact I was comparing sync vs. async values: while this measurement works for sync calls, for async calls it's clearly wrong.
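For completeness, here is roughly how I would cap the number of in-flight calls so the per-call timing isn't dominated by queueing (a sketch only, reusing the stub, request types and results list from the snippet above; the cap of 12 is an arbitrary assumption):
Semaphore inFlight = new Semaphore(12); // arbitrary cap, e.g. match the server thread pool size
for (int i = 0; i < iterations; i++) {
    inFlight.acquire(); // wait for a free slot before sending the next request
    long start = System.nanoTime();
    bodyShopServiceStub.makeBody(
        MakeBodyRequest.newBuilder()
            .setBody(CarBody.values()[random.nextInt(CarBody.values().length)].toString())
            .build(),
        new StreamObserver<MakeBodyResponse>() {
            @Override
            public void onNext(MakeBodyResponse makeBodyResponse) {
                results.add(System.nanoTime() - start); // latency without the queueing ahead of the call
            }
            @Override
            public void onError(Throwable throwable) {
                inFlight.release();
            }
            @Override
            public void onCompleted() {
                inFlight.release();
            }
        });
}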
I want to create a time-based rule that is triggered every 5 minutes, and the Drools documentation states that:
Conversely when the Drools engine runs in passive mode (i.e.: using fireAllRules instead of fireUntilHalt) by default it doesn’t fire consequences of timed rules unless fireAllRules isn’t invoked again. However it is possible to change this default behavior by configuring the KieSession with a TimedRuleExecutionOption as shown in the following example
KieSessionConfiguration ksconf = KieServices.Factory.get().newKieSessionConfiguration();
ksconf.setOption( TimedRuleExecutionOption.YES );
KieSession ksession = kbase.newKieSession(ksconf, null);
However, I am not accessing the KieSession object directly because I am using the Java REST API to send requests to a Drools project deployed on a KIE Execution Server, like so (example taken directly from the Drools documentation):
public class MyConfigurationObject {
private static final String URL = "http://localhost:8080/kie-server/services/rest/server";
private static final String USER = "baAdmin";
private static final String PASSWORD = "password#1";
private static final MarshallingFormat FORMAT = MarshallingFormat.JSON;
private static KieServicesConfiguration conf;
private static KieServicesClient kieServicesClient;
public static void initializeKieServerClient() {
conf = KieServicesFactory.newRestConfiguration(URL, USER, PASSWORD);
conf.setMarshallingFormat(FORMAT);
kieServicesClient = KieServicesFactory.newKieServicesClient(conf);
}
public void executeCommands() {
String containerId = "hello";
System.out.println("== Sending commands to the server ==");
RuleServicesClient rulesClient = kieServicesClient.getServicesClient(RuleServicesClient.class);
KieCommands commandsFactory = KieServices.Factory.get().getCommands();
Command<?> insert = commandsFactory.newInsert("Some String OBJ");
Command<?> fireAllRules = commandsFactory.newFireAllRules();
Command<?> batchCommand = commandsFactory.newBatchExecution(Arrays.asList(insert, fireAllRules));
ServiceResponse<ExecutionResults> executeResponse = rulesClient.executeCommandsWithResults(containerId, batchCommand);
if(executeResponse.getType() == ResponseType.SUCCESS) {
System.out.println("Commands executed with success! Response: ");
System.out.println(executeResponse.getResult());
} else {
System.out.println("Error executing rules. Message: ");
System.out.println(executeResponse.getMsg());
}
}
}
So I'm a bit confused as to how I can pass this TimedRuleExecutionOption to the session.
I've already found a workaround by sending a FireAllRules command periodically (sketched below), but I'd like to know if I can configure this session option so that I don't have to add periodic triggering for every timed event I want to create.
Also, I've tried using FireUntilHalt instead of FireAllRules, but to my understanding that command blocks the execution thread on the server and I would have to send a HaltCommand at some point, all of which I would like to avoid since I have a multi-threaded client that sends events to the server.
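For reference, the workaround I mention looks roughly like this (a minimal sketch using a ScheduledExecutorService and the same kieServicesClient and container id as above; the 5-minute interval mirrors the rule I want):
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    RuleServicesClient rulesClient = kieServicesClient.getServicesClient(RuleServicesClient.class);
    KieCommands commandsFactory = KieServices.Factory.get().getCommands();
    Command<?> fireAllRules = commandsFactory.newFireAllRules();
    Command<?> batch = commandsFactory.newBatchExecution(Collections.singletonList(fireAllRules));
    rulesClient.executeCommandsWithResults("hello", batch); // same container id as above
}, 0, 5, TimeUnit.MINUTES);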
pass "-Ddrools.timedRuleExecution=true" while starting server instance where kie-server.war is deployed.
You can use the Drools cron timer. It acts as a timer and invokes the rule based on the cron expression. Example to execute a rule every 5 minutes:
rule "Send SMS every 5 minutes"
timer (cron:0 0/5 * * * ?)
when
$a : Event( )
then
end
You can find an explanation here.
I am creating an app in Flink to:
Read Messages from a topic
Do some simple process on it
Write Result to a different topic
My code works; however, it does not run in parallel.
How do I do that?
It seems my code runs on only one thread/block.
On the Flink Web Dashboard:
App goes to running status
But, there is only one block shown in the overview subtasks
And Bytes Received / Sent, Records Received / Sent is always zero ( no Update )
Here is my code; please help me learn how to split my app so it can run in parallel. Also, am I writing the app correctly?
public class SimpleApp {
public static void main(String[] args) throws Exception {
// create execution environment INPUT
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment();
// event time characteristic
env_in.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// production Ready (Does NOT Work if greater than 1)
env_in.setParallelism(Integer.parseInt(args[0].toString()));
// configure kafka consumer
Properties properties = new Properties();
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("auto.offset.reset", "earliest");
// create a kafka consumer
final DataStream<String> consumer = env_in
.addSource(new FlinkKafkaConsumer09<>("test", new
SimpleStringSchema(), properties));
// filter data
SingleOutputStreamOperator<String> result = consumer.filter(new
FilterFunction<String>(){
@Override
public boolean filter(String s) throws Exception {
return s.substring(0, 2).contentEquals("PS");
}
});
// Process Data
// Transform String Records to JSON Objects
SingleOutputStreamOperator<JSONObject> data = result.map(new
MapFunction<String, JSONObject>()
{
@Override
public JSONObject map(String value) throws Exception
{
JSONObject jsnobj = new JSONObject();
if(value.substring(0, 2).contentEquals("PS"))
{
// 1. Raw Data
jsnobj.put("Raw_Data", value.substring(0, value.length()-6));
// 2. Comment
int first_index_comment = value.indexOf("$");
int last_index_comment = value.lastIndexOf("$") + 1;
// - set comment
String comment =
value.substring(first_index_comment, last_index_comment);
comment = comment.substring(0, comment.length()-6);
jsnobj.put("Comment", comment);
}
else {
jsnobj.put("INVALID", value);
}
return jsnobj;
}
});
// Write JSON to Kafka Topic
data.addSink(new FlinkKafkaProducer09<JSONObject>("localhost:9092",
"FilteredData",
new SimpleJsonSchema()));
env_in.execute();
}
}
My code does work, but it seems to run only on a single thread (one block shown) in the web interface (no data is passed, hence the bytes sent/received are never updated).
How do I make it run in parallel?
To run your job in parallel you can do 2 things:
Increase the parallelism of your job at the env level - i.e. do something like
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(4);
But this only increases parallelism on the Flink side after the data has been read, so if the source produces data faster than a single reader can consume, the job might not be fully utilized.
To fully parallelize your job, set up multiple partitions for your Kafka topic, ideally matching the amount of parallelism you want for your Flink job. So, you might want to do something like the following when creating your Kafka topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 3 --partitions 4 --topic test
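As a rough sketch of how the two settings fit together (reusing the properties, FlinkKafkaConsumer09 and SimpleStringSchema from the question; the operator names are just labels for the dashboard), assuming the topic was created with 4 partitions:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(4); // matches the number of Kafka partitions

DataStream<String> source = env
    .addSource(new FlinkKafkaConsumer09<>("test", new SimpleStringSchema(), properties))
    .name("kafka-source")
    .setParallelism(4); // one consumer instance per partition

source
    .filter(s -> s.length() >= 2 && s.substring(0, 2).contentEquals("PS"))
    .name("filter-PS"); // inherits the environment parallelism (4)
With 4 partitions and parallelism 4, each source subtask gets its own partition; with a single partition, only one subtask ever receives data.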
We're having some trouble trying to implement a pool of SFTP connections for our application.
We're currently using SSHJ (Schmizz) as the transport library, and we're facing an issue we simply cannot reproduce in our development environment (but the error keeps showing up randomly in production, sometimes after three days, sometimes after just 10 minutes).
The problem is, when trying to send a file via SFTP, the thread gets locked in the init method from schmizz' TransportImpl class:
@Override
public void init(String remoteHost, int remotePort, InputStream in, OutputStream out)
throws TransportException {
connInfo = new ConnInfo(remoteHost, remotePort, in, out);
try {
if (config.isWaitForServerIdentBeforeSendingClientIdent()) {
receiveServerIdent();
sendClientIdent();
} else {
sendClientIdent();
receiveServerIdent();
}
log.info("Server identity string: {}", serverID);
} catch (IOException e) {
throw new TransportException(e);
}
reader.start();
}
isWaitForServerIdentBeforeSendingClientIdent is FALSE for us, so first the client (us) sends its identification, as appears in the logs:
"Client identity String: blabla"
Then it's receiveServerIdent's turn:
private void receiveServerIdent() throws IOException
{
final Buffer.PlainBuffer buf = new Buffer.PlainBuffer();
while ((serverID = readIdentification(buf)).isEmpty()) {
int b = connInfo.in.read();
if (b == -1)
throw new TransportException("Server closed connection during identification exchange");
buf.putByte((byte) b);
}
}
The thread never gets control back, as the server never replies with its identity. It seems the code is stuck in this while loop. No timeouts or SSH exceptions are thrown; my client just keeps waiting forever and the thread stays blocked.
This is the readIdentification method's impl:
private String readIdentification(Buffer.PlainBuffer buffer)
throws IOException {
String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
if (ident.isEmpty()) {
return ident;
}
if (!ident.startsWith("SSH-2.0-") && !ident.startsWith("SSH-1.99-"))
throw new TransportException(DisconnectReason.PROTOCOL_VERSION_NOT_SUPPORTED,
"Server does not support SSHv2, identified as: " + ident);
return ident;
}
It seems like ConnInfo's InputStream never gets data to read, as if the server had closed the connection (even though, as said earlier, no exception is thrown).
I've tried to simulate this error by saturating the negotiation, closing sockets while connecting, and using conntrack to kill established connections while the handshake is being made, but with no luck at all, so any help would be HIGHLY appreciated. :)
I bet the following code creates the problem:
String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
if (ident.isEmpty()) {
return ident;
}
If IdentificationStringParser.parseIdentificationString() returns an empty string, it will be returned to the caller method. The caller method will keep looping in while ((serverID = readIdentification(buf)).isEmpty()) since the string is always empty. The only way to break the loop would be if the call to int b = connInfo.in.read(); returned -1... but if the server keeps sending data (or resending data) this condition is never met.
If this is the case, I would add some kind of artificial way to detect it, like:
private String readIdentification(Buffer.PlainBuffer buffer, AtomicInteger numberOfAttempts)
throws IOException {
String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
numberOfAttempts.incrementAndGet();
if (ident.isEmpty() && numberOfAttempts.intValue() < 1000) { // arbitrary retry limit
return ident;
} else if (ident.isEmpty() && numberOfAttempts.intValue() >= 1000) {
throw new TransportException("Too many attempts to read the server ident");
}
if (!ident.startsWith("SSH-2.0-") && !ident.startsWith("SSH-1.99-"))
throw new TransportException(DisconnectReason.PROTOCOL_VERSION_NOT_SUPPORTED,
"Server does not support SSHv2, identified as: " + ident);
return ident;
}
This way you would at least confirm that this is the case and could dig further into why .parseIdentificationString() returns an empty string.
Faced a similar issue where we would see:
INFO [net.schmizz.sshj.transport.TransportImpl : pool-6-thread-2] - Client identity string: blablabla
INFO [net.schmizz.sshj.transport.TransportImpl : pool-6-thread-2] - Server identity string: blablabla
But on some occasions, there was no server response.
Our service would typically wake up and transfer several files simultaneously, one file per connection / thread.
The issue was in the sshd server config: we increased MaxStartups from its default value of 10
(we noticed the problems started shortly after batch sizes increased above 10).
Default in /etc/ssh/sshd_config:
MaxStartups 10:30:100
Changed to:
MaxStartups 30:30:100
MaxStartups
Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10:30:100. Alternatively, random early drop can be enabled by specifying the three colon separated values start:rate:full (e.g. "10:30:60"). sshd will refuse connection attempts with a probability of rate/100 (30%) if there are currently start (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches full (60).
If you cannot control the server, you might have to find a way to limit your concurrent connection attempts in your client code instead.
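A minimal sketch of that kind of client-side throttling with SSHJ (net.schmizz.sshj), assuming a shared java.util.concurrent.Semaphore sized below the server's MaxStartups "start" value; host, credentials and paths are placeholders:
private static final Semaphore CONNECT_SLOTS = new Semaphore(8); // keep below MaxStartups "start"

void upload(String host, String user, String pass, String local, String remote) throws Exception {
    CONNECT_SLOTS.acquire(); // cap how many connections negotiate/authenticate at once
    try (SSHClient ssh = new SSHClient()) {
        ssh.addHostKeyVerifier(new PromiscuousVerifier()); // placeholder; verify host keys properly in production
        ssh.connect(host);
        ssh.authPassword(user, pass);
        try (SFTPClient sftp = ssh.newSFTPClient()) {
            sftp.put(local, remote);
        }
    } finally {
        CONNECT_SLOTS.release(); // free the slot when the transfer finishes (or fails)
    }
}
Releasing right after authentication succeeds would match the MaxStartups semantics more closely, but releasing at the end keeps the sketch simple.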