Hystrix circuitBreaker.sleepWindowInMilliseconds not working - java

I have a Spring Boot application that iteratively calls a MockServer instance via a Hystrix command with a fallback method.
The MockServer is configured to always respond with status code 500. Without setting circuitBreaker.sleepWindowInMilliseconds everything works fine: the call goes to the MockServer and then the fallback method is invoked.
After setting circuitBreaker.sleepWindowInMilliseconds to 5 minutes or so, I would expect that for 5 minutes no calls reach the MockServer and all calls go straight to the fallback method, but that's not the case.
It looks like the circuitBreaker.sleepWindowInMilliseconds configuration is ignored.
For instance, if I reconfigure the MockServer to reply with status code 200 while the iteration is still running, it immediately prints "mockservice response" without waiting 5 minutes.
In the Spring Boot main application class:
@RequestMapping("/iterate")
public void iterate() {
    for (int i = 1; i < 100; i++) {
        try {
            System.out.println(bookService.readingMockService());
            Thread.sleep(3000);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
In the Spring Boot service:
@HystrixCommand(groupKey = "ReadingMockService", commandKey = "ReadingMockService", threadPoolKey = "ReadingMockService", fallbackMethod = "reliableMock", commandProperties = {
        @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "300000") })
public String readingMockService() {
    URI uri = URI.create("http://localhost:1080/askmock");
    return this.restTemplate.getForObject(uri, String.class);
}
The MockServer is running on the same machine and is configured like this:
new MockServerClient("127.0.0.1", 1080).reset();
new MockServerClient("127.0.0.1", 1080)
        .when(request("/askmock"))
        .respond(response()
                .withStatusCode(500)
                .withBody("mockservice response")
                .applyDelay());

Found the problem:
The circuitBreaker.sleepWindowInMilliseconds property works together with circuitBreaker.requestVolumeThreshold.
If the threshold is not set explicitly it defaults to 20, meaning Hystrix will first attempt 20 calls the usual way, and only after that does the sleep window take effect and route everything to the fallback.
In other words, the circuit breaker opens only if the percentage of failed calls exceeds circuitBreaker.errorThresholdPercentage
and, at the same time, the total number of requests reaches circuitBreaker.requestVolumeThreshold, all within a window of metrics.rollingStats.timeInMilliseconds.
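For example, a minimal sketch (the threshold, percentage, and rolling-window values below are only illustrative, not taken from the question) that sets the related properties explicitly so the circuit opens after a handful of failed calls:
@HystrixCommand(groupKey = "ReadingMockService", commandKey = "ReadingMockService", threadPoolKey = "ReadingMockService",
        fallbackMethod = "reliableMock", commandProperties = {
        // open the circuit after as few as 5 requests in the rolling window ...
        @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "5"),
        // ... if at least 50% of them failed ...
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50"),
        // ... then keep it open (fallback only) for 5 minutes before a test request is allowed
        @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "300000"),
        // rolling window in which requests and failures are counted
        @HystrixProperty(name = "metrics.rollingStats.timeInMilliseconds", value = "10000") })
public String readingMockService() {
    URI uri = URI.create("http://localhost:1080/askmock");
    return this.restTemplate.getForObject(uri, String.class);
}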

From the docs:
https://github.com/Netflix/Hystrix/wiki/configuration#circuitBreaker.sleepWindowInMilliseconds
and by looking at the source code:
https://github.com/Netflix/Hystrix/blob/master/hystrix-core/src/main/java/com/netflix/hystrix/HystrixCommandProperties.java
Using
@HystrixProperty(name = "hystrix.command.ReadingMockService.circuitBreaker.sleepWindowInMilliseconds", value = "300000")
should work.

Related

Having trouble listen vert.x config change

I changed my config-test.json, but the application did not print "new config: ...", although the before-scan handler did print.
JsonObject jsonConfig = new JsonObject();
jsonConfig.put("path", "test.json");
ConfigStoreOptions config = new ConfigStoreOptions();
config.setType("file").setOptional(true).setConfig(jsonConfig);
ConfigRetrieverOptions options =
        new ConfigRetrieverOptions().addStore(config).setScanPeriod(5000);
ConfigRetriever configRetriever = ConfigRetriever.create(vertx, options);
configRetriever.setBeforeScanHandler(h -> {
    System.out.println("config:" + configRetriever.getCachedConfig());
});
configRetriever.listen(change -> {
    JsonObject newConfiguration = change.getNewConfiguration();
    System.out.println("new config:" + newConfiguration);
    JsonObject old = change.getPreviousConfiguration();
    System.out.println("old config:" + old);
});
The Javadoc of setBeforeScanHandler says:
Registers a handler called before every scan. This method is mostly used for logging purpose.
This means that in your case the program checks for changes in the JSON file every five seconds, as specified in .setScanPeriod(5000). If a change is found during one of those checks, the handler registered with setBeforeScanHandler fires first, followed by the listen handler.
The order of messages when there is a change will be:
config : {...}
new config:{...}
old config:{...}
The program does not register the config change immediately, but only on the regularly scheduled interval checks, as is stated in the setScanPeriod JavaDoc:
Configures the scan period, in ms. This is the time amount between two
checks of the configuration updates.
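If you just want to see the change picked up sooner, a minimal sketch reusing the config store from the question (the 1000 ms value is only illustrative) is to shorten the scan period:
ConfigRetrieverOptions options = new ConfigRetrieverOptions()
        .addStore(config)
        .setScanPeriod(1000); // scan the file every second instead of every five seconds
ConfigRetriever configRetriever = ConfigRetriever.create(vertx, options);
configRetriever.listen(change -> {
    // fires only after a scheduled scan detects a difference in the file
    System.out.println("new config:" + change.getNewConfiguration());
});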

"WFLYJMX0012: params and description have different lengths" when stopping Wildfly programmatically

In my code I used to stop WildFly (16.0.0.Final) programmatically like this:
[...]
Thread shutdownThread = new Thread() {
    @Override
    public void run() {
        try {
            MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
            try {
                logger.info("Stopping server...");
                ObjectName objectName = new ObjectName("jboss.as:management-root=server");
                mBeanServer.invoke(objectName, "shutdown", new Object[] { false, 60 }, new String[] { boolean.class.getName(), int.class.getName() });
            } catch (InstanceNotFoundException | ReflectionException | MBeanException | MalformedObjectNameException e) {
                logger.error("Failed to stop server, error msg is: " + e);
            }
        } catch (Exception e) {
            logger.error(e.getMessage(), e);
        }
    }
};
[...]
Curiously this stopped working. I get the following error message:
WFLYJMX0012: params and description have different lengths: java.lang.IllegalArgumentException:
Any idea on that?
Thanks a lot,
Kai
In WildFly 16, they added a third parameter for graceful shutdown.
mBeanServer.invoke(objectName, "shutdown", new Object[] { false, 60, 60 }, new String[] { boolean.class.getName(), int.class.getName(), int.class.getName() });
While the third parameter is listed as optional, we had the same error until we added it.
See https://wildscribe.github.io/WildFly/17.0/index.html for more information.
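For reference, a self-contained sketch of the corrected call (the class name is made up; the MBean name and 60-second timeouts are the ones used in the question above):
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Runs inside the WildFly server JVM, as in the original shutdown thread.
public final class ServerStopper {
    public static void shutdownGracefully() throws Exception {
        MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName("jboss.as:management-root=server");
        // restart = false, timeout = 60 s, suspend-timeout = 60 s;
        // passing all three arguments matches the MBean signature on WildFly 16+
        mBeanServer.invoke(objectName, "shutdown",
                new Object[] { false, 60, 60 },
                new String[] { boolean.class.getName(), int.class.getName(), int.class.getName() });
    }
}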
The error indicates that the actual arguments provided for the JMX operation do not match the signature of the operation itself.
In general, this can happen because you switched from a standalone to a domain configuration.
You will find a similar problem if you use the CLI: in that case you need to pass the name of the host to shut down as the first argument of the invocation, while the rest of the arguments (whether to restart, and the different timeouts) remain the same.
But in this concrete case (it is not entirely clear from your question) you probably switched to WildFly 16.0.0.Final from a previous version.
As you can see in the WildFly 16.0 Model Reference, or if you inspect the MBeans exposed in the JMX tree with JConsole or a similar tool, the shutdown operation now takes three parameters (the restart flag plus two timeouts).
As mentioned above, although the three arguments are not individually required, the MBean signature defines all of them, and all of them should be provided in the invoke call.

Drools timed rule execution via Java API

I want to create a time-based rule that is triggered every 5 minutes, and the Drools documentation states:
Conversely when the Drools engine runs in passive mode (i.e.: using fireAllRules instead of fireUntilHalt) by default it doesn’t fire consequences of timed rules unless fireAllRules isn’t invoked again. However it is possible to change this default behavior by configuring the KieSession with a TimedRuleExecutionOption as shown in the following example
KieSessionConfiguration ksconf = KieServices.Factory.get().newKieSessionConfiguration();
ksconf.setOption( TimedRuleExecutionOption.YES );
KieSession ksession = kbase.newKieSession(ksconf, null);
However, I am not accessing the KieSession object directly because I am using the Java REST API to send requests to a Drools project deployed on KieExecution Server like so (example taken directly from the Drools documentation):
public class MyConfigurationObject {

    private static final String URL = "http://localhost:8080/kie-server/services/rest/server";
    private static final String USER = "baAdmin";
    private static final String PASSWORD = "password#1";
    private static final MarshallingFormat FORMAT = MarshallingFormat.JSON;

    private static KieServicesConfiguration conf;
    private static KieServicesClient kieServicesClient;

    public static void initializeKieServerClient() {
        conf = KieServicesFactory.newRestConfiguration(URL, USER, PASSWORD);
        conf.setMarshallingFormat(FORMAT);
        kieServicesClient = KieServicesFactory.newKieServicesClient(conf);
    }

    public void executeCommands() {
        String containerId = "hello";
        System.out.println("== Sending commands to the server ==");
        RuleServicesClient rulesClient = kieServicesClient.getServicesClient(RuleServicesClient.class);
        KieCommands commandsFactory = KieServices.Factory.get().getCommands();
        Command<?> insert = commandsFactory.newInsert("Some String OBJ");
        Command<?> fireAllRules = commandsFactory.newFireAllRules();
        Command<?> batchCommand = commandsFactory.newBatchExecution(Arrays.asList(insert, fireAllRules));
        ServiceResponse<ExecutionResults> executeResponse = rulesClient.executeCommandsWithResults(containerId, batchCommand);
        if (executeResponse.getType() == ResponseType.SUCCESS) {
            System.out.println("Commands executed with success! Response: ");
            System.out.println(executeResponse.getResult());
        } else {
            System.out.println("Error executing rules. Message: ");
            System.out.println(executeResponse.getMsg());
        }
    }
}
So I'm a bit confused as to how I can pass this TimedRuleExecutionOption to the session.
I've already found a workaround by sending a FireAllRules command periodically (a rough sketch follows below), but I'd like to know if I can configure this session option so that I don't have to add periodic triggering for every timed event I want to create.
Also, I've tried using FireUntilHalt instead of FireAllRules, but to my understanding that command blocks the execution thread on the server and I have to send a HaltCommand at some point, all of which I would like to avoid since I have a multi-threaded client that sends events to the server.
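For reference, the periodic-trigger workaround mentioned above could look roughly like this (a sketch building on the client from the question; the 5-minute interval and the "hello" container id are just the values used earlier, and the java.util.concurrent and java.util.Collections imports are assumed):
// Periodically fires rules against the KIE Execution Server,
// assuming kieServicesClient has been initialized as shown above.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
    RuleServicesClient rulesClient = kieServicesClient.getServicesClient(RuleServicesClient.class);
    KieCommands commandsFactory = KieServices.Factory.get().getCommands();
    Command<?> fireAllRules = commandsFactory.newFireAllRules();
    Command<?> batch = commandsFactory.newBatchExecution(Collections.singletonList(fireAllRules));
    rulesClient.executeCommandsWithResults("hello", batch); // triggers any pending timed rules
}, 0, 5, TimeUnit.MINUTES);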
pass "-Ddrools.timedRuleExecution=true" while starting server instance where kie-server.war is deployed.
You can use the Drools cron timer. It acts as a timer and invokes the rule based on the cron expression. Example to execute a rule every 5 minutes:
rule "Send SMS every 5 minutes"
timer (cron:* 0/5 * * * ?)
when
$a : Event( )
then
end

Vert.x performance drop when starting with -cluster option

I'm wondering if anyone has experienced the same problem.
We have a Vert.x application whose ultimate purpose is to insert 600 million rows into a Cassandra cluster. We are testing the speed of Vert.x in combination with Cassandra by doing tests with smaller amounts.
If we run the fat jar (built with the Shade plugin) without the -cluster option, we are able to insert 10 million records in about a minute. When we add the -cluster option (eventually we will run the Vert.x application in a cluster) it takes about 5 minutes for 10 million records to insert.
Does anyone know why?
We know that the Hazelcast config will create some overhead, but we never thought it would be 5 times slower. This implies we would need 5 EC2 instances in a cluster to get the same result as 1 EC2 instance without the cluster option.
As mentioned, everything runs on EC2 instances:
2 Cassandra servers on t2.small
1 Vert.x server on t2.2xlarge
You are actually running into corner cases of the Vert.x Hazelcast cluster manager.
First of all, you are using a worker verticle to send your messages (30000001). Under the hood Hazelcast is blocking, and when you send a message from a worker, version 3.3.3 does not take that into account. We recently added a fix, https://github.com/vert-x3/issues/issues/75 (not present in 3.4.0.Beta1 but present in 3.4.0-SNAPSHOTS), that will improve this case.
Second, when you send all your messages at the same time, you run into another corner case that prevents the Hazelcast cluster manager from using its cache of the cluster topology. This topology cache is usually updated after the first message has been sent, and sending all the messages in one shot prevents the cache from being used (short explanation: HazelcastAsyncMultiMap#getInProgressCount will be > 0 and prevents the cache from being used), so you pay the penalty of an expensive lookup on every send (hence the cache).
If I use Bertjan's reproducer with 3.4.0-SNAPSHOT + Hazelcast and the following change (send one message to the destination, wait for the reply, and only then send all the remaining messages), I get a lot of improvement:
Without clustering: 5852 ms
With clustering, HZ 3.3.3: 16745 ms
With clustering, HZ 3.4.0-SNAPSHOT + initial message: 8609 ms
I also believe you should not use a worker verticle to send that many messages; instead, send them from an event loop verticle in batches, as in the sketch below. Perhaps you should explain your use case and we can think about the best way to solve it.
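A minimal sketch of that batching idea (this is not the poster's code; the chunk size is an arbitrary assumption, while the address, payload, and TestCluster1 class are carried over from the reproducer posted further down):
public class BatchingProviderVerticle extends AbstractVerticle {

    private static final int TOTAL = 30000000;
    private static final int BATCH_SIZE = 10000; // illustrative chunk size
    private int next = 1;

    @Override
    public void start() {
        sendNextBatch();
    }

    private void sendNextBatch() {
        int end = Math.min(next + BATCH_SIZE, TOTAL + 1);
        for (int i = next; i < end; i++) {
            vertx.eventBus().send("clustertest1",
                    Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
        }
        next = end;
        if (next <= TOTAL) {
            // yield back to the event loop between chunks instead of
            // flooding the bus with all 30 million sends in one shot
            vertx.runOnContext(v -> sendNextBatch());
        }
    }
}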
When you enable clustering (of any kind) in an application, you make your application more resilient to failures, but you also add a performance penalty.
For example your current flow (without clustering) is something like:
client ->
vert.x app ->
in memory same process eventbus (negligible) ->
handler -> cassandra
<- vert.x app
<- client
Once you enable clustering:
client ->
vert.x app ->
serialize request ->
network request cluster member ->
deserialize request ->
handler -> cassandra
<- serialize response
<- network reply
<- deserialize response
<- vert.x app
<- client
As you can see, there are many encode/decode operations required, plus several network calls, and all of this is added to your total request time.
In order to achieve the best performance you need to take advantage of locality: the closer you are to your data store, the faster it usually is.
Just to add the code of the project. I guess that would help.
Sender verticle:
public class ProviderVerticle extends AbstractVerticle {

    @Override
    public void start() throws Exception {
        IntStream.range(1, 30000001).parallel().forEach(i -> {
            vertx.eventBus().send("clustertest1", Json.encode(new TestCluster1(i, "abc", LocalDateTime.now())));
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
    }
}
And the inserter verticle:
public class ReceiverVerticle extends AbstractVerticle {

    private int messagesReceived = 1;
    private Session cassandraSession;

    @Override
    public void start() throws Exception {
        PoolingOptions poolingOptions = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 2)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 3)
                .setCoreConnectionsPerHost(HostDistance.REMOTE, 1)
                .setMaxConnectionsPerHost(HostDistance.REMOTE, 3)
                .setMaxRequestsPerConnection(HostDistance.LOCAL, 20)
                .setMaxQueueSize(32768)
                .setMaxRequestsPerConnection(HostDistance.REMOTE, 20);
        Cluster cluster = Cluster.builder()
                .withPoolingOptions(poolingOptions)
                .addContactPoints(ClusterSetup.SEEDS)
                .build();

        System.out.println("Connecting session");
        cassandraSession = cluster.connect("kiespees");
        System.out.println("Session connected:\n\tcluster [" + cassandraSession.getCluster().getClusterName() + "]");
        System.out.println("Connected hosts: ");
        cassandraSession.getState().getConnectedHosts().forEach(host -> System.out.println(host.getAddress()));

        PreparedStatement prepared = cassandraSession.prepare(
                "insert into clustertest1 (id, value, created) " +
                "values (:id, :value, :created)");
        PreparedStatement preparedTimer = cassandraSession.prepare(
                "insert into timer (name, created_on, amount) " +
                "values (:name, :createdOn, :amount)");

        BoundStatement timerStart = preparedTimer.bind()
                .setString("name", "clusterteststart")
                .setInt("amount", 0)
                .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
        cassandraSession.executeAsync(timerStart);

        EventBus bus = vertx.eventBus();
        System.out.println("Bus info: " + bus.toString());
        MessageConsumer<String> cons = bus.consumer("clustertest1");
        System.out.println("Consumer info: " + cons.address());
        System.out.println("Waiting for messages");

        cons.handler(message -> {
            TestCluster1 tc = Json.decodeValue(message.body(), TestCluster1.class);
            if (messagesReceived % 100000 == 0)
                System.out.println("Message received: " + messagesReceived);
            BoundStatement boundRecord = prepared.bind()
                    .setInt("id", tc.getId())
                    .setString("value", tc.getValue())
                    .setTimestamp("created", new Timestamp(new Date().getTime()));
            cassandraSession.executeAsync(boundRecord);
            if (messagesReceived % 100000 == 0) {
                BoundStatement timerStop = preparedTimer.bind()
                        .setString("name", "clusterteststop")
                        .setInt("amount", messagesReceived)
                        .setTimestamp("createdOn", new Timestamp(new Date().getTime()));
                cassandraSession.executeAsync(timerStop);
            }
            messagesReceived++;
            //message.reply("OK");
        });
    }

    @Override
    public void stop() throws Exception {
        super.stop();
        cassandraSession.close();
    }
}

SSH Server Identification never received - Handshake Deadlock [SSHJ]

We're having some trouble trying to implement a Pool of SftpConnections for our application.
We're currently using SSHJ (Schmizz) as the transport library, and facing an issue we simply cannot simulate in our development environment (but the error keeps showing randomly in production, sometimes after three days, sometimes after just 10 minutes).
The problem is, when trying to send a file via SFTP, the thread gets locked in the init method from schmizz' TransportImpl class:
@Override
public void init(String remoteHost, int remotePort, InputStream in, OutputStream out)
        throws TransportException {
    connInfo = new ConnInfo(remoteHost, remotePort, in, out);
    try {
        if (config.isWaitForServerIdentBeforeSendingClientIdent()) {
            receiveServerIdent();
            sendClientIdent();
        } else {
            sendClientIdent();
            receiveServerIdent();
        }
        log.info("Server identity string: {}", serverID);
    } catch (IOException e) {
        throw new TransportException(e);
    }
    reader.start();
}
isWaitForServerIdentBeforeSendingClientIdent is FALSE for us, so the client (us) sends its identification first, as appears in the logs:
"Client identity String: blabla"
Then it is receiveServerIdent's turn:
private void receiveServerIdent() throws IOException {
    final Buffer.PlainBuffer buf = new Buffer.PlainBuffer();
    while ((serverID = readIdentification(buf)).isEmpty()) {
        int b = connInfo.in.read();
        if (b == -1)
            throw new TransportException("Server closed connection during identification exchange");
        buf.putByte((byte) b);
    }
}
The thread never gets control back, as the server never replies with its identity. It seems the code is stuck in this while loop. No timeouts or SSH exceptions are thrown; my client just keeps waiting forever, and the thread is deadlocked.
This is the readIdentification method's impl:
private String readIdentification(Buffer.PlainBuffer buffer)
        throws IOException {
    String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
    if (ident.isEmpty()) {
        return ident;
    }
    if (!ident.startsWith("SSH-2.0-") && !ident.startsWith("SSH-1.99-"))
        throw new TransportException(DisconnectReason.PROTOCOL_VERSION_NOT_SUPPORTED,
                "Server does not support SSHv2, identified as: " + ident);
    return ident;
}
Seems like ConnectionInfo's inputstream never gets data to read, as if the server closed the connection (even if, as said earlier, no exception is thrown).
I've tried to simulate this error by saturating the negotiation, closing sockets while connecting, using conntrack to kill established connections while the handshake is being made, but with no luck at all, so any help would be HIGHLY appreciated.
: )
I bet the following code creates the problem:
String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
if (ident.isEmpty()) {
    return ident;
}
If IdentificationStringParser.parseIdentificationString() returns an empty string, it is returned to the caller. The caller keeps looping in while ((serverID = readIdentification(buf)).isEmpty()) since the string is always empty. The only way to break the loop would be if the call to int b = connInfo.in.read(); returned -1... but if the server keeps sending (or resending) data, this condition is never met.
If this is the case, I would add some kind of artificial way to detect it, like:
private String readIdentification(Buffer.PlainBuffer buffer, AtomicInteger numberOfAttempts)
        throws IOException {
    String ident = new IdentificationStringParser(buffer, loggerFactory).parseIdentificationString();
    numberOfAttempts.incrementAndGet();
    if (ident.isEmpty() && numberOfAttempts.intValue() < 1000) { // 1000 is an arbitrary limit
        return ident;
    } else if (numberOfAttempts.intValue() >= 1000) {
        throw new TransportException("Too many attempts to read the server ident");
    }
    if (!ident.startsWith("SSH-2.0-") && !ident.startsWith("SSH-1.99-"))
        throw new TransportException(DisconnectReason.PROTOCOL_VERSION_NOT_SUPPORTED,
                "Server does not support SSHv2, identified as: " + ident);
    return ident;
}
This way you would at least confirm that this is the case and can dig further into why .parseIdentificationString() returns an empty string.
Faced a similar issue where we would see:
INFO [net.schmizz.sshj.transport.TransportImpl : pool-6-thread-2] - Client identity string: blablabla
INFO [net.schmizz.sshj.transport.TransportImpl : pool-6-thread-2] - Server identity string: blablabla
But on some occasions there was no server response.
Our service would typically wake up and transfer several files simultaneously, one file per connection / thread.
The issue was in the sshd server config: we increased MaxStartups from its default value of 10 (we noticed the problems started shortly after batch sizes increased to above 10).
Default in /etc/ssh/sshd_config:
MaxStartups 10:30:100
Changed to:
MaxStartups 30:30:100
MaxStartups
Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon. Additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection. The default is 10:30:100. Alternatively, random early drop can be enabled by specifying the three colon separated values start:rate:full (e.g. "10:30:60"). sshd will refuse connection attempts with a probability of rate/100 (30%) if there are currently start (10) unauthenticated connections. The probability increases linearly and all connection attempts are refused if the number of unauthenticated connections reaches full (60).
If you cannot control the server, you might have to find a way to limit your concurrent connection attempts in your client code instead.
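For example, a minimal client-side sketch (not from the original poster; the permit count, timeouts, class name, and file paths are placeholders) that uses a Semaphore so only a few SSH handshakes run at once:
import java.util.concurrent.Semaphore;
import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.SFTPClient;

public class ThrottledSftpUploader {

    // Allow at most 8 concurrent connection attempts, staying below the
    // server's MaxStartups limit (8 is an assumption; tune it to your server).
    private static final Semaphore CONNECT_PERMITS = new Semaphore(8);

    public void upload(String host, String user, String password,
                       String localFile, String remotePath) throws Exception {
        CONNECT_PERMITS.acquire();          // block until a connection slot is free
        try (SSHClient ssh = new SSHClient()) {
            ssh.loadKnownHosts();
            ssh.setConnectTimeout(30000);   // fail instead of hanging forever
            ssh.setTimeout(30000);
            ssh.connect(host);
            ssh.authPassword(user, password);
            try (SFTPClient sftp = ssh.newSFTPClient()) {
                sftp.put(localFile, remotePath);
            }
        } finally {
            CONNECT_PERMITS.release();      // free the slot for the next thread
        }
    }
}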
