Camel: PollEnrich generating a lot of Timed Waiting threads - java

I have this camel route
from("file:{{PATH_INPUT}}?charset=iso-8859-1&delete=true")
.process(new ProcessorName())
.pollEnrich().simple("${property.URI_FILE}", String.class).aggregationStrategy(new Estrategia()).timeout(10000).aggregateOnException(true)
.choice()
.when(simple("${property.result} == 'OK'"))
.to(URI_OUTPUT)
.endChoice();
This route takes a file from PATH_INPUT, compares it with the file at URI_FILE (I set the URI_FILE property in ProcessorName), and if the URI_FILE body contains specific data, the result is "OK" and the exchange is sent to URI_OUTPUT (ActiveMQ).
This works, but I later noticed that it creates a lot of threads in the TIMED_WAITING state, one for each exchange.
I don't know why this is happening. I have tried a ConsumerTemplate instead and the results are the same.

Yes this is expected if you generate a unique URI per endpoint you poll. I assume you generate a dynamic fileName which you specify in that URI, and that you see a thread per endpoint?
I have logged a ticket to make this easier in the future
https://issues.apache.org/jira/browse/CAMEL-11250
If you just want to set the message body to a specific file, the fastest and easiest way is to use setBody with the java.io.File type:
.setBody(simple("${property.URI_FILE}", java.io.File.class))

I ran into the same problem and hit a memory leak. As a workaround, I implemented my own org.apache.camel.spi.PollingConsumerPollStrategy, which captures the Consumer when polling begins (triggered by pollEnrich) and hands it to a bean that holds all of these consumers in a Map.
Then I added a timer route whose only job is to purge the Map: for each consumer it checks whether a time limit has been exceeded. If so, it stops the Consumer (which interrupts its thread) and removes it from the Map.
Like this:
from("direct://foo")
.to("an endpoint that returns the file name")
.pollEnrich()
.simple("file://{{app.runtime.draft.path}}"
+ "?fileName=${body}"
+ "&recursive=true"
+ "&delete=true"
+ "&pollStrategy=#myFilePollingStrategy" // my poll strategy
+ "&maxMessagesPerPoll=1")
.timeout(6 * 1000L)
.end()
.to("direct://a")
.to("direct://b")
.to("direct://c")
.end();
from("timer://file-consumer-purge?period=5s")
.bean(fileConsumerController, "purge")
.end();
@Component
public class FileConsumerController {
    private Map<Consumer, Long> mapConsumers = new ConcurrentHashMap<>();
    private static final long LIMIT = 25 * 1000L; // 25 seconds
    public void hold(Consumer consumer) {
        mapConsumers.put(consumer, System.currentTimeMillis());
    }
    public void purge() {
        mapConsumers.forEach((consumer, startTime) -> {
            if (System.currentTimeMillis() - startTime > LIMIT) {
                try {
                    consumer.stop();
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    mapConsumers.remove(consumer);
                }
            }
        });
    }
}
@Component
public class MyFilePollingStrategy extends DefaultPollingConsumerPollStrategy {
    @Autowired
    FileConsumerController fileConsumerController;
    @Override
    public boolean begin(Consumer consumer, Endpoint endpoint) {
        fileConsumerController.hold(consumer);
        return super.begin(consumer, endpoint);
    }
}
Notes:
I monitored the behavior through jconsole;
I've only overridden the begin() method and haven't tested the behavior in unexpected or error scenarios.
Hope this helps for now, and may the component be evolved. :)

Related

Multiple blocking calls wrapped in fromCallable in WebFlux

I'm using a Feign client in a reactive Java application. The Feign client has an interceptor that sends a blocking request to get an auth token and adds it as a header to the Feign request.
The Feign request is wrapped in Mono.fromCallable with Schedulers.boundedElastic().
My question is: is the inner call that gets the auth token considered a blocking call?
I get that both calls will run on a separate thread provided by Schedulers.boundedElastic(), but I'm not sure whether it is OK to execute them on the same thread or whether I should change the code so that they run on different threads.
Feign client:
@FeignClient(name = "remoteRestClient", url = "${remote.url}",
        configuration = AuthConfiguration.class, decode404 = true)
@Profile({ "!test" })
public interface RemoteRestClient {
    @GetMapping(value = "/getSomeData")
    Data getData();
}
interceptor:
public class ClientRequestInterceptor implements RequestInterceptor {
    private IAPRequestBuilder iapRequestBuilder;
    private String clientName;
    public ClientRequestInterceptor(String clientName, String serviceAccount, String jwtClientId) {
        this.iapRequestBuilder = new IAPRequestBuilder(serviceAccount, jwtClientId);
        this.clientName = clientName;
    }
    @Override
    public void apply(RequestTemplate template) {
        try {
            HttpRequest httpRequest = iapRequestBuilder.buildIapRequest(); // <-- blocking call
            template.header(HttpHeaders.AUTHORIZATION, httpRequest.getHeaders().getAuthorization());
        } catch (IOException e) {
            log.error("Building an IAP request has failed: {}", e.getMessage(), e);
            throw new InterceptorException(String.format("failed to build IAP request for %s", clientName), e);
        }
    }
}
feign configuration:
public class AuthConfiguration {
    @Value("${serviceAccount}")
    private String serviceAccount;
    @Value("${jwtClientId}")
    private String jwtClientId;
    @Bean
    public ClientRequestInterceptor getClientRequestInterceptor() {
        return new ClientRequestInterceptor("Entitlement", serviceAccount, jwtClientId);
    }
}
and feign client call:
private Mono<Data> getData() {
    return Mono.fromCallable(() -> RemoteRestClient.getData())
            .publishOn(Schedulers.boundedElastic());
}
You can usually tell that a call is blocking when it returns a concrete value rather than a deferred type such as a Future, Mono or Flux. To return a concrete value, the calling thread has to wait until the response is available.
So yes, it is most likely a blocking call.
Reactor recommends that you use the subscribeOn operator when doing blocking calls; this places the entire chain of operators on the scheduler you specify.
You have chosen to use publishOn, and it is worth pointing out the following from the docs:
affects where the subsequent operators execute
In practice this means that everything upstream of the publishOn operator runs on whichever thread subscribes to the chain.
Only the operators after it run on the thread pool you specified.
private Mono<Data> getData() {
    return Mono.fromCallable(() -> RemoteRestClient.getData())
            .publishOn(Schedulers.boundedElastic());
}
You placed publishOn after the callable, so the switch to the thread pool happens only after getData() has already been called.
The placement of publishOn in the chain matters, whereas subscribeOn affects the entire chain of operators, which means its placement does not matter.
So, to answer your question again: yes, it is most likely a blocking call (I can't be 100% sure since I have not looked at the source code), and whether you solve it with publishOn or subscribeOn is up to you.
Alternatively, check whether there is a reactive library you could use instead.
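The thread-placement point above can be illustrated without Reactor. The sketch below is a JDK-only analogy, not Reactor's API: CompletableFuture.supplyAsync stands in for handing the blocking call itself to a pool (roughly what subscribeOn achieves), while thenApplyAsync stands in for switching threads only after the call has already run (roughly what a trailing publishOn does). The class name, the pool name "blocking-pool" and blockingCall() are assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class WhereBlockingRuns {
    // Hypothetical pool with one named daemon thread so the result is easy to read.
    static ExecutorService newPool() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "blocking-pool");
            t.setDaemon(true);
            return t;
        });
    }

    // Stand-in for the blocking Feign call: reports the thread it ran on.
    static String blockingCall() {
        return Thread.currentThread().getName();
    }

    // The call itself is handed to the pool, so the caller's thread never runs it.
    static String callOnPool(ExecutorService pool) {
        return CompletableFuture.supplyAsync(WhereBlockingRuns::blockingCall, pool).join();
    }

    // The call runs on the caller's thread; only the stage after it moves to the pool.
    static String[] switchAfterCall(ExecutorService pool) {
        String callThread = blockingCall();
        String laterStageThread = CompletableFuture.completedFuture(callThread)
                .thenApplyAsync(ignored -> Thread.currentThread().getName(), pool)
                .join();
        return new String[] { callThread, laterStageThread };
    }

    public static void main(String[] args) {
        ExecutorService pool = newPool();
        System.out.println("call on pool:      " + callOnPool(pool));
        String[] threads = switchAfterCall(pool);
        System.out.println("switch after call: " + threads[0] + " then " + threads[1]);
        pool.shutdown();
    }
}
```

In the second method the blocking work has already happened on the caller's thread by the time the switch takes effect, which is the situation the answer describes for a publishOn placed after the callable.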

Executing Mono streams that no one subscribes to in Spring Web flux [duplicate]

This question already has an answer here:
What is the difference between block() , subscribe() and subscribe(-)
(1 answer)
Closed last year.
I have a spring Webflux application. There are two important parts to this application:
A job is scheduled to run at a fixed interval.
The job fetches the data from DB and stores the data in Redis.
void run() {
redisAdapter.getTtl()
.doOnError(RefreshExternalCache::logError)
.switchIfEmpty(Mono.defer(() -> {
log.debug(">> RefreshExternalCache > refreshExternalCacheIfNeeded => Remaining TTL could not be retrieved. Cache does not exist. " +
"Trying to create the cache.");
return Mono.just(Duration.ofSeconds(0));
}))
.subscribe(remainingTtl -> {
log.debug(">> RefreshExternalCache > refreshExternalCacheIfNeeded => original ttl for the cache: {} | ttl for cache in seconds = {} | ttl for cache in minutes = {}",
remainingTtl, remainingTtl.getSeconds(), remainingTtl.toMinutes());
if (isExternalCacheRefreshNeeded(remainingTtl, offerServiceProperties.getExternalCacheExpiration(), offerServiceProperties.getExternalCacheRefreshPeriod())) {
log.debug(">> RefreshExternalCache > refreshExternalCacheIfNeeded => external cache is up-to-date, skipping refresh");
} else {
log.debug(">> RefreshExternalCache > refreshExternalCacheIfNeeded => external cache is outdated, updating the external cache");
offerService.refreshExternalCache();
}
});
}
This basically calls another method called refreshExternalCache(), the implementation below:
public void refreshExternalCache() {
fetchOffersFromSource()
.doOnNext(offerData -> {
log.debug(LOG_REFRESH_CACHE + "Updating local offer cache with data from source");
localCache.put(OFFER_DATA_KEY, offerData);
storeOffersInExternalCache(offerData, offerServiceProperties.getExternalCacheExpiration());
})
.doOnSuccess(offerData -> meterRegistry.counter(METRIC_EXTERNAL_CACHE_REFRESH_COUNTER, TAG_OUTCOME, SUCCESS).increment())
.doOnError(sourceThrowable -> {
log.debug(LOG_REFRESH_CACHE + "Error while refreshing external cache {}", sourceThrowable.getMessage());
meterRegistry.counter(METRIC_EXTERNAL_CACHE_REFRESH_COUNTER, TAG_OUTCOME, FAILURE).increment();
}).subscribe();
}
Also, in the above method, you can see a call to storeOffersInExternalCache
public void storeOffersInExternalCache(OfferData offerData, Duration ttl) {
log.info(LOG_STORING_OFFER_DATA + "Storing the offer data in external cache...");
redisAdapter.storeOffers(offerData, ttl);
}
public void storeOffers(OfferData offerData, Duration ttl) {
Mono.fromRunnable(() -> redisClient.storeSerializedOffers(serializeFromDomain(offerData), ttl)
.doOnNext(status -> {
if (Boolean.TRUE.equals(status)) {
log.info(LOG_STORE_OFFERS + "Data stored in redis.");
meterRegistry.counter(METRIC_REDIS_STORE_OFFERS, TAG_OUTCOME, SUCCESS).increment();
} else {
log.error(LOG_STORE_OFFERS + "Unable to store data in redis.");
meterRegistry.counter(METRIC_REDIS_STORE_OFFERS, TAG_OUTCOME, FAILURE).increment();
}
}).retryWhen(Retry.backoff(redisRetryProperties.getMaxAttempts(), redisRetryProperties.getWaitDuration()).jitter(redisRetryProperties.getBackoffJitter()))
.doOnError(throwable -> {
meterRegistry.counter(METRIC_REDIS_STORE_OFFERS, TAG_OUTCOME, FAILURE).increment();
log.error(LOG_STORE_OFFERS + "Unable to store data in redis. Error: [{}]", throwable.getMessage());
})).subscribeOn(Schedulers.boundedElastic());
}
Redis Client
@Slf4j
@Component
public class RedisClient {
private final ReactiveRedisTemplate<String, String> reactiveRedisTemplate;
private final ReactiveValueOperations<String, String> reactiveValueOps;
public RedisClient(@Qualifier("reactiveRedisTemplate") ReactiveRedisTemplate<String, String> reactiveRedisTemplate) {
this.reactiveRedisTemplate = reactiveRedisTemplate;
this.reactiveValueOps = reactiveRedisTemplate.opsForValue();
}
Mono<Optional<String>> fetchSerializedOffers() {
return reactiveValueOps.get(OFFER_DATA_KEY).map(Optional::ofNullable);
}
Mono<Boolean> storeSerializedOffers(String serializedOffers, Duration ttl) {
return reactiveValueOps.set(OFFER_DATA_KEY, serializedOffers, ttl);
}
Mono<Duration> getTtl() {
return reactiveRedisTemplate.getExpire(OFFER_DATA_KEY);
}
}
Now my concerns are:
If I do not call subscribe on these Mono streams, the methods are not executed at all. This is fair, as nothing runs until someone subscribes.
If I understand correctly, subscribe is a blocking call. Doesn't that defeat the whole purpose of reactive programming?
I looked into several ways to make this work; one of them is shown above. I tried wrapping one of the method calls in Mono.fromRunnable, but I read in another Stack Overflow thread that this is not a good approach either.
So, is the approach above incorrect? How do we execute Mono streams that no one subscribes to?
Answering your concern number 2 (which seems to be the only real doubt in your question): not really. block() (https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Mono.html#block--) is the one that subscribes to a Mono and blocks the calling thread indefinitely until the next signal is received. subscribe() (https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Mono.html#subscribe--), on the other hand, also subscribes, but it does not block; it returns immediately and reacts when an element is emitted.
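The difference can be sketched with plain JDK types. This is an analogy rather than Reactor code: thenAccept plays the role of subscribe(consumer) by registering a callback and returning at once, and join() plays the role of block() by parking the calling thread until the result exists. The class name, the method name and the "ttl" value are made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class SubscribeVersusBlock {
    // Returns true if registering the callback returned before any value had been produced.
    static boolean callbackRegistrationDoesNotWait() {
        CountDownLatch release = new CountDownLatch(1);
        // The source cannot produce its value until release is counted down.
        CompletableFuture<String> source = CompletableFuture.supplyAsync(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ttl";
        });
        AtomicReference<String> seen = new AtomicReference<>();
        // Like subscribe(consumer): registers the reaction and returns immediately.
        CompletableFuture<Void> reaction = source.thenAccept(seen::set);
        boolean returnedBeforeValue = seen.get() == null;
        release.countDown();
        // Like block(): the calling thread waits here until the callback has run.
        reaction.join();
        return returnedBeforeValue && "ttl".equals(seen.get());
    }

    public static void main(String[] args) {
        System.out.println(callbackRegistrationDoesNotWait());
    }
}
```

Registering the callback does not hold the calling thread; only the explicit wait does.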

Kafka SpringBoot StreamListener - how to consume multiple topics in order?

I have multiple StreamListener-annotated methods consuming from different topics. But some of these topics need to be read from the "earliest" offset to populate an in-memory map (something like a state machine) and then consume from other topics that might have commands in them that should be executed against the "latest" state machine.
Current code looks something like:
@Component
@AllArgsConstructor
@EnableBinding({InputChannel.class, OutputChannel.class})
@Slf4j
public class KafkaListener {
    @StreamListener(target = InputChannel.EVENTS)
    public void event(Event event) {
        // do something with the event
    }
    @StreamListener(target = InputChannel.COMMANDS)
    public void command(Command command) {
        // do something with the command only after all events have been processed
    }
}
I tried to add some horrible code that gets the kafka topic offset metadata from the incoming event messages and then uses a semaphore to block the command until a certain percentage of the total offset is reached by the event. It kinda works but makes me sad, and it will be awful to maintain once we have 20 or so topics that all depend on one another.
Does SpringBoot / Spring Streams have any built-in mechanism to do this, or is there some common pattern that people use that I'm not aware of?
TL;DR: How do I process all messages from topic A before consuming any from topic B, without doing something dirty like sticking a Thread.sleep(60000) in the consumer for topic B?
See the kafka consumer binding property resetOffsets
resetOffsets
Whether to reset offsets on the consumer to the value provided by startOffset. Must be false if a KafkaRebalanceListener is provided; see Using a KafkaRebalanceListener.
Default: false.
startOffset
The starting offset for new groups. Allowed values: earliest and latest. If the consumer group is set explicitly for the consumer 'binding' (through spring.cloud.stream.bindings..group), 'startOffset' is set to earliest. Otherwise, it is set to latest for the anonymous consumer group. Also see resetOffsets (earlier in this list).
Default: null (equivalent to earliest).
You can also add a KafkaBindingRebalanceListener and perform seeks on the consumer.
EDIT
You can also set autoStartup to false on the second listener, and start the binding when you are ready. Here's an example:
@SpringBootApplication
@EnableBinding(Sink.class)
public class Gitter55Application {
    public static void main(String[] args) {
        SpringApplication.run(Gitter55Application.class, args);
    }
    @Bean
    public ConsumerEndpointCustomizer<KafkaMessageDrivenChannelAdapter<?, ?>> customizer() {
        return (endpoint, dest, group) -> {
            endpoint.setOnPartitionsAssignedSeekCallback((assignments, callback) -> {
                assignments.keySet().forEach(tp -> callback.seekToBeginning(tp.topic(), tp.partition()));
            });
        };
    }
    @StreamListener(Sink.INPUT)
    public void listen(String value, @Header(KafkaHeaders.RECEIVED_MESSAGE_KEY) byte[] key) {
        System.out.println(new String(key) + ":" + value);
    }
    @Bean
    public ApplicationRunner runner(KafkaTemplate<byte[], byte[]> template,
            BindingsEndpoint bindings) {
        return args -> {
            while (true) {
                template.send("gitter55", "foo".getBytes(), "bar".getBytes());
                System.out.println("Hit enter to start");
                System.in.read();
                bindings.changeState("input", State.STARTED);
            }
        };
    }
}
spring.cloud.stream.bindings.input.group=gitter55
spring.cloud.stream.bindings.input.destination=gitter55
spring.cloud.stream.bindings.input.content-type=text/plain
spring.cloud.stream.bindings.input.consumer.auto-startup=false

How to write Vertx worker verticle - indefinite blocking operation?

The following class is my worker verticle, in which I want to execute blocking code when a message arrives from the event bus on a channel named events-config.
The objective is to generate and publish JSON messages indefinitely until I receive a stop message on the events-config channel.
I am using executeBlocking to achieve this. However, since I am running the blocking operation indefinitely, the Vert.x blocked thread checker keeps logging warnings.
Questions:
- Is there a way to disable the blocked thread checker for a specific verticle only?
- Does the code below follow best practice for running an infinite loop on demand in Vert.x? If not, can you please suggest the best way to do this?
public class WorkerVerticle extends AbstractVerticle {
Logger logger = LoggerFactory.getLogger(WorkerVerticle.class);
private MessageConsumer<Object> mConfigConsumer;
AtomicBoolean shouldPublish = new AtomicBoolean(true);
private JsonGenerator json = new JsonGenerator();
@Override
public void start() {
mConfigConsumer = vertx.eventBus().consumer("events-config", message -> {
String msgBody = (String) message.body();
if (msgBody.contains(PublishOperation.START_PUBLISH.getName()) && !mJsonGenerator.isPublishOnGoing()) {
logger.info("Message received to start producing data onto kafka " + msgBody);
vertx.<Void>executeBlocking(voidFutureHandler -> {
Integer numberOfMessagesToBePublished = 100000;
if (numberOfMessagesToBePublished <= 0) {
logger.info("Skipping message publish :"+numberOfMessagesToBePublished);
return; // is it best way to do it ??
}
publishData(numberOfMessagesToBePublished);
},false, voidAsyncResult -> logger.info("Blocking publish operation is terminated"));
} else if (msgBody.contains(PublishOperation.STOP_PUBLISH.getName()) && mJsonGenerator.isPublishOnGoing()) {
logger.info("Message received to terminate " + msgBody);
mJsonGenerator.terminatePublish();
}
});
}
private void publishData(){
while(shouldPublish.get()){
//code to generate json indefinitely until some one reset shouldPublish variable
}
}
}
You don't want to use busy loops in your asynchronous code.
Use vertx.setPeriodic() or vertx.setTimer() instead:
vertx.setTimer(20, (l) -> {
// Generate your JSON
if (shouldPublish.get()) {
// Set timer again
}
});
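The re-arming pattern above can be tried out without Vert.x. The sketch below uses a JDK ScheduledExecutorService as a stand-in for vertx.setTimer (an assumption made to keep the example dependency-free): each tick does one unit of work and schedules the next tick only while shouldPublish is still set, so no thread is ever held in a loop. The stopAfter parameter is hypothetical and plays the role of the stop message so that the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class TimerDrivenPublisher {
    // Publishes one message per tick until the flag is cleared; returns the number published.
    static int publishUntilStopped(int stopAfter) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean shouldPublish = new AtomicBoolean(true);
        AtomicInteger published = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(1);
        Runnable[] tick = new Runnable[1]; // holder so the task can reschedule itself
        tick[0] = () -> {
            // "Generate your JSON" would go here; the demo only counts messages.
            if (published.incrementAndGet() >= stopAfter) {
                shouldPublish.set(false); // stands in for the stop message
            }
            if (shouldPublish.get()) {
                timer.schedule(tick[0], 1, TimeUnit.MILLISECONDS); // set the timer again
            } else {
                finished.countDown(); // flag cleared: do not re-arm the timer
            }
        };
        timer.schedule(tick[0], 1, TimeUnit.MILLISECONDS);
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdown();
        }
        return published.get();
    }

    public static void main(String[] args) {
        System.out.println(publishUntilStopped(5));
    }
}
```

Between ticks the timer thread is free, which is the property that keeps a blocked thread checker quiet in the timer-based design.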

Best way to make multiple asynchronous calls to same web service

The application I am working on consumes two REST web services in the following sequence:
1) Count Records - returns the number of records within a particular time frame.
2) Fetch Records - called once we have the number of records. This service returns at most 10K records per call. If the first service tells me that a particular time interval has 100K records, I need to call the second web service 10 times, page by page, given its threshold of 10K per call.
If I make 10 synchronous calls, my application will be too slow to respond, so I need a mechanism to make the calls asynchronously.
I am using the Spring framework in the back-end code, with RestTemplate for the web service calls. I am looking for the best way to make asynchronous calls to the POST web service mentioned above.
I have done some research and found the asynchronous method approach below useful:
https://spring.io/guides/gs/async-method/
Can you please tell me whether this is the right approach, or whether there is a better way to make asynchronous calls? Looking forward to your suggestions, thanks!
What @Journycorner linked is a good start, but it doesn't show the whole picture, as it only makes a single request. Working with Future is definitely the right path. Spring 4 offers an AsyncRestTemplate that returns a Future, which is exactly what you want to use.
I'm on my phone so I can't write up the full code, but this is roughly what you want to do.
@Component
public class SampleAsyncService {
private RestTemplate restTemplate;
private AsyncRestTemplate asyncRestTemplate;
@Value("${myapp.batchSize:1000}")
private int batchSize;
public SampleAsyncService(AsyncRestTemplate asyncRestTemplate, RestTemplate restTemplate) {
this.asyncRestTemplate = asyncRestTemplate;
this.restTemplate = restTemplate;
}
public List<Record> callForRecords() {
ResponseEntity<Integer> response = restTemplate.getForEntity("http://localhost:8081/countService",
Integer.class);
int totalRecords = response.getBody().intValue();
List<Future<ResponseEntity<List<Record>>>> futures = new ArrayList<Future<ResponseEntity<List<Record>>>>();
for (int offset = 0; offset < totalRecords;) {
ListenableFuture<ResponseEntity<List<Record>>> future = asyncRestTemplate.exchange(
"http://localhost:8081/records?startRow={startRow}&endRow={endRow}", HttpMethod.GET, null,
new ParameterizedTypeReference<List<Record>>() {
}, offset, batchSize);
futures.add(future);
offset = offset + batchSize;
}
int responses = 0;
List<Record> fullListOfRecords = new ArrayList<Record>();
while (responses < futures.size()) {
for (Future<ResponseEntity<List<Record>>> future : futures) {
if (future.isDone()) {
responses++;
try {
ResponseEntity<List<Record>> responseEntity = future.get();
fullListOfRecords.addAll(responseEntity.getBody());
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
}
}
return fullListOfRecords;
}
public class Record {
}
}
* Update * Created complete code sample.
This is just an improvement over @shawn's answer. With the implementation provided earlier, I sometimes ran into an issue caused by the block below:
while (responses < futures.size()) {
for (Future<ResponseEntity<List<Record>>> future : futures) {
if (future.isDone()) {
responses++;
try {
ResponseEntity<List<Record>> responseEntity = future.get();
fullListOfRecords.addAll(responseEntity.getBody());
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
}
Here, the outer while loop re-scans the whole list on every pass, so a future that is already done is visited again; its records are added more than once and the response ends up with duplicates. To avoid this, I added a callback to each future that builds the final response, which ensures that every response is added exactly once. The method still has to block until all the asynchronous calls have been processed before it returns the combined response, so it waits on a CountDownLatch that each callback counts down.
@Component
public class SampleAsyncService {
    private RestTemplate restTemplate;
    private AsyncRestTemplate asyncRestTemplate;
    @Value("${myapp.batchSize:1000}")
    private int batchSize;
    public SampleAsyncService(AsyncRestTemplate asyncRestTemplate, RestTemplate restTemplate) {
        this.asyncRestTemplate = asyncRestTemplate;
        this.restTemplate = restTemplate;
    }
    public List<Record> callForRecords() {
        ResponseEntity<Integer> response = restTemplate.getForEntity("http://localhost:8081/countService",
                Integer.class);
        int totalRecords = response.getBody().intValue();
        // Declared before the loop so the callbacks can reach it; synchronized
        // because callbacks may run on different threads.
        final List<Record> fullListOfRecords = Collections.synchronizedList(new ArrayList<Record>());
        // One count per page request; every callback counts down exactly once.
        final CountDownLatch pending = new CountDownLatch((totalRecords + batchSize - 1) / batchSize);
        for (int offset = 0; offset < totalRecords; offset += batchSize) {
            ListenableFuture<ResponseEntity<List<Record>>> future = asyncRestTemplate.exchange(
                    "http://localhost:8081/records?startRow={startRow}&endRow={endRow}", HttpMethod.GET, null,
                    new ParameterizedTypeReference<List<Record>>() {
                    }, offset, batchSize);
            future.addCallback(
                    new ListenableFutureCallback<ResponseEntity<List<Record>>>() {
                        @Override
                        public void onSuccess(ResponseEntity<List<Record>> pageResponse) {
                            try {
                                fullListOfRecords.addAll(pageResponse.getBody());
                                log.debug("Success: " + Thread.currentThread());
                            } finally {
                                pending.countDown();
                            }
                        }
                        @Override
                        public void onFailure(Throwable t) {
                            log.debug("Error: " + Thread.currentThread());
                            pending.countDown();
                        }
                    }
            );
        }
        try {
            // Block until every callback has run, so the combined list is complete.
            pending.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fullListOfRecords;
    }
    public class Record {
    }
}
If it really has to be asynchronous, using the Spring implementation seems like a sound idea.
I am assuming you need all 100K records in a single execution, for example to package the data into a file or to perform some business logic in one go.
If that is not the case, it would be wise to reconsider loading all the data in a single execution, which strains the JVM's memory and risks an OutOfMemoryError.
Assuming you do need it, async execution is an option: run the calls in parallel threads, then capture and collate the response from each. However, you need to cap the number of threads that run in parallel, using the thread pool size of a task executor.
Another reason to limit the pool size is to avoid overloading your partner's REST web service with too many parallel calls. Ultimately the partner web service loads its data from a database, which performs best up to a certain number of parallel queries. Hope this helps!
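The thread-pool cap described above can be sketched with JDK classes only (CompletableFuture and a fixed thread pool, not Spring's AsyncRestTemplate). The pool size limits how many calls are in flight at once, and each future is joined exactly once, in page order, so no page is collected twice. PagedFetcher, fetchPage and the parameter names are hypothetical; fetchPage stands in for the remote Fetch Records call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

final class PagedFetcher {
    // Fetches records [0, totalRecords) in pages of pageSize, with at most maxParallel calls at once.
    // fetchPage is a hypothetical stand-in for the remote call: (offset, limit) -> records.
    static <T> List<T> fetchAll(int totalRecords, int pageSize, int maxParallel,
                                BiFunction<Integer, Integer, List<T>> fetchPage) {
        if (pageSize <= 0 || maxParallel <= 0) {
            throw new IllegalArgumentException("pageSize and maxParallel must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<CompletableFuture<List<T>>> pages = new ArrayList<>();
            for (int offset = 0; offset < totalRecords; offset += pageSize) {
                final int start = offset;
                final int limit = Math.min(pageSize, totalRecords - offset);
                pages.add(CompletableFuture.supplyAsync(() -> fetchPage.apply(start, limit), pool));
            }
            List<T> all = new ArrayList<>();
            for (CompletableFuture<List<T>> page : pages) {
                all.addAll(page.join()); // each future is visited once; join() waits for it
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake service: page (offset, limit) contains the integers offset .. offset + limit - 1.
        List<Integer> records = fetchAll(25, 10, 3, (offset, limit) -> {
            List<Integer> page = new ArrayList<>();
            for (int i = offset; i < offset + limit; i++) {
                page.add(i);
            }
            return page;
        });
        System.out.println(records.size());
    }
}
```

Joining in page order keeps the combined list in the same order as the source, at the cost of waiting on the slowest early page first; since all pages are needed before returning anyway, the total wait is the same.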
