I'm a bit confused about how to use MergeHub.
I'm designing a flow graph that uses Flow.mapAsync(), where the given function creates another flow graph, runs it with Sink.ignore(), and returns the resulting CompletionStage as the value for Flow.mapAsync() to wait for. The nested flow emits its elements via the Sink obtained by materializing the MergeHub.
The issue is that I need to provide the Function which starts the nested flow to Flow.mapAsync() when I'm creating the top-level flow graph, but that Function needs access to the materialized value returned from materializing the result of MergeHub.of(). How do I get that materialized value before starting the flow graph?
The only way I can see right now is to have the Function block until the Sink has been provided (after starting the top-level flow graph), but that seems pretty hacky.
So, something like
class MapAsyncFunctor implements Function<T, CompletionStage<Done>> {...}
MapAsyncFunctor mapAsyncFunctor = new MapAsyncFunctor();
RunnableGraph<Sink<T, NotUsed>> graph = createGraph(mapAsyncFunctor);
Sink<T, NotUsed> sink = materializer.materialize(graph);
mapAsyncFunctor.setSink(sink); // Graph execution blocked in background in call to mapAsyncFunctor.apply() until this is done
Edit: I've created the following class
public final class Channel<T>
{
    private final Sink<T, NotUsed> m_channelIn;
    private final Source<T, NotUsed> m_channelOut;
    private final UniqueKillSwitch m_killSwitch;

    public Channel(Class<T> in_class, Materializer in_materializer)
    {
        final Source<T, Sink<T, NotUsed>> source = MergeHub.of(in_class);
        final Sink<T, Source<T, NotUsed>> sink = BroadcastHub.of(in_class);
        final Pair<Pair<Sink<T, NotUsed>, UniqueKillSwitch>, Source<T, NotUsed>> matVals =
            in_materializer.materialize(
                source.viaMat(KillSwitches.single(), Keep.both()).toMat(sink, Keep.both()));
        m_channelIn = matVals.first().first();
        m_channelOut = matVals.second();
        m_killSwitch = matVals.first().second();
    }

    public Sink<T, NotUsed> in()
    {
        return m_channelIn;
    }

    public Source<T, NotUsed> out()
    {
        return m_channelOut;
    }

    public void close()
    {
        m_killSwitch.shutdown();
    }
}
so that I can get a Source/Sink pair to use in building the graph. Is this a good idea? Will I 'leak' these channels if I don't explicitly close() them?
I'll only ever need to use .out() once for my use-case.
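For context, the intended usage is roughly this (a sketch, assuming a String channel and a materializer already in scope):

Channel<String> channel = new Channel<>(String.class, materializer);

// Any producer holding channel.in() can push elements into the hub:
Source.single("hello").runWith(channel.in(), materializer);

// The merged stream is consumed once via channel.out():
channel.out().runWith(Sink.foreach(System.out::println), materializer);

// Tear the hub down when finished:
channel.close();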
With MergeHub you always need to materialize the hub sink before doing anything else.
Sink<String, NotUsed> toConsumer = MergeHub.of(String.class, 16).to(consumer).run(materializer);
You can then share it with any bit of code that needs to run a stream into it. Following your snippet above, a possible approach is to pass the Sink to your functor at construction time:
class MapAsyncFunctor implements Function<T, CompletionStage<Done>> {
    private Sink<T, NotUsed> sink;

    public MapAsyncFunctor(Sink<T, NotUsed> sink) {
        this.sink = sink;
    }

    @Override
    public CompletionStage<Done> apply(T t) { /* run substream into sink */ }
}
MapAsyncFunctor mapAsyncFunctor = new MapAsyncFunctor(toConsumer);
// run your flow with mapAsync on the above functor
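Inside apply() you can then run each element's sub-stream into that sink. A minimal sketch (assuming a Materializer is also made available to the functor; alsoTo plus Sink.ignore is just one way to obtain a CompletionStage<Done>):

@Override
public CompletionStage<Done> apply(T t) {
    // Build the per-element sub-stream, tee its output into the MergeHub sink,
    // and complete when the sub-stream finishes.
    return Source.single(t)
        .alsoTo(sink)
        .runWith(Sink.ignore(), materializer);
}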
More info on MergeHub can be found in the docs.
I use Spring WebFlux (Project Reactor) and I'm facing the following problem:
I have to get some data from the db and use it to call another service, all in one stream. How do I do that?
public Mono<MyObj> saveObj(Mono<MyObj> obj) {
return obj
.flatMap(
ob->
Mono.zip(
repo1.save(
...),
repo2
.saveAll(...)
.collectList(),
repo3
.saveAll(...)
.collectList())
.map(this::createSpecificObject))
.doOnNext(item-> createObjAndCallAnotherService(item));
}
private void createObjAndCallAnotherService(Prot prot){
myRepository
.findById(
prot.getDomCred().stream()
.filter(Objects::nonNull)
.findFirst()
.map(ConfDomCred::getCredId)
.orElse(UUID.fromString("00000000-0000-0000-0000-000000000000")))
.doOnNext( //one value is returned from myRepository -> Flux<MyObjectWithNeededData>
confCred-> {//from this point the code is unreachable!!! - why????
Optional<ConfDomCred> confDomCred=
prot.getDomCreds().stream().filter(Objects::nonNull).findFirst();
confDomCred.ifPresent(
domCred -> {
ProtComDto com=
ProtComDto.builder()
.userName(confCred.getUsername())
.password(confCred.getPassword())
.build();
clientApiToAnotherService.callEndpintInAnotherService(com); //this is a client like Feign that invokes method in another service
});
});
}
UPDATE
When I invoke
Flux<MyObj> myFlux = myRepository
.findById(
prot.getDomCred().stream()
.filter(Objects::nonNull)
.findFirst()
.map(ConfDomCred::getCredId)
.orElse(UUID.fromString("00000000-0000-0000-0000-000000000000")));
myFlux.subscribe(e -> e.getPassword())
then the value is printed
UPDATE2
So as a recap: I think the code below is asynchronous/non-blocking - am I right?
In my ProtectionCommandService I had to use subscribe() twice; only then can I call my other service and store my object there via commandControllerApi.createNewCommand.
public Mono<Protection> saveProtection(Mono<Protection> newProtection) {
return newProtection.flatMap(
protection ->
Mono.zip(
protectorRepository.save(//some code),
domainCredentialRepository
.saveAll(//some code)
.collectList(),
protectionSetRepository
.saveAll(//some code)
.collectList())
.map(this::createNewObjectWrapper)
.doOnNext(protectionCommandService::createProtectionCommand));
}
ProtectionCommandService class:
public class ProtectionCommandService {
private final ProtectionCommandStrategyFactory protectionCommandFactory;
private final CommandControllerApi commandControllerApi;
public Mono<ProtectionObjectsWrapper> createProtectionCommand(
ProtectionObjectsWrapper protection) {
ProductType productType = protection.getProtector().getProductType();
Optional<ProtectionCommandFactory> commandFactory = protectionCommandFactory.get(productType);
commandFactory
.get()
.createCommandFromProtection(protection)
.subscribe(command -> commandControllerApi.createNewCommand(command).subscribe());
return Mono.just(protection);
}
}
And one of the two factories:
@Component
@AllArgsConstructor
@Slf4j
public class VmWareProtectionCommandFactory implements ProtectionCommandFactory {
private static final Map<ProductType, CommandTypeEnum> productTypeToCommandType =
ImmutableMap.of(...//some values);
private final ConfigurationCredentialRepository configurationCredentialRepository;
@Override
public Mono<CommandDetails> createCommandFromProtection(ProtectionObjectsWrapper protection) {
Optional<DomainCredential> domainCredential =
protection.getDomainCredentials().stream().findFirst();
return configurationCredentialRepository
.findByOwnerAndId(protection.getOwner(), domainCredential.get().getCredentialId())
.map(credential -> createCommand(protection, credential, domainCredential.get()));
}
and the createCommand method returns a Mono as the result of this factory:
private Mono<CommandDetails> createCommand(Protection protection /*, other parameters */) {
    CommandDto commandDto = buildCommandDto(protection, confCredential, domainCredentials);
    String commands = JsonUtils.toJson(commandDto);
    CommandDetails details = new CommandDetails();
    details.setAgentId(protection.getProtector().getAgentId().toString());
    details.setCommandType(/* some value */);
    details.setArguments(/* some value */);
    return Mono.just(details);
}
UPDATE3
My main method that calls everything has been changed a little bit:
public Mono<MyObj> saveObj(Mono<MyObj> obj) {
return obj
.flatMap(
ob->
Mono.zip(
repo1.save(
...),
repo2
.saveAll(...)
.collectList(),
repo3
.saveAll(...)
.collectList())
.map(this::wrapIntoAnotherObject)
.flatMap(protectionCommandService::createProtectionCommand)
.map(this::createMyObj));
}
Stop breaking the chain
This is a pure function: it returns something, and it always returns the same something for whatever we give it. It has no side effects.
public Mono<Integer> fooBar(int number) {
return Mono.just(number);
}
We can call it and chain on, because it returns something.
fooBar(5).flatMap(number -> { ... }).subscribe();
This is a non-pure function; we can't chain on it, so we are breaking the chain. We can't subscribe to it, and nothing happens until we subscribe.
public void fooBar(int number) {
Mono.just(number);
}
fooBar(5).subscribe(); // compiler error
But I want a void function, I want, I want, I want.... wuuaaa wuaaaa
We always need something to be returned so that we can trigger the next part of the chain. How else would the program know when to run the next section? But let's say we want to ignore the return value and just trigger the next part. Well, we can then return a Mono<Void>.
public Mono<Void> fooBar(int number) {
System.out.println("Number: " + number);
return Mono.empty();
}
fooBar(5).subscribe(); // Will work, we have not broken the chain
Your example:
private void createObjAndCallAnotherService(Prot prot){
myRepository.findById( ... ) // breaking the chain, no return
}
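A rough sketch of the same method kept on the chain, returning the publisher so the caller can flatMap it (the names come from your question; wrapping the blocking client call in Mono.fromRunnable is just one option):

private Mono<Void> createObjAndCallAnotherService(Prot prot) {
    UUID credId = prot.getDomCreds().stream()
            .filter(Objects::nonNull)
            .findFirst()
            .map(ConfDomCred::getCredId)
            .orElse(UUID.fromString("00000000-0000-0000-0000-000000000000"));

    return myRepository.findById(credId)
            .flatMap(confCred -> {
                ProtComDto com = ProtComDto.builder()
                        .userName(confCred.getUsername())
                        .password(confCred.getPassword())
                        .build();
                // keep the side effect on the chain instead of firing it blindly
                return Mono.fromRunnable(() ->
                        clientApiToAnotherService.callEndpintInAnotherService(com));
            })
            .then();
}

The caller then replaces .doOnNext(item -> createObjAndCallAnotherService(item)) with .flatMap(item -> createObjAndCallAnotherService(item).thenReturn(item)).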
And some other tips:
Name your objects properly, not MyObj, saveObj, myRepository.
Avoid long names like createObjAndCallAnotherService.
Follow single responsibility: createObjAndCallAnotherService is doing two things, hence the name.
Create private functions or helper functions to make your code more readable; don't inline everything.
UPDATE
You are still making the same mistake.
commandFactory // Here you are breaking the chain because you are ignoring the return type
.get()
.createCommandFromProtection(protection)
.subscribe(command -> commandControllerApi.createNewCommand(command)
.subscribe()); // DONT SUBSCRIBE you are not the consumer, the client that initiated the call is the subscriber
return Mono.just(protection);
What you want to do is:
return commandFactory.get()
.createCommandFromProtection(protection)
.flatMap(command -> commandControllerApi.createNewCommand(command))
.thenReturn(protection);
Stop breaking the chain, and don't subscribe unless your service is the final consumer, or the one initiating a call.
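Applied to the service above, the whole method might look like this (a sketch that assumes commandControllerApi.createNewCommand returns a Mono, and keeps the Optional-based factory lookup from the question):

public Mono<ProtectionObjectsWrapper> createProtectionCommand(ProtectionObjectsWrapper protection) {
    ProductType productType = protection.getProtector().getProductType();
    return protectionCommandFactory.get(productType)
            .map(factory -> factory.createCommandFromProtection(protection)
                    .flatMap(commandControllerApi::createNewCommand)
                    .thenReturn(protection))
            .orElse(Mono.just(protection)); // no factory registered for this product type
}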
I am trying to build a reactive pipeline using Java and Project Reactor, where the use case is that the application generates flow statuses (INIT, PROCESSING, SAVED, DONE) at different levels. The statuses must be emitted asynchronously to a Flux that is handled independently and separately from the main flow. I came across this link:
Spring WebFlux (Flux): how to publish dynamically
My sample flow is something like this:
public class StatusEmitterImpl implements StatusEmitter {
private final FluxProcessor<String, String> processor;
private final FluxSink<String> sink;
public StatusEmitterImpl() {
this.processor = DirectProcessor.<String>create().serialize();
this.sink = processor.sink();
}
@Override
public Flux<String> publisher() {
return this.processor.map(x -> x);
}
public void publishStatus(String status) {
sink.next(status);
}
}
public class Try {
public static void main(String[] args) {
StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
Flux.fromIterable(Arrays.asList("INIT", "DONE")).subscribe(x ->
statusEmitter.publishStatus(x));
statusEmitter.publisher().subscribe(x -> System.out.println(x));
}
}
The problem is that nothing is getting printed on the console. I cannot understand what I am missing.
DirectProcessor passes values to its registered Subscribers directly, without caching the signals. If there is no Subscriber, then the value is "forgotten". If a Subscriber comes in late, then it will only receive signals emitted after it subscribed.
That's what is happening here: because fromIterable works on an in-memory collection, it has time to push all values to the DirectProcessor, which by that time doesn't have a registered Subscriber yet.
If you invert the last two lines you should see something.
DirectProcessor is a hot publisher and doesn't buffer elements, so you should produce elements after it has a subscriber, like this:
public static void main(String[] args) {
StatusEmitterImpl statusEmitter = new StatusEmitterImpl();
statusEmitter.publisher().subscribe(x -> System.out.println(x));
Flux.fromIterable(Arrays.asList("INIT", "DONE")).subscribe(x -> statusEmitter.publishStatus(x));
}
Or use EmitterProcessor or UnicastProcessor instead of DirectProcessor.
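For example, only the constructor of StatusEmitterImpl would need to change (a sketch; EmitterProcessor buffers signals pushed before the first subscriber arrives, up to its bufferSize):

public StatusEmitterImpl() {
    // Early statuses such as INIT/DONE are kept until someone subscribes.
    this.processor = EmitterProcessor.<String>create().serialize();
    this.sink = processor.sink();
}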
I am working on measuring my application metrics using the class below, in which I increment and decrement metrics.
public class AppMetrics {
private final AtomicLongMap<String> metricCounter = AtomicLongMap.create();
private static class Holder {
private static final AppMetrics INSTANCE = new AppMetrics();
}
public static AppMetrics getInstance() {
return Holder.INSTANCE;
}
private AppMetrics() {}
public void increment(String name) {
metricCounter.getAndIncrement(name);
}
public AtomicLongMap<String> getMetricCounter() {
return metricCounter;
}
}
I am calling the increment method of the AppMetrics class from multithreaded code to increment the metrics by passing the metric name.
Problem Statement:
Now I want to have a metricCounter for each clientId, which is a String. That means we can get the same clientId multiple times, and sometimes it will be a new clientId, so somehow I need to look up the metricCounter map for that clientId and increment metrics on that particular map (which is the part I am not sure how to do).
What is the right way to do that, keeping in mind that it has to be thread safe and perform atomic operations? I was thinking of using a map like this instead:
private final Map<String, AtomicLongMap<String>> clientIdMetricCounterHolder = Maps.newConcurrentMap();
Is this the right way? If yes, how can I populate this map so that the clientId is its key and its value is the counter map for that client's metrics?
I am on Java 7.
If you use a map then you'll need to synchronize on the creation of new AtomicLongMap instances. I would recommend using a LoadingCache instead. You might not end up using any of the actual "caching" features, but the "loading" feature is extremely helpful, as it will synchronize the creation of AtomicLongMap instances for you, e.g.:
LoadingCache<String, AtomicLongMap<String>> clientIdMetricCounterCache =
CacheBuilder.newBuilder().build(new CacheLoader<String, AtomicLongMap<String>>() {
@Override
public AtomicLongMap<String> load(String key) throws Exception {
return AtomicLongMap.create();
}
});
Now you can safely start updating metric counts for any client without worrying about whether the client is new or not, e.g.:
clientIdMetricCounterCache.get(clientId).incrementAndGet(metricName);
A Map<String, Map<String, T>> is just a Map<Pair<String, String>, T> in disguise. Create a MultiKey class:
class MultiKey {
public String clientId;
public String name;
// be sure to add hashCode and equals
}
Then just use an AtomicLongMap<MultiKey>.
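Fleshed out slightly, a sketch with value-based equals/hashCode and a convenience constructor (Java 7 friendly), plus how an increment would look:

final class MultiKey {
    final String clientId;
    final String name;

    MultiKey(String clientId, String name) {
        this.clientId = clientId;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MultiKey)) return false;
        MultiKey other = (MultiKey) o;
        return clientId.equals(other.clientId) && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return 31 * clientId.hashCode() + name.hashCode();
    }
}

// Usage: one shared map, keyed by (clientId, metric name):
AtomicLongMap<MultiKey> metrics = AtomicLongMap.create();
metrics.incrementAndGet(new MultiKey(clientId, metricName));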
Edited:
Provided the set of metrics is well defined, it wouldn't be too hard to use this data structure to view metrics for one client:
Set<String> possibleMetrics = // all the possible values for "name"
Map<String, Long> getMetricsForClient(String client) {
return Maps.asMap(possibleMetrics, m -> metrics.get(new MultiKey(client, m)));
}
The returned map will be a live unmodifiable view. It might be a bit more verbose if you're using an older Java version, but it's still possible.
I'm trying to create some tests to verify data going through an Apache Storm topology (using the Trident API).
I've created this simple filter to access callbacks:
public class CallbackFilter extends BaseFilter {
private final TupleCallback callback;
public CallbackFilter(TupleCallback callback) {
this.callback = callback;
}
@Override
public boolean isKeep(TridentTuple tuple) {
if (callback != null) {
callback.callback(tuple);
}
return true;
}
public interface TupleCallback extends Serializable{
void callback(TridentTuple tuple);
}
}
If I try this, I get a runtime exception saying CountDownLatch is not serializable:
@Test
public void testState() throws Exception {
CountDownLatch latch = new CountDownLatch(4);
TridentTopology tridentTopology = new TridentTopology();
FeederBatchSpout spout = ...
TridentState state = ...
// problematic code:
CallbackFilter.TupleCallback callback = (CallbackFilter.TupleCallback & Serializable) tuple -> {
System.out.println("tuple = " + tuple);
latch.countDown(); //latch is not serializable - exception!
};
CallbackFilter latchFilter = new CallbackFilter(callback);
tridentTopology.stuff()
.each(new Fields("foo", "bar"), latchFilter);
...
So it appears Storm is serializing all of the components of a topology and then submitting them in the serialized form, probably for clustering or whatnot.
Is there any way of getting a callback from Storm to the calling test? Maybe some sort of test mode that doesn't serialize the topology? It's kinda hard to see what is going on inside the topology from a test point of view, especially at each stage of a topology.
Update:
Even doing something like this doesn't work!
List<TridentTuple> tupleList = new ArrayList<>();
CallbackFilter.TupleCallback callback = (CallbackFilter.TupleCallback & Serializable) tuple -> {
tupleList.add(tuple);
};
I see the tupleList being added to in the debugger, but in the space of the test, the list stays zero. It's like the topology is running in its own JVM.
There's some domain knowledge/business logic baked into the problem I'm trying to solve but I'll try to boil it down to the basics as much as possible.
Say I have an interface defined as follows:
public interface Stage<I, O> {
StageResult<O> process(StageResult<I> input) throws StageException;
}
This represents a stage in a multi-stage data processing pipeline, my idea is to break the data processing steps into sequential (non-branching) independent steps (such as read from file, parse network headers, parse message payloads, convert format, write to file) represented by individual Stage implementations. Ideally I'd implement a FileInputStage, a NetworkHeaderParseStage, a ParseMessageStage, a FormatStage, and a FileOutputStage, then have some sort of
Stage<A, C> compose(Stage<A, B> stage1, Stage<B, C> stage2);
method such that I can eventually compose a bunch of stages into a final stage that looks like FileInput -> FileOutput.
Is this something (specifically the compose method, or a similar mechanism for aggregating many stages into one stage) even supported by the Java type system? I'm hacking away at it now and I'm ending up in a very ugly place involving reflection and lots of unchecked generic types.
Am I heading off in the wrong direction or is this even a reasonable thing to try to do in Java? Thanks so much in advance!
You didn't post enough implementation details to show where the type safety issues are, but here is my take on how you could address the problem:
First, don't make the whole thing too generic; make your stages specific regarding their inputs and outputs.
Then create a composite stage which implements Stage and combines two stages into one final result.
Here is a very simple implementation:
public class StageComposit<A, B, C> implements Stage<A, C> {
    final Stage<A, B> stage1;
    final Stage<B, C> stage2;

    public StageComposit(Stage<A, B> stage1, Stage<B, C> stage2) {
        this.stage1 = stage1;
        this.stage2 = stage2;
    }

    @Override
    public StageResult<C> process(StageResult<A> input) throws StageException {
        return stage2.process(stage1.process(input));
    }
}
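With that in place, the compose method from the question is just a thin wrapper (sketch):

public static <A, B, C> Stage<A, C> compose(Stage<A, B> stage1, Stage<B, C> stage2) {
    return new StageComposit<A, B, C>(stage1, stage2);
}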
Stage result
public class StageResult<O> {
final O result;
public StageResult(O result) {
this.result = result;
}
public O get() {
return result;
}
}
Example specific Stages:
public class EpochInputStage implements Stage<Long, Date> {
@Override
public StageResult<Date> process(StageResult<Long> input) {
return new StageResult<Date>(new Date(input.get()));
}
}
public class DateFormatStage implements Stage<Date, String> {
@Override
public StageResult<String> process(StageResult<Date> input) {
return new StageResult<String>(
new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
.format(input.get()));
}
}
public class InputSplitStage implements Stage<String, List<String>> {
@Override
public StageResult<List<String>> process(StageResult<String> input) {
return new StageResult<List<String>>(
Arrays.asList(input.get().split("[-:\\s]")));
}
}
And finally, a small test demonstrating how to combine them all:
public class StageTest {
@Test
public void process() throws Exception {
EpochInputStage efis = new EpochInputStage();
DateFormatStage dfs = new DateFormatStage();
InputSplitStage iss = new InputSplitStage();
Stage<Long, String> sc1 =
new StageComposit<Long, Date, String>(efis, dfs);
Stage<Long, List<String>> sc2 =
new StageComposit<Long, String, List<String>>(sc1, iss);
StageResult<List<String>> result =
sc2.process(new StageResult<Long>(System.currentTimeMillis()));
System.out.print(result.get());
}
}
Output for current time would be a list of strings
[2015, 06, 24, 16, 27, 55]
As you can see, there are no type safety issues or any type casts. When you need to handle other types of inputs and outputs, or convert them to suit the next stage, just write a new Stage and hook it up in your stage processing chain.
You may want to consider using a composite pattern or a decorator pattern. With the decorator, each stage wraps or decorates the previous stage. To do this, have each stage implement the interface as you are doing, and allow a stage to contain another stage.
The process() method does not need to accept a StageResult parameter anymore, since it can call the contained Stage's process() method itself, get the StageResult, perform its own processing, and return another StageResult.
One advantage is that you can restructure your pipeline at run time.
Each Stage that may contain another can extend the ComposableStage and each stage that is an end point of the process can extend the LeafStage. Note that I just used those terms to name the classes by function but you can create more imaginative names.
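A minimal sketch of that idea (ComposableStage and LeafStage are the illustrative names from above; StageResult and StageException come from the question):

// A leaf stage starts the pipeline on its own, e.g. reading from a file.
public abstract class LeafStage<O> {
    public abstract StageResult<O> process() throws StageException;
}

// A composable stage decorates another stage: it runs the wrapped stage
// first, then transforms that stage's result into its own.
public abstract class ComposableStage<I, O> extends LeafStage<O> {
    private final LeafStage<I> inner;

    protected ComposableStage(LeafStage<I> inner) {
        this.inner = inner;
    }

    protected abstract StageResult<O> transform(StageResult<I> input) throws StageException;

    @Override
    public final StageResult<O> process() throws StageException {
        return transform(inner.process());
    }
}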