In Java, get the results from two blocks executing in parallel - java

Researching this has been a little difficult because I'm not sure how the question should be worded. Here is some pseudocode summarizing my goal.
public class TestService {
Object someBigMethod(String A, Integer I) {
{ //block A
//do some long database read
}
{ //block B
//do another long database read at the same time as block A
}
{ //block C
//get in this block when both A & B are complete
//and access result returned or pushed from A & B
//to build up some data object to push out to a class that called
//this service or has subscribed to it
return null;
}
}
}
I am thinking I can use RxJava or Spring Integration to accomplish this, or maybe just instantiate multiple threads and run them. The layout makes me think Rx has the solution, because I picture data being pushed to block C. Thanks in advance for any advice you might have.

You can do this with CompletableFuture, in particular its thenCombine method, which waits for two tasks to complete.
CompletableFuture<A> fa = CompletableFuture.supplyAsync(() -> {
// do some long database read
return a;
});
CompletableFuture<B> fb = CompletableFuture.supplyAsync(() -> {
// do another long database read
return b;
});
CompletableFuture<C> fc = fa.thenCombine(fb, (a, b) -> {
// use a and b to build object c
return c;
});
return fc.join();
By default, the supplyAsync tasks execute on ForkJoinPool.commonPool(). You can control where they run by passing in an optional Executor.
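As a minimal sketch of the Executor variant, the following runs both reads and the combining step on a dedicated pool. The class name, the readA/readB helpers, and the pool size are assumptions made for illustration; they stand in for the real database reads.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CombineOnExecutor {
    // Hypothetical stand-ins for the two long database reads.
    static String readA() { return "abc"; }
    static Integer readB() { return 123; }

    static String combine() {
        // A dedicated pool keeps blocking reads off ForkJoinPool.commonPool().
        ExecutorService dbPool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> fa =
                    CompletableFuture.supplyAsync(CombineOnExecutor::readA, dbPool);
            CompletableFuture<Integer> fb =
                    CompletableFuture.supplyAsync(CombineOnExecutor::readB, dbPool);
            // thenCombineAsync runs the combiner on the given executor
            // once both inputs have completed.
            CompletableFuture<String> fc = fa.thenCombineAsync(fb, (a, b) -> a + b, dbPool);
            return fc.join();
        } finally {
            dbPool.shutdown(); // release the pool threads when done
        }
    }

    public static void main(String[] args) {
        System.out.println(combine());
    }
}
```

In a long-lived service the pool would normally be created once and shared rather than created per call as in this sketch.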

You can use the zip operator from RxJava. When each source is subscribed on its own scheduler, the sources run in parallel and zip combines their results.
Documentation: http://reactivex.io/documentation/operators/zip.html
An example of how it works: https://github.com/politrons/reactive/blob/master/src/test/java/rx/observables/combining/ObservableZip.java

For now I just went with John's suggestion. This is getting the desired effect. I mix RxJava1 and RxJava2 syntax a bit, which is probably poor practice. It looks like I have some reading cut out for me on the java.util.concurrent package. Time permitting, I would like to try the zip solution.
@Test
public void myBigFunction(){
System.out.println("starting ");
CompletableFuture<List<String>> fa = CompletableFuture.supplyAsync( () ->
{ //block A
//do some long database read
try {
Thread.sleep(3000);
System.out.println("part A");
return asList(new String[] {"abc","def"});
} catch (InterruptedException e) {
e.printStackTrace();
}
return null;
}
);
CompletableFuture<List<Integer>> fb = CompletableFuture.supplyAsync( () ->
{ //block B
//do some long database read
try {
Thread.sleep(6000);
System.out.println("Part B");
return asList(new Integer[] {123,456});
} catch (InterruptedException e) {
e.printStackTrace();
}
return null;
}
);
CompletableFuture<List<String>> fc = fa.thenCombine(fb,(a,b) ->{
//block C
//get in this block when both A & B are complete
int sum = b.stream().mapToInt(i -> i.intValue()).sum();
return a.stream().map(new Function<String, String>() {
@Override
public String apply(String s) {
return s+sum;
}
}).collect(Collectors.toList());
});
System.out.println(fc.join());
}
It takes only 6 seconds to run, the duration of the longer task, rather than the 9 seconds a sequential run would need.
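One caveat with the test above: when a task is interrupted it returns null, and block C would then fail on the null list. A possible alternative, sketched here under assumptions, is to let the failure propagate and handle it once at the end. The class name, the failA flag, and the fallback string are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class FailurePropagation {
    static String run(boolean failA) {
        CompletableFuture<String> fa = CompletableFuture.supplyAsync(() -> {
            if (failA) {
                // Wrap and rethrow the checked exception instead of returning null.
                throw new CompletionException(new InterruptedException("read A interrupted"));
            }
            return "abc";
        });
        CompletableFuture<Integer> fb = CompletableFuture.supplyAsync(() -> 579);

        return fa.thenCombine(fb, (a, b) -> a + b)
                // If either input failed, the combiner is skipped and this fallback runs.
                .exceptionally(t -> {
                    Throwable cause = t.getCause() != null ? t.getCause() : t;
                    return "failed: " + cause.getMessage();
                })
                .join();
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```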

Related

Continuous state reduction with Flux

Let's say I have two event types (A and B) and Fluxes that generate them somehow:
Flux<A> aFlux = ...;
Flux<B> bFlux = ...;
and also a type that holds the current state denoted by type S:
class S {
final int val;
}
I want to create the following:
final S sInitial = ...;
Flux<S> sFlux = Flux.merge(aFlux, bFlux)
.scan((a, e) -> {
if(e instanceof A) {
return mapA(a, (A)e);
} else if(e instanceof B) {
return mapB(a, (B)e);
} else {
throw new RuntimeException("invalid event");
}
})
.startWith(sInitial);
where a (call it sCurr) is the instance of S that was last output by sFlux, starting with sInitial, and mapA / mapB return the new value of type S. Both S and sInitial are immutable.
That is, I want to:
Continuously output the latest state ...
... that is being generated ...
... based on the current state and the received event ...
... as prescribed by the mapper functions
Is there a way to reorganize the above stream flow in some other way, especially in order to avoid using instanceof?
You could add an interface and implement it in your A and B classes:
interface ToSConvertible {
S toS(S s);
}
Now you could use reactor.core.publisher.Flux#scan(A, java.util.function.BiFunction<A,? super T,A>) method:
Flux<S> sFlux = Flux.merge(aFlux, bFlux)
.scan(sInitial, (s, e) -> e.toS(s));
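To show the dispatch idea without a Reactor dependency, here is a plain-Java sketch that folds a list of events the way scan with a seed does: it emits the initial state and then every intermediate state. The event behaviours (A adds, B multiplies), the S constructor, and the scan helper are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanDispatchSketch {
    // Immutable state as in the question; the constructor is an assumption.
    static final class S {
        final int val;
        S(int val) { this.val = val; }
    }

    interface ToSConvertible {
        S toS(S current);
    }

    // Hypothetical event types: A adds to the state, B multiplies it.
    static final class A implements ToSConvertible {
        final int delta;
        A(int delta) { this.delta = delta; }
        public S toS(S current) { return new S(current.val + delta); }
    }

    static final class B implements ToSConvertible {
        final int factor;
        B(int factor) { this.factor = factor; }
        public S toS(S current) { return new S(current.val * factor); }
    }

    // Sequential counterpart of scan(initial, accumulator):
    // records the initial state, then each state produced by an event.
    static List<Integer> scan(S initial, List<ToSConvertible> events) {
        List<Integer> states = new ArrayList<>();
        S current = initial;
        states.add(current.val);
        for (ToSConvertible e : events) {
            current = e.toS(current); // polymorphic dispatch, no instanceof
            states.add(current.val);
        }
        return states;
    }

    public static void main(String[] args) {
        System.out.println(scan(new S(1), Arrays.asList(new A(2), new B(5))));
    }
}
```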

Parallelizing a fitness function in genetic algorithm

I'm a beginner in Java, and as homework I'm supposed to add concurrency to a genetic algorithm solution for the Travelling Salesman Problem posted here. Our goal is to have chromosome evaluation performed by threads. So my guess is I have to rewrite this part of the code to be multithreaded:
// Gets the best tour in the population
public Tour getFittest() {
Tour fittest = tours[0];
// Loop through individuals to find fittest
for (int i = 1; i < populationSize(); i++) {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
}
return fittest;
}
// Gets population size
public int populationSize() {
return tours.length;
}
Originally I intended to manually split the array between threads, but I believe it's not the best solution to the problem. So I did some research, and everyone suggests using either parallel streams or ExecutorService. However, I had trouble applying both of these solutions even though I tried to emulate examples posted in other threads. So my questions are: how exactly do I implement them in this case, and which one is faster?
Edit: Sorry, I forgot to post the solution I've tried. Here it is:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
});
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
However, when trying to run it I receive a "Local variable fittest defined in an enclosing scope must be final or effectively final" error at the line:
fittest = getTour(i);
And I have no clue why it's happening or how I can fix it, as adding the final keyword when initializing it does not fix it. Other than that, I have some doubts about using the synchronized keyword in this solution. I believe that to achieve true multithreading I need to make use of it because the resource is shared by various threads. Am I right? Sadly I didn't save my attempt at using streams, but I have trouble understanding how it works at all.
Edit2: I managed to "fix" my solution by adding two workarounds. Currently my code looks like this:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
final Integer innerI = new Integer(i);
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(innerI).getFitness()) {
setFitness(innerI, fittest);
}
}
);
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
public Tour setFitness (int i, Tour fittest) {
fittest = getTour(i);
return fittest;
}
That said, while it compiles, there are two problems. Memory usage keeps rising every second the program runs, maxing out my 16GB of RAM in about ten seconds, while the variable 'fittest' does not change at all. So I guess I'm still doing something wrong.
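Two things in the code above likely explain those symptoms. setFitness only reassigns its own parameter, so the caller's fittest never changes, and every call to getFittest creates a new thread pool that is never shut down, so its threads accumulate. The earlier compile error comes from the rule that a lambda may only capture local variables that are never reassigned. A minimal ExecutorService sketch that sidesteps all three is below; the Tour class, its reciprocal fitness, and the demo method are hypothetical stand-ins for the tutorial code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorFittestSketch {
    // Hypothetical stand-in for the tutorial's Tour class.
    static final class Tour {
        private final double distance;
        Tour(double distance) { this.distance = distance; }
        double getFitness() { return 1.0 / distance; } // assumed: reciprocal of length
    }

    static Tour getFittest(Tour[] tours, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        // Each task only computes a fitness value; no lambda writes a shared variable.
        List<Callable<Double>> tasks = new ArrayList<>();
        for (Tour t : tours) {
            tasks.add(t::getFitness);
        }
        // invokeAll blocks until every task is done and keeps the task order.
        List<Future<Double>> results = executor.invokeAll(tasks);

        // The comparison runs on the calling thread, so no synchronized block is needed.
        Tour fittest = tours[0];
        double best = results.get(0).get();
        for (int i = 1; i < tours.length; i++) {
            double fitness = results.get(i).get();
            if (fitness >= best) {
                best = fitness;
                fittest = tours[i];
            }
        }
        return fittest;
    }

    static double demo() {
        ExecutorService executor = Executors.newFixedThreadPool(4); // create once, reuse
        try {
            Tour[] tours = { new Tour(40), new Tour(10), new Tour(25) };
            return getFittest(tours, executor).getFitness();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown(); // without this the pool threads are never released
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether this beats a plain loop depends on how expensive getFitness is; for cheap evaluations the task overhead can dominate.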
Here is my streams implementation:
private static Tour getFittest(Tour[] tours){
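// A parallel forEach adds from several threads, so the list must be thread-safe.
List<Map.Entry<Tour,Double>> lengths = Collections.synchronizedList(new ArrayList<>());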
Arrays.stream(tours).parallel().forEach(t->lengths.add(new AbstractMap.SimpleEntry<Tour,Double>(t,t.getLength())));
return Collections.min(lengths,Comparator.comparingDouble(Map.Entry::getValue)).getKey();
}
On further inspection, this can be a one-liner, depending on your definition:
private static Tour getFittest(Tour[] tours) {
return Arrays.stream(tours).parallel().map(t -> new AbstractMap.SimpleEntry<Tour, Double>(t, t.getLength()))
.min(Comparator.comparingDouble(Map.Entry::getValue)).get().getKey();
}
Also, the original code uses .getFitness(), which is the reciprocal of length. If you use that, use .max() instead of .min().
Even better, after review:
return Arrays.stream(tours).parallel()
.min(Comparator.comparingDouble(Tour::getLength)).get();
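For the getFitness() variant mentioned above, a self-contained sketch could look like the following. The Tour class is a hypothetical stand-in with fitness assumed to be the reciprocal of length, and orElseThrow replaces the bare get() so an empty population fails with a clear message.

```java
import java.util.Arrays;
import java.util.Comparator;

class StreamFittestSketch {
    // Hypothetical stand-in for the tutorial's Tour class.
    static final class Tour {
        private final double length;
        Tour(double length) { this.length = length; }
        double getLength() { return length; }
        double getFitness() { return 1.0 / length; } // assumed: reciprocal of length
    }

    static Tour getFittest(Tour[] tours) {
        return Arrays.stream(tours).parallel()
                // Highest fitness wins, which matches the shortest tour.
                .max(Comparator.comparingDouble(Tour::getFitness))
                .orElseThrow(() -> new IllegalStateException("empty population"));
    }

    public static void main(String[] args) {
        Tour[] tours = { new Tour(40), new Tour(10), new Tour(25) };
        System.out.println(getFittest(tours).getLength());
    }
}
```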

How to rewrite following rx-java crawler

The crawler has a urlQueue that records URLs to crawl and a mock asynchronous URL fetcher.
I am trying to write it in RxJava style.
At first, I tried Flowable.generate like this:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer poll = demo.urlQueue.poll();
if (poll != null) {
e.onNext(poll);
} else if (runningCount.get() == 0) {
e.onComplete();
}
}).flatMap(i -> {
runningCount.incrementAndGet();
return demo.urlFetcher.asyncFetchUrl(i);
}, 10)
.doOnNext(page -> demo.onSuccess(page))
.subscribe(page -> runningCount.decrementAndGet());
But it doesn't work: at the beginning there may be only one seed in urlQueue, so generate is called 10 times but only one e.onNext is emitted. Only when that fetch finishes is the next request(1) -> generate issued.
Although the code specifies a flatMap maxConcurrency of 10, it crawls one URL at a time.
After that, I modified the code as follows, and it works as expected.
But in this code I have to track how many tasks are currently running and then calculate how many should be fetched from the queue, which I think RxJava should do for me.
I am not sure whether the code can be rewritten in a simpler way.
public class CrawlerDemo {
private static Logger logger = LoggerFactory.getLogger(CrawlerDemo.class);
// it can be redis queue or other queue
private BlockingQueue<Integer> urlQueue = new LinkedBlockingQueue<>();
private static AtomicInteger runningCount = new AtomicInteger(0);
private static final int MAX_CONCURRENCY = 5;
private UrlFetcher urlFetcher = new UrlFetcher();
private void addSeed(int i) {
urlQueue.offer(i);
}
private void onSuccess(Page page) {
page.links.forEach(i -> {
logger.info("offer more url " + i);
urlQueue.offer(i);
});
}
private void start(BehaviorProcessor processor) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
} else {
processor.onComplete();
}
}
private int dispatchMoreLink(BehaviorProcessor processor) {
int links = 0;
while (runningCount.get() <= MAX_CONCURRENCY) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
links++;
} else {
if (runningCount.get() == 0) {
processor.onComplete();
}
break;
}
}
return links;
}
private Flowable<Page> asyncFetchUrl(int i) {
return urlFetcher.asyncFetchUrl(i);
}
public static void main(String[] args) throws InterruptedException {
CrawlerDemo demo = new CrawlerDemo();
demo.addSeed(1);
BehaviorProcessor<Integer> processor = BehaviorProcessor.create();
processor
.flatMap(i -> {
runningCount.incrementAndGet();
return demo.asyncFetchUrl(i)
.doFinally(() -> runningCount.decrementAndGet())
.doFinally(() -> demo.dispatchMoreLink(processor));
}, MAX_CONCURRENCY)
.doOnNext(page -> demo.onSuccess(page))
.subscribe();
demo.start(processor);
}
}
class Page {
public List<Integer> links = new ArrayList<>();
}
class UrlFetcher {
static Logger logger = LoggerFactory.getLogger(UrlFetcher.class);
final ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
public Flowable<Page> asyncFetchUrl(Integer url) {
logger.info("start async get " + url);
return Flowable.defer(() -> emitter ->
scheduledExecutorService.schedule(() -> {
Page page = new Page();
// the website urls no more than 1000
if (url < 1000) {
page.links = IntStream.range(1, 5).boxed().map(j -> 10 * url + j).collect(Collectors.toList());
}
logger.info("finish async get " + url);
emitter.onNext(page);
emitter.onComplete();
}, 5, TimeUnit.SECONDS)); // cost 5 seconds to access url
}
}
You are trying to use regular (non-Rx) code with RxJava and not getting the results you want.
The first thing to do is to convert the urlQueue.poll() into a Flowable<Integer>:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer take = demo.urlQueue.take(); // Note 1
e.onNext(take); // Note 2
})
.observeOn(Schedulers.io(), false, 1) // Note 3 (RxJava 2 signature: scheduler, delayError, bufferSize)
.flatMap(i -> demo.urlFetcher.asyncFetchUrl(i), 10)
.subscribe(page -> demo.onSuccess(page));
Note 1: Reading the queue in a reactive way means a blocking wait. Trying to poll() the queue adds a layer of complexity that RxJava allows you to skip over.
Note 2: Pass the received value on to any subscribers. If you need to indicate completion, you will need to add an external boolean, or use an in-band indicator (such as a negative integer).
Note 3: The observeOn() operator subscribes to the generator. The value 1 is the buffer size, so only one item is requested from the generator at a time, since there is no point in prefetching more.
The rest of the code is similar to what you have. The issues that you have arose because the flatMap(..., 10) operation requests 10 items from the generator up front, which is not what you wanted. You want to limit the number of simultaneous fetches. Adding the runningCount was a kludge to prevent exiting the generator early, but it is not a substitute for a proper way to signal end-of-data on the urlQueue.
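For comparison only, the same two requirements (a cap on simultaneous fetches and an explicit end-of-data signal) can be sketched with plain java.util.concurrent instead of RxJava. This is not the approach from the answer: the fixed thread pool caps concurrency and a Phaser counts in-flight pages so the caller knows when the crawl is finished. fetchLinks is a hypothetical synchronous stand-in for UrlFetcher, and the threshold of 100 is chosen only to keep the demo small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

class BoundedCrawlerSketch {
    // Hypothetical stand-in for the fetcher: returns the links found on a page.
    static List<Integer> fetchLinks(int url) {
        List<Integer> links = new ArrayList<>();
        if (url < 100) {
            for (int j = 1; j < 5; j++) {
                links.add(10 * url + j);
            }
        }
        return links;
    }

    static Set<Integer> crawl(int seed, int maxConcurrency) {
        // The pool size is the concurrency limit; extra pages wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        Set<Integer> visited = ConcurrentHashMap.newKeySet();
        Phaser pending = new Phaser(1); // one party for the calling thread
        try {
            submit(seed, pool, visited, pending);
            // Returns once every registered page task has deregistered.
            pending.arriveAndAwaitAdvance();
            return visited;
        } finally {
            pool.shutdown();
        }
    }

    private static void submit(int url, ExecutorService pool,
                               Set<Integer> visited, Phaser pending) {
        if (!visited.add(url)) {
            return; // already scheduled
        }
        pending.register(); // count this page as in flight before it is queued
        pool.execute(() -> {
            try {
                for (int link : fetchLinks(url)) {
                    submit(link, pool, visited, pending);
                }
            } finally {
                pending.arriveAndDeregister(); // page done
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(crawl(1, 5).size());
    }
}
```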

RxJava flatMap and backpressure strange behavior

While writing a data synchronization job with RxJava I discovered a strange behavior that I cannot explain. I'm quite new to RxJava and would appreciate help.
Briefly, my job is quite simple: I have a list of element IDs, I call a web service to get each element by ID, do some processing, and make multiple calls to push the data to a DB.
Data loading is faster than data storing, so I encountered OutOfMemory errors.
My code looks pretty much like the "failing" test below, but while doing some tests I realized that removing the line:
flatMap(dt -> Observable.just(dt))
makes it work.
The failing test's output clearly shows that unconsumed items stack up, which leads to OutOfMemory. The working test's output shows that the producer always waits for the consumer, so this never leads to OutOfMemory.
public static class DataStore {
public Integer myVal;
public byte[] myBigData;
public DataStore(Integer myVal) {
this.myVal = myVal;
this.myBigData = new byte[1000000];
}
}
@Test
public void working() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
@Test
public void failing() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(dt -> Observable.just(dt))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
private Observable<DataStore> produce(final int value) {
return Observable.<DataStore>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(200); //Here I synchronous call WS to retrieve data
s.onNext(new DataStore(value));
s.onCompleted();
}
} catch (Exception e) {
s.onError(e);
}
}).subscribeOn(Schedulers.io());
}
private Observable<Boolean> consume(DataStore value) {
return Observable.<Boolean>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(1000); //Here I synchronous call DB to store data
s.onNext(true);
s.onCompleted();
}
} catch (Exception e) {
s.onNext(false);
s.onCompleted();
}
}).subscribeOn(Schedulers.io());
}
What is the explanation behind this behavior? How could I fix my failing test without removing the Observable.just(dt), which in my real case is an Observable.from(someListOfItems)?
By default, flatMap merges an unlimited number of sources. By applying that lambda without a maxConcurrent parameter, you effectively unbounded the upstream, which can now run at full speed and overwhelm the internal buffers of the other operators. Passing a maxConcurrent value to that flatMap as well keeps the upstream bounded.

How to call RxJava Observable without immediately subscribing?

I have a Java method that returns a string template. I want to make 2 async calls to a remote API; each call will return a number. Then I want to compute the sum of these 2 numbers and put it into the template before returning it.
So I have this java code to achieve this task :
private Observable<Integer> createObservable() {
Observable<Integer> obs = Observable.create(new OnSubscribe<Integer>() {
public void call(Subscriber<? super Integer> t) {
System.out.println("Call with thread : " + Thread.currentThread().getName());
//FAKE CALL TO REMOTE API => THE THREAD SLEEPS FOR 4 SECONDS
try {
Thread.sleep(4000);
} catch (InterruptedException e) {
e.printStackTrace();
}
t.onNext(new Random().nextInt(10));
t.onCompleted();
}
}).subscribeOn(Schedulers.newThread());
return Observable
.merge(obs, obs)
.reduce(new Func2<Integer, Integer, Integer>() {
public Integer call(Integer t1, Integer t2) {
return t1 + t2;
}
});
}
public String retrieveTemplate() {
//I WANT TO START THE WORK OF THE OBSERVABLE HERE BUT I DON'T KNOW HOW TO DO IT
//DO THINGS IN THE MAIN THREAD
//HERE I JUST INITIALIZE A STRING BUT WE COULD IMAGINE I WOULD DO MORE THINGS
String s = "The final Number is {0}";
System.out.println(Thread.currentThread().getName() + " : the string is initialized");
//I WAIT FOR THE OBSERVABLE RESULT HERE
int result = createObservable().toBlocking().first();
return MessageFormat.format(s, result);
}
The output of this code is correct (Two threads are created to call the remote api)
main : the string is initialized
Call with thread : RxNewThreadScheduler-1
Call with thread : RxNewThreadScheduler-2
The final Number is 2
I want to call the RxJava Observable at the beginning of the method retrieveTemplate (in order to call the remote API as soon as possible) and wait for the result just before the call to MessageFormat.format, but I don't know how to do it.
Assuming the whole creation process works, you may want to bind the whole computation to the moment of subscription by transforming the source observable:
public Observable<String> retrieveTemplate() {
return createObservable().map(result -> {
String s = "The final Number is {0}";
System.out.println(Thread.currentThread().getName() + " : the string is initialized");
return MessageFormat.format(s, result);
});
}
When you subscribe to the observable returned by retrieveTemplate, you actually start the whole computation:
// some other place in the code
retrieveTemplate().subscribe(template -> doStuffWithTemplate(template));
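The approach above defers all work until subscription. If the goal is instead to start the remote calls at the top of the method and block only where the result is needed, one option outside RxJava is the JDK's CompletableFuture, sketched below. This swaps in a different technique from the answer; remoteCall and its fixed return values are hypothetical stand-ins for the remote API.

```java
import java.text.MessageFormat;
import java.util.concurrent.CompletableFuture;

class EagerStartSketch {
    // Hypothetical stand-in for the remote API call.
    static int remoteCall(int value) {
        return value;
    }

    static String retrieveTemplate() {
        // Both calls are submitted here, before any main-thread work.
        CompletableFuture<Integer> sum = CompletableFuture
                .supplyAsync(() -> remoteCall(3))
                .thenCombine(CompletableFuture.supplyAsync(() -> remoteCall(4)), Integer::sum);

        // Main-thread work proceeds while the calls are in flight.
        String template = "The final Number is {0}";

        // Block only at the point where the result is needed.
        return MessageFormat.format(template, sum.join());
    }

    public static void main(String[] args) {
        System.out.println(retrieveTemplate());
    }
}
```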
