I have a Java method that returns a string template. I want to make two asynchronous calls to a remote API; each call returns a number. I then want to compute the sum of these two numbers and put it into the template before returning it.
So I have this Java code to achieve this task:
private Observable<Integer> createObservable() {
Observable<Integer> obs = Observable.create(new OnSubscribe<Integer>() {
public void call(Subscriber<? super Integer> t) {
System.out.println("Call with thread : " + Thread.currentThread().getName());
//FAKE CALL TO REMOTE API => THE THREAD SLEEPS FOR 4 SECONDS
try {
Thread.sleep(4000);
} catch (InterruptedException e) {
e.printStackTrace();
}
t.onNext(new Random().nextInt(10));
t.onCompleted();
}
}).subscribeOn(Schedulers.newThread());
return Observable
.merge(obs, obs)
.reduce(new Func2<Integer, Integer, Integer>() {
public Integer call(Integer t1, Integer t2) {
return t1 + t2;
}
});
}
public String retrieveTemplate() {
//I WANT TO START THE WORK OF THE OBSERVABLE HERE BUT I DON'T KNOW HOW TO DO IT
//DO THINGS IN THE MAIN THREAD
//HERE I JUST INITIALIZE A STRING BUT WE COULD IMAGINE I WOULD DO MORE THINGS
String s = "The final Number is {0}";
System.out.println(Thread.currentThread().getName() + " : the string is initialized");
//I WAIT FOR THE OBSERVABLE RESULT HERE
int result = createObservable().toBlocking().first();
return MessageFormat.format(s, result);
}
The output of this code is correct (two threads are created to call the remote API):
main : the string is initialized
Call with thread : RxNewThreadScheduler-1
Call with thread : RxNewThreadScheduler-2
The final Number is 2
I want to start the RxJava Observable at the beginning of retrieveTemplate (so the remote API is called as soon as possible) and wait for the result only just before the call to MessageFormat.format, but I don't know how to do it.
Assuming the creation process works as intended, you can tie the whole computation to the moment of subscription by transforming the source observable:
public Observable<String> retrieveTemplate() {
return createObservable().map(result -> {
String s = "The final Number is {0}";
System.out.println(Thread.currentThread().getName() + " : the string is initialized");
return MessageFormat.format(s, result);
});
}
When you subscribe to the Observable returned by retrieveTemplate, you actually start the whole computation:
// some other place in the code
retrieveTemplate().subscribe(template -> doStuffWithTemplate(template));
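If the method has to keep returning a plain String, the "start early, block late" pattern the question asks for can also be sketched with the JDK's CompletableFuture. This swaps RxJava for a different technique, so treat it as an illustration only. The class name EarlyStartTemplate and the two Supplier parameters are hypothetical stand-ins for the remote calls.

```java
import java.text.MessageFormat;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class EarlyStartTemplate {

    // firstCall and secondCall are hypothetical stand-ins for the two remote API calls.
    static String retrieveTemplate(Supplier<Integer> firstCall, Supplier<Integer> secondCall) {
        // Both calls are submitted right away and run on the common ForkJoinPool.
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(firstCall);
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(secondCall);
        CompletableFuture<Integer> sum = first.thenCombine(second, Integer::sum);

        // Work done here on the calling thread overlaps with the two calls.
        String template = "The final Number is {0}";

        // Block only at the point where the result is actually needed.
        return MessageFormat.format(template, sum.join());
    }

    public static void main(String[] args) {
        System.out.println(retrieveTemplate(() -> 3, () -> 4));
    }
}
```

The only blocking call is sum.join(), placed just before MessageFormat.format, which is where the original code called toBlocking().first().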
How to use CompletableFuture to use result of first Callable task as arg to all subsequent Callable tasks?
I have 3 tasks that need to run like so:
The first, blocking task runs and returns a value.
The 2nd and 3rd tasks run asynchronously with the argument supplied by the first task and return values.
All 3 values are summed up as the final result.
I tried to do this below, but I am stuck on the .thenApply clause.
I can't quite get this code to work. In the .thenApply clause, how do I pass an argument from the response object that was returned?
import com.google.common.util.concurrent.Uninterruptibles;
import java.util.concurrent.*;
public class ThreadPoolTest {
static ExecutorService threadPool = Executors.newFixedThreadPool(10);
public static void main(String[] args) {
CompletableFuture<SumCalculator> cf =
CompletableFuture.supplyAsync(() -> new SumCalculator(100000), threadPool);
Integer initialResult = cf.getNow(null).call();
CompletableFuture<SumCalculator> cf2 = CompletableFuture.completedFuture(initialResult)
.thenApplyAsync((i) -> new SumCalculator(i));
// I want to call 2 or more SumCalculator tasks here
System.out.println("DONE? " + cf2.isDone());
System.out.println("message? " + cf2.getNow(null).call());
threadPool.shutdown();
System.out.println("Program exit.");
}
public static class SumCalculator implements Callable<Integer> {
private int n;
public SumCalculator(int n) {
this.n = n;
}
public Integer call() {
int sum = 0;
for (int i = 1; i <= n; i++) {
sum += i;
}
Uninterruptibles.sleepUninterruptibly(800, TimeUnit.MILLISECONDS);
return sum;
}
}
}
NOTE: I do want to collect the responses from all 3 tasks together at the end as a combined result list, perhaps as a stream of Integer values. In this case, I want to sum the values. I am doing this for the performance benefit of multiple threads.
If I understood correctly:
CompletableFuture<Integer> one =
CompletableFuture.supplyAsync(() -> new SumCalculator(100000).call(), threadPool);
CompletableFuture<Integer> two = one.thenApplyAsync(x -> new SumCalculator(x).call(), threadPool);
CompletableFuture<Integer> three = one.thenApplyAsync(x -> new SumCalculator(x).call(), threadPool);
Integer result = one.join() + two.join() + three.join();
System.out.println(result);
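To collect the three results as a list and sum them as a stream, as the NOTE in the question asks, one possible sketch uses CompletableFuture.allOf. The class FanOutSum and the helper sumTo are hypothetical simplifications of SumCalculator. The sketch uses long because the sum 1 + ... + 100000 is 5,000,050,000, which does not fit in an int.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FanOutSum {

    // Hypothetical stand-in for SumCalculator.call(): 1 + 2 + ... + n.
    static long sumTo(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    static long run(long seed) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            CompletableFuture<Long> one = CompletableFuture.supplyAsync(() -> sumTo(seed), pool);
            // Both dependents are scheduled once 'one' completes and run independently.
            CompletableFuture<Long> two = one.thenApplyAsync(FanOutSum::sumTo, pool);
            CompletableFuture<Long> three = one.thenApplyAsync(FanOutSum::sumTo, pool);

            List<CompletableFuture<Long>> all = Arrays.asList(one, two, three);
            // allOf completes only when every future in the array has completed.
            CompletableFuture.allOf(all.toArray(new CompletableFuture[0])).join();
            // After allOf, each join() returns immediately.
            return all.stream().mapToLong(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // 55 + 1540 + 1540
    }
}
```

With a seed of 10 the first task yields 55 and each dependent task yields 1540, so the total is 3135.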
Researching this has been a little difficult because I'm not sure precisely how the question should be worded. Here is some pseudocode summarizing my goal.
public class TestService {
Object someBigMehtod(String A, Integer I) {
{ //block A
//do some long database read
}
{ //block B
//do another long database read at the same time as block A
}
{ //block C
//get in this block when both A & B are complete
//and access result returned or pushed from A & B
//to build up some data object to push out to a class that called
//this service or has subscribed to it
return null;
}
}
}
I think I could use RxJava or Spring Integration to accomplish this, or maybe just instantiate multiple threads and run them. The layout makes me think Rx has the solution, because I picture data being pushed to block C. Thanks in advance for any advice you might have.
You can do this with CompletableFuture, in particular with its thenCombine method, which waits for two tasks to complete and then combines their results.
CompletableFuture<A> fa = CompletableFuture.supplyAsync(() -> {
// do some long database read
return a;
});
CompletableFuture<B> fb = CompletableFuture.supplyAsync(() -> {
// do another long database read
return b;
});
CompletableFuture<C> fc = fa.thenCombine(fb, (a, b) -> {
// use a and b to build object c
return c;
});
return fc.join();
By default, supplyAsync runs its tasks on ForkJoinPool.commonPool(). You can control where they run by passing an Executor as an optional extra argument.
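As a minimal sketch of the executor variant, the following runs both reads and the combining step on a dedicated pool. The class name CombineOnExecutor, the pool name dbPool and the constant return values are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class CombineOnExecutor {

    static String run() {
        // Hypothetical dedicated pool for the two database reads.
        ExecutorService dbPool = Executors.newFixedThreadPool(2);
        try {
            // The second argument of supplyAsync selects the executor.
            CompletableFuture<String> fa = CompletableFuture.supplyAsync(() -> "abc", dbPool);
            CompletableFuture<Integer> fb = CompletableFuture.supplyAsync(() -> 123, dbPool);
            // thenCombineAsync also accepts an executor for the combining step.
            CompletableFuture<String> fc = fa.thenCombineAsync(fb, (a, b) -> a + b, dbPool);
            return fc.join();
        } finally {
            dbPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Keeping blocking database work off the common pool avoids starving other code that relies on it, such as parallel streams.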
You can use the zip operator from RxJava. It combines the results of several sources, which can run in parallel when each one is subscribed on its own scheduler.
Documentation: http://reactivex.io/documentation/operators/zip.html
An example of how it works: https://github.com/politrons/reactive/blob/master/src/test/java/rx/observables/combining/ObservableZip.java
For now I just went with John's suggestion, which gets the desired effect. I mix RxJava 1 and RxJava 2 syntax a bit, which is probably poor practice. It looks like I have some reading cut out for me on the java.util.concurrent package. Time permitting, I would like to try the zip solution.
@Test
public void myBigFunction(){
System.out.println("starting ");
CompletableFuture<List<String>> fa = CompletableFuture.supplyAsync( () ->
{ //block A
//do some long database read
try {
Thread.sleep(3000);
System.out.println("part A");
return asList(new String[] {"abc","def"});
} catch (InterruptedException e) {
e.printStackTrace();
}
return null;
}
);
CompletableFuture<List<Integer>> fb = CompletableFuture.supplyAsync( () ->
{ //block B
//do some long database read
try {
Thread.sleep(6000);
System.out.println("Part B");
return asList(new Integer[] {123,456});
} catch (InterruptedException e) {
e.printStackTrace();
}
return null;
}
);
CompletableFuture<List<String>> fc = fa.thenCombine(fb,(a,b) ->{
//block C
//get in this block when both A & B are complete
int sum = b.stream().mapToInt(i -> i.intValue()).sum();
return a.stream().map(new Function<String, String>() {
@Override
public String apply(String s) {
return s+sum;
}
}).collect(Collectors.toList());
});
System.out.println(fc.join());
}
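It takes only 6 seconds to run, the duration of the slower read, rather than the 9 seconds the two reads would take one after the other.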
The crawler has a urlQueue that records the URLs to crawl and a mock asynchronous URL fetcher.
I am trying to write it in RxJava style.
At first, I tried Flowable.generate like this:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer poll = demo.urlQueue.poll();
if (poll != null) {
e.onNext(poll);
} else if (runningCount.get() == 0) {
e.onComplete();
}
}).flatMap(i -> {
runningCount.incrementAndGet();
return demo.urlFetcher.asyncFetchUrl(i);
}, 10)
.doOnNext(page -> demo.onSuccess(page))
.subscribe(page -> runningCount.decrementAndGet());
But it doesn't work: at the beginning there may be only one seed in urlQueue, so generate is called 10 times but only one e.onNext is emitted. Only when that fetch has finished does the next request(1) call generate again.
Although the code specifies a flatMap maxConcurrency of 10, it crawls one URL at a time.
After that, I modified the code as follows, and it works as expected.
But in this code I have to track how many tasks are currently running and calculate how many should be fetched from the queue, which I think RxJava should do for me.
I am not sure whether the code can be rewritten in a simpler way.
public class CrawlerDemo {
private static Logger logger = LoggerFactory.getLogger(CrawlerDemo.class);
// it can be redis queue or other queue
private BlockingQueue<Integer> urlQueue = new LinkedBlockingQueue<>();
private static AtomicInteger runningCount = new AtomicInteger(0);
private static final int MAX_CONCURRENCY = 5;
private UrlFetcher urlFetcher = new UrlFetcher();
private void addSeed(int i) {
urlQueue.offer(i);
}
private void onSuccess(Page page) {
page.links.forEach(i -> {
logger.info("offer more url " + i);
urlQueue.offer(i);
});
}
private void start(BehaviorProcessor processor) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
} else {
processor.onComplete();
}
}
private int dispatchMoreLink(BehaviorProcessor processor) {
int links = 0;
while (runningCount.get() <= MAX_CONCURRENCY) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
links++;
} else {
if (runningCount.get() == 0) {
processor.onComplete();
}
break;
}
}
return links;
}
private Flowable<Page> asyncFetchUrl(int i) {
return urlFetcher.asyncFetchUrl(i);
}
public static void main(String[] args) throws InterruptedException {
CrawlerDemo demo = new CrawlerDemo();
demo.addSeed(1);
BehaviorProcessor<Integer> processor = BehaviorProcessor.create();
processor
.flatMap(i -> {
runningCount.incrementAndGet();
return demo.asyncFetchUrl(i)
.doFinally(() -> runningCount.decrementAndGet())
.doFinally(() -> demo.dispatchMoreLink(processor));
}, MAX_CONCURRENCY)
.doOnNext(page -> demo.onSuccess(page))
.subscribe();
demo.start(processor);
}
}
class Page {
public List<Integer> links = new ArrayList<>();
}
class UrlFetcher {
static Logger logger = LoggerFactory.getLogger(UrlFetcher.class);
final ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
public Flowable<Page> asyncFetchUrl(Integer url) {
logger.info("start async get " + url);
return Flowable.defer(() -> emitter ->
scheduledExecutorService.schedule(() -> {
Page page = new Page();
// the website urls no more than 1000
if (url < 1000) {
page.links = IntStream.range(1, 5).boxed().map(j -> 10 * url + j).collect(Collectors.toList());
}
logger.info("finish async get " + url);
emitter.onNext(page);
emitter.onComplete();
}, 5, TimeUnit.SECONDS)); // cost 5 seconds to access url
}
}
You are trying to use regular (non-Rx) code with RxJava and not getting the results you want.
The first thing to do is to convert the urlQueue.poll() into a Flowable<Integer>:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer take = demo.urlQueue.take(); // Note 1
e.onNext(take); // Note 2
})
.observeOn(Schedulers.io(), false, 1) // Note 3 (RxJava 2 Flowable takes delayError before bufferSize)
.flatMap(i -> demo.urlFetcher.asyncFetchUrl(i), 10)
.subscribe(page -> demo.onSuccess(page));
Note 1: Reading the queue in a reactive way means a blocking wait. Trying to poll() the queue adds a layer of complexity that RxJava allows you to skip over.
Note 2: Pass the received value on to any subscribers. If you need to indicate completion, you will need to add an external boolean, or use an in-band indicator (such as a negative integer).
Note 3: observeOn() moves the downstream work onto the io() scheduler. The value 1 is its buffer size, so it requests only one item at a time from the generator; there is no point in prefetching more.
The rest of the code is similar to what you have. The issues you had arose because flatMap(..., 10) requests 10 items from the generator up front, so generate is called 10 times even though the queue initially holds a single seed, which is not what you wanted. You want to limit the number of simultaneous fetches. Adding the runningCount was a kludge to prevent exiting the generator early, but it is not a substitute for a proper way to signal end-of-data on the urlQueue.
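The in-band indicator mentioned in Note 2 can be sketched in plain Java, independent of RxJava. The class name SentinelQueue and the constant END_OF_DATA are hypothetical, and the sketch assumes that real URL ids are never negative. The comments mark where the generator's e.onNext and e.onComplete calls would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SentinelQueue {

    // Assumption: real URL ids are never negative, so -1 can mean "no more data".
    static final int END_OF_DATA = -1;

    // Takes values until the sentinel arrives and returns everything seen before it.
    static List<Integer> drain(BlockingQueue<Integer> urlQueue) {
        List<Integer> seen = new ArrayList<>();
        try {
            while (true) {
                int next = urlQueue.take(); // blocks until a value is available
                if (next == END_OF_DATA) {
                    return seen;            // in the generator: e.onComplete()
                }
                seen.add(next);             // in the generator: e.onNext(next)
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return seen;
        }
    }

    static List<Integer> demo() {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        queue.offer(1);
        queue.offer(2);
        queue.offer(3);
        queue.offer(END_OF_DATA);
        return drain(queue);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whoever decides that crawling is finished still has to enqueue the sentinel; the sketch only shows how the reading side reacts to it.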
While writing a data synchronization job with RxJava I discovered a strange behavior that I cannot explain. I'm quite a novice with RxJava and would appreciate help.
Briefly, my job is quite simple: I have a list of element IDs, I call a web service to get each element by ID, do some processing, and make multiple calls to push the data to a DB.
Data loading is faster than data storing, so I encountered OutOfMemory errors.
My code looks pretty much like the "failing" test below, but while doing some tests I realized that removing the line:
flatMap(dt -> Observable.just(dt))
makes it work.
The failing test's output clearly shows that unconsumed items stack up, which leads to OutOfMemory. The working test's output shows that the producer always waits for the consumer, so it never leads to OutOfMemory.
public static class DataStore {
public Integer myVal;
public byte[] myBigData;
public DataStore(Integer myVal) {
this.myVal = myVal;
this.myBigData = new byte[1000000];
}
}
@Test
public void working() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
@Test
public void failing() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(dt -> Observable.just(dt))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
private Observable<DataStore> produce(final int value) {
return Observable.<DataStore>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(200); //Here I synchronous call WS to retrieve data
s.onNext(new DataStore(value));
s.onCompleted();
}
} catch (Exception e) {
s.onError(e);
}
}).subscribeOn(Schedulers.io());
}
private Observable<Boolean> consume(DataStore value) {
return Observable.<Boolean>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(1000); //Here I synchronous call DB to store data
s.onNext(true);
s.onCompleted();
}
} catch (Exception e) {
s.onNext(false);
s.onCompleted();
}
}).subscribeOn(Schedulers.io());
}
What is the explanation behind this behavior? How can I fix my failing test without removing the Observable.just(dt), which in my real case is an Observable.from(someListOfItme)?
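By default, flatMap merges an unlimited number of sources. By applying that lambda without a maxConcurrent parameter, you effectively unbounded the upstream, which can now run at full speed and overwhelm the internal buffers of the other operators. Giving that flatMap a maxConcurrent argument as well, for example flatMap(dt -> Observable.just(dt), 1), should keep the upstream bounded.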
I'm writing a Java program to solve this problem:
I have a balanced tree (namely, a TreeSet in Java) containing values. I have "Task" objects that will do either of the two things: try to find a value in the tree, or add a value to the tree. I will have a list of these "Task" objects (I used a LinkedList in Java) and I create threads to read and remove the tasks from this list one by one and perform their required action (i.e., find or add a value in the tree). I have created a synchronized "remove" method for my task list (which simply calls the underlying LinkedList's "remove" method). I have also defined the "add" method of the tree to be synchronized... (I don't know if it's necessary for it to be synchronized or not, but I assume it is).
How can I improve the performance of this program when using multiple threads? Right now, if I use a single thread, the time is better than when I use multiple threads.
This is the run method of my TaskRunner class. My threads are objects of this class, which implements Runnable; tasks is the list containing the tasks, and tree is my TreeSet, passed to this object in the constructor:
Task task;
int action; // '0' for search, '1' for add
int value; // Value to be used for searching or adding
while (!tasks.isEmpty()) {
try { task = tasks.remove(); }
catch (NoSuchElementException ex) { break; }
action = task.getAction();
value = task.getValue();
if (action == 0) {
boolean found = tree.contains(value); // a declaration needs braces to compile here
} else {
tree.add(value);
}
}
Also, my tree inherits from TreeSet<Integer> in Java and I have defined its add method as synchronized:
public synchronized boolean add(Integer e) {
return super.add(e);
}
And my task list inherits from LinkedList<Task>, with its remove method defined as synchronized:
public synchronized Task remove() {
return super.remove();
}
If your Task class implements the Runnable interface, you can use a thread pool to process the tasks.
Here is an example:
public class TreeSetTaskExample {
public static class Task implements Runnable {
String value;
boolean add;
Set<String> synchronizedTreeSet;
public Task(String value, boolean add, Set<String> synchronizedTreeSet) {
this.value = value;
this.add = add;
this.synchronizedTreeSet = synchronizedTreeSet;
}
@Override
public void run() {
String threadName = Thread.currentThread().toString();
if (add) {
System.out.println(threadName + "# add: " + value);
synchronizedTreeSet.add(value);
} else {
boolean contains = synchronizedTreeSet.contains(value);
System.out.println(threadName + "# treeSet.contains: " + value + " = " + contains + " removed...");
if (contains) {
synchronizedTreeSet.remove(value);
}
}
}
}
public static void main(String[] args) throws InterruptedException {
//
// synchronizedSet
//
Set<String> treeSet = Collections.synchronizedSet(new TreeSet<String>());
//
// ThreadPool with ? Threads
//
int processors = Runtime.getRuntime().availableProcessors();
ExecutorService threadPool = Executors.newFixedThreadPool(processors);
for (int i = 0; i < 100; i++) {
String someValue = "" + (i % 5);
boolean addOrCheck = Math.random() > 0.5;
threadPool.execute(new Task(someValue, addOrCheck, treeSet));
}
//
// don't forget to kill the threadpool
//
threadPool.shutdown();
}
}
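As a possible variation that is not part of the answer above, the synchronized wrappers can be replaced with the JDK's concurrent collections, which may reduce lock contention between workers. Whether this is actually faster depends on the workload and should be measured. The class name LockFreeTaskRunner is hypothetical, and the sketch only performs add tasks to stay short.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class LockFreeTaskRunner {

    // Adds the values 0..adds-1 to a sorted set using 'threads' workers; returns the set size.
    static int run(int adds, int threads) {
        Queue<Integer> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < adds; i++) {
            tasks.add(i);
        }
        // Sorted like a TreeSet, but safe for concurrent use without external locking.
        Set<Integer> tree = new ConcurrentSkipListSet<>();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Integer value;
                // poll() returns null on an empty queue, so there is no
                // isEmpty()/remove() race between workers.
                while ((value = tasks.poll()) != null) {
                    tree.add(value);
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for the workers before reading the result.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tree.size();
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 4));
    }
}
```

Each task here is very cheap, so thread coordination can easily cost more than the work itself, which is one plausible reason the single-threaded version in the question was faster.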