I recently took a coding interview on a Java concurrency task and unfortunately didn't get the job. The worst part is that I gave it my best, but now I'm not even sure where I went wrong. Can anyone give me some ideas about what I could improve in the code below? Thanks.
The question was pretty vague. Given 4 generic interfaces which, at a high level, divide a task into small pieces, work on each piece, and combine the partial results into a final result, I was asked to implement the central controller piece. The only requirements were to use concurrency in the partial result processing and that the "code must be production quality".
My code is below (the interfaces were given). I had put in a lot of comments to explain my assumptions, which are removed here.
// adding V,W in order to use in private fields types
public class ControllerImpl<T, U, V, W> implements Controller<T, U> {
private static Logger logger = LoggerFactory.getLogger(ControllerImpl.class);
private static int BATCH_SIZE = 100;
private Preprocessor<T, V> preprocessor;
private Processor<V, W> processor;
private Postprocessor<U, W> postprocessor;
public ControllerImpl() {
this.preprocessor = new PreprocessorImpl<>();
this.processor = new ProcessorImpl<>();
this.postprocessor = new PostprocessorImpl<>();
}
public ControllerImpl(Preprocessor preprocessor, Processor processor, Postprocessor postprocessor) {
this.preprocessor = preprocessor;
this.processor = processor;
this.postprocessor = postprocessor;
}
@Override
public U process(T arg) {
if (arg == null) return null;
final V[] parts = preprocessor.split(arg);
final W[] partResult = (W[]) new Object[parts.length];
final int poolSize = Runtime.getRuntime().availableProcessors();
final ExecutorService executor = getExecutor(poolSize);
int i = 0;
while (i < parts.length) {
final List<Callable<W>> tasks = IntStream.range(i, i + BATCH_SIZE)
.filter(e -> e < parts.length)
.mapToObj(e -> (Callable<W>) () -> partResult[e] = processor.processPart(parts[e]))
.collect(Collectors.toList());
i += tasks.size();
try {
logger.info("invoking batch of {} tasks to workers", tasks.size());
long start = System.currentTimeMillis();
final List<Future<W>> futures = executor.invokeAll(tasks);
long end = System.currentTimeMillis();
logger.info("done batch processing took {} ms", end - start);
for (Future future : futures) {
future.get();
}
} catch (InterruptedException e) {
logger.error("{}", e);// have comments to explain better handling according to real business requirement
} catch (ExecutionException e) {
logger.error("error: ", e);
}
}
MoreExecutors.shutdownAndAwaitTermination(executor, 60, TimeUnit.SECONDS);
return postprocessor.aggregate(partResult);
}
private ExecutorService getExecutor(int poolSize) {
final ThreadFactory threadFactory = new ThreadFactoryBuilder()
.setNameFormat("Processor-%d")
.setDaemon(true)
.build();
return new ThreadPoolExecutor(poolSize, poolSize, 60, TimeUnit.SECONDS, new LinkedBlockingDeque<>(), threadFactory);
}
}
So, if I understand correctly, you have a Preprocessor that takes a T and splits it into an array of V[]. Then you have a processor which transforms a V into a W. And then a postprocessor which transforms a W[] into a U, right? And you must assemble those things.
First of all, arrays and generics really don't mix well, so it's bizarre for those methods to return arrays rather than lists. For production-quality code, generic arrays shouldn't be used.
So, to recap:
T --> V1 --> W1 --> U
      V2 --> W2
      .      .
      .      .
      Vn --> Wn
So you could do this:
V[] parts = preprocessor.split(t);
W[] transformedParts =
(W[]) Arrays.stream(parts) // unchecked cast due to the use of generic arrays
.parallel() // this is where concurrency happens
.map(processor::processPart)
.toArray();
U result = postProcessor.aggregate(transformedParts);
If you use lists instead of arrays, and write it as a single line:
U result =
postProcessor.aggregate(
preprocessor.split(t)
.parallelStream()
.map(processor::processPart)
.collect(Collectors.toList()));
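To make the list-based variant concrete, here is a minimal, self-contained sketch. The list-based interfaces (ListPreprocessor, ListProcessor, ListPostprocessor), the StreamController class and the word-length example are all assumptions made up for illustration; the interfaces given in the interview used arrays and may differ in other ways.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical list-based versions of the given interfaces (the originals used arrays).
interface ListPreprocessor<T, V> { List<V> split(T input); }
interface ListProcessor<V, W> { W processPart(V part); }
interface ListPostprocessor<U, W> { U aggregate(List<W> partialResults); }

class StreamController<T, U, V, W> {
    private final ListPreprocessor<T, V> preprocessor;
    private final ListProcessor<V, W> processor;
    private final ListPostprocessor<U, W> postprocessor;

    StreamController(ListPreprocessor<T, V> preprocessor,
                     ListProcessor<V, W> processor,
                     ListPostprocessor<U, W> postprocessor) {
        this.preprocessor = preprocessor;
        this.processor = processor;
        this.postprocessor = postprocessor;
    }

    U process(T input) {
        List<W> partialResults = preprocessor.split(input)
                .parallelStream()               // parts are processed concurrently
                .map(processor::processPart)
                .collect(Collectors.toList());  // encounter order is preserved
        return postprocessor.aggregate(partialResults);
    }
}

class StreamControllerDemo {
    // Splits a sentence into words, measures each word in parallel, then sums the lengths.
    static int totalWordLength(String sentence) {
        StreamController<String, Integer, String, Integer> controller =
                new StreamController<String, Integer, String, Integer>(
                        s -> Arrays.asList(s.split(" ")),
                        String::length,
                        lengths -> lengths.stream().mapToInt(Integer::intValue).sum());
        return controller.process(sentence);
    }

    public static void main(String[] args) {
        System.out.println(totalWordLength("divide and conquer"));
    }
}
```

Note that parallelStream() runs on the common ForkJoinPool, so this trades control over the thread pool for brevity; whether that is acceptable depends on what "production quality" means for the task at hand.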
I am trying to change the loop to Java streams.
For example,
interface Logic {
int apply(int value);
}
public class AddOneLogic implements Logic {
@Override
public int apply(int value) {
return value + 1;
}
}
public class AddTwoLogic implements Logic {
@Override
public int apply(int value) {
return value + 2;
}
}
Using a loop to apply a Logic looks like
List<Logic> logics = new ArrayList<>();
logics.add(new AddOneLogic());
logics.add(new AddTwoLogic());
int init = 1;
I want to change the loop below to use streams. Is there a better way to do it?
int result = init;
for (Logic logic : logics) {
result = logic.apply(result);
}
As @duffymo mentioned in the comments, these classes aren't particularly useful and they could be replaced with Function<Integer, Integer>s and lambda expressions to define them.
In that case, you may want to reduce a list/stream of Functions by Function::andThen,
Function<Integer, Integer> addOneFunction = i -> i + 1;
Function<Integer, Integer> addTwoFunction = i -> i + 2;
Function<Integer, Integer> function =
Stream.of(addOneFunction, addTwoFunction)
.reduce(Function.identity(), Function::andThen);
so you would get a composed function to work with
Integer result = function.apply(init);
// ((1 + 1) + 2) = 4
You can do it with Stream and AtomicInteger and getAndSet(int) method as below,
AtomicInteger result = new AtomicInteger(1);
logics.stream().forEach(ele-> result.getAndSet(ele.apply(result.get())));
// result = ((1+1)+2)=4
Better option would be to use Function,
Function<Integer, Integer> addOne = i -> i + 1;
Function<Integer, Integer> addTwo = i -> i + 2;
List<Function<Integer, Integer>> logics = new ArrayList<>();
logics.add(addOne);
logics.add(addTwo);
AtomicInteger result = new AtomicInteger(1);
logics.stream().forEach(ele-> result.getAndSet(ele.apply(result.get())));
You can even avoid the logics list and use the andThen method as below,
Function<Integer, Integer> add = addOne.andThen(addTwo);
result.set(add.apply(1)); // ((1 + 1) + 2) = 4
Hope it helps..!!
As others have already mentioned: The intention behind the question might be distorted by the attempt to simplify the question so that it can be posted here. The Logic interface does not really make sense, because it could be replaced with an IntUnaryOperator.
Not with a Function<Integer, Integer> - that's a different thing!
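To illustrate the difference, here is a small sketch of the same composition done with IntUnaryOperator, which works on primitive int values and therefore avoids the boxing that Function<Integer, Integer> implies. The class and method names (IntOperatorChain, compose, demo) are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;

class IntOperatorChain {
    // Folds the steps into one operator that applies them from left to right.
    static IntUnaryOperator compose(List<IntUnaryOperator> steps) {
        return steps.stream()
                .reduce(IntUnaryOperator.identity(), IntUnaryOperator::andThen);
    }

    static int demo() {
        IntUnaryOperator addOne = v -> v + 1;
        IntUnaryOperator addTwo = v -> v + 2;
        return compose(Arrays.asList(addOne, addTwo)).applyAsInt(1);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ((1 + 1) + 2) = 4
    }
}
```

An existing Logic instance can be adapted with a method reference, for example logic::apply, since its apply(int) method has the same shape as applyAsInt(int).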
But I'll (also) make some assumptions when trying to answer the question:
The Logic interface is merely a placeholder for an interface that has to be retained in its current form
Several Logic instances can sensibly be combined in order to yield a new Logic
The goal is not to "apply streams for streams' sake", but to create sensible, usable classes and methods (and it's a pity that this is worth mentioning...)
If this is the case, then I'd suggest creating a CombinedLogic class that simply offers a method for combining several Logic objects to create the combined one.
It could also be a concrete class that internally stores a List<Logic>. This might be handy in order to modify a combined logic later, as in combinedLogic.setElement(42, new OtherLogic());. But a public class with a modifiable state should be thought through carefully...
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class CombinedLogicExample {
public static void main(String[] args) {
List<Logic> logics = new ArrayList<>();
logics.add(new AddOneLogic());
logics.add(new AddTwoLogic());
Logic combined = CombinedLogic.of(logics);
// Alternatively:
// Logic logic1 = new AddOneLogic();
// Logic logic2 = new AddTwoLogic();
// Logic combined = CombinedLogic.of(logic1, logic2);
int init = 1;
int result = combined.apply(init);
System.out.println(result);
}
}
class CombinedLogic {
static Logic of(Logic... logics) {
return of(Arrays.asList(logics));
}
static Logic of(Iterable<? extends Logic> logics) {
return a -> {
int result = a;
for (Logic logic : logics) {
result = logic.apply(result);
}
return result;
};
}
}
interface Logic {
int apply(int value);
}
class AddOneLogic implements Logic {
@Override
public int apply(int value) {
return value + 1;
}
}
class AddTwoLogic implements Logic {
@Override
public int apply(int value) {
return value + 2;
}
}
I'm a beginner in Java, and as homework I'm supposed to add concurrency to the genetic algorithm solution for the Travelling Salesman Problem posted here. Our goal is to have the chromosome evaluation performed by threads. So my guess is that I have to rewrite this part of the code to be multithreaded:
// Gets the best tour in the population
public Tour getFittest() {
Tour fittest = tours[0];
// Loop through individuals to find fittest
for (int i = 1; i < populationSize(); i++) {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
}
return fittest;
}
// Gets population size
public int populationSize() {
return tours.length;
}
Originally I intended to manually split the array between threads, but I believe it's not the best solution to the problem. So I did some research, and everyone suggests using either parallel streams or an ExecutorService. However, I had trouble applying both of these solutions, even though I tried to emulate examples posted in other threads. So my questions are: how exactly do I implement them in this case, and which one is faster?
Edit: Sorry, I forgot to post the solution I've tried. Here it is:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
});
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
However when trying to run it I receive "Local variable fittest defined in an enclosing scope must be final or effectively final" error at line:
fittest = getTour(i);
And I have no clue why it's happening or how I can fix it, as adding the final keyword when initializing it does not fix it. Other than that, I have some doubts about using the synchronized keyword in this solution. I believe that to achieve true multithreading I need to make use of it because the resource is shared by various threads. Am I right? Sadly, I didn't save my attempt at using streams, but I have trouble understanding how they work at all.
Edit2: I managed to "fix" my solution by adding two workarounds. Currently my code looks like that:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
final Integer innerI = new Integer(i);
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(innerI).getFitness()) {
setFitness(innerI, fittest);
}
}
);
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
public Tour setFitness (int i, Tour fittest) {
fittest = getTour(i);
return fittest;
}
That said, while it's compiling, there are two problems. Memory usage keeps rising every second the program runs, maxing out my 16GB of RAM in like ten seconds while variable 'fittest' does not change at all. So I guess I'm still doing something wrong.
Here is my streams implementation:
private static Tour getFittest(Tour[] tours) {
    // Collect the entries instead of calling add() from forEach:
    // ArrayList is not thread-safe when filled from a parallel stream.
    List<Map.Entry<Tour, Double>> lengths = Arrays.stream(tours).parallel()
        .<Map.Entry<Tour, Double>>map(t -> new AbstractMap.SimpleEntry<>(t, t.getLength()))
        .collect(Collectors.toList());
    return Collections.min(lengths, Comparator.comparingDouble(Map.Entry::getValue)).getKey();
}
On further inspection, this can be a one-liner, depending on your definition:
private static Tour getFittest(Tour[] tours) {
return Arrays.stream(tours).parallel().map(t -> new AbstractMap.SimpleEntry<Tour, Double>(t, t.getLength()))
.min(Comparator.comparingDouble(Map.Entry::getValue)).get().getKey();
}
Also, on further inspection, the original code uses .getFitness(), which is the reciprocal of the length. If you use that, then use .max() instead of .min().
Actually, even better after review:
return Arrays.stream(tours).parallel()
.min(Comparator.comparingDouble(Tour::getLength)).get();
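Since the question asked about both parallel streams and ExecutorService, here is a rough sketch of each, selecting by fitness. SketchTour is a hypothetical stand-in for the real Tour class (its fitness is assumed to be 1 / length), and FittestFinder and its method names are invented for this example. In the executor variant the tasks only compute and return fitness values; the comparison happens on the calling thread, which sidesteps the "effectively final" error because no lambda assigns to a captured local variable. The pool is also shut down before returning; creating a new pool on every call without shutting it down is one plausible reason for steadily growing memory usage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for Tour; fitness is assumed to be the reciprocal of the length.
class SketchTour {
    private final double length;
    SketchTour(double length) { this.length = length; }
    double getLength() { return length; }
    double getFitness() { return 1.0 / length; }
}

class FittestFinder {
    // Parallel stream: the library handles splitting and combining.
    static SketchTour viaParallelStream(SketchTour[] tours) {
        return Arrays.stream(tours).parallel()
                .max(Comparator.comparingDouble(SketchTour::getFitness))
                .orElseThrow(() -> new IllegalArgumentException("empty population"));
    }

    // ExecutorService: fitness values are computed concurrently, compared sequentially.
    static SketchTour viaExecutor(SketchTour[] tours, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (SketchTour tour : tours) {
                tasks.add(tour::getFitness);
            }
            List<Future<Double>> fitnesses = executor.invokeAll(tasks); // waits for all tasks
            SketchTour fittest = tours[0];
            double best = fitnesses.get(0).get();
            for (int i = 1; i < tours.length; i++) {
                double fitness = fitnesses.get(i).get();
                if (fitness > best) {
                    best = fitness;
                    fittest = tours[i];
                }
            }
            return fittest;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown(); // always release the worker threads
        }
    }
}
```

Which variant is faster depends on how expensive getFitness() is and how large the population is, so it is worth measuring both rather than assuming.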
While working with an ExecutorService, I needed to put a List into a Set. How can this be done?
public class Executor {
private Set<List<Future<Object>>> primeNumList = Collections.synchronizedSet(new TreeSet<>());
Set<List<Future<Object>>> getPrimeNumList() {
return primeNumList;
}
@SuppressWarnings("unchecked")
public void setup(int min, int max, int threadNum) throws InterruptedException {
ExecutorService executorService = Executors.newFixedThreadPool(threadNum);
List<Callable<Object>> callableList = new ArrayList<>();
for (int i = 0; i < threadNum; i++) {
callableList.add(new AdderImmediately(min + i, max, threadNum));
}
List<Future<Object>> a = executorService.invokeAll(callableList);
primeNumList.add(a); // here i try to add Future list into Set
System.out.println(primeNumList);
executorService.shutdown();
}
This is the class in which I process the values and return them via call(). After that they end up in the List, from which I want to put them into the final Set:
public class AdderImmediately implements Callable {
private int minRange;
private int maxRange;
private Set<Integer> primeNumberList = new TreeSet<>();
private int step;
AdderImmediately(int minRange, int maxRange, int step) {
this.minRange = minRange;
this.maxRange = maxRange;
this.step = step;
}
@Override
public Object call() {
fillPrimeNumberList(primeNumberList);
return primeNumberList;
}
private void fillPrimeNumberList(Set<Integer> primeNumberList) {
for (int i = minRange; i <= maxRange; i += step) {
if (PrimeChecker.isPrimeNumber(i)) {
primeNumberList.add(i);
}
}
}
}
Is it somehow possible to implement this? With what I have now, I get a ClassCastException. Or am I misunderstanding something?
Exception:
Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.Comparable
at java.util.TreeMap.compare(TreeMap.java:1294)
at java.util.TreeMap.put(TreeMap.java:538)
at java.util.TreeSet.add(TreeSet.java:255)
at java.util.Collections$SynchronizedCollection.add(Collections.java:2035)
at Executor.setup(Executor.java:22)
at Demo.main(Demo.java:47)
You are not able to catch the error at compile time because you have used @SuppressWarnings("unchecked"). On removing that, there's a compile warning at this statement: callableList.add(new AdderImmediately(min + i, max, threadNum));
The second problem is that you haven't used the generic form when declaring the AdderImmediately class. You are clearly returning a Set<Integer> from the call method. If you use the proper generic form, i.e. Callable<Set<Integer>>, the problem becomes clear in the above line. The type of callableList is List<Callable<Object>>. You cannot add an element of type Callable<Set<Integer>> to it.
The ClassCastException itself comes from the TreeSet, as the stack trace shows: a TreeSet created without a Comparator requires its elements to implement Comparable, and the ArrayList returned by invokeAll does not. With the generics in place it also becomes obvious that you are adding the list of Futures to the set rather than the Set<Integer> results that the tasks return.
I'd recommend reading the chapters on generics in Effective Java, 3rd edition, to better understand these concepts.
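One way to put this advice into practice is sketched below: the task is declared as Callable<Set<Integer>>, and the values are merged into a single TreeSet<Integer> instead of storing the list of Futures. PrimeRangeTask, PrimeCollector and the simple isPrime helper are names invented for this example; the original PrimeChecker class is not shown in the question, so a trial-division check is assumed in its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Typed task: the compiler now knows that call() yields a Set<Integer>.
class PrimeRangeTask implements Callable<Set<Integer>> {
    private final int start;
    private final int max;
    private final int step;

    PrimeRangeTask(int start, int max, int step) {
        this.start = start;
        this.max = max;
        this.step = step;
    }

    @Override
    public Set<Integer> call() {
        Set<Integer> primes = new TreeSet<>();
        for (int i = start; i <= max; i += step) {
            if (isPrime(i)) {
                primes.add(i);
            }
        }
        return primes;
    }

    // Assumed stand-in for PrimeChecker.isPrimeNumber: plain trial division.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}

class PrimeCollector {
    static Set<Integer> collect(int min, int max, int threadNum) {
        ExecutorService pool = Executors.newFixedThreadPool(threadNum);
        try {
            List<Callable<Set<Integer>>> tasks = new ArrayList<>();
            for (int i = 0; i < threadNum; i++) {
                tasks.add(new PrimeRangeTask(min + i, max, threadNum));
            }
            Set<Integer> all = new TreeSet<>(); // Integer is Comparable, so TreeSet works
            for (Future<Set<Integer>> future : pool.invokeAll(tasks)) {
                all.addAll(future.get()); // merge the values, not the Future objects
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the merging happens on the calling thread after invokeAll returns, the result set does not need to be wrapped in Collections.synchronizedSet in this sketch.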
The crawler has a urlQueue that records the URLs to crawl and a mock asynchronous URL fetcher.
I am trying to write it in RxJava style.
At first, I tried Flowable.generate like this:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer poll = demo.urlQueue.poll();
if (poll != null) {
e.onNext(poll);
} else if (runningCount.get() == 0) {
e.onComplete();
}
}).flatMap(i -> {
runningCount.incrementAndGet();
return demo.urlFetcher.asyncFetchUrl(i);
}, 10)
.doOnNext(page -> demo.onSuccess(page))
.subscribe(page -> runningCount.decrementAndGet());
but it doesn't work: at the beginning there may be only one seed in urlQueue, so generate is called 10 times but only one e.onNext is emitted. Only when that fetch finishes does the next request(1) trigger another generate call.
Although the code sets flatMap's maxConcurrency to 10, it crawls the URLs one by one.
After that, I modified the code as follows, and it works as expected.
However, in this version I have to track how many tasks are currently running and calculate how many URLs to fetch from the queue, which I think RxJava should be doing for me.
I am not sure whether the code can be rewritten in a simpler way.
public class CrawlerDemo {
private static Logger logger = LoggerFactory.getLogger(CrawlerDemo.class);
// it can be redis queue or other queue
private BlockingQueue<Integer> urlQueue = new LinkedBlockingQueue<>();
private static AtomicInteger runningCount = new AtomicInteger(0);
private static final int MAX_CONCURRENCY = 5;
private UrlFetcher urlFetcher = new UrlFetcher();
private void addSeed(int i) {
urlQueue.offer(i);
}
private void onSuccess(Page page) {
page.links.forEach(i -> {
logger.info("offer more url " + i);
urlQueue.offer(i);
});
}
private void start(BehaviorProcessor processor) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
} else {
processor.onComplete();
}
}
private int dispatchMoreLink(BehaviorProcessor processor) {
int links = 0;
while (runningCount.get() <= MAX_CONCURRENCY) {
final Integer poll = urlQueue.poll();
if (poll != null) {
processor.onNext(poll);
links++;
} else {
if (runningCount.get() == 0) {
processor.onComplete();
}
break;
}
}
return links;
}
private Flowable<Page> asyncFetchUrl(int i) {
return urlFetcher.asyncFetchUrl(i);
}
public static void main(String[] args) throws InterruptedException {
CrawlerDemo demo = new CrawlerDemo();
demo.addSeed(1);
BehaviorProcessor<Integer> processor = BehaviorProcessor.create();
processor
.flatMap(i -> {
runningCount.incrementAndGet();
return demo.asyncFetchUrl(i)
.doFinally(() -> runningCount.decrementAndGet())
.doFinally(() -> demo.dispatchMoreLink(processor));
}, MAX_CONCURRENCY)
.doOnNext(page -> demo.onSuccess(page))
.subscribe();
demo.start(processor);
}
}
class Page {
public List<Integer> links = new ArrayList<>();
}
class UrlFetcher {
static Logger logger = LoggerFactory.getLogger(UrlFetcher.class);
final ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
public Flowable<Page> asyncFetchUrl(Integer url) {
logger.info("start async get " + url);
return Flowable.defer(() -> emitter ->
scheduledExecutorService.schedule(() -> {
Page page = new Page();
// the website urls no more than 1000
if (url < 1000) {
page.links = IntStream.range(1, 5).boxed().map(j -> 10 * url + j).collect(Collectors.toList());
}
logger.info("finish async get " + url);
emitter.onNext(page);
emitter.onComplete();
}, 5, TimeUnit.SECONDS)); // cost 5 seconds to access url
}
}
You are trying to use regular (non-Rx) code with RxJava and not getting the results you want.
The first thing to do is to convert the urlQueue.poll() into a Flowable<Integer>:
Flowable.generate((Consumer<Emitter<Integer>>) e -> {
final Integer take = demo.urlQueue.take(); // Note 1
e.onNext(take); // Note 2
})
.observeOn(Schedulers.io(), false, 1) // Note 3
.flatMap(i -> demo.urlFetcher.asyncFetchUrl(i), 10)
.subscribe(page -> demo.onSuccess(page));
1. Reading the queue in a reactive way means a blocking wait. Trying to poll() the queue adds a layer of complexity that RxJava allows you to skip over.
2. Pass the received value on to any subscribers. If you need to indicate completion, you will need to add an external boolean, or use an in-band indicator (such as a negative integer).
3. The observeOn() operator requests items from the generator. The buffer size of 1 makes it request only one item at a time, since there is no point in prefetching more than one.
The rest of the code is similar to what you have. The issues you had arose because the flatMap(..., 10) operator requests 10 items from the generator up front, so generate is invoked 10 times even though only one URL is available, which is not what you wanted. You want to limit the number of simultaneous fetches. Adding the runningCount was a kludge to prevent exiting the generator early, but it is not a substitute for a proper way to signal end-of-data on the urlQueue.
While writing a data synchronization job with RxJava, I discovered a strange behavior that I cannot explain. I'm quite a novice with RxJava and would appreciate help.
Briefly, my job is quite simple: I have a list of element IDs, I call a web service to get each element by ID, do some processing, and make multiple calls to push the data to the DB.
Data loading is faster than data storing, so I encountered OutOfMemory errors.
My code looks pretty much like the "failing" test below, but while doing some tests I realized that removing the line:
flatMap(dt -> Observable.just(dt))
makes it work.
The failing test's output clearly shows that unconsumed items stack up, which leads to OutOfMemory. The working test's output shows that the producer always waits for the consumer, so it never leads to OutOfMemory.
public static class DataStore {
public Integer myVal;
public byte[] myBigData;
public DataStore(Integer myVal) {
this.myVal = myVal;
this.myBigData = new byte[1000000];
}
}
@Test
public void working() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
@Test
public void failing() {
int MAX_CONCURRENT_LOAD = 1;
int MAX_CONCURRENT_STORE = 2;
AtomicInteger nbUnconsumed = new AtomicInteger(0);
List<Integer> ids = IntStream.range(0, 1000).boxed().collect(Collectors.toList());
Observable.from(ids)
.flatMap(this::produce, MAX_CONCURRENT_LOAD)
.doOnNext(s -> logger.info("+1 Total unconsumed values: " + nbUnconsumed.incrementAndGet()))
.flatMap(dt -> Observable.just(dt))
.flatMap(this::consume, MAX_CONCURRENT_STORE)
.doOnNext(s -> logger.info("-1 Total unconsumed values: " + nbUnconsumed.decrementAndGet()))
.toBlocking().forEach(s -> {});
logger.info("Finished");
}
private Observable<DataStore> produce(final int value) {
return Observable.<DataStore>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(200); //Here I synchronous call WS to retrieve data
s.onNext(new DataStore(value));
s.onCompleted();
}
} catch (Exception e) {
s.onError(e);
}
}).subscribeOn(Schedulers.io());
}
private Observable<Boolean> consume(DataStore value) {
return Observable.<Boolean>create(s -> {
try {
if (!s.isUnsubscribed()) {
Thread.sleep(1000); //Here I synchronous call DB to store data
s.onNext(true);
s.onCompleted();
}
} catch (Exception e) {
s.onNext(false);
s.onCompleted();
}
}).subscribeOn(Schedulers.io());
}
What is the explanation behind this behavior? How can I fix my failing test without removing the Observable.just(dt), which in my real case is an Observable.from(someListOfItme)?
By default, flatMap merges an unlimited number of sources. By applying that specific lambda without a maxConcurrent parameter, you effectively unbounded the upstream, which can now run at full speed and overwhelm the internal buffers of the other operators. Passing a maxConcurrent value to that flatMap as well, as the other two flatMap calls already do, keeps the upstream bounded.
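The effect of a concurrency limit can be illustrated without RxJava at all. The sketch below is only an analogy written with plain java.util.concurrent, not the RxJava implementation: a Semaphore plays the role of the maxConcurrent argument, so the producing loop has to wait whenever too many items are still unfinished. BoundedInFlightSketch and its parameters are names invented for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedInFlightSketch {
    // Runs taskCount slow tasks and returns the highest number that were unfinished at once.
    static int run(int taskCount, int maxInFlight) {
        ExecutorService pool = Executors.newCachedThreadPool();
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < taskCount; i++) {
                permits.acquire(); // the producer blocks here once the limit is reached
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                pool.execute(() -> {
                    try {
                        Thread.sleep(5); // stand-in for the slow "store" step
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // lets the producer hand out one more item
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // no-op if the pool has already terminated
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight items: " + run(20, 3));
    }
}
```

Without the Semaphore, the loop would hand all items to the pool immediately, which is the plain-Java counterpart of unconsumed DataStore objects piling up in memory.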