Using Stream API for organising application pipeline

Using Stream API for organising application pipeline - java

As far as I know Stream API is intended to be applied on collections. But I like the idea of them so much that I try to apply them when I can and when I shouldn't.
Originally my app had two threads communicating through BlockingQueue. First would populate new elements. Second make transformations on them and save on disk. Looked like a perfect stream oportunity for me at a time.
Code I ended up with:
Stream.generate().flatten().filter().forEach()
I'd like to put few maps in there but turns out I have to drag one additional field till forEach. So I either have to create meaningless class with two fields and obscure name or use AbstractMap.SimpleEntry to carry both fields through, which doesn't look like a great deal to me.
Anyway I'd rewritten my app and it even seems to work. However there are some caveats. As I have infinite stream 'the thing' can't be stopped. For now I'm starting it on daemon thread but this is not a solution. Business logic (like on connection loss/finding, this is probably not BL) looks alienated. Maybe I just need proxy for this.
On the other hand there is free laziness with queue population. One thread instead of two (not sure how good is this). Hopefully familiar pattern for other developers.
So my question is how viable is using of Stream API for application flow organising? Is there more underwather roks? If it's not recomended what are alternatives?

Related

Is there a way to do an adapter like class for firestore

I'm currently switching the way data are stored into my app from sqlite to Firestore for the sake of online support. Even if this is a big leap and will mean a lot of changes into my current architecture. I wanted to know if there is a way to sort of emulate an Adapter class (like the one used for SQlite), so I can keep those changes to bear the minimum by at least resuing my methods names and return types and avoid to write explicitly each document read/write where I need them.
What I tried at first in my so called "Adapter like class" was to load the document's data inside a dedicated object in my "adapter class"'s constructor. But it obviously didn't work because Firebase queries are async like explained here:
Is there a way to write this database call into a function easily?
Is what I want to do possible ? Or am I doomed to redo everything ?

Short answer: yes, but you shouldn't
If you absolutely must:
You can wrap the async calls in the Firebase API by using CountDownLatch to create your own Future implementation. Future is a Java Interface similar to Promises in JavaScript, which allow you to pass around the "future result" of an async computation until you absolutely need it.
Check this StackOverflow answer for a sample implementation.
You would then implement your "Adapter-like class" with this wrapper, which would return a Future, and the methods in your class would return future.get(). This call blocks the thread which executes it and when the Future resolves, the call evaluates to the value contained in the Future ("unwrapping" it).
Here's why you shouldn't:
In Android, it's usually a Very Bad Idea to block the UI thread, so if your current Adapter is already a synchronous call, it will keep blocking until the Firebase query finishes.
If you are already blocking the UI thread with your SQLite queries, and think you might be fine because of that, consider the possibility that this is not a problem now just because the SQLite queries are too fast for it to matter. When you switch to Firestore, the queries are likely to take much longer on average and have way more variance in their response time due to different network conditions, etc.
If it's not very unfeasible to consider refactoring to use the async APIs, seriously consider it. Your queries are going to take longer now and you might need "loading" indicators anyway, so using the async APIs will help with that. You may also see opportunities to run queries in parallel when you need both but they don't depend on each other, and keeping it async allows you to do that.
This might not have mattered before if the queries were fast enough that sequential vs. parallel didn't matter much, but now it may.

combined vs. separate backend calls

I try to figure out the best solution for a use case I'm working on. However, I'd appreciate getting some architectural advice from you guys.
I have a use case where the frontend should display a list of users assigned to a task and a list of users who are not assigned but able to be assigned to the same task.
I don't know what the better solution is:
have one backend call which collects both lists of users and sends them
back to the frontend within a new data class containing both lists.
have two backend calls which collect one of the two lists and send them
back separately.
The first solution's pro is the single backend call whereas the second solution's pro is the reusability of the separate methods in the backend.
Any advice on which solution to prefer and why?
Is there any pattern or standard I should get familiar with?

When I stumble across the requirement to get data from a server I start with doing just a single call for, more or less (depends on the problem domain), a single feature (which I would call your task-user-list).
This approach saves implementation complexity on the client's side and saves protocol overhead for transactions (TCP header, etc.).
If performance analysis shows that the call is too slow because it requests too much data (user experience suffers) then I would go with your 2nd solution.
Summed up I would start with 1st approach. Optimize (go with more complex solution) when it's necessary.

I'd prefer the two calls because of the reusability. Maybe one day you need add a third list of users for one case and then you'd need to change the method if you would only use one method. But then there may be other use cases which only required the two lists but not the three, so you would need to change code there as well. Also you would need to change all your testing methods. If your project gets bigger this makes your project hard to update or fix. Also all the modifications increase the chances of introducing new bugs as well.
Seeing the methods callable by the frontend of the backend like an interface helps.
In general an interface should be open for extension but closed on what the methods return and require. As otherwise a slight modification leads to various more modifications.

Why would you prefer Java 8 Stream API instead of direct hibernate/sql queries when working with the DB

Recently I see a lot of code in few projects using stream for filtering objects, like:
library.stream()
.map(book -> book.getAuthor())
.filter(author -> author.getAge() >= 50)
.map(Author::getSurname)
.map(String::toUpperCase)
.distinct()
.limit(15)
.collect(toList()));
Is there any advantages of using that instead of direct HQL/SQL query to the database returning already the filtered results.
Isn't the second aproach much faster?

If the data originally comes from a DB it is better to do the filtering in the DB rather than fetching everything and filtering locally.
First, Database management systems are good at filtering, it is part of their main job and they are therefore optimized for it. The filtering can also be sped up by using indexes.
Second, fetching and transmitting many records and to unmarshal the data into objects just to throw away a lot of them when doing local filtering is a waste of bandwidth and computing resources.

On a first glance: streams can be made to run in parallel; just by changing code to use parallelStream(). (disclaimer: of course it depends on the specific context if just changing the stream type will result in correct results; but yes, it can be that easy).
Then: streams "invite" to use lambda expressions. And those in turn lead to usage of invoke_dynamic bytecode instructions; sometimes gaining performance advantages compared to "old-school" kind of writing such code. (and to clarify the misunderstanding: invoke_dynamic is a property of lambdas, not streams!)
These would be reasons to prefer "stream" solutions nowadays (from a general point of view).
Beyond that: it really depends ... lets have a look at your example input. This looks like dealing with ordinary Java POJOs, that already reside in memory, within some sort of collection. Processing such objects in memory directly would definitely be faster than going to some off-process database to do work there!
But, of course: when the above calls, like book.getAuthor() would be doing a "deep dive" and actually talk to an underlying database; then chances are that "doing the whole thing in a single query" gives you better performance.

The first thing is to realize, that you can't tell from just this code, what statement is issued against the database. It might very well, that all the filtering, limiting and mapping is collected, and upon the invocation of collect all that information is used to construct a matching SQL statement (or whatever query language is used) and send to the database.
With this in mind there are many reasons why streamlike APIs are used.
It is hip. Streams and lambdas are still rather new to most java developers, so they feel cool when they use it.
If something like in the first paragraph is used it actually creates a nice DSL to construct your query statements. Scalas Slick and .Net LINQ where early examples I know about, although I assume somebody build something like it in LISP long before I was born.
The streams might be reactive streams and encapsulate a non-blocking API. While these APIs are really nice because they don't force you to block resources like threads while you are waiting for results. Using them requires either tons of callbacks or using a much nicer stream based API to process the results.
They are nicer to read the imperative code. Maybe the processing done in the stream can't [easily/by the author] be done with SQL. So the alternatives aren't SQL vs Java (or what ever language you are using), but imperative Java or "functional" Java. The later often reads nicer.
So there are good reasons to use such an API.
With all that said: It is, in almost all cases, a bad idea to do any sorting/filtering and the like in your application, when you can offload it to the database. The only exception I can currently think of is when you can skip the whole roundtrip to the database, because you already have the result locally (e.g. in a cache).

Well, your question should ideally be - Is it better to do reduction / filtering operations in the DB or fetch all records and do it in Java using Streams?
The answer isn't straightforward and any stats that give a "concrete" answer will not generalize to all cases.
The operations you are talking about are better done in the DB itself, because that is what DBs are designed for, very fast handling of data. Of course usually in case of relational databases, there will be some "book-keeping and locks" being used to ensure that independent transactions don't end up making the data inconsistent, but even with that, DBs do a pretty good job in filtering data, especially large data sets.
One case where I would prefer filtering data in Java code rather than in DB would be if you need to filter different features from the same data. For example, right now you are getting only the Author's surname. If you wanted to get all books written by the author, ages of authors, children of author, place of birth etc. Then it makes sense to get only one "read-only" copy from the DB and use parallel streams to get different information from the same data set.

Unless measured and proven for a specific scenario either could be good or equally bad. The reason you usually want to take these kind of queries to the database is because (among other things):
DB can handle much larger data then your java process
Queries in a database can be indexed (making them much faster)
On the other hand, if your data is small, using a Stream the way you did is effective. Writing such a Stream pipeline is very readable (once you talk Streams good enough).

Hibernate and other ORMs are usually way more useful for writing entities rather than reading, because they allow developers to offload ordering of specific writes to framework that almost never will "get that wrong".
Now, for reading and reporting, on the other hand (and considering we are talking DB here) an SQL query is likely to be better because there will not be any frameworks in-between, and you will be able to tune query performance in terms of database that will be invoking this query rather than in terms of your framework of choice, which gives more flexibility to how that tuning can be done, sort of.

Correctly using onUpgrade (and content providers) to handle updates without blocking the main thread, are `Loader`s pointless?

This is one of the questions that involves crossing what I call the "Hello World Gulf" I'm on the "Hello world" I can use SQLite and Content Providers (and resolvers) but I now need to cross to the other side, I cannot make the assumption that onUpgrade will be quick.
Now my go-to book (Wrox, Professional Android 4 development - I didn't chose it because of professional, I chose it because Wrox are like the O'Reilly of guides - O'Reilly suck at guides, they are reference book) only touches briefly on using Loaders, so I've done some searching, some more reading and so forth.
I've basically concluded a Loader is little more than a wrapper, it just does things on a different thread, and gives you a callback (on that worker thread) to process things in, it gives you 3 steps, initiating the query, using the results of the query, and resetting the query.
This seems like quite a thin wrapper, so question 1:
Why would I want to use Loaders?
I sense I may be missing something you see, most "utilities" like this with Android are really useful if you go with the grain so to speak, and as I said Loaders seem like a pretty thin wrapper, and they force me to have callback names which could become tedious of there are multiple queries going on
http://developer.android.com/reference/android/content/Loader.html
Reading that points out that "they ought to monitor the data and act upon changes" - this sounds great but it isn't obvious how that is actually done (I am thinking about database tables though)
Presentation
How should this alter the look of my application? Should I put a loading spinning thing (I'm not sure on the name, never needed them before) after a certain amount of time post activity creation? So the fragment is blank, but if X time elapses without the loader reporting back, I show a spiny thing?
Other operations
Loaders are clearly useless for updates and such, their name alone tells one this much, so any nasty updates and such would have to be wrapped by my own system for shunting work to a worker thread. This further leads me to wonder why would I want loaders?
What I think my answer is
Some sort of wrapper (at some level, content provider or otherwise) to do stuff on a worker thread will mean that the upgrade takes place on that thread, this solves the problem because ... well that's not on the main thread.
If I do write my own I can then (if I want to) ensure queries happen in a certain order, use my own data-structures (rather than Bundles) it seems that I have better control.
What I am really looking for
Discussion, I find when one knows why things are the way they are that one makes less mistakes and just generally has more confidence, I am sure there's a reason Loaders exist, and there will be some pattern that all of Android lends itself towards, I want to know why this is.
Example:
Adapters (for ListViews) it's not immediately obvious how one keeps track of rows (insert) why one must specify a default style (and why ArrayAdapter uses toString) when most of the time (in my experience, dare I say) it is subclasses, reading the source code gives one an understanding of what the Adapter must actually do, then I challenge myself "Can I think of a (better) system that meets these requirements", usually (and hopefully) my answer to that converges on how it's actually done.
Thus the "Hello World Gulf" is crossed.
I look forward to reading answers and any linked text-walls on the matter.

you shouldnt use Loaders directly, but rather LoaderManager

Multiple threads modifying a collection in Java?

The project I am working on requires a whole bunch of queries towards a database. In principle there are two types of queries I am using:
read from excel file, check for a couple of parameters and do a query for hits in the database. These hits are then registered as a series of custom classes. Any hit may (and most likely will) occur more than once so this part of the code checks and updates the occurrence in a custom list implementation that extends ArrayList.
for each hit found, do a detail query and parse the output, so that the classes created in (I) get detailed info.
I figured I would use multiple threads to optimize time-wise. However I can't really come up with a good way to solve the problem that occurs with the collection these items are stored in. To elaborate a little bit; throughout the execution objects are supposed to be modified by both (I) and (II).
I deliberately didn't c/p any code, as it would be big chunks of code to make any sense.. I hope it make some sense with the description above.
Thanks,

In Java 5 and above, you may either use CopyOnWriteArrayList or a synchronized wrapper around your list. In earlier Java versions, only the latter choice is available. The same is true if you absolutely want to stick to the custom ArrayList implementation you mention.
CopyOnWriteArrayList is feasible if the container is read much more often than written (changed), which seems to be true based on your explanation. Its atomic addIfAbsent() method may even help simplify your code.
[Update] On second thought, a map sounds more fitting to the use case you describe. So if changing from a list to e.g. a map is an option, you should consider ConcurrentHashMap. [/Update]
Changing the objects within the container does not affect the container itself, however you need to ensure that the objects themselves are thread-safe.

Just use the new java.util.concurrent packages.
Classes like ConcurrentLinkedQueue and ConcurrentHashMap are already there for you to use and are all thread-safe.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.