Interface for Lazy Loaded List of Objects - java

We have a whole bunch of data sources where we consult some REST API or other and get back a list of objects. I'm trying to design an abstraction layer that doesn't need to know how to contact any specific API instance or how to semantically interpret the objects, but that guarantees that we get back a list of objects from whichever class implements the interface we need at the time.
I expect the number of results to be quite large at times (but always finite!) and often slow to retrieve, so I need something that does not load everything into memory all at once but allows the results of the list to be worked with as they become available. I'm fine if the list blocks on next or hasNext or whatever the appropriate analogue is.
What's the most appropriate abstraction / approach for achieving these goals and how is it implemented?
My gut tells me it ought to be some flavor of Java 8 Streams, possibly created via the Java 9 Stream.iterate method, but I'm not too familiar with functional programming paradigms and can't for the life of me figure out how one would populate the elements of the Stream as they became available from the REST calls and close it out when it's finished.

It turns out I was confusing myself by conflating two issues: how to provide an Iterator in an Interface (which is trivial), and how to populate that Iterator in the background. I ended up with roughly the following:
Create a custom abstract class which implements Iterator. That class has an internal BlockingQueue and an internal List. It also defines an abstract method which is intended to perform all the activities of population in a single invocation.
The first time hasNext() is called, kick off a daemon thread which invokes that abstract method. Then, while the thread is alive (meaning it's still populating the BlockingQueue) or the List isn't empty (meaning not all elements have been consumed via next()), poll against the BlockingQueue until it has at least one element in it. Once it does, remove that element and add it to the List. next() merely returns elements from the List.
This results in lazy loading (nothing occurs until hasNext() is called for the first time) that also happens asynchronously in the background -- the caller will be able to process things as soon as they're available (hasNext() will block if things aren't available), and it doesn't use up an unreasonable amount of memory (the BlockingQueue will block if it has too many elements).
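A minimal sketch of that design follows. The names (LazyIterator, populate, offer) and the queue capacity of 64 are my own, not from the original answer; the staging field plays the role of the internal List described above:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the described design: an abstract Iterator backed by a
// BlockingQueue that a daemon thread populates in the background.
abstract class LazyIterator<T> implements Iterator<T> {
    private final BlockingQueue<T> queue = new ArrayBlockingQueue<>(64); // bounds memory use
    private Thread producer; // started lazily on the first hasNext() call
    private T staged;        // element staged for the next call to next()

    /** Subclasses perform all population here, pushing each element via offer(). */
    protected abstract void populate() throws InterruptedException;

    protected final void offer(T element) throws InterruptedException {
        queue.put(element); // blocks when the queue is full
    }

    @Override
    public boolean hasNext() {
        if (producer == null) { // lazy: nothing happens until first call
            producer = new Thread(() -> {
                try { populate(); } catch (InterruptedException ignored) { }
            });
            producer.setDaemon(true);
            producer.start();
        }
        try {
            // Block while the producer is still running or elements remain.
            while (staged == null && (producer.isAlive() || !queue.isEmpty())) {
                staged = queue.poll(10, TimeUnit.MILLISECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return staged != null;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result = staged;
        staged = null;
        return result;
    }
}
```

A caller subclasses it, implements populate() with the REST paging logic, and then iterates normally; hasNext() blocks until an element arrives or the producer finishes.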


How to get an iterator from an akka streams Source?

I'm trying to create a flow that I can consume via something like an Iterator.
I'm implementing a library that exposes an iterator-like interface, so that would be the simplest thing for me to consume.
The graph I have designed so far is essentially a Source<Iterator<DataRow>>. One option I see is to flatten it to Source<DataRow> and then use http://doc.akka.io/japi/akka/current/akka/stream/javadsl/StreamConverters.html#asJavaStream-- followed by https://docs.oracle.com/javase/8/docs/api/java/util/stream/BaseStream.html#iterator--
But given that there will potentially be many rows, I'm wondering whether it would make sense to avoid the flattening step (at least within the Akka Streams context; I'm assuming there is some minor per-element overhead when elements pass through stages), or whether there's a more direct way.
Also, I'm curious how backpressure works in the created stream, especially the child Iterator; does it only buffer one element?
Flattening Step
Flattening a Source<Iterator<DataRow>> to a Source<DataRow> does add some overhead, since you'll have to use flatMapConcat, which eventually creates a new GraphStage.
However, if you have "many" rows then this separate stage may come in handy, since it provides concurrency between the flattening step and the rest of the stream.
Backpressure
If you look at the code of StreamConverters.asJavaStream you'll see that there is a QueueSink that is spawning a Future to pull the next element from the akka stream and then doing an Await.result(nextElementFuture, Inf) to wait on the Future to complete so the next element can be forwarded to the java Stream.
Answering your question: yes the child Iterator only buffers one element, but the QueueSink has a Future which may also have the next DataRow. Therefore the javaStream & Iterator may have 2 elements buffered, on top of however much buffering is going on in your original akka Source.
Alternatively, you may implement an Iterator using prefixAndTail(1) under the hood for implementing hasNext and next.

Using Java Concurrent Collections in Scala [duplicate]

I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain - and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mixin the Synchronized traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the state of the actors. The whole point of actors is to avoid tricky, error-prone and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes its messages one by one, and that you will never have two threads consuming messages for the same actor.
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
The actor send rule: the send of a message to an actor happens-before the receive of that message by the same actor.
The actor subsequent processing rule: processing of one message happens-before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
What collection type should I use in a concurrent access situation, and how is it used?
See #hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.
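To make that guarantee concrete, here is a toy sketch, emphatically not how Akka is actually implemented: an "actor" reduced to a mailbox drained by a single thread, which is why its internal state needs no synchronization. MiniActor, tell, and the Runnable-as-message encoding are all illustrative inventions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy illustration of the actor processing model: one mailbox, one consumer
// thread, therefore at most one message being processed at any given time.
class MiniActor {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    final List<String> state = new ArrayList<>(); // touched by the actor thread only

    MiniActor() {
        Thread loop = new Thread(() -> {
            try {
                while (true) mailbox.take().run(); // one message at a time
            } catch (InterruptedException done) {
                // actor shut down
            }
        });
        loop.setDaemon(true);
        loop.start();
    }

    void tell(Runnable message) { mailbox.add(message); } // safe from any thread
}
```

Because only the mailbox thread ever runs the messages, state can be a plain ArrayList with no locking, which is exactly the property the actor model gives you.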

Determining queue size

I have to implement a queue to which objects will be added and removed by two different threads at different times, based on some factor. My problem is that the requirement says the queue (the queue itself plus the data it holds) should not take more than 200 KB. If the size reaches 200 KB, the pushing thread should wait for space to become available before pushing more data. The objects pushed may vary in size. I can create a Java queue, but its size will return the total number of objects pushed instead of the total memory used. How do I determine the total size of the data my queue refers to?
Consider the object pushed as
class A {
    int x;
    byte[] buf; // array size varies per object
}
There is no out of the box functionality for this in Java. (In part, because there is no easy way to know if the objects added to the collection are referenced elsewhere and therefore if adding them takes up additional memory.)
For your use case, you would probably be best off just subclassing a queue implementation. Override add to increase a counter by the size of the object (obviously you will have to make this calculation thread-safe) and to throw an IllegalStateException if there isn't room. Similarly, decrement your counter in an overridden remove method.
The method of determining how much space to add to the counter could vary. Farlan suggested using this and that looks like it would work. But since you are dealing with a byte array, the size of the data you are adding might already be known to you. You will also have to consider whether you want to account for any of the overhead: the object itself takes some space, as does the reference inside the queue, plus the queue object itself. You could figure out exact values for those, but since your requirement seems to be just to prevent running out of memory, rough estimates are probably fine as long as you are consistent.
The details of what queue class you want to subclass may depend on how much contention you think there will be between the threads. But it sounds like you have a handle on the sync issues.
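A minimal sketch of the counter idea, assuming the payload size is estimated as just the byte array's length (class and method names are mine, and real per-object overhead is deliberately ignored):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: a queue bounded by the total byte size of its payloads rather than
// by element count. Producers block in put() until enough space frees up.
class ByteBoundedQueue {
    private final Deque<byte[]> queue = new ArrayDeque<>();
    private final int maxBytes;
    private int usedBytes = 0;

    ByteBoundedQueue(int maxBytes) { this.maxBytes = maxBytes; }

    synchronized void put(byte[] item) throws InterruptedException {
        while (usedBytes + item.length > maxBytes) {
            wait(); // producer waits until a take() frees space
        }
        queue.addLast(item);
        usedBytes += item.length;
        notifyAll(); // wake consumers waiting for data
    }

    synchronized byte[] take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait(); // consumer waits until a put() adds data
        }
        byte[] item = queue.removeFirst();
        usedBytes -= item.length;
        notifyAll(); // wake producers waiting for space
        return item;
    }

    synchronized int usedBytes() { return usedBytes; }
}
```

For the 200 KB requirement you would construct it with new ByteBoundedQueue(200 * 1024), possibly padding each item's length with a rough per-object overhead estimate.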

java - sharing data between threads - atomicreference or synchronize

I am making a 2-player videogame, and the opponent's position gets updated on a separate thread, which has a socket that is continuously listening. What I want to share is position and rotation.
As it is a videogame, I don't want the main thread to be blocked (or only for the minimum time possible) and I don't want performance to be affected. From what I've seen, the normal thing to do to share this info would be something like
class SharedInfo {
    private Info current; // position and rotation
    public synchronized Info read() { return current; }
    public synchronized void write(Info info) { current = info; }
}
but this would block the read in the main thread (the same one that draws the videogame) until all three values (or even more info in the future) are written, and I've also read that synchronized is very expensive (it's also important to say this game targets Android, so performance is very important).
But I was thinking that maybe holding SharedInfo inside an AtomicReference and eliminating synchronized would make it more efficient, because reads would only stall while the reference itself is being updated (there would be no write method; I would create a new object and set it on the AtomicReference). Also, they say that the atomic* classes use hardware operations and are more efficient than synchronized.
What do you think?
Consider using a queue for this; Java has some nice concurrent queue implementations. Look up the BlockingQueue interface in java.util.concurrent, and who implements it. Chances are you will find strategies implemented that you hadn't even considered.
Before you know it, you will want to communicate more than just positions between your threads, and with a queue you can stick different type of objects in there, maybe at different priorities, etc.
If in your code you use Interfaces (like Queue or BlockingQueue) as much as possible (i.e. anywhere but the place where the specific instance is constructed), it is really easy to swap out what exact type of Queue you are using, if you need different functionality, or just want to play around.
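As a rough sketch of the queue approach (PositionUpdate and OpponentChannel are hypothetical names, not from the question): the socket thread publishes immutable snapshots, and the render loop drains whatever has arrived without ever blocking:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Immutable snapshot of the opponent's state; safe to hand between threads.
final class PositionUpdate {
    final float x, y, rotation;
    PositionUpdate(float x, float y, float rotation) {
        this.x = x; this.y = y; this.rotation = rotation;
    }
}

// Sketch: network thread enqueues updates, game loop polls without blocking.
class OpponentChannel {
    private final BlockingQueue<PositionUpdate> updates = new LinkedBlockingQueue<>();

    // Called from the socket-listening thread.
    void publish(PositionUpdate update) { updates.add(update); }

    // Called from the game loop each frame: never blocks, returns null if
    // nothing new arrived, and skips stale updates to keep only the newest.
    PositionUpdate pollLatest() {
        PositionUpdate latest = null, next;
        while ((next = updates.poll()) != null) latest = next;
        return latest;
    }
}
```

Because each PositionUpdate is immutable and created fresh per message, no field of it ever needs locking; the queue's own internal synchronization is the only coordination point.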

Threading a data set?

I have a bunch of objects. They don't need to be sorted or ordered. They have one method that needs to be called: myObject.update(). Eventually they will need to be removed from the container.
Right now it's single threaded and the update() method is CPU bound (no I/O). We have a nice server with 16 "cores" (cores + HT).
What I would like to do is have one container object responsible for "dishing out" objects, and then 15 threads that ask the container for a new object when they need one. Is this a good way to go about it?
What is a thread safe data structure to hold the objects? Or should I just make the container object responsible for not sending out the same object twice?
In java, good candidates for your problem are LinkedBlockingQueue and ArrayBlockingQueue.
They provide first-in-first-out functionality with an optional bound on the number of elements they hold at one time.
Alternatively, a good approach is to use an ExecutorService, which holds a thread pool and an internal queue for serving the threads on-demand.
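A minimal sketch of the ExecutorService approach (MyObject's body and the pool size of 15 are illustrative; the pool replaces the hand-rolled container-plus-threads design):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the objects described in the question.
class MyObject {
    static final AtomicInteger updated = new AtomicInteger();
    void update() { updated.incrementAndGet(); } // CPU-bound work goes here
}

class UpdateRunner {
    // Submit one update() task per object; the pool's internal queue does the
    // "dishing out", guaranteeing no object is handed to two threads.
    static void updateAll(List<MyObject> objects) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(15);
        for (MyObject o : objects) pool.submit(o::update);
        pool.shutdown();                            // accept no new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all updates
    }
}
```

The fixed pool of 15 threads matches the "16 cores minus one for the dispatcher" sizing in the question, and the executor's work queue removes the need to make the container itself thread-safe.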
