updateStateByKey from RDD - java

I am a bit new to Spark-graphx, so please forgive if this is a stupid question. I also would prefer to do this in Java, rather than Scala, if at all possible.
I need to run a graphx calculation on the RDDs of a JavaDStream, but I need to roll the results back into my state object.
I am doing the graphx calculation inside of foreachRDD, since I do not know of another way to get the RDDs from the JavaDStream;
updateStateByKey only works on the JavaDStream;
Each graph vertex maps 1-1 to each state object, so if there is a way to access the state object inside of the foreachRDD, then this would solve it. But just passing a reference to the object inside of the vertex and calling the update function inside of there strikes me as bad practise, but I could be wrong?
How would you solve this problem in Java? I am ready to restructure the calculations to a different logical flow, if there is a better way to do this.
To make this more visual, the structure looks like this:
JavaDStream<StateObject> stream = inputDataStream.updateStateByKey(function);
stream.foreachRDD(rdd -> {
    Graph<Vertex, EdgeProperty> graph = GraphImpl.apply(/* derive the Vertex and EdgeProperty from the rdd */);
    JavaRDD<Vertex> updatedVertices = graphOperation(graph);
    // How to put the contents of updatedVertices back into stream?
});

I put my graph calculation in as a transform and got things up and running up to the point of hanging during fold (in Pregel) and errors from Scala when running JavaConverters.asScalaIteratorConverter that there was no appropriate iterator...
In short, after reading online that Graphframes is potentially more stable than graphx for Java, since it is apparently easier to wrap the Scala in a Java context for Dataframes, I have abandoned this approach and moved to Graphframes. For others who have run into similar problems, I apologize that I have no solution to offer, but I am finding the Dataframe approach to work much better with my algorithm.

Related

How to get a Hazelcast map inside a Camel route?

I'm really new to Camel (and Hazelcast, at that), and I've been playing around with it a bit recently. A seemingly simple operation is causing me a lot of trouble, and I'm struggling to find any pointers anywhere.
I have a listener watching for changes on a hazelcast map. If said changes match a certain criteria, I want to grab the entire map and send it to a processor. Something like this:
from("hazelcast:map:someMap?hazelcastInstance=#hazelcastInstance")
    .filter().method(SomeFilter.class, "filterMethod")
    .???
// If the filter criteria are met, get the entire map and send it to a processor
But I am really not sure how to, well, get the entire map itself, especially using the Java DSL. The closest thing I've found is to get the map's keySet and then call getAll(keySet) on it, which seems needlessly contrived for such a simple thing. If this is really the preferred method, there is another issue: how do you pass said keySet as the parameter to the getAll operation? I.e.:
<snip>
.setHeader(HazelcastConstants.OPERATION, constant(HazelcastConstants.GET_KEYS_OPERATION))
.to("hazelcast:map:someMap?hazelcastInstance=#hazelcastInstance") // Gets the keySet just fine.
.setHeader(HazelcastConstants.OPERATION, constant(HazelcastConstants.GET_ALL_OPERATION))
.????
// I've tried .setHeader(HazelcastConstants.OPERATION_PARAM, new
// SimpleExpression("${body}")) here,
// amongst many other things, but I just get an empty object back, so it's
// pretty clear I'm messing up
// the format or parameter choice here.
.to("hazelcast:map:someMap?hazelcastInstance=#hazelcastInstance")
Using camel 2.18.0 here, by the way.
Well, after a bit more trial and error, I got it working. I swear it was one of the first things I tried, but I must've bungled it up on the first attempt and got stumped. All you really do is get the keySet and then put it into OBJECT_ID. So something like this:
.setHeader(HazelcastConstants.OPERATION, constant(HazelcastConstants.GET_KEYS_OPERATION))
.to("hazelcast:map:someMap?hazelcastInstance=#hazelcastInstance")
.setHeader(HazelcastConstants.OPERATION, constant(HazelcastConstants.GET_ALL_OPERATION))
.setHeader(HazelcastConstants.OBJECT_ID, new SimpleExpression("${body}"))
.to("hazelcast:map:someMap?hazelcastInstance=#hazelcastInstance")
(Note: it's hazelcast-map in newer versions.)
And ta-dah, you've got a Hashmap to work with. The alternative seems to use QUERY, but I couldn't figure out the proper syntax for, well, the query. I'd love to hear if anyone has a working example with a QUERY header...

Using Stream API for organising application pipeline

As far as I know, the Stream API is intended to be applied to collections. But I like the idea so much that I try to apply it both when I can and when I shouldn't.
Originally my app had two threads communicating through a BlockingQueue. The first would populate new elements. The second would make transformations on them and save them to disk. It looked like a perfect stream opportunity to me at the time.
Code I ended up with:
Stream.generate(...).flatMap(...).filter(...).forEach(...)
I'd like to put a few maps in there, but it turns out I have to drag one additional field along until forEach. So I either have to create a meaningless class with two fields and an obscure name, or use AbstractMap.SimpleEntry to carry both fields through, which doesn't look like a great deal to me.
Anyway, I've rewritten my app and it even seems to work. However, there are some caveats. As I have an infinite stream, 'the thing' can't be stopped. For now I'm starting it on a daemon thread, but this is not a solution. Business logic (such as handling connection loss/recovery, though this is probably not BL) looks alienated. Maybe I just need a proxy for this.
On the other hand, queue population is lazy for free. There is one thread instead of two (not sure how good this is). Hopefully it is a familiar pattern for other developers.
So my question is: how viable is using the Stream API for organising application flow? Are there more underwater rocks? If it's not recommended, what are the alternatives?
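One way to make such a stream stoppable is sketched below, under some assumptions: Java 9 or later (for the three-argument Stream.iterate), a String payload, and a hypothetical sentinel value STOP that the producer puts on the queue when it is done. None of these names come from the original app, and the List stands in for "save on disk".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

class QueuePipeline {
    // Hypothetical sentinel ("poison pill") telling the consumer to stop.
    static final String STOP = "__STOP__";

    // Blocks until an element is available; an interrupt is treated as a stop signal.
    static String take(BlockingQueue<String> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return STOP;
        }
    }

    // Pulls elements lazily from the queue until the sentinel arrives.
    static List<String> run(BlockingQueue<String> queue) {
        List<String> saved = new ArrayList<>();
        Stream.iterate(take(queue),               // first element
                       item -> !STOP.equals(item), // ends the stream at the sentinel
                       item -> take(queue))        // every following element
              .filter(item -> !item.isEmpty())
              .map(String::toUpperCase)
              .forEach(saved::add);                // stand-in for "save on disk"
        return saved;
    }
}
```

Elements queued after the sentinel are never read, so the consumer thread can finish normally instead of having to run as a daemon.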

Most efficient way to store 5 attributes

So I'm trying to store 5 attributes of an object, which are 5 different integers.
What would be the best way to store these? I was thinking of arrays, but arrays aren't flexible. I also need to be able to retrieve all 5 attributes, so arrays probably won't work well.
Here's some background if it helps: I am currently making a game similar to Terraria (or Minecraft in 2D).
I want to store where the object is on the map (x, y), where it is on the screen within that part of the map (x, y), and what type of object it is.
import java.awt.Point;

public class MyClass {
    private Point pointOnMap;
    private Point pointOnScreen;
    // ...
}
The Point class binds x & y values into a single object (which makes sense) and gives you useful, basic methods it sounds like you'll need, such as translate and distance. http://docs.oracle.com/javase/7/docs/api/java/awt/Point.html
First, it is not possible to predict the most efficient way to store the attributes without seeing all of your code. (And I for one don't want to :-)) Second, you haven't clearly explained what you are optimizing for. Speed? Memory usage? Minimization of GC pauses?
However, this smells of premature optimization: spending a lot of time trying to optimize the performance of something that hasn't been built, without any evidence that the performance of this part of the codebase is going to be significant.
My advice would be:
1. Pick a simple design and implement it; e.g. 5 private int variables with getters and setters. If that is inconvenient, then choose a more convenient API.
2. Complete the program.
3. Get it working.
4. Benchmark it. Does it run fast enough? If yes, stop.
5. Profile it. Pick the biggest performance hotspot and optimize that.
6. Rerun the benchmarking and profiling to check that your optimization has made things faster. If yes, then "commit" it. If not, then back it out.
7. Go to step 4.
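The "simple design" from the first step could look roughly like the sketch below. The class name and field names are made up for illustration; only the idea of five private int fields with getters and setters comes from the advice above.

```java
// Hypothetical game object holding the five integer attributes from the question.
class Tile {
    private int mapX;
    private int mapY;
    private int screenX;
    private int screenY;
    private int type;

    Tile(int mapX, int mapY, int screenX, int screenY, int type) {
        this.mapX = mapX;
        this.mapY = mapY;
        this.screenX = screenX;
        this.screenY = screenY;
        this.type = type;
    }

    int getMapX() { return mapX; }
    int getMapY() { return mapY; }
    int getScreenX() { return screenX; }
    int getScreenY() { return screenY; }
    int getType() { return type; }

    // The screen position changes as the view scrolls; the map position does not.
    void setScreenPosition(int screenX, int screenY) {
        this.screenX = screenX;
        this.screenY = screenY;
    }

    void setType(int type) { this.type = type; }
}
```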
I would suggest a HashMap where the key is objectId-attributeName and the value is the integer value, since you have to do retrieval based on a key. Lookup will be an O(1) operation on average.
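A minimal sketch of that HashMap idea follows. The class name, the method names and the "-" separator in the key are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Stores integer attributes under keys of the form "objectId-attributeName".
class AttributeStore {
    private final Map<String, Integer> values = new HashMap<>();

    private static String key(int objectId, String attributeName) {
        return objectId + "-" + attributeName;
    }

    void put(int objectId, String attributeName, int value) {
        values.put(key(objectId, attributeName), value);
    }

    // Returns null when the attribute has never been stored.
    Integer get(int objectId, String attributeName) {
        return values.get(key(objectId, attributeName));
    }
}
```

Compared with plain int fields, every value is boxed as an Integer and every lookup builds a String key, so this trades memory and speed for flexibility.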

Will I have a big performance penalty when I use Reflection api in Android application?

I want to implement a Map of functions. I have several functions for inserting different data into an SQLite database (InsertUSER_Data(), InsertMessages(), and so on), and I want to create a Map so that I can call a specific function by a key command. From my searching, there are two approaches: anonymous classes and reflection (e.g. here: How to call a method stored in a HashMap? (Java)). I really like the approach based on the Reflection API (via the Method type), i.e. Map<String, Method>, rather than anonymous classes that implement an interface. But I have doubts: does this solution carry a really big performance overhead, especially on Android, or is it not significant? A detailed explanation would be very helpful.
I cannot answer your exact point, but some basics could help:
The art of knowing the performance beforehand is subtle, difficult and dangerous. The best way to do it is to measure the difference between the two solutions (that could be as simple as using System.currentTimeMillis()).
As a rule of thumb, the time spent "in memory" will often be of little importance in terms of performance, compared with other things like IO (file or database accesses), remote calls or even some UI elements.
Unless you intend to do millions (or billions) of calls, I sincerely doubt that such a small Map would be of consequence. But again, do not believe me, test it.
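A rough way to run that measurement is sketched below. It uses System.nanoTime() rather than System.currentTimeMillis() for finer resolution; the class and the increment method are invented for the example, and a naive loop like this gives only a ballpark figure, so the numbers should be taken on the target device.

```java
import java.lang.reflect.Method;

class DispatchTiming {
    interface Action { int apply(int x); }

    static int increment(int x) { return x + 1; }

    // Calls increment() through reflection and sums the results.
    static long sumViaReflection(int calls) {
        try {
            Method method = DispatchTiming.class.getDeclaredMethod("increment", int.class);
            long sum = 0;
            for (int i = 0; i < calls; i++) {
                sum += (Integer) method.invoke(null, i); // null receiver: static method
            }
            return sum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls increment() through an interface reference and sums the results.
    static long sumViaInterface(int calls) {
        Action action = DispatchTiming::increment;
        long sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += action.apply(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int calls = 1_000_000;
        long t0 = System.nanoTime();
        sumViaReflection(calls);
        long t1 = System.nanoTime();
        sumViaInterface(calls);
        long t2 = System.nanoTime();
        System.out.printf("reflection: %d ms, interface: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```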
Finally, could your problem not be solved simply by using interfaces? That would remove this problem, and it is often easier to write, read, and debug. Something like:
interface Actionable {
    Result doStuff(Param p);
}
Your various function classes could implement it, and you could then call them afterwards without knowing exactly what is behind each one.
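Combined with the Map from the question, the interface approach might look like the sketch below. To keep it self-contained, Result and Param are replaced with String, and the command names and return values are placeholders rather than real database inserts.

```java
import java.util.HashMap;
import java.util.Map;

class CommandTable {
    // Same shape as the Actionable interface above, with String standing in for Result and Param.
    interface Actionable {
        String doStuff(String param);
    }

    private final Map<String, Actionable> actions = new HashMap<>();

    CommandTable() {
        // Hypothetical commands; real implementations would run the SQLite inserts.
        actions.put("insertUser", param -> "user:" + param);
        actions.put("insertMessage", param -> "message:" + param);
    }

    // Looks up the command by key and runs it.
    String run(String command, String param) {
        Actionable action = actions.get(command);
        if (action == null) {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
        return action.doStuff(param);
    }
}
```

The lambdas can be written as anonymous classes where Java 8 language features are not available.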

Saving a game (Android app.)

This is somewhat of an open question.
I'm in the process of developing a simple game for Android, and I've gotten to the point where I'm trying to enable the user to save their progress and return later.
As I'm a beginner, I'm not exactly sure where to start, so I was hoping some of you might have at least some suggestions.
A little info on the setup of the game:
All animation is done in a thread through a canvas and alternation of stored bitmap frames based on a 30 ms loop.
Everything is an object: the characters are objects, and the background is simply a 2D array of objects. Each object is generally referenced and created dynamically through a HashMap.
Now how to save? I know I could brute force it, and simply save coordinates and current actions blah blah etc. etc. for each object in each map.
But is there a better way to do this? I've briefly read that in python there's a method of sterilizing objects called "pickle," and there is something similar called "kryo." Am I looking in the right direction?
You should look into Java serialization. It's not perfect, it has problems, but it's the safest, quickest way to turn a complex tree of objects into something that you can save to a file or a db, and load it back when you need.
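As a rough illustration of built-in Java serialization, the sketch below round-trips a small object tree through a byte array. The GameObject, GameState and SaveFiles classes are invented for the example; in a real game the byte array would be replaced by a file stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Every class in the saved tree must implement Serializable.
class GameObject implements Serializable {
    private static final long serialVersionUID = 1L;
    final String kind;
    final int x;
    final int y;

    GameObject(String kind, int x, int y) {
        this.kind = kind;
        this.x = x;
        this.y = y;
    }
}

class GameState implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<GameObject> objects = new ArrayList<>();
}

class SaveFiles {
    // Writes the whole object tree in one call.
    static byte[] save(GameState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the tree, including the nested list and its elements.
    static GameState load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (GameState) in.readObject();
        }
    }
}
```

Bitmaps and other platform resources are generally not serializable, so fields holding them would need to be marked transient and reloaded after the state is restored.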
Else, there's always the possibility to use your own specific serialization using INSERT SQL queries, etc. But be very careful, it's easy to miss parts of what you want to save / restore. One example of that would be to turn your objects tree into XML and save that XML as a file. There are very good 3rd-party libs to map objects to XML and back in Java.
Well.. That's not STERILIZATION, but SERIALIZATION.. Which is a programming technique. And serialization is also the technique you want to use.
It doesn't matter whether you use a predefined method or something you write on your own; the only thing that matters is to loop across the objects and write to the file (or other saving structure) the data that needs to be reloaded later.
Anyway yes, you're looking the right way.
The best way to do it is to implement a serialization interface. Each object for which the serialize() method is called must save its data and then call the serialize() method for each child object it owns.
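A hand-written version of that idea is sketched below. The Savable interface, the Sprite and Level classes and the static deserialize() helpers are assumptions for the example, not an existing API; the format is simply the fields written in a fixed order.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical serialization interface: each object writes its own data.
interface Savable {
    void serialize(DataOutputStream out) throws IOException;
}

class Sprite implements Savable {
    final int x;
    final int y;

    Sprite(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void serialize(DataOutputStream out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Reads the fields back in the same order they were written.
    static Sprite deserialize(DataInputStream in) throws IOException {
        int x = in.readInt();
        int y = in.readInt();
        return new Sprite(x, y);
    }
}

class Level implements Savable {
    final String name;
    final List<Sprite> sprites = new ArrayList<>();

    Level(String name) {
        this.name = name;
    }

    @Override
    public void serialize(DataOutputStream out) throws IOException {
        out.writeUTF(name);
        out.writeInt(sprites.size());  // child count, so the reader knows when to stop
        for (Sprite sprite : sprites) {
            sprite.serialize(out);     // each child saves itself
        }
    }

    static Level deserialize(DataInputStream in) throws IOException {
        Level level = new Level(in.readUTF());
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            level.sprites.add(Sprite.deserialize(in));
        }
        return level;
    }
}
```

Wrapping a FileOutputStream in the DataOutputStream writes the save file, and wrapping a FileInputStream in the DataInputStream reads it back.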
