Use shared variable in Spark


Hi, I am using BLAS to do some math computation in Spark. I have two JavaPairRDDs that both have a Double[] field, and I want to calculate the dot product as follows:
userPairRDD.cartesian(itemPairRDD).mapToPair(
    new PairFunction<Tuple2<Tuple2<String, Double[]>, Tuple2<String, Double[]>>, String, ItemAndWeight>() {
        @Override
        public Tuple2<String, ItemAndWeight> call(Tuple2<Tuple2<String, Double[]>, Tuple2<String, Double[]>> tuple2Tuple2Tuple2) throws Exception {
            BLAS.getInstance().ddot("......");
            .......
        }
    }
)
My question is: in call(), I call BLAS.getInstance() every time, which might be inefficient. Can I create a single BLAS object outside call() and use that one object to do ddot()?
Is there anything I need to take care of, given that this is a distributed program? Thanks in advance.

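You don't need a shared variable in this case. BLAS.getInstance() just returns a static singleton instance, so there is nothing inefficient here. Because the instance is static, each executor JVM resolves its own copy, and nothing has to be shipped from the driver.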
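To illustrate why the repeated call is cheap, here is a minimal sketch of the static singleton pattern. FakeBlas is a hypothetical stand-in, not the real BLAS class, and its ddot signature is simplified for the example:

```java
public class SingletonSketch {
    // Hypothetical stand-in for a library class that exposes getInstance().
    static final class FakeBlas {
        // Built once per JVM, when the class is initialized.
        private static final FakeBlas INSTANCE = new FakeBlas();

        private FakeBlas() {
        }

        // Just returns the existing object; no allocation happens here.
        static FakeBlas getInstance() {
            return INSTANCE;
        }

        // Simplified dot product; the real ddot takes additional arguments.
        double ddot(int n, double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += x[i] * y[i];
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {4.0, 5.0, 6.0};
        // Both calls return the same object.
        System.out.println(FakeBlas.getInstance() == FakeBlas.getInstance()); // prints true
        System.out.println(FakeBlas.getInstance().ddot(3, x, y)); // prints 32.0
    }
}
```

If the library follows this pattern, calling getInstance() inside call() costs a static field read, so hoisting it out gains essentially nothing.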

Related

Comparing object references with multiple instantiation

G'day,
I've been trying to finish an assignment and am learning a lot about OO and Java. I'm nearing the end of this project, so I'm very happy, but I thought I'd try to get an interesting question out there, at least interesting for me because of my lack of understanding.
Some background to help clarify: I've used 2D ArrayLists to model a map. I've made a "copy" of the original so that I can update movements and locations; some are permitted and some are not.
I use a method to determine which movement is okay and then update the "copy". There are two classes involved here.
class GameEngine {
    void runGameLoop(ArrayList<ArrayList<String>> map) {
        World w = new World();
        w.setOriginalMap(map);
        while (true) {
            w.checkMovement(/* keyEvent, player: arguments not shown in the post */);
        }
    }
}

class World {
    ArrayList<ArrayList<String>> originalMap;

    void setOriginalMap(ArrayList<ArrayList<String>> map) {
        originalMap = new ArrayList<>(map);
    }

    void checkMovement(String keyEvent, Player obj1) {
        ArrayList<ArrayList<String>> copyMap = originalMap;
        obj1.setPlayer();
        printMap(copyMap, obj1);
    }
}
The issue is that the movement is updated on the map, but the player now appears in multiple locations, including all the previous ones... my map has turned into more of a route. Does this have something to do with using the same object reference? I make a "copy" inside the class method, so isn't it local?
Would appreciate some insight.
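As a hedged illustration of the reference question above: assigning a list to a new variable only creates a second reference, and new ArrayList<>(list) copies the outer list while still sharing the rows. The sketch below uses hypothetical names (MapCopySketch, deepCopy) and a one-row map to show the difference:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapCopySketch {
    // Copies the outer list and every row, so edits to the copy never reach the source.
    // Copying the rows is enough here because String elements are immutable.
    static List<List<String>> deepCopy(List<List<String>> source) {
        List<List<String>> copy = new ArrayList<>(source.size());
        for (List<String> row : source) {
            copy.add(new ArrayList<>(row));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<List<String>> original = new ArrayList<>();
        original.add(new ArrayList<>(Arrays.asList(".", ".")));

        List<List<String>> shallow = new ArrayList<>(original); // new outer list, same row objects
        List<List<String>> deep = deepCopy(original);           // new outer list, new rows

        shallow.get(0).set(0, "P"); // writes through to the original's row
        deep.get(0).set(1, "P");    // touches only the deep copy

        System.out.println(original); // prints [[P, .]]
        System.out.println(deep);     // prints [[., P]]
    }
}
```

Under this reading, taking a fresh deep copy of the original map before drawing the player each turn would keep earlier positions from accumulating.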

How to improve performance in recursive method with Loop in Java

I'm trying to use memoization pattern in my recursive operation, but something is going wrong.
Below is my method.
private Map<String, MyDTO> map = new ConcurrentHashMap<>();

private void searchRecursive(String uuid) {
    if (!map.containsKey(uuid)) {
        MyDTO obj = myClient.getMyObject(uuid);
        if ("one".equals(obj.getType()) || "two".equals(obj.getType())) {
            if (Objects.nonNull(obj.getChildren())) {
                obj.getChildren().forEach(child -> searchRecursive(child.getId()));
            }
        } else if ("three".equals(obj.getType())) {
            map.put(uuid, obj);
        }
    }
}
I would like to improve the performance of this operation!
Thank you very much for the help!

As @John3136 said, getType() and getChildren() are each called more than once; it is better to store their results in local variables and reduce the number of calls.
I'm guessing the objects you are searching form a graph or tree, so you could try replacing the recursion with an iterative breadth-first search (BFS) that walks the children using a queue-like data structure.
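The BFS suggestion could be sketched roughly as follows. Everything here is an assumption made for illustration: Node is a hypothetical stand-in for MyDTO, children are represented by their ids, and an in-memory map replaces myClient.getMyObject(uuid). The visited set also prevents fetching the same id twice, which the original map only does for type "three" objects.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BfsSearchSketch {
    // Hypothetical stand-in for MyDTO: an id, a type, and the ids of its children.
    static final class Node {
        final String id;
        final String type;
        final List<String> childIds;

        Node(String id, String type, String... childIds) {
            this.id = id;
            this.type = type;
            this.childIds = Arrays.asList(childIds);
        }
    }

    // The store map stands in for the remote client, which is not shown in the question.
    static Map<String, Node> search(Map<String, Node> store, String rootId) {
        Map<String, Node> found = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(rootId);
        visited.add(rootId);
        while (!queue.isEmpty()) {
            String id = queue.poll();
            Node node = store.get(id);
            if (node == null) {
                continue; // unknown id, nothing to expand
            }
            String type = node.type; // read once and reuse
            if ("one".equals(type) || "two".equals(type)) {
                for (String childId : node.childIds) {
                    if (visited.add(childId)) { // add() returns false for ids already seen
                        queue.add(childId);
                    }
                }
            } else if ("three".equals(type)) {
                found.put(id, node);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Node> store = new HashMap<>();
        for (Node n : Arrays.asList(
                new Node("root", "one", "a", "b"),
                new Node("a", "two", "b", "c"),
                new Node("b", "three"),
                new Node("c", "three"))) {
            store.put(n.id, n);
        }
        System.out.println(search(store, "root").keySet()); // prints [b, c]
    }
}
```

If the remote call dominates the running time, avoiding repeated fetches is likely to matter more than the switch from recursion to a queue.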

Is it safe to use hashmap value reference when it may be updated in another thread

Is it safe to use getParameter() like this?
I can tolerate the returned value not being the latest, as long as a later call gets the latest value of the Parameter.
The code looks like this:
public class ParameterManager {
    private volatile Map<String, Parameter> scenarioParameterMap = Maps.newHashMap();

    public ParameterManager(String appName) throws DarwinClientException {
    }

    public Parameter getParameter(String scenario) {
        return scenarioParameterMap.get(scenario);
    }

    public void update(String scenario, Map<String, String> parameters) {
        if (scenarioParameterMap.containsKey(scenario)) {
            Parameter parameter = scenarioParameterMap.get(scenario);
            parameter.update(parameters);
        } else {
            scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));
        }
    }
}
Or should update() just use:
scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));

volatile does not help here at all. It only protects the reference held in scenarioParameterMap, not the contents of that map. Since you're not reassigning it to point to a different map at any point, volatile is extraneous.
This code is not threadsafe. You need to use proper synchronization, be that via synchronized, or using a concurrent map, or other equivalent method.
You wrote: "Since I can tolerate the value is not latest."
Thread non-safety can be more dangerous than that. It could give you wrong results. It could crash. You can't get by thinking that the worst case is stale data. That's not the case.
Imagine that Map.put() is in the middle of updating the map and has the internal data in some temporarily invalid state. If Map.get() runs at the same time who knows what might go wrong. Sometimes adding an entry to a hash map will cause the whole thing to be reallocated and re-bucketed. Another thread reading the map at that time would be very confused.
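One way to apply the "concurrent map" option is sketched below. It is only a sketch under assumptions: Parameter here is a hypothetical immutable class (the real one is not shown in the question), and update() uses the replace-with-a-new-object variant that the question itself proposes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParameterManagerSketch {
    // Hypothetical immutable Parameter: its state never changes after construction.
    static final class Parameter {
        final String scenario;
        final Map<String, String> values;

        Parameter(String scenario, Map<String, String> values) {
            this.scenario = scenario;
            // Defensive copy so later changes to the caller's map are not visible here.
            this.values = Collections.unmodifiableMap(new HashMap<>(values));
        }
    }

    // ConcurrentHashMap makes individual get() and put() calls safe across threads.
    private final Map<String, Parameter> scenarioParameterMap = new ConcurrentHashMap<>();

    public Parameter getParameter(String scenario) {
        return scenarioParameterMap.get(scenario);
    }

    public void update(String scenario, Map<String, String> parameters) {
        // Replace the entry instead of mutating it: a reader sees the old or the new
        // Parameter, never a half-updated one.
        scenarioParameterMap.put(scenario, new Parameter(scenario, parameters));
    }

    public static void main(String[] args) {
        ParameterManagerSketch manager = new ParameterManagerSketch();
        Map<String, String> params = new HashMap<>();
        params.put("timeout", "30");
        manager.update("checkout", params);
        System.out.println(manager.getParameter("checkout").values.get("timeout")); // prints 30
    }
}
```

With this shape, a reader that holds on to a Parameter may see a stale value, but it is always a complete one, which matches the tolerance described in the question.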

RxJava - Observable chain with concatWith and map

I'm having trouble properly implementing the following scenario using RxJava (v1.2.1):
I need to handle a request for some data object. I have a meta-data copy of this object which I can return immediately, while making an API call to a remote server to retrieve the whole object data. When I receive the data from the API call I need to process the data before emitting it.
My solution currently looks like this:
return Observable.just(localDataCall())
.concatWith(externalAPICall().map(new DataProcessFunction()));
The first Observable, localDataCall(), should emit the local data, which is then concatenated with the remote API call, externalAPICall(), mapped to the DataProcessFunction.
This solution works but it has a behavior that is not clear to me. When the local data call returns its value, this value goes through the DataProcessFunction even though it's not connected to the first call.
Any idea why this is happening? Is there a better implementation for my use case?

I believe that the issue lies in some part of your code that has not been provided. The data returned from localDataCall() is independent of the new DataProcessFunction() object, unless somewhere within localDataCall you use another DataProcessFunction.
To prove this to you I will create a small example using io.reactivex:rxjava:1.2.1:
public static void main(String[] args) {
    Observable.just(foo())
        .concatWith(bar().map(new IntMapper()))
        .subscribe(System.out::println);
}

static int foo() {
    System.out.println("foo");
    return 0;
}

static Observable<Integer> bar() {
    System.out.println("bar");
    return Observable.just(1, 2);
}

static class IntMapper implements Func1<Integer, Integer> {
    @Override
    public Integer call(Integer integer) {
        System.out.println("IntMapper " + integer);
        return integer + 5;
    }
}
This prints to the console:
foo
bar
0
IntMapper 1
6
IntMapper 2
7
As can be seen, the value 0 created in foo never gets processed by IntMapper; IntMapper#call is only called twice for the values created in bar. The same can be said for the value created by localDataCall. It will not be mapped by the DataProcessFunction object passed to your map call. Just like bar and IntMapper, only values returned from externalAPICall will be processed by DataProcessFunction.

.concatWith() concatenates all items emitted by one observable with all items emitted by the other observable, so it is no wonder that .map() is being called twice.
But I do not understand why you need localDataCall() at all in this scenario. Perhaps you might want to use .switchIfEmpty() or .switchOnNext() instead.

Saving on Instance Variables

Our server recently has been going down a lot and I was tasked to improve the memory usage of a set of classes that was identified to be the culprit.
I have code which initializes an instance of an object and goes like this:
boolean var1;
boolean var2;
// ...
boolean var100;

void setup() {
    var1 = map.hasFlag("var1");
    var2 = map.hasFlag("var2");
    // ...
    if (map.hasFlag("some flag")) {
        doSomething();
    }
    if (var1) {
        // increment something
    }
    if (var2) {
        // increment something
    }
}
The setup code takes about 1300 lines. My question is whether this can be made more memory-efficient, given how many instance variables it uses.
The instance variables by the way are used in a "main" method handleRow() where for example:
handleRow() {
    if (var1) {
        doSomething();
    }
    // ...
    if (var100) {
        doSomething();
    }
}
One solution I am thinking of is to change the implementation by removing the instance variables from the setup method and reading each flag directly from the map when I need it:
handleRow() {
    if (map.hasFlag("var1")) {
        doSomething();
    }
    // ...
    if (map.hasFlag("var100")) {
        doSomething();
    }
}
That's one solution I am considering, but I would like to hear the input of the community. :)

If these are really all boolean variables, consider using a BitSet instead. You may find that reduces the memory footprint by a factor of 8 or possibly even 32 depending on padding.
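A minimal sketch of the BitSet idea, assuming each flag is given a fixed index; the constant and method names here are hypothetical:

```java
import java.util.BitSet;

public class FlagBitsSketch {
    // Hypothetical indices, one per former boolean field.
    static final int VAR1 = 0;
    static final int VAR2 = 1;
    static final int FLAG_COUNT = 100;

    // One bit per flag instead of one boolean field per flag.
    private final BitSet flags = new BitSet(FLAG_COUNT);

    void setFlag(int index, boolean value) {
        flags.set(index, value);
    }

    boolean hasFlag(int index) {
        return flags.get(index);
    }

    // Number of flags currently switched on.
    int enabledCount() {
        return flags.cardinality();
    }

    public static void main(String[] args) {
        FlagBitsSketch row = new FlagBitsSketch();
        row.setFlag(VAR1, true);
        System.out.println(row.hasFlag(VAR1));    // prints true
        System.out.println(row.hasFlag(VAR2));    // prints false
        System.out.println(row.enabledCount());   // prints 1
    }
}
```

Whether this saves anything in practice depends on how many instances exist, since the BitSet object carries its own overhead.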

Even if every boolean took 16 bytes with overhead (which is a bit much, imho), 100 boolean variables would take only about 1.6 KB per instance, so I do not think this is the source of the problem.
Replacing these flags with calls into the map will negatively impact performance, so your change will probably make things worse.
Before you go redesigning your code (a command pattern looks like a good candidate) you should look further into where the memory leak is that you are asked to solve.
Look for maps that the classes keep adding to, collections that are static variables etc. Once you find out where the reason for the memory growth lies you can decide which part of your classes to refactor.

You could save memory at the cost of time (but if your memory use is a real problem, then it's probably a net gain in time) by storing the values in a bitset.
If the class is immutable (once you create it, you never change it), then you can perhaps gain by using a variant of the Flyweight pattern. Here you keep a store of in-use objects in a weak hashmap and create your objects in a factory. If you ask for an object that is identical to an existing one, the factory returns the existing object instead. The saving in memory can be negligible or massive, depending on how many repeated objects there are.
If the class is not immutable, but there is such repetition, you can still use the Flyweight pattern, but you will have to do a sort of copy-on-write, where altering an object makes it change from using a shared internal representation to one of its own (or a new one from the flyweight store). This is yet more complicated and yet more expensive in terms of time, but again, if it's appropriate, the savings can be great.
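The immutable Flyweight variant could look roughly like the sketch below. FlagSet and its factory method are hypothetical names, and this is one possible shape rather than a definitive implementation. The pool stores weak references as values so that an entry does not keep its own key alive.

```java
import java.lang.ref.WeakReference;
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

public class FlyweightSketch {
    // Hypothetical immutable value object holding a set of flags.
    static final class FlagSet {
        private final BitSet bits;

        private FlagSet(BitSet bits) {
            this.bits = (BitSet) bits.clone(); // private copy keeps the object immutable
        }

        boolean get(int index) {
            return bits.get(index);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof FlagSet && bits.equals(((FlagSet) other).bits);
        }

        @Override
        public int hashCode() {
            return bits.hashCode();
        }
    }

    // Weak keys let unused FlagSets be garbage collected; values are weak as well,
    // because a strong value would keep its own key reachable.
    private static final Map<FlagSet, WeakReference<FlagSet>> POOL = new WeakHashMap<>();

    // Factory: returns the pooled instance when an equal FlagSet is already in use.
    static synchronized FlagSet of(BitSet bits) {
        FlagSet candidate = new FlagSet(bits);
        WeakReference<FlagSet> ref = POOL.get(candidate);
        FlagSet existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing;
        }
        POOL.put(candidate, new WeakReference<>(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        FlagSet first = of(bits);
        FlagSet second = of((BitSet) bits.clone());
        System.out.println(first == second); // prints true
    }
}
```

The saving only materializes when many objects share identical flag combinations, as the answer notes.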

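You can use the command pattern:
import java.util.HashMap;
import java.util.Map;

public enum Command {
    SAMPLE_FLAG1("FLAG1") {
        @Override
        public void call() {
            // Do your increment here
        }
    },
    SAMPLE_FLAG2("FLAG2") {
        @Override
        public void call() {
            // Do your increment here
        }
    };

    // Lookup table from flag name to command, filled once when the enum is loaded
    private static final Map<String, Command> commands = new HashMap<String, Command>();

    static {
        for (Command cmd : Command.values()) {
            commands.put(cmd.name, cmd);
        }
    }

    private final String name;

    private Command(String name) {
        this.name = name;
    }

    // Returns null when no command is registered under the given name
    public static Command fromString(String cmd) {
        return commands.get(cmd);
    }

    public abstract void call();
}
and then:
for (String flag : flagMap.keySet()) {
    Command command = Command.fromString(flag);
    if (command != null) { // skip flags that have no matching command
        command.call();
    }
}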
