Spark Scala Dynamic creation of Serializable object

I need to use a tester in a Scala Spark filter, where the tester implements Java's Predicate interface and its concrete class name is received as an argument.
I'm doing something like this:
val tester = Class.forName(qualifiedName).newInstance().asInstanceOf[Predicate[T]]
var filtered = rdd.filter(elem => tester.test(elem))
The problem is that at runtime I get a Spark "Task not serializable" exception because my specific Predicate class is not Serializable.
If I do
val tester = Class.forName(qualifiedName).newInstance()
.asInstanceOf[Predicate[T] with Serializable]
var filtered = rdd.filter(elem => tester.test(elem))
I get the same error.
If I create the tester inside the rdd.filter call, it works:
var filtered = rdd.filter { elem =>
  val tester = Class.forName(qualifiedName).newInstance()
    .asInstanceOf[Predicate[T] with Serializable]
  tester.test(elem)
}
But I would like to create a single object (maybe to broadcast) for testing. How can I resolve this?

You simply have to require that the class implements Serializable. Note that the asInstanceOf[Predicate[T] with Serializable] cast is a lie: it doesn't actually check that the value is Serializable, which is why the second case doesn't produce an error immediately during the cast, and the last one "succeeds".
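One way to enforce that requirement is to check it explicitly right after the reflective instantiation, so a non-serializable class fails fast on the driver instead of later inside Spark. The following is only a sketch in plain Java (the Predicate interface is Java's anyway) with no Spark dependency; the names SerializablePredicateLoader, load, NonEmpty and StartsWithA are invented for this illustration and do not come from the question.

```java
import java.io.Serializable;
import java.util.function.Predicate;

public class SerializablePredicateLoader {

    /** Hypothetical predicate that is safe to ship to workers: it implements Serializable. */
    public static class NonEmpty implements Predicate<String>, Serializable {
        @Override
        public boolean test(String s) {
            return s != null && !s.isEmpty();
        }
    }

    /** Hypothetical predicate that is NOT Serializable and should be rejected. */
    public static class StartsWithA implements Predicate<String> {
        @Override
        public boolean test(String s) {
            return s != null && s.startsWith("a");
        }
    }

    /** Instantiates the named class and rejects it unless it is a Serializable Predicate. */
    @SuppressWarnings("unchecked") // the element type T is erased and cannot be checked at runtime
    public static <T> Predicate<T> load(String qualifiedName) throws ReflectiveOperationException {
        Object instance = Class.forName(qualifiedName).getDeclaredConstructor().newInstance();
        if (!(instance instanceof Predicate)) {
            throw new IllegalArgumentException(qualifiedName + " does not implement Predicate");
        }
        if (!(instance instanceof Serializable)) {
            throw new IllegalArgumentException(qualifiedName + " does not implement Serializable");
        }
        return (Predicate<T>) instance;
    }

    public static void main(String[] args) throws Exception {
        Predicate<String> ok = load(NonEmpty.class.getName());
        System.out.println(ok.test("spark")); // prints true

        try {
            load(StartsWithA.class.getName());
        } catch (IllegalArgumentException e) {
            // The non-serializable class is rejected before it could reach a Spark closure.
            System.out.println(e.getMessage());
        }
    }
}
```

The instanceof checks do the work that the Scala cast to Predicate[T] with Serializable only appears to do.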
But I would like to create a single object (maybe to broadcast) for testing.
You can't. Broadcast or not, deserialization will create new objects on worker nodes. But you can create only a single instance on each partition:
var filtered = rdd.mapPartitions { iter =>
  val tester = Class.forName(qualifiedName).newInstance()
    .asInstanceOf[Predicate[T]]
  iter.filter(tester.test)
}
It will actually perform better than serializing the tester, sending it, and deserializing it would, since it's strictly less work.

Related

How to use reflection to find annotated Lambda Functions

I have an application that has many declared Lambdas. I've added an annotation to them so that I can use reflection to find all the functions marked with the annotation. They are all defined as:
@FooFunction("abc")
public static Function<Task, Result> myFunc = task -> {... returns new Result}
At startup, my application uses reflection to find all of the annotated functions and add them to the hashmap.
static HashMap<String, Function<Task, Result>> funcMap = new HashMap<>();
static {
    Reflections reflections = new Reflections("my.package", Scanners.values());
    var annotated = reflections.getFieldsAnnotatedWith(FooFunction.class);
    annotated.forEach(aField -> {
        try {
            var annot = aField.getAnnotation(FooFunction.class);
            var key = annot.value();
            funcMap.put(key, aField.get(null)); // does not compile: get returns Object
        } catch (Exception e) {
            ...;
        }
    });
}
The above code definitely won't work, especially on the put since aField.get(null) returns an Object. If I cast the object to Function<Task,Result>, I get an unchecked cast warning. No matter how I circle around it, I can't get rid of the warning (without using Suppress).
I've tried changing the Function<Foo, Bar> to something more generic like Function<?,?> but that took me down another rabbit hole.
All of the functions are declared as static since they really don't need to belong to a specific class. They are grouped under various classes simply for organizational purposes.
The underlying objective is: the API will receive a list of tasks. There are about 100 different Task types. Each Task has an "id" field which is used to determine which Function should be used to process that Task. It looks something like this:
var results = Arrays.stream(request.getTasks())
    .map(task -> funcMap.getOrDefault(task.getId(), unknownTaskFn).apply(task))
    .toList();
My questions:
Is this an antipattern? If so, is there a better prescribed pattern?
How can I go from an Object to a Function<Task,Result> properly to put it into the map?
Thanks

Casting is inevitable, because Field.get returns Object by design, but it can be done without warnings.
I would also suggest defining a custom interface
public interface TaskResultFunction extends Function<Task, Result> {
}
and use it for lambda declarations
@FooFunction("abc")
public static TaskResultFunction myFunc = task -> {... returns new Result}
(otherwise we will have to deal with ParameterizedTypeReference, but in this case it is not necessary and overcomplicated)
Map<String, Function<Task, Result>> funcMap = ...
// or more strict
Map<String, TaskResultFunction> funcMap = ...
//...
if (TaskResultFunction.class.isAssignableFrom(field.getType())) {
    TaskResultFunction fn = (TaskResultFunction) field.get(null);
    funcMap.put(key, fn);
}
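Putting the pieces of this answer together, a self-contained sketch might look like the following. It uses only java.lang.reflect on explicitly listed holder classes instead of the Reflections library's package scan, and the Task, Result, Handlers and scan names are placeholders made up for the example; treat it as an illustration of the warning-free cast, not as the asker's actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class AnnotatedFunctionRegistry {

    // Hypothetical stand-ins for the question's Task and Result types.
    public static final class Task {
        public final String id;
        public Task(String id) { this.id = id; }
    }

    public static final class Result {
        public final String text;
        public Result(String text) { this.text = text; }
    }

    // The annotation must be retained at runtime, otherwise reflection cannot see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface FooFunction {
        String value();
    }

    // Non-generic interface: casting to it produces no unchecked warning.
    public interface TaskResultFunction extends Function<Task, Result> {}

    // Hypothetical holder class with annotated static lambdas.
    public static class Handlers {
        @FooFunction("abc")
        public static final TaskResultFunction ABC = task -> new Result("abc handled " + task.id);

        @FooFunction("xyz")
        public static final TaskResultFunction XYZ = task -> new Result("xyz handled " + task.id);

        // Not annotated, so the scan ignores it.
        public static final String UNRELATED = "ignored";
    }

    /** Collects every annotated static TaskResultFunction field of the given classes. */
    public static Map<String, TaskResultFunction> scan(Class<?>... holders) throws IllegalAccessException {
        Map<String, TaskResultFunction> registry = new HashMap<>();
        for (Class<?> holder : holders) {
            for (Field field : holder.getDeclaredFields()) {
                FooFunction annotation = field.getAnnotation(FooFunction.class);
                if (annotation == null || !Modifier.isStatic(field.getModifiers())) {
                    continue;
                }
                if (TaskResultFunction.class.isAssignableFrom(field.getType())) {
                    // Plain checked cast to a non-generic type: no warning, no suppression needed.
                    TaskResultFunction fn = (TaskResultFunction) field.get(null);
                    registry.put(annotation.value(), fn);
                }
            }
        }
        return registry;
    }

    public static void main(String[] args) throws Exception {
        Map<String, TaskResultFunction> registry = scan(Handlers.class);
        System.out.println(registry.get("abc").apply(new Task("t1")).text); // prints abc handled t1
    }
}
```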

Kryo: Difference between readClassAndObject/ReadObject and WriteClassAndObject/WriteObject

I am trying to understand the following statement from the documentation:
If the concrete class of the object is not known and the object could be null:
kryo.writeClassAndObject(output, object);
Object object = kryo.readClassAndObject(input);
What exactly does "if the concrete class is not known" mean?
I have the following code:
case class RawData(modelName: String,
                   sourceType: String,
                   deNormalizedVal: String,
                   normalVal: Map[String, String])
object KryoSpike extends App {
  val kryo = new Kryo()
  kryo.setRegistrationRequired(false)
  kryo.addDefaultSerializer(classOf[scala.collection.Map[_,_]], classOf[ScalaImmutableAbstractMapSerializer])
  kryo.addDefaultSerializer(classOf[scala.collection.generic.MapFactory[scala.collection.Map]], classOf[ScalaImmutableAbstractMapSerializer])
  kryo.addDefaultSerializer(classOf[RawData], classOf[ScalaProductSerializer])
  //val testin = Map("id" -> "objID", "field1" -> "field1Value")
  val testin = RawData("model1", "Json", "", Map("field1" -> "value1", "field2" -> "value2") )
  val outStream = new ByteArrayOutputStream()
  val output = new Output(outStream, 20480)
  kryo.writeClassAndObject(output, testin)
  output.close()
  val input = new Input(new ByteArrayInputStream(outStream.toByteArray), 4096)
  val testout = kryo.readClassAndObject(input)
  input.close()
  println(testout.toString)
}
When I use readClassAndObject and writeClassAndObject, it works. However, if I use writeObject and readObject, it does not:
Exception in thread "main" com.esotericsoftware.kryo.KryoException:
Class cannot be created (missing no-arg constructor):
com.romix.scala.serialization.kryo.ScalaProductSerializer
I just don't understand why.
Earlier, using the same code but with a Map instead of my RawData class, it worked like a charm with writeObject and readObject. Hence I am confused.
Can someone help me understand it?

The difference is as follows:
You use writeClassAndObject and readClassAndObject when you're using a serializer that:
  serializes a base type: an interface, a class that has subclasses, or (in the case of Scala) a trait like Product,
  and needs the type (i.e. the Class object) of the deserialized object to construct this object (without this type, it doesn't know what to construct).
  Example: ScalaProductSerializer.
You use writeObject and readObject when you're using a serializer that:
  serializes exactly one type (i.e. a class that can be instantiated; example: EnumSetSerializer),
  or serializes more than one type, but the specific type can somehow be deduced from the serialized data (example: ScalaImmutableAbstractMapSerializer).
To sum this up for your specific case:
When you deserialize your RawData:
  ScalaProductSerializer needs to find out the exact type of Product to create an instance,
  so it uses the typ: Class[Product] parameter to do it;
  as a result, only readClassAndObject works.
When you deserialize a Scala immutable map (scala.collection.immutable.Map imported as IMap):
  ScalaImmutableAbstractMapSerializer doesn't need to find out the exact type, because it uses IMap.empty to create an instance;
  as a result, it doesn't use the typ: Class[IMap[_, _]] parameter;
  as a result, both readObject and readClassAndObject work.
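The general idea behind the two method pairs can be illustrated without Kryo at all. The toy serializer below is only an analogy written with java.io streams: it is not Kryo's API or wire format, it ignores null handling, and every name in it (ClassTagDemo, Shape, Circle, Square, roundTrip) is invented for the example. The point it tries to show is that the "class and object" variant stores a type tag in front of the data, so the reader can work out which concrete class to build, while the plain variant relies on the caller to supply that class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ClassTagDemo {

    public interface Shape {
        double area();
    }

    public static final class Circle implements Shape {
        final double radius;
        public Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    public static final class Square implements Shape {
        final double side;
        public Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Analogue of writeObject: only the field data is written, no type information.
    public static void writeObject(DataOutputStream out, Shape shape) throws IOException {
        if (shape instanceof Circle) {
            out.writeDouble(((Circle) shape).radius);
        } else {
            out.writeDouble(((Square) shape).side);
        }
    }

    // Analogue of readObject: the caller has to name a concrete, instantiable class.
    public static Shape readObject(DataInputStream in, Class<? extends Shape> type) throws IOException {
        double value = in.readDouble();
        if (type == Circle.class) return new Circle(value);
        if (type == Square.class) return new Square(value);
        // Passing only the base type (Shape.class) ends up here: nothing to instantiate.
        throw new IllegalArgumentException("Cannot instantiate " + type.getName());
    }

    // Analogue of writeClassAndObject: a class tag precedes the field data.
    public static void writeClassAndObject(DataOutputStream out, Shape shape) throws IOException {
        out.writeUTF(shape.getClass().getName());
        writeObject(out, shape);
    }

    // Analogue of readClassAndObject: the concrete class is recovered from the tag.
    public static Shape readClassAndObject(DataInputStream in) throws IOException, ClassNotFoundException {
        Class<? extends Shape> type = Class.forName(in.readUTF()).asSubclass(Shape.class);
        return readObject(in, type);
    }

    /** Writes a shape with its class tag and reads it back. */
    public static Shape roundTrip(Shape shape) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeClassAndObject(out, shape);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return readClassAndObject(in);
        }
    }

    public static void main(String[] args) throws Exception {
        Shape copy = roundTrip(new Square(3.0));
        System.out.println(copy.getClass().getSimpleName() + " " + copy.area()); // prints Square 9.0
    }
}
```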

Append to Kotlin data class ArrayList

I have the following data class intended for use in an Android application running Kotlin version 1.2.51:
data class Data(var a: ArrayList<String>, var b: String)
As you can see, a is an ArrayList. I want to append elements from another array into a. I've tried this:
itemsToAppend.forEach {
Data.a.add(it)
}
However, Android Studio determines that a is an unresolved reference. How exactly does one append an item to such an ArrayList?
Thanks.

A data class is not an object declaration. You have to create an instance before you can use it:
val d = Data(ArrayList(), "demo")
itemsToAppend.forEach {
    d.a.add(it)
}

Create an instance of Data:
var a: ArrayList<String> = arrayListOf()
var data = Data(a, "something")
and use data in your loop

If you want to access your list statically, do this:
data class D(var a: ArrayList<String>) { // a can't be used as D.a
    companion object {
        var ab: ArrayList<String> = ArrayList() // ab can be used as D.ab
    }
}

Gson-like library for scala

I'm learning Scala. I'm trying to find an easy way to turn a JSON string into a Scala case class instance. Java has a wonderful library called Google Gson. It can turn a Java bean into JSON and back without any special coding; basically, you can do it in a single line of code.
public class Example {
    private String firstField;
    private Integer secondIntField;
    // constructor
    // getters/setters here
}
// Bean instance to JSON string
String exampleAsJson = new Gson().toJson(new Example("hehe", 42));
// String to bean instance
Example exampleFromJson = new Gson().fromJson(exampleAsJson, Example.class);
I'm reading https://www.playframework.com/documentation/2.5.x/ScalaJson and can't get the idea: why is it so complex in Scala? Why should I write readers/writers to serialize/deserialize plain, simple case class instances? Is there an easy way to convert case class instance -> JSON -> case class instance using the Play JSON API?

Let's say you have
case class Foo(a: String, b: String)
You can easily write a formatter for this in Play by doing
implicit val fooFormat = Json.format[Foo]
This will allow you to both serialize and deserialize to JSON.
val foo = Foo("1","2")
val js = Json.toJson(foo)(fooFormat) // Only include the specific format if it's not in scope.
val fooBack = js.as[Foo] // Now you have foo back!

Check out uPickle
Here's a small example:
case class Example(firstField: String, secondIntField: Int)
val ex = Example("Hello", 3)
write(ex) // { "firstField": "Hello", "secondIntField" : 3 }

What is the Scala equivalent to a Java builder pattern?

In the work that I do on a day to day in Java, I use builders quite a lot for fluent interfaces, e.g.: new PizzaBuilder(Size.Large).onTopOf(Base.Cheesy).with(Ingredient.Ham).build();
With a quick-and-dirty Java approach, each method call mutates the builder instance and returns this. Immutably, it involves more typing, cloning the builder first before modifying it. The build method eventually does the heavy lifting over the builder state.
What's a nice way of achieving the same in Scala?
If I wanted to ensure that onTopOf(base:Base) was called only once, and then subsequently only with(ingredient:Ingredient) and build():Pizza could be called, a-la a directed builder, how would I go about approaching this?

Another alternative to the Builder pattern in Scala 2.8 is to use immutable case classes with default arguments and named parameters. It's a little different, but the effect is smart defaults, all values specified, and things only specified once, with syntax checking...
The following uses Strings for the values for brevity/speed...
scala> case class Pizza(ingredients: Traversable[String], base: String = "Normal", topping: String = "Mozzarella")
defined class Pizza
scala> val p1 = Pizza(Seq("Ham", "Mushroom"))
p1: Pizza = Pizza(List(Ham, Mushroom),Normal,Mozzarella)
scala> val p2 = Pizza(Seq("Mushroom"), topping = "Edam")
p2: Pizza = Pizza(List(Mushroom),Normal,Edam)
scala> val p3 = Pizza(Seq("Ham", "Pineapple"), topping = "Edam", base = "Small")
p3: Pizza = Pizza(List(Ham, Pineapple),Small,Edam)
You can then also use existing immutable instances as kinda builders too...
scala> val lp2 = p3.copy(base = "Large")
lp2: Pizza = Pizza(List(Ham, Pineapple),Large,Edam)

You have three main alternatives here.
Use the same pattern as in Java, classes and all.
Use named and default arguments and a copy method. Case classes already provide this for you, but here's an example that is not a case class, just so you can understand it better.
object Size {
  sealed abstract class Type
  object Large extends Type
}
object Base {
  sealed abstract class Type
  object Cheesy extends Type
}
object Ingredient {
  sealed abstract class Type
  object Ham extends Type
}
class Pizza(size: Size.Type,
            base: Base.Type,
            ingredients: List[Ingredient.Type])
class PizzaBuilder(size: Size.Type,
                   base: Base.Type = null,
                   ingredients: List[Ingredient.Type] = Nil) {
  // A generic copy method
  def copy(size: Size.Type = this.size,
           base: Base.Type = this.base,
           ingredients: List[Ingredient.Type] = this.ingredients) =
    new PizzaBuilder(size, base, ingredients)
  // An onTopOf method based on copy
  def onTopOf(base: Base.Type) = copy(base = base)
  // A with method based on copy, with `` because with is a keyword in Scala
  def `with`(ingredient: Ingredient.Type) = copy(ingredients = ingredient :: ingredients)
  // A build method to create the Pizza
  def build() = {
    // sys.error replaces the old Predef.error, which is gone in current Scala versions
    if (size == null || base == null || ingredients == Nil) sys.error("Missing stuff")
    else new Pizza(size, base, ingredients)
  }
}
// Possible ways of using it:
new PizzaBuilder(Size.Large).onTopOf(Base.Cheesy).`with`(Ingredient.Ham).build();
// or
new PizzaBuilder(Size.Large).copy(base = Base.Cheesy).copy(ingredients = List(Ingredient.Ham)).build()
// or
new PizzaBuilder(size = Size.Large,
                 base = Base.Cheesy,
                 ingredients = Ingredient.Ham :: Nil).build()
// or even forgo the Builder altogether and just
// use named and default parameters on Pizza itself
Use a type safe builder pattern. The best introduction I know of is this blog, which also contains references to many other articles on the subject.
Basically, a type safe builder pattern guarantees at compile time that all required components are provided. One can even guarantee mutual exclusion of options or arity. The cost is the complexity of the builder code, but...
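To give a feel for the compile-time guarantee, here is a small staged ("step") builder sketched in Java rather than Scala. This is a different technique from the phantom-type builders usually shown for Scala: it encodes the allowed call order in interfaces, so onTopOf can be called exactly once and only afterwards do with and build become available. All names (StagedPizzaBuilder, NeedsBase, Toppable, ofSize) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StagedPizzaBuilder {

    public enum Size { SMALL, LARGE }
    public enum Base { NORMAL, CHEESY }
    public enum Ingredient { HAM, MUSHROOM, PINEAPPLE }

    /** Immutable result object. */
    public static final class Pizza {
        public final Size size;
        public final Base base;
        public final List<Ingredient> ingredients;

        Pizza(Size size, Base base, List<Ingredient> ingredients) {
            this.size = size;
            this.base = base;
            this.ingredients = Collections.unmodifiableList(new ArrayList<>(ingredients));
        }
    }

    /** First stage: the only available operation is choosing the base. */
    public interface NeedsBase {
        Toppable onTopOf(Base base);
    }

    /** Second stage: add ingredients or build; onTopOf is no longer offered. */
    public interface Toppable {
        Toppable with(Ingredient ingredient);
        Pizza build();
    }

    /** One immutable class implements both stages; every step returns a new instance. */
    private static final class Stage implements NeedsBase, Toppable {
        private final Size size;
        private final Base base;
        private final List<Ingredient> ingredients;

        Stage(Size size, Base base, List<Ingredient> ingredients) {
            this.size = size;
            this.base = base;
            this.ingredients = ingredients;
        }

        @Override
        public Toppable onTopOf(Base chosenBase) {
            return new Stage(size, chosenBase, ingredients);
        }

        @Override
        public Toppable with(Ingredient ingredient) {
            List<Ingredient> next = new ArrayList<>(ingredients);
            next.add(ingredient);
            return new Stage(size, base, next);
        }

        @Override
        public Pizza build() {
            return new Pizza(size, base, ingredients);
        }
    }

    /** Entry point: callers only ever see the NeedsBase view of the builder. */
    public static NeedsBase ofSize(Size size) {
        return new Stage(size, null, Collections.<Ingredient>emptyList());
    }

    public static void main(String[] args) {
        Pizza pizza = ofSize(Size.LARGE).onTopOf(Base.CHEESY).with(Ingredient.HAM).build();
        System.out.println(pizza.size + " " + pizza.base + " " + pizza.ingredients); // prints LARGE CHEESY [HAM]
        // ofSize(Size.LARGE).build();                    // does not compile: NeedsBase has no build()
        // ofSize(Size.LARGE).onTopOf(Base.CHEESY).onTopOf(Base.NORMAL); // does not compile either
    }
}
```

The price is the one mentioned above: an extra interface per stage, which grows quickly once more than a couple of required steps are involved.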

Case classes solve the problem as shown in previous answers, but the resulting API is difficult to use from Java when you have Scala collections in your objects. To provide a fluent API to Java users, try this:
case class SEEConfiguration(parameters : Set[Parameter],
                            plugins : Set[PlugIn])
case class Parameter(name: String, value:String)
case class PlugIn(id: String)
trait SEEConfigurationGrammar {
  def withParameter(name: String, value:String) : SEEConfigurationGrammar
  def withParameter(toAdd : Parameter) : SEEConfigurationGrammar
  def withPlugin(toAdd : PlugIn) : SEEConfigurationGrammar
  def build : SEEConfiguration
}
object SEEConfigurationBuilder {
  def empty : SEEConfigurationGrammar = SEEConfigurationBuilder(Set.empty,Set.empty)
}
case class SEEConfigurationBuilder(
  parameters : Set[Parameter],
  plugins : Set[PlugIn]
) extends SEEConfigurationGrammar {
  val config : SEEConfiguration = SEEConfiguration(parameters,plugins)
  def withParameter(name: String, value:String) = withParameter(Parameter(name,value))
  def withParameter(toAdd : Parameter) = new SEEConfigurationBuilder(parameters + toAdd, plugins)
  def withPlugin(toAdd : PlugIn) = new SEEConfigurationBuilder(parameters , plugins + toAdd)
  def build = config
}
Then, in Java code, the API is really easy to use:
SEEConfigurationGrammar builder = SEEConfigurationBuilder.empty();
SEEConfiguration configuration = builder
    .withParameter(new Parameter("name","value"))
    .withParameter("directGivenName","Value")
    .withPlugin(new PlugIn("pluginid"))
    .build();

It's the same exact pattern. Scala allows for mutation and side effects. That said, if you'd like to be more of a purist, have each method return a new instance of the object that you're constructing, with the element(s) changed.
class Pizza(size: SizeType, layers: List[Layer], toppings: List[Topping]) {
  // Auxiliary constructor: start with no layers and no toppings
  def this(size: SizeType) = this(size, Nil, Nil)
  // Each method returns a new Pizza instead of mutating this one
  def onTopOf(layer: Layer) = new Pizza(size, layers :+ layer, toppings)
  def withTopping(topping: Topping) = new Pizza(size, layers, toppings :+ topping)
}
so that your code might look like
val myPizza = new Pizza(Large) onTopOf(MarinaraSauce) onTopOf(Cheese) withTopping(Ham) withTopping(Pineapple)
(Note: I've probably screwed up some syntax here.)

Using Scala partial application is feasible if you are building a smallish object that you don't need to pass across method signatures. If either of those assumptions doesn't hold, I recommend using a mutable builder to build an immutable object. This being Scala, you could implement the builder pattern with a case class for the object to build and a companion as the builder.
Given that the end result is a constructed immutable object I don't see that it defeats any of the Scala principles.
