Overloaded methods in a trait: "cannot resolve overloaded method" error in Spark Scala

I have some code:
trait Reader {
def read(spark: SparkSession, format: String, path: String): DataFrame
def read[T: Encoder](spark: SparkSession, format: String, path: String): Dataset[T]
}
class LocalReader extends Reader {
override def read[T: Encoder](spark: SparkSession, format: String, path: String): Dataset[T] = {
spark.read
.format(format)
.option("header", "true")
.load(getClass.getResource(path).getPath)
.as[T]
}
override def read(spark: SparkSession, format: String, path: String): DataFrame = {
spark.read
.format(format)
.option("header", "true")
.load(getClass.getResource(path).getPath)
}
}
object TopNSimilarCustomers extends SparkJob {
override def appName: String = "TopNSimilarCustomers"
override def run(spark: SparkSession, args: Array[String], reader: Reader): Unit = {
/**
* Only I/O here
*/
if (args.length == 0)
return
val rawData = reader.read(spark, "json", "/spark-test-data.json")
val res = transform(spark, rawData, args(0))
}
}
I'm getting an error at val rawData = reader.read(spark, "json", "/spark-test-data.json"): cannot resolve overloaded method 'read'.
I want to have readers/writers for different purposes (LocalReader/S3Reader), and since a read can return either a DataFrame or a Dataset, I wrote an overloaded method, even though only one of them is used at any call site, and eventually I have to implement both. Is there any way to avoid that?
How can I achieve what I'm trying to do? Is there another or better way?
And how do I fix the error?

The reason for cannot resolve overloaded method read is that the Reader trait has two read methods with identical explicit parameter lists; the generic one differs only in its type parameter and implicit Encoder, so a call without an explicit type argument is ambiguous.
To solve this, rename the methods, for example to readDF and readDS (a sketch follows), or check the config-driven code further below and modify it to your requirements.
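A minimal sketch of the rename approach (readDF/readDS are placeholder names, adapt them to your codebase):

import org.apache.spark.sql.{DataFrame, Dataset, Encoder, SparkSession}

trait Reader {
  def readDF(spark: SparkSession, format: String, path: String): DataFrame
  def readDS[T: Encoder](spark: SparkSession, format: String, path: String): Dataset[T]
}

class LocalReader extends Reader {
  override def readDF(spark: SparkSession, format: String, path: String): DataFrame =
    spark.read
      .format(format)
      .option("header", "true")
      .load(getClass.getResource(path).getPath)

  // Reuse the DataFrame version; the Encoder context bound drives the .as[T] conversion.
  override def readDS[T: Encoder](spark: SparkSession, format: String, path: String): Dataset[T] =
    readDF(spark, format, path).as[T]
}

With distinct names, reader.readDF(spark, "json", "/spark-test-data.json") resolves unambiguously, and readDS reuses readDF so the I/O is only implemented once.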
case class ReadConfig(format: String, path: String, options: Map[String, String])
case class WriteConfig(format: String, path: String, options: Map[String, String])
case class Config(read: ReadConfig, write: WriteConfig)
trait Writer {
def write(df: DataFrame): Unit
}
trait Reader {
def read: DataFrame
}
trait RW extends Reader with Writer {
val spark : SparkSession
val config : Config
}
// Add logic for Local
class Local(override val spark: SparkSession, override val config: Config) extends RW {
override def read: DataFrame = {
spark.read
.format(config.read.format)
.options(config.read.options)
.load(config.read.path)
}
override def write(df: DataFrame): Unit = {
df.write
.format(config.write.format)
.options(config.write.options)
.save(config.write.path)
}
}
// Add logic for S3
class S3(override val spark: SparkSession, override val config: Config) extends RW {
override def read: DataFrame = {
spark.read
.format(config.read.format)
.options(config.read.options)
.load(config.read.path)
}
override def write(df: DataFrame): Unit = {
df.write
.format(config.write.format)
.options(config.write.options)
.save(config.write.path)
}
}
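For completeness, a usage sketch of the config-driven design (assuming an existing SparkSession named spark; the config values are made up for illustration):

val config = Config(
  read = ReadConfig("json", "/spark-test-data.json", Map("header" -> "true")),
  write = WriteConfig("parquet", "/tmp/output", Map.empty)
)
val rw: RW = new Local(spark, config)
val df = rw.read
rw.write(df)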

Related

Error: Class is not registered for polymorphic serialization in the scope of its interface

I need a serializable interface so that I can pass any of my data classes to methods. Example data:
@Serializable
interface Todo{}
@Serializable
data class userDataForRegistration(val name: String, val number: String, val password: String): Todo
@Serializable
data class userDataForLogin(val number: String, val password: String): Todo
@Serializable
data class contactForRemove(val id: String, val number: String): Todo
@Serializable
data class userData(val number: String)
@Serializable
data class message(val message: String)
Example method, where body is one of the above data classes:
class Connection {
val client = OkHttpClient()
// params: login, registration, contact
fun sendData(url: String, param: String, body: Todo){
var json = Json.encodeToString(body)
var reqBody = RequestBody.create("application/json; charset=utf-8".toMediaTypeOrNull(), json)
val request = Request.Builder()
.url(url)
.post(reqBody)
.build()
client.newCall(request).enqueue(object : Callback {
override fun onFailure(call: Call, e: IOException) {
println("error" + e)
}
override fun onResponse(call: Call, response: Response){
var res = response.body?.string()
when(param){
"login", "registration" -> {
try{
val objUser = Json.decodeFromString<User>(res.toString())
returnUser(objUser)
}
catch(e: Exception){
val mes = Json.decodeFromString<message>(res.toString())
returnMessage(mes)
}
}
"contact" ->{
val mes = Json.decodeFromString<message>(res.toString())
returnMessage(mes)
}
}
}
})
}
}
But if I call the method:
val userDataForLogin = userDataForLogin(etv_name.text.toString(), etv_pass.text.toString())
val response = connection.sendData("${ip_static.ip}/user/login", "login", userDataForLogin)
I get the error:
@Serializable annotation is ignored because it is impossible to serialize automatically interfaces or enums. Provide serializer manually via e.g. companion object
I need to use only the Todo interface so I can pass any data class to these methods; an object or abstract class won't work, because I'm using data classes.
Also my plugins in build.gradle:
plugins {
id 'com.android.application' version '7.2.2' apply false
id 'com.android.library' version '7.2.2' apply false
id 'org.jetbrains.kotlin.android' version '1.6.10' apply false
id 'org.jetbrains.kotlin.plugin.serialization' version '1.6.21'
}
I read that kotlin.plugin.serialization 1.6.2+ works with interface serialization, but I don't know what I'm doing wrong...
Thank you in advance!)
You should not add @Serializable to the interface; see the documentation regarding polymorphic serialization. Instead you must annotate the implementations and possibly register the corresponding serializers.
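A minimal sketch of that setup (the module wiring is illustrative, not taken from your project):

import kotlinx.serialization.PolymorphicSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.modules.SerializersModule
import kotlinx.serialization.modules.polymorphic
import kotlinx.serialization.modules.subclass

interface Todo // no @Serializable here

@Serializable
data class UserDataForLogin(val number: String, val password: String) : Todo

// Register every concrete implementation of Todo.
val todoJson = Json {
    serializersModule = SerializersModule {
        polymorphic(Todo::class) {
            subclass(UserDataForLogin::class)
            // subclass(UserDataForRegistration::class), and so on
        }
    }
}

fun encode(body: Todo): String =
    todoJson.encodeToString(PolymorphicSerializer(Todo::class), body)

Note that polymorphic encoding adds a class discriminator field ("type" by default) to the JSON, so the server has to accept it; if you control both ends, a sealed interface with @Serializable implementations is often simpler.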

Vert.x 4 eventbus serialize multiple classes with same codec

Is there a way to register a codec for multiple classes? Basically, all my classes should just be serialized using a Jackson object mapper. But it seems like I have to create a custom codec for each class (even though I can abstract it a little bit using generics).
A small code example:
Codec:
class JacksonCodec<T>(private val mapper: ObjectMapper, private val clazz: Class<T>) : MessageCodec<T, T> {
override fun encodeToWire(buffer: Buffer, s: T) {
buffer.appendBytes(mapper.writeValueAsBytes(s))
}
override fun decodeFromWire(pos: Int, buffer: Buffer): T {
val length = buffer.getInt(pos)
val bytes = buffer.getBytes(pos + 4, pos + 4 + length)
return mapper.readValue(bytes, clazz)
}
...
}
Register the codec for each class I want to serialize:
vertx.eventBus()
.registerDefaultCodec(A::class.java, JacksonCodec(DatabindCodec.mapper(), A::class.java))
vertx.eventBus()
.registerDefaultCodec(B::class.java, JacksonCodec(DatabindCodec.mapper(), B::class.java))
The code examples are Kotlin, but the same applies to Java.
As far as I can tell from looking at the code, there is no way, as the class needs to be an exact match:
https://github.com/eclipse-vertx/vert.x/blob/master/src/main/java/io/vertx/core/eventbus/impl/CodecManager.java#L99
It is possible, with some limitations and quirks. I would not recommend doing it.
Let's start with the limitations:
It cannot be used in clustered mode.
You have to declare the codec name every time you send something over the eventbus.
If you create a generic codec that encodes classes with Jackson, and every time you send something over the eventbus you make sure to set it via codecName in the DeliveryOptions, you can register it only once and use it for all of your classes.
Full example:
fun main() {
val vertx = Vertx.vertx()
vertx.eventBus().registerCodec(GenericCodec())
vertx.eventBus().consumer<Foo>("test-address") {
println(it.body())
it.reply(Bar(), genericDeliveryOptions)
}
vertx.eventBus().request<String>("test-address", Foo(), genericDeliveryOptions) {
println(it.result().body())
}
vertx.close()
}
data class Foo(
val foo: String = "foo",
)
data class Bar(
val bar: String = "bar",
)
class GenericCodec : MessageCodec<Any, Any> {
companion object {
const val NAME = "generic"
}
private val mapper: ObjectMapper = ObjectMapper()
override fun encodeToWire(buffer: Buffer, s: Any) {
buffer.appendBytes(mapper.writeValueAsBytes(s))
}
override fun decodeFromWire(pos: Int, buffer: Buffer): Any {
throw RuntimeException("should never get here, unless using clustered mode")
}
override fun transform(s: Any): Any {
return s
}
override fun name(): String {
return NAME
}
override fun systemCodecID(): Byte {
return -1
}
}
val genericDeliveryOptions = deliveryOptionsOf(codecName = GenericCodec.NAME)

Declare classes in Kotlin functions

I declared a data class inside a Kotlin function, but the resulting JSON is null after Gson conversion.
fun writeAndFlush(context: StateMachine) {
data class Temp(val model: TaskModel, val totalTime: String?, val state: String)
val temp = Temp(context.businessObj, context.totalTime, context.state.toString())
Log.e("test", temp.toString()) // print data here.
val json = Gson().toJson(temp)
Log.e("test", json) // problem here.....print null
}
Is there any problem with this way?
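For what it's worth, Gson's reflection-based adapter deliberately skips local and anonymous classes and serializes them as null, so declaring the data class inside the function is itself the problem. A minimal sketch of the usual fix (TaskModel and StateMachine are assumed from the code above):

import android.util.Log
import com.google.gson.Gson

// Declared at top level, where Gson's reflective adapter can see it.
data class Temp(val model: TaskModel, val totalTime: String?, val state: String)

fun writeAndFlush(context: StateMachine) {
    val temp = Temp(context.businessObj, context.totalTime, context.state.toString())
    Log.e("test", Gson().toJson(temp)) // now prints the expected JSON
}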

What is a good way to implement reloading of a Typesafe config

In a Scala application that is using Typesafe Config, I want to add the possibility to reload a Config at runtime. A Config instance is immutable. Here is what I have so far:
package config
trait Settings {
private[config] var config: Config = ConfigFactory.empty()
def engine: EngineSettings
}
trait EngineSettings {
def weight: Int
def offset: Int
}
class AppSettings extends Settings {
override def engine = new EngineSettings {
override def weight = config.getInt("engine.weight")
override def offset = config.getInt("engine.offset")
}
}
object Settings {
private val namedSettings = new TrieMap[String, AppSettings]
def load(configName: String = "local"): Settings = {
// load config
// create or update AppSettings
// add to map and return
}
}
Initially a Settings instance is created using Settings.load. That instance reference is handed to other classes. Then a second thread can reload the underlying Config by calling Settings.load again. Here is how you access it:
class Engine(settings: Settings) {
def calculate() = {
val weight = settings.engine.weight
// do some stuff
val offset = settings.engine.offset
}
}
There are two problems:
someone might reload the underlying Config while calculate() is at line: // do some stuff (consistency)
I don't like using a var in the Settings trait
How can I improve this design :)
You could turn config into a method with support for config cache invalidation (and with sensible defaults), so you can choose between dynamic behaviour (the default in the following sample) and performance.
In general I suggest you use a good typesafe Scala wrapper around Typesafe Config, such as Ficus (e.g. the Gradle-style artifact dependency net.ceedubs:ficus_2.11:1.1.1).
package config
import scala.collection.concurrent.TrieMap
import com.typesafe.config.{Config, ConfigFactory}
import net.ceedubs.ficus.Ficus._
trait Settings {
protected[config] def config (
name: String = "local",
invalidateCache: Boolean = false
): Config = {
if (invalidateCache) { ConfigFactory invalidateCaches }
ConfigFactory load name
}
def engine: EngineSettings
}
trait EngineSettings {
def weight: Int
def offset: Int
}
class AppSettings(val name: String = "local") extends Settings {
val c = config()
override def engine = new EngineSettings {
override def weight = c.as[Int]("engine.weight")
override def offset = c.as[Int]("engine.offset")
}
}
object Settings {
private val namedSettings = new TrieMap[String, AppSettings]
def load(configName: String = "local"): Settings = {
// e.g.
val loadedUpToDate = new AppSettings
namedSettings +=
((configName + "." + System.currentTimeMillis, loadedUpToDate))
new Settings {
override def engine = loadedUpToDate.engine
}
}
}
I think this solves your issues because:
Configuration retrieval is dynamic by default through reload
By using a method you don't resort to mutable state
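A usage sketch under this design (illustrative, using the names defined above): each call to Settings.load produces an immutable view, so code that needs mid-computation consistency keeps one reference, while code that wants fresh values re-fetches.

val settings = Settings.load("local")   // consistent view for this run
val engine = new Engine(settings)       // never sees a reload mid-calculation
engine.calculate()

val reloaded = Settings.load("local")   // later: a fresh, up-to-date view
new Engine(reloaded).calculate()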

Efficient POJO mapping to/from Java Mongo DBObject using Jackson

Although similar to "Convert DBObject to a POJO using MongoDB Java Driver", my question is different in that I am specifically interested in using Jackson for mapping.
I have an object which I want to convert to a Mongo DBObject instance. I want to use the Jackson JSON framework to do the job.
One way to do so is:
DBObject dbo = (DBObject)JSON.parse(m_objectMapper.writeValueAsString(entity));
However, according to https://github.com/FasterXML/jackson-docs/wiki/Presentation:-Jackson-Performance this is the worst way to go. So, I am looking for an alternative. Ideally, I would like to be able to hook into the JSON generation pipeline and populate a DBObject instance on the fly. This is possible, because the target in my case is a BasicDBObject instance, which implements the Map interface. So, it should fit into the pipeline easily.
Now, I know I can convert an object to Map using the ObjectMapper.convertValue function and then recursively convert the map to a BasicDBObject instance using the map constructor of the BasicDBObject type. But, I want to know if I can eliminate the intermediate map and create the BasicDBObject directly.
Note that, because a BasicDBObject is essentially a map, the opposite conversion, namely from a scalar DBObject to a POJO, is trivial and should be quite efficient:
DBObject dbo = getDBO();
Class clazz = getObjectClass();
Object pojo = m_objectMapper.convertValue(dbo, clazz);
Lastly, my POJOs do not have any JSON annotations and I would like to keep it that way.
You can probably use MixIn annotations to annotate your POJO and the BasicDBObject (or DBObject), so annotations are not a problem. Since BasicDBObject is a map, you can use @JsonAnySetter on the put method.
m_objectMapper.addMixInAnnotations(BasicDBObject.class, YourMixIn.class);
public interface YourMixIn {
@JsonAnySetter
void put(String key, Object value);
}
This is all I can come up with, since I have zero experience with MongoDB objects.
Update: MixIns are basically a Jackson mechanism for adding annotations to a class without modifying said class. This is a perfect fit when you don't have control over the class you want to marshal (like when it comes from an external jar) or when you don't want to clutter your classes with annotations.
In your case here, you said that BasicDBObject implements the Map interface, so that class has the method put, as defined by the Map interface. By adding @JsonAnySetter to that method, you tell Jackson that whenever it finds a property it doesn't know after introspection of the class, it should use that method to insert the property into the object. The key is the name of the property and the value is, well, the value of the property.
All this combined makes the intermediate map go away, since Jackson will convert directly to the BasicDBObject, because it now knows how to deserialize that class from JSON. With that configuration, you can do:
DBObject dbo = m_objectMapper.convertValue(pojo, BasicDBObject.class);
Note that I haven't tested this because I don't work with MongoDB, so there might be some loose ends. However, I have used the same mechanism for similar use cases without any problem. YMMV depending on the classes.
Here's an example of a simple serializer (written in Scala) from POJO to BsonDocument, which could be used with version 3 of the Mongo driver. The deserializer would be somewhat more difficult to write.
Create a BsonObjectGenerator object which would do a streaming serialization to Mongo Bson directly:
val generator = new BsonObjectGenerator
mapper.writeValue(generator, POJO)
generator.result()
Here's the code for a serializer:
class BsonObjectGenerator extends JsonGenerator {
sealed trait MongoJsonStreamContext extends JsonStreamContext
case class MongoRoot(root: BsonDocument = BsonDocument()) extends MongoJsonStreamContext {
_type = JsonStreamContext.TYPE_ROOT
override def getCurrentName: String = null
override def getParent: MongoJsonStreamContext = null
}
case class MongoArray(parent: MongoJsonStreamContext, arr: BsonArray = BsonArray()) extends MongoJsonStreamContext {
_type = JsonStreamContext.TYPE_ARRAY
override def getCurrentName: String = null
override def getParent: MongoJsonStreamContext = parent
}
case class MongoObject(name: String, parent: MongoJsonStreamContext, obj: BsonDocument = BsonDocument()) extends MongoJsonStreamContext {
_type = JsonStreamContext.TYPE_OBJECT
override def getCurrentName: String = name
override def getParent: MongoJsonStreamContext = parent
}
private val root = MongoRoot()
private var node: MongoJsonStreamContext = root
private var fieldName: String = _
def result(): BsonDocument = root.root
private def unsupported(): Nothing = throw new UnsupportedOperationException
override def disable(f: Feature): JsonGenerator = this
override def writeStartArray(): Unit = {
val array = new BsonArray
node match {
case MongoRoot(o) =>
o.append(fieldName, array)
fieldName = null
case MongoArray(_, a) =>
a.add(array)
case MongoObject(_, _, o) =>
o.append(fieldName, array)
fieldName = null
}
node = MongoArray(node, array)
}
private def writeBsonValue(value: BsonValue): Unit = node match {
case MongoRoot(o) =>
o.append(fieldName, value)
fieldName = null
case MongoArray(_, a) =>
a.add(value)
case MongoObject(_, _, o) =>
o.append(fieldName, value)
fieldName = null
}
private def writeBsonString(text: String): Unit = {
writeBsonValue(BsonString(text))
}
override def writeString(text: String): Unit = writeBsonString(text)
override def writeString(text: Array[Char], offset: Int, len: Int): Unit = writeBsonString(new String(text, offset, len))
override def writeString(text: SerializableString): Unit = writeBsonString(text.getValue)
private def writeBsonFieldName(name: String): Unit = {
fieldName = name
}
override def writeFieldName(name: String): Unit = writeBsonFieldName(name)
override def writeFieldName(name: SerializableString): Unit = writeBsonFieldName(name.getValue)
override def setCodec(oc: ObjectCodec): JsonGenerator = this
override def useDefaultPrettyPrinter(): JsonGenerator = this
override def getFeatureMask: Int = 0
private def writeBsonBinary(data: Array[Byte]): Unit = {
writeBsonValue(BsonBinary(data))
}
override def writeBinary(bv: Base64Variant, data: Array[Byte], offset: Int, len: Int): Unit = {
val res = if (offset != 0 || len != data.length) {
val subset = new Array[Byte](len)
System.arraycopy(data, offset, subset, 0, len)
subset
} else {
data
}
writeBsonBinary(res)
}
override def writeBinary(bv: Base64Variant, data: InputStream, dataLength: Int): Int = unsupported()
override def isEnabled(f: Feature): Boolean = false
override def writeRawUTF8String(text: Array[Byte], offset: Int, length: Int): Unit = writeBsonString(new String(text, offset, length, "UTF-8"))
override def writeRaw(text: String): Unit = unsupported()
override def writeRaw(text: String, offset: Int, len: Int): Unit = unsupported()
override def writeRaw(text: Array[Char], offset: Int, len: Int): Unit = unsupported()
override def writeRaw(c: Char): Unit = unsupported()
override def flush(): Unit = ()
override def writeRawValue(text: String): Unit = writeBsonString(text)
override def writeRawValue(text: String, offset: Int, len: Int): Unit = writeBsonString(text.substring(offset, offset + len))
override def writeRawValue(text: Array[Char], offset: Int, len: Int): Unit = writeBsonString(new String(text, offset, len))
override def writeBoolean(state: Boolean): Unit = {
writeBsonValue(BsonBoolean(state))
}
override def writeStartObject(): Unit = {
node = node match {
case p @ MongoRoot(o) =>
MongoObject(null, p, o)
case p @ MongoArray(_, a) =>
val doc = new BsonDocument
a.add(doc)
MongoObject(null, p, doc)
case p @ MongoObject(_, _, o) =>
val doc = new BsonDocument
val f = fieldName
o.append(f, doc)
fieldName = null
MongoObject(f, p, doc)
}
}
override def writeObject(pojo: scala.Any): Unit = unsupported()
override def enable(f: Feature): JsonGenerator = this
override def writeEndArray(): Unit = {
node = node match {
case MongoRoot(_) => unsupported()
case MongoArray(p, a) => p
case MongoObject(_, _, _) => unsupported()
}
}
override def writeUTF8String(text: Array[Byte], offset: Int, length: Int): Unit = writeBsonString(new String(text, offset, length, "UTF-8"))
override def close(): Unit = ()
override def writeTree(rootNode: TreeNode): Unit = unsupported()
override def setFeatureMask(values: Int): JsonGenerator = this
override def isClosed: Boolean = unsupported()
override def writeNull(): Unit = {
writeBsonValue(BsonNull())
}
override def writeNumber(v: Int): Unit = {
writeBsonValue(BsonInt32(v))
}
override def writeNumber(v: Long): Unit = {
writeBsonValue(BsonInt64(v))
}
override def writeNumber(v: BigInteger): Unit = unsupported()
override def writeNumber(v: Double): Unit = {
writeBsonValue(BsonDouble(v))
}
override def writeNumber(v: Float): Unit = {
writeBsonValue(BsonDouble(v))
}
override def writeNumber(v: BigDecimal): Unit = unsupported()
override def writeNumber(encodedValue: String): Unit = unsupported()
override def version(): Version = unsupported()
override def getCodec: ObjectCodec = unsupported()
override def getOutputContext: JsonStreamContext = node
override def writeEndObject(): Unit = {
node = node match {
case p @ MongoRoot(_) => p
case MongoArray(p, a) => unsupported()
case MongoObject(_, p, _) => p
}
}
}
You might be interested in checking how Jongo does it. It is open source and the code can be found on GitHub. Or you could also simply use their library. I use a mix of Jongo and plain DBObjects when I need more flexibility.
They claim that they are (almost) as fast as using the Java driver directly so I suppose their method is efficient.
I use the little helper utility class below, which is inspired by their code base and uses a mix of Jongo (the MongoBsonFactory) and Jackson to convert between DBObjects and POJOs. Note that the getDbObject method does a deep copy of the DBObject to make it editable; if you don't need to customise anything, you can remove that part and improve performance.
import com.fasterxml.jackson.annotation.JsonAutoDetect;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.databind.introspect.VisibilityChecker;
import com.mongodb.BasicDBObject;
import com.mongodb.DBEncoder;
import com.mongodb.DBObject;
import com.mongodb.DefaultDBEncoder;
import com.mongodb.LazyWriteableDBObject;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.bson.LazyBSONCallback;
import org.bson.io.BasicOutputBuffer;
import org.bson.io.OutputBuffer;
import org.jongo.marshall.jackson.bson4jackson.MongoBsonFactory;
public class JongoUtils {
private final static ObjectMapper mapper = new ObjectMapper(MongoBsonFactory.createFactory());
static {
mapper.setVisibilityChecker(VisibilityChecker.Std.defaultInstance().withFieldVisibility(
JsonAutoDetect.Visibility.ANY));
}
public static DBObject getDbObject(Object o) throws IOException {
ObjectWriter writer = mapper.writer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
writer.writeValue(baos, o);
DBObject dbo = new LazyWriteableDBObject(baos.toByteArray(), new LazyBSONCallback());
//turn it into a proper DBObject otherwise it can't be edited.
DBObject result = new BasicDBObject();
result.putAll(dbo);
return result;
}
public static <T> T getPojo(DBObject o, Class<T> clazz) throws IOException {
ObjectReader reader = mapper.reader(clazz);
DBEncoder dbEncoder = DefaultDBEncoder.FACTORY.create();
OutputBuffer buffer = new BasicOutputBuffer();
dbEncoder.writeObject(buffer, o);
T pojo = reader.readValue(buffer.toByteArray());
return pojo;
}
}
Sample usage:
Pojo pojo = new Pojo(...);
DBObject o = JongoUtils.getDbObject(pojo);
//you can customise it if you want:
o.put("_id", pojo.getId());
I understand that this is a very old question, but if asked today I would instead recommend the built-in POJO support in the official MongoDB Java driver.
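A minimal sketch of that approach (the database, collection, and Pojo names are illustrative):

import static org.bson.codecs.configuration.CodecRegistries.fromProviders;
import static org.bson.codecs.configuration.CodecRegistries.fromRegistries;

import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.codecs.pojo.PojoCodecProvider;

// Combine the driver's default codecs with automatic POJO mapping.
CodecRegistry pojoCodecRegistry = fromRegistries(
        MongoClientSettings.getDefaultCodecRegistry(),
        fromProviders(PojoCodecProvider.builder().automatic(true).build()));

MongoCollection<Pojo> collection = MongoClients.create()
        .getDatabase("mydb")
        .getCollection("pojos", Pojo.class)
        .withCodecRegistry(pojoCodecRegistry);

Pojo pojo = new Pojo(...);
collection.insertOne(pojo); // no manual DBObject conversion needed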
Here's an update to assylias' answer that doesn't require Jongo and is compatible with the Mongo 3.x drivers. It also handles nested object graphs; I couldn't get that to work with LazyWriteableDBObject, which has been removed in the Mongo 3.x drivers anyway.
The idea is to tell Jackson how to serialize an object to a BSON byte array, and then deserialize the BSON byte array into a BasicDBObject. I'm sure you can find some low-level API in mongo-java-driver if you want to ship the BSON bytes directly to the database. You will need a dependency on bson4jackson in order for ObjectMapper to serialize BSON when you call writeValue(ByteArrayOutputStream, Object):
import com.fasterxml.jackson.databind.ObjectMapper;
import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;
import de.undercouch.bson4jackson.BsonFactory;
import de.undercouch.bson4jackson.BsonParser;
import org.bson.BSON;
import org.bson.BSONObject;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
public class MongoUtils {
private static ObjectMapper mapper;
static {
BsonFactory bsonFactory = new BsonFactory();
bsonFactory.enable(BsonParser.Feature.HONOR_DOCUMENT_LENGTH);
mapper = new ObjectMapper(bsonFactory);
}
public static DBObject getDbObject(Object o) {
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mapper.writeValue(baos, o);
BSONObject decode = BSON.decode(baos.toByteArray());
return new BasicDBObject(decode.toMap());
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}
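Usage mirrors the earlier JongoUtils helper (Pojo again stands in for your own class):

Pojo pojo = new Pojo(...);
DBObject o = MongoUtils.getDbObject(pojo);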
