My Flink pipeline currently uses a POJO that contains some Lists and Maps (of Strings), along the lines of
public class MyPojo {
    private List<String> myList = new ArrayList<>();
    private OtherPojo otherPojo = new OtherPojo();
    // getters + setters...
}

public class OtherPojo {
    private Map<String, String> myMap = new HashMap<>();
    // getters + setters...
}
For performance reasons, I want to get around Kryo serialization, so I disabled the generic fallback with env.getConfig().disableGenericTypes(); as described in the Flink documentation.
Now, Flink complains about the lists:
Exception in thread "main" java.lang.UnsupportedOperationException: Generic types have been disabled in the ExecutionConfig and type java.util.List is treated as a generic type.
at org.apache.flink.api.java.typeutils.GenericTypeInfo.createSerializer(GenericTypeInfo.java:86)
at org.apache.flink.api.java.typeutils.PojoTypeInfo.createPojoSerializer(PojoTypeInfo.java:319)
at org.apache.flink.api.java.typeutils.PojoTypeInfo.createSerializer(PojoTypeInfo.java:311)
at org.apache.flink.streaming.api.graph.StreamGraph.addOperator(StreamGraph.java:258)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.transformOneInputTransform(StreamGraphGenerator.java:649)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:250)
at org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:209)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1540)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
...
What is the preferred way of serializing such simple lists and maps in Flink? Internally, these are currently ArrayList and HashMap, but other implementations would also be fine. There seems to be a class org.apache.flink.api.common.typeutils.base.ListSerializer in Flink, but I do not know how to use it.
Marius already explained the reason beautifully, although I don't see why Flink does not support your use case out of the box. Nevertheless, I'll add the solution that works right now.
// create type info
final TypeInformation<OtherPojo> otherPojoInfo = Types.POJO(OtherPojo.class,
ImmutableMap.of("myMap", Types.MAP(Types.STRING, Types.STRING)));
final TypeInformation<MyPojo> myPojoInfo = Types.POJO(MyPojo.class,
ImmutableMap.of("myList", Types.LIST(Types.STRING), "otherPojo", otherPojoInfo));
// test it
final MyPojo myPojo = new MyPojo();
myPojo.getMyList().add("test");
myPojo.getOtherPojo().getMyMap().put("ping", "pong");
final TypeSerializer<MyPojo> serializer = myPojoInfo.createSerializer(env.getConfig());
DataOutputSerializer dataOutputSerializer = new DataOutputSerializer(100);
serializer.serialize(myPojo, dataOutputSerializer);
DataInputDeserializer dataInputDeserializer = new DataInputDeserializer(dataOutputSerializer.getSharedBuffer());
final MyPojo clone = serializer.deserialize(dataInputDeserializer);
assert(myPojo.equals(clone));
Note that the terrible access pattern in the test code is just for a quick and dirty demonstration.
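To plug this type information into an actual pipeline (a sketch, not taken from the question: inputStream and buildMyPojo are placeholders for your own source and mapping logic), you can attach it to the producing transformation with returns(...), so Flink uses the explicit TypeInformation instead of its own extraction:

DataStream<MyPojo> pojoStream = inputStream
        .map(value -> buildMyPojo(value))   // buildMyPojo stands in for your own logic
        .returns(myPojoInfo);               // use the explicit TypeInformation built above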
If you do:
env.getConfig().disableGenericTypes();
Flink will raise an exception whenever it encounters a data type that would go through Kryo.
In that case you have to supply the serializer yourself: a TypeSerializer can be created by calling typeInfo.createSerializer(config) on the TypeInformation object.
For generic types, you need to “capture” the generic type information via the TypeHint, in your case for a list:
TypeInformation<List<Object>> info = TypeInformation.of(new TypeHint<List<Object>>(){});
Alternatively, have a look at the ListTypeInfo class. More details here.
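For the fields from the question, a short sketch of both options (Types, TypeHint and ListTypeInfo are Flink classes; the variable names are just for illustration):

// capture the element type via a TypeHint
TypeInformation<List<String>> listInfo =
        TypeInformation.of(new TypeHint<List<String>>() {});

// or build it explicitly via ListTypeInfo / Types
TypeInformation<List<String>> listInfo2 = new ListTypeInfo<>(Types.STRING);
TypeInformation<Map<String, String>> mapInfo = Types.MAP(Types.STRING, Types.STRING);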
Related
I have an application that has many declared Lambdas. I've added an annotation to them so that I can use reflection to find all the functions marked with the annotation. They are all defined as:
#FooFunction("abc")
public static Function<Task, Result> myFunc = task -> {... returns new Result}
At startup, my application uses reflection to find all of the annotated functions and add them to the hashmap.
static HashMap<String, Function<Task, Result>> funcMap = new HashMap<>();

static {
    Reflections reflections = new Reflections("my.package", Scanners.values());
    var annotated = reflections.getFieldsAnnotatedWith(FooFunction.class);
    annotated.forEach(aField -> {
        try {
            var annot = aField.getAnnotation(FooFunction.class);
            var key = annot.value();
            funcMap.put(key, aField.get(null));
        } catch (Exception e) {
            // ...
        }
    });
}
The above code definitely won't work, especially on the put since aField.get(null) returns an Object. If I cast the object to Function<Task,Result>, I get an unchecked cast warning. No matter how I circle around it, I can't get rid of the warning (without using Suppress).
I've tried changing the Function<Foo, Bar> to something more generic like Function<?,?> but that took me down another rabbit hole.
All of the functions are declared as static since they really don't need to belong to a specific class. They are grouped under various classes simply for organizational purposes.
The underlying objective is: the API will receive a list of tasks. There are about 100 different Task types. Each Task has an "id" field which is used to determine which Function should be used to process that Task. It looks something like this:
var results = Arrays.stream(request.getTasks())
        .map(task -> functionMap.getOrDefault(task.getId(), unknownTaskFn).apply(task))
        .toList();
My questions:
Is this an antipattern? If so, is there a better prescribed pattern?
How can I go from an Object to a Function<Task,Result> properly to put it into the map?
Thanks
Casting is inevitable, because Field.get returns Object by design, but it can be done without warnings.
I would also suggest defining a custom interface
public interface TaskResultFunction extends Function<Task, Result> {
}
and use it for lambda declarations
#FooFunction("abc")
public static TaskResultFunction myFunc = task -> {... returns new Result}
(otherwise we will have to deal with ParameterizedTypeReference, but in this case it is not necessary and overcomplicated)
Map<String, Function<Task, Result>> funcMap = ...
// or more strict
Map<String, TaskResultFunction> funcMap = ...
//...
if (TaskResultFunction.class.isAssignableFrom(field.getType())) {
    TaskResultFunction fn = (TaskResultFunction) field.get(null);
    funcMap.put(key, fn);
}
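Putting it together, a minimal registration sketch under the same assumptions (Reflections set up as in the question, java.lang.reflect.Field imported, error handling kept minimal):

static final Map<String, TaskResultFunction> funcMap = new HashMap<>();

static {
    Reflections reflections = new Reflections("my.package", Scanners.values());
    for (Field field : reflections.getFieldsAnnotatedWith(FooFunction.class)) {
        if (!TaskResultFunction.class.isAssignableFrom(field.getType())) {
            continue;  // skip fields that are not declared as TaskResultFunction
        }
        try {
            // Safe cast: the runtime type was checked against the concrete interface above
            funcMap.put(field.getAnnotation(FooFunction.class).value(),
                    (TaskResultFunction) field.get(null));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read " + field, e);
        }
    }
}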
I am attempting to use MapStruct to convert between 2 internal models. These models are generated from the same specification and represent a large tree-like structure that is constantly being added to. The goal of using MapStruct is to have an efficient, low (non-generated) code way of performing this conversion that stays up to date with additions to the specification. As an example, my models would look like:
package com.mycompany.models.speca;

public class ModelSpecA {
    private String name;
    private int biggestNumberFound;
    private com.mycompany.models.speca.InternalModel internalModel;
    private List<com.mycompany.models.speca.InternalModel> internalModelList;
}

package com.mycompany.models.specb;

public class ModelSpecB {
    private String name;
    private int biggestNumberFound;
    private com.mycompany.models.specb.InternalModel internalModel;
    private List internalModelList;
}
with all of the expected getters and setters and no-arg constructors.
MapStruct is able to generate code for the mapping extremely easily with the code looking like:
interface ModelSpecMapper {
ModelSpecB map(ModelSpecA source);
}
From unit testing and inspecting the generated code, the mapping is accurate and complete except in one regard: the mapping of the internalModelList member in each class. The generated code looks like the following:
...
if (sourceInternalModelList != null) {
specBTarget.setInternalModelList( specASource.getInternalModelList() );
}
...
I.e. it is mapping from the generic List<com.mycompany.models.speca.InternalModel> to the non-generic List without doing any model conversion. This passes at compile time and at runtime in unit tests, but will cause errors in later code when we expect to be able to cast to the SpecB version of the model.
So far, I've investigated if it is possible to force a mapping of the parameterized type in the source to its corresponding type without using expensive reflection operations, which would eliminate the gains from using MapStruct as a solution. This is my first experience with MapStruct, so there may be an obvious solution I am simply unaware of. Adding an explicit mapping is infeasible as I need this to be forward compatible with future additions to the model including new Lists.
TLDR; How do I use MapStruct to convert the contents of a generic List to a non-generic List? E.g. List<com.mycompany.a.ComplexModel> --> List whose members are of type com.mycompany.b.ComplexModel.
Based on suggestions by @chrylis -cautiouslyoptimistic-, I managed to accomplish the mapping by using Jackson to perform the conversion directly from type to type, but that is tangential to the MapStruct problem. I was able to accomplish the stated goal of mapping a generic list to a non-generic list by adding a default mapping method to my MapStruct mapper:
/**
 * Map a generic List which contains objects of any type to a non-generic List which will contain objects
 * of the resulting mapping.
 * E.g. it maps a generic list of T to a non-generic list of contents mapped from T.
 * @param source Source generic List to map from.
 * @return A non-generic List whose contents are the contents of <i>source</i> with the mapping implementation applied.
 * @param <T> Type of the elements in the source list.
 */
default <T> List mapGenericToNonGeneric(List<T> source) {
    if (source == null) {
        return null;
    }
    if (source.isEmpty()) {
        return new LinkedList();
    }
    // Handle the most common known cases as an optimization without expensive reflection.
    final Class<?> objectClass = source.get(0).getClass();
    if (ClassUtils.isPrimitiveOrWrapper(objectClass)) {
        return new LinkedList(source);
    }
    if (String.class.equals(objectClass)) {
        return new LinkedList(source);
    }
    try {
        Method mapperMethod = Stream.of(this.getClass().getDeclaredMethods())
                .map(method -> {
                    Parameter[] params = method.getParameters();
                    // If this method is a mapper that takes the type of our list contents
                    if (params.length == 1 && params[0].getParameterizedType().equals(objectClass)) {
                        return method;
                    }
                    return null;
                })
                .filter(Objects::nonNull)
                .findFirst()
                .orElse(null);
        if (mapperMethod != null) {
            final List result = new LinkedList();
            for (T sourceObject : source) {
                result.add(mapperMethod.invoke(this, sourceObject));
            }
            log.info("Executed slow generic list conversion for type {}", objectClass.getName());
            return result;
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return null;
}
From inspecting the generated code and adding assertions to the type of each collection contents, this is passing my property-based testing suite. It still makes use of reflection to determine the parameterized type but is able to handle arbitrary additions to the model and outperforms the previous Jackson solution substantially.
I have a bunch of Java classes. I need to create simple POJOs containing just the fields of those Java classes. There are ways to create POJOs from JSON, but I need to do it directly from Java classes.
A Java class may have business-logic methods and be constructed in different ways. My goal is just to hold the state in POJOs, send it over the network, and deserialize it into the same set of POJOs.
You can serialize Java classes just fine, no need to strip them down to their fields (which is what it sounds like you want).
class MyClass implements Serializable {
    private int myInt;
    private String myString;

    public MyClass(int mi, String ms) {
        myInt = mi;
        myString = ms;
    }

    public String doStuff() { return String.format("%s %d", myString, myInt); }
}
Code for serialization:
MyClass toSerialize = new MyClass(5, "Test");
try (ObjectOutputStream out = new ObjectOutputStream(getNetworkOutstream())) {
    out.writeObject(toSerialize);
}
Code to deserialize:
try (ObjectInputStream in = new ObjectInputStream(getNetworkInStream())) {
    MyClass received = (MyClass) in.readObject();
}
The doStuff method is not in the way if that's what you're thinking.
Caveat is that all fields need to also be Serializable (or primitives).
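For example (a sketch; the worker field is made up for illustration), a field whose type is not Serializable has to be marked transient, otherwise writeObject throws a NotSerializableException:

class MyStatefulClass implements Serializable {
    private int myInt;                   // primitive: fine
    private String myString;             // String is Serializable: fine
    private transient Thread worker;     // Thread is not Serializable, so it is skipped
}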
If you are looking for a way to programmatically parse all those classes and generate POJOs for them, then you can use libraries like Antlr, JavaCC or JavaParser to analyse sources and then generate and save the new POJOs.
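As a rough sketch with JavaParser (the file path and class name are made up; this only lists the fields, which you would then feed into your own POJO generator):

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.FieldDeclaration;
import java.io.File;
import java.io.IOException;

public class FieldLister {
    public static void main(String[] args) throws IOException {
        // Parse an existing source file and print its field declarations
        CompilationUnit cu = StaticJavaParser.parse(new File("src/main/java/my/pkg/MyClass.java"));
        for (FieldDeclaration field : cu.findAll(FieldDeclaration.class)) {
            field.getVariables().forEach(v ->
                    System.out.println(v.getTypeAsString() + " " + v.getNameAsString()));
        }
    }
}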
Use a JSON library, for example Gson.
You can choose which fields to serialize or skip by using the transient modifier.
Apart from that, these libraries offer much more and cover all the requirements you specified.
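A minimal sketch with Gson (the MyState class and its fields are made up for illustration); transient fields are excluded by default:

import com.google.gson.Gson;

public class GsonExample {
    static class MyState {
        String name = "demo";
        int count = 42;
        transient String cache = "not sent";  // excluded from the JSON
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        String json = gson.toJson(new MyState());            // {"name":"demo","count":42}
        MyState restored = gson.fromJson(json, MyState.class);
        System.out.println(json + " -> " + restored.name);
    }
}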
Without getting bogged down with specifics, my code represents a library whereby each book is made up of a Set of pages containing a Set of Words.
I have created my own Set implementations:
class PageSet<E> extends HashSet<E> {
    public boolean set(int index, E e) { .... }
    ....
}
and
class WordSet<E> extends HashSet<E> {
    public boolean set(int index, E e) { .... }
    ....
}
I've got stuck when I try to create a Book in my main class:
Set<Set<Word>> dictionary = new PageSet<WordSet<Word>>();
Which results in a type conversion mismatch. However it will quite happily accept
Set<Set<Word>> dictionary = new PageSet<Set<Word>>();
Could someone please shed some light as to what I'm doing wrong when using a generic setup like this?
Basically, a PageSet<WordSet<Word>> is not a Set<Set<Word>>, because an X<Subclass> is not an X<Superclass>.
If you had said
Set<WordSet<Word>> dictionary = new PageSet<WordSet<Word>>();
then that would have worked also.
It's either
Set<Set<Word>> dictionary = new PageSet<Set<Word>>();
or
Set<WordSet<Word>> dictionary = new PageSet<WordSet<Word>>();
This is because, although WordSet is a subclass of Set, a Set<WordSet> is not a subclass of Set<Set>.
In other words, generics are not covariant, which is different from things like arrays.
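A short sketch of why the restriction exists (the class names follow the question; the HashSet line is just for illustration): if the assignment were allowed, you could smuggle a plain HashSet<Word> into a set that is supposed to contain only WordSet<Word> instances.

Set<WordSet<Word>> wordSets = new PageSet<WordSet<Word>>();
// If this assignment compiled (it does not, because generics are invariant)...
Set<Set<Word>> sets = wordSets;
// ...then this call would put a HashSet<Word> where only WordSet<Word> is expected:
sets.add(new HashSet<Word>());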
In any case, you should not extend collections unless you are trying to create new collection types. Since you cannot restrict the visibilities of superclass methods in a subclass, people will be able to write
WordSet<Word> words = ...;
words.clear();
You probably do not want to give clients that power. Use aggregation instead of inheritance.
class Word {
    private String text;
    private PartOfSpeech part;
    // Constructors, getters, setters, equals, hashCode are elided.
}

class Page {
    private int pageNumber;
    private Set<Word> contents = new HashSet<>();
}

public class Book {
    private String title;
    private List<Page> pages = new ArrayList<>();
}
Pages in a book are ordered linearly, which is why I used lists. I'm not sure why you used sets. But in any case, by encapsulating the collections inside the classes, you can provide client code exactly the interface you want them to use. The visibilities were chosen deliberately; this looks like a cluster of related classes, but you might want to change them.
I am playing with flexjson and Google Cloud Endpoints. My model which I need to serialize is:
public class SampleModel {
    Long id;
    DateTime createdAt;
    String message;
    OtherModel other;
}
I just created a DateTimeObjectFactory as a way of creating DateTime objects (the class lacks a no-arg constructor). Now I have a question about OtherModel and SampleModel as well.
I want to serialize in fact a List of SampleModel. So here is my code:
List<SampleModel> sampleList = new ArrayList<SampleModel>();
// ...
// adding some items to sampleList
// ...
String s = new JSONSerializer().deepSerialize(sampleList);
For now I deepSerialize it to avoid leaving some fields unserialized, but that is only temporary.
When I want to deserialize s I do this:
sampleList = new JSONDeserializer<List<SampleModel>>()
.use("other", OtherModel.class)
.use(DateTime.class, new DateTimeObjectFactory())
.deserialize(s);
I think the deserialization itself works, because I can see the deserialized object in the logs. But when I try to get an item from the new sampleList I get an error:
java.lang.ClassCastException: java.util.HashMap cannot be cast to com.test.games.testapi.model.SampleModel
If I understand correctly, every non-trivial object is deserialized as a Map if I don't point the deserializer at the right class. So does this error mean the deserializer didn't know about SampleModel? What does it mean?
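One thing worth checking (a sketch based on flexjson's path convention for root-level collections; please verify it against the flexjson docs for your version): tell the deserializer the element class of the top-level list via the "values" path, otherwise each element stays a HashMap.

sampleList = new JSONDeserializer<List<SampleModel>>()
        .use("values", SampleModel.class)                 // element type of the root list
        .use(DateTime.class, new DateTimeObjectFactory())
        .deserialize(s);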