Kinesis Data Analytics - Flink state serializer incompatible after recovering from Snapshot

We have a Flink application (version 1.13.2) deployed on AWS KDA. We do not want the application to stop at all, so whenever we update the jar with new changes we restore the application from a snapshot.
Recently we found a problem in a lower-level POJO class: a few of its getters and setters are named incorrectly. This early mistake prevents us from adding new fields to the POJO class, so we decided to rename the getters/setters directly. That led to the following exception after updating the application.
org.apache.flink.util.StateMigrationException: The new state serializer (org.apache.flink.api.common.typeutils.base.ListSerializer@46c65a77) must not be incompatible with the old state serializer (org.apache.flink.api.common.typeutils.base.ListSerializer@30c9146c).
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.updateRestoredStateMetaInfo(RocksDBKeyedStateBackend.java:704)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.tryRegisterKvStateInformation(RocksDBKeyedStateBackend.java:624)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.createInternalState(RocksDBKeyedStateBackend.java:837)
    at org.apache.flink.runtime.state.KeyedStateFactory.createInternalState(KeyedStateFactory.java:47)
    at org.apache.flink.runtime.state.ttl.TtlStateFactory.createStateAndWrapWithTtlIfEnabled(TtlStateFactory.java:71)
    at org.apache.flink.runtime.state.AbstractKeyedStateBackend.getOrCreateKeyedState(AbstractKeyedStateBackend.java:301)
    at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.getOrCreateKeyedState(StreamOperatorStateHandler.java:315)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.getOrCreateKeyedState(AbstractStreamOperator.java:494)
    at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.open(WindowOperator.java:243)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:764)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:571)
    at java.base/java.lang.Thread.run(Thread.java:829)
As far as we understand, the failure happens specifically in the two CoGroup functions we implemented. Both consume the problematic POJO class, which is nested in another POJO class, Session. A snippet of one of the CoGroup functions is shown below. We use Guava's Lists here and are not sure whether that causes the list serializer problem.
public class OutputCoGroup implements CoGroupFunction<Session, Event, OutputSession> {
    @Override
    public void coGroup(Iterable<Session> sessions, Iterable<Event> events,
            Collector<OutputSession> collector) throws Exception {
        // we are using google guava list here, not sure if it causes list serializer problem
        if (Lists.newArrayList(sessions).size() > 0) {
            ...
            if (events.iterator().hasNext()) {
                List<Event> eventList = Lists.newArrayList(events);
                ...
As the input types show, Session is the POJO class that contains the problematic POJO class.
public class Session {
    private problematicPojo problematicpojo;
    ...
}
The problematic POJO class has two Boolean fields whose getters/setters are named incorrectly (they are literally missing the "Is"). The other fields in the class are omitted here; they do not have any issues.
public class problematicPojo {
    private Boolean isA;
    private Boolean isB;
    ...
    public Boolean getA() { ... }
    public void setA(...) { ... }
    public Boolean getB() { ... }
    public void setB(...) { ... }
    ...
}
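For context, the Flink documentation says that a class is only treated as a POJO if every non-public field foo is reachable through accessors named getFoo() and setFoo(), and that classes failing the rules are handled as generic types instead. The sketch below checks only that naming rule with plain reflection; it is not Flink's own type extraction, and the class names in it are hypothetical stand-ins. It may help explain why renaming the accessors can change which serializer gets picked, though that is a possible explanation and not a confirmed diagnosis.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the class in the question: fields isA/isB, accessors getA/setA, getB/setB.
class MisnamedAccessorsPojo {
    private Boolean isA;
    private Boolean isB;

    public Boolean getA() { return isA; }
    public void setA(Boolean a) { this.isA = a; }
    public Boolean getB() { return isB; }
    public void setB(Boolean b) { this.isB = b; }
}

// Same fields, with accessors renamed to follow the getFoo()/setFoo() convention.
class RenamedAccessorsPojo {
    private Boolean isA;
    private Boolean isB;

    public Boolean getIsA() { return isA; }
    public void setIsA(Boolean isA) { this.isA = isA; }
    public Boolean getIsB() { return isB; }
    public void setIsB(Boolean isB) { this.isB = isB; }
}

final class AccessorNamingCheck {
    private AccessorNamingCheck() {}

    /** Returns the sorted names of non-public instance fields that lack getFoo()/setFoo() accessors. */
    static List<String> fieldsWithoutMatchingAccessors(Class<?> type) {
        List<String> missing = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (field.isSynthetic() || Modifier.isStatic(mods) || Modifier.isPublic(mods)) {
                continue; // public fields need no accessors; static and synthetic fields are skipped
            }
            String name = field.getName();
            String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            if (!hasPublicMethod(type, "get" + suffix, 0) || !hasPublicMethod(type, "set" + suffix, 1)) {
                missing.add(name);
            }
        }
        Collections.sort(missing);
        return missing;
    }

    // True if the type exposes a public method with this name and number of parameters.
    private static boolean hasPublicMethod(Class<?> type, String name, int parameterCount) {
        for (Method method : type.getMethods()) {
            if (method.getName().equals(name) && method.getParameterCount() == parameterCount) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(fieldsWithoutMatchingAccessors(MisnamedAccessorsPojo.class));
        System.out.println(fieldsWithoutMatchingAccessors(RenamedAccessorsPojo.class));
    }
}
```

With the misnamed accessors both fields are reported; with getIsA()/setIsA() style accessors none are.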
We have looked into some possible solutions:
Using the State Processor API -> AWS does not provide access to KDA snapshots, so we are not able to modify them.
Providing TypeInformation for the problematic POJO class -> this did not seem to work.
We are thinking of declaring a ListStateDescriptor in the CoGroup function (changing it to a RichCoGroupFunction) so that we can manually update the state when recovering from a snapshot, but we could not get much insight from the official docs. Is anyone here familiar with this method and able to help us out?
Thank you!

Related

Event upcasting in saga

I have an event that is serialized using XStreamSerializer in a saga like so:
public class MyEvent {
private String property1;
private String property2;
...
}
public class MySaga {
....
private MyEvent myEvent;
....
}
After creating several sagas with that event I needed to modify the event by adding a property:
public class MyEvent {
private String property1;
private String property2;
private String property3;
...
}
And now I'm having a problem with deserialization.
I have figured out a workaround using a serialization id that solves the problem, but I need to implement some sort of upcasting procedure, similar to event upcasting but for sagas, in which I override the deserialization process and upcast the inner property myEvent to the new event.
Is this possible?
My guess is that I should override the ConverterFactory class somehow, but I'm not sure how.
Could anybody advise, please?

Based on your other question, I assume you are using Axon Framework v2.4.6.
As you can find in the docs, Axon provides an example of how to write an upcaster; make sure you are reading the 2.4 documentation.
You basically get the intermediateRepresentation, which is an org.dom4j.Document since you are using the XStream serializer. Based on that, you have to use the methods given by XStream (copy, get, set, remove, etc.) to update the XML that is the payload of a given event.
You can find some other examples in the new docs or in our code-samples repository.
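To illustrate the kind of XML edit an upcaster performs, here is a minimal sketch that adds a missing element with a default value to a serialized payload. It deliberately uses the JDK's built-in DOM API as a stand-in for dom4j and does not touch any Axon interfaces; the payload layout, the element name property3, and the default value are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class MyEventXmlUpcastSketch {
    private MyEventXmlUpcastSketch() {}

    /** Adds <name>defaultValue</name> under the root element unless an element with that name exists. */
    static String addMissingProperty(String xml, String name, String defaultValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            if (root.getElementsByTagName(name).getLength() == 0) {
                Element added = doc.createElement(name);
                added.setTextContent(defaultValue);
                root.appendChild(added);
            }
            // Serialize the modified document back to a string without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not upcast payload", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical old payload that predates the new property.
        String oldPayload = "<MyEvent><property1>a</property1><property2>b</property2></MyEvent>";
        System.out.println(addMissingProperty(oldPayload, "property3", "unknown"));
    }
}
```

In a real upcaster the same steps would be applied to the dom4j document handed over by the framework, and payloads that already contain the element are left unchanged.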

Spring MVC: issue between xml and annotation configurations

I have created a simple controller
@GetMapping("/playerAccount")
public Iterable<PlayerAccount> getPlayerAccounts(com.querydsl.core.types.Predicate predicate) {
    return repository.findAll(predicate);
}
When I call the GET /playerAccount API, I get an IllegalStateException: "No primary or default constructor found for interface com.querydsl.core.types.Predicate" (thrown by org.springframework.web.method.annotation.ModelAttributeMethodProcessor#createAttribute).
After some (deep!) digging, I found out that if I delete the following line in my spring.xml file:
<mvc:annotation-driven />
and add the following annotation in my Spring.java file:
@EnableWebMvc
then the problem disappears.
I really don't understand why. What could be the cause of that? I thought the two were equivalent (one being XML-based configuration, the other Java/annotation-based).
I read the documentation on combining Java and XML configuration, but I didn't see anything relevant there.
edit:
from the (few) comments/answers that I got so far, I understand that maybe using a Predicate in my API is not the best choice.
Although I would really like to understand the nature of the bug, I first want to address the initial issue I'm trying to solve:
Let's say I have a MyEntity entity that is composed of 10 different fields (with different names and types). I would like to search on it easily. If I create the following (empty) interface:
public interface MyEntityRepository extends JpaRepository<MyEntity, Long>, QuerydslPredicateExecutor<MyEntity> {
}
then without any other code (apart from the XML configuration), I am able to easily search for a MyEntity entity in the database.
Now I just want to expose that functionality to a Rest endpoint. And ideally, if I add a new field to my MyEntity, I want that API to automatically work with that new field, just like the MyEntityRepository does, without modifying the controller.
I thought this was the purpose of Spring Data and a good approach, but please tell me if there's a better / more common way of creating a search API to a given Entity.

I didn't see that it returned an exception; that's why I thought it was a dependency problem.
Try making your code look like this, and it should work.
@RestController
public class MyClass {
    @Autowired
    private MyRepository repository;
    @GetMapping("/playerAccount")
    public Iterable<PlayerAccount> getPlayerAccounts() {
        return repository.findAll();
    }
}
If your request has a parameter, add @RequestParam:
@RestController
public class MyClass {
    @Autowired
    private MyRepository repository;
    // findById returns an Optional and does not accept a null id, so the parameter is required here
    @GetMapping("/playerAccount")
    public Optional<PlayerAccount> getPlayerAccount(@RequestParam Long id) {
        return repository.findById(id);
    }
}
PS: the request parameter must have the same name as the method parameter, e.g.
.../playerAccount?id=6

Spring Integration - aggregate and transform

What would be the simplest arrangement of integration components for my use case?
1. Receive messages from multiple sources and in multiple formats (all messages are JSON-serialized objects).
2. Store messages in a buffer for up to 10 seconds (aggregate).
3. Group messages by a different class property getter per type (e.g. class1.someId(), class2.otherId(), ...).
4. Release all messages that are grouped and transform them into a new aggregated message.
So far (points 1 and 2), I'm using an aggregator, but I don't know whether there is an out-of-the-box solution for point 3, or whether I will have to cast each Message and check the object type: if it is class1, use someId as the correlation strategy; if it is class2, use otherId.
For point 4, I could code something manually, but a Transformer seems like a good component to use. I just don't know whether there is something like an aggregating transformer where I can specify mapping rules for each input type.
UPDATE
Something like this:
class One {
    public String getA() { return "1"; }
}
class Two {
    public Integer getB() { return 1; }
}
class ReduceTo {
    public void setId(Integer id) {}
    public void setOne(One one) {}
    public void setTwo(Two two) {}
}
public class ReducingAggregator {
    @CorrelationStrategyMethod
    public String strategy(One one) {
        return one.getA();
    }
    @CorrelationStrategyMethod
    public String strategy(Two two) {
        return two.getB().toString();
    }
    @AggregatorMethod
    public void reduce(ReduceTo out, One in) {
        out.setId(Integer.valueOf(in.getA()));
        out.setOne(in);
    }
    @AggregatorMethod
    public void reduce(ReduceTo out, Two in) {
        out.setId(in.getB());
        out.setTwo(in);
    }
}
These annotations would, I suppose, have a different use case than the current Spring ones. ReduceTo could be any object, including a collection. In the config we could specify what is passed in the first time: an empty list or something else (like reduce in Java streams).

I'm not sure what you would like to see as an out-of-the-box solution. Those are your classes and your methods, so how could the framework make decisions about them?
Well, yes, you need to implement a CorrelationStrategy. Or you can consider using ExpressionEvaluatingCorrelationStrategy and not write any Java code :-).
Please elaborate on what you would like to see as an out-of-the-box feature.
The aggregating transformer is exactly what the MessageGroupProcessor of the aggregator encapsulates. By default it is DefaultAggregatingMessageGroupProcessor. Yes, you can code your own or, again, use an ExpressionEvaluatingMessageGroupProcessor and not write any Java code :-)
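If you do write the correlation logic yourself, the per-type dispatch does not need a chain of instanceof checks. The sketch below is plain Java with no Spring Integration types: it keeps one key extractor per payload class and groups payloads by the extracted key. All names are hypothetical, and One and Two are given constructors here so that the example can be run; a custom CorrelationStrategy could delegate to something like correlationKey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical payload types mirroring the question's One and Two.
final class One {
    private final String a;
    One(String a) { this.a = a; }
    String getA() { return a; }
}

final class Two {
    private final Integer b;
    Two(Integer b) { this.b = b; }
    Integer getB() { return b; }
}

final class TypeBasedCorrelation {
    // One key extractor per exact payload class.
    private final Map<Class<?>, Function<Object, String>> keyExtractors = new LinkedHashMap<>();

    <T> TypeBasedCorrelation register(Class<T> type, Function<? super T, String> extractor) {
        keyExtractors.put(type, payload -> extractor.apply(type.cast(payload)));
        return this;
    }

    /** Looks up the extractor registered for the payload's class and applies it. */
    String correlationKey(Object payload) {
        Function<Object, String> extractor = keyExtractors.get(payload.getClass());
        if (extractor == null) {
            throw new IllegalArgumentException("No correlation rule for " + payload.getClass().getName());
        }
        return extractor.apply(payload);
    }

    /** Groups payloads by correlation key, keeping first-seen key order. */
    Map<String, List<Object>> group(List<?> payloads) {
        Map<String, List<Object>> groups = new LinkedHashMap<>();
        for (Object payload : payloads) {
            groups.computeIfAbsent(correlationKey(payload), key -> new ArrayList<>()).add(payload);
        }
        return groups;
    }

    public static void main(String[] args) {
        TypeBasedCorrelation correlation = new TypeBasedCorrelation()
                .register(One.class, One::getA)
                .register(Two.class, two -> two.getB().toString());
        Map<String, List<Object>> groups =
                correlation.group(Arrays.asList(new One("1"), new Two(1), new One("2")));
        groups.forEach((key, members) -> System.out.println(key + " -> " + members.size()));
    }
}
```

The lookup is by exact class, so subclasses would need their own registration; that keeps the sketch short but is a design choice you may want to revisit.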

Trying to link to JPA Repository query methods with Spring HATEOAS and get IllegalArgumentException: 'uriTemplate' must not be null

In my app I have a foreign key relation between Things and Stuff, where a given Thing may contain hundreds of Stuffs, and I'm using Spring Data JPA to expose the Thing and Stuff repositories.
I want to display all the Stuff associated with the user's selected Thing, but because of the size of the result I want to page it.
Searching showed that it is not possible to add paging to the embedded stuff links returned with a Thing, so the link below, as returned from my Thing repository, can never be paged:
"stuff": {
"href": "http://localhost:8080/api/things/1/stuff"
}
So I added a custom method to my Stuff repository to get all Stuff by Thing id, and that works fine when called directly.
I want to add a Link to the Thing Resource pointing at the custom search method to get all the associated Stuff, but when I use the ControllerLinkBuilder.linkTo() method it fails with
java.lang.IllegalArgumentException: 'uriTemplate' must not be null
at org.springframework.util.Assert.hasText(Assert.java:181) ~[spring-core-4.3.11.RELEASE.jar:4.3.11.RELEASE]
at org.springframework.web.util.UriTemplate.<init>(UriTemplate.java:61) ~[spring-web-4.3.11.RELEASE.jar:4.3.11.RELEASE]
Stuff Repo:
public interface StuffRepo extends JpaRepository<Stuff, Long> {
    Page<Stuff> findByThingId(@Param("thingId") Long thingId, Pageable pageable);
}
Configuration:
@Bean
ThingProcessor getThingProcessor()
{
    return new ThingProcessor();
}
public static class ThingProcessor implements ResourceProcessor<Resource<Thing>> {
    @Override
    public Resource<Thing> process(Resource<Thing> resource) {
        ControllerLinkBuilder.linkTo(ControllerLinkBuilder.methodOn(StuffRepo.class).findByThingId(resource.getContent().id, null));
        return resource;
    }
}
Am I missing some annotation or configuration? I have tried annotating the repo and the method with @RestResource and it makes no difference. Also, is there a better way to get a paged result for the sub-objects?

I got around this by using the RepositoryEntityLinks class, which gives you a list of Link objects; from there I find the one I need using link.getRel().

Patterns: Populate instance from Parameters and export it to XML

I'm building a simple RESTful service, and to achieve that I need to do two things:
Get an instance of my resource (e.g. Book) from the request parameters, so that the instance can be persisted
Build an XML document from that instance to send the representation to the clients
Right now, I'm doing both things in my POJO class:
public class Book implements Serializable {
    private Long id;
    public Book(Form form) {
        // Initializing attributes
        id = Long.parseLong(form.getFirstValue(Book.CODE_ELEMENT));
    }
    public Element toXml(Document document) {
        // Getting an XML representation of the Book
        Element bookElement = document.createElement(BOOK_ELEMENT);
        return bookElement;
    }
}
I remember an OO principle that says behavior should live where the data is, but now my POJO depends on the request and XML APIs, and that doesn't feel right (also, that class has persistence annotations).
Is there any standard approach/pattern to solve that issue?
EDIT:
The libraries I'm using are Restlet and Objectify.

I agree with you that the behavior should be where the data is. But at the same time, as you say, I don't feel comfortable polluting a POJO interface with methods used only for serialization (which can grow considerably depending on the formats you want to support: JSON, XML, etc.).
1) Build an XML document from that instance to send the representation to the clients
In order to decouple the object from the serialization logic, I would adopt the Strategy pattern:
interface BookSerializerStrategy {
    String serialize(Book book);
}
public class XmlBookSerializerStrategy implements BookSerializerStrategy {
    public String serialize(Book book) {
        // Do something to serialize your book.
    }
}
public class JsonBookSerializerStrategy implements BookSerializerStrategy {
    public String serialize(Book book) {
        // Do something to serialize your book.
    }
}
Your POJO would become:
public class Book implements Serializable {
    private Long id;
    private BookSerializerStrategy serializer;
    public String serialize() {
        return serializer.serialize(this);
    }
    public void setSerializer(BookSerializerStrategy serializer) {
        this.serializer = serializer;
    }
}
Using this approach you will be able to isolate the serialization logic in just one place and won't pollute your POJO with it. Additionally, by returning a String you won't need to couple your POJO to the Document and Element classes.
2) Get an instance of my resource (e.g. Book) from the request parameters, so that the instance can be persisted
Finding a pattern to handle the deserialization is more complex, in my opinion. I really don't see a better way than creating a Factory with static methods in order to remove this logic from your POJO.
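A minimal sketch of such a factory might look like the following. It assumes the request parameters have already been copied into a plain Map so that the example does not depend on Restlet's Form; BookSketch, BookFactory, and the parameter name "code" are hypothetical stand-ins.

```java
import java.util.Map;

// Hypothetical stand-in for the Book POJO: it no longer knows about request types.
final class BookSketch {
    private final Long id;

    BookSketch(Long id) { this.id = id; }

    Long getId() { return id; }
}

final class BookFactory {
    // Assumed name of the request parameter carrying the book id.
    static final String CODE_ELEMENT = "code";

    private BookFactory() {}

    /** Builds a book from request parameters, rejecting requests without an id. */
    static BookSketch fromParameters(Map<String, String> parameters) {
        String rawId = parameters.get(CODE_ELEMENT);
        if (rawId == null) {
            throw new IllegalArgumentException("Missing parameter: " + CODE_ELEMENT);
        }
        return new BookSketch(Long.parseLong(rawId.trim()));
    }
}
```

The resource class would then call BookFactory.fromParameters(...) with the values taken from the request, and the POJO keeps no reference to the request API.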
Another approach that answers both of your questions would be something like what JAXB uses: two different objects, an Unmarshaller in charge of deserialization and a Marshaller for serialization. JAXB was bundled with the JDK from Java 6 through Java 10; from Java 11 onward it has to be added as a separate dependency.
Finally, these are just suggestions. I've become really interested in your question and am curious about other possible solutions.

Are you using Spring, or any other framework, in your project? If you used Spring, it would take care of serialization for you, as well as assigning request params to method params (parsing as needed).
