open method is not being called in Flink RichMapFunction

I am trying to use Apache Flink for a simple example described at Shortcuts. However, I noticed that the open method is never called, and as a result I get a NullPointerException on the first line of the map function.
public class MyMap extends RichMapFunction<Integer, Integer> {
private ValueState<Integer> test;
public void open(Configuration cfg) {
test = getRuntimeContext().getState(new
ValueStateDescriptor<Integer>("myTest", Integer.class));
System.out.println("1:" + test);
}
@Override
public Integer map(Integer i) throws Exception {
System.out.println("2:" + test.value()); //test is null here
test.update(test.value() == null? 1: test.value() + 1);
System.out.println("3:" + test.value());
return i;
}
}

Update:
Did you try to @Override the open function?
test.value() is supposed to be null the first time.
You are in a keyed context, which means that each message has a key that Flink already knows about. When you enter a stateful operator, Flink tries to fetch the value for that key from the configured state backend. Unless you configure the ValueStateDescriptor with a default value (which is deprecated), the state will be null the first time you process a message for a specific key. Your application should therefore handle the null value.
Try the following example (my Java is rusty, so this is in Scala; ask me if you need help converting it):
env.fromElements(("key1", 2),("key2", 4), ("key1", 5))
.keyBy(_._1)
.map {
new RichMapFunction[(String, Int), (String, Int)] {
lazy val stateTypeInfo: TypeInformation[Int] = implicitly[TypeInformation[Int]]
lazy val serializer: TypeSerializer[Int] = stateTypeInfo.createSerializer(getRuntimeContext.getExecutionConfig)
lazy val stateDescriptor = new ValueStateDescriptor[Int]("dummy state", serializer)
var testVar: ValueState[Int] = _
override def open(config: Configuration) = {
testVar = this.getRuntimeContext.getState(stateDescriptor)
}
override def map(in: (String, Int)): (String, Int) = {
println(s"message $in")
println(s"state ${testVar.value()}")
println()
val sum = Option(testVar.value()).getOrElse(0) + in._2
testVar.update(sum)
(in._1, sum)
}
}
}.print()
env.execute()
This should produce:
message (key1,2) (first time key1 is seen)
state null (state is null)
(key1,2) (output)
message (key2,4) (first time key2 is seen)
state null (state is null)
(key2,4) (output)
message (key1,5) (second time key1 is seen!! We stored something there!)
state 2 (we stored a 2)
(key1,7) (thus output is 2+5=7)
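The per-key behaviour can also be mimicked in plain Java without Flink. This is only a sketch: KeyedSumSketch and its HashMap are hypothetical stand-ins for Flink's keyed ValueState, not Flink APIs. It shows why the first read for a key is null and has to be handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a keyed stateful map operator; not a Flink API.
final class KeyedSumSketch {
    // One state slot per key, like a ValueState scoped to the current key.
    private final Map<String, Integer> stateByKey = new HashMap<>();

    // Returns the running sum for the key after adding the new value.
    int map(String key, int value) {
        Integer previous = stateByKey.get(key); // null the first time a key is seen
        int sum = (previous == null ? 0 : previous) + value;
        stateByKey.put(key, sum);
        return sum;
    }

    public static void main(String[] args) {
        KeyedSumSketch op = new KeyedSumSketch();
        System.out.println(op.map("key1", 2)); // 2
        System.out.println(op.map("key2", 4)); // 4
        System.out.println(op.map("key1", 5)); // 7
    }
}
```

The three calls mirror the three input elements of the Scala example, and the sums match the outputs listed above.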

I had a similar problem. I solved it by replacing the following import:
import java.lang.module.Configuration;
with this one:
import org.apache.flink.configuration.Configuration;
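The import matters because, with the wrong Configuration class, open(Configuration) has a different parameter type from the method the framework calls. It then becomes an overload rather than an override and is never invoked. The sketch below reproduces this in plain Java; FlinkConfig, OtherConfig, BaseFunction and the two subclasses are hypothetical stand-ins, not Flink classes.

```java
// Hypothetical stand-ins: FlinkConfig plays org.apache.flink.configuration.Configuration,
// OtherConfig plays an unrelated class that happens to share the simple name.
final class FlinkConfig {}
final class OtherConfig {}

// Plays the role of the framework base class that calls open() before processing.
class BaseFunction {
    public void open(FlinkConfig cfg) {}            // default: do nothing
    final void start() { open(new FlinkConfig()); } // framework-side call
}

// Wrong parameter type: this is an overload, so start() never reaches it.
class WrongImport extends BaseFunction {
    boolean opened = false;
    public void open(OtherConfig cfg) { opened = true; }
}

// Right parameter type: @Override compiles and start() dispatches here.
class RightImport extends BaseFunction {
    boolean opened = false;
    @Override
    public void open(FlinkConfig cfg) { opened = true; }
}

final class OverloadDemo {
    public static void main(String[] args) {
        WrongImport wrong = new WrongImport();
        wrong.start();
        RightImport right = new RightImport();
        right.start();
        System.out.println("wrong import opened: " + wrong.opened); // false
        System.out.println("right import opened: " + right.opened); // true
    }
}
```

Annotating the open method in WrongImport with @Override would turn the silent bug into a compile error, which is why the annotation is worth adding.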

Related

How can I do this in java without using Either?

I have a function that returns String.
private String processQuery(String[] args){
//code logic
}
The returned result can be either an answer (Your account detail is $account_detail.) or a response (Sorry I cannot understand you?). Depending on the result, the code will do separate things.
What I came up with is to use Either<String, String>.
private Either<String,String> processQuery(String[] args){
//code logic
}
private void reply(String[] args){
//code logic
var either = processQuery(args);
return either.fold((l){
//returned result is answer
},(r){
//returned result is response
});
}
If it returns left, it is an answer; if it returns right, it is a response. But since there is no Either type in Java, I tried passing an AtomicBoolean around.
What is a better solution for this using only the Java standard library?

One solution is to make the method take two lambda functions, one for a correct and one for an incorrect answer, and then call only the appropriate one:
private void processQuery(String[] args, Consumer<String> correct, Consumer<String> incorrect){
if (args.length == 0) {
incorrect.accept("Sorry I cannot understand you?");
return;
}
correct.accept("Your account detail is $account_detail.");
}
which can be called like this
private void reply(String[] args){
processQuery(args, (
reply -> System.out.println("Success!, " + reply)
),
(
reply -> System.out.println("Fail, " + reply)
)
);
}
or create variables for the different functions
Consumer<String> badAnswer = reply -> System.out.println("Fail, " + reply);
Consumer<String> goodAnswer = reply -> System.out.println("Success!, " + reply);
private void reply(String[] args){
processQuery(args, goodAnswer, badAnswer);
}

You can use Pair:
Pair<String, String> pair = Pair.with("Blah", "Blee");
A better approach, if your responses actually represent some kind of error, is to throw an exception and keep the String return value for the "good" flow.
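If both outcomes should stay in the return value, a small hand-written result type using only the standard library is another option. The sketch below is one possible shape, not an established API: QueryResult, answer, response and fold are names invented for illustration, and the args.length == 0 check is only a placeholder for the real logic.

```java
import java.util.function.Function;

// Hypothetical minimal Either-like type: holds a message plus which case it is.
final class QueryResult {
    private final String message;
    private final boolean isAnswer;

    private QueryResult(String message, boolean isAnswer) {
        this.message = message;
        this.isAnswer = isAnswer;
    }

    static QueryResult answer(String message) { return new QueryResult(message, true); }
    static QueryResult response(String message) { return new QueryResult(message, false); }

    // Applies exactly one of the two functions, depending on the case.
    <T> T fold(Function<String, T> onAnswer, Function<String, T> onResponse) {
        return isAnswer ? onAnswer.apply(message) : onResponse.apply(message);
    }
}

final class QueryDemo {
    // Placeholder logic: empty input is treated as "not understood".
    static QueryResult processQuery(String[] args) {
        if (args.length == 0) {
            return QueryResult.response("Sorry I cannot understand you?");
        }
        return QueryResult.answer("Your account detail is " + args[0] + ".");
    }

    static String reply(String[] args) {
        return processQuery(args).fold(
            answer -> "Success! " + answer,
            response -> "Fail, " + response);
    }

    public static void main(String[] args) {
        System.out.println(reply(new String[] {"42"}));
        System.out.println(reply(new String[0]));
    }
}
```

Because the constructor is private, callers can only build a result through the two factories, so every result is clearly one case or the other.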

Copy fields across objects of different type in gRPC

Suppose I have two proto buffer types:
message MessageType1 {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType3 field3 = 3;
}
message MessageType2 {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType4 field4 = 3;
}
Then in Java I would like to be able to use one object as a template to another:
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.usingTemplate(message1) // sets field1 & field2 only
.setField4(someValue)
.build()
instead of
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.setField1(message1.getField1())
.setField2(message1.getField2())
.setField4(someValue)
.build()
Why do I need this? My gRPC service is designed to take incoming data of one type (message1) that is almost identical to a message of a different type (message2), which needs to be sent out. The number of identical fields is huge and the copying code is mundane. The manual solution also has the disadvantage that a field can be missed when a new one is added.
There is a template method (object.newBuilder(template)) that allows templating from an object of the same type, but what about templating between different types?
I could, of course, write a small reflection utility which inspects all members (methods?) and manually copies data over, but generated code looks discouraging and ugly for this sort of quest.
Is there any good approach to tackle this?

It turned out not to be so complicated. I wrote a small utility that evaluates and matches FieldDescriptors (descriptors that the protobuf runtime exposes for generated messages). In my case it is enough to match them by name and type. Full solution here:
/**
* Copies fields from source to dest. Only copies fields if they are set, have matching name and type as their counterparts in dest.
*/
public static void copyCommonFields(@Nonnull GeneratedMessageV3 source, @Nonnull com.google.protobuf.GeneratedMessageV3.Builder<?> destBuilder) {
Map<FieldDescriptorKeyElements, Descriptors.FieldDescriptor> elementsInSource = Maps.uniqueIndex(source.getDescriptorForType().getFields(), FieldDescriptorKeyElements::new);
Map<FieldDescriptorKeyElements, Descriptors.FieldDescriptor> elementsInDest = Maps.uniqueIndex(destBuilder.getDescriptorForType().getFields(), FieldDescriptorKeyElements::new);
// those two above could even be cached if necessary as this is static info
Set<FieldDescriptorKeyElements> elementsInBoth = Sets.intersection(elementsInSource.keySet(), elementsInDest.keySet());
for (Map.Entry<Descriptors.FieldDescriptor, Object> entry : source.getAllFields().entrySet()) {
Descriptors.FieldDescriptor descriptor = entry.getKey();
FieldDescriptorKeyElements keyElements = new FieldDescriptorKeyElements(descriptor);
if (entry.getValue() != null && elementsInBoth.contains(keyElements)) {
destBuilder.setField(elementsInDest.get(keyElements), entry.getValue());
}
}
}
// used for convenient/quick lookups in a Set
private static final class FieldDescriptorKeyElements {
final String fieldName;
final Descriptors.FieldDescriptor.JavaType javaType;
final boolean isRepeated;
private FieldDescriptorKeyElements(Descriptors.FieldDescriptor fieldDescriptor) {
this.fieldName = fieldDescriptor.getName();
this.javaType = fieldDescriptor.getJavaType();
this.isRepeated = fieldDescriptor.isRepeated();
}
@Override
public int hashCode() {
return Objects.hash(fieldName, javaType, isRepeated);
}
@Override
public boolean equals(Object obj) {
if (obj == null || !(obj instanceof FieldDescriptorKeyElements)) {
return false;
}
FieldDescriptorKeyElements other = (FieldDescriptorKeyElements) obj;
return Objects.equals(this.fieldName, other.fieldName) &&
Objects.equals(this.javaType, other.javaType) &&
Objects.equals(this.isRepeated, other.isRepeated);
}
}

Answering your specific question: no, there is no template based way to do this. However, there are some other ways to get the same effect:
If you don't care about performance and the field numbers are the same between the messages, you can serialize the first message to bytes and deserialize them back as the new message. This requires that all the fields in the first message must match the type and id number of those in the second message (though, the second message can have other fields). This is probably not a good idea.
Extract the common fields to another message, and share that message. For example:
proto:
message Common {
SomeType1 field1 = 1;
SomeType2 field2 = 2;
SomeType3 field3 = 3;
}
message MessageType1 {
Common common = 1;
// ...
}
message MessageType2 {
Common common = 1;
// ...
}
Then, you can share the messages in code:
MessageType1 message1 = ...;
MessageType2 message2 = MessageType2.newBuilder()
.setCommon(message1.getCommon())
.build();
This is probably the better solution.
Lastly, as you mentioned, you could resort to reflection. This is probably the most verbose and slowest way, but it would allow you the most control (aside from manually copying over the fields). Not recommended.

Access a value's parent naming from within the instantiated class (Scala)?

Assume Scala 2.11. I'm writing a class that will persist a Scala value. It is intended to be used as follows:
class ParentClass {
val instanceId: String = "aUniqueId"
val statefulString: Persisted[String] = persisted { "SomeState" }
onEvent {
case NewState(state) => statefulString.update(state)
}
}
Persisted is a class with a type parameter that is meant to persist that specific value like a cache, and Persisted handles all of the logic associated with persistence. However, to simplify the implementation, I'm hoping to retrieve information about its instantiation. For example, if its instance in the parent class is named statefulString, how can I access that name from within the Persisted class itself?
The purpose of doing this is to prevent collisions in automatic naming of persisted values while simplifying the API. I cannot rely on using type, because there could be multiple values of String type.
Thanks for your help!
Edit
This question may be helpful: How can I get the memory location of a object in java?
Edit 2
After reading the source code for ScalaCache, it appears there is a way to do this via WeakTypeTag. Can someone explain what exactly is happening in its macros?
https://github.com/cb372/scalacache/blob/960e6f7aef52239b85fa0a1815a855ab46356ad1/core/src/main/scala/scalacache/memoization/Macros.scala

I was able to do this with the help of Scala macros and reflection, and adapting some code from ScalaCache:
class Macros(val c: blackbox.Context) {
import c.universe._
def persistImpl[A: c.WeakTypeTag, Repr: c.WeakTypeTag](f: c.Tree)(keyPrefix: c.Expr[ActorIdentifier], scalaCache: c.Expr[ScalaCache[Repr]], flags: c.Expr[Flags], ec: c.Expr[ExecutionContext], codec: c.Expr[Codec[A, Repr]]) = {
commonMacroImpl(keyPrefix, scalaCache, { keyName =>
q"""_root_.persistence.sync.caching($keyName)($f)($scalaCache, $flags, $ec, $codec)"""
})
}
private def commonMacroImpl[A: c.WeakTypeTag, Repr: c.WeakTypeTag](keyPrefix: c.Expr[ActorIdentifier], scalaCache: c.Expr[ScalaCache[Repr]], keyNameToCachingCall: (c.TermName) => c.Tree): Tree = {
val enclosingMethodSymbol = getValSymbol()
val valNameTree = getValName(enclosingMethodSymbol)
val keyName = createKeyName()
val scalacacheCall = keyNameToCachingCall(keyName)
val tree = q"""
val $keyName = _root_.persistence.KeyStringConverter.createKeyString($keyPrefix, $valNameTree)
$scalacacheCall
"""
tree
}
/**
* Get the symbol of the val that encloses the macro,
* or abort the compilation if we can't find one.
*/
private def getValSymbol(): c.Symbol = {
def getValSymbolRecursively(sym: Symbol): Symbol = {
if (sym == null || sym == NoSymbol || sym.owner == sym)
c.abort(
c.enclosingPosition,
"This persistence block does not appear to be inside a val. " +
"Memoize blocks must be placed inside vals, so that a cache key can be generated."
)
else if (sym.isTerm)
try {
val termSym = sym.asInstanceOf[TermSymbol]
if(termSym.isVal) termSym
else getValSymbolRecursively(sym.owner)
} catch {
case NonFatal(e) => getValSymbolRecursively(sym.owner)
}
else
getValSymbolRecursively(sym.owner)
}
getValSymbolRecursively(c.internal.enclosingOwner)
}
/**
* Convert the given method symbol to a tree representing the method name.
*/
private def getValName(methodSymbol: c.Symbol): c.Tree = {
val methodName = methodSymbol.asMethod.name.toString
// return a Tree
q"$methodName"
}
private def createKeyName(): TermName = {
// We must create a fresh name for any vals that we define, to ensure we don't clash with any user-defined terms.
// See https://github.com/cb372/scalacache/issues/13
// (Note that c.freshName("key") does not work as expected.
// It causes quasiquotes to generate crazy code, resulting in a MatchError.)
c.freshName(c.universe.TermName("key"))
}
}

Using boxed/atomic values in Scala with Chronicle Map

We're using ChronicleMap to support off-heap persistence in a large number of different stores, but hit a bit of a problem with the simplest use case.
First of all, here's the helper I wrote to make creation easier:
import java.io.File
import java.util.concurrent.atomic.AtomicLong
import com.madhukaraphatak.sizeof.SizeEstimator
import net.openhft.chronicle.map.{ChronicleMap, ChronicleMapBuilder}
import scala.reflect.ClassTag
object ChronicleHelper {
def estimateSizes[Key, Value](data: Iterator[(Key, Value)], keyEstimator: AnyRef => Long = defaultEstimator, valueEstimator: AnyRef => Long = defaultEstimator): (Long, Long, Long) = {
println("Estimating sizes...")
val entries = new AtomicLong(1)
val keySum = new AtomicLong(1)
val valueSum = new AtomicLong(1)
var i = 0
val GroupSize = 5000
data.grouped(GroupSize).foreach { chunk =>
chunk.par.foreach { case (key, value) =>
entries.incrementAndGet()
keySum.addAndGet(keyEstimator(key.asInstanceOf[AnyRef]))
valueSum.addAndGet(valueEstimator(value.asInstanceOf[AnyRef]))
}
i += 1
println("Progress:" + i * GroupSize)
}
(entries.get(), keySum.get() / entries.get(), valueSum.get() / entries.get())
}
def defaultEstimator(v: AnyRef): Long = SizeEstimator.estimate(v)
def createMap[Key: ClassTag, Value: ClassTag](data: => Iterator[(Key, Value)], file: File): ChronicleMap[Key, Value] = {
val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]
val (entries, averageKeySize, averageValueSize) = estimateSizes(data)
val builder = ChronicleMapBuilder.of(keyClass, valueClass)
.entries(entries)
.averageKeySize(averageKeySize)
.averageValueSize(averageValueSize)
.asInstanceOf[ChronicleMapBuilder[Key, Value]]
val cmap = builder.createPersistedTo(file)
val GroupSize = 5000
println("Inserting data...")
var i = 0
data.grouped(GroupSize).foreach { chunk =>
chunk.par.foreach { case (key, value) =>
cmap.put(key, value)
}
i += 1
println("Progress:" + i * GroupSize)
}
cmap
}
def empty[Key: ClassTag, Value: ClassTag]: ChronicleMap[Key, Value] = {
val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]
ChronicleMapBuilder.of(keyClass, valueClass).create()
}
def loadMap[Key: ClassTag, Value: ClassTag](file: File): ChronicleMap[Key, Value] = {
val keyClass = implicitly[ClassTag[Key]].runtimeClass.asInstanceOf[Class[Key]]
val valueClass = implicitly[ClassTag[Value]].runtimeClass.asInstanceOf[Class[Value]]
ChronicleMapBuilder.of(keyClass, valueClass).createPersistedTo(file)
}
}
It uses https://github.com/phatak-dev/java-sizeof for object size estimation. Here's the kind of usage we want to support:
object TestChronicle {
def main(args: Array[String]) {
def dataIterator: Iterator[(String, Int)] = (1 to 5000).toIterator.zipWithIndex.map(x => x.copy(_1 = x._1.toString))
ChronicleHelper.createMap[String, Int](dataIterator, new File("/tmp/test.map"))
}
}
But it throws an exception:
[error] Exception in thread "main" java.lang.ClassCastException: Key must be a int but was a class java.lang.Integer
[error] at net.openhft.chronicle.hash.impl.VanillaChronicleHash.checkKey(VanillaChronicleHash.java:661)
[error] at net.openhft.chronicle.map.VanillaChronicleMap.queryContext(VanillaChronicleMap.java:281)
[error] at net.openhft.chronicle.map.VanillaChronicleMap.put(VanillaChronicleMap.java:390)
[error] at ...
I can see that it might have something to do with atomicity of Scala's Int as opposed to Java's Integer, but how do I bypass that?
Scala 2.11.7
Chronicle Map 3.8.0

It seems suspicious that your test uses Iterator[(String, Int)] (rather than Iterator[(Int, String)]), so the key type is String and the value type is Int, while the error message is complaining about the key's type (int/Integer).
If the error message says Key must be a %type%, it means that you configured that type in the first ChronicleMapBuilder.of(keyType, valueType) statement. In your case, you configured int.class (the Class object representing the primitive int type in Java), which is not allowed, and passed java.lang.Integer instances to the map's methods (you probably pass primitive ints, but they become Integer through boxing), which is allowed. Make sure you pass java.lang.Integer.class (or some other Scala class) to the ChronicleMapBuilder.of(keyType, valueType) call.
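The distinction the error message points at can be seen in plain Java: int.class and Integer.class are different Class objects, and a boxed value is only an instance of the latter. Scala's ClassTag[Int].runtimeClass yields the primitive class, so one way around the problem is to map primitive classes to their wrappers before calling ChronicleMapBuilder.of. The boxed helper below is a hypothetical sketch using only the JDK; it is not part of Chronicle Map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Chronicle Map API): maps primitive classes to wrappers.
final class Boxing {
    private static final Map<Class<?>, Class<?>> WRAPPERS = new HashMap<>();
    static {
        WRAPPERS.put(boolean.class, Boolean.class);
        WRAPPERS.put(byte.class, Byte.class);
        WRAPPERS.put(char.class, Character.class);
        WRAPPERS.put(short.class, Short.class);
        WRAPPERS.put(int.class, Integer.class);
        WRAPPERS.put(long.class, Long.class);
        WRAPPERS.put(float.class, Float.class);
        WRAPPERS.put(double.class, Double.class);
    }

    // Returns the wrapper class for a primitive class, or the class itself otherwise.
    static Class<?> boxed(Class<?> cls) {
        return WRAPPERS.getOrDefault(cls, cls);
    }

    public static void main(String[] args) {
        Object key = 1; // autoboxed to java.lang.Integer
        System.out.println(int.class.isInstance(key));        // false
        System.out.println(boxed(int.class).isInstance(key)); // true
        System.out.println(boxed(String.class));              // class java.lang.String
    }
}
```

In the Scala helper from the question, the same mapping would be applied to the runtimeClass of the key and value ClassTags before they are handed to the builder.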
I don't know what size estimate this project gives: https://github.com/phatak-dev/java-sizeof, but in any case you should specify the size in bytes that the object takes in serialized form. The serialized form depends on the default serializers chosen for a specific type in Chronicle Map (which may change between Chronicle Map versions), or on custom serializers configured for the specific ChronicleMapBuilder. So configuring a Chronicle Map with key/value "sizes" obtained from anywhere other than Chronicle Map itself is fragile. You can use the following procedure to estimate sizes more reliably:
public static <V> double averageValueSize(Class<V> valueClass, Iterable<V> values) {
try (ChronicleMap<Integer, V> testMap = ChronicleMap.of(Integer.class, valueClass)
// doesn't matter, anyway not a single value will be written to a map
.averageValueSize(1)
.entries(1)
.create()) {
LongSummaryStatistics statistics = new LongSummaryStatistics();
for (V value : values) {
try (MapSegmentContext<Integer, V, ?> c = testMap.segmentContext(0)) {
statistics.accept(c.wrapValueAsData(value).size());
}
}
return statistics.getAverage();
}
}
You can find it in this test: https://github.com/OpenHFT/Chronicle-Map/blob/7aedfba7a814578a023f7975ef15ba88b4d435db/src/test/java/eg/AverageValueSizeTest.java
This procedure is hackish, but there are no better options right now.
Another recommendation:
If your keys or values are boxed primitives (ints, longs, doubles) or any other type that always has the same size, you shouldn't use the averageKey/averageValue/averageKeySize/averageValueSize methods; use the constantKeySizeBySample/constantValueSizeBySample methods instead. For java.lang.Integer, Long and Double specifically, even this is not needed, because Chronicle Map already knows that those types have a constant size.

Get an existing object from a Map with a key

For an application written in Java (Eclipse), I have created a Map where I store objects of a custom class.
This custom class is called Music and has this constructor:
public Music (String title, String autor, int code){
this.setTitle(title);
this.setAutor(autor);
this.setCode(code);
}
This class has 3 child classes that extend it: Vinyl, CD and Cassette. Here is the CD class:
public CD(String title, String autor, String type, int code) {
super(title, autor, code);
this.setType(type);
}
Then, in another class called ManageMusic, I have created some methods and the Map:
private final Map<Integer, Music> musicMap;
public ManageMusic() {
musicMap = new HashMap<Integer, Music>();
}
To add an object to the Map, I have a method that, in this example with a CD, basically does:
musicItem = new CD(title, autor, format, newCode);
musicMap.put(newCode, musicItem);
In all these cases, the code is a number that identifies a specific object, which I use to put it into the Map, delete it, or get it from the Map.
Now, my question is: when I want to get an object from the Map and store it in a String, I do this:
String object = musicMap.get(code).toString();
This way I should be getting the object from the Map and converting it to a String.
How can I handle the case where the given code doesn't exist in the Map?
How could I catch an exception or otherwise tell the user that there is no element in the Map with that code?

You can use the ternary operator ?:
String object = musicMap.get(code) != null ? musicMap.get(code).toString() : "No item found.";
Edit (thanks to @user270349):
An even better approach:
Music m = musicMap.get(code);
String object = (m != null) ? m.toString() : "No item found.";

You can check if the return value of get is null :
Music object = musicMap.get(code);
if (object == null) {
// do nothing
} else {
String str = object.toString();
}
You could also use containsKey() method :
if (musicMap.containsKey(code)) {
// your code
}

I am not sure if I understood, but you can always do:
Music music = musicMap.get(code);
String object = null;
if (music != null) {
    object = music.toString(); // only convert when the item exists
}

You can use containsKey method:
String str;
if(musicMap.containsKey(code)){
str = musicMap.get(code).toString();
} else {
// do something
// str = "some string";
}

I would suggest throwing an exception when there is no element in the map corresponding to the key.
This exception can be caught somewhere in your application (depending on how exceptions are handled there). This kind of implementation makes it easy to display different types of error or warning messages to the user.
Music object = musicMap.get(code);
if (object != null) {
// do something
} else {
throw new NoCDFoundException("no.item.found");
}
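The null check can also be written with java.util.Optional (Java 8 or later). The sketch below is self-contained, so it uses a stripped-down Music class and a hypothetical describe helper in place of the classes from the question; the sample title is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Stripped-down stand-in for the Music class from the question.
final class Music {
    private final String title;
    Music(String title) { this.title = title; }
    @Override
    public String toString() { return "Music(" + title + ")"; }
}

final class MusicLookup {
    // Hypothetical helper: returns the item's text, or a fallback when the code is unknown.
    static String describe(Map<Integer, Music> musicMap, int code) {
        return Optional.ofNullable(musicMap.get(code))
                .map(Music::toString)
                .orElse("No item found.");
    }

    public static void main(String[] args) {
        Map<Integer, Music> musicMap = new HashMap<>();
        musicMap.put(1, new Music("Abbey Road"));
        System.out.println(describe(musicMap, 1)); // Music(Abbey Road)
        System.out.println(describe(musicMap, 2)); // No item found.
    }
}
```

This performs a single lookup, like the ternary version with a local variable, and keeps the fallback message in one place.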
