Apache Flink Python Table API UDF Dependencies Problem

After I submit a Python Table API job that uses user-defined functions (UDFs) to a local cluster, it crashes with a
py4j.protocol.Py4JJavaError caused by:
java.util.ServiceConfigurationError: org.apache.beam.sdk.options.PipelineOptionsRegistrar: org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype.
I am aware that this points to a problem with the dependencies on the lib path / classloading. I have already tried to follow all the instructions at the following link: https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/debugging_classloading.html
I have extensively tried different configurations of the classloader.parent-first-patterns-additional config option. Different entries for org.apache.beam.sdk.[...] have only led to different, additional error messages.
The following dependencies, which refer to apache beam, are on the lib path:
beam-model-fn-execution-2.20.jar
beam-model-job-management-2.20.jar
beam-model-pipeline-2.20.jar
beam-runners-core-construction-java-2.20.jar
beam-runners-java-fn-execution-2.20.jar
beam-sdks-java-core-2.20.jar
beam-sdks-java-fn-execution-2.20.jar
beam-vendor-grpc-1_21_0-0.1.jar
beam-vendor-grpc-1_26_0.0.3.jar
beam-vendor-guava-26_0-jre-0.1.jar
beam-vendor-sdks-java-extensions-protobuf-2.20.jar
I can also rule out my own code as the cause, since I have tested the following sample code from the project website: https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, DataTypes
from pyflink.table.descriptors import Schema, OldCsv, FileSystem
from pyflink.table.udf import udf
env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)
t_env = StreamTableEnvironment.create(env)
add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())
t_env.register_function("add", add)
t_env.connect(FileSystem().path('/tmp/input')) \
    .with_format(OldCsv()
                 .field('a', DataTypes.BIGINT())
                 .field('b', DataTypes.BIGINT())) \
    .with_schema(Schema()
                 .field('a', DataTypes.BIGINT())
                 .field('b', DataTypes.BIGINT())) \
    .create_temporary_table('mySource')
t_env.connect(FileSystem().path('/tmp/output')) \
    .with_format(OldCsv()
                 .field('sum', DataTypes.BIGINT())) \
    .with_schema(Schema()
                 .field('sum', DataTypes.BIGINT())) \
    .create_temporary_table('mySink')
t_env.from_path('mySource') \
    .select("add(a, b)") \
    .insert_into('mySink')
t_env.execute("tutorial_job")
When executing this code, the same error message appears.
Does anyone have a description of a Flink cluster configuration that can run Python Table API jobs with UDFs? Many thanks in advance for any tips!

The problem is solved by the new version 1.10.1 of Apache Flink. The sample script shown in the question can now be executed via the binaries with the command run -py path/to/script without any problems.
As for the dependencies, they are already included in the shipped flink-table_x.xx-1.10.1.jar, so no further dependencies need to be added to the lib path (which is what the debugging/configuration attempt in the question did).
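For reference, the submission against a local 1.10.1 cluster then looks roughly like this (the script path is illustrative):
$ ./bin/start-cluster.sh
$ ./bin/flink run -py /path/to/script.py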

Related

java.lang.NoClassDefFoundError when running read_stream_kafka using sparklyr library in R

I am trying to stream data from a Kafka producer to R for processing using sparklyr, but when I run the stream_read_kafka() function I get the error below:
read_options = list(kafka.bootstrap.servers = "localhost:9092", subscribe = "topic")
stream = stream_read_kafka(sc, options = read_options)
Error: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
I am using Docker to run the Kafka and ZooKeeper containers.
I also made sure that I am using Java 8, and I have already set SPARK_HOME, SCALA_HOME, and JAVA_HOME.
I also added the rkafka and rJava libraries, since I suspected the problem was that I had not included a Kafka library, but the error is still there.
Further information that may be useful for solving this:
My Kafka version is kafka_2.13-2.8.1, the Spark version is spark-2.8.4-bin-hadoop2.7, the Scala version is 2.12, and the R version is 4.1.1. Might the error be the result of a compatibility issue? If so, how do I solve it?
Please assist with this.
Many thanks
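For what it's worth, this particular NoClassDefFoundError usually indicates that the Spark–Kafka integration jars are not on Spark's classpath rather than a problem in the R code itself. With sparklyr they are typically pulled in through spark-submit's --packages mechanism via the connection config. A minimal sketch, assuming a local connection; the connector coordinates are placeholders and must match your Spark and Scala versions:
library(sparklyr)

# Ask spark-submit to fetch the Kafka SQL connector (and, transitively,
# kafka-clients) from Maven; the version/Scala suffix is a placeholder and
# must match the local Spark installation.
conf <- spark_config()
conf$sparklyr.shell.packages <- "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2"

sc <- spark_connect(master = "local", config = conf)

read_options <- list(kafka.bootstrap.servers = "localhost:9092", subscribe = "topic")
stream <- stream_read_kafka(sc, options = read_options)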

How can I export traces generated by the OpenTelemetry Java agent to Google Cloud Trace?

I've got a Spring Boot application that I'd like to automatically generate traces for using the OpenTelemetry Java agent, and subsequently upload those traces to Google Cloud Trace.
I've added the following code to the entry point of my application for sending traces:
OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(
                SimpleSpanProcessor.create(TraceExporter.createWithDefaultConfiguration())
            )
            .build()
    )
    .buildAndRegisterGlobal();
...and I'm running my application with the following JVM arguments:
-javaagent:path/to/opentelemetry-javaagent-all.jar \
-jar myapp.jar
...but I don't know how to connect the two.
Is there some agent configuration I can apply? Something like:
-Dotel.traces.exporter=google_cloud_trace
I ended up resolving this as follows:
1. Clone the GoogleCloudPlatform/opentelemetry-operations-java repo:
git clone git@github.com:GoogleCloudPlatform/opentelemetry-operations-java.git
2. Build the exporter-auto project:
./gradlew clean :exporter-auto:shadowJar
3. Copy the jar produced in exporter-auto/build/libs to my target project.
4. Run the application with the following arguments:
-javaagent:path/to/opentelemetry-javaagent-all.jar
-Dotel.javaagent.experimental.extensions=[artifact-from-step-3].jar
-Dotel.traces.exporter=google_cloud_trace
-Dotel.metrics.exporter=none
-jar myapp.jar
Note: This setup does not require any explicit code changes in the target code base.

heroku nodejs app - events.js:167 Error Unhandled 'error' event : spawn java ENOENT

I have a Node.js app on Heroku that uses the pdfMerge.js library.
Following the documentation, I'm using the stream event mechanism as a callback to identify the end of the process,
but then an exception is thrown:
events.js:167 Error: spawn java ENOENT
I'm almost sure this is happening because I'm missing the required Java installation, as described here:
pdfmerger combines multiple PDF-files into a single PDF-file. It is a node module that utilizes the Apache PDFBox Library, which the required functionality are distributed along with this module. The only requirement for this module to run, is having Java 6 or higher in the path.
I'm not familiar enough with the Heroku installation/configuration process to make this work.
Thanks in advance.
You can add Java to your app by adding the heroku/jvm buildpack like this:
$ heroku buildpacks:add -i 1 heroku/jvm
Then redeploy with git commit --allow-empty and git push heroku master.
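For example, the redeploy (plus a quick check that Java is now available in the dyno) could look like this; the commit message is just illustrative:
$ git commit --allow-empty -m "Add heroku/jvm buildpack"
$ git push heroku master
$ heroku run "java -version"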

Hadoop log4j cannot find KafkaLog4JAppender.class

I added KafkaLog4JAppender functionality to my MR job.
Locally the job runs and sends the formatted logs into my Kafka cluster.
When I try to run it on the YARN cluster, using:
jar [jar-name].jar [DriverClass].class [job-params] -Dlog4j.configuration=log4j.xml -libjars
I get the following exception:
log4j:ERROR Could not create an Appender. Reported error follows.
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender
The KafkaLog4jAppender class is on the path; running
jar tvf [my-jar].jar | grep KafkaLog4J
finds the class.
I'm somewhat lost and would appreciate any helpful input.
Thanks in advance!
If it works in local mode but not in YARN/distributed mode, then the jar is probably not being distributed properly. You might want to check "Using third-party jars and files in your MapReduce application (Distributed Cache)" for details on how to distribute the jar containing KafkaLog4jAppender.class.
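As a sketch of that approach, the submission could look roughly like this; the jar path and driver class are placeholders, exporting HADOOP_CLASSPATH makes the appender visible to the client-side JVM as well, and -libjars only takes effect when the driver parses generic options via ToolRunner/GenericOptionsParser:
export HADOOP_CLASSPATH=/path/to/jar-containing-KafkaLog4jAppender.jar
hadoop jar [jar-name].jar [DriverClass] \
  -libjars /path/to/jar-containing-KafkaLog4jAppender.jar \
  -Dlog4j.configuration=log4j.xml \
  [job-params]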

Java Stored Procedure using MQ

I need to create a Java stored procedure in Oracle. I have used IBM's sample class for creating an MQ message from a simple class outside of a Java EE environment. I have tested the class by itself and it works.
My Oracle version is 11i.
When I try to load the jars used in the simple application into Oracle along with my simple class, I get class-not-found errors, even though the same jars work in the test case. I have been stuck on this for over a week and am desperately hoping that someone can help me with it.
The kinds of errors I am getting look like the following (from the -v flag of the loadjava utility on the client).
On lines 326/327 of the output you see this:
creating : class com/ibm/mq/jms/admin/AP
loading : class com/ibm/mq/jms/admin/AP
And then at the end, starting from line 6224, it indicates that the above class could not be resolved:
com/ibm/mq/jms/admin/APRCXI: ORA-29534: referenced object xxxx.com/ibm/mq/jms/admin/AP could not be resolved
com/ibm/mq/jms/admin/APSDX: ORA-29534: referenced object xxxx.com/ibm/mq/jms/admin/AP could not be resolved
exiting : errors resolving class com/ibm/mq/jms/admin/AP
The command I used is:
c:\Oracle\product\11.2.0\client_1\bin\loadjava.bat -f -jarsasdbobjects -prependjarnames -stoponerror -u xxxx/yyyy@SID -v -resolve lib\jms.jar lib\com.ibm.mqjms.jar lib\com.ibm.mq.jmqi.jar lib\dhbcore.jar lib\fscontext.jar src\com\test\javasp\mq\JmsProducer.java
I also tried the -genmissing option with some additional jars (a list I found here), but I still get a similar error for a different class.
Another issue I am facing is that if I get an error and try to use Oracle's dropjava command, it doesn't work either.
I also saw from this link that this person was successful, but unfortunately they didn't indicate how they used loadjava to load the jars.
If I can provide any other information, please let me know.
If anyone has any idea how to get a Java stored procedure that uses IBM MQ working with Oracle 11i, I would really appreciate the help.
I found a detailed answer in this blog entry. I tried it and it worked for me.
In Oracle there is no concept of a CLASSPATH, so the standard MQ client install is useless. You can only load the jars referenced by your app into the database schema. Classes are resolved when they are loaded with the -resolve (-r) option, and you can additionally specify your own resolver spec covering other schemas with -resolver (check the Oracle docs for the exact format). So, in effect, the database schema becomes the classpath.
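For illustration, a resolve with an explicit resolver spec typically looks like the following loadjava invocation; the user, schema name, and jar are placeholders:
loadjava -user xxxx/yyyy@SID -resolve -resolver "((* XXXX) (* PUBLIC))" -v lib\com.ibm.mqjms.jar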
Using WebSphere MQ classes for Java poses a number of problems: you have to ensure that the Oracle JDK version is at an appropriate support level to connect to the chosen MQ server version. Check the system requirements for WebSphere MQ Vx.x in IBM's web references, in particular the support statement for the MQ classes for Java.
I have such an issue at the moment trying to connect to MQ using Oracle 10 and JDK 1.4.2; I had to recompile my Java code with JDK 1.4.x. This does not work, and I assume it is because I connect to MQ 7.0.1.7, which requires JRE 1.7 as a minimum.
