Create Apache Pulsar Sink on Kubernetes cluster with Java API

I am trying to create a ClickHouse sink connector with the Java API on a remote Pulsar cluster running on Kubernetes, but I am experiencing some difficulties with it.
My cluster runs on Pulsar 2.8.1:
pulsarAdmin.sinks().createSinkWithUrl(
    mySinkConfig,
    "https://archive.apache.org/dist/pulsar/pulsar-2.8.1/connectors/pulsar-io-jdbc-clickhouse-2.8.1.nar");
The API call returns successfully and seems to create the sink.
I can get its status and its configuration; however, the status reports a failure, and when checking the pods on Kubernetes I see the pod corresponding to the new sink crashing:
NAME READY STATUS RESTARTS AGE
pf-my-tenant-test-sink6-ab222ce8-0 0/1 CrashLoopBackOff 15 57m
with this in the Kubernetes logs:
null
Reason: java.io.IOException: No such file or directory
Here is the command used by Pulsar when creating the pod:
sh -c
/pulsar/bin/pulsar-admin
--auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken
--auth-params file:///etc/auth/token
--admin-url https://XXXX:8443/ functions download
--tenant my-tenant
--namespace test
--name sink6
--destination-file /pulsar/download/pulsar_functions/functions17103366778764930042.tmp && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && exec java -cp /pulsar/instances/java-instance.jar:/pulsar/instances/deps/*
-Dpulsar.functions.extra.dependencies.dir=/pulsar/instances/deps -Dpulsar.functions.instance.classpath=/pulsar/conf:::/pulsar/lib/*:
-Dlog4j.configurationFile=kubernetes_instance_log4j2.xml -Dpulsar.function.log.dir=logs/functions/my-tenant/test/sink6
-Dpulsar.function.log.file=sink6-$SHARD_ID -Xmx1073741824 org.apache.pulsar.functions.instance.JavaInstanceMain
--jar /pulsar/download/pulsar_functions/functions17103366778764930042.tmp --instance_id $SHARD_ID --function_id 16d9bcab-abcd-2f4b-b536-d3fb5d1232ab
--function_version 8807b42e-b1fc-4495-862e-21fe27085eb7
--function_details '{"tenant":"my-tenant","namespace":"test","name":"sink6","className":"org.apache.pulsar.functions.api.utils.IdentityFunction","autoAck":true,"parallelism":1,"source":{"typeClassName":"org.apache.pulsar.client.api.schema.GenericRecord","inputSpecs":{"my-topic":{}},"cleanupSubscription":true},"sink":{"className":"org.apache.pulsar.io.jdbc.ClickHouseJdbcAutoSchemaSink","configs":"{\"userName\":\"XXXXX\",\"password\":\"XXXX\",\"jdbcUrl\":\"jdbc:clickhouse://XXXXXX\",\"tableName\":\"XXXXXXX\"}","typeClassName":"org.apache.pulsar.client.api.schema.GenericRecord"},"resources":{"cpu":1.0,"ram":"1234","disk":"1234"},"componentType":"SINK"}'
--pulsar_serviceurl pulsar+ssl://XXXX:6651/
--client_auth_plugin org.apache.pulsar.client.impl.auth.AuthenticationToken
--client_auth_params file:///etc/auth/token
--use_tls false
--tls_allow_insecure false
--hostname_verification_enabled false
--max_buffered_tuples 1024
--port 9093
--metrics_port 39809
--pending_async_requests 1000
--expected_healthcheck_interval -1
--secrets_provider org.apache.pulsar.functions.secretsprovider.ClearTextSecretsProvider
--cluster_name pulsar-ofgebi
--nar_extraction_directory /tmp
Does anyone have any idea why creating a sink with the Java API could result in such an error?

Related

Jenkins kubernetes plugin: provided port:50000 is not reachable

I have set up Jenkins on GKE using the official Helm chart.
I have also created an nginx-ingress controller installation using Helm, and I am able to access Jenkins via https://112.222.111.22/jenkins, where 112.222.111.22 is the static IP I am passing to the load balancer.
I am also able to create jobs.
However, when I try to spin up an inbound remote agent:
▶ java -jar agent.jar -noCertificateCheck -jnlpUrl https://112.222.111.22/jenkins/computer/My%20Builder%203/slave-agent.jnlp -secret <some_secret>
...
WARNING: Connect timed out
Feb 28, 2020 5:57:18 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: https://112.222.111.22/jenkins/ provided port:50000 is not reachable
java.io.IOException: https://112.222.111.22/jenkins/ provided port:50000 is not reachable
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:303)
at hudson.remoting.Engine.innerRun(Engine.java:527)
at hudson.remoting.Engine.run(Engine.java:488)
Why is that?
I had a similar issue. I solved it by enabling "Use WebSocket": Jenkins Slave/Agent > Configure > Launch Method > Use WebSocket (enable) > Save.
This might be because port 50000 is not open on the Jenkins master.
You can try to open the port by creating an inbound rule in the Jenkins master's firewall that allows port 50000 (the connection comes from another machine, in this case the agent, hence an inbound rule).
Try the following steps:
Go to the master Jenkins machine.
Open Windows Defender Firewall --> Advanced settings --> create an inbound rule allowing port 50000.
Note: a single inbound rule can allow more than one port.
Restart the master Jenkins service.
Check that the agent service is running continuously (this step only applies if you created a Windows service on the agent).
Check the agent status in Jenkins!
It looks like port 50000 is not open on the Jenkins master. Try opening the port, then restart the machine and the instance (i.e. Jenkins via its URL) and see if that helps.
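To confirm whether the agent machine can actually reach the master's inbound (JNLP) port before touching firewall rules, a quick TCP connection test from the agent side helps. Here is a minimal Java sketch; the host is a placeholder and 50000 is the port from the error above.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) {
        String host = "112.222.111.22"; // placeholder: the Jenkins master host
        int port = 50000;               // the inbound agent (JNLP) port

        try (Socket socket = new Socket()) {
            // Attempt a plain TCP connection with a 5-second timeout.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("Port " + port + " on " + host + " is reachable");
        } catch (IOException e) {
            System.out.println("Port " + port + " on " + host + " is NOT reachable: " + e);
        }
    }
}

If this fails while the Jenkins UI is reachable, the problem is the port exposure (firewall or service), not the agent configuration.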

How to reference local Kafka and Zookeeper config on Spring Cloud Dataflow "Cloudfoundry" server start

Here is what I have successfully done so far on the SCDF local server.
I have successfully deployed the SCDF server on my local machine and have also used Kafka and Zookeeper config parameters with it, i.e.:
mymac$ java -jar spring-cloud-dataflow-server-local-1.3.0.RELEASE.jar
--spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers=localhost:9092
--spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes=localhost:2181
I was able to create my streams:
ingest = producer-app > :broker1
filter = :broker1 > filter-app > :broker2
Now I need help to do the exact same thing on PCF Dev.
I have PCF Dev running.
I have to deploy the SCDF Cloud Foundry jar with my local Kafka and Zookeeper parameters to PCF Dev, but when I do the following steps it gives me an error.
1.1) cf push -f manifest-scdf.yml --no-start -p /XXX/XXX/XXX/spring-cloud-dataflow-server-cloudfoundry-1.3.0.BUILD-SNAPSHOT.jar -k 1500M
This runs fine, no problem. But step 1.2:
1.2) cf start dataflow-server --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers=host.pcfdev.io:9092 --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes=host.pcfdev.io:2181
gives me this error:
Incorrect Usage: unknown flag `spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers'
Below is my manifest-scdf.yml file:
---
instances: 1
memory: 2048M
applications:
- name: dataflow-server
  host: dataflow-server
  services:
    - redis
    - rabbit
  env:
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: admin
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: admin
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: true
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES: rabbit
    MAVEN_REMOTE_REPOSITORIES_REPO1_URL: https://repo.spring.io/libs-snapshot
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DISK: 512
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK: java_buildpack
    spring.cloud.deployer.cloudfoundry.stream.memory: 400
    spring.cloud.dataflow.features.tasks-enabled: true
    spring.cloud.dataflow.features.streams-enabled: true
Please help me. Thank you.
There are a few options for supplying Kafka credentials to stream apps in PCF.
1. Kafka CUPs
This option allows you to create CUPs for an external Kafka service. While deploying the stream, you can then supply the coordinates to each application individually, as described in the docs, or you can supply them as global properties for all the stream apps deployed by the SCDF server.
2. Inline properties
Instead of extracting them from CUPs, you can also supply the HOST/PORT directly while deploying the stream. Again, this can be applied globally, too.
stream deploy myTest --properties "app.*.spring.cloud.stream.kafka.binder.brokers=<HOST>:9092,app.*.spring.cloud.stream.kafka.binder.zkNodes=<HOST>:2181"
Note: the HOST must be reachable from the stream apps; otherwise, they will keep trying to connect to localhost and will likely fail, since the apps are running inside a VM.
The error you're seeing is coming from the CF CLI: it's interpreting those (I'm assuming environment) variables you're providing as flags to the cf start command and failing.
You could either provide them in your manifest.yml or set their values manually using the CLI's cf set-env command by doing something like this:
cf set-env dataflow-server spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers host.pcfdev.io:9092
cf set-env dataflow-server spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes host.pcfdev.io:2181
After you've set them they should be picked up when you run cf start dataflow-server.
Relevant CLI docs:
http://cli.cloudfoundry.org/en-US/cf/set-env.html

Not able to connect to spark cluster via sparklyr package when my custom package method is invoked via OpenCPU

I have created an R package that makes use of the sparklyr capabilities within a dummy hello function. My package does something very simple: it connects to a Spark cluster, prints the Spark version, and disconnects. The package cleans and builds successfully and runs successfully from R and RStudio.
# Connect to Spark cluster
spark_conn <- sparklyr::spark_connect(master = "spark://elenipc.home:7077", spark_home = '/home/eleni/spark-2.2.0-bin-hadoop2.7/')
# Print the version of Spark
sv<- sparklyr::spark_version(spark_conn)
print(sv)
# Disconnect from Spark
sparklyr::spark_disconnect(spark_conn)
It is very important for me to be able to execute the hello function from the OpenCPU REST API. (I have used the OpenCPU API for executing many other custom-created packages.)
When invoking the OpenCPU API like:
curl http://localhost/ocpu/user/rstudio/library/myFirstBigDataPackage/R/hello/print -X POST
I get the following response:
Failed while connecting to sparklyr to port (8880) for sessionid (89615): Gateway in port (8880) did not respond.
Path: /home/eleni/spark-2.2.0-bin-hadoop2.7/bin/spark-submit
Parameters: --class, sparklyr.Shell, '/home/rstudio/R/x86_64-pc-linux-gnu-library/3.4/sparklyr/java/sparklyr-2.2-2.11.jar', 8880, 89615
Log: /tmp/ocpu-temp/file26b165c92166_spark.log
---- Output Log ----
Error occurred during initialization of VM
Could not allocate metaspace: 1073741824 bytes
---- Error Log ----
In call:
force(code)
Of course, allocating more memory to both Java and the Spark executor does not resolve the issue. Permission issues are also ruled out, as I already configured the /etc/apparmor.d/opencpu.d/custom file to give OpenCPU rwx privileges on Spark. It seems to be a connectivity issue that I do not know how to tackle. During method invocation via the OpenCPU API, the Spark logs do not even print anything.
For your info, my environment configuration is as follows:
java version "1.8.0_65"
R version 3.4.1
RStudio version 1.0.153
spark-2.2.0-bin-hadoop2.7
opencpu 1.5 (compatible with my Ubuntu 14.04.3 LTS)
Thank you very much for your support and time!

Exception getting zoo instance in geomesa accumulo data store

Thanks for your attention. I am setting up an Accumulo data store using GeoMesa and Zookeeper; I have completed the setup configuration changes and installed the required components such as Accumulo, Java, and Maven.
When I create a new feature using the command-line interface with the command
geomesa create-schema -u root -p ****** \
-c device_ping \
-f feature \
-s uuid:String:index=true,dtg:Date,geom:Point:srid=4326 \
--dtg dtg
It fails giving
Exception getting zoo instance and terminates throwing the error "Unable to create data store, please check your connection parameters."
I am unable to find a solution to this problem and don't know which configuration parameters are wrong. The details are in the attached screenshot.
GeoMesa tries to figure out the Zookeepers from a local copy of the Accumulo cluster's configuration. That configuration is likely in $ACCUMULO_HOME.
You can manually set the Zookeepers with -z host1,host2,host3. If the hosts are correct (or you set them manually), you might check that Zookeeper is running and can be accessed from your laptop.
To double check Zookeeper, you can do something like...
echo ruok | nc hostName portNumber
If Zookeeper is running, you'll receive an 'imok' message back.
Lastly, if Zookeeper is up and running, but just slow for some reason, you can increase the Zookeeper timeout by setting the Java system property "instance.zookeeper.timeout" higher. The timeout is currently set to 5 seconds.
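If the data store is created from Java code rather than the CLI, one rough way to raise that timeout is to set the system property before connecting. This is only an illustrative sketch: the "10s" value and its duration format are assumptions to be checked against your Accumulo/GeoMesa version, and when using the command-line tools the same property would typically be passed as a -D JVM option instead.

public class ZookeeperTimeoutExample {
    public static void main(String[] args) {
        // Illustrative only: raise the Zookeeper timeout mentioned above from its
        // 5-second default. The property name comes from the answer above; the
        // "10s" duration format is an assumption and may need adjusting.
        System.setProperty("instance.zookeeper.timeout", "10s");

        // ... create the GeoMesa Accumulo data store as usual after this point ...
    }
}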

Launch Spark master on Windows 7

Using Win7 64-bit, JDK 8, Spark 1.6.2.
I have Spark running, winutils, HADOOP_HOME, etc.
Per the documentation: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand." But it does not say how.
How do I launch the Spark master on Windows?
I tried running sh start-master.sh through Git Bash: failed to launch org.apache.spark.deploy.master.Master, even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080, so I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar,
I get errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:
4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
I also tried spark://localhost:7077, with the same errors.
On Windows you can launch the master using the command below. Open a command prompt, go to the Spark bin folder, and execute:
spark-class.cmd org.apache.spark.deploy.master.Master
The above command will print something like Master: Starting Spark master at spark://192.168.99.1:7077 in the console, using the IP of your machine. You can check the UI at http://192.168.99.1:8080/
If you want to launch a worker once your master is up, you can use the command below. It will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to use only 2 of your machine's 4 cores, then use:
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077
