How to delete a topic configuration via KafkaAdminClient - java

I want to delete a configuration for a topic (i.e., reset it to its default) that was overridden earlier. This is possible with the provided script:
$> ./kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics \
--entity-name test --delete-config my.overridden.config
Is there a way to do this with the KafkaAdminClient provided in kafka-clients-1.1.1.jar?
I just found the method org.apache.kafka.clients.admin.KafkaAdminClient.alterConfigs(Map<ConfigResource, Config>, AlterConfigsOptions), but when I call it with a configuration value set to null, I get a NullPointerException on the server:
[2018-07-31 11:24:01,658] ERROR [Admin Manager on Broker 0]: Error processing alter configs request for resource Resource(type=TOPIC, name='test'}, config org.apache.kafka.common.requests.AlterConfigsRequest$Config#5d4fef59 (kafka.server.AdminManager)
java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:459)
at java.util.Properties.setProperty(Properties.java:166)
at kafka.server.AdminManager$$anonfun$alterConfigs$1$$anonfun$apply$18.apply(AdminManager.scala:357)
at kafka.server.AdminManager$$anonfun$alterConfigs$1$$anonfun$apply$18.apply(AdminManager.scala:356)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.server.AdminManager$$anonfun$alterConfigs$1.apply(AdminManager.scala:356)
at kafka.server.AdminManager$$anonfun$alterConfigs$1.apply(AdminManager.scala:339)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at kafka.server.AdminManager.alterConfigs(AdminManager.scala:339)
at kafka.server.KafkaApis.handleAlterConfigsRequest(KafkaApis.scala:1987)
at kafka.server.KafkaApis.handle(KafkaApis.scala:136)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
at java.lang.Thread.run(Thread.java:745)
Passing an empty config won't work either.
I am using Kafka 2.11-1.1.0.

Many of the functions provided in the Kafka admin jars are APIs rather than direct equivalents of the shell scripts. You can change a topic's configuration using the ZooKeeper-based AdminUtils in a Java program as shown below. Send an empty Properties object to the function to clear out the existing overrides.
import java.util.Properties;

import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.ZkConnection;

import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;

public static void changeConfig(String topic) {
    // Connect to ZooKeeper with the serializer the Kafka tools use
    ZkClient zkClient = new ZkClient("your_zkHost", 5000, 5000, ZKStringSerializer$.MODULE$);
    ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection("your_zkHost"), false);
    // The overrides to apply to the topic
    Properties prop = new Properties();
    prop.setProperty("retention.ms", "3600000");
    AdminUtils.changeTopicConfig(zkUtils, topic, prop);
}
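As noted above, passing an empty Properties object clears the existing overrides, which is exactly the reset-to-default operation the question asks about. A minimal sketch of that call, reusing the zkUtils handle from the method above and the topic name "test" from the question:

// Remove all overridden configs for topic "test" so the broker defaults apply again
AdminUtils.changeTopicConfig(zkUtils, "test", new Properties());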
If you need this often, you can add a file reader to pick up new configuration values and package everything into a jar for easy execution.
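As a side note, and as an assumption beyond the 1.1.x versions used in the question: kafka-clients 2.3.0 and later (with a matching broker) expose AdminClient.incrementalAlterConfigs, which supports an explicit DELETE operation for a single override. A minimal sketch under that assumption, with the broker address as a placeholder:

import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DeleteTopicConfigOverride {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "test");
            // DELETE removes the override so the broker default applies again
            Collection<AlterConfigOp> ops = Collections.singletonList(
                    new AlterConfigOp(new ConfigEntry("my.overridden.config", ""), AlterConfigOp.OpType.DELETE));
            admin.incrementalAlterConfigs(Collections.singletonMap(topic, ops)).all().get();
        }
    }
}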

Related

Error when using jar but not in the library itself

I built a Java library to send messages to IBM MQ.
It works fine when I execute the code in the library project itself.
However, when I use the .jar in another tool (JMeter), an error occurs.
java.lang.NoSuchMethodError: com.ibm.mq.jmqi.JmqiFactory.getInstance(Lcom/ibm/mq/jmqi/JmqiThreadPoolFactory;Lcom/ibm/mq/jmqi/JmqiPropertyHandler;)Lcom/ibm/mq/jmqi/JmqiEnvironment;
at com.ibm.msg.client.mqlight.MQLightComponent.getImplementationInfo(MQLightComponent.java:220) ~[mq-jms-8.0.0.3.jar:8.0.0.3 - p800-003-150615.2]
at com.ibm.msg.client.commonservices.trace.Trace.getVersion(Trace.java:1692) ~[mq-jms-7.0.1.3.jar:?]
at com.ibm.msg.client.commonservices.trace.Trace.createFFSTString(Trace.java:1650) ~[mq-jms-7.0.1.3.jar:?]
at com.ibm.msg.client.commonservices.trace.Trace.ffstInternal(Trace.java:1536) ~[mq-jms-7.0.1.3.jar:?]
at com.ibm.msg.client.commonservices.trace.Trace.ffst(Trace.java:1444) ~[mq-jms-7.0.1.3.jar:?]
at com.ibm.msg.client.jms.JmsFactoryFactory.getInstance(JmsFactoryFactory.java:209) ~[mq-jms-7.0.1.3.jar:7.0.1.3 - k701-103-100812]
at com.ibm.mq.jms.MQConnectionFactory.initialiseMQConnectionFactory(MQConnectionFactory.java:3325) ~[mq-jms-7.0.1.3.jar:7.0.1.3 - k701-103-100812]
at com.ibm.mq.jms.MQConnectionFactory.<init>(MQConnectionFactory.java:274) ~[mq-jms-7.0.1.3.jar:7.0.1.3 - k701-103-100812]
at my.package.MQ_Manager.createConnection(MQ_Manager.java:36) ~[my-jar.jar:?]
at my.package.MQ_Manager.<init>(MQ_Manager.java:27) ~[my-jar.jar:?]
at my.package.Producer.<init>(Producer.java:18) ~[my-jar.jar:?]
at my.package.Request.sendRequest(Request.java:116) ~[my-jar.jar:?]
at my.package.Request$sendRequest$2.call(Unknown Source) ~[?:?]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-all-2.4.16.jar:2.4.16]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116) ~[groovy-all-2.4.16.jar:2.4.16]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120) ~[groovy-all-2.4.16.jar:2.4.16]
at Script1.run(Script1.groovy:14) ~[?:?]
at org.codehaus.groovy.jsr223.GroovyScriptEngineImpl.eval(GroovyScriptEngineImpl.java:321) ~[groovy-all-2.4.16.jar:2.4.16]
at org.codehaus.groovy.jsr223.GroovyCompiledScript.eval(GroovyCompiledScript.java:72) ~[groovy-all-2.4.16.jar:2.4.16]
at javax.script.CompiledScript.eval(Unknown Source) ~[?:1.8.0_31]
at org.apache.jmeter.util.JSR223TestElement.processFileOrScript(JSR223TestElement.java:223) ~[ApacheJMeter_core.jar:5.1 r1853635]
at org.apache.jmeter.protocol.java.sampler.JSR223Sampler.sample(JSR223Sampler.java:71) ~[ApacheJMeter_java.jar:5.1 r1853635]
at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:622) ~[ApacheJMeter_core.jar:5.1 r1853635]
at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:546) ~[ApacheJMeter_core.jar:5.1 r1853635]
at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:486) ~[ApacheJMeter_core.jar:5.1 r1853635]
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:253) ~[ApacheJMeter_core.jar:5.1 r1853635]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_31]
Corresponding code:
MQConnectionFactory factory = new MQConnectionFactory();
factory.setHostName(properties.getProperty("HOST"));
factory.setPort(Integer.parseInt(properties.getProperty("PORT")));
factory.setChannel(properties.getProperty("CHANNEL"));
factory.setTransportType(WMQConstants.WMQ_CM_CLIENT);
factory.setQueueManager(properties.getProperty("QUEUE_MANAGER"));
factory.setAppName(properties.getProperty("APP_NAME"));
Connection connection = factory.createConnection(properties.getProperty("APP_USER"), properties.getProperty("APP_PASSWORD"));
connection.start();
return connection;
The error occurs at this line: MQConnectionFactory factory = new MQConnectionFactory();
Any idea? Thank you.
Update 1
To create the .jar I:
Clicked on Export
Selected Runnable JAR file
Selected Extract required libraries into generated JAR
Update 2
Moreover, when I built the jar, I got this warning. Do you think it is important?
This operation repackages referenced libraries. Please review the licenses associated with the libraries you wish to reference to make sure you are able to repackage them using this application. Note also that this operation does not copy signature files from the original libraries to the generated JAR file.
After discussing the issue with the OP, I went ahead and verified it myself.
The result is that it is working for me...
Step 1:
Created a Maven project that includes the following code:
package test;

import javax.jms.JMSException;
import com.ibm.mq.jms.MQConnectionFactory;

public class JmeterTest {

    public JmeterTest() {
    }

    public void test() throws JMSException {
        MQConnectionFactory factory = new MQConnectionFactory();
        factory.setAppName("myApp");
    }

    public static void main(String[] args) {
        System.out.println("test");
    }
}
Step 2:
Exported this from Eclipse as a runnable jar and copied it into JMeter (\lib\ext\).
Note that an export with the library handling option "Package required libraries into generated JAR" does not work. Use "Extract required libraries into generated JAR" or "Copy required libraries into a sub-folder" (and then copy the jars from the subfolder into \lib\ext as well).
The related dependencies are:
com.ibm.mq.allclient-9.0.4.0.jar
bcpkix-jdk15on-1.57.jar
bcprov-jdk15on-1.57.jar
javax.jms-api-2.0.1.jar
Step 3:
Started JMeter and created a ThreadGroup with a JSR223 Sampler.
import test.JmeterTest;
new JmeterTest().test();
Then started the test. No error occurred.
Step 4:
Instead of exporting a library, you could (after adding the dependencies) put the required code directly into the script panel:
import javax.jms.JMSException;
import com.ibm.mq.jms.MQConnectionFactory;
MQConnectionFactory factory = new MQConnectionFactory();
factory.setAppName("myApp");
This worked as well.
Conclusion: Interference from other dependencies on the JMeter classpath is the likeliest cause of the issue; note, for example, that the stack trace above mixes mq-jms-8.0.0.3.jar and mq-jms-7.0.1.3.jar.
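A quick way to check for that kind of interference is to print which jar the class was actually loaded from inside the failing environment. This is a hypothetical diagnostic sketch, not part of the original setup:

import com.ibm.mq.jms.MQConnectionFactory;

public class ClasspathCheck {
    public static void main(String[] args) {
        // Prints the jar MQConnectionFactory was loaded from; if it is an older
        // mq-jms jar than the one you built against, you have found the clash.
        System.out.println(MQConnectionFactory.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}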

Java trying to connect to HDFS throws HADOOP_HOME unset, could not find winutils

I'm trying to prototype an app to use Hadoop as a datastore and I'm falling over at the first hurdle. I've got access to a Hadoop cluster and I purloined a test sample from Spring to try out the first baby step:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.junit.jupiter.api.Test;

import java.io.PrintWriter;
import java.net.URI;

public class HdfsTest {

    @Test
    public void testHdfs() throws Exception {
        System.setProperty("HADOOP_USER_NAME", "adam");
        // Path that we need to create in HDFS.
        // Just like Unix/Linux file systems, the HDFS file system starts with "/"
        final Path path = new Path("/usr/adam/junk.txt");
        // Uses try-with-resources in order to avoid close calls on resources.
        // Creates an anonymous subclass of DistributedFileSystem to allow calling
        // initialize, as the DFS will not be usable otherwise.
        try (
            final DistributedFileSystem dFS = new DistributedFileSystem() {
                {
                    initialize(new URI("hdfs://hanameservice/user/adam"), new Configuration());
                }
            };
            // Gets an output stream for the path using the DFS instance
            final FSDataOutputStream streamWriter = dFS.create(path);
            // Wraps the output stream in a PrintWriter to use higher-level methods
            final PrintWriter writer = new PrintWriter(streamWriter)
        ) {
            writer.println("bungalow bill");
            writer.println("what did you kill");
            System.out.println("File Written to HDFS successfully!");
        }
    }
}
These are the Hadoop libraries I'm using:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.8.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.8.1</version>
</dependency>
Could I be missing a dependency?
This is the logging with the errors - it seems there are 2 separate errors though.
2017-06-23 16:01:38.787 WARN --- [ main] org.apache.hadoop.util.Shell : Did not find winutils.exe: {}
java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:528)
at org.apache.hadoop.util.Shell.getHadoopHomeDir(Shell.java:549)
at org.apache.hadoop.util.Shell.getQualifiedBin(Shell.java:572)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:669)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1445)
at org.apache.hadoop.fs.FileSystem.initialize(FileSystem.java:221)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:145)
at com.bp.gis.tardis.HdfsTest$1.<init>(HdfsTest.java:34)
at com.bp.gis.tardis.HdfsTest.testHdfs(HdfsTest.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:316)
at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:114)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.lambda$invokeTestMethod$6(MethodTestDescriptor.java:171)
at org.junit.jupiter.engine.execution.ThrowableCollector.execute(ThrowableCollector.java:40)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.invokeTestMethod(MethodTestDescriptor.java:168)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.execute(MethodTestDescriptor.java:115)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.execute(MethodTestDescriptor.java:57)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:81)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:91)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:91)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:51)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:43)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:137)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
at org.junit.platform.launcher.Launcher.execute(Launcher.java:93)
at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:61)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:448)
at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:419)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:496)
... 35 common frames omitted
2017-06-23 16:01:39.449 WARN --- [ main] org.apache.hadoop.util.NativeCodeLoader : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
java.lang.IllegalArgumentException: java.net.UnknownHostException: hanameservice
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:130)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:343)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:156)
at com.bp.gis.tardis.HdfsTest$1.<init>(HdfsTest.java:34)
at com.bp.gis.tardis.HdfsTest.testHdfs(HdfsTest.java:31)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:316)
at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:114)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.lambda$invokeTestMethod$6(MethodTestDescriptor.java:171)
at org.junit.jupiter.engine.execution.ThrowableCollector.execute(ThrowableCollector.java:40)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.invokeTestMethod(MethodTestDescriptor.java:168)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.execute(MethodTestDescriptor.java:115)
at org.junit.jupiter.engine.descriptor.MethodTestDescriptor.execute(MethodTestDescriptor.java:57)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:81)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:91)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.lambda$execute$1(HierarchicalTestExecutor.java:91)
at org.junit.platform.engine.support.hierarchical.SingleTestExecutor.executeSafely(SingleTestExecutor.java:66)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:76)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:51)
at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:43)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:137)
at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
at org.junit.platform.launcher.Launcher.execute(Launcher.java:93)
at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:61)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.net.UnknownHostException: hanameservice
... 36 more
How do I sort this out? My contact for the Hadoop cluster I'm trying to connect to is not familiar with the hdfs: protocol, and their frame of reference seems to be entirely manual rather than programmatic. They want me to log in to an edge node and run scripts there in a shell. I feel I should be asking them particular questions, but I'm not sure what.
There are two distinct problems:
It appears you are running from a Windows host. On Windows, Hadoop requires native code extensions so that it can integrate with the OS correctly for things like file access semantics and permissions. Notice that the exception message contains a link to an Apache Hadoop wiki page: WindowsProblems. That page contains information on how to handle this.
There is a failure to establish a socket connection to the host "hanameservice". This is most likely not a real host name, but rather a logical name used for HDFS High Availability. Internally, the HDFS client code maps this logical name to one of two real NameNode host names, but only if the configuration is complete. You likely do not have a complete set of the configuration files (core-site.xml and hdfs-site.xml) from the cluster. You would need the complete configuration on your local system for this to work.
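As a rough sketch of the second point (the paths below are placeholders for wherever you copy the cluster's client configuration), you can load core-site.xml and hdfs-site.xml into the Configuration you hand to the filesystem instead of relying on an empty default Configuration:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConfigSketch {
    public static FileSystem connect() throws Exception {
        // Load the cluster's client configuration so the HA logical name
        // "hanameservice" can be resolved to the real NameNode hosts.
        Configuration conf = new Configuration();
        conf.addResource(new Path("C:/hadoop-conf/core-site.xml")); // placeholder path
        conf.addResource(new Path("C:/hadoop-conf/hdfs-site.xml")); // placeholder path
        // FileSystem.get returns the right implementation for the hdfs:// scheme
        return FileSystem.get(new URI("hdfs://hanameservice"), conf);
    }
}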
They want me to login to an edge node and run scripts there in a shell.
Overall, this may be the shortest path for you, rather than trying to work through the Windows integration and configuration. If you wrap your code in the Hadoop Tool interface, build it as a jar, and copy that jar to the edge node, you'll be able to run it as hadoop jar your-app.jar. You'll be running inside a known working environment, with no need to sort out the native code extensions and no need to worry about whether the configuration is complete and up to date with the cluster.
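A minimal sketch of that Tool wrapper (the class name and file path are made up for illustration; when launched via hadoop jar on the edge node, getConf() is already populated with the cluster configuration):

import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriteTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // The default filesystem and HA settings come from the cluster configuration
        FileSystem fs = FileSystem.get(getConf());
        try (FSDataOutputStream out = fs.create(new Path("/user/adam/junk.txt"));
             PrintWriter writer = new PrintWriter(out)) {
            writer.println("bungalow bill");
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options (-conf, -D, ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new HdfsWriteTool(), args));
    }
}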

Spark ML - failing to load model using MatrixFactorizationModel

I am trying to implement a recommender system using Spark collaborative filtering.
First I prepare the model and save it to disk:
MatrixFactorizationModel model = trainModel(inputDataRdd);
model.save(jsc.sc(), "/op/tc/model/");
When I load the model from a separate process, the program fails with the exception below:
Code:
static JavaSparkContext jsc;
private static Options options;

static {
    SparkConf conf = new SparkConf().setAppName("TC recommender application");
    conf.set("spark.driver.allowMultipleContexts", "true");
    jsc = new JavaSparkContext(conf);
}

MatrixFactorizationModel model = MatrixFactorizationModel.load(jsc.sc(), "/op/tc/model/");
Exception:
Exception in thread "main" java.io.IOException: Not a file:
maprfs:/op/tc/model/data
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:324)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
at org.apache.spark.rdd.RDD$$anonfun$aggregate$1.apply(RDD.scala:1114)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.aggregate(RDD.scala:1107)
at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.countApproxDistinctUserProduct(MatrixFactorizationModel.scala:96)
at org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict(MatrixFactorizationModel.scala:126)
at com.aexp.cxp.recommendation.ProductRecommendationIndividual.main(ProductRecommendationIndividual.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Is there any configuration I need to set to load the model? Any suggestion would be a great help.
In Spark, as in any other distributed computing framework, it is important to understand where the code runs when you are trying to debug it. It is also important to have access to the various types of logs. For example, in YARN, you would have:
the master logs, if you record them yourself
the aggregated slave logs (thanks YARN, useful feature!)
the YARN node manager logs (which will, for example, tell you why a container was killed, etc.)
etc.
Digging into Spark issues can be quite time consuming if you don't look in the right place from the start. Now, more specifically on this question, you have a clear stack trace, which is not always the case, so you should use it to your advantage.
The top of the stack trace is:
Exception in thread "main" java.io.IOException: Not a file: maprfs:/op/tc/model/data
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:324)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
...
As you can see, the Spark job was executing a map operation when it failed. Who executes a map? The slaves; therefore you have to make sure your file is available to all slaves, not only to the master.
More generally, you always need to make a clear distinction in your head between the code you are writing for the master and the code you are writing for the slaves. This will help you detect this kind of interaction, as well as references to non-serializable objects and similar common mistakes.
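If you want to verify that directly, a hypothetical check (a fragment in the same style as the question's code, using the jsc JavaSparkContext from the question and the path from the stack trace) is to ask the cluster filesystem whether the saved model directory is really there before calling load:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Confirm that the directory written by model.save() is visible on the cluster
// filesystem (maprfs/HDFS), not just on the driver's local disk.
FileSystem fs = FileSystem.get(jsc.hadoopConfiguration());
System.out.println("model data exists: " + fs.exists(new Path("/op/tc/model/data")));
System.out.println("model metadata exists: " + fs.exists(new Path("/op/tc/model/metadata")));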

kafka -> storm -> flink : unexpected block data

I'm moving a topology from Storm to Flink. The topology has been reduced to KafkaSpout -> Bolt. The bolt just counts packets and does not try to decode them.
The compiled .jar is submitted to Flink via flink run -c <entry point> <path to .jar> and hits the following error:
java.lang.Exception: Call to registerInputOutput() of invokable failed
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:529)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.
at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:190)
at org.apache.flink.streaming.runtime.tasks.StreamTask.registerInputOutput(StreamTask.java:174)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:526)
... 1 more
Caused by: java.io.StreamCorruptedException: unexpected block data
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1365)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:294)
at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:255)
at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:175)
... 3 more
My question(s):
Did I miss a configuration step with regard to the KafkaSpout? This was working in vanilla Storm.
Are there specific versions of the Storm libraries that I need to use? I'm including 0.9.4 in my build.
Something else that I might have missed?
Should I be using the Storm KafkaSpout, or would I be better served by writing my own using the Flink KafkaSource?
EDIT:
Here are the relevant pieces of code:
Topology:
BrokerHosts brokerHosts = new ZkHosts(configuration.getString("kafka/zookeeper"));
SpoutConfig kafkaConfig = new SpoutConfig(brokerHosts, configuration.getString("kafka/topic"), "/storm_env_values", "storm_env_DEBUG");
FlinkTopologyBuilder builder = new FlinkTopologyBuilder();
builder.setSpout("environment", new KafkaSpout(kafkaConfig), 1);
builder.setBolt("decode_bytes", new EnvironmentBolt(), 1).shuffleGrouping("environment");
Init:
FlinkLocalCluster cluster = new FlinkLocalCluster(); // replaces: LocalCluster cluster = new LocalCluster();
cluster.submitTopology("env_topology", conf, buildTopology());
The bolt is based on BaseRichBolt. The execute() function just logs the presence of any packet for debugging. There is no other code in there.
I just had a look at this. There is one issue right now, but I got it working locally. You can apply this hot fix to your code and build the compatibility layer yourself.
KafkaSpout registers metrics. However, metrics are currently not supported by the compatibility layer. You need to remove the exception in FlinkTopologyContext.registerMetric(...) and just return null. (There is already an open PR that works on the integration of metrics, so I don't want to push this hot fix into the master branch.)
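A sketch of what that hot fix could look like inside FlinkTopologyContext; the exact overloads to patch depend on the Storm and Flink versions you build against:

// In FlinkTopologyContext: instead of throwing UnsupportedOperationException,
// silently ignore metric registrations so that KafkaSpout can start.
@Override
public <T extends IMetric> T registerMetric(String name, T metric, int timeBucketSizeInSecs) {
    return null; // metrics are dropped; fine for the counting bolt in this topology
}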
Furthermore, you need to add some configuration parameters manually:
I just made up some values here:
Config c = new Config();
List<String> zkServers = new ArrayList<String>();
zkServers.add("localhost");
c.put(Config.STORM_ZOOKEEPER_SERVERS, zkServers);
c.put(Config.STORM_ZOOKEEPER_PORT, 2181);
c.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 30);
c.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 30);
c.put(Config.STORM_ZOOKEEPER_RETRY_TIMES, 3);
c.put(Config.STORM_ZOOKEEPER_RETRY_INTERVAL, 5);
You also need to add some additional dependencies to your project.
In addition to flink-storm, you need:
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.4</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
</dependency>
This works for me, using Kafka_2.10-0.8.1.1 and FlinkLocalCluster, executed within Eclipse.
It also works in a local Flink cluster started via bin/start-local-streaming.sh. For that, using the bin/flink run command, you need to use FlinkSubmitter instead of FlinkLocalCluster. Furthermore, you need the following dependencies in your jar:
<include>org.apache.storm:storm-kafka</include>
<include>org.apache.kafka:kafka_2.10</include>
<include>org.apache.curator:curator-client</include>
<include>org.apache.curator:curator-framework</include>
<include>com.google.guava:guava</include>
<include>com.yammer.metrics:metrics-core</include>

Samza/Kafka Failed to Update Metadata

I am currently working on writing a Samza script that will just take data from one Kafka topic and output it to another Kafka topic. I have written a very basic StreamTask; however, upon execution I am running into an error.
The error is below:
Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms.
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:112)
at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:129)
at org.apache.samza.job.JobRunner.run(JobRunner.scala:79)
at org.apache.samza.job.JobRunner$.main(JobRunner.scala:48)
at org.apache.samza.job.JobRunner.main(JobRunner.scala)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 193 ms
I am not entirely sure how to configure the script, or have it write, the required Kafka metadata. Below is my code for the StreamTask and the properties file. In the properties file I added the Metadata section to see if that would help, but to no avail. Is that the right direction, or am I missing something entirely?
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.TaskCoordinator;
import org.apache.samza.system.SystemStream;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
/*
* Take all messages received and send them to
* a Kafka topic called "words"
*/
public class TestStreamTask implements StreamTask {

    // create a new system stream for the Kafka topic "words"
    private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "words");

    @Override
    public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
        String message = (String) envelope.getMessage(); // pull the message from the stream
        for (String word : message.split(" ")) {
            // output each word to the system stream for the Kafka topic "words"
            collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, word, 1));
        }
    }
}
# Job
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
job.name=test-words
# YARN
yarn.package.path=file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
# Task
task.class=samza.examples.wikipedia.task.TestStreamTask
task.inputs=kafka.test
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=1
# Metrics
metrics.reporters=snapshot,jmx
metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
metrics.reporter.snapshot.stream=kafka.metrics
metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
# Serializers
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
# Systems
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.samza.msg.serde=string
systems.kafka.consumer.zookeeper.connect=localhost:2181/
systems.kafka.consumer.auto.offset.reset=largest
systems.kafka.producer.bootstrap.servers=localhost:9092
# Metadata
systems.kafka.metadata.bootstrap.servers=localhost:9092
This question is about Kafka 0.8, which should be out of support if I am not mistaken.
That fact, combined with the observation that people only ran into this issue occasionally, not consistently (and nobody seems to have struggled with it in recent years), gives me good confidence that upgrading to a more recent version of Kafka will resolve the problem.
