Producer#initTransactions doesn't work with KafkaContainer - java

I'm trying to send messages to Kafka within a transaction, so I use this code:
try (Producer<Void, String> producer = createProducer(kafkaContainerBootstrapServers)) {
    producer.initTransactions();
    producer.beginTransaction();
    Arrays.stream(messages).forEach(
        message -> producer.send(new ProducerRecord<>(KAFKA_INPUT_TOPIC, message)));
    producer.commitTransaction();
}
...
private static Producer<Void, String> createProducer(String kafkaContainerBootstrapServers) {
    return new KafkaProducer<>(
        ImmutableMap.of(
            ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaContainerBootstrapServers,
            ProducerConfig.CLIENT_ID_CONFIG, UUID.randomUUID().toString(),
            ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true,
            ProducerConfig.TRANSACTIONAL_ID_CONFIG, UUID.randomUUID().toString()
        ),
        new VoidSerializer(),
        new StringSerializer());
}
With a local Kafka broker this works well, but with the Kafka Testcontainers module it freezes on producer.initTransactions():
private static final String KAFKA_VERSION = "4.1.1";
@Rule
public KafkaContainer kafka = new KafkaContainer(KAFKA_VERSION)
        .withEmbeddedZookeeper();
How can I configure KafkaContainer to work with transactions?
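My assumption is that the single-node broker inside the container needs some transaction-related settings lowered; something along these lines is what I have in mind (the KAFKA_-prefixed environment variable names follow the Confluent image convention and are an assumption on my part, not a verified fix):

@Rule
public KafkaContainer kafka = new KafkaContainer(KAFKA_VERSION)
        .withEmbeddedZookeeper()
        // With a single broker, the transaction state log defaults (replication factor 3,
        // min ISR 2) cannot be satisfied, which would explain initTransactions() blocking.
        .withEnv("KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR", "1")
        .withEnv("KAFKA_TRANSACTION_STATE_LOG_MIN_ISR", "1");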

Try using Kafka for JUnit instead of the Kafka Testcontainers module. I had the same problem with transactions and got them working this way.
Maven dependency that I used:
<dependency>
<groupId>net.mguenther.kafka</groupId>
<artifactId>kafka-junit</artifactId>
<version>2.1.0</version>
<scope>test</scope>
</dependency>
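For reference, wiring the embedded cluster into a JUnit 4 test looks roughly like this (class and method names follow the library's documentation and should be double-checked against the 2.1.0 API; createProducer is the helper from the question):

import static net.mguenther.kafka.junit.EmbeddedKafkaCluster.provisionWith;
import static net.mguenther.kafka.junit.EmbeddedKafkaClusterConfig.useDefaults;

import net.mguenther.kafka.junit.EmbeddedKafkaCluster;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.junit.Rule;
import org.junit.Test;

public class TransactionalProducerTest {

    // Starts a single-node embedded Kafka broker (and ZooKeeper) for each test.
    @Rule
    public EmbeddedKafkaCluster kafka = provisionWith(useDefaults());

    @Test
    public void sendsMessagesInATransaction() {
        // getBrokerList() supplies the bootstrap.servers value for the producer.
        try (Producer<Void, String> producer = createProducer(kafka.getBrokerList())) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("input-topic", "hello"));
            producer.commitTransaction();
        }
    }
}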

I got an exception using Kafka for JUnit as @AntonLitvinenko suggested. My question about it is here.
I added this dependency to fix it (see the issue):
<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-test</artifactId>
<version>2.12.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
Also, I used version 2.0.1 of kafka-junit and kafka_2.11:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.11</artifactId>
<version>${kafkaVersion}</version>
<scope>test</scope>
</dependency>

Related

How to use BulkProcessor in version 8+

I'm upgrading from the Elasticsearch 7 client to 8 and trying to stop using the deprecated RestHighLevelClient. The issue is that one of the modules uses BulkProcessor, and I can't figure out how to use it with the new client library, since none of the new clients is compatible.
public static Builder builder(Client client, Listener listener, Scheduler flushScheduler, Scheduler retryScheduler, Runnable onClose) {
    Objects.requireNonNull(client, "client");
    Objects.requireNonNull(listener, "listener");
    return new Builder(client::bulk, listener, flushScheduler, retryScheduler, onClose);
}
The builder above expects org.elasticsearch.client.internal.Client, and I can't find an implementation I can use in any of the dependencies below:
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>8.3.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>8.3.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>8.3.2</version>
</dependency>
Am I missing something?
Thank you!
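If upgrading the client is an option: later releases of the elasticsearch-java client (8.7 or so, treat the exact version as an assumption) ship a BulkIngester helper that covers roughly the same ground as BulkProcessor, but on top of ElasticsearchClient rather than the internal Client. A rough sketch under that assumption; the index name and document are placeholders:

import java.util.Map;
import java.util.concurrent.TimeUnit;

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._helpers.bulk.BulkIngester;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class BulkIngesterSketch {
    public static void main(String[] args) throws Exception {
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200)).build();
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        ElasticsearchClient esClient = new ElasticsearchClient(transport);

        // Flushes once 500 operations have accumulated or after one second,
        // comparable to BulkProcessor's bulkActions / flushInterval settings.
        BulkIngester<Void> ingester = BulkIngester.of(b -> b
                .client(esClient)
                .maxOperations(500)
                .flushInterval(1, TimeUnit.SECONDS));

        // Queue an index operation; "my-index" and the document are placeholders.
        ingester.add(op -> op.index(idx -> idx
                .index("my-index")
                .document(Map.of("field", "value"))));

        ingester.close();   // flushes any pending operations
        transport.close();
    }
}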

Java ElasticSearach Bool Geo Query

I'm trying to issue an Elasticsearch query using the Java API from my application, but for some reason I keep getting the following error:
java.lang.NoClassDefFoundError: org/apache/lucene/search/spans/SpanBoostQuery
    at org.elasticsearch.index.query.QueryBuilders.boolQuery(QueryBuilders.java:301)
Below are the current dependencies I have in my pom.xml:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.4.2</version>
</dependency>
<dependency>
<groupId>org.locationtech.spatial4j</groupId>
<artifactId>spatial4j</artifactId>
<version>0.6</version>
</dependency>
<dependency>
<groupId>com.vividsolutions</groupId>
<artifactId>jts</artifactId>
<version>1.13</version>
<exclusions>
<exclusion>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
</exclusion>
</exclusions>
</dependency>
The code:
double lon = -115.14029016987968;
double lat = 36.17206351151878;
QueryBuilder fullq = boolQuery()
    .must(matchAllQuery())
    .filter(geoShapeQuery(
        "geometry",
        ShapeBuilders.newCircleBuilder().center(lon, lat).radius(10, DistanceUnit.METERS)).relation(ShapeRelation.INTERSECTS));
TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
    .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
SearchRequestBuilder finalQuery = client.prepareSearch("speedlimit").setTypes("speedlimit")
    .setQuery(fullq);
SearchResponse searchResponse = finalQuery.execute().actionGet();
SearchHits searchHits = searchResponse.getHits();
if (searchHits.getTotalHits() > 0) {
    String strSpeed = JsonPath.read(searchResponse.toString(), "$.hits.hits[0]._source.properties.TITLE");
    int speed = Integer.parseInt(strSpeed.substring(0, 2));
} else if (searchHits.getTotalHits() <= 0) {
    System.out.println("nothing");
}
This is the query I'm trying to run. I've followed the ES docs but can't get any further. Has anyone tried to run a query like this, or am I going down the wrong route? I'm tempted to just abandon the Java API and go back to making HTTP calls from Java, but I thought I would give their Java API a try. Any tips appreciated, thanks.
This error was resolved for me after I removed an older dependency related to org.apache.lucene. We need to make sure all org.apache.lucene dependencies are at the latest version, on par with the version that contains SpanBoostQuery.
I commented out the dependency below and it worked:
<!--<dependency>-->
<!--<groupId>org.apache.lucene</groupId>-->
<!--<artifactId>lucene-spellchecker</artifactId>-->
<!--<version>3.6.2</version>-->
<!--</dependency>-->
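To see which module still pulls in an old Lucene jar, listing the Lucene artifacts in the Maven dependency tree helps (the includes filter only narrows the output):

mvn dependency:tree -Dincludes=org.apache.lucene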

Unable to fetch data from Cassandra using spark (java)

I am new to Cassandra and Spark and am trying to fetch data from the database using Spark.
I am using Java for this purpose.
The problem is that no exceptions are thrown and no errors occur, but I am still not able to get the data. Find my code below:
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("Spark-Cassandra Integration");
sparkConf.setMaster("local[4]");
sparkConf.set("spark.cassandra.connection.host", "stagingHost22");
sparkConf.set("spark.cassandra.connection.port", "9042");
sparkConf.set("spark.cassandra.connection.timeout_ms", "5000");
sparkConf.set("spark.cassandra.read.timeout_ms", "200000");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
String keySpaceName = "testKeySpace";
String tableName = "testTable";
CassandraJavaRDD<CassandraRow> cassandraRDD = CassandraJavaUtil.javaFunctions(javaSparkContext).cassandraTable(keySpaceName, tableName);
final ArrayList dataList = new ArrayList();
JavaRDD<String> userRDD = cassandraRDD.map(new Function<CassandraRow, String>() {
    private static final long serialVersionUID = -165799649937652815L;

    public String call(CassandraRow row) throws Exception {
        System.out.println("Inside RDD call");
        dataList.add(row);
        return "test";
    }
});
System.out.println("data Size -" + dataList.size());
The Cassandra and Spark Maven dependencies are:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-extras</artifactId>
<version>3.0.0</version>
</dependency>
<dependency>
<groupId>com.sparkjava</groupId>
<artifactId>spark-core</artifactId>
<version>2.5.4</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>2.0.0-M3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.0</version>
</dependency>
I am sure that the stagingHost22 host has the Cassandra data, with keyspace testKeySpace and table testTable. Find the query output below:
cqlsh:testKeySpace> select count(*) from testTable;
count
34
(1 rows)
Can anybody please suggest what I am missing here?
Thanks in advance.
Warm regards,
Vibhav
Your current code does not perform any Spark action. Therefore no data is loaded.
See the Spark documentation to understand the difference between transformations and actions in Spark:
http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations
Furthermore, adding CassandraRows to an ArrayList isn't something that is usually necessary when using the Cassandra connector. I would suggest implementing a simple select first (following the Spark-Cassandra-Connector documentation; see the sketch after the links below). If that works, you can extend the code as needed.
Check the following links on samples how to load data using the connector:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/14_data_frames.md
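For example, a minimal read that ends in an action (and therefore actually contacts Cassandra) could look like the sketch below. The connection settings, keyspace and table names are taken from your post; everything else is illustrative only.

import java.util.List;

import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleCassandraRead {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setAppName("Spark-Cassandra Integration")
                .setMaster("local[4]")
                .set("spark.cassandra.connection.host", "stagingHost22")
                .set("spark.cassandra.connection.port", "9042");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        CassandraJavaRDD<CassandraRow> rdd =
                CassandraJavaUtil.javaFunctions(sc).cassandraTable("testKeySpace", "testTable");

        // count() and collect() are actions: only now does Spark actually read the table.
        System.out.println("row count = " + rdd.count());

        List<CassandraRow> rows = rdd.collect(); // fine for a small test table only
        for (CassandraRow row : rows) {
            System.out.println(row); // CassandraRow#toString prints the columns of the row
        }

        sc.stop();
    }
}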

Kafka stream not working in spark job

I wrote code to get data from the "topicTest1" Kafka topic, but I am not able to print data from the consumer. The error that occurred is shown below.
Below is my code to consume the data:
public static void main(String[] args) throws Exception {
    // StreamingExamples.setStreamingLogLevels();
    SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount").setMaster("local[*]");

    // Create the context with 2 seconds batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(100));

    int numThreads = Integer.parseInt("3");
    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = "topicTest1".split(",");
    for (String topic : topics) {
        topicMap.put(topic, numThreads);
    }

    JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, "9.98.171.226:9092", "1",
            topicMap);
    messages.print();
    jssc.start();
    jssc.awaitTermination();
}
I am using the following dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-twitter_2.11</artifactId>
<version>1.6.1</version>
</dependency>
Below is the error I got:
Exception in thread "dispatcher-event-loop-0" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceiverSchedulingPolicy.scheduleReceivers(ReceiverSchedulingPolicy.scala:138)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receive$1.applyOrElse(ReceiverTracker.scala:450)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)16/11/14 13:38:00 INFO ForEachDStream: metadataCleanupDelay = -1
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.lang.Thread.run(Thread.java:785)
Another error:
Exception in thread "JobGenerator" java.lang.NoSuchMethodError: scala/Predef$.$conforms()Lscala/Predef$$less$colon$less; (loaded from file:/C:/Users/Administrator/.m2/repository/org/scala-lang/scala-library/2.10.5/scala-library-2.10.5.jar by sun.misc.Launcher$AppClassLoader#4b69b358) called from class org.apache.spark.streaming.scheduler.ReceivedBlockTracker (loaded from file:/C:/Users/Administrator/.m2/repository/org/apache/spark/spark-streaming_2.11/1.6.2/spark-streaming_2.11-1.6.2.jar by sun.misc.Launcher$AppClassLoader#4b69b358).
at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.allocateBlocksToBatch(ReceivedBlockTracker.scala:114)
at org.apache.spark.streaming.scheduler.ReceiverTracker.allocateBlocksToBatch(ReceiverTracker.scala:203)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:181)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Make sure that you use the correct versions. Let's say you use the following Maven dependency:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>1.6.1</version>
</dependency>
So the artifact is spark-streaming-kafka_2.10, i.e. it is built against Scala 2.10.
Now check whether you use a matching Kafka build:
cd /KAFKA_HOME/libs
Now find kafka_YOUR-VERSION-sources.jar.
If you have kafka_2.10-0xxxx-sources.jar you are fine! :)
If you use different versions, just change the Maven dependencies OR download the matching Kafka version.
After that, check your Spark version and make sure its artifacts use the same Scala suffix and a single Spark version (see the aligned dependencies below):
groupId: org.apache.spark
artifactId: spark-core_2.xx
version: xxx
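For example, if everything is built for Scala 2.10, the Spark side could look like this (1.6.1 chosen here only to match the Kafka artifact above; the exact version is yours to pick):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.1</version>
</dependency>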

@XStreamOmitField for Restlet GAE not working

I have a POJO field annotated with @XStreamOmitField; however, when I look at the response of the ServerResource, the field is still there.
Here is the code I have (simplified):
@Override
public ItemDTO getStuff() {
    return stuff.getItem();
}
Here's the POM config I have:
<dependency>
<groupId>org.restlet.gae</groupId>
<artifactId>org.restlet</artifactId>
<version>${version.restlet}</version>
</dependency>
<dependency>
<groupId>org.restlet.gae</groupId>
<artifactId>org.restlet.ext.servlet</artifactId>
<version>${version.restlet}</version>
</dependency>
<dependency>
<groupId>org.restlet.gae</groupId>
<artifactId>org.restlet.ext.xstream</artifactId>
<version>${version.restlet}</version>
</dependency>
<dependency>
<groupId>org.restlet.gae</groupId>
<artifactId>org.restlet.ext.json</artifactId>
<version>${version.restlet}</version>
</dependency>
The version is <version.restlet>2.3.1</version.restlet>.
What could be the problem here? It should be automatic, right?
Update:
My guess is that this is caused by the APISpark library used in my app, which results in XStream not being used in favor of Jackson:
<dependency>
<groupId>org.restlet.gae</groupId>
<artifactId>org.restlet.ext.apispark</artifactId>
<version>${version.restlet}</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.5.1</version>
</dependency>
Since the extension org.restlet.ext.apispark uses the extension org.restlet.ext.jackson, the latter automatically registers a converter in addition to the ones from the extensions org.restlet.ext.xstream and org.restlet.ext.json.
The following method allows you to see which converters are registered:
private void configureConverters() {
    List<ConverterHelper> converters = Engine.getInstance()
            .getRegisteredConverters();
    for (ConverterHelper converterHelper : converters) {
        System.out.println(converterHelper.getClass());
    }
}

@Override
public Restlet createInboundRoot() {
    configureConverters();
    (...)
}
The registration depends on the order of the classpath, so I guess that the Jackson one is registered first. This means it is the one used to convert the response data of your request.
To be able to use the Jettison extension you need to manually remove the registered Jackson converter, as described below:
private void configureConverters() {
    List<ConverterHelper> converters = Engine.getInstance()
            .getRegisteredConverters();
    JacksonConverter jacksonConverter = null;
    for (ConverterHelper converterHelper : converters) {
        System.err.println(converterHelper.getClass());
        if (converterHelper instanceof JacksonConverter) {
            jacksonConverter = (JacksonConverter) converterHelper;
            break;
        }
    }
    if (jacksonConverter != null) {
        Engine.getInstance()
                .getRegisteredConverters().remove(jacksonConverter);
    }
}
Hope it helps you,
Thierry
