While configuring a newly created Kafka topic with the Java Kafka AdminClient, some of the values I set are overwritten.
I have set the same topic configuration using the console commands and it works. Unfortunately, when I try it through Java code, some values collide and are overwritten.
ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
Map<ConfigResource, Config> updateConfig = new HashMap<>();
// update retention Bytes for this topic
ConfigEntry retentionBytesEntry = new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(retentionBytes));
updateConfig.put(resource, new Config(Collections.singleton(retentionBytesEntry)));
// update retention ms for this topic
ConfigEntry retentionMsEntry = new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(retentionMs));
updateConfig.put(resource, new Config(Collections.singleton(retentionMsEntry)));
// update segment Bytes for this topic
ConfigEntry segmentBytesEntry = new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(segmentbytes));
updateConfig.put(resource, new Config(Collections.singleton(segmentBytesEntry)));
// update segment ms for this topic
ConfigEntry segmentMsEntry = new ConfigEntry(TopicConfig.SEGMENT_MS_CONFIG, String.valueOf(segmentMs));
updateConfig.put(resource, new Config(Collections.singleton(segmentMsEntry)));
// Update the configuration
client.alterConfigs(updateConfig);
I expect the topic to end up with all of the given configuration values.
Your logic is not working correctly because you call Map.put() several times with the same key, so only the last entry is kept.
The correct way to specify multiple topic configurations is to add all the ConfigEntry objects to a single Config, and only then put that Config into the Map.
For example:
// Your Topic Resource
ConfigResource cr = new ConfigResource(Type.TOPIC, "mytopic");
// Create all your configurations
Collection<ConfigEntry> entries = new ArrayList<>();
entries.add(new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(segmentbytes)));
entries.add(new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(retentionBytes)));
...
// Create the Map
Config config = new Config(entries);
Map<ConfigResource, Config> configs = new HashMap<>();
configs.put(cr, config);
// Call alterConfigs()
admin.alterConfigs(configs);
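Putting it together for the four settings from the question, a minimal sketch (reusing the topicName, retentionBytes, retentionMs, segmentbytes and segmentMs variables from the original code) could look like this:
ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
// Collect all four entries into one Config mapped to the topic exactly once
Collection<ConfigEntry> entries = new ArrayList<>();
entries.add(new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(retentionBytes)));
entries.add(new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(retentionMs)));
entries.add(new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(segmentbytes)));
entries.add(new ConfigEntry(TopicConfig.SEGMENT_MS_CONFIG, String.valueOf(segmentMs)));
Map<ConfigResource, Config> updateConfig = Collections.singletonMap(resource, new Config(entries));
// alterConfigs() is asynchronous; blocking on the result surfaces any broker-side error
// (all().get() throws InterruptedException/ExecutionException, so handle or declare them)
client.alterConfigs(updateConfig).all().get();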
I am creating a Spring Boot app and struggling to understand why my GlobalKTable is not updating.
As far as I understand, the global table is supposed to update automatically when the source topic is updated. This is not the case for me.
I did notice the global table becomes populated with new data after I manually delete the state store folder.
I also noted the following error output when the Spring Boot app is launched:
2021-11-27 23:09:18.232 ERROR 17592 --- [ main] o.a.k.s.p.internals.StateDirectory : Failed to change permissions for the directory d:\kafkastreamsdb
2021-11-27 23:09:18.233 ERROR 17592 --- [ main] o.a.k.s.p.internals.StateDirectory : Failed to change permissions for the directory d:\kafkastreamsdb\Kafka-streams
It seems to me that the reason I only see all the current data in the GlobalKTable after deleting the state store folder is that the stream does not write to the state store while it is running, but recreates the state store from the source topic after the deletion?
So, the issue here is that when I try to use the global table as a look-up in the join below, the enriched stream returns NULL for the table values. However, when I delete the state store folder and restart Spring Boot, the enriched stream does return the table values.
Just to clarify, new events are continuously sent to the source topic, but this data is only visible in the table after deletion.
Here is my code:
@Service
public class TopologyBuilder2 {
public static Topology build() {
StreamsBuilder builder = new StreamsBuilder();
// Register FakeAddress stream
KStream<String, FakeAddress> streamFakeAddress =
builder.stream("FakeAddress", Consumed.with(Serdes.String(), JsonSerdes.FakeAddress()));
GlobalKTable<String, Greetings> globalGreetingsTable = builder.globalTable(
"Greetings"
, Consumed.with(Serdes.String(), JsonSerdes.Greetings())
, Materialized.<String, Greetings, KeyValueStore<Bytes, byte[]>>as(
"GREETINGS" /* table/store name */)
.withKeySerde(Serdes.String()) /* key serde */
.withValueSerde(JsonSerdes.Greetings()) /* value serde */);
// LEFT Key mapper
KeyValueMapper<String, FakeAddress, String> keyMapperFakeAddress =
( leftkey, fakeAddress) -> {
// System.out.println(String.valueOf(fakeAddress.getCountry()));
return String.valueOf(fakeAddress.getCountry());
};
// Value joiner
ValueJoiner<FakeAddress, Greetings, EnrichedCountryGreeting> valueJoinerFakeAddressAndGreetings =
(fakeAddress, greetings) -> new EnrichedCountryGreeting(fakeAddress, greetings);
KStream<String, EnrichedCountryGreeting> enrichedStream
= streamFakeAddress.join(globalGreetingsTable, keyMapperFakeAddress, valueJoinerFakeAddressAndGreetings);
enrichedStream.print(Printed.<String, EnrichedCountryGreeting>toSysOut().withLabel("Stream-enrichedStream: "));
return builder.build();
}
}
I have a problem trying to run a DRPC topology containing one single bolt and querying it through a local cluster. After debugging with IntelliJ, the bolt is indeed executed, but the JCQueue gets stuck in an infinite loop after the bolt has been executed, until a timeout is sent to the server.
Here is the code used to build the topology builder:
public static LinearDRPCTopologyBuilder createBuilder()
{
var bolt = new MRedisLookupBolt(createRedisConfiguration(), new RedisTurnoverMapper());
var builder = new LinearDRPCTopologyBuilder("sales");
builder.addBolt(bolt, 1).localOrShuffleGrouping();
return builder;
}
The MRedisLookupBolt is just a very simple implementation of IBasicBolt executing a hget command against Jedis. The execute method of the MRedisLookupBolt is just emitting an instance of Values containing the value for two fields that are declared like this:
declarer.declare(new Fields("id", "Value"));
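For context, here is a minimal sketch of what such a bolt might look like in Storm 2.x. This is an assumption rather than the real MRedisLookupBolt (whose code, Redis configuration and RedisTurnoverMapper are not shown), and the Redis host, port and hash key are placeholders:
import java.util.Map;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import redis.clients.jedis.Jedis;

public class RedisLookupBoltSketch extends BaseBasicBolt {
    private transient Jedis jedis; // Jedis is not serializable, so it is created in prepare()

    @Override
    public void prepare(Map<String, Object> topoConf, TopologyContext context) {
        jedis = new Jedis("localhost", 6379); // placeholder connection settings
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        Object requestId = tuple.getValue(0);         // DRPC request id, must be re-emitted
        String argument = tuple.getString(1);         // the argument passed to drpc.execute()
        String value = jedis.hget("sales", argument); // HGET against a hash; the key name is a placeholder
        collector.emit(new Values(requestId, value));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "Value"));
    }

    @Override
    public void cleanup() {
        jedis.close();
    }
}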
The topology is built and queried in a unit test like this:
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(1);
try(LocalDRPC drpc = new LocalDRPC())
{
LocalCluster cluster = new LocalCluster();
var builder = BasicRedisRPCTopology.createBuilder();
LocalCluster.LocalTopology topo = cluster.submitTopology(
"Sales-fetch", conf, builder.createLocalTopology(drpc));
var result = drpc.execute("sales", "XXXXX");
System.out.println("################ Result: " + result);
}
catch (Exception e)
{
e.printStackTrace();
}
When reading the logs, I can see that the data is correctly read by the bolt and that everything is emitted.
But at the end, I get a stack trace printed out by my test method. Of course, no value is assigned to the result variable and the process never reaches the last print instruction:
There is something that I am missing here. What I understand: the JCQueue used by BoltExecutor to retrieve the id of the bolt to execute never ends, although only one parameter is sent to the local DRPC and only one bolt is declared in the topology. I have already tried adding more bolts to the topology or changing the builder implementation used to create it, but with no success.
I found a solution suitable for my use case using Apache Storm 2.1.0.
It seems that invoking the submitTopology method of the local cluster as proposed by the documentation does not terminate the executor correctly in version 2.1.0 when the topology is built with the LinearDRPCTopologyBuilder.
By looking closer at the source code, it was possible to understand how to apply the LinearDRPCTopologyBuilder logic to the TopologyBuilder directly.
Here is the change applied to the createBuilder method:
public static TopologyBuilder createBuilder(ILocalDRPC localDRPC)
{
var spout = Optional.ofNullable(localDRPC)
.map(drpc -> new DRPCSpout("sales", drpc))
.orElse(new DRPCSpout("sales"));
var bolt = new MRedisLookupBolt(createRedisConfiguration(), new RedisTurnoverMapper());
var builder = new TopologyBuilder();
builder.setSpout("drpc", spout);
builder.setBolt("redisLookup", bolt, 1)
.shuffleGrouping("drpc");
builder.setBolt("return", new ReturnResults())
.shuffleGrouping("redisLookup");
return builder;
}
And here is an example of execution:
Config conf = new Config();
conf.setDebug(true);
conf.setNumWorkers(1);
try(LocalDRPC drpc = new LocalDRPC())
{
LocalCluster cluster = new LocalCluster();
var builder = BasicRedisRPCTopology.createBuilder(drpc);
cluster.submitTopology("Sales-fetch", conf, builder.createTopology());
var result = drpc.execute("sales", "XXXXX");
System.out.println("################ Result: " + result);
}
catch (Exception e)
{
e.printStackTrace();
}
Unfortunately, this solution does not allow using all the built-in tools of the LinearDRPCTopologyBuilder and means building the whole topology flow 'by hand'. It is also necessary to change the mapper behavior, as the fields are not exposed in the same order as before.
I'm working on a heatmap project for my university; we have to get some data (212 GB) from a txt file (coordinates, height), then put it in HBase to retrieve it from a web client with Express.
I practiced with a 144 MB file, and this works:
SparkConf conf = new SparkConf().setAppName("PLE");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> data = context.textFile(args[0]);
Connection co = ConnectionFactory.createConnection(getConf());
createTable(co);
Table table = co.getTable(TableName.valueOf(TABLE_NAME));
Put put = new Put(Bytes.toBytes("KEY"));
for (String s : data.collect()) {
String[] tmp = s.split(",");
put.addImmutable(FAMILY,
Bytes.toBytes(tmp[2]),
Bytes.toBytes(tmp[0]+","+tmp[1]));
}
table.put(put);
But now that I use the 212 GB file, I get memory errors. I guess the collect method gathers all the data in memory, so 212 GB is too much.
So now I'm trying this:
SparkConf conf = new SparkConf().setAppName("PLE");
JavaSparkContext context = new JavaSparkContext(conf);
JavaRDD<String> data = context.textFile(args[0]);
Connection co = ConnectionFactory.createConnection(getConf());
createTable(co);
Table table = co.getTable(TableName.valueOf(TABLE_NAME));
Put put = new Put(Bytes.toBytes("KEY"));
data.foreach(line ->{
String[] tmp = line.split(",");
put.addImmutable(FAMILY,
Bytes.toBytes(tmp[2]),
Bytes.toBytes(tmp[0]+","+tmp[1]));
});
table.put(put);
And I'm getting "org.apache.spark.SparkException: Task not serializable". I searched for it and tried some fixes, without success, based on what I read here: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
Actually, I don't understand everything in this topic; I'm just a student. Maybe the answer to my problem is obvious, maybe not, anyway thanks in advance!
As a rule of thumb, serializing database connections (of any type) doesn't make sense. They are not designed to be serialized and deserialized, Spark or not.
Create the connection inside each partition:
data.foreachPartition(partition -> {
Connection co = ConnectionFactory.createConnection(getConf());
... // All required setup
Table table = co.getTable(TableName.valueOf(TABLE_NAME));
Put put = new Put(Bytes.toBytes("KEY"));
while (partition.hasNext()) {
String line = partition.next();
String[] tmp = line.split(",");
put.addImmutable(FAMILY,
Bytes.toBytes(tmp[2]),
Bytes.toBytes(tmp[0]+","+tmp[1]));
}
... // Clean connections
});
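For completeness, here is a fuller sketch of that pattern with the placeholders filled in. TABLE_NAME, FAMILY and the row key "KEY" are taken from the question's code; creating the HBase Configuration with HBaseConfiguration.create() inside the partition is an assumption that also keeps driver-side objects out of the closure:
data.foreachPartition(partition -> {
    // Everything HBase-related is created on the executor, inside the partition
    org.apache.hadoop.conf.Configuration hbaseConf = HBaseConfiguration.create();
    try (Connection co = ConnectionFactory.createConnection(hbaseConf);
         Table table = co.getTable(TableName.valueOf(TABLE_NAME))) {
        Put put = new Put(Bytes.toBytes("KEY"));
        while (partition.hasNext()) {
            String[] tmp = partition.next().split(",");
            put.addImmutable(FAMILY,
                    Bytes.toBytes(tmp[2]),
                    Bytes.toBytes(tmp[0] + "," + tmp[1]));
        }
        table.put(put); // one write per partition
    } // table and connection are closed automatically by try-with-resources
});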
I also recommend reading Design Patterns for using foreachRDD from the official Spark Streaming programming guide.
I have many objects of a class, say Test, which I want to write to Kafka and process using a Spark Streaming app. I want to use Kryo serialization.
My application is in Java.
JavaDStream<Test> testData = KafkaUtils
    .createDirectStream(context, keyClass, valueClass, keyDecoderClass, valueDecoderClass, props, topics);
My question is: what should I put for keyClass, valueClass, keyDecoderClass and valueDecoderClass?
Say your key is a String and your value is a Test object; then first you would need to create TestEncoder and TestDecoder classes by implementing kafka.serializer.Encoder and kafka.serializer.Decoder. Now in your createDirectStream method you can have
JavaPairInputDStream<String, Test> testData = KafkaUtils
.createDirectStream(context, String.class,Test.class ,StringDecoder.class,TestDecoder.class,props,topics);
You can refer to the KafkaKryoEncoder at https://www.tomsdev.com/blog/2015/storm-kafka-complex-types/
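For illustration, here is a minimal sketch of what TestEncoder and TestDecoder might look like with Kryo. This is an assumption rather than code from the linked post; each class goes in its own file, and the Test class itself is not shown:
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import kafka.serializer.Decoder;
import kafka.serializer.Encoder;
import kafka.utils.VerifiableProperties;

// Kafka instantiates (de)coders reflectively, so the VerifiableProperties constructor is required
public class TestEncoder implements Encoder<Test> {
    public TestEncoder(VerifiableProperties props) { }

    @Override
    public byte[] toBytes(Test test) {
        Kryo kryo = new Kryo(); // depending on the Kryo version, classes may need to be registered
        Output output = new Output(1024, -1); // buffer grows as needed
        kryo.writeObject(output, test);
        return output.toBytes();
    }
}

public class TestDecoder implements Decoder<Test> {
    public TestDecoder(VerifiableProperties props) { }

    @Override
    public Test fromBytes(byte[] bytes) {
        Kryo kryo = new Kryo();
        return kryo.readObject(new Input(bytes), Test.class);
    }
}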
In your Kafka producer, you would need to register your custom Encoder class like this:
Properties properties = new Properties();
properties.put("metadata.broker.list", brokerList);
properties.put("serializer.class", "com.my.TestEncoder");
Producer<String, Test> producer = new Producer<String, Test>(new ProducerConfig(properties));
Test test = new Test();
KeyedMessage<String, Test> data = new KeyedMessage<String, Test>("myTopic", test);
producer.send(data);
I am using Kafka version 0.8 and am very new to it.
I want to know the list of topics created in the Kafka server along with their metadata.
Is there any API available to find this out?
Basically, I need to write a Java consumer that auto-discovers any topic in the Kafka server. There is an API to fetch TopicMetadata, but it needs a topic name as an input parameter. I need information for all topics present in the server.
With Kafka 0.9.0 you can list the topics in the server with the consumer method listTopics(), e.g.:
Map<String, List<PartitionInfo> > topics;
Properties props = new Properties();
props.put("bootstrap.servers", "1.2.3.4:9092");
props.put("group.id", "test-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
topics = consumer.listTopics();
consumer.close();
I think this is the best way:
ZkClient zkClient = new ZkClient("zkHost:zkPort");
List<String> topics = JavaConversions.asJavaList(ZkUtils.getAllTopics(zkClient));
A good place to start would be the sample shell scripts shipped with Kafka.
In the /bin directory of the distribution there are some shell scripts you can use, one of which is ./kafka-list-topic.sh.
If you run that without specifying a topic, it will return all topics with their metadata.
See:
https://github.com/apache/kafka/blob/0.8/bin/kafka-list-topic.sh
That shell script in turn runs:
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/admin/ListTopicCommand.scala
The above are both references to the 0.8 Kafka version, so if you're using a different version (even a point difference), be sure to use the appropriate branch/tag on GitHub.
Using Scala:
import java.util.{Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
object KafkaTest {
def main(args: Array[String]): Unit = {
val brokers = args(0)
val props = new Properties();
props.put("bootstrap.servers", brokers);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
val consumer = new KafkaConsumer[String, String](props);
val topics = consumer.listTopics().keySet();
println(topics)
}
}
If you want to pull broker or other Kafka information from ZooKeeper, then kafka.utils.ZkUtils provides a nice interface. Here is the code I use to list all brokers registered in ZooKeeper (there are a ton of other methods there):
List<Broker> listBrokers() {
final ZkConnection zkConnection = new ZkConnection(connectionString);
final int sessionTimeoutMs = 10 * 1000;
final int connectionTimeoutMs = 20 * 1000;
final ZkClient zkClient = new ZkClient(connectionString,
sessionTimeoutMs,
connectionTimeoutMs,
ZKStringSerializer$.MODULE$);
final ZkUtils zkUtils = new ZkUtils(zkClient, zkConnection, false);
  return scala.collection.JavaConversions.seqAsJavaList(zkUtils.getAllBrokersInCluster());
}
You can use the ZooKeeper API to get the list of brokers, as shown below:
ZooKeeper zk = new ZooKeeper("zookeeperhost", 10000, null);
List<String> ids = zk.getChildren("/brokers/ids", false);
List<Map> brokerList = new ArrayList<>();
ObjectMapper objectMapper = new ObjectMapper();
for (String id : ids) {
Map map = objectMapper.readValue(zk.getData("/brokers/ids/" + id, false, null), Map.class);
brokerList.add(map);
}
Use this broker list to get all the topics using the following link:
https://cwiki.apache.org/confluence/display/KAFKA/Finding+Topic+and+Partition+Leader
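For reference, the approach on that wiki page boils down to something like the following sketch for Kafka 0.8: sending a TopicMetadataRequest with an empty topic list to one of the brokers found above should return metadata for every topic in the cluster. The classes come from kafka.javaapi and kafka.javaapi.consumer, and the broker host, port and client id below are placeholders.
SimpleConsumer consumer = new SimpleConsumer("brokerhost", 9092, 100000, 64 * 1024, "topic-discovery");
TopicMetadataRequest request = new TopicMetadataRequest(Collections.<String>emptyList());
TopicMetadataResponse response = consumer.send(request);
for (TopicMetadata topicMetadata : response.topicsMetadata()) {
    System.out.println("Topic: " + topicMetadata.topic());
    for (PartitionMetadata partitionMetadata : topicMetadata.partitionsMetadata()) {
        System.out.println("  partition " + partitionMetadata.partitionId()
                + ", leader " + partitionMetadata.leader());
    }
}
consumer.close();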