Hi everyone.
This is my first post here, so please pardon my rough Stack Overflow question-writing skills.
I am having trouble using AdminClient from org.apache.kafka.clients.admin.AdminClient.
The issue at hand is this:
I initiate a secure connection to our broker server (running Kafka 1.0.0) using SASL_SSL.
It works just fine when I am running a consumer against that same broker with the same security settings. However, when I use the AdminClient, the calls appear to succeed, but in Wireshark I see no traffic whatsoever going from my machine to the broker server, and what I am trying to do does not happen on the broker side.
Here is my code:
public class AclProvisioner {

    // set up variables
    private static Properties props = new Properties();
    private static ClassLoader classloader = Thread.currentThread().getContextClassLoader();
    static String mid = null;
    static String topic = null;

    public static void main(String... args) {

        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafkabroker.mydomain.com:9094");
        props.put("security.protocol", "SASL_SSL");
        props.put("ssl.truststore.location", "C:\\Temp\\mydomain.root.jks");
        props.put("ssl.truststore.password", "my_truststore_password");
        props.put("sasl.mechanism", "GSSAPI");
        props.put("sasl.kerberos.service.name", "kafka_admin_username");

        AdminClient adminClient = AdminClient.create(props);

        // generate ACLs
        AclBinding newTopicReadAcl = new AclBinding(new Resource(ResourceType.TOPIC, "TestTopic"),
                new AccessControlEntry("MY_TESTID", "*", AclOperation.READ, AclPermissionType.ALLOW));
        AclBinding newTopicDescribeAcl = new AclBinding(new Resource(ResourceType.TOPIC, "TestTopic"),
                new AccessControlEntry("MY_TESTID", "*", AclOperation.DESCRIBE, AclPermissionType.ALLOW));
        AclBinding newGroupReadAcl = new AclBinding(new Resource(ResourceType.GROUP, "TestGroup"),
                new AccessControlEntry("MY_TESTID", "*", AclOperation.READ, AclPermissionType.ALLOW));
        Collection<AclBinding> aclList = Arrays.asList(newTopicReadAcl, newTopicDescribeAcl, newGroupReadAcl);
        adminClient.createAcls(aclList);

        // create topic
        int numPartitions = 6;
        short replicasFactor = 2;
        NewTopic newTopic = new NewTopic("Demo.JavaAdminClientTest", numPartitions, replicasFactor);
        Map<String, String> configMap = new HashMap<>();
        configMap.put(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT);
        configMap.put(TopicConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        newTopic.configs(configMap);
        List<NewTopic> topics = Arrays.asList(newTopic);
        adminClient.createTopics(topics);
    }
}
If I SSH to the server itself, export my keytab, and kinit, I am able to generate ACLs just fine using the CLI. I am also able to run a consumer using the exact same properties (as far as security goes).
Another thing I have discovered is that if I point the client at a server that does not exist or cannot be reached, the program does fail, telling me that it could not resolve the bootstrap server name.
The exact same behavior happens if I attempt to create topics instead of ACLs. Once again, that works just fine from the CLI.
I appreciate any pointers!
Cheers
All AdminClient methods are asynchronous and only return Future objects.
So if you don't explicitly wait on the futures to complete, your program just terminates before the AdminClient has had time to send anything over the network.
You can use all() or values() on the CreateAclsResult [0] and CreateTopicsResult [1] to retrieve KafkaFuture [2] objects, then call get() on them to wait for completion.
[0] http://kafka.apache.org/11/javadoc/org/apache/kafka/clients/admin/CreateAclsResult.html
[1] http://kafka.apache.org/11/javadoc/org/apache/kafka/clients/admin/CreateTopicsResult.html
[2] http://kafka.apache.org/11/javadoc/org/apache/kafka/common/KafkaFuture.html
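For example, a minimal sketch using the variable names from the question (exception handling omitted; get() throws checked exceptions) could look like this:
// block until the broker has actually processed the ACL request
CreateAclsResult aclsResult = adminClient.createAcls(aclList);
aclsResult.all().get();

// likewise wait for the topic creation before the JVM exits
CreateTopicsResult topicsResult = adminClient.createTopics(topics);
topicsResult.all().get();

// release the client's resources when done
adminClient.close();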
I have created a Kafka Topic and pushed a message to it.
So
bin/kafka-console-consumer --bootstrap-server abc.xyz.com:9092 --topic myTopic --from-beginning --property print.key=true --property key.separator="-"
prints
key1-customer1
on the command line.
I want to create a Kafka Stream from this topic and print this key1-customer1 on the console.
I wrote the following for it:
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "app-id");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "client-id");
streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "abc.xyz.com:9092");
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
// Records should be flushed every 10 seconds. This is less than the default
// in order to keep this example interactive.
streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
// For illustrative purposes we disable record caches
streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> customerStream = builder.stream("myTopic");
customerStream.foreach(new ForeachAction<String, String>() {
public void apply(String key, String value) {
System.out.println(key + ": " + value);
}
});
final KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
This does not fail. However, it does not print anything on the console the way this answer suggests it should.
I am new to Kafka, so any suggestions to make this work would help me a lot.
TL;DR Use Printed.
import org.apache.kafka.streams.kstream.Printed
val sysout = Printed
.toSysOut[String, String]
.withLabel("customerStream")
customerStream.print(sysout)
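The snippet above is Scala; since the code in the question is Java, the equivalent call (a sketch against the Kafka Streams 1.0 API) would be:
import org.apache.kafka.streams.kstream.Printed;

// print every record of the stream to stdout, prefixed with the given label
customerStream.print(Printed.<String, String>toSysOut().withLabel("customerStream"));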
I would try unsetting CLIENT_ID_CONFIG and leaving only APPLICATION_ID_CONFIG; Kafka Streams derives the client ID from the application ID.
I would also verify the offsets for the consumer group ID that your Kafka Streams application is using (this consumer group ID is also based on your application ID). Use the kafka-consumer-groups.sh tool for that. It could be that your Streams application is already ahead of all the records you've produced to that topic, possibly because auto offset reset is set to latest, or possibly for some other reason not easily discernible from your question.
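If the reset policy turns out to be the culprit, a minimal sketch of telling the embedded consumers to start from the earliest available offset (assuming no offsets have been committed yet that you want to keep) is:
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Kafka Streams passes this consumer property through to the consumers it creates internally
streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");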
I am creating an app in Flink to:
Read messages from a topic
Do some simple processing on them
Write the result to a different topic
My code does work; however, it does not run in parallel.
How do I do that?
It seems my code runs on only one thread/block.
On the Flink Web Dashboard:
The app goes to running status
But there is only one block shown in the overview of subtasks
And Bytes Received / Sent and Records Received / Sent are always zero (no update)
Here is my code. Please assist me in learning how to split my app so it can run in parallel, and let me know whether I am writing the app correctly.
public class SimpleApp {
public static void main(String[] args) throws Exception {
// create execution environment INPUT
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment();
// event time characteristic
env_in.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// production Ready (Does NOT Work if greater than 1)
env_in.setParallelism(Integer.parseInt(args[0].toString()));
// configure kafka consumer
Properties properties = new Properties();
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("auto.offset.reset", "earliest");
// create a kafka consumer
final DataStream<String> consumer = env_in
.addSource(new FlinkKafkaConsumer09<>("test", new
SimpleStringSchema(), properties));
// filter data
SingleOutputStreamOperator<String> result = consumer.filter(new
FilterFunction<String>(){
@Override
public boolean filter(String s) throws Exception {
return s.substring(0, 2).contentEquals("PS");
}
});
// Process Data
// Transform String Records to JSON Objects
SingleOutputStreamOperator<JSONObject> data = result.map(new
MapFunction<String, JSONObject>()
{
@Override
public JSONObject map(String value) throws Exception
{
JSONObject jsnobj = new JSONObject();
if(value.substring(0, 2).contentEquals("PS"))
{
// 1. Raw Data
jsnobj.put("Raw_Data", value.substring(0, value.length()-6));
// 2. Comment
int first_index_comment = value.indexOf("$");
int last_index_comment = value.lastIndexOf("$") + 1;
// - set comment
String comment =
value.substring(first_index_comment, last_index_comment);
comment = comment.substring(0, comment.length()-6);
jsnobj.put("Comment", comment);
}
else {
jsnobj.put("INVALID", value);
}
return jsnobj;
}
});
// Write JSON to Kafka Topic
data.addSink(new FlinkKafkaProducer09<JSONObject>("localhost:9092",
"FilteredData",
new SimpleJsonSchema()));
env_in.execute();
}
}
My code does work, but it seems to run on only a single thread
(one block shown) in the web interface (no data flowing, hence the bytes sent/received are never updated).
How do I make it run in parallel?
To run your job in parallel you can do two things:
Increase the parallelism of your job at the environment level, i.e. do something like:
StreamExecutionEnvironment env_in =
StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(4);
But this would only increase parallelism on the Flink side after it reads the data, so if the source is producing data faster, it might not be fully utilized.
To fully parallelize your job, set up multiple partitions for your Kafka topic, ideally matching the amount of parallelism you want for your Flink job. So you might want to do something like the below when you are creating your Kafka topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181
--replication-factor 3 --partitions 4 --topic test
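Once the topic has enough partitions, you can also raise the parallelism of individual operators instead of the whole job. A sketch based on the question's code (assuming the topic now has 4 partitions; the setParallelism call on the source is the illustrative part):
// run the Kafka source with 4 parallel instances, ideally one per topic partition
DataStream<String> consumer = env_in
        .addSource(new FlinkKafkaConsumer09<>("test", new SimpleStringSchema(), properties))
        .setParallelism(4);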
I want to check whether the Kafka server is running before starting production and consumption jobs. It is in a Windows environment, and here is my Kafka server code in Eclipse...
Properties properties = new Properties();
properties.setProperty("broker.id", "1");
properties.setProperty("port", "9092");
properties.setProperty("log.dirs", "D://workspace//");
properties.setProperty("zookeeper.connect", "localhost:2181");
Option<String> option = Option.empty();
KafkaConfig config = new KafkaConfig(properties);
KafkaServer kafka = new KafkaServer(config, new CurrentTime(), option);
kafka.startup();
In this case, if (kafka != null) is not enough because it is always true. So is there any way to know that my Kafka server is running and ready for a producer? I need to check this because otherwise some of the first data packets are lost.
All Kafka brokers must be assigned a broker.id. On startup a broker will create an ephemeral node in ZooKeeper with a path of /brokers/ids/$id. As the node is ephemeral, it will be removed as soon as the broker disconnects, e.g. by shutting down.
You can view the list of the ephemeral broker nodes like so:
echo dump | nc localhost 2181 | grep brokers
The ZooKeeper client interface exposes a number of commands; dump lists all the sessions and ephemeral nodes for the cluster.
Note, the above assumes:
You're running ZooKeeper on the default port (2181) on localhost, and that localhost is the leader for the cluster
Your zookeeper.connect Kafka config doesn't specify a chroot env for your Kafka cluster i.e. it's just host:port and not host:port/path
You can install the kafkacat tool on your machine.
For example, on Ubuntu you can install it using:
apt-get install kafkacat
Once kafkacat is installed, you can use the following command to connect:
kafkacat -b <your-ip-address>:<kafka-port> -t test-topic
Replace <your-ip-address> with your machine's IP.
<kafka-port> can be replaced by the port on which Kafka is running; normally it is 9092.
Once you run the above command, if kafkacat is able to make the connection, it means that Kafka is up and running.
I used the AdminClient API, wrapped here in a small helper method (the method name is just for illustration):
private static boolean isKafkaAvailable()
{
    Properties properties = new Properties();
    properties.put("bootstrap.servers", "localhost:9092");
    properties.put("connections.max.idle.ms", 10000);
    properties.put("request.timeout.ms", 5000);
    try (AdminClient client = KafkaAdminClient.create(properties))
    {
        ListTopicsResult topics = client.listTopics();
        Set<String> names = topics.names().get();
        if (names.isEmpty())
        {
            // case: broker reachable but no topics found.
        }
        return true;
    }
    catch (InterruptedException | ExecutionException e)
    {
        // Kafka is not available
        return false;
    }
}
For Linux, "ps aux | grep kafka" see if kafka properties are shown in the results. E.g. /path/to/kafka/server.properties
Paul's answer is very good, and it is actually how Kafka and ZooKeeper work together from a broker's point of view.
I would say that another easy option to check whether a Kafka server is running is to create a simple KafkaConsumer pointing to the cluster and try some action, for example listTopics(). If the Kafka server is not running, you will get a TimeoutException, which you can handle with a try-catch block.
def validateKafkaConnection(kafkaParams : mutable.Map[String, Object]) : Unit = {
val props = new Properties()
props.put("bootstrap.servers", kafkaParams.get("bootstrap.servers").get.toString)
props.put("group.id", kafkaParams.get("group.id").get.toString)
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val simpleConsumer = new KafkaConsumer[String, String](props)
simpleConsumer.listTopics()
}
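The same idea in Java, with the try-catch described above (a sketch; the group.id value is arbitrary and the exact timeout behavior depends on the client version):
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "connection-check");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.listTopics(); // throws org.apache.kafka.common.errors.TimeoutException if no broker answers
    System.out.println("Kafka is reachable");
} catch (org.apache.kafka.common.errors.TimeoutException e) {
    System.out.println("Kafka is not reachable: " + e.getMessage());
}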
A good option is to use AdminClient, as below, before starting to produce or consume messages:
private static final int ADMIN_CLIENT_TIMEOUT_MS = 5000;
try (AdminClient client = AdminClient.create(properties)) {
client.listTopics(new ListTopicsOptions().timeoutMs(ADMIN_CLIENT_TIMEOUT_MS)).listings().get();
} catch (ExecutionException ex) {
LOG.error("Kafka is not available, timed out after {} ms", ADMIN_CLIENT_TIMEOUT_MS);
return;
}
First, you need to create an AdminClient bean:
@Bean
public AdminClient adminClient(){
Map<String, Object> configs = new HashMap<>();
configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
StringUtils.arrayToCommaDelimitedString(new Object[]{"your bootstrap server address"}));
return AdminClient.create(configs);
}
Then, you can use this script:
while (true) {
Map<String, ConsumerGroupDescription> groupDescriptionMap =
adminClient.describeConsumerGroups(Collections.singletonList(groupId))
.all()
.get(10, TimeUnit.SECONDS);
ConsumerGroupDescription consumerGroupDescription = groupDescriptionMap.get(groupId);
log.debug("Kafka consumer group ({}) state: {}",
groupId,
consumerGroupDescription.state());
if (consumerGroupDescription.state().equals(ConsumerGroupState.STABLE)) {
boolean isReady = true;
for (MemberDescription member : consumerGroupDescription.members()) {
if (member.assignment() == null || member.assignment().topicPartitions().isEmpty()) {
isReady = false;
}
}
if (isReady) {
break;
}
}
log.debug("Kafka consumer group ({}) is not ready. Waiting...", groupId);
TimeUnit.SECONDS.sleep(1);
}
This script checks the state of the consumer group every second until the state is STABLE. Because all consumers are then assigned to topic partitions, you can conclude that the server is running and ready.
You can use the code below to check for available brokers, i.e. whether the server is running.
import org.I0Itec.zkclient.ZkClient;
public static boolean isBrokerRunning(){
boolean flag = false;
ZkClient zkClient = new ZkClient(endpoint.getZookeeperConnect(), 10000);//, kafka.utils.ZKStringSerializer$.MODULE$);
if(zkClient!=null){
int brokersCount = zkClient.countChildren(ZkUtils.BrokerIdsPath());
if(brokersCount > 0){
logger.info("Following Broker(s) {} is/are available on Zookeeper.",zkClient.getChildren(ZkUtils.BrokerIdsPath()));
flag = true;
}
else{
logger.error("ERROR:No Broker is available on Zookeeper.");
}
zkClient.close();
}
return flag;
}
I found an OnError event in the Confluent Kafka .NET client:
consumer.OnError += Consumer_OnError;
private void Consumer_OnError(object sender, Error e)
{
Debug.Log("connection error: "+ e.Reason);
ConsumerConnectionError(e);
}
And its documentation in code:
//
// Summary:
// Raised on critical errors, e.g. connection failures or all brokers down. Note
// that the client will try to automatically recover from errors - these errors
// should be seen as informational rather than catastrophic
//
// Remarks:
// Executes on the same thread as every other Consumer event handler (except OnLog
// which may be called from an arbitrary thread).
public event EventHandler<Error> OnError;
I'm trying to connect to SAP ECC 6.0 using JCo. I'm following this tutorial. However, there is a Note saying:
For this example the destination configuration is stored in a file that is called by the program. In practice you should avoid this for security reasons.
That is reasonable and understood, but there is no explanation of how to set up a secure destination provider.
I found a solution in this thread that creates a custom implementation of DestinationDataProvider, and that works on my local machine. But when I deploy it on the Portal, I get an error saying that there is already a registered DestinationDataProvider.
So my question is:
How to store destination data in SAP Java EE application?
Here is my code to further clarify what I'm trying to do.
public static void main(String... args) throws JCoException {
CustomDestinationProviderMap provider = new CustomDestinationProviderMap();
com.sap.conn.jco.ext.Environment.registerDestinationDataProvider(provider);
Properties connectProperties = new Properties();
connectProperties.setProperty(DestinationDataProvider.JCO_ASHOST, "host.sap.my.domain.com");
connectProperties.setProperty(DestinationDataProvider.JCO_SYSNR, "00");
connectProperties.setProperty(DestinationDataProvider.JCO_CLIENT, "100");
connectProperties.setProperty(DestinationDataProvider.JCO_USER, "user");
connectProperties.setProperty(DestinationDataProvider.JCO_PASSWD, "password");
connectProperties.setProperty(DestinationDataProvider.JCO_LANG, "en");
provider.addDestination(DESTINATION_NAME1, connectProperties);
connect();
}
public static void connect() throws JCoException {
String FUNCTION_NAME = "BAPI_EMPLOYEE_GETDATA";
JCoDestination destination = JCoDestinationManager.getDestination(DESTINATION_NAME1);
JCoContext.begin(destination);
JCoFunction function = destination.getRepository().getFunction(FUNCTION_NAME);
if (function == null) {
throw new RuntimeException(FUNCTION_NAME + " not found in SAP.");
}
//function.getImportParameterList().setValue("EMPLOYEE_ID", "48");
function.getImportParameterList().setValue("FSTNAME_M", "ANAKIN");
function.getImportParameterList().setValue("LASTNAME_M", "SKYWALKER");
try {
function.execute(destination);
} catch (AbapException e) {
System.out.println(e.toString());
return;
}
JCoTable table = function.getTableParameterList().getTable("PERSONAL_DATA");
for (int i = 0; i < table.getNumRows(); i++) {
table.setRow(i);
System.out.println(table.getString("PERNO") + '\t' + table.getString("FIRSTNAME") + '\t' + table.getString("LAST_NAME")
+'\t' + table.getString("BIRTHDATE")+'\t' + table.getString("GENDER"));
}
JCoContext.end(destination);
}
OK, so I got this up and running and thought I'd share my research.
You need to add your own destination in the Portal. To achieve that, go to NetWeaver Administrator, located at host:port/nwa. So it'll be something like sapportal.your.domain.com:50000/nwa.
Then go to Configuration -> Infrastructure -> Destinations and add your destination there. You can leave most of the fields, such as Message Server, empty. The important parts are the destination name, as that is how you will retrieve it, and the destination type, which in my case should be set to RFC Destination. Try pinging your newly created destination to check that it's up and running.
Finally, you should be able to get the destination by simply calling JCoDestination destination = JCoDestinationManager.getDestination(DESTINATION_NAME); as it is added to your Portal environment and managed from there.
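If you also want to verify the destination from code, here is a small sketch (the destination name is whatever you entered in NWA; ping() is optional and simply fails fast if the backend cannot be reached):
// the destination data is managed by the Portal/NWA, so no DestinationDataProvider is registered here
JCoDestination destination = JCoDestinationManager.getDestination("MY_RFC_DESTINATION");
destination.ping(); // throws JCoException if the connection cannot be established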
Take a look at the CustomDestinationDataProvider in the JCo examples of the JCo connector download. The important parts are:
static class MyDestinationDataProvider implements DestinationDataProvider
...
com.sap.conn.jco.ext.Environment.registerDestinationDataProvider(new MyDestinationDataProvider());
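For reference, a minimal sketch of such a provider, modeled on the JCo example (everything except the three interface methods is illustrative):
import java.util.HashMap;
import java.util.Properties;

import com.sap.conn.jco.ext.DestinationDataEventListener;
import com.sap.conn.jco.ext.DestinationDataProvider;

class MyDestinationDataProvider implements DestinationDataProvider {

    private final HashMap<String, Properties> destinations = new HashMap<>();

    // JCo calls this whenever it needs the connection data for a destination name
    @Override
    public Properties getDestinationProperties(String destinationName) {
        Properties properties = destinations.get(destinationName);
        if (properties == null) {
            throw new RuntimeException("Destination " + destinationName + " is not available");
        }
        return properties;
    }

    @Override
    public void setDestinationDataEventListener(DestinationDataEventListener eventListener) {
        // keep a reference here if you want to fire change/delete events later
    }

    @Override
    public boolean supportsEvents() {
        return false;
    }

    // convenience method (not part of the interface) used to register destinations
    void addDestination(String destinationName, Properties properties) {
        destinations.put(destinationName, properties);
    }
}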
Then you can simply do:
instance = JCoDestinationManager.getDestination(DESTINATION_NAME);
By the way, you may also want to check out http://hibersap.org/, as it provides nice ways to store the configuration as well.
I am trying to let multiple Java processes exchange events using Esper. One process should send events; the other prepares a query and reacts according to the reported events.
When both operations are done within the same Java process, everything works fine. But when I use two different processes, they just don't see each other.
I am wondering what the key for this communication is. I used the same name for the provider; this is all I could do so far.
The Producer:
String aType = espertest.dummy.A.class.getName();
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
Object o = new A();
epService.getEPRuntime().sendEvent(o);
The Consumer:
String aType = A.class.getName();
String expression = "select count(*) from "+aType + "";
System.out.println("Our Query: " + expression);
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
EPStatement statement = epService.getEPAdministrator().createEPL(expression);
DummyListener listener = new DummyListener();
statement.addListener(listener);
System.out.println("Anything");
try{
A a = new A();
epService.getEPRuntime().sendEvent(a);
Thread.sleep(60000);
}catch(Exception E)
{
System.out.println("Exception ");
}
The consumer tries to count the events of type A. It also sends an instance of A as a test, and this works fine. The listener is called as expected.
The code above is just an excerpt.
You need to configure middleware (message queue, distributed cache, networked file system, socket connection, etc.) to get the events from the producer JVM to the consumer JVM. If you can deploy the producer and consumer to a container that supports Apache Camel (e.g. ServiceMix), it should be trivial to stand up a prototype that uses ActiveMQ to transport your objects into Esper, as Camel has support for both products.
JVM 1:
    From Data Source
    To CEP Engine 1
    To Message Queue
JVM 2 (could also host the MQ broker):
    From Message Queue
    To CEP Engine 2
    To Destination
Update:
If the producer and consumer can be threads in the same JVM, then the issue may be in the consumer. I cannot see where the consumer does anything with the event from the producer. Try something like this instead (the Esper service reference is passed to both the producer and the consumer, and the consumer is reworked with an update method to handle the results of the select statement).
Test Driver:
public Driver() {
String aType = espertest.dummy.A.class.getName();
Configuration cepConfig = new Configuration();
cepConfig.addEventType("A",aType);
EPServiceProvider epService = EPServiceProviderManager.getProvider("DummyProvider", cepConfig);
Consumer c = new Consumer(epService);
Producer p = new Producer(epService);
}
Producer:
public Producer(EPServiceProvider epsp) {
Object o = new A();
epsp.getEPRuntime().sendEvent(o);
}
Consumer:
public Consumer(EPServiceProvider epsp) {
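    // "input" holds the EPL statement text, e.g. the select expression from the original consumer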
EPStatement statement = epsp.getEPAdministrator().createEPL(input);
statement.setSubscriber(this);
}
public void update(A event) {
System.out.println("Consumer received event!");
}