Read a file using the Kafka file connector in Java

I have written a very simple program in Java that reads a file and sends its records to a Kafka topic. Everything is working as expected. But instead of reading the file in my own code, I want to use the Kafka file connector. I did this in the past using the REST proxy (curl) but have never tried it from Java, and I need some help to do it.
I can see there is a Kafka Connect API in the Maven repository that I can add to my pom.xml file. What should my next step be to integrate it into my Java code?
My code to read the file without Kafka Connect:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Properties;
import java.util.Scanner;

public class SimpleProducer_ReadFile {
    public static void main(String[] args) throws FileNotFoundException {
        // set producer properties
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // create the producer
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);

        // read the file line by line
        File file = new File("C:\\Users\\Desktop\\TestFile.txt");
        Scanner scan = new Scanner(file);
        while (scan.hasNextLine()) {
            String data = scan.nextLine();
            System.out.println(data);

            // create the producer record and send it
            ProducerRecord<String, String> record = new ProducerRecord<String, String>("test-topic", data);
            producer.send(record);
        }
        scan.close();

        // flush and close
        producer.flush();
        producer.close();
    }
}

All you need is Kafka Connect with the FileStreamSource connector, which reads data from a file and sends it to Kafka.
In your case, the configuration should be:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/path/to/file.txt
topic=test-topic
Now the equivalent curl command would be:
curl -X POST \
  -H "Content-Type: application/json" \
  --data '{"name": "local-file-source", "config": {"connector.class":"FileStreamSource", "tasks.max":"1", "file":"/path/to/file.txt", "topic":"test-topic"}}' \
  http://localhost:8083/connectors
If you want to do this programmatically from Java, simply send the same POST request to the Kafka Connect REST API, as sketched below.
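A minimal sketch of that POST in plain Java (no extra libraries), assuming a Kafka Connect worker is running in distributed mode on localhost:8083; the class name and file path are illustrative:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateFileSourceConnector {
    public static void main(String[] args) throws Exception {
        // POST the connector configuration to the Connect REST API
        URL url = new URL("http://localhost:8083/connectors");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String payload = "{\"name\": \"local-file-source\", \"config\": {"
                + "\"connector.class\": \"FileStreamSource\", "
                + "\"tasks.max\": \"1\", "
                + "\"file\": \"/path/to/file.txt\", "
                + "\"topic\": \"test-topic\"}}";

        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Connect REST API responded with HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}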

Related

Copy JSON file from Local to HDFS

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriter extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        //String localInputPath = args[0];
        Path outputPath = new Path(args[0]); // argument for the HDFS output location
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        OutputStream os = fs.create(outputPath);
        // The local dataset is read through a buffered input stream
        InputStream is = new BufferedInputStream(new FileInputStream("/home/acadgild/acadgild.txt"));
        // Copy the dataset from the input stream to the HDFS output stream
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}
I need to move the data from local to HDFS.
I got the above code from another blog and it's not working. Can anyone help me with this?
I also need to parse the JSON using MapReduce, group it by DateTime, and move it to HDFS.
MapReduce is a distributed job-processing framework: for each mapper, "local" means the local filesystem of the node that mapper is running on.
What you want is to read from the local filesystem of a given node, put the data onto HDFS, and then process it via MapReduce.
There are multiple tools available for copying from the local filesystem of one node to HDFS (a Java sketch follows this list):
hdfs dfs -put localPath hdfsPath (shell command)
Flume
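If you prefer to do it from Java, here is a minimal sketch using the HDFS FileSystem API; the two paths are assumptions you would replace with your own, and the Configuration must pick up a fs.defaultFS that points at your NameNode:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyLocalToHdfs {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths; replace with your own local file and HDFS target
        String localFile = "/home/acadgild/acadgild.txt";
        String hdfsTarget = "/user/acadgild/acadgild.txt";

        // Loads core-site.xml / hdfs-site.xml from the classpath, which must define fs.defaultFS
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copies the local file to HDFS in one call
            fs.copyFromLocalFile(new Path(localFile), new Path(hdfsTarget));
        }
    }
}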

Write to a Kafka topic based on the content of the record using Kafka Streams

I'm trying to write from one topic (parent) to another topic (child) in Kafka based on the content of the records in the parent.
A sample record if i consume from the parent topic is {"date":{"string":"2017-03-20"},"time":{"string":"20:04:13:563"},"event_nr":1572470,"interface":"Transaction Manager","event_id":5001,"date_time":1490040253563,"entity":"Transaction Manager","state":0,"msg_param_1":{"string":"ISWSnk"},"msg_param_2":{"string":"Application startup"},"msg_param_3":null,"msg_param_4":null,"msg_param_5":null,"msg_param_6":null,"msg_param_7":null,"msg_param_8":null,"msg_param_9":null,"long_msg_param_1":null,"long_msg_param_2":null,"long_msg_param_3":null,"long_msg_param_4":null,"long_msg_param_5":null,"long_msg_param_6":null,"long_msg_param_7":null,"long_msg_param_8":null,"long_msg_param_9":null,"last_sent":{"long":1490040253563},"transmit_count":{"int":1},"team_id":null,"app_id":{"int":4},"logged_by_app_id":{"int":4},"entity_type":{"int":3},"binary_data":null}.
I'd like to use the value of entity to write to a topic with the same name as that value (there's a fixed set of entity values, so I can create the topics statically if creating them dynamically from the program proves difficult). This is what I'm trying:
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

import java.util.Properties;

public class entityDataLoader {
    public static void main(final String[] args) throws Exception {
        final Properties streamsConfiguration = new Properties();
        streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "map-function-lambda-example");
        streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.ByteArray().getClass().getName());
        streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Set up serializers and deserializers, which we will use for overriding the default serdes
        // specified above.
        final Serde<String> stringSerde = Serdes.String();
        final Serde<byte[]> byteArraySerde = Serdes.ByteArray();

        // In the subsequent lines we define the processing topology of the Streams application.
        final KStreamBuilder builder = new KStreamBuilder();

        // Read the input Kafka topic into a KStream instance.
        final KStream<byte[], String> textLines = builder.stream(byteArraySerde, stringSerde, "postilion-events");

        String content = textLines.toString();
        String entity = JSONExtractor.returnJSONValue(content, "entity");
        System.out.println(entity);
        textLines.to(entity);

        final KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
        streams.cleanUp();
        streams.start();

        // Add shutdown hook to respond to SIGTERM and gracefully close Kafka Streams
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
The content of content is org.apache.kafka.streams.kstream.internals.KStreamImpl@568db2f2, making it obvious that KStream.toString() isn't the right method to use to get the value of entity.
P.S. The JSONExtractor class is defined as:
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

class JSONExtractor {
    public static String returnJSONValue(String args, String value) {
        JSONParser parser = new JSONParser();
        String app = null;
        System.out.println(args);
        try {
            Object obj = parser.parse(args);
            JSONObject jObj = (JSONObject) obj;
            app = (String) jObj.get(value);
            return app;
        } catch (ParseException pe) {
            System.out.println("No Object found");
            System.out.println(pe);
        }
        return app;
    }
}
You can use branch() to split your parent stream into "sub streams" and write each "sub stream" to its own output topic (cf. http://docs.confluent.io/current/streams/developer-guide.html#stateless-transformations).
Your branch() must create one "sub stream" per output topic, but because you know all your topics up front, this should not be a problem.
Also, for Kafka Streams it's recommended to create all output topics before you start your application (cf. http://docs.confluent.io/current/streams/developer-guide.html#user-topics).
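A minimal sketch of that approach, reusing the JSONExtractor helper and the same KStreamBuilder API from your code; "EntityA" and "EntityB" are hypothetical stand-ins for your fixed set of entity values, and the matching topics must already exist:

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

import java.util.Properties;

public class EntityRouter {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "entity-router-example");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        Serde<byte[]> byteArraySerde = Serdes.ByteArray();
        Serde<String> stringSerde = Serdes.String();

        KStreamBuilder builder = new KStreamBuilder();
        KStream<byte[], String> records =
                builder.stream(byteArraySerde, stringSerde, "postilion-events");

        // One predicate per known entity value; each record lands in the first branch that matches
        @SuppressWarnings("unchecked")
        KStream<byte[], String>[] branches = records.branch(
                (key, value) -> "EntityA".equals(JSONExtractor.returnJSONValue(value, "entity")),
                (key, value) -> "EntityB".equals(JSONExtractor.returnJSONValue(value, "entity")));

        // Write each branch to the topic named after its entity value (create these topics beforehand)
        branches[0].to(byteArraySerde, stringSerde, "EntityA");
        branches[1].to(byteArraySerde, stringSerde, "EntityB");

        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}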

Sending Data to Kafka Producer

I am trying to read a 100k-record file and send it to a Kafka topic. Here is my Kafka code, which sends data to kafka-console-consumer. When I send the data, I receive it like this:
java.util.stream.ReferencePipeline$Head@e9e54c2
Here is a sample single record of the data I am sending:
173|172686|548247079|837113012|0x548247079f|7|173|172686a|0|173|2059 22143|0|173|1|173|172686|||0|||7|0||7|||7|172686|allowAllServices|?20161231:22143|548247079||0|173||172686|5:2266490827:DCCInter;20160905152146;2784
Any suggestion on how to get the data shown above would be appreciated. Thanks.
Code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.stream.Stream;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

@SuppressWarnings("unused")
public class HundredKRecords {
    private static String sCurrentLine;

    public static void main(String args[]) throws InterruptedException, ExecutionException {
        String fileName = "/Users/sreeeedupuganti/Downloads/octfwriter.txt";

        // read file into stream, try-with-resources
        try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            stream.forEach(System.out::println);
            kafka(stream.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void kafka(String stream) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.put("request.required.acks", "1");

        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);
        producer.send(new KeyedMessage<String, String>("test", stream));
        producer.close();
    }
}
The problem is in the line kafka(stream.toString());.
The Java Stream class doesn't override toString(). By default it returns getClass().getName() + '@' + Integer.toHexString(hashCode()), which is exactly what you receive.
In order to send the whole file to Kafka, you have to manually convert it into one String (or an array of bytes), as sketched below.
Please note that Kafka has a limit on message size.
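A minimal sketch of that conversion, assuming the same old kafka.javaapi.producer API, properties, and topic from your code; it reads the whole file into one String before sending (alternatively, you could send each line as its own message inside the forEach):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class WholeFileProducer {
    public static void main(String[] args) throws Exception {
        // Read the entire file into a single String (mind the broker's message size limit)
        String content = new String(
                Files.readAllBytes(Paths.get("/Users/sreeeedupuganti/Downloads/octfwriter.txt")),
                StandardCharsets.UTF_8);

        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.put("request.required.acks", "1");

        // Send the file contents as one message to the "test" topic
        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test", content));
        producer.close();
    }
}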

Unable to push messages to Apache Kafka?

I am new to Kafka and trying to run a sample Apache Java producer to push data to Kafka. I am able to create new topics through Java, but while pushing I am getting an exception. Here is the code:
package kafkaTest;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Properties;

import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.serialize.BytesPushThroughSerializer;

import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class HelloKafkaProducer {
    final static String TOPIC = "test_kafka1";

    public static void main(String[] argv) {
        Properties properties = new Properties();
        properties.put("metadata.broker.list", "172.25.37.66:9092");

        ZkClient zkClient = new ZkClient("172.25.37.66:2181", 4000, 6000, new BytesPushThroughSerializer());
        List<String> brokerList = zkClient.getChildren("/brokers/topics");
        for (int i = 0; i < brokerList.size(); i++) {
            System.out.println(brokerList.get(i));
        }

        properties.put("zk.connect", "172.25.37.66:2181");
        properties.put("serializer.class", "kafka.serializer.StringEncoder");

        ProducerConfig producerConfig = new ProducerConfig(properties);
        kafka.javaapi.producer.Producer<String, String> producer =
                new kafka.javaapi.producer.Producer<String, String>(producerConfig);

        SimpleDateFormat sdf = new SimpleDateFormat();
        KeyedMessage<String, String> message =
                new KeyedMessage<String, String>(TOPIC, "Test message from java program " + sdf.format(new Date()));
        System.out.println(message);
        producer.send(message);

        /*Consumer consumerThread = new Consumer(TOPIC);
        consumerThread.start();*/
    }
}
And this is the stacktrace:
topic1
test_kafka1
topic11
test
test_kafka
KeyedMessage(test_kafka1,null,null,Test message from java program 4/5/15 1:30 PM)
Exception in thread "main" [2015-05-04 13:30:41,432] ERROR Failed to send requests for topics test_kafka1 with correlation ids in [0,12] (kafka.producer.async.DefaultEventHandler:97)
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
at kafka.producer.Producer.send(Producer.scala:77)
at kafka.javaapi.producer.Producer.send(Producer.scala:33)
at kafkaTest.HelloKafkaProducer.main(HelloKafkaProducer.java:54)
On the console, I see [2015-05-04 18:55:29,959] INFO Closing socket connection to /172.17.70.73. (kafka.network.Processor) every time I run the program. I am able to push and pull to the topics using the console.
All help would be appreciated.
Thanks.
You don't connect to ZooKeeper in the case of a Kafka producer; you have to connect to the broker. For that purpose, use the following property:
props.put("metadata.broker.list", "localhost:9092, broker1:9092");
Here I have used localhost; in your case it will be 172.25.37.66.
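Putting it together, a minimal sketch of a producer that uses only the broker list (no ZooKeeper properties), assuming the same old kafka.javaapi.producer API, broker address, and topic from your code:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class HelloKafkaProducerFixed {
    public static void main(String[] args) {
        Properties properties = new Properties();
        // Point the producer at the broker, not at ZooKeeper
        properties.put("metadata.broker.list", "172.25.37.66:9092");
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        properties.put("request.required.acks", "1");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(properties));
        producer.send(new KeyedMessage<String, String>("test_kafka1",
                "Test message from java program"));
        producer.close();
    }
}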

How to programmatically put data into the Google App Engine database from a remote executable?

I would like to pre-fill, and then periodically put, data into the Google App Engine database.
I would like to write a program in Java or Python that connects to my GAE service and uploads data to my database.
How can I do that?
Thanks
Please use the Remote API for doing this programmatically.
In Python, you can first configure appengine_console.py as described here.
Once you have that, you can launch the Python shell and run the following commands:
$ python appengine_console.py yourapp
>>> import yourmodelclassnamehere
>>> m = yourmodelclassnamehere(x='', y='')
>>> m.put()
And here is the code for the Java version, which is self-explanatory (directly borrowed from the Remote API page in the GAE docs):
package remoteapiexample;

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.tools.remoteapi.RemoteApiInstaller;
import com.google.appengine.tools.remoteapi.RemoteApiOptions;

import java.io.IOException;

public class RemoteApiExample {
    public static void main(String[] args) throws IOException {
        String username = System.console().readLine("username: ");
        String password =
                new String(System.console().readPassword("password: "));
        RemoteApiOptions options = new RemoteApiOptions()
                .server("<your app>.appspot.com", 443)
                .credentials(username, password);
        RemoteApiInstaller installer = new RemoteApiInstaller();
        installer.install(options);
        try {
            DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
            System.out.println("Key of new entity is " +
                    ds.put(new Entity("Hello Remote API!")));
        } finally {
            installer.uninstall();
        }
    }
}
