I'm trying to count the number of a user's original tweets after I've stored all of the tweets I downloaded in a MongoDB database using Storm. Whenever I count the number of an author's original tweets using the following code, it keeps reading (and counting) the same tweet.
Bolt:
public class CalculateTheMetrics extends BaseBasicBolt {
Map<String,Double>OT1=new HashMap<String, Double>();
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("USERNAME","OT1"));
}
@Override
public void execute(Tuple input,BasicOutputCollector collector) {
String author=input.getString(0);
String tweet=input.getString(2);
Double OT1=this.OT1.get(author);
if(OT1==null){
OT1=0.0;
}
if(author!=null && tweet!=null ){
if(!tweet.startsWith("#") || !tweet.startsWith("RT")){
OT1+=1;
}
this.OT1.put(author,OT1);
System.out.println(author + " " + OT1);
collector.emit(new Values(author, OT1));
}
}
}
Topology:
public class TheAuthorsAndTheirTweetData {
public static void main(String[]args) throws Exception{
TopologyBuilder topologyBuilder=new TopologyBuilder();
topologyBuilder.setSpout("READ_TWEET_DATA_FROM_MONGODB", new ReadLinesFromTextFile("tweets.txt"));
topologyBuilder.setBolt("TWEET_DATA_FROM_MONGODB_TO_FURTHER_PROCESSING",new FromMongoDBToProcessing()).shuffleGrouping("READ_TWEET_DATA_FROM_MONGODB");
topologyBuilder.setSpout("READ_THE_AUTHORS_FROM_TEXT_FILE",new ReadLastLineFromTextFile("authors.txt"));
topologyBuilder.setBolt("FROM_THE_AUTHORS_TEXT_FILE_TO_FURTHER_PROCESSING", new FromTheAuthorsTextFileToFurtherProcessing()).shuffleGrouping("READ_THE_AUTHORS_FROM_TEXT_FILE");
topologyBuilder.setBolt("SEARCH_FOR_THE_AUTHORS_TWEET_DATA",new SearchForTheAuthorsTweetData(),16).fieldsGrouping("TWEET_DATA_FROM_MONGODB_TO_FURTHER_PROCESSING",new Fields("USERNAME","ID")).fieldsGrouping("FROM_THE_AUTHORS_TEXT_FILE_TO_FURTHER_PROCESSING",new Fields("USERNAME","ID"));
topologyBuilder.setBolt("CALCULATE_THE_METRICS",new CalculateTheMetrics(),64).fieldsGrouping("SEARCH_FOR_THE_AUTHORS_TWEET_DATA",new Fields("USERNAME"));
Config config=new Config();
if(args!=null && args.length>0){
config.setNumWorkers(10);
config.setNumAckers(5);
config.setMaxSpoutPending(100);
StormSubmitter.submitTopology(args[0], config, topologyBuilder.createTopology());
}else{
LocalCluster localCluster=new LocalCluster();
localCluster.submitTopology("Test",config,topologyBuilder.createTopology());
Utils.sleep(1*60*60*1000);
localCluster.killTopology("Test");
localCluster.shutdown();
}
}
}
What I want is for it to stop repeatedly reading and counting the same tweet. Please help.
Something like this?
public class Calculate1Metric extends BaseRichBolt {
private OutputCollector collector;
Map<String ,Integer>OT1;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("username","OT1"));
}
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector=collector;
this.OT1=new HashMap<String, Integer>();
}
@Override
public void execute(Tuple input) {
final String sourceComponent = input.getSourceComponent();
String author = input.getString(0);
String tweet = input.getString(2);
if (author != null && tweet != null) {
Integer OT1 = this.OT1.get(author);
if (OT1 == null) {
OT1 = 0;
}
if (!tweet.startsWith("#") && !tweet.startsWith("RT") && !tweet.contains("RT ")) {
OT1 += 1;
}
if(!this.OT1.containsKey(author)) {
this.OT1.put(author, OT1);
}else{
collector.emit(new Values(author, OT1));
System.out.println(author + " " + OT1);
this.OT1.remove(author);
}
} else {
collector.fail(input);
return;
}
collector.ack(input);
}
}
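If the real issue is that the same tweet arrives more than once, another option is simply to remember which tweet ids have already been counted. Below is a minimal sketch, not a drop-in replacement: it assumes the tuple carries a unique tweet id at index 1 (as the Fields("USERNAME","ID") grouping in the topology suggests), the class and field names are made up for illustration, and the imports assume the org.apache.storm packages (use backtype.storm for older releases).

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class CountOriginalTweetsBolt extends BaseBasicBolt {

    // Tweet ids that have already been counted, so duplicates and replays are skipped.
    private final Set<String> seenTweetIds = new HashSet<>();
    // Running count of original tweets per author.
    private final Map<String, Double> originalTweetCounts = new HashMap<>();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String author = input.getString(0);
        String tweetId = input.getString(1); // assumed position of the tweet id
        String tweet = input.getString(2);
        if (author == null || tweet == null || !seenTweetIds.add(tweetId)) {
            return; // missing fields, or a tweet this task has already counted
        }
        // An "original" tweet here is one that neither is a retweet nor starts with a hashtag.
        if (!tweet.startsWith("RT") && !tweet.startsWith("#")) {
            double count = originalTweetCounts.merge(author, 1.0, Double::sum);
            collector.emit(new Values(author, count));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("USERNAME", "OT1"));
    }
}

Note that the set only deduplicates within a single task, so the fieldsGrouping on USERNAME (which routes all tuples of one author to the same task) is what makes this safe.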
Related
I am processing a stream of messages using the Java KStreams API. Currently my code emits output every 5 minutes, but I want the output at the top of each 5-minute interval (i.e. 17:10, 17:15, etc.).
Right now the interval depends on the time the program started: if the program starts at 17:08, the data gets collected at 17:13, 17:18, 17:23, and so on.
Is there a way that I can schedule so the data gets emitted at 5 min intervals which are multiples of 5?
class WindowedTransformerExample {
public static void main(String[] args) {
final StreamsBuilder builder = new StreamsBuilder();
final String stateStoreName = "stateStore";
final StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
Stores.keyValueStoreBuilder(Stores.inMemoryKeyValueStore(stateStoreName),
Serdes.String(),
Serdes.String());
builder.addStateStore(keyValueStoreBuilder);
builder.<String, String>stream("topic").transform(new
WindowedTransformer(stateStoreName), stateStoreName)
.filter((k, v) -> k != null && v != null)
// Here's where you do something with records emitted every 5 minutes
.foreach((k, v)-> System.out.println());
}
static final class WindowedTransformer implements TransformerSupplier<String, String, KeyValue<String, String>> {
private final String storeName;
public WindowedTransformer(final String storeName) {
this.storeName = storeName;
}
@Override
public Transformer<String, String, KeyValue<String, String>> get() {
return new Transformer<String, String, KeyValue<String, String>>() {
private KeyValueStore<String, String> keyValueStore;
private ProcessorContext processorContext;
@Override
public void init(final ProcessorContext context) {
processorContext = context;
keyValueStore = (KeyValueStore<String, String>) context.getStateStore(storeName);
// could change this to PunctuationType.STREAM_TIME if needed
context.schedule(Duration.ofMinutes(5), PunctuationType.WALL_CLOCK_TIME, (ts) -> {
try(final KeyValueIterator<String, String> iterator = keyValueStore.all()) {
while (iterator.hasNext()) {
final KeyValue<String, String> keyValue = iterator.next();
processorContext.forward(keyValue.key, keyValue.value);
}
}
});
}
@Override
public KeyValue<String, String> transform(String key, String value) {
if (key != null) {
keyValueStore.put(key, value);
}
return null;
}
@Override
public void close() {
}
};
}
}
}
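The punctuator above fires every five minutes from whenever the task starts, so it inherits the start-time offset the question complains about. One way to align flushes with clock multiples of five minutes is sketched below: punctuate more frequently and only flush on a boundary. The one-minute interval and the modulo check are my assumptions, not part of the answer above; init() is otherwise unchanged.

@Override
public void init(final ProcessorContext context) {
    processorContext = context;
    keyValueStore = (KeyValueStore<String, String>) context.getStateStore(storeName);
    // Punctuate every minute, but only flush during a minute that is a multiple of five
    // (17:10, 17:15, ...). Hours are multiples of five minutes, so checking
    // (epochMillis / 60000) % 5 == 0 is the same as "minute of the hour is a multiple of five".
    context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, (ts) -> {
        if ((ts / 60_000L) % 5 != 0) {
            return; // not at the top of a 5-minute interval yet
        }
        try (final KeyValueIterator<String, String> iterator = keyValueStore.all()) {
            while (iterator.hasNext()) {
                final KeyValue<String, String> keyValue = iterator.next();
                processorContext.forward(keyValue.key, keyValue.value);
            }
        }
    });
}

If the flush has to land exactly on the boundary (to the second), a shorter punctuation interval plus remembering the last boundary that was flushed would tighten it further; for five-minute windows the per-minute check is usually close enough.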
I'm using Apache Storm to create a topology that initially reads a "stream" of tuples from a file, then splits the tuples and stores them in MongoDB.
I have a cluster on Atlas with a shared replica set. I've already developed the topology, and the solution works properly if I use a single thread.
public static StormTopology build() {
return buildWithSpout();
}
public static StormTopology buildWithSpout() {
Config config = new Config();
TopologyBuilder builder = new TopologyBuilder();
CsvSpout datasetSpout = new CsvSpout("file.txt");
SplitterBolt splitterBolt = new SplitterBolt(",");
PartitionMongoInsertBolt insertPartitionBolt = new PartitionMongoInsertBolt();
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 1).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 1).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
return builder.createTopology();
}
However, when I use parallel executors, my persistor bolt doesn't save all the tuples in MongoDB, even though the tuples are correctly emitted by the previous bolt.
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 3).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 3).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
This is my first bolt:
public class SplitterBolt extends BaseBasicBolt {
private String del;
private MongoConnector db = null;
public SplitterBolt(String del) {
this.del = del;
}
public void prepare(Map stormConf, TopologyContext context) {
db = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
String tuple = input.getStringByField("tuple");
int idTuple = Integer.parseInt(input.getStringByField("id"));
String opString = "";
String[] data = tuple.split(this.del);
for(int i=0; i < data.length; i++) {
OpenBitSet attrs = new OpenBitSet();
attrs.fastSet(i);
opString = Utility.toStringOpenBitSet(attrs, 5);
collector.emit(new Values(idTuple, opString, data[i]));
}
db.incrementCount();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("idtuple","binaryattr","value"));
}
}
And this is my persistor bolt that store in mongo all tuples:
public class PartitionMongoInsertBolt extends BaseBasicBolt {
private MongoConnector mongodb = null;
public void prepare(Map stormConf, TopologyContext context) {
//Singleton Instance
mongodb = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
mongodb.insertUpdateTuple(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {}
}
My only doubt is that I used a singleton pattern for the connection class to mongo. Can this be a problem?
UPDATE
This is my MongoConnector class:
public class MongoConnector {
private MongoClient mongoClient = null;
private MongoDatabase database = null;
private MongoCollection<Document> partitionCollection = null;
private static MongoConnector mongoInstance = null;
public MongoConnector() {
MongoClientURI uri = new MongoClientURI("connection string");
this.mongoClient = new MongoClient(uri);
this.database = mongoClient.getDatabase("db.database");
this.partitionCollection = database.getCollection("db.collection");
}
public static MongoConnector getInstance() {
if (mongoInstance == null)
mongoInstance = new MongoConnector();
return mongoInstance;
}
public void insertUpdateTuple(Tuple tuple) {
int idTuple = (Integer) tuple.getValue(0);
String attrs = (String) tuple.getValue(1);
String value = (String) tuple.getValue(2);
value = value.replace('.', ',');
Bson query = Filters.eq("_id", attrs);
Document docIterator = this.partitionCollection.find(query).first();
if (docIterator != null) {
Bson newValue = new Document(value, idTuple);
Bson updateDocument = new Document("$push", newValue);
this.partitionCollection.updateOne(docIterator, updateDocument);
} else {
Document document = new Document();
document.put("_id", attrs);
ArrayList<Integer> partition = new ArrayList<Integer>();
partition.add(idTuple);
document.put(value, partition);
this.partitionCollection.insertOne(document);
}
}
}
SOLUTION UPDATE
I've solved the problem by changing this line:
this.partitionCollection.updateOne(docIterator, updateDocument);
to:
this.partitionCollection.findOneAndUpdate(query, updateDocument);
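As for the singleton doubt: with parallelism 3, several executor threads in the same worker JVM can call getInstance() at the same time, so the lazy initialization should at least be synchronized. A minimal sketch follows; only the synchronized keyword is new, the rest mirrors the class above. MongoClient itself is thread-safe, so sharing one instance across executors is fine.

public static synchronized MongoConnector getInstance() {
    // Synchronizing the lazy init prevents two executor threads from racing
    // through the null check and creating two MongoClient instances.
    if (mongoInstance == null) {
        mongoInstance = new MongoConnector();
    }
    return mongoInstance;
}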
I have a topology in which I am trying to count word occurrences generated by a SimulatorSpout (not a real stream) and then write them to a MySQL table. The table schema is very simple:
Field | Type | ...
ID | int(11) | auto_increment
word | varchar(50) |
count | int(11) |
But I am facing a weird problem (as mentioned above).
I successfully submitted the topology to my Storm cluster, which consists of 4 supervisors, and I can see the flow of the topology in the Storm web UI
(no exceptions), but when I checked the MySQL table, to my surprise, the table was empty...
Any comments or suggestions are welcome.
Here are spouts and bolts:
public class MySQLConnection {
private static Connection conn = null;
private static String dbUrl = "jdbc:mysql://192.168.0.2:3306/test?";
private static String dbClass = "com.mysql.jdbc.Driver";
public static Connection getConnection() throws SQLException, ClassNotFoundException {
Class.forName(dbClass);
conn = DriverManager.getConnection(dbUrl, "root", "qwe123");
return conn;
}
}
============================= SentenceSpout ===============================
public class SentenceSpout extends BaseRichSpout{
private static final long serialVersionUID = 1L;
private boolean _completed = false;
private SpoutOutputCollector _collector;
private String [] sentences = {
"Obama delivered a powerfull speech against USA",
"I like cold beverages",
"RT http://www.turkeyairline.com Turkish Airlines has delayed some flights",
"don't have a cow man...",
"i don't think i like fleas"
};
private int index = 0;
public void open (Map config, TopologyContext context, SpoutOutputCollector collector) {
_collector = collector;
}
public void nextTuple () {
_collector.emit(new Values(sentences[index]));
index++;
if (index >= sentences.length) {
index = 0;
Utils.waitForSeconds(1);
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
public void ack(Object msgId) {
System.out.println("OK: " + msgId);
}
public void close() {}
public void fail(Object msgId) {
System.out.println("FAIL: " + msgId);
}
}
============================ SplitSentenceBolt ==============================
public class SplitSentenceBolt extends BaseRichBolt {
private static final long serialVersionUID = 1L;
private OutputCollector _collector;
public void prepare (Map config, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
public void execute (Tuple tuple) {
String sentence = tuple.getStringByField("sentence");
String httpRegex = "((https?|ftp|telnet|gopher|file)):((//)|(\\\\))+[\\w\\d:##%/;$()~_?\\+-=\\\\\\.&]*";
sentence = sentence.replaceAll(httpRegex, "").replaceAll("RT", "").replaceAll("[.|,]", "");
String[] words = sentence.split(" ");
for (String word : words) {
if (!word.isEmpty())
_collector.emit(new Values(word.trim()));
}
_collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
=========================== WordCountBolt =================================
public class WordCountBolt extends BaseRichBolt {
private static final long serialVersionUID = 1L;
private HashMap<String , Integer> counts = null;
private OutputCollector _collector;
private ResultSet resSet = null;
private Statement stmt = null;
private Connection _conn = null;
private String path = "/home/hduser/logOfStormTops/logger.txt";
String rLine = null;
public void prepare (Map config, TopologyContext context, OutputCollector collector) {
counts = new HashMap<String, Integer>();
_collector = collector;
}
public void execute (Tuple tuple) {
int insertResult = 0;
int updateResult = 0;
String word = tuple.getStringByField("word");
//----------------------------------------------------
if (!counts.containsKey(word)) {
counts.put(word, 1);
try {
insertResult = wordInsertIfNoExist(word);
if (insertResult == 1) {
_collector.ack(tuple);
} else {
_collector.fail(tuple);
}
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
}
} else {
//-----------------------------------------------
counts.put(word, counts.get(word) + 1);
try {
// writing to db
updateResult = updateCountOfExistingWord(word);
if (updateResult == 1) {
_collector.ack(tuple);
} else {
_collector.fail(tuple);
}
// Writing to file
BufferedWriter buffer = new BufferedWriter(new FileWriter(path));
buffer.write("[ " + word + " : " + counts.get(word) + " ]");
buffer.newLine();
buffer.flush();
buffer.close();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("{word-" + word + " : count-" + counts.get(word) + "}");
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
// *****************************************************
public int wordInsertIfNoExist(String word) throws ClassNotFoundException, SQLException {
String query = "SELECT word FROM wordcount WHERE word=\"" + word + "\"";
String insert = "INSERT INTO wordcount (word, count) VALUES (\"" + word + "\", 1)";
_conn = MySQLConnection.getConnection();
stmt = _conn.createStatement();
resSet = stmt.executeQuery(query);
int res = 0;
if (!resSet.next()) {
res = stmt.executeUpdate(insert);
} else {
System.out.println("Yangi qiymatni kirityotrganda nimadir sodir bo'ldi");
}
resSet.close();
stmt.close();
_conn.close();
return res;
}
public int updateCountOfExistingWord(String word) throws ClassNotFoundException, SQLException {
String update = "UPDATE wordcount SET count=count+1 WHERE word=\"" + word + "\"";
_conn = MySQLConnection.getConnection();
stmt = _conn.createStatement();
int result = stmt.executeUpdate(update);
//System.out.println(word + "'s count has been updated (incremented)");
resSet.close();
stmt.close();
_conn.close();
return result;
}
}
========================= WordCountTopology ==============================
public class WordCountTopology {
private static final String SENTENCE_SPOUT_ID = "sentence-spout";
private static final String SPLIT_BOLT_ID = "split-bolt";
private static final String COUNT_BOLT_ID = "count-bolt";
private static final String TOPOLOGY_NAME = "NewWordCountTopology";
@SuppressWarnings("static-access")
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
SentenceSpout spout = new SentenceSpout();
SplitSentenceBolt splitBolt = new SplitSentenceBolt();
WordCountBolt countBolt = new WordCountBolt();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
builder.setBolt(SPLIT_BOLT_ID, splitBolt, 4).shuffleGrouping(SENTENCE_SPOUT_ID);
builder.setBolt(COUNT_BOLT_ID, countBolt, 4).fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
Config config = new Config();
config.setMaxSpoutPending(100);
config.setDebug(true);
StormSubmitter submitter = new StormSubmitter();
submitter.submitTopology(TOPOLOGY_NAME, config, builder.createTopology());
}
}
It is because _collector.ack(tuple) is not being called when an exception is thrown. When there are too many pending tuples, the spout stops sending new ones. Try throwing a RuntimeException instead of just calling printStackTrace.
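In WordCountBolt.execute that suggestion would look roughly like the sketch below; only the catch block changes, the insert branch is shown and the update branch is analogous.

try {
    insertResult = wordInsertIfNoExist(word);
    if (insertResult == 1) {
        _collector.ack(tuple);
    } else {
        _collector.fail(tuple);
    }
} catch (ClassNotFoundException | SQLException e) {
    // Rethrow instead of printStackTrace() so the failure is visible in the
    // worker log and Storm UI rather than being silently swallowed.
    throw new RuntimeException(e);
}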
Below is the code for my implementation of a simple MapReduce job using a custom WritableComparable.
public class MapReduceKMeans {
public static class MapReduceKMeansMapper extends
Mapper<Object, Text, SongDataPoint, Text> {
public void map(Object key, Text value, Context context)
throws InterruptedException, IOException {
String str = value.toString();
// Reading Line one by one from the input CSV.
String split[] = str.split(",");
String trackId = split[0];
String title = split[1];
String artistName = split[2];
SongDataPoint songDataPoint =
new SongDataPoint(new Text(trackId), new Text(title),
new Text(artistName));
context.write(songDataPoint, new Text());
}
}
public static class MapReduceKMeansReducer extends
Reducer<SongDataPoint, Text, Text, NullWritable> {
public void reduce(SongDataPoint key, Iterable<Text> values,
Context context) throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
sb.append(key.getTrackId()).append("\t").
append(key.getTitle()).append("\t")
.append(key.getArtistName()).append("\t");
String write = sb.toString();
context.write(new Text(write), NullWritable.get());
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 2) {
System.err
.println("Usage:<CsV Out Path> <Final Out Path>");
System.exit(2);
}
Job job = new Job(conf, "Song Data Trial");
job.setJarByClass(MapReduceKMeans.class);
job.setMapperClass(MapReduceKMeansMapper.class);
job.setReducerClass(MapReduceKMeansReducer.class);
job.setOutputKeyClass(SongDataPoint.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
When I debug, my code reads all the rows in the CSV file, but it never enters the reduce phase.
I have also used SongDataPoint as my custom Writable.
Its code is below.
public class SongDataPoint implements WritableComparable<SongDataPoint> {
Text trackId;
Text title;
Text artistName;
public SongDataPoint() {
this.trackId = new Text();
this.title = new Text();
this.artistName = new Text();
}
public SongDataPoint(Text trackId, Text title, Text artistName) {
this.trackId = trackId;
this.title = title;
this.artistName = artistName;
}
@Override
public void readFields(DataInput in) throws IOException {
this.trackId.readFields(in);
this.title.readFields(in);
this.artistName.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
}
public Text getTrackId() {
return trackId;
}
public void setTrackId(Text trackId) {
this.trackId = trackId;
}
public Text getTitle() {
return title;
}
public void setTitle(Text title) {
this.title = title;
}
public Text getArtistName() {
return artistName;
}
public void setArtistName(Text artistName) {
this.artistName = artistName;
}
@Override
public int compareTo(SongDataPoint o) {
// TODO Auto-generated method stub
int compare = getTrackId().compareTo(o.getTrackId());
return compare;
}
}
Any help is appreciated. Thanks.
Your output key class as per the driver is SongDataPoint.class and your output value class is Text.class, but you are actually writing Text as the key and NullWritable as the value in the reducer.
You should also specify the mapper output key and value classes as follows:
job.setMapOutputKeyClass(SongDataPoint.class);
job.setMapOutputValueClass(Text.class);
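Following that reasoning, the driver's final output classes presumably need to match the reducer as well; this is my inference from the answer above, not something stated explicitly:

// Reducer emits Text keys and NullWritable values, so the job's output classes should match.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);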
My write method in my custom Writable class was left blank by mistake. Writing the proper code in it solved the problem.
public void write(DataOutput out) throws IOException {
}
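The filled-in body isn't shown above; mirroring readFields, it presumably looks something like this (a sketch of the fix, the exact body is my assumption):

@Override
public void write(DataOutput out) throws IOException {
    // Serialize the fields in the same order readFields() deserializes them.
    trackId.write(out);
    title.write(out);
    artistName.write(out);
}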
I'm trying to create a topology that has one spout that emits tweets and two bolts:
a TweetParserBolt that collects tweets
and a UserParserBolt that collects the tweeter's username.
Suppose I've created a third bolt that anchors the TweetParserBolt and the UserParserBolt so that it can map a tweeter's username to the list of tweets they have already posted. The problem I've encountered is that the bolt returns a null list of tweets.
Can anyone please help me understand what's wrong with the code?
Below is my code for the topology and the three bolts:
public class TwitterTopology {
private static String consumerKey = "*********************";
private static String consumerSecret = "*****************";
private static String accessToken = "********************";
private static String accessTokenSecret = "****************";
public static void main(String [] args) throws Exception{
/*** SETUP ***/
String remoteClusterTopologyName = null;
if (args!=null) {
if (args.length==1) {
remoteClusterTopologyName = args[0];
}
// If credentials are provided as commandline arguments
else if (args.length==4) {
accessToken =args[0];
accessTokenSecret =args[1];
consumerKey =args[2];
consumerSecret =args[3];
}
}
/**************** ****************/
TopologyBuilder builder = new TopologyBuilder();
FilterQuery filterQuery = new FilterQuery();
filterQuery.track(new String[]{"#cloudcomputing"});
filterQuery.language(new String[]{"en"});
TwitterSpout spout = new TwitterSpout( accessToken, accessTokenSecret,consumerKey, consumerSecret, filterQuery);
builder.setSpout("TwitterSpout",spout,1);
builder.setBolt("TweetParserBolt",new TweetParserBolt(),4).shuffleGrouping("TwitterSpout");
builder.setBolt("UserMapperBolt",new UserParserBolt()).shuffleGrouping("TwitterSpout");
builder.setBolt("UserAndTweetsMapperBolt",new UserAndTweetsMapperBolt()).fieldsGrouping("TweetParserBolt", new Fields("username","tweet","bolt"))
.fieldsGrouping("UserMapperBolt",new Fields("username","tweet","bolt"));
Config conf = new Config();
conf.setDebug(true);
if (remoteClusterTopologyName!=null) {
conf.setNumWorkers(4);
StormSubmitter.submitTopology(remoteClusterTopologyName, conf, builder.createTopology());
}
else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Thread.sleep(460000);
cluster.shutdown();
}
}
}
public class TweetParserBolt extends BaseRichBolt {
private OutputCollector collector;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweet","bolt"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
}
@Override
public void execute(Tuple tuple){
Status tweet=(Status)tuple.getValue(0);
String username=tweet.getUser().getScreenName();
collector.emit(tuple,new Values(username,tweet,"tweet_parser_bolt"));
}
}
public class UserParserBolt extends BaseRichBolt{
private OutputCollector collector;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweet"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
}
@Override
public void execute(Tuple tuple){
Status tweet=(Status)tuple.getValue(0);
String username=tweet.getUser().getScreenName();
collector.emit(tuple,new Values(username,tweet,"user_parser_bolt"));
}
}
public class UserAndTweetsMapperBolt extends BaseRichBolt {
private OutputCollector collector;
List<Tuple>listOfTuples;
Map<String,Status>tempTweetsMap;
Map<String,List<Status>>UserAndTweetsMap;
List<Status>tweets;
List<String>tempUsers;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweets"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
this.listOfTuples=new ArrayList<Tuple>();
this.tempTweetsMap=new HashMap<String, Status>();
this.UserAndTweetsMap=new HashMap<String, List<Status>>();
this.tempUsers=new ArrayList<String>();
this.tweets=new ArrayList<Status>();
}
@Override
public void execute(Tuple tuple){
//String username=tuple.getStringByField("username");
//Status status=(Status)tuple.getValueByField("tweet");
String username=tuple.getValue(0).toString();
String sourceComponent=tuple.getSourceComponent();
if(sourceComponent.equals("TwitterParserBolt")){
String tempUser1=tuple.getValue(0).toString();
Status tempStatus1=(Status)tuple.getValue(1);
tempTweetsMap.put(tempUser1,tempStatus1);
}else if(sourceComponent.equals("UserParserBolt")){
String tempUser2=tuple.getValue(0).toString();
Status tempStatus2=(Status)tuple.getValue(1);
tempUsers.add(tempUser2);
}
for(int i=0;i<tempUsers.size();i++){
for(int j=0;j<tempTweetsMap.size();j++){
if(tempUsers.get(i).equals(tempTweetsMap.get(j).getUser().getScreenName())){
tweets.add(tempTweetsMap.get(j));
}
}
}
collector.emit(new Values(username,tweets));
}
}
You need to do a fields grouping on just the username in the bolt that combines them. If you group by all the fields as you're doing now, you may or may not get all the tweets for the same user in the same task. Also, your map will only capture the last status for any given user. If you want them all you need to make the value an array of statuses.
builder.setBolt("UserAndTweetsMapperBolt", new UserAndTweetsMapperBolt()).
fieldsGrouping("TweetParserBolt", new Fields("username")).
fieldsGrouping("UserMapperBolt", new Fields("username"));
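For the "array of statuses" part, here is a minimal sketch of how UserAndTweetsMapperBolt.execute could keep every status per user; the field and stream names follow the question's code, and this is not a complete bolt.

// Accumulate every Status seen for a user instead of overwriting the last one.
private final Map<String, List<Status>> tweetsByUser = new HashMap<>();

@Override
public void execute(Tuple tuple) {
    String username = tuple.getStringByField("username");
    Status status = (Status) tuple.getValueByField("tweet");
    tweetsByUser.computeIfAbsent(username, u -> new ArrayList<>()).add(status);
    // Emit the full list collected so far for this user.
    collector.emit(new Values(username, tweetsByUser.get(username)));
    collector.ack(tuple);
}

Because the corrected topology groups on just "username", every tuple for a given user lands in the same task, so the per-task map above sees all of that user's statuses.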