How to configure correct parallelism in a persistor bolt? - java

I'm using Apache Storm to create a topology that first reads a "stream" of tuples from a file, then splits the tuples and stores them in MongoDB.
I have a cluster on Atlas with a shared replica set. I've already developed the topology, and the solution works properly when I use a single thread.
public static StormTopology build() {
return buildWithSpout();
}
public static StormTopology buildWithSpout() {
Config config = new Config();
TopologyBuilder builder = new TopologyBuilder();
CsvSpout datasetSpout = new CsvSpout("file.txt");
SplitterBolt splitterBolt = new SplitterBolt(",");
PartitionMongoInsertBolt insertPartitionBolt = new PartitionMongoInsertBolt();
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 1).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 1).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
return builder.createTopology();
}
However, when I increase the parallelism, my persistor bolt doesn't save all the tuples to MongoDB, even though they are all correctly emitted by the previous bolt.
builder.setSpout(DATA_SPOUT_ID, datasetSpout, 1);
builder.setBolt(DEPENDENCY_SPLITTER_ID, splitterBolt, 3).shuffleGrouping(DATA_SPOUT_ID);
builder.setBolt(UPDATER_COUNTER_ID, insertPartitionBolt, 3).shuffleGrouping(DEPENDENCY_SPLITTER_ID);
This is my first bolt:
public class SplitterBolt extends BaseBasicBolt {
private String del;
private MongoConnector db = null;
public SplitterBolt(String del) {
this.del = del;
}
public void prepare(Map stormConf, TopologyContext context) {
db = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
String tuple = input.getStringByField("tuple");
int idTuple = Integer.parseInt(input.getStringByField("id"));
String opString = "";
String[] data = tuple.split(this.del);
for(int i=0; i < data.length; i++) {
OpenBitSet attrs = new OpenBitSet();
attrs.fastSet(i);
opString = Utility.toStringOpenBitSet(attrs, 5);
collector.emit(new Values(idTuple, opString, data[i]));
}
db.incrementCount();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("idtuple","binaryattr","value"));
}
}
And this is my persistor bolt, which stores all the tuples in MongoDB:
public class PartitionMongoInsertBolt extends BaseBasicBolt {
private MongoConnector mongodb = null;
public void prepare(Map stormConf, TopologyContext context) {
//Singleton Instance
mongodb = MongoConnector.getInstance();
}
public void execute(Tuple input, BasicOutputCollector collector) {
mongodb.insertUpdateTuple(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {}
}
My only doubt is that I used a singleton pattern for the MongoDB connection class. Could this be the problem?
UPDATE
This is my MongoConnector class:
public class MongoConnector {
private MongoClient mongoClient = null;
private MongoDatabase database = null;
private MongoCollection<Document> partitionCollection = null;
private static MongoConnector mongoInstance = null;
private MongoConnector() { // private: instances should only come from getInstance()
MongoClientURI uri = new MongoClientURI("connection string");
this.mongoClient = new MongoClient(uri);
this.database = mongoClient.getDatabase("db.database");
this.partitionCollection = database.getCollection("db.collection");
}
public static MongoConnector getInstance() {
if (mongoInstance == null)
mongoInstance = new MongoConnector();
return mongoInstance;
}
public void insertUpdateTuple(Tuple tuple) {
int idTuple = (Integer) tuple.getValue(0);
String attrs = (String) tuple.getValue(1);
String value = (String) tuple.getValue(2);
value = value.replace('.', ',');
Bson query = Filters.eq("_id", attrs);
Document docIterator = this.partitionCollection.find(query).first();
if (docIterator != null) {
Bson newValue = new Document(value, idTuple);
Bson updateDocument = new Document("$push", newValue);
this.partitionCollection.updateOne(docIterator, updateDocument);
} else {
Document document = new Document();
document.put("_id", attrs);
ArrayList<Integer> partition = new ArrayList<Integer>();
partition.add(idTuple);
document.put(value, partition);
this.partitionCollection.insertOne(document);
}
}
}
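A side note on the singleton doubt: the lazy getInstance() above is not thread-safe. With parallelism greater than 1, several executor threads in the same worker JVM can pass the null check at the same time and create multiple MongoConnector instances. As the solution update below shows, this was not the cause of the lost tuples, but a minimal thread-safe sketch (keeping the rest of the class unchanged) would be:
// Sketch: synchronized lazy initialization so all executors in one worker
// share a single MongoClient (which is itself thread-safe).
public static synchronized MongoConnector getInstance() {
if (mongoInstance == null)
mongoInstance = new MongoConnector();
return mongoInstance;
}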
SOLUTION UPDATE
I've solved the problem by changing this line:
this.partitionCollection.updateOne(docIterator, updateDocument);
to
this.partitionCollection.findOneAndUpdate(query, updateDocument);
findOneAndUpdate evaluates the filter and applies the update as a single atomic server-side operation, whereas the original find-then-updateOne pair left a window in which parallel executors could race and lose updates.
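For what it's worth, the remaining find/insert/update branching could also be collapsed into a single atomic upsert - a sketch assuming the MongoDB Java driver's Updates and FindOneAndUpdateOptions helpers:
// Sketch: one atomic server-side upsert replaces find + insertOne/updateOne,
// so no two executors can interleave between the read and the write.
Bson query = Filters.eq("_id", attrs);
this.partitionCollection.findOneAndUpdate(
query,
Updates.push(value, idTuple),
new FindOneAndUpdateOptions().upsert(true));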

Related

Java Kafka stream processing tumbling windows

I am processing a stream of messages using the Java KStream API. Currently my code emits output every 5 minutes, but I want the output at the top of each 5-minute interval (i.e. 17:10, 17:15, etc.).
Currently the interval depends on the time the program started: if the program starts at 17:08, the data gets collected at 17:13, 17:18, 17:23, and so on.
Is there a way I can schedule it so the data gets emitted at 5-minute intervals that are multiples of 5?
class WindowedTransformerExample {
public static void main(String[] args) {
final StreamsBuilder builder = new StreamsBuilder();
final String stateStoreName = "stateStore";
final StoreBuilder<KeyValueStore<String, String>> keyValueStoreBuilder =
Stores.keyValueStoreBuilder(Stores.inMemoryKeyValueStore(stateStoreName),
Serdes.String(),
Serdes.String());
builder.addStateStore(keyValueStoreBuilder);
builder.<String, String>stream("topic").transform(new
WindowedTransformer(stateStoreName), stateStoreName)
.filter((k, v) -> k != null && v != null)
// Here's where you do something with the records emitted every 5 minutes
.foreach((k, v)-> System.out.println());
}
static final class WindowedTransformer implements TransformerSupplier<String, String, KeyValue<String, String>> {
private final String storeName;
public WindowedTransformer(final String storeName) {
this.storeName = storeName;
}
@Override
public Transformer<String, String, KeyValue<String, String>> get() {
return new Transformer<String, String, KeyValue<String, String>>() {
private KeyValueStore<String, String> keyValueStore;
private ProcessorContext processorContext;
@Override
public void init(final ProcessorContext context) {
processorContext = context;
keyValueStore = (KeyValueStore<String, String>) context.getStateStore(storeName);
// could change this to PunctuationType.STREAM_TIME if needed
context.schedule(Duration.ofMinutes(5), PunctuationType.WALL_CLOCK_TIME, (ts) -> {
try(final KeyValueIterator<String, String> iterator = keyValueStore.all()) {
while (iterator.hasNext()) {
final KeyValue<String, String> keyValue = iterator.next();
processorContext.forward(keyValue.key, keyValue.value);
}
}
});
}
@Override
public KeyValue<String, String> transform(String key, String value) {
if (key != null) {
keyValueStore.put(key, value);
}
return null;
}
@Override
public void close() {
}
};
}
}
}
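The punctuator above fires relative to when the task started, so it will not by itself line up with 17:10, 17:15, and so on. One workaround (a sketch under the same store and serde assumptions, not part of the original code) is to punctuate on a finer interval and only flush the store when the wall clock crosses a 5-minute boundary:
// Sketch: check every 10 seconds, but forward only when a new 5-minute
// wall-clock bucket has started. "lastBucket" is a hypothetical field
// added to the Transformer; this replaces the schedule(...) call in init().
private long lastBucket = -1L;
context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, (ts) -> {
long bucket = ts / Duration.ofMinutes(5).toMillis(); // index of the current 5-minute window
if (bucket != lastBucket) {
lastBucket = bucket;
try (final KeyValueIterator<String, String> iterator = keyValueStore.all()) {
while (iterator.hasNext()) {
final KeyValue<String, String> keyValue = iterator.next();
processorContext.forward(keyValue.key, keyValue.value);
}
}
}
});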

Save custom arraylist to Shared Prefrences with gson

I am trying to save the state of my app to SharedPreferences. The information I want to save is an ArrayList of custom objects, where each object (PatientInfo) contains a few strings and 2 more custom ArrayLists (SkinPhotoInfo, TreatmentsInfo). I was able to save and load an ArrayList of custom objects, but I wasn't able to save the ArrayList that has ArrayLists in it.
Does anyone have an idea of the easiest way to do this? The object itself is already Parcelable, if that helps in any way.
P.S. When is the best time to save to SharedPreferences - onPause or onDelete?
Thank you for your help!!
PatientInfo:
public class PatientInfo implements Parcelable {
String name;
String skinType;
String notes;
String image;
ArrayList<SkinPhotoInfo> skinPhotos;
ArrayList<TreatmentsInfo> treatments;
Boolean showDeleteButton;
@Override
public int describeContents() {
return 0;
}
@Override
public void writeToParcel(Parcel dest, int flags) {
dest.writeString(name);
dest.writeString(skinType);
dest.writeString(notes);
dest.writeString(image); // was writeValue: must mirror the readString() in createFromParcel
dest.writeValue(skinPhotos);
dest.writeValue(treatments);
}
public static final Creator<PatientInfo> CREATOR = new Creator<PatientInfo>()
{
@Override
public PatientInfo createFromParcel(Parcel source) {
PatientInfo ret = new PatientInfo();
ret.name = source.readString();
ret.skinType = source.readString();
ret.notes = source.readString();
ret.image = source.readString();
ret.skinPhotos = (ArrayList<SkinPhotoInfo>) source.readValue(SkinPhotoInfo.class.getClassLoader()); // readValue mirrors writeValue
ret.treatments = (ArrayList<TreatmentsInfo>) source.readValue(TreatmentsInfo.class.getClassLoader());
return ret;
}
@Override
public PatientInfo[] newArray(int size) {
return new PatientInfo[size];
}
};
public PatientInfo() {
this.name = "";
this.skinType = "";
this.image = "";
this.skinPhotos = new ArrayList<SkinPhotoInfo>();
this.showDeleteButton = false;
this.treatments = new ArrayList<TreatmentsInfo>();
}}
SkinPhotoInfo:
public class SkinPhotoInfo implements Parcelable {
String photoDate;
Boolean showDeleteButton;
Uri imageUri;
@Override
public int describeContents() {
return 0;
}
@Override
public void writeToParcel(Parcel dest, int flags) {
dest.writeString(photoDate);
dest.writeByte((byte)(showDeleteButton ? 1 : 0)); // If showDeleteButton == true, byte == 1
dest.writeValue(imageUri);
}
public static final Creator<SkinPhotoInfo> CREATOR = new Creator<SkinPhotoInfo>()
{
@Override
public SkinPhotoInfo createFromParcel(Parcel source) {
SkinPhotoInfo ret = new SkinPhotoInfo();
ret.photoDate = source.readString();
ret.showDeleteButton = source.readByte() == 1; // mirror writeToParcel: 1 means true
ret.imageUri = (Uri) source.readValue(Uri.class.getClassLoader());
return ret;
}
@Override
public SkinPhotoInfo[] newArray(int size) {
return new SkinPhotoInfo[size];
}
};
public SkinPhotoInfo(Uri imageUri, String photoDate) {
this.imageUri = imageUri;
this.photoDate = photoDate;
showDeleteButton = false;
}}
TreatmentsInfo:
public class TreatmentsInfo implements Parcelable {
String treatmentDate;
String treatmentName;
String pattern = "MM-dd-yy";
Boolean showDeleteButton;
@Override
public int describeContents() {
return 0;
}
@Override
public void writeToParcel(Parcel dest, int flags) {
dest.writeString(treatmentDate);
dest.writeString(treatmentName);
dest.writeString(pattern);
dest.writeByte((byte)(showDeleteButton ? 1 : 0)); // If showDeleteButton == true, byte == 1
}
public static final Creator<TreatmentsInfo> CREATOR = new Creator<TreatmentsInfo>()
{
@Override
public TreatmentsInfo createFromParcel(Parcel source) {
TreatmentsInfo ret = new TreatmentsInfo();
ret.treatmentDate = source.readString();
ret.treatmentName = source.readString();
ret.pattern = source.readString();
ret.showDeleteButton = source.readByte() == 1; // mirror writeToParcel: 1 means true
return ret;
}
@Override
public TreatmentsInfo[] newArray(int size) {
return new TreatmentsInfo[size];
}
};
public TreatmentsInfo(){
this.treatmentDate = "";
this.treatmentName = "";
this.showDeleteButton = false;
this.pattern = "";
}
public TreatmentsInfo(String treatmentDate, String treatmentName) {
this.treatmentDate = treatmentDate;
this.treatmentName = treatmentName;
this.showDeleteButton = false;
}}
Use the Gson library and save the ArrayList as a string.
The snippet below saves to a file, but you can use the same approach with SharedPreferences:
public static void saveGroupChatFile(File file, List<GCRoom> list) throws IOException {
String data = new Gson().toJson(list);
FileOutputStream fout = new FileOutputStream(file, false);
OutputStreamWriter osw = new OutputStreamWriter(fout);
osw.write(data);
osw.close();
}
public static List<GCRoom> readGroupChatFile(File file) throws IOException {
Type listType = new TypeToken<List<GCRoom>>() {
}.getType();
JsonReader reader = new JsonReader(new FileReader(file));
return new Gson().fromJson(reader, listType);
}
As for the library:
implementation 'com.google.code.gson:gson:2.8.5'
You can do something like:
String json = new Gson().toJson(YourObject);
To save in the Shared Preferences.
To retrieve the JSON and transform it back to YourObject, just do:
String json = myPrefsObject.getString(TAG, "");
return new Gson().fromJson(json, YourObject.class);
As for the PS question, the answer is onPause.
Let me know if you need something else
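Putting this together for the nested structure in the question - a minimal sketch, assuming prefs comes from context.getSharedPreferences(...) and patients is the List<PatientInfo> to save (Gson serializes the nested SkinPhotoInfo and TreatmentsInfo lists automatically, though the Uri field may need a custom TypeAdapter):
// Save (e.g. in onPause): nested lists are handled by Gson out of the box.
SharedPreferences prefs = context.getSharedPreferences("app_state", Context.MODE_PRIVATE);
prefs.edit().putString("patients", new Gson().toJson(patients)).apply();
// Load: a TypeToken is required because of type erasure on List<PatientInfo>.
Type listType = new TypeToken<List<PatientInfo>>() {}.getType();
List<PatientInfo> restored = new Gson().fromJson(prefs.getString("patients", "[]"), listType);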
Gson provides methods to convert objects to strings and vice versa.
Use toJson() to convert an object to a string:
PatientInfo patientInfo = new PatientInfo();
Gson gson = new Gson();
String objectAsString = gson.toJson(patientInfo);
Use fromJson() to convert a string back to an object:
Gson gson = new Gson();
PatientInfo patientinfo = gson.fromJson(data, PatientInfo.class);
//data is the string you saved in shared preferences after converting the object
Alternatively, convert the response with Gson, turn the resulting list into a Set<String>, and store that set with putArray(), which wraps putStringSet():
public class staticpref{
private static SharedPreferences prefs; // assumed to be initialized elsewhere, e.g. in Application.onCreate()
private static SharedPreferences.Editor editor; // likewise: prefs.edit()
public static void putArray(String key, Set<String> arrayList){
editor.putStringSet(key, arrayList);
editor.commit();
}
public static Set<String> getArray(String key, Set<String> defvalue){
return prefs.getStringSet(key,defvalue);
}
}
Or you can make a static helper class. To get an array back, you have to convert the JSON to an ArrayList, like this:
String strResponse = anyjsonResponse;
Modelclass model = new Gson().fromJson(strResponse, Modelclass.class);
List<String> datalist= model.anyvalue();
Putandgetarray.addArrayList(datalist);
Static methods for achieving this:
public class Putandgetarray{
public static void addArrayList(List<data> dataList){
String strputdata = new Gson().toJson(dataList, new TypeToken<List<data>>() {}.getType()); // TypeToken must match List<data>
SharedPreferenceUtils.putString("key", strputdata);
}
public static List<data> getArrayList(){
Type type = new TypeToken<List<data>>(){}.getType();
String strreturndata=SharedPreferenceUtils.getString("key","");
return new Gson().fromJson(strreturndata, type);
}
}
Note that SharedPreferences itself can only store sets of strings, via putStringSet(String key, @Nullable Set<String> values).

bolt that counts number of a user's original tweets

I'm trying to count the number of a user's original tweets after I've stored all of the downloaded tweets in a MongoDB database using Storm. However, whenever I count the author's original tweets using the following code, it keeps reading (and counting) the same tweet.
Bolt:
public class CalculateTheMetrics extends BaseBasicBolt {
Map<String,Double>OT1=new HashMap<String, Double>();
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("USERNAME","OT1"));
}
@Override
public void execute(Tuple input,BasicOutputCollector collector) {
String author=input.getString(0);
String tweet=input.getString(2);
Double OT1=this.OT1.get(author);
if(OT1==null){
OT1=0.0;
}
if(author!=null && tweet!=null ){
if (!tweet.startsWith("@") && !tweet.startsWith("RT")) { // count only original tweets; with || this was always true
OT1 += 1;
}
this.OT1.put(author, OT1);
System.out.println(author + " " + OT1);
collector.emit(new Values(author, OT1));
}
}
}
Topology:
public class TheAuthorsAndTheirTweetData {
public static void main(String[]args) throws Exception{
TopologyBuilder topologyBuilder=new TopologyBuilder();
topologyBuilder.setSpout("READ_TWEET_DATA_FROM_MONGODB", new ReadLinesFromTextFile("tweets.txt"));
topologyBuilder.setBolt("TWEET_DATA_FROM_MONGODB_TO_FURTHER_PROCESSING",new FromMongoDBToProcessing()).shuffleGrouping("READ_TWEET_DATA_FROM_MONGODB");
topologyBuilder.setSpout("READ_THE_AUTHORS_FROM_TEXT_FILE",new ReadLastLineFromTextFile("authors.txt"));
topologyBuilder.setBolt("FROM_THE_AUTHORS_TEXT_FILE_TO_FURTHER_PROCESSING", new FromTheAuthorsTextFileToFurtherProcessing()).shuffleGrouping("READ_THE_AUTHORS_FROM_TEXT_FILE");
topologyBuilder.setBolt("SEARCH_FOR_THE_AUTHORS_TWEET_DATA",new SearchForTheAuthorsTweetData(),16).fieldsGrouping("TWEET_DATA_FROM_MONGODB_TO_FURTHER_PROCESSING",new Fields("USERNAME","ID")).fieldsGrouping("FROM_THE_AUTHORS_TEXT_FILE_TO_FURTHER_PROCESSING",new Fields("USERNAME","ID"));
topologyBuilder.setBolt("CALCULATE_THE_METRICS",new CalculateTheMetrics(),64).fieldsGrouping("SEARCH_FOR_THE_AUTHORS_TWEET_DATA",new Fields("USERNAME"));
Config config=new Config();
if(args!=null && args.length>0){
config.setNumWorkers(10);
config.setNumAckers(5);
config.setMaxSpoutPending(100);
StormSubmitter.submitTopology(args[0], config, topologyBuilder.createTopology());
}else{
LocalCluster localCluster=new LocalCluster();
localCluster.submitTopology("Test",config,topologyBuilder.createTopology());
Utils.sleep(1*60*60*1000);
localCluster.killTopology("Test");
localCluster.shutdown();
}
}
}
What I want is for it to stop repeatedly reading and counting the same tweet. Please help.
Something like this?
public class Calculate1Metric extends BaseRichBolt {
private OutputCollector collector;
Map<String ,Integer>OT1;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("username","OT1"));
}
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector=collector;
this.OT1=new HashMap<String, Integer>();
}
@Override
public void execute(Tuple input) {
final String sourceComponent = input.getSourceComponent();
String author = input.getString(0);
String tweet = input.getString(2);
if (author != null && tweet != null) {
Integer OT1 = this.OT1.get(author);
if (OT1 == null) {
OT1 = 0;
}
if (!tweet.startsWith("@") && !tweet.contains("RT ") && !tweet.startsWith("RT")) { // all conditions must hold; with || this was always true
OT1 += 1;
}
if(!this.OT1.containsKey(author)) {
this.OT1.put(author, OT1);
}else{
collector.emit(new Values(author, OT1));
System.out.println(author + " " + OT1);
this.OT1.remove(author);
}
} else {
collector.fail(input);
return; // don't ack a tuple that was just failed
}
collector.ack(input);
}
}

My Storm topology is neither working (not generating output) nor failing (not generating errors or exceptions)

I have a topology in which I am trying to count word occurrences generated by a SimulatorSpout (not a real stream) and then write them to a MySQL database table. The table schema is very simple:
Field | Type        | ...
ID    | int(11)     | Auto_incr
word  | varchar(50) |
count | int(11)     |
But I am facing a weird problem (as I mentioned above).
I successfully submitted the topology to my Storm cluster, which consists of 4 supervisors, and I can see the flow of the topology in the Storm web UI
(no exceptions), but when I checked the MySQL table, to my surprise, the table was empty...
Any comments or suggestions are welcome.
Here are the spouts and bolts:
public class MySQLConnection {
private static Connection conn = null;
private static String dbUrl = "jdbc:mysql://192.168.0.2:3306/test?";
private static String dbClass = "com.mysql.jdbc.Driver";
public static Connection getConnection() throws SQLException, ClassNotFoundException {
Class.forName(dbClass);
conn = DriverManager.getConnection(dbUrl, "root", "qwe123");
return conn;
}
}
============================= SentenceSpout ===============================
public class SentenceSpout extends BaseRichSpout{
private static final long serialVersionUID = 1L;
private boolean _completed = false;
private SpoutOutputCollector _collector;
private String [] sentences = {
"Obama delivered a powerfull speech against USA",
"I like cold beverages",
"RT http://www.turkeyairline.com Turkish Airlines has delayed some flights",
"don't have a cow man...",
"i don't think i like fleas"
};
private int index = 0;
public void open (Map config, TopologyContext context, SpoutOutputCollector collector) {
_collector = collector;
}
public void nextTuple () {
_collector.emit(new Values(sentences[index]));
index++;
if (index >= sentences.length) {
index = 0;
Utils.waitForSeconds(1);
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
public void ack(Object msgId) {
System.out.println("OK: " + msgId);
}
public void close() {}
public void fail(Object msgId) {
System.out.println("FAIL: " + msgId);
}
}
============================ SplitSentenceBolt ==============================
public class SplitSentenceBolt extends BaseRichBolt {
private static final long serialVersionUID = 1L;
private OutputCollector _collector;
public void prepare (Map config, TopologyContext context, OutputCollector collector) {
_collector = collector;
}
public void execute (Tuple tuple) {
String sentence = tuple.getStringByField("sentence");
String httpRegex = "((https?|ftp|telnet|gopher|file)):((//)|(\\\\))+[\\w\\d:##%/;$()~_?\\+-=\\\\\\.&]*";
sentence = sentence.replaceAll(httpRegex, "").replaceAll("RT", "").replaceAll("[.|,]", "");
String[] words = sentence.split(" ");
for (String word : words) {
if (!word.isEmpty())
_collector.emit(new Values(word.trim()));
}
_collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
=========================== WordCountBolt =================================
public class WordCountBolt extends BaseRichBolt {
private static final long serialVersionUID = 1L;
private HashMap<String , Integer> counts = null;
private OutputCollector _collector;
private ResultSet resSet = null;
private Statement stmt = null;
private Connection _conn = null;
private String path = "/home/hduser/logOfStormTops/logger.txt";
String rLine = null;
public void prepare (Map config, TopologyContext context, OutputCollector collector) {
counts = new HashMap<String, Integer>();
_collector = collector;
}
public void execute (Tuple tuple) {
int insertResult = 0;
int updateResult = 0;
String word = tuple.getStringByField("word");
//----------------------------------------------------
if (!counts.containsKey(word)) {
counts.put(word, 1);
try {
insertResult = wordInsertIfNoExist(word);
if (insertResult == 1) {
_collector.ack(tuple);
} else {
_collector.fail(tuple);
}
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
}
} else {
//-----------------------------------------------
counts.put(word, counts.get(word) + 1);
try {
// writing to db
updateResult = updateCountOfExistingWord(word);
if (updateResult == 1) {
_collector.ack(tuple);
} else {
_collector.fail(tuple);
}
// Writing to file
BufferedWriter buffer = new BufferedWriter(new FileWriter(path, true)); // append, so earlier log lines aren't overwritten
buffer.write("[ " + word + " : " + counts.get(word) + " ]");
buffer.newLine();
buffer.flush();
buffer.close();
} catch (ClassNotFoundException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("{word-" + word + " : count-" + counts.get(word) + "}");
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
// *****************************************************
public int wordInsertIfNoExist(String word) throws ClassNotFoundException, SQLException {
String query = "SELECT word FROM wordcount WHERE word=\"" + word + "\"";
String insert = "INSERT INTO wordcount (word, count) VALUES (\"" + word + "\", 1)";
_conn = MySQLConnection.getConnection();
stmt = _conn.createStatement();
resSet = stmt.executeQuery(query);
int res = 0;
if (!resSet.next()) {
res = stmt.executeUpdate(insert);
} else {
System.out.println("Yangi qiymatni kirityotrganda nimadir sodir bo'ldi");
}
resSet.close();
stmt.close();
_conn.close();
return res;
}
public int updateCountOfExistingWord(String word) throws ClassNotFoundException, SQLException {
String update = "UPDATE wordcount SET count=count+1 WHERE word=\"" + word + "\"";
_conn = MySQLConnection.getConnection();
stmt = _conn.createStatement();
int result = stmt.executeUpdate(update);
//System.out.println(word + "'s count has been updated (incremented)");
stmt.close();
_conn.close();
return result;
}
}
========================= WordCountTopology ==============================
public class WordCountTopology {
private static final String SENTENCE_SPOUT_ID = "sentence-spout";
private static final String SPLIT_BOLT_ID = "split-bolt";
private static final String COUNT_BOLT_ID = "count-bolt";
private static final String TOPOLOGY_NAME = "NewWordCountTopology";
@SuppressWarnings("static-access")
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException {
SentenceSpout spout = new SentenceSpout();
SplitSentenceBolt splitBolt = new SplitSentenceBolt();
WordCountBolt countBolt = new WordCountBolt();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(SENTENCE_SPOUT_ID, spout, 2);
builder.setBolt(SPLIT_BOLT_ID, splitBolt, 4).shuffleGrouping(SENTENCE_SPOUT_ID);
builder.setBolt(COUNT_BOLT_ID, countBolt, 4).fieldsGrouping(SPLIT_BOLT_ID, new Fields("word"));
Config config = new Config();
config.setMaxSpoutPending(100);
config.setDebug(true);
StormSubmitter submitter = new StormSubmitter();
submitter.submitTopology(TOPOLOGY_NAME, config, builder.createTopology());
}
}
It is because _collector.ack(tuple) is not being called when an exception is thrown. When there are too many pending tuples, the spout will stop sending new ones. Try throwing a RuntimeException instead of just calling printStackTrace.
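One way to apply that suggestion - a sketch that reworks the bolt's execute() so every tuple is acked or failed exactly once and database errors are surfaced instead of swallowed:
public void execute(Tuple tuple) {
String word = tuple.getStringByField("word");
try {
if (!counts.containsKey(word)) {
counts.put(word, 1);
wordInsertIfNoExist(word);
} else {
counts.put(word, counts.get(word) + 1);
updateCountOfExistingWord(word);
}
_collector.ack(tuple); // always reached on success
} catch (ClassNotFoundException | SQLException e) {
_collector.fail(tuple); // let the spout replay the tuple
throw new RuntimeException(e); // visible in the Storm UI instead of silent
}
}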

Mapping username to tweets with storm

I'm trying to create a topology that has one spout that emits tweets and two bolts:
a TweetParserBolt that collects tweets,
and a UserParserBolt that collects the tweeters' usernames.
Suppose I've created a third bolt that anchors the TweetParserBolt and the UserParserBolt so that it can map a tweeter's username to the list of tweets that he/she has already posted. The problem I've encountered is that the bolt returns a null list of tweets.
Can anyone please help me understand what's wrong with the code?
Below is my code for the topology and the three bolts:
public class TwitterTopology {
private static String consumerKey = "*********************";
private static String consumerSecret = "*****************";
private static String accessToken = "********************";
private static String accessTokenSecret = "****************";
public static void main(String [] args) throws Exception{
/*** SETUP ***/
String remoteClusterTopologyName = null;
if (args!=null) {
if (args.length==1) {
remoteClusterTopologyName = args[0];
}
// If credentials are provided as commandline arguments
else if (args.length==4) {
accessToken =args[0];
accessTokenSecret =args[1];
consumerKey =args[2];
consumerSecret =args[3];
}
}
/**************** ****************/
TopologyBuilder builder = new TopologyBuilder();
FilterQuery filterQuery = new FilterQuery();
filterQuery.track(new String[]{"#cloudcomputing"});
filterQuery.language(new String[]{"en"});
TwitterSpout spout = new TwitterSpout( accessToken, accessTokenSecret,consumerKey, consumerSecret, filterQuery);
builder.setSpout("TwitterSpout",spout,1);
builder.setBolt("TweetParserBolt",new TweetParserBolt(),4).shuffleGrouping("TwitterSpout");
builder.setBolt("UserMapperBolt",new UserParserBolt()).shuffleGrouping("TwitterSpout");
builder.setBolt("UserAndTweetsMapperBolt", new UserAndTweetsMapperBolt()).fieldsGrouping("TweetParserBolt", new Fields("username","tweet","bolt"))
.fieldsGrouping("UserMapperBolt", new Fields("username","tweet","bolt"));
Config conf = new Config();
conf.setDebug(true);
if (remoteClusterTopologyName!=null) {
conf.setNumWorkers(4);
StormSubmitter.submitTopology(remoteClusterTopologyName, conf, builder.createTopology());
}
else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());
Thread.sleep(460000);
cluster.shutdown();
}
}
public class TweetParserBolt extends BaseRichBolt {
private OutputCollector collector;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweet","bolt"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
}
@Override
public void execute(Tuple tuple){
Status tweet=(Status)tuple.getValue(0);
String username=tweet.getUser().getScreenName();
collector.emit(tuple,new Values(username,tweet,"tweet_parser_bolt"));
}
}
public class UserParserBolt extends BaseRichBolt{
private OutputCollector collector;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweet"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
}
@Override
public void execute(Tuple tuple){
Status tweet=(Status)tuple.getValue(0);
String username=tweet.getUser().getScreenName();
collector.emit(tuple,new Values(username,tweet,"user_parser_bolt"));
}
}
public class UserAndTweetsMapperBolt extends BaseRichBolt {
private OutputCollector collector;
List<Tuple>listOfTuples;
Map<String,Status>tempTweetsMap;
Map<String,List<Status>>UserAndTweetsMap;
List<Status>tweets;
List<String>tempUsers;
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer){
declarer.declare(new Fields("username","tweets"));
}
@Override
public void prepare(Map map,TopologyContext context,OutputCollector collector){
this.collector=collector;
this.listOfTuples=new ArrayList<Tuple>();
this.tempTweetsMap=new HashMap<String, Status>();
this.UserAndTweetsMap=new HashMap<String, List<Status>>();
this.tempUsers=new ArrayList<String>();
this.tweets=new ArrayList<Status>();
}
@Override
public void execute(Tuple tuple){
//String username=tuple.getStringByField("username");
//Status status=(Status)tuple.getValueByField("tweet");
String username=tuple.getValue(0).toString();
String sourceComponent=tuple.getSourceComponent();
if(sourceComponent.equals("TwitterParserBolt")){
String tempUser1=tuple.getValue(0).toString();
Status tempStatus1=(Status)tuple.getValue(1);
tempTweetsMap.put(tempUser1,tempStatus1);
}else if(sourceComponent.equals("UserParserBolt")){
String tempUser2=tuple.getValue(0).toString();
Status tempStatus2=(Status)tuple.getValue(1);
tempUsers.add(tempUser2);
}
for(int i=0;i<tempUsers.size();i++){
for(int j=0;j<tempTweetsMap.size();j++){
if(tempUsers.get(i).equals(tempTweetsMap.get(j).getUser().getScreenName())){
tweets.add(tempTweetsMap.get(j));
}
}
}
collector.emit(new Values(username,tweets));
}
}
You need to do a fields grouping on just the username in the bolt that combines them. If you group by all the fields as you're doing now, you may or may not get all the tweets for the same user in the same task. Also, your map will only capture the last status for any given user. If you want them all you need to make the value an array of statuses.
builder.setBolt("UserAndTweetsMapperBolt", new UserAndTweetsMapperBolt())
.fieldsGrouping("TweetParserBolt", new Fields("username"))
.fieldsGrouping("UserMapperBolt", new Fields("username"));
