Synchronization Issue while using Apache Storm - java

I am trying out Apache Storm for processing streams of GeoHash codes, using this library with Apache Storm 0.9.3.
Currently, I am facing a synchronization issue in the execute method of one of my bolt classes. With a single bolt executor I get the correct output, but the moment I go from one bolt executor to two or more, the output gets messed up.
The code snippet for the bolt (only this one is having issues) is:
public static int PRECISION = 6;
private OutputCollector collector;
BufferedReader br;
String lastGeoHash = "NONE";
HashMap<String, Integer> map;
HashMap<String, String[]> zcd;
TreeMap<Integer, String> counts = new TreeMap<Integer, String>();

public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    String line = "";
    this.collector = collector;
    map = new HashMap<String, Integer>();
    zcd = new HashMap<String, String[]>();
    try {
        br = new BufferedReader(new FileReader("/tmp/zip_code_database.csv"));
        int i = 0;
        while ((line = br.readLine()) != null) {
            if (i == 0) {
                String columns[] = line.split(",");
                for (int j = 0; j < columns.length; j++) {
                    map.put(columns[j], j);
                }
            } else {
                String[] split = line.split(",");
                zcd.put(split[map.get("\"zip\"")],
                        new String[]{split[map.get("\"state\"")], split[map.get("\"primary_city\"")]});
            }
            i++;
        }
        br.close();
        // System.out.println(zcd);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("Initialize");
    initializeTreeMapAsPerOurRequirement(counts);
}

public void execute(Tuple tuple) {
    String completeFile = tuple.getStringByField("string"); // So, this data is generated by Spout and it contains the complete shape file where each line is separated by a new line character i.e. "\n"
    String lines[] = completeFile.split("\t");
    String geohash = lines[0];
    int count = Integer.parseInt(lines[1]);
    String zip = lines[2];
    String best = "";
    String city = "";
    String state = "";
    if (!(geohash.equals(lastGeoHash)) && !(lastGeoHash.equals("NONE"))) {
        //if(counts.size()!=0){
        //System.out.println(counts.firstKey());
        best = counts.get(counts.lastKey());
        //System.out.println(geohash);
        if (zcd.containsKey("\"" + best + "\"")) {
            city = zcd.get("\"" + best + "\"")[0];
            state = zcd.get("\"" + best + "\"")[1];
            System.out.println(lastGeoHash + "," + best + "," + state + "," + city + "," + "US");
        } else if (!best.equals("NONE")) {
            System.out.println(lastGeoHash);
            city = "MISSING";
            state = "MISSING";
        }
        // initializeTreeMapAsPerOurRequirement(counts);
        //}else{
        //System.out.println("else"+geohash);
        //}
        //}
    }
    lastGeoHash = geohash;
    counts.put(count, zip);
    collector.ack(tuple);
}

private void initializeTreeMapAsPerOurRequirement(TreeMap<Integer, String> counts) {
    counts.clear();
    counts.put(-1, "NONE");
}

public void declareOutputFields(OutputFieldsDeclarer declarer) {
    System.out.println("here");
    declarer.declare(new Fields("number"));
}
The topology code is:
public static void main(String[] args) {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new SendWholeFileDataSpout(), 2);
    builder.setBolt("map", new GeoHashBolt(), 2).shuffleGrouping("spout");
    builder.setBolt("reduce", new GeoHashReduceBolt(), 2).fieldsGrouping("map", new Fields("value"));
    Config conf = new Config();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("test", conf, builder.createTopology());
    Utils.sleep(10000);
    cluster.killTopology("test");
    cluster.shutdown();
}
Can someone look into the code and guide me a bit?

You have set the parallelism_hint to 2 for your spout and for both of your bolts, which means two executors run per component. Your bolt keeps mutable per-instance state (lastGeoHash and the counts map), and with a shuffle grouping tuples for the same geohash can be spread across both executors, so each instance sees only part of the stream and the output gets messed up.
By setting parallelism_hint to 1 you may achieve your desired output.
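If you do want to keep the parallelism, the stateful logic in your bolt only works when all tuples for a given geohash reach the same executor. A sketch of that wiring, assuming the spout were changed to emit one tuple per line and to declare a "geohash" output field (in the posted code it emits the whole file under the single field "string"):
builder.setSpout("spout", new SendWholeFileDataSpout(), 2);
// Route by geohash so each bolt instance sees every tuple for "its" geohashes.
builder.setBolt("map", new GeoHashBolt(), 2)
       .fieldsGrouping("spout", new Fields("geohash"));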

Related

Refactor for supplier classes

I'm looking to refactor two supplier classes, as they both have very similar code. One provides an ArrayList and the other a Map. They are stored in the configuration folder, but I'm not sure that's the correct place. They both load data from a text file stored in the project folder, which doesn't feel right to me.
The two supplier classes are:
@Component
public class ModulusWeightTableSupplier implements Supplier<List> {
    private static final Logger LOGGER = LogManager.getLogger(CDLBankDetailsValidator.class);
    private static final String MODULUS_WEIGHT_TABLE = "AccountModulus_Weight_Table.txt";

    @Override
    public List<ModulusWeightTableEntry> get() {
        LOGGER.debug("Attempting to load modulus weight table " + MODULUS_WEIGHT_TABLE);
        final List<ModulusWeightTableEntry> modulusWeightTable = new ArrayList<>();
        try {
            final InputStream in = new FileInputStream(MODULUS_WEIGHT_TABLE);
            final BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = br.readLine()) != null) {
                final String[] fields = line.split("\\s+");
                modulusWeightTable.add(new ModulusWeightTableEntry(fields));
            }
            LOGGER.debug("Modulus weight table loaded");
            br.close();
        }
        catch (final IOException e) {
            throw new BankDetailsValidationRuntimeException("An error occurred loading the modulus weight table or sort code substitution table", e);
        }
        return modulusWeightTable;
    }
}
and
@Component
public class SortCodeSubstitutionTableSupplier implements Supplier<Map> {
    private static final Logger LOGGER = LogManager.getLogger(CDLBankDetailsValidator.class);
    private static final String SORT_CODE_SUBSTITUTION_TABLE = "SCSUBTAB.txt";

    @Override
    public Map<String, String> get() {
        LOGGER.debug("Attempting to load sort code substitution table " + SORT_CODE_SUBSTITUTION_TABLE);
        final Map<String, String> sortCodeSubstitutionTable = new HashMap<>();
        try {
            final InputStream in = new FileInputStream(SORT_CODE_SUBSTITUTION_TABLE);
            final BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = br.readLine()) != null) {
                final String[] fields = line.split("\\s+");
                sortCodeSubstitutionTable.put(fields[0], fields[1]);
            }
            LOGGER.debug("Sort code substitution table loaded");
            br.close();
        }
        catch (final IOException e) {
            throw new BankDetailsValidationRuntimeException("An error occurred loading the sort code substitution table", e);
        }
        return sortCodeSubstitutionTable;
    }
}
Both classes have a lot of duplicate code. I'm trying to work out the best way to refactor them.
Your current code loads configuration from text files. The best solution for this would probably be to go with properties or YAML files, which is the most common approach for loading configuration data from external files.
A good starting point would be the Spring Boot documentation on externalized configuration, which explains how to use both properties and YAML files for loading configuration data in your application.
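For illustration, a minimal sketch of that approach, assuming the sort code substitutions can be expressed as simple key-value pairs (the class and property names here are invented, not taken from the posted code):
import java.util.HashMap;
import java.util.Map;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "bank-validation")
public class BankValidationProperties {

    // Bound by Spring Boot from application.yml, e.g.:
    //   bank-validation:
    //     sort-code-substitutions:
    //       "123456": "654321"
    private Map<String, String> sortCodeSubstitutions = new HashMap<>();

    public Map<String, String> getSortCodeSubstitutions() {
        return sortCodeSubstitutions;
    }

    public void setSortCodeSubstitutions(Map<String, String> sortCodeSubstitutions) {
        this.sortCodeSubstitutions = sortCodeSubstitutions;
    }
}
This would remove both hand-rolled file readers: Spring Boot handles the loading, and each table becomes an ordinary bound property.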

Java: Using a key from the ProductID HashMap, update the count of the purchases of this product

This should update the entry with key 5020 in the products map, increasing the count of purchases by 2.
I am not sure how to use ProductID.put to make the necessary changes to the entry for the Sales.txt file I have. The file's content displays perfectly, but I have no idea how to update the file with the change.
I think I need to use an iterator at some point, but I am not familiar with HashMap.
public class StoreSales {

    public static void main(String[] args) {
        List<Customer> customer = new ArrayList<>();
        try {
            readFile("Sales.txt", customer);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println(customer);
    }

    public static void readFile(String file, List<Customer> cust) throws IOException, ClassNotFoundException {
        Map<Integer, Customer> CustomerID = new HashMap<>();
        Map<Integer, Customer> ProductID = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] arr = line.split(" ");
                cust.add(new Customer(Integer.parseInt(arr[0]), arr[1], arr[2], Integer.parseInt(arr[3]),
                        arr[4], Double.parseDouble(arr[5]), Integer.parseInt(arr[6])));
                if (CustomerID.containsKey(Integer.parseInt(arr[0]))) {
                    CustomerID.get(arr[0]).getSingleItemPrice();
                }
                if (ProductID.containsKey(Integer.parseInt(arr[3]))) {
                    ProductID.get(arr[3]).getItemsPurchased();
                    // this is the problem
                    //ProductID.put(, 2++);
                }
            }
        }
    }
}
Getting the object from the map gives you a reference to the object that you can manipulate. Note that the maps are keyed by Integer, so the String from the split array has to be parsed first:
Customer c = ProductID.get(Integer.parseInt(arr[3]));
c.setItemsPurchased(c.getItemsPurchased() + 2);
But as Nishit pointed out in the comments, the if statements will always be false, because you never put anything into the maps.
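Putting it together, a minimal sketch of the read loop (assuming Customer has a setItemsPurchased setter to match the getter already used above):
while ((line = in.readLine()) != null) {
    String[] arr = line.split(" ");
    Customer current = new Customer(Integer.parseInt(arr[0]), arr[1], arr[2], Integer.parseInt(arr[3]),
            arr[4], Double.parseDouble(arr[5]), Integer.parseInt(arr[6]));
    cust.add(current);

    int productId = Integer.parseInt(arr[3]);
    Customer existing = ProductID.get(productId);
    if (existing == null) {
        ProductID.put(productId, current);   // first sale of this product
    } else {
        // product seen before: increase its purchase count by 2
        existing.setItemsPurchased(existing.getItemsPurchased() + 2);
    }
}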

ArrayIndexOutOfBoundsException - Attempting to read to/from file into HashMap

I'm working on a homework assignment and have run into an odd "ArrayIndexOutOfBoundsException" error. I know what the error means (essentially, I'm trying to reference a location in an array that isn't there), but I'm not sure why it's being thrown; there must be some logic error somewhere that I'm not seeing.
PhoneDirectory.java
import java.util.HashMap;
import java.io.*;

class PhoneDirectory {
    private HashMap<String, String> directoryMap;
    File directory;

    public PhoneDirectory() { //create file for phone-directory
        directory = new File("phone-directory.txt");
        directoryMap = new HashMap<String, String>();
        try (BufferedReader buffer = new BufferedReader(new FileReader(directory))) {
            String currentLine;
            while ((currentLine = buffer.readLine()) != null) { //set currentLine = buffer.readLine() and check if not null
                String[] fileData = currentLine.split(","); //create array of values in text file - split by comma
                directoryMap.put(fileData[0], fileData[1]); //add item to directoryMap
            }
        }
        catch (IOException err) {
            err.printStackTrace();
        }
    }

    public PhoneDirectory(String phoneDirectoryFile) {
        directory = new File(phoneDirectoryFile);
        directoryMap = new HashMap<String, String>();
        try (BufferedReader buffer = new BufferedReader(new FileReader(directory))) {
            String currentLine;
            while ((currentLine = buffer.readLine()) != null) { //set currentLine = buffer.readLine() and check if not null
                String[] fileData = currentLine.split(","); //create array of values in text file - split by comma
                directoryMap.put(fileData[0], fileData[1]); //add item to directoryMap
            }
        }
        catch (IOException err) {
            err.printStackTrace();
        }
    }

    public String Lookup(String personName) {
        if (directoryMap.containsKey(personName))
            return directoryMap.get(personName);
        else
            return "This person is not in the directory.";
    }

    public void AddOrChangeEntry(String name, String phoneNumber) {
        //ASK IF "IF-ELSE" CHECK IS NECESSARY
        if (directoryMap.containsKey(name))
            directoryMap.put(name, phoneNumber); //if name is a key, update listing
        else
            directoryMap.put(name, phoneNumber); //otherwise - create new entry with name
    }

    public void DeleteEntry(String name) {
        if (directoryMap.containsKey(name))
            directoryMap.remove(name);
        else
            System.out.println("The person you are looking for is not in this directory.");
    }

    public void Write() {
        try (BufferedWriter writeDestination = new BufferedWriter(new FileWriter(directory))) {
            for (String key : directoryMap.keySet()) {
                writeDestination.write(key + ", " + directoryMap.get(key) + '\n');
                writeDestination.newLine();
            }
        }
        catch (IOException err) {
            err.printStackTrace();
        }
    }
}
Driver.java
public class Driver {
    PhoneDirectory list1;

    public static void main(String[] args) {
        PhoneDirectory list1 = new PhoneDirectory("test.txt");
        list1.AddOrChangeEntry("Disney World", "123-456-7890");
        list1.Write();
    }
}
Essentially I'm creating a file called "test.txt" and adding the line "Disney World, 123-456-7890" - what's weird is that the code still works, but it throws that error anyway, so what's really happening? (For the record, I'm referring to the directoryMap.put(fileData[0], fileData[1]) lines in the two constructors.)
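One plausible explanation, going only by the posted code: Write() outputs both '\n' and newLine() for every entry, so the file gains a blank line after each record. On the next run the constructor reads those blank lines, "".split(",") returns a one-element array, and fileData[1] throws the ArrayIndexOutOfBoundsException. A defensive version of the read loop (a sketch, not a confirmed fix) would skip short lines:
while ((currentLine = buffer.readLine()) != null) {
    String[] fileData = currentLine.split(",");
    if (fileData.length < 2) {
        continue; // skip blank or malformed lines instead of indexing past the end
    }
    directoryMap.put(fileData[0], fileData[1].trim());
}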

Avoid repetition when writing strings to text file line by line

I use the following code to write strings to my simple text file:
EDITED:
private String fileLocation = "/mnt/sdcard/out.txt";

public void saveHisToFile() {
    if (prefs.getBoolean("saveHis", true) && mWordHis != null && mWordHis.size() >= 1) {
        StringBuilder sbHis = new StringBuilder();
        Set<String> wordSet = new HashSet<String>(mWordHis);
        for (String item : wordSet) {
            sbHis.append(item);
            sbHis.append("\n");
        }
        String strHis = sbHis.substring(0, sbHis.length() - 1);
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter(new File(fileLocation), true));
            bw.write(strHis);
            bw.newLine();
            bw.close();
        } catch (IOException e) {
        }
    }
}
The strings are successfully written to the text file, but weirdly, some strings are written more than once, such as:
apple
orange
grapes
grapes
grapes
apple
kiwi
My questions are:
how can I stop a string from being written more than once?
how can I stop writing a string (a line) to the file if it already exists in the file?
I have consulted this post but failed to apply it to my case. Can you please give a little help? Thanks a lot in advance.
Try this:
public void saveHisToFile(Set<String> existingWords) {
    if (prefs.getBoolean("saveHis", true) && mWordHis != null && mWordHis.size() >= 1) {
        StringBuilder sbHis = new StringBuilder();
        for (String item : mWordHis) {
            if (!existingWords.contains(item)) {
                sbHis.append(item);
                sbHis.append("\n");
            }
        }
        if (sbHis.length() == 0) {
            return; // nothing new to write; also avoids substring(0, -1) below
        }
        String strHis = sbHis.substring(0, sbHis.length() - 1);
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter(new File(fileLocation), true));
            bw.write(strHis);
            bw.newLine();
            bw.close();
        } catch (IOException e) {
        }
    }
}
I guess mWordHis is a List, which can contain duplicate entries.
You can first convert it to a Set (which doesn't allow duplicates) and print only the words in the Set.
Set<String> wordSet = new HashSet<>(mWordHis);
for (String item : wordSet) {
    sbHis.append(item);
    sbHis.append("\n");
}
As @fge commented, a LinkedHashSet may also be used if insertion order matters.
If you need to run the same code several times against the same file, you must either keep in memory all the records you've already written to the file, or read the file back to get the existing data before writing to it.
Edit:
I can only think of trimming the words, as some may contain unneeded spaces:
Set<String> wordSet = new HashSet<>();
for (String item : mWordHis) {
    wordSet.add(item.trim());
}
This is a complete example of how to solve your problem:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;

public class HisSaver {
    private HashSet<String> uniqueTester = new HashSet<String>();
    private String fileLocation = "/mnt/sdcard/out.txt";
    private static HisSaver instance = null;

    private HisSaver() {
        readWordsFromFile();
    }

    public static HisSaver getInstance() {
        if (instance == null)
            instance = new HisSaver();
        return instance;
    }

    public void saveWord(String word) {
        if (!uniqueTester.contains(word)) {
            uniqueTester.add(word);
            writeWordToFile(word);
        }
    }

    private void writeWordToFile(String word) {
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter(new File(fileLocation), true));
            bw.write(word);
            bw.newLine();
            bw.close();
        } catch (IOException e) {
        }
    }

    private void readWordsFromFile() {
        try {
            BufferedReader br = new BufferedReader(new FileReader(new File(fileLocation)));
            String line;
            while ((line = br.readLine()) != null) {
                if (!uniqueTester.contains(line)) {
                    uniqueTester.add(line);
                }
            }
        } catch (IOException e) {
        }
    }
}
Now to use this, you simply do the following in your code:
HisSaver hs = HisSaver.getInstance();
hs.saveWord("newWord");
This will insert the "newWord" if and only if it is not already in your file, provided that no other function in your code accesses this file. Please note: this solution is NOT thread safe!!!
Edit: Explanation of what the code does:
We create a class HisSaver which is a singleton. This is realized by making its constructor private and providing a static method getInstance() which returns an initialized HisSaver. This HisSaver will already contain all pre-existing words in your file and thus only appends new words to it. Calling getInstance() from another class gives you a handle to this singleton and lets you call saveWord without having to worry about whether you have the right object in your hands, since only one instance of it can ever be instantiated.
You could add all the strings into a HashMap and check, for each new String, whether it is already in there.
Example:
HashMap<String, String> test = new HashMap<String, String>();
if (!test.containsKey(item)) {
    test.put(item, "");
    // your processing: example
    System.out.println(item);
} else {
    // your processing of duplicates, example:
    System.out.println("Found duplicate of: " + item);
}
Edit: or use a HashSet as shown by the other solutions ...
HashSet<String> test = new HashSet<String>();
if (!test.contains(item)) {
    test.add(item);
    // your processing: example
    System.out.println(item);
} else {
    // your processing of duplicates, example:
    System.out.println("Found duplicate of: " + item);
}
Edit2:
private String fileLocation = "/mnt/sdcard/out.txt";

public void saveHisToFile() {
    if (prefs.getBoolean("saveHis", true) && mWordHis != null && mWordHis.size() >= 1) {
        StringBuilder sbHis = new StringBuilder();
        HashSet<String> test = new HashSet<String>();
        Set<String> wordSet = new HashSet<String>(mWordHis);
        for (String item : wordSet) {
            if (!test.contains(item)) {
                test.add(item);
                // your processing: example
                sbHis.append(item + System.lineSeparator());
            } else {
                // your processing of duplicates, example:
                //System.out.println("Found duplicate of: " + item);
            }
        }
        String strHis = sbHis.toString();
        try {
            BufferedWriter bw = new BufferedWriter(new FileWriter(new File(fileLocation), true));
            bw.write(strHis);
            bw.newLine();
            bw.close();
        } catch (IOException e) {
        }
    }
}

Hadoop - Writing to HBase directly from the Mapper

I have a Hadoop job whose output should be written to HBase. I do not really need a reducer; the kind of row I would like to insert is determined in the Mapper.
How can I use TableOutputFormat to achieve this? In all the examples I have seen, the assumption is that the reducer is the one creating the Put, and that TableMapper is just for reading from an HBase table.
In my case the input is HDFS and the output is a Put to a specific table. I cannot find anything in TableMapReduceUtil that can help me with that either.
Is there any example out there that can help me with that?
BTW, I am using the new Hadoop API
This is an example of reading from a file and putting all lines into HBase. It is from "HBase: The Definitive Guide" and you can find it in the book's repository. To get it, just clone the repo onto your computer:
git clone git://github.com/larsgeorge/hbase-book.git
In the book you can also find explanations of all the code. If something is incomprehensible to you, feel free to ask.
public class ImportFromFile {
    public static final String NAME = "ImportFromFile";
    public enum Counters { LINES }

    static class ImportMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Writable> {
        private byte[] family = null;
        private byte[] qualifier = null;

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            String column = context.getConfiguration().get("conf.column");
            byte[][] colkey = KeyValue.parseColumn(Bytes.toBytes(column));
            family = colkey[0];
            if (colkey.length > 1) {
                qualifier = colkey[1];
            }
        }

        @Override
        public void map(LongWritable offset, Text line, Context context)
                throws IOException {
            try {
                String lineString = line.toString();
                byte[] rowkey = DigestUtils.md5(lineString);
                Put put = new Put(rowkey);
                put.add(family, qualifier, Bytes.toBytes(lineString));
                context.write(new ImmutableBytesWritable(rowkey), put);
                context.getCounter(Counters.LINES).increment(1);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    private static CommandLine parseArgs(String[] args) throws ParseException {
        Options options = new Options();
        Option o = new Option("t", "table", true,
                "table to import into (must exist)");
        o.setArgName("table-name");
        o.setRequired(true);
        options.addOption(o);
        o = new Option("c", "column", true,
                "column to store row data into (must exist)");
        o.setArgName("family:qualifier");
        o.setRequired(true);
        options.addOption(o);
        o = new Option("i", "input", true,
                "the directory or file to read from");
        o.setArgName("path-in-HDFS");
        o.setRequired(true);
        options.addOption(o);
        options.addOption("d", "debug", false, "switch on DEBUG log level");
        CommandLineParser parser = new PosixParser();
        CommandLine cmd = null;
        try {
            cmd = parser.parse(options, args);
        } catch (Exception e) {
            System.err.println("ERROR: " + e.getMessage() + "\n");
            HelpFormatter formatter = new HelpFormatter();
            formatter.printHelp(NAME + " ", options, true);
            System.exit(-1);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] otherArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
        CommandLine cmd = parseArgs(otherArgs);
        String table = cmd.getOptionValue("t");
        String input = cmd.getOptionValue("i");
        String column = cmd.getOptionValue("c");
        conf.set("conf.column", column);
        Job job = new Job(conf, "Import from file " + input + " into table " + table);
        job.setJarByClass(ImportFromFile.class);
        job.setMapperClass(ImportMapper.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, table);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Writable.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(input));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
You just need to make the mapper output the key/value pair. The OutputFormat only specifies how the output key-values are persisted; it does not necessarily mean that the key-values come from the reducer.
You would need to do something like this in the mapper:
... extends Mapper<LongWritable, Text, ImmutableBytesWritable, Writable> { // plain Mapper, not TableMapper: TableMapper is for reading from HBase
    ...
    ...
    context.write(<some key>, <some Put or Delete object>);
}
