I am new to Storm, but I have configured Storm on my local machine. I made an Eclipse project and followed a simple example from the internet. Now my topology gets submitted, but it is not working.
Was the topology submitted?
Yes, it was submitted successfully; I can see it on the Storm UI.
The job of my topology is simply to print a number if it is prime, but it is not printing anything.
I have provided my code as follows:
Spout Class:
public class NumberSpout extends BaseRichSpout
{
private SpoutOutputCollector collector;
private static final Logger LOGGER = Logger.getLogger(SpoutOutputCollector.class);
private static int currentNumber = 1;
@Override
public void open( Map conf, TopologyContext context, SpoutOutputCollector collector )
{
this.collector = collector;
}
@Override
public void nextTuple()
{
// Emit the next number
LOGGER.info("Coming in spout tuble method");
collector.emit( new Values( new Integer( currentNumber++ ) ) );
}
@Override
public void ack(Object id)
{
}
@Override
public void fail(Object id)
{
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer)
{
declarer.declare( new Fields( "number" ) );
}
}
Bolt Class:
public class PrimeNumberBolt extends BaseRichBolt
{ private static final Logger LOGGER = Logger.getLogger(PrimeNumberBolt.class);
private OutputCollector collector;
public void prepare( Map conf, TopologyContext context, OutputCollector collector )
{
this.collector = collector;
}
public void execute( Tuple tuple )
{
int number = tuple.getInteger( 0 );
if( isPrime( number) )
{
LOGGER.info("Prime number printed is: )" +number);
System.out.println( number );
}
collector.ack( tuple );
}
public void declareOutputFields( OutputFieldsDeclarer declarer )
{
declarer.declare( new Fields( "number" ) );
}
private boolean isPrime( int n )
{
if( n == 1 || n == 2 || n == 3 )
{
return true;
}
// Is n an even number?
if( n % 2 == 0 )
{
return false;
}
//if not, then just check the odds
for( int i=3; i*i<=n; i+=2 )
{
if( n % i == 0)
{
return false;
}
}
return true;
}
}
Topology Class:
public class PrimeNumberTopology
{
public static void main(String[] args)
{
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout( "spout", new NumberSpout(),1 );
builder.setBolt( "prime", new PrimeNumberBolt(),1 )
.shuffleGrouping("spout");
Config conf = new Config();
conf.put(Config.NIMBUS_HOST, "127.0.0.1");
conf.setDebug(true);
Map storm_conf = Utils.readStormConfig();
storm_conf.put("nimbus.host", "127.0.0.1");
Client client = NimbusClient.getConfiguredClient(storm_conf)
.getClient();
String inputJar = "/home/jamil/Downloads/storm-twitter-word-count-master/target/storm-test-1.0-SNAPSHOT.jar";
NimbusClient nimbus = new NimbusClient("127.0.0.1",6627);
// upload topology jar to Cluster using StormSubmitter
String uploadedJarLocation = StormSubmitter.submitJar(storm_conf,
inputJar);
try {
String jsonConf = JSONValue.toJSONString(storm_conf);
nimbus.getClient().submitTopology("newtesttopology",
uploadedJarLocation, jsonConf, builder.createTopology());
} catch (AlreadyAliveException ae) {
ae.printStackTrace();
} catch (InvalidTopologyException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Now I want to ask: why is it not printing? And why is it not writing anything to the log files?
PLUS: I am submitting the topology from Eclipse.
In addition to what @Thomas Jungblut said (regarding your log4j configuration), and assuming that is the complete source code of your topology, have a look at the nextTuple() method of your spout.
Your spout simply emits one value and that's it. Chances are good that you are missing the output of that emit in your console because it is buried under a ton of other logging output.
Are you sure that you want to emit just one value?
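Not part of the original answer, but if the intent is to emit a continuous stream of numbers, a common tweak (a sketch only, using the backtype.storm API from the question) is to throttle nextTuple() so each emitted number and its log line are easy to spot among Storm's own output:
@Override
public void nextTuple()
{
    // Hypothetical variant of the spout's nextTuple(): slow the emission down
    // so the per-tuple log line isn't buried under Storm's other logging.
    Utils.sleep(1000); // backtype.storm.utils.Utils
    LOGGER.info("Emitting number: " + currentNumber);
    collector.emit(new Values(currentNumber++));
}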
Related
I have an .rpt file from which I will be generating multiple reports in PDF format, using the Engine class from inet clear reports. The process takes very long, as I have nearly 10,000 reports to generate. Can I use multi-threading or some other approach to speed up the process?
Any help on how this can be done would be appreciated.
My partial code:
//Loops
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); //rpt is the report name
if (cn == null || cn.isClosed()) {
cn = ds.getConnection();
}
eng.setConnection(cn);
System.out.println(" After set connection");
eng.setPrompt(data[i], 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION, ReportProperties.PAPER_FANFOLD_US);
eng.execute();
System.out.println(" After excecute");
try {
PDFExportThread pdfExporter = new PDFExportThread(eng, sFileName, sFilePath);
pdfExporter.execute();
} catch (Exception e) {
e.printStackTrace();
}
PDFExportThread execute
public void execute() throws IOException {
FileOutputStream fos = null;
try {
String FileName = sFileName + "_" + (eng.getPageCount() - 1);
File file = new File(sFilePath + FileName + ".pdf");
if (!file.getParentFile().exists()) {
file.getParentFile().mkdirs();
}
if (!file.exists()) {
file.createNewFile();
}
fos = new FileOutputStream(file);
for (int k = 1; k <= eng.getPageCount(); k++) {
fos.write(eng.getPageData(k));
}
fos.flush();
fos.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
if (fos != null) {
fos.close();
fos = null;
}
}
}
This is very basic code. A ThreadPoolExecutor with a fixed number of threads in the pool is the backbone.
Some considerations:
The thread pool size should be equal to or less than the DB connection pool size, and it should be an optimal number that is reasonable for running Engines in parallel.
The main thread should wait for sufficient time before killing all threads. I have put 1 hour as the wait time, but that's just an example.
You'll need to have proper Exception handling.
From the API doc, I saw stopAll and shutdown methods on the Engine class, so I'm invoking them as soon as the work is done. That's, again, just an example.
Hope this helps.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
public class RunEngine {
public static void main(String[] args) throws Exception {
final String rpt = "/tmp/rpt/input/rpt-1.rpt";
final String sFilePath = "/tmp/rpt/output/";
final String sFileName = "pdfreport";
final Object[] data = new Object[10];
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(10);
for (int i = 0; i < data.length; i++) {
PDFExporterRunnable runnable = new PDFExporterRunnable(rpt, data[i], sFilePath, sFileName, i);
executor.execute(runnable);
}
executor.shutdown();
executor.awaitTermination(1L, TimeUnit.HOURS);
Engine.stopAll();
Engine.shutdown();
}
private static class PDFExporterRunnable implements Runnable {
private final String rpt;
private final Object data;
private final String sFilePath;
private final String sFileName;
private final int runIndex;
public PDFExporterRunnable(String rpt, Object data, String sFilePath,
String sFileName, int runIndex) {
this.rpt = rpt;
this.data = data;
this.sFilePath = sFilePath;
this.sFileName = sFileName;
this.runIndex = runIndex;
}
@Override
public void run() {
// Loops
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); // rpt is the report name
Connection cn = null;
/*
* DB connection related code. Check and use.
*/
//if (cn.isClosed() || cn == null) {
//cn = ds.getConnection();
//}
eng.setConnection(cn);
System.out.println(" After set connection");
eng.setPrompt(data, 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION,
ReportProperties.PAPER_FANFOLD_US);
eng.execute();
System.out.println(" After excecute");
FileOutputStream fos = null;
try {
String FileName = sFileName + "_" + runIndex;
File file = new File(sFilePath + FileName + ".pdf");
if (!file.getParentFile().exists()) {
file.getParentFile().mkdirs();
}
if (!file.exists()) {
file.createNewFile();
}
fos = new FileOutputStream(file);
for (int k = 1; k <= eng.getPageCount(); k++) {
fos.write(eng.getPageData(k));
}
fos.flush();
fos.close();
} catch (Exception e) {
e.printStackTrace();
} finally {
if (fos != null) {
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
fos = null;
}
}
}
}
/*
* Dummy classes to avoid compilation errors.
*/
private static class ReportProperties {
public static final String PAPER_FANFOLD_US = null;
public static final String DEFAULT_PAPER_ORIENTATION = null;
public void setPaperOrient(String defaultPaperOrientation, String paperFanfoldUs) {
}
}
private static class Engine {
public static final int EXPORT_PDF = 1;
public Engine(int exportType) {
}
public static void shutdown() {
}
public static void stopAll() {
}
public void setPrompt(Object singleData, int i) {
}
public byte[] getPageData(int k) {
return null;
}
public int getPageCount() {
return 0;
}
public void execute() {
}
public ReportProperties getReportProperties() {
return null;
}
public void setConnection(Connection cn) {
}
public void setReportFile(String reportFile) {
}
}
}
I will offer this "answer" as a possible quick & dirty solution to get you started on a parallelization effort.
One way or another you're going to build a render farm.
I don't think there is a trivial way to do this in Java; I would love for someone to post an answer that shows how to parallelize your example in just a few lines of code. But until that happens, this will hopefully help you make some progress.
You're going to have limited scaling in the same JVM instance.
But... let's see how far you get with that and see if it helps enough.
Design challenge #1: restarting.
You will probably want a place to keep the status for each of your reports e.g. "units of work".
You want this in case you need to re-start everything (maybe your server crashes) and you don't want to re-run all of the reports thus far.
There are lots of ways you can do this: a database, or checking whether a "completed" marker file exists in your report folder (it is not sufficient for the *.pdf to exist, as that file may be incomplete; for xyz_200.pdf you could create an empty xyz_200.done or xyz_200.err file to help with re-running any problem children). By the time you have coded up that file manipulation/checking/initialization logic, it may have been easier to add a column to your database that holds the list of work to be done.
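For illustration only (not part of the solution code below), a minimal marker-file helper might look like this; the class and file names are assumptions:
import java.io.File;
import java.io.IOException;

// Hypothetical helper: ".done" marker files make a report run restartable.
public final class ReportMarkers {
    private ReportMarkers() {}

    // True if a previous run already finished this unit of work.
    public static boolean isDone(String dir, String baseName) {
        return new File(dir, baseName + ".done").exists();
    }

    // Call this only after the PDF has been completely written and flushed.
    public static void markDone(String dir, String baseName) throws IOException {
        new File(dir, baseName + ".done").createNewFile();
    }
}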
Design consideration #2: maximizing throughput (avoiding overload).
You don't want to saturate your system and run one thousand reports in parallel.
Maybe 10.
Maybe 100.
Probably not 5,000.
You will need to do some sizing research and see what gets you near 80 to 90% system utilization.
Design consideration #3: scaling across multiple servers
Overly complex, outside the scope of a Stack Exchange answer.
You'd have to spin up JVMs on multiple systems running something like the workers below, plus a report manager that can pull work items from a shared "queue" structure; again, a database table is probably easier here than something file-based (or a network feed).
Sample Code
Caution: none of this code is well tested; it almost certainly has an abundance of typos, logic errors, and poor design. Use at your own risk.
So anyway... I do want to give you the basic idea of a rudimentary task runner.
Replace your "// Loops" example in the question with code like the following:
main loop (original code example)
This is more or less doing what your example code did, modified to push most of the work into ReportWorker (new class, see below). Lots of stuff seems to be packed into your original question's example of "// Loop", so I'm not trying to reverse engineer that.
fwiw, it was unclear to me where "rpt" and "data[i]" are coming from so I hacked up some test data.
public class Main {
public static boolean complete( String data ) {
return false; // for testing nothing is complete.
}
public static void main(String args[] ) {
String data[] = new String[] {
"A",
"B",
"C",
"D",
"E" };
String rpt = "xyz";
// Loop
ReportManager reportMgr = new ReportManager(); // a new helper class (see below), it assigns/monitors work.
long startTime = System.currentTimeMillis();
for( int i = 0; i < data.length; ++i ) {
// complete is something you should write that knows if a report "unit of work"
// finished successfully.
if( !complete( data[i] ) ) {
reportMgr.assignWork( rpt, data[i] ); // so... where did values for your "rpt" variable come from?
}
}
reportMgr.waitForWorkToFinish(); // out of new work to assign, let's wait until everything in-flight complete.
long endTime = System.currentTimeMillis();
System.out.println("Done. Elapsed time = " + (endTime - startTime)/1000 +" seconds.");
}
}
ReportManager
This class is not thread safe. Just have your original loop keep calling assignWork() until you're out of reports to assign, then call waitForWorkToFinish() until all in-flight work is done, as shown above. (FWIW, I don't think you could say any of the classes here are especially thread safe.)
public class ReportManager {
public int polling_delay = 500; // wait 0.5 seconds for testing.
//public int polling_delay = 60 * 1000; // wait 1 minute.
// not high throughput millions of reports / second, we'll run at a slower tempo.
public int nWorkers = 3; // just 3 for testing.
public int assignedCnt = 0;
public ReportWorker workers[];
public ReportManager() {
// initialize our manager.
workers = new ReportWorker[ nWorkers ];
for( int i = 0; i < nWorkers; ++i ) {
workers[i] = new ReportWorker( i );
System.out.println("Created worker #"+i);
}
}
private ReportWorker handleWorkerError( int i ) {
// something went wrong, update our "report" status as one of the reports failed.
System.out.println("handlerWokerError(): failure in "+workers[i]+", resetting worker.");
workers[i].teardown();
workers[i] = new ReportWorker( i ); // just replace everything.
return workers[i]; // the new worker will, incidentally, be available.
}
private ReportWorker handleWorkerComplete( int i ) {
// this unit of work was completed, update our "report" status tracker as success.
System.out.println("handleWorkerComplete(): success in "+workers[i]+", resetting worker.");
workers[i].teardown();
workers[i] = new ReportWorker( i ); // just replace everything.
return workers[i]; // the new worker will, incidentally, be available.
}
private int activeWorkerCount() {
int activeCnt = 0;
for( int i = 0; i < nWorkers; ++i ) {
ReportWorker worker = workers[i];
System.out.println("activeWorkerCount() i="+i+", checking worker="+worker);
if( worker.hasError() ) {
worker = handleWorkerError( i );
}
if( worker.isComplete() ) {
worker = handleWorkerComplete( i );
}
if( worker.isInitialized() || worker.isRunning() ) {
++activeCnt;
}
}
System.out.println("activeWorkerCount() activeCnt="+activeCnt);
return activeCnt;
}
private ReportWorker getAvailableWorker() {
// check each worker to see if anybody recently completed...
// This (rather lazily) creates completely new ReportWorker instances.
// You might want to try pooling (salvaging and reinitializing them)
// to see if that helps your performance.
System.out.println("\n-----");
ReportWorker firstAvailable = null;
for( int i = 0; i < nWorkers; ++i ) {
ReportWorker worker = workers[i];
System.out.println("getAvailableWorker(): i="+i+" worker="+worker);
if( worker.hasError() ) {
worker = handleWorkerError( i );
}
if( worker.isComplete() ) {
worker = handleWorkerComplete( i );
}
if( worker.isAvailable() && firstAvailable==null ) {
System.out.println("Apparently worker "+worker+" is 'available'");
firstAvailable = worker;
System.out.println("getAvailableWorker(): i="+i+" now firstAvailable = "+firstAvailable);
}
}
return firstAvailable; // May (or may not) be null.
}
public void assignWork( String rpt, String data ) {
ReportWorker worker = getAvailableWorker();
while( worker == null ) {
System.out.println("assignWork: No workers available, sleeping for "+polling_delay);
try { Thread.sleep( polling_delay ); }
catch( InterruptedException e ) { System.out.println("assignWork: sleep interrupted, ignoring exception "+e); }
// any workers available now?
worker = getAvailableWorker();
}
++assignedCnt;
worker.initialize( rpt, data ); // or whatever else you need.
System.out.println("assignment #"+assignedCnt+" given to "+worker);
Thread t = new Thread( worker );
t.start( ); // that is pretty much it, let it go.
}
public void waitForWorkToFinish() {
int active = activeWorkerCount();
while( active >= 1 ) {
System.out.println("waitForWorkToFinish(): #active workers="+active+", waiting...");
// wait a minute....
try { Thread.sleep( polling_delay ); }
catch( InterruptedException e ) { System.out.println("assignWork: sleep interrupted, ignoring exception "+e); }
active = activeWorkerCount();
}
}
}
ReportWorker
public class ReportWorker implements Runnable {
int test_delay = 10*1000; //sleep for 10 seconds.
// (actual code would be generating PDF output)
public enum StatusCodes { UNINITIALIZED,
INITIALIZED,
RUNNING,
COMPLETE,
ERROR };
int id = -1;
StatusCodes status = StatusCodes.UNINITIALIZED;
boolean initialized = false;
public String rpt = "";
public String data = "";
//Engine eng;
//PDFExportThread pdfExporter;
//DataSource_type cn;
public boolean isInitialized() { return initialized; }
public boolean isAvailable() { return status == StatusCodes.UNINITIALIZED; }
public boolean isRunning() { return status == StatusCodes.RUNNING; }
public boolean isComplete() { return status == StatusCodes.COMPLETE; }
public boolean hasError() { return status == StatusCodes.ERROR; }
public ReportWorker( int id ) {
this.id = id;
}
public String toString( ) {
return "ReportWorker."+id+"("+status+")/"+rpt+"/"+data;
}
// the example code doesn't make clear if there is a relationship between rpt & data[i].
public void initialize( String rpt, String data /* data[i] in original code */ ) {
try {
this.rpt = rpt;
this.data = data;
/* uncomment this part where you have the various classes available.
* I have it commented out for testing.
cn = ds.getConnection();
Engine eng = new Engine(Engine.EXPORT_PDF);
eng.setReportFile(rpt); //rpt is the report name
eng.setConnection(cn);
eng.setPrompt(data, 0);
ReportProperties repprop = eng.getReportProperties();
repprop.setPaperOrient(ReportProperties.DEFAULT_PAPER_ORIENTATION, ReportProperties.PAPER_FANFOLD_US);
*/
status = StatusCodes.INITIALIZED;
initialized = true; // want this true even if we're running.
} catch( Exception e ) {
status = StatusCodes.ERROR;
throw new RuntimeException("initialze(rpt="+rpt+", data="+data+")", e);
}
}
public void run() {
status = StatusCodes.RUNNING;
System.out.println("run().BEGIN: "+this);
try {
// delay for testing.
try { Thread.sleep( test_delay ); }
catch( InterruptedException e ) { System.out.println(this+".run(): test interrupted, ignoring "+e); }
/* uncomment this part where you have the various classes available.
* I have it commented out for testing.
eng.execute();
PDFExportThread pdfExporter = new PDFExportThread(eng, sFileName, sFilePath);
pdfExporter.execute();
*/
status = StatusCodes.COMPLETE;
System.out.println("run().END: "+this);
} catch( Exception e ) {
System.out.println("run().ERROR: "+this);
status = StatusCodes.ERROR;
throw new RuntimeException("run(rpt="+rpt+", data="+data+")", e);
}
}
public void teardown() {
if( ! isInitialized() || isRunning() ) {
System.out.println("Warning: ReportWorker.teardown() called but I am uninitailzied or running.");
// should never happen, fatal enough to throw an exception?
}
/* commented out for testing.
try { cn.close(); }
catch( Exception e ) { System.out.println("Warning: ReportWorker.teardown() ignoring error on connection close: "+e); }
cn = null;
*/
// any need to close things on eng?
// any need to close things on pdfExporter?
}
}
I have been testing remote submission of Storm topologies using an IDE (Eclipse).
I succeeded in uploading a simple Storm topology to a remote Storm cluster, but the weird thing is that when I checked the Storm UI to see whether the remotely submitted topology was working without problems, I saw only the _acker bolt in the UI; the other bolts and the spout were not there. After that I submitted the topology manually from the command line and checked the Storm UI again, and it worked as it is supposed to without problems. I have been looking for the reason but couldn't find it. I have attached both the topology and the remote submitter class below, along with the corresponding Storm UI pictures:
This is the output from the Eclipse console (after remote submission):
225 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar T:\STORM_TOPOLOGIES\Benchmark.jar to assigned location: /app/storm/nimbus/inbox/stormjar-d3ca2e14-c1d4-45e1-b21c-70f62c62cd84.jar
234 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /app/storm/nimbus/inbox/stormjar-d3ca2e14-c1d4-45e1-b21c-70f62c62cd84.jar
Here is the topology:
public class StormBenchmark {
// ******************************************************************************************
public static class GenSpout extends BaseRichSpout {
//private static final Logger logger = Logger.getLogger(StormBenchmark.class.getName());
private Long count = 1L;
private Object msgID;
private static final long serialVersionUID = 1L;
private static final Character[] CHARS = new Character[] { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
private static final String[] newsagencies = {"bbc", "cnn", "reuters", "aljazeera", "nytimes", "nbc news", "fox news", "interfax"};
SpoutOutputCollector _collector;
int _size;
Random _rand;
String _id;
String _val;
// Constructor
public GenSpout(int size) {
_size = size;
}
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
_collector = collector;
_rand = new Random();
_id = randString(5);
_val = randString2(_size);
}
//Business logic
public void nextTuple() {
count++;
msgID = count;
_collector.emit(new Values(_id, _val), msgID);
}
public void ack(Object msgID) {
this.msgID = msgID;
}
private String randString(int size) {
StringBuffer buf = new StringBuffer();
for(int i=0; i<size; i++) {
buf.append(CHARS[_rand.nextInt(CHARS.length)]);
}
return buf.toString();
}
private String randString2(int size) {
StringBuffer buf = new StringBuffer();
for(int i=0; i<size; i++) {
buf.append(newsagencies[_rand.nextInt(newsagencies.length)]);
}
return buf.toString();
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "item"));
}
}
// =======================================================================================================
// =================================== B O L T ===========================================================
public static class IdentityBolt extends BaseBasicBolt {
private static final long serialVersionUID = 1L;
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("id", "item"));
}
public void execute(Tuple tuple, BasicOutputCollector collector) {
String character = tuple.getString(0);
String agency = tuple.getString(1);
List<String> box = new ArrayList<String>();
box.add(character);
box.add(agency);
try {
fileWriter(box);
} catch (IOException e) {
e.printStackTrace();
}
box.clear();
}
public void fileWriter(List<String> listjon) throws IOException {
String pathname = "/home/hduser/logOfStormTops/logs.txt";
File file = new File(pathname);
if (!file.exists()){
file.createNewFile();
}
BufferedWriter writer = new BufferedWriter(new FileWriter(file, true));
writer.write(listjon.get(0) + " : " + listjon.get(1));
writer.newLine();
writer.flush();
writer.close();
}
}
//storm jar storm-benchmark-0.0.1-SNAPSHOT-standalone.jar storm.benchmark.ThroughputTest demo 100 8 8 8 10000
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new GenSpout(8), 2).setNumTasks(4);
builder.setBolt("bolt", new IdentityBolt(), 4).setNumTasks(8)
.shuffleGrouping("spout");
Config conf = new Config();
conf.setMaxSpoutPending(200);
conf.setStatsSampleRate(0.0001);
//topology.executor.receive.buffer.size: 8192 #batched
//topology.executor.send.buffer.size: 8192 #individual messages
//topology.transfer.buffer.size: 1024 # batched
conf.put("topology.executor.send.buffer.size", 1024);
conf.put("topology.transfer.buffer.size", 8);
conf.put("topology.receiver.buffer.size", 8);
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xdebug -Xrunjdwp:transport=dt_socket,address=1%ID%,server=y,suspend=n");
StormSubmitter.submitTopology("SampleTop", conf, builder.createTopology());
}
}
And here is the RemoteSubmitter class:
public class RemoteSubmissionTopo {
@SuppressWarnings({ "unchecked", "rawtypes", "unused" })
public static void main(String... args) {
Config conf = new Config();
TopologyBuilder topoBuilder = new TopologyBuilder();
conf.put(Config.NIMBUS_HOST, "117.16.142.49");
conf.setDebug(true);
Map stormConf = Utils.readStormConfig();
stormConf.put("nimbus.host", "117.16.142.49");
String jar_path = "T:\\STORM_TOPOLOGIES\\Benchmark.jar";
Client client = NimbusClient.getConfiguredClient(stormConf).getClient();
try {
NimbusClient nimbus = new NimbusClient(stormConf, "117.16.142.49", 6627);
String uploadedJarLocation = StormSubmitter.submitJar(stormConf, jar_path);
String jsonConf = JSONValue.toJSONString(stormConf);
nimbus.getClient().submitTopology("benchmark-tp", uploadedJarLocation, jsonConf, topoBuilder.createTopology());
} catch (TTransportException e) {
e.printStackTrace();
} catch (AlreadyAliveException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (InvalidTopologyException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
Thread.sleep(6000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
And here is the Storm UI picture (for the remote submission)
And here is the other Storm UI picture (for the manual submission)
In RemoteSubmissionTopo you create TopologyBuilder topoBuilder = new TopologyBuilder(); but never call setSpout(...)/setBolt(...). Thus, you are submitting a topology with no operators...
Btw: RemoteSubmissionTopo is actually not required at all. You can use StormBenchmark to submit remotely. Just add conf.put(Config.NIMBUS_HOST, "117.16.142.49"); in main, set the JVM option -Dstorm.jar=/path/to/topology.jar, and you are good to go.
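A minimal sketch of that change, reusing the GenSpout/IdentityBolt setup from the question (the Nimbus host is the one from this thread; the jar path in -Dstorm.jar is whatever your built topology jar is):
// Sketch: StormBenchmark.main adjusted for remote submission.
// Run with the JVM option -Dstorm.jar=T:\STORM_TOPOLOGIES\Benchmark.jar
public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new GenSpout(8), 2).setNumTasks(4);
    builder.setBolt("bolt", new IdentityBolt(), 4).setNumTasks(8).shuffleGrouping("spout");

    Config conf = new Config();
    conf.put(Config.NIMBUS_HOST, "117.16.142.49"); // point the submitter at the remote Nimbus
    StormSubmitter.submitTopology("SampleTop", conf, builder.createTopology());
}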
I'm trying to do a multi-get on my Redis data store, which is distributed across multiple shards. However, the keys I want to fetch do not belong to the same shard, so I can't use Redis' built-in multi-get.
Instead, I'm trying to use futures to achieve this. But after checking the lookup times, it almost seems as if these cache calls are being made serially.
The requests/sec on the server is about 1.5k, with an average response time of 10 ms. Literature I've read says that the thread pool size should be requests/sec * response time. Since I'm spawning 3 threads, this becomes 1500 * 0.010 * 3 = 45. I've tried thread pool sizes of 50, 100, and 300, but this hasn't helped either.
I'm using Jedis as a client. I thought it could be an issue with exceeding Jedis' max total/idle connection limit, but even after increasing it from 8 to 24 I see no difference in lookup times.
I understand that there will be some overhead from context switches and from spawning new threads.
Can anyone help me figure out what I'm missing? Let me know if you need more info.
for(String recordKey : pidArr) {
//Adding futures. Max 3
if(count >= 3) {
break;
}
count++;
Callable<String> a = new FeedCacheCaller(recordKey);
Future<String> future = feedThreadPool.submit(a);
futureList.add(future);
}
//Getting the data from the futures
for(Future<String> foo : futureList) {
try {
String data = foo.get();
logger.debug(data);
feedDataList.add(parseInfo(data));
} catch (Exception e) {
logger.error("somethings going wrong in retrieval",e);
}
}
Here's the Callable class
public class FeedCacheCaller implements Callable<String> {
String pid = null;
FeedCache feedCache;
public FeedCacheCaller(String pid) {
this.pid = pid;
this.feedCache = new FeedCache();
}
@Override
public String call() throws Exception {
return feedCache.get(pid);
}
}
Edit 1:
Here's the Jedis side code.
public class FeedCache {
private ShardedJedisPool feedClient = RedisPool.getPool("feed");
public String get(String key) {
ShardedJedis client = null;
String value = null;
try {
client = feedClient.getResource();
byte[] valueByteArray = client.get(key.getBytes(Constants.CHARSET));
if (valueByteArray != null) {
value = new String(CacheUtils.decompress(valueByteArray),
Constants.CHARSET);
}
} catch (JedisConnectionException e) {
if (client != null) {
feedClient.returnBrokenResource(client);
client = null;
}
logger.error(e.getMessage());
} finally {
if (client != null) {
feedClient.returnResource(client);
}
}
return value;
}
}
Here is the code that initializes the ShardedJedisPool
public class RedisPool {
private static final Logger logger = LoggerFactory.getLogger(
RedisPool.class);
private static ConcurrentHashMap<String, ShardedJedisPool> redisPools = new ConcurrentHashMap<String, ShardedJedisPool>();
public static void initializePool(String poolName) {
List<JedisShardInfo> shards = new ArrayList<JedisShardInfo>();
ArrayList<String> servers = new ArrayList<String>(Arrays.asList(
Constants.config.getStringArray(
poolName + "_redis_servers")));
for (int i = 0; i < servers.size(); i++) {
JedisShardInfo shardInfo = new JedisShardInfo(servers.get(i).split(":")[0], Integer.parseInt(servers.get(i).split(":")[1]));
shards.add(shardInfo);
}
redisPools.putIfAbsent(poolName,
new ShardedJedisPool(new GenericObjectPoolConfig(), shards));
}
public static ShardedJedisPool getPool(String poolName) {
if (!redisPools.containsKey(poolName)) {
synchronized (RedisPool.class) {
if (!redisPools.containsKey(poolName)) {
initializePool(poolName);
}
}
}
return redisPools.get(poolName);
}
public static void shutdown(String poolName) {
ShardedJedisPool pool = getPool(poolName);
pool.destroy();
redisPools.remove(poolName);
}
public static void main(String args[]) {
initializePool("vizidtoud");
}
}
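As a side note on the connection-limit point above: the pool is built with a default GenericObjectPoolConfig, so raising the max total/idle limits (the question mentions going from 8 to 24) would be done roughly as below. This is a sketch with hypothetical values; the last lines of initializePool() would become:
// Sketch: explicit Jedis pool limits instead of the defaults used above.
GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMaxTotal(24); // max ShardedJedis resources that can be checked out at once
poolConfig.setMaxIdle(24);  // idle resources kept around for reuse
redisPools.putIfAbsent(poolName, new ShardedJedisPool(poolConfig, shards));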
I have defined a filter for the termination condition of k-means.
If I run my app, it always computes only one iteration.
I think the problem is here:
DataSet<GeoTimeDataCenter> finalCentroids = loop.closeWith(newCentroids, newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter()));
or maybe the filter function:
public static final class MyFilter implements FilterFunction<Tuple2<GeoTimeDataCenter, GeoTimeDataCenter>> {
private static final long serialVersionUID = 5868635346889117617L;
public boolean filter(Tuple2<GeoTimeDataCenter, GeoTimeDataCenter> tuple) throws Exception {
if(tuple.f0.equals(tuple.f1)) {
return true;
}
else {
return false;
}
}
}
best regards,
paul
Here is my full code:
public void run() {
//load properties
Properties pro = new Properties();
FileSystem fs = null;
try {
pro.load(FlinkMain.class.getResourceAsStream("/config.properties"));
fs = FileSystem.get(new URI(pro.getProperty("hdfs.namenode")),new org.apache.hadoop.conf.Configuration());
} catch (Exception e) {
e.printStackTrace();
}
int maxIteration = Integer.parseInt(pro.getProperty("maxiterations"));
String outputPath = fs.getHomeDirectory()+pro.getProperty("flink.output");
// set up execution environment
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// get input points
DataSet<GeoTimeDataTupel> points = getPointDataSet(env);
DataSet<GeoTimeDataCenter> centroids = null;
try {
centroids = getCentroidDataSet(env);
} catch (Exception e1) {
e1.printStackTrace();
}
// set number of bulk iterations for KMeans algorithm
IterativeDataSet<GeoTimeDataCenter> loop = centroids.iterate(maxIteration);
DataSet<GeoTimeDataCenter> newCentroids = points
// compute closest centroid for each point
.map(new SelectNearestCenter(this.getBenchmarkCounter())).withBroadcastSet(loop, "centroids")
// count and sum point coordinates for each centroid
.groupBy(0).reduceGroup(new CentroidAccumulator())
// compute new centroids from point counts and coordinate sums
.map(new CentroidAverager(this.getBenchmarkCounter()));
// feed new centroids back into next iteration with termination condition
DataSet<GeoTimeDataCenter> finalCentroids = loop.closeWith(newCentroids, newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter()));
DataSet<Tuple2<Integer, GeoTimeDataTupel>> clusteredPoints = points
// assign points to final clusters
.map(new SelectNearestCenter(-1)).withBroadcastSet(finalCentroids, "centroids");
// emit result
clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " ");
finalCentroids.writeAsText(outputPath+"/centers");//print();
// execute program
try {
env.execute("KMeans Flink");
} catch (Exception e) {
e.printStackTrace();
}
}
public static final class MyFilter implements FilterFunction<Tuple2<GeoTimeDataCenter, GeoTimeDataCenter>> {
private static final long serialVersionUID = 5868635346889117617L;
public boolean filter(Tuple2<GeoTimeDataCenter, GeoTimeDataCenter> tuple) throws Exception {
if(tuple.f0.equals(tuple.f1)) {
return true;
}
else {
return false;
}
}
}
I think the problem is the filter function (modulo the code you haven't posted). Flink's termination criterion works the following way: The termination criterion is met if the provided termination DataSet is empty. Otherwise the next iteration is started if the maximum number of iterations has not been exceeded.
Flink's filter function keeps only those elements for which the FilterFunction returns true. Thus, with your MyFilter implementation you only keep the centroids which are identical before and after the iteration. This implies that you'll obtain an empty DataSet if all centroids have changed, and thus the iteration terminates. This is clearly the inverse of the actual termination criterion. The termination criterion should be: continue with k-means as long as there is a centroid which has changed.
You can do this with a coGroup function where you emit elements if there is no matching centroid from the preceding centroid DataSet. This is similar to a left outer join, except that you discard the non-null matches.
public static void main(String[] args) throws Exception {
// set up the execution environment
final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Element> oldDS = env.fromElements(new Element(1, "test"), new Element(2, "test"), new Element(3, "foobar"));
DataSet<Element> newDS = env.fromElements(new Element(1, "test"), new Element(3, "foobar"), new Element(4, "test"));
DataSet<Element> filtered = newDS.coGroup(oldDS).where("*").equalTo("*").with(new FilterCoGroup());
filtered.print();
}
public static class FilterCoGroup implements CoGroupFunction<Element, Element, Element> {
@Override
public void coGroup(
Iterable<Element> newElements,
Iterable<Element> oldElements,
Collector<Element> collector) throws Exception {
List<Element> persistedElements = new ArrayList<Element>();
for(Element element: oldElements) {
persistedElements.add(element);
}
for(Element newElement: newElements) {
boolean contained = false;
for(Element oldElement: persistedElements) {
if(newElement.equals(oldElement)){
contained = true;
}
}
if(!contained) {
collector.collect(newElement);
}
}
}
}
public static class Element implements Key {
private int id;
private String name;
public Element(int id, String name) {
this.id = id;
this.name = name;
}
public Element() {
this(-1, "");
}
@Override
public int hashCode() {
return 31 + 7 * name.hashCode() + 11 * id;
}
@Override
public boolean equals(Object obj) {
if(obj instanceof Element) {
Element element = (Element) obj;
return id == element.id && name.equals(element.name);
} else {
return false;
}
}
@Override
public int compareTo(Object o) {
if(o instanceof Element) {
Element element = (Element) o;
if(id == element.id) {
return name.compareTo(element.name);
} else {
return id - element.id;
}
} else {
throw new RuntimeException("Comparing incompatible types.");
}
}
@Override
public void write(DataOutputView dataOutputView) throws IOException {
dataOutputView.writeInt(id);
dataOutputView.writeUTF(name);
}
@Override
public void read(DataInputView dataInputView) throws IOException {
id = dataInputView.readInt();
name = dataInputView.readUTF();
}
@Override
public String toString() {
return "(" + id + "; " + name + ")";
}
}
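For completeness, a sketch of how such a coGroup-based difference could be plugged into the k-means loop from the question; it assumes a FilterCoGroup analogous to the one above but typed for GeoTimeDataCenter, and reuses the loop/newCentroids variables from the question's run() method:
// Sketch: the termination DataSet holds the centroids that changed this iteration;
// the bulk iteration stops as soon as it is empty.
DataSet<GeoTimeDataCenter> changedCentroids = newCentroids
    .coGroup(loop).where("*").equalTo("*")
    .with(new FilterCoGroup());
DataSet<GeoTimeDataCenter> finalCentroids = loop.closeWith(newCentroids, changedCentroids);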
So, I'm working on a plugin at work and I've run into a situation where I could use a ContentProposalAdapter to my benefit. Basically, a person will start typing in someone's name, and a list of names matching the current query will be returned in a type-ahead manner (a la Google). So, I created a class implementing IContentProposalProvider which, when its getProposals() method is called, fires off a thread that handles getting the proposals in the background. The problem I am having is a race condition: the proposals are fetched via HTTP, and I try to get them before they have actually been retrieved.
Now, I'm trying not to run into thread hell, and that isn't getting me very far anyway. So, here is what I've done so far. Does anyone have any suggestions as to what I can do?
public class ProfilesProposalProvider implements IContentProposalProvider, PropertyChangeListener {
private IContentProposal[] props;
@Override
public IContentProposal[] getProposals(String arg0, int arg1) {
Display display = PlatformUI.getWorkbench().getActiveWorkbenchWindow().getShell().getDisplay();
RunProfilesJobThread t1 = new RunProfilesJobThread(arg0, display);
t1.run();
return props;
}
@Override
public void propertyChange(PropertyChangeEvent arg0) {
if (arg0.getSource() instanceof RunProfilesJobThread){
RunProfilesJobThread thread = (RunProfilesJobThread)arg0.getSource();
props = thread.getProps();
}
}
}
public class RunProfilesJobThread extends Thread {
private ProfileProposal[] props;
private Display display;
private String query;
public RunProfilesJobThread(String query, Display display){
this.query = query;
}
@Override
public void run() {
if (!(query.equals(""))){
GetProfilesJob job = new GetProfilesJob("profiles", query);
job.schedule();
try {
job.join();
} catch (InterruptedException e) {
e.printStackTrace();
}
GetProfilesJobInfoThread thread = new GetProfilesJobInfoThread(job.getResults());
try {
thread.join();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
props = thread.getProps();
}
}
public ProfileProposal[] getProps(){
return props;
}
}
public class GetProfilesJobInfoThread extends Thread {
private ArrayList<String> names;
private ProfileProposal[] props;
public GetProfilesJobInfoThread(ArrayList<String> names){
this.names = names;
}
@Override
public void run() {
if (names != null){
props = new ProfileProposal[names.size()];
for (int i = 0; i < props.length - 1; i++){
ProfileProposal temp = new ProfileProposal(names.get(i), names.get(i));
props[i] = temp;
}
}
}
public ProfileProposal[] getProps(){
return props;
}
}
OK, I'll try it...
I haven't tried to run it, but it should work more or less. At least it's a good start. If you have any questions, feel free to ask.
public class ProfilesProposalProvider implements IContentProposalProvider {
private List<IContentProposal> proposals;
private String proposalQuery;
private Thread retrievalThread;
public void setProposals( List<IContentProposal> proposals, String query ) {
synchronized( this ) {
this.proposals = proposals;
this.proposalQuery = query;
}
}
public IContentProposal[] getProposals( String contents, int position ) {
// Synchronize incoming thread and retrieval thread, so that the proposal list
// is not replaced while we're processing it.
synchronized( this ) {
/**
* Get proposals if the query is longer than one char, or if the current list of proposals was retrieved
* for a query that is not a prefix of the new one, and only if the current retrieval thread is finished.
*/
if ( retrievalThread == null && contents.length() > 1 && ( proposals == null || !contents.startsWith( proposalQuery ) ) ) {
getProposals( contents );
}
/**
* Select valid proposals from retrieved list.
*/
if ( proposals != null ) {
List<IContentProposal> validProposals = new ArrayList<IContentProposal>();
for ( IContentProposal prop : proposals ) {
if(prop == null) {
continue;
}
String propVal = prop.getContent();
if ( isProposalValid( propVal, contents )) {
validProposals.add( prop );
}
}
return validProposals.toArray( new IContentProposal[ validProposals.size() ] );
}
}
return new IContentProposal[0];
}
protected void getProposals( final String query ) {
retrievalThread = new Thread() {
@Override
public void run() {
GetProfilesJob job = new GetProfilesJob("profiles", query);
job.schedule();
try {
job.join();
ArrayList<String> names = job.getResults();
if (names != null){
List<IContentProposal> props = new ArrayList<IContentProposal>();
for ( String name : names ) {
props.add( new ProfileProposal( name, name ) );
}
setProposals( props, query );
}
} catch (InterruptedException e) {
e.printStackTrace();
}
retrievalThread = null;
}
};
retrievalThread.start();
}
protected boolean isProposalValid( String proposalValue, String contents ) {
return ( proposalValue.length() >= contents.length() && proposalValue.substring(0, contents.length()).equalsIgnoreCase(contents));
}
}
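Not part of the answer itself, but for context, the provider is typically hooked up to a JFace ContentProposalAdapter roughly like this; the Text widget, its parent, and the null key stroke are placeholder assumptions:
// Sketch: wiring ProfilesProposalProvider to a Text field with JFace field assist.
Text nameField = new Text(parent, SWT.BORDER); // 'parent' is an assumed Composite
ProfilesProposalProvider provider = new ProfilesProposalProvider();
ContentProposalAdapter adapter = new ContentProposalAdapter(
        nameField,                 // control to decorate
        new TextContentAdapter(),  // reads/replaces the control's text
        provider,                  // the provider from this answer
        null,                      // no explicit key stroke
        null);                     // no auto-activation characters
adapter.setProposalAcceptanceStyle(ContentProposalAdapter.PROPOSAL_REPLACE);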