Unable to run JAR - Spark Twitter Streaming with Java

I'm running Spark 2.4.3 in standalone mode on Ubuntu and using Maven to build the JAR file. Below is the code I'm trying to run, which is intended to stream data from Twitter.
Once Spark is started, the Spark master is at 127.0.1.1:7077.
The Java version being used is 1.8.
package SparkTwitter.SparkJavaTwitter;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;
import scala.Tuple2;
import twitter4j.Status;
import twitter4j.auth.Authorization;
import twitter4j.auth.OAuthAuthorization;
import twitter4j.conf.Configuration;
import twitter4j.conf.ConfigurationBuilder;
import com.google.common.collect.Iterables;
public class TwitterStream {
    public static void main(String[] args) {
        // Prepare the Spark configuration by setting the application name and master "local[2]", i.e. embedded mode
        final SparkConf sparkConf = new SparkConf().setAppName("Twitter Data Processing").setMaster("local[2]");
        // Create a streaming context using the Spark configuration and the duration for which messages will be batched and fed to Spark Core
        final JavaStreamingContext streamingContext = new JavaStreamingContext(sparkConf, Duration.apply(10000));
        // Prepare the configuration for Twitter authentication and authorization
        final Configuration conf = new ConfigurationBuilder().setDebugEnabled(false)
                .setOAuthConsumerKey("consumer key")
                .setOAuthConsumerSecret("consumer key secret")
                .setOAuthAccessToken("access token")
                .setOAuthAccessTokenSecret("access token secret")
                .build();
        // Create a Twitter authorization object from the prepared configuration containing consumer and access keys and tokens
        final Authorization twitterAuth = new OAuthAuthorization(conf);
        // Create a data stream using the streaming context and Twitter authorization
        final JavaReceiverInputDStream<Status> inputDStream = TwitterUtils.createStream(streamingContext, twitterAuth, new String[]{});
        // Create a new stream by filtering out non-English tweets from the earlier stream
        final JavaDStream<Status> enTweetsDStream = inputDStream.filter((status) -> "en".equalsIgnoreCase(status.getLang()));
        // Convert the stream to a pair stream with the user's screen name as key and the tweet text as value
        final JavaPairDStream<String, String> userTweetsStream =
                enTweetsDStream.mapToPair(
                        (status) -> new Tuple2<String, String>(status.getUser().getScreenName(), status.getText())
                );
        // Group the tweets for each user
        final JavaPairDStream<String, Iterable<String>> tweetsReducedByUser = userTweetsStream.groupByKey();
        // Create a new pair stream by replacing the iterable of tweets in the older pair stream with the number of tweets
        final JavaPairDStream<String, Integer> tweetsMappedByUser = tweetsReducedByUser.mapToPair(
                userTweets -> new Tuple2<String, Integer>(userTweets._1, Iterables.size(userTweets._2))
        );
        // Iterate over the stream's RDDs and print each element to the console
        tweetsMappedByUser.foreachRDD((VoidFunction<JavaPairRDD<String, Integer>>) pairRDD -> {
            pairRDD.foreach(new VoidFunction<Tuple2<String, Integer>>() {
                @Override
                public void call(Tuple2<String, Integer> t) throws Exception {
                    System.out.println(t._1() + "," + t._2());
                }
            });
        });
        // Triggers the start of processing. Nothing happens if the streaming context is not started
        streamingContext.start();
        // Keeps the processing alive by halting here unless terminated manually
        //streamingContext.awaitTermination();
    }
}
pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>SparkTwitter</groupId>
  <artifactId>SparkJavaTwitter</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>SparkJavaTwitter</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>2.4.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.12</artifactId>
      <version>2.4.3</version>
      <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-twitter -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-twitter_2.11</artifactId>
      <version>1.6.3</version>
    </dependency>
  </dependencies>
</project>
To execute the code, I'm using the following command:
./bin/spark-submit --class SparkTwitter.SparkJavaTwitter.TwitterStream /home/hadoop/eclipse-workspace/SparkJavaTwitter/target/SparkJavaTwitter-0.0.1-SNAPSHOT.jar
Below is the output I'm getting.
19/11/10 22:17:58 WARN Utils: Your hostname, hadoop-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
19/11/10 22:17:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/11/10 22:17:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Failed to load SparkTwitter.SparkJavaTwitter.TwitterStream: twitter4j/auth/Authorization
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
I've been running a word-count program the same way and it works fine. The JAR also builds successfully. Do I have to specify any more parameters when running the JAR?

I've faced a similar problem and found that you need to give the JARs directly to spark-submit. What I do is point to the directory where the JARs used to build the project are stored, using the --jars "<path-to-jars>/*" option of spark-submit.
Perhaps this is not the best option, but it works.
Also, when updating versions, beware that the JARs in that folder must be updated as well.
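For reference, a sketch of what the full command would then look like, following the pattern in this answer and assuming the dependency JARs (twitter4j, spark-streaming-twitter, and so on) have been collected into a single directory (the jars path below is hypothetical):
./bin/spark-submit \
  --class SparkTwitter.SparkJavaTwitter.TwitterStream \
  --jars "/home/hadoop/eclipse-workspace/SparkJavaTwitter/jars/*" \
  /home/hadoop/eclipse-workspace/SparkJavaTwitter/target/SparkJavaTwitter-0.0.1-SNAPSHOT.jar
An alternative design choice is to build a fat JAR (e.g. with the maven-shade-plugin) so the dependencies are bundled into the application JAR and nothing extra needs to be passed to spark-submit.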

Related

facing an issue in Spark Structured Streaming

I have written code to read a CSV file and print it to the console using Spark Structured Streaming. The code is below:
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.*;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.types.StructType;
import com.cybernetix.models.BaseDataModel;
public class ReadCSVJob {
    static List<BaseDataModel> bdmList = new ArrayList<BaseDataModel>();

    public static void main(String args[]) {
        SparkSession spark = SparkSession
                .builder()
                .config("spark.eventLog.enabled", "false")
                .config("spark.driver.memory", "2g")
                .config("spark.executor.memory", "2g")
                .appName("StructuredStreamingAverage")
                .master("local")
                .getOrCreate();

        StructType userSchema = new StructType();
        userSchema.add("name", "string");
        userSchema.add("status", "String");
        userSchema.add("u_startDate", "String");
        userSchema.add("u_lastlogin", "string");
        userSchema.add("u_firstName", "string");
        userSchema.add("u_lastName", "string");
        userSchema.add("u_phone", "string");
        userSchema.add("u_email", "string");

        Dataset<Row> dataset = spark
                .readStream()
                .schema(userSchema)
                .csv("D:\\user\\sdata\\user-2019-10-03_20.csv");

        dataset.writeStream()
                .format("console")
                .option("truncate", "false")
                .start();
    }
}
In this code, the line userSchema.add("name", "string"); causes the program to terminate. Below is the log trace:
ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseDataType(ParseDriver.scala:39)
    at org.apache.spark.sql.types.StructType.add(StructType.scala:213)
    at com.cybernetix.sparks.jobs.ReadCSVJob.main(ReadCSVJob.java:45)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1175)
    ... 4 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 6 more
I have added the ANTLR Maven dependency to the pom.xml file but am still facing the same issue:
<!-- https://mvnrepository.com/artifact/org.antlr/antlr4 -->
<dependency>
  <groupId>org.antlr</groupId>
  <artifactId>antlr4</artifactId>
  <version>4.7</version>
</dependency>
I am not sure why, after adding the ANTLR dependency, the Maven dependency list still shows antlr-runtime-4.5.3.jar (see the screenshot below).
Can anyone tell me what I am doing wrong here?
Update your artifactId to antlr4-runtime, then clean and rebuild.
The dependency should look like this:
<dependency>
  <groupId>org.antlr</groupId>
  <artifactId>antlr4-runtime</artifactId>
  <version>4.7</version>
</dependency>
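To confirm which dependency is still dragging in the old runtime, one option (assuming a standard Maven setup) is to inspect the dependency tree:
mvn dependency:tree -Dincludes=org.antlr
This should show where antlr-runtime-4.5.3.jar comes from, so you can exclude or override it.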

Can't load Phoenix JDBC driver for Storm's JdbcInsertBolt

During initialization of Apache Storm's JdbcInsertBolt I get an error:
java.lang.ClassCastException:
Cannot cast org.apache.phoenix.jdbc.PhoenixDriver to javax.sql.DataSource
at com.zaxxer.hikari.util.UtilityElf.createInstance(UtilityElf.java:90)
at com.zaxxer.hikari.pool.PoolBase.initializeDataSource(PoolBase.java:292)
at com.zaxxer.hikari.pool.PoolBase.<init>(PoolBase.java:84)
at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:102)
at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:71)
at org.apache.storm.jdbc.common.HikariCPConnectionProvider.prepare(HikariCPConnectionProvider.java:53)
at org.apache.storm.jdbc.mapper.SimpleJdbcMapper.<init>(SimpleJdbcMapper.java:43)
from the underlying HikariCPConnectionProvider. What's wrong?
I am following http://storm.apache.org/releases/1.1.2/storm-jdbc.html; here is what I am doing based on that:
I would like to write data from an Apache Storm topology to an HBase table via Phoenix. For that, I downloaded the driver file (phoenix-4.7.0.2.6.5.3003-25-client.jar) from my cluster server and added it to my local Maven repository:
mvn install:install-file
-Dfile=lib\phoenix-4.7.0.2.6.5.3003-25-client.jar
-DgroupId=org.apache.phoenix
-DartifactId=phoenix-jdbc -Dversion=4.7.0 -Dpackaging=jar
After that I updated my pom.xml:
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-jdbc</artifactId>
  <version>4.7.0</version>
</dependency>
Now add Storm's JDBC bolt:
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-jdbc</artifactId>
  <version>1.2.2</version>
  <scope>provided</scope>
</dependency>
and I am set up to use the bolt. First, set up the connection provider:
Map hikariConfigMap = new HashMap();
hikariConfigMap.put("dataSourceClassName", "org.apache.phoenix.jdbc.PhoenixDriver");
hikariConfigMap.put("dataSource.url", "<zookeeperQuorumURI>:2181:/hbase-unsecure");
this.connectionProvider = new HikariCPConnectionProvider(hikariConfigMap);
Now initialize the tuple-values-to-DB-columns mapper:
this.simpleJdbcMapper = new SimpleJdbcMapper(this.tablename, connectionProvider);
During this the error mentioned above happens.
Just for completeness: The JdbcInsertBolt gets created like this:
new JdbcInsertBolt(this.connectionProvider, this.simpleJdbcMapper)
.withTableName(this.tablename)
.withQueryTimeoutSecs(30);
Have you tried setting driverClassName -> org.apache.phoenix.jdbc.PhoenixDriver? The current code sets dataSourceClassName, which is different: HikariCP expects dataSourceClassName to name a javax.sql.DataSource implementation, whereas a JDBC driver class such as PhoenixDriver has to be supplied via driverClassName (together with a jdbcUrl).
Refer to the HikariCP configuration documentation for the details.
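For illustration, a minimal sketch of the suggested change, reusing the Zookeeper quorum placeholder from the question (the exact jdbc:phoenix URL format may vary with your Phoenix version):
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.jdbc.common.ConnectionProvider;
import org.apache.storm.jdbc.common.HikariCPConnectionProvider;

Map<String, Object> hikariConfigMap = new HashMap<>();
// PhoenixDriver is a java.sql.Driver, so it belongs under driverClassName (paired with jdbcUrl),
// while dataSourceClassName must name a javax.sql.DataSource implementation
hikariConfigMap.put("driverClassName", "org.apache.phoenix.jdbc.PhoenixDriver");
hikariConfigMap.put("jdbcUrl", "jdbc:phoenix:<zookeeperQuorumURI>:2181:/hbase-unsecure");
ConnectionProvider connectionProvider = new HikariCPConnectionProvider(hikariConfigMap);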

using mllib in apache spark 2.0.2 and "The import org.apache.spark.mllib cannot be resolved" error

I just want to do some 2D matrix operations using JavaRDD and looked into this link: https://spark.apache.org/docs/latest/mllib-data-types.html. I tried exactly the sample code given there, but Eclipse doesn't seem to recognize mllib in the first place. Here is my code snippet (the same as in the above link):
import org.apache.spark.mllib.linalg.Matrices;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.QRDecomposition;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.distributed.RowMatrix; // needed for RowMatrix below
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;
JavaRDD<Vector> rows = ... // a JavaRDD of local vectors
// Create a RowMatrix from an JavaRDD<Vector>.
RowMatrix mat = new RowMatrix(rows.rdd());
// Get its size.
long m = mat.numRows();
long n = mat.numCols();
// QR decomposition
QRDecomposition<RowMatrix, Matrix> result = mat.tallSkinnyQR(true);
I am using Spark 2.0.2. Where am I going wrong? Do I need a Maven dependency? I checked my Spark home directory, and the mllib and mllib-local directories are both there.
Check your pom.xml to see if there is a spark-mllib dependency. If not, get the right version from here: https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11
At the time of answering, the latest version is:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11 -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
Also make sure that the spark-mllib dependency in your pom.xml is not scoped to runtime; the classes must be available at compile time for the imports to resolve.
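For example, a dependency that has been scoped to runtime would look like the snippet below; dropping the <scope> element (the default scope is compile) is what lets the compiler, and hence Eclipse, resolve the imports:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_2.11</artifactId>
  <version>2.1.0</version>
  <scope>runtime</scope> <!-- remove this line -->
</dependency>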

Replacing deprecated AbstractEditHandlerDetailsWebAction in Atlassian JIRA plugin for 7.X

I'm following Atlassian's Tutorial - Custom message (mail) handler for JIRA
I've hit a brick wall with the second to last step:
3) Create a new file named EditDemoHandlerDetailsWebAction.java in src/main/java/com/example/plugins/tutorial/jira/mailhandlerdemo directory, and give it the following contents:
package com.example.plugins.tutorial.jira.mailhandlerdemo;
import com.atlassian.configurable.ObjectConfigurationException;
import com.atlassian.jira.plugins.mail.webwork.AbstractEditHandlerDetailsWebAction;
import com.atlassian.jira.service.JiraServiceContainer;
import com.atlassian.jira.service.services.file.AbstractMessageHandlingService;
import com.atlassian.jira.service.util.ServiceUtils;
import com.atlassian.jira.util.collect.MapBuilder;
import com.atlassian.plugin.PluginAccessor;
import java.util.Map;
public class EditDemoHandlerDetailsWebAction extends AbstractEditHandlerDetailsWebAction {
    private final IssueKeyValidator issueKeyValidator;

    public EditDemoHandlerDetailsWebAction(PluginAccessor pluginAccessor, IssueKeyValidator issueKeyValidator) {
        super(pluginAccessor);
        this.issueKeyValidator = issueKeyValidator;
    }

    private String issueKey;

    public String getIssueKey() {
        return issueKey;
    }

    public void setIssueKey(String issueKey) {
        this.issueKey = issueKey;
    }

    // this method is called to let us populate our variables (or action state)
    // with current handler settings managed by associated service (file or mail).
    @Override
    protected void copyServiceSettings(JiraServiceContainer jiraServiceContainer) throws ObjectConfigurationException {
        final String params = jiraServiceContainer.getProperty(AbstractMessageHandlingService.KEY_HANDLER_PARAMS);
        final Map<String, String> parameterMap = ServiceUtils.getParameterMap(params);
        issueKey = parameterMap.get(DemoHandler.KEY_ISSUE_KEY);
    }

    @Override
    protected Map<String, String> getHandlerParams() {
        return MapBuilder.build(DemoHandler.KEY_ISSUE_KEY, issueKey);
    }

    @Override
    protected void doValidation() {
        if (configuration == null) {
            return; // short-circuit in case we lost session, goes directly to doExecute which redirects user
        }
        super.doValidation();
        issueKeyValidator.validateIssue(issueKey, new WebWorkErrorCollector());
    }
}
The class inherits from AbstractEditHandlerDetailsWebAction which allows us to concentrate on parameter validation. It takes care of the add, edit, and cancel handler lifecycle itself.
This tutorial is supposed to support JIRA 5.0+, including the newest versions up to 7.2.
I am using JIRA 7.1.8
My problem is that maven is unable to locate the dependency for
import com.atlassian.jira.plugins.mail.webwork.AbstractEditHandlerDetailsWebAction;
After a TON of digging, I found that com.atlassian.jira.plugins.mail exists in the specs up to JIRA 5.1.8.
However, in the specs from 5.2-m03 onward, this package is not present, which is why Maven can't find it.
Moreover, I can't find any information stating that these classes were deprecated, nor any suggestion as to what I should replace this code with for my version of JIRA.
So, what can I use in place of the seemingly deprecated com.atlassian.jira.plugins.mail.webwork.AbstractEditHandlerDetailsWebAction; in the above class?
For whatever reason, the version numbers of the JIRA mail plugin became dissociated from the version numbers of JIRA itself. You will be able to build the project once you ensure that you are referencing the correct version of the mail plugin.
I was able to get it to build as follows:
Clone the repo from the tutorial
git clone https://bitbucket.org/atlassian_tutorial/jira-add-email-handler.git
Figure out which version of the JIRA mail plugin is in use
You can do this easily by looking in the JIRA install directory. In my JIRA 7.1 install, the mail plugin was v9.0.3:
$ find <PATH_TO_JIRA_INSTALL>/atlassian-jira -name '*jira-mail-plugin*.jar'
<your path here>/atlassian-jira/WEB-INF/atlassian-bundled-plugins/jira-mail-plugin-9.0.3.jar
Adjust the POM to correspond to the correct version of the mail plugin
Here is the patch I applied against the pom.xml:
diff --git a/pom.xml b/pom.xml
index f493ef2..a3bbb8f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -54,7 +54,7 @@
<dependency>
<groupId>com.atlassian.jira</groupId>
<artifactId>jira-mail-plugin</artifactId>
- <version>${jira.version}</version>
+ <version>${jira.mail.plugin.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
@@ -104,8 +104,9 @@
</build>
<properties>
- <jira.version>6.0.4</jira.version>
- <amps.version>4.2.0</amps.version>
+ <jira.version>7.1.8</jira.version>
+ <jira.mail.plugin.version>9.0.3</jira.mail.plugin.version> <!-- the version of the mail plugin shipped with your version of JIRA -->
+ <amps.version>5.0.4</amps.version> <!-- Adjust this to the specific version of the plugin SDK you have installed -->
<plugin.testrunner.version>1.1.1</plugin.testrunner.version>
<!-- TestKit version 5.x for JIRA 5.x, 6.x for JIRA 6.x -->
<testkit.version>5.2.26</testkit.version>
Fix other type issues
There is one other reference in DemoHandler that you'll have to change from User to ApplicationUser.
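As a hypothetical sketch of that kind of change (the class and method here are illustrative, not copied from the tutorial): in JIRA 7 the API hands you a com.atlassian.jira.user.ApplicationUser where older versions used com.atlassian.crowd.embedded.api.User, so signatures change along these lines:
import com.atlassian.jira.user.ApplicationUser;

public class UserTypeExample {
    // Formerly: String describe(com.atlassian.crowd.embedded.api.User user)
    String describe(ApplicationUser user) {
        return user.getUsername(); // ApplicationUser exposes similar accessors to the old User type
    }
}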
After that, it builds for me.

Embed Avatar JS in Java Application Example

Using Java 8, I'd like to programmatically load a JavaScript file and execute it using Avatar JS (for Node environment support). I also want to use Maven to manage the dependencies.
Here's the simple Nashorn snippet I'm using, and I'd like to extend it to support Node.js modules, ideally using Avatar JS.
ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
InputStream in = getClass().getClassLoader().getResourceAsStream("js/hello-world.js");
String result = (String)engine.eval(new InputStreamReader(in));
System.out.print(result);
The relevant Maven config also looks like this:
<repositories>
  <repository>
    <id>nexus-snapshots</id>
    <name>Nexus Snapshots</name>
    <url>https://maven.java.net/content/repositories/snapshots/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>com.oracle</groupId>
    <artifactId>avatar-js</artifactId>
    <version>0.10.32-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>com.oracle</groupId>
    <artifactId>libavatar-js-linux-x64</artifactId>
    <version>0.10.32-SNAPSHOT</version>
    <type>pom</type>
  </dependency>
</dependencies>
I get the impression there's a lot of good functionality in Avatar, but I'm struggling to find any decent docs or examples. Can anyone provide a code example of how to do this?
I figured this out. The relevant code I have running looks like this:
import com.oracle.avatar.js.Server;
import com.oracle.avatar.js.Loader;
import com.oracle.avatar.js.log.Logging;
and
String runJs() throws Throwable {
    StringWriter scriptWriter = new StringWriter();
    ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
    ScriptContext scriptContext = engine.getContext();
    scriptContext.setWriter(scriptWriter);
    Server server = new Server(engine, new Loader.Core(), new Logging(false), System.getProperty("user.dir"));
    server.run("js/hello-world.js");
    return scriptWriter.toString();
}
and, for now, a simple hello-world.js:
var util = require('util')
var result = util.format('hello %s', 'Phil');
print(result);
I also pass java.library.home as a JVM argument when running the application; the Avatar native library resides in that directory.
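For reference, a sketch of what launching with that JVM argument might look like (the paths and main class are hypothetical, and the property name follows the answer above):
java -Djava.library.home=/opt/avatar/native \
     -cp target/classes:lib/avatar-js.jar \
     com.example.AvatarRunner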
