I have a Java project that contains the class com.xyz.api.base.models.mongo.Member.
I want to use that Member class from a Scala project.
However, I get this error (the library is already downloaded into the Scala project's dependencies):
java.lang.RuntimeException: java.lang.ClassNotFoundException: models.mongo.Member
The strange thing is that there is no compilation error; the error above only happens at runtime. Furthermore, the error message does not mention com.xyz.api.base as the base package of models.mongo.Member.
My code:
import com.xyz.api.base.models.mongo.Member
import com.xyz.api.base.utils.RedisCacheImpl
import redis.RedisClient
object Redis extends App {
  implicit val akkaSystem = akka.actor.ActorSystem()

  val host: String = "127.0.0.1"
  val port: Int = 6379
  val db: Int = 0
  val timeout: Long = 10000L
  val key = "a2IxSE5kdW9HRHZUe"

  var redisCacheImpl: RedisCacheImpl = _

  try {
    RedisCacheImpl.configRedis(host, port, db, timeout)
    redisCacheImpl = RedisCacheImpl.getInstance()
    val obj = redisCacheImpl.get(key)
    val member = obj.asInstanceOf[Member]
    println(s"member id ${member.getMemberId}")
  } catch {
    case e: Exception => e.printStackTrace()
  }
}
Thank you for your help.
In this case, Spring Boot 1.2.3.RELEASE uses mongo-java-driver 2.12.5; for more details, go through the documentation: Link
I am trying to connect Java with R using Rserve.
Java: 1.8.0_151
R: 3.5.0
OS: Mac 10.13.4 HighSierra
To connect R with Java, I typed the following in RStudio:
install.packages("Rserve")
library(Rserve)
Rserve(args="--no-save")
Things went smoothly, and I was happy about it.
Then I jumped back to Java (Eclipse, to be precise) and continued. Here is what I have in Eclipse:
package rserve;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.REngineException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
public class WordCloud1 {

    public static void main(String[] args) throws REngineException, REXPMismatchException {
        RConnection c = new RConnection();
        String path = "/Users/JinhoShin/Desktop/study/R/r_temp2";
        String file = "seoul_new.txt";

        c.parseAndEval("library(KoNLP)");
        c.parseAndEval("useSejongDic()");
        c.parseAndEval("library(wordcloud)");
        c.parseAndEval("library(RColorBrewer)");
        c.parseAndEval("setwd('" + path + "')");
        c.parseAndEval("data1=readLines('" + file + "')");
        c.parseAndEval("data2 = sapply(data1,extractNoun,USE.NAMES=F)");
        c.parseAndEval("data3 = unlist(data2)");
        c.parseAndEval("data3=gsub('seoul','',data3)");
        c.parseAndEval("data3=gsub('request','',data3)");
        c.parseAndEval("data3=gsub('place','',data3)");
        c.parseAndEval("data3=gsub('transportation','',data3)");
        c.parseAndEval("data3=gsub(' ','',data3)");
        c.parseAndEval("data3=gsub('-','',data3)");
        c.parseAndEval("data3=gsub('OO','',data3)");
        c.parseAndEval("write(unlist(data3),'seoul_2.txt')");
        c.parseAndEval("data4 = read.table('seoul_2.txt')"); // this is what blows me up
        c.parseAndEval("wordcount=table(data4)");
        c.parseAndEval("palete = brewer.pal(9,'Set3')");
        c.parseAndEval(
            "wordcloud(names(wordcount),freq = wordcount,scale=c(5,1),rot.per=0.25, min.freq = 1," +
            " random.order=F, random.color = T, colors=palete)");
        c.parseAndEval("savePlot('0517seoul.png', type = 'png')");
        c.parseAndEval("dev.off()");
        c.close();
    }
}
As you can see from the code, the failing call is
c.parseAndEval("data4 = read.table('seoul_2.txt')"); // => at rserve.WordCloud1.main(WordCloud1.java:30)
I have no idea why it can't read the text file, even though it was able to write that same file.
This is what the Eclipse console keeps showing me:
Exception in thread "main" org.rosuda.REngine.REngineException: eval failed
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:499)
at org.rosuda.REngine.REngine.parseAndEval(REngine.java:108)
at rserve.WordCloud1.main(WordCloud1.java:30)
Caused by: org.rosuda.REngine.Rserve.RserveException: eval failed
at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:261)
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:497)
... 2 more
and this is what RStudio keeps showing me
Error: long vectors not supported yet: qap_encode.c:36
Fatal error: unable to initialize the JIT
I have tried everything I could think of to resolve this issue, but I am still stuck at the same spot.
I have gone through the documentation, but it is still very confusing to me how to get data from Swift.
I configured Swift on one of my Linux machines. Using the command below I am able to get the container list:
swift -A https://acc.objectstorage.softlayer.net/auth/v1.0/ -U username -K passwordkey list
I have seen many blogs for Bluemix (https://console.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index-gentopic1.html#genTopProcId2) and written the code below:
sc.textFile("swift://container.myacct/file.xml")
I am looking to integrate this in Java Spark. Where do I need to configure the object storage credentials in the Java code? Is there any sample code or blog?
This notebook illustrates a number of ways to load data using the Scala language. Scala runs on the JVM, and Java and Scala classes can be freely mixed, no matter whether they reside in different projects or in the same one. Looking at the mechanics of how Scala code interacts with OpenStack Swift object storage should help guide you to craft a Java equivalent.
From the above notebook, here are some steps illustrating how to configure and extract data from an OpenStack Swift object storage instance with the Stocator library, using Scala. The Swift URL decomposes as:
swift2d://container.myacct/filename.extension
where swift2d is the Stocator protocol, container is the name of the container, myacct is the namespace, and filename.extension is the object storage filename.
Imports
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import scala.util.control.NonFatal
import play.api.libs.json.Json
val sqlctx = new SQLContext(sc)
val scplain = sqlctx.sparkContext
Sample Creds
// #hidden_cell
var credentials = scala.collection.mutable.HashMap[String, String](
"auth_url"->"https://identity.open.softlayer.com",
"project"->"object_storage_3xxxxxx3_xxxx_xxxx_xxxx_xxxxxxxxxxxx",
"project_id"->"6xxxxxxxxxx04fxxxxxxxxxx6xxxxxx7",
"region"->"dallas",
"user_id"->"cxxxxxxxxxxaxxxxxxxxxx1xxxxxxxxx",
"domain_id"->"cxxxxxxxxxxaxxyyyyyyxx1xxxxxxxxx",
"domain_name"->"853255",
"username"->"Admin_cxxxxxxxxxxaxxxxxxxxxx1xxxxxxxxx",
"password"->"""&M7372!FAKE""",
"container"->"notebooks",
"tenantId"->"undefined",
"filename"->"file.xml"
)
Helper Method
def setRemoteObjectStorageConfig(name: String, sc: SparkContext, dsConfiguration: String): Boolean = {
  try {
    val result = scala.util.parsing.json.JSON.parseFull(dsConfiguration)
    result match {
      case Some(e: Map[String, String]) => {
        val prefix = "fs.swift2d.service." + name
        val hconf = sc.hadoopConfiguration
        hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
        hconf.set(prefix + ".auth.url", e("auth_url") + "/v3/auth/tokens")
        hconf.set(prefix + ".tenant", e("project_id"))
        hconf.set(prefix + ".username", e("user_id"))
        hconf.set(prefix + ".password", e("password"))
        hconf.set(prefix + ".auth.method", "keystoneV3")
        hconf.set(prefix + ".region", e("region"))
        hconf.setBoolean(prefix + ".public", true)
        println("Successfully modified sparkcontext object with remote Object Storage Credentials using datasource name " + name)
        println("")
        return true
      }
      case None =>
        println("Failed.")
        return false
    }
  }
  catch {
    case NonFatal(exc) =>
      println(exc)
      return false
  }
}
Load the Data
val setObjStor = setRemoteObjectStorageConfig("sparksql", scplain, Json.toJson(credentials.toMap).toString)
val data_rdd = scplain.textFile("swift2d://notebooks.sparksql/" + credentials("filename"))
data_rdd.take(5)
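Since the question asks specifically about Java, here is a rough Java sketch of the same Stocator configuration and read. This is only a sketch, assuming the Stocator jar is on your Spark classpath; the service name "sparksql", the container "notebooks", the file name, and the credential placeholders simply mirror the Scala example above and must be replaced with your own values:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SwiftReadExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("swift-read-example");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Register Stocator and one named swift2d service, mirroring setRemoteObjectStorageConfig above.
        String prefix = "fs.swift2d.service.sparksql";
        Configuration hconf = sc.hadoopConfiguration();
        hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem");
        hconf.set(prefix + ".auth.url", "https://identity.open.softlayer.com/v3/auth/tokens");
        hconf.set(prefix + ".tenant", "YOUR_PROJECT_ID");
        hconf.set(prefix + ".username", "YOUR_USER_ID");
        hconf.set(prefix + ".password", "YOUR_PASSWORD");
        hconf.set(prefix + ".auth.method", "keystoneV3");
        hconf.set(prefix + ".region", "dallas");
        hconf.setBoolean(prefix + ".public", true);

        // Read the object the same way the Scala version does.
        JavaRDD<String> data = sc.textFile("swift2d://notebooks.sparksql/file.xml");
        data.take(5).forEach(System.out::println);

        sc.stop();
    }
}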
Hello people of the Earth!
I'm using Airflow to schedule and run Spark tasks.
All I have found so far is Python DAGs that Airflow can manage.
DAG example:
spark_count_lines.py
import logging
from airflow import DAG
from airflow.operators import PythonOperator
from datetime import datetime
args = {
    'owner': 'airflow'
    , 'start_date': datetime(2016, 4, 17)
    , 'provide_context': True
}
dag = DAG(
    'spark_count_lines'
    , start_date = datetime(2016, 4, 17)
    , schedule_interval = '@hourly'
    , default_args = args
)
def run_spark(**kwargs):
    import pyspark
    sc = pyspark.SparkContext()
    df = sc.textFile('file:///opt/spark/current/examples/src/main/resources/people.txt')
    logging.info('Number of lines in people.txt = {0}'.format(df.count()))
    sc.stop()
t_main = PythonOperator(
    task_id = 'call_spark'
    , dag = dag
    , python_callable = run_spark
)
The problem is that I'm not good with Python, and some of my tasks are written in Java. My question is: how do I run a Spark Java jar from a Python DAG? Or maybe there is another way to do it? I found spark-submit: http://spark.apache.org/docs/latest/submitting-applications.html
But I don't know how to connect everything together. Maybe someone has used it before and has a working example. Thank you for your time!
You should be able to use BashOperator. Keeping the rest of your code as is, import the required class and system packages:
from airflow.operators.bash_operator import BashOperator
import os
import sys
set required paths:
os.environ['SPARK_HOME'] = '/path/to/spark/root'
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'bin'))
and add operator:
spark_task = BashOperator(
    task_id='spark_java',
    bash_command='spark-submit --class {{ params.class }} {{ params.jar }}',
    params={'class': 'MainClassName', 'jar': '/path/to/your.jar'},
    dag=dag
)
You can easily extend this to provide additional arguments using Jinja templates.
You can of course adjust this for a non-Spark scenario by replacing bash_command with a template suitable for your case, for example:
bash_command = 'java -jar {{ params.jar }}'
and adjusting params.
Airflow, as of version 1.8 (released today), has:
SparkSqlOperator - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_sql_operator.py ;
SparkSQLHook code - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_sql_hook.py
SparkSubmitOperator - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_submit_operator.py
SparkSubmitHook code - https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/spark_submit_hook.py
Notice that these new Spark operators/hooks are in the "contrib" branch as of version 1.8, so they are not (well) documented.
So you can use SparkSubmitOperator to submit your Java code for Spark execution.
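For reference, the jar submitted this way can be any ordinary Spark application. As a rough, illustrative Java counterpart of the run_spark callable shown in the question (the class name and file path are placeholders, not anything required by Airflow):

import org.apache.spark.sql.SparkSession;

public class CountLines {
    public static void main(String[] args) {
        // Same job as the PythonOperator callable above: count the lines of a text file.
        SparkSession spark = SparkSession.builder()
                .appName("spark_count_lines")
                .getOrCreate();

        long count = spark.read()
                .textFile("file:///opt/spark/current/examples/src/main/resources/people.txt")
                .count();

        System.out.println("Number of lines in people.txt = " + count);
        spark.stop();
    }
}

Built into a jar, this is the kind of artifact you would point spark-submit at, or pass to SparkSubmitOperator via its application and java_class parameters.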
Here is an example of SparkSubmitOperator usage for Spark 2.3.1 on Kubernetes (a minikube instance):
"""
Code that goes along with the Airflow located at:
http://airflow.readthedocs.org/en/latest/tutorial.html
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from airflow.models import Variable
from datetime import datetime, timedelta
default_args = {
    'owner': 'user@mail.com',
    'depends_on_past': False,
    'start_date': datetime(2018, 7, 27),
    'email': ['user@mail.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    'end_date': datetime(2018, 7, 29),
}
dag = DAG(
    'tutorial_spark_operator', default_args=default_args, schedule_interval=timedelta(1))
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)
print_path_env_task = BashOperator(
    task_id='print_path_env',
    bash_command='echo $PATH',
    dag=dag)
spark_submit_task = SparkSubmitOperator(
    task_id='spark_submit_job',
    conn_id='spark_default',
    java_class='com.ibm.cdopoc.DataLoaderDB2COS',
    application='local:///opt/spark/examples/jars/cppmpoc-dl-0.1.jar',
    total_executor_cores='1',
    executor_cores='1',
    executor_memory='2g',
    num_executors='2',
    name='airflowspark-DataLoaderDB2COS',
    verbose=True,
    driver_memory='1g',
    conf={
        'spark.DB_URL': 'jdbc:db2://dashdb-dal13.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;',
        'spark.DB_USER': Variable.get("CEDP_DB2_WoC_User"),
        'spark.DB_PASSWORD': Variable.get("CEDP_DB2_WoC_Password"),
        'spark.DB_DRIVER': 'com.ibm.db2.jcc.DB2Driver',
        'spark.DB_TABLE': 'MKT_ATBTN.MERGE_STREAM_2000_REST_API',
        'spark.COS_API_KEY': Variable.get("COS_API_KEY"),
        'spark.COS_SERVICE_ID': Variable.get("COS_SERVICE_ID"),
        'spark.COS_ENDPOINT': 's3-api.us-geo.objectstorage.softlayer.net',
        'spark.COS_BUCKET': 'data-ingestion-poc',
        'spark.COS_OUTPUT_FILENAME': 'cedp-dummy-table-cos2',
        'spark.kubernetes.container.image': 'ctipka/spark:spark-docker',
        'spark.kubernetes.authenticate.driver.serviceAccountName': 'spark'
    },
    dag=dag,
)
t1.set_upstream(print_path_env_task)
spark_submit_task.set_upstream(t1)
The code uses variables stored in Airflow Variables.
Also, you need to create a new Spark connection, or edit the existing 'spark_default' one, with the
extra dictionary {"queue":"root.default", "deploy-mode":"cluster", "spark-home":"", "spark-binary":"spark-submit", "namespace":"default"}.
Go to Admin -> Connections -> Create in the Airflow UI. Create a new SSH connection by providing host = IP address, port = 22, and extra as {"key_file": "/path/to/pem/file", "no_host_key_check": true}.
This host should be the Spark cluster master from which you can submit Spark jobs. Next, you need to create a DAG with an SSHOperator. The following is a template for this:
from airflow import DAG
# Airflow 1.x import path; in Airflow 2 the operator lives in airflow.providers.ssh.operators.ssh
from airflow.contrib.operators.ssh_operator import SSHOperator

with DAG(dag_id='ssh-dag-id',
         schedule_interval="05 12 * * *",
         catchup=False) as dag:

    spark_job = ("spark-submit --class fully.qualified.class.name "
                 "--master yarn "
                 "--deploy-mode client "
                 "--driver-memory 6G "
                 "--executor-memory 6G "
                 "--num-executors 6 "
                 "/path/to/your-spark.jar")

    ssh_run_query = SSHOperator(
        task_id="random_task_id",
        ssh_conn_id="name_of_connection_you_just_created",
        command=spark_job,
        get_pty=True,
        dag=dag)

    ssh_run_query
That's it. You also get the complete logs for this Spark job in Airflow.
I have a child Java project which has Groovy files added to its classpath using Eclipse. The parent Java project triggers some functionality in the child, which uses the Groovy library to run the scripts. So the import works fine in the Eclipse environment with the child project open, but if I run it from the command line, or if I close the child project, then I get a Groovy compilation error at the import statement. How can I resolve this? I want to avoid using the evaluate() method.
The following is my master Groovy script:
package strides_business_script
abstract class Business_Script extends Script {
//some stuff
}
The following is another Groovy script:
import static strides_business_script.StridesBusiness_Script.*;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
String Key = Part_Product_PartDetails
boolean containsData = checkIncomingMessage(Key)
if (containsData) {
    def edgeKeyList = [PPR]
    JSONArray partDetails = appendEdgeValueToMsg(edgeKeyList, Key, vertex, messageIterator);
    //deleteMessages(Key);
    JSONObject jsonObject = constructInfoWithPropertyJSON("NAME,PRODUCTTYPE,FGTYPE,UOM,ITEMCLASSIFICATIONBYMARKET");
    jsonObject.put("PARTS", partDetails);
    send(Product_AggPO_ProductDetails, convertJSONToString(jsonObject));
}
Edit:
My master script Business_Script.groovy resides in the scripts/strides_business_script/ folder. All the other scripts are in the scripts/StridesComputationScripts/ folder, and they import Business_Script.groovy.
I run the application with remote debugging enabled like this:
java -cp "./lib/*:./scripts/strides_business_script/Business_Script.groovy" -Xdebug -Xrunjdwp:transport=dt_socket,address=6969,server=y -Dhibernate.cfg.xml.path=./conf/hibernate.cfg.xml -Dlog4j.configuration=file:./conf/log4j.properties com.biglabs.dataExtractor.dataDump.DataDumpDriver 7
and here I am trying to parse all the computation scripts:
for (String scriptName : files) {
    Script script = groovyShell.parse(new File(
            SCRIPT_PLACED_AT + Constants.SLASH
            + SCRIPT_FILE_FOLDER + Constants.SLASH
            + scriptName));
    scriptMping.put(scriptName, script);
}
It throws the following exception while parsing with GroovyShell:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/home/manoj/strides/release/strides/scripts/StridesComputationScripts/PRODUCT-script.groovy: 2: unable to resolve class strides_business_script.StridesBusiness_Script
@ line 2, column 1.
import static strides_business_script.Business_Script.*;
^
/home/manoj/strides/release/strides/scripts/StridesComputationScripts/PRODUCT-script.groovy: 2: unable to resolve class strides_business_script.StridesBusiness_Script
@ line 2, column 1.
import static strides_business_script.Business_Script.*;
^
2 errors
I fixed it by adding the script path to the compiler configuration:
CompilerConfiguration compilerConfiguration = new CompilerConfiguration();
String path = SCRIPT_PLACED_AT;
if (!SCRIPT_PLACED_AT.endsWith("/")) {
    path = path + "/";
}
compilerConfiguration.setClasspath(path);

GroovyShell groovyShell = new GroovyShell(compilerConfiguration);
for (String scriptName : files) {
    Script script = groovyShell.parse(new File(
            SCRIPT_PLACED_AT + Constants.SLASH
            + SCRIPT_FILE_FOLDER + Constants.SLASH
            + scriptName));
    scriptMping.put(scriptName, script);
}
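For completeness, here is a rough sketch of how the parsed scripts cached in scriptMping might then be executed, assuming each script receives its inputs (for example the vertex and messageIterator objects referenced in the computation script above) through a groovy.lang.Binding. The helper method and variable names are illustrative, not part of the original code:

import groovy.lang.Binding;
import groovy.lang.Script;
import java.util.Map;

// Illustrative helper: run one previously parsed script with its input variables bound.
Object runComputationScript(Map<String, Script> scriptMping, String scriptName,
                            Object vertex, Object messageIterator) {
    Script script = scriptMping.get(scriptName);
    Binding binding = new Binding();
    binding.setVariable("vertex", vertex);
    binding.setVariable("messageIterator", messageIterator);
    script.setBinding(binding);
    return script.run();
}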
I've run into an issue attempting to parse JSON in my Spark job. I'm using Spark 1.1.0, json4s, and the Cassandra Spark Connector, with DSE 4.6. The exception thrown is:
org.json4s.package$MappingException: Can't find constructor for BrowserData org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
My code looks like this:
case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
                       flash_version: Option[FlashVersion],
                       viewport: Option[Viewport],
                       performanceData: Option[PerformanceData])

// ... other case classes

def parseJson(b: Option[String]): Option[String] = {
  implicit val formats = DefaultFormats
  for {
    browserDataStr <- b
    browserData = parse(browserDataStr).extract[BrowserData]
    navObject <- browserData.navigatorObjectData
    userAgent <- navObject.userAgent
  } yield (userAgent)
}

def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  implicit val formats = DefaultFormats
  rows.collectFirst { case r if r.getStringOption("browser_data").isDefined =>
    parseJson(r.getStringOption("browser_data"))
  }.flatten
}

def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  rows.collectFirst { case r if r.getStringOption("ua").isDefined =>
    r.getStringOption("ua")
  }.flatten
}

def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
  for {
    jsUa <- getJavascriptUa(rows)
    reqUa <- getRequestUa(rows)
  } yield (jsUa == reqUa)
}

def run(name: String) = {
  val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
  val counts = rdd.map(r => (checkUa(r._2, r._1)))
  counts
}
I use :load to load the file into the REPL and then call the run function. The failure is happening in the parseJson function, as far as I can tell. I've tried a variety of things to get this to work. Based on similar posts, I've made sure my case classes are at the top level of the file. I've also tried compiling just the case class definitions into a jar and including that jar like this: /usr/bin/dse spark --jars case_classes.jar
I've tried adding them to the conf like this: sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar"))
And I still get the same error. Should I compile all of my code into a jar? Is this a Spark issue or a json4s issue? Any help is appreciated.