Pydoop job not running

I have set up a single-node Hadoop 1.2.1 cluster and am trying to run this script:
pydoop script transpose.py matrix.txt t_matrix
The script prints nothing, and the job stays in pending status for more than 10 minutes. Why is the job not running properly?
This is the output generated while running:
Traceback (most recent call last):
File "/home/hduser/hadoop/tmp/mapred/local/taskTracker/distcache/-2030848362897089950_-2130723868_1886929692/localhost/user/hduser/pydoop_script_91c491cf7e6b42f6bcbeda09edae9385/exe90d967507f86405a9606c35582b2fc43", line 10, in <module>
import pydoop.pipes
File "/usr/local/lib/python2.7/dist-packages/pydoop/pipes.py", line 29, in <module>
pp = pydoop.import_version_specific_module('_pipes')
File "/usr/local/lib/python2.7/dist-packages/pydoop/__init__.py", line 107, in import_version_specific_module
return import_module(complete_mod_name(name))
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: /usr/local/lib/python2.7/dist-packages/pydoop/_pipes_1_2_1.so: undefined symbol: BIO_s_mem

You are missing a required SSL library: the undefined symbol BIO_s_mem is provided by OpenSSL.
You will need to find "libssl.so.1.0.0" in your environment and make sure it gets loaded.
Try executing the following before running your pydoop script:
export LD_PRELOAD=PATH_TO/libssl.so.1.0.0
For example:
export LD_PRELOAD=/lib/x86_64-linux-gnu/libssl.so.1.0.0
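If you are not sure where the library lives, the search and the LD_PRELOAD setup can be scripted. The following is a minimal Python 3 sketch, not part of Pydoop: the helper names (find_libssl, env_with_preload, run_pydoop_script) and the default search directories are assumptions made for illustration, and the command is the one from the question.

```python
import glob
import os
import subprocess


def find_libssl(search_dirs=("/lib", "/usr/lib", "/usr/local/lib"),
                pattern="libssl.so.1.0.0"):
    """Return sorted paths under search_dirs whose file name matches pattern."""
    matches = set()
    for base in search_dirs:
        # "**" with recursive=True also descends into subdirectories such as
        # /lib/x86_64-linux-gnu.
        matches.update(
            glob.glob(os.path.join(base, "**", pattern), recursive=True))
    return sorted(matches)


def env_with_preload(lib_path, env=None):
    """Return a copy of env (default: os.environ) with lib_path first in LD_PRELOAD."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon- or space-separated list here.
    new_env["LD_PRELOAD"] = lib_path + (":" + existing if existing else "")
    return new_env


def run_pydoop_script(lib_path):
    """Run the command from the question with LD_PRELOAD set (hypothetical helper)."""
    cmd = ["pydoop", "script", "transpose.py", "matrix.txt", "t_matrix"]
    return subprocess.call(cmd, env=env_with_preload(lib_path))
```

Calling find_libssl() lists candidate paths; passing one of them to run_pydoop_script() is roughly equivalent to the export line followed by the pydoop command.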

Related

I want to use a java project in python. On installing pip install pyjnius, it gives an error

pip install pyjnius
Collecting pyjnius
Using cached https://files.pythonhosted.org/packages/b6/57/c90acf31322e6417f06c90410dbfcb149633a6006b7efbf99dfebe177c1f/pyjnius-1.2.0.tar.gz
ERROR: Command errored out with exit status 1:
command: 'c:\users\dev1\appdata\local\programs\python\python37-32\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\Dev1\\AppData\\Local\\Temp\\pip-install-uxkserni\\pyjnius\\setup.py'"'"'; __file__='"'"'C:\\Users\\Dev1\\AppData\\Local\\Temp\\pip-install-uxkserni\\pyjnius\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Dev1\AppData\Local\Temp\pip-install-uxkserni\pyjnius\pip-egg-info'
cwd: C:\Users\Dev1\AppData\Local\Temp\pip-install-uxkserni\pyjnius\
Complete output (18 lines):
warning: [options] bootstrap class path not set in conjunction with -source 6
error: Source option 6 is no longer supported. Use 7 or later.
error: Target option 6 is no longer supported. Use 7 or later.
WARNING: Not able to assign machine() = AMD64 to a cpu value!
Using cpu = 'i386' instead!
JDK_HOME: C:\Program Files\Java\jdk-13.0.1
JRE_HOME: None
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Dev1\AppData\Local\Temp\pip-install-uxkserni\pyjnius\setup.py", line 246, in <module>
compile_native_invocation_handler(JDK_HOME, JRE_HOME)
File "C:\Users\Dev1\AppData\Local\Temp\pip-install-uxkserni\pyjnius\setup.py", line 96, in compile_native_invocation_handler
join('jnius', 'src', 'org', 'jnius', 'NativeInvocationHandler.java')
File "c:\users\dev1\appdata\local\programs\python\python37-32\lib\subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\\Program Files\\Java\\jdk-13.0.1\\bin\\javac.exe', '-target', '1.6', '-source', '1.6', 'jnius\\src\\org\\jnius\\NativeInvocationHandler.java']' returned non-zero exit status 2.
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

TypeError: 'JavaPackage' object is not callable (spark._jvm)

I'm setting up GeoSpark Python and after installing all the pre-requisites, I'm running the very basic code examples to test it.
from pyspark.sql import SparkSession
from geo_pyspark.register import GeoSparkRegistrator
spark = SparkSession.builder.\
getOrCreate()
GeoSparkRegistrator.registerAll(spark)
df = spark.sql("""SELECT st_GeomFromWKT('POINT(6.0 52.0)') as geom""")
df.show()
I tried running it with python3 basic.py and with spark-submit basic.py; both give me this error:
Traceback (most recent call last):
File "/home/jessica/Downloads/geo_pyspark/basic.py", line 8, in <module>
GeoSparkRegistrator.registerAll(spark)
File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 22, in registerAll
cls.register(spark)
File "/home/jessica/Downloads/geo_pyspark/geo_pyspark/register/geo_registrator.py", line 27, in register
spark._jvm. \
TypeError: 'JavaPackage' object is not callable
I'm using Java 8, Python 3, and Apache Spark 2.4 on Linux Mint 19. My JAVA_HOME is set correctly, and my SPARK_HOME is also set:
$ printenv SPARK_HOME
/home/jessica/spark/
How can I fix this?

The jars for GeoSpark are not correctly registered with your Spark session. There are a few ways around this, ranging from a tad inconvenient to pretty seamless. For example, if you specify the following when you call spark-submit:
--jars jar1.jar,jar2.jar,jar3.jar
then the problem will go away. You can also pass the same option to pyspark, if that's your poison.
If, like me, you don't really want to be doing this every time you boot (and setting this as a .conf() in Jupyter will get tiresome), then you can instead go into $SPARK_HOME/conf/spark-defaults.conf and set:
spark.jars jar1.jar,jar2.jar,jar3.jar
The jars will then be loaded whenever you create a Spark instance. If you've not used the conf file before, it will be there as spark-defaults.conf.template (copy or rename it to spark-defaults.conf).
Of course, when I say jar1.jar, what I really mean is something along the lines of:
/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar
but that's up to you to get the right ones from the geo_pyspark package.
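For completeness, the per-session variant mentioned above (setting the value in code rather than in spark-defaults.conf) might look like the sketch below. The helper names spark_jars_value and build_session are made up for illustration, and the jar paths are the example paths from this answer. As far as I know, spark.jars has to be set before the first session starts the JVM, so this should run in a fresh process or kernel.

```python
def spark_jars_value(jar_paths):
    """Join jar paths into the comma-separated form used for spark.jars."""
    cleaned = [path.strip() for path in jar_paths if path.strip()]
    return ",".join(cleaned)


def build_session(jar_paths):
    """Create (or fetch) a SparkSession with spark.jars set from jar_paths."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .config("spark.jars", spark_jars_value(jar_paths))
        .getOrCreate()
    )


# Example paths taken from the answer; adjust to wherever your jars live.
GEOSPARK_JARS = [
    "/jars/geo_wrapper_2.11-0.3.0.jar",
    "/jars/geospark-1.2.0.jar",
    "/jars/geospark-sql_2.3-1.2.0.jar",
    "/jars/geospark-viz_2.3-1.2.0.jar",
]
```

With this in place, spark = build_session(GEOSPARK_JARS) would replace the plain SparkSession.builder.getOrCreate() call before GeoSparkRegistrator.registerAll(spark).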
If you are using EMR:
You need to set your cluster configuration JSON to:
[
{
"classification":"spark-defaults",
"properties":{
"spark.jars": "/jars/geo_wrapper_2.11-0.3.0.jar,/jars/geospark-1.2.0.jar,/jars/geospark-sql_2.3-1.2.0.jar,/jars/geospark-viz_2.3-1.2.0.jar"
},
"configurations":[]
}
]
and also get your jars copied onto the cluster as part of your bootstrap. You can fetch them from Maven, but I just put them in an S3 bucket:
#!/bin/bash
sudo mkdir /jars
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar /jars/
sudo aws s3 cp s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar /jars/
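Because the same jar names appear in both the cluster configuration and the bootstrap script, one option is to generate both from a single list so they cannot drift apart. This is only a sketch: the function names are hypothetical, the lowercase JSON keys simply mirror the configuration shown above, and the bucket prefix is the one from this answer.

```python
import json
import posixpath

JAR_NAMES = [
    "geo_wrapper_2.11-0.3.0.jar",
    "geospark-1.2.0.jar",
    "geospark-sql_2.3-1.2.0.jar",
    "geospark-viz_2.3-1.2.0.jar",
]


def spark_defaults_config(jar_names, jar_dir="/jars"):
    """Build the spark-defaults classification with spark.jars filled in."""
    jar_paths = [posixpath.join(jar_dir, name) for name in jar_names]
    return [{
        "classification": "spark-defaults",
        "properties": {"spark.jars": ",".join(jar_paths)},
        "configurations": [],
    }]


def bootstrap_script(jar_names, s3_prefix, jar_dir="/jars"):
    """Build the text of a bootstrap script that copies each jar from S3."""
    lines = ["#!/bin/bash", "sudo mkdir " + jar_dir]
    for name in jar_names:
        lines.append("sudo aws s3 cp {}/{} {}/".format(s3_prefix, name, jar_dir))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(json.dumps(spark_defaults_config(JAR_NAMES), indent=2))
    print(bootstrap_script(JAR_NAMES, "s3://geospark-test-ds/bootstrap"))
```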
If you are using an EMR Notebook:
You need a magic cell at the top of your notebook:
%%configure -f
{
"jars": [
"s3://geospark-test-ds/bootstrap/geo_wrapper_2.11-0.3.0.jar",
"s3://geospark-test-ds/bootstrap/geospark-1.2.0.jar",
"s3://geospark-test-ds/bootstrap/geospark-sql_2.3-1.2.0.jar",
"s3://geospark-test-ds/bootstrap/geospark-viz_2.3-1.2.0.jar"
]
}

I was seeing a similar issue with the SparkMeasure jars on a Windows 10 machine:
self.stagemetrics = self.sc._jvm.ch.cern.sparkmeasure.StageMetrics(self.sparksession._jsparkSession)
TypeError: 'JavaPackage' object is not callable
So what I did was:
Started the PySpark shell from SPARK_HOME with the --packages option, which downloads the required jar:
bin/pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.16
Grabbed that jar (ch.cern.sparkmeasure_spark-measure_2.12-0.16.jar) and copied it into the jars folder of SPARK_HOME.
Reran the script, and it now worked without the above error.
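A quick way to confirm whether a jar was actually picked up is to look at what py4j returns for the class name: when the class cannot be found, the lookup yields a JavaPackage placeholder, and calling that placeholder is what raises the TypeError. The sketch below uses hypothetical helper names and compares type names so that it does not need py4j imported.

```python
def resolve_jvm_name(jvm, dotted_name):
    """Walk dotted_name attribute by attribute, starting from the jvm view."""
    obj = jvm
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    return obj


def is_class_loaded(jvm, dotted_name):
    """Return False when the name resolved to a JavaPackage placeholder."""
    resolved = resolve_jvm_name(jvm, dotted_name)
    # py4j hands back a JavaPackage when no class with that name is visible;
    # comparing the type name keeps this sketch free of a py4j import.
    return type(resolved).__name__ != "JavaPackage"
```

For example, is_class_loaded(spark._jvm, "ch.cern.sparkmeasure.StageMetrics") returning False would suggest the jar is still missing from the driver's classpath.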

Python Selenium Webdriver won't run, even when PATHed properly, and script written as instructed

I'm trying to learn WebDriver for Python, with a basic understanding of Python and a more extensive understanding of Selenium and Java. I'm following the guide found here. My code:
import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.google.com")
assert "Google" in driver.title
sb = driver.find_element_by_name("lst-ib")
sb.clear()
sb.send_keys("Youtube")
sb.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
Now, running this in PyCharm will return:
C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\python.exe C:/Users/mbrenn002c/PycharmProjects/PyDriver/Webdriver.py
Traceback (most recent call last):
File "C:/Users/mbrenn002c/PycharmProjects/PyDriver/Webdriver.py", line 5, in <module>
driver = webdriver.Firefox()
File "C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 145, in __init__
keep_alive=True)
File "C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "C:\Users\mbrenn002c\AppData\Local\Programs\Python\Python35-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
Process finished with exit code 1
My pip packages are: selenium; beautifulsoup4.
My PATH looks as such:
%USERPROFILE%\AppData\Local\Mirosoft\WindowsApps;C:\Users\myuser\AppData\Programs\Python;C:\Pythone34;C:\Users\myuser\Desktop\File transfer\Eclipse Items\geckodrver.exe
My main question is: what am I doing wrong? As far as I know, I've followed everything correctly, and this code should open geckodriver and work as listed. It won't even run with the Selenium standalone server for WebDriver running.
I've also tried running the same code with the same pip packages on the QPython client on my Android phone, which printed some output in the console, ending with this:
Exception AttributeError: "'Service' object has no attribute 'log_file'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0xf5e709f0>> ignored
It may be worth noting that my phone is not rooted, and all I've really done is save this one script and pip install selenium and beautifulsoup4.

selenium.common.exceptions.WebDriverException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
The exception clearly states that Firefox is installed in a different directory from the one Selenium expects: Selenium looked in the default location and could not find the binary. You need to specify in your code exactly where Firefox is installed.
Use the snippet below:
import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
# Point Selenium at the actual Firefox executable (replace with your install path)
binary = FirefoxBinary('path/to/installed firefox binary')
driver = webdriver.Firefox(firefox_binary=binary)
driver.get("http://www.google.com")
assert "Google" in driver.title
# The element name must be passed as a string
sb = driver.find_element_by_name("lst-ib")
sb.clear()
sb.send_keys("Youtube")
sb.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
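If you do not know where Firefox is installed, a small lookup can produce the path to hand to FirefoxBinary. This is a sketch under assumptions: the candidate locations are common defaults, not something Selenium defines, and find_firefox_binary is a made-up helper name.

```python
import os
import shutil

# Common install locations; these are assumptions, adjust for your machine.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
    r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe",
    "/usr/bin/firefox",
    "/Applications/Firefox.app/Contents/MacOS/firefox",
)


def find_firefox_binary(candidates=DEFAULT_CANDIDATES, which=shutil.which):
    """Return the first Firefox path found, or None if nothing is found."""
    # Prefer whatever "firefox" resolves to on PATH.
    on_path = which("firefox")
    if on_path:
        return on_path
    # Otherwise fall back to the first candidate that exists as a file.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

The result can then replace the placeholder string in the snippet above, for example binary = FirefoxBinary(find_firefox_binary()), after checking that it is not None.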

Running java jar classes with python using javabridge

I am trying to invoke some classes that are in a jar file from Python. I have set the path and am trying to run it like this:
import javabridge as jv
path=r'D:\myFiles\swinglibrary-1.9.5.jar'
jars = jv.JARS+[path]
jv.start_vm(run_headless=True,class_path=jars)
print(str(jv.get_static_field("org.robotframework.swing.SwingLibrary","runKeyword", a)))
I get the error below when I execute this:
Traceback (most recent call last):
File "<pyshell#52>", line 1, in <module>
print(str(javabridge.get_static_field("org.robotframework.swing.SwingLibrary", "runKeyword", a)))
File "C:\Python27\lib\site-packages\javabridge\jutil.py", line 952, in get_static_field
raise JavaException(jexception)
JavaException: org.robotframework.swing.SwingLibrary
I am not sure how to invoke a class in a jar using javabridge. Can anyone help me out with this?

Heroku installing play framework modules

I'm trying to install a module on my Heroku app. Running this locally (minus the heroku run at the start) works, but I get an error when trying to run it on Heroku.
heroku run play install securesocial-0.2.2
Here's the output:
...
~ Do you want to install this version (y/n)? y
~ Installing module securesocial-0.2.2...
~
~ Fetching http://www.playframework.org/modules/securesocial-0.2.2.zip
Traceback (most recent call last):
File ".play/play", line 153, in <module>
status = cmdloader.commands[play_command].execute(command=play_command, app=play_app, args=remaining_args, env=play_env, cmdloader=cmdloader)
File "/app/.play/framework/pym/play/commands/modulesrepo.py", line 58, in execute
install(app, args, env)
File "/app/.play/framework/pym/play/commands/modulesrepo.py", line 378, in install
Downloader().retrieve(fetch, archive)
File "/app/.play/framework/pym/play/commands/modulesrepo.py", line 88, in retrieve
try: urllib.urlretrieve(url, destination, self.progress)
File "/usr/local/lib/python2.7/urllib.py", line 91, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "/usr/local/lib/python2.7/urllib.py", line 241, in retrieve
tfp = open(filename, 'wb')
IOError: [Errno 2] No such file or directory: u'/app/.play/modules/securesocial-0.2.2.zip'
What's the proper way to do this? I've been searching, but I can't find any documentation on it.

I've never used Heroku, but perhaps this step-by-step tutorial might help you out.

After you add the module locally, you should be able to commit the resulting changes to git and then push a new version of your app to Heroku.

We Keep Coding
