Hi, I'm trying to learn how to use PySpark, but when I run these first lines:
import pyspark
sc = pyspark.SparkContext('local[*]')
I get this error:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module #0x724b93a8) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module #0x724b93a8
I can't seem to find what's causing it :/
That IllegalAccessError means PySpark is running on a newer JVM (typically Java 16 or later), whose module system blocks access to the internal sun.nio.ch classes; Spark 3.2 supports only Java 8/11. From the Spark docs:
Spark runs on Java 8/11, Scala 2.12, Python 3.6+ and R 3.5+. Python 3.6 support is deprecated as of Spark 3.2.0.
Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0.
For the Scala API, Spark 3.2.0 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
For Python 3.9, Arrow optimization and pandas UDFs might not work due to the supported Python versions in Apache Arrow. Please refer to the latest Python Compatibility page.
For Java 11, -Dio.netty.tryReflectionSetAccessible=true is required additionally for Apache Arrow library. This prevents java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available when Apache Arrow uses Netty internally.
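For completeness, the Netty flag mentioned above can be passed to a PySpark job through the standard extraJavaOptions settings (a sketch; app.py is a placeholder for your script):

```shell
# Only needed on Java 11 when Apache Arrow is used; harmless otherwise
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
  app.py
```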
What worked for me:
brew install openjdk@8
sudo ln -sfn /usr/local/opt/openjdk@8/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-8.jdk
If you need to have openjdk@8 first in your PATH, run:
echo 'export PATH="/usr/local/opt/openjdk@8/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
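After relinking, it's worth verifying which JDK is actually active (a sketch for macOS; /usr/libexec/java_home ships with macOS):

```shell
java -version                                        # should now report 1.8.x
/usr/libexec/java_home -V                            # lists every installed JDK
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"  # pin JAVA_HOME for Spark
```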
Spark runs on Java 8/11; a JDK 8 installer is available from Oracle's Java SE 8 Archive Downloads page (JDK 8u202 and earlier).
I successfully installed the Python 3.9.7 version of the Anaconda distribution.
Here is a Spark installation guide: How to Install and Run PySpark in Jupyter Notebook on Windows.
Here is a video walkthrough: How to Run PySpark in Jupyter Notebook on Windows (YouTube).
This works for me.
Source: Eden Canlilar
How to Install and Run PySpark in Jupyter Notebook on Windows
When I write PySpark code, I use Jupyter notebook to test my code before submitting a job on the cluster. In this post, I will show you how to install and run PySpark locally in Jupyter Notebook on Windows. I’ve tested this guide on a dozen Windows 7 and 10 PCs in different languages.
A. Items needed
Spark distribution from spark.apache.org
Python and Jupyter Notebook. You can get both by installing the Python 3.x version of the Anaconda distribution.
winutils.exe — a Hadoop binary for Windows — from Steve Loughran’s GitHub repo. Go to the corresponding Hadoop version in the Spark distribution and find winutils.exe under /bin. For example,
https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe
The findspark Python module, which can be installed by running python -m pip install findspark either in Windows command prompt or Git bash if Python is installed in item 2. You can find command prompt by searching cmd in the search box.
If you don’t have Java or your Java version is 7.x or less, download and install Java from Oracle. I recommend getting the latest JDK (current version 9.0.1).
If you don’t know how to unpack a .tgz file on Windows, you can download and install 7-zip on Windows to unpack the .tgz file from Spark distribution in item 1 by right-clicking on the file icon and select 7-zip > Extract Here.
B. Installing PySpark
After getting all the items in section A, let’s set up PySpark.
Unpack the .tgz file. For example, I unpacked with 7zip from step A6 and put mine under D:\spark\spark-2.2.1-bin-hadoop2.7
Move the winutils.exe downloaded from step A3 to the \bin folder of Spark distribution. For example, D:\spark\spark-2.2.1-bin-hadoop2.7\bin\winutils.exe
Add environment variables: the environment variables let Windows find where the files are when we start the PySpark kernel. You can find the environment variable settings by putting “environ…” in the search box.
The variables to add are, in my example,
Name                        Value
SPARK_HOME                  D:\spark\spark-2.2.1-bin-hadoop2.7
HADOOP_HOME                 D:\spark\spark-2.2.1-bin-hadoop2.7
PYSPARK_DRIVER_PYTHON       jupyter
PYSPARK_DRIVER_PYTHON_OPTS  notebook
In the same environment variable settings window, look for the Path or PATH variable, click edit, and add D:\spark\spark-2.2.1-bin-hadoop2.7\bin to it. In Windows 7, separate the values in Path with a semicolon ;.
(Optional, if you see a Java-related error in step C) Find the installed Java JDK folder from step A5, for example D:\Program Files\Java\jdk1.8.0_121, and add the following environment variable:
Name       Value
JAVA_HOME  D:\Progra~1\Java\jdk1.8.0_121
If the JDK is installed under \Program Files (x86), replace the Progra~1 part with Progra~2 instead. In my experience, this error only occurs on Windows 7, and I think it's because Spark couldn't parse the space in the folder name.
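On Windows 10 the same variables can also be set from a cmd prompt instead of the GUI (values are the article's examples; setx writes user-level variables and takes effect only in newly opened terminals):

```shell
setx SPARK_HOME "D:\spark\spark-2.2.1-bin-hadoop2.7"
setx HADOOP_HOME "D:\spark\spark-2.2.1-bin-hadoop2.7"
setx PYSPARK_DRIVER_PYTHON "jupyter"
setx PYSPARK_DRIVER_PYTHON_OPTS "notebook"
```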
Edit (1/23/19): You might also find Gerard’s comment helpful: How to Install and Run PySpark in Jupyter Notebook on Windows
C. Running PySpark in Jupyter Notebook
To run Jupyter notebook, open Windows command prompt or Git Bash and run jupyter notebook. If you use Anaconda Navigator to open Jupyter Notebook instead, you might see a Java gateway process exited before sending the driver its port number error from PySpark in step C. Fall back to Windows cmd if it happens.
Once inside Jupyter notebook, open a Python 3 notebook
In the notebook, run the following code
import findspark
findspark.init()
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
When you press run, it might trigger a Windows firewall pop-up. I pressed cancel on the pop-up as blocking the connection doesn’t affect PySpark.
If you see the following output, then you have successfully installed PySpark on your Windows system!
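The article's screenshot of the output is not reproduced here; a typical run of the snippet above prints a one-row table like this (column widths may vary by Spark version):

```
+-----+
|hello|
+-----+
|spark|
+-----+
```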
On Fedora I installed (or at least tried to install) JDK 18. I rebooted my laptop and wanted to check whether it succeeded. I ran java --version, but it said bash: java: command not found.
Well, all that I will write is subjective; it all depends on how you installed Java in Fedora. If you downloaded the JDK distribution and unzipped it to some folder, then in your .bashrc you need to provide the path where the binaries are available:
export PATH=$PATH:/path/to/java/installation/folder/bin
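Concretely, if the archive was unpacked to /opt/jdk-18 (a hypothetical path; adjust to wherever you unzipped it), the ~/.bashrc lines would look like:

```shell
# Hypothetical install location; change /opt/jdk-18 to your actual unpack folder
export JAVA_HOME=/opt/jdk-18
export PATH="$PATH:$JAVA_HOME/bin"
```

After `source ~/.bashrc`, `java --version` should resolve.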
You can also look at sdkman:
https://sdkman.io/
This is a nice tool to manage your JDK(s) regardless of the Linux distribution you are using. (Disclaimer: I have no affiliation with sdkman; I just find the tool very helpful.)
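A typical sdkman session looks like the sketch below (the exact version identifier changes over time; run `sdk list java` to see what's currently available):

```shell
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk list java                 # browse available distributions
sdk install java 18.0.2-open  # identifier is an example; pick one from the list
sdk use java 18.0.2-open      # switch JDKs for the current shell
```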
I am unable to generate a folder/HTML report from JMeter on the command line.
I previously upgraded to the latest Java and somehow it did not work.
I have downloaded JDK 8 but encountered the message below:
jmeter: line 128: [: : integer expression expected
jmeter: line 199: /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home -v 1.8.331.09/bin/java: No such file or directory
You're using the wrong Java: you need a JDK (or at least a JRE), and you seem to be using the Java browser plugin.
Follow the instructions from the Installation of the JDK on macOS article to get the required version of Java (not earlier than JDK 8) and make sure it's in your PATH before the one provided by the browser plugin.
Also you can consider using Homebrew for installing JMeter.
More information: Get Started With JMeter: Installation & Tests
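Once a proper JDK is on the PATH, the HTML dashboard the question asks about is generated with JMeter's -e/-o flags (test-plan and output names here are placeholders):

```shell
brew install jmeter            # optional, per the answer above
jmeter -n -t testplan.jmx -l results.jtl -e -o report/
# -n non-GUI mode, -t test plan, -l results file,
# -e generate the report after the run, -o output folder (must not already exist)
```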
I have to automate a process in which I need to install Java on RHEL 7 using an rpm package and then update the security jars in the installed location. Once I install the Java rpm, how would I get the correct installed path? When I run which java it says /bin/java. However, I need the installation location from a shell script.
Here, the actual installation happened at /usr/java/jdk1.7.0_55/jre. I want to get this location. Can someone suggest how to retrieve it?
The readlink program (part of coreutils, and available on any RHEL version) can resolve symbolic links:
foo=$(readlink -f $(which java))
echo $foo
(You may also have realpath, but perhaps not).
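Building on that, the installation directory itself (rather than the java binary) can be derived by stripping the trailing /bin/java (a sketch; assumes java is on the PATH):

```shell
java_bin=$(readlink -f "$(which java)")        # e.g. /usr/java/jdk1.7.0_55/jre/bin/java
java_home=$(dirname "$(dirname "$java_bin")")  # strips /bin/java -> /usr/java/jdk1.7.0_55/jre
echo "$java_home"
```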
Java RPM packages from Oracle support the 'alternatives' system, which provides the detail you are seeking. For example, after installing a recent JRE, 'alternatives' reports like so:
[user@host ~]$ alternatives --display java
java - status is auto.
link currently points to /usr/java/jre1.8.0_60/bin/java
etc...
The output is relatively friendly for scripting.
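For scripting, the active path can be pulled out of that output with awk (the extraction assumes the "link currently points to" line format shown above):

```shell
# The last field of the 'link currently points to ...' line is the active java path
java_path=$(alternatives --display java | awk '/link currently points to/ {print $NF}')
echo "$java_path"
```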
I built my NetBeans web project with Java 1.5 successfully; however, my Linux server supports/uses Java 1.4 and Java 1.5 (as well as JBoss 4.0.2).
When I check the version on the server (java -version), it says the current version is Java 1.4.2. However, I don't want to change the JAVA_HOME setting on the server because other projects need to use that version.
I want my project to use Java 1.5 from the server.
Any idea how I should go about doing this? Is there a configuration I can change?
I have this error:
java.lang.UnsupportedClassVersionError: bad version in .class file
You need Java 1.5 installed on your Linux server.
Next, run the following set of commands in your terminal/command prompt:
JAVA_HOME="{fix-me}"
export JAVA_HOME
export PATH=$JAVA_HOME/bin:$PATH
In the placeholder {fix-me}, specify the path of the Java 5 installation on the Linux machine.
This sets the Java version to 5 only for as long as the terminal/command prompt session is alive, so you need not worry about disturbing other projects.
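If changing the environment on the server is undesirable, another option (not from the original answer) is to build the project for the older VM with javac's cross-compilation flags, so the 1.4 runtime can load the classes; paths here are hypothetical:

```shell
# -source/-target are standard javac flags on JDK 5
javac -source 1.4 -target 1.4 -d build/classes src/com/example/Main.java
```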
I want to install the JDK in Cygwin on my Windows machine. I am downloading the Linux version of the JDK from the Oracle site using the wget command. Here is the list of commands I am running to install the JDK:
wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-x64.rpm.bin
chmod a+x jdk-6u31-linux-x64.rpm.bin
./jdk-6u31-linux-x64.rpm.bin
All these instructions are the same as Oracle suggests for installing the JDK, but I am getting the following errors:
Firstly, those messages indicate that what you are trying to execute is an HTML document! In other words, the download has failed and given you an error page rather than an installer.
However, assuming that you succeed in downloading the (Linux) installer, it is unlikely that it will install properly, and there is about ZERO chance that the installed tools will run. Applications that have been compiled for Linux don't run on Cygwin.
What you need to do is to download and install the JDK for Windows, and then tweak your cygwin profile a bit. This page explains: http://horstmann.com/articles/cygwin-tips.html.
(If you Google for "java cygwin" there are various other tips for making Java work from Cygwin. However, in my experience there are a few rough edges ... due to the fact that the Windows Java utilities expect to have been called with windows-style arguments, pathnames, classpaths, etcetera.)
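For reference, the usual shape of that Cygwin profile tweak is to point at the Windows JDK and convert paths with cygpath when calling the Windows java tools (the JDK path below is an assumption; cygpath ships with Cygwin):

```shell
# ~/.bash_profile on Cygwin -- adjust the JDK folder to what you actually installed
export JAVA_HOME="/cygdrive/c/Program Files/Java/jdk1.6.0_31"
export PATH="$JAVA_HOME/bin:$PATH"
# The Windows java.exe wants Windows-style classpaths; cygpath -wp converts a Unix path list
java -cp "$(cygpath -wp "$HOME/classes:/usr/local/lib/some.jar")" Main
```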