Spark RDD Class not Found - java

I am new to Spark and need help with this error:
java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD$
I am creating a standalone Spark example in Scala. I ran sbt clean package and sbt assembly to package the Scala Spark code; both completed without any errors. But any operation on an RDD throws the error above. Any pointers to fix this issue would be really helpful.
I invoke the job using the spark-submit command:
$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.GroupTest /Users/../spark_workspace/spark/examples/target/scala-2.10/spark-examples_2.10-1.3.0-SNAPSHOT.jar

I managed to throw this error and get past it. This is definitely a YMMV answer, but I leave it here in case it eventually helps someone.
In my case, I was running a Homebrew-installed Spark (1.2.0) and Mahout (0.11.0) on a Mac. It was pretty baffling to me, because if I ran a Mahout command line by hand I didn't get the error, but if I invoked it from within some Python code it threw the error.
Then I realized that I had updated the SPARK_HOME variable in my profile to use 1.4.1 and had re-sourced it in my by-hand terminal, while the terminal where I was running the Python code was still using 1.2.0. I re-sourced my profile in the Python terminal, and now it "just works."
The whole thing feels very black-magicky. If I were to guess a rational reason for this error being thrown, it is probably that one moving part assumes a different Spark version, architecture, or whatever than the one you actually have. That seems to be the solution hinted at in the comments, too.
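If a similar mismatch is suspected, one quick sanity check is to compare the Spark version your current shell's SPARK_HOME points at against the version your jar was built for. This is only a sketch of the idea: it assumes a Homebrew-style path with the version embedded in it, and the built_against value is hypothetical (it would come from your build.sbt).

```python
import os
import re

def spark_version_from_home(spark_home):
    """Extract a version like '1.2.0' from a path such as
    '/usr/local/Cellar/apache-spark/1.2.0'. Returns None if absent."""
    match = re.search(r"(\d+\.\d+\.\d+)", spark_home or "")
    return match.group(1) if match else None

# Compare the Spark this shell sees with the one the jar was built against.
built_against = "1.4.1"  # hypothetical: the version from your build.sbt
current = spark_version_from_home(os.environ.get("SPARK_HOME", ""))
if current != built_against:
    print("SPARK_HOME mismatch: shell has %s, jar built for %s"
          % (current, built_against))
```

Running this in each terminal (the by-hand one and the one driving the Python code) would have surfaced the 1.2.0 vs 1.4.1 disagreement immediately.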

Related

Trying to run swiftlint through Java subprocess fails (works normally through command line)

I have a Java program that runs different kinds of linters, invoking each linter as a process using ProcessBuilder. So far the other linters have mostly worked, but with swiftlint I'm facing a strange issue.
When I run it normally from the command line with
swiftlint --enable-all-rules
it works perfectly, but executing the same through my Java subprocess utility
ProcessBuilder processBuilder = new ProcessBuilder("swiftlint", "--enable-all-rules");
Process process = processBuilder.start();
It fails with the following error message
SourceKittenFramework/library_wrapper.swift:31: Fatal error: Loading libsourcekitdInProc.so failed
At first glance it seems like certain libraries are not available to the Java program that are available when firing a command through bash. But just to be sure, I tried running the following commands through my subprocess utility:
swiftlint --version
swift --version
Both of these worked, meaning Java does have access to the Swift binaries installed on my machine. Any help would be appreciated.
I'm still not entirely sure why this happens, but I believe it's due to some kind of environment issue on my machine itself.
I tried running the same code inside of a docker container that uses the image built from the following base images
FROM openjdk:17-slim as java_base
FROM ghcr.io/realm/swiftlint:latest
COPY --from=java_base /usr/local/openjdk-17 /usr/local/openjdk-17
and it just worked
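One way to investigate this class of problem without Docker is to check what environment the spawned process actually receives. A child process inherits its parent's environment as-is; it does not re-read ~/.bashrc or ~/.zshrc, so library-path variables set there never reach a process launched from an IDE or another program. The sketch below uses Python's subprocess only to illustrate the same mechanism; the LINKER_PATH variable name is purely hypothetical.

```python
import os
import subprocess

# A child process inherits the parent's environment verbatim; anything your
# interactive shell sets up in its startup files is not re-read.
# To rule out a missing variable, pass an explicit environment to the child.
env = dict(os.environ)
env["LINKER_PATH"] = "/usr/lib/swift"  # hypothetical variable, for illustration

# `env` (the POSIX utility) prints the environment the child actually sees.
result = subprocess.run(["env"], env=env, capture_output=True, text=True)
print("LINKER_PATH=/usr/lib/swift" in result.stdout)
```

In the Java version, the equivalent knob is processBuilder.environment(), which returns a modifiable map of the variables the child will receive.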

Error with pyspark in local when I execute pytest in VS Code from Git Bash

I get an error with PySpark locally when I execute pytest in VS Code from Git Bash. When I try to debug the code in VS Code, creating the Spark DataFrame shows me:
Java gateway process exited before sending its port number
I have configured all the environment variables on my PC. I have seen that this error can mean my Java variable is not configured correctly, but I have checked and it is set properly. Yet if I execute a Jupyter notebook from Git Bash, PySpark works fine with the same code.
It looks like a version mismatch causes this. '_PYSPARK_DRIVER_CALLBACK_HOST' was removed between versions 0.10.7 and 2.3, and came back in version 2.3.1. So you should check your Spark version and make sure SPARK_HOME points to the right one (at least 2.3.1).
from here
I solved it by putting the following line into my .bash_profile file:
export path_java (in my pc)
This way, when I execute pytest in the bash console, the tests pass without problems.
Although I still don't pass the tests with the debugger from VS Code, I can continue with my Python library.
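Since the "Java gateway process exited" message generally means PySpark could not start the JVM at all, a small pre-flight check of the environment can save debugging time before building the SparkSession. This is only a sketch of the kind of validation meant here; check_spark_env is a hypothetical helper, not a PySpark API.

```python
import os

def check_spark_env(environ):
    """Return a list of problems that commonly cause
    'Java gateway process exited before sending its port number'."""
    problems = []
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.exists(os.path.join(java_home, "bin", "java")):
        problems.append("JAVA_HOME does not contain bin/java: %s" % java_home)
    spark_home = environ.get("SPARK_HOME")
    if spark_home and not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: %s" % spark_home)
    return problems

print(check_spark_env({}))  # -> ['JAVA_HOME is not set']
```

Calling check_spark_env(os.environ) from both the Git Bash session and a VS Code debug session would show whether the two contexts see different variables, which matches the fix described above.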

AccessDenied Error while using Psutil library in MacOs via Jython

I am trying to use the psutil library on macOS via Jython, but when I make a call to the psutil.cpu_times function, I get the following error.
except AccessDenied as err:
In psutil documentation the reason is explained as following.
Note (OSX) psutil.AccessDenied is always raised unless running as root
(lsof does the same).
Is there a way to overcome this problem? I will run the program in a Linux environment after development, so a temporary solution will be fine.
Thanks.
For those who land on this page looking for an answer to the same question: I couldn't find a solution to this problem and went back to extracting system status from /sys.
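For anyone taking the same fallback route on Linux, the aggregate CPU accounting that psutil.cpu_times reports lives in /proc/stat (procfs, alongside the /sys information mentioned above). A minimal sketch of parsing that line, using only plain file I/O so it should also work under Jython:

```python
def parse_proc_stat_cpu(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named fields.
    Values are in clock ticks (USER_HZ, typically 100 per second)."""
    fields = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    parts = line.split()
    assert parts[0] == "cpu"
    return dict(zip(fields, (int(v) for v in parts[1:1 + len(fields)])))

# On Linux this would be fed from the real file:
#   with open("/proc/stat") as f:
#       times = parse_proc_stat_cpu(f.readline())
sample = "cpu  4705 150 1120 16250856 2290 127 456"
print(parse_proc_stat_cpu(sample)["idle"])  # -> 16250856
```

This avoids root requirements entirely, at the cost of being Linux-only, which matches the "develop elsewhere, deploy on Linux" plan in the question.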

How to set multi-reducers for Hadoop Program in IntelliJ IDEA?

I am using IntelliJ IDEA on Ubuntu 14.04 to test my Hadoop program. To change the number of reducers, I use the following code:
job.setNumReduceTasks(3);
I use Build Artifacts in IDEA to build a jar file and enter hadoop jar xxx.jar MyClass input output in the Linux shell. The output shows 3 files (part-r-00000, part-r-00001, part-r-00002), which is exactly what I expect. However, when I run the program in IDEA for convenience, using the arguments input/ output/, the output contains only one file, part-r-00000. So I am wondering what goes wrong.
When you run in local mode, only one reducer will be used; there is no parallelism in local mode. Nothing is going wrong with your code here.
Also see https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation:
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
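If you do want to see several reduce tasks while still working from one machine, one option is to run against a single-node YARN cluster instead of the default local runner. A sketch of the relevant setting, assuming an otherwise standard pseudo-distributed setup:

```xml
<!-- mapred-site.xml: use YARN instead of the default "local" framework,
     so job.setNumReduceTasks(3) can actually schedule three reduce tasks -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With mapreduce.framework.name left at its default of "local", the whole job runs in one JVM and the reducer count is effectively ignored, which is exactly the behavior observed from IDEA.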

eclipse hangs when running application (application runs fine, eclipse is what hangs)

I have a single application that causes Eclipse itself to hang when run via Eclipse. If I export this application as a jar and run it, it works fine. But if I run (or debug) it from Eclipse, the application will appear to start (according to ps) and run, while Eclipse itself hangs and is reported as a 'stopped' program with no CPU or memory usage. I've placed a breakpoint on the first line of this application, and it doesn't even get there before Eclipse ends up stopped. If I forcefully wake Eclipse out of the stopped state it will work, but it will also lose its connection with the program I started: the program I want to debug continues running, but Eclipse can't control or kill it after I resume the stopped Eclipse.
I can run plenty of other applications without issue from Eclipse. Oddly, I had this issue before, then I could run my application for a day, and now I'm back to the original issue. I don't know what changed in between that would matter.
Can anyone suggest what may cause this or how to repair it?
UPDATE:
I did some more Linux magic. It seems that Eclipse is stopped while waiting for the command:
sh -c stty -lcanon min 1
It also seems that before that there was an sh (defunct) command which hung without being reaped for a few minutes, which I think was keeping Eclipse from running properly; the sh (defunct) finally goes away if I wait long enough, but then the sh command shown above comes up. I don't know what the original defunct sh command was; I can't run ps fast enough to catch it before it goes defunct. Both issues occur only with Eclipse; as a jar file this program runs perfectly fine.
My running assumption is that Eclipse isn't getting or handling the SIGCHLD correctly? That would explain the sh (defunct) process at least. It doesn't explain the current sh command, which doesn't show as defunct despite being something that should execute in seconds.
UPDATE 2.0:
I found this link: http://linux.about.com/od/srl_howto/a/hwtsrl13t04_3.htm. Basically, stty is known to hang when it uses the < /dev/tty syntax, which is why that syntax is deprecated and replaced with a newer one. I'm pretty sure this is the problem. Sadly, I have no way of figuring out which library is using the deprecated command. I think this all started with the ConsoleReader being constructed, but who knows what code actually ran the command that freezes? Also, if this were that broken, anyone running ConsoleReader from Eclipse on a Linux environment would have the same problem, which I think is safe to assume isn't the case or it would be documented all over the net; so maybe my understanding is still off?
It is related to the configuration of the stty process that is created to attach the console, and hence will occur only on UNIX-like systems. It seems to be fixed in the current JLine version, 2.11.
To bypass the problem, you can disable the special Unix terminal capabilities by passing
-Djline.terminal=none
as a VM argument in the Eclipse launch configuration.
Try increasing -Xms<abc>m/-Xmx<efg>m (depending on the system memory) in eclipse.ini in the root directory of the Eclipse installation.
The problem was that we were using an older version of JLine, which used the deprecated functionality. The new JLine jar fixed the problem, as it no longer makes the deprecated stty calls. I'm not quite certain why Eclipse caused this to happen every time; it seems as if it should be an intermittent error, but JLine was definitely the cause.
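For reference, the fix amounts to making sure a 2.11-or-later JLine ends up on the classpath ahead of the old one. A sketch of the Maven coordinates (the jline:jline artifact on Maven Central), assuming a Maven build:

```xml
<!-- pom.xml: pull in a JLine release that no longer issues the
     deprecated `stty ... < /dev/tty` invocation -->
<dependency>
  <groupId>jline</groupId>
  <artifactId>jline</artifactId>
  <version>2.11</version>
</dependency>
```

If the old JLine arrives transitively through another dependency, it may also need an explicit exclusion so the upgraded version wins.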