java.net.ConnectException when running MPJBOOT on linux cluster - java

I followed the steps in the guide at http://mpjexpress.org/docs/guides/linuxguide.pdf, and I can start the daemons on all machines (a program run on a single server also completes successfully):
[ip1,master] MPJ Daemon started successfully with process id: xxx
[ip2,node] MPJ Daemon started successfully with process id: xxx
But when I run mpjrun.sh -np 2 -dev niodev HelloWorld, it fails with java.net.ConnectException: Connection refused and Multi-threaded starter: exceptionnull, even though I can ping this IP successfully.
Is anything wrong? Thanks for your comments.
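A successful ping only shows that the host answers ICMP; it says nothing about whether a process is accepting TCP connections on the port that mpjrun.sh tries to reach. Below is a minimal sketch for testing that from the master. The class name PortProbe and the default port 10000 are placeholders, not taken from the question; use the daemon port from your own MPJ Express configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused" (nothing listening) and timeouts
            // (for example a firewall dropping packets) both end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real node address and daemon port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 2000));
    }
}
```

If this prints false for a node whose daemon reports that it started, the usual suspects are a firewall between the machines or the daemon listening on a different interface or port than the one being contacted.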

Related

Remote debugging in flink

I added one parameter to flink-conf.yaml:
env.java.opts.taskmanager: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=51005"
Then I started a JobManager and a TaskManager on localhost using the commands below:
flink-1.9.1_bin/bin/jobmanager.sh start
flink-1.9.1_bin/bin/taskmanager.sh start
The JobManager runs and I can see the Flink UI dashboard on localhost:8081. The TaskManager waits for a connection on port 51001. It starts once I attach the debugger from my IDE (IntelliJ), which is set up for remote debugging on localhost:51001. I can see task slots being added when I start debugging from the IDE. After this I ran the command below:
flink-1.9.1_bin/bin/flink run -c myapp.Main myapp.jar
I expect execution to stop at the breakpoints in my local code in the IDE, but it never does. The job goes straight to the running state, as I can see in the Flink UI dashboard.
I am able to do remote debugging for normal Java projects, but not for Flink jobs.
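One thing worth checking is which JVM actually carries the debug agent, and on which port. As far as I understand Flink's execution model, the main method of the job (myapp.Main here) runs in the client JVM started by flink run, while env.java.opts.taskmanager adds the agent to the TaskManager JVM only, so breakpoints in main would not be hit through that connection. The sketch below is a hypothetical helper (JdwpInfo is not part of Flink) that reports the JDWP address of the JVM it runs in; calling it from the code under test shows whether that JVM was started with -agentlib:jdwp.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

public class JdwpInfo {

    private static final String AGENT_PREFIX = "-agentlib:jdwp=";
    private static final String ADDRESS_KEY = "address=";

    /** Extracts the address=... value of the first -agentlib:jdwp argument, if present. */
    public static Optional<String> jdwpAddress(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(AGENT_PREFIX)) {
                // The agent options are a comma-separated list of key=value pairs.
                for (String option : arg.substring(AGENT_PREFIX.length()).split(",")) {
                    if (option.startsWith(ADDRESS_KEY)) {
                        return Optional.of(option.substring(ADDRESS_KEY.length()));
                    }
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP address of this JVM: "
                + jdwpAddress(jvmArgs).orElse("none (no debug agent loaded)"));
    }
}
```

The same check also helps confirm that the port configured in flink-conf.yaml and the port entered in the IDE's remote debugging setup are the same.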

Why do some VPN clients break Java debugging, and how can I work around the issue?

I am using IntelliJ to develop a Scala project.
Due to the client-server architecture of the system, my integration tests have to be run with the following settings in build.sbt:
fork in IntegrationTest := true
javaOptions in (IntegrationTest) ++= Seq("-Djdk.logging.allowStackWalkSearch=true", "-XX:PermSize=256M", "-XX:MaxPermSize=512M", "-Xmx1024m")
// for attaching with debugger to the processes under test
javaOptions in (IntegrationTest) += "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
Normally everything works fine: the tests run and debugging works.
Sometimes I need to establish a VPN connection to access some resources of my employer. I'm using Check Point Endpoint Security VPN (that's the software officially recommended by my employer, and I'm not sure anything else would work).
So, if I happen to be connected to the VPN and then run the integration tests, the SBT console gets stuck right after:
Listening for transport dt_socket at address: 5005
The exact message is:
Listening for transport dt_socket at address: 5025
[error] Uncaught exception when running tests: java.net.ConnectException: Connection timed out: connect
[trace] Stack trace suppressed: run last project/it:test for the full output.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
When I run last project/it:test (I have to reload the SBT console first because it is stuck in the (busy) > state), I see this:
[debug] javaOptions: List(-Djdk.logging.allowStackWalkSearch=true, -XX:PermSize=256M, -XX:MaxPermSize=512M, -Xmx1024m, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005)
[debug] Forking tests - parallelism = false
[debug] Create a single-thread test executor
[error] Uncaught exception when running tests: java.net.ConnectException: Connection timed out: connect
Sometimes the tests start working again when I disconnect the VPN and rerun them. But often disconnecting the VPN doesn't help and I have to reboot my computer.
I have tried some less drastic solutions: restarting the IDE, killing all java and javaw processes, checking the netstat output for anything still using port 5005, changing the port to 5025 in build.sbt and reloading the SBT console. Nothing works except a reboot, and only until the next time I need to connect to the VPN.
That's a nightmare. I don't want to reboot my machine each time after I connect to VPN.
Is there any solution to this? Any Java flags, Windows network stack settings, or VPN settings?

Launch spark master windows7

Using Windows 7 64-bit, JDK 8, Spark 1.6.2.
I have Spark running, with winutils, HADOOP_HOME, etc. set up.
The documentation says: "Note: The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand." But it does not say how.
How do I launch the Spark master on Windows?
I tried running sh start-master.sh through Git Bash, which reported: failed to launch org.apache.spark.deploy.master.Master, even though it prints out Master --ip Sam-Toshiba --port 7077 --webui-port 8080. I don't know what all this means.
But when I try spark-submit --class " " --master spark://Sam-Toshiba:7077 target/ .jar, I get errors:
WARN AbstractLifeCycle: FAILED SelectChannelConnector#0.0.0.0:4040: java.net.BindException: Address already in use: bind
java.net.BindException: Address already in use
WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/01/12 14:44:29 WARN AppClient$ClientEndpoint: Failed to connect to master Sam-Toshiba:7077
java.io.IOException: Failed to connect to Sam-Toshiba/192.168.137.1:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
I also tried spark://localhost:7077, with the same errors.
On Windows you can launch the master with the command below. Open a command prompt, go to the Spark bin folder, and execute:
spark-class.cmd org.apache.spark.deploy.master.Master
The command prints a line such as Master: Starting Spark master at spark://192.168.99.1:7077 to the console, with the IP of your machine. You can check the UI at http://192.168.99.1:8080/
To launch a worker once your master is up, use the command below. It will use all the available cores of your machine.
spark-class.cmd org.apache.spark.deploy.worker.Worker spark://192.168.99.1:7077
If you want to use only 2 of your machine's 4 cores, use:
spark-class.cmd org.apache.spark.deploy.worker.Worker -c 2 spark://192.168.99.1:7077

Spark standalone cluster: unable to connect slave to master

I have the binary distribution spark-1.6.0-bin-hadoop2.6 and I'm having trouble connecting the slave to the master.
So far I have tried the following (on an Ubuntu 14.04 live USB):
I purged and reinstalled openssh-client and openssh-server with apt-get on both systems.
I stated the IP address of the master explicitly in the Spark URL for the worker, spark://< master ip>:7077, and also tried changing SPARK_MASTER_IP in /conf/spark-env.sh.
I assume there must be some SSH setup involved, but I have tried ssh-keygen and ssh-copy-id, and that doesn't give any results either.
The worker starts, but its log has the following error:
16/02/22 07:49:16 INFO Worker: Connecting to master 192.168.0.208:7077...
16/02/22 07:49:16 WARN Worker: Failed to connect to master 192.168.0.208:7077
java.io.IOException: Failed to connect to /192.168.0.208:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.0.208:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:740)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/02/22 07:49:27 INFO Worker: Retrying connection to master (attempt # 2)
However, I am able to open the master web UI by typing :8080 in my browser, and I can also access the web UI of the slave from the master. I'm almost at the point of giving up, so please help.
Make sure that each master and worker has a firewall exception to allow connections for all other workers and masters.
Here's a simplified example from one of our master machines (master0):
$ iptables -L
...
ACCEPT all -- worker0.company.com master0.company.com
ACCEPT all -- worker1.company.com master0.company.com
ACCEPT all -- master1.company.com master0.company.com
...
Of course you can also use IPs instead of hostnames.

Cannot connect to hsqldb database

I am using the following command from the Windows command prompt to create a database and connect to it, but I am getting a java.net.SocketException: Unrecognized Windows Sockets error: 0: JVM_Bind error.
Command used to create a database named xdb and connect to it:
java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:mydb -dbname.0 xdb
Complete error:
[Server#83cc67]: [Thread[main,5,main]]: checkRunning(false) entered
[Server#83cc67]: [Thread[main,5,main]]: checkRunning(false) exited
[Server#83cc67]: Startup sequence initiated from main() method
[Server#83cc67]: Loaded properties from [C:\Home\hsqldb\server.properties]
[Server#83cc67]: Initiating startup sequence...
[Server#83cc67]: [Thread[HSQLDB Server #83cc67,5,main]]: run()/openServerSocket():
java.net.SocketException: Unrecognized Windows Sockets error: 0: JVM_Bind
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
at java.net.ServerSocket.bind(ServerSocket.java:319)
at java.net.ServerSocket.<init>(ServerSocket.java:185)
at java.net.ServerSocket.<init>(ServerSocket.java:97)
at org.hsqldb.HsqlSocketFactory.createServerSocket(Unknown Source)
at org.hsqldb.Server.openServerSocket(Unknown Source)
at org.hsqldb.Server.run(Unknown Source)
at org.hsqldb.Server.access$000(Unknown Source)
at org.hsqldb.Server$ServerThread.run(Unknown Source)
[Server#83cc67]: Initiating shutdown sequence...
[Server#83cc67]: Shutdown sequence completed in 6 ms.
[Server#83cc67]: 2012-05-18 01:31:59.184 SHUTDOWN : System.exit() is called next
Could someone help me understand why I am getting this error and how to solve it?
Thanks
The default port for hsqldb is 9001.
Run netstat -an to see whether something is LISTENING on port 9001 (in a plain Windows command prompt, use findstr in place of grep):
netstat -an | grep LISTENING to check for all servers listening for incoming connections
netstat -an | grep 9001 to check for a specific port number.
If something is already there, then the new instance of hsqldb that you are trying to start will fail to bind a socket to port 9001.
On Windows 7 you can run TCPView to see which process is currently listening on the "overcrowded" port. Then it's a matter of deciding whether to terminate the process that is using 9001 or to reconfigure hsqldb and your client application to use a different (unused) port.
It is possible to change the port that hsqldb listens on using the --port XXXX option, where XXXX is the new port number.
Also from the java -cp ./lib/hsqldb.jar org.hsqldb.Server --help output...
The server looks for a 'server.properties' file in the current directory and loads properties from it if it exists. Command line options override those loaded from the 'server.properties' file.
There are other possible causes of this error, so it would be useful to know which operating system hsqldb is running on.
Failure to bind to a socket is a problem that can afflict any server application, so you can review the answers provided for other server software that returns this error, such as this question about JBoss:
java.net.SocketException: Unrecognized Windows Sockets error: 0: JVM_Bind (JBOSS)
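The check for whether port 9001 is already taken can also be done from Java itself. This is a minimal sketch, not part of hsqldb: the class name PortFree is hypothetical, and the probe simply tries to bind a ServerSocket to the port, which is roughly what the server does at startup.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortFree {

    /** Returns true if a listening socket can currently be bound to the given port. */
    public static boolean isFree(int port) {
        // The socket is closed again immediately by try-with-resources.
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Usually a java.net.BindException ("Address already in use"),
            // or a permission problem for privileged ports.
            return false;
        }
    }

    public static void main(String[] args) {
        // 9001 is the hsqldb default port mentioned above; pass another port to test it.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9001;
        System.out.println("Port " + port + (isFree(port) ? " is free" : " is NOT free"));
    }
}
```

If the port is reported as not free, netstat or TCPView will tell you which process holds it.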
It looks like you are trying to bind to port 0, which doesn't exist. Try configuring a different port.
