I'm running a single-node application with Spark on a machine with 32 GB of RAM.
More than 12 GB of memory is available at the time I run the application.
But from the Spark UI and logs, I see that it uses 3.8 GB of RAM (which gradually decreases as the jobs run).
At the time this is logged, 5 GB more memory is available, yet Spark is using only 3.8 GB.
UPDATE
I set these parameters in conf/spark-env.sh, but each time I run the application it still uses exactly 3.8 GB:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
Log
2015-11-19 13:05:41,701 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering MapOutputTracker
2015-11-19 13:05:41,716 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering BlockManagerMaster
2015-11-19 13:05:41,735 INFO org.apache.spark.storage.DiskBlockManager.logInfo:59 - Created local directory at /usr/local/TC_SPARCDC_COM/temp/blockmgr-8513cd3b-ac03-4c0a-b291-65aba4cbc395
2015-11-19 13:05:41,746 INFO org.apache.spark.storage.MemoryStore.logInfo:59 - MemoryStore started with capacity 3.8 GB
2015-11-19 13:05:41,777 INFO org.apache.spark.HttpFileServer.logInfo:59 - HTTP File server directory is /usr/local/TC_SPARCDC_COM/temp/spark-b86380c2-4cbd-43d6-a3b7-aa03d9a05a84/httpd-ceaffbd0-eac4-447e-9d3f-c452627a28cb
2015-11-19 13:05:41,781 INFO org.apache.spark.HttpServer.logInfo:59 - Starting HTTP Server
2015-11-19 13:05:41,842 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT
2015-11-19 13:05:41,854 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SocketConnector@0.0.0.0:5279
2015-11-19 13:05:41,855 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'HTTP file server' on port 5279.
2015-11-19 13:05:41,867 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering OutputCommitCoordinator
2015-11-19 13:05:42,013 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT
2015-11-19 13:05:42,039 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SelectChannelConnector@0.0.0.0:4040
2015-11-19 13:05:42,039 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'SparkUI' on port 4040.
2015-11-19 13:05:42,041 INFO org.apache.spark.ui.SparkUI.logInfo:59 - Started SparkUI at http://103.252.184.181:4040
2015-11-19 13:05:42,114 WARN org.apache.spark.metrics.MetricsSystem.logWarning:71 - Using default name DAGScheduler for source because spark.app.id is not set.
2015-11-19 13:05:42,117 INFO org.apache.spark.executor.Executor.logInfo:59 - Starting executor ID driver on host localhost
2015-11-19 13:05:42,307 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31334.
2015-11-19 13:05:42,308 INFO org.apache.spark.network.netty.NettyBlockTransferService.logInfo:59 - Server created on 31334
2015-11-19 13:05:42,309 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Trying to register BlockManager
2015-11-19 13:05:42,312 INFO org.apache.spark.storage.BlockManagerMasterEndpoint.logInfo:59 - Registering block manager localhost:31334 with 3.8 GB RAM, BlockManagerId(driver, localhost, 31334)
2015-11-19 13:05:42,313 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Registered BlockManager
If you are using spark-submit you can use the --executor-memory and --driver-memory flags. Otherwise, change the configurations spark.executor.memory and spark.driver.memory, either directly in your program or in spark-defaults.conf.
Note that you should not set the memory too high. As a rule of thumb, aim for ~75% of available memory; that leaves enough memory for other processes (like your OS) running on your machine.
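For example, a minimal sketch (the class name and jar path here are placeholders, and 6g is just an illustrative value):
spark-submit --driver-memory 6g --executor-memory 6g \
  --class com.example.MyApp /path/to/app.jar
Or, to make it the default, add the equivalent lines to $SPARK_HOME/conf/spark-defaults.conf:
spark.driver.memory    6g
spark.executor.memory  6g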
It is correctly stated by @Glennie Helles Sindholt, but setting driver flags while submitting jobs on a standalone machine won't affect the usage, as the JVM has already been initialized. Check out this discussion:
How to set Apache Spark Executor memory
If you are using the spark-submit command to submit a job, here is an example of how to set parameters while submitting it:
spark-submit --master spark://127.0.0.1:7077 \
  --num-executors 2 \
  --executor-cores 8 \
  --executor-memory 3g \
  --class <class-name> \
  /path/to/application.jar \
  /path-to-input \
  /path-to-output
By varying these parameters you can see and understand how RAM usage changes. Also, there is a utility on Linux named htop; it is useful for viewing the instantaneous usage of memory, CPU cores, and swap space, to understand what is happening. To install htop, use the following:
sudo apt-get install htop
It will look something like this:
[Screenshot: the htop utility]
For more information you can check out the following link:
https://spark.apache.org/docs/latest/configuration.html
TLDR;
How to configure Apache Beam pipelines options with "environment_type" = EXTERNAL or PROCESS?
Description
Currently, we have a standalone Spark cluster inside Kubernetes. Following this solution (and its setup), we launch a Beam pipeline by creating an embedded Spark job server on the Spark worker, which also needs to run the Python SDK harness alongside it.
Apache Beam allows running the Python SDK in 4 different ways:
"DOCKER" - Default, and not possible inside a Kubernetes cluster (it would require a container inside a container)
"LOOPBACK" - Only for testing; not possible with more than 1 worker pod
"EXTERNAL" - Ideal setup: "just" create a sidecar container to run in the same pod as the Spark workers
"PROCESS" - Execute a process in the Spark worker; not ideal, but it could work too.
Development
Using "External" - Implementing the spark worker with the python sdk on the same pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
  labels:
    app: spark-worker
spec:
  selector:
    matchLabels:
      app: spark-worker
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
      - name: spark-worker
        image: spark-py-custom:latest
        imagePullPolicy: Never
        ports:
        - containerPort: 8081
          protocol: TCP
        command: ['/bin/bash', "-c", "--"]
        args: ["/start-worker.sh"]
        resources:
          requests:
            cpu: 4
            memory: "5Gi"
          limits:
            cpu: 4
            memory: "5Gi"
        volumeMounts:
        - name: spark-jars
          mountPath: "/tmp"
      - name: python-beam-sdk
        image: apachebeam/python3.7_sdk:latest
        command: ["/opt/apache/beam/boot", "--worker_pool"]
        ports:
        - containerPort: 50000
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
      volumes:
      - name: spark-jars
        persistentVolumeClaim:
          claimName: spark-jars
And then, if we execute the command
python3 wordcount.py \
--output ./data_test/counts \
--runner=SparkRunner \
--spark_submit_uber_jar \
--spark_job_server_jar=beam-runners-spark-job-server-2.28.0.jar \
--spark_master_url=spark://spark-master:7077 \
--spark_rest_url=http://spark-master:6066 \
--environment_type=EXTERNAL \
--environment_config=localhost:50000
we get a terminal stuck in the "RUNNING" state:
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.client:Timeout attempting to reach GCE metadata service.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Connecting anonymously.
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.28.0
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x7fc360c0b8c8> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sort_stages at 0x7fc360c0f048> ====================
INFO:apache_beam.runners.portability.abstract_job_service:Artifact server started on port 36369
INFO:apache_beam.runners.portability.abstract_job_service:Running job 'job-2448721e-e686-41d4-b924-5f8c5ae73ac2'
INFO:apache_beam.runners.portability.spark_uber_jar_job_server:Submitted Spark job with ID driver-20210305172421-0000
INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
And in the spark worker log:
21/03/05 17:24:25 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=45203" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@spark-worker-64fd4ddd6-tqdrs:45203" "--executor-id" "0" "--hostname" "172.18.0.20" "--cores" "3" "--app-id" "app-20210305172425-0000" "--worker-url" "spark://Worker@172.18.0.20:44365"
And on the python sdk:
2021/03/05 17:19:52 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=', '--artifact_endpoint=', '--provision_endpoint=', '--control_endpoint=']
2021/03/05 17:24:32 No logging endpoint provided.
Checking the spark worker stderr (on localhost 8081):
Spark Executor Command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=45203" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@spark-worker-64fd4ddd6-tqdrs:45203" "--executor-id" "0" "--hostname" "172.18.0.20" "--cores" "3" "--app-id" "app-20210305172425-0000" "--worker-url" "spark://Worker@172.18.0.20:44365"
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/05 17:24:26 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 230@spark-worker-64fd4ddd6-tqdrs
21/03/05 17:24:26 INFO SignalUtils: Registered signal handler for TERM
21/03/05 17:24:26 INFO SignalUtils: Registered signal handler for HUP
21/03/05 17:24:26 INFO SignalUtils: Registered signal handler for INT
21/03/05 17:24:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/05 17:24:27 INFO SecurityManager: Changing view acls to: root
21/03/05 17:24:27 INFO SecurityManager: Changing modify acls to: root
21/03/05 17:24:27 INFO SecurityManager: Changing view acls groups to:
21/03/05 17:24:27 INFO SecurityManager: Changing modify acls groups to:
21/03/05 17:24:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/03/05 17:24:27 INFO TransportClientFactory: Successfully created connection to spark-worker-64fd4ddd6-tqdrs/172.18.0.20:45203 after 50 ms (0 ms spent in bootstraps)
21/03/05 17:24:27 INFO SecurityManager: Changing view acls to: root
21/03/05 17:24:27 INFO SecurityManager: Changing modify acls to: root
21/03/05 17:24:27 INFO SecurityManager: Changing view acls groups to:
21/03/05 17:24:27 INFO SecurityManager: Changing modify acls groups to:
21/03/05 17:24:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/03/05 17:24:28 INFO TransportClientFactory: Successfully created connection to spark-worker-64fd4ddd6-tqdrs/172.18.0.20:45203 after 1 ms (0 ms spent in bootstraps)
21/03/05 17:24:28 INFO DiskBlockManager: Created local directory at /tmp/spark-bdffc2b3-f57a-42fa-a720-e22274b86b67/executor-f1eff7ca-d2cd-4ff4-b18b-c8d6a520f590/blockmgr-c61fb65f-ea97-4bd5-bf15-e0025845a251
21/03/05 17:24:28 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/03/05 17:24:28 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@spark-worker-64fd4ddd6-tqdrs:45203
21/03/05 17:24:28 INFO WorkerWatcher: Connecting to worker spark://Worker@172.18.0.20:44365
21/03/05 17:24:28 INFO TransportClientFactory: Successfully created connection to /172.18.0.20:44365 after 1 ms (0 ms spent in bootstraps)
21/03/05 17:24:28 INFO WorkerWatcher: Successfully connected to spark://Worker@172.18.0.20:44365
21/03/05 17:24:28 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/03/05 17:24:28 INFO Executor: Starting executor ID 0 on host 172.18.0.20
21/03/05 17:24:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42561.
21/03/05 17:24:28 INFO NettyBlockTransferService: Server created on 172.18.0.20:42561
21/03/05 17:24:28 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/05 17:24:28 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(0, 172.18.0.20, 42561, None)
21/03/05 17:24:28 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(0, 172.18.0.20, 42561, None)
21/03/05 17:24:28 INFO BlockManager: Initialized BlockManager: BlockManagerId(0, 172.18.0.20, 42561, None)
Where it gets stuck forever.
Checking the source code of the Python SDK, we can see that "no logging endpoint provided" is fatal, and that it comes from the lack of configuration sent to the SDK (no logging/artifact/provision/control endpoints). If I try to add --artifact_endpoint to the python command, I get a gRPC error of failed communication, because the job server creates its own artifact endpoint. In this setup it would be necessary to configure all these endpoints (probably as localhost, since the SDK and the worker are in the same pod) with fixed ports, but I can't find how to configure them. Checking SO, I can find a related issue, but in that case the asker gets the Python SDK configurations automatically (maybe a Spark runner issue?).
Using "PROCESS" - Trying to run the python SDK within a process, I built the python SDK with ./gradlew :sdks:python:container:py37:docker, copied the sdks/python/container/build/target/launcher/linux_amd64/boot executable to /python_sdk/boot inside the spark worker container and used the command:
python3 wordcount.py \
--output ./data_test/counts \
--runner=SparkRunner \
--spark_submit_uber_jar \
--spark_master_url=spark://spark-master:7077 \
--spark_rest_url=http://spark-master:6066 \
--environment_type=PROCESS \
--spark_job_server_jar=beam-runners-spark-job-server-2.28.0.jar \
--environment_config='{"os":"linux","arch":"x84_64","command":"/python_sdk/boot"}'
Resulting in "run time exception" in the terminal:
INFO:apache_beam.runners.portability.portable_runner:Job state changed to FAILED
Traceback (most recent call last):
File "wordcount.py", line 91, in <module>
run()
File "wordcount.py", line 86, in run
output | "Write" >> WriteToText(known_args.output)
File "/usr/local/lib/python3.7/dist-packages/apache_beam/pipeline.py", line 581, in __exit__
self.result.wait_until_finish()
File "/usr/local/lib/python3.7/dist-packages/apache_beam/runners/portability/portable_runner.py", line 608, in wait_until_finish
raise self._runtime_exception
RuntimeError: Pipeline job-95c13aa5-96ab-4d1d-bc68-7f9d203c8251 failed in state FAILED: unknown error
Checking the Spark worker stderr log again, I can see that the problem is java.lang.IllegalArgumentException: No filesystem found for scheme classpath, the reason for which I don't know.
21/03/05 18:33:12 INFO Executor: Adding file:/opt/spark/work/app-20210305183309-0000/0/./javax.servlet-api-3.1.0.jar to class loader
21/03/05 18:33:12 INFO TorrentBroadcast: Started reading broadcast variable 0
21/03/05 18:33:12 INFO TransportClientFactory: Successfully created connection to spark-worker-89c5c4c87-5q45s/172.18.0.20:34783 after 1 ms (0 ms spent in bootstraps)
21/03/05 18:33:12 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.3 KB, free 366.3 MB)
21/03/05 18:33:12 INFO TorrentBroadcast: Reading broadcast variable 0 took 63 ms
21/03/05 18:33:12 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.5 KB, free 366.3 MB)
21/03/05 18:33:13 INFO MemoryStore: Block rdd_13_0 stored as values in memory (estimated size 16.0 B, free 366.3 MB)
21/03/05 18:33:13 INFO MemoryStore: Block rdd_17_0 stored as values in memory (estimated size 16.0 B, free 366.3 MB)
21/03/05 18:33:13 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 5427 bytes result sent to driver
21/03/05 18:33:14 ERROR SerializingExecutor: Exception while executing runnable org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@5f917914
java.lang.IllegalArgumentException: No filesystem found for scheme classpath
at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:467)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:537)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:125)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:99)
at org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc$MethodHandlers.invoke(ArtifactRetrievalServiceGrpc.java:327)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/03/05 18:33:16 ERROR SerializingExecutor: Exception while executing runnable org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@67fb2b2c
java.lang.IllegalArgumentException: No filesystem found for scheme classpath
at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:467)
at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:537)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:125)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:99)
at org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc$MethodHandlers.invoke(ArtifactRetrievalServiceGrpc.java:327)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
21/03/05 18:33:19 INFO ProcessEnvironmentFactory: Still waiting for startup of environment '/python_sdk/boot' for worker id 1-1
It is probably missing some configuration parameters.
Note
If I execute the command
python3 wordcount.py \
--output ./data_test/counts \
--runner=SparkRunner \
--spark_submit_uber_jar \
--spark_job_server_jar=beam-runners-spark-job-server-2.28.0.jar \
--spark_master_url=spark://spark-master:7077 \
--spark_rest_url=http://spark-master:6066 \
--environment_type=LOOPBACK
inside our Spark worker (having only one worker in the Spark cluster), we get a fully working Beam pipeline with these logs.
Using "External" - this definitely seems like a bug in Beam. The worker endpoints are supposed to be set up to use localhost; I don't think it is possible to configure them. I'm not sure why they would be missing; one educated guess is that the servers silently fail to start, leaving the endpoints empty. I filed a bug report (BEAM-11957) for this issue.
Using "Process" - The scheme classpath corresponds to ClassLoaderFileSystem. This file system is usually loaded using AutoService, which depends on ClassLoaderFileSystemRegistrar being present on the classpath (no relation to the name of the file system itself). The classpath of the job jar is based on spark_job_server_jar. Where are you getting your beam-runners-spark-job-server-2.28.0.jar from?
I'm trying to set up SonarQube 7.8. Once I start it with the sonar.sh file it reports as started, but after that SonarQube stops.
root@automation:/opt/sonarqube-7.8/bin/linux-x86-64# ./sonar.sh start
Starting SonarQube...
Started SonarQube.
root@automation:/opt/sonarqube-7.8/bin/linux-x86-64# ./sonar.sh status
SonarQube is not running.
I checked the logs and this is what I get:
--> Wrapper Started as Daemon
Launching a JVM...
Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
2019.10.15 21:01:37 INFO app[][o.s.a.AppFileSystem] Cleaning or creating temp directory /opt/sonarqube-7.8/temp
2019.10.15 21:01:37 INFO app[][o.s.a.es.EsSettings] Elasticsearch listening on /127.0.0.1:9001
2019.10.15 21:01:37 INFO app[][o.s.a.ProcessLauncherImpl] Launch process[[key='es', ipcIndex=1, logFilenamePrefix=es]] from [/opt/sonarqube-7.8/elasticsearch]: /opt/sonarqube-7.8/elasticsearch/bin/elasticsearch
2019.10.15 21:01:37 INFO app[][o.s.a.SchedulerImpl] Waiting for Elasticsearch to be up and running
2019.10.15 21:01:38 INFO app[][o.e.p.PluginsService] no modules loaded
2019.10.15 21:01:38 INFO app[][o.e.p.PluginsService] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
2019.10.15 21:01:41 WARN app[][o.s.a.p.AbstractManagedProcess] Process exited with exit value [es]: 1
2019.10.15 21:01:41 INFO app[][o.s.a.SchedulerImpl] Process[es] is stopped
2019.10.15 21:01:41 INFO app[][o.s.a.SchedulerImpl] SonarQube is stopped
<-- Wrapper Stopped
The es.log file shows:
2019.10.15 21:01:41 ERROR es[][o.e.b.Bootstrap] Exception
java.lang.RuntimeException: can not run elasticsearch as root
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:103) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:170) ~[elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:333) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:159) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:150) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) [elasticsearch-cli-6.8.0.jar:6.8.0]
at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:116) [elasticsearch-6.8.0.jar:6.8.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) [elasticsearch-6.8.0.jar:6.8.0]
I'm not sure why SonarQube stops. Could you help me with that please?
SOLVED:
First, don't run SonarQube as the root user.
1. Create a user. Command: useradd username (use sonaradmin as the username if you don't want to change any of the commands below)
2. Create a password. Command: passwd username
3. Go to the /opt/ directory.
3.1) Rename sonarqube.x.x.x to sonarqube. Command: sudo mv sonarqube.x.x.x sonarqube (change x.x.x to your directory's version)
4. Change permissions. Command: chmod -R 775 sonarqube (the folder name may differ)
5. Make the created user the owner. Command: chown -R sonaradmin:sonaradmin sonarqube
6. Go to the startup scripts: cd /opt/sonarqube/bin/linux-x86-64/
7. Command: su sonaradmin
8. Enter the password.
9. Command: ./sonar.sh start
10. Command: ./sonar.sh status
11. Now go to a browser and check http://localhost:9000/sonar
Hoping this helps; I have gone through the same issue and solved it this way. A condensed sketch of the session follows.
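Condensed, the steps above look roughly like this shell session (a sketch assuming the archive was unpacked as /opt/sonarqube-7.8; adjust names to your install):
sudo useradd sonaradmin
sudo passwd sonaradmin
cd /opt
sudo mv sonarqube-7.8 sonarqube        # rename to a version-free path
sudo chmod -R 775 sonarqube
sudo chown -R sonaradmin:sonaradmin sonarqube
cd /opt/sonarqube/bin/linux-x86-64
su sonaradmin                          # switch to the non-root user
./sonar.sh start
./sonar.sh status
# then open http://localhost:9000/sonar in a browser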
OK, I found the solution. All I had to do was change #RUN_AS_USER= in /opt/sonarqube-7.8/bin/linux-x86-64/sonar.sh (line 48) to RUN_AS_USER=sonar.
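In script form that change is roughly (a sketch; note that the user named in RUN_AS_USER must exist, which is exactly what the comment below runs into):
sudo useradd sonar                     # create the user RUN_AS_USER will refer to
sudo sed -i 's/^#RUN_AS_USER=/RUN_AS_USER=sonar/' /opt/sonarqube-7.8/bin/linux-x86-64/sonar.sh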
Your solution doesn't work for me, because the console shows me this message (after editing the sonar.sh file):
groups: "sonar": no such user
chown: invalid user: "sonar:sonar"
su: user sonar does not exist
If Baya Prakash Reddy's answer doesn't work, you can look at the documentation at https://docs.sonarqube.org/latest/requirements/requirements/, which says that you must ensure that:
vm.max_map_count is greater than or equal to 524288
You can set it like this:
sysctl -w vm.max_map_count=524288
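Note that sysctl -w only applies until the next reboot. To make the setting persistent, you would typically also add it to /etc/sysctl.conf (or a drop-in file under /etc/sysctl.d/):
echo "vm.max_map_count=524288" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p                         # reload settings from /etc/sysctl.conf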
Ensure your SonarQube server has at least 4 GB of RAM. I had this issue after installing SonarQube with 1 GB of RAM, and running SonarQube as sonar did not resolve it. Once I installed on RHEL with 4 GB of RAM, the issue was resolved.
I am trying to use https://github.com/testcontainers/testcontainers-scala, which is derived from https://www.testcontainers.org/, as follows:
final class MessageSpec extends BddSpec
  with ForAllTestContainer
  with BeforeAndAfterAll {

  override val container = GenericContainer("sweetsoft/sapmock").configure { c =>
    c.addExposedPort(8080)
    c.withNetwork(Network.newNetwork())
  }

  override def beforeAll() {
  }

  feature("Process incoming messages") {
    // ... (scenarios elided)
  }
}
When I run the test with the command sbt test, I get the following exception:
15:22:23.171 [pool-7-thread-2] ERROR 🐳 [sweetsoft/sapmock:latest] - Could not start container
org.testcontainers.containers.ContainerLaunchException: Timed out waiting for container port to open (localhost ports: [32775] should be listening)
at org.testcontainers.containers.wait.strategy.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:47)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:35)
at org.testcontainers.containers.wait.HostPortWaitStrategy.waitUntilReady(HostPortWaitStrategy.java:23)
at org.testcontainers.containers.wait.strategy.AbstractWaitStrategy.waitUntilReady(AbstractWaitStrategy.java:35)
at org.testcontainers.containers.GenericContainer.waitUntilContainerStarted(GenericContainer.java:582)
The image is a local image:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
sweetsoft/sapmock latest f02be90356e7 3 hours ago 664MB
openjdk 8 bec43387959a 11 days ago 625MB
quay.io/testcontainers/ryuk 0.2.3 64849fd2d464 3 months ago 10.7MB
The question is: why is it waiting for port 32775? And what is this port for?
Update
Maybe this log will help:
15:47:47.274 [pool-7-thread-4] INFO org.testcontainers.dockerclient.DockerClientProviderStrategy - Found Docker environment with Environment variables, system properties and defaults. Resolved:
dockerHost=unix:///var/run/docker.sock
apiVersion='{UNKNOWN_VERSION}'
registryUrl='https://index.docker.io/v1/'
registryUsername='developer'
registryPassword='null'
registryEmail='null'
dockerConfig='DefaultDockerClientConfig[dockerHost=unix:///var/run/docker.sock,registryUsername=developer,registryPassword=<null>,registryEmail=<null>,registryUrl=https://index.docker.io/v1/,dockerConfigPath=/home/developer/.docker,sslConfig=<null>,apiVersion={UNKNOWN_VERSION},dockerConfig=<null>]'
15:47:47.275 [pool-7-thread-4] INFO org.testcontainers.DockerClientFactory - Docker host IP address is localhost
15:47:47.277 [pool-7-thread-4] DEBUG com.github.dockerjava.core.command.AbstrDockerCmd - Cmd: com.github.dockerjava.core.exec.InfoCmdExec@51a07bb5
15:47:47.389 [pool-7-thread-4] DEBUG com.github.dockerjava.core.command.AbstrDockerCmd - Cmd: com.github.dockerjava.core.exec.VersionCmdExec@70fc9b37
15:47:47.392 [pool-7-thread-4] INFO org.testcontainers.DockerClientFactory - Connected to docker:
Server Version: 18.09.6
API Version: 1.39
Operating System: Ubuntu 18.04.2 LTS
Total Memory: 7976 MB
15:47:47.395 [pool-7-thread-4] DEBUG com.github.dockerjava.core.command.AbstrDockerCmd - Cmd: ListImagesCmdImpl[imageNameFilter=quay.io/testcontainers/ryuk:0.2.3,showAll=false,filters=com.github.dockerjava.core.util.FiltersBuilder@0,execution=com.github.dockerjava.core.exec.ListImagesCmdExec@562a343]
15:47:47.417 [pool-7-thread-4] DEBUG org.testcontainers.utility.RegistryAuthLocator - Looking up auth config for image: quay.io/testcontainers/ryuk:0.2.3
15:47:47.417 [pool-7-thread-4] DEBUG org.testcontainers.utility.RegistryAuthLocator - RegistryAuthLocator has configFile: /home/developer/.docker/config.json (does not exist) and commandPathPrefix:
15:47:47.418 [pool-7-thread-4] WARN org.testcontainers.utility.RegistryAuthLocator - Failure when attempting to lookup auth config (dockerImageName: quay.io/testcontainers/ryuk:0.2.3, configFile: /home/developer/.docker/config.json. Falling back to docker-java default behaviour. Exception message: /home/developer/.docker/config.json (No such file or directory)
15:47:47.418 [pool-7-thread-4] DEBUG org.testcontainers.dockerclient.auth.AuthDelegatingDockerClientConfig - Effective auth config [null]
The original Java library's documentation answers your port question:
https://www.testcontainers.org/features/networking/
Note that this exposed port number is from the perspective of the container.
From the host's perspective Testcontainers actually exposes this on a random free port. This is by design, to avoid port collisions that may arise with locally running software or in between parallel test runs.
Because there is this layer of indirection, it is necessary to ask Testcontainers for the actual mapped port at runtime. This can be done using the getMappedPort method, which takes the original (container) port as an argument.
In the Scala library you can get this mapped port by calling
container.mappedPort(yourExposedPort)
The error is most likely related to this concept: you need to expose that port in advance, inside your Docker image. Make sure that you either have an EXPOSE 8080 instruction somewhere in your Dockerfile, or that one of the images used to build yours has it.
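You can observe the same random-mapping behaviour with plain Docker (a sketch; -P publishes every EXPOSEd port on a random free host port):
docker run -d -P sweetsoft/sapmock
docker port "$(docker ps -lq)" 8080    # prints the random host mapping, e.g. 0.0.0.0:32775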
I have set some MapReduce configuration in my main method as follows:
configuration.set("mapreduce.jobtracker.address", "localhost:54311");
configuration.set("mapreduce.framework.name", "yarn");
configuration.set("yarn.resourcemanager.address", "localhost:8032");
Now when I launch the MapReduce task, the process is tracked (I can see it in my cluster dashboard, the one listening on port 8088), but the process never finishes. It remains blocked at the following line:
15/06/30 15:56:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/30 15:56:17 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
15/06/30 15:56:18 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/06/30 15:56:18 INFO input.FileInputFormat: Total input paths to process : 1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: number of splits:1
15/06/30 15:56:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1435241671439_0008
15/06/30 15:56:19 INFO impl.YarnClientImpl: Submitted application application_1435241671439_0008
15/06/30 15:56:19 INFO mapreduce.Job: The url to track the job: http://10.0.0.10:8088/proxy/application_1435241671439_0008/
15/06/30 15:56:19 INFO mapreduce.Job: Running job: job_1435241671439_0008
Does someone have an idea?
Edit: in my YARN NodeManager log, I have this message:
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0003_03_000001
2015-06-30 15:44:38,396 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Event EventType: KILL_CONTAINER sent to absent container container_1435241671439_0002_04_000001
Edit 2:
I also have, in the YARN manager log, an exception that happened earlier (for a previous MapReduce call):
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:
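To see which process is holding the port before restarting (a sketch; ss or netstat availability varies by distribution):
sudo ss -tlnp | grep ':8040'           # or: sudo netstat -tlnp | grep ':8040'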
Solution: I killed all the daemon processes and restarted Hadoop. In fact, when I ran jps, I was still seeing Hadoop daemons even though I had stopped them. This turned out to be a mismatch of HADOOP_PID_DIR.
The default port of the YARN NodeManager is 8040. The error says that the port is already in use. Stop all the Hadoop processes; if you don't have data, maybe format the NameNode once and try running the job again. From both of your edits, the issue is surely with the NodeManager.
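A clean restart would look roughly like this (a sketch assuming a standard Hadoop 2.x layout under $HADOOP_HOME; adjust paths for your install):
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
jps                                    # no NameNode/DataNode/ResourceManager/NodeManager should remain
# if daemons survive, kill them manually and point HADOOP_PID_DIR
# (set in etc/hadoop/hadoop-env.sh) at a stable, writable directory
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh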
I am new to JBoss deployment. I am using 32-bit Java, Unix, and a JBoss 6 environment. While starting my application shell file X.sh, JBoss gets stuck at the Remoting service. I have spent a lot of time on this but haven't found any clue to resolve it. Please find the error below.
14:34:13,100 INFO [JMXKernel] Legacy JMX core initialized
14:34:24,603 INFO [AbstractServerConfig] JBoss Web Services - Native Server 3.4.1.GA
14:34:25,157 INFO [JSFImplManagementDeployer] Initialized 3 JSF configurations: [Mojarra-1.2, MyFaces-2.0, Mojarra-2.0]
14:34:32,683 WARNING [FileConfigurationParser] AIO wasn't located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal
14:34:37,911 INFO [mbean] Sleeping for 600 seconds
14:34:38,214 WARNING [FileConfigurationParser] AIO wasn't located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal
14:34:38,425 INFO [JMXConnector] starting JMXConnector on host 0.0.0.0:1090
14:34:38,560 INFO [MailService] Mail Service bound to java:/Mail
14:34:39,623 INFO [HornetQServerImpl] live server is starting..
14:34:39,705 INFO [JournalStorageManager] Using NIO Journal
14:34:39,730 WARNING [HornetQServerImpl] Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
14:34:40,970 INFO [NettyAcceptor] Started Netty Acceptor version 3.2.1.Final-r2319 0.0.0.0:5455 for CORE protocol
14:34:40,971 INFO [NettyAcceptor] Started Netty Acceptor version 3.2.1.Final-r2319 0.0.0.0:5445 for CORE protocol
14:34:40,975 INFO [HornetQServerImpl] HornetQ Server version 2.1.2.Final (Colmeia, 120) started
14:34:41,040 INFO [WebService] Using RMI server codebase: http://esaxh036.hyd.lab.vignette.com:8083/
14:34:41,271 INFO [jbossatx] ARJUNA-32010 JBossTS Recovery Service (tag: JBOSSTS_4_14_0_Final) - JBoss Inc.
14:34:41,281 INFO [arjuna] ARJUNA-12324 Start RecoveryActivators
14:34:41,301 INFO [arjuna] ARJUNA-12296 ExpiredEntryMonitor running at Thu, 30 Oct 2014 14:34:41
14:34:41,323 INFO [arjuna] ARJUNA-12332 Failed to establish connection to server
14:34:41,348 INFO [arjuna] ARJUNA-12304 Removing old transaction status manager item 0:ffff0a601a3e:126a:5451fbf6:8
14:34:41,390 INFO [arjuna] ARJUNA-12310 Recovery manager listening on endpoint 0.0.0.0:4712
14:34:41,390 INFO [arjuna] ARJUNA-12344 RecoveryManagerImple is ready on port 4712
14:34:41,391 INFO [jbossatx] ARJUNA-32013 Starting transaction recovery manager
14:34:41,402 INFO [arjuna] ARJUNA-12163 Starting service com.arjuna.ats.arjuna.recovery.ActionStatusService on port 4713
14:34:41,403 INFO [arjuna] ARJUNA-12337 TransactionStatusManagerItem host: 0.0.0.0 port: 4713
14:34:41,425 INFO [arjuna] ARJUNA-12170 TransactionStatusManager started on port 4713 and host 0.0.0.0 with service com.arjuna.ats.arjuna.recovery.ActionStatusService
14:34:41,480 INFO [jbossatx] ARJUNA-32017 JBossTS Transaction Service (JTA version - tag: JBOSSTS_4_14_0_Final) - JBoss Inc.
14:34:41,549 INFO [arjuna] ARJUNA-12202 registering bean jboss.jta:type=ObjectStore.
14:34:41,764 INFO [AprLifecycleListener] The Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /home/IWSTU/JBOSS/jboss-6.0.0.Final/bin/native/lib
14:34:41,922 INFO [ModClusterService] Initializing mod_cluster 1.1.0.Final
14:34:41,935 INFO [TomcatDeployment] deploy, ctxPath=/invoker
14:34:42,364 INFO [RARDeployment] Required license terms exist, view vfs:/home/IWSTU/JBOSS/jboss-6.0.0.Final/server/XDomain/deploy/jboss-local-jdbc.rar/META-INF/ra.xml
14:34:42,382 INFO [RARDeployment] Required license terms exist, view vfs:/home/IWSTU/JBOSS/jboss-6.0.0.Final/server/XDomain/deploy/jboss-xa-jdbc.rar/META-INF/ra.xml
14:34:42,395 INFO [RARDeployment] Required license terms exist, view vfs:/home/IWSTU/JBOSS/jboss-6.0.0.Final/server/XDomain/deploy/jms-ra.rar/META-INF/ra.xml
14:34:42,410 INFO [HornetQResourceAdapter] HornetQ resource adaptor started
14:34:42,421 INFO [RARDeployment] Required license terms exist, view vfs:/home/IWSTU/JBOSS/jboss-6.0.0.Final/server/XDomain/deploy/mail-ra.rar/META-INF/ra.xml
14:34:42,439 INFO [RARDeployment] Required license terms exist, view vfs:/home/IWSTU/JBOSS/jboss-6.0.0.Final/server/XDomain/deploy/quartz-ra.rar/META-INF/ra.xml
14:34:42,544 INFO [SimpleThreadPool] Job execution threads will use class loader of thread: Thread-7
14:34:42,578 INFO [SchedulerSignalerImpl] Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
14:34:42,579 INFO [QuartzScheduler] Quartz Scheduler v.1.8.3 created.
14:34:42,582 INFO [RAMJobStore] RAMJobStore initialized.
14:34:42,585 INFO [QuartzScheduler] Scheduler meta-data: Quartz Scheduler (v1.8.3) 'JBossQuartzScheduler' with instanceId 'NON_CLUSTERED'
Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
NOT STARTED.
Currently in standby mode.
Number of jobs executed: 0
Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 10 threads.
Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.
14:34:42,585 INFO [StdSchedulerFactory] Quartz scheduler 'JBossQuartzScheduler' initialized from an externally opened InputStream.
14:34:42,586 INFO [StdSchedulerFactory] Quartz scheduler version: 1.8.3
14:34:42,586 INFO [QuartzScheduler] Scheduler JBossQuartzScheduler_$_NON_CLUSTERED started.
14:34:43,229 INFO [ConnectionFactoryBindingService] Bound
ConnectionManager 'jboss.jca:service=DataSourceBinding,name=DefaultDS' to JNDI name 'java:DefaultDS'
14:34:43,422 INFO [TomcatDeployment] deploy, ctxPath=/juddi
14:34:43,488 INFO [RegistryServlet] Loading jUDDI configuration.
14:34:43,494 INFO [RegistryServlet] Resources loaded from: /WEB-INF/juddi.properties
14:34:43,494 INFO [RegistryServlet] Initializing jUDDI components.
14:34:43,688 INFO [ConnectionFactoryBindingService] Bound ConnectionManager 'jboss.jca:service=ConnectionFactoryBinding,name=JmsXA' to JNDI name 'java:JmsXA'
14:34:43,738 INFO [ConnectionFactoryBindingService] Bound ConnectionManager 'jboss.jca:service=DataSourceBinding,name=OracleDS' to JNDI name 'java:OracleDS'
14:34:43,926 INFO [xnio] XNIO Version 2.1.0.CR2
14:34:43,937 INFO [nio] XNIO NIO Implementation Version 2.1.0.CR2
14:34:44,170 INFO [remoting] JBoss Remoting version 3.1.0.Beta2    <-- (stuck here)
14:44:37,912 INFO [TicketMap] Start:
14:44:37,913 INFO [TicketMap] Complete:
14:44:37,930 INFO [mbean] Sleeping for 600 seconds
14:54:37,932 INFO [TicketMap] Start:
14:54:37,932 INFO [TicketMap] Complete:
14:54:37,944 INFO [mbean] Sleeping for 600 seconds