Docker blocks mounted volume - java

The situation is as follows:
The Spring Boot service writes logs via Logback to the output.log file.
Logback is configured to rotate logs: it periodically empties output.log and moves the logged content into the archive files specified in the rotation policy. In other words, output.log acts as a temporary buffer holding only the most recent log output.
That all works fine.
But if we mount the logs folder from the container to a host volume (we need this so a third-party utility can read the logs), everything breaks: output.log stops being cleared, grows indefinitely, and the archive files specified in the rotation policy are never created at all. This happens even when nothing on the host reads the files in that directory.
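For context, the kind of rotation described here usually comes from a Logback rolling-file setup roughly like the sketch below; the exact file names, size trigger, and window count are assumptions that simply mirror the directory layout shown in the example that follows.
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <!-- the active "buffer" file that gets rolled over into the archives -->
    <file>/app/logs/output.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
      <!-- archive files out1.log, out2.log next to the jar -->
      <fileNamePattern>/app/out%i.log</fileNamePattern>
      <minIndex>1</minIndex>
      <maxIndex>2</maxIndex>
    </rollingPolicy>
    <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
      <maxFileSize>10MB</maxFileSize>
    </triggeringPolicy>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE"/>
  </root>
</configuration>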
For example:
1) Directories inside the container:
/app/
|-> service.jar
The buffer log is written here:
/app/logs/
|-> output.log
Rotated archive log files are stored here:
/app/
|-> out1.log
|-> out2.log
The directory is mounted to the host in docker-compose:
volumes:
- /LOGS:/app/logs:rw
2) The second container, which reads the logs, mounts the same directory like this:
volumes:
- /LOGS:/var/log
But even when the reading container is stopped, the file stays locked. Just mounting the folder from the source container is enough to block it.
Is this some kind of feature of Docker volumes? Is there a workaround?
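For reference, the two mounts described above can be expressed in a single docker-compose file along these lines (the service and image names are placeholders, not taken from the actual setup):
services:
  app:
    image: my-spring-boot-service   # placeholder
    volumes:
      - /LOGS:/app/logs:rw
  log-reader:
    image: my-log-reader            # placeholder
    volumes:
      - /LOGS:/var/log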

Related

How can I deploy cloudfoundry-uaa as a docker image based on tomcat?

We were using the cf-uaa's gradle tasks to create a docker image but those have been removed in the latest version. I've loaded the war in a recent version, but the service does not seem to be starting correctly.
I've been building the war from the v74 tag, adding it to tomcat:8.5.45-jdk12-openjdk-oracle or tomcat:9.0.24-jdk12-openjdk-oracle, and setting the various env vars that we were passing in to the previous image. I'm not seeing any log entries after the initial tomcat output stating that my war has been deployed and the server startup time.
The Dockerfile is basically just an adaptation of what was being passed in the previous image:
FROM tomcat:8.5.45-jdk12-openjdk-oracle
#FROM tomcat:9.0.24-jdk12-openjdk-oracle
ENV LOGIN_CONFIG_URL WEB-INF/classes/required_configuration.yml
ENV UAA_CONFIG_PATH /uaa
RUN bash -c "rm -r /usr/local/tomcat/webapps/ROOT"
RUN bash -c "rm -r /usr/local/tomcat/webapps/host-manager"
RUN bash -c "rm -r /usr/local/tomcat/webapps/manager"
RUN bash -c "rm -r /usr/local/tomcat/webapps/examples"
RUN bash -c "rm -r /usr/local/tomcat/webapps/docs"
ADD *.war /usr/local/tomcat/webapps/uaa.war
RUN bash -c "echo $LOGIN_CONFIG_URL"
EXPOSE 8080
I would expect to see the service responding to my requests, or some errors in the log indicating that the war failed to deploy. I am not currently getting any log output generated from the application code. When I send a request to the service, the response is a 500 with an error header from the service.
X-Cf-Uaa-Error:Server failed to start. Possible configuration error.
Update: I've located the UAA logs within .../tomcat/logs/uaa.log. I'm not seeing anything indicating that the service failed to deploy, but I am also not seeing anything to indicate that it is picking up the env vars I have set in the container. I recreated the service using the war from the original setup, which started successfully using the uaa.yml that I mounted as a volume. Comparing the logs, the original setup's first log entry is YamlProcessor, which does not show up in the v75 logs at all. In fact, no debug entries show up at all, which suggests to me that my LOG_LEVEL env var is not propagating either.
Update 2: We reverted the image base to FROM tomcat:8.5-jre8 and started seeing Flyway errors in uaa.log. Our previous datasource URL format was url: jdbc:postgresql://${POSTGRES_NAME}:5432/${DB}?currentSchema=uaa, which caused a Flyway exception. After removing the schema reference, it created the tables in the public schema. By creating the uaa schema manually before starting the service, it was able to run with the original format. The Flyway version has been updated, so perhaps there is something new that needs to be set.
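For the record, the manual schema creation mentioned above amounts to a single statement against the configured database, e.g. (host, user, and database name are placeholders):
psql -h <POSTGRES_NAME> -U <DB_USER> -d <DB> -c "CREATE SCHEMA IF NOT EXISTS uaa;"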
The application seems to be running, but when I try to get a token at /uaa/oauth/token I get a 500 with this error in the logs: Caused by: java.lang.NoSuchMethodError: java.nio.CharBuffer.limit(I)Ljava/nio/CharBuffer;
Since January 2021, UAA server Docker images are available in the cloudfoundry/uaa Docker Hub repository.
docker pull cloudfoundry/uaa:75.0.0
See its Dockerfile for more details.
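A minimal run sketch, assuming the published image serves UAA on port 8080 and honours the same UAA_CONFIG_PATH / mounted uaa.yml convention as the hand-rolled image above (both are assumptions; check the image's Dockerfile for the exact contract):
docker pull cloudfoundry/uaa:75.0.0
docker run -d -p 8080:8080 \
  -e UAA_CONFIG_PATH=/uaa \
  -v /path/to/uaa.yml:/uaa/uaa.yml \
  cloudfoundry/uaa:75.0.0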
Can you try the following?
https://github.com/hortonworks/docker-cloudbreak-uaa
This works very well.

How to reference local Kafka and Zookeeper config on Spring Cloud Dataflow "Cloudfoundry" server start

Here is what I have successfully done so far with the SCDF Local Server.
I have successfully deployed the SCDF server locally and I have used the Kafka and Zookeeper config parameters with it, i.e.
mymac$ java -jar spring-cloud-dataflow-server-local-1.3.0.RELEASE.jar \
  --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers=localhost:9092 \
  --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes=localhost:2181
I was able to create my stream
ingest = producer-app > :broker1
filter = :broker1 > filter-app > :broker2
Now I need help to do the exact same thing on PCFDev
I have my PCFDEv running
I have to deploy the SCDF Cloud Foundry jar to PCF Dev with my local Kafka and Zookeeper parameters, but when I do the following steps I get an error:
1.1) cf push -f manifest-scdf.yml --no-start -p /XXX/XXX/XXX/spring-cloud-dataflow-server-cloudfoundry-1.3.0.BUILD-SNAPSHOT.jar -k 1500M
This runs fine, no problem. But step 1.2
1.2) cf start dataflow-server --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers=host.pcfdev.io:9092 --spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes=host.pcfdev.io:2181
gives me this error:
Incorrect Usage: unknown flag `spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers'
Below is my manifest-scdf.yml file:
---
instances: 1
memory: 2048M
applications:
  - name: dataflow-server
    host: dataflow-server
    services:
      - redis
      - rabbit
    env:
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: admin
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: admin
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: true
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES: rabbit
      MAVEN_REMOTE_REPOSITORIES_REPO1_URL: https://repo.spring.io/libs-snapshot
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DISK: 512
      SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK: java_buildpack
      spring.cloud.deployer.cloudfoundry.stream.memory: 400
      spring.cloud.dataflow.features.tasks-enabled: true
      spring.cloud.dataflow.features.streams-enabled: true
Please help me. Thank you.
There are a few options to supply Kafka credentials to the stream-apps in PCF.
1. Kafka CUPs
This option allows you to create CUPs for an external Kafka-service. While deploying the stream, you can then supply the coordinates to each application either individually as described in the docs or you can supply them as global properties for all the stream-apps deployed by the SCDF-server.
2. Inline properties
Instead of extracting from CUPs, you can also directly supply the HOST/PORT while deploying the stream. Again, this can be applied globally, too.
stream deploy myTest --properties "app.*.spring.cloud.stream.kafka.binder.brokers=<HOST>:9092,app.*.spring.cloud.stream.kafka.binder.zkNodes=<HOST>:2181"
Note: The HOST must be reachable from the stream-apps; otherwise, they will continue to connect to localhost and potentially fail, since the apps are running inside a VM.
The error you're seeing is coming from the CF CLI: it's interpreting those (I'm assuming environment) variables you're providing as flags to the cf start command and failing.
You could either provide them in your manifest.yml or set their values manually using the CLI's cf set-env command by doing something like this:
cf set-env dataflow-server spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers host.pcfdev.io:9092
cf set-env dataflow-server spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes host.pcfdev.io:2181
After you've set them they should be picked up when you run cf start dataflow-server.
Relevant CLI docs:
http://cli.cloudfoundry.org/en-US/cf/set-env.html
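If you go the manifest route instead, the same two properties can be added under env in manifest-scdf.yml, for example:
env:
  spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.brokers: host.pcfdev.io:9092
  spring.cloud.dataflow.applicationProperties.stream.spring.cloud.stream.kafka.binder.zkNodes: host.pcfdev.io:2181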

Kafka-connect logs in docker container

I have a Kafka Connect jar which needs to be run as a docker container. I need to capture all my Connect logs in a log file in the container (preferably at a directory/file - /etc/kafka/kafka-connect-logs) which can later be pushed to the localhost (on which the docker engine is running) using volumes in docker. When I change my connect-log4j.properties to append to a log file, I see that no log file is created. If I try the same without docker and run Kafka Connect on a local Linux VM, changing connect-log4j.properties to write logs to a log file works perfectly, but not from docker. Any suggestions will be very helpful.
Docker File
FROM confluent/platform
COPY Test.jar /usr/local/bin/
COPY kafka-connect-docker.sh /usr/local/bin/
COPY connect-distributed.properties /usr/local/bin/
COPY connect-log4j.properties /etc/kafka/connect-log4j.properties
RUN ["apt-get", "update"]
RUN ["apt-get", "install", "-yq", "curl"]
RUN ["chown", "-R", "confluent:confluent", "/usr/local/bin/kafka-connect-docker.sh", "/usr/local/bin/connect-distributed.properties", "/usr/local/bin/Test.jar"]
RUN ["chmod", "+x", "/usr/local/bin/kafka-connect-docker.sh", "/usr/local/bin/connect-distributed.properties", "/usr/local/bin/Test.jar"]
RUN ["chown", "-R", "confluent:confluent", "/etc/kafka/connect-log4j.properties"]
RUN ["chmod", "777", "/usr/local/bin/kafka-connect-docker.sh", "/etc/kafka/connect-log4j.properties"]
EXPOSE 8083
CMD [ "/usr/local/bin/kafka-connect-docker.sh" ]
connect-log4j.properties
# Root logger option
log4j.rootLogger = INFO, FILE
# Direct log messages to a file
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/etc/kafka/log.out
# Define the layout for the file appender
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%m%n
log4j.logger.org.apache.zookeeper=ERROR
log4j.logger.org.I0Itec.zkclient=ERROR
kafka-connect-docker.sh
#!/bin/bash
export CLASSPATH=/usr/local/bin/Test.jar
exec /usr/bin/connect-distributed /usr/local/bin/connect-distributed.properties
It works fine when I am using the default connect-log4j.properties (which appends logs to the console), but I am unable to create a log file in docker. Also, the same process works fine without docker (it creates the log file) on a local VM.
Result of the discussion:
Declare the volume right away in the dockerfile and configure your logging configuration to place the log output directly to the volume. On docker run (or however you start your container) mount the volume and you should be fine.
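A minimal sketch of that suggestion, using the log directory from the question (the image name and host path are placeholders):
# Dockerfile: declare the log directory as a volume
VOLUME /etc/kafka/kafka-connect-logs

# connect-log4j.properties: point the file appender into that directory
log4j.appender.FILE.File=/etc/kafka/kafka-connect-logs/connect.log

# when starting the container, bind-mount the directory to the host
docker run -v /var/log/kafka-connect:/etc/kafka/kafka-connect-logs my-connect-image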

How to specify the location of custom log4j.configuration when spark-submit to Amazon EMR?

I am trying to run a Spark job on an EMR cluster.
In my spark-submit I have added configs to read from log4j.properties:
--files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/log4j.properties"
Also I have added
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/log/test.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %5p %c{7} - %m%n
in my log4j configurations.
Anyhow, I see the logs in the console, but I don't see the log file being generated. What am I doing wrong here?
Quoting spark-submit --help:
--files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
That doesn't say much about what to do with the FILES if you cannot use SparkFiles.get(fileName) (which you cannot for log4j).
Quoting SparkFiles.get's scaladoc:
Get the absolute path of a file added through SparkContext.addFile().
That does not give you much either, but it suggests having a look at the source code of SparkFiles.get:
def get(filename: String): String =
  new File(getRootDirectory(), filename).getAbsolutePath()
The nice thing about it is that getRootDirectory() uses an optional property or just the current working directory:
def getRootDirectory(): String =
  SparkEnv.get.driverTmpDir.getOrElse(".")
That gives us something to work on, doesn't it?
On the driver, the so-called driverTmpDir directory should be easy to find in the Environment tab of the web UI (under Spark Properties for the spark.files property, or under Classpath Entries marked as "Added By User" in the Source column).
On executors, I'd assume a local directory so rather than using file:/log4j.properties I'd use
-Dlog4j.configuration=file://./log4j.properties
or
-Dlog4j.configuration=file:log4j.properties
Note the dot to specify the local working directory (in the first option) or no leading / (in the latter).
Don't forget about spark.driver.extraJavaOptions to set the Java options for the driver if that's something you haven't thought about yet. You've been focusing on executors only so far.
You may want to add -Dlog4j.debug=true to spark.executor.extraJavaOptions that is supposed to print what locations log4j uses to find log4j.properties.
I have not checked this on an EMR or YARN cluster myself, but I believe the above may have given you some hints on where to find the answer. Fingers crossed!
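Pulling those suggestions together, a spark-submit invocation along these lines would cover both the driver and the executors (the application jar is a placeholder):
spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" \
  your-app.jar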
With a Spark 2.2.0 standalone cluster, the executor JVM is started first, and only then does Spark distribute the application jar and --files.
This means that passing
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j-spark.xml
does not make sense, as this file does not yet exist (it has not been downloaded) at the time the executor JVM is launched and log4j is initialized.
If you pass
spark.executor.extraJavaOptions=-Dlog4j.debug -Dlog4j.configuration=file:log4j-spark.xml
you will find, at the beginning of the executor's stderr, a failed attempt to load the log4j config file:
log4j:ERROR Could not parse url [file:log4j-spark.xml].
java.io.FileNotFoundException: log4j-spark.xml (No such file or directory)
...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
And a bit later, the download of --files from the driver is logged:
18/07/18 17:24:12 INFO Utils: Fetching spark://123.123.123.123:35171/files/log4j-spark.xml to /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/fetchFileTemp5898748096473125894.tmp
18/07/18 17:24:12 INFO Utils: Copying /ca/tmp-spark/spark-49815375-3f02-456a-94cd-8099a0add073/executor-7df1c819-ffb7-4ef9-b473-4a2f7747237a/spark-0b50a7b9-ca68-4abc-a05f-59df471f2d16/-18631083971531927447443_cache to /opt/spark-2.2.0-bin-hadoop2.7/work/app-20180718172407-0225/2/./log4j-spark.xml
It may work differently with YARN or another cluster manager, but with a standalone cluster it seems there is no way to specify your own logging configuration for the executors via spark-submit.
You can dynamically reconfigure log4j in your job code (override the log4j configuration programmatically, e.g. the file location of a FileAppender), but you would need to do it carefully in some mapPartitions lambda that is executed in the executor's JVM, or maybe dedicate the first stage of your job to it. All that sucks though...
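A rough Java sketch of that programmatic approach, here using foreachPartition rather than mapPartitions for brevity, and assuming log4j 1.x on the classpath as in Spark 2.x; the output path mirrors the question's /log directory, and the class name and sample data are made up for illustration:
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;

public class ExecutorFileLogging {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("executor-file-logging").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        jsc.parallelize(Arrays.asList(1, 2, 3)).foreachPartition(partition -> {
            // This runs inside the executor JVM, after --files have been fetched.
            Logger root = Logger.getRootLogger();
            if (root.getAppender("file") == null) {            // reconfigure only once per JVM
                FileAppender appender = new FileAppender(
                        new PatternLayout("%d %p %c - %m%n"),
                        "/log/executor-test.log",              // assumed path, as in the question
                        true);
                appender.setName("file");
                root.addAppender(appender);
            }
            partition.forEachRemaining(value -> root.info("processed " + value));
        });

        spark.stop();
    }
}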
Here is the complete command I used to run my uber-jar on EMR, and I see log files generated on both the driver and executor nodes.
spark-submit --class com.myapp.cloud.app.UPApp --master yarn --deploy-mode client --driver-memory 4g --executor-memory 2g --executor-cores 8 --files log4j.properties --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlog4j.debug=true" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.eventLog.dir=/mnt/var/log/" uber-up-0.0.1.jar
where log4j.properties is in my local filesystem.

Integrate Apache storm with zookeeper in windows

Can anyone help me integrate Storm with ZooKeeper on Windows?
I have tried to find good installation steps for a production setup on Windows, but I can't.
Right now I have installed a standalone ZooKeeper and I am trying to configure it in storm.yaml.
Sample config I tried:
storm.zookeeper.servers:
- "127.0.0.1"
- "server2"
storm.zookeeper.port: 2180
nimbus.host: "localhost"
If anybody knows, please help me.
Please follow the steps below for a complete installation of Storm with ZooKeeper on Windows.
1. Create a zoo.cfg file under the /conf directory of ZooKeeper.
Here are the configuration entries:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=D:\\workspace\\zk-data
# the port at which the clients will connect
clientPort=2181
2. Run ZooKeeper by executing zkServer.bat in the /bin directory.
3. Create the storm.yaml file in the /conf directory of apache-storm.
Here are the configuration entries:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "localhost"
#supervisor.slots.ports:
# - 6700
# - 6701
# - 6702
# - 6703
storm.local.dir: D:\\workspace\\storm-data
nimbus.host: "localhost"
#
#
4. Run the commands below from the command prompt:
a. bin\storm nimbus
b. bin\storm supervisor
c. bin\storm ui
5. Hit the browser at localhost:8080 to see the Nimbus UI.
6. Make sure your JAVA_HOME path does not contain any spaces; otherwise the storm command will not run.
