I've set up PredictionIO v0.13 on my Linux machine in Docker (running in swarm mode). The setup includes:
one container for pio v0.13
one container for elasticsearch v5.6.4
one container for mysql v8.0.16
one container for spark-master v2.3.2
one container for spark-worker v2.3.2
The template I am using is ecomm-recommender-java, modified for my data. I don't know whether I made a mistake in the template or in the Docker setup, but something is really wrong:
pio build succeeds
pio train fails with:
Exception in thread "main" java.io.IOException: Connection reset by peer
Because of this, I added a lot of logging to my template at various points, and this is what I found:
The training fails after the model is computed. I am using a custom Model class to hold the logistic regression model and the various user and product indices.
The model is a PersistentModel. In the save method I log after every step. Those messages show up, and I can find the saved results in the mounted Docker volume, so the save also seems to succeed, but right after that I get the following exception:
[INFO] [Model] saving user index
[INFO] [Model] saving product index
[INFO] [Model] save done
[INFO] [AbstractConnector] Stopped Spark#20229b7d{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Exception in thread "main" java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:204)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
at java.lang.Thread.run(Thread.java:748)
I couldn't find anything more relevant in any of the logs, but it is possible that I overlooked something.
I tried to play with the train parameters like so:
pio-docker train -- --master local[3] --driver-memory 4g --executor-memory 10g --verbose --num-executors 3
played with the Spark modes (i.e. --master local[1-3], and omitting it so the Spark instances in the Docker containers are used)
played with the --driver-memory (from 4g to 10g)
played with the --executor-memory (also from 4g to 10g)
played with the --num-executors number (from 1 to 3)
as these are what most Google search results suggested.
My main problem is that I don't know where this exception is coming from or how to track it down.
Here is the save method, which could be relevant:
public boolean save(String id, AlgorithmParams algorithmParams, SparkContext sparkContext) {
    try {
        logger.info("saving logistic regression model");
        logisticRegressionModel.save("/templates/" + id + "/lrm");
        logger.info("creating java spark context");
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sparkContext);
        logger.info("saving user index");
        userIdIndex.saveAsObjectFile("/templates/" + id + "/indices/user");
        logger.info("saving product index");
        productIdIndex.saveAsObjectFile("/templates/" + id + "/indices/product");
        logger.info("save done");
    } catch (IOException e) {
        e.printStackTrace();
    }
    return true;
}
The hardcoded /templates/ path is the Docker-mounted volume for both pio and Spark.
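For completeness, here is a minimal sketch (not my actual code) of what the matching load side could look like, assuming the spark.ml LogisticRegressionModel and the same mounted /templates/ path; the Model constructor and the RDD element types are hypothetical:

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.classification.LogisticRegressionModel;

public static Model load(String id, SparkContext sparkContext) {
    JavaSparkContext jsc = JavaSparkContext.fromSparkContext(sparkContext);
    // spark.ml models are MLReadable, so load() mirrors the single-argument save() above
    LogisticRegressionModel lrm = LogisticRegressionModel.load("/templates/" + id + "/lrm");
    // objectFile() deserializes whatever saveAsObjectFile() wrote out
    JavaRDD<Object> userIdIndex = jsc.objectFile("/templates/" + id + "/indices/user");
    JavaRDD<Object> productIdIndex = jsc.objectFile("/templates/" + id + "/indices/product");
    return new Model(lrm, userIdIndex, productIdIndex); // hypothetical constructor
}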
Expected result is: train completes without error.
I am happy to share more details if necessary, please ask for them, as I am not sure what could be helpful here.
EDIT1: Including docker-compose.yml
version: '3'
networks:
  mynet:
    driver: overlay
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.4
    environment:
      - xpack.graph.enabled=false
      - xpack.ml.enabled=false
      - xpack.monitoring.enabled=false
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - cluster.name=predictionio
      - bootstrap.memory_lock=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    volumes:
      - pio-elasticsearch-data:/usr/share/elasticsearch/data
    deploy:
      replicas: 1
    networks:
      - mynet
  mysql:
    image: mysql:8
    command: mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    environment:
      MYSQL_ROOT_PASSWORD: somepass
      MYSQL_USER: someuser
      MYSQL_PASSWORD: someotherpass
      MYSQL_DATABASE: pio
    volumes:
      - pio-mysql-data:/var/lib/mysql
    deploy:
      replicas: 1
    networks:
      - mynet
  spark-master:
    image: bde2020/spark-master:2.3.2-hadoop2.7
    ports:
      - "8080:8080"
      - "7077:7077"
    volumes:
      - ./templates:/templates
    environment:
      - INIT_DAEMON_STEP=setup_spark
    deploy:
      replicas: 1
    networks:
      - mynet
  spark-worker:
    image: bde2020/spark-worker:2.3.2-hadoop2.7
    depends_on:
      - spark-master
    ports:
      - "8081:8081"
    volumes:
      - ./templates:/templates
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"
    deploy:
      replicas: 1
    networks:
      - mynet
  pio:
    image: tamassoltesz/pio0.13-spark.230:1
    ports:
      - 7070:7070
      - 8000:8000
    volumes:
      - ./templates:/templates
    dns: 8.8.8.8
    depends_on:
      - mysql
      - elasticsearch
      - spark-master
    environment:
      PIO_STORAGE_SOURCES_MYSQL_TYPE: jdbc
      PIO_STORAGE_SOURCES_MYSQL_URL: "jdbc:mysql://mysql/pio"
      PIO_STORAGE_SOURCES_MYSQL_USERNAME: someuser
      PIO_STORAGE_SOURCES_MYSQL_PASSWORD: someuser
      PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME: pio_event
      PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE: MYSQL
      PIO_STORAGE_REPOSITORIES_MODELDATA_NAME: pio_model
      PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE: MYSQL
      PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE: elasticsearch
      PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS: predictionio_elasticsearch
      PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS: 9200
      PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES: http
      PIO_STORAGE_REPOSITORIES_METADATA_NAME: pio_meta
      PIO_STORAGE_REPOSITORIES_METADATA_SOURCE: ELASTICSEARCH
      MASTER: spark://spark-master:7077 # spark master
    deploy:
      replicas: 1
    networks:
      - mynet
volumes:
  pio-elasticsearch-data:
  pio-mysql-data:
I found out what the issue is: somehow the connection to Elasticsearch is lost during the long-running train. This is a Docker issue, not a PredictionIO issue. For now, I "solved" it by not using Elasticsearch at all.
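For anyone who wants to reproduce the workaround: one way to take Elasticsearch out of the picture, sketched here from the compose file above and assuming the JDBC source can also hold the metadata repository, is to repoint METADATA at MYSQL in the pio service's environment:

PIO_STORAGE_REPOSITORIES_METADATA_NAME: pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE: MYSQL   # was ELASTICSEARCH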
Another thing I was not aware of: it matters where you put --verbose in the command. Providing it the way I originally did (pio train -- --driver-memory 4g --verbose) has little or no effect on the verbosity of the logging. The right way is pio train --verbose -- --driver-memory 4g, i.e. before the --. This way I got much more log output, from which the origin of the issue became clear.
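In other words, arguments before the -- are handled by pio itself, while everything after the -- is passed through to spark-submit:

pio train --verbose -- --driver-memory 4g    # --verbose is consumed by pio: verbose pio logging
pio train -- --driver-memory 4g --verbose    # --verbose goes to spark-submit instead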
Related
I have created a container for Apache NiFi using a docker-compose file. When I run the docker-compose up command, I get the following error when the nifi container starts:
2022-10-19 14:59:34,234 WARN [main] org.apache.nifi.web.server.JettyServer Failed to start the web server... shutting down.
nifi_container_persistent | java.io.IOException: Function not implemented
What exactly is this java.io.IOException "Function not implemented" about? Where do I need to make a change so that I can fix this error?
Here is the docker compose file for nifi with custom bridge network "my_network":
nifi:
  hostname: mynifi
  container_name: nifi_container_persistent
  image: 'apache/nifi:1.16.1' # latest image as of 2021-11-09.
  restart: on-failure
  ports:
    - '8091:8080'
  environment:
    - NIFI_WEB_HTTP_PORT=8080
    - NIFI_CLUSTER_IS_NODE=true
    - NIFI_CLUSTER_NODE_PROTOCOL_PORT=8082
    - NIFI_ZK_CONNECT_STRING=myzookeeper:2181
    - NIFI_ELECTION_MAX_WAIT=30 sec
    - NIFI_SENSITIVE_PROPS_KEY='12345678901234567890A'
  healthcheck:
    test: "${DOCKER_HEALTHCHECK_TEST:-curl localhost:8091/nifi/}"
    interval: "60s"
    timeout: "3s"
    start_period: "5s"
    retries: 5
  volumes:
    - ./nifi/database_repository:/opt/nifi/nifi-current/database_repository
    - ./nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
    - ./nifi/content_repository:/opt/nifi/nifi-current/content_repository
    - ./nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository
    - ./nifi/state:/opt/nifi/nifi-current/state
    - ./nifi/logs:/opt/nifi/nifi-current/logs
    # uncomment the next line after copying the /conf directory from the container to your local directory to persist NiFi flows
    #- ./nifi/conf:/opt/nifi/nifi-current/conf
  networks:
    - my_network
Please help.
On my Windows 10 machine I have a Java app, and I create PostgreSQL containers on Docker using the following configuration:
docker-compose.yml:
version: '2.0'
services:
  postgresql:
    image: postgres:11
    ports:
      - "5432:5432"
    expose:
      - "5432"
    environment:
      - POSTGRES_USER=demo
      - POSTGRES_PASSWORD=******
      - POSTGRES_DB=demo_test
And I use the following commands to bring the containers up:
cd postgresql
docker-compose up -d
Although the pgadmin container is working on Docker, the postgres container is generally in a restarting state and only sometimes seems to be running, for a second at a time. When I look at that container's log, I see the following errors:
2021-03-16 09:00:18.526 UTC [82] FATAL: data directory "/data/postgres" has wrong ownership
2021-03-16 09:00:18.526 UTC [82] HINT: The server must be started by the user that owns the data directory.
child process exited with exit code 1
initdb: removing contents of data directory "/data/postgres"
running bootstrap script ... The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
I have tried to apply several workaround suggestions, e.g. "PostgreSQL with docker ownership issue", but none of them works. So, how can I fix this problem?
Update: here is the latest state of my docker-compose.yml file:
version: '2.0'
services:
  postgresql:
    image: postgres:11
    container_name: "my-pg"
    ports:
      - "5432:5432"
    expose:
      - "5432"
    environment:
      - POSTGRES_USER=demo
      - POSTGRES_PASSWORD=******
      - POSTGRES_DB=demo_test
    volumes:
      - psql:/var/lib/postgresql/data
volumes:
  psql:
As I already stated in my comment, I'd suggest using a named volume.
Here's my docker-compose.yml for Postgres 12:
version: "3"
services:
postgres:
image: "postgres:12"
container_name: "my-pg"
ports:
- 5432:5432
environment:
POSTGRES_USER: "postgres"
POSTGRES_PASSWORD: "postgres"
POSTGRES_DB: "mydb"
volumes:
- psql:/var/lib/postgresql/data
volumes:
psql:
Then I created the psql volume via docker volume create psql (so just a volume without any actual path mapping).
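To double-check that the volume exists and where Docker keeps it, docker volume inspect can be used:

docker volume create psql
docker volume inspect psql   # shows the Mountpoint Docker manages internally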
I'm trying to create an image of an application that I'm building from a Udemy course; it is a Java Spring Boot REST application which uses a MySQL database. Here's the problem: I've followed the same steps my teacher shows on video, but for some reason Docker can't run the image.
Here are the docker-compose.yml, the Dockerfile, and the logs:
docker-compose.yml
version: '3.4'
services:
  db:
    image: raphasalomao/restudemy
    command: mysqld --default-authentication-plugin=mysql_native_password
    restart: always
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      TZ: America/Sao_Paulo
      MYSQL_ROOT_PASSWORD: docker
      MYSQL_USER: docker
      MYSQL_PASSWORD: docker
      MYSQL_DATABASE: restudemy
    ports:
      - "3308:3306"
    networks:
      - udemy-network
  restudemy:
    image: raphasalomao/restudemy
    restart: always
    build: /Users/rapha/OneDrive/Documentos/Projetos/RestUdemy/02 RestWithSpringBoot
    working_dir: /Users/rapha/OneDrive/Documentos/Projetos/RestUdemy/02 RestWithSpringBoot
    environment:
      TZ: America/Sao_Paulo
      SPRING_BOOT_ENVIRONMENT: Production
    volumes:
      - ./02 RestWithSpringBoot:/Users/rapha/OneDrive/Documentos/Projetos/RestUdemy/02 RestWithSpringBoot
      - ~/.m2:/root/.m2
    ports:
      - "8080:8080"
    command: mvn clean spring-boot:run
    links:
      - db
    depends_on:
      - db
    networks:
      - udemy-network
networks:
  udemy-network:
    driver: bridge
Dockerfile:
FROM mysql:5.7.23
EXPOSE 3308
LOG:
/usr/local/bin/mvn-entrypoint.sh: 50: exec: mysqld: not found
I've tried changing the $PATH in WSL, using mysqld.exe instead of mysqld, updating Windows, and reinstalling Docker and WSL, but nothing works.
I found the problem. This is actually the first time I have used Docker, and I don't know much about it, but the problem was the MySQL image: I changed "image: raphasalomao/restudemy" to "image: mysql:5.7".
This image "raphasalomao/restudemy" came from which Registry? DockerHub??
I Couldnt find this on DockerHub.
But based on the message, it appears to be a Maven related Image.
Why not use mysql Official Image?
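Putting the fix and that comment together, here is a hedged sketch of the corrected db service, reusing the values from the compose file above but with the official image; the build: section is dropped because the official image is pulled rather than built:

db:
  image: mysql:5.7   # official image instead of raphasalomao/restudemy
  command: mysqld --default-authentication-plugin=mysql_native_password
  restart: always
  environment:
    TZ: America/Sao_Paulo
    MYSQL_ROOT_PASSWORD: docker
    MYSQL_USER: docker
    MYSQL_PASSWORD: docker
    MYSQL_DATABASE: restudemy
  ports:
    - "3308:3306"
  networks:
    - udemy-network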
I am building a Spring Boot application which uses PostgreSQL with docker-compose.
When I run my containers using docker-compose up --build, my Spring Boot application fails to start because it cannot resolve the PostgreSQL container's hostname.
Spring Boot Dockerfile
FROM maven:3.6.3-openjdk-14-slim AS build
COPY src /usr/src/app/src
COPY pom.xml /usr/src/app
RUN mvn -f /usr/src/app/pom.xml clean package
FROM openjdk:14-slim
COPY --from=build /usr/src/app/target/server-0.0.1-SNAPSHOT.jar /usr/app/server-0.0.1-SNAPSHOT.jar
EXPOSE 9000
ENTRYPOINT ["java","-jar","/usr/app/server-0.0.1-SNAPSHOT.jar"]
docker-compose.yml
version: '3'
services:
  db:
    image: postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: my_db
    ports:
      - "5432:5432"
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - db-network
    restart: always
  server:
    build: './server'
    depends_on:
      - db
    restart: always
    ports:
      - "9000:9000"
    networks:
      - db-network
    volumes:
      - ./server:/server
networks:
  db-network:
volumes:
  db-data:
application.properties
spring.datasource.url=jdbc:postgresql://db:5432/my_db
spring.datasource.username=postgres
spring.datasource.password=postgres
Error output
Caused by: java.net.UnknownHostException: db
My guess is that docker-compose's virtual network isn't created yet during the build stage of the Spring Boot Dockerfile.
Any idea how to solve this issue?
Lots of info here: https://docs.docker.com/compose/networking/
Within the web container, your connection string to db would look like postgres://db:5432, and from the host machine, the connection string would look like postgres://{DOCKER_IP}:8001.
What this is saying is that db:5432 is fine to use within docker-compose.yaml, since the hostname db resolves to the container's IP address there, but using it externally, outside the compose network, isn't going to work. You could, however, pass db from docker-compose.yaml as an application input variable, which your application could fill into its configuration file. This would then enable you to connect.
Externalising configuration like this is fairly common practice so should be a relatively easy fix.
e.g.:
docker-compose.yaml
version: '3'
services:
  db:
    container_name: db
    image: postgres
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: my_db
    ports:
      - "5432:5432"
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - db-network
    restart: always
  server:
    build: './server'
    depends_on:
      - db
    environment:
      DB_HOST: db # untested, but there should be a way to pass this in
      DB_PORT: 5432
      DB_DATABASE: my_db
      DB_USER: postgres
      DB_PASSWORD: postgres
    restart: always
    ports:
      - "9000:9000"
    networks:
      - db-network
    volumes:
      - ./server:/server
networks:
  db-network:
volumes:
  db-data:
Then have an application.properties file located under src/main/resources/application.properties:
spring.datasource.url=jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_DATABASE}
spring.datasource.username=${DB_USER}
spring.datasource.password=${DB_PASSWORD}
This post completely solved my issue.
It turns out that Maven was trying to establish the connection to the database while building the .jar file. All I had to do was modify my Dockerfile with this line: RUN mvn -f /usr/src/app/pom.xml clean package -DskipTests.
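For clarity, that change lands in the build stage of the Dockerfile shown above; only the RUN line differs:

FROM maven:3.6.3-openjdk-14-slim AS build
COPY src /usr/src/app/src
COPY pom.xml /usr/src/app
# -DskipTests keeps Maven from running tests (and thus from trying to reach
# the database) while the image is being built
RUN mvn -f /usr/src/app/pom.xml clean package -DskipTests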
Please do note that while the images are being built, the service will not have access to the database, as it is not yet running. Only after the images are built and the containers are running do the services have access. So when you try to pass db as a host, it is not yet available in the build stage; it becomes available only once the db container starts running.
I have a microservices based project. Each microservice is a Spring Boot (v.2.0.0-RC2) app. I have also a discovery, config and gateway microservices based on Spring Cloud (Finchley). The whole system is deployed on test machine using Docker Compose.
I noticed that one of the microservices freezes after receiving several consecutive requests from the frontend app in a short period of time. After this, it becomes unresponsive to further requests, and I receive a read timeout from my gateway. The same occurs when calling this microservice directly, bypassing the gateway.
I have a Spring Boot Admin instance, and I noticed the microservice goes offline and comes back online every 5 minutes. Despite that, nothing interesting shows up in the logs. No memory issues observed.
Next remark: this problem occurs only when I start the whole system from docker-compose at the same time. When I restart this single microservice, I can't reproduce it anymore.
And the last thing: the whole container of the microservice seems to be frozen. When I do docker stop on it, the terminal hangs, but after checking the container status in another terminal, the container appears as 'exited'. A very strange thing occurred when I did docker attach on the container: the terminal also hung, and when I exited from it, my problematic microservice started to work properly and to accept incoming requests successfully.
Can anyone help me with this strange problem? I really have no more ideas about what to try to resolve it.
Thanks in advance for any clue.
EDIT
docker-compose.yml
version: '3.4'
services:
  config-service:
    image: im/config-service
    container_name: config-service
    environment:
      - SPRING_PROFILES_ACTIVE=native
    volumes:
      - ~/production-logs:/logs
  discovery-service:
    image: im/discovery-service
    container_name: discovery-service
    environment:
      - SPRING_PROFILES_ACTIVE=production
    volumes:
      - ~/production-logs:/logs
  gateway-service:
    image: im/gateway-service
    container_name: gateway-service
    ports:
      - "8080:8080"
    depends_on:
      - config-service
      - discovery-service
    environment:
      - SPRING_PROFILES_ACTIVE=production
    volumes:
      - ~/production-logs:/logs
  car-service_db:
    image: postgres:9.5
    container_name: car-service_db
    environment:
      - POSTGRES_DB=car
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
  car-service:
    image: im/car-service
    container_name: car-service
    depends_on:
      - config-service
      - discovery-service
      - car-service_db
    environment:
      - SPRING_PROFILES_ACTIVE=production
      - CAR_SERVICE_DB_URL=jdbc:postgresql://car-service_db:5432/car
      - CAR_SERVICE_DB_USER=user
      - CAR_SERVICE_DB_PASSWORD=pass
    volumes:
      - ~/production-logs:/logs
Dockerfile of car-service
FROM openjdk:8-jdk-alpine
VOLUME /tmp
EXPOSE 9005
ARG JAR_FILE
ADD ${JAR_FILE} app.jar
ENV JAVA_OPTS="-agentlib:jdwp=transport=dt_socket,address=8001,server=y,suspend=n"
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -Djava.security.egd=file:/dev/./urandom -jar /app.jar"]
Command used to start up
docker-compose up
Test machine:
Ubuntu Server 16.04 LTS
RESOLVED
The cause was a logging aspect. I noticed a lot of threads waiting on:
sun.misc.Unsafe.park(Unsafe.java:-2) native
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
ch.qos.logback.core.OutputStreamAppender.writeBytes(OutputStreamAppender.java:197)
ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:231)
ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:414)
ch.qos.logback.classic.Logger.debug(Logger.java:490)
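All of these threads are queued on the lock inside logback's OutputStreamAppender.writeBytes, visible in the trace above. One common mitigation, shown here only as a hedged sketch for a hypothetical logback.xml (the FILE appender name is a placeholder, not from the actual config), is to wrap the blocking appender in an AsyncAppender so request threads hand log events to a queue instead of contending for the lock:

<!-- logback.xml sketch: decouple request threads from the file appender -->
<appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE"/>        <!-- FILE is a placeholder for the real appender -->
    <queueSize>512</queueSize>        <!-- events buffered before the queue fills -->
    <neverBlock>true</neverBlock>     <!-- drop events rather than block callers -->
</appender>
<root level="INFO">
    <appender-ref ref="ASYNC"/>
</root>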