I am trying to use Azure Blob Storage with Apache Flink 1.10 for checkpointing.
I followed the instructions in the Flink documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/filesystems/azure.html
Step 1:
mkdir ./plugins/azure-fs-hadoop
cp ./opt/flink-azure-fs-hadoop-1.10.0.jar ./plugins/azure-fs-hadoop/
Step 2:
This is what I have in flink-conf.yaml:
#Azure Storage Key
fs.azure.account.key.<storage-account>.blob.core.windows.net: xxxxxxxxxxxxxxxxxx
Step 3:
Use Azure Blob storage for checkpointing
This is what I have in my flink job
final StateBackend stateBackend = new FsStateBackend("wasb://flink-blob@<storage-account>.blob.core.windows.net/checkpoint");
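For completeness, the checkpointing setup in my job looks roughly like this (the interval, the trivial pipeline, and the surrounding boilerplate are illustrative, not my exact code):

import org.apache.flink.runtime.state.StateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // checkpoint every 60 seconds (interval is illustrative)
        env.enableCheckpointing(60_000);

        // point the state backend at the Azure Blob Storage container
        final StateBackend stateBackend =
                new FsStateBackend("wasb://flink-blob@<storage-account>.blob.core.windows.net/checkpoint");
        env.setStateBackend(stateBackend);

        // trivial pipeline; the real job topology goes here
        env.fromElements(1, 2, 3).print();

        env.execute("checkpoint-example");
    }
}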
I am not sure if I missed anything, but when I submit the job I get the exception below (an AskTimeoutException from an Akka actor).
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:664)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:895)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1741)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
at com.example.flink.checkpointing.CheckpointExample.main(CheckpointExample.java:78)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
... 8 more
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1736)
... 17 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:359)
at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$8(FutureUtils.java:274)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1085)
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#75666936]] after [10000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635)
at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279)
at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283)
at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:235)
at java.base/java.lang.Thread.run(Thread.java:834)
End of exception on server side>]
at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:390)
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:374)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
... 4 more
I think your problem is twofold. First, the true failure cause is hidden behind the AskTimeoutException. This has been fixed by FLINK-16018, which will be released with Flink 1.10.1; the timeout value is currently too aggressive, so a long-running job submission fails on the client side.
For the true failure cause, I would recommend taking a look at Flink's jobmanager.log. It should contain information about what went wrong. I suspect a misconfiguration of the Azure Blob Storage.
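Until 1.10.1 is out, a commonly suggested interim mitigation (I have not verified that it helps in this exact case) is to raise the relevant timeouts in flink-conf.yaml, for example:

# values are illustrative; both settings default to 10 s in Flink 1.10
web.timeout: 60000
akka.ask.timeout: 60 s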
That's correct. You must set your checkpoints.dir and savepoints.dir in the following format, and use:
fs.azure.account.key.<storage-account-name>.blob.core.windows.net: <key>
wasbs://<container>@<storage-account-name>.blob.core.windows.net/<directory>/
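Put together, the relevant flink-conf.yaml entries look roughly like this (account name, container, and key are placeholders, not values from the question):

fs.azure.account.key.mystorageaccount.blob.core.windows.net: <base64-account-key>
state.backend: filesystem
state.checkpoints.dir: wasbs://flink-blob@mystorageaccount.blob.core.windows.net/checkpoints/
state.savepoints.dir: wasbs://flink-blob@mystorageaccount.blob.core.windows.net/savepoints/

With those entries in place, the same wasbs:// URI can also be passed to the FsStateBackend constructor inside the job.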
I am running a Spring Boot training project in IntelliJ IDEA that writes a recipe to the database through RESTful services. The IDE runs the server, and I use Postman to query it.
My problem is that the H2 database does not persist the data entered. In memory it's fine, but as soon as I shut down and start up again, the data is gone.
Looking next to the database file, I see a trace file that seems to show that the DB wasn't updated because there was a lock on it. I have searched for a lock file (locate *.lock.db), stopped all Java processes, and even rebooted, with no change in behavior. I've changed H2's FILE_LOCK options, but the problem persists.
EDIT: I just noticed that I hadn't tried FILE_LOCK=FILE. I tried it, and the DB still isn't updated; however, no trace file is produced.
EDIT 2: I forgot to say that I'm running on Ubuntu 20.04.
EDIT 3: I am being careful to shut down using the "actuator/shutdown" endpoint from Postman. With file locking enabled, I do this:
start the server running from my IDE. I can see the lock file being created
create a new recipe
make sure I can get it (returns the recipe, no error)
shutdown using the actuator. The lock file is removed.
startup the server again. New lock file.
try to get the recipe. Returns a 404 error. No trace file.
What could be causing this? Here is my trace.db file [note: when I use FILE_LOCK=FILE, no trace file is created, so maybe the locking problem was a red herring?]:
2021-09-09 07:54:08 database: flush
org.h2.message.DbException: General error: "java.lang.IllegalStateException: The file is locked: nio:/home/knute/IdeaProjects/recipes_db.mv.db [1.4.200/7]" [50000-200]
at org.h2.message.DbException.get(DbException.java:194)
at org.h2.message.DbException.convert(DbException.java:347)
at org.h2.mvstore.db.MVTableEngine$1.uncaughtException(MVTableEngine.java:93)
at org.h2.mvstore.MVStore.handleException(MVStore.java:2877)
at org.h2.mvstore.MVStore.panic(MVStore.java:481)
at org.h2.mvstore.MVStore.<init>(MVStore.java:402)
at org.h2.mvstore.MVStore$Builder.open(MVStore.java:3579)
at org.h2.mvstore.db.MVTableEngine$Store.open(MVTableEngine.java:170)
at org.h2.mvstore.db.MVTableEngine.init(MVTableEngine.java:103)
at org.h2.engine.Database.getPageStore(Database.java:2659)
at org.h2.engine.Database.open(Database.java:675)
at org.h2.engine.Database.openDatabase(Database.java:307)
at org.h2.engine.Database.<init>(Database.java:301)
at org.h2.engine.Engine.openSession(Engine.java:74)
at org.h2.engine.Engine.openSession(Engine.java:192)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:171)
at org.h2.engine.Engine.createSession(Engine.java:166)
at org.h2.engine.Engine.createSession(Engine.java:29)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:340)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:173)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:152)
at org.h2.Driver.connect(Driver.java:69)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:358)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:477)
at com.zaxxer.hikari.pool.HikariPool.access$100(HikariPool.java:71)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:725)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:711)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.h2.jdbc.JdbcSQLNonTransientException: General error: "java.lang.IllegalStateException: The file is locked: nio:/home/knute/IdeaProjects/recipes_db.mv.db [1.4.200/7]" [50000-200]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:505)
at org.h2.message.DbException.getJdbcSQLException(DbException.java:429)
... 33 more
Caused by: java.lang.IllegalStateException: The file is locked: nio:/home/knute/IdeaProjects/recipes_db.mv.db [1.4.200/7]
at org.h2.mvstore.DataUtils.newIllegalStateException(DataUtils.java:950)
at org.h2.mvstore.FileStore.open(FileStore.java:166)
at org.h2.mvstore.MVStore.<init>(MVStore.java:381)
... 27 more
Caused by: java.nio.channels.OverlappingFileLockException
at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
at java.base/sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1154)
at org.h2.store.fs.FileNio.tryLock(FilePathNio.java:121)
at java.base/java.nio.channels.FileChannel.tryLock(FileChannel.java:1165)
at org.h2.mvstore.FileStore.open(FileStore.java:163)
... 28 more
...and the application.properties file [note: in my current iteration, I am using FILE_LOCK=FILE]:
# Required by HyperSkill
server.port=8881
management.endpoints.web.exposure.include=*
management.endpoint.shutdown.enabled=true
spring.datasource.url=jdbc:h2:file:../recipes_db
# Solves file locking problem? No.
#spring.datasource.url=jdbc:h2:file:../recipes_db;DB_CLOSE_ON_EXIT=TRUE;FILE_LOCK=NO
#spring.datasource.url=jdbc:h2:file:../recipes_db;FILE_LOCK=NO
#spring.datasource.url=jdbc:h2:file:../recipes_db;FILE_LOCK=SOCKET
#spring.datasource.url=jdbc:h2:file:../recipes_db;FILE_LOCK=FS
# Needed?
spring.datasource.auto-commit=true
# To remove warning
spring.jpa.open-in-view=true
I found the problem! After adding debug=true to the application.properties file, I saw this:
Starting delayed evictData of schema as part of SessionFactory shut-down
So for some reason, Spring was explicitly dropping my table on shutdown! Adding this to the application.properties file stopped that:
spring.jpa.hibernate.ddl-auto=update
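As far as I know, the reason is that Spring Boot treats H2 as an embedded database and therefore defaults spring.jpa.hibernate.ddl-auto to create-drop, which drops the schema when the SessionFactory shuts down. For reference, the relevant properties together look like this (the datasource URL is the one from the question):

spring.datasource.url=jdbc:h2:file:../recipes_db
# keep schema and data across restarts instead of the embedded-database default (create-drop)
spring.jpa.hibernate.ddl-auto=update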
I installed a new Subversion repository, but I am unable to connect to it.
The old repository works as expected; the new repository throws an exception:
org.eclipse.team.svn.core.connector.SVNConnectorException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.polarion.team.svn.connector.svnkit.SVNKitService.handleClientException(SVNKitService.java:59)
at org.polarion.team.svn.connector.svnkit.SVNKitConnector.listEntries(SVNKitConnector.java:1758)
at org.eclipse.team.svn.core.extension.factory.ThreadNameModifier.listEntries(ThreadNameModifier.java:324)
at org.eclipse.team.svn.core.utility.SVNUtility.list(SVNUtility.java:440)
at org.eclipse.team.svn.core.svnstorage.SVNRepositoryContainer.getChildren(SVNRepositoryContainer.java:79)
at org.eclipse.team.svn.core.operation.remote.GetRemoteFolderChildrenOperation.runImpl(GetRemoteFolderChildrenOperation.java:76)
at org.eclipse.team.svn.core.operation.AbstractActionOperation.run(AbstractActionOperation.java:82)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTask(ProgressMonitorUtility.java:104)
at org.eclipse.team.svn.core.operation.CompositeOperation.runImpl(CompositeOperation.java:99)
at org.eclipse.team.svn.core.operation.AbstractActionOperation.run(AbstractActionOperation.java:82)
at org.eclipse.team.svn.core.operation.LoggedOperation.run(LoggedOperation.java:40)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTask(ProgressMonitorUtility.java:104)
at org.eclipse.team.svn.core.utility.ProgressMonitorUtility.doTaskExternal(ProgressMonitorUtility.java:90)
at org.eclipse.team.svn.ui.utility.DefaultCancellableOperationWrapper.run(DefaultCancellableOperationWrapper.java:55)
at org.eclipse.team.svn.ui.utility.ScheduledOperationWrapper.run(ScheduledOperationWrapper.java:37)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: org.apache.subversion.javahl.ClientException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.apache.subversion.javahl.ClientException.fromException(ClientException.java:117)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.getClientException(SVNClientImpl.java:1539)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.list(SVNClientImpl.java:189)
at org.polarion.team.svn.connector.svnkit.SVNKitConnector.listEntries(SVNKitConnector.java:1745)
... 14 more
Caused by: org.tmatesoft.svn.core.SVNException: svn: E210002: There was a problem while connecting to xn--x7h.example.com:22
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:70)
at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:57)
at org.tmatesoft.svn.core.internal.io.svn.SVNSSHConnector.open(SVNSSHConnector.java:145)
at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.open(SVNConnection.java:77)
at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.openConnection(SVNRepositoryImpl.java:1273)
at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.getLatestRevision(SVNRepositoryImpl.java:172)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.getRevisionNumber(SvnNgRepositoryAccess.java:119)
at org.tmatesoft.svn.core.internal.wc2.SvnRepositoryAccess.getLocations(SvnRepositoryAccess.java:195)
at org.tmatesoft.svn.core.internal.wc2.ng.SvnNgRepositoryAccess.createRepositoryFor(SvnNgRepositoryAccess.java:46)
at org.tmatesoft.svn.core.internal.wc2.remote.SvnRemoteList.run(SvnRemoteList.java:36)
at org.tmatesoft.svn.core.internal.wc2.remote.SvnRemoteList.run(SvnRemoteList.java:1)
at org.tmatesoft.svn.core.internal.wc2.SvnOperationRunner.run(SvnOperationRunner.java:21)
at org.tmatesoft.svn.core.wc2.SvnOperationFactory.run(SvnOperationFactory.java:1235)
at org.tmatesoft.svn.core.wc2.SvnOperation.run(SvnOperation.java:294)
at org.tmatesoft.svn.core.javahl17.SVNClientImpl.list(SVNClientImpl.java:187)
... 15 more
Caused by: java.io.IOException: There was a problem while connecting to xn--x7h.example.com:22
at com.trilead.ssh2.Connection.connect(Connection.java:817)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshHost.openConnection(SshHost.java:225)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshHost.openSession(SshHost.java:153)
at org.tmatesoft.svn.core.internal.io.svn.ssh.SshSessionPool.openSession(SshSessionPool.java:85)
at org.tmatesoft.svn.core.internal.io.svn.SVNSSHConnector.open(SVNSSHConnector.java:122)
... 27 more
Caused by: java.io.IOException: Key exchange was not finished, connection is closed.
at com.trilead.ssh2.transport.KexManager.getOrWaitForConnectionInfo(KexManager.java:92)
at com.trilead.ssh2.transport.TransportManager.getConnectionInfo(TransportManager.java:231)
at com.trilead.ssh2.Connection.connect(Connection.java:769)
... 31 more
Caused by: java.io.IOException: Cannot negotiate, proposals do not match.
at com.trilead.ssh2.transport.KexManager.handleMessage(KexManager.java:413)
at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:765)
at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:480)
at java.base/java.lang.Thread.run(Thread.java:834)
Any ideas?
I checked on the server side what was going on. Running tail -f /var/log/auth.log, I discovered a message saying that the client and the server were not able to agree on matching kex algorithms.
My solution was to add the line
KexAlgorithms +diffie-hellman-group1-sha1
to
/etc/ssh/sshd_config
and restart the sshd server.
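On a systemd-based server the whole fix looks roughly like this (the log path and restart command may differ on your distribution):

# watch what the server logs during a failed connection attempt
tail -f /var/log/auth.log

# allow the legacy key-exchange algorithm the old client offers
echo 'KexAlgorithms +diffie-hellman-group1-sha1' | sudo tee -a /etc/ssh/sshd_config

# restart sshd so the change takes effect (the service may be called 'ssh' on Debian/Ubuntu)
sudo systemctl restart sshd

Note that diffie-hellman-group1-sha1 is a weak, deprecated algorithm; updating the SVN client's SSH library so it supports modern kex algorithms would be the cleaner long-term fix.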
I know this kind of question has been asked before, but I still couldn't find a solution after reading those posts, so I am asking again here.
I am working on a multi-threaded Java application that runs HQL queries against Hive over JDBC. I have a bunch of Hive SQL queries, and I execute them on Hive in parallel from multiple threads. I get the exception below once the query count grows (for example, when I run more than 100 queries). Can someone please take a look and help me with this?
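For context, the execution pattern looks roughly like this (class names, pool size, queries, and the JDBC URL are illustrative, not my real code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelHiveRunner {
    // illustrative JDBC URL; the real HiveServer2 host is different
    private static final String URL = "jdbc:hive2://hive-server:10000/default";

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
                "SELECT COUNT(*) FROM table_a",   // placeholder HQL
                "SELECT COUNT(*) FROM table_b");
        ExecutorService pool = Executors.newFixedThreadPool(16);  // pool size is illustrative
        for (String hql : queries) {
            pool.submit(() -> {
                // one connection per task so statements do not share a Hive session
                try (Connection conn = DriverManager.getConnection(URL, "user", "");
                     Statement stmt = conn.createStatement()) {
                    stmt.execute(hql);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}

And here is the exception: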
2020-06-16 06:00:45,314 ERROR [main]: Terminal exception
java.lang.Exception: Map step agg_cas_auth_reinstate_derive failed.
at com.mine.idn.magellan.ParallelExecGraph.execute(ParallelExecGraph.java:198)
at com.mine.idn.magellan.WarehouseSession.executeMap(WarehouseSession.java:332)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:872)
at com.mine.idn.magellan.StandAloneEnv.execute(StandAloneEnv.java:778)
at com.mine.idn.magellan.StandAloneEnv.executeAndExit(StandAloneEnv.java:642)
at com.mine.idn.magellan.StandAloneEnv.main(StandAloneEnv.java:77)
Caused by: java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Bad file descriptor
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Bad file descriptor
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2850)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2685)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2591)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1077)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:190)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:599)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:599)
at org.apache.hadoop.mapred.JobClient.getJobInner(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:639)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
Caused by: java.io.IOException: Bad file descriptor
at java.io.FileInputStream.close0(Native Method)
at java.io.FileInputStream.access$000(FileInputStream.java:49)
at java.io.FileInputStream$1.close(FileInputStream.java:336)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
at java.io.FileInputStream.close(FileInputStream.java:334)
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2676)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2661)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2741)
... 22 more
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:385)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at com.mine.idn.magellan.WarehouseSession.executeMapStep(WarehouseSession.java:797)
at com.mine.idn.magellan.WarehouseSession.access$000(WarehouseSession.java:23)
at com.mine.idn.magellan.WarehouseSession$ParallelExecResources.executeMapStep(WarehouseSession.java:91)
at com.mine.idn.magellan.ParallelExecGraph$Node.run(ParallelExecGraph.java:85)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What I don't understand is why the Hadoop framework is throwing a 'Bad file descriptor' IOException. My Java code only invokes the Hadoop/Hive code, and that is where the exception is thrown.
One more thing: the issue is intermittent rather than consistent. If I re-run the same application, it usually goes through.
Thank you for your help.
I wrote a MapReduce job that scans an HBase table over a certain time range to count elements we need for analysis.
The mappers in the MR job keep failing, but I don't know why. Each time I run the job, a different number of mappers fail. The YARN log from Cloudera Manager (see below) isn't helpful in pinpointing the problem, although someone suggested I might be running out of memory.
The job retries multiple times, but each attempt fails. What do I need to do to stop it from failing, or how can I add logging to help me determine what is happening?
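For context, the job is set up roughly like this (the mapper, the time range, and the scan tuning are illustrative; only the table name comes from the log below):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanCountJob {

    public static class CountMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // the real mapper counts the elements we care about; this sketch just counts rows
            context.getCounter("scan", "rows").increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "dbi_based_data scan count");
        job.setJarByClass(ScanCountJob.class);

        Scan scan = new Scan();
        scan.setTimeRange(1496271480000L, 1496357880000L); // illustrative time range
        scan.setCaching(500);        // rows per RPC
        scan.setCacheBlocks(false);  // recommended for MR scans

        TableMapReduceUtil.initTableMapperJob(
                "dbi_based_data", scan, CountMapper.class,
                ImmutableBytesWritable.class, LongWritable.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}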
Below is a log from YARN for one of the mappers that failed.
Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Thu Jun 15 16:26:57 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:236)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60301: row '152_p3401.db161139.sjc102.dbi_1496271480' on table 'dbi_based_data' at region=dbi_based_data,151_p3413.db162024.iad4.dbi_1476974340,1486675565213.d83250d0682e648d165872afe5abd60e., hostname=hslave35118.ams9.mysecretdomain.com,60020,1483570489305, seqNum=19308931
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Call to hslave35118.ams9.mysecretdomain.com/10.216.35.118:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:64)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:360)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:334)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=12, waitTime=60001, operationTimeout=60000 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1246)
... 13 more
So it looks like in my case I needed to extend the timeout settings. In my Java program I had to add the following lines to make the exception go away:
conf.set("hbase.rpc.timeout","90000");
conf.set("hbase.client.scanner.timeout.period","90000");
The answer was found in a post on Cloudera's site.
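For anyone wondering where those lines go: they have to be set on the Configuration before the Job (and the HBase connection) is created from it. A sketch, not the exact program:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

public class TimeoutConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // raise both the RPC timeout and the scanner timeout above the default 60 s
        conf.set("hbase.rpc.timeout", "90000");
        conf.set("hbase.client.scanner.timeout.period", "90000");

        // the job (and TableMapReduceUtil.initTableMapperJob) must be built from this conf
        Job job = Job.getInstance(conf, "scan with longer timeouts");
        // ... rest of the job setup ...
    }
}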
For a short time today, my code running in the Flexible Environment "compat" runtime and using the Google Cloud Datastore API hit java.net.SocketTimeoutException: Timeout while fetching URL when inserting an item into Datastore in another GAE project.
In addition, Dataflow failed to insert items at that time, almost certainly from the same problem.
This also occurs on simple key queries, so it is not a problem of heavy data.
Before and after this, a lot of other data did get inserted correctly, including a rerun of this same data.
Googling the error suggests it can be caused by downtime in Google Cloud, but the Google Cloud Status Dashboard shows green.
What caused this? How can we avoid it in the future?
com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities run: ERROR in CopyEntities(commerceDocs/RFQ)
java.lang.RuntimeException: com.google.cloud.datastore.DatastoreException: I/O error
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.retryIfAllowed(GCloudApiDSBackup.java:963)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.putEntities(GCloudApiDSBackup.java:857)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.retryIfAllowed(GCloudApiDSBackup.java:959)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.putEntities(GCloudApiDSBackup.java:857)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.putEntitiesByParts(GCloudApiDSBackup.java:991)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.run(GCloudApiDSBackup.java:801)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.cloud.datastore.DatastoreException: I/O error
at com.google.cloud.datastore.spi.DefaultDatastoreRpc.translate(DefaultDatastoreRpc.java:105)
at com.google.cloud.datastore.spi.DefaultDatastoreRpc.commit(DefaultDatastoreRpc.java:133)
at com.google.cloud.datastore.DatastoreImpl$4.call(DatastoreImpl.java:390)
at com.google.cloud.datastore.DatastoreImpl$4.call(DatastoreImpl.java:387)
at com.google.cloud.RetryHelper.doRetry(RetryHelper.java:179)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:244)
at com.google.cloud.datastore.DatastoreImpl.commit(DatastoreImpl.java:386)
at com.google.cloud.datastore.DatastoreImpl.commitMutation(DatastoreImpl.java:380)
at com.google.cloud.datastore.DatastoreImpl.put(DatastoreImpl.java:340)
at com.freightos.backup.datastore.gcloudapi.GCloudApiDSBackup$CopyEntities.putEntities(GCloudApiDSBackup.java:836)
... 9 more
Caused by: com.google.datastore.v1.client.DatastoreException: I/O error, code=UNAVAILABLE
at com.google.datastore.v1.client.RemoteRpc.makeException(RemoteRpc.java:126)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:95)
at com.google.datastore.v1.client.Datastore.commit(Datastore.java:84)
at com.google.cloud.datastore.spi.DefaultDatastoreRpc.commit(DefaultDatastoreRpc.java:131)
... 17 more
Caused by: java.net.SocketTimeoutException: Timeout while fetching URL: https://datastore.googleapis.com/v1/projects/freightos-prod-backup2:commit
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.convertApplicationException(URLFetchServiceImpl.java:173)
at com.google.appengine.api.urlfetch.URLFetchServiceImpl.fetch(URLFetchServiceImpl.java:45)
at com.google.api.client.extensions.appengine.http.UrlFetchRequest.execute(UrlFetchRequest.java:74)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.datastore.v1.client.RemoteRpc.call(RemoteRpc.java:87)
... 19 more
It looks like UrlFetchTransport is being used when it shouldn't be.
I've filed an issue to get this fixed in the google-cloud-java library.
In the meantime, you should be able to force it to use NetHttpTransport when you construct the DatastoreOptions object:
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.auth.http.HttpTransportFactory;
import com.google.cloud.datastore.DatastoreOptions;

DatastoreOptions.newBuilder()
    .setHttpTransportFactory(new HttpTransportFactory() {
        @Override
        public HttpTransport create() {
            return new NetHttpTransport();
        }
    })
    ...
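For completeness, here is a sketch of how that builder is typically finished off and used; the project ID, kind, and entity are placeholders, and I'm assuming the rest of your existing builder chain stays as it is:

import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.auth.http.HttpTransportFactory;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;

public class DatastorePutExample {
    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.newBuilder()
                .setProjectId("my-backup-project")            // placeholder project ID
                .setHttpTransportFactory(new HttpTransportFactory() {
                    @Override
                    public HttpTransport create() {
                        return new NetHttpTransport();        // bypass UrlFetchTransport
                    }
                })
                .build()
                .getService();

        // minimal put, just to show the configured client in use
        Key key = datastore.newKeyFactory().setKind("RFQ").newKey("example-id");
        datastore.put(Entity.newBuilder(key).set("note", "hello").build());
    }
}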