I am writing a Play 2 Java web application to ingest data into HDInsight Interactive Query using the Hive Streaming API (https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest). The Hive data is stored on Azure Data Lake Store.
I loosely based my code on https://github.com/mradamlacey/hive-streaming-azure-hdinsight/blob/master/src/main/java/com/cbre/eim/HiveStreamingExample.java.
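In essence the write path follows the wiki example; a rough sketch (the metastore URI, database, table, and partition value below are placeholders):

import java.util.Arrays;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class HiveStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Endpoint for one partition of the target table (placeholder values).
        HiveEndPoint endPoint = new HiveEndPoint(
                "thrift://<metastore-host>:9083", "default", "data",
                Arrays.asList("2018-05-07"));
        StreamingConnection connection = endPoint.newConnection(true);
        StrictJsonWriter writer = new StrictJsonWriter(endPoint);

        // Fetch a transaction batch and write JSON records into it.
        TransactionBatch batch = connection.fetchTransactionBatch(10, writer);
        batch.beginNextTransaction();
        batch.write("{\"id\": 1}".getBytes());
        batch.commit();
        batch.close();
        connection.close();
    }
}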
When I run the code on one of my head nodes I receive the following error:
play.api.UnexpectedException: Unexpected exception[StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:251)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:343)
at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:341)
at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:414)
at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:166)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.newBatch(StrictJsonWriter.java:41)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:559)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:512)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:397)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:377)
at hive.HiveRepository.createMany(HiveRepository.java:76)
at controllers.HiveController.create(HiveController.java:40)
at router.Routes$$anonfun$routes$1.$anonfun$applyOrElse$2(Routes.scala:70)
at play.core.routing.HandlerInvokerFactory$$anon$4.resultCall(HandlerInvoker.scala:137)
Caused by: java.io.IOException: No FileSystem for scheme: adl
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:233)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:292)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.createRecordUpdater(AbstractRecordWriter.java:226)
I raised the question on the Microsoft forum and on the Hive JIRA as well.
I can confirm that the JARs described here are present on the classpath:
com.microsoft.azure.azure-data-lake-store-sdk-2.2.5.jar
org.apache.hadoop.hadoop-azure-datalake-3.1.0.jar
No FileSystem for scheme
You get this error when the filesystem is not configured for the adl:// scheme; this probably needs to be done in both the HiveServer's and your local client's core-site.xml files.
Just because the JARs exist on disk doesn't mean they are loaded onto the classpath and configured to read from your Azure account.
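As a sketch of what "configured" means here: the adl scheme has to be bound to the connector classes, either in core-site.xml or programmatically. The property names below are those of the hadoop-azure-datalake module; the account name is a placeholder, and the fs.adl.oauth2.* credential properties are needed on top of this.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class AdlBindingCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Bind the adl:// scheme to the Data Lake connector classes;
        // the same keys can live in core-site.xml on both ends.
        conf.set("fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem");
        conf.set("fs.AbstractFileSystem.adl.impl", "org.apache.hadoop.fs.adl.Adl");
        // This call fails with "No FileSystem for scheme: adl" when the
        // scheme is not bound or the JAR is missing from the classpath.
        FileSystem fs = FileSystem.get(
                new URI("adl://<account>.azuredatalakestore.net/"), conf);
        System.out.println("Bound: " + fs.getUri());
    }
}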
I am trying to create a Flink JDBC sink to an Oracle database. When run locally (from a JUnit test with a MiniCluster) it works, but when deployed in Kubernetes it throws an exception saying it cannot find a suitable driver. The classpath is:
Classpath: /flink/lib/flink-cep-scala_2.12-1.13.5-stream1.jar:/flink/lib/flink-connector-jdbc_2.12-1.13.5.jar:/flink/lib/flink-csv-1.13.5-stream1.jar:/flink/lib/flink-json-1.13.5-stream1.jar:/flink/lib/flink-queryable-state-runtime_2.12-1.13.5-stream1.jar:/flink/lib/flink-shaded-netty-tcnative-dynamic-2.0.30.Final-13.0-stream1.jar:/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/flink/lib/flink-table-blink_2.12-1.13.5-stream1.jar:/flink/lib/flink-table_2.12-1.13.5-stream1.jar:/flink/lib/log4j-1.2-api-2.16.0.jar:/flink/lib/log4j-api-2.16.0.jar:/flink/lib/log4j-core-2.16.0.jar:/flink/lib/log4j-slf4j-impl-2.16.0.jar:/flink/lib/ojdbc8-21.5.0.0.jar:/flink/lib/vvp-flink-ha-kubernetes-flink113-1.4-20211013.091138-2.jar:/flink/lib/flink-dist_2.12-1.13.5-stream1.jar:::
I tried multiple things:
Including the driver in the flink/lib directory while the flink-connector-jdbc connector was packaged within the job jar, with .withDriverName("oracle.jdbc.OracleDriver") / .withDriverName("oracle.jdbc.driver.OracleDriver")
Including both the driver and the connector in the flink/lib directory, again with .withDriverName("oracle.jdbc.OracleDriver") / .withDriverName("oracle.jdbc.driver.OracleDriver")
I also tried changing the classloading configuration to classloader.parent-first-patterns.additional: oracle.jdbc.
But nothing seems to be working for me. The exception is:
failure cause: java.io.IOException: unable to open JDBC writer
at org.apache.flink.connector.jdbc.internal.AbstractJdbcOutputFormat.open(AbstractJdbcOutputFormat.java:56)
at org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat.open(JdbcBatchingOutputFormat.java:115)
at org.apache.flink.connector.jdbc.internal.GenericJdbcSinkFunction.open(GenericJdbcSinkFunction.java:49)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:46)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: No suitable driver found for "jdbc:oracle:thin:#//SOMECONNECTION"
at org.apache.flink.connector.jdbc.internal.connection.SimpleJdbcConnectionProvider.getOrEstablishConnection(SimpleJdbcConnectionProvider.java:126)
at org.apache.flink.connector.jdbc.internal.AbstractJdbcOutputFormat.open(AbstractJdbcOutputFormat.java:54)
... 14 more
What am I missing?
There is no support for Oracle via JDBC in Flink 1.13; that was only added in Flink 1.15.
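On Flink 1.15+, a minimal DataStream sketch of such a sink (table name, columns, and connection details are hypothetical; also note that Oracle thin URLs use @, not # as in the quoted exception):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OracleSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(Tuple2.of(1L, "a"), Tuple2.of(2L, "b"))
           .addSink(JdbcSink.sink(
               // Hypothetical target table and columns.
               "INSERT INTO events (id, payload) VALUES (?, ?)",
               (ps, t) -> {
                   ps.setLong(1, t.f0);
                   ps.setString(2, t.f1);
               },
               new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                   .withUrl("jdbc:oracle:thin:@//host:1521/service") // '@', not '#'
                   .withDriverName("oracle.jdbc.OracleDriver")
                   .withUsername("user")
                   .withPassword("password")
                   .build()));
        env.execute("oracle-sink-sketch");
    }
}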
I am trying to connect using the DSR (Delta Standalone Reader) method but am getting the following error while reading snapshot parquet files on an Azure ADLS Gen2 path.
I have added some Maven dependencies, e.g. hadoop-client, hadoop-azure, parquet-hadoop, scala-library, spark-core_2.12.
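The read is essentially the following sketch (the abfss path is a placeholder):

import io.delta.standalone.DeltaLog;
import io.delta.standalone.Snapshot;
import org.apache.hadoop.conf.Configuration;

public class DeltaLakeApplication {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Open the Delta log and resolve the latest snapshot (placeholder path).
        DeltaLog log = DeltaLog.forTable(conf,
                "abfss://<container>@<account>.dfs.core.windows.net/path/to/table");
        Snapshot snapshot = log.snapshot();
        System.out.println("Snapshot version: " + snapshot.getVersion());
    }
}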
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/runtime/java8/JFunction0$mcJ$sp
at io.delta.standalone.internal.SnapshotManagement.getLogSegmentForVersion(SnapshotManagement.scala:102)
at io.delta.standalone.internal.SnapshotManagement.getLogSegmentForVersion$(SnapshotManagement.scala:96)
at io.delta.standalone.internal.DeltaLogImpl.getLogSegmentForVersion(DeltaLogImpl.scala:32)
at io.delta.standalone.internal.SnapshotManagement.getSnapshotAtInit(SnapshotManagement.scala:201)
at io.delta.standalone.internal.SnapshotManagement.$init$(SnapshotManagement.scala:35)
at io.delta.standalone.internal.DeltaLogImpl.<init>(DeltaLogImpl.scala:36)
at io.delta.standalone.internal.DeltaLogImpl$.apply(DeltaLogImpl.scala:83)
at io.delta.standalone.internal.DeltaLogImpl$.forTable(DeltaLogImpl.scala:72)
at io.delta.standalone.internal.DeltaLogImpl.forTable(DeltaLogImpl.scala)
at io.delta.standalone.DeltaLog.forTable(DeltaLog.java:86)
at com.example.demo.DeltaLakeApplication.main(DeltaLakeApplication.java:38)
I receive this error message in Jenkins when trying to deploy my application to Kubernetes. Is there something I am missing?
istio-gateway.yml ERROR: ERROR: java.io.IOException: ERROR: YAML file
istio-gateway.yml is invalid, please check it. Details:
java.io.IOException: Unknown apiVersionKind:
networking.istio.io/v1alpha3/Gateway known kinds are...
I am using Spring Boot with Java.
You are trying to create an Istio Gateway using that YAML, but you probably don't have Istio installed in your Kubernetes cluster; that is why the networking.istio.io/v1alpha3 Gateway kind is unknown to the API server. If it's not installed, you need to install it first.
I am trying to access ADLS Gen2 in Spark with Java, using the following configuration properties:
fs.azure.account.auth.type
fs.azure.account.oauth.provider.type
fs.azure.account.oauth2.client.endpoint
fs.azure.account.oauth2.client.id
fs.azure.account.oauth2.client.secret
I have created the blob container and uploaded a file, e.g. https://devbdstreamsv2.dfs.core.windows.net/gen2container/adlsgen2/flat.json, using Azure Storage Explorer version 1.9. I am trying to access the abfs file path in the form mentioned in the documentation: abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/
But my doubt is that we are not initialising the abfs file path anywhere in the runner code, so I am getting the exception "No FileSystem for scheme: abfs". How can I resolve this issue? I want to know how to initialise the abfs filesystem for ADLS Gen2 using Spark with Java.
You need a distribution of Spark which has the abfs connector in its hadoop-azure JAR. The hadoop-2.7.x JARs in the normal ASF releases do not, as abfs came out later (Hadoop 3.2+).
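Once you have a build that bundles the connector, the properties from the question are set per storage account; a sketch in Spark with Java (the tenant, client id, and secret are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AbfsOAuthSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("abfs-sketch").getOrCreate();
        Configuration hc = spark.sparkContext().hadoopConfiguration();
        String account = "devbdstreamsv2.dfs.core.windows.net";
        hc.set("fs.azure.account.auth.type." + account, "OAuth");
        hc.set("fs.azure.account.oauth.provider.type." + account,
                "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        hc.set("fs.azure.account.oauth2.client.endpoint." + account,
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token");
        hc.set("fs.azure.account.oauth2.client.id." + account, "<client-id>");
        hc.set("fs.azure.account.oauth2.client.secret." + account, "<client-secret>");
        // No separate initialisation of the path is needed: reading an
        // abfss:// URI makes Hadoop look up the filesystem for that scheme.
        Dataset<Row> df = spark.read().json(
                "abfss://gen2container@devbdstreamsv2.dfs.core.windows.net/adlsgen2/flat.json");
        df.show();
    }
}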
I am struggling with Azure wasb on Spark.
I am reading a .json.gz file from disk and loading it into HDFS. I have used the following code extensively on other systems.
val file_a_raw = sqlContext.read.json("/home/users/repo_test/file_a.json.gz")
However, on Azure, this returns:
java.io.FileNotFoundException: Filewasb://server-2017-03-07t08-13-41-314z#server.blob.core.windows.net/home/users/repo_test/file_a.json.gz does not exist.
I have checked this location and the file is there and correct.
I think there should be a : between .net and the file path, but I get a Java error when I try to manually add that in.
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme name at index 0:
I've also tried:
Filewasb:///home/users/repo_test/file_a.json.gz
But that returns:
java.io.IOException: No FileSystem for scheme: Filewasb
This code works fine on non-Azure Spark.
For Azure, you'll need to configure Spark with the proper credentials. Databricks has documentation on this: https://docs.databricks.com/user-guide/faq/azure-blob-storage.html