I am trying to get metadata of a file lying in Azure blob storage.
I am using ffprobe for this purpose. It works, but because the ffprobe binary is on my local system while the file sits in Blob Storage, the whole process is too slow.
What would be the best way to get metadata for a remote file like this?
Two ways for your reference:
1. Use blob.downloadAttributes(), then use blob.getMetadata() (see the sketch after this list).
This method populates the blob's system properties and user-defined metadata. Before reading or modifying a blob's properties or metadata, call this method or its overload to retrieve the latest values for the blob's properties and metadata from the Microsoft Azure Storage service.
2. Use the Get Metadata activity in Azure Data Factory (ADF).
Get a file's metadata by pointing the activity at the blob and selecting the fields you need (such as size and last modified) in its field list.
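For option 1, a minimal sketch using the legacy azure-storage Java SDK might look like the following; the connection string, container, and blob names are placeholders:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import java.util.Map;

public class BlobMetadataExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string, container, and blob names.
        CloudStorageAccount account = CloudStorageAccount.parse(
                "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey");
        CloudBlobClient client = account.createCloudBlobClient();
        CloudBlobContainer container = client.getContainerReference("yourcontainer");
        CloudBlockBlob blob = container.getBlockBlobReference("folder1/video.mp4");

        // Pull the latest system properties and user-defined metadata from the service.
        blob.downloadAttributes();

        // User-defined metadata (key/value pairs set on the blob).
        Map<String, String> metadata = blob.getMetadata();
        metadata.forEach((key, value) -> System.out.println(key + " = " + value));

        // System properties such as size and content type.
        System.out.println("Length: " + blob.getProperties().getLength());
        System.out.println("Content type: " + blob.getProperties().getContentType());
    }
}

Note that this returns blob properties and user-defined metadata, not the media-level metadata ffprobe extracts; for that you would still need to read at least part of the file itself.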
I am trying to upload a file directly to Google Cloud Storage using the Java client library.
The code I have written is:
Instead of uploading the new file to Cloud Storage, I am getting this output:
What am I missing in the code to make the upload to Cloud Storage work?
You need to configure the authorization keys: a .json key file that you set up in your environment. See the documentation: https://cloud.google.com/iam/docs/creating-managing-service-account-keys#iam-service-account-keys-create-gcloud
I don't think you have the correct "BUCKET_NAME" set; please compare the bucket name you are using with the bucket name in your Google Cloud Console to check that it's set correctly.
The way it's set, it looks like the compiler resolved a different overload of the BlobInfo.newBuilder method than the one you intended.
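For reference, a minimal upload sketch with the google-cloud-storage Java client, assuming Application Default Credentials are configured and using placeholder bucket, object, and local file names:

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Files;
import java.nio.file.Paths;

public class UploadObjectSketch {
    public static void main(String[] args) throws Exception {
        // Uses Application Default Credentials, e.g. the service-account .json key
        // referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Placeholder names: the bucket must already exist in your project.
        BlobId blobId = BlobId.of("your-bucket-name", "uploads/example.txt");
        BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();

        // Read the local file and create the object in Cloud Storage.
        storage.create(blobInfo, Files.readAllBytes(Paths.get("/tmp/example.txt")));
        System.out.println("Uploaded to gs://your-bucket-name/uploads/example.txt");
    }
}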
I am trying to run a query in Google BigQuery and export the data to Google Cloud Storage using GZIP compression.
JobConfigurationExtract jobExtractConfig = new JobConfigurationExtract()
    .setSourceTable(tableReference)
    .setDestinationFormat("CSV")
    .setDestinationUri("gs://dev-app-uploads/results.zip")
    .setCompression("GZIP");
By using this config I am able to successfully generate a results.zip file in Cloud Storage in the configured bucket dev-app-uploads. But the file inside the zip is generated without a .csv extension. When I extract the zip file, I get a "results" file, and when I manually add the .csv extension and open the file, the contents are there.
But what I need is to generate the file with a .csv extension, zip it, and place it in Cloud Storage.
Please let me know if this is possible, or if there are better options for exporting data from BigQuery with compression.
Instead of
gs://dev-app-uploads/results.zip
use below
gs://dev-app-uploads/results.csv.zip
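For example, the extract configuration from the question could be adjusted along these lines (imports come from com.google.api.services.bigquery.model as in the original snippet; the project, dataset, and table ids are placeholders):

TableReference tableReference = new TableReference()
    .setProjectId("your-project")      // placeholder ids
    .setDatasetId("your_dataset")
    .setTableId("your_table");

JobConfigurationExtract jobExtractConfig = new JobConfigurationExtract()
    .setSourceTable(tableReference)
    .setDestinationFormat("CSV")
    // Naming the object results.csv.zip means the file keeps its .csv extension once extracted.
    .setDestinationUri("gs://dev-app-uploads/results.csv.zip")
    .setCompression("GZIP");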
I am having an issue reading data from Azure blobs via Spark Streaming.
JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");
Code like the above works for HDFS, but it is unable to read a file from an Azure blob:
https://blobstorage.blob.core.windows.net/containerid/folder1/
The above is the path shown in the Azure UI, but it doesn't work. Am I missing something, and how can we access it?
I know Event Hubs are the ideal choice for streaming data, but my current situation demands using storage rather than queues.
In order to read data from blob storage, there are two things that need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means that you also need the hadoop-azure JAR on your classpath (note there may be runtime requirements for more JARs related to the Hadoop family):
JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");
Now, reference the file using the wasb:// prefix (note the [s] is for an optional secure connection):
ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
It goes without saying that you'll need proper permissions set up for the location making the query to access blob storage.
As a supplement, there is a very helpful tutorial about HDFS-compatible Azure Blob storage with Hadoop; please see https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-use-blob-storage.
Meanwhile, there is an official sample on GitHub for Spark Streaming on Azure. Unfortunately, the sample is written in Scala, but I think it's still helpful for you.
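Putting the pieces above together, a minimal Java sketch might look like this; the account name, key, container, folder, and batch interval are placeholders, and hadoop-azure (plus its Azure storage dependency) still needs to be on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class AzureBlobStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("AzureBlobStreamingSketch");
        JavaSparkContext ct = new JavaSparkContext(conf);

        // Point the Hadoop layer at the native Azure file system and supply the account key.
        // "youraccount", "yourkey", and "yourcontainer" are placeholders.
        Configuration config = ct.hadoopConfiguration();
        config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");

        // 10-second batches; tune for your workload.
        JavaStreamingContext ssc = new JavaStreamingContext(ct, Durations.seconds(10));
        JavaDStream<String> lines =
                ssc.textFileStream("wasbs://yourcontainer@youraccount.blob.core.windows.net/folder1/");
        lines.print();

        ssc.start();
        ssc.awaitTermination();
    }
}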
df = spark.read.format("csv").load("wasbs://blob_container@account_name.blob.core.windows.net/example.csv", inferSchema = True)
Is it possible to upload a file using some web service directly to HDFS? Currently I write the file to the local system and then move it to HDFS.
WebHDFS provides REST APIs to support all the filesystem operations.
Direct uploading is not possible though.
It has to follow 2 steps:
Create the file at the HDFS location: http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE (the NameNode answers with a redirect to a DataNode)
Write the file data by sending your local file's contents in the body of a second PUT to the redirect location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE (see the sketch below)
Refer to the APIs here: WebHDFS APIs
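As a rough illustration of those two steps with plain HttpURLConnection; the host, port, user, and paths are placeholders (the default WebHDFS port differs between Hadoop versions):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WebHdfsUploadSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode host/port, HDFS path, and user.
        String createUrl = "http://namenode-host:50070/webhdfs/v1/user/demo/results.csv"
                + "?op=CREATE&overwrite=true&user.name=demo";
        Path localFile = Paths.get("/tmp/results.csv");

        // Step 1: ask the NameNode where to write. It replies with a 307 redirect
        // whose Location header points at a DataNode; don't follow it automatically.
        HttpURLConnection nn = (HttpURLConnection) new URL(createUrl).openConnection();
        nn.setRequestMethod("PUT");
        nn.setInstanceFollowRedirects(false);
        String dataNodeLocation = nn.getHeaderField("Location");
        nn.disconnect();

        // Step 2: PUT the file bytes to the DataNode URL returned above.
        HttpURLConnection dn = (HttpURLConnection) new URL(dataNodeLocation).openConnection();
        dn.setRequestMethod("PUT");
        dn.setDoOutput(true);
        try (OutputStream out = dn.getOutputStream();
             InputStream in = Files.newInputStream(localFile)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        // A successful create returns 201 Created.
        System.out.println("DataNode responded with HTTP " + dn.getResponseCode());
        dn.disconnect();
    }
}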
I have a java webapp which needs to upload files via http and then store them on the server. The files need to be associated with specific records in an Oracle database, so there will be a table in the database storing the reference to the associated record, and a reference to the file, along with other descriptive data such as title etc. The options for this table appear to be:
store the file as a BLOB
store a BFILE reference to the file
store a String containing the path to the file
We would prefer to store the file outside of the database, so we will not store it as a BLOB. The DBAs have indicated that their preferred option is to store the reference as a BFILE. The oracle.sql.BFILE object provides access to an InputStream for reading the file, but no obvious way of writing to it.
What is the best way of writing the file data back to disk when the only reference to the storage directory is the Oracle directory alias?
We decided that simple java.io was the best way to write to the file, which means that the storage directory has to be available to the web application servers as a mount. Since the directory was available to the webapp anyway, we then decided that the BFILE was not required and just to store the filename in the database instead.
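For illustration, a rough sketch of that approach; the mount point, table, and column names are hypothetical:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class FileStoreSketch {
    // Hypothetical mount point shared by the app servers.
    private static final Path STORAGE_DIR = Paths.get("/mnt/app-files");

    public void store(Connection conn, long recordId, String fileName,
                      InputStream uploadedData) throws Exception {
        // Write the uploaded bytes to the shared mount via plain java.io/java.nio.
        Path target = STORAGE_DIR.resolve(fileName);
        Files.copy(uploadedData, target, StandardCopyOption.REPLACE_EXISTING);

        // Store only the file name (plus descriptive data) in the Oracle table;
        // the table and column names here are made up for the example.
        String sql = "INSERT INTO record_files (record_id, file_name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, recordId);
            ps.setString(2, fileName);
            ps.executeUpdate();
        }
    }
}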
According to the Oracle JDBC Developers Guide - 14.6 Working with BFILEs:
BFILEs are read-only. You cannot insert data or otherwise write to a BFILE.
You cannot use JDBC to create a new BFILE. They are created only externally.
So it seems that you need a separate tool/method to upload the data to the Oracle storage directory, which probably makes BFILE a bad choice for your scenario.
"We would prefer to store the file outside of the database"
Why? What are your backup/recovery scenarios for this? For example, in the event of a disk failure, how would you recover the file? Are the files purely transitory, so you don't actually need to preserve them?
Basically, the BFILE is a compromise between BLOB storage and simply storing a path in the database. You write to it in the same manner as you would a conventional file, then make it 'available' to the database for reading. If your web-app is running on a different physical server from the database, you'd need to have a storage location that is accessible to both boxes (and appropriate permissions for read/write).
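To make that sequence concrete, here is a rough sketch of the "make it available to the database" step via JDBC; the connection details, directory alias, table, and file name are hypothetical, and creating a directory alias requires DBA-granted privileges:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BfileSetupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details, directory path, table, and file name.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "app_password");
             Statement st = conn.createStatement()) {

            // The directory alias maps to a path on the database server;
            // creating it requires the CREATE ANY DIRECTORY privilege.
            st.execute("CREATE OR REPLACE DIRECTORY file_store AS '/mnt/app-files'");

            // Point a BFILE column at a file that was written to that path externally.
            st.executeUpdate("INSERT INTO record_bfiles (record_id, file_data) "
                    + "VALUES (1, BFILENAME('FILE_STORE', 'example.pdf'))");
        }
    }
}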