Java Reading Video/Audio Codecs From File - java

I have a folder of mixed video files (multiple formats for web display mp4, ogv, webm). When retrieving those files in Java I can verify the format by pulling the extension from the filename. Is there a way to retrieve other information such as video and audio codec data from the file?
To create the files I am using ffmpeg from within Java to transcode video files to the formats and sizes I need, so when the file is being created I do know the codec information. If it's not automatically stored somewhere in the file, is there a way to set metadata or something similar and store the info manually so I can retrieve it later? I am not using a database to store file locations or other data, just scanning and retrieving from the file system.

I think ffmpeg can automatically obtain video and audio formats from the files you specify as a source for ffmpeg utility.
If you need the metadata for other purposes then you can try Red5 server sources (java). There are a lot of readers (including MP4Reader) that can be used to obtain metadata.
For example for mp4 files MP4Reader scans the entire file in class constructor and then you can obtain metadata from the first tag:
reader = new org.red5.io.mp4.impl.MP4Reader(myFile);
String v = reader.getVideoCodecId(); // e.g. "avc1" (for H.264), "VP6F"
String a = reader.getAudioCodecId(); // e.g. "mp4a" (for aac), ".mp3" (for mp3)
...

To the best of my knowledge there is no easy way to determine file type by content. You have to make assumptions and then test those assumptions with code (e.g. I think it's type X so I'll inspect the first Y bytes for the pattern that always is present in files of type X.)
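The "inspect the first Y bytes" idea can be sketched for the three containers mentioned in the question. This is a minimal sketch, not a complete detector: the class name ContainerSniffer is made up for illustration, it assumes the "ftyp" box is the first box of an MP4 file (usual, but not guaranteed), and it identifies only the container, not the codecs inside it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: guesses a container format from a file's leading bytes.
final class ContainerSniffer {

    /** Returns "mp4", "ogg", "webm/matroska", or "unknown" if no signature matches. */
    static String sniff(byte[] head) {
        // ISO base media files (MP4 and relatives) usually carry the box type "ftyp" at offset 4.
        if (head.length >= 8
                && "ftyp".equals(new String(head, 4, 4, StandardCharsets.US_ASCII))) {
            return "mp4";
        }
        // Ogg pages start with the capture pattern "OggS".
        if (head.length >= 4
                && "OggS".equals(new String(head, 0, 4, StandardCharsets.US_ASCII))) {
            return "ogg";
        }
        // Matroska and WebM start with the EBML magic number 1A 45 DF A3.
        if (head.length >= 4
                && (head[0] & 0xFF) == 0x1A && (head[1] & 0xFF) == 0x45
                && (head[2] & 0xFF) == 0xDF && (head[3] & 0xFF) == 0xA3) {
            return "webm/matroska";
        }
        return "unknown";
    }

    /** Reads the first 12 bytes of the file and applies the checks above (Java 11+). */
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(12));
        }
    }
}
```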

Video and audio data streams (e.g. H.264/AVC video, AAC, Dolby Digital audio, etc.) are multiplexed (or 'muxed') together inside file formats known as container formats. MP4 is one such container format and is designed to hold many different types of video and audio streams (see http://www.mp4ra.org/codecs.html for some of the officially registered types).
The different container formats have metadata that identifies the different media streams that are contained so as to help determine what type of decoder should be used to decode a particular media stream.
If you can (you said you're using Java), try using ffprobe to determine the container, video and audio formats used in a media file (plus other metadata). It may not be 100% reliable for all media types (as it may not recognise some) but given that you are encoding with ffmpeg (ffprobe is an ffmpeg-derived tool) it should do the job.
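Calling ffprobe from Java might look like the sketch below. It assumes ffprobe is on the PATH; the class name FfprobeCodecs and the "type:name" result format are choices made for this example. The key=value output is parsed in a separate method so that part can be checked without running ffprobe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: asks ffprobe for the codec of each stream in a media file.
final class FfprobeCodecs {

    /** Turns lines such as "codec_name=h264" and "codec_type=video" into "video:h264". */
    static List<String> parse(List<String> lines) {
        List<String> streams = new ArrayList<>();
        String name = null;
        String type = null;
        for (String line : lines) {
            if (line.startsWith("codec_name=")) {
                name = line.substring("codec_name=".length()).trim();
            } else if (line.startsWith("codec_type=")) {
                type = line.substring("codec_type=".length()).trim();
            }
            // A stream is complete once both of its fields have been seen.
            if (name != null && type != null) {
                streams.add(type + ":" + name);
                name = null;
                type = null;
            }
        }
        return streams;
    }

    /** Runs ffprobe on the given file and returns one entry per stream. */
    static List<String> probe(String mediaPath) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "ffprobe", "-v", "error",
                "-show_entries", "stream=codec_type,codec_name",
                "-of", "default=noprint_wrappers=1",
                mediaPath)
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        if (p.waitFor() != 0) {
            throw new IOException("ffprobe failed: " + lines);
        }
        return parse(lines);
    }
}
```

For a typical web MP4, FfprobeCodecs.probe("video.mp4") would be expected to return one entry for the video stream and one for the audio stream.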

Related

Android: Remove all metadata from video

Is there any native/existing/manual way to remove all the metadata of a video file in Android/Java/Kotlin from a (content) URI?
For photos, we can still use ExifInterface and remove the metadata we want by using setAttribute and passing null to delete the desired values. However, this doesn't work for videos, because an mp4 video has its own metadata format. If there is no native Android class that can do this, is there any algorithm or reference that can be referred to in order to implement it?
You can use mp4parser library. Typical tasks for the MP4 Parser are:
Muxing audio/video into an MP4 file
Append recordings that use the same encode settings
Adding/Changing metadata
Shorten recordings by omitting frames
The MetaDataInsert example shows how to write metadata.

Use metadata to uniquely identify files

Hello, I need to know how to identify an audio file in a device's storage. The question is as follows:
I am developing a music player and storing some playback data in a database, attached to each audio file individually. From time to time the application checks for changes in the user's audio library (on the SD card or in internal memory) and inserts any new songs into the database. The problem is that I cannot tell whether a song already exists in the database, because I have no common identifier. I tried using the file's path in storage, but in some cases the file name contains characters that prevent me from using it in SQLite. So the question is:
How to identify an audio file?
EDIT1:
I think my question was not very clear, what I wanted was a way to individually identify each audio file using for example some metadata of the file that was unique to it and could not be repeated such as the creation date of the file in milliseconds, or any other metadata that is unique to each file, like a fingerprint.
I'm testing a solution, though I'm not sure it is ideal. I take the path of the file and use the Base64 class to encode it:
String path = "/storage/emulated/0/Download/Disturbed-Ten Thousand Fists.mp3";
String encoded = Base64.encodeToString(path.getBytes(), Base64.DEFAULT);
result is: L3N0b3JhZ2UvZW11bGF0ZWQvMC9Eb3dubG9hZC9EaXN0dXJiZWQtVGVuIFRob3VzYW5kIEZpc3Rz
Lm1wMw
The size varies depending on the path, but the String contains only letters and numbers, which are accepted by the database, and the result is always the same for a given path. What do you think about it?
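The same idea can be sketched with the JDK's java.util.Base64 instead of android.util.Base64 (an assumption of this example; on Android that class needs API level 26 or later). The URL-safe encoder without padding produces a single line with no line breaks. The class name PathKey is made up for illustration. Note that a key built this way identifies the path, not the audio content, so moving or renaming the file produces a different key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: turns a file path into a database-friendly key and back.
final class PathKey {

    /** Encodes the path as URL-safe Base64 without padding or line breaks. */
    static String encode(String path) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(path.getBytes(StandardCharsets.UTF_8));
    }

    /** Recovers the original path from a key produced by encode(). */
    static String decode(String key) {
        return new String(Base64.getUrlDecoder().decode(key), StandardCharsets.UTF_8);
    }
}
```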
An audio file can be identified by its extension. An incomplete list of common formats can be found in the Wikipedia article Audio_file_format.
Your best option would probably be to check the file extension against a list of known extensions related to audio.
This does not, however, cover cases such as an MP4 file with audio and no video.
For the purpose of this, I will assume you already have the File object you wish to check, either hard-coded or taken from a loop or list.
File audioFile; // placeholder for readability; replace it with your own variable that holds the audio File
String name = audioFile.getName();
// This is where you can do your logic. The name includes the extension, so you can make sure your music player can handle the file type and also check the characters in the name.
// Here is an example of detecting the ' character
if (name.contains("'")) {
// do something
}
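The extension check described above could be sketched as follows. The class name AudioExtensions and the extension list are illustrative assumptions; the list is deliberately short and should be adjusted to what the player can actually decode.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides from the file name alone whether a file looks like audio.
final class AudioExtensions {

    // Illustrative list only, not a complete set of audio formats.
    private static final Set<String> KNOWN =
            Set.of("mp3", "wav", "ogg", "flac", "m4a", "aac", "wma");

    /** Returns true if the name ends in one of the known audio extensions (case-insensitive). */
    static boolean looksLikeAudio(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or the name ends with a dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN.contains(ext);
    }
}
```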
Please let me know if you have further questions!
You must select all file formats in the file type frame, then choose to insert a file with an audio format such as mp3, real, or wmp.

How to programmatically change filenames on S3 to remove filename extensions?

Let's say you have a bunch of images which you want to host on S3 and they are available in various formats: png, jpg, jpeg, gif ... etc.
Writing or using an image-processing service to normalize all image formats down to a single one is one approach ... but I'm wondering if it's possible to use a shortcut where you remove the extension from a filename (after upload), because the file properties now hold the appropriate mime-type anyway?
So after I upload 1.png, 2.jpg, 3.jpeg and 4.gif ... why not programmatically change all filenames to remove the extensions and access the images as:
/my-bucket/1
/my-bucket/2
/my-bucket/3
/my-bucket/4
So, how can someone programmatically change filenames on S3 to remove filename extensions?
I would love to hack on this using substitutions to remove the .<ext> extensions from filenames, but I think that programmatically it's only available when setting up a job for transferring data from devices that you actually ship to Amazon.
It's not pretty, but it can be done by calling copyObject() for /myBucket/myFile.jpg and setting the new key to be /myBucket/myFile. After the copy is complete, delete the original. At this time I'm not aware of a proper "rename" method available.
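The copy-then-delete sequence can be sketched as below. To keep the example self-contained, the S3 client is represented by a small hypothetical interface, ObjectStore, whose two methods stand in for the client's copy and delete calls; the class name S3Rename and the helper stripExtension are also made up for illustration. In real code an adapter would forward these calls to the SDK client.

```java
// Hypothetical sketch of "rename by copy and delete" for removing key extensions.
final class S3Rename {

    /** Stand-in for the two object-store operations this sketch needs. */
    interface ObjectStore {
        void copyObject(String srcBucket, String srcKey, String dstBucket, String dstKey);
        void deleteObject(String bucket, String key);
    }

    /** Drops the extension from the last path segment, e.g. "img/1.png" becomes "img/1". */
    static String stripExtension(String key) {
        int slash = key.lastIndexOf('/');
        int dot = key.lastIndexOf('.');
        // Ignore dots in earlier segments and a dot that starts the file name.
        return (dot > slash + 1) ? key.substring(0, dot) : key;
    }

    /** Copies the object to the extension-free key, then deletes the original. */
    static String removeExtension(ObjectStore store, String bucket, String key) {
        String newKey = stripExtension(key);
        if (!newKey.equals(key)) {
            store.copyObject(bucket, key, bucket, newKey);
            store.deleteObject(bucket, key); // only reached after the copy has returned
        }
        return newKey;
    }
}
```

Two keys that differ only in extension (for example 1.png and 1.jpg) would map to the same new key, so a real job would need to check for collisions before copying.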

Disguising Image Files In Java

I have a situation where I would like to do some very light image file obfuscation. My application ships with a bunch of .png files and I'd like it if they weren't so readily editable.
I'm not looking for a 'secure' solution (I don't believe one really exists), I'd just like Joe Public to be unable to edit the files.
I am currently using;
ImageIO.read(new File("/images/imagefile.png"));
I'd rather not have to use Serialisation, as the ImageIO system is pretty deeply ingrained in the code, each image needs also to remain as its own file on disk.
I was hoping I could just change the file extension eg;
ImageIO.read(new File("/images/imagefile.dat"));
But ImageIO seems to use it to identify the file. Can I tell ImageIO that it is a PNG despite its extension?
Encrypt all the files on disk.
Then in the program, decrypt a file, load it in memory and go rocking.
Java image I/O uses the Service Provider Interface to support new image formats. I believe it might be possible to add a new decoder using a file extension. If that is the case, that is the route to providing an easily pluggable reader for a custom image format.
Note that you will probably need to change the file extension in the source. That might be a job for an advanced IDE, or a one-time search and replace using grep.
As to the format, one extremely simple way to make media files unreadable in common readers is to write the bytes of the image in reverse order. Then flip them back after reading, put them in a ByteArrayInputStream, and pass them to ImageIO.read(InputStream).
After you have written the service provider and Jar'd it properly (using a manifest with attributes to identify the file/content type it handles, and the corresponding encoder/decoder), add it to the run-time class-path of the app., and it should be able to read the custom image format.
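The byte-reversal idea from this answer can be sketched without a custom service provider. The class name ReversedImages is made up for this example, and the sketch reads the whole file into memory, which assumes the images are reasonably small.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

// Hypothetical helper: stores images with their bytes reversed and restores them on load.
final class ReversedImages {

    /** Returns a copy of the array in reverse order; applying it twice restores the input. */
    static byte[] reverse(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[data.length - 1 - i];
        }
        return out;
    }

    /** Reads a reversed image file, flips the bytes back, and decodes it with ImageIO. */
    static BufferedImage read(Path obfuscated) throws IOException {
        byte[] original = reverse(Files.readAllBytes(obfuscated));
        return ImageIO.read(new ByteArrayInputStream(original));
    }
}
```

The shipped files would be produced once, at build time, by writing reverse(Files.readAllBytes(png)) to disk under whatever extension is preferred.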
...or keep all images in a single file and seek() to the start position of each image as you load. You can do this by pre-seeking against a FileInputStream, or conversely by creating a ByteArrayInputStream for ImageIO.read(InputStream).
You could try this:
Iterator<ImageReader> rs = ImageIO.getImageReadersByFormatName("png");
ImageReader ir = rs.next();
File srcFile = new File("/images/imagefile.dat");
ImageInputStream iis = ImageIO.createImageInputStream(srcFile);
ir.setInput(iis);
BufferedImage image = ir.read(0); // decode the first image in the stream

Custom Binary Input - Hadoop

I am developing a demo application in Hadoop and my input is .mrc image files. I want to load them to hadoop and do some image processing over them.
These are binary files that contain a large header with metadata, followed by the data of a set of images. The information on how to read the images is also contained in the header (e.g. number_of_images, number_of_pixels_x, number_of_pixels_y, bytes_per_pixel), so after the header bytes, the first [number_of_pixels_x*number_of_pixels_y*bytes_per_pixel] bytes are the first image, then the second, and so on.
What is a good Input format for these kinds of files? I thought two possible solutions:
Convert them to sequence files by placing the metadata in the sequence file header and have pairs for each image. In this case can I access the metadata from all mappers?
Write a custom InputFormat and RecordReader and create splits for each image while placing the metadata in distributed cache.
I am new to Hadoop, so I may be missing something. Which approach do you think is better? Is there any other way that I am missing?
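For the layout described above, the byte range of each image follows directly from the header fields, which is the arithmetic a per-image split would need. A minimal sketch, assuming the header size is known and every image has the same dimensions; the class name MrcLayout and its method names are made up for illustration:

```java
// Hypothetical helper: computes where each image starts in a header-plus-images file.
final class MrcLayout {

    /** Size of one image in bytes: pixels in x times pixels in y times bytes per pixel. */
    static long imageBytes(int pixelsX, int pixelsY, int bytesPerPixel) {
        return (long) pixelsX * pixelsY * bytesPerPixel;
    }

    /** Offset of image number index (0-based), counted from the start of the file. */
    static long imageOffset(long headerBytes, int pixelsX, int pixelsY,
                            int bytesPerPixel, int index) {
        return headerBytes + index * imageBytes(pixelsX, pixelsY, bytesPerPixel);
    }
}
```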
Without knowing your file formats, the first option seems to be the better option. Using sequence files you can leverage a lot of SequenceFile related tools to get better performance. However, there are two things that do concern me with this approach.
How will you get your .mrc files into a .seq format?
You mentioned that the header is large, this may reduce the performance of SequenceFiles
But even with those concerns, I think that representing your data in SequenceFiles is the best option.
