Multipart tiff and EXIF metadata


In the TIFF format, when you add EXIF metadata, it creates a new IFD (TIFF directory) and stores the EXIF metadata as fields. So parsing a TIFF file with a single image and EXIF data is easy. But there are also multipart TIFFs, where one TIFF file contains more than one image. The question is: can each of these images have EXIF data?
Does this create a new IFD for each picture's metadata?
What is the arrangement of the IFDs then?
The TIFF specification doesn't go into any detail. I know that when a single-image TIFF file has EXIF data, there is an offset field pointing to the EXIF data, so I can jump to that location and do the parsing myself. The Java Sanselan library gives me easy access to the EXIF IFD and fields, but if it is possible to have multiple EXIF IFDs (one for each image), then the library doesn't tell me which image the data belongs to.
If you cannot have more than one EXIF IFD in a multipart TIFF file, then it'll be trivial! In other words:
Do I need to go to the effort of manually parsing the EXIF data? I only need to do this if you can attach EXIF data to each image inside a multipart TIFF.
Or does anyone know of a good Linux app that lets me add EXIF data to TIFF files, so I can figure it out for myself?

To answer your questions:
can each of these images have EXIF data? Does this create a new IFD for each picture's metadata? What is the arrangement of the IFDs then?
Yes, each of these images can have its own EXIF data. Each image has its own IFD, and each image's EXIF data is a sub-IFD referenced from the corresponding image IFD.
but the Java Sanselan library gives me easy access to the EXIF IFD and fields, but if it is possible to have multiple EXIF IFDs (one for each image), then the library doesn't tell me which image the data belongs to.
I have never used Sanselan or its successor, Apache Imaging, so I can only guess at two possibilities: first, if you can actually insert EXIF into a multipage TIFF, Sanselan may by default choose the first page; or there might be a parameter you can set somewhere, with a method like setWorkingPage(int page), which is what I do with the "icafe" Java image library.
The following is a bit more detail on what happens inside a TIFF image when you add EXIF metadata:
For a single-page TIFF, there is a "main" IFD which specifies all the information about the image it contains. When EXIF data is needed, a special tag called "EXIF_SUB_IFD" (tag 34665, 0x8769) is added to the main IFD. The value of this tag is an offset relative to the start of the image stream. If we jump to the address specified by the offset, we find a "sub" IFD with exactly the same structure as the "main" IFD, which contains all the EXIF data.
The structure described above is exactly like a directory tree, hence the name IFD (Image File Directory). There is, however, a subtle difference: the main IFD points to the actual image data, but the EXIF sub-IFD doesn't. In fact, there is also a GPS sub-IFD, parallel to the EXIF sub-IFD and with the same structure. Interestingly, the EXIF data can be stored anywhere inside the TIFF image stream (as long as it doesn't break other parts of the directory and image data).
Now for multipage TIFFs. The pages may or may not be related. The last 4 bytes of each page IFD hold the offset of the next IFD (or 0 for the last page). Pages are sometimes gathered together to serve as a "single" document, for example from a scanner. That said, each page is itself a single-page TIFF which can contain its own EXIF metadata, just like a single-page TIFF.
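To check how a given multipage TIFF is actually laid out without going through Sanselan, a minimal sketch like the following walks the chain of page IFDs described above and reports, per page, whether that page carries an Exif pointer (tag 0x8769). It assumes a classic (non-BigTIFF) file already loaded into memory and only follows the top-level page chain (not SubIFDs); the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TiffExifScanner {
    // Exif IFD pointer tag (34665 / 0x8769), as defined by the Exif specification
    static final int EXIF_IFD_POINTER = 0x8769;

    /**
     * Walks the chain of page IFDs and returns, for each page, the offset of its
     * Exif sub-IFD, or -1 if that page has no Exif pointer.
     */
    static List<Long> exifOffsetsPerPage(byte[] tiff) {
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1: "II" = little-endian (Intel), "MM" = big-endian (Motorola)
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Not a TIFF byte-order mark");
        }
        if ((buf.getShort(2) & 0xFFFF) != 42) {
            throw new IllegalArgumentException("Bad TIFF magic number");
        }
        List<Long> result = new ArrayList<>();
        Set<Long> visited = new HashSet<>(); // guards against circular IFD chains
        long ifdOffset = buf.getInt(4) & 0xFFFFFFFFL; // offset of the first page IFD
        while (ifdOffset != 0 && visited.add(ifdOffset)) {
            int pos = (int) ifdOffset;
            int count = buf.getShort(pos) & 0xFFFF; // number of 12-byte entries
            long exif = -1;
            for (int i = 0; i < count; i++) {
                int entry = pos + 2 + i * 12;
                if ((buf.getShort(entry) & 0xFFFF) == EXIF_IFD_POINTER) {
                    // Single LONG/IFD value, so the 4-byte value field holds the offset itself
                    exif = buf.getInt(entry + 8) & 0xFFFFFFFFL;
                }
            }
            result.add(exif);
            // The 4 bytes after the last entry point to the next page IFD (0 = last page)
            ifdOffset = buf.getInt(pos + 2 + count * 12) & 0xFFFFFFFFL;
        }
        return result;
    }
}
```

Each list entry corresponds to one page in file order, so an Exif offset can always be tied to the page whose IFD references it.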

You probably want to check out ExifTool. It works pretty well for what I use it on (JPEGs), but I've never used it with TIFF files containing multiple images. Also check out ImageMagick; it has a ton of useful tools.

Related

Android : How to iterate through webp frames using android's ImageDecoder

I want to convert animated WebP into GIF. I have a GIF encoder+decoder and a WebP encoder, and it works fine with GIFs only. I want to process animated WebP as well, so I need to decode the animated WebP first and get bitmaps for each frame. I could not find any animated WebP decoder, and later found that android.graphics has ImageDecoder, which supports animated WebP images, but it only shows an example for a Drawable, which has a start() method for animated WebP.
How can I iterate through each frame to convert it into a bitmap or some data type like byte[], Base64, streams, etc., so that I can convert that into a bitmap?
File file = new File(...);
ImageDecoder.Source source = ImageDecoder.createSource(file);
Drawable drawable = ImageDecoder.decodeDrawable(source);

As an alternative way of achieving the same goal, I solved this by using Glide and the APNG4Android library, along with some encoders/decoders available on GitHub.
You can do encoding, decoding and other things with APNG4Android alone.
https://github.com/penfeizhou/APNG4Android

Here is how we can extract frames from an animated WebP file without using any third-party library.
According to Google's container specification for the WebP image format, we need to read the image in a specific way, and you can do that in almost any language you like.
In Java, you can create an InputStream for the animated WebP file and read the data 4 bytes at a time.
There is a library, android-webp-encoder, for encoding WebP images, written in pure Java.
You can use it for decoding images as well, but the library needs to be modified. I have modified it but not published it yet; I will upload it to GitHub once I fix the bugs.
But I can explain how to modify that library to decode frames, or how to write your own code to decode them.
First, create an InputStream for the image.
Read the data in 4-byte chunks until the end of the file.
Reading:
Read 4 bytes and check that they are the characters 'RIFF'.
Then read the next 4 bytes. This is the file size (little-endian, counted from offset 8).
After the file size, the next 4 bytes must be the characters 'WEBP'.
For an animated WebP, the next 4 bytes will be the characters 'VP8X'. Our actual image parameters start here.
The next 4 bytes (the chunk size) must contain the value 10, because the VP8X payload is 10 bytes that have to be read as described in Google's container specification.
After VP8X, ANIM and other optional chunks, we read the ANMF chunks. Each one holds a 16-byte frame header followed by optional ALPH data and VP8/VP8L data. These are the actual image data we need to extract to create bitmaps.
Each ANMF occurrence marks one frame.
You can write each frame as static WebP image data to a ByteArrayOutputStream and create a
bitmap using BitmapFactory.decodeByteArray(bytes, 0, bytes.length), where bytes = stream.toByteArray(). This will return the bitmap for that frame.
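The chunk walk described above can be sketched in plain Java as follows. This is not the modified android-webp-encoder library, just an illustration with hypothetical names that assumes the whole file is in a byte array. Two caveats: ANMF frames can be sub-rectangles that must be composited onto a canvas using the frame offsets and the blending/disposal flags, which this sketch ignores; and frames with an ALPH chunk would need an extended (VP8X) header rather than the simple one built here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WebpFrameSplitter {
    static String fourCC(byte[] d, int off) {
        return new String(d, off, 4, StandardCharsets.US_ASCII);
    }

    static int le32(byte[] d, int off) {
        return ByteBuffer.wrap(d, off, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns the frame data (ALPH/VP8/VP8L sub-chunks) carried by each ANMF chunk. */
    static List<byte[]> anmfFrameData(byte[] webp) {
        if (!fourCC(webp, 0).equals("RIFF") || !fourCC(webp, 8).equals("WEBP")) {
            throw new IllegalArgumentException("Not a RIFF/WEBP file");
        }
        List<byte[]> frames = new ArrayList<>();
        int pos = 12; // first chunk follows "RIFF", the size field and "WEBP"
        while (pos + 8 <= webp.length) {
            String id = fourCC(webp, pos);
            int size = le32(webp, pos + 4);
            int payload = pos + 8;
            if (id.equals("ANMF")) {
                // Skip the 16-byte ANMF header (X/Y offset, width, height, duration, flags)
                frames.add(Arrays.copyOfRange(webp, payload + 16, payload + size));
            }
            pos = payload + size + (size & 1); // chunk payloads are padded to an even length
        }
        return frames;
    }

    /** Wraps frame data consisting of a VP8 or VP8L chunk in a simple (static) WebP file. */
    static byte[] toStaticWebp(byte[] frameData) {
        String first = fourCC(frameData, 0);
        if (!first.equals("VP8 ") && !first.equals("VP8L")) {
            throw new IllegalArgumentException("Frames starting with " + first + " need a VP8X header");
        }
        byte[] riff = "RIFF".getBytes(StandardCharsets.US_ASCII);
        byte[] webpTag = "WEBP".getBytes(StandardCharsets.US_ASCII);
        // RIFF size = total file size minus the first 8 bytes
        byte[] riffSize = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(4 + frameData.length).array();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(riff, 0, 4);
        out.write(riffSize, 0, 4);
        out.write(webpTag, 0, 4);
        out.write(frameData, 0, frameData.length);
        return out.toByteArray();
    }
}
```

On Android, the bytes returned by toStaticWebp could then be handed to BitmapFactory.decodeByteArray(bytes, 0, bytes.length).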

Converting an image into an arbitrary/custom image file/format

Can anyone, at least in the abstract, describe what to do if I have an arbitrary image format (whose layout and other specifications I know) and I need to convert image files like .jpg, .tga and .png into that format?
ImageIO does not know this format; it's a custom format. I want to create a tool that can work with this format, i.e. write this custom format from JPG, PNG, TGA, etc.
How do I specify things like the header that the custom image format is supposed to be saved (written) with?
Example: convert penguin.png into penguin.xyz, where .xyz is a custom image format.
My idea: I could read the source image's (any common image file) bitmap data into a buffer, then add the custom format's header and other specifics to the buffered image data, and then write the new image file (my custom format).
I've been looking for 3 days and can't find a tutorial for this. Should I tackle this problem once I have some more experience?
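In the abstract, the idea in the question works: let ImageIO decode the source into a BufferedImage, then write your own header followed by the pixel data with a DataOutputStream. A minimal sketch follows; the ".xyz" layout used here (an "XYZ1" magic number, width, height, then one big-endian ARGB int per pixel) is entirely hypothetical and stands in for your format's real specification. Note that the JDK's ImageIO reads PNG and JPEG out of the box but not TGA, which would need a third-party ImageIO plugin.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

public class XyzWriter {
    /** Writes the hypothetical ".xyz" layout: "XYZ1", width, height, then ARGB pixels. */
    static void writeXyz(BufferedImage img, OutputStream os) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(os));
        out.writeBytes("XYZ1");          // header: magic number (one byte per char)
        out.writeInt(img.getWidth());    // header: dimensions, big-endian
        out.writeInt(img.getHeight());
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                out.writeInt(img.getRGB(x, y)); // pixel data, row by row, as 0xAARRGGBB
            }
        }
        out.flush(); // push buffered bytes to os without closing it
    }

    /** Decodes any format ImageIO can read (PNG, JPEG, ...) and re-encodes it as ".xyz". */
    static void convert(File in, File out) throws IOException {
        BufferedImage img = ImageIO.read(in);
        if (img == null) {
            throw new IOException("No ImageIO reader for " + in);
        }
        try (OutputStream os = new FileOutputStream(out)) {
            writeXyz(img, os);
        }
    }
}
```

For example, convert(new File("penguin.png"), new File("penguin.xyz")). A real format would typically also need things like compression, padding or a palette, all of which go into writeXyz.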

Extract TIFF images from PDF without decoding

With the help of iText 5, I would like to extract all TIFF images from a given PDF file and save them as TIFF files.
Examples and other posts (1, 2) use the following method:
1. Create a PdfImageObject from the PDF stream, which in line 189 decodes the image stream (if a corresponding filter implementation is present).
2. Call PdfImageObject#getImageAsBytes(), which returns JPEG (original), PNG (re-encoded) or TIFF (in case of 8 bits per pixel).
As a result, a TIFF image with 1-bit color depth is converted to PNG, which is not what I need.
Another approach would be to call PdfImageObject#getBufferedImage(), which will decode the image in step (2) into a raster and afterwards encode it again as TIFF using ImageIO.write(bufferedImage, "tiff", file).
As one can see, this is not efficient. Another solution, shown in this post, demonstrates how to save the encoded TIFF image stream to a file by prepending a TIFF header to it; that is the solution I am looking for.
Can iText help here?

PDF images are not TIFF images.
However, PDFs can contain images that use compression techniques that are also used in TIFF, e.g. Flate, CCITT, LZW and JPEG.
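The header-prepending approach mentioned in the question can be sketched as follows for a CCITT Group 4 stream (a PDF /CCITTFaxDecode filter with /K < 0). Getting the raw, undecoded stream bytes and the /Columns and /Rows values out of the PDF (for example via iText 5's PdfReader.getStreamBytesRaw) is not shown. The class name is hypothetical, PhotometricInterpretation is assumed to be WhiteIsZero (it may need changing depending on /BlackIs1 and /Decode), and the resolution tags are omitted; baseline TIFF lists them as required, although many readers tolerate their absence.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class CcittToTiff {
    // Writes one 12-byte IFD entry holding a single SHORT (type 3) or LONG (type 4) value
    private static void entry(ByteBuffer b, int tag, int type, int value) {
        b.putShort((short) tag).putShort((short) type).putInt(1);
        if (type == 3) {
            b.putShort((short) value).putShort((short) 0); // SHORT values are left-justified
        } else {
            b.putInt(value);
        }
    }

    /** Wraps a raw CCITT Group 4 stream in a single-strip, little-endian TIFF file. */
    static byte[] wrapG4(byte[] ccitt, int width, int height) {
        final int entries = 8;
        int ifdSize = 2 + entries * 12 + 4;
        int dataOffset = 8 + ifdSize; // image data follows header + IFD
        ByteBuffer b = ByteBuffer.allocate(dataOffset + ccitt.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8); // header, IFD at offset 8
        b.putShort((short) entries);
        // IFD entries must be sorted by ascending tag number
        entry(b, 256, 4, width);        // ImageWidth
        entry(b, 257, 4, height);       // ImageLength
        entry(b, 258, 3, 1);            // BitsPerSample = 1 (bilevel)
        entry(b, 259, 3, 4);            // Compression = CCITT T.6 (Group 4)
        entry(b, 262, 3, 0);            // PhotometricInterpretation = WhiteIsZero (assumption)
        entry(b, 273, 4, dataOffset);   // StripOffsets
        entry(b, 278, 4, height);       // RowsPerStrip = whole image in one strip
        entry(b, 279, 4, ccitt.length); // StripByteCounts
        b.putInt(0);                    // no next IFD
        b.put(ccitt);                   // the untouched, still-compressed PDF stream
        return b.array();
    }
}
```

Because the compressed bytes are copied as-is, no decoding or re-encoding takes place; the PDF's /K value decides whether Compression 4 (Group 4) or Compression 3 (Group 3, with T4Options) is the right tag value.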

How to convert .jpeg image to .jif in java? (Exif TO JFIF)

A customer has asked us to send images in .JIF format (JFIF). I have a Java application, but I could not find anything on how to convert to that image type; I could hardly find anything about the ".JIF" format itself.
EDITED:
Can somebody advise how to convert an Exif image to JFIF in Java? And how to add comments to this JFIF image?
(I tried to use the jheader library, but it sadly ended with a NullPointerException; there aren't many other choices on Google.)

Edit: Converting Exif JPEGs to JFIF JPEGs:
If you don't mind losing some quality (due to lossy JPEG re-encoding), you can convert the image as simply as:
import java.io.File;
import javax.imageio.ImageIO;

File inFile = ...;
File outFile = ...; // Feel free to use ".jif" as extension
if (!ImageIO.write(ImageIO.read(inFile), "JPEG", outFile)) {
    System.err.println("Could not write JPEG format"); // Should never happen
}
This will work because the default JPEGImageWriter plugin only supports the JFIF format. And because we don't read the metadata, the old Exif information will simply be lost. Doing it this way will not allow you to add comments, however.
To add comments, you can still use the standard ImageIO API, but we'll have to access the metadata, making the code more verbose. See the JPEG Metadata Format Specification for more information on the metadata format. If you need to convert comments from the Exif metadata, please update your question to be specific about that, as it requires further parsing of the metadata and extra support not currently in the ImageIO API.
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageOutputStream;

File inFile = ...;
File outFile = ...; // Feel free to use ".jif" as extension
BufferedImage image = ImageIO.read(inFile);
ImageWriter jpegWriter = ImageIO.getImageWritersByFormatName("JPEG").next(); // Should be at least one
// To write comments, we need to add it to the metadata
ImageWriteParam param = jpegWriter.getDefaultWriteParam();
IIOMetadata metadata = jpegWriter.getDefaultImageMetadata(ImageTypeSpecifier.createFromRenderedImage(image), param);
IIOMetadataNode root = (IIOMetadataNode) metadata.getAsTree("javax_imageio_jpeg_image_1.0");
IIOMetadataNode markerSequence = (IIOMetadataNode) root.getElementsByTagName("markerSequence").item(0); // Should be only one
// Insert a "COM" marker, with our comment
IIOMetadataNode com = new IIOMetadataNode("com");
com.setAttribute("comment", "Hello JFIF!");
markerSequence.appendChild(com);
// Merge edited metadata
metadata.mergeTree("javax_imageio_jpeg_image_1.0", root);
ImageOutputStream output = ImageIO.createImageOutputStream(outFile);
try {
    jpegWriter.setOutput(output);
    // Write image along with metadata
    jpegWriter.write(new IIOImage(image, null, metadata));
}
finally {
    output.close();
}
jpegWriter.dispose();
This way, we still re-encode the image in lossy JPEG, but we convert from Exif to JFIF and add comments.
Now, there is still another option: doing this completely losslessly. But it requires a deeper understanding of the JIF segment structure and of how the Exif and JFIF formats work. Unfortunately, there's no standard Java API (that I know of) to do this, so you will have to roll your own. Feel free to use my JPEG segment parsing code as a starting point. The JHeader project you linked also looks very promising, but I don't have any experience with this library, so I can't provide any advice there.
Here's the basic idea:
1. Parse/skip the marker segments until the SOS (Start of Scan) segment (the data following the SOS will be the compressed image data).
2. Write the SOI marker (0xffd8).
3. Create an APP0/"JFIF" marker (I think you can just use defaults here; see the JFIF segment for details). You can write 0, 0 for the thumbnail dimensions and skip writing thumbnail data.
4. Add your COM segments with whatever comments you need (possibly extracted from the Exif metadata).
5. Write the SOF, DHT, DQT etc. standard segments as-is from the original stream (skip the APP1/"Exif" and other "custom" segments).
6. Write the SOS marker and the image data from the original stream.
In theory, this should work. You might have some minor color space issues, as the Exif data might specify a different color space (normally sRGB or AdobeRGB1998), while JFIF doesn't have a defined color space. If you need it, add an APP2/"ICC_PROFILE" segment with the required profile (after step 3).
Good luck! :-)
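The steps above can be sketched as a byte-level rewrite. This is a minimal illustration with hypothetical names, not the linked segment-parsing code: it assumes the whole file is in memory and well formed, drops every APPn segment (including APP2/ICC_PROFILE and Adobe APP14, which you may want to keep instead), and writes a default JFIF APP0 (version 1.02, 1:1 aspect ratio, no thumbnail) followed by one COM segment.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ExifToJfif {
    private static final byte[] JFIF_APP0 = {
        (byte) 0xFF, (byte) 0xE0, 0x00, 0x10, // APP0 marker, segment length 16
        'J', 'F', 'I', 'F', 0x00,             // identifier "JFIF\0"
        0x01, 0x02,                           // version 1.02
        0x00, 0x00, 0x01, 0x00, 0x01,         // units = none, X/Y density 1:1
        0x00, 0x00                            // thumbnail 0 x 0, no thumbnail data
    };

    /** Losslessly rewrites a JPEG: SOI, new APP0/JFIF, COM, then all non-APPn segments. */
    static byte[] convert(byte[] jpeg, String comment) {
        if ((jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Missing SOI marker");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xFF);
        out.write(0xD8);                                   // SOI
        out.write(JFIF_APP0, 0, JFIF_APP0.length);         // APP0/JFIF
        byte[] text = comment.getBytes(StandardCharsets.ISO_8859_1);
        int comLen = text.length + 2;                      // length field counts itself
        out.write(0xFF);
        out.write(0xFE);                                   // COM marker
        out.write(comLen >> 8);
        out.write(comLen & 0xFF);
        out.write(text, 0, text.length);
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) { pos++; continue; }       // fill byte before a marker
            if (marker == 0xDA) {                          // SOS: copy scan data verbatim
                out.write(jpeg, pos, jpeg.length - pos);
                return out.toByteArray();
            }
            int len = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            boolean isApp = marker >= 0xE0 && marker <= 0xEF;
            if (!isApp) {
                out.write(jpeg, pos, 2 + len);             // keep SOF, DHT, DQT, DRI, COM...
            }
            pos += 2 + len;
        }
        throw new IllegalArgumentException("No SOS segment found");
    }
}
```

The compressed scan data is never touched, so image quality is preserved; only the header segments change.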
Note: This is not a complete answer, but instead an attempt to clarify why you need to talk to your client, and figure out what is wrong with your JPEGs and what he actually means by "JIF".
First and foremost, JPEG is not a file format. JPEG is a still image compression standard. Part of this standard (usually referred to as "Annex B") is a description of an interchange format, sometimes referred to as JIF. The standard also specifies a full file format known as SPIFF, but this format is not very widespread (and I don't think this is what you want).
The files you find everywhere, referred to as "JPEG files" (and I assume this is what you refer to as "Classic JPEG"), usually come in one of two slightly different flavors of basically the same file format:
The most basic format is JFIF. This format starts with a SOI marker, immediately followed by an APP0 marker with "JFIF" (null-terminated) as its identifier. According to the original JFIF specification "The JPEG File Interchange Format is entirely compatible with the standard JPEG interchange format; the only additional requirement is the mandatory presence of the APP0 marker right after the SOI marker." (this part is left out of the ITU and ISO versions of the specification, but still applies). Put simply, JFIF constrains the JPEG data to be 1 or 3 components, encoded as either Y or YCbCr, and highly recommends baseline DCT, Huffman coded compression.
The other common format is Exif. This format starts with a SOI marker, immediately followed by an APP1 marker with "Exif" (null-terminated) as its identifier. This format was developed by the digital camera manufacturers, and allows much richer metadata to be recorded within the file (in the form of a TIFF metadata structure). From what I understand, Exif constrains the JPEG data to be 3 components, encoded as YCbCr, using baseline DCT, Huffman coded compression (the last part may be just an interoperability recommendation; the language in the spec is a little hard to read...).
Both of these formats contain the same "segment" layout and the image data is compatible, but they are still mutually exclusive, due to the requirement of having "their" marker as the first segment in the stream (because of this, a "third" format also exists: a file that is JFIF for compatibility but still contains an Exif segment for richer metadata).
Yet another family of "JPEG files" lacks both JFIF and Exif markers, but still follows the same segment layout, with SOI, APPn markers, SOF, DHT, DQT, SOS and EOI markers, as described in "Annex B" (JIF). Most decoders will decode these images as well.
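To see which of these flavors a given file is, it is enough to look at the marker right after SOI and the identifier that follows its 2-byte length field. A minimal sketch with hypothetical names; note that a file of the JFIF-plus-Exif "third" kind is reported as JFIF, since only the first segment is checked.

```java
import java.nio.charset.StandardCharsets;

public class JpegFlavor {
    /** Classifies a JPEG stream by the APPn segment immediately after SOI. */
    static String flavor(byte[] jpeg) {
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8
                || (jpeg[2] & 0xFF) != 0xFF) {
            return "not JPEG";
        }
        int marker = jpeg[3] & 0xFF;
        // The identifier starts after the 2-byte marker and the 2-byte length, at offset 6
        String id = jpeg.length >= 11 ? new String(jpeg, 6, 5, StandardCharsets.US_ASCII) : "";
        if (marker == 0xE0 && id.equals("JFIF\0")) {
            return "JFIF";
        }
        if (marker == 0xE1 && id.equals("Exif\0")) {
            return "Exif";
        }
        return "plain JIF"; // neither JFIF nor Exif first, but may still decode fine
    }
}
```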
TL;DR: To summarize, what all the "JPEG" file formats have in common is that they use JPEG compression and follow the JIF structure. Because of this, it is somewhat hard to understand what someone means by "convert classic JPEG to JIF".
"Classic JPEG" is JIF.

First of all, you need to read the image, then write it with the dimensions and format you want.
Use the ImageIO class to read the image into a BufferedImage.
Use the Graphics2D class to draw (scale) it into the output image.
For the format name, note that ImageIO has no writer registered as "jif"; write with the "jpeg" format name and just give the output file a .jif extension.
File inputFile = new File(inputImagePath);
BufferedImage inputImage = ImageIO.read(inputFile);
// creates output image (TYPE_INT_RGB: JPEG has no alpha, and getType() may return TYPE_CUSTOM)
BufferedImage outputImage = new BufferedImage(scaledWidth,
        scaledHeight, BufferedImage.TYPE_INT_RGB);
// scales the input image to the output image
Graphics2D g2d = outputImage.createGraphics();
g2d.drawImage(inputImage, 0, 0, scaledWidth, scaledHeight, null);
g2d.dispose();
// "jif" is not a registered ImageIO format name, so use "jpeg" (the file may still end in .jif)
String formatName = "jpeg";
// writes to output file; write() returns false if no suitable writer was found
if (!ImageIO.write(outputImage, formatName, new File(outputImagePath))) {
    System.err.println("No ImageIO writer for " + formatName);
}

How to reduce size of RTF with embedded images?

We have some code which produces an RTF document from an RTF template. It basically does string search-and-replace of special tags within the RTF file. This is accessible via a web page.
Typically, the processing time for this is really quick.
However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resulting RTF file size is massively dependent on the image.
For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).
Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.
So the encoding that Word uses for storing images in an RTF doesn't scale linearly. I'm guessing it depends on what is in the JPEG image.
I need to keep the size of the RTF templates to a minimum to try and keep our file processing times to a minimum.
Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
Does anyone know what type of binary encoding Word/RTF uses?
Thanks in advance.

Here is the best solution
http://support.microsoft.com/kb/224663
Excerpt:
SYMPTOMS
When you save a Microsoft Word document that contains an EMF, PNG, GIF, or JPEG graphic as a different file format (for example, Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the document may dramatically increase.
For example, a Microsoft Word 2000 document that contains a JPEG graphic that is saved as a Word 2000 document may have a file size of 45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95 (.doc) or as Rich Text Format (.rtf), the file size may grow to 1,289,728 bytes (1.22MB).
CAUSE
This functionality is by design in Microsoft Word. If an EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document, when the document is saved, two copies of the graphic are saved in the document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG format and are also converted to WMF (Windows Metafile) format.
RESOLUTION
Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.
To prevent Word from saving two copies of the graphic in the document, and to reduce the file size of the document, add the ExportPictureWithMetafile=0 string value to the Microsoft Windows registry.

An image in an RTF file gets stored as a WMF, uncompressed. On the Mac, it would be a macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy into the document. The trade-off is that you have to keep the files together.
EDIT
Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but obviously you'll have to uncompress it first. There are supposed to be tools that do RTF compression, but I have never used them.

We have done a similar project at work, only we're not using the "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume yours does too. When we process the document, we replace the tag with the RTF codes needed to display images. We put them in a table, displaying two images per row, plus a row on top for the title.
So, you might place a tag [photos] in your template and then replace the tag with the RTF codes. You can find some good references for these codes on the web, for example here.
Now, my code looks something like this:
\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your image as an array of bytes in hexadecimal }\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your other image }\intbl\cell\row}
If you have your image in a byte array, you can use BitConverter.ToString(array) (.NET) to get the hex code; you'll just need to remove the dashes (replace "-" with "").
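For readers not on .NET, the same hex encoding is easy to do in Java. The following sketch (hypothetical names) hex-encodes JPEG bytes and wraps them in a \pict\jpegblip group like the ones in the snippet above, so each image is stored only once, without a WMF copy. The sizes are \picwgoal/\pichgoal values in twips. It also shows why embedded images grow: every byte becomes two hex characters.

```java
public class RtfPict {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Hex-encodes bytes (like BitConverter.ToString with the dashes removed). */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]); // high nibble, then low nibble
        }
        return sb.toString();
    }

    /** Builds an RTF picture group embedding the JPEG bytes once; sizes are in twips. */
    static String jpegPict(byte[] jpeg, int widthTwips, int heightTwips) {
        return "{\\pict\\jpegblip\\picwgoal" + widthTwips
                + "\\pichgoal" + heightTwips + "\n" + toHex(jpeg) + "}";
    }
}
```

The resulting string could then replace a [photos]-style tag in the template, for example inside a table cell as in the snippet above.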
Our files will take up less than 1/10th of the space a "normal" RTF will. If we open the doc's code with an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (changing its name), it'll go from 1.5Mb to 50Mb!!
I'm guessing DaveParillo's reply explains it: I'm only writing each image once.
Hope it helps.
Cheers mate

First, keep in mind that each byte is stored as 2 hex characters (two bytes), which means the increase is at least double the size of the original picture.
Another thing to know is that Word and WordPad insert different flavors (formats) of the same image, plus other fields that the RTF can be displayed without.
Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and one example of use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/)
Now, maybe you will need to replace the original image with another (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).

Swartbees' answer worked perfectly for me. I first reduced the image quality to "0" using GIMP's Save as JPEG functionality. After following the Microsoft solution suggested by Swartbees above, I reinserted the picture into the file, and the size increase was negligible: 229k to 279k (as opposed to 29000kb).
Thanks for your suggestions guys.

Yes, by removing the redundant characters, although you must insert them back into your stream before it is read.
For instance, if you have a run of twenty f characters in one line, you can replace it with f[20] in your stream. It is a start.
-Best of luck.
