I want to modify the metadata of some types of images (png, jpeg or gif) and I found some code that works very well for PNG images on this topic, provided by haraldK. When I try to run it on a jpg image though, it throws this error:
javax.imageio.IIOException: JFIF APP0 must be first marker after SOI. The error is thrown on the line IIOImage image = reader.readAll(0, null);
What can I do to get this working?
Thanks in advance for your answer.
The problem you face is that the JPEG standard did not define a file format, so several file formats appeared, e.g. JFIF, EXIF and SPIFF. These formats represent metadata in different ways. Apparently the library you are trying to use only supports the JFIF file format, while you have a file in a different format (likely EXIF).
So you need a library that supports your file format or you need to modify the library you have to work with whatever file format you have. That could be a fairly substantial change.
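To check which flavor a given file is, you can look at the segment that directly follows the SOI marker. The following is only a minimal sketch: the class and method names are made up for illustration, and it looks at nothing beyond the first segment, so it will not tell you about a JFIF APP0 segment that appears later in the file.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reports which APPn segment directly follows the JPEG SOI marker. */
final class JpegFlavorSniffer {

    /** Returns "JFIF", "Exif", "other" or "not a JPEG" for the leading bytes of a file. */
    static String firstSegment(byte[] head) {
        // A JPEG stream starts with SOI (0xFF 0xD8), followed by a marker (0xFF, code)
        if (head.length < 4
                || (head[0] & 0xFF) != 0xFF || (head[1] & 0xFF) != 0xD8
                || (head[2] & 0xFF) != 0xFF) {
            return "not a JPEG";
        }
        int marker = head[3] & 0xFF;
        // Segment layout: marker (2 bytes), length (2 bytes), then an identifier string
        String id = head.length >= 10
                ? new String(head, 6, 4, StandardCharsets.US_ASCII)
                : "";
        if (marker == 0xE0 && id.equals("JFIF")) {
            return "JFIF"; // APP0 segment with the JFIF identifier
        }
        if (marker == 0xE1 && id.equals("Exif")) {
            return "Exif"; // APP1 segment with the Exif identifier
        }
        return "other";
    }
}
```

Passing the first dozen or so bytes of the file is enough. If the result is "Exif", the file starts with an APP1 segment rather than the JFIF APP0 segment the error message is asking for.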
I am trying to implement attachments in my application, where users can upload image files (png, jpg, jpeg). I have read the OWASP recommendations for image uploads, and one of the tips was to convert the input image to a bitmap (keeping only the bitmap data, and throwing away all the extra annotations), then convert the bitmap to your desired output format. One reasonable way to do this is to convert to PBM format, then convert to PNG.
The image is saved as a byte array.
I am trying to rewrite the uploaded image using ImageTranscoder from the ImageIO library. But I am not really sure what it is doing, or whether all the possibly malicious code is removed from the image, because it seems that only the metadata is being rewritten.
Are there any suggestions or best practices for achieving this goal of removing all possibly malicious code from an image file?
You do not need an intermediate file format like PBM, as BufferedImage (which is the standard way of representing an in-memory bitmap in Java) is just plain pixel data. You can just go from encoded "anything" to decoded bitmap to encoded PNG.
The simplest way you could possibly do what you describe is:
ImageIO.write(ImageIO.read(input), "PNG", output);
This is rather naive code, and will break for many real-world files, or possibly just silently not output anything. You probably want to handle at least the most normal error cases, so something like below:
BufferedImage image = ImageIO.read(input);
if (image == null) {
    // TODO: Handle image not read (decoded)
}
else if (!ImageIO.write(image, "PNG", output)) {
    // TODO: Handle image not written (could not be encoded as PNG)
}
Other things to consider: The above will remove malicious code in the meta data. However, there might be special images crafted for DoS (small files decoding to huge in-memory representations, TIFF IFD loops, and much more). These problems need to be addressed in the image decoders for the various input formats. But at least your output files should be safe from this.
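One partial mitigation for the "small file, huge bitmap" case is to ask the reader for the image dimensions before decoding any pixel data, and reject anything above a limit. This is only a sketch: the class name, method name and the idea of a pixel-count limit are my own assumptions, and it does nothing against decoder bugs such as IFD loops.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical guard: checks the declared image size before any pixels are decoded. */
final class ImageSizeGuard {

    /** Returns true if the first image is decodable by ImageIO and has at most maxPixels pixels. */
    static boolean isWithinLimit(byte[] encoded, long maxPixels) throws IOException {
        try (ImageInputStream input =
                     ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
            if (!readers.hasNext()) {
                return false; // no installed reader recognizes this data
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(input);
                // getWidth/getHeight read the header only, not the pixel data
                long pixels = (long) reader.getWidth(0) * reader.getHeight(0);
                return pixels <= maxPixels;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

You would call this on the uploaded byte array first, and only pass the data on to ImageIO.read if it returns true.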
In addition, malicious code could be stored in the ICC profile, which might be carried over to the output image. You can probably avoid this by force converting all images to the built-in sRGB color space, or writing the images without ICC profiles.
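A sketch of the "force convert" idea: decode the upload, draw it onto a fresh BufferedImage of a standard type (which uses Java's default sRGB color model), and encode only that copy. The class and method names are invented for this example, and I have not verified what every PNG writer does with color profiles, so treat it as a starting point rather than a guarantee.

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: re-encodes an untrusted image as PNG from pixel data only. */
final class ImageSanitizer {

    static byte[] reencodeAsPng(byte[] untrusted) throws IOException {
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(untrusted));
        if (decoded == null) {
            throw new IOException("Input could not be decoded as an image");
        }

        // Copy the pixels into a new image of a standard type; the original
        // color model (and any profile attached to it) is not reused.
        BufferedImage clean = new BufferedImage(
                decoded.getWidth(), decoded.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = clean.createGraphics();
        try {
            g.setComposite(AlphaComposite.Src); // copy pixels as-is, including alpha
            g.drawImage(decoded, 0, 0, null);
        } finally {
            g.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(clean, "PNG", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

Since your images are stored as byte arrays, the method takes and returns byte arrays; the returned array is what you would persist instead of the original upload.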
PS: The ImageTranscoder interface is intended for situations where you want to keep as much meta data as possible (that is why it has methods only for meta data), and allows transformation of meta data from one file format to another (one could argue the name should have been MetadataTranscoder).
I'm a bit of a noob when it comes to Java, and I've been trying to learn as much about the language as possible. Recently I started learning the logic behind analyzing images pixel by pixel for their RGB data. While doing this I stumbled upon .svs files, which are extremely high-quality files that are basically multilayered TIFFs. I've explored several open source projects that decode and display .svs images, but couldn't find the algorithms or code they use to decode the files. Could someone point me to the file(s) inside these open source projects that contain the decoding algorithm, as I'm deeply interested in how one would go about decoding such a large and complex image file? Alternatively, could someone help me with an algorithm to decode a .svs file in Java? Thanks in advance!
links:
https://github.com/openslide/openslide-java
https://github.com/imagej/imagej
SVS files are produced by Aperio scanners, so you need the Aperio decoder from openslide:
https://github.com/openslide/openslide/blob/master/src/openslide-vendor-aperio.c
openslide have some docs on the format here:
http://openslide.org/formats/aperio/
It's interesting to understand the details, but perhaps not very useful. If you want to read it yourself from Java, you can use the excellent openslide binding, or a libtiff binding plus a lot of extra code.
As mentioned, the .svs files created by Aperio are just TIFF files (with some limitations, and some minor extensions).
It is not clear from the question whether you just want to read images from such files using an existing library, or if you want to develop such a solution for yourself (for educational purposes or otherwise).
If the latter is the case, you really should read the TIFF 6.0 specification, along with the Adobe Tech Notes specifying "new" JPEG compression and whatever you can find documented about the Aperio SVS format. You can also look at the source code of existing libraries. Describing the steps necessary to implement a TIFF/SVS reader/decoder from scratch is way beyond the scope of a StackOverflow answer.
If, on the other hand, you just want to open such a file in Java, you should be able to open most of them simply by using ImageIO and a TIFF plugin*.
Code could be as simple as:
BufferedImage image = ImageIO.read(new File("path/to/your.svs"));
This will read the first, and according to what I understand from the specification, the full resolution image in the file.
To read a specific image (or all images, if you add a loop) in the file, the code becomes a little more verbose:
// Create input stream
File file = new File("path/to/your.svs");
try (ImageInputStream input = ImageIO.createImageInputStream(file)) {
    // Get the reader
    Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
    if (!readers.hasNext()) {
        throw new IllegalArgumentException("No reader for: " + file);
    }
    ImageReader reader = readers.next();
    try {
        reader.setInput(input);
        // Optionally, listen for read warnings, progress, etc.
        reader.addIIOReadWarningListener(...);
        reader.addIIOReadProgressListener(...);
        // Use reader.getNumImages(true) to get the number of images
        // in the file, and optionally add a loop to read all the images.
        ImageReadParam param = reader.getDefaultReadParam();
        // Optionally, control read settings like subsampling, source region or destination etc.
        param.setSourceSubsampling(...);
        param.setSourceRegion(...);
        param.setDestination(...);
        // ...
        // Finally read the image, using settings from param
        BufferedImage image = reader.read(0, param);
        // Optionally, read thumbnails, etc...
        int numThumbs = reader.getNumThumbnails(0);
        // ...
        // Optionally, get the image metadata (ie. to get the custom Aperio
        // values from the ImageDescription tag for further processing)
        IIOMetadata metadata = reader.getImageMetadata(0);
        // ...
    }
    finally {
        // Dispose reader in finally block to avoid memory leaks
        reader.dispose();
    }
}
This will allow you to skip images that are too large to display in Java (.svs files may contain images that are too large for a BufferedImage...) or have a custom compression not supported by the reader (.svs files may contain images compressed both in baseline JPEG and custom JPEG 2000).
You probably need to read up on the specification to see the order of images, what image is a "thumbnail", a "macro" and a "label" image. I think the "thumbnail" is always a JPEG stream.
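As a rough illustration of the "add a loop" remark, here is a sketch that lists the dimensions of every image in a file without decoding any pixel data, so you can decide which ones are worth reading. The class and method names are made up for this example, and I have only reasoned about it for ordinary single-image files, not tested it against real .svs files.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: lists {width, height} for each image in a file. */
final class ImageLister {

    static List<int[]> listSizes(File file) throws IOException {
        List<int[]> sizes = new ArrayList<>();
        try (ImageInputStream input = ImageIO.createImageInputStream(file)) {
            if (input == null) {
                throw new IOException("Cannot open: " + file);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
            if (!readers.hasNext()) {
                throw new IllegalArgumentException("No reader for: " + file);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(input);
                // true = allow the reader to scan the file to count the images
                int count = reader.getNumImages(true);
                for (int i = 0; i < count; i++) {
                    // Header information only; no pixel data is decoded here
                    sizes.add(new int[] {reader.getWidth(i), reader.getHeight(i)});
                }
            } finally {
                reader.dispose();
            }
        }
        return sizes;
    }
}
```

With the sizes in hand you can, for example, skip any index where width times height (computed as a long) exceeds what fits in a Java array, and call reader.read(index, param) only for the rest.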
*) TIFF plugins for ImageIO:
The most common TIFF plugin for ImageIO is JAI ImageIO (jai_imageio.jar), but it is no longer in development, and I have not tested it with .svs files.
My own project, TwelveMonkeys ImageIO, is actively developed and contains a TIFF plugin that aims to be compatible with the one from JAI, but fixing bugs and adding missing features. I have tested this plugin with some sample .svs files, and it can read them, except the ones having a non-standard (as in "not in the TIFF specification") JPEG 2000 compression.
There are other TIFF plugins that I know of, but I haven't tried any of these.
There might also be special purpose Aperio SVS plugins available, that I don't know of.
Given some source file (or, more generically, an input stream), I need to find out:
1. whether it is an image
2. if it is an image, its type (png/jpeg/gif/etc.)
3. its Exif data, if available
I looked at the API, but it is not clear how to get the type of image or Exif data.
Last time I had to do this, a couple of years ago, the standard API couldn't read EXIF data. This library can do so though:
http://www.drewnoakes.com/code/exif/
Easy answer:
Use https://github.com/drewnoakes/metadata-extractor/
If you're crazy/brave/curious, you could get image type from the stream by reading the first few bytes (these are magic numbers). I believe the exif is generally at the start of the stream too.
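For the curious route, a minimal sketch of magic-number sniffing could look like the following. The class and method names are hypothetical, it only covers a handful of common formats, and it says nothing about whether the rest of the file is valid.

```java
/** Hypothetical helper: guesses an image type from the first bytes of a stream. */
final class ImageTypeSniffer {

    /** Returns a format name guessed from the leading bytes, or null if unknown. */
    static String sniff(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "jpeg"; // SOI marker followed by the start of another marker
        }
        if (startsWith(head, 'G', 'I', 'F', '8')) {
            return "gif"; // covers both GIF87a and GIF89a
        }
        if (startsWith(head, 'B', 'M')) {
            return "bmp";
        }
        if (startsWith(head, 'I', 'I', 42, 0) || startsWith(head, 'M', 'M', 0, 42)) {
            return "tiff"; // little-endian and big-endian byte order
        }
        return null;
    }

    /** Compares the first bytes of data, as unsigned values, with the given prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading the first 8 bytes of the stream is enough for the formats listed here. A null result only means "none of these signatures", not "not an image".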
It's an old thread, but I was doing this recently and found the Apache Tika library useful.
Particularly for analysing generic streams to detect what content is in them.
Thought it might help others.
http://tika.apache.org/
I have a situation where I would like to do some very light image file obfuscation. My application ships with a bunch of .png files and I'd like it if they weren't so readily editable.
I'm not looking for a 'secure' solution (I don't believe one really exists), I'd just like Joe Public to be unable to edit the files.
I am currently using;
ImageIO.read(new File("/images/imagefile.png"));
I'd rather not have to use Serialisation, as the ImageIO system is pretty deeply ingrained in the code. Each image also needs to remain its own file on disk.
I was hoping I could just change the file extension, e.g.:
ImageIO.read(new File("/images/imagefile.dat"));
But ImageIO seems to use it to identify the file. Can I tell ImageIO that it is a PNG despite its extension?
Encrypt all the files on disk.
Then in the program, decrypt a file, load it in memory and go rocking.
Java image I/O uses the Service Provider Interface to support new image formats. I believe it might be possible to add a new decoder using a file extension. If that is the case, there is the route to providing an easily pluggable reader for a custom image format.
Note that you will probably need to change the file extension in the source. That might be the job for an advanced IDE, or a one-time search and replace using grep.
As to the format, one extremely simple way to make media files unreadable in common readers is to write the bytes of the image in reverse order. Then flip them back after reading, put them in a ByteArrayInputStream, and pass them to ImageIO.read(InputStream).
After you have written the service provider and Jar'd it properly (using a manifest with attributes to identify the file/content type it handles, and the corresponding encoder/decoder), add it to the run-time class-path of the app., and it should be able to read the custom image format.
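Leaving the service provider part aside, the byte-reversal idea on its own could be sketched like this. The class and method names are invented for the example, and it reads whole files into memory, which I am assuming is acceptable for application-sized PNGs.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

/** Hypothetical helper: stores images with their bytes in reverse order. */
final class ReversedImageFiles {

    /** Returns a copy of data with the byte order reversed. */
    static byte[] reverse(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[data.length - 1 - i];
        }
        return out;
    }

    /** One-time build step: writes a reversed copy of an ordinary image file. */
    static void obfuscate(File image, File target) throws IOException {
        Files.write(target.toPath(), reverse(Files.readAllBytes(image.toPath())));
    }

    /** Run-time step: flips the bytes back in memory and decodes them. */
    static BufferedImage read(File obfuscated) throws IOException {
        byte[] original = reverse(Files.readAllBytes(obfuscated.toPath()));
        return ImageIO.read(new ByteArrayInputStream(original));
    }
}
```

In the application, calls like ImageIO.read(new File(...)) would then become ReversedImageFiles.read(new File(...)), with each image still living in its own file on disk.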
...or keep all images in a single file and seek() to the start position of each image as you load. You can do this by pre-seeking against a FileInputStream, or conversely by creating a ByteArrayInputStream for ImageIO.read(InputStream).
You could try this:
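// Request a PNG reader explicitly, regardless of the file extension
Iterator<ImageReader> rs = ImageIO.getImageReadersByFormatName("png");
ImageReader ir = rs.next();
File srcFile = new File("/images/imagefile.dat");
try (ImageInputStream iis = ImageIO.createImageInputStream(srcFile)) {
    ir.setInput(iis);
    BufferedImage image = ir.read(0);
} finally {
    ir.dispose();
}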
When ImageIO is used to read an image file and then write the BufferedImage (without any manipulation of the BufferedImage object) to another file, the size of the written file differs from that of the original file.
Does ImageIO read the full contents (including any metadata, like Exif metadata) of the image file?
And if it does read the full contents, does ImageIO then write out the image contents including any metadata?
Many file formats (including for instance jpeg) can be correctly compressed in several different ways (even for the same quality settings). The decompression is deterministic while the compression may be non-deterministic.
The fact that there is no unique right way of compressing an image implies that you can't expect ImageIO to produce a byte-equivalent result after loading / saving a file.
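If what you actually care about is that the image content survived, you can compare decoded pixels instead of file bytes. A small sketch, with made-up class and method names, and assuming a lossless format such as PNG (with JPEG the pixels themselves can change on every re-encode):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: compares images by decoded pixels rather than by encoded bytes. */
final class RoundTripCheck {

    static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No writer for format: " + format);
        }
        return out.toByteArray();
    }

    static BufferedImage decode(byte[] encoded) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(encoded));
        if (image == null) {
            throw new IOException("Input could not be decoded as an image");
        }
        return image;
    }

    /** Returns true if both images have the same size and the same ARGB value at every pixel. */
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Decoding the original file and the rewritten file and passing both to samePixels tells you whether the pixel data is unchanged, whatever the two file sizes are.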