Check if a file is an image - java

I am using JAI and create a file with:
PlanarImage img = JAI.create("fileload", myFilename);
I check before that line if the file exists. But how could I check if the file is a .bmp or a .tiff or an image file?
Does anyone know?

The ImageMagick project has facilities to identify images, and there's a Java wrapper for ImageMagick called JMagick, which you may want to consider instead of reinventing the wheel:
http://www.jmagick.org
I use ImageMagick all the time, including its "identify" feature from the command line, and it has never once failed to identify a picture.
Back when I absolutely needed that feature and JMagick didn't exist yet, I used Runtime.exec() to call ImageMagick's identify command from Java, and it worked perfectly.
Now that JMagick exists, this is probably no longer necessary (but I haven't tried JMagick yet).
Note that it gives much more than just the format, for example:
$ identify tmp3.jpg
tmp3.jpg JPEG 1680x1050 1680x1050+0+0 DirectClass 8-bit 293.582kb
$ identify tmp.png
tmp.png PNG 1012x900 1012x900+0+0 DirectClass 8-bit 475.119kb
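The Runtime.exec() approach described above could be sketched with ProcessBuilder roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the ImageMagick identify binary is on the PATH, and the names IdentifySketch, identifyCommand and identifyFormat are made up for illustration. It passes -format with %m so that identify prints only the format name, one line per frame.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class IdentifySketch {

    // Builds the command line. The format string is %m followed by a literal
    // backslash-n, which identify expands to a newline after each frame's format name.
    static List<String> identifyCommand(String path) {
        return Arrays.asList("identify", "-format", "%m\\n", path);
    }

    // Returns the format name (e.g. "JPEG"), or null if identify exits with an error.
    // Assumption: the "identify" binary is installed; otherwise start() throws IOException.
    static String identifyFormat(String path) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(identifyCommand(path))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String firstLine;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            firstLine = reader.readLine();
            // Drain the remaining output so the process cannot block on a full pipe
            while (reader.readLine() != null) { }
        }
        int exitCode = proc.waitFor();
        return (exitCode == 0 && firstLine != null) ? firstLine.trim() : null;
    }
}
```

A call such as IdentifySketch.identifyFormat("tmp3.jpg") would then be expected to return the format name that identify reports for the first frame, or null for a file it cannot identify.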

Try using the width of the image:
boolean isImage(String image_path){
Image image = new ImageIcon(image_path).getImage();
if(image.getWidth(null) == -1){
return false;
}
else{
return true;
}
}
If the width is -1, the file is not an image.

To tell if something is a PNG, I've used the snippet below in Android Java.
public CompressFormat getCompressFormat(Context context, Uri fileUri) throws IOException {
// create input stream
int numRead;
byte[] signature = new byte[8];
byte[] pngIdBytes = { -119, 80, 78, 71, 13, 10, 26, 10 };
InputStream is = null;
try {
ContentResolver resolver = context.getContentResolver();
is = resolver.openInputStream(fileUri);
// if first 8 bytes are PNG then return PNG reader
numRead = is.read(signature);
if (numRead == -1)
throw new IOException("Trying to read from 0 byte stream");
} finally {
if (is != null)
is.close();
}
if (numRead == 8 && Arrays.equals(signature, pngIdBytes)) {
return CompressFormat.PNG;
}
return null;
}

At the beginning of files, there is an identifying character sequence.
For example, JPEG files start with FF D8 FF.
You can check for this sequence in your program, but I am not sure whether this works for every file.
For information about identifying characters you can have a look at http://filext.com

You could use DROID, a tool for file format identification that also offers a Java API, to be used roughly like this:
AnalysisController controller = new AnalysisController();
controller.readSigFile(signatureFileLocation);
controller.addFile(fileToIdentify.getAbsolutePath());
controller.runFileFormatAnalysis();
Iterator<IdentificationFile> it = controller.getFileCollection().getIterator();
Documentation on the API usage is rather sparse, but you can have a look at this working example (the interesting part is in the identifyOneBinary method).

The only (semi-)reliable way to determine the contents of a file is to open it and read the first few characters. Then you can use a set of tests such as implemented in the Unix file command to make an educated guess as to the contents of the file.

Expanding on Birkan's answer, there is a list of 'magic numbers' available here:
http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html
I just checked a BMP and TIFF file (both just created in Windows XP / Paint), and they appear to be correct:
First two bytes "42 4d" -> BMP
First four bytes "4d 4d 00 2a" -> TIFF
I used VIM to edit the files and then did Tools | Convert to Hex, but you can also use 'od -c' or something similar to check them.
As a complete aside, I was slightly amused when I found out the magic numbers used for compiled Java Classes: 'ca fe ba be' - 'cafe babe' :)
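As a rough illustration of checking these magic numbers from Java, here is a minimal sketch. The class and method names (MagicNumberSketch, sniff, startsWith) are hypothetical. The BMP and big-endian TIFF values are the ones listed above, the JPEG and PNG signatures come from the other answers, and the little-endian TIFF variant "49 49 2a 00" is an addition that is not mentioned in this thread. A match only means the file starts with those bytes, not that the rest of it is a valid image.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class MagicNumberSketch {

    // Returns a short format name based on the leading bytes, or null if nothing matches.
    static String sniff(byte[] head) {
        if (startsWith(head, 0x42, 0x4D)) return "BMP";
        if (startsWith(head, 0x4D, 0x4D, 0x00, 0x2A)) return "TIFF"; // big-endian ("MM")
        if (startsWith(head, 0x49, 0x49, 0x2A, 0x00)) return "TIFF"; // little-endian ("II"), added assumption
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "JPEG";
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG";
        return null;
    }

    // Compares the first bytes of data (as unsigned values) against the given magic number.
    static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    // Reads at most the first 8 bytes of the file and sniffs them.
    static String sniff(Path file) throws IOException {
        byte[] head = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (total < head.length
                    && (n = in.read(head, total, head.length - total)) != -1) {
                total += n;
            }
        }
        return sniff(Arrays.copyOf(head, total));
    }
}
```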

Try using the standard JavaBeans Activation Framework (JAF)
With the JavaBeans Activation Framework standard extension, developers who use Java technology can take advantage of standard services to determine the type of an arbitrary piece of data, encapsulate access to it, discover the operations available on it, and instantiate the appropriate bean to perform said operation(s). For example, if a browser obtained a JPEG image, this framework would enable the browser to identify that stream of data as a JPEG image, and from that type, the browser could locate and instantiate an object that could manipulate or view that image.

if(currentImageType ==null){
ByteArrayInputStream is = new ByteArrayInputStream(image);
String mimeType = URLConnection.guessContentTypeFromStream(is);
if(mimeType == null){
AutoDetectParser parser = new AutoDetectParser();
Detector detector = parser.getDetector();
Metadata md = new Metadata();
mimeType = detector.detect(is,md).toString();
if (mimeType.contains("pdf")){
mimeType ="pdf";
}
else if(mimeType.contains("tif")||mimeType.contains("tiff")){
mimeType = "tif";
}
}
if(mimeType.contains("png")){
mimeType ="png";
}
else if( mimeType.contains("jpg")||mimeType.contains("jpeg")){
mimeType = "jpg";
}
else if (mimeType.contains("pdf")){
mimeType ="pdf";
}
else if(mimeType.contains("tif")||mimeType.contains("tiff")){
mimeType = "tif";
}
currentImageType = ImageType.fromValue(mimeType);
}
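The repeated if/else chains in the snippet above could be collapsed into one helper. The following is only a sketch of that idea under a few assumptions: the names MimeNormalizeSketch and shortType are hypothetical, the input is lower-cased first, and unrecognized types yield null, whereas the snippet above leaves the detected string unchanged.

```java
import java.util.Locale;

class MimeNormalizeSketch {

    // Maps a MIME type such as "image/jpeg" to a short type name.
    // Returns null for null input or an unrecognized type (an assumption of this sketch).
    static String shortType(String mimeType) {
        if (mimeType == null) return null;
        String m = mimeType.toLowerCase(Locale.ROOT);
        if (m.contains("png")) return "png";
        if (m.contains("jpg") || m.contains("jpeg")) return "jpg";
        if (m.contains("pdf")) return "pdf";
        if (m.contains("tif")) return "tif"; // also matches "tiff"
        return null;
    }
}
```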

Related

Is it possible to use the TessAPI1.TessPDFRendererCreate API of tess4J without needing to create physical files?

I am using the Tesseract Java API (tess4J) to convert Tiff images to PDFs.
This works nicely, but I am forced to write both the source Tiff image and the output PDF to local filestore as actual physical files in order to use the TessAPI1.TessPDFRendererCreate API.
Please note the following in the code snippet below: -
The input Tiff is originally a java.awt.image.BufferedImage, but I have to write it to a physical file (sourceTiffFile is a File object).
I must specify a file path for the output (pdfFullFilepath is a String representing an absolute path for the new PDF file).
try {
ImageIO.write(bufferedImage, "tiff", sourceTiffFile);
} catch (Exception ioe) {
//handling code...
}
TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0);
TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0));
int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);
I would really like to avoid creating physical files, but am not sure if it is possible with this API. Ideally, I would like to pass the Tiff as a java.awt.image.BufferedImage or a byte array and receive the output PDF as a byte array.
Any suggestions would be most welcome as always. Thank you :)
You can pass a Pix, which can be converted from a BufferedImage, to the ProcessPage API method, but the output will still be a physical file. The Tesseract API dictates that.
https://tesseract-ocr.github.io/tessapi/4.0.0/a01625.html
http://tess4j.sourceforge.net/docs/docs-4.4/net/sourceforge/tess4j/TessAPI1.html
For example:
int result = TessAPI1.TessBaseAPIProcessPage(handle, LeptUtils.convertImageToPix(bufferedImage), page_index, "input file name", null, 0, renderer);

GIF image only partially displayed

I got a strange issue with a GIF image in Java. The image is provided by an XML API as Base64 encoded string. To decode the Base64, I use the commons-codec library in version 1.13.
When I just decode the Base64 string and write the bytes out to a file, the image shows properly in browsers and MS Paint (nothing else to test here).
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
File sigGif = new File("C:/Temp/pod_1Z12345E5991872040.org.gif");
try (FileOutputStream fos = new FileOutputStream(sigGif)) {
fos.write(sigImg);
fos.flush();
}
The resulting file opened in MS Paint:
But when I now start consuming this file using Java (for example creating a PDF document from HTML using the openhtmltopdf library), it is corrupted and does not show properly.
final String htmlLetterStr = "[HTML as provided by API]";
final Document doc = Jsoup.parse(htmlLetterStr);
try (FileOutputStream fos = new FileOutputStream(new File("C:/Temp/letter_1Z12345E5991872040.pdf"))) {
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "file:///C:/Temp/");
builder.toStream(fos);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.run();
fos.flush();
}
When I now open the resulting PDF, the image created above looks like this. It seems that only the first pixel lines are printed, some layer is missing, or something like that.
The same happens, if I read the image again with ImageIO and try to convert it into PNG. The resulting PNG looks exactly the same as the image printed in the PDF document.
How can I get the image to display properly in the PDF document?
Edit:
Link to original GIF Base64 as provided by API: https://pastebin.com/sYJv6j0h
As @haraldK pointed out in the comments, the GIF file provided via the XML API does not conform to the GIF standard and thus cannot be parsed by Java's ImageIO API.
Since there does not seem to exist a pure Java tool to repair the file, the workaround I came up with now is to use ImageMagick via Java's Process API. Calling the convert command with the -coalesce option will parse the broken GIF and create a new one that does conform to the GIF standard.
// Decode broken GIF image and write to disk
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
Path gifPath = Paths.get("C:/Temp/pod_1Z12345E5991872040.tmp.gif");
if (!Files.exists(gifPath)) {
Files.createFile(gifPath);
}
Files.write(gifPath, sigImg, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
// Use the Java Process API to call ImageMagick (on Linux you would use the 'convert' binary)
ProcessBuilder procBuild = new ProcessBuilder();
procBuild.command("C:\\Program Files\\ImageMagick-7.0.9-Q16\\magick.exe", "C:\\Temp\\pod_1Z12345E5991872040.tmp.gif", "-coalesce", "C:\\Temp\\pod_1Z12345E5991872040.gif");
Process proc = procBuild.start();
// Wait for ImageMagick to complete its work
proc.waitFor();
The newly created file can be read by Java's ImageIO API and be used as expected.

imageIO to open .HDR file

I need to open an .hdr file and work on it, but ImageIO doesn't support that format.
The problem is that I need to keep information loss to a minimum: 32 bpc is perfect, 16 is fine, and less than 16 won't work.
I came up with three possible solutions:
Find a plugin that allows me to open .HDR files. I've searched for one a lot, but without luck;
Find a way to convert the .HDR file to a format I can find a plugin for. TIFF maybe? I tried this too, but still no luck;
Reduce the dynamic range from 32 bpc to 16 bpc and then convert it to PNG. This is tricky: once I have a PNG file I win, but it's not easy to cut the range without killing the image.
What would you recommend? Do you know a way to make one of those three options work? Or do you have a better idea?
You can now read .HDR using ImageIO. :-)
This is a first version, so it might be a little rough around the edges, but should work for standard (default settings) Radiance RGBE .HDR files.
The returned image will be a custom BufferedImage with a DataBufferFloat backing (i.e., each pixel is stored as three interleaved 32-bit float RGB samples).
By default, a simple global tone-mapping is applied, and all RGB values will be normalized to range [0...1] (this allows anyone to just use ImageIO.read(hdrFile) and the image will look somewhat reasonable, in a very reasonable time).
It is also possible to pass an HDRImageReadParam to the ImageReader instance with a NullToneMapper. This is even faster, but the float values will be unnormalized, and might exceed the max value. This allows you to do custom, more sophisticated tone-mapping on the image data, before converting to something more displayable.
Something like:
// Create input stream
ImageInputStream input = ImageIO.createImageInputStream(hdrFile);
// Declared outside the try block so the image is still in scope after the stream is closed
BufferedImage image;
try {
// Get the reader
Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
if (!readers.hasNext()) {
throw new IllegalArgumentException("No reader for: " + hdrFile);
}
ImageReader reader = readers.next();
try {
reader.setInput(input);
// Disable default tone mapping
HDRImageReadParam param = (HDRImageReadParam) reader.getDefaultReadParam();
param.setToneMapper(new NullToneMapper());
// Read the image, using settings from param
image = reader.read(0, param);
}
finally {
// Dispose reader in finally block to avoid memory leaks
reader.dispose();
}
}
finally {
// Close stream in finally block to avoid resource leaks
input.close();
}
// Get float data
float[] rgb = ((DataBufferFloat) image.getRaster().getDataBuffer()).getData();
// TODO: Custom tone mapping on float RGB data
// Convert the image to something easily displayable
BufferedImage converted = new ColorConvertOp(null).filter(image, new BufferedImage(image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB));
// Optionally write as JPEG or other format
ImageIO.write(converted, "JPEG", new File(...));

How to know if the image exists or not by reading URL in Java?

http://nichehire.com/Nichehire/upload/img_job_photo_burhan393#gmail.com.jpg
I have a requirement to know whether an image exists or not by the given URL.
For that I am using the code below. It returns true if no error occurs, and false if the URL doesn't point to an image.
But the URL given above doesn't point to an image, and the code still returns true.
Is something wrong in my code?
public static boolean exists(String URLName) {
boolean result = false;
try {
InputStream input = (new URL(URLName)).openStream();
result = true;
} catch (IOException ex) {
System.out.println("Image does not exist:");
}
return result;
}
..whether an image exists or not..
ImageIO.read(URL)
This is really the only way to ensure that:
The URL points to something (not returning HTTP 404)
The server allows access to the resource (e.g. not returning HTTP 403)
The data at the end of the URL actually represents an image (as understood by Java) as opposed to being a text file renamed to some.gif.
[SOLVED] This source code works fine!
String url1 = "https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-xft1/t39.1997-6/p200x200/851575_126362190881911_254357215_n.png";
Image image = ImageIO.read(new URL(url1));
if(image != null){
System.out.println("IMAGE");
}else{
System.out.println("NOT IMAGE");
}
try
boolean isImage(String image_path){
Image image = new ImageIcon(image_path).getImage();
if(image.getWidth(null) == -1){
return false;
}
else{
return true;
}
}
or
The ImageMagick project has facilities to identify images, and there's a Java wrapper for ImageMagick called JMagick, which you may want to consider instead of reinventing the wheel.
I use ImageMagick all the time, including its "identify" feature from the command line, and it has never once failed to identify a picture. Back when I absolutely needed that feature and JMagick didn't exist yet, I used Runtime.exec() to call ImageMagick's identify command from Java, and it worked perfectly. Now that JMagick exists, this is probably no longer necessary (but I haven't tried JMagick yet).
Note that it gives much more than just the format, for example:
$ identify tmp3.jpg
tmp3.jpg JPEG 1680x1050 1680x1050+0+0 DirectClass 8-bit 293.582kb
$ identify tmp.png
tmp.png PNG 1012x900 1012x900+0+0 DirectClass 8-bit 475.119kb
That's because you're assuming that any returned response is an image, although your URL could be returning anything, HTML, JSON .. anything.

Test if a file is an image file

I am using some file IO and want to know if there is a method to check if a file is an image?
This works pretty well for me. Hope I could help
import javax.activation.MimetypesFileTypeMap;
import java.io.File;
class Untitled {
public static void main(String[] args) {
String filepath = "/the/file/path/image.jpg";
File f = new File(filepath);
String mimetype= new MimetypesFileTypeMap().getContentType(f);
String type = mimetype.split("/")[0];
if(type.equals("image"))
System.out.println("It's an image");
else
System.out.println("It's NOT an image");
}
}
if( ImageIO.read(*here your input stream*) == null)
*IS NOT IMAGE*
And also there is an answer: How to check a uploaded file whether it is a image or other file?
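The ImageIO check above can be made concrete along these lines. This is a minimal sketch with hypothetical names (ImageIoCheckSketch, decodes, formatName) that works on a byte array for simplicity. Alongside the full decode via ImageIO.read, it shows a lighter variant that only asks ImageIO which registered reader claims the data, without decoding the pixels. Both variants only recognize formats for which an ImageIO reader is installed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

class ImageIoCheckSketch {

    // Full decode: true only if some registered reader can decode the data.
    // ImageIO.read returns null when no reader recognizes the input.
    static boolean decodes(byte[] data) throws IOException {
        return ImageIO.read(new ByteArrayInputStream(data)) != null;
    }

    // Header check only: returns the format name of the first reader that
    // claims the data (e.g. "png"), or null if no reader claims it.
    static String formatName(byte[] data) throws IOException {
        try (ImageInputStream in =
                     ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            if (in == null) return null;
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) return null;
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                // Release any resources held by the reader
                reader.dispose();
            }
        }
    }
}
```

The header check is cheaper for large files, but a file can pass it and still fail to decode, so decodes is the stricter test.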
In Java 7, there is the java.nio.file.Files.probeContentType() method. On Windows, this uses the file extension and the registry (it does not probe the file content). You can then check the first part of the MIME type, i.e. whether it is of the form image/<X>.
You may try something like this:
String pathname = "abc\\xyz.png";
File file=new File(pathname);
String mimetype = Files.probeContentType(file.toPath());
//mimetype should be something like "image/png"
if (mimetype != null && mimetype.split("/")[0].equals("image")) {
System.out.println("it is an image");
}
You may try something like this:
import javax.activation.MimetypesFileTypeMap;
File myFile = new File("image.png"); // hypothetical path, replace with your own file
String mimeType = new MimetypesFileTypeMap().getContentType(myFile);
// mimeType should now be something like "image/png"
if(mimeType.substring(0,5).equalsIgnoreCase("image")){
// it's an image
}
this should work, although it doesn't seem to be the most elegant version.
There are a variety of ways to do this; see other answers and the links to related questions. (The Java 7 approach seems the most attractive to me, because it uses platform specific conventions by default, and you can supply your own scheme for file type determination.)
However, I'd just like to point out that no mechanism is entirely infallible:
Methods that rely on the file suffix will be tricked if the suffix is non-standard or wrong.
Methods that rely on file attributes (e.g. in the file system) will be tricked if the file has an incorrect content type attribute or none at all.
Methods that rely on looking at the file signature can be tricked by binary files which just happen to have the same signature bytes.
Even simply attempting to read the file as an image can be tricked if you are unlucky ... depending on the image format(s) that you try.
Other answers suggest loading the full image into memory (ImageIO.read) or using standard JDK methods (MimetypesFileTypeMap and Files.probeContentType).
The first approach is inefficient if you don't need the decoded image and only want to test whether the file is an image (and perhaps store its content type, to set it in the Content-Type response header when the image is served later).
The built-in JDK approaches usually just check the file extension, so you can't really trust the result.
What works for me is the Apache Tika library.
private final Tika tika = new Tika();
private MimeType detectImageContentType(InputStream inputStream, String fileExtension) {
Assert.notNull(inputStream, "InputStream must not be null");
String fileName = fileExtension != null ? "image." + fileExtension : "image";
MimeType detectedContentType = MimeType.valueOf(tika.detect(inputStream, fileName));
log.trace("Detected image content type: {}", detectedContentType);
if (!validMimeTypes.contains(detectedContentType)) {
throw new InvalidImageContentTypeException(detectedContentType);
}
return detectedContentType;
}
The type detection is based on the content of the given document stream and the name of the document. Only a limited number of bytes are read from the stream.
I pass fileExtension only as a hint for Tika. It works without it, but according to the documentation it improves detection in some cases.
The main advantage of this method over ImageIO.read is that Tika doesn't read the full file into memory, only the first bytes.
The main advantage over the JDK's MimetypesFileTypeMap and Files.probeContentType is that Tika actually reads the first bytes of the file, while the JDK only checks the file extension in its current implementation.
TLDR
If you plan to do something with the image after reading it (such as resizing, cropping, or rotating it), use ImageIO.read from Krystian's answer.
If you just want to check (and maybe store) the real Content-Type, use Tika (this answer).
If you work in a trusted environment and are 100% sure that the file extension is correct, use Files.probeContentType from prunge's answer.
Here's my code based on the answer using tika.
private static final Tika TIKA = new Tika();
public boolean isImageMimeType(File src) {
try (FileInputStream fis = new FileInputStream(src)) {
String mime = TIKA.detect(fis, src.getName());
return mime.contains("/")
&& mime.split("/")[0].equalsIgnoreCase("image");
} catch (IOException e) {
throw new RuntimeException(e);
}
}
