Feed tesseract (Tess4J) from ImageMagick (JMagick) - java

I'm trying to create a Java program that will OCR images in many formats. The images cannot be read directly from a file, because their bytes are sent over the network.
I'm currently able to read the raw bytes of the image pixels using ImageIO. However, I would like to support all the formats that ImageMagick supports, so I want to read the image using JMagick and then give the raw bytes to Tess4J. I'm not sure how I should approach this. I found that this function can give me bytes:
PixelPacket[] MagickImage.getColormap();
But I would have to write a special method for transforming the obtained PixelPacket objects into consecutive bytes. I can do that, but maybe there's a better way? For example, maybe there's some extremely raw file format (even more so than http://en.wikipedia.org/wiki/BMP_file_format#mediaviewer/File:BMPfileFormat.png) that I could use, for example in this method:
byte[] imageToBlob(ImageInfo imageInfo) ?
The imageInfo object would have to point to this raw format, and then I could cut the pixel information out of the byte array.
Is this the proper way, or should I use something simpler (faster/more robust)?
Edit
I found the format I had in mind is called PNM.
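For what it's worth, cutting the pixels out of a binary PPM (the P6 flavour of PNM) takes very little code. Below is a minimal sketch, assuming the byte array really is an 8-bit binary PPM. How JMagick is told to emit that format is not shown here, and the class name PpmPixels is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical holder for the raster of an 8-bit binary PPM (P6) blob. */
final class PpmPixels {
    final int width;
    final int height;
    final byte[] rgb; // width * height * 3 bytes, row-major, in R, G, B order

    private PpmPixels(int width, int height, byte[] rgb) {
        this.width = width;
        this.height = height;
        this.rgb = rgb;
    }

    static PpmPixels parse(byte[] blob) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        String magic = nextToken(in);
        if (!"P6".equals(magic)) {
            throw new IOException("Not a binary PPM, magic was: " + magic);
        }
        int width = Integer.parseInt(nextToken(in));
        int height = Integer.parseInt(nextToken(in));
        int maxVal = Integer.parseInt(nextToken(in));
        if (maxVal > 255) {
            throw new IOException("This sketch only handles 8-bit samples");
        }
        // The raster starts right after the single whitespace byte that
        // terminates the max-value token (nextToken has already consumed it).
        byte[] rgb = new byte[width * height * 3];
        in.readFully(rgb);
        return new PpmPixels(width, height, rgb);
    }

    /** Reads one whitespace-delimited header token, skipping '#' comments. */
    private static String nextToken(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '#') {
                // A comment runs to the end of the line.
                while ((c = in.read()) != -1 && c != '\n') { /* skip */ }
                if (sb.length() > 0) break;
                continue;
            }
            if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break; // token finished
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

The resulting rgb array, together with width and height, is the kind of raw buffer an OCR call would need. Whether Tess4J wants exactly this layout is something to check against its documentation.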

I think the dispatchImage method is what you are looking for if you are using JMagick. It will give you access to the raw pixels of the image directly. No file format required.
See my MagickUtil class for examples, or just use that class if you like.
I've also written pure Java ImageIO plugins for many of the same formats that JMagick supports, which might be of use. You'll find them in my GitHub repository.

Related

No reader matches PNG-Stream in Java.ImageIO

I'm trying to read the meta-data of a PNG file with java following the solution proposed here.
But the method ImageIO.getImageReaders(inputStream) is returning an empty list of readers.
I assured that the stream is correct by reading it via ImageIO.read and rendering the resulting Image to the screen.
And this is why I'm confused: since ImageIO.read returns a valid image, I assume there is some ImageReader claiming to be able to interpret this stream. Is there a difference between interpreting image data and the meta-data of the image?
Any hints or even solutions to this problem?
Thank you very much.
I believe that ImageIO.getImageReaders() expects an ImageInputStream. You can try to create one from your InputStream using createImageInputStream. I guess that's what ImageIO.read(InputStream) does under the hood.
Anyway, if you already know that you have a PNG, why not use getImageReadersByFormatName("png") ?
BTW: height and width (and color model, etc.) can be considered "image metadata", in the sense that they are not part of the pixel values (which would be the real data), but in common parlance they are regarded rather as (essential) image properties. The image metadata is generally (and specifically in IIOMetadata) understood to be additional "miscellaneous" data (such as physical resolution or timestamp) which is normally not needed to access the image data.
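The suggestion above can be sketched as follows: wrap the InputStream in an ImageInputStream, ask ImageIO for a matching reader, and pull the IIOMetadata from it. This is only a sketch; the class and method names (MetadataLookup, readMetadata) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;

final class MetadataLookup {
    /**
     * Returns the metadata of the first image in the stream,
     * or null if no installed reader claims the stream.
     */
    static IIOMetadata readMetadata(InputStream raw) throws IOException {
        // getImageReaders needs an ImageInputStream, not a plain InputStream.
        try (ImageInputStream iis = ImageIO.createImageInputStream(raw)) {
            if (iis == null) {
                return null; // no stream provider could wrap the input
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                return reader.getImageMetadata(0);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

If the format is already known, ImageIO.getImageReadersByFormatName("png") can replace the getImageReaders call; the rest stays the same.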

ImageIO - get image type and exif data

Given some source file (or, more generically, an input stream), I need to:
find out whether it is an image
if it is an image, retrieve its type (png/jpeg/gif/etc.)
retrieve the Exif data, if available
I looked at the API, but it is not clear how to get the type of image or Exif data.
Last time I had to do this, a couple of years ago, the standard API couldn't read EXIF data. This library can do so though:
http://www.drewnoakes.com/code/exif/
Easy answer:
Use https://github.com/drewnoakes/metadata-extractor/
If you're crazy/brave/curious, you could get image type from the stream by reading the first few bytes (these are magic numbers). I believe the exif is generally at the start of the stream too.
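For the curious route, here is a minimal sketch of magic-number sniffing. It only knows four well-known signatures (PNG, JPEG, GIF, BMP), and the class name MagicNumbers is made up for illustration. A library such as metadata-extractor covers far more cases.

```java
final class MagicNumbers {
    /** Returns a short format name based on the leading bytes, or null if unknown. */
    static String sniff(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(head, 'G', 'I', 'F', '8')) return "gif";
        if (startsWith(head, 'B', 'M')) return "bmp";
        return null;
    }

    /** Compares the first bytes of data against the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading the first eight bytes of the stream is enough for these four signatures. Remember to mark and reset the stream (or use a PushbackInputStream) if the same stream is decoded afterwards.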
It's an old thread, but I was doing this recently and found the Apache Tika library useful.
Particularly for analysing generic streams to detect what content is in them.
Thought it might help others.
http://tika.apache.org/

Pure Java alternative to JAI ImageIO for detecting CMYK images

First, I'd like to explain the situation/requirements that led to the question:
In our web application we can't support CMYK images (JPEG) since IE 8 and below can't display them.
Thus we need to detect when someone wants to upload such an image and deny it.
Unfortunately, Java's ImageIO won't read those images or would not enable me to get the detected color space. From debugging it seems like JPEGImageReader internally gets the color space code 11 (which would mean JCS_YCCK) but I can't safely access that information.
When querying the reader for the image types I get nothing for CMYK, so I might assume no image types = unsupported image.
I converted the source CMYK image to RGB using an imaging tool in order to test whether it would then be readable (I tried to simulate the admin's steps when getting the message "No CMYK supported"). However, JPEGImageReader would not read that image, since it assumes (comment in the source!) a 3-component RGB color space but the image header reports 4 components (maybe RGBA or ARGB), and thus an IllegalArgumentException is thrown.
Thus, ImageIO is not an option since I can't reliably get the color space of an image and I can't tell the admin why an otherwise fine image (it can be displayed by the browser) would not be accepted due to some internal error.
This led me to try JAI ImageIO whose CLibJPEGImageReader does an excellent job and correctly reads all my test images.
However, since we're deploying our application in a JBoss that might host other applications as well, we'd like to keep them as isolated as possible. AFAIK, I'd need to install JAI ImageIO to the JRE or otherwise make the native libs available in order to use them, and thus other applications might get access to them as well, which might cause side effects (at least we'd have to test a lot to ensure that's not the case).
That's the explanation for the question, and here it comes again:
Is there any pure Java alternative to JAI ImageIO which reliably detects and possibly converts CMYK images?
Thanks in advance,
Thomas
I found a solution that is OK for our needs: Apache Commons Sanselan. This library reads JPEG headers quite quickly and accurately (at least for all my test images), as well as a number of other image formats.
The downside is that it won't read JPEG image data, but I can do that with the basic JRE tools.
Reading JPEG images for conversion is quite easy (the ones that ImageIO refuses to read, too):
JPEGImageDecoder decoder = JPEGCodec.createJPEGDecoder(new FileInputStream( new File(pFilename) ) );
BufferedImage sourceImg = decoder.decodeAsBufferedImage();
Then if Sanselan tells me the image is actually CMYK, I get the source image's raster and convert myself:
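WritableRaster raster = sourceImg.getRaster(); // needs java.awt.image.WritableRaster
BufferedImage rgbImg = new BufferedImage(raster.getWidth(), raster.getHeight(), BufferedImage.TYPE_INT_RGB);
int[] pixel = new int[4]; // each pixel in the raster is represented as int[4]
for (int y = 0; y < raster.getHeight(); y++) {
for (int x = 0; x < raster.getWidth(); x++) {
raster.getPixel(x, y, pixel);
double k = pixel[3] / 255.0;
int r = (int) Math.round((255.0 - pixel[0]) * k);
int g = (int) Math.round((255.0 - pixel[1]) * k);
int b = (int) Math.round((255.0 - pixel[2]) * k);
rgbImg.setRGB(x, y, (r << 16) | (g << 8) | b); // store the converted pixel
}
}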
This gives quite good results, with the RGB images being neither too bright nor too dark. However, I'm not sure why multiplying by k prevents the brightening. The JPEG is actually decoded in native code and the CMYK->RGB conversion I found states something different; I just tried the multiplication to see the visual result.
If anybody could shed some light on this, I'd be grateful.
I've posted a pure Java solution for reading all sorts of JPEG images and converting them to RGB.
It's built on the following facts:
While ImageIO cannot read JPEG images with CMYK as a buffered image, it can read the raw pixel data (raster).
Sanselan (or Apache Commons Imaging as it's called now) can be used to read the details of CMYK images.
There are images with inverted CMYK values (an old Photoshop bug).
There are images with YCCK instead of CMYK (can easily be converted).
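To make the "inverted values" point concrete, here is a sketch of the naive, non-color-managed CMYK to RGB formula, R = 255 * (1 - C) * (1 - K), with a flag for files that store inverted samples. This is not the code from the linked solution. The class name NaiveCmyk and the inverted flag are made up for illustration, and a proper conversion would go through an ICC profile.

```java
final class NaiveCmyk {
    /**
     * Converts one CMYK sample (each component 0..255, where 255 means full ink)
     * to a packed 0xRRGGBB value using the naive formula.
     * If inverted is true, the stored values are assumed to be 255 - ink,
     * as found in some JPEG files, and are flipped back first.
     */
    static int toRgb(int c, int m, int y, int k, boolean inverted) {
        if (inverted) {
            c = 255 - c;
            m = 255 - m;
            y = 255 - y;
            k = 255 - k;
        }
        int r = Math.round((255 - c) * (255 - k) / 255f);
        int g = Math.round((255 - m) * (255 - k) / 255f);
        int b = Math.round((255 - y) * (255 - k) / 255f);
        return (r << 16) | (g << 8) | b;
    }
}
```

Note that for inverted data the formula collapses to stored color times stored K divided by 255, which is one plausible reason a "multiply by k" variant can look right on such files.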
Be wary of the other answer: Java 7 does not allow using Sun's implementation (import com.sun.image.codec.jpeg.*) directly without special parameters.
In our web application we can't support CMYK images (JPEG) since
IE 8 and below can't display them. Thus we need to detect when someone
wants to upload such an image and deny it.
I don't agree with your "Thus we need to detect when someone wants to upload such an image and deny it". A much more user-friendly policy would be to convert it to something else than CMYK.
The rest of your post is a bit confusing in that regard, since you ask for both detection and conversion, which are two different things. Once again, I think converting the image is much more user-friendly.
No need to write in bold btw:
Is there any pure Java alternative to JAI ImageIO which reliably
detects and possibly converts CMYK images?
I don't know of a pure Java one, but ImageMagick works fine for converting CMYK images to RGB ones. Calling ImageMagick on the server side from Java really isn't complicated. I used to do it manually by calling an external process, but nowadays there are wrappers like JMagick and im4java.

How can I do image manipulation on a very large BMP?

I am trying to do some manipulation (specifically, conversion to a different type or splitting into tiles) on a set of very large (a few GB) BMP image files.
I'm not sure I understand the BMP file format, but is it necessary to load the entire file into memory? I was unable to find any API that didn't require loading the entire file at some point. ImageMagick wasn't able to do it either.
Java would be the best tool of choice for me, but any other solution including command line tools or desktop software would be acceptable.
Based on this, it should be reasonably apparent that you can use the fact that it's row-packed. You should be able to read part of a row, store it, advance to the same position in the next row and repeat until you have completed a tile of the desired size. Obviously you may be able to do multiple tiles at once if you can store an entire row worth of tiles in memory all at once.
It is not necessary to load more than the header of the file at once. The header format is described in the linked Wikipedia entry. It's probably worth paying attention to any compression scheme being used: compression is likely to make this task a bit harder (though still not impossible) :)
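The row-packed layout boils down to a little arithmetic. The sketch below assumes an uncompressed, bottom-up BMP (positive height in the header) with a bit depth that is a multiple of 8. The class and method names are made up for illustration, and pixelDataOffset stands for the pixel-array offset read from the file header.

```java
final class BmpLayout {
    /** Bytes per stored row: rows are padded to a multiple of 4 bytes. */
    static long rowStride(int width, int bitsPerPixel) {
        return ((long) width * bitsPerPixel + 31) / 32 * 4;
    }

    /**
     * File offset of pixel (x, y), with y counted from the top of the image.
     * Bottom-up files store the last image row first.
     */
    static long pixelOffset(long pixelDataOffset, int width, int height,
                            int bitsPerPixel, int x, int y) {
        long stride = rowStride(width, bitsPerPixel);
        long rowFromBottom = height - 1 - y;
        return pixelDataOffset + rowFromBottom * stride + (long) x * bitsPerPixel / 8;
    }
}
```

With these offsets, a RandomAccessFile can seek to the start of a tile's slice in each row and read only tileWidth * bitsPerPixel / 8 bytes, so memory use stays proportional to the tile rather than the image.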
I'm not sure about 2011, when this question was asked, but Java 7 (and possibly 6) has a good approach for this.
Use the ImageReader and ImageReadParam classes; they allow you to read any rectangular part of the source image. Set the source rectangle and read:
... // initialization
private javax.imageio.ImageReadParam m_params;
private javax.imageio.ImageReader m_reader;
... // reading
m_params.setSourceRegion( readRect );
BufferedImage rdImg = m_reader.read( i, m_params );
... // processing/displaying etc
Since BMP is a very plain format without any compression in 99.99% of cases (this also works for any standard Java image type), reading part of it this way is very fast and uses little memory apart from the resulting BufferedImage. You can even reuse the previous BufferedImage on subsequent requests if the rectangle has the same dimensions; see ImageReadParam.setDestination(BufferedImage destination). I haven't tested that option, though.
I also found one small reading bug in the BMP reader class implementation of the Java runtime library. If anybody is interested, I'll show a simple way to correct it.
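The fragment above leaves out the initialization. A self-contained sketch of the same idea might look like this. The class and method names are made up for illustration, and whether the reader really avoids touching the rest of the file depends on the format plugin.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class TileReader {
    /** Reads only the given rectangle of the first image in the file. */
    static BufferedImage readTile(File file, Rectangle region) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(file)) {
            if (iis == null) {
                throw new IOException("Cannot open " + file);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No reader found for " + file);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                ImageReadParam param = reader.getDefaultReadParam();
                param.setSourceRegion(region); // decode just this rectangle
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Calling readTile in a loop over a grid of rectangles and writing each result with ImageIO.write gives a simple tiler whose memory use is bounded by the tile size.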

BarCode Image Generator in Java

How can I create a barcode image in Java? I need something that will allow me to enter a number and produce the corresponding barcode image. Is there a free library available for this type of task?
iText is a great Java PDF library. They also have an API for creating barcodes. You don't need to be creating a PDF to use it.
This page has the details on creating barcodes. Here is an example from that site:
BarcodeEAN codeEAN = new BarcodeEAN();
codeEAN.setCodeType(codeEAN.EAN13);
codeEAN.setCode("9780201615883");
Image imageEAN = codeEAN.createImageWithBarcode(cb, null, null);
The biggest thing you will need to determine is what type of barcode you need. There are many different barcode formats and iText does support a lot of them. You will need to know what format you need before you can determine if this API will work for you.
There is also Barbecue, a free API that you can use to make barcodes in Java.
There is a free library called barcode4j
ZXing is a free open source Java library to read and generate barcode images. You need to get the source code and build the jars yourself. Here's a simple tutorial that I wrote for building with ZXing jars and writing your first program with ZXing.
[http://www.vineetmanohar.com/2010/09/java-barcode-api/]
I use Barbecue; it's great, and supports a very wide range of different barcode formats. See if you like its API.
Sample API:
public static Barcode createCode128(java.lang.String data) throws BarcodeException
Creates a Code 128 barcode that dynamically switches between character sets to give the smallest possible encoding. This will encode all numeric characters, upper and lower case alpha characters and control characters from the standard ASCII character set. The size of the barcode created will be the smallest possible for the given data, and use of this "optimal" encoding will generally give smaller barcodes than any of the other 3 "vanilla" encodings.
