I have an EBCDIC file from which I extracted images. However, the images contain some data that is the key source for identifying my transactions.
Assume that I have an image (the Stack Overflow logo) stored as "img1.jpg" on my desktop. When I read it using the following code, it works:
String inputImage = "C:\\Desktop\\img1.jpg";
File imageFile = new File(inputImage);
BufferedImage image1 = ImageIO.read(imageFile);
System.out.println(image1);
However, when I attempt the same with an image decoded from the EBCDIC conversion, ImageIO.read returns null.
The difference I observed is that the decoded image has no color information. Is there any way to read these images and retrieve the text on them? The following is not the exact image I am working on; it is a sample from the internet, shared just to give an idea.
Note: The image I am working on looks like a scanned image (grayscale).
Example:
Also, I observed that if I open the decoded file, take a screen capture with the Snipping Tool, and save it as a JPG file (the original is already a JPG), the system reads that file. I am not sure where the issue is: compression, color coding, or format.
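One way to narrow this down: ImageIO.read returns null when none of the registered image readers recognizes the data, so asking ImageIO which readers accept the bytes shows whether the container format is the problem. The sketch below is only a diagnostic, not a fix, and the names ImageFormatProbe and detectFormats are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical helper: lists the ImageIO formats that recognize a byte array.
final class ImageFormatProbe {

    /** Returns the format names of all registered readers that accept the bytes. */
    static List<String> detectFormats(byte[] imageBytes) throws IOException {
        List<String> formats = new ArrayList<>();
        // Wrap the bytes in an in-memory stream so nothing is written to disk.
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            while (readers.hasNext()) {
                formats.add(readers.next().getFormatName());
            }
        }
        // An empty list is exactly the situation in which ImageIO.read returns null.
        return formats;
    }
}
```

Calling detectFormats(Files.readAllBytes(imageFile.toPath())) on the decoded file and getting an empty list would suggest that the extracted bytes are not in a format the installed readers support, whatever the file extension says. Note that ImageIO only ships a TIFF reader from Java 9 onward.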
Thank you, everyone.
I used Tess4J to read the text from the TIFF image. Unfortunately, the information I was looking for isn't in the extracted text, but the POC is done.
I used the following library and placed eng.traineddata in the folder where the images are:
import net.sourceforge.tess4j.*;
String inputImage = "C:\\Desktop\\img1.tiff";
File imageFile = new File(inputImage);
ITesseract imageRead = new Tesseract();
imageRead.setDataPath("C:\\Desktop\\");
imageRead.setLanguage("eng");
String imageText = imageRead.doOCR(imageFile);
System.out.println(imageText);
Related
I am using the Tesseract Java API (tess4J) to convert Tiff images to PDFs.
This works nicely, but I am forced to write both the source Tiff image and the output PDF to local filestore as actual physical files in order to use the TessAPI1.TessPDFRendererCreate API.
Please note the following about the code snippet below:
The input Tiff is originally a java.awt.image.BufferedImage, but I have to write it to a physical file (sourceTiffFile is a File object).
I must specify a file path for the output (pdfFullFilepath is a String representing an absolute path for the new PDF file).
try {
ImageIO.write(bufferedImage, "tiff", sourceTiffFile);
} catch (Exception ioe) {
//handling code...
}
TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0);
TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0));
int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);
I would really like to avoid creating physical files, but am not sure if it is possible with this API. Ideally, I would like to pass the Tiff as a java.awt.image.BufferedImage or a byte array and receive the output PDF as a byte array.
Any suggestions would be most welcome as always. Thank you :)
You can pass a Pix, which can be converted from a BufferedImage, to the ProcessPage API method, but the output will still be a physical file; the Tesseract API dictates that.
https://tesseract-ocr.github.io/tessapi/4.0.0/a01625.html
http://tess4j.sourceforge.net/docs/docs-4.4/net/sourceforge/tess4j/TessAPI1.html
For example:
int result = TessAPI1.TessBaseAPIProcessPage(handle, LeptUtils.convertImageToPix(bufferedImage), page_index, "input file name", null, 0, renderer);
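Since the output has to be a file, one workaround is to let the renderer write into a temporary directory, read the result back as a byte array, and delete it straight away. The sketch below uses only the JDK. RenderStep is a hypothetical callback standing in for the Tess4J calls, and the sketch assumes that the renderer treats the path it is given as a base name and appends ".pdf" to it, which is how Tesseract's PDF renderer behaves as far as I know.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: captures a file-only renderer's output as a byte array.
final class TempPdfCapture {

    /** Stand-in for the Tess4J rendering calls; receives the output base path. */
    interface RenderStep {
        void renderTo(String outputBase) throws IOException;
    }

    static byte[] capture(RenderStep step) throws IOException {
        Path dir = Files.createTempDirectory("ocr-pdf");
        try {
            // Assumption: the renderer writes "<outputBase>.pdf".
            step.renderTo(dir.resolve("out").toString());
            return Files.readAllBytes(dir.resolve("out.pdf"));
        } finally {
            // Remove everything the renderer left behind, then the directory itself.
            try (DirectoryStream<Path> leftovers = Files.newDirectoryStream(dir)) {
                for (Path p : leftovers) {
                    Files.delete(p);
                }
            }
            Files.delete(dir);
        }
    }
}
```

The callback would create the renderer with the given base path, call TessBaseAPIProcessPage, and finish and release the renderer before returning, so that the file is complete when it is read. The file still touches the disk briefly, but the caller only ever sees a byte array.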
I got a strange issue with a GIF image in Java. The image is provided by an XML API as Base64 encoded string. To decode the Base64, I use the commons-codec library in version 1.13.
When I just decode the Base64 string and write the bytes out to a file, the image shows properly in browsers and MS Paint (nothing else to test here).
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
File sigGif = new File("C:/Temp/pod_1Z12345E5991872040.org.gif");
try (FileOutputStream fos = new FileOutputStream(sigGif)) {
fos.write(sigImg);
fos.flush();
}
The resulting file opened in MS Paint:
But when I now start consuming this file using Java (for example creating a PDF document from HTML using the openhtmltopdf library), it is corrupted and does not show properly.
final String htmlLetterStr = "[HTML as provided by API]";
final Document doc = Jsoup.parse(htmlLetterStr);
try (FileOutputStream fos = new FileOutputStream(new File("C:/Temp/letter_1Z12345E5991872040.pdf"))) {
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "file:///C:/Temp/");
builder.toStream(fos);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.run();
fos.flush();
}
When I now open the resulting PDF, the image created above looks like this. It seems that only the first pixel lines are printed, some layer is missing, or something like that.
The same happens, if I read the image again with ImageIO and try to convert it into PNG. The resulting PNG looks exactly the same as the image printed in the PDF document.
How can I get the image to display properly in the PDF document?
Edit:
Link to original GIF Base64 as provided by API: https://pastebin.com/sYJv6j0h
As @haraldK pointed out in the comments, the GIF file provided via the XML API does not conform to the GIF standard and thus cannot be parsed by Java's ImageIO API.
Since there does not seem to be a pure Java tool to repair the file, my workaround is to call ImageMagick via Java's Process API. Running the convert command with the -coalesce option parses the broken GIF and creates a new one that conforms to the GIF standard.
// Decode broken GIF image and write to disk
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
Path gifPath = Paths.get("C:/Temp/pod_1Z12345E5991872040.tmp.gif");
// Files.write creates the file if it is missing and truncates it otherwise
Files.write(gifPath, sigImg);
// Use the Java Process API to call ImageMagick (on Linux you would use the 'convert' binary)
ProcessBuilder procBuild = new ProcessBuilder();
procBuild.command("C:\\Program Files\\ImageMagick-7.0.9-Q16\\magick.exe", "C:\\Temp\\pod_1Z12345E5991872040.tmp.gif", "-coalesce", "C:\\Temp\\pod_1Z12345E5991872040.gif");
Process proc = procBuild.start();
// Wait for ImageMagick to complete its work
proc.waitFor();
The newly created file can be read by Java's ImageIO API and be used as expected.
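The code above ignores the exit code and would wait forever if ImageMagick hung. A slightly more defensive variant might look like the sketch below, which uses only the JDK. The class and method names (ExternalTool, coalesceCommand, run) are made up for illustration, and the timeout is an arbitrary choice.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for calling an external tool such as ImageMagick.
final class ExternalTool {

    /** Builds the argument list used above; magickExe is the path to the binary. */
    static List<String> coalesceCommand(String magickExe, String brokenGif, String fixedGif) {
        return Arrays.asList(magickExe, brokenGif, "-coalesce", fixedGif);
    }

    /** Runs the command, gives up after the timeout, and returns the exit code. */
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
                .inheritIO() // forward the tool's output so it cannot block on a full pipe
                .start();
        if (!proc.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            proc.destroyForcibly();
            throw new IOException("Timed out: " + command);
        }
        return proc.exitValue();
    }
}
```

A non-zero return value from run means the conversion failed, in which case the output GIF should not be used.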
I am creating a report printer in Java using the LibreOffice SDK and Apache Batik. Using Batik, I draw SVGs which I then insert into a LibreOffice Writer document. The only way I found to insert an image is to load it from disk by path. So far so good, but this means I have to explicitly save the image to disk in order to read it into LibreOffice again.
I tried to use a data URL as the image path, but it did not work. Is there any way to read an image from a stream, or anything else I can use, without storing the file on disk?
I found a solution. It became clear once I realized that all the images I had added were just image links, so I had to embed the images instead.
To use this, you need:
Access to the XComponentContext
A TextGraphicObject in your document (see the links above)
The image as byte[] or use another stream
The code:
Object graphicProviderObject = xComponentContext.getServiceManager().createInstanceWithContext(
"com.sun.star.graphic.GraphicProvider",
xComponentContext);
XGraphicProvider xGraphicProvider = UnoRuntime.queryInterface(
XGraphicProvider.class, graphicProviderObject);
PropertyValue[] v = new PropertyValue[1];
v[0] = new PropertyValue();
v[0].Name = "InputStream";
v[0].Value = new ByteArrayToXInputStreamAdapter(imageAsByteArray);
XGraphic graphic = xGraphicProvider.queryGraphic(v);
if (graphic == null) {
LOGGER.error("Error loading the image");
return;
}
XPropertySet xProps = UnoRuntime.queryInterface(
XPropertySet.class, textGraphicObject);
// Set the image
xProps.setPropertyValue("Graphic", graphic);
This worked effortlessly, even for my SVG images.
Source: https://blog.oio.de/2010/05/14/embed-an-image-into-an-openoffice-org-writer-document/
I want to read a .mnc file I've downloaded from http://brainweb.bic.mni.mcgill.ca/brainweb/.
This file contains simulated MRI scan data of a human brain. I want to read the file using Java code and convert it into a BufferedImage object so that I can process it.
Bio-Formats can read MINC MRI files.
Use BF.openImagePlus(String path) to open the file as an ImagePlus:
ImagePlus imp = BF.openImagePlus(path);
You can then use ImageJ to work with the pixel data (i.e. using an ImageProcessor or, if necessary, a BufferedImage).
I need to upload an image file and generate a thumbnail for the uploaded file in my JSF web application. The original image is stored on the server in /home/myname/tomcat/webapps/uploads, while the thumbnail is stored in /home/myname/tomcat/webapps/uploads/thumbs. I'm using the thumbnail generator class I copied from philreeve.com.
I have successfully uploaded the file with help from BalusC. But using Toolkit.getImage(), I can't access the image.
I used the uploaded file's absolute path, like so:
inFilename = file.getAbsolutePath();
The relevant code from the thumbnail generator is:
public static String createThumbnail(String inFilename, String outFilename, int largestDimension) {
...
Image inImage = Toolkit.getDefaultToolkit().getImage(inFilename);
if (inImage.getWidth(null) == -1 || inImage.getHeight(null) == -1) {
return "Error loading file: \"" + new File(inFilename).getAbsolutePath() + "\"";
}
...
}
Since I am already using the absolute path, I don't understand why it is not working. I have also used the following values for inFilename, but I always get the "Error loading file...".
/home/myname/tomcat/webapps/uploads/filename.ext
/uploads/filename.ext
But I did check the directory, and the image is there. (I uploaded using /home/myname/tomcat/webapps/uploads/filename.ext, and it works.) What is the correct path for the image in that directory? Thank you.
Update
I got the code to work by using:
Image inImage = ImageIO.read(new File(inFilename));
I still don't understand why Toolkit.getImage() does not work though.
Are you sure it's a JPEG file? Use an image viewer to make sure nothing bad happened to the file during upload (or that it was an image to begin with).
Also, use new File(inFilename).exists() to make sure the path is correct. I also suggest printing new File(inFilename).getAbsolutePath() in error messages, because relative paths can hurt you.
That said, the rest of the code looks correct.
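The checks suggested above could be collected in one small helper that runs before the image is loaded. This is only a sketch; ImagePathCheck and describeProblem are hypothetical names, and the empty-file check is an extra assumption about what counts as a bad upload.

```java
import java.io.File;

// Hypothetical helper: explains why a path cannot be loaded as an image file.
final class ImagePathCheck {

    /** Returns null if the file looks loadable, otherwise a message with the absolute path. */
    static String describeProblem(String inFilename) {
        File file = new File(inFilename);
        // Always report the absolute path so relative paths cannot mislead.
        String where = "\"" + file.getAbsolutePath() + "\"";
        if (!file.exists()) {
            return "File not found: " + where;
        }
        if (!file.isFile()) {
            return "Not a regular file: " + where;
        }
        if (!file.canRead()) {
            return "File is not readable: " + where;
        }
        if (file.length() == 0) {
            return "File is empty: " + where;
        }
        return null;
    }
}
```

If describeProblem returns null and the image still fails to load, the path is not the issue and the file contents or the loading API are the next things to look at.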
The problem is that Toolkit.getImage() does not return the image immediately. The issue is well-described in this bug report, a relevant extract of which is here:
This is not a bug. The submitter is not properly using the asynchronous
Image API correctly. He assumes that getImage loads all of the image's bits
into memory. However, it is well documented that the actual loading of
bits does not take place until a call to Component.prepareImage or
Graphics.drawImage. In addition, these two functions return before the
Image is fully loaded. Developers are required to install an ImageObserver
to listen for notification that the Image has been fully loaded. Once they
receive this notification, they can repaint the Image.
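To keep using Toolkit.getImage() but still get a fully loaded image, java.awt.MediaTracker can act as the observer and block until loading has finished. The following is a minimal sketch of that idea; BlockingImageLoader and loadFully are made-up names, and the Canvas exists only because MediaTracker needs some Component.

```java
import java.awt.Canvas;
import java.awt.Image;
import java.awt.MediaTracker;
import java.awt.Toolkit;
import java.io.IOException;

// Hypothetical helper: loads an image via Toolkit and waits until it is complete.
final class BlockingImageLoader {

    static Image loadFully(String path) throws IOException, InterruptedException {
        // getImage returns immediately; no pixels have been loaded yet.
        Image image = Toolkit.getDefaultToolkit().getImage(path);

        // MediaTracker starts the loading and observes it on our behalf.
        MediaTracker tracker = new MediaTracker(new Canvas());
        tracker.addImage(image, 0);
        tracker.waitForID(0); // blocks until the image is loaded or has failed

        if (tracker.isErrorID(0)) {
            throw new IOException("Could not load image: " + path);
        }
        return image;
    }
}
```

After loadFully returns, getWidth(null) and getHeight(null) report the real dimensions instead of -1. For server-side code such as the thumbnail generator, ImageIO.read remains the simpler choice because it is synchronous to begin with.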
I found that the answer to this question works well:
Image image = new ImageIcon(this.getClass().getResource("/images/bell-icon16.png")).getImage();