Get Image from the document using Apache POI

I am using Apache POI to read images from a .docx file.
Here is my code:
public Image ReadImg(int imageid) throws IOException {
XWPFDocument doc = new XWPFDocument(new FileInputStream("import.docx"));
BufferedImage jpg = null;
List<XWPFPictureData> pic = doc.getAllPictures();
XWPFPictureData pict = pic.get(imageid);
String extract = pict.suggestFileExtension();
byte[] data = pict.getData();
//try to read image data using javax.imageio.* (JDK 1.4+)
jpg = ImageIO.read(new ByteArrayInputStream(data));
return jpg;
}
It reads the images correctly, but not in the order in which they appear in the document.
For example, if document contains
image1.jpeg
image2.jpeg
image3.jpeg
image4.jpeg
image5.jpeg
It reads
image4
image3
image1
image5
image2
Could you please help me resolve this?
I want to read the images in document order.
Thanks,
Sithik

// Needs java.io.IOException, java.nio.file.Files and java.nio.file.Paths
public static void extractImages(XWPFDocument docx) throws IOException {
    // Note: the order of getAllPictures() is not guaranteed to match the order
    // of the pictures in the document body. For document order, walk the
    // paragraphs and their runs and use XWPFRun.getEmbeddedPictures() instead.
    for (XWPFPictureData pic : docx.getAllPictures()) {
        // Write the original bytes unchanged, so the content matches the
        // extension in the file name (no forced re-encoding as JPG)
        Files.write(Paths.get("D:/imagefromword", pic.getFileName()), pic.getData());
    }
}

Related

Convert PDF to JPG2000 file(s)

I recently started working on a project where I need to convert a PDF file into JPEG2000 files, one .jp2 file per page.
The goal is to replace our previous PDF-to-JPEG converter method in order to reduce the size of the output files.
Based on code I found on the internet, I wrote the PDF-to-JPEG2000 converter method below, and I have been changing the setEncodingRate parameter value and comparing the results.
I managed to get smaller JPEG2000 output files, but the quality is very poor compared to the JPEG ones, especially for colored text or images.
Here is what my original PDF file looks like:
When I set setEncodingRate to 0.8 it looks like this:
My output file size is 850 KB, which is even bigger than the JPEG ones (around 600 KB), and the quality is lower.
With setEncodingRate at 0.1, the file is considerably smaller (111 KB) but basically unreadable.
So what I'm trying to get is smaller output files (under 600 KB) with better quality, and I'm wondering whether that is feasible with the JPEG2000 format.
public class ImageConverter {
public void compressor(String inputFile, String outputFile) throws IOException {
J2KImageWriteParam iwp = new J2KImageWriteParam();
PDDocument document = PDDocument.load(new File (inputFile), MemoryUsageSetting.setupMixed(10485760L));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int nbPages = document.getNumberOfPages();
int pageCounter = 0;
BufferedImage image;
for (PDPage page : document.getPages()) {
if (page.hasContents()) {
image = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
if (image == null)
{
System.out.println("If no registered ImageReader claims to be able to read the resulting stream");
}
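Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("JPEG2000");
ImageWriter writer = null;
// Look for the JAI J2K writer; compare class names with equals(), not !=
while (writers.hasNext()) {
ImageWriter candidate = writers.next();
String name = candidate.getClass().getName();
System.out.println(name);
if ("com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageWriter".equals(name)) {
writer = candidate;
break;
}
}
if (writer == null) {
throw new IOException("J2KImageWriter not found among the registered JPEG2000 writers");
}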
File f = new File(outputFile+"_"+pageCounter+".jp2");
long s = System.currentTimeMillis();
ImageOutputStream ios = ImageIO.createImageOutputStream(f);
writer.setOutput(ios);
J2KImageWriteParam param = (J2KImageWriteParam) writer.getDefaultWriteParam();
IIOImage ioimage = new IIOImage(image, null, null);
param.setSOP(true);
param.setWriteCodeStreamOnly(true);
param.setProgressionType("layer");
param.setLossless(true);
param.setCompressionMode(J2KImageWriteParam.MODE_EXPLICIT);
param.setCompressionType("JPEG2000");
param.setCompressionQuality(0.01f);
param.setEncodingRate(1.01);
param.setFilter(J2KImageWriteParam.FILTER_53 );
writer.write(null, ioimage, param);
System.out.println(System.currentTimeMillis() - s);
writer.dispose();
ios.flush();
ios.close();
image.flush();
pageCounter++;
}
}
}
public static void main(String[] args) {
String input = "E:/IMGTEST/mail-DOC0002.pdf";
String output = "E:/IMGTEST/mail-DOC0002/docamail-DOC0002-";
ImageConverter imgcv = new ImageConverter();
try {
imgcv.compressor(input, output);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
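A back-of-the-envelope check on the sizes reported above. This assumes that setEncodingRate is a target in bits per pixel (as I understand the J2KImageWriteParam documentation) and that the pages are A4 rendered at 300 DPI (an assumption; the question does not state the page size). Under those assumptions the output size is roughly pixels × rate / 8 bytes, so it is driven by the rate and the rendering resolution rather than by the page content. The class and method names are illustrative only:

```java
/** Rough size model for a rate-limited JPEG2000 page (illustrative names). */
final class J2kSizeEstimate {
    private J2kSizeEstimate() {}

    /**
     * Approximate output size in bytes for a page of the given physical size,
     * rendered at the given DPI and encoded at the given bits per pixel.
     */
    static long estimateBytes(double widthInches, double heightInches, int dpi, double bitsPerPixel) {
        long widthPx = Math.round(widthInches * dpi);
        long heightPx = Math.round(heightInches * dpi);
        // total bits = pixels * bits per pixel; divide by 8 for bytes
        return Math.round(widthPx * heightPx * bitsPerPixel / 8.0);
    }
}
```

For an A4 page (8.27 x 11.69 inches) at 300 DPI this gives about 870,000 bytes at a rate of 0.8 and about 109,000 bytes at 0.1, close to the 850 and 111 figures quoted in the question. If that model holds, lowering the DPI passed to renderImageWithDPI leaves more bits per pixel for the same file size.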

How to get image of a PDF page(included text). not images in a PDF page

I tried to convert a PDF page to an image, but my code only extracted the individual images embedded in the page, not an image of the whole page.
Code below:
public class ExtractionPDFtoThumbImgs {
static String filePath = "/Users/tmdtjq/Downloads/PDFTest/test.pdf";
static String outputFilePath = "/Users/tmdtjq/Downloads/PDFTest/pageimages";
public static void change(File inputFile, File outputFolder) throws IOException {
//TODO check the input file exists and is PDF
//TODO for the treatment of PDF encrypted
PDDocument doc = null;
try {
doc = PDDocument.load(inputFile);
List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i <allPages.size(); i++) {
PDPage page = allPages.get(i);
// Render the page once; the result of the extra first call was discarded
BufferedImage image = page.convertToImage();
ImageIO.write(image, "jpg", new File(outputFolder.getAbsolutePath() + File.separator + (i + 1) + ".jpg"));
}
} finally {
if (doc != null) {
doc.close();
}
}
}
public static void main(String[] args) {
File inputFile = new File(ExtractionPDFtoThumbImgs.filePath);
File outputFolder = new File(ExtractionPDFtoThumbImgs.outputFilePath);
if(!outputFolder.exists()){
outputFolder.mkdirs();
}
try {
ExtractionPDFtoThumbImgs.change(inputFile, outputFolder);
} catch (IOException e) {
e.printStackTrace();
}
}
}
The code above extracts the images embedded in a PDF page; it does not render the page itself (text included) as an image.
Is there a conversion tool (PDF page to image) or a PDFBox class for this?
Please suggest how to get an image of a whole PDF page (text included), rather than the images embedded in it.

Try pdftocairo, it's part of poppler.
I was using ImageMagick to convert PDFs to images, but it relies on Ghostscript, which is sometimes picky about the PDF you feed it, so it was hit or miss...
So far pdftocairo has been solid.
http://poppler.freedesktop.org
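If you want to call pdftocairo from Java, one option is to launch it as an external process. A rough sketch, assuming pdftocairo is installed and on the PATH; the class and method names are hypothetical, and the only options used are -png (PNG output) and -r (resolution in DPI). The last argument is an output prefix from which pdftocairo derives the per-page file names:

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that shells out to pdftocairo (not part of any library). */
final class PdfToCairoSketch {
    private PdfToCairoSketch() {}

    /** Builds the command line: pdftocairo -png -r <dpi> <pdf> <outputPrefix>. */
    static List<String> buildCommand(File pdf, File outputPrefix, int dpi) {
        return Arrays.asList(
                "pdftocairo", "-png", "-r", String.valueOf(dpi),
                pdf.getPath(), outputPrefix.getPath());
    }

    /** Runs pdftocairo and returns its exit code (0 means success). */
    static int renderPages(File pdf, File outputPrefix, int dpi)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, outputPrefix, dpi))
                .inheritIO() // show pdftocairo's own messages on our console
                .start();
        return process.waitFor();
    }
}
```

Calling PdfToCairoSketch.renderPages(new File("test.pdf"), new File("pageimages/page"), 150) would then render every page, text included, into PNG files whose names start with the given prefix.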

How do you access an attachment stored as MIME Part?

It seems to me there are two ways to store an attachment in a NotesDocument.
Either as a RichTextField or as a "MIME Part".
If they are stored as RichText you can do stuff like:
document.getAttachment(fileName)
That does not seem to work for an attachment stored as a MIME Part. See screenshot
I have thousands of documents like this in the backend. This is NOT a UI issue where I need to use the file Download control of XPages.
Each document has only one attachment: an image, a JPG file. I have three databases for different sizes: Original, Large, and Small. Originally I created everything from documents that had the attachment stored as RichText, but my code saved them as MIME Part. That's just what it did; it was not really my intent.
What happened is I lost some of my "Small" pictures so I need to rebuild them from the Original pictures that are now stored as MIME Part. So my ultimate goal is to get it from the NotesDocument into a Java Buffered Image.
I think I have the code to do what I want but I just "simply" can't figure out how to get the attachment off the document and then into a Java Buffered Image.
Below is some rough code I'm working with. My goal is to pass in the document with the original picture. I already have the fileName because I stored that out in metaData. But I don't know how to get that from the document itself. And I'm passing in "Small" to create the Small image.
I think I just don't know how to work with attachments stored in this manner.
Any ideas/advice would be appreciated! Thanks!!!
public Document processImage(Document inputDoc, String fileName, String size) throws IOException {
// fileName is the name of the attachment on the document
// The goal is to return a NEW BLANK document with the image on it
// The Calling code can then deal with keys and meta data.
// size is "Original", "Large" or "Small"
System.out.println("Processing Image, Size = " + size);
//System.out.println("Filename = " + fileName);
boolean result = false;
Session session = Factory.getSession();
Database db = session.getCurrentDatabase();
session.setConvertMime(true);
BufferedImage img;
BufferedImage convertedImage = null; // the output image
EmbeddedObject image = null;
InputStream imageStream = null;
int currentSize = 0;
int newWidth = 0;
String currentName = "";
try {
// Get the Embedded Object
image = inputDoc.getAttachment(fileName);
System.out.println("Input Form : " + inputDoc.getItemValueString("form"));
if (null == image) {
System.out.println("ALERT - IMAGE IS NULL");
}
currentSize = image.getFileSize();
currentName = image.getName();
// Get a stream of the image
imageStream = image.getInputStream();
img = ImageIO.read(imageStream); // this is the buffered image we'll work with
imageStream.close();
Document newDoc = db.createDocument();
// Remember this is a BLANK document. The calling code needs to set the form
if ("original".equalsIgnoreCase(size)) {
this.attachImage(newDoc, img, fileName, "JPG");
return newDoc;
}
if ("Large".equalsIgnoreCase(size)) {
// Now we need to convert the LARGE image
// We're assuming FIXED HEIGHT of 600px
newWidth = this.getNewWidth(img.getHeight(), img.getWidth(), 600);
convertedImage = this.getScaledInstance(img, newWidth, 600, false);
// Attach the scaled image, not the original
this.attachImage(newDoc, convertedImage, fileName, "JPG");
return newDoc;
}
if ("Small".equalsIgnoreCase(size)) {
System.out.println("converting Small");
newWidth = this.getNewWidth(img.getHeight(), img.getWidth(), 240);
convertedImage = this.getScaledInstance(img, newWidth, 240, false);
// Attach the scaled image, not the original
this.attachImage(newDoc, convertedImage, fileName, "JPG");
System.out.println("End Converting Small");
return newDoc;
}
return newDoc;
} catch (Exception e) {
// HANDLE EXCEPTION HERE
// SAMPLE WRITE TO LOG.NSF
System.out.println("****************");
System.out.println("EXCEPTION IN processImage()");
System.out.println("****************");
System.out.println("picName: " + fileName);
e.printStackTrace();
return null;
} finally {
if (null != imageStream) {
imageStream.close();
}
if (null != image) {
LibraryUtils.incinerate(image);
}
}
}

I believe it will be some variation of the following code snippet. You might have to change which MIMEEntity holds the content; depending on the document, it might be the parent or another child.
Stream stream = session.createStream();
doc.getMIMEEntity().getFirstChildEntity().getContentAsBytes(stream);
ByteArrayInputStream bais = new ByteArrayInputStream(stream.read());
return ImageIO.read(bais);
EDIT:
session.setConvertMime(false);
Stream stream = session.createStream();
Item itm = doc.getFirstItem("ParentEntity");
MIMEEntity me = itm.getMIMEEntity();
MIMEEntity childEntity = me.getFirstChildEntity();
childEntity.getContentAsBytes(stream);
ByteArrayOutputStream bo = new ByteArrayOutputStream();
stream.getContents(bo);
byte[] mybytearray = bo.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(mybytearray);
return ImageIO.read(bais);

David, have a look at DominoDocument: http://public.dhe.ibm.com/software/dw/lotus/Domino-Designer/JavaDocs/XPagesExtAPI/8.5.2/com/ibm/xsp/model/domino/wrapped/DominoDocument.html
With it you can wrap any Notes document.
DominoDocument provides classes such as DominoDocument.AttachmentValueHolder, through which you can access the attachments.
I explained it at Engage. It is very powerful:
http://www.slideshare.net/flinden68/engage-use-notes-objects-in-memory-and-other-useful-java-tips-for-x-pages-development

Can I tell what the file type of a BufferedImage originally was?

In my code, I have a BufferedImage that was loaded with the ImageIO class like so:
BufferedImage image = ImageIO.read(new File(filePath));
Later on, I want to save it to a byte array, but the ImageIO.write method requires me to pick either a GIF, PNG, or JPG format to write my image as (as described in the tutorial here).
I want to pick the same file type as the original image. If the image was originally a GIF, I don't want the extra overhead of saving it as a PNG. But if the image was originally a PNG, I don't want to lose translucency and such by saving it as a JPG or GIF. Is there a way that I can determine from the BufferedImage what the original file format was?
I'm aware that I could simply parse the file path when I load the image to find the extension and just save it for later, but I'd ideally like a way to do it straight from the BufferedImage.

As @JarrodRoberson says, the BufferedImage has no "format" (i.e. no file format; it does have one of several pixel formats, or pixel "layouts"). I don't know Apache Tika, but I guess his solution would also work.
However, if you prefer using only ImageIO and not adding new dependencies to your project, you could write something like:
ImageInputStream input = ImageIO.createImageInputStream(new File(filePath));
try {
Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
if (readers.hasNext()) {
ImageReader reader = readers.next();
try {
reader.setInput(input);
BufferedImage image = reader.read(0); // Read the same image as ImageIO.read
// Do stuff with image...
// When done, either (1):
String format = reader.getFormatName(); // Get the format name for use later
if (!ImageIO.write(image, format, outputFileOrStream)) {
// ...handle not written
}
// (case 1 done)
// ...or (2):
ImageWriter writer = ImageIO.getImageWriter(reader); // Get best suitable writer
try {
ImageOutputStream output = ImageIO.createImageOutputStream(outputFileOrStream);
try {
writer.setOutput(output);
writer.write(image);
}
finally {
output.close();
}
}
finally {
writer.dispose();
}
// (case 2 done)
}
finally {
reader.dispose();
}
}
}
finally {
input.close();
}

BufferedImage does not have a "format"
Once the bytes have been translated into a BufferedImage the format of the source file is completely lost, the contents represent a raw byte array of the pixel information nothing more.
Solution
You should use the Tika library to determine the format from the original byte stream before the BufferedImage is created and not rely on file extensions which can be inaccurate.
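If adding Tika is not an option, the same idea (inspect the bytes before they are decoded) can be sketched with ImageIO alone. This is only an illustration: the class and method names are made up, and it can only recognize formats for which an ImageIO reader plugin is registered in the running JVM.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;

/** Illustrative helper: asks ImageIO which reader claims the encoded bytes. */
final class ImageFormatSniffer {
    private ImageFormatSniffer() {}

    /**
     * Returns the format name reported by the first ImageIO reader that
     * recognizes the bytes, or null if no registered reader does.
     */
    static String detectFormat(byte[] encoded) throws IOException {
        try (ImageInputStream in =
                     new MemoryCacheImageInputStream(new ByteArrayInputStream(encoded))) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                return null; // nothing claims to understand this data
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Call detectFormat on the raw bytes first, keep the returned name next to the BufferedImage, and pass it to ImageIO.write later.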

One could encapsulate the BufferedImage and related data in class instance(s) like so:
final public class TGImage
{
public String naam;
public String filename;
public String extension;
public int layerIndex;
public Double scaleX;
public Double scaleY;
public Double rotation;
public String status;
public boolean excluded;
public BufferedImage image;
public ArrayList<String> history = new ArrayList<>(5);
public TGImage()
{
naam = "noname";
filename = "";
extension ="";
image = null;
scaleX = 0.0;
scaleY = 0.0;
rotation = 0.0;
status = "OK";
excluded = false;
layerIndex = 0;
addHistory("Created");
}
final public void addHistory(String str)
{
history.add(TGUtil.getCurrentTimeStampAsString() + " " + str);
}
}
and then use it like this:
public TGImage loadImage()
{
TGImage imgdat = new TGImage();
final JFileChooser fc = new JFileChooser();
FileNameExtensionFilter filter = new FileNameExtensionFilter("Image Files", "jpg", "png", "gif", "tif");
fc.setFileFilter(filter);
fc.setCurrentDirectory(new File(System.getProperty("user.home")));
int result = fc.showOpenDialog(this); // show file chooser
if (result == JFileChooser.APPROVE_OPTION)
{
File file = fc.getSelectedFile();
System.out.println("Selected file extension is " + TGUtil.getFileExtension(file));
if (TGUtil.isAnImageFile(file))
{
//System.out.println("This is an Image File.");
try
{
imgdat.image = ImageIO.read(file);
imgdat.filename = file.getName();
imgdat.extension = TGUtil.getFileExtension(file);
info("image has been loaded from file:" + imgdat.filename);
} catch (IOException ex)
{
Logger.getLogger(TGImgPanel.class.getName()).log(Level.SEVERE, null, ex);
imgdat.image = null;
info("File not loaded IOexception: img is null");
}
} else
{
imgdat = null;
info("File not loaded: The requested file is not an image File.");
}
}
return imgdat;
}
That way, everything relevant is kept together in TGImage instances.
You could then keep them in an image list like so:
ArrayList<TGImage> images = new ArrayList<>(5);

In Java is it possible to convert a BufferedImage to an IMG Data URI?

I have created a graphical image with the following sample code.
BufferedImage bi = new BufferedImage(50,50,BufferedImage.TYPE_BYTE_BINARY);
Graphics2D g2d = bi.createGraphics();
// Draw graphics.
g2d.dispose();
// BufferedImage now has my image I want.
At this point I have BufferedImage which I want to convert into an IMG Data URI. Is this possible? For example..
<IMG SRC="data:image/png;base64,[BufferedImage data here]"/>

Not tested, but something like this ought to do it:
ByteArrayOutputStream out = new ByteArrayOutputStream();
ImageIO.write(bi, "PNG", out);
byte[] bytes = out.toByteArray();
String base64bytes = Base64.encode(bytes);
String src = "data:image/png;base64," + base64bytes;
There are lots of different base64 codec implementations for Java. I've had good results with MigBase64.

You could use this solution, which doesn't use any external libraries. Short and clean! It relies on javax.xml.bind.DatatypeConverter, which shipped with the JDK from Java 6 but was removed from it in Java 11. Worked for me!
ByteArrayOutputStream output = new ByteArrayOutputStream();
ImageIO.write(image, "png", output);
String src = "data:image/png;base64," + DatatypeConverter.printBase64Binary(output.toByteArray());
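On Java 8 and later, java.util.Base64 covers the encoding step without any extra library. A minimal sketch, assuming a PNG writer is registered with ImageIO (standard JDKs ship one); the class and method names are made up for illustration:

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;

/** Illustrative helper: encodes a BufferedImage as a PNG data URI. */
final class DataUriSketch {
    private DataUriSketch() {}

    static String toPngDataUri(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available for this image");
        }
        String base64 = Base64.getEncoder().encodeToString(out.toByteArray());
        return "data:image/png;base64," + base64;
    }
}
```

The returned string can be placed directly in the SRC attribute of the IMG tag from the question.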

I use WebDriver to grab a CAPTCHA image as Base64, like this:
// formatName -> png
// pathname -> C:/Users/n/Desktop/tmp/test.png
public static String getScreenshot(WebDriver driver, String formatName, String pathname) {
try {
WebElement element = driver.findElement(By.xpath("//*[@id=\"imageCodeDisplayId\"]"));
File screenshot = element.getScreenshotAs(OutputType.FILE);
// base64 data
String base64Str = ImageUtil.getScreenshot(screenshot.toString());
return base64Str;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
// Needs java.nio.file.Files, java.nio.file.Paths and java.util.Base64
public static String getScreenshot(String imgFile) {
    byte[] data;
    try {
        // Read the whole file; in.available() is not guaranteed to be the file length
        data = Files.readAllBytes(Paths.get(imgFile));
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
    if (data.length == 0) {
        return null;
    }
    // WebDriver screenshots are PNG files, so label the data as image/png
    return "data:image/png;base64," + Base64.getEncoder().encodeToString(data);
}
