I'm trying to find a way to copy a whole slide and paste it 'as an image' onto a blank slide, using POI's APIs.
I want to do this because I need to save each slide as an image.
POI's Slide.draw() API by itself does not do a very good job of rendering the slides' contents to images.
For example it cannot draw the following two types of objects:
Tables created in PowerPoint
Charts pasted from Excel
Is there any way to 'copy' and 'paste with formatting (as an image)' via POI, just as you would do these two operations in MS PowerPoint running on Windows, in order to save the slides as they are?
Since Slide.draw() works just fine with images, once I paste all the objects as a single flat image onto a slide (using POI), I'll be good to go.
Or, if there is a better way to save the slides' contents as intact as possible, could you please let me know?
Whatever method is used must be licensed under the Apache License or something more permissive.
Also, I read the following post:
Programmatically extracting slides as images from a PowerPoint presentation (.PPT)
unoconv is GPL-licensed, so it is not an option for us.
JODConverter is LGPL-licensed; I'm not sure if it's acceptable so I will talk to my boss and check.
Then I ran POI as a command-line tool, as suggested by Michael,
but I ended up with the same problem (tables and pasted Excel charts
do not show up in the saved images).
Thanks.
I am working on a web application based on Spring REST services (the UI is built with HTML5 and backbone.js). The requirement is that an uploaded document (it could be any document: Excel, Word, PPT, PDF, etc.) needs a preview option so that a user can view it in the browser, whether or not the user has Office installed.
My idea is to convert the documents into images and display them to the user. While searching, I found multiple ways to convert a PDF to an image but not many ways to convert ODT to an image (note: I am looking for an open-source solution). JODConverter or docx4j can be used to convert the documents to PDF, and then I can convert these PDFs to images. But is this the right way? Is there a more efficient way to achieve the same result? Please point me in the right direction.
Thanks in advance.
Gopi
Yes, you won't do any better than .docx to .pdf to image. You really need a stable workflow, and this is as good as you'll find for this purpose, unless you're running on a Microsoft server and you have access to the official Microsoft Office stuff.
For previews, docx4j or similar will do just fine. Not everything converts perfectly, but it should be fine for a preview.
I am using the Apache POI library to create PowerPoint slides with Java.
Our client is interested in embedded text, images and videos. No fancy
stuff like charts etc. is needed for now. I understand that XSLF is still
under development and not yet a mature product.
I have achieved my goal using the Apache POI HSLF model, but the one thing missing is that embedded videos don't show any playback controls. After a little research I found that the difference comes from the PPT and PPTX file formats. So, to solve this issue, I am migrating from HSLF to XSLF. Unfortunately, the XSLF library doesn't have any method to add a video file (unlike HSLF's addMovie method).
Which approach do you recommend? Is there any other way to show the playback controls in PPT files (not PPTX), for example with an additional ActiveX control or media player? If so, how should it be done in Java?
Starting with PowerPoint 2010, it's possible to embed videos in PPTX files (... instead of linking them or using some kind of ActiveX/YouTube combo). If you embed MP4 videos, you need to have the QuickTime plugin installed.
Regarding the playback controls, my PP 2010 viewer displays them when you move the mouse over the video shape. Sometimes they never show up again if you click straight into the image instead of waiting for the popup.
The following code ...
fetches an MPEG (it could also be a local file)
creates a snapshot of the frame at the 5th second, which is used as the preview image. I've used the Xuggle libs here, but of course any other libs are OK too (... plain JMF, without the extension pack, couldn't handle this MPEG)
embeds image and video
and adds some arbitrary ;) stuff, which PP needs to actually play the video
The code is in the XSLF examples.
(Update 2016-02-06: moved the code to the POI examples, so there's only one place to be modified in case of new features. Furthermore, there was a regression in POI 3.13 that made it impossible to add pictures after adding movies to the media directory. This is fixed in the upcoming POI 3.14.)
Dear Stackoverflowers,
Outputting an Excel spreadsheet with images is a requirement of a project I'm doing. I've done a little research and found the following (perhaps incorrect) consensus:
various Python libs for creating Excel sheets work well
it is possible to insert images (but only in BMP)
the "internal format" of images used in Excel files is complicated, which may be why there is no third-party library support for inserting normal formats like JPEG.
I don't want to use or convert to BMP. Why? BMP files are not compressed well, and these will be big sheets, so I want to mitigate the size impact of images (one per row) as much as possible.
My ideal answer comes from someone who has actually done this. The suggested method can be in Java, Ruby, Python (but not .NET), or some other creative way of doing it.
I'm really hoping someone out there has a solution, as I anticipate this could be a tricky area (similar in complexity to playing around with PDFs, perhaps).
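The size concern is easy to check with the JDK alone. The sketch below encodes the same synthetic image as BMP and as PNG using javax.imageio and compares the byte counts. `ImageSizeDemo` and its helpers are made-up names for illustration, and a flat test image like this one flatters PNG, so real photos will behave differently (JPEG is usually the better fit there).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Illustrative only: compares encoded sizes of one image in two formats. */
final class ImageSizeDemo {
    private ImageSizeDemo() {}

    /** Encodes the image with the named ImageIO format and returns the byte count. */
    static int encodedSize(BufferedImage image, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for format: " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    /** Synthetic 200x200 image: white background with one blue block in the middle. */
    static BufferedImage sampleImage() {
        BufferedImage image = new BufferedImage(200, 200, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 200; y++) {
            for (int x = 0; x < 200; x++) {
                boolean inBlock = x >= 40 && x < 160 && y >= 40 && y < 160;
                image.setRGB(x, y, inBlock ? 0x0000FF : 0xFFFFFF);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        ImageIO.setUseCache(false); // keep the encoders in memory, no temp files
        BufferedImage image = sampleImage();
        System.out.println("bmp bytes: " + encodedSize(image, "bmp"));
        System.out.println("png bytes: " + encodedSize(image, "png"));
    }
}
```

Running the same comparison on a few representative images from the real sheets would give a better idea of how much a compressed format actually saves.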
The Perl module Excel::Writer::XLSX can insert JPEG, PNG, and BMP images into a new Excel workbook.
I am currently porting it to a Python module called XlsxWriter, and the insert_image() function is near the top of the TODO list.
Update: As of version 0.1.6 of XlsxWriter it is now possible to add PNG/JPEG images. See the example in the documentation.
As said in the comment above, Apache POI can solve your problem.
I did a little research, and this example should be useful: Apache POI Excel Insert an Image
I'm using PDFBox to extract the outline (bookmarks) information from PDF files; that's even explained on the same site.
However, my problem is not extracting the outline but generating the qualified URLs (foo.pdf#page=22777&zoom=2,2,777) that open the PDF at those bookmarks. Sometimes PDFBox is not able to find the page on which the bookmark is placed (i.e. the page number, left coordinate or top coordinate is wrong).
Does anyone know a PDF library capable of doing this (preferably in Java)? Thanks.
Best regards,
Alexander.
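For the URL-building half of the question, here is a minimal sketch in plain Java. It assumes the viewer honors Adobe-style open parameters (page is 1-based; zoom=scale,left,top with top measured down from the top of the page) and that the bookmark destination's top coordinate comes from PDF user space, where y grows upward from the bottom of the page. `PdfOpenParams`, `bookmarkUrl` and all parameter names are hypothetical and not part of PDFBox or iText; the page index, coordinates and page height would have to come from whichever library resolves the outline destination.

```java
import java.util.Locale;

/** Hypothetical helper for building "open at bookmark" URLs; not a library class. */
final class PdfOpenParams {
    private PdfOpenParams() {}

    /**
     * Builds "file#page=N&zoom=scale,left,top".
     *
     * @param pageIndex   zero-based page index of the destination (the URL wants 1-based)
     * @param zoomPercent zoom factor in percent, e.g. 100
     * @param destLeft    left coordinate of the destination in PDF user space
     * @param destTop     top coordinate of the destination, measured from the page bottom
     * @param pageHeight  height of the page in the same units
     */
    static String bookmarkUrl(String file, int pageIndex, int zoomPercent,
                              float destLeft, float destTop, float pageHeight) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("destination page was not resolved");
        }
        int left = Math.round(destLeft);
        // Flip the y axis: the URL parameter counts down from the top of the page.
        int top = Math.round(pageHeight - destTop);
        return String.format(Locale.ROOT, "%s#page=%d&zoom=%d,%d,%d",
                file, pageIndex + 1, zoomPercent, left, top);
    }

    public static void main(String[] args) {
        // A bookmark on the 22nd page, one inch in from the left and down from the top
        // of a US Letter page (612 x 792 points).
        System.out.println(bookmarkUrl("foo.pdf", 21, 100, 72f, 720f, 792f));
        // prints foo.pdf#page=22&zoom=100,72,72
    }
}
```

Forgetting the flip from bottom-origin to top-origin coordinates is one way to end up with a wrong top value, though it would not explain a wrong page number.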
iText (http://itextpdf.com) might work for you.
I've used it mostly to create PDFs (not so much for parsing already existing ones), but the library is good, and it does have objects related to outlines and bookmarks.
Is there any free Java library for extracting text from PDF that is compatible with Google App Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately they don't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PDFBox does not run on GAE because it uses Java classes that are not allowed.
(GAE only permits these http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PDFBox (0.7.3) to be GAE compliant. Now I'm able to extract text from PDF (whole page or rectangular area). I only modified the minimum needed for PDF text extraction, not the whole of PDFBox. :)
The idea was to remove references to java.awt.Rectangle and similar classes, using my own "rectangle" class.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
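The "own rectangle class" idea can be sketched like this. It is not the class from the patched PDFBox, just a hypothetical `PlainRect` showing that a region test needs nothing from java.awt; the field names and the edge convention (left/top inclusive, right/bottom exclusive) are assumptions.

```java
/** Hypothetical stand-in for java.awt.Rectangle that uses no AWT classes. */
final class PlainRect {
    final float x;
    final float y;
    final float width;
    final float height;

    PlainRect(float x, float y, float width, float height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("width and height must be non-negative");
        }
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** True if the point lies inside; left/top edges inclusive, right/bottom exclusive. */
    boolean contains(float px, float py) {
        return px >= x && px < x + width && py >= y && py < y + height;
    }

    public static void main(String[] args) {
        PlainRect region = new PlainRect(10, 20, 100, 50);
        System.out.println(region.contains(50, 40)); // true: inside the region
        System.out.println(region.contains(5, 40));  // false: left of the region
    }
}
```

A text-extraction routine could then keep only the text positions for which `contains` returns true, which is all a rectangular-area extraction needs from the rectangle type.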
I modified the latest (1.8.0-Snapshot) version to run on Google App Engine. I had to disable one unit test, but it runs fine for simple text extraction.
Following the simple try-fail-fix approach, I had to modify 5 files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, like Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of a whole page.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html
Apache PDFBox is an open source Java
PDF library for working with PDF
documents. This project allows
creation of new PDF documents,
manipulation of existing documents and
the ability to extract content from
documents.
but I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the Xpdf tool to get the text and its coordinates, but I used it from Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. I don't know whether Xpdf will work with Java, but you can try it.