Changing word doc into pdf file - java

Is there a way to convert a .doc file to .pdf while keeping the same formatting as the doc file, including images?
I am able to generate a PDF file from the doc, but only the text appears.

You can use a library based on OpenOffice.
It lets you convert from (and to) all the formats supported by OpenOffice.
Moreover, if your doc is read correctly by OpenOffice, it should be converted exactly as you see it.
One example I know of is JODConverter:
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
You can also use a simple command line (with OpenOffice installed):
#!/bin/sh
DIR=$(pwd)
DOC=$DIR/$1
echo "Doc to convert : $DOC"
/usr/bin/oowriter -invisible "macro:///Standard.Module1.ConvertWordToPDF($DOC)"
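If you would rather drive the command line from Java than from a shell script, something like the following might work. It is only a sketch and it swaps the macro call for the headless `--convert-to` switch, which is a LibreOffice option and may not exist in older OpenOffice.org builds. The class name `HeadlessOfficeConvert` and the binary name `soffice` are assumptions; adjust them to your install.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

final class HeadlessOfficeConvert {

    // Builds the argument list for a headless conversion to PDF.
    // officeBinary is assumed to be something like "soffice" on the PATH.
    static List<String> buildCommand(String officeBinary, Path input, Path outputDir) {
        return Arrays.asList(
                officeBinary,
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // Starts the office process and returns its exit code.
    static int convert(String officeBinary, Path input, Path outputDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(officeBinary, input, outputDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call convert(...) to actually run it.
        System.out.println(String.join(" ",
                buildCommand("soffice", Paths.get("document.doc"), Paths.get("out"))));
    }
}
```

The PDF is written into the output directory with the same base name as the input. A non-zero exit code from `convert` suggests the conversion did not run.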

You can use Apache POI to read the doc file and then Apache PDFBox to write the pdf file.

You can use an OpenOffice macro to export the doc as PDF, like this:
sub Docaspdf
rem ----------------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
rem ----------------------------------------------------------------------
dim args1(2) as new com.sun.star.beans.PropertyValue
args1(0).Name = "URL"
args1(0).Value = "file:///C:/doc.pdf"
args1(1).Name = "FilterName"
args1(1).Value = "writer_pdf_Export"
args1(2).Name = "FilterData"
args1(2).Value = Array(Array("UseLosslessCompression",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Quality",0,90,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ReduceImageResolution",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("MaxImageResolution",0,300,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("UseTaggedPDF",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("SelectPdfVersion",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportNotes",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportBookmarks",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("OpenBookmarkLevels",0,-1,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("UseTransitionEffects",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("IsSkipEmptyPages",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("IsAddStream",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EmbedStandardFonts",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("FormsType",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportFormFields",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("AllowDuplicateFieldNames",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerToolbar",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerMenubar",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerWindowControls",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ResizeWindowToInitialPage",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("CenterWindow",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("OpenInFullScreenMode",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("DisplayPDFDocumentTitle",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("InitialView",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Magnification",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Zoom",0,100,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PageLayout",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("FirstPageOnLeft",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("InitialPage",0,1,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Printing",0,2,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Changes",0,4,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EnableCopyingOfContent",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EnableTextAccessForAccessibilityTools",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportLinksRelativeFsys",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PDFViewSelection",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ConvertOOoTargetToPDFTarget",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportBookmarksToPDFDestination",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("_OkButtonString",0,"",com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EncryptFile",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PreparedPasswords",0,,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("RestrictPermissions",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PreparedPermissionPassword",0,Array(),com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("",0,,com.sun.star.beans.PropertyState.DIRECT_VALUE))
dispatcher.executeDispatch(document, ".uno:ExportToPDF", "", 0, args1())
end sub

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import officetools.OfficeFile;
FileInputStream fis = new FileInputStream(new File("test.doc"));
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
OfficeFile f = new OfficeFile(fis,"localhost","8100", false);
// convert to pdf
f.convert(fos,"pdf");

You may use Aspose.Words for Java to convert Doc files to PDF. This component retains the format of the word document when converted to PDF. It also converts the images along with text.
Disclosure: I work as developer evangelist at Aspose.

Related

How to convert docx to PDF without split tables

I have a dynamic docx with a few tables, and I'm trying to convert it to a PDF. After conversion the PDF spans two pages. I use the Apache POI XWPF converter, version 2.0.2.
In the docx file everything is okay, but when I convert to PDF the tables are split.
Does anyone have an idea, or know a better library to convert docx to PDF?
PdfOptions options = PdfOptions.getDefault();
options.fontProvider((familyName, encoding, size, style, color) -> {
try {
BaseFont baseFont = BaseFont.createFont("fonts/times.ttf", encoding, BaseFont.EMBEDDED);
return new Font(baseFont, size, style, color);
} catch (Exception e) {
throw new RuntimeException(e);
}
});
PdfConverter.getInstance().convert(document, out, options);
There is no library to convert a doc[x] file into a completely correctly formatted PDF. The only program that can do that is Word itself.
I have achieved this by using the Word API in a PowerShell script:
$document_path = $args[0]
$document_parent_folder = $args[1]
$file_name = $args[2]
$word_app = New-Object -ComObject Word.Application
$document = $word_app.Documents.Open($document_path)
$pdf_filename = "$($document_parent_folder)\$($file_name)"
$document.SaveAs([ref] $pdf_filename, [ref] 17)
$document.Close()
$word_app.Quit()
Admittedly, this is not the best solution: it depends heavily on Microsoft Office being installed on the machine, and it brings a number of other problems with it. But it is the only solution that formatted my documents exactly how I wanted them.
The script takes three arguments:
The path of the document that will be converted
The folder where it is located
The name of the PDF file
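Since the question is about Java, here is a rough sketch of how such a script could be called through the Process API, passing the three arguments in the order listed above. The class name `WordScriptRunner` and the script file name are hypothetical, and it assumes `powershell.exe` is on the PATH of a Windows machine with Word installed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class WordScriptRunner {

    // Builds the powershell.exe command line for the conversion script.
    static List<String> buildCommand(Path script, Path document, String pdfFileName) {
        Path absoluteDocument = document.toAbsolutePath();
        List<String> command = new ArrayList<>();
        command.add("powershell.exe");
        command.add("-NoProfile");
        command.add("-ExecutionPolicy");
        command.add("Bypass");
        command.add("-File");
        command.add(script.toString());
        command.add(absoluteDocument.toString());             // $args[0]: document path
        command.add(absoluteDocument.getParent().toString()); // $args[1]: its folder
        command.add(pdfFileName);                             // $args[2]: PDF file name
        return command;
    }

    // Runs the script and returns its exit code.
    static int run(Path script, Path document, String pdfFileName)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, document, pdfFileName))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; "convert.ps1" is a placeholder script name.
        System.out.println(buildCommand(
                Paths.get("convert.ps1"), Paths.get("report.docx"), "report.pdf"));
    }
}
```

The document path is made absolute because Word resolves relative paths against its own working directory, not the caller's.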

GIF image only partially displayed

I got a strange issue with a GIF image in Java. The image is provided by an XML API as Base64 encoded string. To decode the Base64, I use the commons-codec library in version 1.13.
When I just decode the Base64 string and write the bytes out to a file, the image shows properly in browsers and MS Paint (nothing else to test here).
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
File sigGif = new File("C:/Temp/pod_1Z12345E5991872040.org.gif");
try (FileOutputStream fos = new FileOutputStream(sigGif)) {
fos.write(sigImg);
fos.flush();
}
The resulting file opened in MS Paint:
But when I now start consuming this file using Java (for example creating a PDF document from HTML using the openhtmltopdf library), it is corrupted and does not show properly.
final String htmlLetterStr = "[HTML as provided by API]";
final Document doc = Jsoup.parse(htmlLetterStr);
try (FileOutputStream fos = new FileOutputStream(new File("C:/Temp/letter_1Z12345E5991872040.pdf"))) {
PdfRendererBuilder builder = new PdfRendererBuilder();
builder.useFastMode();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), "file:///C:/Temp/");
builder.toStream(fos);
builder.useDefaultPageSize(210, 297, BaseRendererBuilder.PageSizeUnits.MM);
builder.run();
fos.flush();
}
When I now open the resulting PDF, the image created above looks like this. It seems that only the first pixel lines are printed, some layer is missing, or something like that.
The same happens, if I read the image again with ImageIO and try to convert it into PNG. The resulting PNG looks exactly the same as the image printed in the PDF document.
How can I get the image to display properly in the PDF document?
Edit:
Link to original GIF Base64 as provided by API: https://pastebin.com/sYJv6j0h
As @haraldK pointed out in the comments, the GIF file provided via the XML API does not conform to the GIF standard and thus cannot be parsed by Java's ImageIO API.
Since there does not seem to exist a pure Java tool to repair the file, the workaround I came up with now is to use ImageMagick via Java's Process API. Calling the convert command with the -coalesce option will parse the broken GIF and create a new one that does conform to the GIF standard.
// Decode broken GIF image and write to disk
final String base64Gif = "[Base64 as provided by API]";
final byte[] sigImg = Base64.decodeBase64(base64Gif);
Path gifPath = Paths.get("C:/Temp/pod_1Z12345E5991872040.tmp.gif");
if (!Files.exists(gifPath)) {
Files.createFile(gifPath);
}
Files.write(gifPath, sigImg, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
// Use the Java Process API to call ImageMagick (on Linux you would use the 'convert' binary)
ProcessBuilder procBuild = new ProcessBuilder();
procBuild.command("C:\\Program Files\\ImageMagick-7.0.9-Q16\\magick.exe", "C:\\Temp\\pod_1Z12345E5991872040.tmp.gif", "-coalesce", "C:\\Temp\\pod_1Z12345E5991872040.gif");
Process proc = procBuild.start();
// Wait for ImageMagick to complete its work
proc.waitFor();
The newly created file can be read by Java's ImageIO API and be used as expected.
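To check a repaired file, one option is to repeat the ImageIO round trip from the question: decode the GIF and re-encode it as PNG. The sketch below uses only the standard library; the class name `GifCheck` is made up for illustration. Note that `ImageIO.read` decodes only the first frame, and a successful decode does not prove the picture is complete, so it is still worth looking at the PNG.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.imageio.ImageIO;

final class GifCheck {

    // Decodes GIF bytes with ImageIO and re-encodes the first frame as PNG.
    static byte[] gifToPng(byte[] gifBytes) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(gifBytes));
        if (image == null) {
            // ImageIO.read returns null when no registered reader accepts the data.
            throw new IOException("ImageIO found no reader for the supplied bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: GifCheck <input.gif> <output.png>");
            return;
        }
        byte[] png = gifToPng(Files.readAllBytes(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), png);
    }
}
```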

How to convert a MS Word doc containing UTF-8 characters to PDF with Apache POI?

For some reason I cannot get the PdfConverter from Apache POI to convert my MS Word document properly on a Linux machine. On Windows and MacOS it seems to work fine but whenever I try it on a Linux machine it basically doesn't convert the UTF-8 characters. I also tried to configure the fontEncoding option which can be passed to the PdfConverter but that doesn't seem to help.
final InputStream in = new FileInputStream(new File("src/test/resources/SOMEDOC.docx"));
final XWPFDocument document = new XWPFDocument(in);
final OutputStream out = new FileOutputStream(new File("target/test.pdf"));
final PdfOptions options = PdfOptions.getDefault();
// This actually breaks the whole conversion. No text will be displayed if you set this font encoding option to UTF-8
options.fontEncoding("UTF-8");
PdfConverter.getInstance().convert(document, out, options);
Does anybody know what I am doing wrong here?

I can't import com.itextpdf.text.Document class

I'm building an Android app and I want to use iText to create a PDF file, but I can't use the Document class. As I have seen in tutorials, there should be an import com.itextpdf.text.Document for using the Document class. For this app, I'm using com.itextpdf:itext-pdfa:5.5.9 library. I want to create a simple PDF file with 2 paragraphs, something like this:
try{
File pdfFolder = new File(Environment.getExternalStoragePublicDirectory(
Environment.DIRECTORY_DOCUMENTS), "pdfdemo");
if (!pdfFolder.exists()) {
pdfFolder.mkdir();
}
Date date = new Date() ;
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(date);
File myFile = new File(pdfFolder, timeStamp + ".pdf");
OutputStream output = new FileOutputStream(myFile);
Document document = new Document();
PdfAWriter.getInstance(document, output);
document.open();
document.add(new Paragraph(mSubjectEditText.getText().toString()));
document.add(new Paragraph(mBodyEditText.getText().toString()));
document.close();
}catch (Exception e) {}
Could anyone help me with this problem? What am I doing wrong?
You say:
I'm using com.itextpdf:itext-pdfa:5.5.9 library
That is wrong for two reasons:
itext-pdfa is an addon to iText that is meant for writing or manipulating PDF/A documents. It requires the core iText library. Read about the different parts of iText on the official web site: https://developers.itextpdf.com/itext-java
You say you want to use iText on Android, but you are referring to iText for Java. iText for Java contains classes that are not allowed on Android (java.awt.*, javax.nio,...). You should use the Android port for iText, which is called iTextG: https://developers.itextpdf.com/itextg-android
It's as if you're using iText without having visited the official iText web site. How is that even possible?
Just open your app-level Gradle file and add the following line to your dependencies:
implementation 'com.itextpdf:itext-pdfa:5.5.9'
It works for me.

Java byteArray[] to docx

I have a doc file as a byte[].
Is it possible to convert it from byte[] into a .docx file?
I tried simply changing the file extension programmatically, but that does not work.
Any suggestions?
I generate the report using Eclipse BIRT.
Code that saves the doc:
options = new RenderOptionBase();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
options.setOutputStream(bos);
options.setOutputFormat("doc");
if(parameters != null){
task.setParameterValues(parameters);
}
task.setRenderOption(options);
task.run();
return bos.toByteArray();
//IRunAndRenderTask task;
The problem is that we use BIRT 3.7, which does not support DocxRenderOption.
Take a look at Aspose.Words for Java -- http://www.aspose.com/java/word-component.aspx
It has really good documentation too -- http://www.aspose.com/docs/display/wordsjava/load+or+create+a+document
Code will be as simple as
// Open a document.
Document doc = new Document("input.doc");
// Save document.
doc.save("output.docx");
Step1: save the doc file
Step2: using this lib convert the file and save as docx file.
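Step 1 needs nothing beyond the standard library. The sketch below writes the byte[] to disk and also peeks at the first bytes, which shows why renaming the file cannot work: a binary .doc is an OLE2 compound file, while a .docx is a ZIP package, so the content itself has to be converted. The class name `WordBytes` and the labels it returns are made up for illustration, and the bytes coming out of the report engine may be in yet another format, in which case `sniff` returns "unknown".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WordBytes {

    // Signature of an OLE2 compound file, the container used by binary .doc.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns "ole2", "zip" (the container used by .docx) or "unknown".
    static String sniff(byte[] data) {
        if (startsWith(data, OLE2)) {
            return "ole2";
        }
        if (data.length >= 2 && data[0] == 'P' && data[1] == 'K') {
            return "zip";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Step 1: write the bytes to disk unchanged.
    static Path save(byte[] data, Path target) throws IOException {
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {'P', 'K', 3, 4};
        Path saved = save(sample, Paths.get("sample.bin"));
        System.out.println(saved + " looks like: " + sniff(sample));
    }
}
```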
