Error converting docx to PDF in Java


Good afternoon all,
Here is my case: I'm generating a docx document by merging two other docx files.
public static void main(String[] args) throws Exception {
InputStream in1 = new FileInputStream(new File("C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\LAYOUT_PAGINA_VERSAO_FINAL.docx"));
InputStream in2 = new FileInputStream(new File("C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\modeloContratoSocial.docx"));
OutputStream out = new FileOutputStream(new File("C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\modeloContratoSocialMerge.docx"));
mergeDocx(in1,in2,out);
}
public static void mergeDocx(InputStream s1, InputStream s2, OutputStream os) throws Exception {
WordprocessingMLPackage target = WordprocessingMLPackage.load(s1);
insertDocx(target.getMainDocumentPart(), IOUtils.toByteArray(s2));
SaveToZipFile saver = new SaveToZipFile(target);
saver.save(os);
}
private static void insertDocx(MainDocumentPart main, byte[] bytes) throws Exception {
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + (chunk++) + ".docx"));
afiPart.setContentType(new ContentType(CONTENT_TYPE));
afiPart.setBinaryData(bytes);
Relationship altChunkRel = main.addTargetPart(afiPart);
//convertAltChunks()
CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
chunk.setId(altChunkRel.getId());
main.addObject(chunk);
}
My final document (docx) is OK; I can open it normally. The problem occurs when I convert this generated file to PDF. The following error appears: NOT IMPLEMENTED: support for w: altChunk -.
public boolean createPDF(String nomeArquivo) {
try {
long start = System.currentTimeMillis();
Configuration confg = new Configuration();
System.out.println(Configuration.repositorioUpload + nomeArquivo + ".docx");
InputStream is = new FileInputStream(new File(Configuration.repositorioUpload + nomeArquivo + ".docx"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
PdfSettings pdfSettings = new PdfSettings();
OutputStream out = new FileOutputStream(new File(Configuration.repositorioUpload + nomeArquivo + ".pdf"));
PdfConversion converter = new Conversion(wordMLPackage);
converter.output(out, pdfSettings);
System.err.println("Generate " + Configuration.repositorioUpload + nomeArquivo + ".pdf" + " with " + (
System.currentTimeMillis() - start) + "ms");
}
catch (Throwable e) {
e.printStackTrace();
}
return false;
}
I'm including the Java code I use. I have been trying to generate this PDF for a while; if anyone is able to help me, I would be grateful.
Thank you all.
Hugs!
I found a way to use altChunk, but besides the merge not running correctly, the header and footer images do not appear when the document is exported to PDF.
public static void main(String[] args) throws Exception {
boolean ADD_TO_HEADER = true;
HeaderPart hp = null;
String inputfilepath = "C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\default_template.xml";
String chunkPath = "C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\sample.docx";
boolean save = true;
String outputfilepath = "C:\\Clientes\\Constremac\\Repositorio_DOCS\\UPLOAD\\altChunk_out.docx";
// Open a document from the file system
// 1. Load the Package
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
//proce
MainDocumentPart main = wordMLPackage.getMainDocumentPart();
if (ADD_TO_HEADER) {
hp = wordMLPackage.getDocumentModel().getSections().get(0).getHeaderFooterPolicy().getDefaultHeader();
}
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/chunk.docx"));
afiPart.setBinaryData(new FileInputStream(chunkPath));
afiPart.setContentType(new ContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml")); //docx
//afiPart.setContentType(new ContentType("application/xhtml+xml")); //xhtml
Relationship altChunkRel = null;
if (ADD_TO_HEADER) {
altChunkRel = hp.addTargetPart(afiPart);
} else {
altChunkRel = main.addTargetPart(afiPart);
}
CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
ac.setId(altChunkRel.getId());
if (ADD_TO_HEADER) {
hp.getJaxbElement().getEGBlockLevelElts().add(ac);
} else {
main.addObject(ac);
}
// Save it
if (save) {
SaveToZipFile saver = new SaveToZipFile(wordMLPackage);
saver.save(outputfilepath);
System.out.println("Saved " + outputfilepath);
}
}
What am I doing wrong?

An altChunk is not "real" docx content.
Before it can be output as PDF, it needs to be replaced with normal WordML paragraphs, tables, etc.
You can try doing this yourself, which is easy enough if the content does not include any relationships (images, hyperlinks, etc.) or conflicting styles or numbering. For more, see http://www.docx4java.org/blog/2010/11/merging-word-documents/ or my company's website, plutext.com
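Before attempting a PDF conversion, it can help to check whether a docx still contains an unresolved altChunk. The following is a minimal sketch, not docx4j code: it uses only java.util.zip (and readAllBytes, so Java 9 or later), and it assumes the main document part is stored at the conventional path word/document.xml, is UTF-8 encoded, and uses the usual w: namespace prefix. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class AltChunkCheck {

    // Assumption: the main document part sits at the conventional location.
    static final String MAIN_PART = "word/document.xml";

    /** Returns true if the main document part contains a w:altChunk element. */
    static boolean hasAltChunk(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current zip entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    // Naive text search; assumes the usual "w" namespace prefix.
                    return xml.contains("<w:altChunk");
                }
            }
        }
        return false; // no main document part found
    }
}
```

If hasAltChunk returns true for the merged file, the altChunk still has to be replaced with real WordML content before the PDF step can succeed.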

This can be solved.
An altChunk is not "real" docx content.
Using Java, we can convert the altChunk into the original Word content tags by converting the document.xml inside the docx:
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
Open the link for the complete code:
Convert AltChunk To Original Content
https://kishankichi.wordpress.com/2016/05/26/convert-altchunk-to-original-content-or-convert-to-real-docx-format-using-java/
Note:
Kindly ignore &nbsp; and other such entities in your HTML content.
I have checked only for &nbsp;.
Thanks for the reply...

Related

How to replace DataXML from Slide Diagram in Powerpoint using Apache POI

I want to replace the data.xml file of one PowerPoint presentation with the data.xml of another, in Java using Apache POI.
For reference, I want to replace the following file with the one from another PowerPoint file.
The following is the code I have tried, but the XML isn't being replaced. Every time I run this code, the two files still have different XML afterwards.
public static void main(String[] args) {
// TODO Auto-generated method stub
final String filename = "C:/Users/skhan/Desktop/game.pptx";
final String filename1 = "C:/Users/skhan/Desktop/globe.pptx";
try {
XMLSlideShow ppt = new XMLSlideShow(new FileInputStream(filename));
OPCPackage pkg = ppt.getPackage();
PackagePart data = pkg.getPart(
PackagingURIHelper.createPartName("/ppt/diagrams/data1.xml"));
InputStream data1Inp = data.getInputStream();
XMLSlideShow ppt1 = new XMLSlideShow(new FileInputStream(filename1));
OPCPackage pkg1 = ppt1.getPackage();
PackagePart data11 = pkg1.getPart(
PackagingURIHelper.createPartName("/ppt/diagrams/data1.xml"));
InputStream data1Inp1 = data11.getInputStream();
String data1String = GetData(data1Inp);
String data2String = GetData(data1Inp1);
//i want to replace here
PrintStream pr = new PrintStream(data.getOutputStream());
pr.print(data2String);
pr.close();
System.out.println("Completed");
} catch (Exception e) {
e.printStackTrace();
}
}
public static String GetData(InputStream input) throws Exception
{
StringBuilder builder = new StringBuilder();
int ch;
while((ch = input.read()) != -1){
builder.append((char)ch);
}
String theString = builder.toString();
return theString;
}

I added a few lines after the change in order to save the file.
The XMLSlideShow must be written to a file after anything is changed or added.
File file =new File(filename);
FileOutputStream out = new FileOutputStream(file);
ppt.write(out);
out.close();

JAVA POI failed to write a large word file

I am using POI to delete "enter" (blank lines) in a .doc file.
My code below works correctly when the input file is not large (for example, less than 1 MB). However, when I process a large input.doc of 4 MB, the output.doc is not generated correctly and I cannot open the file.
Does anyone have a better idea for writing the big file correctly? Or is there any other Java code that can delete "enter" in a big .doc file? Thank you very much.
package mydoc;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.usermodel.*;
import java.io.*;
public class test {
/*The ASCII of "Enter" is 13*/
private static final short ENTER_ASCII = 13;
public static void main(String[] args){
/* the location of the input file */
String fileName = "D:\\input.doc";
deleteEnter(fileName);
}
public static void deleteEnter(String fileName){
POIFSFileSystem fs = null;
try{
fs = new POIFSFileSystem(new FileInputStream(fileName));
HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++)
{
if (range.getParagraph(i).text().toCharArray()[0]==ENTER_ASCII)
{
range.getParagraph(i).delete();
}
}
FileOutputStream fos = null;
fos = new FileOutputStream(new File("D:\\output.doc"));
doc.write(fos);
fos.flush();
fos.close();
}//end try
catch (Exception e){
e.printStackTrace();
}//end catch
}
}

Depending on your needs, you could even use a macro.
You should also be able to use a regex like this: "^13{2,}", but that didn't work for me in Word 2010; see http://social.msdn.microsoft.com/Forums/en-US/0d921f97-b59a-48a9-a01a-20fe72f21c19/how-to-remove-blank-lines-?forum=worddev
Sub RemoveBlankLines()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = "^p^p"
.Replacement.Text = "^p"
.MatchWildcards = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub RemoveEnters()
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
'^11 or ^l New line
.Text = "^l"
.Replacement.Text = ""
End With
Selection.Find.Execute Replace:=wdReplaceAll
With Selection.Find
'^13 or ^p Carriage return/paragraph mark
.Text = "^p"
.Replacement.Text = ""
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Sub

"Enter" is the line separator, right? It's platform-dependent, so I propose the following solution:
String separator = System.getProperty("line.separator");
File file = new File(filename);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document);
String[] fileData = extractor.getParagraphText();
// Strip the platform line separator from every extracted paragraph
for (int i = 0; i < fileData.length; i++) {
if (fileData[i] != null)
fileData[i] = fileData[i].replace(separator, "");
}
And then you just have to write fileData out to a clean doc file.

In need of a clear example on how to get the word count of DOC and DOCX files

I am able to read a DOC file and get its word count, BUT it is wrong.
My code:
public class WordCounter {
public static void main(String[] args) throws Throwable {
processDOC();
}
private static void processDOC() throws Throwable {
File file = new File("/Users/yjiang/Desktop/whatever.doc");
File file2 = new File("/Users/yjiang/Desktop/Test.docx");
File file3 = new File("/Users/yjiang/Desktop/QB Tests 4-14-2014.xls");
File file4 = new File("/Users/yjiang/Desktop/QB Tests 4-14-2014.xlsx");
try {
FileInputStream fs = new FileInputStream(file);
POIFSFileSystem poifsFileSystem = new POIFSFileSystem(fs);
DirectoryEntry directoryEntry = poifsFileSystem.getRoot();
DocumentEntry documentEntry = (DocumentEntry) directoryEntry.getEntry(SummaryInformation.DEFAULT_STREAM_NAME);
DocumentInputStream dis = new DocumentInputStream(documentEntry);
PropertySet ps = new PropertySet(dis);
SummaryInformation si = new SummaryInformation(ps);
System.out.println(si.getWordCount());
} catch (Exception e) {
e.printStackTrace();
}
try {
HWPFDocument hwpfDocument = new HWPFDocument(new FileInputStream(file));
System.out.println(hwpfDocument.getDocProperties().getCWords()); // actually 71 words using word count in MSWord, returned 57.
System.out.println(hwpfDocument.getDocProperties().getCWordsFtnEnd());
XWPFDocument xwpfDocument = new XWPFDocument(new FileInputStream(file2)); // actually 71 words using word count in MSWord, returned 57.
System.out.println(xwpfDocument.getProperties().getExtendedProperties().getUnderlyingProperties().getWords());
System.out.println();
} catch (Exception e) {
e.printStackTrace();
}
}
}
"whatever.doc" has 71 words, when I run this, it returns only 57.
It seems I cannot use the same method to read DOCX files; when I run it, I get the following:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
Could someone provide an example?

I've also found that the built-in word counters give strange counts, but text extraction seems to be more reliable, so I use this solution:
public long getWordCount(File file) throws IOException {
POITextExtractor textExtractor;
if (file.getName().endsWith(".docx")) {
XWPFDocument doc = new XWPFDocument(new FileInputStream(file));
textExtractor = new XWPFWordExtractor(doc);
}
else if (file.getName().endsWith(".doc")) {
textExtractor = new WordExtractor(new FileInputStream(file));
}
else {
throw new IllegalArgumentException("Not a MS Word file.");
}
return Arrays.stream(textExtractor.getText().split("\\s+"))
.filter(s -> s.matches("^.*[\\p{L}\\p{N}].*$"))
.count();
}
The regex at the bottom can be adjusted if needed, but overall this one has proved fairly resilient.
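To see what the split-and-filter step actually counts, it can be tried on plain strings without POI. This is a minimal sketch using the same two regular expressions as above; the class and method names are invented for illustration.

```java
import java.util.Arrays;

final class WordCountSketch {

    /** Counts whitespace-separated tokens that contain at least one letter or digit. */
    static long countWords(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(s -> s.matches("^.*[\\p{L}\\p{N}].*$"))
                .count();
    }
}
```

Tokens made up only of punctuation, such as a standalone dash, are skipped, while a token like "Hello," still counts because it contains letters.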

Downloading a PDF file from server

I need to convert certain data to a PDF file. For this I have written the following code, which saves the data as TablePdf.pdf on the server (here the PDF file is saved in the C:\ directory).
public String generatePdf() throws Exception
{
Font font = FontFactory.getFont("Ms Dialog Light");
BaseFont pdfFont = font.getBaseFont();
// TODO Auto-generated method stub
HashMap inputMap = new HashMap();
inputMap.put(TableProperties.PDF_PATH, "c://TablePdf.pdf");
inputMap.put(TableProperties.PDF_TABLE_NAME, "Table");
inputMap.put(TableProperties.PDF_HEIGHT, "1000");
inputMap.put(TableProperties.PDF_WIDTH, "1500");
ArrayList<String> columnNameList = new ArrayList<String>();
ArrayList<String> dataList = new ArrayList<String>();
ArrayList<String> columnWidthList = new ArrayList<String>();
columnNameList.add("Col1");
columnNameList.add("Col2");
columnNameList.add("Col3");
columnNameList.add("Col4");
columnNameList.add("Col5");
columnWidthList.add("1");
columnWidthList.add("2");
columnWidthList.add("2");
columnWidthList.add("3");
columnWidthList.add("1");
for (int i = 0; i < 9; i++)
{
dataList.add("Id" + i);
dataList.add("Name is = " + Math.random() * i);
dataList.add("Field Value1 is = " + Math.random() * i);
dataList.add("Field Value2 is = " + Math.random() * i);
dataList.add("Field Value3 is = " + Math.random() * i);
}
inputMap.put(TableProperties.PDF_TABLE_COLUMN_NUMBER, "5");
inputMap.put(TableProperties.PDF_TABLE_COLUMN_NAME, columnNameList);
inputMap.put(TableProperties.PDF_TABLE_COLUMN_VALUES, dataList);
inputMap.put(TableProperties.PDF_TABLE_HEADER_WIDTH, columnWidthList);
inputMap.put(TableProperties.PDF_HEADER, " Hello\n\n");
inputMap.put(TableProperties.PDF_HEADER_FONT_NAME, pdfFont);
inputMap.put(TableProperties.PDF_HEADER_FONT_SIZE, "20.0");
inputMap.put(TableProperties.PDF_HEADER_ALIGNMENT, Element.ALIGN_LEFT);
inputMap.put(TableProperties.PDF_FOOTER, " Tata");
inputMap.put(TableProperties.PDF_FOOTER_FONT_NAME, pdfFont);
inputMap.put(TableProperties.PDF_FOOTER_FONT_SIZE, "9.0");
inputMap.put(TableProperties.PDF_FOOTER_ALIGNMENT, Element.ALIGN_RIGHT);
inputMap.put(TableProperties.PDF_TABLE_CELL_HEIGHT, "6.0");
inputMap.put(TableProperties.PDF_TABLE_HEADER_HEIGHT, "4.0");
inputMap.put(TableProperties.PDF_TABLE_ALTERNATE_BACKGROUND_COLOR, "Y");
inputMap.put(TableProperties.PDF_TABLE_BACKGROUND_COLOR, BaseColor.CYAN);
inputMap.put(TableProperties.PDF_TABLE_CELL_ALIGNMENT, new Integer(Element.ALIGN_LEFT));
inputMap.put(TableProperties.PDF_TABLE_FONT_NAME, pdfFont);
inputMap.put(TableProperties.PDF_TABLE_FONT_SIZE, "6.0");
inputMap.put(TableProperties.PDF_TABLE_HEADER_ALIGNMENT, new Integer(Element.ALIGN_CENTER));
inputMap.put(TableProperties.PDF_TABLE_HEADER_BACKGROUND_COLOR, BaseColor.GRAY);
inputMap.put(TableProperties.PDF_TABLE_HEADER_FONT_NAME, FontFactory.getFont("Times-Roman").getBaseFont());
inputMap.put(TableProperties.PDF_TABLE_HEADER_FONT_SIZE, "6.0");
CreateTable crtTbl = new CreateTable();
crtTbl.createTable(inputMap);
}
Now I need to allow the client to download the PDF file.
--------------------EDITED--------------------------------
Below is my JSP code to download the PDF file. It gives no error in the console, but the file is not downloading.
<%@ page import="java.util.*,java.io.*"%>
<%@ page language="java"%>
<%
try
{
response.setContentType ("application/pdf");
//set the header and also the Name by which user will be prompted to save
response.setHeader ("Content-Disposition", "attachment;filename=TablePdf.pdf");
File f = new File ("C:\\TablePdf.pdf");
InputStream inputStream = new FileInputStream(f);
ServletOutputStream servletOutputStream = response.getOutputStream();
int bit = 256;
int i = 0;
try
{
while ((bit) >= 0)
{
bit = inputStream.read();
servletOutputStream.write(bit);
}
System.out.println("" +bit);
}
catch (Exception ioe)
{
ioe.printStackTrace(System.out);
}
servletOutputStream.flush();
//outs.close();
inputStream.close();
}
catch(Exception e)
{
}
%>

There are many options. Two of them:
Install a simple Apache server: you store the PDF files under htdocs, and they will be accessible.
Use Tomcat (or another servlet container), and write a servlet that reads files from the directory where they are stored and streams them for download. In short, this is done by transferring their bytes from the FileInputStream to the response.getOutputStream(). Also set the Content-Disposition header accordingly.
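The byte-transfer step in the second option can be sketched with plain java.io streams. The helper below is an illustration rather than servlet code: in a servlet, the output stream would be the one returned by response.getOutputStream(), and the class and method names here are hypothetical. It reads into a buffer and writes only the bytes actually read, so the end-of-stream value (-1) is never written to the response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; only the n bytes read are written.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```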

How to extract Lotus Notes database icon?

I have tried to extract the Lotus Notes database icon using the DXL exporter, but without success. The resulting file is corrupt and cannot be opened by an image viewer.
How can I extract the Lotus Notes database icon using Java?
private String extractDatabaseIcon() {
String tag = "";
String idfile = "";
String password = "";
String dbfile = "";
NotesThread.sinitThread();
Session s = NotesFactory.createSessionWithFullAccess();
s.createRegistration().switchToID(idfile, password);
Database d = s.getDatabase("", dbfile);
NoteCollection nc = d.createNoteCollection(false);
nc.setSelectIcon(true);
nc.buildCollection();
String noteId = nc.getFirstNoteID();
int counter = 0;
while (noteId != null) {
counter++;
try {
Document doc = d.getDocumentByID(noteId);
DxlExporter dxl = s.createDxlExporter();
String xml = dxl.exportDxl(doc);
xml = xml.substring(xml.indexOf("<note "));
org.jsoup.nodes.Document jdoc = Jsoup.parse(xml);
Element ele = jdoc.select("rawitemdata").first();
String raw = ele.text().trim();
String temp = System.getProperty("java.io.tmpdir") + UUID.randomUUID().toString() + "\\";
File file = new File(temp);
file.mkdir();
String filename = temp + UUID.randomUUID().toString().replaceAll("-", "") + ".gif";
byte[] buffer = decode(raw.getBytes());
FileOutputStream fos = new FileOutputStream(filename);
fos.write(buffer);
fos.close();
tag = filename;
} catch (Exception e) {
logger.error("", e);
}
if (counter >= nc.getCount()) {
noteId = null;
} else {
noteId = nc.getNextNoteID(noteId);
}
}
return tag;
}
private byte[] decode(byte[] b) throws Exception {
ByteArrayInputStream bais = new ByteArrayInputStream(b);
InputStream b64is = MimeUtility.decode(bais, "base64");
// A single read() may return fewer bytes than are available, so loop until end of stream.
ByteArrayOutputStream res = new ByteArrayOutputStream();
byte[] tmp = new byte[4096];
int n;
while ((n = b64is.read(tmp)) != -1) {
res.write(tmp, 0, n);
}
return res.toByteArray();
}

It is not even a bitmap, it is an icon. The format you can find here:
http://www.daubnet.com/formats/ICO.html
I managed to do this, a long time ago, in LotusScript. My code was based on an earlier version of this page:
http://www2.tcl.tk/11202
For the icon itself, you only have to open one document:
Document doc = db.getDocumentByID("FFFF8010");
DxlExporter exporter = session.createDxlExporter();
exporter.setConvertNotesBitmapsToGIF(false);
String outputXML = exporter.exportDxl(doc);
and then parse the XML to find the rawitemdata from the IconBitmap item, as you did in your original code.

I'm not sure what the format is. As far as I know, it's a 16-color bitmap, but not in the standard BMP file format. And it's definitely not GIF format, but you can tell the DxlExporter to convert it. The default is to leave it native, so you need to add this to your code before you export:
dxl.setConvertNotesBitmapsToGIF(true);
