Apache pdfbox .doc to .pdf conversion


I'm trying to convert .doc to .pdf, but I got this exception and I don't know how to fix it.
java.io.IOException: Missing root object specification in trailer
at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2042)
This is where the exception is thrown:
PDDocument pdfDocument = PDDocument.load(convertDocToPdf(documentInputStream));
Here is my conversion method:
private byte[] convertDocToPdf(InputStream documentInputStream) throws Exception {
Document document = null;
WordExtractor we = null;
ByteArrayOutputStream out = null;
byte[] documentByteArray = null;
try {
document = new Document();
POIFSFileSystem fs = new POIFSFileSystem(documentInputStream);
HWPFDocument doc = new HWPFDocument(fs);
we = new WordExtractor(doc);
out = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, out);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
document.add(new Paragraph(paragraphs[i]));
}
documentByteArray = out.toByteArray();
} catch (Exception ex) {
ex.printStackTrace(System.out);
throw new Exception(STATE.FAILED_CONVERSION.name());
} finally {
document.close();
try {
we.close();
out.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return documentByteArray;
}

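You are using iText classes, and you call
documentByteArray = out.toByteArray();
before you finish the document with
document.close();
iText only writes the cross-reference table and the trailer when the document is closed, so documentByteArray contains an incomplete PDF, which is what PDFBox complains about. Close the document first, then call out.toByteArray().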
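
The ordering problem can be reproduced with the JDK alone. The sketch below is an analogy, not iText code: it assumes GZIPOutputStream as a stand-in for PdfWriter, because it also writes its trailer only when it is closed. The class and method names (CloseBeforeSnapshot, snapshotBeforeClose, snapshotAfterClose, isReadable) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: GZIPOutputStream stands in for iText's PdfWriter.
class CloseBeforeSnapshot {

    /** Wrong order: copies the buffer while the writer is still open. */
    static byte[] snapshotBeforeClose(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] snapshot;
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
            snapshot = out.toByteArray(); // too early: the trailer is not written yet
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return snapshot;
    }

    /** Right order: closes the writer first, then copies the buffer. */
    static byte[] snapshotAfterClose(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer is closed, so the output is complete
    }

    /** Returns true if the bytes can be read back to the end without an error. */
    static boolean isReadable(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // drain the stream; a truncated stream throws an IOException
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReadable(snapshotBeforeClose("hello"))); // false
        System.out.println(isReadable(snapshotAfterClose("hello")));  // true
    }
}
```

In the question's method the equivalent change would be to call document.close() inside the try block, before out.toByteArray(), rather than only in the finally block.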

Related

PDF File generated by PdfWriter is showing blank page

I have code that merges two PDF files using the OpenPDF library:
public byte[] mergePDF(byte[] pdf1, byte[] pdf2) throws IOException {
ByteArrayOutputStream os = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter writer;
try {
writer = PdfWriter.getInstance(document, os);
} catch (DocumentException e) {
throw new IllegalStateException(e);
}
document.open();
PdfContentByte cb = writer.getDirectContent();
PdfReader pdfReader1 = new PdfReader(pdf1);
PdfReader pdfReader2 = new PdfReader(pdf2);
for (int i = 0; i < pdfReader1.getNumberOfPages(); i++) {
PdfImportedPage page = writer.getImportedPage(pdfReader1, i + 1);
document.setPageSize(pdfReader1.getPageSize(i + 1));
document.newPage();
cb.addTemplate(page, 0, 0);
}
for (int i = 0; i < pdfReader2.getNumberOfPages(); i++) {
PdfImportedPage page = writer.getImportedPage(pdfReader2, i + 1);
document.setPageSize(pdfReader2.getPageSize(i + 1));
document.newPage();
cb.addTemplate(page, 0, 0);
}
document.close();
return os.toByteArray();
}
The merged PDF shows a blank page where a page from the second PDF should be, although that page does show when opened with Adobe Acrobat, Firefox, and the Ubuntu document viewer. Does anyone know what the issue is, or whether some configuration is missing in my code?

downloading blank pdf pages with itext and ACJRuntime.jar

I have an XML file and a .jod template file, and I'm trying to download a PDF generated with itext.jar and ACJRuntime.jar. It works with Java 1.6 and WebSphere 7, but after migrating to Java 8 and WebSphere 8 it only downloads blank white pages in PDF format.
Below is the code that I'm using to produce the PDF.
public ByteArrayOutputStream generatePdf(){
ArrayList<Object> bean = new ArrayList<Object>();
BeanOne beanOne = new BeanOne();
beanOne.setId(1);
beanOne.setName("Sai");
beanOne.setPhone("1234567890");
BeanOne beanOne1 = new BeanOne();
beanOne1.setId(2);
beanOne1.setName("Ram");
beanOne1.setPhone("9876543210");
bean.add(beanOne);
bean.add(beanOne1);
AppDataHandler ds = new AppDataHandler();
String tableName = "First Talbe";
Object obj= bean.get(0);
Vector fields = new Vector();
fields.addElement("id = getId");
fields.addElement("name = getName");
fields.addElement("phone = getPhone");
ds.registerObjectAsTable(obj, tableName, fields);
ds.registerDataSet(tableName, bean);
FileInputStream reportTemplateInputStream = new FileInputStream(new File("/template.jod"));
ACJEngine acjEngine = new ACJEngine();
acjEngine.readTemplate(reportTemplateInputStream);
TemplateManager templateManager = acjEngine.getTemplateManager();
templateManager.setLabel("ID", "ID");
templateManager.setLabel("NAME", "NAME");
templateManager.setLabel("PHONE", "PHONE");
acjEngine.setX11GfxAvailibility(false);
acjEngine.setDataSource(ds);
ACJOutputProcessor ec = new ACJOutputProcessor();
IViewerInterface ivi = acjEngine.generateReport();
ByteArrayOutputStream generatedPDFStream = new ByteArrayOutputStream();
ec.setPDFProperty("OutputStream", generatedPDFStream);
ec.generatePDF();
reportTemplateInputStream.close();
Object[] pdfFromActuateArray = new Object[1];
pdfFromActuateArray[0] = generatedPDFStream.toByteArray();
return mergePdfsUsingItext(pdfFromActuateArray);
}
private ByteArrayOutputStream mergePdfsUsingItext(Object[] documents) throws com.itextpdf.text.DocumentException {
ByteArrayOutputStream content = new ByteArrayOutputStream();
int f;
byte[] byteDoc = null;
for (f = 0; f < documents.length; ++f) {
byteDoc = (byte[]) documents[f];
if (byteDoc != null)
break;
}
PdfReader reader = new PdfReader(byteDoc);
int n = reader.getNumberOfPages();
Document document = new Document(reader.getPageSizeWithRotation(1));
PdfWriter writer = PdfWriter.getInstance(document, content);
document.open();
PdfContentByte cb = writer.getDirectContent();
PdfImportedPage page;
int rotation;
while (f < documents.length) {
int i = 0;
while (i < n) {
i++;
document.setPageSize(reader.getPageSizeWithRotation(i));
document.newPage();
page = writer.getImportedPage(reader, i);
cb.addTemplate(page, 1f, 0, 0, 1f, 0, 0);
}
f++;
if (f < documents.length) {
reader = new PdfReader((byte[]) documents[f]);
n = reader.getNumberOfPages();
}
}
document.close();
return content;
}
With the above code I get a ByteArrayOutputStream, which I write to the JSP page with content type application/pdf.
The resulting PDF is completely blank. I hope someone can explain the issue. Please also suggest any good alternative to this code.
Thanks in advance.

why file is not deleted inspite of using delete function?

for (int i = 0; i < listOfTempFiles.length; i++) {
for (int j = 0; j < listOfFAQFiles.length; j++) {
if (listOfTempFiles[i].isFile() && listOfTempFiles[i].length() > 0) {
if (listOfTempFiles[i].getName().toLowerCase().contains(".pdf")) {
if (listOfTempFiles[i].getName().substring(listOfTempFiles[i].getName().lastIndexOf("#") + 1).equals(listOfFAQFiles[j].getName())) {
try {
List<InputStream> list = new ArrayList<InputStream>();
list.add(new FileInputStream(listOfTempFiles[i]));
list.add(new FileInputStream(listOfFAQFiles[j]));
System.out.println(listOfTempFiles[i].getName() + "with FAQ: " + listOfFAQFiles[j].getName());
int iend = listOfTempFiles[i].getName().lastIndexOf("#");
if (iend != -1) {
outputFilename = listOfTempFiles[i].getName().substring(0, iend);
}
OutputStream out = new FileOutputStream(new File(finalPDFParh + "/" + outputFilename + ".pdf"));
doMerge(list, out);
boolean flag=listOfTempFiles[i].delete();
System.out.println("Flag----->"+flag);
list.clear();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (DocumentException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void doMerge(List<InputStream> list, OutputStream outputStream)
throws DocumentException, IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
PdfContentByte cb = writer.getDirectContent();
for (InputStream in : list) {
PdfReader reader = new PdfReader(in);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
document.newPage();
//import the page from source pdf
PdfImportedPage page = writer.getImportedPage(reader, i);
//add the page to the destination pdf
cb.addTemplate(page, 0, 0);
}
}
outputStream.flush();
document.close();
outputStream.close();
}
I want to delete the original file from listOfTempFiles after it has been merged with the FAQ file. The doMerge method merges the PDFs added to the list. I call delete() on the file, but it is not deleted. What can I do?

How to add a page number to the output pdf when merging two pdfs?

I am using the following code to merge two pdfs:
File firstPdfFile = new File("firstPdf.pdf");
File secondPdfFile = new File("secondPdf.pdf");
PDFMergerUtility merger = new PDFMergerUtility();
merger.addSource(firstPdfFile);
merger.addSource(secondPdfFile);
String pdfPath = "PdfFile.pdf";
OutputStream bout2 = new BufferedOutputStream(new FileOutputStream(pdfPath));
merger.setDestinationStream(bout2);
merger.mergeDocuments();
File pdfFile = new File(pdfPath);
The merged PDF is generated correctly, but I want to add page numbers to it.

Try this code (it uses the PDFBox 1.8 API):
File firstPdfFile = new File("firstPdf.pdf");
File secondPdfFile = new File("secondPdf.pdf");
PDFMergerUtility merger = new PDFMergerUtility();
merger.addSource(firstPdfFile);
merger.addSource(secondPdfFile);
String pdfPath = "PdfFile.pdf";
OutputStream bout2 = new BufferedOutputStream(new FileOutputStream(pdfPath));
merger.setDestinationStream(bout2);
merger.mergeDocuments();
PDDocument doc = null;
try {
URL file = new URL("file:///PdfFile.pdf");
doc = PDDocument.load(file);
List<?> allPages = doc.getDocumentCatalog().getAllPages();
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 36.0f;
for (int i = 0; i < allPages.size(); i++) {
PDPage page = (PDPage) allPages.get(i);
PDPageContentStream footercontentStream = new PDPageContentStream(doc, page, true, true);
footercontentStream.beginText();
footercontentStream.setFont(font, fontSize);
footercontentStream.moveTextPositionByAmount((PDPage.PAGE_SIZE_A4.getUpperRightX() / 2), (PDPage.PAGE_SIZE_A4.getLowerLeftY()));
footercontentStream.drawString(String.valueOf(i + 1));
footercontentStream.endText();
footercontentStream.close();
}
doc.save("PdfFile.pdf");
} finally {
if (doc != null) {
doc.close();
}
}

Try the code below for PDFBox 2.0:
public class PageNumberExample {
final boolean isCompress = false;
final boolean isContextReset = true;
public static void main(String[] args) throws IOException {
new PageNumberExample().addPageNumber("merged PDF path");
}
public void addPageNumber(String pdfPath) throws IOException {
File mergedPdfFile = new File(pdfPath);
PDDocument document = PDDocument.load(mergedPdfFile);
int totalPage = document.getNumberOfPages();
for(int i=0; i<totalPage; i++) {
PDPage page = document.getPage(i);
PDPageContentStream stream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, isCompress, isContextReset);
stream.setNonStrokingColor(Color.BLACK);
stream.beginText();
stream.setFont(PDType1Font.COURIER, 10);
stream.newLineAtOffset(100, 100); //Set position where you want to print page number.
stream.showText("Page " + (i+1) + " of " + totalPage);
stream.endText();
stream.close();
}
document.save(pdfPath);
document.close();
}
}

excel file not generating after UTF-8 encoding chosen

This part of my code was creating the xls file successfully
FileOutputStream fileOut = new FileOutputStream("c:\\Decrypted.xls");
wb.write(fileOut);
fileOut.close();
when another part of the code, which runs before the code above, had this statement
in = new ByteArrayInputStream(theCell_00.getBytes(""));
But when I changed it to
in = new ByteArrayInputStream(theCell_00.getBytes("UTF-8"));
this part
FileOutputStream fileOut = new FileOutputStream("c:\\Decrypted.xls");
wb.write(fileOut);
fileOut.close();
no longer generates any xls file.
I need to change the encoding to UTF-8, as I did in the ByteArrayInputStream line. What should I do so that the code still generates the xls file?
In case you need it, both parts are taken from this function.
public void getExcel() throws Exception {
try {
ByteArrayInputStream in = null;
FileOutputStream out = null;
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet sheet = wb.createSheet("new sheet");
/*
* KeyGenerator kgen = KeyGenerator.getInstance("AES"); kgen.init(128); SecretKey key =
* kgen.generateKey(); byte[] encoded = key.getEncoded();
*
* IOUtils.write(encoded, new FileOutputStream(new
* File("C:\\Users\\abc\\Desktop\\key.txt")));
*/
FileInputStream fin = new FileInputStream("C:\\key.txt");
DataInputStream din = new DataInputStream(fin);
byte b[] = new byte[16];
din.read(b);
InputStream excelResource = new FileInputStream(path);
Workbook rwb = Workbook.getWorkbook(excelResource);
int sheetCount = rwb.getNumberOfSheets();
Sheet rs = rwb.getSheet(0);
int rows = rs.getRows();
int cols = rs.getColumns();
for (int i = 0; i < rows; i++) {
for (int j = 0; j < Col.length; j++) {
String theCell_00 = rs.getCell(j, i).getContents();
System.out.println("the Cell Content : " + theCell_00);
in = new ByteArrayInputStream(theCell_00.getBytes(""));
out = new FileOutputStream("c:\\Decrypted.txt");
try {
// System.out.println(b);
SecretKey key1 = new SecretKeySpec(b, "AES");
// Create encrypter/decrypter class
AESDecrypter encrypter = new AESDecrypter(key1);
encrypter.encrypt(new ByteArrayInputStream(theCell_00.getBytes()),
new FileOutputStream("temp.txt"));
// Decrypt
// encrypter.encrypt(,new FileOutputStream("Encrypted.txt"));
encrypter.decrypt(in, out);
try {
if (out != null)
out.close();
} finally {
if (in != null)
in.close();
}
// encrypter.decrypt(new
// ByteArrayInputStream(theCell_00.getBytes(Latin_1)),new
// FileOutputStream("c:\\Decrypted.txt"));
String filename = "c:\\Decrypted.txt";
BufferedReader bufferedReader = null;
try {
// Construct the BufferedReader object
bufferedReader = new BufferedReader(new FileReader(filename));
// System.out.println(bufferedReader.readLine());
String line = null;
while ((line = bufferedReader.readLine()) != null) {
// Process the data, here we just print it out
/*
* HSSFWorkbook wb = new HSSFWorkbook(); HSSFSheet sheet =
* wb.createSheet("new sheet"); HSSFRow row = sheet.createRow(2);
*/
// System.out.println(i);
HSSFRow row = sheet.createRow(i);
int s_col = 0;
row.createCell(s_col).setCellValue(line);
// s_col++;
// row.createCell(1).setCellValue(new Date());
FileOutputStream fileOut = new FileOutputStream("c:\\Decrypted.xls");
wb.write(fileOut);
fileOut.close();
// System.out.println(line);
}
} catch (FileNotFoundException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
} finally {
// Close the BufferedReader
try {
if (bufferedReader != null)
bufferedReader.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
rwb.close();
} catch (Exception ex) {
ex.printStackTrace();
ex.getMessage();
}
}

What data types are expected by the call to AESDecrypter.decrypt? Does it have to take in a FileOutputStream object? Or can you pass in a Writer or other OutputStream?
I normally do something like this to write UTF-8 output:
Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:\\Decrypted.txt"), "UTF-8"));
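
A minimal, JDK-only sketch of that Writer pattern follows. It assumes StandardCharsets.UTF_8 (Java 7 and later) in place of the "UTF-8" string, which avoids the checked UnsupportedEncodingException, and it uses a ByteArrayOutputStream as a stand-in for the FileOutputStream so the result can be inspected. The names Utf8WriterSketch, writeUtf8, and encodeUtf8 are illustrative only and do not come from the question.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing the OutputStreamWriter-with-charset pattern.
class Utf8WriterSketch {

    /** Writes the text to the target stream as UTF-8, then closes the stream. */
    static void writeUtf8(OutputStream target, String text) {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(target, StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper that returns the encoded bytes for inspection. */
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeUtf8(buffer, text); // the writer is closed inside, so the buffer is complete
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // contains one non-ASCII character
        byte[] bytes = encodeUtf8(text);
        System.out.println(bytes.length); // 6: the accented letter takes two bytes
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

To write to a file instead, pass new FileOutputStream("c:\\Decrypted.txt") as the target. Whether AESDecrypter.decrypt accepts anything other than an OutputStream is still the open question above.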
