How to convert an html String to a PDF InputStream?

How to convert an html String to a PDF InputStream? - java

If we have string with a content of a html page, how can we convert it to a InputStream made after transform this string to a pdf document?
I'm trying to use iText with XMLWorkerHelper, and this following code works, but the problem is I don't want the output on a file. I have tried several variations in order to get the result on a InputStream that I could convert to a Primefaces StreamedContent but no success. How we can do it?
Is there another technique that we can use to solve this problem?
The motivation to this is use xhtml files wich is already rendered and output it as a pdf to be downloaded by the user.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream("results/loremipsum.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream("/html/loremipsum.html"));
document.close();

If you need an InputStream from which some other code can read the PDF your code produces, you can simply create the PDF using a byte array output stream and thereafter wrap the byte array from that stream in a byte array input stream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, baos);
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream("/html/loremipsum.html"));
document.close();
ByteArrayInputStream pdfInputStream = new ByteArrayInputStream(baos.toByteArray());
You can optimize this a bit by creating and processing the PDF in different threads and using a PipedOutputStream and a PipedInputStream instead.

Related

ă ș ț characters missing from pdf generated from html with PdfWriter

I am trying to convert some html content to a pdf using the itext PdfWriter, like this:
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
InputStream stream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"));
document.close();
but the ă ș ț charaters are missing from the generated pdf. I have tried setting the encoding or the font, but with no luck. What I tried was to use a font provider and set it as a param to the parseXHtml method.
I set the encoding, but nothing changed.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
fontProvider.setUseUnicode(true);
fontProvider.defaultEncoding = BaseFont.CP1257;
I also tried setting the font, but it was not applied to the pdf.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(PATH_TO_TTF_FONT_FILE_HOSTED_ON_S3);
And then set the param for parseXHtml.
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"), fontProvider);
Is there any way I could use the PdfWriter to convert all characters correctly from html to pdf?

merge many pdf files into one pdf files in web application java

I have many pdf files and I have to merge all pdf into one big pdf file and render it into browser.I am using itext. Using this, I am able to merge pdf files into one file into disk but I cannot merge into browser and there is only last pdf in browser..following is my code.. please help me on this.
Thanks in advance.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
ServletOutputStream servletOutPutStream = response.getOutputStream();;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();;
InputStream is=null;
List<InputStream> inputPdfList = new ArrayList<InputStream>();
System.err.println(imageMap.size());
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
response.setContentType("application/pdf");
response.setContentLength(byteArrayOutputStream.size());
System.out.println(inputPdfList.size()+""+inputPdfList.toString());
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
System.out.println("Pdf files merged successfully.");

There are numerous errors in your code:
Only write to the response output stream what you want to return to the browser
Your code writes a wild collection of data to the response output stream:
ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
[...]
byteArrayOutputStream.writeTo(response.getOutputStream());
[...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]
servletOutPutStream.flush();
document.close();
servletOutPutStream.close();
This results in many copies of the imageMap elements to be written there and the merged file only to be added thereafter.
What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?
Thus, please only write the merged PDF to the response output stream.
Don't write a wrong content length
It is a good idea to write the content length to the response... but only if you use the correct value!
In your code you write a content length:
response.setContentLength(byteArrayOutputStream.size());
but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.
Thus, please do not add false headers to the response.
Don't mangle your input data
In the loop
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
you take byte arrays which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...
Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.
Thus, please start or end loops like this with a reset of the byteArrayOutputStream.
(Actually you don't need that loop at all, the PdfReader has a constructor which can immediately take a byte[], no need to wrap it in a byte stream.)
Don't merge PDFs using a plain PdfWriter, use a PdfCopy
You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answer on stack overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy.
Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.
Don't flush or close streams only because you can
You finalize the response output by closing numerous streams:
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close that because you already close it in the servletOutPutStream variable.
Thus, please remove unnecessary calls like this.

//Suppose we want to merge one pdf with another main pdf
InputStream is1 = null;
if (file1 != null) {
FileInputStream fis1 = new FileInputStream(file1);
byte[] file1Data = new byte[(int) file1.length()];
fis1.read(file1Data);
is1 = new java.io.ByteArrayInputStream(file1Data);
}
//
InputStream mainContent = <ur main content>
org.apache.pdfbox.pdmodel.PDDocument mergedPDF = new org.apache.pdfbox.pdmodel.PDDocument();
org.apache.pdfbox.pdmodel.PDDocument mainDoc = org.apache.pdfbox.pdmodel.PDDocument.load(mainContent);
org.apache.pdfbox.multipdf.PDFMergerUtility merger = new org.apache.pdfbox.multipdf.PDFMergerUtility();
merger.appendDocument(mergedPDF, mainDoc);
PDDocument doc1 = null;
if (is1 != null) {
doc1 = PDDocument.load(is1);
merger.appendDocument(mergedPDF, doc1);
//1st file appended to main pdf");
}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mergedPDF.save(baos);
//Now either u save it here or convert into InputStream if u want
ByteArrayInputStream mergedInputStream = new ByteArrayInputStream(baos.toByteArray());

Convert a PDF with forms to a PDF with text only (preserve data) using iText

I have multiple PDFs that get populated with multiple records (a.pdf,b.pdf,c[0-9].pdf,d[0-9].pdf,ez.pdf) using acroforms and pdfbox.
The resulting files (aflat.pdf,bflat.pdf,c[0-9]flat.pdf,d[0-9]flat.pdf,ezflat.pdf) should have their forms(dictionaries and whatever adobe uses) removed but the fields filled as raw text saved on the pdf (setReadOnly is not what I want!).
PdfStamper can only remove fields without saving their content but I've found some references to PdfContentByte as a way to save the content. Alas, the documentation is too brief to understand how I should do this.
As a last resort I could use FieldPosition to write directly on the PDF. Has anyone ever encountered such problem? How do I solve it?
UPDATE: Saving a single page of b.pdf yields a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document solved the issue.
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
//importing the page will corrupt the fields
/*wrong approach*/doc.importPage((PDPage)pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
/*wrong approach*/doc.save(stream);
//save the whole document instead
pdfDocuments.get(0).save(stream);//<---right approach
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
stamper.setFormFlattening(true);
stamper.close();
}

Use PdfStamper.setFormFlattening(true) to get rid of the fields and write them as content.

Always use the whole page when working with acroforms
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
//importing the page will corrupt the fields
doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
doc.save(stream);
//save the whole document instead
pdfDocuments.get(0).save(stream);
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
stamper.setFormFlattening(true);
stamper.close();
}

Save itext pdf as blob without physical existence.

I am using this code to generate PDF using iText. First it creates HTML to PDF after that it converts that PDF in byte array or in BLOB or in byte array.
I dont want to create any physical stores of pdf on my server. First i want to convert HTML to blob of PDF using itext, And after that i want to store that blob in my DB(Stores in DB i will done).
String userAccessToken=requests.getSession()
.getAttribute("access_token").toString();
Document document = new Document(PageSize.LETTER);
String name="/pdf/invoice.pdf";
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new FileOutputStream(requests.getSession().getServletContext().getRealPath("")+"/assets"+name));
document.open();
document.addAuthor("Real Gagnon");
document.addCreator("Real's HowTo");
document.addSubject("Thanks for your support");
document.addTitle("Please read this ");
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
//data is an html string
String str = data;
worker.parseXHtml(pdfWriter, document, new StringReader(str));
document.close();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document, byteArrayOutputStream);
byte[] pdfBytes = byteArrayOutputStream.toByteArray();
link=name;
System.out.println("Byte array is "+pdfBytes);
PROBLEM:- Convert html to pdf BLOB using itext, Without physical existence of PDF.

The other answer to this question is almost correct, but not quite.
You can use any OutputStream when you create a PdfWriter. If you want to create a file entirely in memory, you can use a ByteArrayOutputStream like this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter.getInstance(document, baos);
document.open();
// add stuff
document.close();
byte[] pdf = baos.toByteArray();
In short: you first create a ByteArrayOutputStream, you pass this OutputStream to the PdfWriter and after the document is closed, you can get the bytes from the OutputStream.
(In the other answer, there was no way to retrieve the bytes. Also: it is important that you don't try to retrieve the bytes before the document is closed.)

Write into a ByteArrayOutputStream (instead of a FileOutputStream):
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new ByteArrayOutputStream());

binary pdf byte[] to com.lowagie.text.Document

I had an binary pdf(Byte []). I would like to convert it to a com.lowagie.text.Document.is there anyway to convert it without losing any information on it.Thanks
this is i tried, and to not sure how to move further.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
PdfReader reader = new PdfReader(byteBuffer);

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to convert an html String to a PDF InputStream? - java

Related

ă ș ț characters missing from pdf generated from html with PdfWriter

merge many pdf files into one pdf files in web application java

Convert a PDF with forms to a PDF with text only (preserve data) using iText

Save itext pdf as blob without physical existence.

binary pdf byte[] to com.lowagie.text.Document

Categories

Resources