Save itext pdf as blob without physical existence. - java

I am using this code to generate PDF using iText. First it creates HTML to PDF after that it converts that PDF in byte array or in BLOB or in byte array.
I dont want to create any physical stores of pdf on my server. First i want to convert HTML to blob of PDF using itext, And after that i want to store that blob in my DB(Stores in DB i will done).
String userAccessToken=requests.getSession()
.getAttribute("access_token").toString();
Document document = new Document(PageSize.LETTER);
String name="/pdf/invoice.pdf";
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new FileOutputStream(requests.getSession().getServletContext().getRealPath("")+"/assets"+name));
document.open();
document.addAuthor("Real Gagnon");
document.addCreator("Real's HowTo");
document.addSubject("Thanks for your support");
document.addTitle("Please read this ");
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
//data is an html string
String str = data;
worker.parseXHtml(pdfWriter, document, new StringReader(str));
document.close();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document, byteArrayOutputStream);
byte[] pdfBytes = byteArrayOutputStream.toByteArray();
link=name;
System.out.println("Byte array is "+pdfBytes);
PROBLEM:- Convert html to pdf BLOB using itext, Without physical existence of PDF.

The other answer to this question is almost correct, but not quite.
You can use any OutputStream when you create a PdfWriter. If you want to create a file entirely in memory, you can use a ByteArrayOutputStream like this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter.getInstance(document, baos);
document.open();
// add stuff
document.close();
byte[] pdf = baos.toByteArray();
In short: you first create a ByteArrayOutputStream, you pass this OutputStream to the PdfWriter and after the document is closed, you can get the bytes from the OutputStream.
(In the other answer, there was no way to retrieve the bytes. Also: it is important that you don't try to retrieve the bytes before the document is closed.)

Write into a ByteArrayOutputStream (instead of a FileOutputStream):
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new ByteArrayOutputStream());

Related

ă ș ț characters missing from pdf generated from html with PdfWriter

I am trying to convert some html content to a pdf using the itext PdfWriter, like this:
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
InputStream stream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"));
document.close();
but the ă ș ț charaters are missing from the generated pdf. I have tried setting the encoding or the font, but with no luck. What I tried was to use a font provider and set it as a param to the parseXHtml method.
I set the encoding, but nothing changed.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
fontProvider.setUseUnicode(true);
fontProvider.defaultEncoding = BaseFont.CP1257;
I also tried setting the font, but it was not applied to the pdf.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(PATH_TO_TTF_FONT_FILE_HOSTED_ON_S3);
And then set the param for parseXHtml.
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"), fontProvider);
Is there any way I could use the PdfWriter to convert all characters correctly from html to pdf?

merge many pdf files into one pdf files in web application java

I have many pdf files and I have to merge all pdf into one big pdf file and render it into browser.I am using itext. Using this, I am able to merge pdf files into one file into disk but I cannot merge into browser and there is only last pdf in browser..following is my code.. please help me on this.
Thanks in advance.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
ServletOutputStream servletOutPutStream = response.getOutputStream();;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();;
InputStream is=null;
List<InputStream> inputPdfList = new ArrayList<InputStream>();
System.err.println(imageMap.size());
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
response.setContentType("application/pdf");
response.setContentLength(byteArrayOutputStream.size());
System.out.println(inputPdfList.size()+""+inputPdfList.toString());
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
System.out.println("Pdf files merged successfully.");
There are numerous errors in your code:
Only write to the response output stream what you want to return to the browser
Your code writes a wild collection of data to the response output stream:
ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
[...]
byteArrayOutputStream.writeTo(response.getOutputStream());
[...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]
servletOutPutStream.flush();
document.close();
servletOutPutStream.close();
This results in many copies of the imageMap elements to be written there and the merged file only to be added thereafter.
What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?
Thus, please only write the merged PDF to the response output stream.
Don't write a wrong content length
It is a good idea to write the content length to the response... but only if you use the correct value!
In your code you write a content length:
response.setContentLength(byteArrayOutputStream.size());
but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.
Thus, please do not add false headers to the response.
Don't mangle your input data
In the loop
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
you take byte arrays which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...
Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.
Thus, please start or end loops like this with a reset of the byteArrayOutputStream.
(Actually you don't need that loop at all, the PdfReader has a constructor which can immediately take a byte[], no need to wrap it in a byte stream.)
Don't merge PDFs using a plain PdfWriter, use a PdfCopy
You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answer on stack overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy.
Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.
Don't flush or close streams only because you can
You finalize the response output by closing numerous streams:
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close that because you already close it in the servletOutPutStream variable.
Thus, please remove unnecessary calls like this.
//Suppose we want to merge one pdf with another main pdf
InputStream is1 = null;
if (file1 != null) {
FileInputStream fis1 = new FileInputStream(file1);
byte[] file1Data = new byte[(int) file1.length()];
fis1.read(file1Data);
is1 = new java.io.ByteArrayInputStream(file1Data);
}
//
InputStream mainContent = <ur main content>
org.apache.pdfbox.pdmodel.PDDocument mergedPDF = new org.apache.pdfbox.pdmodel.PDDocument();
org.apache.pdfbox.pdmodel.PDDocument mainDoc = org.apache.pdfbox.pdmodel.PDDocument.load(mainContent);
org.apache.pdfbox.multipdf.PDFMergerUtility merger = new org.apache.pdfbox.multipdf.PDFMergerUtility();
merger.appendDocument(mergedPDF, mainDoc);
PDDocument doc1 = null;
if (is1 != null) {
doc1 = PDDocument.load(is1);
merger.appendDocument(mergedPDF, doc1);
//1st file appended to main pdf");
}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mergedPDF.save(baos);
//Now either u save it here or convert into InputStream if u want
ByteArrayInputStream mergedInputStream = new ByteArrayInputStream(baos.toByteArray());

how to set attributes for existing pdf that contains only images using java itext?

I would like to set attributes to pdf before uploading it into a server.
Document document = new Document();
try
{
OutputStream file = new FileOutputStream({Localpath});
PdfWriter.getInstance(document, file);
document.open();
//Set attributes here
document.addTitle("TITLE");
document.close();
file.close();
} catch (Exception e)
{
e.printStackTrace();
}
But its not working. The file is getting corrupted
In a comment to another answer the OP clarified:
I want to set attributes to an existing pdf(not to create new pdf)
Obviously, though, his code creates a new document from scratch (as is obvious from the fact that a mere FileOutputStream is used to access the file, no reading, only writing).
To manipulate an existing PDF, one has to use a PdfReader / PdfWriter couple. Bruno Lowagie provided an example for that in his answer to the stack overflow question "iText setting Creation Date & Modified Date in sandbox.stamper.SuperImpose.java":
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
Map info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
}
(ChangeMetadata.java)
As you see the code sets the metadata both in the ol'fashioned PDF info dictionary (stamper.setMoreInfo) and in the XMP metadata (stamper.setXmpMetadata).
Obviously src and dest should not be the same here.
Without a second file
In yet another comment the OP clarified that he had already tried a similar solution but that he wants to prevent the
Temporary existence of second file
This can easily be prevented by first reading the original PDF into a byte[] and then stamping to it as the target file. E.g. if File singleFile references the original file which is also to be the target file, you can implement:
byte[] original = Files.readAllBytes(singleFile.toPath());
PdfReader reader = new PdfReader(original);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(singleFile));
Map<String, String> info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
(UpdateMetaData test testChangeTitleWithoutTempFile)

How to convert an html String to a PDF InputStream?

If we have string with a content of a html page, how can we convert it to a InputStream made after transform this string to a pdf document?
I'm trying to use iText with XMLWorkerHelper, and this following code works, but the problem is I don't want the output on a file. I have tried several variations in order to get the result on a InputStream that I could convert to a Primefaces StreamedContent but no success. How we can do it?
Is there another technique that we can use to solve this problem?
The motivation to this is use xhtml files wich is already rendered and output it as a pdf to be downloaded by the user.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream("results/loremipsum.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream("/html/loremipsum.html"));
document.close();
If you need an InputStream from which some other code can read the PDF your code produces, you can simply create the PDF using a byte array output stream and thereafter wrap the byte array from that stream in a byte array input stream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, baos);
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream("/html/loremipsum.html"));
document.close();
ByteArrayInputStream pdfInputStream = new ByteArrayInputStream(baos.toByteArray());
You can optimize this a bit by creating and processing the PDF in different threads and using a PipedOutputStream and a PipedInputStream instead.

binary pdf byte[] to com.lowagie.text.Document

I had an binary pdf(Byte []). I would like to convert it to a com.lowagie.text.Document.is there anyway to convert it without losing any information on it.Thanks
this is i tried, and to not sure how to move further.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
PdfReader reader = new PdfReader(byteBuffer);

Categories