PDF content into String from a blob - java

I want to extract PDF content into a String. I've done that previously using PDFBox, but it doesn't support many of the fonts in my PDFs, so I decided to use iText instead.
How can I do that using getBinaryStream from a blob rather than a file on disk?
Blob blobPdf = ...;
File outputFile = new File("/tmp/blah/whatever.pdf");
FileOutputStream fout = new FileOutputStream(outputFile);
IOUtils.copy(blobPdf.getBinaryStream(), fout);
I want this sort of logic, but with the content placed into a String variable instead. How can I do that?
EDIT: This is my attempt:
InputStream is = resultSet.getBinaryStream(3);
PdfReader reader = new PdfReader(is);
String text = PdfTextExtractor.getTextFromPage(reader, 1);
System.out.println(text);
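A minimal sketch of the JDK side of this, assuming the blob fits in memory: it reads the Blob into a byte[] without touching the disk. The class name BlobBytes is made up for illustration, and SerialBlob only stands in for a blob coming from a ResultSet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical helper, not part of iText or PDFBox.
final class BlobBytes {
    /** Reads the complete content of a Blob into memory. */
    static byte[] toBytes(Blob blob) throws SQLException, IOException {
        try (InputStream in = blob.getBinaryStream()) {
            return in.readAllBytes(); // requires Java 9 or later
        }
    }

    public static void main(String[] args) throws Exception {
        // SerialBlob is a JDK Blob implementation, used here instead of a database row.
        Blob blob = new SerialBlob(new byte[] {'%', 'P', 'D', 'F'});
        System.out.println(toBytes(blob).length); // prints 4
    }
}
```

The resulting byte[] can be passed to iText 5's PdfReader(byte[]) constructor. Note that the attempt above only extracts page 1; to get the whole text, call PdfTextExtractor.getTextFromPage(reader, page) for each page from 1 to reader.getNumberOfPages() and append the results to a StringBuilder.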

Related

Delete pdf pages in java with iTextpdf

I have an existing function that displays PDF files and that I can't change.
The function takes an InputStream as input.
Until now, a whole PDF file was passed to it and displayed.
Now I have been asked to show only the first 30 pages of the PDF, so I am using iTextpdf and do something like this:
PdfReader reader = new PdfReader (inputStream);
reader.selectPages("1-30");
Now I need to pass the result to the show method as an InputStream.
How should I do that?
Thanks
You can store the result using a PdfStamper like this:
PdfReader reader = new PdfReader (inputStream);
reader.selectPages("1-30");
ByteArrayOutputStream os = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(reader, os);
stamper.close();
byte[] changedPdf = os.toByteArray();
If you want the result to end up in the InputStream inputStream variable again, simply add the line
inputStream = new ByteArrayInputStream(changedPdf);
Get a reader for the existing PDF file:
PdfReader pdfReader = new PdfReader("source pdf file path");
Restrict the reader to the pages you want to keep:
pdfReader.selectPages("1-5,15-20");
Then create a PdfStamper to write the changes to a file:
PdfStamper pdfStamper = new PdfStamper(pdfReader,
new FileOutputStream("destination pdf file path"));
Close the PdfStamper:
pdfStamper.close();
It will close the PdfReader too.

iText PdfCopy creates editable pdf document

I have a template PDF file which is used in a Spring Boot application. I need to update values in this template based on user input per request. The request also contains multiple PDF files, which I need to merge with the updated template; the updated template becomes the first page of the final PDF.
I am using iText with Spring Boot. I am able to update the values in the template and merge the file contents as well, but the final PDF comes out editable, with the field values hidden. If I click on a field, I can see my value and can also edit it.
public void mergefiles(Map<String, String> tempData,MultipartFile[] userInfoFiles)
throws Exception{
FileOutputStream mergeOutStream = new FileOutputStream(new File("C:\\UpdateFile\\mergepath\\updatetem.pdf")); //To update user content to Template
PdfReader reader = new PdfReader(new FileInputStream(new File("C:\\UpdateFile\\template\\template.pdf"))); //Template File Stream
PdfStamper stamper = new PdfStamper(reader, mergeOutStream);
stamper.setFormFlattening(false);
AcroFields form = stamper.getAcroFields();
Map<String, Item> fieldMap = form.getFields();
for (String key : fieldMap.keySet()) {
String fieldValue = dataMap.get(key);
if (fieldValue != null) {
form.setField(key, fieldValue);
}
}
//The part above creates the updated PDF as read-only
//The section below creates the merged file, but the first page is editable
//and the field values are hidden.
Document mergePdfDoc = new Document();
PdfCopy pdfCopy;
boolean smartCopy = false;
FileOutputStream newmergeOutStream = new FileOutputStream(new File("C:\\UpdateFile\\mergepath\\newmerged.pdf"));
if(smartCopy)
pdfCopy = new PdfSmartCopy(mergePdfDoc, newmergeOutStream);
else
pdfCopy = new PdfCopy(mergePdfDoc, newmergeOutStream);
mergePdfDoc.open();
pdfCopy.addDocument(stamper.getReader());
pdfCopy.freeReader(stamper.getReader());
PdfReader[] pdfReader = new PdfReader[userInfoFiles.length];
for(int i=0; i<=userInfoFiles.length-1;i++) {
pdfReader[i] = new PdfReader(userInfoFiles[i].getInputStream());
pdfCopy.addDocument(pdfReader[i]);
pdfCopy.freeReader(pdfReader[i]);
pdfReader[i].close();
}
stamper.close();
mergeOutStream.close();
mergePdfDoc.close();
}
Any idea why the final PDF is editable and the field values are hidden? I have to create a merged document and get a byte array stream of the final document as input to another function call. I am using iText 5.
The problem is that you add the PdfReader the PdfStamper is based on as input to your PdfCopy:
pdfCopy.addDocument(stamper.getReader());
The reader a stamper works on is dirty: some changes applied via the stamper are made to the objects the reader holds, some are only in the stamper or its output.
E.g. in your case the form fields are already defined in the original pdf. The field value is added to this field directly. Thus, it gets changed in the reader. But the appearance, the field visualization including a drawing of its current value, gets generated in a new indirect object which is added to the stamper output. Thus, there still is the original, empty visualization in the reader.
In a pdf viewer, therefore, the PdfCopy result at first has the looks of empty fields (as the appearances have been generated in the stamper only) but when editing a field, the changed value becomes visible (because the field editor is initialized with the field value).
To fix this, don't use the dirty reader but instead create a new, clean reader from the stamping result.
First create the stamped file:
FileOutputStream mergeOutStream = new FileOutputStream(new File("C:\\UpdateFile\\mergepath\\updatetem.pdf")); //To update user content to Template
PdfReader reader = new PdfReader(new FileInputStream(new File("C:\\UpdateFile\\template\\template.pdf"))); //Template File Stream
PdfStamper stamper = new PdfStamper(reader, mergeOutStream);
stamper.setFormFlattening(false);
AcroFields form = stamper.getAcroFields();
Map<String, Item> fieldMap = form.getFields();
for (String key : fieldMap.keySet()) {
String fieldValue = dataMap.get(key);
if (fieldValue != null) {
form.setField(key, fieldValue);
}
}
stamper.close();
And then merge:
Document mergePdfDoc = new Document();
PdfCopy pdfCopy;
boolean smartCopy = false;
FileOutputStream newmergeOutStream = new FileOutputStream(new File("C:\\UpdateFile\\mergepath\\newmerged.pdf"));
if(smartCopy)
pdfCopy = new PdfSmartCopy(mergePdfDoc, newmergeOutStream);
else
pdfCopy = new PdfCopy(mergePdfDoc, newmergeOutStream);
mergePdfDoc.open();
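// Re-read the stamped file with a new, clean reader (renamed to avoid clashing with the reader above)
PdfReader stampedReader = new PdfReader(new FileInputStream(new File("C:\\UpdateFile\\mergepath\\updatetem.pdf")));
pdfCopy.addDocument(stampedReader);
pdfCopy.freeReader(stampedReader);
stampedReader.close();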
PdfReader[] pdfReader = new PdfReader[userInfoFiles.length];
for(int i=0; i<=userInfoFiles.length-1;i++) {
pdfReader[i] = new PdfReader(userInfoFiles[i].getInputStream());
pdfCopy.addDocument(pdfReader[i]);
pdfCopy.freeReader(pdfReader[i]);
pdfReader[i].close();
}
mergePdfDoc.close();

merge many pdf files into one pdf files in web application java

I have many PDF files that I have to merge into one big PDF file and render in the browser. I am using iText. With it, I am able to merge the PDF files into one file on disk, but I cannot get the merged file into the browser; only the last PDF shows up there. My code follows. Please help me with this.
Thanks in advance.
Document document = new Document();
List<PdfReader> readers =
new ArrayList<PdfReader>();
int totalPages = 0;
ServletOutputStream servletOutPutStream = response.getOutputStream();;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();;
InputStream is=null;
List<InputStream> inputPdfList = new ArrayList<InputStream>();
System.err.println(imageMap.size());
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
response.setContentType("application/pdf");
response.setContentLength(byteArrayOutputStream.size());
System.out.println(inputPdfList.size()+""+inputPdfList.toString());
//Create pdf Iterator object using inputPdfList.
Iterator<InputStream> pdfIterator =
inputPdfList.iterator();
// Create reader list for the input pdf files.
while (pdfIterator.hasNext()) {
InputStream pdf = pdfIterator.next();
PdfReader pdfReader = new PdfReader(pdf);
readers.add(pdfReader);
totalPages = totalPages + pdfReader.getNumberOfPages();
}
// Create writer for the outputStream
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
//Open document.
document.open();
//Contain the pdf data.
PdfContentByte pageContentByte = writer.getDirectContent();
PdfImportedPage pdfImportedPage;
int currentPdfReaderPage = 1;
Iterator<PdfReader> iteratorPDFReader = readers.iterator();
// Iterate and process the reader list.
while (iteratorPDFReader.hasNext()) {
PdfReader pdfReader = iteratorPDFReader.next();
//Create page and add content.
while (currentPdfReaderPage <= pdfReader.getNumberOfPages()) {
document.newPage();
pdfImportedPage = writer.getImportedPage(
pdfReader,currentPdfReaderPage);
pageContentByte.addTemplate(pdfImportedPage, 0, 0);
currentPdfReaderPage++;
}
currentPdfReaderPage = 1;
}
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
System.out.println("Pdf files merged successfully.");
There are numerous errors in your code:
Only write to the response output stream what you want to return to the browser
Your code writes a wild collection of data to the response output stream:
ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
[...]
byteArrayOutputStream.writeTo(response.getOutputStream());
[...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]
servletOutPutStream.flush();
document.close();
servletOutPutStream.close();
This results in many copies of the imageMap elements being written there, with the merged file only added afterwards.
What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?
Thus, please only write the merged PDF to the response output stream.
Don't write a wrong content length
It is a good idea to write the content length to the response... but only if you use the correct value!
In your code you write a content length:
response.setContentLength(byteArrayOutputStream.size());
but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.
Thus, please do not add false headers to the response.
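One way to have a correct length is to build the complete result in memory first and only then touch the response. The following is a sketch using only the JDK; PdfProducer is a hypothetical callback standing in for "merge the PDFs into this stream", and ResponseBuffer is a made-up class name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class ResponseBuffer {
    /** Hypothetical callback standing in for the code that writes the merged PDF. */
    interface PdfProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs the producer against an in-memory buffer and returns the finished bytes. */
    static byte[] render(PdfProducer producer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        producer.writeTo(buffer);
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = render(out -> out.write(new byte[] {'%', 'P', 'D', 'F'}));
        // In a servlet, this is the point to call response.setContentLength(body.length)
        // and then write body to response.getOutputStream(), exactly once.
        System.out.println(body.length); // prints 4
    }
}
```

The trade-off is memory: the whole merged PDF is held in a byte[] before it is sent, which is usually acceptable for moderately sized documents.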
Don't mangle your input data
In the loop
for(byte[] imageList:imageMap)
{
System.out.println(imageList.toString()+" "+imageList.length);
byteArrayOutputStream.write(imageList);
byteArrayOutputStream.writeTo(response.getOutputStream());
is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
inputPdfList.add(is);
}
you take byte arrays which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...
Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.
Thus, please start or end loops like this with a reset of the byteArrayOutputStream.
(Actually you don't need that loop at all, the PdfReader has a constructor which can immediately take a byte[], no need to wrap it in a byte stream.)
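The effect of the shared buffer can be reproduced without any PDF library. This sketch (class and method names are made up) compares the loop from the question with one stream per source array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class BufferReuseDemo {
    /** Mirrors the loop from the question: one shared buffer that is never reset. */
    static List<ByteArrayInputStream> sharedBuffer(List<byte[]> sources) {
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        List<ByteArrayInputStream> streams = new ArrayList<>();
        for (byte[] source : sources) {
            shared.write(source, 0, source.length);
            // toByteArray() returns everything written so far, not just this source
            streams.add(new ByteArrayInputStream(shared.toByteArray()));
        }
        return streams;
    }

    /** Wraps each source array directly, so every stream holds exactly one source. */
    static List<ByteArrayInputStream> oneStreamPerSource(List<byte[]> sources) {
        List<ByteArrayInputStream> streams = new ArrayList<>();
        for (byte[] source : sources) {
            streams.add(new ByteArrayInputStream(source));
        }
        return streams;
    }

    public static void main(String[] args) {
        List<byte[]> sources = List.of(new byte[3], new byte[2]);
        for (ByteArrayInputStream s : sharedBuffer(sources)) {
            System.out.println(s.available()); // prints 3, then 5
        }
        for (ByteArrayInputStream s : oneStreamPerSource(sources)) {
            System.out.println(s.available()); // prints 3, then 2
        }
    }
}
```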
Don't merge PDFs using a plain PdfWriter, use a PdfCopy
You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answers on Stack Overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy.
Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.
Don't flush or close streams only because you can
You finalize the response output by closing numerous streams:
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close that because you already close it in the servletOutPutStream variable.
Thus, please remove unnecessary calls like this.
// Suppose we want to merge one PDF with another main PDF
InputStream is1 = null;
if (file1 != null) {
// Read the whole file; a single InputStream.read call may return fewer bytes than requested
byte[] file1Data = java.nio.file.Files.readAllBytes(file1.toPath());
is1 = new java.io.ByteArrayInputStream(file1Data);
}
//
InputStream mainContent = <your main content>
org.apache.pdfbox.pdmodel.PDDocument mergedPDF = new org.apache.pdfbox.pdmodel.PDDocument();
org.apache.pdfbox.pdmodel.PDDocument mainDoc = org.apache.pdfbox.pdmodel.PDDocument.load(mainContent);
org.apache.pdfbox.multipdf.PDFMergerUtility merger = new org.apache.pdfbox.multipdf.PDFMergerUtility();
merger.appendDocument(mergedPDF, mainDoc);
PDDocument doc1 = null;
if (is1 != null) {
doc1 = PDDocument.load(is1);
merger.appendDocument(mergedPDF, doc1);
// 1st file appended to main PDF
}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
mergedPDF.save(baos);
// Now either save it here or convert it into an InputStream if you want
ByteArrayInputStream mergedInputStream = new ByteArrayInputStream(baos.toByteArray());

how to set attributes for existing pdf that contains only images using java itext?

I would like to set attributes on a PDF before uploading it to a server.
Document document = new Document();
try
{
OutputStream file = new FileOutputStream({Localpath});
PdfWriter.getInstance(document, file);
document.open();
//Set attributes here
document.addTitle("TITLE");
document.close();
file.close();
} catch (Exception e)
{
e.printStackTrace();
}
But it's not working. The file is getting corrupted.
In a comment to another answer the OP clarified:
I want to set attributes to an existing pdf(not to create new pdf)
Obviously, though, his code creates a new document from scratch (as is obvious from the fact that a mere FileOutputStream is used to access the file, no reading, only writing).
To manipulate an existing PDF, one has to use a PdfReader / PdfStamper couple. Bruno Lowagie provided an example for that in his answer to the Stack Overflow question "iText setting Creation Date & Modified Date in sandbox.stamper.SuperImpose.java":
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
Map<String, String> info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
}
(ChangeMetadata.java)
As you see the code sets the metadata both in the ol'fashioned PDF info dictionary (stamper.setMoreInfo) and in the XMP metadata (stamper.setXmpMetadata).
Obviously src and dest should not be the same here.
Without a second file
In yet another comment the OP clarified that he had already tried a similar solution but that he wants to prevent the "Temporary existence of second file".
This can easily be prevented by first reading the original PDF into a byte[] and then stamping to it as the target file. E.g. if File singleFile references the original file which is also to be the target file, you can implement:
byte[] original = Files.readAllBytes(singleFile.toPath());
PdfReader reader = new PdfReader(original);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(singleFile));
Map<String, String> info = reader.getInfo();
info.put("Title", "New title");
info.put("CreationDate", new PdfDate().toString());
stamper.setMoreInfo(info);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
XmpWriter xmp = new XmpWriter(baos, info);
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
reader.close();
(UpdateMetaData test testChangeTitleWithoutTempFile)
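The order of operations matters here: opening a FileOutputStream on an existing file truncates it, so the original bytes have to be in memory before the output stream is created. A JDK-only sketch of that pattern follows; the class InPlaceRewrite and the transform callback are hypothetical and merely stand in for the stamping step.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

final class InPlaceRewrite {
    /** Replaces a file's content with a transformed copy without a second file on disk. */
    static void rewrite(Path file, UnaryOperator<byte[]> transform) throws IOException {
        // Read everything first: the FileOutputStream below empties the file when opened.
        byte[] original = Files.readAllBytes(file);
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            out.write(transform.apply(original));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rewrite-demo", ".txt");
        Files.write(file, "title: old".getBytes(StandardCharsets.UTF_8));
        rewrite(file, bytes -> new String(bytes, StandardCharsets.UTF_8)
                .replace("old", "new").getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8)); // prints "title: new"
        Files.delete(file);
    }
}
```

Keep in mind that if the process fails while writing, the original file is already gone; writing to a second file and moving it over the original avoids that risk, at the cost of the temporary file the OP wanted to avoid.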

Convert a PDF with forms to a PDF with text only (preserve data) using iText

I have multiple PDFs that get populated with multiple records (a.pdf,b.pdf,c[0-9].pdf,d[0-9].pdf,ez.pdf) using acroforms and pdfbox.
The resulting files (aflat.pdf,bflat.pdf,c[0-9]flat.pdf,d[0-9]flat.pdf,ezflat.pdf) should have their forms(dictionaries and whatever adobe uses) removed but the fields filled as raw text saved on the pdf (setReadOnly is not what I want!).
PdfStamper can only remove fields without saving their content but I've found some references to PdfContentByte as a way to save the content. Alas, the documentation is too brief to understand how I should do this.
As a last resort I could use FieldPosition to write directly on the PDF. Has anyone ever encountered such a problem? How do I solve it?
UPDATE: Saving a single page of b.pdf yields a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document solved the issue.
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
//importing the page will corrupt the fields
/*wrong approach*/doc.importPage((PDPage)pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
/*wrong approach*/doc.save(stream);
//save the whole document instead
pdfDocuments.get(0).save(stream);//<---right approach
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
stamper.setFormFlattening(true);
stamper.close();
}
Use PdfStamper.setFormFlattening(true) to get rid of the fields and write them as content.
Always save the whole document, not a single imported page, when working with AcroForms:
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
// wrong approach: importing a single page corrupts the fields
// doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
// doc.save(stream);
// right approach: save the whole document instead
pdfDocuments.get(0).save(stream);
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
stamper.setFormFlattening(true);
stamper.close();
}
