Split a PDF page in two parts [duplicate]


This question already has an answer here: itext Split PDF Vertically (1 answer)
Closed 6 years ago.
I would like to take a single-page PDF and split it into two parts (cutting the page in the middle), regardless of the text on that page. I'm using iText, but I can't find any examples of how to do this.

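You cannot really split a page; that would be quite a difficult task. What you can do is copy the content of the page into a new page with half the original size, then repeat for a second page, shifting the content so that the other half is shown.
Here is an example with PDFBox. I've been using it lately and had a sandbox project ready for the test, but you can surely do the same with iText. In this example, both the "half size" and the shift are achieved by giving each copy of the page a crop box that covers only half of the original (the content outside the crop box is still in the file, it is just not displayed).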
package printit;

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

public class CutIt {
    public static void main(String[] args) throws IOException {
        // The source document must stay open until the output has been saved,
        // because the imported pages still share its resources (fonts, images).
        try (PDDocument doc = PDDocument.load(new File("sample_1.pdf"));
             PDDocument outdoc = new PDDocument()) {
            PDPage page = doc.getPage(0);
            PDRectangle box = page.getCropBox();
            float llx = box.getLowerLeftX();
            float lly = box.getLowerLeftY();
            float width = box.getWidth();
            // Half the height, measured from the box's lower edge
            // (upperRightY / 2 is only the middle when lowerLeftY == 0).
            float half = box.getHeight() / 2;

            // Upper half: always use new rectangles so the original box is never mutated.
            page.setCropBox(new PDRectangle(llx, lly + half, width, half));
            outdoc.importPage(page); // copies the page, with its current crop box, into outdoc

            // Lower half
            page.setCropBox(new PDRectangle(llx, lly, width, half));
            outdoc.importPage(page);

            outdoc.save("cut.pdf");
        }
    }
}
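The two half boxes can be computed independently of PDFBox. This is a minimal sketch (PageHalves and splitHorizontally are hypothetical names) that takes a page box as {lowerLeftX, lowerLeftY, upperRightX, upperRightY}, the order PDF uses for /MediaBox and /CropBox arrays, and returns the top and bottom halves. It measures the midpoint from the box's own lower edge, so it also works when the box does not start at y = 0.

```java
import java.util.Arrays;

/** Hypothetical helper: computes crop boxes for the top and bottom half of a page. */
final class PageHalves {

    /**
     * Returns {topHalf, bottomHalf}; every rectangle is
     * {lowerLeftX, lowerLeftY, upperRightX, upperRightY}.
     */
    static float[][] splitHorizontally(float llx, float lly, float urx, float ury) {
        // Midpoint between the lower and upper edge; ury / 2 is only correct when lly == 0.
        float midY = (lly + ury) / 2f;
        float[] top = {llx, midY, urx, ury};
        float[] bottom = {llx, lly, urx, midY};
        return new float[][] {top, bottom};
    }

    public static void main(String[] args) {
        // A US Letter sized box that does not start at y = 0.
        float[][] halves = splitHorizontally(0f, 100f, 612f, 892f);
        System.out.println("top    = " + Arrays.toString(halves[0]));
        System.out.println("bottom = " + Arrays.toString(halves[1]));
    }
}
```

With PDFBox 2.x, each array b can then be turned into a crop box with the PDRectangle(x, y, width, height) constructor, i.e. new PDRectangle(b[0], b[1], b[2] - b[0], b[3] - b[1]).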

Related

Apache PDFBox - vertical match between image and text position

I need help to achieve a mapping between text and image objects in a PDF document.
As the first figure shows, my PDF documents have 3 images arranged randomly in the y-direction. To the left of them are texts. The texts extend along the height of the images.
My goal is to combine the texts into "ImObj" objects (see the class ImObj).
The 2nd figure shows that I want to use the height of the image to detect the position of the texts (all texts outside of the image height should be ignored). In the example, there will be 3 ImObj-objects formed by the 3 images.
The PDF file was shared via a WeTransfer link.
But my mapping does not work, probably because I use the wrong coordinates for the images. I have already looked at some examples, but I still don't understand how to bring the coordinates of text and images together.
Here is my code:
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;
import org.apache.pdfbox.util.Matrix;
public class ImExample extends PDFTextStripper {
public static void main(String[] args) {
File file = new File("C://example document.pdf");
try {
PDDocument document = PDDocument.load(file);
ImExample example = new ImExample();
for (int pnr = 0; pnr < document.getPages().getCount(); pnr++) {
PDPage page = document.getPages().get(pnr);
PDResources res = page.getResources();
example.processPage(page);
int idx = 0;
for (COSName objName : res.getXObjectNames()) {
PDXObject xObj = res.getXObject(objName);
if (xObj instanceof PDImageXObject) {
System.out.println("...add a new image");
PDImageXObject imXObj = (PDImageXObject) xObj;
BufferedImage image = imXObj.getImage();
// Here is my mistake ... but I do not know how to solve it.
ImObj imObj = new ImObj(image, idx++, pnr, image.getMinY(), image.getMinY() + image.getHeight());
example.imObjects.add(imObj);
}
}
}
example.setSortByPosition(true);
example.getText(document);
// Output
for (ImObj iObj : example.imObjects)
System.out.println(iObj.idx + " -> " + iObj.text);
document.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<ImObj> imObjects = new ArrayList<ImObj>();
public ImExample() throws IOException {
super();
}
@Override
protected void writeString(String text, List<TextPosition> textPositions) throws IOException {
// match between imagesize and textposition
TextPosition txtPos = textPositions.get(0);
for (ImObj im : imObjects) {
if(im.page == (this.getCurrentPageNo()-1))
if (im.minY < txtPos.getY() && (txtPos.getY() + txtPos.getHeight()) < im.maxY)
im.text.append(text + " ");
}
}
}
class ImObj {
float minY, maxY;
Image image = null;
StringBuilder text = new StringBuilder("");
int idx, page = 0;
public ImObj(Image im, int idx, int pnr, float yMin, float yMax) {
this.idx = idx;
this.image = im;
this.minY = yMin;
this.maxY = yMax;
this.page = pnr;
}
}
Best regards

You're looking for the images in the (somewhat) wrong place!
You iterate over the image XObject resources of the page itself and inspect them. But this is not helpful:
An image XObject resource merely is that, a resource. I.e. it can be used on the page, even more than once, but you cannot determine from this resource alone how it is used (where? at which scale? transformed somehow?)
There are other places an image can be stored and used on a page, e.g. in the resources of some form XObject or pattern used on the page, or inline in the content stream.
What you actually need is to parse the page content stream for uses of images and the current transformation matrix at the time of use. For a basic implementation of this have a look at the PDFBox example PrintImageLocations.
The next problem you'll run into is that the coordinates PDFBox returns from the TextPosition methods getX and getY are not in the original coordinate system of the PDF page in question, but in a coordinate system normalized for easier handling in the text extraction code. Thus, you most likely should use the un-normalized coordinates.
You can find information on that in this answer.
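Following that advice, the matching step itself reduces to comparing two vertical ranges in one coordinate system. This is a minimal sketch, assuming the image is drawn without rotation or skew and that both values are taken in un-normalized PDF user space (origin at the bottom left, y pointing up): the image range from the vertical translation and scale of the current transformation matrix at the moment the image is drawn (as the PrintImageLocations example reads it), and the text baseline, for example from the translation part of TextPosition.getTextMatrix(). ImageBand, fromCtm and contains are hypothetical names.

```java
/**
 * Hypothetical helper: the vertical range an image occupies on the page,
 * in un-normalized PDF user space (origin bottom-left, y up).
 */
final class ImageBand {
    final float bottomY;
    final float topY;

    ImageBand(float y1, float y2) {
        this.bottomY = Math.min(y1, y2);
        this.topY = Math.max(y1, y2);
    }

    /**
     * An image fills the unit square mapped by the CTM, so for an unrotated,
     * unskewed image it spans translateY .. translateY + scaleY.
     * A negative scale (vertically flipped image) is handled by min/max above.
     */
    static ImageBand fromCtm(float translateY, float scaleY) {
        return new ImageBand(translateY, translateY + scaleY);
    }

    /** True if a text line with this baseline and height lies entirely inside the band. */
    boolean contains(float baselineY, float textHeight) {
        return baselineY >= bottomY && baselineY + textHeight <= topY;
    }

    public static void main(String[] args) {
        ImageBand band = fromCtm(300f, 200f);          // image drawn from y = 300 to y = 500
        System.out.println(band.contains(350f, 12f));  // a line beside the image
        System.out.println(band.contains(510f, 12f));  // a line above the image
    }
}
```

Note that BufferedImage.getMinY() and getHeight(), as used in the question, are measured in image pixels (getMinY() is always 0 for a BufferedImage), not in page coordinates, which is why they cannot be compared with text positions.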

PDFBox: COSStream has been closed and cannot be read. Adding PDF File to PDF. And Review

Topic: trying to open a COSStream after you have added files to your main PDF. In other words, trying to write text to your main PDF document after you have added PDF files to it. I do not want to save or close my main PDF because I plan on adding more PDF files and more text again and again. In my case, I am trying to open a PDPageContentStream contentStream after using TreeMerge. Apparently, the main document is closing prematurely?
My code is further below.
Many people have had trouble with this topic since around 2014.
I really wish the answers to these questions included complete code, so I made a list of posts about similar problems and attempts at solving them (see below). I am sure many more beginners are going to run into this problem.
@mkl and @community https://stackoverflow.com/a/49973366/14092356
Try using newDoc.addPage(newDoc.importPage(doc.getPage(0))); instead of newDoc.addPage(doc.getPage(1));
Remember to close the new files you are inserting into your main PDF.
When adding your new files, make sure you assign them to a variable so garbage collection does not close them prematurely either.
Try adding these files to an ArrayList of PDDocuments https://stackoverflow.com/a/61299092/14092356
Where the PDPages are instantiated matters. COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
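The tips above about keeping the loaded files in variables or an ArrayList, and closing them yourself, can be sketched as a small JDK-only helper. This assumes, as those tips suggest, that each source document has to stay open and strongly referenced until the main document has been saved, and only then be closed. SourceRegistry and its methods are hypothetical names; PDDocument implements java.io.Closeable, so it could be registered directly.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: holds strong references to every loaded source so they
 * cannot be garbage-collected early, and closes them only when the registry
 * itself is closed (i.e. after the target document was saved).
 */
final class SourceRegistry implements AutoCloseable {
    private final List<AutoCloseable> sources = new ArrayList<>();

    /** Keeps a reference to the source and returns it unchanged. */
    <T extends AutoCloseable> T register(T source) {
        sources.add(source);
        return source;
    }

    int size() {
        return sources.size();
    }

    /** Closes all sources in reverse registration order, even if one of them fails. */
    @Override
    public void close() {
        IllegalStateException failure = null;
        for (int i = sources.size() - 1; i >= 0; i--) {
            try {
                sources.get(i).close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new IllegalStateException("could not close a source", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        sources.clear();
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try (SourceRegistry registry = new SourceRegistry()) {
            AutoCloseable first = () -> log.add("closed source 1");
            AutoCloseable second = () -> log.add("closed source 2");
            registry.register(first);
            registry.register(second);
            log.add("saved target"); // stands in for mainDocument.save(...)
        }
        System.out.println(log);
    }
}
```

In the question's page2_3() method further down, for example, doc1 and doc2 could be obtained with registry.register(PDDocument.load(file1)) on a registry that is only closed after document.save(...) has run.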
From the post "PDFBox IO Exception: COSStream has been closed and cannot be read", the comment https://stackoverflow.com/a/55591429/14092356 did not work for me. Here is the code I tried:
public void tableOfContents() throws IOException {
String path1 = new File("Index1.pdf").getAbsoluteFile().toString();
File file1 = new File(path1);
String path2 = new File("Index2.pdf").getAbsoluteFile().toString();
File file2 = new File(path2);
PDFMergerUtility merger = new PDFMergerUtility();
PDDocument combine = PDDocument.load(file1);
PDDocument combine2 = PDDocument.load(file2);
merger.appendDocument(mainDocument, combine);
//combine.close();
merger.appendDocument(mainDocument, combine2);
merger.mergeDocuments();
merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
}
More Resources
PDFBox COSStream closed before use
java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
"IOException: COSStream has been closed and cannot be read" when trying to save PDF after adding page with PdfBox
Merging lots of pdf files is going to be difficult because you need to keep track of opening and closing https://issues.apache.org/jira/browse/PDFBOX-3901
Reading a particular page from a PDF document using PDFBox
Here is my code with my problem
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
import java.io.IOException;
public class main {
public static void main(String[] args) throws IOException {
PDDocument document = new PDDocument();
PDOutlineItem pagesOutline = new PDOutlineItem();
for (int numberOfPages = 0; numberOfPages < 5; numberOfPages++) {
//Creating a blank page
PDPage blankPage = new PDPage();
//Adding the blank page to the document
document.addPage(blankPage);
}
ExampleImportingClass example = new ExampleImportingClass(document);
document.save("exampleError.pdf");
System.out.println("PDF created");
document.close();
}
}
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import java.io.File;
import java.io.IOException;
public class ExampleImportingClass {
private PDDocument document;
public ExampleImportingClass(PDDocument document) throws IOException {
this.document = document;
page1();
page2_3();
page4();
}
public void page1() throws IOException {
PDPage page1 = document.getPage(0);
String path = new File("catPicture3.jpg").getAbsoluteFile().toString();
PDImageXObject pdImage = PDImageXObject.createFromFile(path, document);
PDPageContentStream content1 = new PDPageContentStream(document, page1);
content1.beginText();
content1.endText();
content1.drawImage(pdImage, 50, 400,300,300);
content1.close();
}
public void page2_3() throws IOException {
String path1 = new File("Example1.pdf").getAbsoluteFile().toString();
File file1 = new File(path1);
String path2 = new File("Example2.pdf").getAbsoluteFile().toString();
File file2 = new File(path2);
PDPage page1 = document.getPage(0);
PDPage page2 = document.getPage(2);
PDPageTree mergePD = document.getPages();
PDDocument doc1 = PDDocument.load(file1);
PDDocument doc2 = PDDocument.load(file2);
mergePD.insertAfter(doc1.getPage(0), page1);
mergePD.insertAfter(doc2.getPage(0), page2);
}
public void page4() throws IOException {
PDPage page = document.getPage(3);
PDPageContentStream content1 = new PDPageContentStream(document, page);
content1.beginText();
content1.setFont(PDType1Font.TIMES_ROMAN, 14);
content1.newLineAtOffset(50, 350);
content1.showText("If we remove page4() method AND its not working, then its not ");
content1.endText();
content1.close();
}
}
Here is the infamous error stack trace:
Aug 27, 2020 11:10:48 PM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document
Aug 27, 2020 11:10:48 PM org.apache.pdfbox.cos.COSDocument finalize
WARNING: Warning: You did not close a PDF Document
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:154)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:204)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1219)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:475)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:526)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:464)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:448)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1113)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:452)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1386)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1273)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1357)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1328)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1316)
at main.main(main.java:22)

Several things can cause this error. What finally worked for me was adding a text offset and some text at the beginning of the content stream. I thought this was a merging error because the problem disappeared when I stopped merging new PDF files. However, it was a problem with my cover page: the COSStream from the cover page interfered with the PDF files being merged.
My cover page only had an image and no text. For reasons still unknown to me, this was causing the error. Once I added a text offset and some text, the program ran without the exception.
Note that the image was already drawn at an offset before. Apparently this was not enough.
content1.newLineAtOffset(50, 350);
content1.setFont(PDType1Font.TIMES_ROMAN, 14);
content1.showText("If we add this code it works now. Strange");
public void page1() throws IOException {
PDPage page1 = document.getPage(0);
String path = new File("catPicture3.jpg").getAbsoluteFile().toString();
PDImageXObject pdImage = PDImageXObject.createFromFile(path, document);
PDPageContentStream content1 = new PDPageContentStream(document, page1);
content1.beginText();
content1.newLineAtOffset(50, 350);
content1.setFont(PDType1Font.TIMES_ROMAN, 14);
content1.showText("If we add this code it works now. Strange");
content1.endText();
content1.drawImage(pdImage, 50, 400,300,300);
content1.close();
}

PDFBox does not correctly render Simsun (chinese) font

Context
I am writing Java code that fills PDF forms with user input using PDFBox.
Some of the inputs are in Chinese.
When I generate the PDF, there are no errors in the logs, but the rendered text is completely different from the input.
What I currently have
Here is what I do:
In the PDF file, I specified the SimSun font for the field using Adobe Pro.
This font handles Simplified Chinese characters.
I have the font SimSun installed on my server.
PDFBox doesn't display any error (if I remove the SimSun font from my server, PDFBox falls back on another font that is not able to render the characters). So I guess it is able to find the font and use it.
What I tried
I was able to make this work but I had to manually load the font in the code and add it to the PDF (see examples below).
But that is not a solution, as it means that I would have to load the font every time and add it to the PDF. I would also have to do the same for many other languages.
As far as I understood, PDFBox should be able to use any fonts installed on the server.
Below is a test class that tries 3 different approaches. Only the last one works so far:
Classic generation
Simply put Chinese characters inside the text field without changing anything.
The characters are not rendered correctly (some of them are missing and the ones displayed do not match the input).
Generation with embedded font
Try to embed the SimSun font inside the PDF with the PDResource.add(font) method.
The result is the same as the first method.
Embed the font and use it
I embed the SimSun font and I also override the font used in the TextField to use the SimSun font I just added.
This approach works.
After quite a few readings, I found out that the issue might come from the version of the font I am using.
Windows 8 (which I use to create the form) uses v5.04 of Simsun font.
I use v2.10 on my laptop and my servers, both being Linux based (I cannot find v5.04).
However, I don't know:
If the issue is really coming from this.
If I have the right to use this font, as it is developed by Microsoft (and Apple).
Where to find the latest version of it.
I tried using another font but:
I only find OTF fonts (and not TTF) that support Chinese characters.
PDFBox does not support OTF (yet). It is planned for v3.0.0.
So if someone has an idea on how to make this work without having to embed and change the font's name in the code, that would be great!
Here are the PDF I used and the code that tests the 3 methods I talked about.
The TextField in the pdf is named comment.
package org.test;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Hello world!
*/
public class App {
private static final String SIMPLIFIED_CHINESE_STRING = "我不明白为什么它不起作用。";
public static void main(String[] args) throws IOException {
System.out.println("Hello World!");
// Test 1
classicGeneration();
// Test 2
generationWithEmbededFont();
// Test 3
generationWithFontOverride();
System.out.println("Bye!");
}
/**
* Classic PDF generation without any changes to the PDF.
*/
private static void classicGeneration() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-classic-generation.pdf"));
}
/**
* Trying to embed the font in the PDF. It doesn't seem to work.
* The result is the same as classicGeneration method.
*/
private static void generationWithEmbededFont() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDFont font = PDType0Font.load(document, new File("/usr/share/fonts/SimSun.ttf"));
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
PDField commentField = acroForm.getField("comment");
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-embeded-font.pdf"));
}
/**
* Embed the font in the PDF and change the font used in the TextField to use this one.
* Here the PDF is correctly rendered and all the characters are displayed.
* @throws IOException
*/
private static void generationWithFontOverride() throws IOException {
PDDocument document = loadPdf();
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
PDField commentField = acroForm.getField("comment");
// Load the font
InputStream resourceAsStream = Thread.currentThread().getContextClassLoader().getResourceAsStream("SimSun.ttf");
PDFont font = PDType0Font.load(document, resourceAsStream);
PDResources res = acroForm.getDefaultResources();
if (res == null) {
res = new PDResources();
}
COSName fontName = res.add(font);
acroForm.setDefaultResources(res);
// Change the font used by the TextField
COSDictionary dict = commentField.getCOSObject();
COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
if (defaultAppearance != null) {
String currentFont = dict.getString(COSName.DA);
// Retrieve the current font size and color used for the field in order to use the same but with the new font.
String regex = "[\\w]* ([\\w\\s]*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(currentFont);
// Default font size if we fail to extract the current one
String fontSize = " 11 Tf";
if (matcher.find()) {
fontSize = " " + matcher.group(1);
}
// Change the font of the TextField.
dict.setString(COSName.DA, "/" + fontName.getName() + fontSize);
}
commentField.getCOSObject().addAll(dict);
commentField.setValue(SIMPLIFIED_CHINESE_STRING);
document.save(new File("result-with-font-override.pdf"));
}
// HELPER
private static PDDocument loadPdf() throws IOException {
InputStream stream = Thread.currentThread().getContextClassLoader().getResourceAsStream("sample.pdf");
return PDDocument.load(stream);
}
}
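The third approach rewrites the field's /DA (default appearance) string with a regular expression, keeping the size and colour. A slightly more targeted sketch, assuming the DA contains a font operator of the form /Name size Tf, replaces only the font name and leaves everything else untouched. DaFont and replaceFont are hypothetical names, and the 11 pt fallback mirrors the default in the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: replaces the font name in a /DA string such as
 * "/Helv 12 Tf 0 g" while keeping the size and any colour operators.
 */
final class DaFont {
    // A name operand ("/Helv"), then the size operand, then the Tf operator.
    private static final Pattern FONT_OP =
            Pattern.compile("/[^\\s/]+(\\s+[-+]?[0-9]*\\.?[0-9]+\\s+Tf)");

    static String replaceFont(String da, String newFontName) {
        String current = (da == null) ? "" : da;
        Matcher m = FONT_OP.matcher(current);
        if (!m.find()) {
            // No "/Name size Tf" found: fall back to 11 pt and keep the rest of the string.
            return ("/" + newFontName + " 11 Tf " + current).trim();
        }
        // Swap only the name; group(1) is the original " size Tf" part.
        return current.substring(0, m.start()) + "/" + newFontName + m.group(1)
                + current.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(replaceFont("/Helv 12 Tf 0 g", "F1"));
        System.out.println(replaceFont("0 0 1 rg /SimSun 9.5 Tf", "F2"));
    }
}
```

The new name would be fontName.getName() from res.add(font). In PDFBox 2.x, the result should then be settable on a text field through PDVariableText.setDefaultAppearance(String) instead of editing the field's COS dictionary directly.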

extract image from image

Is it possible to extract an image from a jpeg, png or tiff file? NOT PDF! Suppose I have a file containing both text and images in jpeg format (so it's basically a picture); I want to be able to extract the image only programmatically (preferably using Java). If anyone knows useful libraries please let me know. I have already tried AspriseOCR and tesseract-ocr, they have been successful at extracting text only (obviously).
Thank you.

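Try:
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

int startPointX = xxx;
int startPointY = xxx;
int endPointX = xxx;
int endPointY = xxx;
BufferedImage image = ImageIO.read(new File("D:/temp/test.jpg"));
// getSubimage expects (x, y, width, height), not two corner points
BufferedImage out = image.getSubimage(startPointX, startPointY, endPointX - startPointX, endPointY - startPointY);
ImageIO.write(out, "jpg", new File("D:/temp/result.jpg"));
These points define the region of the image you want to extract.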
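Because getSubimage takes a width and height rather than a second corner point, and throws a RasterFormatException when the region is not inside the image, a small helper that converts two corner points, normalizes their order and clamps the region to the image bounds can make cropping more robust. This is a sketch; RegionCrop and crop are hypothetical names.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper around BufferedImage.getSubimage. */
final class RegionCrop {

    /** Crops the region between two corner points, clamped to the image bounds. */
    static BufferedImage crop(BufferedImage image, int startX, int startY, int endX, int endY) {
        int x = Math.max(0, Math.min(startX, endX));
        int y = Math.max(0, Math.min(startY, endY));
        int right = Math.min(image.getWidth(), Math.max(startX, endX));
        int bottom = Math.min(image.getHeight(), Math.max(startY, endY));
        if (right <= x || bottom <= y) {
            throw new IllegalArgumentException("region is empty or outside the image");
        }
        // getSubimage takes (x, y, width, height) and shares pixel data with the source.
        return image.getSubimage(x, y, right - x, bottom - y);
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        BufferedImage region = crop(page, 20, 10, 120, 60);
        System.out.println(region.getWidth() + "x" + region.getHeight());
    }
}
```

The returned image can be written with ImageIO.write exactly as in the answer.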
Extract image from pdf file
I suggest changing your post title. You can use the PDFBox or iText API. The example below extracts all of the images from a PDF file.
It may be a useful starting point. If the PDF contains a lot of images, a java.lang.OutOfMemoryError may occur.
Download pdfbox.xx.jar here.
import java.io.File;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

// PDFBox 1.x API: getAllPages(), getImages() and write2file() no longer exist in PDFBox 2.x
public class ExtractImagesFromPDF {
    public static void main(String[] args) throws Exception {
        PDDocument document = PDDocument.load(new File("D:/temp/test.pdf"));
        try {
            List pages = document.getDocumentCatalog().getAllPages();
            Iterator iter = pages.iterator();
            while (iter.hasNext()) {
                PDPage page = (PDPage) iter.next();
                PDResources resources = page.getResources();
                Map images = resources.getImages();
                if (images != null) {
                    Iterator imageIter = images.keySet().iterator();
                    while (imageIter.hasNext()) {
                        String key = (String) imageIter.next();
                        System.out.println("Key : " + key);
                        PDXObjectImage image = (PDXObjectImage) images.get(key);
                        // Keys are per page, so images with the same key on different pages overwrite each other here
                        File file = new File("D:/temp/" + key + "." + image.getSuffix());
                        image.write2file(file);
                    }
                }
            }
        } finally {
            document.close();
        }
    }
}
Extract specific image from pdf file
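To extract a specific image, you have to know the index of the page and the name of the image resource on that page.
The following example program extracts the first image of the first page, assuming the image resources are named Im0, Im1 and so on (a common naming scheme, but not a guaranteed one).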
int targetPage = 0;
PDPage firstPage = (PDPage)document.getDocumentCatalog().getAllPages().get(targetPage);
PDResources resources = firstPage.getResources();
Map images = resources.getImages();
int targetImage = 0;
String imageKey = "Im" + targetImage;
PDXObjectImage image = (PDXObjectImage)images.get(imageKey);
File file = new File("D:/temp/" + imageKey + "." + image.getSuffix());
image.write2file(file);

If you are interested in an out-of-box product that could do this via black-box processing with minimal non-programming configuration (since you tried other products), then ABBYY FlexiCapture can do it. It can be configured to look for dynamic sizes of pictures/objects in loosely defined areas, or anywhere on the page, with full control over search logic. I used it once to extract lines of specific shape and thickness to separate chapters of a book, where each line indicated a new chapter, and could be anywhere on the page.

how to insert text into a scanned pdf document using java

I have to add text to many PDF documents, and a lot of them are scanned. In those, the inserted text ends up behind the scanned image instead of on top of it. How can I add text over the scanned image inside the PDF?
package editExistingPDF;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;
import org.apache.commons.io.FilenameUtils;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
public class AddPragraphToPdf {
public static void main(String[] args) throws IOException, DocumentException, BiffException {
String tan = "no tan";
File inputWorkbook = new File("lars.xls");
Workbook w;
w = Workbook.getWorkbook(inputWorkbook);
// Get the first sheet
Sheet sheet = w.getSheet(0);
Cell[] tnas =sheet.getColumn(0);
File ArticleFolder = new File("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\article");
File[] listOfArticles = ArticleFolder.listFiles();
for (int ArticleInList = 0; ArticleInList < listOfArticles.length; ArticleInList++)
{
Document document = new Document(PageSize.A4);
// System.out.println(listOfArticles[ArticleInList].toString());
PdfReader pdfArticle = new PdfReader(listOfArticles[ArticleInList].toString());
if(listOfArticles[ArticleInList].getName().contains(".si."))
{continue;}
int noPgs=pdfArticle.getNumberOfPages();
String ArticleNoWithOutExt = FilenameUtils.removeExtension(listOfArticles[ArticleInList].getName());
String TanNo=ArticleNoWithOutExt.substring(0,ArticleNoWithOutExt.indexOf('.'));
// Create output PDF
PdfWriter writer = PdfWriter.getInstance(document,new FileOutputStream("C:\\Documents and Settings\\sathishkumarkk\\My Documents\\toPrint\\"+ArticleNoWithOutExt+".pdf"));
document.open();
PdfContentByte cb = writer.getDirectContent();
//get tan form excel sheet
System.out.println(TanNo);
for(Cell content : tnas){
if(content.getContents().contains(TanNo)){
tan=content.getContents();
System.out.println(tan);
}else{
continue;
}
}
// Load existing PDF
//PdfReader reader = new PdfReader(new FileInputStream("1.pdf"));
for (int i = 1; i <= noPgs; i++) {
PdfImportedPage page = writer.getImportedPage(pdfArticle, i);
// Copy first page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
// Add your TAN here
Paragraph p= new Paragraph(tan);
Font font = new Font();
font.setSize(1.0f);
p.setLeading(12.0f, 1.0f);
p.setFont(font);
document.add(p);
}
document.close();
}
}
}
NOTE: When a PDF contains only text, I have no problem. But when a PDF consists of scanned pages and I try to add text, the text gets added behind the scanned image, so when I print those PDFs, the text I added does not appear.

From this iText example (which does the reverse of what you want, but switch getUnderContent with getOverContent and you'll be fine):
Each PDF page has two extra layers; one that sits on top of all text / graphics and one that goes to the bottom. All user added content gets in-between these two. If we get into this bottommost content, we can write anything under that we want. To get into this bottommost layer, we can use the " getUnderContent" method of PdfStamper object.
This is documented in iText API Reference as shown below:
public PdfContentByte getUnderContent(int pageNum)
Gets a PdfContentByte to write under the page of the original document.
Parameters:
pageNum - the page number where the extra content is written
Returns:
a PdfContentByte to write under the page of the original document

To do this, you will need to first read in the PDF document, extract the elements and then add text to the document and resave it as a PDF document. This of course assumes that you can read the PDF document in the first place.
I'd recommend iText (see Example Code iText) to help you do this.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
