Editing an existing PDF without using iText - java

I want to add an index page to an existing PDF file and add page numbers to its pages.
All the suggested solutions point towards creating a new PDF and merging the existing PDF file with it.
Is there any other way to do this?
Also, I don't want to use iText, since it is not free for commercial use.

According to your comments on the original question, you think that in PDFBox,
"for adding a new page & content, you need to create a new pdf add new content and then merge the existing pdf. I wanted to avoid the merging step. Renaming is not the issue."
You might want to try something like this:
PDDocument doc = PDDocument.load(new FileInputStream(new File("original.pdf")));
PDPage page = new PDPage();
// fill page
doc.addPage(page);
doc.save("original-plus-page.pdf");
EDIT: A comment on this answer asked how to insert a new page at a specific index (page number). To do this, the doc.addPage(page) call has to be replaced.
Originally this PDDocument method is defined like this:
/**
* This will add a page to the document. This is a convenience method, that
* will add the page to the root of the hierarchy and set the parent of the
* page to the root.
*
* @param page The page to add to the document.
*/
public void addPage( PDPage page )
{
PDPageNode rootPages = getDocumentCatalog().getPages();
rootPages.getKids().add( page );
page.setParent( rootPages );
rootPages.updateCount();
}
We need a similar function that does not append the page to the end of the kids list but instead inserts it at a given index. A helper method like the following in our own code will do:
public static void addPage(PDDocument doc, int index, PDPage page)
{
PDPageNode rootPages = doc.getDocumentCatalog().getPages();
rootPages.getKids().add(index, page);
page.setParent(rootPages);
rootPages.updateCount();
}
If you now replace the line
doc.addPage(page);
in the code of the original answer by
addPage(doc, 0, page);
the empty page is added up front.
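To make the index semantics concrete, here is a minimal sketch that models the kids list with a plain java.util.List instead of a real PDFBox page tree. The class name PageOrderSketch and the page labels are made up for illustration; the only point is that add(index, element) is zero-based and pushes the following entries back by one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the kids list; no PDFBox involved.
class PageOrderSketch {

    // Mirrors rootPages.getKids().add(index, page):
    // the entry at 'index' and everything after it move one slot back.
    static List<String> insertAt(List<String> pages, int index, String newPage) {
        List<String> result = new ArrayList<>(pages);
        result.add(index, newPage);
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("page 1", "page 2");
        // Index 0 puts the new page up front.
        System.out.println(insertAt(original, 0, "index page"));
        // Index == size behaves like a plain append.
        System.out.println(insertAt(original, original.size(), "back page"));
    }
}
```

An index larger than the current number of pages throws an IndexOutOfBoundsException in this model, so it is worth checking the index against the page count before inserting.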

Related

Add HTML Markup using java Apache PDFBOX

I have been using PDFBox and EasyTable, which extends PDFBox to draw data tables. I have hit a problem: I have a Java object with a string of HTML data that I need to add to the PDF using PDFBox. Digging through the documentation has not borne any fruit.
The code below is a hello-world snippet whose text I want rendered with H1 formatting in the generated PDF.
// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;
// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);
// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont( font, 12 );
contentStream.moveTextPositionByAmount( 100, 700 );
contentStream.drawString( "<h1>HelloWorld</h1>" );
contentStream.endText();
// Make sure that the content stream is closed:
contentStream.close();
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
Use Jericho to convert the HTML to plain text while correctly mapping the output of the tags.
Sample:
public String extractAllText(String htmlText){
return new net.htmlparser.jericho
.Source(htmlText)
.getRenderer()
.setMaxLineLength(Integer.MAX_VALUE)
.setNewLine(null)
.toString();
}
Include it in your Gradle or Maven build (Gradle syntax shown):
compile group: 'net.htmlparser.jericho', name: 'jericho-html', version: '3.4'
PDFBox does not know HTML, at least not for creating content.
Thus, with plain PDFBox you have to parse the HTML yourself and derive the text drawing characteristics from the tags the text is in.
E.g. when you encounter "<h1>HelloWorld</h1>", you have to extract the text "HelloWorld" and use the information that it is in an h1 tag to select an appropriate top-level heading font and font size to draw that "HelloWorld".
Alternatively you can look for a library doing that HTML parsing and transforming to PDF text drawing instructions for PDFBox, e.g. Open HTML to PDF.
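As a rough illustration of the "parse it yourself" route, the sketch below turns a very restricted HTML fragment into text runs with a font size each. Everything in it is an assumption made for the example: the class names, the chosen sizes, and the regular expression, which only understands flat, lower-case h1, h2 and p tags without attributes. A real HTML parser would be needed for anything beyond that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a few HTML block tags to font sizes.
class HeadingRunsSketch {

    // One piece of text plus the font size it should be drawn with.
    static final class Run {
        final String text;
        final float fontSize;

        Run(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Matches <h1>..</h1>, <h2>..</h2> and <p>..</p>; everything else is ignored.
    private static final Pattern BLOCK =
            Pattern.compile("<(h1|h2|p)>(.*?)</\\1>", Pattern.DOTALL);

    // Assumed sizes; pick whatever suits the document.
    static float sizeFor(String tag) {
        switch (tag) {
            case "h1": return 24f;
            case "h2": return 18f;
            default:   return 12f;
        }
    }

    // Extracts the text of each supported tag together with its font size.
    static List<Run> parse(String html) {
        List<Run> runs = new ArrayList<>();
        Matcher m = BLOCK.matcher(html);
        while (m.find()) {
            runs.add(new Run(m.group(2), sizeFor(m.group(1))));
        }
        return runs;
    }
}
```

Each Run would then be drawn with the content stream calls from the question, passing run.fontSize to setFont and run.text to the text drawing call, and moving the text position down between runs.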

Adding a page with PDFBox doesn't work

I'm trying to add a page to an existing PDF document on which I perform several different operations before and after the page is supposed to be added.
Currently I open the document and write text on its first and second pages. On the second page I add some images as well. The content written to the PDFs differs per PDF, and sometimes there is so much of it that two pages (or sometimes even three) aren't enough. Now I'm trying to add a third or even fourth page once a certain amount of text and images is on the second page.
Somehow, no matter what I do, the third page I want to add doesn't show up in the final document. Here's my code to add the page:
if(doc.getNumberOfPages() < p+1){
PDDocument emptyDoc = PDDocument.load("./data/EmptyPage.pdf");
PDPage emptyPage = (PDPage)emptyDoc.getDocumentCatalog().getAllPages().get(0);
doc.addPage(emptyPage);
emptyDoc.close();
}
When I check doc.getNumberOfPages() before, it says 2. Afterwards it says 3. The final document still just has 2 pages. The code after the if-clause contains multiple contentStreams that are supposed to write text on the new page (and on the first and second page).
contentStream = new PDPageContentStream(doc, (PDPage) allPages.get((int)p), true, true);
In the end, I save the document via
doc.save(tarFolder+nr.get(i)+".pdf");
doc.close();
I've created a whole new project with a class that's supposed to do the exact same thing - add a page to another PDF. This code works perfectly fine and the third page shows up - so it seems like I'm just missing something. My code works perfectly fine for page 1 + 2, we just had the new case that we need a third/fourth page sometimes lately, so I want to integrate that into my main project.
Here's the new project that works:
PDDocument doc = PDDocument.load("D:\\test.pdf");
PDDocument doc2 = PDDocument.load("D:\\EmptyPage.pdf");
List<PDPage> allPages = doc2.getDocumentCatalog().getAllPages();
PDPage page = (PDPage) allPages.get(pageNumber);
doc.addPage(page);
doc.save("D:\\testoutput.pdf");
What's weird in my main project is that the third page I add gets counted by
"getNumberOfPages()"
but doesn't show up in the final product. The program throws an error if I don't add the page because it tries to write content on the third page.
Any idea what I'm doing wrong?
Thanks in advance!
Edit:
If I add the page at the beginning, when my document is loaded the first time, the page gets added and exists in my final document - like this:
doc = PDDocument.load(config.getFolder("template"));
PDDocument emptyDoc = PDDocument.load("./data/EmptyPage.pdf");
PDPage emptyPage = (PDPage)emptyDoc.getDocumentCatalog().getAllPages().get(0);
doc.addPage(emptyPage);
However, since some documents don't need that extra page, it gets unnecessarily complicated - and I feel like removing the page if it isn't needed isn't really pretty, since I'd like to avoid adding it in the first place. Maybe somebody has an idea now?
I found an answer thanks to Tilman Hausherr.
If I move the
emptyDoc.close()
to the end of my code, right after:
doc.save(tarFolder+nr.get(i)+".pdf");
doc.close();
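the page shows up in the final document without any issues. The imported page still depends on the document it was taken from, so emptyDoc has to stay open until doc has been saved.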

iText PDF add text in absolute position on top of the 1st page

I have a script that creates a PDF file and writes contents to it. After the execution is complete I need to write the status (fail, success) to the PDF, but the status should be on the top of the page. So the solution I came up with is to use absolute positioned text. Below is my code for the same
PdfContentByte cb = writer.DirectContent;
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
cb.SaveState();
cb.BeginText();
cb.MoveText(700, 30);
cb.SetFontAndSize(bf, 12);
cb.ShowText("My status");
cb.EndText();
cb.RestoreState();
But as the PDF contains multiple pages, the text is added to the last page of the PDF. How can I add it to the first page?
Also, is there a way to calculate the top coordinates of the page, i.e. the top-left coordinate?
iText was written with internet applications in mind. It was designed to flush content from memory as soon as possible: if a page is finished, that page is sent to the OutputStream and there is no way to return to that page.
That doesn't mean your requirement is impossible. PDF has a concept known as Form XObject. In iText, this concept is implemented under the name PdfTemplate. Such a PdfTemplate is a rectangular canvas with a fixed size that can be added to a page without being part of that page.
An example should clarify what that means. Please take a look at the WriteOnFirstPage example. In this example, we create a PdfTemplate like this:
PdfContentByte cb = writer.getDirectContent();
PdfTemplate message = cb.createTemplate(523, 50);
This message object refers to a Form XObject. It is a piece of content that is external to the page content.
We wrap the PdfTemplate inside an Image object. By doing so, we can add the Form XObject to the document just like any other object:
Image header = Image.getInstance(message);
document.add(header);
Now we can add as much data as we want:
for (int i = 0; i < 100; i++) {
document.add(new Paragraph("test"));
}
Adding 100 "test" lines will cause iText to create 3 pages. Once we're on page 3, we no longer have access to page 1, but we can still write content to the message object:
ColumnText ct = new ColumnText(message);
ct.setSimpleColumn(new Rectangle(0, 0, 523, 50));
ct.addElement(
new Paragraph(
String.format("There are %s pages in this document", writer.getPageNumber())));
ct.go();
If you check the resulting PDF write_on_first_page.pdf, you'll notice that the text we've added last is indeed on the first page.

JavaFx WebEngine - Overwriting a website's stylesheet with (local) files

I'd like to customise the appearance of a website that I am loading, so I created a little test.css file that does nothing but changing the look of all table rows:
tr {
height: 22px;
background-image: url("test.png");
}
How do I get the WebEngine to load this file and replace the page's own CSS rules with mine?
Also, I'd like to be able to load page-specific CSS files rather than one huge file for all pages.
I found this page, but it only shows how to run through the DOM and assign a new style to the desired elements by hand. This is, of course, not what I want. Instead, I'd like the browser to use my files as 'user defaults'.
Thx for any help :)
First off, I have to state that I hope you know what you are doing, as these things can seriously damage a web site.
So here is what you can do:
You grab the Document from the WebEngine, retrieve the head element and add a style child element to it, containing the content of the stylesheet you want to add.
Document doc = webView.getEngine().getDocument();
URL scriptUrl = getClass().getResource(pathToAttachedDocument); // pathToAttachedDocument = CSS file you want to attach
String content = IOUtils.toString(scriptUrl); // Use Apache Commons IO for convenience
Element appendContent = doc.createElement("style");
appendContent.appendChild(doc.createTextNode(content));
doc.getElementsByTagName("head").item(0).appendChild(appendContent);
By the way, JavaScript can be added in a similar way, it's just 'script' instead of 'style'
I would do it like this to ADD or REPLACE any rules:
String css = getFileAsString("style.css");
Document doc = webEngine.getDocument();
Element e = doc.getElementById("my_style");
e.setTextContent(css);
... given a
<style id="my_style"></style>
tag in the HTML document.
setUserStyleSheetLocation() was designed for that very purpose: to let the user of the web page style it as they want.
Usage:
webEngine.setUserStyleSheetLocation(styleSheetURL.toString());
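If the CSS is only available as a string (for example, because it is chosen per page at runtime), one option is to hand it over as a data: URL. The sketch below only builds that URL with JDK classes; the class name UserStyleSheetSketch is made up, and it relies on the assumption, taken from the JavaFX documentation as I understand it, that setUserStyleSheetLocation accepts local URLs of the data:, file: or jar: kind.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: wraps a CSS string in a base64 data: URL.
class UserStyleSheetSketch {

    static String toDataUri(String css) {
        String encoded = Base64.getEncoder()
                .encodeToString(css.getBytes(StandardCharsets.UTF_8));
        return "data:text/css;charset=utf-8;base64," + encoded;
    }

    public static void main(String[] args) {
        String css = "tr { height: 22px; }";
        // In a JavaFX application the result would be passed on like this:
        // webEngine.setUserStyleSheetLocation(toDataUri(css));
        System.out.println(toDataUri(css));
    }
}
```

For page-specific styling, the CSS string could be picked according to the location the engine is currently loading before the call is made.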

How to create TOC or index using the Flying Saucer project?

I convert HTML files to PDF format using the Flying Saucer project. These are documents containing repetitive information (premises and their addresses); let's call them elements. At the end of a document I need to create an index. Each index entry should have a page number referring to the page where the element was added. The number of elements that can fit on one page will vary.
How can I create a document index? Or how can I get notified when the library adds a certain type of HTML element to the PDF document?
Try this:
In CSS
ol.toc a::after { content: leader('.') target-counter(attr(href), page);}
In HTML
<h1>Table of Contents</h1>
<ol class='toc'>
<li><a href="#chapter1">Loomings</a></li>
<li><a href="#chapter2">The Carpet-Bag</a></li>
<li><a href="#chapter3">The Spouter-Inn</a></li>
</ol>
<div id="chapter1">Loomings</div>
I found a possible answer. You have to start with the org.xhtmlrenderer.render.BlockBox class. Its method public void layout(LayoutContext c, int contentStart) is used to place any HTML element properly in the PDF document. This method iterates over an element a few times. After the last iteration a valid page number is set.
If you mark an element you want to index, for example by using a class attribute, then you can get its page number with the following code:
String cssClass = getElement().getAttribute("class");
if ("index".equals(cssClass)) {
int pageNumber = c.getRootLayer().getPages().size();
/* ... */
}
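What to do with the page numbers once they are known is left open above. A minimal sketch, assuming the numbers are simply collected by element name and printed as index lines afterwards, could look like this. IndexCollectorSketch, record and lines are hypothetical names and not part of Flying Saucer; because the layout method runs several times per element, the sketch lets the last recorded page win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical collector for index entries; independent of Flying Saucer.
class IndexCollectorSketch {

    // TreeMap keeps the entries sorted by name.
    private final Map<String, Integer> pages = new TreeMap<>();

    // Called once per layout pass; a later pass overwrites an earlier one.
    void record(String name, int pageNumber) {
        pages.put(name, pageNumber);
    }

    // Renders "name ..... page" lines padded with dots to the given width.
    List<String> lines(int width) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : pages.entrySet()) {
            String page = String.valueOf(e.getValue());
            int dots = Math.max(1, width - e.getKey().length() - page.length() - 2);
            StringBuilder sb = new StringBuilder(e.getKey()).append(' ');
            for (int i = 0; i < dots; i++) {
                sb.append('.');
            }
            out.add(sb.append(' ').append(page).toString());
        }
        return out;
    }
}
```

Inside the if block shown above, the call would be something like collector.record(elementName, pageNumber), where collector and elementName are placeholders for whatever the surrounding code provides.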
