I am attempting to view a pdf through a JFrame that contains a PDFPagePanel. While I can view the PDF, it is smaller than the actual page and therefore it appears fuzzy at best. I need the document in the full resolution and size that it is. I do NOT want to turn this into an image. I need it to stay a PDF onscreen. Here is the code I am using:
protected PDDocument originalDoc;
protected PDPage memDoc;
protected PDFPagePanel pdfPanel; // initialized in constructor
....
List<PDPage> memPDF = originalDoc.getDocumentCatalog().getAllPages();
memDoc = memPDF.get(0);
pdfPanel.setPage(memDoc);
add(pdfPanel);
setBounds(1, 1, pdfPanel.getWidth(), pdfPanel.getHeight() + 51); // set JFrame to fit the size of the document + dropdown menu
pdfPanel.setVisible(true);
I get the PDF onscreen. It's not the correct size, though. The page itself is 8.5" x 11" and when I view it in Adobe PDF the "image" is fine and is very readable. It is very obviously not scaled correctly on the screen and I cannot find any solutions as to why? This is not the same as converting this to a BufferedImage which are the solutions I see online. I need it to stay in PDPage format and display that in the PDFPagePanel class.
Related
A mechanism for converting an HTML element to an image (e.g., PNG) is "rendering" it via a JEditorPane, as follows:
public void render(String html, int width, int height, file output) {
JEditorPane jep = new JEditorPane("text/html", html);
jep.setSize(width, height);
BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
jep.print(image.getGraphics());
ImageIO.write(image, "PNG", output);
}
This approach works for simple HTML code, including basic support for CSS.
Is there any way to "autodetect" the width and height, instead of specifying it explicitly in the render() method?
Or any better way to convert HTML to an image, via "plain vanilla" Java?
Most of HTML element has no "default" size. Size are set by the text found into this element and the width of the screen where the element is displayed.
But, if you want to render an element with the current style displayed on the user screen, you can probably try to get the root element width.
For that, try to convert the HTML String to DOM Element (Creating a new DOM element from an HTML string using built-in DOM methods or Prototype) then, get the root element width and height.
I have a script that creates a PDF file and writes contents to it. After the execution is complete I need to write the status (fail, success) to the PDF, but the status should be on the top of the page. So the solution I came up with is to use absolute positioned text. Below is my code for the same
PdfContentByte cb = writer.DirectContent;
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
cb.SaveState();
cb.BeginText();
cb.MoveText(700, 30);
cb.SetFontAndSize(bf, 12);
cb.ShowText("My status");
cb.EndText();
cb.RestoreState();
But as the PDF creates multiple pages, this is added to the last page of the PDF. How can I add it to the 1st page??
Also is there a way to calculate the top coordinates of the page, ie the top-left coordinate?
iText was written with internet applications in mind. It was designed to flush content from memory as soon as possible: if a page is finished, that page is sent to the OutputStream and there is no way to return to that page.
That doesn't mean your requirement is impossible. PDF has a concept known as Form XObject. In iText, this concept is implemented under the name PdfTemplate. Such a PdfTemplate is a rectangular canvas with a fixed size that can be added to a page without being part of that page.
An example should clarify what that means. Please take a look at the WriteOnFirstPage example. In this example, we create a PdfTemplate like this:
PdfContentByte cb = writer.getDirectContent();
PdfTemplate message = cb.createTemplate(523, 50);
This message object refers to a Form XObject. It is a piece of content that is external to the page content.
We wrap the PdfTemplate inside an Image object. By doing so, we can add the Form XObject to the document just like any other object:
Image header = Image.getInstance(message);
document.add(header);
Now we can add as much data as we want:
for (int i = 0; i < 100; i++) {
document.add(new Paragraph("test"));
}
Adding 100 "test" lines will cause iText to create 3 pages. Once we're on page 3, we no longer have access to page 1, but we can still write content to the message object:
ColumnText ct = new ColumnText(message);
ct.setSimpleColumn(new Rectangle(0, 0, 523, 50));
ct.addElement(
new Paragraph(
String.format("There are %s pages in this document", writer.getPageNumber())));
ct.go();
If you check the resulting PDF write_on_first_page.pdf, you'll notice that the text we've added last is indeed on the first page.
With the follow code I open a PDF file (with filechooser object, I don't show all the code here, it's not important) and then I "put" everything in a List od PDPage, where there are all the PDF pages . If the PDF has 3 pages I will have 3 images , if the PDF has 4 pages I will have 4 images , etc ..
PDDocument document2 = PDDocument.loadNonSeq(new File(pdfFile), null);
List<PDPage> pdPages = document2.getDocumentCatalog().getAllPages();
In order to convert in BufferedImage I do this:
for (PDPage pdPage : pdPages)
{
++page;
BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
ImageIOUtil.writeImage(bim, pdfFile + "-" + page + ".png", 300);
}
and then I show, for example, one page in a stackpane. Now the problem, I want to show a preview of pdPages(so, no image format but PDPage) because I think is much better for big PDF. I mean, if I want to "convert" a PDF with 200 pages, it will take long time to complete the task. I thought to show a preview (maximum 10 or 15, it's not important) of the pages(in this way the user will click the interested image) but in PDPage format, not yet Image converted. I am thinking in the right way or I am wrong? In positive case, what can I do ?
Thanks in advance
I am able to get data from pdf pages in a string.
But along with those, footer data is also extracted. I want to remove those from all the pages of pdf. How can I remove that
I used Rectangle2D but coordinates are not giving data
In a comment the OP indicated that he used this code:
PDDocument doc = PDDocument.load("xyz.pdf");
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get( 1 );
Rectangle2D region = new Rectangle2D.Double(10, 10, 10, 10);
String regionName = "region";
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
System.out.println("Region is "+ stripper.getTextForRegion("region"));
For most documents this code will extract no text because it looks at a small (10x10 pt) region in the upper left region of the second document page. Thus, the values in new Rectangle2D.Double(10, 10, 10, 10) have to change.
I tried with various regions , yet I am not getting any text, If you have idea for a normal pdf page , you should share
There is nothing like a normal pdf page. The goal of PDF is to enable users to exchange and view electronic documents easily and reliably, independent of the environment in which they were created or the environment in which they are viewed or printed. There is no serious restriction on page dimensions or location of content on pages.
E.g. for this form
you need values like these
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get(0);
Rectangle2D region = new Rectangle2D.Float(0f, 230f, 612f, 300f);
to extract the body "I authorize any health plan ... I have received a copy of this authorization." without headers, footers, or form lines.
If you have many similar pages (e.g. one large document with many pages with a similarly layout), you have to measure but once for many pages to extract.
First of all, sorry for my bad english.
I´m trying to remove the Header and Footer of a PDF page, it´s necessary to search at the page break for some words, but it´s impossible with the header and footer, so it´s necessary to crop it and then convert to text than it´s "possible" to search the words.
I´m doing it:
PDDocument pdDoc = PDDocument.load("document.pdf");
PDPage page = (PDPage) pdDoc.getDocumentCatalog().getAllPages().get(0);
PDRectangle rectangle = new PDRectangle();
rectangle.setUpperRightY(page.findCropBox().getUpperRightY() - 100);
rectangle.setLowerLeftY(page.findCropBox().getLowerLeftY());
rectangle.setUpperRightX(page.findCropBox().getUpperRightY());
rectangle.setLowerLeftX(page.findCropBox().getLowerLeftX());
page.setMediaBox(rectangle);
PDDocument document = new PDDocument();
document.addPage(page);
document.save("newDocument.pdf");
document.close();
But when I change it to HTML, it steal keeps the text that was hidden. Is there any way to save it withou the croped area and convert to html correctly?
Thanks.
Best regard´s.