Performance issue in PDFBox load function


I am using PDFBox 2.0.24 and Java 8 to load a PDF file. It takes around 53 minutes to load a 9-page PDF file. The code snippet is below:
PDDocument pdfDocument = PDDocument.load(new File("C:\\local\\first.pdf"), MemoryUsageSetting.setupTempFileOnly());
We load the PDF only to check whether the file is password protected. Can someone suggest an optimized way or another approach for this?
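
If the only goal is to find out whether the file is encrypted, one possible pre-check that avoids a full parse is to scan the raw bytes for the /Encrypt key, which an encrypted PDF carries in its trailer dictionary (or in its cross-reference stream dictionary). The following is a minimal sketch using only the JDK, not PDFBox; the class name EncryptMarkerScan and its methods are made up for illustration. It is a heuristic: the marker could in principle occur in unrelated data, and an encrypted file with an empty user password still opens without a prompt, so a positive result may deserve confirmation with a real parser.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): heuristic scan for the /Encrypt key.
final class EncryptMarkerScan {
    private static final byte[] MARKER = "/Encrypt".getBytes(StandardCharsets.US_ASCII);

    // Returns true as soon as the byte sequence "/Encrypt" is seen in the stream.
    static boolean containsEncryptKey(InputStream in) throws IOException {
        int matched = 0; // number of marker bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            if (b == MARKER[matched]) {
                if (++matched == MARKER.length) {
                    return true;
                }
            } else {
                // '/' occurs only at the start of the marker, so after a
                // mismatch the only possible restart point is the current byte.
                matched = (b == MARKER[0]) ? 1 : 0;
            }
        }
        return false;
    }

    // Convenience overload: scans a file sequentially without parsing it.
    static boolean containsEncryptKey(Path pdf) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(pdf))) {
            return containsEncryptKey(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EncryptMarkerScan <file.pdf>");
            return;
        }
        boolean hit = containsEncryptKey(Paths.get(args[0]));
        System.out.println(hit ? "/Encrypt key found (probably encrypted)"
                               : "no /Encrypt key found");
    }
}
```

The scan reads the file once from start to end, so its cost grows with the file size rather than with how difficult the document is to parse.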

Related

With iText 5, how to save a PDF file as linearized PDF

With iText 5 (Java), is it possible to save a PDF file as a linearized PDF, sometimes also called a “Web Optimized” or “Fast Web View” enabled PDF?

iText 5 does not have a feature for saving linearized PDFs.
Actually, this would not fit the iText 5 architecture at all: iText 5 attempts to write data to its target output stream as early as possible, and you cannot do that while creating a linearized PDF.
Nonetheless, you can of course create a PdfStamper-like class which takes a PdfReader representing an existing PDF and stores it as a linearized PDF. This will require quite a lot of coding, though.

Checking if a PDF is PDF/A 1-a format or not using PDFBOX in java

I have to check whether a PDF file is in PDF/A-1a format, using PDFBox or any other free Java library. I have searched Google a lot, but I still could not find any code or technique for doing this.
How can I check this in Java?

The PDFBox documentation shows how to do PDF/A-1b validation:
https://pdfbox.apache.org/cookbook/pdfavalidation.html
To do PDF/A-1a validation, you simply change:
parser.parse();
to:
parser.parse(Format.PDF_A1A);
I was able to ascertain this from reading the parser source code located here:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/preflight/1.8.2/org/apache/pdfbox/preflight/parser/PreflightParser.java

how Read a particular Page from a doc or pdf file.

I have some .doc and .pdf documents, and my requirement is to read one particular page, which I will specify at run time, from a .doc or .pdf file. This would be possible by reading page by page if each page ended with a page number, but some of the documents I receive have no page numbering, so how can I do that?
Is there any API or other logic that solves this problem?
For example, I have a .doc file, but I am not supposed to read the entire file; instead I am given a page number.
Therefore I have to read only that particular page from the .doc file.
I am using the Apache POI API.
File file = new File("c://doc/assignment/afternoon_24.doc");
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
I need to read page X of this file and write it to a text file.

I guess there is a misunderstanding: you cannot read a DOC (or PDF) simply as an InputStream and skip pages (unless you know and evaluate the file format).
Both file types have their own format, which encodes the formatting and meta information in a binary structure. Just try to open a PDF in Notepad or another plain text editor and you will see it.
As mkl suggested: to access the contents of a DOC (or PDF) you need a library that can handle that file format. For Microsoft Office formats there is, for example, the open source library Apache POI; for PDF there is, for example, PDFBox among others, and a full thread about it. There are different libraries for each of the formats, with different features and licensing models.
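
The point that both formats are structured containers rather than page-by-page text can be seen from the first few bytes of a file. As a small illustration (a sketch using only the JDK; the class FormatSniffer and the enum Kind are hypothetical names), the code below recognizes the %PDF- header that a PDF normally starts with and the 8-byte OLE2 compound file signature used by legacy .doc files. It only identifies the container; extracting a given page still needs a format-aware library such as Apache POI or PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: identifies a file's container format from its leading bytes.
final class FormatSniffer {
    enum Kind { PDF, OLE2, UNKNOWN }

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // Signature of the OLE2 compound file format used by legacy .doc/.xls/.ppt files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static Kind sniff(byte[] head) {
        if (startsWith(head, PDF_MAGIC)) {
            return Kind.PDF;
        }
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        return Kind.UNKNOWN;
    }

    // Reads at most 8 bytes from the start of the file and classifies them.
    static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < head.length
                    && (n = in.read(head, filled, head.length - filled)) != -1) {
                filled += n;
            }
        }
        return sniff(Arrays.copyOf(head, filled));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: FormatSniffer <file>");
            return;
        }
        System.out.println(sniff(Paths.get(args[0])));
    }
}
```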

PDF to XPS with java

Could you please help me find a way to convert a PDF file to an XPS file programmatically with Java? Is it possible to do that with a free library?

It's not the best solution, but according to this thread you can use Ghostscript (by invoking it from the command line) to convert the PDF to images, and then create an XPS document from those images:
gswin32c.exe -Z, -sDEVICE=png16m -sOutputFile="%04do.png" "temp.pdf"
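
When the conversion is driven from Java, the Ghostscript call can be started with ProcessBuilder instead of a shell. The sketch below is only an illustration under stated assumptions: the class GhostscriptPng, the 150 dpi resolution and the output pattern page-%04d.png are illustrative choices, and the executable name is taken from the command above and must be on the PATH. The switches -dBATCH, -dNOPAUSE, -dSAFER, -sDEVICE, -r and -sOutputFile are standard Ghostscript options. Assembling the XPS document from the resulting images is a separate step that is not shown here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: renders each PDF page to a PNG file by calling Ghostscript.
final class GhostscriptPng {

    // Builds the argument list; %04d in outputPattern is expanded by
    // Ghostscript to the page number (0001, 0002, ...).
    static List<String> buildCommand(String gsExecutable, String inputPdf,
                                     String outputPattern, int dpi) {
        return Arrays.asList(
                gsExecutable,
                "-dBATCH",              // exit after the last file
                "-dNOPAUSE",            // do not wait for a key press per page
                "-dSAFER",              // restrict file access from PostScript
                "-sDEVICE=png16m",      // 24-bit colour PNG output
                "-r" + dpi,             // rendering resolution in dpi
                "-sOutputFile=" + outputPattern,
                inputPdf);
    }

    // Starts the process, forwards its output to this JVM's console,
    // and returns the exit code (0 normally means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: GhostscriptPng <file.pdf>");
            return;
        }
        // "gswin32c.exe" is an assumption; the name differs per platform and install.
        List<String> command = buildCommand("gswin32c.exe", args[0], "page-%04d.png", 150);
        System.out.println("exit code: " + run(command));
    }
}
```

Passing the arguments as a list means no shell is involved, so the quotes around the file names in the command line version are not needed.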

When printing PDF Document from Java, some of the text is rotated

I tried to print a PDF document from Java using PDFRenderer and ICEpdf.
In both cases some of the text came out rotated by 180 degrees, while the images stayed correct.
With PDFRenderer all the text is rotated, and with ICEpdf only some of the lines.
Any idea why this is happening?

Sounds like a bug in pdf renderer. Have you tried their bug tracker? https://pdf-renderer.dev.java.net/servlets/ProjectIssues

Perhaps the PDF is broken. Also try to open the PDF using Ghostscript. Acrobat is too lenient with broken or malformed PDF files (it automatically corrects some "syntax" errors in the PDF structure).

Sounds like the font you are using in the document isn't available on the system. I had the very same problem with Java-generated EPS files.
