Extract text from non-selectable PDF content using Java PDFBox

I can easily extract the content of most PDF files, but I have some files whose text is not selectable when I open them. My existing code, shown below, is not able to extract the text from those files:
public class PDFBoxExample {
public static void main(String[] args) {
try {
File file = new File("C:\\pdf\\pdf_result.pdf");
try (PDDocument document = PDDocument.load(new FileInputStream(file))) {
document.getClass();
if (!document.isEncrypted()) {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(false);
stripper.setShouldSeparateByBeads(true);
PDFTextStripper tStripper = new PDFTextStripper();
String content = tStripper.getText(document);
System.out.println(content);
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
My PDF file is available at the following link:
https://1drv.ms/b/s!AmRKaLhGJhJphvMOUBGADveatrx0hA?e=a0seG7
Can you please suggest a solution?

Related

PDF image rendering in java swing

I am using the free version of the JPedal jar to render PDFs in my Java Swing program. A normal PDF is rendered properly, but when rendering an image PDF (an image file converted to a PDF file) the quality drops considerably (it is not readable at all).
Example code :
public OpenViewer() {
//create and initialise JPedal viewer component
final Viewer myViewer =new Viewer();
myViewer.setupViewer();
//code to open when required
File file = null; // assigned below, so it must not be final
final InputStream stream = null;
//open the stream or File
try {
file = new File("/Users/markee/Desktop/myfile.pdf"); // This PDF is converted from tiff
} catch (Exception e) {
e.printStackTrace();
}
if(file!=null) {
myViewer.executeCommand(Commands.OPENFILE, new Object[]{file});
}
}

Give password protection to existing pdf file

I am trying to add a password to an existing PDF file. It works for a Jasper report that is saved as .jrxml or .jasper, but how can I do it for a PDF file?
Sample code:
public static void main(String[] args) {
String USER="Sai123";
String OWNER="Sairam";
try {
InputStream input=new FileInputStream(new File("D:\\Project1\\EmailSendExample\\WebContent\\PDFiles\\AnnexI.pdf"));
OutputStream file = new FileOutputStream(new File("D:\\Test.pdf"));
/*PdfReader reader = new PdfReader(input);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("D:\\Test.pdf"));
stamper.setEncryption(PdfWriter.ALLOW_PRINTING, OWNER,USER, PdfWriter.ENCRYPTION_AES_128 | PdfWriter.DO_NOT_ENCRYPT_METADATA);
stamper.close();
reader.close();*/
JRPdfExporter exporter = new JRPdfExporter();
//exporter.setParameter(JRExporterParameter.INPUT_FILE, new File("D:\\Project1\\EmailSendExample\\WebContent\\PDFiles\\AnnexI.pdf"));
exporter.setParameter(JRExporterParameter.OUTPUT_FILE,new File("D:\\Test.pdf"));
exporter.setParameter(JRPdfExporterParameter.OWNER_PASSWORD, "Sai123");
exporter.setParameter(JRPdfExporterParameter.USER_PASSWORD, "Sairam");
exporter.setParameter(JRPdfExporterParameter.IS_ENCRYPTED, Boolean.TRUE);
exporter.exportReport();
System.out.println("Report Generation Complete");
file.close();
} catch (Exception e) {
e.printStackTrace();
}
}
It throws the following error:
net.sf.jasperreports.engine.JRException: No input source supplied to the exporter.
at net.sf.jasperreports.engine.JRAbstractExporter.setInput(JRAbstractExporter.java:922)
at net.sf.jasperreports.engine.export.JRPdfExporter.exportReport(JRPdfExporter.java:296)
at pdfpassword.main(pdfpassword.java:45)
Thanks in advance for your valuable suggestions.
As far as I know, a PDF file cannot be supplied as input to JRPdfExporter. To password-protect an existing PDF, use the iText code below. It works for me.
code:
private static String USER_PASSWORD = "password";
private static String OWNER_PASSWORD = "naveen";
public static void main(String[] args) throws IOException {
try
{
PdfReader pdfReader = new PdfReader("/home/base/Desktop/newtask/ext.pdf");
PdfStamper pdfStamper = new PdfStamper(pdfReader,new FileOutputStream("/home/base/Desktop/newtask/ext1.pdf"));
pdfStamper.setEncryption(USER_PASSWORD.getBytes(),OWNER_PASSWORD.getBytes(), PdfWriter.ALLOW_PRINTING,PdfWriter.ENCRYPTION_AES_128);
pdfStamper.close();
} catch (FileNotFoundException e)
{
e.printStackTrace();
} catch (com.itextpdf.text.DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
I see that this line is commented out:
//exporter.setParameter(JRExporterParameter.INPUT_FILE, new File("D:\\Project1\\EmailSendExample\\WebContent\\PDFiles\\AnnexI.pdf"));
And the exception complains about the missing input:
net.sf.jasperreports.engine.JRException: No input source supplied to the exporter.

Inserting image loses PDF content

I'm trying to insert an image in an existing PDF file but iText puts it on the first page and I'm losing the rest of the page content. How can I insert it without losing existing content?
I used this code:
public static void main(String[] args) {
Document document = new Document(PageSize.A4);
try {
PdfWriter.getInstance(document, new FileOutputStream(
"/home/amira/work/APPS-579/word/generatedMergedDocs/FinalTest/1.pdf"));
document.open();
Image image = Image.getInstance(
"/home/amira/work/APPS-579/word/generatedMergedDocs/FinalTest/a.jpg");
document.add(image);
} catch (DocumentException de) {
de.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
}
document.close();
}
You're adding the image to an empty document, because you're not reading the original document, just overwriting it.
To add an image to an existing document with iText, you have to read the original first (for example with PdfReader and PdfStamper) instead of creating a new Document. Please see the following tutorial, which explains it well.

How to check if a PDF is Password Protected or not

I am trying to use iText's PdfReader to check if a given PDF file is password protected or not, but am getting this exception:
Exception in thread "Main Thread" java.lang.NoClassDefFoundError:org/bouncycastle/asn1/ASN1OctetString
But when testing the same code against a non-password protected file it runs fine. Here is the complete code:
try {
PdfReader pdf = new PdfReader("C:\\abc.pdf");
} catch (IOException e) {
e.printStackTrace();
}
In the old version of PDFBox:
try
{
InputStream fis = new ByteArrayInputStream(pdfBytes);
PDDocument doc = PDDocument.load(fis);
if(doc.isEncrypted())
{
// The PDF file is encrypted.
}
doc.close();
}
catch (IOException e)
{
e.printStackTrace();
}
In the newer version of PDFBox (e.g. 2.0.4):
InputStream fis = new ByteArrayInputStream(pdfBytes);
boolean encrypted = false;
try {
PDDocument doc = PDDocument.load(fis);
if(doc.isEncrypted())
encrypted=true;
doc.close();
}
catch(InvalidPasswordException e) {
encrypted = true;
}
return encrypted;
Use the Apache PDFBox Java PDF library. Sample code:
try
{
PDDocument document = PDDocument.load("C:\\abc.pdf");
if(document.isEncrypted())
{
// The PDF file is encrypted.
}
document.close();
}
catch (IOException e)
{
e.printStackTrace();
}
The way I do it is by attempting to read the PDF file using PdfReader, without passing a password of course. If the file is password protected, a BadPasswordException is thrown. This uses the iText library.
Here's a solution for Android that doesn't require third-party libraries, using the PdfRenderer API (Kotlin):
fun checkIfPdfIsPasswordProtected(uri: Uri, contentResolver: ContentResolver): Boolean {
val parcelFileDescriptor = contentResolver.openFileDescriptor(uri, "r")
?: return false
return try {
PdfRenderer(parcelFileDescriptor)
false
} catch (securityException: SecurityException) {
true
}
}
Reference: https://developer.android.com/reference/android/graphics/pdf/PdfRenderer
try {
PdfReader pdfReader = new PdfReader(String.valueOf(file));
pdfReader.isEncrypted();
} catch (IOException e) {
e.printStackTrace();
}
You can check this with the iText PDF library. If an exception is thrown, handle it (for example, by asking for the password).
Try this code:
boolean isProtected = true;
PDDocument pdfDocument = null;
try
{
pdfDocument = PDDocument.load(new File("your file path"));
isProtected = false;
}
catch(Exception e){
LOG.error("Error while loading file : ",e);
}
System.out.println(isProtected);
If your document is password protected, PDFBox cannot load it and throws an InvalidPasswordException, which is an IOException.
I verified the above code using pdfbox-2.0.4.jar.
public boolean checkPdfEncrypted(InputStream fis) throws IOException {
boolean encrypted = false;
try {
PDDocument doc = PDDocument.load(fis);
if (doc.isEncrypted())
encrypted = true;
doc.close();
} catch (IOException e) {
encrypted = true;
}
return encrypted;
}
Note: There is one corner case in iText. Some files are encrypted but still open without a password. To read those files (for example, to add a watermark), do this:
PdfReader reader = new PdfReader(src);
reader.setUnethicalReading(true);
I didn't want to use any third-party library, so I used this:
try {
new PdfRenderer(ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY));
} catch (Exception e) {
e.printStackTrace();
// file is password protected
}
If the file was password protected, I didn't use it.
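For plain Java outside Android, a rough check is also possible without any library. The sketch below is a heuristic, not a PDF parser: it relies on the fact that an encrypted PDF carries an /Encrypt entry in its trailer dictionary, and simply searches the raw bytes for that name. The class and method names (PdfEncryptHeuristic, looksEncrypted) are made up for illustration. A file that merely mentions /Encrypt in uncompressed content would be reported as a false positive, and the whole file is read into memory, so treat the result as a hint only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: guesses whether a PDF is encrypted without a PDF library.
public class PdfEncryptHeuristic {

    // Returns true if the raw bytes contain the PDF name "/Encrypt".
    public static boolean looksEncrypted(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // survives the conversion and a plain substring search is safe.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    // Convenience overload that reads the whole file into memory first.
    public static boolean looksEncrypted(Path pdf) throws IOException {
        return looksEncrypted(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfEncryptHeuristic <file.pdf>");
            return;
        }
        System.out.println(looksEncrypted(Paths.get(args[0])));
    }
}
```

If the heuristic returns true, it is still worth confirming with PDFBox or iText as shown in the other answers, since only a real parser can tell whether a password is actually required to open the file.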

Create a Word file using POI

My requirement is that I should read a template file and change some values in its content and write it back to another file. Most importantly it should have the same styles as that of the template.
The problem I face is that I am able to read and write, but it's very difficult to transfer the styles as well. In particular, I have been struggling to apply the paragraph styles to the document. Please help me. This is my code:
public static void main(String[] args) {
try {
HWPFDocument templateFile = new HWPFDocument(new FileInputStream("D:\\POI\\testPOIin.doc"));
HWPFDocument blankFile = new HWPFDocument(new FileInputStream("D:\\POI\\blank.doc"));
ParagraphProperties pp = templateFile.getRange().getParagraph(4).cloneProperties();
blankFile.getRange().insertAfter(pp, 0);
OutputStream out = new FileOutputStream("D:\\POI\\testPOIout.doc");
blankFile.write(out);
} catch (FileNotFoundException fnfe) {
// TODO: Add catch code
fnfe.printStackTrace();
} catch (Exception ioe) {
// TODO: Add catch code
ioe.printStackTrace();
}
}
}
Please let me know what I am doing wrong.
I had a similar task, and after some investigation I came up with a solution, but it works only for docx files:
public static void main(String[] args) throws Exception {
FileOutputStream fos = new FileOutputStream(new File("transformed.docx"));
XWPFDocument doc = new XWPFDocument(new FileInputStream(new File("original.docx")));
for(XWPFParagraph p:doc.getParagraphs()){
for(XWPFRun r:p.getRuns()){
for(CTText ct:r.getCTR().getTList()){
String str = ct.getStringValue();
if(str.contains("NAME")){
str = str.replace("NAME", "Java Dev");
ct.setStringValue(str);
}
}
}
}
doc.write(fos);
}
It operates on low-level elements, so it preserves styles and other properties. I hope it helps somebody.
