PDF Validation task in ASP.Net web Application while uploading - java

I have been given a PDF validation task in my ASP.NET web application. I need to run a preflight check on uploaded files for the following points:
- Check for the presence of a barcode or text in a defined area.
- Check for embedded font issues.
- Check for image transparency issues.
- Check the PDF version.
I have looked at the available options, such as iTextSharp, but they do not fulfil my requirements. Please help.

My name is Tilal Ahmad and I am a developer evangelist at Aspose.
You may try Aspose.Pdf for .NET to meet your requirements:
- Check for presence or barcode or text in a defined area.
To check for text in a defined PDF page region, please see the following documentation link for Aspose.Pdf for .NET and analyze the extracted text string (extractedText).
Extract Text from a particular page region
To check for a barcode in a defined PDF page area, first convert the specific page region to an image and then use Aspose.Barcode to detect the barcode in that image.
- Check for embedded font issues.
If you mean embedding un-embedded fonts into the PDF document ("checking missing embedded fonts...to correct it"), you may try this documentation link for Aspose.Pdf for .NET.
Embedding fonts in an existing PDF document
- Check version.
You may load your PDF document into an Aspose.Pdf.Document object and get the PDF version as follows.
Aspose.Pdf.Document doc = new Aspose.Pdf.Document("input.pdf");
Console.WriteLine("PDF version: {0}",doc.Version);
- Check for image transparent issue.
The image transparency issue needs some investigation. If possible, please post your sample document and details in the Aspose.PDF forum; we will look into it and guide you.
Furthermore, if you want to validate against a PDF/A standard, you may load the PDF document into an Aspose.Pdf.Document object and use the Validate method.
Validate PDF document for PDFA standard
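For the version check alone, a library may not be needed: a PDF file begins with a header such as "%PDF-1.7". The following is a minimal Java sketch that reads that header using only the standard library. The class name PdfHeaderVersion and its methods are hypothetical and are not part of Aspose.Pdf. Note the assumption: from PDF 1.4 onward the document catalog may carry a /Version entry that overrides the header, and this sketch does not look at it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Aspose.Pdf or any other PDF library. */
final class PdfHeaderVersion {
    // A PDF file starts with a header line such as "%PDF-1.7".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Returns the version named in the header, if a header is found in the given bytes. */
    static Optional<String> parse(byte[] firstBytes) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String head = new String(firstBytes, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Reads at most the first 1024 bytes of the file and parses the header from them. */
    static Optional<String> read(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return parse(in.readNBytes(1024));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfHeaderVersion input.pdf
        System.out.println("PDF version: " + read(Path.of(args[0])).orElse("unknown"));
    }
}
```

An upload handler could reject a file when read(...) returns an empty Optional, since that suggests the file does not start like a PDF at all.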

Related

Extract values of contents inside the Google's PDF viewer

(Edited)
I am posting this question to detail the need to use 'CTRL+F' in Google Chrome to search for a string in a PDF. Can anyone suggest a solution?
Testing requirement: access a test application, open the PDF, and check whether a test data string is present in the PDF.
Application overview: the test application opens the PDF in the embedded Google PDF viewer HTML view (sample screenshot attached below as 'Sample PDF viewer.jpg'). In the PDF viewer, the PDF pages cannot be captured as web elements. So the only remaining way is to press 'CTRL+F', type the string into the search box, and extract the result from the search box.
Attachments:
The element's value which I'm looking to extract
Sample PDF viewer
(Original)
I'm able to open the find box (CTRL+F) in the Chrome browser using the Robot class and enter the desired string in it. How can I extract the number of results from the find box using Selenium in Java?
The attached image highlights the result (in yellow) that I want to extract using Selenium in Java code.

SVG markup in PDF as generated from Jasper Reports

Problem Statement
I have SVG markup sent from front-end JavaScript to back-end action classes. I am using Jasper Reports to generate a PDF that should contain the SVG image (I have the markup data only). How do I do that?
What I have tried
I have tried to embed the SVG image (via a link to an image file) in the PDF file while generating reports.
Looking for
How to embed the SVG markup so that I can see the image in the PDF, or any better approach to solve this.
In later versions of JasperReports you no longer need to add class="net.sf.jasperreports.engine.JRRenderable" to the imageExpression.
It is the default in JasperReports 6+, and JasperSoftStudio (JSS) will remove it if you add it in the Source pane.
The Tomcat SVG file provided in the JasperSoft Community answer works nicely. My own SVG file displayed properly in MS Edge and Chrome but did not appear in JSS.
When I added width, height, viewBox and overflow attributes to the svg element, it did appear in JSS. So try the example file first before trying the SVG you actually want.
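Before handing markup to the report, it may help to check whether the svg root element carries the attributes mentioned above. The following is a minimal Java sketch using only the JDK's XML parser; the class name SvgRootCheck and the method missingAttributes are hypothetical and not part of JasperReports or Batik. It only reports which of the four attributes are absent and makes no claim about what the renderer strictly requires.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper; not part of JasperReports or Batik. */
final class SvgRootCheck {
    // Attributes the answer above reports adding to the svg element.
    private static final String[] EXPECTED = {"width", "height", "viewBox", "overflow"};

    /** Returns the expected attributes that the root element of the markup lacks. */
    static List<String> missingAttributes(String svgMarkup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Avoid fetching the SVG DTD over the network if the markup has a DOCTYPE.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(svgMarkup)))
                .getDocumentElement();
            List<String> missing = new ArrayList<>();
            for (String name : EXPECTED) {
                if (!root.hasAttribute(name)) {
                    missing.add(name);
                }
            }
            return missing;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"100\"/>";
        System.out.println("Missing: " + missingAttributes(svg));
    }
}
```

Attribute names in XML are case-sensitive, so the check looks for viewBox with a capital B.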
According to the Jaspersoft community wiki (http://community.jaspersoft.com/wiki/how-add-svg-image-your-report-jrxml), you need to include the SVG using an imageExpression element in the JRXML:
<imageExpression class="net.sf.jasperreports.engine.JRRenderable">
<![CDATA[net.sf.jasperreports.renderers.BatikRenderer.getInstance(new java.io.File("D:\\tomcat.svg"))]]>
</imageExpression>
As the question indicates that there is no SVG markup file, you could stick to the answer by David van Driessche but use a different factory method:
BatikRenderer.getInstanceFromText($P{svgMarkup}), for example, where svgMarkup is a parameter of type String containing the SVG markup data.

How to generate the PDF in CQ5.6.1 using page content in cq5

How can I generate a PDF in CQ5.6.1 using page content?
My site has a button (generate PDF); on clicking the button I have to generate a PDF file from the same page content.
Please let me know whether there is an out-of-the-box PDF generator in CQ, or whether I need to get a licensed product to generate the PDF.
Thanks.
Adobe CQ is integrated with Apache FOP, a formatter able to create PDF files. This tutorial describes how to enable a content rewriter that provides a PDF version of the content under the .pdf extension.
However, please keep in mind that this approach requires manually writing the XSLT transform file able to process your page (and every component on it) and output the XSL-FO document.
In a previous project (CQ 5.5) we used https://code.google.com/p/wkhtmltopdf/ to create PDF files. It worked pretty well.
I have used PhantomJS to create a custom PDF from CQ5 pages, for example if you do not want to display the right rail in the PDF or you want to disable the header and footer. All of this can be achieved with the help of PhantomJS.
Create a servlet that executes a command on your server:
phantomjs <custom.js> 'page_url' nameofthepdf.pdf
Here custom.js shows or hides HTML content based on your needs.
This will work for any page, whether it comes from CQ5 or any other tool.
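The command execution described above could be sketched in Java roughly as follows. This assumes phantomjs is on the server's PATH and that custom.js accepts the page URL and the output name as its two arguments, as in the command shown; the class name PhantomPdfCommand, its methods, and the timeout parameter are hypothetical. Passing the arguments as a list rather than one shell string means a URL containing spaces or quotes is not reinterpreted by a shell.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for launching PhantomJS from a servlet or batch job. */
final class PhantomPdfCommand {

    /** Builds the argument list for: phantomjs <script> <pageUrl> <pdfName>. */
    static List<String> build(String script, String pageUrl, String pdfName) {
        return List.of("phantomjs", script, pageUrl, pdfName);
    }

    /**
     * Runs the command and waits up to timeoutSeconds for it to finish.
     * Returns true only if the process exited in time with exit code 0.
     */
    static boolean render(String script, String pageUrl, String pdfName, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(build(script, pageUrl, pdfName))
                .inheritIO() // forward PhantomJS output to the server's console
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // do not leave a hung renderer behind
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL and file names for illustration only.
        boolean ok = render("custom.js", "http://localhost:4502/content/page.html", "page.pdf", 60);
        System.out.println(ok ? "PDF written" : "Rendering failed or timed out");
    }
}
```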

How to read content of scanned pdf file in java / jsp or in javascript

How can I read the content of a scanned PDF file in Java/JSP or in JavaScript? Can you tell me how to achieve this in code?
Thanks in advance for any reply.
You can convert the scanned PDF to an image using GhostScript and then feed it to an OCR engine, such as Tesseract. Take a look at VietOCR for an example implementation.
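The Ghostscript-then-Tesseract pipeline suggested above could be driven from Java along these lines. This is a sketch under assumptions: the gs and tesseract executables are installed and on the PATH (on Windows the Ghostscript console binary is typically named gswin64c instead), 300 dpi is an arbitrary choice of resolution, and the class name ScannedPdfOcr and the file names are hypothetical.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that shells out to Ghostscript and Tesseract. */
final class ScannedPdfOcr {

    /** Ghostscript command that renders each PDF page to a 24-bit PNG at 300 dpi. */
    static List<String> ghostscriptCommand(String pdf, String pngPattern) {
        return List.of("gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m", "-r300",
                "-sOutputFile=" + pngPattern, pdf);
    }

    /** Tesseract command; the recognized text is written to <outputBase>.txt. */
    static List<String> tesseractCommand(String png, String outputBase) {
        return List.of("tesseract", png, outputBase);
    }

    /** Runs a command, forwards its output to the console, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // %03d in the pattern is expanded by Ghostscript to the page number.
        if (run(ghostscriptCommand("scanned.pdf", "page-%03d.png")) != 0) {
            throw new IllegalStateException("Ghostscript failed");
        }
        // OCR the first page only; a real job would loop over all generated PNG files.
        int exit = run(tesseractCommand("page-001.png", "page-001"));
        System.out.println("tesseract exit code: " + exit);
    }
}
```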
What you are trying to do (I think) is use OCR to extract text from an image-only PDF produced by a scanner. Java is probably the best choice for this. There are a number of options, depending on whether you are prepared to pay for software. Google for Java (or JavaScript), PDF and OCR.
IMO, this task is not something that should be done in a JSP. JSPs are best for rendering results, not for generating them in the first place.
Actually, I am working on the same kind of project at the moment. I am doing it in the following steps and the result works well.
The user uploads a scanned PDF to the PDFUploader servlet, which returns a server-side file name to the front end to indicate that the upload was successful.
The front end uses this file name and the default page 0 to ask the PDFReader servlet to retrieve the first page of the PDF file and display it at the front end. You can convert the PDF page to an image or use an iframe with the embedded PDF reader.
The front end uses this file name and the default page 0 to ask OCRServlet to perform OCR. I am using WeOCR and Tesseract as my OCR engine on an Apache HTTP server. I have modified some parts of submit.cgi in the WeOCR server, since I know which formats the WeOCR server will receive. I still have some problems when converting the scanned PDF to an image (I am using PDFBox).
Google for anything OCR-related;
your best bet will be to use an existing library such as http://asprise.com/product/ocr/index.php?lang=java

Convert html to pdf with linked documents inline

I need to convert a bundle of static HTML documents into a single PDF file programmatically on the server side on a Java/J2EE platform, preferably using a batch process. The PDF files would be distributed to site users for offline browsing of the web pages.
The major points of the requirements are:
The banner at the top should not be present in the final pdf document.
The HTML hyperlinks in the navigation bar on the left should be transformed into PDF bookmarks.
All hyperlinked content (html/pdf/doc/docx etc.) present in the web pages should be part of the final PDF document, with PDF bookmarks.
Is there any standard open source way of doing this?
Try Apache FOP. I just used it to convert XML to PDF and I think you can do the same with HTML/DOM. The website has a whole section on running FOP in a Java application and there's example code for DOM to PDF.
You can try iText - but I am not sure whether it handles all that you require.
Moreover, it is always better if you explore many options and then decide what you can and cannot do. In many cases there won't be any library/API that will out of the box support all that you ask for.
You can try www.alt-soft.com Xml2PDF for this
