Convert HTML to PDF automatically using Aspose in Java - java

I am trying to convert HTML to PDF using Aspose. Setting the page size explicitly (PageSize A1, A2, A3 or A4) works perfectly, but I do not want to set a page size for PDF generation. So far I have tried the code below:
HtmlLoadOptions htmloptions = new HtmlLoadOptions(basePath);
htmloptions.getPageInfo().setWidth(PageSize.getA2().getWidth());
htmloptions.getPageInfo().setHeight(PageSize.getA2().getHeight());
// Load HTML file
Document doc = new Document(basePath + "400010_DOC002_L_10_2508016.html", htmloptions);
// Save the document as PDF
doc.save("D:/Web+URL_output.pdf");
Can anyone suggest how to convert HTML to PDF without setting the page size? Otherwise, please let me know what other tools are available for this conversion.

@Shankar, you may use the code sample below to convert an HTML file to a PDF file without setting a page size. By default, the rendered PDF file uses the A4 page size.
Simply omit the code that sets the page size; everything else remains the same.
HtmlLoadOptions htmloptions = new HtmlLoadOptions(basePath);
// Load HTML file
Document doc = new Document(basePath + "400010_DOC002_L_10_2508016.html", htmloptions);
// Save the document as PDF
doc.save("D:/Web+URL_output.pdf");
Please let us know if you need any further assistance. I work with Aspose as Developer Evangelist.
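For readers comparing the A1 to A4 options with the A4 default, here is a minimal sketch of the ISO 216 A-series dimensions, converted to PDF points (1/72 inch). The class and method names are hypothetical and not part of Aspose. It is an assumption here that the library's PageSize values are expressed in points.

```java
import java.util.Locale;

// Hypothetical helper, not part of any PDF library.
final class PaperSizes {
    private static final double POINTS_PER_MM = 72.0 / 25.4;

    /** Returns {widthMm, heightMm} for ISO 216 size A<n>, with n from 0 to 10. */
    static int[] sizeInMillimetres(int n) {
        if (n < 0 || n > 10) {
            throw new IllegalArgumentException("n must be between 0 and 10");
        }
        int width = 841;   // A0 width in mm
        int height = 1189; // A0 height in mm
        for (int i = 0; i < n; i++) {
            // Each size is the previous one cut in half across its long side;
            // integer division rounds down to whole millimetres.
            int halved = height / 2;
            height = width;
            width = halved;
        }
        return new int[] {width, height};
    }

    /** Converts millimetres to PDF points (1 point = 1/72 inch). */
    static double toPoints(int millimetres) {
        return millimetres * POINTS_PER_MM;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            int[] mm = sizeInMillimetres(n);
            System.out.printf(Locale.ROOT, "A%d: %d x %d mm = %.1f x %.1f pt%n",
                    n, mm[0], mm[1], toPoints(mm[0]), toPoints(mm[1]));
        }
    }
}
```

A4 comes out as 210 x 297 mm, which rounds to 595 x 842 points.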

Related

Adding pages to PDF/A file with PDFBox without losing PDF/A validity

I'm developing a Java application that has to process a folder of PDF/A files, adding a page with some information to each of them using Apache's PDFBox library. The problem is that the output PDF file is no longer valid PDF/A after the information has been added, according to a validation test on this website: https://www.pdf-online.com/osa/validate.aspx
And this is the relevant part of the code that I use to generate the PDF file:
String pdfFileName = this.baseFolder+this.extendedPDFFileName;
File file = new File(pdfFileName);
PDDocument pdfFile = PDDocument.load(file);
PDPage pag = new PDPage();
// As a test, simply adding a page makes the file invalid as PDF/A
pdfFile.addPage(pag);
pdfFile.save(file);
pdfFile.close();
What can I do to keep the file valid as PDF/A? Thanks in advance.
As Tilman Hausherr suggested, the problem has been solved by adding a PDResources object to the new page, like this:
pag.setResources(new PDResources());
Now I'm having trouble with the embedded fonts, but that is another question :)
Many thanks!
Your code creates a normal PDF. You should create a valid PDF/A from the start.
Here's a link: https://pdfbox.apache.org/1.8/cookbook/pdfacreation.html

How to write data to pdf file which contains html tags using itext lib in Java

I have a String that contains some HTML tags and comes from a database. I want to write it to a PDF file with the same styling that the HTML tags describe. I tried to use XMLWorkerHelper like this:
String html = "What is the equation of the line passing through the "
        + "point (2,-3) and making an angle of -45<sup>2</sup> with the positive "
        + "X-axis?";
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new StringReader(html));
However, it only renders the text inside the HTML tag (in this case only the 2) and simply ignores the rest of the string. I want the entire String with its HTML formatting.
With HTMLWorker it works perfectly, but that class is deprecated, so please let me know how to achieve this.
I am using the iText 5 library.
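One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is that an XHTML parser expects well-formed markup with a single root element, whereas the string above starts with bare text. The sketch below uses only the JDK's XML parser (not iText) to show that such a fragment is not well-formed on its own but becomes so once wrapped. The class name and the choice of `<div>` as the wrapper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; it uses the JDK XML parser, not iText.
final class FragmentWrapper {

    /** Wraps an HTML fragment in a single root element. */
    static String wrap(String fragment) {
        return "<div>" + fragment + "</div>";
    }

    /** Parses the markup as XML and returns all of its text content. */
    static String textOf(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** Returns true if the markup parses as a well-formed XML document. */
    static boolean isWellFormed(String xhtml) {
        try {
            textOf(xhtml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

If the assumption holds, passing `new StringReader(FragmentWrapper.wrap(html))` to the existing parseXHtml call would be the change to try. Whether iText then renders the full text has not been verified here.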

Extract the first page content from docx file by XML parsing

I need to extract the first page content from a docx file and save it as a separate document. I need everything from the first page (images, tables, text) to be saved as it is in a new docx file.
What I tried:
I looked into the XML of the unzipped docx file. Since a Word document is reflowable, I couldn't find a page break where each page ends, so I couldn't find the end of each page via document.xml.
Is there any way to get the XML content of only the first page of the document using a Java XML DOM parser?
Do not write a new parser; there are plenty of existing tools for that (e.g., what if your input changes from XML to binary Word files?).
Use Apache POI, for example, as @JFB suggested.
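For reference, a docx file is a ZIP archive whose main part is word/document.xml. WordprocessingML can contain two page-break markers: `<w:br w:type="page"/>` for an explicit break, and `<w:lastRenderedPageBreak/>`, a hint that Word writes when it last laid out the file. The sketch below uses only the JDK DOM parser and a hypothetical class name to find the first body-level block that contains either marker. It is a heuristic under the assumption that those markers are present and current. It ignores other causes of page breaks (section properties, "page break before" paragraph settings), and a file that Word has never laid out may contain no hints at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the input is the text of word/document.xml.
final class FirstPageFinder {
    static final String W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the index of the first child element of w:body that contains
     * a page-break marker, or -1 if no marker is found.
     */
    static int indexOfFirstBreak(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
            Node body = doc.getElementsByTagNameNS(W, "body").item(0);
            if (body == null) {
                return -1;
            }
            int index = 0;
            for (Node child = body.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace and comments
                }
                if (containsPageBreak((Element) child)) {
                    return index;
                }
                index++;
            }
            return -1;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse document.xml", e);
        }
    }

    /** True if the block holds a rendered-break hint or an explicit page break. */
    static boolean containsPageBreak(Element block) {
        if (block.getElementsByTagNameNS(W, "lastRenderedPageBreak").getLength() > 0) {
            return true;
        }
        NodeList breaks = block.getElementsByTagNameNS(W, "br");
        for (int i = 0; i < breaks.getLength(); i++) {
            Element br = (Element) breaks.item(i);
            if ("page".equals(br.getAttributeNS(W, "type"))) {
                return true;
            }
        }
        return false;
    }
}
```

Blocks before the returned index lie wholly on the first page if the markers are accurate. The block at that index straddles the boundary and needs separate handling.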

How to convert dynamic html to pdf?

I want to convert dynamic HTML to PDF. The following code shows the conversion of static HTML to PDF:
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("d:/sample/pdfaskkea.pdf"));
// step 3
document.open();
// step 4
XMLWorkerHelper.getInstance().parseXHtml(writer, document,new FileInputStream("webcontent/jsp/index.jsp"), null);
// XMLWorkerHelper.getInstance().parseXHtml(writer, document,new FileInputStream("C:\\pdf_table1.html"), null);
//step 5
document.close();
System.out.println( "PDF Created!" );
From your question it is not clear what you mean by "dynamic HTML".
If it is HTML created dynamically with JSP, for example, PD4ML offers a JSP custom tag library: you only need to surround your code with its opening and closing tags to output PDF instead of HTML.
If by dynamic HTML you mean JavaScript-rich HTML pages, I would recommend taking a look at PhantomJS, which can also convert HTML pages built on the fly with JavaScript. PhantomJS is a native standalone application based on WebKit.
You can use the iText PDF library to convert HTML into rich PDF files. To generate dynamic HTML content, you can use a template library such as Thymeleaf.
I have a detailed article about generating PDF files with thymeleaf in a spring boot application if you are interested.

Converting HTML files into PDF

I am using the following code to generate a PDF file from an HTML report:
String url = new File("Test.html").toURI().toURL().toString();
OutputStream os = new FileOutputStream("Test.pdf");
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
os.close();
I was able to use it to convert sample HTML files to PDF. But in my real use case, the HTML content contains various special characters, such as &, < and >, that cannot be parsed as XML.
I tried using CDATA while generating the HTML itself, but later found that the text around the CDATA sections is not visible in the HTML.
Does anyone have a solution for this?
Have you tried printing to PDF from the browser? Search for PrimoPDF, a program that will let you do it.
I don't know if this will help you, but you can use StringEscapeUtils from Apache Commons. It has methods for escaping and unescaping HTML (you may use them to pre-process your HTML before PDF generation).
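As a minimal sketch of the escaping idea, the JDK-only helper below replaces the characters that break XML parsing with their entities. It is a hypothetical stand-in for the Commons utility, not its implementation. It assumes that you apply it to the text values of the report before they are placed into the HTML, because escaping the whole document would also escape the markup itself.

```java
// Hypothetical stand-in for an escaping utility; JDK only.
final class HtmlEscaper {

    /** Escapes the characters that are significant in XML/XHTML text. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape only the data, then embed it in the markup.
        String cell = "<td>" + escape("a < b && c > d") + "</td>";
        System.out.println(cell);
    }
}
```

The resulting cell is `<td>a &lt; b &amp;&amp; c &gt; d</td>`, which an XML parser accepts.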
