Flying Saucer - html entities are not rendered

I'm generating a PDF with the Flying Saucer library, but some HTML entities are not rendered.
I've searched for a solution and tried many tips from this forum and other places, but the problem remains.
I tried this approach:
http://sdtidbits.blogspot.com/2008/11/flying-saucer-xhtml-rendering-and-local.html
but without success.
My code looks like this:
os = new FileOutputStream(pdf);
ITextRenderer renderer = new ITextRenderer();
ChainingReplacedElementFactory chainingReplacedElementFactory = new ChainingReplacedElementFactory();
chainingReplacedElementFactory.addReplacedElementFactory(new B64ImgReplacedElementFactory(renderer.getSharedContext()));
renderer.getSharedContext().setReplacedElementFactory(chainingReplacedElementFactory);
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
where pdf is the name of the PDF to create and url is built like this:
File f = new File(url);
if (f.exists()) {
url = f.toURI().toURL().toString();
}
My HTML file looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<style type="text/css">
html, body, div, span, applet, object, iframe, h1, h2, h3, h4, h5, h6, p, blockquote, pre, a, abbr, acronym, address, big, cite, code, del, dfn, em, font, img, ins, kbd, q, s, samp, small, strike, strong, sub, sup, tt, var, b, u, i, center, dl, dt, dd, ol, ul, li, fieldset, form, label, legend, caption, tbody, tfoot, thead, tr, th
{
color: #444;
font-family: Arial;
font-size: 14px;
line-height: 25px;
border: none;
}
table, td {border: solid 1px #CCC;}
img {page-break-inside: avoid;}
</style>
<title></title>
</head>
<body>
<h1>Test</h1>
<p>Html etites to test</p>
<p>←</p>
<p>←</p>
<p>↑</p>
<p>↑</p>
<p>↓</p>
<p></p>
</body>
</html>
Everything works fine except for those entities. Nothing is rendered for them; there are only blank spots where the arrows should be.
Does anyone have a solution for that?

The issue is that the default font used by iText doesn't support the characters you want to print.
The solution is to embed another font that can display these characters, for example DejaVu.
In the Java file, register the font with the renderer:
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("font/DEJAVUSANS.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.setDocument(url);
renderer.layout();
renderer.createPDF(os);
And change the font-family declaration in the HTML:
body
{
font-family: DejaVu Sans;
}

Related

flying saucer create blank page when set landscape?

I am using flying saucer to create pdf from html.
@page land { size: A4 landscape; }
.landscapePage {
page:land;
height: 14cm;
width: 30cm;
background-color: green;
}
<div class="landscapePage">
</div>
But I don't know why it creates a blank page before the landscape page. How do I avoid it?

Simple way to display currency symbol in html2pdf for iText 7

I updated my code from iText 5.0 to iText 7 and html2pdf 2.0 according to this post. In the earlier version the rupee symbol rendered properly, but I changed the code because of a CSS issue. Now the complete page converts to PDF properly except for the rupee symbol.
I tried setting the font in the HTML style tag itself, like * { font-family: Arial; }.
I tried writing the rupee symbol as &#x20b9, as ₹, and also added ₹ directly, but to no avail.
My HTML:
<html>
<head>
<style>
* { font-family: Arial; }
</style>
<title>HTML div</title>
</head>
<body>
<p style="margin-bottom: 0in; padding-left: 60px;">
<div style="font-size: 450%; text-indent: 150px;">
<strong>BUY <span style="color: #ff420e;">2</span> GET
</strong>
</div>
</p>
<div
style="float: left; display: inline-block; margin: 10px; text-align: right; font-size: 70%; line-height: 27; transform: rotate(270deg);">Offer
Expiry Date : 30/11/2017</Div>
<div
style="float: left; display: inline-block; margin: 10px; text-align: right; font-size: 350%;">
₹
<!-- ₹ -->
</div>
<div
style="float: left; display: inline-block; margin: auto; font-size: 1500%; color: red; font-weight: bold;">99</div>
<div
style="float: left; display: inline-block; margin: 10px; text-align: left; font-size: 250%; line-height: 10;">OFF</div>
<div
style="position: absolute; height: 40px; font-size: 250%; line-height: 600px; color: red; text-indent: 50px">Pepsi
2.25 Pet Bottle ltr</div>
<div
style="position: absolute; height: 40px; font-size: 245%; line-height: 694px; text-indent: 50px">
MRP: ₹ <span style="color: #ff420e;">654</span>
</div>
</body>
</html>
Java Code :
public class Test {
final static String DEST = "D://Workspace_1574973//POP//sample_12.pdf";
final static String SRC = "D://Workspace_1574973//POP//src//com//resources//test.html";
public static void main(String[] args) throws Exception {
createPdf(SRC, DEST);
}
public static void createPdf(String src, String dest) throws IOException {
HtmlConverter.convertToPdf(new File(src), new File(dest));
}
}
Earlier code, which was working with symbols.
log.info("Creating file start");
OutputStream file = new FileOutputStream(new File("font_check.pdf"));
Document document = new Document(PageSize.A4);
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
InputStream is = new ByteArrayInputStream(fileTemplate.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
file.close();
log.info("Creating file end");
Is there a simple approach to achieve this with minimal, optimized code?
I have to generate thousands of PDFs in one go, so performance should not suffer.
Please let me know if anyone has achieved this with the latest version.
Edit: Also, how do I set a particular paper size such as A6, A3 or A4?

Hope you are not mad; I don't have the reputation to write simple comments, so I'll post a full answer instead. I parse HTML for my work, and I read SO sometimes. There is a lot here on the subject of UTF-8. Most software systems support characters beyond code point 255, such as the Indian rupee sign (U+20B9), through Unicode encodings like UTF-8. However, most of the time the programmer has to request that behavior explicitly.
In HTML, for instance, adding this line usually helps:
String UTF8MetaTag = "<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />";
Anyway, not having used HTMLToPDF, I might not be the right person to answer your question. But having dealt with UTF-8 foreign-language characters for three years, I know that configuring software to handle the full Unicode range is usually very easy, but also always mandatory.
Here is an SO post about using HTML2PDF and UTF-8 to handle Japanese Kanji characters. Most likely the same approach handles all of UTF-8, but that is not guaranteed.
HTML2PDF support for japanese language(utf8) is not working
Here are a few posts about it using HTML2PDF in PHP:
Converting html 2 pdf (php) using hebrew returns "???"
Having æøå chars in HTML2PDF charset
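To illustrate why the charset has to be stated explicitly, here is a minimal JDK-only sketch (the class and method names are hypothetical and it does not use html2pdf). It decodes the UTF-8 bytes of the rupee sign once with the right charset and once with ISO-8859-1. The wrong charset yields three unrelated characters instead of one symbol.

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // U+20B9 INDIAN RUPEE SIGN encoded as UTF-8 (bytes E2 82 B9).
    static byte[] rupeeBytes() {
        return "\u20B9".getBytes(StandardCharsets.UTF_8);
    }

    // Correct: decode with the charset the bytes were written in.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Wrong on purpose: every byte becomes its own character.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = rupeeBytes();
        System.out.println("byte count: " + bytes.length);
        System.out.println("UTF-8 round trip ok: " + decodeUtf8(bytes).equals("\u20B9"));
        System.out.println("ISO-8859-1 char count: " + decodeLatin1(bytes).length());
    }
}
```

Note that a correct charset only gets the character into the converter intact. The font in use must still contain a glyph for it.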

docx4j conversion html->docx->html

I'm working on my first project using docx4j... My goal is to export xhtml from a webapp (ckeditor created html) into a docx, edit it in Word, then import it back into the ckeditor wysiwyg.
(*crosspost from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbafbf50d0a914
will update when answered)
I have created an html test document with the following contents:
<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>
My code then creates a docx from this html like so:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent()
.addAll(xHTMLImporter.convert(new File("test.html"), null));
System.out.println(XmlUtils.marshaltoString(wordMLPackage
.getMainDocumentPart().getJaxbElement(), true, true));
wordMLPackage.save(new java.io.File("test.docx"));
My code then attempts to convert the docx BACK to html like so:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
AbstractHtmlExporter exporter = new HtmlExporterNG2();
OutputStream os = new java.io.FileOutputStream("test.html");
HTMLSettings htmlSettings = new HTMLSettings();
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(
os);
exporter.html(docx, result, htmlSettings);
The html returned is:
<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
/* TABLE STYLES */
/* PARAGRAPH STYLES */
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}
/* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>
<!-- userBodyTop goes here -->
<div class="document">
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>
<!-- userBodyTail goes here -->
</body>
</html>
There is now a lot of extra space after each line. I'm not sure why this is happening; the conversion appears to add a lot of extra white space/carriage returns.

It's not clear from your question whether you are worried about whitespace in the (X)HTML source document, or in your page as rendered (presumably in CKEditor). If the latter, then the browser and CK version may be relevant.
Whitespace may or may not be significant; try Googling 'xhtml significant whitespace' for more.
By way of background, depending on docx4j property docx4j.Convert.Out.HTML.OutputMethodXML, docx4j will use
<xsl:output method="html" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
or
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
Note the difference in the value of @method. If you want something different, you can modify docx2html.xsl or docx2xhtml.xsl respectively.

Is there a way to convert wordMLPackage to html without all the extra stuff like:
<?xml version="1.0" encoding="UTF-8"?>
and the css?
Could it be something as simple as the original HTML with inline CSS, like <html><body><div style="...."></div></body></html>?

displaying cdata in XML to be rendered as html

I know that something similar has been asked many times but I cannot find a solution that works in my situation.
I'm generating a CDATA section within an XML document using Java (StringBuffer), and I'm putting simple HTML code in it as shown below:
public String createXML(OrderDetailBean orderBean) throws ParserConfigurationException {
logger.info("Starting to Create the XML");
getConnectionProperties(); //Load properties file and set the Connection parameters
// Create document
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = dbf.newDocumentBuilder();
Document doc = builder.newDocument();
//Configuring the Factory to get a validating parser (ie one that understands name and spaces)
dbf.setNamespaceAware(true);
dbf.setValidating(true);
//Create doc type
DOMImplementation domImpl = doc.getImplementation();
DocumentType doctype = domImpl.createDocumentType("paymentService", "-//CompanyName//DTD CompanyName PaymentService v1//EN", "http://dtd.CompanyName.com/Service_v1.dtd");
doc.appendChild(doctype);
/******** Add ROOT element: PaymentService ********/
Element rootElement = doc.createElement("paymentService");
//Add Attributes to the Root Element
rootElement.setAttribute("version", "1.4");
rootElement.setAttribute("Code", Code);
/******** Add first element: submit ********/
Element elementSubmit = doc.createElement("submit");
/******** Add second element: order *******/
Element elementOrder = doc.createElement("order");
elementOrder.setAttribute("orderCode", ""+System.currentTimeMillis());
// Add THIRD child element for CData
Element elementOrderContent = doc.createElement("orderContent");
StringBuffer orderContent = new StringBuffer();
orderContent.append("<![CDATA[<center><table> <tr><td class=\"one width190\" align=\"left\" valign=\"top\">");
orderContent.append("<span style=\" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;\">");
orderContent.append("Product:</span> </td><tr><td class=\"one\" align=\"left\" valign=\"top\"><span style=\" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;\">");
orderContent.append("<strong>Product title</strong></span></td></tr> </table></center>]]>");
logger.info("The orderContent Element in XML : "+orderContent.toString());
Text orderContentText = doc.createTextNode(orderContent.toString());
logger.debug("Converted Text for Order Content is: "+orderContentText);
elementOrderContent.appendChild(orderContentText);
elementOrder.appendChild(elementOrderContent); //Add third Order Child: OrderContent
elementSubmit.appendChild(elementOrder); //Add Order Element to Submit
rootElement.appendChild(elementSubmit); //Add First Element (Submit) to Root Element (PaymentService)
doc.appendChild(rootElement); //Add Root Element to XML Doc
String stringXML = convertDocintoString(doc); //print the XML to File
logger.info("The XML Generated is: " + stringXML);
return stringXML;
}
This part is fine. I'm then converting that XML Document into a String using XMLSerializer as shown below:
/*
* Convert the XML Document into a String: Serialize DOM Document to generate the xml String
*/
public String convertDocintoString(Document doc) {
logger.info("Converting the XML Document into String XML");
//OutputFormat format = new OutputFormat(doc);
OutputFormat format = new OutputFormat(doc, "UTF-8", true);
//format.setIndenting(true);
XMLSerializer serializer;
String outXML = null;
try {
StringWriter stringOut = new StringWriter ();
serializer = new XMLSerializer(stringOut, format);
serializer.asDOMSerializer();
serializer.serialize(doc);
outXML = stringOut.toString();
logger.debug("The XML String IS: " + outXML);
}
catch (FileNotFoundException e) {
e.printStackTrace();
logger.debug("XML Document Not Found for Serialization!", e);
}
catch (IOException e) {
e.printStackTrace();
logger.debug((new StringBuilder("Issues when converting the XML Document into String XML")).append(e).toString());
}
return outXML;
}
In the step above, I noticed that all the '<' and '>' characters get replaced by &lt; and &gt;. But I believe that this is normal.
Now when I try to display that CDATA block in an HTML page, it is rendered as literal text rather than as HTML, i.e. exactly like the first code block I pasted above. Can somebody please suggest what's happening here and what I am doing wrong? The HTML output is:
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<META http-equiv='Pragma' content='no-cache'>
<META http-equiv='Expires' content='0'>
<title>Select Method</title>
<style type="text/css" media="screen"> @import url(/pictures/dispatcher.css);</style>
<script type="text/javascript" src="/jsp/js/jquery-1.6.2.min.js"></script>
</head>
<body >
<div id="ordercontainer"><font ><b>Your Details</b></font>
<br/><font ><![CDATA[<input type="hidden" name="MC_mycustomvar" value="M_ and MC_ combined"><center><table><tr><td class="one width190" align="left" valign="top"><span style=" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;">Product:</span>&nbsp;&nbsp;</td><tr><td class="one" align="left" valign="top"><span style=" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;"><strong>Product title</strong></span></td></tr></table></center>]]></font><br/>
</body>
</html>
Thanks

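You need to use the method org.w3c.dom.Document.createCDATASection(String data).
Anything you pass in the data parameter will be wrapped in CDATA in the resulting node. With createTextNode, by contrast, the string is treated as plain character data, so the serializer escapes your hand-written <![CDATA[ marker along with every other '<'.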
// Add THIRD child element for CData
Element elementOrderContent = doc.createElement("orderContent");
StringBuffer orderContent = new StringBuffer();
// Note: Removed the <![CDATA[ ]]> from this string concat
orderContent.append("<center><table> <tr><td class=\"one width190\" align=\"left\" valign=\"top\">");
orderContent.append("<span style=\" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;\">");
orderContent.append("Product:</span> </td><tr><td class=\"one\" align=\"left\" valign=\"top\"><span style=\" font-family: Arial, Helvetica, sans-serif; font-size: 12pt; color: #002469;\">");
orderContent.append("<strong>Product title</strong></span></td></tr> </table></center>");
logger.info("The orderContent Element in XML : "+orderContent.toString());
// HERE IS THE UPDATED LINE
Text orderContentText = doc.createCDATASection(orderContent.toString());
logger.debug("Converted Text for Order Content is: "+orderContentText);
elementOrderContent.appendChild(orderContentText);
elementOrder.appendChild(elementOrderContent); //Add third Order Child: OrderContent
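
The difference between the two node types can be checked in isolation. The sketch below is a simplified stand-in for the asker's code: it uses only the JDK (DOM plus the default Transformer rather than XMLSerializer), and the class name, helper method and sample markup are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataDemo {
    // Builds <orderContent> around the markup and serializes it to a String.
    static String serialize(String markup, boolean asCdata) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element content = doc.createElement("orderContent");
        doc.appendChild(content);
        // A CDATA section keeps the markup verbatim; a text node gets escaped on output.
        content.appendChild(asCdata ? doc.createCDATASection(markup) : doc.createTextNode(markup));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<strong>Product title</strong>";
        System.out.println(serialize(markup, true));   // markup preserved inside a CDATA section
        System.out.println(serialize(markup, false));  // '<' written as an entity reference
    }
}
```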

Replace a substring with a StringBuffer substring

I have a huge string containing a complete HTML page obtained via JSoup. I changed a substring of the HTML using the StringBuffer replace API (replace(int start, int end, String str)). The StringBuffer is populated correctly, but when I try to replace the substring of the HTML with the new StringBuffer value, it does not work.
Here is the code snippet.
html = html.replace(divStyle1.trim(), heightwidthM.toString().trim());
The initial big html is
<!DOCTYPE html>
<html xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" class="SAF" id="global-header-light">
<head>
</head>
<body>
**<div style="background-image: url(http://aka-cdn-ns.adtech.de/rm/ads/23274/HPWomenLOFT_1381687318.jpg);background-repeat: no-repeat;-webkit-background-size: 1001px 2059px; height: 2059px; width: 1001px; text-align: center; margin: 0 auto;">**
<div style="height:2058px; padding-left:0px; padding-top:36px;">
<iframe style="height:90px; width:728px;"/>
</div>
</div>
</body>
</html>
The divStyle1 string is
background-image: url(http://aka-cdn-ns.adtech.de/rm/ads/23274/HPWomenLOFT_1381687318.jpg);background-repeat: no-repeat;-webkit-background-size: 1001px 2059px; height: 2059px; width: 1001px; text-align: center; margin: 0 auto;
And the String buffer has value
background-image: url(http://aka-cdn-ns.adtech.de/rm/ads/23274/HPWomenLOFT_1381687318.jpg);background-repeat: no-repeat;-webkit-background-size: 1001px 2059px; height:720px; width:900px; text-align: center; margin: 0 auto;
The replace call does not work, where divStyle1 is a substring of the HTML above (as a String) and heightwidthM is the StringBuffer whose value should replace it. It doesn't throw any errors, but it doesn't change anything either.
Thanks
Swaraj

This is very easy with JSoup:
String html = "<!DOCTYPE html>\n<html xmlns:og=\"http://opengraphprotocol.org/schema/\" xmlns:fb=\"http://www.facebook.com/2008/fbml\" xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\" class=\"SAF\" id=\"global-header-light\">\n<head>\n\n</head>\n<body>\n\n\n**<div style=\"background-image: url(http://aka-cdn-ns.adtech.de/rm/ads/23274/HPWomenLOFT_1381687318.jpg);background-repeat: no-repeat;-webkit-background-size: 1001px 2059px; height: 2059px; width: 1001px; text-align: center; margin: 0 auto;\">** \n\n<div style=\"height:2058px; padding-left:0px; padding-top:36px;\">\n\n\n<iframe style=\"height:90px; width:728px;\"/>\n\n\n\n</div>\n</div>\n\n</body>\n</html>";
String newStyle = "background-image: url(http://aka-cdn-ns.adtech.de/rm/ads/23274/HPWomenLOFT_1381687318.jpg);background-repeat: no-repeat;-webkit-background-size: 1001px 2059px; height:720px; width:900px; text-align: center; margin: 0 auto;";
Document document = Jsoup.parse(html);
document.body().child(0).attr("style", newStyle);
System.out.println(document.html());

Coming back to my suggestion, if you don't mind trying, you can do something of this sort:
Document newDocument = Jsoup.parse(<your html string>, StringUtils.EMPTY, Parser.htmlParser());
Elements yourStyles = newDocument.select("div[style]"); // this will select all div with attributes style
yourStyles.get(0).attr("style", <your new value>); // this will get your first div and replace attribute style to your new value
System.out.println(newDocument.outerHtml());
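
As background on why the original html.replace(...) call can silently do nothing: String.replace matches its target literally and returns the string unchanged when the target is not found. The thread does not confirm the actual cause, so the sketch below only shows one possibility, a whitespace difference between the target and the HTML. The class name, helper and shortened style strings are hypothetical.

```java
class ReplaceDemo {
    // String.replace accepts any CharSequence, so a StringBuffer can be passed directly.
    static String swap(String html, String target, CharSequence replacement) {
        return html.replace(target, replacement);
    }

    public static void main(String[] args) {
        String html = "<div style=\"height: 2059px; width: 1001px;\"></div>";
        StringBuffer newStyle = new StringBuffer("height:720px; width:900px;");

        // Exact literal match: the style value is replaced.
        String exact = swap(html, "height: 2059px; width: 1001px;", newStyle);
        // Same CSS, different spacing: no match, so the input comes back as-is.
        String miss = swap(html, "height:2059px; width:1001px;", newStyle);

        System.out.println(exact);
        System.out.println(miss.equals(html));
    }
}
```

Setting the attribute through the parsed document, as in the answers above, avoids depending on an exact textual match.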
