Export HTML with data to MS Word

We have a requirement where customers fill in a BRD document that is an HTML file. The HTML contains radio buttons, text boxes and so on, along with colors and tables. A button on the page should call a Java class that exports the HTML, together with the data the customer entered, to a Word document. We can already convert HTML that is hard-coded as a string in the Java program to a Word document, but we are having trouble sending the HTML along with the entered data.
Can anyone tell me how to achieve this, or whether there is a better way to do it?
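import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.contenttype.ContentType;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.AlternativeFormatInputPart;
import org.docx4j.relationships.Relationship;
import org.docx4j.wml.CTAltChunk;

public class XhtmlToDocx {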
public static void main(String[] args) throws Exception {
//String html = "<html><form><input type=\"checkbox\" name=\"xhtml_mp_tutorial_chapter\" value=\"1\"/></form></html>";
String html = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"+
"<html xmlns=\"http://www.w3.org/1999/xhtml\">"+
"<head>"+
"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"+
"<title>Untitled Form</title>"+
"<link rel=\"stylesheet\" type=\"text/css\" href=\"view.css\" media=\"all\">"+
"<script type=\"text/javascript\" src=\"view.js\"></script>"+
"<script type=\"text/javascript\" src=\"calendar.js\"></script>"+
"</head>"+
"<body id=\"main_body\" >"+
" "+
" <img id=\"top\" src=\"top.png\" alt=\"\">"+
" <div id=\"form_container\">"+
" "+
" <h1><a>Untitled Form</a></h1>"+
" <form id=\"form_82495\" class=\"appnitro\" method=\"post\" action=\"\">"+
" <div class=\"form_description\">"+
" <h2>Untitled Form</h2>"+
" <p>This is your form description. Click here to edit.</p>"+
" </div> "+
" <ul >"+
" "+
" <li id=\"li_1\" >"+
" <label class=\"description\" for=\"element_1\">Text </label>"+
" <div>"+
" <input id=\"element_1\" name=\"element_1\" class=\"element text medium\" type=\"text\" maxlength=\"255\" value=\"\"/> "+
" </div> "+
" </li> <li id=\"li_3\" >"+
" <label class=\"description\" for=\"element_3\">Multiple Choice </label>"+
" <span>"+
" <input id=\"element_3_1\" name=\"element_3\" class=\"element radio\" type=\"radio\" value=\"1\" />"+
"<label class=\"choice\" for=\"element_3_1\">First option</label>"+
"<input id=\"element_3_2\" name=\"element_3\" class=\"element radio\" type=\"radio\" value=\"2\" />"+
"<label class=\"choice\" for=\"element_3_2\">Second option</label>"+
"<input id=\"element_3_3\" name=\"element_3\" class=\"element radio\" type=\"radio\" value=\"3\" />"+
"<label class=\"choice\" for=\"element_3_3\">Third option</label>"+
""+
" </span> "+
" </li> <li id=\"li_2\" >"+
" <label class=\"description\" for=\"element_2\">Date </label>"+
" <span>"+
" <input id=\"element_2_1\" name=\"element_2_1\" class=\"element text\" size=\"2\" maxlength=\"2\" value=\"\" type=\"text\"> /"+
" <label for=\"element_2_1\">MM</label>"+
" </span>"+
" <span>"+
" <input id=\"element_2_2\" name=\"element_2_2\" class=\"element text\" size=\"2\" maxlength=\"2\" value=\"\" type=\"text\"> /"+
" <label for=\"element_2_2\">DD</label>"+
" </span>"+
" <span>"+
" <input id=\"element_2_3\" name=\"element_2_3\" class=\"element text\" size=\"4\" maxlength=\"4\" value=\"\" type=\"text\">"+
" <label for=\"element_2_3\">YYYY</label>"+
" </span>"+
" "+
" <span id=\"calendar_2\">"+
" <img id=\"cal_img_2\" class=\"datepicker\" src=\"calendar.gif\" alt=\"Pick a date.\"> "+
" </span>"+
" <script type=\"text/javascript\">"+
" Calendar.setup({"+
" inputField : \"element_2_3\","+
" baseField : \"element_2\","+
" displayArea : \"calendar_2\","+
" button : \"cal_img_2\","+
" ifFormat : \"%B %e, %Y\","+
" onSelect : selectDate"+
" });"+
" </script>"+
" "+
" </li>"+
" "+
" <li class=\"buttons\">"+
" <input type=\"hidden\" name=\"form_id\" value=\"82495\" />"+
" "+
" <input id=\"saveForm\" class=\"button_text\" type=\"submit\" name=\"submit\" value=\"Submit\" />"+
" </li>"+
" </ul>"+
" </form> "+
" <div id=\"footer\">"+
" Generated by pForm"+
" </div>"+
" </div>"+
" <img id=\"bottom\" src=\"bottom.png\" alt=\"\">"+
" </body>"+
"</html>";
// Create an empty .docx package.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
// Store the HTML as an alternative-format part; Word converts it when the document is opened.
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/hw.html"));
// Encode explicitly as UTF-8 to match the charset declared in the HTML.
afiPart.setBinaryData(html.getBytes("UTF-8"));
afiPart.setContentType(new ContentType("text/html"));
Relationship altChunkRel = wordMLPackage.getMainDocumentPart().addTargetPart(afiPart);
// Reference the HTML part from the document body through an altChunk element.
CTAltChunk ac = Context.getWmlObjectFactory().createCTAltChunk();
ac.setId(altChunkRel.getId());
wordMLPackage.getMainDocumentPart().addObject(ac);
// Register the content type for the .html extension.
wordMLPackage.getContentTypeManager().addDefaultContentType("html", "text/html");
wordMLPackage.save(new java.io.File("C:/Users/****/Downloads/Word.docx"));
}
}

The problem seems to be that you are reading the static HTML page, not the submitted page. The static page only contains the empty form; the values the customer typed exist only in the browser until the form is submitted.
To get the full content of the submitted HTML, submit the form with its data first, generate a static HTML page that contains those values, and then read that page (for example with XMLSerializer or a URL stream reader) to get the final HTML to pass to the word-processing part of your program.
I am not providing an exact solution with code, as I assume you are mainly stuck on the logic and can implement it yourself.
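The "generate a static page that contains the values" step can be sketched roughly as follows. This is only an illustration under several assumptions: the class and method names (FormFiller, fill, attr, escape) are hypothetical, a plain Map stands in for the submitted request parameters, attributes are assumed to be double-quoted as in the question's markup, and only text inputs and radio buttons are handled. A real HTML parser would be more robust than regular expressions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: writes submitted values back into the form markup.
class FormFiller {
    private static final Pattern INPUT =
            Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the value of a double-quoted attribute, or null if it is absent.
    static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\b" + name + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Escapes characters that would break an attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Fills text inputs and marks the selected radio button as checked.
    static String fill(String html, Map<String, String> submitted) {
        Matcher m = INPUT.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group();
            String name = attr(tag, "name");
            String type = attr(tag, "type");
            String value = name == null ? null : submitted.get(name);
            if (value != null && "text".equalsIgnoreCase(type)) {
                // Only inputs that already carry a value attribute are updated.
                tag = tag.replaceFirst("\\bvalue\\s*=\\s*\"[^\"]*\"",
                        Matcher.quoteReplacement("value=\"" + escape(value) + "\""));
            } else if (value != null && "radio".equalsIgnoreCase(type)
                    && value.equals(attr(tag, "value"))) {
                tag = tag.replaceFirst("\\s*/?>$", " checked=\"checked\" />");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<input name=\"element_1\" type=\"text\" value=\"\"/>"
                + "<input name=\"element_3\" type=\"radio\" value=\"2\" />";
        // In a servlet these values would come from request.getParameter(...).
        System.out.println(fill(html, Map.of("element_1", "Acme", "element_3", "2")));
    }
}
```

The returned string would then take the place of the hard-coded html variable that is passed to the Word export code.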

Related

Extract texts from a string

I've following string which is HTML -
<html>
<head>
<title>Repository</title>
</head>
<body>
<h2>Subversion</h2>
<ul>
<li>
..
</li>
<li>
branch_A
</li>
<li>
branch_B
</li>
</ul>
</body>
</html>
From this I want to get the labels of the li tags, which are branch_A and branch_B.
The number of li elements can vary, and I want all of them. How can I parse this string and get those values?
NOTE: I could have used the jsoup library for this, but because of a project restriction I cannot use it.

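You can use an HTML parser for this. The code below uses jsoup (https://www.baeldung.com/java-with-jsoup), which is quick and easy.
// Parse the HTML string (to fetch a page by URL, use Jsoup.connect(url).get() instead).
Document doc = Jsoup.parse(html);
// Print the text of every <li> element; pass any other CSS selector to select() as needed.
doc.select("li").forEach(li -> System.out.println(li.text()));
Other tools are discussed here: https://tomassetti.me/parsing-html/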

Using Java streams (String.lines() requires Java 11 or later):
String html = "<html>\n" +
" <head>\n" +
" <title>Repository</title>\n" +
" </head>\n" +
" <body>\n" +
" <h2>Subversion</h2>\n" +
" <ul>\n" +
" <li>\n" +
" ..\n" +
" </li>\n" +
" <li>\n" +
" branch_A\n" +
" </li>\n" +
" <li>\n" +
" branch_B\n" +
" </li>\n" +
" </ul>\n" +
" </body>\n" +
"</html>";
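html.lines().filter(line -> line.contains("<a href")).forEach(System.out::println);
This prints every line that contains a link, tags included. It assumes each label sits on its own line inside an <a href=...> element, so adjust the predicate to match your markup.
You can run the stream in parallel if you have a huge HTML file, but forEach then no longer guarantees the output order.
To strip the HTML tags as well, add a map step:
html.lines().filter(line -> line.contains("<a href")).map(line -> line.replaceAll("<[^>]*>","").trim()).forEach(System.out::println);
Output:
..
branch_A
branch_B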
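If the labels are plain text inside the li elements, as in the string shown in the question, a regular expression over the whole string is one jsoup-free option. The sketch below is an assumption-laden illustration, not a general HTML parser: the class and method names (LiLabels, extract) are hypothetical, and it assumes li elements are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the text of every <li>...</li> pair.
class LiLabels {
    // DOTALL lets '.' span line breaks, since each label is on its own line.
    private static final Pattern LI = Pattern.compile(
            "<li\\b[^>]*>(.*?)</li>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extract(String html) {
        List<String> labels = new ArrayList<>();
        Matcher m = LI.matcher(html);
        while (m.find()) {
            // Drop any tags inside the item, then trim surrounding whitespace.
            String text = m.group(1).replaceAll("<[^>]*>", "").trim();
            if (!text.isEmpty()) {
                labels.add(text);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String html = "<ul>\n<li>\n..\n</li>\n<li>\nbranch_A\n</li>\n"
                + "<li>\nbranch_B\n</li>\n</ul>";
        System.out.println(extract(html)); // prints [.., branch_A, branch_B]
    }
}
```

The ".." entry is returned as well; filter it out afterwards if only the branch names are wanted.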

International characters not showing correctly in Java PDF export

I am trying to export Chinese/Japanese characters and Polish characters to a PDF. Only one of the two groups is rendered at a time, never both, and I don't understand why.
Here is the program:
public class Japanese {
public static void main(String[] args) throws Exception {
StringBuffer writer = new StringBuffer();
writer.append("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1?DTD/transitional.dtd\">\n" +
"<html>\n" +
"<head>\n" +
"<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n" +
"<title>some title.</title>\n" +
"\n" +
"<style type=\"text/css\">\n" +
" .myDiv\n" +
" {\n" +
" font-family: \"Noto Sans CJK TC Regular\", Sans-Serif;\n" +
" }\n" +
"</style>\n" +
"</head>\n" +
"<body> <div class=\"myDiv\">" +
"Chinese: 百威英博雪津(三明)啤酒有限公司 <br />" +
"Japanese: 日本にほんでは、近頃ちかごろ多おおくの人ひとが保育園ほいくえん問題もんだいについて話はなしている。 <br />" +
"Polish: ąćęł <br />" +
"German: TüööäE_3STß <br />" +
"Hello World: 你好，世界 <br />" +
"\n" +
" <br />\n" +
"\n" +
"END TEXT\n</div>" +
"<br />\n" +
"<br />\n" +
"</body></html>");
String pdfContent = writer.toString();
ITextRenderer renderer = new ITextRenderer();
ITextFontResolver resolver = renderer.getFontResolver();
resolver.addFont("lib/NotoSans-Regular.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
resolver.addFont("lib/NotoSansCJKtc-Regular.otf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.setDocumentFromString(pdfContent);
renderer.layout();
FileOutputStream os = new FileOutputStream("output.pdf");
renderer.createPDF(os);
System.out.println("Finished");
}
}
The program above creates a PDF with correct Japanese/Chinese characters. For this, I leave this line as it is:
" font-family: \"Noto Sans CJK TC Regular\", Sans-Serif;\n" +
But then the Polish characters are not shown, only the Japanese/Chinese ones. If I change the line above to:
" font-family: \"Noto Sans\", Sans-Serif;\n" +
then the Chinese/Japanese characters are not shown, only the Polish ones.
I would expect both to be shown correctly with this line:
" font-family: \"Noto Sans\", \"Noto Sans CJK TC Regular\", Sans-Serif;\n" +
But only the Polish characters are shown correctly. According to https://www.google.com/get/noto/help/guidelines/ this is the correct way to achieve it:
font-family: "Noto Sans", "Noto Sans CJK JP", sans-serif;
So why is this not working?
I am using the following libraries:
flying-saucer-core-9.0.8.jar
flying-saucer-pdf-9.0.8.jar
itext-2.1.7.jar
Any idea why this may be happening?

Getting data in order with Jsoup

I'm trying to extract data from a web page's HTML while keeping it in document order. The HTML looks like:
<div class="text">
First Text
<br>
<br>
<div style="margin:20px; margin-top:5px; ">
<table cellpadding="5">
<tbody><tr>
<td class="alt2">
<div>
Written by <b>excedent</b>
</div>
<div style="font-style:italic">quote message</div>
</td>
</tr>
</tbody></table>
</div>Second Text<br>
<br>
<img class="img" src="https://developer.android.com/_static/images/android/touchicon-180.png"><br>
<br>
Third Text
</div>
What I want to do is build an Android layout by scraping the HTML, but I need to preserve the order of the elements. In this case:
TextView => First Text
TextView => Quote Message
TextView => Second Text
ImageView => img
TextView => Third Text
The problem comes when I try to get the HTML values in order. Using jsoup's Element.ownText I get a single String, "First Text Second Text Third Text", and then the img at the end, resulting in:
TextView => First Text Second Text Third Text
TextView => Quote Message
ImageView => img
What can I do to get that data in order?
Thanks in advance

You can parse the HTML into a list of nodes. The list preserves the DOM order, which gives you what you want.
Check the Parser.parseFragment method: it returns a list of nodes.

Try this.
String html = ""
+ "<div class=\"text\">"
+ " First Text"
+ " <br>"
+ " <br>"
+ " <div style=\"margin:20px; margin-top:5px; \">"
+ " <table cellpadding=\"5\">"
+ " <tbody><tr>"
+ " <td class=\"alt2\">"
+ " <div>"
+ " Written by <b>excedent</b>"
+ " </div>"
+ " <div style=\"font-style:italic\">quote message</div>"
+ " </td>"
+ " </tr></tbody>"
+ " </table>"
+ " </div>Second Text<br>"
+ " <br>"
+ " <img class=\"img\" src=\"https://developer.android.com/_static/images/android/touchicon-180.png\"><br>"
+ " <br>"
+ " Third Text"
+ " </div>";
Document doc = Jsoup.parse(html);
List<String> rootTexts = doc.select("div.text").first().textNodes().stream()
.map(node -> node.text().trim())
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
System.out.println(rootTexts);
OUTPUT:
[First Text, Second Text, Third Text]

This answer is a little late, but the correct way to do what you want to do is this. For your outermost <div>, instead of getting the child elements using Element.children(), you'll want to use Element.childNodes() instead.
Element.children() only returns child Elements, in which text is not included.
Element.childNodes() returns all child nodes, which includes TextNodes and Elements.
This solution works for me.

How can I Hide fields in browser inspector

I am new to Java and jQuery. I am using hidden fields to hide my values, but others can find the data with the browser inspector. How can I hide the values in the browser?
out.println("<form method='post' action='test.jsp'>");
out.println("<input type=hidden name=test1 value=" + test1 + " />");
out.println("<input type=hidden name=test2 value=" + test2 + " />");
out.println("<input type=hidden name=test3 value=" + test3 + " />");
out.println("<input type=hidden name=test4 value=" + test4 + " />");
out.println("<input type=submit value=Launch />");
out.println("</form>");

HTML is rendered in the user's browser, so everything in the page reaches the client. The only hiding the HTML standard offers is the hidden form field, and its original purpose was session tracking, not secrecy.
With a hidden form field the user can still see the content by viewing the page source or using the inspector.
If you want to hide the data, I suggest using some kind of encryption.
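As a rough sketch of the encryption idea, the value can be encrypted on the server before it is written into the hidden field and decrypted when the form comes back. The example uses AES-GCM from the JDK's javax.crypto package. The class and method names (HiddenFieldCipher, encrypt, decrypt) are hypothetical, and the key is generated in main only for demonstration; a real application would need to keep one key on the server across requests. Keeping the values in the server-side session instead, so they never reach the browser, is another option.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts values before they are placed in hidden fields.
class HiddenFieldCipher {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a URL-safe Base64 token containing the IV followed by the ciphertext.
    static String encrypt(String plain, SecretKey key) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
        byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    // Reverses encrypt(); throws if the token was modified by the client.
    static String decrypt(String token, SecretKey key) throws Exception {
        byte[] in = Base64.getUrlDecoder().decode(token);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey(); // demo key only
        String token = encrypt("secret-value", key);
        System.out.println("<input type=\"hidden\" name=\"test1\" value=\"" + token + "\" />");
        System.out.println(decrypt(token, key));
    }
}
```

The inspector then shows only the opaque token, and because GCM authenticates the ciphertext, a token tampered with in the browser fails to decrypt.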

Website functionality breaks when code comes from AJAX

I have a gallery of images, and when I hover the mouse over any of the images, the image pulls back revealing some text.
When I have the following HTML code (at bottom of post) inside the HTML document, everything works fine.
However, when I put the exact same HTML code inside a Java servlet and have it returned to the page, everything looks normal, but the image pullback doesn't work anymore.
Any idea why that would occur? Perhaps I need to do some kind of refresh to make it work properly?
relevant code for one of the items in the gallery:
<li>
<div class="header"><p>Product 1 Shirt</p></div>
<div class="gallery_item">
<img src="gallery/thumb/gallery_01.jpg" width="214" height="194" class="cover" alt="" />
<p>Highlight 1</p>
<p>Highlight 2</p>
<p>Highlight 3</p>
More Info
Enlarge
</div>
<div class="p2"><p>Price: $10</p></div>
<div class="p2"><p>In Stock: Yes</p></div>
</li>
As requested: the servlet:
public void service(HttpServletRequest request,
HttpServletResponse response) throws IOException,
ServletException
{
// Set the content type before obtaining the writer so the setting takes effect.
response.setContentType("text/html");
PrintWriter out = response.getWriter();
String requestType = request.getParameter("type");
String result;
// Comparing from the constant avoids a NullPointerException when "type" is missing.
if("getproductlist".equals(requestType))
{
Products products = Products.getProductsInstance();
String keywords = request.getParameter("keywords");
String organization = request.getParameter("organization");
String price = request.getParameter("price");
String sort = request.getParameter("sort");
result = products.getProducts(keywords, organization, price, sort);
//this next lines of html are actually what is returned from products.getProducts. I'm just putting it here for clarity. All the variables (name, h1, etc) are okay.
result += "<li>"
+ "<div class=\"header\"><p>"+ name +"</p></div>"
+ "<div class=\"gallery_item\">"
+ "<img src=\"gallery/thumb/gallery_01.jpg\" width=\"214\" height=\"194\" class=\"cover\" alt=\"\" />"
+ "<p>"+ h1 +"</p>"
+ "<p>"+ h2 +"</p>"
+ "<p>"+ h3 +"</p>"
+ "More Info"
+ "Enlarge "
+ ""
+ "</div>"
+ "<div class=\"p2\"><p>Price: "+ itemPrice +"</p></div>"
+ "<div class=\"p2\"><p>In Stock: "+ inStock +"</p></div> "
+ "</li>";
out.println(result);
out.close();
}
}
