How to get the mail-content without the whole source code?

I read some mails with javax.mail (JavaMail).
Then I want to save the content of a message.
For example, I read a mail whose content is simply By: Test.
I read the content with the .getContent() method:
Object body = message.getContent();
String content = ((body instanceof String) ? (String) body : "NO STRING CONTENT");
The problem is that, instead of the simple content By: Test, I get the whole Outlook HTML source of the message:
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.E-MailFormatvorlage17
{mso-style-type:personal-compose;
font-family:"Arial","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="DE-CH" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;">By: Test<o:p></o:p></span></p>
</div>
</body>
</html>
So how can I read the mail content without getting the whole HTML source of the message?

First, I would extract the content of the <body> section of the string. After that it depends on what you need: you could, for example, strip every HTML tag, but beware that all formatting (including line breaks!) is lost and you are left with one big chunk of text.
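A minimal sketch of that idea using only the JDK, assuming the HTML is already in a String (for example the content variable from the question). It uses regular expressions rather than a real HTML parser, so it is only a rough approximation: HTML entities such as &amp; are not decoded, and a > inside an attribute value would confuse it. The class and method names are illustrative, not from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlBodyText {
    // Matches everything between <body ...> and </body>, case-insensitively and across lines.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String bodyText(String html) {
        Matcher m = BODY.matcher(html);
        // Fall back to the whole string if there is no <body> element.
        String body = m.find() ? m.group(1) : html;
        // Replace every tag with a space; paragraph and line-break structure is lost.
        String text = body.replaceAll("<[^>]+>", " ");
        // Collapse runs of whitespace into single spaces.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{}</style></head><body lang=\"DE-CH\"><div>"
                + "<p class=\"MsoNormal\"><span>By: Test<o:p></o:p></span></p></div></body></html>";
        System.out.println(bodyText(html)); // prints: By: Test
    }
}
```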

I just remembered a simpler and better way: you can take the text/plain part of the e-mail, if it has one.
// Message implements Part, so the cast is not needed.
String content = getPlainText(message);
// Returns the first text/plain part found (depth-first), or null if there is none,
// e.g. for an HTML-only message.
private String getPlainText(Part p) throws MessagingException, IOException {
    if (p.isMimeType("text/plain")) {
        return (String) p.getContent();
    } else if (p.isMimeType("multipart/*")) {
        Multipart mp = (Multipart) p.getContent();
        for (int i = 0; i < mp.getCount(); i++) {
            String s = getPlainText(mp.getBodyPart(i));
            if (s != null) return s;
        }
    }
    return null;
}

Related

How do I split words from an HTML file using string manipulation in Java?

I need to write a method that reads an HTML file and then displays the number of occurrences of certain words.
For example, given String [] words = {"happy", "nice", "good"}; the output should look like:
The word happy was used 7 times.
The word nice was used 1 times.
The word good was used 2 times.
This is what I did:
public static void ReadWriteDisplay() {
    Path in = Paths.get("E:\\TextToHTML.html");
    Path out = Paths.get("E:\\HTMLToText.txt");
    String s = "";
    String str = "";
    try {
        InputStream input = new BufferedInputStream(Files.newInputStream(in));
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        OutputStream output = new BufferedOutputStream(Files.newOutputStream(out, CREATE, WRITE, TRUNCATE_EXISTING));
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(output));
        s = reader.readLine();
        while (s != null) {
            str += s;
            writer.write(s);
            writer.newLine();
            s = reader.readLine();
        }
        reader.close();
        writer.close();
        String a[] = str.split(" ");
        System.out.println("str: " + str);
        String[] positive = {"happy", "nice", "good", "joy", "love"};
        int[] count = {0, 0, 0, 0, 0};
        for (int i = 0; i < a.length; i++) {
            if (positive[0].equalsIgnoreCase(a[i]))
                count[0]++;
            if (positive[1].equalsIgnoreCase(a[i]))
                count[1]++;
            if (positive[2].equalsIgnoreCase(a[i]))
                count[2]++;
            if (positive[3].equalsIgnoreCase(a[i]))
                count[3]++;
            if (positive[4].equalsIgnoreCase(a[i]))
                count[4]++;
        }
        for (int x = 0; x < 5; x++) {
            System.out.println("The word " + positive[x] + " was used " + count[x] + " times.");
        }
    } catch (Exception e) {
        System.err.println("Message: " + e);
    }
}
My method runs, but it does not count the occurrences accurately. The reason is that some words in the HTML are enclosed in tags (<>), which causes <>Hello<> to be stored in my string array instead of the word Hello.
Here is the sample output:
str: <!DOCTYPE html><html lang="en"><head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> <meta http-equiv="content-language" content="en" /> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="google-site-verification" content="rUp8isOBygjhxPJ2qyy6QtBi9vWRFhIboMXucJsCtrE" /> <title>JustPaste.it - Share Text & Images the Easy Way</title> <link rel="preload" href="/static/img/jp_logo_1_en_v4.png" as="image" /> <meta name="robots" content="noindex, nofollow" /> <meta name="googlebot" content="noindex, nofollow" /> <link rel="preload" href="/build/global.395f53d0.css" as="style" /> <link rel="stylesheet" type="text/css" href="/build/global.395f53d0.css" /> <link rel="shortcut icon" href="/static/other/fav.ico" /> <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> <![endif]--> <script> window.article = 
{"id":42017684,"url":"https:\/\/justpaste.it\/6fn9m","shortUrl":"https:\/\/jpst.it\/2wiek","pdfUrl":"https:\/\/justpaste.it\/6fn9m\/pdf","qrCodeData":"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFcAAABXCAIAAAD+qk47AAAACXBIWXMAAA7EAAAOxAGVKw4bAAACCklEQVR4nO2by27DMAwEx0X\/\/5fTAwFdaNB8SEmB7BzjSDEWy4ikpOv1evH1\/Hz6Bf4FUgGkgiEVQCoYv\/6j67omM65FJzOPX6HWKD9PaebSj8oLIBWMm4hYlBIq79Jg+Pqyd3vpR4dvuJAXQCoYUUQsAi9lPOlt74dnloZzbygvgFQwUhExpJft9EKjh7wAUsF4R0QE+Bh5g\/898gJIBSMVEUNzDjOiDMN55AWQCkYUEcOWTqlrtL18KCEvgFQwbiJie7qSMXkpELa\/obwAUsFI7UcEpXHw397bmMh0cXtJVzBKXgCpYFyB3xYlT\/Ye3bzZ7q264EflBZAKRmqHLmPyYJR\/5IeXEqrt8SgvgFQwojoiY9feEpN5VCLo4maQF0AqGLVzTcM\/50UpEdpVj+sUxwNSAao7dJk6erHrhN65umYhL4BUMGoRUTJ56TsBw\/UoM0peAKlg1CrrRamgLnEu6VLW9IBUgLj7Ouz\/DJePHr16RF4AqWA096yDc92lCXs3hjzDyJIXQCoYB+\/Q9Q4vDS9cBPOojnhAKsDRO3R+nl3dp94uhrKmB6QCHL1Dlznp1GsWbUdeAKlgvOPGUK8juqt5mymx5QWQCsbBiCglS5+9KCEvgFQwDt6hO3djdHtfV14AqWAcvEO36B1M6mVNvQpFXgCpYNzs0H0h8gJIBUMqgFQwpALAH\/JvmLtnlWjnAAAAAElFTkSuQmCC"}; window.statsUrl = 'https\u003A\/\/stats.justpaste.it'; window.viewKey = 'x6ER'; window.barOptions = 
{"isLoggedIn":false,"hasPublicProfile":false,"displayOwnership":false,"isArticleOwner":false,"isPasswordProtected":false,"isCaptchaRequired":null,"isCaptchaEntered":false,"captchaSettings":null,"premiumUserData":null,"isPrivate":false,"isExpired":false,"expireAfterRead":false,"isShared":false,"defaultAvatar":"\/static\/img\/avatar60.jpg","createdText":"6h","showLastEdit":false,"modifiedText":"6h","isInTrash":false,"viewsText":"2","favouritesCount":0,"onlineText":"1","getFavouriteArticleUrl":"https:\/\/justpaste.it\/api\/account\/v1\/favourite-article\/42017684","addFavouriteArticleUrl":"https:\/\/justpaste.it\/api\/account\/v1\/favourite-article","removeFavouriteArticleUrl":"https:\/\/justpaste.it\/api\/account\/v1\/favourite-article-delete\/42017684","apiShowArticleDynamicUrl":"\/api\/v1\/article-dynamic","voteUrl":"\/api\/account\/v1\/vote","contentLang":"en","positiveVotes":0,"negativeVotes":0,"currentVote":"empty","linkSharingUrl":null,"linkSharingSecret":null}; </script> <script src="/build/runtime.a1e5a72a.js" async></script> <script src="/build/1676.2c557867.js" async></script> <script src="/build/8452.a9a1e0c5.js" async></script> <script src="/build/5936.ad26e56d.js" async></script> <script src="/build/9412.4a605741.js" async></script> <script src="/build/showarticlewidget.3bbca334.js" async></script> </head><body marginwidth="0" dir="ltr" marginheight="0"><!-- Static navbar --><div class="navbar navbar-default navbar-static-top mainTableTopMiddle" role="navigation"> <div class="container"> <div class="navbar-header pull-left"> <img src="/static/img/jp_logo_1_en_v4.png" width="186px" height="54px" alt="JustPaste.it" /> </div> <div class="navbar-header pull-left"> <div class="nav navbar-nav mainTableTopMiddleRight hidden-xs hidden-sm"> <img src="/static/img/jp_logo_2_en_v5.png" width="390px" height="54px" /> </div> </div> <div class="navbar-header pull-right" style="padding-top:8px"> <div id="mainPanelButtons"></div> </div> </div><!--/.nav-collapse 
--></div><div id="headContainer" class="container" style="max-width: 960px"> <div class="row"> <div class="col-md-12"> <div id="mainTableContent"> <div style="max-width: 960px; vertical-align: top"> <div id="showArticleWidget"><div class="showArticleWidgetPlaceholder"></div></div> <div id="articleContent"> <p>happy</p> <p>nice nice</p> <p>good good good</p> <p>joy Joy joy Joy joy</p> <p>Love love Love love Love</p> </div> <div id="showArticleBottomWidget"><div class="articleBottomWidgetPlaceholder"></div></div> <span style="visibility:hidden" class="glyphicon glyphicon-link"></span></div> </div> </div> </div> <!-- /row --></div> <!-- /container --><div id="footer" style="min-height: 30px;"> <div class="container" style="vertical-align: middle"> <div class="col-md-3 col-xs-5 col-sm-4 text-muted" style="font-size: 95%;" align="left"> © 2021 <span class="hidden-xs">justpaste.it</span> </div> <div class="col-md-9 col-xs-7 col-sm-8 text-muted" align="right"> <ul class="list-inline basePageFooterList"> <li class="hidden-xs"> Account </li> <li class="hidden-xs"> Terms </li> <li class="hidden-xs"> Privacy </li> <li class="hidden-xs"> Cookies </li> <li> Blog </li> <li> About </li> </ul> </div> </div></div> <script> window.mainPanelOptions = { addArticleUrl: '/', loginUrl: '/login', logoutUrl: '/logout', favouriteArticlesUrl: '/account/favourite', subscribedArticlesUrl: '/account/subscribed', sharedArticlesUrl: '/account/shared', manageAccountUrl: '/account/manage', messagesUrl: '/account/messages', articlesStatsUrl: '/account/articles-stats', premiumUrl: '/premium/subscription', unreadMessagesUrl: 'https://msg.justpaste.it/api/v1/conversation/unread', profileSettings: '/account/settings', isLoggedIn: false, userEmail: null, userPermalink: null, userProfileIsPublic: false, userProfileLink: null }; </script> <script src="/build/mainpanelwidget.80530742.js" async></script> </body></html>
The word happy was used 0 times.
The word nice was used 0 times.
The word good was used 1 times.
The word joy was used 3 times.
The word love was used 3 times.
How do I properly split the text or count the occurrences? Thank you!

You can simply use jsoup, a Java HTML parser library, to extract all the text from the HTML structure.
Download the jar file from: https://jsoup.org/download
The code below counts the occurrences of the words:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
static void countOccurance(String htmlStructure) {
    String[] positive = { "happy", "nice", "good", "joy", "love" };
    // Parse the HTML and take the text of <body>, without the tags.
    Document document = Jsoup.parse(htmlStructure);
    String[] text = document.body().text().split("\\s+");
    for (String word : positive) {
        int wordCount = countWord(text, word);
        System.out.println("The word " + word + " was used " + wordCount + " times.");
    }
}
// Counts case-insensitive exact matches; a token such as "joy," would not match "joy".
static int countWord(String[] documentText, String wordToFind) {
    int count = 0;
    for (int i = 0; i < documentText.length; i++) {
        if (wordToFind.equalsIgnoreCase(documentText[i]))
            count++;
    }
    return count;
}
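If adding the jsoup jar is not an option, a rough JDK-only approximation is sketched below. Assumptions: tags are stripped with regular expressions (not a real parser, so unusual markup can fool it), <script> and <style> blocks are dropped first so the JSON inside the page's scripts is not counted, and tokens are split on any non-letter so that "joy," or "<p>joy" still count as joy. Note also that the question's loop appends lines with str += s and no separator, which glues the last word of one line to the first word of the next. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class WordCounter {
    // Crude tag stripping: drop <script>/<style> blocks first, then every remaining tag.
    static String stripTags(String html) {
        String noScripts = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ");
        return noScripts.replaceAll("<[^>]+>", " ");
    }

    // Counts case-insensitive occurrences of each target word in the remaining text.
    static Map<String, Integer> countWords(String html, String... targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String t : targets) {
            counts.put(t.toLowerCase(Locale.ROOT), 0);
        }
        // Split on runs of non-letters, so punctuation never sticks to a word.
        for (String token : stripTags(html).split("[^\\p{L}]+")) {
            counts.computeIfPresent(token.toLowerCase(Locale.ROOT), (k, v) -> v + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<div id=\"articleContent\"> <p>happy</p> <p>nice nice</p> <p>good good good</p>"
                + " <p>joy Joy joy Joy joy</p> <p>Love love Love love Love</p> </div>";
        Map<String, Integer> counts = countWords(html, "happy", "nice", "good", "joy", "love");
        counts.forEach((word, n) -> System.out.println("The word " + word + " was used " + n + " times."));
    }
}
```

For the articleContent fragment from the question, this prints happy 1, nice 2, good 3, joy 5 and love 5.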

This will help you remove special characters: it keeps only letters, so for example <>Hello<> becomes Hello. Note that it only deletes non-letter characters, so a real tag such as <p>Hello</p> would become pHellop.
String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
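A small sketch of how that one-liner might be applied, assuming it is used on each whitespace-separated token rather than on the whole string, because the regex also deletes the spaces between words. The class name and sample string are illustrative.

```java
public class AlphaOnlyDemo {
    // Keeps only ASCII letters in a single token, using the regex from the answer.
    static String alphaOnly(String token) {
        return token.replaceAll("[^a-zA-Z]+", "");
    }

    public static void main(String[] args) {
        String str = "<>Hello<> <>happy<> nice";
        int happy = 0;
        // Clean each token separately; cleaning the whole string would merge all words.
        for (String token : str.split("\\s+")) {
            if (alphaOnly(token).equalsIgnoreCase("happy")) {
                happy++;
            }
        }
        System.out.println(alphaOnly("<>Hello<>")); // Hello
        System.out.println(happy); // 1
    }
}
```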

Java Android: save HTML from a website

In Java, this is what I use to download the HTML:
static public String savePage(final String URL) throws IOException {
    String line = "", all = "";
    java.net.URL myUrl = null;
    BufferedReader in = null;
    try {
        myUrl = new URL(URL);
        in = new BufferedReader(new InputStreamReader(myUrl.openStream()));
        while ((line = in.readLine()) != null) {
            all += line;
        }
    } finally {
        if (in != null) {
            in.close();
        }
    }
    return all;
}
The HTML I get with this code in plain Java is exactly what I need. However, when I use the same code in Android (Android Studio), the resulting HTML is incomplete and not what I need. All I want is the HTML exactly as it is served at the actual link.
This is what the HTML looks like when I download it in Android Java:
<!DOCTYPE html><html lang="en-GB"> <head id="head"> <style
name="www-roboto">@font-face{font-family:'Roboto';font-style:italic;font-weight:400;src:url(//fonts.gstatic.com/s/roboto/v15/W4wDsBUluyw0tK3tykhXEXYhjbSpvc47ee6xR_80Hnw.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:400;src:url(//fonts.gstatic.com/s/roboto/v15/QHD8zigcbDB8aPfIoaupKOvvDin1pK8aKteLpeZ5c0A.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:normal;font-weight:500;src:url(//fonts.gstatic.com/s/roboto/v15/RxZJdnzeo3R5zSexge8UUSZ2oysoEQEeKwjgmXLRnTc.ttf)format('truetype');}@font-face{font-family:'Roboto';font-style:italic;font-weight:500;src:url(//fonts.gstatic.com/s/roboto/v15/OLffGBTaF0XFOW1gnuHF0SwlidHJgAgmTjOEEzwu1L8.ttf)format('truetype');}</style><script
name="www-roboto">if (document.fonts && document.fonts.load) {document.fonts.load("400 10pt Roboto", "E");document.fonts.load("500 10pt Roboto", "E");}</script> <script>var ytcsi = {gt: function(n) {n = (n || '') + 'data_';return ytcsi[n] || (ytcsi[n] = {tick: {},span: {},info: {}});},tick: function(l, t, n) {ytcsi.gt(n).tick[l] = t || +new Date();},span: function(l, s, e, n) {ytcsi.gt(n).span[l] = (e ? e : +new Date()) - ytcsi.gt(n).tick[s];},setSpan: function(l, s, n) {ytcsi.gt(n).span[l]
= s;},info: function(k, v, n) {ytcsi.gt(n).info[k] = v;},setStart: function(s, t, n) {ytcsi.info('yt_sts', s, n);ytcsi.tick('_start', t, n);}};(function(w, d) {ytcsi.perf = w.performance || w.mozPerformance ||w.msPerformance || w.webkitPerformance;ytcsi.setStart('dhs', ytcsi.perf ? ytcsi.perf.timing.responseStart : null);var isPrerender = (d.visibilityState || d.webkitVisibilityState) == 'prerender';var vName = d.webkitVisibilityState ? 'webkitvisibilitychange' : 'visibilitychange';if (isPrerender) {ytcsi.info('prerender', 1);var startTick = function() {ytcsi.setStart('dhs');d.removeEventListener(vName, startTick);};d.addEventListener(vName, startTick, false);}if (d.addEventListener) {d.addEventListener(vName, function() {ytcsi.tick('vc');}, false);}})(window, document);</script> <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'initpb');}</script> <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_watch_ads');}</script> <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_home_ads');}</script> <script>if (window.ytcsi) {window.ytcsi.tick("_start", null, 'blz_search_ads');}</script> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, target-densityDpi=medium-dpi"> <link rel="icon" href="//s.ytimg.com/yts/favicon-vflz7uhzw.ico" type="image/x-icon"> <link rel="shortcut icon" href="//s.ytimg.com/yts/favicon-vflz7uhzw.ico" type="image/x-icon"> <title>YouTube</title> <link rel="stylesheet" href="//s.ytimg.com/yts/cssbin/mobile-nirvana-tablet-mangled-vflylHmeV.css" id="page_css"> </head> <body id="body" class="atom fusion-tn"> <script> var original_url = encodeURIComponent(encodeURIComponent(encodeURIComponent(document.location.href))); var iframe_url = 
'https://accounts.google.com/ServiceLogin?continue=http%3A%2F%2Fwww.youtube.com%2Fsignin%3Fnext%3Dhttp%253A%252F%252Fm.youtube.com%252Fsignin_passive%253Foriginal_url%253DORIGINAL_URL_PLACE_HOLDER%26hl%3Den-GB%26feature%3Dmobile_passive%26app%3Dm%26action_handle_signin%3Dtrue&hl=en-GB&passive=true&service=youtube&uilel=3'.replace('ORIGINAL_URL_PLACE_HOLDER', original_url); document.write('<iframe src=\"' + iframe_url + '\" style=\"width:0;height:0;margin:0;border-width:0;padding:0;position:absolute;\"></iframe>'); </script> <div id="player"></div> <div id="guide-layout-container"> <div id="guide-container"></div> <div id="content-container"> <div id="content"></div> </div> <div id="guide-overlay"></div> <div id="lightbox"></div> <div id="toast"></div> <div id="content-overlay"></div> </div> <div id="_yt_orientation_de
This HTML is nothing like the website I'm trying to download it from. I've tried many different methods of downloading HTML from websites, and all of them give me incomplete, seemingly random HTML like this.
I have tried encoding the URL and using libraries for downloading HTML, but still no luck.
An explanation of this, and maybe even code that does what I want, would be greatly appreciated. Android Java is new to me, so plenty of detail would help me understand.
Thank you

As discussed in the comments, the site decides which HTML to send based on the User-Agent of the request: if you want the site to think you are not connecting from a mobile device, you need to set the User-Agent header in the network request.
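A minimal sketch of that suggestion with the JDK's HttpURLConnection, which is also available on Android. The User-Agent string is only an example of a desktop browser value (an assumption, not something from the original thread), and there is no guarantee that a given site serves its desktop HTML for it. On Android the download must also run off the main thread (otherwise it fails with NetworkOnMainThreadException), and the app needs the INTERNET permission. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class DesktopPageFetcher {
    // Example desktop browser User-Agent (assumption); any desktop UA string should behave similarly.
    static final String DESKTOP_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Prepares (but does not yet open) a connection that identifies itself as a desktop browser.
    static HttpURLConnection openAsDesktop(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        conn.setRequestProperty("User-Agent", DESKTOP_UA);
        return conn;
    }

    // Downloads the page; assumes UTF-8 and keeps line breaks by using a StringBuilder.
    static String savePage(String pageUrl) throws IOException {
        HttpURLConnection conn = openAsDesktop(pageUrl);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder all = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                all.append(line).append('\n');
            }
            return all.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be a call such as savePage("https://www.youtube.com/") from a background thread; the header is sent only because it is set before the connection is opened.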

Unusual outcome while writing to file

I am having trouble writing to a file using PrintWriter. Here is my code:
String abc = request.getParameter("textAreaField"); //String is "a b c" (with spaces)
String fileA = dir + "/A";
PrintWriter fileWriterA = new PrintWriter(new FileOutputStream(fileA,true));
fileWriterA.println(abc);
fileWriterA.close();
The problem is that when writing to the file "A" in the directory "dir", only the "a" from String abc is written; everything after the first space is missing. The string abc comes from an HTML textarea, and the code above is in my servlet. I don't understand why it won't write the string with spaces to the file; I think it should. I have checked by printing abc, and it does print "a b c" (with spaces), but that is not what ends up in the file. Is there a problem with my code? Any help would be appreciated.
Thanks in advance.

I used your code in a servlet, and it works absolutely fine. Here is the code:
protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    System.out.println(request.getParameter("ta"));
    String abc = request.getParameter("ta");
    String fileA = "/A";
    PrintWriter fileWriterA = new PrintWriter(new FileOutputStream(fileA, true));
    fileWriterA.println(abc);
    fileWriterA.close();
}
And here is the JSP:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Insert title here</title>
</head>
<body>
<form action="Test" method="post">
<textarea rows="20" cols="20" name="ta"></textarea><!-- enter a value with some spaces -->
<input type="submit" value="Submit">
</form>
</body>
</html>
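To rule out PrintWriter itself, the same append can be checked outside the servlet. This sketch, with hypothetical names and a temporary file instead of "/A", writes "a b c" the way the servlet does and reads it back. If this prints the full line but the servlet's file does not, the difference lies in the value or the file being written, not in PrintWriter.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class AppendCheck {
    // Appends one line, like the servlet code (FileOutputStream opened in append mode).
    static void appendLine(Path file, String text) throws IOException {
        try (PrintWriter w = new PrintWriter(new FileOutputStream(file.toFile(), true))) {
            w.println(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("A-", ".txt");
        appendLine(file, "a b c");
        // Plain ASCII, so the default charset used by PrintWriter and readAllLines agree.
        List<String> lines = Files.readAllLines(file);
        System.out.println(lines); // [a b c]
        Files.delete(file);
    }
}
```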

docx4j conversion html->docx->html

I'm working on my first project using docx4j. My goal is to export XHTML from a web app (HTML created by CKEditor) into a docx, edit it in Word, and then import it back into the CKEditor WYSIWYG editor.
(Cross-posted from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbafbf50d0a914; I will update this when it is answered.)
I have created an html test document with the following contents:
<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>
My code then creates a docx from this html like so:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent()
        .addAll(xHTMLImporter.convert(new File("test.html"), null));
System.out.println(XmlUtils.marshaltoString(wordMLPackage
        .getMainDocumentPart().getJaxbElement(), true, true));
wordMLPackage.save(new java.io.File("test.docx"));
My code then attempts to convert the docx BACK to html like so:
// Note: wordMLPackage, ndp and xHTMLImporter are not used by the export below.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
AbstractHtmlExporter exporter = new HtmlExporterNG2();
OutputStream os = new java.io.FileOutputStream("test.html");
HTMLSettings htmlSettings = new HTMLSettings();
javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(os);
exporter.html(docx, result, htmlSettings);
The html returned is:
<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
/* TABLE STYLES */
/* PARAGRAPH STYLES */
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}
/* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>
<!-- userBodyTop goes here -->
<div class="document">
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>
<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">• <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>
<!-- userBodyTail goes here -->
</body>
</html>
Now there is a lot of extra space after each line. I'm not sure why this is happening; the conversion appears to add a lot of extra whitespace/carriage returns.

It's not clear from your question whether you are worried about whitespace in the (X)HTML source document or in the page as rendered (presumably in CKEditor). If the latter, the browser and CKEditor version may be relevant.
Whitespace may or may not be significant; try Googling 'xhtml significant whitespace' for more.
By way of background, depending on the docx4j property docx4j.Convert.Out.HTML.OutputMethodXML, docx4j will use either
<xsl:output method="html" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
or
<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="no"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>
Note the difference in the value of @method. If you want something different, you can modify docx2html.xsl or docx2xhtml.xsl, respectively.
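The effect of the two xsl:output settings can be seen with the JDK's built-in identity transformer (javax.xml.transform), independently of docx4j. This sketch is an illustration under that assumption, not docx4j's own code: method="xml" writes an XML declaration unless omit-xml-declaration is "yes" and serializes empty elements as <br/>, while method="html" writes <br> and no XML declaration. The class name and sample document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodDemo {
    // Serializes the same XHTML with the given output settings (identity transform, no stylesheet).
    static String serialize(String xhtml, String method, boolean omitDecl) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, method);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDecl ? "yes" : "no");
        t.setOutputProperty(OutputKeys.INDENT, "no");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><body><p>TEST LINE 1<br/></p></body></html>";
        System.out.println(serialize(doc, "xml", false)); // starts with <?xml ...?>, keeps <br/>
        System.out.println(serialize(doc, "html", false)); // no XML declaration, writes <br>
    }
}
```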

Is there a way to convert the wordMLPackage to HTML without all the extra stuff, such as
<?xml version="1.0" encoding="UTF-8"?>
and the CSS?
Could it be something as simple as the original HTML with inline CSS, like <html><body><div style="...."></div></body></html>?

How do I make an image viewer like Facebook's?

I want to make an image viewer for my website like the (old) one on Facebook. When the user clicks the next or back arrow, it should change both the picture and the URL of the page.
This is an example of what I want (http://www.facebook.com/pages/Forest-Ville/307556775942281)
Most importantly, I want the page to reload on each click with new content (URL, comment box, ads, etc.). I do not want to use any cookies.
This is what I am using now, but it's completely different from what I want:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script>
var NumberOfImages = 10;
var img = new Array(NumberOfImages);
img[0] = "http://damnthisfunny.site40.net/1.jpg";
img[1] = "http://damnthisfunny.site40.net/2.jpg";
img[2] = "http://damnthisfunny.site40.net/3.jpg";
img[3] = "http://damnthisfunny.site40.net/4.jpg";
img[4] = "http://damnthisfunny.site40.net/5.jpg";
img[5] = "http://damnthisfunny.site40.net/6.jpg";
img[6] = "http://damnthisfunny.site40.net/7.jpg";
img[7] = "http://damnthisfunny.site40.net/8.jpg";
img[8] = "http://damnthisfunny.site40.net/9.jpg";
img[9] = "http://damnthisfunny.site40.net/10.jpg";
var imgNumber = 0;
function NextImage() {
  imgNumber++;
  if (imgNumber == NumberOfImages)
    imgNumber = 0;
  document.images["VCRImage"].src = img[imgNumber];
}
function PreviousImage() {
  imgNumber--;
  if (imgNumber < 0)
    imgNumber = NumberOfImages - 1;
  document.images["VCRImage"].src = img[imgNumber];
}
</script>
</head>
<body>
<center>
<img name="VCRImage" src="http://damnthisfunny.site40.net/1.jpg" />
<br />
<a href="javascript:PreviousImage()">
<img border="0" src="left1.jpg" /></a>
<a href="javascript:NextImage()">
<img border="0" src="right1.jpg" /></a>
</center>
</body>
</html>
Any ideas ?
