Modifying HTML using Java

I am trying to read an HTML file and add a link to some of the text.
For example, I want to add a link to the "Campaign0" text:
<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">101</span></p></td>
<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">Campaign0</span></p></td>
<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">unknown</span></p></td>
Link to be added:
<a href="Second.html">
I need a Java program that modifies the HTML to add a hyperlink around "Campaign0".
How do I do this with jsoup?
I tried this with jsoup:
File input = new File("D://First.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
Element span = doc.select("span").first(); // this only matches the first span tag :(
span.wrap("<a href=\"Second.html\"></a>");
Is this correct? It's not working :(
In short, is there anything like:
if find <span>Campaign0</span>
then replace by <a href="Second.html"><span>Campaign0</span></a>
using jsoup or any other technology from Java code?

Your code is pretty much correct. To find the span elements containing "Campaign0", "Campaign1", etc., you can use the jsoup selector "span:containsOwn(Campaign0)". See the selector documentation at jsoup.org.
After finding the elements and wrapping them in the link, calling doc.html() returns the modified HTML. Here's a working sample:
input.html:
<table>
<tr>
<td><p><span>101</span></p></td>
<td><p><span>Campaign0</span></p></td>
<td><p><span>unknown</span></p></td>
</tr>
<tr>
<td><p><span>101</span></p></td>
<td><p><span>Campaign1</span></p></td>
<td><p><span>unknown</span></p></td>
</tr>
</table>
Code:
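File input = new File("input.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
// Wrap the span whose own text contains "Campaign0" in the link from the question
Element span = doc.select("span:containsOwn(Campaign0)").first();
span.wrap("<a href=\"Second.html\"></a>");
// Same for "Campaign1"; the question gives no separate target, so Second.html is reused here
span = doc.select("span:containsOwn(Campaign1)").first();
span.wrap("<a href=\"Second.html\"></a>");
String html = doc.html();
// Write the modified document back out as UTF-8
BufferedWriter htmlWriter =
new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.html"), "UTF-8"));
htmlWriter.write(html);
htmlWriter.close();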
output:
<html>
<head></head>
<body>
<table>
<tbody>
<tr>
<td><p><span>101</span></p></td>
<td><p><a href="Second.html"><span>Campaign0</span></a></p></td>
<td><p><span>unknown</span></p></td>
</tr>
<tr>
<td><p><span>101</span></p></td>
<td><p><a href="Second.html"><span>Campaign1</span></a></p></td>
<td><p><span>unknown</span></p></td>
</tr>
</tbody>
</table>
</body>
</html>
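If there are many such spans, the same idea can be put in a loop. The following is only a sketch, assuming a lookup table from span text to link target. The class name CampaignLinker, the method linkSpans, and the Third.html target are made up for illustration, and pretty-printing is switched off so the markup is not re-indented:

```java
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class CampaignLinker {

    // Wraps every <span> whose own text equals a key of `targets`
    // in an <a> pointing at the mapped href, and returns the body HTML.
    static String linkSpans(String html, Map<String, String> targets) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // keep the markup compact
        for (Element span : doc.select("span")) {
            String href = targets.get(span.ownText().trim());
            if (href != null) {
                span.wrap("<a href=\"" + href + "\"></a>");
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p><span>101</span></p>"
                + "<p><span>Campaign0</span></p>"
                + "<p><span>Campaign1</span></p>";
        // Third.html is a hypothetical target; only Second.html appears in the question
        System.out.println(linkSpans(html, Map.of(
                "Campaign0", "Second.html",
                "Campaign1", "Third.html")));
    }
}
```

Matching on the exact own text avoids the substring behaviour of containsOwn, where "Campaign1" would also match a span containing "Campaign10".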

Related

How to get html preview result as a string from html element with jsoup?

I want to get the HTML preview result from a jsoup element. Let's say I have a jsoup element with the following HTML code:
Element's HTML Code:
<div class="code-container">
<div id="highlighter_245626" class="syntaxhighlighter nogutter night">
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td class="code">
<div class="container">
<div class="line number1 index0 alt2"><code class="comments">// C++ program for implementation of FCFS </code></div>
<div class="line number2 index1 alt1"><code class="comments">// scheduling </code></div>
<div class="line number3 index2 alt2"><code class="preprocessor">#include<bits/stdc++.h> </code></div>
<div class="line number4 index3 alt1"><code class="keyword bold">using</code> <code class="keyword bold">namespace</code> <code class="plain">std; </code></div>
</div>
</td>
</tr>
</tbody>
</table>
</div></div>
HTML preview result string:
// C++ program for implementation of FCFS
#include<bits/stdc++.h>
using namespace std;
I have tried to get the HTML preview string with Element.text(), and I have the following problems:
Broken line endings
Irregular spacing
Is there a better way to get the HTML preview result as a string from an HTML element with jsoup?
This will preserve line breaks for you:
public static String cleanPreserveLineBreaks(String bodyHtml) {
// OutputSettings here is org.jsoup.nodes.Document.OutputSettings;
// recent jsoup releases rename Whitelist to Safelist
// get pretty printed html with preserved br and p tags
String prettyPrintedBodyFragment = Jsoup.clean(bodyHtml, "", Whitelist.none().addTags("br", "p"), new OutputSettings().prettyPrint(true));
// get plain text with preserved line breaks by disabling prettyPrint
return Jsoup.clean(prettyPrintedBodyFragment, "", Whitelist.none(), new OutputSettings().prettyPrint(false));
}

How to get unformatted html from Jsoup

String testCases[] = {
"<table><tbody><tr><td><div><inline>Normal Line Text</inline><br/></div></td></tr></tbody></table>",
};
for (String testString : testCases) {
Document doc = Jsoup.parse(testString,"", Parser.xmlParser());
Elements elements = doc.select("table");
for (Element ele : elements) {
System.out.println("===============================================");
System.out.println(ele.html()); //Formatted
System.out.println("-----------------------------------------------");
System.out.println(ele.html().trim().replace("\n","").replace("\r","")); //Notice the Difference
}
}
Output:
===============================================
<tbody>
<tr>
<td>
<div>
<inline>
Normal Line Text
</inline>
<br />
</div></td>
</tr>
</tbody>
-----------------------------------------------
<tbody> <tr> <td> <div> <inline> Normal Line Text </inline> <br /> </div></td> </tr></tbody>
Due to the formatting done by jsoup, the values of the text nodes change to include newlines.
Changing <inline> to <span> in the test case seems to work fine, but unfortunately we have legacy data/HTML containing <inline> tags generated by redactor.
Try this:
Document doc = Jsoup.parse(testString,"", Parser.xmlParser());
doc.outputSettings().prettyPrint(false);
Hope it helps.
Taken from https://stackoverflow.com/a/19602313/3324704
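Applied to the test string from the question, the suggestion could look roughly like this. It is a sketch that assumes jsoup is on the classpath; the class name UnformattedHtml and the method tableHtml are invented for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

final class UnformattedHtml {

    // Parses the markup with the XML parser and returns the inner HTML of the
    // first <table> with pretty-printing disabled, so jsoup adds no newlines
    // or indentation of its own.
    static String tableHtml(String markup) {
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        doc.outputSettings().prettyPrint(false);
        return doc.select("table").first().html();
    }

    public static void main(String[] args) {
        String testString = "<table><tbody><tr><td><div><inline>Normal Line Text</inline>"
                + "<br/></div></td></tr></tbody></table>";
        System.out.println(tableHtml(testString));
    }
}
```

With this setting there is no need for the replace("\n", "") workaround, and whitespace that was already part of a text node is left alone.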

Setting image from database

I am trying to get an image stored in a MySQL database and display it on a JSP page, but I am getting the following error:
Servlet.service() for servlet [SearchStudentControlling] in context
with path [/Roomantech] threw exception
[java.lang.IllegalStateException: getOutputStream()
As a result, the image does not appear in my JSP page.
I am using the following code to get and set the image. The error is reported on line 339, which doesn't exist (my code has only 312 lines). I have been struggling for the last two hours but couldn't pinpoint the problem.
session = request.getSession(false);
if(session!=null)
{
%>
<div style="width:100%;height:70%;">
<%@include file="header.jsp"%>
<div align="center" style="margin-top:150px">
<table align="center" width="73%;" height="100%;" class="abc">
<form action="nextpersonalinformation.do" method="post">
<strong><h2 style="color: darkblue;margin-bottom:30px;"><b><u>Person Information****</b></u></h2></strong>
<tr><td></td><td></td></tr>
<tr style="background-color: #003366;color: white;border-radius:5px"><td><h4 style="color: darkblue"><b style="color: white"> Your Hostel details****:</b></h4></td><td><b style="color: white">Enter Values:</b></td></tr>
<tr><td></td><td></td></tr>
<tr style="background-color: teal;font-size: 20px"></tr>
<%
ArrayList<PersonDetaildto>pa = (ArrayList<PersonDetaildto>)session.getAttribute("InformationPerson");
for(PersonDetaildto kk : pa)
{
System.out.println(kk.getFirst_nm()+" "+kk.getMiddle_nm());
%>
<tr>
<td>Person Image</td>
<td><img height="75px" width="75px" align="left" src="
<% byte[] imgbytes = kk.getPerson_image();
InputStream images = kk.getPersonimagee();
int size1=0;
response.reset();
response.setContentType("image/jpeg");
response.addHeader("Content-Disposition","filename=logo.jpg");
while((size1=images.read(imgbytes))!= -1 )
{
response.getOutputStream().write(imgbytes,0,size1);
}
response.flushBuffer();
images.close();
%>"/></td>
</tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr>
</table>
</div>
<div style="width:100%;height:200px;">
</div>
<%
}
}
else
{
response.sendRedirect("welcome.do");
}
%>
<%@include file="footer.jsp"%>
</body>
</html>
EDIT:
You cannot add images by inserting the bytes into HTML (as Funtik already pointed out).
But you can insert it by converting your image-bytes into Base64:
<img alt="YourEmbeddedImage" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAgkAAAJNCAIAAAA0yXHVAAAACXBIWXMA..." />
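For completeness, here is a rough sketch of producing such a data: URI in Java with java.util.Base64. The class name ImageDataUri and the helper toDataUri are invented for illustration, and the three sample bytes are placeholders, not a real image:

```java
import java.util.Base64;

final class ImageDataUri {

    // Builds a data: URI that can be placed in the src attribute of an <img>.
    // mimeType must match the actual image format, e.g. "image/png" or "image/jpeg".
    static String toDataUri(byte[] imageBytes, String mimeType) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    public static void main(String[] args) {
        byte[] sample = {1, 2, 3}; // placeholder bytes standing in for image data
        System.out.println("<img alt=\"YourEmbeddedImage\" src=\""
                + toDataUri(sample, "image/png") + "\" />");
    }
}
```

In the JSP from the question, the resulting string would be written into the src attribute instead of streaming bytes through response.getOutputStream() in the middle of the page.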
As @Ben pointed out, you can output the bytes of an image inside your src attribute by encoding them in Base64.
Still, I advise you to use another solution for your problem.
A better solution would be to store the images on the hard drive and keep only the filenames in the database. In that case your solution would be something like:
<td>Person Image</td>
<td><img height="75px" width="75px" align="left" src="<%= kk.getPersonimagee() %>"></td>

how to extract data inside a specific td in html table using java

I have:
<table class="cast_list">
<tr><td colspan="4" class="castlist_label"></td></tr>
<tr class="odd">
<td class="primary_photo">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_i1" ><img height="44" width="32" alt="Tim Robbins" title="Tim Robbins"src="http://ia.media-imdb.com/images/G/01/imdb/images/nopicture/32x44/name-2138558783._V379389446_.png"class="loadlate hidden " loadlate="http://ia.media-imdb.com/images/M/MV5BMTI1OTYxNzAxOF5BMl5BanBnXkFtZTYwNTE5ODI4._V1_SY44_CR1,0,32,44_AL_.jpg" /></a> </td>
<td class="itemprop" itemprop="actor" itemscope itemtype="http://schema.org/Person">
<a href="/name/nm0000209/?ref_=ttfc_fc_cl_t1" itemprop='url'> <span class="itemprop" itemprop="name">Tim Robbins</span>
</a> </td>
<td class="ellipsis">
...
</td>
How can I get only the information inside the second td (td class="itemprop")? I want to get "/name/nm0000209/?ref_=ttfc_fc_cl_t1" and "Tim Robbins".
This is my code:
Elements elms = doc.getElementsByClass("cast_list").first().getElementsByTag("table");
Elements tds = elms.select("td");
for(Element td : tds){
if(td.attr("class").contains("itemprop")){
Elements links = tds.select("a[href]");
for(Element link : links){
if(link.attr("href").contains("name/nm"))
{
String castname = link.text();
String castImdbId = link.attr("href");
System.out.println("CastName:" + castname + "\n");
System.out.println("CastImdbID:" + castImdbId + "\n");
}
}
}
}
but it also returns the link inside td class="primary_photo", whose text is empty. This is part of my output:
CastName:
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_i1
CastName:Tim Robbins
CastImdbID:/name/nm0000209/?ref_=ttfc_fc_cl_t1
CastName:
......
Could someone please tell me where my problem is? I think the condition if(td.attr("class").contains("itemprop")) does not work at all.
Thanks,
Use a different CSS selector instead of td. Since the right <td> is identified by its class, why not use it:
td.itemprop
Your java code then would start like this instead
Elements tds = elms.select("td.itemprop");
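Put together, a minimal sketch could look like the following. It assumes jsoup is on the classpath; the class CastExtractor, the method extractCast, and the combined selector are illustrative choices and not from the original post:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class CastExtractor {

    // Returns "name -> href" for every name link inside a td.itemprop cell
    // of the cast_list table; links in other cells (e.g. primary_photo) are skipped.
    static List<String> extractCast(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("table.cast_list td.itemprop a[href*=name/nm]")) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the markup from the question
        String html = "<table class=\"cast_list\"><tr class=\"odd\">"
                + "<td class=\"primary_photo\"><a href=\"/name/nm0000209/?ref_=ttfc_fc_cl_i1\">"
                + "<img alt=\"Tim Robbins\"></a></td>"
                + "<td class=\"itemprop\"><a href=\"/name/nm0000209/?ref_=ttfc_fc_cl_t1\">"
                + " <span class=\"itemprop\">Tim Robbins</span> </a></td>"
                + "</tr></table>";
        extractCast(html).forEach(System.out::println);
    }
}
```

Because the selector already restricts the search to links inside td.itemprop, the class check and the nested loops from the question are no longer needed.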

Selenium WebDriver Java - Clicking on element by label not working on certain labels

I am trying to click on an element by its label. Here is the code I am using:
driver.findElement(By.xpath("id(//label[text() = 'LABEL TEXT HERE']/@for)")).click();
It works for (Select All) and Hayward but can't find Los Angeles, San Francisco, or San Jose.
UPDATE:
For now I guess this may be my best option until I see something better. It lets the caller pass the full string; the method takes the last word of the string and inserts it into the contains() XPath.
public void subStringLocationTest(String location) {
String par = location.substring(location.lastIndexOf(" ") + 1);
driver.findElement(By.xpath("//label[contains(text(), '" + par + "')]")).click();
}
Here is the HTML code:
<div id="ReportViewer1_ctl04_ctl03_divDropDown" onclick="event.cancelBubble=true;" onactivate="event.cancelBubble=true;" style="border: 1px solid rgb(169, 169, 169); font-family: Verdana; font-size: 8pt; overflow: auto; background-color: window; display: inline; position: absolute; z-index: 11; left: 131px; top: 41px; width: 188px;">
<span><table cellpadding="0" cellspacing="0" style="background-color:window;">
<tbody><tr>
<td nowrap="nowrap"><span style="font-family:Verdana;font-size:8pt;"><input id="ReportViewer1_ctl04_ctl03_divDropDown_ctl00" type="checkbox" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl00" onclick="$get('ReportViewer1_ctl04_ctl03').control.OnSelectAllClick(this);"><label for="ReportViewer1_ctl04_ctl03_divDropDown_ctl00">(Select All)</label></span></td>
</tr><tr>
<td nowrap="nowrap"><span style="font-family:Verdana;font-size:8pt;"><input id="ReportViewer1_ctl04_ctl03_divDropDown_ctl02" type="checkbox" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl02" onclick="$get('ReportViewer1_ctl04_ctl03').control.OnValidValueClick(this, 'ReportViewer1_ctl04_ctl03_divDropDown_ctl00');"><label for="ReportViewer1_ctl04_ctl03_divDropDown_ctl02">Hayward</label></span></td>
</tr><tr>
<td nowrap="nowrap"><span style="font-family:Verdana;font-size:8pt;"><input id="ReportViewer1_ctl04_ctl03_divDropDown_ctl03" type="checkbox" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl03" onclick="$get('ReportViewer1_ctl04_ctl03').control.OnValidValueClick(this, 'ReportViewer1_ctl04_ctl03_divDropDown_ctl00');"><label for="ReportViewer1_ctl04_ctl03_divDropDown_ctl03">Los Angeles</label></span></td>
</tr><tr>
<td nowrap="nowrap"><span style="font-family:Verdana;font-size:8pt;"><input id="ReportViewer1_ctl04_ctl03_divDropDown_ctl04" type="checkbox" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl04" onclick="$get('ReportViewer1_ctl04_ctl03').control.OnValidValueClick(this, 'ReportViewer1_ctl04_ctl03_divDropDown_ctl00');"><label for="ReportViewer1_ctl04_ctl03_divDropDown_ctl04">San Francisco</label></span></td>
</tr><tr>
<td nowrap="nowrap"><span style="font-family:Verdana;font-size:8pt;"><input id="ReportViewer1_ctl04_ctl03_divDropDown_ctl05" type="checkbox" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl05" onclick="$get('ReportViewer1_ctl04_ctl03').control.OnValidValueClick(this, 'ReportViewer1_ctl04_ctl03_divDropDown_ctl00');"><label for="ReportViewer1_ctl04_ctl03_divDropDown_ctl05">San Jose</label></span></td>
</tr>
</tbody></table><input type="hidden" name="ReportViewer1$ctl04$ctl03$divDropDown$ctl01$HiddenIndices" id="ReportViewer1_ctl04_ctl03_divDropDown_ctl01_HiddenIndices" value=""></span>
</div>
You can use an XPath locator. For example:
d.findElement(By.xpath("//...../label[contains(@for,'ReportViewer1_ctl04_ctl03_divDropDown_ctl02')]")).click();
Use following xpaths :
//label[text()='(Select All)']
//label[text()='Hayward']
//label[text()='Los Angeles']
//label[text()='San Francisco']
//label[text()='San Jose']
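The thread does not explain why the exact text() match finds Hayward but not the two-word names. One possible cause, which is only an assumption and is not confirmed anywhere here, is that the two-word labels contain a non-breaking space (U+00A0) instead of a regular space. The sketch below uses the JDK's built-in XPath engine rather than Selenium to show how exact and contains() matches behave in that case; the class name LabelXPathDemo, the method forAttribute, and the shortened ids c2 and c3 are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LabelXPathDemo {

    // Simplified markup; the second label uses U+00A0 between the two words (assumed).
    static final String MARKUP = "<div>"
            + "<span><input id='c2'/><label for='c2'>Hayward</label></span>"
            + "<span><input id='c3'/><label for='c3'>Los\u00A0Angeles</label></span>"
            + "</div>";

    // Evaluates the XPath against MARKUP and returns its string value
    // (an empty string when nothing matches).
    static String forAttribute(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Exact match with a regular space
        System.out.println("[" + forAttribute("//label[text()='Los Angeles']/@for") + "]");
        // Partial match on the last word
        System.out.println("[" + forAttribute("//label[contains(text(), 'Angeles')]/@for") + "]");
        // Exact match after mapping U+00A0 to a regular space
        System.out.println("[" + forAttribute(
                "//label[translate(text(), '\u00A0', ' ')='Los Angeles']/@for") + "]");
    }
}
```

If the real page does contain such characters, the translate() form could be used in By.xpath in the same way; if it does not, the cause lies elsewhere and this sketch does not apply.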
You can also do a contains like this:
"//label[contains(text(), 'TEXT_TO_FIND')]"
Try to use id instead of xpath:
//for Select All
driver.findElement(By.id("ReportViewer1_ctl04_ctl03_divDropDown_ctl00")).click();
//for Hayward
driver.findElement(By.id("ReportViewer1_ctl04_ctl03_divDropDown_ctl02")).click();
//for Los Angeles
driver.findElement(By.id("ReportViewer1_ctl04_ctl03_divDropDown_ctl03")).click();
//for San Francisco
driver.findElement(By.id("ReportViewer1_ctl04_ctl03_divDropDown_ctl04")).click();
//for San Jose
driver.findElement(By.id("ReportViewer1_ctl04_ctl03_divDropDown_ctl05")).click();
If you still have a problem, or the id changes dynamically, you can use XPath like:
//label[contains(.,'(Select All)')]
//label[contains(.,'Hayward')]
//label[contains(.,'Los Angeles')]
//label[contains(.,'San Francisco')]
//label[contains(.,'San Jose')]
driver.findElement(By.xpath("//label[text()='LABEL TEXT']/../input")).click();
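Try CSS (note that :contains is not a standard CSS pseudo-class, so this only works where the driver's selector engine supports it):
driver.findElement(By.cssSelector("label:contains('Partial or full text')")).click();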
Try this, it may be a good starting point:
System.out.print(driver.findElement(By.xpath("//div[@id='ReportViewer1_ctl04_ctl03_divDropDown']/span/table/tbody/tr[" + i +"]/td/span/label")).getText());
driver.findElement(By.xpath("//div[@id='ReportViewer1_ctl04_ctl03_divDropDown']/span/table/tbody/tr[" + i +"]/td/span/label")).click();
i = 3; //Los Angeles
i = 4; //San Francisco
I hope, my xpath is correct.
Use the code below to click on or select a label:
driver.findElement(By.xpath("//label[@for='your id which you want to click']")).click();
