How to parse a nested Div into a table structure using Jsoup - java

I have a div structure like this:
<div class="DivClass-1"> Div One
<div class="DivClass-A"> Div A </div>
</div>
<div class="DivClass-2"> Div Two
<div class="DivClass-A"> Div B </div>
</div>
<div class="DivClass-3"> Div Three
<div class="DivClass-A"> Div C </div>
</div>
<div class="DivClass-4"> Div Four
<div class="DivClass-A"> Div D </div>
</div>
I want to parse it and convert this div structure into a table structure.
Can anybody give me an idea of how to achieve this?

Use replaceAll() to replace all the div tags.

It is not clear to me which <div> tags you want to convert to <tr> and which to <td>.
I assume DivClass-1, DivClass-2, DivClass-3 and DivClass-4 should become <tr> tags and the others should become <td> tags.
I hope the following code gives you an idea.
StringBuffer myHTML = new StringBuffer();
myHTML.append("<div class=\"DivClass-1\"> Div One <div class=\"DivClass-A\"> Div A </div> </div>" +
"<div class=\"DivClass-2\"> Div Two<div class=\"DivClass-A\"> Div B </div></div>" +
"<div class=\"DivClass-3\"> Div Three<div class=\"DivClass-A\"> Div C </div></div>" +
"<div class=\"DivClass-4\"> Div Four <div class=\"DivClass-A\"> Div D </div></div>");
Document myDoc = Jsoup.parse(myHTML.toString());
//get DivClass-1, DivClass-2, etc.
Elements DivClass = myDoc.select("div").not("div.DivClass-A");
Elements DivClass_A = myDoc.select("div.DivClass-A");
//rename the tag <div class="DivClass-1"> to <tr class="DivClass-1">
DivClass.tagName("tr");
//rename the tag <div class="DivClass-A"> to <td class="DivClass-A">
DivClass_A.tagName("td");
System.out.println(myDoc.toString());
Here's the relevant part of the printout:
<tr class="DivClass-1">
Div One
<td class="DivClass-A"> Div A </td>
</tr>
<tr class="DivClass-2">
Div Two
<td class="DivClass-A"> Div B </td>
</tr>
<tr class="DivClass-3">
Div Three
<td class="DivClass-A"> Div C </td>
</tr>
<tr class="DivClass-4">
Div Four
<td class="DivClass-A"> Div D </td>
</tr>
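Renaming the tags leaves the <tr> elements without a surrounding <table>, and the "Div One" text ends up outside any <td>. If a complete table is wanted, one option is to build it element by element. This is only a minimal sketch that keeps the same assumed mapping (outer div to row, inner div to cell). The names DivsToTable and toTable are made up for illustration, and it needs jsoup on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class DivsToTable {
    // Assumed mapping: each top-level div becomes one <tr>;
    // its own text and each child div become one <td> each.
    static Element toTable(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        Element table = doc.createElement("table");
        for (Element outer : doc.body().children()) {
            Element row = table.appendElement("tr").attr("class", outer.className());
            // ownText() is the text of the outer div without its children
            row.appendElement("td").text(outer.ownText().trim());
            for (Element inner : outer.children()) {
                row.appendElement("td").attr("class", inner.className()).text(inner.text());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        String html =
            "<div class=\"DivClass-1\"> Div One <div class=\"DivClass-A\"> Div A </div></div>"
          + "<div class=\"DivClass-2\"> Div Two <div class=\"DivClass-A\"> Div B </div></div>";
        System.out.println(toTable(html).outerHtml());
    }
}
```

Each row then carries two cells, one for the outer text and one for the nested div.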

Related

JSP paging problem: the search keyword doesn't apply when I go to the next page

When I search for a specific word, only the first page is filtered. The first page shows the page numbers and posts correctly.
But when I go to page 2 or any later page, the search keyword is no longer applied.
Is this a problem with the URL?
I guess the problem is in the SQL or in Paging.java, because when I log the page number in BDAO it shows the page I clicked.
Also, I don't know how to pass keyWord and keyField along.
I use an Oracle DB.
<%
String keyWord = (String)request.getParameter("keyWord");
String keyField = (String)request.getParameter("keyField");
%>
<script>
function searchCheck(frm){
// search
if(frm.keyWord.value ==""){
alert("Please enter a search term.");
frm.keyWord.focus();
return;
}
frm.submit();
}
function PageMove(page){
var keyWord = '<%=keyWord%>'
var keyField = '<%=keyField%>'
console.log(keyWord);
if(keyWord !=''){
location.href = "list.do?page="+page+"&keyWord=" + keyWord + "&keyField=" + keyField;
}
location.href = "list.do?page="+page;
}
</script>
</head>
<body>
<table width="800" cellpadding="0" cellspacing="0" border="1">
<tr>
<td>No.</td>
<td>Name</td>
<td>Title</td>
<td>Date</td>
<td>Hits</td>
</tr>
<c:forEach items="${list}" var="dto">
<tr>
<td>${dto.bId}</td>
<td>${dto.bName}</td>
<td>
<c:forEach begin="1" end="${dto.bIndent}">-</c:forEach>
${dto.bTitle}</td>
<td>${dto.bDate}</td>
<td>${dto.bHit}</td>
</tr>
</c:forEach>
<tr>
<td colspan="5">
<form action="list.do" method="post" name="search">
<select name="keyField">
<option value="bTitle">Post title</option>
<option value="bContent">Post content</option>
<option value="bName">Author</option>
</select>
<input type="text" name="keyWord">
<input type="button" value="Search" onclick="searchCheck(form)">
</form>
</td>
</tr>
<tr>
<td colspan="5"> Write post </td>
</tr>
</table>
<div class="toolbar-bottom">
<div class="toolbar mt-lg">
<div class="sorter">
<ul class="pagination">
<li>First</li>
<li>Previous</li>
<c:forEach var="i" begin="${paging.startPageNo}" end="${paging.endPageNo}" step="1">
<c:choose>
<c:when test="${i eq paging.pageNo}">
<li class="active">${i}</li>
</c:when>
<c:otherwise>
<li>${i}</li>
</c:otherwise>
</c:choose>
</c:forEach>
<li>Next</li>
<li>Last</li>
</ul>
</div>
</div>
</div>
You never seem to pass the keyword or key field when you call PageMove(). You might as well look up their values inside the function instead of having them as parameters. The inputs in your form have a name but no id, so read them through the form rather than with document.getElementById():
function PageMove(page){
    // the search form is declared with name="search"
    var form = document.forms["search"];
    var keyWord = encodeURIComponent(form.keyWord.value);
    var keyField = encodeURIComponent(form.keyField.value);
    location.href = "list.do?page=" + page + "&keyWord=" + keyWord + "&keyField=" + keyField;
}
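If the page links are rendered on the server instead, the same idea can be sketched in Java: append keyField and keyWord to every page link, URL-encoded, and only when a keyword is present. The class PagingLinks and the method pageUrl are hypothetical names, not part of the question's code, and URLEncoder.encode(String, Charset) assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class PagingLinks {
    // Builds "list.do?page=N" and appends the search parameters
    // only when a non-empty keyword was submitted.
    static String pageUrl(int page, String keyField, String keyWord) {
        StringBuilder url = new StringBuilder("list.do?page=").append(page);
        if (keyWord != null && !keyWord.isEmpty()) {
            url.append("&keyField=").append(encode(keyField == null ? "" : keyField))
               .append("&keyWord=").append(encode(keyWord));
        }
        return url.toString();
    }

    // Form-style encoding: spaces become '+', other unsafe characters become %XX.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(pageUrl(2, "bTitle", "jsoup table")); // list.do?page=2&keyField=bTitle&keyWord=jsoup+table
        System.out.println(pageUrl(3, null, null));              // list.do?page=3
    }
}
```

On the receiving side, request.getParameter("keyWord") returns null when the parameter is absent, so the list.do handler needs the same null check before it adds the search condition to the query.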

Parsing a table with jsoup

I'm trying to extract the e-mail address and the phone number from a LinkedIn profile using jsoup. Each piece of information is in a table. I have written code to extract them, but it doesn't work. The code should work on any LinkedIn profile. Any help or guidance would be much appreciated.
public static void main(String[] args) {
try {
String url = "https://fr.linkedin.com/";
// fetch the document over HTTP
Document doc = Jsoup.connect(url).get();
// get the page title
String title = doc.title();
System.out.println("Nom & Prénom: " + title);
// first method
Elements table = doc.select("div[class=more-info defer-load]").select("table");
Iterator < Element > iterator = table.select("ul li a").iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next().text());
}
// second method
for (Element tablee: doc.select("div[class=more-info defer-load]").select("table")) {
for (Element row: tablee.select("tr")) {
Elements tds = row.select("td");
if (tds.size() > 0) {
System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
}
}
}
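} catch (IOException e) {
// Jsoup.connect(url).get() throws IOException on network or HTTP errors
e.printStackTrace();
}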
}
Here is an example of the HTML I'm trying to extract the data from (taken from a LinkedIn profile):
<table summary="Coordonnées en ligne">
<tr>
<th>E-mail</th>
<td>
<div id="email">
<div id="email-view">
<ul>
<li>
adam1adam@gmail.com
</li>
</ul>
</div>
</div>
</td>
</tr>
<tr class="no-contact-info-data">
<th>Messagerie instantanée</th>
<td>
<div id="im" class="editable-item">
</div>
</td>
</tr>
<tr class="address-book">
<th>Carnet d’adresses</th>
<td>
<span class="address-book">
<a title="Une nouvelle fenêtre s’ouvrira" class="address-book-edit" href="/editContact?editContact=&contactMemberID=368674763">Ajouter</a> des coordonnées.
</span>
</td>
</tr>
</table>
<table summary="Coordonnées">
<tr>
<th>Téléphone</th>
<td>
<div id="phone" class="editable-item">
<div id="phone-view">
<ul>
<li>0021653191431 (Mobile)</li>
</ul>
</div>
</div>
</td>
</tr>
<tr class="no-contact-info-data">
<th>Adresse</th>
<td>
<div id="address" class="editable-item">
<div id="address-view">
<ul>
</ul>
</div>
</div>
</td>
</tr>
</table>
To scrape the email and phone number, use CSS selectors that target the element ids.
String email = doc.select("div#email-view > ul > li > a").attr("href");
System.out.println(email);
String phone = doc.select("div#phone-view > ul > li").text();
System.out.println(phone);
See CSS Selectors for more information.
Output
mailto:adam1adam@gmail.com
0021653191431 (Mobile)

How to check the checkboxes using the Selenium Java WebDriver?

I was trying to figure out a way to check the checkboxes in the table grid. Usually checkboxes are defined as input elements with type='checkbox'. Here they are rendered as <div> tags instead, so I'm finding it difficult to check them with WebDriver.
Sample HTML is given below.
<tbody id="gridview-2345-body">
<tr id="gridview-2345-record-/DNA/Study1_HS.xml" class="x4-grid-row x4-grid-data-row x4-grid-row-selected" data-boundview="gridview-1270" role="row">
<td id="ext4-ext-gen1234" class="x4-grid-cell x4-grid-td" role="gridcell">
<div class="x4-grid-cell-inner " style="text-align:left;" unselectable="on">
<div class="x4-grid-row-checker"/>
</div>
</td>
<td id="ext4-ext-1235" class="x4-grid-cell x4-grid-td" role="gridcell">
<div class="x4-grid-cell-inner " style="text-align:left;" unselectable="on">
<span id="ext4-icon1568" class="fa fa-file-code-o labkey-file-icon"/>
</div>
</td>
<td id="ext4-ext-gen1236" class="x4-grid-cell x4-grid-td" role="gridcell">
<div class="x4-grid-cell-inner " style="text-align:left;" unselectable="on">
<div width="100%" height="16px">
<div style="float: left;"/>
<div style="padding-left: 8px; white-space:normal !important;">
<span style="display: inline-block; white-space: nowrap;">Study1_HS.xml</span>
</div>
</div>
</div>
</td>
</tr>
</tbody>
I tried using 'contains' in the XPath:
driver.findElement(By.xpath("//*[contains(@id, 'Study1_HS.xml')]/td[1]/div/div")).click();
I'm wondering... since the class of the TR changes when the checkbox is checked, maybe clicking the TR will trigger the check. Try this and see.
String searchText = "Study1_HS.xml";
List<WebElement> rows = driver.findElements(By.tagName("tr"));
for (WebElement row : rows)
{
if (row.getText().contains(searchText))
{
row.click();
break;
}
}
So I used 'preceding' in the XPath to make it work:
//span[text()='Study1_HS.xml']/preceding::td/div/div[@class='x4-grid-row-checker']
http://www.xpathtester.com/xpath/b1d50008dd4be8ab7545548c4b8238f5
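One caveat with preceding::td: it also matches cells in every earlier row, so once the grid has more than one row the first match may belong to a different row. The sketch below uses the JDK's javax.xml.xpath (not Selenium) on a simplified two-row copy of the grid to show the difference, together with an ancestor::tr variant that stays inside the row. The class name, the row ids and the Other.xml row are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RowCheckerXPath {
    // XPath from the post above.
    static final String PRECEDING =
        "//span[text()='Study1_HS.xml']/preceding::td/div/div[@class='x4-grid-row-checker']";
    // Hypothetical alternative: climb to the row of the span, then search only inside it.
    static final String SAME_ROW =
        "//span[text()='Study1_HS.xml']/ancestor::tr//div[@class='x4-grid-row-checker']";

    // Simplified stand-in for the grid (made-up ids, one made-up extra row).
    static final String GRID = "<tbody>" + row("Other.xml") + row("Study1_HS.xml") + "</tbody>";

    private static String row(String fileName) {
        return "<tr id='row-" + fileName + "'>"
            + "<td><div><div class='x4-grid-row-checker'/></div></td>"
            + "<td><div><div><span>" + fileName + "</span></div></div></td>"
            + "</tr>";
    }

    // Number of nodes the expression selects in GRID.
    static int countMatches(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(GRID)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expr, e);
        }
    }

    // id of the row that holds the first match in document order.
    static String firstMatchRow(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate("string((" + expr + ")[1]/ancestor::tr/@id)",
                          new InputSource(new StringReader(GRID)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(PRECEDING) + " " + firstMatchRow(PRECEDING)); // 2 row-Other.xml
        System.out.println(countMatches(SAME_ROW) + " " + firstMatchRow(SAME_ROW));   // 1 row-Study1_HS.xml
    }
}
```

With a single row both expressions select the same element, which is why the preceding:: version works for the sample above.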

Handling multiple tables using selenium webdriver

I am checking the folder hierarchy on a webpage, depending on the type of user. User1 has a set of permissions which enable him to see the folder structure like this:
Main Folder
- First Child
  - First Grandchild
  - Second Grandchild
- Second Child
- Third Child
Each branch of the tree is a table consisting of 1 row. But the number of columns varies depending on the generation.
The "Main Folder" parent has only 1 column. The cell content is the string "Main Folder".
The children branches have 2 columns, the first cell containing blank space, and the next cell containing the name of the branch ("First Child", "Second Child").
The grandchildren branches have 3 columns, the first and second cells containing blank space, and the third cell containing the name of the branch ("First Grandchild", "Second Grandchild").
HTML code :
<div id = 0>
<div id = 1>
<table id = 1>
<tbody>
<tr>
<td id="content1"
<a id="label1"
<span id="treeNode1"
Main Folder
</span>
</a>
</td>
</tr>
</tbody>
</table>
<div id = 2>
<table id = 2>
<tbody>
<tr>
<td>
<td id="content2"
<a id="label2"
<span id="treeNode2"
First Child
</span>
</a>
</td>
</td>
</tr>
</tbody>
</table>
<div id = 5>
<table id = 5>
<tbody>
<tr>
<td>
<td>
<td id="content5"
<a id="label5"
<span id="treeNode5"
First GrandChild
</span>
</a>
</td>
</td>
</td>
</tr>
</tbody>
</table>
</div>
<div id = 6>
<table id = 6>
<tbody>
<tr>
<td>
<td>
<td id="content6"
<a id="label6"
<span id="treeNode6"
Second GrandChild
</span>
</a>
</td>
</td>
</td>
</tr>
</tbody>
</table>
</div>
</div> /* End of division 2 */
<div id = 3>
<table id = 3>
<tbody>
<tr>
<td>
<td id="content3"
<a id="label3"
<span id="treeNode3"
Second Child
</span>
</a>
</td>
</td>
</tr>
</tbody>
</table>
</div>
<div id = 4>
<table id = 4>
<tbody>
<tr>
<td>
<td id="content4"
<a id="label4"
<span id="treeNode4"
Third Child
</span>
</a>
</td>
</td>
</tr>
</tbody>
</table>
</div>
</div> /*End of division 1 */
</div> /* End of division 0 */
User2 has a different set of permissions, which enable him to see the folder structure like this:
Main Folder
- First Child
  - First Grandchild
- Second Child
- Third Child
The table for the second grandchild is absent from the HTML for this user.
My test case is to check that User2 doesn't have access to the second grandchild. This means I need to ensure that particular table doesn't exist on the webpage.
How can I check this in Selenium? I am using JUnit for my test cases. I want to do an "assert" to ensure the second grandchild is not present.
You'll want to check that the element is either not present or not visible. Calling isElementVisible() inside an assertFalse() should do the trick. Just pass the By locator of the element you want to check.
private boolean isElementVisible(By by)
{
try
{
return driver.findElement(by).isDisplayed();
}
catch(NoSuchElementException e)
{
return false;
}
}

Selecting a checkbox based on a string in Selenium

After entering a string into a table with a checkbox next to it, I would like to click the checkbox. In Selenium, how can I iterate through the table, search for a particular text, and then check the checkbox next to it?
Here's the HTML of the table:
<tbody>
<tr class="keyword-list-item">
<td width="75%">
<input class="keyword-selection-checkbox" type="checkbox" data-id="gw_78669090303"/>
<span>+spatspatulalas</span>
</td>
<td width="25%" style="text-align: right; padding-right: 4px;">
<span class="icon iconGoogle"/>
</td>
</tr>
<tr class="keyword-list-item">
<td width="75%">
<input class="keyword-selection-checkbox" type="checkbox" data-id="gw_102731166303"/>
<span>12.10 test post</span>
</td>
<td width="25%" style="text-align: right; padding-right: 4px;">
<span class="icon iconGoogle"/>
</td>
</tr>
You can use XPath for this. You just need to be a little smart about how you write it. Notice that the following XPath finds the checkbox using the text next to it.
String text = "12.10 test post";
By xpath = By.xpath("//span[contains(text(),'" + text + "')]/../input");
WebElement element = driver.findElement(xpath);
// tick the checkbox that sits next to the matching text
element.click();
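Building the XPath by plain string concatenation breaks as soon as the text contains an apostrophe. Below is a minimal sketch of a quoting helper. It is checked with the JDK's javax.xml.xpath against a well-formed copy of the two rows above rather than with a live WebDriver. CheckboxXPath, literal, checkboxFor and dataIdFor are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class CheckboxXPath {
    // Well-formed copy of the two rows from the question.
    static final String ROWS = "<tbody>"
        + "<tr><td><input type='checkbox' data-id='gw_78669090303'/><span>+spatspatulalas</span></td></tr>"
        + "<tr><td><input type='checkbox' data-id='gw_102731166303'/><span>12.10 test post</span></td></tr>"
        + "</tbody>";

    // Turns arbitrary text into an XPath 1.0 string literal.
    static String literal(String text) {
        if (!text.contains("'")) {
            return "'" + text + "'";
        }
        if (!text.contains("\"")) {
            return "\"" + text + "\"";
        }
        // Both quote kinds present: split on apostrophes and glue with concat().
        return "concat('" + text.replace("'", "', \"'\", '") + "')";
    }

    // Same expression as in the answer, with the text safely quoted.
    static String checkboxFor(String text) {
        return "//span[contains(text()," + literal(text) + ")]/../input";
    }

    // data-id of the checkbox found next to the given text in ROWS.
    static String dataIdFor(String text) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(checkboxFor(text) + "/@data-id", new InputSource(new StringReader(ROWS)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath for text: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkboxFor("12.10 test post")); // //span[contains(text(),'12.10 test post')]/../input
        System.out.println(dataIdFor("12.10 test post"));   // gw_102731166303
        System.out.println(literal("it's"));                // "it's"
    }
}
```

With Selenium, the same string would go into By.xpath(CheckboxXPath.checkboxFor(text)), followed by click() on the element that is found.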
