Extract all string data except String containing HTML Table's in java


I have a long String like this.
<p>Some Text above the tabular data. I hope this text will be seen.</p>
<table border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td style="width:150px">
<p>S.No.</p>
</td>
</td>
</tr>
<tr>
<td style="width:150px">
<p>2</p>
</td>
</tbody>
</table>
<p> </p>
<p>Please go through this tabular data.</p>
<table border="1" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td style="width:150px">
<p>S.No.</p>
</td>
</tr>
<tr>
<td style="width:150px">
<p>1</p>
</td>
<tr>
<td style="width:150px">
>
</td>
</td>
</tr>
</tbody>
</table>
<p>End Of String</p>
Now I want to extract the whole string before and after each HTML table, like this, and put "HTML Table..." in place of each table. I tried a few things but was not able to achieve it. I tried splitting the string into arrays, but that didn't work.
Sample Output
<p>Some Text above the tabular data. I hope this text will be seen.</p>
<p> </p>
HTML Table....
<p>Please go through this tabular data.</p>
<p>End Of String</p>

You can do this simply with String.replaceAll, using a regex with the DOTALL and case-insensitive flags (?is), so that . also matches line breaks:
String noTables = longTableString.replaceAll("(?is)(\\<table .*?\\</table\\>)", "HTML Table...");
// result
<p>Some Text above the tabular data. I hope this text will be seen.</p>
HTML Table...
<p> </p>
<p>Please go through this tabular data.</p>
HTML Table...
<p>End Of String</p>

This may not be the most elegant solution, but you can start by using a regex to capture the table locations and then replace them with the desired content. Something like the code below will help.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String htmlStr = "<your html string>";
// Capture each table element; [\s\S]*? matches across line breaks, non-greedily.
Pattern pattern = Pattern.compile( "(<table)([\\s\\S]*?)(</table>)" );
Matcher matcher = pattern.matcher( htmlStr );
String result = htmlStr;
while( matcher.find() )
{
// replace the matched table element with another string
result = result.replace( matcher.group(), "HTML Table...." );
}
System.out.println( result ); // print output
There are a few drawbacks to this approach: your regex must match the HTML content, and the spacing of the output depends on the whitespace in the original string, so you don't really have control over how it will look. More importantly, regex evaluation can be CPU-intensive, depending on the size of your HTML string.
This is just one approach to try.
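As a rough sketch of the same idea in a single pass, the pattern can be compiled once and applied with Matcher.replaceAll. The class name TableStripper, the method stripTables, and the sample input are illustrative, not part of the original answers. The \b after "table" is an assumption that tables may appear with or without attributes. Nested tables are not handled.

```java
import java.util.regex.Pattern;

class TableStripper {
    // \b lets the pattern match both "<table>" and "<table border=...>".
    // DOTALL lets '.' cross line breaks; ".*?" stops at the first "</table>".
    private static final Pattern TABLE = Pattern.compile(
            "<table\\b.*?</table>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Illustrative placeholder; it contains no '$' or '\', so it can be
    // passed to replaceAll without Matcher.quoteReplacement.
    private static final String PLACEHOLDER = "HTML Table...";

    static String stripTables(String html) {
        return TABLE.matcher(html).replaceAll(PLACEHOLDER);
    }

    public static void main(String[] args) {
        // Sample input made up for this sketch.
        String html = "<p>Above</p>\n<table border=\"1\">\n<tr><td>1</td></tr>\n</table>\n"
                + "<p>Between</p>\n<TABLE><tr><td>2</td></tr></TABLE>\n<p>End Of String</p>";
        System.out.println(stripTables(html));
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.replaceAll does internally.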

Related

Regex to iterate over an table and extract the td information inside an div using java

Hello, I know that parsing HTML with regex is not efficient, but I have to do it with regex; I have no other option.
HTML
<div class="test">
<h2>what</h2>
<table cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th>Example </th>
<td> ui </td>
</tr>
<tr>
<th>Sample </th>
<td>123 </td>
</tr>
</tbody>
</table>
</div>
I tried (?s)<div class="test">.*<td>(.*?)</td>.*</div>, but it extracts only the last value. Can anyone tell me what the issue is?

Why use only regular expressions? How about some jQuery?
// td is a descendant of div.test, not a direct child, so use a descendant selector
$("div.test td").each(function() {
var $this = $(this);
alert( $this.text() );
});

The * quantifier is greedy: it matches as much as possible, so the first .* also swallows most of the text.
Try .*? instead. The question mark makes the quantifier reluctant, so it takes only as much as necessary, not as much as possible.
Otherwise, please be more specific about which parts you want and which you don't.
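A single match yields only one value for the (.*?) group, so collecting every td usually means calling find() in a loop. The following is a minimal sketch under stated assumptions: the div contains no nested div, and the class name TdExtractor and method extractCells are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TdExtractor {
    // Step 1: isolate the body of <div class="test">...</div> (no nested div assumed).
    private static final Pattern DIV = Pattern.compile(
            "<div class=\"test\">(.*?)</div>", Pattern.DOTALL);
    // Step 2: match one cell at a time; the reluctant ".*?" stops at the nearest </td>.
    private static final Pattern TD = Pattern.compile("<td>(.*?)</td>", Pattern.DOTALL);

    static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher div = DIV.matcher(html);
        if (div.find()) {
            Matcher td = TD.matcher(div.group(1));
            while (td.find()) {                 // each find() advances to the next cell
                cells.add(td.group(1).trim());
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<div class=\"test\">\n<h2>what</h2>\n<table>\n<tbody>\n"
                + "<tr>\n<th>Example </th>\n<td> ui </td>\n</tr>\n"
                + "<tr>\n<th>Sample </th>\n<td>123 </td>\n</tr>\n"
                + "</tbody>\n</table>\n</div>";
        System.out.println(extractCells(html)); // prints [ui, 123]
    }
}
```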

Selenium read a table and write into a file line by line

I am looking for help reading the span elements of a table using Selenium and writing their text to a file.
The following is my HTML code:
<table>
<tbody>
<tr>
<td><span>
FIRST
</span>
</td>
</tr>
<tr>
<td>
<span>SECOND</span>
</td>
</tr>
<tr>
<td>
<span>THIRD</span>
</td>
</tr>
</tbody>
</table>
I need to write FIRST SECOND THIRD to a file in Java.
Thanks a lot.

I suppose you have already located the table WebElement. Then you can get the content of the span elements like this:
// By.tagName is the static factory method for tag-name locators
List<WebElement> spanElements = tableElement.findElements(By.tagName("span"));
for (WebElement element : spanElements) {
String spanContent = element.getText();
// save it to a collection or a StringBuilder, then write it to a file
}
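For the "write it to a file" step, one possible sketch uses java.nio.file.Files. It assumes the span texts have already been collected into a List<String> from element.getText(); the class name SpanTextWriter and the file name spans.txt are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanTextWriter {
    // Trims each text and writes one entry per line, skipping blank entries.
    static void writeLines(List<String> texts, Path target) throws IOException {
        List<String> cleaned = new ArrayList<>();
        for (String text : texts) {
            String trimmed = text.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        Files.write(target, cleaned, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the values returned by element.getText().
        List<String> texts = Arrays.asList("\nFIRST\n", "SECOND", "THIRD");
        Path target = Paths.get("spans.txt"); // hypothetical output file
        writeLines(texts, target);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```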

1) XPath that gets all text is:
//span/text()
2) In Java code you may type something like
String text = selenium.getText("xpath=//span");

How to verify the Literal text on the UI

Here is part of the UI, where I have some literal text like
Opportunity Name Momentum Test (ID=AANA-19KVBE)
Account Account Internal
Owner Christopher Braden
Sales Order Region North America
Locale English United States
Currency US Dollar
The piece of code for the above is as follows
<table class="datatable" width="100%">
<tbody>
<tr class="oddrow">
<td>Opportunity Name</td>
<td class="wrap_content">
Momentum Test (ID=AANA-19KVBE)
<br>
<a target="_blank" href="http://www.example.com">View in Salesforce</a>
</td>
</tr>
<tr class="evenrow">
<td>Account</td>
<td class="wrap_content">
Akamai Internal
<br>
<a target="_blank" href="http://www.example.com">View in Salesforce</a>
</td>
</tr>
<tr class="oddrow">
<td>Owner</td>
<td>Christopher Braden</td>
</tr>
<tr class="evenrow">
<td>Sales Order Region</td>
<td>North America</td>
</tr>
<tr class="oddrow">
<td>Locale</td>
<td>English - United States</td>
</tr>
<tr class="evenrow">
<td>Currency</td>
<td>US Dollar</td>
</tr>
</tbody>
</table>
I need to retrieve these values individually and store them, so that I can compare them with the values on a different page.
For example, I need to know that "United States" is stored under "Locale".
Please help me.
Thanks & Regards
Kiran

You can read and store all this information in a Map:
Map<String,String> map = new HashMap<String,String>();
List<WebElement> list = driver.findElements(By.xpath("//*[@class='datatable']/tbody/tr"));
// XPath positions are 1-based, so the loop runs from 1 to list.size()
for(int i=1;i<=list.size();i++){
String key = driver.findElement(By.xpath("//*[@class='datatable']/tbody/tr["+i+"]/td[1]")).getText();
String value = driver.findElement(By.xpath("//*[@class='datatable']/tbody/tr["+i+"]/td[2]")).getText();
map.put(key,value);
}
In this way you can read all the information in the table and store it in a Map to be used later. Let me know if this works for you.
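Once two such maps exist, one per page, the comparison itself needs only plain Java. The sketch below is an illustration, not part of the answer: the class name TableValueCheck, the method mismatchedKeys, and the sample values (including "Euro") are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableValueCheck {
    // Returns the labels whose value is missing from, or different in, the second map.
    static List<String> mismatchedKeys(Map<String, String> expected, Map<String, String> actual) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String other = actual.get(entry.getKey());
            if (other == null || !other.trim().equals(entry.getValue().trim())) {
                mismatches.add(entry.getKey());
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Sample data standing in for the maps filled from each page.
        Map<String, String> firstPage = new LinkedHashMap<>();
        firstPage.put("Locale", "English - United States");
        firstPage.put("Currency", "US Dollar");

        Map<String, String> secondPage = new LinkedHashMap<>();
        secondPage.put("Locale", "English - United States");
        secondPage.put("Currency", "Euro");

        System.out.println(firstPage.get("Locale").contains("United States")); // prints true
        System.out.println(mismatchedKeys(firstPage, secondPage));             // prints [Currency]
    }
}
```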

In tag <td></td> DOUBLE_WHITESPACE in query href

I have a very strange problem.
<table border ="1">
<tbody>
<c:forEach var="question" items="${questions}">
<tr>
<td>
${question.getQuestion()}
</td>
<td>
<c:forEach var="answer" items="${question.getAnswers()}">
<input type="checkbox" name ="user_answer" value="${answer.getAnswer()}">
${answer.getAnswer()}
<br />
</c:forEach>
</td>
<td>
<a href="/TutorWebApp/controller?command=edit_qestion&question=${question}">
Edit
</a>
</td>
</tr>
</c:forEach>
</tbody>
</table>
But if I use an <a> tag inside <td>, I get an error (DOUBLE_WHITESPACE in the query part of the href).
If I don't use the <a> tag in <td>, it's OK. I don't have any ideas.
Thanks

I think this is just a bug/limitation of your editor. Try deploying your JSP and see if it works as expected or not.
That said, if your question contains characters that must be URL and/or HTML escaped, your HTML code will be invalid. You should use the c:url tag to avoid that:
<c:url var="editQuestionUrl" value="/TutorWebApp/controller">
<c:param name="command" value="edit_question"/>
<c:param name="question" value="${question}"/>
</c:url>
<%-- now the params are url-encoded --%>
<a href="${fn:escapeXml(editQuestionUrl)}">Edit</a>
<%-- now the query string is HTML-escaped --%>

You need to encode your question text (or whole URL) here by calling URLEncoder#encode()
You can look at this Q&A on how to encode a URL in JSTL.
Alternatively you can try calling JSTL's escapeXml function on your question text.
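On the Java side, the encoding step could look roughly like the sketch below. The class name QuestionLink and the method editUrl are hypothetical, and the command value edit_question follows the spelling used in the c:url answer rather than the original href.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QuestionLink {
    // Builds the edit link with the question text encoded for use in a query string.
    static String editUrl(String questionText) throws UnsupportedEncodingException {
        return "/TutorWebApp/controller?command=edit_question&question="
                + URLEncoder.encode(questionText, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Spaces become '+', while '+' and '?' are percent-encoded.
        System.out.println(editUrl("What is 2 + 2?"));
    }
}
```

When the resulting URL is written into an HTML attribute, the & between the parameters still has to be HTML-escaped, which is what the escapeXml suggestion covers.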

Try replacing this line
<a href="/TutorWebApp/controller?command=edit_qestion&question=${question}">
with
<a href="/TutorWebApp/controller?command=edit_qestion&question='${question}'">

How do I replace an element?

I have the following HTML
<html>
<head>
<title>test</title>
</head>
<body>
<table>
<caption>table title and/or explanatory text</caption>
<thead>
<tr>
<th>header</th>
</tr>
</thead>
<tbody>
<tr>
<td id=\"test\" width=\"272\"></td>
</tr>
</tbody>
</table>
Test link
<img src=\"http://www.google.se/images/nav_logo95.png\" />"
</body>
</html>;
I want to find the first link with jsoup and replace it with text:
Element elem = page.select("a[href=" + link.getUrl() + "]:contains(" + link.getName() + ")").first();
I can only replace the inner HTML with elem.html("foo") or print the outerHtml with elem.outerHtml()
Does anyone know how I can achieve this?

I found the answer!
// In newer jsoup releases the constructor takes only the text: new TextNode("foo")
TextNode text = new TextNode("foo", "");
elem.replaceWith(text);

Once you have found the element that you want to work with, you can apply commands such as those explained here: http://jsoup.org/cookbook/modifying-data/set-html
I could not get it right. I am trying this:
elemento.prepend("<a href='www.test.com'>");
elemento.html("Roberto C. Santos.");
elemento.append("</a>");
elemento.wrap("<a href='www.test.com'> </a>");
But I am getting this:
<td> <a style="" target="_self" title="" href="http://ivv.veveivv.vvzenes.com.br/mao/ara/ccacao" data-mce-href="resolveuid/5cc1c7c8c9efcacaaeedec22a9c69a54" class="internal-link">Roberto C. Santos.</a></td>
</tr>
I still don't know the exact way to replace the contents of a URL element.
I'd like to have, as the result:
<a href='www.test.com'> Roberto C. Santos.</a>
How could I erase the href that is in between?
