I am trying to validate the $40 with an assertion, but I am unable to work out the XPath. Any suggestions?
<div class="MuiGrid-root MuiGrid-item MuiGrid-grid-xs-12"><div class="vetting-details"><h3 class="paragraph text-center"><span>Vetting Price: </span>$40 <br> <span>Estimated Time for Vetting:</span> 30 seconds</h3></div></div>
That $40 is a text node. You can retrieve it using:
WebElement e = driver.findElement(By.xpath("//h3[contains(@class,'paragraph text-center')]"));
String el = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].textContent;", e);
// textContent is roughly "Vetting Price: $40  Estimated Time for Vetting: 30 seconds", so token [2] is "$40"
String s = el.split("\\ ")[2].trim();
System.out.println(s);
Explanation:
$40 is not wrapped in its own element; it is a bare text node inside the h3. Selenium cannot locate text nodes directly, so we use JavaScript to read the textContent of the h3 element and then split the string to get the desired part.
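Splitting on spaces and taking index [2] depends on the exact number of words before the price. A slightly more robust sketch, assuming the textContent string always contains the label "Vetting Price:" followed by a dollar amount, uses a regular expression instead. The class and method names are hypothetical; only JDK classes are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the amount that follows "Vetting Price:" out of the
// h3's textContent (the string returned by the JavascriptExecutor call above).
final class VettingPriceParser {
    // "Vetting Price:", optional whitespace, then "$" and digits with optional cents.
    private static final Pattern PRICE =
            Pattern.compile("Vetting Price:\\s*(\\$\\d+(?:\\.\\d{2})?)");

    static String extractPrice(String textContent) {
        Matcher m = PRICE.matcher(textContent);
        if (!m.find()) {
            throw new IllegalArgumentException("No vetting price in: " + textContent);
        }
        return m.group(1); // e.g. "$40"
    }
}
```

The result can then be passed to whatever assertEquals your test framework provides, e.g. comparing `VettingPriceParser.extractPrice(el)` with `"$40"`.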
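We can also use the code below (without JavascriptExecutor). getText() renders the <br> as a line break, so take the first line before splitting on ":":
String value=driver.findElement(By.tagName("h3")).getText();
String s1=value.split("\\R")[0].split(":")[1].trim();
System.out.println(s1);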
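If you need both values, one option is to turn each "Label: value" line of the getText() output into a map entry. This is a minimal sketch assuming getText() returns the two lines separated by a line break (which is how WebDriver normally renders a <br>); `LabelValueParser` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns text such as
// "Vetting Price: $40\nEstimated Time for Vetting: 30 seconds"
// into {"Vetting Price" -> "$40", "Estimated Time for Vetting" -> "30 seconds"}.
final class LabelValueParser {
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {     // \R matches any line break
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                            // skip lines without a label
            }
            String label = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            result.put(label, value);
        }
        return result;
    }
}
```

Usage would be along the lines of `LabelValueParser.parse(driver.findElement(By.tagName("h3")).getText()).get("Vetting Price")`.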
I'm parsing the HTML of a website with jsoup. I want to parse this part:
<td class="lastpost">
This is a text 1<br>
<a href="post/13594">Website Page - 1</a>
</td>
I want the result to look like this:
String text = "This is a text 1";
String textNo = "Website Page - 1";
String link = "post/13594";
How can I get the parts like this?
Your code only gets all of the text inside the td elements you are selecting. If you want to store the pieces in separate variables, grab each part separately, as in the following code. Extra comments are added so you can see how and why it gets each piece.
// Get the first td element that has class="lastpost"
Element lastPost = document.select("td.lastpost").first();
// Get the first a element that is a child of the td
Element linkElement = lastPost.getElementsByTag("a").first();
// This text is the first text node of the td; text() gives its plain text and trim() drops the surrounding newlines
String text = lastPost.textNodes().get(0).text().trim();
// This is the text within the a (link) element
String textNo = linkElement.text();
// This text is the href attribute value of the a (link) element
String link = linkElement.attr("href");
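The same "first text node plus first <a>" idea can be sketched with the JDK's built-in DOM parser instead of jsoup. This is a swapped-in technique for illustration only: the DOM parser needs well-formed XML, so `<br>` is assumed to be written as `<br/>`, and the link markup is assumed to be `<a href="post/13594">`, as implied by the expected output. `LastPostParts` is a hypothetical class.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: split a td into its leading text, link text and link href.
final class LastPostParts {
    final String text;   // first non-blank text node of the td
    final String textNo; // text of the first <a>
    final String link;   // href of the first <a>

    private LastPostParts(String text, String textNo, String link) {
        this.text = text;
        this.textNo = textNo;
        this.link = link;
    }

    static LastPostParts parse(String xml) {
        Element td;
        try {
            td = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
        String text = null;
        for (Node n = td.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE && !n.getNodeValue().trim().isEmpty()) {
                text = n.getNodeValue().trim(); // first text node that is not just whitespace
                break;
            }
        }
        NodeList anchors = td.getElementsByTagName("a");
        if (anchors.getLength() == 0) {
            throw new IllegalArgumentException("No <a> inside the td");
        }
        Element a = (Element) anchors.item(0);
        return new LastPostParts(text, a.getTextContent().trim(), a.getAttribute("href"));
    }
}
```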
I have HTML code something like this:
<h3> Some Heading </h3>
<p> Some String </p>
<p> more string </p>
<h3> Other heading</h3>
<p> some text </p>
I am trying to access "Some String", "more string" and "some text". In Java, I am trying to access them like this:
List<WebElement> h3Tags = driver.findElements(By.tagName("h3"));
List<WebElement> para = null;
WebElement bagInfo = h3Tags.get(0); //reads first h3
if(bagInfo.getText().contains("carry-on") || bagInfo.getText().contains("Carry-on")){
para = AutoUtils.findElementsByTagName(bagInfo, "p");
System.out.println(para.get(0).getText()); //Null pointer here
}
bagInfo = h3Tags.get(1);
if(bagInfo.getText().contains("checked") || bagInfo.getText().contains("Checked")){
para = AutoUtils.findElementsByTagName(bagInfo, "p");
System.out.println(para.get(0).getText()); //Null pointer here too
}
I tried an XPath like "h3['/p']" but still had no luck. What is the best way to access those <p> strings?
Try the XPath //h3/following-sibling::p to match all three paragraphs.
Also note that your XPath h3['/p'] doesn't work: it means "return the h3 node that is the root element of the DOM". The predicate ['/p'] always evaluates to true, because a non-empty string ('/p' in your case) is always true.
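Both points can be checked outside the browser with the JDK's own XPath 1.0 engine. This is a minimal sketch that assumes the question's markup is wrapped in a `<div>` so it parses as one XML document; `XPathDemo` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo: evaluate XPath expressions against the question's markup.
final class XPathDemo {
    // The question's markup, wrapped in a <div> so it forms a single XML document.
    static final String MARKUP =
            "<div><h3> Some Heading </h3><p> Some String </p><p> more string </p>"
            + "<h3> Other heading</h3><p> some text </p></div>";

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Number of nodes selected, with the document node as the context node.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, parse(MARKUP), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    // Boolean value of an expression, e.g. boolean('/p').
    static boolean truth(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(expr, parse(MARKUP), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }
}
```

Here `count("//h3/following-sibling::p")` selects the three paragraphs, `count("h3['/p']")` selects nothing because the root element is a div, and `count("//h3['/p']")` simply returns both h3 nodes, since the predicate never filters anything out.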
To access "Some String", "more string" and "some text", you can use the following locator strategies:
To access the node with text as Some String
By.xpath("//h3[normalize-space()='Some Heading']//following::p[1]")
To access the node with text as more string
By.xpath("//h3[normalize-space()='Some Heading']//following::p[2]")
To access the node with text as some text
By.xpath("//h3[normalize-space()='Other heading']//following::p[1]")
Once you have located those elements, you can use the getAttribute("innerHTML") method to extract the text within the nodes.
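For the loop in the question, it may be easier to ask for "every p after this heading but before the next h3" in a single XPath. A common XPath 1.0 idiom is to keep only the following p siblings whose nearest preceding h3 is the heading you want. The sketch below tries that idiom with the JDK's XPath engine on a well-formed copy of the markup; `HeadingParagraphs` is a hypothetical name, and the heading text is assumed to contain no single quote because it is spliced into an XPath string literal. The same expression should also work with `driver.findElements(By.xpath(...))`, since browsers evaluate XPath 1.0.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: paragraphs that belong to one h3 heading.
final class HeadingParagraphs {
    static List<String> paragraphsUnder(String xml, String heading) {
        // p siblings after the heading whose closest preceding h3 is that same heading.
        String expr = "//h3[normalize-space()='" + heading + "']"
                + "/following-sibling::p[preceding-sibling::h3[1][normalize-space()='" + heading + "']]";
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }
}
```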
Below is the HTML code:
<h2 id="xyz" class="test">
<button class="restore 1" value="test2" title="hello"> Line1 </button>
<button class="restore 2" value="test3" title="click"> Line2 </button>
I need this text
</h2>
I need to extract the text "I need this text" from the above code.
I tried the following approaches but could not get only the line "I need this text":
1.) By.xpath("//h2[@id='xyz']").getText();
This gives the error: InvalidSelectorError: The result of the xpath expression "//h2[@id='xyz']/following-sibling::text()" is: [object Text]. It should be an element.
2.) By.xpath("//h2[@id='xyz']").getAttribute("innerText");
With this I get Line1, Line2 and I need this text as the output.
My expected output is only "I need this text".
The Selenium API doesn't support text nodes. If you only want the text from the element's own text nodes, use a helper function with some JavaScript:
WebElement element = driver.findElement(By.cssSelector("#xyz"));
String text = getNodesText(element);
public static String getNodesText(WebElement element) {
String SCRIPT =
"for(var arr = [], e = arguments[0].firstChild; e; e = e.nextSibling) " +
" if(e.nodeType === 3) arr.push(e.nodeValue); " +
"return arr.join('').replace(/ +/g, ' ').trim(); " ;
WebDriver driver = ((RemoteWebElement)element).getWrappedDriver();
return (String)((JavascriptExecutor)driver).executeScript(SCRIPT, element);
}
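The JavaScript above keeps only the direct children whose nodeType is 3 (text nodes), joins them, collapses repeated spaces and trims. The same logic can be tried offline with the JDK's DOM parser; this is a minimal sketch assuming the h2 markup is available as well-formed XML, and `DirectTextNodes` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical JDK-only counterpart of getNodesText(): keep only the element's
// direct text-node children, skipping the text that belongs to child elements.
final class DirectTextNodes {
    static String ownText(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
        StringBuilder sb = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) { // same test as nodeType === 3 in the JS
                sb.append(n.getNodeValue());
            }
        }
        // Same clean-up as the JS: collapse runs of spaces, then trim the ends.
        return sb.toString().replaceAll(" +", " ").trim();
    }
}
```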
You're running the getText() method on a By. You need to run it on a WebElement:
WebElement h2 = driver.findElement(By.xpath("//h2[@id='xyz']"));
String wantedText = h2.getText();
Better still, use the ID directly:
WebElement h2 = driver.findElement(By.id("xyz"));
Edit: your second option, which returns the whole text (Line1, Line2, etc.), works. The HTML is poorly structured, since it has the buttons inside the h2 tag, so your best option is to remove the text you are getting from the buttons.
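A minimal sketch of that "remove the buttons' text" idea, written as a pure string function so it can run without a browser. In Selenium the inputs would presumably be `h2.getText()` and the getText() of each element from `h2.findElements(By.tagName("button"))`; since the exact whitespace getText() produces depends on rendering, the result is whitespace-collapsed. `TextSubtraction` is a hypothetical name, and the approach assumes the wanted text does not itself contain a button label.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: subtract each child's text from the parent's full text.
final class TextSubtraction {
    static String removeChildTexts(String fullText, List<String> childTexts) {
        String result = fullText;
        for (String child : childTexts) {
            String trimmed = child.trim();
            if (!trimmed.isEmpty()) {
                // Pattern.quote treats the label literally, even if it contains regex characters.
                result = result.replaceFirst(Pattern.quote(trimmed), "");
            }
        }
        // Collapse whatever spaces or line breaks are left and trim the ends.
        return result.replaceAll("\\s+", " ").trim();
    }
}
```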
Is it possible to convert the String content below to an ArrayList using split, so that I get something like Point A?
<a class="postlink" href="http://test.site/i7xt1.htm">http://test.site/i7xt1.htm<br/>
</a>
<br/>Mirror:<br/>
<a class="postlink" href="http://information.com/qokp076wulpw">http://information.com/qokp076wulpw<br/>
</a>
<br/>Additional:<br/>
<a class="postlink" href="http://additional.com/qokdsfsdwulpw">http://additional.com/qokdsfsdwulpw<br/>
</a>
Point A (desired arraylist content):
http://test.site/i7xt1.htm
Mirror:
http://information.com/qokp076wulpw
Additional:
http://additional.com/qokdsfsdwulpw
I am currently using the code below, but it doesn't produce the desired output (for instance, Mirror is added multiple times).
Document doc = Jsoup.parse(string);
Elements links = doc.select("a[href]");
for (Element link : links) {
Node previousSibling = link.previousSibling();
while (!(previousSibling.nodeName().equals("u") || previousSibling.nodeName().equals("#text"))) {
previousSibling = previousSibling.previousSibling();
}
String identifier = previousSibling.toString();
if (identifier.contains("Mirror")) {
totalUrls.add("MIRROR(s):");
}
totalUrls.add(link.attr("href"));
}
Fix your links first. As cricket_007 mentioned, having proper HTML would make this a lot easier.
String html = yourHtml.replaceAll("<br/>\\s*</a>", "</a>"); // get rid of bad HTML (a newline sits between <br/> and </a>)
String[] lines = html.split("<br/>");
List<String> totalUrls = new ArrayList<>();
for (String str : lines) {
    String text = Jsoup.parse(str).text();
    if (!text.isEmpty()) {
        totalUrls.add(text); // you can go further here, e.g. check whether it has a link or not to display your colon
    }
}
Now that the errant <br/> tags are out of the links, you can split the string on the remaining <br/> tags and collect the text of each piece. It's not pretty, but it should work.
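The same three steps (pull the stray `<br/>` out of each link, split on the remaining `<br/>`, keep the text of each piece) can be sketched with only the JDK. This swaps jsoup's text() for a crude regex tag-stripper, which is an assumption that only holds for simple markup like this (no `>` inside attribute values, no entities to decode); `LinkListSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flatten the postlink markup into the "Point A" list.
final class LinkListSplitter {
    static List<String> toLines(String html) {
        // 1. Move the stray <br/> out of each link: "<br/>\n</a>" becomes "</a>".
        String fixed = html.replaceAll("<br/>\\s*</a>", "</a>");
        List<String> lines = new ArrayList<>();
        // 2. Each remaining <br/> separates one logical line.
        for (String piece : fixed.split("<br/>")) {
            // 3. Crude tag stripping; jsoup's text() is the safer choice for real pages.
            String text = piece.replaceAll("<[^>]+>", "").trim();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return lines;
    }
}
```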
I need to parse text from a webpage. The text is presented in this way:
nonClickableText= link1 link2 nonClickableText2= link1 link2
I want to convert all of it to a single String in Java. The non-clickable text should stay as it is, while each clickable text should be replaced with its actual link.
So in java I would have this:
String parsedHTML = "nonClickableText= example.com example.com nonClickableText2= example3.com example4.com";
Here are some pictures: first second
What exactly are link1 and link2? According to your example
"... nonClickableText2= example3.com example4.com"
they can be different, so what would be the source besides the href?
Based on your images, the following code should give you everything you need to build your final string. First we grab the <strong> block, then go through its child nodes, using each <a> child together with its preceding text node:
String htmlString = "<html><div><p><strong>\"notClickable1\"<a rel=\"nofollow\" target=\"_blank\" href=\"example1.com\">clickable</a>\"notClickable2\"<a rel=\"nofollow\" target=\"_blank\" href=\"example2.com\">clickable</a>\"notClickable3\"<a rel=\"nofollow\" target=\"_blank\" href=\"example3.com\">clickable</a></strong></p></div></html>";
Document doc = Jsoup.parse(htmlString); //can be replaced with Jsoup.connect("yourUrl").get();
String parsedHTML = "";
Element container = doc.select("div>p>strong").first();
for (Node node : container.childNodes()) {
    Node previous = node.previousSibling(); // null for the first child
    if (node.nodeName().equals("a") && previous != null && previous.nodeName().equals("#text")) {
        parsedHTML += previous.toString().replaceAll("\"", "");
        parsedHTML += "= " + node.attr("href") + " ";
    }
}
parsedHTML = parsedHTML.trim(); // Strings are immutable, so keep the trimmed result
System.out.println(parsedHTML);
Output:
notClickable1= example1.com notClickable2= example2.com notClickable3= example3.com
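For comparison, here is a sketch of the same walk over the <strong> children using only the JDK's DOM parser, collecting the parts in a list and joining them so no trailing space is left. It assumes the input is well-formed XML (as the sample string in the answer is); `ClickableFlattener` is a hypothetical name, not part of jsoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: "label= href" for every <a> preceded by a text node.
final class ClickableFlattener {
    static String flatten(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
        Node strong = doc.getElementsByTagName("strong").item(0); // null if absent
        if (strong == null) {
            throw new IllegalArgumentException("No <strong> element");
        }
        List<String> parts = new ArrayList<>();
        for (Node n = strong.getFirstChild(); n != null; n = n.getNextSibling()) {
            Node prev = n.getPreviousSibling();
            boolean isLink = n.getNodeType() == Node.ELEMENT_NODE && "a".equals(n.getNodeName());
            if (isLink && prev != null && prev.getNodeType() == Node.TEXT_NODE) {
                String label = prev.getNodeValue().replace("\"", "").trim(); // drop the quotes
                String href = ((Element) n).getAttribute("href");
                parts.add(label + "= " + href);
            }
        }
        return String.join(" ", parts);
    }
}
```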