compare jsoup elements

Hello, I am trying to compare each jsoup element with all the other elements, incrementing a counter (count++) whenever two elements are equal. In this case I need to compare all elements in links1 with all elements in links2, links3, links4, and so on.
Document document1 = Jsoup.parse(webPage1);
Elements links1 = document1.select("example");
Document document2 = Jsoup.parse(webPage2);
Elements links2 = document2.select("example");
Document document3 = Jsoup.parse(webPage3);
Elements links3 = document3.select("example");
Document document4 = Jsoup.parse(webPage4);
Elements links4 = document4.select("example");
What would the code for this look like in JSP?

Elements is just a List of Element, so the comparison will look like this:
for (Element element : links1) {
if(links2.contains(element)){
count++;
}
//maybe do the same thing with links3 links4.
}
If you want to do it in JSP, that is a separate question.
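One caveat with contains(): as far as I know, recent jsoup versions implement Element.equals as an object identity test (Node.hasSameValue compares the outer HTML instead), so elements parsed from different documents would never match. Below is a minimal sketch of the counting step that works on plain strings, assuming you first map each element to element.outerHtml(). The class name MatchCounter and the method countMatches are hypothetical, and the sketch counts each entry of the first list at most once, however many of the other lists contain it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MatchCounter {
    // Counts how many entries of base also occur in at least one of the other lists.
    // The strings are assumed to be the outer HTML of the selected elements.
    static int countMatches(List<String> base, List<List<String>> others) {
        Set<String> seen = new HashSet<>();
        for (List<String> other : others) {
            seen.addAll(other);
        }
        int count = 0;
        for (String item : base) {
            if (seen.contains(item)) {
                count++;
            }
        }
        return count;
    }
}
```

Collecting the other lists into a HashSet first keeps each lookup cheap, which matters once links2, links3 and links4 grow large.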

Related

Selenium: Select all the elements on the page containing any text

I want to select all the elements on the page containing any text.
Only elements actually containing texts themselves, not the parent elements containing texts in their child elements only.
This XPath is matching elements containing any non-empty texts
//*[text() != ""]
However this
List<WebElement> list = driver.findElements(By.xpath("//*[text() != '']"));
gives me a list of all elements containing texts themselves or in their child elements.
I can iterate over this list with something like the following to collect the elements that actually contain text themselves into the list real:
List<WebElement> real = new ArrayList<>();
JavascriptExecutor js = (JavascriptExecutor) driver;
for (WebElement element : list) {
    // concatenate only the element's own text nodes, ignoring child elements
    String text = (String) js.executeScript("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
        """, element);
    if (text.length() > 0) {
        real.add(element);
    }
}
But this is a kind of workaround.
Is there a way to get the list of elements that actually contain text themselves, either directly or more elegantly?

List<WebElement> elementsWithOwnText = new ArrayList<WebElement>();
List<WebElement> allElements = driver.findElements(By.xpath("//*"));
for (WebElement element: allElements) {
List<WebElement> childElements = element.findElements(By.xpath(".//*"));
String text = element.getText();
if (childElements.isEmpty() && text.length() > 0) {
elementsWithOwnText.add(element);
}
}
Be aware of org.openqa.selenium.StaleElementReferenceException: while you loop over allElements, any of them may no longer be attached to the page document (for example, with dynamic content).

You can try this; it selects all leaf elements with text:
List<WebElement> list = driver.findElements(By.xpath("//*[not(child::*) and text()]"));
for (WebElement webElement : list)
System.out.println(webElement.getText());

Until you find the XPath that you need, as a temporary solution I would recommend trying the iteration below too (even though it is not as efficient as a direct XPath).
In my case it took 1 minute to evaluate 700 nodes with text, and it returned 152 elements that have their own text:
public static List<WebElement> getElementsWithText(WebDriver driver) {
return driver.findElements(By.xpath("//*[normalize-space() != '']"))
.stream().filter(element -> doesParentHaveText(element))
.collect(Collectors.toList());
}
private static boolean doesParentHaveText(WebElement element) {
try {
String text = element.getText().trim();
List<WebElement> children = element.findElements(By.xpath("./*"));
for (WebElement child: children) {
text = text.replace(child.getText(), "").trim();
}
// String.replace treats its argument literally, so use replaceAll for the character class
return text.replaceAll("[\\n\\t\\r]", "").trim().length() > 0;
} catch (WebDriverException e) {
return false; // in case something goes wrong while reading the text; you can rethrow the error here instead
}
}

This could help:
List<String> elements = driver.findElements(By.xpath("//a")).stream().map(productWebElement -> productWebElement.getText()).distinct().collect(Collectors.toList());
// Print count of product found
System.out.println("Total unique product found : " + elements.size());
// Printing product names
System.out.println("All product names are : ");
elements.forEach(name -> System.out.println(name));

Transform linkedhashmap into org.w3c.dom.Element

I need to create a method that, given any LinkedHashMap, transforms it into XML/DOM elements of a Document, like this:
@Override
public Element marshal(Object linkedHashMap) {
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
final Element element = doc.createElement("root");
//For each attribute and object of the linked hashmap
// I must iterate recursively in order to get all the objects and attributes of the LinkedHashMap and append them to the root node element:
//element.appendChild(...));
return element; // here the element must be already populated with all the attributes of the linked hashmap and its values.
}
I have no idea how to achieve this. How can I loop through the entries of a LinkedHashMap in order to map them to an Element?
I need something like this, but it must iterate over all the levels and sublevels (nested LinkedHashMap objects) of the LinkedHashMap:
private void marshalMapElements(ArrayList<LinkedHashMap<String, Object>> linkedHashMaps) {
    Document doc = getDocument();
    Element root = doc.createElement("root");
    for (Map<String, Object> map : linkedHashMaps) {
        // one child element per map entry (top level only)
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            Element e = doc.createElement(entry.getKey());
            e.setTextContent(String.valueOf(entry.getValue()));
            root.appendChild(e);
        }
    }
}
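A minimal sketch of the recursive version described above, using only the JDK's DOM API. It assumes every key is a valid XML element name (createElement throws a DOMException otherwise) and that any value which is not itself a Map should become text content, so a null value ends up as the text "null". The class name MapToDom and the helper appendEntries are hypothetical names chosen for illustration.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MapToDom {
    // Builds a <root> element whose children mirror the (possibly nested) map.
    static Element marshal(Map<String, Object> map) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            appendEntries(doc, root, map);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DocumentBuilder", e);
        }
    }

    // Adds one child element per entry; nested maps are handled by recursion.
    private static void appendEntries(Document doc, Element parent, Map<?, ?> map) {
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            Element child = doc.createElement(String.valueOf(entry.getKey()));
            Object value = entry.getValue();
            if (value instanceof Map) {
                appendEntries(doc, child, (Map<?, ?>) value);
            } else {
                child.setTextContent(String.valueOf(value));
            }
            parent.appendChild(child);
        }
    }
}
```

Because a LinkedHashMap iterates in insertion order, the child elements come out in the same order as the map entries. Lists or arrays as values would need their own branch, which this sketch leaves out.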

Verify over 200 List elements on a page in Webdriver

I have some 200 elements whose markup is as follows:
<span id="1356329740258" class="pagename">Sport & leisure</span>
<span id="1356329740259" class="pagename">Food & drink</span>
<span id="1356329740260" class="pagename">Household</span>
<span id="1356329740261" class="pagename">Gardening</span>
I can access them with Webdriver in a fairly ugly manner:
List<WebElement> elements;
elements = driver.findElements(By.xpath(".//*[starts-with(@id, '135')]"));
...because each id starts with '135'.
But driver.findElement(By.cssSelector(".pagename"));
...does not work, perhaps something to do with the <span> tags.
What I now need to do, is do a .getText() for each element in the list and verify it against the expected, corresponding array value. I'm starting off thinking of this method:
String[] expected = {"Sport & leisure", "Food & drink", "Household", "Gardening"};
List<WebElement> elements = select.findElements(By.xpath(".//*[starts-with(@id,'135')]"));
// compare #array items with #found elements in List
if (expected.length != elements.size()) {
System.out.println("the wrong number of elements were found");
}
// check value of every pagename class element equals expected value
for (int i = 0; i < expected.length; i++) {
String elementsValue = elements.get(i).getAttribute("value");
if (elementsValue.equals(expected[i])) {
System.out.println("passed on: " + elements);
} else {
System.out.println("failed on: " + elements);
}
}
This has the obvious limitation of potentially having to store some 200 text strings in the array, which will become unwieldy. Is there a more elegant solution? I could read the array values in from a .csv file, I guess, and use a Parameterized runner, but then I'd still need to declare each value in the constructor, right?

You can use the List's contains or containsAll method to check that the expected values are present. So basically like this:
final List<String> expectedElements = readFromCSV("expectedElements.csv");
final List<WebElement> elements = select.findElements(By.xpath(".//*[starts-with(@id,'135')]"));
final List<String> stringElements = new ArrayList<>(elements.size());
for (WebElement element : elements) {
    // the spans have no value attribute, so read their visible text
    stringElements.add(element.getText());
}
final boolean isSame = stringElements.containsAll(expectedElements);
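Note that containsAll ignores both order and duplicates, so it only shows that every expected value appears somewhere. If each element has to match its corresponding expected value, a position-by-position check is closer to the question. The sketch below is a rough illustration on plain strings, assuming the texts were already read from the page; TextVerifier and findMismatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class TextVerifier {
    // Returns one description per mismatch; an empty list means everything matched.
    static List<String> findMismatches(List<String> expected, List<String> actual) {
        List<String> mismatches = new ArrayList<>();
        if (expected.size() != actual.size()) {
            mismatches.add("expected " + expected.size() + " items but found " + actual.size());
        }
        // Compare only the positions both lists have in common.
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                mismatches.add("index " + i + ": expected '" + expected.get(i)
                        + "' but found '" + actual.get(i) + "'");
            }
        }
        return mismatches;
    }
}
```

Returning the list of mismatches rather than a single boolean makes it easier to report every failing entry in one test run.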

This is not a direct answer to your question, only a few corrections to your code:
1. You can replace the code that you consider "ugly":
List<WebElement> elements = select.findElements(By.xpath(".//*[starts-with(@id,'135')]"));
with code that finds the elements using their class attribute:
List<WebElement> elements = select.findElements(By.xpath("//span[@class='pagename']"));
2. Since none of these elements has a value attribute, you should replace the following line:
String elementsValue = elements.get(i).getAttribute("value");
with:
String elementsValue = elements.get(i).getAttribute("innerHTML");

Jsoup getting value of tag

I am using Jsoup to read all the elements in the HTML, loop through them, and do stuff based on the type of each element.
I'm not having any luck, I can't find the proper method to check for the values of each element.
Any suggestions?
This is my latest attempt:
Elements a = doc.getAllElements();
for(Element e: a)
{
if( e.val().equals("td"))
{
System.out.println("TD");
}
else if(e.equals("tr"))
{
System.out.println("TR");
}
}
This does not print anything.

Try this one:
Elements tdElements = doc.getElementsByTag("td");
for(Element element : tdElements )
{
//Print the value of the element
System.out.println(element.text());
}

It is better to select the elements by their tags:
Elements tdTags = doc.select("td");
Elements trTags = doc.select("tr");
// Loop over all tdTags - you can do the same with trTags
for( Element element : tdTags )
{
System.out.println(element); // print the element
}

e.tag() will do that (e.tagName() returns the name as a String):
Elements tdElements = doc.getElementsByTag("td");
for(Element element : tdElements )
{
//Print the value of the element
System.out.println(element.tag());
}

Get unique List elements based on a substring

I have an ArrayList<String> which contains a list of formatted dates like this:
element 1: "2012-5-1"
element 2: "2012-8-10"
element 3: "2012-12-5"
element 4: "2013-12-21"
element 5: "2013-12-13"
element 6: "2014-5-8"
What is the most efficient/framework way to create another list or normal primitive array that contains the unique year entries? For example my new list would contain:
element 1: "2012"
element 2: "2013"
element 3: "2014"

Try this:
ArrayList<String> yearsOnlylist = new ArrayList<String>();
for (String s : elements) {
    String yearExtracted = s.substring(0, 4);
    // skip years that were already collected so the result stays unique
    if (!yearsOnlylist.contains(yearExtracted)) {
        yearsOnlylist.add(yearExtracted);
    }
}
Here elements is the name of your list of dates in the extended form.
Using a LinkedList as the destination list
LinkedList<String> yearsOnlylist = new LinkedList<String>();
instead of an ArrayList could slightly improve the conversion efficiency (add is O(1) in LinkedList and amortized O(1) in ArrayList), but accessing a specific position later is less efficient (O(n) vs O(1)).

Just add them to a Set and convert it to a list:
Set<String> unique = new HashSet<String>();
for (String element : elements) {
unique.add(element.substring(0, 4));
}
List<String> uniqueList = new ArrayList<String>();
uniqueList.addAll(unique);

Iterate through your array list and take a substring of the first 4 characters of each member of the array list.
Add that substring to a set implementation such as a HashSet, which will give you what you want.
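The approach described above could be sketched as follows. This version uses a LinkedHashSet so the years keep the order in which they first appear, and it cuts at the first '-' instead of assuming a four-character year; both are my own choices, and the names UniqueYears and uniqueYears are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueYears {
    // Extracts the part before the first '-' of each date and drops duplicates.
    static List<String> uniqueYears(List<String> dates) {
        Set<String> years = new LinkedHashSet<>();
        for (String date : dates) {
            int dash = date.indexOf('-');
            // Entries without a dash are kept as they are.
            years.add(dash < 0 ? date : date.substring(0, dash));
        }
        return new ArrayList<>(years);
    }
}
```

With a plain HashSet the result would still be unique, but the iteration order would not be guaranteed.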

public List<String> trimmer(List<String> x) {
Log.e("", Integer.toString(x.size()));
for (int i = 0; i < x.size(); i++) {
String s = x.get(i).toString();
String a = s.substring(6);
Log.e("after trim is?", a);
x.remove(i);
x.add(i, a);
}
// check if the element got added back
Log.e("Trimmer function", x.get(1));
return x;
}
This will help you hopefully!
