I'm using Selenium (with Java) to search all the classNames of a page and then use regex to save only the className(s) which have "insignia" in them.
I tried the code below, with a regex to search for classNames which have a mention of "insignia" in them, but it didn't return any result.
System.out.println(driver.findElements(By.className(".*\\binsignia\\b.*")).get(1).getAttribute("src"));
You can't use Regex inside a locator string. You can use a CSS selector and find all elements that contain "insignia" in the class name.
System.out.println(driver.findElements(By.cssSelector("[class*='insignia']")).get(1).getAttribute("src"));
CSS selector reference
driver.findElements(By.xpath("//*[contains(@class,'insignia')]"))
is going to return the WebElements whose class attribute contains insignia. Note that XPath's contains() does a plain substring match, so no regex syntax goes inside it.
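If the regex itself matters (for example, to require insignia as a whole word rather than any substring), one option is to collect the class attribute values first and filter them in Java. The sketch below uses only the standard library; the input list is a hypothetical stand-in for values that would come from element.getAttribute("class"), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class InsigniaClassFilter {
    // Whole-word match: "insignia" must be delimited by non-word characters or string edges.
    private static final Pattern INSIGNIA = Pattern.compile("\\binsignia\\b");

    // Splits each class attribute into class names and keeps the distinct names that match.
    static List<String> matchingClassNames(List<String> classAttributes) {
        List<String> result = new ArrayList<>();
        for (String attr : classAttributes) {
            if (attr == null) {
                continue; // elements without a class attribute
            }
            for (String name : attr.trim().split("\\s+")) {
                if (!name.isEmpty() && INSIGNIA.matcher(name).find() && !result.contains(name)) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical attribute values, standing in for getAttribute("class") results.
        List<String> attrs = List.of("header insignia-large", "footer", "insignia", "insignias");
        System.out.println(matchingClassNames(attrs)); // prints [insignia-large, insignia]
    }
}
```

Here "insignias" is rejected because of the trailing \b, whereas a [class*='insignia'] selector would accept it.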
There are 2 classes with the same name
<div class="website text:middle"> A</div>
<div class="website text:middle"> 1</div>
How to get A and 1? I tried using getElementById with :eq(0) and it gives out null
Method getElementById queries for elements with a specified id, not class; I'm not sure what you were trying to query with :eq(0) either.
Try:
// String html = ...
Document doc = Jsoup.parse(html);
List<String> result = doc.getElementsByClass("text:middle").eachText();
// result = ["A", "1"]
EDIT
You can query for elements that match multiple classes! See Jsoup select div having multiple classes.
However, a colon (:) is a special character in css and needs to be escaped when it appears as part of a class name in a selector query. I don't think that jsoup currently supports this and simply treats everything after a colon as a pseudo-class.
To add to Janez's correct answer - while jsoup's CSS selector (currently) doesn't support escaping a : character in the class name, there are other ways to get it to work if you want to use the select() method instead of getElementsByXXX -- e.g. if you want to combine selectors in one call:
Elements divs = doc.select("div[class=website text:middle]");
That will find div elements with the literal attribute class="website text:middle". Example.
Or:
Elements divs = doc.select("div[class~=text:middle]");
That finds elements with the class attribute that matches the regex /text:middle/. Example
For the presented data though, I think the getElementsByClass() DOM method is the way to go and the most general. I just wanted to show a couple of alternatives for other cases.
document.querySelectorAll(".website")[0] // 0 is the index of the match
You should use querySelector; it is supported by every browser.
Check this for support details: support
I have a situation where I need to include both of the two possible words from JSoup selector. I have already done it for the first word, but struggle to have some kind of logical OR 'contain another word'. Code I already have:
Iterator<Element> activity = table.select("td[class=xl75], td[class=xl71], td[class=xl73]:contains(word1))").iterator();
I have tried to edit it this way:
Iterator<Element> activity = table.select("td[class=xl75], td[class=xl71], td[class=xl73]:contains(word1):contains(word2)").iterator();
but it's not working. Any ideas how to have either of the two words matched in one selector?
You can consider using regex matching for this kind of work.
Your selector:
td[class=xl75], td[class=xl71], td[class=xl73]:contains(word1):contains(word2)
can be rewritten with :matches, which takes a regex and therefore supports alternation (word1 or word2):
td[class=xl75], td[class=xl71], td[class=xl73]:matches((word1)|(word2))
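The difference between the two selectors is AND versus OR: chaining :contains(word1):contains(word2) requires both words, while the alternation (word1)|(word2) accepts either. The sketch below shows that difference with plain java.util.regex, on the assumption that :matches tests the element text with a regex find; the class and method names are hypothetical, and unlike :contains the check here is case-sensitive.

```java
import java.util.regex.Pattern;

class EitherWordMatch {
    // Alternation: succeeds when either word occurs anywhere in the text.
    private static final Pattern EITHER = Pattern.compile("(word1)|(word2)");

    // Rough equivalent of :matches((word1)|(word2)).
    static boolean containsEither(String text) {
        return EITHER.matcher(text).find();
    }

    // Rough equivalent of chaining :contains(word1):contains(word2).
    static boolean containsBoth(String text) {
        return text.contains("word1") && text.contains("word2");
    }

    public static void main(String[] args) {
        String cell = "only word2 here";
        System.out.println(containsEither(cell)); // prints true
        System.out.println(containsBoth(cell));   // prints false
    }
}
```

Also note that in a comma-separated selector group a trailing pseudo-selector belongs only to the last selector, so in the selectors above it filters td[class=xl73] and leaves the xl75 and xl71 cells unfiltered.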
I did the following search
parts.get(i).findElements(By.xpath("//li[starts-with(@class, '_lessons--row-')]"))
and it returned dozens of results, while I see in Developer Tools, that there are no more than 3 of them.
parts.get(i) returns a single WebElement.
It looks like it searches not the children of the given element, but the entire page. Can the double slash cause this? What does a double slash mean in XPath?
Your XPath is faulty here:
"//li[starts-with(@class, '_lessons--row-')]"
// searches from the document root. To search from the current node, prepend a .:
".//li[starts-with(@class, '_lessons--row-')]"
Try your XPath with .// ; you should start the XPath with "." to stop it from finding elements from the root.
.//li[starts-with(@class, '_lessons--row-')]
// matches descendants starting at the document root, regardless of which element you call findElements on. In your case you are trying to locate using
//li[starts-with(@class, '_lessons--row-')]
So it will return every match in your HTML. If you want only the elements whose class starts with _lessons--row- inside a specific part of the page, you have to make your XPath more specific,
e.g.
//div[@id='someid']//li[starts-with(@class, '_lessons--row-')]
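The same root-versus-context behaviour can be reproduced without a browser using the JDK's built-in XPath engine. This is only a sketch: the markup, the ids a and b, and the class and method names are made up for illustration, and the first list plays the role of parts.get(i).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RelativeXPathDemo {
    // Hypothetical page: two lists, three matching rows in total.
    static final String XML =
        "<root>"
      + "<ul id='a'><li class='_lessons--row-1'/><li class='_lessons--row-2'/></ul>"
      + "<ul id='b'><li class='_lessons--row-3'/></ul>"
      + "</root>";

    // Evaluates the expression with the first list as the context node and counts the hits.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node firstList = (Node) xpath.evaluate("//ul[@id='a']", doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(expression, firstList, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Leading // ignores the context node and scans the whole document.
        System.out.println(count("//li[starts-with(@class, '_lessons--row-')]"));  // prints 3
        // Leading .// stays inside the context node.
        System.out.println(count(".//li[starts-with(@class, '_lessons--row-')]")); // prints 2
    }
}
```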
I had a similar case, but . before // didn't help me. Just added findElements(By.xpath("your_xpath")).stream().filter(WebElement::isDisplayed).toList() as a workaround.
I would like to find any WebElement based on text using XPath.
WebElement that I am interested to find,
Its HTML,
Basically my WebElement that I am trying to retrieve by Text contains an input element.
I currently use,
driver.findElement(By.xpath("//*[normalize-space(text()) = 'Own Hotel']"));
which does not find the WebElement above, but it usually works to retrieve all other web elements.
Even,
By.xpath("//*[contains(text(),'Own Hotel')]")
did not give me any results. Although I am interested in exact text match.
I am looking for a way to find a web element by its text, irrespective of the elements that are present inside it. If the text matches, it should return the WebElement.
Thanks!
It seems the text is wrapped inside a label and not the input. Try this:
driver.findElement(By.xpath(".//label[text()[normalize-space() = 'Own Hotel']]"));
There is a nice explanation of this XPath pattern here.
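A likely reason the original expressions fail: when text() is passed to a string function such as normalize-space() or contains(), only the first text node is used, and in a label that starts with a child element the first text node may be nothing but whitespace. The sketch below demonstrates this with the JDK's XPath engine. The markup is an assumption (the question's HTML is not shown here), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LabelTextXPathDemo {
    // Assumed markup: the label's first text node is the whitespace before <input>.
    static final String XML =
        "<form>\n"
      + "  <label>\n"
      + "    <input type='radio' name='ownHotel'/>\n"
      + "    Own Hotel\n"
      + "  </label>\n"
      + "</form>";

    // Counts the nodes selected by the expression in the assumed markup.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first (whitespace-only) text node is normalized, so nothing matches.
        System.out.println(count("//*[normalize-space(text()) = 'Own Hotel']"));         // prints 0
        // The predicate is applied to every text node of the label, so the second one matches.
        System.out.println(count("//label[text()[normalize-space() = 'Own Hotel']]"));   // prints 1
    }
}
```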
In the HTML below:
The innerText Own Hotel within the <input> node contains a lot of white-space characters in the beginning as well at the end. Due to the presence of these leading and trailing white-space characters you can't use the location path text() as:
text() selects all text node children of the context node
As an alternative, you need to use the String Function string normalize-space(string?) as follows:
driver.findElement(By.xpath("//*[normalize-space()='Own Hotel']"));
However, it would be a better idea to make your search a bit more granular by adding the tagName and preferably a unique attribute, as follows:
Using tagName and normalize-space():
driver.findElement(By.xpath("//input[normalize-space()='Own Hotel']"));
Using tagName, the name attribute and normalize-space():
driver.findElement(By.xpath("//input[@name='ownHotel' and normalize-space()='Own Hotel']"));
References
You can find a couple of relevant discussions using normalize-space() in:
How to click on a link with trailing white-space characters on a web page using Selenium?
How to locate and click the element when the innerText contains leading and trailing white-space characters using Selenium and Python
How to click on the button when the textContext contains leading and trailing white-space characters using Selenium and Python
Possible duplicate: RegEx matching HTML tags and extracting text
I need to get the text between HTML tags like <p></p> or whatever. My pattern is this:
Pattern pText = Pattern.compile(">([^>|^<]*?)<");
Does anyone know a better pattern? This one is not very useful. I need it to index the content of a web page.
Thanks
SO is about to descend on you. But let me be the first to say, don't use regular expressions to parse HTML. Here is a list of Java HTML Parsers. Look around until you see an API that suits your fancy and use that instead.
It looks like you are trying to use the | operator inside a negative set, which is neither working nor needed. Just specify the characters that you don't want to match:
Pattern pText = Pattern.compile(">([^<>]*?)<");
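For illustration only, here is how the corrected pattern behaves on a small, hypothetical fragment. It also matches the empty gaps between adjacent tags, so the sketch trims each capture and skips blanks; the class and method names are made up. As the other answers point out, this approach is fragile on real-world HTML (comments, scripts, attribute values containing >).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // Captures any run of characters that sits between a '>' and the next '<'.
    private static final Pattern TEXT_BETWEEN_TAGS = Pattern.compile(">([^<>]*?)<");

    // Returns the non-blank text runs found between tags, in document order.
    static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = TEXT_BETWEEN_TAGS.matcher(html);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) { // skip the empty gap between "</p>" and "<p>"
                texts.add(text);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Hello</p><p>World</p>")); // prints [Hello, World]
    }
}
```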
Don't use regular expressions when parsing HTML.
Use XPath instead (if your HTML is well formed). You can reference text nodes using the text() function very easily.
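A minimal sketch of the XPath route using only the JDK, assuming the input is well-formed XML/XHTML (the JDK parser will reject tag soup). The sample markup and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTextDemo {
    // Returns the trimmed, non-blank text nodes that are direct children of any <p> element.
    static List<String> paragraphTexts(String wellFormedHtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//p/text()", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getNodeValue().trim();
                if (!text.isEmpty()) {
                    texts.add(text);
                }
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello</p><div><p>World</p></div></body></html>";
        System.out.println(paragraphTexts(html)); // prints [Hello, World]
    }
}
```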