How to get text from div style using JSOUP - java

How can I get the text "xxxx" and its URL using Jsoup?
<div style="width:45%;float:left;border: dashed 1px #966;margin:0 10px;padding:10px;height:400px;">
<ul>
<li>xxxx</li>
<li><b>years:</b>2015</li>
<li><b>language:</b>non </li>
<li><b>color:</b>color</li>
</ul>
</div>
This is my current approach but I receive nothing:
Elements mvYearElement = doc.select("div[style*=width:45%;float:left;border: dashed.1px #966;margin:0 10px;padding:10px;height:400px;]");

The problem is probably that styles do not need to appear in any particular order. Your selector, however, fixes the order and lists a lot of styles. I would try to identify the part of the style that really distinguishes this element and use only that part. Since I don't know the rest of the HTML, I can only guess what that distinguishing part is. This, maybe?
Elements els = doc.select("div[style*=dashed]");
That is only a wild guess however. But maybe it is also the contents of the div that are discriminating it from the others? In that case you could do something like this:
Elements els = doc.select("div[style]:has(ul)");
Or something else. If you would share more of the HTML I could be more specific.

Related

How to read html on java without jsoup or any other third party?

I have a StringBuilder object in my class which I want to display in the UI. This object contains a few HTML tags, for example <li> and <br>. I would like to know how to format it so that the tags are not shown literally on screen but are rendered in a readable form.
Note: I don't want to remove these tags and get plain text. Rather, if there is a <br> tag, it should break the line when the text is displayed. Also, due to project restrictions, I don't want to use any third-party library such as jsoup.
Any help to achieve this would be appreciated!
How about a simple .toString().replaceAll(...) with specific replacements? For example:
<br> = \r\n
<li> = \r\n •
...and so on..
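The replacement idea above could be sketched roughly as follows. This is only a minimal sketch using the JDK alone; the class name HtmlToText and the method toPlainText are made up for illustration, "\n" is used instead of "\r\n" as an assumption about the target UI, and only <br> and <li> are handled.

```java
class HtmlToText {
    // Hypothetical helper: maps a few known tags to plain-text equivalents.
    static String toPlainText(CharSequence html) {
        String text = html.toString();
        // (?i) ignores case; \\s*/? also accepts <br/> and <br />.
        text = text.replaceAll("(?i)<br\\s*/?>", "\n");
        // Each list item starts on a new line with a bullet (U+2022).
        text = text.replaceAll("(?i)<li[^>]*>", "\n\u2022 ");
        // Closing </li> tags carry no layout of their own, so drop them.
        text = text.replaceAll("(?i)</li>", "");
        return text;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Items:<li>one</li><li>two</li><br>done");
        System.out.println(toPlainText(sb));
    }
}
```

Any tag that is not listed stays in the output unchanged, so the list of replacements has to cover every tag the StringBuilder can actually contain.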

Selenium Web driver clicking images

How do I select an image and click on it using Selenium WebDriver? Say the markup is this:
<style type="text/css"> <ul id="nav"> <li> <li> <li> <li> <li> My Dashboard </li> </ul>
Would I use
driver.findElement(By.linkText("My Dashboard")).click();
or something else?
If you want to click the link in your example, you can use the locator you wrote, a CSS selector (for example, By.cssSelector("#nav a"), which looks for a link inside the "nav" list, or By.cssSelector("a[href='dashboard.action']"), which looks for a link with a specific href), or an XPath selector.
The important thing is to have a unique identifier to locate your element and an identifier that will fire 100% of the time.
For example, if you expect the link text to change on you, then don't look for that particular link text, because you have no guarantee that it will work 100% of the time.
Similarly, if there are 30 different elements that have the same id tag, don't use that either.
If things turn out to be very complex... that is, if you are in a large page with a lot of unknown variables, find by XPATH.
In the end, it really depends on the complexity of the website you are entering, and the goal of what you need done.
For more information, see the Selenium Javadocs and open the By class in the sidebar for a list of the locator methods and how to use them.
If you need to click a link with an image, it would be better to locate the element with an explicit wait.
Example :
new WebDriverWait(driver, timeout).until(ExpectedConditions.presenceOfElementLocated(locator));

Get content of list of span elements with HTMLUnit and XPath

I want to get a list of values from an HTML document. I am using HTMLUnit.
There are many span elements with the class topic. I want to extract the content within the span tags:
<span class="topic">
Lean Startup
</span>
My code looks like this:
List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()");
However whenever I try to iterate over the list I get a NoSuchElementException. Can anyone see an obvious mistake? Also links to good tutorials would be appreciated.
If you know you'll always have an <a> then just add it to the XPath and then get the text() from the a.
If you don't really know if you always will have an a in there then I'd recommend to use the .asText() method that all HtmlElement and their descendants have.
So first get each of the spans:
List<?> topics = (List)page.getByXPath("//span[@class='topic']");
And then, in the loop, get the text inside each of the spans:
topic.asText();
text() will only extract the text from that element, and that example you've given has no text component, only a child element.
Try this instead:
List<?> topics = (List)page.getByXPath("//span[@class='topic']");
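To see what selecting the span and then reading its text gives you, here is a rough sketch that does not use HtmlUnit at all. It swaps in the JDK's own javax.xml.xpath API and assumes the markup is well-formed XML, which real pages often are not; the class name TopicXPath and the sample markup are invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TopicXPath {
    // Hypothetical helper: returns the trimmed text of every <span class="topic">.
    static List<String> topics(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Select the span elements themselves, not their text() nodes.
            NodeList spans = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//span[@class='topic']", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                // getTextContent() concatenates the text of the element and all its descendants.
                result.add(spans.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<div><span class=\"topic\">\nLean Startup\n</span>"
                + "<span class=\"topic\"><a href=\"#\">Agile</a></span>"
                + "<span class=\"other\">skip</span></div>";
        System.out.println(topics(xml));
    }
}
```

Because the text is read from the whole element, a span that wraps its text in a child element such as <a> is handled the same way as a span that contains the text directly.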

get the second div from an HTML content using regex JAVA

I have HTML code in which several divs share the same id. Can we extract the second one?
HTML code
<div id="test>example </div>
<div id ="test">example11</div>
I need to extract the example11
This works: (?s)<div id="test>.*<div id ="test">(.*?)</div> but I have a lot of divs with the same ID, so it won't scale. Can anyone tell me whether there is another way to extract the content?
I know regex is not good for HTML parsing, but I have no choice.
Try this:
<div.*>.*</div><div.*>(.*)</div>
Now you can select the first group, and it's done ;)
A dirty solution would be:
<div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>.*</div><div.*>(.*)</div>
Not so proud of this one, of course. I will think about it.
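Instead of repeating the pattern once per div, one option is to match a single div and call find() as many times as needed. This is only a sketch under assumptions: the class NthDiv and the method nthDivContent are made-up names, the sample assumes both divs are written as id="test" with the quotes closed, and nested divs are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthDiv {
    // Reluctant (.*?) stops at the first </div>; \\s* tolerates spaces around '='.
    private static final Pattern TEST_DIV =
            Pattern.compile("(?s)<div\\s+id\\s*=\\s*\"test\"[^>]*>(.*?)</div>");

    /** Returns the content of the n-th (1-based) matching div, or null if there are fewer. */
    static String nthDivContent(String html, int n) {
        Matcher m = TEST_DIV.matcher(html);
        String content = null;
        for (int i = 0; i < n; i++) {
            // Each find() resumes right after the previous match.
            if (!m.find()) {
                return null;
            }
            content = m.group(1);
        }
        return content;
    }

    public static void main(String[] args) {
        String html = "<div id=\"test\">example </div>\n<div id =\"test\">example11</div>";
        System.out.println(nthDivContent(html, 2));
    }
}
```

With this shape the pattern stays the same no matter how many divs share the id; only the index passed in changes.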

How to check in Java color of text when I have .html and .css files?

I have HTML with CSS and I want to check the actual color (and other visual text attributes) of a specified piece of text in the HTML document. Can I do this with Jsoup, or must I look for a real HTML engine/processor? Processing speed is one of the main factors.
I think he wants to retrieve this data in a Java program. So you need to do a few things:
Download stylesheet files.
Parse html and find class attribute.
Match .class in css with html attribute and find specific information you want.
But beware if you want to find information about an HTML element without a class attribute. In that case you need to find the XPath of the HTML element, e.g.:
<table class="entityTable">
<tr>
<td> <input type="text" value="abcdef" /></td>
</tr>
Then you need to find xpath like : body/div/.../table/tr/td/input and you need to match any css rules which can influence your input tag attributes.
.entityTable tr td input
{
color:red;
}
This is much more difficult, so if the HTML to parse is your own page, put a class attribute on every HTML tag. Otherwise you need to find a way to match HTML tags to CSS rules.
Cheers.
Though it is still in beta, the Cobra HTML parser has this capability.
If you need accurate information about an object on a web page,
such as the default border of a standard HTML table or the color of a standard link,
use the Firebug extension for Firefox.
If you're doing this in an applet, you can use javascript to collect the information, and pass it to your applet.
CSSBox is definitely what you want. It allows you to load external CSS and transform it into inline styles for every DOM element.
http://cssbox.sourceforge.net/manual/
