I have the html:
<p>
click here
Welcome
</p>
And I just want to retrieve the "Welcome" part using XPath combined with the Jaxen library. The XPath I am using is:
//p/text()
Now when I remove the /text() it retrieves:
click here
Welcome
With the /text() added, it retrieves null.
Is there any other way to retrieve everything inside the p tag but excluding any other tags?
From the XML parser's point of view, there are multiple text nodes to choose from (Welcome and the whitespace preceding and following it), so it doesn't choose any one. You have a few options, mainly stripping the whitespace before parsing or being more specific in the query, such as selecting the second text node:
//p/text()[2]
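As a rough sketch of the "be more specific" option, the snippet below evaluates both a positional query and a whitespace-filtering query. It uses the JDK's built-in javax.xml.xpath API rather than Jaxen, and it assumes "click here" was wrapped in an <a> element (the markup in the question appears to have lost that tag). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ParagraphTextDemo {

    // Evaluates an XPath expression against an XML string and returns the
    // string value of the result (for a node-set, the first node in it).
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumed markup: the link is an <a> child of <p>.
        String xml = "<p>\n<a href=\"#\">click here</a>\nWelcome\n</p>";

        // Positional: the second text node of <p> (the first is the
        // whitespace before <a>). Trim the surrounding newlines.
        System.out.println(evaluate(xml, "//p/text()[2]").trim());

        // Position-independent: the first text node that is not only whitespace.
        System.out.println(evaluate(xml, "normalize-space(//p/text()[normalize-space()])"));
    }
}
```

The second expression may be the more robust choice if the amount of whitespace around the link can vary, since it does not depend on how many text nodes precede "Welcome".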
I have a StringBuilder object in my class which I want to display in the UI. This object contains a few HTML tags, for example <li>, <br>, etc. I would like to know how to format this object so that the HTML tags are not shown as-is on screen but are instead converted to a readable format.
Note: I don't want to remove these tags and get plain text. Rather, if there is a <br> tag, it should produce a line break when the text is displayed. Also, due to project restrictions, I don't want to use any third-party library such as jsoup.
Any help to achieve this would be appreciated!
How about a simple .toString().replaceAll with specific replacements? Like:
<br> = \r\n
<li> = \r\n •
...and so on..
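A minimal sketch of that idea, using only the JDK. The class name, the exact regular expressions, and the choice to drop closing </li> tags are my assumptions; a real tag list would need more entries.

```java
final class HtmlToPlainText {

    // Converts a handful of HTML tags into plain-text equivalents.
    // Only <br> and <li> are handled here; other tags are left untouched.
    static String render(CharSequence html) {
        return html.toString()
                // <br>, <br/>, <br /> in any letter case become a line break
                .replaceAll("(?i)<br\\s*/?>", "\r\n")
                // <li> (with optional attributes) becomes a new line plus a bullet;
                // the \s guard keeps tags such as <link> from matching
                .replaceAll("(?i)<li(\\s[^>]*)?>", "\r\n\u2022 ")
                // closing </li> carries no visible text, so it is dropped
                .replaceAll("(?i)</li>", "");
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Items:<br><li>first</li><li>second</li>");
        System.out.println(render(sb));
    }
}
```

This works for simple, predictable markup. If the tags can carry arbitrary attributes or nesting, a chain of regex replacements becomes fragile fairly quickly.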
I'm trying to use JSoup to parse any web page and programmatically identify the elements that are content blocks, defined as any element that occurs multiple times and contains text, a link, and an image. All was going well until I got to http://fansided.com/. Images on this page show up not in an <img> tag, but in an attribute like data-background="http://cdn.fansided.com/wp-content/blogs.dir/314/files/2015/01/8O7hjxQ-268x150.png".
Is there a way to use a single CSS selector (perhaps a regex?) that will select all elements containing images, regardless of their type?
Try this:
Document doc = Jsoup.connect("http://fansided.com/").userAgent("Mozilla").get();
Elements select = doc.select("[data-background],[style~=background:url]");
It will get any element that has a data-background attribute, or a style attribute whose value matches background:url.
I have an Android app with search functionality. The search loops through locally stored HTML files and wraps words that match the entered word in a span with a background color, the same as pressing Ctrl+F on a desktop. The problem I am having is that if the user searches for head, body, div, span, etc., it also adds a span inside the HTML tags themselves. My question: is there an Android validation library that deals with this issue, or do I need to make my own blacklist? I am aware of Android form validator libraries, but I am not sure they are built for what I am looking for.
I've used jsoup before to strip out unwanted HTML tags. You could do this in order to make the HTML data more "searchable". Also look at Android's Html.escapeHtml(CharSequence), which returns an HTML-escaped version of the given text.
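If pulling in a library is not an option, one alternative to a blacklist is to apply the highlight only to the text between tags and copy the tags through unchanged. The following is a rough sketch with plain java.util.regex, not something from the answers above. It assumes reasonably well-formed HTML in which literal < characters in text are escaped as &lt;, and the class name and highlight style are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TextOnlyHighlighter {

    // Group 1 matches a whole tag, group 2 a run of text between tags.
    private static final Pattern TAG_OR_TEXT = Pattern.compile("(<[^>]*>)|([^<]+)");

    // Wraps case-insensitive occurrences of term in a highlighting span,
    // but only inside text runs; tags are copied through untouched.
    static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html; // nothing to search for
        }
        Pattern word = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        StringBuilder out = new StringBuilder();
        Matcher m = TAG_OR_TEXT.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                out.append(m.group(1)); // a tag: never modified
            } else {
                out.append(word.matcher(m.group(2))
                        .replaceAll("<span style=\"background-color:yellow\">$0</span>"));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The <span> tags survive; only the word "span" in the text is wrapped.
        System.out.println(highlight("<span>my span</span>", "span"));
    }
}
```

A search for "head" or "div" then leaves <head> and <div> alone, because those strings only occur inside tag tokens.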
I am using Apache PDFBox to read a PDF document that has a hierarchy defined by bookmarks. The hierarchy is in a tree form with contents only at the leaf level.
Extracting the text between two leaf level bookmarks using the following code:
Stripper.setStartBookmark(),
Stripper.setEndBookmark(),
Stripper.writeText(),
returns the text of the whole page instead. In short, my problem is similar to that mentioned in this thread.
Is there a way to extract the contents between two bookmarks?
If so, what should be the change in my code?
I am guessing that your bookmark does not contain the correct data.
It sounds like the bookmark you are using is only pointing to the page where your content starts, rather than a location on the page.
Here is an example of a bookmark that contains location data:
<Title Action="GoTo" Style="bold" Page="2 FitH 518">
Title Name
</Title>
I'm running test automation software that will rely on "id" tags to recognize controls.
I'm developing in Java on Eclipse using the GWT plugin and have tried both of the methods below to set the id of a button "add".
add.setId("addId");
DOM.setElementAttribute(add.getElement(), "id", "addId");
Neither of these modifies the id property correctly. Have you had this problem before, or do you know a workaround?
Thank you for any help!
Jerry
If I remember correctly, several browsers (or probably just Internet Explorer) won't let you set a DOM element's ID after it has been appended to the DOM. This limitation would apply even if you did this directly in hand-coded JavaScript. The browser won't show any error when you set the id attribute, but it won't update the attribute either.
So you need to set the ID before appending the element to the DOM.
EDIT
From the discussion below, it appears that you were assuming that setting the ID on the Button widget's DOM element will set the ID on an <input type="button"> DOM element. But this assumption is not proving to be correct, because the Button widget wraps the <input type="button"> DOM element in other DOM elements (like table or div).
EDIT
You may want to try the Button.wrap(element) method if you want to customize the <input type="button"> element. First create (DOM.createButton()) or locate a DOM element, set its id, and wrap it using Button.wrap(element).
A long time ago I had a Selenium test suite for a GWT app, and I used the ensureDebugId method to set the ID.
Edit - It still seems to be part of the API