How to check the visibility of an element in Jsoup? - java

I cannot find any direct method like isDisplayed() on a Jsoup Element.
I can detect an input with type="hidden" using the following code.
"hidden".equalsIgnoreCase(elm.attr("type"))
But I also need to catch elements hidden through CSS, as well as elements that are hidden because an ancestor is hidden.

Pshemo said it already in his comment: Jsoup is not a JavaScript interpreter. Jsoup also does not merge external CSS information into the HTML. It just parses HTML, and it is very good at that. Nothing more, but also nothing less. You can also fetch HTML pages over the network with Jsoup, but that is really the limit of it.
About your problem: think hard about whether you really need to know if an element is visible or hidden. If you do in your context, you probably need a testing framework that behaves like a browser. For Java there are very good bindings to Selenium WebDriver. It drives a real browser to load and test pages. You can also scrape the content with Selenium. I have had good experience using both: Selenium for accessing the web content and then switching over to Jsoup for the actual scraping. In your case you can use the WebDriver API directly to find out whether an element is hidden or not.
Selenium WebDriver works with Firefox, Chrome and a number of other browsers. If you need a lightweight alternative you can use a headless browser. For that there is PhantomJS, which is well supported by Selenium, or HtmlUnit, which is even lighter and uses the Rhino interpreter for JavaScript.
As you can see, there are quite a few options to choose from to achieve what you want. Just not Jsoup, although it is a great library.
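That said, if the only CSS you care about is written in inline style attributes, a rough approximation is possible without a browser. The sketch below is a hypothetical helper, not part of Jsoup: it looks only at style strings you hand it, ignores external stylesheets, classes and JavaScript, and treats any hidden ancestor as hiding the element (which is a simplification, since a child can override visibility).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a Jsoup API): inspects inline style text only.
// External stylesheets, class rules and JavaScript are NOT considered.
class InlineStyleVisibility {

    // True if a single inline style string declares display:none or visibility:hidden.
    static boolean hidesElement(String style) {
        if (style == null) {
            return false;
        }
        for (String declaration : style.split(";")) {
            String[] parts = declaration.split(":", 2);
            if (parts.length < 2) {
                continue; // empty or malformed declaration
            }
            String property = parts[0].trim().toLowerCase(Locale.ROOT);
            String value = parts[1].replace("!important", "").trim().toLowerCase(Locale.ROOT);
            if (property.equals("display") && value.equals("none")) {
                return true;
            }
            if (property.equals("visibility") && value.equals("hidden")) {
                return true;
            }
        }
        return false;
    }

    // styles: the element's own style attribute followed by each ancestor's.
    // Simplification: any hiding ancestor counts as hiding the element.
    static boolean hiddenSelfOrAncestor(List<String> styles) {
        for (String style : styles) {
            if (hidesElement(style)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> chain = List.of("color: red", "display: none");
        System.out.println(hiddenSelfOrAncestor(chain));
    }
}
```

With Jsoup, the strings would come from elm.attr("style") for the element itself and from the same call on each entry of elm.parents(). Anything hidden by a stylesheet rule or by script will still be missed, which is why a real browser remains the reliable answer.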

Related

Automatically navigating a website with java

A few years ago I made a program in .NET that uses the webbrowser control. With that I was able to automatically log in to a website, navigate, and download pictures. It was GUI based since it was using the webbrowser control. It had the advantage that I could follow along and see if something went wrong.
What is the best way forward to replicate that idea in Java? Is there a similar free control that acts as a webbrowser and gives access to the DOM?
I suspect the optimal way would be to use the Google Chrome Developer tools to replicate the login via GET/POST methods, but at first would prefer the webbrowser approach.
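For the GET/POST route mentioned above, a minimal sketch using the JDK's java.net.http classes (Java 11+) might look like the following. The URL and the field names "username" and "password" are placeholders, not taken from any real site; the actual names have to be read from the login form or from the browser's network tab.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: builds a form-encoded login POST without sending it.
class FormLogin {

    // Encodes fields as application/x-www-form-urlencoded, keeping map order.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Builds the POST request; url and field names are assumptions of this sketch.
    static HttpRequest loginRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");   // placeholder field name and value
        fields.put("password", "s3cret");  // placeholder field name and value
        HttpRequest request = loginRequest("https://example.com/login", fields);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, the request would be passed to an HttpClient, created with a java.net.CookieManager as its cookie handler so the session cookie from the login is reused on later requests. Sites that add hidden tokens to the form or log in through JavaScript will need more than this.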
You can use Selenium for that. It is a free (open source) automated testing suite for web applications across different browsers and platforms. It mainly focuses on automating web-based applications.
In Java, you can use Selenium, which gives you full control over web browsers as well as the DOM.
In Selenium, WebDriver is the interface that provides automated control of the browser you want to use.
This may help You!
Thanks!
You could use the JavaFX WebView, class javafx.scene.web.WebView.
It uses a WebKit engine that is HTML5 compliant and seems to be kept up to date (it was in Java 8 and 9).
The engine can interact with the JavaScript engine, which may help to introspect and navigate the page.
Example to get the "window" JS object:
JSObject window = (JSObject) webView.getEngine().executeScript("window");
WebView example:
JavaFx Webview HTML5 DragAndDrop

How to access html information that is generated by javascript?

I am trying to get article headlines from the NY Times.
But I think the html is generated by javascript, as it is only visible when I use the 'inspect element' on firefox.
How can I get to the articles? Probably, one of the ways is to emulate a browser but that seems like overkill.
I would prefer to do this in Java but Python is okay too. Your help is appreciated!
edit:
I tried using the API, but it returns a lot of bad URLs (page not found). Does anyone have any more ideas on how to get the URLs and headlines?
Selenium is probably what you're looking for; it's a browser automation framework.
You can use Python but Selenium actually uses Firefox to parse a site's content (last time I heard).
You can get the python version here but there are other options.
You could try to use a browser without GUI like HtmlUnit. It has good JavaScript-support and you're able to read the contents of the page from your Java-program.
As an alternative solution to this particular problem, how about using the New York Times API? They provide JSONP for JavaScript support. Using the API is probably more future-proof if they ever change the site layout.

How to make sure a website runs with JavaScript disabled

Our website UI is built in JavaScript and jQuery. A lot of the work is performed by jQuery: the UI does a lot of filtering, sorting, searching and tabular display.
Now we want to make sure that the website works fine even when JavaScript is disabled in the browser.
SPRING MVC <>
Can anybody give me an idea of how we can achieve this?
Would any other technology handle it?
Would GWT be able to achieve this?
If your website is built using JavaScript technology itself, then unless you build it WITHOUT JavaScript, there is no way you can achieve this.
GWT is out because it basically compiles Java-ish code into JavaScript. With Spring MVC, you can rewrite your site to use only plain views (JSP) and server-side actions (MVC controllers) for sorting, filtering and so on.
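As a rough illustration of moving the sorting to the server, here is a framework-free sketch. The Product type and the "name"/"price" sort keys are made up for the example; in a Spring MVC application a controller method would read the sort key and direction from request parameters, call something like this, and hand the sorted list to the JSP, so that a plain link or form submit re-sorts the table without any JavaScript.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: the kind of sorting a server-side action could do
// instead of client-side JavaScript. All names here are hypothetical.
class ServerSideSort {

    // Hypothetical row type for the table being displayed.
    static final class Product {
        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    // Returns a sorted copy; unknown sort keys fall back to sorting by name.
    static List<Product> sorted(List<Product> rows, String sortBy, boolean descending) {
        Comparator<Product> order;
        if ("price".equals(sortBy)) {
            order = Comparator.comparingDouble((Product p) -> p.price);
        } else {
            order = Comparator.comparing((Product p) -> p.name, String.CASE_INSENSITIVE_ORDER);
        }
        if (descending) {
            order = order.reversed();
        }
        List<Product> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Product> rows = List.of(
                new Product("pear", 2.0),
                new Product("Apple", 3.5),
                new Product("fig", 1.0));
        for (Product p : sorted(rows, "price", true)) {
            System.out.println(p.name + " " + p.price);
        }
    }
}
```

Filtering and searching can follow the same pattern: the request carries the criteria, the server applies them, and the view only renders the result.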
What you were supposed to do with this requirement was build a basic site that works without JavaScript and then use JavaScript to make it much more whizzbang (pretty, with cool effects). But since you have already built the site, the only solution I can think of off the top of my head is to create a new basic site in plain HTML and make it the default site. From that basic site you can check whether JavaScript is enabled and then send the user to the whizzbang (JavaScript-enabled) site with a simple JavaScript redirect.
Some sites can degrade gracefully from a faster, slicker version that uses JS, to another that does not. Unfortunately that does not seem the case with your site.
One strategy is to define a redirect in HTML (set to 5-10 seconds) that points to needjs.html, which explains to the user:
Sorry, we do not have the ability to provide this site without JS.
In JS, cancel the redirect.
The majority of sites nowadays presume JavaScript is available. If you want to check how your site behaves, turn off JavaScript in the browser. If you want content that appears only when JavaScript is disabled, put it in a <noscript> tag, but be aware that crawlers such as Googlebot (SEO) may process the page without running JavaScript and will hit this content. How do you make the site function nicely without JavaScript? Build some ninja HTML and CSS and do all your work server-side. But again, since almost every site presumes JavaScript is enabled, users who disable it are already familiar with how broken they've made the web. It may be sufficient to add a <noscript> element with a message saying that this site requires JavaScript.

GWT + symfony2, am I crazy?

Does anyone have experience integrating GWT and Symfony2?
Currently I'm using Symfony2 with the frontend being JQuery + HTML.
Writing Javascript drives me crazy although JQuery has already been used.
I'd like to know if there are any successful cases. GWT can generate the JavaScript for me,
so I would only need to write type-safe, object-oriented Java.
But there is another concern: with normal GWT practice, the HTML elements are all created dynamically. So when a page is crawled by a search engine, there are no elements for it to crawl. Is that a serious problem for SEO ranking?
GWT is a powerful tool: you write your code in Java and GWT generates cross-browser JavaScript.
Dynamic pages are created on the fly. These pages function well for users who visit the site, but they don't work well for search engine crawlers.
Why? Because dynamically generated pages don't actually exist until a user selects the variables that generate them. A search engine spider can't select variables, so the pages don't get generated and can't be indexed. There are different strategies available; for reference, check Dynamic sites SEO tips

Handling CSS and JavaScript when building a Java browser

My task is to create a simple web browser in Java.
So far it can only read HTML pages.
I'm using standard JEditorPane component to display webpages.
Now I am wondering whether you could explain to me how I can display at least some simple pages that contain CSS/JavaScript.
If you could point me to some useful links or appropriate examples I would be very happy.
Well, my advice would be to look at open source rendering engines such as Gecko - https://developer.mozilla.org/en/Gecko_FAQ
You can embed Gecko with Java using the JREX library - http://jrex.mozdev.org/
Starting from scratch with a problem like this is a very big task, and as your username is AmateurProgrammer, I wouldn't recommend it.
There is already some prior art in the Java browser segment.
Concerning JavaScript, you will have to use a JavaScript interpreter written in Java. A renowned one is Rhino (by Mozilla). Integrating it may prove to be an interesting challenge.
Concerning CSS, it seems the question has already been asked ...
