Closed 9 years ago.
I have used Jsoup for scraping, and it works perfectly until Ajax and JavaScript start playing their part in displaying the page content.
Does anyone have a clue how to scrape content that is only displayed via Ajax or JavaScript after the page has fully loaded?
Thanks in advance!
You can use a headless browser such as PhantomJS.
PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
To ease your work, you could use CasperJS.
CasperJS is a companion for PhantomJS which brings a greatly improved API to ease the creation of scraping and automation workflows.
These tools are very useful when you have to scrape websites with dynamic content, for instance websites where the content is only displayed after some JavaScript has run (sometimes including Ajax calls).
You can see an example of how Casper works here:
CasperJs and Jquery with chained Selects
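If the rest of the pipeline is in Java (as in the question), one common bridge is to run PhantomJS as an external process and feed its output to Jsoup. A minimal sketch, assuming the phantomjs binary is on the PATH and render.js is a hypothetical PhantomJS/CasperJS script that prints the rendered HTML to standard output:

```java
// Minimal sketch: run PhantomJS externally, capture the rendered HTML it prints,
// and parse it with Jsoup. "render.js" is a placeholder script name.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PhantomBridge {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("phantomjs", "render.js", "https://example.com");
        pb.redirectErrorStream(true);
        Process process = pb.start();

        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        process.waitFor();

        // The Ajax/JS-generated content is now present in the HTML.
        Document doc = Jsoup.parse(html.toString());
        System.out.println(doc.title());
    }
}
```

The same approach works for any command-line headless browser; only the binary name and script change.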
You can't do it directly with JSoup. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searches for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.
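For a Java project, HtmlUnit is one such headless engine (it is not named in this answer, so treat it as one illustrative option). A minimal sketch, assuming a reasonably recent HtmlUnit 2.x and Jsoup on the classpath:

```java
// Minimal sketch: fetch a JavaScript-rendered page with HtmlUnit,
// then hand the resulting HTML to Jsoup for scraping.
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HeadlessFetch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = webClient.getPage("https://example.com"); // placeholder URL
            // Give asynchronous scripts (Ajax calls) up to 10 seconds to finish.
            webClient.waitForBackgroundJavaScript(10_000);

            Document doc = Jsoup.parse(page.asXml());
            System.out.println(doc.title());
        }
    }
}
```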
Closed 7 years ago.
I want to scrape my website and then use the data from it to populate elements in my app. The website has a login page, and certain pages only open after logging in.
I started working with HtmlUnit because it is a headless browser, and I completed the custom API in a Java IDE. Later I tried to use the JAR I generated from the IDE and found that there are incompatibility issues between HtmlUnit and Android.
Can anyone propose a solution to this problem?
Edit:
Since no one actually answered this question, I am currently going with a workaround using Android's native WebView: I set its visibility to invisible and, using a JavaScript interface to a Java object, I can inject JS code to scrape any data.
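A minimal sketch of that workaround, assuming the standard Android WebView APIs; the HtmlSink class and its onHtml callback are hypothetical names chosen for illustration:

```java
// Minimal sketch of the invisible-WebView workaround: load the page, let its
// JavaScript run, then push the rendered DOM back into Java through a bridge object.
// "HtmlSink" and "onHtml" are hypothetical names for this example.
import android.view.View;
import android.webkit.JavascriptInterface;
import android.webkit.WebView;
import android.webkit.WebViewClient;

public class ScrapingWebView {

    /** Receives the rendered HTML from the page's JavaScript context. */
    public static class HtmlSink {
        @JavascriptInterface
        public void onHtml(String html) {
            // Called through the WebView's JavaScript bridge (off the UI thread).
            // Hand the fully rendered HTML to Jsoup or any other parser here.
        }
    }

    public static void setUp(WebView webView, String url) {
        webView.setVisibility(View.INVISIBLE);              // keep the view hidden
        webView.getSettings().setJavaScriptEnabled(true);
        webView.addJavascriptInterface(new HtmlSink(), "HtmlSink");
        webView.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(WebView view, String pageUrl) {
                // Once the page (including its scripts) has loaded, send the DOM back to Java.
                view.loadUrl("javascript:HtmlSink.onHtml(document.documentElement.outerHTML);");
            }
        });
        webView.loadUrl(url);
    }
}
```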
Use the Jsoup library for this purpose. It is very handy and easy to use.
Start with this answer and follow the documentation and other examples.
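A minimal sketch of what that looks like for a site with a login page, which is what the question describes; the URLs, form field names, and CSS selector are placeholders:

```java
// Minimal sketch: log in with Jsoup, carry the session cookies to a protected page,
// then select the elements of interest. Field names and URLs are placeholders.
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupLoginScrape {
    public static void main(String[] args) throws Exception {
        // 1. Submit the login form and keep the cookies the server sets.
        Connection.Response login = Jsoup.connect("https://example.com/login")
                .data("username", "me")
                .data("password", "secret")
                .method(Connection.Method.POST)
                .execute();
        Map<String, String> cookies = login.cookies();

        // 2. Request a page that is only available after login.
        Document page = Jsoup.connect("https://example.com/members/data")
                .cookies(cookies)
                .get();

        // 3. Pull out whatever elements the app needs.
        for (Element row : page.select("table.results tr")) {
            System.out.println(row.text());
        }
    }
}
```

Note that Jsoup runs fine on Android, but it only fetches and parses static HTML; it does not execute JavaScript.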
Either you contribute to HtmlUnit to produce a version of HtmlUnit that does not use the dependencies missing on Android.
Or you can use an alternative method like this one, as that seems to be the path others have gone down before you.
For a real headless browser capable of handling recent web features to exist, a team would have to develop it and then consistently invest a lot of effort in it (supporting existing and upcoming features).
Apart from the Opera, Chrome, IE, and Firefox teams, there is no such team.
I would point to Chromium (CEF) as the most open and most actively supported across languages. Try CEF for Java.
Closed 2 years ago.
I'm planning to write a simple program that displays course prerequisites for students at my university in graph form (i.e. as a network of vertices and edges). I'd like to embed the program in a webpage to save people the hassle of downloading an executable.
Currently I'm looking at making my program a Java applet (Java also would give me access to the handy Swing library), but I don't like the fact that applets can't be viewed on most mobile devices.
What alternatives to applets exist for a project like this? I'd like to make it compatible with as many devices as possible, and also not have to build the graphics stuff from scratch.
One final consideration is I'm doing this mostly as a learning exercise. Ideally the tools I'd be working with would be helpful to know in the future.
Please don't use applets. They have been sufficiently deprecated.
The best way to do this is by using html/js/css. A lot of useful libraries exist that can help you with this task. jQuery seems obvious, but there's also d3.js or vis.js for displaying visual representations of data, and bootstrap for responsiveness (mobile friendliness).
You may use AngularJS with angular-chart for showing the graph in a web browser.
If your graph data is dynamic, you might use Node.js and MongoDB for the backend.
angular-chart is responsive, and it is easy to show a dynamic graph with it. But since it uses the HTML5 canvas, some mobile browsers might not show its transitions smoothly, depending on the device.
I personally would not use an applet in a web browser when the same functionality can be achieved using great frameworks like AngularJS.
Why don't you try building your project with the Servlet framework? By the way, CGI was superseded by servlets because of how requests are handled; applets follow the same pattern.
Closed 8 years ago.
I was wondering if there is a way to pull specific data from a website using Java (in Eclipse), for example stock information from Yahoo Finance or from Bloomberg. I've looked around and found some resources, but I haven't been able to get them to work; perhaps I'm missing something, or they're outdated. If possible, I also want to avoid downloading any external resources. I've read up on Jsoup and will consider it more seriously if all else fails.
Thanks for the help.
The answer is: yes, there are many different ways to pull data from websites.
There are essentially two alternatives, no matter the programming language (Java, .NET, Perl, ...):
The website has an API: in this case it will be a REST or SOAP API, or perhaps a custom one (REST and SOAP probably account for the vast majority). Check out that website's API documentation, if it has any. Also check out ProgrammableWeb for references.
The website doesn't have an API. You then need to do what you called screen-scraping here. Essentially, you send a series of HTTP GET or POST requests just as your browser would. The server replies with a response that contains HTML code. From there on, you need to parse the HTML to extract the information you need. This will require heavy-duty XPath (if the content is XML) or regular expressions (if the content is HTML or text).
Look at Apache HttpComponents to get you started.
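A minimal sketch of the screen-scraping alternative using Apache HttpComponents (HttpClient 4.x assumed; the URL is a placeholder):

```java
// Minimal sketch: fetch a page with Apache HttpClient 4.x and print the raw HTML.
// The HTML would then be parsed (e.g. with Jsoup, XPath, or regexes) to pull out the data.
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class FetchPage {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet get = new HttpGet("https://example.com/quote?symbol=GOOG"); // placeholder URL
            try (CloseableHttpResponse response = client.execute(get)) {
                String html = EntityUtils.toString(response.getEntity());
                System.out.println(html);
            }
        }
    }
}
```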
If all you want is finance information, Google has a JSON/REST API for that, and there's a question on SO that will help you: How can I get stock quotes using Google Finance API?
Yahoo also has one, and there is already a question about it on SO: Yahoo Finance All Currencies quote API Documentation
Closed 8 years ago.
I am writing a program to parse HTML pages and store images, but I have run into a problem. The page is generated dynamically by JavaScript, which means that when I download the page's source code, the links to the pictures are not in it. Can you please advise how to get around this? Alternatively, an example in Java would help. Thank you.
The page being downloaded:
http://www.lide.cz/detail/j0YbgS6Xp7AoMAOP
That is not as easy as it seems at first glance. You need a headless browser engine, such as PhantomJS or the like, that runs the JavaScript and returns the generated HTML to you.
See this answer for more information on that topic.
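Once a headless engine has produced the generated HTML, pulling the image links out of it is straightforward with Jsoup. A minimal sketch, where renderedHtml stands for that output:

```java
// Minimal sketch: pull absolute image URLs out of already-rendered HTML with Jsoup.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageLinks {
    public static void printImageUrls(String renderedHtml, String baseUrl) {
        // The base URL lets Jsoup resolve relative src attributes to absolute ones.
        Document doc = Jsoup.parse(renderedHtml, baseUrl);
        for (Element img : doc.select("img[src]")) {
            System.out.println(img.absUrl("src"));
        }
    }
}
```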
Closed 9 years ago.
I often have to run test cases on a site I work on. Most of the time I only need to check that an element exists on the site, or I have to scrape a bit of data off it. Up until now I have been using Jsoup to do this work.
I have recently been introduced to Selenium WebDriver. I have been doing a bit of reading about it, but I'm just trying to figure out when it is best to use it. In cases like mine, checking whether an element exists on a page or scraping data, I presume I would still be better off using Jsoup? And Selenium would be best suited for filling out forms and clicking buttons on a site?
Selenium has the advantage that it runs an actual browser; it will therefore execute all the JS and so forth in a way that's very, very close to an actual user of the site. The API further allows you to write your tests in a language that's close to the actual interactions (e.g. click() to click on a link).
So if you do have tests that involve some form of interaction (forms, links, etc.), the overhead of Selenium is worth it. If you simply want to open a URL and check for the existence of some content, Jsoup will do.
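To make the split concrete, here is a minimal sketch of both approaches; ChromeDriver, the selectors, and the field names are placeholders for whatever the site under test actually uses:

```java
// Minimal sketch contrasting the two approaches described above.
import org.jsoup.Jsoup;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class ExistenceVsInteraction {
    // Jsoup: enough for "does this element exist?" or pulling static data.
    static boolean bannerExists(String url) throws Exception {
        return !Jsoup.connect(url).get().select("div.banner").isEmpty();
    }

    // Selenium: for flows that need a real browser (JS, forms, clicks).
    static void submitSearch(String url, String query) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get(url);
            driver.findElement(By.name("q")).sendKeys(query);
            driver.findElement(By.cssSelector("button[type=submit]")).click();
        } finally {
            driver.quit();
        }
    }
}
```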