In my project I need to read some web pages. Usually it is pretty easy: I read the source code using Java classes, parse the output and save the interesting data.
But sometimes it is harder, for example when reading Google pages. I think it is because of JavaScript. Do you know how to get the real web page code, I mean with the JavaScript already executed? For example, if I analyse the page using the Firebug extension of Firefox, I read exactly what I need: the JavaScript is correctly replaced by its results. Any idea how to do this using Java?
Thanks in advance
I realize this looks like a duplicate question, but it's not (as far as I know, and I've searched a lot). For the last few days I've been trying to get the HTML content of my WhatsApp web application, but the input stream reader provided by Java does not seem to give me the full HTML code. The URL I'm using is just https://web.whatsapp.com/, which I suppose could be a problem, but there aren't any personal URLs as far as I'm aware. However, in developer tools, using the element inspector, I can easily access and read the DOM elements I'm interested in. I'm wondering if there's a way I can get this source directly using Java/Perl/Python.
I'm also looking to do this as a learning project, so preferably would like to stay away from tools such as jsoup and such. Thanks!
You can use selenium.webdriver in python. Something like:
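from selenium import webdriver

browser = webdriver.Firefox()
try:
    browser.get("https://web.whatsapp.com/")
    # page_source is the DOM as the browser currently holds it, not the raw HTTP response
    html = browser.page_source
finally:
    browser.quit()  # close the browser even if loading fails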
If you want to get your own whatsapp page, you should use selenium to log into the site before getting the page_source.
I use Java. I want to get a web page's source code, but JavaScript runs on the page and I want to get the code generated by JavaScript (the code we see in Firebug in Firefox).
Does anyone know what I should do?
To inspect the page after modification by JavaScript, you need a client-side JavaScript engine that can run the scripts and then let you inspect the DOM.
HtmlUnit can do this - it is a "GUI-Less browser for Java programs".
See also this question
However, this won't give you the exact original page source, because that has already been parsed into a DOM by this point.
I think you want to see the source code of DOM Elements created after the page load via AJAX.
If that's what you want, the only way to see it is through a DOM inspector, like Firebug in Firefox or Developer Tools in Chrome.
Going to "View source code" only shows the source at load-time.
If I understand your question, yes, your JavaScript objects can be passed back to your Java backend either by creating an HTML <form> element with input elements, filling them with your values and submitting the form, or asynchronously via AJAX/JSON (which doesn't require reloading your web page). For both methods you need to configure an endpoint on your Java side to receive the submitted data and return some kind of confirmation to the client, i.e. your JavaScript. I would recommend googling "jQuery.post" for the JavaScript side and finding some examples for your Java backend.
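As a rough illustration of the Java-side endpoint, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver package rather than a servlet container. The class name FormEndpoint, the /submit path and port 8080 are assumptions made for this example, not anything from the question. It decodes the form-encoded body that jQuery.post sends by default and replies with a short confirmation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical endpoint; names, path and port are illustrative assumptions.
final class FormEndpoint {

    // Decodes an application/x-www-form-urlencoded body such as "id=42&name=Jane+Doe".
    static Map<String, String> parseForm(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray "&&"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            fields.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/submit", exchange -> {
            // Read the whole request body, then decode the submitted fields.
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            Map<String, String> fields = parseForm(body);
            // Send a plain-text confirmation back to the JavaScript caller.
            byte[] reply = ("received " + fields.size() + " field(s)").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
    }
}
```

On the client, a call such as jQuery.post("/submit", {id: 42}) would reach this handler, provided the page is served from the same origin. In a servlet or JSP setup the container does the decoding for you via request.getParameter.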
I have a situation where I write to a text file programmatically using Java and simultaneously read from the same file using jQuery.
The problem I face is that jQuery is unable to find the updated content whenever content is written into the text file via Java.
I have Googled a lot, but the only results I find are for Java-to-Java processing and not for Java and JavaScript (i.e. a client side and a server side).
I am not sure if this is even possible.
More about the question:
I write the crawling results into the file using Java and I am trying to display them using JavaScript (the jQuery.post() method).
JAVA
A multi-threaded crawling program that crawls a website and does some processing. I am trying to write some content into a text file from the same Java program as the crawling happens. The content is mostly details about which thread is being invoked and which link is currently being crawled.
The reason I write this to the text file is that I need to show the output in the UI, so that people looking at the UI will understand what is happening.
Writing happens perfectly as expected.
JAVASCRIPT (jQUERY)
This side uses jQuery.get (or jQuery.post):
jQuery.get("sample.txt", function (result) {
    $("#someID").html(result);
});
It reads from the text file normally, but when Java and JavaScript both try to access the file, Java dominates and leaves JavaScript behind, so jQuery is unable to fetch the updated content as it is written.
I guess this explanation is more than sufficient to make people understand what exactly my problem is !
In short, Java and JavaScript try to access the same file at the same time, and that is where the issue comes from.
Any help is appreciated.
Thanks in advance
I think the file is cached. The easiest thing is to request the file with a different URL each time. Try something like "sample.txt?rnd="+Math.random()
There can be synchronization problems, and your data may be corrupted.
I have a question: must it be done with Ajax? I think what you are trying to figure out is
Ajax push and pull
This is not very easy to do and I wouldn't really recommend it. However, there is a better technology called WebSocket. The client can submit a request to the server to write data into a file, and then the server can send the updated content back to the client. Moreover, this is much better than achieving the same objective through a large number of HTTP requests.
Additionally, if you want cross-browser compatibility, have a look at http://socket.io/
Thanks to all those who tried to help me out.
I have finally come up with a solution. Instead of using jQuery post to read directly from the file, I am using another JSP file that reads the file contents and prints them using out.println, and then I use jQuery post to get the content written by that JSP file. Hence the synchronization problem is avoided.
Here is more about my explanation:
Earlier I had
java program -> Text File <- javascript (jQuery post) // Resulted in synchronization problem where in javascript was not able to access the updated content.
Now
java program -> Text file <- JSP file <- javascript (jQuery post) // Avoided the synchronization problem as that file is accessed by the same server side language. After that jQuery reads the content printed by JSP page.
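The same idea, sketched in plain Java instead of a JSP page: the server reads the file on every request and returns its current contents. This uses the JDK's built-in com.sun.net.httpserver package; the class name CrawlLogEndpoint, the /crawl-log path and port 8080 are assumptions for illustration. The Cache-Control: no-store header is an addition not present in the original solution, included because a cached static file was one of the suspected causes above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical stand-in for the JSP page; names, path and port are assumptions.
final class CrawlLogEndpoint {

    // Returns the file's current contents, or an empty string if it does not exist yet.
    static String readLog(Path logFile) {
        try {
            return new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path logFile = Path.of("sample.txt"); // the file the crawler writes to
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/crawl-log", exchange -> {
            // Re-read the file on each request so the client always gets fresh content.
            byte[] body = readLog(logFile).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.getResponseHeaders().set("Cache-Control", "no-store");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

The jQuery side would then request /crawl-log instead of sample.txt. Note that this does not lock the file, so a read can still observe a partially written line while the crawler is appending.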
After many changes, finally came up with one good working solution.
Thanks all.
I need a solution for getting HTML content as a browser would render it. When a page is rendered in a browser, its JS is run; otherwise it is not. So HTML libraries like lxml, BeautifulSoup and others are not going to work.
I found a project named pywebkitgtk, but its purpose is to create a browser with a front end.
Is there any way to put a URL into a "fake browser", render it, run all its JavaScript and save the result into an HTML file? I don't need any front end; just a back end is OK.
I need to use Python or Java to do that.
selenium-rc lets you drive an actual browser for your purpose, under the control of any of several languages of your choice, including both Python and Java. Check it out!
For a detailed example of use with Python, see here.
I am very new to JSP. I am currently doing a project where I have to interface a card reader with my HTML page.
I got the card-reader code as a .cpp and a .h file. Is there any way I can use these files with my JSP, or do I have to recode it in Java and include a .js file?
Specifically, I have a text input for an ID on my page. I need it to be populated with the input from the card. I have the code to interact with the card and extract that number in a C++ program. So can I call that function from my HTML page?
Why on earth do you need to interface your card reader with your JSP page? It doesn't make any sense to me, I am sorry. First understand that JSP is a Java web technology for presentation, which runs on the server and spits out HTML to the browser. Hence, what you get on the client is HTML.
Now, could you please elaborate what you are trying to achieve?
There are several ways to do this:
You could do a system call from your jsp if your C++-code can run standalone.
You could use a Java-C++-bridge.
You could use the Java Native Interface.
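The first option (a system call) might look roughly like the sketch below, assuming the C++ code has been compiled into a standalone executable that prints the card number to standard output. The executable name ./cardreader and the class name CardReaderProcess are hypothetical. Keep in mind that this runs on the server, so it only helps if the card reader is attached to the server machine.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper; "./cardreader" stands in for the compiled C++ program.
final class CardReaderProcess {

    // Runs the command, waits for it to finish and returns its trimmed output.
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout so nothing blocks unread
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with code " + exitCode + ": " + output.trim());
        }
        return output.trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumed executable name; replace with the real path to the compiled reader.
        System.out.println(runAndCapture(List.of("./cardreader")));
    }
}
```

Passing the command as a list avoids shell quoting issues, and the returned string could then be written into the page by the JSP.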
You will have to look into the Java Native Interface if you want to reference C++ code from Java.
For more information see the following:
Wikipedia
API Guide
Nice Guide in PDF format
A JSP renders HTML. In the part you see in your browser, you are no longer in your JSP; you are not even in your own code anymore.
If you want to read a card from an HTML page, you will need to ignore the fact that you have JSP technology and realise that it is HTML technology you are using.
So you will need an applet, some Flash, some ActiveX or another browser technology first, before even trying to interface with the C++ code.
If you need to read from the card, JSP cannot help you. If you read the card number some other way and send it to the JSP with POST, then the JSP does not need to do any reading. What you might need is a signed applet on the user's side that tries to read the card from the card reader. In that case I would advise you to use javax.smartcardio; Java 1.6 has support for reading smart cards.
http://java.sun.com/javacard/