I am trying to automate a login page which appears to be using Knockout.js.
HtmlUnit doesn't seem to load the full page; it is missing all the input fields, which makes it impossible to actually log in.
I have ensured that the JavaScript timeouts are set and have also enabled NicelyResynchronizingAjaxController. I am waiting after the page has loaded using:
waitForBackgroundJavaScript,
waitForBackgroundJavaScriptStartingBefore
Thread.sleep (just for good measure)
I have even checked for additional windows (WebClient.getWebWindows), but there just seems to be the one.
It appears Knockout (assuming it is actually Knockout) is creating the inputs. Is this just too much for HtmlUnit, or have I missed something?
This is a known problem (see https://github.com/HtmlUnit/htmlunit/issues/37).
Hopefully I will find some time to figure out what is going wrong here.
I am in the process of writing a program whose purpose is centered around generating custom URLs for intelius.com and then extracting data from them with selenium. I have observed interesting behavior that I am unsure how to address.
My program creates URLs after the following pattern: https://intelius.com/people-search/LASTNAME/CITY-STATE, but I have found that attempting to access these constructed links consistently leads to a timeout error.
For example, http://intelius.com/people-search/Williams/Brooklyn-NY does not load the expected results page.
Digging around in the website's source, I have found what appears to be a link validator script — what exactly that means, I do not know — and am unsure how to proceed.
How exactly would I go about authenticating my queries, without programming selenium to manually input the data into the search textbox and to press the submit button? Is my link-construction approach flawed in some blatantly obvious manner? I am a bit lost and would appreciate some direction. Thanks!
I think your problem is using http instead of https, and omitting www from URL. So this works:
https://www.intelius.com/people-search/Williams/Brooklyn-NY
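If it helps, here is a minimal sketch of building such a link in Java so that the scheme and the www host are always included. The class and method names are hypothetical, and the path pattern is simply the LASTNAME/CITY-STATE one from the question; I have not checked how the site handles names or cities containing spaces.

```java
class PeopleSearchUrl {
    // Assumed pattern, taken from the question: /people-search/LASTNAME/CITY-STATE
    private static final String BASE = "https://www.intelius.com/people-search";

    // Builds the search URL with https and the www host always present.
    static String build(String lastName, String city, String state) {
        return String.format("%s/%s/%s-%s", BASE, lastName, city, state);
    }

    public static void main(String[] args) {
        // prints https://www.intelius.com/people-search/Williams/Brooklyn-NY
        System.out.println(build("Williams", "Brooklyn", "NY"));
    }
}
```

The resulting string can then be handed to Selenium's driver.get(...) as usual.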
The problem lies in the way the URL is being formed. You need to construct and pass the arguments the way the web application understands them. The following works:
https://www.intelius.com/people-search/William-Brooklyn/NY
This is my first time working with Java and tomcat and I'm a little confused about how everything fits together - I've googled endlessly but can't seem to wrap my head around a few concepts.
I have completed a Java program that outputs bufferedImages. My goal is to eventually get these images to display on a webpage.
I'm having trouble understanding how my java file (.java) which is currently running in NetBeans interacts with a servlet and/or JSP.
Ideally, a servlet or JSP (I'm not 100% clear on how either of those works; I mostly understand the syntax from looking at various examples) could get my output (the BufferedImages) when the program runs, and the HTML file could somehow interact with whatever they are doing so that the images could be displayed on the webpage. I'm not sure if this is possible. If anyone could suggest a general order of going about things, that would be awesome.
In every example/tutorial I find, no one uses .java files - there are .class files in the WEB-INF folder -- it doesn't seem like people are using full-on Java programs. However, I need my .java program to run so that I can retrieve the output and use it in the webapp.
Any general guidance would be greatly appreciated!
I think this kind of documentation is sadly lacking; too many think that an example is an explanation, and for all the wonderful things you can get out of an example, sometimes an explanation is not one of them. I'm going to attempt to explain some of the overall concepts you mentioned; they aren't going to help you solve your buffered image display problem directly, unfortunately.
Tomcat and other programs like it are "web servers"; these are programs that accept internet connections from other computers and return information in a particular format. When you enter a "www" address in a browser, the string in that address eventually ends up (as a "request") at a web server, which then returns you a web page (also called a "response"). Tomcat, Apache, Jetty, JBoss, and WebSphere are all similar programs that do this sort of thing. In the original form of the world-wide-web, the request string represented a file on the server machine, and the web server's job was to return that (html) file for display in the browser.
A Servlet is a kind of java program that runs on some web servers. The servlet itself is a java class with methods defined by the javax.servlet.Servlet interface. In webservers that handle servlets, someone familiar with the configuration files can instruct the web server program to accept certain requests and, instead of returning an HTML file (or whatever) from the server, to instead execute the servlet code. A servlet, by its nature, returns content itself - think of a program that outputs HTML and you're on the right track.
But it turns out to be a pain to output complete HTML from a program -- there's a tedious amount of HTML that doesn't have much to do with the "heavy lifting" for which you need a programming language of some sort. You have to have Java (or some language) to make database inquiries, filter results, etc., but you don't really need Java to put in the hundreds of tags that a modern web page needs.
So a JavaServerPage (JSP) is a special kind of hybrid, a combination of HTML and things related to servlets. You CAN put java code directly in a JSP file, but it is usually considered better to use html-like 'tags' which are then interpreted by a "JSP compiler" and turned into a servlet. So the creator of the JSP page learns how to use these tags, which are (if correctly constructed) more logical for web page creators than the java programming language is, and in fact doesn't have to be a programmer at all. So a programmer, working with this content-oriented person, creates tags for the page to use to describe how it wants its page to look, then the programmer does the programming and the content-person creates the web pages with it.
For your specific problem, we'll need more detail to help you. Do you envision this program running and using some information provided by the user as part of his request to generate the images? Or are the images generated once and now you just need to display them? I think that's a topic for another question, actually.
This ought to be enough to get you started. I would now suggest the wikipedia articles on these things to get more details, and good luck getting your head around the concepts. I hope this has helped.
This addendum provided after a comment you made about wanting to do a slideshow.
An important web programming concept is the client-server and request-response nature of it. In the traditional, non-Javascript web environment, the client (read browser) sends a request to the server, and the server sends back bytes. There is no ongoing connection between the two computers after the stream of bytes finishes, and there are restrictions on how long that stream of bytes can continue. Additionally, outside of this request and response, the server usually has no capability to send anything to the client unless the client requests it; the client 'drives' the exchange of data.
So a 'slideshow', for instance, where the server periodically sends bytes representing an additional image, is not the way HTML works (or was meant to work). You could do one under the user's control: the user presses a button for each next picture, the browser sends a request for the next picture and it appears in the place where the previous one was. That fits the request-response paradigm.
Now, the effect of an automatic slideshow is possible using Javascript. Javascript, named after Java but otherwise unrelated, is a scripting language; it is part of an HTML page, is downloaded with the page to the browser, and it runs in the browser's environment (as opposed to a JSP/servlet, which executes on the server). You can write a timer in Javascript, and it can wait N seconds and send another request to the server (for another picture or whatever). Javascript has its own rules, etc., but even so I think it a good idea to keep in mind that you aren't just doing HTML any more.
If a slideshow is what you are after, then you don't need JSP at all. You can create an HTML page with places for the picture being displayed, labels, text, buttons for stopping the slideshow and so forth, all in HTML, plus Javascript for requesting additional pictures.
You COULD use JSP to create the page, and it might help you depending on how complex the page is, but it isn't going to help you with an essential function: getting the next picture for the slideshow. When the browser requests a JSP page:
the request goes to the server,
the server determines the page you want and that it is a JSP page,
the server compiles that page to a servlet if it hasn't already,
the servlet runs, producing HTML output according to the tags now compiled into Java,
the server returns HTML to the browser.
Then the server is done, and more bytes won't go to the browser until another request is made.
Again, I hope this has helped. Your example of a slideshow has revealed some basic concepts that need to be understood about web programming, servers, HTML, JSPs, and Javascript, and I wish you luck on your journey through them all. And if you come to think of it all as a bit more convoluted than it seems it needed to be, well, you won't be the first.
You can create a JSP that invokes a method in your Java class to retrieve the BufferedImage. Then you must set the content type to the appropriate image type:
response.setContentType()
The tricky part is that you must print the image from the JSP, so you have to call:
response.getOutputStream()
from your JSP, and write the bytes of your BufferedImage to that OutputStream.
Note that in that JSP you'll not be able to print out HTML, only the image.
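As a rough sketch of the "pass the bytes" step, the standard javax.imageio.ImageIO class can encode a BufferedImage straight into any OutputStream. The class and method names below are hypothetical, and PNG is only an assumed format; in a servlet or JSP you would call response.setContentType("image/png") first and pass response.getOutputStream() as the stream. Here a ByteArrayOutputStream stands in for it so the example runs on its own.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

class ImageWriterSketch {
    // Encodes the image as PNG into the given stream.
    // In a servlet/JSP, 'out' would be response.getOutputStream() (assumption).
    static void writePng(BufferedImage image, OutputStream out) throws IOException {
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        out.flush();
    }

    // Convenience helper: returns the encoded PNG bytes.
    static byte[] toPngBytes(BufferedImage image) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writePng(image, buffer);
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the image your own program produces.
        BufferedImage image = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        System.out.println(toPngBytes(image).length + " bytes written");
    }
}
```

The HTML page would then reference the servlet/JSP URL from an ordinary img tag, since that URL returns only image bytes.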
I'm not sure where you need more clarification, as it seems you're a bit confused about the concepts.
BTW: A JSP is just a servlet that has an easier syntax to write HTML and Java code together.
I am trying to download the contents of a site. The site is a Magento site where one can filter results by selecting properties in the sidebar. See zennioptical.com for a good example.
Using zennioptical.com as an example, I need to download all the rectangular glasses, or all the plastic ones, etc.
So how do I send a request to the server to display only the rectangular frames, etc.?
Thanks so much
The basic answer is that you need to do an HTTP GET request with the correct query params. I'm not totally sure how you are trying to do this based on your question, so here are two options.
If you are trying to do this from javascript you can look at this question. It has a bunch of answers that show how to perform AJAX GETs with the built in XMLHttpRequest or with jQuery.
If you are trying to download the page from a java application, this really doesn't involve AJAX at all. You'll still need to do a GET request but now you can look at this other question for some ideas.
Whether you are using javascript or java, the hard part is going to be figuring out the right URLs to query. If you are trying to scrape someone else's site you will have to see what URLs your browser is requesting when you filter the results. One of the easiest ways to see that info is in Firefox with the Web Console found at Tools->Web Developer->Web Console. You could also download something like Wireshark which is a good tool to have around, but probably overkill for what you need.
EDIT
For example, when I clicked the "rectangle frames" option at zenni optical, this is the query that fired off in the Web Console:
[16:34:06.976] GET http://www.zennioptical.com/?prescription_type=single&frm_shape%5B%5D=724&nav_cat_id=2&isAjax=true&makeAjaxSearch=true [HTTP/1.1 200 OK 2328ms]
You'll have to do a sufficient number of these to figure out how to generate the URLs to get the results you want.
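For the Java route, here is a minimal sketch of generating such a URL from a set of parameters and fetching it with the java.net.http client (Java 11+). The class and method names are hypothetical, and the parameter names and values are just the ones observed in the request above; other filters will presumably need other IDs.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FilterQuerySketch {
    // Joins URL-encoded key=value pairs onto the base URL, in map order.
    static String buildUrl(String base, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return base + "?" + query;
    }

    // Performs the GET and returns the response body as a string.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Parameters copied from the request seen in the Web Console.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("prescription_type", "single");
        params.put("frm_shape[]", "724"); // "[]" is encoded as %5B%5D
        params.put("nav_cat_id", "2");
        params.put("isAjax", "true");
        params.put("makeAjaxSearch", "true");
        // Only prints the URL; call fetch(url) to actually send the request.
        System.out.println(buildUrl("http://www.zennioptical.com/", params));
    }
}
```

A LinkedHashMap is used so the parameters keep the same order as in the observed request.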
DISCLAIMER
If you are downloading someone else's data, it would be best to check with them first. The owner of the server may not appreciate what they might consider stealing their data/work. And then depending on how you use the data you pull down, you could be venturing into all sorts of ethical issues... Then again, if you are downloading from your own site, go for it.
Although I've been programming for a few years, I've only really dabbled in the web side of things; it's been more application-based for computers up until now. I was wondering, in Java for example, what library-defined or self-defined function I would use to have a program launch a web browser to a certain site? Also, as an extension to this, how could I have it find a certain field in the website, like a search box for instance (if it wasn't the current target of the cursor), and then populate it with a string and submit it to the server? (Maybe this is a kind of find-by-ID scenario?)
Also, is there a way to control whether this is visible to the user? What I mean is, if I want to do something as a background task while the user carries on using the program, I will want the program to submit data to a webpage without the whole visual side of things that would interrupt the user.
This may be basic but like I say, I've never tried my hand at it so perhaps if someone could just provide some rough code outlines I'd really appreciate it.
Many thanks
I think Selenium might be what you are looking for.
Selenium allows you to start a Web browser, launch it to a certain website and interact with it. Also, there is a Java API (and a lot of other languages, by the way) allowing you to control the launched browser from a Java application.
There is some tweaking to do, but you can also launch Selenium in the background, using a headless Web browser.
As I understand it, you want to submit data to a server via the existing web interface?
In that case you need to find out how the URL for the request is built and then make an HTTP call using the corresponding URL.
I advise reading this if it involves a POST submit.
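A minimal sketch of such a call for the POST case, using the java.net.http classes from Java 11+. The class and method names, the example.com URL, and the field name "q" are all placeholders; you would take the real ones from the form on the page you are targeting. Nothing visible is shown to the user, since no browser is involved.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormPostSketch {
    // Encodes fields the way an HTML form does (application/x-www-form-urlencoded).
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request carrying the form fields.
    static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
            .build();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", "search text"); // hypothetical field name and value
        HttpRequest request = buildPost("https://example.com/search", fields);
        System.out.println(request.method() + " " + request.uri());
        // To submit it: HttpClient.newHttpClient()
        //     .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Sending the request from a background thread keeps the rest of the program responsive while the submission is in flight.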