I am writing a program that generates custom URLs for intelius.com and then extracts data from them with Selenium. I have run into some behavior that I am unsure how to address.
My program builds URLs following the pattern https://intelius.com/people-search/LASTNAME/CITY-STATE, but every attempt to open one of these constructed links ends in a timeout error.
For example, http://intelius.com/people-search/Williams/Brooklyn-NY does not load the expected results page.
Digging through the site's source, I found what appears to be a link-validator script, though I do not know exactly what that means, and I am unsure how to proceed.
How would I authenticate my queries without having Selenium type the data into the search box and press the submit button? Is my link-construction approach flawed in some obvious way? I am a bit lost and would appreciate some direction. Thanks!
I think the problem is that you are using http instead of https and omitting www from the URL. This works:
https://www.intelius.com/people-search/Williams/Brooklyn-NY
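A minimal Java sketch of that fix, assuming the LASTNAME/CITY-STATE pattern from the question is otherwise what the site expects: build the URL on the https://www. host and percent-encode each piece. The class and method names are hypothetical, and how the site wants multi-word city names written (%20, hyphens, or something else) is not covered in this thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PeopleSearchUrl {
    // Scheme and host taken from the working example: https, with the www. prefix.
    private static final String BASE = "https://www.intelius.com/people-search/";

    // Percent-encodes one path segment. URLEncoder does form encoding,
    // so the '+' it emits for a space is replaced with %20 for use in a path.
    static String segment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Hypothetical helper for the question's LASTNAME/CITY-STATE pattern.
    static String build(String lastName, String city, String state) {
        return BASE + segment(lastName) + "/" + segment(city) + "-" + segment(state);
    }

    public static void main(String[] args) {
        String url = build("Williams", "Brooklyn", "NY");
        URI.create(url); // throws IllegalArgumentException if the string is not a valid URI
        System.out.println(url); // https://www.intelius.com/people-search/Williams/Brooklyn-NY
    }
}
```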
The problem lies in the way the URL is being formed. You need to construct and pass the arguments in the form the web application expects. The following works:
https://www.intelius.com/people-search/William-Brooklyn/NY
Related
I am trying to automate a login page that appears to use Knockout.js.
HtmlUnit doesn't seem to load the full page: all the input fields are missing, which makes it impossible to log in.
I have made sure the JavaScript timeouts are set and have also enabled NicelyResynchronizingAjaxController. After the page has loaded, I wait using:
waitForBackgroundJavaScript
waitForBackgroundJavaScriptStartingBefore
Thread.sleep (just for good measure)
I have even checked for additional windows (WebClient.getWebWindows), but there seems to be only one.
It appears that Knockout (assuming it really is Knockout) creates the inputs. Is this just too much for HtmlUnit, or have I missed something?
This is a known problem (see https://github.com/HtmlUnit/htmlunit/issues/37).
Hopefully I will find some time to figure out what is going wrong here.
I have some code that is not mine, and I have to write a Selenium test for it.
The problem is that the web application has AJAX functions (which I hadn't seen before) that constantly refresh part of the application with data that changes on every refresh. I have to retrieve this data to work with it in the project in Java.
Searching the Internet, I found that AJAX stands for Asynchronous JavaScript And XML, which means that part of a web application can refresh its data without reloading the whole page (correct me if I have the wrong idea).
I also read that I need a Servlet or something similar to retrieve the information, but I couldn't really understand what it is or how to use it for my purpose.
Anyway, I suppose that if the web application I'm using retrieves data on each refresh, it must use some function to do so. My question is whether anyone knows how this data is usually retrieved in AJAX functions, so that I can search for it in the project, or whether there is some function or method that tells me WHEN and WHAT data the application retrieves on each refresh, without relying on the functions I assume exist in the code (and that I couldn't find).
Any help would be appreciated, both to clarify the information about AJAX above and on how I can approach this problem.
Thanks in advance!
P.S.: I can't post any code here because it belongs to a third party, not to me. I hope you can still help.
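For the AJAX question above: the refreshing part of a page is usually fed by a background HTTP request that returns just the data, and a servlet is a server-side component, so the Java side of a test does not need one merely to read that data; it can send the same request itself. A sketch under that assumption follows. Since the real application isn't shown, a local com.sun.net.httpserver.HttpServer stands in for it, with a hypothetical /data endpoint that returns different JSON on every call, and java.net.http.HttpClient (Java 11+) polls it. The real endpoint would have to be found in the browser's network tools while the page refreshes.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AjaxPoll {
    // Stand-in for the application: the hypothetical /data endpoint returns
    // different JSON on every call, like the refreshing widget in the question.
    static HttpServer startFakeApp() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        AtomicInteger counter = new AtomicInteger();
        server.createContext("/data", exchange -> {
            byte[] body = ("{\"value\":" + counter.incrementAndGet() + "}")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends the same GET `times` times, as the page's JavaScript would on each refresh.
    static List<String> poll(int times) {
        HttpServer server = null;
        try {
            server = startFakeApp();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/data");
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            List<String> results = new ArrayList<>();
            for (int i = 0; i < times; i++) {
                HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
                results.add(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            if (server != null) server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(poll(2)); // [{"value":1}, {"value":2}]
    }
}
```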
I'm trying to read the HTML from a web page and parse information from it using a URLConnection in Java. It works, but the page only loads part of the content; the rest is loaded as the user scrolls down the page. Is there any way for a Java program to trigger this? My program doesn't actually open the page in a browser, it just opens a connection to it. If it's relevant, I can add the URL I'm accessing.
I've been trying to find the answer and found a few similar topics here, most of them unanswered. I eventually found this topic, which sounds like what I need, but when I looked at the URLs of the calls being made, they are not always the same, so I can't hard-code them into the program. I also looked at the topic it was supposedly a duplicate of, but that didn't seem to apply to my problem either, unless I misunderstood something. Is there any way to find these URLs each time the program runs, or to trick the connection into thinking I'm scrolling down the page? Or can I make a general "request" or "POST", as I've seen in some related topics, that will automatically call the appropriate URL? (An explanation of a "POST" would be appreciated as well.)
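One way to approach the scrolling question, sketched under assumptions: infinite-scroll pages often load each new chunk with a request carrying a page number or offset, so instead of "scrolling", a program can request page 1, 2, 3, ... until one comes back empty. The /items?page=N endpoint, its plain-text format, and the MAX_PAGES cap are all hypothetical, and a local HttpServer plays the site so the loop can run. If the real URLs change between runs, the changing part (a token or cursor) usually has to be read from the page or the previous response, which this sketch does not attempt.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ScrollPager {
    // Safety cap so a misbehaving endpoint cannot keep the loop running forever.
    static final int MAX_PAGES = 50;

    // Stand-in feed: hypothetical /items?page=N returns two lines for pages 1-3
    // and an empty body afterwards, i.e. nothing more to "scroll" to.
    static HttpServer startFakeFeed() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/items", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "page=2"
            int page = Integer.parseInt(query.substring(query.indexOf('=') + 1));
            String text = page <= 3 ? "item" + (2 * page - 1) + "\nitem" + (2 * page) : "";
            byte[] body = text.getBytes(StandardCharsets.UTF_8);
            // -1 tells HttpServer there is no response body at all.
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            if (body.length > 0) exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Requests page 1, 2, 3, ... until a page comes back empty, which is
    // roughly what repeated scrolling makes the browser do.
    static List<String> fetchAll() {
        HttpServer server = null;
        try {
            server = startFakeFeed();
            String base = "http://127.0.0.1:" + server.getAddress().getPort() + "/items?page=";
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            List<String> items = new ArrayList<>();
            for (int page = 1; page <= MAX_PAGES; page++) {
                HttpRequest request = HttpRequest.newBuilder(URI.create(base + page)).GET().build();
                String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
                if (body.isEmpty()) break;
                items.addAll(List.of(body.split("\n")));
            }
            return items;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            if (server != null) server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll()); // [item1, item2, item3, item4, item5, item6]
    }
}
```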
I am trying to download the contents of a site. The site is a Magento site where you can filter results by selecting properties in the sidebar. See zennioptical.com for a good example.
Using zennioptical.com as an example, I need to download all the rectangular glasses, or all the plastic ones, etc.
So how do I send a request to the server to display only the rectangular frames, etc.?
Thanks so much
The basic answer is that you need to make an HTTP GET request with the correct query parameters. I'm not totally sure from your question how you are trying to do this, so here are two options.
If you are trying to do this from JavaScript, you can look at this question. It has a bunch of answers that show how to perform AJAX GETs with the built-in XMLHttpRequest or with jQuery.
If you are trying to download the page from a Java application, this really doesn't involve AJAX at all. You'll still need to make a GET request, but you can look at this other question for some ideas.
Whether you are using JavaScript or Java, the hard part is going to be figuring out the right URLs to query. If you are trying to scrape someone else's site, you will have to see what URLs your browser requests when you filter the results. One of the easiest ways to see that is the Firefox Web Console, found at Tools->Web Developer->Web Console. You could also download something like Wireshark, which is a good tool to have around but probably overkill for what you need.
EDIT
For example, when I clicked the "rectangle frames" option on Zenni Optical, this is the request that showed up in the Web Console:
[16:34:06.976] GET http://www.zennioptical.com/?prescription_type=single&frm_shape%5B%5D=724&nav_cat_id=2&isAjax=true&makeAjaxSearch=true [HTTP/1.1 200 OK 2328ms]
You'll have to capture enough of these to work out how to generate the URLs for the results you want.
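A small Java sketch of turning a captured request like the one above back into code: keep the parameters as name/value pairs and let URLEncoder build the query string. The names and values are copied from the logged request; what 724 or nav_cat_id=2 mean, and whether they still work, is an assumption to check against fresh captures. Note how frm_shape[] becomes frm_shape%5B%5D, matching the log.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterQuery {
    // Encodes name=value pairs joined by '&'. A list (not a map) is used so a
    // name like frm_shape[] can appear more than once if several shapes are selected.
    static String queryString(List<Map.Entry<String, String>> params) {
        return params.stream()
                .map(p -> enc(p.getKey()) + "=" + enc(p.getValue()))
                .collect(Collectors.joining("&"));
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Values copied from the request captured in the Web Console;
        // 724 is whatever ID the site used for "rectangle" at the time.
        String url = "http://www.zennioptical.com/?" + queryString(List.of(
                Map.entry("prescription_type", "single"),
                Map.entry("frm_shape[]", "724"),
                Map.entry("nav_cat_id", "2"),
                Map.entry("isAjax", "true"),
                Map.entry("makeAjaxSearch", "true")));
        System.out.println(url);
    }
}
```

Requesting that URL is then an ordinary GET, for example with java.net.http.HttpClient.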
DISCLAIMER
If you are downloading someone else's data, it would be best to check with them first. The owner of the server may not appreciate what they could consider theft of their data or work. And depending on how you use the data you pull down, you could be venturing into all sorts of ethical issues. Then again, if you are downloading from your own site, go for it.
Although I've been programming for a few years, I've only dabbled in the web side of things; until now my work has been mostly desktop applications. I was wondering: in Java, for example, what library function or self-written function would I use to have a program launch a web browser to a certain site? As an extension, how could I have it find a certain field on the site, such as a search box (if it isn't already focused), populate it with a string, and submit it to the server? (Maybe this is a find-by-ID scenario?)
Also, is there a way to control whether this is visible to the user? If I want to run this as a background task while the user carries on using the program, I want the program to submit data to a web page without the visual side of things interrupting the user.
This may be basic, but as I say, I've never tried my hand at it, so if someone could provide some rough code outlines I'd really appreciate it.
Many thanks
I think Selenium might be what you are looking for.
Selenium lets you start a web browser, point it at a certain website, and interact with it. There is also a Java API (along with bindings for many other languages) that lets you control the launched browser from a Java application.
There is some tweaking to do, but you can also run Selenium in the background using a headless web browser.
As I understand it, you want to submit data to a server via the existing web interface?
In that case, you need to find out how the URL for the request is built and then make an HTTP call using that URL.
I advise reading this if it involves a POST submission.
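For the POST case, a minimal sketch: a form with method="post" sends its fields URL-encoded in the request body (Content-Type application/x-www-form-urlencoded) rather than in the URL, as a GET does. The /search path and the field name q are hypothetical, and a local HttpServer that echoes back what it received stands in for the real site so the request can be inspected. A real site may also expect cookies or hidden form fields, which this does not handle.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class FormPost {
    // Stand-in for the site's form handler: replies with the method and raw body it got.
    static HttpServer startEchoServer() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/search", exchange -> {
            String received = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            byte[] reply = (exchange.getRequestMethod() + " " + received).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            exchange.getResponseBody().write(reply);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends one field the way <form method="post"> would: name=value,
    // URL-encoded, in the request body rather than in the URL.
    static String submit(String field, String value) {
        HttpServer server = null;
        try {
            server = startEchoServer();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/search");
            String form = URLEncoder.encode(field, StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(value, StandardCharsets.UTF_8);
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();
            HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            if (server != null) server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(submit("q", "rectangular glasses")); // POST q=rectangular+glasses
    }
}
```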