Half a year ago I was already accessing Factiva.com (a news database with access control) through a Selenium driver, running news searches automatically. Two months ago, however, Factiva switched its access protocol from HTTP to HTTPS, which broke my program.
Under HTTP, my program could switch to a new URL within the same Chrome instance (even in a new tab). I did this by calling chrome.get(URL), where the URL is a direct search link. I even issued new search requests this way: under HTTP you can simply combine keywords and logical operators to build a new URL, and chrome.get(URL) then loads the new search results.
Under HTTPS, however, I can no longer use this method. When I try, Factiva reports that I am logged out and must log in again (even though, if I click the Back button with my mouse, I return to the Factiva page and do not need to log in).
My question is: why does my old method not work under HTTPS? How can I let Factiva know that, even when I use the old method, I am already logged in and do not need to log in again?
My old method under HTTP:
class SearchTXT {
    // Builds a direct Factiva search URL from a keyword and a date range.
    public String getSearchUrl(String keyword, String startTime, String endTime) {
        String f0 = "http://global.factiva.com.***.uk/zhcn/du/headlines.asp?napc=S&searchText=";
        String f2 = "&dateRangeMenu=custom&dateFormat=dmy&dateFrom=";
        String f4 = "&dateTo=";
        String f6 = "&sortBy=y&currentSources=";
        String f7 = "Wall Street Journal";
        String f8 = "&searchLanguage=custom&searchLang=EN&dedupe=0&srchuiver=2&accountid=9MEM000300&namespace=16";
        return f0 + keyword + f2 + startTime + f4 + endTime + f6 + f7 + f8;
    }
}
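As a side note, the plain concatenation above puts values such as "Wall Street Journal" into the query string with their spaces unencoded. The following is only a sketch of one way to build the same kind of URL with percent-encoded values; the class name EncodedSearchUrl, the shortened parameter list, and the host example.invalid are assumptions made for illustration, not Factiva's actual interface. The base URL is passed in as an argument, so changing http:// to https:// happens in one place.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original program.
class EncodedSearchUrl {
    // Percent-encodes a single query-string value (spaces become '+').
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Assembles the query string; parameter names mirror the question's URL.
    static String build(String baseUrl, String keyword, String from, String to, String source) {
        return baseUrl
                + "?napc=S&searchText=" + enc(keyword)
                + "&dateRangeMenu=custom&dateFormat=dmy"
                + "&dateFrom=" + enc(from)
                + "&dateTo=" + enc(to)
                + "&sortBy=y&currentSources=" + enc(source);
    }

    public static void main(String[] args) {
        // example.invalid is a placeholder host, not the real Factiva address.
        System.out.println(build("https://example.invalid/headlines.asp",
                "DELL", "2011-11-10", "2014-10-10", "Wall Street Journal"));
    }
}
```

URLEncoder.encode(String, Charset) needs Java 10 or later. Whether encoding or the URL scheme has anything to do with the login problem is not something the question settles.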
After logging in to the Factiva database through the Selenium driver, I had a Chrome instance with access to Factiva.
Then I used this:
SearchTXT st = new SearchTXT();
String searchURL = st.getSearchUrl("DELL", "2011-11-10", "2014-10-10");
chrome.get(searchURL);
String pageSource = chrome.getPageSource();
Then I could use Jsoup.jar to parse the page and collect the search results.
But this method has not worked since the switch from HTTP to HTTPS, and the problem has been frustrating me for days.
Are there alternative methods that can access the data under HTTPS? Maybe there is some trick that would make my old method work again. Otherwise, I will have to rewrite the core of my program and use JavascriptExecutor to deal with this problem, which means a lot of work. Worst of all, the JavaScript code on the Factiva website is very complicated, and some key JavaScript code and functions do not even appear directly in the Factiva page source.
Thanks a lot!
Related
I'm trying to compile the results of my class from my college's results page (exam.msrit.edu).
The USNs for my class run from 1MS16CS001 to 1MS16CS100.
Is there any way that I could go about writing a scraper program to enter different values in the USN box and gather the data?
I am quite new to scraping but have decent exposure to Python and Java.
Any advice is much appreciated :)
Not necessarily a scraper, but you can use Selenium WebDriver to browse to the page and enter everything for you. Selenium WebDriver can be found here.
Essentially, once it is installed, you only need to instantiate the driver and then loop through a list of values, entering them as you go.
from selenium import webdriver
from selenium.webdriver.common.by import By

# Sets up the browser. Using Chrome instead requires additional setup.
browser = webdriver.Firefox()
for i in range(1, 101):  # loops over an arbitrary range of values
    browser.get("http://exam.msrit.edu/")  # HTTP GET request to the page
    elem = browser.find_element(By.ID, 'id')  # 'id' is a placeholder HTML id; name, XPath, etc. also work
    elem.send_keys("your string {}".format(i))  # types the value into the box
    elem = browser.find_element(By.ID, 'id')  # placeholder id for the search button
    elem.click()  # clicks that element
The documentation on Selenium is pretty good. http://selenium-python.readthedocs.io/navigating.html
This will open an actual browser and take some time to load, so it is not the quickest way to do it, but it will work. You can even take screenshots.
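Since the question also mentions Java, the list of values to type into the USN box could be generated up front. This is a minimal sketch that assumes the USNs are the prefix 1MS16CS followed by a three-digit, zero-padded counter, as the range in the question suggests; the class name UsnRange is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for producing the values to enter into the form.
class UsnRange {
    // Builds prefix + zero-padded counter for every number in [first, last].
    static List<String> usns(String prefix, int first, int last) {
        List<String> result = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            result.add(String.format("%s%03d", prefix, i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = usns("1MS16CS", 1, 100);
        // Prints the first and last generated USN.
        System.out.println(all.get(0) + " ... " + all.get(all.size() - 1));
    }
}
```

Each generated string can then be passed to the driver's send-keys call inside the loop.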
I have a URL which shows me a coupon form based on its id:
GET /coupon/:couponId
All the coupon forms are different and submit different POST params to:
POST /saveCoupon/:id
I want a convenient way of debugging my coupons, so that I can view the actual POST params submitted.
I've made a controller at POST /outputPOST/saveCoupon/:id which saves nothing, but prints the received POST params to the browser.
Now I want a URL like GET /changeActionUrl/coupon/:couponId which calls GET /coupon/:couponId and then replaces the form's action URL POST /saveCoupon/:id with POST /outputPOST/saveCoupon/:id.
In other words I want to do something like:
Result.getHtml().replace("/saveCoupon/","/outputPOST/saveCoupon/");
With this I can easily debug my coupons just by adding "/outputPOST" in the browser.
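To make the intended substitution concrete, here is a small sketch on a plain HTML string. It assumes the form's action attribute starts with /saveCoupon/ and limits the replacement to action attributes, so that other links containing the same path stay untouched; the class and method names are hypothetical and not part of Play.

```java
// Hypothetical helper; operates on the rendered HTML as a String.
class FormActionRewriter {
    // Rewrites only action="..." (or action='...') values that begin with /saveCoupon/.
    static String redirectToDebug(String html) {
        return html.replaceAll("(action\\s*=\\s*[\"'])/saveCoupon/", "$1/outputPOST/saveCoupon/");
    }

    public static void main(String[] args) {
        String html = "<form action=\"/saveCoupon/42\" method=\"post\"></form>";
        System.out.println(redirectToDebug(html));
    }
}
```

A plain String.replace on "/saveCoupon/" would also work, but it would change every occurrence of the path in the page, not only the form target.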
You could just use a bookmarklet and javascript to replace all of the forms' action attributes. That way your developer can do it with one click instead of changing urls.
Something like this will prefix all form actions on the page with "/outputPOST".
javascript:(function(){var forms=document.getElementsByTagName('FORM');for(var i=0;i<forms.length;++i){forms[i].setAttribute('action','/outputPOST'+forms[i].getAttribute('action'));}})();
I don't understand, at least not everything ;)
In general, you can debug every piece of a Play app with a debugger (check the tips for your favorite IDE on how to do that). This will always be better and faster than modifying code only to check incoming values.
For example, IntelliJ IDEA 13+ with Play support makes debugging work like a dream!
I wanted to automate some processes on www.imgur.com, and I decided to use the Selenium WebDriver library for Java. I have been able to get much of my code to work, with one hitch: when I access imgur directly, only a white screen shows up, and it does not change upon refresh. Accessing the sign-in page directly yields an SSL error.
System.setProperty("webdriver.chrome.driver","C:\\workspace\\Test\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("https://www.imgur.com/signin");
WebElement username = driver.findElement(By.id("username"));
username.sendKeys("username");
WebElement password = driver.findElement(By.id("password"));
String pass = "password";
password.sendKeys(pass);
password.submit();
driver.get("http://www.imgur.com");
I have been able to work around this by following the links to imgur that Google searches provide, but adding more features will require that I be able to manage the URL directly.
Thanks in advance!
It's just http://imgur.com/, not http://www.imgur.com. That's why Google's links work, they are linking to the first one - a different url.
The www prefix is not required by any technical policy. Some choose to have urls both with and without the prefix point to the same server. Some choose to use only one or the other. It seems imgur is going without the prefix.
Here's a little more info on the www prefix:
http://en.wikipedia.org/wiki/World_Wide_Web#WWW_prefix
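If the URLs are built in code, the prefix can be dropped before they are handed to driver.get. The following is only a sketch using java.net.URI; the class name WwwPrefix is made up, and whether removing the prefix is right for any given site is an assumption that has to be checked per site.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Selenium.
class WwwPrefix {
    // Returns the URL with a leading "www." removed from its host, if present.
    static String stripWww(String url) {
        try {
            URI u = new URI(url);
            String host = u.getHost();
            if (host == null || !host.startsWith("www.")) {
                return url; // nothing to strip
            }
            // Rebuild the URI with the shortened host and all other parts unchanged.
            return new URI(u.getScheme(), u.getUserInfo(), host.substring(4),
                    u.getPort(), u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripWww("http://www.imgur.com/signin"));
    }
}
```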
I have been asked to write a Selenium test that checks the HTML5 local database and verifies that the information in it matches what is displayed on the screen. This is for a mobile application that can be used in Chrome, and I have everything working as far as Selenium driving Chrome goes. Now I am stuck trying to find a way for Selenium to access the local database storage. There is an interface in Selenium's html5 package called DatabaseStorage, but I cannot figure out how it works or how to use it. The test cases are being written in Java. Thank you all for any help you can provide on this.
I have tried creating a new DatabaseStorage object, which didn't work. I also tried creating a new ResultSet object, and I tried implementing DatabaseStorage myself. The API documentation says DatabaseStorage is an interface, but it does not list a constructor. I am not sure how to call a method when there is no constructor for the interface.
// Database Storage
private ResultSet executeQuery(String statement, String... param) {
String databaseName = "'HTML5', '1.0',"
+" 'Offline document storage', 100*1024";
return ((DatabaseStorage) driver).executeSQL(databaseName, statement, (Object[]) param);
}
See Selenium's HTML5 test for more details.
I am using HtmlUnit in Jython and am having trouble selecting an option in a pulldown. The page I am going to has a table with other AJAX links, and I can click on them and move around and it seems okay, but I can't figure out how to operate a pulldown menu that allows for more links on the page (this pulldown affects the AJAX table, so it's not redirecting me or anything).
Here's my code:
selectField1 = page.getElementById("pageNumSelection")
options2 = selectField1.getOptions()
theOption3 = options2[4]
This gets the option I want, and I have verified that it is the right one. So I select it:
MoreOnPage = selectField1.setSelectedAttribute(theOption3, True)
and I am stuck here. I am not sure whether selecting it works, because I don't get any message, and I'm not sure what to do next. How do I refresh the page to see the larger list? When clicking on links, all you have to do is find the link and then assign linkNameVariable.click() to a variable, and it works. But I'm not sure how to refresh a pulldown. When I try to use the webclient to create an XML page based on the select variable, I still get the old page.
To make it a bit easier, I used the HtmlUnit scripter and got some code that should work, but it's Java and I'm not sure how to port it to Jython. Here it is:
try
{
    page = webClient.getPage( url );
    HtmlSelect selectField1 = (HtmlSelect) page.getElementById("pageNumSelection");
    List<HtmlOption> options2 = selectField1.getOptions();
    HtmlOption theOption3 = null;
    // Find the option whose visible text is "100".
    for (HtmlOption option : options2)
    {
        if (option.getText().equals("100"))
        {
            theOption3 = option;
            break;
        }
    }
    selectField1.setSelectedAttribute(theOption3, true);
}
catch (Exception e)
{
    // Report any failure while loading the page or selecting the option.
    e.printStackTrace();
}
Have a look at HtmlForm's getSelectByName:
HtmlSelect htmlSelect = form.getSelectByName("stuff[1].type");
HtmlOption htmlOption = htmlSelect.getOption(3);
htmlOption.setSelected(true);
Be sure that WebClient.setJavaScriptEnabled is called. The documentation seems to indicate that it is on by default, but I think this is wrong.
Alternatively, you can use WebDriver, which is a framework that supports both HtmlUnit and Selenium. I personally find the syntax easier to deal with than HtmlUnit.
If I understand correctly, selecting an option in the select box triggers an AJAX call which, once finished, modifies some part of the page.
The problem here is that since AJAX is, by definition, asynchronous, you can't really know when the call is finished and when you may inspect the page again to find the new content.
HtmlUnit has a class named NicelyResynchronizingAjaxController; you can pass an instance of it to the WebClient's setAjaxController method. As indicated in the javadoc, this AJAX controller automatically makes the asynchronous calls that come from a direct user interaction synchronous instead. Once the setSelectedAttribute method has been called, you will thus be able to see the changes made to the original page.
The other option is to use WebClient's waitForBackgroundJavaScript method after the selection is done, and inspect the page once the background JavaScript has ended or the timeout has been reached.
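The "wait until the background work has finished or a timeout is reached" idea can be illustrated without HtmlUnit. The following is a generic polling sketch in plain Java, not HtmlUnit's implementation; the class name PollingWait and the method waitFor are made up for illustration, and the background thread merely stands in for an AJAX call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper; stands in for "wait for the asynchronous update".
class PollingWait {
    // Checks the condition every intervalMs until it holds or timeoutMs has elapsed.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        // Simulates an asynchronous call that finishes after a short delay.
        Thread background = new Thread(() -> {
            try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            done.set(true);
        });
        background.start();
        System.out.println("finished in time: " + waitFor(done::get, 2000, 50));
    }
}
```

With a real page, the condition would be something like "the table now contains more rows than before", checked against the page object after the selection.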
This isn't really an answer to the question, because I've not used HtmlUnit much before, but you might want to look at Selenium, and in particular Selenium RC. With Selenium RC you are able to control the interactions with a page displayed in a native browser (Firefox, for example). It has developer APIs for Java and Python, amongst others.
I understand that HtmlUnit uses its own javascript and web browser rendering engine and I'm wondering whether that may be a problem.