I wanted to automate some processes on www.imgur.com, and I decided to use the Selenium WebDriver library for Java. I have been able to get most of my code working, with one hitch: when I access imgur directly, only a white screen shows up, and it does not change on refresh. Accessing the sign-in page directly yields an SSL error.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

// Point Selenium at the local chromedriver binary
System.setProperty("webdriver.chrome.driver", "C:\\workspace\\Test\\chromedriver.exe");
WebDriver driver = new ChromeDriver();

// Go straight to the sign-in page and log in
driver.get("https://www.imgur.com/signin");
WebElement username = driver.findElement(By.id("username"));
username.sendKeys("username");
WebElement password = driver.findElement(By.id("password"));
String pass = "password";
password.sendKeys(pass);
password.submit();

// Then load the front page -- this is where the white screen appears
driver.get("http://www.imgur.com");
I have been able to work around this by using the links Google searches provide to imgur, but adding more features will require that I be able to manage the URL directly.
Thanks in advance!
It's just http://imgur.com/, not http://www.imgur.com. That's why Google's links work: they point to the first one, a different URL.
The www prefix is not required by any technical policy. Some sites have URLs both with and without the prefix point to the same server; others use only one or the other. It seems imgur has gone without the prefix.
Here's a little more info on the www prefix:
http://en.wikipedia.org/wiki/World_Wide_Web#WWW_prefix
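In other words, dropping the www prefix from the get() calls should be enough. A minimal sketch of the change (the rest of the sign-in flow stays exactly as in the question):

// Same code as in the question, but without the www prefix
driver.get("https://imgur.com/signin");
// ... log in exactly as before ...
driver.get("http://imgur.com/");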
Related
I have some Selenium sessions where, if certain events occur, I spawn a new browser and leave the old one as is, so I can manually intervene later. The problem is that it is hard to distinguish such a deserted browser session from the one that is currently running.
Ideally I would like to add a badge to the browser icon that is displayed in the application switcher (cmd-tab) and the dock (but other solutions/suggestions are also welcome, like add something to the name of the browser). Is that possible?
Using Java on a Mac. A solution can be platform specific.
You can use execute_script as shown below (this is Python code; the Java equivalent works the same way):
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://stackoverflow.com/questions/9943771/adding-a-favicon-to-a-static-html-page")

# Grab the <head> element and the favicon <link> currently in use
head = driver.find_element_by_tag_name("head")
link = driver.find_element_by_css_selector('link[rel="shortcut icon"]')

# Remove the old favicon link and append a new one pointing at our icon
driver.execute_script('''var link = document.createElement("link");
link.setAttribute("rel", "icon");
link.setAttribute("type", "image/png");
link.setAttribute("href", "https://i.stack.imgur.com/uOtHF.png?s=64&g=1");
arguments[1].remove();
arguments[0].appendChild(link);
''', head, link)

time.sleep(70000)  # keep the browser open so the new icon stays visible
You can use a link element in the head tag to set the favicon. The code above is an example in which the Stack Overflow site shows up with my avatar as its favicon.
You should find the current link the website uses, remove it, and replace it with your new link, as shown in the code above.
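Since the question is in Java, here is a rough Java equivalent of the Python sketch above, untested and assuming the same page and icon URL:

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

WebDriver driver = new ChromeDriver();
driver.get("https://stackoverflow.com/questions/9943771/adding-a-favicon-to-a-static-html-page");

// Same two elements as in the Python version
WebElement head = driver.findElement(By.tagName("head"));
WebElement oldIcon = driver.findElement(By.cssSelector("link[rel=\"shortcut icon\"]"));

// Run the same favicon-swapping script, passing the elements as arguments
((JavascriptExecutor) driver).executeScript(
        "var link = document.createElement('link');"
        + "link.setAttribute('rel', 'icon');"
        + "link.setAttribute('type', 'image/png');"
        + "link.setAttribute('href', 'https://i.stack.imgur.com/uOtHF.png?s=64&g=1');"
        + "arguments[1].remove();"
        + "arguments[0].appendChild(link);",
        head, oldIcon);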
I'm trying to compile the results of my class from my college's results page (exam.msrit.edu).
The USNs for my class are from 1MS16CS001-100
Is there any way that I could go about writing a scraper program to enter different values in the USN box and gather data?
I am quite new to scraping but have decent enough exposure to Python and Java.
Any advice is much appreciated :)
Not necessarily a scrape, but you can use Selenium WebDriver to browse to the page and input everything for you (see the documentation link below).
Essentially, once it is installed, you only need to instantiate the driver and then loop through a list of values, inputting them as you go.
from selenium import webdriver

# Sets up the browser. If you want to use Chrome, additional setup is required.
browser = webdriver.Firefox()

for i in range(1, 101):  # loops over the 100 USN suffixes
    browser.get("http://exam.msrit.edu/")  # HTTP GET request to the page
    elem = browser.find_element_by_id('id')  # an HTML id; could also use name, xpath, etc.
    elem.send_keys("1MS16CS{:03d}".format(i))  # types a USN such as 1MS16CS001
    elem = browser.find_element_by_id('id')  # id of the search button
    elem.click()  # clicks that element
The documentation on Selenium is pretty good: http://selenium-python.readthedocs.io/navigating.html
This will open an actual browser and take some time to load, so it will not be the quickest way to do it, but it will work. You can even take screenshots.
Half a year ago, I was able to access Factiva.com (a news database with access control) with a Selenium driver, searching news automatically. However, two months ago, Factiva updated its access protocol from HTTP to HTTPS, which broke my program.
Under HTTP, my program could switch to a new URL within the same Chrome window (even a new tab). I did this with chrome.get(URL), where the URL is a direct search link; I could even issue a new search request this way. Under HTTP search, you can just combine keywords and logic operators to produce a new URL, and chrome.get(URL) switches to the new search results.
However, under HTTPS, I cannot use the above method anymore. When I use it, Factiva tells me that I am logged out and should log in again (even though, if I click the back button with my mouse, I can return to the Factiva page without logging in).
My question is: why does my old method not work under HTTPS? How can I let Factiva know that, even when I use the old method, I already have access and need not log in again?
My old method under HTTP:
class SearchTXT {
    public String getSearchUrl(String keyword, String startTime, String endTime) {
        String f0 = "http://global.factiva.com.***.uk/zhcn/du/headlines.asp?napc=S&searchText=";
        String f2 = "&dateRangeMenu=custom&dateFormat=dmy&dateFrom=";
        String f4 = "&dateTo=";
        String f6 = "&sortBy=y&currentSources=";
        String f7 = "Wall Street Journal";
        String f8 = "&searchLanguage=custom&searchLang=EN&dedupe=0&srchuiver=2&accountid=9MEM000300&namespace=16";
        return f0 + keyword + f2 + startTime + f4 + endTime + f6 + f7 + f8;
    }
}
After logging in to the Factiva database with the Selenium driver, I have a Chrome session with access to Factiva.
Then I use this:
SearchTXT st = new SearchTXT();
String searchURL = st.getSearchUrl("DELL", "2011-11-10", "2014-10-10");
chrome.get(searchURL);
String pageSource = chrome.getPageSource();
Then I can use Jsoup to parse the page and collect the search results.
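(For illustration, a minimal sketch of that parsing step; the "a.headline" selector is a placeholder, not Factiva's real markup:)

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Parse the page source captured by Selenium and print each hit;
// the selector below is hypothetical and depends on Factiva's markup.
Document doc = Jsoup.parse(pageSource);
Elements headlines = doc.select("a.headline");
for (Element headline : headlines) {
    System.out.println(headline.text());
}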
But this method has not worked since the update from HTTP to HTTPS. These days I am very frustrated by this problem.
Are there alternative methods that can access the data under HTTPS? Maybe there is some trick that would make my old method work again. Otherwise, I will have to modify my whole key program and use JavascriptExecutor to deal with the problem, which means heavy work. Worse, the JavaScript code on the Factiva site is very complicated, and some key JavaScript functions are not even shown directly in the Factiva page source.
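(One lightweight thing that might be worth trying before a full rewrite, sketched below: navigating with JavascriptExecutor from inside the already-logged-in page instead of calling chrome.get(), so the request carries a Referer and the page's session state. Whether this satisfies Factiva's session checks is an untested assumption:)

import org.openqa.selenium.JavascriptExecutor;

// Navigate in-page rather than with chrome.get(); untested against Factiva.
((JavascriptExecutor) chrome).executeScript(
        "window.location.href = arguments[0];", searchURL);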
Thanks a LOT!
I am using Selenium WebDriver. I have the method below to navigate to a page.
public String navigate(String url) {
    driver = new FirefoxDriver();
    driver.get(url);
    return "Success";
}
The above code works fine if the server is up. Sometimes the server might be down, and then the page will not load. How can I return a "failure" string if the page is not loaded?
Thanks!
You can't directly test that a get() failed, because the browser always displays a page. You can either check that this page is a known error page, or check that you are not on the expected page.
First solution
It depends on the browser. Chrome displays a special page when it can't find a URL, Firefox another page, etc. You can test the titles of those pages. For example, the Firefox error page title is something like "Page load error" or "Problem loading page". Then all you have to do is something like:
if(driver.getTitle().equals("Problem loading page"))
return "failure";
Second solution
Check for the absence of an element that is present on every page of your website (for example, a logo or a home button). Say the ID of this element is "foo"; you can do something like:
if(driver.findElements(By.id("foo")).isEmpty())
return "failure";
Dave Haeffner has a good solution for checking status codes using a proxy with the webdriver configuration.
http://elementalselenium.com/tips/17-retrieve-http-status-codes
The examples are in Python, but the API is pretty close between Python and Java. I've not had much difficulty finding the analogous Java methods for the tips I've implemented myself.
That site has a lot of good information.
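For a rough idea of what that approach looks like in Java, here is a sketch using BrowserMob Proxy (my choice of proxy library, which fits that tip; the snippet follows the standard BrowserMob pattern but is untested here):

import net.lightbody.bmp.BrowserMobProxy;
import net.lightbody.bmp.BrowserMobProxyServer;
import net.lightbody.bmp.client.ClientUtil;
import net.lightbody.bmp.core.har.HarEntry;
import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.remote.CapabilityType;
import org.openqa.selenium.remote.DesiredCapabilities;

// Route the browser through a local proxy and read status codes from the HAR log.
BrowserMobProxy proxy = new BrowserMobProxyServer();
proxy.start(0);

DesiredCapabilities caps = new DesiredCapabilities();
Proxy seleniumProxy = ClientUtil.createSeleniumProxy(proxy);
caps.setCapability(CapabilityType.PROXY, seleniumProxy);
WebDriver driver = new FirefoxDriver(caps);

proxy.newHar();
driver.get("http://example.com/");  // hypothetical URL
for (HarEntry entry : proxy.getHar().getLog().getEntries()) {
    System.out.println(entry.getResponse().getStatus()
            + " " + entry.getRequest().getUrl());
}

driver.quit();
proxy.stop();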
If you are using the Page Object Model, leveraging the LoadableComponent class can help determine whether the page loaded, whether the failure is the server being down or something else.
Here's the link
https://code.google.com/p/selenium/wiki/LoadableComponent
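A minimal sketch of that pattern (the HomePage class, URL, and title check are illustrative assumptions):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.LoadableComponent;

// get() calls isLoaded(); if that throws, load() runs and isLoaded() is
// checked once more, so a dead server surfaces as an Error you can catch.
public class HomePage extends LoadableComponent<HomePage> {
    private final WebDriver driver;

    public HomePage(WebDriver driver) {
        this.driver = driver;
    }

    @Override
    protected void load() {
        driver.get("http://example.com/");  // hypothetical URL
    }

    @Override
    protected void isLoaded() throws Error {
        if (!driver.getTitle().contains("Example")) {  // hypothetical check
            throw new Error("Home page did not load");
        }
    }
}

Calling new HomePage(driver).get() then either returns the loaded page or throws an Error, which maps naturally onto the "Success"/"failure" return in the question.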
I'm trying to extract Google Translate's pinyin transliteration of a Chinese word using Selenium but am having some trouble finding its WebElement.
For example, the word I look up is "事". My code would be as follows:
String word = "事";
WebDriver driver = new HtmlUnitDriver();
driver.get("http://translate.google.com/#zh-CN/zh-CN/" + word);
When I go to the actual page in my browser, I can see that its pinyin is "Shì" and that its id, according to Inspect Element, is src-translit. However, when I view the page source, although id="src-translit" is present, there is nothing resembling "Shì" nearby; it is simply empty.
Thinking that the page had not had time to load properly, I implemented a waiting period of 30 seconds (kind of a long wait, I know, but I just wanted to see whether it would work).
int timeoutInSeconds = 30;
WebDriverWait wait = new WebDriverWait(driver, timeoutInSeconds);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("src-translit")));
Unfortunately, even with the wait time, transliteration and its text still come back empty.
WebElement transliteration = driver.findElement(By.id("src-translit"));
String pinyin = transliteration.getText();
My question, then, is: what has happened to src-translit? Why won't it display in the HTML source, and how can I go about finding it and copying it from Google Translate?
Sounds like JavaScript isn't being executed. Looking at the docs, you can enable JavaScript like this:
HtmlUnitDriver driver = new HtmlUnitDriver();
driver.setJavascriptEnabled(true);
or
HtmlUnitDriver driver = new HtmlUnitDriver(true);
See if that makes a difference.
EDIT:
I still think the problem is related to JavaScript. When I run it using FirefoxDriver, it works fine: the AJAX request is made, and the src-translit element is updated with "Shì".
Workaround:
In any case, by monitoring the network traffic you can see that when you translate 事, the page makes an AJAX call to:
http://translate.google.com/translate_a/t?client=t&sl=zh-CN&tl=zh-CN&hl=en&sc=2&ie=UTF-8&oe=UTF-8&pc=1&oc=1&otf=1&rom=1&srcrom=1&ssel=0&tsel=0&q=%E6%B2%92%E4%BA%8B
Which returns JSON:
[[["事","事","Shì","Shì"]],,"zh-CN",,[["事",,false,false,0,0,0,0]],,,,[],10]
Maybe you could parse that instead for now.
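For example, here is a minimal sketch that fetches that endpoint directly and pulls the transliteration out with a crude split (the query parameters are trimmed from the captured request above; Google may change or block this endpoint at any time, and the response is not standard JSON, so a real JSON parser may reject it):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class TranslitFetch {
    public static void main(String[] args) throws Exception {
        String word = "事";
        String url = "http://translate.google.com/translate_a/t?client=t"
                + "&sl=zh-CN&tl=zh-CN&ie=UTF-8&oe=UTF-8&rom=1&srcrom=1"
                + "&q=" + URLEncoder.encode(word, "UTF-8");

        // Fetch the raw response body
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        StringBuilder body = new StringBuilder();
        for (String line; (line = in.readLine()) != null; ) {
            body.append(line);
        }
        in.close();

        // For [[["事","事","Shì","Shì"]],...], splitting on quotes puts the
        // transliteration at index 5 -- a crude extraction, not real parsing.
        String[] fields = body.toString().split("\"");
        System.out.println(fields[5]);  // "Shì" in the captured example
    }
}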