I have successfully saved the source code of a webpage, but some webpages serve their content dynamically, so I cannot get the information that's actually on the page. If I right-click and choose "Inspect Element", it's there. Is there any way to copy what is loaded on that webpage and save it to a text file?
If you're looking for the web page source, use Selenium WebDriver.
Create a driver instance (WebDriver itself is an interface, so instantiate a concrete driver such as ChromeDriver):
WebDriver driver = new ChromeDriver();
Call the function below once you think the dynamically served content has appeared on the page:
driver.getPageSource();
This should work.
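A minimal sketch of that idea, assuming Chrome, a placeholder URL, and a hypothetical element id ("content") that only exists once the dynamic data has loaded:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SavePageSource {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com"); // placeholder URL
            // Wait for a hypothetical element that appears only after the
            // dynamic content has loaded
            new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(By.id("content")));
            // Save the rendered HTML to a text file
            Files.write(Paths.get("page.html"), driver.getPageSource().getBytes());
        } finally {
            driver.quit();
        }
    }
}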
Use Selenium WebDriver to get the page source; you can even get the values of particular elements (input boxes, checkboxes, dropdowns, etc.) from the webpage. You can also enter values and submit forms.
// Create a new instance of the HtmlUnit driver
// Notice that the remainder of the code relies on the interface,
// not the implementation.
WebDriver driver = new HtmlUnitDriver();
// And now use this to visit Google
driver.get("http://www.google.com");
System.out.println( driver.getPageSource());
driver.quit();
Refer to getPageSource() in Selenium WebDriver (a.k.a. Selenium 2) using Java:
https://code.google.com/p/selenium/wiki/GettingStarted
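Note that HtmlUnitDriver keeps JavaScript disabled by default, so for dynamically served content you will likely need to enable it when constructing the driver:
// Pass true to enable JavaScript, which HtmlUnitDriver disables by default
WebDriver driver = new HtmlUnitDriver(true);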
I'm practicing and trying to parse behance.net to retrieve .jpg files.
First, I tried JSoup, but I only received JS code without any useful content. Then I tried Selenium:
System.setProperty("webdriver.chrome.driver", "S:\\behance-id\\src\\main\\resources\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("https://www.behance.net/gallery/148589707/Hercules-and-Randy");
String str = driver.getPageSource();
And I got the same result. Through Google Chrome's Inspect option I found what I need, but I cannot access that rendered page source via Selenium, JSoup, or other tools. I only receive the version full of <script> tags.
Is it possible to get the rendered HTML?
That page loads its resources dynamically, after the original HTML, so you should use waits in Selenium. This is the Java example from the documentation for waiting for an element to be loaded on the page:
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
WebDriver driver = new ChromeDriver();
driver.get("https://google.com/ncr");
driver.findElement(By.name("q")).sendKeys("cheese" + Keys.ENTER);
// Initialize a wait and block until the element (link) becomes clickable - timeout in 10 seconds
WebElement firstResult = new WebDriverWait(driver, Duration.ofSeconds(10))
        .until(ExpectedConditions.elementToBeClickable(By.xpath("//a/h3")));
// Print the first result
System.out.println(firstResult.getText());
The documentation can be found at https://www.selenium.dev/documentation/webdriver/waits/
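Applied to the original goal of collecting .jpg URLs, a rough sketch (the generic img locator is an assumption, not taken from behance.net's actual markup) could wait for the images and then read their src attributes:
// Wait until at least one image is present in the DOM
new WebDriverWait(driver, Duration.ofSeconds(10))
        .until(ExpectedConditions.presenceOfAllElementsLocatedBy(By.tagName("img")));
// Collect the .jpg URLs from the rendered page
for (WebElement img : driver.findElements(By.tagName("img"))) {
    String src = img.getAttribute("src");
    if (src != null && src.contains(".jpg")) {
        System.out.println(src);
    }
}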
I'm trying to use the jsoup library to get 'li' elements from a website. The problem is this:
If I open the source of the website with CTRL+U (which is the same HTML that jsoup reads), the 'ul' tag is hidden.
If I open the code with the "inspect" function of Google Chrome, the 'li' elements are shown.
Posting the code is not necessary; I only want to know how I can access these 'li' elements with jsoup or other free Java libraries, given that in the source code (and through jsoup) this information is hidden.
The site is https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco - try searching for something (e.g. Tachi).
The problem with Jsoup is that it won't handle scripts; it just gets the HTML as it is, before the AJAX code is executed.
You can use something like HtmlUnit, which is basically a GUI-less browser. So, it can handle scripts.
You can try something like this after getting the HtmlUnit library:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlUnorderedList;
String url = "https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco?search=Tachi";
try (final WebClient webClient = new WebClient()) {
    final HtmlPage page = webClient.getPage(url);
    final HtmlUnorderedList list = page.getHtmlElementById("ul_farm_results");
    System.out.println(list.asText());
}
I couldn't check the code as the website's certificate is improperly configured and I didn't want to import its certificate. You may want to take a look at this to resolve the certificate errors.
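If you do run into those certificate errors, HtmlUnit can also be told to accept insecure SSL; a small sketch of the same snippet with that option:
try (final WebClient webClient = new WebClient()) {
    // Accept misconfigured or self-signed certificates (test code only)
    webClient.getOptions().setUseInsecureSSL(true);
    final HtmlPage page = webClient.getPage(url);
    final HtmlUnorderedList list = page.getHtmlElementById("ul_farm_results");
    System.out.println(list.asText());
}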
JSoup does not execute scripts; it just gets the HTML returned by the server. What you are looking for is called the rendered HTML, that is, the HTML produced by the browser after executing all the scripts.
The best solution in Java is to use Selenium with your preferred browser. Selenium was developed for UI testing; it is, however, very popular as a scraping tool.
A good getting started page is to be found here.
Some code example with Firefox:
WebDriver driver = new FirefoxDriver();
driver.get("https://farmaci.agenziafarmaco.gov.it/bancadatifarmaci/cerca-farmaco");
// Find the element
String id = "ul_farm_results";
WebElement element = driver.findElement(By.id(id));
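Since the list is only populated after a search runs, the plain findElement call may fire too early; a hedged variant that waits for the element to be added to the DOM first:
// Wait up to 10 seconds for the results list to appear before reading it
WebElement element = new WebDriverWait(driver, Duration.ofSeconds(10))
        .until(ExpectedConditions.presenceOfElementLocated(By.id(id)));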
I am trying to write some UI test cases for an SAP-webUI (web based) application. After login, it shows the dashboard (Workcenters) screen.
Now the problem is: I am able to open the page, enter the username and password, and log in through Selenium. After I press the "Login" button, the URL changes and the page gets redirected/refreshed.
E.g. URL before login : https://a/b/c/d/e/f/g.htm?sap-client=001&sap-sessioncmd=open
E.g. URL after successful Login : https://a/b(bDsdfsdsf1lg==)/c/d/e/f/g.htm
After this I'm unable to perform any action or click any link on any part of the page. I tried all the possible locators (css, xpath, id). WebDriver was unable to find any element on the page; it only shows the error "No element found".
I am using java with Selenium Web Driver.
Please find the html structure of the webpage below
<html><body><div><div><iframe>#document<html><head></head><frameset><frameset><frame>#document<html><head></head><body><form><div><div><table><tbody><tr><td><div><ul><li><a id=abcdef></a></li></ul></div></td></tr></tbody></table></div></div></form></body></html></frame></frameset></frameset></html></iframe></div></div></body></html>
Actually I want to click a link menu "abcd", which is inside an iframe and a frame, as shown in the HTML code below:
<html><head></head><body><iframe name=a1><html><head></head><frameset><frameset name=fs1><frame name=f1><html><head></head><body><table><tbody><tr><td><ul><li><a id=abcdef>
I tried the code below as well:
driver.switchTo().frame("a1");
driver.findElement(By.id("abcd")).click();
OR
driver.findElement(By.xpath("//*[@id='abcd']")).click();
After using the above code, I'm still getting the error "No such element".
Kindly suggest
Regards,
Siva
Do it this way...
driver.switchTo().frame(driver.findElement(By.xpath("//iframe[@name='a1']"))); // switching to the iframe
followed by
driver.switchTo().frame("f1"); // switch to frame
and then your desired action...
driver.findElement(By.id("abcd")).click();
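Once you are done inside the nested frames, remember to switch back before locating elements elsewhere on the page:
// Return to the top-level document after working inside the frames
driver.switchTo().defaultContent();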
This is because of the iframe. You need to switch to it first:
driver.switchTo().frame(0);
driver.findElement(By.id("abcdef")).click();
where 0 is the frame index.
See the doc on explicit waits here.
I suggest you add an explicit wait until your chosen element is available.
Modify this code to suit your chosen element:
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement element = wait.until(ExpectedConditions.elementToBeClickable(By.id("someid")));
I'm trying to get page content which is dynamically loaded after you reach the end of the page, but after using sendKeys(Keys.END) the page content seems to be the same.
Any chance to handle it and get the newly loaded source? I'm using PhantomJS in Java.
The basic idea is to explicitly wait for a presence or visibility of an element loaded dynamically - then, call the getPageSource() method.
For example, if there is a div with an "additionalContent" class that is dynamically loaded, here is how you can wait for it to appear:
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
wait.until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("div.additionalContent")));
// get the page source
System.out.println(driver.getPageSource());
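If sendKeys(Keys.END) has no effect in PhantomJS, scrolling can also be triggered directly through JavaScript (a common workaround, not part of the original answer) and then combined with the wait above:
// Scroll to the bottom of the page via JavaScript to trigger lazy loading
((JavascriptExecutor) driver).executeScript(
        "window.scrollTo(0, document.body.scrollHeight);");
// Then wait for the dynamically added content as shown above
wait.until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("div.additionalContent")));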
I'm trying to use Selenium (Java) to automate some searches.
When I go to the page and click Inspect Element on the search bar, I see that it has the id searchBox and name q, both of which are potentially useful. Nevertheless, these properties are nowhere to be found in the HTML when I go to View Source.
When I try to use
WebElement search = driver.findElement(By.id("searchBox"));
or
WebElement search = driver.findElement(By.name("q"));
both fail with an element-not-found error.
How do I proceed with populating the search field and then hitting submit (the submit button is also missing in the page source) if I can't find either of these elements?
EDIT:
As requested, here's a more detailed picture of what's wrong:
The URL of the page accessed by WebDriver is http://www.ycharts.com, loaded using the line:
driver.get("http://www.ycharts.com/login");
If you go to this page with your actual web browser, right click on the search bar and choose Inspect Element, you'll see that its ID is "searchBox" and its name is "q". However, if you go to view the same page's source, you'll see that there's no such element in the HTML. Clearly, this is why WebDriver can't find it.
driver was initiated as follows:
WebDriver driver = new HtmlUnitDriver();
When I try something like
WebElement search = driver.findElement(By.id("searchBox"));
The exception returned is:
Exception in thread "main" org.openqa.selenium.NoSuchElementException: Unable to locate element with ID: searchBox
So, back to the original question: clearly, the element is there, but it's not in the HTML – how do you interact with it?
The problem is caused by the fact that the search box is added to the HTML by JavaScript after the page loads.
The HTML returned by http://ycharts.com/ does not contain the search box, so when Selenium thinks the page has finished loading (i.e. DOM ready state), there is no search box.
In order to interact with the search box, you need to tell Selenium to wait until the box appears in the DOM.
The easiest way to achieve this is with WebDriverWait:
WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("searchBox")));
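One more caveat: the driver in the question is an HtmlUnitDriver, which keeps JavaScript disabled by default, so even with a wait the search box may never be added to the DOM. Enabling JavaScript when constructing the driver is likely also required:
// HtmlUnitDriver does not run JavaScript unless explicitly enabled
WebDriver driver = new HtmlUnitDriver(true);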