I have my webpage opened using RFT. In that page, I have a link I want to click.
For that I am using
objMap.ClickTabLink(objMap.document_eBenefitsHome(), "Upload Documentation", "Upload Documentation");
The current page link name is "Upload Documentation"
I know that objMap.document_eBenefitsHome() refers back to the initial page. What can I use in its place that refers to the currently opened page?
Many thanks in advance.
There are some alternatives that could solve your problem:
Open the Test Object Map, select the object that represents the document (document_eBenefitsHome), and change its .url property to a regular expression that matches the URLs of both pages you mentioned in your question.
Find the document object dynamically using the find method. Once the page containing the link you want to click has fully loaded, try this code to find the document: find(atDescendant(".class", "Html.HtmlDocument"), false). The false value lets the find method also search among objects that were not previously recorded.
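For the first alternative, it can help to try the regular expression outside RFT before pasting it into the .url property. Here is a minimal sketch using plain java.util.regex. The two URLs are hypothetical stand-ins (the question does not show the real ones), and RFT's own regex dialect may differ slightly from Java's, so treat this only as a quick sanity check:

```java
import java.util.regex.Pattern;

class UrlRegexCheck {
    // Hypothetical URLs standing in for the home page and the page that
    // contains the "Upload Documentation" link; replace with the real ones.
    static final String HOME_URL = "https://example.com/eBenefits/home.jsp";
    static final String UPLOAD_URL = "https://example.com/eBenefits/upload.jsp?step=1";

    // One pattern that accepts any page under /eBenefits/.
    // The dots in the host name are escaped so that they match literally.
    static final Pattern ANY_EBENEFITS_PAGE =
            Pattern.compile("https://example\\.com/eBenefits/.*");

    static boolean matches(String url) {
        // matches() requires the whole URL to fit the pattern, not just a prefix.
        return ANY_EBENEFITS_PAGE.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(HOME_URL));
        System.out.println(matches(UPLOAD_URL));
    }
}
```

If both URLs match and an unrelated URL does not, the same expression is a reasonable candidate for the .url property.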
I am working on an app in Android Studio and am having some trouble web-scraping with JSoup. I have successfully connected to the webpage and returned some basic elements to test the library, but now I cannot actually get the elements I need for my app.
I am trying to get a number of elements with the "data-at" attribute. The weird thing is, a few elements with the "data-at" attribute are returned, but not the ones I am looking for. For whatever reason my code is not extracting all of the elements that share the "data-at" attribute on the web page.
This is the URL of the webpage I am scraping:
https://express.liatoyotaofcolonie.com/inventory?f=dealer.name%3ALia%20Toyota%20of%20Colonie&f=submodel%3ACamry&f=trim%3ALE&f=year%3A2020
The method containing the web-scraping code:
@Override
protected String doInBackground(Void... params) {
String title = "";
Document doc;
Log.d(TAG, queryString.toString());
try {
doc = Jsoup.connect(queryString.toString()).get();
Elements content = doc.select("[data-at]");
for (Element e: content) {
Log.d(TAG, e.text());
}
} catch (IOException e) {
Log.e(TAG, e.toString());
}
return title;
}
The results in Logcat
The element I want to retrieve
One of the elements that is actually being retrieved
This is because some of the content, including the element you are looking for, is created asynchronously by JavaScript and is not present in the initial DOM.
When you view the source of the page you will notice that there are only 17 data-at occurrences, while running document.querySelectorAll("[data-at]") in the browser returns 29 nodes.
What you get in JSoup is the static content of the page (the initial DOM). You won't be able to fetch dynamically created content, because the required JS scripts are never run.
To overcome this, you will have to either fetch and parse the required resources manually (e.g. trace which AJAX calls the browser makes) or use a headless browser setup. Selenium + headless Chrome should be enough.
The latter option will allow you to scrape ANY possible web application, including SPAs, which is not possible using plain Jsoup.
I don't quite know what to do about this, but I'm going to try one more time... The "Problematic Lines" in your code are these:
doc = Jsoup.connect(queryString.toString()).get();
Elements content = doc.select("[data-at]");
The issue is the queryString you have requested: the URL points to a page that contains quite a bit of script code. When you load the page in a browser and inspect it with the developer tools, the DOM you see is not exactly the same HTML that is sent to and received by JSoup, because the scripts have already modified it.
If the HTML that is broadcast contains any <SCRIPT TYPE="text/javascript"> ... </SCRIPT> in it (and the named URL in your question does), AND those <SCRIPT> tags are involved in the initial loading of the page, then JSoup will not know anything about it... It only parses what it receives, it cannot process any dynamic content.
There are four ways that I know of to get the "Post Script Loaded" version of the HTML from a dynamic web-page, and I will type them here, now. The first is likely the most popular method (in Java) that I have heard about on Stack Overflow:
Selenium: This Answer will show how the tool can run JavaScript. These are some Selenium Docs. And then this page right here has a great "first class" for using the tool to retrieve post-script processed HTML. Again, there is no way JSoup can retrieve HTML that is generated in the browser by script (JS/AJAX/Angular/React), since it is just a parser.
Puppeteer: This requires Node.js, a JavaScript runtime. Perhaps calling a simple Node.js program from Java could work, but it would be a "Two Language" solution. I've never used it. Here is an answer that shows getting, sort of, what you are trying to get... The HTML after the script.
WebView: Android Java programmers have a popular class called "WebView" (documented here), which I was told about only recently (yesterday ... but it has been out for years), that will execute script in a browser and return the HTML. Here is an answer that shows "JavaScript Injection" to retrieve DOM tree elements from a "WebView" instance (which is how I was told it was done).
Splash: My favorite tool, which I don't think anyone has heard of, but which has been the simplest for me... There is an API called the "Splash API". Here is their explanation of a "JavaScript Rendering Service." Since this is the one I have been using, I'll post a code snippet below that shows how the "Splash Tool" can retrieve post-script processed HTML.
To run the Splash API (only if you have access to the docker loading program) ... You start a Splash Server as below. These two lines are typed into a GCP (Google Cloud Platform) Shell instance, and the server starts right up without any configurations:
Pull the image:
$ sudo docker pull scrapinghub/splash
Start the container:
$ sudo docker run -it -p 8050:8050 --rm scrapinghub/splash
In your code, just prepend this String to your URLs:
"http://localhost:8050/render.html?url="
So in your code, you would use the following command (instead), and the script would (more likely) load all the HTML Elements that you are not finding:
String SPLASH_URL = "http://localhost:8050/render.html?url=";
doc = Jsoup.connect(SPLASH_URL + queryString.toString()).get();
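One caveat with plain string concatenation: the inventory URL in the question carries its own query string with several & separators, so Splash may read everything after the first & as its own arguments rather than as part of the target URL. A minimal sketch of percent-encoding the target first, using only java.net.URLEncoder. The class and method names are made up for illustration, and it assumes the Splash container from above is listening on localhost:8050:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SplashUrl {
    // Assumption: a Splash container is reachable on localhost:8050.
    static final String SPLASH_RENDER = "http://localhost:8050/render.html?url=";

    /** Builds a render.html request whose url argument is the encoded target. */
    static String forTarget(String targetUrl) {
        try {
            // Encoding turns ?, & and = inside the target into %3F, %26 and %3D,
            // so they stay part of the single "url" argument.
            return SPLASH_RENDER + URLEncoder.encode(targetUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forTarget("https://example.com/a?x=1&y=2"));
    }
}
```

The result can then be handed to Jsoup.connect(...) in place of the concatenated string.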
I'm trying to determine the element of the below image/icon.
Note: Other icons have the same //div[@class='infor-collapsed-icon-img'], so I think I need another unique identifier to pick out the exact element below. The ID is dynamic, by the way.
Here's what I have tried so far using XPath:
1.) //div[@class='infor-collapsed-icon-img' and contains(@title,'Print Manager - Print Manager webpart allows the Lawson workspace user to contextually filter the print files of batch Jobs.')]
2.) //img[@title='Print Manager - Print Manager webpart allows the Lawson workspace user to contextually filter the print files of batch Jobs.']
3.) //img[contains(@title,'Print Manager - Print Manager webpart allows the Lawson workspace user to contextually filter the print files of batch Jobs.')]
Any thoughts on this? thanks
Try this XPath. First select the div and then the img tag within it.
"//div[@class='infor-collapsed-icon-img']/img"
EDIT 1: If you want to fetch a specific image, you can fetch it by using the id attribute of the tag
"//img[@id='img_WebPartTitlect100_m_g_f26cdbcd_963c_46f4_94b1_c6a4fd7a9aab']"
Or by the index of its occurrence in sequence. (I'd recommend this one since it is much cleaner)
"(//div[@class='infor-collapsed-icon-img']/img)[1]"
EDIT 2: Try using contains() to match the text partially.
"//div[@class='infor-collapsed-icon-img']/img[contains(@title, 'Print Manager')]"
If you want the 1st image
//div[@class='infor-collapsed-icon-img']/img[1]
If you want the 2nd image
//div[@class='infor-collapsed-icon-img']/img[2]
Hope it will help you :)
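Positional predicates are easy to get wrong: //div[...]/img[1] and (//div[...]/img)[1] are different queries. The first picks the first img child of every matching div, the second picks the first match overall. A small sketch with the JDK's built-in javax.xml.xpath shows the difference. The markup is hypothetical (three icon divs with one img each), not the real page:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathIndexDemo {
    // Hypothetical markup: three icon divs with the same class, one <img> each.
    static final String PAGE =
            "<table><tr>"
            + "<td><div class='infor-collapsed-icon-img'><img id='a' title='Inbox'/></div></td>"
            + "<td><div class='infor-collapsed-icon-img'><img id='b' title='Print Manager - print files'/></div></td>"
            + "<td><div class='infor-collapsed-icon-img'><img id='c' title='Reports'/></div></td>"
            + "</tr></table>";

    /** Evaluates the expression against PAGE and returns the ids of the matched elements. */
    static List<String> ids(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(PAGE)),
                            XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String icons = "//div[@class='infor-collapsed-icon-img']/img";
        System.out.println(ids(icons + "[1]"));        // first <img> child of EACH div
        System.out.println(ids("(" + icons + ")[2]")); // second match overall
        System.out.println(ids(icons + "[2]"));        // second <img> inside a single div
        System.out.println(ids(icons + "[contains(@title, 'Print Manager')]"));
    }
}
```

With this markup the four queries match [a, b, c], [b], nothing, and [b] respectively. So when each div holds a single image, the parenthesised form or the contains(@title, ...) predicate is what singles out one icon.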
You can find the element by ID
driver.findElement(By.id("imgId"));
Id's are unique, so you will have the specific element.
In your case img_WebPartTitlect100..., look for the id attribute after src attribute.
Edit :
You can also try
driver.findElement(By.cssSelector("[title*='Print Manager']"));
That will give you the element whose title contains "Print Manager".
Try this:
driver.findElement(By.xpath("//img[@src='your source']"));
Try this:
//table[@class='infor-collapsed-pane']/tr[1]/td//img
I am trying to retrieve a JSON element, the problem is that in the source code it doesn't exist, but I can find it via inspect element.
I have tried with
C.driver.findElement(By.id("ticket-parsed"))
and via XPath
C.driver.findElement(By.xpath("//*[@id=\"ticket_parsed\"]"));
and I can't find it.
Also
C.driver.switchTo().frame("html5-frame");
System.out.println(C.driver.findElement(By.id("ticket_parsed")));
C.driver.switchTo().defaultContent();
i get
[[ChromeDriver: chrome on XP (1f75e50635f9dd5b9535a149a027a447)] -> id: ticket_parsed]
on
driver.switchTo().frame(0) or driver.switchTo().frame(1)
I get an error saying that the frame doesn't exist
and at last i tried
WebElement frame = C.driver.findElement(By.id("html5-frame"));
C.driver.switchTo().frame(frame.getAttribute("ticket_parsed"));
and I got a null pointer exception
Here's an image of the source:
what am I doing wrong?
Well!
The element #ticket-parsed is inside an iframe, so you cannot access it without first switching into that iframe.
Here is the code to switch to iFrame,
driver.switchTo().frame("frame_name");
or
driver.switchTo().frame(frame_index);
In your case,
driver.switchTo().frame("html5-frame");
After switching into the iframe, you can click that element using either XPath or CSS.
C.driver.findElement(By.id("ticket-parsed"))
NOTE:
After completing the operation inside the iframe, you have to return to the main document using the following command.
driver.switchTo().defaultContent();
I didn't find a solution with my existing setup, but I did find a JS command that gets the object correctly:
document.getElementById("html5-frame").contentDocument.getElementById("ticket_parsed")
You can run JS commands like this:
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("*yourCommandHere*");
If you want to get the output of the command, just add the word return before your command (in this specific situation it didn't work, but in every other situation it did):
*TypeOfData* foo = (*TypeOfData*) js.executeScript("return *yourCommandHere*");
In the end, because of limited time, I had to use unorthodox methods like taking screenshots and checking whether the images are exactly the same.
Thanks for the help
I am using Selenium web driver. I have below method to navigate to page.
public String navigate(String url){
driver = new FirefoxDriver();
driver.get(url);
return "Success";
}
The above code works fine if the server is up. Sometimes the server might be down, and then the page will not be loaded. How can I return a "failure" string if the page is not loaded?
Thanks!
You can't directly test that a get() failed, because the browser always displays a page. You can either check that this page is a known error page, or check that you are not on the expected page.
First solution
It depends on the browser. Chrome displays a special page when it can't find a URL, Firefox another page, etc. You can test the title of those pages. For example, the Firefox error page title is something like "Page load error" or "Problem loading page". Then all you have to do is something like:
if(driver.getTitle().equals("Problem loading page"))
return "failure";
Second solution
Check for the absence of an element that is present on every page of your website (for example a logo or a home button). Say the ID of this element is "foo"; you can do something like:
if(driver.findElements(By.id("foo")).isEmpty())
return "failure";
Dave Haeffner has a good solution for checking status codes using a proxy with the webdriver configuration.
http://elementalselenium.com/tips/17-retrieve-http-status-codes
The examples are in Python, but the API is pretty close between Python and Java. I've not had much difficulty finding the Java-analogous methods for the tips I've implemented myself.
That site has a lot of good information.
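A lighter-weight variation on the status-code idea is to probe the URL with plain Java before handing it to WebDriver. This is not the proxy technique from the linked tip, only a sketch that swaps in java.net.HttpURLConnection. The class name, the method name and the "anything below 400 counts as up" rule are assumptions made for illustration:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class PreflightCheck {
    /**
     * Returns "Success" if the server answers with a status below 400,
     * and "failure" if it answers with an error status, cannot be reached
     * within the timeout, or the URL is not a valid http(s) URL.
     */
    static String probe(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode(); // sends the request
            conn.disconnect();
            return status < 400 ? "Success" : "failure";
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // IOException: refused connection, timeout, malformed URL.
            // IllegalArgumentException: string is not a valid absolute URI.
            // ClassCastException: the scheme is not http or https.
            return "failure";
        }
    }

    public static void main(String[] args) {
        // Port 1 on localhost is assumed to be closed, so this prints "failure".
        System.out.println(probe("http://localhost:1/", 1000));
    }
}
```

In navigate(), one could call probe(url, ...) first and only create the driver and call driver.get(url) when it returns "Success". Note that this fetches the page once outside the browser, so it will not see cookies or logins held by the WebDriver session.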
If you are using the Page Object Model, the LoadableComponent class can help in determining whether the page has loaded or not, whether the cause is a server being down or something else.
Here's the link
https://code.google.com/p/selenium/wiki/LoadableComponent
I developed a shop system. There is a product page, which lists the available items filtered by some select menus. There is also an item detail page that shows some content about each product. The content of that page is loaded from an XML properties file. When a user clicks an item's link in the list view to see its details, an item-specific GET parameter is set. With the parameter's value, I can dynamically load the content for that specific item from my properties by altering the name of the key being loaded.
So far so good, but not really good. So much for the background; let's get to some details.
Most of all, this is SEO-motivated. So far there is also a problem with the page instance ID in the URL for stateful pages, not only because of the unstable URL, but also because Wicket does 302 redirects to manipulate the URL. Maybe I will remove the stateful components of the item detail page to solve that problem.
Now there are QR codes on the products being sold that contain a link to my detail page. These links were not designed by me and, as you can imagine, they look quite different from the actual URL. Let's say the QR-code URL path is "/shop/item1", where item1 is the product name. My page class is ItemDetailPage.
I wrote an IRequestMapper, mounted in my WebApplication#init(), that inspects the incoming request's URL and checks whether it needs to be resolved by this IRequestMapper. If so, I build my page with PageProvider and return a request handler for it.
public IRequestHandler mapRequest(Request request) {
if(compatibilityScore>0) {
PageProvider provider = new PageProvider(ItemDetailPage.class, new ItemIDUrlParam(request.getUrl().getPath().split("/")[1]));
provider.setPageSource(Application.get().getMapperContext());
return new RenderPageRequestHandler(provider);
}
return null;
}
So as you can see, I build up a parameter that my detailpage can handle. But the resulting URL is not very nice. I'd like to keep the original url by mapping the bookmarkable content to it, without any redirect.
My first thought was to implement a URLCodingStrategy to rebuild the URL with its parameters in the form of a path. I think the HybridUrlCodingStrategy does something like that.
After resolving the URL path "/shop/item1/" with the IRequestMapper, it would look like "/shop/item?1?id=item1", where the first parameter is of course the Wicket page instance ID, which will most likely be removed as I will rebuild the detail page to be stateless :(
After applying a HybridUrlCodingStrategy it might look like "/shop/item/1/id/item1", or "/shop/item/id/item1" without the page instance ID. Another idea would be to remove the second path part and the parameter name and use only the parameter's value, so the URL would look like "/shop/item1", which is the same URL as in the request.
Do you guys have any experience with that or any smart ideas?
The requirements are:
Having one fixed URL for each product that the search-engine bot can index
no parameters
stateless and bookmarkable
no 302 redirects in any way.
the identity of the requested item must be available for the detailpage
with kind regards from germany
Marcel
As Bert stated, your use case should be covered with normal page mounting, see also the MountedMapper wiki page, for your case a concrete example:
mountPage("/shop/${id}", ShopDetailPage.class);
Given that "item1" is the ID of the item (which is not very clear to me), you can retrieve it now as the named page parameter id in Wicket. Another example often seen in SEO links, containing both the unique ID and the (non-unique, changing) title:
mountPage("/shop/${id}/${title}", ShopDetailPage.class);
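To make the placeholder idea concrete, here is a toy model of how a template such as "/shop/${id}" can be matched against a request path and turned into named parameters. This is plain Java written for illustration only; it is not Wicket's implementation, and the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MountPathSketch {
    // Finds ${name} placeholders in a mount template.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Returns the named parameters if the path fits the template, or null otherwise. */
    static Map<String, String> match(String template, String path) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between placeholders must match exactly.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            // Each placeholder captures one path segment (no slashes).
            regex.append("([^/]+)");
            names.add(m.group(1));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Matcher pathMatcher = Pattern.compile(regex.toString()).matcher(path);
        if (!pathMatcher.matches()) {
            return null;
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            params.put(names.get(i), pathMatcher.group(i + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(match("/shop/${id}", "/shop/item1"));
        System.out.println(match("/shop/${id}/${title}", "/shop/42/blue-shoes"));
    }
}
```

The point of the model is that the URL itself carries the item identity as a path segment, so no query parameter and no redirect is needed for the page to know which item was requested.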
Regarding the page instance ID, there are some ways to get rid of it. Perhaps the best way is to make the page stateless, as you said. Another easy way is to configure IRequestCycleSettings.RenderStrategy.ONE_PASS_RENDER as the render strategy (see the API docs for the consequences).