One of the ways WebDriver identifies itself as a bot to external websites is by setting the webdriver-active flag to true.
A user on SO suggested that it is possible to modify the ChromeDriver source code to remove all bot-identifying attributes (see this and this response).
Is it possible to achieve a similar outcome with Firefox by modifying the source code of GeckoDriver, Firefox WebDriver, or perhaps both? I'm asking because there is currently no way to conceal WebDriver using Firefox Options without modifying the source code.
If we can somehow remove the bot-identifying features from the source code, we can prevent WebDriver from being identified as a bot without needing to bundle Tor with Firefox.
While there's no getting around the fact that Selenium (in its present state) identifies itself, surely we can modify the source code to remove all identification, similar to how it's achieved in ChromeDriver?
In the discussion Can a website detect when you are using Selenium with chromedriver?, different users suggested opening the ChromeDriver binary in a hex editor and editing the document variables by replacing the cdc_ and $wdc_ strings. That might be possible for ChromeDriver, but achieving the same with GeckoDriver may not be possible.
Moreover, the commands like execute_cdp_cmd() and Python libraries like selenium-stealth may not be currently supported by GeckoDriver.
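The hex-editor trick for ChromeDriver boils down to a same-length byte substitution inside the driver binary. The sketch below shows that idea under stated assumptions: the function names and the replacement value dog_ are hypothetical, and the length check matters because changing the length of a string would shift every later offset in the executable. As noted above, this is only known to be suggested for ChromeDriver, not GeckoDriver.

```python
from pathlib import Path


def patch_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new` in a binary blob.

    The replacement must be exactly as long as the original; otherwise
    offsets inside the executable would shift and corrupt it.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    if old not in data:
        raise ValueError(f"{old!r} not found")
    return data.replace(old, new)


def patch_file(path: Path, old: bytes, new: bytes) -> int:
    """Patch a driver binary in place, keeping a .bak copy of the original.

    Returns the number of occurrences that were replaced.
    """
    data = path.read_bytes()
    patched = patch_bytes(data, old, new)  # raises before anything is written
    path.with_name(path.name + ".bak").write_bytes(data)
    path.write_bytes(patched)
    return data.count(old)
```

A hypothetical call would be patch_file(Path("chromedriver"), b"cdc_", b"dog_"), run against a copy of the driver rather than the only one you have.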
The GeckoDriver source code can easily be downloaded from the mozilla / geckodriver page in both zip and tar.gz format. If you are on a Windows system, you can unzip the downloaded file and find the source code of the different modules in the ...\geckodriver-0.30.0\src directory.
Additionally, geckodriver is made available under the Mozilla Public License. GeckoDriver source code can also be found in mozilla-central under testing/geckodriver.
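Once the archive is unpacked, a quick way to find where a particular behaviour lives before modifying anything is to search the Rust sources. This is a minimal sketch; find_in_sources is a hypothetical helper, not something the geckodriver project ships, and the search term you pass is up to you.

```python
from __future__ import annotations

from pathlib import Path


def find_in_sources(src_dir: Path, needle: str) -> list[tuple[str, int]]:
    """Return (relative path, 1-based line number) for each .rs line containing needle."""
    hits = []
    for path in sorted(src_dir.rglob("*.rs")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(src_dir).as_posix(), lineno))
    return hits
```

Point it at the extracted src directory and print the hits to get a list of candidate files to read.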
WebDriver Specifications
As per the WebDriver W3C Editor's Draft:
The webdriver-active flag is set to true when the user agent is under remote control. It is initially false.
So there are two possible ways to keep the webdriver flag false:
Remove the readonly attribute, so that it can be edited at runtime (as discussed in this answer).
Prevent WebDriver from emitting the signals that the user agent is under remote control.
To me, the second option looks viable, as the most frequently updated tier is the second tier (the Selenium WebDriver.dll and WebDriver.Support.dll modules). Since App Studio uses C# and .NET 4.0 (before Selenium 4.1.0, released November 22, 2021) to communicate with Selenium, you need to download the .NET 4.0 version of the Selenium modules. The current stable version is 4.1.0. Once the zip file is downloaded, extract its contents to a folder and navigate to the net40 subfolder.
Now you can copy the WebDriver.dll and WebDriver.Support.dll files to the bin folder of the App Studio installation (e.g. C:\ibi\AppStudio82\bin) and make the required changes.
As an alternative, you can also download the NuGet package, copy its .NET 4.0 content into the bin folder of the App Studio installation, and make the required changes.
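The copy step above can be scripted so that both assemblies always move together. A minimal sketch, assuming you already extracted the zip and know where its net40 subfolder is; copy_selenium_dlls is a hypothetical helper, and the destination would be something like the C:\ibi\AppStudio82\bin folder mentioned above.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The two assemblies named in the steps above.
SELENIUM_DLLS = ("WebDriver.dll", "WebDriver.Support.dll")


def copy_selenium_dlls(net40_dir: Path, bin_dir: Path) -> list[Path]:
    """Copy the .NET 4.0 Selenium assemblies into an App Studio bin folder.

    Checks that both DLLs exist before copying anything, so a missing file
    never leaves the bin folder with a mismatched pair.
    """
    missing = [name for name in SELENIUM_DLLS if not (net40_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in {net40_dir}: {', '.join(missing)}")
    # copy2 preserves timestamps and overwrites existing DLLs in bin_dir,
    # so back up the originals first if you may need to roll back.
    return [Path(shutil.copy2(net40_dir / name, bin_dir / name)) for name in SELENIUM_DLLS]
```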
tl;dr
Building geckodriver
Testing geckodriver
Related
I am trying to use the headless feature of Chrome to convert an HTML file to PDF. However, I am not getting any output at all, and the console doesn't show any error either. I am running the command below on my Windows machine:
chrome --headless --disable-gpu --print-to-pdf
I tried all the various options. Nothing is being generated. I am using Chrome version 60.
Command Line --print-to-pdf
By default, --print-to-pdf attempts to create a PDF in the User Directory. By default, that user directory is where the actual chrome binary is stored, which is the specific version folder for the version you're running - for example, "C:\Program Files (x86)\Google\Chrome\Application\61.0.3163.100". And, by default... Chrome is not allowed to write to this folder. You can watch it try, and fail, by adding --enable-logging to your command.
So unfortunately, by default, this command fails.*
You can solve this either by providing a path in the argument where Chrome can write, like
--print-to-pdf="C:\Users\Jane\test.pdf"
Or, you can change the User Directory:
--user-data-dir="C:\Users\Jane"
One reason you might prefer to change the User Directory is if you want the PDF to automatically receive its name from the webpage; Chrome looks at the title tag and then dumps it like <title>My Page</title> => My-Page.pdf
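If you drive Chrome from a script rather than a console, passing the arguments as a list avoids the shell-quoting issues around paths with spaces. This is a minimal sketch using only the switches mentioned in this answer (--headless, --disable-gpu, --enable-logging, --user-data-dir, --print-to-pdf); the helper names and the assumption that the binary is reachable as chrome on your PATH are mine.

```python
from __future__ import annotations

import subprocess


def build_print_to_pdf_cmd(url: str, pdf_path: str | None = None,
                           user_data_dir: str | None = None,
                           chrome: str = "chrome",
                           enable_logging: bool = False) -> list[str]:
    """Build the argument list for a headless Chrome print-to-PDF run."""
    cmd = [chrome, "--headless", "--disable-gpu"]
    if enable_logging:
        cmd.append("--enable-logging")  # lets you watch a failed write attempt
    if user_data_dir:
        cmd.append(f"--user-data-dir={user_data_dir}")
    # With an explicit, writable path Chrome writes there; without one it
    # falls back to its user directory, as described above.
    cmd.append(f"--print-to-pdf={pdf_path}" if pdf_path else "--print-to-pdf")
    cmd.append(url)
    return cmd


def print_to_pdf(url: str, pdf_path: str, chrome: str = "chrome") -> None:
    """Run Chrome; no shell is involved, so the path needs no extra quotes."""
    subprocess.run(build_print_to_pdf_cmd(url, pdf_path, chrome=chrome), check=True)
```

For example, print_to_pdf("https://www.google.com/", r"C:\Users\Jane\test.pdf") corresponds to the first fix above.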
*I think this default behavior is super confusing, and should be filed as a bug against Chrome. However, apparently part of the Chrome team is outright opposed to the mere existence of this command line option, and instead believe it would be better to force everyone using it to get a node.js build going with Puppeteer and the flag removed outright.
Limitations of Command Line on Windows
Invoking Chrome in this way will work fine, for example, in a local dev environment on IIS Express with Visual Studio, but it will fail, even in headless mode, on a server running IIS, because IIS users are not given interactive/desktop permissions, and the way Chrome produces this PDF actually requires them. There are complicated ways to grant those permissions, but every guide on how to do it begins with DON'T PROVIDE INTERACTIVE/DESKTOP PERMISSIONS. Further, the risk mentioned above, that Chrome may one day remove the command-line option, makes all the effort needed to get it working an iffy proposition.
Alternatives to chrome command line
wkhtmltopdf
wkhtmltopdf is a separate command-line tool that renders HTML with the Qt WebKit engine rather than with Chrome. I haven't tried it, but it's likely this will get the job done. The one minor risk is that when producing PDFs in Chrome, testing is obvious: view the page in Chrome, and open Print Preview if you're nervous. wkhtmltopdf uses a different rendering engine, and that may produce rendering differences. Maybe.
Selenium
Another alternative is to get ahead of the group looking to get rid of --print-to-pdf and use the browser dev API (via Selenium) as they prefer.**
using System;
using System.Collections.Generic;

private static void pdfSeleniumImpl(string url, string pdfPath)
{
    var options = new OpenQA.Selenium.Chrome.ChromeOptions();
    options.AddArgument("headless");
    using (var chrome = new OpenQA.Selenium.Chrome.ChromeDriver(options))
    {
        chrome.Url = url;
        // An empty dictionary asks for Chrome's default print settings.
        var printToPdfOpts = new Dictionary<string, object>();
        var resultDict = (Dictionary<string, object>)
            chrome.ExecuteChromeCommandWithResult(
                "Page.printToPDF", printToPdfOpts);
        // The PDF comes back base64-encoded in the "data" field.
        dynamic result = new DDict(resultDict);
        string data = result.data;
        var pdfFile = Convert.FromBase64String(data);
        System.IO.File.WriteAllBytes(pdfPath, pdfFile);
    }
}
The DDict above is the GracefulDynamicDictionary from another of my answers.
https://www.nuget.org/packages/GracefulDynamicDictionary/
https://github.com/b9chris/GracefulDynamicDictionary
https://stackoverflow.com/a/24192518/176877
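The last step of the C# method, decoding the "data" field that Page.printToPDF returns and writing it to disk, looks like this in Python. In the Python Selenium bindings a Chrome driver exposes execute_cdp_cmd, so a call such as driver.execute_cdp_cmd("Page.printToPDF", {}) should yield a dict of this shape; write_pdf_from_cdp_result is a hypothetical helper, and the %PDF header check is an extra sanity check I added.

```python
import base64
from pathlib import Path


def write_pdf_from_cdp_result(result: dict, pdf_path: str) -> int:
    """Decode the base64 'data' field of a Page.printToPDF result and save it.

    Returns the number of bytes written.
    """
    pdf_bytes = base64.b64decode(result["data"])
    # Every PDF file starts with "%PDF"; anything else means the call failed.
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("decoded data does not look like a PDF")
    Path(pdf_path).write_bytes(pdf_bytes)
    return len(pdf_bytes)
```

Unlike the C# version, this needs no dynamic-dictionary wrapper, because the Python bindings already return a plain dict.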
Ideally this would be async, since all the calls to Selenium are actually network commands, and writing that file could take a lot of Disk IO. The data returned from Chrome is actually a Stream as well. However Selenium's conventionally used library does not use async at all unfortunately, so it would take upgrading that library or identifying a solid async Selenium library for .Net to really do this right.
https://github.com/puppeteer/puppeteer/blob/master/lib/Page.js#L1007
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
**The Page.pdf chrome Dev API command is also deprecated, so if that contingent gets their way, neither the command line nor the Dev API will work. That said it looks like those lobbying to wreck it gave up 2 years ago.
This is working:
chrome --headless --disable-gpu --print-to-pdf=file1.pdf https://www.google.co.in/
This creates the file in the folder C:\Program Files (x86)\Google\Chrome\Application\61.0.3163.100.
Do not forget to open your terminal/cmd with admin rights :) Otherwise it will just not save the file at all.
I was missing the "=" after the --print-to-pdf flag.
The correct command is:
chrome --headless --disable-gpu --print-to-pdf="C:/temp/name.pdf" https://www.google.com/
Now it is working.
Extending the brilliantly simple answer by suraj, I created a small function in a file my shell sources, so it works like a CLI tool:
function webtopdf(){
    # $1 = URL, $2 = output PDF; quoted so paths with spaces work
    chromium-browser --headless --disable-gpu --print-to-pdf="$2" "$1"
}
so a quick
webtopdf https://goo.com/some-article some-article.pdf
does the job for me now
This worked for me on Windows:
start chrome --headless --disable-gpu --print-to-pdf=C:\Users\username\pdfs\chrome.pdf --no-margins https://www.google.com
Currently, this is only available for Linux and Mac OS.
The FileDownloader class provided in the question below worked fine until I upgraded to Selenium 2.46:
Programmatically downloading a file using Selenium in Java
When I run the same test with Selenium 2.46, I now get redirected to the login page. Has anyone else faced this issue?
The Selenide project has a great, really well-thought-out download helper. I would investigate it, either as an example or possibly by actually using it.
$.download()
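Download helpers of this kind work by replaying the browser's session cookies on their own HTTP request; if the session cookie is not carried over, the server sees an anonymous request, which would be consistent with a redirect to the login page (an assumption, not something verified against 2.46). A minimal Python sketch of that cookie-copying step, assuming cookies in the list-of-dicts shape that Selenium's get_cookies() returns (each with "name" and "value" keys); both helper names are hypothetical.

```python
import urllib.request
from typing import Optional


def cookie_header(cookies: list) -> str:
    """Build a Cookie header value from Selenium-style cookie dicts.

    Note: this sends every cookie as-is and ignores domain/path scoping.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_download_request(url: str, cookies: list,
                           user_agent: Optional[str] = None) -> urllib.request.Request:
    """Create a GET request that carries the browser session's cookies."""
    headers = {"Cookie": cookie_header(cookies)}
    if user_agent:
        # Some servers also tie the session to the browser's User-Agent.
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)
```

With a live driver, urllib.request.urlopen(build_download_request(link, driver.get_cookies())) would then fetch the file as the logged-in user.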
My corporate web application uses a Java applet to access the user's file system. There is no way for us to replace it with anything else for now.
How do I enable Java in Microsoft Edge?
As other folks have mentioned, Java, ActiveX, Silverlight, Browser Helper Objects (BHOs) and other plugins are not supported in Microsoft Edge. Most modern browsers are moving away from plugins and toward standard HTML5 controls and technologies.
If you must continue to use the Java plugin in a corporate web app, consider adding the site to an Enterprise Mode site list. This will automatically prompt the user to open in IE.
You cannot open Java Applets (nor any other NPAPI plugin) in Microsoft Edge - they aren't supported and won't be added in the future.
Further you should be aware that in the next release of Google Chrome (v45 - due September 2015) NPAPI plugins will also no longer be supported.
Work-arounds
There are a couple of things that you can do:
Use Internet Explorer 11
You will find that Windows 10 already has Internet Explorer 11 installed. IE 11 continues to support plugins, including Java applets (through ActiveX rather than NPAPI).
IE11 is squirrelled away (c:\program files\internet explorer\iexplore.exe). Just pin this exe to your task bar for easy access.
Use Firefox
You can also install and use the 32-bit Firefox Extended Support Release on Windows 10. Firefox has disabled NPAPI by default, but this can be overridden. This will only be supported until early 2018.
Edge has dropped all support for plugins. This means that Java, ActiveX, Silverlight, and other plugins are no longer supported. For this reason Microsoft has included Internet Explorer 11, which does support these plugins, with non-mobile versions of Windows 10. If you are running Windows 10 and need plugin support Edge is not an option, but IE 11 is.
Regarding this, Java's own site states that on Windows 10 the Edge browser does not support plugins, so it will NOT run Java.
(see https://www.java.com/it/download/win10.jsp --> only visible with edge in win10)
It also notes that Java is not yet officially supported on Windows 10.
(see https://www.java.com/it/download/faq/win10_faq.xml)
IE11 does accept Java, according to the link below:
http://windows.microsoft.com/en-us/internet-explorer/install-java#ie=ie-11
And Firefox also intended to remove NPAPI support by the end of 2016, according to:
https://blog.mozilla.org/futurereleases/2015/10/08/npapi-plugins-in-firefox/
It's well known that Java applets don't work in modern browsers, but there is a quick workaround: activate Microsoft's compatibility mode. In Edge you can choose to open pages in Internet Explorer (IE) mode, and in that mode ActiveX controls, Java and so on work as they do in IE11.
Microsoft Edge in IE mode supports the following Internet Explorer functionality:
All document modes and enterprise modes
ActiveX controls (such as Java or Silverlight)
References:
https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RWEHMs
I hope you are doing well.
You can add a Microsoft Edge extension that will allow you to run Java applets.
You can try the extension called CheerpJ Applet Runner.
I've recently been learning Processing, a sort of Java-based visual language. It has a feature to export sketches/scripts as HTML documents, open them in a browser, and run them with a Java applet. However, when I try to open them (on Mac OS X 10.5.8), it redirects me to the Java page telling me that Apple supplies its own version. I checked for software updates and tried downloading another version of Java, to no avail. I also checked on a website to see if Firefox had Java, but it said it was disabled, despite my preferences having JavaScript checked off.
Any help? Thanks.
You're running an unsupported version of Java; you need to update to Mac OS X 10.6 or higher to get the latest version. https://discussions.apple.com/thread/3995956?start=0&tstart=0
This belongs on https://apple.stackexchange.com/ as well.
I'm creating an extension for Google Chrome, so any code has to be compatible with Chrome and Chrome only. In this extension, I need the user to select a folder from their local machine. This simple task is becoming quite a problem. The Chrome extension's options page will not run applets, so I couldn't really use Java. It's Google Chrome only, so an ActiveX object is out of the question as well. I just need a simple way of selecting a folder (not a file) and passing its path to JavaScript. Might this be possible in Flash ActionScript? It seems the FileReference and FileReferenceList classes in AS only allow you to choose a file, not a folder. Is there another possibility besides Flash? All the options page files DO rest on the user's local machine, so it's not server side.
Thank you for your time.
You can use the webkitdirectory attribute on your file input element to select directories and get the same sort of result as from the "multiple" attribute.
A demo of this: http://www.thecssninja.com/demo/webkitdirectory/
The chromium bug: http://crbug.com/58977