How to safely use Google translate in Selenium - java

I am working on a project that requires translating text from various languages into English. I have to translate nearly 5000 documents a day, so I have written a small Selenium program to help with this.
My question: if I use Selenium to push this much text through Google Translate, will Google block me? If so, how can I avoid being blocked?
My code is posted below for reference:
public static WebDriver google_translate(WebDriver driver,String filename)
{
driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
try{
driver.get("http://translate.google.com/#auto/en");
String text="";
text=read_contents.read_from_html(filename);
if(text.length()<5)
return driver ;
// Type the text to translate into the source box
System.out.println("file read");
WebElement query = driver.findElement(By.id("source"));
query.sendKeys(text);
WebElement query1 = driver.findElement(By.id("gt-submit"));
query1.click();
System.out.println("text entered");
Date d=new Date();
long intial=d.getTime();
WebElement result;
do{
result = driver.findElement(By.id("result_box"));
d=new Date();
}while(result.getText().length()<20 && (d.getTime()-intial<15000) );
System.out.println("result fetched");
String output=Global.prop.get(1).toString()+"/"+new File(filename).getName()+".txt";
output_writer.txt_writer(result.getText(),output);
}
catch(UnhandledAlertException e)
{
e.printStackTrace();
}
catch(NoSuchElementException e)
{
e.printStackTrace();
}
catch(UnknownServerException e)
{
e.printStackTrace();
}
//System.out.println(result.getText());
return driver ;
}
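The wait loop in the question re-queries the page as fast as it can until the result is long enough or 15 seconds have passed. A minimal sketch of the same idea with a pause between attempts is shown here. It is plain Java with no Selenium dependency, and `Poller`, `pollUntil`, and their parameters are hypothetical names introduced only for this illustration. In real Selenium code, `WebDriverWait` usually plays this role.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of Selenium.
final class Poller {
    private Poller() {}

    /**
     * Calls source repeatedly until accept approves a value or the timeout
     * elapses. Returns the accepted (non-null) value, or empty if the
     * deadline passed or the thread was interrupted first.
     */
    static <T> Optional<T> pollUntil(Supplier<T> source, Predicate<T> accept,
                                     Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T value = source.get();
            if (accept.test(value)) {
                return Optional.of(value);
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(pause.toMillis()); // avoid a busy loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Optional.empty();
            }
        }
    }
}
```

With Selenium on the classpath, the supplier could be something like `() -> driver.findElement(By.id("result_box")).getText()` and the predicate `s -> s.length() >= 20`, mirroring the condition in the question.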

Related

Java Selenium + 2Captcha + Submit Form

Hello, I am trying to automate a sign-in process and am using 2Captcha to solve the CAPTCHA.
I have the site_key and api_key. I send api_key and site_key to 2Captcha and it returns a response token. I have inserted that token into the g-recaptcha-response element, but the form is not submitted.
What I want is to solve the CAPTCHA and submit the form.
Here is my current Java code:
System.setProperty("webdriver.chrome.driver", "chromedriver.exe");
ChromeDriver driver;
driver = new ChromeDriver();
driver.manage().deleteAllCookies();
driver.manage().window().maximize();
driver.get("https://id.sonyentertainmentnetwork.com/signin/?client_id=fe1fdbfa-f1a1-47ac-b793-e648fba25e86&redirect_uri=https://secure.eu.playstation.com/psnauth/PSNOAUTHResponse/pdc/&service_entity=urn:service-entity:psn&response_type=code&scope=psn:s2s&ui=pr&service_logo=ps&request_locale=en_GB&error=login_required&error_code=4165&error_description=User+is+not+authenticated&no_captcha=false#/signin?entry=%2Fsignin");
Thread.sleep(5000);
driver.findElement(By.xpath("//input[@title='Sign-In ID (Email Address)']")).sendKeys("email");
Thread.sleep(2000);
driver.findElement(By.xpath("//input[@title='Password']")).sendKeys("password");
Thread.sleep(2000);
driver.findElement(By.xpath("//button[@class='primary-button row-button text-button touch-feedback']")).click();
Thread.sleep(3000);
By captcha = By.xpath("//iframe[@title='recaptcha challenge']");
String src = driver.findElement(captcha).getAttribute("src");
String key = getKey(src);
System.out.println(key);
String apiKey = "API_KEY";
String googleKey = key;
String pageUrl = "https://id.sonyentertainmentnetwork.com/signin/?client_id=fe1fdbfa-f1a1-47ac-b793-e648fba25e86&redirect_uri=https://secure.eu.playstation.com/psnauth/PSNOAUTHResponse/pdc/&service_entity=urn:service-entity:psn&response_type=code&scope=psn:s2s&ui=pr&service_logo=ps&request_locale=en_GB&error=login_required&error_code=4165&error_description=User+is+not+authenticated&no_captcha=false#/signin?entry=%2Fsignin";
String proxyIp = "183.38.231.131";
String proxyPort = "8888";
String proxyUser = "username";
String proxyPw = "password";
TwoCaptchaService service = new TwoCaptchaService(apiKey, googleKey, pageUrl, proxyIp, proxyPort, proxyUser, proxyPw, ProxyType.HTTP);
try {
String responseToken = service.solveCaptcha();
System.out.println("The response token is: " + responseToken);
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("document.getElementById(\"g-recaptcha-response\").innerHTML = \'"+responseToken+"\';");
} catch (InterruptedException e) {
System.out.println("ERROR case 1");
e.printStackTrace();
} catch (IOException e) {
System.out.println("ERROR case 2");
e.printStackTrace();
}
UPDATED CODE :
js.executeScript("document.getElementById(\"g-recaptcha-response\").innerHTML = \'" + responseToken + "\';");
Thread.sleep(500);
WebElement frameElement = driver.findElement(captcha);
driver.switchTo().frame(frameElement);
js.executeScript("document.getElementById('recaptcha-verify-button').click();");
It clicks the button, but the page then shows "Please select all matching images."
All you need to do is submit it like this:
js.executeScript("document.getElementById('g-recaptcha-response').innerHTML='" + responseToken + "';");
Thread.sleep(500);
js.executeScript("document.getElementById('captcha-form').submit();");
Also, don't forget to check the ID "captcha-form"; it can be different on your page.
To reach the element "recaptcha-verify-button", after you get the response from the API:
By frame = By.xpath("//iframe[@title='recaptcha challenge']");
WebElement frameElement = driver.findElement(frame);
driver.switchTo().frame(frameElement);
Then you can execute your script. Finally, if your captcha form is submitted through a button, you cannot call submit(); call click() instead.
Final Answer:
Also check this: js.executeScript("widgetVerified('TOKEN');");
To find the name of the callback function (widgetVerified() here), run this in the browser console:
___grecaptcha_cfg.clients[0]
This returns a JSON object; look inside it for the callback function. In @Awais's case it was widgetVerified(e).
Warning: don't use any ad blocker.

Stale Object Reference while Navigation using Selenium

I have been working on a simple program that navigates to a new page and fetches data from it, goes back in history, opens the next page and fetches its data, and so on until all the links have been visited.
After getting results on the site below, I am trying to loop through all the links in the first column, open them one by one, and extract text from each page. But the program below only visits the first link and then throws StaleElementReferenceException. I have tried using Actions, but it didn't work, and I am not familiar with JavascriptExecutor. I also tried solutions posted on other SO questions, one of which was my own. I would like the mistake in the code below corrected and a working version.
public class Selenium {
private final static String CHROME_DRIVER = "C:\\Selenium\\chromedriver\\chromedriver.exe";
private static WebDriver driver = null;
private static WebDriverWait wait = null;
private void setConnection() {
try {
System.setProperty("webdriver.chrome.driver", CHROME_DRIVER);
driver = ChromeDriver.class.newInstance();
wait = new WebDriverWait(driver, 5);
driver.get("https://sanctionssearch.ofac.treas.gov");
this.search();
} catch (Exception e) {
e.printStackTrace();
}
}
private void search() {
try {
driver.findElement(By.id("ctl00_MainContent_txtLastName")).sendKeys("Dawood");
driver.findElement(By.id("ctl00_MainContent_btnSearch")).click();
this.extractText();
} catch (Exception e) {
e.printStackTrace();
}
}
private void extractText() {
try {
List<WebElement> rows = driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr"));
List<WebElement> links = null;
for (int i = 1; i <= rows.size(); i++) {
links = driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr/td[1]/a"));
for (int j = 0; j < links.size(); j++) {
System.out.println(links.get(j).getText() + ", ");
links.get(j).click();
System.out.println("Afte click");
driver.findElement(By.id("ctl00_MainContent_btnBack")).click();
this.search();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] ar) {
Selenium object = new Selenium();
object.setConnection();
}
}
You generally get a StaleElementReferenceException when the element is changed or re-rendered after the WebElement was obtained. For example, if you click an element, the page refreshes, and you then try to click the same stored element again, you get a stale element exception.
To overcome this, create a fresh WebElement whenever the page has changed or been refreshed. The code below gives the idea.
Example:
WebElement element = driver.findElement(By.xpath("//*[@id='StackOverflow']"));
element.click();
//page is refreshed
element.click();//This will obviously throw stale exception
To overcome this, we can store the XPath in a string and use it to create a fresh WebElement as we go.
String xpath = "//*[@id='StackOverflow']";
driver.findElement(By.xpath(xpath)).click();
//page has been refreshed. Now create a new element and work on it
driver.findElement(By.xpath(xpath)).click(); //This works
In this case, you collect a group of WebElements and iterate over them to get the text. The page changes after the elements are collected, so getText() throws the stale element exception. Instead, use a loop that locates each element as it goes and reads its text.
for(int i = 1; i<=5; i++) // XPath positions are 1-based
{
String value = driver.findElement(By.xpath("//.....["+i+"]")).getText();
System.out.println(value);
}
Hope this helps you. Thanks.
You normally get a StaleElementReferenceException because you stored the element(s) in a variable, then performed some action that changed the page (for example an AJAX response or a navigation), so the stored element became stale.
The best solution in such a case is not to keep the element in a variable across the page change.
This should work:
links = driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr/td[1]/a"));
for (int j = 0; j < links.size(); j++) {
System.out.println(links.get(j).getText() + ", ");
driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr/td[1]/a")).get(j).click();
System.out.println("Afte click");
driver.findElement(By.id("ctl00_MainContent_btnBack")).click();
this.search();
}
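To see why looking the element up again inside the loop helps, here is a toy model in plain Java with no Selenium involved. `FakePage`, `Handle`, `findLinks()`, and `navigate()` are hypothetical stand-ins written for this sketch: every simulated navigation invalidates handles obtained earlier, which mimics how a stored `WebElement` stops being usable once the page is reloaded.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; these are not Selenium classes.
final class FakePage {
    private final List<String> linkTexts;
    private int generation = 0; // bumped on every simulated page load

    FakePage(List<String> linkTexts) {
        this.linkTexts = new ArrayList<>(linkTexts);
    }

    /** A reference that is only valid for the page load it was found on. */
    final class Handle {
        private final int foundIn = generation;
        private final String text;

        Handle(String text) {
            this.text = text;
        }

        String text() {
            if (foundIn != generation) {
                throw new IllegalStateException("stale handle: page was reloaded");
            }
            return text;
        }
    }

    /** Simulates findElements: returns handles bound to the current page load. */
    List<Handle> findLinks() {
        List<Handle> handles = new ArrayList<>();
        for (String t : linkTexts) {
            handles.add(new Handle(t));
        }
        return handles;
    }

    /** Simulates clicking a link and going back: the page is rebuilt. */
    void navigate() {
        generation++;
    }

    /** Re-locates the links on every iteration and picks the i-th one. */
    static List<String> visitAll(FakePage page) {
        List<String> visited = new ArrayList<>();
        int count = page.findLinks().size();
        for (int i = 0; i < count; i++) {
            Handle fresh = page.findLinks().get(i); // fresh lookup each time
            visited.add(fresh.text());
            page.navigate(); // older handles are stale from here on
        }
        return visited;
    }
}
```

Keeping the list from the first `findLinks()` call and reusing it after `navigate()` would throw in this model, which is the analogue of the exception in the question.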
Please check this code
private void extractText() {
try {
List<WebElement> rows = driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr"));
List<WebElement> links = null;
System.out.println(rows.size());
for (int i = 0; i < rows.size(); i++) {
links = driver.findElements(By.xpath("//*[@id='gvSearchResults']/tbody/tr/td[1]/a"));
WebElement ele= links.get(0);
System.out.println(ele.getText() + ", ");
ele.click();
System.out.println("After click");
driver.findElement(By.id("ctl00_MainContent_btnBack")).click();
}
} catch (Exception e) {
e.printStackTrace();
}
}

Java ExecutorService Runnable doesn't update value

I'm using Java to download HTML contents of websites whose URLs are stored in a database. I'd like to put their HTML into database, too.
I'm using Jsoup for this purpose:
public String downloadHTML(String byLink) {
String htmlInPage = "";
try {
Document doc = Jsoup.connect(byLink).get();
htmlInPage = doc.html();
} catch (org.jsoup.UnsupportedMimeTypeException e) {
// process this and some other exceptions
}
return htmlInPage;
}
I'd like to download websites concurrently and use this function:
public void downloadURL(int websiteId, String url,
String categoryName, ExecutorService executorService) {
executorService.submit((Runnable) () -> {
String htmlInPage = downloadHTML(url);
System.out.println("Category: " + categoryName + " " + websiteId + " " + url);
String insertQuery =
"INSERT INTO html_data (website_id, html_contents) VALUES (?,?)";
dbUtils.query(insertQuery, websiteId, htmlInPage);
});
}
dbUtils is my class based on Apache Commons DbUtils. Details are here: http://pastebin.com/iAKXchbQ
I use everything mentioned above like this (the List<Object[]> details are explained on Pastebin, too):
public static void main(String[] args) {
DbUtils dbUtils = new DbUtils("host", "db", "driver", "user", "pass");
List<String> categoriesList =
Arrays.asList("weapons", "planes", "cooking", "manga");
String sql = "SELECT lw.id, lw.website_url, category_name " +
"FROM list_of_websites AS lw JOIN list_of_categories AS lc " +
"ON lw.category_id = lc.id " +
"where category_name = ? ";
ExecutorService executorService = Executors.newFixedThreadPool(10);
for (String category : categoriesList) {
List<Object[]> sitesInCategory = dbUtils.select(sql, category );
for (Object[] entry : sitesInCategory) {
int websiteId = (int) entry[0];
String url = (String) entry[1];
String categoryName = (String) entry[2];
downloadURL(websiteId, url, categoryName, executorService);
}
}
executorService.shutdown();
}
I'm not sure whether this solution is correct, but it works. Now I want to modify the code so that it saves HTML not from every website in my database, but only from a fixed number of websites in each category.
For example, download and save the HTML of 50 websites from the "weapons" category, 50 from "planes", and so on. I don't think SQL alone is enough for this: selecting 50 sites per category doesn't guarantee that all 50 get saved, because of possible syntax errors and connection problems.
I've tried creating a separate class implementing Runnable with the fields counter and maxWebsitesPerCategory, but these variables aren't updated. Another idea was to use a Map<String,Integer> sitesInCategory field instead of counter, with each category as a key, incrementing its value until it reaches maxWebsitesPerCategory, but that didn't work either. Please help!
P.S. I'd also be grateful for any recommendations on my implementation of concurrent downloading (I haven't worked with concurrency in Java before, and this is my first attempt).
How about this?
for (String category : categoriesList) {
dbUtils.select(sql, category).stream()
.limit(50)
.forEach(entry -> {
int websiteId = (int) entry[0];
String url = (String) entry[1];
String categoryName = (String) entry[2];
downloadURL(websiteId, url, categoryName, executorService);
});
}
sitesInCategory has been replaced with a stream of at most 50 elements, and your code then runs on each entry.
EDIT
In response to the comments, I've restructured things a bit; you can modify or implement the bodies of the methods I've suggested.
public void werk(Queue<Object[]> q, ExecutorService executorService) {
executorService.submit(() -> {
try {
Object[] o = q.remove();
try {
String html = downloadHTML(o); // this takes one of your object arrays and returns the text of an html page
insertIntoDB(html); // this is the code in the latter half of your downloadURL method
}catch (/*narrow exception type indicating download failure*/Exception e) {
werk(q, executorService);
}
}catch (NoSuchElementException e) {}
});
}
^^^ This method does most of the work.
for (String category : categoriesList) {
Queue<Object[]> q = new ConcurrentLinkedQueue<>(dbUtils.select(sql, category));
IntStream.range(0, 50).forEach(i -> werk(q, executorService));
}
^^^ This is the for loop in your main.
Now each category tries to download 50 pages; when a download fails, the task moves on and tries another page. This way, you either download 50 pages or attempt every page in the category.
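The question also mentions a counter and a Map<String,Integer> that "aren't updated". The Runnable class is not shown, so the cause is unknown, but a shared, thread-safe counter per category is one common way to enforce such a cap. The sketch below is an assumption-laden illustration, not the asker's code: `CategoryQuota`, `tryReserve`, `release`, and `demo` are hypothetical names, and the "download" is simulated by incrementing a counter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: one shared instance is passed to every task.
final class CategoryQuota {
    private final int maxPerCategory;
    private final ConcurrentMap<String, AtomicInteger> used = new ConcurrentHashMap<>();

    CategoryQuota(int maxPerCategory) {
        this.maxPerCategory = maxPerCategory;
    }

    /** Atomically claims one slot; returns false once the category is full. */
    boolean tryReserve(String category) {
        AtomicInteger counter = used.computeIfAbsent(category, k -> new AtomicInteger());
        while (true) {
            int current = counter.get();
            if (current >= maxPerCategory) {
                return false;
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Gives a slot back, e.g. after a failed download. */
    void release(String category) {
        AtomicInteger counter = used.get(category);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    /** Demo: many concurrent tasks compete for a limited number of slots. */
    static int demo(int tasks, int maxPerCategory) {
        CategoryQuota quota = new CategoryQuota(maxPerCategory);
        AtomicInteger saved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                if (quota.tryReserve("weapons")) {
                    saved.incrementAndGet(); // stands in for download + insert
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return saved.get();
    }
}
```

In a real task, the worker would call `tryReserve(categoryName)` before downloading and `release(categoryName)` if the download or insert fails, so that only successful saves count toward the limit.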

twitter4j result.nextQuery() is giving me always null

Hello, I would like to ask why my code doesn't return all the tweets matched by the query and stops after the first page of results. I'm asking because the same code worked very well just six months ago.
Query query = new Query("Carolina OR flood lang:en since:2015-10-04 until:2015-10-09");
query.setCount(100);
QueryResult result;
createNewFile(contFile);
do {
result = twitterInstance.search(query);
List<Status> tweets = result.getTweets();
for (Status tweet : tweets) {
if (cont > MAX_TWEET_PER_FILE) {
cont = 1;
contFile++;
writer.close();
createNewFile(contFile);
}
writeToFile(cont,tweet);
cont++;
}
if(result.getRateLimitStatus().getRemaining()<1){
try {
Thread.sleep(result.getRateLimitStatus().getSecondsUntilReset() * 1000);
} catch (InterruptedException e) {
e.printStackTrace();
throw new RuntimeException(e);
}
}
} while (query!=null);
writer.flush();
writer.close();
System.exit(0);
So after the first iteration of the do loop, query is always null, and all I get is the tweets from a few hours of Friday. I know I may not be able to obtain tweets older than a week, but this is only a few days back (3 days ago...).
Are there any news or updates from the Twitter team that I've missed?
Try using:
do{
...
}while((query = result.nextQuery()) != null);
Your query will get the results from the next page, if it exists.
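The loop above is a cursor-style pattern: each result hands back the query for the next page, or null when there is none. A plain-Java sketch of that control flow follows. `PageQuery`, `PageResult`, `search`, and `fetchAll` are hypothetical stand-ins written for this example and are not twitter4j classes; only the shape of the do/while loop is taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "follow nextQuery() until it returns null".
final class Paging {
    private Paging() {}

    /** Stand-in for a query that knows which page it asks for. */
    static final class PageQuery {
        final int page;

        PageQuery(int page) {
            this.page = page;
        }
    }

    /** Stand-in for one page of results plus a pointer to the next query. */
    static final class PageResult {
        final List<String> items;
        private final PageQuery next;

        PageResult(List<String> items, PageQuery next) {
            this.items = items;
            this.next = next;
        }

        /** Returns the query for the next page, or null on the last page. */
        PageQuery nextQuery() {
            return next;
        }
    }

    /** Fake backend: serves the requested page from an in-memory list. */
    static PageResult search(List<List<String>> pages, PageQuery query) {
        boolean last = query.page >= pages.size() - 1;
        PageQuery next = last ? null : new PageQuery(query.page + 1);
        return new PageResult(pages.get(query.page), next);
    }

    /** Collects every item by following nextQuery() until it is null. */
    static List<String> fetchAll(List<List<String>> pages) {
        List<String> all = new ArrayList<>();
        if (pages.isEmpty()) {
            return all;
        }
        PageQuery query = new PageQuery(0);
        PageResult result;
        do {
            result = search(pages, query);
            all.addAll(result.items);
        } while ((query = result.nextQuery()) != null);
        return all;
    }
}
```

The key point is that `query` is reassigned in the loop condition; a condition such as `query != null` that never reassigns it keeps requesting the same page.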

Fetching articles from Liferay portal

Our goal is to fetch some of the content from Liferay Portal via SOAP services using Java. We are successfully loading articles right now with JournalArticleServiceSoap. The problem is that the method requires both group id and entry id, and what we want is to fetch all of the articles from a particular group. Hence, we are trying to get the ids first, using AssetEntryServiceSoap but it fails.
AssetEntryServiceSoapServiceLocator aesssLocator = new AssetEntryServiceSoapServiceLocator();
com.liferay.client.soap.portlet.asset.service.http.AssetEntryServiceSoap assetEntryServiceSoap = null;
URL url = null;
try {
url = new URL(
"http://127.0.0.1:8080/tunnel-web/secure/axis/Portlet_Asset_AssetEntryService");
} catch (MalformedURLException e) {
e.printStackTrace();
}
try {
assetEntryServiceSoap = aesssLocator
.getPortlet_Asset_AssetEntryService(url);
} catch (ServiceException e) {
e.printStackTrace();
}
if (assetEntryServiceSoap == null) {
return;
}
Portlet_Asset_AssetEntryServiceSoapBindingStub assetEntryServiceSoapBindingStub = (Portlet_Asset_AssetEntryServiceSoapBindingStub) assetEntryServiceSoap;
assetEntryServiceSoapBindingStub.setUsername("bruno@7cogs.com");
assetEntryServiceSoapBindingStub.setPassword("bruno");
AssetEntrySoap[] entries;
AssetEntryQuery query = new AssetEntryQuery();
try {
int count = assetEntryServiceSoap.getEntriesCount(query);
System.out.println("Entries count: " + Integer.toString(count));
entries = assetEntryServiceSoap.getEntries(query);
if (entries != null) {
System.out.println(Integer.toString(entries.length));
}
for (AssetEntrySoap aes : assetEntryServiceSoap.getEntries(query)) {
System.out.println(aes.getEntryId());
}
} catch (RemoteException e1) {
e1.printStackTrace();
}
Although getEntriesCount() returns a positive value such as 83, getEntries() always returns an empty array. I'm very new to Liferay Portal, but this looks really odd to me.
By the way, we are obviously not looking for performance here, the key is just to fetch some specific content from the portal remotely. If you know any working solution your help would be much appreciated.
Normally an AssetEntryQuery would carry a little more information, for example:
AssetEntryQuery assetEntryQuery = new AssetEntryQuery();
assetEntryQuery.setClassNameIds(new long[] { ClassNameLocalServiceUtil.getClassNameId("com.liferay.portlet.journal.model.JournalArticle") });
assetEntryQuery.setGroupIds(new long[] { groupId });
So this would return all AssetEntries for the groupId you specify that are also JournalArticles.
Try this and see. As you say, the count method already returns a positive number, so it might not make a difference, but give it a go! :)
