Parse Google spreadsheet URL without authentication - java

My aim is to retrieve CellFeeds from Google spreadsheet URLs without authentication.
I tried it with the following spreadsheet URL (published to the web):
https://docs.google.com/spreadsheet/ccc?key=0AvNWoDP9TASIdERsbFRnNXdsN2x4MXMxUmlyY0g3VUE&usp=sharing
This URL is stored in the variable "spreadsheetName".
My first attempt was to pass the whole URL as the argument to Service.getFeed():
url = new URL(spreadsheetName);
WorksheetFeed feed = service.getFeed(url, WorksheetFeed.class);
But then I ran into the following exception:
com.google.gdata.util.RedirectRequiredException: Found
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
My second attempt was to build the URL from the key in the original URL, using FeedURLFactory:
String key = spreadsheetName.split("key=")[1].substring(0, 44);
url = FeedURLFactory.getDefault().getCellFeedUrl(key,
worksheetName, "public", "basic");
WorksheetFeed feed = service.getFeed(url, WorksheetFeed.class);
...and got the following exception:
com.google.gdata.util.InvalidEntryException: Bad Request
Invalid query parameter value for grid-id.
Do you have any idea what I did wrong, or has anybody successfully retrieved data from spreadsheet URLs without authentication? Thanks in advance!

You have two problems. I'm not sure about the second one, but the first is that you are building the cell feed URL without the correct worksheet identifier: you are just passing worksheetName, which is probably not correct. If you do something like this:
public static void main(String... args) throws MalformedURLException, ServiceException, IOException {
    SpreadsheetService service = new SpreadsheetService("Test");
    FeedURLFactory fact = FeedURLFactory.getDefault();
    String key = "0AvNWoDP9TASIdERsbFRnNXdsN2x4MXMxUmlyY0g3VUE";

    URL spreadSheetUrl = fact.getWorksheetFeedUrl(key, "public", "basic");
    WorksheetFeed feed = service.getFeed(spreadSheetUrl, WorksheetFeed.class);

    WorksheetEntry entry = feed.getEntries().get(0);
    URL cellFeedURL = entry.getCellFeedUrl();
    CellFeed cellFeed = service.getFeed(cellFeedURL, CellFeed.class);
}
You will get the correct CellFeed. However, your second problem is that, done this way, every CellEntry.getCell() in the CellFeed comes back null. I am not sure why, or whether it can be solved with the public/basic feed.
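One thing that might be worth trying (an assumption on my part, I have not verified it against this particular sheet): request the "full" projection instead of "basic". With the basic projection the cell value tends to be exposed only through the entry content, while the full projection should populate getCell(). A minimal sketch, reusing service, fact and key from the snippet above:
// Hedged sketch: same flow, but with the "full" projection (assumes the sheet is published to the web).
URL worksheetFeedUrl = fact.getWorksheetFeedUrl(key, "public", "full");
WorksheetFeed worksheetFeed = service.getFeed(worksheetFeedUrl, WorksheetFeed.class);
URL fullCellFeedUrl = worksheetFeed.getEntries().get(0).getCellFeedUrl();
CellFeed fullCellFeed = service.getFeed(fullCellFeedUrl, CellFeed.class);
for (CellEntry cellEntry : fullCellFeed.getEntries()) {
    // With "basic", getPlainTextContent() is the fallback for the cell value.
    System.out.println(cellEntry.getTitle().getPlainText() + " = "
            + cellEntry.getCell().getValue());
}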

The following code should work for your first issue. The second issue is probably coming from the query parameters on the cell feed. Also make sure the dependent jars are on the classpath. I worked with the Spreadsheet API a while back; this might help you.
import java.net.URL;
import java.util.List;
import com.google.gdata.client.spreadsheet.FeedURLFactory;
import com.google.gdata.client.spreadsheet.SpreadsheetService;
import com.google.gdata.data.spreadsheet.CellEntry;
import com.google.gdata.data.spreadsheet.CellFeed;
import com.google.gdata.data.spreadsheet.WorksheetEntry;
import com.google.gdata.data.spreadsheet.WorksheetFeed;

public class SpreadsheetsDemo {
    public static void main(String[] args) throws Exception {
        String application = "SpreadsheetsDemo";
        String key = "0AvNWoDP9TASIdERsbFRnNXdsN2x4MXMxUmlyY0g3VUE";

        SpreadsheetService service = new SpreadsheetService(application);
        URL url = FeedURLFactory.getDefault().getWorksheetFeedUrl(key, "public", "basic");

        WorksheetFeed feed = service.getFeed(url, WorksheetFeed.class);
        List<WorksheetEntry> worksheetList = feed.getEntries();
        WorksheetEntry worksheetEntry = worksheetList.get(0);

        URL cellFeedUrl = worksheetEntry.getCellFeedUrl();
        CellFeed cellFeed = service.getFeed(cellFeedUrl, CellFeed.class);
        for (CellEntry cell : cellFeed.getEntries()) {
            // Iterate through the columns and rows
        }
    }
}
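If you only need part of the sheet, the cell feed also accepts range query parameters. A rough sketch using CellQuery (from com.google.gdata.client.spreadsheet), reusing service and cellFeedUrl from the code above; the row/column limits are just example values:
// Hedged sketch: restrict the cell feed to a block of cells via query parameters.
CellQuery cellQuery = new CellQuery(cellFeedUrl);
cellQuery.setMinimumRow(1);
cellQuery.setMaximumRow(10);
cellQuery.setMinimumCol(1);
cellQuery.setMaximumCol(5);
CellFeed rangeFeed = service.query(cellQuery, CellFeed.class);
for (CellEntry cell : rangeFeed.getEntries()) {
    System.out.println(cell.getTitle().getPlainText());
}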

Related

How to get data from the Java web scraping API?

I am trying to get table data from the following URL:
https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec
and I wrote this code with the help of the Jaunt API:
package org.open.browser;

import com.jaunt.Element;
import com.jaunt.Elements;
import com.jaunt.JauntException;
import com.jaunt.UserAgent;

public class ICICIScraperDemo {
    public static void main(String ar[]) throws JauntException {
        UserAgent userAgent = new UserAgent(); // create new userAgent (headless browser)
        userAgent.visit("https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec");
        Elements links = userAgent.doc.findEvery("<div class=expander>").findEvery("<a>"); // find search result links

        String url = null;
        for (Element link : links) {
            if (link.innerHTML().equalsIgnoreCase("Company Details")) {
                url = link.getAt("href");
            }
        }

        /* userAgent = new UserAgent(); */ // create new userAgent (headless browser)
        userAgent.visit(url);
        System.out.println(userAgent.getSource());
        Elements results = userAgent.doc.findEvery("<tr>").findEvery("<td>");
        System.out.println(results);
    }
}
But it didn't work.
Then I tried another API, HtmlUnit, and wrote the code below:
public void htmlUnitEx() {
    String START_URL = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
    try {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        HtmlPage page = webClient.getPage(START_URL);
        WebResponse webres = page.getWebResponse();
        //List<HtmlAnchor> companyInfo = (List) page.getByXPath("//input[#id='txtStockCode']");
        HtmlTable companyInfo = (HtmlTable) page.getFirstByXPath("//table");
        for (HtmlTableRow item : companyInfo.getBodies().get(0).getRows()) {
            String label = item.getCell(1).asText();
            System.out.println(label);
            if (!label.contains("Registered Office")) {
                continue;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But this is also not giving the result.
Can someone please help me get the data from the above URL and the other anchor URL in a single session?
Using HtmlUnit you can do this
String url = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(1000);
final DomNodeList<DomNode> divs = page.querySelectorAll("div.bigcoll");
System.out.println(divs.get(1).asText());
}
Two things to mention:
you have to wait a bit after the getPage call because some parts of the page are created by JavaScript/AJAX
there are many ways to find elements on a page (see "Finding a specific element" in the HtmlUnit documentation). I have done only a quick hack to show that the code is working; one alternative is sketched below.
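For example, a rough sketch of pulling the rows out of that block with XPath instead, reusing page from the snippet above (the "bigcoll" class comes from the selector used there; the table layout inside it is an assumption):
// Hedged alternative: find the table rows inside the "bigcoll" block via XPath.
List<DomNode> rows = page.getByXPath("//div[@class='bigcoll']//table//tr");
for (DomNode row : rows) {
    System.out.println(row.asText());
}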

Posting status message to facebook page

I'm trying to post a status message on my Facebook page; not my personal Facebook profile, but a page I created separately.
My current code looks like this:
import facebook4j.Facebook;
import facebook4j.FacebookException;
import facebook4j.FacebookFactory;
import facebook4j.Post;
import facebook4j.ResponseList;
import facebook4j.conf.Configuration;
import facebook4j.conf.ConfigurationBuilder;

public class FacebookImpl {

    // from https://developers.facebook.com/apps/
    static String appId = "11removed";
    // from https://developers.facebook.com/apps/
    static String appSecret = "c0removed";
    // from https://developers.facebook.com/tools/accesstoken/
    static String appToken = "11removed";
    // my facebook page
    static String myFaceBookPage = "niremoved";

    public static void main(String[] args) throws FacebookException {
        // Make the configuration builder
        ConfigurationBuilder confBuilder = new ConfigurationBuilder();
        confBuilder.setDebugEnabled(true);

        // Set application id, secret key and access token
        confBuilder.setOAuthAppId(appId);
        confBuilder.setOAuthAppSecret(appSecret);
        confBuilder.setOAuthAccessToken(appToken);

        // Set permission
        // https://developers.facebook.com/docs/facebook-login/permissions
        confBuilder.setOAuthPermissions("manage_pages,publish_pages,publish_actions");
        confBuilder.setUseSSL(true);
        confBuilder.setJSONStoreEnabled(true);

        // Create configuration object
        Configuration configuration = confBuilder.build();

        // Create FacebookFactory and Facebook instance
        FacebookFactory ff = new FacebookFactory(configuration);
        Facebook facebook = ff.getInstance();

        // this one works fine
        System.out.println(getFacebookPostes(facebook, myFaceBookPage));

        // try to post status
        // FacebookException{statusCode=403, errorType='OAuthException',
        // errorMessage='(#200) The user hasn't authorized the application to perform this action', errorCode=200, errorSubcode=-1, version=2.4.5}
        facebook.postStatusMessage(myFaceBookPage, "Test Facebook4J.");
    }

    public static String getFacebookPostes(Facebook facebook, String page) throws FacebookException {
        ResponseList<Post> results = facebook.getPosts(page);
        // as example just to see if i get any data
        return results.get(0).getMessage();
    }
}
The problem is that I can't post any message to the page with facebook.postStatusMessage(myFaceBookPage, "Test Facebook4J."), but I can get messages already posted (via the Facebook web interface) with the getFacebookPostes method.
Can anyone help me with this one? And please do not just paste some random Facebook developer link and tell me to look into the API.
What I did:
- created an app on https://developers.facebook.com/apps/, so I have the app id and app secret
Thanks
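For what it's worth, the "(#200) The user hasn't authorized the application to perform this action" error usually means the post is being made with an app or user token rather than a Page access token. A rough sketch of how that could look with Facebook4J, assuming the authorized user actually manages the page and that myFaceBookPage holds the page id (both assumptions on my part):
// Hedged sketch: fetch the Page access token via the accounts the user manages and post with it.
ResponseList<facebook4j.Account> accounts = facebook.getAccounts(); // pages the authorized user manages
for (facebook4j.Account account : accounts) {
    if (account.getId().equals(myFaceBookPage)) {
        ConfigurationBuilder pageConf = new ConfigurationBuilder()
                .setOAuthAppId(appId)
                .setOAuthAppSecret(appSecret)
                .setOAuthAccessToken(account.getAccessToken()); // the Page token, not the app token
        Facebook pageClient = new FacebookFactory(pageConf.build()).getInstance();
        pageClient.postStatusMessage(myFaceBookPage, "Test Facebook4J.");
    }
}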

How to check and record URL redirection?

I am writing a web spider. I want to crawl a bunch of pages from the web, and I have succeeded with part of my goal: I now have hundreds of URL links stored on hand. But those links are not the final links. That is, when you put one of these URLs into a web browser like Google Chrome, it is automatically redirected to another page, which is what I want. But that only works in a web browser; when I write code to crawl from that URL, the redirection does not happen.
Some example:
given (URL_1):
http://weixin.sogou.com/websearch/art.jsp?sg=CBf80b2xkgZ8cxz1-SgG-dBH_4QL8uVunUQKxf0syVWvynE5nPZm2TPqNuEF6MO2xv0MclVANfsVYUGr5-1b3ls29YYxgU27ra8qaaU15iv7KVkBsZp5Td27Cb2A24cIwEuw__0ZHdPeivmW-kcfnw..&url=p0OVDH8R4SHyUySb8E88hkJm8GF_McJfBfynRTbN8wjVuWMLA31KxFCrZAW0lIGG1EpZGR0F1jdIzWnvINEMaGQ3JxMQ33742MRcPWmNX2CMTFYIzOo-v8LrDlfP2AnF54peD-GxvCNYy-5x5In7jJFmExjqCxhpkyjFvwP6PuGcQ64lGQ2ZDMuqxplQrsbk
Put this link in a browser and it is automatically redirected to (URL_2):
http://mp.weixin.qq.com/s?__biz=MzA4OTIxOTA4Nw==&mid=404672464&idx=1&sn=bdfff50b8e9ac28739cf8f8a51976b03&3rd=MzA3MDU4NTYzMw==&scene=6#rd
which is a different link.
But put this in Python code like:
response=urllib2.urlopen(URL_1)
print response.read()
the auto-redirection doesn't happen!
In a word, my question is: given a URL, how do I get the redirected one?
Somebody gave me some Java code, which works in some other situations, but doesn't help in mine:
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public void test() throws Exception {
        String expectedURL = "http://www.zhihu.com/question/20583607/answer/16597802";
        String url = "http://www.baidu.com/link?url=ByBJLpHsj5nXx6DESXbmMjIrU5W4Eh0yg5wCQpe3kCQMlJK_RJBmdEYGm0DDTCoTDGaz7rH80gxjvtvoqJuYxK";
        String redirtURL = getRedirectURL(url);
        if (redirtURL.equals(expectedURL)) {
            System.out.println("Equal");
        } else {
            System.out.println(url);
            System.out.println(redirtURL);
        }
    }

    public String getRedirectURL(String path) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(path).openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setConnectTimeout(5000);
        return conn.getHeaderField("Location");
    }

    public static void main(String[] args) throws Exception {
        Main obj = new Main();
        obj.test();
    }
}
It prints out Equal in this case, which means we can get expectedURL from url. But this does not work in my original case. (I don't know why, but looking carefully at URL_1 above and the url in the Java code, I notice an interesting difference: the url in the Java code contains a snippet .../link?url=..., which probably indicates some redirection, while URL_1 above has .../art.jsp?sg=... instead.)
Look for the option that controls redirect following. In Python you can do it e.g. with requests; note that the keyword is allow_redirects, and it is already True by default for GET requests:
import requests
response = requests.get('http://example.com', allow_redirects=True)
print response.url
# history contains the list of responses for the redirects
print response.history
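If you want the same behaviour in Java (a sketch along the lines of the code in the question, not something from the original answer): follow the Location headers in a loop. Note that this only sees real HTTP 3xx redirects; a redirect done in JavaScript or via a meta refresh, which is possibly what the sogou link does, will not show up this way.
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectResolver {

    // Follow HTTP Location redirects until there are no more, and return the final URL.
    public static String resolve(String start) throws Exception {
        String current = start;
        for (int hop = 0; hop < 10; hop++) { // guard against redirect loops
            HttpURLConnection conn = (HttpURLConnection) new URL(current).openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setConnectTimeout(5000);
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (code < 300 || code >= 400 || location == null) {
                return current; // not a redirect, we are done
            }
            current = new URL(new URL(current), location).toString(); // resolve relative Location headers
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolve("http://www.baidu.com/link?url=ByBJLpHsj5nXx6DESXbmMjIrU5W4Eh0yg5wCQpe3kCQMlJK_RJBmdEYGm0DDTCoTDGaz7rH80gxjvtvoqJuYxK"));
    }
}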

Can't upload file to Amazon Cloudsearch with java

I am building a Java application in order to index some JSON files using Amazon CloudSearch. I think that I have followed the AWS documentation correctly, but I can't make my application work.
package com.myPackage;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.cloudsearchdomain.AmazonCloudSearchDomainClient;
import com.amazonaws.services.cloudsearchdomain.model.UploadDocumentsRequest;
import com.amazonaws.services.cloudsearchdomain.model.UploadDocumentsResult;

public class App
{
    public static final String ACCESS_KEY = "myAccessKey";
    public static final String SECRET_KEY = "mySecretKey";
    public static final String ENDPOINT = "myDocumentEndpoint";

    public static void main( String[] args ) throws FileNotFoundException
    {
        AWSCredentials credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
        AmazonCloudSearchDomainClient domain = new AmazonCloudSearchDomainClient(credentials);
        domain.setEndpoint(ENDPOINT);

        File file = new File("path to my file");
        InputStream docAsStream = new FileInputStream(file);

        UploadDocumentsRequest req = new UploadDocumentsRequest();
        req.setDocuments(docAsStream);
        System.out.print(file.length());

        UploadDocumentsResult result = domain.uploadDocuments(req); // here i get the exception
        System.out.println(result.toString());

        // SearchRequest searchReq = new SearchRequest().withQuery("my Search request");
        // SearchResult s_res = domain.search(searchReq);
        // System.out.println(s_res);
    }
}
The problem is that I get the following errors:
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to unmarshall error response (Unable to parse error response: '<html><body><h1>403 Forbidden</h1>Request forbidden by administrative rules.</body></html>'). Response Code: 403, Response Text: Forbidden
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1071)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:725)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
at com.amazonaws.services.cloudsearchdomain.AmazonCloudSearchDomainClient.invoke(AmazonCloudSearchDomainClient.java:527)
at com.amazonaws.services.cloudsearchdomain.AmazonCloudSearchDomainClient.uploadDocuments(AmazonCloudSearchDomainClient.java:310)
at gvrhtyhuj.dfgbmn.App.main(App.java:31)
Caused by: com.amazonaws.AmazonClientException: Unable to parse error response: '<html><body><h1>403 Forbidden</h1>Request forbidden by administrative rules.</body></html>'
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:55)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:29)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1045)
... 6 more
Caused by: com.amazonaws.util.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
at com.amazonaws.util.json.JSONTokener.syntaxError(JSONTokener.java:422)
at com.amazonaws.util.json.JSONObject.<init>(JSONObject.java:196)
at com.amazonaws.util.json.JSONObject.<init>(JSONObject.java:323)
at com.amazonaws.http.JsonErrorResponseHandler.handle(JsonErrorResponseHandler.java:53)
This is the json file:
{"a":123,"b":"4 5 6"}
First: Please don't put your credentials in your code. It's way too easy to accidentally check credentials into version control, or otherwise post them. If you have your credentials in the default location, you can just do:
AmazonCloudSearchDomainClient client = new AmazonCloudSearchDomainClient();
And the SDK will find them.
One common cause of 403s is getting the endpoint wrong. Make sure you don't have /documents/batch on the end of your endpoint string. The SDK will add that.
One other thing to try is setting the content length:
req.setContentLength(file.length());
My code has that, and works, and is otherwise the same as yours.
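Putting those suggestions together, a rough sketch of how the upload call could look (the endpoint string and file path are placeholders, and withContentType("application/json") is my assumption about what the batch upload expects, since I have not re-run this against a live domain):
// Hedged sketch: default credentials chain, endpoint without /documents/batch,
// and an explicit content length and content type on the request.
AmazonCloudSearchDomainClient domain = new AmazonCloudSearchDomainClient();
domain.setEndpoint("doc-mydomain-xxxxxxxxxxxxxxxxxxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com"); // placeholder

File file = new File("path to my file");
UploadDocumentsRequest req = new UploadDocumentsRequest()
        .withDocuments(new FileInputStream(file))
        .withContentLength(file.length())
        .withContentType("application/json");

UploadDocumentsResult result = domain.uploadDocuments(req);
System.out.println(result.getStatus());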
You're getting a 403 Forbidden error from CloudSearch, meaning that you don't have permission to upload documents to that domain.
Are you literally using "myAccessKey" as your access key value or did you redact it when you posted this? If you never set it, then you need to set your access key; otherwise check the access policies on your CloudSearch domain through the AWS web console, since it may be configured to accept/reject submissions based on IP address or some other set of conditions.

Programmatically edit a Google Spreadsheet

I have written a program that takes in user input, but now I want to be able to save that input by editing a Google spreadsheet every time a user submits the form. So basically, the Google spreadsheet is constantly being updated.
Can anyone provide a tutorial on how I might be able to achieve this? I'm writing in Java using Eclipse, so which plug-ins would I need?
I have already tried using some of the sample code provided in the Google Spreadsheets API (adding a list row section), but I can't seem to get it to work.
import com.google.gdata.client.spreadsheet.*;
import com.google.gdata.data.spreadsheet.*;
import com.google.gdata.util.*;

import java.io.IOException;
import java.net.*;
import java.util.*;

public class MySpreadsheetIntegration {
    public static void main(String[] args)
            throws AuthenticationException, MalformedURLException, IOException, ServiceException {

        SpreadsheetService service =
                new SpreadsheetService("MySpreadsheetIntegration-v1");

        // TODO: Authorize the service object for a specific user (see other sections)

        // Define the URL to request. This should never change.
        URL SPREADSHEET_FEED_URL = new URL(
                "https://docs.google.com/spreadsheets/d/1OcDp1IZ4iuvyhndtrZ3OOMHZNSEt7XTaaTrhEkNPnN4/edit#gid=0");

        // Make a request to the API and get all spreadsheets.
        SpreadsheetFeed feed = service.getFeed(SPREADSHEET_FEED_URL,
                SpreadsheetFeed.class);
        List<SpreadsheetEntry> spreadsheets = feed.getEntries();
        if (spreadsheets.size() == 0) {
            // TODO: There were no spreadsheets, act accordingly.
        }

        // TODO: Choose a spreadsheet more intelligently based on your
        // app's needs.
        SpreadsheetEntry spreadsheet = spreadsheets.get(0);
        System.out.println(spreadsheet.getTitle().getPlainText());

        // Get the first worksheet of the first spreadsheet.
        // TODO: Choose a worksheet more intelligently based on your
        // app's needs.
        WorksheetFeed worksheetFeed = service.getFeed(
                spreadsheet.getWorksheetFeedUrl(), WorksheetFeed.class);
        List<WorksheetEntry> worksheets = worksheetFeed.getEntries();
        WorksheetEntry worksheet = worksheets.get(0);

        // Fetch the list feed of the worksheet.
        URL listFeedUrl = worksheet.getListFeedUrl();
        ListFeed listFeed = service.getFeed(listFeedUrl, ListFeed.class);

        // Create a local representation of the new row.
        ListEntry row = new ListEntry();
        row.getCustomElements().setValueLocal("firstname", "Joe");
        row.getCustomElements().setValueLocal("lastname", "Smith");
        row.getCustomElements().setValueLocal("age", "26");
        row.getCustomElements().setValueLocal("height", "176");

        // Send the new row to the API for insertion.
        row = service.insert(listFeedUrl, row);
    }
}
I seem to be very late, but surely this is going to help others! The problem is in your SPREADSHEET_FEED_URL and in the authentication of the SpreadsheetService instance; the official Spreadsheets API documentation does not explain this in much detail. You need to get an authentication token and set it on the SpreadsheetService instance, as shown below, to get it to work:
private void getAuthenticationToken(Activity activity, String accountName) {
    // Scopes used to get access to Google Docs and spreadsheets present in the Drive
    String SCOPE1 = "https://spreadsheets.google.com/feeds";
    String SCOPE2 = "https://docs.google.com/feeds";
    String scope = "oauth2:" + SCOPE1 + " " + SCOPE2;
    try {
        accessToken = GoogleAuthUtil.getToken(activity, accountName, scope);
    } catch (UserRecoverableAuthException exception) {
        // For the first time, the user has to give this permission explicitly
        Intent recoveryIntent = exception.getIntent();
        startActivityForResult(recoveryIntent, RECOVERY_REQUEST_CODE);
    } catch (IOException e) {
        e.printStackTrace();
    } catch (GoogleAuthException e) {
        e.printStackTrace();
    }
}

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    if (requestCode == RECOVERY_REQUEST_CODE) {
        if (resultCode == RESULT_OK) {
            if (data != null) {
                String accountName = data.getStringExtra(AccountManager.KEY_ACCOUNT_NAME);
                if (accountName != null && !accountName.equals("")) {
                    // To be called only for the first time after the permission is given
                    getAuthenticationToken(activity, accountName);
                }
            } else {
                Utility.showSnackBar(linearLayout, Constants.INTENT_DATA_NULL);
            }
        }
    }
}
And finally, the code below gets all the spreadsheets in an account:
public class MySpreadsheetIntegration {
    public void getSpreadSheetEntries()
            throws AuthenticationException, MalformedURLException, IOException, ServiceException {

        SpreadsheetService service = new SpreadsheetService(applicationName);
        service.setProtocolVersion(SpreadsheetService.Versions.V3);
        service.setAuthSubToken(accessToken);

        // Define the URL to request. This should never change.
        URL SPREADSHEET_FEED_URL = new URL(
                "https://spreadsheets.google.com/feeds/spreadsheets/private/full");

        // Make a request to the API and get all spreadsheets.
        SpreadsheetFeed feed = service.getFeed(SPREADSHEET_FEED_URL, SpreadsheetFeed.class);
        List<SpreadsheetEntry> spreadsheets = feed.getEntries();

        // Iterate through all of the spreadsheets returned
        for (SpreadsheetEntry spreadsheet : spreadsheets) {
            // Print the title of this spreadsheet to the screen
            System.out.println(spreadsheet.getTitle().getPlainText());
        }
    }
}
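Once the service is authorized this way, the list-feed insert from the question should also go through. A minimal sketch, reusing service and the spreadsheets list from the method above (the column names must match your worksheet's header row; "firstname"/"lastname" are just the values from the question):
// Hedged sketch: append a row to the first worksheet of the first spreadsheet.
SpreadsheetEntry spreadsheet = spreadsheets.get(0);
WorksheetEntry worksheet = service.getFeed(
        spreadsheet.getWorksheetFeedUrl(), WorksheetFeed.class).getEntries().get(0);
URL listFeedUrl = worksheet.getListFeedUrl();

ListEntry row = new ListEntry();
row.getCustomElements().setValueLocal("firstname", "Joe");
row.getCustomElements().setValueLocal("lastname", "Smith");
row = service.insert(listFeedUrl, row);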
