I'm working on a pet project to scrape fantasy football stats from MY own fantasy league on ESPN. The problem I can't seem to get past is the login, which is needed before I can make requests for my league's page.
The URL I hit is
http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016
and by looking at the GET requests it looks like I get redirected to
http://games.espn.com/ffl/signin?redir=http://games.espn.com/ffl/leaguesetup/ownerinfo?leagueId=123456&seasonId=2016
This immediately takes me to a login prompt window. When I log in, I inspect the POST request and note down all the request headers. It looks like the requested URL on the POST is
https://registerdisney.go.com/jgc/v5/client/ESPN-FANTASYLM-PROD/guest/login?langPref=en-US
Additionally, I noted that the following JSON object is passed along:
{"loginValue":"myusername","password":"mypassword"}
using the Request Headers and JSON object I did the following:
String url = "https://registerdisney.go.com/jgc/v5/client/ESPN-FANTASYLM-PROD/guest/login?langPref=en-US";
String rawData = "{\"loginValue\":\"myusername\",\"password\":\"mypassword\"}";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
con.setRequestProperty("Accept-Encoding", "gzip, deflate");
con.setRequestProperty("Accept-Language", "en-US,en;q=0.5");
con.setRequestProperty("Authorization", "APIKEY 8IYGqTgmpFTX51iF1ldp6MBtWrdQ0BxNUf8bg5/empOdV4u16KUSrnkJqy1DXy+QxV8RaxKq45o2sM8Omos/DlHYhQ==");
con.setRequestProperty("Cache-Control", "no-cache");
con.setRequestProperty("Content-Length", "52");
con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
con.setRequestProperty("Expires", "-1");
con.setRequestProperty("Host", "registerdisney.go.com");
con.setRequestProperty("Origin", "https://cdn.registerdisney.go.com");
con.setRequestProperty("Pragma", "no-cache");
con.setRequestProperty("Referer", "https://cdn.registerdisney.go.com/v2/ESPN-ESPNCOM-PROD/en-US?include=config,l10n,js,html&scheme=http&postMessageOrigin=http%3A%2F%2Fwww.espn.com%2F&cookieDomain=www.espn.com&config=PROD&logLevel=INFO&topHost=www.espn.com&ageBand=ADULT&countryCode=US&cssOverride=https%3A%2F%2Fsecure.espncdn.com%2Fcombiner%2Fc%3Fcss%3Ddisneyid%2Fcore.css&responderPage=https%3A%2F%2Fwww.espn.com%2Flogin%2Fresponder%2Findex.html&buildId=157599bfa88");
con.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0");
con.setRequestProperty("conversation-id", "5a4572f4-c940-454c-8f86-9af27345c894, adffddd3-8c31-41a0-84d7-7a0401cd2ad0");
con.setRequestProperty("correlation-id", "4d9ddc78-b00e-4c5a-8eec-87622961fd34");
con.setDoOutput(true);
OutputStreamWriter w = new OutputStreamWriter(con.getOutputStream(), "UTF-8");
w.write(rawData);
w.close();
int responseCode = con.getResponseCode();
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
Assuming I'm on the right track, what I'm currently getting back from the server is
returned HTTP response code: 400 for URL: https://registerdisney.go.com/jgc/v5/client/ESPN-FANTASYLM-PROD/guest/login?langPref=en-US
Any ideas what is happening, or am I taking the completely wrong approach here? I tried to use JSoup but had no luck either, and I believe JSoup uses HttpURLConnection underneath as well.
Do I need to do some sort of GET request first, save something then do the POST request? How should it work?
You are trying to emulate the behaviour of a web browser with JSoup. As you have experienced, this is quite complicated, and JSoup is not made to impersonate a browser. Once you start crafting HTTP headers by hand, it's better to go another way.
The solution for your problem is to use a browser that can be manipulated programmatically. Selenium is more or less the de facto standard in Java.
Selenium starts your favorite browser (Firefox, Chrome, ...) and lets you control it from your Java program. You can also retrieve the content of the web pages in order to scrape them with JSoup. Selenium is well documented, so you will have no difficulty finding the required documentation or tutorials.
Another answer to your problem. While it is impossible for me to reproduce your issue (I don't have a fantasy football account and have no intention of creating one), I can still try to give some help with methodology.
I would tackle the problem by using the network inspector from my browser, copy in a file all the exchanges between the browser and the server and try to reproduce this in my code.
The API key value in the Authorization header can only be reused for a limited time. If it is expired, the registration response body will contain an "API_KEY_INVALID" error.
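To see which error the server is actually reporting, it may help to read the response body even when the status is 400. The following is only a minimal sketch, not the real ESPN/Disney login flow: the class and method names are made up for illustration, the API key is a placeholder you would copy fresh from the network inspector, and the only error code it looks for is the API_KEY_INVALID value mentioned above. It deliberately does not send Accept-Encoding, since HttpURLConnection will not decompress a gzip response for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative, not part of any ESPN API.
final class JsonPostSketch {

    /** Reads a stream fully as UTF-8 text; returns "" when the stream is null. */
    static String readAll(InputStream in) {
        if (in == null) {
            return "";
        }
        try (InputStream s = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = s.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when the body contains the error code quoted in the answer. */
    static boolean apiKeyRejected(String body) {
        return body.contains("API_KEY_INVALID");
    }

    /** POSTs JSON and returns "<status> <body>", reading the error stream for 4xx/5xx. */
    static String postJson(String loginUrl, String apiKey, String json) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        con.setRequestProperty("Authorization", "APIKEY " + apiKey);
        // Closing the stream (try-with-resources) is what actually sends the body.
        try (OutputStream out = con.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int status = con.getResponseCode();
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        return status + " " + readAll(in);
    }
}
```

Calling JsonPostSketch.postJson(url, freshKey, rawData) returns the status and body in one string, and apiKeyRejected on that string tells you whether the key has expired.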
Related
I'm trying to read the best price from the Skyscanner website using a normal GET request, but I'm not getting the content that I want with this code.
private void getRequest() throws Exception {
StringBuilder result = new StringBuilder();
URL url = new URL(URL);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.addRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0");
System.out.println(conn.getURL());
conn.setInstanceFollowRedirects(true);
HttpURLConnection.setFollowRedirects(true);
conn.setRequestMethod("GET");
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String line;
while ((line = rd.readLine()) != null) {
result.append(line);
}
System.out.println(conn.getURL());
rd.close();
response = result.toString();
}
The requested URL is the following:
https://www.skyscanner.com/transport/flights/fra/txl/181220/?adults=1&children=0&adultsv2=1&childrenv2=&infants=0&cabinclass=economy&rtn=0&preferdirects=false&outboundaltsenabled=false&inboundaltsenabled=false&currency=EUR&market=DE&locale=en-US
Response from the code above looks like this:
https://pastebin.com/YKh17RKE
If I open the Skyscanner link above in Chrome, I can click on Inspect Element and, voila, under
fqs-opts-container -> <span class="fqs-price">42 €</span>
I can see the cheapest price.
How can I get this information using Java? What am I doing wrong here?
Thanks in advance.
Inspect shows the current HTML DOM (Document Object Model), which is the result of the static HTML page (see right-click + View page source) plus dynamic modifications made by JavaScript.
If you do Inspect, tab Network and reload the page, you can see which files (and their contents) are all requested by the browser to display the page.
In this particular case, it seems that you could get the data as JSON:
In the Network tab, filter for conductor/v1/fps3/search/. The query is an HTTP POST request to the URL https://www.skyscanner.de/g/conductor/v1/fps3/search/?geo_schema=skyscanner&carrier_schema=skyscanner&response_include=query%3Bdeeplink%3Bsegment%3Bstats%3Bfqs%3Bpqs%3B_flights_availability. The answer is in JSON and includes a session_id, which is required as part of the URL for subsequent requests for details.
Please note that even if it is technically possible to receive the data, it is in most cases forbidden to use them commercially.
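As a rough illustration of the two points above, here is a small sketch that checks whether the price markup is present in the HTML you downloaded, and pulls a session_id out of a JSON response. The class and method names are hypothetical, and it assumes session_id appears as a plain string field; the real response layout may differ, and a proper JSON parser would be more robust than a regular expression.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the JSON shape is an assumption, not Skyscanner's documented format.
final class SessionIdSketch {

    // Matches "session_id": "<value>" with optional whitespace around the colon.
    private static final Pattern SESSION_ID =
            Pattern.compile("\"session_id\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the first session_id string value found in the JSON text, if any. */
    static Optional<String> sessionId(String json) {
        Matcher m = SESSION_ID.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True when the fqs-price span seen in the DOM inspector is present in the HTML text. */
    static boolean hasPriceMarkup(String html) {
        return html.contains("class=\"fqs-price\"");
    }
}
```

If hasPriceMarkup returns false for the string you read through HttpURLConnection, that suggests the price is added later by JavaScript and is not part of the static page.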
I am trying to make a POST request to an HTTPS address and have set up Fiddler to return a standard response. I have two rules set up in Fiddler, and the process works from both Internet Explorer and Postman (but not Chrome). However, I cannot get it to work from the Java application I am trying to write, even when I create an executable jar file and run it from the command line. I have been using this example as the base for this work. I have sendGet() working (ish), but I cannot get sendPost() to work: it fails with a java.net.UnknownHostException.
I think the problem may be that I am not hitting Fiddler as the proxy from Eclipse. For sendGet(), from the browser and Postman I get the contents of 200_SimpleHTML.dat as required, but from Eclipse the same rule has no effect and I get the content from the actual URL (our TeamForge in this case).
My organisation uses a proxy which is set in IE. I have set the Java configuration to "Use browser settings" and also tried "Use automatic proxy configuration script" (pointing to the proxy.pac file), and neither seems to have any effect. I have the following in Window -> Preferences -> Network Connections:
but I have no idea how, or even if, I can point to Fiddler as the proxy here. I am not setting up any authentication from the working routes.
The current state of my sendPost is below:
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
which I copied from Fiddler after one of the successful request.
private void sendPost() throws Exception
{
String url = "<Actual URL removed>";
URL obj = new URL(url);
HttpURLConnection http = (HttpURLConnection) obj.openConnection();
// add request header
http.setRequestMethod("POST");
http.setDoOutput(true);
http.setRequestProperty("User-Agent", USER_AGENT);
http.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
OutputStream out = http.getOutputStream();
int responseCode = http.getResponseCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(new InputStreamReader(http.getInputStream()));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null)
{
response.append(inputLine);
}
in.close();
// print result
System.out.println(response.toString());
}
Does anyone have any ideas as to how I can get this to work from my Java app?
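One thing that might be worth trying is to hand the proxy to the connection explicitly instead of relying on the IDE or browser settings. The sketch below assumes Fiddler is listening on 127.0.0.1:8888 (its usual default; check your Fiddler options), and the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper; host and port are assumptions about the local Fiddler setup.
final class FiddlerProxySketch {

    static final String FIDDLER_HOST = "127.0.0.1";
    static final int FIDDLER_PORT = 8888;

    /** Builds an HTTP proxy description pointing at the local Fiddler listener. */
    static Proxy fiddlerProxy() {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(FIDDLER_HOST, FIDDLER_PORT));
    }

    /** Opens (but does not yet connect) a connection that will be routed through Fiddler. */
    static HttpURLConnection open(String url) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection(fiddlerProxy());
    }
}
```

In sendPost() this would replace obj.openConnection() with FiddlerProxySketch.open(url). The same effect can usually be had without code changes by starting the JVM with -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=8888 -Dhttps.proxyHost=127.0.0.1 -Dhttps.proxyPort=8888. For HTTPS URLs, Java also has to trust Fiddler's root certificate, otherwise the TLS handshake will fail.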
I don't know how to explain it better, but I'm trying to get a response from a URL containing a function (right?).
I've been working on this for many hours and making a little progress each time, but I can't get it fully working.
This is the request and response headers from chrome dev tools:
Headers
My code is:
String params = "{\"prefixText\":\"" + city
+ "\",\"count\":10,\"contextKey\":\"he\"}";
conn = (HttpURLConnection) new URL(
"http://bus.gov.il/WebForms/wfrmMain.aspx/GetCompletionList")
.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.setChunkedStreamingMode(0);
// conn.setFixedLengthStreamingMode(params.length());
conn.addRequestProperty("Accept", "*/*");
conn.addRequestProperty("Content-Type", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Length", String.valueOf(params.length()));
conn.addRequestProperty("Host", "bus.gov.il");
conn.addRequestProperty("Origin", "http://bus.gov.il");
conn.addRequestProperty("X-Requested-With", "XMLHttpRequest");
conn.addRequestProperty("Referer",
"http://bus.gov.il/WebForms/wfrmMain.aspx?width=1024&company=1&language=he&state=");
OutputStream os = new BufferedOutputStream(conn.getOutputStream());
os.write(params.getBytes());
String answer = readStream(conn.getInputStream());
I get the exception (I can see it in the stack trace) when calling getInputStream() on this line:
String answer = readStream(conn.getInputStream());
before entering the readStream function!
I don't know how to solve it...
Tried searching about xmlhttprequest but understood that it's only in JS.
Also: I know I have a lot of unnecessary request properties but I can't figure out which are unnecessary until the code will work.
Thanks in advance :)
Sadly, it used to be (and probably still is) that the HttpURLConnection throws a FileNotFoundException when you get a 404 error. When you are doing the getInputStream() that's when it's first connecting, so any error from the server will show up there.
Get Wireshark or something if you want to see what's really going on in HTTP land as you make the request.
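Before reaching for Wireshark, it may be worth checking that the request body is really being sent. In the question's code the BufferedOutputStream is never flushed or closed, so the JSON likely never leaves the buffer, and params.length() counts characters rather than bytes, which differ for Hebrew text. Below is a minimal sketch of one way to send the body; the class and method names are hypothetical, and it assumes the city string contains no quotes or backslashes that would need JSON escaping.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not an official client for the site in the question.
final class PostBodySketch {

    /** Builds the JSON body used in the question (no JSON escaping of city). */
    static String completionParams(String city) {
        return "{\"prefixText\":\"" + city + "\",\"count\":10,\"contextKey\":\"he\"}";
    }

    /** Encodes the body explicitly as UTF-8 so the byte count is well defined. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Sends the body and returns the HTTP status instead of letting getInputStream() throw. */
    static int post(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        byte[] body = utf8(json);
        // Lets the connection set Content-Length itself, from the real byte count.
        conn.setFixedLengthStreamingMode(body.length);
        // try-with-resources closes the stream, which flushes the body to the server.
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body);
        }
        return conn.getResponseCode();
    }
}
```

Checking the returned status first (and reading getErrorStream() when it is 400 or above) avoids the FileNotFoundException surprise described above.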
String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
When I open this connection and use a BufferedReader to print the content, I get a 403 error.
The same URL works fine in the browser. Could anyone suggest what is wrong?
The reason it works in a browser but not in java code is that the browser adds some HTTP headers which you lack in your Java code, and the server requires those headers. I've been in the same situation - and the URL worked both in Chrome and the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before the getInputStream() solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
..even though I have never used Mozilla. Your situation might require a different header. It might be related to cookies ... I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
HttpURLConnection connection = ((HttpURLConnection)url.openConnection());
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
InputStream input;
if (connection.getResponseCode() == 200) // this must be called before 'getErrorStream()' works
input = connection.getInputStream();
else input = connection.getErrorStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
String msg;
while ((msg =reader.readLine()) != null)
System.out.println(msg);
} catch (IOException e) {
System.err.println(e);
}
HTTP 403 is a Forbidden status code. You would have to read the HttpURLConnection.getErrorStream() to see the response from the server (which can tell you why you have been given a HTTP 403), if any.
This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.
Most browsers automatically encode URLs when you enter them, but the Java URL function doesn't.
You should encode the query parameter values with URLEncoder (encode each value on its own, not the whole URL).
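As a sketch of what that could look like for the directions URL in the question, the helper below encodes each parameter value separately and then assembles the URL. The class and method names are made up for illustration; only URLEncoder.encode is a real JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building the URL from the question.
final class DirectionsUrlSketch {

    /** Form-encodes a single parameter value as UTF-8 (space becomes '+', '|' becomes %7C). */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    /** Assembles the directions URL, encoding only the values, never the '&' and '=' separators. */
    static String directionsUrl(String origin, String destination, String waypoints) {
        return "http://maps.googleapis.com/maps/api/directions/xml"
                + "?origin=" + enc(origin)
                + "&destination=" + enc(destination)
                + "&waypoints=" + enc(waypoints)
                + "&sensor=false";
    }
}
```

For example, directionsUrl("Chicago,IL", "Los Angeles,CA", "Joplin,MO|Oklahoma City,OK") yields a URL in which the pipe is sent as %7C rather than as a raw character.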
I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html
Your original page (the one with the link) and the linked target page are not on the same domain; call them original-domain and target-domain.
I found that the difference is in the request headers.
With the 403 Forbidden error, the request headers contain this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I enter the URL directly, there is no 403, and the request headers do NOT contain a Referer line for original-domain.
I finally figured out how to fix this error: on your original-domain web page, add
<meta name="referrer" content="no-referrer" />
This prevents the browser from sending the Referer header, and it works both for links and for Ajax requests.
I'm trying to post some login data to a form in order to grab the cookies from the response.
The url is: https://www.deviantart.com/users/login
However I can not get the server to return FOUND 302 but only 200, so I think I'm b0rking my querystring or headers in some manner:
try {
String query = URLEncoder.encode("&username="+user+"&password="+password+"&remember_me=1", "UTF-8");
URL url = new URL("https://www.deviantart.com/users/login");
HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
HttpsURLConnection.setFollowRedirects(false);
conn.setRequestMethod("POST");
conn.setRequestProperty("Host", "www.deviantart.com");
conn.setRequestProperty("User-Agent", "Mozilla 4.0");
conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
conn.setRequestProperty("Content-Length", String.valueOf(query.length()));
conn.setRequestProperty("Content-Language", "en-US");
conn.setUseCaches(false);
conn.setDoOutput(true);
conn.setDoInput(true);
PrintWriter out = new PrintWriter(conn.getOutputStream(), false);
out.write(query);
out.flush();
As far as I know, it's normal to get a "200" response; it means that your request was recognized and properly answered.
But when I need to know about HTTP code responses and some details I usually use this link: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
Hope it helps...
A 302 response is a redirect: it tells the browser to go to another page. A site's login page will often do this to send the user to a "login succeeded" page, or back to the home page or the page that the user was originally looking at. But it doesn't have to.
A 200 response means "Succeeded", and for a POST request will result in the browser staying at the login page.
I'd not worry about getting a 200 instead of a 302. If you are sending requests from a Java app, you probably don't care where the site redirects you after login. The thing that matters is whether your credentials have been accepted, and you can only determine that by trying a request that needs authentication. (You need to make sure that your code captures the cookies set by the response to your login POST. They need to be supplied in follow-on requests.)
If you are worried that you have "borked" the login URL query String, fetch the login page and look at its source to see if there are any hidden inputs in the form.
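On the query string itself: one likely problem in the question's code is that URLEncoder.encode is applied to the whole string, which also escapes the '&' and '=' separators. A minimal sketch of encoding each field separately, plus turning the Set-Cookie values of the login response into a Cookie header for follow-on requests, could look like this. The class and method names are hypothetical, and the field names are simply the ones from the question; any hidden inputs the real form requires would have to be added.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not tied to any particular site's login form.
final class LoginFormSketch {

    /** Form-encodes one name or value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    /** Builds an application/x-www-form-urlencoded body, keeping '&' and '=' unescaped. */
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** Keeps only the name=value part of each Set-Cookie value and joins them for a Cookie header. */
    static String cookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String v : setCookieValues) {
            int semi = v.indexOf(';');
            pairs.add((semi >= 0 ? v.substring(0, semi) : v).trim());
        }
        return String.join("; ", pairs);
    }
}
```

After the login POST, conn.getHeaderFields().get("Set-Cookie") gives the list to pass to cookieHeader, and the result can be set as the "Cookie" request property on the next connection. Alternatively, calling CookieHandler.setDefault(new CookieManager()) once at startup asks HttpURLConnection to track cookies for you.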