When I run the following code:
import java.io.File;
import java.net.URL;
import org.apache.commons.io.FileUtils; // Apache Commons IO

try {
    URL url = new URL("https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HUDCO&series=N2");
    File f = new File("/Users/Vaibhav/Desktop/nseurltest.txt");
    // Download the page at the URL and write it to the local file
    FileUtils.copyURLToFile(url, f);
} catch (Exception e) {
    e.printStackTrace();
}
I get a java.net.SocketException: Operation timed out after about 30 seconds. Until about a month ago, the same code ran without error. What could suddenly be causing this exception, and how can I fix it?
The ultimate objective of this code is to extract the latest market price of the HUDCO N2 bond from the .txt file the URL is copied into. If there is another simple way to extract the market price from the URL, I would love to hear it.
I guess the website you are trying to reach blocks unknown connections, but you can work around this with the jsoup library. With the following code, I managed to download the content of the link.
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Pretend to be a regular browser so the server does not reject the request
Response response = Jsoup.connect(
        "https://www1.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HUDCO&series=N2")
        .ignoreContentType(true)
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36")
        .referrer("http://www.google.com")
        .timeout(30000)
        .followRedirects(true)
        .execute();
Document doc = response.parse();
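If you still want the raw content written to a file, as in your original code, a minimal sketch (assuming Apache Commons IO is on the classpath and reusing the original target path) could write the response body out directly:
// Sketch: write the fetched page to the same file as before
// (assumes org.apache.commons.io.FileUtils and the original target path)
File f = new File("/Users/Vaibhav/Desktop/nseurltest.txt");
FileUtils.writeStringToFile(f, response.body(), "UTF-8");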
jsoup dependency:
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>
You can change the user agent; I used the Chrome 70 user-agent string. There are many options at the following link:
http://www.useragentstring.com/pages/useragentstring.php
Related
url = "https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs";
HttpURLConnection connection = pingHttpUrl(url);
responseCode = connection.getResponseCode();
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        logger.error("Caught exception : {}", e.getMessage());
        throw new IOException();
    }
    return conn;
}
This gives a 503 response code, but the site loads properly in a browser. What could be the issue?
The problem is with the headers of the request. I found that this site, hosted behind Cloudflare, requires two headers to be just so; otherwise you will receive the 503 response:
User-Agent - Your header specified Chrome version 76, which the server apparently has a problem with. I had success with this User-Agent value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
cookie - I found that the cookie value cf_clearance needs to be set, and possibly the other set-cookie values returned on the first request. This value relates to Cloudflare's support for Privacy Pass (https://blog.cloudflare.com/cloudflare-supports-privacy-pass/). It appears to be a means of verifying that a user is human and not a machine, which in turn is bad news for your efforts here.
I have a working solution below, but it will be hard to automate, since it requires you to establish a browser session and use the cookie set there in the code. When the cf_clearance cookie expires, you will have to visit the site again and reset the cookie value in the code.
I would also speculate that the User-Agent header of the request is used in generating the required cf_clearance cookie, making it harder to hijack the cookie: you would have to use a User-Agent matching that of the browser that made the request when Cloudflare generated the cf_clearance cookie.
I have journaled my investigation here:
When visiting the URL in my browser:
https://www.lmcu.org/?__cf_chl_jschl_tk__=9c114404052361017d9cfe1247981e24813649c7-1592389426-0-AfP07ha5TxZHf64q5tb5nJf9BJguC4U553-OJzJWivTqfgwYLqUODkXj-XsOjZTwpC71ROxHWx4Xhdp2S0LgAVlKgXpy7KWOex7lkoGBm8mNpBsCeJapdYNWty-X2oHE6gp_TtMfH0dcBabvWr_mXV1djsVR_IGlYJA-wCuZpPTGOozyzN9TFwjMPxU-3o6BIUxTh6DDcHmJ_Bw48EYKGpq6n57bVdeLezEs9PduataW1JUcF4GqLE2EHiUxWGubtS8YgcxkkGin4zitHXENMbFi1kMhxI77LsORzKyhkAD1OkG8fGmV--Cgd3EpxWHtHD5vpoIFFIwX0uGQywPnegs
And inspecting the response that the server gives, it turns out that it is in fact giving back a 503 as well.
For some reason that I can't make out, the browser is redirected to the URL below instead. I cannot see the Location header being passed back in the response, or find this URL anywhere in the response.
https://www.lmcu.org/?__cf_chl_jschl_tk__=fe835fdc1e7e2f5b2857ab5eb4be84e67d0e8c42-1592506549-0-AQ3E1piNGHg7O7lxgRyItR1U5BzB52q7GmCHe_HPJBsUHv8RcZCgqLPPtyngPmDjvy7pZDprPNK6ihKVEgQ7HqmbDSPXZ1aHPkBDs9re49u_Q_jI04etmtK7E0GIdxhKWCd-p4TR7b_b0JdnwzJOF6z4XaJQOgNU8kazJr5Mo96zxQpUlsKWPSumEmSfynkGeMDgkM-O1mN59LKp0p4kt-2O2IIFrlc8289ZbCSO6JghtvDsLsFDA3VxLV3Irn2W3KQ8sHg_TdwB-0g0WX9J-WTwedVYzj2a7uNtH377ZIritTXKqRw1qeQ6mkpxQ0h_OVMIl8XUiEC0Zj1KP50tUK8
I checked with Postman, and sure enough, I got the 503 error there as well. As far as I could tell, the server (or a reverse proxy in front of it) was inspecting the headers of the request and rejecting it based on them. I fooled around a little, moving headers from the browser request into Postman, and finally figured out that it is the combination of the cookie and User-Agent headers that allows the request to be served.
The User-Agent header is not allowed to have the specified Chrome version; I have it working with version 83 here.
The cookie header is something the browser populates from my first visit to the site, so that is a bit harder to handle in your code. I tried to fetch it in code with connection.getHeaderField("set-cookie"), but that cookie does not seem to cut it.
But! I was able to make the code work by taking the cookie from my browser and setting it manually in code, along with the User-Agent:
public HttpURLConnection pingHttpUrl(String url) throws IOException {
    HttpURLConnection conn = null;
    try {
        conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // This one does not work, apparently because of the Chrome version
        // conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76 Safari/537.36");
        conn.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36");
        conn.addRequestProperty("cookie", "<cookie value from the browser, from the header on a successful request>");
        conn.setConnectTimeout(2000);
        conn.setInstanceFollowRedirects(false);
        conn.setReadTimeout(10000);
        conn.connect();
        Thread.sleep(1000);
    } catch (Exception e) {
        System.out.println(String.format("Caught exception : %s", e.getMessage()));
        throw new IOException();
    }
    return conn;
}
I later found out that it is the cookie value under the cf_clearance key that makes the difference.
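So, as a minimal sketch (assuming you copy the cf_clearance value from a successful browser request; the value below is a placeholder), sending only that key may be enough:
// Sketch: send only the cf_clearance cookie taken from the browser (placeholder value)
conn.addRequestProperty("cookie", "cf_clearance=<cf_clearance value from the browser>");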
While getting results from duckduckgo.com with different queries, after 20-30 iterations I get this exception:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=400, URL=https://duckduckgo.com/html/?q= Hermann_William_Goering
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:682)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:629)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:261)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:250)
at WebContextExtractor.DDGresultsScraping(WebContextExtractor.java:378)
at WebContextExtractor.main(WebContextExtractor.java:521)
I have no idea what the problem is; if I try to visit that link manually, I can reach it without any problem.
The error occurs when I try to get the document for the page with this simple code:
Connection conn = Jsoup.connect(DUCKDUCKGO_SEARCH_URL + query)
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36");
Document doc = conn.get(); // <-- exception thrown here
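For illustration, and purely as an assumption that the 400 comes from the unencoded space visible in the failing URL (q= Hermann_William_Goering), here is a sketch that URL-encodes the query before appending it; it may not explain why the error only appears after 20-30 iterations:
// Sketch: URL-encode the query before appending it to the search URL
// (assumes java.net.URLEncoder is imported; the enclosing method already throws IOException)
String encodedQuery = URLEncoder.encode(query.trim(), "UTF-8");
Connection conn = Jsoup.connect(DUCKDUCKGO_SEARCH_URL + encodedQuery)
        .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36");
Document doc = conn.get();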
I am working with Yahoo stock data. Yesterday I got the stock data by using the finance web service API, but today, when I try to get the data from the API, I get the error below:
{
    "p": {
        "a": {
            "href": "https://finance.yahoo.com/webservice/v1/symbols/msft,goog,appl,orcl,yhoo,tcs,amzn,INFY.NS/quote?bypass=true&format=json&view=detail",
            "content": "https://finance.yahoo.com/webservice/v1/symbols/msft,goog,appl,orcl,yhoo,tcs,amzn,INFY.NS/quote?bypass=true&format=json&view=detail"
        },
        "content": "Moved Temporarily. Redirecting to"
    }
}
It says that the resource was moved temporarily.
Why am I getting this error? Did I reach the API limit for today?
NOTE:
Yesterday I kept it running to test the API request limit, but when I try to run it today it shows the above error.
If the API limit for my IP has been reached, when will I get access to the data again?
This is the API I am using:
http://finance.yahoo.com/webservice/v1/symbols/msft,goog,appl,orcl,yhoo,tcs,amzn,INFY.NS/quote?format=json&view=detail
As commented here: https://stackoverflow.com/a/38390559/6586718, you have to change the user agent to that of a mobile device.
In Java, I do the following and it works (this is for XML, but the same can be applied to JSON):
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

URL url = new URL("https://finance.yahoo.com/webservice/v1/symbols/" + stocks + "/quote");
HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
urlc.setRequestProperty("User-Agent", "Mozilla/5.0 (Linux; Android 6.0; MotoE2(4G-LTE) Build/MPI24.65-39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.81 Mobile Safari/537.36");
Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(urlc.getInputStream());
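For the JSON variant, a minimal sketch (reading the body as plain text instead of parsing XML; assumes java.io.BufferedReader and java.io.InputStreamReader are imported, and a JSON library of your choice is used afterwards) could look like this:
// Sketch: read the JSON response body as plain text, then parse it with any JSON library
StringBuilder json = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(urlc.getInputStream(), "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        json.append(line);
    }
}
System.out.println(json.toString());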
Try this new one:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20csv%20where%20url%3D'http%3A%2F%2Fdownload.finance.yahoo.com%2Fd%2Fquotes.csv%3Fs%3DAAPL%26f%3Dsl1d1t1c1ohgv%26e%3D.csv'%20and%20columns%3D'symbol%2Cprice%2Cdate%2Ctime%2Cchange%2Ccol1%2Chigh%2Clow%2Ccol2'&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
I would like to use HtmlUnit to log in to a website and click a link so that a file is downloaded. However, the website, which uses jQuery, returns a "Browser Not Supported" error. Is there a way to set up HtmlUnit so that it looks exactly like a normal browser to this website?
Any help would be greatly appreciated.
I'm trying to do this with the following settings, but the error still occurs:
public void surf(Job job) {
    System.out.println("[Enter] surf");
    try {
        String applicationName = "Netscape";
        String applicationVersion = "5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41";
        String userAgent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36 OPR/38.0.2220.41";
        int browserVersionNumeric = 51;
        BrowserVersion browser = new BrowserVersion(applicationName, applicationVersion, userAgent, browserVersionNumeric);
        WebClient webClient = new WebClient(browser);
        final HtmlPage page = webClient.getPage("https://www.europasports.com");
        System.out.println(page);
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("[Exit] surf");
}
Netscape was discontinued in March 2008; that's why you are getting the message. It no longer exists! If you are targeting Apple users, I suggest you use Safari, but Google Chrome currently has the largest browser usage share.
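As a sketch, instead of building a custom BrowserVersion you could use one of HtmlUnit's predefined browser profiles (the exact set of constants may vary between HtmlUnit releases):
// Sketch: let HtmlUnit emulate Chrome via its built-in browser profile
WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage("https://www.europasports.com");
System.out.println(page);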
I created a program that takes some information as input and searches for the first 10 videos on YouTube. My problem is that v2 of the YouTube API is no longer supported, so I need to replace
https://gdata.youtube.com/feeds/api/videos?q=
with the new version, but I can't find it. Can you help me please?
Thanks a lot.
EDIT
Thanks a lot, I have tried this request:
https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=10&q=skyrim&key={YOUR_API_KEY}
I want to search for the first 10 videos related to the keyword Skyrim.
I tried this using Eclipse and I got this error:
Exception in thread "main" org.jsoup.HttpStatusException: HTTP errorfetching URL. Status=400, URL=https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=10&key={YOUR_API_KEY}&q=skyrim&max-results=10
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
I run the query like this:
private static String QueryURL = "https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=10&key={YOUR_API_KEY}&q=";
Document doc = Jsoup.connect(QueryURL + stringa).userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36").get();
where stringa is "Skyrim".
Thanks all for the help.
Start at the YouTube API v3 page: https://developers.google.com/youtube/v3/
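As a sketch (the {YOUR_API_KEY} token is a placeholder that must be replaced with a real API key from the Google Developers Console, and the returned JSON would then be parsed with a JSON library rather than as HTML), a v3 search request fetched through jsoup could look like this:
// Sketch: fetch raw JSON from the YouTube Data API v3 search endpoint with jsoup
// {YOUR_API_KEY} is a placeholder and must be replaced with a real API key
String queryUrl = "https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=10"
        + "&key={YOUR_API_KEY}&q=" + stringa;
String json = Jsoup.connect(queryUrl)
        .ignoreContentType(true) // the endpoint returns application/json, not HTML
        .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36")
        .execute()
        .body();
System.out.println(json);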