I have been trying different ways to get data from the following link:
http://www.ensembl.org/Danio_rerio/Export/Output/Location?db=core;flank3_display=300;flank5_display=300;output=fasta;r=18:19408965-19409049;strand=feature;coding=yes;cdna=yes;peptide=yes;utr3=yes;exon=yes;intron=yes;genomic=unmasked;utr5=yes;_format=Text
Copy-pasting the link into a web browser works for me, but I cannot get to it programmatically in Java.
It seems the URL doesn't follow the usual GET conventions, since the parameters are separated by semicolons rather than ampersands.
I tried using the URL class, but it splits the link above into server, path, and query, and the request results in HTTP 500.
I also tried sockets, but that failed too.
I believe what I need is a way to send the complete string unaltered and then read the result.
Any ideas?
This code successfully reads the first line from that URL (using a BufferedReader, since DataInputStream.readLine is deprecated):
URL u = new URL("http://www.ensembl.org/Danio_rerio/Export/Output/Location?db=core;flank3_display=300;flank5_display=300;output=fasta;r=18:19408965-19409049;strand=feature;coding=yes;cdna=yes;peptide=yes;utr3=yes;exon=yes;intron=yes;genomic=unmasked;utr5=yes;_format=Text");
BufferedReader in = new BufferedReader(new InputStreamReader(u.openStream()));
String s = in.readLine();
System.out.println(s);
It prints out: >18 dna:chromosome chromosome:Zv9:18:19408665:19409349:1
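If more than the first line is needed, the whole stream can be drained the same way. A minimal sketch, using a stub InputStream in place of u.openStream() so it is self-contained (the stub content mirrors the FASTA header above):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadAll {
    // Drain an InputStream into a single String, line by line.
    static String readAll(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stub stream standing in for u.openStream().
        InputStream stub = new ByteArrayInputStream(
                ">18 dna:chromosome chromosome:Zv9:18:19408665:19409349:1\nACGT\n".getBytes());
        System.out.print(readAll(stub));
    }
}
```

In practice the stub would be replaced with u.openStream() on the Ensembl URL.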
I've been struggling for half a day trying to learn this and am stuck. My goal is to query finance data from the YQL Yahoo Finance tables. I have set up some code from an AndroidHive example and can get it running correctly for their sample query. But that sample query just grabs a JSON object directly from the main URL they provide. To do this for YQL, I need to convert the SQL query into a format that the HTTP client will recognize, and my app keeps hanging and never returns a response.
First, I tried taking the exact query string from YQL to replicate their search, which for me was this:
https://query.yahooapis.com/v1/public/yql?q=select%20symbol%2CChange%20from%20yahoo.finance.quote%20where%20symbol%20in%20(%22SH%22%2C%22DOG%22%2C%22RWM%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=
That gives me a null result when I set url to the above and then run the following:
DefaultHttpClient httpClient = new DefaultHttpClient();
.
.
.
HttpGet httpGet = new HttpGet(url);
httpResponse = httpClient.execute(httpGet);
If I do this with the AndroidHive example URL in my code, it works fine.
If I enter the above URL in browser, it works fine. So clearly, my URL is not being entered correctly.
So then I read online that I need to use a URLEncode to convert syntax of URL into the correct format. Here's what I did:
private String url = "https://query.yahooapis.com/v1/public/yql?q=" +URLEncoder.encode("select symbol, Change from yahoo.finance.quote where symbol in (\"SH\",\"DOG\",\"RWM\")")+"&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=";
That gives me two problems. First, it tells me that encode is deprecated and that I am supposed to use the newer overload that takes "UTF-8" as a second string parameter, but when I do that I get a compile error because java.io.UnsupportedEncodingException is a checked exception that must be handled (ugh). So then I just ran the deprecated form, but that leaves my app hanging forever, just like the original format.
I must be doing something really obvious incorrect here...
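As an aside, the checked-exception annoyance described above can be sidestepped: since Java 10, URLEncoder has an overload that takes a Charset and declares no checked exception. A minimal sketch (note that URLEncoder produces form encoding, so spaces become + rather than %20):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeQuery {
    public static void main(String[] args) {
        String query = "select symbol, Change from yahoo.finance.quote "
                + "where symbol in (\"SH\",\"DOG\",\"RWM\")";
        // The Charset overload (Java 10+) throws no checked exception.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        System.out.println(encoded);
    }
}
```

On older Java, the same effect comes from catching (or rethrowing) UnsupportedEncodingException around encode(query, "UTF-8"); "UTF-8" is guaranteed to be supported, so the exception never actually fires.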
Okay - updated.
HTTPS versus HTTP! The sample query I ran for the AndroidHive example was HTTP, and this YQL query was HTTPS. If I simply change it to HTTP it works, and I didn't even need the URLEncoder business. The string straight from YQL worked.
Now I need to start solving my JSON parsing bugs, but I've got my data!
I am developing an Android application in which I am trying to send a simple array as a URL parameter, but it is not working properly. I am using an HTTP client and the GET method. I have tried the following:
StringBuilder sb = new StringBuilder();
sb.append(URLEncoder.encode(e.getKey(), "UTF-8")).append('=').append(URLEncoder.encode(e.getValue()+"", "UTF-8"));
where e.getValue() is an ArrayList<Integer>
My URL params come out as %5B28%5D when I send [28]. If I don't use the URL encoder, they go through as [28], but I want to use the URL encoder. Am I doing anything wrong?
Your code is fine; this is how URL encoding works: [ and ] are reserved characters, so they are escaped as %5B and %5D.
It seems like there is an issue on the server side at decoding time.
Debug the server for any possible problem with decoding.
Also, refer to this answer for a better way of sending an array in a GET request.
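The usual "better way" is to repeat the parameter name once per element instead of sending a bracketed literal like [28]. A hedged sketch (the parameter name ids is made up for illustration; your server would need to read it as a multi-valued parameter):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ArrayParams {
    // Encode a list as repeated key=value pairs, e.g. ids=28&ids=29.
    static String toQuery(String key, List<Integer> values) {
        return values.stream()
                .map(v -> URLEncoder.encode(key, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(String.valueOf(v), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQuery("ids", Arrays.asList(28, 29)));
    }
}
```

Numbers contain no reserved characters, so nothing gets percent-escaped and both client and server stay in agreement.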
Document doc1;
String url="http://www.google.com";
url= url +" and 1=1";
doc1=Jsoup.connect(url).get();
There seems to be no problem with the connection, since the following code throws no exception; the exception appears only when I try to fetch the HTML with the code above.
Document doc1;
String url="http://www.google.com";
url= url +" and 1=1";
Jsoup.connect(url);
Thanks!
JSoup.connect doesn't actually try to connect to the website. If you look through the documentation, you'll see that it only creates a Connection object. You can chain method calls on the Connection to set cookies, user agent, and other stuff before calling get, execute, post, or one of the other methods that will actually send the request.
(Here's another documentation link that might be easier to browse. Unfortunately, Javadoc's use of frames makes linking awkward.)
According to Jsoup javadoc:
Jsoup.connect(String url):
Creates a new Connection to a URL. Use to fetch and parse a HTML page.
Connection.get():
Execute the request as a GET, and parse the result.
So in the first sample you actually query Google and get an IOException because of the invalid URL, but not in the second sample (no request is made).
Jsoup.connect doesn't actually connect to anything. It just creates a Connection object. If you want to set any special properties of the connection, you can do it before you call get() which is what actually connects.
As for why you get an exception: probably because http://www.google.com and 1=1 is not a valid URL.
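The invalidity is easy to demonstrate with the standard library alone: java.net.URI rejects the space-containing string at parse time (java.net.URL, by contrast, is lenient about spaces, which is why the failure only surfaces when a request is attempted):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BadUrl {
    public static void main(String[] args) {
        String url = "http://www.google.com and 1=1";
        try {
            new URI(url); // throws: a bare space is illegal in a URI
            System.out.println("valid");
        } catch (URISyntaxException e) {
            System.out.println("invalid: " + e.getReason());
        }
    }
}
```

If the "and 1=1" was meant as query data, it would need to be percent-encoded and attached after a ? to form a legal URL.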
Today I'm developing a Java RMI server (and also the client) that gets info from a page and returns me what I want. I put the code right down here. The problem is that the URL I pass to the method sometimes throws an IOException saying the URL returned an HTTP 503 error. It would be easy if it always failed that way, but the thing is that it only appears sometimes.
I have this method structure because the page I parse belongs to a weather company and I want info for many cities, not only one, so some cities work perfectly on the first try and others fail. Any suggestions?
public ArrayList<Medidas> parse(String url){
    medidas = new ArrayList<Medidas>();
    int v = 0;
    String sourceLine;
    String content = "";
    try {
        // The URL address of the page to open.
        URL address = new URL(url);
        // Open the address and create a BufferedReader over the source code.
        InputStreamReader pageInput = new InputStreamReader(address.openStream());
        BufferedReader source = new BufferedReader(pageInput);
        // Append each HTML line between <tbody> and </tbody> to one string,
        // followed by a newline character.
        while ((sourceLine = source.readLine()) != null) {
            if (sourceLine.contains("<tbody>")) v = 1;
            else if (sourceLine.contains("</tbody>")) break;
            else if (v == 1) content += sourceLine + "\n";
        }
        // ........................ NOW THE PARSING CODE, NOT IMPORTANT
    } catch (IOException e) {
        e.printStackTrace(); // the 503 surfaces here as an IOException
    }
    return medidas;
}
HTTP 5xx errors reflect server-side problems, so this likely has nothing to do with your client code.
You would get a 400 error if you were passing invalid parameters on your request.
503 is "Service Unavailable" and may be sent by the server when it is overloaded and cannot process your request. From a publicly accessible server, that could explain the erratic behavior.
Edit
Build a retry handler into your code for when you detect a 503. Apache HttpClient can do that automatically for you.
List of HTTP Status Codes
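A retry handler along those lines can be sketched without any library by wrapping the request in a loop. Here the request is modeled as a supplier of status codes so the sketch is self-contained; a real version would call HttpURLConnection.getResponseCode() instead:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.IntSupplier;

public class RetryOn503 {
    // Retry the request while it returns 503, up to maxAttempts total attempts.
    static int fetchWithRetry(IntSupplier request, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        int status = request.getAsInt();
        for (int attempt = 1; status == 503 && attempt < maxAttempts; attempt++) {
            Thread.sleep(backoffMillis); // simple fixed backoff between attempts
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake server: overloaded twice, then succeeds.
        Iterator<Integer> responses = List.of(503, 503, 200).iterator();
        int status = fetchWithRetry(responses::next, 5, 10);
        System.out.println(status);
    }
}
```

An exponential backoff (doubling backoffMillis on each attempt) would be gentler on an already overloaded server.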
Check that the IOException is really not a MalformedURLException. Try printing out the URLs to verify a bad URL is not causing the IOException.
How large is the file you are parsing? Perhaps your JVM is running out of memory.
I have been working on a program that gets the contents of www.bing.com and saves it to a file. Of the two ways I have tried (one using sockets, the other using HtmlUnit), neither shows the contents 100% correctly when I open the file. I know there are other options out there, but I am looking for one that is guaranteed to fetch the contents of www.bing.com correctly. I would therefore appreciate it if someone could point me to a way of accomplishing this.
The differences you see are likely due to the web server providing different content to different browsers based on the user agent string and other request headers.
Try setting the User-Agent header in your socket and HtmlUnit strategies to the one you are comparing against and see if the result is as expected. Moreover, you will likely have to replicate the request headers exactly as they are sent by your target browser.
What is "incorrect" about what is returned? Keep in mind, Bing is probably generating some of the content via JavaScript; your client will need to make additional requests to retrieve the JavaScript files, run the JavaScript, etc.
You can use a URL.openConnection() to create a URLConnection and call URLConnection.getInputStream(). You can read the InputStream contents and write it to a file.
If you need to override the User-Agent because the server is using it to serve different content, you can do so by first setting the http.agent system property to an empty string:
/* Somewhere in your code before you make requests */
System.setProperty("http.agent", "");
or using -Dhttp.agent= on your java command line
and then setting the User-Agent to something useful on the connection before you get the InputStream.
URLConnection conn = ... //Create your URL connection as described above.
String userAgent = ... //Some user-agent string here.
conn.setRequestProperty("User-Agent", userAgent);
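Putting the pieces together, a minimal sketch (no request is actually sent here: openConnection() only prepares the connection, so the header can be inspected before connecting; the agent string is just an example value):

```java
import java.net.URL;
import java.net.URLConnection;

public class UserAgentDemo {
    public static void main(String[] args) throws Exception {
        // Clear the default agent so our header is not prefixed by the JVM's.
        System.setProperty("http.agent", "");

        URLConnection conn = new URL("http://www.bing.com/").openConnection();
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"); // example value
        // openConnection() does not hit the network; the property is only staged.
        System.out.println(conn.getRequestProperty("User-Agent"));
        // A real fetch would follow with conn.getInputStream(),
        // copying the stream to a file.
    }
}
```

From there, conn.getInputStream() returns the body to read and write out, as described above.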