When I fetch the following URL with curl:
curl -D headers.http "http://www.springerlink.com/index/10.1007/s00453-007-9157-8"
the file headers.http contains a "Location" header:
HTTP/1.1 302 Found
Date: Tue, 27 Oct 2009 17:00:20 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: http://www.springerlink.com/link.asp?id=c104731297q64224
Set-Cookie: CookiesSupported=True; expires=Wed, 27-Oct-2010 17:00:20 GMT; path=/
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 173
But when I use the Apache HttpClient library, the "Location" header is missing:
int status = httpClient.executeMethod(method);
if (status != HttpStatus.SC_OK &&
    status != HttpStatus.SC_MOVED_TEMPORARILY &&
    status != HttpStatus.SC_MOVED_PERMANENTLY)
{
    throw new IOException("connection failure for " + url + " status:" + status);
}
Header header = method.getResponseHeader("Location");
if (header == null)
{
    for (Header h : method.getResponseHeaders())
    {
        LOG.info(h.toString());
    }
    throw new IOException("Expected a redirect for " + url);
}
I've listed the headers below:
INFO: Date: Tue, 27 Oct 2009 17:05:13 GMT
INFO: Server: Microsoft-IIS/6.0
INFO: X-Powered-By: ASP.NET
INFO: X-AspNet-Version: 2.0.50727
INFO: Set-Cookie: ASP.NET_SessionId=js1o5wqnuhuh24islnvkyr45; path=/; HttpOnly
INFO: Cache-Control: private
INFO: Content-Type: text/html; charset=utf-8
INFO: Content-Length: 17245
So where did the "Location" header go?
With curl, you get the 302 response itself, which is a redirect to the URL in the Location header.
Apache HttpClient follows the redirect for you and returns the headers from the response of the redirected-to location.
To demonstrate this, try:
curl -D headers.http "http://www.springerlink.com/link.asp?id=c104731297q64224"
and compare the response.
edit: There are actually about 4 redirects in there if you follow each location header through with curl.
http://www.springerlink.com/index/10.1007/s00453-007-9157-8 is actually a redirect. The -D option only dumps the response headers to a file; curl does not follow the redirect to the specified Location unless you also pass -L, whereas HttpClient does follow it. Take a look at the Content-Length values: they are very different (173 versus 17245).
What happens when you leave out the -D?
Add this
method.setFollowRedirects(false);
before you execute the method.
HttpClient follows redirects automatically by default, but curl does not.
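The same idea can be tried without any third-party library. The sketch below is only an illustration: it uses the JDK's HttpURLConnection instead of Apache HttpClient, and the class name RedirectHeaderDemo, the helper names, and the throwaway local server with its /start and /target paths are all made up for the example. With automatic redirects switched off, the 302 response and its Location header are visible to the caller.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

public class RedirectHeaderDemo {

    /** Returns the Location header of the first response, without following it. */
    static String firstLocation(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        // Counterpart of method.setFollowRedirects(false) in the answer above.
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_MOVED_TEMP
                    && status != HttpURLConnection.HTTP_MOVED_PERM) {
                throw new IOException(
                        "Expected a redirect for " + url + " status:" + status);
            }
            return conn.getHeaderField("Location");
        } finally {
            conn.disconnect();
        }
    }

    /** Hypothetical local server: /start answers 302 with Location: /target. */
    static HttpServer startLocalRedirectServer() throws IOException {
        HttpServer server =
                HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/start", exchange -> {
            exchange.getResponseHeaders().add("Location", "/target");
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startLocalRedirectServer();
        try {
            int port = server.getAddress().getPort();
            System.out.println(firstLocation("http://127.0.0.1:" + port + "/start"));
        } finally {
            server.stop(0);
        }
    }
}
```

If setInstanceFollowRedirects(false) is removed, the connection follows the redirect on its own and the status check fails, which mirrors what the question observed with HttpClient.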
Certbot fails to generate a certificate with this error:
org.shredzone.acme4j.exception.AcmeException: Failed to pass the challenge for domain www.mysampledomain123.com, ... Giving up.
I manually checked the challenge file at:
http://www.mysampledomain123.com/.well-known/acme-challenge/jU--PkDrn5tDZw2RN6NNJHbPD00ovHFkLFvN3mJdeQX
Inside the file:
jU--PkDrn5tDZw2RN6NNJHbPD00ovHFkLFvN3mJdeQX.tuMr-UijwpsJ1KVZkdWTYgodWZ2SxxKdB7_CMAAEfpg
And here's the complete HTTP response header:
Accept-Ranges: bytes
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/plain;charset=iso-8859-1
Date: Sun, 16 Feb 2020 14:15:22 GMT
Server: nginx/1.14.0 (Ubuntu)
Transfer-Encoding: chunked
Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
X-Powered-By: MyServer
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 0
I'm wondering whether the problem is with the HTTP response headers or the content itself.
Any ideas would be appreciated.
This is my java program
private static final String SAMPLE_URL = "https://www.dropbox.com/s/<something>/test_out4.mp4";

public static void main(String[] args) throws IOException, URISyntaxException {
    HttpClient client = HttpClientBuilder.create().build();
    HttpHead request = new HttpHead(new URI(SAMPLE_URL));
    HttpResponse response = client.execute(request);
    System.out.println(response.getStatusLine());
    for (Header header : response.getAllHeaders()) {
        System.out.println(header.getName() + ": " + header.getValue());
    }
}
Below is a snippet of the output of the Java program. The status line says HTTP/1.1 200 OK. However, the header fields that are printed don't match what I get when I run curl manually. They seem to come from the first response rather than the last one. More importantly, the Content-Length field, which is present in the last response, is not set in the response object.
HTTP/1.1 200 OK <<< Status is 200
Server: nginx
Date: Thu, 20 Jun 2019 02:22:58 GMT
Content-Type: text/html; charset=utf-8 << Content type is char
When I run curl, the output is correct. Is there any setting in HttpClient to return the most recent headers?
curl -I https://www.dropbox.com/s/<something>/test_out4.mp4
HTTP/1.1 301 Moved Permanently <<< Status 301
Server: nginx
Date: Thu, 20 Jun 2019 02:20:50 GMT
Content-Type: text/html; charset=utf-8 <<< Content type text
Connection: keep-alive
....
HTTP/1.1 302 Found << second redirect
Server: nginx
Date: Thu, 20 Jun 2019 02:36:03 GMT
Content-Type: text/html; charset=utf-8
....
HTTP/1.1 200 OK <<< Status finally 200
Server: nginx
Date: Thu, 20 Jun 2019 02:36:04 GMT
Content-Type: video/mp4 << content type correct
Content-Length: 92894175 << length correct
Connection: keep-alive
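One way to see every hop the way the curl output above lists them is to switch off automatic redirects and follow each Location header by hand. The sketch below is not an HttpClient setting and not a confirmed fix for the question: it uses the JDK's HttpURLConnection, and the class name HeadRedirectTrace, the Hop type, and the local test server with its /first, /second and /final paths are invented for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class HeadRedirectTrace {

    /** One response in the chain: status plus the headers the question cares about. */
    static final class Hop {
        final String url; final int status;
        final String contentType; final String contentLength;
        Hop(String url, int status, String contentType, String contentLength) {
            this.url = url; this.status = status;
            this.contentType = contentType; this.contentLength = contentLength;
        }
        @Override public String toString() {
            return status + " " + url + " type=" + contentType + " length=" + contentLength;
        }
    }

    /** Sends HEAD requests, following Location manually, and records every hop. */
    static List<Hop> trace(String startUrl, int maxHops) throws IOException {
        List<Hop> hops = new ArrayList<>();
        URI current = URI.create(startUrl);
        for (int i = 0; i < maxHops; i++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            hops.add(new Hop(current.toString(), status,
                    conn.getHeaderField("Content-Type"),
                    conn.getHeaderField("Content-Length")));
            conn.disconnect();
            if (status < 300 || status >= 400 || location == null) {
                return hops; // last entry is the final response
            }
            current = current.resolve(location); // handles relative Location values
        }
        throw new IOException("Too many redirects for " + startUrl);
    }

    /** Hypothetical local chain: /first -> 301 -> /second -> 302 -> /final (200). */
    static HttpServer startLocalChain() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        addRedirect(server, "/first", 301, "/second");
        addRedirect(server, "/second", 302, "/final");
        server.createContext("/final", exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "video/mp4");
            exchange.sendResponseHeaders(200, -1); // no body, as for a HEAD request
            exchange.close();
        });
        server.start();
        return server;
    }

    private static void addRedirect(HttpServer server, String path, int status, String target) {
        server.createContext(path, exchange -> {
            exchange.getResponseHeaders().add("Location", target);
            exchange.sendResponseHeaders(status, -1);
            exchange.close();
        });
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startLocalChain();
        try {
            String start = "http://127.0.0.1:" + server.getAddress().getPort() + "/first";
            for (Hop hop : trace(start, 5)) {
                System.out.println(hop);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

The last element of the returned list corresponds to the final response, so its Content-Type and Content-Length are the values the question is after. The maxHops limit is an arbitrary guard against redirect loops.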
I've written a query tool which at its heart uses this block to gather info from an external URL:
AsyncHttpClientConfig proxiedCF = new DefaultAsyncHttpClientConfig.Builder().setUserAgent(pickUserAgent()).build();
AsyncHttpClient asyncHttpClient = new DefaultAsyncHttpClient(proxiedCF);
Future<Response> f = asyncHttpClient.prepareGet(url).setProxyServer(new ProxyServer.Builder(pickProxyServer(), 80)).execute();
It works fine. However it works even when an invalid proxy is provided, which is a bit suspicious and I feel my elaborate proxy configuration is not used at all.
I stumbled upon this by having pickProxyServer() return a String "1.1.1.1", which is obviously not a valid web proxy.
I use SLF4J for logging and it looks pretty normal:
20:31:47.454 [AsyncHttpClient-7-1] DEBUG o.a.n.channel.NettyConnectListener - Using new Channel '[id: 0x03359938, L:/10.0.0.101:59775 - R:/1.1.1.1:80]' for 'GET' to '[[url removed by me]]'
20:31:47.586 [AsyncHttpClient-7-1] DEBUG o.a.netty.handler.HttpHandler -
Request DefaultFullHttpRequest(decodeResult: success, version: HTTP/1.1, content: EmptyByteBufBE)
GET [[url removed by me]] HTTP/1.1
Host: [[url removed by me]]
Accept: */*
User-Agent: burning_dandelion
Response DefaultHttpResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 200 OK
Date: Fri, 13 Oct 2017 18:31:49 GMT
Expires: Fri, 13 Oct 2017 18:31:49 GMT
Cache-Control: private, max-age=3600
Content-Type: text/xml; charset=ISO-8859-1
P3P: CP="This is not a P3P policy!"
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2017-10-13-18; expires=Fri, 20-Oct-2017 18:31:49 GMT; path=/; domain=[[url removed by me]]
Set-Cookie: NID=114=qiVBv02cmXYHh2RfLQbhBfESWIoaGlf3d2jlSbAdQ8yWPDsCpOeK9aYbvfq0HWsER68W1oE53jiriM_fivTc1bJi1F2sfCi0wMptKI-9U3ueVKITtFvYYZx2T0rJf1kQ; expires=Sat, 14-Apr-2018 18:31:49 GMT; path=/; domain=[[url removed by me]]; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked
20:31:47.587 [AsyncHttpClient-7-1] DEBUG o.a.netty.channel.ChannelManager - Adding key: ProxyPartitionKey(proxyHost=1.1.1.1, proxyPort=80, secured=false, targetHostBaseUrl=[[url removed by me]]:80 for channel [id: 0x03359938, L:/10.0.0.101:59775 - R:/1.1.1.1:80]
Can someone point me towards my error? Obviously, I want an I/O exception or any kind of notice when an invalid proxy is called upon.
OK, it's the middle of the night, but I found something: when the proxy port is anything other than 80, the proxy is used correctly. That does the trick for me.
I am a new programmer and I am trying to build an app that uses JSON.
If I use this URL, it doesn't work: http://zsuzsafodraszat.hostzi.com/boltok.json
If I use this one, my app works: https://api.myjson.com/bins/3zm8i
Both JSON files are exactly the same.
Can you help me find what I am doing wrong? Maybe it's a bad extension, or web000 is not a good service for JSON? Can you recommend some good free JSON hosting? Thanks
Those two URLs do not have the same content or the same headers. You can see this if you run curl from the command line:
$ curl -i "http://zsuzsafodraszat.hostzi.com/boltok.json"
HTTP/1.1 200 OK
Date: Wed, 13 Apr 2016 22:52:50 GMT
Server: Apache
Last-Modified: Wed, 13 Apr 2016 16:48:23 GMT
Accept-Ranges: bytes
Content-Length: 1020
Connection: close
Content-Type: application/json
??{"Aldi":"http://catalog.aldi.com/emag/hu_HU/print/Online_katalogus_04_07/Online_katalogus_04_07.pdf",
"Lidl":"http://www.lidl.hu/statics/lidl-hu/ds_doc/HU_HHZ_kw14_2016.pdf",
"Spar":"http://ajanlatok.spar.hu/view/download/?d=1279",
"Penny":"https://view.publitas.com/16538/136265/pdfs/016f82fb5b00bc97b5a8c35f512d89b01cd3e3ce.pdf",
"Coop":"https://view.publitas.com/2556/133497/pdfs/16603d7e9bf30e8a8a4efec7f01d3fa2caf92fe0.pdf",
"Auchan":"http://www.lidl.hu/statics/lidl-hu/ds_doc/HU_HHZ_kw14_2016.pdf"}
$ curl -i "https://api.myjson.com/bins/3zm8i"
HTTP/1.1 200 OK
Server: nginx/1.5.8
Date: Wed, 13 Apr 2016 22:52:56 GMT
Content-Type: application/json
Content-Length: 500
Connection: keep-alive
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
{"Aldi":"http://catalog.aldi.com/emag/hu_HU/print/Online_katalogus_04_07/Online_katalogus_04_07.pdf","Lidl":"http://www.lidl.hu/statics/lidl-hu/ds_doc/HU_HHZ_kw14_2016.pdf","Spar":"http://ajanlatok.spar.hu/view/download/?id=1279","Penny":"https://view.publitas.com/16538/136265/pdfs/016f82fb5b00bc97b5a8c35f512d89b01cd3e3ce.pdf","Coop":"https://view.publitas.com/2556/133497/pdfs/16603d7e9bf30e8a8a4efec7f01d3fa2caf92fe0.pdf","Auchan":"http://www.lidl.hu/statics/lidl-hu/ds_doc/HU_HHZ_kw14_2016.pdf"}
As you can see, one of them has a couple of junk bytes at the beginning that my terminal displays as question marks. The HTTP headers are different as well, and the Content-Lengths are wildly different (1020 versus 500). Did you use something other than a plain text editor to create the JSON payload in the failing example?
Try removing the junk characters and adding these http headers:
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
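To check for junk bytes programmatically, you can look at the first few bytes of the payload. The sketch below assumes the junk is a byte order mark (BOM); nothing above confirms that, although a body about twice the size of the clean one would be consistent with a UTF-16 file. The class and method names (BomCheck, detectBom) are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {

    /**
     * Returns the encoding implied by a leading byte order mark,
     * or null if the data does not start with one of the BOMs checked here.
     * Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
     */
    static String detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String json = "{\"Aldi\":\"http://example.invalid/a.pdf\"}"; // placeholder payload
        byte[] plain = json.getBytes(StandardCharsets.UTF_8);
        // Java's UTF_16 encoder writes a big-endian BOM before the text.
        byte[] utf16 = json.getBytes(StandardCharsets.UTF_16);

        System.out.println("plain: " + detectBom(plain) + ", " + plain.length + " bytes");
        System.out.println("utf16: " + detectBom(utf16) + ", " + utf16.length + " bytes");
    }
}
```

If a BOM is reported, re-saving the file as UTF-8 without a BOM in a plain text editor is one way to get rid of the junk characters.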
I've got this URL (http://vignette4.wikia.nocookie.net/fantendo/images/6/6e/Small-mario.png/revision/latest?cb=20120718024112)
How can I determine the file extension if it isn't at the end of the URL?
You'll need to read the response headers. The MIME type, if known, is stored in the Content-Type header.
HTTP/1.1 200 OK
Content-Disposition: inline; filename="Small-mario.png"
X-Thumbnailer: Vignette
Content-Type: image/png
Cache-Control: public, max-age=31536000
X-Surrogate-Key: ad1f82ba0cbe38fa60f83c036993a71e05dae492
Server: Jetty(9.2.z-SNAPSHOT)
X-Cacheable: YES
Content-Length: 58457
Accept-Ranges: bytes
Date: Mon, 06 Jul 2015 16:12:31 GMT
Age: 65
Connection: keep-alive
X-Served-By: thumbnailer-s1, cache-wk-sjc3160-WIKIA, cache-lhr6322-LHR
X-Cache: ORIGIN, MISS, HIT
X-Cache-Hits: ORIGIN, 0, 5
X-Timer: S1436199151.564330,VS0,VE0
Vary: Accept-Encoding
Timing-Allow-Origin: *
You're looking for the Content-Type header, which the server ought to send in the HTTP response to tell you this.
Note that it is not guaranteed to be accurate, or present at all.
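As a rough sketch of how that could look in Java: take the Content-Type value, strip any parameters, and look it up in a table. The table here is a tiny hand-written sample, not a complete registry, and the fallback to the filename in Content-Disposition (which the response above happens to include) is an extra assumption, not something every server sends. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtensionFromHeaders {

    // Small illustrative sample; extend as needed.
    private static final Map<String, String> EXT_BY_TYPE = Map.of(
            "image/png", "png",
            "image/jpeg", "jpg",
            "image/gif", "gif",
            "video/mp4", "mp4",
            "application/json", "json");

    // Matches filename="name.ext" or filename=name.ext (not the filename*= form).
    private static final Pattern FILENAME = Pattern.compile("filename=\"?([^\";]+)\"?");

    /** Returns a lower-case extension without the dot, or null if none can be guessed. */
    static String guessExtension(String contentType, String contentDisposition) {
        if (contentType != null) {
            // Drop parameters such as "; charset=utf-8".
            String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            String ext = EXT_BY_TYPE.get(mime);
            if (ext != null) return ext;
        }
        if (contentDisposition != null) {
            Matcher m = FILENAME.matcher(contentDisposition);
            if (m.find()) {
                String name = m.group(1).trim();
                int dot = name.lastIndexOf('.');
                if (dot >= 0 && dot < name.length() - 1) {
                    return name.substring(dot + 1).toLowerCase(Locale.ROOT);
                }
            }
        }
        return null; // header missing or type unknown
    }

    public static void main(String[] args) {
        // Header values taken from the response shown above.
        System.out.println(guessExtension("image/png", "inline; filename=\"Small-mario.png\""));
    }
}
```

Because the header may be wrong or absent, callers should treat a null result (or an unexpected extension) as a normal case rather than an error.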