URL.openStream() and HttpResponse.getEntity().getContent() download different files from the same URL - java

Method 1
Using the URL class in the java.net package.
String sourceUrl = "https://thumbor.thedailymeal.com/P09kUdGYdBReFSJne1qjVDIphDM=//https://videodam-assets.thedailymeal.com/filestore/5/3/0/2_37ec80e4c368169/5302scr_43fcce37a98877f.jpg%3Fv=2020-03-16+21%3A06%3A42&version=0";
java.net.URL url = new URL(sourceUrl);
InputStream inputStream = url.openStream();
Files.copy(inputStream, Paths.get("/Users/test/rr.png"), StandardCopyOption.REPLACE_EXISTING);
Method 2
Using Apache's HttpClient class.
String sourceUrl = "https://thumbor.thedailymeal.com/P09kUdGYdBReFSJne1qjVDIphDM=//https://videodam-assets.thedailymeal.com/filestore/5/3/0/2_37ec80e4c368169/5302scr_43fcce37a98877f.jpg%3Fv=2020-03-16+21%3A06%3A42&version=0";
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet(sourceUrl);
HttpResponse httpresponse = httpclient.execute(httpget);
InputStream inputStream = httpresponse.getEntity().getContent();
Files.copy(inputStream, Paths.get("/Users/test/rr.png"), StandardCopyOption.REPLACE_EXISTING);
I downloaded the rr.png file using both methods. The two files are different, even in size, and method 2 downloads a blank image. I have read that both methods are equivalent, so I do not understand why method 1 downloads the correct file and method 2 downloads the wrong one. Please clarify this, and let me know if there is a fix for method 2 that lets me download the correct file.

First: cross-posting: https://coderanch.com/t/728266/java/URL-openStream-HttpResponse-getEntity-getContent
Second: I guess the issue is the URL and how it is handled differently by Java's built-in class and the Apache library. Use a debugger and step through both to see which URL really gets sent over the TLS stream.
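Short of stepping through with a debugger, a first check is to print the URL the way java.net.URL parses it and compare that with what HttpClient reports through httpget.getURI(). The sketch below uses only the JDK; the class name UrlParts and the shortened example.com URL are made up for illustration. As an unconfirmed guess rather than a diagnosis: newer HttpClient 4.5.x releases can normalize the request URI, which may collapse the "//" in the middle of this path, and RequestConfig.custom().setNormalizeUri(false) is the setting to look at if the two URLs turn out to differ.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlParts {
    /** Returns the pieces of the URL as java.net.URL parsed them, one per line. */
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return "host=" + url.getHost() + "\n"
                 + "path=" + url.getPath() + "\n"
                 + "query=" + url.getQuery();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the question's URL.
        // Note the "//" in the middle of the path and the encoded "?" (%3F).
        System.out.println(describe(
            "https://thumbor.example.com/KEY=//https://assets.example.com/a.jpg%3Fv=1&version=0"));
    }
}
```

If the path printed here still contains "//https://" but the URI logged on the HttpClient side does not, the difference between the two downloads is in the request URL, not in the stream handling.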

Related

Is URL.openStream() the same as response.getEntity().getContent()?

There is a file that will be downloaded when I make a get request to particular URL. I am able to get InputStream from both ways.
Method 1
Using URL class in java.net package.
java.net.URL url = new URL(downloadFileUrl);
InputStream inputStream = url.openStream();
Method 2
Using Apache's HttpClient class.
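// CloseableHttpClient is abstract, so obtain an instance from the HttpClients factory
org.apache.http.impl.client.CloseableHttpClient httpclient = org.apache.http.impl.client.HttpClients.createDefault();
// HttpGet accepts a String or a java.net.URI, not a java.net.URL
HttpGet request = new HttpGet(downloadFileUrl);
CloseableHttpResponse response = httpclient.execute(request);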
InputStream inputStream = response.getEntity().getContent();
Are these methods the same? If not, how do they differ? Which method is generally preferred, or preferred in a specific situation?
The examples I provided are simplistic. Assume I did the necessary configuration of the URL and HttpClient objects to get a successful response.
Both methods return an input stream to read from the connection, so for simply reading the response body there is no difference between them. Since HttpClient is a third-party library, you need to watch for vulnerabilities and keep the library updated.
The only difference is that HttpClient supports only the HTTP(S) protocol, whereas URLConnection can also be used for other protocols such as FTP.
In terms of functionality, Apache HttpClient has more fine-tuning options than URLConnection.
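Because both approaches end in a plain InputStream, the code that consumes the stream can be identical for either one. A minimal sketch, assuming nothing beyond the JDK (the class name StreamSaver is made up, and a ByteArrayInputStream stands in for the network stream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamSaver {
    /** Copies the stream to target, closes it, and returns the number of bytes written. */
    static long save(InputStream in, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails
        try (InputStream body = in) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream() or response.getEntity().getContent().
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path target = Files.createTempFile("download", ".bin");
        System.out.println(save(in, target) + " bytes written to " + target);
    }
}
```

With HttpClient, the response (and eventually the client) should be closed as well; closing the entity stream alone only covers the body.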

URLConnection 404 only on Android not on simple Java

I am working on fetching a JSON on an externally hosted server. I have written 2 separate JUnit tests to test the network request with 1 in the Android environment and one just running the standard PC JUnit (not run on android).
When I use the non-Android based JUnit (the simple java program) test, the URLConnection works fine and I receive a response code of 200 from the urlConnection and the JSON. However, when I run the same static function on the Android Device, I receive a response code of 404 (File or location not found). The url itself is encoded and does not contain any non-ASCII characters.
For the purpose of not spamming the host server, I have replaced the url with http://example.com/JSONLink
Things I have tried:
1. The original implementation:
URL url = new URL("http://example.com/JSONLink");
System.out.println(url);
urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
int problem = urlConnection.getResponseCode();
urlConnection.connect();
System.out.println("The error code is " + problem);
InputStream inputStream = urlConnection.getInputStream();
At this point, urlConnection throws a FileNotFoundException. I have also tried to switch
urlConnection.getInputStream();
to
urlConnection.getErrorStream();
The Error stream gives me a HTML file which states that the server was unable to locate the file.
2. Ensured that the Android manifest includes the uses-permission for Internet access:
<uses-permission android:name="android.permission.INTERNET" />
3. Attempted the deprecated Apache DefaultHttpClient approach:
DefaultHttpClient httpClient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://example.com/JSONLink");
HttpResponse httpResponse = httpClient.execute(httpGet);
HttpEntity httpEntity = httpResponse.getEntity();
InputStream inputStream = httpEntity.getContent();
4. Used OkHttp
Every one of these attempts resulted in the 404 error. While I believe I understand what the 404 error means, I don't understand why I am getting it when I am requesting a valid URL that can be accessed from any browser (including Chrome on the phone/emulator).
If this problem is server side, is there a way to imitate a browser just to fetch the json?
Thanks in advance
The first step that is always recommended in such a case is to cross-check the server, which also answers your question:
If this problem is server side, is there a way to imitate a browser just to fetch the json?
Yes. You can try either of these Chrome plugins to test whether the server gives the expected response from your browser.
1) Advanced REST client
https://chrome.google.com/webstore/detail/advanced-rest-client/hgmloofddffdnphfgcellkdfbfbjeloo
2) Postman
https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop
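On the "imitate a browser" part: one thing to try from code is sending browser-style request headers with HttpURLConnection. This is only a sketch. Whether the server actually keys on User-Agent is an assumption, the header value below is a placeholder (copy the real one from the browser's network inspector), and the class name BrowserLikeRequest is made up.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class BrowserLikeRequest {
    // Placeholder value, not a real browser string.
    static final String USER_AGENT = "Mozilla/5.0 (Linux; Android 10) ExampleBrowser/1.0";

    /** Prepares, but does not send, a GET request carrying browser-style headers. */
    static HttpURLConnection prepare(String spec) throws IOException {
        // openConnection() only creates the object; no network traffic happens yet
        HttpURLConnection conn = (HttpURLConnection) new URL(spec).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept", "application/json");
        return conn;
    }
}
```

After prepare(...), calling getResponseCode() sends the request; read getInputStream() for a 2xx code and getErrorStream() otherwise. Comparing the headers sent by the desktop JUnit run and the Android run may also show why only one of them gets a 404.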

How to automate downloading files using a Java program

I need to automatically download files from multiple links on a page (possibly more than 100 files, each with a separate link). I know the login URL and I have credentials.
I want to automate this in a Java program. The only way to reach the download page is by logging in to the site.
Would the cURL command be helpful for this?
Please advise me on how to do this.
You can use wget, which can download the files recursively:
wget -r --no-parent --user=user --password=password --no-check-certificate <URL>
You can pass headers with --header, e.g. --header "Cookie: JSESSIONID=3433434343434"
You can pass POST data with --post-data "email=$EMAIL&password=$PASSWRD" (double quotes, so that the shell expands the variables)
Or you can use HttpClient in Java:
Here are examples of HttpClient usage for login and for passing POST/GET/header information
1. First, get the whole HTML page as a String.
2. Either parse that String to get the links to the files, or convert it to Java objects using an XML-to-object mapper such as https://github.com/FasterXML/jackson-dataformat-xml
3. Once you have the links, download the files using HttpClient:
// Assumes an httpClient field on the enclosing class and commons-io (IOUtils) on the classpath.
public void saveFile(String url, String filePath) throws ClientProtocolException, IOException {
    HttpGet httpget = new HttpGet(url);
    HttpResponse response = httpClient.execute(httpget);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        // try-with-resources closes both streams even if the copy fails
        try (InputStream is = entity.getContent();
             FileOutputStream fos = new FileOutputStream(new File(filePath))) {
            IOUtils.copy(is, fos);
        }
    }
}
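The link-extraction step is not shown in the answer. As a rough sketch, a regular expression can pull href values out of the HTML string; this is an illustration using only the JDK, not a substitute for a real HTML parser, and the class name LinkExtractor is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkExtractor {
    // Matches href="..." or href='...' inside <a> tags; deliberately simple.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns every href value found in the page, in document order. */
    static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/files/a.log\">a</a> <a class='x' href='/files/b.log'>b</a>";
        System.out.println(links(html)); // prints [/files/a.log, /files/b.log]
    }
}
```

Each extracted link can then be passed to the download method, after resolving relative links against the page URL.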
If you mean to copy a file from a site to a local file, then you can use java.nio.file:
Files.copy(new URL("http://host/site/filename").openStream(), Paths.get(localfile));

Can I get cached images using HttpClient?

Is it possible to load a login page once using HttpClient and get the image file of an img element from the cache, not from its src link, without reloading? This is important because I need to save the captcha for the page that was just loaded; if I load it from the src link, it will be a different captcha. I tried:
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://www.mysite/login.jsp");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream instream = entity.getContent();
OutputStream outstream = new FileOutputStream("d://file.html");
org.apache.commons.io.IOUtils.copy(instream, outstream);
outstream.close();
instream.close();
but there are no images. I also tried HtmlUnitDriver from the Selenium library, and there are no images either. Maybe I should try something else? Can you help me with this?
Thanks and sorry for my English.
As mentioned here: HttpClient Get images from response, DefaultHttpClient/HttpClient fetches only one piece of content per request, which in your case is the HTML page (served from http://www.mysite/login.jsp). You then need to parse that HTML page, find the img tag you want along with its src, and download only that image (without resending the login.jsp request!). If you are downloading a captcha image, you need to fetch it as soon as possible, or it could be overwritten by another user who tries to log in.
You need to do what the browser does: download the HTML, then parse it, then request each src/link/etc. depending on what you need.
DefaultHttpClient doesn't cache by default.
CachingHttpClient has caching enabled by default; it uses the If-Modified-Since and If-None-Match headers to decide whether to send the request to the remote server or return the result from its cache. If nothing has changed on the server, you get the previously cached data.
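The "parse the page, then request the src" step could look roughly like this. It is a sketch under assumptions: the regular expression is a simplification of real HTML parsing, the class name CaptchaLocator is made up, and taking the first img on the page is only for illustration. URI.resolve turns a relative src into an absolute URL the same way a browser would.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptchaLocator {
    // Matches the src attribute of an <img> tag; deliberately simple.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL of the first img on the page, or null if there is none. */
    static URI firstImage(String pageUrl, String html) {
        Matcher m = IMG_SRC.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Resolve a relative src against the URL the page was loaded from.
        return URI.create(pageUrl).resolve(m.group(1));
    }
}
```

The resulting URL would then be requested with the same HttpClient instance that loaded login.jsp, so that any session cookie is sent along; whether the captcha is tied to the session is an assumption about the site.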

HttpClient 4 + HttpPost help with file name encoding

I am trying to upload a file to a servlet. I add the file name to a header and read it in the servlet, but on the servlet side the file name (which contains Cyrillic characters) comes through only as ??? ?????.wmv. So my question is: how do I upload a file with a Cyrillic file name correctly?
I use HttpClient 4
the code snippet:
HttpClient httpclient = new DefaultHttpClient();
httpclient.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
String url="testUrl";
httppost = new HttpPost(url);
httppost.addHeader(FILE_NAME_HEADER, file.getName());
Any useful comment is appreciated :)
Andrew
Personally, I use the Apache Commons FileUpload library to handle file uploads. Here is the link. Instead of writing your own, you can use this library. If it doesn't satisfy your needs, you can still work on your own solution.
Use HttpMultipartMode.BROWSER_COMPATIBLE mode.
See https://issues.apache.org/jira/browse/HTTPCLIENT-293 for the details
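If the name really has to travel in a custom header, another option (not taken from the answers above, so treat it as an assumption about what fits this setup) is to percent-encode it on the client and decode it in the servlet, since header values are not reliably carried as anything but ASCII. A JDK-only sketch; the class name HeaderFileName is made up:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class HeaderFileName {
    /** Client side: turn the name into ASCII-only text that is safe in a header value. */
    static String encode(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Servlet side: restore the original name from the header value. */
    static String decode(String headerValue) {
        try {
            return URLDecoder.decode(headerValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

On the client this would be httppost.addHeader(FILE_NAME_HEADER, HeaderFileName.encode(file.getName())), and in the servlet HeaderFileName.decode(request.getHeader(FILE_NAME_HEADER)).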
