URL url = new URL("https://www.cs.tut.fi/~jkorpela/html/iframe-pdf.html");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
InputStream in = connection.getInputStream();
After calling getInputStream, I read all the bytes into a string. But why am I not seeing any sign of the data in the iframe?
My goal is to download the PDF.
If you request a URL, you will only get the contents of that file. An iframe is normally effectively a separate page, so you would need to request it separately. A browser will normally do all this transparently.
I would recommend using a library such as jsoup, which contains lots of methods for parsing HTML; you will need that to get the URL of the iframe (and then the URL of the PDF).
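For example, here is a minimal sketch with jsoup, assuming the page has a single iframe whose src points directly at the PDF (the output filename is arbitrary):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.*;

public class IframePdfDownload {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the outer HTML page
        Document doc = Jsoup.connect("https://www.cs.tut.fi/~jkorpela/html/iframe-pdf.html").get();
        // Find the iframe and resolve its src against the page's base URL
        Element iframe = doc.selectFirst("iframe");
        if (iframe == null) {
            throw new IllegalStateException("No iframe found on the page");
        }
        String pdfUrl = iframe.absUrl("src");
        // Download whatever the iframe points at (here, assumed to be the PDF)
        try (InputStream in = new URL(pdfUrl).openStream()) {
            Files.copy(in, Paths.get("iframe.pdf"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}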
Given any URL, I want to know whether it is a video URL or an audio URL.
Is there any API to find this out?
Thanks!
This is not a function of the Java language. You might be able to find a third-party library that does what you want, but you have not provided enough information to determine what you really need. The following might do what you want, though.
You could just get the Content-Type from the HTTP response using something like this:
URL url = new URL(myUrl);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// A HEAD request fetches only the headers, not the response body
connection.setRequestMethod("HEAD");
connection.connect();
String contentType = connection.getContentType();
Common video content types that you could check for are listed on Wikipedia. You'll see video/avi and video/mpeg listed, for example.
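From there, a crude check (a sketch; it trusts the server to report an accurate Content-Type) is just a prefix test on the value obtained above:
// Assumes the server sets Content-Type honestly; some servers omit it or get it wrong
boolean isVideo = contentType != null && contentType.startsWith("video/");
boolean isAudio = contentType != null && contentType.startsWith("audio/");
System.out.println(isVideo ? "video" : isAudio ? "audio" : "neither");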
Is it possible to load a login page once, using HttpClient, and get the image file of an img element from the cache rather than from its src link, without reloading? This is important because I need to save the captcha for the page that was just loaded; if I try to load it from the src link, it will be a different captcha. I tried:
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://www.mysite/login.jsp");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream instream = entity.getContent();
OutputStream outstream = new FileOutputStream("d://file.html");
org.apache.commons.io.IOUtils.copy(instream, outstream);
outstream.close();
instream.close();
but there are no images. I also tried HtmlUnitDriver from the Selenium library; there are no images there either. Maybe I should try something else? Can you help me with it?
Thanks, and sorry for my English.
As mentioned here: HttpClient Get images from response, DefaultHttpClient/HttpClient fetches only one piece of content, which in your case is the HTML page (served from http://www.mysite/login.jsp). You then need to parse that HTML page, find the img tag and its src, and download only that image (ONLY that, without re-sending the login.jsp request!). If you download a captcha image, you need to fetch it as soon as possible, or it could be overwritten by another user who tries to log in.
You need to do the same thing a browser does: download the HTML, then parse it, then request whichever src/link/etc. references you need, as in the sketch below.
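A minimal sketch of that flow, continuing from the code above and reusing the same DefaultHttpClient so the session cookies (and thus the captcha tied to this session) are preserved; the img#captcha selector and captcha.png filename are illustrative assumptions:
// Read the already-fetched HTML into a string instead of writing it straight to disk
String html = EntityUtils.toString(entity);
// Parse it (jsoup here) and resolve the captcha image's absolute URL
Document doc = Jsoup.parse(html, "http://www.mysite/login.jsp");
Element img = doc.selectFirst("img#captcha");   // assumed selector
String imgUrl = img.absUrl("src");
// Fetch ONLY the image, with the same client/session, and save it
HttpResponse imgResponse = httpclient.execute(new HttpGet(imgUrl));
OutputStream out = new FileOutputStream("captcha.png");
imgResponse.getEntity().writeTo(out);
out.close();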
DefaultHttpClient doesn't cache by default.
With CachingHttpClient, caching is enabled by default; it analyzes the If-Modified-Since and If-None-Match headers to decide whether a request to the remote server is performed or whether the result is returned from the cache. If there is no change on the server, you will get the cached data, provided you cached it previously.
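A sketch of switching to the caching client (assuming the httpclient-cache module is on the classpath; note that captcha responses are typically marked non-cacheable by the server, so this may not help here):
// Wrap the ordinary client in an HTTP-cache-aware decorator
HttpClient cachingClient = new CachingHttpClient(new DefaultHttpClient());
// Repeated requests may now be answered from cache, depending on the
// Cache-Control/ETag/Last-Modified headers the server sent
HttpResponse cachedResponse = cachingClient.execute(new HttpGet("http://www.mysite/login.jsp"));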
I am sending commands to a server over HTTP, and I currently need to parse a response that the server sends back (I am sending the command via the command line, and the server's response appears in my browser).
There are a lot of resources such as this: Saving a web page to a file in Java, that clearly illustrate how to scrape a page such as cnn.com. However, since this is a response page that is only generated when the camera receives a specific command, my attempts to use the method described by Mike Deck (in the link above) have met with failure. (Specifically, when my program requests the page again the server returns a 401 error.)
The response from the server opens a new tab in my browser. Essentially, I need to know how to save the current web page using Java, since reading in a file is probably the simplest way to approach this. Do any of you know how to do this?
TL;DR: How do you save the current webpage to a webpage.html or webpage.txt file using Java?
EDIT: I used Base64 from Apache Commons Codec, which solved my 401 authentication issue. However, I am still getting a 400 error when I attempt to open my InputStream (see below). Does this mean a connection isn't being established in the first place?
URL url = new URL("http://"+ipAddress+"/axis-cgi/record/record.cgi?diskid=SD_DISK");
// Encode the credentials for HTTP Basic authentication
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoInput(true);
connection.setRequestProperty("Authorization", "Basic " + encoding);
connection.connect();
// Read and print the response body line by line
InputStream content = connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
while ((line = in.readLine()) != null) {
    System.out.println(line);
}
EDIT 2: Changing the request to a GET resolved the issue.
So while scrutinizing my code above, I decided to change
connection.setRequestMethod("POST");
to
connection.setRequestMethod("GET");
This solved my problem. In hindsight, I think the server was rejecting the request because it is not set up to handle the various trappings that come along with a POST.
I am trying to download a vCalendar using a Java application, but I can't download it from a specific link.
My code is:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
InputStream in = uri.openStream();
// Print the stream one byte at a time
int r = in.read();
while (r != -1) {
    System.out.print((char) r);
    r = in.read();
}
When I try to download from another link it works (e.g. http://www.mysportscal.com/Files_iCal_CSV/iCal_AUTO_2011/f1_2011.ics). Something doesn't allow me to download from this one, and I can't figure out why; when I try it in the browser, it works.
I'd follow this example. Basically, get the response code for the connection. If it's a redirect (e.g. 301 in this case), retrieve the Location header and attempt to access the file using that.
Simplistic Example:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection con = (HttpURLConnection) uri.openConnection();
// Inspect the response code and the redirect target
System.out.println(con.getResponseCode());
System.out.println(con.getHeaderField("Location"));
// Follow the single redirect by hand
uri = new URL(con.getHeaderField("Location"));
con = (HttpURLConnection) uri.openConnection();
InputStream in = con.getInputStream();
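A slightly more defensive sketch (assuming at most a handful of hops, and that every 3xx response carries a Location header) loops until a non-redirect response arrives:
// Follow up to 5 redirects manually, including http -> https hops,
// which HttpURLConnection will not follow on its own
URL target = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection conn = (HttpURLConnection) target.openConnection();
int hops = 0;
while (conn.getResponseCode() / 100 == 3 && hops++ < 5) {
    target = new URL(target, conn.getHeaderField("Location")); // resolves relative Locations too
    conn = (HttpURLConnection) target.openConnection();
}
InputStream in = conn.getInputStream();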
You should check what that link actually provides. For example, it might be a page that has moved, which gives you back an HTTP 301 code. Your browser will automatically know to go and fetch it from the new URL, but your program won't.
You might want to try, for example, Wireshark to sniff the actual traffic when you make the browser request.
I, too, think there is a redirect. The browser downloads from the SSL-secured https://codebits.eu/s/calendar.ics. Try using an HttpURLConnection; it should follow redirects automatically:
HttpURLConnection con = (HttpURLConnection)uri.openConnection();
InputStream in = con.getInputStream();
How do I search for the existence of a word in a webpage, given its URL, say "www.microsoft.com"? Do I need to download the webpage to perform this search?
You just need to make an HTTP request for the web page and grab all its content; after that you can search for the words you need in it. The code below might help you do so.
public static void main(String[] args) {
    try {
        // Build request body
        String body =
            "fName=" + URLEncoder.encode("Atli", "UTF-8") +
            "&lName=" + URLEncoder.encode("Þór", "UTF-8");
        // Create connection
        URL url = new URL("http://www.example.com");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setRequestMethod("POST");
        urlConnection.setDoInput(true);
        urlConnection.setDoOutput(true);
        urlConnection.setUseCaches(false);
        urlConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        urlConnection.setRequestProperty("Content-Length", "" + body.getBytes("UTF-8").length);
        // Send request
        DataOutputStream outStream = new DataOutputStream(urlConnection.getOutputStream());
        outStream.writeBytes(body);
        outStream.flush();
        outStream.close();
        // Get response
        // - For debugging purposes only!
        BufferedReader inStream = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream()));
        String buffer;
        while ((buffer = inStream.readLine()) != null) {
            System.out.println(buffer);
        }
        // Close the response stream
        inStream.close();
    }
    catch (Exception ex) {
        System.out.println("Exception caught:\n" + ex.toString());
    }
}
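For the question as asked, a simpler GET-based sketch is enough (the URL and search word are illustrative; note this only sees the raw HTML, so script-generated content will not appear):
URL url = new URL("http://www.microsoft.com");
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
StringBuilder page = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    page.append(line).append('\n');
}
reader.close();
// Simple substring search over the downloaded markup
System.out.println("Found: " + page.toString().contains("Windows"));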
I know how I would do this in theory: use cURL or some application to download it, store the contents in a variable, then parse it for whatever you need.
Yes, you need to download the page content and search inside it for what you want. And if it happens that you want to search the whole microsoft.com website, then you should either write your own web crawler, use an existing crawler, or use a search engine API like Google's.
Yes, you'll have to download the page, and, to make sure you get the complete content, you'll want to execute scripts and include dynamic content, just like a browser.
We can't "search" a remote resource that we don't control, and no web server offers a "scan my content" method by default.
Most probably you'll want to load the page with a browser engine (WebKit or something else) and perform the search on the internal DOM structure of that engine.
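For instance, a sketch along those lines with HtmlUnit (assuming the library is on the classpath; the URL and search word are illustrative):
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// HtmlUnit runs a headless browser, so JavaScript-generated content is included
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.microsoft.com");
boolean found = page.asText().contains("Windows");
System.out.println("Found: " + found);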
If you want to do the search yourself, then obviously you have to download the page.
If you're planning on this approach, I recommend Lucene (unless you want a simple substring search).
Or you could have a web service do it for you. You could ask the web service to grep the URL and post back its results.
You could use a search engine's API. I believe Google and Bing (http://msdn.microsoft.com/en-us/library/dd251056.aspx) have ones you can use.