how to download multiple webpages that use a 'next' button - java

I am trying to download the latest HTML from this website. Until recently the URL displayed all the information I needed, but the web designer has changed the format so that only a portion of the data is shown and the user must hit the 'next' button to display the next portion.
The URL doesn't change, though.
Does anyone know how I can download all the information using Java?
Thanks. This is my current code:
[code]
URL url = null;
InputStream is = null;
BufferedReader br;
String line;
try {
    url = new URL("HTTP://...../..../...");
    is = url.openStream();
    br = new BufferedReader(new InputStreamReader(is));
    while ((line = br.readLine()) != null)
        System.out.println(line);
} catch (IOException e) {
    // exception is currently swallowed
}
....
[/code]
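Since the 'next' button apparently swaps in the next portion of data via JavaScript without changing the URL, reading the URL once with openStream() will only ever give you the first portion. One way to drive that button from Java is a headless browser such as HtmlUnit, which loads the page, runs its scripts, and lets you click elements programmatically. This is only a rough sketch: the page URL and the XPath for the 'next' button are placeholders you would have to adapt to the real site.
[code]
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class NextButtonScraper {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // Placeholder URL -- replace with the real page address
            HtmlPage page = webClient.getPage("http://example.com/data");

            while (true) {
                // Dump the HTML of the portion currently displayed
                System.out.println(page.asXml());

                // Placeholder XPath -- adjust it to whatever the site's 'next' button looks like
                HtmlElement next = page.getFirstByXPath("//a[text()='Next']");
                if (next == null) {
                    break; // no more pages (add a page limit if the button never disappears)
                }
                page = next.click(); // click() runs the page's JavaScript and returns the updated page
            }
        }
    }
}
[/code]
Alternatively, open the browser's developer tools, watch which request the 'next' button actually fires (often an XHR returning JSON or an HTML fragment with a page parameter), and call that endpoint directly with the plain URL/HttpURLConnection code you already have.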

Related

ImageIO.readImage IIOException while I can open it in Chrome

I can open this image in my browser but it won't load in my Java application. Why? It is supposed to be a free-to-use database, so I can't see why I can't use it.
I'm using this piece of code:
public static String getContentsFromURL(String address) {
    String contents = "";
    try {
        URL url = new URL(address);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            contents += line;
        }
        bufferedReader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return contents;
}
And I'm getting an IIOException: "Can't find input file!"
Try this code:
URL url = new URL("http://ddragon.leagueoflegends.com/cdn/9.20.1/img/champion/Gragas.png");
Image image1 = ImageIO.read(url); // ImageIO.read decodes the image straight from the URL
Here is a screenshot from my debugger.
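For what it's worth, the reason the original getContentsFromURL cannot feed ImageIO is that it pushes binary image data through a Reader into a String, which corrupts it; ImageIO.read(url) above decodes the image directly from the stream. If you also want the raw file on disk, copy the bytes without going through text. A minimal sketch, with the output filename being just an example:
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ImageDownloader {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://ddragon.leagueoflegends.com/cdn/9.20.1/img/champion/Gragas.png");
        try (InputStream in = url.openStream()) {
            // Copy raw bytes -- no Reader/String conversion, so the PNG stays intact
            Files.copy(in, Paths.get("Gragas.png"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}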

Getting incomplete HTML source on url.openConnection()

I am trying to get HTML page source for a website. But I am not able to get some image links, which I think are populated dynamically on the webpage.
I am using Java as follows:
url = new URL(firstLevelURL);
connection = (HttpURLConnection) url.openConnection();
try ( // Read all the text returned by the server
        BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
    // Read each line of "in" until done, adding each to "response"
    while ((str = br.readLine()) != null) {
        // str is one line of text; readLine() strips newline characters
        // I am not able to get this image as it is loaded dynamically using javascript/ajax or something
        if (str.contains("<img id=\"tileImage")) {
            response = str;
            break;
        }
    }
}
I tried using:
connection.setReadTimeout(15*1000);
But the page is still not loading completely.
Is there any way to wait for the page to load completely before fetching the HTML source?
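setReadTimeout only controls how long the connection waits for bytes on the socket; HttpURLConnection never executes the page's JavaScript, so no amount of waiting will make dynamically inserted elements such as that tile image show up in the raw source. A headless browser that does run the scripts is one way around this. A rough sketch with HtmlUnit (the URL is a placeholder and the wait time is arbitrary):
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class DynamicPageSource {
    public static void main(String[] args) throws Exception {
        String firstLevelURL = "http://example.com/page"; // placeholder for the real URL
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            // Let AJAX calls complete before the DOM is read
            webClient.setAjaxController(new NicelyResynchronizingAjaxController());

            HtmlPage page = webClient.getPage(firstLevelURL);
            webClient.waitForBackgroundJavaScript(10_000); // wait up to 10 s for background scripts

            // asXml() now reflects the DOM including elements added by JavaScript
            System.out.println(page.asXml());
        }
    }
}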

Open a downloaded html file with WebView

I have an app that opens certain webpages with webview. If there is internet connection, the webview opens a certain url and downloads html file. If there is no internet connection, the webview is supposed to open the previously downloaded html file.
This is how I'm trying to do it:
webView.loadUrl(Environment.getExternalStorageDirectory().toString() + "/Android/data/com.whizzapps.stpsurniki/" + razred + ".html");
The path is 100% right, but it still won't show the page for some reason. I did some research and saw that people usually put the downloaded HTML file in the assets folder, but I'm downloading the HTML file inside the application, so I don't really have access to the assets folder. What should I do?
You can use loadData instead, but you need to read the file first:
data = readFile(Environment.getExternalStorageDirectory().toString() + "/Android/data/com.whizzapps.stpsurniki/" + razred + ".html");
webView.loadData(data, "text/html; charset=UTF-8", null);
//or
//webView.loadDataWithBaseURL(null, result, "text/html; charset=UTF-8", null, null);
Here is a function to read the file:
private String readFile(String path) throws IOException
{
    StringBuilder sb = new StringBuilder();
    BufferedReader br = new BufferedReader(new FileReader(path));
    try
    {
        String line = null;
        while ((line = br.readLine()) != null)
        {
            sb.append(line);
        }
    }
    finally
    {
        br.close();
    }
    return sb.toString();
}
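Another thing worth trying, assuming the app already has permission to read that location: loadUrl can open a local file directly, but it expects a proper file:// URI rather than a bare filesystem path, which may be why the original call showed nothing. The snippet below just mirrors the call from the question with the scheme added:
// Same path as in the question, but with the file:// scheme so WebView treats it as a local URL
webView.loadUrl("file://" + Environment.getExternalStorageDirectory().toString()
        + "/Android/data/com.whizzapps.stpsurniki/" + razred + ".html");
Also note that loadData is picky about characters such as # and %; if the saved HTML contains them, the commented-out loadDataWithBaseURL variant tends to be the more robust choice.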

Android java read html content

I have a problem with this code for displaying HTML content. When I try it on my smartphone it prints "Error", which means an exception is being caught. Where am I going wrong?
String a2 = "";
try {
    URL url = new URL("www.google.com");
    InputStreamReader isr = new InputStreamReader(url.openStream());
    BufferedReader in = new BufferedReader(isr);
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        a2 += inputLine;
    }
    in.close();
    tx.setText("OUTPUT \n" + a2);
} catch (Exception e) {
    tx.setText("Error");
}
URL requires a correctly formed URL, including the protocol. You should use:
URL url = new URL("http://www.google.com");
Update:
As you are getting a NetworkOnMainThreadException, it appears that you are attempting to make the connection on the main thread.
The solution is to run the code in an AsyncTask, as sketched below.
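A minimal sketch of that, reusing the snippet from the question (this assumes the task is an inner class of the Activity so it can reach tx; on current API levels AsyncTask is deprecated and an executor would be used instead):
private class FetchHtmlTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        // Runs off the main thread, so the network call is allowed here
        StringBuilder a2 = new StringBuilder();
        try {
            URL url = new URL(urls[0]);
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                a2.append(inputLine);
            }
            in.close();
        } catch (Exception e) {
            return "Error";
        }
        return a2.toString();
    }

    @Override
    protected void onPostExecute(String result) {
        tx.setText("OUTPUT \n" + result); // runs back on the UI thread
    }
}

// usage, e.g. from onCreate():
new FetchHtmlTask().execute("http://www.google.com");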

Extract HTML from URL

I'm using Boilerpipe to extract text from url, using this code:
URL url = new URL("http://www.example.com/some-location/index.html");
String text = ArticleExtractor.INSTANCE.getText(url);
The String text contains just the text of the HTML page, but I need to extract the whole HTML code from it.
Is there anyone who used this library and knows how to extract the HTML code?
You can check the demo page for more info on the library.
For something as simple as this you don't really need an external library:
URL url = new URL("http://www.google.com");
InputStream is = (InputStream) url.getContent();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
StringBuffer sb = new StringBuffer();
while ((line = br.readLine()) != null) {
    sb.append(line);
}
String htmlContent = sb.toString();
Just use the KeepEverythingExtractor instead of the ArticleExtractor.
But this is using the wrong tool for the job. What you want is just to download the HTML content of a URL (right?), not to extract content. So why use a content extractor?
With Java 7 and a Scanner trick, you can do the following:
public static String toHtmlString(URL url) throws IOException {
    Objects.requireNonNull(url, "The url cannot be null.");
    try (InputStream is = url.openStream(); Scanner sc = new Scanner(is)) {
        sc.useDelimiter("\\A");
        if (sc.hasNext()) {
            return sc.next();
        } else {
            return null; // or empty
        }
    }
}
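A small usage note for the helper above: call it from code that handles the IOException, and if the page's encoding matters, pass an explicit charset to the Scanner (for example new Scanner(is, "UTF-8")) rather than relying on the platform default. A call might look like this (example URL only):
// e.g. inside a method that declares or handles IOException
URL url = new URL("http://www.google.com"); // example URL
String html = toHtmlString(url);
System.out.println(html);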
