OK, so I wrote a piece of code to test whether my Java installation can connect to the internet. It is supposed to fetch the HTML from www.google.com and display it in a JTextArea inside a JFrame.
Here's the code, so you have a clear picture:
import java.awt.Color;
import java.awt.Dimension;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import javax.swing.JFrame;
import javax.swing.JTextArea;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class JSoupFetchTest extends JFrame{
String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0";
boolean jsoupcond = true;
String address = "http://www.google.com";
JTextArea text;
public JSoupFetchTest(){
text = new JTextArea();
text.setPreferredSize(new Dimension(500, 500));
text.setBackground(Color.BLACK);
text.setForeground(Color.WHITE);
text.setVisible(true);
text.setLineWrap(true);
this.add(text);
this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
this.pack(); // size the frame before showing it
this.setVisible(true);
gogo();
}
private void gogo() {
if(jsoupcond){
text.setText(text.getText() +"\nstart...");
try {
text.setText(text.getText() +"\nConnecting to " +address+ "...");
Document doc = Jsoup.connect(address).userAgent(userAgent).get();
text.setText(text.getText() +"\nConverting page document into text");
String s = doc.toString();
text.setText(text.getText() +"\nText: \n" +s);
System.out.println();
} catch (Exception e) {
text.setText(text.getText() +"\n" +e.toString());
e.printStackTrace();
}
text.setText(text.getText() +"\nEnd.");
}
text.setText(text.getText() +"\nDownloading HTML...");
String html = downloadHtml(address);
text.setText(text.getText() +"\nHTML:");
text.setText(text.getText() +"\n" +html);
}
private String downloadHtml(String path) {
text.setText(text.getText() +"\ndownloadHtml entry point...");
InputStream is = null;
try {
text.setText(text.getText() +"\ntry block entered...");
String result = "";
String line;
URL url = new URL(path);
text.setText(text.getText() +"\nabout to open url stream...");
is = url.openStream(); // throws an IOException
text.setText(text.getText() +"\nurl stream opened...");
BufferedReader br = new BufferedReader(new InputStreamReader(is));
text.setText(text.getText() +"\nstarting to read lines...");
while ((line = br.readLine()) != null) {
result += line;
}
text.setText(text.getText() +"\nreading lines finished...");
return result;
} catch (IOException ioe) {
ioe.printStackTrace();
} finally {
try {
if (is != null) is.close();
} catch (IOException ioe) { }
}
return "";
}
public static void main(String[] args) {
new JSoupFetchTest();
}
}
I should also add that:
1. My Eclipse (that's what I'm using) can't connect to the Marketplace, nor can it fetch updates.
2. Eclipse's web browser works fine.
3. My system's browser (Mozilla Firefox) connects fine.
4. I exported JSoupFetchTest as a runnable JAR and ran it outside Eclipse, with the same result.
5. I am running Windows 7 Professional (MSDN version).
6. I contacted Eclipse support; they concluded it is not Eclipse's fault and suggested that I'm behind a proxy.
7. I contacted my ISP to check whether I am, and they said I am not.
8. I changed Java's network settings so that it now connects "directly". Before, the setting was "use browser settings", and it didn't work then either.
9. In Eclipse, the active provider under Window -> Preferences -> General -> Network Connections is set to "Native"; I also tried "Direct".
10. The downloadHtml(String path) method blocks at "is = url.openStream();" and hangs forever...
The exception I get from JSoup is:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:703)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1534)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1439)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:453)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:434)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:181)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:170)
at JSoupFetchTest.gogo(JSoupFetchTest.java:42)
at JSoupFetchTest.<init>(JSoupFetchTest.java:32)
at JSoupFetchTest.main(JSoupFetchTest.java:92)
I also tried setting the Jsoup connection timeout to infinite; then it just hangs forever.
Before you say my question is a duplicate, or point me to other external solutions: believe me, either the question is mine or I have already been there. I have been searching the internet for a solution for weeks now and I feel like pulling my hair out...
Please help if you can, because this prevents me from installing anything in Eclipse and from developing anything other than standalone apps...
You may need a port number after the host: "http://www.google.com:80" works. JSoup likely uses a default port for that, but opening the URL as a stream in Java may not.
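To see which port java.net.URL will actually connect to, one can compare getPort(), which returns -1 when the URL has no explicit port, with getDefaultPort(), which gives the protocol's default. A small sketch; the class name PortCheck is just for illustration:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class PortCheck {

    // Port a connection to this URL will use: the explicit port if given,
    // otherwise the protocol's default (80 for http, 443 for https).
    static int effectivePort(String address) {
        try {
            URL url = new URL(address);
            return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + address, e);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        for (String address : new String[] {"http://www.google.com", "http://www.google.com:80"}) {
            System.out.println(address + " -> getPort()=" + new URL(address).getPort()
                    + ", effective port=" + effectivePort(address));
        }
    }
}
```

For both forms it reports an effective port of 80; only getPort() differs (-1 versus 80).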
The following program works for me, so Java and JSoup themselves are working. It has to be some sort of local configuration problem with your network: check your firewall, routers, gateway, and Java permissions. Do a clean rebuild of your project. Comment out lines until it works, then put them back one at a time until you find the problem.
package stuff;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class SocketTest
{
public static void main( String[] args ) throws Exception
{
URL url = new URL( "http://www.google.com" );
URLConnection sock = url.openConnection();
InputStream ins = sock.getInputStream();
BufferedReader reader = new BufferedReader( new InputStreamReader(ins, "UTF-8" ) );
for( String line; (line = reader.readLine()) != null; ) {
System.out.println( line );
}
ins.close();
Document doc = Jsoup.connect( "http://www.google.com" ).get();
System.out.println( doc.toString() );
String userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0";
Document doc2 = Jsoup.connect( "http://www.google.com" ).userAgent(userAgent).get();
System.out.println( doc2.toString() );
}
}
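Since downloadHtml blocks forever at url.openStream(), it can help to make the hang diagnosable while hunting for the local configuration problem: ask the JDK which proxy it would use for the URL, and open the connection with explicit connect and read timeouts so a blocked connection fails with a SocketTimeoutException instead of hanging. A minimal sketch using only java.net; the class name FetchDiagnostics and the 10-second timeout are illustrative choices, not from either post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FetchDiagnostics {

    // Which proxies the JVM's default ProxySelector would use for this address.
    // [DIRECT] means Java believes no proxy is configured.
    static List<Proxy> proxiesFor(String address) {
        return ProxySelector.getDefault().select(URI.create(address));
    }

    // Creates (without connecting) an HTTP connection with finite connect/read timeouts,
    // so a blocked connection throws SocketTimeoutException instead of hanging forever.
    static HttpURLConnection openWithTimeouts(String address, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String address = "http://www.google.com";
        System.out.println("Proxies Java would use: " + proxiesFor(address));

        HttpURLConnection conn = openWithTimeouts(address, 10_000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            System.out.println("HTTP " + conn.getResponseCode());
            System.out.println(in.readLine()); // first line of the page
        } finally {
            conn.disconnect();
        }
    }
}
```

If the first line shows anything other than [DIRECT], Java is routing through a proxy it was configured with (for example via -Dhttp.proxyHost); if it shows [DIRECT] and the connection still times out, the block is more likely further out, such as a firewall or security software.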
I'm using OpenJDK 11 on Linux and I need to make sure all my web requests done with HttpURLConnection are properly closed and do not keep any file descriptors open.
Oracle's documentation says to call close() on the InputStream, and Android's documentation says to call disconnect() on the HttpURLConnection object.
I also set the Connection: close request header and the http.keepAlive system property to false, to avoid connection pooling.
This seems to work for plain HTTP requests, but not for HTTPS requests whose response is sent without chunked encoding. Only a GC seems to clean up the closed connections.
This example code:
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
public class Test {
private static int printFds() throws IOException {
int cnt = 0;
try (Stream<Path> paths = Files.list(new File("/proc/self/fd").toPath())) {
for (Path path : (Iterable<Path>)paths::iterator) {
System.out.println(path);
++cnt;
}
}
System.out.println();
return cnt;
}
public static void main(String[] args) throws IOException, InterruptedException {
System.setProperty("http.keepAlive", "false");
for (int i = 0; i < 10; i++) {
// Must be a https endpoint returning non-chunked response
HttpURLConnection conn = (HttpURLConnection) new URL("https://www.google.com/").openConnection();
conn.setRequestProperty("Connection", "close");
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while (in.readLine() != null) {
}
in.close();
conn.disconnect();
conn = null;
in = null;
}
Thread.sleep(1000);
int numBeforeGc = printFds();
System.gc();
Thread.sleep(1000);
int numAfterGc = printFds();
System.out.println(numBeforeGc == numAfterGc ? "No socket leaks" : "Sockets were leaked");
}
}
prints this output:
/proc/self/fd/0
/proc/self/fd/1
/proc/self/fd/2
/proc/self/fd/3
/proc/self/fd/4
/proc/self/fd/5
/proc/self/fd/9
/proc/self/fd/6
/proc/self/fd/7
/proc/self/fd/8
/proc/self/fd/10
/proc/self/fd/11
/proc/self/fd/12
/proc/self/fd/13
/proc/self/fd/14
/proc/self/fd/15
/proc/self/fd/16
/proc/self/fd/17
/proc/self/fd/18
/proc/self/fd/19
/proc/self/fd/0
/proc/self/fd/1
/proc/self/fd/2
/proc/self/fd/3
/proc/self/fd/4
/proc/self/fd/5
/proc/self/fd/9
/proc/self/fd/6
/proc/self/fd/7
/proc/self/fd/8
Sockets were leaked
Changing to an http URL makes the sockets close as expected, without a GC:
/proc/self/fd/0
/proc/self/fd/1
/proc/self/fd/2
/proc/self/fd/3
/proc/self/fd/4
/proc/self/fd/5
/proc/self/fd/6
/proc/self/fd/0
/proc/self/fd/1
/proc/self/fd/2
/proc/self/fd/3
/proc/self/fd/4
/proc/self/fd/5
/proc/self/fd/6
No socket leaks
Tested with both OpenJDK 11 and 12. Did I miss something or is this a bug?
Turns out to be a bug after all: https://bugs.openjdk.java.net/browse/JDK-8216326
The fix replaces shutdownInput with close; it is in the latest builds of JDK 11 and 13 (but not 12).
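For reference, the closing pattern the question describes (close the stream, call disconnect(), send Connection: close, turn off http.keepAlive) can be written with try-with-resources and a finally block. A minimal sketch, assuming Java 9+ for InputStream.readAllBytes; it talks plain HTTP to a throwaway local server from the JDK's com.sun.net.httpserver package only so it runs without network access, and the /hello path is made up. It illustrates the pattern and does not by itself work around the HTTPS behavior discussed above:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CloseEverything {

    // Fetches a URL and releases the connection eagerly: "Connection: close",
    // stream closed by try-with-resources, then disconnect() in finally.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("Connection", "close");
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Serves a fixed-length (non-chunked) "hello" from a throwaway local server and fetches it
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length); // explicit length, so not chunked
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            return fetch("http://127.0.0.1:" + server.getAddress().getPort() + "/hello");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Disable keep-alive before the first request is made
        System.setProperty("http.keepAlive", "false");
        System.out.println(demo()); // prints "hello"
    }
}
```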
I was trying to fetch HTML pages and parse information from them, and I just found out that some of the pages were not completely downloaded by Jsoup. When I checked with curl on the command line, the complete page was downloaded. Initially I thought the problem was site-specific, but then I tried parsing some big web pages at random with Jsoup and found that it didn't download the complete page either. I tried specifying the user agent and timeout properties, but it still failed to download everything. Here is the code I tried:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class JsoupTest {
public static void main(String[] args) throws MalformedURLException, UnsupportedEncodingException, IOException {
String urlStr = "http://en.wikipedia.org/wiki/List_of_law_clerks_of_the_Supreme_Court_of_the_United_States";
URL url = new URL(urlStr);
String content = "";
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"))) {
for (String line; (line = reader.readLine()) != null;) {
content += line;
}
}
String article1 = Jsoup.connect(urlStr).get().text();
String article2 = Jsoup.connect(urlStr).userAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6").referrer("http://www.google.com").timeout(30000).execute().parse().text();
String article3 = Jsoup.parse(content).text();
System.out.println("ARTICLE 1 : "+article1);
System.out.println("ARTICLE 2 : "+article2);
System.out.println("ARTICLE 3 : "+article3);
}
}
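For article1 and article2, where I use Jsoup to connect to the website, I don't get the complete content, but when I connect using URL I get the complete page. So only article3, which was fetched with URL, is complete. I have tried with Jsoup 1.8.1 and Jsoup 1.7.2.
Use the maxBodySize method. By default Jsoup stops reading a response body once it reaches a size limit (1 MB in releases of that era) and parses only what it read, so large pages come back truncated. Raising the limit fixes this (a value of 0 means no limit):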
String article = Jsoup.connect(urlStr).maxBodySize(Integer.MAX_VALUE).get().text();
I was able to create a container in my storage account and upload a blob to it through client-side code.
I was also able to make the blob publicly accessible, so that when I request the following URL from my browser, I can see the image I uploaded.
https://MYACCOUNT.blob.core.windows.net/MYCONTAINER/MYBLOB
I now need to use the REST service to retrieve the contents of the blob. I wrote the following Java code.
package main;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
public class GetBlob {
public static void main(String[] args) {
String url="https://MYACCOUNT.blob.core.windows.net/MYCONTAINER/MYBLOB";
try {
System.out.println("RUNNIGN");
HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
connection.setRequestProperty("Authorization", createQuery());
connection.setRequestProperty("x-ms-version", "2009-09-19");
InputStream response = connection.getInputStream();
System.out.println("SUCCESSS");
String line;
BufferedReader reader = new BufferedReader(new InputStreamReader(response));
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static String createQuery()
{
String dateFormat="EEE, dd MMM yyyy hh:mm:ss zzz";
SimpleDateFormat dateFormatGmt = new SimpleDateFormat(dateFormat);
dateFormatGmt.setTimeZone(TimeZone.getTimeZone("UTC"));
String date=dateFormatGmt.format(new Date());
String Signature="GET\n\n\n\n\n\n\n\n\n\n\n\n" +
"x-ms-date:" +date+
"\nx-ms-version:2009-09-19" ;
// I do not know CANOCALIZED RESOURCE
//WHAT ARE THEY??
// +"\n/myaccount/myaccount/mycontainer\ncomp:metadata\nrestype:container\ntimeout:20";
String SharedKey="SharedKey";
String AccountName="MYACCOUNT";
String encryptedSignature=(encrypt(Signature));
String auth=""+SharedKey+" "+AccountName+":"+encryptedSignature;
return auth;
}
public static String encrypt(String clearTextPassword) {
try {
MessageDigest md = MessageDigest.getInstance("SHA-256");
md.update(clearTextPassword.getBytes());
return new sun.misc.BASE64Encoder().encode(md.digest());
} catch (NoSuchAlgorithmException e) {
}
return "";
}
}
However, I get the following error when I run this main class:
RUNNIGN
java.io.IOException: Server returned HTTP response code: 403 for URL: https://klabs.blob.core.windows.net/delete/Blob_1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at main.MainClass.main(MainClass.java:61)
Question 1: Why do I get this error? Did I miss a header or parameter?
Question 2: Do I need to add headers in the first place, since I can make the request from the browser without any issues?
Question 3: Could it be an SSL issue? What is the concept of certificates, and how and where do I add them? Do I really need them? Will I need them later, when I do bigger operations on my blob storage (I want to manage a thousand blobs)?
I would be thankful for any references as well, within Azure or otherwise, that could help me understand better.
:D
AFTER A FEW DAYS
Below is my new code for Put Blob in Azure. I believe I have resolved all the header and parameter issues and my request is correct. However, I am still getting the same 403, and I do not know what the issue is. Azure is proving to be pretty difficult.
One thing to note: the container's name is delete, and I want to create a blob inside it, say newBlob. I tried initializing urlPath in the code below with both "delete" and "delete/newBlob".
Neither works.
package main;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import com.sun.org.apache.xml.internal.security.exceptions.Base64DecodingException;
import com.sun.org.apache.xml.internal.security.utils.Base64;
public class Internet {
static String key="password";
static String account="klabs";
private static Base64 base64 ;
private static String createAuthorizationHeader(String canonicalizedString) throws InvalidKeyException, Base64DecodingException, NoSuchAlgorithmException, IllegalStateException, UnsupportedEncodingException {
Mac mac = Mac.getInstance("HmacSHA256");
mac.init(new SecretKeySpec(base64.decode(key), "HmacSHA256"));
String authKey = new String(base64.encode(mac.doFinal(canonicalizedString.getBytes("UTF-8"))));
String authStr = "SharedKey " + account + ":" + authKey;
return authStr;
}
public static void main(String[] args) {
System.out.println("INTERNET");
String key="password";
String account="klabs";
long blobLength="Dipanshu Verma wrote this".getBytes().length;
File f = new File("C:\\Users\\Dipanshu\\Desktop\\abc.txt");
String requestMethod = "PUT";
String urlPath = "delete";
String storageServiceVersion = "2009-09-19";
SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:sss");
fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
String date = fmt.format(Calendar.getInstance().getTime()) + " UTC";
String blobType = "BlockBlob";
String canonicalizedHeaders = "x-ms-blob-type:"+blobType+"\nx-ms-date:"+date+"\nx-ms-version:"+storageServiceVersion;
String canonicalizedResource = "/"+account+"/"+urlPath;
String stringToSign = requestMethod+"\n\n\n"+blobLength+"\n\n\n\n\n\n\n\n\n"+canonicalizedHeaders+"\n"+canonicalizedResource;
try {
String authorizationHeader = createAuthorizationHeader(stringToSign);
URL myUrl = new URL("https://klabs.blob.core.windows.net/" + urlPath);
HttpURLConnection connection=(HttpURLConnection)myUrl.openConnection();
connection.setRequestProperty("x-ms-blob-type", blobType);
connection.setRequestProperty("Content-Length", String.valueOf(blobLength));
connection.setRequestProperty("x-ms-date", date);
connection.setRequestProperty("x-ms-version", storageServiceVersion);
connection.setRequestProperty("Authorization", authorizationHeader);
connection.setDoOutput(true);
connection.setRequestMethod("POST");
System.out.println(String.valueOf(blobLength));
System.out.println(date);
System.out.println(storageServiceVersion);
System.out.println(stringToSign);
System.out.println(authorizationHeader);
System.out.println(connection.getDoOutput());
DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
// Send request
outStream.writeBytes("Dipanshu Verma wrote this");
outStream.flush();
outStream.close();
DataInputStream inStream = new DataInputStream(connection.getInputStream());
System.out.println("BULLA");
String buffer;
while((buffer = inStream.readLine()) != null) {
System.out.println(buffer);
}
// Close I/O streams
inStream.close();
outStream.close();
} catch (InvalidKeyException | Base64DecodingException | NoSuchAlgorithmException | IllegalStateException | UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
I know only a proper code review can help me here; please do one if you can.
Thanks
Question1: Why this error, did I miss any header/parameter?
Most likely you're getting this error because of an incorrect signature. Please refer to the MSDN documentation for creating a correct signature: http://msdn.microsoft.com/en-us/library/azure/dd179428.aspx. Unless your signature is correct, you won't be able to perform operations using the REST API.
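The signing step on that page comes down to an HMAC-SHA256 over the string-to-sign, keyed with the Base64-decoded storage account key (not a plain SHA-256 hash and not an arbitrary password), sent as SharedKey account:signature. A minimal sketch using only java.util.Base64 and javax.crypto; the account name, container/blob path, and the AZURE_STORAGE_KEY environment variable are hypothetical placeholders, and the string-to-sign shown is illustrative and should be checked against the documentation for your x-ms-version. Two details that commonly break the signature: the verb in the string-to-sign must match the HTTP method actually sent, and x-ms-date must be an RFC 1123 date in GMT.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SharedKeySigner {

    // x-ms-date format, e.g. "Sun, 11 Oct 2009 21:49:42 GMT"
    private static final DateTimeFormatter X_MS_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    static String xMsDate(ZonedDateTime time) {
        return X_MS_DATE.format(time.withZoneSameInstant(ZoneOffset.UTC));
    }

    // "SharedKey <account>:<Base64(HMAC-SHA256(decoded account key, stringToSign))>"
    static String authorizationHeader(String account, String base64AccountKey, String stringToSign)
            throws GeneralSecurityException {
        byte[] key = Base64.getDecoder().decode(base64AccountKey); // the account key, not a password
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        byte[] signature = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
        return "SharedKey " + account + ":" + Base64.getEncoder().encodeToString(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        String account = "myaccount";                            // placeholder
        String accountKey = System.getenv("AZURE_STORAGE_KEY");  // hypothetical variable name
        if (accountKey == null) {
            System.err.println("Set AZURE_STORAGE_KEY to the account key from the portal");
            return;
        }
        String date = xMsDate(ZonedDateTime.now());
        // Illustrative Get Blob string-to-sign: verb, 11 empty standard headers,
        // canonicalized x-ms-* headers, canonicalized resource.
        String stringToSign = "GET\n\n\n\n\n\n\n\n\n\n\n\n"
                + "x-ms-date:" + date + "\n"
                + "x-ms-version:2009-09-19\n"
                + "/" + account + "/mycontainer/myblob";
        System.out.println("x-ms-date: " + date);
        System.out.println("Authorization: " + authorizationHeader(account, accountKey, stringToSign));
    }
}
```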
Question2: Do I need to add headers in the first place, because I am able to hit the request from the browser without any issues.
In your current scenario, because you can access the blob directly (which in turn means the container in which the blob exists has a Public or Blob ACL), you don't really need to use the REST API. You can simply make an HTTP request using Java and read the response stream, which will contain the blob contents. You would need to go down the authenticated route if the container ACL is Private, because in that case your requests need to be authenticated, and the code above creates an authenticated request.
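A minimal sketch of that unauthenticated route, assuming the container allows public (anonymous) read access; the URL is the placeholder from the question and the output file name is hypothetical. Because the blob is an image, it reads raw bytes rather than text lines (readLine would corrupt binary data), and it uses InputStream.readAllBytes, which needs Java 9 or later:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PublicBlobDownload {

    // Plain GET with no Authorization or x-ms-* headers; works only for publicly readable blobs
    static byte[] download(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        URL blob = new URL("https://MYACCOUNT.blob.core.windows.net/MYCONTAINER/MYBLOB");
        byte[] bytes = download(blob);
        Path target = Paths.get("downloaded-blob"); // hypothetical output file name
        Files.write(target, bytes);
        System.out.println("Saved " + bytes.length + " bytes to " + target.toAbsolutePath());
    }
}
```

Since download takes any URL, the same method also works with file: URLs, which is handy for checking it offline.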
Question3: Can it be an SSL issue? What is the concept of certificates, and how and where to add them? Do I really need them? Will I need them later, when I do bigger operations on my blob storage(I want to manage a thousand blobs)?
No, it is not an SSL issue. It's an issue with an incorrect signature.
Finally found the mistake!!
In the code above, I was using the string "password" as the key for my HMAC-SHA256:
base64.decode(key)
It should have been the access key associated with my Azure storage account.
Silly one! It took me two weeks to find.
This program compiles successfully, but when I run it, it gives me errors.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.URL;
public class Main {
public static void main(String[] args)
throws Exception {
URL url = new URL("http://www.google.com");
BufferedReader reader = new BufferedReader
(new InputStreamReader(url.openStream()));
BufferedWriter writer = new BufferedWriter
(new FileWriter("data.html"));
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
writer.write(line);
writer.newLine();
}
reader.close();
writer.close();
}
}
The following error occurs (I have attached the image):
Screenshot of errors
I am behind a proxy server. Could that prevent the program from connecting to the internet? If so, please post a solution. Thanks in advance.
You should do something like this. First, put the proxy information into the system properties:
System.setProperty("http.proxyHost", "proxy_hostname");
System.setProperty("http.proxyPort", "8080"); // or whatever port your proxy uses
// for https:// URLs, set https.proxyHost and https.proxyPort as well
Then authenticate with the proxy, using something like:
// uses java.util.Base64 and java.nio.charset.StandardCharsets (Java 8+)
URL url = new URL("http://www.google.com");
URLConnection con = url.openConnection();
String pass = "MY_USERNAME:MY_PASS";
// Basic proxy authentication: "Basic " followed by the Base64 of "user:password"
String encodedPass = Base64.getEncoder().encodeToString(pass.getBytes(StandardCharsets.UTF_8));
con.setRequestProperty("Proxy-Authorization", "Basic " + encodedPass);
Good luck.
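As an alternative to building the Proxy-Authorization header by hand, the JDK can answer proxy authentication challenges itself through java.net.Authenticator, and a specific proxy can be passed straight to openConnection. A minimal sketch; the host proxy.example.com, port 8080, and the credentials are placeholders, and the readAllBytes call needs Java 9+. On newer JDKs, Basic authentication for HTTPS tunnels through a proxy is disabled by default via the jdk.http.auth.tunneling.disabledSchemes property, so HTTPS URLs may additionally need that property cleared at startup:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;

public class ProxyFetch {

    // Registers JVM-wide credentials that are handed out only for proxy challenges
    static void installProxyCredentials(String user, char[] password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password);
                }
                return null; // don't hand proxy credentials to origin servers
            }
        });
    }

    // createUnresolved avoids a DNS lookup here; the host is resolved when connecting
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws IOException {
        installProxyCredentials("MY_USERNAME", "MY_PASS".toCharArray());
        Proxy proxy = httpProxy("proxy.example.com", 8080);
        URLConnection con = new URL("http://www.google.com").openConnection(proxy);
        try (InputStream in = con.getInputStream()) {
            System.out.println("Read " + in.readAllBytes().length + " bytes through the proxy");
        }
    }
}
```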
Yes. Proxy settings can prevent a standalone app from connecting to the internet. If you know the proxy's host and port, try passing
-Dhttp.proxyHost=yourProxy -Dhttp.proxyPort=proxyPort
These are VM arguments (use -Dhttps.proxyHost and -Dhttps.proxyPort as well for HTTPS URLs). If you are running from the command line, use them as
java -Dhttp.proxyHost=yourProxy -Dhttp.proxyPort=proxyPort Main
Really simple, or so I thought.
Java Code
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.net.URLConnection;
public class UrlConnectionTest {
private static final String TEST_URL = "http://localhost:3000/test/hitme";
public static void main(String[] args) throws IOException {
URLConnection urlCon = null;
URL url = null;
OutputStreamWriter osw = null;
try {
url = new URL(TEST_URL);
urlCon = url.openConnection();
urlCon.setDoOutput(true);
urlCon.setRequestProperty("Content-Type", "text/plain");
osw = new OutputStreamWriter(urlCon.getOutputStream());
osw.write("HELLO WORLD");
} catch (Exception e) {
e.printStackTrace();
} finally {
if (osw != null) {
osw.close();
}
}
}
}
TestController#hitme
def hitme
puts "SOMEONE IS HITTING ME!" * 100
puts request.env.inspect
end
When I run the Java code, I see nothing in my Rails server console. However, when I hit the URL in my browser, I get the output defined in TestController#hitme. I thought it would be simple, but I haven't had any luck. Any ideas?
Thanks in advance!
You're probably getting an exception, which you aren't seeing, because you're swallowing it. At least print the exception in the catch block.
Even if this isn't the problem, you're going to chase your tail a lot if you make a habit of swallowing errors.
I don't think you're actually sending any data until you call
urlCon.getInputStream();
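A minimal sketch of that fix, assuming the goal is just to POST the text and see the server react: write the body, close the writer, then call getResponseCode() (or getInputStream()), which is what makes HttpURLConnection actually send the buffered request. To stay self-contained it posts to a throwaway local server from the JDK's com.sun.net.httpserver package standing in for the Rails app; the /test/hitme path mirrors the question, everything else is illustrative, and readAllBytes needs Java 9+:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class PostAndRead {

    // POSTs text and returns the HTTP status; reading the status is what triggers the send
    static int post(String address, String body) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");
        try (Writer w = new OutputStreamWriter(con.getOutputStream(), StandardCharsets.UTF_8)) {
            w.write(body);
        }
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    // Starts a local stand-in for the Rails endpoint, posts to it, returns what it received
    static String roundTrip(String body) throws IOException {
        AtomicReference<String> received = new AtomicReference<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/test/hitme", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                received.set(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            exchange.sendResponseHeaders(204, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            int status = post("http://127.0.0.1:" + server.getAddress().getPort() + "/test/hitme", body);
            System.out.println("Status: " + status);
            return received.get();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Server saw: " + roundTrip("HELLO WORLD"));
    }
}
```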
Could it be that the URL in your Java code uses the controller name "test" (test/hitme), while your controller is named TestController? If so, the URL in your Java code should be changed:
private static final String TEST_URL = "http://localhost:3000/TestController/hitme";
Don't fiddle around with URLConnection yourself, let Resty handle it.
Here's the code you would need to write (I assume you are getting text back):
import static us.monoid.web.Resty.*;
import us.monoid.web.Resty;
...
new Resty().text(TEST_URL, content("HELLO WORLD")).toString();