Java HttpURLConnection threaded

Simple question:
Is it possible to make several HttpURLConnection requests at the same time? I'm creating a tool to check whether pages exist on a certain server, and at the moment Java seems to wait for each HttpURLConnection to finish before starting a new one. Here's my code:
public static String GetSource(String url) {
    String results = "";
    try {
        URL SourceCode = new URL(url);
        URLConnection connect = SourceCode.openConnection();
        connect.setRequestProperty("Host", "www.someserver.com");
        connect.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0");
        connect.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        connect.setRequestProperty("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        connect.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        connect.setRequestProperty("Keep-Alive", "115");
        connect.setRequestProperty("Connection", "keep-alive");
        BufferedReader in = new BufferedReader(new InputStreamReader(connect.getInputStream(), "UTF-8"));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            results += inputLine;
        }
        return results;
    } catch (Exception e) {
        // Something's wrong
    }
    return results;
}
Thanks a lot!

Yes, it is possible; the code you posted can be called from multiple threads at the same time.

You need to create a thread for each hit. Create a class that implements Runnable, then put all of your connection code inside the run method.
Then run it with something like this...
for (int i = 0; i < /* thread count */; i++) {
    Thread currentThread = new Thread(/* instance of your Runnable class */);
    currentThread.start();
}
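A minimal sketch along those lines (the URLs are made up for illustration, and it assumes the static GetSource method from the question is accessible, e.g. defined in the same class):

import java.util.Arrays;
import java.util.List;

public class PageChecker implements Runnable {

    private final String url;

    public PageChecker(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        // GetSource is the method from the question above
        String source = GetSource(url);
        System.out.println(url + " -> " + source.length() + " chars");
    }

    public static void main(String[] args) {
        // Hypothetical URLs, for illustration only
        List<String> urls = Arrays.asList(
                "http://www.someserver.com/page1",
                "http://www.someserver.com/page2",
                "http://www.someserver.com/page3");
        for (String u : urls) {
            new Thread(new PageChecker(u)).start(); // each request runs in its own thread
        }
    }
}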

Related

Read XML from valid URL not returned. Header formatting issue?

I am trying to use the code below to read from a valid URL. I can copy and paste the URL into my browser and it works perfectly (displays the XML), but when I try to access it programmatically it returns nothing (no data and no error). I have already tried setting the user-agent as suggested in this post: Can't read in HTML content from valid URL, but it didn't fix my problem. If it matters, I am trying to make a single Eve API call. I believe the problem is that I do not have my headers formatted correctly, and the Eve site is rejecting the query. I can access the data fine using PHP, but I recently had to change languages.
public static void readFileToXML(String urlString, String fName) {
    try {
        java.net.URL url = new java.net.URL(urlString);
        System.out.println(url);
        URLConnection cnx = url.openConnection();
        cnx.setAllowUserInteraction(false);
        cnx.setDoOutput(true);
        cnx.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/531.0 (KHTML, like Gecko) Chrome/3.0.183.1 Safari/531.0");
        System.out.println(cnx.getContentLengthLong()); // a change suggested in the comments; returns -1
        InputStream is = cnx.getInputStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is));
        File file = new File("C:\\Users\\xxx\\Desktop\\" + fName);
        BufferedWriter bw = new BufferedWriter(new FileWriter(file, false));
        String inputLine;
        while ((inputLine = br.readLine()) != null) {
            bw.write(inputLine);
            System.out.println(inputLine);
        }
        System.out.println("Finished read");
        bw.close();
        br.close();
    } catch (Exception e) {
        System.out.println("Exception: " + e.getMessage());
    }
}
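One detail worth noting in the code above: setDoOutput(true) switches the URLConnection from GET to POST, so a server expecting a plain GET may reject the request or return nothing. A minimal sketch of the same fetch as a plain GET (the URL is a placeholder):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class XmlGet {

    public static void main(String[] args) throws Exception {
        // Placeholder URL; substitute the real API endpoint
        URL url = new URL("https://api.example.com/data.xml");
        HttpURLConnection cnx = (HttpURLConnection) url.openConnection();
        // No setDoOutput(true) here: that would turn the GET into a POST
        cnx.setRequestProperty("User-Agent", "Mozilla/5.0");
        cnx.setRequestProperty("Accept", "application/xml");
        BufferedReader br = new BufferedReader(new InputStreamReader(cnx.getInputStream(), "UTF-8"));
        String inputLine;
        while ((inputLine = br.readLine()) != null) {
            System.out.println(inputLine);
        }
        br.close();
    }
}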

403 Forbidden with Java but not web browser?

I am writing a small Java program to get the amount of results for a given Google search term. For some reason, in Java I am getting a 403 Forbidden but I am getting the right results in web browsers. Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class DataGetter {

    public static void main(String[] args) throws IOException {
        getResultAmount("test");
    }

    private static int getResultAmount(String query) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(
                new URL("https://www.google.com/search?q=" + query).openConnection().getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
            src += line;
        }
        System.out.println(src);
        return 1;
    }
}
And the error:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at DataGetter.getResultAmount(DataGetter.java:15)
    at DataGetter.main(DataGetter.java:10)
Why is it doing this?
You just need to set the user-agent header for it to work:
URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();

BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));

StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
    sb.append(line);
}
System.out.println(sb.toString());
The SSL was handled transparently for you, as you can see from your exception stack trace.
Getting the result amount is not really this simple, though; after this you have to fake being a browser by fetching the cookie and parsing the redirect token link.
String cookie = connection.getHeaderField("Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if (m.find()) {
    String url = m.group(1);
    connection = new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.setRequestProperty("Cookie", cookie);
    connection.connect();
    r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    sb = new StringBuilder();
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    response = sb.toString();
    pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
    m = pattern.matcher(response);
    if (m.find()) {
        long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
        return amount;
    }
}
Running the full code I get 2930000000L as a result.
For me it worked by adding the header:
"Accept": "*/*"
You probably aren't setting the correct headers. Use LiveHttpHeaders (or equivalent) in the browser to see what headers the browser is sending, then emulate them in your code.
It's because the site uses SSL. Try using the Jersey HTTP Client. You will probably also have to learn a little about HTTPS and certificates, but I think Jersey can be set to ignore most of the details relating to the actual security.
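If you try Jersey, here is a minimal sketch using the JAX-RS 2.x client API that Jersey implements (the URL and header values are placeholders for illustration):

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;

public class JerseyFetch {

    public static void main(String[] args) {
        Client client = ClientBuilder.newClient();
        // Fetch the response body as a String; HTTPS is handled by the client
        String body = client.target("https://www.example.com/search")
                .queryParam("q", "test")
                .request("text/html")
                .header("User-Agent", "Mozilla/5.0")
                .get(String.class);
        System.out.println(body.length());
        client.close();
    }
}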

HttpsURLConnection and POST

Some time ago I wrote this program in Python; it logged in to a website over HTTPS, took some info, and logged out.
The program was quite simple:
class Richiesta(object):
    def __init__(self, url, data):
        self.url = url
        self.data = ""
        self.content = ""
        for k, v in data.iteritems():
            self.data += str(k) + "=" + str(v) + "&"
        if self.data == "":
            self.req = urllib2.Request(self.url)
        else:
            self.req = urllib2.Request(self.url, self.data)
        self.req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 5.1; rv:2.0b6) Gecko/20100101 Firefox/4.0b6')
        self.req.add_header('Referer', baseurl + '/www/')
        self.req.add_header('Cookie', cookie)

    def leggi(self):
        while self.content == "":
            try:
                r = urllib2.urlopen(self.req)
            except urllib2.HTTPError, e:
                print("Server error, retrying in 15 seconds")
                time.sleep(15)
            except urllib2.URLError, e:
                print("Network problem, will try to reconnect in 20 seconds")
                time.sleep(20)
            else:
                self.content = r.read().decode('utf-8')

def login(username, password):
    global cookie
    print("Starting the login procedure")
    url = "https://example.com/auth/Authenticate"
    data = {"login": "1", "username": username, "password": password}
    f = Richiesta(url, data)
    f.leggi()
Now, for some reason, I have to translate it into Java. So far, this is what I've written:
import java.net.*;
import java.io.*;
import javax.net.ssl.*;

public class SafeReq {

    String baseurl = "http://www.example.com";
    String useragent = "Mozilla/5.0 (Windows NT 5.1; rv:2.0b6) Gecko/20100101 Firefox/4.0b6";
    String content = "";

    public SafeReq(String s, String sid, String data) throws MalformedURLException {
        try {
            URL url = new URL(s);
            HttpsURLConnection request = (HttpsURLConnection) url.openConnection();
            request.setUseCaches(false);
            request.setDoOutput(true);
            request.setDoInput(true);
            HttpsURLConnection.setFollowRedirects(true); // static method; applies to all connections
            request.setInstanceFollowRedirects(true);
            request.setRequestProperty("User-Agent", useragent);
            request.setRequestProperty("Referer", "http://www.example.com/www/");
            request.setRequestProperty("Cookie", "sid=" + sid);
            request.setRequestProperty("Origin", "http://www.example.com");
            request.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            request.setRequestProperty("Content-length", String.valueOf(data.length()));
            request.setRequestMethod("POST");

            OutputStreamWriter post = new OutputStreamWriter(request.getOutputStream());
            post.write(data);
            post.flush();

            BufferedReader in = new BufferedReader(new InputStreamReader(request.getInputStream()));
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                content += inputLine;
            }
            post.close();
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String leggi() {
        return content;
    }
}
The problem is, the login doesn't work, and when I try to get a page that requires me to be logged in, I get the "Login Again" message.
The two classes seem to do the same thing, and I can't understand why I can't make the second one work... any ideas?
Where do you get your sid from? From the symptoms, I would guess that your session cookie is not being passed correctly to the server.
See this question for a possible solution: Cookies turned off with Java URLConnection.
In general, I recommend using HttpClient for implementing HTTP conversations in Java (anything more complicated than a simple one-time GET or POST). See the code examples (I guess the "Form based logon" example is appropriate in your case).
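If you want to stay with HttpsURLConnection, here is a minimal sketch using the JDK's built-in CookieManager (java.net, available since Java 6), which stores the session cookie from the login response and resends it automatically; the URL and form fields below are made up:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class CookieLogin {

    public static void main(String[] args) throws Exception {
        // Install a cookie store once; all URLConnections then share session cookies
        CookieHandler.setDefault(new CookieManager());

        String data = "login=1&username=user&password=pass"; // made-up form fields
        HttpsURLConnection login =
                (HttpsURLConnection) new URL("https://example.com/auth/Authenticate").openConnection();
        login.setDoOutput(true);
        login.setRequestMethod("POST");
        OutputStreamWriter post = new OutputStreamWriter(login.getOutputStream());
        post.write(data);
        post.flush();

        BufferedReader in = new BufferedReader(new InputStreamReader(login.getInputStream()));
        while (in.readLine() != null) {
            // consume the login response so the exchange completes
        }
        in.close();
        post.close();

        // This request automatically carries the session cookie set during login
        HttpsURLConnection page =
                (HttpsURLConnection) new URL("https://example.com/www/protected").openConnection();
        System.out.println(page.getResponseCode());
    }
}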
Anyone looking for this in the future, take a look at HtmlUnit.
This answer has a nice example.
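For reference, a rough sketch of what a form login looks like with HtmlUnit (assuming a recent 2.x version; the URL, form index, and field names are invented, so check the actual page):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitLogin {

    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            HtmlPage loginPage = webClient.getPage("https://example.com/login");
            HtmlForm form = loginPage.getForms().get(0); // assumes the first form is the login form
            form.getInputByName("username").setValueAttribute("user");
            form.getInputByName("password").setValueAttribute("pass");
            HtmlPage after = form.getInputByName("submit").click();
            System.out.println(after.getTitleText());
        }
    }
}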

URLConnection sometimes returns empty string response?

I'm making an HTTP GET request. It works in about 70% of my attempts. For some reason, I sometimes get no response string from a successful connection. I just set up a button in my app which keeps firing the code below. One call might fail to return a string, and the next call works fine:
private void onButtonClick() {
    try {
        doit();
    } catch (Exception ex) {
        // ...
    }
}

public void doit() throws Exception {
    URL url = new URL("http://www.example.com/service");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoInput(true);
    connection.setUseCaches(false);
    connection.setAllowUserInteraction(false);
    connection.setReadTimeout(30 * 1000);
    connection.setRequestProperty("Connection", "Keep-Alive");
    connection.setRequestProperty("Authorization",
            "Basic " + Base64.encode("username" + ":" + "password"));

    BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    String line = "";
    StringBuilder sb = new StringBuilder();
    while ((line = in.readLine()) != null) {
        sb.append(line);
    }
    in.close();
    connection.disconnect();

    // Every so often this prints an empty string!
    System.out.println(sb.toString());
}
Am I doing something wrong here? It seems like maybe I'm not closing the connection properly from the last call and the response gets mangled or something? I am also calling doit() from multiple threads simultaneously, but I thought the contents of the method were thread-safe; same behavior either way.
Thanks
That method looks fine. It's reentrant, so calls shouldn't interfere with each other. It's probably a server issue, either deliberate throttling or just a bug.
EDIT: You can check the status code with getResponseCode.
For checking the response code:

BufferedReader responseStream;
if (((HttpURLConnection) connection).getResponseCode() == 200) {
    responseStream = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
} else {
    responseStream = new BufferedReader(new InputStreamReader(((HttpURLConnection) connection).getErrorStream(), "UTF-8"));
}

For empty content the response code is 204, so if you get an empty body, just add one more "if" for code 204.
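That extra check could look like this (a small sketch, continuing the snippet above):

if (((HttpURLConnection) connection).getResponseCode() == 204) {
    // 204 No Content: an empty body is expected here, not an error
    System.out.println("Server returned 204 No Content");
}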
We also came across a similar scenario, and found the following solution for this issue: setting a user-agent string on the URLConnection object.

URLConnection conn = url.openConnection();
conn.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 4.01; Windows NT)");

java httpurlconnection cutting off html

Hey, I'm trying to get the HTML from a Twitter profile page, but HttpURLConnection is only returning a small snippet of the HTML. My code:
for (int i = 0; i < urls.size(); i++) {
    URL url = new URL(urls.get(i));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
    System.out.println(connection.getResponseCode());
    String line;
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    while ((line = reader.readLine()) != null) {
        builder.append(line);
    }
    String html = builder.toString();
}
I always get 200 as the response code for each call. However, only about a third of the time is the entire HTML document returned; the rest of the time I get only the first few hundred lines, and the amount returned when the HTML is cut off is not always the same.
Any ideas? Thanks for any help!
Additional info: after viewing the headers, it seems I'm getting duplicate Content-Length headers. The first is the full length; the other is much shorter (and probably representative of the length I'm getting some of the time). How can I handle duplicate headers?
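To see exactly what is coming back, you can dump every response header; getHeaderFields returns all values for each header name, so duplicates show up in the list (a small sketch; the URL is a placeholder):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;

public class HeaderDump {

    public static void main(String[] args) throws Exception {
        HttpURLConnection connection =
                (HttpURLConnection) new URL("http://www.example.com/").openConnection();
        // Each header name maps to the list of all values received for it
        for (Map.Entry<String, List<String>> h : connection.getHeaderFields().entrySet()) {
            System.out.println(h.getKey() + ": " + h.getValue());
        }
    }
}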
This worked fine for me. I added a newline after builder.append(line); to make it more readable in the console, but other than that it returned all the HTML for this page:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RetrieveHTML {

    public static void main(String[] args) throws IOException {
        List<String> urls = new ArrayList<String>();
        urls.add("http://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html");
        for (int i = 0; i < urls.size(); i++) {
            URL url = new URL(urls.get(i));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
            System.out.println(connection.getResponseCode());
            String line;
            StringBuilder builder = new StringBuilder();
            BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            while ((line = reader.readLine()) != null) {
                builder.append(line);
                builder.append("\n");
            }
            String html = builder.toString();
            System.out.println("HTML " + html);
        }
    }
}
Check out my HTTP class, based on this API:
https://stackoverflow.com/questions/9349378/java-net-httpurlconnection-returning-your-browsers-cookie-functionality-has-be
Feel free to change some stuff.
