Using sockets to fetch a webpage with Java

I'd like to fetch a webpage: just the raw data returned by an HTTP request, without parsing or rendering anything.
I'm trying to do this with the Socket class from the Java runtime library.
I wonder whether this is possible, since I'm not comfortable with the layer underneath this two-point communication, and I don't know whether the trouble comes from my own system.
Here's what my code is doing:
1) setting the socket.
this.socket = new Socket( "www.example.com", 80 );
2) setting the appropriate streams used for this communication.
this.out = new PrintWriter( socket.getOutputStream(), true);
this.in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );
3) requesting the page (and this is where I'm not sure it's alright to do like this).
String query = "";
query += "GET / HTTP/1.1\r\n";
query += "Host: www.example.com\r\n";
...
query += "\r\n";
this.out.print(query);
4) reading the result (nothing in my case).
System.out.print( this.in.readLine() );
5) closing socket and streams.

If you're on a *nix system, look into curl, which lets you retrieve data from the internet on the command line. It is more lightweight than writing a Java socket connection.
If you want to use Java and are just retrieving data from a webpage, check out the Java URL class (java.net.URL). Some sample Java code:
URL ur = new URL("http://www.google.com"); // the protocol is required, otherwise MalformedURLException is thrown
URLConnection conn = ur.openConnection();
InputStream is = conn.getInputStream();
String foo = new Scanner(is).useDelimiter("\\A").next();
System.out.println(foo);
That'll fetch the specified URL, read the data (HTML in this case) and print it to the console. You might have to tweak the delimiter a bit, but this will work with most network endpoints that send data.

Your code looks pretty close. Your GET request is probably malformed in some way. Try this: open up a telnet client and connect to a web server. Paste in the GET request as you believe it should work and see if that returns anything. If it doesn't, there is a problem with the GET request. The easiest thing to do at that point would be to write a program that listens on a socket (more or less the inverse of what you're doing), point a web browser to localhost:[correct port] and see what the web browser sends you. Use that as your template for the GET request.
Alternatively you could try and piece it together from the HTTP specification.
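For comparison, here is a minimal sketch of the whole exchange over a plain socket. It rests on assumptions that are not in the question: the class and method names are made up for illustration, a Connection: close header is added so that the server ends the response by closing the connection, and the response is read as raw bytes until end of stream. One thing it does differently from the question's code is flushing explicitly. A PrintWriter created with auto-flush enabled flushes only on println, printf and format, not on print, so a request sent with print() can stay in the buffer and never reach the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class RawHttpGet {

    // Builds a minimal HTTP/1.1 GET request. Every line ends in CRLF and
    // an empty line terminates the header section.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: ask the server to close when done
             + "\r\n";
    }

    // Sends the request and returns status line, headers and body as one string.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush(); // make sure the request actually leaves the client

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) { // -1 means the server closed the connection
                response.write(buffer, 0, n);
            }
            return response.toString("ISO-8859-1");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("www.example.com", 80, "/"));
    }
}
```

Reading with a single readLine(), as in the question, would return only the status line even when the request goes through; the loop above keeps reading until the server closes the connection.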

I had to put the full URL in the GET request line to make it work, although I see you can also specify a Host header if you want.
Socket socket = new Socket("youtube.com",80);
PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())));
out.println("GET http://www.youtube.com/yts/img/favicon_48-vflVjB_Qk.png HTTP/1.0");
out.println();
out.flush();

Yes, it is possible. You just need to figure out the protocol. You are close.
I would create a simple server socket that prints out whatever it receives. You can then point your browser at the socket using a URL like http://localhost:8080, and have your client socket mimic the HTTP request the browser sent.
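A rough sketch of such a listening socket follows. The class name, the helper method and the empty 200 response are assumptions added for illustration; port 8080 is the one from the URL above. It prints the request line and headers the browser sends, up to the blank line that ends the header section.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging aid: shows what a browser really sends.
class RequestDump {

    // Reads the request line and the headers, stopping at the first
    // empty line (end of the header section) or at end of stream.
    static List<String> readHead(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket client = server.accept()) {
                    BufferedReader in = new BufferedReader(new InputStreamReader(
                            client.getInputStream(), StandardCharsets.ISO_8859_1));
                    for (String line : readHead(in)) {
                        System.out.println(line);
                    }
                    System.out.println("----");
                    // Answer with an empty response so the browser does not hang.
                    OutputStream out = client.getOutputStream();
                    out.write(("HTTP/1.1 200 OK\r\n"
                            + "Content-Length: 0\r\n"
                            + "Connection: close\r\n\r\n")
                            .getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                }
            }
        }
    }
}
```

With this running, opening http://localhost:8080 in a browser prints the browser's request to the console, which can then serve as a template for the hand-written one.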

Not sure why you're going lower down than URLConnection. It's designed to do what you want to do: http://download.oracle.com/javase/tutorial/networking/urls/readingWriting.html.
The Java Tutorial on Sockets even says: "URLs and URLConnections provide a relatively high-level mechanism for accessing resources on the Internet. Sometimes your programs require lower-level network communication, for example, when you want to write a client-server application." Since you're not going lower than HTTP, I'm not sure what the point is of using a Socket.

Related

Java Can't Connect To PHP Web Service

Edit:
As I've just seen, it happens even with the simplest setup:
InputStream stream = new URL("http://xx.xx.xxx.xxx/GetAll.php").openStream();
Gives the same timeout error. I think I'm missing some basic configuration.
I used HttpGet to connect to a PHP web service I have.
I saw it's deprecated, so I've been trying to switch to the recommended HttpURLConnection, but with no success.
The HttpURLConnection does not seem to be able to connect to the service, even though I can connect from my web browser without any problem.
My connection code:
URL myUrl = new URL("http://xx.xx.xxx.xxx/GetAll.php");
HttpURLConnection request = (HttpURLConnection)myUrl.openConnection();
request.setRequestProperty("Content-Type","text/xml;charset=UTF-8");
InputStream stream = request.getInputStream();
The GetAll.php file:
<?
require_once('MysqliDb.php'); //Helper class
$db = new MysqliDb();
//All closest events by date
$All = $db->query("SELECT * FROM Event;");
//Return in JSON
echo json_encode($All);
The result I am getting from the file:
[{"EventID":1,"StartTime":1300,"Duration":1,"EventDate":"2015-05-17","EventOrder":1,"Type":0,"Name":"\u05e2\u05d1\u05e8\u05d9\u05ea AND ENGLISH","Organiser":"Neta","Phone":"012345678","Location":"Loc","Description":"Desc"}]
Thank you,
Neta
I want to share my solution, as this cost me hours of hair tearing.
As it turns out, the "Timed out" exception has nothing to do with the code; it's a network connectivity issue. The phone I used to debug the app sometimes appears to be connected to Wi-Fi even though it really isn't.
Anyway, if you have this exception, try checking your network connection.
Good luck!
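One way to make this kind of connectivity problem show up quickly is to set explicit timeouts, so a dead network produces an exception after a few seconds instead of a long hang. This is only a sketch: the class name, the helper method and the 5-second value are assumptions, and the address has to be supplied by the caller.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper, for illustration only.
class TimeoutProbe {

    // Prepares a connection with the same limit for connecting and for reading.
    // Nothing is sent over the network until the response is requested.
    static HttpURLConnection open(String address, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        return conn;
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "http://localhost/";
        try {
            HttpURLConnection conn = open(address, 5000);
            try {
                System.out.println("HTTP status: " + conn.getResponseCode());
            } finally {
                conn.disconnect();
            }
        } catch (SocketTimeoutException e) {
            // Typical when the device only appears to be online.
            System.out.println("Timed out, check the network connection: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```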

Stream over HTTP SSL is not flushed

I have a web application running behind nginx. Some pages are accessible via http, some others via https. I have some "pages", which are rather streams as the application does not close the connection and feeds data as they come. The feed then looks like this:
TIME1 MESSAGE1
TIME2 MESSAGE2
...
TIMEn MESSAGEn
After each line I write "\n" and then call flush(). Over http, it works correctly and my client can listen to new data. However, over https the client is not receiving any data until the connection is closed.
ServletOutputStream stream = applicationModel.getOutputStream();
OutputStreamWriter streamWriter = new OutputStreamWriter(stream);
BufferedWriter writer = new BufferedWriter(streamWriter);
while (true) {
    wait();
    writer.write(newMessage);
    writer.flush();
}
Unless the application is tightly integrated with the web server, a flush on the writer only flushes the buffers inside your application, so that the data gets sent to the web server. Inside the web server there are more buffers, which optimize the traffic by sending larger TCP packets and thus decrease the per-packet overhead. If you use SSL, there is yet another layer to watch, because your data is encapsulated in SSL frames, which again add overhead, so it pays to put more than a few bytes of payload into each one. Finally, there is buffering in the OS kernel, which might defer sending a small TCP packet for some time if there is hope that more data will follow.
Please be aware that your wish to control the buffers goes against a fundamental design concept of HTTP. HTTP is based on the idea that the client sends a request and the server then sends a response, ideally with a known Content-Length up front. The original design has no notion of a response that evolves slowly, with the browser updating the display as new data arrives. The proper way to get updates would be to let the client send another request and return the new data in that response. Another way would be to use WebSockets.
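The request-per-update approach needs some server-side state that can answer "what is new since my last request". A minimal sketch of that state is shown below; the class and its methods are hypothetical, and how it is wired into a servlet is left out. The client remembers how many messages it has seen and sends that count with every poll.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory log that a polling endpoint could read from.
class MessageLog {
    private final List<String> messages = new ArrayList<>();

    // Called by the application whenever a new "TIME MESSAGE" line is produced.
    synchronized void append(String message) {
        messages.add(message);
    }

    // Returns every message the client has not seen yet. 'seen' is the number
    // of messages the client already received; out-of-range values are clamped.
    synchronized List<String> since(int seen) {
        int start = Math.max(0, Math.min(seen, messages.size()));
        return new ArrayList<>(messages.subList(start, messages.size()));
    }
}
```

Each poll is then an ordinary, complete HTTP response, so buffering in nginx or in the SSL layer no longer delays delivery beyond the polling interval.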

Keep TCP socket-connection alive if no data is currently available

I have implemented a small HTTP server that lets clients connect via HTTP and streams audio data to them.
My problem is that when no audio data is currently available, the connection seems to break, either because the client disconnects or for some other reason inside Android.
This is what I'm doing:
serverSocket = new ServerSocket(0);
Socket socket = serverSocket.accept();
socket.setKeepAlive(true);
BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
out.write("HTTP/1.1 200 OK\r\n");
out.write("Content-Type: audio/wav\r\n");
out.write("Accept-Ranges: none\r\n");
out.write("Connection: keep-alive\r\n"); // additionally added due to answer below
out.write("\r\n");
out.flush();
..
while ((len = otherInput.read(audioBuffer)) != -1) {
    out.write(audioBuffer, 0, len);
}
Of course this is just a snippet of the real code, but it shows what I'm doing.
Now, when otherInput.read() takes a long time because no data is available at the moment, I get a
java.net.SocketException: sendto failed: EPIPE (Broken pipe)
at libcore.io.IoBridge.maybeThrowAfterSendto(IoBridge.java:499)
at libcore.io.IoBridge.sendto(IoBridge.java:468)
at java.net.PlainSocketImpl.write(PlainSocketImpl.java:508)
at java.net.PlainSocketImpl.access$100(PlainSocketImpl.java:46)
at java.net.PlainSocketImpl$PlainSocketOutputStream.write(PlainSocketImpl.java:270)
at java.io.BufferedOutputStream.flushInternal(BufferedOutputStream.java:185)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:139)
Can anyone tell me how to prevent the connection from breaking or closing without a manual heartbeat? Am I missing a header, or am I using something the wrong way?
Thanks in advance for your help. I've been going crazy trying things and searching.
There are at least two problems here.
1) Clients of HTTP servers are not well-behaved in the way you seem to expect. Consider a browser. The user can shut it down, go back, navigate away and so on, any time they like, even in the middle of a page load. If you get any error transmitting to the client, there's nothing you can do except close the connection and forget about it. The same applies to any server really, but it applies especially to HTTP servers.
2) You're not reading the entire request sent by the client. You need to read all the headers until a blank line, then you need to read the body up to the length specified in the Content-Length header, or all the chunks, or until end of stream, as the case may be: see RFC 2616. The effect of this may be that you cause the behaviour at (1).
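The second point can be sketched as follows. This is an illustration under assumptions, not a full HTTP parser: the class and method names are made up, only the Content-Length case is handled (chunked bodies and read-until-close are not), and lines are read byte by byte so that no body bytes get swallowed by a buffering reader.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for consuming one complete request.
class RequestReader {

    // Reads one line terminated by LF (an optional CR is dropped).
    // Returns null if the stream ends before any byte was read.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') break;
            if (b != '\r') line.write(b);
        }
        if (b == -1 && line.size() == 0) return null;
        return line.toString("ISO-8859-1");
    }

    // Consumes the request line and all headers up to the blank line,
    // then reads exactly Content-Length bytes of body (0 if the header is absent).
    static byte[] readRequestBody(InputStream in) throws IOException {
        int contentLength = 0;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        byte[] body = new byte[contentLength];
        new DataInputStream(in).readFully(body); // blocks until the whole body has arrived
        return body;
    }
}
```

In the server from the question this would be called on socket.getInputStream() right after accept(), before the response headers are written.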

Writing to servlet stream

I'm not sure whether what I'm trying to do is possible; it might not be. Here is my problem:
I'm trying to use a Servlet to pass information from a client to a server via HTTP. This communication is very frequent (I'm passing UI information, so every single mouse event), so I want to have as little overhead as possible to avoid latency issues, which is why I would like to not do a GET call for each transmission. HTTP is a requirement. I'm using an older Tomcat version (Servlet API 2.4). I guess this is somewhat of a web sockets use case, but I don't have any web sockets support available.
What I tried was to open a URL connection on the client side, and to open the input stream (otherwise the doGet() of the servlet never gets called). I'm passing an argument for initialization purposes to the client.
URLConnection uiConnection = url.openConnection();
uiConnection.setRequestProperty("Authorization", "Basic " + encode("xyz" + ":"
+ "xyz"));
uiConnection.setReadTimeout(0);
uiConnection.setDoOutput(true);
uiConnection.setAllowUserInteraction(true);
DataInputStream is = new DataInputStream(
uiConnection.getInputStream());
When I later try to retrieve an output stream from this connection, I get a ProtocolException (cannot write output after reading input).
out = new BufferedWriter(new OutputStreamWriter(
uiConnection.getOutputStream()));
out.write(uiUpdate);
On the servlet end I did something like this:
DataInputStream is = new DataInputStream(
request.getInputStream());
Am I completely on the wrong track or is something like this possible without using a new connection for each transmission?
Thanks,
Mark
I think the key question is whether you also have other HTTP traffic going to this IP. If so, there may not be anything you can do using just Java. If not, create a servlet that listens on port 80 and parses the incoming data directly.
http://download.oracle.com/javase/tutorial/networking/sockets/clientServer.html
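As for the ProtocolException in the question: with HttpURLConnection the request body has to be written before the response is read, and each HttpURLConnection object represents a single request, although the underlying network connection may be reused transparently between requests. The sketch below shows that order for one transmission. The class name, the use of POST and the payload format are assumptions made for illustration, and authentication is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper: one UI update per request.
class WriteThenRead {

    // Sends the payload as the request body and returns the response body.
    static String post(URL url, String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try {
            // 1. Write the request body first ...
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload.getBytes(StandardCharsets.UTF_8));
            }
            // 2. ... and only then touch the response. Calling getInputStream()
            //    before getOutputStream() is what triggers the ProtocolException.
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    body.write(buffer, 0, n);
                }
                return body.toString("UTF-8");
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

On the servlet side the body would then arrive in doPost() through request.getInputStream(), as in the snippet from the question.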

Communication between java server and matlab client

I'd like to set up server (Java) / client (Matlab) communication using sockets, so that the two can send messages to each other. There is an example showing how to do this with a Java server and a Java client: http://java.sun.com/docs/books/tutorial/networking/sockets/clientServer.html.
When I try to rewrite the client part in Matlab, I only can get the first message that the Java server sends and display it in the Matlab command window.
When I type a message in the Matlab command window, I can't pass it to the Java Server.
Java code:
kkSocket = new Socket("localhost", 3434);
Matlab equivalent:
kkSocket = Socket('localhost', 3434);
Java code for client:
out = new PrintWriter(kkSocket.getOutputStream(), true);
in = new BufferedReader(new InputStreamReader(kkSocket.getInputStream()));
What would be a Matlab equivalent for this? Thanks in advance.
For the input stream:
input_stream = input_socket.getInputStream;
d_input_stream = DataInputStream(input_stream);
For the output stream:
output_stream = output_socket.getOutputStream;
d_output_stream = DataOutputStream(output_stream);
If you are trying to use MATLAB and the Java application on the same machine, then matlabcontrol may do everything that you are looking for. It automatically establishes a connection to a session of MATLAB. It uses Java's Remote Method Invocation under the hood, which makes use of sockets. matlabcontrol is designed specifically to enable communication on localhost only; the sockets it creates will not accept remote connections because of the security issues that would raise. However, if you need to allow remote connections, you may find parts of matlabcontrol's source code useful.
