Several TCP connections with Java URLConnection

I have made a simple HTTP client, which downloads a set of URLs parsed from a webpage.
My problem is that the download is slow, compared to a real browser (IE, Firefox, Chrome). Especially if the page contains many objects.
I noticed (with Wireshark) that real browsers often set up 5-10 TCP connections within the same millisecond, immediately after starting to load a page. Those connections then live concurrently for a period of time.
My client will also set up concurrent TCP connections (and it will reuse TCP connections), but nowhere near as aggressively. I'm guessing this is one of the reasons my client is slower.
I have tried creating several URLConnections before reading from the input stream, but this does not work for me. I am inexperienced though, so I'm probably doing it wrong.
Does anyone know of a way to do this (mimic what the browsers are doing in terms of TCP connection setup) with URLConnection?

I recommend using HttpClient:
http://hc.apache.org/httpcomponents-client-ga/
It has support for internal connection management, pooling etc.
Browsers do this kind of connection management internally as well.
Things may have changed since I last used it, but URLConnection didn't work well for production apps; for example, it didn't have a clean way to shut it down.
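For illustration, here is a minimal sketch of the kind of setup that helps: one HttpClient 4.x instance shared by all download threads, backed by a pooling connection manager. The class names are from the HttpClient 4.2-era API, and the limits are assumptions, picked to roughly match the handful of parallel connections browsers open per host.

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class ParallelFetch {
    public static void main(String[] args) throws Exception {
        // Pooling manager shared by all threads; the limits are assumptions,
        // roughly matching the 6-8 parallel connections browsers open per host.
        PoolingClientConnectionManager cm = new PoolingClientConnectionManager();
        cm.setMaxTotal(50);
        cm.setDefaultMaxPerRoute(8);

        final HttpClient client = new DefaultHttpClient(cm);

        // Fetch each URL on its own thread; the pool reuses keep-alive connections.
        for (final String url : args) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        HttpResponse response = client.execute(new HttpGet(url));
                        // Fully consuming the entity returns the connection to the pool.
                        EntityUtils.consume(response.getEntity());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}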

I would also recommend using a high-performance networking library, like Apache Mina. It will automatically create a thread pool for you and save you a lot of time.

Related

Is it a good idea to destroy sockets after a single use?

I've been looking into making a simple Sockets-based game in Java, and read in multiple places that client sockets are destroyed after a single exchange. Is this good practice for continued connections? The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response. I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client? I've seen so many conflicting websites about sockets in Java and how they should be implemented.
There are no hard and fast rules, but it does depend somewhat on what data rates you want to achieve.
For example, YouTube is a streaming video service, but the video data is delivered by having the client use HTTPS to fetch batches of video data. Inefficient, yes, but very easy to program for. There are lots of reasons to use HTTPS for an application like YouTube (firewalls, etc.), but ultimate power saving and network performance are not among them. The "proper" way would be to use a protocol like RTP, which uses UDP to deliver small packets of data that can then be rearranged into order; you also have to deal with missing frames at the codec level, and so on. Much less network traffic and friendly to bandwidth-constrained network links, but significantly more difficult to deal with when traversing firewalls, in client software, etc.
So if your game is sending modest amounts of data, the only thing wrong with setting up and tearing down a whole socket connection for every message is the nagging feeling you yourself will have that it is somehow not the most efficient solution.
It sounds, though, like you have a conflict between the need to communicate between client and server and a need to process something else whilst waiting for the communication to complete. Here you're getting into asynchronous I/O territory. To make that easy I strongly suggest you take a look at ZeroMQ; that will make everything a whole lot simpler.
and read in multiple places that client sockets are destroyed after a single exchange.
Only in the places where that actually happens. There are numerous contexts where it doesn't, the outstanding example being HTTP, where every effort is made to reuse connections.
Is this good practice for continued connections?
The question is a contradiction in terms. A continued connection is a connection that isn't closed. A closed connection can't be continued.
The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response.
The word you are groping for here is 'session'.
I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client?
Yes.
I've seen so many conflicting websites about sockets in Java and how they should be implemented.
You should use a connection pool at the client; a request loop at the server that looks for multiple requests per connection; a client-side facility that closes idle connections after some idle timeout; and a read timeout at the server that closes connections on which no request has been read within the timeout.
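A rough sketch of the server side of that advice, assuming a simple line-based protocol (the port, timeout value and "ack" reply are arbitrary choices): a per-connection loop that keeps answering requests on the same connection until the client closes it or a read timeout fires.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class RequestLoopServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9000);   // port is an arbitrary choice
        while (true) {
            final Socket client = server.accept();
            new Thread(new Runnable() {
                public void run() {
                    try {
                        // Read timeout: close connections that send nothing for 30 seconds.
                        client.setSoTimeout(30000);
                        BufferedReader in = new BufferedReader(
                                new InputStreamReader(client.getInputStream()));
                        PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                        String request;
                        // Keep serving requests on the same connection until the
                        // client closes it (readLine() returns null).
                        while ((request = in.readLine()) != null) {
                            out.println("ack: " + request);
                        }
                    } catch (SocketTimeoutException idle) {
                        // Idle client; fall through and close the socket.
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        try { client.close(); } catch (Exception ignored) {}
                    }
                }
            }).start();
        }
    }
}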

Is this an appropriate use of websocket programming?

I am incredibly new to the topic of websockets and am trying to figure out the correct way to handle the communication between a device and a server.
Here is my scenario: I have a thermostat (very similar to the Nest) that needs to communicate with a web server. Every time you change the temperature on the thermostat, I need to send data to the web server to update its "current stats" in the database. Easy, I can do that.
The part that I am confused about, and think websockets might be a use case for, is when the user changes the temperature from the web interface. The thermostat has to pull that information from the server to see "Oh, okay, you want it to be 66 degrees."
I've thought of having the thermostat poll the server every 2-5 seconds to see what the "current stats" are in the database and change the temperature accordingly, but that seems like overkill.
Is there a way to open a connection between the thermostat and the server to listen for messages?
I began down the road of reading about websockets; however, I believe they are unfortunately browser-based only.
As I'm fairly new to the game with regards to these types of connections, if anyone could point me in the right direction regarding protocols, communication, etc. I would greatly appreciate it!
Tech Specs
Server is written in Ruby on Rails
Thermostat is written in Java
Websockets can be used between any two programs that need to communicate; they are certainly not restricted to the browser. That said, whether you should be using websockets is a different question. One thing to think about is that websockets involve a persistent connection. This may not scale (if you have lots of devices) and it may also be overkill: if you expect the temperature to be changed once a day, holding a persistent connection open for the entire day is an enormous waste of resources. Websockets are typically used when communication needs to be "fast" and relatively frequent. Unless you really need instantaneous updates on the thermostat, I would just have it ping the server every few minutes for updates.
Side note, websockets is fairly new, so any libraries you end up using may be a bit on the immature side.
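If you go the polling route, a minimal sketch of the thermostat (Java) side might look like this; the URL, device id, interval and timeouts are all placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThermostatPoller {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Poll every 5 minutes; the URL and interval are placeholders.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    URL url = new URL("https://example.com/thermostats/42/current_stats");
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    conn.setConnectTimeout(10000);
                    conn.setReadTimeout(10000);
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream()));
                    StringBuilder body = new StringBuilder();
                    String line;
                    while ((line = in.readLine()) != null) {
                        body.append(line);
                    }
                    in.close();
                    // Parse the response and apply the target temperature here.
                    System.out.println("server says: " + body);
                } catch (Exception e) {
                    e.printStackTrace();   // keep polling even if one request fails
                }
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}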
We prototyped some Java-to-Java websockets not too long ago. We used the Ning async library on the client side and the Atmosphere library (built on Netty) on the server side.
WebSockets is just a specification for tunneling something similar to TCP sockets over HTTP; it's not confined to the browser, and client libraries are available for most common languages.
This sounds like a reasonable use case for a long-running connection, but I would generally prefer a raw TCP connection to a WebSockets connection unless you have a specific restriction in mind (e.g., most home Internet connections have no problem with connecting to a server at an arbitrary port).
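For comparison, here is a bare-bones sketch of the raw-TCP approach on the thermostat: open one long-lived connection and block waiting for the server to push a command. The host, port and message format are assumptions, and reconnect handling is omitted for brevity.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class ThermostatListener {
    public static void main(String[] args) throws Exception {
        // Host and port are placeholders for the server the thermostat connects to.
        Socket socket = new Socket("example.com", 5555);
        socket.setKeepAlive(true);   // detect dead peers on an otherwise idle link
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
        String command;
        // Block here until the server pushes a line, e.g. "SET_TEMP 66".
        while ((command = in.readLine()) != null) {
            System.out.println("server pushed: " + command);
            // apply the new target temperature here
        }
        socket.close();   // server closed the connection
    }
}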

Java - first http request to specific host significantly slower

I am writing a benchmarking tool to run against a web application. The problem that I am facing is that the first request to the server always takes significantly longer than subsequent requests.
I have experienced this problem with Apache HttpClient 3.x and 4.x and with the Google HTTP client. Apache HttpClient 4.x shows the biggest difference (the first request takes about seven times longer than subsequent ones; for the Google client and 3.x it is about three times longer).
My tool has to be able to benchmark simultaneous requests with threads. I cannot use a single instance of e.g. HttpClient and call it from all the threads, since this throws a concurrency exception. Therefore, I have to use an individual instance in each thread, which will only execute a single request. This changes the overall results dramatically.
I do not understand this behavior. I do not think that it is due to a caching mechanism on the server because a) the webapp under consideration does not employ any caching (to my knowledge) and b) this effect is also visible when first requesting www.hostxy.com and afterwards www.hostxy.com/my/webapp.
I use System.nanoTime() immediately before and after calling client.execute(get) or get.execute(), respectively.
Does anyone have an idea where this behavior stems from? Do these httpclients themselves do any caching? I would be very grateful for any hints.
Read this: http://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html for connection pooling.
Your first connection probably takes the longest because it is a keep-alive connection; subsequent requests can then reuse that connection once it has been established. This is just a guess.
Are you hitting a JSP for the first time after a server start? If the server flushes its working directory on each start, the JSPs are compiled on the first hit and that takes a long time.
Also done on the first transaction: if the connection uses a CA cert trust store, it will be loaded and cached.
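Whatever the cause (connection setup, JSP compilation, trust store loading), if you simply don't want that one-time cost in your numbers, one option is to issue a throwaway warm-up request per host before you start timing. A sketch, assuming HttpClient 4.x and the host names from your question:

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class WarmupBenchmark {
    public static void main(String[] args) throws Exception {
        HttpClient client = new DefaultHttpClient();

        // Warm-up: pay for DNS lookup, TCP/TLS handshake and any first-hit
        // server-side work (JSP compilation, trust store loading) once,
        // outside the measured section.
        HttpResponse warmup = client.execute(new HttpGet("http://www.hostxy.com/"));
        EntityUtils.consume(warmup.getEntity());

        long start = System.nanoTime();
        HttpResponse response = client.execute(new HttpGet("http://www.hostxy.com/my/webapp"));
        EntityUtils.consume(response.getEntity());
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("request took " + elapsedMs + " ms");
    }
}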
You should also have a look at HttpClient's caching support:
http://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html
If your problem is that "first http request to specific host significantly slower", maybe the cause of this symptom is on the server, while you are concerned about the client.
If the "specific host" that you are calling is an Google App Engine application (or any other Cloud Plattform), it is normal that the first call to that application make you wait a little more. That is because Google put it under a dormant state upon inactivity.
After a recent call (which usually takes longer to respond), subsequent calls get faster responses because the server instances are all awake.
Take a look on this:
Google App Engine Application Extremely slow
I hope it helps you, good luck!

Reliable/persistent outbound sockets needed, options?

I have a Scala application which maintains (or tries to) TCP connections to various servers for hours (possibly > 24) at a time. Each server sends a short, ~30 character message about twice a second. These messages are fed into an iteratee where they are parsed and eventually end up making state changes to a database.
If any of these connections fail for any reason, my app needs to continually try to reconnect until I specify otherwise. Any messages getting lost is Bad. I have no control over the servers I connect to, or the protocols used.
It is conceivable there would be as many as 300 of these connections at once. Not exactly a high-load scenario, so I don't think NIO is needed, though it might be nice to have. Other bits of the app are high-load.
I'm looking for some sort of socket controller / manager which can keep these connections up as reliably as possible. I am running my own blocking controller now, but as I'm inexperienced with socket coding (and all the various settings, options, timeouts, etc.) I doubt it will achieve the best possible uptime. Plus I may need SSL support at some point down the line.
Would NIO offer any real advantages?
Would Netty be the best choice here? I've seen the Uptime example here, and was thinking of simply duplicating it, but being new to lower-level networking I wasn't sure if there were better options.
However I'm uncertain of the best strategies for ensuring as few packets are lost as possible, and assumed this would be a "solved" problem in one library or another.
Yup. JMS is an example.
I suppose a lot of it would come down to a timeout guessing strategy? Close and re-open a socket too early and you've lost whatever packets were en-route.
That is correct. That approach is not going to be reliable, especially if connections go up and down regularly.
A real solution involves having the other end keep track of what it has received, and letting the sender know when the connection is re-established. If that can't be done, you have no real way of controlling how much gets lost. (This is what the reliable messaging services do ...)
I have no control over the servers I connect to. So unless there's another way to adapt JMS to a generic TCP stream I don't think it will work.
Yup. And the same applies if you try to implement this by hand. The other end has to cooperate.
I guess you could construct something where you run (say) a JMS end point on each of the remote servers, and have the endpoint use UNIX domain sockets or loopback (i.e. 127.0.0.1) to talk to the server. But you still have potential for message loss.
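None of that helps with the reconnect-until-told-otherwise requirement, though. Here is a minimal blocking sketch (in Java, since that is what the libraries discussed here target) with exponential backoff; the host, port, delays and line-based framing are assumptions, and it does nothing about messages lost while disconnected.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class ReconnectingReader implements Runnable {
    private final String host;
    private final int port;
    private volatile boolean running = true;

    public ReconnectingReader(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public void stop() { running = false; }

    public void run() {
        long backoffMillis = 1000;                        // start at 1 second
        while (running) {
            try (Socket socket = new Socket(host, port)) {
                socket.setKeepAlive(true);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                backoffMillis = 1000;                     // connected: reset backoff
                String message;
                while (running && (message = in.readLine()) != null) {
                    handle(message);                      // feed into your parser / iteratee
                }
            } catch (Exception e) {
                // connection failed or dropped; fall through and reconnect
            }
            try {
                Thread.sleep(backoffMillis);
                backoffMillis = Math.min(backoffMillis * 2, 60000);  // cap at 60 seconds
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void handle(String message) {
        System.out.println("received: " + message);
    }
}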

How to diagnose leaked http connections (org.apache.http.impl.conn.tsccm.ConnPoolByRoute)

I have a multithreaded java program that runs on Amazon's EC2. It queries and fetches data items from a vendor via HttpPost and HttpGet, using a org.apache.http.impl.client.DefaultHttpClient. Concurrently, it pushes the retrieved data items into S3 using AWS's Java SDK.
After a few days of running, I get the symptoms that normally come with http connection leaks:
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection
at org.apache.http.impl.conn.tsccm.ConnPoolByRoute.getEntryBlocking(ConnPoolByRoute.java:417)
at org.apache.http.impl.conn.tsccm.ConnPoolByRoute$1.getPoolEntry(ConnPoolByRoute.java:300)
at org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager$1.getConnection(ThreadSafeClientConnManager.java:224)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:391)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
Since both AWS and my requests to the data vendor use HTTP connections, I am not quite sure where exactly I forget to call HttpEntity.consume() or S3ObjectInputStream.close() (unless it is something else entirely...).
So here is my question: are there ways to monitor org.apache.http.impl.conn.tsccm.ConnPoolByRoute so that at least I can detect when I am starting to leak connections/entities not properly consumed/http streams not closed? (I have a feeling it happens only under certain conditions, e.g. when certain exceptions are being thrown, by-passing the logic in my code that consumes HttpEntities, closes streams, etc.) Any idea on how to diagnose what eventually causes all my http connections to fail with that ConnectionPoolTimeoutException would be most welcome. I don't feel like waiting 4+ days between attempts to fix the root cause of the problem.
If you're using the PoolingClientConnectionManager note there are the methods getTotalStats() and getStats(final HttpRoute route) which will give you a PoolStats object with the data you're looking to monitor.
Just fetch the ConnectionManager from your httpclient:
PoolingClientConnectionManager poolManager = (PoolingClientConnectionManager) httpClient.getConnectionManager();
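From there you can log the pool stats periodically; a leak typically shows up as the leased count climbing toward the maximum and never coming back down. A small sketch, assuming the HttpClient 4.2-era PoolingClientConnectionManager:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.conn.PoolingClientConnectionManager;
import org.apache.http.pool.PoolStats;

public class PoolMonitor {
    // Log pool usage every minute; a steadily growing "leased" count that never
    // comes back down is the classic sign of entities not being consumed.
    public static void monitor(final PoolingClientConnectionManager cm) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                PoolStats stats = cm.getTotalStats();
                System.out.println("leased=" + stats.getLeased()
                        + " pending=" + stats.getPending()
                        + " available=" + stats.getAvailable()
                        + " max=" + stats.getMax());
            }
        }, 1, 1, TimeUnit.MINUTES);
    }
}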
If you can access the org.apache.http.impl.conn.tsccm.ConnPoolByRoute then set its connTTL to a low enough value so that its WaitingThreadAborter will eventually terminate a connection. It will show a nice stack trace there. The other option is to use CGLIB or some other bytecode-manipulation framework to create a proxy class wrapping org.apache.http.impl.conn.tsccm.ConnPoolByRoute. Depending on your environment it might not be that easy to set up, but it's a rather valuable tool for debugging issues like yours. (And yes, if you happen to use Spring or just plain aspects, the setup will be super easy.)
