Multithreaded HTTP/HTTPS Man-in-the-Middle Proxy, Socket Performance - Java

Question edited following first comment.
My problem is mostly with Java socket performance, especially reading from the target server.
The server is a simple ServerSocket.accept() loop that creates a client thread for every connection from Firefox.
The main problem is that reading from the socket input stream blocks for enormous amounts of time.
The client thread is as follows:
// Takes an HttpRequest (hc.apache.org), the raw HTTP request string, and the Firefox socket output stream
private void handle(HttpRequest req, String raw, OutputStream out)
{
InputStream targetIn =null;
OutputStream targetOut = null;
Socket target = null;
try {
System.out.println("HANDLE HTTP");
String host = req.getHeaders("Host")[0].getValue();
URI uri = new URI(req.getRequestLine().getUri());
int port = uri.getPort() != -1 ? uri.getPort() : 80;
target = new Socket(host, port);
// I have tried to play around with these but cannot seem to get a difference in performance
target.setTcpNoDelay(true);
// target.setReceiveBufferSize(1024 *1024);
// target.setSendBufferSize(1024 * 1024);
//Get your plain old in/out streams
targetIn = target.getInputStream();
targetOut = target.getOutputStream();
//Send the request to the target
System.out.println("---------------Start response---------------");
targetOut.write(raw.getBytes());
System.out.println("request sent to target");
////Same as membrane
byte[] buffer = new byte[8 * 1024];
int length = 0;
try {
while((length = targetIn.read(buffer)) > 0) {
out.write(buffer, 0, length);
out.flush();
}
} catch(Exception e) {
e.printStackTrace();
}
System.out.println("closing out + target socket");
//IOUTILS
// long count = IOUtils.copyLarge(targetIn, out, 0L, 1048576L);
// int count = IOUtils.copy(targetIn, out);
// System.out.println("transfered : " + count );
//CHANNEL COPY
//
// ReadableByteChannel input = Channels.newChannel(targetIn);
// WritableByteChannel output = Channels.newChannel(out);
//
// ChannelTools.fastChannelCopy(input, output);
//
// input.close();
// output.close();
//CHAR TO CHAR COPY
// int c;
// while ((c = targetIn.read()) != -1) {
// out.write(c);
// }
target.close();
out.close();
System.out.println("-------------------- end response ------------------------------");
}
catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
The main problem lies in finding the appropriate method to copy the target input stream to the client (Firefox) output stream.
The site I am using to test this is http://www.ouest-france.fr (a news site with a lot of images that makes loads of requests).
Ping time from workstation to target: 10 ms
Normal loading in Iceweasel (Debian Firefox, Firebug timing): 14 s, 2.5 MB
Loading behind this proxy: 14 minutes (the Firebug Net panel is full of fake 404s and aborted requests that go back to black after a certain time; lots of requests are in blocking or waiting mode)
When running it, I load up VisualVM and launch profiling with no class filter (to see where the app really spends its time). It spends 99% of its time in java.net.SocketInputStream.read(byte[], int, int), i.e. reading from the target socket input stream.
I think I have done my homework and have been searching for and testing different solutions just about anywhere I could, but performance never seems to improve.
What I have already tried:
- Wrapping the input and output streams in their buffered versions: no change at all.
- int-by-int copy: no change at all.
- Classic byte[] array copy with variable-sized arrays: no change at all.
- Fiddling with setTcpNoDelay, setSendBufferSize and setReceiveBufferSize: could not get any change.
I was thinking of trying out NIO SocketChannels, but I cannot find a way to do the Socket-to-SSLSocket hijacking.
So at the moment I am a bit stuck and searching for solutions.
I have looked at the source code of open-source proxies and cannot find a fundamental difference in logic, so I am completely lost.
I tried another test:
export http_proxy="localhost:4242"
wget debiandvd.iso
Throughput gets to 2 MB/s, and the threads seem to spend 66% of their time reading from the target and 33% writing to the client.
I thought that maybe I had too many threads running (ouest-france makes around 270 requests), but a test on www.google.com, with far fewer requests going through the proxy (about 10), has the same problems as www.ouest-france.fr, which seems to confirm that the number of threads is not the problem.
Any help will be appreciated.
Environment: Debian, Sun Java 1.6, developing with Eclipse and VisualVM.
I can provide the rest of the code as needed.
Thank you

Partial solution found:
Not a very clean solution, but it works.
I still have a throughput problem.
What I do is set the socket timeout to a normal value (30000 ms).
When the first read has returned data in the loop, I reset the timeout to something a lot lower (1000 ms at the moment).
That allows me to wait for the server to start sending data, and if 1 second passes without any new data coming in, I consider the transfer to be finished.
Response times are still quite slow, but way better.
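The partial solution above can be sketched as follows. This is a minimal sketch, assuming the proxy reads the response straight from the target Socket; the names IdleTimeoutCopy and copyUntilIdle are hypothetical. The blocking itself is probably caused by the target keeping the connection open (HTTP/1.1 keep-alive), so read() does not return -1 until the server closes it. The sketch uses Socket.setSoTimeout() and treats a SocketTimeoutException after the first bytes as "response finished", exactly the heuristic described, so it inherits its cost: every response still waits one idle timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

class IdleTimeoutCopy {
    /**
     * Copies the target's response to out. Waits up to firstByteTimeoutMs for the
     * server to start answering; once data is flowing, idleTimeoutMs without new
     * data is treated as "response finished" (a heuristic, not real HTTP framing).
     * Returns the number of bytes copied.
     */
    static long copyUntilIdle(Socket target, OutputStream out,
                              int firstByteTimeoutMs, int idleTimeoutMs) throws IOException {
        InputStream in = target.getInputStream();
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        target.setSoTimeout(firstByteTimeoutMs); // generous wait for the first bytes
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                if (total == 0) {
                    target.setSoTimeout(idleTimeoutMs); // data is flowing: switch to the short idle timeout
                }
                out.write(buffer, 0, n);
                out.flush();
                total += n;
            }
            // read() returned -1: the server closed the connection
        } catch (SocketTimeoutException e) {
            if (total == 0) {
                throw e; // the server never started responding
            }
            // no new data for idleTimeoutMs: assume the response is complete
        }
        return total;
    }
}
```

In the question's handle() method this would replace the copy loop, e.g. copyUntilIdle(target, out, 30000, 1000).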

Related

Java socket InputStream.read() not behaving as expected

I've read many tutorials and posts about Java's InputStream and reading data. I've built a client and server implementation, but I'm having weird issues where reading a variable-length "payload" from the client is not consistent.
What I'm trying to do is transfer up to 100 kB max in one single logical payload. I have verified that the TCP stack is not sending one massive 100 kB packet from the client. I have played about with different read forms as per previous questions about InputStream reading, but I've nearly torn my hair out trying to get it to dump the correct data.
Let's say, for example, the client is sending a 70k payload.
The first thing I've noticed is that if I step through the code line by line from a breakpoint, it works fine: I get the exact same count in the outbound byte[]. When free running, the byte[] is a different size every time I run the code with practically the same payload.
Timing problems?
The second observation is that when the "inbuffer" size is set to 4096, for example, this odd behaviour occurs. Setting the "inbuffer" size to 1 gives the correct behaviour, i.e. I get the correct payload size.
Please understand that I don't like the way I've had to get this to work, and I'm not happy with the solution.
What experiences or problems have you had or seen that might help me make this code more reliable and easier to read?
public void listenForResponses() {
isActive = true;
try {
// apparently read() doesn't return -1 on socket-based streams
// if big stuff comes through, TCP packets are segmented, but the InputStream
// does something odd and doesn't return the correct raw data.
// this is a workaround to accept variable-length payloads into one byte[] buffer
byte[] inBuffer = new byte[1];
byte[] buffer = null;
int bytesRead = 0;
byte[] finalbuffer = new byte[0];
boolean isMultichunk = false;
InputStream istrm = currentSession.getInputStream();
while ((bytesRead = istrm.read(inBuffer)) > -1 && isActive) {
buffer = new byte[bytesRead];
buffer = Arrays.copyOfRange(inBuffer, 0, bytesRead);
int available = istrm.available();
if(available < 1) {
if(!isMultichunk) {
finalbuffer = buffer;
}
else {
finalbuffer = ConcatTools.ByteArrayConcat(finalbuffer,buffer);
}
notifyOfResponse(deserializePayload(finalbuffer));
finalbuffer = new byte[0];
isMultichunk = false;
}
else {
if(!isMultichunk) {
isMultichunk = true;
finalbuffer = new byte[0];
}
finalbuffer = ConcatTools.ByteArrayConcat(finalbuffer,buffer);
}
}
} catch (IOException e) {
Logger.consoleOut("PayloadReadThread: " + e.getMessage());
currentSession = null;
}
}
InputStream is working as designed.
if I flow through the code line by line initiated from a break point, it will work fine, i get the exact same count in the outbound byte[].
That's because stepping through the code is slower, so more data arrives between reads, enough to fill your buffer.
When free running the byte[] will be different sizes every time i run the code with practically the same payload.
That's because InputStream.read() is contracted to block until at least one byte has been transferred, or EOS or an exception occurs. See the Javadoc. There's nothing in there about filling the buffer.
second observation is that when the "inbuffer" size is set to 4096 for example this odd behaviour occurs. setting the "inbuffer" size to 1 presents the correct behaviour i.e. i get the correct payload size.
That's the correct behaviour in the case of a 1 byte buffer for exactly the same reason given above. It's not the 'correct behaviour' for any other size.
NB Your copy loop is nonsense. available() has few correct uses, and this isn't one of them.
int count;
byte[] buffer = new byte[8192];
while ((count = in.read(buffer)) > 0)
{
    out.write(buffer, 0, count);
}
NB (2) read() does indeed return -1 on socket-based streams, but only when the peer has shutdown or closed the connection.
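The answer explains why the loop is unreliable but not how the receiver can tell where one payload ends. One common approach, sketched here under the assumption that you control both ends, is to send a length header before each payload and read exactly that many bytes with DataInputStream.readFully(), which loops over read() internally. The class name and the 100 kB limit (taken from the question) are illustrative, not part of any library.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class LengthPrefixedFraming {
    // Assumed upper bound, taken from the "100kB max" payload size in the question.
    static final int MAX_PAYLOAD = 100 * 1024;

    // Sender: a 4-byte big-endian length header, then the payload itself.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(payload.length);
        dos.write(payload);
        dos.flush();
    }

    // Receiver: read the header, then let readFully() call read() repeatedly until
    // exactly that many bytes have arrived, however TCP happened to split them.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in); // unbuffered: nothing is consumed past this frame
        int length = dis.readInt(); // throws EOFException if the peer closed first
        if (length < 0 || length > MAX_PAYLOAD) {
            throw new IOException("Bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        dis.readFully(payload);
        return payload;
    }
}
```

With this, the receiving thread simply calls readFrame() in a loop and never needs available().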

Inconsistent output from multithreaded FTP InputStreams

I'm trying to create a java program that downloads certain asset files from an FTP server to a local file. Because my (free) FTP server doesn't support file sizes over a few megabytes, I decided to split up the files when they are uploaded and recombine them when the program downloads them. This works, but it is rather slow, because for each file, it has to get the InputStream, which takes some time.
The FTP server I use has a way to download the files without actually logging into the server, so I'm using this code to get the InputStream:
private static final InputStream getInputStream(String file) throws IOException {
return new URL("http://site.website.com/path/" + file).openStream();
}
To get the InputStream of a part of the asset file I'm using this code:
public static InputStream getAssetInputStream(String asset, int num) throws IOException, FTPException {
try {
return getInputStream("assets/" + asset + "_" + num + ".raf");
} catch (Exception e) {
// error handling
}
}
Because the getAssetInputStream(String, int) method takes some time to run (especially if the file size is more than a megabyte), I decided to make the code that actually downloads the file multi-threaded. Here is where my problem lies.
final Map<Integer, Boolean> done = new HashMap<Integer, Boolean>();
final Map<Integer, byte[]> parts = new HashMap<Integer, byte[]>();
for (int i = 0; i < numParts; i++) {
final int part = i;
done.put(part, false);
new Thread(new Runnable() {
@Override
public void run() {
try {
InputStream is = FTP.getAssetInputStream(asset, part);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
int len = 0;
while ((len = is.read(buf)) > 0) {
baos.write(buf, 0, len);
curDownload.addAndGet(len);
totAssets.addAndGet(len);
}
parts.put(part, baos.toByteArray());
done.put(part, true);
} catch (IOException e) {
// error handling
} catch (FTPException e) {
// error handling
}
}
}, "Download-" + asset + "-" + i).start();
}
while (done.values().contains(false)) {
try {
Thread.sleep(100);
} catch(InterruptedException e) {
e.printStackTrace();
}
}
File assetFile = new File(dir, "assets/" + asset + ".raf");
assetFile.createNewFile();
FileOutputStream fos = new FileOutputStream(assetFile);
for (int i = 0; i < numParts; i++) {
fos.write(parts.get(i));
}
fos.close();
This code works, but not always. When I run it on my desktop computer, it works almost always. Not 100% of the time, but often it works just fine. On my laptop, which has a far worse internet connection, it almost never works. The result is a file that is incomplete. Sometimes, it downloads 50% of the file. Sometimes, it downloads 90% of the file, it differs every time.
Now, if I replace .start() with .run(), the code works just fine, 100% of the time, even on my laptop. It is, however, incredibly slow (each part is then downloaded one after another on the calling thread), so I'd rather not use .run().
Is there a way I could change my code so it does work multi-threaded? Any help will be appreciated.
Firstly, get your FTP server replaced: there are plenty of free FTP servers that support serving arbitrary file sizes, with additional features, but I digress...
Your code seems to have many unrelated problems that could potentially all cause the behavior you are seeing, addressed below:
You have race conditions because the done and parts maps are plain HashMaps accessed from multiple threads without any synchronization. This can corrupt the maps, and one thread's updates are not guaranteed to be visible to another, so done.values().contains(false) can return the wrong answer.
You are calling done.values().contains() repeatedly at a high frequency. While the Javadoc doesn't state it explicitly, a HashMap most likely has to traverse every value (O(n)) to check whether it contains a given value. Coupled with the fact that other threads are modifying the map, you get undefined behavior. According to the values() Javadoc:
If the map is modified while an iteration over the collection is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
You are calling new URL("http://site.website.com/path/" + file).openStream(); but stating that you are using FTP. The http:// in the URL defines the protocol openStream() uses, and http:// is not ftp://. I'm not sure whether this is a typo, whether you meant HTTP, or whether you have an HTTP server serving identical files.
Any thread throwing any type of exception will cause the code to fail, because that part will never be marked as "completed" (given your busy-wait loop design). Granted, you may have redacted some other logic that guards against this, but otherwise this is a potential problem with the code.
You aren't closing any streams that you've opened. This could mean that the underlying socket itself is also left open. Not only does this constitute resource leakage, if the server itself has some sort of maximum number of simultaneous connection limit, you are only causing new connections to fail because the old, completed transfers are not closed.
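If you would rather keep plain threads, a minimal sketch that addresses the unsynchronized maps, the busy-wait loop and the unclosed streams could look like this. PartSource is a hypothetical interface standing in for FTP.getAssetInputStream(asset, part). Note that this swaps the maps and polling loop for a plain array plus Thread.join(), a different technique from the ExecutorService approach proposed in this answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class JoinedPartDownload {
    // Hypothetical stand-in for FTP.getAssetInputStream(asset, part).
    interface PartSource {
        InputStream open(int part) throws IOException;
    }

    // Returns one byte[] per part; a null entry marks a part whose download failed.
    static byte[][] downloadAll(PartSource source, int numParts) throws InterruptedException {
        byte[][] parts = new byte[numParts][]; // each thread writes only its own slot
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < numParts; i++) {
            final int part = i;
            Thread t = new Thread(() -> {
                try (InputStream is = source.open(part)) { // always closed, even on failure
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    byte[] buf = new byte[8 * 1024];
                    int len;
                    while ((len = is.read(buf)) != -1) {
                        baos.write(buf, 0, len);
                    }
                    parts[part] = baos.toByteArray();
                } catch (IOException e) {
                    // leave parts[part] as null so the caller can detect the failure
                }
            }, "Download-part-" + i);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // no busy-wait; join() also makes the thread's writes visible here
        }
        return parts;
    }
}
```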
Based on the issues above, I propose moving the download logic into a Callable task and running them through an ExecutorService as follows:
List<Callable<byte[]>> tasksToExecute = new ArrayList<>();
// Populate tasks to run
for (int i = 0; i < numParts; i++) {
    final int part = i;
    // Lambda implementing Callable<byte[]>: downloads one part and returns its bytes
    tasksToExecute.add(() -> {
        // try-with-resources closes the stream (and its connection) even on failure
        try (InputStream is = FTP.getAssetInputStream(asset, part)) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
            int len;
            while ((len = is.read(buf)) != -1) {
                baos.write(buf, 0, len);
                curDownload.addAndGet(len);
                totAssets.addAndGet(len);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            // handle exception
        } catch (FTPException e) {
            // handle exception
        }
        return null;
    });
}
// Retrieve an ExecutorService instance; note that the work-stealing pool is Java 8+ only.
// newFixedThreadPool(nThreads) can be used instead for tight control over the number of simultaneous connections.
ExecutorService executor = Executors.newWorkStealingPool(4);
// FileOutputStream creates the file if it does not exist yet
File assetFile = new File(dir, "assets/" + asset + ".raf");
try (FileOutputStream fos = new FileOutputStream(assetFile)) {
    // Executes all the tasks; invokeAll() returns once every task has completed
    List<Future<byte[]>> resultFutures = executor.invokeAll(tasksToExecute);
    // Iterate through the futures, writing the parts to the file in order
    for (Future<byte[]> result : resultFutures) {
        byte[] partData = result.get();
        if (partData == null) {
            // an exception occurred while downloading this part, handle appropriately
        } else {
            fos.write(partData);
        }
    }
} catch (IOException | InterruptedException | ExecutionException ex) {
    // handle exception (restore the interrupt flag if it was an InterruptedException)
} finally {
    executor.shutdown();
}
Using the executor service, the worker threads are reused, which saves on thread creation costs, and each Future gives you that part's result (or failure) without any busy-waiting. Note that invokeAll() only returns after every task has finished; if you want the file to start being written as soon as the leading parts are available, submit() each task individually and call get() on the futures in order.
As mentioned, too many simultaneous connections may cause the server to reject connections (or, even more dangerously, send an EOF early so that you think the part was completely downloaded). In this case, the number of worker threads can be limited with newFixedThreadPool(nThreads) so that at any given time at most nThreads downloads run concurrently.
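A minimal sketch of the fixed-size pool variant, which submits each part individually and writes the parts in order as each one becomes ready. PartSource is a hypothetical interface meaning "download part N completely"; in the question it would wrap FTP.getAssetInputStream and the read loop.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledPartDownload {
    // Hypothetical stand-in for "open part N and read it fully".
    interface PartSource {
        byte[] download(int part) throws IOException;
    }

    static void downloadTo(OutputStream out, PartSource source, int numParts, int nThreads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads); // at most nThreads downloads at once
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < numParts; i++) {
                final int part = i;
                Callable<byte[]> task = () -> source.download(part);
                futures.add(pool.submit(task));
            }
            for (Future<byte[]> future : futures) {
                // Blocks only until this part is ready; later parts keep downloading meanwhile.
                // A failed part surfaces here as an ExecutionException.
                out.write(future.get());
            }
        } finally {
            pool.shutdownNow(); // also cancels remaining tasks if a part failed
        }
    }
}
```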

Problems with InputStream

Following is a part of the code snippet that I will be using for my project.
public String fetchFromStream()
{
try
{
int charVal;
StringBuffer sb = new StringBuffer();
while((charVal = inputStream.read()) > 0) {
sb.append((char)charVal);
}
return sb.toString();
} catch (Exception e)
{
m_log.error("readUntil(..) : " + e.getMessage());
return null;
} finally {
System.out.println("<<<<<<<<<<<<<<<<<<<<<< Called >>>>>>>>>>>>>>>>>>>>>>>>>>>");
}
}
Initially the while loop works fine. But after what is probably the last character has been read from the stream, I expected a return value of -1, and this is where my problem starts: the code hangs, and even the finally block is not executed.
I was debugging this code in Eclipse to see what is actually happening at run time. I set a breakpoint inside the while loop and watched the StringBuffer being populated with char values one by one. But suddenly, while checking the condition of the while loop, the debugger loses control, and this is the point where the code hangs! No exception is thrown either!
What is happening here?
Edit:
This is how I'm getting my InputStream. Basically I'm using Apache Commons Net for Telnet.
private TelnetClient getTelnetSession(String hostname, int port)
{
TelnetClient tc = new TelnetClient();
try
{
tc.connect(hostname, port != 0 ? port : 23);
//These are instance variables
inputStream = tc.getInputStream();
outputStream = new PrintStream(tc.getOutputStream());
//More codes...
return tc;
} catch (SocketException se)
{
m_log.error("getTelnetSession(..) : " + se.getMessage());
return null;
} catch (IOException ioe)
{
m_log.error("getTelnetSession(..) : " + ioe.getMessage());
return null;
} catch (Exception e)
{
m_log.error("getTelnetSession(..) : " + e.getMessage());
return null;
}
}
Look at the Javadoc:
Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown.
In simple terms: if your stream has ended (e.g. end of file), read() returns -1 immediately. However, if the stream is still open but the JVM is waiting for data (slow disk, socket connection), read() will block (it is not really hung).
Where are you getting the stream from? Check out available(), but please do not call it in a loop that burns CPU.
Finally: casting int/byte to char only works for ASCII characters; consider using a Reader on top of the InputStream.
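Both points can be combined in a small sketch: decode through a Reader with an explicit charset, and stop when an expected terminator (for example a shell or login prompt) has been read instead of waiting for -1, which on an open telnet connection only arrives when the server disconnects. The name readUntil echoes the one in the question's log message, but the terminator logic, the prompt strings and the UTF-8 charset are assumptions; use whatever your server actually sends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class PromptReader {
    // Reads characters until the text read so far ends with `terminator`, or until the
    // stream really ends (peer closed the connection).
    static String readUntil(Reader reader, String terminator) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
            int start = sb.length() - terminator.length();
            if (start >= 0 && sb.indexOf(terminator, start) == start) {
                return sb.toString(); // terminator seen: this response is complete
            }
        }
        return sb.toString(); // end of stream reached without seeing the terminator
    }

    // Decode bytes with an explicit charset instead of casting each byte to char.
    // Keep reusing the same Reader: it may buffer bytes beyond what it has returned.
    static Reader wrap(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }
}
```

With the question's TelnetClient this would be something like readUntil(wrap(tc.getInputStream(), charset), "$ "), creating the Reader once and reusing it.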
Read the docs:
read() will wait until there is more data on the InputStream if the InputStream is not closed.
I suspect you are doing this with sockets? This is the most common area where this comes up.
"Reads the next byte of data from the input stream. The value byte is returned as an int in the range 0 to 255. If no byte is available because the end of the stream has been reached, the value -1 is returned. This method blocks until input data is available, the end of the stream is detected, or an exception is thrown"
I have the same issue with Apache Commons Net on Android...
The read() call on the InputStream hangs forever for some reason. And no, it is not just blocking "until data is available"...
My debugging information shows that there are several hundred chars available()... yet it just randomly blocks at some read. However, whenever I send something to the telnet server, the block is suddenly released and it continues reading for several chars until it suddenly stops/blocks again at some arbitrary point!
I believe there is a bug in the Apache Commons library! This is really annoying because there isn't a lot that can be done... no timeout for the read command or anything else...
EDIT: I was able to get around it by calling TelnetClient.setReaderThread(false)... obviously there is a bug in the library that shows up as long as a thread handles the input data... with the reader thread disabled, it works just fine for me!

Java - UDP sending data over socket: not receiving all data

The client-server application I wrote seems to work; however, not all data is processed every time.
I am testing it on a local machine in the Eclipse environment.
Server:
private void sendData() throws Exception
{
DatagramPacket data = new DatagramPacket(outgoingData, outgoingData.length, clientAddress, clientPort);
InputStream fis = new FileInputStream(responseData);
int a;
while((a = fis.read(outgoingData,0,512)) != -1)
{
serverSocket.send(data);
}
}
Client:
private void receiveData() throws Exception
{
DatagramPacket receiveData = new DatagramPacket(incomingData, incomingData.length);
OutputStream fos = new FileOutputStream(new File("1"+data));
while(true)
{
clientSocket.receive(receiveData);
fos.write(incomingData);
}
}
I used to have an if/else in the while(true) loop to check whether the packet length was less than 512 bytes, so it knew when to break.
I thought there was a problem with that, but it seems to be OK; for now I just wait a few minutes and then stop the Client.java app.
The file does transfer, but the original file is 852 KB, and so far I have received 777, 800, 850, ... KB, but never all of it.
The fundamental problem with your approach is that UDP does not guarantee delivery. If you have to use UDP (rather than, say, TCP), you have to implement a scheme that would detect and deal with packets that got lost, arrive out of order, or are delivered multiple times.
See When is it appropriate to use UDP instead of TCP?
You should probably use TCP to transfer files. You are probably losing packets because you are sending them so fast in that while loop.
int a;
while((a = fis.read(outgoingData,0,512)) != -1)
{
serverSocket.send(data);
}
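Since you're sending so fast, I highly doubt the packets will be received in the right order. Some packets will probably be lost because of it too.
Also, since you always send a fixed size of 512 bytes, the last packet will carry leftover bytes from the previous read (the last chunk read from the file is almost certainly shorter than 512 bytes), so the end of the file will "look weird." The client has the mirror-image problem: it writes the whole incomingData buffer instead of only receiveData.getLength() bytes.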
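The size problem alone can be fixed by giving each datagram the length actually read and by writing only getLength() bytes on the receiving side. This minimal sketch does nothing about lost, reordered or duplicated datagrams, which is why TCP remains the better fit. PacketSink is a hypothetical seam so the sender can be exercised without a network; in real code you would pass serverSocket::send.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.DatagramPacket;
import java.net.InetAddress;

class UdpFileChunks {
    static final int CHUNK = 512;

    // Hypothetical seam: DatagramSocket::send matches this signature.
    interface PacketSink {
        void send(DatagramPacket packet) throws IOException;
    }

    // Sender: each datagram's length is the number of bytes actually read, so the
    // final short chunk is not padded with stale bytes from the previous read.
    static int sendChunks(PacketSink sink, InputStream in, InetAddress address, int port) throws IOException {
        byte[] buf = new byte[CHUNK];
        int packets = 0;
        int n;
        while ((n = in.read(buf, 0, CHUNK)) != -1) {
            sink.send(new DatagramPacket(buf, 0, n, address, port)); // send() copies the bytes out immediately
            packets++;
        }
        return packets;
    }

    // Receiver: write only the bytes this datagram carried, not the whole receive buffer.
    static void writePayload(DatagramPacket received, OutputStream out) throws IOException {
        out.write(received.getData(), received.getOffset(), received.getLength());
    }
}
```

In the question's code this would be sendChunks(serverSocket::send, fis, clientAddress, clientPort) on the server, and writePayload(receiveData, fos) in place of fos.write(incomingData) on the client.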

Java InputStream Locking

I am using an InputStream to stream a file over the network.
However, if my network goes down while the file is being read, the read method blocks and never recovers when the network reappears.
I was wondering how I should handle this case; shouldn't some exception be thrown if the InputStream goes away?
The code is like this:
URL someUrl = new URL("http://somefile.com");
InputStream inputStream = someUrl.openStream();
int size = 1024;
byte[] byteArray = new byte[size];
inputStream.read(byteArray, 0, size);
So somewhere after calling read the network goes down and the read method blocks.
How can I deal with this situation, as read() doesn't seem to throw an exception?
From looking at the documentation here:
http://docs.oracle.com/javase/6/docs/api/java/io/InputStream.html
It looks like read does throw an exception.
There are a few options to solve your specific problem.
One option is to track the progress of the download, and keep that status elsewhere in your program. Then, if the download fails, you can restart it and resume at the point of failure.
However, I would instead restart the download if it fails. You will need to restart it anyway so you might as well redo the whole thing from the beginning if there is a failure.
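A minimal sketch of restarting from scratch, assuming a hypothetical Download interface that represents one complete attempt. The fetchAll helper sets connect and read timeouts on the URLConnection: without a timeout, a dead network leaves read() blocked, so the failure (and therefore the restart) never happens.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class RestartingDownload {
    // Hypothetical abstraction over one complete download attempt.
    interface Download {
        byte[] fetch() throws IOException;
    }

    // Retries a failed download from the beginning, up to maxAttempts times in total.
    static byte[] withRestarts(Download download, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return download.fetch(); // every attempt starts again from byte 0
            } catch (IOException e) {
                last = e; // partial data from the failed attempt is simply discarded
            }
        }
        throw last;
    }

    // One possible attempt: read the whole URL, with timeouts so a dead network
    // fails with an exception instead of blocking forever.
    static byte[] fetchAll(URL url, int timeoutMs) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Usage would be along the lines of withRestarts(() -> fetchAll(someUrl, 15000), 3).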
The short answer is to use Selectors from the nio package. They allow non-blocking network operations.
If you intend to use old sockets, you may try some code samples from here
Have a separate Thread running that has a reference to your InputStream, and have something to reset its timer after the last data has been received - or something similar to it. If that flag has not been reset after N seconds, then have the Thread close the InputStream. The read(...) will throw an IOException and you can recover from it then.
What you need is similar to a watchdog. Something like this:
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

public class WatchDogThread extends Thread
{
    private final Runnable timeoutAction;
    private final AtomicLong lastPoke = new AtomicLong( System.currentTimeMillis() );
    private final long maxWaitTime;

    public WatchDogThread( Runnable timeoutAction, long maxWaitTime )
    {
        this.timeoutAction = timeoutAction;
        this.maxWaitTime = maxWaitTime;
    }

    // Call this whenever data arrives to reset the timer
    public void poke()
    {
        lastPoke.set( System.currentTimeMillis() );
    }

    @Override
    public void run()
    {
        // Keep checking until this watchdog is interrupted
        while( !isInterrupted() ) {
            if( lastPoke.get() + maxWaitTime < System.currentTimeMillis() ) {
                timeoutAction.run();
                break;
            }
            try {
                Thread.sleep( 1000 );
            } catch( InterruptedException e ) {
                break;
            }
        }
    }
}

public class Example
{
    // The stream comes from the caller (e.g. a socket or URLConnection)
    public void method( final InputStream is ) throws IOException
    {
        WatchDogThread watchDog =
            new WatchDogThread(
                new Runnable()
                {
                    @Override
                    public void run()
                    {
                        try {
                            // Closing the stream makes the blocked read() throw an IOException
                            is.close();
                        } catch( IOException e ) {
                            System.err.println( "Failed to close: " + e.getMessage() );
                        }
                    }
                },
                10000
            );
        watchDog.start();
        try {
            is.read();
            watchDog.poke();
        } finally {
            watchDog.interrupt();
        }
    }
}
EDIT:
As pointed out, sockets already have a read timeout (Socket.setSoTimeout()). This would be preferable to a watchdog thread.
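A minimal sketch of the built-in socket timeout, assuming you hold the Socket directly (TimedRead is a hypothetical helper name). A read that times out throws java.net.SocketTimeoutException, and the socket remains usable afterwards, so the caller can retry or give up.

```java
import java.io.IOException;
import java.net.Socket;

class TimedRead {
    // Reads into buf, waiting at most timeoutMs for data. Returns the byte count, or -1 if
    // the peer closed the connection; throws java.net.SocketTimeoutException on timeout.
    static int readWithTimeout(Socket socket, byte[] buf, int timeoutMs) throws IOException {
        socket.setSoTimeout(timeoutMs); // applies to reads on this socket's InputStream
        return socket.getInputStream().read(buf);
    }
}
```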
The function inputStream.read() is a blocking function, and it should be called in a separate thread.
There is an alternative way of avoiding this situation. The InputStream also has a method available(). It returns an estimate of the number of bytes that can be read from the stream without blocking.
Call the read method only if there are some bytes available in the stream.
int length = 0;
int ret = in.available();
if(ret != 0){
length = in.read(recv);
}
InputStream does throw the IOException. Hope this information is useful to you.
This isn't a big deal. All you need to do is set a timeout on your connection.
URL url = ...;
URLConnection conn = url.openConnection();
conn.setConnectTimeout( 30000 );
conn.setReadTimeout( 15000 );
InputStream is = conn.getInputStream();
Eventually, one of the following things will happen: your network will come back and your transfers will resume; the TCP stack will eventually time out, in which case an exception IS thrown; or the socket will get a closed/reset error and you'll get an IOException. In all cases the thread will let go of the read() call and return to the pool, ready to service other requests, without you having to do anything extra.
For example, if your network goes out you won't be getting any new connections coming in, so the fact that this thread is tied up isn't going to make any difference because you don't have connections coming in. So your network going out isn't the problem.
A more likely scenario is that the server you are talking to gets jammed up and stops sending you data, which would slow down your clients as well. This is where tuning your timeouts matters more than writing more code, using NIO, separate threads, etc. Separate threads will just increase your machine's load, and in the end force you to abandon the thread after a timeout, which is exactly what TCP already gives you. You could also tear your server up, because you are creating a new thread for every request, and if you start abandoning threads you could easily wind up with hundreds of threads all sitting around waiting for a timeout on their sockets.
If you have a high volume of traffic going through this method, any hold-up in response time from a dependency, like an external server, is going to affect your response time. So you will have to figure out how long you are willing to wait before you just error out and tell the client to try again because the server you're reading this file from isn't giving it up fast enough.
Other ideas are caching the file locally, limiting your network trips, etc., to limit your exposure to an unresponsive peer. The exact same thing can happen with databases on external servers. If your DB doesn't send you a response fast enough, it can jam up your thread pool just like a file that doesn't come down quickly enough. So why worry any differently about file servers? More error handling isn't going to fix your problem, and it will just make your code obtuse.
