I need to empty the buffer of an InputStream from a TCP socket connection.
I tried this:
public void emptyReadBuffer() {
    try {
        // read and discard until end of stream (blocks while the peer is silent)
        while (((DataInputStream) inFromServer).read() >= 0) {}
    } catch (IOException e) {}
}
But it waits for input until the timeout... I just want to empty the buffer, because I found that sometimes I read stale data from a previous connection.
Your code snippet will loop forever, until either an exception happens, or the socket closes.
What you want is not possible, depending on your definition of 'buffer'.
If you're talking about the actual buffers on the local hardware (so, the network card, etcetera), you can sort of do this, though the actual specification of Java's TCP support doesn't quite guarantee you can. However, this is pointless: a new TCP socket simply does not hold stray bytes from previous connections (it would be a major security leak if it did, and people would care, a lot; it would be public information).
So, I assume what you really mean is that you have a single, long-lived TCP connection which is used for multiple 'sessions', and you're still picking up bytes from the previous session.
If the latter is indeed what's going on, your use of the word 'buffer' is misleading. You have no idea WHERE those 'stray bytes' are right now: perhaps they are still stuck somewhere in a router, or in the middle of a cable across the Atlantic. The notion of 'just throw away any bytes that have arrived at the local machine' just won't cover it.
What you really need is a unique message along the lines of 'Hello, this is the start of session ', and then your 'emptyReadBuffer' method should instead be "ignoreAllUntilStartOfSession(String sessionId)". That CAN be done, and will reliably get rid of the previous session.
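A minimal sketch of that idea, assuming the sender prefixes each session with an agreed marker (the "SESSION:" format, the class name and the method name below are made up for illustration):

import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SessionSync {
    /** Read and discard bytes until the marker for this session has been seen. */
    static void ignoreAllUntilStartOfSession(DataInputStream in, String sessionId) throws IOException {
        byte[] marker = ("SESSION:" + sessionId + "\n").getBytes(StandardCharsets.UTF_8);
        int matched = 0;
        while (matched < marker.length) {
            int b = in.read();                       // blocks until data arrives or the stream closes
            if (b < 0) {
                throw new IOException("stream closed before the session marker arrived");
            }
            if (b == (marker[matched] & 0xFF)) {
                matched++;                           // next byte of the marker matched
            } else {
                matched = (b == (marker[0] & 0xFF)) ? 1 : 0;  // naive restart; fine for markers without repeated prefixes
            }
        }
        // everything read from here on belongs to the new session
    }
}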
What your snippet is trying (and failing) to do would be best done like so:
inFromServer.skip(inFromServer.available());
But as I said, this doesn't accomplish anything reliably. You really, really shouldn't do this. It WILL fail to do what you want one day because networks are varied and unreliable.
This is one for the Tomcat / network experts. I would benchmark it or Wireshark it, but that is pretty demanding and perhaps someone knows the answer offhand.
Comparing these two methods for generating servlet output, which one would be faster from a user's perspective:
Writing directly to the servlet output stream:
for (int i = 0; i < 10000; i++) {
    servletOutputStream.write('a');   // one byte at a time, straight to the stream
    /* a little bit of delay */
}
Creating a buffer and writing it out in one go:
for (int i = 0; i < 10000; i++) {
    stringBuffer.append("a");
}
servletOutputStream.print(stringBuffer.toString());   // write the whole thing in one call
I can imagine the pros of method 1 would be that the response can start going out quickly, while in method 2 the sending starts later.
On the other hand, method 1 could generate more, smaller TCP packets, which in turn could take longer to transmit completely?
Regards
PS: Please, don't tell me this is premature optimization. In the case at hand I have an object which offers both toString() and write(Appendable a) methods, and I simply have to choose which one to use here. Additionally, I find this very interesting from a theoretical point of view and with regard to the general design of servlets.
EDIT: Thanks all for the answers. But it seems I was too unclear in my question or oversimplified my example.
I'm not worried about not buffering at all. I know that there must be buffering in at least one place in the sending queue, and probably in multiple places (Java, OS, hardware). I think the real question I have is this: when are these buffers flushed?
So to make it clearer, let's assume we have an MTU of 1000 and the sending of consecutive packets is triggered by a buffer-empty interrupt from the hardware. Then in the first case it could look like this:
. packet( "a" ) //triggered by the first write( "a" ),
. packet( "aaaaaaa" ) // triggered by buffer-empty, sending the amount of "a"s which have been written in the meantime
. packet( "aaaa" ) // and so on
. packet( "aaaaaaaaaaa" )
...x1000 // or so in this example
In the second case, all 10000 bytes are already available when sending starts, so the result would be:
. packet( "aaaa....a(x1000)" )
. packet( "aaaa....a(x1000)" )
...x10
Even for smaller data sizes (smaller than the MTU, let's say 100 "a"s), if the output is created faster than it can be sent, the result could look like:
. packet( "a" ) // first write
. packet( "aaaa...a(x99) ) // all remaining data available when buffer-empty interrupt.
Of course, all this would be quite different if the buffer(s) worked differently, e.g. if they waited for more data before sending, or waited for a flush before sending anything at all (but that in turn would slow down sending in some respects, too).
So this is what I don't know: how exactly does this buffering within Tomcat work, and what would be the best strategy for using it?
(And I'm not worrying or expecting larger speed gains. I just like to know how things work.)
I expect that the ServletOutputStream is actually an instance of
org.apache.tomcat.core.BufferedServletOutputStream
which (as the name suggests) is a buffered stream. That means it is better to write characters directly to the stream rather than assembling them in a StringBuffer or StringBuilder and writing the result. Writing directly avoids at least one copy of the characters.
If it turns out that your ServletOutputStream is not buffered already, then you can wrap it in a BufferedOutputStream, and you will get the same result.
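For instance, a minimal sketch of that wrapping (the servlet class name and the 8 KB buffer size are just placeholders, assuming a plain HttpServlet):

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DirectWriteServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Wrap in our own buffer so we don't depend on the container's buffering.
        OutputStream out = new BufferedOutputStream(resp.getOutputStream(), 8192);
        for (int i = 0; i < 10000; i++) {
            out.write('a');   // cheap: lands in the local buffer, not on the wire
        }
        out.flush();          // push the tail; the container closes the underlying stream
    }
}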
I'm assuming here that you are talking about the streams. (Flushing a StringBuffer has no meaning.)
When are these buffers flushed?
When they are full, when you call flush on the stream, or when the stream is closed.
... and what would be the best strategy for using it?
In general, write the data and, when you are finished, close the stream. Don't flush explicitly unless there is a good reason to do so. There rarely is, if you are delivering ordinary HTTP responses. (A flush is liable to cause the network stack to transmit the same amount of information by sending more network packets. That could impact overall network throughput.)
In the case of the servlet framework, I recall that the Servlet specification says that a ServletOutputStream will automatically be flushed and closed when the request/response processing is finished. Provided that you didn't wrap the ServletOutputStream, you don't even need to close the stream. (It does no harm though.)
There's no doubt that writing directly to the output stream will be faster for a number of reasons:
The output buffer is fixed
The output buffer will be flushed automatically when it's full (and I'd argue that it doesn't matter when this happens, so stop worrying about it)
The output buffer will be re-used
Your StringBuilder can grow very large, taking up lots of heap space
Your StringBuilder will re-allocate its space at intervals, causing new objects to be created, data copied all over the place, etc
All that memory activity will create "garbage" that the GC will have to deal with
However
I would argue that your analysis isn't taking into account a very important factor: detection and recovery from errors.
If you have a semi-complex procedure that your servlet is performing, it could fail at any time. If it fails after rendering half of the output, you will be unable to do any of the following things:
Issue an "error" HTTP status code (e.g. 500 Server Error)
Redirect the user to another page (error page?)
Show a nice error message on the screen without ruining/interrupting the page
So, even though the manually-buffered approach (based upon the StringBuilder) is less efficient, I believe it gives you a great deal of flexibility for handling errors.
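As a rough sketch of that flexibility (renderPage() and the class name are placeholders, not anyone's actual code), assuming you render into a StringBuilder and only commit the response on success:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BufferedRenderServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        StringBuilder body = new StringBuilder();
        try {
            renderPage(body);                          // may fail halfway through
        } catch (RuntimeException e) {
            // Nothing has been committed yet, so we can still send a proper error status or redirect.
            resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "rendering failed");
            return;
        }
        resp.getWriter().write(body.toString());       // commit only after rendering succeeded
    }

    private void renderPage(StringBuilder out) {       // placeholder for the real rendering logic
        out.append("<html><body>hello</body></html>");
    }
}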
This is more of a religious argument than anything else, but you'll find many web application programmers who would say that your servlet should produce no output at all, and the task of generating responses should be delegated to another component more suited to the task (e.g. JSP, Velocity, FreeMarker, etc.).
If you are, however, writing a servlet with an eye towards raw speed, then by all means: write directly to the output stream. It will give you the best performance in both micro-benchmarks and overall speed under load.
EDIT 2016-01-26
When are these buffers flushed?
The servlet spec makes no guarantees about whether the ServletOutputStream is buffered, but not using a buffer would be a practical mistake: sending TCP packets one character at a time would certainly be awful for performance.
If you absolutely need to make sure that the response is buffered, you must use your own BufferedOutputStream, because the servlet container could change its implementation at any time and, as mentioned, is not guaranteed to buffer your response for you.
How exactly is this buffering within Tomcat working?
The buffering currently implemented in Tomcat works just like buffering in the standard JDK classes: when the buffer fills, it's flushed to the lower stream and then the balance of bytes remains in the buffer after the call is made.
If you manually call flush on the stream, you'll force the use of Transfer-Encoding: chunked which means that additional data will need to be sent over the wire, because there is no Content-Length (unless you manually set one before you start filling the buffer). If you can avoid chunked-encoding, you can save yourself some network traffic. Also, if the client knows the Content-Length of the response, they can show an accurate progress bar when downloading the resource. With chunked encoding, the client never knows how much data is coming until it's all been downloaded.
Wrap your ServletOutputStream in a BufferedOutputStream (unless it already is one) and you don't need to worry about silly things like that.
I would definitely use the first one. The servlet output stream is buffered, so you don't have to worry about sending it too fast. Also, with the second one you allocate a new string each time, which might impose GC overhead over time. Use the first one and call flush after the loop.
It's already buffered, and in some cases it is written to a ByteArrayOutputStream so that Tomcat can prepend the Content-Length header. Don't worry about it.
To be more specific, I have written a server with Java NIO, and it works quite well. After some testing I have found that, for some reason, a call to the SocketChannel's write method takes 1 ms on average, while the read method takes 0.22 ms on average.
At first I thought that setting the send/receive buffer sizes on the Socket might help a bit, but after thinking about it, all the messages are very short (a few bytes) and I send a message about every 2 seconds on a single connection. Both the send and receive buffers are well over 1024 bytes in size, so this can't really be the problem. I do have several thousand clients connected at once, though.
Now I am a bit out of ideas on this. Is this normal, and if it is, why?
I would start by using Wireshark to eliminate variables.
@Nuoji: I am using non-blocking I/O and yes, I am using a Selector. As for when I write to a channel, I do the following:
Since what I wrote in the second paragraph of my post is true, I assume that the channel is ready for writing in most cases, so I do not set the interest set on the key to OP_WRITE at first, but rather try to write to the channel directly. If, however, I cannot write everything to the channel (or anything at all, for that matter), I set the interest set on the key to OP_WRITE, so that the next time I try to write to the channel it is ready. In the testing where I got the results mentioned in the original post, this happens very rarely.
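In code, that write path looks roughly like this (a hedged sketch only; the class name is made up, and queuing of the unwritten remainder and selector-thread synchronization are left out):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

class OptimisticWriter {
    static void write(SelectionKey key, ByteBuffer bb) throws IOException {
        SocketChannel sc = (SocketChannel) key.channel();
        sc.write(bb);                                        // usually drains the whole buffer
        if (bb.hasRemaining()) {
            // Couldn't write everything: ask the selector to tell us when the channel is writable again.
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            key.selector().wakeup();                         // in case the selector thread is blocked in select()
        }
    }
}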
And yes, I can give you samples of the code, although I didn't really want to bother anyone with it. Which parts in particular would you like to see, the selector thread or the write thread?
I'm coding a tool that, given any URL, periodically fetches its output. The problem is that the output might not be a simple and lightweight HTML page (the expected case), but some heavy data stream (e.g. straight from /dev/urandom, a possible DoS attack).
I'm using java.net.URL + java.net.URLConnection, setting the connection and read timeouts to 30 sec. Currently the input is read by a java.io.BufferedReader, using readLine().
Possible solutions:
Use java.io.BufferedReader.read() character by character, counting the characters and closing the connection after a limit has been reached. The problem is that an attacker may transmit one byte every 29 sec, so the read/connection timeout would almost never fire (204800 B * 29 sec ≈ 68 days).
Limit Thread execution to 1-5min and use java.io.BufferedReader.readLine(). Any problems here?
I feel like I'm trying to reinvent the wheel and the solution is very straightforward, it just doesn't come to mind.
Thanks in advance.
You could encapsulate this by writing yourself a FilterInputStream that enforces whatever you want to enforce, and placing it at the bottom of the stack, around the connection's input stream.
However this and the remedies you suggest only work if the output is arriving in chunked transfer mode. Otherwise HttpURLConnection can buffer the entire response before you read any of it. The usual solution to this is a filter in the firewall.
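As a rough illustration of the FilterInputStream idea, assuming a simple byte budget (the class name and limit handling are made up; a wall-clock deadline check could be added in the same places):

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) throw new IOException("response exceeded the size limit");
        int b = super.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) throw new IOException("response exceeded the size limit");
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}

You would wrap it around connection.getInputStream() before handing the stream to your BufferedReader.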
There seems to be a number of avenues for denial of service here.
A huge line that gobbles memory. Probably the easiest fix is to use a MeteredInputStream before even hitting the character decoding. Reading char by char will be extremely slow in any circumstance. You could read a long char[] at a time, but that will likely overcomplicate the code.
Dealing with an adversary (or bug) keeping many connections alive at once. You probably want non-blocking I/O to read the whole message, and then proceed normally.
I am re-writing the core NIO server networking code for my project, and I'm trying to figure out when I should "store" connection information for future use. For example, once a client connects in the usual manner, I store and associate the SocketChannel object for that connected client so that I can write data to that client at any time. Generally I use the client's IP address (including port) as the key in a HashMap that maps to the SocketChannel object. That way, I can easily do a lookup on their IP address and asynchronously send data to them via that SocketChannel.
This might not be the best approach, but it works, and the project is too large to change its fundamental networking code, though I would consider suggestions. My main question, however, is this:
At what point should I "store" the SocketChannel for future use? I have been storing a reference to the SocketChannel once the connection is accepted (via OP_ACCEPT). I feel that this is an efficient approach, because I can assume that the map entry already exists when the OP_READ event comes in. Otherwise, I would need to do a computationally expensive check on the HashMap every time OP_READ occurs, and it is obvious that MANY more of those will occur for a client than OP_ACCEPT. My fear, I guess, is that there may be some connections that become accepted (OP_ACCEPT) but never send any data (OP_READ). Perhaps this is possible due to a firewall issue or a malfunctioning client or network adaptor. I think this could lead to "zombie" connections that are not active but also never receive a close message.
Part of my reason for re-writing my network code is that on rare occasions, I get a client connection that has gotten into a strange state. I'm thinking the way I've handled OP_ACCEPT versus OP_READ, including the information I use to assume a connection is "valid" and can be stored, could be wrong.
I'm sorry my question isn't more specific, I'm just looking for the best, most efficient way to determine if a SocketChannel is truly valid so I can store a reference to it. Thanks very much for any help!
If you're using Selectors and non-blocking IO, then you might want to consider letting NIO itself keep track of the association between a channel and its stateful data. When you call SelectableChannel.register(), you can use the three-argument form to pass in an "attachment". At every point in the future, that channel's SelectionKey will always return the attachment object that you provided. (This is pretty clearly inspired by the "void *user_data" type of argument in OS-level APIs.)
That attachment stays with the key, so it's a convenient place to keep state data. The nice thing is that all the mapping from channel to key to attachment will already be handled by NIO, so you do less bookkeeping. Bookkeeping--like Map lookups--can really hurt inside of an IO responder loop.
As an added feature, you can also change the attachment later, so if you needed different state objects for different phases of your protocol, you can keep track of that on the SelectionKey, too.
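A hedged sketch of what that can look like (ConnectionState and the handler methods are invented names for illustration):

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class ConnectionState { /* session id, pending writes, protocol phase, ... */ }

class AcceptHandler {
    static void onAccept(Selector selector, ServerSocketChannel server) throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;                // nothing actually pending
        client.configureBlocking(false);
        // The third argument is the attachment; select() hands it back with the key.
        client.register(selector, SelectionKey.OP_READ, new ConnectionState());
    }

    static void onRead(SelectionKey key) {
        ConnectionState state = (ConnectionState) key.attachment();   // no map lookup needed
        // ... read from (SocketChannel) key.channel() and update 'state' ...
    }
}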
Regarding the odd state you find your connections in, there are some subtleties in using NIO and selectors that might be biting you. For example, once a SelectionKey signals that it's ready for read, it will continue to be ready for read the next time some other thread calls select(). So, it's easy to end up with multiple threads attempting to read the socket. On the other hand, if you attempt to deregister the key for reading while you're doing the read, then you can end up with threading bugs because SelectionKeys and their interest ops can only be manipulated by the thread that actually calls select(). So, overall, this API has some sharp edges, and it's tricky to get all the state handling correct.
Oh, and one more possibility: depending on who closes the socket first, you may or may not notice a closed socket until you explicitly ask. I can't recall the exact details off the top of my head, but it's something like this: the client half-closes its end of the socket; this does not signal any ready op on the selection key, so the SocketChannel never gets read. This can leave a bunch of sockets in TIME_WAIT status on the client.
As a final recommendation, if you're doing async IO, then I definitely recommend a couple of books in the "Pattern Oriented Software Architecture" (POSA) series. Volume 2 deals with a lot of IO patterns. (For instance, NIO lends itself very well to the Reactor pattern from Volume 2. It addresses a bunch of those state handling problems I mention above.) Volume 4 includes those patterns and embeds them in the larger context of distributed systems in general. Both of these books are very valuable resources.
An alternative may be to look at an existing NIO socket framework; possible candidates are:
Apache MINA
Sun Grizzly
JBoss Netty
Sometimes, while sending a large amount of data via SocketChannel.write(), the underlying TCP buffer gets filled up, and I have to continually re-try the write() until the data is all sent.
So, I might have something like this:
public void send(ByteBuffer bb, SocketChannel sc) throws IOException, InterruptedException {
    sc.write(bb);
    while (bb.remaining() > 0) {
        Thread.sleep(10);
        sc.write(bb);
    }
}
The problem is that the occasional issue with a large ByteBuffer and an overflowing underlying TCP buffer means that this call to send() will block for an unexpected amount of time. In my project, there are hundreds of clients connected simultaneously, and one delay caused by one socket connection can bring the whole system to a crawl until this one delay with one SocketChannel is resolved. When a delay occurs, it can cause a chain reaction of slowing down in other areas of the project, and having low latency is important.
I need a solution that will take care of this TCP buffer overflow issue transparently, without causing everything to block when multiple calls to SocketChannel.write() are needed. I have considered putting send() into a separate class extending Thread, so that it runs as its own thread and does not block the calling code. However, I am concerned about the overhead of creating a thread for EACH socket connection I am maintaining, especially when 99% of the time SocketChannel.write() succeeds on the first try, meaning there's no need for the thread to exist. (In other words, putting send() in a separate thread is really only needed when the while() loop is used, i.e. only in cases where there is a buffer issue, perhaps 1% of the time.) If there is a buffer issue only 1% of the time, I don't need the overhead of a thread for the other 99% of calls to send().
I hope that makes sense... I could really use some suggestions. Thanks!
Prior to Java NIO, you had to use one Thread per socket to get good performance. This is a problem for all socket based applications, not just Java. Support for non-blocking IO was added to all operating systems to overcome this. The Java NIO implementation is based on Selectors.
See The definitive Java NIO book and this On Java article to get started. Note however, that this is a complex topic and it still brings some multithreading issues into your code. Google "non blocking NIO" for more information.
The more I read about Java NIO, the more it gives me the willies. Anyway, I think this article answers your problem...
http://weblogs.java.net/blog/2006/05/30/tricks-and-tips-nio-part-i-why-you-must-handle-opwrite
It sounds like this guy has a more elegant solution than the sleep loop.
Also I'm fast coming to the conclusion that using Java NIO by itself is too dangerous. Where I can, I think I'll probably use Apache MINA which provides a nice abstraction above Java NIO and its little 'surprises'.
You don't need the sleep() as the write will either return immediately or block.
You could have an executor to which you hand off the write if it doesn't complete the first time.
Another option is to have a small pool of threads to perform the writes.
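A rough sketch of that executor hand-off, assuming the channel is non-blocking (the class name, pool size and back-off are all made up; the Selector approach below is still cleaner):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HandoffWriter {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);   // small shared pool, not one thread per socket

    void send(ByteBuffer bb, SocketChannel sc) throws IOException {
        sc.write(bb);                              // the 99% case: done in one call
        if (bb.hasRemaining()) {
            pool.submit(() -> {                    // rare slow path: finish in the pool so the caller never stalls
                try {
                    while (bb.hasRemaining()) {
                        if (sc.write(bb) == 0) {
                            Thread.sleep(10);      // crude back-off; OP_WRITE avoids this polling entirely
                        }
                    }
                } catch (IOException | InterruptedException e) {
                    // log and close the channel as appropriate
                }
            });
        }
    }
}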
However, the best option for you may be to use a Selector (as has been suggested) so you know when a socket is ready to perform another write.
For hundreds of connections, you probably don't need to bother with NIO. Good old fashioned blocking sockets and threads will do you.
With NIO, you can register interest in OP_WRITE for the selection key, and you will get notified when there is room to write more data.
There are a few things you need to do, assuming you already have a loop using Selector.select() to determine which sockets are ready for I/O.
Set the socket channel to non-blocking after you've created it: sc.configureBlocking(false);
Write (possibly parts of) the buffer and check if there's anything left. The buffer itself takes care of current position and how much is left.
Something like
sc.write(bb);
if (bb.remaining() == 0) {
    // we're done with this buffer: remove OP_WRITE from the interest set if there's nothing else to send
} else {
    // partial write: keep the buffer around and return to the select loop
}
Get rid of your while loop that sleeps
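Putting those steps together, a hedged sketch of the pattern might look like this (keeping the pending buffer as the key's attachment is just one way to do it, and only one outstanding buffer per connection is assumed):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

class NonBlockingWriter {

    /** Called from your own code when you have data to send. */
    static void send(SelectionKey key, ByteBuffer bb) throws IOException {
        SocketChannel sc = (SocketChannel) key.channel();
        sc.write(bb);                                            // optimistic first attempt
        if (bb.hasRemaining()) {
            key.attach(bb);                                      // remember what is left
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        }
    }

    /** Called from the select loop when the key reports it is writable. */
    static void onWritable(SelectionKey key) throws IOException {
        ByteBuffer bb = (ByteBuffer) key.attachment();
        SocketChannel sc = (SocketChannel) key.channel();
        sc.write(bb);
        if (!bb.hasRemaining()) {
            key.attach(null);                                    // this buffer is done
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);   // stop waking up for writes
        }
    }
}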
I am facing some of the same issues right now:
- If you have a small number of connections, but with large transfers, I would just create a thread pool and let the writes block in the writer threads.
- If you have a lot of connections, then you could use full Java NIO, register OP_WRITE on your accept()ed sockets, and then wait for the selector to wake you up.
The O'Reilly Java NIO book has all of this.
Also:
http://www.exampledepot.com/egs/java.nio/NbServer.html?l=rel
Some research online has led me to believe NIO is pretty much overkill unless you have a lot of incoming connections. Otherwise, if it's just a few large transfers, just use a write thread; it will probably respond more quickly. A number of people have issues with NIO not responding as quickly as they want. Since your write thread is blocking on its own, it won't hurt you.