We are migrating our application from Servlet API 2.5 (thread per request) to Netty. One of the cases involves a blocking-style ini file parser that we have used for many years. The current approach is to accumulate incoming ByteBufs into a CompositeByteBuf and feed it to the parser after wrapping it in a ByteBufInputStream.
In the real application we use HTTP, and the ini file is sent to the server as the HTTP request body. In the snippet below, however, it is assumed that all inbound content is the transferred file.
class AccumulatingChannelHandler extends ChannelInboundHandlerAdapter {
final BlockingIniFileParser parser = new BlockingIniFileParser();
CompositeByteBuf accumulator;
@Override
public void channelActive(ChannelHandlerContext ctx) {
accumulator = ctx.alloc().compositeBuffer(Integer.MAX_VALUE);
}
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) {
ByteBuf ioBuffer = (ByteBuf) msg;
accumulator.addComponent(true, ioBuffer);
}
@Override
public void channelInactive(ChannelHandlerContext ctx) {
IniFile iniFile = parser.parse(new ByteBufInputStream(accumulator));
accumulator.release();
ByteBuf result = process(iniFile);
ctx.writeAndFlush(result);
ctx.close();
}
private ByteBuf process(IniFile iniFile) {...}
}
class BlockingIniFileParser {
IniFile parse(InputStream in) {...}
}
interface IniFile {
String getSetting(String section, String entry);
}
By default, pooled direct buffers arrive at the channelRead method, so with this strategy we risk uncontrolled consumption of direct memory. So, I would like to understand:
Is it rational to accumulate IO buffers in such fashion?
Is there any best practice for integrating Netty IO with blocking parsers?
What is the best practice for parsing input (in some structured format) with Netty?
You have not mentioned the size of a typical uploaded ini file, but assuming they're large enough to cause some concern, I would consider ditching the CompositeByteBuf. Allocate an un-pooled [optionally direct] buffer and as pooled buffers come in the door, write them to the un-pooled buffer, then release them. When you're done reading, continue with your use of a ByteBufInputStream around the un-pooled buffer. Once complete, the un-pooled buffer will be GCed.
You will still be allocating chunks of memory, but it won't be drawn from your buffer pools, and if you use a direct un-pooled buffer, it will not impact your heap as much.
Ultimately, if the size of the un-pooled buffer remains a concern, I would bite the bullet and write the incoming pooled buffers out to disk and on completion, read them back in using a FileInputStream.
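The spill-to-disk idea from the last paragraph might look roughly like the sketch below. It is written against plain JDK streams rather than Netty so that it stands alone; the class name SpillingAccumulator and the threshold are assumptions made for illustration, not part of the original code. In a Netty handler, append would presumably be called with the bytes of each incoming buffer just before that buffer is released, and the stream returned by open would be handed to the blocking parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: keeps small uploads in memory and spills large ones to a temp file. */
final class SpillingAccumulator {
    private final int threshold;              // max bytes held in memory (assumed tuning knob)
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path file;                        // set once the upload has spilled to disk
    private OutputStream fileOut;

    SpillingAccumulator(int threshold) {
        this.threshold = threshold;
    }

    /** Adds one chunk, switching to a temp file when the threshold would be exceeded. */
    void append(byte[] chunk) {
        try {
            if (fileOut == null && memory.size() + chunk.length > threshold) {
                file = Files.createTempFile("upload", ".tmp");
                fileOut = Files.newOutputStream(file);
                memory.writeTo(fileOut);      // move what was buffered so far
                memory = null;
            }
            if (fileOut != null) {
                fileOut.write(chunk);
            } else {
                memory.write(chunk, 0, chunk.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean spilled() {
        return file != null;
    }

    /** Call once after the last chunk; a spilled temp file is deleted when the stream is closed. */
    InputStream open() {
        try {
            if (fileOut == null) {
                return new ByteArrayInputStream(memory.toByteArray());
            }
            fileOut.close();
            return Files.newInputStream(file, StandardOpenOption.DELETE_ON_CLOSE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because both the file I/O and the parser block, it is probably best to run this on a separate executor rather than on the Netty event loop.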
Related
I am using the latest version of Apache MINA (2.2.1) for low-level communication. A server port is opened that accepts messages from clients; these messages may contain sensitive data.
With the new PCI requirements, the data should not be written to the heap. To achieve that, I added "IoBuffer.setUseDirectBuffer(true);" to the Java main method at program startup, so that all MINA buffers are allocated directly rather than on the heap. I am using the following filters:
SSLFilter
PrefixedString decoder filter extends CumulativeProtocolDecoder
protected boolean doDecode(IoSession session, IoBuffer in, ProtocolDecoderOutput out) throws Exception {
if (in.prefixedDataAvailable(4)) {
int length = in.getInt();
byte[] bytes = new byte[length];
in.get(bytes);
String str = new String(bytes, "UTF-8");
out.write(str);
Arrays.fill(bytes, (byte)42);
//io.sweep();
return true;
} else {
return false;
}
}
Just after the request/response operation completes, I still see the message with sensitive data on the heap; it is in a byte array with no reference attached to it.
I tried the following, without success:
Sending data without the SSLFilter in the chain; the data is still seen on the heap.
Using an old version of MINA (2.10.0); still the same issue with data on the heap.
In doDecode (above), I tried the following, but MINA does not seem to tolerate modification of the IoBuffer, and subsequent messages are halted:
- a sweep operation on the IoBuffer,
- overwriting the sensitive data in the IoBuffer byte by byte.
Which would be the fastest way to copy a file over a socket? I have tried several ways, but I am not convinced I have found the fastest one in terms of transfer rate and CPU usage. (Best result: 175 Mbit/s (SSD/GBit Network))
Server:
ByteBuffer bb = ByteBuffer.allocate(packet_size);
DataOutputStream data_out = new DataOutputStream(socket.getOutputStream());
while(working){
int count =0;
int packet_size = in.readInt();
long requested_pos = in.readLong();
if(filechannel.position()!=requested_pos){
filechannel.position(requested_pos);
}
bb.limit(packet_size);
bb.position(0);
if((count=filechannel.read(bb))>0){ //FileInputStream.getChannel()
data_out.writeInt(count);
data_out.write(bb.array(),0,count);
}else{
working=false;
}
}
Client:
for(long i=0;i<=steps;i++){
data_out.writeInt(packet_size); //requested packet size
data_out.writeLong(i*packet_size); //requested file position
count=in.readInt();
bb.clear();
bb.limit(count);
lastRead=0;
while(lastRead<count){
lastRead+=in.read(bytes,lastRead,count-lastRead);
}
bb.put(bytes,0,count);
bb.position(0);
filechannel.write(bb); // filechannel over RandomAccessFile
}
Any suggestions?
You are looking at only half the issue. The code used to send/receive is only one factor. No matter how hard you optimize it, if you set up your socket with unsuitable parameters, performance takes a big hit.
For large data transfers, ensure the sockets have reasonably large buffers. I'd choose at least 64 KB, possibly larger. Send and receive buffers can be set up independently: for the sender you want a large(r) send buffer and for the receiver a large(r) receive buffer.
socket.setReceiveBufferSize(int);
socket.setSendBufferSize(int);
socket.setTcpNoDelay(false);
Set TCP NO DELAY to OFF, unless you know what you're doing and after confirming you really absolutely need it. It will never improve throughput, on the contrary it may sacrifice throughput in favor of reduced latency.
The next thing is to tailor your sender code to do its best to keep that buffer full at all times. For maximum speed, reading from the file and writing to the socket should be separated into two independent threads that communicate with each other through some kind of queue. Chunks in the queue should be reasonably large (at least a few KB).
Likewise, the receiving code should do its best to keep the receive buffer as empty as possible. Again, for maximum speed this requires two threads, one reading the socket and another processing the data, with a queue in between as on the sender side.
The job of the queues is to decouple stalls in reading data from file/writing to file from the actual network transfer, and vice versa.
The above is the generic pattern for getting maximum throughput, regardless of the transmission channels. The slower channel will be kept completely saturated, be it file reading/writing or network transfer.
Buffer sizes can be tweaked to squeeze out the last few percent of possible performance (I'd start with 64 KB for the socket and 8 KB chunks in the queue with a maximum queue size of 1 MB; this should deliver performance reasonably close to the maximum possible).
Another limiting factor you may run into is the TCP transfer window scaling (especially over a high bandwidth, high latency connection). Aside from ensuring the receiver empties the receive buffer as fast as possible there isn't anything you can do from the java side. Tweaking options exists on the OS level.
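The two-thread pattern described above could be sketched as follows. This is only an illustration under assumptions: the class name QueuedCopy, the empty-array end marker and the error handling are invented here, and the chunk size and queue depth are parameters so that the 8 KB / 1 MB starting point from the answer (8192-byte chunks, 128 slots) can be tried and tuned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: a reader thread fills a bounded queue, the calling thread drains it. */
final class QueuedCopy {
    private static final byte[] EOF = new byte[0]; // sentinel marking end of stream

    /** Copies src to dest and returns the number of bytes copied. */
    static long copy(InputStream src, OutputStream dest, int chunkSize, int queueSlots) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(queueSlots);
        AtomicReference<IOException> readFailure = new AtomicReference<>();

        Thread reader = new Thread(() -> {
            byte[] buf = new byte[chunkSize];
            try {
                int n;
                while ((n = src.read(buf)) != -1) {
                    if (n > 0) {
                        queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
                    }
                }
                queue.put(EOF);
            } catch (IOException e) {
                readFailure.set(e);
                queue.clear();     // the copy has failed, pending chunks are useless
                queue.offer(EOF);  // succeeds: the queue was just emptied
            } catch (InterruptedException e) {
                // The writer gave up and interrupted us; nothing left to hand over.
            }
        }, "source-reader");
        reader.setDaemon(true);
        reader.start();

        long total = 0;
        try {
            // A stall in the source no longer stalls the sink until the queue runs dry.
            for (byte[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
                dest.write(chunk);
                total += chunk.length;
            }
            dest.flush();
        } catch (IOException e) {
            reader.interrupt();
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            reader.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("copy interrupted", e);
        }
        if (readFailure.get() != null) {
            throw new UncheckedIOException(readFailure.get());
        }
        return total;
    }
}
```

On the sending side, src would be the file stream and dest the socket's output stream; on the receiving side the roles are reversed.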
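You want to use NIO.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;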
public class FileServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
try(final InputStream is = new BufferedInputStream((InputStream) <YOUR INPUT STREAM TO A FILE HERE>);
final OutputStream os = new BufferedOutputStream(response.getOutputStream()); ) {
fastCopy(is, os);
}
}
public static void fastCopy(final InputStream src, final OutputStream dest) throws IOException {
fastCopy(Channels.newChannel(src), Channels.newChannel(dest));
}
public static void fastCopy(final ReadableByteChannel src, final WritableByteChannel dest) throws IOException {
final ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024);
while(src.read(buffer) != -1) {
buffer.flip();
dest.write(buffer);
buffer.compact();
}
buffer.flip();
while(buffer.hasRemaining()) {
dest.write(buffer);
}
}
}
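When the source is an actual file, a related NIO option is FileChannel.transferTo, which on many operating systems can move bytes from the file system cache to the target channel without copying them through a Java buffer. The sketch below is an alternative to the buffer loop above, not the answer's own method; the class name ZeroCopySend is hypothetical, and a blocking target channel is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: sends a file to a channel using FileChannel.transferTo. */
final class ZeroCopySend {
    /** Sends the whole file to dest (assumed blocking) and returns the bytes transferred. */
    static long send(Path file, WritableByteChannel dest) {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < size) {
                sent += src.transferTo(sent, size - sent, dest);
            }
            return sent;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a socket transfer, dest would be the SocketChannel of the connection; whether this beats a tuned buffer loop depends on the platform and is worth measuring.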
Why doesn't channelRead() give me the full message I send to the server? Fragmentation sometimes occurs when messages get above roughly 140 bytes (sometimes more, sometimes less). I'm using a TCP socket with the NioServerSocketChannel class.
I'm using 4.1.0.Beta5.
Isn't there a way to read the full message when it has arrived?
this.serverBootstrap = new ServerBootstrap();
this.serverBootstrap.group(new NioEventLoopGroup(1), new NioEventLoopGroup(6))
.channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitializer<SocketChannel>()
{
@Override
public void initChannel(SocketChannel ch) throws Exception
{
ch.pipeline().addLast(new TestServerHandler());
}
})
.option(ChannelOption.SO_BACKLOG, (int)Short.MAX_VALUE)
.option(ChannelOption.SO_RCVBUF, (int) Short.MAX_VALUE)
.option(ChannelOption.SO_KEEPALIVE, true)
.option(ChannelOption.TCP_NODELAY, true);
this.serverBootstrap.bind(this.host, this.port);
And class TestServerHandler extends ChannelInboundHandlerAdapter:
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) {
ByteBuf buffer = (ByteBuf) msg; // the raw bytes of this read, possibly only part of a message
String s = buffer.toString(CharsetUtil.UTF_8);
for(int i = 0; i < 20; i++)
{
s = s.replace("[" + ((char)i) + "]", i + "");
}
System.out.println(s.length() + "");
System.out.println();
System.out.println();
System.out.println(s);
}
I need a way to get the full ByteBuf / byte array once it has fully arrived at the server, and to be notified of that, so my application can respond correctly according to the data the client has sent.
So in short: how can I prevent fragmentation and have the channelRead event deliver the whole message / ByteBuf?
The basic data type used by Netty is Channel Buffers or ByteBuf. This is simply a collection of bytes and nothing else. In your code you have simply used a custom handler to handle the raw incoming data. This is generally not a good practice. A very basic netty pipeline should look something like the following
So a pipeline consists of a decoder / encoder and then we have our custom handlers or logging handlers. We never really handle any raw data as is. TCP is a stream protocol. It does not identify when a specific packet ends and a new packet starts. Even if we send a very very large packet or say two individual packets, they will simply be treated as a set of bytes and when we try to read the raw set of bytes, fragmentation might happen.
So properly implement a channel pipeline which consists of a String decoder / encoder (whatever you need) and this problem will go away.
TCP provides a stream of bytes, so you can't rely on receiving a complete message in one packet. You will need a handler in your pipeline that knows how your messages are framed. Netty provides some built-in handlers that you can adapt for your protocol. See Dealing with a Stream-based Transport in the Netty User Guide.
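To make the framing idea concrete, here is a minimal sketch in plain Java, independent of Netty. It assumes a protocol in which every message is preceded by a 4-byte big-endian length, which the question does not actually state; the class name LengthPrefixedFramer is hypothetical. In a real pipeline, Netty's built-in LengthFieldBasedFrameDecoder covers this kind of protocol.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reassembles length-prefixed messages from arbitrary TCP fragments. */
final class LengthPrefixedFramer {
    private ByteBuffer pending = ByteBuffer.allocate(256); // kept in write mode between calls

    /** Feeds the bytes of one network read and returns every message that is now complete. */
    List<String> feed(byte[] fragment) {
        if (pending.remaining() < fragment.length) {
            // Grow the buffer, carrying over bytes of a not yet complete message.
            int needed = pending.position() + fragment.length;
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(pending.capacity() * 2, needed));
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(fragment);
        pending.flip(); // switch to read mode

        List<String> messages = new ArrayList<>();
        while (pending.remaining() >= 4) {
            int length = pending.getInt(pending.position()); // peek without consuming
            if (length < 0) {
                throw new IllegalArgumentException("negative frame length: " + length);
            }
            if (pending.remaining() - 4 < length) {
                break; // body not fully received yet, wait for the next read
            }
            pending.getInt(); // consume the length prefix
            byte[] body = new byte[length];
            pending.get(body);
            messages.add(new String(body, StandardCharsets.UTF_8));
        }
        pending.compact(); // keep leftovers and return to write mode
        return messages;
    }

    /** Encodes one message the way the sender is assumed to: 4-byte length, then UTF-8 body. */
    static byte[] frame(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }
}
```

The application logic only ever sees the strings returned by feed, so it no longer matters how the network happened to split the byte stream.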
Please advise me how to increase the initial capacity of my ByteBuf, in a situation like this:
@Override
protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
byte[] byteinput = new byte[in.readableBytes()];
in.readBytes(byteinput);
//further handling...
}
If an incoming message is larger than the capacity of the ByteBuf, I get truncated data. It is vital for this project to receive whole, non-chunked messages.
I suppose I need to set the initial capacity of the ByteBuf somewhere in the bootstrap's childOptions, or in channel.config() inside the ChannelInitializer.
And I tried different ways, like setting
ch.config().setReceiveBufferSize(1024)
but I still get the same ByteBuf capacity (e.g. 496).
UPD
I inspected my protocol traffic with Wireshark, and packets of up to 1.4 KB travel uncorrupted to and from my test client. So this is purely a matter of Netty settings; the operating system socket buffer does not cut messages.
That was easy as pie.
ServerBootstrap b = new ServerBootstrap(); // (2)
b.group(bossGroup, workerGroup)
.channel(NioServerSocketChannel.class) // (3)
.childHandler(new ChannelInitializer<SocketChannel>() { // (4)
@Override
public void initChannel(SocketChannel ch) throws Exception {
//decrypt //checknum
ch.config().setRecvByteBufAllocator(new FixedRecvByteBufAllocator(2048)); //set buf size here
ch.pipeline().addLast(new InboundDecryptor());
.
.
.
You may be able to configure Netty's buffer allocation sizes, but there are likely more general limitations you are subject to. Netty is an asynchronous framework. This means it will read whatever is made available to it by the OS and pass that on to you. Netty has no control over network conditions, networking hardware behavior, OS behavior, or anything else in between your producer of data and your Netty application. If your application logic requires complete application-level messages, you may have to aggregate the data before you invoke this application logic. Netty has some convenience classes to help with this: see MessageAggregator.java and, for an HTTP-specific implementation, HttpObjectAggregator.java.
Scott is right. Increasing only the size of buffer does not solve the problem.
ch.config().setRecvByteBufAllocator(new FixedRecvByteBufAllocator(2048));
For HTTP requests you should also use HttpObjectAggregator. It worked for me.
I'm trying to calculate the load on a server I have to build.
I need to create a server which has one million users registered in an SQL database. During a week each user will connect approximately 3-4 times. Each time, a user will upload and download 1-30 MB of data, and it will take maybe 1-2 minutes.
When an upload is complete it will be deleted within minutes.
(Update text removed error in calculations)
I know how to make and query an SQL database but what to consider in this situation?
What you want is exactly Netty. It's an API built on NIO that provides an event-driven model instead of the classic thread model.
It doesn't use a thread per request; instead it puts the requests in a queue. With this tool you can handle up to 250,000 requests per second.
I am using Netty for a similar scenario. It is just working!
Here is a starting point for using netty:
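import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;
public class TCPListener {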
private static ServerBootstrap bootstrap;
public static void run(){
bootstrap = new ServerBootstrap(
new NioServerSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool()));
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
public ChannelPipeline getPipeline() throws Exception {
MyHandler handler = new MyHandler();
ChannelPipeline pipeline = Channels.pipeline();
pipeline.addLast("handler", handler);
return pipeline;
}
});
bootstrap.bind(new InetSocketAddress(9999)); //port number is 9999
}
public static void main(String[] args) throws Exception {
run();
}
}
and MyHandler class:
public class MyHandler extends SimpleChannelUpstreamHandler {
@Override
public void messageReceived(
ChannelHandlerContext ctx, MessageEvent e) {
String remoteAddress = e.getRemoteAddress().toString();
ChannelBuffer buffer = (ChannelBuffer) e.getMessage();
// Now the buffer contains the byte stream from the client.
byte[] output = new byte[0]; // placeholder: suppose output is a filled byte array
ChannelBuffer writebuffer = ChannelBuffers.wrappedBuffer(output); // wraps the array without copying it
e.getChannel().write(writebuffer);
}
@Override
public void exceptionCaught(
ChannelHandlerContext ctx, ExceptionEvent e) {
// Close the connection when an exception is raised.
e.getChannel().close();
}
}
At first I was thinking this many users would require a non-blocking solution but my calculations show that I don't, [am I] right?
On modern operating systems and hardware, thread-per-connection is faster than non-blocking I/O, at least unless the number of connections reaches truly extreme levels. However, for writing the data to disk, NIO (channels and buffers) may help, because it can use DMA and avoid copy operations.
But overall, I also think network bandwidth and storage are your main concerns in this application.
The important thing to remember is that most users do not access a system evenly in every hour of every day of the week. Your system needs to perform correctly during the busiest hour of the week.
Say that in the busiest hour of the week 1/50 of all uploads are made. In that hour each upload could be 30 MB, for a total of 1.8 TB. This means you need Internet bandwidth to support this: 1.8 TB/hour * 8 bits/byte / 60 min/hour / 60 sec/min = 4 Gbit/s.
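The peak-hour figure can be reproduced with a few lines of integer arithmetic. The 60,000 uploads in the busiest hour are an assumption chosen to match the 1.8 TB quoted above (one million users times 3 connections per week, divided by 50); decimal units (1 MB = 10^6 bytes) are assumed as well, and the class name PeakBandwidth is made up for this sketch.

```java
/** Hypothetical helper reproducing the peak-hour bandwidth estimate. */
final class PeakBandwidth {
    /** Bits per second needed to absorb the given number of uploads within one hour. */
    static long requiredBitsPerSecond(long uploadsInPeakHour, long bytesPerUpload) {
        long bytesPerHour = uploadsInPeakHour * bytesPerUpload;
        return bytesPerHour * 8 / 3600; // 8 bits per byte, 3600 seconds per hour
    }

    public static void main(String[] args) {
        long uploads = 1_000_000L * 3 / 50;                       // 60,000 uploads (assumed)
        long bps = requiredBitsPerSecond(uploads, 30_000_000L);   // 30 MB each, worst case
        System.out.println(bps / 1e9 + " Gbit/s");                // prints 4.0 Gbit/s
    }
}
```

Changing the inputs (for example 4 connections per week, or a different peak fraction) shows how sensitive the required bandwidth is to these guesses.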
If for example, you have only a 1 Gbit/s connection, this will limit access to your server.
The other thing to consider is your retention time for these uploads. If each upload is 15 MB on average, you will be getting 157 TB per week or 8.2 PB (8200 TB) per year. You may need a significant amount of storage to retain this.
Once you have spent a significant amount of money on Internet connectivity and disk, the cost of buying a couple of servers is minor. You could use Apache MINA; however, a single server with a 10 Gbit/s connection can support 1 GB easily using any software you care to choose.
A single PC/server/laptop can handle 1,000 I/O threads, so 300-600 is not a lot.
The problem will not be in the software but in the network/hardware you choose.