GZIP compressing HTTP response before sending to the client - Java

I have gzipped the response using a filter. The data has been compressed from 50 MB to 5 MB; however, this didn't save much time. The total time only went down from 12 seconds to 10 seconds. Is there anything else that can be done to reduce it?
Initially the data transfer over the network took 9 seconds; now it takes roughly 6 seconds after compression, plus about 1 second to decompress.
What else can be done?

For the filter itself, the possible measures are few:
There are different compression levels: the stronger the compression, the slower it is. The default level of GZIPOutputStream should be fast enough.
GZIPOutputStream has constructors that take a buffer size.
Use buffered streaming, not byte-wise int read() (a sketch of such a copy loop follows this list).
Code review for plausibility: the original Content-Length header must be removed.
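To make the buffered, block-wise copy concrete, here is a minimal sketch of the compression step such a filter might delegate to. The class and method names are invented for this illustration and are not from the question's filter:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCopy {

    // Compress 'source' onto the raw response stream. The surrounding filter is
    // assumed to have set Content-Encoding: gzip and to have dropped any
    // Content-Length header, since the compressed length differs.
    public static void copyCompressed(InputStream source, OutputStream rawResponse) throws IOException {
        // 32 KB deflater buffer instead of the 512-byte default
        try (GZIPOutputStream gzip = new GZIPOutputStream(rawResponse, 32 * 1024)) {
            byte[] chunk = new byte[8 * 1024];
            int n;
            // block-wise copy avoids per-byte read()/write(int) calls
            while ((n = source.read(chunk)) != -1) {
                gzip.write(chunk, 0, n);
            }
        } // close() writes the GZIP trailer
    }
}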
For the static content:
.bmp files are a waste of space.
.pdf files can be optimized when images repeat, and with respect to embedded fonts.
.docx is a ZIP format, so the image files inside might be optimized too.
For dynamic content generation:
Fixed documents can be stored pre-compressed (xxxxxx.yyy.gz) with a timestamp, so the generation time drops out entirely. This is only of interest after measuring the real bottleneck, which is likely the network.
The delivery code should be fast. In general, chain streams: try not to write to a ByteArrayOutputStream, but immediately to a BufferedOutputStream wrapping the original output stream. Check that the buffering is not done twice; some wrapping streams check whether the wrapped stream is already an instance of a buffered stream (see the small example below).
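As a hedged illustration of that chaining advice; the Generator interface and the variable names here are placeholders, not part of any servlet API:

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamChaining {

    // Stands in for whatever produces the document body.
    public interface Generator {
        void writeTo(OutputStream out) throws IOException;
    }

    // 'rawResponseStream' stands in for something like response.getOutputStream().
    public static void deliver(OutputStream rawResponseStream, Generator generator) throws IOException {
        // One buffering layer directly on the raw stream; do not add a second
        // BufferedOutputStream further up the chain.
        OutputStream out = new BufferedOutputStream(rawResponseStream, 16 * 1024);
        generator.writeTo(out); // write straight through, no ByteArrayOutputStream detour
        out.flush();
    }
}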
Production environment:
Maybe you even need throttling (slowing down delivery) in order to serve multiple simultaneous requests (a crude sketch follows this list).
You may need to do the delivery on another server.
Buy speed from the provider. Ask the provider whether the throughput was too high and whether they slowed things down.
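If throttling does turn out to be necessary, one crude way to pace delivery (purely illustrative; a token bucket or the container's own rate limiting would be smoother) is to sleep between chunks:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ThrottledCopy {

    // Copy 'in' to 'out' at roughly 'maxBytesPerSecond'.
    public static void copy(InputStream in, OutputStream out, int maxBytesPerSecond)
            throws IOException, InterruptedException {
        byte[] chunk = new byte[8 * 1024];
        // pause needed after each full chunk to stay under the target rate
        long pauseMillis = 1000L * chunk.length / maxBytesPerSecond;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            Thread.sleep(pauseMillis);
        }
        out.flush();
    }
}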

Related

How to speed up data transfer over socket?

Currently I am using this code on both the server and the client side. The client is an Android device.
BufferedOutputStream os = new BufferedOutputStream(socket.getOutputStream(), 10000000);
BufferedInputStream sin = new BufferedInputStream(socket.getInputStream(), 10000000);
os.write("10000000\n".getBytes());
os.flush();
for (int i = 0; i < 10000000; i++) {
    os.write((sampleRead[i] + " ").getBytes());
}
os.flush();
The problem is that this code takes about 80 seconds to transfer data from the Android client to the server, while it takes only 8 seconds to transfer the data back from the server to the client. The code is the same on both sides and the buffer is the same as well. I also tried different buffer sizes, but the problem is with this segment:
for (int i = 0; i < 10000000; i++) {
    os.write((sampleRead[i] + " ").getBytes());
}
The buffering takes most of the time, while the actual transfer takes only about 6-7 seconds on a 150 Mbps hotspot connection. What could be the problem and how can I solve it?
First of all, as a commenter has already noted, using a monstrously large buffer is likely to be counterproductive. Once your stream buffer is bigger than the size of a network packet, app-side buffering loses its effectiveness. (The data in your "big" buffer needs to be split into packet-sized chunks by the TCP/IP stack before it goes onto the network.) Indeed, if the app-side buffer is really large, you may find that your data gets stuck in the buffer for a long time waiting for the buffer to fill ... while the network is effectively idle.
(The Buffered... readers, writers and streams are primarily designed to avoid lots of syscalls that transfer tiny amounts of data. Above 10K or so, the buffering doesn't help performance much.)
The other thing to note is that in a lot of OS environments, the network throughput is actually limited by virtualization and default network stack tuning parameters. To get better throughput, you may need to tune at the OS level.
Finally, if your data is going over a network path that is congested, has high end-to-end latency, or includes links with constrained data rates, then you are unlikely to get fast transfers no matter how you tune things.
(Compression might help ... if you can afford the CPU overhead at both ends ... but some data links already do compression transparently.)
You could compress the data transfer; it saves a lot of memory, and transferring a compressed stream of data is cheaper. For that you need to implement compression logic on the client side and decompression logic on the server side - see GZIPInputStream. Also try reducing the buffer size: it is huge for a mobile device.
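A minimal sketch of that suggestion, assuming plain java.net.Socket objects on both ends; the class and method names are made up for the example, and the 8 KB buffers replace the 10 MB ones:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedSocketIo {

    // Client side: everything written through this stream is gzip-compressed.
    public static OutputStream compressedWriter(Socket socket) throws IOException {
        return new GZIPOutputStream(new BufferedOutputStream(socket.getOutputStream(), 8 * 1024));
    }

    // Server side: mirror image, decompressing what the client sent.
    public static InputStream compressedReader(Socket socket) throws IOException {
        return new GZIPInputStream(new BufferedInputStream(socket.getInputStream(), 8 * 1024));
    }
}

Note that the writing side has to call close() (or at least finish()) on the GZIPOutputStream, otherwise the GZIP trailer is never flushed and the reading side blocks.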

High performance file IO in Android

I'm creating an app which communicates with an ECG monitor. Data is read at a rate of 250 samples per second. Each package from the ECG monitor contains 80 bytes and is received 40 times per second.
I've tried using a RandomAccessFile, but packages were lost in both synchronous mode (RandomAccessFile(outputFile, "rws")) and asynchronous mode (RandomAccessFile(outputFile, "rw")).
In a recent experiment I tried using a MappedByteBuffer. This should be extremely performant, but when I create the buffer I have to specify a size, map(FileChannel.MapMode.READ_WRITE, 0, 10485760) for a 10 MB buffer, and this results in a file that is always 10 MB in size. Is it possible to use a MappedByteBuffer where the file size is only the actual amount of data stored?
Or is there another way to achieve this? Is it naive to write to a file this often?
On a side note this wasn't an issue at all on iOS - this can be achieved with no buffering at all.
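For context, the memory-mapped setup described in the question amounts to roughly the following; this is a reconstruction for illustration, not the asker's actual code, and the path handling is assumed:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedEcgWriter {

    // Mapping a READ_WRITE region pre-sizes the backing file to the mapped
    // length (10 MB here), which is why the file is always 10 MB on disk.
    public static MappedByteBuffer openMapped(String path) throws Exception {
        RandomAccessFile file = new RandomAccessFile(path, "rw");
        FileChannel channel = file.getChannel();
        return channel.map(FileChannel.MapMode.READ_WRITE, 0, 10_485_760);
    }
}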

Design: A Java application with high throughput

I have a scenario, in which
A HUGE input file with a specific format, delimited with \n, has to be read; it has almost 20 million records.
Each record has to be read and processed by sending it to a server in a specific format.
=====================
I am thinking about how to design it:
- Read the file (NIO) in chunks.
- The thread that reads the file can put those chunks into a JMS queue.
- Create n threads representing the n servers (to which the data is to be sent); the n threads running in parallel can then pick up one chunk at a time and process it by sending requests to the server.
Can you suggest whether the above is fine, or do you see any flaw(s)? :) It would also be great if you could suggest a better way or better technologies to do this.
Thank you!
Updated: I wrote a program to read that file with 20 million records. Using Apache Commons IO (a file iterator) I read the file in chunks (10 lines at a time), and it read the whole file in 1.2 seconds. How good is this? Should I think about going to NIO? (When I added a log statement to print the chunks, it took almost 26 seconds!)
20 million records isn't actually that many, so first off I would try just processing it normally; you may find the performance is fine.
After that you will need to measure things.
You need to read from the disk sequentially for good speed there, so that part must be single-threaded.
You don't want the disk read waiting for the network, or the network waiting for the disk reads, so dropping the read data into a queue is a good idea. You will probably want a chunk size larger than one line for optimum performance; measure the performance at different chunk sizes to see.
You may find that sending over the network is already faster than reading from disk. If so, you are done; if not, at that point you can spin up more threads reading from the queue and test with them.
So your tuning factors are:
chunk size
number of threads.
Make sure you measure performance over a decent-sized amount of data for various combinations to find the one that works best for your circumstances.
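As a rough sketch of that reader-plus-queue arrangement (the chunk size, queue capacity, and the sendToServer placeholder are all things to measure and fill in, not prescriptions):

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ChunkedPipeline {

    private static final List<String> POISON = new ArrayList<>(); // end-of-input marker

    public static void run(Path input, int chunkSize, int senderThreads) throws Exception {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(100);

        // Consumers: pull chunks off the queue and send them to the server.
        List<Thread> senders = new ArrayList<>();
        for (int i = 0; i < senderThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    List<String> chunk;
                    while ((chunk = queue.take()) != POISON) {
                        sendToServer(chunk); // placeholder for the real network call
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            senders.add(t);
        }

        // Single producer: sequential reading keeps the disk access pattern friendly.
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    queue.put(chunk);
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            if (!chunk.isEmpty()) {
                queue.put(chunk);
            }
        }
        for (int i = 0; i < senderThreads; i++) {
            queue.put(POISON); // one shutdown marker per sender
        }
        for (Thread t : senders) {
            t.join();
        }
    }

    private static void sendToServer(List<String> records) {
        // assumption: whatever protocol the real application uses goes here
    }
}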
I believe you could batch the records instead of sending them one at a time. Given the volume of data that needs to be processed by the server, batching avoids unnecessary network round trips.

DigestInputStream -> compute the hash without slowdown when the consumer part is the bottleneck

I have an application that needs to transfer files to a service like S3.
I have an InputStream for that incoming file (not necessarily a FileInputStream), and I write this InputStream to a multipart request body, which is represented by an OutputStream; I then need to write the hash of the file at the end (also through the request body).
Thanks to DigestInputStream, I'm able to compute the hash on the fly, so after the file body has been sent to the OutputStream, the hash becomes available and can also be appended to the multipart request.
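For reference, the streaming-hash arrangement described above boils down to something like this sketch (not the actual code from the application; the 64 KB chunk size is arbitrary):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashingUpload {

    // Copy 'source' to the request body and return the digest computed on the fly.
    public static byte[] copyAndHash(InputStream source, OutputStream requestBody, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm); // e.g. "MD5" or "SHA-512"
        try (DigestInputStream hashing = new DigestInputStream(source, digest)) {
            byte[] chunk = new byte[64 * 1024];
            int n;
            while ((n = hashing.read(chunk)) != -1) {
                requestBody.write(chunk, 0, n); // hashing happens as a side effect of read()
            }
        }
        return digest.digest(); // available once the body has been streamed
    }
}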
You can check this related question:
What is the less expensive hash algorithm?
And particularly my own benchmark answer:
https://stackoverflow.com/a/19160508/82609
So it seems my machine is capable of hashing with a MessageDigest at a throughput of 500 MB/s for MD5 and nearly 200 MB/s for SHA-512.
The connection to which I write the request body has a throughput of 100 MB/s. If I write to the OutputStream with a higher throughput, the OutputStream starts to block (this is intentional, because we want to keep a low memory footprint and do not want bytes to accumulate in some part of the application).
I have done tests and I can clearly see the impact of the algorithm on the performance of my application.
I tried to upload 20 files of 50 MB each (1 GB total).
With MD5, it takes ~16sec
With SHA-512, it takes ~22sec
When doing a single upload, I can also see a slowdown of the same order.
So in the end there is no parallelisation of the computation of the hash and the write to the connection: these steps are done sequentially:
Request bytes from the stream
Hash the requested bytes
Send the bytes
So, as the hashing throughput is higher than the connection throughput, is there an easy way to avoid that slowdown? Does it require additional threads?
I think the next chunk of data could be read and hashed while the previous chunk is being written to the connection, right?
This is not premature optimization: we need to upload a lot of documents, and the execution time matters for our business.
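One way to get the overlap described in the question is to hand the blocking write to a single background thread and double-buffer the chunks. This is only a sketch of that idea under stated assumptions (one extra thread, 64 KB chunks), not a tested solution:

import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PipelinedHashingUpload {

    // Overlap hashing of one chunk with the (blocking) write of the previous one.
    public static byte[] copyHashPipelined(InputStream source, OutputStream connection, String algorithm)
            throws Exception {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        ExecutorService writer = Executors.newSingleThreadExecutor();
        try {
            byte[][] buffers = { new byte[64 * 1024], new byte[64 * 1024] };
            Future<?> pendingWrite = null;
            int current = 0;
            int n;
            while ((n = source.read(buffers[current])) != -1) {
                final byte[] chunk = buffers[current];
                final int length = n;
                if (pendingWrite != null) {
                    pendingWrite.get(); // the other buffer is free to reuse again
                }
                // the slow, blocking write happens on the background thread ...
                pendingWrite = writer.submit(() -> {
                    connection.write(chunk, 0, length);
                    return null;
                });
                // ... while the main thread hashes the same chunk in parallel
                digest.update(chunk, 0, length);
                current = 1 - current; // read the next chunk into the other buffer
            }
            if (pendingWrite != null) {
                pendingWrite.get();
            }
            connection.flush();
            return digest.digest();
        } finally {
            writer.shutdown();
        }
    }
}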

I want to use Java to calculate the network speed, but I don't know how to do it

I thought of using file downloads to calculate the speed, but it turned out to be unsuccessful. The procedure was as follows:
I download a file and read its size every second, while observing the network speed with a small tool at the same time. I finally found that the file only grows by about 300 KB/s, while the tool shows the JVM downloading at speeds of up to 4 M/s.
I am out of ideas now, and I need your help.
When you are looking at the amount of actual data, you are usually measuring in bytes (8 bits), and that is without TCP/IP headers (which can be 54 bytes). When you are looking at the raw connection, you are measuring in bits and including the headers. If the packets are fairly small (i.e. with a significant header overhead), you can have a 4 Mb/s (b for bit) connection and only 300 kB/s (B for byte, or octet) of actual data.
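To make the bit/byte/header arithmetic concrete, here is a small illustrative calculation; the 100-byte payload per packet is an assumption chosen only to show the effect:

public class ThroughputMath {

    public static void main(String[] args) {
        double linkBitsPerSecond = 4_000_000;               // what the monitoring tool reports: 4 Mb/s
        double linkBytesPerSecond = linkBitsPerSecond / 8;  // 500,000 B/s on the wire

        int headerBytes = 54;    // typical Ethernet + IP + TCP header overhead per packet
        int payloadBytes = 100;  // assumed (small) payload per packet
        double efficiency = (double) payloadBytes / (payloadBytes + headerBytes);

        double payloadBytesPerSecond = linkBytesPerSecond * efficiency;
        // With these assumed numbers this prints roughly 325 kB/s of file data,
        // the same order as the ~300 KB/s file growth observed in the question.
        System.out.printf("~%.0f kB/s of actual data%n", payloadBytesPerSecond / 1000.0);
    }
}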
