Large byte arrays over RMI - java

I'm currently trying to find out whether it's a good idea to transfer rather large byte arrays (up to 50 MB) over RMI.
I have read that it is slow and that the data has to be held in memory on both the client and the server. This could become a problem when there are multiple concurrent calls.
Are there any (simple) alternatives to this?

RMIIO lets you stream data over RMI in a chunked fashion.
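A minimal sketch along the lines of the RMIIO documentation examples (the class names are from the com.healthmarketscience.rmiio package; check the library's docs for the exact API in your version):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import com.healthmarketscience.rmiio.RemoteInputStream;
import com.healthmarketscience.rmiio.RemoteInputStreamClient;
import com.healthmarketscience.rmiio.RemoteInputStreamServer;
import com.healthmarketscience.rmiio.SimpleRemoteInputStream;

public class FileStreamingExample {

    // Server side: wrap a local stream and export a remote handle to it.
    public RemoteInputStream getFile(String name) throws IOException {
        RemoteInputStreamServer server =
                new SimpleRemoteInputStream(new BufferedInputStream(new FileInputStream(name)));
        try {
            RemoteInputStream result = server.export();
            server = null;  // handed off to the client successfully
            return result;
        } finally {
            if (server != null) server.close();  // clean up if the export failed
        }
    }

    // Client side: wrap the remote handle back into an ordinary InputStream
    // and read it in small chunks, never holding the whole payload in memory.
    public void readFile(RemoteInputStream remote) throws IOException {
        try (InputStream in = RemoteInputStreamClient.wrap(remote)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // process buffer[0..n)
            }
        }
    }
}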
EDIT: you can also use Kryo to serialize and compress the object before sending it across the wire.

RMI is intended to transfer objects. If you have a byte array object on the server and want it on the client, you must have it in both places until it has been delivered successfully (then you can let the original go away).
A more reasonable approach might be repeated calls against a remote object, transferring only a small chunk at a time (see the sketch below). This in turn requires multiple round trips, which makes it slower.
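For illustration, a sketch of such a chunked interface (the names ChunkedData and readChunk are hypothetical, not an existing API):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: the client pulls the data a small piece at a time.
public interface ChunkedData extends Remote {
    long length() throws RemoteException;

    // Returns at most maxBytes starting at offset; a shorter (or empty) array
    // signals that the end of the data has been reached.
    byte[] readChunk(long offset, int maxBytes) throws RemoteException;
}

The client would then loop over readChunk(pos, 64 * 1024) until pos reaches length(), so only one modest chunk is in memory per call, at the cost of one round trip per chunk.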
What is the actual (non-technical) problem you want to solve?

Consider Java streams that support compression for sending and receiving large amounts of data.
For instance, GZIPOutputStream to send the data and GZIPInputStream to receive it.
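A sketch assuming the data travels over a plain socket (GZIPOutputStream and GZIPInputStream are the real classes from java.util.zip; the socket setup is an assumption):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTransfer {

    // Sender: compress the bytes as they are written to the socket.
    public static void send(Socket socket, byte[] data) throws IOException {
        GZIPOutputStream out = new GZIPOutputStream(socket.getOutputStream());
        out.write(data);
        out.finish();  // flush the final compressed block without closing the socket
    }

    // Receiver: decompress while reading, one small chunk at a time.
    public static byte[] receive(Socket socket) throws IOException {
        GZIPInputStream in = new GZIPInputStream(socket.getInputStream());
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            result.write(buffer, 0, n);
        }
        return result.toByteArray();
    }
}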

It's a very bad idea. The byte array has to be formed in memory before calling the remote method; then it has to be transmitted in the call; then it has to be read by the server; then it has to exist in the server; then it can be processed by the server. You never want to deal with data items this large in a single chunk. It wastes both time and space. Consider a streaming API where you can use moderate sized buffers at both ends; send the data in chunks that are convenient to the sender; and receive it in chunks that are convenient to the receiver.
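A sketch of that kind of streaming copy; whatever the source and sink actually are (file, socket, servlet response), only a small fixed buffer is ever held in memory:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class StreamCopy {

    // Copies arbitrarily large data while holding only 8 KB at a time.
    public static void copy(InputStream source, OutputStream sink) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        int read;
        while ((read = source.read(buffer)) != -1) {
            sink.write(buffer, 0, read);
        }
        sink.flush();
    }
}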

Related

Difference in transferring data between pipes and serialization in Java and C?

I am studying interprocess communication methods in the course Operating System Concepts.
I don't really understand the mechanism of transferring data. In the case of the pipe method, a conduit is created between two processes to transfer byte streams, right?
And how about serialization?
I know serialization is the method of converting an object into a byte stream for transfer, and we can rebuild the object when it reaches the destination.
So in which cases do we use serialization or pipes to transfer data?
What are the advantages and disadvantages of each?
Can anyone explain the underlying mechanism of transferring data with these methods in depth? And are these mechanisms different between Java and C, or are they the same?
Thanks in advance.
There are two basic types of pipe in UNIX/Linux: a named pipe and an anonymous one.
An anonymous pipe is created by the "pipe()" system call, which returns two file descriptors associated with a newly created pipe, one for writing data, the other for reading from it. The shell uses anonymous pipes to connect the standard output of one process to the standard input of another when you connect two processes with the "|" operator.
A named pipe appears as a file in the file system, and can be opened with the normal "open()" system call.
In blocking mode (the default), the process that reads from the pipe will block until data appears there; the writer can then send data which will appear as a byte stream to the reader.
The important fact here is that the data that is transferred is a byte stream. The sender and receiver of the data must agree on a protocol to determine how to interpret the bytes. One typical method for this is serialization. Consider a 32-bit integer: 4 bytes. Some systems store the most significant byte first (known as big-endian); some store the least significant byte first (little-endian, such as x86). When transmitting such data across a network, serialization is important, since it is entirely possible that each end stores the data in a different order.
But even when transmitting data between two processes on the same host, serialization helps. It can be used to encapsulate objects so that the receiver knows when it has received everything. For example, with our 32-bit integer, if the receiver doesn't know it is expecting an integer and gets 3 bytes (the 4th having been delayed by scheduling), it must know that it needs to wait before continuing.
None of this is particular language specific, save that some languages have built in support for serialization. Java is one such language (see ObjectInputStream and ObjectOutputStream). If you are trying to move data between Java and C programs, and on the Java side you want to use these classes, then you'll need to understand the serialization protocol used by them.
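For instance, a small sketch of Java's built-in serialization pushed through an in-process pipe; PipedOutputStream/PipedInputStream stand in here for whatever byte channel actually connects the two sides:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeSerializationDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut);

        // "Sender": serialize an object into the pipe on one thread.
        Thread sender = new Thread(() -> {
            try (ObjectOutputStream out = new ObjectOutputStream(pipeOut)) {
                out.writeObject("hello from the other side");
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        sender.start();

        // "Receiver": rebuild the object from the byte stream.
        try (ObjectInputStream in = new ObjectInputStream(pipeIn)) {
            String message = (String) in.readObject();
            System.out.println(message);
        }
    }
}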
Another common serialization format is JSON (JavaScript Object Notation), for which there are several good libraries in both C and Java.
I don't really understand the mechanism of transferring data. In the case of the pipe method, a conduit is created between two processes to transfer byte streams, right?
A named or anonymous pipe is a stream, rather like a socket connection over loopback. In fact, in some OSes it is implemented by the same drivers/library.
And how about Serialization?
How serialization is done is not language specific; you can serialize data in a format that can be shared between C and Java.
What are the advantages and disadvantages of each?
There are many forms of serialization, and this is too broad a topic to cover in one answer. You could write an entire thesis on it.
Can anyone explain the underlying mechanism of transferring data with these methods in depth?
There isn't much to it. A block of data is copied to memory managed by the OS, and this buffered data can be read by another program (or the same one).
And are these mechanisms different between Java and C, or are they the same?
They both use the same OS calls to do the real work. The Java API hides this fact from you and makes it more Java friendly, but they are the same.

How to handle a big object sent over a socket in Java?

I am trying to implement a client-server application in Java.
I made the connection between them using sockets
and I am sending JSON objects as strings over the streams.
If I have a big object, is there a way to handle it so I don't have to regroup it because of the TCP packet size limit? (I can't know when a single object has been fully transferred to me as the client or not.)
Note: I am using Gson to convert objects to JSON.
If I have a big object, is there a way to handle it so I don't have to regroup it because of the TCP packet size limit? (I can't know when a single object has been fully transferred to me as the client or not.)
Actually, the client can know when it has received a complete JSON object. When your client sees the } that matches the opening {, you have the complete object. Of course, this means that your client needs to understand JSON syntax, but you can use an off-the-shelf JSON parser to do that.
So the best way to do this is for the server to generate and send the JSON, and the client to parse the socket input stream using a normal JSON parser. If you do it that way, then you don't need to know whether the TCP/IP stack has broken the data stream into multiple packets. By the time the JSON parser sees them, they will have been reassembled into a stream of bytes.
If this doesn't answer your question, we need to see what your code is currently doing to generate and send the JSON on the server side.
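As a sketch of the client side with Gson (which the question already uses): JsonStreamParser reads successive top-level JSON values straight off the socket's reader, so the client never has to find object boundaries itself. The Message class below is a hypothetical placeholder for whatever is actually being sent.

import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

import com.google.gson.Gson;
import com.google.gson.JsonElement;
import com.google.gson.JsonStreamParser;

// Hypothetical payload type.
class Message {
    String id;
    String body;
}

class JsonSocketClient {
    void readObjects(Socket socket) throws Exception {
        Gson gson = new Gson();
        JsonStreamParser parser = new JsonStreamParser(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        // hasNext() blocks until a complete top-level JSON value has arrived.
        while (parser.hasNext()) {
            JsonElement element = parser.next();
            Message message = gson.fromJson(element, Message.class);
            System.out.println("received " + message.id);
        }
    }
}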
is there a way to handle it so I don't have to regroup it because of the TCP packet size limit
You don't have to care about the size of TCP packets. Just write the data. TCP will segment and packetize it for you.
(I can't know when a single object has been fully transferred to me as the client or not.)
Yes you can: you have the complete object when you reach the closing '}', as @StephenC mentions. Your JSON parser should take care of that for you in any case.
Your question is founded on false assumptions.

Accessing external TCP stream

Theory:
Let's say I have an application A, written in Java, that uses a TCP stream for client/server communication (it's on the client end in the relationship). Now, purely as an experiment, I am trying to create an application B, written in VB.NET, that would serve as a proxy for application A's network stream, allowing app B to read and write to the stream.
Is it, at all, possible to access such a network stream from another application, also taking the language boundary into account?
Your question is pretty vague, but if you're asking about the possibility of making a proxy server, then yes, it's possible. The language doesn't matter, but the interface does (the way that the content in the stream is encoded). For instance, Java typically serializes things into a stream using big endian (most significant byte of each byte sequence sent first), whereas .NET uses little endian (least significant byte of each byte sequence sent first). Again though, as long as you're aware of how the data is actually encoded into those streams, you can write a decent proxy server. If all that your proxy server will be doing is passing along data without caring what the data is, then you can just read a byte from one stream and write it to the other. But if you're actually reading values (integers, strings, pictures, etc.), then you will be dealing with the endianness issues, because Java and VB.NET's default stream readers will read and write integers differently, etc.
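A minimal sketch of the pass-along case in Java (one direction only; a real proxy would run a second relay for the other direction, and the port numbers and host name are placeholders):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class PassThroughProxy {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(9000);            // where application A connects
             Socket client = listener.accept();
             Socket server = new Socket("real-server.example", 7777)) { // the real server
            relay(client.getInputStream(), server.getOutputStream());
            // A complete proxy would also relay(server -> client) on a second thread.
        }
    }

    // Forward bytes untouched; the proxy never needs to understand the encoding.
    static void relay(InputStream from, OutputStream to) throws Exception {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = from.read(buffer)) != -1) {
            to.write(buffer, 0, n);
            to.flush();
        }
    }
}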
There will be some complications if you want to actually edit the data instead of simply passing it along. You'll have to deal with the client's and server's reactions to strange network behavior. For instance, if Client A is a video game, and Proxy B injects a message to the server to "join the game", then you'll have to deal with the fact that the server is going to send "ok, you've joined the game". When the client receives that message, it will most likely ignore it, because it had no knowledge that the proxy tried to join the game on its behalf, and will just assume the server made a mistake.

Large byte array transfer to client

Let me present my situation.
I have a lot of data, in bytes, stored in files on the server. I am writing and reading these files using the AIO that is coming in JDK 7. Thus, I am using ByteBuffers for the read and write operations.
The question is: once I have performed a read on an AsynchronousFileChannel, I want to transfer the content of the ByteBuffer that was used in the read operation to the client. So I actually want to send the bytes.
What would be the best way to go from here? I don't want to send the ByteBuffer itself, because I have a pool of them that I reuse, so that is not an option. I would also like to be able to combine several reads and send the contents of several ByteBuffers at once.
So what do I send? Just a byte[] array? Or do I need some kind of stream? What would be the best solution regarding performance here?
I am using RMI for communication.
Thanks in advance.
You can simulate streams over RMI using the RMIIO library, which will allow you to stream arbitrary amounts of bytes via RMI without causing memory problems on either end.
(Disclaimer: I wrote the library.)
Unless there is a very good reason not to, just send the byte array along with enough metadata that you can provide a reliable service.
The less of the underlying implementation you need to transfer back and forth over RMI, the better, especially when you are working with Java 7, which is not yet generally available.
To use RMI you have to retrieve the contents of the buffer as a byte[], then write it to an ObjectOutputStream (the write happens under the covers). Assuming that you're currently using direct buffers, this means CPU time to create the array in the Java heap, and CPU time to garbage-collect that array once it's been written, and the possibility that the stream will hold onto the reference too long, causing an out-of-memory error.
A better approach, in my opinion, is to open a SocketChannel to the destination and use it to write the buffer's contents. Of course, to make this work you'll need to write additional data describing the size of the buffer, and this will probably evolve into a communication protocol.
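A sketch of that idea: a tiny framing protocol with a 4-byte length header followed by the buffer contents, written directly from the ByteBuffer over a SocketChannel (the host, port, and framing are assumptions, not a fixed protocol):

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

class BufferSender {

    // Sends one length-prefixed frame: 4-byte big-endian size, then the payload.
    static void sendFrame(SocketChannel channel, ByteBuffer payload) throws Exception {
        ByteBuffer header = ByteBuffer.allocate(4);
        header.putInt(payload.remaining());
        header.flip();
        while (header.hasRemaining()) {
            channel.write(header);
        }
        while (payload.hasRemaining()) {
            channel.write(payload);   // works for direct buffers too, no copy into a heap byte[]
        }
    }

    public static void main(String[] args) throws Exception {
        try (SocketChannel channel =
                     SocketChannel.open(new InetSocketAddress("client-host.example", 9999))) {
            ByteBuffer data = ByteBuffer.wrap("example payload".getBytes(StandardCharsets.UTF_8));
            sendFrame(channel, data);
        }
    }
}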

What is the fastest way to output a large amount of data?

I have a JAX-RS web service that calls a DB2 z/OS database and returns about 240 MB of data in a result set. I am then creating an OutputStream to send this data to the client by looping through the result set and adding a few XML tags for my output.
I am confused about whether to use PrintWriter, BufferedWriter, or OutputStreamWriter. I am looking for the fastest way to deliver the data. I also don't want the JVM to hold onto this data any longer than it needs to, so I don't use up its memory.
Any help is appreciated.
You should use a BufferedWriter, call .flush() frequently, and enable gzip for the best compression.
Also start thinking about a different way of doing this: can your data be paginated? Do you need all the data in one request?
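A sketch along those lines with JAX-RS StreamingOutput and a BufferedWriter, flushing periodically so the JVM never buffers the full 240 MB (the resource path, the row-fetching method, and the XML element names are placeholders):

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.StreamingOutput;

@Path("/export")
public class ExportResource {

    @GET
    @Produces("application/xml")
    public StreamingOutput export() {
        return output -> {
            try (BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(output, StandardCharsets.UTF_8))) {
                writer.write("<rows>");
                int count = 0;
                for (String row : fetchRows()) {          // placeholder for the DB2 result set loop
                    writer.write("<row>" + row + "</row>");
                    if (++count % 1000 == 0) {
                        writer.flush();                   // push data to the client, keep heap use flat
                    }
                }
                writer.write("</rows>");
            }
        };
    }

    private Iterable<String> fetchRows() {
        // Placeholder: in the real service this would iterate over the JDBC ResultSet.
        return Arrays.asList("a", "b", "c");
    }
}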
If you are sending large binary data, you probably don't want to use XML. When XML is used, binary data is usually represented as Base64, which is larger than the original binary and uses quite a lot of CPU for the conversion.
If I were you, I'd send the binary separately from the XML. If you are using a web service, an MTOM attachment could help. Otherwise you could send a reference to the binary data in the XML and let the app download the binary data separately.
As for the fastest way to send binary, if you are using WebLogic, just writing to the response's output stream would be fine. That output stream is most probably buffered, and whatever you do probably won't change the performance anyway.
Turning on gzip could also help, depending on what you are sending (if you are sending JPEGs or other already-compressed data it won't help much, but raw text can compress a lot).
One solution (which might not work for you) is to spawn a job/thread that creates a file and then notifies the user when the file is ready to download. This way you're not tied to the bandwidth of the client connection (and you can even compress the file properly before the client downloads it).
Some business intelligence and data-crunching applications do this, especially if the process takes some time to generate the data.
The maximum output speed will be limited by the network bandwidth, and I am sure any Java OutputStream will be fast enough that you won't notice a difference.
The choice depends on the data to send: if it is text (lines), PrintWriter is easy; if it is a byte array, take an OutputStream.
To avoid holding too much data in the buffers, you should call flush() every x KB or so.
You should never use PrintWriter to output data over a network. First of all, it creates platform-dependent line breaks. Second, it silently catches all I/O exceptions, which makes it hard for you to deal with those exceptions.
And if you're sending 240 MB as XML, then you're definitely doing something wrong. Before you start worrying about which stream class to use, try to reduce the amount of data.
EDIT:
The advice about PrintWriter (and PrintStream) came from a book by Elliotte Rusty Harold. I can't remember which one, but it was a few years ago. I think that ServletResponse.getWriter() was added to the API after that book was written - so it looks like Sun didn't follow Rusty's advice. I still think it was good advice - for the reasons stated above, and because it can tempt implementation authors to violate the API contract in order to get predictable behavior.
