Socket streaming in Java - java

When continuously writing and reading sets of data through a socket, how do you recognize the end of one set and the start of the next, and how do you tell whether the entire set is already in the stream for retrieval rather than just a piece of it?
To make things simple let's say I'm sending JSON strings through the socket. How do I know if the whole object is there, and get that object from start to finish so I can correctly read it? Keep in mind there may be more objects behind this one.

That depends. If you use an ObjectOutputStream, Java takes care of this for you. Obviously this is Java-specific and requires an ObjectInputStream on the other side. It also expects you to send serializable objects. String, however, is serializable, and in general I would expect most standard data structures to be serializable as well.
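As a rough illustration of the ObjectOutputStream route, here is a minimal sketch that writes several JSON strings as separate objects and reads them back one at a time. The class name ObjectStreamDemo and its methods are made up for this example, and in-memory byte arrays stand in for the socket; with a real connection you would wrap socket.getOutputStream() and socket.getInputStream() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: byte arrays stand in for the socket streams.
final class ObjectStreamDemo {

    /** Writes each message as one serialized object; the stream format marks the boundaries. */
    static byte[] writeAll(List<String> messages) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (String message : messages) {
                out.writeObject(message);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads whole objects back, one per readObject() call, until the stream ends. */
    static List<String> readAll(byte[] data) throws IOException, ClassNotFoundException {
        List<String> messages = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                try {
                    messages.add((String) in.readObject());
                } catch (EOFException endOfStream) {
                    break; // no more objects on the stream
                }
            }
        }
        return messages;
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = writeAll(Arrays.asList("{\"id\":1}", "{\"id\":2}"));
        for (String message : readAll(wire)) {
            System.out.println(message);
        }
    }
}
```

Each readObject() call returns exactly one complete object or blocks until it has arrived, so the reader never sees half a message.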
Otherwise you will have to think of some kind of container format yourself. Nowadays it is also pretty common to serialize the data into XML structures. If you go to an even higher level, you get to the point of using web services.
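One simple container format, shown here only as a sketch, is to prefix every message with its length in bytes. The reader then knows exactly how many bytes belong to the current JSON string and where the next one starts. The class name LengthPrefixedFraming, the method names and the 16 MiB limit are assumptions made for this example; byte arrays again stand in for the socket streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: frames each message as [4-byte length][UTF-8 payload].
final class LengthPrefixedFraming {

    // Assumed upper bound so a corrupt or hostile length cannot exhaust memory.
    private static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

    /** Writes one message as a 4-byte big-endian length followed by the UTF-8 payload. */
    static void writeMessage(DataOutputStream out, String json) throws IOException {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    /** Reads one complete message, or returns null if the stream ends before a length prefix is read. */
    static String readMessage(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException endOfStream) {
            return null;
        }
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("Invalid frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until all 'length' bytes have arrived
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeMessage(out, "{\"id\":1}");
        writeMessage(out, "{\"id\":2,\"name\":\"second\"}");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        String message;
        while ((message = readMessage(in)) != null) {
            System.out.println(message);
        }
    }
}
```

Because readFully() keeps reading until the whole payload is present, a message that arrives in several pieces is still returned as one string. A delimiter such as a newline between messages is another common choice, provided the payload can never contain the delimiter.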

Related

Java Socket Programming: Send Object as CSV or Serialized Object?

I am just getting into writing networked code using Sockets in Java. I'm just making some test programs. Originally I was going to send data as comma separated values, but I recently discovered ObjectOutputStream. Which method would be faster or more bandwidth efficient? For example, if I'm making a game where I have to send x and y coordinates very often, should I send them through PrintWriter separated by a comma, or make a Position class and send an instance over ObjectOutputStream? What if I change my code and need to send a lot more data?
What are the pros and cons of sending data as CSV over PrintWriter vs as fields in an object over ObjectOutputStream?
An ad-hoc binary format has a good chance of being more bandwidth-efficient than the default serialization format, which should be more or less as bandwidth-efficient as a text-based format (but that's a wild guess, and it depends on the nature and amount of data: you should measure it if it matters).
But bandwidth efficiency is not the only thing that matters.
Using serialization, the client and the server must be written in Java, and have the classes of the serialized objects in their classpath. If you intend to have clients written in any language, you shouldn't consider it.
If serialization is OK, it's of course a really easy way to transform almost any Java object into bytes, which allows you to avoid defining a format.
Note that there are alternatives that provide almost the same flexibility, but don't have the Java-only disadvantage of serialization. For example, JSON, XML, or protobuf.
I think CSV is smaller.
If you want to check the data size, try writing the output to a file.
I also don't recommend ObjectOutputStream for another reason: you have to keep the serialized classes compatible between versions.
Have you looked into serialization and serialVersionUID?
Please check java.io.Serializable.
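To measure rather than guess, a small sketch like the following can compare the encoded size of one position in the three formats discussed. The Position class and the EncodingSizes helper are hypothetical, and the output is captured in memory instead of a file; the numbers only apply to this tiny example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that reports how many bytes each encoding produces.
final class EncodingSizes {

    /** Hypothetical value class; serialVersionUID pins the version of its serialized form. */
    static final class Position implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Position(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Size of one "x,y" line as a PrintWriter would send it in UTF-8. */
    static int csvSize(Position p) {
        return (p.x + "," + p.y + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size of an ad-hoc binary format: two 4-byte ints. */
    static int binarySize(Position p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return bytes.size();
    }

    /** Size of the first object written to a fresh ObjectOutputStream. */
    static int serializedSize(Position p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Position p = new Position(10, 20);
        System.out.println("CSV bytes:        " + csvSize(p));
        System.out.println("Binary bytes:     " + binarySize(p));
        System.out.println("Serialized bytes: " + serializedSize(p));
    }
}
```

Keep in mind that the first object written to an ObjectOutputStream also carries the stream header and the class description, while later objects of the same class on the same stream are smaller, so it is worth measuring with a realistic number of messages.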

Difference in transferring data between Pipe and Serialization in Java and C?

I am studying interprocess communication methods in an Operating System Concepts course.
I don't really understand the mechanism for transferring data. With a pipe, a conduit is created between two processes to transfer byte streams, right?
And how about serialization?
I know serialization is a way to convert an object into a byte stream for transfer, so that the object can be rebuilt when it reaches the destination.
So in which cases do we use serialization or a pipe to transfer data?
What are the advantages and disadvantages of each?
Can anyone explain the mechanism these methods use to transfer data in depth? And are these mechanisms different between Java and C, or are they the same?
Thanks in advance.
There are two basic types of pipe in UNIX/Linux: a named pipe and an anonymous one.
An anonymous pipe is created by the "pipe()" system call, which returns 2 file descriptors associated with a newly created pipe, one for writing data, the other for reading from it. The shell uses anonymous pipes to connect the standard output of one process to the standard input of another when you connect two processes with the "|" operator.
A named pipe appears as a file in the file system, and can be opened with the normal "open()" system call.
In blocking mode (the default), the process that reads from the pipe will block until data appears there; the writer can then send data which will appear as a byte stream to the reader.
The important fact here is that the data that is transferred is a byte stream. The sender and receiver of the data must agree on a protocol to determine how to interpret the bytes. One typical method for this is serialization. Consider a 32-bit integer: 4 bytes. Some systems store the most significant byte first (known as big-endian), while others store the least significant byte first (little-endian systems, such as x86). When transmitting such data across a network, serializing it into an agreed byte order is important, since it is entirely possible that each end stores the data in a different order.
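The byte-order point can be seen directly with java.nio.ByteBuffer, which lets you pick the order explicitly. This is only a small sketch; the class name EndianDemo and its two methods are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of encoding the same 32-bit integer in both byte orders.
final class EndianDemo {

    /** Encodes a 32-bit integer as 4 bytes in the requested byte order. */
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Decodes 4 bytes back into an integer, interpreting them in the given byte order. */
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    private static String hex(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        for (byte b : bytes) {
            text.append(String.format("%02x ", b));
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] big = encode(value, ByteOrder.BIG_ENDIAN);
        byte[] little = encode(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println("big-endian:    " + hex(big));    // 01 02 03 04
        System.out.println("little-endian: " + hex(little)); // 04 03 02 01
        // Reading with the wrong order silently yields a different number.
        System.out.println(Integer.toHexString(decode(big, ByteOrder.LITTLE_ENDIAN))); // 4030201
    }
}
```

Java's DataOutputStream and DataInputStream always use big-endian order, so a C program on an x86 machine that reads or writes raw ints has to convert, for example with htonl() and ntohl().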
But even when transmitting data between two processes on the same host, serialization helps. It can be used to encapsulate objects so that the receiver knows when it has received everything. For example, with our 32 bit integer, if the receiver doesn't know it is expecting an integer, and gets 3 bytes (the 4th having been delayed by some scheduling), it must know that it needs to wait before continuing.
None of this is particularly language specific, save that some languages have built-in support for serialization. Java is one such language (see ObjectInputStream and ObjectOutputStream). If you are trying to move data between Java and C programs, and on the Java side you want to use these classes, then you'll need to understand the serialization protocol used by them.
Another common serialization technique is JSON (JavaScript Object Notation), for which there exist several good libraries in C and Java.
I don't really understand the mechanism for transferring data. With a pipe, a conduit is created between two processes to transfer byte streams, right?
A named or anonymous pipe is a stream, rather like a socket connection over loopback. In fact, in some OSes it is implemented by the same drivers/libraries.
And how about serialization?
How serialization is done is not language specific, and you can serialize data in a manner that can be shared between C and Java.
What are the advantages and disadvantages of each?
There are many forms of serialization and this is too broad a topic to cover in one answer. You could do an entire thesis on it.
Can anyone explain the mechanism these methods use to transfer data in depth?
There isn't much to it. A block of data is copied to memory managed by the OS, and this buffered data can then be read by another program (or the same one).
And are these mechanisms different between Java and C, or are they the same?
They both use the same OS calls to do the real work. The Java API hides this fact from you and makes it more Java friendly, but they are the same.

How to persist large strings in a POJO?

If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern along the lines of using POJOs with references to external streams, I would suggest storing in your POJO something akin to a URI that tells where to find the stream (like a row ID in a database or a filename or a URI) rather than storing an instance of the stream itself. In doing so, you'll reduce the number of open file handles, prevent potential concurrency issues, and will be able to serialize those objects locally if needed without having to duplicate the raw data persisted elsewhere.
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live on some storage, like a file. That is, your object will store a pointer (e.g. a file path) to the storage, and every time someone accesses it, you open a stream or create an iterator and let that party read. Note also that in order to save on memory, whoever consumes it has to make sure not to store the whole content in memory.
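A minimal sketch of that idea, assuming the content lives in a UTF-8 text file: the POJO keeps only the path and hands out a fresh reader on each access. The class name StoredDocument and its fields are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical POJO that stores where the content lives instead of the content itself.
final class StoredDocument {
    private final String title;
    private final Path contentLocation;

    StoredDocument(String title, Path contentLocation) {
        this.title = title;
        this.contentLocation = contentLocation;
    }

    String getTitle() {
        return title;
    }

    Path getContentLocation() {
        return contentLocation;
    }

    /** Opens a new reader on every call; the caller is responsible for closing it. */
    BufferedReader openContent() throws IOException {
        return Files.newBufferedReader(contentLocation, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("stored-document", ".txt");
        Files.write(file, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        StoredDocument document = new StoredDocument("example", file);
        // Process the content line by line without holding all of it in memory.
        try (BufferedReader reader = document.openContent()) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line.length());
            }
        }
        Files.delete(file);
    }
}
```

The object itself stays small and can be passed between layers freely; only the code that actually needs the content pays for reading it.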
However, 50KB or 1MB is really tiny. Unless you have gigabytes (or maybe hundreds of megabytes) of data, I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.

Java using streams as sort of "buffers"

I'm working with a library that I have to provide an InputStream and a PrintStream. It uses the InputStream to gather data for processing and the PrintStream to provide results. I'm stuck using this library and its API cannot be altered.
There are two issues with this that I think have related solutions.
First, the data that needs to be read via the InputStream is not available upfront. Instead, the data is dynamically created by a different part of the application and given to my code as a String via method call. My code's job is to somehow allow the library to read this data through the InputStream provided as I get it.
Second, I need to somehow get the result that is written to the PrintStream and send it to another part of the application as a String. This needs to happen as soon as possible after the data is put into the PrintStream.
What it looks like I need are two stream objects that behave more or less like buffers. I need an InputStream that I can shove data into whenever I have it and a PrintStream whose contents I can grab whenever it has some. This seems a little awkward to me, but I'm not sure how else to do it.
I'm wondering if anything already exists that allows this kind of behavior or if there is a different (better) solution that will work in the situation I've described. The only thing I can come up with is to try to implement streams with this behavior, but that can become complicated fast (especially since the InputStream needs to block until data is available).
Any ideas?
Edit: To be clear, I'm not writing the library. I'm writing code that is supposed to provide the library with an InputStream to read data from and a PrintStream to write data to.
It looks like both streams need to be read from and written to constantly, so you'll need two threads that are independent of each other. The pattern resembles JMS a little: you feed information to an input "queue" or "topic", wait for it to be processed, and the result is then put on an "output" queue/topic. This may introduce additional moving parts, but you could write a simple client that places data onto a JMS queue, then have a listener that grabs messages and feeds them to the input stream constantly. Another piece of code would then read from the output stream and do what you need with it.
Hope this helps.
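For comparison, the same two-thread arrangement can be sketched without JMS, using the JDK's PipedInputStream and PipedOutputStream as the blocking "buffers". This is a different technique from the one suggested above, and the StreamBridge class, its method names and the upper-casing stand-in for the library are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical bridge: your code pushes Strings in and reads result lines out,
// while the library only ever sees an InputStream and a PrintStream.
final class StreamBridge {
    private final PipedOutputStream feed = new PipedOutputStream();
    private final PipedInputStream libraryInput;
    private final PrintStream libraryOutput;
    private final BufferedReader results;

    StreamBridge() throws IOException {
        libraryInput = new PipedInputStream(feed);
        PipedInputStream resultSource = new PipedInputStream();
        // autoFlush = true so println() output becomes readable immediately
        libraryOutput = new PrintStream(new PipedOutputStream(resultSource), true, "UTF-8");
        results = new BufferedReader(new InputStreamReader(resultSource, StandardCharsets.UTF_8));
    }

    InputStream inputForLibrary() { return libraryInput; }

    PrintStream outputForLibrary() { return libraryOutput; }

    /** Reader for whatever the library prints; readLine() blocks until a line is available. */
    BufferedReader results() { return results; }

    /** Makes the given text available to the library; its read() unblocks once data arrives. */
    void push(String data) throws IOException {
        feed.write(data.getBytes(StandardCharsets.UTF_8));
        feed.flush();
    }

    /** Signals end of input to the library. */
    void finishInput() throws IOException {
        feed.close();
    }

    public static void main(String[] args) throws Exception {
        StreamBridge bridge = new StreamBridge();
        // Stand-in for the third-party library, running on its own thread.
        Thread library = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(bridge.inputForLibrary(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    bridge.outputForLibrary().println(line.toUpperCase());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        library.start();

        bridge.push("hello\n");
        System.out.println(bridge.results().readLine());
        bridge.finishInput();
        library.join();
    }
}
```

Piped streams are meant to be used from two different threads; reading and writing the same pipe from a single thread can deadlock, and the default pipe buffer is small, so the side that reads results should keep draining them.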

Sanitize json input to a java server

I'm using json to pass data between the browser and a java server.
I'm using Json-lib to convert between java objects and json.
I'd like to strip out suspicious-looking stuff (e.g. "doSomethingNasty()") from the user input while converting from json to java.
I can imagine several points at which I could do this:
1. I could examine the raw json string and strip out funny-looking stuff.
2. I could look for a way to intercept every json value on its way into the java object, and look for funny stuff there.
3. I could traverse my new java objects immediately after reconstitution from json, look for any fields that are Strings, and strip stuff out there.
What's the best approach? Are there any technologies built for this task that I can tack on to what I have already?
I suggest approach 3: traverse the reconstructed Java objects immediately upon arrival, and before any other logic can act on them. Build the most restrictive validation you can get away with (that is, do white-listing).
You can probably do this in a single depth-first traversal of the object hierarchy that you retrieve from Json-lib. In general, you will not want to accept any JSON functions in your objects, and you will want to make sure that all values (numbers, strings, depth of object tree, ...) are within expected ranges. Yes, it is a huge hassle to do this well, but believe me, the alternative to good user-input validation is much, much worse. It may be a good idea to add logging for whenever you chop things out, to diagnose any possible bugs in your validation code.
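As a rough sketch of such a depth-first, white-listing traversal: the example below walks a tree of plain Map, List, String, Number and Boolean values, which is used here as a stand-in for whatever object graph Json-lib hands back. The class name, the limits and the allowed-character pattern are assumptions you would replace with your own rules, and this version rejects bad input outright rather than chopping pieces out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator; the limits and pattern below are placeholder rules.
final class JsonTreeValidator {
    private static final int MAX_DEPTH = 8;            // assumed maximum nesting
    private static final int MAX_STRING_LENGTH = 200;  // assumed maximum text length
    private static final double MAX_NUMBER = 1_000_000; // assumed numeric range
    // White list: letters, digits, space and a few punctuation marks only.
    private static final Pattern ALLOWED_TEXT = Pattern.compile("[\\p{L}\\p{N} .,_@-]*");

    static boolean isValid(Object root) {
        return isValid(root, 0);
    }

    private static boolean isValid(Object node, int depth) {
        if (depth > MAX_DEPTH) {
            return false;
        }
        if (node == null || node instanceof Boolean) {
            return true;
        }
        if (node instanceof Number) {
            // NaN and infinities fail this comparison and are rejected.
            return Math.abs(((Number) node).doubleValue()) <= MAX_NUMBER;
        }
        if (node instanceof String) {
            String text = (String) node;
            return text.length() <= MAX_STRING_LENGTH && ALLOWED_TEXT.matcher(text).matches();
        }
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                if (!(entry.getKey() instanceof String)
                        || !isValid(entry.getKey(), depth + 1)
                        || !isValid(entry.getValue(), depth + 1)) {
                    return false;
                }
            }
            return true;
        }
        if (node instanceof List) {
            for (Object element : (List<?>) node) {
                if (!isValid(element, depth + 1)) {
                    return false;
                }
            }
            return true;
        }
        return false; // any other type (for example a function object) is not on the white list
    }

    public static void main(String[] args) {
        System.out.println(isValid(Collections.singletonMap("name", "Alice")));
        System.out.println(isValid(Collections.singletonMap("name", "doSomethingNasty()")));
    }
}
```

Because unknown types and characters fail by default, new kinds of bad input are rejected without having to be anticipated, which is the main advantage of white-listing over stripping known-bad patterns.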
As I understand it, you need to validate the JSON data coming into your application.
If you want to do white listing (you know the data you expect, and nothing else is valid), then it makes sense to validate your Java objects once they are created (make sure not to send the Java object to the DB or back to the UI in any way before validation is done).
If you want to do black listing of characters (you know some of the threat characters that you want to avoid), then you can look directly at the JSON string, as this validation would not change much over time, and even if it does, you only need to enhance one common place. For white listing, it would depend on your business logic.
