Correct way to represent repeated data in stream - java

Any advice on how to support repeated messages? Specifically, when these messages are all one type. In JSON, these would essentially be an array. In my case I do not care about indexing, though that is not to say an array type would not be useful for protobuf. I have considered the approaches below, but I don't like their tradeoffs. It isn't clear from reading the Google documentation which approach is meant to be used for collections.
1. Use any existing message and just have a bunch of empty fields
You can use an existing type and include only the desired collection of repeated messages. For example, if a User message type has a repeated Photo field, send an otherwise empty User containing nothing but the photo collection.
2. Create a wrapper type
This is what #1 does, but instead of reusing an existing type, you create a new one. This is a little cleaner because it is explicit and doesn't rely on empty fields, and it still keeps message typing. In the photo case, this would be an ArrayOfPhotos message with only a repeated Photo field.
3. Use a delimited stream
I'm not too sure about this method as I haven't tried it, but protobuf supports delimited streams. This seems cool, but I would imagine it has the downside of weaker typing: a stream could contain a grab bag of different message types.
It does seem beneficial, though, that this option requires no extra message types.
In the photo case, this would be a sequence of delimited Photo messages, but again, it seems you could throw User messages in as well.

It sounds like you're trying to ask what to do when your top-level data is an array rather than a record. (It isn't totally clear from your question whether you're asking about top-level, but otherwise I don't understand the problem.)
The questions to ask yourself are:
Is there any chance that some day you'll want to add some auxiliary data to this message which is not attached to any one of the objects? For instance, maybe your list of photos will some day have an album name attached. In this case, you certainly want to use your solution #2, since it gives you the flexibility to add other fields later without messing up some existing message type.
Will it be a problem for the client or the server to have to hold the entire data set in memory and parse/serialize it all at once? A single message requires exactly that. For example, if you're sending 1GB of photos, you probably want each end to be able to handle one or just a few photos at a time. In this case you certainly want solution #3.
I would not advise using solution #1 in any case.
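For solution #3, protobuf's Java API provides writeDelimitedTo and parseDelimitedFrom, which write and read each message preceded by its size encoded as a varint. As a rough, dependency-free sketch of what that framing looks like, here is the same varint length-prefix scheme applied to plain byte arrays. The strings are hypothetical stand-ins for serialized Photo messages (for instance the bytes from photo.toByteArray()); in real code you would call the protobuf methods rather than re-implement this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DelimitedStream {

    // Unsigned base-128 varint: 7 bits per byte, high bit set means "more bytes follow".
    static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Returns -1 if the stream ends cleanly before the first byte of a varint.
    static int readVarint(InputStream in) throws IOException {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) return -1;
                throw new EOFException("truncated varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
            if (shift >= 35) throw new IOException("varint too long");
        }
    }

    // One record = varint length followed by exactly that many payload bytes.
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        writeVarint(out, payload.length);
        out.write(payload);
    }

    // Returns the next payload, or null when the stream ends between records.
    static byte[] readDelimited(InputStream in) throws IOException {
        int length = readVarint(in);
        if (length < 0) return null;
        byte[] payload = in.readNBytes(length);
        if (payload.length != length) throw new EOFException("truncated record");
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (String photo : new String[] {"photo-1", "photo-2", "photo-3"}) {
            writeDelimited(buffer, photo.getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = new ByteArrayInputStream(buffer.toByteArray());
        List<String> received = new ArrayList<>();
        byte[] next;
        while ((next = readDelimited(in)) != null) {
            received.add(new String(next, StandardCharsets.UTF_8));
        }
        System.out.println(received);
    }
}
```

Because every record carries its own length, the reader can handle one photo at a time. Note that nothing in the framing itself records the message type, which is the weak-typing concern raised in the question.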

Related

Text file to string matrix java

I am making an auto chat client like Cleverbot for school. I have everything working, but I need a way to make a knowledge base of responses. I was going to make a matrix with all the responses that I need the bot to say, but I think it would be hard to edit the code every time I want to add a response to the bot. This is the code that I have for the knowledge base matrix:
`String[][] Database = {
    {"hi", "hello", "howdy", "hey"},  // possible user input
    {"hi", "hello", "hey"},           // response
    {"how are you", "how r u", "how r you", "how are u"},
    {"good", "doing well"}
};`
How would I make a matrix like this from a text file? Is there a better way than reading from a text file to deal with this?
You could...
Use a properties file
A properties file can easily be read into Java (and written back out, though you don't need that). The class java.util.Properties handles the loading, and afterwards you access the entries much like a Map.
hello.input=hi,hello,howdy,hey
hello.output=hi,hello,hey
Note the matching formats there. This approach has its own problems and challenges, but it lets you easily move data into and out of properties files.
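A minimal sketch of loading that format with java.util.Properties and turning it into a lookup from user input to candidate responses. The ".input"/".output" key convention comes from the example above; the class and method names are hypothetical, and a StringReader stands in for a FileReader on a real file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ResponseLoader {

    // Maps each possible user input to the list of responses in the same group,
    // assuming keys of the form <group>.input and <group>.output.
    static Map<String, List<String>> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        Map<String, List<String>> responses = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.endsWith(".input")) continue;
            String group = key.substring(0, key.length() - ".input".length());
            String output = props.getProperty(group + ".output");
            if (output == null) continue; // no responses defined for this group
            List<String> outputs = Arrays.asList(output.split(","));
            for (String input : props.getProperty(key).split(",")) {
                responses.put(input.trim().toLowerCase(), outputs);
            }
        }
        return responses;
    }

    public static void main(String[] args) throws IOException {
        String file = "hello.input=hi,hello,howdy,hey\n"
                    + "hello.output=hi,hello,hey\n";
        Map<String, List<String>> responses = load(new StringReader(file));
        System.out.println(responses.get("howdy"));
    }
}
```

Adding a new greeting then only means adding another pair of lines to the file, with no code change.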
Store it in JSON
Lots of things use JSON as a serialization format, so there are plenty of libraries you can use to read and write it. Again, it would make some things easier and bring its own set of challenges.
{
"greeting":{
"input":["hi","hello","howdy","hey"],
"output":["hi","hello","hey"]
}
}
Something like that. Again, you would read this and store it in your own data structures. You could also keep the JSON in a document database (like CouchDB), which makes updates, changes, and access easy... provided you're running that database.
Which brings us to...
Embedded databases
There are lots of databases that you can embed right in your application and query like any other database: nice queries, proper relationships between objects. When you actually need a database, using one has many advantages over cobbling strings together and doing all the work yourself.
Custom serialization
You could create a class (instead of a 2D array) and store the data in it (internally it might still be a 2D array, but that's an implementation detail). At that point, you could implement Serializable, write the writeObject and readObject methods, and store the data in a file that you can later read back directly into the object. If this application (or another one that uses the same class) gives you an administrative way to add new entries, you could forgo human readability and use that admin tool (which you would write) to update the object.
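A sketch of that idea, assuming a hypothetical KnowledgeBase class that wraps the 2D array and defines its own private writeObject/readObject methods (the hooks Java serialization calls when present). For self-containment it round-trips through a byte array; in practice you would wrap a FileOutputStream and FileInputStream instead.

```java
import java.io.*;

public class KnowledgeBaseDemo {

    // Hypothetical wrapper class: the 2D array is now an implementation detail.
    static class KnowledgeBase implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient String[][] rows; // written by hand in writeObject below

        KnowledgeBase(String[][] rows) { this.rows = rows; }

        String[][] rows() { return rows; }

        // Custom format: row count, then for each row its length followed by its strings.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeInt(rows.length);
            for (String[] row : rows) {
                out.writeInt(row.length);
                for (String cell : row) out.writeUTF(cell);
            }
        }

        // Reads the fields back in exactly the order writeObject wrote them.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            rows = new String[in.readInt()][];
            for (int i = 0; i < rows.length; i++) {
                rows[i] = new String[in.readInt()];
                for (int j = 0; j < rows[i].length; j++) rows[i][j] = in.readUTF();
            }
        }
    }

    static byte[] save(KnowledgeBase kb) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(kb);
        }
        return bytes.toByteArray();
    }

    static KnowledgeBase restore(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (KnowledgeBase) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        KnowledgeBase kb = new KnowledgeBase(new String[][] {
            {"hi", "hello", "howdy", "hey"},
            {"hi", "hello", "hey"}
        });
        KnowledgeBase copy = restore(save(kb));
        System.out.println(java.util.Arrays.deepToString(copy.rows()));
    }
}
```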
Lots of others
This is just the tip of the iceberg. There are lots of ways to go about this.
P.S.
Please change the name of the variable from Database to something lowercase that better describes it, such as input2output. Names starting with an uppercase letter are conventionally reserved for classes (unless the name is all uppercase, in which case it denotes a static final constant).
A common solution would be to dump the data into a properties file and then load it with the standard Properties.load(...) method.
Once you have your data like that, you can then access the data by a map-like interface.
There are different ways to lay out the data in the file, for example:
userinput=hi,hello,howdy,hey
response=hi,hello,hey
...
Then, when you read the file, you can split the values on the comma:
String[] expectHello = properties.getProperty("userinput").split(",");
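If you would rather skip Properties and build the String[][] directly from a plain text file, as the question asks, the same comma-splitting works line by line. This is a minimal sketch assuming a hypothetical format of one row per line, with blank lines and lines starting with # ignored.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MatrixFromFile {

    // Each non-blank, non-comment line becomes one row; cells are comma-separated.
    static String[][] readMatrix(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] cells = trimmed.split(",");
            for (int i = 0; i < cells.length; i++) cells[i] = cells[i].trim();
            rows.add(cells);
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("responses", ".txt");
        Files.write(file, List.of(
            "# possible user input, then responses",
            "hi,hello,howdy,hey",
            "hi,hello,hey"), StandardCharsets.UTF_8);
        String[][] input2output = readMatrix(file);
        System.out.println(java.util.Arrays.deepToString(input2output));
        Files.delete(file);
    }
}
```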

Socket streaming in Java

When continuously writing and reading sets of data through a socket, how do you recognize the end of one set, the start of the next, and whether the entire set is in the stream yet rather than just a piece of it?
To make things simple let's say I'm sending JSON strings through the socket. How do I know if the whole object is there, and get that object from start to finish so I can correctly read it? Keep in mind there may be more objects behind this one.
That depends. If you use an ObjectOutputStream, Java takes care of this for you. Obviously this is Java-specific and requires an ObjectInputStream on the other side. It also requires that you send serializable objects. String, however, is serializable, and in general I would expect any data structure you send to be serializable as well.
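A small sketch of the ObjectOutputStream route. Byte-array streams stand in for socket.getOutputStream() and socket.getInputStream() so the example runs on its own; the class and method names are hypothetical. Each readObject call returns exactly one complete object, so the receiver never has to find message boundaries itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ObjectStreamFraming {

    // Sender: each writeObject call produces one self-delimiting record.
    static byte[] send(List<String> jsonMessages) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            for (String json : jsonMessages) {
                out.writeObject(json);
            }
        }
        return wire.toByteArray();
    }

    // Receiver: readObject returns one whole object per call (blocking on a socket
    // until it has fully arrived); EOFException means the sender closed the stream.
    static List<String> receive(byte[] wireBytes) throws IOException, ClassNotFoundException {
        List<String> received = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wireBytes))) {
            while (true) {
                try {
                    received.add((String) in.readObject());
                } catch (EOFException endOfStream) {
                    break;
                }
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        List<String> messages = List.of("{\"id\":1}", "{\"id\":2,\"tags\":[\"a\",\"b\"]}");
        System.out.println(receive(send(messages)));
    }
}
```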
Otherwise you will have to design some kind of container format yourself. Nowadays it is also fairly common to serialize the data into XML structures. If you go to an even higher level, you get to the point of using web services.
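One common container format for the JSON-strings case is length prefixing: write the byte count first, then the bytes. The reader then knows exactly where each message ends, and DataInputStream.readFully blocks until the whole body has arrived. This is a minimal sketch; the 16 MB cap is a hypothetical sanity limit, and byte-array streams stand in for the socket's streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedFraming {

    // Hypothetical upper bound so a corrupt length can't trigger a huge allocation.
    static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

    static void writeFrame(DataOutputStream out, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        out.writeInt(body.length); // 4-byte big-endian length header
        out.write(body);
        out.flush();
    }

    // Returns the next complete message, or null when the stream ends
    // before another length header can be read.
    static String readFrame(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();
        } catch (EOFException endOfStream) {
            return null;
        }
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("invalid frame length: " + length);
        }
        byte[] body = new byte[length];
        in.readFully(body); // blocks until all bytes arrive, or throws EOFException
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        writeFrame(out, "{\"id\":1}");
        writeFrame(out, "{\"id\":2}");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        String message;
        while ((message = readFrame(in)) != null) {
            System.out.println(message);
        }
    }
}
```

Unlike the ObjectOutputStream approach, this format is language-neutral: any client that can write a 4-byte integer and UTF-8 bytes can talk to it.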

How to persist large strings in a POJO?

If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern of POJOs with references to external streams, I would suggest storing in your POJO something akin to a URI that tells where to find the stream (like a database row ID, a filename, or a URI) rather than an instance of the stream itself. In doing so, you'll reduce the number of open file handles, avoid potential concurrency issues, and be able to serialize those objects locally if needed without duplicating the raw data persisted elsewhere.
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live in some storage, like a file. That is, your object stores a pointer to the storage (e.g. a file path), and every time someone accesses it, you open a stream or create an iterator and let that party read. Note also that to save memory, whoever consumes it has to make sure not to hold the whole content in memory.
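A minimal sketch of that pattern, assuming a hypothetical Document POJO that stores only a file path (kept as a String, since Path itself is not Serializable) and opens a fresh reader on each access. The consumer streams the content line by line instead of loading it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FileBackedContent {

    // Hypothetical POJO: holds where the content lives, never the content itself.
    static final class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final String location; // a URI or DB row ID would play the same role

        Document(String name, Path location) {
            this.name = name;
            this.location = location.toString();
        }

        String name() { return name; }

        // Opens a new reader on every call; the caller is responsible for closing it.
        BufferedReader openContent() throws IOException {
            return Files.newBufferedReader(Path.of(location), StandardCharsets.UTF_8);
        }
    }

    // Example consumer: reads one line at a time and keeps only a counter.
    static long countLines(Document doc) throws IOException {
        try (BufferedReader reader = doc.openContent()) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("content", ".txt");
        Files.write(file, List.of("first line", "second line"), StandardCharsets.UTF_8);
        Document doc = new Document("example", file);
        System.out.println(doc.name() + ": " + countLines(doc) + " lines");
        Files.delete(file);
    }
}
```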
However, 50KB or 1MB is really tiny. Unless you have gigabytes (or perhaps hundreds of megabytes), I wouldn't bother with something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.

Data type for large amounts of text?

I'm developing a Java package that makes basic HTTP requests (GET, POST, PUT, and DELETE). Right now, I'm having it just print the output of the request. I would like to store it in a field, but I'm not sure if String supports large amounts of text. Is there a data type for large amounts of text, or is there a reasonable alternative to it? Right now, because I'm just printing it, I can't do anything with the data that is returned (like parse it, if it's JSON).
Any ideas would be helpful.
Edit: The code is online on GitHub.
A String can in principle hold up to 2^31 - 1 characters (its length is an int), so I suspect it is big enough. (Data from an SO question.)
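If you do want the whole response in a String field, a sketch like the following reads everything from a Reader into a StringBuilder, which grows as needed, so there is no size to pick in advance. It assumes the response body is already wrapped in a Reader (the asker's code uses a BufferedReader); a StringReader stands in for it here. Unlike a readLine loop, this keeps the original line breaks intact.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadBody {

    // Copies the reader's content into one String, chunk by chunk.
    static String readAll(Reader reader) throws IOException {
        StringBuilder body = new StringBuilder();
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            body.append(buffer, 0, n);
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for the reader over the HTTP response stream.
        String response = readAll(new StringReader("{\"status\":\"ok\"}\n"));
        System.out.println("read " + response.length() + " characters");
    }
}
```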
I see that you use a BufferedReader in your code. Instead of building a String, you can pass that reader directly to your JSON parser, for instance. That would be more efficient than first creating a String from it.
If you are performing a single set of operations on the data, you can stream it through a pipeline and not even store the entire data in memory at any time. It can also boost performance as work can begin upon the first character rather than after the last is received. Check out CharSequence.
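As an illustration of the pipeline idea, here is a sketch that filters a response line by line using BufferedReader.lines(), which is lazy: each line is pulled from the reader, tested, and discarded before the next is read, so the whole body is never held in memory at once. The "error" filter and the newline-delimited input are hypothetical examples, and a StringReader stands in for the HTTP response.

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class StreamingFilter {

    // Work starts with the first line that arrives; only the current line is kept.
    static long countLinesContaining(BufferedReader reader, String needle) {
        return reader.lines()
                     .filter(line -> line.contains(needle))
                     .count();
    }

    public static void main(String[] args) {
        // StringReader stands in for the reader over the HTTP response stream.
        BufferedReader response = new BufferedReader(new StringReader(
            "{\"level\":\"info\"}\n{\"level\":\"error\"}\n{\"level\":\"error\"}\n"));
        System.out.println(countLinesContaining(response, "error"));
    }
}
```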

Sanitize json input to a java server

I'm using json to pass data between the browser and a java server.
I'm using Json-lib to convert between java objects and json.
I'd like to strip out suspicious-looking stuff (e.g. "doSomethingNasty()") from the user input while converting from JSON to Java.
I can imagine several points at which I could do this:
I could examine the raw json string and strip out funny-looking stuff
I could look for a way to intercept every json value on its way into the java object, and look for funny stuff there.
I could traverse my new Java objects immediately after reconstitution from JSON, look for any fields that are Strings, and strip stuff out there.
What's the best approach? Are there any technologies built for this task that I can tack on to what I already have?
I suggest approach 3: traverse the reconstructed Java objects immediately upon arrival, and before any other logic can act on them. Build the most restrictive validation you can get away with (that is, do white-listing).
You can probably do this in a single depth-first traversal of the object hierarchy that you retrieve from Json-lib. In general, you will not want to accept any JSON functions in your objects, and you will want to make sure that all values (numbers, strings, depth of object tree, ...) are within expected ranges. Yes, it is a huge hassle to do this well, but believe me, the alternative to good user-input validation is much, much worse. It may be a good idea to add logging for whenever you chop things out, to diagnose any possible bugs in your validation code.
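A rough sketch of such a whitelist pass as a single depth-first walk. It assumes the parsed data is available as a generic tree of Map, List, String, Number, Boolean and null (for your own typed beans you would write per-field checks instead). Every limit and pattern below is a hypothetical placeholder, and it rejects bad input outright rather than silently chopping it; you could log the rejection at the throw sites.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class JsonWhitelist {

    // Hypothetical limits: tune them to what your application actually expects.
    static final int MAX_DEPTH = 8;
    static final int MAX_STRING_LENGTH = 1_000;
    static final double MIN_NUMBER = -1_000_000;
    static final double MAX_NUMBER = 1_000_000;
    static final Pattern KEY = Pattern.compile("[A-Za-z_][A-Za-z0-9_]{0,63}");
    // Letters, digits, spaces and a little punctuation; parentheses, quotes, <, > are rejected.
    static final Pattern TEXT = Pattern.compile("[\\p{L}\\p{N} .,!?@'-]*");

    static void validate(Object node, int depth, String path) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException(path + ": nested too deeply");
        }
        if (node == null || node instanceof Boolean) {
            return;
        }
        if (node instanceof String) {
            String text = (String) node;
            if (text.length() > MAX_STRING_LENGTH || !TEXT.matcher(text).matches()) {
                throw new IllegalArgumentException(path + ": disallowed text");
            }
        } else if (node instanceof Number) {
            double value = ((Number) node).doubleValue();
            if (!(value >= MIN_NUMBER && value <= MAX_NUMBER)) { // also rejects NaN
                throw new IllegalArgumentException(path + ": number out of range");
            }
        } else if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                if (!KEY.matcher(key).matches()) {
                    throw new IllegalArgumentException(path + ": disallowed key");
                }
                validate(entry.getValue(), depth + 1, path + "." + key);
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                validate(items.get(i), depth + 1, path + "[" + i + "]");
            }
        } else {
            throw new IllegalArgumentException(path + ": unexpected type " + node.getClass().getName());
        }
    }

    public static void main(String[] args) {
        validate(Map.of("name", "Alice", "age", 30, "tags", List.of("a", "b")), 0, "$");
        try {
            validate(Map.of("comment", "doSomethingNasty()"), 0, "$");
        } catch (IllegalArgumentException rejected) {
            System.out.println("rejected: " + rejected.getMessage());
        }
    }
}
```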
As I understand you need to validate the JSON data coming into your application.
If you want to do whitelisting ("you know the data you expect, and nothing else is valid"), then it makes sense to validate your Java objects once they are created (make sure not to send the Java object to the DB or back to the UI before validation is done).
If you want to blacklist characters (you know some of the threatening characters you want to avoid), then you can look directly at the JSON string, as this validation would not change much over time, and even if it does, you only need to enhance one common place. For whitelisting, it would depend on your business logic.
