KNIME custom node output format - java

Is it possible to set the output format of my custom node to any object i created inside my node? or are there restrictions?

Your output should implement PortObject, and also have a PortObjectSerializer and PortObjectSpec (also with PortObjectSpecSerializer). You need to register that with an extension point, just like this. After that, you can use them.
I would recommend using existing port objects though. For example PMMLPortObject might be a good alternative if you need tree-like data structures, BufferedDataTable for tabular data, and the ImagePortObject for images. You might consider creating special cell types and store them in regular BufferedDataTables.

Related

Custom index in Apache Solr

Suppose in addition of simple text terms i want to retrieve some complex data from text. For example, text can contain descriptions of graphs in some format. After that I want to do queries which contain some conditions on those graphs (for examle I want to find all documents with planar graphs or something like this). It seems that standard index of Solr is not sufficient for such a task because in the end it (as I understand) treats document in terms of tokens which are just strings, but I need additional index which have more suited format. So question is: can I somehow customize indexing and retrieving data from index in Solr? I've read a lot of documentation but could not find an answer.
Yes. You are able to define each field in the schema.xml file. Within that file, you can define what type of data is stored, how the document is tokenized, and how the tokenized data is manipulated. In order to meet your need, you will probably need to write a custom tokenizer and possibly custom filters as well.
Your best starting point is to look at field definition of text_general in schema. It has various tokenizers, filters that apply to the text and help you in indexing. You can define different tokens both at indexing and quering process.
You need to know that, tokens apply on the text, and filters apply on each token. You have descripton of graphs in some format. Can you elaborate more on th type of format, so that we can think of better ways? There are so many existing tokenzers and filters available. Depending upon the format, you can use existing ones or write your own.

Text file to string matrix java

I am making an auto chat client like Cleverbot for school. I have everything working, but I need a way to make a knowledge base of responses. I was going to make a matrix with all the responses that I need the bot to say, but I think it would be hard to edit the code every time I want to add a responses to the bot. This is the code that I have for the knowledge base matrix:
`String[][] Database={
{"hi","hello","howdy","hey"},//possible user input
{"hi","hello","hey"},//response
{"how are you", "how r u", "how r you", "how are u"},
{"good","doing well"}`
How would I make a matrix like this from a text file? Is there a better way than reading from a text file to deal with this?
You could...
Use a properties file
The properties file is something that can easily be read into (and stored from, but you're not interested in that) Java. The class java.util.Properties can make that easier, but it's fairly simple to load it and then you access it like a Map.
hello.input=hi,hello,howdy,hey
hello.output=hi,hello,hey
Note the matching formats there. This has its own set of problems and challenges to work with, but it lets you easily pull things in to and out of properties files.
Store it in JSON
Lots of things use JSON for a serialization format. And thus, there are lots of libraries that you can use to read and store from it. It would again make some things easier and have its own set of challenges.
{
"greeting":{
"input":["hi","hello","howdy","hey"],
"output":["hi","hello","hey"]
}
}
Something like that. And then again, you read this and store it into your data structures. You could store JSON in a number of places such as document databases (like couch) which would make for easy updates, changes, and access... given you're running that database.
Which brings us to...
Embedded databases
There are lots of databases that you can stick right in your application and access it like a database. Nice queries, proper relationships between objects. There are lots of advantages to using a database when you actually want a database rather than hobbling strings together and doing all the work yourself.
Custom serialization
You could create a class (instead of a 2d array) and then store the data in a class (in which it might be a 2d array, but that's an implementation detail). At this point, you could implement Serializable and write the writeObject and readObject methods and store the data somehow in a file which you could then read back into the object directly. If you have the administration ability of adding new things as part of this application (or another that uses the same class) you could forgo the human readable aspect of it and use the admin tool (that you write) to update the object.
Lots of others
This is just the tip of the iceberg. There are lots of ways to go about this.
P.S.
Please change the name of the variable from Database to something in lower case that better describes it such as input2output or the like. Upper case names are typically reserved for class names (unless its all upper case, in which case it's a final static field)
A common solution would be to dump the data in to a properties file, and then load it with the standard Properties.load(...) method.
Once you have your data like that, you can then access the data by a map-like interface.
You could find different ways of storing the data in the file like:
userinput=hi,hello,howdy,hey
response=hi,hello,hey
...
Then, when you read the file, you can split the values on the comma:
String[] expectHello = properties.getProperty("userinput").split(",");

Using list of serialized objects

I'm learning Android/Java programming, but I'm confused about persistant data.
After a lot of research it seems that the best way to store objects is to serialize them in a file, but I couldn't find a simple way to retrieve these objects.
If I create two objects and save their serialized versions, how can I retieve and list both of them? Do I need to create a file for each object with a specific ID in the filename so I can list them with a getFilesDir?
Depends on how complex those objects are (your personal preferences I guess), I have used SharedPreferences to store simple objects before, just for the sake of simplicity, while a co-worker makes generous use of SQLite, but that suits his needs.
Since you do not state what is being stored, the best advice I can give you right now is have read here, it covers how persistent data should be dealt with on Android.
"best" way? Please define your criteria.
There are databases (relational and non-relational) or file systems. You can serialize lots of ways: Java serialization, XML, Google's protobuf, and others.
Yes, you'll need a way to specify a unique representation with its object. In a relational database, you'd use a primary key. You need something like that in any system you use.
If you serialize via some mechanism to a file system, you'll have to write the object into the desired format and stream it to a file. To go the other way, specify the key for the object, read the serialized data, and parse it back into the object.

best practices question: How to save a collection of images and a java object in a single file? File is read to be rendered

I am making a java program that has a collection of flash-card like objects. I store the objects in a jtree composed of defaultmutabletreenodes. Each node has a user object attached to it with has a few string/native data type parameters. However, i also want each of these objects to have an image (typical formats, jpg, png etc).
I would like to be able to store all of this information, including the images and the tree data to the disk in a single file so the file can be transferred between users and the entire tree, including the images and parameters for each object, can be reconstructed.
I had not approached a problem like this before so I was not sure what the best practices were. I found XLMEncoder (http://java.sun.com/j2se/1.4.2/docs/api/java/beans/XMLEncoder.html) to be a very effective way of storing my tree and the native data type information. However I couldn't figure out how to save the image data itself inside of the XML file, and I'm not sure it is possible since the data is binary (so restricted characters would be invalid). My next thought was to associate a hash string instead of an image within each user object, and then gzip together all of the images, with the hash strings as the names and the XMLencoded tree in the same compmressed file. That seemed really contrived though.
Does anyone know a good approach for this type of issue?
THanks!
Thanks!
Assuming this isn't just a serializable graph, consider bundling the files together in Jar format. If you already have your data structures working with XMLEncoder, you can reuse this code by saving the data as a jar entry.
If memory serves, the jar library has better support for Unicode name entries than the zip package, which is why I would favour it.
You might consider using an MS JET database (.mdb file) and storing all the stuff in there. That'll also make it easy to examine and edit the data in (for example) MS Access.
You can employ some virtual file system, which stores it's data in a single container. We develop and offer one of such files sytems, SolFS, however right now there's no Java binding for it. We will release Java JNI interface for SolFS within a month.

Sanitize json input to a java server

I'm using json to pass data between the browser and a java server.
I'm using Json-lib to convert between java objects and json.
I'd like to strip out susupicious looking stuff (i.e "doSomethingNasty().) from the user input while converting from json to java.
I can imagine several points at which I could do this:
I could examine the raw json string and strip out funny-looking stuff
I could look for a way to intercept every json value on its way into the java object, and look for funny stuff there.
I could traverse my new java objects immediately after reconstitution from json, look for any fields that are Strings, and stripp stuff out there.
What's the best approach? Are there any technologies built for this this task that I tack tack on to what I have already?
I suggest approach 3: traverse the reconstructed Java objects immediately upon arrival, and before any other logic can act on them. Build the most restrictive validation you can get away with (that is, do white-listing).
You can probably do this in a single depth-first traversal of the object hierarchy that you retrieve from Json-lib. In general, you will not want to accept any JSON functions in your objects, and you will want to make sure that all values (numbers, strings, depth of object tree, ...) are within expected ranges. Yes, it is a huge hassle to do this well, but believe me, the alternative to good user-input validation is much, much worse. It may be a good idea to add logging for whenever you chop things out, to diagnose any possible bugs in your validation code.
As I understand you need to validate the JSON data coming into your application.
If you want to do white listing ("You know the data you expect and nothing else is valid"), then it makes sense to validate your java objects once they are created ("make sure not to send the java object to DB or back to UI in some way before validation is done).
In case you want to black listing of characters (you know some of the threat characters which you want to avoid"), then you can directly look at the json string as this validation would not change much over a period of time and even if it does, you only need to enhance one common place. For while listing iot would depend on your business logic.

Categories