deeplearning4j generate response to input - java

I have recently been trying to learn DL4J but have run into some issues. They have an example of a neural network generating Shakespeare-like text based on an input character, but I can't find anything that would indicate a way of creating a response to an input statement.
I would like to use an input string such as "Hello" and have the network generate a response whose length varies depending on the input. Is this possible using an LSTM? I'd appreciate a pointer in the right direction, as I have no idea where to even start.

We actually have plenty of documentation on this. This page gives you an overview of what an RNN looks like:
http://deeplearning4j.org/usingrnns
The model you would be looking at is character-level; in general, though, what you want is question answering. You may want to look at an architecture like this: https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf
If you are completely new to NLP, I would look at this class:
https://www.youtube.com/playlist?list=PLhVhwi0Pz282aSA2uZX4jR3SkF3BKyMOK
It covers question answering as well.
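A common way to get the "varying length" behaviour is to train on sequences that end with a special end-of-response symbol and to stop sampling as soon as the network emits it. The plain-Java sketch below shows only that sampling loop, not the question-answering architecture linked above. `NextCharModel` is a hypothetical stand-in for a trained network (in DL4J's character example the per-step distribution comes from the network's output, e.g. via `rnnTimeStep`), and using `'\n'` as the end marker is an assumption.

```java
import java.util.Random;

class ResponseGenerator {
    /** Hypothetical stand-in for a trained character-level LSTM. */
    interface NextCharModel {
        // Probability of each vocabulary symbol being next, given all text so far.
        double[] nextCharProbabilities(String context);
    }

    static final char END = '\n'; // assumed end-of-response symbol used during training

    /** Primes the model with the input, then samples until END or maxLen characters. */
    static String generate(NextCharModel model, char[] vocab, String input, int maxLen, Random rng) {
        String prompt = input + END; // END also separates the input from the reply
        StringBuilder reply = new StringBuilder();
        for (int step = 0; step < maxLen; step++) {
            double[] p = model.nextCharProbabilities(prompt + reply);
            char next = vocab[sample(p, rng)];
            if (next == END) break; // the model, not the caller, decides the reply length
            reply.append(next);
        }
        return reply.toString();
    }

    /** Draws an index from a discrete probability distribution. */
    static int sample(double[] p, Random rng) {
        double r = rng.nextDouble();
        double cumulative = 0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (r < cumulative) return i;
        }
        return p.length - 1; // guards against probabilities summing to slightly less than 1
    }

    public static void main(String[] args) {
        char[] vocab = {'H', 'i', '!', END};
        // Toy model that always replies "Hi!" and then emits END (assumes the input has no '\n').
        NextCharModel toy = context -> {
            String canned = "Hi!";
            int produced = context.length() - context.indexOf(END) - 1;
            char next = produced < canned.length() ? canned.charAt(produced) : END;
            double[] p = new double[vocab.length];
            p[new String(vocab).indexOf(next)] = 1.0;
            return p;
        };
        System.out.println(generate(toy, vocab, "Hello", 50, new Random(42))); // prints Hi!
    }
}
```

With a real network, the distribution would depend on "Hello" and on the characters generated so far, so different inputs can end at different lengths; `maxLen` only stops a model that never emits the end symbol.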

Related

Alternative to Markdown with Color support

I am writing a note app (Android, with a REST API built with PHP/Slim 3). I am wondering if there is something other than Markdown for saving notes in a readable and interchangeable format. The problem with Markdown for me is that there is no way to style text (e.g. colored text). It is also hard to extend Markdown with custom attributes.
I am already thinking of creating my own data format (or using XML), but that means a lot of work writing a parser. I like the idea of using a standard format to interchange notes between client and server and with other applications, but the feature set of Markdown is very limited (by design, of course).
Do you have any tips on this topic?
This question verges on overly-broad, i.e. it may lead to an argument over technologies rather than a "this is the solution" situation.
That being said, here's an answer I think won't be controversial: when you say
"readable, interchangeable format... solution to style texts... custom attributes"
I think HTML. I don't recommend rolling your own format, because 1) you are correct that it will be difficult, and 2) it will be even more difficult to match the feature sets of existing solutions.
To sum it up: I like the idea of using HTML instead of Markdown. It is an open standard format, exchangeable, and human-readable.
The problem I see with all of these solutions: how do you write a WYSIWYG editor with this in mind? I am already working with Markdown using the Markwon library: https://github.com/noties/Markwon
It is no problem to write Markdown in an Android EditText widget and render it, and you can easily convert it back to plain text to save it. Getting a WYSIWYG experience is much more complicated. You have to handle every user input and write the markup into a second file or string while the user only sees the rendered result. The user can edit or delete anything anywhere in the EditText, and you have to make sure those changes are applied to the Markdown string/file too. I didn't find an easy solution for this.
The easiest way would be to somehow parse the content of the EditText back into Markdown, but I can't find a getSpannables() method or anything similar on the EditText widget. I am thinking of looping through the EditText and checking each character and how it's formatted, but I think this will have disadvantages too, because there are other things like bulleted lists and checkboxes.
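On Android, `EditText.getText()` returns an `Editable`, which is a `Spanned`, so color spans can be read with `getSpans(0, length, ForegroundColorSpan.class)` together with `getSpanStart`/`getSpanEnd`. The sketch below covers the other half of the idea, turning styled ranges into HTML, in plain Java so it runs off-device. `ColorRange` is a hypothetical holder for one such span, and nested or overlapping spans are deliberately not supported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ColoredTextToHtml {
    /** Hypothetical styled range: characters [start, end) are drawn in the given 0xRRGGBB color. */
    static final class ColorRange {
        final int start, end, rgb;
        ColorRange(int start, int end, int rgb) { this.start = start; this.end = end; this.rgb = rgb; }
    }

    /** Serializes plain text plus non-overlapping color ranges to an HTML fragment. */
    static String toHtml(String text, List<ColorRange> ranges) {
        List<ColorRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(range -> range.start));
        StringBuilder html = new StringBuilder();
        int pos = 0;
        for (ColorRange r : sorted) {
            if (r.start < pos || r.end < r.start || r.end > text.length()) {
                throw new IllegalArgumentException("overlapping or out-of-bounds range");
            }
            escape(text, pos, r.start, html);
            html.append(String.format("<span style=\"color:#%06X\">", r.rgb & 0xFFFFFF));
            escape(text, r.start, r.end, html);
            html.append("</span>");
            pos = r.end;
        }
        escape(text, pos, text.length(), html);
        return html.toString();
    }

    // Escapes the characters that would otherwise be read as markup.
    private static void escape(String s, int from, int to, StringBuilder out) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
    }

    public static void main(String[] args) {
        String html = toHtml("Buy milk & eggs", Arrays.asList(new ColorRange(4, 8, 0xFF0000)));
        System.out.println(html); // Buy <span style="color:#FF0000">milk</span> &amp; eggs
    }
}
```

Escaping the user's text is what keeps a note containing `<` or `&` from corrupting the saved HTML; bulleted lists and checkboxes would need their own span types and tags.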

java - Reading and writing text in specific locations inside a .txt file

I am very confused by the amount of different read and write methods in Java and couldn't find one that fits my needs.
I have a text file that looks like this:
CharName{
Sheet{
Vitality{
Short=LP,Name=Life,Base=1000,Equip=0,Buff=0;
Short=LPR,Name=Life regen per sec,Base=50,Equip=0,Buff=0;
}
Magic{
Short=MP,Name=Mana,Base=100,Equip=0,Buff=0;
Short=MPR,Name=Mana regen per sec,Base=5,Equip=0,Buff=0;
}
}
}
It's actually a lot bigger, but that much is sufficient to show what I mean.
I want to make a function that basically looks something like getInfoFromCharFile(sectionName, categoryName, rowNum, tagName).
Example: getInfoFromCharFile(Sheet, Vitality, 2, Base) should give me 50. I also want the same for writing: saveInfoToCharFile(sectionName, categoryName, rowNum, tagName, newValue).
On top of that, when one of those files gets read, I want an easy way of 'loading it in' (reading everything systematically), something like: "find Sheet; find Vitality; get everything between the next { and }; split at ';'; find Magic; get everything between the next { and }", and so on.
I have an idea of how to do the splitting and processing, but how can I get it to 'load' everything between the braces?
Note: I did read, watch and google a lot about this, but couldn't find anything that gave me an idea of how to solve my problem. I've been stuck on this for weeks now! I used to just use properties files, but that got really ugly really fast.
I would show you the code that 'loads' that solution, but the last time I had to edit it, it literally took me hours to understand it again, so it's not very good code, which is why I want to replace it.
I see two ways to do this:
The hard way:
You insist on using your custom format. In that case, you will probably have to use regular expressions to read your files, and you will have to be prepared to handle malformed input, etc.
You can use Matcher or Scanner to do the matching. If you feel adventurous, you can do state-based parsing, by reading the chars one by one.
I actually did something like that a while ago, so maybe you can take inspiration from the source, or even use it; last time I checked, it worked (especially the JONReader could be useful).
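The state-based approach can be sketched as a small recursive parser that reads the file one character at a time: text before a `{` is a block name, text before a `;` is a row of `key=value` pairs, and `}` closes the current block. This is not the JONReader mentioned above; the names `CharFileParser`, `getInfo`, `setInfo` and `toText` are hypothetical, it assumes names and values never contain `{`, `}`, `;`, `,` or `=`, and rows are numbered from 1 as in the question's example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class CharFileParser {
    /** A named "{ ... }" block holding nested blocks and/or "key=value,...;" rows. */
    static final class Block {
        final String name;
        final Map<String, Block> children = new LinkedHashMap<>();
        final List<Map<String, String>> rows = new ArrayList<>();
        Block(String name) { this.name = name; }
    }

    private final String src;
    private int pos = 0;

    private CharFileParser(String src) { this.src = src; }

    /** Parses the whole file; the returned root holds the top-level blocks (e.g. "CharName"). */
    static Block parse(String text) {
        Block root = new Block("");
        new CharFileParser(text).parseBody(root, false);
        return root;
    }

    // Reads characters one by one until the block's closing '}' (or end of input for the root).
    private void parseBody(Block block, boolean needsClose) {
        StringBuilder token = new StringBuilder();
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '{') {                 // text so far names a nested block
                Block child = new Block(token.toString().trim());
                token.setLength(0);
                parseBody(child, true);
                block.children.put(child.name, child);
            } else if (c == ';') {          // text so far is one row
                block.rows.add(parseRow(token.toString().trim()));
                token.setLength(0);
            } else if (c == '}') {
                if (!needsClose || !token.toString().trim().isEmpty()) {
                    throw new IllegalArgumentException("unexpected '}' or unterminated row at offset " + (pos - 1));
                }
                return;
            } else {
                token.append(c);
            }
        }
        if (needsClose) throw new IllegalArgumentException("missing '}' for block " + block.name);
    }

    private static Map<String, String> parseRow(String row) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : row.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("expected key=value, got: " + pair);
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return fields;
    }

    private static Block child(Block parent, String name) {
        Block b = parent.children.get(name);
        if (b == null) throw new IllegalArgumentException("no block named " + name + " in " + parent.name);
        return b;
    }

    /** rowNum is 1-based; returns null if the row has no such tag. */
    static String getInfo(Block character, String section, String category, int rowNum, String tag) {
        return child(child(character, section), category).rows.get(rowNum - 1).get(tag);
    }

    static void setInfo(Block character, String section, String category, int rowNum, String tag, String value) {
        child(child(character, section), category).rows.get(rowNum - 1).put(tag, value);
    }

    /** Writes the tree back in the same brace format (rows first, then nested blocks). */
    static String toText(Block root) {
        StringBuilder out = new StringBuilder();
        for (Block b : root.children.values()) write(b, out);
        return out.toString();
    }

    private static void write(Block block, StringBuilder out) {
        out.append(block.name).append("{\n");
        for (Map<String, String> row : block.rows) {
            StringJoiner line = new StringJoiner(",", "", ";\n");
            row.forEach((k, v) -> line.add(k + "=" + v));
            out.append(line);
        }
        for (Block b : block.children.values()) write(b, out);
        out.append("}\n");
    }
}
```

For the sample file in the question, `CharFileParser.parse(text).children.get("CharName")` gives the character block, and `getInfo(thatBlock, "Sheet", "Vitality", 2, "Base")` returns "50". Saving is `setInfo(...)` followed by writing `toText(root)` back to the file; rewriting a small file completely is much simpler than patching text in place.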
The easy way:
Use a standard format like XML, JSON (your format is already quite close to that anyway!), or even some CSV variant. There are a lot of solid parsers for those out there, and it might become as easy as "charname.getMagic()", without worrying about the details.
If you decide to use JSON, which seems the best fit for your approach, I suggest having a look at minimal-json or Jackson, depending on how you want to work with it.

Is it possible to do this type of search in Java

I am stuck on a project at work that I do not think is really possible and I am wondering if someone can confirm my belief that it isn't possible or at least give me new options to look at.
We are doing a project for a client that involved a mass download of files from a server (easily done with ftp4j and the document name list), but now we need to sort through the data from the server. The client works with contracts and wants us to pull out relevant information such as: Licensor, Licensee, Product, Agreement date, termination date, royalties, restrictions.
Since the documents are completely unstandardized, is that even possible? I can imagine loading in the files and searching them, but I would have no idea how to pull out information from a paragraph, such as the licensor and the restrictions on the agreement. These are not structured key/value data but just long contracts. Even if I were to search for 'Licensor', it will come up in the document multiple times. The documents aren't even in a consistent file format. Some are PDF, some are text, some are HTML, and I've even seen some that were as bad as a scanned image in a PDF.
My boss keeps pushing for me to work on this project but I feel as if I am out of options. I primarily do web and mobile so big data is really not my strong area. Does this sound possible to do in a reasonable amount of time? (We're talking about at the very minimum 1000 documents). I have been working on this in Java.
I'll do my best to give you some information, as this is not my area of expertise. I would highly consider writing a script that identifies the type of file you are dealing with, and then calls the appropriate parsing methods to handle what you are looking for.
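A sketch of that first step, deciding which extractor a file needs: it checks the `%PDF-` header that PDF files begin with, then a leading `<html` or `<!DOCTYPE html`, and finally falls back to the file extension. The `Kind` enum and the choice of checks are assumptions for illustration; a library such as Apache Tika (mentioned in another answer) does this detection far more thoroughly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class DocTypeSniffer {
    /** Hypothetical categories, one per extraction routine you would write or plug in. */
    enum Kind { PDF, HTML, TEXT, UNKNOWN }

    /** Guesses the kind from the first bytes of the file, then from its name. */
    static Kind detect(String fileName, byte[] head) {
        // ISO-8859-1 maps every byte to one char, so arbitrary binary input is safe here.
        String start = new String(head, StandardCharsets.ISO_8859_1);
        if (start.startsWith("%PDF-")) {
            return Kind.PDF; // may still be a scanned image that needs OCR
        }
        String trimmed = start.trim().toLowerCase(Locale.ROOT);
        if (trimmed.startsWith("<!doctype html") || trimmed.startsWith("<html")) {
            return Kind.HTML;
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) return Kind.PDF;
        if (name.endsWith(".html") || name.endsWith(".htm")) return Kind.HTML;
        if (name.endsWith(".txt")) return Kind.TEXT;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] htmlHead = "<!DOCTYPE html><html>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect("contract-001", pdfHead));  // PDF
        System.out.println(detect("contract-002", htmlHead)); // HTML
        System.out.println(detect("notes.txt", new byte[0])); // TEXT
    }
}
```

For real files, `head` can be the first few bytes read with `InputStream.readNBytes(8)` (Java 11+); the returned `Kind` then selects the PDF, HTML, or plain-text routine.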
Since you are dealing with big data, Python could be pretty useful. JavaScript would be my next choice.
If your overall code is written in Java, it should be very portable and flexible no matter which one you choose. Using a regex or a specific string search would be a good way to approach this.
If you are concerned only with Licensor followed by a name, you could identify the format of that particular instance and search for something similar using the regex you create. This can be extrapolated to other instances of searching.
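A minimal sketch of that regex idea, assuming the text has already been extracted and that a contract states the party on its own line as `Licensor: <name>` (an assumed layout; other phrasings need their own patterns). Matching the whole "label at the start of a line" shape, rather than the bare word, skips the many other mentions of "Licensor" in the body text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LicensorFinder {
    // Assumed layout: a line such as "Licensor: Acme Corp." or "LICENSOR - Acme Corp".
    // [ \t] (not \s) keeps each match on a single line.
    private static final Pattern LICENSOR_LINE = Pattern.compile(
            "^[ \\t]*Licensor[ \\t]*[:\\-][ \\t]*(.+?)[ \\t]*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    /** Returns every name found after a "Licensor:" label, in document order. */
    static List<String> findLicensors(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = LICENSOR_LINE.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String contract = "LICENSE AGREEMENT\n"
                + "Licensor: Acme Corp.\n"
                + "Licensee: Example Ltd.\n"
                + "The Licensor grants the Licensee a non-exclusive license.\n";
        System.out.println(findLicensors(contract)); // [Acme Corp.]
    }
}
```

The same shape (label, separator, captured value) carries over to Licensee, agreement date and so on, but every new phrasing in the documents needs its own pattern.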
For getting text from an image, try the APIs discussed on these pages:
How to read images using Java API?
Scanned Image to Readable Text
For text from a PDF:
https://www.idrsolutions.com/how-to-search-a-pdf-file-for-text/
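Note that a PDF is not plain text: its content is usually stored in compressed streams, so you first need a PDF library (or a tool like Apache Tika) to extract the text, which you can then search, most likely with a regex. That would be my method of attack, or possibly using String.split() and building up a StringBuilder/StringBuffer that you append to.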
For text from HTML doc:
Here is a cool HTML parser library: http://jericho.htmlparser.net/docs/index.html
A resource that teaches how to remove HTML tags and get the good stuff: http://www.rgagnon.com/javadetails/java-0424.html
If you need anything else, let me know. I'll do my best to find it!
Apache tika can extract plain text from almost any commonly used file format.
But in the situation you describe, you would still need to analyze the text, i.e. "natural language understanding". That's a field where, despite some advances (made by dedicated research teams spending many person-years!), computers still fail pretty badly (heck, even humans fail at it sometimes).
With the number of documents you mentioned (thousands), hire a temp worker and have them sorted/tagged by human brain power. It will be cheaper, and you will have fewer misclassifications.
You can use Tika for text extraction. If there is a fixed pattern, you can extract information using regex or XPath queries. Another solution is to use Solr, as shown in this video. You don't need Solr, but watch the video to get the idea.

Need to export data in CCR format from Java

I'm working in a project which needs to export EHR information in CCR format. I must use Java. The problem that I'm facing is that I can't find an easy way to do it.
The best way to do what I'm doing would be to export as CDA using something like CDAPI, but it's overly expensive (30k/year) and complicated. However, it shows an example of what I'd like, something like:
CCR ccr = new CCR();
...
out.print(ccr.toString()); // Returns XML
But it's as if this doesn't exist.
There's CCR4J but it can only read XML files and make Java objects. Not the other way around.
There's Google Health (now discontinued) which might have what I'm looking for, but I can't even figure out how to use it.
There's CCR Binder which has some convenience methods for creating CCR XML from code built on top of Google Health API, but I can't figure out how to use that either.
I could also just read the ASTM CCR Spec and implement something on my own which at this point begins to look like the faster option.
Now, I would really like to stay away from Google Health. It seems to be overkill for my task, as is exporting to CDA. Any comments and suggestions are appreciated.
Just for the benefit of people searching for the same info: here's the CCR spec.
Sorry for this (very) late answer, but I stumbled upon this post because it's still ranked high on Google if you search for Java and CCR. To keep others from giving up too quickly, I have to correct you:
With CCR4J you CAN create CCRs from Java objects (since 2008), not just parse them from a given file, and it works like a charm!
Perhaps you just didn't figure out how to use the library back then?
So here's a little example (not a valid CCR!) for the next person who stumbles over this post trying to create a CCR with this library:
//New XML-Document
ContinuityOfCareRecordDocument newDoc = ContinuityOfCareRecordDocument.Factory.newInstance();
//New CCR
ContinuityOfCareRecord newCCR = ContinuityOfCareRecord.Factory.newInstance();
//Add Object ID
newCCR.setCCRDocumentObjectID("asdasdbdffdjg343204dsss3490");
//Add new Language
newCCR.addNewLanguage().setText("English");
//Add new Body
newCCR.addNewBody();
//Add new Problem with Code
newCCR.getBody().addNewProblems().addNewProblem().addNewDescription().addNewCode().setCodingSystem("ICD");
newCCR.getBody().getProblems().getProblemArray(0).getDescription().getCodeArray(0).setValue("1225-55558");
//Add CCR to document and save
newDoc.setContinuityOfCareRecord(newCCR);
newDoc.save(new File("My-Generated-CCR.xml"));
I ended up doing something like this:
Video: Quick and Dirty CCR
To summarize: use JAXB to generate the classes, then marshal them using the JAXB Marshaller.
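If neither library works out, the "implement it myself" route from the question needs nothing beyond the JDK: build the elements with the DOM API and serialize them with a `Transformer`. This sketch mirrors the structure of the CCR4J example above (object ID, language, one coded problem) and, like it, does not produce a valid CCR. The `urn:astm-org:CCR` namespace and the element order are assumptions to check against the ASTM schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CcrDomSketch {
    // Assumed CCR namespace URI; verify it against the ASTM CCR schema.
    static final String NS = "urn:astm-org:CCR";

    /** Builds a minimal (NOT schema-valid) CCR-like document and returns it as XML text. */
    static String buildCcrXml(String objectId, String language, String codingSystem, String codeValue)
            throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element ccr = doc.createElementNS(NS, "ContinuityOfCareRecord");
        doc.appendChild(ccr);

        appendText(doc, ccr, "CCRDocumentObjectID", objectId);
        appendText(doc, append(doc, ccr, "Language"), "Text", language);

        // Body/Problems/Problem/Description/Code, following the CCR4J example
        Element problem = append(doc, append(doc, append(doc, ccr, "Body"), "Problems"), "Problem");
        Element code = append(doc, append(doc, problem, "Description"), "Code");
        appendText(doc, code, "Value", codeValue);
        appendText(doc, code, "CodingSystem", codingSystem);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Creates a child element in the CCR namespace and attaches it to parent.
    private static Element append(Document doc, Element parent, String name) {
        Element child = doc.createElementNS(NS, name);
        parent.appendChild(child);
        return child;
    }

    private static void appendText(Document doc, Element parent, String name, String text) {
        append(doc, parent, name).setTextContent(text);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildCcrXml("asdasdbdffdjg343204dsss3490", "English", "ICD", "1225-55558"));
    }
}
```

Building the tree by hand keeps the dependency list empty, at the cost of having to follow the schema yourself; generated JAXB or CCR4J classes enforce more of that structure for you.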

Rewriting Binary Streams using Java

I have been studying Netty and MINA but am confused as to the best way to rewrite binary streams. For example, I would like to create a proxy that replaces parts of the XML passing through it and forwards the result along.
Examples appreciated.
I think you're thinking at too low a level. XML is not so much "binary" as it is an abstraction on top of binary. If you want to replace snippets of XML as they pass through your proxy, you'll have to look into the payload portion of the packets for XML patterns. A simple way is to temporarily rebuild the bytes into text and use a regular expression.
Once your search has matched what you want, you can replace it and re-send.
The hard part is that you will likely need to buffer some input before it leaves your machine, so that you can find the beginning and end of what you are searching for. What makes this difficult is that, often, you don't know what constitutes the "beginning" and the "end" of a data payload.
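A minimal sketch of that buffering, assuming the thing to replace is a fixed byte sequence (for example one known XML element) rather than an arbitrary regex. Each received chunk is appended to whatever was held back last time, complete matches are replaced, and at most `pattern.length - 1` trailing bytes are kept until the next chunk in case they start a match. It is plain Java and not XML-aware; in Netty or MINA it would be called from your own handler for each received buffer. If the replacement changes the payload length, any length field in the protocol (for example an HTTP `Content-Length` header) must be rewritten too.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StreamingReplacer {
    private final byte[] pattern;
    private final byte[] replacement;
    private byte[] carry = new byte[0]; // bytes held back from the previous chunk

    StreamingReplacer(byte[] pattern, byte[] replacement) {
        if (pattern.length == 0) throw new IllegalArgumentException("empty pattern");
        this.pattern = pattern;
        this.replacement = replacement;
    }

    /** Processes one chunk and returns the bytes that are safe to forward now. */
    byte[] feed(byte[] chunk) {
        byte[] buf = new byte[carry.length + chunk.length];
        System.arraycopy(carry, 0, buf, 0, carry.length);
        System.arraycopy(chunk, 0, buf, carry.length, chunk.length);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i + pattern.length <= buf.length) {
            if (matchesAt(buf, i)) {
                out.write(replacement, 0, replacement.length);
                i += pattern.length;
            } else {
                out.write(buf[i]);
                i++;
            }
        }
        // Fewer than pattern.length bytes remain: they may begin a match that
        // continues in the next chunk, so hold them back.
        carry = Arrays.copyOfRange(buf, i, buf.length);
        return out.toByteArray();
    }

    /** Flushes the held-back bytes once the stream has ended. */
    byte[] finish() {
        byte[] rest = carry;
        carry = new byte[0];
        return rest;
    }

    private boolean matchesAt(byte[] buf, int pos) {
        for (int k = 0; k < pattern.length; k++) {
            if (buf[pos + k] != pattern[k]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        StreamingReplacer r = new StreamingReplacer(
                "<old/>".getBytes(StandardCharsets.UTF_8), "<new/>".getBytes(StandardCharsets.UTF_8));
        // The match is split across two reads; decoding per chunk is only for display here.
        String out = new String(r.feed("<a><ol".getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8)
                + new String(r.feed("d/></a>".getBytes(StandardCharsets.UTF_8)), StandardCharsets.UTF_8)
                + new String(r.finish(), StandardCharsets.UTF_8);
        System.out.println(out); // <a><new/></a>
    }
}
```

Because it works on raw bytes, the same code is safe for binary payloads; a regex-based version would need the same hold-back idea, but with a bound on how long a match can be.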
