I have seen countless posts about how to insert pictures or images into Word documents using Apache POI. There is a major problem with this: I cannot say where exactly to put them. Of course you can place them in a certain segment using the alignment option, but that seems a little lackluster to me. What I would like to have is something along the lines of
insert(xcoord, ycoord, width, length, pictureName);
but I have not seen this done anywhere. Is there already an elegant way to implement this?
Another issue: yes, of course I could take the underlying Microsoft XML and splice the wanted data into it, but that is kind of a pain and not efficient at all. So yeah, that's that.
Greetings, newbie
Related
I am writing a note app (Android and a REST API built with PHP/Slim 3). I am wondering if there is something other than Markdown for saving notes in a readable and interchangeable format. The problem with Markdown for me is that there is no way to style text (e.g. colored text). It is also hard to extend Markdown with custom attributes.
I am already considering creating my own data format (or using XML), but that means a lot of parsing work. I like the idea of using a standard format to interchange between client/server and other applications, but the feature set of Markdown is very limited (by design, for sure).
Do you have any tips on this topic?
This question verges on overly-broad, i.e. it may lead to an argument over technologies rather than a "this is the solution" situation.
That being said, here's an answer I think won't be controversial: when you say
"readable, interchangeable format... solution to style texts... custom attributes"
I think HTML. I don't recommend trying to roll your own format, because 1) you are correct that it would be difficult, and 2) it would be even more difficult to match the feature sets of existing solutions.
To sum it up: I like the idea of using HTML instead of Markdown. It is an open standard format and exchangeable as well as human-readable.
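To illustrate the styling point: colored text, which stock Markdown cannot express, is a one-liner in HTML (the inline style here is just an example; a note app would more likely use classes):

```html
<p>Normal text with <span style="color:#d32f2f">a red highlight</span> inside.</p>
```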
The problem I see with all of these solutions: how do you write a WYSIWYG editor with this in mind? I am already working with Markdown using the Markwon library: https://github.com/noties/Markwon
It is no problem to write Markdown in an Android EditText widget and render it, and you can easily convert it back to plain text for saving. It is much more complicated to get a WYSIWYG experience: you have to handle every user input, writing to a second file or string that contains the markup while the user only sees the rendered result. The user can edit or delete anything anywhere in the EditText, and you have to make sure those changes are reflected in the Markdown string/file too. I didn't find an easy solution for this.
The easiest way would be to somehow parse the content of the EditText back to Markdown, but there is no ready-made method on the EditText widget that serializes its spans back to markup. I am thinking of looping through the EditText to see which character is where and how it is formatted, but I think that has disadvantages too, because there are other things like bulleted lists and checkboxes.
I'm trying to implement 'wrap text around image', but I'd rather write it step by step on my own so I can understand it fully.
Can somebody tell me how to do so? Any websites worth recommending regarding this issue?
I think FlowTextView is exactly what you need.
I have recently been trying to learn DL4J but have run into some issues. There is an example of a neural network generating Shakespeare-like text based on an input character, but I can't seem to find anything that would indicate a possible way of generating a response to an input statement.
I would like to use an input string such as "Hello" and have it generate a response of varying length depending on the input. I would like to know whether this is possible using an LSTM, and I'd appreciate a pointer in the right direction, as I have no idea where to even start.
We have plenty of documentation on this, actually. This gives you a layout of what an RNN looks like:
http://deeplearning4j.org/usingrnns
The model you are looking at is character-level; in general, what you want is question answering. You may want to look at an architecture like this: https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf
If you are completely new to NLP, I would look at this class:
https://www.youtube.com/playlist?list=PLhVhwi0Pz282aSA2uZX4jR3SkF3BKyMOK
It covers question answering as well.
I am very confused by the number of different read and write methods in Java and couldn't find one that fits my needs.
I have a text file that looks like this:
CharName{
    Sheet{
        Vitality{
            Short=LP,Name=Life,Base=1000,Equip=0,Buff=0;
            Short=LPR,Name=Life regen per sec,Base=50,Equip=0,Buff=0;
        }
        Magic{
            Short=MP,Name=Mana,Base=100,Equip=0,Buff=0;
            Short=MPR,Name=Mana regen per sec,Base=5,Equip=0,Buff=0;
        }
    }
}
It's actually a lot bigger, but that much is sufficient to show what I mean.
I want to make a function that basically looks something like getInfoFromCharFile(sectionName, categoryName, rowNum, tagName).
Example: getInfoFromCharFile(Sheet, Vitality, 2, Base) should give me 50. I'd also like the counterpart for writing: saveInfoToCharFile(sectionName, categoryName, rowNum, tagName, newValue).
On top of that, when one of those files gets read, I want an easy method of 'loading it in' (reading everything systematically) - something like: find Sheet; find Vitality; get everything between the next { and }; split at ;; find Magic; get everything between the next { and }; and so on and so forth.
I have an idea how to do the splitting and processing, but how can I get it to 'load' everything in between the braces?
Note: I did read, watch and Google a lot about this, but couldn't find anything that gave me an idea of how to solve my problem. I've been sitting on this problem for weeks now! I used to just use properties, but that got really ugly really fast.
I would show you the code that currently 'loads' this, but the last time I had to edit it, it literally took me hours to understand it again - so, not very good code I guess, which is why I want to replace it.
I see two ways to do this:
The hard way:
You insist on using your custom format. In that case, you will probably have to use regular expressions to read your files, and you will have to be prepared to handle malformed input, etc.
You can use Matcher or Scanner to do the matching. If you feel adventurous, you can do state-based parsing by reading the characters one by one.
I actually did something like that a while ago; maybe you can take inspiration from the source... or even use it - last time I checked, it worked (especially the JONReader could be useful).
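A minimal sketch of that state-based approach, assuming the format from the question (getInfo mirrors the getInfoFromCharFile signature described above but takes the file content as a string; there is no error handling for malformed input):

```java
public class CharFileParser {

    // Returns the balanced-brace body that follows "name{",
    // tracking nesting depth character by character.
    public static String block(String text, String name) {
        int start = text.indexOf(name + "{");
        if (start < 0) return null;
        int i = start + name.length() + 1;
        int bodyStart = i;
        int depth = 1;
        while (i < text.length() && depth > 0) {
            char c = text.charAt(i);
            if (c == '{') depth++;
            else if (c == '}') depth--;
            i++;
        }
        // exclude the closing brace of this block
        return text.substring(bodyStart, i - 1);
    }

    // Drill into section -> category, split the rows at ';',
    // then split the chosen row at ',' and look up the tag.
    public static String getInfo(String text, String section,
                                 String category, int rowNum, String tag) {
        String body = block(block(text, section), category);
        String[] rows = body.trim().split(";");
        String row = rows[rowNum - 1];          // rows are 1-based in the question
        for (String pair : row.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv[0].trim().equals(tag)) return kv[1].trim();
        }
        return null;
    }

    public static void main(String[] args) {
        String file = "CharName{Sheet{Vitality{"
                + "Short=LP,Name=Life,Base=1000,Equip=0,Buff=0;"
                + "Short=LPR,Name=Life regen per sec,Base=50,Equip=0,Buff=0;"
                + "}}}";
        System.out.println(getInfo(file, "Sheet", "Vitality", 2, "Base")); // prints 50
    }
}
```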
The easy way:
Use a standard format like XML, JSON (your format is already quite close to it anyway!), or even CSV. There are a lot of solid parsers for those out there, and it might become as easy as charname.getMagic(), without worrying about the details.
If you decide to use JSON, which seems the most fitting choice for your approach, I suggest having a look at minimal-json or Jackson, depending on how you want to work with it.
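For reference, your sample could be represented in JSON roughly like this (the structure is my guess from your file; rows become arrays, so a 1-based rowNum lookup still maps cleanly to an index):

```json
{
  "CharName": {
    "Sheet": {
      "Vitality": [
        { "Short": "LP",  "Name": "Life",               "Base": 1000, "Equip": 0, "Buff": 0 },
        { "Short": "LPR", "Name": "Life regen per sec", "Base": 50,   "Equip": 0, "Buff": 0 }
      ],
      "Magic": [
        { "Short": "MP",  "Name": "Mana",               "Base": 100, "Equip": 0, "Buff": 0 },
        { "Short": "MPR", "Name": "Mana regen per sec", "Base": 5,   "Equip": 0, "Buff": 0 }
      ]
    }
  }
}
```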
I am stuck on a project at work that I do not think is really possible, and I am wondering if someone can confirm my belief that it isn't possible, or at least give me new options to look at.
We are doing a project for a client that involved a mass download of files from a server (easily done with ftp4j and a document name list), but now we need to sort through the data from the server. The client works with contracts and wants us to pull out relevant information such as: licensor, licensee, product, agreement date, termination date, royalties, restrictions.
Since the documents are completely unstandardized, is that even possible? I can imagine loading the files and searching them, but I would have no idea how to pull information such as the licensor and the restrictions out of a paragraph. These are not structured records but just long contracts. Even if I search for 'Licensor', it comes up multiple times in a document. The documents aren't even in a consistent file format: some are PDF, some are plain text, some are HTML, and I've even seen some that were as bad as a scanned image inside a PDF.
My boss keeps pushing me to work on this project, but I feel as if I am out of options. I primarily do web and mobile, so big data is really not my strong area. Does this sound possible to do in a reasonable amount of time? (We're talking about 1000 documents at the very minimum.) I have been working on this in Java.
I'll do my best to give you some information, as this is not my area of expertise. I would strongly consider writing a script that identifies the type of file you are dealing with and then calls the appropriate parsing methods to handle what you are looking for.
Since you are dealing with big data, Python could be pretty useful; JavaScript would be my next choice.
If your overall code is written in Java, it should be portable and flexible no matter which one you choose. Using a regex or a specific string search would be a good way to approach this.
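The "identify the type, then dispatch" idea could be sketched like this in Java (parserFor and the returned labels are made-up names; in your code, each branch would call the matching extractor instead):

```java
public class ExtractorDispatch {

    // Pick an extraction strategy from the file extension.
    // A real system might also sniff magic bytes, since extensions can lie.
    public static String parserFor(String filename) {
        String n = filename.toLowerCase();
        if (n.endsWith(".pdf"))  return "pdf";    // route to a PDF text extractor
        if (n.endsWith(".html") || n.endsWith(".htm")) return "html"; // HTML parser
        if (n.endsWith(".txt")) return "text";    // read directly
        return "unknown";                          // flag for manual review
    }

    public static void main(String[] args) {
        System.out.println(parserFor("Contract_2016.PDF")); // prints pdf
    }
}
```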
If you are concerned only with 'Licensor' followed by a name, you could identify the format of that particular instance and search for something similar using a regex you create. This can be extrapolated to the other fields.
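As a minimal sketch of that regex idea (the "Licensor: <name>" line layout is an assumption; real contracts will need several patterns and some disambiguation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LicensorFinder {

    // Grabs whatever follows "Licensor" (with an optional ':' or '-')
    // up to the end of the line; returns null if no match.
    public static String findLicensor(String text) {
        Pattern p = Pattern.compile("Licensor\\s*[:\\-]?\\s*(.+)");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String contract = "AGREEMENT\nLicensor: Acme Corp.\nLicensee: Foo Ltd.";
        System.out.println(findLicensor(contract)); // prints Acme Corp.
    }
}
```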
For getting text from an image, try using the APIs on this page:
How to read images using Java API?
Scanned Image to Readable Text
For text from a PDF:
https://www.idrsolutions.com/how-to-search-a-pdf-file-for-text/
Also, much of a PDF's content is stored as text streams (though often compressed), so once extracted you should most likely be able to search through it using a regex. That would be my method of attack, or possibly using String.split() and a string buffer you can append to.
For text from HTML doc:
Here is a cool HTML parser library: http://jericho.htmlparser.net/docs/index.html
A resource that teaches how to remove HTML tags and get the good stuff: http://www.rgagnon.com/javadetails/java-0424.html
If you need anything else, let me know. I'll do my best to find it!
Apache Tika can extract plain text from almost any commonly used file format.
But in the situation you describe, you would still need to analyze the text, as in "natural language recognition". That's a field where, despite some advances (made by dedicated research teams spending many person-years!), computers still fail pretty badly (heck, even humans fail at it sometimes).
With the number of documents you mention (around 1000), hire a temp worker and have them sorted/tagged by human brain power. It will be cheaper, and you will have fewer misclassifications.
You can use Tika for text extraction. If there is a fixed pattern, you can extract information using regex or XPath queries. Another solution is to use Solr, as shown in this video. You don't need Solr, but watch the video to get the idea.