I have some document templates (.dotx files) with placeholders. I need to read a template and replace the placeholders with actual text that comes from database columns. I can do this using docx4j's WordprocessingMLPackage, but the problem is that some of the database columns contain HTML, because the text comes from rich text editor fields. When I replace a placeholder with this text, the raw HTML markup is copied into the Word document. I want the HTML to be rendered as formatted text in the document instead. How can I achieve this?
You can use https://github.com/plutext/docx4j-ImportXHTML either directly, or via content control databinding OpenDoPE extensions.
Related
I want to use PDFBox to extract text from Persian PDF files, but it returns "?" for all the Persian characters (Latin words in the same document are returned correctly).
How can I fix it? Any advice?
Sadly, the provided file has the Persian text as vector graphics, not as text from fonts, so it cannot be extracted. You'll have to use OCR for it.
See also the text extraction FAQ:
How come I am not getting any text from the PDF document?
Text extraction from a pdf document is a complicated task and there
are many factors involved that effect the possibility and accuracy of
text extraction. It would be helpful to the PDFBox team if you could
try a couple things.
Open the PDF in Acrobat and try to extract text from there. If Acrobat
can extract text then PDFBox should be able to as well and it is a bug
if it cannot. If Acrobat cannot extract text then PDFBox ‘probably’
cannot either.
It might really be an image instead of text. Some PDF documents are
just images that have been scanned in. You can tell by using the
selection tool in Acrobat, if you can’t select any text then it is
probably an image.
While fetching an email, I am converting its HTML text to plain text using the Jericho renderer.
In the example input the URL is wrapped in anchor tags; the renderer removes all hyperlinks during parsing, so the output looks like this: http://www.google.com.
The problem is how to display this as a link in the UI. Can anyone tell me an efficient way to handle this?
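One possible approach, sketched here under assumptions: if the UI can render HTML, the plain text produced by the renderer could be scanned for URLs and each one wrapped in an anchor tag again. The class name `Linkifier`, its methods, and the URL pattern are hypothetical and deliberately simple; the pattern takes everything up to the next whitespace, so trailing punctuation such as a full stop would end up inside the link.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jericho: turns bare URLs in plain text into anchors.
final class Linkifier {
    // Assumption: a URL starts with http:// or https:// and ends at whitespace, '<', '>' or '"'.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Returns HTML in which every matched URL is wrapped in <a href="...">...</a>.
    static String linkify(String plainText) {
        Matcher m = URL.matcher(plainText);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the URL, escaped so it is safe to render as HTML.
            out.append(escapeHtml(plainText.substring(last, m.start())));
            String url = escapeHtml(m.group());
            out.append("<a href=\"").append(url).append("\">").append(url).append("</a>");
            last = m.end();
        }
        out.append(escapeHtml(plainText.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Search at http://www.google.com today"));
    }
}
```

Whether this is efficient enough depends on the UI; the other option would be to keep the original anchor tags instead of rendering them away.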
Can anyone suggest how I can read and display bold/italic characters from a file in Java/Android? I have some italic characters that I read from a file under assets and then display in a TextView. Once I get the contents into a String, the bold/italic formatting is lost.
Plain text files
You can achieve this by adding HTML tags to your text, like this:
This text is <i>italic</i> <b>bold</b>
<u>underlined</u> <b><i><u>bold italic underlined</u></i></b>
and then you can use the Html class, which processes HTML strings into displayable styled text.
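// textString is the String after you retrieve it from the file.
// Note: on API 24+ this one-argument overload is deprecated in favour of
// Html.fromHtml(textString, Html.FROM_HTML_MODE_LEGACY).
textView.setText(Html.fromHtml(textString));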
Microsoft Word or Excel files
I didn't try it myself but you can look at the Apache POI library.
I have the following scenario:
<xml>Text text text<a><b></b>Test text</a> text text text<c>text text</c><d><d/><xml>
How can I parse this XML so that I keep all the information (parse it into a tree?). I need to keep the text as well as the sequence and position of the tags in the text.
Thanks for your help!
EDIT: I already tried using a Java parser, but I didn't manage to get it to work.
This isn't well-formed XML, so you can't use a standard parser.
You must write your own.
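As a starting point for such a hand-written parser, here is a minimal sketch of a lenient scanner that splits the input into tag and text tokens while keeping their order and character offsets. The class `LenientMarkupScanner` and its `Token` type are hypothetical names for illustration. It does not build a tree, because that requires a decision on how to treat unmatched tags such as the trailing `<xml>` in the sample, and it silently skips a stray `<` that never gets closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lenient scanner: no well-formedness checks, only tokenization.
final class LenientMarkupScanner {

    // One piece of the input: either a tag like <a> or </a>, or a run of text.
    static final class Token {
        final boolean isTag;
        final String text;
        final int offset; // position of the token in the original input

        Token(boolean isTag, String text, int offset) {
            this.isTag = isTag;
            this.text = text;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return (isTag ? "TAG  " : "TEXT ") + offset + " " + text;
        }
    }

    // Either something in angle brackets, or a run of characters without '<'.
    private static final Pattern TOKEN = Pattern.compile("<[^<>]*>|[^<]+");

    // Returns the tokens in document order; a '<' with no closing '>' is skipped.
    static List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String piece = m.group();
            tokens.add(new Token(piece.startsWith("<"), piece, m.start()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "<xml>Text text text<a><b></b>Test text</a> text text text"
                + "<c>text text</c><d><d/><xml>";
        for (Token t : scan(sample)) {
            System.out.println(t);
        }
    }
}
```

A tree could then be built from the token list with a stack of open tags, closing any unmatched ones at the end of input.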
Can anybody give me an idea whether I can store text attributes (like bold, italic, etc.) modified in a JEditorPane in an MS SQL Server database?
The user should be able to modify the text attributes, store them in the DB, and retrieve them from the DB when needed, with the same styling.
You have to define a proper EditorKit. E.g. HTMLEditorKit will allow you to store the text as HTML with all the style info. Just use getText() and setText() to work with the formatted content.
Alternatively, you can write your own Reader/Writer; see for example http://java-sl.com/editor_kit_tutorial_reader_writer.html
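To illustrate the HTML route, here is a rough sketch that drives HTMLEditorKit directly, without a visible JEditorPane, to turn a styled document into an HTML string and back. The class `StyledTextStore` and its method names are hypothetical. The database step is left out; the assumption is that the resulting string would go into an ordinary text column (for example NVARCHAR(MAX)) through JDBC.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper showing the HTML round trip that an HTML-based editor pane relies on.
final class StyledTextStore {

    // Parses an HTML string into a styled Swing document (the "load from DB" direction).
    static Document parse(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return doc;
    }

    // Serializes a styled document back to an HTML string (the "save to DB" direction).
    static String serialize(Document doc) {
        StringWriter out = new StringWriter();
        try {
            new HTMLEditorKit().write(out, doc, 0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not write HTML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>bold</b> and <i>italic</i></p></body></html>";
        // The printed markup is what would be stored; the kit reformats it,
        // so it will not be character-for-character identical to the input.
        System.out.println(serialize(parse(html)));
    }
}
```

With a JEditorPane whose content type is text/html, getText() and setText() should give the same kind of round trip, as described above.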