Can anyone suggest how I can read and display bold/italic characters from a file in Java/Android? I have some italic characters that I read from a file under assets and then display in a TextView. Once I get the contents into a String, the bold/italic formatting is lost.
Plain text files
You can achieve this by adding HTML tags to your text, like this:
This text is <i>italic</i> <b>bold</b>
<u>underlined</u> <b><i><u>bold italic underlined</u></i></b>
and then you can use the Html class (android.text.Html), which processes HTML strings into displayable styled text.
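// textString is the String after you retrieve it from the file
// On API 24+, the one-argument overload is deprecated; use
// Html.fromHtml(textString, Html.FROM_HTML_MODE_LEGACY) there instead.
textView.setText(Html.fromHtml(textString));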
Microsoft Word or Excel files
I haven't tried it myself, but you can look at the Apache POI library.
Related
I want to use PDFBox to extract text from Persian PDF files, but it returns "?" for all the Persian characters (it correctly returns the Latin words in the same document).
How can I fix it? Any advice?
Sadly, the provided file has the Persian text as vector graphics, not as text from fonts, so it cannot be extracted. You'll have to use OCR for it.
See also the text extraction FAQ:
How come I am not getting any text from the PDF document?
Text extraction from a pdf document is a complicated task and there
are many factors involved that affect the possibility and accuracy of
text extraction. It would be helpful to the PDFBox team if you could
try a couple things.
Open the PDF in Acrobat and try to extract text from there. If Acrobat
can extract text then PDFBox should be able to as well and it is a bug
if it cannot. If Acrobat cannot extract text then PDFBox ‘probably’
cannot either.
It might really be an image instead of text. Some PDF documents are
just images that have been scanned in. You can tell by using the
selection tool in Acrobat, if you can’t select any text then it is
probably an image.
If I open the PDF and search for these bold letters, the browser finds them. What type of text is that? How can I decode it?
I have some document templates (.dotx files) with placeholders. I need to read a template and replace the placeholders with actual text that comes from database columns. I can do this using docx4j's WordprocessingMLPackage, but the problem is that some of the database columns contain HTML code, because the text comes from rich text editor fields. When I replace a placeholder with this text, the raw HTML code is copied into the document. I want to convert that HTML into formatted text and write it into the document. How can I achieve this?
You can use https://github.com/plutext/docx4j-ImportXHTML either directly, or via content control databinding OpenDoPE extensions.
Hi there. I have a Java program, and I want to know how to print its result to a text file without losing its colour. The output is in colour, and I want the printed result to keep the colours. Thanks.
You can't.
Plain text is just that. There is no formatting in a plain text file that lets you specify color/font/size.
However, if you are displaying the text in a Bash shell or have configured your Windows command console correctly, you could use ANSI escape codes to format the text.
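As a rough illustration of the ANSI escape code idea, here is a minimal sketch using only the standard library. The class name AnsiColor and its constants are made up for this example; the escape sequences themselves are the standard SGR codes (31 = red, 32 = green, 1 = bold, 0 = reset). It assumes the output ends up in a terminal that interprets these codes.

```java
// Minimal sketch: wraps text in ANSI SGR escape sequences.
// AnsiColor and its members are illustrative names, not part of any library.
final class AnsiColor {
    static final String RESET = "\u001B[0m";  // clears all styling
    static final String RED = "\u001B[31m";
    static final String GREEN = "\u001B[32m";
    static final String BOLD = "\u001B[1m";

    private AnsiColor() {}

    // Returns the text preceded by the given style code and followed by a reset.
    static String colorize(String text, String style) {
        return style + text + RESET;
    }

    public static void main(String[] args) {
        System.out.println(colorize("error", RED));
        System.out.println(colorize("ok", GREEN));
        System.out.println(colorize("important", BOLD));
    }
}
```

If you redirect this output to a file, the escape bytes are stored as ordinary characters. A plain text editor will show them as raw codes; the colours only reappear when the file is printed to a terminal that understands them.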
You can't. Text files don't have colors.
You could wrap the text in HTML tags and style it with CSS (there are probably libraries that do that for you). The resulting HTML file can be viewed in a web browser.
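A hand-rolled version of the HTML approach might look like the sketch below. The names HtmlColor, span, page and the output file result.html are all hypothetical and chosen only for this example. It assumes the colour value is a trusted CSS colour name and only the text itself needs escaping.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Minimal sketch: writes coloured program output as an HTML file.
// HtmlColor and its methods are illustrative names, not a library API.
final class HtmlColor {
    private HtmlColor() {}

    // Escapes the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wraps escaped text in a span with an inline CSS colour.
    // cssColor is assumed to be trusted (e.g. "red"); it is not escaped.
    static String span(String text, String cssColor) {
        return "<span style=\"color:" + cssColor + "\">" + escape(text) + "</span>";
    }

    // Builds a complete page; <pre> keeps line breaks and spacing.
    static String page(String body) {
        return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head><body><pre>"
                + body + "</pre></body></html>\n";
    }

    public static void main(String[] args) throws IOException {
        String body = span("PASS", "green") + "\n" + span("FAIL: a < b", "red");
        Files.write(Paths.get("result.html"), page(body).getBytes(StandardCharsets.UTF_8));
    }
}
```

Opening result.html in a browser should show the two lines in their colours, with the "<" displayed literally because it was escaped.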
You could also use ANSI escape codes to format your text (e.g. https://github.com/fusesource/jansi).
I have a PDF file. I want to replace a few paragraphs in it with text from other files (.doc, .docx, .xls). How can I scan and edit the text paragraph by paragraph or sentence by sentence within the same PDF document?
Thanks a lot in advance.
You can use the PDFBox library (http://pdfbox.apache.org/) to extract the text from the PDF, and the Apache POI library to extract the data from the Word or Excel file (HSSF handles Excel .xls; Word files are handled by HWPF for .doc and XWPF for .docx). Then you can do all your manipulations and finally create a new PDF document with the updated text.