Apache FOP - PDF creation with Russian text - Java

I have a small Java application that creates (besides other stuff) a PDF file using Apache's FOP 1.0.
Everything works fine with Latin letters, but not with other scripts - e.g. Cyrillic.
I don't think it is the usual missing-fonts problem, since the bookmarks within the PDF file are fine (unfortunately I can't add pictures to this post).
Any ideas what I'm doing wrong?
Thanks for your help!
Andreas

In your fo:block you need to specify the font you want to use:
<fo:block font-family="MS Mincho" font-size="12pt" font-weight="normal" space-after="5mm" background-color="#8BAF3F" color="white">
Of course, the font must be available to FOP as well.
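If the font isn't one of the base-14 PDF fonts, "available" means registering and embedding it through FOP's user configuration. A minimal fop.xconf sketch for FOP 1.0 - the font path here is just an example, point it at a TTF that actually contains Cyrillic glyphs:

```xml
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- embed a font that has Cyrillic glyphs and map it to a family name -->
        <font kerning="yes" embed-url="file:///C:/Windows/Fonts/arial.ttf">
          <font-triplet name="Arial" style="normal" weight="normal"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```

Load it before building the Fop instance, e.g. with fopFactory.setUserConfig(new File("fop.xconf")). Without such a registration FOP falls back to the base-14 fonts, which only cover WinAnsi characters - which would explain the '#' substitutes in the page body while the bookmarks (rendered by the viewer's own fonts) look fine.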

Thanks for the hints.
I had set the font-family to 'Verdana', which may or may not contain Cyrillic glyphs.
Additionally, I set the font-family in the 'simple-page-master', so all pages using this master should use that font.
Based on your hints I changed the font-family to 'Arial'.
I also set the font-family explicitly in one block, just as a simple test.
I even tried changing the system language to Russian.
Unfortunately nothing worked. Each font-family change (Arial, Courier, Times, MS Mincho, MAC C Times) is clearly reflected in the rendered style, but the Cyrillic characters are always shown as '#'.
And, most confusing, the bookmarks are still fine...

Related

PDF/iText : replace font defs

I'm using iText (Java lib) to process an already created PDF file.
What I would like to achieve is to replace fonts that are metric-compatible with a PDF base font with that PDF base font. This would make the PDF more "compliant" and potentially also smaller.
Here's how it would go:
1. Loop through the fonts used in the PDF.
2. If a font is metric-compatible with a PDF base font, replace the font name with that base font (but maintain the PDF resource name, e.g. /F13, so that we do not need to touch any text objects). Since iText embeds the AFM files for the PDF base fonts in its jar, I'm assuming iText actually has enough knowledge to make this assessment. I would probably have to look at the serif/sans-serif and monospace flags as well to know whether to swap in Helvetica, Times or Courier.
3. Further, if metric-compatible: remove any font embeds for that font (since we've replaced it with a PDF base font there's no need to embed anything .. size matters!)
An example:
An existing PDF file uses "Calibri", "Arial" and "Times". Here's how each of those should be handled.
Calibri. This font doesn't have a metric-compatible cousin among the PDF base fonts so processing for this font resource will be skipped.
Arial. This font has a metric-compatible cousin among the PDF base fonts, namely "Helvetica". The name of the font resource (attribute BaseFont I suppose) will be changed to "Helvetica" and any potential embeds will be removed.
Times. This font is already a PDF base font. Skip processing. (we may consider unembedding here if there's something to unembed, but I already know how to do that so not part of the question)
I basically get stuck on the step which is to determine metric-compatibility. Any help is greatly appreciated.
(Note: An answer based on iText 5.x is perfectly ok as I feel the recent iText 7 is still somewhat undocumented)
UPDATE
As pointed out, a number of additional checks would need to be carried out in order to do a safe replacement:
Font encoding compatibility. Not really a problem for me as fonts in the documents I'll be processing will be using WinAnsiEncoding.
Available chars in the font. Not really a problem for me as I'll only be processing documents that use only ISO 8859-1 chars. Furthermore: if the PDF contains an embedded subset of a font, then I'll have easily accessible knowledge about exactly which chars are used in the document for that font.
I'm sure I can figure out how to check for both these conditions. (I'm blissfully naive)
I'm not trying to build a general tool; I know where the PDFs I'll be processing come from. In any case, I guess the PDF holds enough information to skip the font substitution whenever it can't be determined that the substitution is "safe".
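A sketch of the name-matching part of step 2, under the assumption that a fixed table of well-known metric-compatible families (plus stripping embedded-subset prefixes like "ABCDEF+") is acceptable. The family list below is an assumption based on commonly cited metric-compatible pairs; a careful tool would confirm the widths against iText's bundled AFM data before trusting the swap:

```java
import java.util.HashMap;
import java.util.Map;

public class BaseFontSubstitution {

    // Assumed metric-compatible pairs; extend/verify against AFM widths.
    private static final Map<String, String> METRIC_COMPATIBLE = new HashMap<>();
    static {
        METRIC_COMPATIBLE.put("arial", "Helvetica");
        METRIC_COMPATIBLE.put("arialmt", "Helvetica");          // PostScript name
        METRIC_COMPATIBLE.put("arial-boldmt", "Helvetica-Bold");
        METRIC_COMPATIBLE.put("liberationsans", "Helvetica");
        METRIC_COMPATIBLE.put("timesnewroman", "Times-Roman");
        METRIC_COMPATIBLE.put("timesnewromanpsmt", "Times-Roman");
        METRIC_COMPATIBLE.put("liberationserif", "Times-Roman");
        METRIC_COMPATIBLE.put("couriernew", "Courier");
        METRIC_COMPATIBLE.put("couriernewpsmt", "Courier");
        METRIC_COMPATIBLE.put("liberationmono", "Courier");
    }

    /** Strips a subset prefix such as "ABCDEF+" and normalizes the name. */
    static String normalize(String baseFontName) {
        String name = baseFontName;
        int plus = name.indexOf('+');
        if (plus == 6) {                 // subset tags are exactly six letters
            name = name.substring(plus + 1);
        }
        return name.toLowerCase().replace(" ", "");
    }

    /** Returns the PDF base font to swap in, or null if none is known. */
    public static String substituteFor(String baseFontName) {
        return METRIC_COMPATIBLE.get(normalize(baseFontName));
    }
}
```

With the substitute name in hand, the iText 5 side would be rewriting the font dictionary's /BaseFont entry and dropping the FontFile/FontFile2/FontFile3 streams from the font descriptor, along the lines of iText's published unembedding examples.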

Recognizing colors in text from a docx

I'm trying to write a program that reads a docx file and checks whether some of the text is colored. For instance, imagine if all the words bolded in this sentence were actually written in some arbitrary color. I want my program to recognize that the words "words bolded in this sentence were actually written in some arbitrary color" are colored.
Then after recognizing the coloration, I want to be able to edit the recognized text based on the color. For instance, if the bolded text above were red, I want to add "<Red>" tags around the text, while still keeping intact the rest of the sentence that isn't colored.
I was originally using ZipInputStream and ZipEntry to get "word/document.xml", and I had planned on pulling the text and colors from there, but I feel like that would get too confusing after a while. I also tried using Apache POI, but I don't think it's able to recognize colors. Docx4j looks promising, though. Any thoughts, suggestions, or sample code to get me started?
Font color is a run property:
<w:r>
<w:rPr>
<w:color w:val="FF0000"/>
</w:rPr>
<w:t>red</w:t>
</w:r>
docx4j provides three ways to do stuff with that:
via XPath
via TraversalUtil
via XSLT
I'd recommend TraversalUtil, since XPath is dependent on JAXB's support for it, which isn't always robust (at least in the Sun/Oracle reference implementation).
See the finders package for examples of using this.
But beyond this, the challenge you face is that the color property could be specified via a style (or even as a document default). If you want to take this into account, you need to be looking at the effective run properties (which is what docx4j's PDF output does).

Loading custom fonts at runtime for use with JTextPane

Thanks for your time. My question is regarding the display of different fonts within the one JTextPane. My client wishes to view a word in two different languages within the one field. They've explicitly specified that they wish the different languages (namely Amharic, Arabic, Coptic and Hebrew) to be shown with different fonts. These are obviously non-standard fonts and I can't rely on the user having the required fonts installed on their OS.
From my research I've found that I can load a font file at runtime and set the JTextPane's font accordingly, which is fine if I just wanted to use one font, not two. I've also read about adding fonts to the OS' font directory or the JRE's font directory, outlined here.
I was hoping, however, that there might be a way to use the fonts without altering the user's OS. Am I out of luck?
Thanks again for your time and I look forward to any replies with bright ideas!
From my research I've found that I can load a font file at runtime and set the JTextPane's font accordingly, which is fine if I just wanted to use one font, not two.
A JTextPane can use multiple fonts.
Check out the section from the Swing tutorial on Text Component Features for an example of playing with the attributes of the text in the text pane.
Edit:
However, to use multiple fonts, the only way I have found is to create a MutableAttributeSet, set its "FontFamily" attribute (a string) to the desired font name, and then assign the attribute set to the text using StyledDocument.setCharacterAttributes.
Reading the API for the createFont() method it looks like you should be able to use:
GraphicsEnvironment.registerFont(Font)
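Putting the two pieces together, a minimal sketch: load and register each bundled font at runtime, then apply its family name to just the ranges that need it. The font file path and family names are placeholders for whatever you actually ship:

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.io.File;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;
import javax.swing.text.StyledDocument;

public class MultiFontPane {

    /**
     * Loads a bundled TTF and registers it with the graphics environment,
     * after which it can be referenced by family name like any installed
     * font. The File is a placeholder for wherever you ship the font.
     */
    public static void registerBundledFont(File ttf) throws Exception {
        Font font = Font.createFont(Font.TRUETYPE_FONT, ttf);
        GraphicsEnvironment.getLocalGraphicsEnvironment().registerFont(font);
    }

    /** Applies a font family to one range of an already-filled document. */
    public static void applyFamily(StyledDocument doc, int start, int length, String family) {
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        StyleConstants.setFontFamily(attrs, family);
        doc.setCharacterAttributes(start, length, attrs, false);
    }
}
```

After registering, you could call applyFamily(pane.getStyledDocument(), ...) once per language range - e.g. a hypothetical "Abyssinica SIL" family for the Amharic span and a Hebrew family for the Hebrew span - so a single JTextPane mixes all four fonts without touching the user's OS.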

Khmer Unicode in iText

I'm very new to iText.
Now I want to display Khmer Unicode in iText, but I can't get it to work.
Does anyone know how to do it?
Please advise me.
Regards,
LeeJava
As you, the owner of the question, noted in another post, iText does not support Khmer Unicode.
I summarized this again on my blog: http://ask.osify.com/qa/287
The only way forward is to modify iText's source code, but so far no one has claimed they will work on it, so iText is not the right choice for Khmer Unicode until the source is modified.
As an alternative solution, we can use an OpenOffice document with JODConverter; I'm still experimenting with this one, but a quick test shows it working fine.
There is just another issue with creating the OpenOffice document so that it renders Khmer Unicode, as described briefly in http://ask.osify.com/qa/318
Updated 13/01/2016
I have added the source code sample for the rendering: http://ask.osify.com/qa/613
The rendering customization with iText for Khmer Unicode added in github: https://github.com/Seuksa/iTextKhmer
If it's anything like a normal font, you're going to want something like:
BaseFont bf = BaseFont.createFont("Font location", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font myFont = new Font(bf, 33);
document.add(new Paragraph("...", myFont));
Most fonts are located under C:\Windows\Fonts.

Is it safe to use Unicode characters in Java GUI?

For a play button in a Java GUI I currently use a button with the label set to ' ▻ ' (I found this symbol in a Unicode symbol table). As I understand it, it is better not to use such symbols directly in source code but rather to use the explicit Unicode escape, \u25BB in this example, because some tools (editor, ...) might not be able to handle files with non-ASCII content (is that correct?).
Assuming the compiled class contains the correct character, under which circumstances would the GUI not show the intended symbol on a current PC operating system? Linux, Windows, Mac should all support UTF-16, right? Do available fonts or font settings cause problems to this approach?
(Of course I could add an icon, but why add extra resources if a symbol should already be available... given that this is a portable solution)
Do available fonts or font settings cause problems to this approach?
Unfortunately they do. You can use Unicode in the source code, of course, but the problem is that Unicode currently has 246,943 code points assigned, so obviously no font has even a fraction of those defined. You'll get squares or some other weird rendering when the glyph isn't available. I've had cases where relatively simple symbols such as ³ render fine on one Windows computer and show up as squares on the next, almost identical computer. All sorts of language and locale settings and minor version changes affect this, so it's quite fragile.
AFAIK there are few, if any, characters guaranteed to be always available. Java's Font class has some methods such as canDisplay and canDisplayUpTo, which can be useful to check this at runtime.
Instead of using icons, you could bundle some good TrueType font that has the special characters you need, and then use that font everywhere in your app.
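The runtime check with canDisplay can be sketched like this; the text fallback is only illustrative (an Icon, as suggested above, is the robust fallback):

```java
import java.awt.Font;

public class PlayButtonFactory {

    // U+25BB WHITE RIGHT-POINTING POINTER, the symbol from the question
    static final char PLAY = '\u25BB';

    /**
     * Uses the Unicode glyph if the given font can actually render it,
     * otherwise falls back to a plain-text label (illustrative only).
     */
    public static String playLabel(Font font) {
        return font.canDisplay(PLAY) ? String.valueOf(PLAY) : "Play";
    }
}
```

You would call this with the button's font, e.g. playLabel(button.getFont()), before setting the label, so users whose fonts lack the glyph never see a square.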
I currently use a button with the label set to ' ▻ '
Rather than a text label, I always use JButton(String text, Icon icon); with an Icon it doesn't matter whether this or that font is available, UTF-16 or not.
Most editors support Unicode, so go ahead.
Look at this post: Eclipse French support
If you are using a simple editor like Notepad, then when saving choose the UTF encoding below the file name ( http://www.sevenforums.com/software/72727-how-make-notepad-save-txt-files-unicode.html )
