How to use Japanese language when convert html to pdf? - java

I have html file with Japan language, I converted pdf file. But it don't show text Japanese.
This is my code :
final Charset charset = Charset.forName("UTF-8");
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("html.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(filename), charset);
document.close();
example :
I have a line text : "ユーザロールを持つユーザだけが利用できるコンテンツ"
I want to add in pdf file by java then show in pdf file.

You need to registry JAPANESE Font
public static final String DEST = "results/fonts/chinese.pdf";
public static final String FONT = "resources/fonts/NotoSansCJKsc-Regular.otf";
public static final String CHINESE = "\u5341\u950a\u57cb\u4f0f";
public static final String JAPANESE = "\u8ab0\u3082\u77e5\u3089\u306a\u3044";
public static final String KOREAN = "\ube48\uc9d1";
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new NotoExample().createPdf(DEST);
}
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(DEST));
document.open();
Font font = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Paragraph p = new Paragraph(TEXT, font);
document.add(p);
document.add(new Paragraph(CHINESE, font));
document.add(new Paragraph(JAPANESE, font));
document.add(new Paragraph(KOREAN, font));
document.close();
}
}

Related

How can replace words in pdf file in java?

I tried code like this, but it's not replacing words. Is it correct way to replace words in pdf file?
#SpringBootApplication
public class DocReadWriteApplication {
public static final String SRC = "../Downloads/Debt LOI.pdf";
public static final String DEST = "../Downloads/hello.pdf";
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
file.getParentFile().mkdirs();
manipulatePdf(SRC, DEST);
}
public static void manipulatePdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfDictionary dict = reader.getPageN(1);
PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
PdfArray refs = null;
if (dict.get(PdfName.CONTENTS).isArray()) {
refs = dict.getAsArray(PdfName.CONTENTS);
} else if (dict.get(PdfName.CONTENTS).isIndirect()) {
refs = new PdfArray(dict.get(PdfName.CONTENTS));
}
for (int i = 0; i < refs.getArrayList().size(); i++) {
PRStream stream = (PRStream) refs.getDirectObject(i);
byte[] data = PdfReader.getStreamBytes(stream);
stream.setData(new String(data).replace("transaction", "Data").getBytes());
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
}
}
Anybody have done like this?

Word found unreadable content in .docx after replacing content through docx4j

I am getting error Word found unreadable content in .docx after replacing content through docx4j.
Please find code snippet.
I am using docx4j-6.1.2 jar
public class Testt {
public static void main(String[] args) throws Exception {
final String TEMPLATE_NAME = "D://fileuploadtemp//123.docx";
InputStream templateInputStream = new FileInputStream(TEMPLATE_NAME);
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String xpath = "//w:r[w:t[contains(text(),'TEST')]]";
List<Object> list = documentPart.getJAXBNodesViaXPath(xpath, true);
for (Object obj : list) {
org.docx4j.wml.ObjectFactory factory = new org.docx4j.wml.ObjectFactory();
org.docx4j.wml.Text t = factory.createText();
t.setValue("\r\n");
((R) obj).getContent().clear();
((R) obj).getContent().add(t);
}
OutputStream os = new FileOutputStream(new File("D://fileuploadtemp//1234.docx"));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
wordMLPackage.save(outputStream);
outputStream.writeTo(os);
os.close();
outputStream.close();
templateInputStream.close();
}
}

Code will not accept Freesans as a resource file

I am trying to put mathmatical symbols into my PDF. The error, java.io.IOException: resources/fonts/FreeSans.ttf not found as file or resource.
public class CreateTable {
public static final String FONT = "resources/fonts/FreeSans.ttf";
public static void main(String[] args) throws FileNotFoundException, DocumentException {
BaseFont bf = null;
try {
bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
} catch (IOException e) {
e.printStackTrace();
}
Font f = new Font(bf, 12);
Document document = new Document(); // Whole page is consider as docuemnt so we need object .
PdfPTable table = new PdfPTable(7); //Create Table Object
//Adding alignment and cells with defining rows
table.getDefaultCell().setHorizontalAlignment(Element.ALIGN_CENTER);
table.addCell("");
table.addCell("Age \u00AC");
table.addCell("Location");
table.addCell("Anotherline");
table.setHeaderRows(1);}
}
The file is in the resources folder and under fonts. Did I do something wrong?
From your error it seems your program not getting file so throwing exception.
IOException has sub classes such as FileNotFoundException . Make sure file path is correct and resource directory has ttf file .

Adding UNICODE emoticon on pdf with itext

I have some problem adding unicode emoticon on pdf created with itext pdf. I tried with this and itextpdf core 5.5.13
public class MathSymbols {
public static final String DEST = "EXAMPLE.pdf";
public static final String FONT = "/res/fonts/arialuni.ttf";
public static String TEXT ;
public static void main(String[] args) throws IOException, DocumentException {
File file = new File(DEST);
TEXT = "this "+Character.toChars(0x1F600)+" string \uD83D\uDE00 contains \ud83d\ude00 special \u2609 characters like this \u2208, \u2229, \u2211, \u222b, \u2206";
new MathSymbols().createPdf(DEST);
}
public void createPdf(String dest) throws IOException, DocumentException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(dest));
document.open();
BaseFont bf = BaseFont.createFont(FONT, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
Font f = new Font(bf,12);
Paragraph p = new Paragraph(TEXT, f);
document.add(p);
document.close();
}}
I have the same proble on itextpdf 7.x with this snippet
public class Main {
public static final String DEST = "example.pdf";
public static void main(String args[]) throws IOException {
File file = new File(DEST);
new Main().createPdf(DEST);
}
public void createPdf(String dest) throws IOException {
PdfWriter writer = new PdfWriter(dest);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
PdfFont f = PdfFontFactory.createFont("/resources/arialuni.ttf", PdfEncodings.IDENTITY_H,true);
Paragraph p = new Paragraph("H\u2082SO\u2074 1 \uD83D\uDE00 contains \ud83d\ude00 spe \u2702 cial \u2609 characters like this \u2208, \u2229, \u2211, \u222b, \u2206").setFont(f).setFontSize(10);
document.add(p);
document.close();
}}
I tried with different fonts and different way in java like here.
But I only obtaing a withe space, or a square, in the pdf and no emoticon

Extracting font color along with font type from PDF using PDFBox

I need to extract Font color as well as Font type[E.g.-Black, Tahoma, Bold] from PDF by Java(Using PDFBox). Below is the code I have written to extract font type and embed the same in the extracted text.
public class PDFParse {
public static void main(String args[]) {
PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
File file = new File("Sample Bill.pdf");
try {
PDFParser parser = new PDFParser(new FileInputStream(file));
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper() {
String prevBaseFont = "";
protected void writeString(String text, List<TextPosition> textPositions) throws IOException
{
StringBuilder builder = new StringBuilder();
for (TextPosition position : textPositions)
{
String baseFont = position.getFont().getBaseFont();
if (baseFont != null && !baseFont.equals(prevBaseFont))
{
builder.append('[').append(baseFont).append(']');
prevBaseFont = baseFont;
}
builder.append(position.getCharacter());
}
writeString(builder.toString());
}
};
pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(5);
pdfStripper.setSortByPosition(true);
String parsedText = pdfStripper.getText(pdDoc);
PrintWriter out = new PrintWriter("sample.txt");
out.println(parsedText);
out.close();
System.out.println(parsedText);
}
}
How to extract the font color for each word and embed the same in the same extracted file? Thanks :)

Categories