ă ș ț characters missing from pdf generated from html with PdfWriter - java

I am trying to convert some HTML content to a PDF using the iText PdfWriter, like this:
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
InputStream stream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"));
document.close();
but the ă ș ț characters are missing from the generated PDF. I have tried setting the encoding and setting the font, with no luck. In both attempts I used a font provider and passed it as a parameter to the parseXHtml method.
First I set the encoding, but nothing changed.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
fontProvider.setUseUnicode(true);
fontProvider.defaultEncoding = BaseFont.CP1257;
I also tried setting the font, but it was not applied to the pdf.
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(PATH_TO_TTF_FONT_FILE_HOSTED_ON_S3);
In both cases I then passed the font provider to parseXHtml:
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"), fontProvider);
Is there any way I could use the PdfWriter to convert all characters correctly from html to pdf?
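A quick check that may help when diagnosing the encoding attempt above: BaseFont.CP1257 names the Baltic code page, which as far as I can tell has no slots for ă, ș or ț, so choosing it as the default encoding would not be expected to bring those characters back. The sketch below uses only the JDK (no iText) and asks the charset encoder whether the three characters are representable. The class and method names are made up for illustration, and it assumes the JDK's "windows-1257" charset corresponds to the encoding iText calls Cp1257.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    // Returns true if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes for the three characters, so the source file encoding does not matter.
        String sample = "\u0103\u0219\u021B";
        // Assumption: "windows-1257" is the JDK name for the code page iText calls Cp1257.
        Charset cp1257 = Charset.forName("windows-1257");
        System.out.println("windows-1257 can encode sample: " + canEncode(sample, cp1257));
        System.out.println("UTF-8 can encode sample: " + canEncode(sample, StandardCharsets.UTF_8));
    }
}
```

If the single-byte code page cannot encode the text, a Unicode encoding such as Identity-H, combined with a font that actually contains the glyphs and is referenced from the HTML, is probably the direction to investigate; that part is not verified here.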

Related

How to change PDF margin while converting .docx to pdf using opensagres

I'm using this code to convert an XWPFDocument to PDF:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(xwpfDocument, baos, options);
The converter is setting wrong margins on the pdf. Here is the original .docx:
And this is the pdf after conversion:
How can I retain the original left and right margins?

How to convert an html String to a PDF InputStream?

If we have a string containing the content of an HTML page, how can we convert that string to a PDF document and get the result as an InputStream?
I'm trying to use iText with XMLWorkerHelper, and the following code works, but the problem is that I don't want the output in a file. I have tried several variations to get the result as an InputStream that I could convert to a PrimeFaces StreamedContent, but without success. How can we do it?
Is there another technique that we can use to solve this problem?
The motivation is to take XHTML files which are already rendered and output them as a PDF to be downloaded by the user.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document,
new FileOutputStream("results/loremipsum.pdf"));
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream("/html/loremipsum.html"));
document.close();
If you need an InputStream from which some other code can read the PDF your code produces, you can simply create the PDF using a byte array output stream and thereafter wrap the byte array from that stream in a byte array input stream:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, baos);
document.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream("/html/loremipsum.html"));
document.close();
ByteArrayInputStream pdfInputStream = new ByteArrayInputStream(baos.toByteArray());
You can optimize this a bit by creating and processing the PDF in different threads and using a PipedOutputStream and a PipedInputStream instead.
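The piped variant mentioned above could look roughly like the following sketch. It uses only java.io, so the PDF generation itself is replaced by a hypothetical PdfProducer callback. In real code that callback would contain the PdfWriter.getInstance(document, out) ... document.close() sequence. The class, interface and method names are illustrative, not part of iText.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

final class PipedPdfSketch {
    /** Hypothetical callback standing in for the iText code that writes the PDF. */
    interface PdfProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the producer on its own thread and returns the reading end of the pipe.
    static InputStream asInputStream(PdfProducer producer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread worker = new Thread(() -> {
            // Closing the output stream is what signals end-of-stream to the reader.
            try (OutputStream o = out) {
                producer.writeTo(o);
            } catch (IOException e) {
                // Only logged here; real code should report the failure to the reader.
                e.printStackTrace();
            }
        }, "pdf-producer");
        worker.setDaemon(true);
        worker.start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes instead of a real PDF.
        try (InputStream pdf = asInputStream(
                out -> out.write("%PDF-placeholder".getBytes(StandardCharsets.US_ASCII)))) {
            System.out.println(new String(pdf.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```

The reader and the writer have to run on different threads, because the pipe's buffer is bounded and a single thread doing both can block itself. For small documents the ByteArrayOutputStream approach is simpler and likely good enough.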

Save itext pdf as blob without physical existence.

I am using this code to generate a PDF with iText. First it converts HTML to a PDF, and after that it converts that PDF to a byte array (BLOB).
I don't want to store the PDF physically on my server. First I want to convert the HTML to a PDF blob using iText, and after that I want to store that blob in my DB (I will handle the storing in the DB myself).
String userAccessToken=requests.getSession()
.getAttribute("access_token").toString();
Document document = new Document(PageSize.LETTER);
String name="/pdf/invoice.pdf";
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new FileOutputStream(requests.getSession().getServletContext().getRealPath("")+"/assets"+name));
document.open();
document.addAuthor("Real Gagnon");
document.addCreator("Real's HowTo");
document.addSubject("Thanks for your support");
document.addTitle("Please read this ");
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
//data is an html string
String str = data;
worker.parseXHtml(pdfWriter, document, new StringReader(str));
document.close();
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfWriter.getInstance(document, byteArrayOutputStream);
byte[] pdfBytes = byteArrayOutputStream.toByteArray();
link=name;
System.out.println("Byte array is "+pdfBytes);
Problem: convert HTML to a PDF BLOB using iText, without the PDF physically existing on disk.
The other answer to this question is almost correct, but not quite.
You can use any OutputStream when you create a PdfWriter. If you want to create a file entirely in memory, you can use a ByteArrayOutputStream like this:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter.getInstance(document, baos);
document.open();
// add stuff
document.close();
byte[] pdf = baos.toByteArray();
In short: you first create a ByteArrayOutputStream, you pass this OutputStream to the PdfWriter and after the document is closed, you can get the bytes from the OutputStream.
(In the other answer, there was no way to retrieve the bytes. Also: it is important that you don't try to retrieve the bytes before the document is closed.)
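For the database step the question mentions, one possible follow-up is to wrap the byte array in a java.sql.Blob. This is only a sketch using the JDK's javax.sql.rowset.serial.SerialBlob. The class name PdfBlobSketch is made up, and the four placeholder bytes stand in for the array returned by baos.toByteArray() after document.close().

```java
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

final class PdfBlobSketch {
    // Wraps the finished PDF bytes in an in-memory java.sql.Blob.
    static Blob toBlob(byte[] pdfBytes) throws SQLException {
        return new SerialBlob(pdfBytes);
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder for baos.toByteArray(); these are just the bytes of "%PDF".
        byte[] pdf = {0x25, 0x50, 0x44, 0x46};
        Blob blob = toBlob(pdf);
        System.out.println("blob length: " + blob.length());
    }
}
```

Depending on the JDBC driver, passing the byte array straight to PreparedStatement.setBytes may be simpler than building a Blob first.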
Write into a ByteArrayOutputStream (instead of a FileOutputStream):
PdfWriter pdfWriter = PdfWriter.getInstance
(document, new ByteArrayOutputStream());

How to add external CSS while generating PDF?

Currently I am using the following code to generate a PDF in a JSP file:
response.setContentType("application/force-download");
response.setHeader("Content-Disposition", "attachment;filename=reports.pdf");
Document document = new Document();
document.setPageSize(PageSize.A1);
PdfWriter writer = null;
writer = PdfWriter.getInstance(document, response.getOutputStream());
document.open();
ByteArrayInputStream bis = new ByteArrayInputStream(htmlSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis);
document.close();
With this I am able to generate a PDF.
But I would like to apply a CSS file while generating the PDF.
Please help me.
I am not sure how to do this in Java, but in C# you can add style sheet rules with this code:
StyleSheet css = new StyleSheet();
css.LoadTagStyle("body", "face", "Garamond");
css.LoadTagStyle("body", "encoding", "Identity-H");
css.LoadTagStyle("body", "size", "12pt");
Maybe this helps you.
Regards,
vinit
Please take a look at the ParseHtmlTable1 example. In this example, we have HTML stored in a StringBuilder object and some CSS stored in a String. In my example, I convert the sb object and the CSS object to an InputStream. If you have files with the HTML and the CSS, you could easily use a FileInputStream.
Once you have an InputStream for the HTML and the CSS, you can use this code:
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new ByteArrayInputStream(CSS.getBytes()));
cssResolver.addCss(cssFile);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new ByteArrayInputStream(sb.toString().getBytes()));
Or, if you don't like all that code:
ByteArrayInputStream bis = new ByteArrayInputStream(htmlSource.toString().getBytes());
ByteArrayInputStream cis = new ByteArrayInputStream(cssSource.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, bis, cis);
You may try this code reference.
$html .= '<style>'.file_get_contents(_BASE_PATH.'stylesheet.css').'</style>';
Change content type to
response.setContentType("application/pdf");

binary pdf byte[] to com.lowagie.text.Document

I have a binary PDF (byte[]). I would like to convert it to a com.lowagie.text.Document. Is there any way to convert it without losing any information? Thanks.
This is what I tried; I am not sure how to proceed.
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
PdfReader reader = new PdfReader(byteBuffer);
