Add html code with base64 images in a header using iText [duplicate] - java

I'm using itextpdf-5.0.6.jar (Java 8), and when I try to export HTML code with a base64 image tag I get a FileNotFoundException.
If I remove the image tag, everything works great!
I found a few solutions about overriding the image tag processor, but most of them are old and not compatible with version 5.0.6.
Here is the HTML I send:
"<!doctype html>\n<html lang=\"en\">\n<head>\n
<meta charset=\"UTF-8\">\n
<title>Test PDF</title>\n</head>\n<body>\n\n
<div class=\"pdf-header\">\n\n
<img src=\"\"> \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"
Part of my code:
fileOutputStream = new FileOutputStream(file);
Document document = new Document();
PdfWriter.getInstance(document, fileOutputStream);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
StringReader stringReader = new StringReader(htmlCode);
htmlWorker.parse(stringReader);
document.close();
fileOutputStream.close();
Any help will be appreciated.
Thanks

Please stop using HTMLWorker. As repeated many times on StackOverflow, the HTMLWorker class was abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker, so it's a very bad choice to use it. Please switch to XML Worker.
Also, upgrade to the latest iText version: the version you are using dates from February 4, 2011, and many bugs have been fixed in the 4 years that have passed since. Make sure you have both the iText jar and the XML Worker jar with the same version number.
Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.
To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:
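import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;
import java.io.IOException;

class Base64ImageProvider extends AbstractImageProvider {
@Override
public Image retrieve(String src) {
int pos = src.indexOf("base64,");
try {
if (src.startsWith("data") && pos > 0) {
// data URI: decode everything after "base64," into the raw image bytes
byte[] img = Base64.decode(src.substring(pos + 7));
return Image.getInstance(img);
}
else {
// anything else: let iText resolve src as a path or URL
return Image.getInstance(src);
}
} catch (BadElementException ex) {
return null;
} catch (IOException ex) {
return null;
}
}
@Override
public String getImageRootPath() {
return null;
}
}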
As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
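On Java 8 the decoding step can also be done with the JDK's own java.util.Base64 instead of iText's codec. A minimal sketch, assuming src is a data URI of the form data:<mime type>;base64,<payload>; the DataUri class is a hypothetical helper, not part of iText. It returns null for anything that is not a base64 data URI, so a caller can still fall back to Image.getInstance(src) as the provider above does.

```java
import java.util.Base64;

public final class DataUri {
    private static final String MARKER = "base64,";

    private DataUri() {}

    /** Returns the decoded bytes of a base64 data URI, or null if src is not one. */
    public static byte[] decode(String src) {
        if (src == null || !src.startsWith("data:")) {
            return null; // a plain path or URL, not a data URI
        }
        int pos = src.indexOf(MARKER);
        if (pos < 0) {
            return null; // a data URI that is not base64-encoded
        }
        // The MIME decoder ignores line breaks that long inline payloads often contain
        return Base64.getMimeDecoder().decode(src.substring(pos + MARKER.length()));
    }
}
```

Inside retrieve(), the resulting bytes would go to Image.getInstance(byte[]) exactly as in the provider above.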
Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker.
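Following the pattern of the ParseHtml4 example mentioned above, the provider is set on the HtmlPipelineContext of an XML Worker pipeline. A sketch, assuming iText 5.x and XML Worker jars with matching version numbers on the classpath; the HtmlToPdf class and createPdf method are hypothetical names. It takes any ImageProvider, so the Base64ImageProvider above can be passed in.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
import com.itextpdf.tool.xml.pipeline.html.ImageProvider;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.StringReader;

public final class HtmlToPdf {
    private HtmlToPdf() {}

    public static void createPdf(String html, String dest, ImageProvider imageProvider)
            throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();

        // Pipeline order: CSS resolution -> HTML tag processing -> PDF writing
        CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        htmlContext.setImageProvider(imageProvider); // <img src> values are resolved here

        PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
        HtmlPipeline htmlPipeline = new HtmlPipeline(htmlContext, pdf);
        CssResolverPipeline css = new CssResolverPipeline(cssResolver, htmlPipeline);

        XMLWorker worker = new XMLWorker(css, true);
        XMLParser parser = new XMLParser(worker);
        parser.parse(new StringReader(html));
        document.close(); // also closes the writer and its output stream
    }
}
```

With the question's variables, a call such as HtmlToPdf.createPdf(htmlCode, file.getPath(), new Base64ImageProvider()) would take the place of the HTMLWorker code.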

Related

Why is my form being flattened without calling the flattenFields method?

I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf");
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Notice that I am not calling form.flattenFields(). If I do call that method, however, I get a java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature by a key and certificate issued by Adobe to indicate to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects get rearranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, warns: "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
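Applied to the editForm method from the question, a sketch might look like this, assuming iText 7.1.x; the AppendModeFill class name is hypothetical. It looks fields up by name with getField instead of looping over all of them, and uses try-with-resources so the PdfDocument is always closed, which is when the incremental update is written.

```java
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.StampingProperties;
import java.io.IOException;
import java.util.Map;

public final class AppendModeFill {
    private AppendModeFill() {}

    // Fills the given fields; append mode keeps the original bytes intact,
    // so a usage rights signature over them still verifies.
    public static void fill(String src, String dest, Map<String, String> values) throws IOException {
        try (PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest),
                new StampingProperties().useAppendMode())) {
            PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
            for (Map.Entry<String, String> entry : values.entrySet()) {
                PdfFormField field = form.getField(entry.getKey());
                if (field != null) { // silently skip names the form does not contain
                    field.setValue(entry.getValue());
                }
            }
        }
    }
}
```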
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
By the way, Adobe Reader does not merely check the signature, it also tries to determine whether the changes in the incremental update don't go beyond the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and in append mode replace all existing pages by your own content of choice. This of course is not in Adobe's interest...
The filled-in PDF is still an AcroForm; otherwise, the second fillPdf call in the example below would have nothing left to fill and would produce the same PDF again.
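import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {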
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
//Delete print field from acroform because it is defined in the contentstream not in the formfields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with the 'reader enabled forms'.
What it boils down to is that the PDF file that is initially fed to your program is reader enabled. Hence you can open the PDF in Adobe Reader and fill in the form. This allows Acrobat users to extend the behaviour of Adobe Reader.
When iText fills in and closes the PDF, it saves it as 'not reader-extended'.
This means the AcroForm can still be filled using iText, but when you open the PDF in Adobe Reader, the extended functionality you see in the original PDF is gone. This does not mean the form is flattened.
iText cannot make a form reader-enabled; in fact, the only way to create a reader-enabled form is with Acrobat Professional. This is how Acrobat and Adobe Reader interact, and it is not something iText can imitate or solve. You can find more information and a possible solution at this link.
The IllegalArgumentException you get when you call form.flattenFields() is caused by the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the content stream of the PDF. As a result, the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the print field from the AcroForm before you flatten.
The IllegalArgumentException issue has been fixed in iText 7.1.5.

How to attach single or multiple attachments to CouchbaseLite document - Android?

I want to attach files to a CouchbaseLite document. How can I do so? I did not find any code sample for this on the official CBLite website (CBLite code samples), and I am still stuck on how to accomplish it.
One way to do this in code is:
Document document = mDatabaseLocal.createDocument();
document.getCurrentRevision().createRevision().setAttachment(name, contentType, contentStream);
But this is not clear. What should the name be? Is it the absolute path of the attachment on the local disk?
For contentType: I do not know whether there is an enum class or constants that I can pass as contentType.
How would I attach multiple files to a document? Do I need to create unsavedRevision for every attachment?
The name must be unique per attachment and doesn't refer to the local file; it is the name under which you later fetch the attachment from the document.
In this case you would call createRevision() once and then setAttachment() multiple times on the revision, before saving it.
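A sketch of that single-revision approach, assuming the Couchbase Lite 1.x API (Document.createRevision(), UnsavedRevision.setAttachment(name, contentType, stream), UnsavedRevision.save()); the AttachmentHelper class is hypothetical. It uses the bare file name as the attachment name and guesses a MIME type string from the file name, since contentType is an ordinary MIME string such as "image/png" rather than an enum.

```java
import com.couchbase.lite.CouchbaseLiteException;
import com.couchbase.lite.Database;
import com.couchbase.lite.Document;
import com.couchbase.lite.UnsavedRevision;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.net.URLConnection;
import java.util.List;

public final class AttachmentHelper {
    private AttachmentHelper() {}

    public static Document createWithAttachments(Database db, List<File> files)
            throws FileNotFoundException, CouchbaseLiteException {
        Document document = db.createDocument();
        // Document.createRevision() also works for a brand-new document,
        // whose getCurrentRevision() would still be null.
        UnsavedRevision revision = document.createRevision();
        for (File file : files) {
            // contentType is a plain MIME string; fall back to generic binary data
            String type = URLConnection.guessContentTypeFromName(file.getName());
            if (type == null) {
                type = "application/octet-stream";
            }
            // The attachment name is only a key inside the document, not a disk path
            revision.setAttachment(file.getName(), type, new FileInputStream(file));
        }
        revision.save(); // one save creates one revision holding every attachment
        return document;
    }
}
```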
You have to attach an InputStream to your document.
An example can be found here: CouchBase Attachment Example.
You have to convert each file into an InputStream, and then you can set it on the document.
To do the conversion, you can use something like this:
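private InputStream getAsStream(YourData data)
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// YourData must implement java.io.Serializable for writeObject to succeed
try (ObjectOutputStream objOstream = new ObjectOutputStream(baos))
{
objOstream.writeObject(data);
} catch (IOException e)
{
e.printStackTrace();
}
// closing the ObjectOutputStream above flushed its internal buffer into baos
byte[] bArray = baos.toByteArray();
return new ByteArrayInputStream(bArray);
}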
In this example, YourData can be any Serializable object, including your own object types.
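When the attachment content is read back later, the same bytes can be turned into the object again with an ObjectInputStream. A JDK-only round-trip sketch; the StreamCodec class is hypothetical, and the object must implement java.io.Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class StreamCodec {
    private StreamCodec() {}

    // Serializes the object into memory and exposes the bytes as an InputStream
    public static InputStream toStream(Serializable data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(data);
        } // close() flushes the buffered object data into baos
        return new ByteArrayInputStream(baos.toByteArray());
    }

    // Reads back an object previously written by toStream
    public static Object fromStream(InputStream in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream objIn = new ObjectInputStream(in)) {
            return objIn.readObject();
        }
    }
}
```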
Hope this explanation will help you.

ghost4j class cast exception during joining two PostScripts

I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
So far I have not managed to find out what the problem is here. Maybe there is some kind of problem within one of the PostScript files?
So help would be appreciated.
EDIT:
I tested with ghostScript commandline tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.
The exception occurs because one of the 2 documents does not follow the Adobe Document Structuring Convention (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)
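Adapted to the two PostScript files from the question, a sketch based on that sample might look like this. It assumes ghost4j 0.5.0 with a native Ghostscript installation that ghost4j can load, since, as I understand it, SafeAppenderModifier delegates the merge to Ghostscript rather than parsing DSC comments itself; the PsMerge class name is hypothetical.

```java
import java.io.File;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import org.ghost4j.document.Document;
import org.ghost4j.document.PSDocument;
import org.ghost4j.modifier.SafeAppenderModifier;

public final class PsMerge {
    private PsMerge() {}

    public static void append(File first, File second, File output) throws Exception {
        PSDocument base = new PSDocument();
        base.load(first);
        PSDocument toAppend = new PSDocument();
        toAppend.load(second);

        // The document to append is passed as a modifier parameter
        Map<String, Serializable> parameters = new HashMap<String, Serializable>();
        parameters.put(SafeAppenderModifier.PARAMETER_APPEND_DOCUMENT, toAppend);

        SafeAppenderModifier modifier = new SafeAppenderModifier();
        Document result = modifier.modify(base, parameters);
        result.write(output);
    }
}
```

With the question's files this would be PsMerge.append(new File("1.ps"), new File("2.ps"), new File("3.ps")).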
I think something is wrong in the document or in the XMLGraphics library as it seems it cannot parse a part of it.
Here is the code in ghost4j that I think is failing (link):
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may be responsible (link):
private DSCComment parseDSCComment(String name, String value) {
DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
if (parsed != null) {
try {
parsed.parseValue(value);
return parsed;
} catch (Exception e) {
//ignore and fall back to unparsed DSC comment
}
}
UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
unparsed.parseValue(value);
return unparsed;
}
It seems that parsed.parseValue(value) threw an exception, which was swallowed by the catch block, so the parser returned an UnparsedDSCComment that ghost4j didn't expect.
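This failure mode (a typed parse that falls back silently, followed by an unconditional cast) can be reproduced in a few lines of plain Java. An illustrative sketch only: PagesComment and UnparsedComment are hypothetical stand-ins for the XMLGraphics comment classes, and the integer-only parse is a simplification. Checking the type with instanceof before casting turns the ClassCastException into an error that names the offending value.

```java
public final class DscFallbackDemo {
    private DscFallbackDemo() {}

    // Hypothetical stand-ins for the typed and unparsed DSC comment classes
    interface Comment {}

    static final class PagesComment implements Comment {
        final int count;
        PagesComment(int count) { this.count = count; }
    }

    static final class UnparsedComment implements Comment {
        final String raw;
        UnparsedComment(String raw) { this.raw = raw; }
    }

    // Mirrors parseDSCComment: try the typed parse, fall back silently on failure
    static Comment parse(String value) {
        try {
            return new PagesComment(Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return new UnparsedComment(value); // the original exception is lost here
        }
    }

    // Defensive caller: check the type instead of casting blindly
    static int pageCount(String value) {
        Comment c = parse(value);
        if (c instanceof PagesComment) {
            return ((PagesComment) c).count;
        }
        throw new IllegalStateException("Unparseable %%Pages value: " + value);
    }
}
```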

Print PDF that contains JBIG2 images

Please suggest some libraries that will help me print PDF files that contain JBIG2-encoded images. PDFRenderer and PDFBox don't help me. These libs can print simple PDFs, but not PDFs containing JBIG2 images. PDFRenderer tries to fix this (according to an issue on PDFRenderer's bug tracker), but some pages (especially those with barcodes) still don't print.
P.S. I use the javax.print API within an applet.
Thanks!
UPDATE: I also tried ICEPdf; it doesn't work either.
I came to the conclusion that all these libraries (PDFRenderer, ICEPdf, PDFBox) use JPedal's JBIG2 decoder. The bug (some pages don't print) comes from this decoder library. The open-source version of this decoder (which is used in PDFRenderer, ICEPdf and PDFBox) is no longer supported, but JPedal has a new commercial branch of the project, and they wrote that the bug has been fixed in the new commercial release, which costs $9k.
Any ideas?
UPDATE 2: Yesterday I tried to replace JPedal's free library with other open-source jbig2-imageio libraries, but I haven't had any success yet, so I created a new topic on their project's page (Google Code's forum - here). I would be grateful for any help.
I also found some helpful discussions on the Apache PDFBox bug tracker: here and here.
Going through your comment on yms's answer, i.e. "but what library I can use to extract images and (more importantly) put them back in PDF?"
Here is a simple demonstration of:
1) Extracting JBIG2 images (or, more generally, all images) from a PDF.
2) Converting a JBIG2 image to another format, in my case JPEG.
3) Creating a new PDF containing the JPEG.
It uses the jbig2-imageio and iText libraries.
In the example below, change the resource and directory paths as needed.
I had to go through several resources for this, which I list at the end. Hope this helps.
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.parser.*;
import com.levigo.jbig2.JBIG2ImageReader;
import com.levigo.jbig2.JBIG2ImageReaderSpi;
import com.levigo.jbig2.JBIG2ReadParam;
import com.levigo.jbig2.io.DefaultInputStreamFactory;
import java.awt.image.BufferedImage;
import java.io.*;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;
public class JBig2Image {
private String filepath;
private int imageIndex;
public JBig2Image() {
this.filepath = "/home/blackadmin/Desktop/pdf/demo18.jbig2";
this.imageIndex = 0;
extractImgFromPdf();
convertJBig2ToJpeg();
createPDF();
}
private void extractImgFromPdf() {
try {
/////////// Extract all Images from pdf /////////////////////////
PdfReader reader = new PdfReader("/home/blackadmin/Desktop/pdf/orig.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener("/home/blackadmin/Desktop/pdf/demo%s.%s");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
parser.processContent(i, listener);
}
} catch (IOException ex) {
System.out.println(ex);
}
}
private void convertJBig2ToJpeg() {
InputStream inputStream = null;
try {
///////// Read jbig2 image ////////////////////////////////////////
inputStream = new FileInputStream(new File(filepath));
DefaultInputStreamFactory disf = new DefaultInputStreamFactory();
ImageInputStream imageInputStream = disf.getInputStream(inputStream);
JBIG2ImageReader imageReader = new JBIG2ImageReader(new JBIG2ImageReaderSpi());
imageReader.setInput(imageInputStream);
JBIG2ReadParam param = imageReader.getDefaultReadParam();
BufferedImage bufferedImage = imageReader.read(imageIndex, param);
////////// jbig2 to jpeg ///////////////////////////////////////////
ImageIO.write(bufferedImage, "jpeg", new File("/home/blackadmin/Desktop/pdf/demo18.jpeg"));
} catch (IOException ex) {
System.out.println(ex);
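} finally {
// inputStream is still null if the FileInputStream could not be opened
if (inputStream != null) {
try {
inputStream.close();
} catch (IOException ex) {
System.out.println(ex);
}
}
}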
}
public void createPDF() {
Document document = new Document();
try {
PdfWriter.getInstance(document,
new FileOutputStream("/home/blackadmin/Desktop/pdf/output.pdf"));
document.open();
PdfPTable table = new PdfPTable(1); //1 column.
Image image = Image.getInstance("/home/blackadmin/Desktop/pdf/demo18.jpeg");
image.scaleToFit(800f, 600f);
image.scaleAbsolute(800f, 600f); // Give the size of image you want to print on pdf
PdfPCell nestedImgCell = new PdfPCell(image);
table.addCell(nestedImgCell);
document.add(table);
document.close();
System.out.println(
"======== PDF Created Successfully =========");
} catch (Exception e) {
System.out.println(e);
}
}
public static void main(String[] args) throws IOException {
new JBig2Image();
}
}
class MyImageRenderListener implements RenderListener {
/**
* The output path template: a String.format pattern whose two %s placeholders
* receive the image's object number and file type.
*/
protected String path = "";
/**
* Creates a RenderListener that will look for images.
*/
public MyImageRenderListener(String path) {
this.path = path;
}
/**
* @see com.itextpdf.text.pdf.parser.RenderListener#beginTextBlock()
*/
public void beginTextBlock() {
}
/**
* @see com.itextpdf.text.pdf.parser.RenderListener#endTextBlock()
*/
public void endTextBlock() {
}
/**
* @see com.itextpdf.text.pdf.parser.RenderListener#renderImage(
* com.itextpdf.text.pdf.parser.ImageRenderInfo)
*/
public void renderImage(ImageRenderInfo renderInfo) {
try {
PdfImageObject image = renderInfo.getImage();
if (image == null) {
return;
}
String filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
// try-with-resources closes the file even if the write fails
try (FileOutputStream os = new FileOutputStream(filename)) {
os.write(image.getImageAsBytes());
}
} catch (IOException e) {
System.out.println(e.getMessage());
}
}
/**
* @see com.itextpdf.text.pdf.parser.RenderListener#renderText(
* com.itextpdf.text.pdf.parser.TextRenderInfo)
*/
public void renderText(TextRenderInfo renderInfo) {
}
}
References:
1 ) Extracting jbig2 from pdf (extract images) (MyImageRenderListener).
2 ) Converting jbig2 (JBIG2ImageReaderDemo)
There is a fork of the JPedal library by Borisvl located at
https://github.com/Borisvl/JBIG2-Image-Decoder#readme
which contains speed improvements and I believe it should also fix your bug.
EDIT: The bug is related to simple range checking. Basically, you need to prevent getPixel from accessing x, y values outside the bitmap extents.
You need to make sure the following conditions are met before calling getPixel:
col >= 0 and col < bitmap.width
row >= 0 and row < bitmap.height
Here is some Delphi code with a couple of small range checks. I cannot test the Java code myself, but you need to make changes to src/org/jpedal/jbig2/image/JBIG2Bitmap.java.
procedure TJBIG2Bitmap.combine(bitmap: TJBIG2Bitmap; x, y: Integer; combOp: Int64);
...
...
var
begin
srcWidth := bitmap.width;
srcHeight := bitmap.height;
srcRow := 0;
srcCol := 0;
if (x < 0) then x := 0;
if (y < 0) then y := 0;
for row := y to Min(y + srcHeight - 1, Self.height - 1) do // <<<<<<<< HERE
begin
for col := x to x + srcWidth - 1 do
begin
srcPixel := bitmap.getPixel(srcCol, srcRow);
Andrew.
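The same range checks can be expressed in Java. This is not the JPedal JBIG2Bitmap class: BoundedBitmap, its one-int-per-pixel layout and the OR-only combine are hypothetical simplifications. It shows getPixel returning background for coordinates outside the bitmap, and a combine loop that skips target pixels falling outside the destination instead of reading or writing past its extents.

```java
public final class BoundedBitmap {
    private final int width;
    private final int height;
    private final int[] pixels; // hypothetical layout: one int (0 or 1) per pixel, row-major

    public BoundedBitmap(int width, int height) {
        this.width = width;
        this.height = height;
        this.pixels = new int[width * height];
    }

    // Out-of-range reads return 0 (background) instead of throwing
    public int getPixel(int col, int row) {
        if (col < 0 || col >= width || row < 0 || row >= height) {
            return 0;
        }
        return pixels[row * width + col];
    }

    // Out-of-range writes are ignored
    public void setPixel(int col, int row, int value) {
        if (col >= 0 && col < width && row >= 0 && row < height) {
            pixels[row * width + col] = value & 1;
        }
    }

    // OR-combines src into this bitmap at offset (x, y), skipping pixels that fall outside
    public void combineOr(BoundedBitmap src, int x, int y) {
        for (int srcRow = 0; srcRow < src.height; srcRow++) {
            int row = y + srcRow;
            if (row < 0 || row >= height) {
                continue;
            }
            for (int srcCol = 0; srcCol < src.width; srcCol++) {
                int col = x + srcCol;
                if (col < 0 || col >= width) {
                    continue;
                }
                setPixel(col, row, getPixel(col, row) | src.getPixel(srcCol, srcRow));
            }
        }
    }
}
```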
How about using Acrobat Reader itself? Getting it to work is a bit messy, and I guess it is not a robust solution, but it will probably print everything perfectly, and it is free.
Some information about this route:
http://vineetreynolds.blogspot.nl/2005/12/silent-print-pdf-print-pdf.html
http://www.codeproject.com/Questions/98586/Programmatically-print-PDF-documents
http://forums.adobe.com/message/2336723
There are tools such as ImageMagick that handle images and convert them to many formats. I used it some years ago, so I can't tell you whether the JBIG2 format is handled by default or whether you have to install a plugin.
You can run the following to list the supported formats containing a J, such as the JBIG2 format you are looking for:
$ convert -list format | grep -i J
Converting to PDF is also straightforward with this tool, coupled with gs (Ghostscript).
In fact, nothing prevents you from displaying a PNG/JPEG version of the image and providing a download link to the original JBIG2 file with its own metadata.
As an alternative, you could try doing this server-side:
Approach 1:
Convert the PDF files to raster images using an external application and print that instead.
Approach 2:
Adjust your PDF files by recompressing JBIG2 images:
1- Extracting the images compressed as JBIG2 from your files.
2- Re-compress them with some other algorithm (JPEG, PNG, etc.). To do this, you might need to go outside of Java, using either JNI or an external application. You can try jbig2dec or ImageMagick, for example, if the GPL license suits your needs.
3- Put the recompressed images back in your PDF.
This approach will imply some quality loss on those images, but at least you will be able to print the files.
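Once a JBIG2 image has been decoded into a BufferedImage (for example with jbig2-imageio, as in the demonstration above), re-encoding it in step 2 needs only the JDK's ImageIO. A sketch; the Recompress class is hypothetical. PNG is lossless, which may matter for barcodes, while JPEG is usually smaller but lossy.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public final class Recompress {
    private Recompress() {}

    // Re-encodes an already-decoded image in the given ImageIO format name ("png", "jpeg", ...)
    public static byte[] encode(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer for the format is installed
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        return out.toByteArray();
    }
}
```

The returned bytes could then be passed to a PDF library (for instance iText's Image.getInstance(byte[])) for step 3.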
You can do this in Java with iText, there is a chapter about resizing images in the book iText in Action (with sample code). The idea there is to extract the image, resize it (including recompression) and put it back. You can use this as starting point. Be aware that iText is an AGPL project, hence you cannot use it for free in commercial closed-source applications.
If you are using a Windows-based server and you can afford a commercial tool, you can also achieve this with Amyuni PDF Creator, either with C#/VB.Net or C++ (usual disclaimer applies for this suggestion). You just need to go through all objects of type acObjectTypePicture and set the attribute Compression to acJPegHigh. This approach does not require any external JBIG2 decoder (I can include some sample code here if you are interested).
If you are using an applet just to print your PDF files, you could also try generating a PDF file that shows the print dialog when opened.
