Disable pdf-text searching with pdfBox - java

I have a pdf document (no form) where I want to disable the text searching using pdfBox (java).
Following possibilities I can imagine:
Flatten text
Remove Text information (without removing text itself)
Add overlay to document.
Currently I've no idea how I can implement that. Does anyone has an idea how to solve that?

many thanks for your help here. I guess I found a way that fit to the requirements. (Honestly, not really clean):
Add the rectangle to the address sections
convert PDF to image
convert image back to pdf.
While losing all text information, the user isn't able to see the critical information anymore. Due to the reason, that this is only for display (the initial PDF document doesn't get changed) this is ok for now.

It depends on your goals:
avoid everything on some texts: print, mark with black ink, and scan again;
delete sensible text: you have to scan inside text, and remove/replace it (with pdfbox), but it is risky (some text are splitted);
mask some text for viewer : find text and add a black rectangle (with pdfbox), but it is not very safe. You can remove the rectangle, or use another tool to read the text. Usually, if text is inside, some tool can find it;
avoiding copy/paste the text (but not search / view): use security options, with password:
see: https://pdfbox.apache.org/2.0/cookbook/encryption.html
PDDocument doc = PDDocument.load(new File("filename.pdf"));
// Define the length of the encryption key.
// Possible values are 40, 128 or 256.
int keyLength = 128;
// 256 => plante
AccessPermission ap = new AccessPermission();
// disable printing, everything else is allowed
ap.setCanPrint(false);
ap.setCanExtractContent(false);
ap.setCanExtractForAccessibility(false);
// Owner password (to open the file with all permissions) is "12345"
// User password (to open the file but with restricted permissions, is empty here)
StandardProtectionPolicy spp = new StandardProtectionPolicy("12345", "", ap);
spp.setEncryptionKeyLength(keyLength);
spp.setPermissions(ap);
doc.protect(spp);
doc.save("filename-encrypted2.pdf");
doc.close();

Related

pdfbox embedding subset font for annotations

I am trying to use Apache PDFBOX v2.0.21 to modify existing PDF documents, adding signatures and annotations. That means that I am actively using incremental save mode. I am also embedding LiberationSans font to accommodate some Unicode characters. It makes sense for me to use the subsetting feature of PDF embedded fonts as embedding LiberationSans in full makes the PDF file around 200+ KB more in side.
After multiple trials and errors I finally managed to have something working - all but the font subsetting. The way I do this is to initialize the PDFont object once using
try (InputStream fs = PDFService.class.getResourceAsStream("/static/fonts/LiberationSans-Regular.ttf")) {
_font = PDType0Font.load(pddoc, fs, true);
}
And then to use custom Appearance Stream to show the text.
private void addAnnotation(String name, PDDocument doc, PDPage page, float x, float y, String text) throws IOException {
List<PDAnnotation> annotations = page.getAnnotations();
PDAnnotationRubberStamp t = new PDAnnotationRubberStamp();
t.setAnnotationName(name); // might play important role
t.setPrinted(true); // always visible
t.setReadOnly(true); // does not interact with user
t.setContents(text);
PDRectangle rect = ....;
t.setRectangle(rect);
PDAppearanceDictionary ap = new PDAppearanceDictionary();
ap.setNormalAppearance(createAppearanceStream(doc, t));
ap.getCOSObject().setNeedToBeUpdated(true);
t.setAppearance(ap);
annotations.add(t);
page.setAnnotations(annotations);
t.getCOSObject().setNeedToBeUpdated(true);
page.getResources().getCOSObject().setNeedToBeUpdated(true);
page.getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getPages().getCOSObject().setNeedToBeUpdated(true);
doc.getDocumentCatalog().getCOSObject().setNeedToBeUpdated(true);
}
private PDAppearanceStream createAppearanceStream(final PDDocument document, PDAnnotation ann) throws IOException
{
PDAppearanceStream aps = new PDAppearanceStream(document);
PDRectangle rect = ann.getRectangle();
rect = new PDRectangle(0, 0, rect.getWidth(), rect.getHeight());
aps.setBBox(rect); // set bounding box to the dimensions of the annotation itself
// embed our unicode font (NB: yes, this needs to be done otherwise aps.getResources() == null which will cause NPE later during setFont)
PDResources res = new PDResources();
_fontName = res.add(_font).getName();
aps.setResources(res);
PDAppearanceContentStream apsContent = null;
try {
// draw directly on the XObject's content stream
apsContent = new PDAppearanceContentStream(aps);
apsContent.beginText();
apsContent.setFont(_font, _fontSize);
apsContent.showText(ann.getContents());
apsContent.endText();
}
finally {
if (apsContent != null) {
try { apsContent.close(); } catch (Exception ex) { log.error(ex.getMessage(), ex); }
}
}
aps.getResources().getCOSObject().setNeedToBeUpdated(true);
aps.getCOSObject().setNeedToBeUpdated(true);
return aps;
}
This code runs, but creates a PDF with dots instead of actual characters, which, I guess, means that the font subset has not been embedded. Moreover, I get the following warnings:
2021-04-17 12:33:31.326 WARN 20820 --- [ main]
o.a.p.pdmodel.PDAbstractContentStream : attempting to use subset
font LiberationSans without proper context
After looking through the source code, I get and I guess that I am messing something up when creating the appearance stream - somehow it's not connected with the PDDocument and the subsetting does not continue normally. Note that the above code works well when the font is embedded fully (i.e. if I call PDType0Font.load with the last parameter set to false)
Can anyone think of some hint to give to me? Thank you!
I don't know - am I lucky? It is very often that luckiness in programming points to something completely wrong or misleading. In any case, if someone can still give a hint, my ears are more than open...
Again, after looking through the code, I saw the following in PDDocument.save():
// subset designated fonts
for (PDFont font : fontsToSubset)
{
font.subset();
}
This is not happening in PDDocument.saveIncremental() which I am using. Just to mess around with the code, I went and did the following just before calling saveIncremental() on my document:
_font.subset(); // you can see in the beginning of the question how _font is created
_font.getCOSObject().setNeedToBeUpdated(true);
pddoc.saveIncremental(baos);
Believe it or not, but the document was saved correctly - at least it appears correct in Acrobat Reader DC and Chrome & Firefox PDF viewers. Note that Unicode codepoints are added to the subset for the font during showText() on appearance content stream.
UPDATE 18/04/2021: as I mentioned in the comments, I got reports from users that started seeing messages like "Cannot extract the embedded font XXXXXX+LiberationSans-Regular from ...", when they opened the modified PDF files. Strangely enough, I didn't see these messages during my tests. It turns out that my copy of Acrobat Reader DC was newer than theirs, and specifically with the continuous release version 2021.001.20149 no errors were shown, while with the continuous release version 2020.012.20043 the above message was shown.
After investigations, it turns out that the problem was with the way I was embedding the font. I am not aware if any other way exists, and I am not that familiar with the PDF specification to know otherwise. What I was doing, as you can see from the above code, was to load the font ONCE for the document, and then to use it freely in the resource dictionary of the appearance stream of EVERY annotation. This had as a result all the resource dictionaries of the annotation content streams to reference an F1 font that was defined with the SAME /BaseFont name. The PDF Reference, 3rd ed. on p.323 specifically states that:
"... the PostScript name of the font - ... - begins with a tag
followed by a plus sign (+). The tag consists of exactly six uppercase
letters; the choice of letters is arbitrary, but different subsets in
the same PDF file must have different tags..."
Once I started to call PDType0Font.load for each of my annotations and calling subset() (and of course setNeedToBeUpdated) after creating appearance stream for each of them, I saw that the BaseName attributes started to look indeed differently - and indeed, the older 2020 version of Acrobat Reader DC stopped complaining.
[edit 07/10/2021: even trying to use a single PDFont object per page (having multiple annotations with this font), and subsetting it once, after having called showText on appearances of all annotations, appears to not work - it appears that the subsetting uses the letters I passed to the first showText, and not the others, resulting in wrong rendering of the 2nd, 3rd etc. annotations that might have characters that didn't exist in the 1st annotation - so I reiterate that what worked was to use loadFont for each separate annotation and then (after modifying appearance with showText, which will mark the letters to be used during subsetting) to call subset() on each of these fonts (which will result in the change of the font name)]
Note that other than using iText RUPS for inspecting the PDF contents, one could use Foxit PDF viewer to at least ensure that the subset font names are different. Acrobat Reader DC and PDF-xChange in Properties -> Fonts just show the initial font name, like LiberationSans, without showing the 6-letter unique prefix.
UPDATE 19/04/2021 I am still working on this issue - because I still get reports about the infamous "Cannot extract the embedded font" message. It is quite possible that the original cause of that message was not (or not only) the fact that the different subsets had same BaseFont names. One thing that I am observing is that on some computers, the stamp annotations that I am using cause Acrobat Reader DC to open automatically the so called "Comments pane" - there are options to turn this automatic thing off (Preferences -> Commenting -> Show comments pane when a PDF with comments is opened). When this pane opens, either manually or automatically, the error message appears (and I was on my wits ends to see why same version of Acrobat Reader DC behaves differently for different machines). I think that Acrobat Reader tries to extract the full version of the font and fails, since it is only a subset. But, I guess, this doesn't have to do with the semantic contents of the document - the document still passes "qpdf --check". I am currently trying to find if it is possible to restrict stamps to not allow comments - i.e. some way to disable the comments pane in Acrobat Reader DC, although I have little hope.
UPDATE 20/04/2021 opened a new question here

add Inline image in pdf document using itext library

we have created an application to generate pdf documents using itext 5 library. As the part of pdf generation, we tried to embed an image inline in pdf which should be non editable and read only. We tried with an addImage method of PdfContentByte as below,
byte[] decoded = Base64.getDecoder().decode(encodedImage);
image = Image.getInstance(decoded);
After this image is retrieved, used the same in addImage method.
PdfContentByte canvas = pdfStamper.getOverContent(item.getPage(0));
canvas.addImage(image, Boolean.TRUE);
Findings : Since the image is in Base 64 string format, the image is not displayed in the resultant pdf document (if the image is not in Base 64 format, it is working fine).
when we open the pdf , the below error is shown :-------> "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
how can we handle this situation ?.
Is any other way to achieve this requirement. Please help
Code :-
PdfReader resultantPdfReader = new PdfReader("template.pdf");
PdfStamper resultandPdfStamper = new PdfStamper(resultantPdfReader, new FileOutputStream("A13.pdf"));
AcroFields acroFields = resultantPdfReader.getAcroFields();
Rectangle fieldPosRec = acroFields.getFieldPositions("imageField").get(0).position;
String encodedSignature = "";
encodedSignature = new String(Files.readAllBytes(Paths.get("MyImage.png")));
if(encodedSignature.indexOf("data:image/png;base64,") != -1) {
encodedSignature = encodedSignature.substring("data:image/png;base64,".length());
}
Image image = null;
try {
byte[] decoded = Base64.getDecoder().decode(encodedSignature);
image = Image.getInstance(decoded);
} catch (Exception e) {
e.printStackTrace();
}
image.scaleAbsoluteHeight(fieldPosRec.getHeight());
image.scaleAbsoluteWidth(fieldPosRec.getWidth());
image.scaleToFit(fieldPosRec);
acroFields.removeField("imageField");
image.setAbsolutePosition(fieldPosRec.getLeft(), fieldPosRec.getBottom());
PdfContentByte canvas = resultandPdfStamper.getOverContent(item.getPage(0));
canvas.addImage(image, Boolean.TRUE);
resultandPdfStamper.close();
resultantPdfReader.close();
Base64 Image String : iVBORw0KGgoAAAANSUhEUgAAAU4AAACWCAYAAACxSWGfAAAAAXNSR0IArs4c6QAAFJFJREFUeAHtnQmwZFV9hw37DgoiaJgZsGBAFgkSIEJgjAZRAliFikrEoOgoGkQlEFJKiXFhr5RCAKMjJhBETAoRImjAB4KAKBgEDBBhQM2IyEDYh2XM90Gf8k7T3a/fe9093X1//6rv3XOXvsvX9/773HNP93vBCxIxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAMxEAPNBl7MhEObJ2Y8BmIgBmKgtYG9mTwBv2vAIBEDMRADMdDKwCuYeCmUhFmGrZbNtBiIgRhoa2AN5uwDZ8L3oCST6Q6vYR3nwnyYC8MQ67MTp8JT4HEthsMbZccTMRADMTCpgU1Z4kPwbXgCppsku3ndIta/vBLpymz7w2CidF9NnCZQE6lR9v+5sfyNgRiIgYqBlSjvASfALVAShsNn4Br4OPwRzDSsZc4Hk+UiqG7LstOcNx/mQj9iM1Z6LFwNZfveonurXmJVCmVemZZhDMRAzQ1swPG/E74GD0BJEg4dPw8OAp8u9zNMjibJc2ERVPfDstOcNx/mwnRjLV54MFwBS6Fsx3X7MKg5jmeCyxzXPCPjnQ38QefZmRsDI2dge/bYJCE7wwpQ4mcULm5wFcOny4wBD02O8yps1LR9b6sfhvsa/GaS8k7M/yvYH9YE41H4NzgLJsAEWQ3dWBs1Xg0/fLaUPzEQA7UysAdH+2UotSyHtl1eAn8Nm8Kwhol0PlgzvA6qxzCVsrVMa5sHg7XPdrEqM24F153aZjtLHaanxtlBTmaNhAFvs0+EdzX29iaGJp+L4DKw5jVqYa1xQ/DYpF3ZeS77A/gJfBXuhMnCW/QjwRq4bbpLIBEDMVADA37ovwfuB2tOj8MnYBVItDfgLbpNFLJT+8UyJwZiYNwMbM0BfR/KbaxPi18+bgfZ4+Pxg+bNcCPkFr3HcrO63hlYl1VtA3vBe+FYsA3uO3Ar/Aouh9PhjbAaJDobWIPZn4MnwYt/EbwNEp0NvI7Z10P5oDmasu2ciRgYGgOvYk9OgnKSdju0Le6bcAhsDIllDezN6F2gz2fgNPDDKdHewI7M+i6Uc9AP6/eBfVoTMTAUBjwZj4HylTbb3qxZWsNcANY4rXlaA7Um6kW/A/gaawNLoZzglp3mvF50yGY1IxsvY8+/AcXNDZT/eGSPZjA7vgWb+TqUc+oBykfB6pCIgaExsCV7Yj84L25P1lNgqrfe1jJNrNY6rX2WROHwHNgX6hQrcrAfhodABw79brXTE60NvJTJZ0L58H6M8nHwQkjEwNAYsMH9MPAE9eJeCPNgpmHS9db0DLCGVZLoBGVvv8Y9rFFWj9uO3NY8E60NmBhNkOU8NHGaQE2kiRgYKgObsDeXQUlqCyiv04c9tAngQ3AfuC1rtGfDLBi3sPniVLAN02O9C/wASbQ24K23t+CLoZwb3qJ7q56IgaEzcBB79CB4st4L+0K/w6RireJxcLsOfcLcj2TNagceB7DF/wWPzafmHqtP0RPPN+CHqQ95fNijL/ku1OFuhMNMjJqBF7PD/w7lZLXstEHGbDZmm6c1T/fDmugHwYtpFMP+l5dCcXoVZR+eJZY1YLPQLnAyXAfF1/WUXweJGBhKA9YqrV16wlrbtNa5PMPaxRVQLqD/przf8tyhKW57FZb3Z9xKDdpeCIeACSLxnIFqsrybSeW9dngevAXiCwmJ4TPgrfACKCet7Zq2bw5LmCxvg7J/E5SH/ZZtJ/bRC7/s81cpD7rmziaHMjoly3vY41PAmmcS5lC+fdkpDcyDheAF7hPLw2AYT1hv00fhAZLfVrHt0u9J6/QCmAd1jyTLup8BY3L8q3EcfrKXdkT7aM4dgWNr9QDJRGWteXmHtcxbwYRp4jweTKR1jSTLur7zY3rca3FcJ4IXuP3hjgFrdKMUs9jZYXmA1FzLNHnuPEoye7iv1WTpbbfnWCG34T0UnVUN3sD5jZN5gqHfOx/laH6AdCUHczRsNKCDSi3zuaad8jQ8yXJAJ142M1gDR7I5awA+Nd98sJvu69Z8gGTH6FK7sSbtt3H2hH602da1lrk6Pv2wejf8A1wOd0Dx7tDkaZeiPOBBQmL0DbyWQ7DtzXbNfUf/cJ53BCbI14MJ08RZLuY7KfeyFlqHWqYuNwU/kD4B3qXYs6F846m4LUMTaJIlEhLjZWAWh3MfeKJ/arwOreXReKv+d2DSLBf3TGuh41rL9MHarvABOB2uhvLDI8VdGfqNp5+C7ctHwRvgZZCIgbEz4BP068GT/z9gBahLdKqFmli7bQsdh1qmv7o0F+xU/vfgr1TdBSUpNg/9euglcCK8E14JdupPxEAtDHyZo/Si+DnU+ee3plMLHdVa5ga8138Gh8MC+BE8Bs3J0XG/2eR8l3N5X+frEzFQWwPv58i9OB6F7WprYdkDb1cLbZVUyrRh7ZdpDdD39S/hBLCGaE2x7HfzcCHzLoRPw1thS7AmmoiBGGgY2IXhEvDieUdjWgbLGii1UB8oNSeZMm5H9mHol2lb4l5wJJwNN4FtjmU/q0PbKG2rtM3yUNgNbMtMxEBXBqxd1DFewkH/GLzY7DbyEUiMhgFrkVuBbYoF30drh82xlAn/AybRKgsZN5EmYmBaBuqYOP0W0GWwO1wJpRsSxcSQGdiQ/fFWuyRIhybNlaE5bmeCt+HVBHkL47ZdJmKgpwZG7auEvTj4k1iJSfNXYDuW7XOJ5WvA83ALqCZIyxu32C1rkfaZ/K8mftli2UyKgb4YqFuN07ZM+9jZ9rUHXAuJwRqw50JzLXIbpvmEvjlsi7QGWU2SNzOeWmSzqYwP1ECdapxerP/UsHsYwyTN/p5q9od9OTTXIme12KztjXdCNUFaXghpi0RCYrgM1KXGaS3HvnibwQJ4DyR6Z2BtVrUtVJOk42u22IS1Rb9dU02S1iofbrFsJsXAUBqoQ+K05nMR+PU3k6ddT+yGlJiegTm8rJogLfuB1Opc+gXTqwnSsk+5badMxMDIGqjDrfoneXdMmr+F/SFJEwldxOosY9tjNUna3LFui9fq1CfYzUnygRbLZlIMjLyBVrWEkT+oygHsS/kCsIazJ1wOiecbsB9kNUFa9im3tfXm+DUTmhOkT7nTO6HZVMZjYAQNeOE/CD5c+JsR3P9+77Lft9bLhaCjZux5YNvjv8AR8OewISRiIAbG1IA16c+DycAf8E383sCrKf4zPAElWfoDJ34p4BR4F2wPq0AiBmKgRgZ251hNCn6tcq0aHXe7Q9XB++AnUJLlM5R9aPZGaHVLzuREDMRAnQycycGaID5Tp4NucaxbM+0L8H9QEua9lD8LsyERAzEQA88a8BZzMZgoXvHslHr9WZnDPQCugJIsHX4f3g65BUdCIgZiYFkDb2LURHHjspPHfmwWR2gN26feJWE+RPk0sFtRIgZiIAbaGjifOSaOj7VdYnxm2Da5F/hk3DbLkjDtLvR+SPsuEhIxEAOdDazL7MfBJGLfxHENuxIdCT+Hkix9Sn427AqJGIiBGOjawLtZ0kRyWdevGK0F7Upkv8pqV6I7GT8KTKaJGIiBGJiyAROmifPgKb9yeF/g7fZ8aO5K9C2mpSvR8L5v2bMYGAkD3pp7i+6t+jojscedd9KuRKdCuhJ19pS5MRADMzBwBK+1tjnK3xSyq9DboLkr0ZVMS1ciJCRiIAZ6a6Dcyu7X29UOZG2z2YpdieycbvKX0pXImmciBmIgBnpuwORisrkfRqWDt12J/Lm7Vl2JbNNMVyIkJGIgBvpn4LOs2sR5Rv820dM1b8La/EGNUrssXYl8ap6IgRiIgYEYuIitmIT+dCBbm9lGtuPl/kdG9/eHYH/MdCVCQiIGYmCwBi5hcyaiXQa72Slv7bW8ojwln6C83pTXkBfMxIA/N+g5cjJ4ziRioNYGvBBMnA6HNQ5kx5aA+/k1aPXvcJmc6LGBarK8m3Xrv7BVj7eV1cXASBmwFuHFcA94oQxb+M2epeA+ngTDuI/s1thEp2TpOWL7sudM3oexectzINMx4AXgBWFi8oIYlvDJub9O5H7ZOf8wSPTHQDVZlnNB75Jk2R/nWesYGBi22/XVcXoBeOH6bSb/w2aitwaSLHvrM2uroYFyu/4bjv1zsM1ydOBT8mvApHk/5BeLkNCDMFH6vn4QzoPypYdqzdIP0NyGIyERA90Y8KLaB8pF5PAm+FuYDYOKzdjQ7eD274ItITE9AyvyslfBR8Haux9Ceq0ywXiSJRISMTBdAyZP+3KeDr+FcoH5YMZ/HfEB6GefyR1Z/73gdm+AjSDRvQG/9eWXAI6Gb4NfOy3vYRnezTR/Wu+9MBcSMRADPTTg/975C/hXeBTKhfcU5YvgHbAm9Cr8ibdHwO1cAmtBorMB24FfA5+Ey+ExKO9TGd7BtC/BQTAHEjEQAwMyYIK0H+XFYOIsF6WJ7hzYG0y0041DeOHT4Hq/AitB4vkG1mGS/+rDNuir4Uko74VD7wxuBnsiHAAbQyIGYmAIDGzAPhwKV4EXarlwvbX/R9gNvOXvNo5lwbKOT3X7opostz7H6a9V2W/yR1A+XIovx3/cmP8mhi6fiIGhNzCVBDH0BzONHZzNa7xll20qr19E+T5Y2IYHmG6t8otwMNhH03+O9iWoc1hDtI15D9gdtobqOWZt3wR6BVwJ1jptx0zEwEgZqJ7UI7XjfdjZbVmnCXQH2HOS9Xux625tMBmcBT7MWNjAxFqH8IPHBFkS5eZNB/0449dBSZTXUrYdMxEDI20gibP12/ciJs/pgAmzU5hYF3ZgVBPrFhxTNVHOYrwaDzPyAyiJ8nrKT1YXSDkGxsFAEufU3kX7ZFqznAN+je848MnwnArWwnwI0ikmS6zOXxVWaQy7LXe7XLv1dnq9TRPOr8ZiRmwrLonyRso2WyRiYKwNJHF2//buyqIXwovAW859wAdKrcJl5rShm8Taap3DMM3fD70bSqL0CfjvhmHHsg8xMEgDSZzd2d6fxc6G1eCb8Haw/W660SmxrstK7ThvzW1JA293S9lhdXyq5Zm83vbcRAzEQAxMauAwljCJWbOyf+EKkIiBGIiBGGhhwNr4iWDCXAp+5z0RAzEQAzHQxoAPT84Fk6a3wgdCIgZiIAZioI2B9Zg+ASZN/z+Q/ycoEQMxEAMx0MaAXYu+ACbNX8J2kIiBGIiBGOhgwO+qmzS/A5t0WC6zYiAGYiAGMGC/TJPmE/BKSMRADMRADHQwYL9Jf9TDxHl4h+UyKwZiIAZiAAN2O7oUTJoO86UAJCRiIAZioJOBjzLTpOk/ebPmmYiBGIiBGOhgwLZM2zRNnLZxJmIgBmIgBjoYsOvRLWDS9KuUiRiIgRiIgUkMlK5HJk+TaCIGYiAGYqCDgXQ96iAns2IgBmKg2UC6HjUbyXgMxEAMdDCQrkcd5GRWDMRADLQy8BEmputRKzOZFgMxEAMtDKTrUQspmRQDMRAD7Qyk61E7M5keAzEQA20MpOtRGzGZHAMxEAOtDKTrUSsrmRYDMRADbQxUux75YCgRA3U2sBMHf1ydBeTYJzeQrkeTO8oS9TCwKodpwnwa8rsM9XjPp32U6Xo0bXV54RgZsJZ5K5gwTZzHg4k0MQMD4/rbk3Y9ug48QfaFb0FitAz4n0bXqrB2pTzZ9FbLei74z/fE/yf1i6Zhmeb8cQiP91g4AlaEn8HB4HWRmKGBlWb4+mF8uR8G88ETx6fpSZpI6HN0k+RaJTMTYKvpTlu5D/u8Jut8KWzVYd2PMK9dUh325DqXfZ/XYHuGW8IzcAIcA0sg0QMD41jj3AUv18CNsCs8DompG/BDdUPwAVth40q5TDPJvQR6HU+xQpPYw42h5UKrad0suwbr+MMG/jM+y81Dk/lk4baGJblamzwQ9oa3QjXOYOQsSC2zaqUH5XGscb6l4eV7DJM0n3+SvJBJJek1D6uJcQOW6/aD9VGWfRJaJbRupzUnPtfX63CdD8LNHVa8LvOak2nzuMnVWmunmut/Mv8mOB9MXLYx9ipKspzHCr39LuFPJP4UJhrcxjDRBwPdXhh92HRfVunx3A2e6H8C10IdYjUO0lpfNfE1J8UybhNGN/EMC/kP7H7dhkWV6Q9RrlN0Sq7WYtcHa+sl7qHwDZhpEi0J05qlNcwSX6EwAeeA71uizwbGLXHugi9v072Nmg29/JRndQONFdiatb6S8By2S4zrTWHPTHImw2ria5Uc/R9MS6ew3iz6ewNeVzuDCe7N4Ad5Cc9NE+hUkmhJmB/ndZs3VnQxw69DkmVDyCAH45Y4T0ae/3ztFPjYIEVOYVu2CVaTYbVcTYzWWLxguglvQe+FagJslxjTfNGN0d4tM90kOpddmNdgW4Zbg3EHfBqSMLWxnGLcEucleHw9nAufh+th1SZ8Atw8rTrez/m2KbutbsLa8v1QTYbV8qLKvMXdrDDLLHcDJYnaDi/VmuhpjN8FO8IWsANUw9qltcwkzKqV5VQet8RpY/2ty8llt5v1FvgRqCa+akIsZWuQPllOjKeBuRzWfPB23vbp5ge1NzDtdphocBvDxJAYGLfEWf1E99bmNfA0LAFvZx22Y1Dzkwx5E2oYJsp5sB/sBjbZNMdFTJAJSKJEQiIGYqC+Bmx2aYVPwK1VngTbQSIGYiAGYqBhoJo0TZQ+Ud8f1mnMzyAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGYiAGRs3A/wMpp3D77pc5JQAAAABJRU5ErkJggg==
Your example image has properties that iText cannot properly translate into an inlined image. Unfortunately it does not recognize this and outputs an erroneous result PDF.
In particular your image file uses transparency. Inline images don't allow for Mask or SMask entries (which images in PDFs use to represent transparency). Thus, your image as is cannot be used as inline image.
As a result the inline image created by iText only consists of a black rectangle while the transparency information (which contains the line drawing) is dropped.
Furthermore, your image uses a calibrated RGB color space. Such calibrated RGB color spaces cannot be inlined themselves, so the color space definition has to be put into the page resources. iText, though, when creating an inlined image, fails to reference that non-inlined part properly.
As a result the inline image created by iText references a color space by the wrong name, causing the "An error exists on this page" error message by Adobe Reader. Fixing that reference one gets a valid result PDF showing the black rectangle mentioned above.
In a comment you explain that your actual objective is to prevent copying image from the generated pdf document.
In general this obviously is not possible - any information a PDF viewer can access to draw on screen or paper also can be accessed by some PDF processor designed for that task to copy to some file. (Let's ignore proprietary, viewer specific DRM extensions here.)
What you can try, though, is draw your data in such a way that the common PDF viewers don't offer to copy it. Trying to use an inline image as you did is one approach in that direction. Other approaches wrap the image in other structures, e.g. in a pattern:
PdfContentByte canvas = resultandPdfStamper.getOverContent(1);
Rectangle pageSize = resultantPdfReader.getPageSize(1);
PdfPatternPainter painter = canvas.createPattern(pageSize.getWidth(), pageSize.getHeight());
painter.addImage(image);
canvas.setColorFill(new PatternColor(painter));
canvas.rectangle(0, 0, pageSize.getWidth(), pageSize.getHeight());
canvas.fill();
(AddImageInPattern test testAddToPageTest3)
Adobe Acrobat Reader here does not offer to copy that image. Also my (admittedly older) Adobe Acrobat Pro does not offer to copy it, merely to remove it (more exactly, remove the whole rectangle filled with the pattern).
Beware, though, what the common PDF viewers do or don't offer is a moving target...

MigraDOC and pdfSharp, center dynamic text

i was looking for a solution to create a document (PDF) in such why that i can calculate an input field:
The PDF content should look something like this
(all pdf content needs to be centered)
Header
Field1 : not dynamic (needs to be centered)
Field2 : UserName (dynamic- needs to be appended in the center of the
paragraph) - since each user has a different name length
So my questions is , does pdfSharp or migraDoc has a method or something that can align the text to center (meaning that it does some calculates - determine the font-family, font-size and does the magic so that in the end the marked text is centered) ?
If so what is the method since i've searched the migraDoc and pdfSharp documentation and could not find anything like that.
And if such method does not exist, did someone try this? worked with it? has any suggestions how can i achieve this behavior? maybe some source to look from.
Thank you
Sample 1 shows with a lot of code example how to create a pdf and use almost all functionality that Migradoc offers. Sample 2 shows in detail how to create tables, which might also be interesting for you in respect to layout the page's content.
The alignment (center/left/right) is usually done by setting the Format.Alignment property like this:
Paragraph par = new Paragraph();
par.Format.Alignment = ParagraphAlignment.Center;
A short version of a document with centered content would be:
// first you need a document
Document MigraDokument = new Document();
// each document needs at least one section
Section section = MigraDokument.AddSection();
section.PageSetup.PageFormat = PageFormat.A4;
// and then you add paragraphs to the section
Paragraph par = section.AddParagraph();
// and set the alignment as you wish
par.Format.Alignment = ParagraphAlignment.Center;
// now just fill it with content and set the rest of the parameters...
par.AddText("text");

How to write XMP metadata to a PSD? (JAVA)

I have a photoshop file that I want to be able to change 2 text values with a java program. Opening the PSD with a text editor I can find the text that I want to change. LayerText Eighty, LayerText Nine
I hid some content with blue for privacy reasons. If I use exiftool gui i see [this][2]. So I assumed it was under TextLayerText. In photoshop they are [text layers.][3] I did some research and heard about Sanselan in apache commons. I can find the same code that I found in my [text editor][4].
File imageFile = new File(fileField.getText());
File outputFile = new File(fileField.getText().split("\\.")[0] + ".png");
BufferedImage image = Sanselan.getBufferedImage(imageFile);
logArea.append("--- XMP Metadata ----\n");
logArea.append(Sanselan.getXmpXml(imageFile));
Map params = new HashMap();
params.put("TextLayerText", "");
Sanselan.writeImage(image, outputFile, ImageFormat.IMAGE_FORMAT_PNG, params);
This is the code I currently have. It declares 2 files first is input and 2nd is output. It gets the XMP and prints it out. I create a params Map but my error is.
org.apache.sanselan.ImageWriteException: Unknown parameter: TextLayerText
The goal of this program is to modify the 2 text layers and render the png from this. It renders the png file if i leave the params blank, and i can read the params with Sanselan.getXmpXml. I am struggling to find a way to change them though. I put all pictures in one because of my reputation I can't post more than 2 links.

Accessing "alternate text" for an image via PDFBox

Is there some way to extract "alternate text" for a specific image using PDFBox?
I have a PDF file which, as described at http://www.w3.org/WAI/GL/2011/WD-WCAG20-TECHS-20110621/pdf.html#PDF1, has had alternate text added to an image. Using PDFBox I can find my way through the object model to the image itself (a PDXObjectImage) through PDFDocument.getDocumentCatalog().getAllPages() [iterator] .getResources.getImages() but I can not see any way to get from the image itself to the alternate text for it.
A small sample PDF (with a single image which has some alternate text specified) can be found at http://dl.dropbox.com/u/12253279/image_test_pass.pdf (It should say "This is the alternate text for the image.").
I do not know how/if this can be done with PDFBox, but I can tell you that this feature is related to the sections of the PDF Spec called Logical Structutre/Tagged PDF, which is not fully supported in every PDF tool out-there.
Assuming it is supported by the tool you are using, you will have to follow 4 main steps to retrieve this information (I will use the sample PDF file you posted for the following explanation).
Assuming you have access to the internal structure of the PDF file, you will need to:
1- Parse the page content and find the MCID number of the Tag element that wraps the image you are interested in.
Page content:
BT
/P <</MCID 0 >>BDC
/GS0 gs
/TT0 1 Tf
0.0004 Tc -0.0028 Tw 10.02 0 0 10.02 90 711 Tm
(This is an image test )Tj
EMC
ET
/Figure <</MCID 1 >>BDC
q
106.5 0 0 106.5 90 591.0599976 cm
/Im0 Do
Q
EMC
Your image:
2- In the page object, retrieve the key StructParents.
3- Now retrieve the Structure Tree (key StructTreeRoot of the Catalog object, which is the root object in every PDF file), and inside it, the ParentTree.
4- The ParentTree starts with an array where you can find pairs of elements (See Number Trees in the PDF Spec for more details). In this specific tree, the first element of each pair is a numeric value that corresponds to the StructParents key retrieved in step 2, and the second element is an array of objects, where the indexes correspond to the MCID values retreived in step 1. So, You will search here the element that corresponds to the MCID value of your image, and you will find a PDF object. Inside this object, you will find the alternate text.
Looks easy, isn't it?
Tools used in this answer:
PDF Vole (based on iText)
Amyuni PDF Analyzer
Eric from the PDFBox mailing list sent me the following, though I've not tested it out yet...
Hi,
For your test file, here is a way to access "/Alt" entry :
PDDocument document = PDDocument.load("image_test_pass.pdf");
PDStructureTreeRoot treeRoot =
document.getDocumentCatalog().getStructureTreeRoot();
// get page for each StructElement
for (Object o : treeRoot.getKids()) {
if (o instanceof PDStructureElement) {
PDStructureElement structElement = (PDStructureElement)o;
System.out.println(structElement.getAlternateDescription());
PDPage page = structElement.getPage();
if (page != null) {
page.getResources().getImages();
}
}
}
Please refer to the PDF specification http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf and in particular §14.6, §14.7,
§14.9.3 and §14.9.4 to know all the rules in order to find the "/Alt"
entry. There seems to have several way to define this information.
BR,
Eric

Categories