Font problem with renderImage function pdfbox

I get an error when I read a page from a PDF document. This page contains a bar code that is drawn with a font (AAAAAC+Code3de9). The error appears only when I use the renderImage function.
I use version 2.0.17 of pdfbox-app. (The log below comes from a French locale; AVERTISSEMENT means WARNING.)
déc. 02, 2019 9:34:13 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
AVERTISSEMENT: Could not read embedded OTF for font AAAAAC+Code3de9
java.io.IOException: Illegal seek position: 2483278652
at org.apache.fontbox.ttf.MemoryTTFDataStream.seek(MemoryTTFDataStream.java:164)
at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:352)
at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:173)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:150)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:106)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:192)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:61)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:872)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:506)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:480)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:153)
at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:268)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:321)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:203)
at patrick.mart1.impose.ImposeKosmedias$1.run(ImposeKosmedias.java:370)
déc. 02, 2019 9:34:13 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 findFontOrSubstitute
AVERTISSEMENT: Using fallback font LiberationSans for CID-keyed TrueType font AAAAAC+Code3de9
Many thanks for your help

This is based on the RemoveAllText.java example from the source code download. It removes the selection of F2 in the content stream, and also removes the font in the resources. It makes the assumption that F2 is not really used, i.e. that there is no text related to F2. Compared to the official example, only "createTokensWithoutText" has been changed. I kept all the names even if the meaning is different, except for the class name.
So this code is really just for this file, or for files generated similarly.
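import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.PDContentStream;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.OperatorName;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDAbstractPattern;
import org.apache.pdfbox.pdmodel.graphics.pattern.PDTilingPattern;

public final class RemoveFontF2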
{
/**
* Default constructor.
*/
private RemoveFontF2()
{
// example class should not be instantiated
}
/**
* This will remove the selection of font F2, and the F2 font resource, from a PDF document.
*
* @param args The command line arguments: the input PDF and the output PDF.
*
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException
{
if (args.length != 2)
{
usage();
}
else
{
PDDocument document = PDDocument.load(new File(args[0]));
if (document.isEncrypted())
{
System.err.println(
"Error: Encrypted documents are not supported for this example.");
System.exit(1);
}
for (PDPage page : document.getPages())
{
List<Object> newTokens = createTokensWithoutText(page);
PDStream newContents = new PDStream(document);
writeTokensToStream(newContents, newTokens);
page.setContents(newContents);
processResources(page.getResources());
}
document.save(args[1]);
document.close();
}
}
private static void processResources(PDResources resources) throws IOException
{
for (COSName name : resources.getXObjectNames())
{
PDXObject xobject = resources.getXObject(name);
if (xobject instanceof PDFormXObject)
{
PDFormXObject formXObject = (PDFormXObject) xobject;
writeTokensToStream(formXObject.getContentStream(),
createTokensWithoutText(formXObject));
processResources(formXObject.getResources());
}
}
for (COSName name : resources.getPatternNames())
{
PDAbstractPattern pattern = resources.getPattern(name);
if (pattern instanceof PDTilingPattern)
{
PDTilingPattern tilingPattern = (PDTilingPattern) pattern;
writeTokensToStream(tilingPattern.getContentStream(),
createTokensWithoutText(tilingPattern));
processResources(tilingPattern.getResources());
}
}
}
private static void writeTokensToStream(PDStream newContents, List<Object> newTokens) throws IOException
{
OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter writer = new ContentStreamWriter(out);
writer.writeTokens(newTokens);
out.close();
}
private static List<Object> createTokensWithoutText(PDContentStream contentStream) throws IOException
{
PDFStreamParser parser = new PDFStreamParser(contentStream);
Object token = parser.parseNextToken();
List<Object> newTokens = new ArrayList<Object>();
while (token != null)
{
if (token instanceof Operator)
{
Operator op = (Operator) token;
String opName = op.getName();
if (OperatorName.SET_FONT_AND_SIZE.equals(opName) &&
newTokens.get(newTokens.size() - 2).equals(COSName.getPDFName("F2")))
{
// remove the 2 arguments to this operator
newTokens.remove(newTokens.size() - 1);
newTokens.remove(newTokens.size() - 1);
token = parser.parseNextToken();
continue;
}
}
newTokens.add(token);
token = parser.parseNextToken();
}
// remove F2
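// a content stream may have no resources of its own; then there is nothing to remove
PDResources resources = contentStream.getResources();
if (resources != null)
{
COSBase fontBase = resources.getCOSObject().getItem(COSName.FONT);
if (fontBase instanceof COSDictionary)
{
((COSDictionary) fontBase).removeItem(COSName.getPDFName("F2"));
}
}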
return newTokens;
}
/**
* This will print the usage for this document.
*/
private static void usage()
{
System.err.println("Usage: java " + RemoveFontF2.class.getName() + " <input-pdf> <output-pdf>");
}
}
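The core of the answer is the token loop in createTokensWithoutText: operands are copied to the output as they arrive, so when the Tf operator shows up, its two operands (font resource name and size) are already the last two entries and can be removed again. The JDK-only sketch below replays that logic on plain string tokens. It is a simplification under stated assumptions: PDFBox's PDFStreamParser yields COSName, COSNumber and Operator objects rather than strings, and TfFilterSketch and removeFontSelection are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TfFilterSketch {
    /**
     * Returns a copy of tokens without any "/fontName size Tf" selection.
     * Operands are copied as they arrive, so when "Tf" is reached its two
     * operands are the last two entries of the output and can be dropped again.
     */
    public static List<String> removeFontSelection(List<String> tokens, String fontName) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals("Tf") && out.size() >= 2
                    && out.get(out.size() - 2).equals("/" + fontName)) {
                out.remove(out.size() - 1); // size operand
                out.remove(out.size() - 1); // font name operand
                continue;                   // and never copy the Tf operator itself
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "BT", "/F1", "10", "Tf", "(abc)", "Tj",
                "/F2", "12", "Tf", "(123)", "Tj", "ET");
        System.out.println(removeFontSelection(tokens, "F2"));
        // prints [BT, /F1, 10, Tf, (abc), Tj, (123), Tj, ET]
    }
}
```

As the output shows, text drawn after a removed Tf keeps whatever font was selected before it, which is why the answer only works when F2 is not really used for visible text.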

Related

How to get footnote hyperlink while reading a Word document using Apache POI?

I am using Apache POI to convert a Word document to HTML. I have a Word document that has a footnote which includes an external hyperlink. I am not able to get the hyperlink URL for that hyperlink. Here is my code:
List<CTHyperlink> links = paragraph.getCTP().getHyperlinkList();
log.debug("Count of hyperlinks="+links.size());
for (CTHyperlink ctHyperlink : links) {
String rId = ctHyperlink.getId();
log.debug("rid="+rId);
XWPFHyperlink link = document.getHyperlinkByID(rId);
if(link!=null) {
log.debug("link not NULL");
}else {
log.debug("link is NULL");
}
}
From the above code, I see that in my case the count of hyperlinks is 2. I get the rIds correctly as "rId1" and "rId2", but link is always null.
In the OOXML, I see that the hyperlinks in the document are stored in the package part "/word/_rels/document.xml.rels", while hyperlinks in the footnote are stored in the package part "/word/_rels/footnotes.xml.rels". Probably that is the reason why my link variable is null. But I am not sure how to get the hyperlink element from the footnotes relationship part.

You are correct. If the paragraph in your code snippet is in an XWPFAbstractFootnoteEndnote, then it is in the package part /word/footnotes.xml or /word/endnotes.xml and not in /word/document.xml. And XWPFDocument.getHyperlinkByID only gets the hyperlinks stored in /word/document.xml.
The solution depends on where the paragraph in your code snippet comes from, which you are not showing.
But the simplest solution would be to get the XWPFHyperlinkRun from the XWPFParagraph and then get the XWPFHyperlink from that XWPFHyperlinkRun. If the parent package part of the XWPFHyperlinkRun is not the XWPFDocument, then this must be done using the underlying PackageRelationship, since so far a hyperlink list only exists for XWPFDocument.
In "Unable to read all content in order of a word document (docx) in Apache POI" I have shown a basic example of how to traverse a Word document. I have now extended this code to traverse footnotes and endnotes as well as headers and footers, and to handle the XWPFHyperlinkRuns it finds.
Example:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import java.util.List;
public class WordTraverseAll {
static void traversePictures(List<XWPFPicture> pictures) throws Exception {
for (XWPFPicture picture : pictures) {
System.out.println(picture);
XWPFPictureData pictureData = picture.getPictureData();
System.out.println(pictureData);
}
}
static void traverseComments(XWPFRun run) throws Exception {
CTMarkup comr = null;
if (run.getCTR().getCommentReferenceList().size() > 0) {
comr = run.getCTR().getCommentReferenceList().get(0);
}
if (comr != null) {
XWPFComment comment = run.getDocument().getCommentByID(String.valueOf(comr.getId().intValue()));
System.out.println("Comment from " + comment.getAuthor() + ": " + comment.getText());
}
}
static void traverseFootnotes(XWPFRun run) throws Exception {
CTFtnEdnRef ftn = null;
if (run.getCTR().getFootnoteReferenceList().size() > 0) {
ftn = run.getCTR().getFootnoteReferenceList().get(0);
} else if (run.getCTR().getEndnoteReferenceList().size() > 0) {
ftn = run.getCTR().getEndnoteReferenceList().get(0);
}
if (ftn != null) {
XWPFAbstractFootnoteEndnote footnote =
ftn.getDomNode().getLocalName().equals("footnoteReference") ?
run.getDocument().getFootnoteByID(ftn.getId().intValue()) :
run.getDocument().getEndnoteByID(ftn.getId().intValue());
for (XWPFParagraph paragraph : footnote.getParagraphs()) {
traverseRunElements(paragraph.getIRuns());
}
}
}
static void traverseRunElements(List<IRunElement> runElements) throws Exception {
for (IRunElement runElement : runElements) {
if (runElement instanceof XWPFFieldRun) {
XWPFFieldRun fieldRun = (XWPFFieldRun)runElement;
//System.out.println(fieldRun.getClass().getName());
System.out.println(fieldRun);
traversePictures(fieldRun.getEmbeddedPictures());
} else if (runElement instanceof XWPFHyperlinkRun) {
XWPFHyperlinkRun hyperlinkRun = (XWPFHyperlinkRun)runElement;
//System.out.println(hyperlinkRun.getClass().getName());
String rId = hyperlinkRun.getHyperlinkId();
XWPFHyperlink hyperlink = null;
if (hyperlinkRun.getParent().getPart() instanceof XWPFAbstractFootnotesEndnotes) {
PackageRelationship rel = hyperlinkRun.getParent().getPart().getPackagePart().getRelationships().getRelationshipByID(rId);
hyperlink = new XWPFHyperlink(rId, rel.getTargetURI().toString());
} else if (hyperlinkRun.getParent().getPart() instanceof XWPFHeaderFooter) {
PackageRelationship rel = hyperlinkRun.getParent().getPart().getPackagePart().getRelationships().getRelationshipByID(rId);
hyperlink = new XWPFHyperlink(rId, rel.getTargetURI().toString());
} else if (hyperlinkRun.getParent().getPart() instanceof XWPFDocument) {
hyperlink = hyperlinkRun.getDocument().getHyperlinkByID(rId);
}
System.out.print(hyperlinkRun);
if (hyperlink != null) System.out.println("->" + hyperlink.getURL());
traversePictures(hyperlinkRun.getEmbeddedPictures());
} else if (runElement instanceof XWPFRun) {
XWPFRun run = (XWPFRun)runElement;
//System.out.println(run.getClass().getName());
System.out.println(run);
traverseFootnotes(run);
traverseComments(run);
traversePictures(run.getEmbeddedPictures());
} else if (runElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)runElement;
System.out.println(sDT);
System.out.println(sDT.getContent());
//ToDo: The SDT may have traversable content too.
}
}
}
static void traverseTableCells(List<ICell> tableICells) throws Exception {
for (ICell tableICell : tableICells) {
if (tableICell instanceof XWPFSDTCell) {
XWPFSDTCell sDTCell = (XWPFSDTCell)tableICell;
System.out.println(sDTCell);
//ToDo: The SDTCell may have traversable content too.
} else if (tableICell instanceof XWPFTableCell) {
XWPFTableCell tableCell = (XWPFTableCell)tableICell;
//System.out.println(tableCell);
traverseBodyElements(tableCell.getBodyElements());
}
}
}
static void traverseTableRows(List<XWPFTableRow> tableRows) throws Exception {
for (XWPFTableRow tableRow : tableRows) {
//System.out.println(tableRow);
traverseTableCells(tableRow.getTableICells());
}
}
static void traverseBodyElements(List<IBodyElement> bodyElements) throws Exception {
for (IBodyElement bodyElement : bodyElements) {
if (bodyElement instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph)bodyElement;
//System.out.println(paragraph);
traverseRunElements(paragraph.getIRuns());
} else if (bodyElement instanceof XWPFSDT) {
XWPFSDT sDT = (XWPFSDT)bodyElement;
System.out.println(sDT);
System.out.println(sDT.getContent());
//ToDo: The SDT may have traversable content too.
} else if (bodyElement instanceof XWPFTable) {
XWPFTable table = (XWPFTable)bodyElement;
//System.out.println(table);
traverseTableRows(table.getRows());
}
}
}
static void traverseHeaderFooterElements(XWPFDocument document) throws Exception {
for (XWPFHeader header : document.getHeaderList()) {
traverseBodyElements(header.getBodyElements());
}
for (XWPFFooter footer : document.getFooterList()) {
traverseBodyElements(footer.getBodyElements());
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordHavingHyperlinks.docx"));
System.out.println("===== Document body elements =====");
traverseBodyElements(document.getBodyElements());
System.out.println("===== Header and footer elements =====");
traverseHeaderFooterElements(document);
document.close();
}
}
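The reason getHyperlinkByID returns null is that relationship IDs such as rId1 are only unique within one package part: footnote hyperlinks resolve through /word/_rels/footnotes.xml.rels, body hyperlinks through /word/_rels/document.xml.rels. To inspect the footnotes part directly, without POI, here is a JDK-only sketch that opens the .docx as a zip and maps hyperlink relationship IDs to their targets. RelsHyperlinks and its methods are hypothetical names, and matching on a Type ending in "/hyperlink" is an assumption that covers the relationship type URIs Word uses.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RelsHyperlinks {
    /** Maps relationship Id to Target for every hyperlink relationship in a .rels part. */
    public static Map<String, String> hyperlinkTargets(InputStream rels) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        NodeList list = dbf.newDocumentBuilder().parse(rels)
                .getElementsByTagNameNS("*", "Relationship");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < list.getLength(); i++) {
            Element rel = (Element) list.item(i);
            if (rel.getAttribute("Type").endsWith("/hyperlink")) {
                result.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
        }
        return result;
    }

    /** Reads word/_rels/footnotes.xml.rels from a .docx; empty map if the part is absent. */
    public static Map<String, String> footnoteHyperlinks(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/_rels/footnotes.xml.rels");
            if (entry == null) {
                return new LinkedHashMap<>();
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return hyperlinkTargets(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // usage: java RelsHyperlinks WordHavingHyperlinks.docx
        System.out.println(footnoteHyperlinks(args[0]));
    }
}
```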

Extract text from a PDF file with PDFBox

I am facing an issue with reading a PDF.
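import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

public class GetLinesFromPDF extends PDFTextStripper {
static List<String> lines = new ArrayList<String>();
Map<String, String> auMap = new HashMap<String, String>();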
boolean objFlag = false;
public GetLinesFromPDF() throws IOException {
}
/**
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
PDDocument document = null;
String fileName = "E:\\sample.pdf";
try {
document = PDDocument.load(new File(fileName));
PDFTextStripper stripper = new GetLinesFromPDF();
stripper.setSortByPosition(true);
// PDFTextStripper page numbers are 1-based
stripper.setStartPage(1);
stripper.setEndPage(document.getNumberOfPages());
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
// print lines
for (String line : lines) {
//System.out.println("line = " + line);
if (line.matches("(.*)Objection(.*)")) {
System.out.println(line);
withObjection(lines);
//System.out.println("iiiiiiiiiiii");
break;
}
//System.out.println("uuuuuuuuuuuuuu");
}
} finally {
if (document != null) {
document.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
// collect the text passed in (typically one line, sometimes part of one)
// so that main() can search the lines afterwards
lines.add(string);
}
}
I need output like:
My PDF has a title, which we need to skip.
The PDF file content is:
How can I extract the text in the separate formats specified?
Thanks in advance.
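The search in main only finds something once writeString actually adds to lines. The filtering step itself can be sketched without PDFBox. Since withObjection is not shown in the question, linesAfterMarker below is a hypothetical stand-in that returns the lines following the first line that contains a marker; how the title should be skipped is not specified in the question, so that part is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ObjectionLines {
    /** Returns the lines after the first line containing marker, or an empty list if none does. */
    public static List<String> linesAfterMarker(List<String> lines, String marker) {
        for (int i = 0; i < lines.size(); i++) {
            // For single-line strings and a marker without regex metacharacters,
            // contains(marker) matches the same lines as matches("(.*)" + marker + "(.*)")
            if (lines.get(i).contains(marker)) {
                return new ArrayList<>(lines.subList(i + 1, lines.size()));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Some title", "Objection", "first detail", "second detail");
        System.out.println(linesAfterMarker(lines, "Objection"));
        // prints [first detail, second detail]
    }
}
```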

Replace text in an MS Word template (.docx) using Java

I am trying to search for a string in a .docx file and replace it with other text using Java and Apache POI, but it replaces text inconsistently.
I get an ArrayIndexOutOfBoundsException at this line:
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
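import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.SimpleValue;
import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;

public class WordReplaceTextInFormFields {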
private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
boolean foundformfield = false;
for (XWPFParagraph paragraph : document.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
XmlCursor cursor = run.getCTR().newCursor();
cursor.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/@w:fldCharType");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
if ("begin".equals(((SimpleValue) obj).getStringValue())) {
cursor.toParent();
obj = cursor.getObject();
obj = obj.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/@w:val")[0];
if (ffname.equals(((SimpleValue) obj).getStringValue())) {
foundformfield = true;
} else {
foundformfield = false;
}
} else if ("end".equals(((SimpleValue) obj).getStringValue())) {
if (foundformfield)
return;
foundformfield = false;
}
}
if (foundformfield && run.getCTR().getTList().size() > 0) {
run.getCTR().getTList().get(0).setStringValue(text);
// System.out.println(run.getCTR());
}
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
replaceFormFieldText(document, "Text1", "Моя Компания");
replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер");
replaceFormFieldText(document, "Text3", "Доверенность");
try (FileOutputStream out = new FileOutputStream("WordReplaceTextInFormFields.docx")) {
document.write(out);
}
document.close();
}
}
It misses some strings; it does not replace them throughout the entire document. Please help with sample code.
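A likely cause of the ArrayIndexOutOfBoundsException (not confirmed in the thread): the code treats every fldChar with fldCharType="begin" as a legacy form field, but fields such as PAGE or HYPERLINK also start with a begin fldChar and have no w:ffData child, so selectPath(...) returns an empty array and [0] fails. The sketch below performs the same lookup with the JDK's javax.xml.xpath instead of XmlBeans (swapped in only to keep it self-contained) and guards the empty result; in the question's code the equivalent fix is to check the length of the array returned by obj.selectPath(...) before indexing it. FormFieldNameSketch is a hypothetical name. Separately, replaceFormFieldText only walks document.getParagraphs(), which does not include paragraphs inside tables, headers or footers, and that may explain some of the missed strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FormFieldNameSketch {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the legacy form field name below a run's fldChar, or null if there is none. */
    public static String formFieldName(String runXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the w: prefix to resolve
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(runXml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return W.equals(uri) ? "w" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return W.equals(uri) ? Collections.singletonList("w").iterator()
                                     : Collections.<String>emptyList().iterator();
            }
        });
        NodeList names = (NodeList) xpath.evaluate(
                ".//w:fldChar/w:ffData/w:name/@w:val", doc, XPathConstants.NODESET);
        // Indexing the first match unconditionally fails when the result is empty,
        // e.g. for a PAGE field whose begin fldChar has no w:ffData child
        return names.getLength() > 0 ? names.item(0).getNodeValue() : null;
    }

    public static void main(String[] args) throws Exception {
        String run = "<w:r xmlns:w=\"" + W + "\"><w:fldChar w:fldCharType=\"begin\"/></w:r>";
        System.out.println(formFieldName(run)); // prints null
    }
}
```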

I do something similar in my project at https://github.com/centic9/poi-mail-merge, which provides general mail-merge functionality based on POI. It uses a slightly different approach based on XmlBeans: it replaces strings in the full XML content of the document instead of in each paragraph separately.
private static void appendBody(CTBody src, String append, boolean first) throws XmlException {
XmlOptions optionsOuter = new XmlOptions();
optionsOuter.setSaveOuter();
String srcString = src.xmlText();
String prefix = srcString.substring(0,srcString.indexOf(">")+1);
final String mainPart;
// exclude template itself in first appending
if(first) {
mainPart = "";
} else {
mainPart = srcString.substring(srcString.indexOf(">")+1,srcString.lastIndexOf("<"));
}
String suffix = srcString.substring( srcString.lastIndexOf("<") );
String addPart = append.substring(append.indexOf(">") + 1, append.lastIndexOf("<"));
CTBody makeBody = CTBody.Factory.parse(prefix+mainPart+addPart+suffix);
src.set(makeBody);
}
See line 132 in MailMerge.java
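The snippet above shows how the template body is appended, not the replacement step itself. As a hedged illustration of the general idea of replacing inside the serialized XML rather than per run, here is a JDK-only sketch; the ${name} placeholder syntax and the class and method names are hypothetical, and it does not claim to reproduce poi-mail-merge. Two caveats apply to this technique: values must be XML-escaped before they are spliced into markup, and a placeholder that Word has split across several runs will not be found by a plain string replace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class XmlPlaceholderSketch {
    /** Escapes characters that may not appear verbatim in XML text content. */
    public static String escapeXml(String s) {
        // '&' must be escaped first so the entities added below are not escaped again
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Replaces each ${key} in the whole XML string with the escaped value. */
    public static String fill(String xml, Map<String, String> values) {
        String result = xml;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Smith & Sons");
        String body = "<w:p><w:r><w:t>Dear ${name},</w:t></w:r></w:p>";
        System.out.println(fill(body, values));
        // prints <w:p><w:r><w:t>Dear Smith &amp; Sons,</w:t></w:r></w:p>
    }
}
```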

Read shapes (rectangle, square, circle, arrow, etc.) and clip art from an MS Word file using Java

I am able to get images from an MS Word table, but I am unable to get shapes and clip art.
public static void main(String[] args) throws Exception {
// The path to the document.
try {
String dataDir = "E://test//demo.docx";
generatePicturesAsImages(dataDir);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void generatePicturesAsImages(String sourcePath) {
try {
Document doc = new Document(sourcePath);
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.JPEG);
options.setJpegQuality(100);
options.setResolution(100);
// options.setUseHighQualityRendering(true);
List<ShapeRenderer> pictures = getAllPictures(doc);
if (pictures != null) {
for (int i = 0; i < pictures.size(); i++) {
ShapeRenderer picture = pictures.get(i);
String imageFilePath = sourcePath + "_output_" + i + ".jpeg";
picture.save(imageFilePath, options);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private static List<ShapeRenderer> getAllPictures(final Document document) throws Exception {
List<ShapeRenderer> images = null;
@SuppressWarnings("unchecked")
NodeCollection<DrawingML> nodeCollection = document.getChildNodes(NodeType.DRAWING_ML, Boolean.TRUE);
if (nodeCollection.getCount() > 0) {
images = new ArrayList<ShapeRenderer>();
for (DrawingML drawingML : nodeCollection) {
images.add(drawingML.getShapeRenderer());
}
}
return images;
}
The program above gets the images from the table. What should I add to also get the shapes? Any help will be appreciated!

You are using an older version of Aspose.Words. If you want to keep using that older version, please get the collections of Shape and DrawingML nodes using Document.getChildNodes in your getAllPictures method.
NodeCollection<DrawingML> drawingMLNodes = document.getChildNodes(NodeType.DRAWING_ML, Boolean.TRUE);
NodeCollection<Shape> shapeNodes = document.getChildNodes(NodeType.SHAPE, Boolean.TRUE);
Note that we removed DrawingML from our APIs in Aspose.Words 15.2.0. If you want to use the latest version of Aspose.Words (v16.5.0), please use only NodeType.SHAPE.
I work with Aspose as Developer evangelist.

How do I add an ICC profile to an existing PDF document?

I have an existing PDF document that is using CMYK colors. It was created using a specific ICC profile, which I have obtained. The colors are obviously different if I open the document with the profile active than without. From what I can tell using a variety of tools, there is no ICC profile embedded in the document. What I would like to do is embed the ICC profile in the PDF so that it can be opened and viewed with the correct colors by third parties. My understanding is that this is possible to do with the PDF format, but nothing I have tried seems to work.
I wrote a small program using PDFBox based on looking at some examples, but it seems to have no effect. I feel like I am missing a step somewhere.
package com.mapsherpa.tools.addicc;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import java.io.FileInputStream;
import java.io.IOException;
public class AddICC {
public AddICC() {
// TODO Auto-generated constructor stub
}
public static void main(String[] args) {
AddICC app = new AddICC();
try {
if( args.length != 3) {
app.usage();
} else {
app.doIt(args[0], args[1], args[2]);
}
}
catch (Exception e) {
e.printStackTrace();
}
}
private void doIt(String input, String output, String icc) throws IOException {
// TODO Auto-generated method stub
System.out.printf("Adding %s to %s and saving as %s\n", icc, input, output);
PDDocument doc = null;
try
{
File file = new File(input);
doc = PDDocument.load(file);
PDDocumentCatalog cat = doc.getDocumentCatalog();
PDMetadata metadata = new PDMetadata(doc);
cat.setMetadata(metadata);
InputStream colorProfile = new FileInputStream(icc);
PDOutputIntent oi = new PDOutputIntent(doc, colorProfile);
oi.setInfo("SWOP (Coated), 20%, GCR, None");
oi.setOutputCondition("SWOP (Coated), 20%, GCR, None");
oi.setOutputConditionIdentifier("SWOP (Coated), 20%, GCR, None");
oi.setRegistryName("http://www.color.org");
cat.addOutputIntent(oi);
doc.save(output);
System.out.println("Finished adding color profile");
}
catch (Exception e)
{
System.out.println("Exception processing color profile");
e.printStackTrace();
}
finally
{
if (doc != null) {
doc.close();
}
}
}
private void usage() {
// TODO Auto-generated method stub
System.err.println("Usage: " + this.getClass().getName() + " <input-file> <output-file> <icc-file>");
}
}
I'm not a Java expert, but I did manage to get this to run. It seems to do something, but I still don't see the correct colors, and neither imagemagick nor pdfinfo indicates that the document has a color profile.
I feel like somehow I should be indicating that the document color space is ICCBased but I can't see any obvious way to do that using the PDFBox API.
Any help would be appreciated (even being told that it won't work!)
EDIT:
I believe that this is working as written, in that it adds the required output intent to the document. However, I have also discovered that this is not what I need: I now believe that I need it to add an /ICCBased stream to the PDF (sigh). The updated code below is based on the updated createColorSpace function from this Stack Overflow question.
private static PDColorSpace createColorSpace( PDDocument doc, ColorSpace cs ) throws IOException
{
PDColorSpace retval = null;
if( cs.isCS_sRGB() )
{
retval = PDDeviceRGB.INSTANCE;
}
else if( cs instanceof ICC_ColorSpace )
{
ICC_ColorSpace ics = (ICC_ColorSpace)cs;
// CREATING MANUALLY THE COS ARR ****************************
COSArray cosArray = new COSArray();
cosArray.add(COSName.ICCBASED);
PDStream pdStream = new PDStream(doc);
cosArray.add(pdStream.getStream());
// USING DIFFERENT CONSTRUCTOR *******************************
PDICCBased pdCS = new PDICCBased( cosArray );
retval = pdCS;
COSArray ranges = new COSArray();
for( int i=0; i<cs.getNumComponents(); i++ )
{
ranges.add( new COSFloat( ics.getMinValue( i ) ) );
ranges.add( new COSFloat( ics.getMaxValue( i ) ) );
}
PDStream iccData = pdCS.getPDStream();
OutputStream output = null;
try
{
output = ((COSStream)iccData.getCOSObject()).createFilteredStream();
output.write( ics.getProfile().getData() );
}
finally
{
if( output != null )
{
output.close();
}
}
pdCS.setNumberOfComponents( cs.getNumComponents() );
}
else
{
throw new IOException( "Not yet implemented:" + cs );
}
return retval;
}
private void doIt(String input, String output, String icc) throws IOException {
// TODO Auto-generated method stub
System.out.printf("Adding %s to %s and saving as %s\n", icc, input, output);
PDDocument doc = null;
try
{
File file = new File(input);
doc = PDDocument.load(file);
ICC_ColorSpace iccColorSpace = new ICC_ColorSpace(ICC_Profile.getInstance(icc));
PDColorSpace colorSpace = createColorSpace(doc, iccColorSpace);
doc.save(output);
System.out.println("Finished adding color profile");
}
catch (Exception e)
{
System.out.println("Exception processing color profile");
e.printStackTrace();
}
finally
{
if (doc != null) {
doc.close();
}
}
}
This code now has an exception:
java.io.IOException: Unknown color space number of components:-1
at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.getAlternateColorSpace(PDICCBased.java:269)
at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.loadICCProfile(PDICCBased.java:151)
at org.apache.pdfbox.pdmodel.graphics.color.PDICCBased.<init>(PDICCBased.java:89)
at com.mapsherpa.tools.addicc.AddICC.createColorSpace(AddICC.java:65)
at com.mapsherpa.tools.addicc.AddICC.doIt(AddICC.java:109)
at com.mapsherpa.tools.addicc.AddICC.main(AddICC.java:39)
at this line of code:
cosArray.add(pdStream.getStream());
The only difference I can see between this code and the other answer is that I am loading an existing PDF document rather than creating a new empty one.
For testing, I'm using the US Web (Coated) SWOP v2 ICC profile from Adobe, but I get the same exception with every profile I test. From my reading of the PDFBox source, the problem isn't the profile but reading the stream from the document (which doesn't have an /ICCBased stream, the whole point of this question :))
EDIT 2: the code above does actually run without exceptions if used with PDFBox 1.8.10 - apparently I had linked in 2.0.0 RC2 without realizing it (total Java newbie).
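The PDFBox 2.0 exception above is thrown from PDICCBased.getAlternateColorSpace. Per the PDF specification, an ICCBased stream must carry /N, the number of colour components (1, 3 or 4), and when no /Alternate is given the default alternate is DeviceGray, DeviceRGB or DeviceCMYK respectively. The -1 in the message appears to be the value read for /N from the freshly created, still empty stream, which would explain why 1.8.10 (which does not read the profile in the constructor) behaves differently. The JDK-only sketch below (IccAlternateSketch and alternateFor are hypothetical names) reads the component count from a profile with java.awt.color.ICC_Profile and applies that mapping; for a CMYK profile such as the SWOP one, it should report 4 and DeviceCMYK.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.io.IOException;

public class IccAlternateSketch {
    /** Default /Alternate for an ICCBased stream with the given /N, as in the PDF spec. */
    public static String alternateFor(int n) {
        switch (n) {
            case 1: return "DeviceGray";
            case 3: return "DeviceRGB";
            case 4: return "DeviceCMYK";
            default:
                throw new IllegalArgumentException("Unknown color space number of components: " + n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a profile path (e.g. the SWOP .icc file); defaults to the JDK's built-in sRGB profile
        ICC_Profile profile = args.length > 0
                ? ICC_Profile.getInstance(args[0])
                : ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        int n = profile.getNumComponents();
        System.out.println("/N " + n + ", default /Alternate " + alternateFor(n));
    }
}
```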
