How to add text outlines to text within Powerpoint via Apache POI:

How to add text outlines to text within Powerpoint via Apache POI: - java

Does anyone have an idea how we can add outlines to text (text outline) within powerpoint templates (ppxt) using Apache POI? What I have gathered so far is that the XSLFTextRun class does not have a method to get/ set the text outline for a given run element.
And as such, I could only persist the following font/ text styles:
def fontStyles(textBox: XSLFTextBox, textRun: XSLFTextRun): Unit = {
val fontFamily = textRun.getFontFamily
val fontColor = textRun.getFontColor
val fontSize = textRun.getFontSize
val fontBold = textRun.isBold
val fontItalic = textRun.isItalic
val textAlign = textRun.getParagraph.getTextAlign
textBox.getTextParagraphs.foreach { p =>
p.getTextRuns.foreach { tr =>
tr.setFontFamily(fontFamily)
tr.setFontColor(fontColor)
tr.setFontSize(fontSize)
tr.setBold(fontBold)
tr.setItalic(fontItalic)
tr.getParagraph.setTextAlign(textAlign)
}
}
}
Is it possible to add text outline?
Any assistance/ suggestions would be highly appreciated.

Apache poi uses underlying ooxml-schemas classes. Those are auto generated from Office Open XML standard. So they are more complete than the high level XSLF classes. Of course they are much less convenient.
So if somewhat is not implemented in high level XSLF classes, we can get the underlying CT classes and do it using those. In case of XSLFTextRun we can get the CTRegularTextRun object. Then we can look whether there are run properties already. If not, we add one. Then we look whether there is outline set already. If so, we unset it, because we want set it new. Then we set a new outline. This simply is a line having a special color. That line is represented by CTLineProperties object. So we need to have methods to create that CTLineProperties, to set CTLineProperties to the XSLFTextRun and get CTLineProperties from XSLFTextRun.
Complete example using Java code:
import java.io.FileOutputStream;
import java.io.FileInputStream;
import org.apache.poi.xslf.usermodel.*;
import org.apache.poi.sl.usermodel.*;
import java.awt.Rectangle;
public class PPTXTextRunOutline {
static org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties createSolidFillLineProperties(java.awt.Color color) {
// create new CTLineProperties
org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties lineProperties
= org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties.Factory.newInstance();
// set line solid fill color
lineProperties.addNewSolidFill().addNewSrgbClr().setVal(new byte[]{(byte)color.getRed(), (byte)color.getGreen(), (byte)color.getBlue()});
return lineProperties;
}
static void setOutline(XSLFTextRun run, org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties lineProperties) {
// get underlying CTRegularTextRun object
org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun ctRegularTextRun
= (org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun)run.getXmlObject();
// Are there run properties already? If not, add one.
if (ctRegularTextRun.getRPr() == null) ctRegularTextRun.addNewRPr();
// Is there outline set already? If so, unset it, because we are creating it new.
if (ctRegularTextRun.getRPr().isSetLn()) ctRegularTextRun.getRPr().unsetLn();
// set a new outline
ctRegularTextRun.getRPr().setLn(lineProperties);
}
static org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties getOutline(XSLFTextRun run) {
// get underlying CTRegularTextRun object
org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun ctRegularTextRun
= (org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun)run.getXmlObject();
// Are there run properties already? If not, return null.
if (ctRegularTextRun.getRPr() == null) return null;
// get outline, may be null
org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties lineProperties = ctRegularTextRun.getRPr().getLn();
// make a copy to avoid orphaned exceptions or value disconnected exception when set to its own XML parent
if (lineProperties != null) lineProperties = (org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties)lineProperties.copy();
return lineProperties;
}
// your method fontStyles taken to Java code
static void fontStyles(XSLFTextRun templateRun, XSLFTextShape textShape) {
String fontFamily = templateRun.getFontFamily();
PaintStyle fontColor = templateRun.getFontColor();
Double fontSize = templateRun.getFontSize();
boolean fontBold = templateRun.isBold();
boolean fontItalic = templateRun.isItalic();
TextParagraph.TextAlign textAlign = templateRun.getParagraph().getTextAlign();
org.openxmlformats.schemas.drawingml.x2006.main.CTLineProperties lineProperties = getOutline(templateRun);
for (XSLFTextParagraph paragraph : textShape.getTextParagraphs()) {
for (XSLFTextRun run : paragraph.getTextRuns()) {
run.setFontFamily(fontFamily);
if(run != templateRun) run.setFontColor(fontColor); // set PaintStyle has the issue which I am avoiding by using a copy of the underlying XML
run.setFontSize(fontSize);
run.setBold(fontBold);
run.setItalic(fontItalic);
run.getParagraph().setTextAlign(textAlign);
setOutline(run, lineProperties);
}
}
}
public static void main(String[] args) throws Exception {
XMLSlideShow slideShow = new XMLSlideShow(new FileInputStream("./PPTXIn.pptx"));
XSLFSlide slide = slideShow.getSlides().get(0);
//as in your code, get a template text run and set its font style to all other runs in text shape
if (slide.getShapes().size() > 0) {
XSLFShape shape = slide.getShapes().get(0);
if (shape instanceof XSLFTextShape) {
XSLFTextShape textShape = (XSLFTextShape) shape;
XSLFTextParagraph paragraph = null;
if(textShape.getTextParagraphs().size() > 0) paragraph = textShape.getTextParagraphs().get(0);
if (paragraph != null) {
XSLFTextRun run = null;
if(paragraph.getTextRuns().size() > 0) run = paragraph.getTextRuns().get(0);
if (run != null) {
fontStyles(run, textShape);
}
}
}
}
//new text box having outlined text from scratch
XSLFTextBox textbox = slide.createTextBox();
textbox.setAnchor(new Rectangle(100, 300, 570, 80));
XSLFTextParagraph paragraph = null;
if(textbox.getTextParagraphs().size() > 0) paragraph = textbox.getTextParagraphs().get(0);
if(paragraph == null) paragraph = textbox.addNewTextParagraph();
XSLFTextRun run = paragraph.addNewTextRun();
run.setText("Test text outline");
run.setFontSize(60d);
run.setFontColor(java.awt.Color.YELLOW);
setOutline(run, createSolidFillLineProperties(java.awt.Color.BLUE));
FileOutputStream out = new FileOutputStream("./PPTXOit.pptx");
slideShow.write(out);
out.close();
}
}
Tested and works using current apache poi 5.0.0.

Related

Apache poi PowerPoint slide masters - not generating footer with page number

I have problem with slide master layout which contains page number and static text. I have them on my layouts but they are not appearing once I'm generating presentation. When I go INSTER --> Page Number I can select and they appear. When I'm adding new slide to my generated presentation via PowerPoint (add slide with selected layout), then it's showing page number for that page.
Here is the code:
String fileName = "slidemaster-010";
String templateLocation = "/Users/akonopko/Export/Templates/Presentation10.pptx";
XMLSlideShow ppt = new XMLSlideShow(new FileInputStream(templateLocation));
XSLFSlideMaster defaultMaster = ppt.getSlideMasters().get(0);
XSLFSlideLayout masterLayout = defaultMaster.getLayout("Master");
XSLFSlide slide = ppt.createSlide(masterLayout);
XSLFSlideLayout masterLayout1 = defaultMaster.getLayout("1_Custom Layout");
XSLFSlide slide2 = ppt.createSlide(masterLayout1);
List<XSLFShape> slideShapes = slide2.getShapes();
for (XSLFShape shape : slideShapes) {
if (shape.getPlaceholder() == Placeholder.TITLE) {
((XSLFTextShape) shape).setText("Test Text");
}
}

Apache POI just does not copy the placeholders from the XSLFSlideLayout while XMLSlideShow.createSlide(XSLFSlideLayout). It uses XSLFSlideLayout.copyLayout(XSLFSlide) and there DATETIME, SLIDE_NUMBER and FOOTER is explicitly excluded.
One could write a method which copies those placeholders from the layout. This could look like so:
public void copyPlaceholdersFromLayout(XSLFSlideLayout layout, XSLFSlide slide) {
for (XSLFShape sh : layout.getShapes()) {
if (sh instanceof XSLFTextShape) {
XSLFTextShape tsh = (XSLFTextShape) sh;
Placeholder ph = tsh.getTextType();
if (ph == null) continue;
switch (ph) {
// these are special and not copied by default
case DATETIME:
case SLIDE_NUMBER:
case FOOTER:
System.out.println(ph);
slide.getXmlObject().getCSld().getSpTree().addNewSp().set(tsh.getXmlObject().copy());
break;
default:
//slide.getSpTree().addNewSp().set(tsh.getXmlObject().copy());
//already done
}
}
}
}
Get called like so:
copyPlaceholdersFromLayout(masterLayout1, slide2);
But there might be a reason why apache poi explicitly excludes this.

How do I ADD bullet points to a word document using Apache POI in Java

I have a word document which is used as a template. Inside this template I have some tables that contain predefined bullet points. Now I'm trying to replace the placeholder string with a set of strings.
I'm totally stuck on this. My simplified methods looks like this.
replaceKeyValue.put("[DescriptionOfItem]", new HashSet<>(Collections.singletonList("This is the description")));
replaceKeyValue.put("[AllowedEntities]", new HashSet<>(Arrays.asList("a", "b")));
replaceKeyValue.put("[OptionalEntities]", new HashSet<>(Arrays.asList("c", "d")));
replaceKeyValue.put("[NotAllowedEntities]", new HashSet<>(Arrays.asList("e", "f")));
try (XWPFDocument template = new XWPFDocument(OPCPackage.open(file))) {
template.getTables().forEach(
xwpfTable -> xwpfTable.getRows().forEach(
xwpfTableRow -> xwpfTableRow.getTableCells().forEach(
xwpfTableCell -> replaceInCell(replaceKeyValue, xwpfTableCell)
)
));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
template.write(baos);
return new ByteArrayResource(baos.toByteArray());
} finally {
if (file.exists()) {
file.delete();
}
}
private void replaceInCell(Map<String, Set<String>> replacementsKeyValuePairs, XWPFTableCell xwpfTableCell) {
for (XWPFParagraph xwpfParagraph : xwpfTableCell.getParagraphs()) {
for (Map.Entry<String, Set<String>> replPair : replacementsKeyValuePairs.entrySet()) {
String keyToFind = replPair.getKey();
Set<String> replacementStrings = replacementsKeyValuePairs.get(keyToFind);
if (xwpfParagraph.getText().contains(keyToFind)) {
replacementStrings.forEach(replacementString -> {
XWPFParagraph paragraph = xwpfTableCell.addParagraph();
XWPFRun run = paragraph.createRun();
run.setText(replacementString);
});
}
}
}
I was expecting that some more bullet points will be added to the current cell. Am I missing something? The paragraph is the one containing the placeholder string and format.
Thanks for any help!
UPDATE: This is how part of the template looks like. I would like to automatically search for the terms and replace them. Searching works so far. But trying to replace the bullet points ends in an unlocatable NullPointer.
Would it be easier to use fields? I need to keep the bullet point style though.
UPDATE 2: added download link and updated the code. Seems I can't alter the paragraphs if I'm iterating through them. I get a null-pointer.
Download link: WordTemplate

Since Microsoft Word is very, very "strange" in how it divides text in different runs in it's storage, such questions are not possible to answer without having a complete example including all code and the Word documents in question. Having a general usable code for adding content to Word documents seems not be possible, except all the adding or replacement is only in fields (form fields or content controls or mail merge fields).
So I downloaded your WordTemplate.docx which looks like so:
Then I runned the following code:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.apache.xmlbeans.XmlCursor;
import java.util.*;
import java.math.BigInteger;
public class WordReadAndRewrite {
static void addItems(XWPFTableCell cell, XWPFParagraph paragraph, Set<String> items) {
XmlCursor cursor = null;
XWPFRun run = null;
CTR cTR = null; // for a deep copy of the run's low level object
BigInteger numID = paragraph.getNumID();
int indentationLeft = paragraph.getIndentationLeft();
int indentationHanging = paragraph.getIndentationHanging();
boolean first = true;
for (String item : items) {
if (first) {
for (int r = paragraph.getRuns().size()-1; r > 0; r--) {
paragraph.removeRun(r);
}
run = (paragraph.getRuns().size() > 0)?paragraph.getRuns().get(0):null;
if (run == null) run = paragraph.createRun();
run.setText(item, 0);
cTR = (CTR)run.getCTR().copy(); // take a deep copy of the run's low level object
first = false;
} else {
cursor = paragraph.getCTP().newCursor();
boolean thereWasParagraphAfter = cursor.toNextSibling(); // move cursor to next paragraph
// because the new paragraph shall be **after** that paragraph
// thereWasParagraphAfter is true if there is a next paragraph, else false
if (thereWasParagraphAfter) {
paragraph = cell.insertNewParagraph(cursor); // insert new paragraph if there are next paragraphs in cell
} else {
paragraph = cell.addParagraph(); // add new paragraph if there are no other paragraphs present in cell
}
paragraph.setNumID(numID); // set template paragraph's numbering Id
paragraph.setIndentationLeft(indentationLeft); // set template paragraph's indenting from left
if (indentationHanging != -1) paragraph.setIndentationHanging(indentationHanging); // set template paragraph's hanging indenting
run = paragraph.createRun();
if (cTR != null) run.getCTR().set(cTR); // set template paragraph's run formatting
run.setText(item, 0);
}
}
}
public static void main(String[] args) throws Exception {
Map<String, Set<String>> replaceKeyValue = new HashMap<String, Set<String>>();
replaceKeyValue.put("[AllowedEntities]", new HashSet<>(Arrays.asList("allowed 1", "allowed 2", "allowed 3")));
replaceKeyValue.put("[OptionalEntities]", new HashSet<>(Arrays.asList("optional 1", "optional 2", "optional 3")));
replaceKeyValue.put("[NotAllowedEntities]", new HashSet<>(Arrays.asList("not allowed 1", "not allowed 2", "not allowed 3")));
XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
List<XWPFTable> tables = document.getTables();
for (XWPFTable table : tables) {
List<XWPFTableRow> rows = table.getRows();
for (XWPFTableRow row : rows) {
List<XWPFTableCell> cells = row.getTableCells();
for (XWPFTableCell cell : cells) {
int countParagraphs = cell.getParagraphs().size();
for (int p = 0; p < countParagraphs; p++) { // do not for each since new paragraphs were added
XWPFParagraph paragraph = cell.getParagraphArray(p);
String placeholder = paragraph.getText();
placeholder = placeholder.trim(); // this is the tricky part to get really the correct placeholder
Set<String> items = replaceKeyValue.get(placeholder);
if (items != null) {
addItems(cell, paragraph, items);
}
}
}
}
}
FileOutputStream out = new FileOutputStream("Result.docx");
document.write(out);
out.close();
document.close();
}
}
The Result.docx looks like so:
The code loops trough the table cells in the Word document and looks for a paragraph which contains exactly the placeholder. This even might be the tricky part since that placeholder might be splitted into differnt text runs by Word. If found it runs a method addItems which takes the found paragraph as a template for numbering and indention (might be incomplter though). Then it sets the first new item in first text run of found paragraph and removes all other text runs which possibly are there. Then it determines wheter new paragraphs must be inserted or added to the cell. For this a XmlCursor is used. In new inserted or added paragrahs the other items are placed and the numbering and indention settings are taken from the placeholder's paragraph.
As said, this is code for showing the principles of how to do. It would must be extended very much to be general usable. In my opinion those trials using text placeholders in Word documents for text replacements are not really good. Placeholders for variable text in Word documents should be fields. This could be form fields, content controls or mail merge fields. Advantage of fields in contrast of text placeholders is that Word knows the fields being entities for variable texts. It will not split them into multiple text runs for multiple strange reasons as it often does with normal text.

How to edit a Hyperlink in a Word Document using Apache POI?

So I've been browsing around the source code / documentation for POI (specifically XWPF) and I can't seem to find anything that relates to editing a hyperlink in a .docx. I only see functionality to get the information for the currently set hyperlink. My goal is to change the hyperlink in a .docx to link to "http://yahoo.com" from "http://google.com" as an example. Any help would be greatly appreciated. Thanks!

I found a way to edit the url of the link in a "indirect way" (copy the previous hyperlink, modify the url, delete the previous hyperlink and add the new one in the paragraph).
Code is shown below:
private void editLinksOfParagraph(XWPFParagraph paragraph, XWPFDocument document) {
for (int rIndex = 0; rIndex < paragraph.getRuns().size(); rIndex++) {
XWPFRun run = paragraph.getRuns().get(rIndex);
if (run instanceof XWPFHyperlinkRun) {
// get the url of the link to edit it
XWPFHyperlink link = ((XWPFHyperlinkRun) run).getHyperlink(document);
String linkURL = link.getURL();
//get the xml representation of the hyperlink that includes all the information
XmlObject xmlObject = run.getCTR().copy();
linkURL += "-edited-link"; //edited url of the link, f.e add a '-edited-link' suffix
//remove the previous link from the paragraph
paragraph.removeRun(rIndex);
//add the new hyperlinked with updated url in the paragraph, in place of the previous deleted
XWPFHyperlinkRun hyperlinkRun = paragraph.insertNewHyperlinkRun(rIndex, linkURL);
hyperlinkRun.getCTR().set(xmlObject);
}
}
}

This requirement needs knowledge about how hyperlinks referring to an external reference get stored in Microsoft Word documents and how this gets represented in XWPF of Apache POI.
The XWPFHyperlinkRun is the representation of a linked text run in a IRunBody. This text run, or even multiple text runs, is/are wrapped with a XML object of type CTHyperlink. This contains a relation ID which points to a relation in the package relations part. This package relation contains the URI which is the hyperlink's target.
Currently (apache poi 5.2.2) XWPFHyperlinkRun provides access to a XWPFHyperlink. But this is very rudimentary. It only has getters for the Id and the URI. It neither provides access to it's XWPFHyperlinkRun and it's IRunBody nor it provides a setter for the target URI in the package relations part. It not even has internally access to it's the package relations part.
So only using Apache POI classes the only possibility currently is to delete the old XWPFHyperlinkRun and create a new one pointing to the new URI. But as the text runs also contain the text formatting, deleting them will also delete the text formatting. It would must be copied from the old XWPFHyperlinkRun to the new before deleting the old one. That's uncomfortable.
So the rudimentary XWPFHyperlink should be extended to provide a setter for the target URI in the package relations part. A new class XWPFHyperlinkExtended could look like so:
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.openxml4j.opc.PackageRelationship;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
/**
* Extended XWPF hyperlink class
* Provides access to it's Id, URI, XWPFHyperlinkRun, IRunBody.
* Provides setting target URI in PackageRelationship.
*/
public class XWPFHyperlinkExtended {
private String id;
private String uri;
private XWPFHyperlinkRun hyperlinkRun;
private IRunBody runBody;
private PackageRelationship rel;
public XWPFHyperlinkExtended(XWPFHyperlinkRun hyperlinkRun, PackageRelationship rel) {
this.id = rel.getId();
this.uri = rel.getTargetURI().toString();
this.hyperlinkRun = hyperlinkRun;
this.runBody = hyperlinkRun.getParent();
this.rel = rel;
}
public String getId() {
return this.id;
}
public String getURI() {
return this.uri;
}
public IRunBody getIRunBody() {
return this.runBody;
}
public XWPFHyperlinkRun getHyperlinkRun() {
return this.hyperlinkRun;
}
/**
* Provides setting target URI in PackageRelationship.
* The old PackageRelationship gets removed.
* A new PackageRelationship gets added using the same Id.
*/
public void setTargetURI(String uri) {
this.runBody.getPart().getPackagePart().removeRelationship(this.getId());
this.uri = uri;
PackageRelationship rel = this.runBody.getPart().getPackagePart().addExternalRelationship(uri, XWPFRelation.HYPERLINK.getRelation(), this.getId());
this.rel = rel;
}
}
It does not extend XWPFHyperlink as this is so rudimentary it's not worth it. Furthermore after setTargetURI the String uri needs to be updated. But there is no setter in XWPFHyperlink and the field is only accessible from inside the package.
The new XWPFHyperlinkExtended can be got from XWPFHyperlinkRun like so:
/**
* If this HyperlinkRun refers to an external reference hyperlink,
* return the XWPFHyperlinkExtended object for it.
* May return null if no PackageRelationship found.
*/
/*modifiers*/ XWPFHyperlinkExtended getHyperlink(XWPFHyperlinkRun hyperlinkRun) {
try {
for (org.apache.poi.openxml4j.opc.PackageRelationship rel : hyperlinkRun.getParent().getPart().getPackagePart().getRelationshipsByType(XWPFRelation.HYPERLINK.getRelation())) {
if (rel.getId().equals(hyperlinkRun.getHyperlinkId())) {
return new XWPFHyperlinkExtended(hyperlinkRun, rel);
}
}
} catch (org.apache.poi.openxml4j.exceptions.InvalidFormatException ifex) {
// do nothing, simply do not return something
}
return null;
}
Once we have that XWPFHyperlinkExtended we can set an new target URI using it's method setTargetURI.
A further problem results from the fact, that the XML object of type CTHyperlink can wrap around multiple text runs, not only one. Then multiple XWPFHyperlinkRun are in one CTHyperlink and point to one target URI. For example this could look like:
... [this is a link to example.com]->https://example.com ...
This results in 6 XWPFHyperlinkRuns in one CTHyperlink linking to https://example.com.
This leads to problems when link text needs to be changed when the link target changes. The text of all the 6 text runs is the link text. So which text run shall be changed?
The best I have found is a method which sets the text of the first text run in the CTHyperlink.
/**
* Sets the text of the first text run in the CTHyperlink of this XWPFHyperlinkRun.
* Tries solving the problem when a CTHyperlink contains multiple text runs.
* Then the String value is set in first text run only. All other text runs are set empty.
*/
/*modifiers*/ void setTextInFirstRun(XWPFHyperlinkRun hyperlinkRun, String value) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHyperlink ctHyperlink = hyperlinkRun.getCTHyperlink();
for (int r = 0; r < ctHyperlink.getRList().size(); r++) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR ctR = ctHyperlink.getRList().get(r);
for (int t = 0; t < ctR.getTList().size(); t++) {
org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText ctText = ctR.getTList().get(t);
if (r == 0 && t == 0) {
ctText.setStringValue(value);
} else {
ctText.setStringValue("");
}
}
}
}
There the String value is set in first text run only. All other text runs are set empty. The text formatting of the first text run remains.

This works, but need more some steps to get text formatting correctly:
try (var fis = new FileInputStream(fileName);
var doc = new XWPFDocument(fis)) {
var pList = doc.getParagraphs();
for (var p : pList) {
var runs = p.getRuns();
for (int i = 0; i < runs.size(); i++) {
var r = runs.get(i);
if (r instanceof XWPFHyperlinkRun) {
var run = (XWPFHyperlinkRun) r;
var link = run.getHyperlink(doc);
// To get text: link for checking
System.out.println(run.getText(0) + ": " + link.getURL());
// how i replace it
var run1 = p.insertNewHyperlinkRun(i, "http://google.com");
run1.setText(run.getText(0));
// remove the old link
p.removeRun(i + 1);
}
}
}
try (var fos = new FileOutputStream(outFileName)) {
doc.write(fos);
}
}
I'm using these libraries:
implementation 'org.apache.poi:poi:5.2.2'
implementation 'org.apache.poi:poi-ooxml:5.2.2'

Text Highlighting with PDFClown without using PDF Annotations

I've started using PDFClown some weeks ago. My purpose is multi-word highlighting, mainly on newspapers. Starting from the org.pdfclown.samples.cli.TextHighlightSample example, I succeeded in extracting multi-word positions and highlighting them. I even solved some problems due to text ordering and matching in most cases.
Unfortunately my framework includes FPDI and it does not consider PDFAnnotations. So, all the content outside of a page content stream, like text annotations and other so called markup annotations, get lost.
So any suggestion on creating "Text Highlighting" with PdfClown and without using PDF annotations?

To not have the highlight in an annotation but instead in the actual page content stream, one has to put the graphic commandos into the page content stream which in case of the org.pdfclown.samples.cli.TextHighlightSample example are implicitly put into the normal annotation appearance stream.
This can be implemented like this:
org.pdfclown.files.File file = new org.pdfclown.files.File(resource);
Pattern pattern = Pattern.compile("S", Pattern.CASE_INSENSITIVE);
TextExtractor textExtractor = new TextExtractor(true, true);
for (final Page page : file.getDocument().getPages())
{
final List<Quad> highlightQuads = new ArrayList<Quad>();
Map<Rectangle2D, List<ITextString>> textStrings = textExtractor.extract(page);
final Matcher matcher = pattern.matcher(TextExtractor.toString(textStrings));
textExtractor.filter(textStrings, new TextExtractor.IIntervalFilter()
{
#Override
public boolean hasNext()
{
return matcher.find();
}
#Override
public Interval<Integer> next()
{
return new Interval<Integer>(matcher.start(), matcher.end());
}
#Override
public void process(Interval<Integer> interval, ITextString match)
{
{
Rectangle2D textBox = null;
for (TextChar textChar : match.getTextChars())
{
Rectangle2D textCharBox = textChar.getBox();
if (textBox == null)
{
textBox = (Rectangle2D) textCharBox.clone();
}
else
{
if (textCharBox.getY() > textBox.getMaxY())
{
highlightQuads.add(Quad.get(textBox));
textBox = (Rectangle2D) textCharBox.clone();
}
else
{
textBox.add(textCharBox);
}
}
}
highlightQuads.add(Quad.get(textBox));
}
}
#Override
public void remove()
{
throw new UnsupportedOperationException();
}
});
// Highlight the text pattern match!
ExtGState defaultExtGState = new ExtGState(file.getDocument());
defaultExtGState.setAlphaShape(false);
defaultExtGState.setBlendMode(Arrays.asList(BlendModeEnum.Multiply));
PrimitiveComposer composer = new PrimitiveComposer(page);
composer.getScanner().moveEnd();
// TODO: reset graphics state here.
composer.applyState(defaultExtGState);
composer.setFillColor(new DeviceRGBColor(1, 1, 0));
{
for (Quad markupBox : highlightQuads)
{
Point2D[] points = markupBox.getPoints();
double markupBoxHeight = points[3].getY() - points[0].getY();
double markupBoxMargin = markupBoxHeight * .25;
composer.drawCurve(new Point2D.Double(points[3].getX(), points[3].getY()),
new Point2D.Double(points[0].getX(), points[0].getY()),
new Point2D.Double(points[3].getX() - markupBoxMargin, points[3].getY() - markupBoxMargin),
new Point2D.Double(points[0].getX() - markupBoxMargin, points[0].getY() + markupBoxMargin));
composer.drawLine(new Point2D.Double(points[1].getX(), points[1].getY()));
composer.drawCurve(new Point2D.Double(points[2].getX(), points[2].getY()),
new Point2D.Double(points[1].getX() + markupBoxMargin, points[1].getY() + markupBoxMargin),
new Point2D.Double(points[2].getX() + markupBoxMargin, points[2].getY() - markupBoxMargin));
composer.fill();
}
}
composer.flush();
}
file.save(new File(RESULT_FOLDER, "multiPage-highlight-content.pdf"), SerializationModeEnum.Incremental);
(HighlightInContent.java method testHighlightInContent)
You will recognize the text extraction frame from the original example. Merely now the quads from a whole page are collected before they are processed, and the processing code (which mostly has been borrowed from TextMarkup.refreshAppearance()) draws forms representing the quads into the page content.
Beware, to make this work generically, the graphics state has to be reset before inserting the new instructions (the position is marked with a TODO comment). This can be done either by applying save/restore state or by actually counteracting unwanted changed state entries. Unfortunately I did not see how to do the former in PDF Clown and have not yet had the time to do the latter.

JTextPane - HTMLDocument: when adding/removing a new style, other attributes also changes

I have a JTextPane (or JEditorPane) in which I want to add some buttons to format text (as shown in the picture).
When I change the selected text to Bold (making a new Style), the font family (and others attributes) also changes. Why? I want to set (or remove) the bold attribute in the selected text and other stays unchanged, as they were.
This is what I'm trying:
private void setBold(boolean flag){
HTMLDocument doc = (HTMLDocument) editorPane.getDocument();
int start = editorPane.getSelectionStart();
int end = editorPane.getSelectedText().length();
StyleContext ss = doc.getStyleSheet();
//check if BoldStyle exists and then add / remove it
Style style = ss.getStyle("BoldStyle");
if(style == null){
style = ss.addStyle("BoldStyle", null);
style.addAttribute(StyleConstants.Bold, true);
} else {
style.addAttribute(StyleConstants.Bold, false);
ss.removeStyle("BoldStyle");
}
doc.setCharacterAttributes(start, end, style, true);
}
But as I explained above, other attributes also change:
Any help will be appreciated. Thanks in advance!
http://oi40.tinypic.com/riuec9.jpg

What you are trying to do can be accomplished with one of the following two lines of code:
new StyledEditorKit.BoldAction().actionPerformed(null);
or
editorPane.getActionMap().get("font-bold").actionPerformed(null);
... where editorPane is an instance of JEditorPane of course.
Both will seamlessly take care of any attributes already defined and supports text selection.
Regarding your code, it does not work with previously styled text because you are overwriting the corresponding attributes with nothing. I mean, you never gather the values for the attributes already set for the current selected text using, say, the getAttributes() method. So, you are effectively resetting them to whatever default the global stylesheet specifies.
The good news is you don't need to worry about all this if you use one of the snippets above. Hope that helps.

I made some minor modifications to your code and it worked here:
private void setBold(boolean flag){
HTMLDocument doc = (HTMLDocument) editorPane.getDocument();
int start = editorPane.getSelectionStart();
int end = editorPane.getSelectionEnd();
if (start == end) {
return;
}
if (start > end) {
int life = start;
start = end;
end = life;
}
StyleContext ss = doc.getStyleSheet();
//check if BoldStyle exists and then add / remove it
Style style = ss.getStyle(editorPane.getSelectedText());
if(style == null){
style = ss.addStyle(editorPane.getSelectedText(), null);
style.addAttribute(StyleConstants.Bold, true);
} else {
style.addAttribute(StyleConstants.Bold, false);
ss.removeStyle(editorPane.getSelectedText());
}
doc.setCharacterAttributes(start, end - start, style, true);
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How to add text outlines to text within Powerpoint via Apache POI: - java

Related

Apache poi PowerPoint slide masters - not generating footer with page number

How do I ADD bullet points to a word document using Apache POI in Java

How to edit a Hyperlink in a Word Document using Apache POI?

Text Highlighting with PDFClown without using PDF Annotations

JTextPane - HTMLDocument: when adding/removing a new style, other attributes also changes

Categories

Resources