I try to extract some text out of a PDF. For that I need to define a rectangle that contains the text.
I noticed that the coordinates seem to mean different things when I compare the coordinates used for text extraction with the coordinates used for drawing.
package MyTest.MyTest;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.PDPageContentStream.*;
import org.apache.pdfbox.text.*;
import java.awt.*;
import java.io.*;
public class MyTest
{
public static void main (String [] args) throws Exception
{
PDDocument pd = PDDocument.load (new File ("my.pdf"));
PDFTextStripperByArea st = new PDFTextStripperByArea ();
PDPage pg = pd.getPage (0);
float h = pg.getMediaBox ().getHeight ();
float w = pg.getMediaBox ().getWidth ();
System.out.println (h + " x " + w + " in internal units");
h = h / 72 * 2.54f * 10;
w = w / 72 * 2.54f * 10;
System.out.println (h + " x " + w + " in mm");
int X = 85;
int Y = 175;
int dX = 250;
int dY = 15;
// extract some text
st.addRegion ("a", new Rectangle (X, Y, dX, dY));
st.extractRegions (pg);
String text = st.getTextForRegion ("a");
System.out.println("text="+text);
// fill a rectangle
PDPageContentStream contents = new PDPageContentStream (pd, pg,AppendMode.APPEND, false);
contents.setNonStrokingColor (Color.RED);
contents.addRect (X, Y, dX, dY);
contents.fill ();
contents.close ();
pd.save ("x.pdf");
}
}
The text I extract (output of text= in the console) is not the text I overdraw with my red rectangle (generated x.pdf).
Why?
To test, use any PDF you already have. To avoid a lot of trial and error when aiming a rectangle at some text, use a file with a lot of text.
There are (at least) two issues in your approach:
Different coordinate systems
You use st.addRegion. Its JavaDoc comment tells us:
/**
* Add a new region to group text by.
*
* @param regionName The name of the region.
* @param rect The rectangle area to retrieve the text from. The y-coordinates are java
* coordinates (y == 0 is top), not PDF coordinates (y == 0 is bottom).
*/
public void addRegion( String regionName, Rectangle2D rect )
(Actually, the whole text extraction apparatus of PDFBox uses its own coordinate system, and there have already been many questions on Stack Overflow about the confusion this causes.)
On the other hand contents.addRect does not use those "java coordinates". Thus, you have to subtract the y coordinate you use in text extraction from the maximum crop box y coordinate to get a coordinate for addRect.
Furthermore, the region rectangles have their anchor point at the top left while the regular PDF rectangles (like the one you define with contents.addRect) have it at the bottom left. Thus, you additionally have to add or subtract the rectangle height from the y coordinate.
Actually, you may have to change the x coordinate, too. It is not mirrored, but there may be a shift: the PDFBox text extraction coordinate system uses x=0 for the left page border, which is not necessarily the case in PDF user space. Thus, you may have to add the left border x coordinate of the crop box to your text extraction x coordinate.
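The conversion described above can be sketched in plain Java without PDFBox. This is only an illustration under assumptions: the class and method names are made up, and cropLeft and cropTop stand for the values pg.getCropBox().getLowerLeftX() and pg.getCropBox().getUpperRightY() would give for your page. The crop box [0 0 595 842] in main is an assumed example, not taken from your file.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper for illustration, not part of PDFBox.
class RegionToUserSpace {

    /**
     * Maps a text-extraction region (origin at the top left, y pointing down)
     * to a PDF user-space rectangle (anchor at the bottom left, y pointing up).
     */
    static Rectangle2D.Float toUserSpace(Rectangle2D region, float cropLeft, float cropTop) {
        // x is not mirrored, only shifted by the left edge of the crop box.
        float x = cropLeft + (float) region.getX();
        // y is mirrored at the top edge of the crop box, then moved down by the
        // rectangle height because the anchor changes from top edge to bottom edge.
        float y = cropTop - (float) region.getY() - (float) region.getHeight();
        return new Rectangle2D.Float(x, y, (float) region.getWidth(), (float) region.getHeight());
    }

    public static void main(String[] args) {
        // Region values from the question, on an assumed crop box of [0 0 595 842].
        Rectangle2D region = new Rectangle2D.Float(85, 175, 250, 15);
        Rectangle2D.Float r = toUserSpace(region, 0f, 842f);
        System.out.println(r.x + " " + r.y + " " + r.width + " " + r.height);
    }
}
```

The resulting x, y, width and height are the values that would then go into contents.addRect.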
Possibly changed coordinate system
In the page content stream the coordinate system may have been changed by applying a transformation to the current transformation matrix. As a result the coordinates in the instructions you append to it may have a different meaning than even outlined above.
To rule out such an effect, you should use a different PDPageContentStream constructor with an additional boolean resetContext parameter:
/**
* Create a new PDPage content stream.
*
* @param document The document the page is part of.
* @param sourcePage The page to write the contents to.
* @param appendContent Indicates whether content will be overwritten, appended or prepended.
* @param compress Tell if the content stream should compress the page contents.
* @param resetContext Tell if the graphic context should be reset. This is only relevant when
* the appendContent parameter is set to {@link AppendMode#APPEND}. You should use this when
* appending to an existing stream, because the existing stream may have changed graphic
* properties (e.g. scaling, rotation).
* @throws IOException If there is an error writing to the page contents.
*/
public PDPageContentStream(PDDocument document, PDPage sourcePage, AppendMode appendContent,
boolean compress, boolean resetContext) throws IOException
I.e. replace
PDPageContentStream contents = new PDPageContentStream (pd, pg,AppendMode.APPEND, false);
by
PDPageContentStream contents = new PDPageContentStream (pd, pg,AppendMode.APPEND, false, true);
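To see why a leftover transformation matters, here is a small stand-alone sketch that uses java.awt.geom.AffineTransform as a stand-in for the current transformation matrix. The scale factor 0.5 is an invented example of what existing page content might leave behind; it is not taken from your PDF, and the class name is made up.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical demo class for illustration, independent of PDFBox.
class CtmEffect {

    /** Returns where the point (x, y) ends up on the page under the given matrix. */
    static Point2D apply(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Assumed leftover: earlier content scaled everything by 0.5 and never undid it.
        AffineTransform leftover = AffineTransform.getScaleInstance(0.5, 0.5);
        // After a reset the matrix is the identity again.
        AffineTransform reset = new AffineTransform();

        // The same coordinates land in different places on the page.
        System.out.println("without reset: " + apply(leftover, 85, 652));
        System.out.println("with reset:    " + apply(reset, 85, 652));
    }
}
```

With the identity matrix the coordinates you pass to addRect are the coordinates on the page, which is what the resetContext flag is meant to give you.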
I am trying to add tiled diagonal watermarks to the pdf, but it seems that pattern fills in iText are always tiled from the bottom left of the page, meaning that the tiles at the top and right side of the page can be cut abruptly. Is there an option to tile from the top left or with an offset instead?
Here is a sample of the code:
List<String> watermarkLines = getWatermarkLines();
Rectangle watermarkRect = getWatermarkRect();
PdfContentByte over = stamper.getOverContent(1);
PdfPatternPainter painter = over.createPattern(watermarkRect.getWidth(), watermarkRect.getHeight());
for (int x = 0; x < watermarkLines.size(); x++) {
AffineTransform trans = getWatermarkTransform(watermarkLines, x);
ColumnText.showTextAligned(painter, 0, watermarkLines.get(x), (float) trans.getTranslateX(), (float) trans.getTranslateY(), 45f);
}
over.setColorFill(new PatternColor(painter));
over.rectangle(0, 0, pageSize.getWidth(), pageSize.getHeight());
over.fill();
I tried changing the x and y of the rectangle function to negative or positive values, but it seems that the watermark is still stamped in the pattern as if it was tiled from the bottom left, cutting it in the same place as before.
First off, I cannot fathom which iText version you are using,
List<String> watermarkLines = getWatermarkLines();
...
ColumnText.showTextAligned(painter, 0, watermarkLines.get(x), (float) trans.getTranslateX(), (float) trans.getTranslateY(), 45f);
implies that the third parameter of the ColumnText.showTextAligned method you use is typed as String or Object. The iText 5 version I have at hand, though, requires a Phrase there. Below I'll show how to apply an offset with the current iText 5.5.13. You'll have to check whether it also works for your version.
Yes, you can apply an offset... in the pattern definition!
If instead of
PdfPatternPainter painter = over.createPattern(watermarkRect.getWidth(), watermarkRect.getHeight());
you create the pattern like this
PdfPatternPainter painter = over.createPattern(2 * watermarkRect.getWidth(), 2 * watermarkRect.getHeight(),
watermarkRect.getWidth(), watermarkRect.getHeight());
you have the same step size of pattern application (watermarkRect.getWidth(), watermarkRect.getHeight()) but a canvas twice that width and twice that height to position your text on. By positioning the text with an offset, you effectively move the whole pattern by that offset.
E.g. if you calculate the offsets as
Rectangle pageSize = pdfReader.getCropBox(1);
float xOff = pageSize.getLeft();
float yOff = pageSize.getBottom() + ((int)pageSize.getHeight()) % ((int)watermarkRect.getHeight());
and draw the text using
ColumnText.showTextAligned(painter, 0, new Phrase(watermarkLines.get(x)), (float) trans.getTranslateX() + xOff, (float) trans.getTranslateY() + yOff, 45f);
the pattern should fill the page as if starting at the top left corner of the visible page.
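The vertical offset above is just the remainder of the page height divided by the tile height, measured from the bottom of the crop box. A small stand-alone sketch of that arithmetic, assuming a crop box of [0 0 595 842] and a tile height of 50 (both are example values, and the class and method names are made up):

```java
// Hypothetical helper for illustration, independent of iText.
class PatternOffset {

    /** Offset that makes a tile boundary coincide with the top edge of the crop box. */
    static float yOffset(float cropBottom, float cropHeight, float tileHeight) {
        // Same integer remainder as in the snippet above.
        return cropBottom + ((int) cropHeight) % ((int) tileHeight);
    }

    public static void main(String[] args) {
        float off = yOffset(0f, 842f, 50f);
        // Distance from the shifted grid origin to the top edge, in whole tiles.
        float tilesToTop = (842f - off) / 50f;
        System.out.println("offset=" + off + ", tiles up to the top edge=" + tilesToTop);
    }
}
```

Because the remaining distance to the top edge is a whole number of tiles, the cut-off tiles end up at the bottom of the page instead of the top.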
You haven't supplied getWatermarkLines, getWatermarkRect, and getWatermarkTransform. If I use
static AffineTransform getWatermarkTransform(List<String> watermarkLines, int x) {
return AffineTransform.getTranslateInstance(6 + 15*x, 6);
}
static Rectangle getWatermarkRect() {
return new Rectangle(65, 50);
}
static List<String> getWatermarkLines() {
return Arrays.asList("Test line 1", "Test line 2");
}
your original code for me creates a top left corner like this
and the code with the above offset creates one like this
I'm trying to center the text vertically inside a rect but it's always off by a little bit.
The font is Helvetica at size 12. I add a padding of 6 points above and below the text, so the rect is 24 points high.
The code used to write the cells is below and the image shows the cell uncentered vertically.
public void drawCell(PDPageContentStream owningStream, float xOffset, float yOffset) throws IOException {
float cellHeightSpacing = fontSize / 2;
float height = yOffset - fontSize - cellHeightSpacing;
if (isContentLargerThanCell()) {
if (maxLines < 2)
return;
} else {
float x = xOffset+getAlignedX(" "+content+" ");
drawContent(owningStream," "+content+" ",x,height);
}
drawCellBoundaries(owningStream, xOffset, yOffset - 2 * fontSize, 2 * fontSize);
}
private void drawCellBoundaries(PDPageContentStream owniContentStream, float X, float startHeight, float sizeHeight) throws IOException {
owniContentStream.addRect(X, startHeight, width, sizeHeight);
owniContentStream.stroke();
}
You actually have two issues to cope with:
For a given font size fs, hardly any letter actually has a height of fs, usually short sequences of letters don't either.
Your code assumes that it has to vertically center content of height fs but you use capital letters without any part beneath the base line, so their height is considerably less than fs.
The y coordinate you use for drawing text is the height of the base line, not the height of the bottom of all text.
E.g. look at this letter
If you draw this letter at some coordinates x,y, its descender will be drawn even below your y height while your code assumes for centering that the whole letter is located between y and y + fs.
The former problem most likely will have to remain. If you vertically center for the exact appearance of the letters, neighboring cells might have jumping base lines which will look worse than a certain degree of being off-center.
Your main problem is the latter one. You can solve it by raising the y coordinate at which you draw the text (or lowering the boundary rectangle) by the absolute value of the font's maximum descent, scaled to the font size. Font metrics are typically given in units of 1/1000 of the font size, so the shift is fs * |descent| / 1000.
You can retrieve the font descent from the font's font descriptor (PDFontDescriptor.getDescent()) or the font's bounding box (PDFont.getBoundingBox()).
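As a rough sketch of that correction in plain Java: the helper below computes the baseline y for a cell, assuming the descent is given in 1/1000 units as a negative number. The class and method names are made up, and the value -207 used in main is my assumption for Helvetica's descender; in real code you would read it from the font descriptor instead.

```java
// Hypothetical helper for illustration, independent of PDFBox.
class BaselineCentering {

    /**
     * Returns the baseline y so that a text box of height fontSize, including
     * the part below the baseline, is vertically centered in the cell.
     */
    static float baseline(float cellBottom, float cellHeight, float fontSize, float descentPer1000) {
        // Descent scaled from 1/1000 units to the actual font size.
        float descent = Math.abs(descentPer1000) / 1000f * fontSize;
        // Equal padding above and below the text box.
        float padding = (cellHeight - fontSize) / 2f;
        // Lift the baseline by the descent so descenders stay inside the text box.
        return cellBottom + padding + descent;
    }

    public static void main(String[] args) {
        // 24 point cell, 12 point font, assumed descender of -207.
        System.out.println(baseline(0f, 24f, 12f, -207f));
    }
}
```

Without the descent term the baseline would sit exactly on the padding, and every descender would hang into the bottom padding.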
I have a PDF coordinate (x, y) as input and need to draw a string at that coordinate, e.g. (x, y) = (200, 250). I am using PDFBox. When I use moveTextPositionByAmount as shown below, the text does not end up at the exact position. I have also tried moveTo(). How can I draw the string at an exact position?
PDPageContentStream contentStream = new PDPageContentStream(document, page,true,true);
contentStream.beginText();
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12);
contentStream.moveTextPositionByAmount(xindex, yindex);
contentStream.setNonStrokingColor(color);
contentStream.drawString(comment);
contentStream.stroke();
contentStream.endText();
Thanks.
Getting rid of graphic state changes from the existing page content
You use the PDPageContentStream constructor with two boolean arguments:
new PDPageContentStream(document, page,true,true);
This constructor is implemented as:
this(document, sourcePage, appendContent, compress, false);
i.e. it calls the constructor with three boolean arguments using false for the final one. This final boolean argument is documented as:
* @param resetContext Tell if the graphic context should be reseted.
Thus, you append to the page content without resetting the graphic context. This means that any changes to the current transformation matrix made in the existing page content still transform your coordinates. To prevent that from happening you should use the PDPageContentStream constructor with three boolean arguments:
new PDPageContentStream(document, page, true, true, true);
Using this one can easily position text.
Drawing rectangles and text
The OP mentioned that he was successful drawing rectangles but not drawing text.
The following code
PDPage firstPage = allPages.get(0);
PDRectangle pageSize = firstPage.findMediaBox();
float x = 121;
float y = 305;
float w = 262;
float h = 104;
PDPageContentStream contentStream = new PDPageContentStream(document, firstPage, true, true, true);
contentStream.setNonStrokingColor(Color.yellow);
contentStream.fillRect(pageSize.getLowerLeftX() + x, pageSize.getLowerLeftY() + y, w, h);
contentStream.beginText();
contentStream.moveTextPositionByAmount(pageSize.getLowerLeftX() + x, pageSize.getLowerLeftY() + y);
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12);
contentStream.setNonStrokingColor(Color.red);
contentStream.drawString("My Text Here");
contentStream.endText();
contentStream.close();
results in
as would be expected.
Meaning of input coordinates must be explained
The OP also mentioned X:-121,Y:-305,W:-262,h:-104 as coordinates from external application in his comments.
As PDFs most often have positive coordinates inside the media box, these X and Y coordinates make no sense for PDFs in general.
Furthermore the OP was unable to share the document.
Therefore, it could not be found out whether or not those negative coordinates make sense for his special PDF.
Additionally negative values for widths and height are accepted by the rectangle drawing operations, but if used for text, they might imply that the Y coordinate does not denote the baseline, or that the text is not expected to start at X but to end there, or that the text shall be mirrored, or, or, or...
Thus, the meaning of those negative coordinates and dimensions must first be explained. Which is the origin of those coordinates, are the positive y coordinates above or below, is X,Y the lower left of the rectangle, what is the meaning of a negative width or height, where in relation to X, Y shall the string be drawn?
I found that this one worked for me.
PDPageContentStream contentStream = new PDPageContentStream(document, page,true,true);
contentStream.beginText();
contentStream.setFont(PDType1Font.HELVETICA_BOLD, 12);
contentStream.moveTextPositionByAmount(xindex, yindex);
contentStream.setNonStrokingColor(color);
contentStream.drawString(comment);
contentStream.endText();
I'm currently writing a new Karaoke-FX-Generator using Java. Now I have a problem with the implementation of the TextExtents-Function: It returns the wrong string bounds for the Subtitle file.
Here's an example:
The red rectangle represents the bounds of the string calculated by my program while the red background are the bounds calculated by xy-vsfilter.
Does anyone know how to fix that? I have been trying for several hours and still haven't gotten any further.
This is the current implementation of the function.
/**
* Calculates the text-extents for the given text in the specified
* style.
* @param style The style
* @param text The text
* @return The extents of the text.
*/
public TextExtents getTextExtents(AssStyle style, String text) {
// Reads the font object from the cache.
Font font = this.getFont(style.getFontname(), style.isBold(), style.isItalic());
// If the font is unknown, return null.
if (font == null)
return null;
// Add the font size. (Note: FONT_SIZE_SCALE is 64)
font = font.deriveFont((float) style.getFontsize() * FONT_SIZE_SCALE);
// Returns other values like ascend, descend and ext-lead.
LineMetrics metrics = font.getLineMetrics(text, this.ctx);
// Calculate String bounds.
Rectangle2D rSize = font.getStringBounds(text, this.ctx);
// Returns the text-extents.
return new TextExtents(
rSize.getWidth() / FONT_SIZE_SCALE,
rSize.getHeight() / FONT_SIZE_SCALE,
metrics.getAscent() / FONT_SIZE_SCALE,
metrics.getDescent() / FONT_SIZE_SCALE,
metrics.getLeading() / FONT_SIZE_SCALE
);
}
I partially solved the problem. LOGFONT.lfHeight and Java use different units for font sizes, so I had to convert Java's font size to "logical" units.
// I used this code to convert from pixel-size to "logical units"
float fontSize = 72F / SCREEN_DPI; // SCREEN_DPI = 96
Now I only have small differences.
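The factor above looks like the usual relation between points (72 per inch) and pixels (SCREEN_DPI per inch). As a sketch under that assumption, and with made-up class and method names, the conversion in both directions would be:

```java
// Hypothetical helper for illustration; assumes 72 points per inch.
class FontUnits {

    /** Converts a size in pixels to points for a screen with the given DPI. */
    static float pixelsToPoints(float pixels, float dpi) {
        return pixels * 72f / dpi;
    }

    /** Converts a size in points to pixels for a screen with the given DPI. */
    static float pointsToPixels(float points, float dpi) {
        return points * dpi / 72f;
    }

    public static void main(String[] args) {
        float screenDpi = 96f; // same value as SCREEN_DPI above
        System.out.println(pixelsToPoints(64f, screenDpi) + " pt");
        System.out.println(pointsToPixels(12f, screenDpi) + " px");
    }
}
```

Whether the remaining small differences come from rounding or from the renderer's own font metrics is something this sketch cannot answer.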
I want to draw text of a certain width on a canvas using drawText.
For example, the width of the text should always be 400px no matter what the input text is.
If the input text is longer, the font size should decrease; if it is shorter, the font size should increase accordingly.
Here's a much more efficient method:
/**
* Sets the text size for a Paint object so a given string of text will be a
* given width.
*
* @param paint
* the Paint to set the text size for
* @param desiredWidth
* the desired width
* @param text
* the text that should be that width
*/
private static void setTextSizeForWidth(Paint paint, float desiredWidth,
String text) {
// Pick a reasonably large value for the test. Larger values produce
// more accurate results, but may cause problems with hardware
// acceleration. But there are workarounds for that, too; refer to
// http://stackoverflow.com/questions/6253528/font-size-too-large-to-fit-in-cache
final float testTextSize = 48f;
// Get the bounds of the text, using our testTextSize.
paint.setTextSize(testTextSize);
Rect bounds = new Rect();
paint.getTextBounds(text, 0, text.length(), bounds);
// Calculate the desired size as a proportion of our testTextSize.
float desiredTextSize = testTextSize * desiredWidth / bounds.width();
// Set the paint for that size.
paint.setTextSize(desiredTextSize);
}
Then, all you need to do is setTextSizeForWidth(paint, 400, str); (400 being the example width in the question).
For even greater efficiency, you can make the Rect a static class member, saving it from being instantiated each time. However, this may introduce concurrency issues, and would arguably hinder code clarity.
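The method relies on the text width growing in proportion to the text size. Here is a stand-alone sketch of just that arithmetic, with the measuring step abstracted behind a function so it runs without Android. The names are made up, and the "measurer" in main is a fake that treats every glyph as half the text size wide; a real Paint will only be approximately proportional.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration, independent of android.graphics.Paint.
class ProportionalTextSize {

    /**
     * Measures the width once at testSize and scales the size linearly
     * so that the width becomes desiredWidth.
     */
    static double sizeForWidth(DoubleUnaryOperator widthAtSize, double testSize, double desiredWidth) {
        double measured = widthAtSize.applyAsDouble(testSize);
        return testSize * desiredWidth / measured;
    }

    public static void main(String[] args) {
        // Fake measurer: 5 glyphs, each half the text size wide.
        DoubleUnaryOperator fake = size -> 5 * 0.5 * size;
        double size = sizeForWidth(fake, 48, 400);
        System.out.println("size=" + size + ", width at that size=" + fake.applyAsDouble(size));
    }
}
```

One measurement is enough here, which is where the efficiency compared to a loop comes from.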
Try this:
/**
* Retrieve the maximum text size to fit in a given width.
* @param str (String): Text to check for size.
* @param maxWidth (float): Maximum allowed width.
* @return (int): The desired text size.
*/
private int determineMaxTextSize(String str, float maxWidth)
{
int size = 0;
Paint paint = new Paint();
do {
paint.setTextSize(++ size);
} while(paint.measureText(str) <= maxWidth);
// The loop stops at the first size that no longer fits, so step back by one.
return size - 1;
} //End determineMaxTextSize()
Michael Scheper's solution seems nice, but it didn't work for me. I needed the largest text size that can be drawn in my view, and that approach depends on the test text size you start with: each different starting size gives a different result, so it is not the right answer in every situation.
So I tried another way:
private float calculateMaxTextSize(String text, Paint paint, int maxWidth, int maxHeight) {
if (text == null || paint == null) return 0;
Rect bound = new Rect();
float size = 1.0f;
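float step= 1.0f;
// Start measuring at the initial size rather than whatever size the paint had before.
paint.setTextSize(size);
while (true) {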
paint.getTextBounds(text, 0, text.length(), bound);
if (bound.width() < maxWidth && bound.height() < maxHeight) {
size += step;
paint.setTextSize(size);
} else {
return size - step;
}
}
}
It's simple: I increase the text size until the text bounds no longer fit within maxWidth and maxHeight. To reduce the number of loop iterations, change step to a bigger value (trading accuracy for speed). Maybe it's not the best way to achieve this, but it works.
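If the number of iterations becomes a concern, one alternative to a bigger step is a binary search over the size. This is a different technique from the loop above, shown only as a sketch: it assumes the measured width never shrinks when the size grows, the names are made up, and the measurer in main is a fake linear one rather than a real Paint.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration, independent of android.graphics.Paint.
class MaxTextSizeSearch {

    /**
     * Binary search for the largest size in [lo, hi] whose width still fits.
     * Assumes widthAtSize is non-decreasing and that lo itself fits.
     */
    static double largestFittingSize(DoubleUnaryOperator widthAtSize, double maxWidth,
                                     double lo, double hi, double tolerance) {
        while (hi - lo > tolerance) {
            double mid = (lo + hi) / 2;
            if (widthAtSize.applyAsDouble(mid) <= maxWidth) {
                lo = mid; // mid fits, search the larger half
            } else {
                hi = mid; // mid is too wide, search the smaller half
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Fake measurer: width is 2.5 times the text size.
        DoubleUnaryOperator fake = size -> 2.5 * size;
        System.out.println(largestFittingSize(fake, 400, 1, 1000, 0.01));
    }
}
```

The number of measurements grows with the logarithm of the search range, so a fine tolerance stays cheap. A height limit could be handled the same way by letting the function return whichever dimension is more restrictive.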