Getting null from PDPage#getMediaBox()

I need to use the MediaBox to get the coordinates of a page in a PDF, but for some PDFs I get null and for others I get the regular MediaBox.
Why does this happen? How does the method work?
private void addPDF(File pdf) throws IOException, InterruptedException {
waiting_label.setText("");
pdf_name.setText(pdf.getName());
all_my_p = new ArrayList<>();
System.out.println("prova.JPanelImageAndButton.addPDF()");
/*pddoc = null;
cosdoc = null;*/
PDFParser parser = new PDFParser(new FileInputStream(pdf));
parser.parse();
cosdoc = parser.getDocument();
pddoc = new PDDocument(cosdoc);
List<PDPage> list = pddoc.getDocumentCatalog().getAllPages();
pdf_name.setText(pdf.getName());
if (my_p != null) {
remove(my_p);
}
JFrame top = (JFrame) SwingUtilities.getWindowAncestor(this);
Dimension d = new Dimension(top.getWidth(), top.getHeight() - p.getHeight());
for (int i = 0; i < n_page; i++) {
PDPage pdp=list.get(i);
System.out.println("prova.JPanelImageAndButton.addPDF()"+pdp.getMediaBox());
final MyPanelFrame t = new MyPanelFrame(pdf.getName() + "_temp" + (i + 1) + ".png", pdp);
t.setPreferredSize(d);
t.setBounds(new Rectangle(10, 30, top.getWidth(), top.getHeight()));
t.addHierarchyBoundsListener(new HierarchyBoundsListener() {
@Override
public void ancestorMoved(HierarchyEvent e) {
}
@Override
public void ancestorResized(HierarchyEvent e) {
t.setPreferredSize(new Dimension(top.getWidth(), top.getHeight() - p.getHeight()));
t.setBounds(new Rectangle(10, 30, top.getWidth(), top.getWidth()));
top.revalidate();
}
});
all_my_p.add(t);
}
my_p = all_my_p.get(0);
add(my_p);
top.setSize(top.getWidth() + 1, top.getHeight() + 1);
top.revalidate();
top.setSize(top.getWidth() - 1, top.getHeight() - 1);
top.revalidate();
top.setExtendedState(JFrame.MAXIMIZED_BOTH);
label_load.setText("");
label_save.setText("");
activityDone = true;
//pddoc.close();
//cosdoc.close();
}
This is an example, but for the same pdf I get null everywhere I use getMediaBox().

You seem to use a 1.x.x version of PDFBox. For these versions the observed behavior is to be expected, cf. the JavaDocs of the method:
/**
* A rectangle, expressed
* in default user space units, defining the boundaries of the physical
* medium on which the page is intended to be displayed or printed
*
* This will get the MediaBox at this page and not look up the hierarchy.
* This attribute is inheritable, and findMediaBox() should probably used.
* This will return null if no MediaBox are available at this level.
*
* @return The MediaBox at this level in the hierarchy.
*/
public PDRectangle getMediaBox()
This comment also presents the solution: use findMediaBox() instead.
/**
* This will find the MediaBox for this page by looking up the hierarchy until
* it finds them.
*
* @return The MediaBox at this level in the hierarchy.
*/
public PDRectangle findMediaBox()
If you plan to switch to PDFBox 2.0.0, note that the behavior of getMediaBox has changed: it already walks the hierarchy if necessary, and there is no findMediaBox anymore.
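To make the difference between the two lookups concrete, here is a minimal sketch that uses only the JDK. PageNode is a hypothetical stand-in for a node in a PDF page tree, not a PDFBox class; its two methods only imitate the behavior the JavaDocs above describe.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a node in a PDF page tree (not a PDFBox class). */
final class PageNode {
    private final PageNode parent;   // null for the root of the page tree
    private final float[] mediaBox;  // null when this node defines no MediaBox itself

    PageNode(PageNode parent, float[] mediaBox) {
        this.parent = parent;
        this.mediaBox = mediaBox;
    }

    /** Imitates the 1.x getMediaBox(): looks at this level only, so it may return null. */
    float[] getMediaBox() {
        return mediaBox;
    }

    /** Imitates the 1.x findMediaBox(): walks up the parents until a MediaBox is found. */
    float[] findMediaBox() {
        for (PageNode node = this; node != null; node = node.parent) {
            if (node.mediaBox != null) {
                return node.mediaBox;
            }
        }
        return null; // no MediaBox anywhere in the hierarchy
    }
}

class MediaBoxDemo {
    public static void main(String[] args) {
        // The box is set on the parent node; the page itself inherits it.
        PageNode pages = new PageNode(null, new float[] {0, 0, 595, 842});
        PageNode page = new PageNode(pages, null);

        System.out.println(Arrays.toString(page.getMediaBox()));  // null
        System.out.println(Arrays.toString(page.findMediaBox())); // [0.0, 0.0, 595.0, 842.0]
    }
}
```

A PDF whose pages inherit their MediaBox from a parent node corresponds to the second page in this sketch, which is why only some files show the null.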

itext 7 pdf how to prevent text overflow on right side of the page

I am using itextpdf 7 (7.2.0) to create a pdf file. However even though the TOC part is rendered very well, in the content part the text overflows. Here is my code that generates the pdf:
public class Main {
public static void main(String[] args) throws IOException {
PdfWriter writer = new PdfWriter("fiftyfourthPdf.pdf");
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf, PageSize.A4,false);
//document.setMargins(30,10,36,10);
// Create a PdfFont
PdfFont font = PdfFontFactory.createFont(StandardFonts.TIMES_ROMAN,"Cp1254");
document
.setTextAlignment(TextAlignment.JUSTIFIED)
.setFont(font)
.setFontSize(11);
PdfOutline outline = null;
java.util.List<AbstractMap.SimpleEntry<String, AbstractMap.SimpleEntry<String, Integer>>> toc = new ArrayList<>();
for(int i=0;i<5000;i++){
String line = "This is paragraph " + String.valueOf(i+1)+ " ";
line = line.concat(line).concat(line).concat(line).concat(line).concat(line);
Paragraph p = new Paragraph(line);
p.setKeepTogether(true);
document.add(p.setFont(font).setFontSize(10).setHorizontalAlignment(HorizontalAlignment.CENTER).setTextAlignment(TextAlignment.LEFT));
//PROCESS FOR TOC
String name = "para " + String.valueOf(i+1);
outline = createOutline(outline,pdf,line ,name );
AbstractMap.SimpleEntry<String, Integer> titlePage = new AbstractMap.SimpleEntry(line, pdf.getNumberOfPages());
p
.setFont(font)
.setFontSize(12)
//.setKeepWithNext(true)
.setDestination(name)
// Add the current page number to the table of contents list
.setNextRenderer(new UpdatePageRenderer(p));
toc.add(new AbstractMap.SimpleEntry(name, titlePage));
}
int contentPageNumber = pdf.getNumberOfPages();
for (int i = 1; i <= contentPageNumber; i++) {
// Write aligned text to the specified by parameters point
document.showTextAligned(new Paragraph(String.format("Sayfa %s / %s", i, contentPageNumber)).setFontSize(10),
559, 26, i, TextAlignment.RIGHT, VerticalAlignment.MIDDLE, 0);
}
//BEGINNING OF TOC
document.add(new AreaBreak());
Paragraph p = new Paragraph("Table of Contents")
.setFont(font)
.setDestination("toc");
document.add(p);
java.util.List<TabStop> tabStops = new ArrayList<>();
tabStops.add(new TabStop(580, TabAlignment.RIGHT, new DottedLine()));
for (AbstractMap.SimpleEntry<String, AbstractMap.SimpleEntry<String, Integer>> entry : toc) {
AbstractMap.SimpleEntry<String, Integer> text = entry.getValue();
p = new Paragraph()
.addTabStops(tabStops)
.add(text.getKey())
.add(new Tab())
.add(String.valueOf(text.getValue()))
.setAction(PdfAction.createGoTo(entry.getKey()));
document.add(p);
}
// Move the table of contents to the first page
int tocPageNumber = pdf.getNumberOfPages();
for (int i = 1; i <= tocPageNumber; i++) {
// Write aligned text to the specified by parameters point
document.showTextAligned(new Paragraph("\n footer text\n second line\nthird line").setFontColor(ColorConstants.RED).setFontSize(8),
300, 26, i, TextAlignment.CENTER, VerticalAlignment.MIDDLE, 0);
}
document.flush();
for(int z = 0; z< (tocPageNumber - contentPageNumber ); z++){
pdf.movePage(tocPageNumber,1);
pdf.getPage(1).setPageLabel(PageLabelNumberingStyle.UPPERCASE_LETTERS,
null, 1);
}
//pdf.movePage(tocPageNumber, 1);
// Add page labels
/*pdf.getPage(1).setPageLabel(PageLabelNumberingStyle.UPPERCASE_LETTERS,
null, 1);*/
pdf.getPage(tocPageNumber - contentPageNumber + 1).setPageLabel(PageLabelNumberingStyle.DECIMAL_ARABIC_NUMERALS,
null, 1);
document.close();
}
private static PdfOutline createOutline(PdfOutline outline, PdfDocument pdf, String title, String name) {
if (outline == null) {
outline = pdf.getOutlines(false);
outline = outline.addOutline(title);
outline.addDestination(PdfDestination.makeDestination(new PdfString(name)));
} else {
PdfOutline kid = outline.addOutline(title);
kid.addDestination(PdfDestination.makeDestination(new PdfString(name)));
}
return outline;
}
private static class UpdatePageRenderer extends ParagraphRenderer {
protected AbstractMap.SimpleEntry<String, Integer> entry;
public UpdatePageRenderer(Paragraph modelElement, AbstractMap.SimpleEntry<String, Integer> entry) {
super(modelElement);
this.entry = entry;
}
public UpdatePageRenderer(Paragraph modelElement) {
super(modelElement);
}
@Override
public LayoutResult layout(LayoutContext layoutContext) {
LayoutResult result = super.layout(layoutContext);
//entry.setValue(layoutContext.getArea().getPageNumber());
if (result.getStatus() != LayoutResult.FULL) {
if (null != result.getOverflowRenderer()) {
result.getOverflowRenderer().setProperty(
Property.LEADING,
result.getOverflowRenderer().getModelElement().getDefaultProperty(Property.LEADING));
} else {
// if overflow renderer is null, that could mean that the whole renderer will overflow
setProperty(
Property.LEADING,
result.getOverflowRenderer().getModelElement().getDefaultProperty(Property.LEADING));
}
}
return result;
}
@Override
// If not overridden, the default renderer will be used for the overflown part of the corresponding paragraph
public IRenderer getNextRenderer() {
return new UpdatePageRenderer((Paragraph) this.getModelElement());
}
}
}
Here are the screen shots of TOC part and content part :
TOC :
Content :
What am I missing? Thank you all for your help.
UPDATE
When I add the line below, it renders with no overflow, but the page margins of the TOC and the content part differ (the TOC margin is much larger than the content margin). Please see the attached picture:
document.setMargins(30,60,36,20);
Right Margin difference between TOC and content:
UPDATE 2 :
When I comment the line
document.setMargins(30,60,36,20);
and set the font size on line :
document.add(p.setFont(font).setFontSize(10).setHorizontalAlignment(HorizontalAlignment.CENTER).setTextAlignment(TextAlignment.LEFT));
to 12, then it renders fine. What difference could the font size possibly make to the page content and margins? Aren't there standard page margins and page setups? Am I unknowingly (I am new to itextpdf) breaking some standard behavior?

TL;DR: either remove setFontSize in
p
.setFont(font)
.setFontSize(12)
//.setKeepWithNext(true)
.setDestination(name)
or change setFontSize(10) -> setFontSize(12) in
document.add(p.setFont(font).setFontSize(10).setHorizontalAlignment(HorizontalAlignment.CENTER).setTextAlignment(TextAlignment.LEFT));
Explanation: With the following line, you tell the Document not to flush elements immediately when they are added:
Document document = new Document(pdf, PageSize.A4,false);
Then you add a paragraph element with a font size of 10 to the document with the following line:
document.add(p.setFont(font).setFontSize(10).setHorizontalAlignment(HorizontalAlignment.CENTER).setTextAlignment(TextAlignment.LEFT));
What happens is that the element is laid out (split into lines, etc.) but not yet drawn on the page. Then you call .setFontSize(12), and this new font size is applied for drawing only. So iText calculated that X characters would fit into one line assuming a font size of 10, while in reality the font size is 12 and fewer characters fit into one line.
There is no sense in setting the font size twice to different values. Just pick the value you want to see in the resulting document and set it once.
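The size of the resulting overflow can be estimated with simple arithmetic, since glyph widths scale linearly with the font size. The sketch below is only an illustration: the A4 width of 595 points and the 36 point left and right margins are assumptions, and drawnWidth is a hypothetical helper, not an iText method.

```java
final class FontSizeOverflow {
    /** Width a line occupies when it was laid out at one font size but drawn at another. */
    static double drawnWidth(double laidOutWidth, double layoutSize, double drawSize) {
        // Glyph widths scale linearly with the font size.
        return laidOutWidth * drawSize / layoutSize;
    }

    public static void main(String[] args) {
        double pageWidth = 595;                      // assumed A4 width in points
        double margin = 36;                          // assumed left and right margin in points
        double available = pageWidth - 2 * margin;   // width the layout step may fill

        // A line that filled the available width at size 10, then drawn at size 12.
        double drawn = drawnWidth(available, 10, 12);
        System.out.println("available = " + available);
        System.out.println("drawn     = " + drawn);
        System.out.println("overflow  = " + (drawn - available));
    }
}
```

Under these assumptions a completely filled line grows by 20 percent when drawn, which is enough to push text past the right margin and toward the page edge.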

Java code to compare excel sheets does not work for larger files

I recently wrote a Java project that compares Excel sheets in two different folders and writes the results to a summary folder created in the source directories. All of the code works fine except for files with more than 10000 rows: for these larger files it just creates an empty sheet instead of listing the mismatches. Here is the code I used. Please help me out.
package com.validation.comparators;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.bson.Document;
/**
* The utility class SheetComparator
*/
public class SheetComparator {
private SheetComparator() {
// The utility class
}
/**
* Compares the document equivalent of two sheets
*
* @param document1
* The document 1
* @param document2
* The document 2
* @return The compared output
*/
@SuppressWarnings("unchecked")
public static Document compare(Document document1, Document document2) {
List<String> headers = (List<String>) document1.get("headers");
List<Document> sheet1Rows = (List<Document>) document1.get("data");
List<Document> sheet2Rows = (List<Document>) document2.get("data");
List<Document> temp;
List<Document> comparedOutput = new ArrayList<>();
if (sheet1Rows.size() < sheet2Rows.size()) {
temp = sheet1Rows;
sheet1Rows = sheet2Rows;
sheet2Rows = temp;
}
int length = sheet1Rows.size();
int length2 = sheet2Rows.size();
for (int i = 0; i < length2; i++) {
Document sheet1Row = sheet1Rows.get(i);
Document sheet2Row = sheet2Rows.get(i);
Document comparedRow = new Document("row number",
new Document("value", sheet1Row.getString("row number")).append("color", "WHITE"));
Boolean completeMatch = true;
for (String header : headers) {
Boolean isNull = false;
String value1 = sheet1Row.getString(header).trim();
String value2 = sheet2Row.getString(header).trim();
if (StringUtils.isAnyBlank(value1, value2)) {
completeMatch = false;
isNull = true;
} else if (!StringUtils.equals(value1, value2)) {
completeMatch = false;
}
if (isNull) {
comparedRow.append(header, new Document("value", StringUtils.isBlank(value1) ? value2 : value1)
.append("color", "RED"));
} else {
comparedRow.append(header, new Document("value", value1).append("color", "WHITE"));
}
}
if (!completeMatch) {
comparedOutput.add(comparedRow);
}
}
for (int i = length2; i < length; i++) {
Document row = sheet1Rows.get(i);
Document comparedRow = new Document();
for (String header : headers) {
String value = row.getString(header);
comparedRow.put(header, new Document("value", value).append("color", "RED"));
}
comparedRow.append("row number",
new Document("value", row.getString("row number")).append("color", "WHITE"));
comparedOutput.add(comparedRow);
}
headers.add(0, "row number");
return new Document("data", comparedOutput).append("headers", headers);
}
}

First try raising the JVM's maximum heap size. The problem is that you have to read the entire DOM object hierarchy into memory.
Otherwise you only need to read the documents sequentially: by sheet, by row, by cell.
So instead of holding a DOM object in memory, you could:
Write each document to a file as a sequential stream (a text document).
Convert both Excel files to text.
Read the two streams and diff them token by token.
You can probably write the diff report immediately.
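The sequential approach above could look roughly like the following sketch. It assumes both sheets have already been exported to text with one row per line (that export step is not shown), and it compares whole lines rather than individual cells. StreamingDiff and its report format are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class StreamingDiff {
    /** Compares two text streams line by line and returns one message per mismatching row. */
    static List<String> diff(Reader left, Reader right) {
        List<String> report = new ArrayList<>();
        try (BufferedReader a = new BufferedReader(left);
             BufferedReader b = new BufferedReader(right)) {
            int rowNumber = 0;
            while (true) {
                String x = a.readLine();
                String y = b.readLine();
                if (x == null && y == null) {
                    break; // both streams are exhausted
                }
                rowNumber++;
                // A null on one side means that stream ended early (row missing there).
                if (!Objects.equals(x, y)) {
                    report.add("row " + rowNumber + ": " + x + " <> " + y);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = diff(
                new StringReader("a;1\nb;2\nc;3\n"),
                new StringReader("a;1\nb;9\n"));
        report.forEach(System.out::println);
        // row 2: b;2 <> b;9
        // row 3: c;3 <> null
    }
}
```

Only the two current lines are held in memory at any time, so the row count of the sheets no longer matters. In practice the report could be written straight to a file instead of being collected in a list.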

How to replace bookmarks in ".docx", using POI without losing format?

I am trying to replace bookmark with values.
private FileInputStream fis = new FileInputStream(new File("D:\\test.docx"));
private XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paraList = this.document.getParagraphs();
private final void procParaList(List<XWPFParagraph> paraList, String bookmarkName, String bookmarkValue) {
Iterator<XWPFParagraph> paraIter = null;
XWPFParagraph para = null;
List<CTBookmark> bookmarkList = null;
Iterator<CTBookmark> bookmarkIter = null;
CTBookmark bookmark = null;
XWPFRun run = null;
Node nextNode = null;
paraIter = paraList.iterator();
while (paraIter.hasNext()) {
para = paraIter.next();
bookmarkList = para.getCTP().getBookmarkStartList();
bookmarkIter = bookmarkList.iterator();
while (bookmarkIter.hasNext()) {
bookmark = bookmarkIter.next();
if (bookmark.getName().equals(bookmarkName)) {
run = para.createRun();
run.setText(bookmarkValue);
nextNode = bookmark.getDomNode().getNextSibling();
while (!(nextNode.getNodeName().contains("bookmarkEnd"))) {
para.getCTP().getDomNode().removeChild(nextNode);
nextNode = bookmark.getDomNode().getNextSibling();
}
para.getCTP().getDomNode().insertBefore(run.getCTR().getDomNode(), nextNode);
}
}
}
}
I am able to replace the bookmark with the value, but it does not keep the same formatting (font family, font size, color, etc.) that the bookmark text had.
Can anyone please provide some advice?

As discussed earlier, I believe this is your exact use case (see the official archive link).
Please focus on the use of Node styleNode to copy the style information.
/**
* Replace the text - if any - contained between the bookmarkStart and its
* matching bookmarkEnd tag with the text specified. The technique used will
* resemble that employed when inserting text after the bookmark. In short,
* the code will iterate along the nodes until it encounters a matching
* bookmarkEnd tag. Each node encountered will be deleted unless it is the
* final node before the bookmarkEnd tag is encountered and it is a
* character run. If this is the case, then it can simply be updated to
* contain the text the user wishes to see inserted into the document. If
* the last node is not a character run, then it will be deleted, a new run
* will be created and inserted into the paragraph between the bookmarkStart
* and bookmarkEnd tags.
*
* @param run An instance of the XWPFRun class that encapsulates the text
* that is to be inserted into the document following the bookmark.
*/
private void replaceBookmark(XWPFRun run) {
Node nextNode = null;
Node styleNode = null;
Node lastRunNode = null;
Node toDelete = null;
NodeList childNodes = null;
Stack<Node> nodeStack = null;
boolean textNodeFound = false;
boolean foundNested = true;
int bookmarkStartID = 0;
int bookmarkEndID = -1;
int numChildNodes = 0;
nodeStack = new Stack<Node>();
bookmarkStartID = this._ctBookmark.getId().intValue();
nextNode = this._ctBookmark.getDomNode();
nodeStack.push(nextNode);
// Loop through the nodes looking for a matching bookmarkEnd tag
while (bookmarkStartID != bookmarkEndID) {
nextNode = nextNode.getNextSibling();
nodeStack.push(nextNode);
// If an end tag is found, does it match the start tag? If so, end
// the while loop.
if (nextNode.getNodeName().contains(Bookmark.BOOKMARK_END_TAG)) {
try {
bookmarkEndID = Integer.parseInt(
nextNode.getAttributes().getNamedItem(
Bookmark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
} catch (NumberFormatException nfe) {
bookmarkEndID = bookmarkStartID;
}
}
//else {
// Place a reference to the node on the nodeStack
// nodeStack.push(nextNode);
//}
}
// If the stack of nodes found between the bookmark tags is not empty
// then they have to be removed.
if (!nodeStack.isEmpty()) {
// Check the node at the top of the stack. If it is a run, get its
// style - if any - and apply to the run that will be replacing it.
//lastRunNode = nodeStack.pop();
lastRunNode = nodeStack.peek();
if ((lastRunNode.getNodeName().equals(Bookmark.RUN_NODE_NAME))) {
styleNode = this.getStyleNode(lastRunNode);
if (styleNode != null) {
run.getCTR().getDomNode().insertBefore(
styleNode.cloneNode(true),
run.getCTR().getDomNode().getFirstChild());
}
}
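The excerpt above stops before the rest of the method and before getStyleNode is shown, so they are not reconstructed here. As a rough, JDK-only illustration of the style-copying idea, the sketch below clones the w:rPr (run properties) child of an existing run into a replacement run using the plain DOM API. The class and method names are hypothetical, and the XML fragment is a hand-written miniature rather than a real document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RunStyleCopy {
    /** Returns the w:rPr child of a run node, or null if the run has no direct formatting. */
    static Node findRunProperties(Node run) {
        for (Node child = run.getFirstChild(); child != null; child = child.getNextSibling()) {
            if ("w:rPr".equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    /** Clones the formatting of oldRun and inserts it as the first child of newRun. */
    static void copyStyle(Node oldRun, Node newRun) {
        Node style = findRunProperties(oldRun);
        if (style != null) {
            newRun.insertBefore(style.cloneNode(true), newRun.getFirstChild());
        }
    }

    /** Parses an XML string into a DOM document (prefixes are kept in the node names). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:r><w:rPr><w:b/><w:sz w:val=\"28\"/></w:rPr><w:t>old</w:t></w:r>"
                + "<w:r><w:t>new</w:t></w:r>"
                + "</w:p>");
        Node oldRun = doc.getDocumentElement().getFirstChild();
        Node newRun = oldRun.getNextSibling();

        copyStyle(oldRun, newRun);
        System.out.println(newRun.getFirstChild().getNodeName()); // w:rPr
    }
}
```

With POI, the same DOM calls would be applied to the nodes obtained from run.getCTR().getDomNode(), as the quoted method does.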

Creating a gradient in background with PDFBox

How can I create a gradient in PDFBox? Or maybe "can I?".
I don't want to create them and export to jpeg or something else. I need a light document, so this has to be programmed somehow.
Any ideas?

After a lot of research, I finally created a small "creator of my own gradient"! It looks like this:
COSDictionary fdict = new COSDictionary();
fdict.setInt(COSName.FUNCTION_TYPE, 2); // still not understanding that...
COSArray domain = new COSArray();
domain.add(COSInteger.get(0));
domain.add(COSInteger.get(1));
COSArray c0 = new COSArray();
c0.add(COSFloat.get("0.64176"));
c0.add(COSFloat.get("0.72588"));
c0.add(COSFloat.get("0.78078"));
COSArray c1 = new COSArray();
c1.add(COSFloat.get("0.57176"));
c1.add(COSFloat.get("0.62588"));
c1.add(COSFloat.get("0.70078"));
fdict.setItem(COSName.DOMAIN, domain);
fdict.setItem(COSName.C0, c0);
fdict.setItem(COSName.C1, c1);
fdict.setInt(COSName.N, 1);
PDFunctionType2 func = new PDFunctionType2(fdict);
PDShadingType2 axialShading = new PDShadingType2(new COSDictionary());
axialShading.setColorSpace(PDDeviceRGB.INSTANCE);
axialShading.setShadingType(PDShading.SHADING_TYPE2);
COSArray coords1 = new COSArray();
coords1.add(COSInteger.get(0));
coords1.add(COSInteger.get(0));
coords1.add(COSInteger.get(850)); // size of my page
coords1.add(COSInteger.get(600));
axialShading.setCoords(coords1); // so this sets the bounds of my gradient
axialShading.setFunction(func); // and this determines all the curves etc?
CStr.shadingFill(axialShading); // where CStr is a ContentStream for my PDDocument
I will leave this here for others. Leave your opinions and feel free to show me some clever ideas to improve this code :)

Here's a class I made to make the creation of gradients easier. It supports axial gradients with multiple colors. It uses java.awt.Color to specify colors but that can be replaced easily.
public class PDGradient extends PDShadingType2 {
public PDGradient(List<GradientPart> parts) {
super(new COSDictionary());
// PDF 1.7 - 8.7.4.5.3 Type 2 (Axial) Shadings
setColorSpace(PDDeviceRGB.INSTANCE);
setShadingType(PDShadingType2.SHADING_TYPE2);
setFunction(createGradientFunction(parts));
}
private static PDFunction createGradientFunction(List<GradientPart> parts) {
if (parts.size() < 2) {
throw new IllegalArgumentException("Gradient must have at least 2 colors.");
}
GradientPart first = parts.get(0);
GradientPart last = parts.get(parts.size() - 1);
if (first.ratio != 0f) {
throw new IllegalArgumentException("Gradient first color ratio must be 0.");
} else if (last.ratio != 1f) {
throw new IllegalArgumentException("Gradient last color ratio must be 1.");
}
if (parts.size() == 2) {
// Only two colors, use exponential function.
return createColorFunction(first.color, last.color);
}
// Multiple colors, use stitching function to combine exponential functions
// PDF 1.7 - 7.10.4 Type 3 (Stitching) Functions
COSDictionary dict = new COSDictionary();
COSArray functions = new COSArray();
COSArray bounds = new COSArray();
COSArray encode = new COSArray();
GradientPart lastPart = first;
for (int i = 1; i < parts.size(); i++) {
GradientPart part = parts.get(i);
// Add exponential function for interpolating between these two colors.
functions.add(createColorFunction(lastPart.color, part.color));
// Specify function bounds, except for first and last, which are specified by domain.
if (i != parts.size() - 1) {
bounds.add(new COSFloat(part.ratio));
}
// Used to interpolate stitching function subdomain (eg: [0.2 0.5]
// to the exponential function domain, which is always [0.0 1.0].
encode.add(COSInteger.ZERO);
encode.add(COSInteger.ONE);
lastPart = part;
}
dict.setInt(COSName.FUNCTION_TYPE, 3);
dict.setItem(COSName.DOMAIN, new PDRange()); // [0.0 1.0]
dict.setItem(COSName.FUNCTIONS, functions);
dict.setItem(COSName.BOUNDS, bounds);
dict.setItem(COSName.ENCODE, encode);
return new PDFunctionType3(dict);
}
private static PDFunction createColorFunction(Color start, Color end) {
// PDF 1.7 - 7.10.3 Type 2 (Exponential Interpolation) Functions
COSDictionary dict = new COSDictionary();
dict.setInt(COSName.FUNCTION_TYPE, 2);
dict.setItem(COSName.DOMAIN, new PDRange()); // [0.0 1.0]
dict.setItem(COSName.C0, createColorCOSArray(start));
dict.setItem(COSName.C1, createColorCOSArray(end));
dict.setInt(COSName.N, 1); // Linear interpolation
return new PDFunctionType2(dict);
}
private static COSArray createColorCOSArray(Color color) {
// Create a COSArray for a color.
// java.awt.Color uses 0-255 values while PDF uses 0-1.
COSArray a = new COSArray();
a.add(new COSFloat(color.getRed() / 255f));
a.add(new COSFloat(color.getGreen() / 255f));
a.add(new COSFloat(color.getBlue() / 255f));
return a;
}
/**
* Specifies a color and its position in a {@link PDGradient}.
*/
public static class GradientPart {
public final Color color;
public final float ratio;
public GradientPart(Color color, float ratio) {
this.color = color;
this.ratio = ratio;
}
}
}
Example usage:
List<GradientPart> parts = new ArrayList<>();
parts.add(new GradientPart(Color.RED, 0.0f));
parts.add(new GradientPart(Color.YELLOW, 0.5f));
parts.add(new GradientPart(Color.GREEN, 1.0f));
PDGradient gradient = new PDGradient(parts);
gradient.setCoords(...);
pdfStream.shadingFill(gradient);
This works essentially the same as the other answer for two-color gradients, using an exponential function (type 2) to interpolate linearly between two colors. If there are more colors, a stitching function (type 3) is used to combine multiple exponential functions with different subdomains.
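To see what these two function types compute, here is a small sketch that evaluates them directly in plain Java, independent of PDFBox. GradientMath and its methods are hypothetical names, colors are RGB triples in the 0 to 1 range, and the exponent N is fixed at 1 as in the code above.

```java
import java.util.Arrays;

final class GradientMath {
    /** Type 2 function with N = 1: c0 + x * (c1 - c0) for x in [0, 1]. */
    static float[] interpolate(float[] c0, float[] c1, float x) {
        float[] out = new float[c0.length];
        for (int i = 0; i < c0.length; i++) {
            out[i] = c0[i] + x * (c1[i] - c0[i]);
        }
        return out;
    }

    /**
     * Type 3 style stitching: find the segment [ratios[k], ratios[k + 1]] that contains x,
     * re-map x to [0, 1] inside it (the job of the Encode array), then interpolate there.
     */
    static float[] stitch(float[] ratios, float[][] colors, float x) {
        int k = 0;
        while (k < ratios.length - 2 && x >= ratios[k + 1]) {
            k++;
        }
        float local = (x - ratios[k]) / (ratios[k + 1] - ratios[k]);
        return interpolate(colors[k], colors[k + 1], local);
    }

    public static void main(String[] args) {
        float[] ratios = {0f, 0.5f, 1f};
        float[][] colors = {{1, 0, 0}, {1, 1, 0}, {0, 1, 0}}; // red, yellow, green

        System.out.println(Arrays.toString(stitch(ratios, colors, 0.25f))); // [1.0, 0.5, 0.0]
        System.out.println(Arrays.toString(stitch(ratios, colors, 0.75f))); // [0.5, 1.0, 0.0]
    }
}
```

The ratios array plays the role of the domain endpoints plus the Bounds entries, which is why the class above only adds the inner ratios to bounds.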

Get word level bounding boxes in JavaCPP Tesseract

I am trying to extract the bounding box of each word with the JavaCPP Tesseract bindings. This appears to be the bounding box call (my full code is below):
boolean box = ri.BoundingBox(RIL_WORD, coord1, coord2, coord3, coord4)
RIL_WORD is the iterator level, which can be adjusted for words, sentences, and paragraphs. The coordinates are IntPointers (a class included with JavaCPP).
The API documentation says this returns the bounding box coordinates, but it returns a boolean instead. So at this point I know there is a bounding box, but I still cannot get the actual coordinates. Does anyone know how to get the bounding box rectangles out of JavaCPP Tesseract? Thanks for the help. I have posted my working code for getting the individual words and their confidence levels below, because I had such a hard time finding examples.
public class TesseractOCR {
public void OCRText() {
BytePointer outText;
TessBaseAPI api = new TessBaseAPI();
// Initialize tesseract-ocr with English, without specifying tessdata path
if (api.Init(null, "eng") != 0) {
System.err.println("Could not initialize tesseract.");
System.exit(1);
}
// Open input image with leptonica library
org.bytedeco.javacpp.lept.PIX image = pixRead("testimage.png");
// Get OCR result
outText = api.GetUTF8Text();
System.out.println("OCR output:\n" + outText.getString());
final ResultIterator ri = api.GetIterator();
int x1 = 0;
int y1 = 0;
int x2 = 0;
int y2 = 0;
IntPointer coord1 = new IntPointer(x1);
IntPointer coord2 = new IntPointer(y1);
IntPointer coord3 = new IntPointer(x2);
IntPointer coord4 = new IntPointer(y2);
if (ri != null) {
ri.Begin();
do {
BytePointer word = ri.GetUTF8Text(RIL_WORD);
float conf = ri.Confidence(RIL_WORD);
boolean box = ri.BoundingBox(RIL_WORD, coord1, coord2, coord3, coord4);
System.out.println(word.getString());
System.out.println(conf);
System.out.println(box);
} while (ri.Next(RIL_WORD));
}
api.End();
outText.deallocate();
pixDestroy(image);
}
}
