pdfbox and itext extracting image with incorrect dpi

When I extract an image using PDFBox, I get an incorrect dpi for the image for some PDFs. When I extract the image using Photoshop or Acrobat Reader Pro, Windows Photo Viewer shows that its dpi is 200, but when I extract it using PDFBox the dpi is 72.
To extract the image I am using the code from the following question:
Not able to extract images from PDFA1-a format document
When I check the logs I see an unusual entry:
2015-01-23-main--DEBUG-org.apache.pdfbox.util.TIFFUtil:
<?xml version="1.0" encoding="UTF-8"?><javax_imageio_jpeg_image_1.0>
<JPEGvariety>
<app0JFIF majorVersion="1" minorVersion="2" resUnits="0" Xdensity="1" Ydensity="1" thumbWidth="0" thumbHeight="0"/>
</JPEGvariety>
<markerSequence>
<dqt>
<dqtable elementPrecision="0" qtableId="0"/>
<dqtable elementPrecision="0" qtableId="1"/>
</dqt>
<dht>
<dhtable class="0" htableId="0"/>
<dhtable class="0" htableId="1"/>
<dhtable class="1" htableId="0"/>
<dhtable class="1" htableId="1"/>
</dht>
<sof process="0" samplePrecision="8" numLines="0" samplesPerLine="0" numFrameComponents="3">
<componentSpec componentId="1" HsamplingFactor="2" VsamplingFactor="2" QtableSelector="0"/>
<componentSpec componentId="2" HsamplingFactor="1" VsamplingFactor="1" QtableSelector="1"/>
<componentSpec componentId="3" HsamplingFactor="1" VsamplingFactor="1" QtableSelector="1"/>
</sof>
<sos numScanComponents="3" startSpectralSelection="0" endSpectralSelection="63" approxHigh="0" approxLow="0">
<scanComponentSpec componentSelector="1" dcHuffTable="0" acHuffTable="0"/>
<scanComponentSpec componentSelector="2" dcHuffTable="1" acHuffTable="1"/>
<scanComponentSpec componentSelector="3" dcHuffTable="1" acHuffTable="1"/>
</sos>
</markerSequence>
</javax_imageio_jpeg_image_1.0>
I tried to google it, but I can't seem to find out what PDFBox means by this log entry. What does it mean?
You can download a sample pdf with this problem from this link:
http://myslams.com/test/1.pdf
I have even tried iText, but it extracts the image with 96 dpi.
Am I doing something wrong, or do PDFBox and iText have this limitation?

After some digging I found your 1.pdf. Thus,...
PDFBox
In comments to this recent answer, @Tilman and you were discussing this older answer, in which @Tilman pointed towards the PrintImageLocations PDFBox example. I ran it for your file and got:
Processing page: 0
*******************************************************************
Found image [Im0]
position = 0.0, 0.0
size = 1704px, 888px
size = 613.44, 319.68
size = 8.52in, 4.44in
size = 216.408mm, 112.776mm
Processing page: 1
*******************************************************************
Found image [Im0]
position = 0.0, 0.0
size = 1704px, 2800px
size = 613.44, 1008.0
size = 8.52in, 14.0in
size = 216.408mm, 355.6mm
Processing page: 2
*******************************************************************
Found image [Im0]
position = 0.0, 0.0
size = 1704px, 2800px
size = 613.44, 1008.0
size = 8.52in, 14.0in
size = 216.408mm, 355.6mm
Processing page: 3
*******************************************************************
Found image [Im0]
position = 0.0, 0.0
size = 1704px, 1464px
size = 613.44, 527.04
size = 8.52in, 7.3199997in
size = 216.408mm, 185.928mm
On all pages this amounts to 200 dpi both in x and y directions (1704px / 8.52in = 888px / 4.44in = 2800px / 14.0in = 1464px / 7.32in = 200 dpi).
So PDFBox gives you the dpi values you are after.
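The arithmetic above can be sketched as a few lines of plain Java. This is only an illustration: the class and method names are hypothetical, and it assumes the default PDF user unit of 1/72 inch (a page with a different UserUnit would need another factor).

```java
// Minimal sketch (hypothetical names): dpi from pixel count and displayed size.
final class DpiSketch {
    // Assumption: default user space, where 1 unit = 1/72 inch.
    static final double UNITS_PER_INCH = 72.0;

    /** Pixels divided by the displayed length in inches. */
    static double dpi(double pixels, double userUnits) {
        return pixels / (userUnits / UNITS_PER_INCH);
    }

    public static void main(String[] args) {
        // Values taken from the PrintImageLocations output for page 0.
        System.out.println("x: " + dpi(1704, 613.44));
        System.out.println("y: " + dpi(888, 319.68));
    }
}
```

Both calls come out at roughly 200 for the first page, matching the manual calculation.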
(@Tilman: The current 2.0.0-SNAPSHOT version of that sample returns utter nonsense; you might want to fix this.)
iText
A simplified iText version of that PDFBox example would be this:
public void printImageLocations(InputStream stream) throws IOException
{
    PdfReader reader = new PdfReader(stream);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    ImageRenderListener listener = new ImageRenderListener();
    for (int page = 1; page <= reader.getNumberOfPages(); page++)
    {
        System.out.printf("\nPage %s:\n", page);
        parser.processContent(page, listener);
    }
}

static class ImageRenderListener implements RenderListener
{
    public void beginTextBlock() { }
    public void renderText(TextRenderInfo renderInfo) { }
    public void endTextBlock() { }

    public void renderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            // Pixel dimensions as stored in the image resource
            PdfDictionary imageDict = renderInfo.getImage().getDictionary();
            float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
            float heightPx = imageDict.getAsNumber(PdfName.HEIGHT).floatValue();
            // Displayed size in user units (uu), read from the image transformation matrix
            float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
            float heightUu = renderInfo.getImageCTM().get(Matrix.I22);
            System.out.printf("Image %.0fpx*%.0fpx, %.0fuu*%.0fuu, %.2fin*%.2fin\n", widthPx, heightPx, widthUu, heightUu, widthUu/72, heightUu/72);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
(Beware: I assumed unrotated and unskewed images.)
The results for your file:
Page 1:
Image 1704px*888px, 613uu*320uu, 8,52in*4,44in
Page 2:
Image 1704px*2800px, 613uu*1008uu, 8,52in*14,00in
Page 3:
Image 1704px*2800px, 613uu*1008uu, 8,52in*14,00in
Page 4:
Image 1704px*1464px, 613uu*527uu, 8,52in*7,32in
Thus, also 200dpi all along. So iText, too, gives you the dpi values you are after.
Your code
Obviously, the code you referenced had no chance of reporting a dpi value that is sensible in the context of the PDF, because it only extracts the images as found in the resources and ignores how the respective image resource is used on the page.
An image resource can be stretched, rotated, skewed, ... in any way the author likes when it is used in the page content.
BTW, a dpi value only makes sense if the author did not skew the image and rotated it only by a multiple of 90°.
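To illustrate that last remark, here is a sketch of how the displayed size could be derived from the four linear entries a, b, c, d of an image transformation matrix, so that rotations by a multiple of 90° are handled too. The class and method names are hypothetical. I assume the usual PDF convention that the image's unit square is mapped by the matrix, so the image width axis ends up as the vector (a, b) and the height axis as (c, d); in the iText listener above these would presumably correspond to Matrix.I11, I12, I21 and I22.

```java
// Minimal sketch (hypothetical names), assuming 1 user unit = 1/72 inch.
final class CtmDpiSketch {
    /** Returns {xDpi, yDpi} using the lengths of the transformed image axes. */
    static double[] dpi(int widthPx, int heightPx, double a, double b, double c, double d) {
        double widthUu = Math.hypot(a, b);   // length of the transformed width axis
        double heightUu = Math.hypot(c, d);  // length of the transformed height axis
        return new double[] { widthPx * 72.0 / widthUu, heightPx * 72.0 / heightUu };
    }

    /** True if the image is unskewed and rotated only by a multiple of 90 degrees. */
    static boolean isAxisAligned(double a, double b, double c, double d, double eps) {
        boolean upright = Math.abs(b) < eps && Math.abs(c) < eps;
        boolean quarterTurn = Math.abs(a) < eps && Math.abs(d) < eps;
        return upright || quarterTurn;
    }
}
```

If isAxisAligned returns false, the pixel grid no longer lines up with the page axes and I would not report a dpi value at all.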

Related

How to read PDF sections using Header font size using PDFBox?

I am trying to read PDF documents, and I need to separate them into sections using the header font size, or the font and the font size. I currently have it implemented based on the answer to this post. But because my PDF uses the same font for the header and the sub-header, I need to modify the code so that it searches based on the font size, or on both.
List<TextSectionDefinition> sectionDefinitions = Arrays.asList(
new TextSectionDefinition("Section", x -> x.get(0).get(0).getFont().getName().contains("Calibri,Bold"), TextSectionDefinition.MultiLine.multiLineHeader, true)
);
document.getClass();
PDFTextSectionStripper stripper = new PDFTextSectionStripper(sectionDefinitions);
stripper.getText(document);
System.out.println("Sections:");
List<String> texts = new ArrayList<>();
for (TextSection textSection : stripper.getSections()) {
String text = textSection.toString();
System.out.println(text);
texts.add(text);
}
return ResponseEntity.ok(texts);
My problem is that if I try to use getFontSize instead of getFont, it doesn't allow any parameters to be entered, in my case 16 (the font size).

In the answer you refer to there are text section definitions like this:
new TextSectionDefinition("Titel",
x->x.get(0).get(0).getFont().getName().contains("CMBX12"),
MultiLine.singleLine,
false)
I assume your remark
if I try to use getFontSize instead of getFont it doesn't allow any parameters to be entered, in my case 16
indicates that you want to exchange the lambda expression in the second parameter
x->x.get(0).get(0).getFont().getName().contains("CMBX12")
by something that tests the font size. Thus, have you tried replacing it by
x->x.get(0).get(0).getFontSize() == 16
or
x->x.get(0).get(0).getFontSizeInPt() == 16
or
x-> {
float size = x.get(0).get(0).getFontSizeInPt();
return size > 15 && size < 17;
}
yet?

Imebra library shows completely gray image for transfer syntax 1.2.840.10008.1.2.1

I am trying to use the Imebra library to display DICOM images on Android. I am using version 5.0 of the library.
The bitmap shown is completely gray; the transfer syntax of the image is 1.2.840.10008.1.2.1. For other supported transfer syntaxes, i.e. JPEG, it works fine.
Also, I am unable to add the VOILUT transform functionality as described in the documentation; it gives a "constructor not found" error for VOILUT.
Below is the code I am using. The VOILUT transform part gives "constructor not found". If I remove the VOILUT transform part, things work fine, but for an image with transfer syntax 1.2.840.10008.1.2.1 it shows a completely gray image.
private Bitmap fromDicom(String filePath, int frameNumber){
// have been applied).
Image dicomImage = loadedDataSet.getImageApplyModalityTransform(frameNumber);
// Use a DrawBitmap to build a stream of bytes that can be handled by the
// Android Bitmap class.
com.imebra.TransformsChain chain = new com.imebra.TransformsChain();
if( com.imebra.ColorTransformsFactory.isMonochrome(dicomImage.getColorSpace()))
{
// Retrieve the VOIs (center/width pairs)
com.imebra.VOIs vois = loadedDataSet.getVOIs();
if(!vois.isEmpty())
{
// Get the first VOI setting from the dataset
chain.addTransform(new VOILUT(vois.get(0)));
}
else
{
// The dataset does not have any VOI setting, find the optimal one
com.imebra.SWIGTYPE_p_imebra__VOIDescription voiDescription = VOILUT.getOptimalVOI(dicomImage, 0, 0, dicomImage.getWidth(), dicomImage.getHeight());
chain.addTransform(new VOILUT(voiDescription));
}
}
DrawBitmap drawBitmap = new DrawBitmap(chain);
Memory memory = drawBitmap.getBitmap(dicomImage, drawBitmapType_t.drawBitmapRGBA, 4);
// Build the Android Bitmap from the raw bytes returned by DrawBitmap.
Bitmap renderBitmap = Bitmap.createBitmap((int)dicomImage.getWidth(), (int)dicomImage.getHeight(), Bitmap.Config.ARGB_8888);
byte[] memoryByte = new byte[(int)memory.size()];
memory.data(memoryByte);
ByteBuffer byteBuffer = ByteBuffer.wrap(memoryByte);
renderBitmap.copyPixelsFromBuffer(byteBuffer);
// Update the image
return renderBitmap;
}
After changing the code as you suggested, I can't find the classes mentioned:
There is no class VOIDescription; instead I see a class SWIGTYPE_p_imebra__VOIDescription. Should I use that class?
There is one more error: no getWidth() method is available for vois.get(0).getWidth().
One last error: I don't see a class vois_t; instead there is a class VOIs. Should VOIs be used?
Thanks for the response.

The VOILUT must be initialized with the proper contrast settings from the dataset, as in the code below.
However, the dataset contains a wrong VOI setting (the window width is 0), so this file will be displayed correctly only if you use custom VOI settings or fall back to automatic settings when the width is zero (see the alternative code below, which checks that the width is greater than zero).
Code that does not check for width:
if(com.imebra.ColorTransformsFactory.isMonochrome(dicomImage.getColorSpace()))
{
// Retrieve the VOIs (center/width pairs)
com.imebra.vois_t vois = loadedDataSet.getVOIs();
if(!vois.isEmpty())
{
// Get the first VOI setting from the dataset
chain.addTransform(new VOILUT(vois.get(0)));
}
else
{
// The dataset does not have any VOI setting, find the optimal one
com.imebra.VOIDescription voiDescription = VOILUT.getOptimalVOI(dataSetImage, 0, 0, width, height);
chain.addTransform(new VOILUT(voiDescription));
}
}
Alternative code that checks if width is 0:
if(com.imebra.ColorTransformsFactory.isMonochrome(dicomImage.getColorSpace()))
{
// Retrieve the VOIs (center/width pairs)
com.imebra.vois_t vois = loadedDataSet.getVOIs();
if(!vois.isEmpty() && vois.get(0).getWidth() > 0.1)
{
// Get the first VOI setting from the dataset
chain.addTransform(new VOILUT(vois.get(0)));
}
else
{
// The dataset does not have any VOI setting, find the optimal one
com.imebra.VOIDescription voiDescription = VOILUT.getOptimalVOI(dataSetImage, 0, 0, width, height);
chain.addTransform(new VOILUT(voiDescription));
}
}
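The idea behind the width check can be illustrated without the library. The following is a plain Java sketch, not Imebra's implementation: all names are hypothetical, the "optimal" window is simply taken from the minimum and maximum pixel value, and the mapping is a simplified linear window rather than the exact formula from the DICOM standard.

```java
// Minimal sketch (hypothetical names): fall back to an automatic window
// when the stored window width is unusable.
final class VoiFallbackSketch {
    /** Returns {center, width}; uses the pixel range if storedWidth is (close to) zero. */
    static double[] chooseWindow(double storedCenter, double storedWidth, int[] pixels) {
        if (storedWidth > 0.1) {
            return new double[] { storedCenter, storedWidth };
        }
        if (pixels.length == 0) {
            throw new IllegalArgumentException("no pixel data");
        }
        int min = pixels[0];
        int max = pixels[0];
        for (int p : pixels) {
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        // Avoid a zero width for completely uniform images.
        double width = Math.max(1.0, (double) max - min);
        return new double[] { (min + max) / 2.0, width };
    }

    /** Simplified linear window: maps a stored value to an 8-bit gray level. */
    static int toGray(int value, double center, double width) {
        double normalized = (value - (center - width / 2.0)) / width;
        return (int) Math.round(255.0 * Math.min(1.0, Math.max(0.0, normalized)));
    }
}
```

With a usable stored window the values from the dataset are kept unchanged; only a zero width triggers the automatic choice, which mirrors the structure of the alternative code above.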

Content overwritten on top of another when formula is applied to excel cell using Apache poi

I have an issue where content is drawn on top of other content, i.e. the old and new values overlap after a formula is applied. This issue only occurs in Microsoft Excel 2010. How do I handle this?
I have tried refreshing the formulas for the entire workbook using a formula evaluator, but the issue persists.
private String applyTtlAvlFormula(final int colId, final int hrsColId)
{
    String ttlFormulaString = "";
    for( int i = ttlAvlContentRow; i < tableHeadStartRow+1; i++) {
        String colName1 = CellReference.convertNumToColString(hrsColId);
        String colName2 = CellReference.convertNumToColString(colId);
        ttlFormulaString += "("+colName1+String.valueOf(i)+ "*$"+colName2 +"$"+String.valueOf(i)+")";
        if (i != tableHeadStartRow){
            ttlFormulaString += "+";
        }
    }
    return ttlFormulaString;
}

cell.setCellFormula(applyTtlAvlFormula(j,hrsCol));
The following are before and after screenshots of the issue.
From the images it can be seen that 40 and 32 overlap after the formula is applied. How can this issue be resolved?

iTextSharp Footer Gets Progressively Bolder with Each Page

Using OnEndPage, I add a footer to my PDF created with iTextSharp. The footer font gets progressively bolder with each page.
How can I create consistent NORMAL fonts in my footer?
Here is my code:
public override void OnEndPage(PdfWriter writer, Document doc)
{
iTextSharp.text.Image gif = null;
if (FooterImage)
{
if (File.Exists(PathImages))
{
gif = iTextSharp.text.Image.GetInstance(PathImages);
gif.ScaleToFit(75f, 75f);
gif.SetAbsolutePosition(0, 0);
}
}
string sFooter = string.Empty;
if (FooterURL != null && FooterURL.Length > 0)
{
sFooter = FooterURL + " ";
}
if (FooterDate != null && FooterDate.Length > 0)
{
sFooter += FooterDate + " ";
}
if (FooterPage)
{
sFooter += "Page " + doc.PageNumber.ToString();
}
PdfPTable footerTbl = new PdfPTable(1);
footerTbl.TotalWidth = 900;
footerTbl.HorizontalAlignment = Element.ALIGN_CENTER;
Phrase ph = new Phrase(sFooter, FontFactory.GetFont(FontFactory.TIMES, 10, iTextSharp.text.Font.NORMAL));
PdfPCell cell = new PdfPCell(ph);
cell.Border = 0;
cell.PaddingLeft = 10;
footerTbl.AddCell(cell);
if (FooterImage)
{
PdfContentByte cbfoot = writer.DirectContent;
PdfTemplate tpl = cbfoot.CreateTemplate(gif.Width / 5, gif.Height / 5);
tpl.AddImage(gif);
cbfoot.AddTemplate(tpl, doc.PageSize.Width - 100, 10);
}
footerTbl.WriteSelectedRows(0, -1, 10, 30, writer.DirectContent);
}

In the old days, when there wasn't as much choice as today regarding fonts, people used workarounds to create bold fonts. One way to make a font bold was to add the same text over and over again at the same position. I think that this is what is happening to you.
When you use page events correctly, the onEndPage() method is triggered automatically each time a page ends. My guess is that you're doing something very wrong that triggers onEndPage() many times. Maybe you are calling onEndPage() from your code, or maybe you're adding the page event to the writer more than once (and page events are cumulative).
If I have to guess, I would guess that you are doing the latter. My guess is based on the fact that you are using variables such as FooterImage in your onEndPage() method. How are you setting that variable? If you are setting it in the constructor of the page event and you're adding a new page event over and over again to the writer, then you're doing it wrong.
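To show why cumulative page events would produce exactly this symptom, here is a toy model in plain Java. It does not use iText or iTextSharp; ToyWriter, PageEvent and the other names are hypothetical stand-ins that only mimic the behavior described above, where every registered event runs at the end of every page.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names): a writer that runs all registered events per page.
final class PageEventSketch {
    interface PageEvent {
        void onEndPage(List<String> pageContent);
    }

    static final class ToyWriter {
        private final List<PageEvent> events = new ArrayList<>();

        void addPageEvent(PageEvent event) {
            events.add(event); // cumulative: earlier events stay registered
        }

        List<String> endPage() {
            List<String> content = new ArrayList<>();
            for (PageEvent event : events) {
                event.onEndPage(content);
            }
            return content;
        }
    }

    /** Counts how many times the footer is drawn on the last of the given pages. */
    static int footersOnLastPage(int pages, boolean registerPerPage) {
        ToyWriter writer = new ToyWriter();
        PageEvent footer = content -> content.add("footer");
        if (!registerPerPage) {
            writer.addPageEvent(footer); // register once, up front
        }
        int count = 0;
        for (int page = 1; page <= pages; page++) {
            if (registerPerPage) {
                writer.addPageEvent(footer); // the suspected mistake
            }
            count = writer.endPage().size();
        }
        return count;
    }
}
```

In this model, registering the event once draws one footer per page, while registering it again for every page draws the footer n times on page n, which would look progressively bolder.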

Load image of size greater than 2880 in Bitmap

I am using a bitmap to load the image. If the image height or width is more than 2880 pixels, I get an error.
var src:BitmapData = new BitmapData(canvasToPrint.width, canvasToPrint.height);
src.draw(_designArea); // -- encode the jpg
var quality:int = 115;
var jpg:JPEGEncoder = new JPEGEncoder(quality);
var byteArray:ByteArray = jpg.encode(src);
If canvasToPrint.width > 2880 or canvasToPrint.height > 2880, I get the error below at line 1:
Error : invalid Bitmap
To overcome the above issue I used the BitmapDataUnlimited class, as mentioned in the link below.
http://blog.formatlos.de/2008/05/28/bitmapdataunlimited/comment-page-2/#comment-4870
But it only works up to 4096 pixels of height and width. Please guide me if there is an alternative solution for creating a huge bitmap.

It makes a difference which Flash Player you are targeting:
Version vs. maximum bitmap size
Flash Player 9 and earlier : 2880x2880 px
Flash Player 10 : 4096x4096 px
Flash Player 11 : unlimited

http://www.bit-101.com/blog/?p=2067
Try this
