Identifying Hidden Elements in PDF from Chrome Page Breaks


I have a PDF document created by Google Chrome. When parsing the text with PDFBox (Java), I found a hidden block of text straddling a page break. Although its rendering mode was "FILL", the element turned out to lie off the page. Problem solved.
Now I have found another, similar element that also appears off the page, but this time its coordinates do not show it. It lies within the visible margin of the second of the two pages it straddles. It has y2 = 31.195312 and a maximum height of 29.894833 (font size = 36), so the calculated y1 is about 1.3, which is still on the page.
The TextPosition object has some interesting internal properties, but they are not public. All I have is this TextPosition object (https://pdfbox.apache.org/docs/1.8.10/javadocs/org/apache/pdfbox/util/TextPosition.html) and the surrounding context.
I can reproduce the issue, but only with my particular document. It could perhaps be reproduced with page-break-inside tests, but I haven't found a simple test case. I'm looking for some kind of margin, but so far every box available from this.getCurrentPage() reports only the plain page height and no start offset. Alternatively, there may be another way of obtaining the coordinates than firstTextPos.getY() and firstTextPos.getHeight().
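A minimal sketch of the geometry check discussed above, under stated assumptions: like the question, it treats firstTextPos.getY() as the baseline in a top-left-origin space (y grows downward) and derives the top of the glyph box as y minus height, then measures how much of that box overlaps a visible rectangle. The class and method names are hypothetical, and the visible rectangle has to come from elsewhere (for example the page's crop box, or a content area with an assumed margin). TextPosition does not report clipping paths, so text hidden by clipping would not be detected this way.

```java
/** Hypothetical helper: how much of a glyph box lies inside a visible rectangle. */
final class GlyphVisibility {

    /** Axis-aligned rectangle; origin at the top-left, y grows downward. */
    static final class Rect {
        final float x, y, width, height;

        Rect(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        float right()  { return x + width; }
        float bottom() { return y + height; }
    }

    /** Builds a glyph box from a baseline y (as getY() is used in the question) and a glyph height. */
    static Rect glyphBox(float x, float baselineY, float width, float height) {
        return new Rect(x, baselineY - height, width, height);
    }

    /** Fraction (0..1) of the glyph box area that overlaps the visible rectangle. */
    static float visibleFraction(Rect glyph, Rect visible) {
        float overlapW = Math.max(0f,
                Math.min(glyph.right(), visible.right()) - Math.max(glyph.x, visible.x));
        float overlapH = Math.max(0f,
                Math.min(glyph.bottom(), visible.bottom()) - Math.max(glyph.y, visible.y));
        float area = glyph.width * glyph.height;
        return area <= 0f ? 0f : (overlapW * overlapH) / area;
    }
}
```

With the numbers from the question, glyphBox(10f, 31.195312f, 20f, 29.894833f) starts at y ≈ 1.3, so it lies fully inside a 0..842 pt page box (fraction ≈ 1) but entirely outside a hypothetical content area starting 36 pt below the top edge (fraction 0). The page height, x, width and margin values are illustrative assumptions.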
In Mac Preview, the text is selected across the page break and is listed on the second page. When it is listed on the first page instead, I can handle the issue as described above.

Related

How to inject images into a Word template via docx4j without getting them resized

My program injects text and pictures into a Word template. This works great via content control data binding (thanks to docx4j and Content-Control-Toolkit).
My problem is that images get resized after injection. What I actually want is the behavior that Jason described here: http://www.docx4java.org/forums/data-binding-java-f16/picture-content-control-size-t634.html
The current behaviour is to just let it be whatever its natural size is (at a given dpi), unless that is greater than page width, in which case it is scaled down.
According to that post, docx4j's behavior has since been changed so that pictures are always scaled to fit the size of the content control, preserving the aspect ratio.
Is it possible to get the "old" behavior back? Do I have to implement that myself, or is the switch Jason wrote about already implemented?
As the answer to "How to force Docx4j to refresh a replaced image file" states, the size of a picture is stored in the main document part. At the moment, I only use XPath to set content in the custom XML part. If there is any way to get what I need without touching the document's XML directly, I would really prefer that. A macro that sets the size after opening the document in Word is not an option for me.

The first thing to be aware of is that these days we prefer to have a picture in a rich text content control, as opposed to a picture content control.
This is because Word limits your ability to "float" a picture content control.
The handling for this is triggered by w:tag containing 'od:Handler=picture': datastorage/bind.xslt#L165
The basic behaviour is that if the w:sdtContent contains an existing w:drawing/wp:inline/a:graphic then reuse it, so any formatting thus configured is used.
But for a "legacy" picture content control which doesn't contain a:blip (when would this be?), xpathInjectImage is invoked with wp:extent passed in (see bind.xslt#L240).
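At line 1143 there is the check if (cxl==0 || cyl==0) // Let BPAI work out size, so when either dimension of the extent passed in is 0, BinaryPartAbstractImage (BPAI) works out the size itself.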
So if you want the image at its natural size, you could try removing the when clause at bind.xslt#L212
By the way, we can also bind escaped XHTML. There, we make an effort to fit any image not just to the page width but also, if it is in a table cell, to the cell width.
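To make the two sizing policies in this thread concrete, here is a minimal sketch in plain Java arithmetic, independent of docx4j: "natural size at a given DPI, scaled down if wider than the page" versus "always fit the content control's extent, preserving the aspect ratio". Sizes are in EMUs (914400 per inch), the unit of the cx/cy attributes on wp:extent. The class and method names are hypothetical; this is not docx4j's own code.

```java
/** Hypothetical illustration of two image-sizing policies, in EMUs (914400 EMU per inch). */
final class ImageSizing {
    static final long EMU_PER_INCH = 914_400L;

    /** Natural size at the given DPI, scaled down proportionally if wider than maxWidthEmu. */
    static long[] naturalSize(int widthPx, int heightPx, int dpi, long maxWidthEmu) {
        long cx = widthPx * EMU_PER_INCH / dpi;
        long cy = heightPx * EMU_PER_INCH / dpi;
        if (cx > maxWidthEmu) {
            cy = cy * maxWidthEmu / cx; // keep the aspect ratio while shrinking to the page width
            cx = maxWidthEmu;
        }
        return new long[] {cx, cy};
    }

    /** Scales (up or down) to fit inside a box such as a content control's extent, keeping the ratio. */
    static long[] fitToBox(long cx, long cy, long boxCx, long boxCy) {
        // Compare boxCy/cy with boxCx/cx without floating point: the smaller factor wins.
        if (cx * boxCy <= cy * boxCx) {
            return new long[] {cx * boxCy / cy, boxCy}; // height is the limiting dimension
        }
        return new long[] {boxCx, cy * boxCx / cx};     // width is the limiting dimension
    }
}
```

For example, a 960 x 480 px image at 96 DPI is 9144000 x 4572000 EMU; with a hypothetical page width of 5000000 EMU, naturalSize shrinks it to 5000000 x 2500000, whereas fitToBox always stretches or shrinks it to the box.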

How to verify the position of the web page element

I would like to know if there is any way to verify that a specific element of a web page is in a given position.
For instance, suppose you have a requirement stating:
"The lamp button shall appear on the upper left side corner of the web page".
How can I verify that the button is in the right position automatically?
First, I would have to ask the requirements engineer to translate "upper left" into coordinates. Since I have been working with the Selenium 2.0 Java API, I can then use the element's getLocation() method. Is this the only way to verify an element's position?
http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html#getLocation%28%29

There are two ways to verify the location of an element.
getLocation() verifies the absolute location of an element. I don't recommend it: the location can change all the time depending on loading speed and window size.
getCssValue() can, to an extent, verify the relative location of an element. Most modern websites do their layout via CSS. For example, getCssValue("display") might return block, grid, or another layout value. You can also check the margin, border, or padding.
However, this solution is also brittle, because sites can change their layout quite frequently. It also requires whoever writes the Selenium code to know quite a lot about CSS. Finally, it isn't foolproof: I can check 20 different properties, and a single CSS change can still completely change the element's position.
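If you do use getLocation(), one way to make the check less sensitive to window size is to express the requirement relative to the viewport, e.g. "the element's box lies within the top-left quarter of the window". A minimal sketch with plain integers, assuming the page is not scrolled (getLocation() reports page coordinates): in a real test the inputs would come from the element's getLocation() and getSize() and the window size, and the 0.25 fraction is a hypothetical value the requirements engineer would have to confirm.

```java
/** Hypothetical check: does an element's box lie inside the top-left region of the viewport? */
final class PositionCheck {
    // x, y: element's top-left corner (e.g. from getLocation(), assuming no scrolling)
    // width, height: element's size (e.g. from getSize())
    // viewportWidth, viewportHeight: window or viewport size in CSS pixels
    // fraction: size of the allowed region relative to the viewport (an assumption)
    static boolean isInUpperLeft(int x, int y, int width, int height,
                                 int viewportWidth, int viewportHeight, double fraction) {
        double maxRight = viewportWidth * fraction;
        double maxBottom = viewportHeight * fraction;
        return x >= 0 && y >= 0 && x + width <= maxRight && y + height <= maxBottom;
    }
}
```

This still shares the brittleness described above; it only removes the dependence on one fixed window size.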

How do I render a grid only on the last page in BIRT?

Using BIRT 3.7.2, I'm trying to place a grid for signatories at the bottom of the last page of a report. However, I can't find a way to do that.
I've tried to place the grid in a footer and render it using:
if(BirtComp.lessThan(pageNumber, totalPage)) {
this.getStyle().display = "none";
}
on the grid's onRender.
However, this only works in the web viewer, and I need it to work in PDF output from my Java application. When I use the run and format=pdf parameters, the grid still appears on every page, and using frameset instead of run makes it not appear at all. Are there other techniques for this? Otherwise, I may have to settle for placing the signatory grid below the table data so that at least it ends up on the last page.

PDF Generation Using CSS and JAVA

I am generating a PDF composed of multiple tables, and I want to know whether there is a way to find out if a table will exceed the PDF page size. I will use that information to decide whether to generate the PDF in portrait or landscape mode. Would it be possible to get this size?

know if the table size will exceed the PDF page size
The table size is determined from your XHTML Table Content (rows, columns, headers, footers, etc), plus your CSS doc(s) (width, height and border properties).
The rendering engine (ITextRenderer) only knows your table's size, and whether it fits within a page once the CSS has been applied, as part of the process of converting XHTML to rendered output.
So, if you were to query ITextRenderer for this information, you would need to:
1. Create an ITextRenderer instance, pass your Document in via renderer.setDocument(), and then apply the CSS via renderer.layout(), causing all of the layout calculations to occur based upon the original page orientation.
2. Query somehow to determine whether the table fits on the page.
3. If it doesn't, switch between portrait and landscape. But this means changing to a new CSS, set up for a different page shape:
4. Pick a new CSS appropriate for the new page orientation and set it within your document.
5. Create a new ITextRenderer instance, pass your Document in via renderer.setDocument(), and then apply the CSS via renderer.layout(), causing all of the layout calculations to occur based upon the new page orientation.
6. Create PDF output via renderer.createPDF().
Problems:
This is inefficient and inelegant processing.
This is doing things the wrong way around. Controlling page layout should be part of the design process, and should be reflected in (a) the content generated (XHTML) and (b) the formatting (CSS). If that's done correctly, there should be no need to do render-time magic to reformat the entire page.
At step (2), there's no simple query interface that will simply tell you if your table fits within a page. Instead, you have to query the formatted output blocks, and try to work out for yourself where your table is and whether it fits in a page. E.g.:
BlockBox root = renderer.getRootBox();
List pageList = root.getLayer().getPages();
PageBox page = (PageBox)pageList.get(2);
List childBoxList = page.getChildren();
Box childBox = (Box)childBoxList.get(0);
// etc... until you locate your table
At step (4), ITextRenderer expects your CSS to be linked from within your XHTML document. That means you first need to modify your XHTML source, then you need to resubmit it to ITextRenderer.
Suggested Alternative:
1. Do a wireframe/sketch design with 2 scenarios: one for portrait and one for landscape.
2. Determine the size of elements/rows/columns under the 2 scenarios.
3. Determine the conditional logic under which you would use each scenario, portrait vs. landscape. E.g. if the number of rows is X and the number of columns is Y, then use landscape, ...
4. Now create the two CSS files.
5. Now, when you generate your XHTML, use the conditional logic to link to the correct CSS.
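A minimal sketch of the condition logic and CSS linking described above: it estimates the table width from per-column widths measured in the wireframes, compares it with the portrait content width, and emits the matching stylesheet link for the generated XHTML. The helper names, the file names portrait.css and landscape.css, and the 170 mm content width (A4's 210 mm minus assumed 20 mm margins) are all hypothetical.

```java
/** Hypothetical orientation choice driven by the table's estimated width. */
final class OrientationChooser {
    enum Orientation { PORTRAIT, LANDSCAPE }

    /** Landscape if the summed column widths exceed the portrait content width. */
    static Orientation choose(double[] columnWidthsMm, double portraitContentWidthMm) {
        double total = 0;
        for (double w : columnWidthsMm) {
            total += w;
        }
        return total > portraitContentWidthMm ? Orientation.LANDSCAPE : Orientation.PORTRAIT;
    }

    /** XHTML link element pointing at the stylesheet for the chosen orientation. */
    static String stylesheetLink(Orientation o) {
        String href = (o == Orientation.LANDSCAPE) ? "landscape.css" : "portrait.css";
        return "<link rel=\"stylesheet\" type=\"text/css\" href=\"" + href + "\" />";
    }
}
```

Each stylesheet would typically declare the page shape itself, for instance with an @page rule such as @page { size: A4 landscape; }, so the renderer never has to be re-run to try the other orientation.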

Is it possible to count lines of visible text in HTML formatted file?

We have a form that lets people input html formatted text and that input is then displayed elsewhere on the webpage.
Due to layout constraints, the input may not be longer than X lines. There is no preview of the edited text on the webpage; input is published immediately on submit. (A short explanation of why I cannot fix the layout: the text in question is displayed as an overlay on top of an image. The overlay is about the size of the image, and that size is fixed. The text should be completely visible inside the overlay and must not spill over.)
I am being asked to implement something to keep people from entering too much text.
My first attempt was to use "maxLength", but that fails because the input may contain HTML formatting.
Besides the two obvious options, (1) expecting users to be smart and/or (2) implementing a preview, how else could we solve this?
I am out of ideas; I'll also accept an explanation of why it is impossible.
Technology used: Java, Wicket 1.4.x

I'm not a web-development expert so this may not be a precise enough answer.
Using JavaScript, you can get the effective size of an HTML element once it has been rendered in the browser. One solution could therefore be to render the page server-side and check whether the result exceeds the size you expect.
This may not guarantee a correct result, however, because the server may render the page differently from the client. Alternatively, you could always accept the input and, when the client renders the page, have a snippet of JavaScript check (client-side this time) whether the rendered result is okay. If it isn't, the script can redirect the client to an error page so that the user can edit the input.
What I'd do, however, is change the layout so that nothing breaks if the user enters too much text. The CSS overflow property could be a start. You could also implement better solutions in JavaScript, such as dynamically shrinking the text size until it fits.
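If a server-side check is wanted without a browser, a much cruder option than real rendering is to strip the HTML, treat &lt;br&gt; and closing block tags as line breaks, and word-wrap the remaining text at an assumed number of characters per line. This is a rough, hypothetical sketch, not a substitute for measuring the rendered overlay: it ignores fonts, font sizes, inline formatting and HTML entities, skips blank lines, and the characters-per-line value is an assumption you would calibrate against the real overlay.

```java
import java.util.regex.Pattern;

/** Hypothetical, approximate line counter for user-entered HTML. */
final class LineEstimator {
    // Tags treated as hard line breaks (case-insensitive).
    private static final Pattern BREAK = Pattern.compile("(?i)<br\\s*/?>|</p>|</div>|</li>");
    // Any remaining tag is simply removed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static int estimateLines(String html, int charsPerLine) {
        String text = TAG.matcher(BREAK.matcher(html).replaceAll("\n")).replaceAll("");
        int lines = 0;
        for (String segment : text.split("\n", -1)) {
            String trimmed = segment.trim();
            if (!trimmed.isEmpty()) { // blank lines are ignored, so this may undercount
                lines += wrappedLineCount(trimmed, charsPerLine);
            }
        }
        return lines;
    }

    /** Greedy word wrap; words longer than the width spill onto extra lines. */
    static int wrappedLineCount(String paragraph, int width) {
        int lines = 1;
        int current = 0;
        for (String word : paragraph.split("\\s+")) {
            int len = word.length();
            if (current == 0) {
                current = len;
            } else if (current + 1 + len <= width) {
                current += 1 + len;
            } else {
                lines++;
                current = len;
            }
            while (current > width) {
                lines++;
                current -= width;
            }
        }
        return lines;
    }
}
```

A form validator could reject input when estimateLines(input, charsPerLine) exceeds X, leaving the CSS overflow handling as a safety net for the cases the heuristic misjudges.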
