Wrong column order when extracting text with PDFBox (Java)

I am trying to extract the text of a PDF document, but the columns come out in the wrong order: the last column is extracted first.
How can I reorder them? See document here.

Related

Not able to apply mso-number-format in HTML for HTML string to Excel conversion using Aspose Cells

I am trying to convert an HTML string to Excel using Aspose Cells. Negative numbers should be shown in parentheses with thousands separators, and the numbers should be aligned so that the decimal points line up in a single column, as shown in the image below.
For the HTML string, I build a table in jQuery and insert mso-number-format into the style of each cell, as shown below.
I am adding the image because I was not able to edit it to show the exact format.
html += '<td style='+msoStyle+'>(1,234)</td><td style='+msoStyle+'>1,234</td>';
The mso-number-format value can be found in MS Excel under Number --> Custom.
When I check the generated HTML, the CSS looks like this:
HTML page:
When I regenerate the HTML string from the HTML table, the CSS style looks like this:
The Excel file generated by Aspose Cells gives me the output below.
Aspose Cells version: 19.6
Java version: 1.8
Can you please help me find where exactly I am going wrong? If you have a different solution for what I am trying to achieve, please let me know.
Regards,
Ashish M
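As a possible alternative, and not something taken from the question, the parenthesized, comma-grouped text could be produced on the Java side before the HTML is built. The sketch below uses only java.text.DecimalFormat; the class name ParenNumberFormat is hypothetical. Note the assumption that a string result is acceptable: the cell would then hold text rather than a number, and decimal-point alignment is not handled here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper: formats negatives as (1,234) and positives as 1,234.
public class ParenNumberFormat {

    public static String format(double value) {
        // The part after ';' is the negative subpattern; its parentheses
        // replace the default minus sign. US symbols fix ',' as the grouping separator.
        DecimalFormat fmt = new DecimalFormat(
                "#,##0;(#,##0)", DecimalFormatSymbols.getInstance(Locale.US));
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(-1234)); // (1,234)
        System.out.println(format(1234));  // 1,234
    }
}
```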

How can I remove blank columns from an Excel file through Java?

I have created a testdata.xlsx which contains data in the form of key-value pairs. The Java code that I have implemented writes a key and the corresponding value below it. In some cases the key-value pair is blank, which creates a blank column. Please suggest how I can remove the blank columns so that the existing data is laid out consecutively, with no blank columns in between.
I couldn't find any solution.
If you are sure that the file will contain only 2 rows (1: key, 2: value), check the first 2 cells in each column. If both are null, remove that column using the Apache POI API.
If not, you have to check every cell in the column for any character.
You can use sheet.shiftColumns(colno + 1, endcolno, -1);.
Or just sheet.setColumnHidden(colno, true); to hide the column, which is easier when
formulas are involved.
Finding the blank columns is something you have probably already done.
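The blank-column check described above can be sketched without any spreadsheet library. This is a minimal illustration that assumes the sheet has already been read into a String[][] grid; the class and method names (BlankColumns, findBlankColumns, removeBlankColumns) are hypothetical and not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper working on an in-memory grid, not on a POI Sheet.
public class BlankColumns {

    /** Returns the indices of columns in which every cell is null or whitespace. */
    public static List<Integer> findBlankColumns(String[][] rows) {
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }
        List<Integer> blank = new ArrayList<>();
        for (int col = 0; col < width; col++) {
            boolean empty = true;
            for (String[] row : rows) {
                if (col < row.length && row[col] != null && !row[col].trim().isEmpty()) {
                    empty = false; // found content, so the column is not blank
                    break;
                }
            }
            if (empty) {
                blank.add(col);
            }
        }
        return blank;
    }

    /** Returns a copy of the grid with the blank columns dropped. */
    public static String[][] removeBlankColumns(String[][] rows) {
        List<Integer> blank = findBlankColumns(rows);
        String[][] result = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            List<String> kept = new ArrayList<>();
            for (int c = 0; c < rows[r].length; c++) {
                if (!blank.contains(c)) {
                    kept.add(rows[r][c]);
                }
            }
            result[r] = kept.toArray(new String[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] sheet = {
            {"name", null, "city"},
            {"Ann",  "",   "Oslo"}
        };
        for (String[] row : removeBlankColumns(sheet)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

When working directly on a POI sheet, the same list of indices could presumably be walked from right to left, shifting or hiding each blank column as suggested in the answers, so that earlier indices stay valid.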

Apache POI for docx Insert Text On Specific Page

I'm trying to create a table of contents for my Word document (docx).
Apache POI is still too buggy. document.createTOC() does not produce anything unless it is placed at the end, and sometimes it doesn't give the correct page numbers.
document.enforceUpdateFields() doesn't do anything!
So I thought I would write my own method that creates the table of contents. However, I will call it at the end, but I need its output to be inserted at the beginning!
In other words, suppose that at some point in my program the document has some text on the first and second pages, and I haven't saved it yet. How do I insert content at the beginning of the first page?
I haven't tried this yet, but after you write the document, reload it.
Then try the following:
List<XWPFParagraph> paragraphs = document.getParagraphs();
XWPFRun run = paragraphs.get(0).insertNewRun(0); // first paragraph, 0 is the position
run.setText("your data here");

Show pdf table header only for first table on a page

I have a PDF document generated by iText in a Java application, with a lot of tables that share the same columns. To save some space in the document, I would like to show the header only on the first table on each page. Any ideas? Thanks.

How to insert content in the middle of a page in a PDF using IText

I have a requirement to insert content into the middle of a page in a PDF.
The content may be a dynamic table or an image.
My idea was to first split the PDF into 2 parts, then take the new content to be added and append it by replacing a placeholder field.
The splitting is called tiling in iText, and here is an example of it:
http://itextpdf.com/examples/iia.php?id=116
The code above has 2 drawbacks:
1. It splits the page into 16 parts. That is specific to the example, but I still can't figure out a way to split the page into only 2 parts.
2. Each split part is converted to a complete page, which distorts its proportions.
Rearranging the content is another problem.
The remaining content should be re-ordered in append mode, but so far I have only found code that adds complete new pages rather than just the content.
I have found code that adds content to the PDF by replacing a placeholder:
float[] fieldPosition= pdfTemplate.getAcroFields().getFieldPositions("tableField");
PdfPTable table = buildTable();
PdfContentByte cb = stamper.getOverContent(1);
table.writeSelectedRows(0, -1, fieldPosition[1],fieldPosition[4],cb);
Please help me to solve this requirement.
PDF is a presentation format, not an editing format. In other words, it is not designed to allow content insertion with the original content reflowing gracefully. As a consequence, no tool (at least, none that I know of, and surely not iText) will enable you to achieve what you were given as a requirement.
My advice:
refuse the assignment since it's not feasible, or
get your hands on the original document, insert the desired extra content, and then convert it to PDF.
