I generate A4 PDFs with the wkhtmltopdf library; however, I can't figure out how to generate a two-column A3 PDF (such as the attached math exam paper) via Java.
E.g.: two-column math exam paper
If anyone has some suggestions, I would really appreciate it.
Merging PDFs doesn't work for me.
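One possible approach, sketched under assumptions rather than as a tested recipe: instead of merging PDFs, build the two columns into the HTML itself and ask wkhtmltopdf for an A3 landscape page with `--page-size A3 --orientation Landscape`. The columns here are a two-cell table because CSS multi-column support in wkhtmltopdf's older WebKit engine may be limited. The class name, file names, and sample content are hypothetical, and wkhtmltopdf is assumed to be on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TwoColumnA3 {

    // Wraps the left and right column content in a two-cell table that fills the page width.
    static String buildHtml(String leftColumn, String rightColumn) {
        return "<!DOCTYPE html><html><head><meta charset=\"utf-8\"><style>"
            + "table.page { width: 100%; border-collapse: collapse; table-layout: fixed; }"
            + "table.page td { width: 50%; vertical-align: top; padding: 0 10mm; }"
            + "table.page td + td { border-left: 1px solid #000; }" // divider between columns
            + "</style></head><body><table class=\"page\"><tr>"
            + "<td>" + leftColumn + "</td><td>" + rightColumn + "</td>"
            + "</tr></table></body></html>";
    }

    // Command line for an A3 landscape page; assumes wkhtmltopdf is on the PATH.
    static List<String> buildCommand(String htmlFile, String pdfFile) {
        return List.of("wkhtmltopdf", "--page-size", "A3", "--orientation", "Landscape",
                       htmlFile, pdfFile);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path html = Path.of("exam.html");
        Files.writeString(html,
            buildHtml("<h2>Part I</h2><p>1. Solve x + 2 = 5.</p>",
                      "<h2>Part II</h2><p>7. Solve 3x = 12.</p>"),
            StandardCharsets.UTF_8);
        Process p = new ProcessBuilder(buildCommand(html.toString(), "exam.pdf"))
            .inheritIO()
            .start();
        System.out.println("wkhtmltopdf exit code: " + p.waitFor());
    }
}
```

Note that this layout does not flow text from the left column into the right one; you decide which questions go into each column.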
I'm trying to perform automatic table extraction from PDFs. I know there are several libraries and methods in Java and Python, but to my surprise, the method that has worked best for me is to convert my PDF to a DOCX document and extract the tables from there (thanks to: How to get pictures and tables from .docx document using apache poi?).
My question is this: assuming that the format conversion may lose information, why are my results better this way? Tabula hasn't been able to do better automatically. To understand this, I have looked for information (e.g. Extracting table contents from a collection of PDF files), but I'm still very confused.
P.S.: For the moment, I have used https://github.com/thoqbk/traprange (a method based on PDFBox), How to extract table as text from the PDF using Python? (PyPDF2), and Tabula. When I get home, I am going to post code and cases; I'm writing from my smartphone.
This question already has answers here:
How to read a Table in a PDF using iText java?
I am trying to read a table (its rows and cells) from a PDF file to get the records in a systematic order.
I have searched Google a lot, but I could not find a good way to do this.
So I want to ask two questions:
Q1: Can we read data from a PDF file?
Q2: Can we read data from any cell of a PDF table?
I am using iText for Java to do this.
Please give me an example of how to do this.
Thanks
The answer to both your questions is: It depends.
Suppose that you have a ZUGFeRD invoice. In that case, the invoice is a PDF/A-3 document that has an embedded file in the CII XML format. It is very easy to extract this XML and read it to get all the necessary information about the invoice. Embedding or attaching a file that contains the source of the data used to create the PDF (or the same data in a form other than PDF) is a technique that makes exactly what you need possible.
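Once the embedded XML has been extracted from the PDF/A-3 file (with iText or another PDF library; that step is not shown), reading it needs only the JDK's built-in XML APIs. A minimal sketch, assuming a heavily trimmed, hypothetical CII-style fragment with placeholder namespaces; the real ZUGFeRD schema is far larger, and the XPath matches on the local element name only, to avoid setting up a namespace context.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class CiiReader {

    // Returns the trimmed text of the first element whose local name is GrandTotalAmount.
    static String grandTotal(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPEs: a cheap guard against XXE when the file comes from a third party.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("//*[local-name()='GrandTotalAmount']", doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Trimmed, illustrative fragment with placeholder namespaces; not a valid ZUGFeRD invoice.
        String xml = "<rsm:CrossIndustryInvoice"
            + " xmlns:rsm=\"urn:example:rsm\" xmlns:ram=\"urn:example:ram\">"
            + "<ram:GrandTotalAmount>119.00</ram:GrandTotalAmount>"
            + "</rsm:CrossIndustryInvoice>";
        System.out.println(grandTotal(xml)); // 119.00
    }
}
```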
You can extract text from a PDF. This is explained in questions such as PDF text extraction using iText, but you only get the raw text without formatting. In many cases, a PDF consists of a bunch of text and lines put on a canvas at absolute positions. A word on the page does not know if it's part of a sentence, part of a cell, etc. Unless:
If the PDF is a Tagged PDF, then the PDF also contains information about the structure of the content. For instance: the content will contain tags that indicate structures such as tables, table headers, table rows, table cells. If you are talking about Tagged PDFs, then it's possible to extract the text in a structured way.
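With a Tagged PDF, the structure tree already says which content is a table (`Table`), a row (`TR`), or a cell (`TD`/`TH`), so no guessing from coordinates is needed. A minimal sketch of walking such a tree, assuming it has already been read into a simple hypothetical `Node` type (role, text, children); reading the real structure tree requires a PDF library, and this is not iText's API. Row groups such as `TBody` are handled by the recursion.

```java
import java.util.ArrayList;
import java.util.List;

class TaggedTableReader {

    // Hypothetical, simplified structure element with a role such as "Table", "TR", "TD", "TH".
    record Node(String role, String text, List<Node> children) {
        Node(String role, List<Node> children) { this(role, "", children); }
        Node(String role, String text) { this(role, text, List.of()); }
    }

    // Collects every TR found under the given node as a list of its cell texts.
    static List<List<String>> rows(Node node) {
        List<List<String>> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<List<String>> out) {
        if (node.role().equals("TR")) {
            List<String> cells = new ArrayList<>();
            for (Node cell : node.children()) {
                if (cell.role().equals("TD") || cell.role().equals("TH")) {
                    cells.add(textOf(cell));
                }
            }
            out.add(cells);
            return;
        }
        for (Node child : node.children()) collect(child, out); // descend through THead, TBody, ...
    }

    // Concatenates the text of a node and all of its descendants.
    private static String textOf(Node node) {
        StringBuilder sb = new StringBuilder(node.text());
        for (Node child : node.children()) sb.append(textOf(child));
        return sb.toString();
    }

    public static void main(String[] args) {
        Node table = new Node("Table", List.of(
            new Node("TR", List.of(new Node("TH", "Name"), new Node("TH", "Phone"))),
            new Node("TR", List.of(new Node("TD", "Ann"), new Node("TD", "555-0100")))));
        System.out.println(rows(table)); // [[Name, Phone], [Ann, 555-0100]]
    }
}
```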
In the past, we did a project where we received credit card statements from VISA, MasterCard, AmEx, ... We had to extract all the expenses and store them as records in a database. We were able to achieve this because the format of the statements was predictable: all VISA statements are created alike, hence we were able to find the pattern that allowed us to extract the data.
It goes without saying that we do not share the code we used to do this. The company that paid us for doing that project would not be pleased.
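To illustrate the general idea only (this is not the code from that project), here is a minimal sketch of pattern-based parsing of extracted raw text. It assumes a hypothetical line layout "MM/DD DESCRIPTION AMOUNT"; real statements differ per issuer, and amounts with thousands separators would need a different pattern.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatementParser {

    record Expense(String date, String description, BigDecimal amount) {}

    // Hypothetical layout: date, description, amount with two decimals, e.g. "03/12 BOOKSHOP 23.99".
    private static final Pattern LINE =
        Pattern.compile("^(\\d{2}/\\d{2})\\s+(.+?)\\s+(-?\\d+\\.\\d{2})$");

    static List<Expense> parse(String rawText) {
        List<Expense> expenses = new ArrayList<>();
        for (String line : rawText.split("\\R")) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) { // lines that do not fit (headers, totals) are skipped
                expenses.add(new Expense(m.group(1), m.group(2), new BigDecimal(m.group(3))));
            }
        }
        return expenses;
    }

    public static void main(String[] args) {
        String text = "STATEMENT PERIOD 03/01 - 03/31\n"
            + "03/12 BOOKSHOP 23.99\n"
            + "03/15 CITY PARKING 4.50\n";
        parse(text).forEach(System.out::println);
    }
}
```

Because every statement from one issuer shares a layout, one pattern per issuer is usually enough; a new layout means a new pattern.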
I am trying to create dynamic forms for a web application using Excel spreadsheets.
The form has some relatively advanced rules like the following:
Field A > Field B.
Field C must be shown if Check Box D is checked.
Field E is read-only and must be the sum of A and B.
Field G is the sum of E and A, or the sum of F and A if B is empty.
Combinations of rules.
These are just examples of some of them.
The server is implemented and runs in Java, which I guess narrows the possible solutions. My first thought is to parse the Excel spreadsheet with all the required information into XML to enable either server-side or client-side conversion. This is basically because I have found tools that work on either side.
So my question is whether anyone knows of a tool that can perform this conversion or if anyone knows of a better solution?
I have looked at https://github.com/davidmoten/xsd-forms but I am not sure it can implement all the required rules and license information is sparse.
I realize this question is quite vague but so is the task. Any help is appreciated.
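Whatever conversion tool is chosen, it can help to pin the rules down as code first. A minimal sketch of the example rules evaluated on the Java server, assuming numeric fields are nullable `Double`s (null meaning empty) and reading rule G as "G = F + A when B is empty, otherwise G = E + A"; the `FormState` type and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FormRules {

    // Hypothetical form state; null means the field is empty.
    record FormState(Double a, Double b, boolean checkBoxD, Double f) {}

    // Field C is shown only when check box D is checked.
    static boolean isFieldCVisible(FormState s) {
        return s.checkBoxD();
    }

    // Field E is read-only: A + B, or empty if either input is empty.
    static Double fieldE(FormState s) {
        return (s.a() == null || s.b() == null) ? null : s.a() + s.b();
    }

    // Field G: F + A if B is empty, otherwise E + A (one reading of the rule).
    static Double fieldG(FormState s) {
        if (s.a() == null) return null;
        if (s.b() == null) return s.f() == null ? null : s.f() + s.a();
        return fieldE(s) + s.a();
    }

    // Returns validation messages; an empty list means the form is valid.
    static List<String> validate(FormState s) {
        List<String> errors = new ArrayList<>();
        if (s.a() != null && s.b() != null && !(s.a() > s.b())) {
            errors.add("Field A must be greater than Field B");
        }
        return errors;
    }

    public static void main(String[] args) {
        FormState s = new FormState(5.0, 3.0, true, null);
        System.out.println(validate(s));        // []
        System.out.println(isFieldCVisible(s)); // true
        System.out.println(fieldE(s));          // 8.0
        System.out.println(fieldG(s));          // 13.0
    }
}
```

Even if the client also evaluates these rules for immediate feedback, the server should re-check them on submit.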
I think you can use Apache POI for reading the Excel sheet and JAXB for generating XML from the data read from the sheet.
You can read more details about reading Excel files with Apache POI over here.
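JAXB was removed from the JDK in Java 11 and now needs a separate dependency. As an alternative that needs nothing beyond the JDK, here is a minimal sketch that turns rows already read from the sheet (for example with POI; not shown) into XML using the built-in DOM and `Transformer` APIs. The element names `fields`/`field` and the column layout (name, type, rule) are hypothetical.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SheetToXml {

    // Each row: [field name, field type, rule expression], as read from the spreadsheet.
    static String toXml(List<List<String>> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("fields");
        doc.appendChild(root);
        for (List<String> row : rows) {
            Element field = doc.createElement("field");
            field.setAttribute("name", row.get(0));
            field.setAttribute("type", row.get(1));
            field.setTextContent(row.get(2)); // the serializer escapes characters such as '<' and '&'
            root.appendChild(field);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(List.of(
            List.of("A", "number", "A > B"),
            List.of("E", "number", "A + B"))));
    }
}
```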
I'm developing a desktop application to manage people and telephones, and also to generate (export) a list of telephones (with a summary of the cities) that can be printed (e.g. as a PDF). The telephone-management part is ready and was made with Java and SWT/JFace. Exporting the list in a print-friendly format is what has become an issue.
I tried exporting the list in HTML with CSS, but the result is not the same in different browsers.
I was thinking about generating it in LaTeX, but creating a style is getting too complicated (I need an A7 page size, smaller fonts...).
What file format can be used to export this list? Is there an easy way to generate printable stuff?
Edit: forgot to mention that the file will be sent to a company to be printed.
Thanks!
Generate a PDF; it will look the same no matter what browser they use. You can use iText to create the PDF; it is fairly straightforward for a simple PDF.
You could just draw an image; it will stay the same on different systems and it's easy to print. By drawing it, you can style it as you imagine, without learning any document format. It should be easy to draw a simple table.
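A minimal sketch of the "draw it yourself" idea with Java2D, assuming an A7 page (74 × 105 mm) rendered at 300 dpi, which works out to 874 × 1240 pixels; the entries, font size, row height, and file name are hypothetical choices.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;

class PhoneListImage {

    static final int DPI = 300;

    // Converts millimetres to pixels at the chosen resolution.
    static int mmToPx(double mm) {
        return (int) Math.round(mm / 25.4 * DPI);
    }

    // Draws a two-column table (name, phone) on a white A7 page.
    static BufferedImage render(List<String[]> entries) {
        int width = mmToPx(74), height = mmToPx(105);
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                           RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);
        g.setColor(Color.BLACK);
        g.setFont(new Font(Font.SANS_SERIF, Font.PLAIN, 28)); // 28 px is about 6.7 pt at 300 dpi
        g.setStroke(new BasicStroke(2));

        int margin = mmToPx(5), rowHeight = 50, col2 = width / 2;
        int y = margin;
        for (String[] e : entries) {
            g.drawString(e[0], margin, y + 36); // name column
            g.drawString(e[1], col2, y + 36);   // phone column
            y += rowHeight;
            g.drawLine(margin, y, width - margin, y); // rule under each row
        }
        g.dispose();
        return img;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = render(List.of(
            new String[] {"Ann Smith", "555-0100"},
            new String[] {"Bob Jones", "555-0101"}));
        ImageIO.write(img, "png", new File("phones.png"));
    }
}
```

The PNG written this way may not carry any DPI information, so tell the print shop the intended physical size.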
Plain text is a very friendly format for me. That said, this could also be done with HTML and CSS if you keep the style complexity to a minimum. Try reading:
http://www.smashingmagazine.com/2010/06/07/the-principles-of-cross-browser-css-coding/
And be careful when choosing your properties!
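If plain text is enough, fixed-width columns built with `String.format` keep the layout intact when printed in a monospaced font. A minimal sketch with hypothetical entries and a hypothetical 20-character name column:

```java
import java.util.List;

class PlainTextList {

    // Left-aligns the name in a 20-character column, then the phone number.
    // Names longer than 20 characters push the phone column to the right.
    static String format(List<String[]> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-20s %s%n", "Name", "Phone"));
        for (String[] e : entries) {
            sb.append(String.format("%-20s %s%n", e[0], e[1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(List.of(
            new String[] {"Ann Smith", "555-0100"},
            new String[] {"Bob Jones", "555-0101"})));
    }
}
```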
I am supposed to generate a graph from the results/execution of my algorithm. I have heard something about using a CSV file in Excel and generating the graph. I have no idea what this CSV file is or how to do it. I googled "CSV file", but the answers I got were in connection with databases.
I am asking if someone can show me or point me to a tutorial where this kind of thing has been done before. For instance, I have to generate a graph from a Quicksort algorithm and also generate another graph with many algorithms at the same time.
Need help please
Thanks
CSV == "comma separated values". It's a file that has one row per line, where each value is separated by a comma.
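For the graph itself, one common route is to have the program write its measurements to a CSV file, open that file in Excel, and insert a chart (for example a scatter chart with n on the x-axis). A minimal sketch, assuming you want running time against input size for a simple quicksort; the sizes, the Lomuto-partition quicksort, and the file name `timings.csv` are illustrative choices. Adding one column per algorithm gives the "many algorithms in one graph" case.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Random;

class QuicksortTimings {

    // Simple in-place quicksort (Lomuto partition, last element as pivot).
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }

    // Builds CSV text: a header line, then one "n,milliseconds" line per input size.
    static String timingsCsv(int[] sizes, long seed) {
        Random rnd = new Random(seed);
        StringBuilder csv = new StringBuilder("n,quicksort_ms\n");
        for (int n : sizes) {
            int[] data = rnd.ints(n).toArray(); // random input, so the worst case is unlikely
            long start = System.nanoTime();
            quicksort(data, 0, data.length - 1);
            double ms = (System.nanoTime() - start) / 1_000_000.0;
            // Locale.ROOT keeps '.' as the decimal mark so the commas stay column separators.
            csv.append(n).append(',').append(String.format(Locale.ROOT, "%.3f", ms)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) throws IOException {
        String csv = timingsCsv(new int[] {10_000, 100_000, 1_000_000}, 42L);
        Files.writeString(Path.of("timings.csv"), csv);
        System.out.print(csv);
    }
}
```

A single run per size is noisy, especially early on while the JIT warms up; averaging several runs gives smoother curves.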
I'm not sure how this is relevant to your algorithm or generating graphs.
Since you're using Java, you can easily generate a nice looking graph using GraphViz from AT&T. I think it's a terrific tool.
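GraphViz draws node-and-edge diagrams from a plain-text DOT description, so it suits structures such as a recursion tree rather than timing charts, and the Java side only has to write text. A minimal sketch, assuming you want the call tree of a quicksort run (the class name, node labels, and file names are hypothetical): each non-empty subarray range becomes a node and each recursive call an edge.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class QuicksortCallTree {

    private final StringBuilder dot =
        new StringBuilder("digraph quicksort {\n  node [shape=box];\n");

    // Sorts a[lo..hi] and records each non-empty call as a node, linked to its parent call.
    private void sort(int[] a, int lo, int hi, String parent) {
        if (lo > hi) return;                      // empty ranges are not drawn
        String id = "\"" + lo + ".." + hi + "\""; // each non-empty range occurs once
        dot.append("  ").append(parent == null ? id : parent + " -> " + id).append(";\n");
        if (lo == hi) return;
        int pivot = a[hi], i = lo;                // Lomuto partition
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        sort(a, lo, i - 1, id);
        sort(a, i + 1, hi, id);
    }

    // Sorts the array in place and returns the DOT text of the call tree.
    static String toDot(int[] a) {
        QuicksortCallTree tree = new QuicksortCallTree();
        tree.sort(a, 0, a.length - 1, null);
        return tree.dot.append("}\n").toString();
    }

    public static void main(String[] args) throws IOException {
        Files.writeString(Path.of("quicksort.dot"), toDot(new int[] {5, 2, 8, 1, 9, 3}));
        // Then render it with the GraphViz command-line tool:
        //   dot -Tpng quicksort.dot -o quicksort.png
    }
}
```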