How to save output from XML to PDF - java

I am using JAXB to unmarshall XML. Then I want to take some infos and write it to PDF format using iText. For some reason PDF is created but I can't open the file. I am also using ZFile as this should work on mainframes too, but this shouldn't be a problem here.
Probably I am doing something wrong when writing to PDF file. Here is my code:
package music;
import java.io.*;
import java.sql.Timestamp;
import java.util.Date;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import com.ibm.jzos.ZFile;
import java.io.FileOutputStream;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfWriter;
import music.Music.Artist;
import music.Music.Artist.Album;
import music.Music.Artist.Album.Description;
import music.Music.Artist.Album.Song;
public class MusicXml {
public static void main(String[] args) throws JAXBException, IOException {
ZFile inputZ = null, outputZ = null;
File inputW = null;
PdfWriter outputW = null;
PdfContentByte cb = null;
Document pdf = new Document(PageSize.A4);
Paragraph paragraf = new Paragraph();
// Font
Font fnt12n;
JAXBContext jaxb = null;
Unmarshaller unmarsh = null;
String line = null, sep = " ";
Music music;
Date date = new Date();
Date startDate = new Timestamp(date.getTime());
System.out.println("Start: " + startDate);
jaxb = JAXBContext.newInstance(ObjectFactory.class);
unmarsh = jaxb.createUnmarshaller();
String os = System.getProperty("os.name");
System.out.println("System: " + os);
boolean isWin = os.toLowerCase().contains("wind");
if (!isWin) {
// z/OS:
inputZ = new ZFile(args[0], "rt"); // "rt" - readtext
InputStream inpStream = inputZ.getInputStream();
InputStreamReader streamRdr = new InputStreamReader(inpStream, "CP870");
try {
outputW = PdfWriter.getInstance(pdf, (new ZFile(args[1], "wb")).getOutputStream());
} catch (DocumentException e) {
e.printStackTrace();
}
music = (Music) unmarsh.unmarshal(streamRdr);
} else {
// Windows:
inputW = new File(args[0]);
music = (Music) unmarsh.unmarshal(inputW);
try {
outputW = PdfWriter.getInstance(pdf, new FileOutputStream(args[1]));
} catch (DocumentException e) {
e.printStackTrace();
}
}
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
List<Album> listaAlbumow = artysta.getAlbum();
for (Album album : listaAlbumow) {
Description opis = album.getDescription();
List<Song> listaPiosenek = album.getSong();
for (Song piosenka : listaPiosenek) {
String artistName = artysta.getName();
String albumName = album.getTitle();
int numberOfSongs = listaPiosenek.size();
String albumDescription = album.getDescription().getValue();
String songTitle = piosenka.getTitle();
String songDuration = piosenka.getLength();
line = songTitle + sep + songDuration;
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
pdf.newPage();
try {
paragraf.setAlignment(Element.ALIGN_JUSTIFIED);
paragraf.setSpacingAfter(16f);
paragraf.setLeading(14f);
paragraf.setFirstLineIndent(30f);
paragraf.setFont(fnt12n);
pdf.add(new Paragraph(line, fnt12n));
} catch (DocumentException e) {
e.printStackTrace();
}
}
}
}
date = new Date();
Date stopDate = new Timestamp(date.getTime());
System.out.println("Stop: " + stopDate);
long diffInMs = stopDate.getTime() - startDate.getTime();
float diffInSec = diffInMs / 1000.00f;
System.out.format("Czas przetwarzenia pliku XML: %.2f s.", diffInSec);
System.exit(0);
if (isWin) {
outputW.close();
} else
outputZ.close();
}
}

Apart from the fact that you chose to use an old version of iText, there are a couple of other things wrong with your code. Which documentation did you read? I don't think you've already discovered the official iText web site, otherwise you would have used iText 7 instead of iText 5, and you would have known that no valid document is created if you never close the Document object.
The short answer is that you forgot:
pdf.close();
I see that you close the output stream:
if (isWin) {
outputW.close();
} else
outputZ.close();
}
That doesn't really make sense, because at that point, the PDF hasn't been finalized (for instance: no cross-reference table was created). When you close the document, the underlying output stream is closed implicitly (unless you tell iText explicitly not to do this).
There's also something awkward about the loops you are creating:
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
...
for (Album album : listaAlbumow) {
...
for (Song piosenka : listaPiosenek) {
...
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
pdf.newPage();
...
}
}
}
output.close();
You create the same font over and over again. One PDF can only have 1 version (in your case PDF-1.7) and 1 set of metadata, yet you define that version and metadata over and over again. Finally, you open the document many times whereas you only need to open it once.
This makes more sense:
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
...
for (Album album : listaAlbumow) {
...
for (Song piosenka : listaPiosenek) {
...
pdf.newPage();
...
}
}
}
pdf.close();
As you can see, you open() the Document instance pdf before the loop, to write the PDF headers, and you close() the Document after the loop to write some objects (e.g. fonts), the cross-reference table, and the PDF trailer. As you don't have pdf.close() in your code, all that necessary information is missing from your PDF.
Since you are new at iText, I would highly recommend you not to use versions older than iText 7. You may have discovered that the latest iText 5 release is iText 5.5.13, but that's a maintenance release. In maintenance releases, we only provide bug fixes for our paying customers; we don't add new functionality. For instance: the new PDF specification ISO 32000-2 (aka PDF 2.0) is only available from iText 7.1 on. We won't support PDF 2.0 in older versions.
If you go to the official web site, you'll notice that iText 7.1.1 is the most recent version (iText 7 download page). Where did you find iText, and how come you selected an old version? (This isn't a rhetorical question: we'd like to know to find out how we can improve our web site. We also want to know why so many people post such bad code on Stack Overflow; it's as if they can't find the tutorials. That's sad, because we're investing plenty of time and money in those tutorials. (But if no one is reading them, what's the point???)
You can find more info about iText 7 in the Jump-Start tutorial and the Building Blocks tutorial.
As for converting XML to PDF, why don't you convert to HTML first, and then use the pdfHTML add-on? There's an example on how to do that in chapter 4 of the HTML to PDF tutorial as well as in the ZUGFeRD tutorial.

Related

PDF styles not properly rendering using ASPOSE Library

We are using ASPOSE for content download in Word & PDF format. We don’t have separate code format for PDF or WORD.
There is only one base code format to retrieve data from database,finally will add the response type based on PDF(SaveFormat.PDF) or WORD (SaveFormat.DOCX).
When we change running head styles we get the correct format in WORD as expected but not in PDF.
Note : We do have updated ASPOSE JAR still its not working.
Could you please help on this issue. Thanks in advance.
package com.sam.test;
import java.text.MessageFormat;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.ParagraphAlignment;
import com.aspose.words.SaveFormat;
public class SuperScriptTest {
public static void main(String[] args) throws Exception {
String fontName = "Times New Roman";
String fontColour = "black";
Double fontSize = 15.9996;
Double lineheight = 100.0;
String footerVariable = "";
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.writeln("Aspose Sample document Content for Word  file.");
com.aspose.words.Section currentSection = builder.getCurrentSection();
com.aspose.words.PageSetup pageSetup = currentSection.getPageSetup();
pageSetup.setDifferentFirstPageHeaderFooter(true);
// --- Create header for the first page. ---
pageSetup.setHeaderDistance(0.5 * 72 );
pageSetup.setFooterDistance(0.5 * 72);
builder.moveToHeaderFooter(HeaderFooterType.HEADER_FIRST);
builder.getParagraphFormat().setAlignment(ParagraphAlignment.LEFT);
String runningHead = "Running Head Test";
runningHead = MessageFormat
.format("<span style=\"margin:0px; font-family:{0}; font-size:{1}px; color:{2}; line-height:{3}%;\">{4}</span>",
fontName, fontSize, fontColour, lineheight,
runningHead);
if (!doc.getLastSection().getBody().hasChildNodes())
doc.getLastSection().remove();
builder.insertHtml(runningHead);
doc.save("C:/ASPOSE/Examples/ASPOSEPOC1/Aspose_word_doc.docx",SaveFormat.DOCX);
doc.save("C:/ASPOSE/Examples/ASPOSEPOC1/Aspose_pdf_doc.pdf",SaveFormat.PDF);
}
}

How do I specify fonts for a specific encoding?

I am using flying saucer library and trying to add a custom font for a specific encoding of letters. So that I could make support for unicode characters.
Here is the link of solution that I follow from official guide of flying saucer library http://flyingsaucerproject.github.io/flyingsaucer/r8/guide/users-guide-R8.html#xil_33.
Below is the code,
public void convertHtmlToPdf(String html, String css, OutputStream out) {
try {
html = correctHtml(html);
html = getFormedHTMLWithCSS(html, css);
HtmlCleaner cleaner = new HtmlCleaner();
TagNode rootTagNode = cleaner.clean(html);
CleanerProperties cleanerProperties = cleaner.getProperties();
XmlSerializer xmlSerializer = new PrettyXmlSerializer(cleanerProperties);
String cleanedHtml = xmlSerializer.getAsString(rootTagNode);
File fontFile = new File("/Verdana.ttf");
FontFactory.register(fontFile.getAbsolutePath());
ITextRenderer r = new ITextRenderer();
r.getFontResolver().addFont(fontFile.getAbsolutePath(), BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
r.setDocumentFromString(cleanedHtml);
r.layout();
r.createPDF(out);
r.finishPDF();
} catch (Exception e) {
e.printStackTrace();
logger.error(e.getMessage(), e);
}
}
But Still I am unable to encode certain characters. Like,
'■' : '■',
'▲' : '▲',
For '■' i am getting &x25a0; in generated pdf, and likewise for other characters that I try to encode.

Data validation fails when copy pasting data in excel made through apache poi

I am making one .xls file using apache poi. I'm including some data validations also as shown.
ObservableList<String> objectstatusList = UpgradeWorkBench.wsData.getObjectStatusDevMan("Test", "testing");
String[] strStatus = new String[objectstatusList.size()];
objectstatusList.toArray(strStatus);
CellRangeAddressList addressListStatus = new CellRangeAddressList(0, 65535, 9, 9);
DVConstraint dvConstraintStatus = DVConstraint.createExplicitListConstraint(strStatus);
dataValidationStatus = new HSSFDataValidation(addressListStatus, dvConstraintStatus);
dataValidationStatus.setSuppressDropDownArrow(false);
dataValidationStatus.setErrorStyle(ErrorStyle.STOP);
The validation is applied correctly if I enter any data manually. But if I copy paste data from some other cell the validation fails, also the validation gets completely removed from that cell. Can anyone please help on how can i validate data even on copy paste.
I saw many links but couldnt get the correct solution
Since apache poi cannot create macros, the only way is to have a template with the needed macro and creating the result from that template.
Example:
Have a template.xls with a worksheet Sheet1 and the macro from http://spreadsheetpage.com/index.php/tip/ensuring_that_data_validation_is_not_deleted/ located in the code module for that worksheet.
Edit 2020-09-03:
The above link is dead now. But see https://superuser.com/questions/870926/restrict-paste-into-dropdown-cells-in-excel. There the used name is "DataValidationRange" instead of "ValidationRange". This would must be considered.
Then you could use following code:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.DataValidation.ErrorStyle;
import org.apache.poi.hssf.usermodel.*;
import java.io.*;
class ReadAndWriteFromTemplateWithMacro {
public static void main(String[] args) {
try {
String templateName = "template.xls";
String resultName = "result.xls";
String sheetName = "Sheet1";
String[] strStatus = new String[]{"on", "off", "maybe"};
FileInputStream template = new FileInputStream(templateName);
Workbook wb = WorkbookFactory.create(template);
Sheet sheet = wb.getSheet(sheetName);
if (sheet instanceof HSSFSheet) {
CellRangeAddressList addressListStatus = new CellRangeAddressList(0, 65535, 9, 9);
DVConstraint dvConstraint = DVConstraint.createExplicitListConstraint(strStatus);
DataValidation dataValidation = new HSSFDataValidation(addressListStatus, dvConstraint);
dataValidation.setSuppressDropDownArrow(false);
dataValidation.setErrorStyle(ErrorStyle.STOP);
sheet.addValidationData(dataValidation);
//create a named range for the data validation as described in http://spreadsheetpage.com/index.php/tip/ensuring_that_data_validation_is_not_deleted/
Name name = wb.createName();
name.setNameName("ValidationRange");
String reference = addressListStatus.getCellRangeAddress(0).formatAsString(sheetName, true);
name.setRefersToFormula(reference);
}
FileOutputStream output = new FileOutputStream(resultName);
wb.write(output);
wb.close();
} catch (InvalidFormatException ifex) {
} catch (FileNotFoundException fnfex) {
} catch (IOException ioex) {
}
}
}
Now the result.xls will contain that macro also. If macros are enabled, then this macro will prevent destroy data valitation by pasting over.

How to remove headers and footer from pdf file using pdfbox in java

I am using Pdf Parser to convert pdf to text.Below is my code to convert pdf to text file using java.
My PDF file contains Following Data:
Data Sheet(Header)
PHP Courses for PHP Professionals(Header)
Networking Academy
We live in an increasingly connected world, creating a global economy and a growing need for technical skills. Networking Academy delivers information technology skills to over 500,000 students a year in more than 165 countries worldwide. Networking Academy students have the opportunity to participate in a powerful and consistent learning experience that is supported by high quality, online curricula and assessments, instructor training, hands-on labs, and classroom interaction. This experience ensures the same level of qualifications and skills regardless of where in the world a student is located.
All copyrights reserved.(Footer).
Sample code:
public class PDF_TEST {
PDFParser parser;
String parsedText;
PDFTextStripper pdfStripper;
PDDocument pdDoc;
COSDocument cosDoc;
PDDocumentInformation pdDocInfo;
// PDFTextParser Constructor
public PDF_TEST() {
}
// Extract text from PDF Document
String pdftoText(String fileName) {
File f = new File(fileName);
if (!f.isFile()) {
return null;
}
try {
parser = new PDFParser(new FileInputStream(f));
} catch (Exception e) {
return null;
}
try {
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
} catch (Exception e) {
e.printStackTrace();
try {
if (cosDoc != null) cosDoc.close();
if (pdDoc != null) pdDoc.close();
} catch (Exception e1) {
e.printStackTrace();
}
return null;
}
return parsedText;
}
// Write the parsed text from PDF to a file
void writeTexttoFile(String pdfText, String fileName) {
try {
PrintWriter pw = new PrintWriter(fileName);
pw.print(pdfText);
pw.close();
} catch (Exception e) {
e.printStackTrace();
}
}
//Extracts text from a PDF Document and writes it to a text file
public static void test() {
String args[]={"C://Sample.pdf","C://Sample.txt"};
if (args.length != 2) {
System.exit(1);
}
PDFTextParser pdfTextParserObj = new PDFTextParser();
String pdfToText = pdfTextParserObj.pdftoText(args[0]);
if (pdfToText == null) {
}
else {
pdfTextParserObj.writeTexttoFile(pdfToText, args[1]);
}
}
public static void main(String args[]) throws IOException
{
test();
}
}
The above code works for extracting pdf to text.But my requirement is to ignore Header and Footer and extract only content from pdf file.
Required output:
Networking Academy
We live in an increasingly connected world, creating a global economy and a growing need for technical skills. Networking Academy delivers information technology skills to over 500,000 students a year in more than 165 countries worldwide. Networking Academy students have the opportunity to participate in a powerful and consistent learning experience that is supported by high quality, online curricula and assessments, instructor training, hands-on labs, and classroom interaction. This experience ensures the same level of qualifications and skills regardless of where in the world a student is located.
Please suggest me how to do this.
Thanks.
In general there is nothing special about header or footer texts in PDFs. It is possible to tag that material differently, but tagging is optional and the OP did not provide a sample PDF to check.
Thus, some manual work (or somewhat failure intensive image analysis) generally is necessary to find the regions on the pages for header, content, and footer material.
As soon as you have the coordinates for these regions, though, you can use the PDFTextStripperByAreawhich extends the PDFTextStripper to collect text by regions. Simply define a region for the page content using the largest rectangle including the content but excluding headers and footers, and after pdfStripper.getText(pdDoc) call getTextForRegion for the defined region.
You can use PDFTextStripperByArea to remove "Header" and "Footer" by pdf file.
Code in java using PDFBox.
public String fetchTextByRegion(String path, String filename, int pageNumber) throws IOException {
File file = new File(path + filename);
PDDocument document = PDDocument.load(file);
//Rectangle2D region = new Rectangle2D.Double(x,y,width,height);
Rectangle2D region = new Rectangle2D.Double(0, 100, 550, 700);
String regionName = "region";
PDFTextStripperByArea stripper;
PDPage page = document.getPage(pageNumber + 1);
stripper = new PDFTextStripperByArea();
stripper.addRegion(regionName, region);
stripper.extractRegions(page);
String text = stripper.getTextForRegion(regionName);
return text;
}

iText: split a PDF into several PDF (1 per page)

What I want is that: given a 10-pages-pdf-file, I want to display each page of that pdf inside a table on the web. What is the best way to achieve this? I guess one way is to split this 10-pages-pdf-file into 10 1-pages pdf, and programmatically display each pdf onto a row of a table. Can I do this with iText? Is there a better way to accomplish this?
From Split a PDF file (using iText)
import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
public class SplitPDFFile {
/**
* #param args
*/
public static void main(String[] args) {
try {
String inFile = args[0].toLowerCase();
System.out.println ("Reading " + inFile);
PdfReader reader = new PdfReader(inFile);
int n = reader.getNumberOfPages();
System.out.println ("Number of pages : " + n);
int i = 0;
while ( i < n ) {
String outFile = inFile.substring(0, inFile.indexOf(".pdf"))
+ "-" + String.format("%03d", i + 1) + ".pdf";
System.out.println ("Writing " + outFile);
Document document = new Document(reader.getPageSizeWithRotation(1));
PdfCopy writer = new PdfCopy(document, new FileOutputStream(outFile));
document.open();
PdfImportedPage page = writer.getImportedPage(reader, ++i);
writer.addPage(page);
document.close();
writer.close();
}
}
catch (Exception e) {
e.printStackTrace();
}
/* example :
java SplitPDFFile d:\temp\x\tx.pdf
Reading d:\temp\x\tx.pdf
Number of pages : 3
Writing d:\temp\x\tx-001.pdf
Writing d:\temp\x\tx-002.pdf
Writing d:\temp\x\tx-003.pdf
*/
}
}
Many iText examples here.
With PDDocument you can do so very easily.
You just have to use a Java List of PDDocument type and Splitter function to split a document.
List<PDDocument> Pages=new ArrayList<PDDocument>();
PDDocument.load(filePath);
try {
Splitter splitter = new Splitter();
Pages = splitter.split(document);
}
catch(Exception e) {
e.printStackTrace(); // print reason and line number where error exist
}
I can't comment, but this line in the most voted answer
Document document = new Document(reader.getPageSizeWithRotation(1));
should be
Document document = new Document(reader.getPageSizeWithRotation(i+1));
to get the correct pdf size if other pages have different page size (it know it's rare)

Categories