I want to add a line to a PDF document using Java

I am currently using PDFBox and reading a .pdf file located in folder 1.
First I list all the PDF files found in the folder.
Then I check the number of pages in each file.
Now I want to go to the very end of the file, below the footer, and add an image that the printer can recognise as a signal to staple the pages, since it will know it has reached the end of the file.
So far I have the list of files and the number of pages.
What do I use to go to the end of the last page and write there?
Should I convert the .pdf file into text, or should I be able to use PDPageContentStream?
This is the code I am currently using. As a test, I am trying to insert the string "AAA" into the last page of the PDF file. The project runs with no errors, but for some reason the string is not being inserted into the PDF.
package pdfviewer;
import java.io.*;
import java.util.*;
import java.util.List;
import java.io.IOException;
import org.apache.pdfbox.PDFReader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class Main {
/**
* @param args the command line arguments
*/
public static List flist()
{
List listfile = new ArrayList();
String path = "C:/1";
String files;
File folder = new File(path);
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++)
{
if (listOfFiles[i].isFile())
{
files = listOfFiles[i].getName();
if (files.endsWith(".pdf") || files.endsWith(".PDF"))
{
listfile.add(listOfFiles[i]);
}
}
}
System.out.println(listfile);
return listfile;
}
public static void CheckPages(List a)
{
String dir = null;
Object[] arraydir = a.toArray(new Object[0]);
for (int i=0; i< arraydir.length; i++)
{
int pages = 0;
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 12.0f;
dir = arraydir[i].toString();
System.out.println(dir);
try {
PDDocument pdoc = PDDocument.load(dir);
List allPages = pdoc.getDocumentCatalog().getAllPages();
pages = pdoc.getNumberOfPages();
System.out.println(allPages);
int f = pages;
System.out.println(pages);
PDPage page = (PDPage) allPages.get(i);
//System.out.println(page);
PDRectangle pageSize = page.findMediaBox();
float stringWidth = font.getStringWidth( "AAA" );
float centeredPosition = (pageSize.getWidth() - (stringWidth*fontSize)/1000f)/2f;
PDPageContentStream contentStream = new PDPageContentStream(pdoc,page,true,true);
//System.out.println(contentStream);
contentStream.beginText();
contentStream.setFont( font, fontSize );
contentStream.moveTextPositionByAmount( centeredPosition, 30 );
contentStream.drawString( "AAA" );
contentStream.endText();
contentStream.close();
pdoc.close();
}
catch (Exception e)
{
System.err.println("An exception occurred in parsing the PDF Document."+ e.getMessage());
}
}
}
public static void main(String[] args)
{
List l = new ArrayList();
l = pdfviewer.Main.flist();
pdfviewer.Main.CheckPages(l);
}
}
Thanks for your attention
The code I was using above is correct.
The problem is that the PDF files being generated are version 1.2; that is the reason why I am not able to edit the PDF document.
Does anyone know what I should do if I'm using version 1.2, since I can't really upgrade it?
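To confirm which version a file actually declares, the header line can be read directly: a PDF file starts with a marker of the form %PDF-x.y. The following is only a minimal sketch using the standard library; the class and method names (PdfHeaderVersion, parseVersion, readVersion) are made up for illustration and are not part of PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not a PDFBox class.
final class PdfHeaderVersion {

    /** Returns the "x.y" part of a "%PDF-x.y" marker, or null if none is found. */
    static String parseVersion(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        int start = text.indexOf("%PDF-");
        if (start < 0) {
            return null;
        }
        int from = start + "%PDF-".length();
        int end = from;
        // Consume digits and dots only, so the trailing line break is excluded.
        while (end < text.length()
                && (Character.isDigit(text.charAt(end)) || text.charAt(end) == '.')) {
            end++;
        }
        return end > from ? text.substring(from, end) : null;
    }

    /** Reads at most the first 1024 bytes of the file and parses the header. */
    static String readVersion(Path pdf) throws IOException {
        byte[] buf = new byte[1024];
        int total = 0;
        try (InputStream in = Files.newInputStream(pdf)) {
            int n;
            while (total < buf.length
                    && (n = in.read(buf, total, buf.length - total)) > 0) {
                total += n;
            }
        }
        return parseVersion(Arrays.copyOf(buf, total));
    }
}
```

Note that this only reports the version in the header; newer files may also carry a version entry in the document catalog, which this sketch does not look at.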

You can look at the examples supplied with the library.
There are two files that are of interest to you:
1- AddImageToPDF.java (AddImageToPDF.java on Google Code Search)
2- AddMessageToEachPage.java (AddMessageToEachPage.java on Google Code Search)
The second one adds a message to every page, but you can modify it to work with the last page only. According to the PDFBox user guide, they should be found under the folder src/main/java/org/apache/pdfbox/examples.
I have added links on Google Code Search in case you have trouble locating the files.
I have not worked with the library or tried the examples, and I am quite sure you will need to modify the code a little to suit your needs for the location of the added line/image.
In any case, if this helps you and you get a working solution, please add the solution so that others can benefit from it.
EDIT:
After seeing the code posted by the question author, I have added a modification to make it work.
I also took the liberty of making a few changes for clarity.
import java.io.File;
import java.io.FileFilter;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class Main {
/**
* @param args the command line arguments
*/
public static final FileFilter pdfFileFilter = new FileFilter() {
public boolean accept(File file) {
return file.isFile() && file.getName().toLowerCase().endsWith(".pdf");
}
};
public static void closeQuietly(PDDocument doc) {
if (doc != null) {
try {
doc.close();
} catch (Exception exception) {
//do something here if you wish like logging
}
}
}
public static void CheckPages(File[] sourcePdfFiles,String textToInsert, String prefix) {
for (File sourcePdfFile : sourcePdfFiles) {
PDFont font = PDType1Font.HELVETICA_BOLD;
float fontSize = 12.0f;
PDDocument pdoc = null;
try {
pdoc = PDDocument.load(sourcePdfFile);
List allPages = pdoc.getDocumentCatalog().getAllPages();
PDPage lastPage = (PDPage) allPages.get(allPages.size() - 1);
PDRectangle pageSize = lastPage.findMediaBox();
float stringWidth = font.getStringWidth(textToInsert);
float centeredPosition = (pageSize.getWidth() - (stringWidth * fontSize) / 1000f) / 2f;
PDPageContentStream contentStream = new PDPageContentStream(pdoc, lastPage, true, true);
contentStream.beginText();
contentStream.setFont(font, fontSize);
contentStream.moveTextPositionByAmount(centeredPosition, 30);
contentStream.drawString(textToInsert);
contentStream.endText();
contentStream.close();
File resultFile = new File(sourcePdfFile.getParentFile(), prefix + sourcePdfFile.getName());
pdoc.save(resultFile.getAbsolutePath());
} catch (Exception e) {
System.err.println("An exception occurred in parsing the PDF Document." + e.getMessage());
} finally {
closeQuietly(pdoc);
}
}
}
public static void main(String[] args) {
File pdfFilesFolder = new File("C:\\1");
File[] pdfFiles = pdfFilesFolder.listFiles(pdfFileFilter);
//when a file is processed, the result will be saved in a new file having the location of the source file
//and the same name of source file prefixed with this
String modifiedFilePrefix = "modified-";
CheckPages(pdfFiles,"AAA", modifiedFilePrefix);
}
}
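The centering arithmetic used above can be checked on its own. PDFont.getStringWidth returns the width in thousandths of the font size, which is why the code divides by 1000f. The sketch below is plain Java with no PDFBox dependency; the class name TextCentering and the sample numbers are assumptions for illustration only.

```java
// Hypothetical helper that mirrors the centeredPosition formula from the answer.
final class TextCentering {

    /** Converts a width given in 1/1000 font units into points at the given font size. */
    static float widthInPoints(float widthInFontUnits, float fontSize) {
        return widthInFontUnits * fontSize / 1000f;
    }

    /** X coordinate at which the text must start so that it is horizontally centered. */
    static float centeredX(float pageWidth, float widthInFontUnits, float fontSize) {
        return (pageWidth - widthInPoints(widthInFontUnits, fontSize)) / 2f;
    }

    /** Zero-based index of the last page, matching allPages.get(allPages.size() - 1). */
    static int lastPageIndex(int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("document has no pages");
        }
        return pageCount - 1;
    }
}
```

For example, on a page 612 points wide, a string measuring 2000 font units at 12 pt is 24 points wide, so it starts at x = 294. The y value of 30 in the answer places the baseline 30 points above the bottom edge of the page, since PDF coordinates start at the bottom-left corner by default.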

Related

Extract Multiple Embedded Images from a single PDF Page using PDFBox

Friends, I am using PDFBox 2.0.6. I have been successful in extracting images from the PDF file, but right now it creates a single image for each PDF page. The issue is that a PDF page can contain any number of images, and I want each embedded image to be extracted as a separate image.
Here is the code:
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
public class DemoPdf {
public static void main(String args[]) throws Exception {
//Loading an existing PDF document
File file = new File("C:/Users/ADMIN/Downloads/Vehicle_Photographs.pdf");
PDDocument document = PDDocument.load(file);
//Instantiating the PDFRenderer class
PDFRenderer renderer = new PDFRenderer(document);
File imageFolder = new File("C:/Users/ADMIN/Desktop/image");
for (int page = 0; page < document.getNumberOfPages(); ++page) {
//Rendering an image from the PDF document
BufferedImage image = renderer.renderImage(page);
//Writing the image to a file
ImageIO.write(image, "JPEG", new File(imageFolder+"/" + page +".jpg"));
System.out.println("Image created"+ page);
}
//Closing the document
document.close();
}
}
Is it possible in PDFBox to extract all embedded images as separate images? Thanks.
Yes, it is possible to extract all images from all the pages in a PDF.
You may refer to this link: extract images from pdf using PDFBox.
The basic idea is to extend PDFStreamEngine and override the processOperator method. Call PDFStreamEngine.processPage for all the pages. If the object passed to processOperator is an image object, get a BufferedImage from it and save it.
Extend PDFStreamEngine and override processOperator, something like this:
@Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
{
String operation = operator.getName();
if( "Do".equals(operation) )
{
COSName objectName = (COSName) operands.get( 0 );
PDXObject xobject = getResources().getXObject( objectName );
if( xobject instanceof PDImageXObject)
{
PDImageXObject image = (PDImageXObject)xobject;
int imageWidth = image.getWidth();
int imageHeight = image.getHeight();
// save the image locally; getImage() already returns a new BufferedImage
BufferedImage bImage = image.getImage();
ImageIO.write(bImage,"PNG",new File("c:\\temp\\image_"+imageNumber+".png"));
imageNumber++;
}
else
{
}
}
else
{
super.processOperator( operator, operands);
}
}
This answer is similar to @jprism's, but it is intended for anyone who just wants to copy and paste ready-to-use code with a demo.
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.UUID;
public class ExtractImagesUseCase extends PDFStreamEngine{
private final String filePath;
private final String outputDir;
// Constructor
public ExtractImagesUseCase(String filePath,
String outputDir){
this.filePath = filePath;
this.outputDir = outputDir;
}
// Execute
public void execute(){
File file = new File(filePath);
// try-with-resources closes the document even if processing fails
try(PDDocument document = PDDocument.load(file)){
for(PDPage page : document.getPages()){
processPage(page);
}
}catch(IOException e){
e.printStackTrace();
}
}
@Override
protected void processOperator(Operator operator, List<COSBase> operands) throws IOException{
String operation = operator.getName();
if("Do".equals(operation)){
COSName objectName = (COSName) operands.get(0);
PDXObject pdxObject = getResources().getXObject(objectName);
if(pdxObject instanceof PDImageXObject){
// Image
PDImageXObject image = (PDImageXObject) pdxObject;
BufferedImage bImage = image.getImage();
// File
String randomName = UUID.randomUUID().toString();
File outputFile = new File(outputDir,randomName + ".png");
// Write image to file
ImageIO.write(bImage, "PNG", outputFile);
}else if(pdxObject instanceof PDFormXObject){
PDFormXObject form = (PDFormXObject) pdxObject;
showForm(form);
}
}
else super.processOperator(operator, operands);
}
}
Demo
public class ExtractImageDemo{
public static void main(String[] args){
String filePath = "C:\\Users\\John\\Downloads\\Documents\\sample-file.pdf";
String outputDir = "C:\\Users\\John\\Downloads\\Documents\\Output";
ExtractImagesUseCase useCase = new ExtractImagesUseCase(
filePath,
outputDir
);
useCase.execute();
}
}
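One thing to watch with the demo above: ImageIO.write does not create missing folders, so the output directory has to exist before the first image is saved. Below is a rough sketch of a small writer that creates the directory up front and uses sequential file names instead of random UUIDs. It uses only the standard library; the class name NumberedImageWriter and the image_001.png naming scheme are my own assumptions.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical helper; it could be called from processOperator instead of ImageIO.write.
final class NumberedImageWriter {
    private final Path outputDir;
    private int counter = 0;

    NumberedImageWriter(Path outputDir) throws IOException {
        // Creates the directory (and any missing parents); no error if it already exists.
        this.outputDir = Files.createDirectories(outputDir);
    }

    /** Writes the image as PNG and returns the file it was written to. */
    File write(BufferedImage image) throws IOException {
        counter++;
        String name = String.format(Locale.ROOT, "image_%03d.png", counter);
        File target = outputDir.resolve(name).toFile();
        // ImageIO.write returns false when no suitable writer is found.
        if (!ImageIO.write(image, "PNG", target)) {
            throw new IOException("No PNG writer available for " + target);
        }
        return target;
    }
}
```

With sequential names the files sort in the order in which the images were encountered, which makes it easier to match them back to the pages.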

Java - Merge multiple images to a single PDF using PDFBox

I was able to merge multiple PDF files into a single PDF using the code below -
public void mergePDF() {
File file1 = new File("inputPDF/001.pdf");
File file2 = new File("inputPDF/002.pdf");
File file3 = new File("inputPDF/003.pdf");
File file4 = new File("inputPDF/004.pdf");
try {
PDDocument doc1 = PDDocument.load(file1);
PDDocument doc2 = PDDocument.load(file2);
PDDocument doc3 = PDDocument.load(file3);
PDDocument doc4 = PDDocument.load(file4);
PDFMergerUtility PDFmerger = new PDFMergerUtility();
PDFmerger.setDestinationFileName("outputImages/merged.pdf");
System.out.println("Destination path set to "+PDFmerger.getDestinationFileName());
PDFmerger.addSource(file1);
PDFmerger.addSource(file2);
PDFmerger.addSource(file3);
PDFmerger.addSource(file4);
//Merging the documents
PDFmerger.mergeDocuments();
doc1.close();
doc2.close();
doc3.close();
doc4.close();
System.out.println("Done!");
} catch (IOException e) {
e.printStackTrace();
}
}
However, my requirement is to merge multiple images (JPG, PNG) to a single PDF as well.
Is it possible to merge multiple images to a single PDF using PDFBox?
Since I struggled with this task, here's my code. The merged document is PDF/A-1b compliant
import com.google.common.io.Resources;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Calendar;
import java.util.List;
import javax.xml.transform.TransformerException;
import org.apache.commons.io.FileUtils;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.color.PDOutputIntent;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.preflight.parser.PreflightParser;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.schema.XMPBasicSchema;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public final class PDFMerger {
private static final Logger LOG = LoggerFactory.getLogger(PDFMerger.class);
private static final String OUTPUT_CONDITION_IDENTIFIER = "sRGB IEC61966-2.1";
public static final String DOCUMENT_CREATOR = "Mr. Meeseeks";
public static final String DOCUMENT_SUBJECT = "Great subject";
public static final String DOCUMENT_TITLE = "Here goes your title";
/**
* Creates a compound PDF document from a list of input documents.
* <p>
* The merged document is PDF/A-1b compliant
*
* @param sources list of source document streams (PDFs or images).
* @return compound PDF document as an in-memory output stream.
* @throws IOException if anything goes wrong during PDF merge.
*/
public static ByteArrayOutputStream mergeFiles(final List<InputStream> sources) throws IOException {
Path mergeDirectory = Files.createTempDirectory("merge-" + System.currentTimeMillis());
try (ByteArrayOutputStream mergedPDFOutputStream = new ByteArrayOutputStream()) {
LOG.debug("Merging {} source documents into one PDF", sources.size());
PDFMergerUtility mixedPdfMerger = createMixedPdfMerger(sources, mergedPDFOutputStream, mergeDirectory);
mergeFileStreams(mergedPDFOutputStream, mixedPdfMerger);
return mergedPDFOutputStream;
} catch (Exception e) {
if (!(e instanceof IOException)) {
throw new IOException("PDF merge problem", e);
}
throw (IOException) e;
} finally {
FileUtils.deleteDirectory(mergeDirectory.toFile());
sources.forEach(IOUtils::closeQuietly);
}
}
private static void mergeFileStreams(ByteArrayOutputStream mergedPDFOutputStream, PDFMergerUtility pdfMerger)
throws IOException, BadFieldValueException, TransformerException {
LOG.debug("Initialising PDF merge utility");
try (COSStream cosStream = new COSStream()) {
// PDF and XMP properties must be identical, otherwise document is not PDF/A compliant
pdfMerger.setDestinationDocumentInformation(createPDFDocumentInfo());
pdfMerger.setDestinationMetadata(createXMPMetadata(cosStream));
pdfMerger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
LOG.debug("PDF merge successful, size = {} bytes", mergedPDFOutputStream.size());
}
}
@SuppressWarnings("UnstableApiUsage")
private static PDFMergerUtility createMixedPdfMerger(List<InputStream> sources, ByteArrayOutputStream mergedPDFOutputStream, Path mergeDirectory) throws IOException {
PDFMergerUtility pdfMerger = new PDFMergerUtility();
byte[] colorProfile = org.apache.commons.io.IOUtils.toByteArray(Resources.getResource("sRGB.icc"));
for (InputStream source : sources) {
File file = streamToFile(mergeDirectory, source);
if (isPdf(file)) {
pdfMerger.addSource(file);
} else {
pdfMerger.addSource(imageToPDDocument(mergeDirectory, file, colorProfile));
}
}
pdfMerger.setDestinationStream(mergedPDFOutputStream);
return pdfMerger;
}
private static PDDocumentInformation createPDFDocumentInfo() {
LOG.debug("Setting document info (title, author, subject) for merged PDF");
PDDocumentInformation documentInformation = new PDDocumentInformation();
documentInformation.setTitle(DOCUMENT_TITLE);
documentInformation.setCreator(DOCUMENT_CREATOR);
documentInformation.setSubject(DOCUMENT_SUBJECT);
return documentInformation;
}
private static PDMetadata createXMPMetadata(COSStream cosStream)
throws BadFieldValueException, TransformerException, IOException {
LOG.debug("Setting XMP metadata (title, author, subject) for merged PDF");
XMPMetadata xmpMetadata = XMPMetadata.createXMPMetadata();
// PDF/A-1b properties
PDFAIdentificationSchema pdfaSchema = xmpMetadata.createAndAddPFAIdentificationSchema();
pdfaSchema.setPart(1);
pdfaSchema.setConformance("B");
pdfaSchema.setAboutAsSimple("");
// Dublin Core properties
DublinCoreSchema dublinCoreSchema = xmpMetadata.createAndAddDublinCoreSchema();
dublinCoreSchema.setTitle(DOCUMENT_TITLE);
dublinCoreSchema.addCreator(DOCUMENT_CREATOR);
dublinCoreSchema.setDescription(DOCUMENT_SUBJECT);
// XMP Basic properties
XMPBasicSchema basicSchema = xmpMetadata.createAndAddXMPBasicSchema();
Calendar creationDate = Calendar.getInstance();
basicSchema.setCreateDate(creationDate);
basicSchema.setModifyDate(creationDate);
basicSchema.setMetadataDate(creationDate);
basicSchema.setCreatorTool(DOCUMENT_CREATOR);
// Create and return XMP data structure in XML format
try (ByteArrayOutputStream xmpOutputStream = new ByteArrayOutputStream();
OutputStream cosXMPStream = cosStream.createOutputStream()) {
new XmpSerializer().serialize(xmpMetadata, xmpOutputStream, true);
cosXMPStream.write(xmpOutputStream.toByteArray());
return new PDMetadata(cosStream);
}
}
private static File imageToPDDocument(Path mergeDirectory, File file, byte[] colorProfile) throws IOException {
try (PDDocument doc = new PDDocument()) {
PDImageXObject pdImage = PDImageXObject.createFromFileByContent(file, doc);
drawPage(doc, pdImage);
doc.getDocumentCatalog().addOutputIntent(createColorScheme(doc, colorProfile));
File pdfFile = Files.createTempFile(mergeDirectory, String.valueOf(System.currentTimeMillis()), ".tmp").toFile();
doc.save(pdfFile);
return pdfFile;
}
}
private static void drawPage(PDDocument doc, PDImageXObject pdImage) throws IOException {
PDPage page;
pdImage.getCOSObject().setItem(COSName.SMASK, COSName.NONE);
boolean isLandscapeMode = pdImage.getWidth() > pdImage.getHeight();
if (isLandscapeMode) {
page = new PDPage(new PDRectangle(PDRectangle.A4.getHeight(), PDRectangle.A4.getWidth()));
float scale = Math.min(Math.min(PDRectangle.A4.getWidth() / pdImage.getHeight(), PDRectangle.A4.getHeight() / pdImage.getWidth()), 1);
float width = pdImage.getWidth() * scale;
float height = pdImage.getHeight() * scale;
// center the image
float startWidth = (PDRectangle.A4.getHeight() - width) / 2;
float startHeight = (PDRectangle.A4.getWidth() - height) / 2;
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
contentStream.drawImage(pdImage, startWidth, startHeight, width, height);
}
} else {
page = new PDPage(PDRectangle.A4);
float scale = Math.min(Math.min(PDRectangle.A4.getWidth() / pdImage.getWidth(), PDRectangle.A4.getHeight() / pdImage.getHeight()), 1);
float width = pdImage.getWidth() * scale;
float height = pdImage.getHeight() * scale;
// try to center the image
float startWidth = (PDRectangle.A4.getWidth() - width) / 2;
float startHeight = (PDRectangle.A4.getHeight() - height) / 2;
try (PDPageContentStream contentStream = new PDPageContentStream(doc, page)) {
contentStream.drawImage(pdImage, startWidth, startHeight, width, height);
}
}
doc.addPage(page);
}
private static PDOutputIntent createColorScheme(PDDocument doc, byte[] colorProfile) throws IOException {
PDOutputIntent intent = new PDOutputIntent(doc, new ByteArrayInputStream(colorProfile));
intent.setInfo(OUTPUT_CONDITION_IDENTIFIER);
intent.setOutputCondition(OUTPUT_CONDITION_IDENTIFIER);
intent.setOutputConditionIdentifier(OUTPUT_CONDITION_IDENTIFIER);
intent.setRegistryName("http://www.color.org");
return intent;
}
private static boolean isPdf(File file) {
try {
PreflightParser preflightParser = new PreflightParser(file);
preflightParser.parse();
return true;
} catch (Exception e) {
return false;
}
}
private static File streamToFile(Path tempDirectory, InputStream in) throws IOException {
final Path tempFile = Files.createTempFile(tempDirectory, String.valueOf(System.currentTimeMillis()), ".tmp");
try (FileOutputStream out = new FileOutputStream(tempFile.toFile())) {
IOUtils.copy(in, out);
}
return tempFile.toFile();
}
}
You can take a look at this gist for an option to merge pdf files as well.
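The two branches of drawPage above do the same calculation with width and height swapped: scale the image down so it fits the page (never up), then center it. A minimal sketch of that calculation as a single function follows; it is plain Java without PDFBox, and the name FitToPage.place is an assumption for illustration.

```java
// Hypothetical helper that mirrors the scale-and-center logic of drawPage.
final class FitToPage {

    /**
     * Returns {x, y, width, height} for drawing an image on a page:
     * the image is shrunk to fit if needed, never enlarged, and then centered.
     */
    static float[] place(float pageWidth, float pageHeight,
                         float imageWidth, float imageHeight) {
        float scale = Math.min(
                Math.min(pageWidth / imageWidth, pageHeight / imageHeight), 1f);
        float width = imageWidth * scale;
        float height = imageHeight * scale;
        float x = (pageWidth - width) / 2f;
        float y = (pageHeight - height) / 2f;
        return new float[] {x, y, width, height};
    }
}
```

For a landscape image, pass the page dimensions swapped (A4 height as pageWidth, A4 width as pageHeight), which corresponds to the first branch of drawPage. The four values map onto the arguments of contentStream.drawImage(pdImage, x, y, width, height).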
You need to convert the images to a PDF first. See "How can I convert a PNG file to PDF using java?" or "Create PDF from a PNG image Or Java Panel" for an example of how to do this.
After that, use PDFBox to merge the resulting PDFs.
I have used the iText library to merge images and convert them to a PDF.
Here is the code:
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(image_path+"\\"+image_name+".pdf"));
document.open();
Paragraph p = new Paragraph();
File files[] = new File(path).listFiles();
PdfPTable table = new PdfPTable(1);
for (File file : files) {
table.setWidthPercentage(100);
table.addCell(createImageCell(file.getAbsolutePath()));
}
document.add(table);
document.close();
Hope It helps

Generating a .docx document from a .dotx template with docx4j (in an XPages application)

I'm using docx4j in an XPages application to create Word documents containing content from an XPage. The Word document (in .docx format) is created based on a template (also in .docx format). This all works fine. However, when I change the template from .docx to .dotx format, the generated Word document (.docx) cannot be opened. When I try to open the document, I get an error saying that the content causes problems.
Can anyone tell me how to convert a .dotx file to a .docx file using docx4j?
The code I am currently using is:
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.ContentAccessor;
import org.slf4j.impl.*;
import java.io.FileInputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.docx4j.wml.*;
import org.apache.commons.lang3.StringUtils;
import java.util.Enumeration;
import java.util.Map;
import java.util.Iterator;
import java.util.Vector;
import lotus.domino.Document;
import lotus.domino.*;
import org.docx4j.openpackaging.parts.WordprocessingML.DocumentSettingsPart;
import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.openpackaging.parts.relationships.Namespaces;
public class JavaTemplateDocument {
public void mainCode(Session session, Document currDoc, String empLang, String templateType, String sArt) throws Exception {
Database dbCurr = session.getCurrentDatabase();
String viewName = "vieTemplateLookup";
View tview = dbCurr.getView(viewName);
Vector viewKey = new Vector();
viewKey.addElement(empLang);
viewKey.addElement(templateType);
Document templateDoc = tview.getDocumentByKey(viewKey);
if (tview.getDocumentByKey(viewKey) == null ) System.out.println("templateDoc is NULL");
Item itmNotesFields = templateDoc.getFirstItem("NotesFieldList");
Item itmWordFields = templateDoc.getFirstItem("WordFieldList");
Vector<String[]> notesFields = itmNotesFields.getValues();
Vector<String[]> wordFields = itmWordFields.getValues();
int z = notesFields.size();
int x = wordFields.size();
Enumeration e1 = notesFields.elements();
Enumeration e2 = wordFields.elements();
WordprocessingMLPackage template = getTemplate("C:\\Temp\\AZG Sample Template.dotx","C:\\Temp\\AZG Sample Template.docx");
for (int y = 0; y < x; y++) {
if (currDoc.hasItem(String.valueOf(notesFields.elementAt(y)))) {
Item itmNotesName = currDoc.getFirstItem(String.valueOf(notesFields.elementAt(y)));
replacePlaceholder(template, itmNotesName.getText(), String.valueOf(wordFields.elementAt(y))); }
else {
replacePlaceholder(template, "", String.valueOf(wordFields.elementAt(y)));
}
}
writeDocxToStream(template, "C:\\Temp\\AZG Sample Document.docx");
createResponseDocument(dbCurr, currDoc, templateDoc, sArt);
}
private void createResponseDocument(Database dbCurr, Document currDoc, Document templateDoc, String sArt) throws NotesException{
Document respDoc = dbCurr.createDocument(); // create the response document
String refVal = currDoc.getUniversalID();
respDoc.appendItemValue("IsDocTemplate", "1");
if (currDoc.hasItem("Name")) {
respDoc.appendItemValue("Name", currDoc.getItemValue("Name"));}
else {System.out.println("Name is not available"); }
if (currDoc.hasItem("Firstname")) {
respDoc.appendItemValue("Firstname", currDoc.getItemValue("Firstname"));}
else {System.out.println("Firstname is not available"); }
if (currDoc.hasItem("ReferenceTypeTexts")) {
respDoc.appendItemValue("ReferenceTypeTexts", currDoc.getItemValue("ReferenceTypeTexts"));}
else {System.out.println("ReferenceTypeTexts is not available"); }
if (currDoc.hasItem("ReferenceType")) {
respDoc.appendItemValue("ReferenceType", currDoc.getItemValue("ReferenceType"));}
else {System.out.println("ReferenceType is not available"); }
System.out.println("Append Form value");
respDoc.appendItemValue("Form", "frmRespTempl");
respDoc.makeResponse(currDoc);
RichTextItem body = respDoc.createRichTextItem("Body");
body.embedObject(1454, "", "C:\\Temp\\AZG Sample Document.docx", null);
respDoc.save();
}
/*
* Create a simple word document that we can use as a template.
* For this just open Word, create a new document and save it as template.docx.
* This is the word template we'll use to add content to.
* The first thing we need to do is load this document with docx4j.
*/
private WordprocessingMLPackage getTemplate(String source, String target) throws Docx4JException, FileNotFoundException, IOException {
String WORDPROCESSINGML_DOCUMENT = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
final ContentType contentType = new ContentType(WORDPROCESSINGML_DOCUMENT);
String templatePath = source;
File sourceFile = new File(source);
File targetFile = new File(target);
copyFileUsingFileChannels(sourceFile, targetFile);
WordprocessingMLPackage template = WordprocessingMLPackage.load(new FileInputStream(targetFile));
ContentTypeManager ctm = wordMLPackage.getContentTypeManager();
ctm.addOverrideContentType(new URI("/word/document.xml"),WORDPROCESSINGML_DOCUMENT);
DocumentSettingsPart dsp = new DocumentSettingsPart();
CTSettings settings = Context.getWmlObjectFactory().createCTSettings();
dsp.setJaxbElement(settings);
wordMLPackage.getMainDocumentPart().addTargetPart(dsp);
// Create external rel
RelationshipsPart rp = RelationshipsPart.createRelationshipsPartForPart(dsp);
org.docx4j.relationships.Relationship rel = new org.docx4j.relationships.ObjectFactory().createRelationship();
rel.setType( Namespaces.ATTACHED_TEMPLATE );
rel.setTarget(templatePath);
rel.setTargetMode("External");
rp.addRelationship(rel); // addRelationship sets the rel's @Id
settings.setAttachedTemplate(
(CTRel)XmlUtils.unmarshalString("<w:attachedTemplate xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" r:id=\"" + rel.getId() + "\"/>", Context.jc, CTRel.class)
);
return template;
}
private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
List<Object> result = new ArrayList<Object>();
if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();
if (obj.getClass().equals(toSearch))
result.add(obj);
else if (obj instanceof ContentAccessor) {
List<?> children = ((ContentAccessor) obj).getContent();
for (Object child : children) {
result.addAll(getAllElementFromObject(child, toSearch));
}
}
return result;
}
/*
* This will look for all the Text elements in the document, and those that match are replaced with the value we specify.
*/
private void replacePlaceholder(WordprocessingMLPackage template, String name, String placeholder ) {
List<Object> texts = getAllElementFromObject(template.getMainDocumentPart(), Text.class);
for (Object text : texts) {
Text textElement = (Text) text;
if (textElement.getValue().equals(placeholder)) {
textElement.setValue(name);
}
}
}
/*
* write the document back to a file
*/
private void writeDocxToStream(WordprocessingMLPackage template, String target) throws IOException, Docx4JException {
File f = new File(target);
template.save(f);
}
/*
* Example code for replaceParagraph
*
String placeholder = "SJ_EX1";
String toAdd = "jos\ndirksen";
replaceParagraph(placeholder, toAdd, template, template.getMainDocumentPart());
*/
private void replaceParagraph(String placeholder, String textToAdd, WordprocessingMLPackage template, ContentAccessor addTo) {
// 1. get the paragraph
List<Object> paragraphs = getAllElementFromObject(template.getMainDocumentPart(), P.class);
P toReplace = null;
for (Object p : paragraphs) {
List<Object> texts = getAllElementFromObject(p, Text.class);
for (Object t : texts) {
Text content = (Text) t;
if (content.getValue().equals(placeholder)) {
toReplace = (P) p;
break;
}
}
}
// we now have the paragraph that contains our placeholder: toReplace
// 2. split into separate lines
String as[] = StringUtils.splitPreserveAllTokens(textToAdd, '\n');
for (int i = 0; i < as.length; i++) {
String ptext = as[i];
// 3. copy the found paragraph to keep styling correct
P copy = (P) XmlUtils.deepCopy(toReplace);
// replace the text elements from the copy
List<?> texts = getAllElementFromObject(copy, Text.class);
if (texts.size() > 0) {
Text textToReplace = (Text) texts.get(0);
textToReplace.setValue(ptext);
}
// add the paragraph to the document
addTo.getContent().add(copy);
}
// 4. remove the original one
((ContentAccessor)toReplace.getParent()).getContent().remove(toReplace);
}
/*
* A set of hashmaps that contain the name of the placeholder to replace and the value to replace it with.
*
* Map<String,String> repl1 = new HashMap<String, String>();
repl1.put("SJ_FUNCTION", "function1");
repl1.put("SJ_DESC", "desc1");
repl1.put("SJ_PERIOD", "period1");
Map<String,String> repl2 = new HashMap<String, String>();
repl2.put("SJ_FUNCTION", "function2");
repl2.put("SJ_DESC", "desc2");
repl2.put("SJ_PERIOD", "period2");
Map<String,String> repl3 = new HashMap<String, String>();
repl3.put("SJ_FUNCTION", "function3");
repl3.put("SJ_DESC", "desc3");
repl3.put("SJ_PERIOD", "period3");
replaceTable(new String[]{"SJ_FUNCTION","SJ_DESC","SJ_PERIOD"}, Arrays.asList(repl1,repl2,repl3), template);
*/
private void replaceTable(String[] placeholders, List<Map<String, String>> textToAdd,
WordprocessingMLPackage template) throws Docx4JException, JAXBException {
List<Object> tables = getAllElementFromObject(template.getMainDocumentPart(), Tbl.class);
// 1. find the table
Tbl tempTable = getTemplateTable(tables, placeholders[0]);
List<Object> rows = getAllElementFromObject(tempTable, Tr.class);
// first row is header, second row is content
if (rows.size() == 2) {
// this is our template row
Tr templateRow = (Tr) rows.get(1);
for (Map<String, String> replacements : textToAdd) {
// 2 and 3 are done in this method
addRowToTable(tempTable, templateRow, replacements);
}
// 4. remove the template row
tempTable.getContent().remove(templateRow);
}
}
private Tbl getTemplateTable(List<Object> tables, String templateKey) throws Docx4JException, JAXBException {
for (Iterator<Object> iterator = tables.iterator(); iterator.hasNext();) {
Object tbl = iterator.next();
List<?> textElements = getAllElementFromObject(tbl, Text.class);
for (Object text : textElements) {
Text textElement = (Text) text;
if (textElement.getValue() != null && textElement.getValue().equals(templateKey))
return (Tbl) tbl;
}
}
return null;
}
private static void addRowToTable(Tbl reviewtable, Tr templateRow, Map<String, String> replacements) {
Tr workingRow = (Tr) XmlUtils.deepCopy(templateRow);
List<?> textElements = getAllElementFromObject(workingRow, Text.class);
for (Object object : textElements) {
Text text = (Text) object;
String replacementValue = (String) replacements.get(text.getValue());
if (replacementValue != null)
text.setValue(replacementValue);
}
reviewtable.getContent().add(workingRow);
}
private static void copyFileUsingFileChannels(File source, File dest)
throws IOException {
// try-with-resources closes both channels and avoids a NullPointerException
// in cleanup when opening one of the streams fails
try (FileChannel inputChannel = new FileInputStream(source).getChannel();
FileChannel outputChannel = new FileOutputStream(dest).getChannel()) {
outputChannel.transferFrom(inputChannel, 0, inputChannel.size());
}
}
}
Broadly, there are a few things that comprise the difference between a template (.dotx) and a document (.docx). This means you have a few things that you need to do -- it's not as simple as just changing the file extension, whether you're saving a doc as a template, or attempting to create a document from a template.
Hopefully this outline will assist:
First do what you've already done: your new document should be a file copy of the template
Change your new WordprocessingMLPackage's document type as appropriate (see WORDPROCESSINGML_TEMPLATE in the ContentTypes class)
Create an attached template and attach it to your document: see the sample code on Github for more detail on that (TemplateAttach.java sample).
Good luck!
Let's hack it.
The new Office formats are just ZIP archives containing XML configuration and data. Try saving the same document both as a template and as a document in MS Word, then compare the two archives. IMHO the core of your problem is in the (packed) file [Content_Types].xml.
They differ in the property:
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
I would expect @benpoole's advice to work (it should alter the content of that file). If it does not, simply edit the content of that file inside the archive (it is just an ordinary ZIP archive, remember).
Disclaimer: a few more files differ as well, and they might need tweaking to make it work.
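To check which of the two content types a given file actually declares, the archive can be opened with the JDK's own ZIP classes. The following is only a minimal sketch: the class name ContentTypesPeek and its methods are made up for illustration and are not part of docx4j, and it assumes Java 9+ for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of docx4j): inspects [Content_Types].xml of a .docx/.dotx file. */
final class ContentTypesPeek {

    // The two main-part content types quoted in the answer above.
    static final String TEMPLATE_MAIN =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml";
    static final String DOCUMENT_MAIN =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";

    /** Returns the raw text of [Content_Types].xml, or null if the archive has no such entry. */
    static String readContentTypes(Path officeFile) throws IOException {
        try (ZipFile zip = new ZipFile(officeFile.toFile())) {
            ZipEntry entry = zip.getEntry("[Content_Types].xml");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    /** True if the package declares its main part as a template. */
    static boolean isTemplate(String contentTypesXml) {
        return contentTypesXml != null && contentTypesXml.contains(TEMPLATE_MAIN);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ContentTypesPeek <file.docx|file.dotx>");
            return;
        }
        String xml = readContentTypes(Path.of(args[0]));
        System.out.println(isTemplate(xml) ? "declares a template" : "does not declare a template");
    }
}
```

Running it against the copied file before and after the conversion shows whether the content type really changed. As the disclaimer above says, other parts of the package may differ too, so this check alone does not prove the file is a valid document.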
I would say that you need to change the returned filename from .docx to .dotx:
copy the file from .docx to .dotx and change this line
body.embedObject(1454, "", "C:\\Temp\\AZG Sample Document.dotx", null);

How to get raw text from pdf file using java

I have some PDF files. Using PDFBox, I have converted them to text and stored the result in text files. Now I want to remove the following from the text files:
Hyperlinks
All special characters
Blank lines
Headers and footers of the PDF files
List markers such as “1)”, “2)”, “a)”, bullets, etc.
I want to get valid text line by line like this:
We propose OntoGain, a method for ontology learning from multi-word concept terms extracted from plain text. OntoGain follows an ontology learning process defined by distinct processing layers. Building upon plain term extraction a concept hierarchy is formed by clustering the extracted concepts. The derived term taxonomy is then enriched with non-taxonomic relations. Several different state-of-the-art methods have been examined for implementing each layer. OntoGain is based upon multi-word term concepts, as multi-word or compound terms are vested with more solid and distinctive semantics than plain single word terms. We opted for a hierarchical clustering method and Formal Concept Analysis (FCA) algorithm for building the term taxonomy. Furthermore an association rule algorithm is applied for revealing non-taxonomic relations. A method which tries to carry out the most appropriate generalization level between a relation's concepts is also implemented. To show proof of concept, a system prototype is implemented. The OntoGain allows transformation of the derived ontology into OWL using Jena Semantic Web Framework1. OntoGain is applied on two separate data sources a medical and computer corpus and its results are compared with similar results obtained by Text2Onto, a state-of-the-art-ontology learning method. The analysis of 11.5 CCD1.1 results indicates that OntoGain performs better than Text2Onto in terms of precision extracts more correct concepts while being more selective extracts fewer but more reasonable concepts.
How can I achieve this?
Using PDFBox, we can achieve this.
Example:
public static void main(String args[]) {
PDFParser parser = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
PDFTextStripper pdfStripper;
String parsedText;
String fileName = "E:\\Files\\Small Files\\PDF\\JDBC.pdf";
File file = new File(fileName);
try {
parser = new PDFParser(new FileInputStream(file));
parser.parse();
cosDoc = parser.getDocument();
pdfStripper = new PDFTextStripper();
pdDoc = new PDDocument(cosDoc);
parsedText = pdfStripper.getText(pdDoc);
System.out.println(parsedText.replaceAll("[^A-Za-z0-9. ]+", ""));
} catch (Exception e) {
e.printStackTrace();
} finally {
// close in finally so the documents are released on success as well as on failure
try {
if (cosDoc != null)
cosDoc.close();
if (pdDoc != null)
pdDoc.close();
} catch (Exception e1) {
e1.printStackTrace();
}
}
}
We can also extract text from PDF files using Apache Tika.
Example:
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
public class WebPagePdfExtractor {
public Map<String, Object> processRecord(String url) {
DefaultHttpClient httpclient = new DefaultHttpClient();
Map<String, Object> map = new HashMap<String, Object>();
try {
HttpGet httpGet = new HttpGet(url);
HttpResponse response = httpclient.execute(httpGet);
HttpEntity entity = response.getEntity();
InputStream input = null;
if (entity != null) {
try {
input = entity.getContent();
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
AutoDetectParser parser = new AutoDetectParser();
ParseContext parseContext = new ParseContext();
parser.parse(input, handler, metadata, parseContext);
map.put("text", handler.toString().replaceAll("\n|\r|\t", " "));
map.put("title", metadata.get(TikaCoreProperties.TITLE));
map.put("pageCount", metadata.get("xmpTPg:NPages"));
map.put("status_code", response.getStatusLine().getStatusCode() + "");
} catch (Exception e) {
e.printStackTrace();
} finally {
if (input != null) {
try {
input.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
} catch (Exception exception) {
exception.printStackTrace();
}
return map;
}
public static void main(String arg[]) {
WebPagePdfExtractor webPagePdfExtractor = new WebPagePdfExtractor();
Map<String, Object> extractedMap = webPagePdfExtractor.processRecord("http://math.about.com/library/q20.pdf");
System.out.println(extractedMap.get("text"));
}
}
You can use iText to do this.
//iText imports
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
For example:
try {
PdfReader reader = new PdfReader(INPUTFILE);
int n = reader.getNumberOfPages();
String str=PdfTextExtractor.getTextFromPage(reader, 2); //Extracting the content from a particular page.
System.out.println(str);
reader.close();
} catch (Exception e) {
System.out.println(e);
}
Another example:
try {
PdfReader reader = new PdfReader("c:/temp/test.pdf");
System.out.println("This PDF has "+reader.getNumberOfPages()+" pages.");
String page = PdfTextExtractor.getTextFromPage(reader, 2);
System.out.println("Page Content:\n\n"+page+"\n\n");
System.out.println("Is this document tampered: "+reader.isTampered());
System.out.println("Is this document encrypted: "+reader.isEncrypted());
reader.close(); // release the file, as in the first example
} catch (IOException e) {
e.printStackTrace();
}
The examples above only extract the text; you need additional processing to remove hyperlinks, bullets, headings and numbering.
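That additional processing can be done on the extracted string with plain regular expressions, independent of the PDF library. Below is a rough sketch under stated assumptions: the class ExtractedTextCleaner and its three patterns are illustrative guesses at what counts as a hyperlink, a list marker and a special character, and they will need tuning for real documents. Headers and footers are not handled here, because they cannot be recognised from a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical post-processing helper; the patterns are assumptions, not a complete solution. */
final class ExtractedTextCleaner {

    // http(s):// and www. links, up to the next whitespace
    private static final Pattern URL = Pattern.compile("(?:https?://|www\\.)\\S+");
    // Leading list markers such as "1)", "2.", "a)" or a bullet character
    private static final Pattern LIST_MARKER =
            Pattern.compile("^\\s*(?:\\d+[.)]|[A-Za-z]\\)|[\\u2022\\u25E6\\u25AA*-])\\s+");
    // Everything except letters, digits, whitespace and basic punctuation
    private static final Pattern SPECIAL = Pattern.compile("[^\\p{L}\\p{N}\\s.,;:'()-]");

    /** Returns the non-empty cleaned lines of the extracted text, in their original order. */
    static List<String> clean(String rawText) {
        List<String> result = new ArrayList<>();
        for (String line : rawText.split("\\R")) {       // \R matches any line break
            String s = URL.matcher(line).replaceAll("");
            s = LIST_MARKER.matcher(s).replaceFirst("");
            s = SPECIAL.matcher(s).replaceAll("");
            s = s.trim().replaceAll("\\s{2,}", " ");      // collapse gaps left by removals
            if (!s.isEmpty()) {                           // drop blank lines
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "1) First item\n\n\u2022 Visit http://example.com now\na) Hello, world!";
        clean(sample).forEach(System.out::println);
    }
}
```

Unlike a single replaceAll over the whole string with a character class that excludes line breaks, this keeps the line structure, which matters if the text is to be processed line by line.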
For newer versions of Apache PDFBox (the 2.x API), here is the example from the original source:
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.pdfbox.examples.util;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.text.PDFTextStripper;
/**
* This is a simple text extraction example to get started. For more advance usage, see the
* ExtractTextByArea and the DrawPrintTextLocations examples in this subproject, as well as the
* ExtractText tool in the tools subproject.
*
* @author Tilman Hausherr
*/
public class ExtractTextSimple
{
private ExtractTextSimple()
{
// example class should not be instantiated
}
/**
* This will print the documents text page by page.
*
* @param args The command line arguments.
*
* @throws IOException If there is an error parsing or extracting the document.
*/
public static void main(String[] args) throws IOException
{
if (args.length != 1)
{
usage();
}
try (PDDocument document = PDDocument.load(new File(args[0])))
{
AccessPermission ap = document.getCurrentAccessPermission();
if (!ap.canExtractContent())
{
throw new IOException("You do not have permission to extract text");
}
PDFTextStripper stripper = new PDFTextStripper();
// This example uses sorting, but in some cases it is more useful to switch it off,
// e.g. in some files with columns where the PDF content stream respects the
// column order.
stripper.setSortByPosition(true);
for (int p = 1; p <= document.getNumberOfPages(); ++p)
{
// Set the page interval to extract. If you don't, then all pages would be extracted.
stripper.setStartPage(p);
stripper.setEndPage(p);
// let the magic happen
String text = stripper.getText(document);
// do some nice output with a header
String pageStr = String.format("page %d:", p);
System.out.println(pageStr);
for (int i = 0; i < pageStr.length(); ++i)
{
System.out.print("-");
}
System.out.println();
System.out.println(text.trim());
System.out.println();
// If the extracted text is empty or gibberish, please try extracting text
// with Adobe Reader first before asking for help. Also read the FAQ
// on the website:
// https://pdfbox.apache.org/2.0/faq.html#text-extraction
}
}
}
/**
* This will print the usage for this document.
*/
private static void usage()
{
System.err.println("Usage: java " + ExtractTextSimple.class.getName() + " <input-pdf>");
System.exit(-1);
}
}
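Because the example above extracts the text page by page, repeated headers and footers can be detected by comparing the first and last lines of the pages. The following is a heuristic sketch only: the class HeaderFooterFilter is hypothetical, it takes the per-page strings as input (for instance the results of stripper.getText collected in a list), and the "more than half of the pages" threshold is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: drops first/last lines that repeat across most pages. */
final class HeaderFooterFilter {

    // Page numbers change from page to page, so digits are masked before comparing.
    private static String key(String line) {
        return line.trim().replaceAll("\\d+", "#");
    }

    /** Takes one string per page and returns all remaining non-blank lines in order. */
    static List<String> strip(List<String> pageTexts) {
        List<List<String>> pages = new ArrayList<>();
        Map<String, Integer> edgeCounts = new HashMap<>();
        for (String text : pageTexts) {
            List<String> lines = new ArrayList<>();
            for (String line : text.split("\\R")) {
                if (!line.trim().isEmpty()) {
                    lines.add(line.trim());
                }
            }
            pages.add(lines);
            if (!lines.isEmpty()) {
                // count how often each (masked) line opens or closes a page
                edgeCounts.merge(key(lines.get(0)), 1, Integer::sum);
                if (lines.size() > 1) {
                    edgeCounts.merge(key(lines.get(lines.size() - 1)), 1, Integer::sum);
                }
            }
        }
        // assumed cut-off: an edge line must occur on more than half of the pages,
        // and on at least two of them, to be treated as a header or footer
        int threshold = Math.max(1, pageTexts.size() / 2);
        List<String> result = new ArrayList<>();
        for (List<String> lines : pages) {
            for (int i = 0; i < lines.size(); i++) {
                boolean edge = (i == 0 || i == lines.size() - 1);
                if (edge && edgeCounts.getOrDefault(key(lines.get(i)), 0) > threshold) {
                    continue; // looks like a repeated header or footer
                }
                result.add(lines.get(i));
            }
        }
        return result;
    }
}
```

The approach only looks at one line at the top and one at the bottom of each page, so multi-line headers or documents with alternating left/right headers would need a more elaborate comparison.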
Extracting all text from a PDF (for example one downloaded from a web page), given either as a file on your local machine or as a Base64-encoded string:
import org.apache.commons.codec.binary.Base64;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class WebPagePdfExtractor {
public static void main(String arg[]) {
WebPagePdfExtractor webPagePdfExtractor = new WebPagePdfExtractor();
System.out.println("From file: " + webPagePdfExtractor.processRecord(createByteArray()).get("text"));
System.out.println("From string: " + webPagePdfExtractor.processRecord(getArrayFromBase64EncodedString()).get("text"));
}
public Map<String, Object> processRecord(byte[] byteArray) {
Map<String, Object> map = new HashMap<>();
// try-with-resources closes the document even if text extraction fails
try (PDDocument document = PDDocument.load(byteArray)) {
PDFTextStripper stripper = new PDFTextStripper();
stripper.setSortByPosition(false);
stripper.setShouldSeparateByBeads(true);
String text = stripper.getText(document);
map.put("text", text.replaceAll("\n|\r|\t", " "));
} catch (Exception exception) {
exception.printStackTrace();
}
return map;
}
private static byte[] getArrayFromBase64EncodedString() {
String encodedContent = "data:application/pdf;base64,JVBERi0xLjMKJcTl8uXrp/Og0MTGCjQgMCBvYmoKPDwgL0xlbmd0aCA1IDAgUiAvRmlsdGVyIC9GbGF0ZURlY29kZSA+PgpzdHJlYW0KeAGF0E0OgjAQBeA9p3hL3UCHlha2Gg9A0sS1AepPxIDl/rFFErVESDddvPlm8nqU6EFpzARjBCVkLHNkipBzPBsc8UCyt4TKgmCr/9HI+GDqg2x8Luzk8UtfYwX5DVWLnQaLmd+qHTsF3V5QEekWidZuDNpgc7L1FvqGg35fOzPlqslFYJrzZdnkq6YI77TXtrs3GBo7oKvNss9mfhT0IAV+e6CUL5pSTWb0t1tVBKbI5McsXxNmciYKZW5kc3RyZWFtCmVuZG9iago1IDAgb2JqCjE4NQplbmRvYmoKMiAwIG9iago8PCAvVHlwZSAvUGFnZSAvUGFyZW50IDMgMCBSIC9SZXNvdXJjZXMgNiAwIFIgL0NvbnRlbnRzIDQgMCBSIC9NZWRpYUJveCBbMCAwIDU5NSA4NDJdC" +
"j4+CmVuZG9iago2IDAgb2JqCjw8IC9Qcm9jU2V0IFsgL1BERiAvVGV4dCBdIC9Db2xvclNwYWNlIDw8IC9DczEgNyAwIFIgL0NzMiA4IDAgUiA+PiAvRm9udCA8PAovVFQxIDkgMCBSID4+ID4+CmVuZG9iagoxMCAwIG9iago8PCAvTGVuZ3RoIDExIDAgUiAvTiAxIC9BbHRlcm5hdGUgL0RldmljZUdyYXkgL0ZpbHRlciAvRmxhdGVEZWNvZGUgPj4Kc3RyZWFtCngBhVVdaBxVFD67c2cDEgcftA0ttIM/bQnpMolWE4u12026SRO362ZTmyrKdHY2O81kZpyZ3SahT6XgmxYE6augPsaCCLYqNi/2paXFkko1DwoRWowgKH1S8Dsz22R2QTLDnfnuueeee8537rmXqOtv3fPstEo054R+oZybPjl9Su26TWlSqJvw6Ebg5UqlCcaO65j8b38e3qUUS+7sZ1vtY1v25KoZGNC6huZWA2OOKKURZWqG54dEXZcgHzwbeoxvAz85WynngdeAldZcQHqqYDqmbxlqwdcX1JLv1i" +
"w76etW42xjy2fObrCv/OxG6w5mJ8fx74XPF0xnahJ4H/CSoY8w7gO+27ROFGOcTnvhkXKsn842ZqdyLfnJmn90qiW/UG+MMs4SpZcW65U3gJ8AXnVOF4+39Ndn3XG200Mk9RhB/hTws8Ba3RzjPKnAFd8tsz7Lw6o5PAL8MvAlKxyrAMO+9EPQnGQ5sKDFep79xFoie0Y/VgLeBnzItAu8FuyIiheW2OYg8LxjF3ktxC4um0EUL2IXP4X1ymisL6dDv8JznyaS99Sso2PA4EQerfujLIc/cujZ0d56EXjJb5Q59j3Aa7o/UgCGzcxjVX2YeX4BeIBOpHQyyaXT+Brk0L+INyCLmhHyyMdYDX2bCtBw0Hz0DGgVgHRaAColtEz0WCeeo1IVPZVmollBhNjK/ahvUH7Xp9SAtE7rkNaBXqNfIsk8/Upz6OchbWBspsNuHl44tAgP2BO2+aBl0xXbhSaeRzsoJsQrYlAMkSpeFYfFITEM6ZA4GM2JvU/6zn4+2LD0LtZN+r4MDkKsZ8MzB6xwNAE8+AfrzkaaCbYu7mjs87yP3j/vv2MZtz74s429APoxJ7/BogtrJiXmXj/3TU/CQ3VFfPXWne7r5+h4MktR3qqdWZLX5PvyCr735NWkDflneRXvvbZcPcoL/5O5zSFGO5LNQc48m1G0ccYbwCG4qUVz9rdZTLLptmK0YMlClJ2ruP/LCfPDPLexUnMu7vC8tz9jNs33ig+LdL5Pu6y" +
"ta59oP2p/aCvax0C/Sx9KX0rfSlekq9INUqVr0rL0nfS99Ln0NXpfQLosXenYSXHsG7sHfsZ71mjtMGaGsxQQ88LazApLH/F3BmOb+TOh1V4Dnbt/Yy3liLJTeUYZVnYrzykTSq9yQDmsbFcG0PqVUWUvRnZusGRjPc6AhX+SZ4umI67iPLFXdbDnw0sd76ZfXMPWhjXYST0Ontnapg6vEVe/FVVjvDtdnAY6TSFii84ich86nB8nqv7O2VyTODVSb+KUsMQu0S/GWjWYEwdQheNt9TjIVZoZyQxncqRmejNDmf7MMcZRrNH5ktmL0SF8RxLeM8sx/5s1xGcY7x3mqAlso4dbKzTncd8R5V1vwbdm6qE6oGkvqTlcr6Y65hjZPlW3bTUaClTfDEy/aVazxHc3zyP66/XoTk5tu2E0/GYso1TqJtF/t4+TNAplbmRzdHJlYW0KZW5kb2JqCjExIDAgb2JqCjExMTYKZW5kb2JqCjcgMCBvYmoKWyAvSUNDQmFzZWQgMTAgMCBSIF0KZW5kb2JqCjEyIDAgb2JqCjw8IC9MZW5ndGgg" +
"MTMgMCBSIC9OIDMgL0FsdGVybmF0ZSAvRGV2aWNlUkdCIC9GaWx0ZXIgL0ZsYXRlRGVjb2RlID4+CnN0cmVhbQp4AYVVW4gbVRj+kznJCrvO09rVLaRDvXQpu0u2Fd2ltJpbk7RrGrLZ1RZBs5OTZMzsJM5M0gt9KoLii6u+SUG8vS0IgtJ6wdYH+1KpUFZ36yIoPrR4QSj0RbfxO5NkJllqm2XPfPP93/lv558ZooG1Qr2u+xWiJcM2c8mo8tzRY8rAOvnpIRqkURosqFY9ks3OEn5CK679v1s/kE8wVyfubO9Xb7kbLHJLJfLdB75WtNQl4BNEgbNq3bSJBobBTx+36wKLHIZNJAj8osDlNoaNhhfb+DVHk8/FoDkLLKuVQhF4BXh8sYcv9+B2DlDAT5Ib3NRURfQia9ZKms4dQ3u5h7lHeTe4pDdQs/PbgXXIqs4dxnUMtb9SLMQFngReUQuJOeBHgK81tYVMB9+u29Ec8GNE/p2N6nwEeDdwqmQenAeGH79ZaaS6+J1Tlfyz4LeB/8ZYzBzp7F1TrRh6STvB367wtOhviEhSN" +
"DudB4Yf6YBZywk9cpBKRR5PAI8Dv16tHRY5wKf0mdWcE7zIZ+1UJSbyFPzllwqHssCjwL9yPSn0iCX9W7eznRxYyNAzIi5isTi3nHrhh4XsSj4FHnGZbpv5zl62XNIOpjv6TypmSvBi77W67swocgv4zUZO1I5YgcmCmUgCw2cgy4150U+Bm7TgKxCnGi1iVcmgTVIoR0mK4lonE5YSaaSD4bByMBx3Xc2Es8+iKniNmo7Nwpp1lO2dXa1CZbAGXXe0KsVCH1EDnir0B9iK61OhGO4a4Mr/46edy42OnxobYWG2F//72Czbz6bZDCnsKfY0O8DiYGfYPtd3Fnu6FYl8biBK28/LiMgd3QJqv4gabSpg/QWKGlmuh76uLI82xjzLGfMFTb3yxt89vdKws+oqJvo6euRePQ/8FrgeWMW6HthwfSiBnwIb+FtHb7xaap6902VxUhpOtNan23oWXVUElerOziV0QUPNvKfmiV4fl05/+aAXbZWde/7q0KXTJWN51GNFF/irmVsZOjPuseEfw3+GV8PvhT8M/y69LX0qfSWdlz6XLpMiXZ" +
"AuSl9L30ofS1+4+rvNkHv2JDIXcyXyFtPVrbC315hYOSpvlx+W4/IO+VF51lUp8og8JafkXbBsd8/Nm2+lt3L05Siidftz51jiWdFcTzgD3/2YAM2L2DcD88hYo+PwaaLfYt4MOglt75PXqYiF2BRLb5nuaTHzXd/BRDAejJAS3B2cCU4FDwncfZaDu2CbwZrozQ3z4Sr6KuU2PyG+JxSr1U+aWrliK3vC4SeVCD59XEkb6uS4UtB1xTFZisktbjZ5cZLEd1PsI7qZc76Hvm1XPM5+hmj/X3j3fe9xxxpEKxbRyOMeN4Z35QPvEp17Qm2YzbY/8vm+I7JKe/c4976hKN5fP7daN/EeG3iLaPPNVuuf91utzQ/gf4Pogv4foJ98VQplbmRzdHJlYW0KZW5kb2JqCjEzIDAgb2JqCjEwNzkKZW5kb2JqCjggMCBvYmoKWyAvSUNDQmFzZWQgMTIgMCBSIF0KZW5kb2JqCjMgMCBvYmoKPDwgL1R5cGUgL1BhZ2VzIC9NZWRpYUJveCBbMCAwIDU5NSA4NDJdIC9Db3VudCAxIC9LaWR" +
"zIFsgMiAwIFIgXSA+PgplbmRvYmoKMTQgMCBvYmoKPDwgL1R5cGUgL0NhdGFsb2cgL1BhZ2VzIDMgMCBSID4+CmVuZG9iago5IDAgb2JqCjw8IC9UeXBlIC9Gb250IC9TdWJ0eXBlIC9UcnVlVHlwZSAvQmFzZUZvbnQgL0NOVFpYVStNZW5" +
"sby1SZWd1bGFyIC9Gb250RGVzY3JpcHRvcgoxNSAwIFIgL0VuY29kaW5nIC9NYWNSb21hbkVuY29kaW5nIC9GaXJzdENoYXIgMzIgL0xhc3RDaGFyIDExNiAvV2lkdGhzIFsgNjAyCjAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgNjAyIDYwMiA2MDIgNjAyIDYwMiA2MDIgMCAwIDAgMCAwIDAgMCAwIDAKMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgMCAwIDAgNjAyIDAgMAo2MDIgNjAyIDYwMiA2MDIgNjAyIDYwMiAwIDAgNjAyIDYwMiAwIDAgNjAyIDAgMCA2MDIgNjAyIF0gPj4KZW5kb2JqCjE1IDAgb2JqCjw8IC9UeXBlIC9Gb250RGVzY3JpcHRvciAvRm9udE5hbWUgL0NOVFpYVStNZW5sby1SZWd1bGFyIC9GbGFncyAzMyAvRm9udEJCb3gKWy01NTggLTM3NSA3MTggMTA0MV0g" +
"L0l0YWxpY0FuZ2xlIDAgL0FzY2VudCA5MjggL0Rlc2NlbnQgLTIzNiAvQ2FwSGVpZ2h0IDcyOQovU3RlbVYgOTkgL1hIZWlnaHQgNTQ3…/ZfICj5JcLdi/ATmQZKogDPg0lIDBunI0ZGOB1OB/Lpyce1TbJqCpBThycVs3GyQPZSLKexbMGyFss8LF4sNb2lElu5HPlJ2439G1jKsbRh6cTyPNpx8I6AFxa8P+xD2E4e/G+5PqJ/8aDzERFvGBJR/WLkfwcM3kRCiZpokDMdxhn5MeD9Rn5MSm0mYUpLSF98J5HXaQgtpJvoDWGesEe4C4NgK3woWsQ88RgzszXsMM4WyALeIC5gO5B/FYk/pNxVCJGoZT8NYc8LIknrONeVQYznus51pYeZHCaXw+RYIJLAEogJfMEbVPrvv31S6icvTMlp1EQhO41cOuXb0EEkSYkmGaMXSzuIfhCKAA4Y/YScTs9ASizblWVyWB1UT4fwNfSp9+mgwLFd4oI3D++9++kuheYWpOnEeBhLJrv7kVg" +
"Xk1hkVDRExLgkieUZTTt1jZYGkTTiXU8tULUtIsEIfeKMgY5AV3u7yZyTQdK6Mm923fwgHe1GZWTfmCJy5CYi05PgwqWzB5HBw2n2wL7OBEmVPZxmZYpWi6TSU7pM2BNY1kojs0sLN1bPOLZ4/nuzL1CNp/S+zt27dx+lA4Y/2/hA1fq8kR9kZF57u6R96YgvZRmsvfe5OBj5TSKjkN+wRqu6LrRF1yjF19lbYhudDVKTdVe/8DAClihbX6MNEuItofH9kF9k+FwXMofC7rqCDHcZu293G7tz0qmNWi2iM6FvYrYN2RuEvCbT7GDnZ0xDyMZt/Otb8z+aP+/dOS17927ZurVu24ZVnrYFz7w95jxlayE+8b3Nf/m6b5/j2QMb1v26qeXZ8iWVSUkH7fYLb1bKBw7aA57LYgVGYAGtLM8dT3WgIwC6PAIaVSOjsCaUatXEFiJKBm0fvTEQODe0K9Mki/mK3DPnBOUsHkchH/ckhFIHZJmyrE6T0+TIFi7xfvRjx/X33jves5rFBb6Gk4GsHXwbLX1Hlp0XZZeKa8eRYe4EURUX3agy" +
"1RnXWxp1QiNZo2tS7baBjUTYqDqBGONtspI7UEwosSsoMUVevAM5CEO9mmRVEquF/ExwsrxOCTd7OpKnpXxFjfzzO8uPTnz44Oydb7bunLy1kHXu5huMBt59vYvfsNtPZmb4tjfvdblQGjXI23jUayTpg9w5VfFRjer4RqP6DRGPs/ViY3iDscmVYCN9dQkqKZaGxbuMga6uwBXZeYLq/MKI6jShPq0DqDNBUBg0Wy2C0y6YjMSRGU4TJKslPKhYuJS7fkL7u+m7F33yzc3PeOBb6qSWsZv4Zys3bVq5as0atu+gK5Ff4ldLH+d3vvuW36bL6Ab6LF0X37Pw4I4dB//4+z0+RZ8y306xEuNGP7LI3V+tItF2baRBRfZHqurNjjr7O3H1fdrMTZE6GilG6dWSNt8uStbh/Y03u9AkM1G3snI7rtwMyCKWd2DKMeegZ6W749Lj0+3pjvSEZtJMm4VmdbNme3hzRHNkc1RztH4m7rJ3Q4OzB5uc2XpE9M0eOOh+mi1LoNfdwlGfQtuwV159duGWPfTAgfv/VP3GBz98d4eu2jirfca81uK6o8P62oWsJxaXLT57sN/4npUtpY/8eXvr4bhVzwwa6E9MnDIlc2PQditxr2bMIowYLdLd0YxYouv1lvqQJn0bfREiRCIJo0xmzeg43Ju8NTk2KIaDe0ynpqxeHlEd5ixUx790gazCdL9/QFPpiWvX3y/byg1ramr" +
"q6mpq1sAZYeQ/utYVTaP3Uys10cHTuOaj8xfPdV44L/uSzE8Jyt6K/GQFIyKGKSUIChgEY09jScPcUUJf02C4DCNRShuL32qS0zOo1WGVZIMYbEXZ2QmaSVamWRUUnlgS+DzknT3F7eWPHpnBf+Dnqf3GR3d82g1ran4XItRPl744dl/OW8nJNIeGUS118782Lt3lWyT72RH08USUUxgZiFIyUm3IfonWkxf10mG1EKYioUzSGTQW47mhHYGhHZmKAVzJDKD60b9lA0aJxFHZyWSndqBKs8TEM3Mn0JV8hZ930uRdf5IsTZPnz/UG0uCMd6JfTq9lefDRornXFke7E6O0tpjEUDDXhYWH1tvC6w2AlmgzHEk63D8xikjaUZLZ7BiNhtjRqy10846gERo7u9EC0RKRGyV0Bx0nDL3pxzg5TJANrleZEdlZMH31ytXrvWtWrPZ3Xx3fUjSneeTmNSlbyjuuX+9Y2JDmF3JOffzxqVOfnuefBXggNmb/gJTtvpCqWQ/TIVSFZ+mQh6ZvkPcRlF+MIr8Ud2SoHvC3OKne1KZ9UU0FiYzVhUqaQgvaGIoMTWwoxnKM67KFOU1BZrGTpfh/uBhz4LEnVtb5/RmvL3ljl7C/Z6ywv3H9W2/0rJYsPTtK5l6W5daN+qqWDHh+6oicYqKpyD9kyocpRTtSXcR8Em1J7muwlW1LexrtSs5JZLuScwa5pXgGyx/hdZxIeA" +
"JTPMvxBMSaYoSmQ+nHtDywiJbzyzTe7xcfDmR5vTBcyPsKv7yBPEyXopGDxCAHOoUoppRILPQiribnP/IqOjlLka3XEo5eIbu8bCTCqRmej6+99ib/lF6im3/13ItnD8OtF4LyxN+YxQq0iwTyqjsx0mwIFVUkLkZSWbX1dmiLORxlVBGTIWSC" +
"NNE0wTAxNnJCdIHTeHOcTzt1nM80dUbxARJ9r/0+T2BoAA0leOYPHXrlpnIwoYmg8NPdo9LFdJYupavSQ9JD09Xpmtzw3IjcyNyo3OjcmNzY3LhcWzVUi9WsWqpWVYdUh1arqzXecG+EN9Ib5Y32xnhjvXFem5POpPLJEh5Ff6LMf2vVqgwKOx" +
"IeHbu64vXswkn3v54zdkzOzp2Oubnjy6B7dMEZfqlnubDymyWVX/SsEFbeWCy3YknJ0NxCWddt/CFxKspCjmFZ7tgfY1ibvokegcNxGL9GKZGsUI5imYqJoV/8GMZcsm0pXJhNRtkbfuofdPmBA3IYu/rV+/Oa6I3VNatqa1fVrF7Xc1xSe4um" +
"8Xf5df53fnwavfXR+Qud5y5iFJPtvRPjmIQ8JZJlbrdOK+g1EfG2kFBBpY6wxdvy4myRao0tXrSSOtouWuqs7ZH1JrHe1WZqSopTa+JjVOSBGEk/RiVZEgqSgu58RXZfObDIh6OR3+o23uo2RyjHipKn6ZU8Tak9CcETUz5L4pVkSPq3k2cPTBMGYPo2CFUCJx9oLqqqfPitsWvXdX1YtP+x+YemPrvqVkjBy789//70FjFn34ABk4vGjXXqo7dVtbQ6nW3Z2XM91RmCPn7jilf+4FD2incAMYS9hLExwx2pZyEG2E9M9HDIfnWIJhTzYclo1v88MnbdHNohp0Cyg8sx8Wdmb8Lr7nY+a9ayU5dP7ZZDI3uJH/b2NP9qzsaWE0KJlw5HnSvPvefwlwRZ2r988CqMnmzBUyScRGD+EUXyySgymowhY8k4Mp48gLl+EXmITFM+pPjvhSANSb5Ej5w4dXrxg8kTyhYtrEidUjZ/2cLZTxLyT2S78dEKZW5kc3RyZWFtCmVuZG9iagoxNyAwIG9iago0ODAxCmVuZG9iagoxOCAwIG9iagooKQplbmRvYmoKMTkgMCBvYmoKKE1hYyBPUyBYIDEwLjEyLjYgUXVhcnR6IFBERkNvbnRleHQpCmVuZG9iagoyMCAwIG9iagooKQplbmRvYmoKMjEgMCBvYmoKKCkKZW5kb2JqCjIyIDAgb2JqCihUZXh0TWF0ZSkKZW5kb2JqCjIzIDAgb2JqCihEOjIwMTcxMjEyMTMwMzQ4WjAwJzAwJykKZW5kb2JqCjI0IDAgb2JqCigpCmVuZG9iagoyNSAwIG9iagpbICgpIF0KZW5kb2JqCjEgMCBvYmoKPDwgL1RpdGxlIDE4IDAgUiAvQXV0aG9yIDIwIDAgUiAvU3ViamVjdCAyMSAwIFIgL1Byb2R1Y2VyIDE5IDAgUiAvQ3JlYXRvcgoyMiAwIFIgL0NyZWF0aW9uRGF0ZSAyMyAwIFIgL01vZERhdGUgMjMgMCBSIC9LZXl3b3JkcyAyNCAwIFIgL0FBUEw6S2V5d29yZHMKMjUgMCBSID4+CmVuZG9iagp4cmVmCjAgMjYKMDAwMDAwMDAwMCA2NTUzNSBmIAowMDAwMDA4OTI5IDAwMDAwIG4gCjAwMDAwMDAzMDAgMDAwMDAgbiAKMDAwMDAwMzAyOCAwMDAwMCBuIAowMDAwMDAwMDIyIDAwMDAwIG4gCjAwMDAwMDAyODEgMDAwMDAgbiAKMDAwMDAwMDQwNCAwMDAwMCBuIAowMDAwMDAxNzUzIDAwMDAwIG4gCjAwMDAwMDI5OTIgMDAwMDAgbiAKMDAwMDAwMzE2MSAwMDAwMCBuIAowMDAwMDAwNTEyIDAwMDAwIG4gCjAwMDAwMDE3MzIgMDAwMDAgbiAKMDAwMDAwMTc4OSAwMDAwMCBuIAowMDAwMDAyOTcxIDAwMDAwIG4gCjAwMDAwMDMxMTEgMDAwMDAgbiAKMDAwMDAwMzU0NCAwMDAwMCB" +
"uIAowMDAwMDAzNzk2IDAwMDAwIG4gCjAwMDAwMDg2ODcgMDAwMDAgbiAKMDAwMDAwODcwOCAwMDAwMCBuIAowMDAwMDA4NzI3IDAwMDAwIG4gCjAwMDAwMDg3ODAgMDAwMDAgbiAKMDAwMDAwODc5OSAwMDAwMCBuIAowMDAwMDA4ODE4IDAwMDAwIG4gCjAwMDAwMDg4NDUgMDAwMDAgbiAKMDAwMDAwODg4NyAwMDAwMCBuIAowMDAwMDA4OTA2IDAwMDAwIG4gCnRyYWlsZXIKPDwgL1NpemUgMjYgL1Jvb3QgMTQgMCBSIC9JbmZvIDEgMCBSIC9JRCBbIDxkYjc4M2NhNDM2Mzg4YzI5ZDc5MDQ2NzY3NjUxNjE3OT4KPGRiNzgzY2E0MzYzODhjMjlkNzkwNDY3Njc2NTE2MTc5PiBdID4+CnN0YXJ0eHJlZgo5MTA0CiUlRU9GCg==";
String content = encodedContent.substring("data:application/pdf;base64," .length());
return Base64.decodeBase64(content);
}
public static byte[] createByteArray() {
String pathToBinaryData = "/bla-bla/src/main/resources/small.pdf";
File file = new File(pathToBinaryData);
if (!file.exists()) {
System.out.println("File could not be found: " + pathToBinaryData);
return null;
}
try {
// Files.readAllBytes reads the complete file and closes it afterwards;
// a single InputStream.read call is not guaranteed to fill the buffer
return java.nio.file.Files.readAllBytes(file.toPath());
} catch (IOException e) {
e.printStackTrace();
return null;
}
}
}
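If pulling in commons-codec only for the Base64 step is not desired, the JDK's own java.util.Base64 (Java 8+) can decode the data URI as well. This is a small sketch rather than a drop-in replacement: the class DataUriDecoder and its method names are invented for illustration, and it only accepts URIs whose header ends in ";base64".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: decodes a "data:<mime>;base64,<payload>" string with the JDK only. */
final class DataUriDecoder {

    static byte[] decode(String dataUri) {
        int comma = dataUri.indexOf(',');
        if (!dataUri.startsWith("data:") || comma < 0) {
            throw new IllegalArgumentException("Not a data URI");
        }
        String header = dataUri.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("Only base64-encoded data URIs are supported");
        }
        // The MIME decoder skips line breaks and other characters outside the Base64 alphabet.
        return Base64.getMimeDecoder().decode(dataUri.substring(comma + 1));
    }

    /** Cheap sanity check: PDF files start with the ASCII signature "%PDF-". */
    static boolean looksLikePdf(byte[] bytes) {
        return bytes.length >= 5
                && new String(bytes, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    public static void main(String[] args) {
        byte[] bytes = decode("data:application/pdf;base64,JVBERi0xLjM=");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(bytes));
    }
}
```

The resulting byte array can be passed to processRecord in the same way as the output of getArrayFromBase64EncodedString; checking the signature first gives a clearer error than a parser exception when the string is not a PDF at all.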

How to write content into a PDF using iText?

Right now I use iText to generate a PDF automatically.
My problem is that when the content is very large, I have to calculate the content's height and width and then add a new page myself.
This is really inconvenient.
So I wonder whether there is a method like:
Document.add("a very very large article");
that will then generate the PDF file automatically?
Thanks in advance!
The following creates a 9 page pdf without having to calculate height and width.
import java.io.FileOutputStream;
import java.io.IOException;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
public class HelloWorld {
public static void main(String[] args) {
Document document = new Document();
try {
PdfWriter.getInstance(document,
new FileOutputStream("HelloWorld.pdf"));
document.open();
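// StringBuilder avoids re-copying the growing string on every iteration
StringBuilder text = new StringBuilder();
for (int i = 0; i < 10000; i++) {
text.append("test");
}
document.add(new Paragraph(text.toString()));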
} catch (DocumentException e) {
System.err.println(e.getMessage());
} catch (IOException ex) {
System.err.println(ex.getMessage());
}
document.close();
}
}
A new page is generated automatically when the current page is full.
