I have a booklet PDF. I want to split each scanned page in half vertically and then re-paginate the result.
For example, the scanned booklet pages would be 1, 8 and 7, 2, etc.
After processing, I want a PDF with pages 1, 2, 3, 4, ...
Please advise which PDF library can do this in Java.
Thanks
Depending on how the booklet was scanned into the PDF, I think you might be able to do this using a Java library that can extract and merge PDF pages.
For example, in the LEADTOOLS Java PDF Library, which is what I am familiar with since I work for the vendor, there is a PDFFile class that can be used to extract and merge pages from and to a PDF file.
PDFFile file = new PDFFile(bookletFile);
int pageCount = file.getPageCount();
for (int i = 1; i <= pageCount; i++)
{
    // Java's String.format uses %d, not the .NET-style {0} placeholder
    File destinationFile = new File(destinationFolder, String.format("Extracted_Page%d.pdf", i));
file.extractPages(i, i, destinationFile.getPath());
}
The booklet looks like it was scanned so that every other page contains a double page. To split them, you can load every other extracted page as a raster image, then use the library's raster imaging classes to save each half as a separate raster PDF:
RasterCodecs codecs = new RasterCodecs();
RasterImage firstHalfImage = codecs.load(extractedDoublePage);
// Create a LeadRect that encompasses the second half
LeadRect secondHalfLeadRect = new LeadRect(firstHalfImage.getImageWidth() / 2, 0, firstHalfImage.getImageWidth() / 2, firstHalfImage.getImageHeight());
// Create a new image containing the second half
RasterImage secondHalfImage = firstHalfImage.clone(secondHalfLeadRect);
// Crop First Image to contain only first half
LeadRect firstHalfLeadRect = new LeadRect(0, 0, firstHalfImage.getImageWidth() / 2, firstHalfImage.getImageHeight());
CropCommand cropCommand = new CropCommand(firstHalfLeadRect);
cropCommand.run(firstHalfImage);
You can then use the RasterCodecs.save() method to save each image as a raster PDF file.
Finally, once you have split everything accordingly, you can use the PDFFile.mergeWith() method to combine all the pages back into one file in the needed order.
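For the re-pagination step, the order in which the split halves need to be merged can be computed up front. The sketch below is plain Java and independent of any PDF library. It assumes a standard saddle-stitched layout in which every scanned page is a two-page spread (for 8 pages: 8|1, 2|7, 6|3, 4|5) and the halves are numbered in scan order, left before right. The class and method names are made up for illustration, and if the scan order differs (the question's "1, 8 and 7, 2" suggests it might), the left/right choice has to be swapped accordingly.

```java
import java.util.Arrays;

class BookletOrder {

    /** Page numbers of the split halves in scan order: left0, right0, left1, right1, ... */
    static int[] halvesInScanOrder(int totalPages) {
        // Assumption: a folded booklet, so the page count is a multiple of 4
        if (totalPages <= 0 || totalPages % 4 != 0) {
            throw new IllegalArgumentException("page count must be a positive multiple of 4");
        }
        int[] halves = new int[totalPages];
        for (int spread = 0; spread < totalPages / 2; spread++) {
            int low = spread + 1;            // page counted from the front
            int high = totalPages - spread;  // page counted from the back
            boolean outerSide = spread % 2 == 0;
            // Outer sides carry the high page on the left, inner sides on the right
            halves[2 * spread] = outerSide ? high : low;
            halves[2 * spread + 1] = outerSide ? low : high;
        }
        return halves;
    }

    /** For page p (1-based), result[p - 1] is the index of the half that holds it. */
    static int[] halfIndexForPage(int totalPages) {
        int[] halves = halvesInScanOrder(totalPages);
        int[] index = new int[totalPages];
        for (int i = 0; i < halves.length; i++) {
            index[halves[i] - 1] = i;
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(halvesInScanOrder(8))); // [8, 1, 2, 7, 6, 3, 4, 5]
        System.out.println(Arrays.toString(halfIndexForPage(8)));  // [1, 2, 5, 6, 7, 4, 3, 0]
    }
}
```

Walking through halfIndexForPage(n) from the first entry to the last gives the order in which to hand the split half-page files to the merge step.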
I need to split large documents (several thousand pages, 1-2 GB) using iText 7.
I already tried to split the PDF using this reference:
https://itextpdf.com/en/resources/examples/itext-7/splitting-pdf-file
and also doing something like this:
try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(outputPdfPath.toString()))) {
Files.createDirectories(Paths.get(destFolder));
int numberOfPages = pdfDoc.getNumberOfPages();
int pageNumber = 0;
while (pageNumber < numberOfPages) {
try (PdfDocument document = new PdfDocument(
new PdfWriter(destFolder + pages.get(pageNumber++).id + ".pdf"))) {
pdfDoc.copyPagesTo(pageNumber, pageNumber, document);
}
}
log.info("Provided PDF has been split into multiple.");
}
Both examples work perfectly fine, but the created documents are large and contain lots of unused fonts, images, and objects.
How can I remove all these unused objects so that the newly created one-page PDFs are smaller?
The problem with your document is as follows: each page shares a lot of (maybe even all) the fonts/XObjects of the document. While copying pages, iText doesn't know whether the resources are needed on the page or not: it just copies them, and that's why you get such huge resultant PDFs.
The option you are looking for is iText's pdfSweep.
Its general purpose is redaction of a page's content; however, pdfSweep also optimizes the pages while redacting.
So how to solve your problem?
a) Specify the redaction area as a degenerate rectangle
b) Clean up the pages (of the split documents or of the original document):
// A degenerate (zero-area) rectangle on page 1: nothing is redacted, but the page still gets cleaned up
PdfCleanUpLocation dummyLocation = new PdfCleanUpLocation(1, new Rectangle(0, 0, 0, 0), null);
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<>();
cleanUpLocations.add(dummyLocation);
PdfDocument pdfDocument = new PdfDocument(new PdfReader(input), new PdfWriter(output));
PdfCleanUpTool cleaner = new PdfCleanUpTool(pdfDocument, cleanUpLocations);
cleaner.cleanUp();
pdfDocument.close();
I've tried this approach to process the first of your resultant documents (which represents the first page).
The size of the document before pdfSweep processing: 9282 KB.
The size of the document after pdfSweep processing: 549 KB.
Could someone kindly show a working code example for PDF page imposition using iText?
It seems I've exhausted my Google options: there are no code samples for that out there.
Thanks,
InnerOrchestra
ps: By imposition, a printing technical term, I mean, for example, having an 11x17 sheet of paper hold two 8.5x11 pages. For business cards, this would be the same page (3.75x2.25) and for a booklet it would not since the sheet would be folded and the page placement would vary depending on booklet settings.
You could have saved yourself plenty of time by reading Chapter 6 of my book or by simply taking a look at the examples on the iText site. Take for instance the NUpTool example. As you're working in the printing sector, you should be familiar with the term "N-upping". It's when you take a document and then create a new one with 2 pages on one (2-upping), 4 pages on one (4-upping), etc...
Your request is very similar, but easier to achieve, because when we take a document, let's say text_on_stationery.pdf and we 2-up it using the example from my book, you have to scale down the pages, resulting for instance in the document result2up.pdf.
In your case, it's not that hard because you don't need to scale anything. You just need to create a Document object with twice the size of the original document, create PdfImportedPage objects to import the pages, and use addTemplate() with the correct offset to add them side by side on the new document.
There are quite some examples that demonstrate the use of PdfImportedPage: http://itextpdf.com/themes/keyword.php?id=236
It is strange that Google didn't show you the SuperImposing example when looking for "imposing". In this example, we add four different layers on top of each other:
PdfReader reader = new PdfReader(SOURCE);
// step 1
Document document = new Document(PageSize.POSTCARD);
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(RESULT));
// step 3
document.open();
// step 4
PdfContentByte canvas = writer.getDirectContent();
PdfImportedPage page;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
page = writer.getImportedPage(reader, i);
canvas.addTemplate(page, 1f, 0, 0, 1, 0, 0);
}
// step 5
document.close();
reader.close();
In other words, a 4-page document is now a 1-page document where all the pages are rendered on top of each other. What you now need to do is change step 1, so that the dimensions of the new pages are different, and adapt step 4, so that two pages are added next to each other and a new page is started after every two pages:
page = writer.getImportedPage(reader, i);
canvas.addTemplate(page, 1f, 0, 0, 1, 0, 0);
i++;
// Only add a right-hand page if the document still has one left
if (i <= reader.getNumberOfPages()) {
    page = writer.getImportedPage(reader, i);
    canvas.addTemplate(page, 1f, 0, 0, 1, width / 2, 0);
}
document.newPage();
In this example, I assume that the height of the original document is equal to the height of the new document and that the width of the new document is twice the width of the original document. It goes without saying that you can also choose to create a new document with the same width and a double height. In that case, you need:
page = writer.getImportedPage(reader, i);
canvas.addTemplate(page, 1f, 0, 0, 1, 0, height / 2);
i++;
// Only add a lower page if the document still has one left
if (i <= reader.getNumberOfPages()) {
    page = writer.getImportedPage(reader, i);
    canvas.addTemplate(page, 1f, 0, 0, 1, 0, 0);
}
document.newPage();
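The offsets passed as the last two arguments of addTemplate() follow a simple pattern, which generalizes beyond two pages per sheet. The sketch below is plain Java with no iText dependency; the class NUpLayout and its methods are hypothetical helpers, and it assumes unscaled pages of equal size arranged in a regular grid, numbered left to right and top to bottom, with PDF coordinates starting at the bottom-left corner.

```java
import java.util.Arrays;

class NUpLayout {

    /**
     * Lower-left corner {x, y} of a slot on the sheet.
     * Slots are numbered from 0, left to right, then top to bottom.
     */
    static float[] offset(int slot, int columns, int rows, float cellWidth, float cellHeight) {
        if (slot < 0 || slot >= columns * rows) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        int column = slot % columns;
        int rowFromTop = slot / columns;
        float x = column * cellWidth;
        // The PDF y axis points up, so the top row gets the largest offset
        float y = (rows - 1 - rowFromTop) * cellHeight;
        return new float[] {x, y};
    }

    /** Zero-based sheet on which a one-based source page lands. */
    static int sheetOf(int pageNumber, int pagesPerSheet) {
        return (pageNumber - 1) / pagesPerSheet;
    }

    public static void main(String[] args) {
        // Two letter-size pages side by side: the second one starts at x = 612
        System.out.println(Arrays.toString(offset(1, 2, 1, 612f, 792f))); // [612.0, 0.0]
        // Two pages stacked: the first one goes on top at y = 792
        System.out.println(Arrays.toString(offset(0, 1, 2, 612f, 792f))); // [0.0, 792.0]
    }
}
```

The returned x and y correspond to width / 2 and height / 2 in the snippets above, where width and height are the dimensions of the new, larger sheet.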
I was wondering if it is possible to mark strings in a PDF with a different color or underline them while looping through the PDF document?
It's possible when creating a document. Just use different chunks to set the style. Here's an example:
Document document = new Document();
PdfWriter.getInstance(document, outputStream);
document.open();
document.add(new Chunk("This word is "));
Chunk underlined = new Chunk("underlined");
underlined.setUnderline(1.0f, -1.0f); //We can customize thickness and position of underline
document.add(underlined);
document.add(new Chunk(". And this phrase has "));
Chunk background = new Chunk("yellow background.");
background.setBackground(BaseColor.YELLOW);
document.add(background);
document.add(Chunk.NEWLINE);
document.close();
However, it's almost impossible to edit an existing PDF document. The author of iText writes in his book:
In a PDF document, every character or glyph on a PDF page has its
fixed position, regardless of the application that’s used to view the
document. This is an advantage, but it also comes with a disadvantage.
Suppose you want to replace the word “edit” with the word “manipulate”
in a sentence, you’d have to reflow the text. You’d have to reposition
all the characters that follow that word. Maybe you’d even have to
move a portion of the text to the next page. That’s not trivial, if
not impossible.
If you want to “edit” a PDF, it’s advised that you change the original
source of the document and remake the PDF.
Aspose.PDF APIs support creating new PDF documents and manipulating existing ones without an Adobe Acrobat dependency. You can search for text and add a Highlight Annotation to mark it.
REST API Solution using Aspose.PDF Cloud SDK for Java:
// For complete examples and data files, please go to https://github.com/aspose-pdf-cloud/aspose-pdf-cloud-java
String name = "02_pages.pdf";
String folder="Temp";
String remotePath=folder+"/"+name;
// File to upload
File file = new File("C:/Temp/"+name);
// Storage name is default storage
String storage = null;
// Get App Key and App SID from https://dashboard.aspose.cloud/
PdfApi pdfApi = new PdfApi("xxxxxxxxxxxxxxxxxxxx", "xxxxx-xxxx-xxxx-xxxx-xxxxxxxxx");
//Upload file to cloud storage
pdfApi.uploadFile(remotePath,file,storage);
//Text Position
Rectangle rect = new Rectangle().LLX(259.27580539703365).LLY(743.4707997894287).URX(332.26148873138425).URY(765.5148007965088);
List<AnnotationFlags> flags = new ArrayList<>();
flags.add(AnnotationFlags.DEFAULT);
HighlightAnnotation annotation = new HighlightAnnotation();
annotation.setName("Name Updated");
annotation.rect(rect);
annotation.setFlags(flags);
annotation.setHorizontalAlignment(HorizontalAlignment.CENTER);
annotation.setRichText("Rich Text Updated");
annotation.setSubject("Subj Updated");
annotation.setPageIndex(1);
annotation.setZindex(1);
annotation.setTitle("Title Updated");
annotation.setModified("02/02/2018 00:00:00.000 AM");
List<HighlightAnnotation> annotations = new ArrayList<>();
annotations.add(annotation);
//Add Highlight Annotation to the PDF document
AsposeResponse response = pdfApi.postPageHighlightAnnotations(name,1, annotations, storage, folder);
//Download annotated PDF file from Cloud Storage
File downloadResponse = pdfApi.downloadFile(remotePath, null, null);
File dest = new File("C:/Temp/HighlightAnnotation.pdf");
Files.copy(downloadResponse.toPath(), dest.toPath(), java.nio.file.StandardCopyOption.REPLACE_EXISTING);
System.out.println("Completed......");
On-Premise Solution using Aspose.PDF for Java:
// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.Pdf-for-Java
// Instantiate Document object
Document document = new Document("C:/Temp/Test.pdf");
// Create TextFragment Absorber instance to search particular text fragment
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("Estoque");
// Iterate through pages of PDF document
for (int i = 1; i <= document.getPages().size(); i++) {
// Get the i-th page of the PDF document (pages are 1-based)
Page page = document.getPages().get_Item(i);
page.accept(textFragmentAbsorber);
}
// Create a collection of Absorbed text
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
// Iterate on above collection
for (int j = 1; j <= textFragmentCollection.size(); j++) {
TextFragment textFragment = textFragmentCollection.get_Item(j);
// Get rectangular dimensions of TextFragment object
Rectangle rect = new Rectangle((float) textFragment.getPosition().getXIndent(), (float) textFragment.getPosition().getYIndent(), (float) textFragment.getPosition().getXIndent() + (float) textFragment.getRectangle().getWidth(), (float) textFragment.getPosition().getYIndent() + (float) textFragment.getRectangle().getHeight());
// Instantiate HighLight Annotation instance
HighlightAnnotation highLight = new HighlightAnnotation(textFragment.getPage(), rect);
// Set opacity for annotation
highLight.setOpacity(.80);
// Set the color of annotation
highLight.setColor(Color.getYellow());
// Add annotation to the annotations collection of the TextFragment's page
textFragment.getPage().getAnnotations().add(highLight);
}
// Save updated document
document.save("C:/Temp/HighLight.pdf");
P.S: I work as support/evangelist developer at Aspose.
Reading multicolumned PDF document
When iText reads the PDF (extracting a page's content into a string variable), the content is then cleaned up with:
reader = new PdfReader(getResources().openRawResource(R.raw.resume1));
original_content = PdfTextExtractor.getTextFromPage(reader, 2);
String sub_content = original_content.trim().replaceAll(" {2,}", " ");
sub_content = sub_content.trim().replaceAll("\n ", "\n");
sub_content = sub_content.replaceAll("(.+)(?<!\\.)\n(?!\\W)", "$1 ");
This works if the document has only one column, but if the document has multiple columns, the text is extracted line by line across the whole page, so the left and right columns get combined.
I am using this as a sample PDF; it is from the START QA document.
How to read a multicolumned PDF document?
There are two different approaches to this problem, and the choice which to use depends on the PDF itself.
If strings in the page content of the PDF in questions already are in the desired order: Instead of the LocationTextExtractionStrategy implicitly used by the overload of PdfTextExtractor.getTextFromPage you use, explicitly use the SimpleTextExtractionStrategy; in your case:
original_content = PdfTextExtractor.getTextFromPage(reader, 2, new SimpleTextExtractionStrategy());
If the strings in the page content of the PDF in question are not in the desired order: Instead of the LocationTextExtractionStrategy implicitly used by the overload of PdfTextExtractor.getTextFromPage you use, explicitly wrap one such strategy in a FilteredTextRenderListener restricting it to receive text for the area of a single column only; in your case:
Rectangle left = new Rectangle(0, 0, 306, 792);
Rectangle right = new Rectangle(306, 0, 612, 792);
RenderFilter leftFilter = new RegionTextRenderFilter(left);
RenderFilter rightFilter = new RegionTextRenderFilter(right);
[...]
TextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), leftFilter);
original_content = PdfTextExtractor.getTextFromPage(reader, 2, strategy);
original_content += " ";
strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), rightFilter);
original_content += PdfTextExtractor.getTextFromPage(reader, 2, strategy);
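The two rectangles above are hard-coded for a two-column, letter-size page. For other page sizes or column counts, the coordinates can be derived as in the following sketch. It is plain Java, the class ColumnRegions is a hypothetical helper, and it assumes equal-width columns without a gutter; each result holds the values in the order the Rectangle constructor above takes them (llx, lly, urx, ury).

```java
import java.util.Arrays;

class ColumnRegions {

    /** Returns {llx, lly, urx, ury} for each column, from left to right. */
    static float[][] split(float pageWidth, float pageHeight, int columns) {
        if (columns < 1) {
            throw new IllegalArgumentException("need at least one column");
        }
        float columnWidth = pageWidth / columns;
        float[][] regions = new float[columns][];
        for (int c = 0; c < columns; c++) {
            float llx = c * columnWidth;
            // Pin the last column to the page edge to avoid rounding gaps
            float urx = (c == columns - 1) ? pageWidth : (c + 1) * columnWidth;
            regions[c] = new float[] {llx, 0f, urx, pageHeight};
        }
        return regions;
    }

    public static void main(String[] args) {
        // Letter-size page, two columns
        System.out.println(Arrays.deepToString(split(612f, 792f, 2)));
        // [[0.0, 0.0, 306.0, 792.0], [306.0, 0.0, 612.0, 792.0]]
    }
}
```

Each array can then be turned into a Rectangle and wrapped in a RegionTextRenderFilter, extracting the columns one after another in reading order.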
I have 5 single-page TIFF images.
I want to combine all 5 TIFF images into one multipage TIFF image.
I am using the Java Advanced Imaging API.
I have read the JAI API documentation and the tutorials provided by Sun.
I am new to JAI. I know basic core Java.
I don't understand the documentation and tutorials by Sun.
So please tell me how to combine 5 TIFF image files into one multipage TIFF image.
Please give me some guidance on the above topic.
I have been searching the internet for this topic but have not found a single clue.
I hope you have the computer memory to do this. TIFF image files are large.
You're correct in that you need to use the Java Advanced Imaging (JAI) API to do this.
First, you have to convert the TIFF images to a java.awt.image.BufferedImage. Here's some code that will probably work. I haven't tested this code.
BufferedImage image[] = new BufferedImage[numImages];
for (int i = 0; i < numImages; i++) {
SeekableStream ss = new FileSeekableStream(input_dir + file[i]);
ImageDecoder decoder = ImageCodec.createImageDecoder("tiff", ss, null);
PlanarImage op = new NullOpImage(decoder.decodeAsRenderedImage(0), null, null, OpImage.OP_IO_BOUND);
image[i] = op.getAsBufferedImage();
}
Then, you convert the BufferedImage array back into a multipage TIFF image. I haven't tested this code either.
TIFFEncodeParam params = new TIFFEncodeParam();
OutputStream out = new FileOutputStream(output_dir + image_name + ".tif");
ImageEncoder encoder = ImageCodec.createImageEncoder("tiff", out, params);
Vector vector = new Vector();
for (int i = 0; i < numImages; i++) {
vector.add(image[i]);
}
params.setExtraImages(vector.listIterator(1)); // this may need a check to avoid IndexOutOfBoundsException when vector is empty
encoder.encode(image[0]);
out.close();
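As an alternative to JAI, Java 9 and later ship a TIFF plugin in the standard javax.imageio package, which can write an image sequence into one file. The following is a minimal sketch of that different technique, not the JAI approach shown above; the class TiffMerge and its methods are illustrative names, and the demo in main uses blank in-memory images where real code would load each page with ImageIO.read(file).

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;

class TiffMerge {

    /** Writes all pages into one multipage TIFF (requires Java 9+ for the TIFF plugin). */
    static void writeMultipage(List<BufferedImage> pages, File output) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("tiff").next();
        output.delete(); // start from an empty file so no stale bytes remain
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(output)) {
            writer.setOutput(ios);
            writer.prepareWriteSequence(null);
            for (BufferedImage page : pages) {
                writer.writeToSequence(new IIOImage(page, null, null), null);
            }
            writer.endWriteSequence();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
    }

    /** Counts the images stored in a TIFF file. */
    static int countPages(File tiff) {
        ImageReader reader = ImageIO.getImageReadersByFormatName("tiff").next();
        try (ImageInputStream iis = ImageIO.createImageInputStream(tiff)) {
            reader.setInput(iis);
            return reader.getNumImages(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) {
        // Demo pages; in practice, load each one with ImageIO.read(new File(...))
        BufferedImage blank = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        File out = new File(System.getProperty("java.io.tmpdir"), "merged-demo.tif");
        writeMultipage(Arrays.asList(blank, blank, blank, blank, blank), out);
        System.out.println("Pages written: " + countPages(out));
    }
}
```

Because the pages are written one at a time, this variant does not need the extra-images iterator that the JAI encoder uses.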