I want to merge PDF files; around 20 files of 60 MB each. I am using the iText API to merge them.
The problem is that I have to complete the merge within 2 seconds, but my code is taking 8 seconds.
Is there any way to speed up merging the PDF files?
private void mergeFiles(List<String> filesToBeMerged, String mergedFilePath) throws Exception {
    Document document = null;
    PdfCopy copy = null;
    PdfReader reader = null;
    BufferedOutputStream bos = null;
    int bufferSize = 8 * 1024 * 1024;
    String pdfLocation = "C:\\application\\projectone-working\\projectone\\web\\pdf\\";
    try {
        int fileIndex = 0;
        for (String file : filesToBeMerged) {
            reader = new PdfReader(pdfLocation + file);
            reader.consolidateNamedDestinations();
            int totalPages = reader.getNumberOfPages();
            // Create the output document from the first input file
            if (fileIndex == 0) {
                document = new Document(reader.getPageSizeWithRotation(1));
                bos = new BufferedOutputStream(new FileOutputStream(mergedFilePath), bufferSize);
                copy = new PdfCopy(document, bos);
                document.open();
            }
            // Copy every page of the current file into the output
            PdfImportedPage page;
            for (int currentPage = 1; currentPage <= totalPages; currentPage++) {
                page = copy.getImportedPage(reader, currentPage);
                copy.addPage(page);
            }
            PRAcroForm form = reader.getAcroForm();
            if (form != null) {
                copy.copyAcroForm(reader);
            }
            fileIndex++;
        }
        document.close();
    } finally {
        if (reader != null) {
            reader.close();
        }
        if (bos != null) {
            bos.flush();
            bos.close();
        }
        if (copy != null) {
            copy.close();
        }
    }
}
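For what it's worth, a hedged sketch of one direction (my addition, assuming iText 5.x): most of the time for inputs this size goes into reading and writing roughly 1.2 GB of data, so large gains are not guaranteed, but releasing each reader with PdfCopy.freeReader(...) as soon as its pages are copied keeps memory pressure down while merging many large files.

// Sketch only (assumes iText 5.x): same merge loop, but each reader's objects are
// flushed to the output with freeReader() and the reader is closed before the next
// file is opened, so only one large source file is held open at a time.
private void mergeFilesSketch(List<String> filesToBeMerged, String mergedFilePath) throws Exception {
    Document document = null;
    PdfCopy copy = null;
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(mergedFilePath), 8 * 1024 * 1024);
    try {
        for (String file : filesToBeMerged) {
            PdfReader reader = new PdfReader(file);
            reader.consolidateNamedDestinations();
            if (document == null) {
                document = new Document(reader.getPageSizeWithRotation(1));
                copy = new PdfCopy(document, bos); // PdfSmartCopy is a drop-in alternative
                document.open();
            }
            for (int pageNo = 1; pageNo <= reader.getNumberOfPages(); pageNo++) {
                copy.addPage(copy.getImportedPage(reader, pageNo));
            }
            copy.freeReader(reader); // write this reader's data to the output now and release it
            reader.close();
        }
        if (document != null) {
            document.close(); // also finishes the PdfCopy output
        }
    } finally {
        bos.close();
    }
}

PdfSmartCopy, used in place of PdfCopy, can additionally de-duplicate resources shared between the inputs; that tends to help output size more than speed.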
I am trying to count the number of attachments on a PDF to verify our attachment code. The code I have works most of the time, but it recently started failing as the number and size of the attachments went up. Example: I have a PDF with 700 attachments totaling 1.6 GB, and another with 65 attachments of around 10 MB. The 65-attachment file was built up incrementally, file by file. At 64 files (about 9.8 MB) the routine counted fine; adding file 65 (about 0.5 MB) made the routine fail.
This is on itextpdf-5.5.9.jar under jre1.8.0_162.
We are still testing different combinations of file counts and sizes to see where it breaks.
private static String CountFiles() throws IOException, DocumentException {
    boolean errorFound = true;
    PdfDictionary root;
    PdfDictionary names;
    PdfDictionary embeddedFiles;
    PdfReader reader = null;
    String theResult = "unknown";
    try {
        if (!theBaseFile.toLowerCase().endsWith(".pdf"))
            theResult = "file not PDF";
        else {
            reader = new PdfReader(theBaseFile);
            root = reader.getCatalog();
            names = root.getAsDict(PdfName.NAMES);
            if (names == null)
                theResult = "0";
            else {
                embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
                PdfArray namesArray = embeddedFiles.getAsArray(PdfName.NAMES);
                theResult = String.format("%d", namesArray.size() / 2);
            }
            reader.close();
            errorFound = false;
        }
    } catch (Exception e) {
        theResult = "unknown";
    } finally {
        if (reader != null)
            reader.close();
    }
    if (errorFound)
        sendError(theResult);
    return theResult;
}
private static String AttachFileInDir() throws IOException, DocumentException {
    String theResult = "unknown";
    String outputFile = theBaseFile.replaceFirst("(?i)\\.pdf$", ".attach.pdf");
    int maxFiles = 1000;
    int fileCount = 1;
    PdfReader reader = null;
    PdfStamper stamper = null;
    try {
        if (!theBaseFile.toLowerCase().endsWith(".pdf"))
            theResult = "basefile not PDF";
        else if (theFileDir.length() == 0)
            theResult = "no attach directory";
        else if (!Files.isDirectory(Paths.get(theFileDir)))
            theResult = "invalid attach directory";
        else {
            reader = new PdfReader(theBaseFile);
            stamper = new PdfStamper(reader, new FileOutputStream(outputFile));
            stamper.getWriter().setPdfVersion(PdfWriter.VERSION_1_7);
            Path dir = FileSystems.getDefault().getPath(theFileDir);
            DirectoryStream<Path> stream = Files.newDirectoryStream(dir);
            for (Path path : stream) {
                stamper.addFileAttachment(null, null, path.toFile().toString(), path.toFile().getName());
                if (++fileCount > maxFiles) {
                    theResult = "maxfiles exceeded";
                    break;
                }
            }
            stream.close();
            stamper.close();
            reader.close();
            theResult = "SUCCESS";
        }
    } catch (Exception e) {
        theResult = "unknown";
    } finally {
        if (stamper != null)
            stamper.close();
        if (reader != null)
            reader.close();
    }
    if (!"SUCCESS".equals(theResult))
        sendError(theResult);
    return theResult;
}
I expect a simple count of attachments back. What seems to be happening is that namesArray comes back null, so the result stays "unknown". I suspect the names array is trying to hold all the files and choking on the size.
Note: the files are attached using the AttachFileInDir procedure above. Dump all the files in a directory and run AttachFileInDir. And yes, the error trapping in AttachFileInDir needs work.
Any help would be appreciated, or another method is welcome.
I finally got it. It turns out each KID is a dictionary of NAMES.
Each NAMES array holds 64 file references. At 65 files and up, the document uses a KIDS array of such NAMES dictionaries instead. So 279 files = (8*64 + 46)/2, with 9 elements in the KIDS array in total.
One thing I had to compensate for: if one deletes all the attachments from a PDF, it leaves artifacts behind, as opposed to a PDF that never had an attachment.
private static String CountFiles() throws IOException, DocumentException {
    boolean errorFound = true;
    int totalFiles = 0;
    PdfArray filesArray;
    PdfDictionary root;
    PdfDictionary names;
    PdfDictionary embeddedFiles;
    PdfReader reader = null;
    String theResult = "unknown";
    try {
        if (!theBaseFile.toLowerCase().endsWith(".pdf"))
            theResult = "file not PDF";
        else {
            reader = new PdfReader(theBaseFile);
            root = reader.getCatalog();
            names = root.getAsDict(PdfName.NAMES);
            if (names == null) {
                theResult = "0";
                errorFound = false;
            } else {
                embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
                filesArray = embeddedFiles.getAsArray(PdfName.NAMES);
                if (filesArray != null)
                    totalFiles = filesArray.size();
                else {
                    // No flat NAMES array: the name tree was split into KIDS,
                    // each of which carries its own NAMES array.
                    filesArray = embeddedFiles.getAsArray(PdfName.KIDS);
                    if (filesArray != null) {
                        for (int i = 0; i < filesArray.size(); i++)
                            totalFiles += filesArray.getAsDict(i).getAsArray(PdfName.NAMES).size();
                    }
                }
                theResult = String.format("%d", totalFiles / 2);
                reader.close();
                errorFound = false;
            }
        }
    } catch (Exception e) {
        theResult = "unknown: " + e.getMessage();
    } finally {
        if (reader != null)
            reader.close();
    }
    if (errorFound)
        sendError(theResult);
    return theResult;
}
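For reference, a hedged alternative (my addition, under the assumption that com.itextpdf.text.pdf.PdfNameTree is available in itextpdf-5.5.9): its readTree(...) helper flattens the whole EmbeddedFiles name tree, including KIDS nested more than one level deep, so the count no longer depends on how the tree happens to be balanced.

// Sketch only: assumes com.itextpdf.text.pdf.PdfNameTree, PdfObject, and java.util.HashMap
// are imported. readTree() flattens the EmbeddedFiles name tree (NAMES plus nested KIDS)
// into a map of name -> file specification, so its size is the attachment count.
private static int countEmbeddedFiles(PdfReader reader) {
    PdfDictionary root = reader.getCatalog();
    PdfDictionary names = root.getAsDict(PdfName.NAMES);
    if (names == null)
        return 0;
    PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
    if (embeddedFiles == null)
        return 0;
    HashMap<String, PdfObject> tree = PdfNameTree.readTree(embeddedFiles);
    return tree.size();
}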
I am basing my code on this:
https://github.com/Betel-Flowers/BetelFlowers/blob/master/BetelFlowers-ejb/src/main/java/com/betel/flowers/pdf/util/RemoveBlankPageFromPDF.java
or this:
http://www.rgagnon.com/javadetails/java-detect-and-remove-blank-page-in-pdf.html
I am trying to use a byte array as input and a byte array as output.
This is my code:
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfDictionary;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.RandomAccessFileOrArray;

public class RemoveBlankPageFromPDF {

    // value where we can consider that this is a blank image
    // can be much higher or lower depending on what is considered a blank page
    public static final int BLANK_THRESHOLD = 160;

    public static byte[] removeBlankPdfPages(byte[] fuente) throws IOException, DocumentException {
        PdfReader r = null;
        RandomAccessFileOrArray raf = null;
        Document document = null;
        PdfCopy writer = null;
        ByteArrayOutputStream archivoFinal = new ByteArrayOutputStream();
        try {
            r = new PdfReader(fuente);
            raf = new RandomAccessFileOrArray(fuente);
            document = new Document(r.getPageSizeWithRotation(1));
            writer = new PdfCopy(document, archivoFinal);
            document.open();
            PdfImportedPage page = null;
            for (int i = 1; i <= r.getNumberOfPages(); i++) {
                PdfDictionary pageDict = r.getPageN(i);
                PdfDictionary resDict = (PdfDictionary) pageDict.get(PdfName.RESOURCES);
                boolean noFontsOrImages = true;
                if (resDict != null) {
                    noFontsOrImages = resDict.get(PdfName.FONT) == null
                            && resDict.get(PdfName.XOBJECT) == null;
                }
                if (!noFontsOrImages) {
                    byte bContent[] = r.getPageContent(i, raf);
                    ByteArrayOutputStream bs = new ByteArrayOutputStream();
                    bs.write(bContent);
                    System.out.println("bs size: " + bs.size());
                    if (bs.size() > BLANK_THRESHOLD) {
                        page = writer.getImportedPage(r, i);
                        writer.addPage(page);
                    }
                }
            }
            System.out.println("Original: " + fuente.length + " new: " + archivoFinal.toByteArray().length);
            return archivoFinal.toByteArray();
        } finally {
            if (document != null) {
                document.close();
            }
            if (writer != null) {
                writer.close();
            }
            if (raf != null) {
                raf.close();
            }
            if (r != null) {
                r.close();
            }
        }
    }
}
My PDF gets corrupted; I cannot open it afterwards.
Even with a PDF that has no blank pages I get different sizes, and they should be the same:
Original: 95089 New: 88129
That is the output from my last sysout.
I am using iText 2.1.5 and Java 1.5, by the way; I cannot upgrade.
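A side observation, not taken from the original thread: in removeBlankPdfPages above, return archivoFinal.toByteArray() is evaluated before the finally block calls document.close(), so the returned bytes are missing whatever close() still writes (cross-reference table and trailer), which alone is enough to produce an unopenable PDF. A minimal sketch of the reordering (a fragment of the method above, not a standalone program):

// Sketch: finish the document before taking the bytes, so the xref table and
// trailer written by close() end up in the returned array.
document.close();   // completes the PdfCopy output into archivoFinal
byte[] result = archivoFinal.toByteArray();
return result;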
Anyway, I found an answer, just in case somebody needs it for an older version of iText:
public static void removeBlankPdfPages(PdfReader r) throws IOException {
    PdfTextExtractor extractor = new PdfTextExtractor(r);
    List<Integer> paginas = new ArrayList<Integer>();
    for (int i = 1; i <= r.getNumberOfPages(); i++) {
        PdfDictionary pageDict = r.getPageN(i);
        PdfDictionary resDict = (PdfDictionary) pageDict.get(PdfName.RESOURCES);
        boolean noFontsOrImages = true;
        if (resDict != null) {
            noFontsOrImages = resDict.get(PdfName.FONT) == null
                    && resDict.get(PdfName.XOBJECT) == null;
        }
        if (!noFontsOrImages) {
            // Keep the page only if it also yields a reasonable amount of text
            String textFromPage = extractor.getTextFromPage(i);
            if (textFromPage.length() > 50) {
                paginas.add(i);
            }
        }
    }
    // The reader now only exposes the pages considered non-blank
    r.selectPages(paginas);
}
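To actually get the byte-array-in/byte-array-out flow from this, the trimmed reader still has to be written out somewhere. A hedged sketch (my addition, assuming iText 2.1.5's com.lowagie.text.pdf.PdfStamper):

// Sketch only: persists the reader after removeBlankPdfPages(...) has run on it.
// PdfStamper writes out the reader's current state, i.e. only the selected pages.
public static byte[] toBytes(PdfReader r) throws IOException, DocumentException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    PdfStamper stamper = new PdfStamper(r, out);
    stamper.close(); // finishes writing the PDF into the byte stream
    return out.toByteArray();
}

So the overall flow would be: build a PdfReader from the input bytes, call removeBlankPdfPages(reader), then return toBytes(reader).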
I'm merging multiple files, which originally total 19 MB.
But the result is a total of 56 MB. How can I make this final value approach the original 19 MB?
[EDIT]
public void concatena(InputStream anterior, InputStream novo, OutputStream saida, List<String> marcadores)
        throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.setDestinationStream(saida);
    PDDocument dest;
    PDDocument src;
    MemoryUsageSetting setupMainMemoryOnly = MemoryUsageSetting.setupMainMemoryOnly();
    if (anterior != null) {
        dest = PDDocument.load(anterior, setupMainMemoryOnly);
        src = PDDocument.load(novo, setupMainMemoryOnly);
    } else {
        dest = PDDocument.load(novo, setupMainMemoryOnly);
        src = new PDDocument();
    }
    int totalPages = dest.getNumberOfPages();
    pdfMerger.appendDocument(dest, src);
    criaMarcador(dest, totalPages, marcadores);
    saida = pdfMerger.getDestinationStream();
    dest.save(saida);
    dest.close();
    src.close();
}
Sorry, I still do not know how to use Stack Overflow very well. I was trying to post the rest of the code but I was getting an error.
[Edit 2 - add criaMarcador method]
private void criaMarcador(PDDocument src, int numPaginas, List<String> marcadores) {
    if (marcadores != null && !marcadores.isEmpty()) {
        PDDocumentOutline documentOutline = src.getDocumentCatalog().getDocumentOutline();
        if (documentOutline == null) {
            documentOutline = new PDDocumentOutline();
        }
        PDPage page;
        if (src.getNumberOfPages() == numPaginas) {
            page = src.getPage(0);
        } else {
            page = src.getPage(numPaginas);
        }
        PDOutlineItem bookmark = null;
        PDOutlineItem pai = null;
        String etiquetaAnterior = null;
        for (String etiqueta : marcadores) {
            bookmark = bookmark(pai != null ? pai : documentOutline, etiqueta);
            if (bookmark == null) {
                if (etiquetaAnterior != null && !etiquetaAnterior.equals(etiqueta) && pai == null) {
                    pai = bookmark(documentOutline, etiquetaAnterior);
                }
                bookmark = new PDOutlineItem();
                bookmark.setTitle(etiqueta);
                if (marcadores.indexOf(etiqueta) == marcadores.size() - 1) {
                    bookmark.setDestination(page);
                }
                if (pai != null) {
                    pai.addLast(bookmark);
                    pai.openNode();
                } else {
                    documentOutline.addLast(bookmark);
                }
            } else {
                pai = bookmark;
            }
            etiquetaAnterior = etiqueta;
        }
        src.getDocumentCatalog().setDocumentOutline(documentOutline);
    }
}
private PDOutlineItem bookmark(PDOutlineNode outline, String etiqueta) {
    PDOutlineItem current = outline.getFirstChild();
    while (current != null) {
        if (current.getTitle().equals(etiqueta)) {
            return current;
        }
        bookmark(current, etiqueta);
        current = current.getNextSibling();
    }
    return current;
}
[Edit 3] Here is the code used for testing:
public class PDFMergeTeste {

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            PDFMergeTeste teste = new PDFMergeTeste();
            teste.executa(args[0]);
        } else {
            System.err.println("Argumento tem que ser diretorio contendo arquivos .pdf com nomeclatura no padrão Autos");
        }
    }

    private void executa(String diretorioArquivos) throws IOException {
        File[] listFiles = new File(diretorioArquivos).listFiles((pathname) ->
                pathname.getName().endsWith(".pdf") || pathname.getName().endsWith(".PDF"));
        List<File> lista = Arrays.asList(listFiles);
        lista.sort(Comparator.comparing(File::lastModified));
        PDFMerge merge = new PDFMerge();
        InputStream anterior = null;
        ByteArrayOutputStream saida = new ByteArrayOutputStream();
        for (File file : lista) {
            List<String> marcadores = marcadores(file.getName());
            InputStream novo = new FileInputStream(file);
            merge.concatena(anterior, novo, saida, marcadores);
            anterior = new ByteArrayInputStream(saida.toByteArray());
        }
        try (OutputStream pdf = new FileOutputStream(pathDestFile)) {
            saida.writeTo(pdf);
        }
    }

    private List<String> marcadores(String name) {
        String semExtensao = name.substring(0, name.indexOf(".pdf"));
        return Arrays.asList(semExtensao.split("_"));
    }
}
The error is in the executa method:
InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    List<String> marcadores = marcadores(file.getName());
    InputStream novo = new FileInputStream(file);
    merge.concatena(anterior, novo, saida, marcadores);
    anterior = new ByteArrayInputStream(saida.toByteArray());
}
Your ByteArrayOutputStream saida is re-used in each loop iteration but it is not cleared in between. Thus, it contains
after processing file 1:
    file 1
after processing file 2:
    file 1
    concatenation of file 1 and file 2
after processing file 3:
    file 1
    concatenation of file 1 and file 2
    concatenation of file 1 and file 2 and file 3
after processing file 4:
    file 1
    concatenation of file 1 and file 2
    concatenation of file 1 and file 2 and file 3
    concatenation of file 1 and file 2 and file 3 and file 4
(Actually this only works because PDFBox tries to be nice and fixes broken input files under the hood, as these concatenations of files are, strictly speaking, broken PDFs and PDFBox is not required to be able to parse them.)
You can fix this by clearing saida at the start of each iteration:
InputStream anterior = null;
ByteArrayOutputStream saida = new ByteArrayOutputStream();
for (File file : lista) {
    saida.reset();
    List<String> marcadores = marcadores(file.getName());
    InputStream novo = new FileInputStream(file);
    merge.concatena(anterior, novo, saida, marcadores);
    anterior = new ByteArrayInputStream(saida.toByteArray());
}
With your original method the result size for your inputs is nearly 26 MB; with the fixed method it is about 5 MB, and that latter size approximately matches the sum of the sizes of the input files.
Hello, I have been writing an updater for my game.
1) It checks a .version file on Dropbox and compares it to the local .version file.
2) If any link is missing from the local version of the file, it downloads the required links one by one.
The issue I am having is that some of the users can download the zips and some cannot.
One of my users who was having the issue was using Windows XP, so some of them have old computers.
I was wondering if anyone could help me get an idea of what could be causing this.
This is the main method that is run:
public void UpdateStart() {
    System.out.println("Starting Updater..");
    if (new File(cache_dir).exists() == false) {
        System.out.print("Creating cache dir.. ");
        while (new File(cache_dir).mkdir() == false);
        System.out.println("Done");
    }
    try {
        version_live = new Version(new URL(version_file_live));
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
    version_local = new Version(new File(version_file_local));
    Version updates = version_live.differences(version_local);
    System.out.println("Updated");
    int i = 1;
    try {
        byte[] b = null, data = null;
        FileOutputStream fos = null;
        BufferedWriter bw = null;
        for (String s : updates.files) {
            if (s.equals(""))
                continue;
            System.out.println("Reading file " + s);
            text = "Downloading file " + i + " of " + updates.files.size();
            b = readFile(new URL(s));
            progress_a = 0;
            progress_b = b.length;
            text = "Unzipping file " + i++ + " of " + updates.files.size();
            ZipInputStream zipStream = new ZipInputStream(new ByteArrayInputStream(b));
            File f = null, parent = null;
            ZipEntry entry = null;
            int read = 0, entry_read = 0;
            long entry_size = 0;
            progress_b = 0;
            // First pass: sum the entry sizes for the progress total
            while ((entry = zipStream.getNextEntry()) != null)
                progress_b += entry.getSize();
            // Second pass: extract the entries
            zipStream = new ZipInputStream(new ByteArrayInputStream(b));
            while ((entry = zipStream.getNextEntry()) != null) {
                f = new File(cache_dir + entry.getName());
                if (entry.isDirectory())
                    continue;
                System.out.println("Making file " + f.toString());
                parent = f.getParentFile();
                if (parent != null && !parent.exists()) {
                    System.out.println("Trying to create directory " + parent.getAbsolutePath());
                    while (parent.mkdirs() == false);
                }
                entry_read = 0;
                entry_size = entry.getSize();
                data = new byte[1024];
                fos = new FileOutputStream(f);
                while (entry_read < entry_size) {
                    read = zipStream.read(data, 0, (int) Math.min(1024, entry_size - entry_read));
                    entry_read += read;
                    progress_a += read;
                    fos.write(data, 0, read);
                }
                fos.close();
            }
            bw = new BufferedWriter(new FileWriter(new File(version_file_local), true));
            bw.write(s);
            bw.newLine();
            bw.close();
        }
    } catch (Exception e) {
        this.e = e;
        e.printStackTrace();
        return;
    }
    System.out.println(version_live);
    System.out.println(version_local);
    System.out.println(updates);
    try {
    } catch (Exception er) {
        er.printStackTrace();
    }
}
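One thing worth double-checking, offered as an assumption rather than a confirmed diagnosis: ZipEntry.getSize() may return -1 when the uncompressed size is not recorded in the entry header, and the extraction loop above relies on that size both for the progress total and for deciding how many bytes to read, so such entries would be skipped or mishandled. A sketch of an extraction loop that does not depend on the reported entry size:

// Sketch: extract the current ZipInputStream entry without relying on entry.getSize(),
// reading until the entry's end-of-stream instead.
private static void extractEntry(ZipInputStream zipStream, File target) throws IOException {
    byte[] data = new byte[1024];
    FileOutputStream fos = new FileOutputStream(target);
    try {
        int read;
        // read() returns -1 at the end of the current entry
        while ((read = zipStream.read(data, 0, data.length)) != -1) {
            fos.write(data, 0, read);
        }
    } finally {
        fos.close();
    }
}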
I have been trying to fix this for the last two days and I am just stumped at this point.
All the best,
Christian
I'm trying to unGzip and unTar an InputStream in Java. I have these methods:
public InputStream unTar(InputStream in) throws IOException {
    TarInputStream myTarStream = new TarInputStream(in);
    TarEntry entry = myTarStream.getNextEntry();
    InputStream input = null;
    while (entry != null) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] buff = new byte[1024];
        int read;
        do {
            read = myTarStream.read(buff);
            if (read != -1) {
                output.write(buff, 0, read);
            }
        } while (read != -1);
        output.flush();
        input = new ByteArrayInputStream(output.toByteArray());
        entry = myTarStream.getNextEntry();
    }
    myTarStream.close();
    return input;
}
public InputStream unGzipIt(InputStream in) {
    byte[] buffer = new byte[1024];
    InputStream outGZIPStream = null;
    try {
        ByteArrayOutputStream bytesOutput = new ByteArrayOutputStream();
        GZIPInputStream gis = new GZIPInputStream(in);
        int len;
        while ((len = gis.read(buffer)) > 0) {
            bytesOutput.write(buffer, 0, len);
        }
        in.close();
        bytesOutput.close();
        outGZIPStream = new ByteArrayInputStream(bytesOutput.toByteArray());
    } catch (IOException e) {
        e.printStackTrace();
    }
    return outGZIPStream;
}
The problem is that when I pass an InputStream from a file on my local machine it works, but when I pass the InputStream from my server response it does not.
Should I use reset and mark? Any help? Thank you.
This is how I'm getting the InputStream:
public InputStream getFolder(@PathParam("id") String envId, @PathParam("appName") String appName,
        @PathParam("imageType") String imageType, @QueryParam("folderPath") String folderPath)
        throws EnvAutomationException, IOException {
    Environment env = Envs.getEnvironmentManager().findEnvironment(envId);
    ApplicationInstance appInst = env.getApplicationInstance(appName);
    Container container = appInst.getContainer(imageType);
    InputStream folderData = Envs.getContainerizationManager().getFolder(container, folderPath);
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    byte[] buff = new byte[1024];
    int size = 0;
    int read;
    do {
        read = folderData.read(buff);
        if (read != -1) {
            size = size + read;
            output.write(buff, 0, read);
        }
    } while (read != -1);
    output.flush();
    byte[] bo = output.toByteArray();
    InputStream input = new ByteArrayInputStream(bo);
    InputStream inputGZIP = gzipIt(input);
    return inputGZIP;
}
Since it's a .tar file, I gzip it, and this is the method used to gzip:
public InputStream gzipIt(InputStream source) {
    byte[] buffer = new byte[1024];
    InputStream outGZIPStream = null;
    try {
        ByteArrayOutputStream bytesOutput = new ByteArrayOutputStream();
        GZIPOutputStream gzos = new GZIPOutputStream(bytesOutput);
        int len;
        int size = 0;
        while ((len = source.read(buffer)) > 0) {
            gzos.write(buffer, 0, len);
            size = size + len;
        }
        source.close();
        gzos.close();
        byte[] bo = bytesOutput.toByteArray();
        outGZIPStream = new ByteArrayInputStream(bytesOutput.toByteArray());
        logger.info("folder tar size :" + size + " ; folder gzip size " + bo.length);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return outGZIPStream;
}
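On the receiving side, an alternative worth considering (my sketch, assuming the same TarInputStream/TarEntry classes used in unTar above plus java.util.zip.GZIPInputStream): chain the gunzip and untar steps directly on the response stream instead of buffering everything into byte arrays first. This also removes any dependence on mark/reset support in the server stream; whether it resolves the server case depends on what exactly that stream delivers, so this is an option to try rather than a diagnosis.

// Sketch only: the tar classes are whichever library the question's unTar method
// already uses; the point is chaining GZIP decompression and tar extraction in one pass.
public void unGzipAndUnTar(InputStream serverResponse, File targetDir) throws IOException {
    TarInputStream tar = new TarInputStream(new GZIPInputStream(serverResponse));
    try {
        TarEntry entry;
        byte[] buff = new byte[1024];
        while ((entry = tar.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            File out = new File(targetDir, entry.getName());
            out.getParentFile().mkdirs();
            FileOutputStream fos = new FileOutputStream(out);
            try {
                int read;
                while ((read = tar.read(buff)) != -1) {
                    fos.write(buff, 0, read);
                }
            } finally {
                fos.close();
            }
        }
    } finally {
        tar.close();
    }
}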