Unable to compress Excel file (XLSX or XLS) using Java

In a Java Spring MVC based application, I need to compress the size of an Excel file, because the application is unable to send it as an email attachment.
Can anyone please help?

The formats ending in 'x' (such as .xlsx) are already ZIP archives, so compressing them again gains little. Instead, you could unzip the Excel workbook and look for data that can be made smaller:
Images can be stored at a lower resolution.
The Excel XML can have less repetition of styles and attributes.
Repeated expressions can be reduced.
There may be embedded fonts that can be removed.
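To see which parts of an .xlsx are worth shrinking, you can list its ZIP entries with their sizes using only java.util.zip. This is a minimal sketch; the class name XlsxPartSizes is hypothetical and the record needs Java 16 or later. Large entries under xl/media/ are images, and an unusually large xl/styles.xml may hint at repeated styles.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Lists the parts of an .xlsx (a ZIP archive), largest uncompressed part first. */
final class XlsxPartSizes {

    /** One ZIP entry with its uncompressed and compressed sizes in bytes. */
    record Part(String name, long size, long compressedSize) {}

    static List<Part> listParts(Path xlsx) throws IOException {
        List<Part> parts = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                // Sizes come from the central directory (-1 if unknown)
                parts.add(new Part(e.getName(), e.getSize(), e.getCompressedSize()));
            }
        }
        parts.sort(Comparator.comparingLong(Part::size).reversed());
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java XlsxPartSizes report.xlsx
        for (Part p : listParts(Path.of(args[0]))) {
            System.out.printf("%10d %10d  %s%n", p.size(), p.compressedSize(), p.name());
        }
    }
}
```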

Here is a tutorial on how to zip single or multiple files in Java, which should give you what you need:
http://www.kscodes.com/java/how-to-compress-files-in-java/
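A minimal sketch along the lines of that tutorial, using only java.util.zip (the class name ZipSingleFile is hypothetical). Keep the first answer in mind: an .xlsx is already compressed, so re-zipping it usually saves little, whereas an older binary .xls often shrinks more.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipSingleFile {

    /** Writes source as the only entry of a new ZIP file at target. */
    static void zip(Path source, Path target) throws IOException {
        try (OutputStream fileOut = Files.newOutputStream(target);
             ZipOutputStream zipOut = new ZipOutputStream(fileOut);
             InputStream in = Files.newInputStream(source)) {
            zipOut.setLevel(Deflater.BEST_COMPRESSION); // spend more CPU time for a smaller file
            zipOut.putNextEntry(new ZipEntry(source.getFileName().toString()));
            in.transferTo(zipOut); // streams the file; it is never fully loaded into memory
            zipOut.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipSingleFile report.xls  ->  writes report.xls.zip
        Path source = Path.of(args[0]);
        Path target = Path.of(args[0] + ".zip");
        zip(source, target);
        System.out.printf("%d -> %d bytes%n", Files.size(source), Files.size(target));
    }
}
```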

Related

Optimize Apache POI .xls file append

Can someone please let me know if there is a memory-efficient way to append to .xls files? (The client insists on the .xls format for the report, and my research so far has been in vain.) All I could find is that to append to an existing .xls file, we first have to load the entire file into memory, append the data, and then write it back. Is that the only way? I can afford to trade time for lower memory consumption.
I am afraid that is not possible using Apache POI, and I doubt it is possible with other libraries. Even the Microsoft applications themselves always need to open the whole file to be able to work with it.
All of the Microsoft Office file formats have a complex internal structure similar to a file system, and the parts of that internal structure may have relations to each other. So one cannot simply stream data into those files and append to them, as is possible with plain text files, CSV files or single XML files, for example. One always needs to keep the complete file system and its relations valid, so the complete file system always needs to be known. And where should it be known if not in memory?
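For contrast, appending to a plain text or CSV file needs no knowledge of what is already in the file. A minimal sketch using only java.nio.file, assuming each row is already a correctly escaped CSV line (CsvAppend and appendRows are hypothetical names):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class CsvAppend {

    /** Appends rows (one line each) to a CSV file without reading its existing content. */
    static void appendRows(Path csv, List<String> rows) throws IOException {
        // CREATE + APPEND: only the new bytes are written, at the end of the file
        Files.write(csv, rows, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Nothing comparable exists for .xls or .xlsx, because there is no "end of the file" where a new row could simply be written.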
The modern Microsoft Office file formats are Office Open XML. These are ZIP archives containing an internal file system with a directory structure of XML files and other files. So one can reduce the memory footprint by reading data parts directly from that ZIP file system instead of unzipping it and reading all data into memory. This is what Apache POI attempts with XSSF and SAX (Event API). But this is for reading only.
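The idea of reading one part straight out of the ZIP can be sketched with the JDK alone (java.util.zip plus the StAX parser in javax.xml.stream). This illustrates the principle; it is not how Apache POI's XSSF event API is implemented. The part name xl/worksheets/sheet1.xml is an assumption: it is the usual name of the first sheet, but the authoritative mapping is in the workbook's relationship parts. SheetRowCounter is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SheetRowCounter {

    /** Counts <row> elements in one worksheet part by streaming only that ZIP entry. */
    static int countRows(Path xlsx, String partName) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(xlsx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                throw new IOException("No such part: " + partName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
                int rows = 0;
                try {
                    // Pull events one at a time; the sheet is never held in memory as a tree
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "row".equals(reader.getLocalName())) {
                            rows++;
                        }
                    }
                } finally {
                    reader.close(); // does not close the underlying stream; try-with-resources does
                }
                return rows;
            }
        }
    }
}
```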
For writing, one could keep parts of the data (single XML files) in temporary files to keep them out of memory, and then assemble the complete ZIP file system from those temporary files once all writing is complete. This is what SXSSF (Streaming Usermodel API) is designed to do. But this is for writing only.
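The two-phase write described here (stage a part on disk, then assemble the archive) can be illustrated with plain JDK classes. The hypothetical StagedZipWriter below is not SXSSF's code and does not produce a valid .xlsx: the simplified sheet XML and the missing [Content_Types].xml, workbook and relationship parts are deliberate omissions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class StagedZipWriter {

    /** Streams rowCount simple rows to a temp file, then copies that file into the ZIP as one entry. */
    static void write(Path target, String entryName, int rowCount) throws IOException {
        Path staged = Files.createTempFile("sheet", ".xml");
        try {
            // Phase 1: write rows one at a time; only the writer's buffer is in memory
            try (BufferedWriter w = Files.newBufferedWriter(staged, StandardCharsets.UTF_8)) {
                w.write("<worksheet><sheetData>");
                for (int r = 1; r <= rowCount; r++) {
                    w.write("<row r=\"" + r + "\"><c r=\"A" + r + "\"><v>" + r + "</v></c></row>");
                }
                w.write("</sheetData></worksheet>");
            }
            // Phase 2: assemble the archive from the staged part
            try (OutputStream fileOut = Files.newOutputStream(target);
                 ZipOutputStream zip = new ZipOutputStream(fileOut)) {
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(staged, zip);
                zip.closeEntry();
            }
        } finally {
            Files.deleteIfExists(staged);
        }
    }
}
```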
When it comes to appending data to an existing Microsoft Office file, none of the above is usable because, as already said, one always needs to keep the complete file system and its relations valid. So the whole file system always needs to be accessible in order to append data parts to it and update the relationships. One could think about keeping all data parts (single XML files) and relationship parts in temporary files to keep them out of memory. But I don't know of any library that does this (closed-source ones like Aspose might), and I doubt it would be possible in a performant way. So you would pay with time for a lower memory footprint.
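Even at the plain ZIP level, changing one part means producing a whole new archive. The hypothetical ZipPartReplacer below copies every other entry through unchanged and writes new bytes for one named part, streaming entry by entry so the archive is never fully in memory. It does not touch relationship parts or [Content_Types].xml, which a real append to an .xlsx would also have to update, so treat it as a sketch of the cost, not as a way to append rows.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipPartReplacer {

    /** Rewrites the whole archive, giving partName new content and copying all other entries. */
    static void replacePart(Path archive, String partName, byte[] newContent) throws IOException {
        Path tmp = Files.createTempFile(archive.toAbsolutePath().getParent(), "rewrite", ".tmp");
        try {
            try (ZipFile in = new ZipFile(archive.toFile());
                 OutputStream fileOut = Files.newOutputStream(tmp);
                 ZipOutputStream out = new ZipOutputStream(fileOut)) {
                Enumeration<? extends ZipEntry> entries = in.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry e = entries.nextElement();
                    out.putNextEntry(new ZipEntry(e.getName())); // fresh entry: sizes recomputed
                    if (e.getName().equals(partName)) {
                        out.write(newContent);
                    } else {
                        try (InputStream data = in.getInputStream(e)) {
                            data.transferTo(out);
                        }
                    }
                    out.closeEntry();
                }
            }
            // Swap only after the new archive is complete and the old one is closed
            Files.move(tmp, archive, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```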
The older Microsoft Office file formats are binary file systems, but they also have a complex internal structure. The individual parts are streams of binary records, which may also have relations to each other. So the main problem is the same as with Office Open XML.
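The two container families can be told apart by their first bytes: the binary formats (.xls) use the OLE2 Compound File Binary header D0 CF 11 E0 A1 B1 1A E1, while Office Open XML files (.xlsx) start with the ZIP local file header PK\3\4. A minimal sketch with a hypothetical class name; note that a ZIP signature only says the file is a ZIP archive, not that it is a valid Office Open XML package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class OfficeFormatSniffer {

    enum Container { ZIP, OLE2, UNKNOWN }

    // ZIP local file header "PK\3\4" (.xlsx, .docx, ...)
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 / Compound File Binary header (.xls, .doc, ...)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    static Container detect(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(8); // may return fewer bytes for tiny files
        }
        if (startsWith(head, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```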
There is the Event API (HSSF only), which reads single record streams, similar to the event API for Office Open XML. But, of course, this is for reading only.
There is no streaming approach for writing HSSF so far. The reason is that the old binary Excel worksheets only provide 65,536 rows and 256 columns, so the amount of data in one sheet cannot be that big, and a gigabyte-sized *.xls file should not occur at all. You should not use Excel as a data exchange format for database data; that is not what a spreadsheet application is made for.
But even if someone programmed a streaming approach for writing HSSF, this would not solve your problem, because there would still be nothing for appending data to an existing *.xls file, and the problems there are the same as with the Office Open XML file formats.
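The .xls sheet limits quoted in this answer put a hard bound on a single sheet: 65,536 rows × 256 columns = 16,777,216 cells. A tiny hypothetical helper that checks whether a data set fits in one .xls sheet, using only those two constants:

```java
final class XlsLimits {

    static final int MAX_ROWS = 65_536;  // rows per sheet in the binary .xls format
    static final int MAX_COLUMNS = 256;  // columns per sheet (A..IV)

    /** True if a block of rows x columns fits into a single .xls worksheet. */
    static boolean fitsInOneSheet(long rows, int columns) {
        return rows >= 0 && rows <= MAX_ROWS && columns >= 0 && columns <= MAX_COLUMNS;
    }

    /** Upper bound on cells in one sheet: 65,536 * 256. */
    static long maxCellsPerSheet() {
        return (long) MAX_ROWS * MAX_COLUMNS;
    }
}
```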

Get Size of Text/Font in a PDF File using PDFBox from Apache

The task is to write a Java class that analyzes a PDF file using Apache PDFBox.
The number of words, the number of images, the names of the fonts used, etc. are all no problem.
My problem is: how do I get all the font sizes used in the PDF file? I have read in many places that I have to use PDFTextStripper and writeString, but I don't see a solution.
So how do I get the font sizes in pt for a PDF file? Does anyone have an idea or solution?

How to verify text/content in thousands of PDF files

I want to automatically verify/assert a certain set of text or sentences in each PDF file. I have thousands of PDF files, each of which needs to be checked for whether a specific text/sentence is present in it.
You can do this by using Apache Lucene and Apache PDFBox.
Please refer to this post: http://www.programming-free.com/2012/11/simple-word-search-in-pdf-files-using.html

Insert Image into Excel - Java

I am working on a Java web application where I have to show an image in an Excel file.
I used the Java file iopo to write the image to the Excel file.
The issue is that when the user mails this file to a client, the image does not show up.
Is there any way to embed the image into the Excel file using Java, with or without an external API?
You can use Apache POI: http://poi.apache.org
See especially http://poi.apache.org/spreadsheet/quick-guide.html#Images
You must use an external library as there is no standard Excel API for Java.
You should use something along the lines of JExcelAPI.

What is the best way to display multiple PDF files via browser?

I'm developing a web application using Flex and JSP.
I am having some performance issues with displaying multiple PDF files.
I am trying to display about 50-100 PDF files. I know that is a little crazy.
Hence, I made the project to convert PDF files to JPG format and display the JPG files.
I'm wondering if there is a way to decrease the file size of PDF to size of JPG.
Additionally, I would like to seek other way that may improve the performance.
Does anyone know a good way to display many PDF files (that will be mostly just text) for web application? Or, should I just have it display JPG files?
If the PDF files are mostly text you should probably use HTML. Is there something that would prevent you from making regular pages from your PDFs?
You can convert the PDF to an RTF text file and use the text from the RTF file to populate your HTML page, perhaps in a table.
Check out the Ghostscript library for doing this conversion.
