Invalid header signature; IOException with Apache POI on excel document - java

I'm getting:
java.io.IOException: Invalid header signature; read
0x000201060000FFFE, expected 0xE11AB1A1E011CFD0
when trying to add some custom properties to an Excel document using Apache POI HPSF.
I'm completely sure the file is Excel OLE2 (not HTML, XML or something else that Excel doesn't complain about).
This is a relevant part of my code:
try {
    final POIFSFileSystem poifs = new POIFSFileSystem(event.getStream());
    final DirectoryEntry dir = poifs.getRoot();
    final DocumentEntry dsiEntry = (DocumentEntry)
        dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
    final DocumentInputStream dis = new DocumentInputStream(dsiEntry);
    final PropertySet props = new PropertySet(dis);
    dis.close();
    dsi = new DocumentSummaryInformation(props);
}
catch (Exception ex) {
    throw new RuntimeException
        ("Cannot create POI SummaryInformation for event: " + event +
        ", path:" + event.getPath() +
        ", name:" + event.getName() +
        ", cause:" + ex);
}
I get the same error when trying with Word and PowerPoint files (also OLE2).
I'm completely out of ideas so any help/pointers are greatly appreciated :)

If you flip the signature number round, you'll see the bytes of the start of your file:
0x000201060000FFFE -> 0xFE 0xFF 0x00 0x00 0x06 0x01 0x02 0x00
The first two bytes look like a Unicode byte order mark: 0xFE 0xFF at the start of a file is the UTF-16 big-endian BOM. They are followed by some low control bytes, though (read as 16-bit little-endian values: 0, then 262, then 2), so maybe it isn't a text file after all.
That file really isn't an OLE2 file, and POI is right to give you the error. I don't know what it is, but I'm guessing that it might be part of an OLE2 file without its outer OLE2 wrapper. If you can open it with Office, do a save-as and POI should be able to open the result. As it stands, that header isn't an OLE2 file header, so POI can't open it for you.
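The byte flipping described above can be sketched in plain Java. This is only an illustration and not part of POI: the class and method names (SignatureBytes, toFileOrder, toHex) are made up for this example, and it assumes, as the answer does, that the number in the exception is the first 8 bytes of the file read as a little-endian long.

```java
class SignatureBytes {
    /** Returns the 8 bytes in file order for a signature that was read as a little-endian long. */
    public static byte[] toFileOrder(long signature) {
        byte[] bytes = new byte[8];
        for (int i = 0; i < 8; i++) {
            // The lowest-order byte of the long is the first byte of the file.
            bytes[i] = (byte) (signature >>> (8 * i));
        }
        return bytes;
    }

    /** Formats bytes as space-separated hex, e.g. "0xFE 0xFF". */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Signature from the exception in the question.
        System.out.println(toHex(toFileOrder(0x000201060000FFFEL)));
        // Signature POI expects for an OLE2 file.
        System.out.println(toHex(toFileOrder(0xE11AB1A1E011CFD0L)));
    }
}
```

The second line of output is the byte sequence a real OLE2 file starts with (0xD0 0xCF 0x11 0xE0 ...), which you can compare against a hex dump of your own file.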

In my case, the file was a CSV file saved with the .xls extension. Excel was able to open it without a problem, but POI was not.
If I find a better/more general solution, I'll come back and write it up here.

Try saving it as a CSV file directly and use opencsv for your operations.
See the following link to learn about opencsv:
http://opencsv.sourceforge.net/#what-is-opencsv
Excel can open a csv, an xls, or even an HTML table saved as xls.
So you can save the file as file_name.csv and use opencsv to read the file in your code.
Alternatively, open the file once in Excel and use Save As to save it as an Excel 97-2003 workbook.
Then POI itself can read the file :-)

This happens because you saved your file with Excel 2013. Use Save As to save the file in the Excel 97-2003 format.

I had the same problem with an xls file generated by software. I am forced to re-save the files with Excel (in the same format) to be able to read them with Apache POI.

I was using a .xlsx file instead of .xls. We have to use a .xls file if we are using the Workbook, Sheet and Row classes.
My file was .xlsx, which caused this issue; after I changed it to .xls, it worked.

Related

I would like to create a copy of word or excel file using poi

I would like to create a copy of a Word or Excel file using POI.
I know that POI is also used for reading Word or Excel files. By reading I mean not only the values but also attributes such as font size, table color and the background color of each cell. After reading the values and attributes of an xlsx or docx document, I want to make a copy of the Word or Excel document as it is. Is there open-source code available anywhere that does this?
Look at Apache POI or docx4j for dealing with docx documents.
Techniques for adding text to a document are described at https://www.slideshare.net/plutext/document-generation-2012osdcsydney
You can use POI's HWPF support, which is often included with docx4j as a dependency. However, it is not an excellent approach, since it does not convert the doc to docx4j's internal representation: you are kind of stuck in HWPF land.
Alternatively, use JODConverter to convert the doc to a docx and, if necessary, back again. This is often the simplest approach.
To open an Excel workbook from one file and save it to another file, I use this code:
//open source excel
InputStream template = new FileInputStream("C:\\source excel path\\input.xlsx");
Workbook wb = WorkbookFactory.create(template);
//Saving excel to a different location or filename.
FileOutputStream out = new FileOutputStream("C:\\path to copy excel to\\output.xlsx");
wb.write(out);
wb.close();
out.close();
template.close();

how to know whether a file is .docx or .doc format from Apache POI

I know we can do it by extension or by MIME type. Is there any other way to find out whether a file is in .docx or .doc format?
If it is just a matter of deciding between .doc and .docx for a collection of files that are known to be one or the other but are not marked with an extension, you can use the fact that a .docx file is a zipped collection of files. Something along the following lines might help:
boolean isZip = new ZipInputStream( fileStream ).getNextEntry() != null;
where fileStream is whatever file or other input stream you wish to evaluate. You could further evaluate a zipped file by looking for key .docx entries. A good starting reference is Word Document (DOCX). Likewise, if you know it is just a binary file, you can test for Word's File Information Block (see Word (.doc) Binary File Format)
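As a rough sketch of "looking for key .docx entries", the snippet below scans the ZIP entries for word/document.xml. That entry name is an assumption: it is where Word conventionally stores the main document part, but a valid package could place it elsewhere. The class DocxSniffer and its helper zipWithEntry are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxSniffer {
    /** Returns true if the stream is a ZIP archive containing word/document.xml. */
    public static boolean looksLikeDocx(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null right away when the data is not a ZIP archive.
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: builds a tiny in-memory ZIP with a single entry of the given name. */
    public static byte[] zipWithEntry(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDocx(new ByteArrayInputStream(zipWithEntry("word/document.xml"))));
        System.out.println(looksLikeDocx(new ByteArrayInputStream(zipWithEntry("xl/workbook.xml"))));
        System.out.println(looksLikeDocx(new ByteArrayInputStream("plain text".getBytes(StandardCharsets.UTF_8))));
    }
}
```

Note that the method consumes and closes the stream, so open a fresh stream (or use mark/reset) before handing the file to a real parser.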
You could use Apache Tika for content detection. But you should be aware that this is a huge framework (many required dependencies) for such a small task.
There is a way, though it is not straightforward. With Apache POI, you can find out.
Try to read a .docx file using the HWPFDocument class. It gives you the following error:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied
data appears to be in the Office 2007+ XML. You are calling the part
of POI that deals with OLE2 Office Documents. You need to call a
different part of POI to process this data (eg XSSF instead of HSSF)
String filePath = "C:\\XXXX\\XXXX.docx";
try (FileInputStream inStream = new FileInputStream(new File(filePath))) {
    HWPFDocument doc = new HWPFDocument(inStream);
    WordExtractor wordExtractor = new WordExtractor(doc);
    System.out.println("Getting words" + wordExtractor.getText());
} catch (Exception e) {
    // HWPFDocument rejects .docx input with the OfficeXmlFileException shown above
    System.out.print("It's not a .doc format");
}
.docx can be read using XWPFDocument Class.
Why don't you use Apache Tika:
File file = new File("File Here");
Tika tika = new Tika();
String filetype = tika.detect(file);
System.out.println(filetype);
Assuming you're using Apache POI, you have a few options.
One is to grab the first few bytes of the file and ask POIFSFileSystem with the hasPOIFSHeader(byte[]) method. If you have a stream that supports mark/reset, you can instead use POIFSFileSystem.hasPOIFSHeader(InputStream). If those return true, try to open it as a .doc with HWPF; otherwise try as .docx with XWPF.
Otherwise, if you prefer a try/catch way, try to open it with POIFSFileSystem and catch OfficeXmlFileException: if it opens fine it's .doc, if you get the exception it's .docx.
If you look at the source code for WorkbookFactory, you'll see the first pattern in use; you can copy a similar set of logic from that.
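To make the header check concrete without depending on a particular POI version, here is a plain-Java sketch of the same idea: compare the first bytes of the file against the OLE2 signature and the ZIP signature. It is a stand-in for the POI call, not POI's implementation, and OfficeFormatSniffer, Kind and classify are hypothetical names. A ZIP result only says the file is some ZIP archive (.docx, .xlsx, or anything else zipped).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class OfficeFormatSniffer {
    enum Kind { OLE2, ZIP, UNKNOWN }

    // OLE2 compound file signature; read as a little-endian long this is the
    // 0xE11AB1A1E011CFD0 value that POI reports as "expected".
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature at the start of a typical ZIP archive.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a file by its leading bytes (pass at least the first 8). */
    public static Kind classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java OfficeFormatSniffer <file>");
            return;
        }
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n;
            // A single read() may return fewer bytes than requested, so loop.
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
        }
        System.out.println(classify(Arrays.copyOf(header, total)));
    }
}
```

With a result of OLE2 you would go on to HWPF, with ZIP to XWPF, and with UNKNOWN you can report a clear error instead of letting the parser fail.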

Apache POI HSSF XLS reading error

Using the following code while reading in a .xls file, where s is the file path:
InputStream input = new FileInputStream(s);
Workbook wbs = new HSSFWorkbook(input);
I get the following error message:
Exception in thread "main" java.io.IOException: Invalid header signature; read 0x0010000000060809, expected 0xE11AB1A1E011CFD0
I need a program that is able to read in either XLSX or XLS, and using the exact same code just adjusted for XSSF it has no problem at all reading in the XLSX file.
The Exception you're getting is one telling you that the file you're supplying isn't a valid Excel binary file, at least not a valid Excel file produced since about 1990. The exception you're getting tells you what POI expects, and that it found something else instead which wasn't a valid .xls file, and wasn't anything else POI can detect.
One thing to be aware of is that Excel opens a wide variety of different file formats, including .csv and .html. It's also not very picky about the file extension, so will happily open a CSV file that has been renamed to a .xls one. However, since renaming a .csv to a .xls doesn't magically change the format, POI still can't open it!
From the exception, I can tell what's happening, and I can also tell you're using an ancient version of Apache POI! A header signature of 0x0010000000060809 corresponds to the Excel 4 file format, from about 25 years ago! If you use a more recent version of Apache POI, it'll give you a helpful error message telling you that the file supplied is an old and largely unsupported Excel file. New versions of POI do include the OldExcelExtractor tool which can pull out some information from those ancient formats.
Otherwise, as with all exceptions of this type, try opening the file in Excel and doing a save-as. That will give you an idea of what the file currently is (eg .html saved as .xls, .csv saved as .xls etc), and will also let you re-save it as a proper .xls file for POI to load and work with.
If the file is in xlsx format instead of xls, you might get this error. I would try using the generic Workbook object (also called the SS Usermodel).
Check out the Workbook interface and the WorkbookFactory object. The factory should be able to create a generic Workbook for you out of either xlsx or xls.
I thought I had a good tutorial on this, but I can't seem to find it. I'll keep looking though.
Edit
I found this little tiny snippet from Apache's site about reading and rewriting using the SS Usermodel.
I hope this helps!
Invalid header signature; read 0x342E312D46445025, expected 0xE11AB1A1E011CFD0
Well, I got this error when I uploaded a corrupted xls/xlsx file (to produce one, I renamed sample.pdf to sample.xls). Add validation like this:
Workbook wbs = null;
try (InputStream input = new FileInputStream(s)) {
    wbs = new HSSFWorkbook(input);
} catch (IOException e) {
    // log "file is corrupted", show error message to user
}
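If you are curious what the rejected file really is, the signature in the message can often tell you. The sketch below (plain Java, with hypothetical names SignatureText and asAscii) renders the 8 signature bytes as ASCII, assuming the value in the exception is the start of the file read as a little-endian long. For the signature above it shows the PDF header of the renamed sample.pdf.

```java
class SignatureText {
    /** Renders the 8 signature bytes, in file order, as ASCII; non-printable bytes become '.'. */
    public static String asAscii(long signature) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 0; i < 8; i++) {
            // The lowest-order byte of the long is the first byte of the file.
            int b = (int) (signature >>> (8 * i)) & 0xFF;
            sb.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(asAscii(0x342E312D46445025L)); // prints "%PDF-1.4"
    }
}
```

A signature that decodes to readable text usually points to a text-based or otherwise unrelated format hiding behind an .xls extension.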

how to solve JXL error : jxl.read.biff.BiffException: Unable to recognize OLE stream

I am trying to get cell data from my .csv file, but I get this error:
jxl.read.biff.BiffException: Unable to recognize OLE stream
I don't understand how to solve this; please suggest a solution.
This code uses the JXL API. Does that API support .csv?
Code for reference:
public void read() throws IOException, BiffException {
    File inputWorkbook = new File(inputFile);
    try
    {
        w = Workbook.getWorkbook(inputWorkbook.getAbsoluteFile());
        // Get the first sheet
        Sheet sheet = w.getSheet(0);
        // Loop over first 10 column and lines
        for (row = 1; row < sheet.getRows(); row++)
        {
            ReadExcelLotSizeEntity readExcelLotSizeEntity = new ReadExcelLotSizeEntity();
            cell = sheet.getCell(1, row);
            type = cell.getType();
            if (cell.getType() == CellType.LABEL)
            {
                symbol = cell.getContents();
                System.out.println(":::::::::::::::::" + symbol);
                readExcelLotSizeEntity.setSymbol(symbol);
            }
            int col = 2;
            cell = sheet.getCell(col, row);
            while (!cell.getContents().equals("") || cell.getContents() != null)
            {
                System.out.println("||||||||||||||||" + cell.getContents());
                cell = sheet.getCell(col, row);
                col++;
            }
            lotSize = new Double(cell.getContents());
            readExcelLotSizeEntity.setLotSize(lotSize);
            readExcelLotSizeEntity.setCreateUserId(1L);
            readExcelLotSizeEntity.setCreateDtTm(new Date());
            readExcelLotSizeHome.persist(readExcelLotSizeEntity);
        }
    } catch (BiffException e) {
        e.printStackTrace();
    }
}
I was also facing this problem earlier. I googled and read this post and many others asking for a solution to this BiffException. I don't have an exact solution, but here is how I fixed my problem; perhaps it works for you too.
I was trying to read data from an Excel file saved in MS Office 2010 and I was getting this error. I saved the file in the Excel 97-2003 format and then read it without any problem. It may be the case that this problem occurs with the Office 2010 format but not with the Excel 97-2003 format.
I hope this will work in your case.
Saving File as "Excel 97-2003 Workbook" type solved my issue.
The JXL library does not support the .csv and .xlsx formats; .xlsx is the format used by Excel 2010. Hence, use the Excel 97-2003 format, which is .xls and is supported by the JXL library.
Alternatively, if you want to use the Excel 2010 format, use Apache POI (XSSFWorkbook) instead of JXL.
For the .csv format, look for CSV reader libraries.
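For a simple .csv file you may not even need a library. Below is a minimal sketch using only the JDK; the class SimpleCsv and its parse method are made up for this example. It assumes comma-separated values with no quoted fields or embedded commas; for anything beyond that, a dedicated CSV library is the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SimpleCsv {
    /** Splits each non-empty line on commas. Does NOT handle quoted fields. */
    public static List<String[]> parse(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                // Limit -1 keeps trailing empty cells such as "a,b,".
                rows.add(line.split(",", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Column names and values here are placeholders for the demo.
        List<String[]> rows = parse(new StringReader("id,symbol,lotSize\n1,ABC,100\n"));
        for (String[] row : rows) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

To read a real file, pass a java.io.FileReader (or Files.newBufferedReader) instead of the StringReader.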
JXL is a simple (and hence limited) API. If it says
Unable to recognize OLE stream
it is what it is. It doesn't quite understand your Excel XLS file. Have confidence that the error is legitimate. This API only supports *.xls files; it doesn't support, for example, *.csv or *.xlsx files. Obviously, having the file renamed to *.xls alone is not sufficient. It must be in Excel 97-2003 format too.
Copy all the cells from your *.csv or *.xlsx file.
Open MS Excel and paste the copied cells.
Save the file as MS Excel 97-2003 (*.xls) file.
After this, the error should not appear again.
On the other hand, if you want to process other formats (xlsx, csv) directly, look for other tools like Apache POI.
Save the Excel file as type Excel 97-2003 Workbook, with the extension xls.
Actually, your file is saved in a different format version. Please save it in the exact version the library expects.
For example, save the Excel sheet in the Excel 97-2003 format.
Save the file as Excel 97-2003 and also change the file extension from xlsx to xls in the code (in the file name).

JXL Won't Allow Excel 2007 Extensions

I am using JXL to write an Excel report. Everything works fine if I use the .XLS extension. However, when I use the .XLSM extension, the report fails to load when I try to open it. I get the message "Excel cannot open the file 'TestExcel.xlsm' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file". Below is a simplified version of what I am doing. As mentioned, it works if you change the filePath to use .XLS instead.
String filePath = "C:\\Users\\Rachel\\Desktop\\TestExcel.xlsm";
File excelFile = new File(filePath);
WritableWorkbook book = Workbook.createWorkbook(excelFile);
WritableSheet sheet = book.createSheet("Test Sheet", 0);
sheet.addCell(new Label(0, 0, "Testing..."));
book.write();
book.close();
I am using Excel 2007 to open the file. If I create a new workbook in Excel, save it as xlsm, close it, and then open it, Excel opens it correctly. Does anyone know how to make the JXL file open correctly? I cannot switch to Apache POI or anything, I have to use JXL.
