Reading Excel Data Issue From DB (CLOB Column) in Java with POI - java

I have a question that looked hard at first glance but may have a very easy solution that I can't figure out yet. I need to read the binary data of an Excel file that is stored in an Oracle database CLOB column.
Reading the CLOB as a String in Java works fine. I get the Excel file's binary content in a String variable:
String respXLS = othRaw.getOperationData(); // here I get excel file
InputStream bais = new ByteArrayInputStream(respXLS.getBytes());
POIFSFileSystem fs = new POIFSFileSystem(bais);
HSSFWorkbook wb = new HSSFWorkbook(fs);
When I wrap those bytes in a ByteArrayInputStream and pass it to POIFSFileSystem as above, I get this exception:
java.io.IOException: Invalid header signature; read 0x00003F1A3F113F3F, expected 0xE11AB1A1E011CFD0
I googled some Excel problems, and they mention read access. So I downloaded the same Excel file to disk, changed nothing in it (I did not even open it), and read it with a FileInputStream given the file path. That worked flawlessly. So what is the reason?
Any advice or alternative way to read from CLOB will be appreciated.
Thanks in advance,
My Regards.

CLOB means Character Large OBject; you want to use a BLOB (Binary Large OBject). Change your database schema.
What happens is that a CLOB will use a Character Set to convert your String to/from the database internal format, whatever that is; this will cause file corruption on non-text contents.
Repeat after me: a String is not a byte[], and a character is not a byte.
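To make the point concrete, here is a minimal sketch that uses only the JDK. The class and method names are made up for illustration, and US-ASCII merely stands in for whatever character set the database or driver applies to the CLOB (that is an assumption; the question does not say which one is in play). It pushes the eight-byte OLE2 signature that POI expects through a String and back:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringIsNotBytes {
    // First eight bytes of an OLE2 (.xls) file, in file order.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Decode bytes into a String and encode them again,
    // roughly what happens when binary data travels through a CLOB.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] viaAscii = roundTrip(OLE2_MAGIC, StandardCharsets.US_ASCII);
        byte[] viaLatin1 = roundTrip(OLE2_MAGIC, StandardCharsets.ISO_8859_1);
        // Report whether each round trip preserved the original bytes.
        System.out.println("US-ASCII intact:   " + Arrays.equals(OLE2_MAGIC, viaAscii));
        System.out.println("ISO-8859-1 intact: " + Arrays.equals(OLE2_MAGIC, viaLatin1));
    }
}
```

With US-ASCII every byte above 0x7F comes back as 0x3F ('?'), the same 0x3F that shows up in the signature quoted in the exception. ISO-8859-1 happens to map all 256 byte values one to one, so it survives this particular round trip, but relying on that is fragile; storing the file in a BLOB and reading it with getBinaryStream() avoids the conversion altogether.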

Related

How to convert a BLOB to pdf or csv using itext

My goal is to display the appropriate file when the user clicks on a PDF or XLS link.
The contents of a PDF or XLS file are stored as a blob in a table. A stored procedure takes the file id as an input parameter and returns the blob as output.
I want to be able to display the file and am not sure how to go about it. From some reading, it looks like I could use iText.
Is there a way to convert the blob to a PDF (or XLS) using iText? Is this possible?
I was unable to find any examples that use a blob datatype.
(Can't comment on David's solution due to low reputation.)
If the content of the BLOB record is PDF binary data, you actually don't need to do anything with iText.
If you are not saving the BLOB to disk first and just want to display it (according to your description), you could set the content type on the HTTP response to tell the browser how to deal with it:
response.setContentType("application/pdf"); // for PDF
response.setContentType("application/vnd.ms-excel"); // For BIFF .xls files
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); // For Excel2007 and above .xlsx files
The IOUtils class in the Apache Commons IO library has a method copy which will copy all the bytes from an InputStream to an OutputStream. See the Javadoc.
So once you've got your blob and your HTTP response, you can just write
OutputStream httpOutputStream = httpResponse.getOutputStream();
InputStream blobInputStream = theBlob.getBinaryStream();
IOUtils.copy(blobInputStream, httpOutputStream);
blobInputStream.close();
httpOutputStream.close();
to copy the data. Or you might want to put the two close() calls in a finally block.
If you're not already using Apache Commons, don't forget to download the jar and add it to your classpath.
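If you would rather not add a dependency, the same copy can be sketched with the JDK alone. This is only an illustration: the class and method names are hypothetical, the byte arrays stand in for theBlob.getBinaryStream() and httpResponse.getOutputStream(), and InputStream.transferTo requires Java 9 or later. The try-with-resources block plays the role of the finally block mentioned above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copies everything from in to out and closes both streams,
    // even when the copy itself throws.
    static long copyAndClose(InputStream in, OutputStream out) throws IOException {
        try (InputStream src = in; OutputStream dst = out) {
            return src.transferTo(dst); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for theBlob.getBinaryStream(): the bytes of "%PDF".
        byte[] fakeBlob = {0x25, 0x50, 0x44, 0x46};
        // Stand-in for httpResponse.getOutputStream().
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyAndClose(new ByteArrayInputStream(fakeBlob), sink);
        System.out.println(copied + " bytes copied");
    }
}
```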

Commons Net FTPClient retrieved file encoding issue

I'm retrieving a file from an FTP server. The file is encoded as UTF-8.
ftpClient.connect(props.getFtpHost(), props.getFtpPort());
ftpClient.login(props.getUsername(), props.getPassword());
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
inputStream = ftpClient.retrieveFileStream(fileNameBuilder
.toString());
And then somewhere else I'm reading the input stream
bufferedReader = new BufferedReader(new InputStreamReader(
inputStream, "UTF-8"));
But the file is not being read as UTF-8!
I tried ftpClient.setAutodetectUTF8(true); but it still doesn't work.
Any ideas?
EDIT:
For example a row in the original file is
...00248090041KENAN SARÐIN 00000000015.993FAC...
After downloading it through FTPClient, I parse it and load it into a Java object. One of the fields of the object is name, which for this row is read as "KENAN SAR�IN".
I tried dumping to disk directly:
File file = new File("D:/testencoding/downloaded-file.txt");
FileOutputStream fop = new FileOutputStream(file);
ftpClient.retrieveFile(fileName, fop);
if (!file.exists()) {
file.createNewFile();
}
I compared the MD5 checksums of the two files (the one on the FTP server and the one dumped to disk), and they're the same.
I would separate out the problems first: dump the file to disk, and compare it with the original. If it's the same as the original, the problem has nothing to do with UTF-8. The FTP code looks okay though, and if you're saying you want the raw binary data, I'd expect it not to mess with anything.
If the file is the same after transfer as before, then the problem has nothing to do with FTP. You say "the file is not getting read as UTF-8 Encoded" but it's not clear what you mean. How certain are you that it's UTF-8 text to start with? If you could edit your question with the binary data, how it's being read as text, and how you'd expect it to be read as text, that would really help.
Try to download the file content as bytes and not as characters using InputStream and OutputStream instead of InputStreamReader. This way you are sure that the file is not changed during transfer.
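Since the checksums match, one way to test the "is it really UTF-8?" question is to decode the same bytes with a few candidate charsets and see which one yields the expected name. The sketch below is a guess, not a diagnosis: it assumes the Ð in the sample row is the single byte 0xD0, which is what ISO-8859-1 would display for a file actually written in ISO-8859-9 (Turkish), where 0xD0 is Ğ. The class name and the sample bytes are made up for illustration.

```java
import java.nio.charset.Charset;

public class DecodeCheck {
    // "SAR" + 0xD0 + "IN": hypothetical raw bytes for part of the name field,
    // assuming a single-byte encoding (not confirmed by the question).
    static final byte[] SAMPLE = {0x53, 0x41, 0x52, (byte) 0xD0, 0x49, 0x4E};

    // Decode the given bytes with the named charset.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // What gets displayed depends on the console encoding,
        // so compare the results in a debugger or write them to a UTF-8 file.
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "ISO-8859-9"}) {
            System.out.println(name + " -> " + decode(SAMPLE, name));
        }
    }
}
```

A lone 0xD0 followed by an ASCII letter is not valid UTF-8, so the UTF-8 decoder substitutes the replacement character U+FFFD, which matches the � seen in the question. If one of the single-byte charsets produces the right name, pass that charset to InputStreamReader instead of "UTF-8".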

Apache POI HSSF XLS reading error

I'm using the following code to read in a .xls file, where s is the file path:
InputStream input = new FileInputStream(s);
Workbook wbs = new HSSFWorkbook(input);
I get the following error message:
Exception in thread "main" java.io.IOException: Invalid header signature; read 0x0010000000060809, expected 0xE11AB1A1E011CFD0
I need a program that is able to read in either XLSX or XLS, and using the exact same code just adjusted for XSSF it has no problem at all reading in the XLSX file.
The Exception you're getting is one telling you that the file you're supplying isn't a valid Excel binary file, at least not a valid Excel file produced since about 1990. The exception you're getting tells you what POI expects, and that it found something else instead which wasn't a valid .xls file, and wasn't anything else POI can detect.
One thing to be aware of is that Excel opens a wide variety of different file formats, including .csv and .html. It's also not very picky about the file extension, so will happily open a CSV file that has been renamed to a .xls one. However, since renaming a .csv to a .xls doesn't magically change the format, POI still can't open it!
From the exception, I can tell what's happening, and I can also tell you're using an ancient version of Apache POI! A header signature of 0x0010000000060809 corresponds to the Excel 4 file format, from about 25 years ago! If you use a more recent version of Apache POI, it'll give you a helpful error message telling you that the file supplied is an old and largely unsupported Excel file. New versions of POI do include the OldExcelExtractor tool which can pull out some information from those ancient formats.
Otherwise, as with all exceptions of this type, try opening the file in Excel and doing a save-as. That will give you an idea of what the file currently is (eg .html saved as .xls, .csv saved as .xls etc), and will also let you re-save it as a proper .xls file for POI to load and work with.
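If opening the file in Excel is not an option, a quick look at the first few bytes gives a similar hint. The sketch below is a JDK-only illustration with hypothetical names, not part of POI: it only distinguishes the OLE2 signature from the ZIP signature that .xlsx files start with, and treats everything else as unknown. readNBytes(int) requires Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HeaderSniffer {
    // Returns a rough guess at the container format from the first bytes of the stream.
    static String sniff(InputStream in) throws IOException {
        byte[] head = in.readNBytes(8); // Java 11+
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) {
            return "OLE2";   // .xls, .doc, .ppt
        }
        if (startsWith(head, 0x50, 0x4B, 0x03, 0x04)) {
            return "ZIP";    // possibly .xlsx
        }
        return "UNKNOWN";    // CSV, HTML, very old Excel, PDF, ...
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "id,name\n1,foo\n".getBytes("US-ASCII");
        System.out.println(sniff(new ByteArrayInputStream(csv)));
    }
}
```

In real use the stream would be a FileInputStream on the suspect file; remember that sniffing consumes the bytes, so reopen the file (or wrap it in a BufferedInputStream and use mark/reset) before handing it to POI.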
If the file is in xlsx format instead of xls, you might get this error. I would try using the generic Workbook object (also called the SS Usermodel).
Check out the Workbook interface and the WorkbookFactory object. The factory should be able to create a generic Workbook for you out of either xlsx or xls.
I thought I had a good tutorial on this, but I can't seem to find it. I'll keep looking though.
Edit
I found this little tiny snippet from Apache's site about reading and rewriting using the SS Usermodel.
I hope this helps!
Invalid header signature; read 0x342E312D46445025, expected 0xE11AB1A1E011CFD0
I got this error when I uploaded a corrupted xls/xlsx file (to produce one, I renamed sample.pdf to sample.xls). Add validation like this:
Workbook wbs = null;
// try-with-resources closes the stream whether or not parsing succeeds
try (InputStream input = new FileInputStream(s)) {
    wbs = new HSSFWorkbook(input);
} catch (IOException e) {
    // log "file is corrupted", show error message to user
}

Convert binary contents in BLOB field in the database to files mySQL

I have to retrieve binary data from a LONGBLOB field in the database. This field stores all sorts of file formats such as txt, doc, xdoc, pdf, etc. I basically need to convert the binary data back into the actual file formats so that my users can download these files.
Has anyone got any idea how to do this?
As others have said, it would be best to have another field to store the format. You can do this by copying the extension (i.e., everything after the last "." in the file name). The best way would probably be to get the file's MIME type: see this for example.
You can then store the mime type in a field in the database. This will almost always work, whereas the extension of a file can be misleading (or vague).
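The "everything after the last dot" rule can be sketched as below. The class and method names are hypothetical, and lower-casing the result and returning an empty string for names without a usable extension are choices made for this illustration, not something the answer prescribes.

```java
import java.util.Locale;

public class FileExtension {
    // Returns the text after the last '.', lower-cased, or "" when the name
    // has no dot, starts with its only dot, or ends with a dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("report.final.PDF"));
    }
}
```

The value would be stored next to the LONGBLOB at upload time. It says nothing about what the bytes really are, which is why the MIME type is the more reliable of the two.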
Add a field file_format indicating the format of the file stored in the LONGBLOB; then you can convert the binary data according to the associated file format.
Or reserve the first several bytes for the file format, followed by the actual content of the file.
I think you should have another field to save the document type, which tells you what type the data should be converted to. Use an I/O InputStream to read/write the file.
What I recommend is to upload the client files somewhere and save the paths that link to these files in the database. That should be faster.
As others have suggested, create another column (FILETYPE_COL_NAME) in the database that tells you which type of file is stored in the BLOB (BLOB_COL_NAME) field, and then extend the code below inside a try/catch block:
// Run the query first, then move to the first row before reading any column.
ResultSet rs = statement.executeQuery("SELECT * FROM tablename");
if (rs.next()) {
    String fileType = rs.getString("FILETYPE_COL_NAME");
    // try-with-resources flushes and closes both streams
    try (InputStream bis = new BufferedInputStream(rs.getBinaryStream("BLOB_COL_NAME"));
         OutputStream dos = new BufferedOutputStream(new FileOutputStream("filepath." + fileType))) {
        byte[] buffer = new byte[1024];
        int byteread;
        while ((byteread = bis.read(buffer)) != -1) {
            dos.write(buffer, 0, byteread);
        }
    }
}

Invalid header signature; IOException with Apache POI on excel document

I'm getting:
java.io.IOException: Invalid header signature; read 0x000201060000FFFE, expected 0xE11AB1A1E011CFD0
when trying to add some custom properties to an Excel document using apache POI HPSF.
I'm completely sure the file is Excel OLE2 (not HTML, XML or something else that Excel doesn't complain about).
This is a relevant part of my code:
try {
    final POIFSFileSystem poifs = new POIFSFileSystem(event.getStream());
    final DirectoryEntry dir = poifs.getRoot();
    final DocumentEntry dsiEntry = (DocumentEntry)
        dir.getEntry(DocumentSummaryInformation.DEFAULT_STREAM_NAME);
    final DocumentInputStream dis = new DocumentInputStream(dsiEntry);
    final PropertySet props = new PropertySet(dis);
    dis.close();
    dsi = new DocumentSummaryInformation(props);
}
catch (Exception ex) {
    throw new RuntimeException
        ("Cannot create POI SummaryInformation for event: " + event +
         ", path:" + event.getPath() +
         ", name:" + event.getName() +
         ", cause:" + ex);
}
I get the same error when trying with Word and PowerPoint files (also OLE2).
I'm completely out of ideas so any help/pointers are greatly appreciated :)
If you flip the signature number round, you'll see the bytes at the start of your file:
0x000201060000FFFE -> 0xFE 0xFF 0x00 0x00 0x06 0x01 0x02 0x00
The first two bytes, 0xFE 0xFF, look like a Unicode byte order mark (in that order they would be the UTF-16 big-endian BOM). They are followed by low control bytes, though: read as 16-bit little-endian values, the next three pairs are 0, 262 and 2, so maybe it isn't a text file after all.
That file really isn't an OLE2 file, and POI is right to give you the error. I don't know what it is, but I'm guessing that it might be part of an OLE2 file without its outer OLE2 wrapper. If you can open it with Office, do a save-as and POI should be fine to open that. As it stands, that header isn't an OLE2 file header, so POI can't open it for you.
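The "flip the number round" step can be done mechanically. The sketch below assumes, consistent with the mapping shown above, that the signature in the message is the first eight bytes of the stream read as a little-endian long; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SignatureBytes {
    // Turns the long printed in the exception back into the bytes as they appear in the file.
    static byte[] toFileOrder(long signature) {
        return ByteBuffer.allocate(8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(signature)
                .array();
    }

    // Formats bytes as space-separated two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toFileOrder(0x000201060000FFFEL))); // FE FF 00 00 06 01 02 00
        System.out.println(hex(toFileOrder(0xE11AB1A1E011CFD0L))); // D0 CF 11 E0 A1 B1 1A E1
    }
}
```

Running other signatures from similar error reports through the same helper often identifies the real format: a signature of 0x342E312D46445025, for example, comes out as 25 50 44 46 2D 31 2E 34, the ASCII text "%PDF-1.4".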
In my case, the file was a CSV file saved with the .xls extension. Excel was able to open it without a problem, but POI was not.
If I find a better/more general solution, I'll come back and write it up here.
Try saving it as a CSV file directly and use opencsv for your operations.
Use the following link to learn about opencsv.
http://opencsv.sourceforge.net/#what-is-opencsv
Excel can open a CSV, an XLS, or even an HTML table saved as .xls.
So you can save the file as file_name.csv and use opencsv to read the file in your code.
Or else you can open the file once in Excel and save it as an Excel 97-2003 workbook.
And then, POI itself can read the file :-)
This happens because you saved your file with Excel 2013. Use Save As to save the file in Excel 97-2003 format.
I had the same problem with an xls file generated by software. I am forced to re-save the files with Excel (in the same format) to be able to read them with Apache POI.
I was using a .xlsx file instead of a .xls file. We have to use a .xls file if we are using the Workbook, Sheet and Row classes.
My file was .xlsx, which caused this issue. I changed it to .xls and it worked.
