NullPointerException when trying to read XLSX file - java

I currently have this code to open an XLSX file using Apache POI:
File existingXlsx = new File("/app/app.xlsx");
System.out.println("File Exists: " + existingXlsx.exists());
Workbook workbook = WorkbookFactory.create(existingXlsx);
When I try to execute this, I get the following output:
File Exists: true
java.lang.NullPointerException
at org.apache.poi.xssf.usermodel.XSSFWorkbook.onDocumentRead(XSSFWorkbook.java:270)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:186)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:91)
The file I am trying to open can be opened in Excel, which shows the data correctly. What can I do to get POI to read the XLSX file?
Here is the file that breaks:
https://mega.co.nz/#!FJMWjQKI!CzihQgMVpxOQDTXzSnb3UFYSKbx4yFTb03-LI3iLmkE
Edit
I have also tried the following, which results in the same error:
Workbook workbook = new XSSFWorkbook(new FileInputStream(existingXlsx));
Edit
I found the line that throws the exception:
WorkbookDocument doc = WorkbookDocument.Factory.parse(getPackagePart().getInputStream());
this.workbook = doc.getWorkbook();
Map<String, XSSFSheet> shIdMap = new HashMap<String, XSSFSheet>();
for (POIXMLDocumentPart p : getRelations()) {
    if (p instanceof SharedStringsTable) sharedStringSource = (SharedStringsTable) p;
    else if (p instanceof StylesTable) stylesSource = (StylesTable) p;
    else if (p instanceof ThemesTable) theme = (ThemesTable) p;
    else if (p instanceof CalculationChain) calcChain = (CalculationChain) p;
    else if (p instanceof MapInfo) mapInfo = (MapInfo) p;
    else if (p instanceof XSSFSheet) {
        shIdMap.put(p.getPackageRelationship().getId(), (XSSFSheet) p);
    }
}
stylesSource.setTheme(theme); // <== BREAKS HERE
Edit
After some research, it seems POI is unable to find styles.xml and workbook.xml. I find this strange, because a simple reader like TextWrangler, which shows the structure of the archive, does show me the styles XML.
How do I fix this? Is there a default styles.xml and workbook.xml which I can insert into the archive?

Now I've downloaded the latest packages:
poi-src-3.9-20121203.zip (As source)
xmlbeans-2.6.0.zip
jsr173_1.0_api.jar
resolver.jar
xbean.jar
xbean_xpath.jar
xmlbeans-qname.jar
xmlpublic.jar
ooxml-schemas-1.1.jar
dom4j-1.6.1.jar
commons-codec-1.8.jar
commons-logging-1.1.3.jar
ant.jar (ant 1.7)
And your test2.xlsx was read without problems:
public static void main(String[] arg) {
    try {
        //File existingXlsx = new File("/app/app.xlsx");
        File existingXlsx = new File("c:/Java/poi-3.9/test-data/__theproblem/test2.xlsx");
        System.out.println("File Exists: " + existingXlsx.exists());
        Workbook workbook = WorkbookFactory.create(existingXlsx);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Are you sure you're using ooxml-schemas-1.1.jar as the POI documentation recommends?
EDIT
Hmm. It works for me from the binary jars too.
I downloaded poi-bin-3.9-20121203.tar.gz from
http://poi.apache.org/download.html
I made a new project in Eclipse and extracted all the jars from the archive:
lib/commons-codec-1.5.jar
lib/commons-logging-1.1.jar
lib/dom4j-1.6.1.jar
lib/junit-3.8.1.jar
lib/log4j-1.2.13.jar
lib/poi-3.9-20121203.jar
lib/poi-examples-3.9-20121203.jar
lib/poi-excelant-3.9-20121203.jar
lib/poi-ooxml-3.9-20121203.jar
lib/poi-ooxml-schemas-3.9-20121203.jar
lib/poi-scratchpad-3.9-20121203.jar
lib/stax-api-1.0.1.jar
lib/xmlbeans-2.3.0.jar
Add the test xlsx:
test-data/test2.xlsx
The test Java:
src/XlsxReadTest1.java
Source:
import java.io.File;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public class XlsxReadTest1 {
public static void main(String arg []){
try {
File existingXlsx = new File("c:/Java/__Work/apache_POI/poi-3.9-bin/test-data/test2.xlsx");
System.out.println("File Exists: " + existingXlsx.exists());
Workbook workbook = WorkbookFactory.create(existingXlsx);
System.out.println("A1: " + workbook.getSheetAt(0).getRow(0).getCell(0).getStringCellValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
Run. (Tried with jdk1.7.0_07, jdk1.6.0_31)
Result:
File Exists: true
A1: Testing Edit
"Testing Edit" is the content of the first cell on the first sheet of your file.
I think you should try this from scratch.
(Maybe your project uses other jars that interfere with these jars in the class loader? The class loader is a cunning guy...)
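One way to check the class-loader suspicion is to ask the JVM where a class was actually loaded from. The following is only a minimal sketch using the standard library; the class name ClassOrigin and the method locate are made up for this illustration and are not part of POI.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports which jar or directory a class was loaded from. */
final class ClassOrigin {

    /** Returns the code source location of the class, or a note when none is recorded. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes that ship with the JDK itself usually have no code source.
            return "(no code source, probably a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments,
        // e.g. org.apache.poi.xssf.usermodel.XSSFWorkbook
        for (String name : args) {
            System.out.println(name + " <- " + locate(Class.forName(name)));
        }
    }
}
```

Run it with the same classpath as the failing project. If a POI or XMLBeans class turns out to come from an unexpected jar, that jar is a candidate for the conflict.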

I guess you just used the wrong POI package.
Try downloading the following, or check the page for the newest version.
I tested the following in Eclipse:
http://www.apache.org/dyn/closer.cgi/poi/release/bin/poi-bin-3.9-20121203.zip
Extract it and add all the jars to your Eclipse project's lib folder.
I combined user1234's answer and my own approach; both work on your test2.xlsx:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.extractor.XSSFExcelExtractor;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.xmlbeans.XmlException;
public class Main {
/**
 * @param args
 */
public static void main(String[] args) {
// File existingXlsx = new File("app.xlsx");
File file = new File("test2.xlsx");
FileInputStream fs;
try {
fs = new FileInputStream(file);
OPCPackage xlsx = OPCPackage.open(fs);
XSSFExcelExtractor xe = new XSSFExcelExtractor(xlsx);
System.out.println(xe.getText());
} catch (FileNotFoundException e1) {
e1.printStackTrace();
} catch (XmlException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
/// -------------- Another approach
File existingXlsx = new File("test2.xlsx");
System.out.println("File Exists: " + existingXlsx.exists());
try {
Workbook workbook = new XSSFWorkbook(new FileInputStream(
existingXlsx));
Sheet worksheet = workbook.getSheet("Filter criteria");
Row row1 = worksheet.getRow(0);
Cell cellA1 = row1.getCell((short) 0);
String a1Val = cellA1.getStringCellValue();
System.out.println("A1: " + a1Val);
} catch (IOException e) {
e.printStackTrace();
}
}
}
finally I got the result:

If you want to read an .xlsx file, try this code (it uses Apache POI 3.9):
File file = new File("/app/app.xlsx");
FileInputStream fs = new FileInputStream(file);
OPCPackage xlsx = OPCPackage.open(fs);
XSSFExcelExtractor xe = new XSSFExcelExtractor(xlsx);
System.out.println(xe.getText());
The above code should display the content of the file app.xlsx.

Related

Excel File is getting Corrupted while renaming the Existing excel file with Current date and Time?

When I rename the existing Excel file using the code below, a file with the current timestamp is generated, but when I try to open the renamed file, Excel cannot open it and reports that the workbook is corrupted.
Where is the error in the code below? Why is the Excel file getting corrupted?
package BrokenLink;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class CreateExcelFile {
public static void main(String[] args)
{
//new File("C:\\Users\\skumari1\\eclipse-workspace\\AlarmTest\\Generics").mkdir();
CreateFileRenameExisting("NewBook.xlsx");
}
//Rename an existing file and create a new file
public static void CreateFileRenameExisting(String filename)
{
//get current project path
String filePath=System.getProperty("user.dir");
//create a new file
File file=new File(filePath+"\\"+filename);
try {
if(!file.exists()) {
Workbook wb1 = new XSSFWorkbook();
FileOutputStream fileOut1 = new FileOutputStream(filename);
wb1.write(fileOut1);
fileOut1.close();
//file.createNewFile();
System.out.println("File is created");
}
else
{
File backupFile=new File(filePath+"\\"+ Validatedate() + file.getName());
System.out.println("File already exist and backup file is created");
file.renameTo(backupFile);
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static String Validatedate() {
SimpleDateFormat dateFormat = new SimpleDateFormat("dd-MM-yyyy HH_mm_ss");
Date date = new Date();
String date1= dateFormat.format(date);
// System.out.println("Current date and time is " +date1);
return date1;
}
}
Thank you in advance.
I think you should add the workbook code to the else branch too.
Simply renaming the file through the File object may be the reason your Excel file is getting corrupted.
Add the Workbook code to the else branch, and close your workbook instance once all operations are done.
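A rename by itself does not rewrite the file's bytes, so to check whether the rename step is involved it helps to make a failed rename visible. File.renameTo only returns false on failure, and the code in the question ignores that return value. Here is a minimal sketch of the backup step using java.nio.file instead; the class name BackupRename, its method names, and the underscore between timestamp and file name are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: moves a file to a timestamped backup name in the same directory. */
final class BackupRename {

    // Same pattern as the question's SimpleDateFormat.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH_mm_ss");

    /** Builds the backup path: same directory, timestamp prefixed to the file name. */
    static Path backupPathFor(Path original, LocalDateTime when) {
        String name = when.format(STAMP) + "_" + original.getFileName();
        return original.resolveSibling(name);
    }

    /** Moves the file; unlike File.renameTo, a failure surfaces as an IOException. */
    static Path moveToBackup(Path original, LocalDateTime when) throws IOException {
        return Files.move(original, backupPathFor(original, when));
    }
}
```

Calling BackupRename.moveToBackup(Paths.get("NewBook.xlsx"), LocalDateTime.now()) would then either produce the backup or throw with the reason. If the backup still does not open, comparing its size with the original's can show whether the file was already unreadable before the rename.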

Extract Paragraph from Word Document Using Apache POI

I have a Word document (a .docx file).
As you can see in the document, there are a number of questions with bullet points. Right now I am trying to extract each paragraph from the file using Apache POI. Here is my current code:
public static String readDocxFile(String fileName) {
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
String whole = "";
for (XWPFParagraph para : paragraphs) {
System.out.println(para.getText());
whole += "\n" + para.getText();
}
fis.close();
document.close();
return whole;
} catch (Exception e) {
e.printStackTrace();
return "";
}
}
The problem with the above method is that it prints each line instead of each paragraph. The bullet points are also gone from the extracted string; whole is returned as a plain string.
Can anyone explain what I am doing wrong? Please also suggest a better approach if you have one.
The code above is correct; I ran it on my system and it returned every paragraph. I think the problem is in how the content was written into the docx file: when I write content as bullet points and press Enter, that breaks the current bullet point, and the code above treats the broken-off line as a separate paragraph.
The code sample below may be useful to you. It uses a Set to ignore duplicate questions from the docx.
The Apache POI dependency is:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.7</version>
</dependency>
Code Sample :
package com;
import java.io.File;
import java.io.FileInputStream;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.springframework.util.ObjectUtils;
public class App {
public static void main(String...strings) throws Exception{
Set<String> bulletPoints = fileExtractor();
bulletPoints.forEach(point -> {
System.out.println(point);
});
}
public static Set<String> fileExtractor() throws Exception{
FileInputStream fis = null;
try {
Set<String> bulletPoints = new HashSet<>();
File file = new File("/home/deskuser/Documents/query.docx");
fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
paragraphs.forEach(para -> {
System.out.println(para.getText());
if(!ObjectUtils.isEmpty(para.getText())){
bulletPoints.add(para.getText());
}
});
fis.close();
return bulletPoints;
} catch (Exception e) {
e.printStackTrace();
throw new Exception("error while extracting file.", e);
}finally{
if(!ObjectUtils.isEmpty(fis)){
fis.close();
}
}
}
}
I couldn't find which version of apache POI you are using. If it's the latest version (3.17), the XWPFParagraph object used in your code has a getNumFmt() method. From the apache poi documentation (https://poi.apache.org/apidocs/org/apache/poi/xwpf/usermodel/XWPFParagraph.html) this method will return the string "bullet" if the paragraph starts with a bullet. So regarding the second point of your question (what happens to the bullets), you can resolve with something like the following:
public class TestPoi {
private static final String BULLET = "•";
private static final String NEWLINE = "\n";
public static void main(String...args) {
String test = readDocxFile("/home/william/Downloads/anesthesia.docx");
System.out.println(test);
}
public static String readDocxFile(String fileName) {
try {
File file = new File(fileName);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument document = new XWPFDocument(fis);
List<XWPFParagraph> paragraphs = document.getParagraphs();
StringBuilder whole = new StringBuilder();
for (XWPFParagraph para : paragraphs) {
if ("bullet".equals(para.getNumFmt())) {
whole.append(BULLET);
}
whole.append(para.getText());
whole.append(NEWLINE);
}
fis.close();
document.close();
return whole.toString();
} catch (Exception e) {
e.printStackTrace();
return "";
}
}
}
Regarding your first point, what is the expected output? I ran your code with the provided docx and apart from the missing bullets you mentioned, it looked okay stepping through with the debugger.

Error creating an Excel file with Apache POI library.

Here is my code:
import org.apache.poi.hslf.HSLFSlideShow;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.usermodel.SlideShow;
import java.io.*;
public class CreateNewPresentation
{
public static void main(String[] args)
{
try
{
SlideShow slideShow = new SlideShow();
Slide slide = slideShow.createSlide();
FileOutputStream out = new
FileOutputStream("slideshow.ppt");
slideShow.write(out);
System.out.println("File Created...");
out.close();
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
The issue is that it does not recognise the org.apache.poi package.
How can I make it work?
The POI jars are not on your classpath. If you are using NetBeans, add them to the project's classpath.
If you compile and run from the command line, set the classpath there.
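To confirm whether the jars are really visible at run time, a small check with no compile-time dependency on POI can help. This is only a sketch: ClasspathCheck and isOnClasspath are names invented for this example, and the default class name below is simply taken from the imports in the question.

```java
/** Hypothetical helper: checks whether a class can be found on the current classpath. */
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The effective classpath the JVM was started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        String name = args.length > 0 ? args[0] : "org.apache.poi.hslf.usermodel.SlideShow";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

Because the class name is passed as a string, this compiles even when POI is missing. Run it with the same classpath settings as the project to see what the JVM can actually find.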

Extracting tar.gz using java error

I am trying to extract a .tar.gz archive using Java and I am getting a directory error that I do not understand. Please help. I got this sample code from https://forums.oracle.com/forums/thread.jspa?threadID=2065236
package untargz;
import java.io.*;
import com.ice.tar.*;
import javax.activation.*;
import java.util.zip.GZIPInputStream;
/**
 *
 * @author stanleymungai
 */
public class Untargz {
public static InputStream getInputStream(String tarFileName) throws Exception{
if(tarFileName.substring(tarFileName.lastIndexOf(".") + 1, tarFileName.lastIndexOf(".") + 3).equalsIgnoreCase("gz")){
System.out.println("Creating an GZIPInputStream for the file");
return new GZIPInputStream(new FileInputStream(new File(tarFileName)));
}else{
System.out.println("Creating an InputStream for the file");
return new FileInputStream(new File(tarFileName));
}
}
private static void untar(InputStream in, String untarDir) throws IOException {
System.out.println("Reading TarInputStream... ");
TarInputStream tin = new TarInputStream(in);
TarEntry tarEntry = tin.getNextEntry();
if(new File(untarDir).exists()){
while (tarEntry != null){
File destPath = new File(untarDir + File.separatorChar + tarEntry.getName());
System.out.println("Processing " + destPath.getAbsoluteFile());
if(!tarEntry.isDirectory()){
FileOutputStream fout = new FileOutputStream(destPath);
tin.copyEntryContents(fout);
fout.close();
}else{
destPath.mkdir();
}
tarEntry = tin.getNextEntry();
}
tin.close();
}else{
System.out.println("That destination directory doesn't exist! " + untarDir);
}
}
private void run(){
try {
String strSourceFile = "C:/AskulInstaller/pid.tar.gz";
String strDest = "C:/AskulInstaller/Extracted Files";
InputStream in = getInputStream(strSourceFile);
untar(in, strDest);
}catch(Exception e) {
e.printStackTrace();
System.out.println(e.getMessage());
}
}
public static void main(String[] args) {
new Untargz().run();
}
}
Once I run this piece of code, this is my output:
Creating an GZIPInputStream for the file
Reading TarInputStream...
That destination directory doesn't exist! C:/AskulInstaller/Extracted Files
BUILD SUCCESSFUL (total time: 0 seconds)
When I manually create the destination directory C:/AskulInstaller/Extracted Files,
I get this error output:
Creating an GZIPInputStream for the file
Reading TarInputStream...
Processing C:\AskulInstaller\Extracted Files\AskulInstaller\pid\Askul Logs\DbLayer_AskulMain_10_Apr_2013_07_44.log
java.io.FileNotFoundException: C:\AskulInstaller\Extracted Files\AskulInstaller\pid\Askul Logs\DbLayer_AskulMain_10_Apr_2013_07_44.log (The system cannot find the path specified)
C:\AskulInstaller\Extracted Files\AskulInstaller\pid\Askul Logs\DbLayer_AskulMain_10_Apr_2013_07_44.log (The system cannot find the path specified)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
at java.io.FileOutputStream.<init>(FileOutputStream.java:165)
at untargz.Untargz.untar(Untargz.java:37)
at untargz.Untargz.run(Untargz.java:55)
at untargz.Untargz.main(Untargz.java:64)
Is there a particular way I am supposed to lay out my directories so that the extraction works, or what exactly is my mistake?
If the tar file contains an entry for a file foo/bar.txt but doesn't contain a previous directory entry for foo/, then your code will try to create a file in a directory that doesn't exist. Try adding
destPath.getParentFile().mkdirs();
just before you create the FileOutputStream.
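As a minimal sketch of that suggestion, the parent-directory step can be pulled into a small helper. The names ParentDirs and ensureParentExists are made up for this example; destPath is the variable from the question's untar method.

```java
import java.io.File;

/** Hypothetical helper: makes sure the directory that will hold a file exists. */
final class ParentDirs {

    /**
     * Creates any missing parent directories of destPath.
     * Returns true if the parent directory exists afterwards (or there is no parent).
     */
    static boolean ensureParentExists(File destPath) {
        File parent = destPath.getParentFile();
        // mkdirs() returns false when the directory already exists,
        // so fall back to isDirectory() before reporting failure.
        return parent == null || parent.mkdirs() || parent.isDirectory();
    }
}
```

In the question's loop this would be called in the non-directory branch, right before new FileOutputStream(destPath), and an IOException could be thrown when it returns false. Using mkdirs() rather than mkdir() for directory entries also covers nested directories whose parents have no entry of their own.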
Alternatively, if you don't mind your code depending on Ant as a library then you can delegate the whole unpacking process to an Ant task rather than doing it by hand. Something like this (not fully tested):
Project p = new Project();
Untar ut = new Untar();
ut.setProject(p);
ut.setSrc(tarFile);
if(tarFile.getName().endsWith(".gz")) {
ut.setCompression((UntarCompressionMethod)EnumeratedAttribute.getInstance(UntarCompressionMethod.class, "gzip"));
}
ut.setDest(destDir);
ut.perform();

Writing to .xlsx using java. BiffViewer error occurs

Jars I have used: dom4j, poi-3.8.jar, poi-ooxml-3.8.jar, poi-ooxml-schemas-3.8.jar, xbean.jar
The code:
package org.capgemini.ui;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Date;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Vector;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class XSSFWorkBookWriter {
public void writeWorkBook(Vector table) throws Exception {
XSSFWorkbook workBook = new XSSFWorkbook();
XSSFSheet sheet = workBook.createSheet();
table=new Vector();
table.add(new String("Howdy"));
table.add(new java.sql.Date(2012,06,20));
table.add(new Double(13.35D));
table.add(new String("Fine"));
table.add(new java.sql.Date(2012,06,20));
table.add(new Double(13.38D));
Iterator rows=table.iterator();
Enumeration rowsOfVector=table.elements();
int totalNoOfRows=table.size()/2;
int currentRow=0;
while (rows.hasNext () && currentRow<totalNoOfRows){
XSSFRow row = sheet.createRow(currentRow++);
for (int i = 0; i < 3; i++) {
XSSFCell cell=row.createCell(i);
Object val=rows.next();
if( val instanceof String){
cell.setCellValue(val.toString());
}
else if(val instanceof Date){
cell.setCellValue((java.sql.Date)val);
}
else if(val instanceof Double){
cell.setCellValue((Double)val);
}
}
}
FileOutputStream outPutStream = null;
try {
outPutStream = new FileOutputStream("D:/Try.xlsx");
workBook.write(outPutStream);
} catch (IOException e) {
e.printStackTrace();
} finally {
if (outPutStream != null) {
try {
outPutStream.flush();
outPutStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
public static void main(String[] args) {
XSSFWorkBookWriter bookWriter=new XSSFWorkBookWriter();
try {
bookWriter.writeWorkBook(new Vector());
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
ERROR ::
org.apache.poi.hssf.dev.BiffViewer$CommandParseException: Biff viewer needs a filename
at org.apache.poi.hssf.dev.BiffViewer$CommandArgs.parse(BiffViewer.java:333)
at org.apache.poi.hssf.dev.BiffViewer.main(BiffViewer.java:386)
BiffViewer is part of HSSF, which only works with the older .xls files (OLE2 based). You're generating a .xlsx file with XSSF, which is a different low level format.
If you really want to use BiffViewer (not sure why, it's normally only used with debugging, but still), then you'll need to change your XSSF code to be HSSF code. Otherwise, if you did mean to be using XSSF to generate a .xlsx file, then you can't use the HSSF debugging tools. If you want to know what's in your .xlsx file, unzip it (.xlsx is a zip of xml files) and view the XML.
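Since an .xlsx file is a zip archive, its parts can be listed with the standard library alone, with no POI involved. The following is a rough sketch; XlsxParts and listParts are names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists the entries (parts) inside an .xlsx archive. */
final class XlsxParts {

    /** Returns the entry names in the order they appear in the archive. */
    static List<String> listParts(File xlsx) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(xlsx)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            // Wrapped so callers do not need a checked exception for a quick inspection.
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Usage: java XlsxParts D:/Try.xlsx
        for (String name : listParts(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```

A typical workbook contains entries such as xl/workbook.xml and xl/styles.xml. The listing shows quickly whether the expected parts are present before opening any of the XML.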
