I have code that adds text to an existing .doc file and saves the result under another name, using Apache POI.
The following is the code I have tried so far:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFTable;
public class FooterTableWriting {

    public static void main(String args[])
    {
        String path="D:\\vignesh\\AgileDocTemplate.doc";
        String attch="D:\\Attach.doc";
        String comment="good";
        String stat="ready";
        String coaddr="xyz";
        String cmail="abc@gmail.com";
        String sub="comp";
        String title="Globematics";
        String cat="General";
        setFooter(path, attch, comment, stat, coaddr, cmail, sub, title, cat);
    }

    private static void setFooter(String docTemplatePath,String attachmentPath,String comments,String status,String coAddress,String coEmail,String subject,String title,String catagory)
    {
        try{
            InputStream input = new FileInputStream(new File(docTemplatePath));
            XWPFDocument document=new XWPFDocument(input);
            XWPFHeaderFooterPolicy headerPolicy =new XWPFHeaderFooterPolicy(document);
            XWPFFooter footer = headerPolicy.getDefaultFooter();
            XWPFTable[] table = footer.getTables();
            for (XWPFTable xwpfTable : table)
            {
                xwpfTable.getRow(1).getCell(0).setText(comments);
                xwpfTable.getRow(1).getCell(1).setText(status);
                xwpfTable.getRow(1).getCell(2).setText(coAddress);
                xwpfTable.getRow(1).getCell(3).setText(coEmail);
                xwpfTable.getRow(1).getCell(4).setText(subject);
                xwpfTable.getRow(1).getCell(5).setText(title);
                xwpfTable.getRow(1).getCell(6).setText(catagory);
            }
            File f=new File (attachmentPath.substring(0,attachmentPath.lastIndexOf('\\')));
            if(!f.exists())
                f.mkdirs();
            FileOutputStream out = new FileOutputStream(new File(attachmentPath));
            document.write(out);
            out.close();
            System.out.println("Attachment Created!");
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}
The following is what I got
org.apache.poi.POIXMLException: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:124)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:200)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:74)
at ext.gt.checkOut.FooterTableWriting.setFooter(FooterTableWriting.java:32)
at ext.gt.checkOut.FooterTableWriting.main(FooterTableWriting.java:25)
Caused by: org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
at org.apache.xmlbeans.impl.store.Locale.verifyDocumentType(Locale.java:458)
at org.apache.xmlbeans.impl.store.Locale.autoTypeDocument(Locale.java:363)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1279)
at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1263)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:92)
... 4 more
I have added all the jar files corresponding to this, but I still can't find the solution. I'm new to Apache POI, so please help me with some explanations and examples.
Thanks
Copied from my comment on the question:
Looks like you need poi-ooxml-schemas.jar, which comes in the Apache POI distribution. Just adding a single jar doesn't mean that you have all the classes of the framework.
After solving the problem based on my comment (or other people's answers), you have this new exception:
org.apache.xmlbeans.XmlException: error: The document is not a document@http://schemas.openxmlformats.org/wordprocessingml/2006/main: document element mismatch got themeManager@http://schemas.openxmlformats.org/drawingml/2006/main
Reading Apache POI - HWPF - Java API to Handle Microsoft Word Files, it looks like you're using the wrong class to handle Word 2003 and earlier documents: "HWPF is the name of our port of the Microsoft Word 97(-2007) file format to pure Java ... The partner to HWPF for the new Word 2007 .docx format is XWPF." This means that you need the HWPFDocument class to handle the document, or you need to convert your document from the Word 2003- format to Word 2007+.
IMO Apache POI is a good solution for handling Excel files, but I would look at other options for handling Word documents. Check this question for more related info.
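To illustrate the HWPF / XWPF split: a file's extension does not guarantee its format, so it can help to look at the first bytes before picking a class. Below is a minimal sketch using only the JDK. The names WordFormatSniffer and suggestApi are made up for this example and are not part of POI. It assumes the usual signatures: OLE2 files (the binary .doc container) start with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based files (the .docx container) start with 50 4B 03 04.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class WordFormatSniffer {
    // Signature of an OLE2 compound file (.doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP archive, the container used by .docx/.xlsx/.pptx.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "HWPF" for OLE2 data, "XWPF" for ZIP data, otherwise "UNKNOWN". */
    static String suggestApi(byte[] firstBytes) {
        if (startsWith(firstBytes, OLE2_MAGIC)) return "HWPF";
        if (startsWith(firstBytes, ZIP_MAGIC)) return "XWPF";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordFormatSniffer <file>");
            return;
        }
        byte[] header = new byte[8];
        int length;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            length = in.read(header);
        }
        // A negative length means the file was empty.
        System.out.println(suggestApi(Arrays.copyOf(header, Math.max(length, 0))));
    }
}
```

This only identifies the container, not the exact document type, so treat the result as a hint about which POI module to try first.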
This is the dependency hierarchy for poi-ooxml-3.9.jar.
Which means any of them can be used at runtime even if they aren't used at compile-time.
Make sure you have all the jars in the classpath of your project.
Add this dependency to your build file (the snippet is in Maven pom.xml format):
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
or
The system couldn't find poi-ooxml-schemas-xx.xx.jar. Please add the library to your classpath.
The class org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument.Factory is located in the jar ooxml-schemas-1.0.jar which can be downloaded here
You're getting that error because you don't have the proper dependency for the XWPFDocument. ooxml-schemas requires xmlbeans, and ooxml requires poi and ooxml-schemas, etc...
Check here: http://poi.apache.org/overview.html#components
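A quick way to confirm which of these classes are actually visible at runtime, and which jar each one comes from, is to ask the class loader. This is a rough sketch using only JDK calls. ClassLocator and locate are names invented for this example, and the class names in main are the ones from the question's code and stack trace.

```java
import java.security.CodeSource;

class ClassLocator {
    /** Describes where a class was loaded from, or reports that it is missing. */
    static String locate(String className) {
        try {
            // initialize=false: only look the class up, do not run static initializers.
            Class<?> type = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "found (no code source, typically a JDK class)";
            }
            return "found in " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.poi.xwpf.usermodel.XWPFDocument",
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory",
            "org.apache.xmlbeans.XmlException"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the failing program. Any line that prints "missing" points at a jar that still has to be added.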
Thought I would report my experience with this error. I started getting it out of the blue and hadn't changed anything in my workspace. It turns out that it occurred while trying to read an Excel file that has more than one sheet (the second sheet was a pivot table with a large amount of data). I'm not quite sure whether it's due to the size of the data, but I suspect so, because I HAVE read Excel files that contain more than one worksheet. When I deleted that second sheet, it worked. No changes to the classpath were needed.
org.apache.poi.POIXMLException: org.apache.xmlbeans.XmlException: Element themeManager@http://schemas.openxmlformats.org/drawingml/2006/main is not a valid workbook@http://schemas.openxmlformats.org/spreadsheetml/2006/main document or a valid substitution.
Solution: use the .xlsx format instead of .xls.
FWIW I had to add this:
compile 'org.apache.poi:ooxml-schemas:1.3'
In my case I had different versions of the POI jars: poi-scratchpad was 3.9 and all the others (poi, poi-ooxml, poi-ooxml-schemas) were 3.12. I changed poi-scratchpad to 3.12 as well and everything started working.
If you are not using Maven for your project dependencies, you should have the following jars in your classpath
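A mismatch like this is easy to miss by eye, so here is a small sketch that lists the POI jars on the classpath with their versions. It assumes the jars keep their standard file names (poi-3.12.jar, poi-ooxml-3.12.jar, poi-scratchpad-3.9.jar and so on). PoiVersionCheck and its methods are hypothetical helpers, not part of POI.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PoiVersionCheck {
    // Matches a file name such as poi-ooxml-3.12.jar at the end of a classpath entry.
    private static final Pattern POI_JAR =
            Pattern.compile("(?:^|[/\\\\])(poi(?:-[a-z]+)*)-(\\d[\\w.]*)\\.jar$");

    /** Maps each POI artifact found in the classpath string to its version. */
    static Map<String, String> poiVersions(String classPath, String separator) {
        Map<String, String> versions = new TreeMap<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            Matcher m = POI_JAR.matcher(entry);
            if (m.find()) {
                versions.put(m.group(1), m.group(2));
            }
        }
        return versions;
    }

    /** True when all POI artifacts share one version (or none were found). */
    static boolean isConsistent(Map<String, String> versions) {
        return new HashSet<>(versions.values()).size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, String> versions =
                poiVersions(System.getProperty("java.class.path"), File.pathSeparator);
        versions.forEach((artifact, version) -> System.out.println(artifact + " " + version));
        System.out.println(isConsistent(versions) ? "versions match" : "VERSION MISMATCH");
    }
}
```

Jars with their own version scheme, such as ooxml-schemas or xmlbeans, are deliberately ignored because their numbers are not expected to equal the POI version.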
Related
I am currently using the Apache POI 3.14 jar to create Word documents. I am now looking to upgrade POI to the latest stable version, 5.0. But on trying it, I am unable even to load a document stream into an XWPFDocument. I have attached sample code where I read a simple docx file and then write it out again; the error occurs as soon as the file bytes are loaded into XWPFDocument. I am baffled, and any detailed help would be really appreciated.
package basePackage;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class PoiJars {
    public static void main(String[] args) throws Exception {
        String docxFilePath = "SimpleWordFile.docx";
        InputStream stream = new FileInputStream(docxFilePath);
        XWPFDocument document = new XWPFDocument(stream);
        FileOutputStream outFile = new FileOutputStream("output.docx");
        document.write(outFile);
    }
}
Exception that occurs is:
Exception in thread "main" java.lang.NoClassDefFoundError: org/openxmlformats/schemas/drawingml/x2006/chart/ChartSpaceDocument$Factory
at org.apache.poi.xddf.usermodel.chart.XDDFChart.<init>(XDDFChart.java:155)
at org.apache.poi.xwpf.usermodel.XWPFChart.<init>(XWPFChart.java:75)
at org.apache.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
at org.apache.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:660)
at org.apache.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:165)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:126)
at basePackage.PoiJars.main(PoiJars.java:18)
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.drawingml.x2006.chart.ChartSpaceDocument$Factory
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 7 more
Jars I am using in Class path:
poi-5.0.0.jar, poi-ooxml-5.0.0.jar, xmlbeans-4.0.0.jar along with other commons & codec jar dependencies.
My queries are:
1) Why am I not even able to load a basic docx file into XWPFDocument?
2) If I use poi-ooxml-full-5.0.0.jar instead of poi-ooxml-5.0.0.jar, the XWPFDocument class is not present in it. What is the reason?
3) Can someone please share some links that give a complete understanding of the POI architecture and code flow, so I can modify classes in the jar according to my needs?
I am facing an issue when using Apache POI to extract an embedded .xlsx file from a .ppt file. It would be really great if somebody could help me out.
The subject of the problem:
Problem trying to solve: Extracting a ".xlsx" file embedded inside a ".ppt".
I am currently using apache-poi.
It seems that when I use hslfSlideShow.getEmbeddedObjects(), I get the xlsx object just fine, but when I try converting it to an XSSFWorkbook object using, say, WorkbookFactory.create(inputStream), it throws an error saying
java.lang.IllegalArgumentException: The supplied POIFSFileSystem does not contain a BIFF8 'Workbook' entry. Is it really an excel file? Had: [OlePres000, Ole, CompObj, Package]
at org.apache.poi.hssf.usermodel.HSSFWorkbook.getWorkbookDirEntryName(HSSFWorkbook.java:286)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:326)
at org.apache.poi.hssf.usermodel.HSSFWorkbookFactory.createWorkbook(HSSFWorkbookFactory.java:64)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:167)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:112)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:253)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:221)
Interestingly, it is calling HSSFWorkbookFactory even though it's an xlsx file.
And no, the xlsx file is not corrupted or password-protected. I can open it just fine.
Also, it works fine if I try parsing the .xlsx file without embedding it in the .ppt.
And the parsing works fine when I embed it in a .pptx file and call methods such as xmlSlideShow.getAllEmbeddedParts() to get the embedded objects from .pptx.
Promoting some comments and investigation to an answer...
This was a limitation in older versions of Apache POI, but was fixed in July in r1880164.
For backwards-compatibility reasons, PowerPoint will often (but not always...) write embedded OOXML resources wrapped in an intermediate OLE2 layer. This has the advantage that tools/programs which expect embedded office documents to be something like an xls / doc can cope, but at the expense of another layer of wrapping.
Newer versions of Apache POI (5.0 should be the first released one with the fix in) have support in WorkbookFactory for receiving an OLE2 wrapper like this, pulling out the underlying xlsx stream and handing that off to XSSFWorkbook. (Older versions did this for OLE2-based password-protected xlsx files, but not their unencrypted cousins.)
For now, if you're stuck on an affected POI version, the code you'll want is something like this (largely taken from the unit test verifying support!):
POIFSFileSystem fs = new POIFSFileSystem(data.getInputStream());
if(fs.getRoot().hasEntry("Package")) {
    DocumentInputStream dis = new DocumentInputStream((DocumentEntry)fs.getRoot().getEntry("Package"));
    try (OPCPackage pkg = OPCPackage.open(dis)) {
        XSSFWorkbook wb = new XSSFWorkbook(pkg);
        handleWorkbook(wb);
        wb.close();
    }
} else {
    try (HSSFWorkbook wb = new HSSFWorkbook(fs)) {
        handleWorkbook(wb);
    }
}
I have the issue that Apache POI "corrupts" an xlsm / xlsx file by just reading and writing it (e.g. with the following code):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class Snippet {
    public static void main(String[] args) throws Exception {
        String str1 = "c:/tmp/spreadsheet.xlsm";
        String str2 = "c:/tmp/spreadsheet_poi.xlsm";
        // open file
        XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(new File(str1)));
        // save file
        FileOutputStream out = new FileOutputStream(str2);
        wb.write(out);
        wb.close();
        out.close();
    }
}
Once you open the spreadsheet_poi.xlsm in Excel you'll get an error like the following
"We found a problem with some content in xxx. Do you want us to try to recover as much as we can..."?
If you say yes you'll end up with a log which could look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error145040_01.xml</logFileName>
<summary>Errors were detected in file 'C:\tmp\spreadsheet_poi.xlsm'</summary>
<repairedParts>
<repairedPart>Repaired Part: /xl/worksheets/sheet4.xml part with XML error. Load error. Line 2, column 0.</repairedPart>
<repairedPart>Repaired Part: /xl/worksheets/sheet5.xml part with XML error. Load error. Line 2, column 0.</repairedPart>
<repairedPart>Repaired Part: /xl/worksheets/sheet8.xml part with XML error. Load error. Line 2, column 0.</repairedPart>
</repairedParts>
</recoveryLog>
What's the best approach to debug the issue in more detail (e.g. to find out what makes POI "corrupt" the file)?
Eventually I found that the best approach for debugging this consists of two things:
1) Open the affected workbook (e.g. with 7zip) and format the affected sheets with an XML editor (e.g. Notepad++ > Plugins > XML Tools > Pretty print (XML only - with line breaks)). After saving the files and updating the xlsm file you'll get the "real" line numbers in the Excel error log. Alternative option (which I haven't tried but which should work according to the POI mailing list): use OOXMLPrettyPrint (https://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/ooxml/dev/) to format the file and then reopen it in Excel.
2) If the real line numbers don't already help, compare the sheet XML files of the original xlsx file and the one saved by POI. You'll notice that there are differences in the attributes and that the order is also different. To compare them properly I used Beyond Compare with "Additional File Formats" (see https://weblogs.asp.net/lorenh/comparing-xml-files-with-beyond-compare-3-brilliant for more information). Maybe there is another diff tool that is equally good.
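If you would rather script the two steps above than do them by hand, something along these lines should work with only the JDK: read a sheet part out of each workbook (an xlsx / xlsm file is a ZIP archive), pretty-print it, and report the first line that differs. SheetXmlDump and its methods are names made up for this sketch. The comparison is a crude line-by-line check, not a real XML diff, so reordered attributes will show up as differences.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SheetXmlDump {
    /** Reads one part, e.g. "xl/worksheets/sheet4.xml", out of the workbook ZIP. */
    static String readEntry(String workbookPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(workbookPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry " + entryName + " in " + workbookPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                for (int n; (n = in.read(chunk)) != -1; ) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    /** Re-serializes the XML with line breaks so that line numbers become meaningful. */
    static String prettyPrint(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    /** Returns the 1-based number of the first differing line, or -1 if the texts are equal. */
    static int firstDifference(String left, String right) {
        String[] a = left.split("\\R");
        String[] b = right.split("\\R");
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (!a[i].equals(b[i])) return i + 1;
        }
        return a.length == b.length ? -1 : common + 1;
    }

    public static void main(String[] args) throws IOException {
        // args: original workbook, POI-written workbook, part name inside the ZIP
        String original = prettyPrint(readEntry(args[0], args[2]));
        String rewritten = prettyPrint(readEntry(args[1], args[2]));
        System.out.println("first differing line: " + firstDifference(original, rewritten));
    }
}
```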
In my case the problem was that poi somehow changed the dimension setting from
<dimension ref="A1:XFD147"/>
to
<dimension ref="A1:XFE147"/>
(with XFE being a non-existent column). I fixed it by removing those many empty columns in the original xlsx file.
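The XFD / XFE case can also be checked mechanically once you have the dimension value. Below is a minimal sketch, assuming the xlsx limits of 16,384 columns (A to XFD) and 1,048,576 rows. DimensionCheck and its methods are names invented here, not POI API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DimensionCheck {
    static final int MAX_COLUMN = 16_384;   // column XFD
    static final int MAX_ROW = 1_048_576;

    private static final Pattern CELL = Pattern.compile("([A-Z]{1,3})([0-9]{1,7})");

    /** Converts column letters to a 1-based index: A = 1, Z = 26, AA = 27. */
    static int columnIndex(String letters) {
        int index = 0;
        for (char c : letters.toCharArray()) {
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    /** Checks a single cell reference such as "XFD147". */
    static boolean isValidCell(String ref) {
        Matcher m = CELL.matcher(ref);
        if (!m.matches()) return false;
        int column = columnIndex(m.group(1));
        int row = Integer.parseInt(m.group(2));
        return column <= MAX_COLUMN && row >= 1 && row <= MAX_ROW;
    }

    /** Checks a dimension value such as "A1:XFD147" (or a single cell like "A1"). */
    static boolean isValidRange(String ref) {
        for (String cell : ref.split(":", -1)) {
            if (!isValidCell(cell)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String ref : new String[] {"A1:XFD147", "A1:XFE147"}) {
            System.out.println(ref + " valid? " + isValidRange(ref));
        }
    }
}
```

Running main reports the first range as valid and the second as invalid, since XFE would be column 16,385.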
My professor said: "How does the mathematician find the lion in the desert?" - "First cuts the desert into two halves, finds out where is the lion, then repeats it until the lion is caught".
So, try to remove features from the Excel files, try different versions, until you find the root cause. There may be multiple causes, though.
I am using Aspose.Cells (trial version) to parse a .xls (Excel) file for Java. But when I try to load the file, it throws the exception given below:
SEVERE: java.lang.IllegalStateException: XML Stream Exception: XMLStreamException: com.ctc.wstx.sr.ValidatingStreamReader cannot be cast to com.ctc.wstx.sr.ValidatingStreamReader
Here is my code
Workbook workbook = new Workbook();
try {
    workbook.open(path+fileName);
} catch (Exception e) {
    e.printStackTrace();
}
Worksheet worksheet = workbook.getWorksheets().get(0);
This exception occurs at the line workbook.open(path+fileName);. I am quite sure that it is not due to a wrong path, because when I give a wrong path Aspose throws FileNotFoundException. So now I am stuck and unable to find out why this is happening.
Note: While searching for this problem, I found this answer on the Aspose forum, but it is neither helpful nor feasible (it means checking all the classes present in the jars placed in lib).
We recommend that you try the latest version of the product (e.g. v7.7.x (Java)), as we removed some inter-dependent jars and have included our own custom XML parsers to perform some XML operations in the product. In the new versions, we have removed the conflicting "com.ctc.wstx" jar from the product, so you should not see this exception any more.
Thanks,
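For anyone who cannot upgrade right away: an "X cannot be cast to X" message generally means the same class name has been loaded from two places. Checking every class in every jar by hand is not feasible, but the class loader can be asked where one class name is available. Here is a small sketch using only the JDK. DuplicateClassFinder is a made-up name, and the class in main is the one from the exception above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class DuplicateClassFinder {
    /** Lists every location from which the given class is visible to the current class loader. */
    static List<URL> locations(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> found = locations("com.ctc.wstx.sr.ValidatingStreamReader");
        found.forEach(System.out::println);
        System.out.println(found.size() + " location(s) found");
    }
}
```

If this prints more than one location when run inside the affected application, the duplicated jar is the likely culprit. It only sees what the current thread's class loader and its parents can see, so treat it as a diagnostic aid rather than proof.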
So I am trying to get a flat-file XML version of an OPC document.
I am using OPCPackage from org.apache.poi.openxml4j.
In C++ there is a call that creates a flat XML file from this zipped file.
Does anyone know how to do that in Java?
Also, any good reading related to OPC and Java would be awesome.
Thanks a lot
Cheers
UPDATE: related to the comment I made on the only answer...
Code:
// imports
import org.docx4j.convert.out.flatOpcXml.FlatOpcXmlCreator;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
// code snippet
WordprocessingMLPackage wmlPkg = null;
try
{
    wmlPkg = WordprocessingMLPackage.load(inFile);
}
catch (Docx4JException ex)
{
    //...
}
FlatOpcXmlCreator flatOpcWorker = new FlatOpcXmlCreator((wmlPkg));
flatOpcWorker.marshal(new FileOutputStream(tmpFlatFile.getAbsolutePath()));
So that's the code snippet, and it results in a compile error:
cannot find symbol symbol: method marshal(java.io.FileOutputStream) location: variable flatOpcWorker of type org.docx4j.convert.out.flatOpcXml.FlatOpcXmlCreator
My project docx4j has a FlatOpcXmlCreator which does this; see the ConvertOutFlatOpenPackage sample
If you want to use it with POI (which uses XML Beans, not JAXB), I guess you could port it. Both projects are ASL, and both do the OPC part based on OpenXML4J.