I know we can get it done by extension or by mime type, do we have any other way through which we can get the idea of type of file whether it is .docx or .doc.
If it is just a matter of decided whether a collection of files known to either be .doc or .docx but are not marked accordingly with an extension, you can use the fact that a .docx file is a zipped collection of files. Something to the tune as follows might help:
boolean isZip = new ZipInputStream( fileStream ).getNextEntry() != null;
where fileStream is whatever file or other input stream you wish to evaluate. You could further evaluate a zipped file by looking for key .docx entries. A good starting reference is Word Document (DOCX). Likewise, if you know it is just a binary file, you can test for Word's File Information Block (see Word (.doc) Binary File Format)
You could use Apache Tika for content Detection. But you should been aware that this is a huge framework (many required dependencies) for such a small task.
There is a way, no strightforward though. But with Apache POI, you can locate it.
Try to read a .docx file using HWPFDocument Class. It would give you the following error
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied
data appears to be in the Office 2007+ XML. You are calling the part
of POI that deals with OLE2 Office Documents. You need to call a
different part of POI to process this data (eg XSSF instead of HSSF)
String filePath = "C:\\XXXX\XXXX.docx";
FileInputStream inStream;
try {
inStream = new FileInputStream(new File(filePath));
HWPFDocument doc = new HWPFDocument(inStream);
WordExtractor wordExtractor = new WordExtractor(doc);
System.out.println("Getting words"+wordExtractor.getText());
} catch (Exception e) {
System.out.print("Its not a .doc format");
}
.docx can be read using XWPFDocument Class.
Why dont you use Apache Tika:
File file = new File('File Here');
Tika tika = new Tika();
String filetype = tika.detect(file);
System.out.println(filetype);
Assuming you're using Apache POI, you have a few options.
One is to grab the first few bytes of the file, and ask POIFSFileSystem with the hasPOIFSHeader(byte) method. If you have a stream that supports mark/reset, you can instead use POIFSFileSystem.hasPOIFSHeader(InputStream). If those return true then try to open it as a .doc with HWPF, otherwise try as .docx with XWPF
Otherwise, if you prefer a try/catch way, try to open it with POIFSFileSystem and catch OfficeXmlFileException - if it opens fine it's .doc, if you get the exception it's .docx
If you look at the source code for WorkbookFactory you'll see the first pattern in use, you can copy a similar set of logic form that
I'm building an android app and I want to use iText for creating pdf file, but I can't use Document class. As I seen in tutorials, there should be import com.itextpdf.text.Document for using Document class. For this app, I'm using com.itextpdf:itext-pdfa:5.5.9 library. I want to create a simple pdf file with 2 paragraphs, something like this:
try{
File pdfFolder = new File(Environment.getExternalStoragePublicDirectory(
Environment.DIRECTORY_DOCUMENTS), "pdfdemo");
if (!pdfFolder.exists()) {
pdfFolder.mkdir();
}
Date date = new Date() ;
String timeStamp = new SimpleDateFormat("yyyyMMdd_HHmmss").format(date);
File myFile = new File(pdfFolder + timeStamp + ".pdf");
OutputStream output = new FileOutputStream(myFile);
Document document = new Document();
PdfAWriter.getInstance(document, output);
document.open();
document.add(new Paragraph(mSubjectEditText.getText().toString()));
document.add(new Paragraph(mBodyEditText.getText().toString()));
document.close();
}catch (Exception e) {}
'
Could anyone help me with this problem? What am I doing wrong?
You say:
I'm using com.itextpdf:itext-pdfa:5.5.9 library
That is wrong for two reasons:
itext-pdfa is an addon to iText that is meant for writing or manipulating PDF/A documents. It requires the core iText libary. Read about the different parts of iText on the official web site: https://developers.itextpdf.com/itext-java
You say you want to use iText on Android, but you are referring to iText for Java. iText for Java contains classes that are not allowed on Android (java.awt.*, javax.nio,...). You should use the Android port for iText, which is called iTextG: https://developers.itextpdf.com/itextg-android
It's as if you're using iText without having visited the official iText web site. How is that even possible?
Just open your app level gradle file and add following code into your dependencies
implementation 'com.itextpdf:itext-pdfa:5.5.9'
It works for me
I have an xml file already being created and rendered as a PDF sent over a servlet:
TraxInputHandler input = new TraxInputHandler(
new File(XML_LOCATION+xmlFile+".xml"),
new File(XSLT_LOCATION)
);
ByteArrayOutputStream out = new ByteArrayOutputStream();
//driver is just `new Driver()`
synchronized (driver) {
driver.reset();
driver.setRenderer(Driver.RENDER_PDF);
driver.setOutputStream(out);
input.run(driver);
}
//response is HttpServletResponse
byte[] content = out.toByteArray();
response.setContentType("application/pdf");
response.setContentLength(content.length);
response.getOutputStream().write(content);
response.getOutputStream().flush();
This is all working perfectly fine.
However, I now have another PDF file that I need to include in the output. This is just a totally separate .pdf file that I was given. Is there any way that I can append this file either to the response, the driver, out, or anything else to include it in the response to the client? Will that even work? Or is there something else I need to do?
We also use FOP to generate some documents, and we accept uploaded documents, all of which we eventually combine into a single PDF.
You can't just send them sequentially out the stream, because the combined result needs a proper PDF file header, metadata, etc.
We use the iText library to combine the files, starting off with
PdfReader reader = new PdfReader(/*String*/fileName);
reader.consolidateNamedDestinations();
We later loop through adding pages from each pdf to the new combined destination pdf, adjusting the bookmark / page numbers as we go.
AFAIK, FOP doesn't provide this sort of functionality.
Is there a way to convert .doc file to .pdf keeping the format same as doc file which can also include images?
I am able to generate PDF file from doc but only the text appears.
You can use a library based on Open-Office.
It allows to convert from (and to) all the formats supported by OpenOffice.
Moreover, if your doc is read correctly by OpenOffice, it should be converted exactly as you see.
I know JOD Converter for exemple :
File inputFile = new File("document.doc");
File outputFile = new File("document.pdf");
// connect to an OpenOffice.org instance running on port 8100
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
// convert
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
converter.convert(inputFile, outputFile);
// close the connection
connection.disconnect();
You can also use a simple command line (with oo installed) :
#!/bin/sh
DIR=$(pwd)
DOC=$DIR/$1
echo "Doc to convert : $DOC"
/user/bin/oowriter-invisible "macro://Standard.Module1.ConvertWordToPDF($DOC)"
You can use Apache POI to read the doc file and then Apache PDFBox to write the pdf file.
You can use Openoffice Macro for export doc as pdf like,
sub Docaspdf
rem ----------------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
rem ----------------------------------------------------------------------
dim args1(2) as new com.sun.star.beans.PropertyValue
args1(0).Name = "URL"
args1(0).Value = "file:///C:/doc.pdf"
args1(1).Name = "FilterName"
args1(1).Value = "writer_pdf_Export"
args1(2).Name = "FilterData"
args1(2).Value = Array(Array("UseLosslessCompression",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Quality",0,90,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ReduceImageResolution",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("MaxImageResolution",0,300,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("UseTaggedPDF",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("SelectPdfVersion",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportNotes",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportBookmarks",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("OpenBookmarkLevels",0,-1,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("UseTransitionEffects",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("IsSkipEmptyPages",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("IsAddStream",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EmbedStandardFonts",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("FormsType",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportFormFields",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("AllowDuplicateFieldNames",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerToolbar",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerMenubar",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("HideViewerWindowControls",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ResizeWindowToInitialPage",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("CenterWindow",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("OpenInFullScreenMode",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("DisplayPDFDocumentTitle",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("InitialView",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Magnification",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Zoom",0,100,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PageLayout",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("FirstPageOnLeft",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("InitialPage",0,1,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Printing",0,2,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("Changes",0,4,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EnableCopyingOfContent",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EnableTextAccessForAccessibilityTools",0,true,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportLinksRelativeFsys",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PDFViewSelection",0,0,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ConvertOOoTargetToPDFTarget",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("ExportBookmarksToPDFDestination",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("_OkButtonString",0,"",com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("EncryptFile",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PreparedPasswords",0,,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("RestrictPermissions",0,false,com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("PreparedPermissionPassword",0,Array(),com.sun.star.beans.PropertyState.DIRECT_VALUE),Array("",0,,com.sun.star.beans.PropertyState.DIRECT_VALUE))
dispatcher.executeDispatch(document, ".uno:ExportToPDF", "", 0, args1())
end sub
import officetools.OfficeFile;
FileInputStream(new File("test.doc"));
FileOutputStream fos = new FileOutputStream(new File("test.pdf")); /
OfficeFile f = new OfficeFile(fis,"localhost","8100", false);
convert to pdf
f.convert(fos,"pdf");
You may use Aspose.Words for Java to convert Doc files to PDF. This component retains the format of the word document when converted to PDF. It also converts the images along with text.
Disclosure: I work as developer evangelist at Aspose.