Page count showing zero for APACHE POI .docx file - java

I have implemented Apache POI library for Page count of Doc pages, but it shows page count zero when I download Google Doc as .docx file.
Edit: My code is as follows
public Integer getPagesCount(byte[] docBytes, String type)
throws IOException {
ByteArrayInputStream in = new ByteArrayInputStream(docBytes);
String lowerFilePath = type.toLowerCase();
if (lowerFilePath.equals("docx")) {
#SuppressWarnings("resource")
XWPFDocument docx = new XWPFDocument(in);
return docx.getProperties().getExtendedProperties()
.getUnderlyingProperties().getPages();
} else if (lowerFilePath.equals("doc")) {
#SuppressWarnings("resource")
HWPFDocument wordDoc = new HWPFDocument(in);
return wordDoc.getSummaryInformation().getPageCount();
} else if (lowerFilePath.equals("ppt")) {
HSLFSlideShow document = new HSLFSlideShow(in);
return document.getSlides().size();
} else if (lowerFilePath.equals("pptx")) {
#SuppressWarnings("resource")
XMLSlideShow xslideShow = new XMLSlideShow(in);
return xslideShow.getSlides().size();
} else if (lowerFilePath.equals("pdf")) {
PDDocument doc = PDDocument.load(in);
return doc.getNumberOfPages();
}
return 0;
}

Related

Csv to PDF PDFBOX?

I'm new to java, and I'm learning how to create PDF, I was using Itext to create PDF, but because it handles a license I stopped using it, and started using PDFbox. I've searched the internet for how to convert from CSV to PDF, but I can't find an example of how to use margins, or how to align the content.
Using Itext, this is my code using Itext7, it works perfectly but i want to migrate it to PDFbox
public static boolean isEmpty(File rutaCSV) {
if(rutaCSV.isDirectory()){
String[] csvFiles = rutaCSV.list();
if (csvFiles.length > 0) {
for (String cadena : csvFiles){
readCSV("path");
try{
File csvFile = new File("path");
if (csvFile.exists()){
csvFile.delete();
return false;
}
System.out.println(csvFile.delete());
}catch(Exception ex){
System.out.println(ex);
}
}
return true;
}else{
System.out.println("There's not files");
return true;
}
}
return true;
}
public static void readCSV(String file, String cadena){
try{
FileReader filereader = new FileReader(file);
// Configuración para leer el archivo con ;
CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
CSVReader csvReader = new CSVReaderBuilder(filereader)
.withCSVParser(parser)
.build();
Document pdfDocument = new Document();
pdfDocument.addTitle("Some");
pdfDocument.setMargins(0, 0,25,25);
String fileName = cadena.replaceFirst("[.][^.]+$", "");
PdfWriter.getInstance(pdfDocument, new FileOutputStream("path"+fileName+".pdf"));
pdfDocument.open();
PdfPTable tableData = new PdfPTable(5);
PdfPCell cells;
// Lista para guardar los datos leídos
List<String[]> allData = csvReader.readAll();
for (String[] row : allData) {
for (String cell : row) {
cells =new PdfPCell(new Phrase(cell));
tableData.addCell(cells);
}
}
pdfDocument.addTitle("Some");
pdfDocument.add(tableData);
pdfDocument.close();
}catch(Exception ex){
ex.printStackTrace();
System.out.println("error");
}
}
Using PDFBox
public static void main(String[] args) throws IOException, CsvException {
PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);
PDPageContentStream content = new PDPageContentStream(document, page);
try{
String file = "C:\\Users\\brayan.milian\\Desktop\\pruebaFTP\\prueba.csv";
FileReader filereader = new FileReader(file);
CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
CSVReader csvReader = new CSVReaderBuilder(filereader)
.withCSVParser(parser)
.build();
List<String[]> allData = csvReader.readAll();
for (String row[] : allData) {
for (String cell : row){
content.beginText();
content.setFont(PDType1Font.TIMES_BOLD, 12);
content.showText(cell);
content.endText();
}
}
content.close();
}catch(Exception ex){
System.out.println("dja");
}
Can anyone help me?

I want to write the ouput of displayDirectoryContents to a excel sheet

I want to write the output of the displayDirectoryContents to a excel sheet
I have tried using the Apache POI method I want to get the output to a excel sheet
Folder and filename in one column and
the name of the files in another column
import statements
public class Excel {
private static String dest = "C:\\Users\\mahselva\\testexcel.xls";
private static HSSFWorkbook myWorkBook = new HSSFWorkbook();
private static HSSFSheet mySheet = myWorkBook.createSheet();
public static void excelLog(String filename, String message, int rowNum)
{
HSSFRow myRow = null;
HSSFCell myCell = null;
String excelData[][] = new String[1][2];
excelData[0][0] = filename;
excelData[0][1] = message;
myRow = mySheet.createRow(rowNum);
for (int cellNum = 0; cellNum < 2; cellNum++) {
myCell = myRow.createCell(cellNum);
myCell.setCellValue(excelData[0][cellNum]);
}
try {
FileOutputStream out = new FileOutputStream(dest);
myWorkBook.write(out);
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
File currentDir = new File("C:\\OracleATS\\openScript"); // current
directory
displayDirectoryContents(currentDir);
}
public static void displayDirectoryContents(File dir) {
try {
int i = 0;
File[] files = dir.listFiles();
for (File file : files) {
if (file.isDirectory()) {
Path path = Paths.get(file.getCanonicalPath());
//System.out.println("Folder"
+path.getFileName().toString());
excelLog("Folder",path.getFileName().toString(),i);
i++;
displayDirectoryContents(file);
} else {
Path path = Paths.get(file.getCanonicalPath());
//System.out.println(path.getFileName().toString());
excelLog("File",path.getFileName().toString(),i);
i++;
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
I want two columns in an excel sheet with column 1 containing File or
Folder and column 2 containing the name of the file/folder
eg
File books.xml
Folder Script
Thus i want to write the output to the excel sheet
i am using the function excel log to write to the output screen
I use this to write to an excel - however I make it .csv format, not xls. Therefore this might not be what you need, but it's still partially useful as it's writing in a file that can be opened by excel efortlessly.
public static void printAtFile(String filename, String header, String content[])
{
filename+=".csv";
System.out.println("Start creating file "+filename);
PrintWriter writer = null;
try {
writer = new PrintWriter(filename);
writer.println(header);
for(String u:content)
writer.println(u);
writer.close();
} catch (Exception ex) {
System.out.println("Error while writing at file "+filename);
}
}

Extract text from pdf file by pdfbox

i am facing an issue in pdf reading.
public class GetLinesFromPDF extends PDFTextStripper {
static List<String> lines = new ArrayList<String>();
Map<String, String> auMap = new HashMap();
boolean objFlag = false;
public GetLinesFromPDF() throws IOException {
}
/**
* #throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
PDDocument document = null;
String fileName = "E:\\sample.pdf";
try {
int i;
document = PDDocument.load(new File(fileName));
PDFTextStripper stripper = new GetLinesFromPDF();
stripper.setSortByPosition(true);
stripper.setStartPage(0);
stripper.setEndPage(document.getNumberOfPages());
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
// print lines
for (String line : lines) {
//System.out.println("line = " + line);
if (line.matches("(.*)Objection(.*)")) {
System.out.println(line);
withObjection(lines);
//System.out.println("iiiiiiiiiiii");
break;
}
//System.out.println("uuuuuuuuuuuuuu");
}
} finally {
if (document != null) {
document.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
#Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
System.out.println("textPositions = " + string);
// System.out.println("tex "+textPositions.get(0).getFont()+ getArticleEnd());
// you may process the line here itself, as and when it is obtained
}
}
in need a output like
My pdf have some title, we need to skip the same.
pdf file content is
how to extract text as in separate formats as specified.
thanks in advance.

replace a text in MS word Templete(Docx) using java

I am trying to search a string in docx and replace with some other text using java apache poi but it is replacing randomly
getting error as arrayIndexoutofbound Exception in line
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/#w:val")[0];
public class WordReplaceTextInFormFields {
private static void replaceFormFieldText(XWPFDocument document, String ffname, String text) {
boolean foundformfield = false;
for (XWPFParagraph paragraph : document.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
XmlCursor cursor = run.getCTR().newCursor();
cursor.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:fldChar/#w:fldCharType");
while (cursor.hasNextSelection()) {
cursor.toNextSelection();
XmlObject obj = cursor.getObject();
if ("begin".equals(((SimpleValue) obj).getStringValue())) {
cursor.toParent();
obj = cursor.getObject();
obj = obj.selectPath(
"declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' .//w:ffData/w:name/#w:val")[0];
if (ffname.equals(((SimpleValue) obj).getStringValue())) {
foundformfield = true;
} else {
foundformfield = false;
}
} else if ("end".equals(((SimpleValue) obj).getStringValue())) {
if (foundformfield)
return;
foundformfield = false;
}
}
if (foundformfield && run.getCTR().getTList().size() > 0) {
run.getCTR().getTList().get(0).setStringValue(text);
// System.out.println(run.getCTR());
}
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordTemplate.docx"));
replaceFormFieldText(document, "Text1", "Моя Компания");
replaceFormFieldText(document, "Text2", "Аксель Джоачимович Рихтер");
replaceFormFieldText(document, "Text3", "Доверенность");
document.write(new FileOutputStream("WordReplaceTextInFormFields.docx"));
document.close();
}
}
it misses some string, it not replaces entire document..please help with sample code
I do something similar in my project at https://github.com/centic9/poi-mail-merge which provides a general mail-merge functionality based on POI. It is using a bit different functionality from XmlBeans which replaces strings in the full XML-content of the document instead of each paragraph separately.
private static void appendBody(CTBody src, String append, boolean first) throws XmlException {
XmlOptions optionsOuter = new XmlOptions();
optionsOuter.setSaveOuter();
String srcString = src.xmlText();
String prefix = srcString.substring(0,srcString.indexOf(">")+1);
final String mainPart;
// exclude template itself in first appending
if(first) {
mainPart = "";
} else {
mainPart = srcString.substring(srcString.indexOf(">")+1,srcString.lastIndexOf("<"));
}
String suffix = srcString.substring( srcString.lastIndexOf("<") );
String addPart = append.substring(append.indexOf(">") + 1, append.lastIndexOf("<"));
CTBody makeBody = CTBody.Factory.parse(prefix+mainPart+addPart+suffix);
src.set(makeBody);
}
}
See line 132 in MailMerge.java

Error While Reading Large Excel Files (xlsx) Via Apache POI

I am trying to read large excel files xlsx via Apache POI, say 40-50 MB. I am getting out of memory exception. The current heap memory is 3GB.
I can read smaller excel files without any issues. I need a way to read large excel files and then them back as response via Spring excel view.
public class FetchExcel extends AbstractView {
#Override
protected void renderMergedOutputModel(
Map model, HttpServletRequest request, HttpServletResponse response)
throws Exception {
String fileName = "SomeExcel.xlsx";
response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(pkg);
ServletOutputStream respOut = response.getOutputStream();
pkg.close();
workbook.write(respOut);
respOut.flush();
workbook = null;
response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\"");
}
}
I first started off using XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in);
but that was costly per Apache POI API, so I switched to OPC package way but still the same effect. I don't need to parse or process the file, just read it and return it.
Here is an example to read a large xls file using sax parser.
public void parseExcel(File file) throws IOException {
OPCPackage container;
try {
container = OPCPackage.open(file.getAbsolutePath());
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container);
XSSFReader xssfReader = new XSSFReader(container);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
while (iter.hasNext()) {
InputStream stream = iter.next();
processSheet(styles, strings, stream);
stream.close();
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
}
}
protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
try {
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
#Override
public void startRow(int rowNum) {
}
#Override
public void endRow() {
}
#Override
public void cell(String cellReference, String formattedValue) {
}
#Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
},
false//means result instead of formula
);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
You don't mention whether you need to modify the spreadsheet or not.
This may be obvious, but if you don't need to modify the spreadsheet, then you don't need to parse it and write it back out, you can simply read bytes from the file, and write out bytes, as you would with, say an image, or any other binary format.
If you do need to modify the spreadsheet before sending it to the user, then to my knowledge, you may have to take a different approach.
Every library that I'm aware of for reading Excel files in Java reads the whole spreadsheet into memory, so you'd have to have 50MB of memory available for every spreadsheet that could possibly be concurrently processed. This involves, as others have pointed out, adjusting the heap available to the VM.
If you need to process a large number of spreadsheets concurrently, and can't allocate enough memory, consider using a format that can be streamed, instead of read all at once into memory. CSV format can be opened by Excel, and I've had good results in the past by setting the content-type to application/vnd.ms-excel, setting the attachment filename to something ending in ".xls", but actually returning CSV content. I haven't tried this in a couple of years, so YMMV.
In the bellwo example I'll add a complete code how to parse a complete excel file (for me 60Mo) into list of object without any problem of "out of memory" and work fine:
import java.util.ArrayList;
import java.util.List;
class DistinctByProperty {
private static OPCPackage xlsxPackage = null;
private static PrintStream output= System.out;
private static List<MassUpdateMonitoringRow> resultMapping = new ArrayList<>();
public static void main(String[] args) throws IOException {
File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx");
double bytes = file.length();
double kilobytes = (bytes / 1024);
double megabytes = (kilobytes / 1024);
System.out.println("Size "+megabytes);
parseExcel(file);
}
public static void parseExcel(File file) throws IOException {
try {
xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ);
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage);
XSSFReader xssfReader = new XSSFReader(xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
try (InputStream stream = iter.next()) {
String sheetName = iter.getSheetName();
output.println();
output.println(sheetName + " [index=" + index + "]:");
processSheet(styles, strings, new MappingFromXml(resultMapping), stream);
}
++index;
}
} catch (InvalidFormatException e) {
e.printStackTrace();
} catch (OpenXML4JException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
}
private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, mappingFromXml, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
System.out.println("Size of Array "+resultMapping.size());
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
}
you have to add a calss that implements
SheetContentsHandler
import com.sun.org.apache.xpath.internal.operations.Bool;
import org.apache.poi.ss.util.CellAddress;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
public class MappingFromXml implements SheetContentsHandler {
private List<myObject> result = new ArrayList<>();
private myObject myObject = null;
private int lineNumber = 0;
/**
* Number of columns to read starting with leftmost
*/
private int minColumns = 25;
/**
* Destination for data
*/
private PrintStream output = System.out;
public MappingFromXml(List<myObject> list) {
this.result = list;
}
#Override
public void startRow(int i) {
output.println("iii " + i);
lineNumber = i;
myObject = new myObject();
}
#Override
public void endRow(int i) {
output.println("jjj " + i);
result.add(myObject);
myObject = null;
}
#Override
public void cell(String cellReference, String formattedValue, XSSFComment comment) {
int columnIndex = (new CellReference(cellReference)).getCol();
if(lineNumber > 0){
switch (columnIndex) {
case 0: {//Tech id
if (formattedValue != null && !formattedValue.isEmpty())
myObject.setId(Integer.parseInt(formattedValue));
}
break;
//TODO add other cell
}
}
}
#Override
public void headerFooter(String s, boolean b, String s1) {
}
}
For more information visite this link
I too faced the same issue of OOM while parsing xlsx file...after two days of struggle, I finally found out the below code that was really perfect;
This code is based on sjxlsx. It reads the xlsx and stores in a HSSF sheet.
[code=java]
// read the xlsx file
SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx"));
HSSFWorkbook hsfWorkbook = new HSSFWorkbook();
org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet();
Sheet sheetToRead = workbook.getSheet(0, false);
SheetRowReader reader = sheetToRead.newReader();
Cell[] row;
int rowPos = 0;
while ((row = reader.readRow()) != null) {
org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos);
int cellPos = 0;
for (Cell cell : row) {
if(cell != null){
org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos);
hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING);
hfsCell.setCellValue(cell.getValue());
}
cellPos++;
}
rowPos++;
}
return hsfSheet;[/code]

Categories