Copy HTML content into Excel by Java - java

I am a Java beginner and would be grateful for some sample code or guidelines for the situation below.
I have a large number of HTML files, each containing one school's information. The files sit at different depths of the folder hierarchy, but always in the lowest-level folder of their path. Some folders contain no school HTML files.
For example:
C:\schools\england\london\hampstead\school_A.html [1 html in 1 folder]
C:\schools\england\london\southwark\school_B.html [multiple files in 1 folder]
C:\schools\england\london\southwark\school_C.html
C:\schools\england\london\southwark\school_D.html
C:\schools\wales\monmouth\school_E.html [file at different path level]
C:\schools\scotland\aberdeen\aberdeen [folder has no file]
HTML CONTENT TO BE COPIED
<h1 id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_schoolName" class="schoolName">**school_A**</h1>
<li id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingTypeContainer" style="list-style: none;"><span>Day/boarding type:</span> <span id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingType" class="infoDetail">**Day, full boarding and weekly boarding**</span></li>
<li id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingFeeContainer" style="list-style: none;"><span>Boarding fees per term:</span> <span id="MainControl_CustomFunctionality_ZoneMain_EmbeddedUserControlPlaceholderControl1_ctl01_boardingFee" class="infoDetail">**£7,317 to £8,370**</span></li>
EXPECTED RESULTS IN EXCEL TABLE
3 column headers: "SCHOOL" "BOARDING TYPE" "BOARDING FEES PER TERM"
Row 1: "**school_A**" "**Day, full boarding and weekly boarding**" "**£7,317 to £8,370**"
Thank you very much for your help

I have some code that does something similar. Please adapt it to your requirements.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.xssf.usermodel.XSSFFont;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class HTMLToExcel
{
public static void main(String[] args)
{
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(new File("D:\\Excels\\log_km_styles1.html")));
// Create Work book
XSSFWorkbook xwork = new XSSFWorkbook();
// Create Spread Sheet
XSSFSheet xsheet = xwork.createSheet("MyFristSheet");
//Create Row (Row is inside spread sheet)
XSSFRow xrow = null;
int rowid =0;
String line ;
while (( line =br.readLine())!= null) {
// Create font for applying bold or italic or same thing else on the content
/*XSSFFont xfont = xwork.createFont();
xfont.setBoldweight(xfont.BOLDWEIGHT_BOLD);
XSSFCellStyle xstyle = xwork.createCellStyle();
xstyle.setFont(xfont);*/
System.out.println(line);
String split[] = line.split("<br>");
Cell cell;
for (int i = 0; i < split.length; i++) {
xrow = xsheet.createRow(rowid);
cell = xrow.createCell(2);
cell.setCellValue(split[i]);
String[] columnSplit = split[i].split("\\W+");
int columnCount = 3;
for (int j = 0; j < columnSplit.length; j++) {
cell = xrow.createCell(columnCount++);
cell.setCellValue(columnSplit[j]);
}
System.out.println(split[i]);
rowid++;
}
}
// create date for adding this to our workbook name like workbookname_date
Date d1 = new Date();
SimpleDateFormat sdf = new SimpleDateFormat("dd-MMM-yy");
String todaysDate = sdf.format(d1);
System.out.println(sdf.format(d1));
//Create file system using specific name
FileOutputStream fout = new FileOutputStream(new File("D:\\Excels\\redaingfromHTMLFile_"+todaysDate+".xlsx"));
xwork.write(fout);
fout.close();
System.out.println("redaingfromHTMLFile_"+todaysDate+".xlsx written successfully" );
}
catch (Exception e) {
e.printStackTrace();
}
}
}
The code above writes the content of an HTML file into an Excel file, creating a new workbook with today's date in the file name. Try it; I hope it helps.
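The code above reads a single hard-coded file. A minimal sketch of the other part of the question, finding every .html file under a root folder and pulling out the three values by the end of their id attribute, might look like the following. The names SchoolHtmlScanner, textOfId and scan are made up for illustration, the files are assumed to be UTF-8, and the regular expression assumes the target elements contain no nested tags (as in the sample); a real HTML parser would be more robust.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SchoolHtmlScanner {

    // Returns the text inside the first element whose id ends with idSuffix, or "" if none is found.
    // Assumption: the element has no nested tags, as in the question's sample HTML.
    static String textOfId(String html, String idSuffix) {
        Pattern pattern = Pattern.compile(
                "<(\\w+)[^>]*\\bid=\"[^\"]*" + Pattern.quote(idSuffix) + "\"[^>]*>(.*?)</\\1>",
                Pattern.DOTALL);
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(2).trim() : "";
    }

    // Walks the tree below root and returns one {school, boarding type, boarding fee} row per .html file.
    // Folders without .html files simply contribute no rows.
    static List<String[]> scan(Path root) throws IOException {
        List<Path> htmlFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            htmlFiles = paths.filter(Files::isRegularFile)
                    .filter(f -> f.getFileName().toString().toLowerCase().endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String[]> rows = new ArrayList<>();
        for (Path file : htmlFiles) {
            String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            rows.add(new String[] {
                    textOfId(html, "_schoolName"),
                    textOfId(html, "_boardingType"),
                    textOfId(html, "_boardingFee")
            });
        }
        return rows;
    }
}
```

Each returned row could then be written with xsheet.createRow(...) and createCell(0), createCell(1), createCell(2) in the same way as the code above, after a header row holding the three column titles.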


Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)

The problem:
I'm trying to open an .xls file using the Apache POI 4.1.0 library, and it gives the same error as in a similar question from 4 years ago.
I have already tried versions 3.12 to 3.16, including 3.13.
All versions can open a blank .xls file and files I filled in myself, but not this one.
This document is generated automatically, and I need to write a program that accepts it.
I already made a .NET Standard library in C# that works. I tried to use it with Xamarin Android, but that was a horror: the app weighs 50 MB instead of 3 MB because of various terrible SDK link errors, but that's a different story. So I decided to do it in Kotlin.
The code is from the documentation.
You can check the file on git.
val inputStream = FileInputStream("./test.xls")
val wb = HSSFWorkbook(inputStream)
I expect no errors while opening xls.
Actual output is
Exception in thread "main" java.lang.RuntimeException: Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)
at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:97)
at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:183)
at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:122)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:354)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:400)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt:19)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt)
The document will not be modified in any way. Any other library that can just read the dataset or strings from the .xls file would also be fine.
After some investigation I found the problem with your test.xls file.
According to the file format specifications, all HyperlinkRecords should be together in the Hyperlink Table, which is contained in the Sheet Substream following the cell records. In your case the HyperlinkRecords are between other records (between NumberRecords and LabelSSTRecords). So I suspect it was not Excel that created that test.xls file.
Excel might be tolerant enough to open that file nevertheless, but you cannot expect Apache POI to tolerate every possible violation of the file format. If you open the file in Excel and re-save it, Apache POI is able to create the Workbook after that.
Apache POI is not able to repair this as Excel can. But one could read the POIFSFileSystem in a low-level way and filter out the HyperlinkRecords that are between other records. That way one could read the content using Apache POI, except for the hyperlinks of course.
Example:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.NameRecord;
import org.apache.poi.hssf.record.NameCommentRecord;
import org.apache.poi.hssf.record.HyperlinkRecord;
import org.apache.poi.hssf.record.RecordFactoryInputStream;
import org.apache.poi.hssf.record.RecordFactory;
import org.apache.poi.hssf.model.RecordStream;
import org.apache.poi.hssf.model.InternalWorkbook;
import org.apache.poi.hssf.model.InternalSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFName;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.util.CellReference;
import java.util.List;
import java.util.ArrayList;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Constructor;
class ExcelOpenHSSF {
public static void main(String[] args) throws Exception {
String fileName = "test(2).xls";
try (InputStream is = new FileInputStream(fileName);
POIFSFileSystem fileSystem = new POIFSFileSystem(is)) {
//find workbook directory entry
DirectoryNode directory = fileSystem.getRoot();
String workbookName = "";
for(String wbName : InternalWorkbook.WORKBOOK_DIR_ENTRY_NAMES) {
if(directory.hasEntry(wbName)) {
workbookName = wbName;
break;
}
}
InputStream stream = directory.createDocumentInputStream(workbookName);
//loop over all records and manipulate if needed
List<Record> records = new ArrayList<Record>();
RecordFactoryInputStream recStream = new RecordFactoryInputStream(stream, true);
//here we filter out the HyperlinkRecords that are between other records (NumberRecords and LabelSSTRecords in that case)
//System.out.println prints the problematic records
Record record1 = null;
Record record2 = null;
while ((record1 = recStream.nextRecord()) != null) {
record2 = recStream.nextRecord();
if (!(record1 instanceof HyperlinkRecord) && (record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record1);
} else if ((record1 instanceof HyperlinkRecord) && !(record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record2);
} else {
records.add(record1);
if (record2 != null) records.add(record2);
}
}
//now create the HSSFWorkbook
//see https://svn.apache.org/viewvc/poi/tags/REL_4_1_0/src/java/org/apache/poi/hssf/usermodel/HSSFWorkbook.java?view=markup#l322
InternalWorkbook internalWorkbook = InternalWorkbook.createWorkbook(records);
HSSFWorkbook wb = HSSFWorkbook.create(internalWorkbook);
int recOffset = internalWorkbook.getNumRecords();
Method convertLabelRecords = HSSFWorkbook.class.getDeclaredMethod("convertLabelRecords", List.class, int.class);
convertLabelRecords.setAccessible(true);
convertLabelRecords.invoke(wb, records, recOffset);
RecordStream rs = new RecordStream(records, recOffset);
while (rs.hasNext()) {
InternalSheet internelSheet = InternalSheet.createSheet(rs);
Constructor constructor = HSSFSheet.class.getDeclaredConstructor(HSSFWorkbook.class, InternalSheet.class);
constructor.setAccessible(true);
HSSFSheet hssfSheet = (HSSFSheet)constructor.newInstance(wb, internelSheet);
Field _sheets = HSSFWorkbook.class.getDeclaredField("_sheets");
_sheets.setAccessible(true);
@SuppressWarnings("unchecked")
List<HSSFSheet> sheets = (ArrayList<HSSFSheet>)_sheets.get(wb);
sheets.add(hssfSheet);
}
for (int i = 0 ; i < internalWorkbook.getNumNames() ; ++i){
NameRecord nameRecord = internalWorkbook.getNameRecord(i);
Constructor constructor = HSSFName.class.getDeclaredConstructor(HSSFWorkbook.class, NameRecord.class, NameCommentRecord.class);
constructor.setAccessible(true);
HSSFName name = (HSSFName)constructor.newInstance(wb, nameRecord, internalWorkbook.getNameCommentRecord(nameRecord));
Field _names = HSSFWorkbook.class.getDeclaredField("names");
_names.setAccessible(true);
@SuppressWarnings("unchecked")
List<HSSFName> names = (ArrayList<HSSFName>)_names.get(wb);
names.add(name);
}
//now the workbook is created properly
System.out.println(wb);
/*
//getting the data
DataFormatter formatter = new DataFormatter();
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Cell cell : row) {
CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
System.out.print(cellRef.formatAsString());
System.out.print(" - ");
String text = formatter.formatCellValue(cell);
System.out.println(text);
}
}
*/
}
}
}
I was able to open a file of this "corrupted" type by using the JExcel API.
Apache POI can also open the file if you manually re-save it using the Excel application, although that may not suit everyone.
Sorry for asking strange questions. Thank you all; I hope someone finds this useful.
val inputStream = FileInputStream("./testCorrupted.xls")
val workbook = Workbook.getWorkbook(inputStream)
val sheet = workbook.getSheet(0)
val cell1 = sheet.getCell(0, 0)
print(cell1.contents + ":")

Add content to a very large Excel file using Apache POI (run out of alternatives...)

I have a large xlsx file which has an empty "data source sheet" and other sheets containing lots of formulas that use the data source sheet. My application should generate the data, open the file, fill the empty sheet up with that data and save it. I'm trying to do all that with Apache POI.
The problem is that opening the file takes an unacceptable amount of memory and time. I've read other threads and couldn't find a solution.
This is how I open the file:
pkg = OPCPackage.open(filename);
wb = new XSSFWorkbook(pkg);
Please note that using SXSSFWorkbook does not work, as its constructors take an XSSFWorkbook, which I'm unable to create in the first place.
All I need is to fill one empty sheet in the file; I don't need to load it completely into memory. Any ideas?
Thank you!!
You could try working only with the OPCPackage without creating a Workbook. But then we must work with the lower-level org.openxmlformats.schemas.spreadsheetml.x2006.main objects. This means we do not have the support of the XSSF objects for storing string values as data (SharedStringsTable) and for evaluating formulas.
The example takes an Excel workbook with at least 4 worksheets. The third worksheet is your "data source sheet". It must exist and will be overwritten with new data. The fourth worksheet is the one whose formulas reference the "data source sheet". Since we can't use an evaluator, we must set FullCalcOnLoad to true. Otherwise we would have to press [Ctrl]+[Alt]+[Shift]+[F9] to force a full recalculation.
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xssf.model.SharedStringsTable;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.OutputStream;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.WorksheetDocument;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTSheetData;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCell;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STCellType;
import org.openxmlformats.schemas.officeDocument.x2006.relationships.STRelationshipId;
import org.apache.xmlbeans.XmlOptions;
import org.apache.xmlbeans.XmlException;
import javax.xml.namespace.QName;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
class ReadAndWriteTest5 {
public static void main(String[] args) {
try {
File file = new File("ReGesamt11_3Test.xlsx");
//we only open the OPCPackage, we don't create a Workbook
OPCPackage opcpackage = OPCPackage.open(file);
//if there are strings in the SheetData, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
//create empty WorksheetDocument for the "data source sheet"
WorksheetDocument worksheetdocument = WorksheetDocument.Factory.newInstance();
CTWorksheet worksheet = worksheetdocument.addNewWorksheet();
CTSheetData sheetdata = worksheet.addNewSheetData();
//put some data in for the "data source sheet"
for (int i = 0; i < 10; i++) {
CTCell ctcell= sheetdata.addNewRow().addNewC();
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("DataRow " + i);
int sRef = sharedstringstable.addEntry(ctstr);
ctcell.setT(STCellType.S);
ctcell.setV(Integer.toString(sRef));
ctcell=sheetdata.getRowArray(i).addNewC();
ctcell.setV(""+(i*100+(i+1)*10+(i+2))+"."+((i+3)*10+(i+4)));
}
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
//create XmlOptions for saving the worksheet
XmlOptions xmlOptions = new XmlOptions();
xmlOptions.setSaveOuter();
xmlOptions.setUseDefaultNamespace();
xmlOptions.setSaveAggressiveNamespaces();
xmlOptions.setCharacterEncoding("UTF-8");
xmlOptions.setSaveSyntheticDocumentElement(new QName(CTWorksheet.type.getName().getNamespaceURI(), "worksheet"));
Map<String, String> map = new HashMap<String, String>();
map.put(STRelationshipId.type.getName().getNamespaceURI(), "r");
xmlOptions.setSaveSuggestedPrefixes(map);
//get the PackagePart of the third sheet which is the "data source sheet"
//this sheet must exist and will be replaced with the new content
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet3.xml")).get(0);
//save the worksheet as the third sheet which is the "data source sheet"
out = sheetpart.getOutputStream();
worksheet.save(out, xmlOptions);
out.close();
//get the PackagePart of the fourth sheet which is the sheet on which formulas are referencing the "data source sheet"
//since we can't use Evaluator, we must force recalculation on load for this sheet
sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet4.xml")).get(0);
worksheetdocument = WorksheetDocument.Factory.parse(sheetpart.getInputStream());
worksheet = worksheetdocument.getWorksheet();
//setFullCalcOnLoad true
if (worksheet.getSheetCalcPr() == null) {
worksheet.addNewSheetCalcPr().setFullCalcOnLoad(true);
} else {
worksheet.getSheetCalcPr().setFullCalcOnLoad(true);
}
out = sheetpart.getOutputStream();
worksheet.save(out, xmlOptions);
out.close();
opcpackage.close();
} catch (InvalidFormatException ifex) {
ifex.printStackTrace();
} catch (FileNotFoundException fnfex) {
fnfex.printStackTrace();
} catch (IOException ioex) {
ioex.printStackTrace();
} catch (XmlException xmlex) {
xmlex.printStackTrace();
}
}
}

Using hashmap for POI Java XLSX

I have been trying to edit my code so that an XLSX file can be uploaded and read by the website. But after countless tries, the data I typed into the XLSX file is still not captured by the website. (The idea: after downloading the XLSX template from the website, I can type everything into the file at once and upload it again, so I do not need to add new data by clicking "new" every single time.)
I was told to use a HashMap, but I am unsure how it works. The code I currently have only lets the website capture the header titles, and I am not supposed to use jxl.
After removing the code that used jxl, I get some errors (underlined in red).
public HashMap getConstructJXLList_xlsx(UploadedFile File, int Sheetindex) {
String _LOC = "[PageCodeBase: getConstructJXLList]";
HashMap _m = new HashMap();
InputStream _is = null;
try {
_is = File.getInputstream();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
XSSFWorkbook workbook;
XSSFSheet s;
try {
workbook = new XSSFWorkbook(_is);
s = workbook.getSheetAt(Sheetindex);
} catch (Exception e) {
System.out.println(_LOC + "1.0 " + " Test:");
int _totalc = getColumns(); //getColumns is being underline in red
int _totalr = getRows(); //getRows is being underline in red
// Header r=0
String[] _st = new String[_totalc];
//XSSFSheet sheet = null;
for (int _c = 0; _c < _totalc; _c++) {
_st[_c] = getCell(_c, 0); //getCell is being underline in red
}
_m.put("HEADER", _st);
System.out.println(_LOC + "1.0 " + " _m:" + _m);
// Data r=1 thereafter
List _l = new ArrayList();
for (int _r = 1; _r < _totalr; _r++) {
Object[] _o = new Object[_totalc];
String _s_r = null;
for (int _c = 0; _c < _totalc; _c++) {
_o[_c] = getCell(_c, _r);
String _cn = _o[_c].getClass().getName();
String _s_c = null;
if (!isEmptyNull(_s_c)) {
_s_r = "record_available";
}
}
if ((_o != null) && (_o.length != 0)) {
_l.add(_o);
}
}
_m.put("DATA", _l);
System.out.println(_LOC + "1.0 " + " _m:" + _m);
}
return _m;
}
Could you help me solve this? Why isn't any data being captured by the website? The error shown is "The method getColumns/getCell/getRows is undefined for the type PageCodeBase." The suggested quick fix is to create a new method, but after creating the new methods I am unsure what to put in them. I have tried various examples (http://snippetjournal.wordpress.com/2014/02/05/read-xlsx-using-poi/) but still can't get it to work.
I would recommend managing the Excel file using these classes from the Apache POI API:
org.apache.poi.ss.usermodel.Cell;
org.apache.poi.ss.usermodel.Row;
org.apache.poi.ss.usermodel.Sheet;
org.apache.poi.ss.usermodel.Workbook;
org.apache.poi.ss.usermodel.WorkbookFactory;
instead of XSSFWorkbook, XSSFSheet and so on.
Also, when accessing the file input stream, try doing it this way:
FileInputStream input = new FileInputStream(new File("C:\\Users\\admin\\Desktop\\Load_AcctCntr_Template.xlsx"));
Workbook workBook = WorkbookFactory.create(input);
workBook.getSheetAt(0);
Use this:
FileInputStream input = new FileInputStream(new File("C:/Users/admin/Desktop/Load_AcctCntr_Template.xlsx"));
Workbook wb = WorkbookFactory.create(input);
As mentioned in user3661357's answer, use
Workbook instead of XSSFWorkbook,
Sheet instead of XSSFSheet,
and so on.
Also read this:
Getting Exception(org.apache.poi.openxml4j.exception - no content type [M1.13]) when reading xlsx file using Apache POI?
Hint: use ALT+SHIFT+I in NetBeans to load the necessary packages.
A working example
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
public class POITest {
public static void test() {
try {
FileInputStream input = new FileInputStream(new File("C:/Users/kingslayer/Desktop/test/a.xlsx"));
Workbook wb = WorkbookFactory.create(input);
Sheet s = wb.getSheetAt(0);
Iterator<Row> rows = s.rowIterator();
while (rows.hasNext()) {
Row row = rows.next();
Iterator cells = row.cellIterator();
while (cells.hasNext()) {
Cell cell = (Cell) cells.next();
if (cell.getCellType() == Cell.CELL_TYPE_STRING) {
System.out.print(cell.getStringCellValue() + "\t");
} else if (cell.getCellType() == Cell.CELL_TYPE_NUMERIC) {
System.out.print(cell.getNumericCellValue() + "\t");
} else if (cell.getCellType() == Cell.CELL_TYPE_BLANK) {
System.out.print("BLANK ");
} else {
System.out.print("Unknown cell type");
}
}
input.close();
}
} catch (IOException | InvalidFormatException ex) {
Logger.getLogger(POITest.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args) {
test();
}
}
All of the following libraries must be on the project's classpath:
commons-codec-1.5.jar
commons-logging-1.1.jar
dom4j-1.6.1.jar
junit-3.8.1.jar
log4j-1.2.13.jar
poi-3.9-20121203.jar
poi-excelant-3.9-20121203.jar
poi-ooxml-3.9-20121203.jar
poi-ooxml-schemas-3.9-20121203.jar
poi-scratchpad-3.9-20121203.jar
stax-api-1.0.1.jar
xmlbeans-2.3.0.jar
1) Get rid of POIFSFileSystem fs = new POIFSFileSystem(input); as you are not using it.
2) input.close(); is called after the first iteration of the row loop; it belongs after the loop.
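Coming back to the HashMap part of the question: the following is only a sketch of how the "HEADER" and "DATA" entries could be assembled once the cell text is available. It deliberately uses a plain String[][] grid instead of POI types so that it runs on its own; in a POI version each entry would come from a Cell (for example through DataFormatter.formatCellValue). The class name SheetToMap and the method toHeaderAndData are hypothetical, and skipping fully blank rows is an assumption about what the original code intended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SheetToMap {

    // grid[r][c] holds the text of row r, column c; row 0 is treated as the header row.
    // Returns a map with "HEADER" -> String[] and "DATA" -> List<Object[]>, like the question's method.
    static Map<String, Object> toHeaderAndData(String[][] grid) {
        Map<String, Object> result = new HashMap<>();
        List<Object[]> data = new ArrayList<>();
        if (grid.length == 0) {
            result.put("HEADER", new String[0]);
            result.put("DATA", data);
            return result;
        }
        String[] header = grid[0];
        result.put("HEADER", header);

        for (int r = 1; r < grid.length; r++) {
            Object[] record = new Object[header.length];
            boolean hasValue = false;
            for (int c = 0; c < header.length; c++) {
                // Rows shorter than the header are padded with empty strings.
                boolean present = (c < grid[r].length) && (grid[r][c] != null);
                String text = present ? grid[r][c].trim() : "";
                record[c] = text;
                if (!text.isEmpty()) {
                    hasValue = true;
                }
            }
            if (hasValue) {
                data.add(record); // rows that are entirely blank are skipped
            }
        }
        result.put("DATA", data);
        return result;
    }
}
```

The map only carries the data back to the caller; whether the website displays it still depends on the code that consumes "HEADER" and "DATA".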

Convert .csv to .xls in Java

Does anyone here know of any quick, clean way to convert csv files to xls or xlsx files in java?
I have something to manage csv files already in place and I need the extra compatibility for other programs.
Sample code in addition to package names is always well appreciated.
Many thanks,
Justian
Here's my code thus far. I need to remove the returns ("\n") from the lines. Some of my cells contain multiple lines of information (a list), so I can use "\n" in csv to indicate multiple lines within a cell, but xls treats these as if I mean to put them on a new line.
The code is modified from the internet and a little messy at the moment. You might notice some deprecated methods, as it was written in 2004, and be sure to ignore the terrible return statements. I'm just using S.o.p at the moment for testing and I'll clean that up later.
package jab.jm.io;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
public class FileConverter {
public static String ConvertCSVToXLS(String file) throws IOException {
if (file.indexOf(".csv") < 0)
return "Error converting file: .csv file not given.";
String name = FileManager.getFileNameFromPath(file, false);
ArrayList<ArrayList<String>> arList = new ArrayList<ArrayList<String>>();
ArrayList<String> al = null;
String thisLine;
DataInputStream myInput = new DataInputStream(new FileInputStream(file));
while ((thisLine = myInput.readLine()) != null) {
al = new ArrayList<String>();
String strar[] = thisLine.split(",");
for (int j = 0; j < strar.length; j++) {
// My Attempt (BELOW)
String edit = strar[j].replace('\n', ' ');
al.add(edit);
}
arList.add(al);
System.out.println();
}
try {
HSSFWorkbook hwb = new HSSFWorkbook();
HSSFSheet sheet = hwb.createSheet("new sheet");
for (int k = 0; k < arList.size(); k++) {
ArrayList<String> ardata = (ArrayList<String>) arList.get(k);
HSSFRow row = sheet.createRow((short) 0 + k);
for (int p = 0; p < ardata.size(); p++) {
System.out.print(ardata.get(p));
HSSFCell cell = row.createCell((short) p);
cell.setCellValue(ardata.get(p).toString());
}
}
FileOutputStream fileOut = new FileOutputStream(
FileManager.getCleanPath() + "/converted files/" + name
+ ".xls");
hwb.write(fileOut);
fileOut.close();
System.out.println(name + ".xls has been generated");
} catch (Exception ex) {
}
return "";
}
}
Don't know if you know this already, but:
Excel (if that's your real target) is easily able to read .csv files directly, so any conversion you'd do would only be a courtesy to your less "gifted" users.
CSV is a lowest-common-denominator format. It's unlikely for any converter to add information to that found in a .csv file that will make it more useful. In other words, CSV is a "dumb" format and converting it to .xls will (probably) increase file size but not make the format any smarter.
Curtis' suggestion of POI is the first thing that would come to my mind too.
If you're doing this conversion on a Windows machine, another alternative could be Jacob, a Java-COM bridge that would allow you to effectively remote control Excel from a Java program so as to do things like open a file and save in a different format, perhaps even applying some formatting changes or such.
Finally, I've also had some success doing SQL INSERTs (via JDBC) into an Excel worksheet accessed via the JDBC-ODBC bridge. i.e. ODBC can make an Excel file look like a database. It's not very flexible though, you can't ask the DB to create arbitrarily named .XLS files.
EDIT:
It looks to me like readLine() is already not giving you whole lines. How is it to know that carriage return is not a line terminator? You should be able to verify this with debug print statements right after the readLine().
If this is indeed so, it would suck because the way forward would be for you to
either recognize incomplete lines and paste them together after the fact,
or write your own substitute for readLine(). A simple approach would be to read character by character, replacing CRs within a CSV string and accumulating text in a StringBuilder until you feel you have a complete line.
Both alternatives are work you probably weren't looking forward to.
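A rough sketch of the second alternative, reading character by character and accumulating text in a StringBuilder, is shown here. It assumes that multi-line cells are wrapped in double quotes, which is the usual CSV convention but is not confirmed by the question, and it replaces embedded line breaks with a space as the asker's own attempt does. The class name CsvRecordReader and the method readRecord are invented for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class CsvRecordReader {

    // Reads one CSV record from the reader and returns its fields, or null at end of input.
    // Line breaks inside double quotes belong to the field and are replaced by a single space.
    // A doubled quote ("") inside a quoted field produces one literal quote character.
    static List<String> readRecord(Reader in) throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean quoteJustClosed = false;
        boolean sawAnything = false;
        int ch;
        while ((ch = in.read()) != -1) {
            sawAnything = true;
            char c = (char) ch;
            if (inQuotes) {
                if (c == '"') {
                    inQuotes = false;
                    quoteJustClosed = true;
                } else if (c == '\n') {
                    field.append(' ');      // embedded line break becomes a space
                } else if (c != '\r') {
                    field.append(c);
                }
                continue;
            }
            if (c == '"') {
                if (quoteJustClosed) {
                    field.append('"');      // "" inside quotes is an escaped quote
                }
                inQuotes = true;
                quoteJustClosed = false;
                continue;
            }
            quoteJustClosed = false;
            if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                fields.add(field.toString());
                return fields;              // an unquoted line break ends the record
            } else if (c != '\r') {
                field.append(c);
            }
        }
        if (!sawAnything) {
            return null;
        }
        fields.add(field.toString());
        return fields;
    }
}
```

Wrapping the FileInputStream in a BufferedReader (via InputStreamReader) and calling readRecord in a loop until it returns null would replace both the readLine() call and the split(",") in the question's code.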
If you want to read or write XLS or XLSX files in Java, Apache POI is a good bet: http://poi.apache.org/
Copy and paste the program below. I ran it and it works fine. Let me know if you have any concerns about it. (You need the Apache POI JAR to run this program.)
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
public class CSVToExcelConverter {
public static void main(String args[]) throws IOException
{
ArrayList arList=null;
ArrayList al=null;
String fName = "test.csv";
String thisLine;
int count=0;
FileInputStream fis = new FileInputStream(fName);
DataInputStream myInput = new DataInputStream(fis);
int i=0;
arList = new ArrayList();
while ((thisLine = myInput.readLine()) != null)
{
al = new ArrayList();
String strar[] = thisLine.split(",");
for(int j=0;j<strar.length;j++)
{
al.add(strar[j]);
}
arList.add(al);
System.out.println();
i++;
}
try
{
HSSFWorkbook hwb = new HSSFWorkbook();
HSSFSheet sheet = hwb.createSheet("new sheet");
for(int k=0;k<arList.size();k++)
{
ArrayList ardata = (ArrayList)arList.get(k);
HSSFRow row = sheet.createRow((short) 0+k);
for(int p=0;p<ardata.size();p++)
{
HSSFCell cell = row.createCell((short) p);
String data = ardata.get(p).toString();
if(data.startsWith("=")){
cell.setCellType(Cell.CELL_TYPE_STRING);
data=data.replaceAll("\"", "");
data=data.replaceAll("=", "");
cell.setCellValue(data);
}else if(data.startsWith("\"")){
data=data.replaceAll("\"", "");
cell.setCellType(Cell.CELL_TYPE_STRING);
cell.setCellValue(data);
}else{
data=data.replaceAll("\"", "");
cell.setCellType(Cell.CELL_TYPE_NUMERIC);
cell.setCellValue(data);
}
//*/
// cell.setCellValue(ardata.get(p).toString());
}
System.out.println();
}
FileOutputStream fileOut = new FileOutputStream("test.xls");
hwb.write(fileOut);
fileOut.close();
System.out.println("Your excel file has been generated");
} catch ( Exception ex ) {
ex.printStackTrace();
} //main method ends
}
}
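One caveat about the program above: its numeric branch still passes a String to setCellValue, and as far as I know POI then stores the cell as text regardless of the earlier setCellType call. A possible way around this is to decide first whether a field is a plain decimal number. The helper below is a hypothetical sketch (CsvFieldType and asNumber are not part of POI), and it intentionally accepts only simple forms such as 42, -3.5 or +0.25.

```java
class CsvFieldType {

    // Returns the numeric value of a CSV field, or null when the field should stay text.
    // Only an optional sign, digits and an optional fractional part are accepted, so inputs
    // like "1d", "NaN" or "1e5", which Double.valueOf would parse, are treated as text here.
    static Double asNumber(String field) {
        if (field == null) {
            return null;
        }
        String s = field.trim();
        if (!s.matches("[-+]?\\d+(\\.\\d+)?")) {
            return null;
        }
        return Double.valueOf(s);
    }
}
```

In the converter, a non-null result could be written with cell.setCellValue(number.doubleValue()), and everything else with cell.setCellValue(data).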
The tools in Excel are not adequate for what the OP wants to do. He's on the right track there. Excel cannot import multiple CSV files into different worksheets in the same file, which is why you'd want to do it in code. My suggestion is to use OpenCSV to read the CSV, as it can automatically correct for newlines in data and missing columns, and it's free and open source. It's actually very, very robust and can handle all sorts of different non-standard CSV files.
You wrote:
I have something to manage csv files
already in place and I need the extra
compatibility for other programs.
What are those other programs? Are they required to access your data through Excel files, or could they work with a JDBC or ODBC connection to a database? Using a database as the central location, you could extract the data into CSV files or other formats as needed.
I created a small software called csv2xls. It needs Java.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException

I have to read an xls file in Java. I used poi-3.6 to read the xls file in Eclipse, but I am getting this error: "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException at ReadExcel2.main(ReadExcel2.java:38)".
I have added the following jars:
1)poi-3.6-20091214.jar
2)poi-contrib-3.6-20091214.jar
3)poi-examples-3.6-20091214.jar
4)poi-ooxml-3.6-20091214.jar
5)poi-ooxml-schemas-3.6-20091214.jar
6)poi-scratchpad-3.6-20091214.jar
Below is the code I am using:
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.ArrayList;
public class ReadExcel {
public static void main(String[] args) throws Exception {
//
// An excel file name. You can create a file name with a full path
// information.
//
String filename = "C:\\myExcel.xl";
//
// Create an ArrayList to store the data read from excel sheet.
//
List sheetData = new ArrayList();
FileInputStream fis = null;
try {
//
// Create a FileInputStream that will be use to read the excel file.
//
fis = new FileInputStream(filename);
//
// Create an excel workbook from the file system.
//
// HSSFWorkbook workbook = new HSSFWorkbook(fis);
Workbook workbook = new XSSFWorkbook(fis);
//
// Get the first sheet on the workbook.
//
Sheet sheet = workbook.getSheetAt(0);
//
// When we have a sheet object in hand we can iterator on each
// sheet's rows and on each row's cells. We store the data read
// on an ArrayList so that we can printed the content of the excel
// to the console.
//
Iterator rows = sheet.rowIterator();
while (rows.hasNext()) {
Row row = (XSSFRow) rows.next();
Iterator cells = row.cellIterator();
List data = new ArrayList();
while (cells.hasNext()) {
Cell cell = (XSSFCell) cells.next();
data.add(cell);
}
sheetData.add(data);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
if (fis != null) {
fis.close();
}
}
showExelData(sheetData);
}
private static void showExelData(List sheetData) {
//
// Iterates the data and print it out to the console.
//
for (int i = 0; i < sheetData.size(); i++) {
List list = (List) sheetData.get(i);
for (int j = 0; j < list.size(); j++) {
Cell cell = (XSSFCell) list.get(j);
System.out.print(cell.getRichStringCellValue().getString());
if (j < list.size() - 1) {
System.out.print(", ");
}
}
System.out.println("");
}
}
}
Please help.
thanks in anticipation,
Regards,
Dheeraj!
You need xmlbeans on your classpath.
NoClassDefFoundError means that:
The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found.
So the next time you get an exception like this, it usually means that a third-party library requires another third-party library that is missing from the classpath. Use Google (or any other means) to find out which library that is.
Furthermore, most libraries state clearly in their documentation and/or distributions what their dependencies are.
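A quick way to confirm which dependency is missing is to probe the classpath before touching the workbook. The snippet below is a small sketch using only the JDK; the class name ClasspathCheck and the method isPresent are made up for this example, and the two class names in main are taken from the error message and the imports in the question.

```java
class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    // The class is not initialized, so no static initializers run during the check.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.xmlbeans.XmlException",           // named in the error message
            "org.apache.poi.xssf.usermodel.XSSFWorkbook"  // imported by the question's code
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it with the same classpath as the failing program shows at a glance whether the xmlbeans jar is actually visible to the JVM.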
JarFinder suggests XMLBeans.jar
I had the same error on Apache POI 3.16. To fix it, I added /ooxml-lib/xmlbeans-2.6.0 from the Apache POI distribution and, for the next exception regarding collections, /lib/commons-collections4-4.1.jar.
I had a similar situation in a Linux environment; basically my lib path was off by one level.
