Convert .csv to .xls in Java

Convert .csv to .xls in Java - java

Does anyone here know of any quick, clean way to convert csv files to xls or xlsx files in java?
I have something to manage csv files already in place and I need the extra compatibility for other programs.
Sample code in addition to package names is always well appreciated.
Many thanks,
Justian
Here's my code thus far. I need to remove the returns ("\n") from the lines. Some of my cells contain multiple lines of information (a list), so I can use "\n" in csv to indicate multiple lines within a cell, but xls treats these as if I mean to put them on a new line.
The code is modified from the internet and a little messy at the moment. You might notice some deprecated methods, as it was written in 2004, and be sure to ignore the terrible return statements. I'm just using S.o.p at the moment for testing and I'll clean that up later.
package jab.jm.io;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
public class FileConverter {
public static String ConvertCSVToXLS(String file) throws IOException {
if (file.indexOf(".csv") < 0)
return "Error converting file: .csv file not given.";
String name = FileManager.getFileNameFromPath(file, false);
ArrayList<ArrayList<String>> arList = new ArrayList<ArrayList<String>>();
ArrayList<String> al = null;
String thisLine;
DataInputStream myInput = new DataInputStream(new FileInputStream(file));
while ((thisLine = myInput.readLine()) != null) {
al = new ArrayList<String>();
String strar[] = thisLine.split(",");
for (int j = 0; j < strar.length; j++) {
// My Attempt (BELOW)
String edit = strar[j].replace('\n', ' ');
al.add(edit);
}
arList.add(al);
System.out.println();
}
try {
HSSFWorkbook hwb = new HSSFWorkbook();
HSSFSheet sheet = hwb.createSheet("new sheet");
for (int k = 0; k < arList.size(); k++) {
ArrayList<String> ardata = (ArrayList<String>) arList.get(k);
HSSFRow row = sheet.createRow((short) 0 + k);
for (int p = 0; p < ardata.size(); p++) {
System.out.print(ardata.get(p));
HSSFCell cell = row.createCell((short) p);
cell.setCellValue(ardata.get(p).toString());
}
}
FileOutputStream fileOut = new FileOutputStream(
FileManager.getCleanPath() + "/converted files/" + name
+ ".xls");
hwb.write(fileOut);
fileOut.close();
System.out.println(name + ".xls has been generated");
} catch (Exception ex) {
}
return "";
}
}

Don't know if you know this already, but:
Excel (if that's your real target) is easily able to read .csv files directly, so any conversion you'd do would only be a courtesy to your less "gifted" users.
CSV is a lowest-common-denominator format. It's unlikely for any converter to add information to that found in a .csv file that will make it more useful. In other words, CSV is a "dumb" format and converting it to .xls will (probably) increase file size but not make the format any smarter.
Curtis' suggestion of POI is the first thing that would come to my mind too.
If you're doing this conversion on a Windows machine, another alternative could be Jacob, a Java-COM bridge that would allow you to effectively remote control Excel from a Java program so as to do things like open a file and save in a different format, perhaps even applying some formatting changes or such.
Finally, I've also had some success doing SQL INSERTs (via JDBC) into an Excel worksheet accessed via the JDBC-ODBC bridge. i.e. ODBC can make an Excel file look like a database. It's not very flexible though, you can't ask the DB to create arbitrarily named .XLS files.
EDIT:
It looks to me like readLine() is already not giving you whole lines. How is it to know that carriage return is not a line terminator? You should be able to verify this with debug print statements right after the readLine().
If this is indeed so, it would suck because the way forward would be for you to
either recognize incomplete lines and paste them together after the fact,
or write your own substitute for readLine(). A simple approach would be to read character by character, replacing CRs within a CSV string and accumulating text in a StringBuilder until you feel you have a complete line.
Both alternatives are work you probably weren't looking forward to.

If you want to read or write XLS or XLSX files in Java, Apache POI is a good bet: http://poi.apache.org/

Copy paste the below program,I ran the program and it is working fine,Let me know if you have any concerns on this program.(You need Apache POI Jar to run this program)
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
public class CSVToExcelConverter {
public static void main(String args[]) throws IOException
{
ArrayList arList=null;
ArrayList al=null;
String fName = "test.csv";
String thisLine;
int count=0;
FileInputStream fis = new FileInputStream(fName);
DataInputStream myInput = new DataInputStream(fis);
int i=0;
arList = new ArrayList();
while ((thisLine = myInput.readLine()) != null)
{
al = new ArrayList();
String strar[] = thisLine.split(",");
for(int j=0;j<strar.length;j++)
{
al.add(strar[j]);
}
arList.add(al);
System.out.println();
i++;
}
try
{
HSSFWorkbook hwb = new HSSFWorkbook();
HSSFSheet sheet = hwb.createSheet("new sheet");
for(int k=0;k<arList.size();k++)
{
ArrayList ardata = (ArrayList)arList.get(k);
HSSFRow row = sheet.createRow((short) 0+k);
for(int p=0;p<ardata.size();p++)
{
HSSFCell cell = row.createCell((short) p);
String data = ardata.get(p).toString();
if(data.startsWith("=")){
cell.setCellType(Cell.CELL_TYPE_STRING);
data=data.replaceAll("\"", "");
data=data.replaceAll("=", "");
cell.setCellValue(data);
}else if(data.startsWith("\"")){
data=data.replaceAll("\"", "");
cell.setCellType(Cell.CELL_TYPE_STRING);
cell.setCellValue(data);
}else{
data=data.replaceAll("\"", "");
cell.setCellType(Cell.CELL_TYPE_NUMERIC);
cell.setCellValue(data);
}
//*/
// cell.setCellValue(ardata.get(p).toString());
}
System.out.println();
}
FileOutputStream fileOut = new FileOutputStream("test.xls");
hwb.write(fileOut);
fileOut.close();
System.out.println("Your excel file has been generated");
} catch ( Exception ex ) {
ex.printStackTrace();
} //main method ends
}
}

The tools in Excel are not adequate for what the OP wants to do. He's on the right track there. Excel cannot import multiple CSV files into different worksheets in the same file, which is why you'd want to do it in code. My suggestion is to use OpenCSV to read the CSV, as it can automatically correct for newlines in data and missing columns, and it's free and open source. It's actually very, very robust and can handle all sorts of different non-standard CSV files.

You wrote:
I have something to manage csv files
already in place and I need the extra
compatibility for other programs.
What are those other programs? Are they required to access your data through Excel files, or could they work with an JDBC or ODBC connection to a database? Using a database as the central location, you could extract the data into CSV files or other formats as needed.

I created a small software called csv2xls. It needs Java.

Related

Unable to open Excel file after excecuting my java apache poi program and i am using file output stream

I am using Apache poi to extract Mysql data to an Excel file. The code is running correctly but when I am trying to open the excel file it is showing error.
package com.telkomsel.excel;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.telkomsel.configuirator.Configurator;
import com.telkomsel.dbconnection.DBConnection;
import com.telkomsel.service.TelkomselEntities;
public class TelkomselExcel {
DBConnection db = new DBConnection();
static Configurator configurator = null;
Connection conn = null;
static Statement statement = null;
static ResultSet resultSet = null;
public static HashMap<Integer, TelkomselEntities> getTelkomselData(Statement statement) {
configurator = new Configurator();
String Query = configurator.getProperty("sql_query1");
HashMap<Integer, TelkomselEntities> all = null;
TelkomselEntities smsModel = null;
try {
all = new HashMap<Integer, TelkomselEntities>();
resultSet = statement.executeQuery(Query);
while (resultSet.next()) {
int hour = resultSet.getInt("hour(timestamp)");
String count = resultSet.getString("count(1)");
smsModel = new TelkomselEntities(hour, count, count, count);
all.put(hour, smsModel);
}
smsModel = new TelkomselEntities();
FileInputStream fis = new FileInputStream(new File("Tracker.xlsx"));
XSSFWorkbook workbook = new XSSFWorkbook(fis);
XSSFSheet worksheet = workbook.getSheetAt(0);
XSSFRow row = null;
XSSFCell cell;
int i = 1;
for (Integer l : all.keySet()) {
TelkomselEntities us = all.get(l);
row = worksheet.createRow(i);
cell = row.createCell(2);
cell.setCellValue(us.getHour());
cell = row.createCell(3);
cell.setCellValue(us.getCharge_Count());
i++;
}
fis.close();
FileOutputStream output_file = new FileOutputStream(new File("Tracker.xlsx"),true);
System.out.println("SUCCESS");
workbook.write(output_file);
workbook.close();
output_file.flush();
output_file.close();
} catch (Exception e) {
System.out.println(e);
}
return all;
}
}
I think file output stream is creating problem as it converts data into byte codes. i tried every thing but doesn't work. my excel file is not working

As you supposed, the problem hides inside the line:
FileOutputStream output_file = new FileOutputStream(new File("Tracker.xlsx"),true);
When creating a new XSSFWorkbook Java Object from an existing excel (which you want to update), that XSSFWorkbook is initially created based on your excel file content, then it is totally independent from it.The proof of this is that all changes to the XSSFWorkbook Java Object ARE NOT going to affect the original excel file at all. Apache Poi works that way!
This is the reason why once you're done editing your XSSFWorkbook you have to save it as a new excel file (using a FileOutputStream) overriding the original one (in a sense, you're now updating your excel file with all your changes).
But as the docs says, you're telling FileOutputStream not to override the original excel file with the new and updated one but to append the second to the first one, upsi dupsi! You're creating a single file which contains both all the bytes of the original old file and all the bytes of the new updated one!
To solve the problem, just use instead:
FileOutputStream output_file = new FileOutputStream(new File("Tracker.xlsx"),false);
or
FileOutputStream output_file = new FileOutputStream(new File("Tracker.xlsx"));
and you're done!
Edit: learn Apache Poi before using Apache Poi
It seems that you're using FileOutputStream wrong because you don't know how Apache Poi works and how to use it. You might want to study a little bit about it before using it, the web is full of examples and tutorials! Here they are some examples provided by Apache Poi itself, you might want to have a look at them.
As I said before, the XSSFWorkbook is initialized with all the content of your original excel file. So if you start filling your XSSFSheet from the second line (that's what you're actually doing with your code) you are literally asking to your XSSFWorkbook to override existing data with new one.
You have to improve your code, searching for already existing data in rows and cells and not overriding it if you don't want to.
Rows and cells of each XSSFSheet of your XSSFWorkbook are numbered using 0-based indexes (that's the reason why your code, which starts filling rows from index 1, is filling rows starting from the second one).
With the method XSSFSheet#getRow(int rownum) you can retreive any row from the current XSSFSheet indicating its 0-based index. If this method returns null, then the row you're asking for has never been used and you have to create it using the method XSSFSheet#createRow(int rownum). If it doesn't, then the row you're asking for has already been used and contains some data in some of its cells.
With the method XSSFRow#getCell(int cellnum) you can retrieve any cell from the current XSSFRow indicating its 0-based index. If this method returns null, then the cell you're asking for has never been used and you have to create it using the method XSSFRow#createCell(int cellnum, CellType celltype). If it doesn't, then the cell you're asking for has already been used and contains some data in it.
You can retrieve the CellType of an existing XSSFCell with the method XSSFCell#getCellType().
You can retreive the content of an existing XSSFCell (on the basis of its CellType) using such methods as XSSFCell#getStringCellValue(), XSSFCell#getNumericCellValue() or XSSFCell#getBooleanCellValue().
Other useful methods are XSSFSheet#getLastRowNum() and XSSFRow#getLastCellNum(). The first one returns the index of the last already used row inside your sheet, the second one returns the index of the first not used cell inside your row.
Here it is an example for you (filling 42 rows of your sheet after the last existing one):
public static void main(String[] args) throws EncryptedDocumentException, FileNotFoundException, IOException {
// Step 1: load your excel file as a Workbook
String excelFilePath = "D:\\Desktop\\textExcel.xlsx";
XSSFWorkbook workbook = (XSSFWorkbook) WorkbookFactory.create(new FileInputStream(excelFilePath));
// Step 2: modify your Workbook as you prefer
XSSFSheet sheet = workbook.getSheetAt(0);
int firstUnusedRowIndex = sheet.getLastRowNum() + 1;
for (int rowIndex = firstUnusedRowIndex ; rowIndex < firstUnusedRowIndex + 42 ; rowIndex++) {
sheet.createRow(rowIndex).createCell(0, CellType.STRING).setCellValue("New Row n°" + (rowIndex - firstUnusedRowIndex + 1));
}
// Step 3: update the original excel file
FileOutputStream outputStream = new FileOutputStream(excelFilePath);
workbook.write(outputStream);
workbook.close();
outputStream.close();
}

Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)

The problem:
I'm just trying to open it .xls file using the Apache-poi 4.1.0 library and it gives the same error as 4 years ago in a similar question.
I already tried
to put version 3.12-3.16.
3.13 as well
All versions can open blank .xls and filled by myself but not this one.
This document is generated automatically and I need to make a program that accepts it.
I already made a .Net standart library C# which is work, I tried to use xamarin android it's a horror, the app weighs 50 mb vs 3 mb due to various terrible SDK link errors, but that's a different story. So I decided to do it on Kotlin.
Code is from the documentation
You can check file on git
val inputStream = FileInputStream("./test.xls")
val wb = HSSFWorkbook(inputStream)
I expect no errors while opening xls.
Actual output is
Exception in thread "main" java.lang.RuntimeException: Unexpected record type (org.apache.poi.hssf.record.HyperlinkRecord)
at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:97)
at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:183)
at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:122)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:354)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:400)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:381)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt:19)
at ru.plumber71.toolbox.ExcelParcerKt.main(ExcelParcer.kt)
The document will not be modified in any way. If there any other libraries to just read the dataset or strings from the .xls file will be OK.

After some investigation I found the problem with your test.xls file.
According the file format specifications, all HyperlinkRecords should be together in the Hyperlink Table. It is contained in the Sheet Substream following the cell records. In your case the HyperlinkRecords are between other records (between NumberRecords and LabelSSTRecords in that case). So I suspect it was not Excel what had created that test.xls file.
Excelmight be tolerant enough to open that file nevertheless. But you cannot expect that apache poi also tries to tolerate all possible violations in file format. If you open the file using Excel and then re-save it, apache poi is able creating the Workbookafter that.
Apache poi is not able repairing this as Excel can do. But one could read the POIFSFileSystem a low level way and filtering out the HyperlinkRecords that are between other records. That way one could read the content using apache poi, of course except the hyperlinks.
Example:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.NameRecord;
import org.apache.poi.hssf.record.NameCommentRecord;
import org.apache.poi.hssf.record.HyperlinkRecord;
import org.apache.poi.hssf.record.RecordFactoryInputStream;
import org.apache.poi.hssf.record.RecordFactory;
import org.apache.poi.hssf.model.RecordStream;
import org.apache.poi.hssf.model.InternalWorkbook;
import org.apache.poi.hssf.model.InternalSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFName;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.util.CellReference;
import java.util.List;
import java.util.ArrayList;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Constructor;
class ExcelOpenHSSF {
public static void main(String[] args) throws Exception {
String fileName = "test(2).xls";
try (InputStream is = new FileInputStream(fileName);
POIFSFileSystem fileSystem = new POIFSFileSystem(is)) {
//find workbook directory entry
DirectoryNode directory = fileSystem.getRoot();
String workbookName = "";
for(String wbName : InternalWorkbook.WORKBOOK_DIR_ENTRY_NAMES) {
if(directory.hasEntry(wbName)) {
workbookName = wbName;
break;
}
}
InputStream stream = directory.createDocumentInputStream(workbookName);
//loop over all records and manipulate if needed
List<Record> records = new ArrayList<Record>();
RecordFactoryInputStream recStream = new RecordFactoryInputStream(stream, true);
//here we filter out the HyperlinkRecords that are between other records (NumberRecords and LabelSSTRecords in that case)
//System.out.println prints the problematic records
Record record1 = null;
Record record2 = null;
while ((record1 = recStream.nextRecord()) != null) {
record2 = recStream.nextRecord();
if (!(record1 instanceof HyperlinkRecord) && (record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record1);
} else if ((record1 instanceof HyperlinkRecord) && !(record2 instanceof HyperlinkRecord)) {
System.out.println(record1);
System.out.println(record2);
records.add(record2);
} else {
records.add(record1);
if (record2 != null) records.add(record2);
}
}
//now create the HSSFWorkbook
//see https://svn.apache.org/viewvc/poi/tags/REL_4_1_0/src/java/org/apache/poi/hssf/usermodel/HSSFWorkbook.java?view=markup#l322
InternalWorkbook internalWorkbook = InternalWorkbook.createWorkbook(records);
HSSFWorkbook wb = HSSFWorkbook.create(internalWorkbook);
int recOffset = internalWorkbook.getNumRecords();
Method convertLabelRecords = HSSFWorkbook.class.getDeclaredMethod("convertLabelRecords", List.class, int.class);
convertLabelRecords.setAccessible(true);
convertLabelRecords.invoke(wb, records, recOffset);
RecordStream rs = new RecordStream(records, recOffset);
while (rs.hasNext()) {
InternalSheet internelSheet = InternalSheet.createSheet(rs);
Constructor constructor = HSSFSheet.class.getDeclaredConstructor(HSSFWorkbook.class, InternalSheet.class);
constructor.setAccessible(true);
HSSFSheet hssfSheet = (HSSFSheet)constructor.newInstance(wb, internelSheet);
Field _sheets = HSSFWorkbook.class.getDeclaredField("_sheets");
_sheets.setAccessible(true);
#SuppressWarnings("unchecked")
List<HSSFSheet> sheets = (ArrayList<HSSFSheet>)_sheets.get(wb);
sheets.add(hssfSheet);
}
for (int i = 0 ; i < internalWorkbook.getNumNames() ; ++i){
NameRecord nameRecord = internalWorkbook.getNameRecord(i);
Constructor constructor = HSSFName.class.getDeclaredConstructor(HSSFWorkbook.class, NameRecord.class, NameCommentRecord.class);
constructor.setAccessible(true);
HSSFName name = (HSSFName)constructor.newInstance(wb, nameRecord, internalWorkbook.getNameCommentRecord(nameRecord));
Field _names = HSSFWorkbook.class.getDeclaredField("names");
_names.setAccessible(true);
#SuppressWarnings("unchecked")
List<HSSFName> names = (ArrayList<HSSFName>)_names.get(wb);
names.add(name);
}
//now the workbook is created properly
System.out.println(wb);
/*
//getting the data
DataFormatter formatter = new DataFormatter();
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
for (Cell cell : row) {
CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
System.out.print(cellRef.formatAsString());
System.out.print(" - ");
String text = formatter.formatCellValue(cell);
System.out.println(text);
}
}
*/
}
}
}

I was able to open a file of this "corrupted" type by using JExcel API
But using poi.apache.org also opens the file if manually resave it using excel application. (It may not be suitable for someone)
Sorry that it was asking strange questions. Thank you all and hope that someone may find useful.
val inputStream = FileInputStream("./testCorrupted.xls")
val workbook = Workbook.getWorkbook(inputStream)
val sheet = workbook.getSheet(0)
val cell1 = sheet.getCell(0, 0)
print(cell1.contents + ":")

How to modify a large Excel file when memory is an issue

As the title states, I have a large Excel file (>200 sheets) that I need to add data to. I do not want to create new cells, I only want to modify existing ones.
I tried using Apache Poi but my application runs out of memory even with Xms and Xmx set to 8g. The only option for low-memory writing is seemingly with SXSSF. The problem is that it only works for creating new cells and does not allow modifying existing ones. I also tried using the event API in order to process the sheet's XML, but it only seems to work for read operations. I've been trying to use an XMLEventWriter but I can't find a way to access the sheets' XML data which works for writing. Is there a way to access an Excel file's XML data other than with XSSFReader?

As told in comments above there is no one fits all solution using pure XML reading and writing the Office Open XML spreadsheets. Each Excel workbook needs it's own code dependent on it's structure and on what content shall be changed.
This is because apache poi's high level classes provides a meta level to avoid this. But this needs memory to work. And for very big workbooks it needs much memory. To avoid memory consumption through manipulating the XML directly this meta level is not usable. So one must know the XML structure of a worksheet and the meaning of the XML elements used.
So if we have a Excel workbook having a first sheet having strings in column A and numbers in column B, then we could changing every fifth row using StAX for manipulating the XML directly using following code:
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.namespace.QName;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.regex.Pattern;
class StaxReadAndChangeTest {
public static void main(String[] args) throws Exception {
File file = new File("ReadAndWriteTest.xlsx");
OPCPackage opcpackage = OPCPackage.open(file);
//since there are strings in the sheet data, we need the SharedStringsTable
PackagePart sharedstringstablepart = opcpackage.getPartsByName(Pattern.compile("/xl/sharedStrings.xml")).get(0);
SharedStringsTable sharedstringstable = new SharedStringsTable();
sharedstringstable.readFrom(sharedstringstablepart.getInputStream());
//get first worksheet
PackagePart sheetpart = opcpackage.getPartsByName(Pattern.compile("/xl/worksheets/sheet1.xml")).get(0);
//get XML reader and writer
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(sheetpart.getInputStream());
XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(sheetpart.getOutputStream());
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
int rowsCount = 0;
int colsCount = 0;
boolean cellAfound = false;
boolean cellBfound = false;
while(reader.hasNext()){ //loop over all XML in sheet1.xml
XMLEvent event = (XMLEvent)reader.next();
if(event.isStartElement()) {
StartElement startElement = (StartElement)event;
QName startElementName = startElement.getName();
if(startElementName.getLocalPart().equalsIgnoreCase("row")) { //start element of row
rowsCount++;
colsCount = 0;
} else if (startElementName.getLocalPart().equalsIgnoreCase("c")) { //start element of cell
colsCount++;
cellAfound = false;
cellBfound = false;
if (rowsCount % 5 == 0) { // every 5th row
if (colsCount == 1) { // cell A
cellAfound = true;
} else if (colsCount == 2) { // cell B
cellBfound = true;
}
}
} else if (startElementName.getLocalPart().equalsIgnoreCase("v")) { //start element of value
if (cellAfound) {
// create new rich text content for cell A
CTRst ctstr = CTRst.Factory.newInstance();
ctstr.setT("changed String Value A" + (rowsCount));
//int sRef = sharedstringstable.addEntry(ctstr);
int sRef = sharedstringstable.addSharedStringItem(new XSSFRichTextString(ctstr));
// set the new characters for A's value in the XML
if (reader.hasNext()) {
writer.add(event); // write the old event
event = (XMLEvent)reader.next(); // get next event - should be characters
if (event.isCharacters()) {
Characters value = eventFactory.createCharacters(Integer.toString(sRef));
event = value;
}
}
} else if (cellBfound) {
// set the new characters for B's value in the XML
if (reader.hasNext()) {
writer.add(event); // write the old event
event = (XMLEvent)reader.next(); // get next event - should be characters
if(event.isCharacters()) {
double oldValue = Double.valueOf(((Characters)event).getData()); // old double value
Characters value = eventFactory.createCharacters(Double.toString(oldValue * rowsCount));
event = value;
}
}
}
}
}
writer.add(event); //by default write each read event
}
writer.flush();
//write the SharedStringsTable
OutputStream out = sharedstringstablepart.getOutputStream();
sharedstringstable.writeTo(out);
out.close();
opcpackage.close();
}
}
This will be much less memory consuming than apache poi's XSSF classes. But, as said, it only works exactly for this kind of Excel workbook having a first sheet having strings in column A and numbers in column B.

How to store multiple columns of a csv dataset in a single variable in java so that the variable can be used as input feature for ml model

I have done this in python. Here is my python code:
Here X is the input variable in which I stored all the
input columns of csv file and y is the target variable.
dataset=pandas.read_csv("newone.csv")
features = [0,1,4,5,6,7]
X =dataset.iloc[:,features]
y =dataset.iloc[:,2]
How can I do this in java?
Here is my java code in which I read the csv file but
I am able to store only one column value of the csv in a variable.
public static void main(String[] args) throws IOException {
BufferedReader reader = Files.newBufferedReader(Paths.get("C:/Users/N/Desktop/newone.csv"));
CSVParser csvParser = new CSVParser(reader,
CSVFormat.DEFAULT.withHeader("Enounter", "Relation", "Event", "Tag","Encounter_no", "Diagonosis", "User_Id", "Client_Id").withIgnoreHeaderCase().withTrim());
for (CSVRecord csvRecord : csvParser) {
encntr=csvRecord.get("Encounter");
}
}
----------

Consider using MyKong's code on how to read CSV files in Java. Something like the following snippet below (heavily borrowed from MyKong's better version):
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] valuesFromLine = line.split(cvsSplitBy);
String secondValue = valuesFromLine[1];
// do something with second value
}

This depends entirely on what the relationship between your columns is like. It is impossible to answer this question in a general manner as this changes from dataset to dataset and even from algorithm to algorithm, but here are a few approaches you might like to try:
Use Principal Component Analysis to identify if there any variables in your desired tuple of columns you can omit because they contribute very little to the row class variable.
Use Feature Hashing to reduce the dimensionality of your dataset by bundling together related properties (this does not work as a blanket solution - indeed, nothing in ML ever does. Try it before you commit to it).
If the columns you'd like to unite are numerical, you may want to think of an algorithm to join them in a unique way, or a way which makes sense. If they are categorical, a sparse one-hot bit vector may help you.

Download these jar https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml/3.15 and https://mvnrepository.com/artifact/org.apache.poi/poi/3.15
and add them to your build path
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class ExcelReader {
public static void main(String[] args) throws IOException
{
//specify your file path
FileInputStream file = new FileInputStream("D:\\test.xlsx");
XSSFWorkbook workbook = new XSSFWorkbook(file);
//to fetch sheet
XSSFSheet sheet = workbook.getSheetAt(0);
// for iterating through rows
for(int c=0;c<=sheet.getLastRowNum();c++)
{
// for iterating through columns
Row rows = sheet.getRow(c);
for(int b=0;b<=rows.getLastCellNum();b++)
{
Cell cells=rows.getCell(b);
//to read cell value as string
String comp=cells.getStringCellValue();
}
}
}
}

Creating and Editing Excel Documents (.csv) with Java

For this project I'm working on, I want to take multiple excel sheets and then merge them into one, manipulating the data as I please to make everything a little more readable.
What would be the best way to open files, read their contents, store that content, create a new file (.csv), then paste the information in the organization of my choosing?
I definitely need to stick to java, as this will be part of a pre-existing automated process and I don't want to have to change everything to another language.
Is there a useful package out there that I should know about?
Many thanks
Justian

I think any serious work in Excel should consider Joel's solution of letting Office do it for you on a Windows machine you call remotely if necessary. If, however, your needs are simple enough or you really need a pure Java solution, Apache's POI library does a good enough job.

As far as I know, csv is not excel-specific, but rather just a "comma-separated values"-file.
So this might help you.

Writing CSV files is usually very simple, for obvious reasons. You can write your own helper class to do it. The caveat is to ensure that you do not have your delimeter in any of the outputs.
Reading CSV is trickier. There isn't a standard library like there is in Python (a much better language, IMHO, for doing CSV processing), but if you search for it there are a lot of decent free implementations around.
The biggest question is the internal representation in your program: Depending on the size of your inputs and outputs, keeping everything in memory may be out of the question. Can you do everything in one pass? (I mean, read some, write some, etc.)
You may also want to use sparse representations rather than just represent all the spreadsheets in an array.

Maybe you should try this one:
Jxcell,it is a java spreadsheet component,and can read/write/edit all xls/xlsx/csv files.

Try this code
import java.util.*;
import java.util.Map.Entry;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbookFactory;
public class App {
public void convertExcelToCSV(Sheet sheet, String sheetName) {
StringBuffer data = new StringBuffer();
try {
FileOutputStream fos = new FileOutputStream("C:\\Users\\" + sheetName + ".csv");
Cell cell;
Row row;
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
row = rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
cell = cellIterator.next();
CellType type = cell.getCellTypeEnum();
if (type == CellType.BOOLEAN) {
data.append(cell.getBooleanCellValue() + ",");
} else if (type == CellType.NUMERIC) {
data.append(cell.getNumericCellValue() + ",");
} else if (type == CellType.STRING) {
data.append(cell.getStringCellValue() + ",");
} else if (type == CellType.BLANK) {
data.append("" + ",");
} else {
data.append(cell + ",");
}
}
data.append('\n');
}
fos.write(data.toString().getBytes());
fos.close();
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
public static void main(String [] args)
{
App app = new App();
String path = "C:\\Users\\myFile.xlsx";
InputStream inp = null;
try {
inp = new FileInputStream(path);
Workbook wb = WorkbookFactory.create(inp);
for(int i=0;i<wb.getNumberOfSheets();i++) {
System.out.println(wb.getSheetAt(i).getSheetName());
app.convertExcelToCSV(wb.getSheetAt(i),wb.getSheetAt(i).getSheetName());
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
finally {
try {
inp.close();
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.