How to convert a huge .csv file to excel using POI - java

I have Java code that converts CSV to xlsx. It works fine with small files. Now I have a CSV file with 2 lakh (200,000) records, and the conversion fails with an out-of-memory error.
I tried changing the workbook to SXSSFWorkbook and increasing the heap size to -Xms8G -Xmx10G. Even that did not work. I tried it on a UNIX box.
While searching, I found some code that uses BigGridDemo. Can anyone help me customize it to read a .csv file and then use its logic to write to xlsx, or suggest any other solution?
try { FileReader input = new FileReader(strFileToConvert);
BufferedReader bufIn = new BufferedReader(input);
if(strExcel_Typ.equals("-xls"))
{
wbAll = new HSSFWorkbook();
}
else
{
wbAll = new SXSSFWorkbook(100);
wbAll.setCompressTempFiles(true);
}
sheetAll = wbAll.createSheet();
rowAll = null;
cellAll = null;
shoRowNumAll = 0;
// do buffered reading from a file
while ((line = bufIn.readLine()) != null)
{
intCntr++;
//if there is any data in the line
if (line.length() > 0)
{
System.out.println(shoRowNumAll);
//create a new row on the spreadsheet
rowAll = sheetAll.createRow((int)shoRowNumAll);
shoRowNumAll++;
if (line.indexOf("\"", 0) > 0)
{
if (intCntr == 1)
{
//only issue the message the first time quotes are found
System.out.println("Double quotes found. Stripping double quotes from file");
}
line = line.replaceAll("\"", "");
}
//if its the first row and no delimiters found, there is a problem
if (line.indexOf(strDelim, 0) == -1 && intCntr == 1)
{
System.exit(1);
}
processLine(line);
((SXSSFSheet) sheetAll).flushRows(100);
}
}
bufIn.close();
//write the excel file
try
{ String file = strOutPutFile;
ExcelOutAll = new FileOutputStream(file);
wbAll.write(ExcelOutAll);
}
catch (IOException e)
{
System.err.println(e.toString());
}
ExcelOutAll.close();
processLine method:
processLine(String line)
{
//find the first next delimiter starting in position 0
intNxtComma = line.indexOf(strDelim, 0);
while (intCurPosInLine < line.trim().replaceAll(",","").length())
{
strCellContent = line.substring((intCurPosInLine), intNxtComma);
//create a new cell on the new row
cellAll = rowAll.createCell(intCellNum);
//set the font defaults
Font font_couriernew_10 = wbAll.createFont();
font_couriernew_10.setFontHeightInPoints((short)10);
font_couriernew_10.setFontName("Courier New");
CellStyle cellStyle = wbAll.createCellStyle();
//if its the first row, center the text
if (shoRowNumAll == 1)
{
cellStyle.setAlignment(CellStyle.ALIGN_CENTER);
font_couriernew_10.setBoldweight(XSSFFont.BOLDWEIGHT_BOLD);
}
cellStyle.setFont(font_couriernew_10);
// if the col. needs to be numeric, set the cell format to number
if ((strNumericCols.indexOf(Integer.toString(intCellNum), 0) > -1) && (intCntr > 1))
{
DataFormat datafrmt = wbAll.createDataFormat();
cellStyle.setDataFormat(datafrmt.getFormat("$#,##0.00"));
}
cellAll.setCellStyle(cellStyle);
//populate the cell
if ((strNumericCols.indexOf(Integer.toString(intCellNum), 0) > -1) && (intCntr > 1))
{
//if the col. needs to be numeric populate with a number
if(strCellContent != null && !"".equals(strCellContent.trim())){
douCellContent = Double.parseDouble(strCellContent.replaceAll(",",""));
cellAll.setCellValue(douCellContent);
}
}
else
{
cellAll.setCellValue(strCellContent.trim());
}
intCellNum++;
intCurPosInLine = intNxtComma + 1;
//if we dont find anymore delimiters, set the variable to the line length
if (line.indexOf(strDelim, intCurPosInLine) == -1)
{
intNxtComma = line.trim().length();
}
else
{
intNxtComma = line.indexOf(strDelim, intNxtComma + 1);
}
}
}

Related

I want to write data into excel(.xlsx file) using Apache poi

I want to write data into an Excel (.xlsx) file using Apache POI, but I get an error while writing. I followed the video "How to read/write data from Excel file using Apache POI API in Selenium || Latest POI Version". I am able to read data, but while writing I get this error: "Cannot invoke "org.apache.poi.xssf.usermodel.XSSFCell.getStringCellValue()" because the return value of "org.apache.poi.xssf.usermodel.XSSFRow.getCell(int)" is null", which is basically a NullPointerException.
String resourceGroupNameElement = driver.findElement(By.xpath(FrameworkValidator_Constants.Constants.RESOURCE_GROUP_NAME_XPATH)).getText();
String expectedResult = reader.getCellData("RG",6,2);
// declare status outside the if/else so it is still in scope afterwards
String status;
// compare string contents with equals(), not ==
if (resourceGroupNameElement.equals(expectedResult)) {
status = "pass";
}
else {
status = "fail";
}
//reader.setCellData("RG", "STATUS/PASS/FAIL", 2, status);
System.out.println(status);
reader.setCellData("RG","ACTUAL RESULT" , 2, resourceGroupNameElement);
Assert.assertEquals(resourceGroupNameElement, expectedResult);
The error occurs in this method:
public String setCellData(String sheetName, String colName, int rowNum, String data) {
try {
fis = new FileInputStream(path);
workbook = new XSSFWorkbook(fis);
if (rowNum <= 0)
return "";
int index = workbook.getSheetIndex(sheetName);
int colNum = -1;
if (index == -1)
return "";
sheet = workbook.getSheetAt(index);
row = sheet.getRow(0);
for (int i = 0; i < row.getLastCellNum(); i++) {
// System.out.println(row.getCell(i).getStringCellValue().trim());
if (row.getCell(i).getStringCellValue().trim().equals(colName))
colNum = i;
}
if (colNum == -1)
return "";
sheet.autoSizeColumn(colNum);
row = sheet.getRow(rowNum - 1);
if (row == null)
row = sheet.createRow(rowNum - 1);
cell = row.getCell(colNum);
if (cell == null)
cell = row.createCell(colNum);
// cell style
// CellStyle cs = workbook.createCellStyle();
// cs.setWrapText(true);
// cell.setCellStyle(cs);
cell.setCellValue(data);
fileOut = new FileOutputStream(path);
workbook.write(fileOut);
fileOut.close();
} catch (Exception e) {
e.printStackTrace();
return "";
}
return "";
}
Can anybody tell me where I am going wrong?
Have a look at the HOWTO and examples.
You will notice that there are calls for creating a row or creating a cell. Unless you do so, the row/cell does not exist and your getCell() function will return null.
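Changing the for loop condition from i < row.getLastCellNum(); to i < row.getLastCellNum()-1; can make the error go away, but note that it also skips the last header column.
getLastCellNum() returns the index of the last cell plus one, so the original bound is already correct. getCell(i) returns null for any header cell in that range that was never created (for example a blank cell), and calling getStringCellValue() on that null throws the exception. A null check on row.getCell(i) before reading it is the more robust fix.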

Merging sorted Files using multithreading

Multithreading is new to me so sorry for mistakes.
I have written the program below, which merges files with multithreading, but I cannot figure out how to handle the last file and, after one iteration, how to merge the newly created files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
public class MergerSorter extends Thread {
int fileNumber = 1;
public static void main(String[] args) {
startMergingfiles(9);
}
public MergerSorter(int fileNum) {
fileNumber = fileNum;
}
public static void startMergingfiles(int numberOfFiles) {
int objectcounter = 0;
while (numberOfFiles != 1) {
try {
ArrayList<MergerSorter> objectList = new ArrayList<MergerSorter>();
for (int j = 1; j <= numberOfFiles; j = j + 2) {
if (numberOfFiles == j) {// Last Single remaining File
} else {
objectList.add(new MergerSorter(j));
objectList.get(objectcounter).start();
objectList.get(objectcounter).join();
objectcounter++;
}
}
objectcounter = 0;
numberOfFiles = numberOfFiles / 2;
} catch (Exception e) {
System.out.println(e);
}
}
}
public void run() {
try {
FileReader fileReader1 = new FileReader("src/externalsort/" + Integer.toString(fileNumber));
FileReader fileReader2 = new FileReader("src/externalsort/" + Integer.toString(fileNumber + 1));
BufferedReader bufferedReader1 = new BufferedReader(fileReader1);
BufferedReader bufferedReader2 = new BufferedReader(fileReader2);
String line1 = bufferedReader1.readLine();
String line2 = bufferedReader2.readLine();
FileWriter tmpFile = new FileWriter("src/externalsort/" + Integer.toString(fileNumber) + "op.txt", false);
int whichFileToRead = 0;
boolean file_1_reader = true;
boolean file_2_reader = true;
while (file_1_reader || file_2_reader) {
if (file_1_reader == false) {
tmpFile.write(line2 + "\r\n");
whichFileToRead = 2;
} else if (file_2_reader == false) {
tmpFile.write(line1 + "\r\n");
whichFileToRead = 1;
} else {
String value1 = line1.substring(0, 10);
String value2 = line2.substring(0, 10);
int ans = value1.compareTo(value2);
if (ans < 0) {
tmpFile.write(line1 + "\r\n");
whichFileToRead = 1;
} else if (ans > 0) {
tmpFile.write(line2 + "\r\n");
whichFileToRead = 2;
} else if (ans == 0) {
tmpFile.write(line1 + "\r\n");
whichFileToRead = 1;
}
}
if (whichFileToRead == 1) {
line1 = bufferedReader1.readLine();
if (line1 == null)
file_1_reader = false;
} else {
line2 = bufferedReader2.readLine();
if (line2 == null)
file_2_reader = false;
}
}
tmpFile.close();
bufferedReader1.close();
bufferedReader2.close();
fileReader1.close();
fileReader2.close();
} catch (Exception e) {
System.out.println(e);
}
}
}
I am trying to merge sorted files with multithreading. Say I have 50 files and want to merge them all into one final sorted file. I want to speed this up and use every core through multithreading, but I have not been able to do it. The files are too big to hold in heap/RAM, so I have to read each file and keep writing as I go.
You can do this with merge sort, but instead of lots of little sorted lists, you'll need to use lots of little sorted files. Once you have broken all of the files down into small sorted files, you can start merging them together again until you end up with a single sorted file.
Unfortunately, you likely won't be able to achieve high CPU utilisation, as much of the time will be spent waiting for disk I/O to complete.
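One way to handle the odd file out and the later rounds, sketched under assumptions rather than taken from the question: keep the current files in a list, merge them pairwise in a thread pool, carry an unpaired last file into the next round unchanged, and repeat until one file remains. The class name RoundMerger, the temp-file output, and ordering by whole-line compareTo (the question compares only the first 10 characters) are all illustrative choices.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RoundMerger {

    // Merge two sorted files line by line into a new temp file.
    static Path mergeTwo(Path a, Path b) throws IOException {
        Path out = Files.createTempFile("merge", ".txt");
        try (BufferedReader ra = Files.newBufferedReader(a);
             BufferedReader rb = Files.newBufferedReader(b);
             BufferedWriter w = Files.newBufferedWriter(out)) {
            String la = ra.readLine();
            String lb = rb.readLine();
            while (la != null || lb != null) {
                // Take from a when b is exhausted or a's line sorts first (ties go to a).
                if (lb == null || (la != null && la.compareTo(lb) <= 0)) {
                    w.write(la);
                    w.newLine();
                    la = ra.readLine();
                } else {
                    w.write(lb);
                    w.newLine();
                    lb = rb.readLine();
                }
            }
        }
        return out;
    }

    // Run merge rounds: pairs are merged in parallel, an unpaired last file moves on as-is.
    static Path mergeAll(List<Path> files, int threads) throws Exception {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no files to merge");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Path> current = new ArrayList<>(files);
            while (current.size() > 1) {
                List<Future<Path>> pending = new ArrayList<>();
                for (int i = 0; i + 1 < current.size(); i += 2) {
                    Path a = current.get(i);
                    Path b = current.get(i + 1);
                    pending.add(pool.submit(() -> mergeTwo(a, b)));
                }
                List<Path> next = new ArrayList<>();
                for (Future<Path> f : pending) {
                    next.add(f.get()); // wait only after every pair has been submitted
                }
                if (current.size() % 2 == 1) {
                    next.add(current.get(current.size() - 1)); // odd file out
                }
                current = next;
            }
            return current.get(0);
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the loop in the question, which calls join() right after start() and so runs one merge at a time, this submits every pair of a round before waiting on any of them.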
Edit: just read your response to a comment and it sounds like you are asking for help on the last step of the merge sort. The graphics in the wiki link above will also help you understand. So, assuming all of your files are sorted, here we go:
1. Read 1 item from each file.
2. Figure out which item is the lowest/smallest/whatever and write that line to the result file.
3. Read a new item from the file which just provided the last item.
4. Repeat steps 2 and 3 until all files have been completely read.
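The steps above can be sketched with a priority queue that always holds the current line of each input file. This is a minimal illustration, not the answerer's code: the class name KWayMerge, the Entry helper, and ordering by whole-line compareTo are assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {

    // One pending line plus the reader it came from.
    private static final class Entry implements Comparable<Entry> {
        final String line;
        final BufferedReader source;

        Entry(String line, BufferedReader source) {
            this.line = line;
            this.source = source;
        }

        @Override
        public int compareTo(Entry other) {
            return line.compareTo(other.line);
        }
    }

    static void merge(List<Path> sortedInputs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<Entry> heap = new PriorityQueue<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output)) {
            // Read one item from each file.
            for (Path input : sortedInputs) {
                BufferedReader reader = Files.newBufferedReader(input);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Entry(first, reader));
                }
            }
            // Write the smallest item, refill from the file it came from, repeat.
            while (!heap.isEmpty()) {
                Entry smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.source.readLine();
                if (next != null) {
                    heap.add(new Entry(next, smallest.source));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Only one line per input file is held in memory at a time, so the size of the files does not matter for heap usage.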

How to put color to specific cells in ods file using java

Here I am able to merge/span the cell using 'setColumnSpannedNumber()', but I could not set the background color or the alignment of the cell. I am currently using odfdom-java-0.8.6.jar. Please suggest a way to set the color for the cells. Thank you.
try
{
document = OdfSpreadsheetDocument.newSpreadsheetDocument();
OdfOfficeSpreadsheet contentRoot = document.getContentRoot();
Node node = contentRoot.getFirstChild();
while (node != null) {
contentRoot.removeChild(node);
node = contentRoot.getFirstChild();
}
} catch (Exception e) {
// signature throws Exception
throw new ReportFileGenerationException("Cannot create new ODF spread sheet document. Error: "+ e.getMessage(), e);
}
OdfTable table = OdfTable.newTable(document);
for (int i = 0; i < report.size(); i++) {
List<String> row = report.get(i);
for (int j = 0; j < row.size(); j++) {
String str= row.get(j);
String newStr = str.replaceAll("[\u0000-\u001f]", "");
OdfTableCell cell = table.getCellByPosition(j, i);
if(i==0 && j==17)
{
cell.setColumnSpannedNumber(4);
cell.setCellBackgroundColor(new Color("#ffff00"));
cell.setHorizontalAlignment("center");
}
else if(i==0 && j==21)
{
cell.setColumnSpannedNumber(4);
}
else if(i==0 && j==25)
{
cell.setColumnSpannedNumber(4);
}
else if(i==0 && j==29)
{
cell.setColumnSpannedNumber(4);
}
else if(i==0 && j==33)
{
cell.setColumnSpannedNumber(4);
}
cell.setStringValue(newStr);
}
}
ByteArrayOutputStream os = new ByteArrayOutputStream();
try {
document.save(os);
return os.toByteArray();
} catch (Exception e) {
throw new ReportFileGenerationException("Cannot save the ODF spread sheet document as byte array. Error: "
+ e.getMessage(), e);
} finally {
Helper.close(os);
}
}
}
I use the API property CellBackColor, not CellBackgroundColor.
Also HoriJustify rather than HorizontalAlignment.
At least in StarBasic, this is how I set background colors:
Dim Yellow As Long : Yellow = 16777113
Dim Blue As Long : Blue = 13434879
Dim White As Long : White = -1
Dim Red As Long : Red = 15425853
cell.setCellBackColor(Yellow)
If I want a new color, I manually recolor the background and then use a macro to read out the Long value associated with that color.
And to center align:
cell.setHoriJustify(2)
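The Long color values above appear to be packed RGB integers (0xRRGGBB), so they can be computed instead of read out by a macro. A small sketch of that arithmetic, with a hypothetical helper class ColorValue that is not part of any office API:

```java
public class ColorValue {

    // Pack 8-bit red, green and blue components into one integer (0xRRGGBB).
    static long toLong(int red, int green, int blue) {
        return ((long) red << 16) | ((long) green << 8) | blue;
    }

    // Split a packed color back into {red, green, blue}.
    static int[] toRgb(long color) {
        return new int[] {
            (int) ((color >> 16) & 0xFF),
            (int) ((color >> 8) & 0xFF),
            (int) (color & 0xFF)
        };
    }

    public static void main(String[] args) {
        System.out.println(toLong(0xFF, 0xFF, 0x99)); // 16777113, the Yellow above
        System.out.println(toLong(0xCC, 0xFF, 0xFF)); // 13434879, the Blue above
    }
}
```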

HTML Formatted Cell value from Excel using Apache POI

I am using Apache POI to read an Excel document. So far it has served my purpose, but I am stuck on one thing: extracting the value of a cell as HTML.
I have one cell in which the user enters a string and applies some formatting (bullets, numbers, bold, italic, etc.).
So when I read it, the content should be in HTML format and not the plain string that POI gives.
I have gone through almost the entire POI API but could not find anything for this. I want to retain the formatting of just one particular column, not the entire Excel file. By column I mean the text entered in that column; I want that text as HTML.
Explored and used Apache Tika also. However as I understand it can only get me the text but not the formatting of the text.
Please someone guide me. I am running out of options.
Suppose I wrote My name is Angel and Demon in Excel.
The output I should get in Java is My name is <b>Angel</b> and <i>Demon</i>
I pasted this as Unicode text into cell A1 of an xls file:
<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>
This HTML line produces this:
This is a test. Will this text be bold or italic
My code:
public class ExcelWithHtml {
// <html><p>This is a test. Will this text be <b>bold</b> or
// <i>italic</i></p></html>
public static void main(String[] args) throws FileNotFoundException,
IOException {
new ExcelWithHtml()
.readFirstCellOfXSSF("/Users/rcacheira/testeHtml.xlsx");
}
boolean inBold = false;
boolean inItalic = false;
public void readFirstCellOfXSSF(String filePathName)
throws FileNotFoundException, IOException {
FileInputStream fis = new FileInputStream(filePathName);
XSSFWorkbook wb = new XSSFWorkbook(fis);
XSSFSheet sheet = wb.getSheetAt(0);
String cellHtml = getHtmlFormatedCellValueFromSheet(sheet, "A1");
System.out.println(cellHtml);
fis.close();
}
public String getHtmlFormatedCellValueFromSheet(XSSFSheet sheet,
String cellName) {
CellReference cellReference = new CellReference(cellName);
XSSFRow row = sheet.getRow(cellReference.getRow());
XSSFCell cell = row.getCell(cellReference.getCol());
XSSFRichTextString cellText = cell.getRichStringCellValue();
String htmlCode = "";
// htmlCode = "<html>";
for (int i = 0; i < cellText.numFormattingRuns(); i++) {
try {
htmlCode += getFormatFromFont(cellText.getFontAtIndex(i));
} catch (NullPointerException ex) {
}
try {
htmlCode += getFormatFromFont(cellText
.getFontOfFormattingRun(i));
} catch (NullPointerException ex) {
}
int indexStart = cellText.getIndexOfFormattingRun(i);
int indexEnd = indexStart + cellText.getLengthOfFormattingRun(i);
htmlCode += cellText.getString().substring(indexStart, indexEnd);
}
if (inItalic) {
htmlCode += "</i>";
inItalic = false;
}
if (inBold) {
htmlCode += "</b>";
inBold = false;
}
// htmlCode += "</html>";
return htmlCode;
}
private String getFormatFromFont(XSSFFont font) {
String formatHtmlCode = "";
if (font.getItalic() && !inItalic) {
formatHtmlCode += "<i>";
inItalic = true;
} else if (!font.getItalic() && inItalic) {
formatHtmlCode += "</i>";
inItalic = false;
}
if (font.getBold() && !inBold) {
formatHtmlCode += "<b>";
inBold = true;
} else if (!font.getBold() && inBold) {
formatHtmlCode += "</b>";
inBold = false;
}
return formatHtmlCode;
}
}
My output:
This is a test. Will this text be <b>bold</b> or <i>italic</i>
I think this is what you want. I am only showing you the possibilities; I am not following best coding practices, just programming fast to produce an output.
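The tag-emitting logic can be looked at separately from POI. The sketch below uses a hypothetical Run class as a stand-in for one formatting run of a rich-text cell (it is not a POI type) and wraps each run in its own tags, which keeps the output well nested even when bold and italic overlap. It is an alternative to the state flags used above, not a copy of them.

```java
import java.util.Arrays;
import java.util.List;

public class RunsToHtml {

    // Hypothetical stand-in for one formatting run of a rich-text cell.
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Wrap each run in its own tags so the output is always well nested.
    static String toHtml(List<Run> runs) {
        StringBuilder html = new StringBuilder();
        for (Run run : runs) {
            if (run.bold) html.append("<b>");
            if (run.italic) html.append("<i>");
            html.append(escape(run.text));
            if (run.italic) html.append("</i>");
            if (run.bold) html.append("</b>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
            new Run("My name is ", false, false),
            new Run("Angel", true, false),
            new Run(" and ", false, false),
            new Run("Demon", false, true));
        System.out.println(toHtml(runs));
        // prints: My name is <b>Angel</b> and <i>Demon</i>
    }
}
```

With POI, the Run objects would be filled from the cell's formatting runs (text slice plus the font's bold and italic flags) before calling toHtml.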

How to read data from a specific column from csv file using jsp/java?

In my application I need to read a specific column of a tab-separated CSV file using JSP. But I can only read the data of a full row, not a specific column.
I need help in this regard. Please help me.
Thanks
mycode:
<%@ page import="java.io.*"%>
<html>
<body>
<%
String fName = "c:\\csv\\myfile.csv";
String thisLine;
int count=0;
FileInputStream fis = new FileInputStream(fName);
DataInputStream myInput = new DataInputStream(fis);
int i=0;
%>
<table>
<%
while ((thisLine = myInput.readLine()) != null)
{
String strar[] = thisLine.split(",");
for(int j=0;j<strar.length;j++)
{
if(i!=0)
{
out.print(" " +strar[j]+ " ");
}
else
{
out.print(" <b>" +strar[j]+ "</b> ");
}
}
out.println("<br>");
i++;
}
%>
</table>
</body>
</html>
I don't think you can read only a specific column. It is better to read the entire row using a CSVParser, or to read the CSV line by line, split each line into a String array, and then pick the specific column. Either way, you need to read the whole row again.
Try it.
String fName = "C:\\Amit\\abc.csv";
String thisLine;
int count = 0;
FileInputStream fis = new FileInputStream(fName);
DataInputStream myInput = new DataInputStream(fis);
int i = 0;
while ((thisLine = myInput.readLine()) != null) {
String strar[] = thisLine.split(",");
System.out.println(strar[3]);
// index 3 is the fourth column (array indices start at 0)
}
This way you can read a specific column.
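A slightly more defensive version of the same idea, as a sketch: it uses BufferedReader instead of the deprecated DataInputStream.readLine(), takes the delimiter as a parameter (the question mentions a tab-separated file, so "\t" rather than ","), and returns an empty string for rows that are too short. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ColumnReader {

    // Return the values of one zero-based column; rows that are too short yield "".
    static List<String> readColumn(Path file, String delimiterRegex, int column) throws IOException {
        List<String> values = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields instead of dropping them.
                String[] fields = line.split(delimiterRegex, -1);
                values.add(column < fields.length ? fields[column] : "");
            }
        }
        return values;
    }
}
```

For the file in the question, a call could look like ColumnReader.readColumn(Paths.get("c:\\csv\\myfile.csv"), "\t", 3) to get the fourth column. Quoted fields that contain the delimiter are not handled by a plain split.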
I had a similar problem in Objective C the other day, but this is how I solved it.
This method assumes you know the column number of the data you want. (I.E. if you want column 1 of 6)
Read all the rows into strings and append them into one.
Data sample: (columns 1 to 6)
1,2,3,4,5,6
13,45,63,29,10,8
11,62,5,20,13,2
String 1 = 1,2,3,4,5,6
String 2 = 13,45,63,29,10,8
String 3 = 11,62,5,20,13,2
Then you should get this:
String combined = 1,2,3,4,5,6,13,45,63,29,10,8,11,62,5,20,13,2 //add in the missing "," when you concatenate strings
Next you need to split the string into an array of all values.
Use code somewhat like this: (written off the top of my head so may be off.)
String[] values = combined.split(",");
Now you should have something like this:
Values = `"1", "2", "3", ... etc`
The last step is to loop through the entire array and modulo for whatever column you need:
//Remember that Java numbers arrays starting with 0.
//The key here is that all remainder 0 items fall into the first column. All remainder 1 items fall into the second column. And so on.
for(int i = 0; i < values.length; i++)
{
//Column1 - Column6 -> array lists of size values.length/number of columns
//In this case they need to be size values.length/6
if(i % 6 == 0)
column1.add(values[i]);
else if(i % 6 == 1)
column2.add(values[i]);
else if(i % 6 == 2)
column3.add(values[i]);
else if(i % 6 == 3)
column4.add(values[i]);
else if(i % 6 == 4)
column5.add(values[i]);
else if(i % 6 == 5)
column6.add(values[i]);
}
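The same modulo idea can be written without one variable per column. A minimal sketch, assuming every row has exactly the same number of fields (otherwise the values drift into the wrong columns); the class name ColumnBuckets is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnBuckets {

    // Distribute a flat, row-major array of values into per-column lists.
    static List<List<String>> split(String[] values, int columnCount) {
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>());
        }
        for (int i = 0; i < values.length; i++) {
            // i % columnCount is the zero-based column of element i
            columns.get(i % columnCount).add(values[i]);
        }
        return columns;
    }

    public static void main(String[] args) {
        String combined = "1,2,3,4,5,6,13,45,63,29,10,8,11,62,5,20,13,2";
        List<List<String>> columns = split(combined.split(","), 6);
        System.out.println(columns.get(0)); // prints [1, 13, 11]
    }
}
```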
~~~~~~~~~~~~~~~~
Edit:
You added code to your question. Above I was saving them into memory. You just loop through and print them out. In your while loop, split each line separately into an array and then either hardcode the column number or modulo the length of the array as the index.
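import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
public class ParseCSVs {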
public static void main(String[] args) {
try {
// csv file containing data
String strFile = "./input//SIMNumbers.csv";
String line = "";
System.out.println("Enter line number to configure");
Scanner sc = new Scanner(System.in);
int lineNumber = sc.nextInt();
BufferedReader br = new BufferedReader(new FileReader(strFile));
if ((line = br.readLine()) != null) {
String cvsSplitBy = ",";
String blankCell = null;
// use comma as separator
String[] cols = line.split(cvsSplitBy);
for (int i = 0; i < cols.length; i++)
System.out.println("Columns = " + cols[i]);
// System.exit(0);
} else
System.out.println("No data found in csv");
} catch (IOException e) {
e.printStackTrace();
}
}
}
