I've already tried exporting my database tables to CSV using CSVWriter.
But my tables contain BLOB data. How can I include it in my export?
Later on, I'm going to import the exported CSV using CSVReader. Can anyone share some ideas?
This is part of my export code:
ResultSet res = st.executeQuery("select * from "+db+"."+obTableNames[23]);
int colunmCount = getColumnCount(res);
try {
File filename = new File(dir,""+obTableNames[23]+".csv");
fw = new FileWriter(filename);
CSVWriter writer = new CSVWriter(fw);
writer.writeAll(res, false);
int colType = res.getMetaData().getColumnType(colunmCount);
dispInt(colType);
fw.flush();
fw.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Have you looked at the encodeBase64String(byte[] data) method of the Base64 class provided by Apache Commons Codec?
Its documentation reads: "Encodes binary data using the base64 algorithm but does not chunk the output."
This lets you turn each Binary Large Object into an encoded string and include it in your CSV.
On import, the decodeBase64(String) method of the same class turns the string back into the BLOB bytes.
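The round trip described above can be sketched as follows. This is a minimal illustration that swaps in the JDK's own java.util.Base64 for the Commons Codec class so that it runs without extra dependencies; the class name BlobCsvCodec and its two methods are hypothetical helpers, not part of CSVWriter or CSVReader.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class BlobCsvCodec {
    // Encode raw BLOB bytes as a single unchunked Base64 string.
    // The Base64 alphabet has no commas, quotes, or line breaks,
    // so the value is safe to store in one CSV cell.
    static String encode(byte[] blob) {
        return Base64.getEncoder().encodeToString(blob);
    }

    // Decode a Base64 cell read back from the CSV into the original bytes.
    static byte[] decode(String cell) {
        return Base64.getDecoder().decode(cell);
    }

    public static void main(String[] args) {
        byte[] blob = "binary, \"quoted\"\ndata".getBytes(StandardCharsets.UTF_8);
        String cell = encode(blob);
        System.out.println(cell);
        System.out.println(Arrays.equals(blob, decode(cell)));
    }
}
```

In the export loop, one option (an assumption about how the export would be restructured) is to read BLOB columns with ResultSet.getBytes, pass them through encode, and write the rows yourself instead of handing the whole ResultSet to writeAll.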
I am trying to write data in Java to Apache Parquet. So far, I have used Apache Arrow, following the examples here: https://arrow.apache.org/cookbook/java/schema.html#creating-fields, to create a dataset in Arrow format.
The question is: how do I write it to Parquet after that? Also, do I need Apache Arrow to output the data as a Parquet file, or can I use Apache Parquet directly to serialize the data and output it as a Parquet file?
What I've done:
try (BufferAllocator allocator = new RootAllocator()) {
Field name = new Field("name", FieldType.nullable(new ArrowType.Utf8()), null);
Field age = new Field("age", FieldType.nullable(new ArrowType.Int(32, true)), null);
Schema schemaPerson = new Schema(asList(name, age));
try(
VectorSchemaRoot vectorSchemaRoot = VectorSchemaRoot.create(schemaPerson, allocator)
){
VarCharVector nameVector = (VarCharVector) vectorSchemaRoot.getVector("name");
nameVector.allocateNew(3);
nameVector.set(0, "David".getBytes());
nameVector.set(1, "Gladis".getBytes());
nameVector.set(2, "Juan".getBytes());
IntVector ageVector = (IntVector) vectorSchemaRoot.getVector("age");
ageVector.allocateNew(3);
ageVector.set(0, 10);
ageVector.set(1, 20);
ageVector.set(2, 30);
vectorSchemaRoot.setRowCount(3);
File file = new File("randon_access_to_file.arrow");
try (
FileOutputStream fileOutputStream = new FileOutputStream(file);
ArrowFileWriter writer = new ArrowFileWriter(vectorSchemaRoot, null, fileOutputStream.getChannel())
) {
writer.start();
writer.writeBatch();
writer.end();
System.out.println("Record batches written: " + writer.getRecordBlocks().size() + ". Number of rows written: " + vectorSchemaRoot.getRowCount());
} catch (IOException e) {
e.printStackTrace();
}
}
}
But this outputs an Arrow file, not a Parquet file. Any ideas how I can write a Parquet file instead? And do I need Arrow to generate a Parquet file to begin with, or can I just use Parquet directly?
Arrow Java does not yet support writing to Parquet files, but you can use Parquet to do that.
There is some code in the Arrow dataset test classes that may help. See
org.apache.arrow.dataset.ParquetWriteSupport;
org.apache.arrow.dataset.file.TestFileSystemDataset;
The second class has some tests that use the utilities in the first one.
You can find them on GitHub here:
https://github.com/apache/arrow/tree/master/java/dataset/src/test/java/org/apache/arrow/dataset
I have this application I'm developing in JSP and I wish to export some data from the database in XLS (MS Excel format).
Is it possible under tomcat to just write a file as if it was a normal Java application, and then generate a link to this file? Or do I need to use a specific API for it?
Will I have permission problems when doing this?
While you can use a full fledged library like JExcelAPI, Excel will also read CSV and plain HTML tables provided you set the response MIME Type to something like "application/vnd.ms-excel".
Depending on how complex the spreadsheet needs to be, CSV or HTML can do the job for you without a 3rd party library.
Don't use plain HTML tables with an application/vnd.ms-excel content type. You're then basically fooling Excel with a wrong content type, which causes failures and/or warnings in the latest Excel versions. It will also mess up the original HTML source when you edit and save it in Excel. Just don't do that.
CSV, in turn, is a standard format that Excel supports by default without any problems, and it is easy and memory-efficient to generate. Although there are libraries available, you can easily write a generator yourself in fewer than 20 lines (fun for those who can't resist). You just have to adhere to the RFC 4180 spec, which basically contains only 3 rules:
Fields are separated by a comma.
If a comma occurs within a field, then the field has to be surrounded by double quotes.
If a double quote occurs within a field, then the field has to be surrounded by double quotes and the double quote within the field has to be escaped by another double quote.
Here's a kickoff example:
public static <T> void writeCsv (List<List<T>> csv, char separator, OutputStream output) throws IOException {
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(output, "UTF-8"));
for (List<T> row : csv) {
for (Iterator<T> iter = row.iterator(); iter.hasNext();) {
String field = String.valueOf(iter.next()).replace("\"", "\"\"");
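// Quote when the field contains the separator, a double quote, or a line break.
if (field.indexOf(separator) > -1 || field.indexOf('"') > -1 || field.indexOf('\n') > -1 || field.indexOf('\r') > -1) {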
field = '"' + field + '"';
}
writer.append(field);
if (iter.hasNext()) {
writer.append(separator);
}
}
writer.newLine();
}
writer.flush();
}
Here's an example of how you could use it:
public static void main(String[] args) throws IOException {
List<List<String>> csv = new ArrayList<List<String>>();
csv.add(Arrays.asList("field1", "field2", "field3"));
csv.add(Arrays.asList("field1,", "field2", "fie\"ld3"));
csv.add(Arrays.asList("\"field1\"", ",field2,", ",\",\",\""));
writeCsv(csv, ',', System.out);
}
And inside a Servlet (yes, Servlet, don't use JSP for this!) you can basically do:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String filename = request.getPathInfo().substring(1);
List<List<Object>> csv = someDAO().findCsvContentFor(filename);
response.setHeader("content-type", "text/csv");
response.setHeader("content-disposition", "attachment;filename=\"" + filename + "\"");
writeCsv(csv, ';', response.getOutputStream());
}
Map this servlet on something like /csv/* and invoke it as something like http://example.com/context/csv/filename.csv. That's all.
Note that I made the separator character a parameter, because whether Excel accepts a comma , or a semicolon ; as the CSV field separator may depend on the locale in use. Note that I also put the filename in the URL path info, because a certain web browser developed by a team in Redmond otherwise wouldn't save the download under the proper filename.
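One way to pick the separator is sketched below. It rests on a heuristic that is an assumption of this sketch, not an Excel API: locales that write decimals with a comma usually use a semicolon as the list separator. The class and method names are hypothetical.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class CsvSeparators {
    // Heuristic only: if the locale's decimal separator is a comma,
    // guess ';' as the CSV field separator, otherwise guess ','.
    static char guessFor(Locale locale) {
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimal == ',' ? ';' : ',';
    }

    public static void main(String[] args) {
        System.out.println(guessFor(Locale.US));
        System.out.println(guessFor(Locale.GERMANY));
    }
}
```

In the servlet, the result of guessFor(request.getLocale()) could be passed to writeCsv in place of the hard-coded ';'. The browser's locale does not necessarily match the regional settings Excel uses, so treat the result as a default the user can override.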
You will probably need a library to manipulate Excel files, like JExcelAPI ("jxl") or POI. I'm more familiar with jxl and it can certainly write files. You could generate the files, store them, and serve a URL to them, but I wouldn't. Generated files are a pain. They add complications in the form of concurrency, clean-up processes, etc.
If you can, generate the file on the fly and stream it to the client through the standard servlet mechanisms.
If it's generated many, many times or the generation is expensive, then you can cache the result somehow, but I'd be more inclined to keep it in memory than as a file. I'd certainly avoid, if you can, linking directly to the generated file by URL. If you go via a servlet, it'll allow you to change your implementation later. It's the same encapsulation concept as in OO design.
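The in-memory caching idea can be sketched like this. It is a minimal illustration, assuming the export fits comfortably in the heap; the class ExportCache and the generator function are hypothetical, and the cache is unbounded, so a real application would need an eviction policy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

final class ExportCache {
    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> generator;

    // generator builds the export bytes for a given key (for example a report name).
    ExportCache(Function<String, byte[]> generator) {
        this.generator = generator;
    }

    // Returns the cached bytes, running the generator at most once per key.
    byte[] get(String key) {
        return cache.computeIfAbsent(key, generator);
    }

    // Drops an entry, for example after the underlying data has changed.
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

A servlet could hold one ExportCache instance and write get(filename) to the response output stream, which keeps the caching decision hidden behind the URL.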
POI and JExcel are both good APIs. I personally prefer POI, and POI is updated regularly. Furthermore, there are more resources online about POI than about JExcel in case you have any questions. However, either of the two does a great job.
Maybe you should consider a reporting tool that can export files to XLS format. My suggestion is JasperReports.
try {
    String absoluteDiskPath = "test.xls";
    File f = new File(absoluteDiskPath);
    // .xls is the legacy binary Excel format
    response.setContentType("application/vnd.ms-excel");
    // send only the file name, not the server-side path
    response.setHeader("Content-Disposition", "attachment; filename=\"" + f.getName() + "\"");
    out.clear(); // clear the JSP writer's buffer before using the binary output stream, to prevent an IllegalStateException
    // try-with-resources closes both streams even when copying fails
    try (InputStream in = new FileInputStream(f);
         ServletOutputStream outs = response.getOutputStream()) {
        byte[] buffer = new byte[8192];
        int read;
        // copy in chunks and stop at end of stream (-1) instead of writing it
        while ((read = in.read(buffer)) != -1) {
            outs.write(buffer, 0, read);
        }
        outs.flush();
    }
} catch (Exception ex) {
    ex.printStackTrace();
}
I tried the following in a JSP and it works fine.
<% String filename = "xyz.xls";
response.setContentType("application/octet-stream");
response.setHeader("Content-Disposition","attachment; filename=\"" + filename + "\"");
java.io.File excelFile=new java.io.File("C:\\Users\\hello\\Desktop\\xyz.xls");
java.io.FileInputStream fileInputStream=new java.io.FileInputStream(excelFile);
byte[] bytes = new byte[(int) excelFile.length()];
int offset = 0;
while (offset < bytes.length)
{
int result = fileInputStream.read(bytes, offset, bytes.length - offset);
if (result == -1) {
break;
}
offset += result;
}
javax.servlet.ServletOutputStream outs = response.getOutputStream();
outs.write(bytes);
outs.flush();
outs.close();
fileInputStream.close();
%>
My file is 14 GB. I would like to read it line by line and export it to an Excel file.
Because the file contains several languages, such as Chinese and English,
I tried to use FileInputStream with UTF-16 to read the data,
but this results in java.lang.OutOfMemoryError: Java heap space.
I have tried increasing the heap space, but the problem still exists.
How should I change my file-reading code?
createExcel(); //open a excel file
try {
//success but cannot read and output for different language
//br = new BufferedReader(
// new FileReader("C:\\Users\\brian_000\\Desktop\\appdatafile.json"));
//result in java.lang.OutOfMemoryError: Java heap space
br = new BufferedReader(new InputStreamReader(
new FileInputStream("C:\\Users\\brian_000\\Desktop\\appdatafile.json"),
"UTF-16"));
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("cann be print");
String line;
int i=0;
try {
while ((line = br.readLine()) != null) {
// process the line.
try{
System.out.println("cannot be print");
//some statement for storing the data in variables.
//a function for writing the variable into excel
writeToExcel(platform,kind,title,shareUrl,contentRating,userRatingCount,averageUserRating
,marketLanguage,pricing
,majorVersionNumber,releaseDate,downloadsCount);
}
catch(com.google.gson.JsonSyntaxException exception){
System.out.println("error");
}
// trying to get the first 1000rows
i++;
if(i==1000){
br.close();
break;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
closeExcel();
public static void writeToExcel(String platform,String kind,String title,String shareUrl,String contentRating,String userRatingCount,String averageUserRating
,String marketLanguage,String pricing,String majorVersionNumber,String releaseDate,String downloadsCount){
currentRow++;
System.out.println(currentRow);
if(currentRow>1000000){
currentsheet++;
sheet = workbook.createSheet("apps"+currentsheet, 0);
createFristRow();
currentRow=1;
}
try {
//character id
Label label = new Label(0, currentRow, String.valueOf(currentRow), cellFormat);
sheet.addCell(label);
//12 of statements for write the data to excel
label = new Label(1, currentRow, platform, cellFormat);
sheet.addCell(label);
} catch (WriteException e) {
e.printStackTrace();
}
}
Excel, UTF-16
As mentioned, the problem is likely caused by the construction of the Excel document. Check whether UTF-8 yields a smaller size; for instance, Chinese HTML is still more compact in UTF-8 than in UTF-16 because of the many ASCII characters.
Object creation java
You can share common small Strings. Useful for String.valueOf(row) and such. Cache only strings with a small length. I assume the cellFormat to be fixed.
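A minimal sketch of such sharing is shown below. The class name SmallStringCache and the length threshold of 16 are assumptions made for illustration; the idea is only that equal short strings end up as one shared instance while long strings pass through untouched.

```java
import java.util.HashMap;
import java.util.Map;

final class SmallStringCache {
    private static final int MAX_LENGTH = 16; // assumed threshold for a "small" string
    private final Map<String, String> pool = new HashMap<>();

    // Returns a shared instance for short strings; long strings are returned as-is.
    String share(String s) {
        if (s == null || s.length() > MAX_LENGTH) {
            return s;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct strings currently held by the cache.
    int size() {
        return pool.size();
    }
}
```

Values such as platform or kind, which repeat across many rows, would be passed through share(...) before being handed to the cell constructor. The pool is not thread-safe and never shrinks, which is acceptable for a single-threaded one-off export.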
DIY with xlsx
Excel builds a costly DOM.
If CSV text (with a Unicode BOM) is not an option (you could give it the extension .xls so that Excel opens it), try generating an xlsx.
Create an example workbook in xlsx.
It is a zip format, which is easiest to process in Java with a zip filesystem.
For Excel there is a content XML and a shared-strings XML; cell values in the content refer by index to the shared strings.
Then no overflow happens, as you write buffer-wise.
Or use a JDBC driver for Excel. (No recent experience on my side, maybe JDBC/ODBC.)
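For the CSV-with-BOM option mentioned above, a minimal sketch could look like this. It assumes UTF-8 output and comma-separated fields; the class BomCsvWriter is hypothetical. Rows are written one at a time through a buffered writer, so memory use does not grow with the number of rows.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class BomCsvWriter {
    // Writes a UTF-8 byte order mark followed by the rows as CSV lines.
    static void write(OutputStream out, Iterable<List<String>> rows) {
        try {
            BufferedWriter writer =
                    new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
            writer.write('\uFEFF'); // encoded as the bytes EF BB BF in UTF-8
            for (List<String> row : rows) {
                for (int i = 0; i < row.size(); i++) {
                    if (i > 0) {
                        writer.write(',');
                    }
                    writer.write(quote(row.get(i)));
                }
                writer.write("\r\n");
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Quotes a field when it contains a comma, a double quote, or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return '"' + field.replace("\"", "\"\"") + '"';
        }
        return field;
    }
}
```

The BOM is what lets Excel detect the encoding, so Chinese and English text in the same file should display correctly without an import dialog.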
Best
Excel is hard to use with that much data. Consider investing the effort in a database instead, or write every N rows to a separate, proper Excel file. Maybe you can later merge them into one document with Java. (I doubt it.)
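The "every N rows" idea can be sketched as a reader that hands lines to a callback in fixed-size batches. This is only an illustration: the class BatchedLines is hypothetical, and what the callback does with a batch (for example writing it to its own workbook) is left open.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedLines {
    // Reads lines one at a time and passes them to sink in batches of at most
    // batchSize lines, so only one batch is held in memory at a time.
    // Returns the number of batches delivered.
    static int forEachBatch(Reader source, int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batches++;
                    batch = new ArrayList<>(); // start a fresh batch
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) { // deliver the final, partially filled batch
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

For a real file, the Reader could come from Files.newBufferedReader(path, charset) with the file's actual encoding. If the file is not really UTF-16, decoding it as UTF-16 can hide the line breaks, so that readLine tries to buffer a huge "line".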
I have a method to write data to a file.
public void writeCSFFileData(List<String> fileData){
try {
CsvListWriter csvWriter = new CsvListWriter(new FileWriter("/path/file.csv"), CsvPreference.STANDARD_PREFERENCE);
csvWriter.write(fileData);
csvWriter.close();
} catch (Exception e) {
SimpleLogger.getInstance().writeError(e);
}
}
The above method is called several times to write to the same file.
But each time, the file is overwritten instead of being appended to.
Thanks in advance.
I found the solution myself: I just need to pass true as the second argument to FileWriter so that it appends the data.
Example: new FileWriter("/path/file.csv", true)
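The effect of the append flag can be seen in a small stand-alone sketch. It uses only the JDK rather than CsvListWriter, and the class AppendingCsv with its two methods is hypothetical; the point is that the overwrite-or-append behaviour comes from how the FileWriter is opened, not from the CSV writer wrapped around it.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class AppendingCsv {
    // Opens the file in append mode (second FileWriter argument is true)
    // and adds one line at the end.
    static void appendLine(Path file, String line) {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toFile(), true))) {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo: two separate calls leave two lines in the file.
    static List<String> appendTwiceToTempFile() {
        try {
            Path file = Files.createTempFile("append-demo", ".csv");
            appendLine(file, "a,b");
            appendLine(file, "c,d");
            List<String> lines = Files.readAllLines(file);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With new FileWriter(file) in place of the two-argument constructor, the second call would truncate the file and only the last line would remain.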