I need to suggest an input, excel file or text file.
assuming the input is large number of lines where I need to read the first String, for example:
A,B,C,D....
I need to read the first String (in this case A) to identify the matching row, should I use excel file and use POI to read the first cell of each row? or text file where each line tokens are separated by delimiter and to parse each line reading the first token.
Use a text file. Because computers like it more. If business requires it, rename that text file into a "csv" file and you've got an Excel file.
If humans are going to enter data then use Excel. If the file is used as a communication channel between two systems use as simple as possible file.
If at all possible, use text file - much easier to handle/troubleshoot, easier to generate, uses less memory, does not have restrictions on number of rows, etc. In general - more predictable.
If you go with text files and you have people manually preparing those text files, and you are dealing with non-ASCII text, you better make sure everybody will send you the files in correct encoding (usually UTF-8 would be the best). This is not an issue with Excel.
The only reason to use Excel workbook would be when you need some "business-people" to produce those input files, then that input effectively becomes a user interface to your system - Excel is usually considered more user friendly than Notepad. ;-)
If you do go with Excel, make sure that the people producing those Excel files will give you the correct version (I assume you would want the "old" XLS format, not the new XLSX format).
Rule of thumb: use a text file. It's more interchangeable and way easier to handle by any other software you may need to support in a few years.
If you need some humans to edit those data and you need some beautiful/color display the Excel can provide, consider creating a macro that would store data in csv.
Related
I need to output data to a CSV file from Java, but in that csv file I hope to create multiple sheets so that data can be organized in a better way. After some googling, it seems this is not possible. A CSV file can only receive one-sheet data.
Is this true? If yes, what would be the options? Thank you.
CSV file is interpreted a sequence of characters which comply to some standardization, therefor it cannot contains more than one sheet. You can output your data in a Excel file that contains more than one sheet using the Apache POI api.
Comma Separated Value lists are generally created in plain text files, which do not have multiple pages. What you could do instead is create multiple CSV files, but if you really have a lot of data, a data base might be your best bet.
I have to read a text file and write it to an already existing excel file. The excel file is a customized excel sheet with different items in different columns. The items has different values for each of them... These items with there value can be found in a text file. But i dont have much idea as to how to do this.
E.g- example.txt
Name: John
Age=24
Sex=M
Graduate=M.S
example.xlsx
Age: Sex:
Name: Graduate:
Thanks in advance :)
Just as for so many other problems that need solved, there's an Apache library for that! In this case, it's the POI library. I've only used it for very basic spreadsheet manipulation, but managed that by just following a few tutorials. I'd link to one, but I can't now remember where it was.
Please see Apache POI-HSSF library for reading and writing Excel files with Java. There are some quick guides to get you started.
This post How to read and write excel file in java might help you.
You can also create a *.csv (comma separated value) file in Java. Just create a simple text file with CSV extension and put your values in there like that :
Age:,24,Sex:,M,
So you just separate your values with commas (or other delimiters like ';').
Every line in this file is a row, and every delimiter separates two columns. You won't be able to add colours/styles/formatting this way, but it gives you a file that is openable and understandable even without Excel (or other spreadsheet software).
I have a program which will be used for building questions database. I'm making it for a site that want user to know that contet was donwloaded from that site. That's why I want the output be PDF - almost everyone can view it, almost nobody can edit it (and remove e.g. footer or watermark, unlike in some simpler file types). That explains why it HAS to be PDF.
This program will be used by numerous users which will create new databases or expand existing ones. That's why having output formed as multple files is extremly sloppy and inefficient way of achieving what I want to achieve (it would complicate things for the user).
And what I want to do is to create PDF files which are still editable with my program once created.
I want to achieve this by implementing my custom file type readable with my program into the output PDF.
I came up with three ways of doing that:
Attach the file to PDF and then corrupting the part of PDF which contains it in a way it just makes the PDF unaware that it contains the file, thus making imposible for user to notice it (easely). Upon reading the document I'd revert the corruption and extract file using one of may PDF libraries.
Hide the file inside an image which would be added to the PDF somwhere on the first or last page, somehow (that is still need to work out) hidden from the public eye. Knowing it's location, it should be relativley easy to retrieve it using PDF library.
I have learned that if you add "%" sign as a first character in line inside a PDF, the whole line will be ignored (similar to "//" in Java) by the PDF reader (atleast Adobe reader), making possible for me to add as many lines as I want to the PDF (if I know where, and I do) whitout the end user being aware of that. I could implement my whole custom file into PDF that way. The problem here is that I actually have to read the PDF using one of the Java's input readers, but I'm not sure which one. I understand that PDF can't be read like a text file since it's a binary file (Right?).
In the end, I decided to go with the method number 3.
Unless someone has any better ideas, and the conditions are:
1. One file only. And that file is PDF.
2. User must not be aware of the addition.
The problem is that I don't know how to read the PDF as a file (I'm not trying to read it as a PDF, which I would do using a PDF library).
So, does anyone have a better idea?
If not, how do I read PDF as a FILE, so the output is array of characters (with newline detection), and then rewrite the whole file with my content addition?
In Java, there is no real difference between text and binary files, you can read them both as an inputstream. The difference is that for binary files, you can't really create a Reader for it, because that assumes there's a way to convert the byte stream to unicode characters, and that won't work for PDF files.
So in your case, you'd need to read the files in byte buffers and possibly loop over them to scan for bytes representing the '%' and end-of-line character in PDF.
A better way is to use another existing way of encoding data in a PDF: XMP tags. This is allows any sort of complex Key-Value pairs to be encoded in XML and embedded in PDF's, JPEGs etc. See http://partners.adobe.com/public/developer/en/xmp/sdk/XMPspecification.pdf.
There's an open source library in Java that allows you to manipulate that: http://pdfbox.apache.org/userguide/metadata.html. See also a related question from another guy who succeeded in it: custom schema to XMP metadata or http://plindenbaum.blogspot.co.uk/2010/07/pdfbox-insertextract-metadata-frominto.html
It's all just 1's and 0's - just use RandomAccessFile and start reading. The PDF specification defines what a valid newline character(s) is/are (there are several). Grab a hex editor and open a PDF and you can at least start getting a feel for things. Be careful of where you insert your lines though - you'll need to add them towards the end of the file where they won't screw up the xref table offsets to the obj entries.
Here's a related question that may be of interest: PDF parsing file trailer
I would suggest putting your comment immediately before the startxref line. If you put it anywhere else, you could wind up shifting things around and breaking the xref table pointers.
So a simple algorithm for inserting your special comment will be:
Go to the end of the file
Search backwards for startxref
Insert your special comment immediately before startxref - be sure to insert a newline character at the end of your special comment
Save the PDF
You can (and should) do this manually in a hex editor.
Really important: are your users going to be saving changes to these files? i.e. if they fill in the form field, are they going to hit save? If they are, your comment lines may be removed during the save (and different versions of different PDF viewers could behave differently in this regard).
XMP tags are the correct way to do what you are trying to do - you can embed entire XML segments, and I think you'd be hard pressed to come up with a data structure that couldn't be expressed as XML.
I personally recommend using iText for this, but I'm biased (I'm one of the devs). The iText In Action book has an excellent chapter on embedding XMP data into PDFs. Here's some sample code from the book (which I definitely recommend): http://itextpdf.com/examples/iia.php?id=217
I have a scenario wherein the user uploads a file to the system. The only file that the system understands in a CSV, but the user can upload any type of file eg: jpeg, doc, html. I need to throw an exception if the user uploads anything other than CSV file.
Can anybody let me know how can I find if the uploaded file is a CSV file or not?
CSV files vary a lot, and they all could be called, legitimately, CSV files.
I guess your approach is not the best one, the correct approach would be to tell if the uploaded file is a text file the application can parse instead of it it's a CSV or not.
You would report errors whenever you can't parse the file, be it a JPG, MP3 or CSV in a format you cannot parse.
To do that, I would try to find a library to parse various CSV file formats, else you have a long road ahead writing code to parse many possible types of CSV files (or restricting the application's flexibility by supporting few CSV formats.)
One such library for Java is opencsv
If you're using some library CSV parser, all you would have to do is catch any errors it throws.
If the CSV parser you're using is remotely robust, it will throw some useful errors in the event that it doesn't understand the file format.
I can think of several methods.
One way is to try to decode the file using UTF-8. (This is built into Java and is probably built into .NET too.) If the file decodes properly, then you at least know that it's a text file of some kind.
Once you know it's a text file, parse out the individual fields from each line and check that you get the number of fields that you expect. If the number of fields per line is inconsistent then you might just have a file that contains text but is not organized into lines and fields.
Otherwise you have a CSV. Then you can validate the fields.
If it's a web application, you might want to check the content-type HTTP header the browser sends when uploading/posting a file through a form.
If there's a bind for the language you're using, you might also try using libmagic, is pretty good at recognizing file types. For example, the UNIX tool file uses it.
http://sourceforge.net/projects/libmagic/
I don't know if you can tell for 100% certain in any way, but I'd suggest that the first validations should be:
Is the file extension .csv
Count the number of commas in the file per line, there should normally be the same amount of commas on each line of the file for it to be a valid CSV file. (As Jkramer said, this only works if the files can't contain quoted commas).
try this one :
String type = Files.probeContentType(Paths.get(filepath));
I solved it like this: read the file with UTF-16 encoding, if no comma is found in the file, it means UTF-16 encoding didnt work. Which means that this csv file is of Excel format (NOT plain text).
if(fileA.endsWith(".csv") && fileB.endsWith(".csv")) {
second_list=readCSVFile(fileA);
new_list=readCSVFile(fileB);
if(!String.join("", second_list).contains(",") || !String.join("", new_list).contains(",")) {
//read these files with UTF-8 encoding
System.out.println("[WARN] csv files will be read like text files. (UTF-16 encoding couldnt find any comma in the file i.e., UTF-16 encoding didn't work)");
second_list=readFile(fileA);
new_list=readFile(fileB);
} else {
// keep the csv as UTF-16 encoded
}
I want to make a GUI application that contains three functions as follows:
Add a record
Edit a record
Delete a record
A record contains two fields - Name and Profession
There are two restrictions for the application
You can't use database to store info. You have to use a flat file.
Total file should not be re-written for every add/delete operation.
So, my questions are mentioned below:
Q1. Which file format would be better? (.xml or .csv or .txt or any other)
Q2. How can we perform the add/delete operation without the whole file being re-written?
The second part of your question is answered here : Best Way to Write Bytes in the Middle of a File in Java
As for the format - I would go with something as simple as possible. You don't want to have to deal with a bunch of markup processing, as using RandomAccessFile, you will going directly to a byte position. A fixed width style format would be good, so that based on the record number, you can calculate the starting position of a record or field in the file, without having to read everything in the file. The fields would then be padded out to the fixed width with spaces or some other suitable character.
I would go with CSV, zipped. it is both readable, and editable externally.
If CSV is your choice, this can help: http://javacsv.sourceforge.net/
Did you look at this? http://sourceforge.net/projects/flatworm/
Also consider Apache Derbi and HSQLDB
Another solution is this http://www.coyotegulch.com/products/jisp/index.html
You can reinvent the wheel, but that is only required if this is an academic assigment...
Given that the whole file must not be rewritten, I would suggest using RandomAccessFile that allow you to read and write only the record you want.
For the file format, a binary file, using fixed length for the record : ex: Name on 20 characters, Profession on 30.
This will allow you to use the seek() method of RandomAccessFile to directly access your data.