I've found numerous posts about reading CSV files with Java, and the APIs they pointed to all take a line-oriented approach to reading: something like "while there is another line, get the values of every column".
Are there better ways to do that?
Thanks for any suggestions!
You will need a database, whether you write your own or use a third-party one.
Otherwise you will be doing sequential searches on your data to find anything.
You might want to look at this post: Reading a CSV file into Java as a DB table
It looks like you have all the info you need.
I want to store my blobs outside of the database in files, however they are just random blobs of data and aren't directly linked to a file.
So for example I have a table called Data with the following columns:
id
name
comments
...
I can't just include a column called fileLink or something like that, because the blob is just raw data. I do, however, want to store it outside of the database. I would love to create a file called 3.dat, where 3 is the id for that row. The only problem with this setup is that a flat folder keyed by id will quickly accumulate a huge number of files and run into OS file-system issues. And no, the data is not grouped or structured; it's one massive list.
Is there a Java framework or library that will allow me to store and manage the blobs so that I can just do something like MyBlobAPI.saveBlob(id, data); and then do MyBlobAPI.getBlob(id) and so on? In other words something where all the File IO is handled for me?
Simply use an appropriate database which implements blobs as you described, and use JDBC. You really are not looking for another API but for a specific implementation; it's up to the DB to take care of storing blobs efficiently.
I think a home-rolled solution would include something like a fileLink column in your table, and your API would create a file on the first save and then rewrite that file on each update.
I don't know of any code base that will do this for you. There are a bunch that provide an in-memory file system for Java, but it's only a few lines of code to write something that writes and reads Java objects to and from files.
You'll have to handle any file-system limitations yourself, though I doubt you'll ever burn through the limits of modern file systems like Btrfs or ZFS. FAT32 is limited to roughly 65K files per directory, but even last-generation file systems support something on the order of 4 billion files per directory.
So by all means, write a class with two functions: one to serialize an object to a file, giving it a unique key as a name, and another to deserialize the object by that key. If you are using a modern file system, you'll never run out of resources.
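For illustration, here is a minimal sketch of that two-function approach; the FileBlobStore name and the sharding scheme (grouping files into subdirectories by id modulo 1000) are my own assumptions, not part of any existing API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical FileBlobStore: saveBlob/getBlob with all the File IO handled
// internally, as the question asks for.
public class FileBlobStore {
    private final Path baseDir;

    public FileBlobStore(Path baseDir) {
        this.baseDir = baseDir;
    }

    // e.g. id 3 -> baseDir/003/3.dat, so no single directory ever holds
    // more than one thousandth of the blobs.
    private Path pathFor(long id) {
        return baseDir.resolve(String.format("%03d", id % 1000)).resolve(id + ".dat");
    }

    public void saveBlob(long id, byte[] data) throws IOException {
        Path path = pathFor(id);
        Files.createDirectories(path.getParent());
        Files.write(path, data);
    }

    public byte[] getBlob(long id) throws IOException {
        return Files.readAllBytes(pathFor(id));
    }
}
```

With this, new FileBlobStore(Paths.get("blobs")).saveBlob(3, data) would write blobs/003/3.dat, which sidesteps the flat-folder concern raised in the question.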
As far as I can tell there is no framework for this. The closest I could find was Hadoop's HDFS.
That being said, the advice in the other answers to just put the BLOBs into the database is not always sound. Sometimes it's good and sometimes it's not; it really depends on your situation. Here are a few links to such discussions:
Storing Images in DB - Yea or Nay?
https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
I did find some additional really good links, but I can't remember them offhand. There was one in particular on Stack Overflow that I can't find. If you believe you know the link, please add it in the comments so that I can confirm it's the right one.
I want to use XML for storing some data, but I do not want to read the full file when I want to get the last data inserted, and I do not want to rewrite the full file when adding new data. Is there a standard way in Java to parse an XML file not from the beginning but from the end, so that, for example, a SAX or StAX parser would first encounter the closing root tag and then the last element? Or, if I want to do this, should I read and write everything as if it were a regular text file?
Fundamentally, XML is a poor representation choice for this. The format is inherently "contained" like this, and I haven't seen any APIs which encourage you to fight against that.
Options:
Choose a different format entirely (e.g. use a database)
Create lots of small XML files instead - each one self-contained. When you want the whole of the data, read all the files
Just swallow the hit and read/write the whole file each time.
I found a good topic on this, with example solutions for what I want.
This link: http://www.oreillynet.com/xml/blog/2007/03/parsing_xml_backwards.html
It seems that XML is not a good file format for achieving what I want: there is no standard parser that can parse XML from the end instead of the beginning.
Probably the best solution for me will be storing all the XML data in one file that is a composition of many XML documents, with a separate well-formed XML document on each line. The file itself is not well-formed XML, but each line is, and I can parse each line with a standard XML parser (StAX).
This way I will be able to read just the lines at the end of the file and append new data to the end. When I need all of the data, or only part of it, I will read all the lines or some of them. I can probably also implement pagination from the end of the file, since the file can be big.
Why XML on each line? It is easy to parse with an existing API, and storing data as XML is more human-readable than separating values with some delimiter.
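A minimal sketch of that idea, assuming records are small enough to serialize without embedded newlines (the XmlLineLog class is my own illustration; a production version might seek backwards from the end with RandomAccessFile instead of scanning forward for the last line):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlLineLog {
    // Append one record: a complete, well-formed XML document on a single line.
    public static void append(File file, String xmlLine) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file, true))) {
            out.println(xmlLine);
        }
    }

    // Scan forward to the last line; cheap for moderate file sizes.
    public static String readLastLine(File file) throws IOException {
        String last = null;
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            for (String line; (line = in.readLine()) != null; ) {
                last = line;
            }
        }
        return last;
    }

    // Parse one line with StAX, printing each element name it contains.
    public static void printElements(String line) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(line));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                System.out.println("element: " + r.getLocalName());
            }
        }
    }
}
```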
Why not use SAX/StAX and simply process only your last entry? Yes, it will need to open and go through the whole file, but at least it's fairly efficient, as opposed to loading the whole DOM tree.
Short of doing that, I don't think you can do what you're asking using XML as a source.
Another alternative, apart from the ones provided by Jon Skeet in his answer, would be to keep the same format but insert the latest entries first, and stop processing the files as soon as you've read your entry.
I'm writing a tool to analyze stock market data. For this I download data and then save all the data corresponding to a stock as a 20×100000 double[][] array in a data.bin file on my hard drive. I know I should put it in a database, but performance-wise this is simply the best method.
Now here is my problem: I need to do updates and search on the data:
Updates: I have to append new data to the end of the array as time progresses.
Search: I want to iterate over different data files to find a minimum or calculate moving averages etc.
I could do both by reading the whole file in, updating it, and writing it back, or by searching in a specific area... but this is somewhat overkill since I don't need the whole data set.
So my question is: Is there a library (in Java) or something similar to open/read/change parts of the binary file without having to open the whole file? Or searching through the file starting at a specific point?
RandomAccessFile allows seeking to a particular position in a file and updating parts of it, or appending new data to the end, without rewriting everything. See the tutorial here: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
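For example, here is a rough sketch for the fixed-width layout described in the question (20×100000 doubles, 8 bytes each); the row-major offset arithmetic and the file name are assumptions on my part:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class PriceFile {
    private static final int COLS = 100000; // columns per row, as in the question

    // Read the double at (row, col) without loading the rest of the file:
    // with fixed-width cells, the byte offset is just arithmetic.
    static double read(RandomAccessFile f, int row, int col) throws IOException {
        f.seek(((long) row * COLS + col) * 8);
        return f.readDouble();
    }

    // Append a new value at the end of the file as time progresses.
    static void append(RandomAccessFile f, double value) throws IOException {
        f.seek(f.length());
        f.writeDouble(value);
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile("data.bin", "rw")) {
            append(f, 42.0);
            System.out.println(read(f, 0, 0));
        }
    }
}
```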
You could try looking at Random Access Files:
Tutorial: http://docs.oracle.com/javase/tutorial/essential/io/rafs.html
API: http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
... but you will still need to figure out the exact positions you want to read in a binary file.
You might want to consider moving to a database, maybe a small embedded one like H2 (http://www.h2database.com)
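If you do go that route, embedded H2 needs nothing beyond the H2 jar on the classpath. A hedged sketch, with a made-up table and database file name:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class H2Example {
    public static void main(String[] args) throws SQLException {
        // "./stockdata" is a hypothetical local database file path;
        // indexed SQL queries replace manual offset bookkeeping.
        try (Connection c = DriverManager.getConnection("jdbc:h2:./stockdata");
             Statement s = c.createStatement()) {
            s.execute("CREATE TABLE IF NOT EXISTS ticks(stock VARCHAR, ts BIGINT, price DOUBLE)");
            s.execute("INSERT INTO ticks VALUES('ACME', 1, 42.0)");
            try (ResultSet r = s.executeQuery("SELECT MIN(price) FROM ticks WHERE stock='ACME'")) {
                if (r.next()) {
                    System.out.println("min = " + r.getDouble(1));
                }
            }
        }
    }
}
```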
I would like to be able to take 2 csv files as input, join them (SQL style) on a specific column, and output a new csv file which contains all file1 data, plus one of the columns of data from file2.
Any tips on the best way to achieve this? Since SQL offers the JOIN command, some method of treating the CSV files as database tables might work well, but I'm open to all suggestions really; the easiest wins.
All help is much appreciated!
Do some simple file I/O, split each line, and load the results into a Set-type container. Then you can do set operations on the contents of the two files:
http://www.java2s.com/Code/Java/Collections-Data-Structure/Setoperationsunionintersectiondifferencesymmetricdifferenceissubsetissuperset.htm
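If you need a true SQL-style join rather than pure set operations, a hash join in plain Java is also only a few lines. This sketch assumes unquoted comma-separated data, a header row in each file, and that the join key is the first column of both files:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

public class CsvJoin {
    public static void main(String[] args) throws IOException {
        // Index file2 by its join column (column 0) -> extra column (column 1).
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader("file2.csv"))) {
            in.readLine(); // skip header
            for (String line; (line = in.readLine()) != null; ) {
                String[] cells = line.split(",");
                lookup.put(cells[0], cells[1]);
            }
        }
        // Stream file1, appending the matched column from file2.
        try (BufferedReader in = new BufferedReader(new FileReader("file1.csv"));
             PrintWriter out = new PrintWriter(new FileWriter("joined.csv"))) {
            in.readLine(); // header row would need its own handling
            for (String line; (line = in.readLine()) != null; ) {
                String key = line.split(",")[0];
                out.println(line + "," + lookup.getOrDefault(key, ""));
            }
        }
    }
}
```

Indexing the smaller file keeps memory use down while the larger one is streamed row by row.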
You can parse your CSV files and bind them to beans with opencsv:
http://opencsv.sourceforge.net/
Here you can bind the entities in a CSV file to a list of beans:
http://opencsv.sourceforge.net/#javabean-integration
You can then do whatever you want with the lists of beans programmatically, such as appending one list to another or applying join-like logic.
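As a minimal sketch of the raw-parsing side (the bean binding at the link above builds on the same reader), here is a join using opencsv's CSVReader; the file names and column positions are assumptions, and in newer opencsv releases the package is com.opencsv rather than au.com.bytecode.opencsv:

```java
import au.com.bytecode.opencsv.CSVReader; // com.opencsv.CSVReader in newer releases
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class OpencsvJoin {
    public static void main(String[] args) throws Exception {
        // Index file2 by its join column (assumed to be column 0).
        Map<String, String[]> byKey = new HashMap<String, String[]>();
        CSVReader r2 = new CSVReader(new FileReader("file2.csv"));
        String[] row;
        while ((row = r2.readNext()) != null) {
            byKey.put(row[0], row);
        }
        r2.close();

        // Walk file1 and append the extra column (assumed column 1 of file2).
        CSVReader r1 = new CSVReader(new FileReader("file1.csv"));
        while ((row = r1.readNext()) != null) {
            String[] match = byKey.get(row[0]);
            System.out.println(String.join(",", row) + "," + (match != null ? match[1] : ""));
        }
        r1.close();
    }
}
```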
A very simple, non-programmatic approach: import both text files into a spreadsheet, then use vlookup (or its equivalent) to look up values from one sheet into the other.
For direct manipulation of CSV files as SQL tables see:
Reading a CSV file into Java as a DB table
You might also try to use a JDBC driver for CSV files like this one:
http://sourceforge.net/projects/csvjdbc/
I have written a command-line program to execute arbitrary SQL on CSV files, including multi-file joins, called gcsvsql. You can read about it here:
http://bayesianconspiracy.blogspot.com/2010/03/gcsvsql.html
There is a Google Code project for it here: http://code.google.com/p/gcsvsql/
It's written in Java/Groovy, and will run anywhere Java is available.
Has anybody written any classes for reading and writing Palm Database (PDB) files in Java? (I mean on a server, not on the Palm device itself.) I tried to google, but all I got were Protein Data Bank references.
I wrote a Perl program that does it using Palm::PDB.pm, but I want to turn it into a servlet for a GWT app.
The jSyncManager project at http://www.jsyncmanager.org/ is under the LGPL and includes classes to read and write PDB files -- look in jSyncManager/API/Protocol/Util/DLPDatabase.java in its source code. It looks like the core code you need from this could be isolated from the rest of the library with a little effort.
There are a few ways that you can go about this:
Easiest but slowest: find a Perl-to-Java bridge. This will not be quick, but it will work, and it should involve the least amount of work.
Find a C++/C# implementation that you have the source to and convert it (this should be the fastest solution).
Find a Java reader... there seem to be a few listed on Google; however, I do not have any experience with them.
Depending on what your intended usage is, you might look into writing a simple reader yourself. The format is pretty simple and you only need to handle a couple of simple fields to parse it.
Basically there is a header for the entire file, which ends with a 2-byte integer specifying the number of records. So just skip your way through the bytes for all the other fields in the header and then read the last field, which is the number of records in the file. Be aware that the PDB format writes integers with the most significant byte first (big-endian).
Following this, there will be a record header for each record, the first field of which is the actual offset into the file for the record itself. Again, be aware of the byte order.
So, now you have the offsets into the file for each record in the file, which should make it very easy to read the actual records as long as you know the format of these for the type of PDB file you are trying to read.
Wikipedia has a nice overview of the header formats.
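Putting those steps together, here is a minimal sketch that reads the record offsets. It assumes the standard 78-byte PDB header (the record count being the final 2-byte field) and 8-byte record-list entries; DataInputStream conveniently reads big-endian by default:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class PdbRecordLister {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(new byte[76]);              // skip every header field before the record count
            int numRecords = in.readUnsignedShort(); // last header field, big-endian
            long[] offsets = new long[numRecords];
            for (int i = 0; i < numRecords; i++) {
                offsets[i] = in.readInt() & 0xFFFFFFFFL; // 4-byte offset of the record data
                in.readFully(new byte[4]);               // 1 attribute byte + 3-byte unique ID
            }
            // With the offsets in hand, each record's bytes can be read directly;
            // interpreting them depends on the specific PDB type.
            for (long offset : offsets) {
                System.out.println("record at offset " + offset);
            }
        }
    }
}
```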
Maybe JPilot can help? They must have a lot of Java code dealing with Palm OS data.