I need some way to store a configuration/status file that changes rapidly. The file stores the status of each key-value pair, and that status must be updated very frequently, tracking the state of a piece of communication hardware (digital multimedia broadcasting).
What is the best way to go about creating such a file? INI? XML? Is there an off-the-shelf file writer in Java? I can't use a database.
It sounds like you need random access to update parts of the file frequently without rewriting the entire file. Design a binary file format and use the RandomAccessFile API to read and write it. You will want to use a fixed number of bytes for the key and for the value, so that you can index into the middle of the file and update a value without having to rewrite all of the following records. Basically, you would be re-implementing how a database stores a table.
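For illustration, here's a minimal sketch of such a fixed-record layout; the class name, key/value widths, and padding scheme are all my own assumptions:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// A fixed-record status file: KEY_LEN bytes of key followed by VALUE_LEN
// bytes of value per record. Record i starts at i * (KEY_LEN + VALUE_LEN),
// so a value can be updated in place without touching other records.
public class StatusFile {
    private static final int KEY_LEN = 32;    // assumed fixed widths
    private static final int VALUE_LEN = 64;
    private static final int RECORD_LEN = KEY_LEN + VALUE_LEN;

    private final RandomAccessFile file;

    public StatusFile(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    // Overwrite the value of the record at the given index in place.
    public void updateValue(int recordIndex, String value) throws IOException {
        byte[] bytes = Arrays.copyOf(
                value.getBytes(StandardCharsets.UTF_8), VALUE_LEN); // pad or truncate
        file.seek((long) recordIndex * RECORD_LEN + KEY_LEN);
        file.write(bytes);
    }

    public String readValue(int recordIndex) throws IOException {
        byte[] bytes = new byte[VALUE_LEN];
        file.seek((long) recordIndex * RECORD_LEN + KEY_LEN);
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8).trim(); // strip padding
    }
}
```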
Another alternative is to store only a single key-value pair per file, so that the cost of rewriting a file is minor. You might use the file name as the key and store only the value in the file content.
I'd be inclined to try the second option unless you are dealing with more than a few thousand records.
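A rough sketch of that second option, assuming every key is safe to use as a file name:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the file name is the key, the file content is the value.
public final class KeyValueDir {
    private final Path dir;

    public KeyValueDir(Path dir) { this.dir = dir; }

    public void put(String key, String value) throws IOException {
        // Rewrites only this one small file, not the whole data set.
        Files.write(dir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
    }

    public String get(String key) throws IOException {
        return new String(Files.readAllBytes(dir.resolve(key)), StandardCharsets.UTF_8);
    }
}
```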
The obvious solution would be to put the "configuration" information into a Properties object, and then use Properties.store(...) or Properties.storeToXML(...) to save to a file output stream or writer.
You also need to do something to ensure that whatever is reading the file will see a consistent snapshot. For instance, you could write to a new file each time and do a delete/rename dance to replace the old file with the new one.
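A sketch of that combination, using Properties.store(...) plus an atomic rename; note that ATOMIC_MOVE support depends on the underlying file system:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

public class ConfigWriter {
    // Write the properties to a temp file, then atomically move it over the
    // real file so readers always see a complete snapshot, never a half-write.
    public static void saveAtomically(Properties props, Path target) throws IOException {
        Path temp = Files.createTempFile(target.getParent(), "config", ".tmp");
        try (OutputStream out = Files.newOutputStream(temp)) {
            props.store(out, "status snapshot");
        }
        Files.move(temp, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }
}
```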
But if the update rate for the file is too high, you are going to generate a lot of disk traffic, and you are bound to slow down your application. This is going to apply (eventually) no matter what file format / API you use. So, you may want to consider not writing to a file at all.
At some point, configuration that changes too rapidly becomes "program state" and not configuration. If it is changing so rapidly, why do you have confidence that you can meaningfully write it to, and then read it from, a filesystem?
Say more about what the status is and who the consumer of the data is...
I want to store my blobs outside of the database in files, however they are just random blobs of data and aren't directly linked to a file.
So for example I have a table called Data with the following columns:
id
name
comments
...
I can't just include a column called fileLink or something like that because the blob is just raw data. I do however want to store it outside of the database. I would love to create a file called 3.dat, where 3 is the id number for that row entry. The only problem with this setup is that the main folder would quickly accumulate a huge number of files: the ids map to a flat folder structure, and that will run into OS file-system issues. And no, the data is not grouped or structured; it's one massive list.
Is there a Java framework or library that will allow me to store and manage the blobs so that I can just do something like MyBlobAPI.saveBlob(id, data); and then do MyBlobAPI.getBlob(id) and so on? In other words something where all the File IO is handled for me?
Simply use an appropriate database which implements blobs as you described, and use JDBC. You really are not looking for another API but for a specific implementation. It's up to the DB to take care of storing blobs efficiently.
I think a home-rolled solution would include something like a fileLink column in your table, and your API would create the file on the first save and then overwrite it on each update.
I don't know of any code base that will do this for you. There are a bunch that provide an in-memory file system for Java, but it's only a few lines of code to write something that reads and writes Java objects to a file.
You'll have to handle any file-system limitations yourself, though I doubt you'll ever hit the limits of modern file systems like Btrfs or ZFS. FAT32 is limited to about 65K files per directory, but even previous-generation file systems support on the order of 4 billion files per directory.
So by all means, write a class with two functions: one to serialize an object to a file, giving it a unique key as its name, and another to deserialize the object by that key. If you are using a modern file system, you'll never run out of resources.
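A rough sketch of that two-function class; the name echoes the hypothetical MyBlobAPI from the question, and the modulo-based sharding is an added assumption that also keeps any one directory from growing too large:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical blob store: shards blobs into subdirectories by id so no
// single directory grows huge, e.g. id 123456 -> <root>/456/123456.dat
public class BlobStore {
    private final Path root;

    public BlobStore(Path root) { this.root = root; }

    private Path pathFor(long id) {
        return root.resolve(String.valueOf(id % 1000))  // 1000 shard directories
                   .resolve(id + ".dat");
    }

    public void saveBlob(long id, byte[] data) throws IOException {
        Path path = pathFor(id);
        Files.createDirectories(path.getParent()); // create shard dir on demand
        Files.write(path, data);
    }

    public byte[] getBlob(long id) throws IOException {
        return Files.readAllBytes(pathFor(id));
    }
}
```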
As far as I can tell there is no framework for this. The closest I could find was Hadoop's HDFS.
That being said, the advice of just putting the BLOBs into the database, as per the other answers, is not always sound. Sometimes it's good and sometimes it's not; it really depends on your situation. Here are a few links to such discussions:
Storing Images in DB - Yea or Nay?
https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
I did find some additional really good links, but I can't remember them offhand. There was one in particular on Stack Overflow, but I can't find it. If you believe you know the link, please add it in the comments so that I can confirm it's the right one.
I am writing a piece of software which has a part dealing with read and write operations. I am wondering how costly these operations are on a CSV file. Are there any other file formats that consume less time? I have to read and write CSV files at the end of every cycle.
Read and write operations depend on the file system, hardware, software configuration, memory setup, and size of the file, but not on the format. A separate issue is the cost of parsing the file, which should be relatively low since CSV is a very simple format.
The point is that CSV is a good format for tables of data but not for nested data. If your data has a lot of nested information, you can either split it into different CSV files or accept some information redundancy that will penalize your performance. But other formats may have other kinds of redundancy.
And do not optimize prematurely. If you are reading and writing the file very frequently, it will surely be kept in RAM by the OS file cache. JSON or a zipped file might save size and be read faster, but would have a higher parsing time and could end up being even slower. Parsing time also depends on the implementation of the library (Gson vs. Jackson) and its version.
It would be nice to know the reasons behind your problem, to give better answers.
The cost of reading / writing to a CSV file, and whether it is suitable for your application, depend on the details of your use case. Specifically, if you are simply reading from the beginning of the file and writing to the end of the file, then the CSV format is likely to work fine. However, if you need to access particular records in the middle of your file then you probably wish to choose another format.
The main issue with a CSV file is that it is not a good format for random access: each record (row) is of variable size, so you cannot simply seek to a particular record offset in the file; instead you need to read every row (well, you could still jump and sample, but you cannot seek directly by record offset). Formats with fixed-size records would let you seek directly to a particular record, making it possible to update an entry in the middle of the file without re-reading and re-writing the entire file.
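To make the contrast concrete, here's a sketch of seeking straight to the n-th record when records are fixed-size; the record size is an assumed parameter:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// With fixed-size records you can jump straight to record n instead of
// scanning every preceding row, as you must with variable-length CSV.
public class FixedRecordReader {
    public static byte[] readRecord(String path, int n, int recordSize)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek((long) n * recordSize);   // direct offset, O(1)
            byte[] record = new byte[recordSize];
            file.readFully(record);
            return record;
        }
    }
}
```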
If I have a property of an object which is a large String (say the contents of a file ~ 50KB to 1 MB, maybe larger), what is the practice around declaring such a property in a POJO? All I need to do is to be able to set a value from one layer of my application and transfer it to another without making the object itself "heavy".
I was considering if it makes sense to associate an InputStream or OutputStream to get / set the value, rather than reference the String itself - which means when I attempt to read the value of the contents, I read it as a stream of bytes, rather than a whole huge string loaded into memory... thoughts?
What you're describing depends largely on your anticipated use of the data. If you're delivering the contents in raw form, then there may be more efficient ways to manage it.
For example, if your app has a web interface, your app may just provide a URL for a web server to stream the contents to the requester. If it's a CLI-based app, you may be able to get away with a simple file copy. If your app is processing the file, however, then perhaps your POJO could retain only the results of that processing rather than the raw data itself.
If you wish to provide a general pattern along the lines of POJOs with references to external streams, I would suggest storing in your POJO something akin to a locator that tells where to find the stream (a database row ID, a filename, or a URL) rather than storing an instance of the stream itself. In doing so, you'll reduce the number of open file handles, prevent potential concurrency issues, and be able to serialize those objects locally if needed without duplicating the raw data persisted elsewhere.
You could have an object that supplies a stream or an iterator every time you access it. Note that the content has to live on some storage, like a file; that is, your object stores a pointer (e.g. a file path) to the storage, and every time someone accesses it, you open a stream or create an iterator and let that party read. Note also that, to actually save memory, whoever consumes it has to make sure not to hold the whole content in memory.
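A minimal sketch of the pattern both of these answers describe, assuming the content lives in a file (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// The POJO stores only a pointer to the content and opens a fresh stream
// on each access, so the large payload never lives in the object itself.
public class Document {
    private final String name;
    private final Path contentPath;   // where the large content actually lives

    public Document(String name, Path contentPath) {
        this.name = name;
        this.contentPath = contentPath;
    }

    public String getName() { return name; }

    // Caller is responsible for closing the stream and for not buffering
    // the whole content in memory.
    public InputStream openContent() throws IOException {
        return Files.newInputStream(contentPath);
    }
}
```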
However, 50 KB or 1 MB is really tiny. Unless you have gigabytes (or maybe hundreds of megabytes), I wouldn't try to do something like that.
Also, even if you have large data, it's often simpler to just use files or whatever storage you'll use.
tl;dr: Just use String.
I am developing an application that writes data to a file. Let's assume that while it is writing the data we pull the battery. What will happen to the file? Will it be half-written (corrupted), empty, or the same as before we wrote to it? My guess is that it will be corrupted. How do I check whether it is corrupted when the phone restarts, given that the file was storing an ArrayList of objects? Will Java throw a corrupted-file exception, report that the read ArrayList is null, or complain about an unknown object?
P.S. Maybe create another file that keeps an MD5 checksum of the data file? Whenever I write the data file, I would first produce its checksum; then, when I read the data file, I would produce a checksum and compare it with the stored one. That would indicate whether my data is intact, but it wouldn't allow me to roll back to a previous (pre-corruption) state. I would like a method that is as lightweight as possible; I am already using the CPU too much by reading/writing changes to storage on every attribute change across a set of thousands of objects. Probably a database would have been a better idea.
I can't say how Java will read in a corrupted serialized array, but for safety let's assume that there's no error detection.
In that case, you have two easy options:
Store a checksum of your data inside your data structure, before you serialize it.
Compute the checksum of the final serialized file.
Either case will work the same way, though the first option might be a bit faster, since you compute the checksum before you've written anything to disk (and therefore avoid an extra round of file I/O).
As you mentioned, MD5 would be fine for this. (Even a basic CRC would probably do -- you don't need a cryptographic hash for this.)
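A sketch of the checksum-the-file variant using CRC32 from the standard library; the file layout and method names are assumptions:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Store the checksum of the serialized file next to it; on startup,
// recompute and compare to detect corruption from an interrupted write.
public class ChecksumGuard {
    public static long checksumOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    public static void writeChecksum(Path dataFile, Path checksumFile) throws IOException {
        Files.write(checksumFile,
                Long.toString(checksumOf(dataFile)).getBytes(StandardCharsets.UTF_8));
    }

    public static boolean isIntact(Path dataFile, Path checksumFile) throws IOException {
        long expected = Long.parseLong(
                new String(Files.readAllBytes(checksumFile), StandardCharsets.UTF_8).trim());
        return checksumOf(dataFile) == expected;
    }
}
```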
If you want to allow rolling back to a previous version -- I'd just store each version as a separate file and then have a pointer to the most recent one. (If you update the pointer as the last step of your write operation, this will also provide an extra level of protection against corrupt data being input to your app -- though you'll have to prepare for this pointer to be corrupt as well. Since this is essentially a commit step, you could interpret a corrupt pointer as "use the last version".)
And yes, at this point you might want to just use the SQLite functionality built into Android. :)
I am currently writing a program which takes user input and creates rows of a comma-delimited .csv file. I need a way to save this data so that users cannot easily edit it. It does not need to be super secure, just enough that it can't be edited accidentally. I also need another file (or the same file?) that is easily accessible (in the file system) by the user, so that they can email it to a system admin, who can then open the .csv file. I could provide this second person with a conversion program if necessary.
The file I save data in and the file to be sent can be two different files if there is any advantage to that. I am considering just using a file with an odd file extension, saving it as a text file, so that the user will only be able to open it if they know to try that. The other option is some sort of encryption, but I'm not sure whether that is necessary, and even if it were, where I would start.
Thanks for the help :)
Edit: This file is meant to store the actual data being entered. Currently the data is being gathered on paper forms which are then sent to the admin to manually enter all of the data. This little app is meant to have someone else enter the data from the paper form and then tell them if they've entered it all correctly. After they've entered it all they then need to send the data to the admin. It would be preferable if the sending was handled automatically, but this app needs to be very simple and low budget and I don't want an internet connection to be a requirement.
You could store your data in a serializable object and save that. It would resist casual editing and be very simple to read and write from your app. This page should get you started: http://java.sun.com/developer/technicalArticles/Programming/serialization/
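A minimal sketch of what that could look like; the FormData class and its fields are hypothetical:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Writing the collected rows as a serialized object: the binary form
// resists casual editing in a text editor, yet is trivial to read back.
public class FormData implements Serializable {
    private static final long serialVersionUID = 1L;
    public List<String[]> rows;   // each row's comma-separated fields

    public void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    public static FormData load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (FormData) in.readObject();
        }
    }
}
```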
From your question, I am guessing that the uneditable file's purpose is to store some kind of system config, and you don't want it to get messed up easily. From your own suggestions, it seems that even knowing the file has been edited would help you, since you can then avoid using it. If that is the case, you can use simple checks, such as saving the total number of characters in the line as the first or last comma-delimited value. Then, before you use the file, you run a small validation routine on it to verify that the file is indeed unaltered.
Another approach may just be to use a ZIP (file) of a "plain text format" (CSV, XML, other serialization method, etc) and, optionally, utilize a well-known (to you) password.
This approach could be used with other stream/package types: the idea behind using a ZIP (as opposed to an object serializer directly) is so that one can open/inspect/modify said data/file(s) easily without special program support. This may or may not be a benefit and using a password may or may not even be required, see below.
Some advantages of using a ZIP (or CAB):
The ability for multiple resources (aids in extensibility)
The ability to save the actual data in a "text format" (XML, perhaps)
Maintain competitive file-sizes for "common data"
Re-use existing tooling support (also get checksum validation for free!)
Additionally, using a non-ZIP file extension will prevent most users from casually associating the file with the ZIP format and opening it (a similar approach to the one in the original post, but subtly different, because the ZIP format itself is not plain text). A number of modern Microsoft formats exploit the fact that the file extension plays an important role and use CAB (and sometimes ZIP) as the container format for the document. That is, an ".XSN", ".WSP", or ".gadget" file can be opened with a tool like 7-Zip, but generally only developers who are "in the know" do so. Also consider ".WAR" and ".JAR" files as other examples of this approach, since this is Java we're in.
Traditional ZIP passwords are not secure, and a static password embedded in the program even less so. However, if this is just a deterrent (i.e. not for "security"), then those issues are not important. Coupled with an "un-associated" file type/extension, I believe this offers the protection asked for in the question while remaining flexible. It may even be possible to drop the password entirely and still prevent "accidental modifications" just by using a ZIP (or other) container format, depending on the requirements.
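A sketch of the container idea using the standard java.util.zip package; note that the standard library cannot produce password-protected ZIPs, so the optional password would require a third-party library:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Pack the CSV into a ZIP container saved under an unfamiliar extension.
// Not secure, but enough to deter accidental editing in a text editor.
public class CsvPackager {
    public static void pack(String csvContent, Path target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.putNextEntry(new ZipEntry("data.csv"));
            zip.write(csvContent.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }
}
```

Usage could be as simple as `CsvPackager.pack(csv, Paths.get("report.xyz"))`, where ".xyz" is whatever unassociated extension you choose.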
Happy coding.
Can you set file permissions to make it read-only?
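If that's acceptable, a one-liner along these lines would do it; a sketch only, since setReadOnly() is advisory and can fail, hence the check:

```java
import java.io.File;

// Mark the saved file read-only so casual edits fail.
public class MakeReadOnly {
    public static void protect(String path) {
        File file = new File(path);
        if (!file.setReadOnly()) {
            System.err.println("Could not mark " + path + " read-only");
        }
    }
}
```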
Other than using a binary output file, the file system Windows runs on, NTFS (I know for sure this works from XP through x64 Windows 7), has a little trick you can use to hide data from anyone simply perusing your files:
Append a colon and an arbitrary value to your output and input file names; e.g., if your filename is "data.csv", use "data.csv:42" instead. Any existing or non-existing file can have such a stream appended to access a whole hidden area (and every stream name after the colon is distinct, so "data.csv:42" != "data.csv:carrots" != "second.csv:carrots").
If the stream doesn't exist, it will be created and initialized to hold 0 bytes of data. If you open the file in Notepad, you will indeed see exactly the data it held before you wrote to the ":42" stream, no more, no less; but the data written to "data.csv:42" persists and can be read back. This makes it a perfect place to hide data from any annoying user!
Caveats: If you delete "data.csv", all associated hidden data will be deleted too. Also, there are indeed programs that will find these files, but if your user goes through all that trouble to manually edit some csv file, I say let them.
I also have no idea if this will work on other platforms, I've never thought to try it.
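For the curious, a sketch of writing to such an alternate data stream from Java. This is Windows/NTFS-specific, and some Java versions reject the colon in the path, so treat it as experimental:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Writes to the NTFS alternate data stream "data.csv:42". Opening
// "data.csv" in a text editor will not show this content.
public class AdsWriter {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream out = new FileOutputStream("data.csv:42")) {
            out.write("hidden,row,here".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```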