We have seen a lot of applications that work with JSON files, but I have a case I would like a solution for.
Let us see ...
An app works with a JSON file. It receives requests from millions of users, and thousands of requests are completed every second.
The JSON file is updated from an admin panel every minute, every second, or on some other fixed schedule.
What is the behaviour of the JSON file when a request to read it arrives while it is open for update from the admin panel at the same time? (I have read that the JSON file will be fetched in readable mode.)
Suppose the JSON file is being written by a script and the write takes a third of a second. What is the behaviour when a read arrives while only 50% of the file has been updated?
Will the reader be given the new content only once the write has completed, or the partially updated file?
Don't bother with locking, just use rename().
Assuming you're running on an OS where a rename() is an atomic operation, create a new file, say "/data/file/name.json.new", then when that's complete, rename the file. In C that would look like this:
rename( "/data/file/name.json.new", "/data/file/name.json" );
This way, any process opening "/data/file/name.json" will always see a consistent data file.
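If you are on the JVM, the same pattern is available through java.nio. Here is a minimal sketch, assuming a file system where an atomic move replaces the existing target the way POSIX rename(2) does; the paths are the ones from the example above:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    public static void publish(byte[] newContent) throws IOException {
        Path tmp = Paths.get("/data/file/name.json.new");
        Path live = Paths.get("/data/file/name.json");
        // Write the new version out completely before it becomes visible.
        Files.write(tmp, newContent);
        // Swap the new file in atomically; readers see either the old file or
        // the new one, never a half-written one. Throws if the move cannot be
        // performed atomically on this platform.
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE);
    }
}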
Practically, from what you describe, you want a service that applies operations to a file on the server side.
You should, though, avoid taking on the responsibility of Creating, Reading, Updating and Deleting (CRUD) yourself, because you will have trouble preserving properties such as Atomicity, Consistency, Isolation and Durability (ACID), while there are systems that do this for you: Database Management Systems.
In simple words, scenarios like what you describe should be a responsibility of a DBMS and not yours.
You probably need a NoSQL DBMS that is responsible for the CRUD operations of your database, which can be file-based in a JSON format among other forms, while preserving ACID always (or almost always, but that is something you will learn as you research it). MongoDB is a great example of such a system.
Because you mentioned JSON, please take into consideration that transferring data is one story and storing it is another. I suggest that you use the JSON format for requests and responses, but explore other options for storage. For instance, even a Relational DBMS that uses SQL can be good for you; it always depends on your needs. You might just need to encode and decode the data in JSON format whenever it is received from or sent to each client.
Take a look here for more info.
I am a beginner in programming, so I am trying to learn with projects. My newest project is to create an agenda/calendar that is accessible from different computers (like a family calendar) so mom or dad can put up their events and everyone can see everyone's plans.
For a program that can store the state of a family's agenda so they can go back to it at any time, I assume some sort of database or server to store their information is needed. How could I do this?
I apologize if my question is vague. I am relatively new to programming, but am so eager to keep learning.
You have several options.
The easiest is Serialization. Serialization takes an object and writes it to a stream using an ObjectOutputStream. You can read it back with an ObjectInputStream.
It's trivial because without error checking, it's just a few lines of code.
// Requires: import java.io.FileOutputStream; import java.io.ObjectOutputStream;
// yourCalendar (and everything it references) must implement java.io.Serializable.
FileOutputStream fos = new FileOutputStream("calendar.dat");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(yourCalendar);
oos.close();
Similarly:
// Requires: import java.io.FileInputStream; import java.io.ObjectInputStream;
FileInputStream fis = new FileInputStream("calendar.dat");
ObjectInputStream ois = new ObjectInputStream(fis);
YourCalendar yourCalendar = (YourCalendar) ois.readObject();
ois.close();
Where yourCalendar is the master object containing your entire calendar and the appointments, etc.
Since you're not dealing with large amounts of information, it's perfectly adequate.
Now, that said, it's also fraught with danger. The file format is opaque (you can't just open it up in an editor and look at it). It can also be quite brittle: if you change the underlying classes that you're serializing, you may not be able to read an old file back in. There are also potential security implications (likely not germane in your case, but they're still there).
Much of this can be mitigated, at the cost of complexity.
Similarly, you can use one of the JSON or XML libraries to serialize your objects out into one of those text-based formats. These are human-readable and can be a bit less sensitive to change than the binary format.
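As a sketch of the JSON route, here is what the write/read pair could look like with Gson (the library choice is an assumption; any JSON binder works similarly, and YourCalendar is the same class as in the snippets above):
import com.google.gson.Gson;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class JsonPersistence {
    public static void save(YourCalendar calendar) throws IOException {
        try (Writer out = new FileWriter("calendar.json")) {
            new Gson().toJson(calendar, out); // writes the whole object graph as JSON text
        }
    }

    public static YourCalendar load() throws IOException {
        try (Reader in = new FileReader("calendar.json")) {
            return new Gson().fromJson(in, YourCalendar.class);
        }
    }
}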
Of course, with all of these, they're "all or nothing". In each case you're writing out the entire object and all of its embedded objects. That means you can't individually access the data, nor can you use a 3rd-party tool to access it (like an SQL toolset). But, again, you don't have much data, so having that kind of access is likely not a big deal.
You wouldn't want to use this in a multi-user scenario, as it cannot be incrementally updated (again, all or nothing).
But, all that said, for getting up and running, for simple persistence, and being cognizant of its limitations, it will do the job and let you check this box on your project as you work on its other aspects. It's easy enough to start with this and, later, move to a more robust persistence mechanism.
Memory is volatile. For storing data persistently you need to write it either in files or in databases.
Since this is an opinion-based question, here is my opinion.
You may begin by learning to read from and write to files (text as well as binary).
When writing to and reading from files, you need to decide which format to store the data in: JSON, YAML, XML, comma-separated values, or serialized objects written to a file. The choice is yours.
When reading, you need to write your own logic to search through the data. So while files are a good and easy way to store data, you will need either your own search mechanism or a document search engine like Elasticsearch.
Another option is to use a database, which gives you the power of SQL (if using a relational database) to query your data. To use one, you should learn about databases: reading from and writing to them, and making a connection to the database from Java.
In my opinion,
You should begin with the database approach, as you can easily query a date to get all the events planned on it. You not only want to store the events; you also want to go to a particular date and list the events planned for that date. So you need to store your data in a way that is easy for you to search and read.
Also, I advise you to use the Spring framework and Maven, which take care of dependency management and the database connection with minimal configuration.
You may use the H2 database: it is lightweight, comparable to SQLite, and easy to use. Use it in file-based mode; you do not need to run a separate server for now.
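A minimal sketch of that approach, using embedded file-based H2 over plain JDBC (the table layout, JDBC URL, and sample values are illustrative, and the H2 jar must be on the classpath):
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.LocalDate;

public class AgendaDb {
    public static void main(String[] args) throws SQLException {
        // "jdbc:h2:./agenda" stores the database in a local file, no server needed.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./agenda", "sa", "")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS events ("
                        + "id BIGINT AUTO_INCREMENT PRIMARY KEY, "
                        + "member VARCHAR(100), title VARCHAR(200), event_date DATE)");
            }
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO events (member, title, event_date) VALUES (?, ?, ?)")) {
                insert.setString(1, "Mom");
                insert.setString(2, "Dentist");
                insert.setDate(3, Date.valueOf(LocalDate.of(2024, 5, 17)));
                insert.executeUpdate();
            }
            // The query-by-date case: list everything planned on a given day.
            try (PreparedStatement query = conn.prepareStatement(
                    "SELECT member, title FROM events WHERE event_date = ?")) {
                query.setDate(1, Date.valueOf(LocalDate.of(2024, 5, 17)));
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("member") + ": " + rs.getString("title"));
                    }
                }
            }
        }
    }
}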
Edit
Also, as suggested by @springe, you can use an ORM like Hibernate to deal with the database, which is a safe and recommended approach used even in industrial code. Basically, it is good practice to use JPA/Hibernate when performing CRUD operations.
However, since you are new to programming, get mastery over plain SQL as well, alongside learning good practices like using an ORM.
For references
You can refer to Baeldung: just google "how to do X in Java Baeldung" and you will get a pretty good, short guide on how to do it.
You will find the Spring configuration to connect to an H2 database, the Maven dependencies for Spring, and more at Baeldung. Everything is standard; you just need to copy-paste while also learning how things work.
Keep learning, I loved your spirit. :)
I want to store my blobs outside of the database, in files; however, they are just random blobs of data and aren't directly linked to any existing file.
So for example I have a table called Data with the following columns:
id
name
comments
...
I can't just include a column called fileLink or something like that, because the blob is just raw data. I do, however, want to store it outside of the database. I would love to create a file called 3.dat, where 3 is the id of that row. The only problem with this setup is that the main folder will quickly accumulate a very large number of files, since naming by id gives a flat folder structure, and that will run into OS file-system issues. And no, the data is not grouped or structured; it's one massive list.
Is there a Java framework or library that will allow me to store and manage the blobs so that I can just do something like MyBlobAPI.saveBlob(id, data); and then do MyBlobAPI.getBlob(id) and so on? In other words something where all the File IO is handled for me?
Simply use an appropriate database that implements blobs as you described, and use JDBC. You really are not looking for another API but for a specific implementation. It's up to the DB to take care of storing blobs efficiently.
I think a home-rolled solution would include something like a fileLink column in your table; your API would create files on the first save and then overwrite those files on update.
I don't know of any code base that will do this for you. There are several that provide an in-memory file system for Java, but it's only a few lines of code to write something that writes and reads Java objects to and from files.
You'll have to handle any file-system limitations yourself, though I doubt you'll ever burn through the limits of modern file systems like Btrfs or ZFS. FAT32 is limited to about 65K files per directory, but even last-generation file systems support something on the order of 4 billion files per directory.
So by all means, write a class with two functions: one to serialize an object to a file, giving it a unique key as its name, and another to deserialize the object by that key. If you are using a modern file system, you'll never run out of resources.
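A sketch of such a class, shaped like the MyBlobAPI from the question (the storage directory and the <id>.dat naming are illustrative):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MyBlobAPI {
    private final Path root;

    public MyBlobAPI(Path root) throws IOException {
        this.root = Files.createDirectories(root); // ensure the storage folder exists
    }

    public void saveBlob(long id, byte[] data) throws IOException {
        Files.write(root.resolve(id + ".dat"), data); // creates or overwrites <id>.dat
    }

    public byte[] getBlob(long id) throws IOException {
        return Files.readAllBytes(root.resolve(id + ".dat"));
    }
}
If the flat directory ever becomes a concern, these two methods can fan files out into subfolders (for example by id % 256) without changing the API.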
As far as I can tell there is no framework for this. The closest I could find was Hadoop's HDFS.
That being said, the advice of just putting the BLOBs into the database, as in the answers below, is not always advisable. Sometimes it's good and sometimes it's not; it really depends on your situation. Here are a few links to such discussions:
Storing Images in DB - Yea or Nay?
https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
I did find some additional really good links, but I can't remember them offhand. There was one in particular on Stack Overflow, but I can't find it. If you believe you know the link, please add it in the comments so that I can confirm it's the right one.
In my tiny little standalone Java application I want to store information.
My requirements:
read and write Java objects (I do not want to use SQL, and querying is not required)
easy to use
easy to setup
minimal external dependencies
I therefore want to use JAXB to store all the information in a simple XML file on the filesystem. My example application looks like this (copy all the code into a file called Application.java and compile; no additional requirements!):
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.transform.stream.StreamSource;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD) // bind fields directly, no getters/setters needed
class DataStorage {
    String emailAddress;
    List<String> familyMembers = new ArrayList<>(); // initialized to avoid an NPE on first run
    // List<Address> addresses;
}

public class Application {
    private static JAXBContext jc;
    private static File storageLocation = new File("data.xml");

    public static void main(String[] args) throws Exception {
        jc = JAXBContext.newInstance(DataStorage.class);
        DataStorage dataStorage = load();
        // the main application will be executed here
        // data manipulation like this:
        dataStorage.emailAddress = "me@example.com";
        dataStorage.familyMembers.add("Mike");
        save(dataStorage);
    }

    protected static DataStorage load() throws JAXBException {
        if (storageLocation.exists()) {
            StreamSource source = new StreamSource(storageLocation);
            return (DataStorage) jc.createUnmarshaller().unmarshal(source);
        }
        return new DataStorage();
    }

    protected static void save(DataStorage dataStorage) throws JAXBException {
        jc.createMarshaller().marshal(dataStorage, storageLocation);
    }
}
How can I overcome these downsides?
Starting the application multiple times could lead to inconsistencies: Several users could run the application on a network drive and experience concurrency issues
Aborting the write process might lead to corrupted data or losing all data
Seeing your requirements:
Starting the application multiple times
Several users could run the application on a network drive
Protection against data corruption
I believe that an XML-file-based store will not be sufficient. If you consider a proper relational database overkill, you could still go for H2. This is a super-lightweight DB that would solve all of the problems above (even if not perfectly, still much better than a handwritten XML store), and it is still very easy to set up and maintain.
You can configure it to persist your changes to disk, run it as a standalone server accepting multiple connections, or run it as part of your application in embedded mode.
Regarding the "How do you save the data" part:
In case you do not want to use any advanced ORM library (like Hibernate or any other JPA implementation) you can still use plain old JDBC. Or at least some Spring-JDBC, which is very lightweight and easy to use.
"What do you save"
H2 is a relational database, so whatever you save will end up in columns. But! If you really do not plan to query your data (nor apply migration scripts to it), saving your already-XML-serialized objects is an option. You can easily define a table with an ID plus a "data" varchar column and save your XML there. There is no practical limit on data length in H2.
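As a sketch, the "ID plus data column" idea over plain JDBC could look like this (the JDBC URL, table, and column names are illustrative):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class XmlInH2 {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./appdata", "sa", "")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS documents ("
                        + "id BIGINT PRIMARY KEY, data VARCHAR)");
            }
            // MERGE inserts the row, or updates it if the primary key already exists.
            try (PreparedStatement ps = conn.prepareStatement(
                    "MERGE INTO documents (id, data) VALUES (?, ?)")) {
                ps.setLong(1, 1L);
                ps.setString(2, "<dataStorage><emailAddress>me@example.com</emailAddress></dataStorage>");
                ps.executeUpdate();
            }
        }
    }
}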
Note: Saving XML in a relational database is generally not a good idea. I am only advising you to evaluate this option, because you seem confident that you only need a certain set of features from what an SQL implementation can provide.
Inconsistencies and concurrency are handled in two ways:
by locking
by versioning
Corrupted writes cannot be handled very well at the application level. The file system should support journaling, which mitigates this to some extent. You can also approximate it yourself by
making your own journaling file (i.e. a short-lived separate file containing the changes to be committed to the real data file).
All of these features are available in even the simplest relational databases, e.g. H2 or SQLite, and even a web page can use such features in HTML5. It is quite an overkill to reimplement them from scratch, and a proper implementation of the data-storage layer will actually make your simple needs quite complicated.
But, just for the record:
Concurrency handling with locks
prior to starting to change the XML, acquire a file lock to gain exclusive access to the file; see also How can I lock a file using java (if possible)
once the update is done and you have successfully closed the file, release the lock
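A minimal sketch of that locking pattern with java.nio (the file name is illustrative; note that FileLock is advisory on most platforms, so it only protects against other processes that also use locks):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class LockedWrite {
    public static void write(byte[] content) throws IOException {
        Path file = Paths.get("data.xml");
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // lock() blocks until an exclusive lock on the whole file is granted;
            // the try-with-resources releases it when done.
            try (FileLock lock = channel.lock()) {
                channel.truncate(0);
                channel.write(ByteBuffer.wrap(content));
            }
        }
    }
}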
Consistency (atomicity) handling with locks
other application instances may still try to read the file while one of the apps is writing it. This can cause inconsistency (a dirty read). Ensure that during writing the writer process holds an exclusive lock on the file. If an exclusive lock cannot be acquired, the writer has to wait a bit and retry.
an application reading the file shall read it (if it can gain access, i.e. no other instance holds an exclusive lock), then close the file. If reading is not possible (because another app holds the lock), wait and retry.
an external application (e.g. Notepad) can still change the XML. You may prefer to hold an exclusive lock even while only reading the file.
Basic journaling
Here the idea is that if you need to do a lot of writes (or might later want to roll back your writes), you don't want to touch the real file. Instead:
writes go, as change records, to a separate journaling file, created and locked by your app instance
your app instance does not lock the main file; it locks only the journaling file
once all the writes are good to go, your app opens the real file with an exclusive write lock, commits every change from the journaling file, then closes the file
As you can see, the solution with locks turns the file into a shared resource, protected by locks, that only one application can access at a time. This solves the concurrency issues, but it also makes file access a bottleneck. Therefore modern databases such as Oracle use versioning instead of locking. Versioning means that both the old and the new version of the file are available at the same time. Readers are served from the old, complete version. Once writing of the new version is finished, it is merged into the old version, and the new data becomes available all at once. This is trickier to implement, but since it allows all applications to read in parallel all the time, it scales much better.
To answer the three issues you mentioned:
Starting the application multiple times could lead to inconsistencies
Why would it lead to inconsistencies? If what you mean is that multiple concurrent edits will lead to inconsistencies, you just have to lock the file before editing. The easiest way is to create a lock file beside the data file. Before starting an edit, just check whether the lock file exists.
If you want to make it more fault-tolerant, you could also put a timeout on the lock, e.g. a lock file is valid for 10 minutes. You could write a randomly generated UUID into the lock file, and before saving, check that the UUID still matches.
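A sketch of that lock-file idea (the timeout, file naming, and class name are illustrative, and the exists-then-write sequence is not fully race-free; it is a best-effort guard, not a real mutex):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

public class LockFile {
    private static final Duration TIMEOUT = Duration.ofMinutes(10);
    private final Path lockPath;
    private final String token = UUID.randomUUID().toString();

    public LockFile(Path dataFile) {
        this.lockPath = dataFile.resolveSibling(dataFile.getFileName() + ".lock");
    }

    // Succeeds if no lock file exists, or the existing one is older than the timeout.
    public boolean tryAcquire() throws IOException {
        if (Files.exists(lockPath)) {
            Instant modified = Files.getLastModifiedTime(lockPath).toInstant();
            if (Instant.now().isBefore(modified.plus(TIMEOUT))) {
                return false; // someone else holds a still-valid lock
            }
        }
        Files.write(lockPath, token.getBytes());
        return true;
    }

    // Before saving, verify the lock file still contains our UUID.
    public boolean stillHeld() throws IOException {
        return Files.exists(lockPath)
                && new String(Files.readAllBytes(lockPath)).equals(token);
    }

    public void release() throws IOException {
        Files.deleteIfExists(lockPath);
    }
}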
Several users could run the application on a network drive and experience concurrency issues
I think this is the same as number 1.
Aborting the write process might lead to corrupted data or losing all data
This can be solved by making the write atomic or the files immutable. To make it atomic, instead of editing the file directly, copy the file and edit the copy; after the copy is saved, rename it over the original. If you want to be on the safer side, you could instead append a timestamp to the file name and never edit or delete a file: every time an edit is made, you save a new copy with a newer timestamp in its name, and for reading you always take the newest one.
Note that your simple answer won't handle concurrent writes by different instances: if two instances make changes and save, simply picking the newest file will lose the changes from the other instance. As mentioned in other answers, you should probably use file locking for this.
A relatively simple solution:
Use a separate lock file for writing, "data.xml.lck". Lock this when writing the file.
As mentioned in my comment, write to a temp file first, "data.xml.tmp", then rename it to the final name "data.xml" when the write is complete. This gives reasonable assurance that anyone reading the file will get a complete file.
Even with the file locking, you still have to handle the "merge" problem (one instance reads, another writes, then the first wants to write). To handle this, keep a version number in the file content. When an instance wants to write, it first acquires the lock, then checks its local version number against the file's version number. If it is out of date, it needs to merge what is in the file with its local changes; then it can write a new version.
After thinking about it for a while, I would want to try to implement it like this:
Open the data.<timestamp>.xml file with the latest timestamp.
Only use read-only mode.
Make changes.
Save the file as data.<timestamp>.xml with a new timestamp. Do not overwrite, and check that no file with a newer timestamp exists.
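A sketch of those steps (directory layout and naming are illustrative; zero-padding the timestamp makes file names sort chronologically as plain strings, and CREATE_NEW refuses to overwrite an existing file):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class TimestampedStore {
    private final Path dir;

    public TimestampedStore(Path dir) {
        this.dir = dir;
    }

    // Step 1: find the data.<timestamp>.xml file with the latest timestamp.
    public Optional<Path> latest() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().matches("data\\.\\d+\\.xml"))
                    .max(Comparator.comparing(p -> p.getFileName().toString()));
        }
    }

    // Step 4: save under a brand-new name. A full implementation would also
    // re-check that no file with a newer timestamp appeared in the meantime.
    public Path saveNew(byte[] content) throws IOException {
        String name = "data." + String.format("%020d", System.currentTimeMillis()) + ".xml";
        return Files.write(dir.resolve(name), content, StandardOpenOption.CREATE_NEW);
    }
}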
I'm currently working on a MapReduce job processing XML data, and I think there's something about the data flow in Hadoop that I'm not getting correctly.
I'm running on Amazon's ElasticMapReduce service.
Input data: large files (significantly above 64 MB, so they should be splittable), consisting of many small XML files that a previous s3distcp operation concatenated into one.
I am using a slightly modified version of Mahout's XmlInputFormat to extract the individual XML snippets from the input.
As a next step, I'd like to parse those XML snippets into business objects, which should then be passed to the mapper.
Now here is where I think I'm missing something: in order for that to work, my business objects need to implement the Writable interface, defining how to read/write an instance from/to a DataInput or DataOutput.
However, I don't see where this comes into play: the logic needed to read an instance of my object is already in the InputFormat's record reader, so why does the object have to be capable of reading/writing itself?
I have done quite some research already and I know (or rather assume) that WritableSerialization is used when transferring data between nodes in the cluster, but I'd like to understand the reasons behind that architecture.
The InputSplits are defined upon job submission, so if the name node sees that data needs to be moved to a specific node for a map task to work, would it not be sufficient to simply send the raw data as a byte stream? Why do we need to decode it into Writables if the RecordReader of our input format does the same thing anyway?
I really hope someone can show me the error in my thoughts above, many thanks in advance!
I have an application which stores information in a JList. However, of course, when the application is closed all of the information is deleted from memory.
I'm trying to build the app so that, when re-launched, it will contain the same data. Is there a way to store this data in a database or something similar? If so, where and how do I go about it?
The simplest way to persist IMHO is in a File.
Try using Properties if you need a key-value map.
Or, if you're binding more complex objects, I recommend the Simple XML serialization package.
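For example, a Properties-based save and load could look like this sketch (the file name and key are illustrative):
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.Properties;

public class PrefsDemo {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("lastItem", "buy milk");

        try (Writer out = new FileWriter("app.properties")) {
            props.store(out, "saved state"); // writes key=value lines plus a comment header
        }

        Properties loaded = new Properties();
        try (Reader in = new FileReader("app.properties")) {
            loaded.load(in);
        }
        System.out.println(loaded.getProperty("lastItem")); // prints: buy milk
    }
}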
You need to connect your application to a database using JDBC. JDBC stands for Java Database Connectivity. As you can see from the name, it lets you connect to a database, so you can link your application to one and store your data persistently. Here's a link to start off with. And here is something for further reading.
If the data is not complex and not large (no more than a few instances of a few objects), you could persist the list to a file using serialization. This will get you started. If your list is large or complex, you might consider a database. Searching for JDBC in your favorite search engine will get you started.
I think you want a plain flat file. It's simple; you can have one going in no time. (The learning curve is much less than with databases.) And it's fast; you can read a 1 GB file before you can even log on to a DB. Java serialization is a bit tricky, but it can be a very powerful way to save vast amounts of complicated data. (See here for things to watch out for, plus more helpful links.) If, for instance, you wanted to save a large, complex game between sessions, serializing it is the way to go. No need to convert an Object Oriented structure to a relational one.
Use a database:
if you want to add data to a large file, or read only part of the data from a large file. Or if other processes are going to read and modify it.
Consider a DB:
if you are already using one for other purposes. If the user might start on another machine and have trouble finding the file from the last session and the data is not too extensive. Or if the data is relational in nature anyway and someone else may be interested in looking at it.
So if you have a simple case where the user always starts in the same directory, just write and read a simple file. If you have a lot of complex, extensive OO data, use a flat file even if it is not easy to do--you'll need the speed. Otherwise, think about a DB.