I'm fairly new to programming, at least when it comes to anything substantial. I am about to start work on management software for my employer which draws its data from, and stores its data to, an SQL database. I will likely be using JDBC to interact with it.
To try and accurately describe the problem I am going to focus on a very small portion of the program. In the database, there is a table that stores Job records. There are a couple of thousand of them. I want to display all available Jobs (as a text reference from the table) in a scroll-able panel in the program with a search function.
So, my question is: should I create Job objects from each record in one go and have the program work with the objects to display them, or should I simply display strings taken directly from the records? The first method would mean that other details of each job are stored in advance, so that when I open a record in the UI the load times should be minimal. However, it also sounds like it would take a great deal of resources when it initially populates the panel and generates the objects. The second method would mean issuing a large number of queries to the database but might avoid the initial resource overhead. I don't want to put too much strain on the SQL Server, though, because other software in-house relies on it.
Really, I don't know anything about how I should be doing this. But that really is my question. Apologies if I am displaying my ignorance in this post, and thank you in advance for any help you can offer.
"A couple thousand" is a very small number for modern computers. If you have any sort of logic to perform on these records (they're not all modified solely via stored procedures), you're going to have a much easier time using an object-relational mapping (ORM) tool like Hibernate. Look into the JPA specification, which allows you to create Java classes that represent database objects and then simply annotate them to describe how they're stored in the database. Using an ORM like this system does have some overhead, but it's nearly always worthwhile, since computers are fast and programmers are expensive.
Note: This is a specific example of the rule that you should do things in the clearest and easiest-to-understand way unless you have a very specific reason not to, and in particular that you shouldn't optimize for speed unless you've measured your program's performance and have determined that a specific section of the code is causing problems. Use the abstractions that make the code easy to understand and come back later if you actually have to speed things up.
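To make the "load everything and work with objects" option concrete, here is a minimal sketch of searching a couple of thousand Job objects that are already in memory. The Job class, its two fields, and the JobSearch helper are all hypothetical names invented for illustration; the real table will have different columns, and an ORM would normally generate or map the class for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical value class; the real Job table will have different columns.
final class Job {
    final int id;
    final String reference;

    Job(int id, String reference) {
        this.id = id;
        this.reference = reference;
    }
}

final class JobSearch {
    // Case-insensitive substring match over jobs already held in memory.
    static List<Job> search(List<Job> jobs, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Job> matches = new ArrayList<>();
        for (Job job : jobs) {
            if (job.reference.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(job);
            }
        }
        return matches;
    }
}
```

A linear scan like this touches every record on each search, which is unlikely to matter at this size. If it ever does, measure first and then consider pushing the filter into the SQL query.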
Related
I fear I may not be truly understanding the utility of database software like MySQL, so perhaps this is an easy question to answer.
I'm writing a program that stores and accesses a bestiary for use in the program. It is a stand-alone application, meaning that it will not connect to the internet or a database (which I am under the impression requires a connection to a server). Currently, I have an enormous .txt file that it parses via a simple pattern (habitat is on every tenth line, starting with the seventh; name is on every tenth line, starting with the first; etc.). This is prone to parsing errors (problems with reading data that is unrecognizable with the specified encoding, as a lot of the data is copy/pasted by lazy data-entry-ists), and I just feel that parsing a giant .txt file every time I want data is horribly inefficient. Plus, I've never seen a deployed program that had a .txt lying around called "All of our important data.txt".
Are databases the answer? Can they be used simply in basic applications like this one? Writing a class for each animal seems silly. I've heard XML can help, too, but I know virtually nothing about it except that it's a mark-up language.
In summary, I just don't know how to store large amounts of data within an application. A good analogy would be: How would you store data for a dictionary/encyclopedia application?
So you are saying that a standalone application without internet access cannot have a database connection? Your basic assumption that a database cannot exist in a standalone app is wrong. Today's web applications use browser-assisted SQL databases to store data. All you need is to experiment rather than speculate. If you need direction, start with the lightweight SQLite.
While databases are undoubtedly a good idea for the kind of application you're describing, I'll throw another suggestion your way, which might suit you if your data doesn't necessarily need to change at all, and there's not a "huge" amount of it.
Java provides the ability to serialise objects, which you could use to persist and retrieve object instance data directly to/from files. Using this simple approach, you could:
Write code to parse your text file into a collection of serialisable application-specific object instances;
Serialise these instances to some file(s) which form part of your application;
De-serialise the objects into memory every time the application is run;
Write your own Java code to search and retrieve data from these objects yourself, for example using ordered collection structures with custom comparators.
This approach may suffice if you:
Don't expect your data to change;
Do expect it to always fit within memory on the JVMs you're expecting the application will be run on;
Don't require sophisticated querying abilities.
Even if one or more of the above things do not hold, it may still suit you to try this approach, so that your next step could be to use a so-called object-relational mapping tool like Hibernate or Castor to persist your serialisable data not in a file, but a database (XML or relational). From there, you can use the power of some database to maintain and query your data.
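The serialise/de-serialise steps above can be sketched roughly as follows. The Creature class, its fields, and BestiaryStore are hypothetical names chosen for illustration. The sketch writes to a byte array so it stays self-contained; swapping in a FileOutputStream and FileInputStream should work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type; the fields are guesses based on the question.
final class Creature implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String habitat;

    Creature(String name, String habitat) {
        this.name = name;
        this.habitat = habitat;
    }
}

final class BestiaryStore {
    // Serialises the whole list into a byte array.
    static byte[] save(List<Creature> creatures) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(creatures));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // De-serialises the list; the unchecked cast is only safe for data written by save().
    @SuppressWarnings("unchecked")
    static List<Creature> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Creature>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not read bestiary data", e);
        }
    }
}
```

Keep in mind that serialised files are tied to the class definition, so changing the fields of the class later can make old files unreadable.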
I'm working on an application for nursing students. It is a program where a user enters data about their patient's vitals, skin assessments, medicine administered, etc.
Flowchart for program structure in respect to Data:
That data needs to be saved in a structure divisible by Patient and then by the Time recorded. Problem is this is going to be a HUGE amount of data since entries need to be made every 15 minutes.
Flowchart for what interactions necessary between the project and its data:
The "request patient var over Time" and "request populate timeline" interactions both search for all entries of that patient between two given dates.
The best way I can think of how to organize this data is directory based:
data/PatientName/Month/19102012.file (the date 19 Oct 2012, for quick omission of ignored dates)
This way might be okay, but it feels really hacked together. What better organization should I use for this data?
I honestly don't think students entering patient data every 15 minutes qualifies as HUGE these days. As such, virtually any technology would be of use. Some sort of relational database is an obvious choice, and given the above, I don't think you need anything remotely enterprise-scale.
One question that springs to mind: is security important? This is medical data, after all. That may influence the technology you choose, since databases implement security in a radically different fashion to (say) the filesystem.
The one piece of advice I can give now is to abstract your data storage away from the rest of your solution. That way you can implement something trivial now and replace it easily in the future as your requirements solidify.
You can define a custom class (a POJO) containing all the needed parameters as properties, and store the instances of that POJO in a database.
Using a database might be an elegant way to handle a huge amount of data.
Your suggested directory-based approach would realistically probably be fine. As Brian and Rohit pointed out, the key is that you want to abstract out the data storage. In other words, you should have some interfaces between components of your system that provide the data access methods that you want, and then link up what you want (i.e., request a specific patient, over some time period, etc) with what you have (i.e., a filesystem, or a database, etc).
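As a rough sketch of what "abstracting out the data storage" could look like: the interface below is what the rest of the program would depend on, and the in-memory class is a trivial first implementation. The names VitalsStore and InMemoryVitalsStore, and the choice of a plain String per entry, are assumptions made for illustration only.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical storage contract; the rest of the program depends only on this.
interface VitalsStore {
    void save(String patientId, LocalDateTime recordedAt, String entry);

    // Entries for one patient with from <= time <= to, ordered by time.
    List<String> find(String patientId, LocalDateTime from, LocalDateTime to);
}

// Trivial in-memory implementation; a file- or database-backed class could replace it later.
final class InMemoryVitalsStore implements VitalsStore {
    private final Map<String, TreeMap<LocalDateTime, String>> byPatient = new HashMap<>();

    @Override
    public void save(String patientId, LocalDateTime recordedAt, String entry) {
        // One entry per timestamp per patient; a later save at the same time overwrites.
        byPatient.computeIfAbsent(patientId, id -> new TreeMap<>()).put(recordedAt, entry);
    }

    @Override
    public List<String> find(String patientId, LocalDateTime from, LocalDateTime to) {
        TreeMap<LocalDateTime, String> entries = byPatient.get(patientId);
        if (entries == null) {
            return new ArrayList<>();
        }
        // subMap with both bounds inclusive gives a time-ordered view of the range.
        return new ArrayList<>(entries.subMap(from, true, to, true).values());
    }
}
```

A directory-based or database-backed class can later implement the same interface, and the code that draws the timeline would not need to change.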
As Brian pointed out, in today's world "huge" refers to an entirely different scale than recording entries every 15 minutes. I would build something that works, and then address the scale problem when and if it arises. There are lots of other important things to worry about as well, such as security, reliability, etc.
I am writing a program in Java which tracks data about baseball cards. I am trying to decide how to store the data persistently. I have been leaning towards storing the data in an XML file, but I am unfamiliar with XML APIs. (I have read some online tutorials and started experimenting with the classes in the javax.xml hierarchy.)
The software has two major use cases: the user will be able to add cards and search for cards.
When the user adds a card, I would like to immediately commit the data to the persistent storage. Does the standard API allow me to insert data in a random-access way (or even appending might be okay)?
When the user searches for cards (for example, by a player's name), I would like to load a list from the storage without necessarily loading the whole file.
My biggest concern is that I need to store data for a large number of unique cards (in the neighborhood of thousands, possibly more). I don't want to store a list of all the cards in memory while the program is open. I haven't run any tests, but I believe that I could easily hit memory constraints.
XML might not be the best solution. However, I want to make it as simple as possible to install, so I am trying to avoid a full-blown database with JDBC or any third-party libraries.
So I guess I'm asking if I'm heading in the right direction and if so, where can I look to learn more about using XML in the way I want. If not, does anyone have suggestions about what other types of storage I could use to accomplish this task?
While I would certainly not discourage the use of XML, it does have some drawbacks in your context.
"Does the standard API allow me to insert data in a random-access way"
Yes, in memory. You will have to save the entire model back to file though.
"When the user searches for cards (for example, by a player's name), I would like to load a list from the storage without necessarily loading the whole file"
Unless you're expecting multiple users to be reading/writing the file, I'd probably pull the entire file/model into memory at load and keep it there until you want to save (doing periodic writes in the background is still a good idea).
I don't want to store a list of all the cards in memory while the program is open. I haven't run any tests, but I believe that I could easily hit memory constraints
That would be my concern too. However, you could use a SAX parser to read the file into a custom model. This would reduce the memory overhead (as DOM parsers can be a little greedy with memory).
"However, I want to make it as simple as possible to install, so I am trying to avoid a full-blown database with JDBC"
I'd do some more research in this area. I (personally) use H2 and HSQLDB a lot for storage of large amounts of data. These are small, personal database systems that don't require any additional installation (just a Jar file linked to the program) or special servers/services.
They make it really easy to build complex searches across the datastore that you would otherwise need to create yourself.
If you were to use XML, I would probably do one of three things
1 - If you're going to maintain the XML document in memory, I'd get familiar with XPath (simple tutorial & Java's API) for searching.
2 - I'd create a "model" of the data using Objects to represent the various nodes, reading it in using a SAX parser. Writing may be a little more tricky.
3 - Use a simple SQL DB (and Object model) - it will simplify the overall process (IMHO)
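For the SAX option, a minimal sketch might look like the following. It streams through the document and keeps only the matching player names, so the whole file never has to sit in memory as a DOM tree. The XML layout (a card element with a player attribute) and the CardReader class are assumptions for illustration; your file will likely be structured differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class CardReader {
    // Streams the document and keeps only the player names that contain the wanted text.
    static List<String> findPlayers(String xml, String wanted) {
        List<String> matches = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Assumed layout: <card player="..." year="..."/>
                if ("card".equals(qName)) {
                    String player = attrs.getValue("player");
                    if (player != null && player.contains(wanted)) {
                        matches.add(player);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse card data", e);
        }
        return matches;
    }
}
```

The sketch parses a String to stay self-contained; SAXParser.parse also accepts a File or an InputStream, which is what you would use for a file on disk.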
Additional
As if I hadn't dumped enough on you ;)
If you really want to use XML (and again, I wouldn't discourage you from it), you might consider having a look at an XML database style solution
Apache Xindice (apparently retired)
Or you could have a look at what some other people think
Use XML as database in Java
Java: XML into a Database, whats the simplest way?
For example ;)
The essence of my problem is that there are too many solutions, and I would like to find which one wins out in pros and cons before I build an infrastructure around it.
(Simplified for the purpose of this forum) This is an auction site where five auctions are stored in a rank #1-5, #1 being the currently featured auction. The other four are simply "on deck." After either a couple hours or the completion of that auction, #2-5 move up to #1-4 and a new one is chosen to be #5
I'm using a dedicated server and I've been considering just storing the data in the servlet or maybe adding a column in the database as a boolean for each auction...like "isFeatured = 1"
Suffice it to say the data is read about 5 times+ more often than it is written, which is why I'm leaning towards good old SQL.
If you can retrieve the relevant auctions from the DB with a simple query using ORDER BY and TOP (or something similar), try that first. If no performance issues occur, then KISS and you're done.
Otherwise, if these 5 auctions stay valid for a while, cache them in memory. Have a singleton holding these auctions and provide methods for updating them, for example. Maybe you want to use a caching library instead. Update this top 5 whenever necessary, but serve it directly out of memory without hitting a DB or something similarly expensive.
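A minimal sketch of such a singleton cache is below. FeaturedAuctions and its method names are hypothetical, and auctions are represented by plain String ids to keep the example short. Reads return an immutable snapshot, so they never touch the database and never see a half-updated list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the ranked auctions, kept in memory between writes.
final class FeaturedAuctions {
    private static final FeaturedAuctions INSTANCE = new FeaturedAuctions();

    // Immutable snapshot; replaced as a whole on every update.
    private volatile List<String> ranked = Collections.emptyList();

    private FeaturedAuctions() {}

    static FeaturedAuctions getInstance() {
        return INSTANCE;
    }

    // Served straight from memory; no database access on the read path.
    List<String> current() {
        return ranked;
    }

    // Called after loading the ranked auction ids from the database.
    synchronized void replaceAll(List<String> auctionIds) {
        ranked = Collections.unmodifiableList(new ArrayList<>(auctionIds));
    }

    // Drops #1, moves the rest up one rank, and appends the newly chosen auction.
    synchronized void rotate(String newAuctionId) {
        List<String> next = new ArrayList<>(ranked);
        if (!next.isEmpty()) {
            next.remove(0);
        }
        next.add(newAuctionId);
        ranked = Collections.unmodifiableList(next);
    }
}
```

This only works as-is with a single application server. With several servers, each would hold its own copy, and you would need a shared cache or a short expiry time instead.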
What kind of scale are you looking for? How many application servers need access to the data?
I think you're probably making this more complicated than it is. Just use a database, take a hit of ACID, and move onto whatever else you need to work on. :P
Have you taken a look at SQLite? It allows for "good old SQL" without all of the hassles of setting up a separate database server. As long as the data isn't too huge (to be fair, I haven't tested the size limits, but I've skimmed blog entries mentioning the use of SQLite to process files of several dozen MB in size quickly and with no problems), you should be fine.
It isn't a perfect solution for all needs (frankly, I sometimes find the dynamic typing to be a pain), but since it relies on locally stored files, reads will be much faster than firing up a network connection to talk to a more "traditional" RDBMS.
I am attempting to model a realistic social network (Facebook). I am a Computer Science Graduate student so I have a grasp on basic data structures and algorithms.
The Idea:
I began this project in Java. My idea is to create multiple Areas of Users. Each User in a given Area will have a random number of friends, normally distributed around a given mean. Each User will have a large percentage or cluster of "Friends" from the Area that they belong to. The remainder of their "Friends" will be smaller clusters from a few different random Areas.
Initial Structure
I wanted to create an ArrayList of areas
ArrayList<Area> areas
With each Area holding an ArrayList of Users
ArrayList<User> users
And each User holding an ArrayList of "Friends"
ArrayList<User> friends
From there I can go through each Area, and each User in that Area and give that user most of their friends from that Area, as well as a few friends from a few random Areas. This is easy enough as long as my data set remains small.
The problem:
When I try to create large data sets, I get an OutOfMemoryError because the heap runs out of memory. I now realize that this approach will be impossible if I want to create, say, 30 Areas with 1 million users per Area and 200 friends per User. I eat up almost 2 GB with 1 Area. So now what? My algorithm would work if I could create all the users ahead of time, then simply "give" friends to each user. But I need the Areas and Users created first: there needs to be a User in an Area before it can be made a "friend".
Next Step:
I like my algorithm; it is simple and easy to understand. What I need is a better way to store this data, since it can't be stored and held in memory all at once. For each user, I am going to need to access not only the Area the user belongs to, but also a few random Areas as well.
My Questions:
1. What technology/data structure should I be putting this data into? In the end I basically want a User->Friends relationship. The "Area" idea is a way to make this relationship realistic.
2. Should I be using a different language altogether? I know that technologies such as Lucene, Hadoop, etc. were created with Java and are used for large amounts of data, but I have never used them and would like some guidance before I dive into something new.
3. Where should I begin? Obviously I cannot use only Java with the data in memory. But I also need to create these Areas of Users before I can give a User a list of Friends.
Sorry for the semi-long read, but I wanted to lay out exactly where I am so you could guide me in the right direction. Thank you to everyone that took the time to read/help me with this topic.
You need a searchable storage solution to hold your data (rather than holding it all in memory). Either a relational database (such as Oracle, MySQL, or SQL Server) with an O/RM (such as Hibernate) or a NoSQL database such as MongoDB will work just fine.
Use a database with some ORM tool (JPA with Hibernate, etc.).
Load data lazily, only when it is really needed.
Unload it from the cache/session when it is no longer required or is inactive.
Feel free to let me know if anything is unclear.
http://puspendu.wordpress.com/
There is probably no benefit keeping it all in memory, unless you are planning on using every node in some visual algorithm to display relationships.
So, if you use a database then you can build your relationships, give random demographic information, if you want to model that also, and then it is a matter of just writing your queries.
But if you do need a large amount of data in memory, then with 64-bit Java you can set the maximum heap size to a much larger number, depending on how much memory your computer has.
So, once you built your relationships, then you can begin to write the queries to relate the information in different ways.
You may want to look at using Lists instead of arrays when sizes differ, so that you aren't wasting memory when you read the data back. I expect that is the main reason you are running out of memory. If you assume that there are 100 users and the largest number of friends for any of them is 50, but most have 10, then for the vast majority of users you are wasting space. That matters especially when you are dealing with millions of users, as the pointer for each object becomes non-trivial.
You may want to re-examine your data structures; I expect you have some inefficiencies there.
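To put a number on how much the pointers cost, here is a back-of-the-envelope estimate using the figures from the question (30 Areas, 1 million users each, 200 friends per user). The helper class is hypothetical. The bytes-per-reference value is an assumption: 4 bytes is typical for a 64-bit HotSpot JVM with compressed references, and 8 bytes without them.

```java
final class FriendGraphEstimate {
    // Bytes needed just for the friend references, ignoring User objects and list overhead.
    static long referenceBytes(long areas, long usersPerArea, long friendsPerUser, long bytesPerReference) {
        return areas * usersPerArea * friendsPerUser * bytesPerReference;
    }
}
```

That is 30 x 1,000,000 x 200 = 6,000,000,000 references, so about 24 GB at 4 bytes each or 48 GB at 8 bytes each, before counting the User objects themselves. A bigger heap alone is unlikely to be enough at that scale, which is another argument for keeping the relationships in a database.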
You may also want to use some monitoring tools, and this page may help:
http://www.scribd.com/doc/42817553/Java-Performance-Monitoring
Even something as simple as jconsole would help you to see what is going on with your application.
Well you are not breaking new ground here and there are a lot of existing models that you can pull great amounts of information from and tailor to suit your needs. Especially if you are open to the technologies used. I understand your desire to have it fill this huge number from the start but keep in mind a solid foundation can be built upon and changed as needed without a complete rewrite.
There is some good info and many links to additional good info as to what FB, LinkedIn, Digg, and others are doing here at Stackoverflow question 1009025