I work on a graph where I visualize my emails. I want to be able to get the emails from a certain day.
Is this a bad way to store them?
HashMap<DateTime, ArrayList<Email>>
Or is it better to convert the date to a string and then use HashMap<String, ArrayList<Email>>
Note, the dates are added without hours, minutes and seconds, so just like 06/07/2010 for example.
DateTime has properly defined equals and hashCode methods, so using it as the key in a HashMap is perfectly OK. There's not much to be gained by converting the dates to strings first.
I would suggest, however, that if you only want to store the year/month/day components, then you may want to use LocalDate instead of DateTime.
Additionally, you could also consider using TreeMap rather than HashMap, so that your map is automatically sorted by date. Might be handy.
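A minimal sketch of that suggestion. It assumes java.time.LocalDate (Java 8+) as a stand-in for Joda Time's LocalDate, and the Email class and the EmailsByDay name are hypothetical placeholders for the asker's own types:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EmailsByDay {
    // Hypothetical stand-in for the asker's Email class.
    static final class Email {
        final String subject;
        final LocalDate sentOn;

        Email(String subject, LocalDate sentOn) {
            this.subject = subject;
            this.sentOn = sentOn;
        }
    }

    // TreeMap keeps the days in chronological order.
    private final Map<LocalDate, List<Email>> byDay = new TreeMap<>();

    void add(Email email) {
        // Create the list for this day on first use, then append.
        byDay.computeIfAbsent(email.sentOn, day -> new ArrayList<>()).add(email);
    }

    List<Email> on(LocalDate day) {
        // An empty list rather than null when nothing was sent that day.
        return byDay.getOrDefault(day, Collections.emptyList());
    }
}
```

Swapping TreeMap for HashMap changes only the iteration order; the lookup by day works the same way.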
I am doing some basic work (edit: reading and writing to a txt file), which requires me to store a bunch of expenses and their attributes (i.e. name, price, date of purchase, etc.). I would like to be able to compare dates of purchases if possible. It occurred to me that I had a few options when it came to what type of object the date of purchase should be:
I could make the date a Calendar object and store it in the .txt file. This would mean storing lots of Calendar objects at once, but I could then easily compare the dates
I could make the date a String, store it, transmute it to a Calendar object, and then compare them
I could leave the dates as strings and when I am ready to compare them, create some kind of code to go through individual characters and pick out a certain phrase or set of characters.
Which of these would probably be best for keeping the load on the computer down? Also, how would you go about loading objects as they build up over time? Once a person has a lot of spending, it would get pretty hefty to load every single item.
I would strongly suggest using Joda Time wherever possible, rather than Calendar and Date - it's a much cleaner date/time API.
Beyond that, definitely make your object model match your domain as closely as possible. You're dealing with dates, not strings - so make your object model reflect that. You should be converting between strings and dates as rarely as possible. It's not clear what you mean by "store it on the .txt" (given that elsewhere you're talking about a database) but using JDBC you'd use parameters anyway, without string conversions.
As for load - work out your performance requirements beforehand, try the simplest approach that works, and test whether that meets your requirements. Usually when people talk about having to have an efficient solution they haven't actually considered what they need. You talk about it getting "pretty hefty" to load every single item - how many items? Can you load them in a batch? Where will the database be? You'd be amazed how much data can be processed these days - but you need to understand the parameters of your problem before you make too many decisions that are hard to change later.
A SortedSet sorts itself automatically, but in some cases it doesn't work the way I want. For example, I stored String date values in a SortedSet, but the result didn't match my expectation. This is what I got:
[03-10-2013, 06-10-2013, 08-10-2013, 09-10-2013, 18-09-2013, 24-09-2013, 29-09-2013]
Is there any good way to deal with this problem without having to introduce a comparator?
The best way is to avoid using String to represent a Date. Use a Date, which has a natural chronological order. Transform the date to a String only when necessary, i.e. to display it to users or store them in files.
The reason it doesn't work is that the natural ordering of String is the lexicographic order. So "18-09-2013" comes after "03-10-2013", simply because '1' comes after '0' in the lexicographic order.
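As a rough illustration, the strings can be parsed into date objects before they go into the set. This sketch assumes java.time.LocalDate (Java 8+) instead of java.util.Date and a dd-MM-yyyy input format as in the question; the ChronologicalSet name is made up for the example:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class ChronologicalSet {
    // Assumed input format, matching strings such as "18-09-2013".
    private static final DateTimeFormatter DAY_MONTH_YEAR =
            DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // Parses each string and lets the TreeSet order the dates chronologically.
    static SortedSet<LocalDate> parseAll(Iterable<String> texts) {
        SortedSet<LocalDate> dates = new TreeSet<>();
        for (String text : texts) {
            dates.add(LocalDate.parse(text, DAY_MONTH_YEAR));
        }
        return dates;
    }

    public static void main(String[] args) {
        SortedSet<LocalDate> dates = parseAll(
                Arrays.asList("03-10-2013", "18-09-2013", "29-09-2013", "06-10-2013"));
        // Convert back to text only for display.
        for (LocalDate date : dates) {
            System.out.println(DAY_MONTH_YEAR.format(date));
        }
    }
}
```

No comparator is needed because LocalDate, like Date, has a natural chronological ordering.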
Use a set of either:
Date objects (java.util.Date) or
Time in milliseconds (java.lang.Long)
These objects can be compared much more easily.
My reason for doing so is that dates stored as date objects in whatever database tend to be written in a specific format, which may greatly differ from what you need to present to the user on the front-end. I also think it's especially helpful if your application is pulling info from different types of data stores. A good example would be the difference between a MongoDB and SQL date object.
However, I don't know whether this is recommended practice. Should I keep storing dates as longs (time in milliseconds) or as date objects?
I can't speak for it in relation to MongoDB, but in SQL database, no, it's not best practice. That doesn't mean there might not be the occasional use case, but "best practice," no.
Store them as dates, retrieve them as dates. Your best bet is to set up your database to store them as UTC (loosely, "GMT") so that the data is portable and you can use different local times as appropriate (for instance, if the database is used by geographically diverse users), and handle any conversions from UTC to local time in the application layer (e.g., via Calendar or a third-party date library).
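A small sketch of the "store UTC, convert in the application layer" idea. It assumes java.time (Java 8+) rather than Calendar, and the UtcToLocal class and the chosen zone IDs are only illustrative:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class UtcToLocal {
    // Converts an instant stored as UTC into wall-clock time for the given zone.
    static ZonedDateTime toLocal(Instant storedUtc, String zoneId) {
        return storedUtc.atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        // Pretend this value came back from the database.
        Instant stored = Instant.parse("2013-01-15T12:00:00Z");
        System.out.println(toLocal(stored, "Europe/Paris"));
        System.out.println(toLocal(stored, "America/New_York"));
    }
}
```

The stored value never changes; only the presentation differs per user.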
Storing dates as numbers means your database is hard to report against, run ad-hoc queries against, etc. I made that mistake once, it's not one I'll repeat without a really good reason. :-)
It very much depends on:
What database you're using and its date/time support
Your client needs (e.g. how happy are you to bank on the idea that you'll always be using Java)
What information you're really trying to represent
Your diagnostic tools
The third point is probably the most important. Think about what the values you're trying to store really mean. Even though you're clearly not using Noda Time, hopefully my user guide page on choosing which Noda Time type to use based on your input data may help you think about this clearly.
If you're only ever using Java, and your database doesn't have terribly good support for date/time types, and you're only trying to represent an "instant in time" (rather than, say, an instant in a particular time zone, or a local date/time with an offset, or just a local date/time, or just a date...), and you're comfortable writing diagnostic tools to convert your data into more human readable forms - then storing a long is reasonable. But that's a pretty long list of "if"s.
If you want to be able to perform date manipulation in the database - e.g. asking for all values which occur on the first day of the month - then you should probably use a date/time type, being careful around time zones. (My experience is that most databases are at least shockingly badly documented when it comes to their date/time types.)
In general, you should use whatever type is able to meet all your requirements and is the most natural representation for that particular environment. So in a database which has a date/time type which doesn't give you issues when you interact with it (e.g. performing arbitrary time zone conversions in an unrequested way), use that type. It will make all kinds of things easier.
The advantage of using a more "primitive" representation (e.g. a 64 bit integer) is precisely that the database won't mess around with it. You're effectively hiding the meaning of the data from the database, with all the normal pros and cons (mostly cons) of that approach.
It depends on various aspects. When using the standard "seconds since epoch", and someone uses only 32-bit integer precision, their dates are limited to roughly the 1901-2038 range with a signed integer (1970-2038 if negative values are not used).
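The limits of a signed 32-bit seconds counter can be checked directly. This is a small sketch using java.time.Instant (Java 8+); the EpochRange name is just for the example:

```java
import java.time.Instant;

class EpochRange {
    // Interprets a 32-bit value as seconds since 1970-01-01T00:00:00Z.
    static Instant fromSeconds(int secondsSinceEpoch) {
        return Instant.ofEpochSecond(secondsSinceEpoch);
    }

    public static void main(String[] args) {
        System.out.println(fromSeconds(Integer.MIN_VALUE)); // 1901-12-13T20:45:52Z
        System.out.println(fromSeconds(0));                 // 1970-01-01T00:00:00Z
        System.out.println(fromSeconds(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z
    }
}
```

A 64-bit long, as used for milliseconds in Java, does not hit this limit in any practical time frame.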
But there is also a precision issue. For example, Unix time ignores leap seconds. Every day is defined to have the same number of seconds. So when computing time deltas between Unix timestamps, you do get some error.
But the more important thing is that you assume all your dates to be completely known, as your representation does not have the possibility to hold half-specified dates. In reality, there are a lot of events you do not know at a second (or even ms) precision. So it is a feature if a representation allows specifying e.g. only a day precision. Ideally, you would store dates with their precision information.
Furthermore, say you are building a calendar application. There is time, but there also is local time. Quite often, you need both information available. When scheduling overlaps, you of course can do this best in a synchronized time, so longs will be good here. If you however do also want to ensure you are not scheduling events outside of 9-20 h local time, you also always need to preserve timezone information. For anything that does span more than one location, you really need to include the time zone in your date representation. Assuming that you can just convert all dates you see to your current local time is quite naive.
Note that dates in SQL can lead to odd situations. One of my favorites is the following MySQL absurdity:
SELECT * FROM Dates WHERE date IS NULL AND date IS NOT NULL;
may return records that have the date 0000-00-00 00:00:00, although this violates the popular understanding of logic.
Since this question is tagged with MongoDB: MongoDB does not store dates in String or what not, they actually store it as a long ( http://www.mongodb.org/display/DOCS/Dates ):
A BSON Date value stores the number of milliseconds since the Unix epoch (Jan 1, 1970) as a 64-bit integer. v2.0+ : this number is signed so dates before 1970 are stored as a negative numbers.
Since MongoDB has no immediate plans to utilise the complex date handling functions (like getting only the year for querying etc.) that SQL has within standard querying, there is no real downside; it might in fact reduce the size of your indexes and storage.
There is one thing to take into consideration here, the aggregation framework: http://docs.mongodb.org/manual/reference/aggregation/#date-operators there are weird and wonderful things you can only do with the supported BSON date type in MongoDB; however, whether this matters to you depends upon your queries.
Do you see yourself as needing the aggregation framework's functions? Or would housing the extra object overhead be a pain?
My personal opinion is that the BSON date type is such a small object that to store a document without it would be detrimental to the entire system and its future compatibility for no apparent reason. So, yes, I would use the BSON date type rather than a long and I would consider it good practice to do so.
I don't think it's a best practice to store dates as long, because that would mean that you would not be able to do any of the date-specific queries, like:
where date between
We also won't be able to easily get the day, month or year from the table using SQL queries.
It is better to use a single date format converter in the java layer and convert the date into that and use a single format throughout the application.
IMHO, storing dates in the DB is best done as Strings if you can. That avoids unnecessary data going back and forth to the server if you don't need all the fields in Calendar.
There is a lot of data in a Calendar, and each instance of Calendar is pretty heavy too.
So store it as a String with only the required data, and convert it back to a Calendar whenever you need to use it.
I have a rather large text file (~4m lines) I'd like to parse and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
Date Time Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
I want to group the lines by date so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line but is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons) but it's the List as a value that I'm worried might be unsuitable.
I'm using a TreeMap because I want to iterate the keys in date order.
There's nothing wrong with using a List as the value for a Map. All of those <> look ugly, but it's perfectly fine to put a generic class inside of a generic class.
Instead of using a String as the key, it would probably be better to use java.util.Date because the keys are dates. This will allow the TreeMap to more accurately sort the dates. If you store the dates as Strings, then the TreeMap may not properly sort the dates (they will be sorted as strings, not as "real" dates).
Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
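A sketch of how the grouping might look, assuming the "Date Time Value" layout from the question with the header line already skipped. It uses java.time.LocalDate (Java 8+) as the key instead of java.util.Date, and the LinesByDate name is invented for the example:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class LinesByDate {
    // Maps each date to the remainder of every line that starts with that date.
    static SortedMap<LocalDate, List<String>> group(Iterable<String> lines) {
        SortedMap<LocalDate, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            // Split into the date column and everything after it.
            String[] parts = line.trim().split("\\s+", 2);
            LocalDate date = LocalDate.parse(parts[0]); // expects ISO yyyy-MM-dd
            String rest = parts.length > 1 ? parts[1] : "";
            byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(rest);
        }
        return byDate;
    }
}
```

Iterating over byDate.entrySet() then visits the dates in chronological order.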
is a TreeMap of Lists a ridiculous thing to do?
Conceptually not, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more. Which may or may not be acceptable, depending on how much memory you have to waste.
For a more memory-efficient solution, create a class that has fields for every column (including a Date), put all those in a List and sort it (ideally using quicksort) when you're done reading.
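One possible shape for that approach, as a sketch only. The Reading class, its field names, and the "yyyy-MM-dd HH:mm" format are assumptions based on the sample lines in the question. Note that List.sort in the JDK is a merge sort variant rather than quicksort:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class Readings {
    // One object per line, with a typed field for each column.
    static final class Reading {
        final LocalDateTime when;
        final int value;

        Reading(LocalDateTime when, int value) {
            this.when = when;
            this.value = value;
        }
    }

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    static Reading parse(String line) {
        String[] parts = line.trim().split("\\s+");
        LocalDateTime when = LocalDateTime.parse(parts[0] + " " + parts[1], FORMAT);
        return new Reading(when, Integer.parseInt(parts[2]));
    }

    // Reads everything first, then sorts once by timestamp.
    static List<Reading> parseAndSort(List<String> lines) {
        List<Reading> readings = new ArrayList<>(lines.size());
        for (String line : lines) {
            readings.add(parse(line));
        }
        readings.sort((a, b) -> a.when.compareTo(b.when));
        return readings;
    }
}
```

Once sorted, readings for the same date sit next to each other, so grouping is a single pass over the list.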
There is no objection against using Lists. Though in your case maybe a List<Integer> as values of the Map would be appropriate.
What I need is something like Hashtable which I will fill with prices that were actual at desired days.
For example: I will put two prices: January 1st: 100USD, March 5th: 89USD.
If I search my hashtable for a price with hashtable.get(February 14th), I need it to give me back the price that was entered for Jan. 1st, because that is the latest price in effect. A normal hashtable implementation won't give me back anything, since nothing was put in for that date.
I need to see if there is such implementation which can find quickly object based on range of dates.
Off the top of my head, there are a couple of ways, but I would use a TreeMap keyed by Date (or Calendar, etc.).
When you need to pull out a Date date, try the following:
Attempt to get(date)
If the result is null, then the key to look up instead is headMap(date).lastKey()
One of those will work. Of course, check the size of headMap(date) first because lastKey() will throw an Exception if it is empty.
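The two steps above can also be collapsed into one call, since TreeMap offers floorEntry, which returns the entry with the greatest key less than or equal to the given one (or null if there is none). A minimal sketch, assuming java.time.LocalDate keys and a made-up PriceHistory class:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

class PriceHistory {
    private final TreeMap<LocalDate, Double> prices = new TreeMap<>();

    // Records a price that applies from the given day onwards.
    void put(LocalDate validFrom, double price) {
        prices.put(validFrom, price);
    }

    // Returns the most recent price at or before the day, or null if none exists.
    Double priceOn(LocalDate day) {
        Map.Entry<LocalDate, Double> entry = prices.floorEntry(day);
        return entry == null ? null : entry.getValue();
    }
}
```

With prices entered for January 1st and March 5th, a lookup for February 14th falls back to the January 1st entry, and a lookup before January 1st yields null instead of an exception.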
You could use a DatePrice object that contains both and keep those in a list or array sorted by date, then use binary search (available in the Collections and Arrays classes) to find the nearest date.
This would be significantly more memory-efficient than using TreeMap, and it doesn't look like you'll want to insert or remove data randomly (which would lead to bad performance with an array).
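A sketch of the sorted-list variant. The DatePrice fields and the priceOn helper are hypothetical; the only library behaviour relied on is that Collections.binarySearch returns -(insertion point) - 1 when the key is not found:

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.List;

class DatePrices {
    static final class DatePrice implements Comparable<DatePrice> {
        final LocalDate date;
        final double price;

        DatePrice(LocalDate date, double price) {
            this.date = date;
            this.price = price;
        }

        // Ordering by date only, which is all the search needs.
        @Override
        public int compareTo(DatePrice other) {
            return date.compareTo(other.date);
        }
    }

    // The list must already be sorted by date. Returns null if day precedes all entries.
    static DatePrice priceOn(List<DatePrice> sortedByDate, LocalDate day) {
        int index = Collections.binarySearch(sortedByDate, new DatePrice(day, 0.0));
        if (index < 0) {
            // Not found: step back from the insertion point to the previous entry.
            index = -index - 2;
        }
        return index >= 0 ? sortedByDate.get(index) : null;
    }
}
```

Use a random-access list such as ArrayList here; on a LinkedList the binary search degrades to linear traversal.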
Create a TreeMap<Date, String>. When someone asks for a date, convert the string to a Date and call map.get(date); if nothing is found, take the closest key before the requested date.
You have all your tools already at hand. Consider a TreeMap. Then you can create a headMap that contains only the portion of the map that is strictly lower than a given value. Implementation example:
TreeMap<Date, Double> values = new TreeMap<Date, Double>();
...fill in stuff...
Date searchDate = ...anydate...
// headMap is strictly-less-than, so add 1 ms to include searchDate itself:
Date mapConstraintDate = new Date(searchDate.getTime() + 1);
// lastKey() throws NoSuchElementException if nothing is at or before searchDate
Double searchedValue = values.get(values.headMap(mapConstraintDate).lastKey());
This is efficient, because the headMap is not created by copying the original map; it only returns a view.