Better data structure for retrieving data in between dates


I have a class which stores a date as the key and a price as the value. My data structure stores about 5M entries. When I want to retrieve the data in a certain date range, I loop through the whole data structure and check whether each entry is within the date range.
e.g.
if (startDate <= data.date && data.date <= endDate)
    // do something
However, this is extremely inefficient. Is there a better way to do this?

If memory/performance is not a constraint*, you could simply use a TreeMap, which has a subMap method that allows you to filter on a time window:
TreeMap<Date, Double> data = ...;
for (Double price : data.subMap(startDate, true, endDate, true).values()) {
//do something with price
}
*i.e. if you don't need to keep the prices as primitive doubles for example

Make sure that the data is ordered by the key (i.e. by the date).
Use binary search to find the start date
Enumerate as long as you haven't reached the end date
Voila
Edit: Yup, using a TreeMap automates the job pretty well. Don't know if you are allowed to change your data structure.
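The binary-search steps above could be sketched roughly as follows. This is only an illustration, assuming the data is held in two parallel arrays already sorted by date; the class name DateRangeSearch, the method names, and the use of java.time.LocalDate are all assumptions, not part of the question.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: dates and prices are parallel arrays sorted by date.
class DateRangeSearch {

    /** Index of the first date that is not before target (a lower bound). */
    static int lowerBound(LocalDate[] sortedDates, LocalDate target) {
        int lo = 0;
        int hi = sortedDates.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedDates[mid].isBefore(target)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Prices whose date lies in [start, end], both ends inclusive. */
    static List<Double> pricesBetween(LocalDate[] sortedDates, double[] prices,
                                      LocalDate start, LocalDate end) {
        List<Double> result = new ArrayList<>();
        // Binary search finds the start; then enumerate until the end date is passed.
        for (int i = lowerBound(sortedDates, start);
             i < sortedDates.length && !sortedDates[i].isAfter(end); i++) {
            result.add(prices[i]);
        }
        return result;
    }
}
```

The search costs O(log n) plus the size of the result, and primitive arrays keep the prices unboxed, which a TreeMap cannot do.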

Related

Sorting arrays different ways in Java

I have a map(String, LinkedList) mp. This corresponds to a table like this:
Name Date Dept
Tony 4/1/2014 55125
Bob 3/2/2013 54112
Jill 7/2/2014 55265
(I just made these up for this example.) The first row (Name, Date, Dept) contains the headings, which correspond to the keys in the map (so you do an mp.get("Name"), for example). The linked list returned for Name is <Tony, Bob, Jill>, for Date it is <4/1/2014, 3/2/2013, 7/2/2014>, and for Dept it is <55125, 54112, 55265>.
I need to get the list for a value and then sort it. For Name and Dept I believe I can just do a Collections.sort(), but this will not work for date. If date was yyyymmdd it would work, but date is mm/dd/yyyy which is not guaranteed to sort correctly, and usually won't. I suppose I could process each entry and change it to yyyymmdd, but some of the fields will be blank (or maybe null but I think blank).
I can easily determine when to sort normally and when by date. Just not sure how to use Collections.sort() or something else to sort the date list. Can anyone help?

The Collections.sort method allows you to (optionally) specify your own Comparator. In your case, when you sort the date column you will need to provide your own comparator.
If you are using Java 8 then the code will look something like this:
Collections.sort(dataMap.get("Date"), (date1, date2) -> compareDates(date1, date2));
The compareDates method would then need to handle nulls and blanks, parse the date components, and return a negative number, zero, or a positive number depending on whether date1 is before, equal to, or after date2. You could easily use java.util.Date for that: convert the strings (possibly using DateFormat) and then compare.
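One possible shape for such a compareDates method is sketched below. It swaps java.util.Date and DateFormat for java.time.LocalDate and DateTimeFormatter, and it assumes that blank and null entries should sort after all real dates; the class name DateColumnSort and the helper parseOrNull are made up for this illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for sorting a column of month/day/year strings.
class DateColumnSort {
    // "M/d/yyyy" parses both 4/1/2014 and 04/01/2014.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Assumption: entries without a date go to the end of the list.
    private static final Comparator<LocalDate> NULLS_LAST =
            Comparator.nullsLast(Comparator.<LocalDate>naturalOrder());

    /** Returns null for null or blank input instead of throwing. */
    static LocalDate parseOrNull(String text) {
        if (text == null || text.trim().isEmpty()) {
            return null;
        }
        return LocalDate.parse(text.trim(), FORMAT);
    }

    /** Negative, zero, or positive, as a Comparator requires. */
    static int compareDates(String date1, String date2) {
        return NULLS_LAST.compare(parseOrNull(date1), parseOrNull(date2));
    }

    static void sortDates(List<String> dates) {
        dates.sort(DateColumnSort::compareDates);
    }
}
```

This version re-parses the strings on every comparison, which is fine for small lists; for large ones it may be worth parsing each entry once up front.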

Storing tables in java for referencing

So the question is regarding optimization of the code. I have a table of retirement ages, which I am going to list below:
Year of Birth Full Retirement Age
1937 or earlier.............................65
1938........................................65 years 2 months
1939........................................65-4
1940.......................................65-6
.
.
.and the list is a long list
What I want to do is store this table in a list object or something similar, so that I can pass the year of birth and the list object to a method and get back the corresponding retirement age. I don't want a lot of if and else statements in my code, because the list is very long and the code would be confusing.
What can be a possible solution for this problem?
Thanks in advance

Try using map instead of list. Use year of birth as key, so that you can directly get the associated value from the map.

You can use a map, but there is a chance of duplicate keys.
Two people can be born in the same year.
Use a Multimap
A Multimap that can hold duplicate key-value pairs and that maintains the insertion ordering of values for a given key. See the Multimap documentation for information common to all multimaps.

Use a map. A Map is a collection of key-value pairs.
Map<String, Object> map = new HashMap<String, Object>();
map.put("1937", 65);
...
To go through a map you can use this:
for (String key : map.keySet()) {
System.out.println(map.get(key));
}
You can change values for <String, Object> as you wish (Integer, Date... or whatever). Always follow the same order <KeyType, ValueType>

Store your list/table into a HashMap...then retrieve from your method, something like:
public String getRetirementAge(String yearOfBirth) {
return yourMap.get(yearOfBirth);
}

If you have data for every year, I would use a Java map (http://docs.oracle.com/javase/tutorial/collections/interfaces/map.html) where the key is the year and the value is the retirement age.
With a HashMap this gives you O(1) lookups.
If you have sparse data and somehow have to find the nearest year, you could either use a sorted List with binary search, which gives you O(log n), or even use a B-tree.
BR,
David
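For the sparse case, a sorted map can do the nearest-year lookup for you. The sketch below uses TreeMap.floorEntry and contains only the three rows that are unambiguous in the question; the class name RetirementTable, the choice to store ages in months, and the fallback for years before the first row are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lookup table: birth year -> full retirement age in months.
class RetirementTable {
    private final TreeMap<Integer, Integer> ageInMonthsByYear = new TreeMap<>();

    RetirementTable() {
        // Only rows quoted in the question; the rest of the table is omitted here.
        ageInMonthsByYear.put(1937, 65 * 12);      // "1937 or earlier": 65
        ageInMonthsByYear.put(1938, 65 * 12 + 2);  // 65 years 2 months
        ageInMonthsByYear.put(1939, 65 * 12 + 4);  // 65-4
    }

    int fullRetirementAgeInMonths(int yearOfBirth) {
        // floorEntry returns the row with the greatest key <= yearOfBirth.
        Map.Entry<Integer, Integer> entry = ageInMonthsByYear.floorEntry(yearOfBirth);
        if (entry == null) {
            // Before the first row: covered by the "or earlier" wording.
            entry = ageInMonthsByYear.firstEntry();
        }
        return entry.getValue();
    }
}
```

Note that any year after the last row falls back to the last row, so the table has to be complete at its upper end for this to give sensible answers.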

I would recommend that you store this information in a database, especially if the list is a very long list (which you say it is). There will be many optimizations that come from using a database. For one thing, you won't have to store that huge list in memory. For another, SQL queries for data are often much faster than data structures in code. Martin Fowler has an (admittedly old) article about this at http://www.martinfowler.com/articles/dblogic.html. The other thing you gain from putting this in a database is that this is the type of list that is likely to change. They are already talking about adjusting retirement age in order to save social security. It is much easier to update data in a database than it is to edit code and recompile / redeploy.
The type of database you use can be NoSQL or relational, embedded or online. That decision I'll leave up to you. It will be a bonus for you if there is already a database available to this application for other reasons.

Tweet Analysis: How to design

I need advice in designing a system meant for tweet analysis.
Objective: For a given hashtag, find out the frequency of co-occurrence with other hash-tags. Find out hourly pattern. We should be able to answer queries of this format: For a given date (say 13/Apr/2013) and for a given one hour time period (say 3:00-4:00 PM ) what are the top 5 co-occurring hashtag with "#iPhone".
My Approach: I am using the "twitter4j" library to access Twitter data. I can query and get 100 tweets per call (Twitter only allows that many). I can extract the time and other relevant data. I am planning to have a thread which will query Twitter every 5 minutes. This is done to observe hourly patterns. Here is where I am stuck: How should I store this information in the DB? Should I maintain a hashmap with the co-occurring hashtag as key and its frequency of occurring with "#iPhone" as value? Or should I store unaggregated data directly in the DB? What is the best way to query Twitter to observe hourly patterns? Should I store the time in "epoch" format in the DB, or the date as one column and the hour as another column?
Thanks a lot for your valuable inputs.

I would suggest you to use the Streaming API in Twitter. That will allow you to keep a persistent HTTP connection to twitter so that you can search over tweets. Twitter recommend the Streaming API for tweet analysis type applications.
But you have to pre-process certain data so that the analysis will be faster. Also look into twitter4j's inherent Streaming API support.
For an example please look into the following Github code.

As ay89 said, use key - tag and value - freq, aggregate before storing to DB, and use epoch.
In addition, because this is a multithreaded program, you have two options for synchronization:
Option 1 is to use a ConcurrentHashMap. When the aggregator runs, it will use:
for (Key key : hashMap.keySet()) {
    // replace() atomically stores 0 and returns the previous count,
    // so increments that arrive during the save are not lost
    Database.save(key, hashMap.replace(key, 0));
}
In other words, reset a tag's freq to 0 as it is written to the database. And the method adding tweet data will use
public void increment(Key key) {
boolean done = false;
while(!done) {
int current = hashMap.get(key);
int newValue = current + 1;
done = hashMap.replace(key, current, newValue);
}
}
This is a thread-safe way to increment the frequency.
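On Java 8 or later, the same increment can also be written with ConcurrentHashMap.merge, which additionally handles a tag that is not in the map yet. The sketch below is an alternative to the loop above, not a replacement the answer prescribes; the class name TagCounter, the String key type, and the drain method are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical counter keyed by hashtag text.
class TagCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    /** Atomically adds 1, inserting the tag with count 1 if it is absent. */
    void increment(String tag) {
        counts.merge(tag, 1, Integer::sum);
    }

    /** Atomically removes the tag and returns its count (0 if it was absent). */
    int drain(String tag) {
        Integer previous = counts.remove(tag);
        return previous == null ? 0 : previous;
    }
}
```

An aggregator could call drain for each tag it wants to persist, so that a count is either saved or still in the map, never both.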
Option 2 probably makes more sense. Your aggregator will replace the hashmap with a new instance.
class DataStore {
    private Map<Key, Value> map = new HashMap<>();
    public synchronized void add(Key key, Value value) {
        // called by the method querying tweet data
    }
    public void aggregate() {
        // called by the aggregator thread every five minutes
        Map<Key, Value> oldMap;
        synchronized (this) { // swap under the same lock that add() uses
            oldMap = map;
            map = new HashMap<>();
        }
        Database.save(oldMap);
    }
}
Bottom line is that you don't want to modify the hashmap in an uncontrolled fashion while the aggregator is saving it to the database. The second option is simpler because it simply creates a new hashmap for the querying thread to modify while the aggregator saves the old hashmap to the database.

Since you only have to retrieve the frequency, it's better to store it in a hash (key - tag, value - freq), because non-aggregated data stored in the DB would take more space (mostly for info which is not required) and you would ultimately have to aggregate it later anyway.
Epoch time is a good way to store the time, since you can localize it to a timezone later on if required.

Suitable Java data structure for parsing large data file

I have a rather large text file (~4m lines) I'd like to parse and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
Date Time Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
I want to group the lines by date so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line but is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons) but it's the List as a value that I'm worried might be unsuitable.
I'm using a TreeMap because I want to iterate the keys in date order.

There's nothing wrong with using a List as the value for a Map. All of those <> look ugly, but it's perfectly fine to put a generics class inside of a generics class.
Instead of using a String as the key, it would probably be better to use java.util.Date because the keys are dates. This will allow the TreeMap to more accurately sort the dates. If you store the dates as Strings, then the TreeMap may not properly sort the dates (they will be sorted as strings, not as "real" dates).
Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
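A minimal sketch of filling such a map is shown below. It uses java.time.LocalDate as the key instead of java.util.Date, and it assumes the header line has already been skipped and that columns are separated by a single space; the class name GroupByDate and the method group are made up for this example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups "date time value" lines by their date column.
class GroupByDate {

    /** Keys of the returned map iterate in chronological order. */
    static Map<LocalDate, List<String>> group(Iterable<String> lines) {
        Map<LocalDate, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            int space = line.indexOf(' ');
            // The sample dates are ISO formatted (yyyy-MM-dd), which LocalDate.parse accepts.
            LocalDate date = LocalDate.parse(line.substring(0, space));
            String rest = line.substring(space + 1);
            // Create the list on first use, then append the rest of the line.
            byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(rest);
        }
        return byDate;
    }
}
```

Within each date the lines keep their file order, so the 02:03 entry in the sample stays after 19:58 unless the lists are sorted separately.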

is a TreeMap of Lists a ridiculous thing to do?
Conceptually not, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more. Which may or may not be acceptable, depending on how much memory you have to waste.
For a more memory-efficient solution, create a class that has fields for every column (including a Date), put all those in a List and sort it (ideally using quicksort) when you're done reading.

There is no objection against using Lists. Though in your case maybe a List<Integer> as values of the Map would be appropriate.

I need data structure for effective handling with dates

What I need is something like a Hashtable which I will fill with the prices that were valid on given days.
For example: I will put two prices: January 1st: 100USD, March 5th: 89USD.
If I search my hashtable for a price with hashtable.get(February 14th), I need it to give me back the price which was entered for Jan. 1st, because this is the latest valid price. A normal hashtable implementation won't give me back anything, since nothing was put on that date.
I need to see if there is such an implementation which can quickly find an object based on a range of dates.

Off the top of my head, there are a couple of ways, but I would use a TreeMap keyed by Date (or Calendar, etc.), for example TreeMap<Date, Double>.
When you need to look up the price for a Date date, try the following:
Attempt to get(date)
If the result is null, then the key of the result is headMap(date).lastKey()
One of those will work. Of course, check the size of headMap(date) first, because lastKey() will throw an exception if it is empty.
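TreeMap also offers floorEntry, which combines both steps into one call: it returns the entry with the greatest key less than or equal to the given one, or null if there is none. A small sketch follows, assuming java.time.LocalDate keys; the class name PriceHistory and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper around a TreeMap of dated prices.
class PriceHistory {
    private final TreeMap<LocalDate, Double> prices = new TreeMap<>();

    void put(LocalDate date, double price) {
        prices.put(date, price);
    }

    /** Latest price entered on or before the given date, or null if there is none. */
    Double priceOn(LocalDate date) {
        Map.Entry<LocalDate, Double> entry = prices.floorEntry(date);
        return entry == null ? null : entry.getValue();
    }
}
```

With the two prices from the question entered for a given year, a lookup for February 14th of that year would return the January 1st price.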

You could use a DatePrice object that contains both and keep those in a list or array sorted by date, then use binary search (available in the Collections and Arrays classes) to find the nearest date.
This would be significantly more memory-efficient than using a TreeMap, and it doesn't look like you'll want to insert or remove data randomly (which would lead to bad performance with an array).
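A rough sketch of that approach with Collections.binarySearch is given below. The DatePrice fields, the class DatePriceLookup, and the use of java.time.LocalDate are assumptions; the list is expected to be sorted by date and to support fast random access (for example an ArrayList).

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class holding one dated price.
class DatePrice {
    final LocalDate date;
    final double price;

    DatePrice(LocalDate date, double price) {
        this.date = date;
        this.price = price;
    }
}

class DatePriceLookup {
    private static final Comparator<DatePrice> BY_DATE =
            Comparator.comparing((DatePrice dp) -> dp.date);

    /** Entry with the latest date on or before the given date, or null if none. */
    static DatePrice latestOnOrBefore(List<DatePrice> sortedByDate, LocalDate date) {
        // The probe's price is ignored; only its date takes part in the comparison.
        int index = Collections.binarySearch(sortedByDate, new DatePrice(date, 0.0), BY_DATE);
        if (index < 0) {
            // Not found: binarySearch returned -(insertionPoint) - 1,
            // so the preceding element sits at insertionPoint - 1 = -index - 2.
            index = -index - 2;
        }
        return index >= 0 ? sortedByDate.get(index) : null;
    }
}
```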

Create a TreeMap with Date keys and String values. When someone asks for a date, convert the string to a Date and call map.get(date); if nothing is found, take the key that precedes that date instead.

You have all your tools already at hand. Consider a TreeMap. You can create a head map, which contains only the portion of the map whose keys are strictly lower than a given value. Implementation example:
TreeMap<Date,Double> values = new TreeMap<Date,Double>();
...fill in stuff...
Date searchDate = ...anydate...
// Needed due to the strictly-less constraint:
Date mapConstraintDate = new Date(searchDate.getTime()+1);
Double searchedValue = values.get(values.headMap(mapConstraintDate).lastKey());
This is efficient because headMap does not copy the original map; it only returns a view.
