Multiple iCalendar VEvents or VTODOs for the same meeting - java

My application needs a feature that allows the creation of something like a project, where you specify the total number of hours you need to work on it, the start date, the end date, and how long each activity takes (additional constraints might be included as well).
What is the best way to create multiple VEvents according to those constraints, with an option to change those VEvents later? Also, what's the best method to check the current iCalendar to see whether a given date is busy? Can I somehow retrieve all the busy dates from an .ics file and then check whether a given time gap is free or busy?

Typically, projects are described in terms of tasks (VTODO) rather than events. See http://www.calconnect.org/7_things_tasks.shtml for an introduction to tasks; that document also describes how tasks can be grouped together.
The second part of your question is a bit fuzzy. With a library like ical4j, you can run free/busy queries over a stream of VTODOs. The other option would be to rely on a CalDAV server to store them.
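If it helps, here's a rough sketch of the second part - checking whether a time slot is free by parsing an .ics file and expanding the events that fall inside the slot. This assumes ical4j 2.x; the file name and the dates are made up:

    import net.fortuna.ical4j.data.CalendarBuilder;
    import net.fortuna.ical4j.model.Calendar;
    import net.fortuna.ical4j.model.Component;
    import net.fortuna.ical4j.model.DateTime;
    import net.fortuna.ical4j.model.component.VEvent;

    import java.io.FileInputStream;

    public class BusyCheck {

        // A slot is free if no VEVENT consumes any time inside it.
        static boolean isFree(Calendar calendar, DateTime slotStart, DateTime slotEnd) {
            for (Object component : calendar.getComponents(Component.VEVENT)) {
                VEvent event = (VEvent) component;
                // getConsumedTime expands recurring events into the concrete
                // periods falling within the given range.
                if (!event.getConsumedTime(slotStart, slotEnd).isEmpty()) {
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) throws Exception {
            Calendar calendar = new CalendarBuilder()
                    .build(new FileInputStream("project.ics")); // hypothetical file
            System.out.println(isFree(calendar,
                    new DateTime("20121019T090000Z"),
                    new DateTime("20121019T110000Z")) ? "free" : "busy");
        }
    }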

Related

Choosing a database type for a decentralized calendar project

I am developing a calendar system that is decentralised. It should save the data on each device and synchronise when both devices have an internet connection. My first idea was to just use a relational database and try to synchronise the data after connecting, but the theory says otherwise. Brewer's CAP theorem describes the trade-offs involved, though I am not sure whether that theorem might be outdated. By its classification I need an "AP" (Availability/Partition tolerance) system: "A" because I need the calendar data available at any given time, and "P" because it can happen that there is no connection between the devices and the data can't be synchronised. The example databases are CouchDB, Riak, or Cassandra. I have only worked with relational databases and don't know how to proceed. Is it that bad to use a relational database for my project?
This is for my bachelor thesis. I just wanted to start using Postgres, but then I found this theorem...
The whole project is based on Java.
I think the CAP theorem isn't really helpful to your scenario. Distributed systems that deal with partitions need to decide what to do when one part wants to make a modification to the data but can't reach the other part. One solution is to make the write wait - this is giving up "availability" because of a "partition", one of the options presented by the CAP theorem. But there are more useful options. The most useful (highly-available) option is to allow both parts to be written independently and reconcile the conflicts when they can connect again. The question is how to do that, and different distributed systems choose different approaches.
Some systems, like Cassandra or Amazon's DynamoDB, use "last writer wins": when two writes conflict, the last one (according to some synchronized clock) wins. For this approach to make sense you need to be very careful about how you model your data (e.g., watch out for cases where the conflict resolution results in an invalid mixture of two states).
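As a toy illustration of last-writer-wins (my own sketch, not any particular database's implementation):

    import java.time.Instant;

    // A last-writer-wins register: every write carries a timestamp, and
    // merging two replicas keeps whichever value was written last. Note
    // the caveat above: whole-object LWW can silently drop one side's edit.
    public class LwwRegister<T> {
        private T value;
        private Instant writtenAt = Instant.MIN;

        public void write(T newValue, Instant timestamp) {
            if (timestamp.isAfter(writtenAt)) {
                value = newValue;
                writtenAt = timestamp;
            }
        }

        // Merging is just replaying the other replica's latest write.
        public void merge(LwwRegister<T> other) {
            write(other.value, other.writtenAt);
        }

        public T read() {
            return value;
        }
    }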
In other systems (and also in Cassandra and DynamoDB, in their "collection" types) writes can still happen independently on different nodes, but there is more sophisticated conflict resolution. A good example is Cassandra's "list": one client can send an update saying "add item X to the list", and another an update saying "add item Y to the list". If these updates happen on different partitions, the conflict is later resolved by adding both X and Y to the list. A data structure like this list - one that allows the content to be modified independently in certain ways on two nodes and then automatically reconciled in a sensible way - is known as a Conflict-free Replicated Data Type (CRDT).
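The simplest CRDT to see this with is a grow-only set - a sketch of the idea (not Cassandra's actual list implementation):

    import java.util.HashSet;
    import java.util.Set;

    // Grow-only set (G-Set) CRDT: each replica adds items locally, and
    // merging is set union, so the "add X" / "add Y" conflict from the
    // example above resolves to a collection containing both X and Y.
    public class GSet<T> {
        private final Set<T> items = new HashSet<>();

        public void add(T item) {
            items.add(item);
        }

        public boolean contains(T item) {
            return items.contains(item);
        }

        // Union is commutative, associative and idempotent, so replicas
        // converge no matter in which order merges happen.
        public void merge(GSet<T> other) {
            items.addAll(other.items);
        }
    }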
Finally, another approach was used in Amazon's Dynamo paper (not to be confused with their current DynamoDB service!), known as "vector clocks": when you want to write to an object - e.g., a shopping cart - you first read the current state of the object and get with it a "vector clock", which you can think of as the "version" of the data you got. You then make the modification (e.g., add an item to the shopping cart) and write back the new version, saying which old version you started from. If two of these modifications happen in parallel on different partitions, the two updates later need to be reconciled. The vector clocks allow the system to determine whether one modification is "newer" than the other (in which case there is no conflict), or whether they really do conflict. When they do, application-specific logic is used to reconcile the conflict. In the shopping cart example, if the conflict is that item A was added to the cart in one partition and item B in the other, the straightforward resolution is to just add both items A and B to the cart.
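The core comparison can be sketched in a few lines (the map-of-counters representation is the standard one; everything else here is illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // One counter per node. Clock A "happened before" clock B when every
    // counter in A is <= the matching counter in B; if neither dominates,
    // the two versions conflict and application logic must reconcile them.
    public class VectorClock {
        private final Map<String, Long> counters = new HashMap<>();

        public void increment(String nodeId) {
            counters.merge(nodeId, 1L, Long::sum);
        }

        public boolean happenedBefore(VectorClock other) {
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                if (e.getValue() > other.counters.getOrDefault(e.getKey(), 0L)) {
                    return false;
                }
            }
            return true;
        }

        public boolean conflictsWith(VectorClock other) {
            return !this.happenedBefore(other) && !other.happenedBefore(this);
        }
    }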
You should probably pick one of these approaches. Just saying "the CAP theorem doesn't let me do this" is usually not an option ;-) In fact, in some ways, the problem you're facing is different from that of the systems I mentioned. In those systems, the common case is that every node is always connected (no partition), with very low latency, and they want this common case to be fast. In your case, you can probably assume the opposite: the two parts are usually not connected, or if they are connected there is high latency, so conflict resolution becomes the norm rather than the exception. So you need to decide how to do this conflict resolution: what happens if a meeting is added on one device and a different meeting on the other device (most likely, just keep both as two meetings...)? How do you know that one device modified a pre-existing meeting rather than adding a second one (vector clocks? unique meeting ids? etc.), so that conflict resolution ends up fixing the existing meeting instead of adding a second one? And so on. Once you decide that, where you store the data on both partitions (probably completely different database implementations on the client and the server) and which protocol you use to send updates become implementation details.
There's another issue you'll need to consider: when do we do these reconciliations? In many of the systems I listed above, reconciliation happens on read: if the client wants to read data and we suddenly see two conflicting versions on two reachable nodes, we reconcile. In your calendar application you need a slightly different approach: it is possible that the client will only ever try to read (use) the calendar when not connected, so you need to use the rare opportunities when it is connected to reconcile all the differences. Moreover, you may need to "push" changes - e.g., if the data on the server changed, the client may need to be told "hey, I have some changed data, come and reconcile", so the end-user immediately sees, say, a new meeting that was added remotely (perhaps by a different user sharing the same calendar). You'll need to figure out how you want to do this. Again, there is no magic solution like "use Cassandra".

Google Places API - saving place_id and violation of terms and conditions

I want to build an app which shows places around user using Google Places based on user interests. As mentioned here:
Place IDs are exempt from the caching restrictions stated in Section 10.5.d of the Google Maps APIs Terms of Service. You can therefore store place ID values indefinitely.
So, can I save place_id in a cloud database and perform analytics on it? For example, if I gather the place_ids added to each user's favorite-places table, analytics could tell me which place_ids are added to favorites most often. Could I then show something like 'Trending Places' in the app from the place_ids gathered in responses?
Will it violate the terms and conditions? I read the whole page of terms but couldn't find the answer.
Can anyone help me out? Thanks.
Yes, you can 100% store the place_id indefinitely and reuse it.
See Referencing a Place with a Place ID.
Please note one thing:
A single place ID refers to only one place, but a place can have multiple place IDs.
These terms and conditions are mostly self-explanatory, but your particular requirement becomes clear once you read the section below carefully. Per your requirement: storing a result so you don't have to call the service again with the same query (i.e., to save network calls) is acceptable.
No caching or storage: You will not pre-fetch, cache, index, or store any Content to be used outside the Service, except that you may store limited amounts of Content solely for the purpose of improving the performance of your Maps API Implementation due to network latency (and not for the purpose of preventing Google from accurately tracking usage), and only if such storage:
1) is temporary (and in no event more than 30 calendar days);
2) is secure;
3) does not manipulate or aggregate any part of the Content or Service; and
4) does not modify attribution in any way.
Go through Section 10.5 Intellectual Property Restrictions, subsection (b).
You'll need to contact Google to get a 100% answer.
That being said, from my experience it looks like the clause you included is intended exactly for the kind of thing you want to do.
Again: if you still have concerns, contacting Google directly is the way to go.
You can store place ID values indefinitely.
What part of "You can therefore store place ID values indefinitely" don't you understand? Indefinitely requires a server.

What sort of data format to use for a project dealing with massive quantities of personalized data?

I'm working on an application for nursing students; it is a program where a user enters data about their patient's vitals, skin assessments, medicines administered, etc.
[Flowchart: program structure with respect to the data]
That data needs to be saved in a structure divisible by patient and then by the time recorded. The problem is that this is going to be a HUGE amount of data, since entries need to be made every 15 minutes.
[Flowchart: interactions between the project and its data]
Both "request patient var over time" and "request populate timeline" search for all entries for that patient between two given dates.
The best way I can think of to organize this data is directory-based:
data/PatientName/Month/19102012.file (the date 19 Oct 2012, for quick omission of ignored dates)
This way might be okay, but it feels really hacked together. What better organization should I use for this data?
I honestly don't think students entering patient data every 15 minutes qualifies as HUGE these days. As such, virtually any technology would be of use. Some sort of relational database is an obvious choice, and given the above, I don't think you need anything remotely enterprise-scale.
One question that springs to mind is: is security important? This is medical data, after all. That may influence the technology you choose, since databases implement security in a radically different fashion to (say) the filesystem.
The one piece of advice I can give now is to abstract your data storage away from the rest of your solution. That way you can implement something trivial now and replace it easily in the future as your requirements solidify.
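To make that concrete, a data-access interface along these lines (all names hypothetical; the record needs Java 16+) would let you start with flat files and swap in a database later without touching callers:

    import java.time.LocalDateTime;
    import java.util.List;

    // Placeholder for whatever one 15-minute entry contains.
    record VitalsEntry(LocalDateTime recordedAt, String type, String value) {}

    // The UI and reporting code talk only to this interface, so a
    // trivial file-based implementation can be replaced later.
    interface PatientRecordStore {

        void save(String patientId, VitalsEntry entry);

        // Backs both "request patient var over time" and "request
        // populate timeline": all entries for one patient between two dates.
        List<VitalsEntry> findByPatientBetween(String patientId,
                                               LocalDateTime from,
                                               LocalDateTime to);
    }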
You can define a custom class (a POJO) containing all the needed parameters as properties, and store the instances of that POJO in some database.
Using a database might be an elegant way to handle a huge amount of data.
Your suggested directory-based approach would probably be fine in practice. As Brian and Rohit pointed out, the key is that you want to abstract out the data storage. In other words, you should have interfaces between the components of your system that provide the data-access methods you want, and then link up what you want (i.e., request a specific patient over some time period, etc.) with what you have (i.e., a filesystem, or a database, etc.).
As Brian pointed out, in today's world "huge" refers to an entirely different scale than recording entries every 15 minutes. I would build something that works, and then address the scale problem when and if it arises. There are lots of other important things to worry about as well, such as security, reliability, etc.

How to efficiently trigger events in Java according to date?

I am building a relatively complex app for a company that keeps track of several objects and triggers events according to the object's age or preset dates contained in it.
For example:
Object object_A contains a "renewal date". When the renewal date and the system date coincide, the application needs to run a particular routine.
My question is: Since there will be thousands of such objects, what is the most efficient way of keeping track of all of them and triggering the respective routines at the respective dates and times?
Worth mentioning that I've used Calendar objects for describing these dates (though they can easily be converted into Date objects, so that's not entirely relevant).
I'd appreciate any pointers in the right direction...
Quartz is an excellent library for robust/complex task scheduling.
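For instance, scheduling a one-shot job at an object's renewal date might look roughly like this (a sketch against the Quartz 2.x API; the class and key names are made up):

    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.SchedulerException;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    import java.util.Date;

    public class RenewalScheduling {

        // The routine to run when an object's renewal date arrives.
        public static class RenewalJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                String objectId = context.getMergedJobDataMap().getString("objectId");
                System.out.println("Running renewal routine for " + objectId);
            }
        }

        // Registers one firing of RenewalJob at the given renewal date.
        static void scheduleRenewal(Scheduler scheduler, String objectId, Date renewalDate)
                throws SchedulerException {
            JobDetail job = JobBuilder.newJob(RenewalJob.class)
                    .withIdentity("renewal-" + objectId)
                    .usingJobData("objectId", objectId)
                    .build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .startAt(renewalDate) // e.g., yourCalendar.getTime()
                    .build();
            scheduler.scheduleJob(job, trigger);
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();
            scheduleRenewal(scheduler, "object_A",
                    new Date(System.currentTimeMillis() + 60_000));
        }
    }

Quartz can also persist jobs in a JDBC job store if configured, so thousands of date-based triggers don't have to live in memory or be re-scanned by hand.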
I will encourage you (since you say you're building the app) to reconsider the responsibilities. I don't think it should be the object's responsibility to maintain its renewal date. You could do it that way and have an agent that periodically visits those items to see whether their renewal date is today.
But if you instead think of an external agent that maintains a renewal-dates table, it will be more efficient, since:
All objects' triggers with the same renewal date will fire at the same time.
Managing the check logic will be centralized.
The storage cost of renewal dates is reduced when many objects collapse onto the same date.
This is a big-picture view, but I hope it serves; a small sketch of the idea follows.
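A minimal sketch of that agent's table (my own illustration):

    import java.time.LocalDate;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Index from renewal date to the ids of all objects due that day:
    // one lookup per day instead of scanning thousands of objects.
    public class RenewalIndex {
        private final NavigableMap<LocalDate, List<String>> byDate = new TreeMap<>();

        public void register(String objectId, LocalDate renewalDate) {
            byDate.computeIfAbsent(renewalDate, d -> new ArrayList<>()).add(objectId);
        }

        // Called once per day, e.g., from a scheduled job: returns every
        // object whose renewal routine should fire today.
        public List<String> dueOn(LocalDate today) {
            return byDate.getOrDefault(today, Collections.emptyList());
        }
    }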

User matching with current data

I have a database full of two different types of users (Mentors and Mentees), whereby I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 mentors and 3000 mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), even though the code suggests the message should only be output every 10000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time - why are you doing that?
14 minutes sounds way, way too long to load such a small amount of data; something's wrong. You might follow up with more info at user@mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? If not, I might suggest LogLikelihoodSimilarity.
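For reference, a user-based setup without the inferrer might look like this (a sketch against the Mahout 0.x Taste API; building the FastByIDMap from your POJOs is assumed to stay as you have it):

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
    import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.PreferenceArray;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class MentorMatcher {

        static long[] matchMentors(FastByIDMap<PreferenceArray> userData, long menteeId)
                throws TasteException {
            DataModel model = new GenericDataModel(userData);

            // Log-likelihood ignores preference values, so it suits
            // "boolean" (present/absent) data and needs no PreferenceInferrer.
            UserSimilarity similarity = new LogLikelihoodSimilarity(model);
            UserNeighborhood neighborhood =
                    new NearestNUserNeighborhood(25, similarity, model);
            GenericUserBasedRecommender recommender =
                    new GenericUserBasedRecommender(model, neighborhood, similarity);

            // Most similar mentors for one mentee id.
            return recommender.mostSimilarUserIDs(menteeId, 10);
        }
    }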
