I want to build an app which shows places around the user, using Google Places, based on the user's interests. As mentioned here:
Place IDs are exempt from the caching restrictions stated in Section 10.5.d of the Google Maps APIs Terms of Service. You can therefore store place ID values indefinitely.
So, can I save place_id values in a cloud database and perform analytics on them? For example, if I gather the place_ids added to each user's favorite-places table, can I learn from analytics which place_ids are added to favorites most often? Or can I show something like 'Trending Places' in the app from the place_ids gathered in responses?
Will it violate the terms and conditions? I read the whole page of terms but couldn't find the answer.
Can anyone help me out? Thanks.
Yes, you can 100% store the place_id indefinitely and reuse it.
See Referencing a Place with a Place ID.
Please note one thing:
A single place ID refers to only one place, but a place can have multiple place IDs.
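As an aside, here is a minimal sketch of how a stored place ID might be reused later, assuming the Places SDK for Android with its usual imports and an already-initialized PlacesClient named placesClient (the method and variable names are illustrative, not from the question):

void refreshStoredPlace(PlacesClient placesClient, String storedPlaceId) {
    List<Place.Field> fields = Arrays.asList(Place.Field.ID, Place.Field.NAME, Place.Field.LAT_LNG);
    FetchPlaceRequest request = FetchPlaceRequest.newInstance(storedPlaceId, fields);
    placesClient.fetchPlace(request)
            .addOnSuccessListener(response -> {
                Place place = response.getPlace();
                // refresh the name/location for the place ID kept in your own database
            })
            .addOnFailureListener(e -> {
                // the ID may have been retired; re-search and store the new place ID
            });
}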
The terms and conditions are mostly self-explanatory, except for your particular requirement, which becomes clearer once you read the section quoted below carefully. As for your requirement: storing a result so that you don't have to call the service again with the same query (i.e., to save network calls) is acceptable, within these limits:
No caching or storage: You will not pre-fetch, cache, index, or store any Content to be used outside the Service, except that you may store limited amounts of Content solely for the purpose of improving the performance of your Maps API Implementation due to network latency (and not for the purpose of preventing Google from accurately tracking usage), and only if such storage:
1) is temporary (and in no event more than 30 calendar days),
2) is secure,
3) does not manipulate or aggregate any part of the Content or Service, and
4) does not modify attribution in any way.
Go through this Section 10.5, Intellectual Property Restrictions, Subsection (b).
You'll need to contact Google to get a 100% answer.
That being said, from my experience it looks like the clause you included is intended exactly for the kind of thing you want to do.
Again, I want to reiterate that contacting Google directly is something you should do if you still have concerns.
You can store place ID values indefinitely.
Just what part of "You can therefore store place ID values indefinitely." don't you understand?
Indefinitely requires a server.
I am developing a calendar system which is decentralised. It should save the data on each device and synchronise when both devices have an internet connection. My first idea was just to use a relational database and try to synchronise the data once there is a connection. But the theory says something else. Brewer's CAP theorem describes the theory behind it, but I am not sure whether this theorem is perhaps outdated. If I apply the theorem, I have an "AP" (Availability/Partition tolerance) system: "A" because I need the calendar data at any given time, and "P" because it can happen that there is no connection between the devices and the data can't be synchronised. The example databases are CouchDB, Riak or Cassandra. I have only worked with relational databases and don't know how to go on now. Is it that bad to use a relational database for my project?
This is for my bachelor thesis. I just wanted to start using Postgres, but then I found this theorem...
The whole project is based on Java.
I think the CAP theorem isn't really helpful to your scenario. Distributed systems that deal with partitions need to decide what to do when one part wants to make a modification to the data but can't reach the other part. One solution is to make the write wait - this is giving up "availability" because of the "partition", one of the options presented by the CAP theorem. But there are more useful options. The most useful (highly available) option is to allow both parts to be written independently and to reconcile the conflicts when they can connect again. The question is how to do that, and different distributed systems choose different approaches.
Some systems, like Cassandra or Amazon's DynamoDB, use "last writer wins": when we see a conflict between two conflicting writes, the last one (according to some synchronized clock) wins. For this approach to make sense you need to be very careful about how you model your data (e.g., watch out for cases where the conflict resolution results in an invalid mixture of two states).
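As a rough illustration of last-writer-wins (a sketch only, using a hypothetical CalendarEntry type, not any particular database's implementation):

class CalendarEntry {
    String id;
    String title;
    long lastModifiedMillis;   // must come from a reasonably synchronized clock
}

class LastWriterWins {
    // Whichever replica wrote last simply wins outright; the other version is discarded.
    static CalendarEntry merge(CalendarEntry a, CalendarEntry b) {
        return a.lastModifiedMillis >= b.lastModifiedMillis ? a : b;
    }
}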
In other systems (and also in Cassandra and DynamoDB - in their "collection" types), writes can still happen independently on different nodes, but there is more sophisticated conflict resolution. A good example is Cassandra's "list": one update can say "add item X to the list", and another update can say "add item Y to the list". If these updates happen on different partitions, the conflict is later resolved by adding both X and Y to the list. A data structure such as this list, which allows the content to be modified independently in certain ways on two nodes and then automatically reconciled in a sensible way, is known as a Conflict-free Replicated Data Type (CRDT).
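A toy example of the simplest CRDT, a grow-only set, where merging two replicas is just a set union, so an "add X" on one device and an "add Y" on the other both survive (a sketch, not Cassandra's actual implementation):

import java.util.HashSet;
import java.util.Set;

class GrowOnlySet<T> {
    private final Set<T> elements = new HashSet<>();

    void add(T value) { elements.add(value); }                              // local update on one replica

    void merge(GrowOnlySet<T> other) { elements.addAll(other.elements); }   // reconciliation = union

    Set<T> values() { return new HashSet<>(elements); }
}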
Finally, another approach was used in Amazon's Dynamo paper (not to be confused with their current DynamoDB service!), known as "vector clocks": when you want to write to an object - e.g., a shopping cart - you first read the current state of the object and get with it a "vector clock", which you can think of as the "version" of the data you got. You then make the modification (e.g., add an item to the shopping cart) and write back the new version, while saying which old version you started with. If two of these modifications happen in parallel on different partitions, we later need to reconcile the two updates. The vector clocks allow the system to determine whether one modification is "newer" than the other (in which case there is no conflict), or whether they really do conflict. And when they do, application-specific logic is used to reconcile the conflict. In the shopping cart example, if we see that the conflict is that in one partition item A was added to the shopping cart and in the other partition item B was added, the straightforward resolution is to just add both items A and B to the shopping cart.
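A minimal sketch of the vector-clock comparison itself, with one counter per device (everything here is illustrative, not Dynamo's code):

import java.util.HashMap;
import java.util.Map;

class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    void tick(String deviceId) { counters.merge(deviceId, 1L, Long::sum); }   // bump on every local write

    // True if every counter in this clock is <= the corresponding counter in the other clock.
    boolean happenedBefore(VectorClock other) {
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            if (e.getValue() > other.counters.getOrDefault(e.getKey(), 0L)) return false;
        }
        return true;
    }
    // If neither a.happenedBefore(b) nor b.happenedBefore(a), the two writes conflict and
    // application-specific logic (e.g., keep both shopping-cart items) must reconcile them.
}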
You should probably pick one of these approaches. Just saying "the CAP theorem doesn't let me do this" is usually not an option ;-) In fact, in some ways, the problem you're facing is different from that of some of the systems I mentioned. In those systems, the common case is that every node is always connected (no partition), with very low latency, and they want this common case to be fast. In your case, you can probably assume the opposite: the two parts are usually not connected, or if they are connected there is high latency, so conflict resolution becomes the norm rather than the exception. So you need to decide how to do this conflict resolution: what happens if one adds a meeting on one device and a different meeting on the other device (most likely, just keep both as two meetings...)? How do you know that one device modified a pre-existing meeting rather than adding a second meeting (vector clocks? unique meeting IDs? etc.), so that the conflict resolution ends up fixing the existing meeting instead of adding a second one? And so on. Once you do that, where you store the data on the two partitions (probably completely different database implementations in the client and the server) and which protocol you send the updates over become implementation details.
There's another issue you'll need to consider: when do we do these reconciliations? In many systems like the ones I listed above, the reconciliation happens on read: if the client wants to read data and we suddenly see two conflicting versions on two reachable nodes, we reconcile. In your calendar application, you need a slightly different approach: it is possible that the client will only ever try to read (use) the calendar when not connected. You need to use the rare opportunities when it is connected to reconcile all the differences. Moreover, you may need to "push" changes - e.g., if the data on the server changed, the client may need to be told "hey, I have some changed data, come and reconcile", so the end user immediately sees an announcement of a new meeting that was added remotely (perhaps by a different user sharing the same calendar), for example. You'll need to figure out how you want to do this. Again, there is no magic solution like "use Cassandra".
I am developing an app which will give you nearby mosques within 10 km of your current location. Since the Places API only allows a certain number of queries per day, I have used Firebase to store nearby mosques for a given location, and I first check whether the data is already in the database before querying. But this still doesn't solve the problem: e.g., if a user is on the go the whole day, then the results must change every single minute according to his/her location. How can I achieve the desired results?
As mentioned earlier, I am saving nearby locations in a database keyed by the location around which they exist. But this doesn't quite solve the problem.
Any help will be greatly appreciated.
The Places API is a commercial offering - you are meant to pay for using it if you want to build applications around it.
There's a small number of calls that you can do for free, but this is only meant as a testing ground or for private use. I am no lawyer, but I would guess that circumventing the fee by scraping the map (like setting up a bot to go around a country to build a database of points of interest) would be illegal and would probably get you a letter from Google saying you should stop.
Use the AutocompleteSessionToken class to generate a token and place it after your key. This token will reduce your usage, because you can call the Places API multiple times and it will still be counted as a single request. I hope this helps, because I didn't understand your question very well. Here is a sample of the link:
https://maps.googleapis.com/maps/api/place/autocomplete/json?input=1600+Amphitheatre&key=&sessiontoken=1234567890.
For more details, see here.
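For completeness, a short sketch of the same session-token idea using the Places SDK for Android instead of the raw web service URL; the initialized PlacesClient named placesClient and the query text are assumptions for illustration:

AutocompleteSessionToken token = AutocompleteSessionToken.newInstance();
FindAutocompletePredictionsRequest request =
        FindAutocompletePredictionsRequest.builder()
                .setSessionToken(token)          // reuse the same token for every keystroke of one session
                .setQuery("1600 Amphitheatre")
                .build();
placesClient.findAutocompletePredictions(request)
        .addOnSuccessListener(response -> {
            for (AutocompletePrediction prediction : response.getAutocompletePredictions()) {
                // show prediction.getPrimaryText(null) in your suggestion list
            }
        });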
I am currently developing an Android game and my goal is to create a free/demo version of the game, so that users can try it out. But I also want the savegames from the demo to be automatically imported into the full version.
I store the savegames in the application's private storage, and they are basically JSON strings mapping several Java objects. The user can create as many "new games" as he wants, and there is an "auto save" and a "manual save" file for each game he has started. To keep track of all the files, I have a list containing the filenames and some additional information (like the player's name etc.).
So basically there are quite a lot of small files handling the savegames. This may not be the most elegant approach, but it works quite well.
So here is my question: Let's say the user has started a game in the demo version (so there will be 3 files saved in the private storage of the demo version). How can I now access these files from within the full version?
The two versions won't be much different. They are actually the same, apart from the limitations of the demo version, and I would be using the same code base.
I know there have been quite a few questions about this issue in this forum and elsewhere, but I was not able to find a suitable solution. All I could find involved either:
storing the files in a world readable storage (like the SD-card) or
using the SharedPreferences
But I neither want the user to be able to read the savegames (or even alter them, because this could mess up the game) - so no SD card - nor can I use the SharedPreferences, because each single savegame has approx. 200 lines (many, many Java objects translated into JSON), and mapping all those values and objects into the kind of key-value structure used for SharedPreferences seems quite impossible to me.
Is this all messed up, or does anyone have an idea?
Thank you for taking the time; I'm looking forward to hearing your ideas!
Christoph
So I see just two solutions:
The first is a world-readable SharedPreferences. You said that you store JSON strings, so there is no need to map them down any further; if you can make objects out of your JSON strings (I like to use Gson for this kind of work), you can simply store these strings inside SharedPreferences.
The second way is to bother with ContentProviders and implement a ContentResolver interface. This is the safest way I can imagine for your use case, but you have to implement a lot for it.
What you can't avoid
There are two things that you can't avoid:
If the user decides to root the phone, you can't prevent them from accessing the data, no matter what you do to make it harder.
If you want a second app to access the same data (the saved games) on a non-rooted device, there will always be a way for the user to access it from outside your apps.
What can you do to make it harder
You can encrypt the data (e.g., using the device IMEI as a key) before storing it in a file or in shared preferences (together with a hash to prevent changes) - see the sketch after this list.
You can store the data in an SQLite database (which requires more knowledge to change it), and encrypt it before storing it (even harder).
You can use SQLCipher to store it in a ciphered database (the encryption will be transparent).
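A minimal sketch of the encrypt-before-storing idea, assuming AES/GCM from the standard javax.crypto API; how the key is obtained (IMEI-derived, Android Keystore, or otherwise) is deliberately left out:

import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

class SavegameCrypto {
    // Encrypts the savegame JSON; persist the returned bytes (e.g., Base64-encoded) in the file.
    // GCM also authenticates the data, which covers the "hash to prevent changes" part.
    static byte[] encrypt(SecretKey key, String savegameJson) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] iv = cipher.getIV();                        // the IV must be stored alongside the ciphertext
        byte[] body = cipher.doFinal(savegameJson.getBytes(StandardCharsets.UTF_8));
        byte[] out = new byte[iv.length + body.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    static SecretKey newKey() throws Exception {
        return KeyGenerator.getInstance("AES").generateKey();   // or derive/load your own key
    }
}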
Regards.
You can use a shared ContentProvider (see the general documentation about ContentProviders: http://developer.android.com/guide/topics/providers/content-providers.html).
You then have to declare it as exported, using the flag android:exported="true" in the manifest.
example:
<provider android:name="[yourpackage_here].SavegameProvider"
android:authorities="[yourpackage_here].SavegameProvider"
android:exported="true" />
You will then be able to open it from within your new app.
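For example, a rough sketch of the reading side in the full version; the "savegames" path and the "json" column are assumptions about how the demo's SavegameProvider might be implemented:

// Inside an Activity (or anything else with a Context) of the full version;
// Uri and Cursor are the standard android.net.Uri and android.database.Cursor classes.
void importDemoSavegames() {
    Uri uri = Uri.parse("content://[yourpackage_here].SavegameProvider/savegames");
    try (Cursor cursor = getContentResolver().query(uri, null, null, null, null)) {
        while (cursor != null && cursor.moveToNext()) {
            String json = cursor.getString(cursor.getColumnIndexOrThrow("json"));
            // deserialize with Gson and re-save into the full version's private storage
        }
    }
}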
I don't want to use Lucene because I think it is too heavy.
Is there any easier way to implement this (millions of records)?
If you don't want to have to worry about performance, I recommend you take a look at Amazon Web Services' new CloudSearch service. It's fast and scales as your needs scale. It can also handle millions of documents without a problem and supports wildcard searches (e.g., quo* would retrieve Quora).
Check it out here.
Obviously this isn't how it definitely works at either Quora or Google, as I haven't had the pleasure of working at either... this is just how I'd go about doing it.
The first thing to obtain is a list of search terms - I'm assuming you don't want to know how this is done, as it will really depend on all sorts of things, but basically you're either going to do a select distinct title from pages (in the case of the autocomplete on Wikipedia) or something much more advanced in the case of Google's.
The next step is also pretty simple at a high level: you need to perform the query select title from titles where title like 'Qu%' in the case of the user typing Qu into the search box. The list of titles is then returned to the browser as the response to some kind of Ajax request, perhaps in the form of JSON or similar. And you need to do it as fast as possible - that's where it becomes difficult.
How do they do it so quickly? There are probably four things to bear in mind.
They have LOTS of machines handling the requests. Bear in mind that Google's autocomplete is turned on by default and works in (almost?) all languages. That's a lot of searches against the autocomplete index. A lot more than there will be against the web index itself: for each web search request, Google will probably have processed 3 or 4 autocomplete requests.
They're probably doing it in memory. Google is already known to store its web indexes in memory, so I would expect them to be doing the same with this.
Specialised software (this is where it gets really interesting). While a traditional database or a NoSQL database could do this, and do it quickly, I would expect the big boys to actually be doing it with specialised code whose sole purpose is to provide autocomplete suggestions. The SQL statement I provided above was purely to demonstrate the logical request that would be needed. You're probably looking at some kind of specialised tree, such as a suffix tree, radix tree, or similar (see the toy sketch after this list).
Sharding. To cope with the quantity of data and the number of machines handling the requests, you're going to need to shard. That is, ensure that a certain subset of all the machines involved only processes requests that begin with one or more letters, e.g. a group of X machines processing searches that begin with a certain letter or even two letters. That means you've got more machines, but each of them doesn't have to have the whole index to hand. How does a particular group of machines get chosen? You're either routing once the request is in your data centre, or you could route on the client side (e.g. in your JavaScript, decide which IP to query based upon the first X letters of the search term).
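To make the "specialised tree" point concrete, here is a toy in-memory prefix tree; this is purely illustrative and certainly not how Google or Quora actually do it:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AutocompleteTrie {
    private final Map<Character, AutocompleteTrie> children = new HashMap<>();
    private boolean isTitle;

    // Insert every title once at startup.
    void insert(String title) {
        AutocompleteTrie node = this;
        for (char c : title.toLowerCase().toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new AutocompleteTrie());
        }
        node.isTitle = true;
    }

    // Walk down the prefix, then collect up to `limit` completions under that node.
    List<String> suggest(String prefix, int limit) {
        AutocompleteTrie node = this;
        for (char c : prefix.toLowerCase().toCharArray()) {
            node = node.children.get(c);
            if (node == null) return new ArrayList<>();   // nothing starts with this prefix
        }
        List<String> out = new ArrayList<>();
        node.collect(prefix.toLowerCase(), limit, out);
        return out;
    }

    private void collect(String prefix, int limit, List<String> out) {
        if (out.size() >= limit) return;
        if (isTitle) out.add(prefix);
        for (Map.Entry<Character, AutocompleteTrie> e : children.entrySet()) {
            e.getValue().collect(prefix + e.getKey(), limit, out);
        }
    }
}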
So, that's how I would do it. Not having had the experience of the enormous datasets Google/Quora are dealing with, I'm sure there are things that I've not considered. But, it's a start.
And, here's how I have done it, purely in an experimental environment at home:
I had a simple list of a good few hundred thousand titles to search. These were loaded into a dedicated MongoDB collection, which had a single index defined on it. I then had a Play Framework controller in front of it and used jQuery's autocomplete plugin to do the search.
Obviously this is tiny compared with what you are looking for, but MongoDB should provide the same kind of performance for your dataset, provided you follow the recommendations (i.e. good hardware, lots of RAM, keep the indexes in memory). In addition, Mongo supports sharding, and the Play Framework is share-nothing, so adding new machines to cope with the load, should your user base grow, would be straightforward in this situation.
By the way, Mongo is by no means the only solution, traditional SQL databases will be up to the job too, of course - I was just using Mongo for other reasons.
First, for autocomplete you should aim to get the response back to the user in <= 100ms if you want something that appears fast. That should be your first concern. Any setup that can't do that probably won't be good enough for users. In my own tests in Firefox using Firebug, Google's autocomplete returned in about 50ms and Quora's in about 65ms.
See, e.g.
http://stackoverflow.com/questions/536300/what-is-the-shortest-perceivable-application-response-delay
Apparently, Quora uses prefix matching, not full-text search, which makes it faster. To roll your own fast prefix-based autocomplete, which should be sufficient for many cases but won't handle things like misspellings via fuzzy matching, try an in-memory data store like Redis. The details can be seen here:
http://charlesleifer.com/blog/powerful-autocomplete-with-redis-in-under-200-lines-of-python/
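A rough Java sketch of the Redis approach, assuming the Jedis client and a single sorted set (this simplification is fine for ASCII titles; the linked article covers a more complete scheme):

import redis.clients.jedis.Jedis;

class RedisAutocomplete {
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Index every title once with score 0, so ordering is purely lexicographic.
    void index(String title) {
        jedis.zadd("autocomplete:titles", 0, title.toLowerCase());
    }

    // "[p" .. "[p\u00ff" covers every member that starts with the prefix (ASCII case).
    Iterable<String> suggest(String prefix) {
        String p = prefix.toLowerCase();
        return jedis.zrangeByLex("autocomplete:titles", "[" + p, "[" + p + "\u00ff");
    }
}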
I haven't been able to get CloudSearch (95-125ms in the browser fetching from the endpoint directly, as measured by Firebug, and 20-30ms longer accessing the endpoint via cURL in PHP) down to the low latencies of Google and Quora I cited, regardless of the simplicity of the search query. An Elasticsearch cluster is a bit faster. These statements obviously depend upon the use case and probably don't generalize well, but they are something to think about.
I need to develop a feature in the system which allows unregistered users to get one-off system access via URL token that is generated/sent by an authenticated user.
For example, a user logs in and wants to share a piece of information so the system generates a URL like http://host/page?token=jkb345k4b5234k54kh5345kb34kb34. Then this URL is sent to an unregistered user who would follow the URL to get some limited access to normally protected data.
First question: are there any standards (RFC? IETF? others?) that define this kind of URL token generation? The only ones I was able to find are RFC 2289 and OpenToken, but neither of these is directly related to what I need to do, and the latter is only in a second draft state.
There is another design consideration: whether to use a one-way cryptographic hash function and store the payload in a local data store, versus using a private/public key pair and encoding all the necessary payload in the unique string itself.
At the moment I am heavily leaning towards the one-way hash, as it would give me much more freedom (no dependency between payload size and generated string length) and fewer potential problems in the future (e.g., what if I decide to add more payload - how do I ensure backwards compatibility?). Last but not least, accidental exposure of the server-side private key would require massive effort in key regeneration, updating all live instances, etc. None of these problems is relevant when choosing the one-way hash option, but maybe there's something I'm overlooking? RFC 2289 prefers a one-way crypto function, whereas OpenToken chooses the key-pair option.
And finally, is anybody aware of any Java library for generating these?
Thanks in advance.
Also have a look at http://en.wikipedia.org/wiki/Universally_unique_identifier and RFC 4122. In the backend you would need to attach the generated UUID to your entity, so that verification based on the UUID can be done later.
Apart from that, the token often includes some data (e.g., versioning + user data), and then a secure MD5 hash is used to 'obfuscate/anonymize' it. Later the data is concatenated by the server and the hash values are compared again.
Regarding a Java lib and UUIDs, have a look at the UUID Javadoc.
Generate random strings and store them in a database with credentials.
The codes generated need to have two properties: complexity and uniqueness. Complexity ensures that they cannot be guessed and uniqueness ensures that the same code can never be generated twice. Beyond this, the specific method doesn't matter.
Generate token strings with two parts. The first part is time-dependent, incrementing and changing in a predictable way with each millisecond. The second part is completely random. Combined, this will give you a long string that is unique and complex.
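A minimal sketch of that two-part generation, using SecureRandom and a URL-safe Base64 encoding (the exact lengths and the separator are arbitrary choices for illustration):

import java.security.SecureRandom;
import java.util.Base64;

class TokenGenerator {
    private static final SecureRandom RANDOM = new SecureRandom();

    static String newToken() {
        byte[] randomPart = new byte[24];                 // 192 bits of unguessable randomness
        RANDOM.nextBytes(randomPart);
        String timePart = Long.toString(System.currentTimeMillis(), 36);   // changes every millisecond
        return timePart + "-" + Base64.getUrlEncoder().withoutPadding().encodeToString(randomPart);
    }
}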
When you generate the token, store it in the database with the credentials that are granted when this token is used. It's important that these credentials are not encoded into the string, since this ensures that the strings cannot be hacked.
When the user clicks on the link with the token, mark that token as used in the database. Even better is to set a timestamp for the use, so that it can be expired, perhaps, 24 hours after the first click. This approach gives you the flexibility to implement this specific part of the requirement as necessary for your project.
I've used this solution before in many different cases, not only for one-off system access but also for ticket admission codes, gift certificate codes, and anything that's one-time use. It doesn't matter so much what you use to generate the token, as long as you can guarantee its complexity and uniqueness.
Here's how I would have done it (a rough sketch follows the steps):
Create a token (you could use a UUID for this) and add it to your database along with creation time and what resource the token should grant access to
Send an email to the user with the url http://www.myserver.com/page?token=
When the user navigates to the URL, create a new session with the desired timeout and mark that session as authorized to view whatever the database says the user should be able to see (provided the token isn't too old - check the creation time against the current time)
Either delete the token from the database, or mark it as expired
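And here is the promised sketch of those steps, with an in-memory map standing in for the database table (the class and method names are made up for illustration):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class OneOffTokens {
    private static final class Entry {
        final String resourceId;
        final Instant createdAt;
        boolean used;
        Entry(String resourceId, Instant createdAt) { this.resourceId = resourceId; this.createdAt = createdAt; }
    }

    private final Map<String, Entry> tokens = new ConcurrentHashMap<>();

    // Steps 1 and 2: create the token, remember what it grants, and build the link to email.
    String issue(String resourceId) {
        String token = UUID.randomUUID().toString();
        tokens.put(token, new Entry(resourceId, Instant.now()));
        return "http://www.myserver.com/page?token=" + token;
    }

    // Steps 3 and 4: check the age, grant access once, then expire the token.
    String redeem(String token, Duration maxAge) {
        Entry entry = tokens.get(token);
        if (entry == null || entry.used) return null;
        if (Instant.now().isAfter(entry.createdAt.plus(maxAge))) return null;   // token too old
        entry.used = true;                                                      // or tokens.remove(token)
        return entry.resourceId;                                                // what the session may view
    }
}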
You only need a token when a user shares one piece of information. So can't you just generate a random token and associate it with that piece of information (e.g., in a database field)? It's a lot simpler than doing any crypto stuff...