I'm building an application that is a kind of registry. Think of a dictionary: you look up a word and it returns something if the word is found. Now, that registry is going to store valuable information about companies, and some people could be tempted to grab the complete listing. My application uses EJB 3.0 beans that reply to web service calls.
So I was thinking about permitting a maximum of 10 queries per IP address per day, storing the IP address and a counter in a table that would be emptied by a script every night.
Is it a good idea/practice to do so? If yes, how can I get the IP address on the EJB side?
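From what I've seen, something like this might work for getting the IP, but I'm not sure it's the right approach (QueryCounterDao is a placeholder I made up for the counter table):

```java
import javax.annotation.Resource;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.jws.WebService;
import javax.servlet.http.HttpServletRequest;
import javax.xml.ws.WebServiceContext;
import javax.xml.ws.handler.MessageContext;

// Hypothetical DAO: bumps the counter row for this IP and returns the new value.
interface QueryCounterDao {
    int incrementAndGet(String ip);
}

@Stateless
@WebService
public class RegistryService {

    @Resource
    private WebServiceContext wsContext;

    @EJB
    private QueryCounterDao counterDao;

    public String lookup(String companyName) {
        // When the endpoint is served over HTTP, the servlet request is
        // reachable through the JAX-WS message context.
        MessageContext mc = wsContext.getMessageContext();
        HttpServletRequest req =
                (HttpServletRequest) mc.get(MessageContext.SERVLET_REQUEST);
        String ip = req.getRemoteAddr();

        // Reject the call once the daily quota for this IP is used up.
        if (counterDao.incrementAndGet(ip) > 10) {
            throw new IllegalStateException("Daily query limit reached");
        }
        return doLookup(companyName);
    }

    private String doLookup(String companyName) {
        return null; // actual registry lookup elided
    }
}
```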
Is there a better way to prevent someone from getting all the data out of my database?
I've also thought about CAPTCHAs, but I think they're a pain for the user, and sometimes they are difficult to read even for real humans.
I hope it's all clear, since I'm not a native English speaker...
Thanks
Alain
I'd say the limit of 10 queries per day per IP is not very good. Take into account that many people may share the same public IP.
Although it's not 100% accurate, you could analyze whether an unusual number of requests is coming from the same IP in a short period of time. When that alarm sounds, you show a CAPTCHA.
An alternative is to put a unique request-based token in a hidden field of the form, store it in the session scope, and then compare the two on submit of the form. That would filter out the bots which don't maintain the session, and those are already quite a lot of them.
To go a step further, you could add a timestamp to the request-based token and then check whether the form is submitted within a reasonable time, e.g. no less than 5 seconds (roughly the fastest a normal human can fill in and submit the form). That would filter out another class of bots, which usually fill and submit the form instantly, in under a second. Another advantage is that a very smart bot is then forced to take it easy instead of firing lots of subsequent requests.
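A rough sketch of both ideas combined, as plain servlet-side helpers (the attribute and parameter names are illustrative, not a drop-in implementation):

```java
import java.util.UUID;
import javax.servlet.http.HttpServletRequest;

public class FormTokenHelper {

    // On rendering the form: store a fresh token plus its creation time in
    // the session, and emit the token into a hidden field.
    public static String issueToken(HttpServletRequest request) {
        String token = UUID.randomUUID().toString();
        request.getSession().setAttribute("formToken", token);
        request.getSession().setAttribute("formTokenTime", System.currentTimeMillis());
        return token; // render as <input type="hidden" name="token" value="...">
    }

    // On submit: the token must match the one in the session, and the form
    // must not have been submitted faster than a human plausibly could.
    public static boolean isValidSubmit(HttpServletRequest request) {
        String expected = (String) request.getSession().getAttribute("formToken");
        Long issuedAt = (Long) request.getSession().getAttribute("formTokenTime");
        String actual = request.getParameter("token");
        if (expected == null || issuedAt == null || !expected.equals(actual)) {
            return false; // no session or token mismatch: likely a bot
        }
        return System.currentTimeMillis() - issuedAt >= 5000;
    }
}
```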
I would at least not rely on the IP address. It comes with too many external disturbing factors.
I am working on a project where I have to develop an application to validate postal addresses in Germany, Switzerland and Austria. For that I need to set up an address database with all the information, but I don't know where I can get the data. I googled for a long time but didn't find an answer to my problem.
I have 2 questions:
Can I work with the REST service of Nominatim in a production environment? The application will process approximately 300,000 requests a day.
Can I import an OpenStreetMap file (*.osm) into a DB (MySQL) and work with that? Does it contain all the information I need to validate addresses, such as: does the street name exist? Does the house number exist? Does the house number fit the street and town?
Thanks in advance
Achraf
Yes, you can use Nominatim in a production environment, however not OSM's public instance (take a look at the usage policy for the reasons). Just install your own Nominatim instance or use one of the other alternatives.
That depends on the exact information you need. Some address information is contained directly in the data (usually the street name and house number), while other parts need to be calculated first (often the city, municipality, state, post codes etc.), because they are frequently not attached to the address elements directly but to administrative boundary relations instead. Nominatim does all of this processing for you.
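For illustration, a lookup against your own instance could look like this in Java (the host and port are placeholders, and the exact endpoint path depends on your installation; q, format, addressdetails and countrycodes are standard Nominatim search parameters):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public class NominatimClient {
    public static void main(String[] args) throws Exception {
        // Placeholder host: point this at your own Nominatim installation.
        String query = URLEncoder.encode("Unter den Linden 1, Berlin", "UTF-8");
        URL url = new URL("http://localhost:8080/search?q=" + query
                + "&format=json&addressdetails=1&countrycodes=de,ch,at");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of matching addresses
            }
        }
    }
}
```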
Also take a look at other OSM search engines. And remember that OSM doesn't contain every possible address.
I have a little GAE application, a backend for my Android app.
I have a servlet in the app that pulls data from the datastore and sends it to the user.
I don't want just anyone to be able to use this servlet, so I store a private key in the app, and with every request I send a token: a hash string of the private key and the current milliseconds, plus the milliseconds I used in the hash.
The server takes the milliseconds and the private key, hashes them, and compares the result with the token. If it matches, the server stores the milliseconds in a HashSet so it knows not to accept them again. (Someone could sniff the device traffic and send the same milliseconds and token over and over again.)
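For clarity, the token is computed roughly like this (a simplified sketch of my scheme, assuming SHA-256):

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Set;

public class TokenUtil {

    // Client side: hash the shared private key together with the current
    // milliseconds; both the token and the milliseconds are sent along.
    public static String token(String privateKey, long millis) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest((privateKey + ":" + millis).getBytes("UTF-8"));
        return String.format("%064x", new BigInteger(1, digest));
    }

    // Server side: recompute and compare, and reject replayed milliseconds.
    public static boolean verify(String privateKey, long millis, String token,
                                 Set<Long> seenMillis) throws Exception {
        return !seenMillis.contains(millis)
                && token(privateKey, millis).equals(token)
                && seenMillis.add(millis);
    }
}
```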
At first I held a static field in the Servlet class, which I later discovered was a mistake, because that field is not persisted and all the data is lost when the instance gets destroyed.
I've read about Memcache, but it's not an optimal solution because, from what I understand, the data in Memcache can get erased if the app is low on memory, or even if there are server failures.
I don't want to use the datastore because it will make the requests much slower.
I guess I'm not the first to face this problem.
How can I solve it?
I used a reverse approach in one of my apps:
Whenever a new client connects, I generate a set of three random "challenges" on the server (like your milliseconds), which I store in memcache with an expiration time of a minute or so. Then I send these challenges to the client. For each request the client makes, it needs to use one of these 3 challenges (hashed with a private key). The server then deletes the used challenge, creates a new one and sends it to the client. That way, each challenge is single-use and I don't have to worry about replay attacks.
A couple of notes on this approach (a rough sketch of the server side follows after these notes):
The reason I generate 3 challenges is to allow for multiple requests in flight in parallel.
The longer you make the challenge, the less likely it is to be randomly reused (which would open the door to a replay attack).
If memcache forgets the challenges I stored, the app's request will fail. In the failure response I include a "forget all other challenges and use these 3 new ones: ..." command.
You can tie the challenges to the client's IP address or some other sort of session info to make it even less likely that someone can "hack" you.
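Here is that rough server-side sketch, using the App Engine memcache API (client identification and error handling are omitted):

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import com.google.appengine.api.memcache.Expiration;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ChallengeIssuer {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final MemcacheService cache = MemcacheServiceFactory.getMemcacheService();

    // Issue a fresh random challenge for this client and remember it for a minute.
    public String issueChallenge(String clientId) {
        String challenge = new BigInteger(130, RANDOM).toString(32);
        cache.put(clientId + ":" + challenge, Boolean.TRUE,
                Expiration.byDeltaSeconds(60));
        return challenge;
    }

    // A challenge is valid exactly once: delete() returns true only if the
    // entry still existed, so concurrent replays lose the race.
    public boolean consumeChallenge(String clientId, String challenge) {
        return cache.delete(clientId + ":" + challenge);
    }
}
```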
In general, it's probably always best to have the server generate the challenge or salt for an authentication, rather than giving that flexibility to the client.
Another approach you could use, if you would like to stick with a timestamp, is to use the first request interchange to determine the time offset between your server instance and your client device, and then only accept requests with a "current" timestamp. For this you would need to determine the uncertainty with which you can measure that offset, and use it as the cutoff for deciding whether a timestamp is current. To prevent replay attacks within the cutoff period, you might need to save and disallow the last couple of timestamps used. You can probably do that inside your instance, since AppEngine, AFAIK, routes requests from the same client preferentially to the same instance. Then, as long as shutting down an instance and starting a new one (i.e. clearing your disallow cache) takes longer than your "current" cutoff, you shouldn't have too many issues with replay attacks.
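A sketch of that in-instance check, assuming the offset has already been measured during the first interchange (the cutoff and cache size are illustrative):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class TimestampValidator {
    private static final long MAX_SKEW_MS = 5000; // uncertainty of the offset measurement
    private final long clientOffsetMs;            // measured during the first interchange

    // Remember the last few accepted timestamps to block replays in the window.
    private final Set<Long> recent = Collections.newSetFromMap(
            new LinkedHashMap<Long, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > 100;
                }
            });

    public TimestampValidator(long clientOffsetMs) {
        this.clientOffsetMs = clientOffsetMs;
    }

    public synchronized boolean accept(long clientTimestamp) {
        long expected = System.currentTimeMillis() + clientOffsetMs;
        if (Math.abs(clientTimestamp - expected) > MAX_SKEW_MS) {
            return false; // not "current"
        }
        return recent.add(clientTimestamp); // false if already seen: a replay
    }
}
```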
I will construct a fictional app in order to frame my question.
Say I write a treasure-hunt app where the user gets a prize if they visit several locations around town. In effect, the app would get their current lat/lon and check its proximity to the list of "treasure locations"; if they are within 10 meters of any treasure location, they get a notification.
The app will then do an HTTP POST to a remote script which basically inserts into a database. The POST parameters will be the UUID of the device and the location they visited.
An attacker could easily watch Wireshark and get the name of the script along with the parameters. They could go further, decompile the APK and dig out other things, such as any hashing/obfuscation. They could then just use curl to post willy-nilly as they pleased, and the game would be ruined for non-cheaters. This is a problem I have never really had to address, since in all the apps I have written the data isn't sensitive and I don't mind it being exposed to the public.
What do I do?
The best thing you could do is to send the data in a secure manner. Using HTTPS would be a much better choice, regardless of method; it effectively prevents eavesdroppers and is the fundamental technology behind any secure communication on the internet.
Aside from the protocol used to communicate with the server, there are still insecurities. Essentially, there are three methods that could work to overcome them.
The location of the player could be sent to the server at some periodic interval, and the server responds if they are close enough to one of the areas. Perhaps the server could even include enough smarts to know that it takes time to get from point A to point B.
A single location could be sent to the app at a time. The track of the user could also be uploaded, to verify that the location is correct.
The locations could be sent to the program through a one-way function. The real answer would then be sent to the server. The problem with this is that the exact location would need to be discovered in order to reproduce the same hash. However, since GPS coordinates tend to be accurate only to a few meters, and don't tend to carry many insignificant digits, multiple candidate values near the current location could be tested. The one-way function would have to take some time to calculate for this to be effective, as otherwise it would be trivial for a bad guy to simply test every square meter in the city to figure out what would work.
The best method from a security standpoint would be the first, since at no time does the application know where it is supposed to go until it reaches that location. Of course, this pings the server a large number of times needlessly.
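To illustrate the first method, a minimal sketch of the server-side proximity check; the treasure list never leaves the server (the 10 m radius comes from the question, everything else is illustrative):

```java
public class TreasureChecker {
    private static final double EARTH_RADIUS_M = 6371000.0;

    // Haversine distance between two lat/lon points, in meters.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // The client only reports a position; the server alone decides whether
    // it is within 10 m of any secret treasure location.
    static boolean nearTreasure(double lat, double lon, double[][] treasures) {
        for (double[] t : treasures) {
            if (distanceMeters(lat, lon, t[0], t[1]) <= 10.0) {
                return true;
            }
        }
        return false;
    }
}
```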
I am responsible for the network part of a multiplayer game.
I hope some of you have experience with that.
My questions are:
Should I create one object which contains all the information (coordinates, stats, chat), or is it better to send a separate object for each of them?
And how can I avoid the object(s) being cached at the client, so that I can update an object and send it again? (I tried ObjectInputStream.reset() but it still received the same object.)
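From what I've read, the caching might actually happen on the sending side's ObjectOutputStream, which writes back-references for objects it has already serialized, so something like this may be what I need (a sketch, not verified):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.net.Socket;

public class StateSender {
    private final ObjectOutputStream out;

    public StateSender(Socket socket) throws IOException {
        out = new ObjectOutputStream(socket.getOutputStream());
    }

    // ObjectOutputStream remembers every object it has written and sends a
    // back-reference on repeats, so a mutated object arrives "unchanged".
    // reset() clears that handle table before each fresh snapshot.
    public void sendState(Object state) throws IOException {
        out.reset();
        out.writeObject(state);
        out.flush();
    }
}
```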
(Sorry for my bad English ;))
Sending all the data every time is not a good solution; sending just the diff against the previous values can be better. Sometimes (e.g. once every 10, or maybe 100, updates) send all values to resync.
1. In the logic layer you can split the objects, and in the transmission layer you send what you want; of course you can also combine them and send them together.
2. You can maintain a version for each user, and the client also keeps that version number. When things change, update the corresponding version on the server and then send the updates to all the clients, which then update their version. It would essentially be a subscribe model.
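A tiny sketch of what that version bookkeeping could look like on the server (all names are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StateVersions {
    // Per-user state version, bumped on every change.
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    // Called whenever a user's state changes; subscribers are then notified.
    public void onStateChanged(String userId) {
        versions.merge(userId, 1L, Long::sum);
    }

    // The client sends the version it last saw; only newer state is resent.
    public boolean needsUpdate(String userId, long clientVersion) {
        return versions.getOrDefault(userId, 0L) > clientVersion;
    }
}
```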
I have a database full of two different types of users (Mentors and Mentees), where I want the second group (Mentees) to be able to "search" for people in the first group (Mentors) who match their profile. Mentors and Mentees can both go in and change items in their profile at any point in time.
Currently, I am using Apache Mahout for the user matching (recommender.mostSimilarIDs()). The problem I'm running into is that I have to reload the user data every single time anyone searches. By itself, this doesn't take that long, but when Mahout processes the data it seems to take a very long time (14 minutes for 3000 Mentors and 3000 Mentees). After processing, matching takes mere seconds. I also get the same INFO message over and over again while it's processing ("Processed 2248 users"), even though looking at the code suggests the message should only be output every 10000 users.
I'm using the GenericUserBasedRecommender and the GenericDataModel, along with the NearestNUserNeighborhood, AveragingPreferenceInferrer and PearsonCorrelationSimilarity. I load mentors from the database, add the mentee to the list of POJOs and convert them to a FastByIDMap to give to the DataModel.
Is there a better way to be doing this? The product owner needs the data to be current for every search.
(I'm the author.)
You shouldn't need to ask it to reload the data every time; why are you doing that?
14 minutes sounds way, way too long to load such a small amount of data; something's wrong. You might follow up with more info at user@mahout.apache.org.
You are seeing log messages from a DataModel, which you can disable in your logging system of choice. It prints one final count. This is nothing to worry about.
I would advise you against using a PreferenceInferrer unless you absolutely know you want it. Do you actually have ratings here? I might suggest LogLikelihoodSimilarity if not.
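For reference, the setup I'm suggesting, without a PreferenceInferrer, might look roughly like this (a sketch against the Taste API; loading userData from your database is elided, and the neighborhood size is illustrative):

```java
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class MentorMatcher {
    public static long[] similarUsers(FastByIDMap<PreferenceArray> userData,
                                      long menteeId, int howMany) throws Exception {
        DataModel model = new GenericDataModel(userData);
        // LogLikelihoodSimilarity works without ratings and needs no inferrer.
        UserSimilarity similarity = new LogLikelihoodSimilarity(model);
        UserNeighborhood neighborhood =
                new NearestNUserNeighborhood(25, similarity, model);
        UserBasedRecommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);
        return recommender.mostSimilarUserIDs(menteeId, howMany);
    }
}
```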