OpenStreetMap: how to validate postal addresses - Java

I am working on a project in which I have to develop an application that validates postal addresses in Germany, Switzerland and Austria. For that I need to set up an address database with all the information, but I don't know where I can get the data. I googled for a long time but didn't find an answer to my problem.
I have 2 questions:
1. Can I work with the REST service of Nominatim in a production environment? The application will process roughly 300,000 or more requests a day.
2. Can I import an OpenStreetMap file (*.osm) into a DB (MySQL) and work with that? Does it contain all the information I need to validate addresses, such as: does the street name exist? Does the house number exist? Do the street, house number and town fit together?
Thanks in advance
Achraf

1. Yes, you can use Nominatim in a production environment, but not OSM's public instance (take a look at the usage policy for the reasons). Just install your own Nominatim instance or use one of the other alternatives.
2. That depends on the exact information you need. Some address information is contained directly in the data (usually the street address and house number); other parts need to be calculated first (often the city, municipality, state, post code, etc.) because they are often not attached to the address elements directly but to administrative boundary relations instead. Nominatim does all of this processing for you.
Also take a look at other OSM search engines. And remember that OSM doesn't contain every possible address.
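As a rough illustration of the first point, a structured lookup against a self-hosted Nominatim instance could be assembled like this in Java. The base URL http://localhost:8080 and the class and method names are assumptions made for this sketch; street, city, postalcode, country, format, addressdetails and limit are query parameters of Nominatim's /search endpoint. Only the URL is built here; sending the request and parsing the JSON response are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class NominatimQuery {
    // Hypothetical base URL of a self-hosted Nominatim instance (assumption).
    static final String BASE_URL = "http://localhost:8080";

    // Builds a structured /search URL; Nominatim expects the street
    // parameter in the form "<housenumber> <streetname>".
    static String buildSearchUrl(String houseNumberAndStreet, String city,
                                 String postalCode, String country) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("street", houseNumberAndStreet);
        params.put("city", city);
        params.put("postalcode", postalCode);
        params.put("country", country);
        params.put("format", "json");
        params.put("addressdetails", "1"); // ask for the address broken into parts
        params.put("limit", "1");

        // URL-encode every value and join the pairs with '&'.
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return BASE_URL + "/search?" + query;
    }
}
```

An empty result for such a query only means that OSM does not know the address, not necessarily that the address is invalid, so the outcome is probably best treated as "confirmed" versus "unknown".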

Related

Google Places API - saving place_id and violation of terms and conditions

I want to build an app which shows places around the user, using Google Places, based on the user's interests. As mentioned here:
Place IDs are exempt from the caching restrictions stated in Section
10.5.d of the Google Maps APIs Terms of Service. You can therefore store place ID values indefinitely.
So, can I save place_id values in a cloud database and run analytics over them? For example, if I gather the place_ids added to each user's favorite places table, can I use analytics to find out which place_ids are added to favorites most often? Or can I show something like 'Trending Places' in the app based on place_ids gathered from responses?
Would that violate the terms and conditions? I read the whole terms page but couldn't find the answer.
Can anyone help me out? Thanks.
Yes, you can 100% store the place_id indefinitely and reuse it.
See Referencing a Place with a Place ID.
Please note one thing:
A single place ID refers to only one place, but a place can have
multiple place IDs
The terms and conditions are largely self-explanatory; your specific case becomes clearer once you read the section quoted below carefully. For your requirement, storing results so that a query the user has already made does not trigger another service call, with the intention of saving network calls, is acceptable.
No caching or storage: You will not pre-fetch, cache, index, or store any Content to be used outside the Service, except that you may store limited amounts of Content solely for the purpose of improving the performance of your Maps API Implementation due to network latency (and not for the purpose of preventing Google from accurately tracking usage), and only if such storage
1) is temporary (and in no event more than 30 calendar days),
2) is secure,
3) does not manipulate or aggregate any part of the Content or Service, and
4) does not modify attribution in any way.
Go through Section 10.5, Intellectual Property Restrictions, subsection (B).
You'll need to contact Google to get a 100% answer.
That being said, from my experience it looks like the clause you included is intended exactly for the kind of thing you want to do.
Again, I want to reiterate that contacting Google directly is something you should do if you still have concerns.
You can store place ID values indefinitely.
Just what part of
You can therefore store place ID values indefinitely.
don't you understand?
Indefinitely requires a server.

Track the exact location of IP address

I am trying to create an app that finds the exact location of an IP address. I did some research on IP addresses and related topics, but whenever I try to locate an IP address, I get the location of the Internet Service Provider. I want to find the exact geolocation (longitude and latitude) of the place itself. Can anyone tell me how to find the geolocation of a dynamic IP address?
The information upon which such a service might (hypothetically) be based is not available. Ergo, the ISP level information is about as good as you are likely to get for wired IP addresses.
@salocinix wrote:
"The exact position of an end-user's IP is only stored in the ISP's database and is normally not given away."
The second part is definitely true. They don't and shouldn't give away details about their customers' physical locations. There are obvious privacy concerns with doing that.
But it is quite possible that the ISP doesn't store the customer's physical location at all. Certainly, there is no need for them to store it in the form of longitude and latitude. Whether they need to store it depends on who owns the wires. In Australia for instance, many customers' ISPs don't own the wires that carry the traffic to the customers' dwelling. In that case, the ISP (in theory) only needs to know the billing address for the customer. And then there is the case of ISPs who sell internet connectivity for mobile devices ... where the physical location of a given IP address can change on a minute-by-minute basis.
... just asking because Google Maps shows the exact location of my PC on the map; how does it work?
The PC is most likely geolocating itself via a combination of GPS and triangulation of local wireless base stations.
The exact position of an end-user's IP is only stored in the ISP's database and is normally not given away by the ISP. Try out the following link; you won't achieve much more precision.
http://www.iplocation.net/index.php
I urge you to read the following thread on NANOG, which was written by Fred Baker of Cisco, author of 50 network-related RFCs.
Well, let me ask you who you think 171.70.120.60 is. I'll give you a hint;
at this instant, there are 72 of us.
Here's another question. Whom would you suspect 171.71.241.89 is? At
this point in time, I am in Barcelona; if I were home, that would be my
address as you would see it, but my address as I would see it would be
in 10.32.244.216/29. There might be several hundred people you would
see using 171.71.241.89;
Geolocating is gimmicky at best.

Prevent bots from querying my database repeatedly

I'm building an application that is a kind of registry. Think of a dictionary: you look up a word and it returns something if the word is found. This registry is going to store valuable information about companies, and some people could be tempted to extract the complete listing. My application uses EJB 3.0 beans that answer web service (WS) requests.
So I was thinking about permitting a maximum of 10 queries per IP address per day, storing the IP address and a counter in a table that would be emptied by a script every night.
Is it a good idea/practice to do so? If yes, how can I get the IP address on the EJB side?
Is there a better way to prevent someone from getting all the data from my database?
I've also thought about CAPTCHA, but I think it's a pain for the user, and sometimes they are difficult to read even for a real human.
I hope it's all clear, since I'm not a native English speaker...
Thanks
Alain
I'd say a limit of 10 queries per day per IP is not very good. Take into account that many people may share the same public IP.
Although it's not 100% accurate, you could check whether an unusual number of requests is coming from the same IP in a short period of time. When that alarm goes off, you show a CAPTCHA.
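The "unusual number of requests in a short period" check could be sketched as a sliding-window counter per IP. This is a minimal in-memory sketch, not a production design: the class name BurstDetector, its parameters and the thresholds in the usage note are assumptions, and a real deployment would also need to evict idle IPs and probably share state between servers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class BurstDetector {
    private final int maxRequests;   // requests tolerated per window
    private final long windowMillis; // length of the sliding window
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    BurstDetector(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Records one request and returns true when the number of requests
    // from this IP inside the window exceeds the limit, i.e. when the
    // caller should be shown a CAPTCHA.
    synchronized boolean recordAndCheck(String ip, long nowMillis) {
        Deque<Long> times = recent.computeIfAbsent(ip, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() > maxRequests;
    }
}
```

For example, new BurstDetector(20, 60_000) would flag an IP on its 21st request within one minute; both numbers are placeholders to be tuned against real traffic.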
An alternative is to put a unique, request-based token in a hidden field of the form, store it in the session scope, and compare the two when the form is submitted. That filters out the bots that don't maintain the session, which is already a large share of them.
To go a step further, you could add a timestamp to the request-based token and check that the form was not submitted faster than a reasonable minimum time, e.g. 5 seconds (roughly the fastest a normal human can fill in and submit the form). That filters out further bots, which usually fill in and submit the form in under a second. Another advantage is that even a very smart bot is then forced to slow down when firing many subsequent requests.
In any case, I would not rely on the IP address. It is subject to too many disturbing external factors.
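The token-plus-timestamp idea could look roughly like the following. This is a sketch under assumptions: the class name FormToken and its methods are made up for illustration, and the session handling (storing the token when the form is rendered, reading it back on submit) is left to whatever web framework is in use.

```java
import java.security.SecureRandom;
import java.util.Base64;

class FormToken {
    // Minimum time a human plausibly needs to fill in the form (5 s, as suggested above).
    static final long MIN_FILL_MILLIS = 5_000;
    private static final SecureRandom RANDOM = new SecureRandom();

    final String value;        // goes into the hidden form field
    final long issuedAtMillis; // kept server-side in the session

    private FormToken(String value, long issuedAtMillis) {
        this.value = value;
        this.issuedAtMillis = issuedAtMillis;
    }

    // Creates a fresh random token when the form is rendered.
    static FormToken issue(long nowMillis) {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        String value = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        return new FormToken(value, nowMillis);
    }

    // Accepts a submission only if the hidden field matches the session
    // token and the form was not submitted implausibly fast.
    boolean accepts(String submittedValue, long nowMillis) {
        boolean sameToken = value.equals(submittedValue);
        boolean slowEnough = nowMillis - issuedAtMillis >= MIN_FILL_MILLIS;
        return sameToken && slowEnough;
    }
}
```

The token should be discarded after one use, so that a bot cannot replay a single valid token for many requests.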

Java Compare Addresses

Does anyone know a library to compare addresses in Java?
Something that would treat addresses written in different ways as equal.
For example, it should recognize that
"22 Acacia Avenue" and "22 acacia av."
are the same address.
Of course, this can escalate a lot; that's why I'm asking.
Thanks in advance.
You should check out this question:
Where is a good Address Parser
The only way to truly and accurately compare addresses is to ensure that both are standardized and certified. Within the USA, you can leverage the 12-digit delivery point barcode on a certified address, which can serve as a unique identifier for a given address. Beyond that, there's not much else you can do, because addresses are not self-validating and can be written in countless different ways. Even complex regexes don't help. And don't get me started on how people spell streets and cities incorrectly.
I should mention that I'm the founder of SmartyStreets. We have a CASS-certified address verification service which allows you to clean, standardize, verify, and confirm each address which then makes duplicate detection a piece of cake. We offer both batch processing to obtain a CASS-certified list or individual "live" checking via an address verification web service API.
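To make the limits of do-it-yourself comparison concrete, here is a deliberately naive normalization sketch in Java that handles the "22 Acacia Avenue" versus "22 acacia av." example from the question. The class name and the tiny abbreviation table are assumptions for illustration; as the answer above warns, this approach catches only trivial variations and does nothing about misspellings or missing parts.

```java
import java.util.Locale;
import java.util.Map;

class AddressNormalizer {
    // Tiny illustrative abbreviation table (assumption); real data needs far more entries.
    private static final Map<String, String> ABBREVIATIONS = Map.of(
            "av", "avenue",
            "ave", "avenue",
            "st", "street",
            "rd", "road");

    // Lower-cases, strips dots and commas, collapses whitespace and
    // expands the known abbreviations token by token.
    static String normalize(String address) {
        String cleaned = address.toLowerCase(Locale.ROOT)
                .replaceAll("[.,]", " ")
                .trim();
        StringBuilder out = new StringBuilder();
        for (String token : cleaned.split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(ABBREVIATIONS.getOrDefault(token, token));
        }
        return out.toString();
    }

    static boolean sameAddress(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this table, AddressNormalizer.sameAddress("22 Acacia Avenue", "22 acacia av.") is true, while a typo such as "22 Accacia Avenue" already defeats it.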

Designing Address validation for app

I am planning to design address validation for users registering in my app, possibly validating by zip code and state.
Any idea how to handle addresses from around the globe?
Do I need to insert all the zip codes into the database and then validate the address against them? Any suggestions for the implementation?
Thanks and Welcome :)
Krisp
Since there is no international standard for zip codes and a list of all zip codes in the world would be out of date before you were finished putting it together, I suggest a smaller approach:
Identify the countries that you will have to handle most and develop separate validation rules for each of them. Make certain that with this you cover the vast majority of your users (e.g. 95% or 98%). For all the other countries, just accept what they enter without further validation.
There are so many different address formats in the world that it is just not worth the effort (if at all possible) to handle them all.
There is MASSIVE variance among address and postal code formats, such that there is not any "standard" way of doing this. See "Frank's Compulsive Guide to Postal Addresses"...
How much/what kind of validation do you really need? If the user is entering their shipping address, for example, they're more likely than you to know what particular format their local postal/shipping provider needs. Just give them a multiline textarea to enter it. If you need parts of it to calculate shipping costs, request just the information you need (city/country, for example).
Postal Codes can actually be a headache because in some places they can represent very tiny areas as opposed to the US where they often represent relatively large areas (except in a big city where they may represent a few blocks).
Look at Canada: its postal codes can represent very, very tiny areas. Two stores on opposite sides of the street often have different Canadian postal codes. Also, when merging lists of Canadian businesses, it is not uncommon to see the same address with slightly different postal codes. This just indicates that a lot of people get it wrong. I don't know how realistic it is to expect customers to get their exact postal code right.
http://www.columbia.edu/kermit/postal-ca.html
Basically it seems that each apartment or business dwelling may get its own postal code, which would make sense based upon what I have seen with Canadian business addresses.
The other point is that this is just Canada. Each European country will have its own address/postal code, so will Australia, Russia, etc... If you really want to do address verification, this is a major project.
To actually verify the address you need to verify the postal code, city, and street. In the US the census releases the TIGER database files, which often have a list of streets. But for other countries I don't know how you can get a list of streets. It may be best to look into a commercial package (maybe one of the GIS packages, although a lot of them only offer detailed addresses for the US/Canada and sometimes a few European countries).
Perfect address validation can't simply be dropped into an already developed application; validating the zip code / postal code per country is feasible, though.
Please check the regexes in the 'supplementalData.xml' XML file from the source XML files.
By parsing the XML you can find the postal code regular expression for the country code passed at run time and check whether the entered postal code matches it.
I have found another answer on this:
please refer to the Wikipedia link: http://en.wikipedia.org/wiki/List_of_postal_codes.
There you can find the zip code patterns of most countries, from which you can write regexes and store them in a database. That lets you validate zip codes easily and is also an efficient approach!
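A per-country postal code check along these lines could be sketched as follows in Java. The class name is an assumption, and the three patterns are simplified illustrations written for this sketch (they are not copied from supplementalData.xml or from the Wikipedia list); they check only the format, not whether a code is actually assigned. Countries without a rule are accepted unchecked, in line with the earlier suggestion to validate only the countries you handle most.

```java
import java.util.Map;
import java.util.regex.Pattern;

class PostalCodeCheck {
    // Simplified format patterns for a few countries (illustrative assumptions).
    private static final Map<String, Pattern> PATTERNS = Map.of(
            "DE", Pattern.compile("\\d{5}"),                      // e.g. 10115
            "US", Pattern.compile("\\d{5}(-\\d{4})?"),            // ZIP or ZIP+4
            "CA", Pattern.compile("[A-Z]\\d[A-Z] ?\\d[A-Z]\\d")); // e.g. K1A 0B1

    // Returns true if the code matches the country's pattern, or if no
    // pattern is known for that country (accepted without validation).
    static boolean isPlausible(String countryCode, String postalCode) {
        Pattern pattern = PATTERNS.get(countryCode);
        if (pattern == null) {
            return true;
        }
        return pattern.matcher(postalCode.trim()).matches();
    }
}
```

In a real application the map would more likely be loaded from a database table or a parsed data file, so that patterns can be corrected without redeploying.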
As many users have mentioned previously, verifying international addresses is basically impossible because there are no standards across countries and many countries don't have the resources for their postal system. Technically speaking, even in the United States, the USPS is struggling.
At a minimum, you can offer address verification on a per-country basis. One of the easiest countries in which to get a lot of coverage is the USA. To do this you need to connect to some kind of address verification web service. There are several companies which have web services for this. One thing to check is that the provider geo-distributes its API, so that outages on their side don't flow back to you and kill your application. Beyond that, just make sure the results are CASS certified.
In the interest of full disclosure, I'm the founder of SmartyStreets. We have an address verification web service API called LiveAddress. You're more than welcome to contact me personally if you have any questions.
