I think I'm facing a paradox here.
What I'm trying to do is this: when I receive or make a call, I have the number, and I need to know whether it's an international number, a local number, etc.
The problem is:
For me to know if a number is international, I need to parse it and check its length, but the length differs from country to country. So should I write a method that parses and recognizes numbers for each country? (Impractical, in my opinion.)
For me to know if it's a local number, I need the area code, so I have to do the same thing: parse the number, check its length, and take the leading digits based on the area code length.
It's kind of hard to find a solution for this. The libphonenumber library offers a lot of useful classes, but the one I thought could help me led me to another paradox.
The method phoneUtil.parse(number, countryAcronym) returns the number with its country code, but here is what it does: if I pass the number with the acronym "US", it returns the number with country code '1'; if I change the acronym to "BR", it changes the number and returns '55', the country code for Brazil. So either way, I need the country acronym based on the number I get.
Example:
Phonenumber.PhoneNumber numberReturned = phoneUtil.parse(phoneNumber, "US");
String formatted = phoneUtil.format(numberReturned, PhoneNumberFormat.INTERNATIONAL);
The code above returns the number with the US country code, but if I change "US" to any other country acronym, it returns the same number with that country's code instead.
I know this library is not supposed to guess which country a number is from (THAT WOULD BE AWESOME!!), but that's what I need.
This is really driving me crazy. I need good advice from the wise mages of SO.
If you could help me reach a good decision, I'd be very thankful.
Thanks.
PS: If you already use libphonenumber and have more experience with it, please point me to the class to use, if there is one capable of solving this problem. =)
1) The second parameter to phoneUtil.parse must match the country you're currently in - it's used if the phone number received does not include an international prefix. That's why you get different results when you change the parameter: the phone number you pass it does not contain such a prefix, so it just uses what you've told it.
Any parsing solution that determines whether a phone number is international needs to be aware of this: depending on the source, even a national number may be represented with the international dialing prefix (usually abstracted as +, since the actual prefix differs between countries, but this is not guaranteed).
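A minimal, stdlib-only Java sketch of point 1, showing why the second argument only matters when the number carries no international prefix. This is not libphonenumber's implementation: the class name, the iddPrefix parameter, and the rule "drop one leading 0 as the trunk prefix" are simplifying assumptions (not every country works that way, which is exactly why libphonenumber keeps per-region metadata).

```java
/**
 * Sketch: a '+' or the caller-supplied IDD prefix (e.g. "00", "011") marks an
 * international number; otherwise one leading trunk '0' is dropped (an assumption
 * that does not hold everywhere) and the default calling code is prepended.
 */
final class DefaultRegionSketch {
    private DefaultRegionSketch() {}

    static String toE164(String raw, String defaultCallingCode, String iddPrefix) {
        String trimmed = raw.trim();
        String digits = trimmed.replaceAll("\\D", "");   // keep digits only
        if (trimmed.startsWith("+")) {
            return "+" + digits;                          // country code is in the number itself
        }
        if (!iddPrefix.isEmpty() && digits.startsWith(iddPrefix)) {
            return "+" + digits.substring(iddPrefix.length()); // dialed with an IDD prefix
        }
        // No prefix: the number is national, so the default region decides the country.
        String national = digits.startsWith("0") ? digits.substring(1) : digits;
        return "+" + defaultCallingCode + national;
    }

    public static void main(String[] args) {
        System.out.println(toE164("+94 71 259 3276", "1", "011")); // +94712593276
        System.out.println(toE164("202-555-0123", "1", "011"));    // +12025550123
        System.out.println(toE164("202-555-0123", "55", "00"));    // +552025550123
    }
}
```

The last two calls mirror the behavior described in the question: the same national digits come back with 1 or 55 depending only on the default passed in.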
2) For area code parsing, there is no universal standard; some countries don't use them, and even within a country, area codes may have differing lengths (e.g. Germany). I'm not aware of an international library for this - and a quick search doesn't find anything (though that doesn't mean one does not exist). You might need to roll your own here; if you only need to support a single country, this shouldn't be too hard.
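If you do roll your own for a single country, one common approach is a longest-prefix match against a table of that country's area codes, which copes with codes of different lengths. The sketch below assumes a small, illustrative subset of German area codes written without the trunk 0; the table, the class name, and the 5-digit upper bound are assumptions, not an official list.

```java
import java.util.Map;
import java.util.Optional;

/** Longest-prefix area code lookup for one country (sketch with an illustrative table). */
final class AreaCodeSketch {
    // Illustrative subset of German area codes, without the trunk '0'; not complete.
    private static final Map<String, String> AREA_CODES = Map.of(
            "30", "Berlin",
            "40", "Hamburg",
            "69", "Frankfurt am Main",
            "89", "Munich",
            "221", "Cologne",
            "331", "Potsdam");
    private static final int MAX_AREA_CODE_LENGTH = 5; // assumed upper bound

    private AreaCodeSketch() {}

    /** Returns the area code at the start of a national number (digits only, no leading 0). */
    static Optional<String> areaCodeOf(String nationalNumber) {
        // Try the longest candidate first so a short code never shadows a longer one.
        for (int len = Math.min(MAX_AREA_CODE_LENGTH, nationalNumber.length()); len >= 1; len--) {
            String candidate = nationalNumber.substring(0, len);
            if (AREA_CODES.containsKey(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(areaCodeOf("301234567"));  // Optional[30]
        System.out.println(areaCodeOf("2211234567")); // Optional[221]
    }
}
```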
In my application, I search for a user by their phone number on a server.
I have run into the following issue: in my country, a phone number can be written as '8xxxxxxxx', '+7xxxxxxxx', or '7xxxxxxxx'.
So I need to perform the search in a way that still finds a number written in a different notation.
But how does this work in other countries? How are numbers written there? Is there a way to perform a reliable search?
You'll probably want to look at the E.164 Standard Format.
E.164 organizes phone numbers with all the necessary localization information in an easily human-readable and machine-parseable format. It defines a simple format for unambiguously storing phone numbers in an easily readable string. The string starts with a + sign, followed by the country code and the “subscriber” number, which is the phone number without any context prefixes such as local dialing codes, international dialing codes, or formatting.
Numbers stored as E.164 can easily be parsed, formatted and displayed in the appropriate context, since the context of a phone number can greatly affect its format.
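As a sketch of how this answers the search question: normalize every stored number and every search input to one E.164 key, then compare with plain equality. The code below handles only the three Russian notations from the question and assumes a 10-digit national number (the question's x placeholders are taken as shorthand for that); the class and method names are hypothetical. For other countries, libphonenumber's E164 output format does the general-purpose version of this.

```java
/**
 * Sketch: maps '8xxxxxxxxxx', '+7xxxxxxxxxx' and '7xxxxxxxxxx' to one E.164 key.
 * Assumes Russian numbers with a 10-digit national part, '8' as the domestic
 * trunk prefix and '7' as the country code. Not suitable for foreign numbers.
 */
final class RuPhoneKey {
    private RuPhoneKey() {}

    static String toE164(String raw) {
        String digits = raw.replaceAll("\\D", ""); // drop '+', spaces, dashes, parentheses
        if (digits.length() == 11 && (digits.charAt(0) == '8' || digits.charAt(0) == '7')) {
            return "+7" + digits.substring(1);
        }
        if (digits.length() == 10) {
            return "+7" + digits;                   // bare national number, assumed Russian
        }
        throw new IllegalArgumentException("Unrecognized number: " + raw);
    }

    public static void main(String[] args) {
        System.out.println(toE164("8 916 123-45-67"));    // +79161234567
        System.out.println(toE164("+7 (916) 123 45 67")); // +79161234567
        System.out.println(toE164("79161234567"));        // +79161234567
    }
}
```

Storing the normalized key in its own indexed column keeps the lookup a simple equality query.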
A library that handles this is Google’s libphonenumber. With libphonenumber you can parse, verify, and format phone number inputs quite easily, do as-you-type formatting, and even glean extra information about a number, such as whether it is a mobile or landline number or which state or province it is from. In its basic form, libphonenumber consists of a set of rules and regular expressions in an XML file for breaking down and parsing a number.
As input I am getting an address as a String. It may say something like "123 Fake Street\nLos Angeles, CA 99988". How can I convert this into an object with fields like this:
Address1
Address2
City
State
Zip Code
Or something similar to this? If there is a java library that can do this, all the better.
Unfortunately, I don't have a choice about the String as input. It's part of a specification I'm trying to implement.
The input is not going to be very well structured so the code will need to be very fault tolerant. Also, the addresses could be from all over the world, but 99 out of 100 are probably in the US.
You can use JGeocoder
public static void main(String[] args) {
Map<AddressComponent, String> parsedAddr = AddressParser.parseAddress("Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043");
System.out.println(parsedAddr);
Map<AddressComponent, String> normalizedAddr = AddressStandardizer.normalizeParsedAddress(parsedAddr);
System.out.println(normalizedAddr);
}
Output will be:
{street=Amphitheatre, city=Mountain View, number=1600, zip=94043, state=CA, name=Google Inc, type=Parkway}
{street=AMPHITHEATRE, city=MOUNTAIN VIEW, number=1600, zip=94043, state=CA, name=GOOGLE INC, type=PKWY}
There is another library, International Address Parser; you can check out its trial version. It supports country detection as well.
AddressParser addressParser = AddressParser.getInstance();
AddressStandardizer standardizer = AddressStandardizer.getInstance();//if enabled
AddressFormater formater = AddressFormater.getInstance();
String rawAddress = "101 Avenue des Champs-Elysées 75008 Paris";
//you can try to detect the country
CountryDetector detector = CountryDetector.getInstance();
String countryCode = detector.getCountryCode("7580 Commerce Center Dr ALABAMA");
System.out.println("detected country=" + countryCode);
Also, check the list of implemented countries for this library.
Cheers!
I work at SmartyStreets where we develop address parsing and extraction algorithms.
It's hard.
If most of your addresses are in the US, you can use an address verification service to provide guaranteed accurate parse results (since the addresses are checked against a master list).
There are several providers out there, so take a look around and find one that suits you. Since you probably won't be able to install the database locally (not without a big fee, because address data is licensed by the USPS), look for one that offers a REST endpoint so you can just make an HTTP request. Since it sounds like you have a lot of addresses, make sure the API is high-performing and lets you do batch requests.
For example, with ours:
Input:
13001 Point Richmond Dr NW, Gig Harbor WA
Output:
Or the more specific breakdown of components, if needed:
If the input is even messier, there are a few address extraction services available that can handle a little bit of noise within an address and parse addresses out of text and turn them into their components. (SmartyStreets offers this also, as a beta API. I believe some other NLP services do similar things too.)
Granted, this only works for US addresses. I'm not as much of an expert on UK or Canadian addresses, but I believe they may be slightly simpler in general.
(Beyond a small handful of well-developed countries, international data is really hit-and-miss. Reliable data sets are hard to obtain or don't exist. But if you're on a really tight budget you could write your own parser for all the address formats.)
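If you are sure of the format, you can use a regular expression to pull the parts out of the string. For the example you provided, something like this (use matches() and the capture groups rather than split(), which would treat the whole string as a delimiter and leave nothing):
import java.util.regex.*;
String address = "123 Fake Street\nLos Angeles, CA 99988";
Matcher m = Pattern.compile("(.*)\\n(.*), ([A-Z]{2}) ([0-9]{5})").matcher(address);
if (m.matches()) { String street = m.group(1), city = m.group(2), state = m.group(3), zip = m.group(4); }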
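A slightly more complete, self-contained version of the regex approach, still assuming the fixed "street, line break, City, ST ZIP" layout from the question. The class name, the Address holder, and the optional ZIP+4 group are additions for illustration; anything that does not match returns an empty Optional, so a fallback parser is still needed for messy input.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex parse for one fixed US layout: "street<line break>City, ST 12345[-6789]". */
final class UsAddressRegex {
    // \R accepts both \n and \r\n line breaks; the ZIP+4 suffix is optional.
    private static final Pattern ADDRESS = Pattern.compile(
            "(.+)\\R(.+?),\\s*([A-Z]{2})\\s+(\\d{5}(?:-\\d{4})?)");

    static final class Address {
        final String street;
        final String city;
        final String state;
        final String zip;

        Address(String street, String city, String state, String zip) {
            this.street = street;
            this.city = city;
            this.state = state;
            this.zip = zip;
        }

        @Override
        public String toString() {
            return street + " | " + city + " | " + state + " | " + zip;
        }
    }

    private UsAddressRegex() {}

    static Optional<Address> parse(String text) {
        Matcher m = ADDRESS.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty(); // off-pattern input needs a more tolerant approach
        }
        return Optional.of(new Address(m.group(1).trim(), m.group(2).trim(), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        // prints: 123 Fake Street | Los Angeles | CA | 99988
        System.out.println(parse("123 Fake Street\nLos Angeles, CA 99988").orElse(null));
    }
}
```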
I assume the sequence of information is always the same, i.e. the user will never enter the postal code before the state. If I understood your question correctly, you need logic to process an address that may be incomplete (for example, missing a portion).
One way to do it is to look for portions of the string you know are correct. You can treat the known parts of the address as separators. You will need city and state names and address words (such as "Street", "Avenue", "Road", etc.) in an array.
1) Call indexOf for the cities, states, and address words (and store the results).
2) Take the substring from the start of the string to the index of the street word plus its length, and cut it out as the 1st line of the address.
3) Check the index of the city name (found in step 1). If it's -1, skip this step. If it's 0, take it out (0 also means address line 2 is not in the string). If it's greater than 0, take the substring from the start of the string to the index of the city name as the 2nd line of the address, and cut it out.
4) Check the index of the state name. Again, if it's -1, skip this step. If it's 0, cut it out as the state name.
5) Whatever remains is your postal code.
6) Check the strings you just extracted for leftover separators (commas, dots, newlines, etc.) and strip them.
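The six steps above can be sketched in Java roughly as follows. The vocabularies are tiny hypothetical placeholders (a real version would load full city, state, and street-word lists), and two simplifications are assumptions of this sketch: indexOf is case-sensitive with no word boundaries, and the state is accepted wherever it appears in the remainder rather than only at index 0.

```java
import java.util.List;

/** Sketch of the "known parts as separators" steps with placeholder vocabularies. */
final class SeparatorAddressParser {
    private static final List<String> STREET_WORDS = List.of("Street", "Avenue", "Road", "Drive");
    private static final List<String> CITIES = List.of("Los Angeles", "San Francisco");
    private static final List<String> STATES = List.of("CA", "NY", "WA");

    static final class Parts {
        String line1 = "", line2 = "", city = "", state = "", zip = "";
    }

    private SeparatorAddressParser() {}

    static Parts parse(String input) {
        Parts p = new Parts();
        String rest = input;

        // Step 2: address line 1 runs from the start through the street word.
        int[] hit = earliest(rest, STREET_WORDS);
        if (hit != null) {
            int end = hit[0] + hit[1];
            p.line1 = strip(rest.substring(0, end));
            rest = rest.substring(end);
        }
        // Step 3: text before the city name (if any) is address line 2.
        hit = earliest(rest, CITIES);
        if (hit != null) {
            int end = hit[0] + hit[1];
            p.line2 = strip(rest.substring(0, hit[0]));
            p.city = rest.substring(hit[0], end);
            rest = rest.substring(end);
        }
        // Step 4: cut out the state.
        hit = earliest(rest, STATES);
        if (hit != null) {
            int end = hit[0] + hit[1];
            p.state = rest.substring(hit[0], end);
            rest = rest.substring(end);
        }
        // Steps 5 and 6: the remainder, minus separators, is the postal code.
        p.zip = strip(rest);
        return p;
    }

    /** Returns {index, length} of the earliest vocabulary match, or null if none. */
    private static int[] earliest(String text, List<String> words) {
        int[] best = null;
        for (String w : words) {
            int i = text.indexOf(w);
            if (i >= 0 && (best == null || i < best[0])) {
                best = new int[] {i, w.length()};
            }
        }
        return best;
    }

    /** Removes leading/trailing separators: whitespace (incl. newlines), commas, dots, semicolons. */
    private static String strip(String s) {
        return s.replaceAll("^[\\s,.;]+|[\\s,.;]+$", "");
    }

    public static void main(String[] args) {
        Parts p = parse("123 Fake Street Apt 4\nLos Angeles, CA 99988");
        System.out.println(p.line1 + " / " + p.line2 + " / " + p.city + " / " + p.state + " / " + p.zip);
    }
}
```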
If the address is missing both the state and the city, you would actually need a list of ZIP codes too, so it's better to ensure the user enters at least one of them.
It's not impossible to implement what you need, but you probably don't want to spend all that time on it. It's easier to just ensure the user enters everything correctly.
Maybe you can use regular expressions.
I'm comparing some senses with RitaWordNet and using SenseRelate::AllWords to disambiguate word senses, but I'm stuck. I can't figure out how to compare the output from RitaWordNet with the AllWords script.
Rita gives me the senseid, name, description, and pos/bestPos (adjective, verb, noun, etc.), but not the sense number (#1, #2, #3, ...). The output I get looks like this:
"user","n", "someone who use something..".
SenseRelate::AllWords can't retrieve the description, but its script (wsd.pl) gives me
name#pos#sensenumber ("User#n#1").
That was actually what I was hoping for, but then I realized that Rita (strangely) doesn't support sense numbers.
So now I'm a little stuck on how to compare them, how to determine if they are the same sense. Any ideas on how to solve this?
I realized that WordNet returns its results grouped by part of speech (nouns, verbs, etc.), and within each part of speech the senses come in sense-number order.
Example: "story"
Noun: narrative, narration, story, tale #1
Noun: story #2
Noun: floor, level, storey, story #3
So now I can simply assign sense numbers 1, 2, and 3 to the first three nouns (and so on for the rest). Then I can compare this with user#n#1 from the word sense disambiguation. The creators of RitaWordNet responded that they would like to add sense numbers to their API, but they don't have an upcoming release planned, so this may take a while.
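A sketch of that numbering step in Java. Sense is a hypothetical holder, not a RiTa class; it assumes you copy each result's part of speech and description out of RiTa in the order WordNet returned them. Keys are built in the word#pos#n form that wsd.pl prints, lowercased, so lowercase the wsd.pl output (e.g. User#n#1) before looking it up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Derives WordNet-style sense numbers from list order (sketch). */
final class SenseNumbering {
    /** Hypothetical holder for one looked-up sense, in WordNet's order. */
    static final class Sense {
        final String pos;         // "n", "v", "a", "r"
        final String description;

        Sense(String pos, String description) {
            this.pos = pos;
            this.description = description;
        }
    }

    private SenseNumbering() {}

    /** Numbers senses 1, 2, 3, ... separately within each part of speech. */
    static Map<String, Sense> number(String word, List<Sense> senses) {
        String lemma = word.toLowerCase(Locale.ROOT);
        Map<String, Integer> counters = new HashMap<>();
        Map<String, Sense> byKey = new LinkedHashMap<>();
        for (Sense s : senses) {
            int n = counters.merge(s.pos, 1, Integer::sum); // next number for this pos
            byKey.put(lemma + "#" + s.pos + "#" + n, s);
        }
        return byKey;
    }

    public static void main(String[] args) {
        // Stand-in descriptions taken from the "story" example above.
        Map<String, Sense> keyed = number("story", List.of(
                new Sense("n", "narrative, narration, story, tale"),
                new Sense("n", "story"),
                new Sense("n", "floor, level, storey, story")));
        System.out.println(keyed.get("story#n#3").description); // floor, level, storey, story
    }
}
```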
I need to compare two phone numbers to determine if they're from the same sender/receiver. The user may send a message to a contact, and that contact may reply.
The reply usually comes in +[country-code][area-code-if-any][and-then-the-actual-number] format. For example,
+94 71 2593276 for a Sri Lankan phone number.
And when the user sends a message, he will usually enter in the format (for the above example) 0712593276 (assume he's also in Sri Lanka).
So what I need is to check whether these two numbers are the same. This app will be used internationally, so I can't just replace the first 2 digits with a 0 (that would be a problem for countries like the US). Is there any way to do this in Java, or specifically on Android?
Thanks.
Android has a nice PhoneNumberUtils class, and I guess you're looking for:
public static boolean compare(String a, String b)
Look at:
http://developer.android.com/reference/android/telephony/PhoneNumberUtils.html
Using it should look like this:
import android.telephony.PhoneNumberUtils;
String phone1 = "0712593276";
String phone2 = "+94712593276";
if (PhoneNumberUtils.compare(phone1, phone2)) {
    // they are the same; do whatever you want!
}
The android.telephony.PhoneNumberUtils class provides almost all the functions needed to deal with phone numbers and standards.
For your case, the solution is PhoneNumberUtils.compare(Context context, String number1, String number2) or else PhoneNumberUtils.compare(String number1, String number2). The former checks a resource to determine whether to use a strict or loose comparison algorithm, which makes it the better choice in most cases.
PhoneNumberUtils.compare("0712593276", "+94712593276") // always true
PhoneNumberUtils.compare("0712593276", "+44712593276") // always true
// true or false depends on the context
PhoneNumberUtils.compare(context, "0712593276", "+94712593276")
Take a look at the official documentation and the source code.
How about checking if the number is a substring of the receiver's number?
For instance, let's say my Brazilian number is 888-777-666 and yours is 111-222-333.
To call you from here, I need to dial additional digits to make an international call. Let's say I need to prepend 9999 to your_number, resulting in 9999111222333.
If rawNumber.contains(your_number) returns true, I can say that I'm calling you.
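A small Java sketch of the same idea, comparing only trailing digits after stripping formatting. It is a heuristic, not Android's PhoneNumberUtils algorithm: the class name and the minDigits threshold are assumptions, and a suffix match can be a false positive across countries.

```java
/** Heuristic sketch: two numbers match if their last minDigits digits agree. */
final class SuffixPhoneMatch {
    private SuffixPhoneMatch() {}

    static String digitsOnly(String s) {
        return s.replaceAll("\\D", ""); // removes every non-digit: +, spaces, parentheses, dashes
    }

    static boolean sameNumber(String a, String b, int minDigits) {
        String da = digitsOnly(a);
        String db = digitsOnly(b);
        if (da.length() < minDigits || db.length() < minDigits) {
            return da.equals(db); // too short to compare suffixes safely
        }
        return da.substring(da.length() - minDigits)
                .equals(db.substring(db.length() - minDigits));
    }

    public static void main(String[] args) {
        System.out.println(sameNumber("9999111222333", "111-222-333", 9)); // true
        System.out.println(sameNumber("0712593276", "+94 71 259 3276", 9)); // true
    }
}
```

With minDigits = 9, "+44712593276" also matches "0712593276", the same kind of loose match shown for PhoneNumberUtils.compare above, so pick the threshold with your target countries in mind.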
Just apply your own logic to remove the ( ) and - characters,
and then use PhoneNumberUtils.
We read data from XLS cells formatted as text.
The cell hopefully contains a number; the output will be a BigDecimal (for arbitrary precision).
Problem is, the cell format is also arbitrary, which means it may contain numbers like:
with currency symbols ($1000)
leading and trailing whitespace, or whitespace between digits (e.g. 1 000 )
digit grouping symbols (e.g. 1,000.0)
of course, negative numbers
'o's and 'O's as zeros (e.g. 1,ooo.oo)
others I can't think of
It's mostly because of this last point that I'm looking for a standard library that can do all this, and which is configurable, well tested etc.
I looked at Apache first and found nothing, but I might be blind... perhaps it's a trivial answer for someone else...
UPDATE: the domain of the question is financial applications. Actually I'm expecting a library where the domain could be an input parameter - financial, scientific, etc. Maybe even more specific: financial with currency symbols? With stock symbols? With distances and other measurement units? I can't believe I'm the first person to think of something like this...
I don't know of any library, but you can try this:
Put your number in a string (e.g. $1,00o,oOO.00).
Remove all occurrences of $, whitespace, or any other strange symbols you can think of...
Replace occurrences of o and O with 0.
Try to parse the number =]
That should handle 99% of the entries...
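The steps above as a Java sketch, assuming '.' is always the decimal separator and ',' only ever groups digits (so European-style 1.000,50 would be misread). The class name is hypothetical.

```java
import java.math.BigDecimal;

/** Lenient cleanup-then-parse for numeric text cells (sketch). */
final class LenientNumberParser {
    private LenientNumberParser() {}

    /**
     * Maps 'o'/'O' to '0', keeps ASCII digits, '.', and '-', and drops everything
     * else ($, commas, spaces, non-breaking spaces, ...). Assumes '.' is the decimal separator.
     */
    static BigDecimal parse(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == 'o' || c == 'O') {
                sb.append('0');
            } else if ((c >= '0' && c <= '9') || c == '.' || c == '-') {
                sb.append(c);
            }
        }
        if (sb.length() == 0) {
            throw new NumberFormatException("No digits in: " + raw);
        }
        // BigDecimal rejects leftovers such as "1-2" with a NumberFormatException.
        return new BigDecimal(sb.toString());
    }

    public static void main(String[] args) {
        System.out.println(parse("$1,00o,oOO.00")); // 1000000.00
        System.out.println(parse(" 1 000 "));       // 1000
        System.out.println(parse("-1,ooo.oo"));     // -1000.00
    }
}
```

Because every o and O becomes a zero, stray text such as "5 Dollars" would come out as 50, so this only suits cells that are otherwise numeric.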
Buy a bunch of photos, or even better videos, with legal adult content. Create a website with these resources, but limit access with a captcha that displays unsolved number formats. Build a set of number decoders from the known number formats, and an algorithm that adds new ones based on user-solved captchas.
I think this is what I've been looking for:
http://site.icu-project.org/
It's a very powerful library, although at the moment it's not clear to me whether it can only format numbers or whether formatted output can also be parsed back.
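On the parse-back question: the JDK's java.text.DecimalFormat (ICU4J provides a class of the same name in com.ibm.icu.text) can parse text it formats, and setParseBigDecimal(true) keeps the result as a BigDecimal. The sketch uses only the JDK class so it runs without ICU; it does not check how ICU's own class behaves, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

/** Format a BigDecimal as US currency and parse it back without going through double. */
final class FormatRoundTrip {
    private FormatRoundTrip() {}

    static DecimalFormat usCurrency() {
        // getCurrencyInstance returns a DecimalFormat in practice; the cast relies on that.
        DecimalFormat fmt = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.US);
        fmt.setParseBigDecimal(true); // parse() then yields BigDecimal instead of Long/Double
        return fmt;
    }

    static BigDecimal parse(String text) throws ParseException {
        return (BigDecimal) usCurrency().parse(text);
    }

    public static void main(String[] args) throws ParseException {
        String formatted = usCurrency().format(new BigDecimal("1234.5"));
        System.out.println(formatted);        // $1,234.50
        System.out.println(parse(formatted));
    }
}
```

A strict pattern like this will not accept input such as 1,ooo.oo, so it complements the cleanup approach in the earlier answer rather than replacing it.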