The right way to search phone numbers - java

In my application, I search for a user by their phone number on a server.
I have run into the following issue: in my country, a phone number can be written as '8xxxxxxxx', '+7xxxxxxxx', or '7xxxxxxxx'.
So the search has to find the number no matter which of these notations was used.
But how does this work in other countries? How are numbers written there? Is there a way to perform a reliable search?

You'll probably want to look at the E.164 Standard Format.
E.164 organizes phone numbers, with all the necessary localization information, in an easily human-readable and machine-parseable format. It defines a simple format for unambiguously storing phone numbers in a readable string. The string starts with a + sign, followed by the country code and the “subscriber” number, which is the phone number without any context prefixes such as local dialing codes, international dialing codes, or formatting.
Numbers stored as E.164 can easily be parsed, formatted and displayed in the appropriate context, since the context of a phone number can greatly affect its format.
A library that handles this is Google’s libphonenumber. With libphonenumber you can easily parse, validate, and format phone number input, do as-you-type formatting, and even glean extra information about a number, such as whether it is a mobile or landline number or which state or province it is from. In its basic form, libphonenumber consists of a set of rules and regular expressions, stored in an XML file, for breaking down and parsing a number.
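For the single-country case in the question, the idea is to normalize every notation to one E.164 string, store that, and search on it. The sketch below is a hand-rolled normalizer under loud assumptions: country code 7, trunk prefix 8, and a 10-digit national number (as in Russia). `PhoneNormalizer` is a hypothetical name; for anything beyond one country, libphonenumber's `PhoneNumberUtil.parse(...)` followed by `format(..., PhoneNumberFormat.E164)` is the general tool.

```java
import java.util.Optional;

/**
 * Hypothetical single-country normalizer. Assumes country code 7, trunk prefix 8
 * and a 10-digit national number; other countries need libphonenumber instead.
 */
final class PhoneNormalizer {
    private static final String COUNTRY_CODE = "7";
    private static final char TRUNK_PREFIX = '8';
    private static final int NATIONAL_LENGTH = 10;

    private PhoneNormalizer() {}

    /** Returns the number as +7XXXXXXXXXX, or empty if it doesn't fit the assumed plan. */
    static Optional<String> toE164(String raw) {
        // Keep digits only: spaces, dashes, parentheses and '+' are dropped.
        String digits = raw.replaceAll("\\D", "");
        String national;
        if (digits.length() == NATIONAL_LENGTH + 1
                && (digits.startsWith(COUNTRY_CODE) || digits.charAt(0) == TRUNK_PREFIX)) {
            // "7..." / "+7..." (country code) or "8..." (trunk prefix): strip the first digit.
            national = digits.substring(1);
        } else if (digits.length() == NATIONAL_LENGTH) {
            // Bare national number without any prefix.
            national = digits;
        } else {
            return Optional.empty();
        }
        return Optional.of("+" + COUNTRY_CODE + national);
    }
}
```

The same normalization has to run both when a number is saved and when a search term comes in; the lookup then becomes a plain exact match on the normalized string.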

Related

Randomly Generate Meaningful(Valid) English Words In Android Application

I am making a dictionary application. I am using the Pearson Dictionary API for this. I need to generate a word so that I can query that word for its definition.
PROBLEM
I know how to generate a random word but I don't know how to generate a meaningful English word.
I tried to solve this problem by requesting a JSON response and checking the results[] array (which holds the definitions for the word) in the response. So if results.length > 0, the word is a valid English word.
But this solution has a serious problem of its own: suppose I want to generate a 5-letter word. There are 26^5 = 11,881,376 different combinations, but far fewer meaningful 5-letter English words. As the number of letters increases, the number of combinations grows too. So generating a meaningful word can take a very long time.
How can I check whether a generated word is a meaningful English word? Is there any feasible programmatic way of doing this?
Or is there another way to solve this problem?
As far as I can see, you either generate random strings of letters and check whether they're words (which, as you realise, is a very slow, hit-or-miss approach), or you store a list of "known good" words and select randomly from that list.
How big that list needs to be depends on what you're trying to achieve.
According to this page the OED has around 171,476 main entries, not including variants like plurals (cat, cats), standard variants (sit, sitting), nor words that have multiple classes (e.g. dog can be a noun [the animal] or a verb [to follow persistently] etc.). According to this page an average adult knows between 20,000 and 35,000 words, so a prudent selection of 50,000 should cover most general purpose uses.
The answers to this question (now closed) provide a number of sources for word-lists. Examining one of them (originally provided by infochimps.org but available as a simple text-list on github) shows that the average length of 350,000+ words is just under 10 characters. For Linux (and possibly other flavours) /usr/share/dict/words may be a useful place to start.
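The "store a list and select randomly" approach can be sketched as below. Assumptions: the list has one word per line (as `/usr/share/dict/words` does), and entries that are not purely lowercase a-z, such as "cat's" or capitalized proper nouns, are skipped. `WordPicker` is a hypothetical helper name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.stream.Collectors;

/** Hypothetical helper: picks random words from a "known good" word list. */
final class WordPicker {
    private final List<String> words;

    private WordPicker(List<String> words) {
        this.words = words;
    }

    /** Reads one word per line; lines that are not purely lowercase a-z are skipped. */
    static WordPicker fromReader(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            List<String> words = in.lines()
                    .map(String::strip)
                    .filter(w -> w.matches("[a-z]+"))
                    .collect(Collectors.toList());
            return new WordPicker(words);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a uniformly random word with exactly {@code length} letters, if any exist. */
    Optional<String> randomWordOfLength(int length, Random rng) {
        List<String> matches = words.stream()
                .filter(w -> w.length() == length)
                .collect(Collectors.toList());
        if (matches.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(matches.get(rng.nextInt(matches.size())));
    }
}
```

On Linux, `Files.newBufferedReader(Path.of("/usr/share/dict/words"))` could be passed in as the reader. Every pick here costs one pass over the list, which is fine for occasional use but not for many picks in a row.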
There is a handy text file containing a large list of English words:
https://github.com/AlexHakman/Java-challenge/blob/master/words.txt
You can then generate 5-letter words based on what's inside this text file :)
Either check the length of each line, or just generate a word and compare it against the text file :)
Instead of generating words randomly, which means spending time verifying each one, just store a dictionary of the words you need and use it as a lookup table.
A relatively complete English dictionary is about 2 MB compressed, like the one here: http://wordlist.aspell.net/12dicts/
Even for an Android app, that shouldn't be too big unless you're targeting really underpowered devices.
You can use SQLite to store the data; it may take up a bit more storage, but you get SQL as your query language rather than having to invent your own.
Since you also need some randomness, each row can carry a randomized key that you can query on.
If you really want to limit it to 5 characters, just use a subset of the dictionary. But this approach also allows arbitrary lengths, or even length ranges (e.g. 2 to 10 characters).
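The lookup-table idea can also be sketched in memory. This is not the SQLite schema suggested above, only a stand-in: words are bucketed by length once, so a random pick over a length range (e.g. 2 to 10) doesn't rescan the whole list, and a `HashSet` gives the constant-time "is this a real word?" check the question asks about. `LengthIndexedDictionary` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory dictionary indexed by word length. */
final class LengthIndexedDictionary {
    private final TreeMap<Integer, List<String>> byLength = new TreeMap<>();
    private final Set<String> all = new HashSet<>();

    LengthIndexedDictionary(Iterable<String> words) {
        for (String w : words) {
            // Deduplicate, then file each word under its length.
            if (all.add(w)) {
                byLength.computeIfAbsent(w.length(), k -> new ArrayList<>()).add(w);
            }
        }
    }

    /** Membership test: is this a known word? */
    boolean contains(String word) {
        return all.contains(word);
    }

    /** Uniformly random word with minLen <= length <= maxLen, if any exist. */
    Optional<String> random(int minLen, int maxLen, Random rng) {
        Map<Integer, List<String>> range = byLength.subMap(minLen, true, maxLen, true);
        int total = 0;
        for (List<String> bucket : range.values()) {
            total += bucket.size();
        }
        if (total == 0) {
            return Optional.empty();
        }
        // Pick an index over all words in range, then walk the buckets to find it.
        int pick = rng.nextInt(total);
        for (List<String> bucket : range.values()) {
            if (pick < bucket.size()) {
                return Optional.of(bucket.get(pick));
            }
            pick -= bucket.size();
        }
        throw new AssertionError("unreachable: pick < total");
    }
}
```

Choosing uniformly over all words in the range, rather than first choosing a length, keeps lengths with few words from being over-represented.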

AltBeacon getIdentifier returns wrong value

I have a problem with my code or beacon returning a "wrong" value (in quotes, as this is most likely a mistake in my code). I have been reading up on beacons, and as far as I understand, I can give my beacons 3 identifiers. I have configured my beacon's identifiers to 000000000000001234 (lots of 0s, ending with 1234), 0001 for major and 0002 for minor.
Here is some code I'm using while ranging beacons:
String id1 = beacon.getId1().toString();
String id2 = beacon.getId2().toString();
String id3 = beacon.getId3().toString();
I was assuming that these would represent the identifiers I had in my beacon, but the value I get for id1 is "0x02676f6f2e67c...", and id2 and id3 are null. Am I totally off?
Maybe I am using the wrong parser? (I got this by email from the beacon's customer support, although I did not specify that I wanted to use the identifiers.)
.setBeaconLayout("s:0-1=feaa,m:2-2=10,p:3-3:-41,i:4-20v"));
I must admit, I don't quite understand whether the beacon parser depends on HOW I want to decode my beacon, on WHAT kind of beacon I have, or maybe both.
For the record, I am using Android, but I assume this is irrelevant.
A few points:
There are several popular beacon formats, each of which transmit a different number of identifiers with different identifier lengths. AltBeacon and iBeacon send three identifiers of 16, 2 and 2 bytes respectively. Eddystone-UID sends two identifiers of 10 and 6 bytes respectively. And Eddystone-URL sends a single identifier of a variable length between 1-17 bytes.
The question doesn't say which beacon format is being transmitted. It sounds like it is intended to be iBeacon or AltBeacon, because those formats have a three-part identifier (sometimes called ProximityUUID, major and minor). But the first identifier of those formats is a 16-byte UUID, and the example shows an identifier like 000000000000001234, which would be 9 bytes if shown in hex, or an unknown number of bytes if expressed in decimal.
The beacon layout string shown ("s:0-1=feaa,m:2-2=10,p:3-3:-41,i:4-20v") is for Eddystone-URL, which is a format with a single variable length identifier, that can be converted to a URL string using a custom compression algorithm.
The detected beacon with a single identifier (ID2 and ID3 are null) is probably an Eddystone-URL transmission. The partially shown ID1 of 0x02676f6f2e67c... decodes to a URL starting with "http://goo.g"...
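The decoding above can be checked by hand. Below is a minimal sketch of the Eddystone-URL expansion rules as given in Google's Eddystone-URL specification: the first byte selects a scheme prefix (0x00 "http://www.", 0x01 "https://www.", 0x02 "http://", 0x03 "https://"), bytes 0x00 to 0x0d afterwards expand to ".com/", ".org/" and so on, and other printable bytes are plain ASCII. It assumes the hex string from `getId1().toString()` starts with the scheme byte, which matches the `i:4-20v` layout. `EddystoneUrlDecoder` is a hypothetical helper, not part of the Android Beacon Library.

```java
/** Hypothetical decoder for an Eddystone-URL identifier given as a hex string. */
final class EddystoneUrlDecoder {
    private static final String[] SCHEMES = {"http://www.", "https://www.", "http://", "https://"};
    private static final String[] EXPANSIONS = {
        ".com/", ".org/", ".edu/", ".net/", ".info/", ".biz/", ".gov/",
        ".com", ".org", ".edu", ".net", ".info", ".biz", ".gov"
    };

    private EddystoneUrlDecoder() {}

    static String decode(String hexId) {
        String hex = hexId.startsWith("0x") ? hexId.substring(2) : hexId;
        if (hex.isEmpty() || hex.length() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of hex digits: " + hexId);
        }
        StringBuilder url = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            int b = Integer.parseInt(hex.substring(i, i + 2), 16);
            if (i == 0) {
                // First byte: URL scheme prefix.
                if (b >= SCHEMES.length) {
                    throw new IllegalArgumentException("unknown scheme byte: " + b);
                }
                url.append(SCHEMES[b]);
            } else if (b < EXPANSIONS.length) {
                // 0x00..0x0d: compressed top-level-domain expansions.
                url.append(EXPANSIONS[b]);
            } else if (b >= 0x21 && b <= 0x7e) {
                // Printable ASCII is copied as-is.
                url.append((char) b);
            } else {
                throw new IllegalArgumentException("reserved byte: " + b);
            }
        }
        return url.toString();
    }
}
```

Decoding the visible part of the identifier, `0x02676f6f2e67`, gives `http://goo.g`, which is why the detected beacon looks like an Eddystone-URL frame rather than an iBeacon/AltBeacon one.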
Conclusions:
The beacon being detected is probably not the one you intend to detect.
You may have multiple transmitting beacons in the vicinity, or a beacon that sends out multiple transmissions of different types, which is why you are detecting the Eddystone-URL beacon.
The beacon transmission you intend to detect is probably not in the Eddystone-URL format, so you probably need a different BeaconParser for this. You need to figure out the format first so you can add the proper BeaconParser.

Dealing with phone numbers formats

I think I'm facing a paradox here.
What I'm trying to do: when I receive or make a call, I have the number, and I need to know whether it's an international number, a local number, etc.
The problem is:
For me to know whether a number is international, I need to parse it and check its length, but the length differs from country to country. So should I write a method that parses and recognizes numbers for each country? (Impractical, in my opinion.)
For me to know whether it's a local number, I need the area code, so I have to do the same thing: parse the number, check its length, and take the first digits based on the area code length.
It's hard to find a solution for this. The libphonenumber library offers a lot of useful classes, but the one I thought could help me led me to another paradox.
The method phoneUtil.parse(number, countryAcronym) returns the number with its country code. If I pass the acronym "US", it returns the number with country code '1'; if I change the acronym to "BR", it returns the number with '55', the country code for Brazil. So either way, I need the country acronym based on the number I get.
EX:
numberReturned = phoneUtil.parse(phoneNumber, "US");
phoneUtil.format(numberReturned, PhoneNumberFormat.INTERNATIONAL);
The above code, returns the number with the US country code but now if I change the "US" to any other country acronym it will return the same number but with the country code of that country.
I know this library is not supposed to guess which country a number comes from (THAT WOULD BE AWESOME!!), but that's what I need.
This is really driving me crazy. I need good advice from the wise mages of SO.
If you could help me make a good decision, I'd be very thankful.
Thanks.
PS: If you already use libphonenumber and have more experience with it, please tell me which class to use, if there is one capable of solving this problem. =)
1) The second parameter to phoneUtil.parse must match the country you're currently in - it's used if the phone number received does not include an international prefix. That's why you get different results when you change the parameter: the phone number you pass it does not contain such a prefix, so it just uses what you've told it.
Any parsing solution set to determine if the phone number is international or not will need to be aware of this: depending on the source, even a national number may be represented with the international dialing prefix (usually abstracted as +, since it differs between countries, but this is not guaranteed).
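Point 1 can be turned into a simple rule: a number is international only if it carries an international prefix ("+" or the local exit code) followed by a foreign country code; without a prefix it is interpreted in the home region, just as the default-region argument of `parse` does. The sketch below assumes you know the home country code and exit code (e.g. "1" and "011" for the US); `CallScopeClassifier` is a hypothetical name. Because country codes are prefix-free, a `startsWith` check on the home code is enough. Note that NANP code "1" is shared by several countries, so DOMESTIC here means "same country calling code". Once a number has been parsed, libphonenumber's `PhoneNumberUtil.getRegionCodeForNumber` can also report its region.

```java
/** Result of classifying a dialed number relative to the caller's home country. */
enum CallScope { DOMESTIC, INTERNATIONAL }

/** Hypothetical classifier based only on the international prefix and country code. */
final class CallScopeClassifier {
    private final String homeCountryCode; // e.g. "1" for the US
    private final String exitCode;        // international dialing prefix, e.g. "011" in the US

    CallScopeClassifier(String homeCountryCode, String exitCode) {
        this.homeCountryCode = homeCountryCode;
        this.exitCode = exitCode;
    }

    CallScope classify(String dialed) {
        // Drop common formatting characters but keep a leading '+'.
        String s = dialed.replaceAll("[\\s().-]", "");
        String rest;
        if (s.startsWith("+")) {
            rest = s.substring(1);
        } else if (s.startsWith(exitCode)) {
            rest = s.substring(exitCode.length());
        } else {
            // No international prefix: the number is read in the home region.
            return CallScope.DOMESTIC;
        }
        // Country codes are prefix-free, so startsWith identifies the home code.
        return rest.startsWith(homeCountryCode) ? CallScope.DOMESTIC : CallScope.INTERNATIONAL;
    }
}
```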
2) For area code parsing, there is no universal standard; some countries don't use them, and even within a country, area codes may have differing lengths (e.g. Germany). I'm not aware of an international library for this - and a quick search doesn't find anything (though that doesn't mean one does not exist). You might need to roll your own here; if you only need to support a single country, this shouldn't be too hard.
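For a single country with variable-length area codes, rolling your own can be as simple as a longest-match lookup against a table of known codes. The sketch assumes a national number with an optional trunk prefix ("0" in Germany) and uses a tiny sample table for illustration (Berlin 30, Hamburg 40, Munich 89, Heidelberg 6221); a real table would come from the country's numbering plan. `AreaCodeSplitter` and `AreaSplit` are hypothetical names.

```java
import java.util.Optional;
import java.util.Set;

/** Area code and the remaining subscriber digits. */
record AreaSplit(String areaCode, String subscriberNumber) {}

/** Hypothetical splitter: longest match against a table of known area codes. */
final class AreaCodeSplitter {
    private final String trunkPrefix;
    private final Set<String> areaCodes; // stored without the trunk prefix
    private final int maxLength;

    AreaCodeSplitter(String trunkPrefix, Set<String> areaCodes) {
        this.trunkPrefix = trunkPrefix;
        this.areaCodes = Set.copyOf(areaCodes);
        this.maxLength = areaCodes.stream().mapToInt(String::length).max().orElse(0);
    }

    Optional<AreaSplit> split(String nationalNumber) {
        String digits = nationalNumber.replaceAll("\\D", "");
        if (digits.startsWith(trunkPrefix)) {
            digits = digits.substring(trunkPrefix.length());
        }
        // Try the longest candidate first; leave at least one subscriber digit.
        for (int len = Math.min(maxLength, digits.length() - 1); len >= 1; len--) {
            String candidate = digits.substring(0, len);
            if (areaCodes.contains(candidate)) {
                return Optional.of(new AreaSplit(candidate, digits.substring(len)));
            }
        }
        return Optional.empty();
    }
}
```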

Parse BigDecimal from String containing a number in arbitrary format

We read data from XLS cells formatted as text.
The cell hopefully contains a number, output will be a BigDecimal (because of arbitrary precision).
The problem is that the cell format is also arbitrary, so it may contain numbers like:
with currency symbols ($1000)
leading and trailing whitespaces, or whitespaces in between digits (eg. 1 000 )
digit grouping symbols (eg. 1,000.0)
of course, negative numbers
'o's and 'O's as zeros (eg. 1,ooo.oo)
others I can't think of
It's mostly because of this last point that I'm looking for a standard library that can do all this, and which is configurable, well tested etc.
I looked at Apache first and found nothing, but I might be blind... perhaps it's a trivial answer for someone else...
UPDATE: the domain of the question is financial applications. Actually I'm expecting a library where the domain could be an input parameter - financial, scientific, etc. Maybe even more specific: financial with currency symbols? With stock symbols? With distances and other measurement units? I can't believe I'm the first person to think of something like this...
I don't know of any library, but you can try this:
Put your number in a string (e.g. $1,00o,oOO.00).
Remove all occurrences of $, whitespace, and any other strange symbols you can think of.
Replace occurrences of o and O with 0.
Try to parse the number =]
That should handle 99% of the entries...
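The steps above can be sketched as follows. Assumptions, all worth checking against real data: "." is the decimal separator and "," only a grouping separator (US style), accounting-style parentheses mean a negative value, and every letter o/O is a zero (so text like "Nov" would be mangled). `LenientDecimalParser` is a hypothetical name; it throws `NumberFormatException` when nothing parseable is left.

```java
import java.math.BigDecimal;

/** Hypothetical lenient parser implementing the clean-up-then-parse steps. */
final class LenientDecimalParser {
    private LenientDecimalParser() {}

    static BigDecimal parse(String cell) {
        String s = cell.strip();
        boolean negative = false;
        // Accounting-style negatives such as (1,000.00).
        if (s.length() >= 2 && s.startsWith("(") && s.endsWith(")")) {
            negative = true;
            s = s.substring(1, s.length() - 1);
        }
        // Treat o and O as zeros, as in 1,ooo.oo.
        s = s.replace('o', '0').replace('O', '0');
        // Drop currency symbols, whitespace (including NBSP) and grouping commas;
        // keep only digits, the decimal point and a sign.
        s = s.replaceAll("[^0-9.+-]", "");
        BigDecimal value = new BigDecimal(s);
        return negative ? value.negate() : value;
    }
}
```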
Buy a bunch of photos, or even better videos, with legal adult content. Create a website with these resources, but limit access with a captcha that displays unsolved number formats. Create a set of number decoders from known number formats, and an algorithm that adds new ones based on user-solved captchas.
I think this is what I've been looking for:
http://site.icu-project.org/
A very powerful library, although at the moment it's not clear whether it can only format numbers or can also parse formatted output back.
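On the parsing question: the JDK itself has locale-aware parsing into `BigDecimal` through `java.text.DecimalFormat.setParseBigDecimal(true)`; ICU4J offers a similar `DecimalFormat` class, but the sketch below uses only the JDK one. It assumes the locale of the cell is known and requires the whole (trimmed) string to be consumed, so trailing junk is rejected. `LocaleDecimalParser` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

/** Hypothetical locale-aware parser built on the JDK's DecimalFormat. */
final class LocaleDecimalParser {
    private LocaleDecimalParser() {}

    static BigDecimal parse(String text, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("no DecimalFormat available for " + locale);
        }
        DecimalFormat df = (DecimalFormat) nf;
        df.setParseBigDecimal(true); // return BigDecimal instead of Long/Double
        String trimmed = text.strip();
        ParsePosition pos = new ParsePosition(0);
        Number n = df.parse(trimmed, pos);
        // Reject partial parses such as "12abc".
        if (!(n instanceof BigDecimal) || pos.getIndex() != trimmed.length()) {
            throw new NumberFormatException("could not parse whole input: " + text);
        }
        return (BigDecimal) n;
    }
}
```

Locale handling is what a pure strip-and-parse approach lacks: "1.234,56" means 1234.56 in German but would be misread with US conventions.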

Where can I find "reference barcodes" to verify barcode library output?

This question is not about 'best' barcode library recommendation, we use various products on different platforms, and need a simple way to verify if a given barcode is correct (according to its specification).
We have found cases where a barcode is rendered differently by different barcode libraries and free online barcode generators in the Internet. For example, a new release of a Delphi reporting library outputs non-numeric characters in Code128 as '0' or simply skips them in the text area. Before we do the migration, we want to check if these changes are caused by a broken implementation in the new library so we can report this as a bug to the author.
We mainly need Code128 and UCC/EAN-128 with A/B/C subcodes.
Online resources I checked so far are:
IDAutomation.com (displays ABC123 as 0123 with Code128-C)
Morovia.com
BarcodesInc (does not accept comma)
TEC-IT
They show different results too, for example in support for characters like comma or plus signs, at least in the human readable text.
For Code128 there isn't a single correct answer. If you use Code128-A you can get a different result than Code128-C. By result I mean how it looks. Take "803150" as an example. In Code128-A you'll need 6 characters (+ start, checksum, stop) to represent this number. Code128-C only consists of numbers, so you can compress two digits into one character. Hence you'll need only 3 characters (+ start, checksum, stop) to represent the same number. The barcodes will look different (A being longer in this case), but if you scan them both will give the correct number.
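To make the "803150" example concrete, the sketch below computes the symbol values and the mod-103 check symbol for subsets A and C. Rules assumed from the Code 128 specification: start codes A/B/C have values 103/104/105; in subset A, printable ASCII 32 to 95 maps to value ASCII - 32 (control characters are not handled here); in subset C each digit pair is one value 00 to 99; the check symbol is (start value + sum of position × value) mod 103; each symbol is 11 modules wide and the stop pattern 13. `Code128` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Code 128 helper: symbol values, check symbol and width. */
final class Code128 {
    static final int START_A = 103;
    static final int START_B = 104;
    static final int START_C = 105;

    private Code128() {}

    /** Subset A values for printable ASCII 32..95 (value = ASCII - 32). */
    static List<Integer> valuesSubsetA(String data) {
        List<Integer> values = new ArrayList<>();
        for (char c : data.toCharArray()) {
            if (c < 32 || c > 95) {
                throw new IllegalArgumentException("not handled in this sketch of subset A: " + c);
            }
            values.add(c - 32);
        }
        return values;
    }

    /** Subset C values: each pair of digits is one symbol (00..99). */
    static List<Integer> valuesSubsetC(String digits) {
        if (digits.length() % 2 != 0 || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalArgumentException("subset C needs an even number of digits: " + digits);
        }
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < digits.length(); i += 2) {
            values.add(Integer.parseInt(digits.substring(i, i + 2)));
        }
        return values;
    }

    /** Check symbol: (start value + sum of (1-based position * value)) mod 103. */
    static int checksum(int startValue, List<Integer> values) {
        int sum = startValue;
        for (int i = 0; i < values.size(); i++) {
            sum += (i + 1) * values.get(i);
        }
        return sum % 103;
    }

    /** Width in modules, excluding quiet zones: start, data and check are 11 each, stop is 13. */
    static int widthInModules(int dataSymbols) {
        return (dataSymbols + 2) * 11 + 13;
    }
}
```

For "803150" this gives check values 73 in subset A and 88 in subset C, and widths of 101 versus 68 modules. Comparing these values against a library's output is one way to tell a different subset choice apart from a genuinely broken encoding.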
Further, Code128 doesn't need to be just A, B or C. You can actually combine the different subsets. This is common for cases like "US123457890", where Code128-A or B is used for "US" and Code128-C is used for the remaining digits. This is sometimes referred to as Code-128 Auto, or just Code-128. The result is a "compressed" barcode in terms of width. You could represent the same data with A/B, but again that would give you a longer barcode.
Take two online generators:
IDAutomation
BarcodesInc
I recommend the first one, where you can select between Auto/A/B/C. Here is an example image illustrating the differences:
On IDAutomation, Auto is the default, while A is the default on Barcodes-Inc. Both are correct; you just need to be careful about which subset you have selected when comparing output. I also recommend using a barcode reader during development to test the output. Also, see this page for a comparison of the different subsets with ASCII values. I also find grandzebu.net useful; it has a free Code128 font you can use as well.
It sounds like your Delphi library always uses Code128-C: only digits can be represented in that subset, which would explain non-numeric characters being output as '0' or skipped.
Why not just scan them and see what comes back?
