AltBeacon getIdentifier returns wrong value - java

I have a problem with my code or beacon returning a "wrong" value (in quotes, as this is most likely a mistake in my code). I have been reading up on beacons, and as far as I understand, I can give my beacons 3 identifiers. I have configured my beacon's identifiers to 000000000000001234 (lots of 0s, ending with 1234), 0001 for major and 0002 for minor.
Here is some code I'm using while ranging beacons:
String id1 = beacon.getId1().toString();
String id2 = beacon.getId2().toString();
String id3 = beacon.getId3().toString();
I was assuming that these would represent the identifiers I had in my beacon, but the value I get for id1 is "0x02676f6f2e67c...", and id2 and id3 are null. Am I totally off?
Maybe I am using the wrong parser? (I got this in an email from the beacon vendor's customer support, although I did not specify that I wanted to use the identifiers.)
.setBeaconLayout("s:0-1=feaa,m:2-2=10,p:3-3:-41,i:4-20v"));
I must admit, I don't quite get whether the beacon parser depends on HOW I want to decode my beacon, or on WHAT kind of beacon I have, or maybe even both.
For the record, I am using Android, but I assume this is irrelevant.

A few points:
There are several popular beacon formats, each of which transmits a different number of identifiers with different identifier lengths. AltBeacon and iBeacon send three identifiers of 16, 2 and 2 bytes respectively. Eddystone-UID sends two identifiers of 10 and 6 bytes respectively. And Eddystone-URL sends a single identifier of variable length, between 1 and 17 bytes.
The question doesn't say what beacon format is being transmitted. It sounds like it is intended to be iBeacon or AltBeacon, because those formats have a three-part identifier (sometimes called ProximityUUID, major and minor). But the first identifier of those formats is a 16-byte UUID, and the example shows an identifier like this: 000000000000001234, which would be 9 bytes if read as hex, or an unknown number of bytes if expressed in decimal.
The beacon layout string shown ("s:0-1=feaa,m:2-2=10,p:3-3:-41,i:4-20v") is for Eddystone-URL, which is a format with a single variable length identifier, that can be converted to a URL string using a custom compression algorithm.
The beacon you detected, with a single identifier (ID2 and ID3 are null), is probably an Eddystone-URL transmission. The partially shown ID1 of 0x02676f6f2e67c... is equivalent to the URL "http://goo.g"...
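As a sanity check, the Eddystone-URL identifier bytes can be decoded by hand: the first byte selects a URL scheme prefix and the remaining bytes are (mostly) plain ASCII. Here is a minimal sketch of that decoding, ignoring the spec's domain-suffix expansion codes (0x00-0x0D for suffixes like ".com/"), which a full decoder would also handle:

```java
public class EddystoneUrlDecoder {
    // Scheme-prefix table from the Eddystone-URL spec (codes 0x00-0x03).
    private static final String[] SCHEMES = {
        "http://www.", "https://www.", "http://", "https://"
    };

    // Decode the identifier bytes of an Eddystone-URL frame into a URL string.
    // Expansion codes for common suffixes are omitted here for brevity.
    static String decode(byte[] id) {
        StringBuilder url = new StringBuilder(SCHEMES[id[0]]);
        for (int i = 1; i < id.length; i++) {
            url.append((char) (id[i] & 0xFF));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // The first bytes of the ID1 seen in the question: 0x02 67 6f 6f 2e 67
        byte[] id1 = {0x02, 0x67, 0x6f, 0x6f, 0x2e, 0x67};
        System.out.println(decode(id1)); // http://goo.g
    }
}
```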
Conclusions:
The beacon being detected is probably not the one you intend to detect.
You may have multiple transmitting beacons in the vicinity, or a beacon that sends out multiple transmissions of different types, which is why you are detecting the Eddystone-URL beacon.
The beacon transmission you intend to detect is probably not in the Eddystone-URL format, so you probably need a different BeaconParser for this. You need to figure out the format first so you can add the proper BeaconParser.

Related

Why do all my decoded Strings have '?' at the end? Java String decoding

I am retrieving tweets from Twitter using the Tweepy library (Python) and Kafka. The text is encoded in UTF-8 as this line shows:
self.producer.send('my-topic', data.encode('UTF-8'))
Where "data" is a String. Then, this data is stored into an Oracle NoSQL database in key-value format. For this reason, the tweet itself is encoded. I do this with Java:
Value myValue = Value.createValue(msg.value().getBytes("UTF-8"));
Finally, the tweets are retrieved by a Formatter developed in Java. In order to store it in a relational schema, I have to parse the tweet so I retrieve it as a String.
String data = new String(value.toByteArray(),StandardCharsets.UTF_8);
As you see, I maintain UTF-8 encoding in all the steps I make. However, when I see the text of the tweet in my database it's always cut. For example:
RT #briIIohead: the hardest pill i had to swallow this year was learning that no matter how good you could be to somebody, no matter how mu?
Notice how it ends with a '?' symbol, and has clearly been cut. Well, this happens with every long tweet. I mean, if the text is around 30 characters long, it shows fine; however, anything longer than 100 or so is cut.
At first I thought it could be my table definition, but the field "Text" is declared as VARCHAR2(400 CHAR) which is the maximum number of characters a tweet can have in the social network.
Any ideas on how I can spot what's cutting the text and putting the '?' symbol at the end?
What "data" looks like:
{"created_at":"Tue May 28 09:23:36 +0000 2019","id":1133302792129351681,"id_str":"1133302792129351681","text":"RT #AppleEDU: Learn, create, and do more with iPad in your classroom. Get the free Everyone Can Create curriculum and bring projects to lif\u2026","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1060510851889750022,"id_str":"1060510851889750022","name":"Rem.0112","screen_name":"0112Rem","location":"Mawson Lakes, Adelaide","url":null,"description":null,"translator_type":"none","protected":false,"verified":false,"followers_count":739,"friends_count":1853,"listed_count":10,"favourites_count":33406,"statuses_count":36936,"created_at":"Thu Nov 08 12:34:25 +0000 2018","utc_offset":null,"time_zone":null,"geo_enabled":true,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/1093157842163355649\/6oAdJTCs_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/1093157842163355649\/6oAdJTCs_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/1060510851889750022\/1546155144","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Thu May 23 15:15:16 +0000 2019","id":1131579354964725760,"id_str":"1131579354964725760","text":"Learn, 
create, and do more with iPad in your classroom. Get the free Everyone Can Create curriculum and bring proje\u2026 https:\/\/t.co\/aeeSPTXtFx","source":"\u003ca href=\"https:\/\/ads-api.twitter.com\" rel=\"nofollow\"\u003eTwitter Ads Composer\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":468741166,"id_str":"468741166","name":"Apple Education","screen_name":"AppleEDU","location":"Cupertino, CA","url":null,"description":"Spark new ideas, create more aha moments, and teach in ways you\u2019ve always imagined. Follow #AppleEDU for tips, updates, and inspiration.","translator_type":"none","protected":false,"verified":true,"followers_count":728781,"friends_count":273,"listed_count":2594,"favourites_count":13189,"statuses_count":2766,"created_at":"Thu Jan 19 21:26:14 +0000 2012","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":null,"contributors_enabled":false,"is_translator":false,"profile_background_color":"F0F0F0","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"0088CC","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/892429342046691328\/2SOlm_09_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/892429342046691328\/2SOlm_09_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/468741166\/1530123538","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":
{"full_text":"Learn, create, and do more with iPad in your classroom. Get the free Everyone Can Create curriculum and bring projects to life through music, drawing, video and photography.","display_text_range":[0,173],"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]}},"quote_count":0,"reply_count":3,"retweet_count":3,"favorite_count":58,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/aeeSPTXtFx","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/1131579354964725760","display_url":"twitter.com\/i\/web\/status\/1\u2026","indices":[117,140]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"scopes":{"followers":false},"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"AppleEDU","name":"Apple Education","id":468741166,"id_str":"468741166","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1559035416048"}
I also must mention that this whole chunk is what's encoded, then decoded, and finally parsed before being introduced into the database. All fields are correctly decoded and parsed, except "text", which is cut.
According to the official documentation, a tweet historically had no more than 140 characters (that is a broad definition); lately they changed it to 280.
That same document says:
Twitter counts the length of a Tweet using the Normalization Form C (NFC) version of the text.
So they first normalize the text (I'll let you figure out how this is done in Java). And later they say:
Twitter also counts the number of codepoints in the text rather than UTF-8 bytes.
Thus:
String test = "RT #briIIohead: the hardest pill i had to swallow this year was learning that no matter how good you could be to somebody, no matter how mu";
System.out.println(test.codePoints().count()); // 139
It seems the initial tweet was 280 "characters" long, and some library in your pipeline is not aware of that, so it only keeps the first 140. That truncation is apparently done on raw bytes rather than code points, so it can cut a multi-byte UTF-8 character in half. When Java later decodes those bytes, it does not know what the incomplete trailing bytes mean, and substitutes the replacement character, which is displayed as '?' (the default strategy when a decoder does not understand its input).
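This is easy to reproduce: truncate a UTF-8 byte array at an arbitrary byte boundary and decode the result. A minimal sketch (the string and cut-off point are illustrative; the ellipsis U+2026, which Twitter appends to truncated retweets, occupies 3 bytes in UTF-8):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncationDemo {
    // Truncate a string's UTF-8 bytes at an arbitrary boundary and decode.
    static String truncateBytes(String s, int maxBytes) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] cut = Arrays.copyOf(utf8, Math.min(maxBytes, utf8.length));
        // Malformed trailing sequences are replaced with U+FFFD by default.
        return new String(cut, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tail = "mu\u2026"; // 'm' (1 byte) + 'u' (1 byte) + '…' (3 bytes)
        // Cutting one byte into the ellipsis leaves an incomplete sequence.
        String damaged = truncateBytes(tail, 3);
        // Prints "mu" followed by the replacement character, rendered as '?'
        // in charsets that cannot represent U+FFFD.
        System.out.println(damaged);
        System.out.println(damaged.endsWith("\uFFFD")); // true
    }
}
```

The fix is to truncate on code points (e.g. with `String.codePoints()` or `offsetByCodePoints`) rather than on bytes, or better, not to truncate at all.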

The right way to search phone numbers

In my application, I search a user by his phone number on a server.
I have encountered such an issue: in my country, you can write a phone number both as '8xxxxxxxx' and '+7xxxxxxxx' and '7xxxxxxxx'.
So I should perform a search so that a number written in different notation still would be found.
But how is it in different countries? How are the numbers written? Is there a way to perform a valid search?
You'll probably want to look at the E.164 Standard Format.
E.164 organizes phone numbers with all the necessary localization information in an easily human-readable and machine-parseable format. It defines a simple format for unambiguously storing phone numbers in an easily readable string. The string starts with a + sign, followed by the country code and the "subscriber" number, which is the phone number without any context prefixes such as local dialing codes, international dialing codes or formatting.
Numbers stored as E.164 can easily be parsed, formatted and displayed in the appropriate context, since the context of a phone number can greatly affect its format.
The de facto library for this is Google's libphonenumber. With libphonenumber you can parse, verify, and format phone number inputs quite easily, do as-you-type formatting, and even glean extra information about a number, like whether it is a mobile or landline, or which state or province it is from. libphonenumber in its basic form consists of a set of rules and regular expressions in an XML file for breaking down and parsing a number.
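For the specific 8/+7/7 case from the question, a naive normalization to E.164 might look like the sketch below. This hard-codes Russian dialing conventions and is illustrative only; real-world input should go through libphonenumber, since national dialing rules vary widely by country:

```java
public class PhoneNormalizer {
    // Normalize a Russian phone number to E.164 form (+7XXXXXXXXXX).
    // NOTE: a sketch only - it assumes every input is a Russian number.
    static String normalizeRu(String raw) {
        // Strip spaces, dashes, parentheses etc., keeping digits and '+'.
        String digits = raw.replaceAll("[^0-9+]", "");
        if (digits.startsWith("+7")) return digits;
        // '8' is the domestic trunk prefix; '7' the country code without '+'.
        if (digits.startsWith("8")) return "+7" + digits.substring(1);
        if (digits.startsWith("7")) return "+7" + digits.substring(1);
        return digits; // unknown format - leave as-is
    }

    public static void main(String[] args) {
        System.out.println(normalizeRu("8 912 345-67-89"));  // +79123456789
        System.out.println(normalizeRu("+7 912 345-67-89")); // +79123456789
        System.out.println(normalizeRu("79123456789"));      // +79123456789
    }
}
```

Storing the normalized form in the database and normalizing the search input the same way makes all three notations match.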

How do you convert a java String to a mailing address object?

As input I am getting an address as a String. It may say something like "123 Fake Street\nLos Angeles, CA 99988". How can I convert this into an object with fields like this:
Address1
Address2
City
State
Zip Code
Or something similar to this? If there is a java library that can do this, all the better.
Unfortunately, I don't have a choice about the String as input. It's part of a specification I'm trying to implement.
The input is not going to be very well structured so the code will need to be very fault tolerant. Also, the addresses could be from all over the world, but 99 out of 100 are probably in the US.
You can use JGeocoder
public static void main(String[] args) {
    Map<AddressComponent, String> parsedAddr = AddressParser.parseAddress(
            "Google Inc, 1600 Amphitheatre Parkway, Mountain View, CA 94043");
    System.out.println(parsedAddr);

    Map<AddressComponent, String> normalizedAddr = AddressStandardizer.normalizeParsedAddress(parsedAddr);
    System.out.println(normalizedAddr);
}
Output will be:
{street=Amphitheatre, city=Mountain View, number=1600, zip=94043, state=CA, name=Google Inc, type=Parkway}
{street=AMPHITHEATRE, city=MOUNTAIN VIEW, number=1600, zip=94043, state=CA, name=GOOGLE INC, type=PKWY}
There is another library, International Address Parser; you can check its trial version. It supports country detection as well.
AddressParser addressParser = AddressParser.getInstance();
AddressStandardizer standardizer = AddressStandardizer.getInstance();//if enabled
AddressFormater formater = AddressFormater.getInstance();
String rawAddress = "101 Avenue des Champs-Elysées 75008 Paris";
//you can try to detect the country
CountryDetector detector = CountryDetector.getInstance();
String countryCode = detector.getCountryCode("7580 Commerce Center Dr ALABAMA");
System.out.println("detected country=" + countryCode);
Also, please check Implemented Countries in this library.
Cheers !!
I work at SmartyStreets where we develop address parsing and extraction algorithms.
It's hard.
If most of your addresses are in the US, you can use an address verification service to provide guaranteed accurate parse results (since the addresses are checked against a master list).
There are several providers out there, so take a look around and find one that suits you. Since you probably won't be able to install the database locally (not without a big fee, because address data is licensed by the USPS), look for one that offers a REST endpoint so you can just make an HTTP request. Since it sounds like you have a lot of addresses, make sure the API is high-performing and lets you do batch requests.
For example, with ours:
Input:
13001 Point Richmond Dr NW, Gig Harbor WA
Output:
Or the more specific breakdown of components, if needed:
If the input is even messier, there are a few address extraction services available that can handle a little bit of noise within an address and parse addresses out of text and turn them into their components. (SmartyStreets offers this also, as a beta API. I believe some other NLP services do similar things too.)
Granted, this only works for US addresses. I'm not as expert on UK or Canadian addresses, but I believe they may be slightly simpler in general.
(Beyond a small handful of well-developed countries, international data is really hit-and-miss. Reliable data sets are hard to obtain or don't exist. But if you're on a really tight budget you could write your own parser for all the address formats.)
If you are sure on the format, you can use regular expressions to get the address out of the string. For the example you provided something like this:
String address = "123 Fake Street\nLos Angeles, CA 99988";
Matcher m = Pattern.compile("(.*)\\n(.*), ([A-Z]{2}) ([0-9]{5})").matcher(address);
// If m.matches(), group(1) holds the street line, group(2) the city,
// group(3) the state and group(4) the ZIP code.
I assume the sequence of information is always the same, as in the user will never enter the postal code before the state. If I got your question correctly, you need logic to process an address that may be incomplete (missing a portion).
One way to do it is to look for portions of the string you know are correct. You can treat the known parts of the address as separators. You will need city and state names, plus address words (such as "Street", "Avenue", "Road" etc.) in an array.
1. Perform indexOf with the cities, states and address words (and store the results).
2. Substring and cut out the first line of the address (from the start to the index of the address-signifying word plus its length).
3. Check the index of the city name (found in step 1). If it is -1, skip this step. If it is 0, take it out (0 also means address line 2 is not in the string). If it is greater than 0, substring and cut out everything from the start of the string to the index of the city name as the second line of the address.
4. Check the index of the state name. Again, if it is -1, skip this step. If it is 0, substring and cut it out as the state name.
5. Whatever remains is your postal code.
6. Check the strings you just extracted for leftover separators (commas, dots, new lines etc.) and remove them.
If the address is missing both the state and the city, you would actually need a list of ZIP codes too, so better to ensure the user enters at least one of them.
It's not impossible to implement what you need, but you probably don't want to waste all that time doing it. It's easier to just ensure user enters everything correctly.
Maybe you can use regular expressions.

Java NIO: Writing File Header - Using SeekableByteChannel

I am manually serializing data objects to a file, using a ByteBuffer and its operations such as putInt(), putDouble() etc.
One of the fields I'd like to write-out is a String. For the sake of example, let's say this contains a currency. Each currency has a three-letter ISO currency code, e.g. GBP for British Pounds Sterling.
Assuming each object I'm serializing just has a double and a currency; you could consider the serialized data to look something like:
100.00|GBP
200.00|USD
300.00|EUR
Obviously in reality I'm not delimiting the data (the pipe between fields, nor the line feeds), it's stored in binary - just using the above as an illustration.
Encoding the currency with each entry is a bit inefficient, as I keep storing the same three-characters. Instead, I'd like to have a header - which stores a mapping for currencies. The file would look something like:
100
GBP
USD
EUR
~~~
~~~
100.00|1
200.00|2
300.00|3
The first 2 bytes in the file are a short holding the decimal value 100. This tells me there are 100 slots for currencies in the file. Following this are 3-byte chunks which are the currencies, in order (ASCII-only characters).
When I read the file back in, all I have to do is build up a 100-element array with the currency codes, and I can cheaply / efficiently look up the relevant currency for each line.
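Rebuilding that table with a ByteBuffer might look like the following sketch (using a toy 3-slot header rather than the 100-slot one described, purely for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class HeaderDemo {
    // Rebuild the currency lookup table from a header laid out as described:
    // a short slot count followed by 3-byte ASCII currency codes.
    static String[] readHeader(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header);
        short slots = buf.getShort();
        String[] currencies = new String[slots];
        byte[] code = new byte[3];
        for (int i = 0; i < slots; i++) {
            buf.get(code);
            currencies[i] = new String(code, StandardCharsets.US_ASCII);
        }
        return currencies;
    }

    public static void main(String[] args) {
        // Toy header: slot count 3 (big-endian short), then GBP, USD, EUR.
        byte[] header = {0, 3, 'G', 'B', 'P', 'U', 'S', 'D', 'E', 'U', 'R'};
        String[] table = readHeader(header);
        System.out.println(table[0] + " " + table[1] + " " + table[2]); // GBP USD EUR
    }
}
```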
Reading the file back-in seems simple. But I'm interested to hear thoughts on writing-out the data.
I don't know all the currencies up-front, and I'm actually supporting any three-character code - even if it's invalid. Thus I have to build-up the table converting currencies to indexes on-the-fly.
I am intending on using a SeekableByteChannel to address my file, and seeking back to the header every time I find a new currency I've not indexed before.
This has the obvious I/O overhead of moving around the file. But I am expecting to see all the different currencies within the first few data objects written. So it will probably only seek during the first few seconds of execution, and then not have to perform an additional seek for hours.
The alternative is to wait for the stream of data to finish, and then write the header once. However, if my application crashes and I haven't written-out the header, the data in the file cannot be recovered back to its original content.
Seeking seems like the right thing to do, but I've not attempted it before - and was hoping to hear horror-stories up-front, rather than through trial/error on my end.
The problem with your approach is that you say you do not want to limit the number of currency codes, which implies that you don't know how much space to reserve for the header. Seeking in a plain local file may be cheap if not performed too often, but shifting the entire file contents to reserve more room for the header is expensive.
The other question is how you define efficiency. If you don't limit the number of currency codes, you have to be aware that a single byte is not sufficient for your index, so you need either a dynamic, possibly multi-byte encoding, which is more complicated to parse, or a fixed multi-byte encoding, which ends up taking about the same number of bytes as the currency code itself.
So unless space efficiency in the typical case matters more to you than decoding efficiency, you can use the fact that these codes are all made up of ASCII characters only. Encode each currency code in three bytes, and if you accept one padding byte, you can use a single putInt/getInt to store/retrieve a currency code without the need for any header lookup.
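A sketch of that scheme: pack the three ASCII characters plus one padding byte into an int, and write it right after the amount, with no header at all (the record layout here is illustrative):

```java
import java.nio.ByteBuffer;

public class CurrencyPacking {
    // Pack a three-letter ASCII currency code plus one padding byte into an int.
    static int pack(String code) {
        return (code.charAt(0) << 24) | (code.charAt(1) << 16) | (code.charAt(2) << 8);
    }

    // Recover the three-letter code from the packed int.
    static String unpack(int packed) {
        return "" + (char) (packed >>> 24)
                  + (char) ((packed >>> 16) & 0xFF)
                  + (char) ((packed >>> 8) & 0xFF);
    }

    public static void main(String[] args) {
        // One record: an 8-byte double amount followed by a 4-byte packed code.
        ByteBuffer buf = ByteBuffer.allocate(12);
        buf.putDouble(100.00).putInt(pack("GBP"));
        buf.flip();
        System.out.println(buf.getDouble());      // 100.0
        System.out.println(unpack(buf.getInt())); // GBP
    }
}
```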
I don't believe that optimizing these codes further would improve your storage significantly. The file does not consist of currency codes only; it's very likely the other data will take much more space.

Where can I find "reference barcodes" to verify barcode library output?

This question is not about a 'best' barcode library recommendation; we use various products on different platforms, and need a simple way to verify whether a given barcode is correct (according to its specification).
We have found cases where a barcode is rendered differently by different barcode libraries and free online barcode generators on the Internet. For example, a new release of a Delphi reporting library outputs non-numeric characters in Code128 as '0', or simply skips them, in the text area. Before we do the migration, we want to check whether these changes are caused by a broken implementation in the new library, so we can report it as a bug to the author.
We mainly need Code128 and UCC/EAN-128 with A/B/C subcodes.
Online resources I checked so far are:
IDAutomation.com (displays ABC123 as 0123 with Code128-C)
Morovia.com
BarcodesInc (does not accept comma)
TEC-IT
They show different results too, for example in support for characters like comma or plus signs, at least in the human readable text.
For Code128 there isn't a single correct answer. If you use Code128-A you can get a different result than with Code128-C. By result I mean how it looks. Take "803150" as an example. In Code128-A you'll need 6 symbols (+ start, checksum, stop) to represent this number. Code128-C encodes only digits, so you can compress two digits into one symbol. Hence you'll need only 3 symbols (+ start, checksum, stop) to represent the same number. The barcodes will look different (A being longer in this case), but if you scan them, both will give the correct number.
Further, Code128 doesn't need to be just A, B or C. You can actually combine the different subsets. This is common for cases like "US123457890", where Code128-A or B is used for "US" and Code128-C for the remaining digits. This is sometimes referred to as Code128 Auto, or just Code128. The result is a "compressed" barcode in terms of width. You could represent the same data with A/B alone, but again that would give you a longer barcode.
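To make the size difference concrete, here is a sketch that computes the Code128-C data symbols and check symbol for the "803150" example above. It assumes the standard Code 128 rules: in subset C each digit pair becomes one symbol whose value is the two-digit number, Start C has value 105, and the check symbol is (startValue + Σ valueᵢ × positionᵢ) mod 103, with positions counted from 1:

```java
public class Code128C {
    // Code128-C data symbols for an even-length digit string:
    // each pair of digits compresses into one symbol.
    static int[] dataSymbols(String digits) {
        int[] symbols = new int[digits.length() / 2];
        for (int i = 0; i < symbols.length; i++) {
            symbols[i] = Integer.parseInt(digits.substring(2 * i, 2 * i + 2));
        }
        return symbols;
    }

    // Code 128 check symbol: (start + sum of value_i * position_i) mod 103.
    static int checksum(int start, int[] symbols) {
        int sum = start;
        for (int i = 0; i < symbols.length; i++) {
            sum += symbols[i] * (i + 1);
        }
        return sum % 103;
    }

    public static void main(String[] args) {
        int[] syms = dataSymbols("803150");  // {80, 31, 50}
        System.out.println(syms.length);     // 3 data symbols (vs 6 in subset A)
        System.out.println(checksum(105, syms)); // 88 (Start C = 105)
    }
}
```

Two generators can thus emit visually different, yet equally valid, barcodes for the same data simply by choosing different subsets.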
Take two online generators:
IDAutomation
BarcodesInc
I recommend the first one, where you can select between Auto/A/B/C. Here is an example image illustrating the differences:
On IDAutomation, Auto is the default, while A is the default on BarcodesInc. Both are correct; you just need to be careful about which subset you have selected when comparing output. I also recommend a barcode reader for use in development to test the output. Also, see this page for a comparison of the different subsets with ASCII values. I also find grandzebu.net useful, which has a free Code128 font you can use as well.
It sounds like your Delphi library always uses Code128-C, since only numbers can be represented in that subset.
Why not just scan them and see what comes back?
