List of reserved words in Android - java

I'm currently designing the interface for another Android app, and once again I seem to be trying to use reserved words for resource names (be it drawables or layouts). To my knowledge, there is a set of rules you need to know:
No uppercase is allowed.
No symbols apart from underscore.
No numbers
Apart from those (please correct me if I'm wrong), I think you can't use any of the reserved words from Java, which after a little googling appear to be the following:
So my question is whether there is somewhere in the docs, which I've failed to locate, that explains in detail what we can and cannot use for resource names. I'm asking right after reading the page about resources, so it's possible that I simply misread it.

To my knowledge there are a set of rules you need to know:
No uppercase is allowed.
AFAIK, that is not a rule. A convention is to use all lowercase, but mixed case has worked.
NOTE: In layouts you can ONLY use lowercase letters (a-z), numbers (0-9) and underscore (_).
No symbols apart from underscore
Correct. More accurately, the name has to be a valid Java data member name, which limits you to letters, numbers, and underscores, and it cannot start with a number.
No numbers
That is not a rule, though your name cannot start with a number, as noted above.
Apart from those (please correct me if I'm wrong), I think you can't use any of the reserved words from Java, which after a little googling appear to be the following:
That is because reserved words are not valid Java data member names.
So my question would be if there is somewhere in the docs that I've failed to locate, that explains in detail what we can and can not use for resource names
Apparently not.
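The rules discussed above can be sketched as a small check. This is only an illustration under stated assumptions: the class and method names (ResourceNames, isValidValueName, isValidFileName) are hypothetical, the two patterns simply encode the rules as stated in this answer, and javax.lang.model.SourceVersion is a desktop JDK class, so this would run as a build-time or test-time check rather than on a device.

```java
import java.util.regex.Pattern;
import javax.lang.model.SourceVersion;

// Hypothetical helper, not part of any Android API.
final class ResourceNames {
    // Letters, digits and underscores, not starting with a digit.
    private static final Pattern VALUE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Stricter rule from the NOTE above for file-based resources such as layouts.
    private static final Pattern FILE_NAME = Pattern.compile("[a-z_][a-z0-9_]*");

    private ResourceNames() {}

    // True if the name fits the character rules and is not a Java keyword or literal.
    static boolean isValidValueName(String name) {
        return VALUE_NAME.matcher(name).matches() && !SourceVersion.isKeyword(name);
    }

    // Same check, restricted to lowercase letters for file-based resources.
    static boolean isValidFileName(String name) {
        return FILE_NAME.matcher(name).matches() && !SourceVersion.isKeyword(name);
    }
}
```

With this sketch, isValidValueName("switch") and isValidValueName("2icon") are rejected, while "ic_launcher" passes both checks and "MyLayout" passes only the value-name check.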

Well, my answer is a mix of some pages where you can find what you need.
1. First, I would recommend reading the code conventions that Oracle recommends for Java.
NOTE: pay special attention to the "Naming Conventions" section (something most of the other answers cover). After that, I suggest reading "Java Language Keywords", because you cannot use any of those words. Remember, though, that Java is case-sensitive, so writing "Abstract" instead of "abstract" is OK, although it may confuse someone later on (maybe yourself).
2. Last but not least, you can read the "Code Style Guidelines"; these are the conventions that contributors to the Android source code must follow for their code to be accepted.
If you follow these rules, your code will not only be valid (which of course is important) but also more readable for you and others, and if another person needs to modify it later on, that will be an easier task than if you just type random names like "x1, x2, X1, _x1, etc."
OTHER USEFUL ARTICLE:
If you are just starting your app, this article will be very useful: it explains why excessive use of setters and getters is bad practice. Add them only where they are needed, not for every variable in your object.

If you use identifiers that are valid Java variable names (that is, they consist only of a-z, A-Z, 0-9 and the underscore character, and do not start with a digit) you will not have any problems. The actual namespace is probably larger, but this works for me.

I'll just chip in and say this:
You can't use keywords, but managing Android resources isn't easy either. For instance, you cannot organize drawables into arbitrary folders; they need to go in a drawable-xxxx folder.
So, try to come up with sensible prefixes for your drawables and selectors.
Android accepts all valid Java variable names so I don't really see where this question comes from.


What's the proper constant naming convention for words stylized in CamelCase?

Everyone knows that Java's constant naming convention is uppercase with underscores between words, like USERNAME and ERROR_CODE.
But should words/names that are normally spelled in CamelCase also use an underscore? Should SomeBrand™ be named SOME_BRAND or SOMEBRAND? The latter can be harder to read and it doesn't convey the proper case, but the former can be incorrectly read as separate words, especially with another word attached (e.g. SOME_BRAND_ID). Is there an accepted convention? Maybe a third option?
In your case, I would use SOMEBRAND. One reason is the same that jonmecer mentioned, because SomeBrand is treated as one word in a sentence. In addition, SOME_BRAND could lead to confusion about its definition: do you mean you are talking about "some brand", or are you talking about your one and only SomeBrand™? Putting it as SOMEBRAND clears up any confusion in that regard.
According to Sun's conventions, each word should be separated by "_". But when you type SomeBrand in a sentence, you treat it as one word, so I think SOMEBRAND works better. Then again, this depends on your company's policy.

Getting the alphabet of a language from the Locale in Java

I am making an internationalized app in Java. I need a list of all the letters in a language, starting from the Locale. There are some questions like Alphabet constant in Java? or Create Alphabet List from list : Java which touch on the issue, but I'm wondering whether there is a Utils class or something similar where this is already defined, and where I can get a list of chars or a String containing all the letters in the alphabet of a language by its Locale.
You can refer to the ICU4J class com.ibm.icu.util.LocaleData for details. Pass Locale.ENGLISH as the argument to get the English alphabet.
There are several issues here.
First, I have to point out that there are many languages that aren't alphabetic. Chinese and Japanese are obvious examples of ideographic languages. Unfortunately, it will be very hard, next to impossible, to create a list of all the characters in these languages.
Second, although the Common Locale Data Repository, and as a consequence ICU, has predefined sets of index exemplars and example characters, this information is far from complete.
Third, there are languages that use more than one script (aka writing system). Depending on the source of your locale, you may or may not know which characters need to be displayed.
Finally, it is hard to give you the right answer when you haven't provided your use case. The design of your application may impose serious limitations on usability or localizability...

Regex: what is InCombiningDiacriticalMarks?

The following code is a well-known way to convert accented characters into plain text:
Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
I replaced my hand-made method with this one, but I need to understand the regex part of the replaceAll call.
1) What is "InCombiningDiacriticalMarks" ?
2) Where is the documentation of it? (and similars?)
Thanks.
\p{InCombiningDiacriticalMarks} is a Unicode block property. In JDK7, you will be able to write it using the two-part notation \p{Block=CombiningDiacriticalMarks}, which may be clearer to the reader. It is documented here in UAX#44: “The Unicode Character Database”.
What it means is that the code point falls within a particular range, a block, that has been allocated to use for the things by that name. This is a bad approach, because there is no guarantee that the code point in that range is or is not any particular thing, nor that code points outside that block are not of essentially the same character.
For example, there are Latin letters in the \p{Latin_1_Supplement} block, like é, U+00E9. However, there are things that are not Latin letters there, too. And of course there are also Latin letters all over the place.
Blocks are nearly never what you want.
In this case, I suspect that you may want to use the property \p{Mn}, a.k.a. \p{Nonspacing_Mark}. All the code points in the Combining_Diacriticals block are of that sort. There are also (as of Unicode 6.0.0) 1087 Nonspacing_Marks that are not in that block.
That is almost the same as checking for \p{Bidi_Class=Nonspacing_Mark}, but not quite, because that group also includes the enclosing marks, \p{Me}. If you want both, you could say [\p{Mn}\p{Me}] if you are using a default Java regex engine, since it only gives access to the General_Category property.
You’d have to use JNI to get at the ICU C++ regex library the way Google does in order to access something like \p{BC=NSM}, because right now only ICU and Perl give access to all Unicode properties. The normal Java regex library supports only a couple of standard Unicode properties. In JDK7, though, there will be support for the Unicode Script property, which is just about infinitely preferable to the Block property. Thus you can in JDK7 write \p{Script=Latin} or \p{SC=Latin}, or the short-cut \p{Latin}, to get at any character from the Latin script. This leads to the very commonly needed [\p{Latin}\p{Common}\p{Inherited}].
Be aware that that will not remove what you might think of as “accent” marks from all characters! There are many it will not do this for. For example, you cannot convert Đ to D or ø to o that way. For that, you need to reduce code points to those that match the same primary collation strength in the Unicode Collation Table.
Another place where the \p{Mn} thing fails is of course enclosing marks like \p{Me}, obviously, but also there are \p{Diacritic} characters which are not marks. Sadly, you need full property support for that, which means JNI to either ICU or Perl. Java has a lot of issues with Unicode support, I’m afraid.
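A minimal sketch of the \p{Mn} and \p{Me} approach described above, using only java.text.Normalizer and the standard regex engine. The class name MarkStripper and its method are hypothetical, chosen just for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating mark stripping by General_Category.
final class MarkStripper {
    private MarkStripper() {}

    // Decomposes the text (NFD), then removes nonspacing and enclosing marks.
    static String stripMarks(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("[\\p{Mn}\\p{Me}]+", "");
    }
}
```

For example, stripMarks("Jo\u00e3o") yields "Joao", while "\u0110" (Đ) and "\u00f8" (ø) come back unchanged, because they have no canonical decomposition into a base letter plus a mark. That is the limitation mentioned above.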
Oh wait, I see you are Portuguese. You should have no problems at all then if you only are dealing with Portuguese text.
However, you don’t really want to remove accents, I bet, but rather you want to be able to match things “accent-insensitively”, right? If so, then you can do so using the ICU4J (ICU for Java) collator class. If you compare at the primary strength, accent marks won’t count. I do this all the time because I often process Spanish text. I have an example of how to do this for Spanish sitting around here somewhere if you need it.
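As a rough illustration of comparing at primary strength, here is a sketch that swaps in the JDK's own java.text.Collator instead of the ICU4J collator mentioned above, so it runs without extra libraries. ICU4J's collator offers a similar strength setting, but the two implementations may order some characters differently. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper; uses the JDK collator as a stand-in for ICU4J.
final class AccentInsensitive {
    private AccentInsensitive() {}

    // Compares two strings at primary strength, where accent and case differences do not count.
    static boolean sameLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }
}
```

A call such as sameLetters("cafe", "caf\u00e9", Locale.ENGLISH) treats the two words as equal without modifying either string.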
Took me a while, but I fished them all out:
Here's a regex that should include all the zalgo chars, including ones bypassed in the 'normal' range.
([\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F\u0483-\u0486\u05C7\u0610-\u061A\u0656-\u065F\u0670\u06D6-\u06ED\u0711\u0730-\u073F\u0743-\u074A\u0F18-\u0F19\u0F35\u0F37\u0F72-\u0F73\u0F7A-\u0F81\u0F84\u0e00-\u0eff\uFC5E-\uFC62])
Hope this saves you some time.

Detecting words that start with an accented uppercase using regular expressions

I want to extract the words that begin with a capital — including accented capitals — using regular expressions in Java.
This is my conditional for words beginning with capital A through Z:
if (link.text().matches("^[A-Z].+") == true)
But I also want words that begin with an accented uppercase character, too.
Do you have any ideas?
Start with http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
To match an uppercase letter at the beginning of the string, you need the pattern ^\p{Lu}.
Unfortunately, Java does not support the mandatory \p{Uppercase} property, necessary for meeting UTS#18’s RL1.2.
That’s hardly the only thing missing from Java regular expressions to meet even Level 1, the most bareboned Basic Unicode Functionality. Without Level 1, you really can’t work with Unicode text using regular expressions. Too much is broken or absent.
UTS#18’s RL1.1 will finally be met with JDK7, but I do not believe there are currently any plans to meet RL1.2, RL1.2a, or any of the others that it’s currently lacking, nor even meeting the two Strong Recommendations. Alas!
Indeed, of the very short list of mandatory properties required by RL1.2, Java is missing the \p{Alphabetic}, \p{Uppercase}, \p{Lowercase}, \p{White_Space}, \p{Noncharacter_Code_Point}, \p{Default_Ignorable_Code_Point}, \p{ANY}, and \p{ASSIGNED} properties. Those are all mandatory but either completely missing or else fail to obey The Unicode Standard with respect to their definitions. This is also the problem with the POSIX compatible properties in Java: they’re all broken with respect to UTS#18.
Prior to JDK7, it is also missing the mandatory Script properties. JDK7 does get script properties at long last, but that’s all — nothing else. Java is still light years away from meeting even RL1.2a, which is a daily gotcha for zillions of programmers.
In JDK7, you can finally also use two-part properties in the form \p{name=value} if they’re block, script, or general categories. That means these are all the same in JDK7’s Pattern class:
\p{Block=Number_Forms}, \p{blk=Number_Forms}, and \p{InNumber_Forms}.
\p{Script=Latin}, \p{sc=Latin}, \p{IsLatin}, and \p{Latin}.
\p{General_Category=Lu}, \p{GC=Lu}, and \p{Lu}.
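The equivalences listed above can be checked with a short sketch, assuming Java 7 or later. The class PropertyForms and its method are hypothetical. The bare \p{Latin} shortcut is deliberately not exercised here; the Pattern documentation lists the Is-prefixed and keyword forms for scripts.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying several spellings of the same property.
final class PropertyForms {
    private PropertyForms() {}

    // True only if every pattern matches the whole input.
    static boolean allMatch(String input, String... patterns) {
        for (String p : patterns) {
            if (!Pattern.matches(p, input)) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, allMatch("\u2167", "\\p{Block=Number_Forms}", "\\p{blk=Number_Forms}", "\\p{InNumber_Forms}") checks the three block spellings against the Roman numeral Ⅷ, which lies in the Number Forms block.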
However, you still cannot use the long forms like \p{Lowercase_Letter} and \p{Letter_Number}, and the POSIX-looking properties are all broken from RL1.2a’s perspective. Plus super-basic properties from RL1.2 like \p{White_Space} and \p{Alphabetic} are still missing.
There was some talk of trying to fix \b and \B, which are miserably broken with respect to \w and \W, but I don't know how they’re going to fix all that without fully complying with RL1.2a. And no, I have no idea when they will add those basic properties to Java. You can’t get by without them, either.
To fully work with Unicode using regexes in Java at even Level 1, you really cannot use the standard Pattern class that Java comes with. The easiest way to do so is to instead use JNI to connect up with ICU regex libraries using the Google Android code, which is available.
There do exist other languages that are at least Level-1 compliant (or better) with UTS#18, but if you want to stay within Java, ICU is currently your only real option.
Java has a method, java.lang.Character.isUpperCase. It's not exactly a regular expression, but it might suffice.
http://download.oracle.com/javase/1.5.0/docs/api/java/lang/Character.html#isUpperCase(int)
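Both suggestions from the answers above, the \p{Lu} pattern and Character.isUpperCase, can be sketched side by side. The class and method names are hypothetical. Note that this sketch uses .* rather than the .+ from the question, so a one-letter word also matches.

```java
// Hypothetical helper comparing the regex and Character-based checks.
final class LeadingUppercase {
    private LeadingUppercase() {}

    // Regex check: first character has General_Category Lu (uppercase letter).
    static boolean byRegex(String word) {
        return word.matches("^\\p{Lu}.*");
    }

    // Non-regex check on the first code point of the word.
    static boolean byCharacter(String word) {
        return !word.isEmpty() && Character.isUpperCase(word.codePointAt(0));
    }
}
```

Both methods accept "\u00c9cole" (École) and "Apple", and reject "\u00e9cole".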

Need some ideas on how to accomplish this in Java (parsing strings)

Sorry I couldn't think of a better title, but thanks for reading!
My ultimate goal is to read a .java file, parse it, pull out every identifier, and store them all in a list. There are two preconditions: the file contains no comments, and all identifiers are composed of letters only.
Right now I can read the file, parse it by spaces, and store everything in a list. If anything in the list is a java reserved word, it is removed. Also, I remove any loose symbols that are not attached to anything (brackets and arithmetic symbols).
Now I am left with a bunch of weird strings, but at least they have no spaces in them. I know I am going to have to re-parse everything with a . delimiter in order to pull out identifiers like System.out.print, but what about strings like this example:
Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE,
After re-parsing by . I will be left with more crazy strings like:
getLogger(MyHash
getName())
log(Level
SEVERE,
How am I going to be able to pull out all the identifiers while leaving out all the trash? Just keep re-parsing by every symbol that could exist in java code? That seems rather lame and time consuming. I am not even sure if it would work completely. So, can you suggest a better way of doing this?
There are several solutions that you can use, other than hacking your-own parser:
Use an existing parser, such as this one.
Use BCEL to read bytecode, which includes all fields and variables.
Hack into the compiler or run-time, using annotation processing or mirrors - I'm not sure you can find all identifiers this way, but fields and parameters for sure.
I wouldn't separate the entire file at once according to whitespace. Instead, I would scan the file letter-by-letter, saving every character in a buffer until I'm sure an identifier has been reached.
In pseudo-code:
clean buffer
for each letter l in file:
    if l is '
        toggle "character mode"
    if l is "
        toggle "string mode"
    if l is a letter AND "character mode" is off AND "string mode" is off
        add l to end of buffer
    else
        if buffer is NOT a keyword or a literal
            add buffer to list of identifiers
        clean buffer
Notice some lines here hide further complexity - for example, to check if the buffer is a literal you need to check for true, false, and null.
In addition, there are more bugs in the pseudo-code: it will also pick up things like the e and L parts of literals (e in floating-point literals, L in long literals) as identifiers. I suggest adding additional "modes" to take care of them, but it's a bit tricky.
Also there are a few more things if you want to make sure it's accurate - for example you have to make sure you work with unicode. I would strongly recommend investigating the lexical structure of the language, so you won't miss anything.
EDIT:
This solution can easily be extended to deal with identifiers with numbers, as well as with comments.
Small bug above - you need to handle \" differently than ", same with \' and '.
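The pseudo-code above, with the escape handling from the EDIT, could be sketched in Java roughly as follows. This is an illustration only, with several assumptions: the class name IdentifierScanner is hypothetical, the input contains no comments (as the question states) and no text blocks, and javax.lang.model.SourceVersion.isKeyword is used to filter out keywords as well as true, false and null. Unlike the pseudo-code, digits are collected into the buffer too, and any word that starts with a digit is dropped, so the e and L parts of numeric literals are not reported.

```java
import java.util.ArrayList;
import java.util.List;
import javax.lang.model.SourceVersion;

// Hypothetical scanner; a sketch, not a full Java lexer.
final class IdentifierScanner {
    private IdentifierScanner() {}

    static List<String> scan(String source) {
        List<String> identifiers = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        char quote = 0; // 0 = code, '"' = inside a string, '\'' = inside a char literal

        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (quote != 0) {
                if (c == '\\') {
                    i++; // skip the escaped character, e.g. \" or \'
                } else if (c == quote) {
                    quote = 0; // closing quote ends string/character mode
                }
                continue;
            }
            if (Character.isLetterOrDigit(c)) {
                buffer.append(c);
                continue;
            }
            flush(buffer, identifiers);
            if (c == '"' || c == '\'') {
                quote = c; // enter string or character mode
            }
        }
        flush(buffer, identifiers);
        return identifiers;
    }

    // Records the buffered word unless it is empty, numeric, a keyword or a literal.
    private static void flush(StringBuilder buffer, List<String> out) {
        if (buffer.length() > 0) {
            String word = buffer.toString();
            if (!Character.isDigit(word.charAt(0)) && !SourceVersion.isKeyword(word)) {
                out.add(word);
            }
            buffer.setLength(0);
        }
    }
}
```

Run on the line from the question, Logger.getLogger(MyHash.class.getName()).log(Level.SEVERE, this collects Logger, getLogger, MyHash, getName, log, Level and SEVERE, and skips class because it is a keyword. Duplicates are kept; switching to a set would remove them.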
Wow, ok. Parsing is hard -- really hard -- to do right. Rolling your own java parser is going to be incredibly difficult to do right. You'll find there are a lot of edge cases you're just not prepared for. To really do it right, and handle all the edge cases, you'll need to write a real parser. A real parser is composed of a number of things:
A lexical analyzer to break the input up into logical chunks
A grammar to determine how to interpret the aforementioned chunks
The actual "parser" which is generated from the grammar using a tool like ANTLR
A symbol table to store identifiers in
An abstract syntax tree to represent the code you've parsed
Once you have all that, you can have a real parser. Of course you could skip the abstract syntax tree, but you need pretty much everything else. That leaves you with writing about 1/3 of a compiler. If you truly want to complete this project yourself, you should see if you can find an example for ANTLR which contains a preexisting java grammar definition. That'll get you most of the way there, and then you'll need to use ANTLR to fill in your symbol table.
Alternately, you could go with the clever solutions suggested by Little Bobby Tables (awesome name, btw Bobby).
