First, I understand this goes against all convention and advice, but I want to do it anyway.
How can I compile Java code that uses Unicode characters in identifiers (method names, variable names, etc.), or is that even possible?
I want to be able to do something like the following:
public class 😋 extends 😃 {
public void сделайЧтонибудь() { ... }
}
Completely ridiculous example, but you get the point.
No, you can't.
An identifier has to start with a so-called Java letter that is
[...] a character for which the method Character.isJavaIdentifierStart(int) returns true.
Which in turn means
A character [ch] may start a Java identifier if and only if one of the following conditions is true:
isLetter(ch) returns true
getType(ch) returns LETTER_NUMBER
ch is a currency symbol (such as '$')
ch is a connecting punctuation character (such as '_').
The (optional) subsequent characters must be a Java letter-or-digit, that is
[...] a character for which the method Character.isJavaIdentifierPart(int) returns true.
Which in turn means
A character may be part of a Java identifier if any of the following conditions are true:
it is a letter
it is a currency symbol (such as '$')
it is a connecting punctuation character (such as '_')
it is a digit
it is a numeric letter (such as a Roman numeral character)
it is a combining mark
it is a non-spacing mark
isIdentifierIgnorable returns true for the character
None of the above is true for either 😋 or 😃, but it is for сделайЧтонибудь which is, in fact, a valid identifier.
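You can verify this yourself by running the two Character methods over each code point of a candidate name. The following is only a minimal sketch: the class name IdentifierCheck and the helper hasIdentifierChars are made up for illustration, and the check deliberately ignores keywords and the literals true, false, and null. The test strings are written with Unicode escapes so the file compiles regardless of source encoding.

```java
public class IdentifierCheck {
    // Returns true if every code point of s satisfies the character rules
    // for identifiers: the first must be a "Java letter", the rest must be
    // "Java letter-or-digit". Keywords and literals are not checked here.
    static boolean hasIdentifierChars(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The emoji U+1F60B as a surrogate pair.
        String emoji = "\uD83D\uDE0B";
        // The first six Cyrillic letters of the method name from the question.
        String cyrillic = "\u0441\u0434\u0435\u043B\u0430\u0439";

        System.out.println(hasIdentifierChars(emoji));    // false
        System.out.println(hasIdentifierChars(cyrillic)); // true
    }
}
```

Iterating by code point rather than by char matters here, because an emoji occupies two char values and neither half on its own is a meaningful character.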
What you could do (though why bother) is write a preprocessor that translates those emojis into sequences of Java letters. Its output would be a Java program with valid identifiers, which you can then feed to the compiler.
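A rough sketch of such a preprocessor might look like this. Everything in it is an assumption made for illustration: the class name EmojiMangler, the emoji_<hex> naming scheme, and the choice to rewrite every code point in the Unicode "other symbol" category. It works on raw text, so it would also rewrite symbols inside string literals and comments, and it does not guard against name collisions.

```java
public class EmojiMangler {
    // Hypothetical scheme: each code point of type OTHER_SYMBOL (the general
    // category that emoji such as U+1F60B belong to) is replaced with
    // "emoji_" followed by its hexadecimal code point value.
    static String mangle(String source) {
        StringBuilder out = new StringBuilder();
        source.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.OTHER_SYMBOL) {
                out.append("emoji_").append(Integer.toHexString(cp));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "public class <U+1F60B> { }" with the emoji written as an escape.
        String input = "public class \uD83D\uDE0B { }";
        System.out.println(mangle(input)); // public class emoji_1f60b { }
    }
}
```

A real tool would have to tokenize the source first so that only identifiers are rewritten.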
This is not valid Java, so you can't "make" it compile. Choose a valid identifier name as defined by the specification:
https://docs.oracle.com/javase/specs/jls/se18/html/jls-3.html#jls-3.8
Identifiers may contain "Java letters" and "Java digits", which can be drawn from the whole Unicode character set, but arbitrary Unicode symbols are not allowed:
The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword.
The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).
I need a Regex expression for the following:
9845d530c7594ab45e8b905bbff
It should always start with 984, followed by a UUID, and the total length should be 27. Here the UUID is 5d530c7594ab45e8b905bbff. I know how to match the UUID, but I am not sure how to combine everything into one expression.
For starting with 984 it should be
^984
And for a specific length it should be
\d{27}
But I am not sure about the UUID (which here should be case-insensitive).
^984[0-9a-fA-F]{24}$
A quick explanation:
^984 must begin with "984"
[...]{24}$ 24 characters that match the given character set, then end
[0-9a-fA-F] a character set that includes any number 0-9, character a-f or character A-F
You can also use character classes \d for the numeric portion (must be a single numeric digit), but I like to be explicit, because otherwise my brain hurts. Character classes are useful if you're running against an unknown character set that might have multiple representations for a number. For instance, \d might match the Arabic numbers (٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩) or Devanagari numbers (० १ २ ३ ४ ५ ६ ७ ८ ९).
There are also regex "modifiers" that allow case insensitivity without having to specify the upper and lower case in the character set. It does depend on which regex implementation you use.
Java's built-in regular expressions library has a modifier flag:
Pattern.compile("^984[0-9a-f]{24}$", Pattern.CASE_INSENSITIVE);
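Put together as a small runnable program, the pattern with the {24} quantifier and the case-insensitive flag could be used as in the sketch below. The class name PrefixedHexId and the method isValid are illustrative only.

```java
import java.util.regex.Pattern;

public class PrefixedHexId {
    // "984" followed by exactly 24 hexadecimal digits, 27 characters in total.
    // CASE_INSENSITIVE lets [0-9a-f] match A-F as well.
    private static final Pattern ID =
            Pattern.compile("^984[0-9a-f]{24}$", Pattern.CASE_INSENSITIVE);

    static boolean isValid(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("9845d530c7594ab45e8b905bbff")); // true
        System.out.println(isValid("9845D530C7594AB45E8B905BBFF")); // true
        System.out.println(isValid("1235d530c7594ab45e8b905bbff")); // false
        System.out.println(isValid("9845d530c7594ab45e8b905bbf"));  // false
    }
}
```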
I tried researching this and got it partially working, but it fails as soon as I include special characters. Special characters are NOT required, but if one is used, it must be one of the allowed special characters (# and - (dash)).
I tried this, but it won't work. Can anyone help? By the way, the value should be at least 8 characters long.
^(?=.*?[a-zA-Z])(?=.*?[0-9#-]).{8,}$
Some examples:
"JohnDoe" should be invalid
"JohnDoe2" should be valid
"22222222" should be invalid
"22222222a" should be valid
"JohnDoe2#" should be valid
"JohnDoe2#" should be invalid
"johndoe2" should be valid
This should do the trick:
(?=(?:.*[a-zA-Z]){1,})(?=(?:.*[#-]){0,})(?=(?:.*[0-9]){1,})^[a-zA-Z0-9#-]*$
The first section follows the pattern (?=(?:.*[GROUP]){NUMBER}) from here.
Then I added the section ^[a-zA-Z0-9#-]*$, which says that from the beginning of the string (^) to the end ($), the only characters present must come from the set [a-zA-Z0-9#-].
Here is the Regex101 with unit tests you provided.
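For use from Java, a simplified variant could look like the sketch below. It is an adaptation, not the exact expression above: it keeps the two required lookaheads (a letter and a digit), drops the optional one, and replaces * with {8,} to enforce the minimum length stated in the question. The class name PasswordRule and the sample "JohnDoe2@" are made up for illustration.

```java
import java.util.regex.Pattern;

public class PasswordRule {
    // At least one letter, at least one digit, only letters, digits,
    // '#' and '-' allowed, and a minimum length of 8.
    private static final Pattern RULE =
            Pattern.compile("^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9#-]{8,}$");

    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("JohnDoe"));   // false: no digit, too short
        System.out.println(isValid("JohnDoe2"));  // true
        System.out.println(isValid("22222222"));  // false: no letter
        System.out.println(isValid("22222222a")); // true
        System.out.println(isValid("JohnDoe2#")); // true
        System.out.println(isValid("JohnDoe2@")); // false: '@' is not allowed
    }
}
```

The restriction to allowed characters comes from the character class that consumes the string, not from the lookaheads, which only assert that something exists somewhere in the input.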
I have a password expression like below.
It has to allow one of three combinations: one lowercase letter, one uppercase letter, and one digit; or one lowercase letter, one uppercase letter, and one special character; or one lowercase letter, one uppercase letter, one digit, and one special character. I joined the three conditions using | (or). The length should be a minimum of 8 and a maximum of 20 characters. Only the specific special characters $##!% should be allowed, but the expression accepts every special character even though I listed a specific set. That is the main issue. I spent a lot of time changing patterns, but it still allows all special characters.
I don't understand why it's allowing ^ (for example, Marr1234^).
(((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[$##!%]))|((?=.*[a-z])(?=.*[A-Z])(?=.*[$##!%]))|((?=.*\d)(?=.*[a-z])(?=.*[A-Z]))).{8,20}
Any ideas?
Have you considered that .{8,20} is what matches your input? The . matches any character, and the quantifier repeats it 8 to 20 times, so this term also matches the ^ character.
Thanks rdmuller for the help. I was able to fix it.
I needed to use the range [A-Za-z0-9] instead of "." in .{8,20}.
Here is the expression I used:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[A-Za-z0-9]{8,20}$
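The difference between the two versions can be seen in a short Java sketch. The class name and the constants LOOSE and STRICT are illustrative, and both patterns assume the lookaheads were meant to contain .* (the usual form for "contains at least one").

```java
import java.util.regex.Pattern;

public class PasswordCharset {
    // The lookaheads only assert that a digit, a lowercase letter and an
    // uppercase letter exist. ".{8,20}" then accepts any characters at all.
    static final Pattern LOOSE =
            Pattern.compile("^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).{8,20}$");

    // Same lookaheads, but the consuming part is limited to letters and digits.
    static final Pattern STRICT =
            Pattern.compile("^(?=.*\\d)(?=.*[a-z])(?=.*[A-Z])[A-Za-z0-9]{8,20}$");

    public static void main(String[] args) {
        System.out.println(LOOSE.matcher("Marr1234^").matches());  // true
        System.out.println(STRICT.matcher("Marr1234^").matches()); // false
        System.out.println(STRICT.matcher("Marr1234").matches());  // true
    }
}
```

To permit a fixed set of special characters as well, they would be added to the final character class rather than to a lookahead.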
I want to include a backslash in a String variable name. How do I do that?
Example:
String Cd_St_SSLC/PUC;
A / (forward slash) is not allowed in a variable name because it is an operator. A / causes a compile-time error unless it is used for division, in a comment (//, /** */, or /* */), inside a string literal ("//"), or as a character literal ('/'). Operators cannot appear in a variable's name.
The Java™ Tutorials
Variables
Naming
Every programming language has its own set of rules and conventions for the kinds of names that you're allowed to use, and the Java programming language is no different. The rules and conventions for naming your variables can be summarized as follows:
Variable names are case-sensitive. A variable's name can be any legal identifier — an unlimited-length sequence of Unicode letters and digits, beginning with a letter, the dollar sign "$", or the underscore character "_". The convention, however, is to always begin your variable names with a letter, not "$" or "_". Additionally, the dollar sign character, by convention, is never used at all. You may find some situations where auto-generated names will contain the dollar sign, but your variable names should always avoid using it. A similar convention exists for the underscore character; while it's technically legal to begin your variable's name with "_", this practice is discouraged. White space is not permitted.
Subsequent characters may be letters, digits, dollar signs, or underscore characters. Conventions (and common sense) apply to this rule as well. When choosing a name for your variables, use full words instead of cryptic abbreviations. Doing so will make your code easier to read and understand. In many cases it will also make your code self-documenting; fields named cadence, speed, and gear, for example, are much more intuitive than abbreviated versions, such as s, c, and g. Also keep in mind that the name you choose must not be a keyword or reserved word.
If the name you choose consists of only one word, spell that word in all lowercase letters. If it consists of more than one word, capitalize the first letter of each subsequent word. The names gearRatio and currentGear are prime examples of this convention. If your variable stores a constant value, such as static final int NUM_GEARS = 6, the convention changes slightly, capitalizing every letter and separating subsequent words with the underscore character. By convention, the underscore character is never used elsewhere.
See also 1/2
The Java language specification for identifiers.
The Java® Language Specification: Java SE 7 Edition
Chapter 3. Lexical Structure
3.8. Identifiers
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter
IdentifierChars JavaLetterOrDigit
JavaLetter:
any Unicode character that is a Java letter (see below)
JavaLetterOrDigit:
any Unicode character that is a Java letter-or-digit (see below)
3.12. Operators
37 tokens are the operators, formed from ASCII characters.
Operator: one of
= > < ! ~ ? :
== <= >= != && || ++ --
+ - * / & | ^ % << >> >>>
+= -= *= /= &= |= ^= %= <<= >>= >>>=
See also 2/2
The following method Character.isUnicodeIdentifierPart can determine "if the character may be part of a Unicode identifier".
Method: java.lang.Character.isUnicodeIdentifierPart()
Description
The java.lang.Character.isUnicodeIdentifierPart(char ch) [method] determines if the specified character may be part of a Unicode identifier as other than the first character.
A character may be part of a Unicode identifier if and only if one of the following statements is true:
it is a letter
it is a connecting punctuation character (such as '_')
it is a digit
it is a numeric letter (such as a Roman numeral character)
it is a combining mark
it is a non-spacing mark
isIdentifierIgnorable returns true for this character.
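Note that the list above has no entry for currency symbols, so isUnicodeIdentifierPart is not the same test the Java compiler applies to identifiers; that one is isJavaIdentifierPart. A small sketch comparing the two for the characters discussed here (the class and method names are illustrative):

```java
public class IdentifierPartDemo {
    // Formats the result of both checks for a single character.
    static String describe(char ch) {
        return "'" + ch + "' java=" + Character.isJavaIdentifierPart(ch)
                + " unicode=" + Character.isUnicodeIdentifierPart(ch);
    }

    public static void main(String[] args) {
        System.out.println(describe('_')); // '_' java=true unicode=true
        System.out.println(describe('$')); // '$' java=true unicode=false
        System.out.println(describe('/')); // '/' java=false unicode=false
    }
}
```

Either way, '/' fails both checks, which is why Cd_St_SSLC/PUC cannot be a single identifier.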
That's a forward slash, and not legal in a Java variable name because it is the division operator.
int a = b/c;
I suggest you take the Java naming conventions into consideration. You can read more about this in "Thinking in Java", from http://java.about.com/od/javasyntax/a/nameconventions.htm... A character like '/' is not allowed in a name at all; maybe you can replace it with '_'.
Roxana
I advise you not to use String Cd_St_SSLC/PUC, because it is not a legal Java variable name. If you want a meaningful name, use an underscore instead: String Cd_St_SSLC_PUC.
Section 6.2 of the Java Language Specification gives the following code example:
class Test {
public static void main(String[] args) {
Class c = System.out.getClass();
System.out.println(c.toString().length() +
args[0].length() + args.length);
}
}
And it states:
the identifiers Test, main, and the first occurrences of args and c are not names. Rather, they are used in declarations to specify the names of the declared entities. The names String, Class, System.out.getClass, System.out.println, c.toString, args, and args.length appear in the example.
But are the names like Class and String also identifiers? What is an identifier exactly?
An identifier is a type of a token. From the specification of the lexical structure of Java:
3.8. Identifiers
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
JavaLetter
IdentifierChars JavaLetterOrDigit
JavaLetter:
any Unicode character that is a Java letter (see below)
JavaLetterOrDigit:
any Unicode character that is a Java letter-or-digit (see below)
A "Java letter" is a character for which the method
Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method
Character.isJavaIdentifierPart(int) returns true.
The "Java letters" include uppercase and lowercase ASCII Latin letters
A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical
reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or
\u0024). The $ character should be used only in mechanically generated
source code or, rarely, to access pre-existing names on legacy
systems.
The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).
Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.
An identifier cannot have the same spelling (Unicode character
sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null
literal (§3.10.7), or a compile-time error occurs.
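The JDK exposes these lexical rules through javax.lang.model.SourceVersion, which makes the distinction easy to try out. In the sketch below, the class name and the helper isJlsIdentifier are illustrative; SourceVersion.isIdentifier checks only the characters, so the helper additionally excludes keywords and the literals true, false, and null via isKeyword. SourceVersion.isName accepts qualified names, which are identifiers joined by dots.

```java
import javax.lang.model.SourceVersion;

public class IdentifierVsName {
    // An identifier in the sense of the grammar above: valid characters,
    // and not a keyword, boolean literal, or the null literal.
    static boolean isJlsIdentifier(String s) {
        return SourceVersion.isIdentifier(s) && !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        System.out.println(isJlsIdentifier("Test"));   // true
        System.out.println(isJlsIdentifier("String")); // true
        System.out.println(isJlsIdentifier("class"));  // false: keyword
        System.out.println(isJlsIdentifier("null"));   // false: null literal

        // A qualified name is several identifiers separated by dots.
        System.out.println(SourceVersion.isName("System.out.println")); // true
        System.out.println(isJlsIdentifier("System.out.println"));      // false
    }
}
```

So String and Class are identifiers at the lexical level; whether a given occurrence is also a name depends on whether it declares an entity or refers to one.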
An identifier is a user defined symbol.
It allows the compiler to differentiate between bindings to objects of the same type in the symbol table.
This might answer your 2nd question:
http://www.cafeaulait.org/course/week2/08.html
Identifiers are the names of variables, methods, classes, packages and
interfaces. Unlike literals they are not the things themselves, just
ways of referring to them.