I know your first reaction will be "why on earth would you do this, your method names are clearly ridiculous" but it's because I'm using Spring Boot JPA where you can use the method name to construct the query for you by reflection (it's amazing). But when querying based on a few different columns and the entity variables have longish names the method name ends up being quite long and hard to read, so I was wondering if there is a way of splitting it across multiple lines?
At the moment I have something like (simplified for this question):
public List<Employee> findByFirstNameAndLastNameAndAgeBetweenAndBirthdayBefore(String firstName, String lastName, Integer minAge, Integer maxAge, Date maxBirthday);
And I would like it in my code to be more like:
public List<Employee> findBy
FirstName
AndLastName
AndAgeBetween
AndBirthdayBefore
(String firstName, String lastName, Integer minAge, Integer maxAge, Date maxBirthday);
Is this at all possible?
A method name is an identifier, which is made up of IdentifierChars.
IdentifierChars are defined as starting with a Java Letter, and thereafter may be a Java Letter or Digit. Those are described in the Javadoc of Character.isJavaIdentifierPart (and isJavaIdentifierStart):
A character may be part of a Java identifier if any of the following are true:
it is a letter
it is a currency symbol (such as '$')
it is a connecting punctuation character (such as '_')
it is a digit
it is a numeric letter (such as a Roman numeral character)
it is a combining mark
it is a non-spacing mark
isIdentifierIgnorable(codePoint) returns true
And isIdentifierIgnorable(int) says (emphasis mine):
The following Unicode characters are ignorable in a Java identifier or a Unicode identifier:
ISO control characters that are not whitespace
'\u0000' through '\u0008'
'\u000E' through '\u001B'
'\u007F' through '\u009F'
all characters that have the FORMAT general category value
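The line feed ('\u000A') and carriage return ('\u000D') are whitespace, so they fall in the gap between those ranges and are not ignorable. So no, you can't have newlines in a method name.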
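One way to confirm this is to ask the Character class directly. The following is only a small sketch; the class name NewlineInIdentifier and the helper allowedInIdentifier are made up for illustration.

```java
class NewlineInIdentifier {
    // True if the code point may appear after the first character of an identifier.
    static boolean allowedInIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // Line feed and carriage return are whitespace, so they are rejected.
        System.out.println(allowedInIdentifier('\n')); // false
        System.out.println(allowedInIdentifier('\r')); // false
        // NUL is an "identifier ignorable" control character, so it is accepted.
        System.out.println(allowedInIdentifier(0));    // true
        // Ordinary identifier characters are accepted too.
        System.out.println(allowedInIdentifier('_'));  // true
    }
}
```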
That is an invalid method declaration. Java doesn't allow you to continue a method name on the next line, and even if it did, it would not be good practice.
I guess you are trying to do this because of long method names. Even the Spring Data JPA documentation suggests that, if your method names are growing large, you should consider writing the query with the @Query annotation or using Querydsl.
No, you can't include line breaks in your method name.
A method name must be a valid Java identifier (see "Method Declarations" in the Java Language Specification), and an identifier is made of Java letters and digits, none of which are line breaks (see "Identifiers" in the Java Language Specification):
The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems.
The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).
Related
I found an interesting regex in a Java project: "[\\p{C}&&\\S]"
I understand that the && means "set intersection", and \S is "non-whitespace", but what is \p{C}, and is it okay to use?
The java.util.regex.Pattern documentation doesn't mention it. The only similar class on the list is \p{Cntrl}, but they behave differently: they both match on control characters, but \p{C} matches twice on Unicode characters above U+FFFF, such as PILE OF POO:
public class StrangePattern {
    public static void main(String[] argv) {
        // As far as I can tell, this is the simplest way to create a String
        // with code points above U+FFFF.
        String poo = new String(Character.toChars(0x1F4A9));
        System.out.println(poo); // prints `💩`
        System.out.println(poo.replaceAll("\\p{C}", "?")); // prints `??`
        System.out.println(poo.replaceAll("\\p{Cntrl}", "?")); // prints `💩`
    }
}
The only mention I've found anywhere is here:
\p{C} or \p{Other}: invisible control characters and unused code points.
However, \p{Other} does not seem to exist in Java, and the matching code points are not unused.
My Java version info:
$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Bonus question: what is the likely intent of the original pattern, "[\\p{C}&&\\S]"? It occurs in a method which validates a string before it is sent in an email: if that pattern is matched, an exception with the message "Invalid string" is raised.
Buried down in the Pattern docs under Unicode Support, we find the following:
This class is in conformance with Level 1 of Unicode Technical Standard #18: Unicode Regular Expression, plus RL2.1 Canonical Equivalents.
...
Categories may be specified with the optional prefix Is: Both \p{L}
and \p{IsL} denote the category of Unicode letters. Same as scripts
and blocks, categories can also be specified by using the keyword
general_category (or its short form gc) as in general_category=Lu or
gc=Lu.
The supported categories are those of The Unicode Standard in the
version specified by the Character class. The category names are those
defined in the Standard, both normative and informative.
From Unicode Technical Standard #18, we find that C is defined to match any Other General_Category value, and that support for this is part of the requirements for Level 1 conformance. Java implements \p{C} because it claims conformance to Level 1 of UTS #18.
It probably should support \p{Other}, but apparently it doesn't.
Worse, it's violating RL1.7, required for Level 1 conformance, which requires that matching happen by code point instead of code unit:
To meet this requirement, an implementation shall handle the full range of Unicode code points, including values from U+FFFF to U+10FFFF. In particular, where UTF-16 is used, a sequence consisting of a leading surrogate followed by a trailing surrogate shall be handled as a single code point in matching.
There should be no matches for \p{C} in your test string, because your test string should be matched as a single emoji code point with General_Category=So (Other Symbol) instead of as two surrogates.
According to https://regex101.com/, \p{C} matches
Invisible control characters and unused code points
(The backslash has to be escaped inside a Java string literal, so the string "\\p{C}" is the regex \p{C}.)
I'm guessing this is a 'hacked string check', as a \p{C} character probably should never appear inside a valid (character-filled) string. The author should have left a comment, though, because what they checked and what they wanted to check are often two different things.
Anything other than a valid two-letter Unicode category code or a single letter that begins a Unicode category code is illegal since Java supports only single letter and two-letter abbreviations for Unicode categories. That's why \p{Other} doesn't work here.
\p{C} matches twice on Unicode characters above U+FFFF, such as PILE
OF POO.
Right. Java uses UTF-16 internally, and 💩 is encoded as two 16-bit code units (0xD83D 0xDCA9) that together form a surrogate pair (a high surrogate followed by a low surrogate). Since \p{C} includes \p{Cs} and matches each half separately,
\p{Cs} or \p{Surrogate}: one half of a surrogate pair in UTF-16
encoding.
you see two matches in the result.
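The surrogate encoding described above can be inspected with the standard Character API. This is a minimal sketch; the class name SurrogateDemo and the helper codeUnits are invented for the example.

```java
class SurrogateDemo {
    static final int PILE_OF_POO = 0x1F4A9;

    // Number of UTF-16 code units needed to encode the given code point.
    static int codeUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        char[] units = Character.toChars(PILE_OF_POO);
        // Prints the two halves of the surrogate pair: high=D83D low=DCA9
        System.out.printf("high=%04X low=%04X%n", (int) units[0], (int) units[1]);

        String s = new String(units);
        System.out.println(s.length());                      // 2 (code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)

        // Each half on its own has the general category Cs (surrogate) ...
        System.out.println(Character.getType(units[0]) == Character.SURROGATE);       // true
        // ... while the full code point has the category So (other symbol).
        System.out.println(Character.getType(PILE_OF_POO) == Character.OTHER_SYMBOL); // true
    }
}
```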
What is the likely intent of the original pattern, [\\p{C}&&\\S]?
I don't see a very good reason for it, but it seems the developer was worried about characters in the category Other (for example, to keep spammy emoji out of an email subject) and simply tried to block them.
As for the Bonus question: the expression [\\p{C}&&\\S] finds control characters excluding whitespace characters like tabs or line feeds in Java. These characters have no value in regular mails and therefore it is a good idea to filter them away (or, as in this case, declare an email content as faulty). Be aware that the double backslashes (\\) are only necessary to escape the expression for Java processing. The correct regular expression would be: [\p{C}&&\S]
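A short sketch of how such a validation might behave, using the pattern from the question. The class name ControlCharCheck and the method containsSuspiciousChar are hypothetical and are not taken from the original project.

```java
import java.util.regex.Pattern;

class ControlCharCheck {
    // Category "Other" intersected with non-whitespace, as in the question.
    private static final Pattern NON_WS_OTHER = Pattern.compile("[\\p{C}&&\\S]");

    // True if the string contains at least one character matched by the pattern.
    static boolean containsSuspiciousChar(String s) {
        return NON_WS_OTHER.matcher(s).find();
    }

    public static void main(String[] args) {
        // Tab and line feed are control characters, but they are whitespace.
        System.out.println(containsSuspiciousChar("hello\tworld\n")); // false
        // BEL (U+0007) is a control character that is not whitespace.
        System.out.println(containsSuspiciousChar("hello\u0007world")); // true
        // Plain printable text is not matched.
        System.out.println(containsSuspiciousChar("hello world")); // false
    }
}
```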
Many places on SO lead to the JLS section on Identifiers, but I have a question on what's written there.
The "Java letters" include uppercase and lowercase ASCII Latin letters
A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical
reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or
\u0024). The $ character should be used only in mechanically generated
source code or, rarely, to access pre-existing names on legacy
systems. The "Java digits" include the ASCII digits 0-9
(\u0030-\u0039).
But it goes on to say:
Letters and digits may be drawn from the entire Unicode character set,
which supports most writing scripts in use in the world today,
including the large sets for Chinese, Japanese, and Korean. This
allows programmers to use identifiers in their programs that are
written in their native languages.
I don't understand how these can both be true. The first section seems to dictate exactly which characters are allowed whereas the second section seems to say that the allowance is much more flexible.
I agree that usage of "includes" instead of "includes but is not limited to" shows that it doesn't exactly contradict. But it also first refers specifically to "Java letters"/"Java digits" and then relaxes this to just "letters"/"digits". My main point is lack of clarity and I wanted confirmation on what I assumed it meant.
As per the question Legal identifiers in Java you can see that there are many legal identifiers.
[For languages using the roman alphabet] only alphanumeric characters and occasionally underscores are used when naming identifiers by convention. However, a vast array of characters can be used.
The first paragraph refers to the code-style, or convention, among java programmers to use a reasonably consistent and readable naming scheme. The second paragraph you've quoted explains that there are a vast array of other characters which the JVM will accept - although your fellow programmers may disapprove.
The first section is a special case of the second. The characters mentioned in both sections have to satisfy the criteria given in JLS 3.8, which the question leaves out:
A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method
Character.isJavaIdentifierPart(int) returns true.
The above methods accept/verify the code points that correspond to the characters in the entire Unicode character set (Section 2) which includes the Basic-Latin character set (Section 1).
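To see that both quoted sections describe the same rule, the two methods can be called on ASCII and non-ASCII characters alike. This is a sketch only; the class name JavaLetterDemo and its two helpers are made up for the example.

```java
class JavaLetterDemo {
    // "Java letter" in JLS terms.
    static boolean isJavaLetter(int codePoint) {
        return Character.isJavaIdentifierStart(codePoint);
    }

    // "Java letter-or-digit" in JLS terms.
    static boolean isJavaLetterOrDigit(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // Basic Latin characters named explicitly in the first quoted section.
        System.out.println(isJavaLetter('a'));      // true
        System.out.println(isJavaLetter('_'));      // true
        System.out.println(isJavaLetter('$'));      // true
        // Letters from the rest of Unicode, as the second section allows.
        System.out.println(isJavaLetter('\u00e9')); // true (Latin small e with acute)
        System.out.println(isJavaLetter('\u4e2d')); // true (a CJK ideograph)
        // A digit may continue an identifier but not start one.
        System.out.println(isJavaLetter('7'));        // false
        System.out.println(isJavaLetterOrDigit('7')); // true
        // Punctuation such as '-' is rejected by both.
        System.out.println(isJavaLetterOrDigit('-')); // false
    }
}
```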
Usually, you will never see anybody going beyond the Basic-Latin character set in their Java source files.
What characters are valid in a Java class name? What other rules govern Java class names (for instance, Java class names cannot begin with a number)?
You can have almost any character, including most Unicode characters! The exact definition is in the Java Language Specification under section 3.8: Identifiers.
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. ...
Letters and digits may be drawn from the entire Unicode character set, ... This allows programmers to use identifiers in their programs that are written in their native languages.
An identifier cannot have the same spelling (Unicode character sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null literal (§3.10.7), or a compile-time error occurs.
However, see this question for whether or not you should do that.
Every programming language has its own set of rules and conventions for the kinds of names that you're allowed to use, and the Java programming language is no different. The rules and conventions for naming your variables can be summarized as follows:
Variable names are case-sensitive. A variable's name can be any legal identifier — an unlimited-length sequence of Unicode letters and digits, beginning with a letter, the dollar sign "$", or the underscore character "_". The convention, however, is to always begin your variable names with a letter, not "$" or "_". Additionally, the dollar sign character, by convention, is never used at all. You may find some situations where auto-generated names will contain the dollar sign, but your variable names should always avoid using it. A similar convention exists for the underscore character; while it's technically legal to begin your variable's name with "_", this practice is discouraged. White space is not permitted.
Subsequent characters may be letters, digits, dollar signs, or underscore characters. Conventions (and common sense) apply to this rule as well. When choosing a name for your variables, use full words instead of cryptic abbreviations. Doing so will make your code easier to read and understand. In many cases it will also make your code self-documenting; fields named cadence, speed, and gear, for example, are much more intuitive than abbreviated versions, such as s, c, and g. Also keep in mind that the name you choose must not be a keyword or reserved word.
If the name you choose consists of only one word, spell that word in all lowercase letters. If it consists of more than one word, capitalize the first letter of each subsequent word. The names gearRatio and currentGear are prime examples of this convention. If your variable stores a constant value, such as static final int NUM_GEARS = 6, the convention changes slightly, capitalizing every letter and separating subsequent words with the underscore character. By convention, the underscore character is never used elsewhere.
From the official Java Tutorial.
Further to previous answers, it's worth noting that:
Java allows any Unicode currency symbol in symbol names, so the following will all work:
$var1
£var2
€var3
I believe the usage of currency symbols originates in C/C++, where variables added to your code by the compiler conventionally started with '$'. An obvious example in Java is the names of '.class' files for inner classes, which by convention have the format 'Outer$Inner.class'
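As a quick check that currency symbols really compile as identifiers, here is a minimal sketch. The class name CurrencyIdentifierDemo is made up, and the pound and euro signs are written as Unicode escapes only so that the example does not depend on the source file encoding.

```java
class CurrencyIdentifierDemo {
    // Declares three locals whose names start with a currency symbol
    // (dollar, pound, euro) and returns their sum.
    static int sum() {
        int $var1 = 1;
        int \u00a3var2 = 2; // pound sign, written as a Unicode escape
        int \u20acvar3 = 3; // euro sign, written as a Unicode escape
        return $var1 + \u00a3var2 + \u20acvar3;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // 6
        // All three symbols are in the Unicode category "currency symbol".
        System.out.println(Character.getType('\u20ac') == Character.CURRENCY_SYMBOL); // true
    }
}
```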
Many C# and C++ programmers adopt the convention of placing 'I' in front of interfaces (aka pure virtual classes in C++). This is not required, and hence not done, in Java because the implements keyword makes it very clear when something is an interface.
Compare:
class Employee : public IPayable //C++
with
class Employee : IPayable //C#
and
class Employee implements Payable //Java
Many projects use the convention of placing an underscore in front of field names, so that they can readily be distinguished from local variables and parameters e.g.
private double _salary;
A tiny minority place the underscore after the field name e.g.
private double salary_;
As already stated by Jason Cohen, the Java Language Specification defines what a legal identifier is in section 3.8:
"An identifier is an unlimited-length sequence of Java letters and Java digits, the
first of which must be a Java letter. [...] A 'Java letter' is a character for which the method Character.isJavaIdentifierStart(int) returns true. A 'Java letter-or-digit' is a character for which the method Character.isJavaIdentifierPart(int) returns true."
This hopefully answers your second question. Regarding your first question; I've been taught both by teachers and (as far as I can remember) Java compilers that a Java class name should be an identifier that begins with a capital letter A-Z, but I can't find any reliable source on this. When trying it out with OpenJDK there are no warnings when beginning class names with lower-case letters or even a $-sign. When using a $-sign, you do have to escape it if you compile from a bash shell, however.
I'd like to add to bosnic's answer that any valid currency character is legal for an identifier in Java. th€is is a legal identifier, as is €this, and € as well. However, I can't figure out how to edit his or her answer, so I am forced to post this trivial addition.
What other rules govern Java class names (for instance, Java class names cannot begin with a number)?
Java class names usually begin with a capital letter.
Java class names cannot begin with a number.
If there are multiple words in the class name, each word should begin with a capital letter, e.g. "MyClassName". This naming convention is known as UpperCamelCase.
Class names should be nouns in UpperCamelCase, with the first letter of every word capitalised. Use whole words — avoid acronyms and abbreviations (unless the abbreviation is much more widely used than the long form, such as URL or HTML).
The naming conventions can be read over here:
http://www.oracle.com/technetwork/java/codeconventions-135099.html
Identifiers are used for class names, method names, and variable names. An identifier may be any descriptive sequence of uppercase and lowercase letters, numbers, or the underscore and dollar-sign characters. They must not begin with a number, lest they be confused with a numeric literal. Again, Java is case-sensitive, so VALUE is a different identifier than Value.
Some examples of valid identifiers are:
AvgTemp, count, a4, $test, this_is_ok
Invalid variable names include:
2count, high-temp, Not/ok
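The examples above can be checked programmatically. The following is a sketch under the assumption that only the character rules matter; the class IdentifierCheck and the method isValidIdentifier are hypothetical, and the check does not reject keywords such as "class".

```java
class IdentifierCheck {
    // True if the name starts with a Java letter and continues
    // with Java letters or digits only. Keywords are NOT rejected here.
    static boolean isValidIdentifier(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk the rest of the string by code point, not by char.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"AvgTemp", "count", "a4", "$test", "this_is_ok",
                            "2count", "high-temp", "Not/ok"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```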
Is it possible to declare a Java attribute name using special characters? For example:
private String var/name;
private int one+one;
I ask this question because I need to retrieve data from a DB in which one column is named annoces/status, and I need to use a plain SQL query (not HQL or Criteria).
It's not possible to do that:
Variable names are case-sensitive. A variable's name can be any legal
identifier — an unlimited-length sequence of Unicode letters and
digits, beginning with a letter, the dollar sign "$", or the
underscore character "_". The convention, however, is to always begin
your variable names with a letter, not "$" or "_". Additionally, the
dollar sign character, by convention, is never used at all. You may
find some situations where auto-generated names will contain the
dollar sign, but your variable names should always avoid using it. A
similar convention exists for the underscore character; while it's
technically legal to begin your variable's name with "_", this
practice is discouraged. White space is not permitted.
I don't understand why you need that though.
Edit
You can use something like bbr.sendQuery("Select Status, Name, Annonces/Sta AS annoncesSta From table ", MyObject.class); with AS you can change the name of the column that you receive in the result, so your attribute in the Java class can be "annoncesSta". (Depending on the database, a column name containing "/" may need to be quoted in the query.)
Anyway, it is weird to have column names with "/". Best practices for names are:
AnnoncesSta
annonces_sta
And most mappers handle those names automatically.
There is absolutely no need to name the Java variables same as the column names in some table in some database.
If you are trying to store column values in some Map then map keys can have any special character you want.
You can't do what you are asking - as detailed by others. My suggested solution is to sort out the column names in your database so that they follow normal standards and then you can use the same names in java.
By normal standards I mean:
alphanumeric characters or the following special characters: $ _ #
I have the following regular expression, which works fine when the user's input is in English.
But it always fails when using Portuguese characters.
Pattern p = Pattern.compile("^[a-zA-Z]*$");
Matcher matcher = p.matcher(fieldName);
if (!matcher.matches())
{
....
}
Is there any way to get the pattern object to recognise valid Portuguese characters such as ÁÂÃÀÇÉÊÍÓÔÕÚç....?
Thanks
You want a regular expression that will match the class of all alphabetic letters. Across all the scripts of the world, there's loads of those, but luckily we can tell Java 6's RE engine that we're after a letter and it will use the magic of Unicode classes to do the rest. In particular, the L class matches all types of letters, upper, lower and “oh, that concept doesn't apply in my language”:
Pattern p = Pattern.compile("^\\p{L}*$");
// the rest is identical, so won't repeat it...
When reading the docs, remember that backslashes will need to be doubled up if placed in a Java literal so as to stop the Java compiler from interpreting them as something else. (Also be aware that that RE is not suitable for things like validating the names of people, which is an entirely different and much more difficult problem.)
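A minimal sketch comparing the two patterns on accented input. The class name LetterOnlyCheck and its methods are made up for illustration, and the accented letter is written as a Unicode escape only to keep the example independent of the source file encoding.

```java
import java.util.regex.Pattern;

class LetterOnlyCheck {
    private static final Pattern ASCII_LETTERS = Pattern.compile("^[a-zA-Z]*$");
    private static final Pattern UNICODE_LETTERS = Pattern.compile("^\\p{L}*$");

    // The original check: ASCII letters only.
    static boolean asciiOnly(String s) {
        return ASCII_LETTERS.matcher(s).matches();
    }

    // The Unicode-aware check: any character in the category "letter".
    static boolean lettersOnly(String s) {
        return UNICODE_LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "S\u00e3o"; // "Sao" with a tilde on the a
        System.out.println(asciiOnly(name));      // false
        System.out.println(lettersOnly(name));    // true
        System.out.println(lettersOnly("abc1"));  // false, a digit is not a letter
    }
}
```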
It should also work with "^\\p{IsAlphabetic}*$" (written as a Java string literal, hence the doubled backslash), which takes Unicode characters into account. For reference, see the options in http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Check out the Pattern doc and particularly the section on Unicode:
Unicode blocks and categories are written with the \p and \P
constructs as in Perl. \p{prop} matches if the input has the property
prop, while \P{prop} does not match if the input has that property.
Blocks are specified with the prefix In, as in InMongolian. Categories
may be specified with the optional prefix Is: Both \p{L} and \p{IsL}
denote the category of Unicode letters. Blocks and categories can be
used both inside and outside of a character class.
(for Java 1.4.x). I suspect you're interested in identifying Unicode letters and not particularly Portuguese letters?