For instance,
import org.apache.nutch.plugin.Extension;
Though I've used it many times,
I don't really know what it essentially does.
EDIT: Is org.apache.nutch.plugin essentially 4 directories or fewer than 4 like a directory named org.apache?
I think the question you might be trying to ask is, "What are packages in Java, and how does the import keyword relate to them?". Your confusion about directory structures might stem from the fact that some other languages have include directives that use file names to literally include the contents of the specified file in your source code at compile time. C/C++ are examples of languages that use this type of include directive. Java's import keyword does not work this way. As others have said, the import keyword is simply a shorthand way to reference one or more classes in a package. The real work is done by the Java Virtual Machine's class loader (details below).
Let's start with the definition of a "Java package", as described in the Wikipedia article:
A Java package is a mechanism for organizing Java classes into namespaces similar to the modules of Modula. Java packages can be stored in compressed files called JAR files, allowing classes to download faster as a group rather than one at a time. Programmers also typically use packages to organize classes belonging to the same category or providing similar functionality.
In Java, source code files for classes are in fact organized by directories, but the method by which the Java Virtual Machine (JVM) locates the classes is different from languages like C/C++.
Suppose in your source code you have a package named "com.foo.bar", and within that package you have a class named "MyClass". At compile time, the location of that class's source code in the file system must be {source}/com/foo/bar/MyClass.java, where {source} is the root of the source tree you are compiling.
One difference between Java and languages like C/C++ is the concept of a class loader. In fact, the concept of a class loader is a key part of the Java Virtual Machine's architecture. The job of the class loader is to locate and load any class files your program needs. The "primordial" or "default" Java class loader is usually provided by the JVM. It is a regular class of type ClassLoader, and contains a method called loadClass() with the following definition:
// Loads the class with the specified (fully qualified) name.
// Example: loadClass("org.apache.nutch.plugin.Extension")
public Class<?> loadClass(String name) throws ClassNotFoundException
This loadClass() method will attempt to locate the class file for the class with the given name, and it produces a Class object which has a newInstance() method capable of instantiating the class (deprecated since Java 9 in favor of getDeclaredConstructor().newInstance()).
Where does the class loader search for the class file? In the JVM's class path. The class path is simply a list of locations where class files can be found. These locations can be directories containing class files. The class path can also contain jar files, which themselves contain more class files. The default class loader is capable of looking inside these jar files to search for class files. As a side note, you could implement your own class loader to, for example, allow network locations (or any other location) to be searched for class files.
So, now we know that whether or not "com.foo.bar.MyClass" is in a class file in your own source tree or a class file inside a jar file somewhere in your class path, the class loader will find it for you, if it exists. If it does not exist, you will get a ClassNotFoundException.
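To make the lookup described above concrete, here is a minimal sketch that asks the system class loader for a class by its fully qualified name. The class name ClassLoaderLookupDemo and its helper methods are made up for illustration, and it assumes that no class called com.foo.bar.MyClass is actually on your class path.

```java
public class ClassLoaderLookupDemo {

    /** Asks the system class loader for a class by its fully qualified name. */
    static String lookup(String fullyQualifiedName) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        try {
            Class<?> type = loader.loadClass(fullyQualifiedName);
            return "found " + type.getName();
        } catch (ClassNotFoundException e) {
            // Thrown when no entry on the class path contains the class.
            return "not found: " + fullyQualifiedName;
        }
    }

    /** Loads a class by name and instantiates it through its no-argument constructor. */
    static Object instantiate(String fullyQualifiedName) {
        try {
            Class<?> type = ClassLoader.getSystemClassLoader().loadClass(fullyQualifiedName);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + fullyQualifiedName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("java.util.ArrayList"));
        // Hypothetical class from the text; assumed to be absent from the class path.
        System.out.println(lookup("com.foo.bar.MyClass"));
        System.out.println(instantiate("java.util.ArrayList").getClass());
    }
}
```

Note that nothing in this sketch uses an import statement: the class loader only ever sees fully qualified names.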
And now to address the import keyword: I will reference the following example:
import com.foo.bar.MyClass;
...
public void someFunction() {
    MyClass obj1 = new MyClass();
    org.blah.MyClass obj2 = new org.blah.MyClass("some string argument");
}
The first line is simply a way to tell the compiler "Whenever you see a variable declared simply as type MyClass, assume I mean com.foo.bar.MyClass." That is what's happening in the case of obj1. In the case of obj2, you are explicitly telling the compiler "I don't want the class com.foo.bar.MyClass, I actually want org.blah.MyClass". So the import keyword is just a simple way of cutting down on the amount of typing programmers have to do in order to use other classes. All of the interesting stuff is done in the JVM's class loader.
For more information about exactly what the class loader does, I recommend reading an article called The Basics of Java Class Loaders.
All it's doing is saving you typing. Instead of having to type "org.apache.nutch.plugin.Extension" every time you want to use it, the import allows you to refer to it by its short name, "Extension".
Don't be confused by the word "import" - it's not loading the .class file or anything like that. The class loader will search for it on the CLASSPATH and load it into memory (the permanent generation before Java 8, Metaspace since) the first time your code requires it.
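The "first time your code requires it" part can be observed with a small sketch. The names LazyLoadDemo and Heavy are made up for illustration. Strictly speaking, what this shows is class initialization (the static block running); loading happens at or before that point, and the exact moment of loading is left to the JVM.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyLoadDemo {
    // Records the order in which things happen.
    static final List<String> events = new ArrayList<>();

    // Hypothetical class whose static initializer leaves a trace when it runs.
    static class Heavy {
        static {
            events.add("Heavy initialized");
        }

        static void touch() {
            // Intentionally empty: calling it is enough to trigger initialization.
        }
    }

    static List<String> run() {
        events.add("before first use");
        Heavy.touch(); // first active use of Heavy
        events.add("after first use");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call to run(), "Heavy initialized" is recorded between the other two entries, even though Heavy is referenced in the source long before that line executes.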
UPDATE: As a developer you have to know that packages are associated with directories. If you create a package "com.foo.bar.baz" in your .java file, it'll have to be stored in a directory com/foo/bar/baz.
But when you download a JAR file, like that Apache Nutch library, there are no directories involved from your point of view. The person who created the JAR had to zip up the proper directory structure, which you can see as the path to the .class file if you open the JAR using WinZip. You just have to put that JAR in the CLASSPATH for your app when you compile and run.
Imports are just hints to the compiler telling it how to figure out the full name of classes.
So if you have "import java.util.*;" and in your code you are doing something like "new ArrayList()", when the compiler processes this expression it first needs to find the fully qualified name of the type ArrayList. It does so by going through the list of imports and appending ArrayList to each one. Specifically, when it appends ArrayList to java.util it gets the FQN java.util.ArrayList. It then looks up this FQN in its class path. If it finds a class with such a name, then it knows that java.util.ArrayList is the correct name.
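As a rough illustration of that name resolution, the sketch below refers to the same class once through the import and once by its fully qualified name. The class name ImportResolutionDemo is made up for the example.

```java
import java.util.*;

public class ImportResolutionDemo {

    // The simple name ArrayList is resolved at compile time through the import above.
    static String viaImport() {
        return ArrayList.class.getName();
    }

    // No import is needed when the fully qualified name is spelled out.
    static String viaFullName() {
        return java.util.ArrayList.class.getName();
    }

    public static void main(String[] args) {
        System.out.println(viaImport());
        System.out.println(viaFullName());
        // Both spellings denote the very same class.
        System.out.println(ArrayList.class == java.util.ArrayList.class);
    }
}
```

Both methods report the same fully qualified name, because the compiled class file only ever records that full name.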
is "org.apache.nutch.plugin" essentially 4 directories?
If you have a class whose name is org.apache.nutch.plugin.Extension, then it is stored somewhere in the classpath as a file org/apache/nutch/plugin/Extension.class. So the root directory contains four nested subdirectories ("org", "apache", "nutch", "plugin") which in turn contain the class file.
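The name-to-path mapping described here is simple enough to sketch in a few lines. ClassFilePathDemo and its methods are made-up names, and the mapping shown only covers top-level classes (nested classes use a $ in their file name).

```java
import java.net.URL;

public class ClassFilePathDemo {

    /** Turns a fully qualified top-level class name into its relative class file path. */
    static String toClassFilePath(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    /** Asks the system class loader where that path can be found; null if nowhere. */
    static URL locate(String fullyQualifiedName) {
        return ClassLoader.getSystemResource(toClassFilePath(fullyQualifiedName));
    }

    public static void main(String[] args) {
        System.out.println(toClassFilePath("org.apache.nutch.plugin.Extension"));
        // The exact URL printed depends on the Java version and installation.
        System.out.println(locate("java.lang.Object"));
    }
}
```

The first line printed is org/apache/nutch/plugin/Extension.class, which is the path the class loader looks for under each class path entry.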
import org.apache.nutch.plugin.Extension is a compilation time shortcut that allows you to refer to the Extension class without using the class' fully qualified name. It has no meaning at runtime, it's only a compilation time trick to save typing.
By convention the .class file for this class will be located in folder org/apache/nutch/plugin either in the file system or in a jar file, either of which need to be in your classpath, both at compile time and runtime. If the .class file is in a jar file then that jar file needs to be in your classpath. If the .class file is in a folder, then the folder that is the parent of folder "org" needs to be in your classpath. For example, if the class was located in folder c:\myproject\bin\org\apache\nutch\plugin then folder c:\myproject\bin would need to be part of the classpath.
If you're interested in finding out where the class was loaded from when you run your program, use the -verbose:class java command line option. It should tell you in which folder or jar file the JVM found the class.
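If you would rather ask from inside the program, a similar piece of information is available through reflection. This is only a sketch of an alternative to -verbose:class, with made-up names (ClassOriginDemo, origin); classes that come with the JDK itself usually report no code source at all.

```java
import java.security.CodeSource;

public class ClassOriginDemo {

    /** Returns the folder or jar a class was loaded from, or a placeholder if unknown. */
    static String origin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source; typically a class that ships with the JDK)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // For your own classes this is usually the class path entry they came from.
        System.out.println(origin(ClassOriginDemo.class));
        System.out.println(origin(String.class));
    }
}
```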
Basically when you make a class you can declare it to be part of a package. I personally don't have much experience with doing packages. However, afaik, that basically means that you are importing the Extension class from the org.apache.nutch.plugin package.
Building off of Thomas' answer, org.apache.nutch.plugin is a path to the class file(s) you want to import. I'm not sure about this particular package, but generally you'll have a .jar file that you add to your classpath, and your import statement points to the directory "./[classpath]/[jarfile]/org/apache/nutch/plugin"
You can't have a directory named org.apache as a package. The compiler won't understand that name and will look for the directory structure org/apache when you import any class from that package.
Also, do not mistake the Java import statement for the C #include preprocessor directive. The import statement is, like they've said, a shorthand that lets you type fewer characters when referring to a class name.
I am building an SDK in Java that has a public API and lots of internal 'private' classes. I would like to keep the public classes as public, but restrict the visibility of all the internals. I have to stick to Java 8, so I can't really take advantage of modularity introduced in later Java versions. We all know that in Java (unfortunately) packages are not really hierarchical - for example com.test1.test2 package is not really a sub-package of com.test1 and thus any class declared with package visibility modifier inside com.test1.test2 will not be visible from class declared inside com.test1. I can't really simply put all the classes in a single directory as that would make working with the project a nightmare.
I was wondering if it's possible to keep the file system hierarchy as usual, but declare classes as if they were inside a single package. For example create 2 files like these:
Class1 under path com/test1/test2/Class1.java
package com.test1;
class Class1 {}
Class2 under path com/test1/Class2.java
package com.test1;
class Class2 {}
So that logically, both of these classes would end up under the same package and be accessible from within one another using package visibility modifiers.
I know this is highly unusual and probably not supported by many IDEs, but I gave it a try using plain old javac and as long as I specify each source file by its full path it compiles and runs just fine. Do you see any technical problems with that, other than (obviously) breaking the 'good practices'? If it makes any difference it is an Android project but written in Java.
If you want to keep two .java source files in separate directories, but have the Java classes belong to the same package, then you need multiple source root directories.
This is e.g. how test code is kept separate from regular code in a standard Maven project.
So, create the files as follows:
src1/com/test1/Class1.java
src2/com/test1/Class2.java
When compiling, have both src1 and src2 on the source path. Since the source path by default is the same as the classpath, that means using e.g. javac -cp "src1;src2" -d bin src1/com/test1/Class1.java (use : instead of ; as the separator on Linux or macOS), which will correctly find and compile Class2 if it is used by Class1. All the compiled .class files are consolidated in the single destination directory (bin in this example).
Having multiple source roots is common (all Maven projects have them, by default), so it is fully supported by IDEs.
All Java projects I have seen use a folder structure that follows the package structure. This results in a large number of folders that do not contain any files.
So for example packages start with com.mydomain.mysystem.myutility. This would result in the folders src\com, src\com\mydomain, src\com\mydomain\mysystem that do not contain any files. Most likely myutility will also contain only folders.
Most likely there will also be a project folder that contains the name myutility so the complete folder path could be myutility\src\main\java\com\mydomain\mysystem\myutility\otherfolder
This practice is very common but it makes me wonder how useful it is. What is the benefit compared to the situation where these extra folders are not created? Using for example myutility\src\main\java\otherfolder
It seems to be just as valid but it saves everybody the extra navigation steps. I can compile Java source files with both approaches just fine.
In a project typically all source is in com\mydomain\mysystem. What is the benefit of putting those 'empty' folders in all projects?
Just to be clear, I am not questioning the usefulness of package structure. Also Maven is clear.
The question is why we use the empty folders that are typically the same throughout the repository for an organisation.
The source (and class) files are organized like that so that the Java compiler (and the runtime environment) can find them.
When the Java compiler compiles your class, it needs the source or class file of each class that your class depends on, so that it can check that the class exists, that all methods are called with the correct arguments, etc. Also, if it finds a source file but no class file, or if the class file is older than the source file, it will compile the source file of the class you use.
The compiler could of course just check all subfolders of your class path, or even the entire disk, but that would take a lot of time. Because of this convention the compiler only has to check a single subfolder for each classpath entry. Of course you can think of different solutions to this problem, but the people at (then) Sun thought this was the best option.
Of course, the above also applies to the class files which are loaded at run-time, so also the class files are stored in a similar folder structure.
Note also that Java applications and libraries are often packaged as a Jar file (which basically is a zip file with the same folder structure inside), so in many cases they appear as a single file in the file system.
The reason it is done this way is to prevent conflicts and ensure that classes can be uniquely identified. This means that if 2 classes have the same name they can still be loaded via different imports.
The safest way to do this is by using a domain name, which is by its nature unique:
com.google.<classname> for example.
Your approach will work and saves a few empty folders but is not scalable.
I'm working on some Java code in Eclipse. The code is contained in a single class called Adder, which in Eclipse is in the package org.processing. The first thing in the class file is the line
package org.processing;
Q1) What exactly is this line doing? Why is it there, and what is its role?
The code runs fine in Eclipse. However, when I go to the src/org/processing/ folder in the workspace, compile with javac Adder.java, and then try to run using java Adder, I get the following error:
java.lang.NoClassDefFoundError: Adder (wrong name: org/processing/Adder)
On the other hand, if I compile from src using
javac org/processing/Adder.java
then I can run it from src using java org.processing.Adder, but STILL not from within the processing directory.
Q2) Does this mean that compilation is always relative to directory structure?
Finally, if I remove the package org.processing line from the start of the source file, I can compile and run from within the .class file's directory.
Q3) Why is all this the way it is? I can fully understand enforcing a directory structure for code development, but once you're in bytecode this seems a bit over the top, because now I can (apparently) only run the bytecode from one directory (src) using java org.processing.Adder. Now, I'm sure I'm missing the point here, so if someone could point out what it is, that would be great.
The compiler has to be able to find related source code files when compiling. This is why the package and directory structure must agree for source code. Similarly, the JVM must be able to find referenced .class files. So the same directory structure is required at runtime. It's no more complex than that.
Q1) The issue here is that once you change into the folders that represent your package hierarchy, that folder becomes the working directory. The JVM then looks for the path org/processing/Adder starting from src/org/processing (essentially looking for src/org/processing/org/processing/Adder). You need to run it from the root with the fully qualified name. The purpose of packages is A: to organize related classes into groups, and B: along with A, to control access. Classes in package foo.bar can't see the package-private (non-public) classes in other packages; those are like internal classes of their package, and only classes in the same package can use them.
Q2) Yes
Q3) The paths are used as a basic structure for the JVM to know where exactly the class files (each containing their bytecode) are. If you change where you call it from, you're basically trying to change the location where the JVM looks for the class files, but their true location hasn't changed.
The short answer - Packages help keep your project structure well-organized, allow you to reuse names (try having two classes named Account), and are a general convention for very large projects. They map directly onto folder structures, but the reasons they're used can burn beginners pretty badly. Funnily enough, in a project with fewer than 5 classes, you probably won't need them.
What exactly is this line doing? Why is it there, and what is its role?
The line
package org.processing;
is telling Java that this class file lives in a folder called /org/processing. This allows you to have a class which is fully defined as org.processing.Processor here, and in another folder - let's say /org/account/processing, you can have a class that's fully defined as org.account.processing.Processor. Yes, both use the same name, but they won't collide - they're in different packages. If you do decide to use them in the same class, you would have to be explicit about which one you want to use, either through import statements or through the fully qualified class name.
Does this mean that compilation is always relative to directory structure?
Yes. Java has a concept known as a classpath. Anything on this classpath can be compiled and run, and by default, the current directory you're in is on the classpath for compilation and execution. To place other files on the classpath, you would have to pass extra options on the command line when you compile:
javac -sourcepath /path/to/source MainClass.java
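...and this would compile MainClass.java together with any classes it references that are found under the source path. Add -d /path/to/output if you want the class files written to a separate folder, neatly organized in the folder structure specified by your package statements; without -d, each class file is placed next to its source file.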
To run them, as you've already established, you would need to include the compiled source in your classpath, and then execute via the fully qualified object name:
java -cp /path/to/source org.main.MainClass
Why is all this the way it is?
Like I said before, this is mostly useful for very large projects, or projects that involve a lot of other classes and demand structure/organization, such as Android. It does a few things:
It keeps source organized in an easy-to-locate structure. You don't have objects scattered all over the place.
It keeps the scope of your objects clear. If I had a package named org.music.db, then it's pretty clear that I'm messing with objects that deal with the database and persistence. If I had a package named org.music.gui, then it's clear that this package deals with the presentation side. This can help when you want to create a new feature, or update/refactor an existing one; you can remember what it does, but you can't recall its name exactly.
It allows you to have objects with the same name. There is more than one type of Map out there, and if you're using projects that pull that in, you'd want to be able to specify which Map you get - again, accomplished through either imports or the fully qualified object name.
For Q1: The package declaration allows you to guarantee that your class will never be mistaken for another class with the same name. This is why most programmers put their company's name in the package; it's unlikely that there will be a conflict.
For Q2: There is a one-to-one correspondence between the package structure and the directory structure. The short of it is that directories and packages must be the same, except that the package is usually rooted under a folder called src.
For Q3: Once it's compiled, the class files will probably be in the appropriate folders in a jar file. Your ant or maven tasks will build the jar file so you won't really have to bother with it beyond getting the ant task set up the first time.
Suppose somewhere I import javax.servlet.http.HttpServlet.
My questions:
1. Does this mean: I could find a folder structure like javax/servlet/http somewhere and inside that HttpServlet.class file would be present?
2. If not, where exactly could this class file be found?
3. Does this mean: These are just nested namespaces with no relevance to folder structures?
4. Package name in the above mentioned import would be javax.servlet or javax.servlet.http? Probably both are packages and the first one is a super package of the latter one?
5. How is this class file actually included? I've read import is not like C/C++ include.
Thanks.
1. yes
2. see 1.
3. see 1.
4. package name is javax.servlet.http
5. The classloader will locate the class (from its classpath) at runtime
1. Yes
2. It could also be in a jar file (in the javax/servlet/http directory)
3. No (see 1.)
4. More precisely, it's the parent package
5. Imports give access to classes external to the file being compiled. The .class file contains references to the external classes it needs. Constants (static final variables) can be inlined (their values are inserted by the compiler in the code that uses them).
1 - Does this mean: I could find a folder structure like javax/servlet/http somewhere and inside that HttpServlet.class file would be present?
In this case, probably not in the file system, per se. (This class is part of the servlet API from Java EE rather than the Java SE runtime libraries, so it normally ships inside a JAR file.)
2 - If not, where exactly this class file could be found?
In a JAR file that is on your JVM's classpath or bootclasspath. The JAR file is an archive containing .class files and other resources. The pathname of the class within the JAR file would be javax/servlet/http/HttpServlet.class. (In this case the class is typically found in a servlet API JAR such as the servlet-api.jar supplied by your servlet container, not in rt.jar.)
3 - Does this mean: These are just nested namespaces with no relevance to folder structures?
No. If you have file system folders on your classpath, they may be searched to find classes, before or after JAR files, depending on where on the classpath they are. The classpath effectively overlays the namespaces. Namespaces of JAR files can overlay namespaces of file system folders, and vice versa, depending on the effective classpath.
4 - Package name in the above mentioned import would be javax.servlet or javax.servlet.http?
javax.servlet.http
4 continued - Probably both are packages and the first one is a super package of the latter one?
Both are packages, but there is no such thing as a "super package" in Java. As far as the Java language is concerned javax.servlet and javax.servlet.http are unrelated packages. Some people might say that javax.servlet is the parent package of javax.servlet.http, but this statement has no intrinsic meaning from the perspective of the Java language. The apparent parent-child relationship is purely conventional.
5 - How is this class file actually included? I've read import is not like C/C++ include.
The class file is not "included" in any sense. A Java import is little more than a shorthand that allows you to refer to the imported name without qualifying it with its full package name.
Does this mean: I could find a folder structure like javax/servlet/http somewhere and inside that HttpServlet.class file would be present?
Yes (most likely packaged inside a jar file)
Package name in the above mentioned import would be javax.servlet or javax.servlet.http? Probably both are packages and the first one is a super package of the latter one?
Yes again
How is this class file actually included? I've read import is not like C/C++ include.
You write import packagename.classname; and it must come before the class declaration, e.g.:
import javax.servlet.http.HttpServlet;
A fully qualified classname consists of one package name (the namespace) and the classname. Let's take a simple example:
java.lang.Object
The (simple) classname is Object, the packagename is java.lang. There is a practical recommendation in the JLS to construct a packagename with identifiers separated by dots. This is practical because this way we can map a packagename to a folder structure. The packagename from the above example is mapped to ./java/lang, the fully qualified classname to a file ./java/lang/Object.class for the binary and ./java/lang/Object.java for the source file.
This makes it pretty easy for a classloader to find classfiles on the filesystem. The classloader simply evaluates the namespace (packagename) for the folder and the simple classname for the name of the classfile.
A common misunderstanding is thinking that package names are somewhat hierarchical. This is not true. There is no relation between the packages com.example.bean and com.example.bean.impl. The first one is not a sort of parent package.
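The same point can be checked with two packages from the JDK. In this sketch (FlatPackagesDemo is a made-up name) the wildcard import of java.util does not reach into java.util.concurrent, so the second import is required for the code to compile.

```java
import java.util.*;
// Required: "import java.util.*" does NOT cover classes in java.util.concurrent.
import java.util.concurrent.ConcurrentHashMap;

public class FlatPackagesDemo {

    /** Returns the name of the package a class was declared in. */
    static String packageOf(Class<?> type) {
        return type.getPackage().getName();
    }

    public static void main(String[] args) {
        System.out.println(packageOf(ArrayList.class));
        System.out.println(packageOf(ConcurrentHashMap.class));
    }
}
```

The two package names merely share a prefix; as far as the language is concerned they are two separate namespaces.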
For a slightly higher-level answer: the package name is a way of creating a namespace - there is no hierarchy of namespaces. However, the actual class (using the fully-qualified name, e.g. javax.servlet.http.HttpServlet) needs to be loaded by a ClassLoader.
The bog-standard ClassLoader that is used by the JVM is an instance of java.net.URLClassLoader (well, a subclass), at least up to Java 8; later versions use an internal class loader that searches the class path in the same way. This can look up classes given a starting point of either a directory or a JAR file. The package name is then overlaid over the filesystem structure to get the location of the class.
There are other ClassLoaders out there - most also follow this convention to some degree, but it is just a convention. ClassLoaders can load classes however they choose, including dynamically generating them.
For javax.servlet.http.HttpServlet, you'll probably find the class file inside a jar called something like servlet.jar or j2ee.jar. There's a neat utility called JFind that will help you locate where a class can be found - it's also usually pretty easy within an IDE as well.
For question 5 - as others have mentioned, the import statement simply brings the imported class or package into the local namespace, allowing you to use the convenient short name of HttpServlet instead of using the long name of javax.servlet.http.HttpServlet every time you need to refer to it. You can program in Java and never use an import statement if you like, though people will probably look at you oddly.
Consider a scenario in which a Java program imports classes from jar files. If the same class resides in two or more jar files there could be a problem.
In such scenarios, which class is imported by the program? Is it the class with the older timestamp?
What practices can we follow to avoid such complications?
Edit : This is an example. I have 2 jar files my1.jar and my2.jar. Both the files contain com.mycompany.CrazyWriter
By default, classes are loaded by the ClassLoader using the classpath which is searched in order.
If you have two implementations of the same class, the one the class loader finds first will be loaded.
If the classes are not actually the same class (same names but different methods), you'll get an exception when you try to use it.
You can load two classes with the same names in a single VM by using multiple class loaders. The OSGi framework can manage lots of the complexities for you, making sure the correct version is loaded, etc.
First, I assume that you mean that the same class resides in two or more jar files...
Now, answering your questions:
Which class is imported is dependent on your classloader and JVM. You cannot guarantee which class it will be, but in the normal classloader it will be the class from the first jar file on your classpath.
Don't put the same class into multiple jar files, or if you are trying to override system classes, use the boot class path (the -Xbootclasspath options of the java launcher, or -bootclasspath for javac).
Edit: To address one of the comments on this answer. I originally thought that sealing the jar would make a difference, since in theory it should not load two classes from the same package from different jar files. However, after some experimentation, I see that this assumption does not hold true, at least with the default security provider.
The ClassLoader is responsible for loading the Classes.
It scans the ClassPath and loads the class that it finds first.
If you have the same Jar twice on the ClassPath or if you have two Jars that contain two different versions of the same Class (that is com.packagename.Classname), the one that is found first is loaded.
Try to avoid having the same jar on the classpath twice.
Not sure what you meant by "the same class resides in two more classes"
if you meant inner/nested classes, there should be no problem since they are in different namespaces.
If you meant in two more JARs, as already answered, the order in the classpath is used.
How to avoid?
A package should be in only one JAR to avoid duplicated classes. If two classes have the same simple name, like java.util.Date and java.sql.Date, but are in different packages, they actually are different classes. You must use the fully qualified name, for at least one of the classes, to distinguish them.
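For the two Date classes mentioned here, that could look like the following sketch (SameSimpleNameDemo is a made-up name). One class is imported, the other is always written out in full.

```java
import java.util.Date;

public class SameSimpleNameDemo {

    /** Creates one object of each Date class and reports their fully qualified names. */
    static String[] classNames() {
        Date utilDate = new Date(0L);                   // java.util.Date, via the import
        java.sql.Date sqlDate = new java.sql.Date(0L);  // must be written out in full
        return new String[] {
            utilDate.getClass().getName(),
            sqlDate.getClass().getName()
        };
    }

    public static void main(String[] args) {
        for (String name : classNames()) {
            System.out.println(name);
        }
    }
}
```

Importing both with single-type imports would not compile, because the simple name Date would then be ambiguous.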
If you have a problem finding out which version of a class is being used, then jwhich might be of use:
http://www.fullspan.com/proj/jwhich/index.html
If the same class resides in two more jars there should be a problem.
What do you mean exactly? Why should this be a problem?
In such scenarios, which class is imported by the program? (Class with older timestamp??)
If a class exists in two JARs, the class will be loaded from the first JAR on the class path where it is found. Quoting Setting the class path (the quoted part applies to archive files too):
The order in which you specify multiple class path entries is important. The Java interpreter will look for classes in the directories in the order they appear in the class path variable. In the example above, the Java interpreter will first look for a needed class in the directory C:\java\MyClasses. Only if it doesn't find a class with the proper name in that directory will the interpreter look in the C:\java\OtherClasses directory.
In other words, if a specific order is required then just enumerate the JAR files explicitly in the class path. This is something commonly used by application server vendors: to patch specific class(es) of a product, you put a JAR (e.g. CR1234.jar) containing patched class(es) on the class path before the main JAR (say weblogic.jar).
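The first-match rule can be tried out without building any jars. The sketch below is an illustration under a few assumptions: it uses two temporary directories in place of my1.jar and my2.jar, a text file (marker.txt, a made-up name) in place of a class file, and a URLClassLoader as a stand-in for the class path, since it searches its entries in the given order for both classes and resources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassPathOrderDemo {

    /** Creates a temp directory containing com/mycompany/marker.txt with the given text. */
    static Path makeRoot(String prefix, String text) {
        try {
            Path root = Files.createTempDirectory(prefix);
            Path dir = Files.createDirectories(root.resolve("com/mycompany"));
            Files.write(dir.resolve("marker.txt"), text.getBytes(StandardCharsets.UTF_8));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Searches the two roots in the given order and returns the content of the first hit. */
    static String firstMatch(Path first, Path second) {
        try {
            URL[] searchPath = { first.toUri().toURL(), second.toUri().toURL() };
            // Parent is null so that only the two roots (and the JDK itself) are searched.
            try (URLClassLoader loader = new URLClassLoader(searchPath, null)) {
                URL hit = loader.getResource("com/mycompany/marker.txt");
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(hit.openStream(), StandardCharsets.UTF_8))) {
                    return reader.readLine();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The temporary directories are left behind; delete them manually if needed.
        Path my1 = makeRoot("my1", "from my1");
        Path my2 = makeRoot("my2", "from my2");
        System.out.println(firstMatch(my1, my2));
        System.out.println(firstMatch(my2, my1));
    }
}
```

Swapping the order of the two entries changes which copy is found, which is the same effect you get by reordering jars on the class path.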
What practices can we follow to avoid such complications?
Well, the obvious answer is don't do it (or only on purpose like in the sample given above).