How do you use java.util.regex.Pattern.Node - java

I need to parse a sequence of Prolog statements and I've been putting together ad-hoc regexes to handle them, but the result is not very robust. I noticed java.util.regex.Pattern.Prolog, which is a subclass of java.util.regex.Pattern.Node, but I can't seem to find anything that explains what these classes are for or how to use them. The Javadocs are mostly empty. Are there tutorials or fleshed-out documentation of the purpose and usage of these classes? Can they be used to parse Prolog?

Those classes have package access modifiers. For example, Node, in Oracle JDK 7, is declared as
static class Node extends Object {
They can only be accessed from classes in the same package. However, since that package is typically secured by the JVM, you cannot add your classes to it. You'll get an exception like
Exception in thread "main" java.lang.SecurityException: Prohibited package name: java.util.regex
You can find and copy the source code if you want, but you will not be able to use the classes themselves.
As for their purpose, you have to again go to the source code and look at the comments.
/**
* The following classes are the building components of the object
* tree that represents a compiled regular expression. The object tree
* is made of individual elements that handle constructs in the Pattern.
* Each type of object knows how to match its equivalent construct with
* the match() method.
*/
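To see the access restriction for yourself, a minimal sketch along these lines may help. The class name PatternNodeAccess and the helper isPublicClass are made up for illustration; the sketch only uses the standard reflection calls Class.forName and Modifier.isPublic, and it assumes the nested class is still named Pattern$Node in the JDK you run it on (if it is not, the helper simply reports false).

```java
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of the JDK: checks whether a class is declared public.
final class PatternNodeAccess {

    /** Returns true only if the named class can be found and is declared public. */
    static boolean isPublicClass(String binaryName) {
        try {
            // Load without initializing; we only want to inspect the modifiers.
            Class<?> c = Class.forName(binaryName, false,
                    PatternNodeAccess.class.getClassLoader());
            return Modifier.isPublic(c.getModifiers());
        } catch (ClassNotFoundException e) {
            // Assumption: an implementation class may be renamed or removed between JDKs.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPublicClass("java.util.regex.Pattern"));
        System.out.println(isPublicClass("java.util.regex.Pattern$Node"));
    }
}
```

Only the public Pattern and Matcher classes are meant to be used from outside the package; the Node tree is built for you when you call Pattern.compile.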

Related

Distinguishing internal/external methods for Javadoc

I am writing a library in Java. I've divided its implementation into Java packages to help manage the complexity. Only one package contains classes that are visible to clients of the library. However, because only public methods are visible outside of the package for use by other packages of the library, I find myself forced to do one of the following:
(1) Only put interfaces and factory methods in the externally-visible package, putting implementations of those interfaces in a separate package, as described in this SO answer. For example external.MyInterface and internal.MyInterfaceImpl. I find this messy.
(2) Make both internal and external methods public in the external package, and attach Javadoc tags to the internal methods so I can remove their docs prior to publication, either manually (error-prone) or by writing some sort of Javadoc preprocessor or postprocessor.
(3) Use a mechanism that Javadoc provides for this purpose -- ideally, a Javadoc tag.
Whatever the approach, all I really care about is having a consistent way to automatically generate Javadocs for just the external APIs. Is there a standard way to do this? A tool for the purpose?
An alternative solution I've been using for years is to add an #exclude tag, using the public domain code provided in this blog: Implementing #exclude using Dynamic Proxies.
To exclude a Java element (attribute, method, constructor, class, inner class or package) from the Javadoc output, just add the #exclude tag in its Javadoc:
public class MyClass {
/**
* This is my internal attribute, javadoc not exposed.
* #exclude
*/
protected String myInternalAttribute;
/**
* This is my external attribute, javadoc is exposed.
*/
protected String myExternalAttribute;
/**
* This is my internal method, javadoc not exposed.
* #exclude
*/
public void myInternalMethod() { }
/**
* This is my external method, javadoc is exposed.
*/
public void myExternalMethod() { }
}
I found these two answers elsewhere on SO. One approach is to create a custom Javadoc annotation and have an Ant task replace the annotation with deprecated prior to generating the Javadoc. The other, far simpler approach is to use Doxygen's conditional inclusion.
I'm not stuck with Javadoc, so I could go with Doxygen. However, looking at Doxygen right now, it's so different from Javadoc that I'm not sure it's worth the learning curve or establishing a precedent just to be able to generate external APIs.
Here's another solution I will try next time I'm in a position to build: I'll demarcate the portions of the source files that are internal-only, write a tool that duplicates the source files of the external package while removing the portions of the files that are demarcated internal-only, and then run Javadoc off of the generated source. This should work unless Javadoc needs the linker to be happy.
I don't know if it's worth keeping my question around. Might help others find the answer, should they be thinking about it the way I was. Even so, no one has presented a great solution yet.

Java collect all used classes from bytecode

I am trying to implement a RemoteClassLoader which copies and loads all classes that will be used at runtime. First I need to collect the used classes. I found a solution:
Find out which classes of a given API are used
but this is not exactly what I need: it collects only the "visible" class usages, i.e. it loads the class, iterates over all declared fields and methods, and collects all of their types.
I have a class which contains only static methods. No instance of this class is ever used, so it is never passed to a function or stored in a field, and so I can't see that class.
Naturally the bytecode file contains the name of this class:
strings TestClass.class | grep -i "json"
gives: org/json/JSONObject
And yes, that is the class I am looking for and cannot find.
How can I find it, along with the other classes that I use only inside method bodies?
The easiest, albeit conservative method is to simply take all of the Class_info entries from the constant pool. In order to call a method or access a field of a class, there must be a constant pool entry for that class (not counting reflection and not counting overriding methods in subclasses).
There are a number of tools out there that will parse a classfile and give you access to this. Reflection of course is much harder, and in general undecidable.
Edit: This won't include type descriptors, which are just Utf8_infos. If you want to find classes used as types as well, there are two approaches. Either you can go through all the Utf8s and include everything that looks like a descriptor (which may have false positives in rare cases), or you can go through the classfile and find all the type descriptor references.
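As a rough sketch of the constant pool approach, the following reads the Class_info entries directly with a DataInputStream. The class name ConstantPoolClasses and the method referencedClasses are invented for illustration; the tag numbers and entry sizes follow the class file format in the Java Virtual Machine Specification. It covers only Class_info entries, not the descriptor scan mentioned in the edit above, and it does no validation beyond the magic number.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the names behind all CONSTANT_Class entries of a class file.
final class ConstantPoolClasses {

    /** Returns internal names such as "org/json/JSONObject" (array types appear as descriptors). */
    static Set<String> referencedClasses(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // valid indices are 1 .. count-1
            String[] utf8 = new String[count];
            int[] nameIndex = new int[count]; // non-zero only for Class_info slots
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                   // Utf8
                    case 7: nameIndex[i] = in.readUnsignedShort(); break;    // Class
                    case 8: case 16: case 19: case 20:                       // String, MethodType, Module, Package
                        in.readUnsignedShort(); break;
                    case 15:                                                 // MethodHandle
                        in.readUnsignedByte(); in.readUnsignedShort(); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readInt(); break;                                 // all 4-byte payloads
                    case 5: case 6:                                          // Long, Double use two slots
                        in.readLong(); i++; break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : nameIndex) {
                if (index != 0) {
                    names.add(utf8[index]);
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To try it on a real file, pass in the bytes of a compiled class, for example the result of Files.readAllBytes on TestClass.class. The returned set includes the class itself and its superclass, since both are referenced through Class_info entries as well.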

Case sensitivity of Java class names

If one writes two public Java classes with the same case-insensitive name in different directories then both classes are not usable at runtime. (I tested this on Windows, Mac and Linux with several versions of the HotSpot JVM. I would not be surprised if there are other JVMs where they are usable simultaneously.) For example, if I create a class named a and one named A like so:
// lowercase/src/testcase/a.java
package testcase;
public class a {
public static String myCase() {
return "lower";
}
}
// uppercase/src/testcase/A.java
package testcase;
public class A {
public static String myCase() {
return "upper";
}
}
Three eclipse projects containing the code above are available from my website.
If I try calling myCase on both classes like so:
System.out.println(A.myCase());
System.out.println(a.myCase());
The typechecker succeeds, but when I run the class file generated by the code directly above I get:
Exception in thread "main" java.lang.NoClassDefFoundError: testcase/A (wrong name: testcase/a)
In Java, names are in general case sensitive. Some file systems (e.g. Windows) are case insensitive, so I'm not surprised the above behavior happens, but it seems wrong. Unfortunately the Java specifications are oddly noncommittal about which classes are visible. The Java Language Specification (JLS), Java SE 7 Edition (Section 6.6.1, page 166) says:
If a class or interface type is declared public, then it may be accessed by
any code, provided that the compilation unit (§7.3) in which it is declared is
observable.
In Section 7.3, the JLS defines observability of a compilation unit in extremely vague terms:
All the compilation units of the predefined package java and its subpackages lang
and io are always observable. For all other packages, the host system determines which compilation units are observable.
The Java Virtual Machine Specification is similarly vague (Section 5.3.1):
The following steps are used to load and thereby create the nonarray class or
interface C denoted by [binary name] N using the bootstrap class loader [...]
Otherwise, the Java virtual machine passes the argument N to an invocation of a
method on the bootstrap class loader to search for a purported representation of C
in a platform-dependent manner.
All of this leads to four questions in descending order of importance:
Are there any guarantees about which classes are loadable by the default class loader(s) in every JVM? In other words, can I implement a valid, but degenerate JVM, that won't load any classes except those in java.lang and java.io?
If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
Is there any way to make HotSpot load a and A simultaneously? Would writing a custom class loader work?
Are there any guarantees about which classes are loadable by the bootstrap class loader in every JVM?
The core bits and pieces of the language, plus supporting implementation classes. Not guaranteed to include any class that you write. (The normal JVM loads your classes in a separate classloader from the bootstrap one, and in fact the normal bootstrap loader loads its classes out of a JAR normally, as this makes for more efficient deployment than a big old directory structure full of classes.)
If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
Is there any way to make "standard" JVMs load a and A simultaneously? Would writing a custom class loader work?
Java loads classes by mapping the full name of the class into a filename that is then searched for on the classpath. Thus testcase.a goes to testcase/a.class and testcase.A goes to testcase/A.class. Some filesystems mix these things up, and may serve the other up when one is asked for. Others get it right (in particular, the variant of the ZIP format used in JAR files is fully case-sensitive and portable). There is nothing that Java can do about this (though an IDE could handle it for you by keeping the .class files away from the native FS, I don't know if any actually do and the JDK's javac most certainly isn't that smart).
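The name-to-file mapping described above can be sketched in a few lines. The class ClassFileNames and both of its methods are hypothetical names used only for illustration; the sketch mirrors the convention of replacing dots with slashes and appending .class, and then checks whether two distinct class names would land on the same file on a case-insensitive file system.

```java
// Hypothetical helper illustrating how a binary class name maps to a class file path.
final class ClassFileNames {

    /** Maps e.g. "testcase.A" to "testcase/A.class". */
    static String resourcePath(String binaryName) {
        return binaryName.replace('.', '/') + ".class";
    }

    /** True if two different class names differ only by case in their file paths. */
    static boolean collideIgnoringCase(String first, String second) {
        return !first.equals(second)
                && resourcePath(first).equalsIgnoreCase(resourcePath(second));
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("testcase.a"));
        System.out.println(resourcePath("testcase.A"));
        System.out.println(collideIgnoringCase("testcase.a", "testcase.A"));
    }
}
```

A check like this could run in a build step to flag such class pairs before they reach a case-insensitive developer machine.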
However that's not the only point to note here: class files know internally what class they are talking about. The absence of the expected class from the file just means that the load fails, leading to the NoClassDefFoundError you received. What you got was a problem (a mis-deployment in at least some sense) that was detected and dealt with robustly. Theoretically, you could build a classloader that could handle such things by keeping searching, but why bother? Putting the class files inside a JAR will fix things far more robustly; those are handled correctly.
More generally, if you're running into this problem for real a lot, take to doing production builds on a Unix with a case-sensitive filesystem (a CI system like Jenkins is recommended) and find which developers are naming classes with just case differences and make them stop as it is very confusing!
Donal's fine explanation leaves little to add, but let me briefly muse on this phrase:
... Java classes with the same case-insensitive name ...
Names and Strings in general are never case-insensitive in themselves; it's only their interpretation that can be. And secondly, Java doesn't do such an interpretation.
So, a correct phrasing of what you had in mind would be:
... Java classes whose file representations in a case-insensitive file-system have identical names ...
I tried to add or remove a character from one of the class names and it worked. I feel it's always better to use different class names.
Don't think just about folders.
Use explicit different namespaces ("packages") for your classes, and maybe use folders to match your classes.
When I mention "packages", I don't mean "*.JAR" files, but, just the concept of:
package com.mycompany.mytool;
// "com.mycompany.mytool.MyClass"
public class MyClass
{
// ...
} // class MyClass
When you do not specify a package for your code, the java tools (compiler, I.D.E., whatever), assume to use the same global package for all. And, in case of several similar classes, they have a list of folders, where to look for.
Packages are like "virtual" folders in your code, and apply to all your packages on your classpath, or installation of Java. You can have several classes, with the same I.D., but, if they are in different package, and you specify which package to look for, you won't have any problem.
Just my 2 cents, for your cup of Java coffee

Where to put potentially re-useable helper functions?

This is language agnostic, but I'm working with Java currently.
I have a class Odp that does stuff. It has two private helper methods, one of which determines the max value in an int[][], and the other returns the occurrences of a character in a String.
These aren't directly related to the task at hand, and seem like they could be reused in future projects. Where is the best place to put this code?
Make it public -- bad, because Odp's functionality is not directly related, and these private methods are an implementation detail that don't need to be in the public interface.
Move them to a different class -- but what would this class be called? MiscFunctionsWithNoOtherHome? There's no unifying theme to them.
Leave it private and copy/paste into other classes if necessary -- BAD
What else could I do?
Here's one solution:
Move the method that determines the max value in a two-dimensional int array to a public class called IntUtils and put the class in a util package.
Put the method that returns the occurrences of a character in a String in a public class called StringUtils and put the class in a util package.
There's nothing particularly bad about writing static helper classes in Java. But make sure that you don't reinvent the wheel; the methods that you just described might already be in some open-source library, like Jakarta Commons.
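A minimal sketch of what those two helper classes might look like follows. The method names max and countOccurrences, and the choice to throw on an empty array, are assumptions for illustration; the question does not show the original private methods. The classes are declared package-private here only so that both fit in one file; in a real util package each would be public in its own file.

```java
// Hypothetical static helper for int arrays; not instantiable.
final class IntUtils {
    private IntUtils() { }

    /** Returns the largest value in a two-dimensional int array. */
    static int max(int[][] values) {
        boolean seen = false;
        int max = Integer.MIN_VALUE;
        for (int[] row : values) {
            for (int value : row) {
                seen = true;
                if (value > max) {
                    max = value;
                }
            }
        }
        if (!seen) {
            // Assumption: an empty array has no maximum, so reject it explicitly.
            throw new IllegalArgumentException("array has no elements");
        }
        return max;
    }
}

// Hypothetical static helper for strings; not instantiable.
final class StringUtils {
    private StringUtils() { }

    /** Counts how many times the character c occurs in s. */
    static int countOccurrences(String s, char c) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                count++;
            }
        }
        return count;
    }
}
```

Odp would then call IntUtils.max(grid) and StringUtils.countOccurrences(text, 'x') instead of its own private copies.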
Wait until you need it!
Your classes will be better for it, as you have no idea yet what your exact future needs will be.
When you are ready, use "Extract Method" in Eclipse.
EDIT: I have found that test-driven development gives code that is easier to reuse, because you think of the API up front.
A lot of people create a Utility class with a lot of such methods declared as static. Some people don't like this approach but I think it strikes a balance between design, code reuse, and practicality.
If it were me, I'd either:
create one or more Helper classes that contained the methods as static publics, naming them as precisely as possible, or
if these methods are all going to be used by classes of basically the same type, I'd create an abstract base class that includes these as protected methods.
Most of the time I end up going with 1, although the helper methods I write are usually a little more specific than the ones you've mentioned, so it's easier to come up with a class name.
I do not know what other languages do, but I have the voice of experience in Java on this: just move to the end brace of your class and write what you need there (or use a nested class if you prefer, as that is the accepted canonical convention in Java).
Move the file-scope class (the default-access class right there in the file) to its own compilation unit (a public class in its own file) when the compiler moans about it.
See other's comments about nested classes of same name if differing classes have the same functionality in nested class of same name. What will happen on larger code bases is the two will diverge over time and create maintainability issues that yield to Java's Name of class as type of class typing convention that forces you to resolve the issue somehow.
What else could I do?
Be careful not to yield to beginner impulses on this. Your 1-2 punch nails it, resist temptation.
In my experience, most large projects will have some files for "general" functions, which are usually all sorts of helper functions like this one which don't have any builtin language library.
In your case, I'd create a new folder (new package for Java) called "General", then create a file to group together functions (for Java, this will just be a class with lots of static members).
For example, in your case, I'd have something like: General/ArrayUtils.java, and in that I'd throw your function and any other function you need.
Don't worry that for now this is making a new class (and package) for only one function. Like you said in the question, this will be something you'll use for the next project, and the next. Over time, this "General" package will start to grow all sorts of really great helper classes, like MathUtils, StringUtils, etc. which you can easily copy to every project you work on.
You should avoid helper classes if you can, since it creates redundant dependencies. Instead, if the classes using the helper methods are of the same type (as kbrasee wrote), create an abstract superclass containing the methods.
If you do choose to make a separate class do consider making it package local, or at least the methods, since it may not make sense for smaller projects. If your helper methods are something you will use between projects, then a library-like approach is the nicest to code in, as mentioned by Edan Maor.
You could make a separate project called utils or something, where you add the classes needed, and attach them as a library to the project you are working on. Then you can easily make inter-project library updates/fixes by one modification. You could make a package for these tools, even though they may not be that unified (java.util anyone?).
Option 2 is probably your best bet in Java, despite being unsatisfying. Java is unsatisfying, so no surprise there.
Another option might be to use the C Preprocessor as a part of your build process. You could put some private static functions into file with no class, and then include that file somewhere inside a class you want to use it in. This may have an effect on the size of your class files if you go overboard with it, of course.

How to determine which classes are referenced in a compiled .Net or Java application?

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course I'm speaking from parsing the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
